Asynchronous Data Pipeline = AWS (S3 & SQS) + FME Cloud
Presentation Details
Tesera faced a web application use case that required storing large amounts of data, handling a large number of data processing requests, implementing a non-blocking data pipeline, minimizing infrastructure costs, and automating data handling and processing. Using Amazon Web Services (AWS) Simple Storage Service (S3) for data storage, AWS Simple Queue Service (SQS) for task queuing, and FME Cloud for data processing infrastructure, Tesera has developed an asynchronous data pipeline that manages client data from upload through to model indicator development for a number of analysis applications. Additionally, the FME Cloud Scheduler reduces infrastructure costs by keeping the data processing engine available only during business hours, while SQS holds tasks in the queue until the processing engine is back online.

The entire pipeline includes:
1. user project creation via web application -> SQS queue to initiate base project data
2. user data upload via web application -> data storage to S3 -> SQS queue of data to be checked -> FME Cloud to perform data checking -> S3 to store results of data checking
3. user views results of data checking via web application -> SQS queue of valid data to be processed -> FME Cloud to process raw data into model indicators -> S3 to store processed project data
4. user views project data via web application and kicks off data analysis

My presentation will walk attendees through the entire pipeline, focusing on how we have set up FME Cloud to watch for and process new SQS queue messages, how we generically manage data locally on FME Cloud, and how we use S3 as the primary data repository.
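
The queue-watching step described above can be illustrated with a minimal Python sketch. This is not Tesera's FME Cloud implementation, which the abstract does not detail; it only shows the general receive, process, delete pattern that lets tasks wait in SQS until a processing engine comes online. The function name `drain_queue`, the `InMemoryQueue` stand-in, and the message fields (`project`, `action`) are hypothetical. The stand-in mirrors the keyword arguments of the boto3 SQS client methods `send_message`, `receive_message`, and `delete_message`, so a real client could be passed in its place, assuming credentials and a queue URL are available.

```python
import json


class InMemoryQueue:
    """Hypothetical stand-in for an SQS client, for local experimentation only.

    Unlike real SQS, it has no visibility timeout: a message that was
    received but not deleted is simply returned again on the next receive.
    """

    def __init__(self):
        self._messages = {}  # receipt handle -> message body
        self._next_id = 0

    def send_message(self, QueueUrl, MessageBody):
        handle = f"handle-{self._next_id}"
        self._next_id += 1
        self._messages[handle] = MessageBody
        return {"MessageId": handle}

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        batch = list(self._messages.items())[:MaxNumberOfMessages]
        if not batch:
            return {}  # like boto3, no "Messages" key when the queue is empty
        return {
            "Messages": [
                {"Body": body, "ReceiptHandle": handle} for handle, body in batch
            ]
        }

    def delete_message(self, QueueUrl, ReceiptHandle):
        self._messages.pop(ReceiptHandle, None)


def drain_queue(sqs, queue_url, handler, max_batches=10, wait_seconds=0):
    """Process queued tasks until the queue is empty or max_batches is reached.

    Each message is deleted only after `handler` returns, so a task whose
    processing fails stays in the queue and can be retried later.
    Returns the number of messages processed.
    """
    processed = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS allows at most 10 per request
            WaitTimeSeconds=wait_seconds,  # values above 0 enable long polling
        )
        messages = response.get("Messages", [])
        if not messages:
            break
        for message in messages:
            task = json.loads(message["Body"])
            handler(task)
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            processed += 1
    return processed
```

In this sketch, a scheduled worker that starts at the beginning of business hours would call `drain_queue` with a handler that runs the data check or indicator processing for each task; anything queued overnight is picked up on the first call.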