Stream processing is essential to understand and embrace in today’s data-driven world: it is the practice of analyzing and transforming data in real time as it arrives, so organizations can gain insights and take action the moment new data enters a stream.
Stream processing differs from batch processing and therefore requires a different architecture.
| Data Streams | Batch Processing |
| --- | --- |
| Handle an unbounded, infinite flow of data as it happens. | Handle bounded, finite data with a beginning and end. |
Stream data usually has the following properties:
- Small records (often JSON or XML fragments).
- High velocity (and therefore, high overall data volume).
- Variable velocity, with both quiet and busy periods.
- Data may arrive out of sequence compared to when the event happened.
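To make these properties concrete, here is a minimal Python sketch of a single stream record. The sensor name, fields, and values are hypothetical and used only for illustration; the point is that each record is a small JSON fragment whose event timestamp can lag behind the time it actually reaches the processor.

```python
import json
from datetime import datetime, timezone

# Hypothetical sensor reading: a small JSON fragment with an event timestamp.
raw = '{"sensor_id": "pump-17", "temp_c": 71.4, "event_time": "2024-05-01T09:15:02Z"}'
record = json.loads(raw)

# Event time (when the reading was taken) vs. arrival time (when it reaches the
# processor): the gap between the two is why records can show up out of sequence.
event_time = datetime.fromisoformat(record["event_time"].replace("Z", "+00:00"))
arrival_time = datetime.now(timezone.utc)
print(f"{record['sensor_id']}: {record['temp_c']} °C, arrived {arrival_time - event_time} after the event")
```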
The Internet of Things (IoT) is a prime example of technology that relies on data streams, as information continuously arrives from various sensors. There are also many business applications for stream processing—imagine anything that arrives rapidly and continuously, such as customer orders, airline reservations, insurance claims, bank transactions, call logs, market data, and even emails.
How to get started with stream processing
In a recent webinar poll, we found that most users are already leveraging streaming data or plan to within the next few months. With this rising demand for handling high-velocity data, it’s important to choose an enterprise integration platform that can handle both batch and stream processing.
Build a workspace in FME Form to process and analyze the data. An example workspace could ingest a Kafka stream, filter out a specific type of data, and then write the data to Snowflake. Then, deploy the workspace in FME Flow and connect it to the stream.
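For readers who prefer to see the logic in code, here is a minimal plain-Python sketch of that same pipeline: read a Kafka topic, keep only one record type, and insert the matching records into Snowflake. The topic name, filter field, table, and credentials are assumptions for illustration only; in FME Form the equivalent workspace is built visually with Kafka and Snowflake connectors rather than written by hand.

```python
import json

from kafka import KafkaConsumer          # pip install kafka-python
import snowflake.connector               # pip install snowflake-connector-python

# Assumed topic and broker for illustration.
consumer = KafkaConsumer(
    "vehicle-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Placeholder Snowflake connection details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

for message in consumer:
    record = message.value
    # Filter: keep only one type of record, e.g. speeding events.
    if record.get("event_type") != "speeding":
        continue
    # Write the record of interest to a Snowflake table.
    cur.execute(
        "INSERT INTO speeding_events (vehicle_id, speed_kph, event_time) VALUES (%s, %s, %s)",
        (record["vehicle_id"], record["speed_kph"], record["event_time"]),
    )
```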
| FME Flow Streams | FME Flow Automations |
| --- | --- |
| Handles high-volume data streams (millions of records per minute). Always on and receiving data, ready to process it immediately. | Handles event processing and automates complex workflows. A trigger starts an Automation, which runs a workspace to complete a job. |
Tip: Because stream data arrives continuously and at high velocity, it’s critical to deploy a single, simple workspace that can process a large volume of data quickly.
FME-supported IoT and messaging protocols include RabbitMQ, MQTT, Kafka, SQS, JMS, TCP/IP, and many more. If FME doesn’t connect to the one you need, let us know!
Common stream processing workflows
Below is a list of common stream processing workflows. Visit FME and Stream Processing and watch our webinar, Powering Real-Time Decisions with Continuous Data Streams, to see demos and learn the technical details of each workflow.
- Windowing: Break up the data stream into time-based groups for processing, e.g. process it in 1-minute intervals.
- Filtering: Reduce data volumes by processing only the area of interest, e.g. filter by location to process data that’s within a bounding box or geofence.
- Enriching: Join the data with other datasets, e.g. overlay it on a geographic region or do a database join.
- Aggregating: Summarize blocks of data, e.g. get the average speed of vehicles each minute (see the sketch after this list).
- Event detection: Detect patterns and only trigger an event when certain criteria are met, e.g. send an alert if a fleet vehicle leaves the service area.
- Proximity analysis: Select geographic features based on their distance from other features, e.g. if a gas leak is detected in an incoming data stream, where is it and which customers are affected?
- Snapping: Bring spatial features together if they are within a certain distance of each other, e.g. snap vehicle points to a road network.
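As an example of the windowing and aggregating items above, here is a minimal Python sketch that groups vehicle readings into 1-minute windows and computes the average speed in each window. The record layout is an assumption for illustration; in FME the same result would come from transformers in a workspace rather than hand-written code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def minute_window(event_time: datetime) -> datetime:
    """Truncate a timestamp to the start of its 1-minute window."""
    return event_time.replace(second=0, microsecond=0)

def average_speed_per_minute(records):
    """Group records into 1-minute windows and return the mean speed per window."""
    windows = defaultdict(list)
    for record in records:
        windows[minute_window(record["event_time"])].append(record["speed_kph"])
    return {window: mean(speeds) for window, speeds in sorted(windows.items())}

# Example: three readings spanning two windows.
sample = [
    {"event_time": datetime(2024, 5, 1, 9, 15, 2), "speed_kph": 62.0},
    {"event_time": datetime(2024, 5, 1, 9, 15, 40), "speed_kph": 58.0},
    {"event_time": datetime(2024, 5, 1, 9, 16, 5), "speed_kph": 70.0},
]
print(average_speed_per_minute(sample))
# -> 9:15 window: 60.0 km/h, 9:16 window: 70.0 km/h
```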
With the rise of sensors and high-velocity data collection, stream processing is a rapidly growing technology for delivering real-time insights. Visit the FME Community and let us know how you plan to use data streams!
Learn more: FME and Stream Processing