Klarrio/UGent launch Streaming Analytics Benchmark
If you are dealing with big data, high throughput, low latency, or adaptive online algorithms, streaming analytics is the way to go. In recent years, a lot of R&D has gone into the development of so-called streaming frameworks. Note that a pure streaming context, in which incoming streams are aggregated, is quite different from online analytics, which is performed on streaming data that has been enriched with static data. In addition, streaming data types vary considerably from case to case, and little is known about how different frameworks react to different types of data.
Although the industry acknowledges the power of streaming analytics, practitioners often struggle to decide which framework suits their needs and solves their problem in an optimal way.
Therefore, Klarrio, in close collaboration with the UGent Big Data Analytics Team, is developing a streaming analytics benchmark.
The included frameworks are Apache Spark, Apache Flink, Apache Storm (Trident) and Kafka Streams.
The first phase of the benchmark will focus on measuring throughput and latency for a basic streaming job comprising the following steps:
1. Ingest: Read event data from Kafka.
2. Basic transformations: Parse the data.
3. Joins: Join datastreams across topics together.
4. Aggregation: Compute aggregated metrics for each measurement point.
5. Window operations: Evolution of the metrics over specified look-back periods.
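To make the shape of such a job concrete, the five steps can be sketched in plain Python on in-memory data. This is only an illustration under stated assumptions, not the benchmark's implementation: the event fields (`point`, `ts`, `flow`, `speed`), the 60-second tumbling window, and all function names are hypothetical, and lists of JSON strings stand in for Kafka topics.

```python
import json
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size, chosen for illustration


def parse(raw):
    """Step 2: parse one JSON-encoded event into a dict."""
    return json.loads(raw)


def join_streams(flow_events, speed_events):
    """Step 3: inner-join two streams on (measurement point, timestamp)."""
    speeds = {(e["point"], e["ts"]): e["speed"] for e in speed_events}
    for e in flow_events:
        key = (e["point"], e["ts"])
        if key in speeds:
            yield {**e, "speed": speeds[key]}


def aggregate_by_window(joined):
    """Step 4: total flow and average speed per point and tumbling window."""
    acc = defaultdict(lambda: {"flow": 0, "speed_sum": 0.0, "n": 0})
    for e in joined:
        window_start = e["ts"] - e["ts"] % WINDOW_SECONDS
        a = acc[(e["point"], window_start)]
        a["flow"] += e["flow"]
        a["speed_sum"] += e["speed"]
        a["n"] += 1
    return {
        key: {"flow": a["flow"], "avg_speed": a["speed_sum"] / a["n"]}
        for key, a in acc.items()
    }


def evolution(windows, lookback=1):
    """Step 5: change in total flow compared with `lookback` windows earlier."""
    deltas = {}
    for (point, start), metrics in windows.items():
        previous = windows.get((point, start - lookback * WINDOW_SECONDS))
        if previous is not None:
            deltas[(point, start)] = metrics["flow"] - previous["flow"]
    return deltas


def run_pipeline(raw_flow, raw_speed):
    """Step 1 stand-in: the two lists play the role of two Kafka topics."""
    flow = [parse(r) for r in raw_flow]
    speed = [parse(r) for r in raw_speed]
    windows = aggregate_by_window(join_streams(flow, speed))
    return windows, evolution(windows)
```

In a real streaming framework each of these functions would correspond to an operator running continuously over unbounded input; the batch-style version here only shows what each step computes.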
The second phase of the project will add data augmentation by enriching streaming data with static data. The third phase will extend the first and second phases by implementing analytical models for predictive and prescriptive analytics.
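As a rough illustration of what enriching streaming data with static data means, the sketch below attaches attributes from a static lookup table to each incoming event. The table contents, field names, and the `enrich` function are assumptions made for this example and are not taken from the benchmark.

```python
# Hypothetical static reference data, keyed by measurement point.
STATIC_POINTS = {
    "A": {"road": "R1", "lanes": 3},
}


def enrich(events, static_table):
    """Attach static attributes to each event; drop events with no match."""
    for event in events:
        metadata = static_table.get(event["point"])
        if metadata is not None:
            yield {**event, **metadata}
```

Dropping unmatched events corresponds to an inner join; a left join that keeps them unchanged would be an equally reasonable choice, depending on the use case.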
For further information, or if you have any questions, please do not hesitate to contact us: info AT klarrio.com
More info: http://www.bigdata.ugent.be/