Vaclav Kosar
Software And Machine Learning Blog

Our Presentation At Spark + AI Summit

Marek Novotny and I had the opportunity to present our proof of concept (POC) and future plans for data lineage in Spark Structured Streaming.

The presentation page can be found here.

Video and Photos:

The video is available here.


Full description:

Data lineage tracking is one of the major problems faced by companies in highly regulated industries. These companies must have a good understanding of how data flows through their systems to comply with strict regulatory frameworks. Many of these organizations also use big and fast data technologies such as Hadoop, Apache Spark and Kafka. Spark has become one of the most popular engines for big data computing. In recent releases, Spark also provides the Structured Streaming component, which allows for real-time analysis and processing of data streamed from many sources. Spline is a data lineage tracking and visualization tool for Apache Spark. It captures and stores lineage information from internal Spark execution plans in a lightweight, unobtrusive and easy-to-use manner.

Additionally, Spline offers a modern user interface that allows non-technical users to understand the logic of Apache Spark applications. In this presentation, we cover Spline's support for Structured Streaming and demonstrate how data lineage can be captured for streaming applications.
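To make the idea of harvesting lineage from an execution plan a bit more concrete, here is a toy sketch in Python. It does not use Spline or Spark at all: `PlanNode`, `input_sources` and `lineage` are hypothetical names, and the plan is a hand-built tree standing in for a Spark logical plan. Walking the tree from the write operation down to its read leaves tells us which inputs an output was derived from, which is the kind of information a lineage tool records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PlanNode:
    """Hypothetical, simplified stand-in for a node of a Spark logical plan."""
    op: str                                   # e.g. "Read", "Filter", "Join", "Write"
    path: str = ""                            # data location, used only by Read/Write
    children: List["PlanNode"] = field(default_factory=list)


def input_sources(node: PlanNode) -> Set[str]:
    """Recursively collect the paths of all Read leaves below this node."""
    if node.op == "Read":
        return {node.path}
    sources: Set[str] = set()
    for child in node.children:
        sources |= input_sources(child)
    return sources


def lineage(write: PlanNode) -> Dict[str, List[str]]:
    """Map the output path of a Write root to the sorted inputs it depends on."""
    if write.op != "Write":
        raise ValueError("lineage() expects a Write node at the plan root")
    return {write.path: sorted(input_sources(write))}


# A hand-built plan: filter a (hypothetical) Kafka stream, join it with a
# static table, and write the result.
plan = PlanNode("Write", "/data/report", [
    PlanNode("Join", children=[
        PlanNode("Filter", children=[PlanNode("Read", "kafka://trades")]),
        PlanNode("Read", "/data/customers"),
    ]),
])

print(lineage(plan))  # {'/data/report': ['/data/customers', 'kafka://trades']}
```

A real tool works on Spark's own plan objects and records far more detail (attributes, transformations, run metadata), but the basic step of tracing an output back to its sources is the same.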

04 Oct 2018
