To download all the videos and slides, follow the WeChat official account (bigdata_summit) and tap the "视频下载" (Video Download) menu.
Building Event-Driven Services with Stateful Streams
by Benjamin Stopford, Engineer, Confluent
video, slide
Event-driven services come in many shapes and sizes, from tiny event-driven functions that dip into an event stream right through to heavy, stateful services that can facilitate request-response. This practical talk makes the case for building this style of system with stream processing tools, and walks through a number of patterns for how to actually put these things together.
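The two ends of the spectrum described in the abstract can be sketched in a few lines. This is only an illustration under assumed event and handler names (none of them are from the talk): a tiny stateless event-driven function next to a heavier stateful service that consumes the same stream and can answer request-response queries from its accumulated state.

```python
def stateless_handler(event):
    """A tiny event-driven function: dips into the stream, emits a result."""
    return {"order_id": event["order_id"],
            "total_cents": event["qty"] * event["unit_cents"]}

class StatefulOrderService:
    """A heavier, stateful service: consumes events, serves request-response."""
    def __init__(self):
        self.totals = {}  # order_id -> running total in cents

    def on_event(self, event):
        enriched = stateless_handler(event)
        key = enriched["order_id"]
        self.totals[key] = self.totals.get(key, 0) + enriched["total_cents"]

    def get_total(self, order_id):
        # Request-response served from the service's own state.
        return self.totals.get(order_id, 0)

stream = [
    {"order_id": "o1", "qty": 2, "unit_cents": 500},
    {"order_id": "o2", "qty": 1, "unit_cents": 250},
    {"order_id": "o1", "qty": 1, "unit_cents": 100},
]
svc = StatefulOrderService()
for ev in stream:
    svc.on_event(ev)
print(svc.get_total("o1"))  # 1100
```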
Building Stateful Financial Applications with Kafka Streams
by Charles Reese, Senior Software Engineer, Funding Circle
video, slide
At Funding Circle, we are building a global lending platform with Apache Kafka and Kafka Streams to handle high volume, real-time processing with rapid clearing times similar to a stock exchange. In this talk, we will provide an overview of our system architecture and summarize key results in edge service connectivity, idempotent processing, and migration strategies.
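The abstract mentions idempotent processing, which matters when a broker redelivers events under at-least-once semantics. A minimal sketch of one common approach, de-duplicating by event ID so a redelivered event is applied exactly once; the field names are illustrative, not Funding Circle's actual schema.

```python
class IdempotentLedger:
    def __init__(self):
        self.balance = 0
        self.seen_ids = set()  # IDs of events already applied

    def apply(self, event):
        if event["event_id"] in self.seen_ids:
            return False  # duplicate delivery: skip, state is unchanged
        self.seen_ids.add(event["event_id"])
        self.balance += event["amount_cents"]
        return True

ledger = IdempotentLedger()
events = [
    {"event_id": "e1", "amount_cents": 1000},
    {"event_id": "e2", "amount_cents": -300},
    {"event_id": "e1", "amount_cents": 1000},  # redelivered duplicate
]
for ev in events:
    ledger.apply(ev)
print(ledger.balance)  # 700, not 1700
```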
Fast Data in Supply Chain Planning
by Jeroen Soeters, Lead Developer, ThoughtWorks
video, slide
We are migrating one of the top 3 consumer packaged goods companies from a batch-oriented systems architecture to a streaming microservices platform. In this talk I'll explain how we leverage the Lightbend reactive stack and Kafka to achieve this, and how the 4 Kafka APIs fit into our architecture. I'll also explain why Kafka Streams <3 Enterprise Integration Patterns.
Kafka Stream Processing for Everyone with KSQL
by Nick Dearden, Director of Engineering, Confluent
video, slide
The rapidly expanding world of stream processing can be daunting, with new concepts (various types of time semantics, windowed aggregates, changelogs, and so on) and programming frameworks to master. KSQL is a new open-source project which aims to simplify all this and make stream processing available to everyone.
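One of the concepts the abstract lists, windowed aggregates, can be illustrated without any framework. KSQL expresses this declaratively in SQL; the plain-Python sketch below only shows what a tumbling-window count computes: each event is assigned to a fixed, non-overlapping time bucket by its event timestamp, and counts are kept per key per bucket. Names and data are invented for illustration.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms):
    """Count events per key per tumbling window.

    events: iterable of (event_time_ms, key) pairs.
    Returns {(key, window_start_ms): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms  # bucket the timestamp
        counts[(key, window_start)] += 1
    return dict(counts)

events = [(1000, "page_a"), (1500, "page_a"), (2500, "page_b"), (61000, "page_a")]
print(tumbling_window_counts(events, 60_000))
# {('page_a', 0): 2, ('page_b', 0): 1, ('page_a', 60000): 1}
```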
Portable Streaming Pipelines with Apache Beam
by Frances Perry, Software Engineer, Google
video, slide
Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. By cleanly separating the user’s processing logic from details of the underlying execution engine, the same pipelines will run on any Apache Beam runtime environment, whether it’s on-premise or in the cloud, on open source frameworks like Apache Spark or Apache Flink, or on managed services like Google Cloud Dataflow. In this talk, I will:
Briefly introduce the capabilities of the Beam model for data processing and integration with IO connectors like Apache Kafka.
Discuss the benefits Beam provides regarding portability and ease-of-use.
Demo the same Beam pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Flink on Google Cloud, Apache Spark on AWS, Apache Apex on-premise).
Give a glimpse at some of the challenges Beam aims to address in the future.
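Beam's central idea, cleanly separating the user's processing logic from the execution engine, can be illustrated with a toy pipeline whose transforms are recorded as plain data and only executed when a "runner" interprets them. This is not Beam's actual API; the class and method names below are invented, and a real distributed runner would interpret the same recorded transforms without the pipeline definition changing.

```python
class ToyPipeline:
    """Records processing logic without executing it."""
    def __init__(self):
        self.transforms = []

    def map(self, fn):
        self.transforms.append(("map", fn))
        return self

    def filter(self, pred):
        self.transforms.append(("filter", pred))
        return self

class InProcessRunner:
    """One possible engine; another runner could interpret the same
    transform list on a cluster, leaving the pipeline untouched."""
    def run(self, pipeline, data):
        out = list(data)
        for kind, fn in pipeline.transforms:
            if kind == "map":
                out = [fn(x) for x in out]
            elif kind == "filter":
                out = [x for x in out if fn(x)]
        return out

p = ToyPipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
print(InProcessRunner().run(p, [1, 2, 3]))  # [6]
```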
Query the Application, Not a Database: “Interactive Queries” in Kafka’s Streams API
by Matthias Sax, Engineer, Confluent
video, slide
Kafka Streams allows you to build scalable streaming apps without a cluster. This "cluster-to-go" approach is extended by a "DB-to-go" feature: Interactive Queries let you query an app's internal state directly, eliminating the need for an external DB to access this data. This avoids redundantly stored data and DB update latency, and simplifies the overall architecture, e.g., for microservices.
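The "query the application, not a database" idea can be sketched in plain Python: the app materializes a local state store from the stream, and reads are served straight from that store rather than from an externally maintained database. In Kafka Streams itself this is exposed through queryable state stores; the code below is only an illustration with invented names.

```python
class WordCountApp:
    def __init__(self):
        self.store = {}  # local state store: word -> count

    def process(self, line):
        # State is materialized as events arrive from the stream.
        for word in line.lower().split():
            self.store[word] = self.store.get(word, 0) + 1

    def query(self, word):
        """Serve a read directly from app-internal state - no external DB,
        no redundant copy of the data, no DB update latency."""
        return self.store.get(word, 0)

app = WordCountApp()
for line in ["hello streams", "hello interactive queries"]:
    app.process(line)
print(app.query("hello"))  # 2
```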
Real-Time Document Rankings with Kafka Streams
by Hunter Kelly, Senior Software/Data Engineer, Zalando
video, slide
The HITS algorithm assigns each document two scores: one for "hubbiness", the other for "authority". Usually this is done as a batch operation, working on all the data at once. However, with careful consideration, it can be implemented in a streaming architecture using KStreams and KTables, allowing efficient real-time sampling of rankings at a frequency appropriate to the specific use case.
下面的内容来自机器翻译:
HITS算法为文档创建分数;一个是“喧嚣”,一个是“权威”。通常这是作为批处理操作完成的,一次处理所有的数据。然而,经过慎重的考虑,这可以在使用KStreams和KTables的流式架构中实现,从而以适合特定用例的频率对排名进行高效的实时采样。
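The hub and authority scores come from HITS's mutually recursive update: a page's authority is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of pages it links to, with normalization each round. A small batch implementation for reference (the talk's point is that this same update can be kept incrementally fresh with KStreams and KTables rather than recomputed from scratch):

```python
import math

def hits(links, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of in-linking pages.
        auth = {p: sum(hub[s] for s, ts in links.items() if p in ts)
                for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of authority scores of out-linked pages.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# "a" links out a lot (a good hub); "c" is linked to a lot (an authority).
links = {"a": ["b", "c"], "b": ["c"], "c": []}
hub, auth = hits(links)
```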
Streaming Processing in Python – 10 ways to avoid summoning Cthulhu
by Holden Karau, Principal Software Engineer, IBM
video, slide
<3 Python & want to process data from Kafka? This talk will look at how to make this awesome. In many systems the traditional approach involves first reading the data into the JVM and then passing it to Python, which can be a little slow, and on a bad day results in failures that are almost impossible to debug. This talk will look at how to be more awesome in Spark and how to do this in Kafka Streams.