Best Practices for Stream Computing with Flink on Zeppelin

Outline:

Big Data Overview
Flink Learning Framework
Demonstration of best practices for stream computing on EMR Studio

1. Overview of Big Data

Big Data Processing ETL (Data → Data)
Big Data Analysis BI (Data → Dashboard)
Machine Learning AI (Data → Model)

2. Flink Learning Framework

Flink Essentials

Stateful
Time
Flink Architecture
Flink API
Flink Configuration
Flink Log

Stateful:

Why
Timeliness of stream computing
Unbounded stream computing

When
Window
Join
Pattern

How
State backend
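
The "why" above can be illustrated with a minimal sketch in plain Python. This is not the Flink API: `KeyedCounter`, `run`, and the sample events are hypothetical names used only for illustration. Because the input never ends, the operator cannot wait for all the data before answering. It has to remember a running result per key and emit an update as each event arrives.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


class KeyedCounter:
    """Toy stand-in for a stateful operator: one running count per key."""

    def __init__(self) -> None:
        # Assumption: in a real Flink job this map would be held by the
        # configured state backend and included in checkpoints.
        # Here it is only an in-memory dict.
        self.state: Dict[str, int] = defaultdict(int)

    def process(self, key: str) -> Tuple[str, int]:
        # Update the state for this key and emit the new running count.
        self.state[key] += 1
        return key, self.state[key]


def run(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    counter = KeyedCounter()
    for key in events:  # the input may never end, so emit one result per event
        yield counter.process(key)


results = list(run(["a", "b", "a", "a", "b"]))
```

Each output depends on every earlier event with the same key, which is why the counts must survive between events. Windows, joins, and pattern matching need the same kind of memory.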

Time

Event time
Processing time
Watermark
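
How the three terms above fit together can be shown with a simplified sketch in plain Python, not the Flink API. `WINDOW_SIZE`, `MAX_OUT_OF_ORDER`, `tumbling_counts`, and the sample events are assumptions made for illustration. Events carry their own timestamp (event time) but arrive in a different order (processing time). The watermark trails the largest timestamp seen so far by a fixed delay, and a window fires once the watermark passes its end.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SIZE = 10       # window length in event-time units (assumed value)
MAX_OUT_OF_ORDER = 3   # how far the watermark trails the max timestamp (assumed)


def tumbling_counts(
    events: Iterable[Tuple[int, str]],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, str]]]:
    """Count events per event-time tumbling window.

    Returns (fired, dropped): `fired` holds (window_start, count) pairs in
    the order the windows close; `dropped` holds events that arrived after
    their window had already closed.
    """
    open_windows: Dict[int, int] = defaultdict(int)
    fired: List[Tuple[int, int]] = []
    dropped: List[Tuple[int, str]] = []
    watermark = float("-inf")

    for ts, payload in events:  # iteration order is arrival (processing) order
        start = ts - ts % WINDOW_SIZE
        if start + WINDOW_SIZE <= watermark:
            dropped.append((ts, payload))  # too late: window already fired
            continue
        open_windows[start] += 1
        # The watermark never moves backwards.
        watermark = max(watermark, ts - MAX_OUT_OF_ORDER)
        # Fire every open window whose end the watermark has reached.
        for s in sorted(w for w in open_windows if w + WINDOW_SIZE <= watermark):
            fired.append((s, open_windows.pop(s)))

    return fired, dropped


arrivals = [(1, "a"), (4, "b"), (12, "c"), (9, "d"), (13, "e"), (2, "f"), (25, "g")]
fired, dropped = tumbling_counts(arrivals)
```

In this run the event with timestamp 9 arrives after the one with timestamp 12 but is still counted, because the watermark has not yet reached 10. The event with timestamp 2 arrives after window [0, 10) has fired and is dropped. The window starting at 20 stays open because the stream has not advanced far enough to close it.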

Flink Architecture

Flink API

Flink Configuration

Cluster Configuration
Job Configuration
State backend
Resource Manager
SQL/Python
Reference documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/
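
As a rough illustration of the categories above, the following Python sketch groups a few options from the Flink 1.13 configuration reference and renders them as `key: value` lines in the style of a Flink configuration file. Which category each key belongs to is a judgment call. Every value, including the checkpoint path, is an assumption chosen for illustration and not a recommendation. The `render` helper is hypothetical.

```python
from typing import Dict

# Keys are options from the Flink 1.13 configuration reference.
# All values are illustrative assumptions, not tuning advice.
CONFIG: Dict[str, Dict[str, str]] = {
    "Cluster": {
        "jobmanager.memory.process.size": "1600m",
        "taskmanager.memory.process.size": "1728m",
        "taskmanager.numberOfTaskSlots": "2",
    },
    "Job": {
        "parallelism.default": "2",
        "execution.checkpointing.interval": "60s",
    },
    "State backend": {
        "state.backend": "rocksdb",
        "state.checkpoints.dir": "hdfs:///flink/checkpoints",  # hypothetical path
    },
    "Resource manager": {
        "yarn.application.queue": "default",
    },
    "SQL/Python": {
        "table.exec.mini-batch.enabled": "false",
        "python.executable": "python3",
    },
}


def render(config: Dict[str, Dict[str, str]]) -> str:
    """Render grouped options as commented `key: value` lines."""
    lines = []
    for group, options in config.items():
        lines.append(f"# {group}")
        lines.extend(f"{key}: {value}" for key, value in options.items())
    return "\n".join(lines)


flink_conf = render(CONFIG)
```

Printing `flink_conf` gives one commented heading per category followed by its options, which makes it easy to see which layer a given setting belongs to.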

Flink Log

3. Best Practices for Stream Computing on EMR Studio

EMR Studio features:

Compatible with open source components

EMR Studio has been optimized and enhanced based on the open source software Apache Zeppelin, Jupyter Notebook, and Apache Airflow.

Supports connecting multiple clusters and adapting to multiple computing engines
Interactive development and seamless job scheduling
Applicable to a variety of big data application scenarios
Separation of computing and storage

Flink Clients

Flink on Zeppelin (Phase 1) - Interactive Flink Client

Flink on Zeppelin (Phase 2) - Interactive JobManager

Flink on Zeppelin Main Features

Original link: http://click.aliyun.com/m/1000286010/
