Best Practices for Stream Computing Processing with Flink on Zeppelin

Outline:

Big Data Overview
Flink Learning Framework
Demonstration of best practices for stream computing on EMR Studio

1. Overview of Big Data

Big data processing (ETL): data → data
Big data analysis (BI): data → dashboard
Machine learning (AI): data → model

2. Flink Learning Framework

Flink Essentials

Stateful
Time
Flink Architecture
Flink API
Flink Configuration
Flink Log

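Stateful:

Why: stream computing must produce timely results over unbounded streams, so intermediate results (counts, partial aggregates, buffered events) are kept as state and updated incrementally instead of being recomputed from the whole input.

When: windows, joins, and pattern matching (CEP) all need state.

How: state is stored and managed by the state backend.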
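The need for state can be shown with a minimal sketch in plain Python (not Flink's API): because the stream never ends, a per-key count cannot be computed by rereading all input, so each event updates a value held in state. `InMemoryStateBackend` and `running_count` are hypothetical names used only for illustration. In Flink this role is played by keyed state kept in a state backend, such as the heap-based backend (state on the JVM heap) or RocksDB (state on local disk), and snapshotted in checkpoints.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


class InMemoryStateBackend:
    """Hypothetical toy state backend: one integer value per key in a dict."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = defaultdict(int)

    def get(self, key: str) -> int:
        # Missing keys start at 0.
        return self._values[key]

    def put(self, key: str, value: int) -> None:
        self._values[key] = value

    def snapshot(self) -> Dict[str, int]:
        # Conceptually, a checkpoint is a consistent copy of all state.
        return dict(self._values)


def running_count(
    events: Iterable[str], backend: InMemoryStateBackend
) -> Iterator[Tuple[str, int]]:
    """Emit the updated count for a key each time an event for that key arrives.

    The previous count is read from state and updated incrementally,
    so the operator never has to wait for the end of the stream.
    """
    for key in events:
        new_count = backend.get(key) + 1
        backend.put(key, new_count)
        yield key, new_count
```

Usage:

```python
backend = InMemoryStateBackend()
print(list(running_count(["a", "b", "a"], backend)))  # [('a', 1), ('b', 1), ('a', 2)]
print(backend.snapshot())                              # {'a': 2, 'b': 1}
```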

Time

Event time
Processing time
Watermark
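How event time and watermarks interact can be sketched in plain Python, assuming tumbling windows and a bounded-out-of-orderness watermark. This follows the same idea as Flink's `WatermarkStrategy.forBoundedOutOfOrderness` but is simplified: no idle sources, no allowed lateness, and windows fire only when a new event advances the watermark. The function name is hypothetical. Windows are assigned by the timestamp carried in each event (event time), not by arrival order (processing time), so an out-of-order event still lands in its window as long as the watermark has not passed that window's end.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[str, int]                 # (key, event_time_ms)
Result = Tuple[int, int, str, int]      # (window_start, window_end, key, count)


def tumbling_event_time_counts(
    events: Iterable[Event], window_ms: int, max_out_of_orderness_ms: int
) -> Tuple[List[Result], List[Event]]:
    """Count events per key in tumbling event-time windows driven by a watermark.

    Returns (fired_results, late_events). Windows that never see the watermark
    pass their end stay pending, as they would on an unbounded stream.
    """
    pending: Dict[Tuple[int, str], int] = defaultdict(int)  # (window_start, key) -> count
    fired: List[Result] = []
    late: List[Event] = []
    max_seen: Optional[int] = None
    watermark: Optional[int] = None

    for key, ts in events:
        start = ts - ts % window_ms  # window chosen by event time, not arrival time
        if watermark is not None and start + window_ms <= watermark:
            late.append((key, ts))  # its window has already fired: treat as late
            continue
        pending[(start, key)] += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        # Assume no event arrives more than max_out_of_orderness_ms behind
        # the largest timestamp seen so far.
        watermark = max_seen - max_out_of_orderness_ms
        # Fire every pending window whose end the watermark has reached.
        for w_start, w_key in sorted(k for k in pending if k[0] + window_ms <= watermark):
            fired.append((w_start, w_start + window_ms, w_key, pending.pop((w_start, w_key))))
    return fired, late
```

Usage: `("a", 8)` arrives after `("a", 12)` but is still counted in window [0, 10), while `("a", 2)` arrives after the watermark reached 11 and is reported as late.

```python
events = [("a", 1), ("a", 4), ("b", 3), ("a", 12), ("a", 8), ("b", 16), ("a", 2)]
fired, late = tumbling_event_time_counts(events, window_ms=10, max_out_of_orderness_ms=5)
print(fired)  # [(0, 10, 'a', 3), (0, 10, 'b', 1)]
print(late)   # [('a', 2)]
```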

Flink Architecture

Flink API

Flink Configuration

Cluster Configuration
Job Configuration
State backend
Resource Manager
SQL/Python
Reference documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/
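A sketch of how the configuration layers listed above might be written as `flink-conf.yaml` entries, assuming Flink 1.13 option keys. The values and the checkpoint path are placeholders, and the helper functions `render_flink_conf` and `task_managers_needed` are hypothetical. It also shows the slot arithmetic behind resource management: with Flink's default slot sharing, a job needs as many slots as its highest operator parallelism, so the number of TaskManagers is that parallelism divided by the slots per TaskManager, rounded up.

```python
import math
from typing import Dict

# Cluster configuration: fixed when the cluster (or session) starts.
CLUSTER_CONF: Dict[str, str] = {
    "jobmanager.memory.process.size": "1600m",
    "taskmanager.memory.process.size": "4096m",
    "taskmanager.numberOfTaskSlots": "2",
}

# State backend and checkpointing (Flink 1.13 names: "hashmap" or "rocksdb").
# The checkpoint directory is a placeholder path.
STATE_CONF: Dict[str, str] = {
    "state.backend": "rocksdb",
    "state.checkpoints.dir": "hdfs:///flink/checkpoints",
    "execution.checkpointing.interval": "60s",
}

# Job configuration: can differ per job.
JOB_CONF: Dict[str, str] = {"parallelism.default": "4"}


def render_flink_conf(*sections: Dict[str, str]) -> str:
    """Merge sections (later ones override earlier ones) into 'key: value' lines."""
    merged: Dict[str, str] = {}
    for section in sections:
        merged.update(section)
    return "\n".join(f"{key}: {value}" for key, value in merged.items()) + "\n"


def task_managers_needed(job_parallelism: int, slots_per_task_manager: int) -> int:
    """TaskManagers required when all operators share slots (Flink's default)."""
    return math.ceil(job_parallelism / slots_per_task_manager)
```

Usage:

```python
print(render_flink_conf(CLUSTER_CONF, STATE_CONF, JOB_CONF))
print(task_managers_needed(4, 2))  # 2
print(task_managers_needed(5, 2))  # 3
```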

Flink Log

3. Best Practices for Stream Computing on EMR Studio

EMR Studio features:

Compatible with open source components

EMR Studio has been optimized and enhanced based on the open source software Apache Zeppelin, Jupyter Notebook, and Apache Airflow.

Connects to multiple clusters and works with multiple compute engines

Interactive development with seamless job scheduling

Applicable to a variety of big data scenarios

Separation of compute and storage

Flink Clients

Flink on Zeppelin (Phase 1) - Interactive Flink Client

Flink on Zeppelin (Phase 2) - Interactive JobManager

Flink on Zeppelin Main Features

Original link: http://click.aliyun.com/m/1000286010/
