The basic concepts of Kafka producers, consumers, and brokers

Kafka is a publish/subscribe messaging system. It is often described as a "distributed commit log" or a "distributed streaming platform". A file system or database uses a commit log to keep a persistent record of every transaction, so the state of the system can be reconstructed by replaying that log. Similarly, Kafka persists its data in a fixed order, and the data can be read on demand.
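To illustrate the idea of rebuilding state from a commit log, here is a minimal Python sketch. It is not Kafka code: the log is a hypothetical list of `(key, value)` writes, and replaying it in order reproduces the final state.

```python
def replay(log):
    """Rebuild the current state by applying every logged write in order."""
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # in this sketch, a None value records a deletion
        else:
            state[key] = value    # a later write overrides an earlier one
    return state

# Hypothetical log of writes, oldest first.
log = [("alice", 100), ("bob", 50), ("alice", 80), ("bob", None)]
print(replay(log))  # {'alice': 80}
```

Because the log is append-only and ordered, any reader that replays it from the beginning arrives at the same state.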

1. Kafka topology

2. Characteristics of Kafka

Kafka provides high throughput for both publishing and subscribing. Reported figures put it at about 250,000 messages (50 MB) produced per second and 550,000 messages (110 MB) processed per second. These message counts are only approximate, because message sizes vary.
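The two throughput figures are consistent with each other: dividing bytes per second by messages per second gives the average message size they assume. A quick check, taking 1 MB as 10^6 bytes (an assumption; the figures above do not say which definition is used):

```python
MB = 10**6  # assumption: decimal megabytes

def avg_message_size(bytes_per_sec, messages_per_sec):
    """Average bytes per message implied by a throughput figure."""
    return bytes_per_sec / messages_per_sec

produce = avg_message_size(50 * MB, 250_000)
consume = avg_message_size(110 * MB, 550_000)
print(produce, consume)  # 200.0 200.0
```

Both figures therefore correspond to messages of roughly 200 bytes. With larger messages, the same byte rate means fewer messages per second, which is why the message counts are only approximate.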

Messages are persisted to disk and stored as logs, so they can serve both batch consumption, such as ETL, and real-time applications. Persisting data to disk, together with replication, protects against data loss.
Kafka is a distributed system that scales out easily: producers, brokers, and consumers can each run as multiple distributed instances, and machines can be added without downtime.

The state of message processing is maintained on the consumer side rather than on the server side, and consumption can be rebalanced automatically when a failure occurs.
It supports both online and offline scenarios.

3. Kafka's core concepts

Glossary
Producer: the message producer, i.e. a process that publishes messages to a Kafka topic.
Consumer: the message consumer, i.e. a process that subscribes to a topic and processes the messages published to it.
ConsumerGroup: a group of consumers that can consume messages from the partitions of a topic in parallel.
Broker: a Kafka server acting as a caching proxy; the one or more servers in a Kafka cluster are collectively referred to as brokers.
Topic: the category under which Kafka handles a particular type of message source (a feed of messages).
Partition: a physical grouping within a topic. A topic can be divided into multiple partitions; each partition is an ordered queue, and each message in a partition is assigned a sequential ID called its offset.
Message: the basic unit of communication. Each producer can publish messages to a topic.

3.1 The concept of producers

Producers generate messages and data; a process that publishes messages to a Kafka topic is called a producer.
The producer publishes each message to a specified topic and can also decide which partition the message goes to, for example in round-robin fashion or by some other algorithm.
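The two partitioning approaches can be sketched as follows. Both classes are hypothetical illustrations, not the Kafka client API: one cycles through partitions, the other hashes a message key so that the same key always lands in the same partition. CRC32 is used here only for convenience; Kafka's default partitioner hashes keys with murmur2.

```python
import itertools
import zlib

class RoundRobinPartitioner:
    """Hypothetical partitioner: assigns partitions in turn, ignoring the key."""
    def __init__(self, num_partitions):
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key=None):  # key accepted only for a uniform interface
        return next(self._next)

class KeyHashPartitioner:
    """Hypothetical partitioner: the same key always maps to the same partition."""
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def partition(self, key):
        # CRC32 for illustration only; Kafka's default uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

rr = RoundRobinPartitioner(3)
print([rr.partition() for _ in range(5)])  # [0, 1, 2, 0, 1]

kh = KeyHashPartitioner(3)
print(kh.partition("user-42") == kh.partition("user-42"))  # True
```

Round-robin spreads load evenly, while key hashing keeps all messages with the same key in one partition, and therefore in order relative to each other.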
Asynchronous and batch sending can significantly improve sending efficiency. In asynchronous mode, the Kafka producer supports batch sending: it first buffers messages in memory and then sends them out together in a single batch.
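The buffering idea can be sketched with a hypothetical `BatchingSender` class: messages accumulate in memory and go out as one request once the batch is full. This is a simplified, synchronous illustration; the real Java producer sends from a background thread and is tuned with settings such as `batch.size` and `linger.ms`.

```python
class BatchingSender:
    """Hypothetical sketch: buffer messages in memory, send them in batches."""
    def __init__(self, batch_size, transport):
        self.batch_size = batch_size
        self.transport = transport  # callable that receives one batch (a list)
        self.buffer = []

    def send(self, message):
        self.buffer.append(message)  # cache in memory; no network call yet
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.transport(list(self.buffer))  # one "request" for the whole batch
            self.buffer.clear()

requests = []
sender = BatchingSender(batch_size=3, transport=requests.append)
for i in range(7):
    sender.send(f"msg-{i}")
sender.flush()  # send the remaining partial batch
print(requests)
# [['msg-0', 'msg-1', 'msg-2'], ['msg-3', 'msg-4', 'msg-5'], ['msg-6']]
```

Seven messages cost three requests instead of seven; the trade-off is that a message may wait in the buffer before it is sent.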

3.2 The concept of brokers

In early Kafka versions (before 0.8), brokers had no replication mechanism: once a broker went down, the messages stored on it became unavailable. Later versions replicate partitions across brokers.
The broker does not store subscriber state; each subscriber keeps track of its own state.
Because the broker keeps no subscriber state, it is hard to decide when a message can be deleted (a subscriber may still need it). Kafka therefore uses a time-based SLA (service-level agreement): messages are deleted after they have been stored for a set period, 7 days by default.
Consumers can rewind to any offset and consume again. When a subscriber fails, it can select the smallest offset (ID) and re-read the messages from there.
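Because the consumer, not the broker, tracks its position, rewinding is just a matter of resetting that position. A minimal sketch with a hypothetical `PartitionReader` class over an in-memory list, where list index equals offset (the Java client exposes a comparable operation as `KafkaConsumer.seek`):

```python
class PartitionReader:
    """Hypothetical consumer of a single partition.
    The consumer, not the broker, remembers the next offset to read."""
    def __init__(self, log):
        self.log = log      # list of messages; index == offset
        self.position = 0   # next offset to read, tracked on the consumer side

    def poll(self, max_messages):
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Rewind (or skip ahead) to any offset and consume from there."""
        self.position = offset

reader = PartitionReader(["m0", "m1", "m2", "m3"])
print(reader.poll(3))  # ['m0', 'm1', 'm2']
reader.seek(1)         # rewind to offset 1
print(reader.poll(3))  # ['m1', 'm2', 'm3']
```

Since messages are not removed when read, re-consuming them costs nothing extra on the broker side.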

3.3 Message Composition

Message: the basic unit of communication. Each producer can publish messages to a topic.
Messages in Kafka are organized by topic, and different topics are independent of each other. Each topic can be divided into partitions, and each partition stores a portion of the topic's messages.
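The topic/partition layout can be sketched as a set of independent ordered lists. The `Topic` class and the topic name `orders` are hypothetical; the point is that offsets are assigned per partition, so ordering holds within a partition but not across partitions.

```python
class Topic:
    """Hypothetical in-memory topic: each partition is an independent ordered log."""
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append to one partition and return the offset assigned within it."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

orders = Topic("orders", num_partitions=2)
print(orders.append(0, "a"), orders.append(0, "b"), orders.append(1, "c"))  # 0 1 0
```

Both "a" and "c" have offset 0: an offset only identifies a message together with its partition.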
Each message in a partition has the following three attributes:
offset: long
MessageSize: int32
data: the actual content of the message
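The three attributes can be packed into bytes as a fixed header (offset as a 64-bit integer, size as a 32-bit integer) followed by the payload. This is an illustrative layout built with Python's `struct` module, with big-endian byte order chosen as an assumption; Kafka's actual on-disk format has additional fields (such as a CRC and timestamps).

```python
import struct

HEADER = struct.Struct(">qi")  # big-endian: offset (int64), MessageSize (int32)

def encode(offset, data):
    """Pack offset, size, and payload into one byte string."""
    return HEADER.pack(offset, len(data)) + data

def decode(buf):
    """Read the header back, then slice out exactly MessageSize bytes of data."""
    offset, size = HEADER.unpack_from(buf, 0)
    data = buf[HEADER.size:HEADER.size + size]
    return offset, size, data

record = encode(42, b"hello")
print(len(record), decode(record))  # 17 (42, 5, b'hello')
```

The 12-byte header (8 + 4) plus 5 bytes of data gives 17 bytes; storing the size up front lets a reader find where one message ends and the next begins.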

3.4 The concept of consumers

Consumers consume messages and data; a process that subscribes to a topic and processes the messages published to it is called a consumer. In Kafka, a consumer group can be thought of as a single "subscriber". Each partition of a topic is consumed by only one consumer within a group, but one consumer can consume messages from multiple partitions. Note: because of this design, a group should not have more consumers reading a topic at the same time than the topic has partitions; otherwise, some consumers will receive no messages.
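The partition-to-consumer rule can be sketched with a hypothetical round-robin assignment function, similar in spirit to one of Kafka's built-in assignment strategies (in a real cluster the assignment is coordinated by the group coordinator, not computed by hand like this):

```python
def assign_round_robin(partitions, consumers):
    """Hypothetical assignment: deal partitions to group members in turn.
    Each partition gets exactly one consumer within the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 2 consumers: one consumer reads two partitions.
print(assign_round_robin([0, 1, 2], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}

# 3 partitions, 4 consumers: one consumer is left with nothing.
print(assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The second case shows why adding consumers beyond the partition count does not add parallelism.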
