Apache Kafka Cluster Setup and Usage
Continuing from the previous post, Apache Kafka Deployment and Startup, this one covers topic creation, sending and receiving messages, and unicast and multicast consumption on a single-node Kafka, plus this installment's main topic: cluster setup and usage.
1. Start ZooKeeper
[root@node-100 zookeeper]# cd zookeeper-3.4.12/
[root@node-100 zookeeper-3.4.12]# ls
bin conf dist-maven ivysettings.xml lib logs README.md recipes zookeeper-3.4.12.jar zookeeper-3.4.12.jar.md5 zookeeper.out
build.xml contrib docs ivy.xml LICENSE.txt NOTICE.txt README_packaging.txt src zookeeper-3.4.12.jar.asc zookeeper-3.4.12.jar.sha1
[root@node-100 zookeeper-3.4.12]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node-100 zookeeper-3.4.12]# bin/zkCli.sh -server 192.168.5.100:2181
Once the client has connected, check the existing znodes:
Connecting to 192.168.5.100:2181
2019-01-22 22:19:21,790 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2019-01-22 22:19:21,794 [myid:] - INFO [main:Environment@100] - Client environment:host.name=node-100
2019-01-22 22:19:21,794 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_191
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/local/java/jdk1.8.0_191/jre
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/zookeeper/zookeeper-3.4.12/bin/../build/classes:/usr/local/zookeeper/zookeeper-3.4.12/bin/../build/lib/*.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/slf4j-log4j12-1.7.25.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/slf4j-api-1.7.25.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/netty-3.10.6.Final.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/log4j-1.2.17.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../lib/audience-annotations-0.5.0.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../zookeeper-3.4.12.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../src/java/lib/*.jar:/usr/local/zookeeper/zookeeper-3.4.12/bin/../conf:.:/usr/local/java/jdk1.8.0_191/lib:/usr/local/java/jdk1.8.0_191/jre/lib:
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-327.el7.x86_64
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2019-01-22 22:19:21,796 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/usr/local/zookeeper/zookeeper-3.4.12
2019-01-22 22:19:21,797 [myid:] - INFO [main:ZooKeeper@441] - Initiating client connection, connectString=192.168.5.100:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@69d0a921
Welcome to ZooKeeper!
JLine support is enabled
2019-01-22 22:19:21,887 [myid:] - INFO [main-SendThread(192.168.5.100:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server 192.168.5.100/192.168.5.100:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-22 22:19:21,985 [myid:] - INFO [main-SendThread(192.168.5.100:2181):ClientCnxn$SendThread@878] - Socket connection established to 192.168.5.100/192.168.5.100:2181, initiating session
2019-01-22 22:19:22,015 [myid:] - INFO [main-SendThread(192.168.5.100:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server 192.168.5.100/192.168.5.100:2181, sessionid = 0x100000793a20000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 192.168.5.100:2181(CONNECTED) 0] ls /
[cluster, controller_epoch, brokers, zookeeper, admin, isr_change_notification, consumers, log_dir_event_notification, latest_producer_id_block, config]
[zk: 192.168.5.100:2181(CONNECTED) 1]
2. Start Kafka
[root@node-100 kafka]# cd kafka_2.12-2.1.0/
[root@node-100 kafka_2.12-2.1.0]# ls
bin config data libs LICENSE logs NOTICE site-docs
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-server-start.sh -daemon config/server.properties
[root@node-100 kafka_2.12-2.1.0]#
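Optionally, verify that the broker has registered itself in ZooKeeper; with broker.id=0 from config/server.properties, the following zkCli command should print [0]:
ls /brokers/ids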
3. Create a Topic
Create a topic named test with a single partition and a replication factor of 1:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Check the topic in ZooKeeper:
[zk: 192.168.5.100:2181(CONNECTED) 2] ls /brokers/
ids topics seqid
[zk: 192.168.5.100:2181(CONNECTED) 2] ls /brokers/topics
[test]
You can also list the topics currently in Kafka with the following command:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --list --zookeeper localhost:2181
test
[root@node-100 kafka_2.12-2.1.0]#
Besides creating topics manually, the broker can be configured to create a topic automatically when a producer publishes a message to a topic that does not yet exist.
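A minimal sketch of the relevant broker-side settings in config/server.properties (the values shown are the defaults; auto-created topics get num.partitions partitions and default.replication.factor replicas):
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1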
4. Send Messages
Kafka ships with a command-line producer client that can read content from a local file or directly from the command line and send it to the Kafka cluster as messages. By default, each line is treated as a separate message.
First run the producer script, then type the message content on the command line:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>test msg
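The producer can also read messages from a file via stdin redirection (a sketch; /tmp/messages.txt is a hypothetical file with one message per line):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < /tmp/messages.txt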
5. Consume Messages
For the consumer side, Kafka likewise ships with a command-line client that prints received messages to the console.
New-style command (2.0):
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --consumer-property group.id=testGroup --consumer-property client.id=consumer-1 --topic test
test msg
Legacy command (1.0):
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
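On 2.x, the equivalent replay-from-the-start command goes through --bootstrap-server, since the legacy --zookeeper option was removed from this tool in 2.0:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning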
If you run these commands in separate terminal windows, content typed into the producer terminal quickly shows up in the consumer terminal.
All of the commands above accept additional options; running a command without any arguments prints its detailed usage.
For example:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-producer.sh
Read data from standard input and publish it to Kafka.
Option Description
------ -----------
--batch-size <Integer: size> Number of messages to send in a single
batch if they are not being sent
synchronously. (default: 200)
--broker-list <String: broker-list> REQUIRED: The broker list string in
the form HOST1:PORT1,HOST2:PORT2.
--compression-codec [String: The compression codec: either 'none',
compression-codec] 'gzip', 'snappy', 'lz4', or 'zstd'.
If specified without value, then it
defaults to 'gzip'
--line-reader <String: reader_class> The class name of the class to use for
reading lines from standard in. By
default each line is read as a
separate message. (default: kafka.
tools.
ConsoleProducer$LineMessageReader)
--max-block-ms <Long: max block on The max time that the producer will
send> block for during a send request
(default: 60000)
--max-memory-bytes <Long: total memory The total memory used by the producer
in bytes> to buffer records waiting to be sent
to the server. (default: 33554432)
--max-partition-memory-bytes <Long: The buffer size allocated for a
memory in bytes per partition> partition. When records are received
which are smaller than this size the
producer will attempt to
optimistically group them together
until this size is reached.
(default: 16384)
--message-send-max-retries <Integer> Brokers can fail receiving the message
for multiple reasons, and being
unavailable transiently is just one
of them. This property specifies the
                                         number of retries before the
producer give up and drop this
message. (default: 3)
--metadata-expiry-ms <Long: metadata The period of time in milliseconds
expiration interval> after which we force a refresh of
metadata even if we haven't seen any
leadership changes. (default: 300000)
--producer-property <String: A mechanism to pass user-defined
producer_prop> properties in the form key=value to
the producer.
--producer.config <String: config file> Producer config properties file. Note
that [producer-property] takes
precedence over this config.
--property <String: prop> A mechanism to pass user-defined
properties in the form key=value to
the message reader. This allows
custom configuration for a user-
defined message reader.
--request-required-acks <String: The required acks of the producer
request required acks> requests (default: 1)
--request-timeout-ms <Integer: request The ack timeout of the producer
timeout ms> requests. Value must be non-negative
and non-zero (default: 1500)
--retry-backoff-ms <Integer> Before each retry, the producer
refreshes the metadata of relevant
topics. Since leader election takes
a bit of time, this property
specifies the amount of time that
the producer waits before refreshing
the metadata. (default: 100)
--socket-buffer-size <Integer: size> The size of the tcp RECV size.
(default: 102400)
--sync If set message send requests to the
brokers are synchronously, one at a
time as they arrive.
--timeout <Integer: timeout_ms> If set and the producer is running in
asynchronous mode, this gives the
maximum amount of time a message
will queue awaiting sufficient batch
size. The value is given in ms.
(default: 1000)
--topic <String: topic> REQUIRED: The topic id to produce
messages to.
[root@node-100 kafka_2.12-2.1.0]#
Some other useful commands:
List consumer groups:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
testGroup
[root@node-100 kafka_2.12-2.1.0]#
View a consumer group's offsets:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group testGroup
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test 0 20 20 0 consumer-1-86bc402b-40d3-4aad-99b5-ce89f2603f29 /192.168.5.100 consumer-1
[root@node-100 kafka_2.12-2.1.0]#
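The same tool can rewind a group's offsets, provided the group has no active members (a sketch; replace --execute with --dry-run to preview the change first):
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group testGroup --reset-offsets --to-earliest --topic test --execute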
Consume multiple topics:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --whitelist "test|test-2"
6. Unicast Consumption
A mode in which each message is consumed by only one consumer, similar to a queue: simply put all consumers in the same consumer group.
Run the following consumer command in two separate clients, then send messages to the topic; only one of the clients receives each message.
producer:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>test msg
>
consumer-1:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --consumer-property group.id=testGroup --topic test
test msg
consumer-2:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --consumer-property group.id=testGroup --topic test
In this setup, with one producer and two consumers sharing the same parameters (group.id=testGroup --topic test), messages sent from the producer are received by only one consumer.
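This also follows from test having only one partition: within a group, each partition is assigned to exactly one consumer, so the second group member stays idle. To keep both members busy, the partition count can be increased (a sketch; note that a topic's partition count can only ever grow):
bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic test --partitions 2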
7. Multicast Consumption
A mode in which one message can be consumed by multiple consumers, similar to publish-subscribe. Since Kafka delivers each message to only one consumer within a consumer group, multicast only requires that the consumers belong to different consumer groups. We add another consumer in group testGroup-2; now both clients receive the message.
producer:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>test 123
>
consumer-1 (group.id=testGroup):
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --consumer-property group.id=testGroup --topic test
test 123
consumer-2 (group.id=testGroup-2):
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --consumer-property group.id=testGroup-2 --topic test
test 123
8. Cluster Setup and Deployment
For Kafka, a single broker means a cluster with only one node. To grow the cluster, just start more broker instances. To make this concrete, we will now run three broker instances on one machine.
First, create configuration files for the other two brokers:
[root@node-100 kafka_2.12-2.1.0]# cp config/server.properties config/server-1.properties
[root@node-100 kafka_2.12-2.1.0]# cp config/server.properties config/server-2.properties
[root@node-100 kafka_2.12-2.1.0]#
Modify the configuration files as follows:
config/server-1.properties:
broker.id=1
port=9093
log.dirs=/usr/local/kafka/kafka_2.12-2.1.0/data/kafka-logs-1
config/server-2.properties:
broker.id=2
port=9094
log.dirs=/usr/local/kafka/kafka_2.12-2.1.0/data/kafka-logs-2
The original config/server.properties:
broker.id=0
port=9092
log.dirs=/usr/local/kafka/kafka_2.12-2.1.0/data/kafka-logs
The broker.id property must be unique within a Kafka cluster. We also have to override the port and the log directory because we are running multiple instances on the same machine; otherwise the brokers would collide on the same port and overwrite each other's data. (On Kafka 2.x, the port property is deprecated; listeners=PLAINTEXT://:9093 is the preferred way to change the port.)
We already have one ZooKeeper instance and one broker running, so now we only need to start two more broker instances:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-server-start.sh -daemon config/server-1.properties
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-server-start.sh -daemon config/server-2.properties
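If all three brokers came up cleanly, each should be registered in ZooKeeper; in zkCli, the following should print [0, 1, 2], matching the broker.id values above:
ls /brokers/ids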
Now create a new topic with a replication factor of 3:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Created topic "my-replicated-topic".
[root@node-100 kafka_2.12-2.1.0]#
We now have a cluster and a topic with a replication factor of 3, but which broker is actually serving this topic? (Since there is only one partition, only one broker is handling it at any given time.)
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
[root@node-100 kafka_2.12-2.1.0]#
Here is how to read this output: the first line is a summary of all partitions, and each subsequent line describes a single partition. Since this topic has only one partition, there is only one such line.
- Leader: the node responsible for all reads and writes of the given partition.
- Replicas: the brokers holding a copy of this partition, listed regardless of whether a node is the leader or even alive.
- Isr: the subset of replicas that are currently alive and caught up with the leader.
In our case, node 0 is the leader, i.e. the broker started with server.properties.
We can run the same command on the test topic we created earlier:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
[root@node-100 kafka_2.12-2.1.0]#
Since we created that topic with a single partition and a replication factor of 1, the output looks as shown above.
Now send some messages to the new topic:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
>test 001
>test 002
And consume them:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
test 001
test 002
Now let's test leader election. Broker 0 is currently the leader, so we kill it (2732 below is broker 0's PID taken from the ps output):
[root@node-100 kafka_2.12-2.1.0]# ps -ef | grep server.properties
[root@node-100 kafka_2.12-2.1.0]# kill -9 2732
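A sketch of doing the lookup and kill in one line (the escaped dot makes the pattern match only broker 0's config file, not server-1.properties or server-2.properties):
kill -9 $(ps -ef | grep 'server\.properties' | grep -v grep | awk '{print $2}')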
Run the describe command again:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 0,1,2 Isr: 1,2
[root@node-100 kafka_2.12-2.1.0]#
We can see that the leader has changed to broker 1, and broker 0 is no longer listed in the Isr. Note that leader election only happens among the ISR (in-sync replicas).
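By default, a replica that is not in the ISR can never be elected leader; the broker-side switch below trades durability for availability and has defaulted to false since 0.11:
unclean.leader.election.enable=false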
Start broker 0 again and check the topic:
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-server-start.sh -daemon config/server.properties
[root@node-100 kafka_2.12-2.1.0]# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 0,1,2 Isr: 1,2,0
[root@node-100 kafka_2.12-2.1.0]#
Broker 0 is back in the Isr, and broker 1 is still the leader.
Check the partition state in ZooKeeper:
[zk: 192.168.5.100:2181(CONNECTED) 11] get /brokers/topics/my-replicated-topic/partitions/0/state
{"controller_epoch":3,"leader":1,"version":1,"leader_epoch":2,"isr":[1,2]}
cZxid = 0xd3
ctime = Tue Jan 22 23:24:44 CST 2019
mZxid = 0x136
mtime = Wed Jan 23 00:08:32 CST 2019
pZxid = 0xd3
cversion = 0
dataVersion = 3
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 74
numChildren = 0
[zk: 192.168.5.100:2181(CONNECTED) 12]
This matches the topic information we saw above.
A few smaller details remain unaddressed; to be continued next time :)
If you spot any problems, corrections are welcome :)