Strange behavior in a Kafka cluster

Posted by 小乞丐 on 2015/04/08 18:42

Hi everyone,

While working with Kafka recently, I ran into some strange behavior. Any advice would be much appreciated!

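1: After the cluster has been running for a while, a strange error appears: ZooKeeper cannot connect to one of the nodes, even though that node is still alive.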

The server.log output was attached as a screenshot:


ZooKeeper and two Kafka nodes (on ports 9092 and 9093) are running on 192.168.233.134.

The port status for 9092 and 9093 looks like the screenshot (only 9092 was posted; 9093 looks the same):

Meanwhile, the kafkaMonitor page reports the cluster as healthy:


2: Parts of the official 0.8.2 documentation do not match the Kafka client source code.

For example, the official site says:



3.3 Producer Configs

Essential configuration properties for the producer include:
  • metadata.broker.list
  • request.required.acks
  • producer.type
  • serializer.class
Each entry below gives the property name and its default value, followed by its description.

metadata.broker.list (no default)

This is for bootstrapping and the producer will only use it for getting metadata (topics, partitions and replicas). The socket connections for sending the actual data will be established based on the broker information returned in the metadata. The format is host1:port1,host2:port2, and the list can be a subset of brokers or a VIP pointing to a subset of brokers.

request.required.acks (default: 0)

This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are

  • 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails).
  • 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost).
  • -1, The producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the greatest level of durability. However, it does not completely eliminate the risk of message loss because the number of in sync replicas may, in rare cases, shrink to 1. If you want to ensure that some minimum number of replicas (typically a majority) receive a write, then you must set the topic-level min.insync.replicas setting. Please read the Replication section of the design documentation for a more in-depth discussion.
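
To make the three acknowledgement levels concrete, here is a small Java model of when a produce request would count as complete. It is a hypothetical sketch written from the description above, not Kafka source: the class AcksModel and its parameters are invented for illustration, and only the three typical values are handled.

```java
/**
 * Hypothetical model (not Kafka source) of when a produce request is
 * considered complete under request.required.acks, per the 0.8.2 docs.
 */
final class AcksModel {
    private AcksModel() {}

    /**
     * @param acks          request.required.acks (only 0, 1 and -1 are modeled)
     * @param leaderWritten whether the leader has appended the message to its log
     * @param isrAcked      in-sync replicas (leader included) that have the message
     * @param isrSize       current size of the in-sync replica set
     */
    static boolean isComplete(int acks, boolean leaderWritten, int isrAcked, int isrSize) {
        switch (acks) {
            case 0:
                // The producer never waits, so the request is done once it is sent.
                return true;
            case 1:
                // Only the leader's write is required.
                return leaderWritten;
            case -1:
                // Every replica currently in the ISR must have the data.
                return leaderWritten && isrAcked >= isrSize;
            default:
                throw new IllegalArgumentException("value not modeled here: " + acks);
        }
    }
}
```

Note that with acks=-1 and an ISR that has shrunk to just the leader, isComplete(-1, true, 1, 1) is already true, which is exactly the gap the topic-level min.insync.replicas setting is meant to close.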

request.timeout.ms (default: 10000)

The amount of time the broker will wait trying to meet the request.required.acks requirement before sending back an error to the client.

producer.type (default: sync)

This parameter specifies whether the messages are sent asynchronously in a background thread. Valid values are (1) async for asynchronous send and (2) sync for synchronous send. By setting the producer to async we allow batching together of requests (which is great for throughput) but open the possibility of a failure of the client machine dropping unsent data.

serializer.class (default: kafka.serializer.DefaultEncoder)

The serializer class for messages. The default encoder takes a byte[] and returns the same byte[].

key.serializer.class (no default)

The serializer class for keys (defaults to the same as for messages if nothing is given).

partitioner.class (default: kafka.producer.DefaultPartitioner)

The partitioner class for partitioning messages amongst sub-topics. The default partitioner is based on the hash of the key.
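
The docs only say the default partitioner is "based on the hash of the key". A minimal sketch of that idea, assuming it takes the key's hashCode(), clears the sign bit, and reduces modulo the partition count (HashPartitionerSketch is a hypothetical name, not the Kafka class):

```java
/** Hypothetical hash-based partitioner, illustrating the idea only. */
final class HashPartitionerSketch {
    private HashPartitionerSketch() {}

    static int partition(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clear the sign bit so negative hash codes still map to a valid index
        // (Math.abs(Integer.MIN_VALUE) would stay negative).
        int nonNegativeHash = key.hashCode() & 0x7fffffff;
        return nonNegativeHash % numPartitions;
    }
}
```

The same key always lands on the same partition as long as the partition count does not change, which is what keeps per-key ordering.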
compression.codec (default: none)

This parameter allows you to specify the compression codec for all data generated by this producer. Valid values are "none", "gzip" and "snappy".

compressed.topics (default: null)

This parameter allows you to set whether compression should be turned on for particular topics. If the compression codec is anything other than NoCompressionCodec, enable compression only for specified topics if any. If the list of compressed topics is empty, then enable the specified compression codec for all topics. If the compression codec is NoCompressionCodec, compression is disabled for all topics.
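
The interplay between compression.codec and compressed.topics is easy to misread, so here is the rule above written as a hypothetical helper. CompressionRule.codecFor is an invented name, and codecs are represented by their string names rather than Kafka's codec objects.

```java
import java.util.Set;

/** Hypothetical restatement of the compression.codec / compressed.topics rule. */
final class CompressionRule {
    private CompressionRule() {}

    /** Returns the codec name ("none", "gzip" or "snappy") to use for a topic. */
    static String codecFor(String topic, String compressionCodec, Set<String> compressedTopics) {
        if ("none".equals(compressionCodec)) {
            return "none";                 // compression disabled for all topics
        }
        if (compressedTopics == null || compressedTopics.isEmpty()) {
            return compressionCodec;       // empty list: compress every topic
        }
        // Non-empty list: compress only the listed topics.
        return compressedTopics.contains(topic) ? compressionCodec : "none";
    }
}
```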

message.send.max.retries (default: 3)

This property will cause the producer to automatically retry a failed send request. This property specifies the number of retries when such failures occur. Note that setting a non-zero value here can lead to duplicates in the case of network errors that cause a message to be sent but the acknowledgement to be lost.

retry.backoff.ms (default: 100)

Before each retry, the producer refreshes the metadata of relevant topics to see if a new leader has been elected. Since leader election takes a bit of time, this property specifies the amount of time that the producer waits before refreshing the metadata.
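
message.send.max.retries and retry.backoff.ms together describe a retry loop. The sketch below is a hypothetical illustration of that loop, not the producer's actual code; send and refreshMetadata stand in for the real network call and metadata lookup.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical retry loop for message.send.max.retries / retry.backoff.ms. */
final class RetrySketch {
    private RetrySketch() {}

    /**
     * @param send            one send attempt; returns true on success
     * @param refreshMetadata looks up the (possibly new) partition leader
     * @param maxRetries      message.send.max.retries
     * @param backoffMs       retry.backoff.ms
     * @return number of attempts used, or -1 if every attempt failed
     */
    static int sendWithRetries(BooleanSupplier send, Runnable refreshMetadata,
                               int maxRetries, long backoffMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                try {
                    // Give leader election time to finish before looking up metadata again.
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
                refreshMetadata.run();
            }
            if (send.getAsBoolean()) {
                return attempt + 1;
            }
        }
        return -1;
    }
}
```

If an attempt actually delivered the message but the acknowledgement was lost, the next attempt sends it again, which is where the duplicates mentioned above come from.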

topic.metadata.refresh.interval.ms (default: 600 * 1000)

The producer generally refreshes the topic metadata from brokers when there is a failure (partition missing, leader not available...). It will also poll regularly (default: every 10min so 600000ms). If you set this to a negative value, metadata will only get refreshed on failure. If you set this to zero, the metadata will get refreshed after each message sent (not recommended). Important note: the refresh happens only AFTER the message is sent, so if the producer never sends a message the metadata is never refreshed.
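
The refresh rules for topic.metadata.refresh.interval.ms boil down to a single decision taken after each send. A hypothetical helper (MetadataRefreshRule is an invented name), assuming all times are in milliseconds and that the check runs only after a message has been sent:

```java
/** Hypothetical decision rule for topic.metadata.refresh.interval.ms. */
final class MetadataRefreshRule {
    private MetadataRefreshRule() {}

    /**
     * Called right after a send.
     * @param sendFailed    whether the send hit an error (partition missing, no leader, ...)
     * @param intervalMs    topic.metadata.refresh.interval.ms
     * @param lastRefreshMs time of the previous refresh
     * @param nowMs         current time
     */
    static boolean shouldRefresh(boolean sendFailed, long intervalMs, long lastRefreshMs, long nowMs) {
        if (sendFailed) {
            return true;                            // failures always trigger a refresh
        }
        if (intervalMs < 0) {
            return false;                           // negative: refresh on failure only
        }
        if (intervalMs == 0) {
            return true;                            // zero: refresh after every message
        }
        return nowMs - lastRefreshMs >= intervalMs; // periodic, checked only after a send
    }
}
```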

queue.buffering.max.ms (default: 5000)

Maximum time to buffer data when using async mode. For example, a setting of 100 will try to batch together 100ms of messages to send at once. This will improve throughput but adds message delivery latency due to the buffering.

queue.buffering.max.messages (default: 10000)

The maximum number of unsent messages that can be queued up in the producer when using async mode before either the producer must be blocked or data must be dropped.

queue.enqueue.timeout.ms (default: -1)

The amount of time to block before dropping messages when running in async mode and the buffer has reached queue.buffering.max.messages. If set to 0 events will be enqueued immediately or dropped if the queue is full (the producer send call will never block). If set to -1 the producer will block indefinitely and never willingly drop a send.
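
The three cases of queue.enqueue.timeout.ms map naturally onto java.util.concurrent.BlockingQueue: put blocks indefinitely, offer returns immediately, and offer with a timeout waits and then gives up. A sketch under that assumption; the EnqueueSketch class is hypothetical and may differ from the Scala producer's internals.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the queue.enqueue.timeout.ms semantics. */
final class EnqueueSketch {
    private EnqueueSketch() {}

    /** @return true if the message was queued, false if it was dropped */
    static <T> boolean enqueue(BlockingQueue<T> queue, T message, long enqueueTimeoutMs) {
        try {
            if (enqueueTimeoutMs < 0) {
                // -1: block until there is room; never drop.
                queue.put(message);
                return true;
            }
            if (enqueueTimeoutMs == 0) {
                // 0: never block; drop immediately if the queue is full.
                return queue.offer(message);
            }
            // > 0: wait up to the timeout for room, then drop.
            return queue.offer(message, enqueueTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Treat an interrupt as a drop and preserve the interrupt flag.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```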

batch.num.messages (default: 200)

The number of messages to send in one batch when using async mode. The producer will wait until either this number of messages are ready to send or queue.buffering.max.ms is reached.

send.buffer.bytes (default: 100 * 1024)

Socket write buffer size.

client.id (default: "")

The client id is a user-specified string sent in each request to help trace calls. It should logically identify the application making the request.
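
In async mode, batch.num.messages and queue.buffering.max.ms combine into one flush condition: send when either enough messages are queued or enough time has passed. A hypothetical sketch (BatchRule is an invented name), assuming the time is measured from when the previous batch was sent:

```java
/** Hypothetical flush rule for the async producer's batching settings. */
final class BatchRule {
    private BatchRule() {}

    /**
     * @param queuedMessages      messages currently waiting in the queue
     * @param lastSendMs          time the previous batch was sent
     * @param nowMs               current time
     * @param batchNumMessages    batch.num.messages
     * @param queueBufferingMaxMs queue.buffering.max.ms
     */
    static boolean shouldFlush(int queuedMessages, long lastSendMs, long nowMs,
                               int batchNumMessages, long queueBufferingMaxMs) {
        if (queuedMessages == 0) {
            return false;                                             // nothing to send
        }
        boolean full = queuedMessages >= batchNumMessages;            // batch is big enough
        boolean expired = nowMs - lastSendMs >= queueBufferingMaxMs;  // waited long enough
        return full || expired;
    }
}
```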

More details about producer configuration can be found in the Scala class kafka.producer.ProducerConfig.

2.1 Producer API

As of the 0.8.2 release we encourage all new development to use the new Java producer. This client is production tested and generally both faster and more fully featured than the previous Scala client. You can use this client by adding a dependency on the client jar using the following maven co-ordinates:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.8.2.0</version>
</dependency>
Examples showing how to use the producer are given in the javadocs.



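In the producer API, many properties no longer exist or have been changed, and the default values in the source code don't match the docs at all. For example: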

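The official documentation gives the default value of serializer.class as kafka.serializer.DefaultEncoder, but in the official client 0.8.2.0 source, serializer.class is initialized to null, and there is no kafka.serializer.DefaultEncoder class anywhere in that source. Stranger still, the source throws an exception when the value is null. Here is the config-checking code (NO_DEFAULT_VALUE here is new String("")):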

/**
 * Parse and validate configs against this configuration definition. The input is a map of configs. It is expected
 * that the keys of the map are strings, but the values can either be strings or they may already be of the
 * appropriate type (int, string, etc). This will work equally well with either java.util.Properties instances or a
 * programmatically constructed map.
 * @param props The configs to parse and validate
 * @return Parsed and validated configs. The key will be the config name and the value will be the value parsed into
 *         the appropriate type (int, string, etc)
 */
public Map<String, Object> parse(Map<?, ?> props) {
    /* parse all known keys */
    Map<String, Object> values = new HashMap<String, Object>();
    for (ConfigKey key : configKeys.values()) {
        Object value;
        if (props.containsKey(key.name))
            value = parseType(key.name, props.get(key.name), key.type);
        else if (key.defaultValue == NO_DEFAULT_VALUE) // NO_DEFAULT_VALUE is new String("") here
            throw new ConfigException("Missing required configuration \"" + key.name + "\" which has no default value.");
        else
            value = key.defaultValue;
        values.put(key.name, value);
    }
    return values;
}
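
The new String("") in parse looks deliberate. It creates a unique sentinel object that is compared with ==, so a key defined without a default becomes required, while a key whose real default is "" or null is not. Below is a stripped-down, hypothetical MiniConfigDef that shows the same pattern. It throws IllegalArgumentException in place of Kafka's ConfigException and skips type parsing.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical miniature of the "required key" sentinel pattern. */
final class MiniConfigDef {
    // Distinct object compared by identity; a literal "" default is never == to it.
    static final Object NO_DEFAULT_VALUE = new String("");

    private final Map<String, Object> defaults = new LinkedHashMap<>();

    /** Define a key that the caller must supply. */
    MiniConfigDef define(String name) {
        defaults.put(name, NO_DEFAULT_VALUE);
        return this;
    }

    /** Define a key with a default, which may itself be null or "". */
    MiniConfigDef define(String name, Object defaultValue) {
        defaults.put(name, defaultValue);
        return this;
    }

    Map<String, Object> parse(Map<?, ?> props) {
        Map<String, Object> values = new HashMap<>();
        for (Map.Entry<String, Object> e : defaults.entrySet()) {
            String name = e.getKey();
            if (props.containsKey(name)) {
                values.put(name, props.get(name));
            } else if (e.getValue() == NO_DEFAULT_VALUE) {
                throw new IllegalArgumentException(
                        "Missing required configuration \"" + name + "\" which has no default value.");
            } else {
                values.put(name, e.getValue());
            }
        }
        return values;
    }
}
```

One likely source of the mismatch, worth checking against your jar: section 3.3 documents the older Scala producer (the docs themselves point to kafka.producer.ProducerConfig, and kafka.serializer.DefaultEncoder belongs to the core Kafka jar). kafka-clients 0.8.2.0, on the other hand, contains the new Java producer. As far as I know, it defines its own keys such as bootstrap.servers, acks, key.serializer and value.serializer, and the two serializer keys have no default. Supplying those keys instead of serializer.class should avoid the exception.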

Has anyone run into this? I'm sure the source version is correct, but it doesn't match what the official site says. I'm not sure how to deal with it!

Also, for problem 1 above, if anyone has run into it or has any ideas, please suggest how to handle it. Thanks!




Follow-up notes on the question:

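@小乞丐: Addendum to problem 1: it now looks like the ZooKeeper leader has lost contact with the other nodes, but those nodes are not actually dead, and the leader is alive too. netstat shows that their connections on port 2181 are still established. In this state, the producer gets a timeout exception when sending data to the topic! (2015/04/08 19:00)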
@小乞丐: After a restart, everything is back to normal... Does anyone know why this happened? (2015/04/08 19:39)
RealMatrix
Did you download that monitoring UI from the official site?

panailin
Has this been solved yet? What was the cause? I'd appreciate an answer.

猝死边缘小码工

Did you solve this problem?

刘涛12121

Ran into this problem today.

 
