kafak主节点cpu 内存持续飙高,不回收,最后服务挂掉问题?

rui1988221 发布于 2018/08/23 11:30
阅读 573
收藏 0

kafak主节点cpu 内存持续飙高,不回收,最后服务挂掉问题?

[2018-08-23 07:56:11,788] INFO [GroupCoordinator 0]: Stabilized group consumer.web.log generation 268 (__consumer_offsets-24) (kafka.coordinator.group.GroupCoordinator)
[2018-08-23 07:56:12,054] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
[2018-08-23 07:56:13,846] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
[2018-08-23 07:56:15,673] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
  at java.nio.Bits.reserveMemory(Bits.java:694)
  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
  at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
  at sun.nio.ch.IOUtil.read(IOUtil.java:195)
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
  at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
  at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:145)
  at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
  at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
  at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
  at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
  at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
  at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
  at kafka.network.Processor.poll(SocketServer.scala:628)
  at kafka.network.Processor.run(SocketServer.scala:545)
  at java.lang.Thread.run(Thread.java:748)
[2018-08-23 07:56:16,379] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
java    1772 liandong   83u  IPv4 7990697      0t0  TCP *:36145 (LISTEN)
java    1772 liandong   84u  IPv4 7990698      0t0  TCP *:9099 (LISTEN)
java    1772 liandong   85u  IPv4 7990701      0t0  TCP *:40745 (LISTEN)
java    1772 liandong  100u  IPv4 7990709      0t0  TCP prod_data_kafka_2:44688->prod_data_zk:eforward (ESTABLISHED)
java    1772 liandong  193u  IPv4 7989816      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc (LISTEN)
java    1772 liandong  224u  IPv4 8011001      0t0  TCP prod_data_kafka_2:9099->172.18.58.184:45220 (ESTABLISHED)
java    1772 liandong  228u  IPv4 7990858      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51326 (ESTABLISHED)
java    1772 liandong  229u  IPv4 7990859      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51334 (ESTABLISHED)
java    1772 liandong  230u  IPv4 8013244      0t0  TCP prod_data_kafka_2:36145->172.18.58.184:44414 (ESTABLISHED)
java    1772 liandong  235u  IPv4 7989829      0t0  TCP prod_data_kafka_2:32976->prod_data_kafka_1:XmlIpcRegSvc (ESTABLISHED)
java    1772 liandong  236u  IPv4 8014128      0t0  TCP prod_data_kafka_2:36145->172.18.58.184:44484 (ESTABLISHED)
java    1772 liandong  237u  IPv4 8013361      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60202 (ESTABLISHED)
java    1772 liandong  243u  IPv4 7998548      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_3:39816 (ESTABLISHED)
java    1772 liandong  247u  IPv4 7998555      0t0  TCP prod_data_kafka_2:33206->prod_data_kafka_3:XmlIpcRegSvc (ESTABLISHED)
java    1772 liandong  251u  IPv4 7999481      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_1:48914 (ESTABLISHED)
java    1772 liandong  252u  IPv4 8007519      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:58936 (CLOSE_WAIT)
java    1772 liandong  253u  IPv4 8012365      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_2:33230 (CLOSE_WAIT)
java    1772 liandong  255u  IPv4 8009660      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:59356 (ESTABLISHED)
java    1772 liandong  258u  IPv4 8010614      0t0  TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:59936 (CLOSE_WAIT)
 1772 liandong  20   0 6398984 2.127g  16112 S 101.0 57.5  69:43.72 java                                                                                            
2018-08-22T19:01:45.014+0800: 31537.291: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0147456 secs]
   [Parallel Time: 12.9 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 31537291.3, Avg: 31537291.3, Max: 31537291.3, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.8, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 3.8]
      [Update RS (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.0, Sum: 3.9]
         [Processed Buffers: Min: 14, Avg: 15.5, Max: 17, Diff: 3, Sum: 31]
      [Scan RS (ms): Min: 4.5, Avg: 4.5, Max: 4.5, Diff: 0.0, Sum: 9.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 4.1, Avg: 4.2, Max: 4.2, Diff: 0.1, Sum: 8.3]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 4, Avg: 4.0, Max: 4, Diff: 0, Sum: 8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 25.1]
      [GC Worker End (ms): Min: 31537303.8, Avg: 31537303.8, Max: 31537303.8, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 1.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.6 ms]
   [Eden: 781.0M(781.0M)->0.0B(781.0M) Survivors: 3072.0K->3072.0K Heap: 2106.0M(2347.0M)->1325.2M(2347.0M)]
 [Times: user=0.03 sys=0.00, real=0.02 secs] 
2018-08-22T19:01:45.029+0800: 31537.306: [GC concurrent-root-region-scan-start]
2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-root-region-scan-end, 0.0098860 secs]
2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-mark-start]
2018-08-22T19:01:45.111+0800: 31537.388: [GC concurrent-mark-end, 0.0721221 secs]
2018-08-22T19:01:45.111+0800: 31537.388: [GC remark 2018-08-22T19:01:45.111+0800: 31537.388: [Finalize Marking, 0.0002506 secs] 2018-08-22T19:01:45.111+0800: 31537.388: [GC ref-proc, 0.0008536 secs] 2018-08-22T19:01:45.112+0800: 31537.389: [Unloading, 0.0159521 secs], 0.0264459 secs]
 [Times: user=0.05 sys=0.00, real=0.03 secs] 
2018-08-22T19:01:45.139+0800: 31537.415: [GC cleanup 1339M->1339M(2347M), 0.0026152 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs] 
2018-08-22T19:01:48.222+0800: 31540.499: [GC pause (G1 Evacuation Pause) (young), 0.0141944 secs]
   [Parallel Time: 12.6 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 31540499.4, Avg: 31540499.4, Max: 31540499.4, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 2.8]
      [Update RS (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.1, Sum: 4.6]
         [Processed Buffers: Min: 11, Avg: 17.0, Max: 23, Diff: 12, Sum: 34]
      [Scan RS (ms): Min: 4.4, Avg: 4.5, Max: 4.5, Diff: 0.1, Sum: 8.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 8.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 5]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 24.9]
      [GC Worker End (ms): Min: 31540511.9, Avg: 31540511.9, Max: 31540511.9, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 1.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.6 ms]
   [Eden: 781.0M(781.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2106.2M(2347.0M)->1325.2M(2347.0M)]
 [Times: user=0.02 sys=0.00, real=0.01 secs] 
2018-08-22T19:01:51.373+0800: 31543.650: [GC pause (G1 Evacuation Pause) (young), 0.0146431 secs]
   [Parallel Time: 13.1 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 31543649.9, Avg: 31543649.9, Max: 31543649.9, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.5, Diff: 0.1, Sum: 2.8]
      [Update RS (ms): Min: 2.4, Avg: 2.4, Max: 2.5, Diff: 0.1, Sum: 4.8]
         [Processed Buffers: Min: 8, Avg: 17.5, Max: 27, Diff: 19, Sum: 35]
      [Scan RS (ms): Min: 4.5, Avg: 4.6, Max: 4.7, Diff: 0.2, Sum: 9.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 4.4, Avg: 4.4, Max: 4.5, Diff: 0.1, Sum: 8.9]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 13.0, Avg: 13.0, Max: 13.0, Diff: 0.0, Sum: 25.9]
      [GC Worker End (ms): Min: 31543662.8, Avg: 31543662.8, Max: 31543662.9, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 1.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.6 ms]
   [Eden: 780.0M(780.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2105.2M(2347.0M)->1325.3M(2347.0M)]
 [Times: user=0.02 sys=0.00, real=0.02 secs] 

XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT) 有很多这个样的端口关闭等待,这是应用连接端。为什么一直等待呢?内存也没有回收 去3台服务是 2核 4G,这前给的是kafka3G,发现内存没了,报堆外内存溢出,然后我就限制kfaka最大为2G,但主节点跑一段时间后,又报堆内存溢出和堆外内存溢出,我free -m看了一下,内存确实还有100MB了,不知内存用在那里,kafka 这个进程暂用完了 80%的内存,cpu 100%多了,这是什么原因呢? 配置参数和官网核对了一下,全用默认的也不行,

加载中
0
东君
东君

Kafka本身内存使用一般,早期0.9版本的Kafka有堆外内存泄露的BUG,不知道你的版本是哪个

0
sxgkwei
sxgkwei

更新到最新版。

0
r
rui1988221

在我从kafka 1.1.0升级到kafka 2.12_2.0.0之后,堆中的内存被回收了,但可能是堆外的内存仍然在飙升,而不是下降,而是缓慢上升。原因是什么?请告诉我谢谢

free -m
total used free shared buff/cache available
Mem: 3790 3013 167 0 609 545


top - 17:21:15 up 21 days, 7:20, 2 users, load average: 0.02, 0.03, 0.05
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3881688 total, 172536 free, 3085836 used, 623316 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 557800 avail Mem 

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
16696 liandong 20 0 7317808 2.706g 13304 S 0.0 73.1 4:06.75 java 
19754 liandong 20 0 154584 2276 960 S 0.0 0.1 0:00.01 sshd 
19730 liandong 20 0 154580 2268 952 S 0.0 0.1 0:00.00 sshd 
20700 liandong 20 0 161812 2172 1528 R 0.0 0.1 0:00.00 top 
19755 liandong 20 0 115396 2100 1644 S 0.0 0.1 0:00.01 bash 
19731 liandong 20 0 115396 2044 1608 S 0.0 0.1 0:00.00 bash



 

[liandong@prod_data_kafka_1 ~]$ jmap -heap 16696
Attaching to process ID 16696, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.151-b12

using thread-local object allocation.
Garbage-First (G1) GC with 2 thread(s)

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 2147483648 (2048.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 1287651328 (1228.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
regions = 2048
capacity = 2147483648 (2048.0MB)
used = 290261024 (276.8144836425781MB)
free = 1857222624 (1771.1855163574219MB)
13.51633220911026% used
G1 Young Generation:
Eden Space:
regions = 88
capacity = 105906176 (101.0MB)
used = 92274688 (88.0MB)
free = 13631488 (13.0MB)
87.12871287128714% used
Survivor Space:
regions = 7
capacity = 7340032 (7.0MB)
used = 7340032 (7.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 184
capacity = 2034237440 (1940.0MB)
used = 190646304 (181.81448364257812MB)
free = 1843591136 (1758.1855163574219MB)
9.371880600132894% used

11241 interned Strings occupying 1275336 bytes.

 

0
r
rui1988221
top - 19:03:35 up 21 days,  9:02,  2 users,  load average: 0.02, 0.09, 0.08
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3881688 total,   138404 free,  3089908 used,   653376 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   553880 avail Mem 

 NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                                                             
  0 7317808 2.710g  13304 S   0.7 73.2   4:53.54 /usr/local/jdk1.8/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Xloggc:/data/kafka/bin/../lo+

 

返回顶部
顶部