My Spark application produces the expected result, yet its Executor gets killed, and there is no ERROR in the logs. Why?

Posted by fshjq on 2017/03/03 22:11

It is a very simple Spark application; the code is as follows:

import org.apache.spark.{SparkConf, SparkContext}

// Enclosing object name taken from SparkHbase.scala, which appears in the logs below.
object SparkHbase {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("spark-hbase")
      .setMaster("spark://ubuntu:7077")
    val sc = new SparkContext(conf)
    val data = Array(1, 2, 3, 4, 5, 6)
    val dataRDD = sc.parallelize(data)
    dataRDD.foreach(println) // executes on the executors, not on the driver
    sc.stop()
  }
}
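Note that dataRDD.foreach(println) executes on the executors, so the printed numbers end up in each executor's stdout under its work directory, not in the spark-submit terminal. A minimal sketch, using the standard collect() API, if you want the output on the driver instead:

// collect() ships all partitions back to the driver: fine for six
// integers, dangerous for large RDDs. The println output then appears
// in the spark-submit terminal instead of the executor logs.
dataRDD.collect().foreach(println)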

After submitting with spark-submit, I captured both the Driver and Executor logs; neither contains an ERROR.
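Incidentally, because setMaster is hard-coded in the code above, it overrides any --master flag given to spark-submit (settings made directly on SparkConf take highest precedence). A sketch of the more conventional setup, leaving the master URL to spark-submit:

// No setMaster here: the master comes from spark-submit,
// e.g. --master spark://ubuntu:7077, so the same jar also runs
// with --master local[*] for debugging.
val conf = new SparkConf().setAppName("spark-hbase")
val sc = new SparkContext(conf)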

The environment, roughly:

Spark 1.6.2

Standalone mode, running on a virtual machine

Driver log:

17/03/03 05:35:49 INFO spark.SparkContext: Running Spark version 1.6.2
17/03/03 05:35:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/03 05:35:49 INFO spark.SecurityManager: Changing view acls to: root
17/03/03 05:35:49 INFO spark.SecurityManager: Changing modify acls to: root
17/03/03 05:35:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/03/03 05:35:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 37440.
17/03/03 05:35:50 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/03/03 05:35:50 INFO Remoting: Starting remoting
17/03/03 05:35:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.40.128:34383]
17/03/03 05:35:50 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 34383.
17/03/03 05:35:50 INFO spark.SparkEnv: Registering MapOutputTracker
17/03/03 05:35:50 INFO spark.SparkEnv: Registering BlockManagerMaster
17/03/03 05:35:50 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-7a94aff1-dcdb-424c-9b2f-719bd23636aa
17/03/03 05:35:50 INFO storage.MemoryStore: MemoryStore started with capacity 1247.3 MB
17/03/03 05:35:50 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/03/03 05:35:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/03/03 05:35:50 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/03/03 05:35:50 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/03/03 05:35:50 INFO ui.SparkUI: Started SparkUI at http://192.168.40.128:4040
17/03/03 05:35:50 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-01a0ee35-9cfa-4174-bc5f-c895bd14a1dd/httpd-947091f6-6c1f-4f24-9e12-7175cd77c180
17/03/03 05:35:50 INFO spark.HttpServer: Starting HTTP Server
17/03/03 05:35:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/03/03 05:35:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46824
17/03/03 05:35:50 INFO util.Utils: Successfully started service 'HTTP file server' on port 46824.
17/03/03 05:35:50 INFO spark.SparkContext: Added JAR file:/home/hjq/sparkrecommersystem.jar at http://192.168.40.128:46824/jars/sparkrecommersystem.jar with timestamp 1488548150867
17/03/03 05:35:51 INFO spark.SparkContext: Added JAR file:/home/hjq/sparkrecommersystem.jar at http://192.168.40.128:46824/jars/sparkrecommersystem.jar with timestamp 1488548151065
17/03/03 05:35:51 INFO client.AppClient$ClientEndpoint: Connecting to master spark://ubuntu:7077...
17/03/03 05:35:51 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170303053551-0005
17/03/03 05:35:51 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40961.
17/03/03 05:35:51 INFO netty.NettyBlockTransferService: Server created on 40961
17/03/03 05:35:51 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/03/03 05:35:51 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.40.128:40961 with 1247.3 MB RAM, BlockManagerId(driver, 192.168.40.128, 40961)
17/03/03 05:35:51 INFO client.AppClient$ClientEndpoint: Executor added: app-20170303053551-0005/0 on worker-20170303052228-192.168.40.128-35546 (192.168.40.128:35546) with 2 cores
17/03/03 05:35:51 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20170303053551-0005/0 on hostPort 192.168.40.128:35546 with 2 cores, 1024.0 MB RAM
17/03/03 05:35:51 INFO storage.BlockManagerMaster: Registered BlockManager
17/03/03 05:35:51 INFO client.AppClient$ClientEndpoint: Executor updated: app-20170303053551-0005/0 is now RUNNING
17/03/03 05:35:51 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/03/03 05:35:52 INFO spark.SparkContext: Starting job: foreach at SparkHbase.scala:25
17/03/03 05:35:52 INFO scheduler.DAGScheduler: Got job 0 (foreach at SparkHbase.scala:25) with 2 output partitions
17/03/03 05:35:52 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (foreach at SparkHbase.scala:25)
17/03/03 05:35:52 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/03/03 05:35:52 INFO scheduler.DAGScheduler: Missing parents: List()
17/03/03 05:35:52 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at SparkHbase.scala:24), which has no missing parents
17/03/03 05:35:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1184.0 B, free 1184.0 B)
17/03/03 05:35:52 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 861.0 B, free 2045.0 B)
17/03/03 05:35:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.40.128:40961 (size: 861.0 B, free: 1247.2 MB)
17/03/03 05:35:52 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/03/03 05:35:52 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at SparkHbase.scala:24)
17/03/03 05:35:52 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
17/03/03 05:35:54 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ubuntu:55164) with ID 0
17/03/03 05:35:54 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ubuntu, partition 0,PROCESS_LOCAL, 2096 bytes)
17/03/03 05:35:54 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ubuntu, partition 1,PROCESS_LOCAL, 2096 bytes)
17/03/03 05:35:54 INFO storage.BlockManagerMasterEndpoint: Registering block manager ubuntu:46207 with 511.1 MB RAM, BlockManagerId(0, ubuntu, 46207)
17/03/03 05:35:56 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ubuntu:46207 (size: 861.0 B, free: 511.1 MB)
17/03/03 05:35:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2575 ms on ubuntu (1/2)
17/03/03 05:35:56 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2533 ms on ubuntu (2/2)
17/03/03 05:35:56 INFO scheduler.DAGScheduler: ResultStage 0 (foreach at SparkHbase.scala:25) finished in 4.141 s
17/03/03 05:35:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/03/03 05:35:56 INFO scheduler.DAGScheduler: Job 0 finished: foreach at SparkHbase.scala:25, took 4.553689 s
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/03/03 05:35:56 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/03/03 05:35:56 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.40.128:4040
17/03/03 05:35:56 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
17/03/03 05:35:56 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
17/03/03 05:35:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/03 05:35:56 INFO storage.MemoryStore: MemoryStore cleared
17/03/03 05:35:56 INFO storage.BlockManager: BlockManager stopped
17/03/03 05:35:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/03/03 05:35:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/03 05:35:56 INFO spark.SparkContext: Successfully stopped SparkContext
17/03/03 05:35:56 INFO util.ShutdownHookManager: Shutdown hook called
17/03/03 05:35:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-01a0ee35-9cfa-4174-bc5f-c895bd14a1dd/httpd-947091f6-6c1f-4f24-9e12-7175cd77c180
17/03/03 05:35:56 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/03/03 05:35:56 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/03/03 05:35:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-01a0ee35-9cfa-4174-bc5f-c895bd14a1dd
 

Executor log:

17/03/03 05:35:51 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/03/03 05:35:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/03 05:35:52 INFO spark.SecurityManager: Changing view acls to: root
17/03/03 05:35:52 INFO spark.SecurityManager: Changing modify acls to: root
17/03/03 05:35:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/03/03 05:35:53 INFO spark.SecurityManager: Changing view acls to: root
17/03/03 05:35:53 INFO spark.SecurityManager: Changing modify acls to: root
17/03/03 05:35:53 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/03/03 05:35:53 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/03/03 05:35:53 INFO Remoting: Starting remoting
17/03/03 05:35:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@192.168.40.128:34785]
17/03/03 05:35:53 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 34785.
17/03/03 05:35:53 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-7b29151b-4e5d-47b4-a2b5-5047e913749e/executor-5d04b0ba-c6e6-407e-8aaf-2f52d83937e4/blockmgr-b2012106-4705-4866-9c94-d621f7781755
17/03/03 05:35:53 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
17/03/03 05:35:54 INFO worker.WorkerWatcher: Connecting to worker spark://Worker@192.168.40.128:35546
17/03/03 05:35:54 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@192.168.40.128:37440
17/03/03 05:35:54 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/03/03 05:35:54 INFO executor.Executor: Starting executor ID 0 on host ubuntu
17/03/03 05:35:54 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46207.
17/03/03 05:35:54 INFO netty.NettyBlockTransferService: Server created on 46207
17/03/03 05:35:54 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/03/03 05:35:54 INFO storage.BlockManagerMaster: Registered BlockManager
17/03/03 05:35:54 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
17/03/03 05:35:54 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
17/03/03 05:35:54 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/03/03 05:35:54 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
17/03/03 05:35:54 INFO executor.Executor: Fetching http://192.168.40.128:46824/jars/sparkrecommersystem.jar with timestamp 1488548151065
17/03/03 05:35:54 INFO util.Utils: Fetching http://192.168.40.128:46824/jars/sparkrecommersystem.jar to /tmp/spark-7b29151b-4e5d-47b4-a2b5-5047e913749e/executor-5d04b0ba-c6e6-407e-8aaf-2f52d83937e4/spark-3a334ed2-170e-4d8e-8ffb-d06011b3c635/fetchFileTemp2978882907884150213.tmp
17/03/03 05:35:54 INFO util.Utils: Copying /tmp/spark-7b29151b-4e5d-47b4-a2b5-5047e913749e/executor-5d04b0ba-c6e6-407e-8aaf-2f52d83937e4/spark-3a334ed2-170e-4d8e-8ffb-d06011b3c635/-11128882151488548151065_cache to /software/spark-1.6.2-bin-hadoop2.6/work/app-20170303053551-0005/0/./sparkrecommersystem.jar
17/03/03 05:35:55 INFO executor.Executor: Adding file:/software/spark-1.6.2-bin-hadoop2.6/work/app-20170303053551-0005/0/./sparkrecommersystem.jar to class loader
17/03/03 05:35:56 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
17/03/03 05:35:56 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 861.0 B, free 861.0 B)
17/03/03 05:35:56 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 245 ms
17/03/03 05:35:56 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1184.0 B, free 2045.0 B)
17/03/03 05:35:56 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 912 bytes result sent to driver
17/03/03 05:35:56 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 912 bytes result sent to driver
17/03/03 05:35:56 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown

-------------------------------------------------------------------------------------------

I tried running it a few more times; sometimes the Executor also produces the following log:

17/03/03 05:53:52 WARN executor.CoarseGrainedExecutorBackend: An unknown (ubuntu:38598) driver disconnected.
17/03/03 05:53:52 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.40.128:38598 disassociated! Shutting down.

Hoping some expert can explain this...

GestureWei:

You didn't post the Driver's error output.

GestureWei:
The error you posted clearly means the driver exited abnormally; only then would the executor receive this.
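If the driver JVM dies before sc.stop() runs, the executors log exactly this kind of "driver disassociated" error. A minimal defensive sketch along the lines of that diagnosis (the try/finally wrapper is a suggestion, not something from the original post):

val sc = new SparkContext(conf)
try {
  sc.parallelize(Array(1, 2, 3, 4, 5, 6)).foreach(println)
} finally {
  // Always stop the context, even if the job throws, so executors
  // see a clean "Driver commanded a shutdown" rather than a
  // "driver disassociated" error.
  sc.stop()
}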
fshjq:
That's all I saw in the terminal.
wcx945:

Hi, have you solved this problem since?
