YARN containers keep failing

Posted by KuroYuki on 2016/09/27 16:02

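I used Heron to submit a topology to run on a YARN cluster, but the topology does not execute normally. Judging from the log files, containers seem to keep failing and new containers keep getting allocated. Part of the logs follows: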

Sep 26, 2016 2:37:59 PM com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager createNode
INFO: Created node for path: /heron/schedulers/ExclamationTopology
Sep 26, 2016 2:37:59 PM com.twitter.heron.scheduler.SchedulerMain runScheduler
INFO: Waiting for termination...
Sep 26, 2016 2:38:00 PM com.twitter.heron.scheduler.yarn.HeronMasterDriver submitHeronExecutorTask
INFO: Submitting evaluator task for id: 1
Sep 26, 2016 2:38:00 PM com.twitter.heron.scheduler.yarn.HeronMasterDriver$HeronWorkerStartHandler onNext
INFO: Task, id:1, has started.
Sep 26, 2016 2:38:05 PM org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager onEvaluatorException
WARNING: Failed evaluator: container_1474871539042_0001_01_000002
org.apache.reef.exception.EvaluatorException: Evaluator [container_1474871539042_0001_01_000002] is assumed to be in state [RUNNING]. But the resource manager reports it to be in state [FAILED]. This means that the Evaluator failed but wasn't able to send an error message back to the driver. Task [0] was running when the Evaluator crashed.
        at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onResourceStatusMessage(EvaluatorManager.java:589)
        at org.apache.reef.runtime.common.driver.resourcemanager.ResourceStatusHandler.onNext(ResourceStatusHandler.java:63)
        at org.apache.reef.runtime.common.driver.resourcemanager.ResourceStatusHandler.onNext(ResourceStatusHandler.java:36)
        at org.apache.reef.runtime.yarn.driver.REEFEventHandlers.onResourceStatus(REEFEventHandlers.java:91)
        at org.apache.reef.runtime.yarn.driver.YarnContainerManager.onContainerStatus(YarnContainerManager.java:391)
        at org.apache.reef.runtime.yarn.driver.YarnContainerManager.onContainersCompleted(YarnContainerManager.java:128)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)

Sep 26, 2016 2:38:05 PM com.twitter.heron.scheduler.yarn.HeronMasterDriver$FailedContainerHandler onNext
WARNING: Container:container_1474871539042_0001_01_000002 failed
Sep 26, 2016 2:38:05 PM com.twitter.heron.scheduler.yarn.HeronMasterDriver$FailedContainerHandler onNext
INFO: Trying to relaunch executor 0 running on failed container container_1474871539042_0001_01_000002
Sep 26, 2016 2:38:05 PM com.twitter.heron.scheduler.yarn.HeronMasterDriver allocateContainer
INFO: Requesting container for executor, id: 0, mem: 1,024, cpu: 1
Sep 26, 2016 2:38:06 PM org.apache.reef.wake.impl.ThreadPoolStage close
WARNING: Executor did not terminate in 1000ms.
Sep 26, 2016 2:38:06 PM org.apache.reef.wake.impl.ThreadPoolStage close
WARNING: Executor dropped 0 tasks.
Sep 26, 2016 2:38:06 PM com.twitter.heron.scheduler.yarn.HeronMasterDriver$FailedContainerHandler onNext
SEVERE: Failed to relaunch failed container: 0
com.twitter.heron.scheduler.yarn.HeronMasterDriver$ContainerAllocationException: Interrupted while waiting for container
        at com.twitter.heron.scheduler.yarn.HeronMasterDriver.launchContainerForExecutor(HeronMasterDriver.java:233)
        at com.twitter.heron.scheduler.yarn.HeronMasterDriver.access$1900(HeronMasterDriver.java:74)
        at com.twitter.heron.scheduler.yarn.HeronMasterDriver$FailedContainerHandler.onNext(HeronMasterDriver.java:467)
        at com.twitter.heron.scheduler.yarn.HeronMasterDriver$FailedContainerHandler.onNext(HeronMasterDriver.java:451)
        at org.apache.reef.runtime.common.driver.evaluator.IdlenessCallbackEventHandler.onNext(IdlenessCallbackEventHandler.java:46)
        at org.apache.reef.runtime.common.utils.BroadCastEventHandler.onNext(BroadCastEventHandler.java:40)
        at org.apache.reef.util.ExceptionHandlingEventHandler.onNext(ExceptionHandlingEventHandler.java:46)
        at org.apache.reef.runtime.common.utils.DispatchingEStage$1.onNext(DispatchingEStage.java:68)
        at org.apache.reef.runtime.common.utils.DispatchingEStage$1.onNext(DispatchingEStage.java:65)
        at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:182)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at com.twitter.heron.scheduler.yarn.HeronMasterDriver.launchContainerForExecutor(HeronMasterDriver.java:231)
        ... 14 more
2016-09-27 10:24:51,545 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1474942797178_0001_000001 released container container_1474942797178_0001_01_000010 on node: host: node18-13.pdl.net:38504 #containers=0 available=<memory:8192, vCores:8> used=<memory:0, vCores:0> with event: RELEASED
2016-09-27 10:24:51,666 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1474942797178_0001_01_000008 Container Transitioned from ACQUIRED to RUNNING
2016-09-27 10:24:52,174 ERROR org.apache.hadoop.yarn.server.webapp.ContainerBlock: Failed to read the container container_1474942797178_0001_01_000002.
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
    at org.apache.hadoop.yarn.server.webapp.ContainerBlock.render(ContainerBlock.java:77)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
    at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
    at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
    at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
    at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
    at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.container(RmController.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:142)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:595)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
    at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1243)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.hadoop.yarn.exceptions.ContainerNotFoundException: Container with id 'container_1474942797178_0001_01_000002' doesn't exist in RM.
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:464)
    at org.apache.hadoop.yarn.server.webapp.ContainerBlock$1.run(ContainerBlock.java:81)
    at org.apache.hadoop.yarn.server.webapp.ContainerBlock$1.run(ContainerBlock.java:78)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    ... 58 more
2016-09-27 10:24:52,667 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1474942797178_0001_01_000011 Container Transitioned from NEW to ALLOCATED
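
To get a rough idea of how often containers are failing, the scheduler log can be scanned for the two WARNING lines shown above ("Failed evaluator: ..." and "Container:... failed"). The following is only a minimal sketch: the function name `failed_containers` is made up for illustration, and the patterns assume the log lines look exactly like the excerpt in this post.

```python
import re
from collections import Counter

# Patterns copied from the WARNING lines in the scheduler log excerpt.
# They are assumptions about the log format, not a documented Heron or REEF API.
FAILED_EVALUATOR = re.compile(
    r"WARNING: Failed evaluator: (container_\d+_\d+_\d+_\d+)")
FAILED_CONTAINER = re.compile(
    r"WARNING: Container:(container_\d+_\d+_\d+_\d+) failed")


def failed_containers(log_text):
    """Count failure reports per container id in the given log text."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in (FAILED_EVALUATOR, FAILED_CONTAINER):
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts
```

For example, reading the scheduler log file into a string and calling `failed_containers(text)` gives a per-container count; many distinct container ids within a short time span would match the fail-and-reallocate loop described above.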
