Background
Our offline scheduling system currently runs on Azkaban + Spark 2.4. This morning a large number of jobs suddenly started failing, and the execution logs showed the following exception:
22-10-2024 10:28:51 CST task_name INFO - org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /user/azkaban/spark2.4/history is exceeded: limit=1048576 items=1048576
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2248)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2336)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2304)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:2087)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addFile(FSDirectory.java:390)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:3015)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2890)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2774)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:610)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:117)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2272)

    at org.apache.hadoop.ipc.Client.call(Client.java:1470)
    at org.apache.hadoop.ipc.Client.call(Client.java:1401)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy14.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1721)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1657)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1582)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:120)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:308)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:157)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
As the exception shows, the directory has hit the HDFS per-directory item limit configured in Hadoop (dfs.namenode.fs-limits.max-directory-items, which defaults to 1048576). Each Spark application writes an event log file into /user/azkaban/spark2.4/history (the trace shows EventLoggingListener.start failing to create one), so the directory simply filled up over time.
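To put a number on it, you can ask the NameNode for a content summary of the directory instead of listing it. This is just a quick sanity check, assuming the account you run it as can read the path:

hdfs dfs -count /user/azkaban/spark2.4/history
# Prints: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
# The event-log directory is flat (one file per Spark application), so
# FILE_COUNT is effectively the number of children compared against
# dfs.namenode.fs-limits.max-directory-items.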
Solution
Change the configuration and restart the NameNode and DataNodes. The hdfs-site.xml change is as follows:
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>3200000</value>
  <description>Defines the maximum number of items that a directory may contain. Cannot set the property to a value less than 1 or more than 6400000.</description>
</property>
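After the restart it is worth double-checking that the new value is what the tools actually see. One simple check is the command below; note that it only reads the hdfs-site.xml visible on the machine you run it from, so the NameNode restart is still what makes the limit take effect:

hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items
# Expected output after the change: 3200000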
Side Note
Before that, I had planned to delete some files so the jobs could run normally right away, so I ran the following on the gateway host:
[exakit@10.10.10.10 ~]$ hdfs dfs -rm -r /user/azkaban/spark2.4/history/*
24/10/22 10:31:23 INFO retry.RetryInvocationHandler: Exception while invoking getListing of class ClientNamenodeProtocolTranslatorPB over namenode.com/10.10.10.3:8020. Trying to fail over immediately.
java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:597)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
In the end, once the configuration change and the NameNode restart were done, re-running the failed jobs in Azkaban worked normally again.
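For reference, the OutOfMemoryError above came from the hdfs client JVM itself: to expand the /* glob, the hdfs shell has to list the million-plus children on the client side, and the default client heap cannot hold that listing. If you do still want to prune the directory, one workaround we did not end up needing (the heap size here is only illustrative) is to give the client a bigger heap and delete the directory itself instead of its expanded children:

# Larger heap for this one hdfs CLI invocation
export HADOOP_CLIENT_OPTS="-Xmx8g"
# Delete the directory itself so the client never has to expand the wildcard
hdfs dfs -rm -r -skipTrash /user/azkaban/spark2.4/history
# Spark expects the event-log directory to exist, so recreate it (and restore
# its ownership/permissions) before re-running any jobs
hdfs dfs -mkdir -p /user/azkaban/spark2.4/history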