spark oozie action jobs not showing up on spark history server

by Thomas Memenga on 2017-03-10

If you execute spark jobs within an oozie workflow using a spark action node on a Cloudera CDH5 cluster, your jobs may not show up on your spark history server. Even if you configured the history server using Cloudera Manager, it may only list jobs started on the command line using spark-submit.

adding missing configuration properties

When using mode = cluster and master = yarn-cluster or yarn-client, you need to set three properties: spark.yarn.historyServer.address to the actual address of your spark history server, spark.eventLog.dir to the fully qualified hdfs path of the applicationHistory directory, and spark.eventLog.enabled to true.

<master>yarn-cluster</master>
<mode>cluster</mode>
<name>spark-job-name</name>
<class>com.syscrest.demo.CustomSparkOozieAction</class>
<jar>${sparkJarPath}</jar>
<spark-opts>
... your job specific options ...
--conf spark.yarn.historyServer.address=http://historyservernode:18088
--conf spark.eventLog.dir=hdfs://nameservicehost:8020/user/spark/applicationHistory 
--conf spark.eventLog.enabled=true
</spark-opts> 
<arg>-t</arg>
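
For orientation, here is a minimal sketch of how the fragment above could sit inside a complete action node. The action name (spark-node), the transition targets (end, fail), the ${jobTracker} and ${nameNode} parameters and the schema version uri:oozie:spark-action:0.1 are assumptions for illustration; adjust them to your own workflow and to the oozie version on your cluster. The host names and ports are the placeholder values from the snippet above.

```xml
<!-- Sketch only: action name, transitions and ${...} parameters are hypothetical -->
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>spark-job-name</name>
        <class>com.syscrest.demo.CustomSparkOozieAction</class>
        <jar>${sparkJarPath}</jar>
        <!-- the three history server properties are passed as additional spark options -->
        <spark-opts>
            ... your job specific options ...
            --conf spark.yarn.historyServer.address=http://historyservernode:18088
            --conf spark.eventLog.dir=hdfs://nameservicehost:8020/user/spark/applicationHistory
            --conf spark.eventLog.enabled=true
        </spark-opts>
        <arg>-t</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

The properties go into spark-opts because that element is handed to spark-submit as additional options, so each --conf entry behaves as it would on the command line.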

With these additional configuration properties, your oozie-controlled spark jobs should also show up on your spark history server.
