
How to use dynamic allocation in an Oozie Spark action on CDH5
Using Spark’s dynamic allocation feature in an Oozie Spark action can be tricky.
If you execute Spark jobs within an Oozie workflow using an action node on a Cloudera CDH5 cluster, your job may not show up on your Spark history server. Even if you have configured everything through Cloudera Manager, your history server may only list jobs started from the command line using spark-submit.
We ran into a strange NoSuchMethodError while migrating an Accumulo-based application from 1.6.0 to 1.7.2 running on CDH5. A couple of code changes were necessary when moving from 1.6.0 to 1.7.2, but these were pretty straightforward (member visibility changed, some getters were introduced). Everything compiled fine, but when we executed the Spark application on the cluster we got an exception that pointed directly to a line we had changed during the migration:
Cloudera Manager is already capable of tracking usage data via Google Analytics, but that data is being sent to a Cloudera account. This blog post is about configuring Cloudera Manager and changing the tracking ID so that these usage metrics are sent to your own account.
This blog post will guide you through the process of cloning, patching, building and deploying a custom version of the Oozie workflow engine based on the CDH 5.8.0 source code that is available on GitHub.
How to inject the configuration of a remote HA HDFS into a distcp call without modifying the local cluster configuration.
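
As a rough illustration of the idea, the remote nameservice can be described entirely through `-D` properties on the distcp command line, so the local `hdfs-site.xml` stays untouched. The following is only a minimal sketch: the helper `distcp_command`, the nameservice name `remote-ns`, and the host names are hypothetical, while the `dfs.*` keys are the standard HDFS client HA properties. If the local cluster is itself HA, its nameservice presumably has to be listed in `dfs.nameservices` as well, which is what the `local_ns` parameter is for.

```python
# Sketch: build a `hadoop distcp` argument list that carries the remote
# HA nameservice definition as -D properties (no local config changes).
# All names and hosts below are hypothetical placeholders.

FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def distcp_command(remote_ns, remote_namenodes, src, dst, local_ns=()):
    """Return the distcp command as a list of arguments.

    remote_namenodes maps a NameNode id (e.g. "nn1") to its "host:port"
    RPC address. local_ns lists nameservices that are already defined
    locally and must stay resolvable.
    """
    nameservices = list(local_ns) + [remote_ns]
    props = [
        ("dfs.nameservices", ",".join(nameservices)),
        (f"dfs.ha.namenodes.{remote_ns}", ",".join(remote_namenodes)),
        (f"dfs.client.failover.proxy.provider.{remote_ns}", FAILOVER_PROVIDER),
    ]
    # One RPC address property per remote NameNode.
    for nn_id, address in remote_namenodes.items():
        props.append((f"dfs.namenode.rpc-address.{remote_ns}.{nn_id}", address))

    cmd = ["hadoop", "distcp"]
    cmd += [f"-D{key}={value}" for key, value in props]
    cmd += [src, dst]
    return cmd


example = distcp_command(
    "remote-ns",
    {"nn1": "nn1.remote.example.com:8020", "nn2": "nn2.remote.example.com:8020"},
    "hdfs://remote-ns/data/in",
    "/data/out",
)
print(" ".join(example))
```

The resulting list can be passed to `subprocess.run` or joined into a shell command; the generic `-D` options have to come before the source and target paths.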