Entries by Thomas Memenga

GDELT on SCDF: Implementing a reactive source application

In the second part of our blog post series “Processing GDELT data with SCDF on Kubernetes”, we will create a custom source application based on Spring Cloud Stream to pull GDELT data and use it in a very simple flow. You can find the source code on GitHub (git clone https://github.com/syscrest/gdelt-on-spring-cloud-data-flow, then cd gdelt-article-feed-source). The project […]

How to use dynamic allocation in an Oozie Spark action on CDH5

Using Spark’s dynamic allocation feature in an Oozie Spark action can be tricky. First, make sure that dynamic allocation is actually available on your cluster: navigate to your “Spark” service, then “Configuration”, and search for “dynamic”. Both settings (shuffle service and dynamic allocation) need to be enabled. If you just omit --num-executors […]

Fixing Spark classpath issues on CDH5 when accessing Accumulo 1.7.2

We experienced some strange NoSuchMethodErrors while migrating an Accumulo-based application from 1.6.0 to 1.7.2 running on CDH5. A couple of code changes were necessary when moving from 1.6.0 to 1.7.2, but these were pretty straightforward (member visibility changed, some getters were introduced). Everything compiled fine, but when we executed the Spark application on the cluster […]