The training can be done either internally as part of an engagement or externally as a one-off session.

Let me know what you'd like to cover.

Intro to the Hadoop stack

Hands-on intro to the Hadoop stack: HDFS, Hive, HBase, Sqoop, Flume, Kafka, Spark. Every participant gets their own local Hadoop cluster to experiment with. After the training, participants will be able to find their way around the main Hadoop technologies.
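
To give a flavour of the hands-on exercises, here is a minimal sketch of the kind of Spark job participants run against their local cluster. The HDFS path is illustrative:

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Spin up Spark against the local training cluster
        val spark = SparkSession.builder()
          .appName("WordCount")
          .master("local[*]")
          .getOrCreate()

        // Read a text file from HDFS and count word occurrences
        spark.sparkContext
          .textFile("hdfs:///data/books.txt")  // illustrative path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .take(10)
          .foreach(println)

        spark.stop()
      }
    }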

In practice, though, the training often pivots to fit the exact needs of the team: for example, covering NoSQL databases, different ingestion tools, and so on.

"Understandable technical explanation, covered every topic in BigData and many real-life Hadoop use-cases." — Inmarsat engineer  Read More »

From Java to BigData Scala Engineering

Have Java developers who are converting to Scala and Spark on BigData projects? There are three basic Scala principles they need to grasp to write better code: functional programming, case classes, and monads. In this training, we will cover all three with practical examples.
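
A minimal sketch of what the three principles look like side by side; the names are illustrative, not from a real project:

    // A case class: an immutable value type with equality and pattern matching for free
    case class User(name: String, email: Option[String])

    object Principles {
      val users = List(
        User("Ada", Some("ada@example.com")),
        User("Bob", None)
      )

      // Functional programming: transform data with pure functions instead of loops
      val names: List[String] = users.map(_.name)

      // Option is a monad: flatMap chains computations that may produce no value,
      // with no null checks in sight
      def domain(user: User): Option[String] =
        user.email.flatMap(e => e.split('@').drop(1).headOption)

      def main(args: Array[String]): Unit =
        users.flatMap(domain).foreach(println)  // prints: example.com
    }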

The training is designed to provoke the mind of an imperative Java engineer. And hopefully, an "aha!" moment will occur when the three principles fall into place.

NB: Basic prior knowledge of Scala would be great. I recommend Martin Odersky's course.

Git: From user to master

How to merge elegantly? How to collaborate effectively within a team? What is a pull request? How to contribute to an open-source repo? How to quickly remove accidentally committed IDE files?

And some Git internals: What is a commit/index/staging area? What's inside the .git directory? How to find an unreachable commit? What's Git garbage collection?
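
To give a taste, here is how two of the questions above are answered in practice. The .idea/ directory stands in for whatever files your IDE generates:

    # Stop tracking accidentally committed IDE files (keeps them on disk)
    git rm -r --cached .idea/
    echo '.idea/' >> .gitignore
    git commit -m "Stop tracking IDE files"

    # Hunt down a commit that is no longer reachable from any branch
    git reflog                 # recent positions of HEAD
    git fsck --unreachable     # objects not reachable from any ref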

This training is ideal for everyone who's been using Git for some time but never took the time to understand it properly.

NB: The Git Book is an excellent resource for learning Git without a training. However, it takes 2-4 days to work through fully. This training takes participants deep into Git in 3 hours, coupled with interactive discussions and Q&As.

Big Data from the command line

For many data-processing tasks, the standard built-in Unix command-line utilities offer the simplest and by far the fastest solution. There is no need for Hadoop, Pandas, or SQL if you have grep, sort, sed, wc, cat, and friends at your disposal on the command line.

In this training, we will cover these essential tools with hands-on exercises.
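
For example, counting the ten most frequent words in a large text file takes a single pipeline; the file name is illustrative:

    # Split into words, lowercase, count, and show the top 10
    tr -cs 'A-Za-z' '\n' < book.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn | head -10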

I based this training on the Unix course I taught at my university back in 2007, combined with my experience since then.

Don't see what you are looking for?

Check out my current tool belt and let me know what you'd like to cover. That's how some of the trainings above came about, too :)