
Training: Bring your data-science/big-data team to the next level


Photo: Hands-on Docker training, 40 participants

General info

Training structure:

  1. Understanding your specific needs
  2. A theoretical part introducing the concepts
  3. Hands-on exercises where trainees can experiment with the technology
  4. An interactive session to share lessons learned from the training

Duration (and price) depends on your specific requirements; trainings range from half a day to 3 days.

Docker for Data Scientists

In recent years, Docker has become the de facto worldwide standard for packaging applications deployed to production, including many data-science use cases. Basic knowledge of Docker is thus a valuable part of any data scientist's toolbelt.

In this hands-on workshop, we will look into Docker fundamentals. After the training, you will have:

  • An understanding of the basic concepts behind Docker, incl. Dockerfile, docker-compose, and the Docker registry
  • A clear picture of the advantages Docker offers during development and deployment of data-science models
  • Hands-on experience with Docker commands
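
As a taste of the material, here is a minimal sketch of a Dockerfile that packages a hypothetical Python scoring script (the file names score.py and requirements.txt are illustrative):

    # Build on a slim Python base image
    FROM python:3.11-slim
    WORKDIR /app
    # Install dependencies first, so Docker can cache this layer
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    # Add the model-scoring script and run it by default
    COPY score.py .
    CMD ["python", "score.py"]

Building and running it is then a matter of two commands: docker build -t my-model . followed by docker run my-model.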

This is a beginner workshop, suitable for anyone taking their first steps with Docker. It is not suitable for data scientists who already use Docker and are looking for advanced material.

Intro to Hadoop stack

A hands-on introduction to the Hadoop stack: HDFS, Hive, HBase, Sqoop, Flume, Kafka, Spark. Every participant gets their own local Hadoop cluster to experiment with. After the training, participants should be able to orient themselves among the main Hadoop technologies.

In practice, the training is often adapted to the exact needs of the team, for example to cover NoSQL databases or different ingestion tools.
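
As an illustration, a first exercise on the local cluster might look like the following (the paths, file names, and table name are illustrative):

    # Copy a local CSV file into HDFS
    hdfs dfs -mkdir -p /user/train/events
    hdfs dfs -put events.csv /user/train/events/
    # Count the rows from Hive, assuming an external table
    # was already created over that HDFS directory
    hive -e 'SELECT COUNT(*) FROM events'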

"Understandable technical explanation, covered every topic in BigData and many real-life Hadoop use-cases." — Inmarsat engineer  Read More »

From Java to BigData Scala Engineering

Do you have Java developers who are converting to Scala and Spark on BigData projects? There are three basic Scala principles they need to know to write better code: functional programming, case classes, and monads. In this training, we will cover all three with practical examples.

The training challenges the imperative mindset of a Java engineer. And hopefully, a moment of "aha!" will occur during the training when the three principles fall into place.
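
To give a flavour of those "aha!" moments, here is a minimal Scala sketch touching all three principles (the names are illustrative):

    // A case class: immutable data with structural equality for free
    case class User(name: String, age: Int)

    // Option is usually the first monad a Java engineer meets:
    // it replaces null checks with composable map/flatMap calls
    def findUser(name: String): Option[User] =
      if (name == "ada") Some(User("ada", 36)) else None

    // Functional style: transform the value if present, no if/else needed
    val greeting: Option[String] = findUser("ada").map(u => s"Hello, ${u.name}!")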

NB: Basic prior knowledge of Scala is helpful; I recommend Martin Odersky's course.

Git: From a user to a master

How to merge elegantly? How to collaborate effectively within a team? What is a pull request? How to contribute to an open-source repo? How to quickly remove committed IDE files?

And some Git internals: What is a commit/index/staging area? What's inside the .git directory? How to find an unreachable commit? What's Git garbage collection?
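
To give a taste of the answers, here are two of the recipes we walk through (the .idea/ directory is just an example of committed IDE files):

    # Stop tracking IDE files that were committed by mistake,
    # while keeping them on disk
    git rm -r --cached .idea/
    echo '.idea/' >> .gitignore
    git commit -m "Stop tracking IDE files"

    # Find a commit that is no longer reachable from any branch
    git reflog             # where HEAD has recently pointed
    git fsck --lost-found  # dangling commits Git still knows about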

This training is ideal for everyone who has been using Git for some time but never took the time to understand it.

NB: The Git Book is an excellent resource for learning Git without a training. However, it takes 2-4 days to work through fully. This training takes participants deep into Git in 3 hours, coupled with interactive discussions and Q&As.

Big Data from the command line

For many data-processing tasks, the standard built-in Unix command-line utilities often offer the simplest and fastest solution. There is no need for Hadoop, Pandas, or SQL when you have grep, sort, sed, wc, cat, and friends at your disposal on the command line.

In this training, we will cover these essential tools with hands-on exercises.
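
For example, a classic one-liner finds the ten most frequent values in the second column of a CSV file (events.csv is an illustrative name):

    # cut the column, sort it, count duplicates, sort by count, take the top 10
    cut -d',' -f2 events.csv | sort | uniq -c | sort -rn | head -10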

I based this training on the Unix course I gave to students at my university back in 2007, and on my experience since then.