This isn’t a comparison of Apache Spark and Dask. If you are interested in one, Dask is humble enough to include it in its official documentation. This post is about my experience of using both frameworks to build applications.


At Quartic.ai we have built multiple Apache Spark jobs that perform ingress/egress, ETL, model scoring, etc. on different kinds of data, both streaming and batch. Spark’s rich set of APIs makes your job easy if you have everything as DataFrames and RDDs spread across multiple nodes in a cluster. …


I skipped writing this post, thinking it would be too basic. But after getting the same questions from my colleagues and remembering the mistakes I made in the past, I think this post will help people understand how to process streaming data without messing up the order in which it was produced.

In one of my previous posts, I explained how to use Kafka as a data source for Spark batch jobs. In this post, I intend to highlight a few points on processing streaming data in sorted order, which is very important for many business use cases.
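To make the ordering concern concrete, here is a minimal sketch in plain Python, not the code from the post itself. It assumes each record carries a hypothetical `key` and an event timestamp `ts`, and that records can arrive out of order within a batch (Kafka only guarantees order within a single partition). The function name `order_batch` and the sample records are made up for illustration.

```python
from collections import defaultdict
from operator import itemgetter


def order_batch(records):
    """Group records by key, then sort each group by event timestamp."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["key"]].append(rec)
    # Sorting per key restores production order without comparing
    # timestamps across unrelated sources.
    return {key: sorted(recs, key=itemgetter("ts")) for key, recs in grouped.items()}


# Hypothetical batch with records arriving out of order.
batch = [
    {"key": "sensor-a", "ts": 3, "value": 30},
    {"key": "sensor-b", "ts": 1, "value": 11},
    {"key": "sensor-a", "ts": 1, "value": 10},
    {"key": "sensor-a", "ts": 2, "value": 20},
]
ordered = order_batch(batch)
```

Sorting inside each key keeps the work local to one source, which is usually what "in the order they were produced" means for a business use case.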

This post only focuses on handling…


As developers, we spend more of our time thinking about the problem than actually coding the solution, and double that time testing it and fixing bugs. In my view, pushing limits to make development faster will always end in chaos, whereas building clean code right from the start makes slow but solid progress. Clearing all queries before starting the project, finding bugs at an early phase, and reducing keystrokes by automating your routine also help make development faster.

I follow the principle of traveling light in order to go a long way. I am not using any cool…


There are multiple use cases where we can think of using Kafka alongside Spark for real-time streaming ETL, in projects like tracking web activity, monitoring servers, detecting anomalies in engine parts, and so on. The architecture involves a source producing data that is sent to a Kafka topic, and a consumer that processes the data at every predefined batch interval.
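The "predefined batch interval" idea can be illustrated with a small simulation in plain Python. This is only a sketch under stated assumptions: it uses no Kafka or Spark API, the events are hypothetical `(timestamp, payload)` pairs, and `micro_batches` is a made-up helper name.

```python
def micro_batches(events, interval):
    """Yield (window_start, payloads) for fixed-size time windows.

    events: iterable of (timestamp, payload) pairs, in any order.
    interval: window length in the same unit as the timestamps.
    """
    buckets = {}
    for ts, payload in events:
        # Align each event to the start of the window it falls into.
        start = (ts // interval) * interval
        buckets.setdefault(start, []).append(payload)
    for start in sorted(buckets):
        yield start, buckets[start]


# Hypothetical events: timestamps in seconds, 5-second batch interval.
events = [(0, "a"), (4, "b"), (5, "c"), (11, "d")]
result = list(micro_batches(events, 5))
```

Each yielded pair stands in for one batch that the consumer would process; a real streaming consumer would build these windows incrementally rather than from a complete list.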

Why batch processing from Kafka?

Any batch processing logic would need to extract the required data from the storage warehouse, and depending on the amount of data, this operation could take a lot of time. …


Spark has many tutorials, examples & Stack Overflow solutions, but most of them are in Scala. If you want to develop something in Java, you are left with what is available in Spark’s examples package and a few blog posts that use older APIs. This post aims to be an additional reference for the newer Spark API (2.1.x), focusing on importing data from CSV files into an HBase table.

I have coded this application to be generic enough to handle any CSV file schema. No transformation is done on the data; it is simply dumped into the HBase table (The table needs to…


Title sounds cool, right? But no, I actually didn’t make the whole trip on my bike (fortunately). And this wasn’t one of those leisure trips where I plan my routes, select places to visit & make arrangements for my stay. This was more of a relocation. Yes, I was moving to Goa for my job.

When I was planning my relocation, I knew for sure that I couldn’t go without my bike. I enquired with packers & movers; they charge a minimum of 8k for this. I would have gone for that option if my friend hadn’t mentioned taking the bike…


Developing a JavaScript application is a breeze. It has no strict type checks (loose typing), a vast set of libraries to choose from, no need to set up a development environment, and it can run anywhere. But things can get messy pretty soon if you are constantly developing it, adding logic day by day.

Very soon, the application grows ugly and maintaining it becomes a painful task. Here is a great tutorial on setting up a client-side JavaScript project using gulp.

JavaScript unit testing

There are multiple testing frameworks out there, and I prefer Mocha and Chai as I feel they are more expressive.

Below…


Sathish Jayaram

Technical architect @ Quartic.ai
