Recent Posts

spark streaming

8 minute read

In batch processing, batch jobs run at a certain frequency, but in case we require that batch to be very small (depending on the requirement, let's say we h...

spark optimizations

8 minute read

Optimizations can be made at the application-code level or at the cluster level; here we look mostly at cluster-level optimizations.

spark part-II

8 minute read

Spark core works on RDDs (Spark 1 style), but there are higher-level constructs for querying and processing data easily: DataFrames and Datasets.

hive basics

6 minute read

Hive is an open-source data warehouse for processing structured data on top of Hadoop.

yarn

3 minute read

Yet Another Resource Negotiator. Let's first go through how things worked in the initial version of Hadoop and which of its limitations YARN solves.