Posts by Year

2022

spark streaming

8 minute read

In batch processing, jobs run at a certain frequency, but in case we require that batch to be very small (depending on the requirement, let's say we h...

spark optimizations

8 minute read

Optimizations can be made at the application code level or at the cluster level; here we look mostly at cluster-level optimizations.

spark part-II

8 minute read

Spark core works on RDDs (Spark 1 style), but there are higher-level constructs for querying and processing data easily: DataFrames and Datasets.

hive basics

6 minute read

Hive is an open-source data warehouse for processing structured data on top of Hadoop.

yarn

3 minute read

Yet Another Resource Negotiator. Let's first go through how things worked in the initial version of Hadoop and which of its limitations YARN solves.

spark part-I

11 minute read

Hadoop offers HDFS for storage, MapReduce for computation, and YARN for resource management.

scala part II

5 minute read

Scala runs on top of the JVM. Like Java, a Scala program requires a main method, or we can extend App, in which case we don't have to define a main method.

time complexity

2 minute read

A way to express the time consumed by an algorithm as a function of its input size.

scala part I

9 minute read

Spark code can be written in different languages (Scala, Python, Java, R). Scala is a hybrid language: object-oriented plus functional.

cap theorem

less than 1 minute read

It applies to distributed databases and says that we can have only two out of three guarantees.

hbase

5 minute read

rdbms properties

slowly changing dimensions

less than 1 minute read

It applies to dimension tables that change infrequently in the source RDBMS and that we want to bring into a data warehouse or HDFS.

sqoop working

1 minute read

It is used to transfer data between an RDBMS and HDFS, in either direction.

MapReduce working

2 minute read

MapReduce is a programming model for processing big datasets. It consists of two stages: map and reduce.

hdfs architecture

2 minute read

HDFS is the Hadoop Distributed File System. It is highly fault tolerant and is designed to be deployed on low-cost machines.

Build and deploy static website

1 minute read

There are different ways to host a website; for this blog we are focusing on a use case where we need a website for blogging, and we are using: je...
