spark streaming
In batch processing, batch jobs are run at a certain frequency, but in case we require that batch to be very, very small (depending on the requirement, let's say we h...
Optimizations can be at the application code level or at the cluster level; here we are looking more at cluster-level optimizations.
Spark Core works on RDDs (Spark 1 style), but we have higher-level constructs to query and process data easily: DataFrames and Datasets.
It is an open-source data warehouse to process structured data on top of Hadoop.
Yet Another Resource Negotiator. Let's first go through how things were in the initial version of Hadoop and which limitations are solved by YARN.
module
Hadoop offers: HDFS for storage, MapReduce for computation, and YARN for resource management.
Scala runs on top of the JVM. Like Java, Scala requires a main method, or we can extend App and then we don't have to define a main method.
A way to calculate the time consumed by an algorithm as a function of its input.
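As a small illustration of "time as a function of input", the sketch below counts comparison steps instead of measuring wall-clock time. The two search functions and their names are assumptions chosen for the example; they are not taken from the post itself.

```python
def linear_search_steps(items, target):
    """Count comparisons made by a linear scan; grows in proportion to len(items)."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count comparisons made by binary search; grows roughly with log2(len(sorted_items))."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return steps


data = list(range(1024))
print(linear_search_steps(data, 1023))  # worst case: one step per element
print(binary_search_steps(data, 1023))  # far fewer steps on the same input
```

Doubling the input roughly doubles the linear count but adds only about one step to the binary count, which is the kind of growth the notation is meant to describe.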
Spark code can be written in different languages (Scala, Python, Java, R). Scala is hybrid: OOP + functional.
Creating a table which can be accessed by both Hive and HBase; this is done in cases where we require quick (low-latency) searches and faster processing of dat...
features
It is for distributed databases, and says that we can have only two out of three guarantees.
rdbms properties
It is for dimension tables, where changes are infrequent in the source RDBMS, which we want to get into the data warehouse or HDFS.
hive server / thrift server
Vectorization
Compression helps to save storage and reduce I/O cost.
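A minimal sketch of the storage saving, using Python's standard `gzip` module on a made-up, highly repetitive sample. The helper name `compression_ratio` and the sample data are assumptions for illustration; real ratios depend heavily on the data and the codec.

```python
import gzip


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    if not data:
        raise ValueError("data must not be empty")
    compressed = gzip.compress(data)
    return len(compressed) / len(data)


# Hypothetical sample: the same CSV-like row repeated many times.
sample = b"id,name,city\n" * 1000
print(f"original bytes:   {len(sample)}")
print(f"compressed bytes: {len(gzip.compress(sample))}")
print(f"ratio:            {compression_ratio(sample):.4f}")
```

Fewer bytes on disk also means fewer bytes to read and ship over the network, at the price of CPU time spent compressing and decompressing.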
There are certain parameters to consider when choosing a file format.
It is used to transfer data from an RDBMS to HDFS and vice versa.
MapReduce is a programming model for processing big datasets. It consists of two stages: map and reduce.
HDFS is the Hadoop Distributed File System. It is highly fault tolerant and is designed to be deployed on low-cost machines.
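The two stages can be sketched in plain Python with a word count. This is a single-process illustration of the model only, not Hadoop's API; the function names are hypothetical, and the explicit `shuffle` step stands in for the grouping by key that the framework performs between map and reduce.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle(pairs):
    """Group values by key, as the framework does between the two stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines)))


print(word_count(["big data", "Big jobs"]))
```

In a real cluster the map calls run in parallel on different blocks of the input and the reduce calls run in parallel on different keys; the logic per record stays the same.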
To host a website we have different ways, and for this blog we are focusing on a use case where we need to have a website for blogging, and we are using: je...