scala part I
spark code can be written in different languages (scala, python, java, r), scala is hybrib, oops + functional.
ways to write code:
- REPL: do directly in scala terminal, to view result intractively
- IDE: eclipse, vs code etc.
why scala
- scala gives best performance as directly scala code can work in jvm, no seperate process required. However in python, a process intreacts with jvm, so an extra layer. In java, bulky code.
- whenever new release happen it is avaialble in scala first then in other languages
- spark is also written in scala
- scripting way, concise code
- functional, supports pure functions and immutable values. Natural fit for library design and data crunching thats why spark libraries are written in scala
Note: function relates input to output like maths sqrt
scala basics
val & var
val: constant, cannot be changed once declared
val a: Int = 2
var: variable, can be changed
var a: Int = 2
type inference: if data type is not mentioned then scala will infer data type
var a = 2
data types
Int | 4 bytes |
String | sequence of chars |
Boolean | true/false |
Char | 2 byte |
Double | 8 bytes |
Float | 4 bytes |
Long | 8 bytes |
Byte | 1 byte |
Note:
- after the value, have to give f in float and l in long
- concat is done using
+
can even do with mixed data types
interpolation
s interpolation (string interpolation):
val name:String = "abc"
println(s"hello $name") //output: 'hello abc'
f interpolation:
val v:Double = 3.145
println(f"value is $v%.2f") //output: value is 3.14
raw interpolation:
println("value is \n 10") //it will give new line
println(raw"value is \n 10") //it will give as is text with \n in print
Note: in a block of code last statement is the return statement
conditional
if else
if (1>2) println("hi") else println("ih")
match
it is like switch
val n = 1
n match {
case 1 => println("1")
case 2 => println("2")
case 3 => println("3")
case _ => println("in else")
}
loop
for loop
for (x <- 1 to 10) {
val i = x*2
println(i)
}
while loop
var i = 0
while (i <= 10) {
println(i)
i = i + 2
}
do while loop
var i = 0
do {
println(i)
i = i + 2
} while (i <= 10)
collections
array
- start from index 0
- array is muttable
- searching on index is fast
- adding new element is inefficient
- it is val but still values in array can be changed, if we try to assign new array then it will not allow
val a = Array(1,2,3,4)
println(a.mkString(","))
for (i <- a) println(i)
list
- holds element in singly linked list
- searching is not effecient
- adding new element at start is effecient
- lot of system defined functions are available to use in the list
val b = List(1,2,3,4)
println(b.head) //it will give 1
println(b.tail) //it will give everything other then head, List(2,3,4)
for (i <- b) println(i)
b.reverse
- adding a new element at start of list is effecient.
ex:
10 :: b //will add 10 in start of list
tuple
- can treat like record in table
- can have diff data type
- it start with index 1
- if tuple has only 2 element then it can treated as key value pair
val c = ("abc",1000,true)
println(c._1) // this will retrun the first element
range
range of values
val r = 1 to 10 // this gives including 10 we can also use until to exculde the last vaule
for (i <- r) println(i)
set
- hold only distinct vaules
- order is not maintained
val s = Set(1,1,2,2,2,3,4) // it will save only distinct values un ordered
min, max, sum like functions can be used on it
map
- collection of key value pair
- with key can search value
- hold unique key only if repeating the latest one will be used
val m = Map(1 -> "abc", 2 -> "def")
m.get(2)
Note
- only array is muttable others are immuttable in collections
- Array, List, Tuple order is maintained
pure function
if all three below properties satisfy then function is pure
- input determines output. ex: dollar to rs conversion function accepts only dollar then it has to be depndent on some conversion if not passed then its impure function.
- function does not change input vaule. ex: input to function if changed in the function then its impure
- no side effects. ex: if println used in function then impure
easy way to check if function is pure: referential transparency- if replacing the function with vaule do not impact result. ex: sqrt(4) where called can be replaced with 2
first class function
a. whatever we can do with values in traditional languages, same should be able to do with function, treat function like values.
val a: Int = 1
def sample(i: Int): Int = {
i * 2
}
val a = sample(_)
a(5) // this should return 10
b. should be able to pass function as param to function
def func(i: Int): Int = { i * 3 }
def func2(i: Int, f:Int => Int) = {
f(i)
}
c. return function from function
def func = {
x: Int => x*x
}
Note: by default all function are first class in scala
higher order func
function which either takes function as input parameter or returns another function as output
ex: map function
map: if n input rows then we get n output rows
var a = 1 to 10
def sample(i: Int): {i * 2}
a.map(sample) # this will double all the rows
anonymous function
without name function, same previous example can be done by below. mostly used with higher order func as shown in example map
var a = 1 to 10
a.map(x => x * 2)
// similar to lambda in python
Note: val is preffered over var because of immutability
loop vs recursion vs tail recursion
find factorial using loop
def factorial(input: Int): Int = {
var result: Int = 1
var i: Int = 1
for(i <- 1 to input)
{
result = result * i
}
return result
}
//here we are mutating result and i, in scala val are preffered over var
find factorial using recursion
def factorial(i: Int): Int = {
if (i == 1) 1
else i * factorial(i - 1)
}
//this solves problem for mutating but it takes memory to capture all entries until terminating condition reached. for large calculation it might go for out of memory
find factorial using tail recursion
def factorial(i: Int, result: Int): Int = {
if (i == 1) result
else factorial(i - 1, result*i)
}
//in this only the last statment is hold in memory as required data is only in last statement.
statement vs expression
- each line of code is statement
- expression is line of code that returns something
- in scala we only have expression no statements, so every line of code returns something
closure
- in functional programming a function can return a function
- in oops we can return an object
object has data elements like variables and functions, in functional we have only functions. so how to get data elements, thats where we use closure, in function it can have local variable which can be used thats called closure.
Type system
operator
there are no operators in scala only methods
1 + 2
//here + is method, it is short of a.+(b)
//can be done as a + b
//we can see all methods available by a.tab
placeholder syntax
val a = 1 to 100
a.map((x:Int) => x * 2)
//here always we have one input param then we can remove it and replace with placeholder syntax
a.map(_ * 2)
partially applied functions
this is an act of creating brand new functins by fixing one or more parameters in a function
def func1(x: Double, y: Double) = {x/y}
func1(10,3)
val inverse = func1(1, _: Double)
inverse(10)
def sum(x: Int, y: Int, f: Int => Int) = {
f(x) + f(y)
}
sum(2,3,x=>x*x)
sum(2,3,x=>x*x*x)
val sum_of_squares = sum(_:Int, _:Int, x=>x*x)
sum_of_sqaues(2,2)
function currying
syntactic sugar, similar to partially applied function
def sum(f:Int => Int)(x:Int, y:Int) = {
f(x) + f(y)
}
val sum_of_sqaues = sum(x=>x*x)_
sum_of_sqaues(2,2)
class
class contains data + function
class Person //empty class
val i = new Person //instantiate class
println(i) //reference of class will be printed
class Person(name: String) //constructor
val i = new Person("Abc") //instantiate class
println(i.name) //it will error as "name" is parameter not member
//to resolve we need to define class as below
class Person(val name: String)
val i = new Person("abc")
println(i.name)
class Person(val name: String, number: Int) {
val i = 1
def method1 = number * 2
def method2(s: Int) = s * 5
}
val p = new Person("abc", 2)
println(p.name) //output: abc
println(p.method1) //output: 4
println(p.method2(3)) //output: 15
class level functionality
for a class we instantiate an instance but if we require some static value to be used by all instances then we can use object to do that. In other languages usually we call object of class is instansiated but here in scala object is for static values and instance is created for class.
object Person {
val fix_value = 1
def is_fix: Boolean = true
}
Note:
- having object and class with same name, this is companion design pattern
- singleton design pattern is used using object
inheritance
- child class inherits propetry of parent class
- child class can inherit only one parent, mutiple inheritence is not possible
class Abc {
def method1 = println("in method1")
}
class Def extends Abc {
def method2 = println("in method2")
}
val v = new Def
v.method1
v.method2
access modifiers
- private: child class cannot access member of public class
- protected: child class can call member of parent class inside child class only not in instance instantiation of child class
- no modifier (public)
class Abc {
private def method1 = println("in method1")
}
abstract class
- can contain unimplemented methods and undefined values
- implement later by inheritance in child class
- instance cannot be created for abstract class
- some methods can have defination also
abstract class Abc {
def method1
}
trait
- similar to abstract class, inheritence can be done only on one abstract class, but traits can be used to do multiple inheritence
- traits cannot have constructor parameter
trait Ghi {
def method2
}
trait Jkl {
def method3
}
class Mno extends Abc with Ghi with Jkl {
def method1 = println("in method1")
def method2 = println("in method2")
def method3 = println("in method3")
}
case class
special class where need to write less code
case class Abc(name: String, number: Int)
val i = new Abc("a", 1)
println(i.name) //note here val is not required in parameter to make it memeber
println(i.toString) //with normal class it prints reference but here we get Abc(a,1)
println(i) //even without toString it works fine
val j = new Abc("a", 1)
println(i == j) //in normal class it check reference which give false but here it will return true
//when case class created then a companion object is already created
val k = i.copy() //copy method
val k = i.copy(number=5) //data can also be modified in copy menthod
//case classes are serializable
Sources
- https://docs.scala-lang.org/tour/unified-types.html
Comments