
1. Given the sfpd RDD, to create a pair RDD consisting of tuples of the form
(category, 1), in Scala use?
--> val pairs = sfpd.map(x => (x(Category), 1))  // assumes Category is the index of the category field

2. repartition(5) is the same as coalesce(5, shuffle=true). State true or false.
--> True
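
A minimal sketch of the equivalence, assuming sc is an existing SparkContext:

    val rdd = sc.parallelize(1 to 100, 10)
    val a = rdd.repartition(5)               // always shuffles
    val b = rdd.coalesce(5, shuffle = true)  // same behaviour
    // in Spark's source, repartition(n) simply calls coalesce(n, shuffle = true)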

3. Which is true for running Spark on Hadoop YARN?
--> There are two deploy modes: client and cluster.

4. What is dynamic allocation?
--> Dynamic allocation is a property whereby executors can be released back to the
cluster resource pool if they are idle for a specified period of time.

5. Accumulators can be incremented and read from Spark workers? T or F?
--> FALSE (workers can only add to an accumulator; its value can be read only by the driver)
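
A minimal sketch of why, assuming sc is an existing SparkContext and the Spark 2.x accumulator API:

    val acc = sc.longAccumulator("badRecords")
    sc.parallelize(Seq(1, -2, 3)).foreach { x =>
      if (x < 0) acc.add(1)  // workers can increment...
    }
    println(acc.value)       // ...but only the driver can read the result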

6. The keys transformation returns an RDD of the keys from a key-value pair RDD?
T or F
--> TRUE

7. groupByKey is less efficient than reduceByKey?
--> TRUE (reduceByKey combines values locally before the shuffle, so less data moves across the network)
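
A sketch of the difference, assuming sc is an existing SparkContext; both give the same counts, but reduceByKey pre-aggregates within each partition before shuffling:

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 1), ("b", 1)))
    val viaGroup  = pairs.groupByKey().mapValues(_.sum)  // ships every (key, 1) pair across the network
    val viaReduce = pairs.reduceByKey(_ + _)             // combines locally first, then shuffles partial sums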

8) Which partitioner class is used to order keys according to the sort order of the
given type?
--> RangePartitioner
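
A usage sketch, assuming sc is an existing SparkContext:

    import org.apache.spark.RangePartitioner
    val pairs = sc.parallelize(Seq((5, "e"), (1, "a"), (3, "c")))
    // samples the keys to pick range boundaries, so partitions follow the keys' sort order
    val ranged = pairs.partitionBy(new RangePartitioner(2, pairs))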

9. The primary machine learning API for Spark is now the ____-based API?
--> DataFrame

10. An existing RDD unhcrRDD contains refugee data; how do you total the refugee
count per country in Scala?
--> val country = unhcrRDD.map(x => (x(0), x(3))).reduceByKey((a, b) => a + b)  // assumes x(3) is numeric

11. The number of stages in a job is usually the number of RDDs in the DAG; the
scheduler can truncate the lineage when?
--> the RDD is cached or persisted

12. Combining a set of filtered edges and filtered vertices from a graph creates
what structure?
--> a subgraph

13. What RDD function returns max, min, count, mean, and standard deviation?
--> stats()
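
A quick sketch, assuming sc is an existing SparkContext; on a numeric RDD, stats() computes everything in a single pass and returns a StatCounter:

    val nums = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    val s = nums.stats()
    println(s"count=${s.count} mean=${s.mean} stdev=${s.stdev} max=${s.max} min=${s.min}")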

14. Spark broadcast variables and setting variables in your driver program in
PySpark are the same?
--> False

15. Which of the following in Scala will give the top 10 resolutions, assuming
sfpdDF is a DataFrame registered as the table sfpd?
--> sqlContext.sql("SELECT resolution, count(incidentnum) AS inccount FROM sfpd
GROUP BY resolution ORDER BY inccount DESC LIMIT 10")
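
For comparison, the same query through the DataFrame API might look like this sketch (column names taken from the question):

    import org.apache.spark.sql.functions.{count, desc}
    sfpdDF.groupBy("resolution")
      .agg(count("incidentnum").as("inccount"))
      .orderBy(desc("inccount"))
      .limit(10)
      .show()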

16. Given the pair RDD country that contains tuples (country, count), which one gets the
country with the lowest refugee count in Scala?
Ans: val low = country.map(x => (x._2, x._1)).sortByKey().first  // ascending sort puts the smallest count first

17. Which parameters are required for windowed operations such as reduceByKeyAndWindow?
--> window length and sliding interval
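
A streaming sketch showing both parameters, assuming pairs is an existing DStream of (String, Int) tuples:

    import org.apache.spark.streaming.Seconds
    // window length: aggregate the last 30s of data; sliding interval: recompute every 10s
    val windowed = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))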

18. What are some of the things you can monitor in the Spark web UI?
--> All of the above

19. Which of the following is not a feature of Spark?
--> It is cost efficient.

20) How do you enable dynamic allocation?
Ans: spark.dynamicAllocation.enabled=true
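
A configuration sketch; the external shuffle service is typically also required so executors can be removed without losing shuffle data (an assumption about your cluster setup):

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")  // usually needed alongside dynamic allocation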

21. Which of the commands below is used to remove a broadcast variable bvar from memory?
--> bvar.unpersist()
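
A lifecycle sketch, assuming sc is an existing SparkContext:

    val bvar = sc.broadcast(Map("a" -> 1, "b" -> 2))
    sc.parallelize(Seq("a", "b")).map(k => bvar.value(k)).collect()
    bvar.unpersist()  // drops the cached copies on the executors; re-broadcast on next use
    // bvar.destroy() would remove it everywhere, after which it cannot be used again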

22. A DataFrame can be created from an existing RDD. You would create a DataFrame
from an existing RDD by inferring the schema using case classes in which case?
--> if all your users are going to need the dataset parsed in the same way
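
A sketch of inferring a schema with a case class; the Incident fields and file path here are hypothetical, and spark is assumed to be an existing SparkSession:

    case class Incident(incidentnum: String, category: String)
    import spark.implicits._
    val df = spark.sparkContext.textFile("sfpd.csv")  // hypothetical path
      .map(_.split(","))
      .map(a => Incident(a(0), a(1)))
      .toDF()  // column names and types come from the case class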

23. What is a DStream internally?
--> a continuous series of RDDs

24. The MEMORY_AND_DISK_SER storage level specifies what storage options for an RDD?
--> in memory, on disk, serialized
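
A usage sketch, assuming rdd is an existing RDD:

    import org.apache.spark.storage.StorageLevel
    // serialized objects in memory; partitions that don't fit spill to disk
    rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)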

25) Which partition sizes hinder Spark performance?
Ans: Both too small and too large

26) Which DataFrame method is used to remove a column from the resultant DataFrame?
Ans: drop()

27) What is the difference between foreach and map?
Ans: foreach is an action and map is a transformation

28) What is the difference between take(1) and first()?
Ans: take(1) returns a list with one element from the RDD; first() returns the
element itself, not wrapped in a list.
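
A tiny sketch, assuming sc is an existing SparkContext:

    sc.parallelize(Seq(10, 20)).take(1)   // Array(10)
    sc.parallelize(Seq(10, 20)).first()   // 10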

29) Caching can use disk if memory is not available. T or F?
Ans: TRUE

30) Spark SQL translates commands into code; this code is processed by?
Ans: executor nodes

31)

32) Apache Spark has APIs in?
Ans: All of the above

33) PySpark is a cluster computing framework that runs on a cluster of commodity
hardware and performs data unification. T or F.
Ans: True

34) Which function is used to call a program written in shell script/Perl from PySpark?
Ans: pipe()
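
A sketch of pipe(), with ./tolower.sh standing in for any shell or Perl script (hypothetical script name); sc is assumed to be an existing SparkContext:

    // each element is written to the script's stdin, one per line;
    // each line of the script's stdout becomes an element of the new RDD
    val piped = sc.parallelize(Seq("A", "B")).pipe("./tolower.sh")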

35) ___ leverages Spark Core's fast scheduling capability for performing streaming
analytics?
Ans: Spark Streaming

36) We can create a DataFrame using?
Ans: All of the above

37) Which DStream output operation is used to write output to the console?
Ans: pprint()

38) What is the default partitioner class used by Spark?
Ans: HashPartitioner
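
For contrast with the RangePartitioner in question 8, a sketch (pairs as any key-value RDD):

    import org.apache.spark.HashPartitioner
    // assigns a key to a partition by key.hashCode modulo the partition count
    val hashed = pairs.partitionBy(new HashPartitioner(4))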

39) Some ways of improving the performance of your Spark app include?
Ans: All of the above

40) The Dataset API was introduced in which Spark release?
Ans: Spark 1.6
