
Real-Time Hadoop Interview Questions from Various Interviews

1. Hive – Where do you use an Internal (Managed) table? In what scenarios?


2. In your resume, what do you mean by "monitoring & managing MapReduce jobs"?
Explain.
3. Interviewer's Project: How to convert the RDBMS's nested SQL queries into the
Hadoop framework using Pig.
4. Sqoop: Need to know it very well. Some of the current projects import data
from other RDBMS sources into HDFS.
5. Can you join or transform tables/columns when importing using Sqoop?
6. Can you do the above with different RDBMSs? (question was not clear)
7. How do you transfer flat files from Unix systems?
8. What is your Pig/Hive programming level (1- 10)? (Almost all interviewers asked
this.)
9. Learn Scala! – Interviewer repeatedly told me.

Other Interview Questions:

1. Hive – Internal vs. External tables. How do you save your files in Hive?


2. Sqoop – incremental append vs. lastmodified; relate them to your project.
3. Sqoop – How to check if RDBMS Table Columns added/removed and how to
incorporate these changes into the import job.
4. What are the challenges you’ve faced in your project? Give 2 examples.
5. How do you check data integrity? (log files)
6. How to improve performance in your script (PIG)?
7. Tell me about your project work.
8. How do you use Partitioning/Bucketing in your project? (Examples from your
project)
9. Where do you look for answers? (user groups, Apache Web, stack overflow)
10. NoSQL – HBase – unstructured data storage?
11. How do you debug a Production issue? Give an example. (logs, script counters, JVM)
12. Data Ingestion
13. What is the file size you’ve used?

Dev. Environment
Production Environment

1. Does Hive support indexing? (How does this relate to Partition and Bucketing)
2. Does Pig support conditional loops?
3. Hive – What type of data is stored?
4. Recruiter: In your experience, how big is the jump from DB developer to Hadoop
without Java experience?

More Technical type Interview Questions:

1. What functions did you use in PIG?


2. Filter – What did you filter out?
3. Join – What did you join?
4. What is your cluster size?
5. What is the file size for production environment?
6. How long does it take to run your script in Production cluster?
7. Are you planning for anything to improve the performance?
8. What size of file do you use for Development?
9. What did you work on in HBase?
10. Why Hadoop, compared to RDBMS?
11. Hive – What did you do to increase the performance?
12. PIG – What did you do to increase the performance?
13. What Java UDF did you write?
14. What scenario do you think you can use Java for?
15. You can process log files in RDBMS too. Why Hadoop?
16. Hive partitioning – your project example? Why?
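For question 13 above (Pig Java UDFs), a minimal sketch may help: a Pig eval UDF is a Java class extending EvalFunc. The class name and behavior below are hypothetical, not from the source project; it simply upper-cases a chararray field.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: upper-cases its chararray argument.
public class UpperUDF extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pig convention: return null on missing or null input.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}
```

In a Pig script this would be registered and called roughly as: REGISTER myudfs.jar; then B = FOREACH A GENERATE UpperUDF(name); (jar and field names are placeholders).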

1. Hive – What file format do you use in your work? (Avro, Parquet, Sequence file)
2. Hadoop – What is the challenge or difficulty you’ve faced?
3. PIG – What is the challenge or difficulty you’ve faced?
4. Flume – What is the challenge or difficulty you’ve faced?
5. Sqoop – What is the challenge or difficulty you’ve faced? (he didn’t ask this
question)
6. How experienced are you in Linux?
7. What shell type do you use?
8. How about your experience with Cloudera Manager?
9. Do you use Impala? (I compared it with Hive and explained in more detail)
10. How do you select the ecosystem tools for your project?

InfoSys – Interview Questions:


As you can see, questions are mostly based on theory.

1. Why Hadoop? (Compare to RDBMS)


2. What would happen if NameNode failed? How do you bring it up?
3. What details are in the “fsimage” file?
4. What is SecondaryNameNode?
5. Explain the MapReduce processing framework? (start to end)
6. What is a Combiner? Where does it fit? Give an example, preferably from your
project.
7. What is a Partitioner? Why do you need it? Give an example, preferably from
your project.
8. Oozie – What are the nodes?
9. What are the actions in Action Node?
10. Explain your Pig project?
11. What log file loaders did you use in Pig?
12. Hive Joining?  What did you join?
13. Explain Partitioning & Bucketing (based on your project)?
14. Why do we need bucketing?
15. Did you write any Hive UDFs?
16. Filter – What did you filter out?
17. HBase?
18. Flume?
19. Sqoop?
20. Zookeeper?
21. Impala? Explain the use of Impala?
22. Cassandra? What do you know about Cassandra?
23. ClickStream.
24. What is your cluster size?
25. What are the DataNode configurations? (RAM, CPU cores, disk size)
26. What are the NameNode configurations? (RAM, CPU cores, disk size)
27. How many map slots & reducer slots are configured in each DataNode? (he didn't ask
this)
28. How do you copy file from cluster to cluster?
29. What commands do you use to check system health, jobs, etc.?
30. Do you use Cloudera Manager to monitor and manage the jobs, cluster, etc.?
31. What is Speculative execution?
32. What do you know about Scala? (interviewer asked about the skills that I listed in
my resume)
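For questions 6 and 7 above (Combiner and Partitioner), a word-count-style sketch shows where each plugs into a MapReduce job. All class names here are illustrative, not from the source project; the Combiner is a mini-reducer run on map output, and the custom Partitioner routes keys starting with a-m to reducer 0 and the rest to reducer 1.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Used both as the Combiner (local, map-side pre-aggregation) and the Reducer.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    // Custom Partitioner: words starting with a-m go to reducer 0, the rest to reducer 1.
    public static class AlphaPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString().toLowerCase();
            int bucket = (!s.isEmpty() && s.charAt(0) <= 'm') ? 0 : 1;
            return bucket % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);   // Combiner pre-aggregates map output
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(AlphaPartitioner.class);
        job.setNumReduceTasks(2);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The Combiner reduces the volume of data shuffled over the network; the Partitioner decides which reduce task receives each key.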

Java Interview Questions:


Given an array of the following elements: [29 12 24 18 -11 -5]
Need the output sorted in ascending order: [-11 -5 12 18 24 29]
Need the even and odd numbers of the array as separate outputs: [12 24 18] and [29 -11 -5]

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortNumbers
{
    private static int[] array = {29, 12, 24, 18, -11, -5};
    private static List<Integer> even = new ArrayList<>();
    private static List<Integer> odd = new ArrayList<>();

    // Split the array into even and odd numbers, preserving array order.
    public static void classify(int[] arr, List<Integer> even, List<Integer> odd)
    {
        for (int n : arr)
        {
            if (n % 2 == 0)
                even.add(n);
            else
                odd.add(n);
        }
    }

    // Display the even or odd numbers.
    public static void display(List<Integer> list)
    {
        for (Integer i : list)
            System.out.println(i);
    }

    public static void main(String[] args)
    {
        // Sort ascending using Collections.sort on an ArrayList copy of the array.
        List<Integer> sorted = new ArrayList<>();
        for (int n : array)
            sorted.add(n);
        Collections.sort(sorted);
        System.out.println(sorted); // [-11, -5, 12, 18, 24, 29]

        // Sort in reverse (descending) order. Note: Collections.reverseOrder()
        // only returns a Comparator; it must be passed to sort to have any effect.
        sorted.sort(Collections.reverseOrder());
        System.out.println("****** Reverse Sorted Array ******");
        System.out.println(sorted); // [29, 24, 18, 12, -5, -11]

        classify(array, even, odd);
        display(even); // 12 24 18
        display(odd);  // 29 -11 -5
    }
}
2) How do you make your class compatible with Java HashMaps?
By overriding the hashCode() and equals() methods.
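A minimal sketch of a consistent equals()/hashCode() pair (the Employee class here is hypothetical): equal objects must produce equal hash codes, or HashMap lookups will miss entries.

```java
import java.util.Objects;

public class Employee {
    private final int id;
    private final String name;

    public Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        Employee other = (Employee) o;
        // Compare the same fields that hashCode() uses.
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Equal objects must return equal hash codes.
        return Objects.hash(id, name);
    }
}
```

With both methods overridden, two logically equal Employee instances hash to the same bucket and compare equal, so they behave as one key in a HashMap.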
3) You have two tables, Employee and Dept, with the columns below. Select the maximum
salary by department.
Employee: EMPID, NAME, SAL, DEPTID
Dept: dept_id, dept_name
SELECT d.dept_name, MAX(e.SAL)
FROM Employee e, Dept d
WHERE d.dept_id = e.DEPTID
GROUP BY d.dept_name;
On 07/28/2015
1. Tell me some List implementations?
ArrayList
LinkedList
2. For what purposes do you use ArrayList and LinkedList?
ArrayList for fast random access (searching);
LinkedList for frequent insertions/deletions.
3. Between ArrayList and LinkedList, which is faster?
ArrayList is faster for indexed lookups because it is backed by an array.
LinkedList is slower for lookups because it must traverse node by node, but it is faster at adding and removing elements in the middle.
4. Tell me some Map implementations?
HashMap (no ordering)
TreeMap (keys kept sorted)
LinkedHashMap (near-HashMap performance with insertion-order iteration)
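The ordering difference between the three Map implementations can be seen directly; a small sketch (keys and values are arbitrary):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        // Insert keys in the order: banana, apple, cherry.
        String[] keys = {"banana", "apple", "cherry"};

        Map<String, Integer> hash = new HashMap<>();         // no guaranteed order
        Map<String, Integer> tree = new TreeMap<>();         // keys kept sorted
        Map<String, Integer> linked = new LinkedHashMap<>(); // insertion order
        for (int i = 0; i < keys.length; i++) {
            hash.put(keys[i], i);
            tree.put(keys[i], i);
            linked.put(keys[i], i);
        }

        System.out.println(tree.keySet());   // [apple, banana, cherry] - sorted
        System.out.println(linked.keySet()); // [banana, apple, cherry] - insertion order
    }
}
```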
5. Which of the Map implementations is fastest and why?
HashMap is fastest because it does not spend any time maintaining an ordering; TreeMap pays extra cost on every operation to keep its keys sorted.
6. What happens in the Shuffle phase in MapReduce?
The map output is split into partitions by the Partitioner, one partition per reduce task.
Each reduce task then fetches its partition files from every map task over the network
and merges them before reduce() runs.
7. What is the fundamental data structure inside a HashMap?
An array of buckets. Each key's hashCode() is used to compute the index of the bucket
where the entry is stored; keys that collide in the same bucket are chained in a linked
list (converted to a balanced tree in Java 8+ when a bucket grows large).
8. How do you use the MapReduce methods?
map() parses the input records;
reduce() aggregates the results, reading the grouped output of the map phase.
9. What are the parameters of the Mapper's map() method?
map(key, value, context)
10. What methods do you override in a Mapper class?
In the Mapper class you write:
setup()
map(key, value, context) (the return type of map() is void, but it writes its output to
the context)
cleanup()
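A generic skeleton showing how setup(), map(), and cleanup() fit together in the new (org.apache.hadoop.mapreduce) API; the tokenizing logic is illustrative, not from the source project:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void setup(Context context) {
        // Runs once per task, before any map() call: open side files, read config, etc.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One map() call may emit many key/value pairs (see question 11 below).
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Runs once per task, after the last map() call: close resources.
    }
}
```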
11. Is it possible to emit multiple key/value pairs from the Map phase?
Yes: map() can call context.write() any number of times, and two or more fields can
also be concatenated into a composite key or value.
12. Imagine you have a server-class computer with two 1 GB files on its hard disk.
Each file contains integers sorted from smaller to larger. How do you merge the files
into one file and generate output in sorted order? Tell me the logic.
Read record by record from each file and compare the current record of the first file
with the current record of the second file. If the record from the first file is smaller,
emit it and advance the cursor of the first file to its next record; then compare again
with the current record of the second file, and so on until both files are exhausted.
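The cursor-advancing logic described above can be sketched in plain Java. As a simplification, the method below merges two already-sorted integer sequences held in lists; reading from the lists stands in for reading records from the two files.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedFileMerge {
    // Merge two already-sorted sequences of integers into one sorted list.
    // Mirrors the record-by-record file merge: advance the cursor of
    // whichever input currently holds the smaller record.
    public static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i) <= b.get(j)) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        // If one input is exhausted (question 13), copy the rest as-is.
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(1, 4, 9), List.of(2, 3, 10)));
        // prints [1, 2, 3, 4, 9, 10]
    }
}
```

The same logic scales to files of any size because only the two current records need to be in memory at once.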
13. What if there are no records left in one of the files in the above scenario?
Copy the remaining records from the other file as they are, without comparing.
14. What is the execution time of the above program?
Roughly 1-2 minutes in Hadoop; the merge is a single linear pass over both files.
15. If you have two 1 TB files on two disks and must merge them into one sorted
output file, what will you do?
Apply the same merge logic inside the map() (or reduce()) method of a MapReduce job.
16. How are the records of the two files compared in the MapReduce phase?
If one of the files is small, read it into memory through the distributed cache in the
setup() method of the Mapper class.
17. What problems did you face in the Reducer phase?
Out-of-memory errors. To overcome them, increase the child JVM heap size
(mapred.child.java.opts; in newer releases, mapreduce.reduce.java.opts).
