Entry Level Hadoop Developer and Training







Woburn, MA 


Civil Engineering




Job Type: Full time









Job Description:

We are hiring 40 entry-level Hadoop developers for our joint venture with a startup company. Hadoop is a relatively new technology and not many people know it yet, so we will conduct two months of free, intensive in-class Hadoop training starting June 21, 2014, in Boston, MA. We will provide free accommodation for out-of-state students. This intensive training will also help you prepare for a Hadoop certification.


You need to know the basics of Java to enroll in this program.


This is a very exciting opportunity: we expect senior executives from Fortune 100 companies and professors from well-known colleges to be our guest instructors. We see a great future for Hadoop developers, and we want you to be one of them. We will even sponsor your immigration status.

Below is the course content we will be teaching:


LINUX Introduction

File Handling

Text Processing

System Administration




Core Java Training



Exception Handling



INTRODUCTION TO BIG DATA-HADOOP                                                                

Big Data (What, Why, Who) - 3++Vs - Overview of Hadoop Ecosystem - Role of Hadoop in Big Data - Overview of other Big Data Systems - Who is using Hadoop - Hadoop integrations into Existing Software Products - Current Scenario in Hadoop Ecosystem - Installation - Configuration - Use Cases of Hadoop (Healthcare, Retail, Telecom)



HDFS

Concepts - Architecture - Data Flow (File Read, File Write) - Fault Tolerance - Shell Commands - Java Based API - Data Flow Archives - Coherency - Data Integrity - Role of Secondary NameNode



MAPREDUCE

Theory - Data Flow (Map - Shuffle - Reduce) - MapRed vs MapReduce APIs - Programming (Mapper, Reducer, Combiner, Partitioner) - Writables - InputFormat - OutputFormat - Streaming API using Python - Inherent Failure Handling using Speculative Execution - Magic of Shuffle Phase - File Formats - Sequence Files


ADVANCED MAPREDUCE PROGRAMMING                                                            

Counters (Built In and Custom) - CustomInputFormat - Distributed Cache - Joins (Map Side, Reduce Side) - Sorting - Performance Tuning - GenericOptionsParser - ToolRunner - Debugging (LocalJobRunner)



CLUSTER SETUP AND ADMINISTRATION

Multi Node Cluster Setup using AWS Cloud Machines - Hardware Considerations - Software Considerations - Commands (fsck, job, dfsadmin) - Schedulers in Job Tracker - Rack Awareness Policy - Balancing - NameNode Failure and Recovery - Commissioning and Decommissioning a Node - Compression Codecs



HBASE

Introduction to NoSQL - CAP Theorem - Classification of NoSQL - HBase and RDBMS - HBase and HDFS - Architecture (Read Path, Write Path, Compactions, Splits) - Installation - Configuration - Role of ZooKeeper - HBase Shell - Java Based APIs (Scan, Get, other advanced APIs) - Introduction to Filters - RowKey Design - MapReduce Integration - Performance Tuning - What's New in HBase 0.98 - Backup and Disaster Recovery - Hands On



HIVE

Architecture - Installation - Configuration - Hive vs RDBMS - Tables - DDL - DML - UDF - UDAF - Partitioning - Bucketing - MetaStore - Hive-HBase Integration - Hive Web Interface - Hive Server (JDBC, ODBC, Thrift) - File Formats (RCFile, ORCFile) - Other SQL on Hadoop



PIG

Architecture - Installation - Hive vs. Pig - Pig Latin Syntax - Data Types - Functions (Eval, Load/Store, String, DateTime) - Joins - Pig Server - Macros - UDFs - Performance - Troubleshooting - Commonly Used Functions



SQOOP

Architecture, Installation, Commands (Import, Hive Import, Eval, HBase Import, Import All Tables, Export) - Connectors to Existing DBs and DW



FLUME

Why Flume? - Architecture, Configuration (Agents), Sources (Exec, Avro, NetCat), Channels (File, Memory, JDBC, HBase), Sinks (Logger, Avro, HDFS, HBase, FileRoll), Contextual Routing (Interceptors, Channel Selectors) - Introduction to other aggregation frameworks



OOZIE

Architecture, Installation, Workflow, Coordinator, Action (MapReduce, Hive, Pig, Sqoop) - Introduction to Bundle - Mail Notifications


HADOOP 2.0                                                                                                                            

Limitations in Hadoop 1.0 - HDFS Federation - High Availability in HDFS - HDFS Snapshots - Other Improvements in HDFS2 - Introduction to YARN aka MR2 - Limitations in MR1 - Architecture of YARN - MapReduce Job Flow in YARN - Introduction to Stinger Initiative and Tez - Backward Compatibility for Hadoop 1.X



SOLR

Introduction to Information Retrieval - Common Use Cases - Introduction to Solr and Lucene - Installation - Concepts (Cores, Schema, Documents, Fields, Inverted Index) - Configuration - CRUD operation requests and responses - Java Based APIs - Introduction to SolrCloud


Cloudera / MapR certification assistance will be provided!

Please do not hesitate to contact me if you have any questions.

