

Java Parallel Computation on Hadoop

Master data-driven Java programming with Hadoop in this hands-on course, covering HDFS, MapReduce, and cluster setup for efficient parallel computation.

4.9

315,475 ratings
11 Lessons
229 Students
Last updated 4 months ago

By Ivan Ng

By Frahaan Hussain

via Udemy


Brief Summary

This course is all about diving into Hadoop, a powerful tool for processing big data efficiently. You'll learn how to set up and use Hadoop, understand its components like HDFS and MapReduce, and get your hands on some real code examples. It’s fun, trust me!

Key Points

Learn about Apache Hadoop and its key components.
Understand how HDFS and MapReduce work together.
Gain insight into setting up Hadoop clusters in different modes.
Explore hands-on examples with real code.
Discover how major companies utilize Hadoop for data processing.

Learning Outcomes

Understand the basics of parallel computation and limitations before Hadoop.
Set up and run a Hadoop cluster in pseudo-distributed and distributed modes.
Implement real-world examples like data sorting and word co-occurrence.
Gain the ability to analyze large datasets using Hadoop tools.
Explore the practical applications of Hadoop used by big companies.
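
As a rough illustration of the word co-occurrence example named in the outcomes above, the following sketch counts how often two words share a line. It uses only the JDK, not the Hadoop API. The class name `CoOccurrenceSketch`, the method `countPairs`, and the sample lines are assumptions made for this illustration and are not taken from the course's source code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-JDK illustration only; it does not use the Hadoop API.
final class CoOccurrenceSketch {

    // "Map" step: emit one "left,right" key for every pair of distinct
    // words on the same line. Words are sorted first so that (a,b) and
    // (b,a) produce the same key.
    // "Reduce" step: sum the emitted 1s per key (done here with merge).
    static Map<String, Integer> countPairs(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] words = Arrays.stream(line.toLowerCase().split("\\s+"))
                    .filter(w -> !w.isEmpty())
                    .distinct()
                    .sorted()
                    .toArray(String[]::new);
            for (int i = 0; i < words.length; i++) {
                for (int j = i + 1; j < words.length; j++) {
                    counts.merge(words[i] + "," + words[j], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop stores data", "hadoop processes data");
        countPairs(lines).forEach((pair, n) -> System.out.println(pair + "\t" + n));
    }
}
```

In a real MapReduce job the nested loop would typically sit in the mapper and the summing in the reducer; both are folded into one method here to keep the sketch short.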

About This Course

Learn to write real, working data-driven Java programs that can run in parallel on multiple machines by using Hadoop.

Build your essential knowledge with this hands-on, introductory course on Java parallel computation using the popular Hadoop framework:

- Getting Started with Hadoop

- HDFS working mechanism

- MapReduce working mechanism

- An anatomy of the Hadoop cluster

- Hadoop VM in pseudo-distributed mode

- Hadoop VM in distributed mode

- Detailed examples of using MapReduce
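
To give a feel for the MapReduce working mechanism listed above, here is a minimal in-memory sketch of the three phases (map, shuffle, reduce) applied to word counting. It is an imitation in plain Java and does not use Hadoop's classes. The names `WordCountSketch`, `map`, `shuffle`, `reduce`, and `run` are assumptions made for this illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the three MapReduce phases; not the Hadoop API.
final class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all values that share the same key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single total.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence on a single machine.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> totals.put(word, reduce(ones)));
        return totals;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(run(Arrays.asList("to be or not", "to be")));
    }
}
```

On a cluster, the map calls would run in parallel on different machines and the framework would carry out the shuffle across the network; this sketch only shows the data flow.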

Learn the Widely-Used Hadoop Framework

Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and should therefore be handled automatically in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
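
One way to picture how storage can tolerate a failed machine is block replication: a file is cut into fixed-size blocks and each block is kept on several nodes. The sketch below models that idea in plain Java. The round-robin placement is a deliberately simple, hypothetical policy and is not how HDFS actually chooses nodes. The class name `BlockReplicationSketch`, its methods, and the sizes used in `main` are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of block replication; not HDFS's real placement policy.
final class BlockReplicationSketch {

    // Number of fixed-size blocks needed to hold a file (ceiling division).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Assign each block to `replicas` distinct nodes, round-robin.
    // Hypothetical policy, chosen only because it is easy to follow.
    static List<List<Integer>> place(long blocks, int nodes, int replicas) {
        if (replicas > nodes) {
            throw new IllegalArgumentException("need at least as many nodes as replicas");
        }
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<Integer> holders = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                holders.add((int) ((b + r) % nodes));
            }
            placement.add(holders);
        }
        return placement;
    }

    // Data survives a node failure if every block has a copy elsewhere.
    static boolean survivesLossOf(int failedNode, List<List<Integer>> placement) {
        for (List<Integer> holders : placement) {
            boolean hasOtherCopy = false;
            for (int node : holders) {
                if (node != failedNode) {
                    hasOtherCopy = true;
                }
            }
            if (!hasOtherCopy) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blocks = blockCount(300 * mb, 128 * mb); // 300 MB in 128 MB blocks -> 3
        List<List<Integer>> placement = place(blocks, 4, 3);
        System.out.println(blocks + " blocks -> " + placement);
        System.out.println("survives node 0 failing: " + survivesLossOf(0, placement));
    }
}
```

With three replicas on four nodes, losing any single node still leaves at least two copies of every block; with a single replica, the same failure would lose data.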

Who is using Hadoop for data-driven applications?

You may be surprised to learn that many companies have already adopted Hadoop. Companies such as Alibaba, eBay, Facebook, LinkedIn, and Yahoo! use this proven technology to harvest their data, discover insights, and power their applications!

Contents and Overview

As a software developer, you may have found that your program takes too long to run against a large amount of data. If you are looking for a way to scale out your data processing, this course is designed for you. It builds your knowledge and use of the Hadoop framework through modules covering the following:

- Background about parallel computation

- Limitations of parallel computation before Hadoop

- Problems solved by Hadoop

- Core projects under Hadoop - HDFS and MapReduce

- How HDFS works

- How MapReduce works

- How a cluster works

- How to leverage the VM for Hadoop learning and testing

- How the starter program works

- How the data sorting works

- How the pattern searching works

- How the word co-occurrence works

- How the inverted index works

- How the data aggregation works

- All the examples come with full source code and explanations
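
For readers unfamiliar with the inverted index named in the list above, the following plain-Java sketch shows the idea: for every word, record which documents contain it. It does not use the Hadoop API and is not the course's source code. The class name `InvertedIndexSketch`, the method `build`, and the document ids `d1` and `d2` are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-JDK illustration only; it does not use the Hadoop API.
final class InvertedIndexSketch {

    // docs maps a document id to its text. The result maps each word
    // to the sorted set of document ids in which the word appears.
    static Map<String, TreeSet<String>> build(Map<String, String> docs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        docs.forEach((docId, text) -> {
            // "Map" step: emit (word, docId) for every word in the text.
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    // "Reduce" step: collect the ids that share a word.
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
                }
            }
        });
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("d1", "Hadoop stores data");
        docs.put("d2", "Hadoop sorts data");
        // prints {data=[d1, d2], hadoop=[d1, d2], sorts=[d2], stores=[d1]}
        System.out.println(build(docs));
    }
}
```

Looking up a word in the resulting map returns the documents to search, which is why this structure is a common building block for text search.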

Come and join us! With this structured course, you can learn this prevalent technology for handling Big Data.

Know the essential concepts about Hadoop
Know how to set up a Hadoop cluster in pseudo-distributed mode
Know how to set up a Hadoop cluster in distributed mode (3 physical nodes)

Course Curriculum

Overview

1 Lecture

Welcome!

Background knowledge about Hadoop

3 Lectures

Existing Technical Limitations

Requirements for the new approach

Hadoop solving the limitations

The Hadoop Ecosystem

3 Lectures

Overview of HDFS

Overview of MapReduce

Overview of Hadoop clusters

Get Ready in pseudo-distributed mode

10 Lectures

Cloudera VM

Demonstration: Using the VM

Shared Folders between your host OS and VM

Tips about Shared Folders

Accessing HDFS

Running MapReduce

Demonstration: Accessing HDFS

Demonstration: Running MapReduce

Demonstration: Web Console for HDFS

Demonstration: Web Console for MapReduce

Get Ready in distributed mode

5 Lectures

About the Environment

Setup the Master node - Exercise Manual

Setup the Slave node - Exercise Manual

Start the Master node - Exercise Manual

Start the Slave node - Exercise Manual

Large-scale Word Counting

3 Lectures

The Problem and Design

Demonstration: Develop and Run the program

Word Counting - Source Code

Large-scale Data Sorting

3 Lectures

The Problem and Design

Demonstration: Develop and Run the program

Data Sorting - Source Code

Large-scale Pattern Searching

3 Lectures

The Problem and Design

Demonstration: Develop and Run the program

Pattern Searching - Source Code

Large-scale Item Co-occurrence

3 Lectures

The Problem and Design

Demonstration: Develop and Run the program

Item Co-occurrence - Source Code

Large-scale Inverted Index

3 Lectures

The Problem and Design

Demonstration: Develop and Run the program

Inverted Index - Source Code

Large-scale Data Aggregation

3 Lectures

The Problem and Design

Demonstration: Develop and Run the program

Data Aggregation - Source Code

Data Preparation

3 Lectures

Dataset 0

Dataset 1

Dataset 2

Instructors

Ivan Ng

4.9

315,475 Reviews
345 Students
34 Courses

Over the last 15 years I have worked as a software architect on products such as learning management systems, online games, RFID-based warehousing systems, and high-frequency advertising systems for companies like Prudential, AXA, and Bank of China. I have also delivered training on a wide range of IT-related topics for more than 10 years; topics include Big Data,...


Frahaan Hussain

4.9

315,475 Reviews
345 Students
34 Courses

I am the CEO of Sonar Systems, which is the world leader in educational material for the game engine Cocos2d-x, one of the best and most popular game engines in the world. With years of experience programming and running an online education platform (Sonar Learning), I can help and support new programmers like you. I am also a University Lecturer teaching a...


Review

4.9 course rating

4K ratings

Wojciech D.

3.0

1 year ago

Subject is very interesting but presentation is far from perfect.


Dhupam A. K.

4.0

5 years ago

good


Shubham J. P.

4.5

5 years ago

Audio intensity is low.


Mettu M. V. R.

4.5

5 years ago

Good


More R. N.

5.0

5 years ago

This Course is very simple


Muthavarapu L. N.

5.0

5 years ago

Thank you for providing free for us


Subhomoy C.

5.0

5 years ago

It was a great experience knowing about java


Subhankar S.

3.0

5 years ago

Yess this course is very helpful for my upcoming career


Thummar P. P.

4.5

5 years ago

its good and interesting.


Bikash B.

5.0

5 years ago

ggggg



This course includes:

54.5 hours on-demand video
3 articles
249 downloadable resources
Access on mobile and TV
Full lifetime access
Certificate of completion
