Small files are a big problem in Hadoop — or, at least, they are if the number of questions on the user list on this topic is anything to go by. In this post I’ll look at the problem, and examine some common solutions.

Problems with small files and HDFS

A small file is one which…

This article provides a step-by-step guide to creating a Hadoop MapReduce project in Java with Eclipse. It covers the complete process: creating the project, building the JAR, running the application, and browsing the results.

Let us now start building the Hadoop MapReduce WordCount Project.

Hadoop MapReduce Project in Java With Eclipse


  1. Hadoop 3: If Hadoop is…

When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines. A filesystem that manages storage across a network of machines is called a distributed filesystem. Since such filesystems are network-based, all the complications of network programming kick…
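To make the idea of partitioning concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that cuts a dataset of a given size into fixed-size blocks and assigns each block to a machine. The class name `BlockPartitioner`, the `Block` type, and the round-robin assignment are all assumptions made for illustration; real distributed filesystems use more involved placement and replication policies.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockPartitioner {

    /** Hypothetical description of one slice of the dataset and the machine holding it. */
    static final class Block {
        final long offset;   // byte offset of the block within the dataset
        final long length;   // number of bytes in the block (the last one may be shorter)
        final int machine;   // index of the machine assumed to store the block

        Block(long offset, long length, int machine) {
            this.offset = offset;
            this.length = length;
            this.machine = machine;
        }

        @Override
        public String toString() {
            return "Block[offset=" + offset + ", length=" + length + ", machine=" + machine + "]";
        }
    }

    /**
     * Splits a dataset of datasetBytes into blocks of at most blockBytes and
     * assigns them to machines in round-robin order (an illustrative policy only).
     */
    static List<Block> partition(long datasetBytes, long blockBytes, int machines) {
        if (datasetBytes < 0 || blockBytes <= 0 || machines <= 0) {
            throw new IllegalArgumentException("sizes and machine count must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < datasetBytes; offset += blockBytes) {
            long length = Math.min(blockBytes, datasetBytes - offset);
            blocks.add(new Block(offset, length, index % machines));
            index++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 1000-byte dataset, 256-byte blocks, three machines.
        for (Block b : partition(1000, 256, 3)) {
            System.out.println(b);
        }
    }
}
```

The sketch only computes a layout and moves no data. Everything that makes a real distributed filesystem hard, such as machines failing or the network dropping messages, is deliberately left out.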

