Spring for Apache Hadoop with Spring Boot Tutorial

Introduction

Spring for Apache Hadoop integrates the Spring Framework with the Hadoop ecosystem, allowing developers to build applications that can interact with Hadoop services easily. This tutorial will guide you through the process of setting up a Spring Boot application that connects to Apache Hadoop, enabling you to perform various operations such as reading and writing data using the Hadoop Distributed File System (HDFS).

Prerequisites

Before we begin, ensure that you have the following installed on your system:

  • Java Development Kit (JDK) 8 or later
  • Apache Maven
  • Apache Hadoop (local or cluster setup; see the quick check after this list)
  • Spring Boot (latest version)
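
Before moving on, it is worth confirming that HDFS is actually reachable. A quick sanity check, assuming the hdfs command-line tool from your Hadoop installation is on your PATH:

Quick check

# List the HDFS root directory; this fails immediately if the NameNode is not running
hdfs dfs -ls /

# Print a short status report for the cluster
hdfs dfsadmin -report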

Setting Up Your Spring Boot Project

We will start by creating a new Spring Boot project using Spring Initializr. Follow these steps:

  1. Go to Spring Initializr.
  2. Select your preferred project metadata (Group, Artifact, Name, etc.).
  3. Choose the following dependency (Spring for Apache Hadoop is not offered by Spring Initializr, so we will add it to the pom.xml by hand, as shown after this list):
    • Spring Web
  4. Click on "Generate", and a ZIP file will be downloaded.
  5. Extract the ZIP file and open the project in your favorite IDE.
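
After opening the project, add the Hadoop integration to the generated pom.xml yourself. A minimal sketch: spring-data-hadoop-boot is the Boot-oriented artifact of Spring for Apache Hadoop, and the version shown is assumed to be the final release of the project, so verify it against Maven Central:

pom.xml

<!-- Spring for Apache Hadoop with Spring Boot auto-configuration.
     The version is an assumption; check Maven Central for what is available. -->
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop-boot</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>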

Configuring Hadoop Properties

Next, we need to configure our application to connect to Hadoop. Open the application.properties file in the src/main/resources directory and add the following property, which points the application at the HDFS NameNode (it maps onto Hadoop's fs.defaultFS setting):

application.properties

# URI of the HDFS NameNode; adjust the host and port to match your setup
spring.hadoop.fsUri=hdfs://localhost:9000
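
The spring.hadoop.fsUri property is read by the spring-data-hadoop-boot auto-configuration, which exposes Hadoop's client settings as a Configuration bean. If you use the plain spring-data-hadoop artifact instead, nothing picks this property up automatically; a minimal, hand-rolled sketch of an equivalent configuration class (the name HadoopConfig is our own choice) looks like this:

HadoopConfig.java

package com.example.demo;

import org.apache.hadoop.conf.Configuration;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;

// Annotation fully qualified to avoid clashing with Hadoop's Configuration class
@org.springframework.context.annotation.Configuration
public class HadoopConfig {

    @Value("${spring.hadoop.fsUri}")
    private String fsUri;

    // Expose the Hadoop client configuration as a Spring bean for injection elsewhere
    @Bean
    public Configuration hadoopConfiguration() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", fsUri);
        return conf;
    }
}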

Creating an HDFS Service

Now, let's create a service that handles HDFS operations through Hadoop's FileSystem API, using the Configuration bean described above. Create a new class called HdfsService in the com.example.demo package (src/main/java/com/example/demo):

HdfsService.java

package com.example.demo;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class HdfsService {
    // Hadoop client configuration provided as a Spring bean
    @Autowired
    private Configuration configuration;

    // Write the given content to HDFS, overwriting the file if it exists
    public void writeFile(String fileName, String content) throws IOException {
        try (FileSystem fs = FileSystem.get(configuration);
             FSDataOutputStream out = fs.create(new Path(fileName), true)) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Read the whole file from HDFS and return it as a UTF-8 string
    public String readFile(String fileName) throws IOException {
        try (FileSystem fs = FileSystem.get(configuration);
             FSDataInputStream in = fs.open(new Path(fileName))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            IOUtils.copyBytes(in, out, 4096, false);
            return out.toString(StandardCharsets.UTF_8.name());
        }
    }
}

Creating a Controller

To expose our HDFS service via a REST API, we will create a controller. Create a new class called HdfsController in the same package:

HdfsController.java

package com.example.demo;

import java.io.IOException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/hdfs")
public class HdfsController {
    @Autowired
    private HdfsService hdfsService;

    // Write the given content to a file on HDFS
    @PostMapping("/write")
    public void writeFile(@RequestParam String fileName, @RequestParam String content) throws IOException {
        hdfsService.writeFile(fileName, content);
    }

    // Return the contents of an HDFS file
    @GetMapping("/read")
    public String readFile(@RequestParam String fileName) throws IOException {
        return hdfsService.readFile(fileName);
    }
}

Running the Application

To run your Spring Boot application, use the following command in your project directory:

Run command

./mvnw spring-boot:run

Once the application is running, you can test the HDFS operations using tools like Postman or cURL. Here are some example requests:

Write to HDFS

POST http://localhost:8080/hdfs/write?fileName=test.txt&content=Hello%20HDFS

Read from HDFS

GET http://localhost:8080/hdfs/read?fileName=test.txt
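
The same requests with cURL (%20 is the URL-encoded space in "Hello HDFS"):

cURL examples

# Write "Hello HDFS" to test.txt
curl -X POST "http://localhost:8080/hdfs/write?fileName=test.txt&content=Hello%20HDFS"

# Read the file back through the API
curl "http://localhost:8080/hdfs/read?fileName=test.txt"

# Verify directly on HDFS; a relative path resolves to your HDFS home directory
hdfs dfs -cat test.txt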

Conclusion

In this tutorial, we have explored how to integrate Spring Boot with Apache Hadoop, creating a simple REST API to interact with HDFS. This setup allows developers to harness the power of Hadoop while leveraging the Spring Framework's features, making it easier to build robust data applications.