Read CSV files from S3 using Spring Batch

Published by Vignesh M on

In this post, let’s see how to read files from S3 using Spring Batch.
We will read the CSV files as a stream, without loading the entire data set into memory, and process each item. For the item writer, we’ll use a no-operation writer, since we will not write the items to a file or to any database table.

Job configuration

Let’s start with the Job configuration.

Next, we will define a step to read and process data from the CSV files present in S3.

In the step, we will use the read-process-write pattern.
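
To make the read-process-write pattern concrete, here is a rough plain-Java model of what a chunk-oriented step does. It is only a sketch and does not use any Spring Batch classes; the names ChunkLoopSketch and run are made up for this illustration, and the "writer" is simply a list that collects each chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified model of a chunk-oriented step (hypothetical names, not Spring Batch API).
class ChunkLoopSketch {

    // Reads up to chunkSize items, processes them, then "writes" them as one chunk.
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // Read phase: pull items one by one until the chunk is full.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // Process phase: transform each item of the chunk.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // Write phase: the whole chunk is handed over at once.
            writtenChunks.add(processed);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("alien", "heat", "jaws");
        // Prints [[ALIEN, HEAT], [JAWS]]
        System.out.println(run(titles.iterator(), String::toUpperCase, 2));
    }
}
```

Only one chunk of items is held in memory at a time, which is the property we rely on when the input is large.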

Item reader

For the reader config, we will use a SynchronizedItemReader that delegates to a MultiResourceItemReader. To the MultiResourceItemReader, we add a list of resources. The resources in this case are the files in S3.

Update the value of sourceBucket and sourceObjectPrefix based on the S3 bucket details of your use case.
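
The idea of a synchronized reader delegating to a multi-resource reader can be sketched in plain Java as below. This is not the Spring Batch implementation; MultiSourceReader and SynchronizedReader are hypothetical stand-ins, and in-memory lists play the role of the S3 files.

```java
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration only; these classes are hypothetical, not Spring Batch types.
class SynchronizedReaderSketch {

    // Reads the sources one after another; not thread-safe on its own.
    static final class MultiSourceReader<T> {
        private final Iterator<List<T>> sources;
        private Iterator<T> current;

        MultiSourceReader(List<List<T>> sources) {
            this.sources = sources.iterator();
        }

        // Returns the next item, or null once every source is exhausted.
        T read() {
            while (current == null || !current.hasNext()) {
                if (!sources.hasNext()) {
                    return null;
                }
                current = sources.next().iterator();
            }
            return current.next();
        }
    }

    // Wraps the delegate so that only one thread at a time can call read().
    static final class SynchronizedReader<T> {
        private final MultiSourceReader<T> delegate;

        SynchronizedReader(MultiSourceReader<T> delegate) {
            this.delegate = delegate;
        }

        synchronized T read() {
            return delegate.read();
        }
    }

    public static void main(String[] args) {
        List<List<String>> files = List.of(List.of("a", "b"), List.of("c"));
        SynchronizedReader<String> reader =
                new SynchronizedReader<>(new MultiSourceReader<>(files));
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```

The wrapper matters once several threads pull items from the same reader, because the delegate keeps mutable state about the current file and position.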

For this example, I have created a sample movie data set. A flat file item reader is configured to parse each CSV file.
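
To show what parsing a CSV file as a stream means, here is a small plain-Java sketch that reads one line at a time and maps it to an object. It does not use the flat file item reader; the Movie columns (title, year) are assumptions for the illustration, since the real columns of the data set are not listed here, and a StringReader stands in for the S3 object stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Plain-Java illustration only; class names and columns are hypothetical.
class CsvStreamSketch {

    // Assumed columns, not the actual layout of the sample data set.
    record Movie(String title, int year) {}

    // Lazily maps CSV lines to Movie objects, holding one line in memory at a time.
    static final class MovieIterator implements Iterator<Movie> {
        private final BufferedReader lines;
        private String nextLine;

        MovieIterator(Reader source) {
            this.lines = new BufferedReader(source);
            readLine();                 // skip the header row
            this.nextLine = readLine(); // look ahead to the first data row
        }

        private String readLine() {
            try {
                return lines.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() {
            return nextLine != null;
        }

        @Override
        public Movie next() {
            if (nextLine == null) {
                throw new NoSuchElementException();
            }
            // Naive split: does not handle quoted fields that contain commas.
            String[] fields = nextLine.split(",");
            nextLine = readLine();
            return new Movie(fields[0].trim(), Integer.parseInt(fields[1].trim()));
        }
    }

    public static void main(String[] args) {
        String csv = "title,year\nAlien,1979\nHeat,1995\n";
        Iterator<Movie> movies = new MovieIterator(new StringReader(csv));
        while (movies.hasNext()) {
            System.out.println(movies.next());
        }
    }
}
```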

Item processor

For the item processor, we will configure it with async capabilities. This is not needed for the small sample data set used in this post. However, when you want to process a huge data set, async processing will be very helpful.

Item writer

Then, for the item writer, we configure it as async too. This is to support the item processor, which processes the items in multiple threads: the writer needs to handle the Future that gets returned for each item.
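
The hand-off between an async processor and an async writer can be sketched with plain java.util.concurrent types. This is only an illustration of the idea under my own assumptions, not the Spring Batch classes; AsyncChunkSketch, processAsync and unwrap are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java illustration only; names are hypothetical, not Spring Batch API.
class AsyncChunkSketch {

    // Processor side: submit each item to the pool and return the Futures right away.
    static <I, O> List<Future<O>> processAsync(List<I> items,
                                               Function<I, O> delegate,
                                               ExecutorService pool) {
        List<Future<O>> futures = new ArrayList<>();
        for (I item : items) {
            Callable<O> task = () -> delegate.apply(item);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    // Writer side: wait for each Future in order and collect the processed items.
    static <O> List<O> unwrap(List<Future<O>> futures) {
        List<O> results = new ArrayList<>();
        for (Future<O> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for an item", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Item processing failed", e.getCause());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures =
                    processAsync(List.of("alien", "heat"), String::toUpperCase, pool);
            // Prints [ALIEN, HEAT]
            System.out.println(unwrap(futures));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the futures are unwrapped in the order they were submitted, the written items keep their original order even when processing finishes out of order.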

Database

We will use an H2 in-memory database.

Running the application

To run the application, just run the main class in the ReadFileFromS3BatchApplication.java file.

I have created two movie data set CSV files for this example. The data set files are available under the resources folder in the project for reference. The execution logs are as below

 

Conclusion

In this article, we have explored how to read files from an S3 bucket using Spring Batch. The key point is that we processed the files as a stream, without loading them fully into memory.
The complete source code is available over on Git.


Vignesh M

Java developer, AWS Certified Solutions Architect Associate and cloud technology enthusiast. Currently working for Clarivate. He believes that knowledge increases by sharing, not by saving.