Memory outage scenario with Apache POI
Hi Dev’s ,
My name is Kiran Vajrapu, and I am from ThoughtWorks.
I’d like to discuss a problem I encountered in a previous project. issue with memory outage issue with Apache POI.
In my project, we will get different types of input files, like XML, Excel, CSV, PDF, etc. Based on file type, the parsing strategy will be changed. At the time of Excel and CSV, we faced the issue mentioned above.
I’ll address the issue: while reading small-size Excel files (under 400–500 MB) is fine. If the file size is increased, the programme crashes. The cause of the problem is that Apache POI reads the Excel file and loads the data into RAM.
After discussing the issue with the team, one of the seniors suggested a solution to the current problem.
Teammate wrote a PowerShell script for splitting the Excel file into multiple files. and executed that PowerShell script from Java. We observed a drastic change in the execution time of a Java programme instead of splitting Excel using Apache POI.
As per my understanding, Apache POI reads the Excel file and uses RAM for manipulation. But PowerShell uses the Excel application to split. As a result, the execution time varied.
If anyone has an appropriate reason, please leave it in the comments.