首页 文章

通过Apache POI读取大型Excel文件(xlsx)时出错

提问于
浏览
7

我试图通过Apache POI读取大型excel文件xlsx,比如40-50 MB . 我失去了内存异常 . 当前堆内存为3GB .

我可以毫无问题地阅读较小的excel文件 . 我需要一种方法来读取大型excel文件,然后通过Spring excel视图将它们作为响应返回 .

public class FetchExcel extends AbstractView {


    @Override
    protected void renderMergedOutputModel(
            Map model, HttpServletRequest request, HttpServletResponse response) 
    throws Exception {

    String fileName = "SomeExcel.xlsx";

    response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

    OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");

    XSSFWorkbook workbook = new XSSFWorkbook(pkg);

    ServletOutputStream respOut = response.getOutputStream();

    pkg.close();
    workbook.write(respOut);
    respOut.flush();

    workbook = null;                    

    response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\"");


    }    

}

我首先开始使用 XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in); ,但每个Apache POI API的成本很高,所以我切换到OPC包的方式,但仍然是相同的效果 . 我不需要解析或处理文件,只需读取并返回即可 .

4 回答

  • 0

    下面是使用sax解析器读取大型xls文件的示例 .

    public void parseExcel(File file) throws IOException {
    
            OPCPackage container;
            try {
                container = OPCPackage.open(file.getAbsolutePath());
                ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
                XSSFReader xssfReader = new XSSFReader(container);
                StylesTable styles = xssfReader.getStylesTable();
                XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
                while (iter.hasNext()) {
                    InputStream stream = iter.next();
    
                    processSheet(styles, strings, stream);
                    stream.close();
                }
            } catch (InvalidFormatException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (OpenXML4JException e) {
                e.printStackTrace();
            }
    
    }
    
    protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException {
    
            InputSource sheetSource = new InputSource(sheetInputStream);
            SAXParserFactory saxFactory = SAXParserFactory.newInstance();
            try {
                SAXParser saxParser = saxFactory.newSAXParser();
                XMLReader sheetParser = saxParser.getXMLReader();
                ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
    
                @Override
                    public void startRow(int rowNum) {
                    }
                    @Override
                    public void endRow() {
                    }
                    @Override
                    public void cell(String cellReference, String formattedValue) {
                    }
                    @Override
                    public void headerFooter(String text, boolean isHeader, String tagName) {
    
                    }
    
                }, 
                false//means result instead of formula
                );
                sheetParser.setContentHandler(handler);
                sheetParser.parse(sheetSource);
            } catch (ParserConfigurationException e) {
                throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
    }
    
  • 2

    您没有提及是否需要修改电子表格 .

    这可能是显而易见的,但如果您不需要修改电子表格,那么您不需要解析它并将其写回来,您只需从文件中读取字节,然后写出字节,就像您一样,说图像,或任何其他二进制格式 .

    如果您确实需要在将电子表格发送给用户之前对其进行修改,那么据我所知,您可能需要采取不同的方法 .

    我知道用Java读取Excel文件的每个库都将整个电子表格读入内存,因此每个可能同时处理的电子表格必须有50MB的内存 . 正如其他人所指出的,这涉及调整VM可用的堆 .

    如果您需要同时处理大量电子表格,并且无法分配足够的内存,请考虑使用可以流式传输的格式,而不是一次性读取到内存中 . 可以通过Excel打开CSV格式,过去我通过将content-type设置为application / vnd.ms-excel,将附件文件名设置为以“.xls”结尾的内容,但实际上返回CSV,我已经取得了很好的效果内容 . 我没有在几年内尝试过这个,所以YMMV .

  • 13

    在bellwo示例中,我将添加一个完整的代码,如何将完整的excel文件(对于我60Mo)解析为对象列表而没有任何问题“ out of memory ”并且正常工作:

    import java.util.ArrayList;
    import java.util.List;
    
    
    class DistinctByProperty {
    
        private static OPCPackage xlsxPackage = null;
        private static PrintStream output= System.out;
        private static List<MassUpdateMonitoringRow> resultMapping = new ArrayList<>();
    
    
        public static void main(String[] args) throws IOException {
    
            File file = new File("C:\\Users\\aberguig032018\\Downloads\\your_excel.xlsx");
    
            double bytes = file.length();
            double kilobytes = (bytes / 1024);
            double megabytes = (kilobytes / 1024);
            System.out.println("Size "+megabytes);
    
            parseExcel(file);
        }
    
        public static void parseExcel(File file) throws IOException {
    
            try {
                xlsxPackage = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ);
                ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(xlsxPackage);
                XSSFReader xssfReader = new XSSFReader(xlsxPackage);
                StylesTable styles = xssfReader.getStylesTable();
                XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
                int index = 0;
                while (iter.hasNext()) {
                    try (InputStream stream = iter.next()) {
                        String sheetName = iter.getSheetName();
                        output.println();
                        output.println(sheetName + " [index=" + index + "]:");
                        processSheet(styles, strings, new MappingFromXml(resultMapping), stream);
                    }
                    ++index;
                }
    
            } catch (InvalidFormatException e) {
                e.printStackTrace();
            } catch (OpenXML4JException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                e.printStackTrace();
            }
        }
    
        private static void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, MappingFromXml mappingFromXml, InputStream sheetInputStream) throws IOException, SAXException {
            DataFormatter formatter = new DataFormatter();
            InputSource sheetSource = new InputSource(sheetInputStream);
            try {
                XMLReader sheetParser = SAXHelper.newXMLReader();
                ContentHandler handler = new XSSFSheetXMLHandler(
                        styles, null, strings, mappingFromXml, formatter, false);
    
                sheetParser.setContentHandler(handler);
                sheetParser.parse(sheetSource);
                System.out.println("Size of Array "+resultMapping.size());
            } catch(ParserConfigurationException e) {
                throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
            }
        }
    }
    

    你必须添加一个实现的calss

    SheetContentsHandler

    import com.sun.org.apache.xpath.internal.operations.Bool;
    import org.apache.poi.ss.util.CellAddress;
    import org.apache.poi.ss.util.CellReference;
    import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
    
    import org.apache.poi.xssf.usermodel.XSSFComment;
    
    import java.io.PrintStream;
    import java.util.ArrayList;
    import java.util.List;
    
    public class MappingFromXml implements SheetContentsHandler {
    
        private List<myObject> result = new ArrayList<>();
        private myObject myObject = null;
        private int lineNumber = 0;
        /**
         * Number of columns to read starting with leftmost
         */
        private int minColumns = 25;
        /**
         * Destination for data
         */
        private PrintStream output = System.out;
    
        public MappingFromXml(List<myObject> list) {
            this.result = list;
        }
    
        @Override
        public void startRow(int i) {
            output.println("iii " + i);
            lineNumber = i;
            myObject = new myObject();
        }
    
        @Override
        public void endRow(int i) {
            output.println("jjj " + i);
            result.add(myObject);
            myObject = null;
        }
    
        @Override
        public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            int columnIndex = (new CellReference(cellReference)).getCol();
    
            if(lineNumber > 0){
                switch (columnIndex) {
                    case 0: {//Tech id
                        if (formattedValue != null && !formattedValue.isEmpty())
                            myObject.setId(Integer.parseInt(formattedValue));
                    }
                    break;
                    //TODO add other cell
                }
            }
        }
    
        @Override
        public void headerFooter(String s, boolean b, String s1) {
    
        }
    }
    

    欲了解更多信息,请访问link

  • 6

    在解析xlsx文件时我也遇到了同样的OOM问题......经过两天的挣扎,我终于发现下面的代码非常完美;

    此代码基于sjxlsx . 它读取xlsx并存储在HSSF表中 .

    [code=java] 
                // read the xlsx file
           SimpleXLSXWorkbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));
    
            HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
    
            org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();
    
            Sheet sheetToRead = workbook.getSheet(0, false);
    
            SheetRowReader reader = sheetToRead.newReader();
            Cell[] row;
            int rowPos = 0;
            while ((row = reader.readRow()) != null) {
                org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
                int cellPos = 0;
                for (Cell cell : row) {
                    if(cell != null){
                        org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
                        hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
                        hfsCell.setCellValue(cell.getValue());
                    }
                    cellPos++;
                }
                rowPos++;
            }
            return hsfSheet;[/code]
    

相关问题