java怎么操作excel(java解决导入excel)

指南 0 683

HSSF是POI对Excel-97(-2007)文件操作的纯Java实现。XSSF是POI对Excel 2007 OOXML(.xlsx)文件操作的纯Java实现。从POI 3.8版本开始,提供了一种基于XSSF的低内存占用的API(SXSSF)。

POI提供两种读写API模型:事件模型和用户模型。事件模型是基于流的方式实现,使用sax(simple api for XML)模型进行内容解析,对CPU和内存的消耗小,但无法进行写操作。用户模型基于内存树的方式实现,使用DOM进行excel的解析,对CPU和内存的消耗大,但能够以面向对象的方式进行操作,可读可写。

Event API (HSSF Only)

java怎么操作excel(java解决导入excel)

它可以以比较小的内存来读一个xls文件,为了使用这个API,你需要:

(1)创建一个org.apache.poi.hssf.eventmodel.HSSFRequest的实例

(2)实现org.apache.poi.hssf.eventmodel.HSSFListener接口的类。

(3)HSSFRequest.addListener(yourlistener,recordid)注册(2)中实现的类,recordid应该是org.apache.poi.hssf.record的属性。或者也可认使用HSSFRequest.addListerForAllRecords(yourlistener)。

(4)创建一个org.apache.poi.poifs.filesystem.FileSystem实例, 并把XLS文件通过输入流方式输入。

(5)可以把(4)中创建的实例通过HSSFEventFactory.processWorkbookEvents(request,Filesystem)方法或HSSFEventFactory.processEvents(request,inputstream)与绑定。

(6)这样listener就会调用processRecord(Record)方法,直到整个文件读完。

/**

* This example shows how to use the event API for reading a file.

*/

public class EventExample

implements HSSFListener

private SSTRecord sstrec;

/**

* This method listens for incoming records and handles them as required.

* @param record The record that was found while reading.

*/

public void processRecord(Record record)

switch (record.getSid())

// the BOFRecord can represent either the beginning of a sheet or the workbook

case BOFRecord.sid:

BOFRecord bof = (BOFRecord) record;

if (bof.getType() == bof.TYPE_WORKBOOK)

System.out.println("Encountered workbook");

// assigned to the class level member

} else if (bof.getType() == bof.TYPE_WORKSHEET)

System.out.println("Encountered sheet reference");

break;

case BoundSheetRecord.sid:

BoundSheetRecord bsr = (BoundSheetRecord) record;

System.out.println("New sheet named: " + bsr.getSheetname());

break;

case RowRecord.sid:

RowRecord rowrec = (RowRecord) record;

System.out.println("Row found, first column at "

+ rowrec.getFirstCol() + " last column at " + rowrec.getLastCol());

break;

case NumberRecord.sid:

NumberRecord numrec = (NumberRecord) record;

System.out.println("Cell found with value " + numrec.getValue()

+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());

break;

// SSTRecords store a array of unique strings used in Excel.

case SSTRecord.sid:

sstrec = (SSTRecord) record;

for (int k = 0; k < sstrec.getNumUniqueStrings(); k++)

System.out.println("String table value " + k + " = " + sstrec.getString(k));

break;

case LabelSSTRecord.sid:

LabelSSTRecord lrec = (LabelSSTRecord) record;

System.out.println("String cell found with value "

+ sstrec.getString(lrec.getSSTIndex()));

break;

/**

* Read an excel file and spit out what we find.

* @param args Expect one argument that is the file to read.

* @throws IOException When there is an error processing the file.

*/

public static void main(String[] args) throws IOException

// create a new file input stream with the input file specified

// at the command line

FileInputStream fin = new FileInputStream(args[0]);

// create a new org.apache.poi.poifs.filesystem.Filesystem

POIFSFileSystem poifs = new POIFSFileSystem(fin);

// get the Workbook (excel part) stream in a InputStream

InputStream din = poifs.createDocumentInputStream("Workbook");

// construct out HSSFRequest object

HSSFRequest req = new HSSFRequest();

// lazy listen for ALL records with the listener shown above

req.addListenerForAllRecords(new EventExample());

// create our event factory

HSSFEventFactory factory = new HSSFEventFactory();

// process our events based on the document input stream

factory.processEvents(req, din);

// once all the events are processed close our file input stream

fin.close();

// and our document input stream (don't want to leak these!)

din.close();

System.out.println("done.");

Event API(XSSF和SAX)

使用较少的内存读取XLSX文件。

import java.io.InputStream;

import java.util.Iterator;

import org.apache.poi.xssf.eventusermodel.XSSFReader;

import org.apache.poi.xssf.model.SharedStringsTable;

import org.apache.poi.openxml4j.opc.OPCPackage;

import org.xml.sax.Attributes;

import org.xml.sax.ContentHandler;

import org.xml.sax.InputSource;

import org.xml.sax.SAXException;

import org.xml.sax.XMLReader;

import org.xml.sax.helpers.DefaultHandler;

import org.xml.sax.helpers.XMLReaderFactory;

public class ExampleEventUserModel {

public void processOneSheet(String filename) throws Exception {

OPCPackage pkg = OPCPackage.open(filename);

XSSFReader r = new XSSFReader( pkg );

SharedStringsTable sst = r.getSharedStringsTable();

XMLReader parser = fetchSheetParser(sst);

// To look up the Sheet Name / Sheet Order / rID,

// you need to process the core Workbook stream.

InputStream sheet2 = r.getSheet("rId2");

InputSource sheetSource = new InputSource(sheet2);

parser.parse(sheetSource);

sheet2.close();

public void processAllSheets(String filename) throws Exception {

OPCPackage pkg = OPCPackage.open(filename);

XSSFReader r = new XSSFReader( pkg );

SharedStringsTable sst = r.getSharedStringsTable();

XMLReader parser = fetchSheetParser(sst);

Iterator sheets = r.getSheetsData();

while(sheets.hasNext()) {

System.out.println("Processing new sheet:\n");

InputStream sheet = sheets.next();

InputSource sheetSource = new InputSource(sheet);

parser.parse(sheetSource);

sheet.close();

System.out.println("");

public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {

XMLReader parser =

XMLReaderFactory.createXMLReader(

"org.apache.xerces.parsers.SAXParser"

);

ContentHandler handler = new SheetHandler(sst);

parser.setContentHandler(handler);

return parser;

/**

* See org.xml.sax.helpers.DefaultHandler javadocs

*/

private static class SheetHandler extends DefaultHandler {

private SharedStringsTable sst;

private String lastContents;

private boolean nextIsString;

private SheetHandler(SharedStringsTable sst) {

this.sst = sst;

public void startElement(String uri, String localName, String name,

Attributes attributes) throws SAXException {

// c => cell

if(name.equals("c")) {

// Print the cell reference

System.out.print(attributes.getValue("r") + " - ");

// Figure out if the value is an index in the SST

String cellType = attributes.getValue("t");

if(cellType != null && cellType.equals("s")) {

nextIsString = true;

} else {

nextIsString = false;

// Clear contents cache

lastContents = "";

public void endElement(String uri, String localName, String name)

throws SAXException {

// Process the last contents as required.

// Do now, as characters() may be called more than once

if(nextIsString) {

int idx = Integer.parseInt(lastContents);

lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();

nextIsString = false;

// v => contents of a cell

// Output after we've seen the string contents

if(name.equals("v")) {

System.out.println(lastContents);

public void characters(char[] ch, int start, int length)

throws SAXException {

lastContents += new String(ch, start, length);

public static void main(String[] args) throws Exception {

ExampleEventUserModel example = new ExampleEventUserModel();

example.processOneSheet(args[0]);

example.processAllSheets(args[0]);

}

SXSSF

SXSSF位于org.apache.poi.xssf.streaming包下,用于spreadsheets比较大时,api和XSSF兼容。SXSSF通过一个滑动窗口来限制访问Row的数量从而达到低内存占用,不在窗口的rows不可再访问。

可以在创建SXSSFWorkbook(int windowSize)指定窗口大小,或者针对sheet设置窗口大小

当通过createRow()创建一个新Row时,窗口中的行数已经超过了固定的大小,索引最小的会被flush。

默认的窗口大小是100,当窗口大小是-1时,表示窗口无限大。SXSSF会生成临时文件,所以必须通过dispose方法进行删除。

SXSSFWorkbook 默认使用inline strings而不是一个共享的string table,这样就不用在内存中保存数据,但这样可能会使产生的文档与客户端不兼容。如果开启String table可能会占用大量内存,这种权衡需要用户决定。

另外是否会消耗内存取决于你要使用的特性,比如:merged region , hyperlinks, comments等。

import junit.framework.Assert;

import org.apache.poi.ss.usermodel.Cell;

import org.apache.poi.ss.usermodel.Row;

import org.apache.poi.ss.usermodel.Sheet;

import org.apache.poi.ss.usermodel.Workbook;

import org.apache.poi.ss.util.CellReference;

import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public static void main(String[] args) throws Throwable {

SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk

Sheet sh = wb.createSheet();

for(int rownum = 0; rownum < 1000; rownum++){

Row row = sh.createRow(rownum);

for(int cellnum = 0; cellnum < 10; cellnum++){

Cell cell = row.createCell(cellnum);

String address = new CellReference(cell).formatAsString();

cell.setCellValue(address);

// Rows with rownum < 900 are flushed and not accessible

for(int rownum = 0; rownum < 900; rownum++){

Assert.assertNull(sh.getRow(rownum));

// ther last 100 rows are still in memory

for(int rownum = 900; rownum < 1000; rownum++){

Assert.assertNotNull(sh.getRow(rownum));

FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");

wb.write(out);

out.close();

// dispose of temporary files backing this workbook on disk

wb.dispose();

SXSSF会把sheet数据放在临时(一个sheet一个临时文件),所以临时文件可能很大,例如一个20MB的csv数据,xml临时文件可能多在1G,所以如果临时文件的大小是一个问题,可以使SXSSF使用gzip压缩。

SXSSFWorkbook wb = new SXSSFWorkbook();

wb.setCompressTempFiles(true); // temp files will be gzipped

相关推荐: