Fishing for POI
Creating Excel or Word files from Java...
Have you ever needed to create a Microsoft Excel or Word file from Java? If you have, did you try to do it from scratch yourself? If you were working with Excel, did you end up creating comma-separated data in a file (CSV files)?
CSV files work very well as long as all you are interested in is the raw data. But what if you are interested in including formulas in your data or need to format your spreadsheet appropriately (with centring, colours, bold, italics etc)? Did you give up in the end, or work with a compromise solution?
As it happens I was recently asked this very question by some Java developers. They were working with a web-based application and wanted to create both Excel and Word files for senior management to access. These files would hold dynamically generated data relevant to their organisation. They therefore needed to be generated programmatically as and when required.
The POI Project
Creating Excel and Word files is hard, not least due to the complex nature of the file formats used by Microsoft for Excel and Word. That is, formats based upon Microsoft's OLE 2 Compound Document format. However, one of the Apache projects does all the hard work for you and makes it very easy to create, read and update Excel, and soon, Word files. This project is called POI. It has already been in development for several years, starting in April 2001. It is currently in version 2.5. You can download it here.
POI is actually more like a number of combined projects, which allow you to create both Word and Excel format files. It can be divided into several sub-projects, in particular:
* POIFS, the oldest and most stable part of the project, which provides facilities for reading and writing OLE 2 Compound Document files.
* HWPF, a port of the Microsoft Word 97 file format to pure Java.
* HSSF, a port of the Microsoft Excel 97(-2002) file format (BIFF8) to pure Java.
In this column we will focus on the use of POI to create Excel files using HSSF.
HSSF for Excel files.
You may wonder what HSSF stands for. Rather provocatively it stands for Horrible SpreadSheet Format (indeed many of the elements of POI have quite provocative names, e.g. DDF - Dreadful Drawing Format, which is the Microsoft Office Drawing format, otherwise known as Escher format).
HSSF provides a way to create Excel spreadsheets as well as to read, modify and write existing spreadsheets. All together it provides:
* low level structures for those with special needs
* an event model API for efficient read-only access
* a full user model API for creating, reading and modifying XLS files
Creating an Excel file with POI
Let's look at the basics of what is needed to create an Excel file. First, we need to create a workbook and add a sheet to it. We will then need to add values for cells, formulas and the like. The program presented in figure 1 illustrates how we can do this using POI.
The first thing that this program does is create a new
HSSFWorkBook (in line 14). We then create a sheet from this book (in line 15). Finally, we obtain the first row in the sheet in line 16. Note that in POI rows and columns are numbered from zero. Thus, a cell A1 in Excel is obtained from row 0, and cell 0 (in that row).
We now have a workbook object, with a single sheet in it (called Sheet1). In turn the sheet contains a single row.
Lines 18 to 25 now create a set of cells to form that row in order to hold headings for each column in the sheet. Lines 27 to 35 provide data for our very simple spreadsheet. In all cases we obtain the cells in the sheet by accessing the appropriate row element and retrieving a cell form within that row (note that the createCell(short) method takes a short value rather than an int - we thus need to cast to a short when calling this method using an integer literal).
To set the value within the cell we use the
setCellValue method. This is an overloaded method, which can take a Boolean, a string, a double, Date or Calendar object. It can thus represent most types of data held in a spreadsheet. It also helps define the type of the data (e.g. cells set with a string will be textual, whereas cells set with a numeric are numerical).
One cell deviates from this; cell 3 in row 1. In this cell we use the method
setCellFormula(String). In fact, we pass the string "B2*C2" to this method. This sets the cell to hold a formula where its value will be calculated by executing this formula.
setCellFormula method takes a string and uses it as a formula for the cell. In our case, the formula is very simple - it multiplies the value held in cell B2 (the second cell in row 1) with the value held in cell C2 (the third cell in row 1). Notice the cells we obtain are from row 1, and cells 1 and 2, but the formula references them as cells B2 and C2. Also notice that the formula does not include the "=" at the start - this will be automatically added by POI.
Once the Spreadsheet has been defined it can be written out to file. This is done in lines 37-39. This creates a
FileOutputStream to a file called "text1.xls" and uses the write method on the Workbook object to write its contents out to file. The end result is that the file saved to the file system is now an Excel file that is indistinguishable from any other Excel file. In Figure 2 I have opened this file using Excel: