Tutorial - Dynamic Generation of OpenOffice Documents

Giovanni Floridia
January 2006

Introduction

It is a common need in web application development to generate reports or dynamic documents. In these cases the difficulties of controlling the printed results  with HTML is well known. One option to solve the problem is generating PDF files. This format allows a better positioning of the elements in the page, platform and printer independence. However, the design on such format  is laborious and it is proprietary.

These tutorial shows a way to create documents in OpenDocument Format using OpenOffice as design tool. 


On this document


The version most current of this document is in http://www.floridia.net/en/OpenDocumentFormat.
The original version is in Brazilian Portuguese and can be found here

The OpenDocument Format

OpenOffice 2.x [ 1 ] adopts the OASIS OpenDocument Format,  approved May, 1st 2005. OASIS goal is to create document standards to stimulate the e-business. The open standards adoption is more important than open source one. 

The format is intended to be used by any Office Suite. Indeed, the koffice plans to adopt it as default. 

The format is based on XML and its detailed description is complex. However, it is not necessary to know the 706 page specification.

Basically the format is a zip compressed file. Unpacking it will reveal the following basic structure:



META_INF/
                   manifest.xml
Configurations2/
Pictures/
Thumbnails/
mimetype
meta.xml
settings.xml
style.xml
content.xml

We are interested in the content.xml file. If you open it with your preferred editor, there will appear a very long single line. This occurs because the OpenOffice' default is to save space. As we need to read and manipulate the content, it is better to ask OpenOffice to organize the XML.


In Tools | Options | LoadSave |General uncheck the option " Size optimization will be XML format ". 

OpenOffice XML Size Optimization option.

Now, generated documents will be readable, although a little bigger. Save the file, unpack it again and reopen it: you will be able to read it.


Generated XML sample.

Well, you don't need to understand it. The hard part will be done by the right OpenOffice application.  


The Process.

Now it is possible to take the big picture of the process:

  1. Create an OpenOffice file as template;

  2. Unpack it in a folder;

  3. Modify some XML files, basically the content.xml;

  4. Zip everything back;

  5. Send to the client browser with the correct extension.

First two stages must be done manually by using OpenOffice as the template generator. The last two ones will be done by the application. The middle one need once some hack and then the application will do the rest. In our sample we will use PHP, but the method can by applied with any other language. In fact, it is more important a template tool. 

It possible to create one content.php that generates desired content.xml, capture the result, pack with the other files and send it with the adequate extension. However, this is not an adequate way to do it. The content.xml file is generated by OpenOffice. The more stages we do inside OpenOffice better. The less we  modify content.xml file, easier will be to create and to keep the reports.

The simplest approach  is by using a template tool. There are dozens of template tools, as the Smarty for PHP and the Cheetah for Python. We will use the Philipp v. Criegern's SmartTemplate [7] : it is simple and very fast. Not being an experienced user of templates, I believe that SmartTempate will minimize the hacks in the original file. However I'm not sure about this, and I hope this tutorial can be improved and "ported" to many template engines. 

To zip the files we will use a modified version ziplib.php that is part of the Phpwiki project.

Thanks to the Open Source philosophy, the heavy work has already been made by someone. We do not need to code a lot. It is enough to look for pieces and join them. I've read the future of the programming would be more like wiring a block diagram than to code. Open source software is a great font of "blocks" as is the *nix philosophy of creating tools that do just one thing, but very well done. 


The Parts

We need a application to create files in the OpenDocument format (OpenOffice), a text or XML editor, some software to unzip files. For PHP we need a n unzip lib and a template engine. Clearly, we need a php-enabled web server (Apache). The less common parts can be found here .


Hands On

Open OpenOffice Writer and create the tab1.odt document with a table, as to follows. 

Ooo Writer Template.

We are already preparing the file for use with templates by tagging with { }. Create a folder and name it OpenDocumentFormat somewhere in the htdocs tree. Unzip the file inside a a new folder named tab1 (I like to copy the file as tab1.zip, preserving the original).

We decided to use the ziplib.php of phpwiki version 1.3.11.p1. Three very simple hacks are necessary.  Comment the rcs_id function call. In line 262 change

function ZipWriter ($$comment = "", $$zipname =" archive.zip") {

for

function ZipWriter ($$comment = "", $$zipname =" archive.zip ", $$mime = "application/zip") {

In line 269 change

header("Content-Type: application/zip; name=\"$zipname \"");

for

header("Content-Type: $$mime; name=\"$zipname \"");

You can get an already modified ziplib.php here.

We need these changes to pack the files and send them in a different mimetype and they don't modify the original function calls. To prevent problems with inclusion ways ( include_path ), we will place the libraries in the working directory.

The following sample shows how the ziplib.php works.


<?php

require_once ( "ziplib.php" );

$base = "tab1" ;

$zipfile = new ZipWriter ( "Comment" $base . "_ generated.odt" , "application/vnd.oasis.opendocument.text" );
$ooofiles = array(
                    "mimetype" ,
                    "content.xml" ,
                    "styles.xml" ,
                    "meta.xml" ,
                    "settings.xml" ,
//                  "Configurations2/",
                    "META-INF/manifest.xml" ,
//                  "Pictures/",
//                  "Thumbnails/thumbnail.png"
                
);

foreach ( $ooofiles $file ) {
  $handle = fopen ( "$base/$file" , "rb" );
  $filedata = fread ( $handle , filesize ( "$base/$file" ));
  $zipfile - > AddRegularFile ( $file $filedata );
}

echo $zipfile - > finish ();

?>

The commented lines are not necessary and can generate errors. The previous version of the OpenOffice do not use them. To generate swx files it is enough to change the extension and the mimetype, as follows:

$zipfile = new ZipWriter("Comentario ", "documento.sxw", "to application/vnd.sun.xml.writer");

Accessing tab1.php from browser the following message should appear.

Browser receiving the file.

Mark to memorize the decision and type OK. The document should open in the OpenOffice Writer. At this point we solved step 1, 2, 4 and 5. Let's work on the remaining one.


SmartTemplate

The SmartTemplate is a light and fast tool. First let's put the files class.smarttemplate.php, class.smarttemplateparser.php and class.smarttemplatedebugger.php in include_path or in the working directory. The tab2.html template file is:

<HTML>
 <BODY>
  <P ALIGN="center "> User List</P>

  <TABLE ALIGN="center "BORDER="1">
   <TR>
    <TH BGCOLOR="#8080ff ">
        Name
    </TH>
    <TH BGCOLOR="#8080ff ">
        Group
    </TH>
   </TR>

<-- BEGIN users -->
   <TR>
    <TD>
        {NAME}
    </TD>
    <TD>
        {GROUP}
    </TD>
   </TR>
<-- END users -->

  </TABLE>
 </BODY>
</HTML>

The file tab2.php processes it.

<?php // tab2.php

require_once "class.smarttemplate.php";

    
$x['users'][0]['NAME']='Giovanni';
    
$x['users'][0]['GROUP']='Admin';

    
$x['users'][1]['NAME']='Leonardo';
    
$x['users'][1]['GROUP']='Operations';

    
$x['users'][2]['NAME']='Orfeu';
    
$x['users'][2]['GROUP']='Desevelopment';

    
$x['users'][3]['NAME']='Isa';
    
$x['users'][3]['GROUP']='Sales';

$content = new SmartTemplate('tab2.html');
$content->assign($x);
$content->output();

?>

It worth to note that it is possible to attribute a matrix variable (
$x['users'] )  to an entire table, preventing a loop in the code. This spares some cpu cicles. Note also that we used few markers in template. This fact will be important for easy creation and maintenance of the code.

At this point the whole idea could be clear: we'll do the same in content.xml! Make a copy named content2.xml  inside the same tab1 folder. Edit it and include the two lines, as below.

 (...)
</table:table-row>
  </table:table-header-rows>
<!-- BEGIN users -->
    <table:table-row>
      <table:table-cell table:style-name="Tabela1.A2" office:value-type="string">
        <text:p text:style-name="P3">{NAME}</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="Tabela1.B2" office:value-type="string">
        <text:p text:style-name="P3">{GROUP}</text:p>
      </table:table-cell>
    </table:table-row>
<!-- END users -->
  <table:table-row>
(...)

Change the first line

<?xml version="1.0 "encoding="UTF-8">

<office:document-content...

for 

{XMLHEADER}

<office:document-content...


Save the file. In the first line there is a special character which gives an error when OpenOffice tries to open the file.

The code of tab3.php is
<?php

require_once ("ziplib.php");
require_once
"class.smarttemplate.php";

$base="tab1";

    
$x['users'][0]['NAME']='Giovanni';
    
$x['users'][0]['GROUP']='Admin';

    
$x['users'][1]['NAME']='Leonardo';
    
$x['users'][1]['GROUP']=utf8_encode('Operações');

    
$x['users'][2]['NAME']='Orfeu';
    
$x['users'][2]['GROUP']='Development';

    
$x['users'][3]['NAME']='Isa';
    
$x['users'][3]['GROUP']='Sales';

    
$content = new SmartTemplate("$base/content2.xml");
  
  $content->assign($x);
    
$content->assign('XMLHEADER', '<?xml version="1.0" encoding="UTF-8"?>');
    
// $content->output();
// exit;


$zipfile = new ZipWriter("Comment", $base."_generated.odt", "application/vnd.oasis.opendocument.text");
$ooofiles = array(
                   
"mimetype",
//                 "content.xml",  // Note que está comentada!
                   
"styles.xml",
                   
"meta.xml",
                   
"settings.xml",
//                 "Configurations2/",
                   
"META-INF/manifest.xml",
//                 "Pictures/",
//                 "Thumbnails/thumbnail.png"                   
             
);

foreach (
$ooofiles as $file) {
  
$handle = fopen("$base/$file","rb");
  
$filedata = fread($handle , filesize("$base/$file"));
  
$zipfile -> AddRegularFile($file, $filedata);
}

$zipfile->addRegularFile('content.xml', $content->result() );

echo
$zipfile -> finish();

?>
 

Note the script read the content2.xml file, but send it as content.xml. Thus, it is possible to use the same backbone files, having different content.xml versions.  Do not forget to comment content.xml out from $ooofiles array.

Note also that lines with special caracters should be encoded by the utf_encode function. Actually, we should e doing this with all entries, but this is only a tutorial. This is an important detail when dealing with languages that use them, as is the Brazilian Portughese case or i18n projects. 

If you need to debug, comment out the following lines.

// $content->output();
// exit;



Flow Control

The result table can be empty the script should handle the case. Create a new script called tab4.php - actually it is the same tab3.php. We copied it to access the content3.xml, changing the following line.

    $content = new SmartTemplate("$base/content3.xml");

The relative content3.xml has more changes.


  (...)
    </table:table-header-rows>
<!-- IF users -->
  <!-- BEGIN users -->
    <table:table-row>
      <table:table-cell table:style-name="Table1.A2" office:value-type="string">
        <text:p text:style-name="P3">{NAME}</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="Table1.B2" office:value-type="string">
        <text:p text:style-name="P3">{GROUP}</text:p>
      </table:table-cell>
    </table:table-row>
  <!-- END users -->
<!-- ELSE -->
    <table:table-row>
      <table:table-cell table:style-name="Table1.B2" table:number-columns-spanned="2" office:value-type="string">
        <text:p text:style-name="P3">No user found.</text:p>
      </table:table-cell>
        <table:covered-table-cell/>
      </table:table-row>
<!-- ENDIF users -->
    </table:table>
  (...)

Running tab4.php results in the "No user found." line removed. Commenting out the users, leaving empty the $x variable, shows the the last line alone. 

In order to show appropriately big tables it is convenient to set the Repeat heading property. You can reach it right-clicking on the table and selection the "Table..." option.

Header repeating option in Ooo Writer.

The idea is quite clear now. It can be applied in any situation. Just make the model and put the right template tags. It's possible to create styles in the same way - remember to comment out the file style.xml in $ooofiles array. 

Spreadsheet generation 

In some situations it is interesting to generate spradsheets dynamicaly. Let's apply the same method. Create the following Plan in Calc and save it as tab5.ods. 

 

The righter-most column is composed just by formulae. Lines 2 and 7 are intentionally blank. We'll see why later. We will use this Plan as template. After saving it, let's unzip it in a tab5 folder. 

Open contents.xml in your prefered editor. You will find something like this. 

  (...)
  <table:table-row table:style-name="ro3">
    <table:table-cell table:style-name="ce2" office:value-type="string">
      <text:p>Maçãs</text:p>
    </table:table-cell>
    <table:table-cell office:value-type="float" office:value="4">
      <text:p>4</text:p>
    </table:table-cell>
    <table:table-cell office:value-type="float" office:value="1.5">
      <text:p>1,50</text:p>
    </table:table-cell>
    <table:table-cell table:formula="oooc:=[.B3]*[.C3]" office:value-type="float" office:value="6">
      <text:p>6,00</text:p>
    </table:table-cell>
  </table:table-row>

  <table:table-row table:style-name="ro3">
    <table:table-cell table:style-name="ce3" office:value-type="string">
      <text:p>Bananas</text:p>
    </table:table-cell>
    <table:table-cell table:style-name="ce9" office:value-type="float" office:value="1">
      <text:p>1</text:p>
    </table:table-cell>
    <table:table-cell table:style-name="ce12" office:value-type="float" office:value="2.3">
      <text:p>2,30</text:p>
    </table:table-cell>
    <table:table-cell (...)ormula="oooc:=[.B4]*[.C4]" office:value type="float" office:value="2.3">
      <text:p>2,30</text:p>
    </table:table-cell>
  </table:table-row>
  (...)

Note that values appear twice: as  office:value-type and as text. The  text is the resullt of the last calculation, and can be ignored if you have the autorecalc on. There is a lot of different styles - we will not use all of them. Let's adopt, initilly just two of them: the ce2 (white background) and the ce3 (gray background).


The main block wil be:

<!-- BEGIN products -->
    <table:table-row table:style-name="ro3">
      <table:table-cell table:style-name="{STYLE}" office:value-type="string">
        <text:p>{NAME}</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="{STYLE}" office:value-type="float" office:value="{QUANT}">
        <text:p>1</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="{STYLE}" office:value-type="float" office:value="{PUNIT}">
        <text:p>4,50</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="{STYLE}" table:formula="oooc:={TOTAL}" office:value- type="float"                                                                           office:value="4.5">
        <text:p>4,50</text:p>
      </table:table-cell>
    </table:table-row>
<!-- END products -->

We remind to change the first line for the {XMLHEADER} tag. At this point the tab5.php already can generate documents with a variable number of lines.


<?php

require_once ("ziplib.php");
require_once
"class.smarttemplate.php";

$base="tab5";

$x['products'][0]['NAME']=utf8_encode('Maças');
$x['products'][0]['QUANT']='2';
$x['products'][0]['PUNIT']='1.5';
$x['products'][0]['TOTAL']='[.B3]*[.C3]';
$x['products'][0]['STYLE']='ce2';
    
$x['products'][1]['NAME']='Bananas';

         (...)

$x['products'][4]['NAME']='Abacates';
$x['products'][4]['QUANT']='6';
$x['products'][4]['PUNIT']='1.7';
$x['products'][4]['TOTAL']='[.B7]*[.C7]';
$x['products'][4]['STYLE']='ce2';


$content = new SmartTemplate("$base/content2.xml");
$content->assign($x);
$content->assign('XMLHEADER', '<?xml version="1.0" encoding="UTF-8"?>');
    
// $content->output();
// exit;


$zipfile = new ZipWriter("Comentario", $base."_generated.odt", "application/vnd.oasis.opendocument.text");
$ooofiles = array(
                   
"mimetype",
//                 "content.xml",
                   
"styles.xml",
                   
"meta.xml",
                   
"settings.xml",
//                 "Configurations2/",
                   
"META-INF/manifest.xml",
//                 "Pictures/",
//                 "Thumbnails/thumbnail.png"                   
            
     );

foreach (
$ooofiles as $file) {
  
$handle = fopen("$base/$file","rb");
  
$filedata = fread($handle , filesize("$base/$file"));
  
$zipfile -> AddRegularFile($file, $filedata);
}

$zipfile->addRegularFile('content.xml', $content->result() );

echo
$zipfile -> finish();

?>

However, we have to fix 3 points. The last line formula must adapt to the number of rows. The formulae at the end of each line must be automatically written. Finally, the white/gray alternation should be automatic too. All are quite simple points.

We can see the final result in the tab6.php script and the relative content.xml. We are using a database table, modify it here, and see the result.

<?php

require_once ("ziplib.php");
require_once
"class.smarttemplate.php";

$base="tab5";

$con = mysql_connect("127.0.0.1", "user", "pass") or die("Conection failed!");
$base_de_dados = mysql_select_db("open_document",$con) or die ("Database not found!");
$result = mysql_query("SELECT name, quant, punit FROM sales ORDER BY name") or die ("QUERY Error!");
    

$i=0;    
while (
$line = mysql_fetch_array($result, MYSQL_BOTH)) {
        
$x['products'][$i]['NAME']=utf8_encode($line["name"]);
        
$x['products'][$i]['QUANT']=utf8_encode($line["quant"]);
        
$x['products'][$i]['PUNIT']=utf8_encode($line["punit"]);
        
$j=$i+3;
        
$x['products'][$i]['TOTAL']="[.B$j]*[.C$j]";
        
$x['products'][$i]['STYLE1']= ( $i%2==0 ? 'ce2' : 'ce3' );
        
$x['products'][$i]['STYLE2']= ( $i%2==0 ? 'ce8' : 'ce9' );
        
$x['products'][$i]['STYLE3']= ( $i%2==0 ? 'ce11' : 'ce12' );
        
$i++;
}

$gtotal ="SUM([.D3:.D$j])";
 
mysql_free_result($result);
mysql_close($con);


// Template generation
$content = new SmartTemplate("$base/content3.xml");
$content->assign('XMLHEADER', '<?xml version="1.0" encoding="UTF-8"?>');
$content->assign($x);
$content->assign('GTOTAL', $gtotal);


// Uncomment to DEBUG.    
// $content->output();
// exit;


$zipfile = new ZipWriter("Comment", $base."_generated.odt", "application/vnd.oasis.opendocument.text");
$ooofiles = array(
                   
"mimetype",
//                 "content.xml",
                   
"styles.xml",
                   
"meta.xml",
                   
"settings.xml",
//                 "Configurations2/",
                   
"META-INF/manifest.xml",
//                 "Pictures/",
//                 "Thumbnails/thumbnail.png"                   
             
);

foreach (
$ooofiles as $file) {
  
$handle = fopen("$base/$file","rb");
  
$filedata = fread($handle , filesize("$base/$file"));
  
$zipfile -> AddRegularFile($file, $filedata);
}

$zipfile->addRegularFile('content.xml', $content->result() );

echo
$zipfile -> finish();

?>

The main part of content3.xml (note the styles).
 (...)
<!-- BEGIN products -->
    <table:table-row table:style-name="ro3">
      <table:table-cell table:style-name="{STYLE1}" office:value-type="string">
        <text:p>{NAME}</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="{STYLE2}" office:value-type="float" office:value="{QUANT}">
        <text:p>1</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="{STYLE3}" office:value-type="float" office:value="{PUNIT}">
        <text:p>4,50</text:p>
      </table:table-cell>
      <table:table-cell table:style-name="{STYLE3}" table:formula="oooc:={TOTAL}" office:va...
        <text:p>4,50</text:p>
      </table:table-cell>
    </table:table-row>
<!-- END products -->

(...)

    </table:table-cell>
    <table:table-cell table:style-name="ce20" table:formula="oooc:={GTOTAL}" office:value-type="float"
                      office:value="26.8">
      <text:p>26,80</text:p>
(...)


Conclusions

OASIS and the previous OpenOffice document formats, being open standards, can be permit the developer to find a solution by joining the right pieces. The simplicity of the solution can, in turn, contribute for the diffusion of the open standard. It also can help small business and developers giving them dinamicaly generated office documents. 

We hope this document could help to spread OpenDocument Format adoption.


Resources

[1] http://www.openoffice.org

[2] http://www.oasis-open.org

[3]  http://www.koffice.org

[4]  OpenDocument-v1.0-os.pdf

[5]  http://smarty.php.net

[6]  http://www.cheetahtemplate.org/

[7]  www.smartphp.net or  http://www.phpclasses.org/browse/package/1032.html.

[8]  http://phpwiki.sourceforge.net





     Giovanni Floridia