How I create a Word document Form-letter-like functionality in a web application

by Jason Haley 9. April 2004 03:58

As you may have noticed, I haven't been making many entries lately. I have been spending most of my free time working with Sue on wedding stuff or studying for the 70-300 exam. I have been meaning to write this entry for the last month, but forgot about it until I ran across Eric Carter's blog today. I am now subscribed to around 200 blogs (some of them aggregated, like Scoble's new aggregated blog of 1400+ blogs), so I am behind on my blog reading too. I figured I had better write this entry now before I put it off again.

How I create a Word document Form-letter-like functionality in a web application
First of all, I do want to remind you that reverse engineering is not legal and all of the Word HTML that I am going to show you belongs to Microsoft, etc. ("but, you already know what I am about to say" - think the Oracle in the Matrix movies).

This entry assumes you know a little about XML and XSLT. The idea behind this usage of Word is to mimic form letter functionality (the whole thing won't work unless there is a copy of Word XP or Word 2003 on the client machine). By that I mean a template (I use an XSLT template) that gets populated from a data source (I'll use XML) to produce either one page or multiple pages.

The XML I will use looks like this:

<Report>
 <Data>
  <UserName>Jason Haley</UserName>
  <Content>This is a test and only a test</Content>
 </Data>
 <Data>
  <UserName>Sue Haley</UserName>
  <Content>This is the second page of the test</Content>
 </Data>
</Report>
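
To make the shape of this data concrete, here is a minimal sketch in Python (used only for illustration; the rest of this entry uses Perl and C#). It parses a well-formed copy of the sample and returns one record per Data element, which is what ends up as one page of the form letter. The helper name read_records is hypothetical.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the sample data: one Data element per letter page.
SAMPLE_XML = """<Report>
 <Data>
  <UserName>Jason Haley</UserName>
  <Content>This is a test and only a test</Content>
 </Data>
 <Data>
  <UserName>Sue Haley</UserName>
  <Content>This is the second page of the test</Content>
 </Data>
</Report>"""


def read_records(xml_text):
    """Return one (UserName, Content) tuple per Data element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        (data.findtext("UserName"), data.findtext("Content"))
        for data in root.findall("Data")
    ]
```

Calling read_records(SAMPLE_XML) gives two tuples, one for each page the template will produce.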

1. Create your XSLT template. I do this by creating the exact Word document I want the form letter to look like, then saving it as a web page. If you need multiple pages, make sure to add a page break at the end of the Word document.

2. Now that you have the HTML you need to generate, open the saved web page and start cleaning up the bad formatting. Remember, it is best if the template is XHTML when you finish (i.e. it won't fail when you read it into an XML parser). One method I use to clean up the majority of the bad formatting and bad characters is this little Perl script (this was my first Perl script ever, so it may not be the best):

&main();
sub main(){
 my $inputfilename = "C:\\dirtyword.htm";
 my $outputfilename = "C:\\cleantemplate.xml";
 &DoClean($inputfilename, $outputfilename);
}
sub DoClean {
 my ($inputfile, $outputfile) = @_; #parameters
 open(INFILE, $inputfile) || die "Unable to open $inputfile for reading";
 open(OUTFILE, ">".$outputfile) || die "Unable to open $outputfile for writing";
 # loop through the file, writing to the output file
 while (<INFILE>)
 {
  # add single quotes around some attributes that word did not
  s/=(\d+|(top)|(right)|(left)|(center))/\='$1\'/g;
  # replace non breaking spaces with xml space entity
  s/(nbsp)/\#xA0/g;
  # clear out some funky spans that word adds
  s/\<span\s+style\='mso-spacerun:yes'>([^\<]+)\<\/span>/\<span>\<\/span>/g;
  # fix MsoTableGrid and MsoNormal classes
  s/\=MsoTableGrid/\='MsoTableGrid'/g;
  s/\=MsoNormal/\='MsoNormal'/g;
  print OUTFILE;
 } #while
 close (INFILE);
 close (OUTFILE);
}
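
For readers who don't use Perl, the same substitutions can be sketched in Python. This is a line-for-line port of the regular expressions above, not a more thorough cleaner. The function names and the default windows-1252 encoding are assumptions (the encoding is taken from the charset in the Word-generated meta tag shown later).

```python
import re

# Each pair is (compiled pattern, replacement), applied in order to every line.
SUBSTITUTIONS = [
    # quote bare numeric and alignment attribute values, e.g. width=100
    (re.compile(r"=(\d+|top|right|left|center)"), r"='\1'"),
    # turn &nbsp; into the numeric character reference &#xA0;
    (re.compile(r"nbsp"), "#xA0"),
    # empty out the mso-spacerun spans that Word adds
    (re.compile(r"<span\s+style='mso-spacerun:yes'>[^<]+</span>"), "<span></span>"),
    # quote the MsoTableGrid and MsoNormal class names
    (re.compile(r"=MsoTableGrid"), "='MsoTableGrid'"),
    (re.compile(r"=MsoNormal"), "='MsoNormal'"),
]


def clean_line(line):
    """Apply every substitution to a single line of Word HTML."""
    for pattern, replacement in SUBSTITUTIONS:
        line = pattern.sub(replacement, line)
    return line


def clean_file(input_path, output_path, encoding="windows-1252"):
    """Copy input_path to output_path, cleaning each line (encoding is an assumption)."""
    with open(input_path, encoding=encoding) as infile, \
         open(output_path, "w", encoding=encoding) as outfile:
        for line in infile:
            outfile.write(clean_line(line))
```

For example, clean_line("<td width=100 valign=top>&nbsp;</td>") returns "<td width='100' valign='top'>&#xA0;</td>". Like the Perl version, this only covers the patterns listed, so expect some hand cleanup afterwards.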

3. Once you have an HTML page that will open in Internet Explorer (rename it to an xml file) without parse errors, you are ready to make it an XSLT template. Add the stylesheet, template and apply-templates tags, so it looks something like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="Report">
<html xmlns:v="urn:schemas-microsoft-com:vml"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:w="urn:schemas-microsoft-com:office:word"
 xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
 xmlns:st1="urn:schemas-microsoft-com:office:smarttags"
 xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252"/>
<meta name="ProgId" content="Word.Document"/>
<meta name="Generator" content="Microsoft Word 10"/>
<meta name="Originator" content="Microsoft Word 10"/>
...
</head>
<body lang='EN-US' style='tab-interval:.5in'>
 <xsl:apply-templates select="Data"/>
</body>
</html>
</xsl:template>
<xsl:template match="Data">
...
</xsl:template>
</xsl:stylesheet>

4. Add your value-of tags where you want the data to be plugged into the template (this goes in the Data template above, where the ... is).

<div class="Section1">
 <p class="MsoNormal">
  <span style="font-size:10.0pt;font-family:Verdana">
   This is a sample form letter for: <xsl:value-of select="UserName"/>
  </span>
 </p>
 <p class="MsoNormal"></p>
 <p class="MsoNormal">
  <span style="font-size:10.0pt;font-family:Verdana">
   <xsl:value-of select="Content"/>
  </span>
 </p>
 <p class="MsoNormal"></p>
 <p class="MsoNormal">
  <span style="font-size:10.0pt;mso-ansi-language:EN-US;mso-fareast-language:EN-US;mso-bidi-language:AR-SA">
   <br clear="all" style="mso-special-character:line-break;page-break-before:always"/>
  </span>
 </p>
</div>

5. Rename your file to an XSLT file (you might open it in Internet Explorer first, while it is still an xml file, to check for parsing errors). Now you have a Word HTML XSLT template.
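
Opening the file in Internet Explorer is one way to check for parsing errors. As an alternative, here is a small Python sketch that runs the text through a standard XML parser and reports the first problem it finds. The function name find_parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_parse_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem found.
        return str(err)
    return None
```

Unquoted attributes such as class=MsoNormal and undefined entities such as &nbsp; both come back as errors, which is why the cleanup in step 2 quotes attribute values and rewrites nbsp. To check a whole file, read it in binary mode and pass the bytes to find_parse_error.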

6. Write some code to do the transform. Something like this (mostly taken from MSDN documentation):

 //Namespaces used: System.Xml, System.Xml.XPath, System.Xml.Xsl
 //Create a new XslTransform object.
 XslTransform xslt = new XslTransform();
 //Load the stylesheet.
 xslt.Load("http://server/yourtemplate.xslt");
 //Create a new XPathDocument and load the XML data to be transformed.
 XPathDocument mydata = new XPathDocument("inputdata.xml");
 //Create an XmlTextWriter which outputs to the HTTP response stream.
 XmlTextWriter writer = new XmlTextWriter(Response.OutputStream, System.Text.Encoding.UTF8);
 //Transform the data and send the output to the response.
 xslt.Transform(mydata, null, writer, null);
 //Flush the writer so any buffered output reaches the response.
 writer.Flush();

7. Set the content type to tell Internet Explorer to use Word for this file:

 Response.ContentType = "application/msword";

8. Optional - add the header to tell Internet Explorer to treat the file as an attachment (this should prompt the user to save or open it):

 Response.AddHeader("Content-Disposition", "attachment; filename=example.doc");
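
The two headers from steps 7 and 8 are not specific to ASP.NET. As a rough illustration, here is the same idea as a minimal Python WSGI application. The names word_download_app, WORD_HTML and FILENAME are hypothetical, and the body is a placeholder standing in for the real transform output.

```python
# Placeholder body; in a real application this would be the transform output.
WORD_HTML = b"<html><body><p>Sample form letter</p></body></html>"
FILENAME = "example.doc"


def word_download_app(environ, start_response):
    """Serve WORD_HTML so the browser hands it to Word as a download."""
    headers = [
        # Step 7: tell the browser this is a Word document.
        ("Content-Type", "application/msword"),
        # Step 8 (optional): ask the browser to treat it as an attachment.
        ("Content-Disposition", "attachment; filename=" + FILENAME),
        ("Content-Length", str(len(WORD_HTML))),
    ]
    start_response("200 OK", headers)
    return [WORD_HTML]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, word_download_app).serve_forever() will serve it on port 8000.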

