Recently, for a project, I needed a PDF generator that took existing PDFs as templates and filled in data. My one simple requirement was that changing the PDF's look and feel shouldn't require any coding. How hard could that be? Famous last words.
The Problem
The majority of PDF generators out there are exactly that: generators. They simply create new PDFs from programming code. They don't edit existing PDF files, let alone fill in data. So I looked into a couple of other solutions. First I checked out image-based solutions such as ImageMagick. I could, of course, edit existing image files and add data. The problem is that the added data would have to be positioned in code. If the template image changed, so would the positions in my code. So that was out.
I tried using PostScript files saved from Adobe Illustrator. I created a template and placed text like <% replace_me %> in it. That should work, right? Wrong. For whatever reason, Adobe Illustrator likes to create HUGE files, 100,000 lines long in some cases. It also splits up text arbitrarily. So “<% replace_me %>” became “<% r”, then 20 lines of positioning and font information, then “eplac”, then 20 more lines of positioning and font information, and so on. That made a search and replace impossible.
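To make the failure concrete, here is a toy Ruby illustration. The POSTSCRIPT string is a simplified, made-up stand-in for Illustrator's output (real files are far larger and more verbose); it keeps only the essential pattern of text fragments painted by separate show operations, with layout commands in between.

```ruby
# Simplified, hypothetical stand-in for Illustrator's PostScript output:
# the placeholder is broken into separate "show" operations, with font and
# positioning commands in between.
POSTSCRIPT = <<-PS
/Helvetica findfont 12 scalefont setfont
72 700 moveto
(<% r) show
/Helvetica findfont 12 scalefont setfont
96 700 moveto
(eplac) show
130 700 moveto
(e_me %>) show
PS

PLACEHOLDER = '<% replace_me %>'

# A plain search finds nothing: the placeholder never appears as one string.
puts POSTSCRIPT.include?(PLACEHOLDER)   # prints false

# Joining just the shown string literals brings it back, but only by
# discarding the layout commands that sit between the fragments.
shown = POSTSCRIPT.scan(/\((.*?)\) show/).flatten.join
puts shown.include?(PLACEHOLDER)        # prints true
```

Gluing the fragments back together only works once the positioning commands between them are thrown away, and those commands are exactly what a template needs to keep.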
iText to the Rescue
iText is an excellent Java PDF library. In fact, I believe that, other than its C# counterpart, it is the only PDF library that could do what I wanted. iText can take existing PDFs and manipulate them. It can also take blank PDF forms and fill in the data, much like taking an HTML form and setting each form field's value attribute. This is exactly what I wanted. Although it is an ad hoc, roundabout solution, it does work.
Creating PDF forms is pretty easy, too. The downside is that only one product can create them: Adobe LiveCycle Designer, which comes with Adobe Acrobat Professional.
Solution
So first I needed to create a wrapper around iText using the excellent Ruby Java Bridge (Rjb).
```ruby
require 'rjb'
require 'tmpdir'  # provides Dir.tmpdir

Rjb::load('lib/itext-1.4.8.jar')

# Thin wrapper around iText's PdfStamper: opens a PDF form template,
# fills in its form fields and writes a flattened copy to a temp file.
class PDFStamper
  def initialize(template = "proposal_template.pdf")
    filestream = Rjb::import('java.io.FileOutputStream')
    pdfreader  = Rjb::import('com.lowagie.text.pdf.PdfReader')
    pdfstamper = Rjb::import('com.lowagie.text.pdf.PdfStamper')

    reader = pdfreader.new(template)
    @stamp = pdfstamper.new(reader, filestream.new(tmpfile))
    @form  = @stamp.getAcroFields  # the template's form fields
  end

  # Set the form field named +key+; iText expects a String value.
  def set(key, value)
    @form.setField(key, value.to_s)
  end

  # Flatten the form (fields become ordinary page content) and write the file.
  def fill
    @stamp.setFormFlattening(true)
    @stamp.close
  end

  # Path of the generated PDF, chosen once per instance.
  def tmpfile
    @tmpfile ||= File.join(Dir.tmpdir, make_tmpname)
  end

  private

  # Include the process id so concurrent Mongrel processes don't collide.
  def make_tmpname
    "proposal-#{Process.pid}-#{rand(10000)}.pdf"
  end
end
```
Then, using Adobe LiveCycle Designer, I simply created a PDF form and added textfields for iText to fill out. The textfields can be styled, so don't think you have to keep the plain “textfield” look. Make sure you give each textfield the name you will pass to set(textfield_name, value). In my PDF, I simply named each textfield after the corresponding database field. Then, in my controller, I had the following.
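```ruby
def output
  order = Order.find(params[:id])
  pdf = PDFStamper.new
  # Each textfield in the template is named after a database column.
  Order.content_columns.each do |column|
    pdf.set(column.name, order.send(column.name))
  end
  pdf.fill  # flatten the form and write the temp file

  # Read in binary mode (a PDF is not text), then remove the temp file
  # so one isn't left behind per request.
  data = File.open(pdf.tmpfile, 'rb') { |f| f.read }
  File.delete(pdf.tmpfile)

  send_data(data,
            :filename    => "order.pdf",
            :type        => "application/pdf",
            :disposition => "inline")
end
```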
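Since PDFStamper#set only needs a field name and a value, the column-to-field loop can be pulled out and exercised without Java. The sketch below is a hypothetical refactoring, not the code from my project: fill_fields, RecordingStamper and FakeOrder are illustrative names, with FakeOrder standing in for the ActiveRecord model.

```ruby
# Hypothetical helper: copy each named attribute of +record+ into the form
# field with the same name. Works with anything that responds to
# set(name, value), such as PDFStamper.
def fill_fields(stamper, record, field_names)
  field_names.each do |name|
    stamper.set(name, record.send(name))
  end
  stamper
end

# Hypothetical stand-in for PDFStamper that records values instead of
# calling iText, so the mapping can be checked without a JVM.
class RecordingStamper
  attr_reader :fields

  def initialize
    @fields = {}
  end

  # Mirrors PDFStamper#set, which converts every value with to_s.
  def set(key, value)
    @fields[key] = value.to_s
  end
end

# Stand-in for the Order model; attribute names match the field names.
FakeOrder = Struct.new(:customer, :total, :notes)

order = FakeOrder.new("ACME Corp", 125.5, nil)
stamper = fill_fields(RecordingStamper.new, order, %w[customer total notes])
p stamper.fields
```

In the controller this would be called as fill_fields(pdf, order, Order.content_columns.map { |c| c.name }). Note that nil attributes end up as empty strings, because set converts every value with to_s.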
Caveats
First off, you need the Java environment variables set correctly before any of this will work.
```shell
export LD_LIBRARY_PATH=/usr/java/jdk1.6.0/jre/lib/i386/:/usr/java/jdk1.6.0/jre/lib/i386/client/:./
export JAVA_HOME=/usr/java/jdk1.6.0/
```
You can set these variables on the command line and start Mongrel manually with “mongrel_rails start”, which works fine. In production, though, this isn't really a good solution.
I ended up using the init.d script that ships with mongrel_cluster. Documentation is available here. I simply placed the export commands at the top of the script.
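Because a missing variable tends to surface as an obscure failure rather than a clear message, it can help to check the environment yourself before calling Rjb::load. The sketch below is a hypothetical pre-flight check, assuming JAVA_HOME and LD_LIBRARY_PATH are the two variables your setup needs (as in the exports above); java_env_problems is an illustrative name, not part of Rjb.

```ruby
# Variables this setup is assumed to need (see the exports above).
REQUIRED_JAVA_VARS = %w[JAVA_HOME LD_LIBRARY_PATH]

# Hypothetical pre-flight check: returns a list of human-readable problems,
# empty when the environment looks usable. Accepts ENV or any Hash.
def java_env_problems(env = ENV)
  problems = []
  REQUIRED_JAVA_VARS.each do |name|
    value = env[name]
    problems << "#{name} is not set" if value.nil? || value.empty?
  end

  # A JAVA_HOME that is set but points nowhere is another common mistake.
  home = env['JAVA_HOME']
  if home && !home.empty? && !File.directory?(home)
    problems << "JAVA_HOME points to a missing directory: #{home}"
  end
  problems
end
```

Calling java_env_problems before Rjb::load and aborting with its messages when the list is not empty turns a missing variable into a readable startup error.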
Another issue I hit was at Java startup. Java checks the total available system memory and then proceeds to grab a good portion of it. This isn't a problem on a dedicated server. A virtual server, on the other hand, is allocated only a portion of the host's memory. So if the host machine has 4 GB of memory, Java thinks it has 4 GB to play with, not the 256 MB allocated to your virtual server.
This caused a weird issue where one Mongrel process in my cluster would work and another wouldn't. Each Mongrel instance starts its own Java process, so the first one would grab all the available memory, and the second couldn't even start because no memory was left.
Making matters worse, Rails or Mongrel (I'm not sure which) hid this memory error. I didn't figure it out until I wrote a test script that forked, with each fork loading the iText jar. The test showed the error coming from Java.
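That test script isn't reproduced here, so the following is a hypothetical reconstruction in the same spirit: fork_test runs a block in several child processes at once and reports which ones exited cleanly. The JVM-loading part is left as a commented example because it needs rjb and the iText jar; fork also requires a Unix-like system.

```ruby
# Hypothetical reconstruction of a fork test: run +loader+ in +count+
# concurrent child processes and return, per child, whether it exited cleanly.
def fork_test(count, &loader)
  $stdout.flush  # so children don't re-emit the parent's buffered output
  pids = (1..count).map do |i|
    fork do
      begin
        loader.call(i)
        $stdout.flush
        exit!(0)  # exit! skips at_exit handlers inherited from the parent
      rescue Exception => e
        $stderr.puts "child #{i} failed: #{e.class}: #{e.message}"
        exit!(1)
      end
    end
  end

  # A child that the JVM aborts, or that is killed by a signal, also
  # counts as a failure here.
  pids.map do |pid|
    Process.wait(pid)
    $?.exited? && $?.exitstatus.zero?
  end
end

# Example: each child starts its own JVM through Rjb, as each Mongrel would.
# results = fork_test(2) do |i|
#   require 'rjb'
#   Rjb::load('lib/itext-1.4.8.jar')
#   puts "child #{i}: JVM started"
# end
# p results
```

Running the children concurrently matters: it mimics a Mongrel cluster starting up, which is when the processes compete for memory.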
To fix this, I set the _JAVA_OPTIONS environment variable. Java reads these options as it starts, so they limit the amount of RAM each Java instance can use. Just place this next to your other Java environment variables inside your init.d Mongrel script.
```shell
export _JAVA_OPTIONS='-Xms16m -Xmx32m'
```
Here -Xms16m sets each JVM's initial heap size to 16 MB and -Xmx32m caps its heap at 32 MB. You may have to fudge these numbers a little for your particular environment. Or, if you are using a dedicated server, don't worry about it.
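When picking the numbers, a little arithmetic helps. The helper below is a hypothetical sketch, not something Java or Mongrel provides: it splits whatever memory remains after a reserve for the OS and the Mongrel processes evenly across one JVM per Mongrel. The 192 MB reserve in the example is an assumed figure, chosen so the result matches the settings above.

```ruby
# Hypothetical sizing helper: split the memory left after +reserve_mb+
# (OS, Mongrel's Ruby processes, etc.) evenly across +jvm_count+ JVMs and
# build a _JAVA_OPTIONS value from it.
# Note: -Xmx caps only the Java heap; each JVM also uses some non-heap
# memory, so leave headroom in reserve_mb.
def java_heap_options(total_mb, reserve_mb, jvm_count, initial_mb = 16)
  raise ArgumentError, "jvm_count must be at least 1" if jvm_count < 1
  available = total_mb - reserve_mb
  raise ArgumentError, "reserve_mb must be less than total_mb" if available <= 0

  max_mb = available / jvm_count  # integer division rounds down
  if max_mb < initial_mb
    raise ArgumentError, "only #{max_mb} MB per JVM, below the #{initial_mb} MB initial heap"
  end
  "-Xms#{initial_mb}m -Xmx#{max_mb}m"
end

# 256 MB virtual server, 192 MB reserved for everything else, two Mongrels:
puts java_heap_options(256, 192, 2)  # prints -Xms16m -Xmx32m
```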
Limitations
For my particular needs, I only needed text placeholders in the template. However, I believe that with LiveCycle Designer you can also add image placeholders and table-based data, then use iText to fill them in. Don't take my word for it, though.