# Update PDF data with pdftk-java

Between technology whitepapers, manuscripts, and RPG books, I deal with lots of PDFs every day. The PDF format is popular because it contains processed PostScript code, and PostScript is the native language of modern printers. Publishers often release a digital version of a book as a PDF because they've already invested the time and effort to produce the file for print anyway. But a PDF isn't intended to be an editable format, and while some reverse processing is possible, it's meant to be the last stop for digital data before it's sent to the printer. Even so, sometimes you need to make adjustments to a PDF, and one of my favourite tools for that job is the [pdftk-java](https://gitlab.com/pdftk-java/pdftk) command.

## Install

As its name suggests, pdftk-java is written in Java, so it works on all major operating systems as long as you have Java installed.

Linux and macOS users can install Java from [AdoptOpenJDK.net](https://adoptopenjdk.net/releases.html).

Windows users can install [Red Hat's Windows build of OpenJDK](https://developers.redhat.com/products/openjdk/download).

To install pdftk-java:

1. Download the [pdftk-all.jar release](https://gitlab.com/pdftk-java/pdftk/-/jobs/1527259628/artifacts/raw/build/libs/pdftk-all.jar) from its GitLab repository, and save it to `~/.local/bin/` or [some other location in your path](https://opensource.com/article/17/6/set-path-linux).

2. Open `~/.bashrc` in your favourite text editor and add this line to it:

   ```bash
   alias pdftk='java -jar $HOME/.local/bin/pdftk-all.jar'
   ```

3. Load your new Bash settings:

   ```bash
   $ source ~/.bashrc
   ```

## Command syntax

The structure of a valid `pdftk-java` command follows a pattern, but there's a lot of flexibility in what goes into that pattern. The syntax is a little unusual because it doesn't use traditional-style [terminal options](https://opensource.com/article/21/8/linux-terminal), but with practice it's not too difficult to remember.

* `pdftk`: the alias to call the command
* input file: the PDF you want to modify
* action: what you want to do to the input file
* output: where you want your modified PDF file to be saved

It's the action part that's most complex, so I'll start with simple tasks.

## Combine two PDF files into one

It's not uncommon for the front cover of a book to be created in a different application, such as Inkscape or GIMP, than the rest of the book, which is usually done in a layout application like Scribus or an office suite like LibreOffice. You could combine the two in your layout application, and a good desktop publisher like Scribus makes it easy to just reference an image so that when the cover changes, it's automatically updated in layout. However, it's also possible to prepend the cover to a PDF with `pdftk-java`:

```
$ pdftk cover.pdf body.pdf \
cat \
output book.pdf
```

In this example, the action is `cat`, as in *concatenate* and like the Linux [cat command](https://opensource.com/article/19/2/getting-started-cat-command). It concatenates one or more PDF files into a single data stream, and the data stream is directed into whatever file the `output` argument specifies.

## Remove pages from a PDF

You can't exactly remove a page from a PDF, but you can create a new PDF containing only the pages you want to keep.

```
$ pdftk book.pdf \
cat 1 3-end \
output shorter-book.pdf
```

In this example, page 1 of my book file, and all pages from 3 to the end, are saved to a new file. The page I've removed, therefore, is page 2.

## Split a PDF into separate files

Splitting a PDF file into many different files also uses the `cat` action, and it's similar in principle to removing pages.
You can split a PDF by sending the pages you want to a new file:

```
$ pdftk book.pdf \
cat 1-15 \
output part-1.pdf
$ pdftk book.pdf \
cat 16-42 \
output part-2.pdf
```

If you need to split a PDF into single-page files, there's a special action for that, called `burst`:

```
$ pdftk book.pdf burst
$ ls
book.pdf pg_0001.pdf pg_0002.pdf pg_0003.pdf pg_0004.pdf pg_0005.pdf [...]
```

## Fill out a form

Few would argue that the PDF format hasn't become bloated over the years, and one feature you sometimes find in a PDF file is a fillable form. You see this in US tax documents, RPG character sheets, online school workbooks, and other PDF files that are intended to be interactive. While most modern PDF viewers, such as GNOME's Evince and KDE's Okular, can fill out PDF forms, you can also fill out a PDF form with the help of `pdftk-java`.

First, you must extract the form data using the `generate_fdf` action. This extracts the IDs of the form elements and places them into a text file.

```
$ pdftk character-sheet.pdf \
generate_fdf \
output chsheet-form.txt
```

Your destination file (in this example, `chsheet-form.txt`) contains the data of the form contained in the PDF, but just the text parts. You can edit it in any standard text editor, like [Atom](https://opensource.com/article/20/12/atom) or [Gedit](https://opensource.com/article/20/12/gedit).

In a sometimes admirable and sometimes awkward glimpse into the workflow of the organization producing the PDF, you'll find some forms are clearly labelled, while others have default names like "Checkbox_001" and "Textfield-021". You might have to cross-reference your text file with your PDF, but that may be worthwhile if you're writing a script to fill out forms automatically.

Each label is marked as a `/T` item, and on the following line a space (marked as `/V`) is provided for text entry.
Here's a snippet from a form file that has descriptive labels and some data filled in:

```
/T (CharacterName 2)
/V (Abaddon)
>>
<<
/T (SlotsTotal 24)
/V ()
>>
<<
/T (Hair)
/V (Brown)
>>
<<
/T (AC)
/V (15)
>>
<<
/T (Background)
/V ()
>>
<<
/T (DEXmod )
/V ()
```

Once you've got the form data entered, you can combine your text input with the PDF structure with the `fill_form` action:

```
$ pdftk character-sheet.pdf \
fill_form chsheet-form.txt \
output completed.pdf
```

Here's a sample of the result:

![A form filled by pdftk-java](pdftk-form-fill.jpg)

## PDF modification made easy

When you deal with lots of PDF files, or you deal with PDF files through shell scripts, a tool like `pdftk-java` is invaluable because it frees you from having to do everything manually. When I build a PDF from the output of [DocBook](https://opensource.com/article/17/9/docbook), it's a Makefile that calls `pdftk-java` for any number of tasks, so there's no chance of me forgetting a step or mistyping the command, and there's no need for me to spend my time on it.

There are lots of other reasons you might use `pdftk-java` in your own workflow, and lots of other things `pdftk-java` can do, including actions like `shuffle`, `rotate`, `dump_data`, `update_info`, and `attach_files`. If you find yourself dealing with PDF files often, give `pdftk-java` a try.
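
As a starting point for that kind of scripting, here's a minimal sketch of how the page-removal recipe from earlier might be wrapped in shell functions. The names `keep_ranges`, `remove_page`, and `PDFTK_JAR` are hypothetical and are not part of `pdftk-java`. The sketch assumes that the `dump_data` action prints a `NumberOfPages:` line, and that the jar is saved where the install steps put it. Bash doesn't expand aliases in non-interactive scripts by default, so the sketch calls the jar directly instead of relying on the `pdftk` alias.

```bash
# Assumption: the jar lives where the install steps saved it.
PDFTK_JAR="${PDFTK_JAR:-$HOME/.local/bin/pdftk-all.jar}"

# keep_ranges PAGE TOTAL
# Print the page ranges for the cat action that keep every page except PAGE.
keep_ranges() {
    local page="$1"
    local total="$2"
    local ranges=""

    # Refuse pages outside the document, and documents with only one page.
    if [ "$page" -lt 1 ] || [ "$page" -gt "$total" ] || [ "$total" -lt 2 ]; then
        echo "keep_ranges: cannot drop page $page of $total" >&2
        return 1
    fi

    # Pages before the dropped page, if there are any.
    if [ "$page" -eq 2 ]; then
        ranges="1"
    elif [ "$page" -gt 2 ]; then
        ranges="1-$((page - 1))"
    fi

    # Pages after the dropped page, if there are any.
    if [ "$page" -lt "$total" ]; then
        ranges="${ranges:+$ranges }$((page + 1))-end"
    fi

    echo "$ranges"
}

# remove_page INPUT PAGE OUTPUT
# Write a copy of INPUT without page PAGE to OUTPUT.
remove_page() {
    local input="$1"
    local page="$2"
    local output="$3"
    local total ranges

    # Assumption: dump_data output includes a line like "NumberOfPages: 42".
    total=$(java -jar "$PDFTK_JAR" "$input" dump_data |
        awk '/^NumberOfPages:/ {print $2}')
    [ -n "$total" ] || return 1

    ranges=$(keep_ranges "$page" "$total") || return 1

    # $ranges is left unquoted on purpose so each range is its own argument.
    java -jar "$PDFTK_JAR" "$input" cat $ranges output "$output"
}
```

Called as `remove_page book.pdf 2 shorter-book.pdf`, this should run the same `cat 1 3-end` command shown in the page-removal section.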