Convert web pages to a single PDF

In this tutorial I will list a few commands that can be used to convert a set of web pages (recursively) into a single PDF document.

First, download the web pages.

$ cd /tmp
$ mkdir wget
$ wget --mirror -w 2 -p --html-extension --convert-links -P /tmp/wget <URL>

Here <URL> is the address of the site you want to mirror.

Change the working directory (cd) to the directory inside the downloaded folder where you want to convert HTML files to PDF:

$ cd /tmp/wget
$ find . -name '*.html' -exec wkhtmltopdf {} {}.pdf \;

This creates a PDF file for each HTML file in every sub-directory, recursively.
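Before moving on, it can help to confirm that every HTML file actually got a PDF counterpart. The sketch below does not touch a real mirror; it builds a throwaway directory tree with dummy files (the names are made up for illustration) just to show the check:

```shell
# Set up a throwaway tree that mimics a mirrored site (dummy files, not a real mirror).
tmp=$(mktemp -d)
mkdir -p "$tmp/site/docs"
touch "$tmp/site/index.html" "$tmp/site/docs/page.html"
# Pretend wkhtmltopdf already converted one of the two files.
touch "$tmp/site/index.html.pdf"

cd "$tmp"
# Report any HTML file that is still missing its .pdf counterpart.
find . -name '*.html' | while read -r f; do
  [ -f "$f.pdf" ] || echo "missing: $f.pdf"
done
```

Running this on the real mirror (without the setup lines) lists any pages wkhtmltopdf skipped or failed on.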

Copy the PDF files into a single directory.

Note: Often several files are named index.html.pdf, so we must make sure one file does not overwrite another when copying them into a single directory.


mkdir -p newfolder
for f in `find . -name '*.pdf'`
do
  filename=`echo $f | awk -F'/' '{SL = NF-1; TL = NF-2; print $TL "_" $SL "_" $NF}'`
  cp $f newfolder/$filename
done

Save the loop above as a shell script and execute it with bash in your target directory. This copies all the PDF files recursively, prefixing each name with its folder names.
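To see what the awk expression does, you can feed it a sample path by hand (the path here is just an example, not one of your actual files). It splits on “/” and joins the last two directory components with the base name:

```shell
# ./site/docs/index.html.pdf -> site_docs_index.html.pdf
echo './site/docs/index.html.pdf' | awk -F'/' '{SL = NF-1; TL = NF-2; print $TL "_" $SL "_" $NF}'
```

The printed name is what the loop uses as the destination file name under newfolder.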

Note: If a copied name starts with “.” (this happens when the source path had only one parent directory, so the leading “.” from find becomes part of the name), the file will be hidden inside the newfolder directory; use “ls -al” to list such files. If some files start with “.” and others do not, they may sort out of order. You can add “.00” or another prefix to the file names to list them in the order you want.
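One way to impose an order is to renumber the files with a zero-padded prefix. A minimal sketch (it uses a throwaway directory and made-up file names; on your real files you would run only the loop inside newfolder):

```shell
# Throwaway directory with two dummy hidden PDFs standing in for the copied files.
tmp=$(mktemp -d)
cd "$tmp"
touch ._a_page.html.pdf ._b_page.html.pdf

# Prepend a zero-padded counter to each PDF so they sort predictably.
i=0
for f in $(ls -a | grep '\.pdf$' | sort); do
  i=$((i + 1))
  mv "$f" "$(printf '.%02d%s' "$i" "$f")"
done
ls -a | grep '\.pdf$' | sort
```

The zero padding matters: with plain 1, 2, … 10, the name starting with “.10” would sort before “.2”.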

Once the files are in order (even if all of them start with “.”), use the following command to join them:

pdfunite .*pdf merged.pdf
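If relying on the shell’s dotfile glob feels fragile, you can build the argument list in explicitly sorted order. The sketch below only demonstrates assembling the list (the file names are dummies, and pdfunite itself is not invoked here):

```shell
# Dummy hidden PDFs with ordering prefixes, standing in for the real files.
tmp=$(mktemp -d)
cd "$tmp"
touch .02_b.pdf .01_a.pdf .10_c.pdf

# Collect the PDFs in explicit sorted order instead of trusting glob order.
files=$(ls -a | grep '\.pdf$' | sort)
echo $files
# With real PDFs you would then run: pdfunite $files merged.pdf
```

This makes the merge order visible before committing to it, which is useful when the prefixes were added by hand.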


I hope you found this article useful.


