In this tutorial I will list a few commands that can be used to convert a set of web pages (recursively) into a single PDF document.

First, download the web pages.

$ cd /tmp
$ mkdir wget
$ wget --mirror -w 2 -p --html-extension --convert-links -P /tmp/wget
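The short options above can be hard to read, so here is a rough sketch of the same download wrapped in a small function, with every option written in its long form. The function name mirror_site and its arguments are hypothetical and only serve this example; the site address is passed in by the caller as the last argument to wget.

```shell
#!/bin/sh
# mirror_site URL [DIR]: download URL recursively into DIR (default /tmp/wget).
# mirror_site is a hypothetical helper name. The wget options are the ones
# used above, in long form:
#   --mirror            recursive download with timestamping
#   --wait=2            same as -w 2, pause 2 seconds between requests
#   --page-requisites   same as -p, also fetch images, CSS and similar files
#   --html-extension    save pages with an .html suffix
#   --convert-links     rewrite links so they work in the local copy
#   --directory-prefix  same as -P, where the files are saved
mirror_site() {
    url=$1
    dir=${2:-/tmp/wget}
    wget --mirror \
         --wait=2 \
         --page-requisites \
         --html-extension \
         --convert-links \
         --directory-prefix="$dir" \
         "$url"
}
```

It would be called as, for example, mirror_site https://example.com/docs/ where the address is only a placeholder for the site you want to download.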

Change the working directory (cd) to the target directory inside the downloaded folder, where you want to convert the HTML files to PDF.

$ cd
$ find . -name '*.html' -exec wkhtmltopdf {} {}.pdf \;

This will create a PDF file for each HTML file in every sub-directory, recursively.
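The find command above names each output page.html.pdf. If you would rather get page.pdf, one possible variant strips the .html suffix before calling the converter. This is only a sketch, assuming a POSIX shell; the function name html_to_pdf_tree and the CONVERTER variable are hypothetical names introduced for the example, and the converter is called the same way as above (input file, then output file).

```shell
#!/bin/sh
# Converter command, called as: CONVERTER input.html output.pdf
# (CONVERTER is a hypothetical variable; it defaults to wkhtmltopdf.)
CONVERTER=${CONVERTER:-wkhtmltopdf}

# html_to_pdf_tree: convert every *.html file below the current directory,
# writing page.pdf next to page.html. File names with spaces are handled
# because find passes each path to the inner shell as a separate argument.
html_to_pdf_tree() {
    find . -type f -name '*.html' -exec sh -c '
        conv=$1; shift
        for f in "$@"; do
            "$conv" "$f" "${f%.html}.pdf"
        done
    ' sh "$CONVERTER" {} +
}
```

Run html_to_pdf_tree from the same directory where you would run the find command.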

Copy the pdf files to a particular directory.

Note: Sometimes all the files may be named index.html.pdf, so we must make sure one file does not overwrite another when they are copied into a single directory.


for f in `find . -name '*pdf'`
do
 # Prefix the file name with its two parent directory names to keep it unique
 filename=`echo $f|awk -F'/' '{SL = NF-1; TL = NF-2; print $TL "_" $SL "_" $NF}'`
 cp "$f" "newfolder/$filename"
done

Save the loop above in a shell file and execute it with bash in your target directory. The newfolder directory must already exist. This will copy all the PDF files recursively and add the folder names to each file name.
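The loop above splits the find output on whitespace and uses only the last two directory names, so paths containing spaces, or PDF files sitting directly in the current directory, can still go wrong. Here is a rough sketch of an alternative, assuming a POSIX shell. The function name flatten_pdfs is hypothetical, and the destination is assumed to be a plain directory name directly under the current directory. It builds each new name from the whole relative path, with "/" replaced by "_".

```shell
#!/bin/sh
# flatten_pdfs DEST: copy every PDF below the current directory into DEST.
# Each copy is named after its relative path with "/" turned into "_",
# for example ./a/b/index.html.pdf becomes DEST/a_b_index.html.pdf.
# (flatten_pdfs is a hypothetical helper name used for this sketch.)
flatten_pdfs() {
    dest=$1
    mkdir -p "$dest"
    # -prune skips DEST itself, so earlier copies are not copied again.
    find . -path "./$dest" -prune -o -type f -name '*.pdf' -exec sh -c '
        dest=$1; shift
        for f in "$@"; do
            rel=${f#./}
            name=$(printf "%s" "$rel" | tr "/" "_")
            cp "$f" "$dest/$name"
        done
    ' sh "$dest" {} +
}
```

It would be called as flatten_pdfs newfolder. Names produced this way do not begin with ".", so a later merge would match them with *pdf rather than .*pdf.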

Note: If a file name starts with “.”, the file will be hidden inside the newfolder directory. Use the “ls -al” command to list hidden files. If some file names do not start with “.”, those files might be out of order relative to the rest. You can add “.00” or another prefix to a file name so that the files are listed in the order you want.
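Before joining the files it can help to see the exact order in which the shell will expand the names, because that is the order a command receives them in. The sketch below is one way to check, assuming a POSIX shell; list_merge_order is a hypothetical helper name. It forces byte-wise (C locale) sorting so the result does not depend on the machine's language settings, and it lists hidden PDFs first, then the rest.

```shell
#!/bin/sh
# list_merge_order DIR: print the PDFs in DIR, hidden ones included, one per
# line, in the order the shell expands the patterns .*pdf and *pdf.
# The body runs in a subshell, so the caller's directory and locale are
# left untouched. (list_merge_order is a hypothetical helper name.)
list_merge_order() (
    cd "$1" || exit 1
    LC_ALL=C
    export LC_ALL
    for f in .*pdf *pdf; do
        # A pattern that matches nothing stays as literal text; skip it.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)
```

With the files ._a_index.html.pdf, .00_intro.pdf and b_c_index.html.pdf in newfolder, list_merge_order newfolder prints .00_intro.pdf first, then ._a_index.html.pdf, then b_c_index.html.pdf, which shows why a “.00” prefix moves a file to the front.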

Once the files are in order (although all of them might start with “.”), use the following command to join them.

pdfunite .*pdf merged.pdf


