Select Page

Automatic PDF Generation with Google Chrome

by | Jan 30, 2020 | Development

Many developers get, sooner or later, the task to generate PDF documents automatically. Though, less developers put their experiences and insights then into a blog-post to save others some hassle. So let me do you a favor by explaining how we utilize Google Chrome in headless mode to generate pretty PDF files from HTML.

I won’t go into the details why Google Chrome. If you found this blog entry I suppose you already tried other options without success or satisfying results. We sure have tried several other options (e.g. wkhtmltopdf, dompdf, tcpdf) but only Google Chrome provided us with the results we wanted.

Another advantage of Google Chrome is the possibility to instrument a remotely running instance. And that’s exactly how we instruct it to generate a PDF file for us. Though, not with Puppeteer but with plain chrome devtools protocol (CDP) communication over a websocket.

I’ll avoid any programming language specific examples. You can take a look here at our implementation in PHP. So, let’s start with it.

The Process to Print HTML to PDF

First you’ll need to connect with the browser by use of a websocket connection at: ws://10.11.12.13:9222/devtools/browser/79744167-25f0-41a8-9226-382fa2eb4d66

This is printed on stderr right at launch of the process and also available on the JSON api: 10.11.12.13:9222/json/version

The important bit is the browser id (79744167-25f0-41a8-9226-382fa2eb4d66) at the end of the URI. This changes every time the process is launched and can’t be configured.

Communicating with the browser now takes place over this socket by transmitting and receiving JSON messages. They are divided into four types:

Calls

These primarily originate from ourselves. They contain an id, a method and parameters:

{ “id”: <mixed>, “method”: <string>, “params”: <object> }

The id is chosen by us and should be different for each call. It’s sent back with the response so it’s possible to later associate it with the call we made. Though this is mostly relevant if you are not communicating synchronously, which we do. So this may just as well be an integer incremented by 1 each time.

Results

These are sent by the browser in response to an API call.

{ “id”: <mixed>, “result”: <mixed> }

Errors

If it’s not a response, it’s an error.

{ “id”: <mixed>, “error”: { “code”: <int>, “message”: <string> } }

Events

These may be sent by the browser at any time.

{ “method”: <string>, “params”: <object> }

Some of these are of interest to us. Some of them are not and can be ignored.

Operating the Browser As Usual

In order to let the browser print us a web page (or HTML) to PDF we need to instrument it the same as when we do it manually on the desktop.

First we need to open a new tab:

Call: { “id”: 1, “method”: “Target.createTarget”, “params”: { “url”: “about:blank” } }

Result: { “id”: 1, “result”: { “targetId”: “418546565-d4f9-4d9f-8569-9ad8f9db7f9de” } }

Now we have to communicate with the tab, which requires a new websocket connection to: ws://10.11.12.13:9222/devtools/page/418546565-d4f9-4d9f-8569-9ad8f9db7f9de

Before we can print anything we have to tell the browser where to load the content to print from. This may either be a URI (which then needs to be accessible for the browser) or raw HTML. I’ll stick to raw HTML here, since it’s the most flexible option anyway:

Call: { “id”: 2, “method”: “Page.setDocumentContent”, “params”: { “frameId”: “418546565-d4f9-4d9f-8569-9ad8f9db7f9de”, “html”: <html> } }

Result: { “id”: 2, “result”: { } }

The next step is the instruction to print the page’s content to PDF:

Call: { “id”: 3, “method”: “Page.printToPDF”, “params”: <object> }

Result: { “id: 3, “result”: { “data”: <string> } }

What parameters you can use to influence the generation (e.g. the layout) are outlined in the official documentation.

The string you will get back is probably encoded in Base64, so decode it and store it where you want. The PDF file has been successfully generated.

If you are planning to use a single process to generate multiple PDFs with, remember to clean up afterwards. (i.e. closing the tab) Otherwise you will soon have a memory usage issue.

Call: { “id”: 4, “method”: “Target.closeTarget”, “params”: { “targetId”: “418546565-d4f9-4d9f-8569-9ad8f9db7f9de” } }

Result: { “id”: 4, “result”: { “success”: <bool> } }

I hope this proves useful to you or convinces you to give Google Chrome a try to generate pretty PDFs. 🙂

Johannes Meyer
Johannes Meyer
Lead Developer

Johannes ist seit 2011 bei uns und inzwischen, seit er 2014 die Ausbildung abgeschlossen hat, als Lead Developer für Icinga Web 2, Icinga DB Web sowie alle möglichen anderen Module und Bibliotheken im Web Bereich zuständig. Arbeitet er gerade mal nicht, macht er es sich bei schlechtem Wetter am liebsten zum zocken oder Filme/Serien schauen auf dem Sofa gemütlich. Passt das Wetter, geht's auch mal auf eines seiner Zweiräder. Motorisiert oder nicht.

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

More posts on the topic Development

Mein PHP-Trainingsprojekt

PHP Schulung Vor kurzem haben wir begonnen, eine neue Programmiersprache zu lernen – PHP. In der ersten Woche haben wir mit den Grundlagen wie Variablen, Arrays, Schleifen begonnen und uns schrittweise zu komplizierterer Syntax wie Funktionen, Objekten und Klassen...

check_prometheus ist jetzt öffentlich verfügbar!

Monitoring ist komplex, das wissen wir hier bei NETWAYS leider zu gut. Deswegen laufen in der Infrastruktur auch mal gerne mehrere Tools für die Überwachung. Zwei gern gesehene Kandidaten sind dabei Icinga und Prometheus. Icinga und Prometheus erfüllen...