CourtBouillon

Authentic people growing open source code with taste

Printing the Web

Every day we work with web technologies, but our work is a bit different from most people’s: we don’t create websites. Our HTML and CSS aren’t displayed in web browsers, but transformed into PDF files.

This article was originally published in French for the "24 jours de web" advent calendar.

A Small Introduction

We are Lucie and Guillaume, and like many people, we work with web technologies every day.

Like many people, we like to create nice HTML structures to insert our content. Like many people, we like to paint nice colors and build nice structures with our CSS.

But our work is a bit different from most people’s: we don’t create websites. Our HTML and CSS aren’t displayed in web browsers. They’re transformed into PDF files to be printed, read on smartphones or computers, or simply archived.

In other words: we literally print the web.

How? Why? We’re going to see in detail that this idea isn’t as strange as it may look…

A Bit of History

Weaknesses of Classic Tools

Let’s be honest: using HTML and CSS to create PDF files wasn’t our first idea. There are tons of tools to create beautiful documents, from Microsoft Word to LaTeX, from LibreOffice Writer to Google Docs, Adobe InDesign… These applications are generally enough to meet a long list of needs, from the simple letter written in two minutes to the magazine ready to be printed.

However, there is one need these tools aren’t really made for: automatic document generation.

Let’s say that you have an online shop. You like creating amazing ceramic bowls 🥣️ and the whole world can see them on your website. Customers love what you do and you quickly sell dozens and dozens of bowls. Congrats! Users fill their basket, enter their credit card number to pay, and… they’d like to have an invoice.

Hmm… You’re not going to write all these invoices by hand! The layout is always the same, but the content changes a bit: name, price, address… How can we do this?

An invoice
Your invoices are as pretty as your bowls!

The same question comes up if you run a site for creating business cards online, where you’d like your customers to choose among different styles, in which they may change colors, fonts or logos. The same goes if you’d like to generate promotional flyers or electronic labels for your shop. And again if you’d like to print diplomas, school reports or schedules.

Anyway, you get the idea 😁️.

That’s when the idea of using web technologies to generate printable documents comes up.

After all, in each of these cases, we have to generate a document with a well-defined structure and layout. To present content coming from a database, HTML has nothing left to prove. You can use whatever tools, frameworks, libraries and languages you want to create the pages you need. And to lay out this HTML, what could be better than CSS? You can create one or more stylesheets depending on your needs, and you can even customize them with variables or pre-processors (like Sass) if you’d like to.

Alright, we see how this works to display websites in browsers. But for a paged document, honestly, wouldn’t that be far-fetched?

The Strengths of Web Technologies

Actually, as surprising as it may be, it’s not. And not only to generate some invoices for your cute bowls. There is a high probability that you have on your bookshelf a book that has been made with HTML/CSS! Publishers are often discreet about the technologies they use, but some, like Hachette, talk about them. Real books made with HTML/CSS, sold to real customers, without anyone knowing!

So, if big publishers use these technologies "for real", that can work for others, can’t it?

It’s not the product of chance… It’s time to dig a little: let’s go back to 1996. The IT world only talks about Windows 95, the new Microsoft operating system, which doesn’t even have a default browser. 90% of Internet users (who are not very numerous) use Netscape, which will integrate that year a hastily created language: JavaScript.

1996 is also the year the first CSS specification is published, followed in 1998 by CSS2. To build an open and interoperable standard, the W3C writes documents that are still a reference some 25 years later. In CSS2, we can find a clear definition of the syntax, selectors, and a lot of basic properties we still use today. But that’s not all…

You can find in it a whole chapter about paged documents generation.

CSS creators like to think outside the box. They didn’t design the specification only for the PC and Mac cathode ray screens of 1996, or only for mice and keyboards. They are already thinking about displaying websites on TVs, Braille pads, portable devices (the first iPhone will be released more than 10 years later), reading their content with voice synthesizers… or, of course, printing documents.

The idea is praiseworthy (and absolutely visionary), but it brings many concrete issues. Automatically cutting a web document into several fixed-size pages raises specific problems, and CSS2 covers only a portion of them: forcing or avoiding page breaks, adding page margins and inserting page numbers, defining a format for printing. It’s a good start but it’s not enough. Without us being aware of it, printed documents teem with details that we need to take care of to get a quality layout.

In Practice

So, what’s it like to print the web?

When we create a document, whether it’s a letter, a poster, a book or something else, we have a lot of choices. For example, the format: will my document be printed on A4 pages, A6 pages, 15cm × 24cm sheets, in portrait or landscape? That’s something that is easily doable in CSS. Let’s choose A4 format:

@page {
  size: A4;
}

That was quite easy!
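
The other formats mentioned above follow the same pattern. As a minimal sketch (the two rules are alternatives, you would normally keep only one of them in a stylesheet):

/* Alternative 1: A6 sheets in landscape orientation */
@page {
  size: A6 landscape;
}
/* Alternative 2: a custom 15cm × 24cm format (width, then height) */
@page {
  size: 15cm 24cm;
}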

Did you notice that, on documents with several pages, margins are often different on left and right pages, and that page numbers aren’t always on the same side? This too is easily doable in CSS:

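/* Left pages: wider left (outer) margin, page number in the bottom-left corner */
@page :left {
  margin: 20mm 10mm 20mm 15mm;
  @bottom-left {
    content: counter(page);
  }
}
/* Right pages: mirrored margins, page number in the bottom-right corner */
@page :right {
  margin: 20mm 15mm 20mm 10mm;
  @bottom-right {
    content: counter(page);
  }
}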
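
Margin boxes can hold more than the bare page number. Here is a minimal sketch, assuming that your tool supports the pages counter from the CSS Paged Media module and that the first page is a cover (the "Page … of …" wording is only an example):

@page {
  @bottom-center {
    /* counter(pages) is the total number of pages in the document */
    content: 'Page ' counter(page) ' of ' counter(pages);
  }
}
/* No footer on the first page, assumed here to be a cover */
@page :first {
  @bottom-center {
    content: none;
  }
}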

You get the idea: CSS can deal with these issues for both web and paged documents. Let’s try some other issues that are more specific to documents 🤯!

Photo of an opened book
Books are beautiful

Do you know what leaders are? They are the little dots you find in tables of contents, linking chapter titles to their page numbers. Alright, they’re not always little dots, but you get the idea. This is also doable with CSS:

#table-of-content a::after {
  content: leader('.') ' ' target-counter(attr(href), page);
}

It seems to be a bit complex, but let’s have a closer look.

target-counter() gets the number of the page on which the chapter starts. leader() is the main character of our table of contents: this function (yes, there are functions in CSS) draws the little dots between the chapter name and its page number. Each line gets the right number of dots depending on the space available, and the dots of all lines are perfectly aligned: leader() does things nicely.
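
Leaders don’t have to be dots. As a sketch, assuming the same table of contents as above, any string can serve as the repeated pattern. The draft specification that defines leader() also lists the keywords dotted, solid and space, but support may vary between tools, so check the one you use:

/* Same table of contents, with spaced dashes instead of dots */
#table-of-content a::after {
  content: leader('- ') ' ' target-counter(attr(href), page);
}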

We won’t tour all the possibilities, but CSS is full of features and properties that help us create documents with a lot of details. Whether you want to allow or prevent page breaks inside blocks, or whether you’d like to insert footnotes, multi-column text, or something else, you can often find a way to achieve it.
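
To give an idea of what these features look like, here is a minimal sketch. The selectors (h2, figure, .index, .footnote) are hypothetical and depend on your own HTML. Also note that float: footnote comes from a draft specification: dedicated tools tend to support it, browsers generally don’t.

/* Start each chapter title on a new page */
h2 {
  break-before: page;
}
/* Never split a figure across two pages */
figure {
  break-inside: avoid;
}
/* Lay out the index in two columns */
.index {
  columns: 2;
}
/* Move the element to the bottom of the page as a footnote */
.footnote {
  float: footnote;
}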

By creating documents with HTML/CSS, you benefit from all the specific features for paged documents already managed by CSS, but you also benefit from all the power of CSS that you already use for your websites!

As we included some CSS samples in this article, we used them to generate a simple PDF version of this article in French 😄. If you want to have a closer look at the code, the stylesheet is available in this GitHub repository.

The article transformed in PDF
The article (French version) laid out

The Tools to Make Your Documents

That’s nice: we have our content inside some HTML and a nice layout inside some CSS, but how do we generate a PDF from that?

There are a lot of different tools that transform HTML/CSS into PDF. Each tool has its strengths and its weaknesses, but they all remain interoperable because they use the same HTML and CSS standards.

Among these tools, the first one you can easily try is… your browser.

Printing your HTML page as PDF will, unsurprisingly, give you a PDF. This solution is really convenient and doesn’t require you to install anything on your computer.
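
Browsers apply print-specific rules when they do so. A minimal sketch, assuming a page with nav and footer elements that make no sense on paper (these selectors are only an example):

@media print {
  /* Hide what is only useful on screen */
  nav,
  footer {
    display: none;
  }
  /* Print the address of external links after their text */
  a[href^='http']::after {
    content: ' (' attr(href) ')';
  }
}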

Generating PDF files with your browser comes with some limitations. One is that your browser’s main goal is to allow you to browse the web. That’s its main feature (you already know that), and it explains why browsers are far from including all CSS properties and features specific to print, like footnotes.

That’s why dedicated tools have been developed. Unlike browsers, these tools are specifically developed to generate paged documents and are thus more careful about paged media features.

What are these tools? We can for example list PagedJS, Vivliostyle, PrinceXML, PDFreactor, Antenna House or WeasyPrint.

Among these tools, some are open source and easily available:

  • PagedJS is a JavaScript library that, in addition to transforming your HTML/CSS into PDF, allows you to see the rendering directly in the browser.
  • Vivliostyle is developed in TypeScript, can be used directly in your terminal, and also allows you to visualize your PDF. Vivliostyle is known to manage right-to-left and top-to-bottom writing modes very well.
  • WeasyPrint is a Python library available from the command line or callable from your Python application. It’s a tool that you really should try… as we develop it 😉️.

If you want to see what these tools are able to generate, you can have a look at:

Now, you really want to try this, don’t you?

The Web Diversity

Creating PDF documents with web technologies is a niche, but it teems with ideas, tools and solutions. People from this microcosm often share values and desires, and sometimes vehemently confront their opposing points of view. Not everyone always agrees, of course 😁, but at least things can move thanks to the work of the people behind standards, the tools’ developers, and their users.

This bubbling world is the world of the web as we know it, at least for now. Older readers of this article certainly recall the dark hours of the web in the early 2000s, with Internet Explorer 6 and its 90% market share, which didn’t care about interoperability or innovation, like any other omnipotent ogre.

Chrome’s current position is getting closer and closer to a quasi-monopoly. In the short or medium term, this may drag a particularly innovative ecosystem back into the torpor and constraints imposed by a despot. Google shamelessly includes surveillance tools and promotes online ads in its browser. The company doesn’t do this out of malice, but for a simple reason: to increase its income. Whereas a relative balance had been found with other implementations, current trends (disenchantment with Firefox, use of the Chrome rendering engine by Edge and Opera) suggest that there is no longer a balance of power to counter the assertive agenda of the search engine.

Faced with this, it seems important to keep playing the interoperability game. The web, and the Internet in general, works well thanks to a lot of tools based on the same protocols, languages and formats, like TCP/IP, HTTP, HTML, CSS, JavaScript… One of the strengths of these technologies is their multiple implementations: whether you make a static website, a REST API, a scraper, a web app or a streaming service, you’re able to choose the languages you like and the tools you prefer. All of this is possible because these technologies are the product of consensus, trying to focus on the user’s best interest instead of the interest of a private actor or a specific audience. And in the end, for us, participating in this consensus by developing this alternative usage is a modest but sincere way to help this open web live.

We’re Lucie and Guillaume, and like many people we work every day with web technologies. We’d like that to continue for a long time; we’d like to be able to invent and print our content on pages, independently of the wishes of big players whose goals we don’t necessarily share. So, at our small level, with our nice documents, with their beautiful letters and their harmonious colors, with our open tools and standards, with our passion and our good mood, we try to keep this web diversity alive 🌱.