Advanced PDF publishing

Modified on Thu, 7 Dec, 2023 at 11:02 AM

This article describes a PDF publishing workflow that automatically generates pages from templates, one set per data subset, and combines them into a single PDF file for offline consumption. It's also a good opportunity to introduce the following:

The Text Input block
The Report generator block
The Project metadata block
The File attributes block
Community blocks for generating and manipulating PDF files
Using Docker for custom blocks

The workflow

Let's start by downloading the workflow we've already put together. (Alternatively, just read on.)

See the attached "PDF demo.ioz". Download this file, upload it to your Omniscope server or installation, then open and extract it, and you should see:

Open and check every block, and correct any paths that don't match where you have placed the project. The workflow also expects an empty "PDFs" folder next to "PDF demo".
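If you'd rather script that folder setup, here is a minimal Python sketch that creates the "PDFs" folder alongside the extracted project and refuses to continue if it already contains files. It assumes the project was extracted to a local directory called "PDF demo"; the function name is hypothetical and nothing here is part of Omniscope itself.

```python
from pathlib import Path


def ensure_empty_output_folder(project_dir: Path, folder_name: str = "PDFs") -> Path:
    """Create the output folder next to the project folder, or verify it is empty.

    Assumption: `project_dir` is the extracted project (e.g. .../PDF demo) and the
    output folder should be its sibling in the same parent directory.
    """
    output_dir = project_dir.parent / folder_name
    output_dir.mkdir(parents=True, exist_ok=True)
    # Leftover files from an earlier run would end up in the combined PDF.
    leftovers = sorted(p.name for p in output_dir.iterdir())
    if leftovers:
        raise RuntimeError(f"{output_dir} is not empty: {leftovers}")
    return output_dir
```

Raising rather than silently deleting old files is a deliberate choice: it's safer to let you decide what to clear out.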

What it does

This workflow takes a dataset ("Demo: Bond prices") and a report containing page templates ("Report"), and generates a single large PDF file whose pages are produced from those templates, each filtered to a different subset of the data.

In detail

In this example, we're using the "Bond prices" demo dataset as the actual data being visualised in the PDF. You would typically use the business data you are reporting on, and could have multiple datasets feeding into your report, if desired.

The Report block contains 2 tabs, acting as page templates:

We also have a Text Input block (which lets you enter data manually, like a spreadsheet) that defines which PDF pages to generate, from which templates, and with which filters. Note that this configuration could instead be data-driven, produced by a more sophisticated upstream sequence of blocks rather than typed into a Text Input block.
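To illustrate what a data-driven configuration might look like, the Python sketch below builds configuration rows from the distinct values of one field: for each value, one page per template. The column names ("Page", "Template", "Filter field", "Filter value") and the function name are hypothetical; the Text Input table in the demo may use different columns.

```python
from collections.abc import Iterable


def build_page_config(records: Iterable[dict], filter_field: str,
                      templates: list[str]) -> list[dict]:
    """Derive one config row per (distinct filter value, template) pair."""
    # Distinct values in first-seen order, so page order follows the data.
    values = list(dict.fromkeys(r[filter_field] for r in records))
    rows = []
    for value in values:
        # Group all template pages for one subset together.
        for template in templates:
            rows.append({
                "Page": f"{template} - {value}",
                "Template": template,
                "Filter field": filter_field,
                "Filter value": value,
            })
    return rows
```

Looping over values first keeps each subset's pages adjacent in the final PDF; swap the loops if you'd rather group pages by template.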

The Report generator block then creates or updates another Report block whose tabs represent the pages of the PDF we will generate, derived automatically from the data above. In this case it uses the tabs of Report as page templates and, when executed, generates the Auto-generated report block.
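Conceptually, the generator copies a template tab once per configuration row and attaches that row's filter to the copy. The Python sketch below models this with plain dictionaries. The tab structure and the column names ("Page", "Template", "Filter field", "Filter value") are assumptions made for illustration and are far simpler than a real Omniscope report definition; this is not how the Report generator block is implemented.

```python
import copy


def generate_tabs(templates: dict[str, dict], config_rows: list[dict]) -> list[dict]:
    """Return one new tab per config row: a copy of the named template plus a filter.

    `templates` maps template tab names to a hypothetical tab definition.
    """
    tabs = []
    for row in config_rows:
        template = templates.get(row["Template"])
        if template is None:
            raise KeyError(f"Unknown template tab: {row['Template']!r}")
        tab = copy.deepcopy(template)  # never mutate the shared template
        tab["name"] = row["Page"]
        tab["filters"] = [{"field": row["Filter field"], "equals": row["Filter value"]}]
        tabs.append(tab)
    return tabs
```

Deep-copying each template keeps the generated tabs independent, so changing one page later doesn't affect the template or its siblings.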

The Auto-generated report now looks like this:

(If you are not generating PDFs but simply want an interactive online report with auto-generated permutations, we're done. You can share this new report, as-is, if Omniscope is configured as a server.)

Now we move on to PDF generation. We have a report (Auto-generated report, although it could also be a manually built report) and want to produce a PDF file.

The Project metadata block yields information about what is in a project. In this case we're pointing it at the same project containing the block.

Not much to see here. It spits out some useful information about the blocks in the workflow, the reports and their data sources, and, most importantly for us, the tabs in its reports:

Now we're only interested in the tabs (i.e. pages for our PDF) from "Auto-generated report", so I'll use a Record filter block to isolate those:
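In code terms, this Record filter step amounts to keeping the rows whose report name matches and ordering them by tab position. A minimal Python sketch, assuming hypothetical field names "Report", "Tab" and "Tab index" (the Project metadata block's real output fields may be named differently):

```python
def tabs_of_report(metadata_rows: list[dict], report_name: str) -> list[str]:
    """Return the tab names of one report, in tab order."""
    # Equivalent of the Record filter: keep only rows belonging to one report.
    matching = [r for r in metadata_rows if r.get("Report") == report_name]
    # Order by tab position so PDF pages follow the report's tab order.
    matching.sort(key=lambda r: r["Tab index"])
    return [r["Tab"] for r in matching]
```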

Let's generate some PDFs. In the Add Block menu, you'll find some Community blocks. These are Custom blocks containing R or Python code, developed as add-ons to Omniscope (some by the community) and hosted as open source in a GitHub repository:

We're using Report tab to PDF (DOCKER). Our server is configured to run Custom blocks in Docker (primarily for security and isolation), so we're using the Docker variant. Docker also simplifies dependency management: you don't need to worry about installing packages or Chrome (used for PDF generation), or about differences between operating systems. All you need to do is configure which tabs to turn into individual PDF files, and where to put them:
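We don't cover how the community block drives Chrome internally, so treat this as a sketch of the general technique: headless Chrome's `--print-to-pdf` flag renders a page to a PDF file. The Python below builds (but does not run) one such command per tab, with a file name derived from the tab name. The tab URL, the Chrome executable name and the helper names are assumptions; a real report may also need authentication, or extra time to finish rendering, before printing.

```python
import re
from pathlib import Path


def safe_filename(tab_name: str) -> str:
    """Turn a tab name into a filesystem-safe PDF file name."""
    # Replace runs of anything other than letters, digits, '.', '_', ' ' and '-'.
    stem = re.sub(r"[^A-Za-z0-9._ -]+", "_", tab_name).strip() or "tab"
    return f"{stem}.pdf"


def chrome_pdf_command(chrome: str, page_url: str, out_dir: Path, tab_name: str) -> list[str]:
    """Build a headless Chrome command that prints one page to a PDF in out_dir."""
    out_path = out_dir / safe_filename(tab_name)
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={out_path}", page_url]
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`, which raises if Chrome exits with an error.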

The block outputs some handy information, although in this case we're not using it:

We now have a folder of PDFs, one for each tab in Auto-generated report, which you can see by browsing Omniscope:

Finally, we're going to append these together into another PDF. We could consume the previous block's output, but I'm choosing to demonstrate the use of the File attributes block, which produces metadata about files on the filesystem:
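For a feel of what such file metadata looks like, here is a Python sketch that produces one record per PDF in a folder using only the standard library. The field names are our own guesses, not necessarily those the File attributes block outputs.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_attributes(folder: Path, pattern: str = "*.pdf") -> list[dict]:
    """Return one record per matching file: name, path, size and last-modified time (UTC)."""
    rows = []
    # Sorted so the output order is stable between runs.
    for p in sorted(folder.glob(pattern)):
        if not p.is_file():
            continue
        st = p.stat()
        rows.append({
            "File name": p.name,
            "Path": str(p),
            "Size (bytes)": st.st_size,
            "Modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        })
    return rows
```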

Again, we'll rely on a community block - in this case Append PDF files. This block does not have a Docker variant, since it has simple Python dependencies and can work on any server, regardless of whether the server is configured to use Docker for custom blocks.

You configure where the PDFs are (in this case the adjacent "PDFs" folder), which fields in the input data specify the PDF files to include, and where to create the output PDF file:
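We haven't checked which library Append PDF files uses internally; as a sketch of the same idea, the Python below picks file names from a chosen field (keeping input order) and concatenates them with the third-party pypdf package, adding one bookmark per input file. The field name and helper names are hypothetical, and the bookmark titles come from file stems, so they match tab names only if the PDFs were named after their tabs.

```python
from pathlib import Path


def select_pdfs(rows: list[dict], folder: Path, field: str) -> list[Path]:
    """Resolve the PDF file names held in `field` against `folder`, keeping row order."""
    paths = []
    for row in rows:
        name = row.get(field)
        if name and str(name).lower().endswith(".pdf"):
            paths.append(folder / name)
    return paths


def append_pdfs(paths: list[Path], output: Path) -> None:
    """Concatenate PDFs into `output`, adding one bookmark per input file."""
    from pypdf import PdfWriter  # third-party dependency: pip install pypdf

    writer = PdfWriter()
    for path in paths:
        # outline_item creates a bookmark pointing at the first page of this file.
        writer.append(str(path), outline_item=path.stem)
    with open(output, "wb") as fh:
        writer.write(fh)
```

Usage might look like `append_pdfs(select_pdfs(rows, Path("PDFs"), "File name"), Path("PDFs/Combined.pdf"))`; the pypdf import sits inside `append_pdfs` so the selection logic works even where pypdf isn't installed.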

Once the block has executed, the "Combined.pdf" file magically appears:

On opening, you'll see the PDF has all the desired pages, with a table of contents generated from the tab names. Of course a real-world example would likely use a consistent A4 page layout, and wouldn't include filters.

There are other ways of achieving this, or of using the same blocks for other purposes. I hope it's been a source of inspiration whether or not you're publishing PDFs.

N.B. Data-driven batch publishing of reports as images, PDFs, etc. requires an Enterprise licence / plan.

Attachments (1)

PDF demo.ioz (449 KB)