Workplace Automation: Generate PDF Reports Using Python

Weave automation into your workplace: use Python to automatically generate PDF-based reports (e.g. annual reports)

David-kyn
Jun 16, 2021

Context: This article is part of Heicoders Academy’s continual effort to equip students who have graduated from our courses with bite-sized skills/tools that they can easily adapt and implement in their jobs.

Introduction

Writing reports is part and parcel of any working professional’s job scope. Even software developers and data scientists may have to write reports to showcase the performance or mechanism of their applications or machine learning models. Yet writing reports can be a repetitive, mundane and time-consuming task. It is exactly these attributes that make report writing a low-hanging fruit in the world of automation. You can easily whip up an automated report generator to improve your company’s productivity.

In this article, we will show you how to (1) use Python to read data from CSV/Excel, and (2) use this data to create visualisations and add them to a PDF. The end product is a piece of code that you can reuse to generate your work reports effortlessly.

Figure 1: 2-Page PDF Generated by our Python Code

There are 5 main parts to this application:

  1. Import / install libraries
  2. Data ingestion / processing
  3. Data visualisation
  4. Write PDF utility functions
  5. Create PDF report

1. Import / install libraries

Before we begin, we need to install and import the libraries which we will be using for this application. The star of the show in this case is the FPDF library, which is responsible for generating our PDF. The rest of the libraries (Matplotlib, Pandas) are tools you should already be familiar with, especially if you have taken our AI100: Python Programming & Data Visualisation course.

First we install the essential libraries:

Next, we import all the essential libraries which we will be using for this application:
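As a minimal sketch of the install and import steps, the snippet below checks which libraries are available and imports them if they all are. It assumes the libraries are importable as `fpdf`, `pandas`, `matplotlib` and `dataframe_image` (the last one is used later to export the dataframe as an image) and that the same names work with pip. The helper name `missing_packages` is our own and purely illustrative.

```python
import importlib.util

# Assumed mapping of importable module name -> pip package name.
REQUIRED = {
    "fpdf": "fpdf",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "dataframe_image": "dataframe_image",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that are not installed."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    # Run this in a terminal (or prefix it with "!" in a Jupyter cell).
    print("Install first: pip install " + " ".join(missing))
else:
    from fpdf import FPDF
    import pandas as pd
    import matplotlib.pyplot as plt
    import dataframe_image as dfi
    print("All libraries imported.")
```

If the first branch prints an install command, run it and then re-run the cell so that the imports take effect.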

2. Data ingestion / processing

The next step is to write code that will help us ingest and process the data which we want to add to our PDF. Specifically, we will be ingesting data from CSV/Excel. Those familiar with Pandas would know that it only takes one line of code to ingest data from CSV/Excel:
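A minimal sketch of that one-liner is shown here. The column names and figures are made up for illustration and are not the article's dataset; an in-memory buffer stands in for the CSV file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of an annual sales CSV file.
csv_text = """Year,Product A,Product B,Product C
2018,120,80,45
2019,150,95,60
2020,170,110,75
"""

# One line to ingest CSV data. Pass a file path such as "annual_sales.csv"
# instead of the in-memory buffer when reading from disk.
df = pd.read_csv(io.StringIO(csv_text))

# For Excel workbooks the equivalent call is pd.read_excel("annual_sales.xlsx"),
# which needs an Excel engine such as openpyxl to be installed.
print(df)
```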

And here is what our data looks like:

Figure 2: Pandas Dataframe of the Data Extracted from CSV

This is a dummy dataset of annual sales data. For the purpose of this application, we will be adding this tabular annual sales data to our PDF directly. But you can do your own processing (e.g. aggregation) to extract more insights from the data, depending on your own business context.

Here we wrote some simple code to style our Pandas dataframe. However, this step is purely optional and for aesthetic purposes only.

And here is what our dataframe looks like after styling:

Figure 3: Styled Pandas Dataframe

Lastly, we will save the styled dataframe as an image, and store it in our resource folder. We do so because it is much easier to add an image than a table with the FPDF library.

The above code uses the dataframe_image library to export the dataframe as an image, and store it in your local folder of choice.
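A minimal sketch of such an export step is given below, assuming `dataframe_image` is installed. The wrapper name `save_dataframe_as_image` and the example path are hypothetical. By default `dfi.export` renders the table through a Chrome-based browser, so one needs to be available on your machine; the library also offers a `table_conversion="matplotlib"` option if it is not.

```python
from pathlib import Path


def save_dataframe_as_image(styled_df, output_path):
    """Export a DataFrame or Styler to an image file using dataframe_image."""
    # Imported lazily so this function can be defined even when
    # dataframe_image is not installed yet.
    import dataframe_image as dfi

    output_path = Path(output_path)
    # Create the resource folder if it does not exist yet.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    dfi.export(styled_df, str(output_path))
    return output_path


# Example call (hypothetical path):
# save_dataframe_as_image(styled_df, "resources/annual_sales.png")
```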

3. Data visualisation

For this particular report, we intend to add 2 charts to the PDF: (1) a line chart to show the trend, and (2) a pie chart to show the component breakdown. Here we make use of Matplotlib to generate the charts, and we wrap the code for generating the line and pie charts into two functions respectively:

  • generate_matplotlib_stackbars()
  • generate_matplotlib_piechart()

We are not going to delve into the specifics of our Matplotlib code, as those who have taken Heicoders Academy’s AI100 should already be very familiar with this library. Those who have taken AI200: Applied Machine Learning can also use the Plotly library instead to generate their visualisations.

One thing to note is that we use plt.savefig() to save a copy of each generated visualisation as an image. Again, this is because it is easier to add images than the visualisations themselves to our PDF. Next, we invoke the two functions we wrote to generate the charts!
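To illustrate the pattern of plotting and then saving with plt.savefig(), here is a minimal sketch. The functions `save_line_chart` and `save_pie_chart` are simplified stand-ins of our own, not the article's chart functions, and the data is made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # Render to files without needing a display.
import matplotlib.pyplot as plt

# Hypothetical data standing in for the annual sales figures.
years = [2018, 2019, 2020]
total_sales = [245, 305, 355]
breakdown = {"Product A": 170, "Product B": 110, "Product C": 75}


def save_line_chart(x, y, path):
    """Plot a trend line and save it as an image."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, y, marker="o")
    ax.set_xticks(x)
    ax.set_xlabel("Year")
    ax.set_ylabel("Units sold")
    ax.set_title("Annual sales trend")
    plt.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # Free the figure once it is on disk.


def save_pie_chart(parts, path):
    """Plot a component breakdown and save it as an image."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.pie(list(parts.values()), labels=list(parts.keys()), autopct="%1.0f%%")
    ax.set_title("Sales breakdown")
    plt.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


# A temporary folder is used here; point this at your resource folder instead.
output_dir = tempfile.mkdtemp()
line_path = os.path.join(output_dir, "sales_trend.png")
pie_path = os.path.join(output_dir, "sales_breakdown.png")

save_line_chart(years, total_sales, line_path)
save_pie_chart(breakdown, pie_path)
```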

You should get the following visualisation when you run the above functions:

Figure 4: Matplotlib Visualisation

4. Write PDF utility functions

Here, we will write some functions that help define our report structure using the FPDF library. Before we begin, we will cover some of the commonly used functions from the FPDF library. For more functions, please refer to the FPDF library documentation.

  • set_font(): Used to set the font type and font size of your text
  • set_text_color(): Used to set the color of your text
  • write(): Takes in a string, and appends it to the PDF document based on the pre-set font type, size and color.
  • ln(): Adds a line break. The height of the line break is based on the number passed in to the function.
  • image(): Takes in the filepath of an image, and appends that image to the PDF.

Using the functions provided by FPDF, we then write 3 functions:

  • create_letterhead(): Adds an image of your choosing as the letterhead of the PDF document. In this case, we are adding this image which we designed as a letterhead (you can easily create one yourself using PowerPoint).
Figure 5: Letterhead
  • create_title(): Takes in a string and adds it as a title to the PDF document
  • write_to_pdf(): Takes in a string and adds it as a normal string of text in the PDF document

Here, we extend the FPDF class in order to override its built-in footer method. Our version of the footer method adds a page number to the PDF whenever it is invoked.

5. Create PDF report

This is where everything comes together. We will call all the functions we wrote previously to piece our PDF together.

When you run the above code, Python will automatically generate a beautiful 2-page PDF report for you!

Conclusion

The best way to learn programming is by doing. As such, instead of launching into a lengthy explanation of how each FPDF function works, we think it is more beneficial to provide you with the sample code so that you can reverse engineer it and try each code block. We coded the entire application in Python in a Jupyter notebook, and broke the code down into chunks to facilitate your understanding.

You can find this Jupyter notebook in our Telegram group: https://t.me/heicoders_professionals

While we only generated a 2-page PDF report, the sample code should cover most of the essential functions in the FPDF library as well as the ways to apply them. You can now easily customise the sample code provided (e.g. by adding Plotly visualisations) to generate even more sophisticated PDF documents. The sky is the limit!

In future, we will write more workplace automation articles like this one and provide sample code to accelerate your learning. Do leave a comment if you have a particular use case which you would like us to explore. Cheers folks!
