Jul 14, 2022

A web browser extension to automate invoice data entry

Monthly invoices come in as text-based PDF files, and the hours worked (quantity) at each hourly rate (price per unit) have to be entered manually, day by day, in a third-party web application.

When such an invoice covers hours worked over many days at multiple rates, manual data entry becomes a very tedious and time-consuming job.

An option for automation

The starting point is a PDF file from which the tabular data needs to be extracted. Both on-premises and cloud solutions exist that try to extract such tabular data (date, price, quantity) from a PDF file. One option is the open source tabula-java tool, which can be wrapped in a REST API that returns the tabular data in JSON format. PDF in, JSON out. No promises, but this deserves a post of its own.
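As a rough illustration of the "PDF in, JSON out" idea, the call from the extension could look like the sketch below. The endpoint URL, the request format and the shape of the JSON response are all assumptions made for this example; the actual service is not described in this post.

```javascript
// Hypothetical endpoint: the real service URL is not given in this post.
const EXTRACT_URL = "http://localhost:8080/extract";

// Sends the PDF bytes to the extraction service and returns normalized rows.
// fetchImpl can be swapped out, so the function also works without a network.
async function extractInvoiceRows(pdfBlob, fetchImpl = fetch) {
  const response = await fetchImpl(EXTRACT_URL, {
    method: "POST",
    headers: { "Content-Type": "application/pdf" },
    body: pdfBlob,
  });
  if (!response.ok) {
    throw new Error(`Extraction failed with HTTP ${response.status}`);
  }
  // Assumed response shape:
  // [{ "date": "2022-06-01", "price": "85.00", "quantity": "7.5" }, ...]
  const rows = await response.json();
  return rows.map((row) => ({
    date: row.date,
    price: Number(row.price),       // hourly rate
    quantity: Number(row.quantity), // hours worked
  }));
}
```

Converting price and quantity to numbers at this point makes it straightforward to compute totals later for validation.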

But how do we then inject that tabular data into the HTML of the third-party web page? This is where a local Chrome extension comes into play. The extension offers the following functionality:

  1. reads a PDF file
  2. calls the table extraction web service
  3. shows the returned data, with totals, for validation
  4. has a button to push the tabular data to the third-party web page

The last step is done after reverse engineering the third-party web page, to identify the cells where the hours worked need to be entered for a given date and rate. Some JavaScript then maps the tabular data from the PDF invoice to these identifiers and copies the data into the corresponding HTML elements of the third-party page, which effectively automates the job.
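The mapping step can be sketched as follows. The `hours-<date>-<rate>` identifier scheme is purely hypothetical; the real identifiers depend on what inspecting the third-party page reveals. The function names are also made up for this example.

```javascript
// Hypothetical ID scheme: the real identifiers come from inspecting the page.
function cellId(row) {
  return `hours-${row.date}-${row.price.toFixed(2)}`;
}

// Copies each row's quantity into the matching input element.
// Returns the rows for which no cell was found, so they can be reported.
function fillCells(rows, doc = document) {
  const unmatched = [];
  for (const row of rows) {
    const cell = doc.getElementById(cellId(row));
    if (cell) {
      cell.value = String(row.quantity);
    } else {
      unmatched.push(row);
    }
  }
  return unmatched;
}
```

In an extension, code like this would typically run in a content script, because the popup cannot reach the page's DOM directly. Returning the unmatched rows instead of silently skipping them gives a simple check that nothing from the invoice was lost along the way.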

I had expected that developing a Chrome extension from scratch would be a pretty hard task, but progress went rather smoothly, thanks to previous experience with HTML, CSS and, more importantly, JavaScript, together with abundant documentation from Google and the information available on problems encountered by others.