Monthly invoices arrive as text-based PDF files, and the hours worked (quantity) and hourly rate (price per unit) have to be entered manually, day by day, in a third-party web application.
When such an invoice covers hours worked over many days at multiple rates, manual data entry becomes a tedious and time-consuming job.
One option for automation follows.
The starting point is a PDF file from which the tabular data needs to be extracted. Both on-premises and cloud solutions exist that try to extract tabular data (date, price, quantity) from a PDF file. One option is the open source tabula-java tool, which can be wrapped in a REST API that returns the tabular data as JSON. PDF in, JSON out. No promises, but this deserves a post of its own.
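To make "PDF in, JSON out" a bit more concrete, here is a minimal sketch of the client side. It assumes the service returns a Tabula-style response (an array of tables, each with a `data` array of rows whose cells carry a `text` property) and that the columns come back in the order date, quantity, price. The endpoint URL, the `file` form field and the function names `extractTables` and `toInvoiceLines` are hypothetical; a real wrapper may look different.

```javascript
// Sketch only: endpoint, form field name and column order are assumptions.
async function extractTables(pdfFile, endpoint) {
  const form = new FormData();
  form.append("file", pdfFile); // "file" is a hypothetical field name
  const response = await fetch(endpoint, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Extraction failed with HTTP status ${response.status}`);
  }
  return response.json();
}

// Parse a cell's text as a number; accepts a decimal comma, NaN when empty.
function parseNumber(text) {
  const cleaned = (text ?? "").trim().replace(",", ".");
  return cleaned === "" ? NaN : Number(cleaned);
}

// Turn the assumed table structure into flat invoice lines.
function toInvoiceLines(tables) {
  const lines = [];
  for (const table of tables) {
    for (const row of table.data ?? []) {
      const [date, quantity, price] = row.map((cell) => (cell.text ?? "").trim());
      const hours = parseNumber(quantity);
      const rate = parseNumber(price);
      // Skip header rows and anything that does not parse as numbers.
      if (!date || Number.isNaN(hours) || Number.isNaN(rate)) continue;
      lines.push({ date, hours, rate });
    }
  }
  return lines;
}
```

Rows that do not parse (a header row, an empty line) are silently dropped here, which is why a validation step before pushing anything to the web application remains useful.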
But how can that tabular data then be injected into the HTML of the third-party web page? This is where a local Chrome extension comes into play. Such an extension offers the following functionality:
- reads a PDF file
- calls the table extraction web service
- shows the returned data, and totals, for validation
- has a button to push the tabular data to the third-party web page
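The validation step in the list above could be sketched as follows. This assumes each extracted line is an object with `date`, `hours` and `rate` properties; that shape and the function names `computeTotals` and `groupByDay` are illustrative, not taken from the actual extension. Totals are accumulated in cents to avoid floating-point drift, and lines are grouped by date because the web application expects entries per day.

```javascript
// Sketch only: the { date, hours, rate } line shape is an assumption.
function computeTotals(lines) {
  let hours = 0;
  let cents = 0;
  for (const line of lines) {
    hours += line.hours;
    // Round each line amount to whole cents before summing.
    cents += Math.round(line.hours * line.rate * 100);
  }
  return { hours, amount: cents / 100 };
}

// Group lines by date, so that each day can be entered (and checked) separately.
function groupByDay(lines) {
  const byDay = new Map();
  for (const line of lines) {
    if (!byDay.has(line.date)) byDay.set(line.date, []);
    byDay.get(line.date).push(line);
  }
  return byDay;
}
```

Comparing these totals with the totals printed on the invoice gives a quick check that the extraction did not miss or misread a row before anything is pushed to the web page.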
I had expected that developing a Chrome extension from scratch would be a pretty hard task, but progress was rather smooth, thanks to previous experience with HTML, CSS and, more importantly, JavaScript, together with abundant documentation from Google and plenty of available information on problems encountered by others.