We need a piece of software that the coder can use now, and that we can use ourselves later if needed, to OCR many PDF files and enter them into an Access database. The PDFs are electronically generated (they are NOT scanned). They all have the SAME format; however, there are about 200 pages (separate PDFs) that make up one manual. The manuals are automotive in nature.
## Deliverables
You will be given a folder containing the PDF pages for ONE manual (approximately 200 pages).
SEE ATTACHED EXAMPLE FILE.
Step 1: Parse EACH PDF (some have 2 pages, but they have the same format).
Each PDF has the following titles, which need to be populated into the matching database fields:
Model
MY
Group
Table
Ref
Code
Qty
Description
The folders will be separated by YEAR and MODEL.
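As a rough illustration of Step 1, the eight fields could be held in a small record type and written to a CSV file, which Access can import. This is only a sketch: the names `PartRow` and `rows_to_csv` are made up for the example, the CSV route is an assumption (writing directly to the Access database is equally acceptable), and the text extraction from the PDF itself is not shown.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class PartRow:
    """One database row; field names match the titles listed above."""
    Model: str
    MY: str
    Group: str
    Table: str
    Ref: str
    Code: str
    Qty: str
    Description: str


# Column order for the CSV header, taken from the dataclass definition.
FIELD_NAMES = [f.name for f in fields(PartRow)]


def rows_to_csv(rows, handle):
    """Write parsed rows to an open text file handle as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=FIELD_NAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
```

When writing to a real file, open it with `newline=""` so the `csv` module controls the line endings.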
Step 2:
Next, either the coder or the program will need to extract the image from each PDF, RENAME it to the Table number, and save it into a single folder identified by the Year and Model.
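The naming rule in Step 2 might be sketched as below. The function names, the "Year Model" folder layout, and the `.png` extension are all assumptions for illustration; the actual image extraction would be done by a PDF library and is not shown.

```python
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", str(text)).strip()


def image_destination(output_root, year, model, table, extension=".png"):
    """Build the path <output_root>/<Year Model>/<Table><extension>.

    The extension is an assumed default; use whatever format the
    extracted image actually has.
    """
    folder = Path(output_root) / safe_name(f"{year} {model}")
    return folder / f"{safe_name(table)}{extension}"
```

Keeping the path logic in one function means every manual gets the same folder and file naming, with no manual renaming involved.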
Step 3:
Compile ALL the PDFs into ONE manual for each Year and Model, IN the order of the Table number, and save it under the name of the Year and Model.
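For Step 3, the ordering matters more than the merge itself: a plain alphabetical sort would put table "10" before table "2". Below is a minimal sketch of a numeric-aware ordering, assuming table numbers are short strings of digits and letters (the real format should be checked against the example file). The helper names are hypothetical, and the actual merging would be left to a PDF library.

```python
import re


def table_sort_key(table):
    """Split a table number into digit and non-digit runs.

    Digit runs compare as integers, so "2" sorts before "10".
    Each run is tagged so numbers and text are never compared directly.
    """
    parts = re.split(r"(\d+)", str(table).lower())
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p]


def ordered_pages(pdf_by_table):
    """Return the PDF paths sorted by their table number."""
    tables = sorted(pdf_by_table, key=table_sort_key)
    return [pdf_by_table[t] for t in tables]


def merged_manual_name(year, model):
    """File name for the compiled manual, e.g. '2001 ModelX.pdf'."""
    return f"{year} {model}.pdf"
```

The list returned by `ordered_pages` can then be handed to whichever PDF merging tool is used, in that order.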
We have about 34 manuals so it really needs to be automated to eliminate as much human error as possible.
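One way to reduce human error across that many manuals is to have the program check every parsed row and report problems instead of silently skipping them. The sketch below assumes each row is a dictionary keyed by the field names above, and that Qty is normally a whole number; both are assumptions to adjust once the real data is seen.

```python
REQUIRED_FIELDS = (
    "Model", "MY", "Group", "Table", "Ref", "Code", "Qty", "Description",
)


def find_problems(row, source):
    """Return a list of human-readable problems for one parsed row.

    `source` is a label such as the PDF file name, so a bad row can be
    traced back to the page it came from.
    """
    problems = []
    for name in REQUIRED_FIELDS:
        if not str(row.get(name, "")).strip():
            problems.append(f"{source}: missing {name}")
    qty = str(row.get("Qty", "")).strip()
    # Assumption: Qty is a whole number; relax this if the manuals differ.
    if qty and not qty.isdigit():
        problems.append(f"{source}: Qty is not a whole number: {qty!r}")
    return problems
```

Collecting these messages into one report per manual gives a short list of pages to review by hand.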
**** IT DOES NOT MATTER WHAT TYPE OF PROGRAM YOU MAKE, AS LONG AS IT CAN DO THE WORK ON YOUR COMPUTER AND WE CAN USE IT LATER AS WE GET MORE PDFS.