Finding PDF documents and extracting document properties -- 36684

Closed Posted 3 years ago Paid on delivery

The task is to find dynamically generated PDF documents yourself and extract their metadata.

Just as most Word documents are made with Microsoft Word, most PDF documents are made with Adobe products. However, these are mostly static documents such as books and brochures. Another example of a static document is one that someone has converted from Word to PDF. In our study, we are NOT interested in these kinds of documents.

The software used to create a PDF document is recorded in the properties of the document (in the "PDF Producer" line). What I want you to do in this task is check the producer line of the PDF documents that you receive by email. I say email because nowadays companies send PDF documents via email, and these are NOT static documents. They generate the same-looking document for every customer but with different data in it. These are the documents that I am interested in. Example document types include invoices, telephone bills, subscription documents, personal letters, quotes, and certificates.

Note: I am not interested in any kind of personal data. To give an analogy: you have a house, and your house is made of bricks. I am asking you to check the brand of bricks that the builder used when building your house.

So, in short, you need to:

1. Check your emails containing PDF documents (in Gmail you can search for "filename:pdf")

2. Download the PDF document to your PC.

3. Open the PDF document using Adobe Acrobat Reader (or whatever reader you use)

4. From the menu, select File -> Properties (or Ctrl + D)

5. Copy the full "PDF Producer" line

Example producer lines:

"Adobe PDF Library 15.0"

"Adobe LiveCycle PDFG ES; modified using iText® 5.5.6 ©2000-2015 iText Group NV (AGPL-version)"
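
For anyone checking many files, the manual steps above could be approximated with a short script. The following is a minimal sketch, not a full PDF parser: it scans the raw bytes of a file for a /Producer entry, and it assumes the document information dictionary is stored unencrypted and uncompressed, which is not true of every PDF. The function name extract_producer is hypothetical. If it returns None, fall back to File -> Properties in the reader.

```python
import re
from typing import Optional

# Heuristic patterns for the /Producer entry of the document information
# dictionary. Assumption: the entry is neither encrypted nor stored inside
# a compressed object stream, and a literal string has no nested,
# unescaped parentheses.
_LITERAL = re.compile(rb"/Producer\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)
_HEX = re.compile(rb"/Producer\s*<([0-9A-Fa-f\s]*)>")


def _decode(raw: bytes) -> str:
    """Decode a PDF text string: UTF-16BE when it starts with a BOM,
    otherwise Latin-1 as an approximation of PDFDocEncoding."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def extract_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the first /Producer value found in the raw bytes, or None."""
    match = _LITERAL.search(pdf_bytes)
    if match:
        # Undo the backslash escapes for parentheses and backslashes only.
        raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
        return _decode(raw)
    match = _HEX.search(pdf_bytes)
    if match:
        digits = re.sub(rb"\s", b"", match.group(1))
        if len(digits) % 2:
            digits += b"0"  # an odd-length hex string is padded with 0
        return _decode(bytes.fromhex(digits.decode("ascii")))
    return None
```

To use it on a downloaded file, read the file in binary mode and pass the bytes in, for example extract_producer(open("invoice.pdf", "rb").read()), where invoice.pdf is a placeholder name.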

Your report should be a simple Excel file that contains, for each document:

- What kind of document is this? (for example credit card statement document, telephone bill, student score report, boarding pass etc.)

- The company that created the document (for example Wells Fargo Bank, or the name of the insurance company)

- The PDF Producer (as explained above)
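
If the findings are collected with a script rather than typed by hand, the three columns above can be written as a CSV file, which Excel opens directly. This is only a sketch of one possible layout: the column headers, the function name report_csv, and the sample row are assumptions made for illustration, not a required format.

```python
import csv
import io

# Column headers are an assumption based on the three items listed above.
FIELDS = ["Document type", "Company", "PDF Producer"]


def report_csv(rows):
    """Render (document type, company, producer) tuples as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row; the company name is a placeholder.
sample_rows = [("Telephone bill", "Example Telecom", "Adobe PDF Library 15.0")]
csv_text = report_csv(sample_rows)
```

Saving csv_text to a file with a .csv extension gives something Excel can open; the csv module quotes any producer line that happens to contain a comma.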

You will be paid only for unique documents. "Unique" means different banks, different insurance companies, different airlines: basically any company sending out PDF documents to people. Unique also means different document types within a company; for example, a bank statement and a mortgage statement are two different document types, so they count as unique even though they may come from the same bank. You will be paid for the PDF producer of every unique document you present.
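
One way to read the uniqueness rule above is that an entry is identified by its company together with its document type. The sketch below filters a list of entries under that reading; the function name unique_entries, the case-insensitive comparison, and the sample data are assumptions, and the rule could reasonably be interpreted differently.

```python
def unique_entries(rows):
    """Keep the first entry for each (company, document type) pair.

    rows holds (document type, company, producer) tuples. Names are
    compared case-insensitively after trimming whitespace, which is an
    assumption about how duplicates should be recognised.
    """
    seen = set()
    kept = []
    for doc_type, company, producer in rows:
        key = (company.strip().casefold(), doc_type.strip().casefold())
        if key not in seen:
            seen.add(key)
            kept.append((doc_type, company, producer))
    return kept


# Hypothetical entries: two document types from one bank, plus a repeat.
entries = [
    ("Bank statement", "Example Bank", "Adobe PDF Library 15.0"),
    ("Mortgage statement", "Example Bank", "Adobe PDF Library 15.0"),
    ("bank statement", "example bank ", "Adobe PDF Library 15.0"),
]
filtered = unique_entries(entries)
```

Here the bank statement and the mortgage statement both survive because they are different document types from the same company, while the repeated bank statement is dropped.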

I have attached an example screenshot of the producer line of a document.

Project ID: #29505219

1 freelancer is bidding on average $25 for this job

solutionsplayers

I am a professional data entry person with more than 4 years of experience. My skills: 1: Microsoft Word 2: Microsoft Excel 3: PowerPoint 4: PDF to Word file 5: Image to Word (100% guaranteed accuracy) Here is just a sma…

$25 USD in 2 days