Script to parse non-English (Marathi/Hindi) PDF to extract unicode strings

Closed Posted 7 years ago Paid on delivery

You will have to write a script to parse unicode strings out of a Marathi/Hindi PDF.

Here is the example PDF (attached as well):

[url removed, login to view]

This PDF has multiple pages. Each page has a top heading and then there are various cells arranged in tabular fashion.

For example, with this file (which is attached too):

[url removed, login to view]

I would like this file to be parsed to generate a CSV file with the following fields:

1) "Assembly No" : 197 (highlighted portion in [url removed, login to view])

2) "Part No": 152 (highlighted portion in [url removed, login to view])

3) "Section No": 1 (highlighted portion in [url removed, login to view])

4) "Section Name": "मदकर टयदडर पदळडपरळगनगर रदजगपरनगर तद. खखड जज. पपणख जपनककड 410505" (will be in unicode and it is the highlighted portion in [url removed, login to view])

5) "Epic Id": KXH1173293 (highlighted portion in [url removed, login to view])

6) "Serial No": 5

7) "House No": 69 (highlighted portion in [url removed, login to view])

8) "Age": 60 (highlighted portion in [url removed, login to view])

9) "Sex": पपरष (Will be in unicode and is the highlighted portion in [url removed, login to view])

10) "Name" : पवदर सकनन लकमण (Will be in unicode and is the highlighted portion in [url removed, login to view])

11) "Relative Name" : पवदर लकमण (Will be in unicode and is the highlighted portion in [url removed, login to view])

So the script should be invoked as follows:

python [url removed, login to view] -i [url removed, login to view] -o [url removed, login to view]
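A minimal sketch of how the `-i` and `-o` options could be handled with Python's standard `argparse` module. The function name `parse_args` and the long option names `--input` and `--output` are assumptions added for illustration; only the short flags come from the command above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv[1:]'."""
    parser = argparse.ArgumentParser(
        description="Extract records from a Marathi/Hindi PDF into a CSV file."
    )
    # -i and -o are the flags from the brief; the long names are hypothetical.
    parser.add_argument("-i", "--input", required=True,
                        help="path to the source PDF")
    parser.add_argument("-o", "--output", required=True,
                        help="path of the CSV file to write")
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of the real command line.
args = parse_args(["-i", "input.pdf", "-o", "output.csv"])
print(args.input, args.output)
```

Passing an explicit list to `parse_args` keeps the function easy to test; calling it with no argument reads the real command line.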

The generated CSV file should have all fields properly quoted and escaped, and it should include a header line.
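One way to meet the quoting and header requirement is Python's standard `csv` module, sketched below. The column names are taken from the field list above; the function name `write_records`, the output file name, and the choice of `csv.QUOTE_ALL` with UTF-8 encoding are assumptions, not part of the brief. The PDF extraction step itself is not shown.

```python
import csv

# Column order follows the field list in the brief.
FIELDS = [
    "Assembly No", "Part No", "Section No", "Section Name", "Epic Id",
    "Serial No", "House No", "Age", "Sex", "Name", "Relative Name",
]


def write_records(records, out_path):
    """Write an iterable of dicts keyed by FIELDS to a fully quoted CSV."""
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
        writer.writeheader()          # header line, also quoted
        writer.writerows(records)     # embedded quotes are doubled automatically


# Example record using the sample values quoted in the brief.
sample = {
    "Assembly No": "197", "Part No": "152", "Section No": "1",
    "Section Name": "मदकर टयदडर पदळडपरळगनगर रदजगपरनगर तद. खखड जज. पपणख जपनककड 410505",
    "Epic Id": "KXH1173293", "Serial No": "5", "House No": "69",
    "Age": "60", "Sex": "पपरष", "Name": "पवदर सकनन लकमण",
    "Relative Name": "पवदर लकमण",
}
write_records([sample], "sample_output.csv")
```

Quoting every field is stricter than necessary, but it makes the output predictable for values that contain commas, quotes, or line breaks.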

Your script should run successfully on:

[url removed, login to view]

[url removed, login to view](4).pdf

C# Programming Java Python

Project ID: #12053692

About the project

4 proposals Remote project Active 7 years ago

4 freelancers are bidding on average $184 for this job

huuloiofficial

Hi, I have very strong experience with Java. I could help you complete this project. Please let me help you. Thank you very much.

$155 USD in 3 days
(8 Reviews)
3.5
srikanthkiwi

Hi, I have worked on a similar project, parsing HTML audit reports to extract keywords. I can reuse that code for this job. I completed that project on time and on budget, with a good review. Please let me...

$222 USD in 10 days
(2 Reviews)
3.1
creativesoft3

Dear Client, Greetings of the day! Thanks for providing us the opportunity to place a bid on the project and communicate with you. I am a serious bidder here and I have already worked on a similar project befor...

$200 USD in 6 days
(0 Reviews)
0.0
arator

No problem, I can handle your task. Any language will be extracted correctly. Let's go?

$111 USD in 2 days
(0 Reviews)
0.0