Machine Learning: Document Classification

$250-750 USD

Closed
Posted over 7 years ago

Paid on delivery
We need someone to write software to help organize and name PDF files. There are 5 different documents we are looking to extract, each with its own unique characteristics. We need the software to look at a PDF bundle and:
a) recognize what kind of document each page is (i.e. classify it)
b) extract the page(s) and move the document into a folder
c) name the file using some data that is on the form
Project ID: 11901329
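
The three steps in the posting (classify each page, pull out the pages of each document, name the file from data on the form) can be sketched roughly as follows. This is only an illustration under stated assumptions: the page text is assumed to be already extracted from the PDF, and the document types, the keyword lists, the `Name:` field pattern, and the function names `classify_page` and `split_bundle` are all hypothetical, since the posting does not say what the 5 documents are or which field names the file.

```python
import re
from itertools import groupby

# Hypothetical document types and the phrases assumed to identify them.
KEYWORDS = {
    "invoice": ["invoice number", "amount due"],
    "lease": ["lease agreement", "tenant"],
}

# Hypothetical pattern for the form field used to name the file.
NAME_FIELD = re.compile(r"Name:\s*([A-Za-z ]+)")


def classify_page(text):
    """Return the type whose phrases occur most often, or 'unknown'."""
    lowered = text.lower()
    scores = {t: sum(lowered.count(k) for k in kws) for t, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def split_bundle(pages):
    """Group consecutive same-type pages into (type, page_indices, filename)."""
    labeled = [(classify_page(text), i) for i, text in enumerate(pages)]
    documents = []
    for doc_type, group in groupby(labeled, key=lambda pair: pair[0]):
        indices = [i for _, i in group]
        if doc_type == "unknown":
            continue  # pages that match no known type are skipped
        text = "\n".join(pages[i] for i in indices)
        match = NAME_FIELD.search(text)
        who = match.group(1).strip().replace(" ", "_") if match else "unnamed"
        documents.append((doc_type, indices, f"{doc_type}/{who}.pdf"))
    return documents
```

Calling `split_bundle` on a list of page strings returns one entry per detected document, with the page indices to extract and a folder/filename to write to. Note that this sketch merges two back-to-back documents of the same type into one; a real solution would need some way to detect a first page.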

About the project

14 proposals
Remote project
Active 7 yrs ago

14 freelancers are bidding on average $548 USD for this job
Hello! I'm a computer engineer and a PhD student. My research focuses on data science and machine learning. I have previously implemented prediction and recommendation systems in domains such as bioinformatics and social networks. In another project, I worked on classifying academic papers into topics based on their content. In that project, I designed a system that learned which words are strong indicators of a specific topic and classified the papers according to the relevance of those words. I think we can use a similar strategy to solve your problem. Looking forward to talking to you! - ykocak
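
The indicator-word idea in this proposal could look something like the sketch below. It is not the bidder's actual system: the `min_share` threshold, the scoring rule (count of matched indicator words), and the function names are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def learn_indicators(labeled_docs, min_share=0.8):
    """Map each label to the words whose documents fall mostly under that label.

    A word is an indicator for a label when at least `min_share` of the
    documents containing it carry that label (threshold is an assumption).
    """
    per_label = defaultdict(Counter)
    total = Counter()
    for label, text in labeled_docs:
        words = set(tokenize(text))  # count each word once per document
        per_label[label].update(words)
        total.update(words)
    return {
        label: {w for w, n in counts.items() if n / total[w] >= min_share}
        for label, counts in per_label.items()
    }


def classify(text, indicators):
    """Pick the label with the most indicator words present in the text."""
    words = set(tokenize(text))
    scores = {label: len(words & vocab) for label, vocab in indicators.items()}
    return max(scores, key=scores.get)
```

Words that appear evenly across labels never become indicators, so only the distinctive vocabulary of each document type drives the decision.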
$555 USD in 15 days
4.9 (12 reviews)
5.6
A proposal has not yet been provided
$666 USD in 10 days
4.8 (1 review)
3.2
I am a Mathfin graduate and have done such projects in academic and professional settings. I have a fair amount of experience in machine learning, especially MLPs, and in using NLP packages in Python such as NLTK and Gensim. I am proficient in Python and R and can do these implementations in rigorous detail. Let me know more about the task so that I can take a look.
$538 USD in 10 days
5.0 (1 review)
2.1
I'm an active competitive programmer. I have a lot of experience with algorithmic and machine learning tasks.
$500 USD in 10 days
0.0 (0 reviews)
0.0
My main field of interest at the moment is machine learning, and not long ago I worked intensively on extracting information from PDFs as well as editing them. Depending on the data you have, I am thinking of one of two strategies. If you have labeled PDFs that are already classified, we can use them almost immediately, with some preprocessing, to train a classifier. If not, we have to use natural language processing techniques to extract specific features that will help in the classification. I am assuming, of course, that the classification will be done on text data from the PDFs.
$500 USD in 5 days
0.0 (0 reviews)
0.0
I am a programmer working across a range of technical areas involving Web, IoT, BI and BD Analytics. I primarily work in Java and Java-related technologies, but I have also worked in a variety of other programming languages such as Python, JavaScript and Scala. My area of expertise is analytics for Web, IoT and BI, involving newer technologies for developing real-time analytics and complex algorithms over a wide variety of data dimensions. I am proficient in the open-source ML and NLP frameworks and have around 9.5 years of experience in the industry. I see that you have a challenging requirement, and an opportunity to work together would be ideal. I would like to talk to you more about this and would appreciate the opportunity to discuss how my assistance could help you reach your goals.
$666 USD in 10 days
0.0 (0 reviews)
0.0
I'm a very talented programmer with great algorithmic skills, as shown by the many programming contests I have won (they are listed in my profile description). I know neural network techniques and have used them successfully for image recognition in the past. I guess this time the PDFs are to be recognised by the text they contain, so I would use word2vec first and then apply either an RNN or a CNN, depending on which one gives better results. I could say more if I saw a few examples of the PDFs.
$500 USD in 15 days
0.0 (0 reviews)
0.0
Approach to organize (document clustering) and name PDF files:
a) Reading PDF file contents: using the Java language and the Apache PDFBox open-source tool
b) Indexing each file: using Java to index files for classification
c) Tagging document type based on contents: using the best-suited method for the data (Naive Bayes classification, Maximum Entropy classification, or probabilistic grammar classification)
d) Naming the document as per relevant content inside: Java, using the classified data
For more details, please feel free to contact me. You will like my work and dedication on this.
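
Of the methods this proposal lists for the tagging step, Naive Bayes is the simplest to illustrate. The sketch below is a minimal multinomial Naive Bayes with add-one smoothing, written in Python rather than the Java the bidder proposes, and it assumes the text has already been read out of the PDF. The class name, the tokenizer, and the choice to ignore words never seen in training are illustrative assumptions, not details from the proposal.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, labeled_docs):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training documents
        for label, text in labeled_docs:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training takes a list of `(label, text)` pairs, for example `NaiveBayes().fit(pairs)`, after which `predict` returns the most probable label for new text.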
$555 USD in 20 days
0.0 (0 reviews)
0.0
statistics - data mining / machine learning, pattern recognition, neural network, random forest, ridge regression, lasso, nearest neighbor, cluster analysis, multiple linear regression, logistic regression, logit / probit transformation, spline, kernel smoother, nonparametric statistics, support vector machines, cross-validation, model selection, principal component analysis ( PCA ), canonical correlation analysis, Monte Carlo, variance reduction, antithetic / importance sampling, covariate, Markov Chain Monte Carlo, EM algorithm, Gibbs sampler, Metropolis - Hastings, Bayes rule, Bayesian Statistics, conjugate prior, posterior distribution, conditional expectation, multivariate distribution,
$555 USD in 10 days
0.0 (0 reviews)
0.0

About the client

Carrboro, United States
5.0
11
Payment method verified
Member since Dec 18, 2014
