Closed

Extract Non-Dynamic Text From Webpage

We want to compare three versions of a web page to extract the non-changing text (the article body). Dynamic content such as ads and widgets makes this a hard problem, because it changes between loads and produces false positives when we detect content changes.

Our approach is therefore to visit a page three times and exclude all text that changes between refreshes, leaving the article content. In production this will run on millions of different sites, so site-specific footprints (such as extracting content under a certain tag) can't be used. It should work for any web page.
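
As a rough illustration of the three-visit idea, here is a minimal Python sketch using only the standard library. It assumes the three HTML snapshots have already been fetched, splits each into visible text chunks, and keeps the chunks whose hashes appear in every snapshot. The names `TextBlockParser`, `text_blocks`, `block_hash` and `stable_text` are hypothetical, and comparing 8-byte hashes instead of full strings is just one assumed way to keep memory low. Note that static boilerplate such as navigation text would also survive this filter; it removes only text that changes between loads.

```python
import hashlib
from html.parser import HTMLParser


class TextBlockParser(HTMLParser):
    """Collects visible text chunks, skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def text_blocks(html):
    """Return the visible text chunks of one HTML snapshot, in page order."""
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def block_hash(text):
    """Short fixed-size digest, so only 8 bytes per chunk are kept in the sets."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()


def stable_text(versions):
    """Keep the chunks of the first snapshot that appear in every snapshot."""
    first, *rest = versions
    first_blocks = text_blocks(first)
    common = {block_hash(b) for b in first_blocks}
    for html in rest:
        common &= {block_hash(b) for b in text_blocks(html)}
    return [b for b in first_blocks if block_hash(b) in common]
```

Calling `stable_text([html1, html2, html3])` returns the surviving chunks in the order they appear in the first snapshot.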

It sounds simple, but we need a very low memory footprint, as this will run on millions of web pages. The script should return the non-changing text from the page's HTML, and include a comparison function that measures how much that text differs from other versions of the page.
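
The comparison function could be sketched as follows, assuming each page version has already been reduced to a list of stable text chunks. This version uses Python's standard `difflib.SequenceMatcher`; the name `change_ratio` and the 0-to-1 scale are assumptions for illustration, not a requirement from the brief.

```python
from difflib import SequenceMatcher


def change_ratio(old_blocks, new_blocks):
    """Return 0.0 for identical chunk lists, approaching 1.0 as they diverge."""
    # autojunk=False disables difflib's popularity heuristic, so repeated
    # chunks on long pages are still compared.
    matcher = SequenceMatcher(None, old_blocks, new_blocks, autojunk=False)
    return 1.0 - matcher.ratio()
```

Comparing whole chunks rather than characters keeps the work proportional to the number of chunks, at the cost of treating a one-word edit inside a paragraph as a changed paragraph.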

Explain your approach and how it will be faster than any we can think of, and whether there are any PHP libraries you can use to help.

Skills: HTML, JavaScript, PHP, Python, Web Scraping

About the Employer:
( 35 reviews ) London, United Kingdom

Project ID: #18044447

15 freelancers are bidding on average $136 for this job

mhmhz

Hi, I can work on a desktop scraper that can run on a Windows VPS. Will you always compare the home page? Thanks

$300 USD in 1 day
(154 Reviews)
7.6
mantislin

Hi sir, this is Lin and I am a scraping expert; please check my reviews and you will see. Can we discuss more details about this project? Thanks, Lin

$155 USD in 3 days
(398 Reviews)
7.7
polarjin2017

Hello, how are you? I have good skills in this type of job, like scrapp.......... So I can complete your job on time. Hope to work with you. Thank you.

$155 USD in 3 days
(82 Reviews)
6.6
leoe

Hi, I'd like to work with you. This project seems to be a challenge, and I love challenges. Please provide me the URL of the site and I'll start with a demo before you choose a coder. Thanks. Leo

$100 USD in 2 days
(114 Reviews)
6.7
stead121

Hi, sir. I am Li G from China. I am so happy to see your serious project. I saw your job description and I am interested in your project. As you can see from my reviews, I am a talented web scraping expert who has rich...

$111 USD in 3 days
(24 Reviews)
5.5
meeshal1994

Hello, can you please explain the actual use of the system? I mean, why would you like to extract text from millions of websites? This may help me think of a solution. Now, for low memory footprint we always us...

$40 USD in 1 day
(33 Reviews)
5.1
FedericoRiva

Hello! I'm interested in your project. It seems interesting. I know how to scrape using Python (Selenium, BeautifulSoup and GET requests) or Bash (a more static approach, but faster than Python). If you're interes...

$111 USD in 3 days
(11 Reviews)
3.6
SmithZhang

Hi there. I am an experienced web developer and web scraping expert. I read your job description, and I am interested in your project. I am a new freelancer here, but I have good experience in web scraping using...

$111 USD in 3 days
(6 Reviews)
3.3
itking1234

Hi there! As a web scraper with 5 years of experience, I'm sure I can do your job. I have scraped over 4700 websites to get information like opening time, location, email, shop name and so forth according to zip cod...

$150 USD in 3 days
(11 Reviews)
3.5
bestsolz1

Hi, you want to exclude dynamic text from web pages, where ads create conflicts. Please come on chat for more [login to view URL] we discuss briefly and finalize it. Note: We are not starter just take start on...

$155 USD in 3 days
(8 Reviews)
2.8
supriyasaini890

Hello! Hope you are doing great! Before going into depth, let me ask you some basic questions so that I can consult with our team and deliver you the best possible product. 1. What will be the payme...

$30 USD in 3 days
(1 Review)
0.0
MelissaFumero

With my 8 years of experience, I can easily handle this project. Give me a chance to complete it. Thanks

$45 USD in 3 days
(0 Reviews)
0.0
DevPrateek

Hello sir, I have more than 5 years of development experience, including PHP, Python and Java. I can help you with the task you mentioned. We can do it easily with a Python script. Please hire me and We...

$160 USD in 3 days
(0 Reviews)
0.0
sunil02324

I am going to use Python and BeautifulSoup to extract the content from the web page and remove the dynamic content from it.

$200 USD in 1 day
(0 Reviews)
0.0
DevAtPlay

Since you need a general scraper, there isn't a very reliable method of doing what you wish to do in all cases, especially under the budget requirements. But here are some approaches that might work: Use the block li...

$222 USD in 10 days
(0 Reviews)
0.0