Write a standalone Python program that automates scraping data from different web URL sources and aggregates the content as it is updated. The program will launch one or more spiders, each with a specific name, URL, table, class title, href, description, and category. Once the data is aggregated into a list, we want to make text edits. Next, the edited output will be written to JSON files, with Google Sheets as a backup. The data is then presented by category, in a new list area on a Flask web page that will be published online. Test URLs will be provided. The coder may only use PyQt5 or Tkinter for the GUI. We use Anaconda (Python, 64-bit) with Sublime Text 3 or Atom, so the program must also work in that setup.
The program is specifically for when the content structure changes. Must be able to launch a new spider with the category name and save it as a mini Python script named after the category. Then, we must be able to change the URL and data fields to scrape another source. Must be able to run multiple scripts (mini-programs) to scrape.
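As a rough illustration of what we mean by saving a spider as a mini script, here is a minimal sketch. The names `SpiderConfig`, `save_spider_script`, and the `spiders` folder are hypothetical, and the generated script only prints what it would scrape; the real scraping logic is up to the coder.

```python
import pprint
import re
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SpiderConfig:
    # Field names follow the list in the brief; the exact names are an assumption.
    name: str
    url: str
    table: str
    class_title: str
    href: str
    description: str
    category: str


# Hypothetical template for the generated mini script.
SCRIPT_TEMPLATE = (
    "# Mini spider script for category: {slug}\n"
    "CONFIG = {config}\n"
    "\n"
    'if __name__ == "__main__":\n'
    '    print("Would scrape", CONFIG["url"])\n'
)


def category_slug(category):
    """Turn a category name into a safe file name stem, e.g. 'Local News' -> 'local_news'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", category).strip("_").lower()


def save_spider_script(config, out_dir="spiders"):
    """Write one mini script per category and return its path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    slug = category_slug(config.category)
    script_path = folder / f"{slug}.py"
    body = SCRIPT_TEMPLATE.format(slug=slug, config=pprint.pformat(asdict(config)))
    script_path.write_text(body, encoding="utf-8")
    return script_path
```

To scrape another source, only the URL and data fields in the generated `CONFIG` would need to change.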
** Budget $100 Max
** Timeline to complete - deadline 7 days max
** Auto responses will be deleted - Use the word "MiniPy" when you bid.
1. Scrape every five days by default (configurable interval in days: 5, 10, 15, 20, etc.)
2. collect data
a. Use 5 random headers, and 5 random user agents
b. Rest and close after each run, once the data is collected.
3. parse or format data into:
a. JSON data to be used
b. Google Sheet (backup)
4. Eliminate duplicates for all spider data collected
5. Edit - Add replacement words in data for all spiders
6. Output the data in two formats
a. Format #1: output the formatted data, with replacement words, to a JSON web file
b. Format #2: output the formatted data, with replacement words, to a Google Sheet by category
7. Web List
a. by category
b. is dynamic content
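To make steps 2a, 4, 5, and 6a more concrete, here is a minimal sketch using only the standard library. All function names, the placeholder header and user-agent values, and the assumption that each scraped item is a dict with `title`, `href`, `description`, and `category` keys are ours for illustration only. Google Sheets output and the Flask page are not shown.

```python
import json
import random
from collections import defaultdict

# Placeholder values only; real user agents and headers would be supplied.
USER_AGENTS = [f"MiniPy-placeholder-agent-{i}" for i in range(1, 6)]
EXTRA_HEADERS = [
    {"Accept-Language": "en-US,en;q=0.9"},
    {"Accept-Language": "en-GB,en;q=0.8"},
    {"Accept": "text/html"},
    {"Accept": "text/html,application/xhtml+xml"},
    {"Cache-Control": "no-cache"},
]


def pick_headers():
    """Step 2a: combine one random header set with one random user agent."""
    headers = dict(random.choice(EXTRA_HEADERS))
    headers["User-Agent"] = random.choice(USER_AGENTS)
    return headers


def dedupe(items):
    """Step 4: drop items whose (title, href) pair was already seen."""
    seen = set()
    unique = []
    for item in items:
        key = (item["title"], item["href"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def apply_replacements(items, replacements):
    """Step 5: substring replacement in the title and description fields."""
    edited = []
    for item in items:
        new_item = dict(item)
        for field in ("title", "description"):
            for old, new in replacements.items():
                new_item[field] = new_item[field].replace(old, new)
        edited.append(new_item)
    return edited


def to_json_by_category(items):
    """Step 6a: group the edited items by category and serialize to JSON."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["category"]].append(item)
    return json.dumps(grouped, indent=2)
```

The grouped JSON is the kind of structure the web list in step 7 could read to show items by category.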
scheduling python script
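For the scheduling setting in step 1, a minimal sketch of the date logic might look like the following. The function names are hypothetical, and restricting the interval to multiples of five is our reading of "5, 10, 15, 20 etc"; how the check is triggered (cron, Task Scheduler, a loop in the GUI) is left open.

```python
from datetime import datetime, timedelta


def next_run(last_run, interval_days=5):
    """Return the next scrape time; the interval is assumed to be a multiple of 5 days."""
    if interval_days <= 0 or interval_days % 5 != 0:
        raise ValueError("interval_days must be a positive multiple of 5")
    return last_run + timedelta(days=interval_days)


def is_due(last_run, now, interval_days=5):
    """True once the configured number of days has passed since the last run."""
    return now >= next_run(last_run, interval_days)
```

For example, `is_due(datetime(2024, 1, 1), datetime(2024, 1, 6))` is true with the default 5-day setting.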
a. Only provide examples of your WordPress work
b. We will supply screenshots/images and files to coders we are interested in working with.
c. Budget requirements: no more than $100 will be accepted to complete this project.
d. You must have more than 10 successful projects to be considered for this project, along with favorable ratings.
e. Project timeline - coder's deadline is 7 days max
*** Bids over the budget will be automatically deleted.