Discussion forum post scraper

$30-250 USD

Completed
Posted 7 months ago

Paid on delivery
Post Scraper is a Python based web scraper that can scrape the text-only data of online discussion forums relating to a list of apps and save that data to MySQL database, with keywords and sentiment analysis.

The purpose is to create a table where each row contains:
1) Name of the app
2) URL crawled
3) Date when the topic was started within the discussion forum
4) Scraped text-only content (i.e. the online discussion forum comments from users commenting about the app)
5) Extracted keywords
6) Overall sentiment of the discussion, e.g. positive, neutral or negative

The implementation shall be based on Python’s Beautiful Soup, KeyBERT for the keyword extraction and VADER or other similar library for sentiment analysis. Note: All the used libraries must be such that can be used locally, i.e. nothing remote API based. If the discussion forum requires JavaScript to work, or it is protected against scraping, then such forum does not need to be supported (i.e. no Selenium).

The data analysis of the scraped forum content must be generic, so it will support all of the example forums (see below), but also other forums that use the same forum software (e.g. phpBB, vBulletin or XenForo).

When saving the text content, i.e. the discussion relating to the apps, only the first page of the discussion shall be saved (i.e. the sub pages of discussion are not crawled or saved), and the text content of the discussion shall be saved without any html or bbCode tags, and without username or other metadata.

When determining whether a discussion is related to any apps within the apps table, only the topic (headline) of the discussion shall be analyzed and all topics without any app name matches shall be ignored and not analyzed.

The Python script must work when run from Windows and from Linux host. With the final delivery, please also include the pip etc calls required to install all used Python libraries, and SQL calls to create all the used MySQL tables if different from below.
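The topic-matching rule (only the headline is compared against the app names) could be sketched roughly as follows. This is a minimal illustration, not the delivered implementation: the function names `build_app_pattern` and `match_apps` are hypothetical, the app list is hard-coded here instead of being read from the apps table, and case-insensitive whole-name matching is an assumption the posting does not spell out. Names such as "Notepad++" contain regex metacharacters, so each name is escaped, and lookarounds are used instead of `\b` because `\b` does not behave as a word boundary after a `+`.

```python
import re

# Test rows listed in the posting for the apps table (hard-coded for this sketch).
APP_NAMES = [
    "Notepad++", "Revo Uninstaller", "Firefox", "Asana", "Google Chrome",
    "uTorrent", "Spotify", "Pinta", "Adobe Photoshop",
]


def build_app_pattern(app_names):
    """Compile one case-insensitive pattern that matches any whole app name."""
    # Longest names first, so a longer name wins over a shorter one
    # that starts at the same position.
    ordered = sorted(app_names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # (?<!\w) and (?!\w) require that the match is not glued to other
    # word characters, e.g. "Pinta" must not match inside "pintail".
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def match_apps(topic, pattern, app_names):
    """Return the app names found in a topic headline, in order of appearance."""
    canonical = {name.lower(): name for name in app_names}
    found = []
    for match in pattern.finditer(topic):
        name = canonical[match.group(0).lower()]
        if name not in found:
            found.append(name)
    return found
```

A topic for which `match_apps` returns an empty list would be skipped without fetching or analyzing its content.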
Please discuss and agree with me any changes to the database table structures.

Suggested database structure

I suggest the following MySQL database tables:

Post_scraper_input_urls shall contain the high level starting URLs:

CREATE TABLE `post_scraper_input_urls` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `url` VARCHAR(512),
  UNIQUE KEY `url-idx` (`url`),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

For testing purposes, we shall assume this table contains rows:
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]

Post_scraper_input_apps shall contain the names of apps:

CREATE TABLE `post_scraper_input_apps` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `app` VARCHAR(512),
  UNIQUE KEY `app-idx` (`app`),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

For testing purposes, we shall assume this table contains rows:
Notepad++
Revo Uninstaller
Firefox
Asana
Google Chrome
uTorrent
Spotify
Pinta
Adobe Photoshop

Post_scraper_queue shall contain all the URLs. This table is used to store the queue of not yet crawled URLs to be crawled in the future (i.e. `crawled` = NULL), and all the already crawled URLs in order not to crawl these URLs again (i.e. `crawled` NOT NULL).

CREATE TABLE `post_scraper_queue` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `url` VARCHAR(512),
  `updated` TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `crawled` TIMESTAMP DEFAULT NULL,
  UNIQUE KEY `url-idx` (`url`),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Post_scraper_data shall contain the results of the scraping.

CREATE TABLE `post_scraper_data` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `url` VARCHAR(512),
  `updated` TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `post_content` TEXT,
  `post_date` DATE,
  `post_sentiment` INT,
  `post_keywords` VARCHAR(512),
  UNIQUE KEY `url-idx` (`url`),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Note: The post_sentiment here is defined as INT.
I’m suggesting that we define that if post_sentiment < 0 then the sentiment of the discussion relating to this app is negative, if post_sentiment >= 0 AND post_sentiment <= 100 then discussion is neutral and if post_sentiment > 100 then discussion is positive. However, if you wish to save the sentiment in some other format, that is also possible, but please confirm with me before doing so.

The post_keywords shall be the list of extracted keywords, separated by comma.

Please implement hard coded minimum and maximum lengths of post_content. Let’s define MIN_POST_CONTENT = 200 and MAX_POST_CONTENT = 6000 characters. The post_content shall contain the text only content without any html or other formatting tags, and without any metadata (e.g. usernames, post times, user signatures) of the entire discussion’s first page.
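The suggested sentiment thresholds and the two length constants could be expressed along these lines. This is only a sketch under stated assumptions: the helper names `sentiment_label` and `prepare_post_content` are hypothetical, and the posting does not say what should happen to content outside the limits, so skipping content shorter than MIN_POST_CONTENT, truncating content longer than MAX_POST_CONTENT, and collapsing whitespace before measuring are all assumptions. How a VADER score would be converted to the INT value is left open here, since the posting does not define it.

```python
MIN_POST_CONTENT = 200   # characters, as defined in the posting
MAX_POST_CONTENT = 6000  # characters, as defined in the posting


def sentiment_label(post_sentiment: int) -> str:
    """Map the stored INT to a label using the thresholds suggested in the posting."""
    if post_sentiment < 0:
        return "negative"
    if post_sentiment <= 100:
        return "neutral"
    return "positive"


def prepare_post_content(text: str):
    """Normalize whitespace and apply the length limits.

    Assumption: content below the minimum is skipped (None is returned)
    and content above the maximum is cut to MAX_POST_CONTENT characters.
    """
    cleaned = " ".join(text.split())  # collapse runs of whitespace and newlines
    if len(cleaned) < MIN_POST_CONTENT:
        return None
    return cleaned[:MAX_POST_CONTENT]
```

Returning `None` lets the caller decide whether a too-short discussion is dropped or still recorded in the queue as crawled.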
Project ID: 37352410

About the project

20 proposals
Remote project
Active 7 mos ago

Awarded to:
I will send you a demo of your project before you choose me, please message me for details. I understand that you are looking for a post scraper that can scrape the text-only data of online discussion forums relating to a list of apps and save that data to MySQL database with keywords and sentiment analysis. My Python based web scraper Post Scraper can do exactly that. It can also handle forums that require JavaScript to work or are protected against scraping. With my help, you will be able to prevent any possible issues with the project's timely completion. My team and I have worked on many projects like this in the past, so you can rest assured that we will provide outstanding service while keeping your deadline relaxed if necessary. We guarantee quality work at competitive pricing while providing thorough communication throughout the process.
$100 USD in 7 days
5.0 (2 reviews)
0.0
20 freelancers are bidding on average $326 USD for this job
Hi, I'm an expert in web scraping. I have read your project description and I can scrape the text-only data of online discussion forums. I have worked on projects like yours and I think I can finish more quickly than others; it will take me less than 1 day. I await your reply and can start work now. Thanks
$250 USD in 1 day
5.0 (38 reviews)
6.2
Hi there,
★★★ Scraping / Python / Selenium Expert ★★★ 10+ Years of Experience ★★★
I've read the requirements and am ready to scrape the text-only data of online discussion forums. Some major works we do:
✔️ Product website scraping: eCommerce (Shopify, eBay, Amazon, AliExpress, etc.)
✔️ Advertising or ad-posting sites (Gumtree, Rightmove, Asian metal, etc.)
✔️ Login-required websites
✔️ Automated tasks (automatic download of files from any website: PDF, CSV, etc.)
✔️ Software and script development
✔️ With browser load and without browser load
✔️ Web research and data entry work as well
✔️ Scraped data stored in multiple forms: SQL Server, CSV, Excel and many more, like JSON
We use different tools and techniques to scrape websites depending on the kind of task, but mainly Python, C# desktop applications, Selenium and Html Agility. As the requirements of the project are not complete and need discussion, I have placed a tentative bid. To provide you the best quote, let's have a chat! Best Regards, TechPlus Team
$2,500 USD in 30 days
4.4 (25 reviews)
5.9
Dear Client, I am thrilled to express my interest in your web scraping and automation project. I am confident in my ability to deliver outstanding results that align with your requirements.
My services:
1: Scraping API, HTML or JavaScript-rendered web pages using Python requests, Scrapy, Selenium and bs4.
2: Fast scraping without blocking, using multi-threading, aiohttp/asyncio, proxy combining, reCAPTCHA bypass and stealth browsers.
3: Console or GUI-based scrapers using Python Tkinter or PyQt5.
4: Real-time stream data scraping via client WebSocket.
5: Notifications to users on several conditions while monitoring, using webhooks.
6: Storing into CSV or Google Sheets, or uploading into a DB such as MySQL, MongoDB or PostgreSQL.
7: Web app that shows scraping or live monitoring results.
8: Scheduled scraper runs with cron.
I am excited about the opportunity to contribute to your project's success. Looking forward to the possibility of working together. Best Regards, Manoj
$220 USD in 5 days
5.0 (10 reviews)
5.1
Hello, My name is Narendra and I am a professional with extensive experience in developing web applications. I have worked on a variety of projects, including travel portals with real-time notifications, supply chain management tools, inventory management software, and social media portals similar to LinkedIn and Facebook.
$140 USD in 7 days
5.0 (6 reviews)
3.6
Dear Client, I am writing to express my keen interest in undertaking the project to develop a Python-based web scraper, Post Scraper, for your specified requirements. I have a strong background in Python programming and extensive experience in web scraping, data analysis, and database management, making me well-suited for this complex and challenging task. I am eager to discuss the project details further and address any additional considerations you may have. Your satisfaction is my priority, and I am dedicated to ensuring that the final solution aligns perfectly with your expectations. Thank you for considering my proposal. I look forward to the opportunity to collaborate with you and contribute my expertise to the successful implementation of the Post Scraper application. Best regards, Lalit
$220 USD in 7 days
5.0 (16 reviews)
3.7
I am an artificial intelligence expert with more than 12 years of company work experience, deep experience, and strong abilities in various fields of artificial intelligence such as computer vision, machine learning, deep learning, and Image processing(OpenCV, YOLO, SSD, OCR, CNN, RNN). Your project matches my role and I have sufficient ability to complete your project perfectly in a short time. Extensive experience in implementing advanced machine learning algorithms and neural networks using Python libraries such as TensorFlow, Keras, PyTorch, and OpenCV to build powerful AI-driven applications. Proficient in developing predictive models, recommendation systems, and natural language processing algorithms. Hands-on experience with cutting-edge machine learning techniques such as transfer learning, deep learning, and reinforcement learning, and expertise in data exploration, feature engineering, and model selection.
$540 USD in 7 days
4.8 (6 reviews)
2.5
Hello, client. I have read your requirements. I'm a developer with years of experience in scraping and web development. During this time, I gained experience in scraping several kinds of websites. One of my projects involved scraping a ticket-selling website and saving the list of tickets in CSV format at a specific time every day. The project involved passing CAPTCHA and Cloudflare bot detection. I analyzed the network communication and website structure and passed the CAPTCHA with only one line of JavaScript code. The project was successful and I gave my customer a good experience. You can be sure of a good result if you hire me. I hope to discuss the project with you in more detail. Thank you
$200 USD in 7 days
5.0 (1 review)
2.6
Hi there, Are you looking for a skilled Python developer to create a powerful web scraping tool for your project? Look no further! I'm Faisal, a top-rated freelancer and one of the leading Python developers on Freelancer. I present a solution tailored to your needs: a Python-based web scraper named Post Scraper. This advanced tool extracts text-only data from online discussion forums related to a list of apps, saving it to a MySQL database with keywords and sentiment analysis. The implementation utilizes Python’s Beautiful Soup for parsing, KeyBERT for keyword extraction, and VADER for sentiment analysis – all available for local use. Now, let’s talk specifics. The scraper will create a table with essential details: the app’s name, crawled URL, discussion start date, scraped text content, extracted keywords, and overall sentiment (positive, neutral, or negative). To ensure efficiency, the script will analyze only the discussion topics related to the apps without considering sub-pages. Additionally, I'll implement defined character limits for post content (between 200 and 6000 characters) and avoid saving HTML tags or metadata. Ready to transform your web scraping needs into reality? Send me a message here, and let’s bring Post Scraper to life together. I'm confident in delivering a robust, reliable, and user-friendly solution tailored to your exact requirements. Looking forward to collaborating with you on this exciting project! Regards, Faisal
$250 USD in 7 days
5.0 (2 reviews)
2.6
I understand you are looking for a post scraper that can extract text-only data from online discussion forums relating to a list of apps and save it to MySQL database with keywords and sentiment analysis. I believe I am the perfect fit for this job due to my extensive experience in Python, web scraping, databases and software development. I have 6 years of experience in developing custom websites and web applications and have worked on many large projects. My expertise includes working with APIs such as JavaScript, PHP and MySQL. This makes it easy for me to work on the project remotely if required. I also have skills in React.js + React Native (Redux, Material UI) which makes it easy for me to support Discussion Forums that require JavaScript functionality or protect against scraping.
$300 USD in 5 days
5.0 (1 review)
1.4
I am pleased to introduce you to Post Scraper, a Python-based web scraper that can scrape the text-only data of online discussion forums relating to a list of apps and save that data to MySQL database. The purpose is to create a table where each row contains the following information: 1) Name of the app - The name of the app that was mentioned in the forum post 2) URL crawled - The page on the website that was scraped as part of the project 3) Date when the topic was started within the discussion forum - The date when the topic was created in the forum. This will help us detect broken links in future if any 4) Scraped text-only content (online discussion forum comments from users commenting about the app) - All the text-only data from the website has been scraped and saved into MySQL database. This includes html tags and bbCode but no username or other metadata is included in the text-only content. 5) Extracted keywords - Keywords related to apps have been extracted from discussions and saved as separate columns in MySQL database 6) Overall sentiment of the discussion - Overall sentiment of discussions have also been analyzed and
$140 USD in 4 days
0.0 (0 reviews)
0.0
I understand that you are looking for a skilled Python developer to develop a Post Scraper. I believe I am the perfect fit for this task as I have extensive experience in developing web applications using Python, as well as scalability concerns when using Python on large scale projects. My skillset includes knowledge of Beautiful Soup, KeyBERT for keyword extraction and VADER or other similar library for sentiment analysis. Additionally, all the used libraries must be such that can be used locally i.e. nothing remote API based. Additionally, I am available to work on this project remotely if needed as I use Screen Connections to connect to my server desktops from any location. With my expertise in software development and technical optimization, I am confident that I can deliver an exceptional product quality while keeping the project schedule on track. Please feel free to contact me if you would like to discuss further or need any additional information regarding my skillset or capabilities in general.
$100 USD in 2 days
0.0 (0 reviews)
0.0
Hey, Greetings..! I am well-versed in Python and data scraping using Beautiful Soup, along with sentiment analysis using VADER and keyword extraction with KeyBERT. I understand the need for local library usage and the specific database structure you've outlined. I will ensure the script works seamlessly on both Windows and Linux hosts. I am also open to discussing any necessary adjustments to the database structure. My proposed structure aligns with your requirements: Post_scraper_input_urls: To store starting URLs. Post_scraper_input_apps: To contain the list of apps. Post_scraper_queue: To manage the crawling queue. Post_scraper_data: To save the scraped data with sentiment and keywords. Please let me know if you have any specific preferences or questions. Best regards, Parminder
$140 USD in 7 days
0.0 (0 reviews)
0.0
Hi there! My name is chandramohan and I am an experienced data scientist with strong math background. I am confident that I can help you make the best use of your discussion forum post scraper project. I understand that you are looking for a way to scrape the text-only data from online discussion forums relating to a list of apps and save that data to MySQL database with keywords and sentiment analysis. Specifically, you are looking for a way to create a table where each row contains: 1) Name of the app, 2) URL crawled, 3) Date when the topic was started within the discussion forum, 4) Scraped text-only content (i.e. the online discussion forum comments from users commenting about the app), 5) Extracted keywords, 6) Overall sentiment of the discussion.
$140 USD in 7 days
0.0 (0 reviews)
0.0
I'm Δημήτριος from Webinfamous, and I understand you are looking for a post scraper to collect text-only data from online discussion forums relating to a list of apps and save it to MySQL database with keywords and sentiment analysis. I believe that my skillset is the perfect fit for this project - specifically my background in web scraping, software engineering and architecture. I have extensive experience in web scraping, software engineering and architecture which will be invaluable when it comes to developing Post Scraper. My team has a wealth of knowledge in the field of web development, software engineering and architecture which will ensure that your project is delivered on time and with maximum quality. Additionally, we guarantee communication channel with a representative from our team so that you can get any reports or status updates on the project whenever necessary. Lastly, we also offer hire-and-forget partnership whereby you don't need to worry about us anymore once the project has been completed - we'll take care of all
$200 USD in 7 days
0.0 (0 reviews)
0.0

About the client

Braga, Portugal
5.0
719
Payment method verified
Member since Mar 16, 2011

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)