Objective of program:
To scour the web one domain at a time and detect whether each domain hosts a classifieds site.
## Deliverables
Programming language:
It must run from the console on a Linux machine, so options include C, C++, Perl, Python, and a few others. I am not sure which one would be the best and fastest choice. I DO NOT want a Windows version of this. I want to run it on a Linux server somewhere with a fast internet connection so the program works as quickly as possible.
Objective
To compile a complete list of all buying and selling classified sites, message boards, forums, and bulletin boards on the internet. NO blogs. If you go to [login to view URL] and search for "miami classifieds" or "free classifieds", the results are good examples of the sites I want to gather with this program.
The program will do the following:
Increment alphabetically and numerically, one character at a time, and check each domain for the presence of a [login to view URL] [login to view URL] [login to view URL] or similar to detect whether the website is a classified site or message board.
It will use the characters a-z, 0-9 and - when generating domains. It will start with [login to view URL] [login to view URL] [login to view URL], [login to view URL], [login to view URL] ... [login to view URL] [login to view URL] etc.
It will attempt to identify the PHP, ASP or CGI software, and its version, that the classified or message board site is running, if possible.
It will create a CSV file with the output.
It will also append to a log file the current time, the current action it is performing, the last or current domain, any errors, and anything else important.
If it can automatically resume later, that would be great.
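As a rough illustration of the domain enumeration and resume behaviour, here is a minimal Python sketch. The function names, the `.com` default and the ordering (shortest names first, then a-z, 0-9, -) are assumptions made for this example, not requirements. It skips names that begin or end with a hyphen, since those are not valid hostnames.

```python
import itertools
import string

# Characters the brief allows in a domain name: a-z, 0-9 and "-".
ALPHABET = string.ascii_lowercase + string.digits + "-"


def is_valid_label(label):
    """Hostname labels may not begin or end with a hyphen."""
    return not (label.startswith("-") or label.endswith("-"))


def generate_domains(max_length, tld=".com", start=None):
    """Yield candidate domains by length, then in ALPHABET order.

    If `start` is given, every candidate before it is skipped so an
    interrupted run can continue from the last domain written to the log.
    If `start` never appears in the sequence, nothing is yielded.
    """
    skipping = start is not None
    for length in range(1, max_length + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            label = "".join(chars)
            if not is_valid_label(label):
                continue
            domain = label + tld
            if skipping:
                if domain != start:
                    continue
                skipping = False
            yield domain


if __name__ == "__main__":
    # Print the first few candidates of a resumed run.
    for name in itertools.islice(generate_domains(2, start="z.com"), 5):
        print(name)
```

Resuming here replays the sequence until it reaches the start point, which is simple but slow for long names. A real implementation would more likely compute the position directly.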
The CSV file will contain the following fields:
field #1 - type of site - classified or forum
field #2 - URL of classified site
field #3 - title bar from header of page
field #4 - software type of classified or message board
field #5 - time and date searched
field #6 - Alexa page ranking
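The six fields above could be written with Python's standard `csv` module, roughly as sketched below. The column names, the timestamp format and the sample values (including `example.com` and the software name) are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names for fields #1 to #6.
FIELDNAMES = ["site_type", "url", "title", "software", "searched_at", "page_rank"]


def append_result(stream, row, write_header=False):
    """Write one result row to an open text stream (a file or StringIO)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    if write_header:
        writer.writeheader()
    writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    append_result(
        buffer,
        {
            "site_type": "classified",
            "url": "http://example.com/",
            "title": "Miami Classifieds, Free Ads",
            "software": "unknown",
            "searched_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "page_rank": "",
        },
        write_header=True,
    )
    print(buffer.getvalue())
```

Using `csv.DictWriter` rather than joining strings by hand means page titles containing commas or quotes are escaped correctly.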
The user input for the program will include:
start point - e.g. [login to view URL] [login to view URL], so I can resume where it left off
type of site to search - classified, message board or both
timeout period (default 10 seconds) - how long to wait for a response
concurrent connections - how many queries at once
domain length - maximum number of characters in the domains to search
output file name - (default current dir/$DATE$STARTTIME) output CSV file location and name
output log name - (default current dir/log) output log file, it will just append to it
top level domain - e.g. .com, .net, .us, .org, etc.
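One possible way to take these inputs on a Linux console is through command-line flags, sketched below with Python's `argparse`. The flag names are assumptions. So are the defaults for connections and domain length, and the exact date format of the output file name. The 10 second timeout and the `log` file name come from the list above.

```python
import argparse
from datetime import datetime


def build_parser():
    """Build a command-line parser for the inputs listed above."""
    parser = argparse.ArgumentParser(
        description="Scan domains for classified sites and message boards."
    )
    parser.add_argument("--start", default=None,
                        help="domain to resume from")
    parser.add_argument("--site-type", choices=["classified", "forum", "both"],
                        default="both", help="type of site to search for")
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="seconds to wait for a response")
    parser.add_argument("--connections", type=int, default=10,
                        help="number of concurrent queries (assumed default)")
    parser.add_argument("--max-length", type=int, default=3,
                        help="longest domain name to try (assumed default)")
    parser.add_argument("--output",
                        default=datetime.now().strftime("%Y%m%d-%H%M%S") + ".csv",
                        help="output CSV file")
    parser.add_argument("--log", default="log",
                        help="log file, always appended to")
    parser.add_argument("--tld", action="append", default=None,
                        help="top level domain, repeatable, e.g. --tld .com")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    tlds = args.tld or [".com"]  # fall back to .com if none were given
    print(args, tlds)
```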
Most importantly, the program will let me change the timeout and concurrent connection count so I can set a rate that almost maxes out the computer without crashing it.
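A minimal sketch of how the two tuning knobs might work together, assuming a thread pool from Python's standard library. `fetch_homepage` and `scan_batch` are hypothetical names. The 64 KB read limit and the plain `http://` request are also assumptions. The pool size caps how many requests are in flight, and the timeout caps how long each one may block.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_homepage(domain, timeout):
    """Return the start of a domain's home page, or None on any failure."""
    try:
        with urllib.request.urlopen("http://" + domain + "/", timeout=timeout) as resp:
            return resp.read(65536).decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts and bad responses.
        return None


def scan_batch(domains, connections, timeout, probe=fetch_homepage):
    """Probe a batch of domains with at most `connections` requests at once.

    Returns a list of (domain, page) pairs in the same order as the input.
    """
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        pages = list(pool.map(lambda d: probe(d, timeout), domains))
    return list(zip(domains, pages))


if __name__ == "__main__":
    for domain, page in scan_batch(["example.com"], connections=5, timeout=10):
        print(domain, "reachable" if page is not None else "no response")
```

Because the whole batch is submitted at once, a long run would feed the scanner fixed-size batches of domains rather than the full list.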