Analyse and refine a CSV file
$30-5000 USD
Paid on delivery
Analyse and refine a large CSV file so that the end output is a collection of Excel files that are meaningful and manageable.
An application is required to perform the above on any CSV file that follows a defined structure.
## Deliverables
Take any CSV file with 16 columns and approx 1,000,000 rows (see attached).
The columns are as follows...
a) SPORTS_ID
b) EVENT_ID
c) SETTLED_DATE
d) FULL_DESCRIPTION
e) SCHEDULED_OFF
f) EVENT
g) DT ACTUAL_OFF
h) SELECTION_ID
i) SELECTION
j) ODDS
k) NUMBER_BETS
l) VOLUME_MATCHED
m) LATEST_TAKEN
n) FIRST_TAKEN
o) WIN_FLAG
p) IN_PLAY (IP - In-Play, PE - Pre-Event, NI - Event did not go in-play)
Remove all rows where SPORTS_ID (column A) is not equal 1 (i.e. > 1). This should leave you with 609005 rows including header.
Sort the file alphabetically by EVENT (column F) and save a file for each of the following distinct Events (any other Events should be ignored):
1) 1st Goal - 5812 rows including header
2) 2nd Goal - 2699 rows including header
3) 3rd Goal - 2063 rows including header
4) Correct Score - 131893 rows including header
5) Draw No Bet - 4019 rows including header
6) Half Time - 27926 rows including header (see attached)
7) Half Time Score - 48057 rows including header
8) Half Time Full Time - 17799 rows including header
9) Match Odds - 100891 rows including header
10) Next Goal - 6192 rows including header
11) Odd Or Even - 184 rows including header
12) Over One And A Half Goals - 36448 rows including header
13) Over Two And A Half Goals - 57457 rows including header
14) Over Three And A Half Goals - 31816 rows including header
15) Penalty Taken - 653 rows including header
16) Sending Off - 1999 rows including header
Take the saved Half Time file (#6) saved above and sort alphabetically by FULL_DESCRIPTION (column D). Extract all those rows that begin 'English Soccer/Barclays Premier League' and save as a new file.
The new file ([login to view URL]) will include 1591 rows including header (see attached).
For each game order by SELECTION, LATEST_TAKEN (Oldest to Newest) and FIRST_TAKEN (Oldest to Newest).
For each of the 3 possibilities in a game extract the row whose LATEST_TAKEN value is closest to the DT ACTUAL_OFF value and has an IN_PLAY value of PE. Taking Arsenal vs West Ham as an example the extracted rows would be as follows:
SELECTION ODDS NUMBER_BETS VOLUME_MATCHED LATEST_TAKEN FIRST_TAKEN WIN_FLAG IN_PLAY
Arsenal (HT) 1.55 2 386.56 30/10/2010 14:00 30/10/2010 14:00 0 PE
The Draw (HT) 3.5 11 1165.66 30/10/2010 13:59 30/10/2010 13:36 1 PE
West Ham (HT) 12.5 5 31.2 30/10/2010 13:55 30/10/2010 13:03 0 PE
Save a new file ([login to view URL]) where we only have 3 rows as above per match.
For each of the 3 possibilities in a game extract the row whose ODDS value are less than 2.01. This should return zero or 1 rows per game. In the current example the row where the odds value are 1.55 will be extracted. Save a new file ([login to view URL]) where we have no more than 1 row as above per match. See attached for how files should be refined.
There may be further work required on some other files saved above if this project proves successful.
Project ID: #3854575