C++ data compression and mini-expression parser library

$30-100 USD

Cancelled
Posted over 15 years ago

Paid on delivery
I am looking for an experienced C++ developer with experience in data compression technologies. This project is a proof of concept.

The project involves creating a C++ library to create, read and manipulate highly compressed binary files. A seeking mechanism (e.g. fseek, gzseek) will ideally be used to randomly access rows in the compressed file, without reading the entire file into memory. Rows in the compressed binary files are fixed width, so column names (i.e. struct field names) can be used to query records in the file. To that end, a minimal expression parser is required to fetch records from the compressed file using a minimal syntax, similar to that used in a WHERE clause.

## Deliverables

The requirements of this project are:

REQ-1: Creation of a data compression library for reading/writing data into a MAXIMALLY compressed binary file. Specifically, this entails the following:

i). Retrieving subsets of data from the compressed file by specifying criteria. The retrieval is to be carried out using fseek/gzseek etc., i.e. without loading the entire compressed file into memory
ii). Appending data to the compressed binary file
iii). Updating existing data in the compressed binary file
iv). Inserting new data in the compressed binary file
v). Deleting data in the compressed binary file that matches specified criteria

REQ-2: Creation of a minimalist expression parser to help query records stored in the compressed file (see querying section below)

REQ-3: Creation of a command line console program that interfaces with the library to create and/or manipulate a specified compressed binary file. Specifically, the command line program will allow the following:

i). Retrieval of data (i.e. records) from a specified compressed file into another file (say a specified CSV file), using criteria (see querying section)
ii). Appending data (i.e. records) from a specified source (say a CSV file) into a specified compressed binary file
iii). Updating data existing in a specified compressed binary file with data from a specified source file (say a CSV file)
iv). Insertion of new data (i.e. records) from one source (say a specified CSV file) into a specified compressed binary file
v). Deletion of data (i.e. records) that match specified criteria, from a specified compressed file
vi). Utility function for re-sorting the data in a specified compressed file and recompressing the file. During this function call, binary trees or hash tables can be regenerated to speed up access.

The detail of the compressed binary file is as follows:

1). Header section: This contains summary data about the file contents. Additionally, since the data in the compressed file will often be searched, it may be necessary to maintain a binary tree/hash table which indexes the records stored in the data section. The hash key, for example, can be computed from row column values.
2). Data section: This contains fixed width rows (i.e. C++ structs). The row struct has several fields. Two of the most important are the date and time stamp fields. All the rows in a compressed file are sorted in ascending order of timestamp.

Maximal lossless compression must be used, as it is expected that the compressed files could eventually contain several tens of millions of records, so maximum compression is required to keep the files small. At the same time, it must be quick to search and retrieve records from the compressed file, without loading the entire compressed file into memory.
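One possible way to reconcile strong compression with random access is to compress the rows in independent blocks and keep a small index of those blocks in the header section. The sketch below is only an illustration under that assumption and is not part of the specification: `BlockIndexEntry` and `find_block` are hypothetical names, timestamps are assumed to be stored as 64-bit integers, and no compression library is called. It shows how a binary search over the index can pick the single block that needs to be read for a given timestamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical index entry kept in the header section:
// one entry per independently compressed block of rows.
struct BlockIndexEntry {
    std::int64_t first_timestamp;  // timestamp of the first row in the block
    std::uint64_t file_offset;     // byte offset of the compressed block in the file
};

// Returns the position of the last block whose first timestamp is <= ts,
// or index.size() if ts is earlier than every block.
// The index must be sorted by first_timestamp, which follows from the rows
// being stored in ascending timestamp order.
std::size_t find_block(const std::vector<BlockIndexEntry>& index, std::int64_t ts) {
    assert(std::is_sorted(index.begin(), index.end(),
                          [](const BlockIndexEntry& a, const BlockIndexEntry& b) {
                              return a.first_timestamp < b.first_timestamp;
                          }));

    // First entry whose first_timestamp is strictly greater than ts.
    auto it = std::upper_bound(index.begin(), index.end(), ts,
                               [](std::int64_t value, const BlockIndexEntry& entry) {
                                   return value < entry.first_timestamp;
                               });

    if (it == index.begin()) {
        return index.size();  // ts precedes all indexed rows
    }
    return static_cast<std::size_t>(it - index.begin()) - 1;
}
```

With such an index, a reader would seek to `file_offset` of the chosen block, decompress only that block, and then address rows inside it by multiplying the row number by the fixed row width. The trade-off is that smaller blocks give faster lookups but usually a worse compression ratio.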
Querying records/mini expression parser:
========================================

Because we are using fixed width records (and the column names do not change), it is desirable to develop a syntax that allows us to specify criteria to be matched in selected rows, by issuing a command similar to a WHERE clause. All columns are always returned, so we only need the WHERE counterpart. An example would be "timestamp > 2008-01-01 09:00 AM AND timestamp < 2008-10-01 12:00 PM". A simple expression parser will need to be developed to evaluate these expressions. Allowed keywords/operators will be AND, OR, BETWEEN, >, >=, <, <=, =, !=.

Note1: The code needs to be clear and robust (e.g. using exception handling and proper error checking). Above all, the code must be well commented and easily modifiable/maintainable, and must (wherever necessary) use design patterns for a clean architecture.

Note2: Although the utility is envisaged to be initially run on Windows, PLEASE DO NOT USE any Windows specific functions, as I will be running the code on Unix as well. The code should be platform agnostic. Use the wxWidgets/Boost libraries wherever necessary to keep the code cross platform.

Please refer to the attached file for a high level overview of the technical specification. If anything is not clear, do not hesitate to ask for clarification.
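To make the querying requirement more concrete, here is a minimal sketch of a recursive-descent evaluator for the listed operators. It rests on several assumptions that the posting does not state: tokens are separated by whitespace, column values and literals are plain integers (the date/time literals in the example above would need their own parsing step), AND binds more tightly than OR as in SQL, and BETWEEN is inclusive. The names `Row`, `WhereEvaluator` and `matches` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical row representation: column name -> integer value.
using Row = std::map<std::string, long long>;

class WhereEvaluator {
public:
    WhereEvaluator(const std::string& text, const Row& row) : row_(row) {
        std::istringstream in(text);
        for (std::string tok; in >> tok;) tokens_.push_back(tok);  // whitespace-separated tokens
    }

    // Evaluates the whole expression; throws std::runtime_error on malformed input.
    bool evaluate() {
        pos_ = 0;
        const bool result = parse_or();
        assert(pos_ <= tokens_.size());  // parser invariant
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected token: " + tokens_[pos_]);
        return result;
    }

private:
    // or_expr := and_expr { "OR" and_expr }
    bool parse_or() {
        bool value = parse_and();
        while (accept("OR")) { const bool rhs = parse_and(); value = value || rhs; }
        return value;
    }
    // and_expr := comparison { "AND" comparison }
    bool parse_and() {
        bool value = parse_comparison();
        while (accept("AND")) { const bool rhs = parse_comparison(); value = value && rhs; }
        return value;
    }
    // comparison := column op number | column "BETWEEN" number "AND" number
    bool parse_comparison() {
        const std::string column = next();
        const auto it = row_.find(column);
        if (it == row_.end()) throw std::runtime_error("unknown column: " + column);
        const long long lhs = it->second;
        const std::string op = next();
        if (op == "BETWEEN") {
            const long long low = number();
            if (!accept("AND")) throw std::runtime_error("BETWEEN requires AND");
            const long long high = number();
            return low <= lhs && lhs <= high;
        }
        const long long rhs = number();
        if (op == ">")  return lhs > rhs;
        if (op == ">=") return lhs >= rhs;
        if (op == "<")  return lhs < rhs;
        if (op == "<=") return lhs <= rhs;
        if (op == "=")  return lhs == rhs;
        if (op == "!=") return lhs != rhs;
        throw std::runtime_error("unknown operator: " + op);
    }
    long long number() {
        const std::string tok = next();
        std::size_t used = 0;
        long long value = 0;
        try { value = std::stoll(tok, &used); } catch (const std::exception&) { used = 0; }
        if (used != tok.size()) throw std::runtime_error("expected integer: " + tok);
        return value;
    }
    std::string next() {
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of expression");
        return tokens_[pos_++];
    }
    bool accept(const std::string& keyword) {
        if (pos_ < tokens_.size() && tokens_[pos_] == keyword) { ++pos_; return true; }
        return false;
    }

    const Row& row_;
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Convenience wrapper: does `row` satisfy the WHERE-style expression?
bool matches(const std::string& where, const Row& row) {
    return WhereEvaluator(where, row).evaluate();
}
```

A call such as `matches("timestamp BETWEEN 100 AND 200 AND price != 10", row)` evaluates the expression against one row. This sketch re-tokenizes the expression for every row; a fuller implementation would more likely parse once into an expression tree and evaluate that tree per row.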
Project ID: 3274851

About the project

Remote project
Active 15 yrs ago

About the client

United Kingdom
4.8
74
Member since Feb 15, 2003
