CSV Reader
CSVReader is a class for reading data from CSV files and updating them.
The header_config argument to __init__() is a list of dict objects that defines the expected structure of the CSV file. Each dict object in the list should have the following keys:
- name: The name of the column.
- type: The type of the column, which can be 'str', 'int', 'float', or 'date'.
- format (optional): A string that defines the format for date or float types (e.g., "%Y-%m-%d" for date or ".2f" for float).
- match (optional): A boolean indicating if this column should be used for matching records in the merge_data_sets() function.
- sort (optional): An integer indicating the sort order of the column.
- minimum (optional): A date or int that defines a minimum value for filtering by date.
For example, consider this CSV file:
Symbol,Date,Name,Currency,Price
ACM0006AU,2025-04-28,AB Managed Volatility Equities,AUD,1.82
CSA0038AU,2025-04-28,Bentham Global Income,AUD,1.00
ETL0018AU,2025-04-28,PIMCO Global Bond Wholesale,AUD,0.90
The header configuration might look like this:
header_config = [
    {
        "name": "Symbol",
        "type": "str",
        "match": True,
        "sort": 2,
    },
    {
        "name": "Date",
        "type": "date",
        "format": "%Y-%m-%d",
        "match": True,
        "sort": 1,
        "minimum": None,
    },
    {
        "name": "Name",
        "type": "str",
    },
    {
        "name": "Currency",
        "type": "str",
    },
    {
        "name": "Price",
        "type": "float",
        "format": ".2f",
    },
]
__init__(file_path, header_config=None)
Initialize the CSVReader with the file path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Path \| str | The path to the CSV file. If the file does not exist, it won't be created. | required |
header_config | Optional(list[dict]) | The header configuration for the CSV file. | None |
Raises:
Type | Description |
---|---|
TypeError | If header_config is not structured correctly. |
ImportError | If the file exists but does not have a valid extension. |
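The structural rules listed for header_config can be checked with a few isinstance tests. The following is a minimal sketch, not CSVReader's own validation: the function name validate_header_config and the exact checks are assumptions derived from the key descriptions above.

```python
# Hypothetical helper: illustrates the documented header_config layout only.
ALLOWED_TYPES = {"str", "int", "float", "date"}


def validate_header_config(header_config):
    """Raise TypeError if header_config does not follow the documented layout."""
    if not isinstance(header_config, list):
        raise TypeError("header_config must be a list of dicts")
    for entry in header_config:
        if not isinstance(entry, dict):
            raise TypeError("each header_config entry must be a dict")
        if not isinstance(entry.get("name"), str):
            raise TypeError("each entry needs a string 'name'")
        if entry.get("type") not in ALLOWED_TYPES:
            raise TypeError(f"{entry['name']}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
        if "match" in entry and not isinstance(entry["match"], bool):
            raise TypeError(f"{entry['name']}: 'match' must be a bool")
        # bool is a subclass of int in Python, so exclude it explicitly.
        sort = entry.get("sort")
        if sort is not None and (isinstance(sort, bool) or not isinstance(sort, int)):
            raise TypeError(f"{entry['name']}: 'sort' must be an int")
```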
merge_data_sets(primary_list, append_list)
Merges two lists of dictionaries based on header configuration match fields and sorts the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primary_list | list[dict] | The primary list of dictionaries. | required |
append_list | list[dict] | The list of dictionaries to append or override. | required |
Raises:
Type | Description |
---|---|
ValueError | If the dictionaries in the lists do not have the same structure. |
Returns:
Type | Description |
---|---|
list[dict] | The merged and sorted list of dictionaries. |
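To make the matching behaviour concrete, here is a minimal sketch of a merge keyed on the columns flagged with "match". It is not the library's implementation: the function name merge_rows is hypothetical, and it assumes that rows from append_list replace rows in primary_list with the same match values and that a lower "sort" number means higher sort priority, ascending.

```python
from datetime import date


def merge_rows(primary_list, append_list, header_config):
    """Merge two row lists on the 'match' columns, then sort on the 'sort' columns."""
    rows = list(primary_list) + list(append_list)
    # All rows must share the same set of keys.
    if len({frozenset(row) for row in rows}) > 1:
        raise ValueError("all rows must share the same keys")

    match_keys = [col["name"] for col in header_config if col.get("match")]
    sort_cols = sorted(
        (col for col in header_config if col.get("sort") is not None),
        key=lambda col: col["sort"],
    )
    sort_keys = [col["name"] for col in sort_cols]

    if match_keys:
        # Later rows (from append_list) overwrite earlier rows with the same key.
        merged = {}
        for row in rows:
            merged[tuple(row[k] for k in match_keys)] = row
        rows = list(merged.values())
    return sorted(rows, key=lambda row: tuple(row[k] for k in sort_keys))


config = [
    {"name": "Symbol", "type": "str", "match": True, "sort": 2},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d", "match": True, "sort": 1},
    {"name": "Price", "type": "float", "format": ".2f"},
]
primary = [
    {"Symbol": "ACM0006AU", "Date": date(2025, 4, 28), "Price": 1.82},
    {"Symbol": "CSA0038AU", "Date": date(2025, 4, 28), "Price": 1.00},
]
extra = [
    {"Symbol": "ACM0006AU", "Date": date(2025, 4, 28), "Price": 1.85},  # overrides
    {"Symbol": "ACM0006AU", "Date": date(2025, 4, 29), "Price": 1.90},  # appended
]
merged = merge_rows(primary, extra, config)
```

In this example the first row of extra shares both match values with a row in primary and replaces it, while the second row has a new date and is appended.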
read_csv()
Read the CSV file and return its content.
If the file does not exist, returns None. If the file has a header but no data, returns an empty list.
Raises:
Type | Description |
---|---|
ImportError | If the CSV file is empty or has no header. |
ValueError | If a value read from the CSV file cannot be converted to the expected type defined in header_config. |
Returns:
Name | Type | Description |
---|---|---|
data | list[dict] | A list of rows from the CSV file, or None if the file does not exist. |
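The type conversion described above can be sketched with the standard csv module. This is an illustration under assumptions, not the body of read_csv(): parse_csv_text and convert_value are hypothetical names, the sketch parses a string instead of a file, and it falls back to "%Y-%m-%d" when a date column has no "format" key.

```python
import csv
import io
from datetime import datetime


def convert_value(raw, column):
    """Convert one CSV string to the type named in the column config."""
    kind = column["type"]
    if kind == "int":
        return int(raw)  # raises ValueError on bad input
    if kind == "float":
        return float(raw)  # raises ValueError on bad input
    if kind == "date":
        # Assumed default format when none is configured.
        fmt = column.get("format", "%Y-%m-%d")
        return datetime.strptime(raw, fmt).date()  # raises ValueError on mismatch
    return raw


def parse_csv_text(text, header_config):
    """Parse CSV text into a list of typed dicts, one per data row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:
        raise ImportError("CSV content is empty or has no header")
    return [
        {col["name"]: convert_value(row[col["name"]], col) for col in header_config}
        for row in reader
    ]


read_config = [
    {"name": "Symbol", "type": "str"},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d"},
    {"name": "Price", "type": "float", "format": ".2f"},
]
sample = "Symbol,Date,Price\nACM0006AU,2025-04-28,1.82\n"
parsed = parse_csv_text(sample, read_config)
```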
sort_csv_data(csv_data)
Sort the CSV data based on the header configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_data | list[dict] | The data read from the CSV file. | required |
Returns:
Type | Description |
---|---|
list[dict] | The sorted data. |
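One plausible reading of the "sort" key is that columns are compared in ascending order of their sort number. The sketch below shows that interpretation; sort_rows is a hypothetical name, and both the ascending direction and the priority rule are assumptions that the reference above does not confirm.

```python
from operator import itemgetter


def sort_rows(csv_data, header_config):
    """Return a new list sorted by the columns that carry a 'sort' number."""
    sort_columns = sorted(
        (col for col in header_config if col.get("sort") is not None),
        key=itemgetter("sort"),
    )
    names = [col["name"] for col in sort_columns]
    if not names:
        return list(csv_data)  # nothing to sort by, keep the input order
    return sorted(csv_data, key=lambda row: tuple(row[name] for name in names))


sort_config = [
    {"name": "Symbol", "type": "str", "sort": 2},
    {"name": "Date", "type": "str", "sort": 1},
]
unsorted_rows = [
    {"Symbol": "ETL0018AU", "Date": "2025-04-29"},
    {"Symbol": "CSA0038AU", "Date": "2025-04-28"},
    {"Symbol": "ACM0006AU", "Date": "2025-04-28"},
]
sorted_rows = sort_rows(unsorted_rows, sort_config)
```

With the example configuration, rows are ordered by Date first (sort 1) and then by Symbol (sort 2).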
trim_csv_data(csv_data, max_lines=None)
Trim the CSV data based on the header configuration and optionally the max_lines arg.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_data | list[dict] | The data read from the CSV file. | required |
max_lines | Optional(int) | The maximum number of lines to return from csv_data. If greater than 0, the first max_lines lines are returned. If less than 0, all but the last abs(max_lines) lines are returned. If None, no trimming is done. | None |
Returns:
Type | Description |
---|---|
list[dict] | The trimmed data. |
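The max_lines rule maps directly onto a Python slice: rows[:n] returns the first n items for positive n and drops the last abs(n) items for negative n. The sketch below combines that with a date "minimum" filter. It is an illustration only: trim_rows is a hypothetical name, applying the filter before the slice is an assumption, a max_lines of 0 is treated as "no trimming" because the reference does not define it, and an int "minimum" is ignored because its meaning is not described.

```python
from datetime import date


def trim_rows(csv_data, header_config, max_lines=None):
    """Filter rows by each date column's 'minimum', then apply max_lines."""
    rows = list(csv_data)
    for col in header_config:
        minimum = col.get("minimum")
        # Only date minimums are handled in this sketch.
        if col["type"] == "date" and isinstance(minimum, date):
            rows = [row for row in rows if row[col["name"]] >= minimum]
    if max_lines:  # None (and, by assumption, 0) leaves the rows untouched
        rows = rows[:max_lines]
    return rows


trim_config = [
    {"name": "Date", "type": "date", "format": "%Y-%m-%d", "minimum": date(2025, 4, 27)},
]
daily_rows = [{"Date": date(2025, 4, day)} for day in (26, 27, 28, 29)]

all_recent = trim_rows(daily_rows, trim_config)
first_two = trim_rows(daily_rows, trim_config, max_lines=2)
drop_last_two = trim_rows(daily_rows, trim_config, max_lines=-2)
```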
update_csv_file(new_data, new_filename=None)
Appends or merges the new_data into an existing CSV file. If the file does not exist, it will be created.
This function will also sort and trim the combined data according to the header configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_data | list[dict] | The new data to append or merge. | required |
new_filename | Optional(Path \| str) | If provided, the data will be written to this file instead of the original file. | None |
Raises:
Type | Description |
---|---|
RuntimeError | If there is a problem processing the data. |
Returns:
Name | Type | Description |
---|---|---|
merged_data | list[dict] | The merged and sorted data after appending or merging the new_data. |
write_csv(data, new_filename=None)
Write data to the CSV file.
- If the file does not exist, it will be created.
- If the file exists, it will be overwritten.
- The header will be written based on the header_config.
- The data will be written in the order of the header_config.
- If a header in the data does not exist in the header_config, it will be ignored.
- If a header in the header_config does not exist in the data, a ValueError is raised.
- Date fields are formatted according to the format specified in header_config.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | list[dict] | The data to write to the CSV file. | required |
new_filename | Optional(Path \| str) | If provided, the data will be written to this file instead of the original file. | None |
Raises:
Type | Description |
---|---|
ValueError | If the data is empty or if a header in header_config is not found in the data. |
Returns:
Type | Description |
---|---|
bool | True if the data was written successfully, False otherwise. |
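The write rules listed for write_csv() can be sketched with csv.writer. This is not the method's implementation: rows_to_csv_text and format_value are hypothetical names, the sketch returns a string instead of writing a file, and applying "format" to float columns on output is an assumption based on the description of the "format" key (the rules above only mention date formatting).

```python
import csv
import io
from datetime import date


def format_value(value, column):
    """Render one value for output using the column's optional 'format'."""
    fmt = column.get("format")
    if fmt and column["type"] == "date":
        return value.strftime(fmt)
    if fmt and column["type"] == "float":
        return format(value, fmt)  # assumed behaviour for float columns
    return value


def rows_to_csv_text(data, header_config):
    """Serialize rows in header_config order, ignoring keys not in the config."""
    if not data:
        raise ValueError("data is empty")
    names = [col["name"] for col in header_config]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    for row in data:
        missing = [name for name in names if name not in row]
        if missing:
            raise ValueError(f"missing columns in data: {missing}")
        writer.writerow([format_value(row[col["name"]], col) for col in header_config])
    return buffer.getvalue()


write_config = [
    {"name": "Symbol", "type": "str"},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d"},
    {"name": "Price", "type": "float", "format": ".2f"},
]
# "Note" is not in the config, so it is left out of the output.
out_rows = [{"Price": 1.0, "Symbol": "CSA0038AU", "Date": date(2025, 4, 28), "Note": "x"}]
csv_text = rows_to_csv_text(out_rows, write_config)
```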