CSV Reader

CSVReader class for extracting data from CSV files and updating them.

The header_config argument to __init__() is a list of dict objects that defines the expected structure of the CSV file. Each dict object in the list should have the following keys:

  • name: The name of the column.
  • type: The type of the column, which can be 'str', 'int', 'float', or 'date'.
  • format (optional): A string that defines the format for date or float types (e.g., "%Y-%m-%d" for date or ".2f" for float).
  • match (optional): A boolean indicating if this column should be used for matching records in the merge_data_sets() function.
  • sort (optional): An integer indicating the sort order of the column.
  • minimum (optional): A date or int that defines a minimum value for filtering by date.

For example, consider this CSV file:

Symbol,Date,Name,Currency,Price
ACM0006AU,2025-04-28,AB Managed Volatility Equities,AUD,1.82
CSA0038AU,2025-04-28,Bentham Global Income,AUD,1.00
ETL0018AU,2025-04-28,PIMCO Global Bond Wholesale,AUD,0.90

The header configuration might look like this:

header_config = [
    {
        "name": "Symbol",
        "type": "str",
        "match": True,
        "sort": 2,
    },
    {
        "name": "Date",
        "type": "date",
        "format": "%Y-%m-%d",
        "match": True,
        "sort": 1,
        "minimum": None,
    },
    {
        "name": "Name",
        "type": "str",
    },
    {
        "name": "Currency",
        "type": "str",
    },
    {
        "name": "Price",
        "type": "float",
        "format": ".2f",
    },
]
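To illustrate how the type and format keys could drive conversion of the raw strings in a CSV row, here is a minimal sketch. It does not use CSVReader itself: convert_row is a hypothetical helper, and returning a datetime.date for 'date' columns is an assumption rather than documented behaviour.

```python
from datetime import date, datetime


def convert_row(raw_row: dict[str, str], header_config: list[dict]) -> dict:
    """Hypothetical helper: convert the string values of one CSV row
    to the types named in header_config."""
    converted = {}
    for column in header_config:
        name, kind = column["name"], column["type"]
        text = raw_row[name]
        if kind == "int":
            converted[name] = int(text)
        elif kind == "float":
            converted[name] = float(text)
        elif kind == "date":
            # Assumption: "format" is a strptime pattern for date columns.
            converted[name] = datetime.strptime(text, column["format"]).date()
        else:
            converted[name] = text
    return converted


header_config = [
    {"name": "Symbol", "type": "str", "match": True, "sort": 2},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d", "match": True, "sort": 1},
    {"name": "Name", "type": "str"},
    {"name": "Currency", "type": "str"},
    {"name": "Price", "type": "float", "format": ".2f"},
]

raw = {
    "Symbol": "ACM0006AU",
    "Date": "2025-04-28",
    "Name": "AB Managed Volatility Equities",
    "Currency": "AUD",
    "Price": "1.82",
}
row = convert_row(raw, header_config)
```

In this sketch a value that cannot be converted (for example a Price of "n/a") surfaces as a ValueError from int(), float(), or strptime().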

__init__(file_path, header_config=None)

Initialize the CSVReader with the file path.

Parameters:

Name Type Description Default
file_path Path | str

The path to the CSV file. If the file does not exist, it won't be created.

required
header_config Optional(list[dict])

The header configuration for the CSV file.

None

Raises:

Type Description
TypeError

If header_config is not structured correctly.

ImportError

If the file exists but does not have a valid extension.

merge_data_sets(primary_list, append_list)

Merges two lists of dictionaries based on header configuration match fields and sorts the result.

Parameters:

Name Type Description Default
primary_list list[dict]

The primary list of dictionaries.

required
append_list list[dict]

The list of dictionaries to append or override.

required

Raises:

Type Description
ValueError

If the dictionaries in the lists do not have the same structure.

Returns:

Type Description
list[dict]

The merged and sorted list of dictionaries.
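The matching behaviour described above can be sketched as follows. This is not the library's implementation: merge_on_match_fields is a hypothetical function, it assumes that rows from append_list replace primary rows with the same match-field values, and it leaves out the sorting step to stay short. Dates are kept as ISO strings for simplicity.

```python
def merge_on_match_fields(primary_list, append_list, header_config):
    """Hypothetical sketch: merge two row lists, keyed on the columns
    flagged with "match": True in header_config."""
    rows = [*primary_list, *append_list]
    # Mirror the documented ValueError for rows with differing structure.
    if rows and any(set(r) != set(rows[0]) for r in rows):
        raise ValueError("rows do not share the same structure")

    match_fields = [c["name"] for c in header_config if c.get("match")]
    merged = {}
    for row in rows:
        key = tuple(row[field] for field in match_fields)
        merged[key] = row  # a later row with the same key overrides
    return list(merged.values())


header_config = [
    {"name": "Symbol", "type": "str", "match": True},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d", "match": True},
    {"name": "Price", "type": "float"},
]
primary = [
    {"Symbol": "ACM0006AU", "Date": "2025-04-28", "Price": 1.82},
    {"Symbol": "CSA0038AU", "Date": "2025-04-28", "Price": 1.00},
]
append = [
    {"Symbol": "CSA0038AU", "Date": "2025-04-28", "Price": 1.01},  # same key
    {"Symbol": "CSA0038AU", "Date": "2025-04-29", "Price": 1.02},  # new key
]
merged = merge_on_match_fields(primary, append, header_config)
```

Here the first row of append overrides the matching primary row and the second is added, giving three rows in total.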

read_csv()

Read the CSV file and return its content.

Returns None if the file does not exist. Returns an empty list if the file has a header but no data.

Raises:

Type Description
ImportError

If the CSV file is empty or has no header.

ValueError

If a value read from the CSV file cannot be converted to the expected type as defined in header_config.

Returns:

Name Type Description
data list[dict]

A list of rows from the CSV file or None if the file does not exist.
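The three outcomes described for read_csv() (missing file, header only, completely empty file) can be told apart with the standard csv module. The sketch below is an illustration under stated assumptions, not CSVReader's code: read_rows is a hypothetical function, and it skips type conversion, so all values stay strings.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(file_path):
    """Hypothetical sketch of the documented outcomes:
    None for a missing file, [] for header-only, ImportError for empty."""
    path = Path(file_path)
    if not path.exists():
        return None
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # DictReader.fieldnames is None when there is no header line to read.
        if reader.fieldnames is None:
            raise ImportError(f"{path} is empty or has no header")
        return list(reader)


with tempfile.TemporaryDirectory() as folder:
    header_only = Path(folder) / "prices.csv"
    header_only.write_text("Symbol,Date,Price\n", encoding="utf-8")
    rows = read_rows(header_only)                      # header but no data
    missing = read_rows(Path(folder) / "absent.csv")   # file does not exist
```

A caller would typically check for None first, since an empty list and None mean different things here.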

sort_csv_data(csv_data)

Sort the CSV data based on the header configuration.

Parameters:

Name Type Description Default
csv_data list[dict]

The data read from the CSV file.

required

Returns:

Type Description
list[dict]

The sorted data.
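One plausible reading of the sort key is that it ranks the columns used for ordering, with 1 as the primary key. The sketch below follows that reading; both the ascending order and the hypothetical sort_by_config function are assumptions, not confirmed behaviour of sort_csv_data().

```python
def sort_by_config(csv_data, header_config):
    """Hypothetical sketch: order rows by the columns that carry a
    "sort" value, lowest sort number first (assumed to be the primary key)."""
    sort_columns = sorted(
        (c for c in header_config if c.get("sort") is not None),
        key=lambda c: c["sort"],
    )
    names = [c["name"] for c in sort_columns]
    return sorted(csv_data, key=lambda row: tuple(row[n] for n in names))


header_config = [
    {"name": "Symbol", "type": "str", "sort": 2},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d", "sort": 1},
]
# Dates are ISO strings here, which sort correctly as plain text.
data = [
    {"Symbol": "ETL0018AU", "Date": "2025-04-28"},
    {"Symbol": "ACM0006AU", "Date": "2025-04-29"},
    {"Symbol": "ACM0006AU", "Date": "2025-04-28"},
]
ordered = sort_by_config(data, header_config)
```

With Date as sort 1 and Symbol as sort 2, rows are grouped by date first and ordered by symbol within each date.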

trim_csv_data(csv_data, max_lines=None)

Trim the CSV data based on the header configuration and optionally the max_lines arg.

Parameters:

Name Type Description Default
csv_data list[dict]

The data read from the CSV file.

required
max_lines Optional(int)

If provided, the maximum number of lines to return from csv_data. If max_lines > 0, the first max_lines lines are returned. If max_lines < 0, all but the last abs(max_lines) lines are returned. If None, no trimming by line count is done.

None

Returns:

Type Description
list[dict]

The trimmed data.
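The max_lines rule maps directly onto Python slicing, since rows[:n] keeps the first n items for positive n and drops the last abs(n) items for negative n. The sketch below shows that, together with a simple minimum filter. trim_rows is a hypothetical function; it only handles a date-valued minimum (the int form is not covered), and the result for max_lines=0, which the text above does not define, is just whatever the slice yields.

```python
from datetime import date


def trim_rows(csv_data, header_config, max_lines=None):
    """Hypothetical sketch: filter by any "minimum" in header_config,
    then apply the max_lines rule with a slice."""
    rows = list(csv_data)
    for column in header_config:
        minimum = column.get("minimum")
        if minimum is not None:
            # Assumption: rows below the minimum are dropped.
            rows = [r for r in rows if r[column["name"]] >= minimum]
    if max_lines is not None:
        rows = rows[:max_lines]
    return rows


plain_config = [{"name": "Date", "type": "date", "format": "%Y-%m-%d"}]
min_config = [
    {"name": "Date", "type": "date", "format": "%Y-%m-%d",
     "minimum": date(2025, 4, 28)},
]
data = [{"Date": date(2025, 4, day)} for day in range(26, 31)]  # 26th to 30th

first_two = trim_rows(data, plain_config, max_lines=2)
all_but_last_two = trim_rows(data, plain_config, max_lines=-2)
from_minimum = trim_rows(data, min_config)
```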

update_csv_file(new_data, new_filename=None)

Appends or merges the new_data into an existing CSV file. If the file does not exist, it will be created.

This function will also sort and trim the combined data according to the header configuration.

Parameters:

Name Type Description Default
new_data list[dict]

The new data to append or merge.

required
new_filename Optional(Path | str)

If provided, the data will be written to this file instead of the original file.

None

Raises:

Type Description
RuntimeError

If there is a problem processing the data.

Returns:

Name Type Description
merged_data list[dict]

The merged and sorted data after appending or merging the new_data.

write_csv(data, new_filename=None)

Write data to the CSV file.

  1. If the file does not exist, it will be created.
  2. If the file exists, it will be overwritten.
  3. The header will be written based on the header_config.
  4. The data will be written in the order of the header_config.
  5. If a header in the data does not exist in the header_config, it will be ignored.
  6. If a header in the header_config does not exist in the data, a ValueError is raised.
  7. Date fields are formatted according to the format specified in header_config.

Parameters:

Name Type Description Default
data list[dict]

The data to write to the CSV file.

required
new_filename Optional(Path | str)

If provided, the data will be written to this file instead of the original file.

None

Raises:

Type Description
ValueError

If the data is empty or if a header in header_config is not found in the data.

Returns:

Type Description
bool

True if the data was written successfully, False otherwise.
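The writing rules listed for write_csv() can be approximated with csv.DictWriter. The following is a sketch under assumptions, not the actual method: write_rows is a hypothetical function that writes to any text handle instead of a file path, and applying the float format (for example ".2f") on output is an assumption, since the rules above only mention date formatting.

```python
import csv
import io
from datetime import date


def write_rows(handle, data, header_config):
    """Hypothetical sketch: write rows in header_config order,
    ignoring extra keys and rejecting missing ones."""
    if not data:
        raise ValueError("data is empty")
    names = [c["name"] for c in header_config]
    for row in data:
        missing = [n for n in names if n not in row]
        if missing:
            raise ValueError(f"columns missing from data: {missing}")

    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for row in data:
        out = {}
        # Only configured columns are copied, so extra keys are ignored.
        for column in header_config:
            value = row[column["name"]]
            fmt = column.get("format")
            if column["type"] == "date" and fmt:
                value = value.strftime(fmt)
            elif column["type"] == "float" and fmt:
                value = format(value, fmt)  # assumption: floats use "format" too
            out[column["name"]] = value
        writer.writerow(out)
    return True


header_config = [
    {"name": "Symbol", "type": "str"},
    {"name": "Date", "type": "date", "format": "%Y-%m-%d"},
    {"name": "Price", "type": "float", "format": ".2f"},
]
buffer = io.StringIO()
ok = write_rows(
    buffer,
    [{"Symbol": "CSA0038AU", "Date": date(2025, 4, 28), "Price": 1.0, "Note": "x"}],
    header_config,
)
lines = buffer.getvalue().splitlines()
```

The extra "Note" key is dropped, and the columns appear in the order given by header_config rather than the order of the keys in each row.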