Excel Reader
A class to read and extract data from Excel files (.xlsx, .xlsm, .xlsb).
This class provides methods to load workbooks, extract data from sheets, tables, and named ranges, with robust error handling.
__init__(file_path)
Initializes the ExcelReader with the path to the Excel file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path
|
str | Path
|
Path to the Excel file, specified as a Path object or string. |
required |
Raises:
Type | Description |
---|---|
ImportError
|
If the file does not exist, is not a valid Excel file, or if the openpyxl library cannot be imported. |
extract_data(source_name, source_type)
Extracts data from an Excel file based on the source type and name.
Expected the specified source type to be either: - An entire worksheet with the header in the first row (sheet) - A named range (range) - An Excel table (table)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_name
|
str
|
Name of the sheet, table, or range to extract. |
required |
source_type
|
str
|
Type of source ('sheet', 'table', or 'range'). |
required |
Raises:
Type | Description |
---|---|
ImportError
|
If the source type is invalid or if there are issues extracting data. |
Returns:
Name | Type | Description |
---|---|---|
data |
dict
|
Data extracted as a dictionary. |
extract_from_range(range_name)
Extracts a table from an Excel file and returns it as a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
range_name
|
str
|
Name of the range to extract. |
required |
Raises:
Type | Description |
---|---|
ImportError
|
If the table cannot be loaded or if there are issues with the file. |
Returns:
Name | Type | Description |
---|---|---|
data |
DataFrame
|
A DataFrame containing the range data. |
extract_from_sheet(sheet_name)
Extracts a sheet from an Excel file and returns it as a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sheet_name
|
str
|
Name of the sheet to extract. |
required |
Raises:
Type | Description |
---|---|
ImportError
|
If the sheet cannot be loaded or if there are issues with the file. |
Returns:
Name | Type | Description |
---|---|---|
data |
DataFrame
|
A DataFrame containing the sheet data. |
extract_from_table(table_name)
Extracts a table from an Excel file and returns it as a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name
|
str
|
Name of the table to extract. |
required |
Raises:
Type | Description |
---|---|
ImportError
|
If the table cannot be loaded or if there are issues with the file. |
Returns:
Name | Type | Description |
---|---|---|
data |
DataFrame
|
A DataFrame containing the table data. |
load_excel_workbook(data_only=True, read_only=False)
Load an Excel workbook with robust error handling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_only
|
Optional[bool]
|
Whether to return cell values (not formulas). |
True
|
read_only
|
Optional[bool]
|
Use openpyxl's read-only mode for large files. |
False
|
Raises:
Type | Description |
---|---|
ImportError
|
If the file cannot be loaded due to various reasons (e.g., file not found, permission denied, invalid format). |
Returns:
Name | Type | Description |
---|---|---|
object |
workbook
|
Workbook object or None if loading fails. |