Script Documentation
Overview
This script processes two Excel files (reinoud.xlsx and sisa.xlsx) to find IDs that are present in sisa.xlsx but missing from reinoud.xlsx and append them to reinoud.xlsx. It also checks for duplicate IDs in reinoud.xlsx.
Functions
load_excel(file_path: str, sheet_name: Optional[str] = None) -> pd.DataFrame
Loads an Excel file into a DataFrame.
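As an illustration only, a loader with this signature could look roughly like the sketch below. The body is an assumption, not the script's actual code. One detail worth noting: pandas returns a dict of all sheets when sheet_name is None, so the sketch falls back to the first sheet in order to always return a single DataFrame.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def load_excel(file_path: str, sheet_name: Optional[str] = None) -> pd.DataFrame:
    """Load one sheet of an Excel workbook into a DataFrame (illustrative sketch)."""
    # Fail early with a clear message if the workbook does not exist.
    if not Path(file_path).is_file():
        raise FileNotFoundError(f"Excel file not found: {file_path}")

    # Assumption: when no sheet name is given, use the first sheet (index 0).
    # Passing sheet_name=None to pandas would return a dict of every sheet.
    target = 0 if sheet_name is None else sheet_name
    return pd.read_excel(file_path, sheet_name=target)
```

Reading .xlsx files this way requires the openpyxl package to be installed alongside pandas.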
check_duplicates(df: pd.DataFrame, column: str) -> List[str]
Checks for duplicate values in a specified column.
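A minimal sketch of how such a duplicate check might be written with pandas is shown below. The body is an assumption based on the signature above: it compares IDs as strings, ignores empty cells, and reports each duplicated value once.

```python
from typing import List

import pandas as pd


def check_duplicates(df: pd.DataFrame, column: str) -> List[str]:
    """Return each value that occurs more than once in `column` (illustrative sketch)."""
    # Drop empty cells and compare IDs as strings.
    ids = df[column].dropna().astype(str)
    # duplicated() flags the second and later occurrences of a value;
    # drop_duplicates() then reports each repeated value only once.
    return ids[ids.duplicated()].drop_duplicates().tolist()


example = pd.DataFrame({"Rolnummer": ["A1", "B2", "A1", "C3", "B2", "A1"]})
duplicates = check_duplicates(example, "Rolnummer")
print(duplicates)  # ['A1', 'B2']
```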
find_missing_ids(df1: pd.DataFrame, df2: pd.DataFrame, column: str) -> List[str]
Finds IDs in df2 that are not in df1.
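The comparison can be sketched as a set-membership test, as below. This is an assumed implementation, not the script's own code. It compares IDs as strings, which only works if both files store the IDs in the same format (for example, not 12 in one file and 12.0 in the other).

```python
from typing import List

import pandas as pd


def find_missing_ids(df1: pd.DataFrame, df2: pd.DataFrame, column: str) -> List[str]:
    """Return IDs that appear in df2[column] but not in df1[column] (illustrative sketch)."""
    # IDs already present in df1, compared as strings.
    known = set(df1[column].dropna().astype(str))
    candidates = df2[column].dropna().astype(str)
    # Keep IDs that df1 does not contain, listing each one once.
    return candidates[~candidates.isin(known)].drop_duplicates().tolist()


df1 = pd.DataFrame({"Rolnummer": ["1", "2"]})
df2 = pd.DataFrame({"Rolnummer": ["2", "3", "4", "3"]})
missing = find_missing_ids(df1, df2, "Rolnummer")
print(missing)  # ['3', '4']
```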
append_missing_ids(reinoud_df: pd.DataFrame, sisa_df: pd.DataFrame, column: str, reinoud_file: str) -> pd.DataFrame
Appends missing IDs and corresponding details from sisa_df to reinoud_df.
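The appending step can be approximated with pd.concat, as in the sketch below. The helper name append_missing_rows is hypothetical, and the sketch deliberately leaves out the reinoud_file parameter and any writing back to disk, so it covers only the in-memory part of what the real function does.

```python
import pandas as pd


def append_missing_rows(reinoud_df: pd.DataFrame, sisa_df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return reinoud_df plus the sisa_df rows whose ID it lacks (hypothetical helper)."""
    known = set(reinoud_df[column].dropna().astype(str))
    sisa_ids = sisa_df[column].astype(str)
    # Select rows with a non-empty ID that reinoud_df does not contain yet.
    missing_rows = sisa_df[sisa_df[column].notna() & ~sisa_ids.isin(known)]
    # If sisa_df repeats an ID, keep only its first row.
    missing_rows = missing_rows.drop_duplicates(subset=column)
    # Columns that exist in only one of the two frames are filled with NaN.
    return pd.concat([reinoud_df, missing_rows], ignore_index=True)


reinoud = pd.DataFrame({"Rolnummer": ["1", "2"], "Naam": ["a", "b"]})
sisa = pd.DataFrame({"Rolnummer": ["2", "3"], "Naam": ["b2", "c"]})
combined = append_missing_rows(reinoud, sisa, "Rolnummer")
print(combined["Rolnummer"].tolist())  # ['1', '2', '3']
```

Existing rows in reinoud_df are left untouched, so the row for ID 2 keeps its original details rather than the ones from sisa_df.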
main(reinoud_file: str, sisa_file: str, column: str, reinoud_sheet: Optional[str] = None, sisa_sheet: Optional[str] = None)
Main function to load the Excel files, check for duplicates, append missing IDs, and save the updated DataFrame back to the Excel file.
Usage
Run the script with the following command:
python script.py
Example usage within the script:
if __name__ == "__main__":
    main('reinoud.xlsx', 'sisa.xlsx', 'Rolnummer', reinoud_sheet='Actief', sisa_sheet='sheet1')
Logging
The script uses the logging module to log information and errors. The log level is set to INFO.
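A typical way to set up INFO-level logging is sketched below. The helper name configure_logging, the logger name, and the message format are assumptions for illustration; the script's actual configuration may differ.

```python
import logging


def configure_logging() -> logging.Logger:
    """Set up INFO-level logging and return a logger (hypothetical helper)."""
    # basicConfig attaches a console handler to the root logger if none exists yet.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",  # assumed format
    )
    logger = logging.getLogger("script")  # assumed logger name
    logger.setLevel(logging.INFO)
    return logger


logger = configure_logging()
logger.info("Loaded workbook")        # emitted at INFO level
logger.debug("Row-by-row details")    # suppressed, because DEBUG is below INFO
```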
File Structure
.gitignore
reinoud.xlsx
script.py
sisa.xlsx
Dependencies
- pandas
- logging (part of the Python standard library, no installation needed)
Install the third-party dependencies, including openpyxl, which pandas needs to read and write .xlsx files:
pip install pandas openpyxl
License
This script is provided "as-is" without any warranty. Use at your own risk.