Refactor startpakket processing tool: modularize code into dedicated scripts for argument parsing, logging setup, data processing, and file utilities; enhance documentation and usage examples in README.

This commit is contained in:
bdaneels 2025-07-29 16:43:53 +02:00
parent 8236038f11
commit 067a450bfd
6 changed files with 366 additions and 179 deletions

startpakketten/README.md Normal file

@@ -0,0 +1,105 @@
# Startpakket Processing Tool
A Python tool for processing and comparing student data from predeliberation and dashboard Excel files. The tool identifies students with FAIL status and compares SP (study points) values between different data sources.
## Project Structure
The codebase has been organized into focused modules:
### Core Scripts
- **`script.py`** - Main orchestration script, handles command-line interface and coordinates all processing
- **`data_processor.py`** - Core data processing functions for Excel file handling
- **`cli_args.py`** - Command-line argument parsing and validation
- **`config.py`** - Configuration management and logging setup
- **`file_utils.py`** - File I/O utilities and output formatting
### Processing Modules
- **`checkheaders.py`** - Excel file header processing and normalization
- **`process_predelib_file.py`** - Predeliberation file analysis and FAIL status detection
- **`compare_sp.py`** - SP value comparison between predeliberation and dashboard files (a sketch follows this list)
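`compare_sp.py` itself is not included in this commit. As a rough illustration of the comparison it performs, here is a minimal sketch of `compare_sp_values`; the merge strategy and the exact SP columns (`Totaal aantal SP` versus `Ingeschr. SP (intern)`) are assumptions based on this README, not the committed implementation.

```python
import pandas as pd

def compare_sp_values(predelib_df: pd.DataFrame, dashboard_df: pd.DataFrame) -> list[dict]:
    """Sketch: join both files on student ID and report rows whose SP values differ."""
    merged = predelib_df.merge(
        dashboard_df[["ID", "Ingeschr. SP (intern)"]],
        on="ID",
        how="inner",
    )
    mismatches = []
    for _, row in merged.iterrows():
        if row["Totaal aantal SP"] != row["Ingeschr. SP (intern)"]:
            mismatches.append({
                "ID": row["ID"],
                "Name": f"{row['Voornaam']} {row['Achternaam']}",
                "Predelib_SP": row["Totaal aantal SP"],
                "Dashboard_SP": row["Ingeschr. SP (intern)"],
            })
    return mismatches
```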
## Usage
### Basic Usage
```bash
python script.py --predelib db.xlsx --dashboard dashboard_inschrijvingen.xlsx
```
### Advanced Usage
```bash
# Save results to JSON file
python script.py -p db.xlsx -d dashboard_inschrijvingen.xlsx --output results.json
# Enable verbose logging
python script.py --predelib db.xlsx --dashboard dashboard_inschrijvingen.xlsx --verbose
# Custom log file
python script.py --predelib db.xlsx --dashboard dashboard_inschrijvingen.xlsx --log-file custom.log
```
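The same pipeline can also be driven from Python instead of the CLI; a minimal sketch using only the modules from this commit (the file names are the ones from the examples above):

```python
from config import setup_logging
from data_processor import process_files
from file_utils import print_summary, save_results

# Configure logging the same way script.py does, then run the pipeline in-process
logger = setup_logging("startpakket_processing.log", verbose=True)
results = process_files("db.xlsx", "dashboard_inschrijvingen.xlsx", verbose=True)
print_summary(results)
save_results(results, "results.json")  # optional JSON export
```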
### Command-line Options
- `--predelib, -p`: Path to predeliberation Excel file (required)
- `--dashboard, -d`: Path to dashboard Excel file (required)
- `--output, -o`: Output file path for JSON results (optional)
- `--verbose, -v`: Enable verbose logging
- `--log-file`: Custom log file path (default: startpakket_processing.log)
## Features
1. **File Validation**: Automatically validates that input files exist and are in Excel format (`.xlsx` or `.xls`)
2. **Header Processing**: Detects and normalizes Excel column headers
3. **FAIL Detection**: Identifies students with a FAIL status in the `Adviesrapport code` column (see the sketch after this list)
4. **SP Comparison**: Compares study points between predeliberation and dashboard data
5. **Comprehensive Logging**: Detailed logging with configurable verbosity
6. **Flexible Output**: Console summary with optional JSON export
7. **Error Handling**: Robust error handling with appropriate exit codes
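Feature 3 relies on `check_students_with_fail_adviesrapport` from `process_predelib_file.py`, which is not part of this commit. A minimal sketch of the kind of selection it could perform, assuming a simple substring match on the `Adviesrapport code` column:

```python
import pandas as pd

def check_students_with_fail_adviesrapport(predelib_df: pd.DataFrame) -> list[dict]:
    """Sketch: return the rows whose 'Adviesrapport code' mentions FAIL."""
    mask = (
        predelib_df["Adviesrapport code"]
        .astype(str)
        .str.contains("FAIL", case=False, na=False)
    )
    columns = ["ID", "Achternaam", "Voornaam", "E-mail", "Adviesrapport code", "Waarschuwing"]
    return predelib_df.loc[mask, columns].to_dict(orient="records")
```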
## Installation
1. Ensure Python 3.12+ is installed
2. Install required dependencies:
```bash
pip install pandas openpyxl
```
## Input File Requirements
### Predeliberation File (db.xlsx)
Must contain columns:
- ID, Achternaam, Voornaam, E-mail
- Totaal aantal SP, Aantal SP vereist
- Adviesrapport code, Waarschuwing
### Dashboard File (dashboard_inschrijvingen.xlsx)
Must contain columns:
- ID, Naam, Voornaam
- Ingeschr. SP (intern)
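`checkheaders.py` (also outside this commit) is responsible for verifying these columns and cleaning up the headers. A minimal sketch of what such a check might look like for the predeliberation file, assuming a plain trim-and-verify approach:

```python
import pandas as pd

REQUIRED_PREDELIB_COLUMNS = [
    "ID", "Achternaam", "Voornaam", "E-mail",
    "Totaal aantal SP", "Aantal SP vereist",
    "Adviesrapport code", "Waarschuwing",
]

def check_headers_predelibfile(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: strip stray whitespace from headers and fail fast on missing columns."""
    df = df.rename(columns=lambda c: str(c).strip())
    missing = [c for c in REQUIRED_PREDELIB_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Predeliberation file is missing columns: {missing}")
    return df
```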
## Output
The tool provides:
1. **Console Summary**: Overview of processing results
2. **Failed Students Report**: Detailed list of students with FAIL status
3. **SP Mismatch Report**: Any discrepancies between predeliberation and dashboard SP values
4. **Optional JSON Export**: Machine-readable results for further processing (see the example below)
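When `--output results.json` is passed, the exported file mirrors the results dictionary assembled in `data_processor.process_files`. A small example of reading it back (the file name is just the one from the usage examples):

```python
import json

with open("results.json", encoding="utf-8") as f:
    results = json.load(f)

print(f"{results['mismatches_count']} SP mismatches, "
      f"{results['students_with_fail_count']} students with a FAIL adviesrapport")
for mismatch in results["mismatches"]:
    print(mismatch["ID"], mismatch["Name"], mismatch["Predelib_SP"], mismatch["Dashboard_SP"])
```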
## Exit Codes
- `0`: Success (no SP mismatches found)
- `1`: SP mismatches found, or a fatal error occurred during processing
- `2`: Invalid command-line arguments (argparse's default error code)
- `130`: Process interrupted by the user
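These codes make the tool easy to wrap in automation; a hedged example of a wrapper script (hypothetical, not part of this commit):

```python
import subprocess
import sys

# Run the tool and branch on its exit code
proc = subprocess.run(
    [sys.executable, "script.py",
     "--predelib", "db.xlsx",
     "--dashboard", "dashboard_inschrijvingen.xlsx"]
)
if proc.returncode == 0:
    print("All SP values match")
elif proc.returncode == 1:
    print("Mismatches (or a processing error) found, check the log")
else:
    print(f"Tool did not complete normally (exit code {proc.returncode})")
```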
## Logging
All processing activities are logged to `startpakket_processing.log` by default. Use `--verbose` for detailed debug information.
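Because `config.setup_logging` configures the root logger, the other modules only need the standard pattern below for their messages to end up in the same log file and console output:

```python
import logging

logger = logging.getLogger(__name__)  # inherits the handlers set up by config.setup_logging
logger.debug("Only emitted when --verbose is passed")
```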

startpakketten/cli_args.py Normal file

@@ -0,0 +1,65 @@
"""
Command-line argument parsing for the startpakket processing script.
"""
import argparse
import os
def validate_file_path(file_path: str) -> str:
"""Validate that the file exists and is an Excel file"""
if not os.path.exists(file_path):
raise argparse.ArgumentTypeError(f"File '{file_path}' does not exist")
if not file_path.lower().endswith(('.xlsx', '.xls')):
raise argparse.ArgumentTypeError(f"File '{file_path}' is not an Excel file (.xlsx or .xls)")
return file_path
def parse_arguments():
"""Parse command line arguments"""
parser = argparse.ArgumentParser(
description='Process and compare student data from predeliberation and dashboard Excel files',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s --predelib db.xlsx --dashboard dashboard_inschrijvingen.xlsx
%(prog)s -p /path/to/predelib.xlsx -d /path/to/dashboard.xlsx --output results.json
%(prog)s --predelib db.xlsx --dashboard dashboard.xlsx --verbose
"""
)
parser.add_argument(
'--predelib', '-p',
type=validate_file_path,
required=True,
help='Path to the predeliberation Excel file (db.xlsx)'
)
parser.add_argument(
'--dashboard', '-d',
type=validate_file_path,
required=True,
help='Path to the dashboard Excel file (dashboard_inschrijvingen.xlsx)'
)
parser.add_argument(
'--output', '-o',
type=str,
help='Output file path for results (optional, prints to console if not specified)'
)
parser.add_argument(
'--verbose', '-v',
action='store_true',
help='Enable verbose logging'
)
parser.add_argument(
'--log-file',
type=str,
default='startpakket_processing.log',
help='Path to log file (default: startpakket_processing.log)'
)
return parser.parse_args()

startpakketten/config.py Normal file

@@ -0,0 +1,55 @@
"""
Configuration and logging setup for the startpakket processing application.
"""
import logging
import sys
from typing import Optional
def setup_logging(log_file: str = 'startpakket_processing.log', verbose: bool = False) -> logging.Logger:
"""
Configure logging for the application.
Args:
log_file: Path to the log file
verbose: Enable debug logging if True
Returns:
Configured logger instance
"""
# Remove existing handlers to avoid duplicates
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)
# Set logging level
level = logging.DEBUG if verbose else logging.INFO
# Configure logging
logging.basicConfig(
level=level,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_file),
logging.StreamHandler(sys.stdout)
]
)
logger = logging.getLogger(__name__)
if verbose:
logger.debug("Verbose logging enabled")
return logger
def get_exit_code(results: dict) -> int:
"""
Determine the appropriate exit code based on processing results.
Args:
results: Processing results dictionary
Returns:
Exit code (0 for success, 1 for mismatches found)
"""
return 0 if results.get('mismatches_count', 0) == 0 else 1

startpakketten/data_processor.py Normal file

@@ -0,0 +1,73 @@
"""
Core data processing functions for the startpakket processing script.
"""
import pandas as pd
import logging
from typing import Dict, Any, List
from checkheaders import check_headers_dashboard_inschrijvingenfile, check_headers_predelibfile
from process_predelib_file import check_students_with_fail_adviesrapport
from compare_sp import compare_sp_values
logger = logging.getLogger(__name__)
def process_files(predelib_path: str, dashboard_path: str, verbose: bool = False) -> Dict[str, Any]:
"""
Process the Excel files and return results.
Args:
predelib_path: Path to the predeliberation Excel file
dashboard_path: Path to the dashboard Excel file
verbose: Enable verbose logging
Returns:
Dictionary containing processing results
Raises:
Exception: If file processing fails
"""
try:
# Read Excel files
logger.info(f"Reading predeliberation file: {predelib_path}")
df_predelib = pd.read_excel(predelib_path)
logger.info(f"Predelib file loaded successfully. Shape: {df_predelib.shape}")
logger.info(f"Reading dashboard file: {dashboard_path}")
df_dashboard = pd.read_excel(dashboard_path)
logger.info(f"Dashboard file loaded successfully. Shape: {df_dashboard.shape}")
# Process the dataframes
logger.info("Processing predeliberation file headers")
processed_predelib_df = check_headers_predelibfile(df_predelib)
logger.info("Processing dashboard file headers")
processed_dashboard_df = check_headers_dashboard_inschrijvingenfile(df_dashboard)
# Check the predeliberation file for students with a fail in 'Adviesrapport code'
logger.info("Checking for students with FAIL status in predeliberation file")
students_with_fail = check_students_with_fail_adviesrapport(processed_predelib_df)
# Compare SP values
logger.info("Comparing SP values between files")
mismatches = compare_sp_values(processed_predelib_df, processed_dashboard_df)
# Prepare results
results = {
'predelib_file': predelib_path,
'dashboard_file': dashboard_path,
'predelib_records': len(processed_predelib_df),
'dashboard_records': len(processed_dashboard_df),
'students_with_fail_count': len(students_with_fail),
'students_with_fail': students_with_fail,
'mismatches_count': len(mismatches),
'mismatches': mismatches,
'status': 'completed'
}
logger.info(f"Processing completed successfully. Found {len(mismatches)} mismatches.")
return results
except Exception as e:
logger.error(f"Error processing files: {e}")
raise

startpakketten/file_utils.py Normal file

@@ -0,0 +1,49 @@
"""
File I/O utilities and output formatting for the startpakket processing script.
"""
import json
import logging
from typing import Dict, Any
from process_predelib_file import print_students_with_fail_ar_summary
logger = logging.getLogger(__name__)
def save_results(results: Dict[str, Any], output_path: str) -> None:
"""Save results to a JSON file"""
try:
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(results, f, indent=2, ensure_ascii=False)
logger.info(f"Results saved to: {output_path}")
except Exception as e:
logger.error(f"Error saving results to {output_path}: {e}")
raise
def print_summary(results: Dict[str, Any]) -> None:
"""Print a summary of the results to console"""
print(f"\n{'='*60}")
print("STARTPAKKET PROCESSING SUMMARY")
print(f"{'='*60}")
print(f"Predelib file: {results['predelib_file']}")
print(f"Dashboard file: {results['dashboard_file']}")
print(f"Predelib records processed: {results['predelib_records']}")
print(f"Dashboard records processed: {results['dashboard_records']}")
print(f"Students with FAIL adviesrapport found: {results['students_with_fail_count']}")
print(f"Mismatches found: {results['mismatches_count']}")
if results['students_with_fail_count'] > 0:
print_students_with_fail_ar_summary(results['students_with_fail'], results['predelib_file'])
if results['mismatches']:
print(f"\nDetailed mismatches between SP predeliberatierapport and Dashboard Inschrijvingen:")
for mismatch in results['mismatches']:
print(f"Mismatch - ID {mismatch['ID']} ({mismatch['Name']}): "
f"Predeliberatierapport SP={mismatch['Predelib_SP']}, "
f"Dashboard Inschrijvingen SP={mismatch['Dashboard_SP']}")
else:
print("\n✅ All SP values match perfectly!")
print(f"Status: {results['status']}")
print(f"{'='*60}")

startpakketten/script.py

@@ -1,209 +1,49 @@
import pandas as pd
import argparse
import logging
"""
Main script for processing and comparing student data from predeliberation and dashboard Excel files.
"""
import sys
import os
from pathlib import Path
import logging
from checkheaders import check_headers_dashboard_inschrijvingenfile, check_headers_predelibfile
from process_predelib_file import check_students_with_fail_adviesrapport, print_students_with_fail_ar_summary
from compare_sp import compare_sp_values
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('startpakket_processing.log'),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
def validate_file_path(file_path: str) -> str:
"""Validate that the file exists and is an Excel file"""
if not os.path.exists(file_path):
raise argparse.ArgumentTypeError(f"File '{file_path}' does not exist")
if not file_path.lower().endswith(('.xlsx', '.xls')):
raise argparse.ArgumentTypeError(f"File '{file_path}' is not an Excel file (.xlsx or .xls)")
return file_path
def parse_arguments():
"""Parse command line arguments"""
parser = argparse.ArgumentParser(
description='Process and compare student data from predeliberation and dashboard Excel files',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s --predelib db.xlsx --dashboard dashboard_inschrijvingen.xlsx
%(prog)s -p /path/to/predelib.xlsx -d /path/to/dashboard.xlsx --output results.json
%(prog)s --predelib db.xlsx --dashboard dashboard.xlsx --verbose
"""
)
parser.add_argument(
'--predelib', '-p',
type=validate_file_path,
required=True,
help='Path to the predeliberation Excel file (db.xlsx)'
)
parser.add_argument(
'--dashboard', '-d',
type=validate_file_path,
required=True,
help='Path to the dashboard Excel file (dashboard_inschrijvingen.xlsx)'
)
parser.add_argument(
'--output', '-o',
type=str,
help='Output file path for results (optional, prints to console if not specified)'
)
parser.add_argument(
'--verbose', '-v',
action='store_true',
help='Enable verbose logging'
)
parser.add_argument(
'--log-file',
type=str,
default='startpakket_processing.log',
help='Path to log file (default: startpakket_processing.log)'
)
return parser.parse_args()
def process_files(predelib_path: str, dashboard_path: str, verbose: bool = False):
"""Process the Excel files and return results"""
try:
# Read Excel files
logger.info(f"Reading predeliberation file: {predelib_path}")
df_predelib = pd.read_excel(predelib_path)
logger.info(f"Predelib file loaded successfully. Shape: {df_predelib.shape}")
logger.info(f"Reading dashboard file: {dashboard_path}")
df_dashboard = pd.read_excel(dashboard_path)
logger.info(f"Dashboard file loaded successfully. Shape: {df_dashboard.shape}")
# Process the dataframes
logger.info("Processing predeliberation file headers")
processed_predelib_df = check_headers_predelibfile(df_predelib)
logger.info("Processing dashboard file headers")
processed_dashboard_df = check_headers_dashboard_inschrijvingenfile(df_dashboard)
# Check the predeliberation file for students with a fail in 'Adviesrapport code'
logger.info("Checking for students with FAIL status in predeliberation file")
students_with_fail = check_students_with_fail_adviesrapport(processed_predelib_df)
# Compare SP values
logger.info("Comparing SP values between files")
mismatches = compare_sp_values(processed_predelib_df, processed_dashboard_df)
# Prepare results
results = {
'predelib_file': predelib_path,
'dashboard_file': dashboard_path,
'predelib_records': len(processed_predelib_df),
'dashboard_records': len(processed_dashboard_df),
'students_with_fail_count': len(students_with_fail),
'students_with_fail': students_with_fail,
'mismatches_count': len(mismatches),
'mismatches': mismatches,
'status': 'completed'
}
logger.info(f"Processing completed successfully. Found {len(mismatches)} mismatches.")
return results
except Exception as e:
logger.error(f"Error processing files: {e}")
raise
def save_results(results: dict, output_path: str):
"""Save results to a file"""
try:
import json
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(results, f, indent=2, ensure_ascii=False)
logger.info(f"Results saved to: {output_path}")
except Exception as e:
logger.error(f"Error saving results to {output_path}: {e}")
raise
def print_summary(results: dict):
"""Print a summary of the results to console"""
print(f"\n{'='*60}")
print("STARTPAKKET PROCESSING SUMMARY")
print(f"{'='*60}")
print(f"Predelib file: {results['predelib_file']}")
print(f"Dashboard file: {results['dashboard_file']}")
print(f"Predelib records processed: {results['predelib_records']}")
print(f"Dashboard records processed: {results['dashboard_records']}")
print(f"Students with FAIL adviesrapport found: {results['students_with_fail_count']}")
print(f"Mismatches found: {results['mismatches_count']}")
if results['students_with_fail_count'] > 0:
print_students_with_fail_ar_summary(results['students_with_fail'], results['predelib_file'])
if results['mismatches']:
print(f"\nDetailed mismatches between SP predeliberatierapport and Dashboard Inschrijvingen:")
for mismatch in results['mismatches']:
print(f"Mismatch - ID {mismatch['ID']} ({mismatch['Name']}): Predeliberatierapport SP={mismatch['Predelib_SP']}, Dashboard Inschrijvingen SP={mismatch['Dashboard_SP']}")
else:
print("\n✅ All SP values match perfectly!")
print(f"Status: {results['status']}")
print(f"{'='*60}")
from cli_args import parse_arguments
from config import setup_logging, get_exit_code
from data_processor import process_files
from file_utils import save_results, print_summary
def main():
"""Main function"""
"""Main function - orchestrates the entire processing pipeline"""
try:
# Parse arguments
# Parse command-line arguments
args = parse_arguments()
# Configure logging level
if args.verbose:
logging.getLogger().setLevel(logging.DEBUG)
logger.debug("Verbose logging enabled")
# Set up logging configuration
logger = setup_logging(args.log_file, args.verbose)
logger.info("Starting startpakket processing")
logger.info(f"Predelib file: {args.predelib}")
logger.info(f"Dashboard file: {args.dashboard}")
# Process files
# Process the Excel files
results = process_files(args.predelib, args.dashboard, args.verbose)
# Save results if output path specified
# Save results to file if specified
if args.output:
save_results(results, args.output)
# Print summary
# Print summary to console
print_summary(results)
# Exit with appropriate code
exit_code = 0 if results['mismatches_count'] == 0 else 1
exit_code = get_exit_code(results)
logger.info(f"Processing completed with exit code: {exit_code}")
sys.exit(exit_code)
except KeyboardInterrupt:
logger.info("Processing interrupted by user")
print("\nProcessing interrupted by user")
sys.exit(130)
except Exception as e:
logger.error(f"Fatal error: {e}")
print(f"Error: {e}")
print(f"Fatal error: {e}")
sys.exit(1)