bdaneels b021eabdab Refactor and document code; add new files

Refactored `script.py` by adding detailed docstrings and organizing functions. Created `.idea` configuration files and `gotodashboard.js` for `sisa_crawl` project. Added `readme.md` files with usage instructions and context for multiple scripts, and set up `package.json` for `sisa_crawl` dependencies.

2024-11-18 14:03:25 +01:00

3.0 KiB

Raw Permalink Blame History

Excel Filtering and Formatting Script

The file in this repository filters has the intent to filter all written exams that require a 'large room' (>65 inschrijvingen) and thus need to be brought to the meeting which assigns large rooms to written exams on campus. The output has the layout in mind of the master file provided by E-campus but may need changes if the master file changes.

Prerequisites

Make sure you have the following software installed:

Python 3.x
Pip (Python package installer)

Required Packages

The script depends on the following Python packages:

pandas

You can install the required package using pip:

pip install pandas

Description

The script performs the following operations:

Reads data from the specified Excel file and sheet.
Filters rows based on the value of the 'Examenvorm' column and the count of 'Aant. inschr.' column.
Selects specific columns from the filtered DataFrame.
Formats time strings in the columns 'Beginuur S+' and 'Einduur S+'.
Cleans the 'Docenten' column by removing specific keywords and trimming whitespace.
Writes the processed DataFrame to a new Excel file.

Usage

Place your Excel file in the same directory as the script.
Update the file_path and sheet_name variables in the script with your specific file path and sheet name.
Run the script:

python script.py

Code

import pandas as pd

file_path = 'file.xlsx'
sheet_name = 'ps (32)'

# Read the data from the Excel file
df = pd.read_excel(file_path, sheet_name=sheet_name)

# Filter the data based on certain criteria
filtered_df = df[df['Examenvorm'] == 'Schriftelijk']
filtered_df = filtered_df[filtered_df['Aant. inschr.'] > 65]
filtered_df = filtered_df[['Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', 'Docenten', 'Aant. inschr.']]

# Format the time strings
filtered_df['Beginuur S+'] = filtered_df['Beginuur S+'].apply(lambda x: x.strftime('%H:%M'))
filtered_df['Einduur S+'] = filtered_df['Einduur S+'].apply(lambda x: x.strftime('%H:%M'))
filtered_df['Docenten'] = filtered_df['Docenten'].str.replace(r'\b(Titularis|Co-Titularis|Medewerker)\b', '', regex=True).str.strip()

# Save the filtered and formatted data to a new Excel file
filtered_df.to_excel('filtered_grote_lokalen.xlsx', index=False)

Additional Notes

This script assumes that the input Excel file has specific columns like 'Examenvorm', 'Aant. inschr.', 'Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', and 'Docenten'.
Make sure that the time columns ('Beginuur S+' and 'Einduur S+') are in datetime format in the original Excel file for the .strftime('%H:%M') method to work correctly.
The Docenten column will be cleaned by removing occurrences of the keywords 'Titularis', 'Co-Titularis', and 'Medewerker'.

Feel free to adjust the script according to your specific needs.

License

This project is licensed under the MIT License. See the LICENSE file for details.

3.0 KiB Raw Permalink Blame History