b021eabdab
Refactored `script.py` by adding detailed docstrings and organizing functions. Created `.idea` configuration files and `gotodashboard.js` for `sisa_crawl` project. Added `readme.md` files with usage instructions and context for multiple scripts, and set up `package.json` for `sisa_crawl` dependencies.
3.0 KiB
3.0 KiB
Excel Filtering and Formatting Script
The file in this repository filters has the intent to filter all written exams that require a 'large room' (>65 inschrijvingen) and thus need to be brought to the meeting which assigns large rooms to written exams on campus. The output has the layout in mind of the master file provided by E-campus but may need changes if the master file changes.
Prerequisites
Make sure you have the following software installed:
- Python 3.x
- Pip (Python package installer)
Required Packages
The script depends on the following Python packages:
pandas
You can install the required package using pip:
pip install pandas
Description
The script performs the following operations:
- Reads data from the specified Excel file and sheet.
- Filters rows based on the value of the 'Examenvorm' column and the count of 'Aant. inschr.' column.
- Selects specific columns from the filtered DataFrame.
- Formats time strings in the columns 'Beginuur S+' and 'Einduur S+'.
- Cleans the 'Docenten' column by removing specific keywords and trimming whitespace.
- Writes the processed DataFrame to a new Excel file.
Usage
- Place your Excel file in the same directory as the script.
- Update the
file_path
andsheet_name
variables in the script with your specific file path and sheet name. - Run the script:
python script.py
Code
import pandas as pd
file_path = 'file.xlsx'
sheet_name = 'ps (32)'
# Read the data from the Excel file
df = pd.read_excel(file_path, sheet_name=sheet_name)
# Filter the data based on certain criteria
filtered_df = df[df['Examenvorm'] == 'Schriftelijk']
filtered_df = filtered_df[filtered_df['Aant. inschr.'] > 65]
filtered_df = filtered_df[['Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', 'Docenten', 'Aant. inschr.']]
# Format the time strings
filtered_df['Beginuur S+'] = filtered_df['Beginuur S+'].apply(lambda x: x.strftime('%H:%M'))
filtered_df['Einduur S+'] = filtered_df['Einduur S+'].apply(lambda x: x.strftime('%H:%M'))
filtered_df['Docenten'] = filtered_df['Docenten'].str.replace(r'\b(Titularis|Co-Titularis|Medewerker)\b', '', regex=True).str.strip()
# Save the filtered and formatted data to a new Excel file
filtered_df.to_excel('filtered_grote_lokalen.xlsx', index=False)
Additional Notes
- This script assumes that the input Excel file has specific columns like 'Examenvorm', 'Aant. inschr.', 'Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', and 'Docenten'.
- Make sure that the time columns ('Beginuur S+' and 'Einduur S+') are in datetime format in the original Excel file for the
.strftime('%H:%M')
method to work correctly. - The
Docenten
column will be cleaned by removing occurrences of the keywords 'Titularis', 'Co-Titularis', and 'Medewerker'.
Feel free to adjust the script according to your specific needs.
License
This project is licensed under the MIT License. See the LICENSE file for details.