79 lines
3.0 KiB
Markdown
79 lines
3.0 KiB
Markdown
|
# Excel Filtering and Formatting Script
|
||
|
|
||
|
The file in this repository filters has the intent to filter all written exams that require a 'large room'
|
||
|
(>65 inschrijvingen) and thus need to be brought to the meeting which assigns large rooms to written exams on campus. The output has
|
||
|
the layout in mind of the master file provided by E-campus but may need changes if the master file changes.
|
||
|
|
||
|
## Prerequisites
|
||
|
|
||
|
Make sure you have the following software installed:
|
||
|
- Python 3.x
|
||
|
- Pip (Python package installer)
|
||
|
|
||
|
## Required Packages
|
||
|
|
||
|
The script depends on the following Python packages:
|
||
|
- `pandas`
|
||
|
|
||
|
You can install the required package using pip:
|
||
|
```bash
|
||
|
pip install pandas
|
||
|
```
|
||
|
|
||
|
## Description
|
||
|
|
||
|
The script performs the following operations:
|
||
|
|
||
|
1. Reads data from the specified Excel file and sheet.
|
||
|
2. Filters rows based on the value of the 'Examenvorm' column and the count of 'Aant. inschr.' column.
|
||
|
3. Selects specific columns from the filtered DataFrame.
|
||
|
4. Formats time strings in the columns 'Beginuur S+' and 'Einduur S+'.
|
||
|
5. Cleans the 'Docenten' column by removing specific keywords and trimming whitespace.
|
||
|
6. Writes the processed DataFrame to a new Excel file.
|
||
|
|
||
|
## Usage
|
||
|
|
||
|
1. Place your Excel file in the same directory as the script.
|
||
|
2. Update the `file_path` and `sheet_name` variables in the script with your specific file path and sheet name.
|
||
|
3. Run the script:
|
||
|
|
||
|
```bash
|
||
|
python script.py
|
||
|
```
|
||
|
|
||
|
## Code
|
||
|
|
||
|
```python
|
||
|
import pandas as pd
|
||
|
|
||
|
file_path = 'file.xlsx'
|
||
|
sheet_name = 'ps (32)'
|
||
|
|
||
|
# Read the data from the Excel file
|
||
|
df = pd.read_excel(file_path, sheet_name=sheet_name)
|
||
|
|
||
|
# Filter the data based on certain criteria
|
||
|
filtered_df = df[df['Examenvorm'] == 'Schriftelijk']
|
||
|
filtered_df = filtered_df[filtered_df['Aant. inschr.'] > 65]
|
||
|
filtered_df = filtered_df[['Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', 'Docenten', 'Aant. inschr.']]
|
||
|
|
||
|
# Format the time strings
|
||
|
filtered_df['Beginuur S+'] = filtered_df['Beginuur S+'].apply(lambda x: x.strftime('%H:%M'))
|
||
|
filtered_df['Einduur S+'] = filtered_df['Einduur S+'].apply(lambda x: x.strftime('%H:%M'))
|
||
|
filtered_df['Docenten'] = filtered_df['Docenten'].str.replace(r'\b(Titularis|Co-Titularis|Medewerker)\b', '', regex=True).str.strip()
|
||
|
|
||
|
# Save the filtered and formatted data to a new Excel file
|
||
|
filtered_df.to_excel('filtered_grote_lokalen.xlsx', index=False)
|
||
|
```
|
||
|
|
||
|
## Additional Notes
|
||
|
|
||
|
- This script assumes that the input Excel file has specific columns like 'Examenvorm', 'Aant. inschr.', 'Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', and 'Docenten'.
|
||
|
- Make sure that the time columns ('Beginuur S+' and 'Einduur S+') are in datetime format in the original Excel file for the `.strftime('%H:%M')` method to work correctly.
|
||
|
- The `Docenten` column will be cleaned by removing occurrences of the keywords 'Titularis', 'Co-Titularis', and 'Medewerker'.
|
||
|
|
||
|
Feel free to adjust the script according to your specific needs.
|
||
|
|
||
|
## License
|
||
|
|
||
|
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
|