# Excel Filtering and Formatting Script

The file in this repository filters has the intent to filter all written exams that require a 'large room' 
(>65 inschrijvingen) and thus need to be brought to the meeting which assigns large rooms to written exams on campus. The output has
the layout in mind of the master file provided by E-campus but may need changes if the master file changes.

## Prerequisites

Make sure you have the following software installed:
- Python 3.x
- Pip (Python package installer)

## Required Packages

The script depends on the following Python packages:
- `pandas`

You can install the required package using pip:
```bash
pip install pandas
```

## Description

The script performs the following operations:

1. Reads data from the specified Excel file and sheet.
2. Filters rows based on the value of the 'Examenvorm' column and the count of 'Aant. inschr.' column.
3. Selects specific columns from the filtered DataFrame.
4. Formats time strings in the columns 'Beginuur S+' and 'Einduur S+'.
5. Cleans the 'Docenten' column by removing specific keywords and trimming whitespace.
6. Writes the processed DataFrame to a new Excel file.

## Usage

1. Place your Excel file in the same directory as the script.
2. Update the `file_path` and `sheet_name` variables in the script with your specific file path and sheet name.
3. Run the script:

```bash
python script.py
```

## Code

```python
import pandas as pd

file_path = 'file.xlsx'
sheet_name = 'ps (32)'

# Read the data from the Excel file
df = pd.read_excel(file_path, sheet_name=sheet_name)

# Filter the data based on certain criteria
filtered_df = df[df['Examenvorm'] == 'Schriftelijk']
filtered_df = filtered_df[filtered_df['Aant. inschr.'] > 65]
filtered_df = filtered_df[['Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', 'Docenten', 'Aant. inschr.']]

# Format the time strings
filtered_df['Beginuur S+'] = filtered_df['Beginuur S+'].apply(lambda x: x.strftime('%H:%M'))
filtered_df['Einduur S+'] = filtered_df['Einduur S+'].apply(lambda x: x.strftime('%H:%M'))
filtered_df['Docenten'] = filtered_df['Docenten'].str.replace(r'\b(Titularis|Co-Titularis|Medewerker)\b', '', regex=True).str.strip()

# Save the filtered and formatted data to a new Excel file
filtered_df.to_excel('filtered_grote_lokalen.xlsx', index=False)
```

## Additional Notes

- This script assumes that the input Excel file has specific columns like 'Examenvorm', 'Aant. inschr.', 'Datum S+', 'Beginuur S+', 'Einduur S+', 'Studiegidsnr.', 'Omschrijving', and 'Docenten'.
- Make sure that the time columns ('Beginuur S+' and 'Einduur S+') are in datetime format in the original Excel file for the `.strftime('%H:%M')` method to work correctly.
- The `Docenten` column will be cleaned by removing occurrences of the keywords 'Titularis', 'Co-Titularis', and 'Medewerker'.

Feel free to adjust the script according to your specific needs.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.