This repository contains a series of Jupyter notebooks that download, transform, and analyze school-level data relevant to the Chamberlin Education Foundation, which supports schools in California. The data is sourced from the California Department of Education (CDE), and the project aims to provide insights into various aspects of school performance and demographics.
The project is organized into three main phases, each handled by specific Jupyter notebooks:
Phase 1: The download_format_files_to_csv.ipynb notebook automates downloading the source data and converting it from text files to the more accessible CSV format. The CSV files are stored in designated folders for later analysis. The data sources are:
- Cumulative Enrollment Data
- English Language Arts/Literacy and Mathematics (CAASPP)
- Chronic Absenteeism Data
- Suspension Data
- Expulsion Data
- Enrollment by English Language Acquisition Status (ELAS), Long-Term English Learner (LTEL), and At-Risk by Grade
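The download-and-convert step described above can be sketched in plain Python as follows. This is a minimal sketch, not the code in download_format_files_to_csv.ipynb: it assumes each source file is tab-delimited, has a header row, does not use quote characters for escaping, and is UTF-8 encoded (the delimiter and encoding are parameters because some files may differ). The function names download_file and text_to_csv are hypothetical, and no real download URL is shown.

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path) -> Path:
    """Fetch a remote file to dest; the caller supplies the actual URL."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def text_to_csv(src: Path, dest: Path, delimiter: str = "\t",
                encoding: str = "utf-8") -> int:
    """Rewrite a delimited text file as a comma-separated CSV file.

    The first line is treated as the header. Returns the number of data
    rows written (header excluded); an empty source yields an empty CSV.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(src, newline="", encoding=encoding) as fin, \
            open(dest, "w", newline="", encoding="utf-8") as fout:
        # QUOTE_NONE: assume the text files do not quote fields, so any
        # quote characters are kept as literal data.
        reader = csv.reader(fin, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        writer = csv.writer(fout)  # quotes fields that contain commas
        header = next(reader, None)
        if header is None:
            return 0
        writer.writerow(header)
        for row in reader:
            writer.writerow(row)
            written += 1
    return written
```

Using the csv module on both sides, rather than replacing tabs with commas, means that school or district names containing commas are quoted correctly in the output.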
Phase 2: Six notebooks transform the historical data files into a unified format, stacking the records from each file much like appending rows to a database table:
- create_caaspp_dataset.ipynb
- create_chronic_absenteeism_dataset.ipynb
- create_expulsion_dataset.ipynb
- create_suspension_dataset.ipynb
- create_cum_enrollment_dataset.ipynb
- create_enrollment_el_dataset.ipynb
These notebooks produce six combined multi-year datasets, one per notebook, which are stored in the final_long_datasets_domain folder.
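The stacking idea can be illustrated with a short, standard-library-only sketch. It assumes each historical extract is a CSV file identified by an academic-year label, and that some column headers changed between years, which is handled by a rename map. The function names stack_years and write_stacked, the academic_year column, and the header names in the usage note are hypothetical; the actual notebooks may work differently.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def stack_years(files_by_year: Dict[str, Path],
                rename: Optional[Dict[str, str]] = None) -> List[dict]:
    """Append yearly CSV extracts into one list of records.

    Each record gets an 'academic_year' field taken from the dict key, and
    column names pass through an optional rename map so that headers which
    changed between years line up under one name.
    """
    rename = rename or {}
    stacked: List[dict] = []
    for year, path in sorted(files_by_year.items()):  # oldest year first
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                record = {rename.get(k, k): v for k, v in row.items()}
                record["academic_year"] = year
                stacked.append(record)
    return stacked


def write_stacked(records: List[dict], dest: Path) -> None:
    """Write stacked records to CSV using the union of all column names."""
    columns: List[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    with open(dest, "w", newline="", encoding="utf-8") as f:
        # restval fills columns that are missing in some years
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)
```

For example, passing rename={"SchoolCode": "School Code"} would merge an older header spelling into the newer one before the records are stacked.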
Phase 3 consists of two key notebooks:
- create_union_of_datasets.ipynb: combines the datasets from Phase 2 into a single long-format dataset, metric_values_fact.csv.
- create_merge_enrollment.ipynb: merges the enrollment-related datasets into a single wide-format dataset that is easier to analyze.
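The two operations above can be sketched in plain Python. The union step "melts" each domain dataset into long rows of (school_code, academic_year, domain, metric, value) and concatenates them, which is the general shape of a fact table like metric_values_fact.csv. The merge step left-joins enrollment datasets on a shared key so that each school and year ends up on one wide row. The column names, the key, and the function names (melt, union_long, merge_wide) are assumptions for illustration; the notebooks may use pandas and different columns.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical key columns shared by every dataset.
KEY = ("school_code", "academic_year")


def melt(records: Iterable[dict], domain: str,
         value_columns: Sequence[str]) -> List[dict]:
    """Turn one wide domain dataset into long rows: one per key and metric."""
    long_rows = []
    for rec in records:
        for col in value_columns:
            long_rows.append({
                "school_code": rec["school_code"],
                "academic_year": rec["academic_year"],
                "domain": domain,
                "metric": col,
                "value": rec[col],
            })
    return long_rows


def union_long(*datasets: List[dict]) -> List[dict]:
    """Concatenate already-melted datasets into one long fact table."""
    fact: List[dict] = []
    for ds in datasets:
        fact.extend(ds)
    return fact


def merge_wide(left: List[dict], right: List[dict]) -> List[dict]:
    """Left-join right onto left on KEY, producing one wide row per left row.

    Rows in left with no match keep only their own columns; if a non-key
    column name appears in both inputs, the value from right wins.
    """
    index: Dict[Tuple[str, ...], dict] = {
        tuple(r[k] for k in KEY): r for r in right
    }
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in KEY), {})
        combined = dict(row)
        for col, val in match.items():
            if col not in KEY:
                combined[col] = val
        merged.append(combined)
    return merged
```

The long layout suits a BI tool because adding a new metric adds rows rather than columns, while the wide layout is convenient when several enrollment measures are compared side by side for the same school and year.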
The data analyzed here primarily pertains to schools served by the Chamberlin Education Foundation. A complete list of these schools can be found in the cef_school_list.csv file included in this repository.
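Restricting a dataset to the schools in cef_school_list.csv can be sketched as below. This assumes the list has a column of school codes (called school_code here, a hypothetical name) that matches a column in the data being filtered; the helper names load_school_codes and filter_to_schools are also hypothetical. Codes are kept as strings because CDE identifiers can begin with zeros, which would be lost if they were parsed as numbers.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Set


def load_school_codes(path: Path, column: str = "school_code") -> Set[str]:
    """Read the school list and return its codes as text (keeps leading zeros)."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column].strip() for row in csv.DictReader(f)}


def filter_to_schools(records: Iterable[dict], codes: Set[str],
                      column: str = "school_code") -> List[dict]:
    """Keep only records whose school code appears in the given set."""
    return [r for r in records if r.get(column, "").strip() in codes]
```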
The goal of this project is to provide data models that can be accessed from a BI tool such as Tableau, giving stakeholders actionable insights into school performance and demographic trends over multiple years and helping them make informed decisions in support of educational initiatives.
Non-technical users interested in exploring the processed data can refer to the final CSV files generated by the notebooks. Technical users can run the notebooks to follow each step of the data processing in detail.