exam-mcq-nl


This repository contains code to extract multiple choice questions from Dutch high school exams.

Usage

  1. Download selected PDFs: download_pdfs.py
  2. Extract plain text from PDFs: pdf2text.py
  3. Extract multiple choice questions and answers from plain text: text2json.py
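As an illustration of what step 3 involves, here is a minimal sketch of splitting one plain-text block into a question stem and lettered options. It is not the code in text2json.py: the function name `parse_question`, the `OPTION_RE` pattern, and the assumed layout (each option on its own line, starting with a capital letter A-F and a space) are all hypothetical.

```python
import json
import re
from typing import Optional

# Assumed layout (hypothetical, not taken from text2json.py): an option line
# starts with a single capital letter A-F, then whitespace, then the option text.
OPTION_RE = re.compile(r"^([A-F])\s+(.+)$")


def parse_question(block: str) -> Optional[dict]:
    """Split one text block into a question stem and its lettered options."""
    stem_lines = []
    options = {}
    for raw_line in block.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        match = OPTION_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
        elif not options:
            # Lines before the first option belong to the question stem.
            stem_lines.append(line)
        # Non-option lines after the first option are ignored in this sketch.
    if len(options) < 2:
        return None  # fewer than two options: not treated as multiple choice
    return {"question": " ".join(stem_lines), "options": options}


if __name__ == "__main__":
    example = "Wat is de hoofdstad van Nederland?\nA Amsterdam\nB Rotterdam\nC Utrecht"
    print(json.dumps(parse_question(example), ensure_ascii=False, indent=2))
```

A stem line that happens to start with a single capital letter and a space would be misread as an option, so real exam text likely needs stricter rules than this.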

This code can be used to extract and process exam data from Examenblad.nl, as published on alleexamens.nl. The rights to these exams belong to the State of the Netherlands. Please refer to their copyright statement for more information.

Note: The question filtering should be improved before these questions are used directly, for example by adding more keywords that indicate a reference to an outside source, and by filtering on length to catch questions that were concatenated during extraction.

Note 2: Most questions were filtered by hand, as there were no keywords present.
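The two improvements suggested in the note could look roughly like the sketch below. The keyword list, the length threshold, and the function name `keep_question` are assumptions made for illustration; none of them come from this repository.

```python
# Hypothetical keyword list: Dutch phrases that point to material outside the
# question itself (a source text, figure, table, or worksheet).
SOURCE_KEYWORDS = ("zie bron", "zie afbeelding", "zie figuur", "zie tabel", "uitwerkbijlage")

# Assumed threshold: a longer question is likely several questions glued together.
MAX_QUESTION_CHARS = 600


def keep_question(question: str,
                  keywords: tuple = SOURCE_KEYWORDS,
                  max_chars: int = MAX_QUESTION_CHARS) -> bool:
    """Return True if the question passes both the keyword and the length filter."""
    text = question.lower()
    if any(keyword in text for keyword in keywords):
        return False  # refers to an outside source
    return len(question) <= max_chars


if __name__ == "__main__":
    questions = [
        "Wat is de hoofdstad van Nederland?",
        "Zie bron 3. Welke conclusie is juist?",
    ]
    print([q for q in questions if keep_question(q)])
```

Both the keywords and the threshold would need tuning against real exam text, and a manual check remains necessary for questions that refer to sources without any recognizable keyword.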

Dataset

Find the dataset on Hugging Face.