Code to extract multiple choice questions from Dutch high school exams

esther2000/exam-mcq-nl

exam-mcq-nl


This repository contains code to extract multiple choice questions from Dutch high school exams.

Usage

  1. Download selected PDFs: download_pdfs.py
  2. Extract plain text from PDFs: pdf2text.py
  3. Extract multiple choice questions and answers from plain text: text2json.py
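The third step can be pictured with the rough sketch below. It is not the repository's text2json.py. It assumes, hypothetically, that each question in the extracted text starts with a line such as "1p 3 Welke bewering is juist?" (points, question number, question text) and that answer options are lines starting with a capital letter A to F followed by a space. The function name parse_mcqs and the answer_key dict are illustrative only.

```python
import json
import re

# Assumed layout (hypothetical; check it against the real extracted text):
#   "1p 3 Welke bewering is juist?"  -> points, question number, question text
#   "A eerste antwoord"              -> answer option A-F
QUESTION_RE = re.compile(r"^(\d+)p\s+(\d+)\s+(.*)$")
OPTION_RE = re.compile(r"^([A-F])\s+(.*)$")


def parse_mcqs(text, answer_key=None):
    """Group plain-text exam lines into multiple choice questions.

    answer_key is an optional {question_number: letter} dict, for example
    built from the exam's answer model; its format is an assumption.
    """
    answer_key = answer_key or {}
    questions = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        q_match = QUESTION_RE.match(line)
        if q_match:
            current = {
                "number": int(q_match.group(2)),
                "points": int(q_match.group(1)),
                "question": q_match.group(3),
                "options": {},
            }
            questions.append(current)
            continue
        if current is None:
            continue  # skip headers and instructions before the first question
        o_match = OPTION_RE.match(line)
        if o_match:
            current["options"][o_match.group(1)] = o_match.group(2)
        elif not current["options"]:
            # the question text wraps over several lines
            current["question"] += " " + line
        else:
            # naive: any later line (even text meant for the next
            # question) is appended to the last option
            last = list(current["options"])[-1]
            current["options"][last] += " " + line
    # keep only questions with at least two options (drops open questions)
    mcqs = [q for q in questions if len(q["options"]) >= 2]
    for q in mcqs:
        q["answer"] = answer_key.get(q["number"])  # None if unknown
    return mcqs


if __name__ == "__main__":
    sample = """1p 1 Welke stad is de hoofdstad
van Nederland?
A Amsterdam
B Den Haag
C Rotterdam
2p 2 Leg uit waarom.
"""
    print(json.dumps(parse_mcqs(sample, {1: "A"}), ensure_ascii=False, indent=2))
```

Wrapped lines are appended to the question until the first option appears, and to the last option after that. This naive rule is one way unrelated text can end up concatenated to a question, which is why length filtering helps.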

This code could be used to extract and process exam data from Examenblad.nl, as published on alleexamens.nl. The rights to these exams belong to the State of the Netherlands; please refer to their copyright statement for more information.

Note: The question filtering step should be improved before these questions are used directly, e.g. by adding more keywords that refer to outside sources and by filtering on length to avoid concatenated questions.

Note 2: Most questions were filtered by hand, as no such keywords were present.
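The automatic part of this filtering can be sketched as follows. This is a minimal illustration, not the repository's filter: the keyword list (Dutch for text, source, figure, image, table, graph) and the 300-character cut-off are hypothetical values, and each question is assumed to be a dict with a "question" string and an "options" dict.

```python
import re

# Hypothetical keywords (Dutch for text, source, figure, image, table, graph)
# that suggest a question depends on material outside the question itself.
OUTSIDE_SOURCE_KEYWORDS = ["tekst", "bron", "figuur", "afbeelding", "tabel", "grafiek"]

# Hypothetical cut-off: very long question texts are often several
# questions that were concatenated during extraction.
MAX_QUESTION_CHARS = 300

# Prefix match at the start of a word, so plurals such as "bronnen" or
# "tabellen" are caught too (at the cost of some false positives).
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, OUTSIDE_SOURCE_KEYWORDS)) + ")",
    re.IGNORECASE,
)


def keep_question(q):
    """Return True if the question passes both automatic checks."""
    full_text = q["question"] + " " + " ".join(q["options"].values())
    if KEYWORD_RE.search(full_text):
        return False  # refers to an outside source
    if len(q["question"]) > MAX_QUESTION_CHARS:
        return False  # probably several concatenated questions
    return True


def filter_questions(questions):
    """Keep only the questions that pass keep_question."""
    return [q for q in questions if keep_question(q)]


if __name__ == "__main__":
    questions = [
        {"question": "Welke stad is de hoofdstad van Nederland?",
         "options": {"A": "Amsterdam", "B": "Den Haag"}},
        {"question": "Wat bedoelt de schrijver in bron 2?",
         "options": {"A": "x", "B": "y"}},
    ]
    print(filter_questions(questions))
```

Questions that pass both checks may still depend on outside material without using any of the keywords, so they still need a manual check.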

Dataset

The dataset is available on Hugging Face.
