
Research project for the Dutch Digital Heritage Network (NDE) focused on predicting obsolete file formats



samalloing/NDE-monitoring-file-formats

 
 


NDE-monitoring-file-formats


How to use this repository

If you want to run any of the analyses in this code repository, you need a recent Python installation and a dependency manager. I chose Pipenv because it records the dependencies, their exact versions and the Python version I used to create the scripts.

You can install Pipenv with:

pip install pipenv

after which you can clone the repository and install its dependencies with:

git clone https://github.com/Antfield-Creations/NDE-monitoring-file-formats
cd NDE-monitoring-file-formats
pipenv install

This will create a virtual environment (a "virtualenv") with the installed dependencies. After installation, you can run

pipenv shell

to activate the virtual environment. The following analyses are available for your perusal (pipenv run also works without activating the shell first):

  • The Common Crawl analysis: pipenv run python -m analysis.common_crawl
  • The Netherlands Institute for Sound and Vision (NIBG) analysis: pipenv run python -m analysis.nibg_analysis. This analysis uses prebuilt, aggregated monthly statistics per file type.
  • The Data Archiving and Networked Services (DANS) analysis is still a work in progress.

Changing the settings

The code in this repository is mostly "config-driven": a config.yaml file in the root of this repository determines which file formats are included in the analyses. You can edit it to your liking.

Using this repository as a library

This code repository can be installed as a library with Pip or Pipenv, because there is a setup.py installation script in the root of this project. The library contains a Python implementation of the Bass diffusion model, which you can use to generate data for plots like this one:

[Figure: diffusion plot]
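For reference, the textbook form of the Bass diffusion model is given below. It describes adoption over time with an innovation coefficient p, an imitation coefficient q and a market potential m; F(t) is the cumulative fraction of adopters and f(t) the adoption rate. Whether this library uses exactly this parameterization is an assumption.

```latex
\begin{aligned}
\frac{f(t)}{1 - F(t)} &= p + q\,F(t) \\
F(t) &= \frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}\, e^{-(p+q)t}} \\
f(t) = \frac{dF}{dt} &= \frac{(p+q)^2}{p} \cdot \frac{e^{-(p+q)t}}{\left(1 + \frac{q}{p}\, e^{-(p+q)t}\right)^2} \\
n(t) &= m\, f(t), \qquad t^{*} = \frac{\ln(q/p)}{p+q} \quad \text{(time of peak adoption, for } q > p\text{)}
\end{aligned}
```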

Installation

To install the library into a project folder of your own (my_experimentation_folder is a placeholder), run:

cd my_experimentation_folder
pipenv install git+https://github.com/Antfield-Creations/NDE-monitoring-file-formats#egg=bass_diffusion

Usage

Once you have installed the library, you can use it in Python (remember to start the interpreter with pipenv run python first):

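from bass_diffusion import BassDiffusionModel

bass_model = BassDiffusionModel()
# Observations: time steps and the value measured at each step
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
values = [200, 300, 600, 900, 800, 500, 300, 100, 50, 20]
# Fit the model parameters to the observations
bass_model.fit(times, values)
# Evaluate the fitted curve at time points between the observations
interpolated = bass_model.predict([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5])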

The fitting algorithm is very fast and can handle large amounts of data. However, it is not very robust against noise.
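To make the idea behind fit and predict concrete, here is a minimal, dependency-free sketch that fits the standard Bass curve to the same example data. It is not this library's implementation: the function names bass_rate and fit_bass, the brute-force grid search over p and q, the grid ranges, and the choice to treat m * f(t) at each integer time step as that period's count are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def bass_rate(t: float, p: float, q: float) -> float:
    """Bass adoption rate f(t): fraction of the market potential adopting at time t."""
    a = p + q
    e = math.exp(-a * t)
    return (a * a / p) * e / (1.0 + (q / p) * e) ** 2


def fit_bass(
    times: Sequence[float], values: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Hypothetical least-squares fit of p, q and m; returns (p, q, m, sse).

    For each (p, q) on a coarse grid the model m * f(t) is linear in m,
    so the optimal m has the closed form sum(y * g) / sum(g * g).
    """
    best = (0.0, 0.0, 0.0, math.inf)
    for i in range(1, 51):  # p from 0.005 to 0.25 (assumed range)
        p = 0.005 * i
        for j in range(1, 101):  # q from 0.02 to 2.0 (assumed range)
            q = 0.02 * j
            shape = [bass_rate(t, p, q) for t in times]
            m = sum(y * g for y, g in zip(values, shape)) / sum(g * g for g in shape)
            sse = sum((y - m * g) ** 2 for y, g in zip(values, shape))
            if sse < best[3]:
                best = (p, q, m, sse)
    return best


times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
values = [200, 300, 600, 900, 800, 500, 300, 100, 50, 20]
p, q, m, sse = fit_bass(times, values)

# Evaluate the fitted curve between the observed time steps
interpolated = [m * bass_rate(t, p, q) for t in [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]]
print(f"p={p:.3f} q={q:.2f} m={m:.0f} sse={sse:.0f}")
```

The grid search keeps the sketch free of third-party dependencies; a practical fit would more likely use a numerical optimizer such as scipy.optimize.curve_fit.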


Languages

  • HTML 63.1%
  • Python 36.8%
  • Makefile 0.1%