Research project for the Dutch Digital Heritage Network (NDE) focused on predicting obsolete file formats
If you want to perform some of the analyses contained in this code repository, you need a recent Python installation and a dependency manager. I chose Pipenv because it pins both the dependencies and their versions, as well as the Python version I used to create the scripts.
You can install Pipenv with:
pip install pipenv
after which you can install the dependencies used here with:
git clone https://github.com/Antfield-Creations/NDE-monitoring-file-formats
cd NDE-monitoring-file-formats
pipenv install
This will create a virtual environment (a "virtualenv") with the installed dependencies. After installation, you can run
pipenv shell
to enter the virtual environment. The following analyses are available for your perusal:
- The common crawl analysis:
pipenv run python -m analysis.common_crawl
- The Netherlands Institute for Sound and Vision (NIBG) analysis:
pipenv run python -m analysis.nibg_analysis
This uses the prebuilt aggregated statistics for the file types per month.
- The Data Archiving and Networked Services analysis is still a work in progress.
The code in this repository is mostly "config-driven". This means that there is a config.yaml in the root of this repository that configures which file formats are included in the analyses. You can tune them to your liking.
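As an illustration of what such a configuration could look like, here is a minimal sketch; the keys and format names below are assumptions for illustration, not the repository's actual schema, so check the config.yaml in the repository root for the real structure:

```yaml
# Hypothetical config.yaml structure: consult the real file for the actual keys
analyses:
  common_crawl:
    file_formats:
      - pdf
      - doc
      - docx
  nibg:
    file_formats:
      - mp3
      - wav
```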
This code repository is installable using Pip(env), because there is a setup.py installation script in the root of this project. The library contains a Python implementation of the Bass diffusion model, which allows you to generate data for adoption-curve plots.
You can use the installation command as follows:
cd my_experimentation_folder
pipenv install git+https://github.com/Antfield-Creations/NDE-monitoring-file-formats#egg=bass_diffusion
Once you have installed the library, you can use it in Python (remember to run pipenv run python first):
from bass_diffusion import BassDiffusionModel

# Fit the model to the observed adoption counts per time step
bass_model = BassDiffusionModel()
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
values = [200, 300, 600, 900, 800, 500, 300, 100, 50, 20]
bass_model.fit(times, values)

# Interpolate fitted values at intermediate time points
interpolated = bass_model.predict([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
The algorithm is very fast and can handle large amounts of data, but it is not very robust against noise.
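For reference, the curve that a Bass model fits can be sketched in a few lines of plain Python, independent of this library's internals. Here p is the standard coefficient of innovation and q the coefficient of imitation; the cumulative adoption fraction F(t) follows the well-known closed form of the Bass model. This is a generic sketch, not the library's implementation:

```python
import math


def bass_cumulative(t: float, p: float, q: float) -> float:
    """Cumulative adoption fraction F(t) of the Bass diffusion model."""
    e = math.exp(-(p + q) * t)
    return (1 - e) / (1 + (q / p) * e)


def bass_rate(t: float, p: float, q: float, m: float = 1.0) -> float:
    """Adoptions per unit time, m * f(t), where f is the derivative of F."""
    e = math.exp(-(p + q) * t)
    return m * ((p + q) ** 2 / p) * e / (1 + (q / p) * e) ** 2


# Typical textbook values: p ≈ 0.03 (innovation), q ≈ 0.38 (imitation)
print(bass_cumulative(0, 0.03, 0.38))   # 0.0: no adopters at t = 0
print(bass_cumulative(50, 0.03, 0.38))  # close to 1.0: near-saturation
```

The rise-peak-decline shape of bass_rate is what makes the model a candidate for describing file-format adoption and obsolescence: usage of a format grows, peaks, and then declines as successor formats take over.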