Suggestion: Enhance XRONOS with Model-Based Perio.do Matching Strategy #351

Open
MartinHinz opened this issue Dec 1, 2024 · 0 comments
Labels
enhancement New feature or request lod Linked Open Data

Comments

@MartinHinz
Collaborator

I had ChatGPT come up with a strategy for including Perio.do data in XRONOS. Here is the result:


This strategy integrates Perio.do periods into XRONOS by matching Typo entries to Perio.do periods (PeriodoItem) based on name and country (derived from site via context).

1. Add a PeriodoItem Model

Introduce a PeriodoItem model to store Perio.do periods.

Example schema:

class PeriodoItem < ApplicationRecord
  validates :name, presence: true
  # period_id is Perio.do's identifier, so it should occur only once in XRONOS
  validates :period_id, presence: true, uniqueness: true
  validates :countries, presence: true
end

Attributes:

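  • name (string): Name of the historical period (e.g., “Bronze Age”).
  • period_id (string): Perio.do's unique identifier for the period.
  • countries (string array): List of associated countries (names or Wikidata Q entities). The matching queries below rely on PostgreSQL array operators, so a PostgreSQL array column is assumed rather than plain text.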
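Since countries may be stored either as names or as Wikidata Q entities, the sketch below shows how both could be taken from a Perio.do spatialCoverage list. It assumes each place carries a Wikidata entity URI in its id field and a country name in its label; the helper wikidata_qid is hypothetical and not part of Perio.do or XRONOS.

```ruby
# Hypothetical helper: extract the bare Q identifier from a Wikidata
# entity URI, e.g. ".../entity/Q183" -> "Q183". Returns nil otherwise.
def wikidata_qid(uri)
  match = uri.to_s.match(%r{wikidata\.org/entity/(Q\d+)\z})
  match && match[1]
end

# Assumed shape of a Perio.do spatialCoverage list.
spatial_coverage = [
  { 'id' => 'http://www.wikidata.org/entity/Q183', 'label' => 'Germany' },
  { 'id' => 'http://www.wikidata.org/entity/Q39', 'label' => 'Switzerland' }
]

names = spatial_coverage.map { |place| place['label'] }
qids  = spatial_coverage.map { |place| wikidata_qid(place['id']) }.compact

p names # prints ["Germany", "Switzerland"]
p qids  # prints ["Q183", "Q39"]
```

Storing Q identifiers avoids spelling and language differences between country names, at the cost of needing Q identifiers on the XRONOS side as well.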

2. Synchronize Perio.do Data

Create a rake task to fetch and populate the PeriodoItem model with data from Perio.do.

Example task:

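namespace :periodo do
  desc "Sync Perio.do data with XRONOS"
  task sync: :environment do
    require 'open-uri'
    require 'json'

    url = 'https://data.perio.do/dataset/'
    periodo_data = JSON.parse(URI.open(url).read)

    # 'authorities' is an object keyed by authority ID, so iterate over its values
    periodo_data.fetch('authorities', {}).each_value do |authority|
      # Each authority's 'periods' object maps period IDs to period definitions
      authority.fetch('periods', {}).each do |key, period|
        # Upsert on the Perio.do ID so repeated syncs update rows instead of duplicating them
        item = PeriodoItem.find_or_initialize_by(period_id: key)
        # update returns false (and skips the row) if validations fail,
        # e.g. for periods without spatialCoverage
        item.update(
          name: period['label'].to_s.strip,
          countries: Array(period['spatialCoverage']).map { |place| place['label'] }
        )
      end
    end
  end
end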

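Data Structure: The Perio.do dataset groups periods by authority. "authorities" is a JSON object keyed by authority ID, each authority's "periods" object is keyed by period ID, and each period has a label and a spatialCoverage list of places (usually countries), each with a label and a Wikidata id.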
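A minimal, network-free sketch of the flattening step the rake task performs, assuming authorities and each authority's periods are JSON objects keyed by ID. The sample JSON and its IDs are made up, and flatten_periods is a hypothetical helper; each resulting hash holds the three PeriodoItem attributes.

```ruby
require 'json'

# Made-up sample in the assumed Perio.do layout.
SAMPLE = <<~JSON
  {
    "authorities": {
      "auth1": {
        "periods": {
          "auth1p1": {
            "label": " Bronze Age ",
            "spatialCoverage": [
              { "id": "http://www.wikidata.org/entity/Q183", "label": "Germany" }
            ]
          },
          "auth1p2": { "label": "Iron Age" }
        }
      }
    }
  }
JSON

# Flatten authorities -> periods into plain hashes ready for PeriodoItem.
def flatten_periods(dataset)
  dataset.fetch('authorities', {}).each_value.flat_map do |authority|
    authority.fetch('periods', {}).map do |period_id, period|
      {
        period_id: period_id,
        name: period['label'].to_s.strip,
        # Array(nil) is [], so periods without spatialCoverage get no countries
        countries: Array(period['spatialCoverage']).map { |place| place['label'] }
      }
    end
  end
end

rows = flatten_periods(JSON.parse(SAMPLE))
rows.each { |row| p row }
```

Periods without spatialCoverage come out with an empty countries list; with the presence validation on countries, such records would not be saved.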

3. Match Logic in Models

a) Matching Periods to a Typo

Add a method in the Typo model to find matching periods.

Example method:

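class Typo < ApplicationRecord
  def match_to_periods
    country_name = self.site&.country_name
    # Without a country there is nothing to match against
    return PeriodoItem.none if country_name.blank?

    # ILIKE without wildcards acts as a case-insensitive equality test;
    # @> requires the period's countries array to contain the site's country
    PeriodoItem.where(
      "name ILIKE ? AND countries @> ARRAY[?]::varchar[]",
      self.name,
      country_name
    )
  end
end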

Key Points:

  • self.name: Matches the Typo name to the PeriodoItem name.
  • self.site&.country_name: Fetches the associated site’s country via context.

Usage:

typo = Typo.find(1)
matching_periods = typo.match_to_periods
matching_periods.each do |period|
  puts "Typo '#{typo.name}' matches Period '#{period.name}' (#{period.period_id})"
end
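To make the semantics of the query explicit, here is a plain-Ruby approximation over in-memory rows (PeriodoRow and the data are hypothetical). casecmp? stands in for ILIKE without wildcards, and include? for the @> array containment check.

```ruby
# Hypothetical stand-in for PeriodoItem rows.
PeriodoRow = Struct.new(:period_id, :name, :countries)

# Approximates the SQL: `name ILIKE ?` without wildcards behaves like a
# case-insensitive equality test, and `countries @> ARRAY[?]` asks
# whether the country appears in the period's countries array.
def match_to_periods(typo_name, country_name, periods)
  return [] if country_name.nil?

  periods.select do |period|
    period.name.casecmp?(typo_name) && period.countries.include?(country_name)
  end
end

periods = [
  PeriodoRow.new('p1', 'Bronze Age', ['Germany', 'Switzerland']),
  PeriodoRow.new('p2', 'Bronze Age', ['Greece']),
  PeriodoRow.new('p3', 'Iron Age', ['Germany'])
]

match_to_periods('bronze age', 'Germany', periods).each do |period|
  puts "#{period.name} (#{period.period_id})"
end
# prints: Bronze Age (p1)
```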

b) Matching Typos to a PeriodoItem

Add a method in the PeriodoItem model to find matching Typo entries.

Example method:

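class PeriodoItem < ApplicationRecord
  def match_typos
    Typo.joins(:sample)
        .joins("INNER JOIN contexts ON contexts.id = samples.context_id")
        .joins("INNER JOIN sites ON sites.id = contexts.site_id")
        .where(
          # An array bound to ? is expanded to a comma-separated list, so IN (?) is used;
          # "= ANY(?)" would expand to invalid SQL such as ANY('Germany','France')
          "typos.name ILIKE ? AND sites.country_name IN (?)",
          self.name,
          self.countries
        ).distinct
  end
end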

Key Points:

  • Joins Sample, Context, and Site to access country_name from the associated site.
  • Matches Typo names and country data with PeriodoItem.

Usage:

period = PeriodoItem.find(1)
matching_typos = period.match_typos
matching_typos.each do |typo|
  puts "Period '#{period.name}' matches Typo '#{typo.name}'"
end
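The join chain can likewise be sketched in plain Ruby, with hashes standing in for the samples, contexts, and sites tables (all data and names here are hypothetical). A missing link along the chain yields nil and drops the typo, as an INNER JOIN would.

```ruby
# Hypothetical in-memory tables keyed by id, standing in for
# samples, contexts and sites.
SAMPLES  = { 10 => { context_id: 20 } }.freeze
CONTEXTS = { 20 => { site_id: 30 } }.freeze
SITES    = { 30 => { country_name: 'Germany' } }.freeze

TypoRow = Struct.new(:id, :name, :sample_id)

# Follow typo -> sample -> context -> site, like the INNER JOINs.
# Hash#dig returns nil as soon as a link is missing.
def country_for(typo)
  context_id = SAMPLES.dig(typo.sample_id, :context_id)
  site_id    = CONTEXTS.dig(context_id, :site_id)
  SITES.dig(site_id, :country_name)
end

# Case-insensitive name match plus "site country is one of the
# period's countries"; uniq mirrors .distinct.
def match_typos(period_name, period_countries, typos)
  typos.select do |typo|
    typo.name.casecmp?(period_name) && period_countries.include?(country_for(typo))
  end.uniq
end

typos = [
  TypoRow.new(1, 'Bronze Age', 10),
  TypoRow.new(2, 'Bronze Age', 99) # no sample 99, so no site and no country
]

match_typos('Bronze Age', ['Germany', 'Switzerland'], typos).each do |typo|
  puts "Period 'Bronze Age' matches Typo #{typo.id}"
end
# prints: Period 'Bronze Age' matches Typo 1
```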

4. Further Tasks

  • Fuzzy Matching: Use fuzzy matching on names to tolerate minor spelling and formatting differences.
  • Storing Matches: Store match results by adding a periodo_id field to a link table, similar to the Wikidata approach.
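One possible fuzzy-matching approach, offered as an assumption rather than a settled choice: normalize both names (case, punctuation, whitespace) and accept pairs within a small Levenshtein distance. The threshold of 2 is arbitrary and would need tuning on real data, since short names such as "Iron Age" and "Iron Age I" are already within two edits.

```ruby
# Normalize a label: lowercase, turn punctuation into spaces,
# collapse runs of whitespace.
def normalize(label)
  label.downcase.gsub(/[^[:alnum:][:space:]]/, ' ').split.join(' ')
end

# Levenshtein edit distance via dynamic programming, keeping two rows.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # Cheapest of deletion, insertion, and substitution (or match)
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Fuzzy match if normalized names are within max_distance edits.
# The default of 2 is an assumption to be tuned on real data.
def fuzzy_match?(a, b, max_distance: 2)
  levenshtein(normalize(a), normalize(b)) <= max_distance
end

p fuzzy_match?('Early Bronze Age', 'early bronze-age') # prints true
p fuzzy_match?('Bronze Age', 'Iron Age')               # prints false
```

Inside the database, PostgreSQL's fuzzystrmatch extension (levenshtein) or pg_trgm extension (similarity) could play the same role without loading rows into Ruby.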

Benefits

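  • Links Typo entries to Perio.do periods by name and country, reusing the existing relationships from Typo through Sample and Context to Site.
  • Fits the current XRONOS schema: the core strategy only adds the PeriodoItem table and leaves existing tables unchanged.
  • Makes XRONOS typologies linkable to shared Perio.do period definitions, which supports interoperability with other Linked Open Data resources and comparative historical analysis.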

Potential Issues

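  • Name-based matching may produce false positives for ambiguous names such as "Bronze Age", which several Perio.do authorities define, sometimes for the same country.
  • Requires consistent and accurate country data in both Perio.do and XRONOS: the same country spelled differently, or in another language, will not match. Comparing Wikidata Q identifiers instead of names would avoid this.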
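To contain false positives from ambiguous names, matches could be split into typos with exactly one candidate period and typos with several, with only the former linked automatically. A sketch with hypothetical [typo_id, period_id] pairs; partition_matches is not an existing XRONOS method.

```ruby
# Split [typo_id, period_id] matches into typos with exactly one
# candidate period and typos with several candidates.
def partition_matches(matches)
  grouped = matches.group_by(&:first)
                   .transform_values { |pairs| pairs.map(&:last).uniq }
  unique, ambiguous = grouped.partition { |_typo_id, period_ids| period_ids.size == 1 }
  [unique.to_h, ambiguous.to_h]
end

# Hypothetical matches: typo 1 hits two periods both labelled
# "Bronze Age" (e.g. from different authorities).
matches = [[1, 'p1'], [1, 'p2'], [2, 'p3']]

unique, ambiguous = partition_matches(matches)
unique.each { |typo_id, ids| puts "Typo #{typo_id}: link #{ids.first}" }
ambiguous.each { |typo_id, ids| puts "Typo #{typo_id}: review #{ids.join(', ')}" }
# prints:
# Typo 2: link p3
# Typo 1: review p1, p2
```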

Related issues:

@MartinHinz MartinHinz added enhancement New feature or request lod Linked Open Data labels Dec 1, 2024