Add random seed to ensure reproducibility #80

Merged
merged 1 commit into from
Oct 10, 2024


ni9999 (Contributor) commented Oct 8, 2024

Description

Added a random_seed parameter to the data generators in healthchain.data_generators.

Related Issue

#20

Changes Made

  • Added a random_seed parameter to the generate() method of CdsDataGenerator, EncounterGenerator, ConditionGenerator, ProcedureGenerator, MedicationRequestGenerator and PatientGenerator
  • Seeded Faker using the Faker.seed() method
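For readers unfamiliar with the pattern, here is a minimal sketch of an optional seed parameter on a generate() method. It is not the code from this PR: ToyPatientGenerator and its fields are hypothetical, and it uses a private random.Random instance from the standard library instead of Faker so that it runs without extra dependencies.

```python
import random
from typing import Optional


class ToyPatientGenerator:
    """Hypothetical stand-in for a HealthChain generator, for illustration only."""

    def __init__(self) -> None:
        # A private RNG instance, so seeding does not touch the global `random` state.
        self._rng = random.Random()

    def generate(self, random_seed: Optional[int] = None) -> dict:
        # Seed only when asked; random_seed=None keeps the output non-deterministic.
        if random_seed is not None:
            self._rng.seed(random_seed)
        return {
            "id": f"patient-{self._rng.randint(1000, 9999)}",
            "age": self._rng.randint(0, 99),
            "gender": self._rng.choice(["male", "female", "other", "unknown"]),
        }


gen = ToyPatientGenerator()
first = gen.generate(random_seed=42)
second = gen.generate(random_seed=42)
print(first == second)  # True: same seed, same record
```

Design note: Faker.seed() seeds a random number generator shared by Faker instances, so seeding in one generator also affects any other Faker usage in the same process. Faker also provides seed_instance() for per-instance seeding, which is closer to the private-RNG approach in this sketch.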

Testing

Tested by generating data as described in the documentation, several times with the same seed:

```python
from healthchain.data_generators import CdsDataGenerator
from healthchain.base import Workflow

# Initialise data generator
data_generator = CdsDataGenerator()

# Generate FHIR resources for use case workflow
data_generator.set_workflow(Workflow.encounter_discharge)
data = data_generator.generate(random_seed=42)

print(data.model_dump())
```
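The manual check can also be automated. The sketch below is an assumption about how such a test might look, not a test from this PR: is_reproducible and toy_generate are hypothetical helpers, and the generator is a stdlib stand-in so the example runs on its own.

```python
import json
import random
from typing import Any, Callable, Optional


def toy_generate(random_seed: Optional[int] = None) -> dict:
    """Hypothetical generator standing in for CdsDataGenerator.generate()."""
    rng = random.Random(random_seed)  # Random(None) seeds from the OS / clock
    return {"encounter_id": rng.randint(1, 10**6), "los_days": rng.randint(1, 30)}


def is_reproducible(generate: Callable[..., Any], seed: int, runs: int = 3) -> bool:
    """Call `generate` several times with the same seed and compare the outputs."""
    # json.dumps with sort_keys gives a stable text form for dict outputs;
    # default=str covers values such as datetime that json cannot encode natively.
    outputs = {
        json.dumps(generate(random_seed=seed), sort_keys=True, default=str)
        for _ in range(runs)
    }
    return len(outputs) == 1


print(is_reproducible(toy_generate, seed=42))  # True
```

Against the real generator, something like is_reproducible(lambda random_seed: data_generator.generate(random_seed=random_seed).model_dump(), seed=42) should work once set_workflow() has been called. Given the note on faker.date_time() under Additional Notes, such a check may currently fail on timestamp fields.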

Checklist

  • I have read the contributing guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

Although faker.date_time() returns the same date for a fixed seed, the time component differs between runs. This may affect reproducibility.
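One plausible explanation, stated as an assumption rather than something verified against Faker's source: if the random datetime is drawn from a range whose upper bound defaults to the current time, a fixed seed only fixes the relative position inside that range, so the result drifts as the clock advances. Small drifts change the time but rarely the date. The stdlib-only sketch below (random_datetime is a hypothetical helper) reproduces the effect and shows that pinning the upper bound restores reproducibility.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def random_datetime(rng: random.Random, end: Optional[datetime] = None) -> datetime:
    """Pick a datetime uniformly between the Unix epoch and `end` (default: now)."""
    end = end if end is not None else datetime.now(timezone.utc)
    span = (end - EPOCH).total_seconds()
    # Same seed -> same fraction r, but r * span moves whenever `end` moves.
    return EPOCH + timedelta(seconds=rng.uniform(0, span))


# Drifting upper bound: `end` is re-read from the clock on each call, so `a` and
# `b` can differ, and by more the further apart in time the calls are made.
a = random_datetime(random.Random(42))
b = random_datetime(random.Random(42))

# Pinned upper bound: identical inputs give identical outputs.
FIXED_END = datetime(2024, 10, 1, tzinfo=timezone.utc)
c = random_datetime(random.Random(42), end=FIXED_END)
d = random_datetime(random.Random(42), end=FIXED_END)
print(c == d)  # True
```

With Faker itself, passing fixed datetime objects as start_date and end_date to fake.date_time_between() should have a similar effect; it is worth confirming against the installed Faker version.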

adamkells (Contributor) left a comment

This looks good to me! Good spot on the date time with Faker. We may need to think about that longer term but this is perfect for now. Thank you!

adamkells merged commit 4318296 into dotimplement:main on Oct 10, 2024
5 checks passed
ni9999 deleted the random_seed branch on October 24, 2024 08:17
jenniferjiangkells linked an issue on Nov 26, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Adding Random Seed to Data Generators
2 participants