Store the intermediary results of the custom synthesizers, but the generated result file cannot be opened #321

Open
T0217 opened this issue Jul 3, 2024 · 0 comments
Labels
bug (Something isn't working), new (Automatic label applied to new issues)

T0217 commented Jul 3, 2024

Environment Details

  • SDGym version: 0.8.0
  • Python version: 3.11.5
  • Operating System: Windows 11

Error Description

Thank you for sharing the code. When benchmarking a custom synthesizer in SDGym, I want to store the intermediate results, so I pass detailed_results_folder to benchmark_single_table. However, the generated result files cannot be opened, and I cannot find the code that generates them. What should I do?

Steps to reproduce

import os
import shutil
import sdgym
from sdgym import create_single_table_synthesizer
from sdgym.synthesizers import (UniformSynthesizer,
                                GaussianCopulaSynthesizer,
                                TVAESynthesizer)
import warnings
warnings.filterwarnings('ignore')

synthesizers = [
    UniformSynthesizer,
    GaussianCopulaSynthesizer,
    TVAESynthesizer
]


# YData
# CTGAN
def ctgan_get_trained_synthesizer(data, metadata):
    from ydata_synthetic.synthesizers.regular import RegularSynthesizer
    from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

    ctgan_args = ModelParameters(batch_size=500, lr=2e-4, betas=(0.5, 0.9))
    train_args = TrainParameters(epochs=2)

    synthesizer = RegularSynthesizer(modelname='ctgan', model_parameters=ctgan_args)

    num_cols = [col for col, sdtype in metadata['columns'].items() if sdtype['sdtype'] in ['numerical', 'datetime']]
    cat_cols = [col for col, sdtype in metadata['columns'].items() if sdtype['sdtype'] == 'categorical']

    synthesizer.fit(data=data,
                    train_arguments=train_args,
                    num_cols=num_cols,
                    cat_cols=cat_cols)

    return synthesizer


def sample_from_synthesizer(synthesizer, n_rows):
    synthetic_data = synthesizer.sample(n_rows)
    return synthetic_data


YData_CTGANSynthesizer = create_single_table_synthesizer(
    get_trained_synthesizer_fn=ctgan_get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='YData-CTGAN'
)


custom_synthesizers = [YData_CTGANSynthesizer]

# Detect the existence of the folder
detailed_results_folder = r"C:\Users\18840\Desktop\result"

# os.path.isdir() is already False for a missing path, so one check is enough
if os.path.isdir(detailed_results_folder):
    print('The folder for the intermediate files already exists and will be deleted.')
    shutil.rmtree(detailed_results_folder, ignore_errors=True)
    print('-' * 50)

results = sdgym.benchmark_single_table(
    synthesizers=synthesizers,
    custom_synthesizers=custom_synthesizers,
    show_progress=True,
    multi_processing_config={
     'package_name': 'multiprocessing',
     'num_workers': 8
    },
    sdv_datasets=['adult'],
    detailed_results_folder=detailed_results_folder
)

Here is an example of the output files.
[Screenshot: Snipaste_2024-07-03_13-02-11]
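
To narrow down why the files cannot be opened, it may help to check what was actually written to the folder. The sketch below uses only the Python standard library and assumes nothing about the format SDGym uses, which this report does not state. The helper name describe_files and its preview_bytes parameter are hypothetical, not part of SDGym. It lists each file with its size and a short preview of its first bytes, shown as text when they decode as UTF-8 and as hex otherwise.

```python
from pathlib import Path


def describe_files(folder, preview_bytes=64):
    """Return (relative path, size, kind, preview) for every file under folder.

    Hypothetical helper, not part of SDGym. The 'text'/'binary' label is a
    heuristic: it only checks whether the first bytes decode as UTF-8.
    """
    root = Path(folder)
    if not root.is_dir():
        return []

    rows = []
    for path in sorted(root.rglob('*')):
        if not path.is_file():
            continue
        size = path.stat().st_size
        with open(path, 'rb') as handle:
            head = handle.read(preview_bytes)
        if size == 0:
            kind, preview = 'empty', ''
        else:
            try:
                kind, preview = 'text', head.decode('utf-8')
            except UnicodeDecodeError:
                kind, preview = 'binary', head.hex()
        rows.append((str(path.relative_to(root)), size, kind, preview))
    return rows


if __name__ == '__main__':
    # Folder path taken from the reproduction script above.
    for name, size, kind, preview in describe_files(r"C:\Users\18840\Desktop\result"):
        print(f'{name}\t{size} bytes\t{kind}\t{preview!r}')
```

Files reported as text can usually be opened in an editor regardless of their extension. Files reported as binary need the loader that matches whatever wrote them, which would have to be confirmed in the SDGym source.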

T0217 added the bug and new labels on Jul 3, 2024