From 5fc3485bb7ad0453b6300fcee08242a29099e6ed Mon Sep 17 00:00:00 2001
From: Craig
Date: Thu, 5 Dec 2024 15:55:57 -0800
Subject: [PATCH 1/5] README: added preliminary section on schema processing

---
 README.md | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 109 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0e5e0d58..0003692e 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ Each part of Data Driven Testing is designed to handle a specific ICU version.
 
 ## Architectural Overview
 
-Conceptually, there are three main functional units of the DDT implementation:
+Conceptually, there are four main functional units of the DDT implementation:
 
 ![Conceptual model of Data Driven Testing](./ddt_concept_model.png)
 
@@ -93,6 +93,114 @@ parameters to be set for computing a result.
 },
 
+## Checking data using schemas
+
+Several types of JSON formatted data are created and used by Conformance
+processing, and the integrity of this information must be maintained.
+
+Data Driven Testing uses [JSON Schema
+Validation](https://python-jsonschema.readthedocs.io/en/latest/validate/) to
+ensure the structure of data files. JSON schemas make sure that needed
+parameters and other information are present as required and that the type of
+each data item is as specified.
+
+In addition, a schema specification can restrict the range of data fields to
+those expected, allowing only expected values in JSON output files. This
+gives a measure of confidence in the data exchanged between the phases of
+Conformance Testing.
+
+The types of data include:
+
+* **Generated test data** including all parameters and settings as well as
+  ancillary descriptive information for each test. This data depends only on
+  the type of test (component) and the ICU version. It does not depend on the
+  particular execution platform, i.e., programming language.
+
+* **Expected results** of running each test with the specified ICU
+  version.
This does not depend on the platform.
+
+* **Actual results** from executing each test on a given platform. This data may
+  include output from the executor including platform-specific parameters and
+  settings derived from the input data. It may also include data on errors
+  encountered during the run of the platform executor.
+
+* **Schema files** describing the expected JSON format of each type of file for
+  the components.
+
+Schema validation is performed at these times in standard processing:
+
+1. After test data generation, all generated test data and expected result data
+   files are checked for correct structure.
+
+2. Before test execution, the schema files themselves are checked for correct
+   schema structure..
+
+3. After test executors are run, all resuulting test output files are checked
+   for correct structure.
+
+
+Top level directory `schema` contains the following:
+
+* One subdirectory for each component such as "collation". This contains schema.json files for generated tests, expected results, and test output structure.
+
+* Python routines for checking these types of data.
+
+* A Python routine for validating the structure of the .json schema files.
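
Conceptually, each of these checking routines applies a component's schema to a
data file through the python-jsonschema API linked above. The following is a
minimal sketch of that pattern, not the actual script code; the function and
file names here are illustrative:

```python
import json
from jsonschema import validate, ValidationError  # python-jsonschema package

def check_file(data_path: str, schema_path: str) -> bool:
    """Return True if the JSON data file conforms to the JSON schema file."""
    with open(schema_path) as f:
        schema = json.load(f)
    with open(data_path) as f:
        data = json.load(f)
    try:
        validate(instance=data, schema=schema)
        return True
    except ValidationError as err:
        # A failed check reports which required field or type constraint broke.
        print(f"{data_path}: {err.message}")
        return False
```

The real scripts additionally collect the per-file results into the summary
file described below.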
+ +``` +$ ls schema/*.py +schema/check_generated_data.py schema/check_test_output.py schema/__init__.py schema/schema_validator.py +schema/check_schemas.py schema/check_verify_data.py schema/schema_files.py + +$ tree schema/* + +schema/collation_short +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/datetime_fmt +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/lang_names +├── #result_schema.json# +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/likely_subtags +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/list_fmt +├── #result_schema.json# +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/message_fmt2 +├── README.md +├── result_schema.json +├── testgen_schema.json +├── test_schema.json +└── verify_schema.json +schema/number_format +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/plural_rules +├── result_schema.json +├── test_schema.json +└── verify_schema.json +schema/rdt_fmt +├── result_schema.json +├── test_schema.json +└── verify_schema.json +``` + +Note also that the schema validation creates a file +**schema/schema_validation_summary.json** +which is used in the summary presentation of the Conformance results. + ## Text Execution Test execution consists of a Test Driver script and implementation-specific From b5cfe37b5de6c42247979c559e533545eec5c104 Mon Sep 17 00:00:00 2001 From: Craig Date: Thu, 5 Dec 2024 18:20:52 -0800 Subject: [PATCH 2/5] Added text for adding ICU versions and new components --- README.md | 190 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 185 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 0003692e..f8595b35 100644 --- a/README.md +++ b/README.md @@ -228,12 +228,18 @@ results. A report of the test results is generated. 
Several kinds of status
values are possible for each test item:
 
 * **Success**: the actual result agrees with expected results
-* **Failure**: a result is generated, but the result is not the same as the expected
-value.
+
+* **Failure**: a result is generated, but the result is not the same as the
+expected value.
+
+* **Error**: the test resulted in an exception or other behavior not anticipated
+for the test case
+
+* **Known issue**: The test failure or error is known for the version of the
+  platform and ICU. Note that each type of known issue should reference a
+  publicly documented issue in ICU or CLDR.
+
-* **No test run**: The test was not executed by the test implementation for the data
-item
-* **Error**: the test resulted in an exception or other behavior not anticipated for
-the test case
+* **No test run**: The test was not executed by the test implementation for the
+  data item
 
 ### Open questions for the verifier
 * What should be done if the test driver fails to complete? How can this be
@@ -243,6 +249,180 @@ the test case
 indicating that the test driver finished its execution normally, i.e., did not
 crash.
 
+
+# How to update Conformance Test: ICU versions, platforms, components
+
+Data Driven Testing is meant to stay current with ICU programs and data. It is also designed to support new testing platforms such as ICU4X, Dart, etc. And new types of testing, i.e., "components", may be added to Conformance testing.
+
+This section describes the process for keeping DDT up to date with needed test types and required programming platforms
+
+## Incorporating new ICU / CLDR versions into DDT
+
+ICU releases are usually made twice each calendar year, incorporating new data,
+fixes, and new test files. ICU versions may also add new types of data
+processing. A recent example is Message Format 2.
+
+Because Data Driven Testing operations with multiple ICU and CLDR versions, this system should be updated with each new ICU release.
Here are several pull requests for recent ICU updates:
+
+* [ICU 76 for C++](https://github.com/unicode-org/conformance/pull/325/)
+
+* [ICU76 for NodeJS](https://github.com/unicode-org/conformance/pull/348)
+
+### ICU4C updates
+
+These are usually the first changes to be made because ICU4C includes both code and test data updates for many components.
+
+1. Test Driver:
+* Add new ICU version data in several places in testdriver/datasets.py
+
+2. testgen:
+* Add a new directory for the icu version under testgen, e.g., icu76
+
+* In this directory, copy test data from sources including icu4c/source. These files include collation tests, number format data, and others.
+
+!!! Add details on the sources.
+
+* Add new CLDR test data generated from CLDR sources (!!! details !!!)
+
+3. schema: Add any new parameters in test data sources to test schema files.
+
+4. Add a function in setup.sh to download the new ICU4C release.
+
+5. Update run_config.json to reference new versions of executors and tests to run
+
+### NodeJS and some data updates
+
+NodeJS is usually updated several weeks after an ICU public release. Check on
+the site [Node.js Releases](https://nodejs.org/en/about/previous-releases) for
+the latest versions of NodeJS. Under each entry, the "changelog" will indicate
+any updates to icu, e.g., [Version 23.3.0 on 2024-11-20](https://github.com/nodejs/node/blob/main/doc/changelogs/CHANGELOG_V23.md#23.3.0), which includes ICU76.1.
+
+#### Add references in testdriver/datasets.py
+
+In this file, add new Enum values to variables:
+* NodeVersion
+
+* IcuVersionToExecutorMap
+
+* NodeICUVersionMap
+
+#### Update run_config.json
+Add the new NodeJS version to the run configurations. This includes the command to install and use the latest NodeJS version. Here's the new entry for ICU76.1 in NodeJS 23.3.0.
+
+Be sure to add the new version number in both the `nvm install` and `nvm use` parts of `command`.
+ +Also, include all the tests to be run with this version of NodeJS. + +```` + { + "prereq": { + "name": "nvm 23.3.0, icu76.1", + "version": "23.3.0", + "command": "nvm install 23.3.0;nvm use 23.3.0 --silent" + }, + "run": { + "icu_version": "icu76", + "exec": "node", + "test_type": [ + "collation_short", + "datetime_fmt", + "list_fmt", + "number_fmt", + "lang_names", + "likely_subtags", + "rdt_fmt", + "plural_rules" + ], + "per_execution": 10000 + } + }, +```` + +### Update ICU4J /Java to new ICU version + +** TBD ** +This requires referencing the new ICU4J versions in Maven Central (!!! REFERENCE NEEDED !!!) + +#### run_config.json additions for Java + +Updates to this file are straightforward. + +### Update ICU4X / Rust to new ICU version + +** TBD ** + +ICU4X is actively updating APIs in each new version. ICU4X releases are not closely coordinated with ICU versions. + +Adding a new ICU4X version after 1.4 may require significant changes to existing +#### run_config.json additions for ICU4X + +Updates to this file are straightforward. + +### Update Dart with new ICU versions + +** TBD ** + + +#### Test generator updates +Note that two types of test data are currently generated by NodeJS functions: +* list format +* relative date time format + +Because of this, ICU version updated tests for these two components cannot be run before adding a version of NodeJS that includes the new ICU version. + +When the new NodeJS is incorporated into DDT, add the new NodeJS reference to the list `icu_nvm_versions` in these files: + +1. testgen/generators/list_fmt.py +2. testgen/generators/relativedatetime_fmt.py + + +## Adding new test types / components + + +Tis pull request [PR#183](https://github.com/unicode-org/conformance/pull/183/files) added datetime, list format, and relative date time format to test generation, executors and test driver, schema, verifier, and run configuration. 
+ +Also, see [ICU4J and relative date time format PR#262](https://github.com/unicode-org/conformance/pull/262/files) showing the details of adding a component to the ICU4J platform. + +Note also that the above PR added an [executor file for the Rust / ICU4X](https://github.com/unicode-org/conformance/pull/262/files#diff-f2bce2a303cd07f48c087c798a457ff78eeefbde853adb6a8c331f35b1b5571d) version or relative date time format. + +These are the main updatessteps for adding a new type of testing: + +1. Add methods to add the test data in testgen/icu* and testgen/generators. tests should be installed in icuXX directories as needed. + +* Create python modules in testgen/generators/ to read raw test data, then create .json file with tests and expected resuls. + +* Update testgen/tesdata_gen.py with: +** Import new test generator modules +** Add new Enum values +** Add code to execute the new generator modules + +2. Define new test types in testdriver files: +* datasets.py +* ddtargs.py +* testdriver.py +* testplan.py + +3. Executors: For each executor to run the new tests: +* Add a new code file to run the tests in the executor directory, e.g., `executors/cpp` + +* Update makefile and configuration information to include the new testing code + +* Include calling the new test routines in the main program, e.g,. `main.cpp` + +Hint: Run the executor as a standalone test version, giving sample tests on the command line or in structured test code (i.e., ICU4J's framework.) + +Once the executor is working with the new test type, it can be incorporated into the full execution pipline. + +4. Update run_config.json to reference the new test_type in each executor that supports the component. + +For reference, [PR#183](https://github.com/unicode-org/conformance/pull/183/files)included datetime, list format, and relative date time format. 
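
The testdriver registration in step 2 is mostly a matter of adding enum members
that name the new component. A schematic sketch of the idea (the class and
member names here are hypothetical, not the actual datasets.py definitions):

```python
from enum import Enum

class TestType(Enum):
    # Existing components are registered one member each; names illustrative.
    COLLATION_SHORT = "collation_short"
    NUMBER_FMT = "number_fmt"
    # Adding a component means adding one more member here (and wiring it
    # into ddtargs.py, testdriver.py, and testplan.py as described above):
    MESSAGE_FMT2 = "message_fmt2"

# The driver can then enumerate every registered component:
all_test_types = [t.value for t in TestType]
```

With this pattern, forgetting one of the files in step 2 shows up quickly as a
missing member or an unhandled enum case.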
+ +## Adding new test platforms, e.g., types of libraries + +** TDB ** + + # How to use DDT In its first implementation, Data Driven Test uses data files formatted with From 3f66bd2cc04a6d7740fc952f23955e55b4e05737 Mon Sep 17 00:00:00 2001 From: Craig Date: Thu, 5 Dec 2024 18:32:25 -0800 Subject: [PATCH 3/5] Adding preliminary information about including new platorms --- README.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/README.md b/README.md index f8595b35..8cf65006 100644 --- a/README.md +++ b/README.md @@ -420,6 +420,42 @@ For reference, [PR#183](https://github.com/unicode-org/conformance/pull/183/file ## Adding new test platforms, e.g., types of libraries +See [Add Dart to executors PR#65](https://github.com/unicode-org/conformance/pull/65) for am example. + +See also the +[Rust executor for ICU4x 1.3 in PR#108](https://github.com/unicode-org/conformance/pull/108) + +Adding a new platform involves several changes to the DDT system: +* Change the workflow to reference the new platform + +* Create a new directory structure under executors/. Add .gitignore as needed. + +* Add configuration specific to the platform in the new directory under executors/ + +* Set up a main program that will receive instructions on the STDIN command line + +** Parse the incoming JSON data to determine test type + +** Build separate files for running each type of test + +** Return results from each testing routine in JSON format + +** Support the commands for information: +*** #VERSION +*** #TEST +*** etc. + +* Update testdriver/datasets.py to include the new executor platform. + + +Note: it is very helpful to include sets of tests for the new platform for each supported component. The ICU4J model with Intellij is a good example. + +Make sure that your new executor can be run from a debugging environment or from the command line. This should be done before adding it to the test drive. 
+ +* Add information to run_config.json to add the new platform and its supported components into the DDT workflow. + + + ** TDB ** From c75ba4312ff0b9fcd3f8c57643371531fc3d38e9 Mon Sep 17 00:00:00 2001 From: Craig Date: Thu, 5 Dec 2024 18:32:25 -0800 Subject: [PATCH 4/5] Adding preliminary information about including new platorms --- README.md | 187 ++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 145 insertions(+), 42 deletions(-) diff --git a/README.md b/README.md index f8595b35..60f4493f 100644 --- a/README.md +++ b/README.md @@ -134,15 +134,15 @@ Schema validation is performed at these times in standard processing: files are checked for correct structure. 2. Before test execution, the schema files themselves are checked for correct - schema structure.. + schema structure. -3. After test executors are run, all resuulting test output files are checked +3. After test executors are run, all resulting test output files are checked for correct structure. Top level directory `schema` contains the following: -* One subdirectory for each component such as "collation". This contains schema.json files for generated tests, expected results, and test output structure. +* One subdirectory for each component such as "collation". This contains schema .json files for generated tests, expected results, and test output structure. * Python routines for checking these types of data. @@ -197,9 +197,9 @@ schema/rdt_fmt └── verify_schema.json ``` -Note also that the schema validation creates a file +Note also that schema validation creates a file **schema/schema_validation_summary.json** -which is used in the summary presentation of the Conformance results. +which is used in the summary presentation of Conformance results. ## Text Execution @@ -223,9 +223,10 @@ See [executors/README](./executors/README.md) for more details ## Verification -Each test is matched with the corresponding data from the required test -results. A report of the test results is generated. 
Several kinds of status
-values are possible for each test item:
+In the final phase of Conformance Testing, each individual test is matched with
+the corresponding data from the required test results. A report of the test
+results is generated. Several kinds of status values are possible for each test
+item:
 
 * **Success**: the actual result agrees with expected results
 
 * **Failure**: a result is generated, but the result is not the same as the
 expected value.
 
 * **Error**: the test resulted in an exception or other behavior not anticipated
 for the test case
 
 * **Known issue**: The test failure or error is known for the version of the
   platform and ICU. Note that each type of known issue should reference a
   publicly documented issue in ICU or CLDR.
 
+* **Unsupported**: Some aspect of the requested test is not yet supported by the
+  platform and ICU version.
+
 * **No test run**: The test was not executed by the test implementation for the
   data item
 
 ### Open questions for the verifier
@@ -250,11 +254,15 @@ indicating that the test driver finished its execution normally, i.e., did not
 crash.
 
 
-# How to update Conformance Test: ICU versions, platforms, components
+# How to update Conformance Testing: ICU versions, platforms, components
 
-Data Driven Testing is meant to stay current with ICU programs and data. It is also designed to support new testing platforms such as ICU4X, Dart, etc. And new types of testing, i.e., "components", may be added to Conformance testing.
+Data Driven Testing is expected to remain current with ICU programs and data
+updates. It is also designed to support new testing platforms in addition to the
+current set of Dart, ICU4C, ICU4J, ICU4X, and NodeJS. And new types of tests,
+i.e., "components", may be added to Conformance testing.
 
-This section describes the process for keeping DDT up to date with needed test types and required programming platforms
+This section describes the process for keeping DDT up to date with needed test
+types and required programming platforms.
 
 ## Incorporating new ICU / CLDR versions into DDT
 
 ICU releases are usually made twice each calendar year, incorporating new data,
 fixes, and new test files.
ICU versions may also add new types of data
 processing. A recent example is Message Format 2.
 
-Because Data Driven Testing operations with multiple ICU and CLDR versions, this system should be updated with each new ICU release. Here are several pull requests for recent ICU updates:
+Because Data Driven Testing operates with multiple ICU and CLDR versions, this
+system should be updated with each new ICU release. Here are several pull
+requests for recent ICU updates:
 
 * [ICU 76 for C++](https://github.com/unicode-org/conformance/pull/325/)
 
+* [ICU 76 for Java](https://github.com/unicode-org/conformance/pull/344)
+
 * [ICU76 for NodeJS](https://github.com/unicode-org/conformance/pull/348)
 
 ### ICU4C updates
@@ -373,70 +385,161 @@ Because of this, ICU version updated tests for these two components cannot be ru
 When the new NodeJS is incorporated into DDT, add the new NodeJS reference to
 the list `icu_nvm_versions` in these files:
 
-1. testgen/generators/list_fmt.py
-2. testgen/generators/relativedatetime_fmt.py
+* testgen/generators/list_fmt.py
+* testgen/generators/relativedatetime_fmt.py
 
 
-## Adding new test types / components
+## Adding New Test Types / Components
 
+ICU supports a wide range of formatting and other functions. Many are candidates
+for Conformance Testing. Although each has specific needs for testing, this
+section presents an overview of adding new test types.
 
-Tis pull request [PR#183](https://github.com/unicode-org/conformance/pull/183/files) added datetime, list format, and relative date time format to test generation, executors and test driver, schema, verifier, and run configuration.
+As an example, pull request
+[PR#183](https://github.com/unicode-org/conformance/pull/183/files) added
+datetime, list format, and relative date time format to test generation, test
+executors, test driver, schema, verifier, and runtime configuration.
-Also, see [ICU4J and relative date time format PR#262](https://github.com/unicode-org/conformance/pull/262/files) showing the details of adding a component to the ICU4J platform.
+Also, see [ICU4J and relative date time format
+PR#262](https://github.com/unicode-org/conformance/pull/262/files) for
+details of adding a component to the ICU4J platform.
 
 Note also that the above PR added an [executor file for the Rust / ICU4X](https://github.com/unicode-org/conformance/pull/262/files#diff-f2bce2a303cd07f48c087c798a457ff78eeefbde853adb6a8c331f35b1b5571d) version of relative date time format.
 
-These are the main updatessteps for adding a new type of testing:
+These are the main parts needed to add a component:
 
-1. Add methods to add the test data in testgen/icu* and testgen/generators. tests should be installed in icuXX directories as needed.
+1. Add methods to create the test data in testgen/icu* and
+   testgen/generators. Resulting .json files with test and verification data
+   should be installed in testgen/icuXX directories as needed.
 
-* Create python modules in testgen/generators/ to read raw test data, then create .json file with tests and expected resuls.
+* Create Python modules in testgen/generators/ to read raw test data, then
+  create .json files with tests and expected results.
 
-* Update testgen/tesdata_gen.py with:
-** Import new test generator modules
-** Add new Enum values
-** Add code to execute the new generator modules
+* Update testgen/testdata_gen.py to:
+  * Import the new test generator modules
+  * Add new Enum values
+  * Add code to execute the new generator modules
 
-2. Define new test types in testdriver files:
+2. Define new test types in these testdriver files:
 * datasets.py
 * ddtargs.py
 * testdriver.py
 * testplan.py
 
3.
Executors: For each executor to run the new tests:
-* Add a new code file to run the tests in the executor directory, e.g., `executors/cpp`
+* Add a new code file to run the tests in the executor directory, e.g.,
+  `executors/cpp`
 
-* Update makefile and configuration information to include the new testing code
+* Update configuration information such as makefiles to include the new testing
+  code
 
 * Include calling the new test routines in the main program, e.g., `main.cpp`
 
-Hint: Run the executor as a standalone test version, giving sample tests on the command line or in structured test code (i.e., ICU4J's framework.)
+Hint: Run the executor as a standalone test version, giving sample tests on the
+command line or in structured test code (i.e., ICU4J's framework).
 
-Once the executor is working with the new test type, it can be incorporated into the full execution pipline.
+Once the executor is working with the new test type, incorporate it into
+the full execution pipeline.
 
-4. Update run_config.json to reference the new test_type in each executor that supports the component.
+4. Update run_config.json with the new test_type in each executor supporting
+   the component.
 
-For reference, [PR#183](https://github.com/unicode-org/conformance/pull/183/files)included datetime, list format, and relative date time format.
+For reference,
+[PR#183](https://github.com/unicode-org/conformance/pull/183/files) added these
+components for datetime, list format, and relative date time format.
 
 ## Adding new test platforms, e.g., types of libraries
 
+As additional test platforms and libraries support all or part of the ICU / CLDR
+functions, including them in Conformance Testing will show the degree of
+compatibility of actual execution.
+
 See [Add Dart to executors PR#65](https://github.com/unicode-org/conformance/pull/65) for an example.
See also the
 [Rust executor for ICU4x 1.3 in PR#108](https://github.com/unicode-org/conformance/pull/108)
 
 Adding a new platform involves several changes to the DDT system:
 * Change the workflow to reference the new platform
 
 * Create a new directory structure under executors/. Add .gitignore as needed.
 
 * Add configuration specific to the platform in the new directory under executors/
 
 * Set up a main program that will receive instructions on the STDIN command line
 
   * Parse the incoming JSON data to determine test type
 
   * Build separate files for running each type of test
 
   * Return results from each testing routine in JSON format
 
   * Support the commands for information:
     * #VERSION
     * #TEST
     * etc.
 
 * Update testdriver/datasets.py to include the new executor platform.
 
 
 Note: it is very helpful to include sets of tests for the new platform for each supported component. The ICU4J model with IntelliJ is a good example.
 
 Make sure that your new executor can be run from a debugging environment or from the command line. This should be done before adding it to the test driver.
 
 * Add information to run_config.json to add the new platform and its supported components into the DDT workflow.
 
 ** TBD **
 
 # How to use DDT
 
-In its first implementation, Data Driven Test uses data files formatted with
-JSON structures describing tests and parameters. The data directory string is
-set up as follows:
+In its current implementation, Data Driven Test uses JSON formatted data files
+describing tests and parameters. The data directory created contains the following:
+
+## Directory **testData**:
-## A directory `testData` containing
- * Test data files for each type of test, e.g., collation, numberformat,
-   displaynames, etc. Each file contains tests with a label, input, and
-   parameters.
- * Verify files for each test type. Each contains a list of test labels and
-   expected results from the corresponding tests.
+Test generation creates the test and verify data files for each version of ICU in .json format:
+
+ * A test data file for each type of test. Each contains a list of labeled
+   tests with parameters, options, and input values for computing output
+   strings.
+
+ * Verify files for each test type. Each contains expected results for each
+   test case.
+
+For example, here is the structure for directory **toplevel**:
+
+```
+toplevel/testData/
+├── icu67
+│   └── ...
+├── icu68
+│   └── ...
...
+├── icu76
+│   ├── collation_test.json
+│   ├── collation_verify.json
+│   ├── datetime_fmt_test.json
+│   ├── datetime_fmt_verify.json
+│   ├── lang_name_test_file.json
+│   ├── lang_name_verify_file.json
+│   ├── likely_subtags_test.json
+│   ├── likely_subtags_verify.json
+│   ├── list_fmt_test.json
+│   ├── list_fmt_verify.json
+│   ├── message_fmt2_test.json
+│   ├── message_fmt2_verify.json
+│   ├── numberformattestspecification.txt
+│   ├── numberpermutationtest.txt
+│   ├── num_fmt_test_file.json
+│   ├── num_fmt_verify_file.json
+│   ├── plural_rules_test.json
+│   ├── plural_rules_verify.json
+│   ├── rdt_fmt_test.json
+│   └── rdt_fmt_verify.json
+
+```
 
-## Directory `testOutput`
+## Directory **testOutput**
 
 This contains a subdirectory for each executor. The output file from each test
 is stored in the appropriate subdirectory. Each test result contains the label
@@ -515,7 +618,7 @@ The `verifier_test_report.json` file contains information on tests run and compa
 differences such as missing or extra characters or substitutions found in output
 data.
-## Contributor setup +## Running Data Driven Test Requirements to run Data Driven Testing code locally: From 125724d8a425673f70ac9d4679c03c784b7a96ee Mon Sep 17 00:00:00 2001 From: Craig Date: Fri, 6 Dec 2024 15:22:29 -0800 Subject: [PATCH 5/5] Minor fix --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index d0d48b85..2d2372c5 100644 --- a/README.md +++ b/README.md @@ -622,7 +622,7 @@ as the result from each test. As an example, collation test results from the ```` And the overall structure: -**toplevel**/testOutput/ +toplevel/testOutput/ ├── cpp │   ├── icu71 │   ├── icu72 @@ -655,7 +655,7 @@ And the overall structure: And showing details for the icu76 output from ICU4J: ```` -**toplevel**/testOutput/icu4j/icu76 +toplevel/testOutput/icu4j/icu76 ├── collation_test.json ├── datetime_fmt_test.json ├── lang_name_test_file.json @@ -706,7 +706,7 @@ Example for details of ICU4C, version 76 of root directory **toplevel**: ```` -**toplevel**/testReports/cpp/icu76/number_fmt/ +toplevel/testReports/cpp/icu76/number_fmt/ ├── error_characterized.json ├── fail_characterized.json ├── failing_tests.json