-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exposing Helpful Anomaly Detection Metadata from Anomaly Strategies (ie Anomaly Check Range/Thresholds) through backwards compatible function #593
base: master
Are you sure you want to change the base?
Conversation
* Add Spark 3.5 support * Replace with DataTypeUtils.fromAttributes * Remove unintended new line
…mpleteness (awslabs#532) * Modified Completeness analyzer to label filtered rows as null for row-level results * Modified GroupingAnalyzers and Uniqueness analyzer to label filtered rows as null for row-level results * Adjustments for modifying the calculate method to take in a filterCondition * Add RowLevelFilterTreatement trait and object to determine how filtered rows will be labeled (default True) * Modify VerificationRunBuilder to have RowLevelFilterTreatment as variable instead of extending, create RowLevelAnalyzer trait * Do row-level filtering in AnalyzerOptions rather than with RowLevelFilterTreatment trait * Modify computeStateFrom to take in optional filterCondition
…elOperations is not available (awslabs#536)
…um (awslabs#535) * Address comments on PR awslabs#532 * Add filtered row-level result support for Minimum, Maximum, Compliance, PatternMatch, MinLength, MaxLength analyzers * Refactored criterion for MinLength and MaxLength analyzers to separate rowLevelResults logic
…wslabs#537) * Add analyzerOption to add filteredRowOutcome for isPrimaryKey Check * Add analyzerOption to add filteredRowOutcome for hasUniqueValueRatio Check
…vior.EmptyString option, the where filter wasn't properly applied (awslabs#538)
…slabs#543) * [Min/Max] Apply filtered row behavior at the row level evaluation - This changes from applying the behavior at the analyzer level. It allows us to prevent the usage of MinValue/MaxValue as placeholder values for filtered rows. * Improved the separation of null rows, based on their source - Whether the outcome for a row is null because of being filtered out or due to the target column being null, is now stored in the outcome column itself. - We could have reused the placeholder value to find out if a row was originally filtered out, but that would not work if the actual value in the row was the same originally. * Mark filtered rows as true We recently fixed the outcome of filtered rows and made them default to true instead of false, which was a bug earlier. This change maintains that behavior. * Added null behavior - empty string to match block Not having it can cause match error.
…aluation (awslabs#547) * [MinLength/MaxLength] Apply filtered row behavior at the row level evaluation - For certain scenarios, the filtered row behavior for MinLength and MaxLength was not working correctly. - For example, when using both minLength and maxLength constraints in a single check, and with both using == <value> as an assertion. This was resulting in the row level outcome of the filtered rows to be false. This was because we were replacing values for filtered rows for Min to MaxValue and for Max to MinValue. But a number could not equal both at the same time. - Updated the logic of the row level assertion to MinLength/MaxLength to match what was done for Min/Max.
- The satisfies constraint was incorrectly using the provided assertion to evaluate the row level outcomes. The assertion should only be used to evaluate the final outcome. - As part of this change, we have updated the row level results to return a true/false. The cast to an integer happens as part of the aggregation result. - Added a test to verify the row level results using checks made up of different assertions.
* Added RatioOfSums analyzer and tests * Unit test for divide by zero and code cleanup. * More detailed Scaladoc * Fixed docs to include Double.NegativeInfinity * Add copyright to new file
* Fix flaky KLL test * Move CustomSql state to CustomSql analyzer * Implement new Analyzer to count columns * Improve documentation, remove unused parameter, replace if/else with map --------- Co-authored-by: Yannis Mentekidis <[email protected]>
* Configurable RetainCompletenessRule * Add doc string * Add default completeness const
…awslabs#569) Co-authored-by: Tyler Mcdaniel <[email protected]>
* Configurable RetainCompletenessRule * Add doc string * Add default completeness const * Add ConfidenceIntervalStrategy * Add Separate Wilson and Wald Interval Test * Add License information, Fix formatting * Add License information * formatting fix * Update documentation * Make WaldInterval the default strategy for now * Formatting import to per line * Separate group import to per line import
* Add support for EntityTypes dqdl rule * Add support for Conditional Aggregation Analyzer --------- Co-authored-by: Joshua Zexter <[email protected]>
* Generate row-level results with withColumns Iteratively using withColumn (singular) causes performance issues when iterating over a large sequence of columns. * Add back UNIQUENESS_ID
… for the bound value and isThresholdInclusive, also adding anomaly detection with extended results README, and adding anomaly detection test with 2 anomaly checks on the same suite
@arsenalgunnershubert777 Thanks for the PR! Please allow for additional time to review the PR given the amount of changes. In the meantime, would it be possible to rebase your branch off of latest |
Hey @rdsharma26 thanks for reaching out. I think I have some issues with the conflicts (and I also can improve my understanding of Git) |
Thanks @arsenalgunnershubert777
|
@rdsharma26 got it yea when I run that it has more conflicts, and I notice some classes have reverted to previous states |
Hey @rdsharma26 hope you are well, just checking in if you or anyone else has any comments so far. After your review/comments, I will also push another branch to have less commits and create another PR. |
Description of changes:
This PR adds functionality to expose anomaly detection metadata for anomaly checks.
There is a legacy PR here, and this PR addresses feedback to make the anomaly detection metadata functionality backwards compatible.
Let me know if anyone has any questions/feedback, thanks!
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.