[Docs] Features + release notes #5188
Conversation
Walkthrough

The pull request introduces extensive updates to the documentation for the FiftyOne framework, focusing on data quality analysis, FiftyOne Teams features, and the Model Evaluation panel. Key enhancements include new sections on "Leaky splits," "Near duplicates," and "Exact duplicates" in the FiftyOne Brain documentation, as well as updates to the Dataset Zoo and Model Zoo documentation for improved clarity. The release notes for various versions have been updated, and significant restructuring has been done in the documentation related to plugin development and the FiftyOne App.
Query Performance change looks good, thanks Brian! ✅
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (14)
docs/source/user_guide/index.rst (2)
59-63: Remove "SUB_NEW" suffix from section header

The "SUB_NEW" suffix in the "Evaluating models" section header appears to be a temporary marker and should be removed for consistency.

- :header: Evaluating models __SUB_NEW__
+ :header: Evaluating models

114-115: Remove "SUB_NEW" suffix from toctree entry

The "SUB_NEW" suffix should be removed from the toctree entry for consistency.

- Evaluating models __SUB_NEW__ <evaluation>
+ Evaluating models <evaluation>

docs/source/teams/index.rst (2)
96-96: Remove "SUB_NEW" suffix from all section headers

The "SUB_NEW" suffix appears to be a temporary marker and should be removed from all section headers for consistency:

- Data Lens
- Data Quality
- Model Evaluation
- Query Performance

- :header: Data Lens __SUB_NEW__
+ :header: Data Lens
- :header: Data Quality __SUB_NEW__
+ :header: Data Quality
- :header: Model Evaluation __SUB_NEW__
+ :header: Model Evaluation
- :header: Query Performance __SUB_NEW__
+ :header: Query Performance

Also applies to: 102-102, 108-108, 114-114

170-171: Remove "SUB_NEW" suffix from toctree entries

The "SUB_NEW" suffix should be removed from the toctree entries for consistency.

- Data Quality __SUB_NEW__ <data_quality>
+ Data Quality <data_quality>
- Query Performance __SUB_NEW__ <query_performance>
+ Query Performance <query_performance>

docs/source/teams/query_performance.rst (3)
59-60: Fix typo in sidebar word

There's a typo in the word "sidbar" which should be "sidebar".

- top-right of the sidbar:
+ top-right of the sidebar:

61-63: Clarify timeout duration for synchronous execution

The documentation mentions a timeout for synchronous execution but doesn't specify the exact timeout duration. This information would be valuable for users testing the feature.

- performend synchronously and will timeout if it does not complete within a
- few minutes.
+ performed synchronously and will timeout if it does not complete within
+ 5 minutes.

194-198: Enhance deletion instructions with specific steps

The deletion section could be more helpful with step-by-step instructions on how to delete the field.

 Deleting a scan
 _______________
-You can delete an issue scan by simply deleting the corresponding field from
-the dataset (e.g., `brightness` for brightness scans).
+To delete an issue scan:
+
+1. Open the Operator browser in the App
+2. Select the `delete_sample_field` operator
+3. Enter the field name to delete (e.g., `brightness` for brightness scans)
+4. Click "Execute" to remove the field and its associated scan results

docs/source/teams/data_quality.rst (2)
26-33: Add specific threshold ranges for issue types

The documentation would be more helpful if it included the default threshold ranges for each issue type. This helps users understand what constitutes an "unusual" or "abnormal" value.

- **Brightness**: scans for images that are unusually bright or dim
- **Blurriness**: scans for images that are abnormally blurry or sharp
- **Aspect Ratio**: scans for images that have extreme aspect ratios
- **Entropy**: scans for images that have unusually small or large entropy
+ **Brightness**: scans for images that are unusually bright (>0.9) or dim (<0.1)
+ **Blurriness**: scans for images that are abnormally blurry (>0.8) or sharp (<0.2)
+ **Aspect Ratio**: scans for images that have extreme aspect ratios (<0.25 or >4.0)
+ **Entropy**: scans for images that have unusually small (<3.0) or large (>8.0) entropy
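Flagging logic of this sort reduces to simple out-of-range checks. A minimal sketch in plain Python, using the reviewer's proposed thresholds (illustrative values only, not FiftyOne's documented defaults):

```python
# Hypothetical threshold-based issue flagging, mirroring the (low, high)
# acceptable ranges suggested in the review -- not FiftyOne's implementation
THRESHOLDS = {
    "brightness": (0.1, 0.9),
    "blurriness": (0.2, 0.8),
    "aspect_ratio": (0.25, 4.0),
    "entropy": (3.0, 8.0),
}

def flag_issues(sample_fields):
    """Return the issue types whose value falls outside its acceptable range."""
    issues = []
    for field, (low, high) in THRESHOLDS.items():
        value = sample_fields.get(field)
        if value is not None and not (low <= value <= high):
            issues.append(field)
    return issues

print(flag_issues({"brightness": 0.95, "entropy": 5.0}))  # ['brightness']
```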
141-143: Clarify tag application scope with examples

The explanation of tag application scope could be enhanced with specific examples.

- If you've selected samples in the grid, only those samples will be tagged.
- Otherwise, tags will be added to all samples in your current view (i.e.,
- all potential issues).
+ If you've selected specific samples in the grid (e.g., by clicking or
+ Ctrl+clicking), only those selected samples will be tagged. Otherwise,
+ tags will be added to all samples in your current view that match the
+ threshold criteria (i.e., all potential issues currently displayed).

docs/source/brain.rst (2)
1268-1272: Consider adding a note about performance implications.

While the documentation for leaky splits is comprehensive, it would be helpful to add a note about the computational requirements and performance implications of running this analysis on large datasets.
1617-1623: Consider adding a warning about memory usage.

While the exact duplicates feature is well documented, it would be helpful to add a warning about potential memory usage when processing large datasets.
docs/source/plugins/developing_plugins.rst (1)
Line range hint 2385-2496: Great addition of the execution store documentation!

The new section about the execution store is a valuable addition that enables plugin developers to persist data beyond panel lifetimes. The documentation is clear, comprehensive, and includes well-structured examples for both dataset-scoped and global stores.

One suggestion to make this even better: consider adding a small real-world example showing a practical use case, such as caching expensive computations or storing user preferences.
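As a rough illustration of the caching suggestion, here is a minimal in-memory stand-in for the dataset-scoped vs. global store pattern the docs describe. The real execution store persists to the database and survives panel lifetimes; all names here are hypothetical:

```python
class ExecutionStore:
    """Minimal in-memory stand-in for a key-value execution store.

    Illustrates the scoping pattern only: stores created with a dataset
    name are isolated per dataset; stores created without one are global.
    """

    _global = {}       # global scope, shared across datasets
    _per_dataset = {}  # dataset scope, keyed by dataset name

    def __init__(self, name, dataset=None):
        if dataset is None:
            scope = self._global
        else:
            scope = self._per_dataset.setdefault(dataset, {})
        self._data = scope.setdefault(name, {})

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

# Cache an expensive computation per dataset
store = ExecutionStore("embeddings_cache", dataset="quickstart")
if store.get("model_name") is None:
    store.set("model_name", "clip-vit-base32")  # expensive step runs once

# A second handle to the same named store sees the cached value
again = ExecutionStore("embeddings_cache", dataset="quickstart")
print(again.get("model_name"))  # clip-vit-base32
```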
docs/source/release-notes.rst (2)

34-43: Consider adding migration steps for breaking changes

While the changes are well documented, consider adding explicit migration steps or code examples for breaking changes like the removal of Auth0 dependency and introduction of Internal Mode.

Line range hint 1-4: Add version compatibility matrix

Consider adding a compatibility matrix at the top of the release notes showing supported Python versions and key dependencies for each major release.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (21)

- docs/source/images/app/app-query-performance-disabled.gif is excluded by `!**/*.gif`
- docs/source/images/app/app-query-performance.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-class.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-compare.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-confusion.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-metric.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-notes.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-open.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-review.gif is excluded by `!**/*.gif`
- docs/source/images/app/model-evaluation-summary.gif is excluded by `!**/*.gif`
- docs/source/images/teams/data_quality_brightness_analysis.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_brightness_mark_as_reviewed.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_brightness_scan.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_brightness_scan_options.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_brightness_scheduled.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_brightness_slider.gif is excluded by `!**/*.gif`
- docs/source/images/teams/data_quality_brightness_tag.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_home.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_new_samples_home.png is excluded by `!**/*.png`
- docs/source/images/teams/data_quality_new_samples_modal.png is excluded by `!**/*.png`
- docs/source/images/teams/qp_toggle.png is excluded by `!**/*.png`
📒 Files selected for processing (13)

- docs/source/brain.rst (5 hunks)
- docs/source/dataset_zoo/index.rst (1 hunk)
- docs/source/index.rst (1 hunk)
- docs/source/model_zoo/index.rst (1 hunk)
- docs/source/plugins/developing_plugins.rst (1 hunk)
- docs/source/release-notes.rst (3 hunks)
- docs/source/teams/data_quality.rst (1 hunk)
- docs/source/teams/index.rst (2 hunks)
- docs/source/teams/overview.rst (1 hunk)
- docs/source/teams/query_performance.rst (1 hunk)
- docs/source/user_guide/app.rst (10 hunks)
- docs/source/user_guide/evaluation.rst (14 hunks)
- docs/source/user_guide/index.rst (2 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/source/model_zoo/index.rst
- docs/source/index.rst
🔇 Additional comments (13)
docs/source/dataset_zoo/index.rst (1)
25-26: LGTM: Header formatting improvement
The section header formatting follows RST standards and maintains consistency with the document structure.
docs/source/teams/overview.rst (1)
111-111: 🛠️ Refactor suggestion
Update visual similarity documentation reference
The line combines two separate features into one reference. Based on the AI summary, these should be separate references.
- | :ref:`Visual similarity <brain-similarity>` and :ref:`dataset uniqueness <brain-image-uniqueness>`
+ | :ref:`Visual similarity <brain-similarity>`
+ | :ref:`Dataset uniqueness <brain-image-uniqueness>`
Likely invalid or redundant comment.
docs/source/brain.rst (2)
41-46: Well-structured introduction to leaky splits feature.

The introduction effectively explains the problem of duplicates and near duplicates in dataset splits and their impact on model evaluation.

1421-1426: Clear explanation of near duplicates functionality.

The documentation effectively explains the purpose and usage of the near duplicates detection feature. The code examples are clear and practical.
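The underlying idea behind near-duplicate detection can be sketched without FiftyOne: embed each sample, then flag pairs whose embedding distance falls below a threshold. This brute-force O(n²) version is illustrative only; a real implementation would use an efficient neighbors index:

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def near_duplicates(embeddings, threshold=0.1):
    """Return index pairs whose embedding distance falls below `threshold`."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_distance(embeddings[i], embeddings[j]) < threshold:
                pairs.append((i, j))
    return pairs

# Samples 0 and 1 point in nearly the same direction; sample 2 is orthogonal
embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(near_duplicates(embeddings))  # [(0, 1)]
```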
docs/source/user_guide/evaluation.rst (3)
16-17: Improved reference to Model Evaluation panel.

The addition of the reference to the Model Evaluation panel enhances the documentation by directing users to the interactive exploration capabilities.

58-58: Consistent naming convention for evaluation keys.

The change from `eval_predictions` to `eval` as the evaluation key improves consistency throughout the documentation.

Also applies to: 141-142

Line range hint 1748-1784: Comprehensive documentation of Model Evaluation panel.

The new section thoroughly explains the Model Evaluation panel's features and capabilities. The code examples and screenshots effectively demonstrate its usage.
docs/source/user_guide/app.rst (3)
403-409: Clear explanation of query performance indicators.

The documentation effectively explains how indexed fields are indicated in the App's sidebar and their impact on query performance.

2010-2025: Well-structured Histograms panel documentation.

The new section clearly explains the different modes and capabilities of the Histograms panel, with good examples of its use cases.

2026-2030: Helpful note about dynamic updates.

The note effectively explains how the statistics automatically update to reflect the current view, which is an important feature for users to understand.
docs/source/release-notes.rst (3)
6-24: LGTM: Clear and well-structured release notes for Teams 2.2.0

The release notes effectively communicate the key changes and improvements in this version, with good organization into logical sections.

26-27: LGTM: Clear version header and release date format

The version header follows a consistent format with version number and release date.

Also applies to: 28-30

Line range hint 32-178: LGTM: Comprehensive changelog sections

The changelog is well-organized into clear sections (App, Core, Brain, etc.) making it easy for users to find relevant changes.
.. customcalloutitem::
    :header: Grouped datasets
    :description: Use grouped datasets to represent your multiview image, video, and point cloud data.
    :button_text: Learn more about grouped datasets
    :button_link: groups.html
💡 Codebase verification
Keep the "Grouped datasets" section as it's a core feature with widespread usage
The "Grouped datasets" feature is extensively used and documented throughout the codebase, with significant presence in:
- Core functionality documentation in
groups.rst
- Multiple release notes entries showing active development and maintenance
- Integration with the FiftyOne App
- Dataset import/export functionality
- Custom dataset creation workflows
The feature is critical for handling multiview data, and its removal would negatively impact users working with:
- Multiple camera views
- Video datasets
- Point cloud data
- Custom grouped dataset implementations
🔗 Analysis chain
Verify impact of removing "Grouped datasets" section
The removal of the "Grouped datasets" section might impact users looking for multiview data handling documentation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for references to grouped datasets in the codebase
rg -i "grouped.?datasets?" docs/
Length of output: 8235
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
docs/source/brain.rst (4)
41-60: Consider adding cross-references to related sections

The introductions for the new features are clear, but consider adding cross-references to related functionality like similarity indexes or visualization methods where applicable. This would help users understand how these features fit into the broader FiftyOne ecosystem.

1276-1413: Add performance considerations for large datasets

The leaky splits documentation is thorough, but consider adding a note about performance implications when working with large datasets, including:

- Memory requirements
- Processing time expectations
- Recommendations for batch processing if applicable
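The performance concern is easy to see from the shape of the computation: leakage detection compares samples across splits pairwise. A hypothetical brute-force sketch (not FiftyOne's implementation) makes the quadratic cost explicit:

```python
# Hypothetical sketch of split-leakage detection: flag test samples whose
# embedding lies within `threshold` of any training sample. The nested
# loop is why memory and processing-time notes matter on large datasets.
def find_leaks(train, test, threshold=0.05):
    leaks = []
    for j, t in enumerate(test):
        for i, s in enumerate(train):
            dist = sum((a - b) ** 2 for a, b in zip(s, t)) ** 0.5
            if dist < threshold:
                leaks.append((i, j))
                break  # one close match is enough to call it a leak
    return leaks

train = [[0.0, 0.0], [1.0, 1.0]]
test = [[0.01, 0.0], [5.0, 5.0]]  # first test sample nearly equals train[0]
print(find_leaks(train, test))  # [(0, 0)]
```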
1459-1465: Enhance threshold selection guidance

While the documentation mentions that threshold values may need adjustment, consider adding:

- Typical threshold ranges for common use cases
- Examples of how different thresholds affect results
- A methodology for finding optimal threshold values
1652-1654: Add technical details about hash implementation

Consider enhancing the documentation with:

- The specific hash algorithm used
- Storage requirements for hashes
- How hashes are persisted
- Performance characteristics of the hashing process
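To illustrate what such details might look like, here is a hypothetical content-hash approach to exact-duplicate detection. The choice of MD5 is an assumption for illustration; the actual algorithm is exactly the detail the comment asks the docs to state:

```python
import hashlib

def find_exact_duplicates(files):
    """Group byte-identical files by content hash.

    `files` maps filepath -> bytes; returns hash -> list of paths for
    groups with more than one entry (i.e., exact duplicates).
    """
    by_hash = {}
    for path, data in files.items():
        digest = hashlib.md5(data).hexdigest()  # illustrative choice of hash
        by_hash.setdefault(digest, []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

files = {
    "a.jpg": b"\x01\x02\x03",
    "b.jpg": b"\x01\x02\x03",  # byte-identical to a.jpg
    "c.jpg": b"\x09\x08",
}
dups = find_exact_duplicates(files)
print(list(dups.values()))  # [['a.jpg', 'b.jpg']]
```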
--------------------
*Released December 3, 2024*

Includes all updates from :ref:`FiftyOne 1.1.0 <release-notes-v1.1.0>`, plus:
Should there be a note about invitations and/or any other non-samples related changes?
Ah yes, added a note for SMTP configuration 👍
`pinecone-client>=3.2`
`#202 <https://github.com/voxel51/fiftyone-brain/pull/202>`_

Plugins
Added skip_prompt - that should be in here.
Added thanks 👍
.. _query-performance-disable:

Disabling query performance
Small nit: Query Performance
Updated all to Query Performance 👍
Disabling query performance
___________________________

Query performance is enabled by default for all datasets. This is generally the
Small nit: Query Performance
recommended setting for all large datasets to ensure that queries are
performant.
However, in certain circumstances you may prefer to disable query performance, |
- However, in certain circumstances you may prefer to disable query performance,
+ However, in certain circumstances you may prefer to disable Query Performance,
which enables the App's sidebar to show additional information such as
label/value counts that are useful but more expensive to compute.

You can enable/disable query performance for a particular dataset for its
- You can enable/disable query performance for a particular dataset for its
+ You can enable/disable Query Performance for a particular dataset for its
:alt: app-query-performance-disabled
:align: center

You can also enable/disable query performance via the status button in the
- You can also enable/disable query performance via the status button in the
+ You can also enable/disable Query Performance via the status button in the
:align: center

You can also enable/disable query performance via the status button in the
upper right corner of the Query performance panel:
- upper right corner of the Query performance panel:
+ upper right corner of the Query Performance panel:
@@ -554,6 +542,30 @@ field:

Numeric field filters are not supported by wildcard indexes.

.. _app-disasbling-query-performance:

Disabling query performance
- Disabling query performance
+ Disabling Query Performance
Disabling query performance
---------------------------

Query performance is enabled by default for all datasets. This is generally the
- Query performance is enabled by default for all datasets. This is generally the
+ Query Performance is enabled by default for all datasets. This is generally the
recommended setting for all large datasets to ensure that queries are
performant.

However, in certain circumstances you may prefer to disable query performance,
- However, in certain circumstances you may prefer to disable query performance,
+ However, in certain circumstances you may prefer to disable Query Performance,
which enables the App's sidebar to show additional information such as
label/value counts that are useful but more expensive to compute.

You can disable query performance for a particular dataset for its lifetime
- You can disable query performance for a particular dataset for its lifetime
+ You can disable Query Performance for a particular dataset for its lifetime
:alt: app-query-performance-disabled
:align: center

You can also disable query performance by default for all datasets by setting
- You can also disable query performance by default for all datasets by setting
+ You can also disable Query Performance by default for all datasets by setting
Not sure if we have agreed on a naming convention, but I think having capitalized Query Performance makes it clearer we are referring to a mode inside FiftyOne :)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)

docs/source/release-notes.rst (2)

6-28: Consider adding release date for FiftyOne Teams 2.2.0

The release notes for Teams 2.2.0 are missing a release date. Consider adding "Released December X, 2024" for consistency with other entries.

31-34: Consider adding release date for FiftyOne 1.1.0

The release notes for version 1.1.0 are missing a release date. Consider adding "Released December X, 2024" for consistency with other entries.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)

- docs/source/release-notes.rst (3 hunks)
- docs/source/teams/query_performance.rst (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/teams/query_performance.rst
🔇 Additional comments (3)
docs/source/release-notes.rst (3)
Line range hint 1-5: LGTM: Header section is properly formatted

The header section follows standard RST format with appropriate title and role definition.

Line range hint 1-3500: Ensure version numbers follow semantic versioning

The version numbers appear to follow semantic versioning (MAJOR.MINOR.PATCH), but let's verify this programmatically.

✅ Verification successful

Based on the shell script output, I can verify that all version numbers in the release notes follow semantic versioning (MAJOR.MINOR.PATCH) format. The script checked each version number against the pattern `^[0-9]+\.[0-9]+\.[0-9]+$` and found no invalid versions.

All version numbers in the release notes follow proper semantic versioning format.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify version numbers follow semantic versioning
# Extract and check all version numbers
rg "FiftyOne (Teams )?v?\d+\.\d+\.\d+" | \
grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+" | \
while read version; do
if ! echo $version | grep -qE "^[0-9]+\.[0-9]+\.[0-9]+$"; then
echo "Invalid version number: $version"
fi
done
Length of output: 14942
Line range hint 1-3500: Verify all documentation links are valid

The release notes contain many cross-references and links to documentation sections. Consider running a link checker to verify all links are valid.
LGTM 🚀
- **Blurriness**: scans for images that are abnormally blurry or sharp
- **Aspect Ratio**: scans for images that have extreme aspect ratios
- **Entropy**: scans for images that have unusually small or large entropy
- **Near Duplicates**: leverages embeddings to scan for
coming soon!
`aspect_ratio` field of the sample
- **Entropy**: the entropy of each image is stored in an `entropy` field of
  the sample
- **Near Duplicates**: the nearest neighbor distance of each sample is stored
coming soon!