
Feedback from Rachel #104

Open
puehringer opened this issue Jul 3, 2023 · 0 comments
Labels: bug (Something isn't working), high priority
puehringer commented Jul 3, 2023

I've now had the chance to test the internal deployment of cime4r more extensively with larger datasets (mostly 200,000 rows, but sometimes up to 800,000). Generally it works much better than before, but I have still noticed a few small issues:

Dataset upload: a column named 'shap_0' is still required for all datasets.

Projection: for larger datasets (> 400,000) the projection occasionally still does not update, or fails for no obvious reason; repeating the projection normally succeeds.

Filtering: the filters shown do not match the filtered data (e.g. here the filter labels show -1 to 11, but the data points only range from 0 to 11). This also happens with more complex filter settings, not just on initialisation.

[Screenshots: filter labels showing -1 to 11 vs. data points ranging only from 0 to 11]
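For reference, a minimal sketch (hypothetical function and column names, not the actual cime4r code; assumes pandas) of deriving the filter bounds from the data itself, so the displayed labels cannot drift from the filtered points:

```python
import pandas as pd

def filter_bounds(df: pd.DataFrame, column: str) -> tuple[float, float]:
    """Derive filter/slider bounds from the actual data so the UI
    labels always match the filtered points (no hard-coded -1 default)."""
    series = df[column].dropna()
    return float(series.min()), float(series.max())

df = pd.DataFrame({"experiment_cycle": [0, 3, 7, 11]})
low, high = filter_bounds(df, "experiment_cycle")
# low == 0.0, high == 11.0 -- matches the data, not a default range
```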

Filtering: if the dataset is larger than 10,000 rows it should be 'randomly' filtered, but currently the filtering is not random. E.g. here only the catalyst 'c' is shown, although the dataset has at least 5 different catalyst values:

[Screenshot: filtered view showing only catalyst 'c']

Aggregate: sometimes fails for large datasets (around 800,000 points), and selecting one hexagon to look at the points inside it (on the 'selection' tab) sometimes does not work. I have not yet figured out in which cases it does / does not work.

General: for large datasets the whole interface (particularly aggregation and encoding) can be very slow and sluggish. This is especially problematic when the session times out while working on a dataset and you have to start from scratch.

For me, the biggest problems above are the filtering issues (both the random sampling and the filter selection being displayed correctly).

With the selection of 10,000 data points, the expected behaviour would be one of the following:

  • show all experiments with experiment_cycle > -1, then a completely random selection of the rest
  • always show the first 10000 points (we could randomise the data when preparing it)
  • completely random selection of 10000 points

(I don't mind which one, but it is important to understand what is actually happening.)
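The three proposed behaviours above could be sketched roughly as follows (a hypothetical illustration assuming pandas, not the actual cime4r implementation; `experiment_cycle` is the column mentioned above, the strategy names are made up):

```python
import pandas as pd

def subsample(df: pd.DataFrame, n: int = 10_000,
              strategy: str = "random", seed: int = 42) -> pd.DataFrame:
    """Sketch of the three proposed subsampling behaviours."""
    if len(df) <= n:
        return df
    if strategy == "random":
        # Option 3: completely random selection of n points.
        return df.sample(n=n, random_state=seed)
    if strategy == "first":
        # Option 2: always the first n rows (data pre-shuffled upstream).
        return df.head(n)
    if strategy == "prioritise_cycles":
        # Option 1: keep all rows with experiment_cycle > -1, then fill
        # the remainder with a random selection of the rest.
        keep = df[df["experiment_cycle"] > -1]
        rest = df[df["experiment_cycle"] <= -1]
        fill = rest.sample(n=max(0, n - len(keep)), random_state=seed)
        return pd.concat([keep, fill]).head(n)
    raise ValueError(f"unknown strategy: {strategy}")
```

Whichever option is chosen, fixing the random seed (or documenting that the selection is seeded) would also make the behaviour reproducible and explainable to users.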

@puehringer puehringer added bug Something isn't working high priority labels Jul 3, 2023
@puehringer puehringer self-assigned this Jul 3, 2023