Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transformer class for the property selection #137

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

hurricane642
Copy link
Contributor

Hello, everyone!
This PR is the first step in adding the 'Transformer' class to rascaline. The 'Transformer' class makes it easy to create a representation matrix when using some other matrix as a reference. A classic use case is to create a TensorMap representation for a dataset, then perform transformations within that TensorMap (e.g., keys_to_features or keys_to_properties), and select the most useful features in the transformed TensorMap. The 'Transformer' allows a set of these features to be used to calculate a new TensorMap, thus saving computation time and maintaining a single I tried to explain the logic of the algorithm in the comments to the code, but if you have any questions - ask, I'll be happy to answer, as well as happy to receive any comments and suggestions.

IMPORTANT:
We need to add to the code the ability to perform several transformations one by one. This will be the next step.
Note that this branch is based on the selected_keys branch, which itself is still a PR. representation for all representations.

@hurricane642 hurricane642 requested a review from Luthaf November 24, 2022 11:04
@github-actions
Copy link

github-actions bot commented Nov 24, 2022

Here is a pre-built version of the code in this pull request: wheels.zip, you can install it locally by unzipping wheels.zip and using pip to install the file matching your system

@github-actions
Copy link

The documentation for this PR is (or will soon be) available on readthedocs: https://rascaline--137.org.readthedocs.build/en/137/

@Luthaf Luthaf force-pushed the selected_keys branch 3 times, most recently from 123d726 to 4842eb9 Compare December 1, 2022 11:04
Base automatically changed from selected_keys to master December 6, 2022 13:48
@ceriottm
Copy link
Contributor

ceriottm commented Dec 6, 2022

I noticed the branch before seeing the PR and was wondering what this was. I think that transformer has too much baggage in ML to use it as a class that enacts a generic transformation of a tensor. We need a better name.

@hurricane642
Copy link
Contributor Author

I noticed the branch before seeing the PR and was wondering what this was. I think that transformer has too much baggage in ML to use it as a class that enacts a generic transformation of a tensor. We need a better name.

Yes, of course the name will be changed, it's a draft version of the name

Copy link
Member

@Luthaf Luthaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code is not very clear to me, so let's try to iterate a bit on the attributes/functions naming 😃

from equistore import Labels, TensorBlock, TensorMap


class Transformer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about something like this? Then explaining in the docstring why it is useful.

Suggested change
class Transformer:
class PropertiesSelector:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds great!

Comment on lines 27 to 29
This variable can accept knowledge of type str (one key), list (a list
of keys) and Labels (in addition to the name of the keys, pass a list
of keys to be moved).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This variable can accept knowledge of type str (one key), list (a list
of keys) and Labels (in addition to the name of the keys, pass a list
of keys to be moved).
This variable can be anything supported by the :py:class:`equistore.TensorMap.keys_to_properties`
or :py:class:`equistore.TensorMap.keys_to_samples` functions, i.e. one string, a list of strings or an
instance of :py:class:`equistore.Labels`

Comment on lines 34 to 36
self.selector = selector
self.transformation = transformation
self.moved_keys = moved_keys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user should not access these after creating an instance of this class

Suggested change
self.selector = selector
self.transformation = transformation
self.moved_keys = moved_keys
self._selector = selector
self._transformation = transformation
self._moved_keys = moved_keys

if (self.transformation is None) and (self.moved_keys is not None):
raise ValueError("unable to shift keys: unknown transformation type")

def _copy(self, tensor_map):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to make a copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the _mover function, I move keys_to_samples/keys_to_properties with the tensor, saving its keys before doing so. Before I used the _copy function, the saved keys started pointing to the keys of the already modified tensor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. If you only need the keys, it might be better to only copy the keys, they should work fine with tensor.keys.copy()

blocks.append(block.copy())
return TensorMap(tensor_map.keys, blocks)

def keys_definition(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def keys_definition(self):
def _keys_definition(self):

self.moved_keys_names = self.moved_keys.copy()
self.final_keys = self._old_keys
else:
# The third case is a little more complicated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The third case is a little more complicated.
assert isinstance(self.moved_keys, Labels)
# The third case is a little more complicated.

of keys to be moved).
"""

def __init__(self, selector, transformation=None, moved_keys=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use the same name as the equistore functions

Suggested change
def __init__(self, selector, transformation=None, moved_keys=None):
def __init__(self, selector, transformation=None, keys_to_move=None):


import numpy as np
from equistore import TensorBlock, TensorMap
from skcosmo.feature_selection import FPS as FPS_f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you rename FPS to FPS_f?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to rename it from the time I thought it would be necessary to make a sample selection too. I'll rename it

self.keys_definition()
self.properties_selection()

def transform(self, frames, calculator):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this class take the calculator in the constructor? Given that it will only be usable with one calculator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this function also needs to handle all other kwargs that could be given to the calculator (mainly gradients & use_native_system). We could either have them explicitly, or take a **kwargs argument, set the corresponding selected_properties and pass it to the calculator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first point, it's an interesting question. So, it is clear that we need to input frames (our general task is to get descriptor from them as simply as possible). For this, we need a calculator one way or another. And now the question is at what point we will feed it as input? We either have to do it in the fit function or already in transform. In general, the first option is possible (we can pass either descriptor and calculator or frames and calculator to fit). What do you think would be better?
On the second point, yes, I agree, I'll try to fix that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would pass the calculator even earlier (in __init__), and then both fit and transform take a set of frames + some options as kwargs

tensor_copy.keys_to_properties(self.moved_keys)
return tensor_copy

def properties_selection(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def properties_selection(self):
def _properties_selection(self):

@Luthaf
Copy link
Member

Luthaf commented Dec 8, 2022

You can also add skcosmo to the tox environments to be able to run the tests.

@hurricane642
Copy link
Contributor Author

You can also add skcosmo to the tox environments to be able to run the tests.

Ok, I added it, but it's still having trouble passing the tests. Is there anything else I should add?

@Luthaf
Copy link
Member

Luthaf commented Dec 20, 2022

You can skip tests if skcosmo is not installed (as long as it is still tested in the all-deps environement), see for example https://github.com/Luthaf/rascaline/blob/3a28ad0831f6f0bf6cb81defd26a47a0e25f228a/python/tests/systems/chemfiles.py#L17

setup.cfg Outdated
@@ -21,6 +21,7 @@ package_dir =
install_requires =
numpy
equistore @ https://github.com/lab-cosmo/equistore/archive/6ca7fa3.zip
skcosmo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be in the install_requires (it is an optional dependency, like ase or chemfiles)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants