-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transformer class for the property selection #137
base: main
Are you sure you want to change the base?
Conversation
Here is a pre-built version of the code in this pull request: wheels.zip, you can install it locally by unzipping |
The documentation for this PR is (or will soon be) available on readthedocs: https://rascaline--137.org.readthedocs.build/en/137/ |
123d726
to
4842eb9
Compare
f685faf
to
01d01c3
Compare
I noticed the branch before seeing the PR and was wondering what this was. I think that transformer has too much baggage in ML to use it as a class that enacts a generic transformation of a tensor. We need a better name. |
Yes, of course the name will be changed, it's a draft version of the name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code is not very clear to me, so let's try to iterate a bit on the attributes/functions naming 😃
python/rascaline/transformer.py
Outdated
from equistore import Labels, TensorBlock, TensorMap | ||
|
||
|
||
class Transformer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about something like this? Then explaining in the docstring why it is useful.
class Transformer: | |
class PropertiesSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sounds great!
python/rascaline/transformer.py
Outdated
This variable can accept knowledge of type str (one key), list (a list | ||
of keys) and Labels (in addition to the name of the keys, pass a list | ||
of keys to be moved). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable can accept knowledge of type str (one key), list (a list | |
of keys) and Labels (in addition to the name of the keys, pass a list | |
of keys to be moved). | |
This variable can be anything supported by the :py:class:`equistore.TensorMap.keys_to_properties` | |
or :py:class:`equistore.TensorMap.keys_to_samples` functions, i.e. one string, a list of strings or an | |
instance of :py:class:`equistore.Labels` |
python/rascaline/transformer.py
Outdated
self.selector = selector | ||
self.transformation = transformation | ||
self.moved_keys = moved_keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user should not access these after creating an instance of this class
self.selector = selector | |
self.transformation = transformation | |
self.moved_keys = moved_keys | |
self._selector = selector | |
self._transformation = transformation | |
self._moved_keys = moved_keys |
python/rascaline/transformer.py
Outdated
if (self.transformation is None) and (self.moved_keys is not None): | ||
raise ValueError("unable to shift keys: unknown transformation type") | ||
|
||
def _copy(self, tensor_map): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to make a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the _mover
function, I move keys_to_samples/keys_to_properties with the tensor, saving its keys before doing so. Before I used the _copy
function, the saved keys started pointing to the keys of the already modified tensor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. If you only need the keys, it might be better to only copy the keys, they should work fine with tensor.keys.copy()
python/rascaline/transformer.py
Outdated
blocks.append(block.copy()) | ||
return TensorMap(tensor_map.keys, blocks) | ||
|
||
def keys_definition(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def keys_definition(self): | |
def _keys_definition(self): |
python/rascaline/transformer.py
Outdated
self.moved_keys_names = self.moved_keys.copy() | ||
self.final_keys = self._old_keys | ||
else: | ||
# The third case is a little more complicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The third case is a little more complicated. | |
assert isinstance(self.moved_keys, Labels) | |
# The third case is a little more complicated. |
python/rascaline/transformer.py
Outdated
of keys to be moved). | ||
""" | ||
|
||
def __init__(self, selector, transformation=None, moved_keys=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use the same name as the equistore functions
def __init__(self, selector, transformation=None, moved_keys=None): | |
def __init__(self, selector, transformation=None, keys_to_move=None): |
python/tests/transformer.py
Outdated
|
||
import numpy as np | ||
from equistore import TensorBlock, TensorMap | ||
from skcosmo.feature_selection import FPS as FPS_f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you rename FPS to FPS_f?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to rename it from the time I thought it would be necessary to make a sample selection too. I'll rename it
python/rascaline/transformer.py
Outdated
self.keys_definition() | ||
self.properties_selection() | ||
|
||
def transform(self, frames, calculator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this class take the calculator in the constructor? Given that it will only be usable with one calculator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this function also needs to handle all other kwargs that could be given to the calculator (mainly gradients
& use_native_system
). We could either have them explicitly, or take a **kwargs
argument, set the corresponding selected_properties
and pass it to the calculator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the first point, it's an interesting question. So, it is clear that we need to input frames
(our general task is to get descriptor
from them as simply as possible). For this, we need a calculator
one way or another. And now the question is at what point we will feed it as input? We either have to do it in the fit
function or already in transform
. In general, the first option is possible (we can pass either descriptor
and calculator
or frames
and calculator
to fit
). What do you think would be better?
On the second point, yes, I agree, I'll try to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would pass the calculator even earlier (in __init__
), and then both fit
and transform
take a set of frames + some options as kwargs
python/rascaline/transformer.py
Outdated
tensor_copy.keys_to_properties(self.moved_keys) | ||
return tensor_copy | ||
|
||
def properties_selection(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def properties_selection(self): | |
def _properties_selection(self): |
You can also add skcosmo to the tox environments to be able to run the tests. |
Ok, I added it, but it's still having trouble passing the tests. Is there anything else I should add? |
You can skip tests if skcosmo is not installed (as long as it is still tested in the |
setup.cfg
Outdated
@@ -21,6 +21,7 @@ package_dir = | |||
install_requires = | |||
numpy | |||
equistore @ https://github.com/lab-cosmo/equistore/archive/6ca7fa3.zip | |||
skcosmo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be in the install_requires
(it is an optional dependency, like ase or chemfiles)
Hello, everyone!
This PR is the first step in adding the 'Transformer' class to rascaline. The 'Transformer' class makes it easy to create a representation matrix when using some other matrix as a reference. A classic use case is to create a TensorMap representation for a dataset, then perform transformations within that TensorMap (e.g., keys_to_features or keys_to_properties), and select the most useful features in the transformed TensorMap. The 'Transformer' allows a set of these features to be used to calculate a new TensorMap, thus saving computation time and maintaining a single I tried to explain the logic of the algorithm in the comments to the code, but if you have any questions - ask, I'll be happy to answer, as well as happy to receive any comments and suggestions.
IMPORTANT:
We need to add to the code the ability to perform several transformations one by one. This will be the next step.
Note that this branch is based on the selected_keys branch, which itself is still a PR. representation for all representations.