Minimum and maximum value arguments (constraints) #9

Open
ThirstyGeo opened this issue Feb 18, 2021 · 15 comments

@ThirstyGeo

I'm working with Dirichlet distributions and the compositional data simplex, and am really enjoying MIDASpy's flexibility when dealing with this data (related to K-L divergence in the decoder). However, there is a tendency to produce negative values in the numerical feature data I have been using.

In the case of compositional data, there is a hard constraint of zero as a minimum value. Other imputation approaches allow setting maximum and minimum value arguments (e.g., scikit-learn), and, importantly, these can be set per feature (autoimpute). Is this an argument that could be added to the package? It would be a major help to people working in several disciplines.
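
For context, a minimal sketch of the kind of per-feature bounds I mean, using scikit-learn's IterativeImputer (the array and bounds below are just placeholders):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy compositional-style data with missing entries.
X = np.array([[0.2, 0.5, 0.3],
              [0.1, np.nan, 0.6],
              [np.nan, 0.4, 0.4]])

# min_value/max_value accept a scalar or one entry per feature,
# so every imputed value here is forced into the unit interval.
imputer = IterativeImputer(min_value=[0.0, 0.0, 0.0],
                           max_value=[1.0, 1.0, 1.0],
                           random_state=0)
X_imputed = imputer.fit_transform(X)
```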

@tsrobinson (Collaborator) commented Feb 18, 2021

Thanks @ThirstyGeo for raising this issue -- completely agree that it would be a really useful feature. The best way to implement this is probably to allow users to change the activation functions for specific output nodes in the network -- then the model will incorporate this range trimming within training itself.

We will look into this as a priority, and any further suggestions/pull requests would be gratefully received.
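
To illustrate the idea, here is a rough, generic sketch (not MIDASpy's internal code) of applying different activations to different output nodes, with a softplus head keeping selected columns non-negative; the column indices and layer sizes are arbitrary:

```python
import tensorflow as tf

n_features = 5
positive_idx = [0, 2]                         # columns constrained to be >= 0
other_count = n_features - len(positive_idx)

inputs = tf.keras.Input(shape=(n_features,))
hidden = tf.keras.layers.Dense(16, activation="elu")(inputs)

# One output head per constraint type: softplus keeps its columns non-negative,
# while the remaining columns stay on a linear (unbounded) activation.
positive_out = tf.keras.layers.Dense(len(positive_idx), activation="softplus")(hidden)
linear_out = tf.keras.layers.Dense(other_count, activation=None)(hidden)
outputs = tf.keras.layers.Concatenate()([positive_out, linear_out])

model = tf.keras.Model(inputs, outputs)

# For a [min, max] range, a scaled and shifted sigmoid head would work similarly:
# min + (max - min) * tf.keras.activations.sigmoid(logits)
```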

@tsrobinson added the enhancement (New feature or request) and priority (Something we'll try implement in the next release) labels on Feb 18, 2021
@tsrobinson self-assigned this on Feb 18, 2021
@ThirstyGeo (Author)

That's great, @tsrobinson! Much appreciated that you'll focus on this. I'll think a bit more through the typical workflows and see if I can create an example which represents a typical situation. If you like it, it could be something for the package's examples/tutorials.

@ThirstyGeo (Author) commented Feb 19, 2021

As a tangent of interest: few research articles address imputation of data on the compositional data simplex. The best one I'm aware of for deep-learning-oriented imputation of compositional data concerns the specific case of 'censored zeroes', i.e., values that are below analytical detection but above zero (usually the only information given is that the values lie below a certain threshold). That article focuses on ANNs and emphasises feature pre-processing, using log-ratio transformations to move the features out of the simplex and into Euclidean space.
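
For anyone unfamiliar, here is a minimal sketch of the centred log-ratio (CLR) transform that this kind of pre-processing typically uses; a real workflow would need a zero-replacement step first, since CLR requires strictly positive parts:

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform of a (n_samples, n_parts) composition."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def clr_inverse(Z):
    """Map CLR coordinates back to the simplex (rows sum to 1)."""
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

composition = np.array([[0.2, 0.5, 0.3],
                        [0.1, 0.3, 0.6]])
Z = clr(composition)          # unconstrained Euclidean coordinates
back = clr_inverse(Z)         # recovers the original rows up to closure
```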

The autoencoder approach of MIDASpy has the significant potential advantages of (1) allowing mixed data types, (2) not requiring a pre-processing step, and (3) producing multiple realisations and therefore a measure of confidence for imputed values. Very exciting!

@ranjitlall (Collaborator)

Really interesting - thanks @ThirstyGeo for letting us know about this research.

@geraldine28

Hello and thank you for this great package. I wanted to inquire whether you have made any progress on this issue? We have a data set with a lot of count data variables, and many of them get imputed with negative values, which isn't ideal. Hence, our interest :)

@kblnig commented Feb 18, 2023

Any news on this? Or maybe a small idea on how or where this would fit best in the code if I were to toy around with it myself? :)

@ranjitlall (Collaborator) commented Feb 23, 2023

Hi @geraldine28 @kblnig, we are looking into this now and will get back to you shortly. Sorry about the delay!

@kblnig commented Mar 2, 2023

@ranjitlall - really looking forward to this :) !!!!

@martin18d

Echoing others' enthusiasm, I'm also wondering if there's any news on this feature.

@AuSpotter

Looking forward to this feature!

@tsrobinson (Collaborator)

Thanks everyone for your interest! I can confirm this is now under development, and will update you asap when this functionality is ready for release.

@CoralieGilbert

Hello!

I saw that you added this new feature, but when I try to call .build_model with the positive_columns argument, Python tells me it does not exist.

Is it still available, or have you removed it?

Thanks

@tsrobinson (Collaborator)

Hi @CoralieGilbert, it's still available, but we haven't released it to PyPI yet -- I will try to action this by the end of the week and let you know when it's done.

Best,
Tom

@CoralieGilbert

Thank you so much !
I'm currently using your awesome library for my thesis, and this release would be a lifesaver.

Best,
Coralie

@tsrobinson (Collaborator) commented Aug 31, 2024

All done, @CoralieGilbert! You should be able to run pip install MIDASpy --upgrade to install v1.4.1, which includes the positive_columns argument.

Just to note, there are some tensorflow incompatibilities with the new numpy 2.x versions, so if you cannot install or load this new version, try downgrading numpy to 1.26.4 and then try again. Any other problems, just let me know :)
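
For anyone picking this up, a rough usage sketch; the data file, column names, and hyperparameters are placeholders, and your own preprocessing and settings may differ:

```python
import pandas as pd
import MIDASpy as md

# Hypothetical data set with missing values; "count_a"/"count_b" stand in for
# columns that should never be imputed with negative values.
data = pd.read_csv("my_data.csv")
count_cols = ["count_a", "count_b"]

imputer = md.Midas(layer_structure=[128, 128], seed=42)
imputer.build_model(data, positive_columns=count_cols)  # new in v1.4.1
imputer.train_model(training_epochs=20)

# Each element of the list is one completed copy of the data set.
imputations = imputer.generate_samples(m=10).output_list
```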
