Minimum and maximum value arguments (constraints) #9

Open
ThirstyGeo opened this issue Feb 18, 2021 · 15 comments

@ThirstyGeo

I'm working with Dirichlet distributions and the compositional data simplex, and am really enjoying MIDASpy's flexibility when dealing with this data (related to K-L divergence in the decoder). However, there is a tendency to produce negative values in the numerical feature data I have been using.

In the case of compositional data, there is a hard constraint of zero as a minimum value. Other imputation approaches allow setting maximum and minimum value arguments (e.g., scikit-learn), and, importantly, these can be set per feature (autoimpute). Is this an argument that could be added to the package? It would be a major help to people working in several disciplines.
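
For context, a minimal sketch of the kind of per-feature bounds I mean, using scikit-learn's IterativeImputer (the array and bounds below are just placeholders):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy compositional-style data with missing entries.
X = np.array([[0.2, 0.5, 0.3],
              [0.1, np.nan, 0.6],
              [np.nan, 0.4, 0.4]])

# min_value/max_value accept a scalar or one entry per feature,
# so every imputed value here is forced into the unit interval.
imputer = IterativeImputer(min_value=[0.0, 0.0, 0.0],
                           max_value=[1.0, 1.0, 1.0],
                           random_state=0)
X_imputed = imputer.fit_transform(X)
```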

@tsrobinson (Collaborator) commented Feb 18, 2021

Thanks @ThirstyGeo for raising this issue -- completely agree that it would be a really useful feature. The best way to implement this is probably to allow users to change the activation functions for specific output nodes in the network -- then the model will incorporate this range trimming within training itself.

We will look into this as a priority, and any further suggestions/pull requests would be gratefully received.
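
To illustrate the idea, here is a rough, generic sketch (not MIDASpy's internal code) of applying different activations to different output nodes, with a softplus head keeping selected columns non-negative; the column indices and layer sizes are arbitrary:

```python
import tensorflow as tf

n_features = 5
positive_idx = [0, 2]                         # columns constrained to be >= 0
other_count = n_features - len(positive_idx)

inputs = tf.keras.Input(shape=(n_features,))
hidden = tf.keras.layers.Dense(16, activation="elu")(inputs)

# One output head per constraint type: softplus keeps its columns non-negative,
# while the remaining columns stay on a linear (unbounded) activation.
positive_out = tf.keras.layers.Dense(len(positive_idx), activation="softplus")(hidden)
linear_out = tf.keras.layers.Dense(other_count, activation=None)(hidden)
outputs = tf.keras.layers.Concatenate()([positive_out, linear_out])

model = tf.keras.Model(inputs, outputs)

# For a [min, max] range, a scaled and shifted sigmoid head would work similarly:
# min + (max - min) * tf.keras.activations.sigmoid(logits)
```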

@tsrobinson added the enhancement (New feature or request) and priority (Something we'll try implement in the next release) labels on Feb 18, 2021
@tsrobinson self-assigned this on Feb 18, 2021
@ThirstyGeo (Author)

That's great, @tsrobinson! Much appreciated that you'll focus on this. I'll think a bit more through the typical workflows and see if I can create an example which represents a typical situation. If you like it, it could be something for the package's examples/tutorials.

@ThirstyGeo (Author) commented Feb 19, 2021

As a tangent of interest: few research articles address imputation of data on the compositional data simplex. The best one I'm aware of for deep-learning-oriented imputation of compositional data concerns the specific case of 'censored zeroes', i.e., values that are below analytical detection but above zero (usually the only information given is that the values lie below a certain threshold). That article focuses on ANNs and emphasises feature pre-processing, using log-ratio transformations to move the features out of the simplex and into Euclidean space.
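
For anyone unfamiliar, here is a minimal sketch of the centred log-ratio (CLR) transform that this kind of pre-processing typically uses; a real workflow would need a zero-replacement step first, since CLR requires strictly positive parts:

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform of a (n_samples, n_parts) composition."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def clr_inverse(Z):
    """Map CLR coordinates back to the simplex (rows sum to 1)."""
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

composition = np.array([[0.2, 0.5, 0.3],
                        [0.1, 0.3, 0.6]])
Z = clr(composition)          # unconstrained Euclidean coordinates
back = clr_inverse(Z)         # recovers the original rows up to closure
```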

The autoencoder approach of MIDASpy has the significant potential advantages of (1) allowing mixed data types, (2) not requiring a pre-processing step, and (3) producing multiple realisations and therefore a measure of confidence for imputed values. Very exciting!

@ranjitlall (Collaborator)

Really interesting - thanks @ThirstyGeo for letting us know about this research.

@geraldine28

Hello and thank you for this great package. I wanted to inquire whether you have made any progress on this issue? We have a data set with a lot of count data variables, and many of them get imputed with negative values, which isn't ideal. Hence, our interest :)

@kblnig commented Feb 18, 2023

Any news on this? Or maybe a small idea on how or where this would fit best in the code if I were to toy around with it myself? :)

@ranjitlall (Collaborator) commented Feb 23, 2023

Hi @geraldine28 @kblnig, we are looking into this now and will get back to you shortly. Sorry about the delay!

@kblnig commented Mar 2, 2023

@ranjitlall - really looking forward to this :) !!!!

@martin18d

Echoing others' enthusiasm, I'm also wondering if there's any news on this feature.

@AuSpotter

Looking forward to this feature!

@tsrobinson (Collaborator)

Thanks everyone for your interest! I can confirm this is now under development, and will update you asap when this functionality is ready for release.

@CoralieGilbert

Hello!

I saw that you added this new feature, but when I try to call .build_model with the positive_columns argument, Python tells me it does not exist.

Is it still available, or have you removed it?

Thanks

@tsrobinson (Collaborator)

Hi @CoralieGilbert, it's still available, but we haven't released it to PyPI yet -- I will try to action this by the end of the week and let you know when it's done.

Best,
Tom

@CoralieGilbert

Thank you so much !
I'm currently using your awesome library for my thesis, and this release would be a lifesaver.

Best,
Coralie

@tsrobinson (Collaborator) commented Aug 31, 2024

All done, @CoralieGilbert! You should be able to run pip install MIDASpy --upgrade to install v1.4.1, which includes the positive_columns argument.

Just to note, there are some tensorflow incompatibilities with the new numpy 2.x versions, so if you cannot install or load this new version, try downgrading numpy to 1.26.4 and then try again. Any other problems, just let me know :)
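
For anyone picking this up, a rough usage sketch; the data file, column names, and hyperparameters are placeholders, and your own preprocessing and settings may differ:

```python
import pandas as pd
import MIDASpy as md

# Hypothetical data set with missing values; "count_a"/"count_b" stand in for
# columns that should never be imputed with negative values.
data = pd.read_csv("my_data.csv")
count_cols = ["count_a", "count_b"]

imputer = md.Midas(layer_structure=[128, 128], seed=42)
imputer.build_model(data, positive_columns=count_cols)  # new in v1.4.1
imputer.train_model(training_epochs=20)

# Each element of the list is one completed copy of the data set.
imputations = imputer.generate_samples(m=10).output_list
```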
