In regards to the background of the model #4

Open
MarKo9 opened this issue May 7, 2018 · 5 comments

@MarKo9

MarKo9 commented May 7, 2018

Hi,

First of all thanks for your work.
I am in the process of doing some testing and, if possible, would like a quick clarification. Is your model the same as, or influenced by, the one in the paper below? "MIDA: Multiple Imputation using Denoising Autoencoders" by Lovedeep Gondara and Ke Wang.
https://arxiv.org/pdf/1705.02737.pdf

@Oracen-zz
Owner

Oracen-zz commented Jun 3, 2018

No, although I discovered that paper about two months after I started working on MIDAS. That approach has more in common with chained equations (i.e. MICE, Hmisc's aregImpute function) than with MIDAS. In fact, it's essentially an ensemble of denoising autoencoders, meaning your training time scales with the number of imputations you require. MIDAS instead draws on principles of variational inference to compute an approximate posterior: train the model once, then draw as many imputations as you require.
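To make that contrast concrete, here's a rough sketch of the train-once, draw-many pattern (not the MIDAS source; just an illustrative Keras model with placeholder sizes). Dropout is left active at prediction time, so each forward pass yields a different stochastic reconstruction, i.e. one imputation:

```python
import tensorflow as tf

def build_dae(n_features, hidden=128, drop=0.5):
    # A simple denoising autoencoder; layer sizes here are placeholders.
    inp = tf.keras.Input(shape=(n_features,))
    x = tf.keras.layers.Dense(hidden, activation="elu")(inp)
    x = tf.keras.layers.Dropout(drop)(x)           # stays active at draw time (see below)
    x = tf.keras.layers.Dense(hidden, activation="elu")(x)
    out = tf.keras.layers.Dense(n_features)(x)     # linear output, no ReLU clipping
    return tf.keras.Model(inp, out)

model = build_dae(n_features=20)
model.compile(optimizer="adam", loss="mse")
# X: data with missing cells filled by a simple placeholder (e.g. column means)
# model.fit(X, X, epochs=50, batch_size=256)

# Draw M imputations from the single trained model by keeping dropout on:
# imputations = [model(X, training=True).numpy() for _ in range(M)]
```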

It's worth noting that Gondara and Wang don't exactly publicise their code or results. When I did eventually track down some code, it was on Gondara's GitHub:
https://github.com/lgondara/loss-to-followup-DAE/blob/master/programs/Impute_LTF_DAE.py

If you look there, there's just an oblique Keras model, and it looks like the noise isn't even dynamic for the representation they learn. Further, the loss function appears to be plain MSE, but the output layer is ReLU'd (i.e. the output is constrained to be non-negative). Combined with the StandardScaler transform they're using, which centres the data so roughly half the values are negative, they're going to have some difficulties. In the paper you mentioned, they don't even address how missingness affects loss generation, which (from experience) is a key challenge, and there's no mention of how they bypass that problem with the softmax loss.
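For what it's worth, the usual way to keep missingness out of the loss is to mask the reconstruction error so that only originally observed cells contribute to the gradient. A minimal sketch (not taken from either codebase; the mask argument is just illustrative):

```python
import tensorflow as tf

def masked_mse(y_true, y_pred, observed_mask):
    # observed_mask is 1.0 where a value was actually observed and 0.0 where it
    # was missing, so the missing cells contribute nothing to the gradient.
    sq_err = tf.square(y_true - y_pred) * observed_mask
    return tf.reduce_sum(sq_err) / tf.maximum(tf.reduce_sum(observed_mask), 1.0)
```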

Also, it's worth noting that they changed the name of their paper in Feb 2018, by the looks of things. It used to be called "Multiple Imputation Using Deep Denoising Autoencoders". Here's some correspondence we had with Andrew Gelman in January:
http://andrewgelman.com/2018/01/10/python-program-multivariate-missing-data-imputation-works-large-datasets/

In short: no, we're not associated with them.

@charuj

charuj commented Oct 25, 2018

Hi there @Oracen and @ranjitlall!

Your repo is awesome! I really appreciate the work you put into it.

Following up on the question above, do you have a peer-reviewed (or preprint) paper about MIDAS? I'd like to know how I can cite your research to give you credit.

Thanks!

@Oracen-zz
Owner

Oracen-zz commented Oct 25, 2018 via email

@asthaIITD

Hi @Oracen,

Is the paper you both were authoring finalized? I could not find it online.

@tsrobinson

Hi @asthaIITD, @charuj,

We have now migrated MIDAS to this new repo where all future releases will be made available.

The current paper explaining the background of the model can be found on APSA Preprints here: https://doi.org/10.33774/apsa-2020-3tk40-v3

Best,
Tom
