-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List of things to consider cleaning up in the light curve notebook #177
Comments
@bsipocz what's your input on the Formatting section? What standards should we apply for .py and .md files? |
re: sample_lc.py and nonunique_sample inside of sample_selection.py What to do with them depends on our goal for the ML notebooks. I think they should start with a pre-made sample of light curves, and not need to run light_curve_generator. My reasoning is twofold. 1) speed of running the ML notebooks will be faster without first having to generate the light curves. 2) for anyone wanting to use the ML notebooks they wouldn't have to understand the extra pieces of code that generate the light curves, they could just focus on the code in the ML notebooks themselves. If that is ok with everyone, then we should remove sample_lc.py (but make sure @xoubish has a copy somewhere since she needs it to generate the initial sample), and also remove nonunique_sample |
I would say we could add a CI job to check on the .py formatting (though would not enable pre-commit or anything annoying that messes with the commits/branches, only a failing CI job on the PR). But I don't know anything from the top of my head for md files, will have to google for that one. |
I'm in favor of that, but I'd like to keep the scope of this Issue more narrowly focused on cleaning up just the light curve notebook. I have specific formatting preferences, especially for python code. While I know that your preferences are different than mine, I don't know exactly what your preferences are. So I'd like our group to come to a consensus before I make any broad changes there. Do you want to discuss specifics to be applied to this notebook? Or do you think we should leave the formatting alone for now? |
Something that is PEP8 compatible. I don't like to be as strict and whitespace heavy as black and would rather just make flake8 or similar pass on the codebase, but I agree, getting a consensus would be nice and I suppose can be done quickly. Shall we put it on the agenda for this afternoon? |
@jkrick That's all fine with me. We'll just need to figure out where to store the pre-made sample of light curves. (I know we're using Google Drive for this right now, but want to figure out a more general solution.) |
I have added formatting preferences to the agenda for this afternoon, but I would be super happy to just adopt whatever you two think is best. This is definitely your domain. |
And I am pretty sure I have a gitlab issue in with the Navteca people about where we should be storing data as a long term solution instead of google drive. |
After looking over everything in the directory, I fully agree that MAST_functions.py can be removed. I don't see it used anywhere. |
Cleanup light curve notebook (complete Issue #177)
Collecting a list of things we may consider doing as we finish up the notebook light_curve_generator.md.
Function docstrings
Notebook text
coords_list
-->sample_table
, instructions in section 1.1 should be updated). (Light curve text cleanup #189)Code
Note that several MAST functions still iterate over individual sample objects rather than loading in bulk. This is a known issue being handled separately (see #165, #166, #179).
General
- [ ] Do we want to remove multiIndex from the df_lc data structure in light curves notebook? #159deferred- [ ] Update band names for clarity #182deferred- [ ] Since it's hard to attach units, etc. to DataFrame columns/values, consider renaming columns to be explicit (e.g., "flux" --> "flux_mjy" and "time" --> "mjd"). OTOH, we may run into issues that mean this is just more confusing (to the point of deceiving), like ZTF actually uses "hmjd" rather than "mjd".fluxconversions.py
- [ ] See if we can use Astropy methods in place of any functions in this file (prompted by #181 (comment)).moved to #188heasarc_functions.py
heasarc_cat
andmax_error_radius
have an implicit, one-to-one relationship. Replace the two lists with a dict that ties the correct values together. Then iterate over the dict instead of a list index. (Cleanup light curve notebook (complete Issue #177) #198)mast_functions.py
panstarrs.py
try
block as possible. (Cleanup light curve notebook (complete Issue #177) #198)plot_functions.pymoved to #199sample_lc.py
- [ ] Update it to, e.g., use sample_table. (This is a script version of the parallel section of the notebook, meant to be called from command line, but has not been kept up to date.) Alternately, consider making the modules directly callable from the command line and write a .sh script to execute them. This should speed up bulk runs because ZTF could use multiple workers.(moved to #195)sample_selection.py
- [ ]deferred (not used in this notebook)nonunique_sample
function: either remove it or update it to return a Table likeclean_sample
does.tde_functions.py
TESS_Kepler_functions.py
try
block and catch a specific error in theexcept
statement. (Cleanup light curve notebook (complete Issue #177) #198)numlc
. (Cleanup light curve notebook (complete Issue #177) #198)ztf_functions.py
ztf_radius
orradius
, but not both. (Cleanup light curve notebook (complete Issue #177) #198)The text was updated successfully, but these errors were encountered: