List of things to consider cleaning up in the light curve notebook #177

troyraen · 2023-11-22T10:38:12Z

Collecting a list of things we may consider doing as we finish up the notebook light_curve_generator.md.

Function docstrings

Review for accuracy. (Light curve text cleanup #189)

Notebook text

Review for accuracy (e.g., since we switched coords_list --> sample_table, instructions in section 1.1 should be updated). (Light curve text cleanup #189)
Review sections for completeness: (Light curve text cleanup #189)
- Non-standard Imports
- References
Some of the serial cells have text explaining the catalog data or query method (e.g., WISE) but some don't (e.g., ZTF). Consider standardizing this. (Light curve text cleanup #189)

Code

Note that several MAST functions still iterate over individual sample objects rather than loading in bulk. This is a known issue being handled separately (see #165, #166, #179).

General

warnings on parallel section #174
Remove use of pickle files from the light curve notebook #155
Revise dependencies of notebooks #184
~~- [ ] Do we want to remove multiIndex from the df_lc data structure in light curves notebook? #159~~ deferred
~~- [ ] Update band names for clarity #182~~ deferred
- [ ] Since it's hard to attach units, etc. to DataFrame columns/values, consider renaming columns to be explicit (e.g., "flux" --> "flux_mjy" and "time" --> "mjd"). OTOH, we may run into issues that make this more confusing (to the point of being misleading); e.g., ZTF actually uses "hmjd" rather than "mjd".

fluxconversions.py

~~- [ ] See if we can use Astropy methods in place of any functions in this file (prompted by #181 (comment)).~~ moved to #188

heasarc_functions.py

The two lists heasarc_cat and max_error_radius have an implicit, one-to-one relationship. Replace the two lists with a dict that ties the correct values together. Then iterate over the dict instead of a list index. (Cleanup light curve notebook (complete Issue #177) #198)
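
A minimal sketch of the dict-based approach. The catalog names, radius values, and the `build_queries` helper are hypothetical placeholders for illustration; the real values live in heasarc_functions.py and are not reproduced here.

```python
# Hypothetical catalog names and error radii, for illustration only.
# Before: two parallel lists whose pairing is implied only by position.
heasarc_cat = ["catalog_a", "catalog_b"]
max_error_radius = [1.0, 3.0]

# After: a single dict ties each catalog to its radius explicitly.
catalog_error_radii = {"catalog_a": 1.0, "catalog_b": 3.0}


def build_queries(catalog_error_radii):
    """Return one descriptive query string per catalog (stand-in for a real query)."""
    queries = []
    # Iterating over items() removes the need for any list index.
    for catalog, radius in catalog_error_radii.items():
        queries.append(f"catalog={catalog} max_error_radius={radius}")
    return queries


for query in build_queries(catalog_error_radii):
    print(query)
```

With a dict, adding or removing a catalog is a one-line change, and the radius cannot silently drift out of step with the catalog it belongs to.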

mast_functions.py

Remove this file. It's not being used anywhere. (Cleanup light curve notebook (complete Issue #177) #198)

panstarrs.py

Remove the try/except if possible. Otherwise, move as much code as possible outside the try block. (Cleanup light curve notebook (complete Issue #177) #198)
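
A sketch of what narrowing a try block can look like. The `fetch` callable and the choice of `LookupError` are assumptions made for this example; they are not the actual calls or exceptions in panstarrs.py.

```python
def collect_light_curves(object_ids, fetch):
    """Fetch rows for each object, skipping objects the archive does not have.

    `fetch` is a hypothetical callable standing in for an archive query.
    """
    results = {}
    for object_id in object_ids:
        # Only the call that is expected to fail sits inside the try block.
        try:
            rows = fetch(object_id)
        except LookupError:
            # A specific exception: unexpected errors still surface.
            continue
        # Post-processing stays outside, so its bugs are not swallowed.
        results[object_id] = sorted(rows)
    return results


def fake_fetch(object_id):
    """Stand-in for a remote query; raises for one known-missing object."""
    if object_id == "missing":
        raise LookupError(object_id)
    return [3.0, 1.0, 2.0]


print(collect_light_curves(["a", "missing"], fake_fetch))
```

Keeping the try block this small makes it obvious which failure is being tolerated and prevents unrelated errors in the processing code from being hidden.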

plot_functions.py moved to #199

sample_lc.py

- [ ] Update it to, e.g., use sample_table. (This is a script version of the parallel section of the notebook, meant to be called from command line, but has not been kept up to date.) Alternately, consider making the modules directly callable from the command line and write a .sh script to execute them. This should speed up bulk runs because ZTF could use multiple workers. (moved to #195)
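
One way a module could be made callable from the command line is sketched below with the standard-library `argparse`. The option names (`--sample-file`, `--radius`) and the returned dict are hypothetical; nothing here reflects the modules' actual interfaces.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical per-archive command-line entry point."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI for one archive module.")
    parser.add_argument("--sample-file", required=True, help="Path to a saved sample table.")
    parser.add_argument("--radius", type=float, default=1.0, help="Search radius (units assumed).")
    return parser


def main(argv=None):
    """Parse arguments and return them; a real module would run its query here."""
    args = build_parser().parse_args(argv)
    return {"sample_file": args.sample_file, "radius": args.radius}


# Passing an explicit list makes the entry point easy to exercise without a shell.
print(main(["--sample-file", "sample.ecsv", "--radius", "2.5"]))
```

A module ending with the usual `if __name__ == "__main__": main()` guard could then be launched from a .sh script, one process per archive.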

sample_selection.py

~~- [ ] nonunique_sample function: either remove it or update it to return a Table like clean_sample does.~~ deferred (not used in this notebook)

tde_functions.py

Remove this file. It doesn't appear to be in use. (Cleanup light curve notebook (complete Issue #177) #198)

TESS_Kepler_functions.py

Remove the try/except if possible. Otherwise, move as much code as possible outside the try block and catch a specific error in the except statement. (Cleanup light curve notebook (complete Issue #177) #198)
Iterate over the search results directly instead of over the index numlc. (Cleanup light curve notebook (complete Issue #177) #198)
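
The direct-iteration item can be illustrated with a small sketch. The list of strings below is a hypothetical stand-in for the search results; only the loop shape is the point.

```python
# Hypothetical stand-in for a collection of search results.
search_results = ["lc_sector_1", "lc_sector_2", "lc_sector_3"]


def names_by_index(search_results):
    """Index-based loop, similar in shape to iterating over numlc."""
    names = []
    for numlc in range(len(search_results)):
        names.append(search_results[numlc].upper())
    return names


def names_direct(search_results):
    """Direct iteration: same result without index bookkeeping."""
    return [result.upper() for result in search_results]


print(names_direct(search_results))
```

If the position is still needed (for logging, say), `enumerate(search_results)` provides it without indexing back into the collection.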

ztf_functions.py

Use either variable name ztf_radius or radius, but not both. (Cleanup light curve notebook (complete Issue #177) #198)
Replace manual flux unit conversion with Astropy methods. (Cleanup light curve notebook (complete Issue #177) #198)

troyraen · 2023-12-06T02:10:16Z

@bsipocz what's your input on the Formatting section? What standards should we apply for .py and .md files?

jkrick · 2023-12-06T17:55:00Z

re: sample_lc.py and nonunique_sample inside of sample_selection.py
Neither of these is part of light_curve_generator.md; both belong to ML_AGNzoo.md instead.

What to do with them depends on our goal for the ML notebooks. I think they should start with a pre-made sample of light curves and not need to run light_curve_generator. My reasoning is twofold: 1) the ML notebooks will run faster without first having to generate the light curves, and 2) anyone wanting to use the ML notebooks wouldn't have to understand the extra code that generates the light curves; they could just focus on the code in the ML notebooks themselves.

If that is OK with everyone, then we should remove sample_lc.py (but make sure @xoubish has a copy somewhere, since she needs it to generate the initial sample) and also remove nonunique_sample.

bsipocz · 2023-12-06T19:19:09Z

> what's your input on the Formatting section? What standards should we apply for .py and .md files?

I would say we could add a CI job to check the .py formatting (though I would not enable pre-commit or anything annoying that messes with the commits/branches, only a failing CI job on the PR). But I don't know of anything off the top of my head for .md files; I will have to google that one.

troyraen · 2023-12-06T20:07:50Z

> I would say we could add a CI job to check on the .py formatting

I'm in favor of that, but I'd like to keep the scope of this Issue more narrowly focused on cleaning up just the light curve notebook.

I have specific formatting preferences, especially for python code. While I know that your preferences are different than mine, I don't know exactly what your preferences are. So I'd like our group to come to a consensus before I make any broad changes there. Do you want to discuss specifics to be applied to this notebook? Or do you think we should leave the formatting alone for now?

bsipocz · 2023-12-06T20:13:42Z

> I have specific formatting preferences, especially for python code

Something that is PEP8 compatible. I don't like to be as strict and whitespace heavy as black and would rather just make flake8 or similar pass on the codebase, but I agree, getting a consensus would be nice and I suppose can be done quickly. Shall we put it on the agenda for this afternoon?

troyraen · 2023-12-06T20:14:42Z

@jkrick That's all fine with me. We'll just need to figure out where to store the pre-made sample of light curves. (I know we're using Google Drive for this right now, but want to figure out a more general solution.)

jkrick · 2023-12-06T20:59:39Z

I have added formatting preferences to the agenda for this afternoon, but I would be super happy to just adopt whatever you two think is best. This is definitely your domain.

jkrick · 2023-12-06T21:01:47Z

And I am pretty sure I have a GitLab issue in with the Navteca people about where we should be storing data as a long-term solution instead of Google Drive.

jkrick · 2023-12-16T01:06:46Z

After looking over everything in the directory, I fully agree that MAST_functions.py can be removed. I don't see it used anywhere.

troyraen mentioned this issue Nov 22, 2023

Clean up light curves code #176

Merged

troyraen added the use case: light curves label Nov 22, 2023

troyraen self-assigned this Dec 5, 2023

troyraen mentioned this issue Dec 5, 2023

Bug-fix data cleaning and plotting #181

Merged

jkrick mentioned this issue Dec 6, 2023

warnings on parallel section #174

Closed

This was referenced Jan 5, 2024

Cleanup light curve notebook (complete Issue #177) #198

Merged

Clean up plot_functions.py #199

Closed

troyraen closed this as completed in #198 Jan 11, 2024

troyraen added a commit that referenced this issue Jan 11, 2024

Merge pull request #198 from /issues/177

6491ad0

Cleanup light curve notebook (complete Issue #177)

github-actions bot pushed a commit that referenced this issue Jan 11, 2024

Merge pull request #198 from /issues/177

e9b9af8

Cleanup light curve notebook (complete Issue #177) 6491ad0
