From 922b4ca93c5867f67ed441f2b6cc829ee47a65b7 Mon Sep 17 00:00:00 2001 From: Jessica Date: Thu, 14 Dec 2023 00:47:47 +0000 Subject: [PATCH 1/3] cleaning up text of serial section of main light curve notebook --- light_curves/light_curve_generator.md | 221 ++++++++++++++------------ 1 file changed, 123 insertions(+), 98 deletions(-) diff --git a/light_curves/light_curve_generator.md b/light_curves/light_curve_generator.md index 4e9f7b65..326fe56a 100644 --- a/light_curves/light_curve_generator.md +++ b/light_curves/light_curve_generator.md @@ -11,51 +11,53 @@ kernelspec: name: python3 --- -# Make multiwavelength light curves using archival data +# Make Multiwavelength Light Curves Using Archival Data *** ## Learning Goals -By the end of this tutorial, you will be able to: - - automatically load a catalog of sources - - automatically search NASA and non-NASA resources for light curves - - store light curves in a Pandas multiindex dataframe - - plot all light curves on the same plot +By the end of this tutorial, you will be able to: + • Automatically load a catalog of sources + • Automatically search NASA and non-NASA resources for light curves + • Store light curves in a Pandas multiindex dataframe + • Plot all light curves on the same plot -## Introduction: - - A user has a sample of interesting targets for which they would like to see a plot of available archival light curves. We start with a small set of changing look AGN from Yang et al., 2018, which are automatically downloaded. Changing look AGN are cases where the broad emission lines appear or disappear (and not just that the flux is variable). - - We model light curve plots after van Velzen et al. 2021. We search through a curated list of time-domain NASA holdings as well as non-NASA sources. HEASARC catalogs used are Fermi and Beppo-Sax, IRSA catalogs used are ZTF and WISE, and MAST catalogs used are Pan-Starrs, TESS, Kepler, and K2. Non-NASA sources are Gaia and IceCube. This list is generalized enough to include many types of targets to make this notebook interesting for many types of science. All of these time-domain archives are searched in an automated fashion using astroquery or APIs. - - Light curve data storage is a tricky problem. Currently we are using a multi-index Pandas dataframe, as the best existing choice for right now. One downside is that we need to manually track the units of flux and time instead of relying on an astropy storage scheme which would be able to do some of the units worrying for us (even astropy can't do all magnitude to flux conversions). Astropy does not currently have a good option for multi-band light curve storage. - - We intend to explore a ML classifier for these changing look AGN light curves. +## Introduction: + • A user has a sample of interesting targets for which they would like to see a plot of available archival light curves. We start with a small set of changing look AGN from Yang et al., 2018, which are automatically downloaded. Changing look AGN are cases where the broad emission lines appear or disappear (and not just that the flux is variable). -## Input: - - choose from a list of known changing look AGN from the literature + • We model light curve plots after van Velzen et al. 2021. We search through a curated list of time-domain NASA holdings as well as non-NASA sources. HEASARC catalogs used are Fermi and Beppo-Sax, IRSA catalogs used are ZTF and WISE, and MAST catalogs used are Pan-Starrs, TESS, Kepler, and K2. Non-NASA sources are Gaia and IceCube. 
This list is generalized enough to include many types of targets to make this notebook interesting for many types of science. All of these time-domain archives are searched in an automated fashion using astroquery or APIs. + + • Light curve data storage is a tricky problem. Currently we are using a multi-index Pandas dataframe, as the best existing choice for right now. One downside is that we need to manually track the units of flux and time instead of relying on an astropy storage scheme which would be able to do some of the units worrying for us (even astropy can't do all magnitude to flux conversions). Astropy does not currently have a good option for multi-band light curve storage. + + • We intend to explore a ML classifier for these changing look AGN light curves. - OR - - - input your own sample +## Input: + • choose from a list of known changing look AGN from the literature + OR - + • input your own sample ## Output: - - an archival optical + IR + neutrino light curve + • an archival optical + IR + neutrino light curve ## Non-standard Imports: -- `acstools` to work with HST magnitude to flux conversion -- `astropy` to work with coordinates/units and data structures -- `astroquery` to interface with archives APIs -- `hpgeom` to locate coordinates in HEALPix space -- `lightkurve` to search TESSS, Kepler, and K2 archives -- `pyarrow` to work with Parquet files for WISE and ZTF -- `s3fs` to connect to AWS S3 buckets -- `urllib` to handle archive searches with website interface + • `acstools` to work with HST magnitude to flux conversion + • `astropy` to work with coordinates/units and data structures + • `astroquery` to interface with archives APIs + • `hpgeom` to locate coordinates in HEALPix space + • `lightkurve` to search TESSS, Kepler, and K2 archives + • `pyarrow` to work with Parquet files for WISE and ZTF + • `s3fs` to connect to AWS S3 buckets + • `urllib` to handle archive searches with website interface ## Authors: Jessica Krick, Shoubaneh Hemmati, Andreas Faisst, Troy Raen, Brigitta Sipőcz, Dave Shupe ## Acknowledgements: -Suvi Gezari, Antara Basu-zych,Stephanie LaMassa\ +Suvi Gezari, Antara Basu-zych, Stephanie LaMassa MAST, HEASARC, & IRSA Fornax teams ```{code-cell} ipython3 -#ensure all dependencies are installed +# Ensure all dependencies are installed !pip install -r requirements.txt ``` @@ -89,19 +91,18 @@ from WISE_functions import WISE_get_lightcurves from ztf_functions import ZTF_get_lightcurve ``` -## 1. Define the Sample - We define here a "gold" sample of spectroscopically confirmed changing look AGN and quasars. This sample includes both objects which change from type 1 to type 2 and also the opposite. Future studies may want to treat these as seperate objects or seperate QSOs from AGN. - - Bibcodes for the samples used are listed next to their functions for reference. +## 1. Define the sample +We define here a "gold" sample of spectroscopically confirmed changing look AGN and quasars. This sample includes both objects which change from type 1 to type 2 and also the opposite. Future studies may want to treat these as seperate objects or seperate QSOs from AGN. Bibcodes for the samples used are listed next to their functions for reference. - Functions used to grab the samples from the papers use Astroquery, NED, SIMBAD, Vizier, and in a few cases grab the tables from the html versions of the paper. +Significant work went into the functions which grab the samples from the papers. 
They use Astroquery, NED, SIMBAD, Vizier, and in a few cases grab the tables from the html versions of the paper. There are some tricky aspects to accessing coordinates from tables in the literature. Not every literature table is stored in its entirety in all of these resources, so be sure to check that your chosen method is actually getting the information that you see in the paper table. Warning: You will get false results if using NED or SIMBAD on a table that has more rows than are printed in the journal.

 ```{code-cell} ipython3
-#build up the sample
+# Build up the sample
+# Initially set up lists to hold the coordinates and their reference paper name as a label
 coords =[]
 labels = []

-#choose your own adventure:
+# Choose your own adventure:

 #get_lamassa_sample(coords, labels)  #2015ApJ...800..144L
 #get_macleod16_sample(coords, labels)  #2016MNRAS.457..389M
@@ -114,26 +115,27 @@ labels = []
 #get_hon_sample(coords, labels)  #2022MNRAS.511...54H
 get_yang_sample(coords, labels)  #2018ApJ...862..109Y

-#now get some "normal" QSOs for use in the classifier
-#there are ~500K of these, so choose the number based on
-#a balance between speed of running the light curves and whatever
-#the ML algorithms would like to have
+# Get some "normal" QSOs
+# there are ~500K of these, so choose the number based on
+# a balance between speed of running the light curves and whatever
+# the ML algorithms would like to have
 #num_normal_QSO = 5000
 #get_SDSS_sample(coords, labels, num_normal_QSO)

-# remove duplicates and attach an objectid to the coords
+# Remove duplicates, attach an objectid to the coords,
+# convert to astropy table to keep all relevant info together
 sample_table = clean_sample(coords, labels)
 ```

-### 1.1 Build your own Sample
+### 1.1 Build your own sample

 To build your own sample, you can follow the examples of functions above to grab coordinates from your favorite literature resource, or

 You can use [astropy's read](https://docs.astropy.org/en/stable/io/ascii/read.html) function to read in an input table
-and then convert that table into a list of [skycoords](https://docs.astropy.org/en/stable/api/astropy.coordinates.SkyCoord.html)
+to an [astropy table](https://docs.astropy.org/en/stable/table/)

+++

@@ -141,7 +143,7 @@ and then convert that table into a list of [skycoords](https://docs.astropy.org/

 At this point you may wish to write out your sample to disk and reuse that in future work sessions, instead of creating it from scratch again.

-We would suggest to choose from various formats that fully supports the astropy objects, such as SkyCoord, in the so-called Mixin columns. E.g Enhanced Character-Separated Values or 'ecsv' is one such format: https://docs.astropy.org/en/stable/io/ascii/ecsv.html
+For the format of the save file, we suggest choosing a format that fully supports astropy objects (e.g., SkyCoord). One example that works is Enhanced Character-Separated Values or ['ecsv'](https://docs.astropy.org/en/stable/io/ascii/ecsv.html)

 ```{code-cell} ipython3
 sample_table.write('data/input_sample.ecsv', format='ascii.ecsv', overwrite = True)
 ```

@@ -155,109 +157,131 @@ Do only this step from this section when you have a previously generated sample
 sample_table = Table.read('data/input_sample.ecsv', format='ascii.ecsv')
 ```

-## 2.
Find light curves for these targets in NASA catalogs
- - We search a curated list of time-domain catalogs from all NASA astrophysics archives
+### 1.4 Initialize data structure to hold the light curves

 ```{code-cell} ipython3
-### Initialize Pandas MultiIndex data frame for storing the light curves
+# We wrote our own class for a Pandas MultiIndex [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) for storing the light curves
+# This class helps simplify coding of common uses for the DataFrame.
 df_lc = MultiIndexDFObject()
 ```

+## 2. Find light curves for these targets in NASA catalogs
+We search a curated list of time-domain catalogs from NASA astrophysics archives. Because each archive is different, and in many cases each catalog is different, each function to access a catalog is necessarily specialized to the location and format of that particular catalog.
+
++++
+
 ### 2.1 HEASARC: FERMI & Beppo SAX
+The function to retrieve HEASARC data accesses the HEASARC archive using a pyvo search with a table upload. This is the fastest way to access data from HEASARC catalogs at scale.
+
+While these aren't strictly light curves, we would like to track if there are gamma rays detected in advance of any change in the CLAGN light curves. We store these gamma ray detections as single datapoints. Because gamma ray detections typically have very large error radii, our current technique is to keep matches in the catalogs within some manually selected error radius, currently defaulting to 1 degree for Fermi and 3 degrees for Beppo SAX. These values are chosen based on a histogram of all values for those catalogs.

 ```{code-cell} ipython3
-start_serial = time.time()
+start_serial = time.time()  #keep track of all serial archive calls to compare later with parallel archive call time
+heasarcstarttime = time.time()

-#what is the size of error_radius for the fermi catalog that we will accept for our cross-matching?
-#in degrees; chosen based on histogram of all values for these catalogs
+# What is the size of error_radius for the catalogs that we will accept for our cross-matching?
+# in degrees; chosen based on histogram of all values for these catalogs
 max_fermi_error_radius = str(1.0)
 max_sax_error_radius = str(3.0)

-#list of missions to query and their corresponding error radii
+# List of missions to query and their corresponding error radii
 heasarc_cat = ["FERMIGTRIG", "SAXGRBMGRB"]
 error_radius = [max_fermi_error_radius , max_sax_error_radius]
-
-#go out and find all light curves in the above curated list which match our target positions
+# get heasarc light curves in the above curated list of missions
 df_lc_fermi = HEASARC_get_lightcurves(sample_table, heasarc_cat, error_radius)
+
+#add the resulting dataframe to all other archives
 df_lc.append(df_lc_fermi)
-
+
+print('heasarc search took:', time.time() - heasarcstarttime, 's')
 ```

 ### 2.2 IRSA: ZTF
+The function to retrieve ZTF light curves accesses a parquet version of the ZTF catalog stored in the cloud using pyarrow. This is the fastest way to access the ZTF catalog at scale. The ZTF [API](https://irsa.ipac.caltech.edu/docs/program_interface/ztf_lightcurve_api.html) is available for small sample searches. One unique thing about this function is that it has parallelization built into the function itself.
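For readers who want to see the general pattern that `ZTF_get_lightcurve` relies on, the sketch below shows one way a cloud-hosted Parquet catalog can be read with `pyarrow`, `s3fs`, and `hpgeom`. It is an illustration only, not the implementation in `ztf_functions.py`: the bucket path, partition column, and column names are hypothetical placeholders.

```python
# Illustrative sketch only: the bucket path, partition column, and column names
# below are hypothetical, not the real layout of the ZTF Parquet holdings.
import hpgeom
import pyarrow.dataset as ds
import s3fs

# open the (hypothetical) Parquet dataset directly in S3 without downloading it
s3 = s3fs.S3FileSystem(anon=True)
dataset = ds.dataset("example-bucket/ztf/lightcurves", filesystem=s3,
                     format="parquet", partitioning="hive")

# read only the HEALPix partition containing the target position, and only the needed columns
ra, dec = 187.70593, 12.39112                  # example coordinates in degrees
pixel = hpgeom.angle_to_pixel(32, ra, dec)     # nside=32, nested ordering by default
table = dataset.to_table(filter=(ds.field("healpix") == pixel),
                         columns=["objectid", "mjd", "mag", "magerr", "filtercode"])
lightcurves = table.to_pandas()
```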
```{code-cell} ipython3 +ZTFstarttime = time.time() + +#get ZTF lightcurves # use the nworkers arg to control the amount of parallelization in the data loading step df_lc_ZTF = ZTF_get_lightcurve(sample_table, nworkers=6) #add the resulting dataframe to all other archives df_lc.append(df_lc_ZTF) + +print('ZTF search took:', time.time() - ZTFstarttime, 's') ``` ### 2.3 IRSA: WISE -- use the unWISE light curves catalog which ties together all WISE & NEOWISE 2010 - 2020 epochs. Specifically it combined all observations at a single epoch to achieve deeper mag limits than individual observations alone. -- [Meisner et al., 2023, 2023AJ....165...36M](https://ui.adsabs.harvard.edu/abs/2023AJ....165...36M/abstract) +We use the unWISE light curves catalog ([Meisner et al., 2023](https://ui.adsabs.harvard.edu/abs/2023AJ....165...36M/abstract)) which ties together all WISE & NEOWISE 2010 - 2020 epochs. Specifically it combines all observations at a single epoch to achieve deeper mag limits than individual observations alone. + +The function to retrieve WISE light curves accesses an IRSA generated version of the catalog in parquet format being stored in the AWS cloud [Open Data Repository](https://registry.opendata.aws/collab/nasa/) ```{code-cell} ipython3 -bandlist = ['W1', 'W2'] -WISE_radius = 1.0 * u.arcsec +WISEstarttime = time.time() +bandlist = ['W1', 'W2'] #list of the WISE band names +WISE_radius = 1.0 * u.arcsec +#get WISE light curves df_lc_WISE = WISE_get_lightcurves(sample_table, WISE_radius, bandlist) #add the resulting dataframe to all other archives df_lc.append(df_lc_WISE) + +print('WISE search took:', time.time() - WISEstarttime, 's') ``` ### 2.4 MAST: Pan-STARRS -Query the Pan-STARRS API; based on this [example](https://ps1images.stsci.edu/ps1_dr2_api.html) +The function to retrieve lightcurves from Pan-STARRS currently uses their API; based on this [example](https://ps1images.stsci.edu/ps1_dr2_api.html). This search is not efficient at scale and we expect it to be replaced in the future. ```{code-cell} ipython3 -#Do a panstarrs search -panstarrs_radius = 1.0/3600.0 # search radius = 1 arcsec -df_lc_panstarrs = Panstarrs_get_lightcurves(sample_table, panstarrs_radius) +panstarrsstarttime = time.time() + +panstarrs_search_radius = 1.0/3600.0 # search radius = 1 arcsec +#get panstarrs light curves +df_lc_panstarrs = Panstarrs_get_lightcurves(sample_table, panstarrs_search_radius) #add the resulting dataframe to all other archives df_lc.append(df_lc_panstarrs) -``` - -### 2.5 MAST: Asteroid Terrestrial-impact Last Alert System (ATLAS) - - All-sky stellar reference catalog - - MAST hosts this catalog but there are three barriers to using it - 1. it is unclear if the MAST [holdings]( https://archive.stsci.edu/hlsp/atlas-refcat2#section-a737bc3e-2d56-4827-9ab4-838fbf8d67c1) include the individual epoch photometry and - 2. it is only accessible with casjobs, not through python notebooks. - 3. magnitude range (g, r, i) < 19mag makes it not relevant for this use case - -One path forward if this catalog becomes scientifically interesting is to put in a MAST helpdesk ticket to see if 1) they do have the light curves, and 2) they could switch the catalog to a searchable with python version. 
There are some ways of [accessing casjobs with python]( Date: Fri, 15 Dec 2023 01:04:12 +0000 Subject: [PATCH 2/3] still editing text --- light_curves/code_src/HCV_functions.py | 22 ++++---- light_curves/code_src/README.md | 10 +++- .../code_src/TESS_Kepler_functions.py | 54 +++++++++---------- light_curves/code_src/WISE_functions.py | 3 +- light_curves/code_src/data_structures.py | 2 +- light_curves/code_src/gaia_functions.py | 30 ++++++----- light_curves/code_src/heasarc_functions.py | 12 +++-- light_curves/code_src/icecube_functions.py | 22 ++++---- light_curves/code_src/mast_functions.py | 2 - light_curves/code_src/panstarrs.py | 4 +- light_curves/code_src/ztf_functions.py | 2 +- light_curves/light_curve_generator.md | 54 +++++++++++-------- 12 files changed, 119 insertions(+), 98 deletions(-) diff --git a/light_curves/code_src/HCV_functions.py b/light_curves/code_src/HCV_functions.py index 26be1001..7359d328 100644 --- a/light_curves/code_src/HCV_functions.py +++ b/light_curves/code_src/HCV_functions.py @@ -1,15 +1,18 @@ import pandas as pd import requests from astropy.table import Table - from data_structures import MultiIndexDFObject from fluxconversions import convertACSmagtoflux +# Functions related to the HCV. +# Code partially taken from https://archive.stsci.edu/hst/hsc/help/HCV/HCV_API_demo.html -## Functions related to the HCV. def get_hscapiurl(): - """ - Returns the HSC API Url + """ Return the url to use for the HSC API + + Returns + ------- + the HSC API Url """ @@ -33,10 +36,10 @@ def hcvcone(ra,dec,radius,table="hcvsummary",release="v3",format="csv",magtype=" hcvsummary, hcv, summary, detailed, propermotions, or sourcepositions release: string v3 or v2 - magtype: string - magaper2 or magauto (only applies to summary table) format: string csv, votable, json + magtype: string + magaper2 or magauto (only applies to summary table) columns: list of strings list of column names to include (None means use defaults) baseurl: string @@ -47,7 +50,8 @@ def hcvcone(ra,dec,radius,table="hcvsummary",release="v3",format="csv",magtype=" Returns ------- - search results + search results: Table + """ data = kw.copy() @@ -165,7 +169,7 @@ def cat2url(table="hcvsummary",release="v3",magtype="magaper2",baseurl=get_hscap Returns ------- - string with the base URL for this request + base URL for this request: string """ checklegal_hcv(table,release,magtype) if table == "summary": @@ -227,8 +231,6 @@ def HCV_get_lightcurves(sample_table, radius): df_lc = MultiIndexDFObject() - - for row in sample_table: ra = row['coord'].ra.deg diff --git a/light_curves/code_src/README.md b/light_curves/code_src/README.md index 10277682..5861022c 100644 --- a/light_curves/code_src/README.md +++ b/light_curves/code_src/README.md @@ -1,3 +1,11 @@ # Code - Light Curves -This directory stores code for the light curve use case. +This directory stores code for the light curve use cases including the ML use cases. 
+Notebooks using this directory are: +- light_curve_generator +- lc_classifier +- ML_AGNzoo + +```python + +``` diff --git a/light_curves/code_src/TESS_Kepler_functions.py b/light_curves/code_src/TESS_Kepler_functions.py index 67e3e315..bce1f0ae 100644 --- a/light_curves/code_src/TESS_Kepler_functions.py +++ b/light_curves/code_src/TESS_Kepler_functions.py @@ -24,8 +24,8 @@ def clean_filternames(search_result, numlc): name of the mission without quarter information """ filtername = str(search_result[numlc].mission) - #clean this up a bit so all Kepler quarters etc., get the same filtername - #we don't need to track the individual names for the quarters, just need to know which mission it is + # clean this up a bit so all Kepler quarters etc., get the same filtername + # we don't need to track the individual names for the quarters, just need to know which mission it is if 'Kepler' in filtername: filtername = 'Kepler' if 'TESS' in filtername: @@ -35,7 +35,7 @@ def clean_filternames(search_result, numlc): return(filtername) def TESS_Kepler_get_lightcurves(sample_table, radius): - """Searches TESS, Kepler, and K2 for light curves from a list of input coordinates + """Searches TESS, Kepler, and K2 for light curves from a list of input coordinates. This is the MAIN function Parameters ---------- @@ -52,56 +52,56 @@ def TESS_Kepler_get_lightcurves(sample_table, radius): df_lc = MultiIndexDFObject() - #for all objects + # for all objects for row in tqdm(sample_table): - #for testing, this has 79 light curves between the three missions. - #for ccount in range(1): + # for testing, this has 79 light curves between the three missions. + # for ccount in range(1): # coord = '19:02:43.1 +50:14:28.7' try: - #use lightkurve to search TESS, Kepler and K2 + # use lightkurve to search TESS, Kepler and K2 search_result = lk.search_lightcurve(row["coord"], radius = radius) lab = row["label"] - #figure out what to do with the results + # figure out what to do with the results if len(search_result) >= 1: - #https://docs.lightkurve.org/tutorials/1-getting-started/searching-for-data-products.html + # https://docs.lightkurve.org/tutorials/1-getting-started/searching-for-data-products.html print(row["objectid"], 'got a live one') - #download all of the returned light curves from TESS, Kepler, and K2 + # download all of the returned light curves from TESS, Kepler, and K2 lc_collection = search_result.download_all() - #can't get the whole collection directly into pandas multiindex - #pull out inidividual light curves, convert to uniform units, and put them in pandas + # can't get the whole collection directly into pandas multiindex + # pull out inidividual light curves, convert to uniform units, and put them in pandas for numlc in range(len(search_result)): - lc = lc_collection[numlc] #for testing 0 is Kepler, #69 is TESS + lc = lc_collection[numlc] # for testing 0 is Kepler, #69 is TESS - #convert to Pandas + # convert to Pandas lcdf = lc.to_pandas().reset_index() - #these light curves are too highly sampled for our AGN use case, so reduce their size - #by choosing only to keep every nth sample + # these light curves are too highly sampled for our AGN use case, so reduce their size + # by choosing only to keep every nth sample nsample = 30 - lcdf_small = lcdf[lcdf.index % nsample ==0] #selects every nth row starting with row 0 + lcdf_small = lcdf[lcdf.index % nsample ==0] # selects every nth row starting with row 0 - #convert time to mjd - time_lc = lcdf_small.time #in units of time - 2457000 BTJD days - time_lc= 
time_lc + 2457000 - 2400000.5 #now in MJD days within a few minutes (except for the barycenter correction) + # convert time to mjd + time_lc = lcdf_small.time # in units of time - 2457000 BTJD days + time_lc= time_lc + 2457000 - 2400000.5 # now in MJD days within a few minutes (except for the barycenter correction) - #TESS, Kepler, and K2 report flux in units of electrons/s - #- there is no good way to convert this to anything more useful because the bandpasses are very wide and nonstandard - #- really we don't care about absolute scale, but want to scale the light curve to be on the same plot as other light curves - #- save as electron/s here and scale when plotting + # TESS, Kepler, and K2 report flux in units of electrons/s + # there is no good way to convert this to anything more useful because the bandpasses are very wide and nonstandard + # really we don't care about absolute scale, but want to scale the light curve to be on the same plot as other light curves + # save as electron/s here and scale when plotting flux_lc = lcdf_small.flux #in electron/s fluxerr_lc = lcdf_small.flux_err #in electron/s - #record band name + # record band name filtername = clean_filternames(search_result, numlc) - #put this single object light curves into a pandas multiindex dataframe + # put this single object light curves into a pandas multiindex dataframe # fluxes are in units of electrons/s and will be scaled to fit the other fluxes when plotting dfsingle = pd.DataFrame(dict(flux=flux_lc, err=fluxerr_lc, time=time_lc, objectid=row["objectid"], band=filtername,label=lab)).set_index(["objectid", "label", "band", "time"]) - #then concatenate each individual df together + # then concatenate each individual df together df_lc.append(dfsingle) except: pass diff --git a/light_curves/code_src/WISE_functions.py b/light_curves/code_src/WISE_functions.py index a6679d38..f97c99eb 100644 --- a/light_curves/code_src/WISE_functions.py +++ b/light_curves/code_src/WISE_functions.py @@ -17,7 +17,8 @@ def WISE_get_lightcurves(sample_table, radius=1.0 * u.arcsec, bandlist=["W1", "W2"]): """Loads WISE data by searching the unWISE light curve catalog (Meisner et al., 2023AJ....165...36M). - + This is the MAIN function + Parameters ---------- sample_table : `~astropy.table.Table` diff --git a/light_curves/code_src/data_structures.py b/light_curves/code_src/data_structures.py index 98a052f1..a429fdac 100644 --- a/light_curves/code_src/data_structures.py +++ b/light_curves/code_src/data_structures.py @@ -1,4 +1,4 @@ -#setup to save the light curves in a data structure +#setup to store the light curves in a data structure import pandas as pd from astropy.table import vstack from astropy.timeseries import TimeSeries diff --git a/light_curves/code_src/gaia_functions.py b/light_curves/code_src/gaia_functions.py index 85bd55cf..3690e4fb 100644 --- a/light_curves/code_src/gaia_functions.py +++ b/light_curves/code_src/gaia_functions.py @@ -1,32 +1,34 @@ import time - import numpy as np import pandas as pd from astroquery.gaia import Gaia - from data_structures import MultiIndexDFObject - def Gaia_get_lightcurve(sample_table, search_radius, verbose): ''' Creates a lightcurve Pandas MultiIndex object from Gaia data for a list of coordinates. - This is the MAIN function. + This is the MAIN function. Parameters ---------- sample_table : Astropy Table main source catalog with coordinates, labels, and objectids - verbose : int - How much to talk. 
0 = None, 1 = a little bit , 2 = more, 3 = full search_radius: float(degrees) How far from a sources is ok for a match - + verbose : int + How much to talk. 0 = None, 1 = a little bit , 2 = more, 3 = full + Returns -------- MultiIndexDFObject with Gaia light curve photometry ''' - + # This code is broken into two steps. The first step, `Gaia_retrieve_catalog` retrieves the + # Gaia source ids for the positions of our sample. These come from the "Gaia DR3 source lite catalog". + # However, that catalog only has a single photometry point per object. To get the light curve + # information, we use the function `gaia_retrieve_epoch_photometry` to use the source ids to + # access the "EPOCH_PHOTOMETRY" catalog. + # Retrieve Gaia table with Source IDs ============== gaia_table = Gaia_retrieve_catalog(sample_table , search_radius = search_radius, @@ -40,14 +42,14 @@ def Gaia_get_lightcurve(sample_table, search_radius, verbose): # Extract Light curves =============== # request the EPOCH_PHOTOMETRY from the Gaia DataLink Service - gaia_df = gaia_retrieve_epoch_photometry(gaia_table) + gaia_df = Gaia_retrieve_epoch_photometry(gaia_table) #if the epochal photometry is empty, return an empty dataframe if len(gaia_df) == 0: return MultiIndexDFObject() ## Create light curves ================= - df_lc = gaia_clean_dataframe(gaia_df) + df_lc = Gaia_clean_dataframe(gaia_df) return df_lc @@ -99,7 +101,7 @@ def Gaia_retrieve_catalog(sample_table , search_radius, verbose): return results -def gaia_chunks(lst, n): +def Gaia_chunks(lst, n): """ "Split an input list into multiple chunks of size =< n" @@ -112,7 +114,7 @@ def gaia_chunks(lst, n): for i in range(0, len(lst), n): yield lst[i:i + n] -def gaia_retrieve_epoch_photometry(gaia_table): +def Gaia_retrieve_epoch_photometry(gaia_table): """ Function to retrieve EPOCH_PHOTOMETRY catalog product for Gaia entries using the DataLink. Note that the IDs need to be DR3 source_id and needs to be a list. @@ -135,7 +137,7 @@ def gaia_retrieve_epoch_photometry(gaia_table): # and then send each chunk into the datalink server ids = list(gaia_table["source_id"]) dl_threshold = 5000 # Datalink server threshold - ids_chunks = list(gaia_chunks(ids, dl_threshold)) + ids_chunks = list(Gaia_chunks(ids, dl_threshold)) datalink_all = [] #setup to request the epochal photometry @@ -175,7 +177,7 @@ def gaia_retrieve_epoch_photometry(gaia_table): # clean and transform the data -def gaia_clean_dataframe(gaia_df): +def Gaia_clean_dataframe(gaia_df): """ Clean and transform the EPOCH_PHOTOMETRY dataframe in preparation to add to other light curves diff --git a/light_curves/code_src/heasarc_functions.py b/light_curves/code_src/heasarc_functions.py index 8a029359..44b60b97 100644 --- a/light_curves/code_src/heasarc_functions.py +++ b/light_curves/code_src/heasarc_functions.py @@ -8,11 +8,7 @@ from data_structures import MultiIndexDFObject -#need to know the distribution of error radii for the catalogs of interest -#this will inform the ligh curve query, as we are not interested in -#error radii which are 'too large' so we need a way of defining what that is. -#leaving this code here in case user wants to change the cutoff error radii -#based on their science goals. 
It is not currently used anywhere in the code + def make_hist_error_radii(missioncat): """plots a histogram of error radii from a HEASARC catalog @@ -32,6 +28,12 @@ def make_hist_error_radii(missioncat): results of the heasarc search including name, ra, dec, error_radius """ + # need to know the distribution of error radii for the catalogs of interest + # this will inform the ligh curve query, as we are not interested in + # error radii which are 'too large' so we need a way of defining what that is. + # leaving this code here in case user wants to change the cutoff error radii + # based on their science goals. It is not currently used anywhere in the code + # get the pyvo HEASARC service. heasarc_tap = pyvo.regsearch(servicetype='tap',keywords=['heasarc'])[0] diff --git a/light_curves/code_src/icecube_functions.py b/light_curves/code_src/icecube_functions.py index a447f5dc..be808a67 100644 --- a/light_curves/code_src/icecube_functions.py +++ b/light_curves/code_src/icecube_functions.py @@ -16,8 +16,7 @@ def Icecube_get_lightcurve(sample_table, icecube_select_topN=3, max_search_radius=2*u.deg): ''' - Extracts IceCube Neutrino events for a given source position and saves it into a lightcurve - Pandas MultiIndex object. + Extracts IceCube Neutrino events for a given source position. This is the MAIN function. Parameters @@ -52,26 +51,26 @@ def Icecube_get_lightcurve(sample_table, icecube_select_topN=3, max_search_radiu # create SkyCoord objects from icecube event coordinates icecube_skycoords = SkyCoord(icecube_events["ra"], icecube_events["dec"], unit="deg", frame='icrs') - #here are the skycoords from mysample defined above + # here are the skycoords from mysample defined above mysample_skycoords = sample_table['coord'] - #Match + # Match idx_mysample, idx_icecube, d2d, d3d = icecube_skycoords.search_around_sky(mysample_skycoords, max_search_radius) - #need to filter reponse based on position error circles - #really want d2d to be less than the error circle of icecube = icecube_events["AngErr"] in degrees + # need to filter reponse based on position error circles + # really want d2d to be less than the error circle of icecube = icecube_events["AngErr"] in degrees filter_arr = d2d < icecube_events["AngErr"][idx_icecube] filter_idx_mysample = idx_mysample[filter_arr] filter_idx_icecube = idx_icecube[filter_arr] filter_d2d = d2d[filter_arr] - #keep these matches together with objectid and lebal as new entries in the df. + # keep these matches together with objectid and lebal as new entries in the df. obid_match = sample_table['objectid'][filter_idx_mysample] label_match = sample_table['label'][filter_idx_mysample] time_icecube= icecube_events['mjd'][filter_idx_icecube] flux_icecube = icecube_events['energy_logGeV'][filter_idx_icecube] - #save the icecube info in correct format for the rest of the data + # save the icecube info in correct format for the rest of the data icecube_df = pd.DataFrame({'flux': flux_icecube, 'err': np.zeros(len(obid_match)), 'time': time_icecube, @@ -82,10 +81,10 @@ def Icecube_get_lightcurve(sample_table, icecube_select_topN=3, max_search_radiu # sort by Neutrino energy that way it is easier to get the top N events. 
icecube_df = icecube_df.sort_values(['objectid', 'flux'], ascending=[True, False]) - #now can use a groupby to only keep the top N (by GeV flux) icecube matches for each object + # now can use a groupby to only keep the top N (by GeV flux) icecube matches for each object filter_icecube_df = icecube_df.groupby('objectid').head(icecube_select_topN).reset_index(drop=True) - #put the index in to match with df_lc + # put the index in to match with df_lc filter_icecube_df.set_index(["objectid", "label", "band", "time"], inplace = True) return (MultiIndexDFObject(data=filter_icecube_df)) @@ -93,7 +92,7 @@ def Icecube_get_lightcurve(sample_table, icecube_select_topN=3, max_search_radiu def icecube_get_catalog(path="data", verbose=False): ''' - Creates the combined IceCube catalog based on the yearly catalog. + Creates the combined IceCube catalog based on the yearly catalogs. Parameters ----------- @@ -160,6 +159,7 @@ def icecube_download_data(url="http://icecube.wisc.edu/data-releases/20210126_PS file_path = os.path.join(path, os.path.basename(urlparse(url).path)) + # if the data has not already been downloaded if not os.path.exists(file_path): # Download diff --git a/light_curves/code_src/mast_functions.py b/light_curves/code_src/mast_functions.py index 4d65303a..98d59f99 100644 --- a/light_curves/code_src/mast_functions.py +++ b/light_curves/code_src/mast_functions.py @@ -1,9 +1,7 @@ ## Functions related to MAST queries. import json - import requests - def resolve(name): """Get the RA and Dec for an object using the MAST name resolver diff --git a/light_curves/code_src/panstarrs.py b/light_curves/code_src/panstarrs.py index ffa42f78..ba86a760 100644 --- a/light_curves/code_src/panstarrs.py +++ b/light_curves/code_src/panstarrs.py @@ -7,7 +7,7 @@ from data_structures import MultiIndexDFObject - +# code partially taken from https://ps1images.stsci.edu/ps1_dr2_api.html def ps1cone(ra,dec,radius,table="mean",release="dr1",format="csv",columns=None, baseurl="https://catalogs.mast.stsci.edu/api/v0.1/panstarrs", verbose=False, **kw): @@ -231,7 +231,7 @@ def search_lightcurve(objid): #Do a panstarrs search def Panstarrs_get_lightcurves(sample_table, radius): - """Searches panstarrs for light curves from a list of input coordinates + """Searches panstarrs for light curves from a list of input coordinates. This is the MAIN function. Parameters ---------- diff --git a/light_curves/code_src/ztf_functions.py b/light_curves/code_src/ztf_functions.py index e0677d5c..3ca01ed5 100644 --- a/light_curves/code_src/ztf_functions.py +++ b/light_curves/code_src/ztf_functions.py @@ -25,7 +25,7 @@ def ZTF_get_lightcurve(sample_table, nworkers=6, ztf_radius=0.000278 * u.deg): - """Function to add the ZTF lightcurves in all three bands to a multiframe data structure + """Function to add the ZTF lightcurves in all three bands to a multiframe data structure. This is the MAIN function. 
 Parameters
 ----------

diff --git a/light_curves/light_curve_generator.md b/light_curves/light_curve_generator.md
index 326fe56a..b2fefdd8 100644
--- a/light_curves/light_curve_generator.md
+++ b/light_curves/light_curve_generator.md
@@ -39,14 +39,22 @@ By the end of this tutorial, you will be able to:
 ## Output:
  • an archival optical + IR + neutrino light curve

-## Non-standard Imports:
+## Imports:
  • `acstools` to work with HST magnitude to flux conversion
  • `astropy` to work with coordinates/units and data structures
  • `astroquery` to interface with archives APIs
  • `hpgeom` to locate coordinates in HEALPix space
  • `lightkurve` to search TESSS, Kepler, and K2 archives
+ • `matplotlib` for plotting
+ • `multiprocessing` to use the power of multiple CPUs to get work done faster
+ • `numpy` for numerical processing
+ • `pandas` for its DataFrame data structure and the accompanying functions
  • `pyarrow` to work with Parquet files for WISE and ZTF
+ • `pyvo` for accessing Virtual Observatory (VO) standard data
+ • `requests` to get information from URLs
  • `s3fs` to connect to AWS S3 buckets
+ • `scipy` to do statistics
+ • `tqdm` to track progress on long running jobs
  • `urllib` to handle archive searches with website interface

 ## Authors:
@@ -191,7 +199,7 @@ error_radius = [max_fermi_error_radius , max_sax_error_radius]
 # get heasarc light curves in the above curated list of missions
 df_lc_fermi = HEASARC_get_lightcurves(sample_table, heasarc_cat, error_radius)

-#add the resulting dataframe to all other archives
+# add the resulting dataframe to all other archives
 df_lc.append(df_lc_fermi)

 print('heasarc search took:', time.time() - heasarcstarttime, 's')
@@ -203,11 +211,11 @@ The function to retrieve ZTF light curves accesses a parquet version of the ZTF
 ```{code-cell} ipython3
 ZTFstarttime = time.time()

-#get ZTF lightcurves
+# get ZTF lightcurves
 # use the nworkers arg to control the amount of parallelization in the data loading step
 df_lc_ZTF = ZTF_get_lightcurve(sample_table, nworkers=6)

-#add the resulting dataframe to all other archives
+# add the resulting dataframe to all other archives
 df_lc.append(df_lc_ZTF)

 print('ZTF search took:', time.time() - ZTFstarttime, 's')
@@ -224,42 +232,42 @@ WISEstarttime = time.time()
 bandlist = ['W1', 'W2']  #list of the WISE band names
 WISE_radius = 1.0 * u.arcsec

-#get WISE light curves
+# get WISE light curves
 df_lc_WISE = WISE_get_lightcurves(sample_table, WISE_radius, bandlist)

-#add the resulting dataframe to all other archives
+# add the resulting dataframe to all other archives
 df_lc.append(df_lc_WISE)

 print('WISE search took:', time.time() - WISEstarttime, 's')
 ```

 ### 2.4 MAST: Pan-STARRS
-The function to retrieve lightcurves from Pan-STARRS currently uses their API; based on this [example](https://ps1images.stsci.edu/ps1_dr2_api.html). This search is not efficient at scale and we expect it to be replaced in the future.
+The function to retrieve lightcurves from Pan-STARRS currently uses their API; based on this [example](https://ps1images.stsci.edu/ps1_dr2_api.html). This search is not efficient at scale and we expect it to be replaced in the future.
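For orientation, the sketch below shows roughly the kind of single cone-search request that the `ps1cone()` helper in `panstarrs.py` builds against the `catalogs.mast.stsci.edu` endpoint (following the linked example). The coordinates and the detection filter are illustrative values, not part of the notebook workflow.

```python
import requests

# Illustrative single Pan-STARRS DR2 cone search; the coordinates and the
# nDetections filter are example values, not taken from the notebook.
baseurl = "https://catalogs.mast.stsci.edu/api/v0.1/panstarrs"
params = {
    "ra": 187.70593,          # degrees
    "dec": 12.39112,          # degrees
    "radius": 1.0 / 3600.0,   # 1 arcsec search radius, in degrees
    "nDetections.gt": 1,      # require more than one detection
}
response = requests.get(f"{baseurl}/dr2/mean.csv", params=params)
response.raise_for_status()
print(response.text[:500])   # CSV table of mean photometry for matching objects
```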
```{code-cell} ipython3 panstarrsstarttime = time.time() panstarrs_search_radius = 1.0/3600.0 # search radius = 1 arcsec -#get panstarrs light curves +# get panstarrs light curves df_lc_panstarrs = Panstarrs_get_lightcurves(sample_table, panstarrs_search_radius) -#add the resulting dataframe to all other archives +# add the resulting dataframe to all other archives df_lc.append(df_lc_panstarrs) print('Panstarrs search took:', time.time() - panstarrsstarttime, 's') ``` ### 2.5 MAST: TESS, Kepler and K2 -The function to retrieve lightcurves from these three missions currently uses the open source package [`lightKurve`](https://docs.lightkurve.org/index.html). This search is not efficient at scale and we expect it to be replaced in the future. +The function to retrieve lightcurves from these three missions currently uses the open source package [`lightKurve`](https://docs.lightkurve.org/index.html). This search is not efficient at scale and we expect it to be replaced in the future. ```{code-cell} ipython3 lightkurvestarttime = time.time() TESS_search_radius = 1.0 #arcseconds -#get TESS/Kepler/K2 light curves +# get TESS/Kepler/K2 light curves df_lc_TESS = TESS_Kepler_get_lightcurves(sample_table, TESS_search_radius) -#add the resulting dataframe to all other archives +# add the resulting dataframe to all other archives df_lc.append(df_lc_TESS) print('TESS/Kepler/K2 search took:', time.time() - lightkurvestarttime, 's') @@ -275,10 +283,10 @@ The function to retrieve lightcurves from HCV currently uses their API; based on HCVstarttime = time.time() HCV_radius = 1.0/3600.0 # radius = 1 arcsec -#get HCV light curves +# get HCV light curves df_lc_HCV = HCV_get_lightcurves(sample_table, HCV_radius) -#add the resulting dataframe to all other archives +# add the resulting dataframe to all other archives df_lc.append(df_lc_HCV) print('HCV search took:', time.time() - HCVstarttime, 's') @@ -294,10 +302,10 @@ The function to retrieve Gaia light curves accesses the Gaia DR3 "source lite" c ```{code-cell} ipython3 gaiastarttime = time.time() -#get Gaia light curves +# get Gaia light curves df_lc_gaia = Gaia_get_lightcurve(sample_table, 1/3600., 0) -#add the resulting dataframe to all other archives +# add the resulting dataframe to all other archives df_lc.append(df_lc_gaia) print('gaia search took:', time.time() - gaiastarttime, 's') @@ -314,11 +322,11 @@ This time series (time vs. neutrino energy) information is similar to photometry ```{code-cell} ipython3 icecubestarttime = time.time() -#get icecube datapoints +# get icecube datapoints df_lc_icecube = Icecube_get_lightcurve(sample_table , icecube_select_topN = 3) -#add the resulting dataframe to all other archives +# add the resulting dataframe to all other archives df_lc.append(df_lc_icecube) print('icecube search took:', time.time() - icecubestarttime, 's') @@ -326,7 +334,7 @@ end_serial = time.time() ``` ```{code-cell} ipython3 -#benchmarking +# benchmarking print('total time for serial archive calls is ', end_serial - start_serial, 's') ``` @@ -421,7 +429,7 @@ parallel_df_lc.data ``` ```{code-cell} ipython3 -# could load a previously saved file in order to plot +# Could load a previously saved file in order to plot #parquet_loadname = 'output/df_lc_090723_yang.parquet' #parallel_df_lc = MultiIndexDFObject() #parallel_df_lc.data = pd.read_parquet(parquet_loadname) @@ -429,15 +437,15 @@ parallel_df_lc.data ``` ## 5. Make plots of luminosity as a function of time -Model plots after [van Velzen et al., 2021](https://arxiv.org/pdf/2111.09391.pdf). 
+These plots are modelled after [van Velzen et al., 2021](https://arxiv.org/pdf/2111.09391.pdf). We show flux in mJy as a function of time for all available bands for each object. `show_nbr_figures` controls how many plots are actually generated and returned to the screen. If you choose to save the plots with `save_output`, they will be put in the output directory and labelled by sample number.

 __Note__ that in the following, we can either plot the results from `df_lc` (from the serial call) or `parallel_df_lc` (from the parallel call). By default (see next cell) the output of the parallel call is used.

 ```{code-cell} ipython3
 _ = create_figures(sample_table ,
                    df_lc = parallel_df_lc,  # either df_lc (serial call) or parallel_df_lc (parallel call)
-                   show_nbr_figures = 5,
-                   save_output = True ,
+                   show_nbr_figures = 5,  # how many plots do you actually want to see?
+                   save_output = True ,  # should the resulting plots be saved?
                   )
 ```

From d5839e95aa9197940c4ef33022bbc56a065cb458 Mon Sep 17 00:00:00 2001
From: Jessica Krick
Date: Sat, 16 Dec 2023 01:05:24 +0000
Subject: [PATCH 3/3] removing outputs and updating README

---
 README.md                             | 17 +++--------------
 light_curves/light_curve_generator.md | 17 ++++++++---------
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index d9e59697..53c47457 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,9 @@
 # fornax-demo-notebooks
-Demo notebooks for the Fornax project
+Tutorial notebooks of fully worked science use cases for the Fornax project

 ### Executive Summary
-HEASARC, IRSA, and MAST jointly propose an FY22 project to demonstrate how NASA
-Astrophysics mission data can be accessed from the cloud or on premises through a science
-platform environment. We will center this demonstration on a limited set of data that will be
-held in the NASA cloud, the AURA cloud, and on premises. We will build a suite of
-containerized software, Jupyter notebooks, and Python libraries that will allow users to carry
-out forced photometry on multiple NASA data sets, motivated by important science use
-cases for mining survey data for extragalactic science and cosmology. This suite of data
-access and analysis tools will be designed to be used in any of a number of science
-platforms that are available or in development across the world. We will showcase its use in
-at least two notebook environments, one of which will be cloud-hosted. We describe a simple
-management structure for coordinating this work across all three archives and NASA. Finally,
-we will use these experiences in further consultation with NASA to create an FY23 plan for
-building an operational science platform within the NASA Cloud.
+The Fornax Initiative is a NASA Astrophysics Archives project in which the three archives HEASARC, IRSA, and MAST collaboratively create cloud systems, cloud software, and cloud standards for the astronomical community.
+The Fornax Science Console is a cloud compute system near NASA data in the cloud which provides a place where astronomers can do data-intensive research with reduced barriers. The Fornax Initiative provides increased compute, increased memory, increased ease of use by pre-installing astronomical software, increased reproducibility of big data results, and increased inclusion by removing some of these barriers to entry, along with tutorial notebooks and documentation. This repo houses those tutorial notebooks of fully worked science use cases for all users.
Common goals of the use cases are archival data from all NASA archives, cross-archive work, big data, and computationally intensive science. ### Content contributing diff --git a/light_curves/light_curve_generator.md b/light_curves/light_curve_generator.md index b2fefdd8..4a43c186 100644 --- a/light_curves/light_curve_generator.md +++ b/light_curves/light_curve_generator.md @@ -4,11 +4,11 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.15.2 + jupytext_version: 1.16.0 kernelspec: - display_name: Python 3 (ipykernel) + display_name: science_demo language: python - name: python3 + name: conda-env-science_demo-py --- # Make Multiwavelength Light Curves Using Archival Data @@ -17,19 +17,19 @@ kernelspec: ## Learning Goals By the end of this tutorial, you will be able to: • Automatically load a catalog of sources - • Automatically search NASA and non-NASA resources for light curves - • Store light curves in a Pandas multiindex dataframe + • Automatically & efficiently search NASA and non-NASA resources for light curves at scale + • Store & manipulate light curves in a Pandas multiindex dataframe • Plot all light curves on the same plot ## Introduction: • A user has a sample of interesting targets for which they would like to see a plot of available archival light curves. We start with a small set of changing look AGN from Yang et al., 2018, which are automatically downloaded. Changing look AGN are cases where the broad emission lines appear or disappear (and not just that the flux is variable). - • We model light curve plots after van Velzen et al. 2021. We search through a curated list of time-domain NASA holdings as well as non-NASA sources. HEASARC catalogs used are Fermi and Beppo-Sax, IRSA catalogs used are ZTF and WISE, and MAST catalogs used are Pan-Starrs, TESS, Kepler, and K2. Non-NASA sources are Gaia and IceCube. This list is generalized enough to include many types of targets to make this notebook interesting for many types of science. All of these time-domain archives are searched in an automated fashion using astroquery or APIs. + • We model light curve plots after van Velzen et al. 2021. We search through a curated list of time-domain NASA holdings as well as non-NASA sources. HEASARC catalogs used are Fermi and Beppo-Sax, IRSA catalogs used are ZTF and WISE, and MAST catalogs used are Pan-Starrs, TESS, Kepler, and K2. Non-NASA sources are Gaia and IceCube. This list is generalized enough to include many types of targets to make this notebook interesting for many types of science. All of these time-domain archives are searched in an automated and efficient fashion using astroquery, pyvo, pyarrow or APIs. • Light curve data storage is a tricky problem. Currently we are using a multi-index Pandas dataframe, as the best existing choice for right now. One downside is that we need to manually track the units of flux and time instead of relying on an astropy storage scheme which would be able to do some of the units worrying for us (even astropy can't do all magnitude to flux conversions). Astropy does not currently have a good option for multi-band light curve storage. - • We intend to explore a ML classifier for these changing look AGN light curves. + • ML work using these time-series light curves is in two neighboring notebooks: ML_AGNzoo and lc_classifier. 
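To make the storage scheme described in the Introduction concrete, here is a tiny stand-in for the light-curve container used throughout the notebook: a Pandas DataFrame indexed by (objectid, label, band, time) with flux and err columns. The values are invented purely for illustration.

```python
import pandas as pd

# Toy light-curve table with the same MultiIndex layout the notebook uses;
# the flux/err/time values and labels are made up for illustration.
dfsingle = pd.DataFrame(dict(flux=[0.82, 0.75, 1.10],              # e.g., mJy
                             err=[0.05, 0.04, 0.08],
                             time=[58000.0, 58030.0, 58015.0],     # MJD
                             objectid=[1, 1, 1],
                             band=["W1", "W1", "zg"],
                             label=["Yang 18", "Yang 18", "Yang 18"])
                        ).set_index(["objectid", "label", "band", "time"])

# partial indexing on the MultiIndex: all rows for object 1, then just one band
print(dfsingle.loc[1])
print(dfsingle.xs("W1", level="band"))
```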
## Input: • choose from a list of known changing look AGN from the literature @@ -454,11 +454,10 @@ _ = create_figures(sample_table , This work made use of: • Astroquery; Ginsburg et al., 2019, 2019AJ....157...98G -• Astropy; Astropy Collaboration 2022, Astropy Collaboration 2018, Astropy Collaboration 2013, 2022ApJ...935..167A, 2018AJ....156..123A, 2013A&A...558A..33A +• Astropy; Astropy Collaboration 2022, Astropy Collaboration 2018, Astropy Collaboration 2013, 2022ApJ...935..167A, 2018AJ....156..123A, 2013A&A...558A..33A • Lightkurve; Lightkurve Collaboration 2018, 2018ascl.soft12013L • acstools; https://zenodo.org/record/7406933#.ZBH1HS-B0eY • unWISE light curves; Meisner et al., 2023, 2023AJ....165...36M -• Alerce; Forster et al., 2021, 2021AJ....161..242F ```{code-cell} ipython3