Merge pull request #198 from /issues/177

Cleanup light curve notebook (complete Issue #177) 6491ad0
nasa-fornax · Jan 11, 2024 · e9b9af8
1 parent 42539d2
commit e9b9af8

Showing 9 changed files with 255 additions and 261 deletions.
diff --git a/.doctrees/environment.pickle b/.doctrees/environment.pickle
diff --git a/.doctrees/light_curves/light_curve_generator.doctree b/.doctrees/light_curves/light_curve_generator.doctree
diff --git a/_sources/forced_photometry/multiband_photometry.ipynb b/_sources/forced_photometry/multiband_photometry.ipynb
diff --git a/_sources/light_curves/ML_AGNzoo.ipynb b/_sources/light_curves/ML_AGNzoo.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "67ecc6b8",
+   "id": "0e4da272",
    "metadata": {},
    "source": [
     "# How do AGNs selected with different techniques compare? \n",
@@ -17,7 +17,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "34011258",
+   "id": "449054d8",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -68,7 +68,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "29c88b1c",
+   "id": "abc40524",
    "metadata": {},
    "source": [
     "Here we load a parquet file of light curves generated using the multiband_lc notebook. One can build the sample from different sources in the literature and grab the data from archives of interest."
@@ -77,7 +77,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2882f5dc",
+   "id": "6d85ad99",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -88,7 +88,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "38e38d14",
+   "id": "ec2e1d57",
    "metadata": {},
    "source": [
     "## What is in this sample?\n",
@@ -99,7 +99,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e67b1eb1",
+   "id": "572a967c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -134,7 +134,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "fb39af35",
+   "id": "1729f977",
    "metadata": {},
    "source": [
     "In this particular example, the largest three subsamples are AGNs selected from [gamma ray observations by the Fermi Large Area Telescope](https://ui.adsabs.harvard.edu/abs/2015yCat..18100014A/similar) (with more than 98% blazars), AGNs from the optical spectra by the [SDSS quasar sample DR16Q](https://www.sdss4.org/dr17/algorithms/qso_catalog/) with a redshift criterion (z<2), and a subset of AGNs selected in MIR WISE bands based on their variability ([csv in data folder credit RChary](https://ui.adsabs.harvard.edu/abs/2019AAS...23333004P/abstract)). We also include some smaller samples from the literature to see where they sit compared to the rest of the population and whether they are localized on the 2D projection. These include the Changing Look AGNs from the literature (e.g., [LaMassa et al. 2015](https://ui.adsabs.harvard.edu/abs/2015ApJ...800..144L/abstract), [Lyu et al. 2022](https://ui.adsabs.harvard.edu/abs/2022ApJ...927..227L/abstract), [Hon et al. 2022](https://ui.adsabs.harvard.edu/abs/2022MNRAS.511...54H/abstract)), a sample which showed variability in GALEX UV images ([Wasleske et al. 2022](https://ui.adsabs.harvard.edu/abs/2022ApJ...933...37W/abstract)), a sample of variable sources identified in optical Palomar observations ([Baldassare et al. 2020](https://ui.adsabs.harvard.edu/abs/2020ApJ...896...10B/abstract)), and the optically variable AGNs in the COSMOS field from a three-year program on the VLT ([De Cicco et al. 2019](https://ui.adsabs.harvard.edu/abs/2019A%26A...627A..33D/abstract)). We also include 30 Tidal Disruption Event coordinates identified from ZTF light curves ([Hammerstein et al. 2023](https://iopscience.iop.org/article/10.3847/1538-4357/aca283/meta))."
@@ -143,7 +143,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "66e7e3ef",
+   "id": "7c722f45",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -162,7 +162,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7610afce",
+   "id": "045cb6b6",
    "metadata": {},
    "source": [
     "The histogram shows the number of lightcurves which ended up in the multi-index data frame from each of the archive calls in different wavebands/filters. We note that the IceCube peak should be corrected, as it also includes non-detections in the figure above."
@@ -171,7 +171,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d7803ac8",
+   "id": "a215a464",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -206,15 +206,15 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d4c7a62c",
+   "id": "069cc8b0",
    "metadata": {},
    "source": [
     "While the histogram plot shows which bands have the highest number of observed lightcurves, what might matter more in finding/selecting variability or changing look in lightcurves is the cadence and the average baseline of observations. For instance, Panstarrs has a large number of lightcurve detections in our sample, but from the figure above we see that the average number of visits and the baseline for those observations are considerably less than for ZTF. WISE also shows the longest baseline of observations, which is suitable for finding longer term variability in objects."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "21d86498",
+   "id": "ff07b410",
    "metadata": {},
    "source": [
     "## Looking at ZTF lightcurves alone\n",
@@ -225,7 +225,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1b8693a8",
+   "id": "d1c39a42",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -259,7 +259,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "391aca63",
+   "id": "c746161c",
    "metadata": {},
    "source": [
     "The combination of the three bands into one longer array, in order of increasing wavelength, can be seen as providing both the SED shape and the variability in each band from the light curve. The figure below demonstrates this as well as our normalization choice. We normalize the data in ZTF R band as it has a higher average number of visits compared to G and I band. We remove outliers before measuring the mean and max of the light curve and normalizing by it. This normalization can be skipped if one is merely interested in comparing the brightnesses of the data in this sample, but because the dependence on flux is strong, a normalization helps when looking for variability and comparing the shapes of light curves."
@@ -268,7 +268,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "03f48817",
+   "id": "0b16923b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -306,7 +306,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "eff9b72f",
+   "id": "9e549fbc",
    "metadata": {},
    "source": [
     "Now we can train a UMAP with the processed data vectors above. Different choices for the number of neighbors, minimum distance and metric can be made and a parameter space can be explored. We show here our preferred combination given this data. We choose manhattan distance (also called [the L1 distance](https://en.wikipedia.org/wiki/Taxicab_geometry)) as it is optimal for the kind of grid we interpolated on; for instance, we want the distance to not change if there are observations missing. Another metric appropriate for our purpose in time domain analysis is Dynamic Time Warping ([DTW](https://en.wikipedia.org/wiki/Dynamic_time_warping)), which is insensitive to a shift in time. This is helpful as we interpolate the observations onto a grid starting from time 0, and when discussing variability we care less about when it happens and more about whether it happened and how strong it was. As measuring the DTW distance takes longer than the other metrics, we show examples here with manhattan and only show one example exploring the parameter space including a DTW metric in the last cell of this notebook."
@@ -315,7 +315,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "89a122f7",
+   "id": "8ac38d5b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -351,7 +351,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "bb3ba587",
+   "id": "5b432d1d",
    "metadata": {},
    "source": [
     "The left panel is color coded by the origin of the sample. The middle panel shows the sum of mean brightnesses in three bands (arbitrary unit), demonstrating that after normalization we see no correlation with brightness. The panel on the right is color coded by a statistical measure of variability (i.e., the fractional variation, [see here](https://ned.ipac.caltech.edu/level5/Sept01/Peterson2/Peter2_1.html)). Because it is not easy to see all the data points and correlations in the plots above, in the next two cells we measure the probability of belonging to each original sample as well as the mean statistical property on an interpolated grid on this reduced 2D projected surface."
@@ -360,7 +360,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "796ebb86",
+   "id": "99f5c14c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -405,7 +405,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bd8b4bb5",
+   "id": "f4327539",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -440,7 +440,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2933b348",
+   "id": "dbd1e9f5",
    "metadata": {},
    "source": [
     "The figure above shows how, with ZTF light curves alone, we can separate some of these AGN samples and see where they overlap. We can do a similar exercise with other dimensionality reduction techniques. Below we show two SOMs, one with normalized light curves and another with no normalization. The advantage of UMAPs over SOMs is that in practice you may change the parameters to separate classes of vastly different data points, as distance is preserved on a UMAP. On a SOM, however, only the topology of higher dimensions is preserved and not distance; hence, the change on the 2D grid does not need to be smooth, and from one cell to the next there might be large jumps. On the other hand, an advantage of the SOM is that by definition it has a grid, so no posterior interpolation (as we did above) is needed to map more data or to measure probabilities, etc."
@@ -449,7 +449,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "be397cb5",
+   "id": "31622d65",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -462,7 +462,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8adb248d",
+   "id": "4356cdc1",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -561,7 +561,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "64cc4dfa",
+   "id": "5f3a4c44",
    "metadata": {},
    "source": [
     "The above SOMs are colored by the mean fractional variation of the lightcurves in all bands (a measure of AGN variability). The crosses are different samples mapped to the trained SOM to see if they are distinguishable on a normalized lightcurve SOM."
@@ -570,7 +570,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "fdb65d5e",
+   "id": "bbef7fb2",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -590,7 +590,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0e661978",
+   "id": "92d0413c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -689,15 +689,15 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0b4307c4",
+   "id": "a107c02e",
    "metadata": {},
    "source": [
     "Skipping the normalization of lightcurves can show, for example, how the De Cicco et al. 2019 sample, from the three-year VLT observations of the COSMOS field, is fainter than the rest."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "acf58875",
+   "id": "1ff72531",
    "metadata": {},
    "source": [
     "# Repeating the above, this time with Panstarrs observations"
@@ -706,7 +706,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2a70f4e2",
+   "id": "90652dd9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -741,7 +741,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "eb3ad6a1",
+   "id": "01351fe6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -777,7 +777,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bba46630",
+   "id": "8d9df081",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -812,7 +812,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "99139bd8",
+   "id": "510f8947",
    "metadata": {},
    "source": [
     "# ZTF + WISE"
@@ -821,7 +821,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4deaf085",
+   "id": "0de41162",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -846,7 +846,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "897e8e71",
+   "id": "dd639f8b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -881,7 +881,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c662c895",
+   "id": "0d0de1a6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -916,7 +916,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7c950e07",
+   "id": "3b41e410",
    "metadata": {},
    "source": [
     "# WISE alone"
@@ -925,7 +925,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "09681aee",
+   "id": "e693e1ca",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -950,7 +950,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7b309cc4",
+   "id": "37071111",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -986,7 +986,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "fd44d81c",
+   "id": "856c1db8",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1023,7 +1023,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6ecbdd76",
+   "id": "871fd64e",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1072,7 +1072,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ac6afa6b",
+   "id": "9ad75e25",
    "metadata": {},
    "outputs": [],
    "source": []