diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..2eec418 --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +# Jupyter Notebook and Sphinx renderings +.ipynb_checkpoints +*.ipynb +_build/ diff --git a/README.md b/README.md index baa0eac..420d5c0 100644 --- a/README.md +++ b/README.md @@ -2,19 +2,19 @@ ## What is the Fornax Science Console? -The Fornax Science Console is a NASA-funded web-based application that provides access to a limited amount of cloud computing via JupyterLab, which offers access to Jupyter Notebooks, Jupyter Console, and the terminal (command line). Users will need to register to login to the system, but usage is free. Once logged in, users will have access to data sets curated by repositories around the world, and can upload moderate amounts of private data. To get started quickly, users can choose from a variety of example Jupyter notebooks as well as pre-installed software environments. These can be modified to suit user needs. +The Fornax Science Console is a NASA-funded web-based application that provides access to a limited amount of cloud computing via JupyterLab, which offers access to Jupyter Notebooks, Jupyter Console, and the terminal (command line). Users will need to register to log in to the system, but usage is free. Once logged in, users will have access to data sets curated by repositories around the world, and can upload moderate amounts of private data. To get started quickly, users can choose from a variety of example Jupyter notebooks as well as pre-installed software environments. These can be modified to suit user needs. The Fornax Science Console supports many astronomical use cases, but users will find it especially beneficial for analyses * on large cloud-hosted data sets that would be cumbersome to download; * that require complicated software pre-installed on the platform; or * that are parallelizable and require more compute than they currently have access to. 
- + ### Fornax Science Console basic capabilities * CPUs: Upon logging in, users will have access to 4 CPUs provided by AWS. These are useful for smaller analyses and to test out larger analyses. Once a user has completed testing and is ready to scale up an analysis, they can request up to 128 CPUs. * RAM: Upon logging in, users will have access to up to 16 GB of RAM. Up to 512 GB of RAM are available upon request. * User Storage: Upon logging in, users will have access to 10 GB of storage; additional storage is available upon request. * GPUs: There are currently no GPUs available. - + ### Data access within the Fornax Science Console Users of the Fornax Science Console will have access to data curated and published by data repositories around the world. @@ -31,13 +31,13 @@ Under construction: How can users get a list of pre-installed software without l ### 1. Get an account on the Fornax Science Console The platform is currently available by invitation only. - + ### 2. Log into the Fornax Science Console Once you have your login credentials, enter them at: https://daskhub.fornaxdev.mysmce.com/ - + * Choose a software container image: You will be given the option of choosing from a menu of software container images. Currently we offer to two images: * Astrophysics Default Image (most users should choose this) * High Energy Astrophysics Image @@ -48,21 +48,21 @@ https://daskhub.fornaxdev.mysmce.com/ ### 3. Start a new notebook. * Click on the the blue `+` in the upper left of the Jupyterlab window to open the launcher, which will give you the option to open a Notbook or a Terminal. Choose the "science_demo" kernal under "Notebook". This will open a new notebook that you can start coding in and run on the platform. - + ![new launcher](./static/images/new_launcher.png) ### 4. End your JupyterHub session. * Before logging out, please shut down your server. 
This is an important step which insures the server you are using doesn't keep running in the background, thereby wasting resources. * Go to the `File` Menu and click on `hub control panel` as in the below image, which will bring up the option to `stop my server`(in red). After stopping the server, please `logout` in the upper right of the JupyterHub window. - + ![ ](./static/images/hub_control_panel.png) - + ## Navigating JupyterLab in the Fornax Science Console The JupyterLab User Guide provides information on how to navigate the interface: * https://jupyterlab.readthedocs.io/en/stable/user/interface.html -The Fornax Science Platform additionally contains a JupyterLab extension called BXPlorer, which provides a UI to manage the cloud-hosted data in the Fornax Science Platform: +The Fornax Science Platform additionally contains a JupyterLab extension called BXPlorer, which provides a UI to manage the cloud-hosted data in the Fornax Science Platform: * https://github.com/Navteca/jupyterlab-bxplorer ## Starting & Monitoring Analyses @@ -75,7 +75,7 @@ The Fornax Science Platform additionally contains a JupyterLab extension called * `free -h` will give the amount of RAM available/used * `cat /proc/meminfo` will give more detailed info on the amount of RAM available/used * `top` gives info on both CPU and RAM usage. Some numerical packages (e.g. numpy) use multithreading, so you may see that the CPU usage is more 100%. That means more than one CPU is used. You can see the individual CPU usage by pressing 1 while the `top` command is running. - + Under construction: It appears that sometimes we are allowed to use more CPU than listed for a short amount of time. Is this true? and what are the parameters of when and for what sizes that will be allowed? ### How can I tell if I am close to using up my allocation of compute and storage resources? @@ -83,10 +83,10 @@ Under construction: It appears that sometimes we are allowed to use more CPU tha Under construction. 
### How will my analysis be affected by memory limitations? - + If your workload exceeds your server size, your server may be allowed to use additional resources temporarily. This can be convenient but should not be relied on. In particular, be aware that your job may be killed automatically and without warning if its RAM needs exceed the alloted memory. This behavior is not specific to Fornax or AWS, but users may encounter it more often on the science console due to the flexible machine sizing options. (Your laptop needs to have the max amount of memory that you will ever use while working on it. On the science console, you can choose a different server size every time you start it up -- this is much more efficient, but also requires you to be more aware of how much CPU and RAM your tasks need.) -### What is a kernel and how do I choose one? +### What is a kernel and how do I choose one? Under Construction: In Jupyter, kernels are the background processes that execute cells and return results for display. To select the kernel on which you want to run your Notebook, go to the Kernel menu and choose Change Kernel. You can also click directly on the name of the active kernel to switch to another one. The bottom of the JupterLab window lists the github branch as well as the name of the kernel in use. The kernel is listed as either 'idle' or 'busy', which is useful to know if your kernel is working or has crashed. @@ -112,7 +112,7 @@ Under Construction. ### How can I upload my own data for use with compute provided by the Fornax Science Console? The `uparrow` in the upper left allows you to upload data. If it is a large amount of data, consider creating a zip or tar archive first. From within JupyterLab, you can also use a terminal to transfer data with the usual methods (`scp`, `wget`, `curl` should all work). The current (Feb 2024) default storage limit for uploaded data is 10GB (Feb 2024). 
When you log into the science console for the first time, the active directory is your `$HOME` directory. It contains preexisting folders like `efs/` and `s3/` with shared data. You may also create your own directories and files here. Your edits outside of the shared folders are not visible to other users. - + ![upload_button](./static/images/upload_button.png) ### How can I access data that has been hosted on the cloud by the Fornax archives? @@ -120,11 +120,11 @@ The `uparrow` in the upper left allows you to upload data. If it is a large amo * [Tutorial](https://irsa.ipac.caltech.edu/docs/notebooks/) notebook on IRSA data * Under Construction: Where is Abdu's similar notebook with pyvo tools that was used for the July2023 HQ demo? * Under Construction: placeholder for Brigitta's SIA access notebook - + ### Is there a way access data in a Box account from the Fornax Science Console? Any publicly accessible web service can be reached from Fornax through the HTTPS protocol, e.g., APIs, wget, etc. - + ### Is there a way to access data from an AWS bucket? Any publicly available bucket is visible from Fornax as it would be on your laptop. If you require an access key to see into the bucket from your laptop, you will also need that on Fornax. @@ -135,9 +135,9 @@ Double-clicking on a png or PDF in the file browser will open it in a new tab. ### How do I download data from the Fornax Science Console to my local machine? If it is a large amount of data, consider creating a zip or tar archive first. If it is a small file, you can right click on the file name in the file browser and scroll to `Download`. - + ![right_click_download](./static/images/right_click_download.png) - + ### How do I share data from inside Fornax with collaborators? Download them to favorite storage place (university Box account) or put in AWS cloud. 
@@ -146,12 +146,12 @@ Download them to favorite storage place (university Box account) or put in AWS c ### How can I get a list of what software is pre-installed on the Fornax Science Console? Software is installed in miniconda environments. You can use "[conda list](https://conda.io/projects/conda/en/latest/commands/list.html)" from a Terminal within the Fornax Science Console to list the contents of each environment. - + ### Can I install my own software on the Fornax Science Console? * Persistent User-Installed Software * If the pre-installed environments don't have the software you need, you can create your own persistent environment available across multiple sessions. Follow the instructions in the [conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html), specifically [managing environments](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-envs). - * Non-persistent User-Installed Software - * You can !pip install your favorite software from inside a notebook. This installed software will persist through kernel restarts, but will not be persistent if you stop your server and restart it (logging out and back in) unless you specify the - - user option, which will put the software in your home directory. Note that an install done in one compute environment may or may not work in a container opened using another environment, even if the directory is still there. Conda environments are useful to manage these. + * Non-persistent User-Installed Software + * You can `!pip install` your favorite software from inside a notebook. The installed software persists through kernel restarts, but not through a server stop and restart (logging out and back in) unless you specify the `--user` option, which puts the software in your home directory. 
Note that an install done in one compute environment may or may not work in a container opened using another environment, even if the directory is still there. Conda environments are useful to manage these. * For the tutorial notebooks we tend to have a requirements.txt file in the repo which lists all the software dependencies. Then the first line in the notebook is `!pip install -r requirements.txt` That way other people can run the notebook and will know which software is involved. * There is a limit on the space a user has access to, but not the number of packages, and packages are usually small. @@ -165,15 +165,15 @@ Software is installed in miniconda environments. You can use "[conda list](http * Emacs or vi is possible from the terminal. * The JupyterLab interface also has its own editor. * If you prefer to develop elsewhere, you can push your changes to a publicly available repo (e.g., GitHub) and synchronize that to a location on your home directory on Fornax. - + ### Will notebooks that run on Fornax also work on my laptop? * In general, yes, but you need to have a Python environment setup in the same way as on it is on Fornax. * See below under "Can I run the container from Fornax on my own personal computer/laptop?" - + ### Is it possible to launch apps from icons? Like MOPEX or SPICE * These apps are unavailable in Fornax - + ### Is it possible to run licensed software (IDL) in Fornax? * licensed software is not possible in Fornax @@ -181,20 +181,20 @@ Software is installed in miniconda environments. You can use "[conda list](http * Yes. The images are all on the AWS Elastic Container Registry. 
* * Under Construction: Need a link and more instructions - + ## [Examples and Tutorials](https://nasa-fornax.github.io/fornax-demo-notebooks/) ### Fully worked science use cases * [Forced photometry](https://github.com/nasa-fornax/fornax-demo-notebooks/tree/main/forced_photometry/) * [Light curves](https://github.com/nasa-fornax/fornax-demo-notebooks/tree/main/light_curves/) * [ML dimensionality reduction](https://github.com/nasa-fornax/fornax-demo-notebooks/blob/main/light_curves/ML_AGNzoo.md) - * -### Cloud + +### Cloud * [STScI](https://github.com/spacetelescope/tike_content/blob/main/content/notebooks/data-access/data-access.ipynb) * [IRSA Cloud Access Introduction](https://irsa.ipac.caltech.edu/docs/notebooks/cloud-access-intro.html) * [Parquet info from IRSA](https://irsa.ipac.caltech.edu/docs/notebooks/wise-allwise-catalog-demo.html) * [Image cutouts](https://docs.astropy.org/en/stable/io/fits/usage/cloud.html#using-cutout2d-with-cloud-hosted-fits-files) - * + ### Optimizing code for CPU usage (CPU profiling) * profiliing within Fornax is possible, however vizualizing the profile is not yet possible * profiling needs to be done on a .py script, and not a jupyter notebook @@ -203,19 +203,19 @@ Software is installed in miniconda environments. 
You can use "[conda list](http * On your local computer command line: `python -m snakeviz output_profile_name.prof` * documentation for snakeviz: https://jiffyclub.github.io/snakeviz/ * This really only looks at CPU usage - + ### Optimizing code for memory usage [(memory profiling)](https://towardsdatascience.com/profile-memory-consumption-of-python-functions-in-a-single-line-of-code-6403101db419) * inside the notebook: * `pip install -U memory_profiler` * `from memory_profiler import profile` * above the function you want to check add this line: @profile * run the script: python -m memory_profiler .py > mem_prof.txt - + ### Optimizing code for multiple CPUs with parallelization * Python built in [multiprocessing](https://irsa.ipac.caltech.edu/docs/notebooks/Parallelize_Convolution.html) * [Dask gateway](https://gateway.dask.org) - * Our [scale up](https://github.com/fornax-navo/fornax-demo-notebooks/blob/main/light_curves/scale_up.md) notebook is a tutorial on parallelization of generating multiwavelength light curves with tools, tips, and suggestions relevant to many tasks. - + * Our [scale up](https://github.com/nasa-fornax/fornax-demo-notebooks/blob/main/light_curves/scale_up.md) notebook is a tutorial on parallelization of generating multiwavelength light curves with tools, tips, and suggestions relevant to many tasks. + ### [MAST science examples](https://github.com/spacetelescope/tike_content/blob/main/markdown/science-examples.md) ### HEASARC [sciserver_cookbooks](https://github.com/HEASARC/sciserver_cookbooks/blob/main/Introduction.md) @@ -238,15 +238,15 @@ Software is installed in miniconda environments. You can use "[conda list](http * ask your peers: User forum at #fornax-users slack channel * ask Fornax Helpdesk * email fornax-helpdesk@lists.nasa.gov -* If you are reporting a problem or suspected bug, please include as much of the following information as possible. 
This will help minimize the time it takes for us to diagnose the problem and get you an answer. +* If you are reporting a problem or suspected bug, please include as much of the following information as possible. This will help minimize the time it takes for us to diagnose the problem and get you an answer. - date and time (with timezone) the problem occurred - web browser (name and version) you are using to connect to the Fornax Science Console (e.g., Chrome 125.0.64422.142) - - where in the Fornax Science Console this happened (e.g., while running a notebook in JupyterHub; while using the S3 bucket menu; etc.) + - where in the Fornax Science Console this happened (e.g., while running a notebook in JupyterHub; while using the S3 bucket menu; etc.) - what you were doing when the problem occurred - what you expected to have happen - what happened instead - - include any errors messages or info that were produced + - include any error messages or info that were produced - please include any additional information you feel is relevant (e.g., successfully did this same thing previously on date NN at time NN and it worked then) - if you have logs or a traceback, please include them @@ -298,13 +298,13 @@ Since one of the main drivers for using Fornax is the advantage of multiple CPUs ## Additional Resources * New to Python? - - * [numpy tutorials](?) 
- * [Scipy lecture notes](https://scipy-lectures.org/) - * [Data science handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) + * [NumPy Tutorials](https://numpy.org/numpy-tutorials/) + * [Scientific Python Lecture Notes](https://lectures.scientific-python.org/) + * [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) * [Guide to documenting python functions](https://developer.lsst.io/python/numpydoc.html#numpydoc-sections-in-docstrings) * [Github](https://docs.github.com/en/get-started/quickstart) * [Debugging](https://jakevdp.github.io/PythonDataScienceHandbook/01.06-errors-and-debugging.html#Debugging:-When-Reading-Tracebacks-Is-Not-Enough) -* Fornax is the collaboration of three NASA archives +* Fornax is the collaboration of three NASA archives * [IRSA](https://irsa.ipac.caltech.edu/frontpage/) * [HEASARC](https://heasarc.gsfc.nasa.gov) * [MAST](https://archive.stsci.edu)