This notebook contains a simple example of using datalinks to serve data from the cloud.
This example uses the HEASARC SIA service. The changes needed on the server side for this to work are:
- In the SIA service (or any other service where datalinks can work), add a `<PARAM>` element inside the `<GROUP>` element in the `adhoc:service` `<RESOURCE>`, as defined in the datalinks standard document. The `<PARAM>` element has the name `source` and lists the sources from which the data can be accessed. The default is `main-server`, which indicates accessing data from on-prem servers.
The following shows an example where the data can be accessed from four sources:
- On-prem servers (`value="main-server"`)
- AWS US east1 (`value="aws:us-east1"`)
- AWS US east2 (`value="aws:us-east2"`)
- Google Cloud (`value="gc"`)
<RESOURCE utype="adhoc:service" type="meta">
<PARAM datatype="char" arraysize="*" name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.0"/>
<PARAM datatype="char" arraysize="*" name="accessURL" value="http://localhost:8080/xamin/vo/datalink/chanmaster"/>
<GROUP name="inputParams">
<PARAM ref="DataLinkID" datatype="char" arraysize="*" name="id" value=""/>
<PARAM datatype="char" arraysize="*" name="source" value="main-server">
<VALUES>
<OPTION name="On prem servers" value="main-server"/>
<OPTION name="AWS region 1" value="aws:us-east1"/>
<OPTION name="AWS some other region" value="aws:us-east2"/>
<OPTION name="GC some region" value="gc"/>
</VALUES>
</PARAM>
</GROUP>
</RESOURCE>
- The datalink service should be able to interpret the `source` parameter that the client sends with the datalink request, and serve the appropriate `access_url`. So a request to the datalink URL with `&source=main-server` should give something like:
<TABLE>
<FIELD datatype="char" arraysize="*" ucd="meta.id;meta.main" name="ID"/>
<FIELD datatype="char" arraysize="*" ucd="meta.ref.url" name="access_url"/>
...
<DATA>
<TABLEDATA>
<TR>
<TD>[SOME_ID]</TD>
<TD>https://someurl/path/to/some/file.fits</TD>
...
</TR>
</TABLEDATA>
</DATA>
</TABLE>
Passing `&source=aws:us-east1`, for example, would give:
<TABLE>
<FIELD datatype="char" arraysize="*" ucd="meta.id;meta.main" name="ID"/>
<FIELD datatype="char" arraysize="*" ucd="meta.ref.url" name="access_url"/>
...
<DATA>
<TABLEDATA>
<TR>
<TD>[SOME_ID]</TD>
<TD>s3://somebucket/path/to/some/file.fits</TD>
...
</TR>
</TABLEDATA>
</DATA>
</TABLE>
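The server-side behavior described above can be sketched as a small lookup with a fallback. This is only an illustration, not the HEASARC implementation: the function name `resolve_access_url`, the `URL_PREFIXES` table, and the placeholder prefixes (copied from the example responses above) are all assumptions.

```python
# Hypothetical mapping from a `source` value to a URL prefix.
# The prefixes are the placeholders used in the example responses above.
URL_PREFIXES = {
    "main-server": "https://someurl",
    "aws:us-east1": "s3://somebucket",
}
DEFAULT_SOURCE = "main-server"


def resolve_access_url(path, source=None):
    """Return the access_url for `path`, honoring the `source` parameter.

    A missing, unknown, or unimplemented source falls back to the default.
    """
    prefix = URL_PREFIXES.get(source, URL_PREFIXES[DEFAULT_SOURCE])
    return f"{prefix}/{path.lstrip('/')}"


print(resolve_access_url("path/to/some/file.fits", "main-server"))
print(resolve_access_url("path/to/some/file.fits", "aws:us-east1"))
# "gc" is not in the table, so the default prefix is used
print(resolve_access_url("path/to/some/file.fits", "gc"))
```

Falling back to the default instead of returning an error keeps older clients working when they request a source the server does not provide.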
import pyvo
from astropy.coordinates import SkyCoord
# set some sky position to use in the queries
pos = SkyCoord.from_name('NGC 4151')
# make a simple SIA query. If not the HEASARC, change sia_url.
#xaminUrl = 'http://localhost:8080/xamin'
xaminUrl = 'https://heasarc.gsfc.nasa.gov/xamin_aws'
sia_url = f'{xaminUrl}/vo/sia?table=chanmaster'
sia_result = pyvo.dal.sia.search(sia_url, pos=pos, resultmax=2)
# explore the returned SIA result
#sia_result.votable.to_xml('sia_result.xml')
sia_result.to_table()
obsid | status | name | ra | dec | time | detector | grating | exposure | type | pi | public_date | datalink | t_min | t_resolution | t_max | t_exptime | em_res_power | s_region | s_ra | s_dec | s_resolution | access_estsize | s_fov | o_ucd | access_url | obs_publisher_did | obs_id | obs_collection | target_name | instrument_name | facility_name | pol_states | calib_level | access_format | dataproduct_type | em_min | em_max | SIA_title | SIA_scale | SIA_naxis | SIA_naxes | SIA_format | SIA_reference | SIA_ra | SIA_dec | SIA_instrument | cloud_access |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
deg | deg | mjd | s | mjd | d | s | d | s | deg | deg | deg | arcsec | kbyte | deg | m | m | |||||||||||||||||||||||||||||||
object | object | object | float64 | float64 | float64 | object | object | float64 | object | object | int32 | object | float64 | float64 | float64 | float64 | float64 | object | float64 | float64 | float32 | int32 | float64 | object | object | object | object | object | object | object | object | object | int32 | object | object | float64 | float64 | object | object | object | int32 | object | object | float64 | float64 | object | object |
15158 | archived | RBS1066 | 181.29000 | 39.34700 | 56363.8531 | ACIS-I | NONE | 8080 | GO | Reiprich | 56729 | 18244:chandra.obs.img | 56363.8531481481 | -- | 64443.8531481481 | 8080.0 | -- | 181.29 | 39.347 | -- | 32447 | -- | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.fits.gz | HEASARC | 15158 | CHANDRA ACIS-I | RBS1066 | ACIS-I | Chandra | 3 | image/fits | Image | 1.24e-10 | 1.24e-08 | acisf15158N003_cntr_img2.fits | [-0.0013666666666667 0.0013666666666667] | [1024 1024] | 2 | image/fits | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.fits.gz | 181.29 | 39.347 | CHANDRA ACIS-I | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.fits.gz" }} | |||
15158 | archived | RBS1066 | 181.29000 | 39.34700 | 56363.8531 | ACIS-I | NONE | 8080 | GO | Reiprich | 56729 | 18244:chandra.obs.img | 56363.8531481481 | -- | 64443.8531481481 | 8080.0 | -- | 181.29 | 39.347 | -- | 228059 | -- | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.jpg | HEASARC | 15158 | CHANDRA ACIS-I | RBS1066 | ACIS-I | Chandra | 3 | image/jpeg | Image | 1.24e-10 | 1.24e-08 | acisf15158N003_cntr_img2.jpg | [-0.0013666666666667 0.0013666666666667] | [1024 1024] | 2 | image/jpeg | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.jpg | 181.29 | 39.347 | CHANDRA ACIS-I | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8/15158/primary/acisf15158N003_cntr_img2.jpg" }} |
# get the datalink for the first row
dlink = sia_result[0].getdatalink()
# explore the returned datalink result
#dlink.votable.to_xml('datalink_result.xml')
dlink.to_table()
ID | access_url | service_def | error_message | description | semantics | content_type | content_length | cloud_access |
---|---|---|---|---|---|---|---|---|
byte | ||||||||
object | object | object | object | object | object | object | int64 | object |
18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz | Center Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.cntr.fits | application/fits | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz" }} | ||
18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.fits.gz | Full Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.full.fits | application/fits | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.fits.gz" }} | ||
18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.jpg | Center Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.cntr.jpg | image/jpeg | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.jpg" }} | ||
18244:chandra.obs.img | https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.jpg | Full Image | https://localhost:8080/xamin/jsp/products.jsp#chandra.obs.img.full.jpg | image/jpeg | -- | {"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", "access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_full_img2.jpg" }} |
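The `cloud_access` column in the table above holds a JSON string. A minimal sketch of decoding it with the standard library follows; the helper name `s3_uri_from_cloud_access` is hypothetical, and building the URI as `s3://<bucket_name>/<key>` is an assumption about how the fields combine.

```python
import json

# A cloud_access cell copied from the first row of the datalink table above.
cloud_access = (
    '{"aws": { "bucket_name": "dh-fornaxdev", "region": "us-east-1", '
    '"access": "region", "key": "/FTP/chandra/data/byobsid/8//15158/primary/'
    'acisf15158N003_cntr_img2.fits.gz" }}'
)


def s3_uri_from_cloud_access(cell):
    """Decode a cloud_access JSON string and return (s3_uri, region).

    Assumes the URI is the bucket name followed by the key.
    """
    aws = json.loads(cell)["aws"]
    uri = f"s3://{aws['bucket_name']}/{aws['key'].lstrip('/')}"
    return uri, aws["region"]


uri, region = s3_uri_from_cloud_access(cloud_access)
print(uri)
print(region)
```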
Read the cloud information from the datalink resource in the SIA result.
This is done by exposing what `getdatalink()` does inside `pyvo`, and adding the part that processes the extra parameters.
# expose what goes on inside pyvo when doing getdatalink()
dlink_resource = sia_result.get_adhocservice_by_ivoid(pyvo.dal.adhoc.DATALINK_IVOID)
# Look for the 'source' <PARAM> element inside the inputParams <GROUP> element.
# pyvo already handles part of this.
source_elem = [p for p in dlink_resource.groups[0].entries if p.name == 'source'][0]
print(type(source_elem))
print(source_elem)
<class 'astropy.io.votable.tree.Param'>
<PARAM ID="source" arraysize="*" datatype="char" name="source" value="main-server"/>
# list the available options in the `source` element:
access_options = source_elem.values.options
print(f'There are {len(access_options)} options:')
for opt in access_options:
    print(f'\t{opt[1]:13}: {opt[0]}')
There are 4 options:
main-server : On prem servers
aws:us-east1 : AWS region 1
aws:us-east2 : AWS some other region
gc : GC some region
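When several sources are advertised, a client may want to choose one programmatically, for example preferring a cloud region when running there. The sketch below assumes the options are `(name, value)` pairs, as printed above; the helper `pick_source`, the example list, and the region `aws:us-west2` (deliberately absent from the list) are hypothetical.

```python
def pick_source(options, preferred, default="main-server"):
    """Return the first value in `preferred` that the service advertises.

    `options` is a list of (name, value) pairs, like `access_options`.
    If none of the preferred values is offered, return `default`.
    """
    values = [value for _name, value in options]
    for candidate in preferred:
        if candidate in values:
            return candidate
    return default


# Stand-in for `access_options`, copied from the output above.
access_options_example = [
    ("On prem servers", "main-server"),
    ("AWS region 1", "aws:us-east1"),
    ("AWS some other region", "aws:us-east2"),
    ("GC some region", "gc"),
]

# "aws:us-west2" is not offered, so the next preference is used
print(pick_source(access_options_example, ["aws:us-west2", "aws:us-east1"]))
```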
Given these options, we can query for the datalink we want by including the parameter `source` in the query, where its value takes one of the options in `access_options`.
## main-server; this is the default
source_1 = access_options[0][1]
query_1 = pyvo.dal.adhoc.DatalinkQuery.from_resource(
    sia_result[0], dlink_resource, sia_result._session, source=source_1
)
result_1 = query_1.execute()
print(f'access option: {source_1}')
print('access_url: ')
print(result_1[0].access_url)
access option: main-server
access_url:
https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz
Note that with `aws:us-east1`, the returned `access_url` is an S3 URI.
## aws:us-east1
source_2 = access_options[1][1]
query_2 = pyvo.dal.adhoc.DatalinkQuery.from_resource(
    sia_result[0], dlink_resource, sia_result._session, source=source_2
)
result_2 = query_2.execute()
print(f'access option: {source_2}')
print('access_url: ')
print(result_2[0].access_url)
access option: aws:us-east1
access_url:
s3://dh-fornaxdev/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz
The `gc` option is not supported by the server, so it falls back to the default `main-server` URL.
## gc; GC is not implemented, so the server defaults to the http URL from the main server
source_3 = access_options[3][1]
query_3 = pyvo.dal.adhoc.DatalinkQuery.from_resource(
    sia_result[0], dlink_resource, sia_result._session, source=source_3
)
result_3 = query_3.execute()
print(f'access option: {source_3}')
print('access_url: ')
print(result_3[0].access_url)
access option: gc
access_url:
https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz
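Because an unsupported source silently falls back to the default, a client may want to check what kind of URL it actually received. The following is a minimal sketch using only the standard library; the helper name `describe_access_url` and the shape of the returned dictionary are hypothetical.

```python
from urllib.parse import urlparse


def describe_access_url(access_url):
    """Split an access_url into its parts, treating s3:// URIs specially."""
    parts = urlparse(access_url)
    if parts.scheme == "s3":
        # For s3://bucket/key, the bucket is parsed as the network location
        return {"kind": "s3", "bucket": parts.netloc, "key": parts.path.lstrip("/")}
    return {"kind": parts.scheme, "host": parts.netloc, "path": parts.path}


# The two URL forms returned in the queries above
aws_info = describe_access_url(
    "s3://dh-fornaxdev/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz"
)
gc_info = describe_access_url(
    "https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/8//15158/primary/acisf15158N003_cntr_img2.fits.gz"
)
print(aws_info["kind"], aws_info["bucket"])
print(gc_info["kind"], gc_info["host"])
```

A `kind` of `https` after requesting `gc` indicates that the server fell back to the on-prem URL.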