How to Get Data Into Vizier
Oliver Kennedy edited this page Dec 31, 2023 · 4 revisions
- A file already on my computer
- A dataset at a URL
- A dataset in another workflow
- Something I can load in Python
A file already on my computer
This process works for files in the following formats, stored in the working directory from which you launched Vizier:
- CSV files
- Excel spreadsheets (XLSX; with one sheet)
- JSON files (with one JSON record per line)
- Text files (with one record per line)
- Create a Load Dataset cell.
- Click Local Files and navigate to the file you want to add and select it.
- Alternatively: Enter the absolute path to the file you want to load (not recommended, as this can break portability)
- Optionally: Enter a custom name for your dataset
- Optionally: Configure the load options if Vizier did not guess them correctly.
- Click Save
- Optionally: Reopen the load dataset cell to customize any column names or data types.
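The JSON format listed above expects one record per line, so a file that stores its records as a single top-level JSON array needs converting before it is loaded. The following is a minimal sketch of that conversion using only the Python standard library; the function name `json_array_to_lines` and the sample records are hypothetical and are not part of Vizier.

```python
import json


def json_array_to_lines(array_text):
    """Convert text holding a JSON array of records into one-record-per-line JSON."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    # With default settings json.dumps emits no raw newlines,
    # so each record stays on a single line.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Hypothetical sample input: two records stored as one JSON array.
example = '[{"name": "a", "value": 1}, {"name": "b", "value": 2}]'
converted = json_array_to_lines(example)
print(converted, end="")
```

Writing `converted` to a file in the working directory should then give a file in the one-record-per-line layout described above.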
A dataset at a URL
This process works for the following formats:
- CSV files
- Excel spreadsheets (XLSX; with one sheet)
- JSON files (with one JSON record per line)
- Text files (with one record per line)
- JDBC databases (postgresql, sqlite)
- Create a Load Dataset cell.
- Paste the URL of the file you want to load into the File or URL field.
- Optionally: Enter a custom name for your dataset
- Optionally: Configure the load options if Vizier did not guess them correctly.
- Click Save
- Optionally: Reopen the load dataset cell to customize any column names or data types.
A dataset in another workflow
This process works for any dataset generated by one workflow that needs to be imported into another.
- In the source workflow, create an Export Dataset cell.
- Select the dataset to be transferred.
- Select Publish to another Project from the Format menu.
- Optionally: Give the published dataset a name.
- In the target workflow, create a Load Dataset cell.
- Click Published Datasets and select the name of the dataset you exported in the source workflow.
- Click Save
Something I can load in Python
This process works for any format that can be read into Python.
Create a Python cell and use the following Python code template:
ds = vizierdb.new_dataset()
ds.insert_column('column1')
ds.insert_column('column2')
# and so forth...
# With the variable 'source' containing a list of rows...
# ... where each row is an array in the order of the columns above.
for row in source:
    ds.insert_row(row)
ds.save('my_dataset')
For example, the following script creates a dataset called 'urls' with one row for each link that appears in the source page, along with the link's text. It requires both Requests (requests) and Beautiful Soup 4 (bs4).
import requests
from bs4 import BeautifulSoup

my_url = ''  # put a URL here

ds = vizierdb.new_dataset()
ds.insert_column('url')
ds.insert_column('text')

# Download the page and parse its HTML.
page = requests.get(my_url)
soup = BeautifulSoup(page.text, 'html.parser')

# Add one row per link; href=True skips anchors that have no href attribute.
for link in soup.find_all('a', href=True):
    ds.insert_row([
        link['href'],
        link.text
    ])
ds.save('urls')
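The template above expects each row to be a list whose values follow the order of the inserted columns. Data read in Python often arrives as a list of dictionaries instead (for example, parsed JSON), so the sketch below shows one way to reorder it. The helper `rows_in_column_order` and the sample records are hypothetical; filling missing keys with `None` is an assumption about how empty cells should be represented, not documented Vizier behavior.

```python
def rows_in_column_order(records, columns):
    """Turn a list of dicts into a list of rows ordered like `columns`."""
    # Missing keys become None so every row has one value per column.
    return [[record.get(column) for column in columns] for record in records]


# Hypothetical sample data: keys appear in any order and may be missing.
columns = ['url', 'text']
records = [
    {'text': 'Example', 'url': 'https://example.com'},
    {'url': 'https://example.org'},
]
source = rows_in_column_order(records, columns)

# Inside a Vizier Python cell, `source` can then feed the template above:
# ds = vizierdb.new_dataset()
# for column in columns:
#     ds.insert_column(column)
# for row in source:
#     ds.insert_row(row)
# ds.save('my_dataset')
```

Keeping the column list in one place means the `insert_column` calls and the row layout cannot drift apart.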