How to Get Data Into Vizier

Oliver Kennedy edited this page Dec 31, 2023 · 4 revisions

What do you want to import?

Linking Local Files

This process works for files in the following formats, stored in the working directory from which you launched Vizier:

  • CSV files
  • Excel spreadsheets (XLSX; with one sheet)
  • JSON files (with one JSON record per line)
  • Text files (with one record per line)

  1. Create a Load Dataset cell.
  2. Click Local Files, navigate to the file you want to add, and select it.
    • Alternatively: Enter the absolute path to the file you want to load (not recommended, as this can break portability).
  3. Optionally: Enter a custom name for your dataset.
  4. Optionally: Configure the load options if Vizier did not guess them correctly.
  5. Click Save.
  6. Optionally: Reopen the Load Dataset cell to customize any column names or data types.
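
The JSON format above expects one record per line. If your data is instead stored as a single JSON array, it can be rewritten before loading. The following is a minimal sketch using only Python's standard library, run outside Vizier; the function name to_json_lines and the sample records are illustrative assumptions, not part of Vizier.

```python
import json


def to_json_lines(records):
    """Serialize a list of records as one compact JSON object per line."""
    # json.dumps without an indent never emits a raw newline, so each
    # record is guaranteed to occupy exactly one line of the output.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Sample input: a JSON array, as it might appear in an existing file.
records = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')
text = to_json_lines(records)

# To produce a file for Vizier, write `text` to a file in the working
# directory, for example:
#   with open("records.json", "w") as f:
#       f.write(text)
```

Each line of the result can be parsed on its own, which is what makes the one-record-per-line layout easy to load incrementally.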

Linking to URLs

This process works for the following formats:

  • CSV files
  • Excel spreadsheets (XLSX; with one sheet)
  • JSON files (with one JSON record per line)
  • Text files (with one record per line)
  • JDBC databases (PostgreSQL, SQLite)

  1. Create a Load Dataset cell.
  2. Paste the URL of the file you want to load into the File or URL field.
  3. Optionally: Enter a custom name for your dataset.
  4. Optionally: Configure the load options if Vizier did not guess them correctly.
  5. Click Save.
  6. Optionally: Reopen the Load Dataset cell to customize any column names or data types.

From Another Workflow

This process works for any dataset generated by one workflow that needs to be imported into another.

  1. In the source workflow, create an Export Dataset cell.
  2. Select the dataset to be transferred.
  3. Select Publish to another Project from the Format menu.
  4. Optionally: Give the published dataset a name.
  5. In the target workflow, create a Load Dataset cell.
  6. Click Published Datasets and select the name of the dataset you exported in the source workflow.
  7. Click Save.

Python Ingest

This process works for any format that can be read into Python.

Create a Python cell and use the following code template:

ds = vizierdb.new_dataset()

ds.insert_column('column1')
ds.insert_column('column2')
# and so forth...

# With the variable 'source' containing a list of rows...
# ... where each row is an array in the order of the columns above.
for row in source:
    ds.insert_row(row)

ds.save('my_dataset')
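
As one illustration of filling in this template, the sketch below reads CSV text with Python's standard csv module and feeds it to a dataset object. The helper names read_rows and fill_dataset and the sample data are hypothetical; the only Vizier calls assumed are the ones shown in the template above (new_dataset, insert_column, insert_row, save).

```python
import csv
import io


def read_rows(csv_text):
    """Split CSV text into a list of column names and a list of data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line holds the column names
    rows = list(reader)    # every remaining line is one record
    return header, rows


def fill_dataset(ds, header, rows):
    """Add one column per header name, then one row per record."""
    for name in header:
        ds.insert_column(name)
    for row in rows:
        ds.insert_row(row)
    return ds


# Sample data; in practice csv_text would come from a file or a download.
header, rows = read_rows("name,score\nalice,1\nbob,2\n")

# Inside a Vizier Python cell, where `vizierdb` is available:
#   ds = fill_dataset(vizierdb.new_dataset(), header, rows)
#   ds.save('my_dataset')
```

Note that the csv module returns every value as a string, so any numeric conversion has to happen before the rows are inserted.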

For example, the following script creates a dataset called 'urls' with one row for each link on the source page, holding the link's target and its accompanying text. It requires both Requests (requests) and Beautiful Soup 4 (bs4).

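import requests
from bs4 import BeautifulSoup

my_url = 'https://example.com'  # placeholder: put the URL of the page to scan here

ds = vizierdb.new_dataset()
ds.insert_column('url')
ds.insert_column('text')

# Download the page and parse its HTML.
page = requests.get(my_url)
page.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(page.text, 'html.parser')

# href=True skips <a> tags that have no href attribute, which would
# otherwise raise a KeyError on link['href'].
for link in soup.find_all('a', href=True):
    ds.insert_row([
        link['href'],
        link.text
    ])

ds.save('urls')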
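
The href values collected by the script above are stored exactly as they appear in the page, so some of them may be relative paths or bare fragments. If absolute URLs are wanted in the dataset, one option is to resolve each value against the page URL before inserting it. A minimal sketch using the standard library's urljoin; the helper name absolute_links and the sample URLs are illustrative assumptions.

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve each href against base_url; absolute hrefs pass through unchanged."""
    return [urljoin(base_url, href) for href in hrefs]


base = 'https://example.com/docs/page.html'
resolved = absolute_links(base, [
    '/a',                    # path relative to the site root
    '../about',              # path relative to the page's directory
    'https://other.org/x',   # already absolute
    '#top',                  # fragment within the same page
])
```

In the script, the same idea would amount to passing urljoin(my_url, ...) the href value in place of the raw attribute.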