diff --git a/example/SparkConfigurationUsage/index.html b/example/SparkConfigurationUsage/index.html index cfd4b76..a03cfc4 100644 --- a/example/SparkConfigurationUsage/index.html +++ b/example/SparkConfigurationUsage/index.html @@ -1066,7 +1066,7 @@

Configuration and Usage diff --git a/search/search_index.json b/search/search_index.json index bb78fcb..2071753 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"jupyterlab-sql-editor","text":"

A JupyterLab extension providing the following features via %sparksql and %trino magics:

"},{"location":"sparksql/","title":"sparksql magic","text":"

A JupyterLab extension providing the following features via %%sparksql and %%trino magics:

"},{"location":"sparksql/#execute-and-output-your-query-results-into-an-interactive-data-grid","title":"Execute and output your query results into an interactive data grid","text":""},{"location":"sparksql/#output-as-json","title":"Output as JSON","text":""},{"location":"sparksql/#auto-suggest-column-names-and-sub-fields","title":"Auto suggest column names and sub-fields","text":""},{"location":"sparksql/#auto-suggest-joins-on-matching-column-names","title":"Auto suggest JOINs on matching column names","text":""},{"location":"sparksql/#format-and-show-syntax-highlighting-in-notebook-code-cells","title":"Format and show syntax highlighting in Notebook code cells","text":"

To format the SQL statements in a cell, right-click in the cell and select Format Sql Cell, or press Ctrl+Q.

"},{"location":"sparksql/#works-in-python-strings","title":"Works in Python strings","text":"

While inside a notebook, you can have a multi-line Python string containing SQL and enjoy the same features (syntax highlighting, code completion and SQL formatting) as in a sparksql cell by wrapping your string with --start-sparksql and --end-sparksql. Here is an example:

# declare a python string\nsql = \"\"\"\n--start-sparksql\nSELECT\n    *\nFROM\n    table AS t\n--end-sparksql\n\"\"\"\nprint(sql)\n

"},{"location":"sparksql/#capture-your-spark-query-as-a-dataframe-or-a-temporary-view","title":"Capture your Spark query as a Dataframe or a temporary view","text":""},{"location":"sparksql/#usage","title":"Usage","text":"

Parameter usage example:

%%sparksql -c -l 10 --dataframe df\n<QUERY>\n
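The --jinja flag listed below substitutes notebook variables into the cell body before execution. A rough sketch of that substitution step (the table name events and the variable threshold are hypothetical; real rendering uses Jinja, a plain string replace merely stands in for it here):

```python
# Simplified illustration of Jinja templating (--jinja): variables from the
# notebook namespace are substituted into the cell body before execution.
# NOTE: 'events' and 'threshold' are hypothetical names; the extension uses
# real Jinja rendering, a plain string replace stands in for it in this sketch.
threshold = 100
cell = "SELECT * FROM events WHERE hits > {{ threshold }}"
rendered = cell.replace("{{ threshold }}", str(threshold))
print(rendered)  # SELECT * FROM events WHERE hits > 100
```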
Parameter Description --database NAME Spark database to use. -l LIMIT, --limit LIMIT The maximum number of rows to display. A value of zero is equivalent to --output skip. -r all|local|none, --refresh all|local|none Force the regeneration of the schema cache file. The local option will only update tables/views created in the local Spark context. -d NAME, --dataframe NAME Capture results in a Spark dataframe named NAME. -c, --cache Cache dataframe. -e, --eager Cache dataframe with eager load. -v VIEW, --view VIEW Create or replace a temporary view named VIEW. -o sql|json|html|aggrid|grid|text|schema|skip|none, --output sql|json|html|aggrid|grid|text|schema|skip|none Output format. Defaults to html. The sql option prints the SQL statement that will be executed (useful to test jinja templated statements). -s, --show-nonprinting Replace non-printable characters with their ASCII codes (LF -> \\x0a). -j, --jinja Enable Jinja templating support. -b, --dbt Enable DBT templating support. -t LIMIT, --truncate LIMIT Truncate output. -m update|complete, --streaming_mode update|complete The mode of streaming queries. -x, --lean-exceptions Shortened exceptions. Might be helpful if the exceptions reported by Spark are noisy, such as with big SQL queries."},{"location":"trino/","title":"trino magic","text":"

A JupyterLab extension providing the following features via %%sparksql and %%trino magics:

"},{"location":"trino/#use-jinja-templating-to-create-re-usable-sql","title":"Use Jinja templating to create reusable SQL","text":""},{"location":"trino/#usage","title":"Usage","text":"

Parameter usage example:

%%trino -c catalog -l 10 --dataframe df\n<QUERY>\n
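The --raw option listed below exists because non-raw statements are conceptually wrapped so a row limit can be applied, and statements like EXPLAIN or SHOW CATALOGS cannot be nested that way. A simplified sketch of such wrapping (not the extension's actual implementation):

```python
# Hypothetical illustration of why --raw exists: a plain SELECT can be
# nested inside a limiting outer query, but EXPLAIN/SHOW statements cannot.
# This is a sketch, not the extension's real wrapping code.
def wrap_with_limit(statement: str, limit: int) -> str:
    return f"SELECT * FROM ({statement}) AS sub LIMIT {limit}"

print(wrap_with_limit("SELECT * FROM tpch.tiny.orders", 10))
# SELECT * FROM (SELECT * FROM tpch.tiny.orders) AS sub LIMIT 10
```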
Parameter Description -c NAME, --catalog NAME Trino catalog to use. -s NAME, --schema NAME Trino schema to use. -l LIMIT, --limit LIMIT The maximum number of rows to display. A value of zero is equivalent to --output skip. -r all|none, --refresh all|none Force the regeneration of the schema cache file. -d NAME, --dataframe NAME Capture results in a pandas dataframe named NAME. -o sql|json|html|aggrid|grid|text|schema|skip|none, --output sql|json|html|aggrid|grid|text|schema|skip|none Output format. Defaults to html. The sql option prints the SQL statement that will be executed (useful to test jinja templated statements). -s, --show-nonprinting Replace non-printable characters with their ASCII codes (LF -> \\x0a). -j, --jinja Enable Jinja templating support. -t LIMIT, --truncate LIMIT Truncate output. -x STATEMENT, --raw STATEMENT Run the statement as-is, without wrapping it with a limit. Use this option to run statements which can't be wrapped in a SELECT/LIMIT statement, for example EXPLAIN, SHOW TABLE, SHOW CATALOGS."},{"location":"example/SparkConfigurationUsage/","title":"Configuration and Usage","text":"In\u00a0[1]: Copied!
from pyspark.sql import SparkSession\n
from pyspark.sql import SparkSession In\u00a0[2]: Copied!
import ipywidgets as widgets\nout = widgets.Output()\nwith out:\n    spark = SparkSession.builder.getOrCreate()\n
import ipywidgets as widgets out = widgets.Output() with out: spark = SparkSession.builder.getOrCreate()
Normally, IPython only displays the output of the last statement. However, it can be handy to run multiple SQL magics in a single cell and see the output of each execution. Setting `ast_node_interactivity` to `all` enables that.
In\u00a0[3]: Copied!
from IPython.core.interactiveshell import InteractiveShell\nInteractiveShell.ast_node_interactivity = 'all'\n
from IPython.core.interactiveshell import InteractiveShell InteractiveShell.ast_node_interactivity = 'all' In\u00a0[4]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.sparksql\n
%load_ext jupyterlab_sql_editor.ipython_magic.sparksql In\u00a0[5]: Copied!
%config SparkSql.cacheTTL=3600\n%config SparkSql.outputFile=\"/tmp/sparkdb.schema.json\"\n
%config SparkSql.cacheTTL=3600 %config SparkSql.outputFile=\"/tmp/sparkdb.schema.json\" In\u00a0[6]: Copied!
df = spark.read.json(\"file:/path/to/contacts.json\")\ndf.createOrReplaceTempView(\"CONTACTS_TABLE\")\ndf.printSchema()\n
df = spark.read.json(\"file:/path/to/contacts.json\") df.createOrReplaceTempView(\"CONTACTS_TABLE\") df.printSchema()
root\n |-- address: struct (nullable = true)\n |    |-- city: string (nullable = true)\n |    |-- postalCode: string (nullable = true)\n |    |-- state: string (nullable = true)\n |    |-- streetAddress: string (nullable = true)\n |-- age: long (nullable = true)\n |-- first Name: string (nullable = true)\n |-- last Name: string (nullable = true)\n |-- phoneNumbers: array (nullable = true)\n |    |-- element: struct (containsNull = true)\n |    |    |-- number: string (nullable = true)\n |    |    |-- type: string (nullable = true)\n\n
In\u00a0[7]: Copied!
df = spark.read.json(\"file:/path/to/conversations.json\")\ndf.createOrReplaceTempView(\"MESSAGES_TABLE\")\ndf.printSchema()\n
df = spark.read.json(\"file:/path/to/conversations.json\") df.createOrReplaceTempView(\"MESSAGES_TABLE\") df.printSchema()
root\n |-- first Name: string (nullable = true)\n |-- last Name: string (nullable = true)\n |-- messages: array (nullable = true)\n |    |-- element: struct (containsNull = true)\n |    |    |-- body: string (nullable = true)\n |    |    |-- time: string (nullable = true)\n\n
In\u00a0[8]: Copied!
%sparksql --refresh all\n
%sparksql --refresh all
Exporting functions: [########################################] 100.0%\nSchema file updated: /tmp/sparkdb.schema.json\n
In\u00a0[9]: Copied!
%sparksql SHOW TABLES\n
%sparksql SHOW TABLES
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
namespace tableName isTemporary contacts_table true messages_table true
Execution time: 0.24 seconds
In\u00a0[10]: Copied!
%%sparksql --output grid --limit 1000\nSELECT\n    id,\n    uuid()\nFROM\n    RANGE (1, 1000)\n
%%sparksql --output grid --limit 1000 SELECT id, uuid() FROM RANGE (1, 1000)
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Execution time: 1.86 seconds
In\u00a0[11]: Copied!
%%sparksql --output html --limit 3\n\nSELECT\n    con.`first Name`,\n    con.phoneNumbers [ 0 ].type as primary_number,\n    array_contains(con.phoneNumbers.type, 'home') as flag\nFROM\n    contacts_table AS con\n
%%sparksql --output html --limit 3 SELECT con.`first Name`, con.phoneNumbers [ 0 ].type as primary_number, array_contains(con.phoneNumbers.type, 'home') as flag FROM contacts_table AS con
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
first Name primary_number flag Rack home true
Execution time: 0.19 seconds
In\u00a0[12]: Copied!
%%sparksql --output json --limit 3\nSELECT\n    *\nFROM\n    contacts_table AS con\n
%%sparksql --output json --limit 3 SELECT * FROM contacts_table AS con
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
<IPython.core.display.JSON object>
Execution time: 0.19 seconds
In\u00a0[13]: Copied!
%%sparksql --output schema\nSELECT\n    *\nFROM\n    contacts_table AS con\n
%%sparksql --output schema SELECT * FROM contacts_table AS con
root\n |-- address: struct (nullable = true)\n |    |-- city: string (nullable = true)\n |    |-- postalCode: string (nullable = true)\n |    |-- state: string (nullable = true)\n |    |-- streetAddress: string (nullable = true)\n |-- age: long (nullable = true)\n |-- first Name: string (nullable = true)\n |-- last Name: string (nullable = true)\n |-- phoneNumbers: array (nullable = true)\n |    |-- element: struct (containsNull = true)\n |    |    |-- number: string (nullable = true)\n |    |    |-- type: string (nullable = true)\n\n
In\u00a0[14]: Copied!
%%sparksql --view the_exploded_table --output skip\nSELECT\n    *,\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n
%%sparksql --view the_exploded_table --output skip SELECT *, explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con
Created temporary view `the_exploded_table`\nQuery execution skipped\n
In\u00a0[15]: Copied!
%sparksql SHOW TABLES\n
%sparksql SHOW TABLES
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
namespace tableName isTemporary contacts_table true messages_table true the_exploded_table true
Execution time: 0.08 seconds
In\u00a0[16]: Copied!
%%sparksql\nSELECT\n    *\nFROM\n    the_exploded_table AS the\n
%%sparksql SELECT * FROM the_exploded_table AS the
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
address age first Name last Name phoneNumbers phoneNumber {San Jone, 394221, CA, 126} 24 Rack Jackon [{7383627627, home}] {7383627627, home}
Execution time: 0.25 seconds
In\u00a0[17]: Copied!
%%sparksql --output text\nSELECT\n    *\nFROM\n    the_exploded_table AS the\n
%%sparksql --output text SELECT * FROM the_exploded_table AS the
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
+---------------------------+---+----------+---------+--------------------+------------------+\n|                    address|age|first Name|last Name|        phoneNumbers|       phoneNumber|\n+---------------------------+---+----------+---------+--------------------+------------------+\n|{San Jone, 394221, CA, 126}| 24|      Rack|   Jackon|[{7383627627, home}]|{7383627627, home}|\n+---------------------------+---+----------+---------+--------------------+------------------+
Execution time: 0.09 seconds
In\u00a0[18]: Copied!
%%sparksql --output sql\nSELECT\n    *\nFROM\n    the_exploded_table AS the\n
%%sparksql --output sql SELECT * FROM the_exploded_table AS the Out[18]:
SELECT\n    *\nFROM\n    the_exploded_table AS the\n
In\u00a0[19]: Copied!
%%sparksql --dataframe the_exploded_dataframe --output skip\nSELECT\n    *,\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n
%%sparksql --dataframe the_exploded_dataframe --output skip SELECT *, explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con
Captured dataframe to local variable `the_exploded_dataframe`\nQuery execution skipped\n
In\u00a0[20]: Copied!
the_exploded_dataframe.select('phoneNumber').show()\n
the_exploded_dataframe.select('phoneNumber').show()
+------------------+\n|       phoneNumber|\n+------------------+\n|{7383627627, home}|\n+------------------+\n\n
In\u00a0[27]: Copied!
# declare a python string\nsql = '''\n--start-sparksql\nSELECT\n    *, con.`first Name`\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n--end-sparksql\n'''\nprint(sql)\n
# declare a python string sql = ''' --start-sparksql SELECT *, con.`first Name` explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con --end-sparksql ''' print(sql)
\n--start-sparksql\nSELECT\n    *, con.`first Name`\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n--end-sparksql\n\n
In\u00a0[\u00a0]: Copied!
# declare a python string\nsql = '''\n--start-sparksql\nSELECT\n    *,\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n--end-sparksql\n'''\nprint(sql)\n
# declare a python string sql = ''' --start-sparksql SELECT *, explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con --end-sparksql ''' print(sql) In\u00a0[22]: Copied!
spark.sql(sql).show()\n
spark.sql(sql).show()
+--------------------+---+----------+---------+--------------------+------------------+\n|             address|age|first Name|last Name|        phoneNumbers|       phoneNumber|\n+--------------------+---+----------+---------+--------------------+------------------+\n|{San Jone, 394221...| 24|      Rack|   Jackon|[{7383627627, home}]|{7383627627, home}|\n+--------------------+---+----------+---------+--------------------+------------------+\n\n
In\u00a0[23]: Copied!
%%sparksql?\n
%%sparksql?
Docstring:\n::\n\n  %sparksql [-l max_rows] [-r all|local|none] [-d name] [-c] [-e]\n                [-v name] [-o sql|json|html|grid|text|schema|skip|none] [-s]\n                [-j] [-t max_cell_length]\n                [sql [sql ...]]\n\nMagic that works both as %sparksql and as %%sparksql\n\npositional arguments:\n  sql                   SQL statement to execute\n\noptional arguments:\n  -l max_rows, --limit max_rows\n                        The maximum number of rows to display. A value of zero\n                        is equivalent to `--output skip`\n  -r <all|local|none>, --refresh <all|local|none>\n                        Force the regeneration of the schema cache file. The\n                        `local` option will only update tables/views created\n                        in the local Spark context.\n  -d name, --dataframe name\n                        Capture dataframe in a local variable named `name`\n  -c, --cache           Cache dataframe\n  -e, --eager           Cache dataframe with eager load\n  -v name, --view name  Create or replace a temporary view named `name`\n  -o <sql|json|html|grid|text|schema|skip|none>, --output <sql|json|html|grid|text|schema|skip|none>\n                        Output format. Defaults to html. The `sql` option\n                        prints the SQL statement that will be executed (useful\n                        to test jinja templated statements)\n  -s, --show-nonprinting\n                        Replace none printable characters with their ascii\n                        codes (LF -> )\n  -j, --jinja           Enable Jinja templating support\n  -t max_cell_length, --truncate max_cell_length\n                        Truncate output\nFile:      /data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/jupyterlab_sql_editor/ipython_magic/sparksql/sparksql.py\n
In\u00a0[24]: Copied!
%%sparksql --limit 1 --output grid\nSELECT\n    id,\n    rand() AS f1,\n    rand() AS f2,\n    rand() AS f3,\n    rand() AS f4,\n    rand() AS f5,\n    TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats\nFROM\nRANGE\n    (1, 400000, 1, 100)\nUNION\nSELECT\n    id,\n    rand() AS f1,\n    rand() AS f2,\n    rand() AS f3,\n    rand() AS f4,\n    rand() AS f5,\n    TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats\nFROM\nRANGE\n    (1, 40000, 1, 100)\n
%%sparksql --limit 1 --output grid SELECT id, rand() AS f1, rand() AS f2, rand() AS f3, rand() AS f4, rand() AS f5, TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats FROM RANGE (1, 400000, 1, 100) UNION SELECT id, rand() AS f1, rand() AS f2, rand() AS f3, rand() AS f4, rand() AS f5, TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats FROM RANGE (1, 40000, 1, 100)
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
only showing top 1 row
Execution time: 10.18 seconds
In\u00a0[26]: Copied!
%%sparksql\nSELECT\n    mes.`first Name`,\n    mes.`last Name`,\n    mes.messages,\n    mes.messages.body,\n    mes.messages.time\nFROM\n    contacts_table AS con\n    INNER JOIN messages_table AS mes ON mes.`first Name` = con.`first Name`\n
%%sparksql SELECT mes.`first Name`, mes.`last Name`, mes.messages, mes.messages.body, mes.messages.time FROM contacts_table AS con INNER JOIN messages_table AS mes ON mes.`first Name` = con.`first Name`
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
first Name last Name messages body time Rack Jackon [{hello, 2022-01-15}, {you there, 2022-01-16}] [hello, you there] [2022-01-15, 2022-01-16]
Execution time: 0.15 seconds
"},{"location":"example/SparkConfigurationUsage/#configuration-and-usage","title":"Configuration and Usage\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#press-tab-to-trigger-auto-completions-and-ctrl-q-to-format-cell","title":"Press tab to trigger auto completions and Ctrl-Q to format cell\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#create-a-temporary-view-with-the-view-option","title":"Create a temporary view with the --view option\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#use-temporary-view-in-subsequent-queries-with-autocomplet-suggestions","title":"Use temporary view in subsequent queries with autocomplete suggestions\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#create-a-dataframe-variable-to-use-in-pypark","title":"Create a dataframe variable to use in pyspark\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#continue-developing-your-query-using-dataframe-api","title":"Continue developing your query using dataframe API\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#edit-sql-within-python-strings","title":"Edit SQL within python strings\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#enjoy-the-same-functionality-as-a-code-cell","title":"Enjoy the same functionality as a code cell\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#long-running-query-show-progress-bar-and-link-to-spark-ui","title":"Long running queries show a progress bar and link to Spark UI\u00b6","text":""},{"location":"example/SparkDataframe/","title":"SparkDataframe","text":"In\u00a0[1]: Copied!
from pyspark.sql import SparkSession\nimport ipywidgets as widgets\nout = widgets.Output()\nwith out:\n    spark = SparkSession.builder.getOrCreate()\n
from pyspark.sql import SparkSession import ipywidgets as widgets out = widgets.Output() with out: spark = SparkSession.builder.getOrCreate() In\u00a0[2]: Copied!
df = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\")\ndf\n
df = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\") df Out[2]:
DataFrame[id: bigint, uuid(): string]
In\u00a0[3]: Copied!
from jupyterlab_sql_editor.ipython.sparkdf import register_display\nfrom jupyterlab_sql_editor.outputters.outputters import _display_results\nregister_display()\n
from jupyterlab_sql_editor.ipython.sparkdf import register_display from jupyterlab_sql_editor.outputters.outputters import _display_results register_display() In\u00a0[4]: Copied!
# change default display behaviour\ndf = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\")\ndf\n
# change default display behaviour df = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\") df
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Open Spark UI \u2b50 pyspark-shell
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Out[4]:
 In\u00a0[5]: Copied! 
pdf = df.limit(1).toPandas()\n
pdf = df.limit(1).toPandas() In\u00a0[6]: Copied!
# _display_results lets you configure the output\n_display_results(pdf, output=\"html\", show_nonprinting=False)\n
# _display_results lets you configure the output _display_results(pdf, output=\"html\", show_nonprinting=False) id uuid() 1 9d977b7e-e4b2-4ce5-9f5b-184054f7542d In\u00a0[7]: Copied!
_display_results(pdf, output=\"text\")\n
_display_results(pdf, output=\"text\")
+---+------------------------------------+\n|id |uuid()                              |\n+---+------------------------------------+\n|1  |9d977b7e-e4b2-4ce5-9f5b-184054f7542d|\n+---+------------------------------------+\n\n
In\u00a0[8]: Copied!
df = spark.read.json(\"file:/path/to/contacts.json\")\n_display_results(pdf, output=\"json\")\n
df = spark.read.json(\"file:/path/to/contacts.json\") _display_results(pdf, output=\"json\")
<IPython.core.display.JSON object>
"},{"location":"example/SparkSQLEscapeControlChars/","title":"Escaping Control Characters","text":"In\u00a0[\u00a0]: Copied!
from pyspark.sql import SparkSession\n\nspark = SparkSession.builder.getOrCreate()\n\n%load_ext jupyterlab_sql_editor.ipython_magic.sparksql\n
from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() %load_ext jupyterlab_sql_editor.ipython_magic.sparksql In\u00a0[2]: Copied!
spark.sql('''\n--start-sparksql\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n--end-sparksql\n''').show()\n
spark.sql(''' --start-sparksql SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab --end-sparksql ''').show()
+------+---+------------+---+---------------+-----------------+-------------------+---------------------+\n|     a|num|   str_array|tab|backslash_and_t|backslash_and_tab|two_backslash_and_t|two_backslash_and_tab|\n+------+---+------------+---+---------------+-----------------+-------------------+---------------------+\n|\n\t\n|  1|[\t123, ab\nc]|  \t|              \t|                \t|                 \\t|                   \\\t|\n+------+---+------------+---+---------------+-----------------+-------------------+---------------------+\n\n
In\u00a0[3]: Copied!
%%sparksql --output html --show-nonprinting\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n
%%sparksql --output html --show-nonprinting SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab
TTL -1 seconds expired, re-generating schema file: /tmp/sparkdb.schema.json\nGenerating schema file: /tmp/sparkdb.schema.json\nSchema file updated: /tmp/sparkdb.schema.json\n
Out[3]: anumstr_arraytabbackslash_and_tbackslash_and_tabtwo_backslash_and_ttwo_backslash_and_tab\\x0d\\x09\\x08\\x0a\\x09\\x0a1['\\t123', 'ab\\nc']\\x09\\t\\\\x09\\\\t\\\\\\x09 In\u00a0[4]: Copied!
%%sparksql --output grid --show-nonprinting\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n
%%sparksql --output grid --show-nonprinting SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab
TTL -1 seconds expired, re-generating schema file: /tmp/sparkdb.schema.json\nGenerating schema file: /tmp/sparkdb.schema.json\nSchema file updated: /tmp/sparkdb.schema.json\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
In\u00a0[5]: Copied!
%%sparksql --output json --show-nonprinting\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n
%%sparksql --output json --show-nonprinting SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab
TTL -1 seconds expired, re-generating schema file: /tmp/sparkdb.schema.json\nGenerating schema file: /tmp/sparkdb.schema.json\nSchema file updated: /tmp/sparkdb.schema.json\n
Out[5]:
<IPython.core.display.JSON object>
"},{"location":"example/SparkSQLEscapeControlChars/#escaping-control-characters","title":"Escaping Control Characters\u00b6","text":""},{"location":"example/SparkSyntaxDemo/","title":"Spark Syntax Demo","text":"In\u00a0[\u00a0]: Copied!
%%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60\n-- cell magic\nSELECT *\nFROM student AS cellmagic\n
%%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60 -- cell magic SELECT * FROM student AS cellmagic In\u00a0[\u00a0]: Copied!
%sparksql -d df SELECT * from student where x=1\n
%sparksql -d df SELECT * from student where x=1 In\u00a0[\u00a0]: Copied!
%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60 SELECT * from student -- line magic using no argument options like --eager\n
%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60 SELECT * from student -- line magic using no argument options like --eager In\u00a0[\u00a0]: Copied!
%sparksql SELECT * FROM tab WHERE x = 1\n\n\n# select is not highlighted anymore, but it does require a blank line (line 3 above)\n%sparksql  --cache   SELECT * FROM tab WHERE x = 2\n\n%sparksql --cache SELECT * FROM tab WHERE x = 3\n
%sparksql SELECT * FROM tab WHERE x = 1 # select is not highlighted anymore, but it does require a blank line (line 3 above) %sparksql  --cache   SELECT * FROM tab WHERE x = 2 %sparksql --cache SELECT * FROM tab WHERE x = 3 In\u00a0[\u00a0]: Copied!
# mix python and SQL\n\n# python import\nimport pyspark\n\n%sparksql SELECT s.age FROM student AS linemagic1 -- line magic in a mix cell\n\n%sparksql SELECT s.age FROM student AS linemagic2 -- another line magic in a mix cell\n\n# a python string with SQL within it\nsql = '''\n--start-sparksql anything here is ignored and is not included in the SQL statement for LSP\nselect s.age from student as frompythonstring\n--end-sparksql\n'''\n\n# back to python\nprint(sql)\n\nspark.sql(sql).show()\n
# mix python and SQL # python import import pyspark %sparksql SELECT s.age FROM student AS linemagic1 -- line magic in a mix cell %sparksql SELECT s.age FROM student AS linemagic2 -- another line magic in a mix cell # a python string with SQL within it sql = ''' --start-sparksql anything here is ignored and is not included in the SQL statement for LSP select s.age from student as frompythonstring --end-sparksql ''' # back to python print(sql) spark.sql(sql).show()"},{"location":"example/SparkSyntaxDemo/#spark-syntax-demo","title":"Spark Syntax Demo\u00b6","text":""},{"location":"example/SupersetJinjaTestHarness/","title":"Superset Tests Harness using Jinja","text":"In\u00a0[2]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.trino\n%config Trino.host='localhost'\n%config Trino.port=8080\n%config Trino.httpScheme='http'\n%config Trino.auth=None\n%config Trino.user='the-user'\n\n%config Trino.cacheTTL=3600\n%config Trino.outputFile=\"/tmp/trinodb.schema.json\"\n\n# comma-separated list of catalogs to cache in the schema file\n%config Trino.catalogs=\"system,tpch\"\n
%load_ext jupyterlab_sql_editor.ipython_magic.trino %config Trino.host='localhost' %config Trino.port=8080 %config Trino.httpScheme='http' %config Trino.auth=None %config Trino.user='the-user' %config Trino.cacheTTL=3600 %config Trino.outputFile=\"/tmp/trinodb.schema.json\" # comma-separated list of catalogs to cache in the schema file %config Trino.catalogs=\"system,tpch\" In\u00a0[3]: Copied!
%%trino --limit 3 --output grid\nSELECT\n    *\nFROM\n      tpch.tiny.orders\n
%%trino --limit 3 --output grid SELECT * FROM tpch.tiny.orders
\n\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Only showing top 3 row(s)\n
In\u00a0[4]: Copied!
# Superset function to retrieve filters\ndef filter_values(column: str, default=None, remove_filter: bool = False):\n    return VALUE_LIST\n\nVALUE_LIST = ['Clerk#00000036', 'Clerk#000000779']\n
# Superset function to retrieve filters def filter_values(column: str, default=None, remove_filter: bool = False): return VALUE_LIST VALUE_LIST = ['Clerk#00000036', 'Clerk#000000779'] In\u00a0[5]: Copied!
%%trino --limit 1 --jinja --output sql\nSELECT\n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ({{ \"'\" + \"','\".join(filter_values('clerk')) + \"'\" }})\n
%%trino --limit 1 --jinja --output sql SELECT * FROM tpch.tiny.orders WHERE orderkey in ({{ \"'\" + \"','\".join(filter_values('clerk')) + \"'\" }})
\n\n
Out[5]:
SELECT \n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ('Clerk#00000036','Clerk#000000779')\n
In\u00a0[10]: Copied!
VALUE_LIST = ['Clerk#00000036', 'Clerk#000000779']\n\n# Superset function to retrieve filters\ndef filter_values(column: str, default=None, remove_filter: bool = False):\n    return VALUE_LIST\n\ndef quote_value(v):\n    if isinstance(v, str):\n        # escape quotes found in value\n        v = v.replace(\"'\", \"''\")\n        # quote string values\n        v = f\"'{v}'\"\n    return str(v)\n\ndef sql_filter_value_list(column: str, default=None):\n    \"\"\"\n        Build the SQL string representation of a list of values,\n        taking the value type into consideration. Strings get quoted\n        but numbers do not. Quotes within strings are escaped.\n    \"\"\"\n    values = filter_values(column, default, True)\n    if len(values) > 0:\n        quoted_values = [quote_value(v) for v in values]\n        return \",\".join(quoted_values)\n    return None\n
VALUE_LIST = ['Clerk#00000036', 'Clerk#000000779'] # Superset function to retrieve filters def filter_values(column: str, default=None, remove_filter: bool = False): return VALUE_LIST def quote_value(v): if isinstance(v, str): # escape quotes found in value v = v.replace(\"'\", \"''\") # quote string values v = f\"'{v}'\" return str(v) def sql_filter_value_list(column: str, default=None): \"\"\" Build the SQL string representation of a list of values, taking the value type into consideration. Strings get quoted but numbers do not. Quotes within strings are escaped. \"\"\" values = filter_values(column, default, True) if len(values) > 0: quoted_values = [quote_value(v) for v in values] return \",\".join(quoted_values) return None In\u00a0[11]: Copied!
%%trino --limit 1 --jinja --output sql\nSELECT\n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ({{sql_filter_value_list('clerk')}})\n
%%trino --limit 1 --jinja --output sql SELECT * FROM tpch.tiny.orders WHERE orderkey in ({{sql_filter_value_list('clerk')}})
\n\n
Out[11]:
SELECT \n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ('Clerk#00000036','Clerk#000000779')\n
In\u00a0[\u00a0]: Copied!
\n
In\u00a0[\u00a0]: Copied!
\n
"},{"location":"example/SupersetJinjaTestHarness/#superset-tests-harness-using-jinja","title":"Superset Tests Harness using Jinja\u00b6","text":""},{"location":"example/SupersetJinjaTestHarness/#elaborate-a-function-to-build-list-of-values","title":"Elaborate a function to build list of values\u00b6","text":"

This function can then be registered in Superset and re-used in virtual datasets.

"},{"location":"example/TrinoConfigurationUsage/","title":"Configuration and Usage","text":"

Normally IPython only displays the output of the last statement. However, it can be handy to run multiple SQL magics in a single cell and see the output of each execution. Setting ast_node_interactivity to all enables that.

In\u00a0[1]: Copied!
# Display all cell outputs in notebook\nfrom IPython.core.interactiveshell import InteractiveShell\nInteractiveShell.ast_node_interactivity = 'all'\n
# Display all cell outputs in notebook from IPython.core.interactiveshell import InteractiveShell InteractiveShell.ast_node_interactivity = 'all' In\u00a0[2]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.trino\n
%load_ext jupyterlab_sql_editor.ipython_magic.trino In\u00a0[3]: Copied!
%config Trino.host='localhost'\n%config Trino.port=8080\n%config Trino.httpScheme='http'\n%config Trino.auth=None\n%config Trino.user='the-user'\n\n%config Trino.cacheTTL=3600\n%config Trino.outputFile=\"/tmp/trinodb.schema.json\"\n\n# comma-separated list of catalogs to cache in the schema file\n%config Trino.catalogs=\"system,tpch\"\n
%config Trino.host='localhost' %config Trino.port=8080 %config Trino.httpScheme='http' %config Trino.auth=None %config Trino.user='the-user' %config Trino.cacheTTL=3600 %config Trino.outputFile=\"/tmp/trinodb.schema.json\" # comma-separated list of catalogs to cache in the schema file %config Trino.catalogs=\"system,tpch\"

In a production environment you will want to pass in an authentication object:

import trino\n%config Trino.auth=trino.auth.BasicAuthentication(\"principal id\", \"password\")\n%config Trino.user=None\n

See https://github.com/trinodb/trino-python-client/blob/master/trino/auth.py for more details.

In\u00a0[4]: Copied!
%trino --refresh all\n
%trino --refresh all
Exporting functions: [########################################] 100.0%\nSchema file updated: /tmp/trinodb.schema.json\n
In\u00a0[5]: Copied!
%trino SELECT 'hello'\n
%trino SELECT 'hello'
\n\n
_col0hello In\u00a0[6]: Copied!
#%trino SHOW CATALOGS\n
#%trino SHOW CATALOGS In\u00a0[7]: Copied!
%%trino --limit 2 --output sql\nSELECT *\nFROM\n    tpch.tiny.orders AS ord\n
%%trino --limit 2 --output sql SELECT * FROM tpch.tiny.orders AS ord
\n\n
Out[7]:
SELECT *\nFROM\n    tpch.tiny.orders AS ord\n
In\u00a0[8]: Copied!
%%trino --limit 2 --dataframe x --output grid\nSELECT\n    ord.orderkey,\n    ord.custkey,\n    ord.orderstatus,\n    ord.totalprice,\n    ord.orderdate,\n    ord.orderpriority,\n    ord.clerk,\n    ord.shippriority,\n    ord.comment\nFROM\n    tpch.tiny.orders AS ord\n
%%trino --limit 2 --dataframe x --output grid SELECT ord.orderkey, ord.custkey, ord.orderstatus, ord.totalprice, ord.orderdate, ord.orderpriority, ord.clerk, ord.shippriority, ord.comment FROM tpch.tiny.orders AS ord
\n\nSaved results to pandas dataframe named `x`\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Only showing top 2 row(s)\n
In\u00a0[9]: Copied!
%%trino --catalog tpch --schema sf1000\n\nSELECT * FROM lineitem\n
%%trino --catalog tpch --schema sf1000 SELECT * FROM lineitem
tpch\nsf1000\n
orderkeypartkeysuppkeylinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment3750000001756054533105475139.052832.130.070.07AF1994-03-301994-05-231994-04-04TAKE BACK RETURNAIRdolites above the even, b37500000011008318115831832215.026066.550.090.05AF1994-03-191994-04-161994-03-25COLLECT CODRAIL regular dependencies. entici37500000021288501128850113149.051727.830.020.07AF1995-03-081995-01-261995-03-09TAKE BACK RETURNRAILly alongside of the re3750000002183830552883058925.07366.80.040.08RF1994-12-181995-01-021995-01-09NONEMAILusly regular accoun37500000024008246482465344.063556.240.020.04RF1994-12-201995-01-211994-12-26COLLECT CODRAILt have to wake blithely r37500000026595701995703244.04290.880.050.07RF1995-02-031995-01-131995-02-24TAKE BACK RETURNSHIPges cajole furiously across the sl37500000021396112322111246513.014771.250.060.08AF1994-12-131995-02-261995-01-02COLLECT CODREG AIRuests sleep furiously slyly special excus3750000002459695133469526644.069529.680.010.06RF1995-01-201995-01-301995-02-08NONEAIRironic braids across t37500000031484709158470916138.071382.620.070.07AF1993-10-021993-09-191993-10-20TAKE BACK RETURNAIRke blithely. furiously bold accounts dete37500000031506635723163588247.071817.880.080.06AF1993-09-061993-11-051993-09-19COLLECT CODAIRnding orbits ought to nag evenly express s37500000032641545264154637.010404.870.010.08RF1993-11-281993-09-201993-12-07COLLECT CODSHIPnding warhorses wake slyly instr37500000031964445431444582424.035465.280.050.03RF1993-09-181993-09-261993-10-18COLLECT CODAIReans: carefully express a375000000468826735632675419.014924.610.10.07AF1993-07-271993-09-201993-07-29TAKE BACK RETURNMAILar foxes. 375000000417092818992819026.07251.840.040.04AF1993-10-301993-08-251993-11-02DELIVER IN PERSONMAILiously up the final notornis. depe37500000051567707611770792126.047422.180.040.02RF1993-12-101993-11-011993-12-26COLLECT CODREG AIR furiously final deposits. 
fluffily375000000546178756117876525.09162.250.020.05AF1993-10-121993-11-051993-10-14DELIVER IN PERSONMAILside the always special accounts37500000051263431026343103330.034163.70.050.07AF1993-11-251993-11-121993-11-30DELIVER IN PERSONFOBly. slyly regular dolphins cajole blithel37500000054849922299922748.09750.40.040.02AF1993-12-011993-11-021993-12-21TAKE BACK RETURNTRUCKfluffily even dependencies. reg37500000061465966339096648146.079226.260.00.01RF1994-03-121994-02-161994-04-05DELIVER IN PERSONFOB pending requests. 3750000006496102627110275238.044451.640.00.03RF1994-01-231994-01-031994-02-03DELIVER IN PERSONREG AIRong the regular, express packages
Only showing top 20 row(s)\n
In\u00a0[10]: Copied!
%%trino\nSELECT\n    lin.orderkey,\n    lin.partkey,\n    lin.suppkey,\n    lin.linenumber,\n    lin.quantity,\n    lin.extendedprice,\n    lin.discount,\n    lin.tax,\n    lin.returnflag,\n    lin.linestatus,\n    lin.shipdate,\n    lin.commitdate,\n    lin.receiptdate,\n    lin.shipinstruct,\n    lin.shipmode,\n    lin.comment,\n    ord.orderpriority\nFROM\n    tpch.sf1000.lineitem AS lin\n    INNER JOIN tpch.sf1.orders AS ord ON ord.orderkey = lin.orderkey\n
%%trino SELECT lin.orderkey, lin.partkey, lin.suppkey, lin.linenumber, lin.quantity, lin.extendedprice, lin.discount, lin.tax, lin.returnflag, lin.linestatus, lin.shipdate, lin.commitdate, lin.receiptdate, lin.shipinstruct, lin.shipmode, lin.comment, ord.orderpriority FROM tpch.sf1000.lineitem AS lin INNER JOIN tpch.sf1.orders AS ord ON ord.orderkey = lin.orderkey
\n\n
orderkeypartkeysuppkeylinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecommentorderpriority11551893457689361117.024252.030.040.02NO1996-03-131996-02-121996-03-22DELIVER IN PERSONTRUCKegular courts above the5-LOW1673090807309081236.039085.920.090.06NO1996-04-121996-02-281996-04-20TAKE BACK RETURNMAILly final dependencies: slyly bold 5-LOW163699776369977738.014180.720.10.02NO1996-01-291996-03-051996-01-31TAKE BACK RETURNREG AIRriously. regular, express dep5-LOW121314954631496428.042738.920.090.06NO1996-04-211996-03-301996-05-16NONEAIRlites. fluffily even de5-LOW1240266341526641524.037426.320.10.04NO1996-03-301996-03-141996-04-01NONEFOB pending foxes. slyly re5-LOW115634450634453632.044277.440.070.02NO1996-01-301996-02-071996-02-03DELIVER IN PERSONMAILarefully slyly ex5-LOW21061697221169743138.067883.960.00.05NO1997-01-281997-01-141997-02-02TAKE BACK RETURNRAILven requests. deposits breach a1-URGENT342969621796963145.088143.750.060.0RF1994-02-021994-01-041994-02-23NONEAIRongside of the furiously brave acco5-LOW3190354296535433249.066810.030.10.0RF1993-11-091993-12-201993-11-24TAKE BACK RETURNRAIL unusual accounts. eve5-LOW31284482293448254327.031611.60.060.07AF1994-01-161993-11-221994-01-23DELIVER IN PERSONSHIPnal foxes wake. 5-LOW329379610187961342.03376.30.010.06AF1993-12-041994-01-071994-01-01NONETRUCKy. fluffily pending d5-LOW3183094077594132528.029733.760.040.0RF1993-12-141994-01-101994-01-01TAKE BACK RETURNFOBages nag slyly pending5-LOW3621425919642610626.042392.740.10.02AF1993-10-291993-12-181993-11-04TAKE BACK RETURNRAILges sleep after the caref5-LOW4880346845534709130.048428.40.030.08NO1996-01-101995-12-141996-01-18DELIVER IN PERSONREG AIR- quickly regular packages sleep. 
idly5-LOW51085692838569284115.020202.90.020.04RF1994-10-311994-08-311994-11-20NONEAIRts wake furiously 5-LOW51239267893926790226.047049.340.070.08RF1994-10-161994-09-251994-10-19NONEFOBsts use slyly quickly special instruc5-LOW53753018030184350.060415.50.080.03AF1994-08-081994-10-131994-08-26DELIVER IN PERSONAIReodolites. fluffily unusual5-LOW61396354552135469137.051188.390.080.03AF1992-04-271992-05-151992-05-02TAKE BACK RETURNTRUCKp furiously special foxes4-NOT SPECIFIED71820518399551894112.021380.760.070.03NO1996-05-071996-03-131996-06-03TAKE BACK RETURNFOBss pinto beans wake against th2-HIGH7145242743774275829.015106.320.080.08NO1996-02-011996-03-021996-02-19TAKE BACK RETURNSHIPes. instructions2-HIGH
Only showing top 20 row(s)\n
In\u00a0[11]: Copied!
%%trino --output json\nSELECT\n    1 AS a,\n    'abc' as b,\n    1.2 as c,\n    ARRAY[1,2] as d,\n    ARRAY[1, null, 4] as e,\n    ARRAY[ARRAY[1,2],ARRAY[5,4]] as f,\n    CAST(ROW(1,23,456) as ROW(k1 INT, k2 INT, k3 INT)) as g,\n    CAST(ROW(1,'abc',true,null) as ROW(k1 INT, k2 VARCHAR, k3 BOOLEAN, k4 VARCHAR)) as h\n
%%trino --output json SELECT 1 AS a, 'abc' as b, 1.2 as c, ARRAY[1,2] as d, ARRAY[1, null, 4] as e, ARRAY[ARRAY[1,2],ARRAY[5,4]] as f, CAST(ROW(1,23,456) as ROW(k1 INT, k2 INT, k3 INT)) as g, CAST(ROW(1,'abc',true,null) as ROW(k1 INT, k2 VARCHAR, k3 BOOLEAN, k4 VARCHAR)) as h
\n\n
<IPython.core.display.JSON object>
In\u00a0[12]: Copied!
%%trino?\n
%%trino?
Docstring:\n::\n\n  %trino [-l max_rows] [-r all|local|none] [-d name]\n             [-o sql|json|html|grid|text|skip|none] [-s] [-x] [-c catalogname]\n             [-m schemaname] [-j]\n             [sql [sql ...]]\n\nMagic that works both as %trino and as %%trino\n\npositional arguments:\n  sql                   SQL statement to execute\n\noptional arguments:\n  -l max_rows, --limit max_rows\n                        The maximum number of rows to display. A value of zero\n                        is equivalent to `--output skip`\n  -r <all|local|none>, --refresh <all|local|none>\n                        Force the regeneration of the schema cache file. The\n                        `local` option will only update tables/views created\n                        in the local Spark context.\n  -d name, --dataframe name\n                        Capture results in pandas dataframe\n  -o <sql|json|html|grid|text|skip|none>, --output <sql|json|html|grid|text|skip|none>\n                        Output format. Defaults to html. The `sql` option\n                        prints the SQL statement that will be executed (useful\n                        to test jinja templated statements)\n  -s, --show-nonprinting\n                        Replace none printable characters with their ascii\n                        codes (LF -> )\n  -x, --raw             Run statement as is. Do not wrap statement with a\n                        limit. Use this option to run statement which can't be\n                        wrapped in a SELECT/LIMIT statement. For example\n                        EXPLAIN, SHOW TABLE, SHOW CATALOGS.\n  -c catalogname, --catalog catalogname\n                        Trino catalog to use\n  -m schemaname, --schema schemaname\n                        Trino schema to use\n  -j, --jinja           Enable Jinja templating support\nFile:      /data/dev/jupyterlab-sql-editor/ipython_magic/trino/trino.py\n
"},{"location":"example/TrinoConfigurationUsage/#configuration-and-usage","title":"Configuration and Usage\u00b6","text":""},{"location":"example/TrinoJinjaTemplate/","title":"Jinja Templating with trino","text":"In\u00a0[1]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.trino\n\n%config Trino.host='localhost'\n%config Trino.port=8080\n%config Trino.httpScheme='http'\n%config Trino.auth=None\n%config Trino.user='the-user'\n\n%config Trino.cacheTTL=3600\n%config Trino.outputFile=\"/tmp/trinodb.schema.json\"\n\n# comma-separated list of catalogs to cache in the schema file\n%config Trino.catalogs=\"system,tpch\"\n
%load_ext jupyterlab_sql_editor.ipython_magic.trino %config Trino.host='localhost' %config Trino.port=8080 %config Trino.httpScheme='http' %config Trino.auth=None %config Trino.user='the-user' %config Trino.cacheTTL=3600 %config Trino.outputFile=\"/tmp/trinodb.schema.json\" # comma-separated list of catalogs to cache in the schema file %config Trino.catalogs=\"system,tpch\" In\u00a0[2]: Copied!
table_name = \"tpch.tiny.orders\"\n
table_name = \"tpch.tiny.orders\" In\u00a0[3]: Copied!
%%trino --limit 3 --jinja\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --limit 3 --jinja SELECT * FROM {{ table_name }}
\n\nOnly showing top 3 row(s)\n
Out[3]: orderkeycustkeyorderstatustotalpriceorderdateorderpriorityclerkshipprioritycomment7492511O64295.071996-06-265-LOWClerk#0000008100f the pending deposits. express, ironic deposits7493205F73649.681994-12-182-HIGHClerk#0000002790riously even instructions haggle agains74941231O68212.311996-04-024-NOT SPECIFIEDClerk#0000002440fily express packages. blithely regular requests across In\u00a0[4]: Copied!
%%trino --output sql --jinja\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --output sql --jinja SELECT * FROM {{ table_name }}
\n\n
Out[4]:
SELECT\n    *\nFROM\n    tpch.tiny.orders\n
In\u00a0[5]: Copied!
%%trino --limit 3 --jinja\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --limit 3 --jinja /* this is a comment which happens to contain a jinja template variable {{x}} */ SELECT * FROM {{ table_name }}
\n\n\nA Jinja template variable named {x} was located in your SQL statement.\n\nHowever Jinja was unable to substitute it's value because the variable \"x\" was not found in your ipython kernel.\n\nOption 1: If you intended to use a template variable make sure to assign a value to \"x\"\n\n\nOption 2: If you intended to include \"{{\" in your statement then you'll need to escape this special Jinja variable delimiter.\n\nTo have Jinja ignore parts it would otherwise handle as variables or blocks. For example, if, with the default syntax, you want to use {{ as a raw string in a template and not start a variable, you have to use a trick.\n\nThe easiest way to output a literal variable delimiter \"{{\" is by using a variable expression:\n\n{{ '{{' }}\n\nFor bigger sections, it makes sense to mark a block raw. For example, to include example Jinja syntax in a template, you can use this snippet:\n\n%%trino --limit 3\n{% raw %}\n/*\nThis is a comment which happens to contain a jinja template\nvariable {{x}} that we want to keep as is.\n*/\n{% endraw %}\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n\n\nRaising an error to prevent statement from being executed incorrectly.\n
\n---------------------------------------------------------------------------\nUndefinedError                            Traceback (most recent call last)\n/tmp/ipykernel_18939/3769332149.py in <module>\n----> 1 get_ipython().run_cell_magic('trino', '--limit 3 --jinja', '/*\\nthis is a comment which happens to contain a jinja template\\nvariable {{x}}\\n*/\\n\\nSELECT\\n    *\\nFROM\\n    {{ table_name }}\\n')\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)\n   2417             with self.builtin_trap:\n   2418                 args = (magic_arg_s, cell)\n-> 2419                 result = fn(*args, **kwargs)\n   2420             return result\n   2421 \n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/decorator.py in fun(*args, **kw)\n    230             if not kwsyntax:\n    231                 args, kw = fix(args, kw, sig)\n--> 232             return caller(func, *(extras + args), **kw)\n    233     fun.__name__ = func.__name__\n    234     fun.__doc__ = func.__doc__\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)\n    185     # but it's overkill for just that one bit of state.\n    186     def magic_deco(arg):\n--> 187         call = lambda f, *a, **k: f(*a, **k)\n    188 \n    189         if callable(arg):\n\n/data/dev/jupyterlab-sql-editor/ipython_magic/trino/trino.py in trino(self, line, cell, local_ns)\n     76                 print(f'Invalid refresh option given {args.refresh}. 
Valid refresh options are [all|local|none]')\n     77 \n---> 78         sql = self.get_sql_statement(cell, args.sql, args.jinja)\n     79         if not sql:\n     80             return\n\n/data/dev/jupyterlab-sql-editor/ipython_magic/common/base.py in get_sql_statement(self, cell, sql_argument, use_jinja)\n     93             print('No sql statement to execute')\n     94         elif use_jinja:\n---> 95             sql = self.bind_variables(sql, self.user_ns)\n     96         return sql\n     97 \n\n/data/dev/jupyterlab-sql-editor/ipython_magic/common/base.py in bind_variables(query, user_ns)\n     78     def bind_variables(query, user_ns):\n     79         template = Template(query, undefined=ExplainUndefined)\n---> 80         return template.render(user_ns)\n     81 \n     82     def get_catalog_array(self):\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/jinja2/environment.py in render(self, *args, **kwargs)\n   1289             return concat(self.root_render_func(ctx))  # type: ignore\n   1290         except Exception:\n-> 1291             self.environment.handle_exception()\n   1292 \n   1293     async def render_async(self, *args: t.Any, **kwargs: t.Any) -> str:\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/jinja2/environment.py in handle_exception(self, source)\n    923         from .debug import rewrite_traceback_stack\n    924 \n--> 925         raise rewrite_traceback_stack(source=source)\n    926 \n    927     def join_path(self, template: str, parent: str) -> str:\n\n<template> in top-level template code()\n\n/data/dev/jupyterlab-sql-editor/ipython_magic/common/base.py in __str__(self)\n     58         print(HOW_TO_ESCAPE_MSG)\n     59         print(RAISING_ERROR_MSG)\n---> 60         return super().__str__(self)\n     61 \n     62 \n\nUndefinedError: 'x' is undefined
In\u00a0[\u00a0]: Copied!
%%trino --limit 3 --jinja\n{% raw %}\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n{% endraw %}\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --limit 3 --jinja {% raw %} /* this is a comment which happens to contain a jinja template variable {{x}} */ {% endraw %} SELECT * FROM {{ table_name }} In\u00a0[\u00a0]: Copied!
%%trino --output sql --jinja\n{% raw %}\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n{% endraw %}\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --output sql --jinja {% raw %} /* this is a comment which happens to contain a jinja template variable {{x}} */ {% endraw %} SELECT * FROM {{ table_name }} In\u00a0[\u00a0]: Copied!
def get_filters():\n    return 1\n
def get_filters(): return 1 In\u00a0[\u00a0]: Copied!
%%trino --output sql --jinja\n{% raw %}\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n{% endraw %}\n\n{% set testing = get_filters() %}\n{{testing}}\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --output sql --jinja {% raw %} /* this is a comment which happens to contain a jinja template variable {{x}} */ {% endraw %} {% set testing = get_filters() %} {{testing}} SELECT * FROM {{ table_name }} In\u00a0[16]: Copied!
%%trino --limit 1 --output grid\nSELECT\n    *\nFROM (\n    SELECT\n        ARRAY[1,2,3] as an_array,\n        *\n    FROM\n      tpch.tiny.orders\n)\n
%%trino --limit 1 --output grid SELECT * FROM ( SELECT ARRAY[1,2,3] as an_array, * FROM tpch.tiny.orders )
\n\nOnly showing top 1 row(s)\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
In\u00a0[29]: Copied!
\n
In\u00a0[30]: Copied!
\n
\n\n
Out[30]:
SELECT \n    *\nFROM (\n    SELECT\n        ARRAY[1,2,3] as an_array,\n        *\n    FROM\n      tpch.tiny.orders\n)\n\n\n\n  \nWHERE\n  \n    1=0\n    \n      OR (contains(an_array, '1'))\n    \n      OR (contains(an_array, '2'))\n    \n  \n
"},{"location":"example/TrinoJinjaTemplate/#jinja-templating-with-trino","title":"Jinja Templating with trino\u00b6","text":""}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"jupyterlab-sql-editor","text":"

A JupyterLab extension providing the following features via %sparksql and %trino magics:

"},{"location":"sparksql/","title":"sparksql magic","text":"

A JupyterLab extension providing the following features via %%sparksql and %%trino magics:

"},{"location":"sparksql/#execute-and-output-your-query-results-into-an-interactive-data-grid","title":"Execute and output your query results into an interactive data grid","text":""},{"location":"sparksql/#output-as-json","title":"Output as JSON","text":""},{"location":"sparksql/#auto-suggest-column-names-and-sub-fields","title":"Auto suggest column names and sub-fields","text":""},{"location":"sparksql/#auto-suggest-joins-on-matching-column-names","title":"Auto suggest JOINs on matching column names","text":""},{"location":"sparksql/#format-and-show-syntax-highlighting-in-notebook-code-cells","title":"Format and show syntax highlighting in Notebook code cells","text":"

To format SQL statements in the cell, right-click in the cell and select Format Sql Cell or hit <CTRL+Q>.

"},{"location":"sparksql/#works-in-python-strings","title":"Works in Python strings","text":"

While inside a notebook, you can have a multi-line Python string containing SQL and enjoy the same features (syntax highlighting, code completion and SQL formatting) as in a sparksql cell by wrapping your string with --start-sparksql and --end-sparksql. Here is an example:

# declare a python string\nsql = \"\"\"\n--start-sparksql\nSELECT\n    *\nFROM\n    table AS t\n--end-sparksql\n\"\"\"\nprint(sql)\n

"},{"location":"sparksql/#capture-your-spark-query-as-a-dataframe-or-a-temporary-view","title":"Capture your Spark query as a Dataframe or a temporary view","text":""},{"location":"sparksql/#usage","title":"Usage","text":"

Parameter usage example:

%%sparksql -c -l 10 --dataframe df\n<QUERY>\n
Parameter Description --database NAME Spark database to use. -l LIMIT, --limit LIMIT The maximum number of rows to display. A value of zero is equivalent to --output skip -r all|local|none, --refresh all|local|none Force the regeneration of the schema cache file. The local option will only update tables/views created in the local Spark context. -d NAME, --dataframe NAME Capture results in a Spark dataframe named NAME. -c, --cache Cache dataframe. -e, --eager Cache dataframe with eager load. -v VIEW, --view VIEW Create or replace a temporary view named VIEW. -o sql|json|html|aggrid|grid|text|schema|skip|none, --output sql|json|html|aggrid|grid|text|schema|skip|none Output format. Defaults to html. The sql option prints the SQL statement that will be executed (useful to test jinja templated statements). -s, --show-nonprinting Replace non-printable characters with their ASCII codes (LF -> \\x0a). -j, --jinja Enable Jinja templating support. -b, --dbt Enable DBT templating support. -t LIMIT, --truncate LIMIT Truncate output. -m update|complete, --streaming_mode update|complete The mode of streaming queries. -x, --lean-exceptions Shortened exceptions. Might be helpful if the exceptions reported by Spark are noisy such as with big SQL queries."},{"location":"trino/","title":"trino magic","text":"

A JupyterLab extension providing the following features via %%sparksql and %%trino magics:

"},{"location":"trino/#use-jinja-templating-to-create-re-usable-sql","title":"Use jinja templating to create re-usable SQL","text":""},{"location":"trino/#usage","title":"Usage","text":"

Parameter usage example:

%%trino -c catalog -l 10 --dataframe df\n<QUERY>\n
Parameter Description -c NAME, --catalog NAME Trino catalog to use. -s NAME, --schema NAME Trino schema to use. -l LIMIT, --limit LIMIT The maximum number of rows to display. A value of zero is equivalent to --output skip -r all|none, --refresh all|none Force the regeneration of the schema cache file. -d NAME, --dataframe NAME Capture results in a pandas dataframe named NAME. -o sql|json|html|aggrid|grid|text|schema|skip|none, --output sql|json|html|aggrid|grid|text|schema|skip|none Output format. Defaults to html. The sql option prints the SQL statement that will be executed (useful to test jinja templated statements). -s, --show-nonprinting Replace non-printable characters with their ASCII codes (LF -> \\x0a). -j, --jinja Enable Jinja templating support. -t LIMIT, --truncate LIMIT Truncate output. -x STATEMENT, --raw STATEMENT Run the statement as is. Do not wrap the statement with a limit. Use this option to run statements which can't be wrapped in a SELECT/LIMIT statement, for example EXPLAIN, SHOW TABLE, SHOW CATALOGS."},{"location":"example/SparkConfigurationUsage/","title":"Configuration and Usage","text":"In\u00a0[1]: Copied!
from pyspark.sql import SparkSession\n
from pyspark.sql import SparkSession In\u00a0[2]: Copied!
import ipywidgets as widgets\nout = widgets.Output()\nwith out:\n    spark = SparkSession.builder.getOrCreate()\n
import ipywidgets as widgets out = widgets.Output() with out: spark = SparkSession.builder.getOrCreate()

Normally IPython only displays the output of the last statement. However, it can be handy to run multiple SQL magics in a single cell and see the output of each execution. Setting ast_node_interactivity to all enables that.

In\u00a0[3]: Copied!
from IPython.core.interactiveshell import InteractiveShell\nInteractiveShell.ast_node_interactivity = 'all'\n
from IPython.core.interactiveshell import InteractiveShell InteractiveShell.ast_node_interactivity = 'all' In\u00a0[4]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.sparksql\n
%load_ext jupyterlab_sql_editor.ipython_magic.sparksql In\u00a0[5]: Copied!
%config SparkSql.cacheTTL=3600\n%config SparkSql.outputFile=\"/tmp/sparkdb.schema.json\"\n
%config SparkSql.cacheTTL=3600 %config SparkSql.outputFile=\"/tmp/sparkdb.schema.json\" In\u00a0[6]: Copied!
df = spark.read.json(\"file:/path/to/contacts.json\")\ndf.createOrReplaceTempView(\"CONTACTS_TABLE\")\ndf.printSchema()\n
df = spark.read.json(\"file:/path/to/contacts.json\") df.createOrReplaceTempView(\"CONTACTS_TABLE\") df.printSchema()
root\n |-- address: struct (nullable = true)\n |    |-- city: string (nullable = true)\n |    |-- postalCode: string (nullable = true)\n |    |-- state: string (nullable = true)\n |    |-- streetAddress: string (nullable = true)\n |-- age: long (nullable = true)\n |-- first Name: string (nullable = true)\n |-- last Name: string (nullable = true)\n |-- phoneNumbers: array (nullable = true)\n |    |-- element: struct (containsNull = true)\n |    |    |-- number: string (nullable = true)\n |    |    |-- type: string (nullable = true)\n\n
In\u00a0[7]: Copied!
df = spark.read.json(\"file:/path/to/conversations.json\")\ndf.createOrReplaceTempView(\"MESSAGES_TABLE\")\ndf.printSchema()\n
df = spark.read.json(\"file:/path/to/conversations.json\") df.createOrReplaceTempView(\"MESSAGES_TABLE\") df.printSchema()
root\n |-- first Name: string (nullable = true)\n |-- last Name: string (nullable = true)\n |-- messages: array (nullable = true)\n |    |-- element: struct (containsNull = true)\n |    |    |-- body: string (nullable = true)\n |    |    |-- time: string (nullable = true)\n\n
In\u00a0[8]: Copied!
%sparksql --refresh all\n
%sparksql --refresh all
Exporting functions: [########################################] 100.0%\nSchema file updated: /tmp/sparkdb.schema.json\n
In\u00a0[9]: Copied!
%sparksql SHOW TABLES\n
%sparksql SHOW TABLES
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
namespacetableNameisTemporary contacts_tabletrue messages_tabletrue
Execution time: 0.24 seconds
In\u00a0[10]: Copied!
%%sparksql --output grid --limit 1000\nSELECT\n    id,\n    uuid()\nFROM\n    RANGE (1, 1000)\n
%%sparksql --output grid --limit 1000 SELECT id, uuid() FROM RANGE (1, 1000)
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Execution time: 1.86 seconds
In\u00a0[11]: Copied!
%%sparksql --output html --limit 3\n\nSELECT\n    con.`first Name`,\n    con.phoneNumbers [ 0 ].type as primary_number,\n    array_contains(con.phoneNumbers.type, 'home') as flag\nFROM\n    contacts_table AS con\n
%%sparksql --output html --limit 3 SELECT con.`first Name`, con.phoneNumbers [ 0 ].type as primary_number, array_contains(con.phoneNumbers.type, 'home') as flag FROM contacts_table AS con
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
first Nameprimary_numberflag Rackhometrue
Execution time: 0.19 seconds
In\u00a0[12]: Copied!
%%sparksql --output json --limit 3\nSELECT\n    *\nFROM\n    contacts_table AS con\n
%%sparksql --output json --limit 3 SELECT * FROM contacts_table AS con
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
<IPython.core.display.JSON object>
Execution time: 0.19 seconds
In\u00a0[13]: Copied!
%%sparksql --output schema\nSELECT\n    *\nFROM\n    contacts_table AS con\n
%%sparksql --output schema SELECT * FROM contacts_table AS con
root\n |-- address: struct (nullable = true)\n |    |-- city: string (nullable = true)\n |    |-- postalCode: string (nullable = true)\n |    |-- state: string (nullable = true)\n |    |-- streetAddress: string (nullable = true)\n |-- age: long (nullable = true)\n |-- first Name: string (nullable = true)\n |-- last Name: string (nullable = true)\n |-- phoneNumbers: array (nullable = true)\n |    |-- element: struct (containsNull = true)\n |    |    |-- number: string (nullable = true)\n |    |    |-- type: string (nullable = true)\n\n
In\u00a0[14]: Copied!
%%sparksql --view the_exploded_table --output skip\nSELECT\n    *,\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n
%%sparksql --view the_exploded_table --output skip SELECT *, explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con
Created temporary view `the_exploded_table`\nQuery execution skipped\n
In\u00a0[15]: Copied!
%sparksql SHOW TABLES\n
%sparksql SHOW TABLES
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
namespacetableNameisTemporary contacts_tabletrue messages_tabletrue the_exploded_tabletrue
Execution time: 0.08 seconds
In\u00a0[16]: Copied!
%%sparksql\nSELECT\n    *\nFROM\n    the_exploded_table AS the\n
%%sparksql SELECT * FROM the_exploded_table AS the
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
addressagefirst Namelast NamephoneNumbersphoneNumber {San Jone, 394221, CA, 126}24RackJackon[{7383627627, home}]{7383627627, home}
Execution time: 0.25 seconds
In\u00a0[17]: Copied!
%%sparksql --output text\nSELECT\n    *\nFROM\n    the_exploded_table AS the\n
%%sparksql --output text SELECT * FROM the_exploded_table AS the
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
+---------------------------+---+----------+---------+--------------------+------------------+\n|                    address|age|first Name|last Name|        phoneNumbers|       phoneNumber|\n+---------------------------+---+----------+---------+--------------------+------------------+\n|{San Jone, 394221, CA, 126}| 24|      Rack|   Jackon|[{7383627627, home}]|{7383627627, home}|\n+---------------------------+---+----------+---------+--------------------+------------------+
Execution time: 0.09 seconds
In\u00a0[18]: Copied!
%%sparksql --output sql\nSELECT\n    *\nFROM\n    the_exploded_table AS the\n
%%sparksql --output sql SELECT * FROM the_exploded_table AS the Out[18]:
SELECT\n    *\nFROM\n    the_exploded_table AS the\n
In\u00a0[19]: Copied!
%%sparksql --dataframe the_exploded_dataframe --output skip\nSELECT\n    *,\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n
%%sparksql --dataframe the_exploded_dataframe --output skip SELECT *, explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con
Captured dataframe to local variable `the_exploded_dataframe`\nQuery execution skipped\n
In\u00a0[20]: Copied!
the_exploded_dataframe.select('phoneNumber').show()\n
the_exploded_dataframe.select('phoneNumber').show()
+------------------+\n|       phoneNumber|\n+------------------+\n|{7383627627, home}|\n+------------------+\n\n
In\u00a0[27]: Copied!
# declare a python string\nsql = '''\n--start-sparksql\nSELECT\n    *, con.`first Name`\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n--end-sparksql\n'''\nprint(sql)\n
# declare a python string sql = ''' --start-sparksql SELECT *, con.`first Name` explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con --end-sparksql ''' print(sql)
\n--start-sparksql\nSELECT\n    *, con.`first Name`\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n--end-sparksql\n\n
In\u00a0[\u00a0]: Copied!
# declare a python string\nsql = '''\n--start-sparksql\nSELECT\n    *,\n    explode(con.phoneNumbers) as phoneNumber\nFROM\n    contacts_table AS con\n--end-sparksql\n'''\nprint(sql)\n
# declare a python string sql = ''' --start-sparksql SELECT *, explode(con.phoneNumbers) as phoneNumber FROM contacts_table AS con --end-sparksql ''' print(sql) In\u00a0[22]: Copied!
spark.sql(sql).show()\n
spark.sql(sql).show()
+--------------------+---+----------+---------+--------------------+------------------+\n|             address|age|first Name|last Name|        phoneNumbers|       phoneNumber|\n+--------------------+---+----------+---------+--------------------+------------------+\n|{San Jone, 394221...| 24|      Rack|   Jackon|[{7383627627, home}]|{7383627627, home}|\n+--------------------+---+----------+---------+--------------------+------------------+\n\n
In\u00a0[23]: Copied!
%%sparksql?\n
%%sparksql?
Docstring:\n::\n\n  %sparksql [-l max_rows] [-r all|local|none] [-d name] [-c] [-e]\n                [-v name] [-o sql|json|html|grid|text|schema|skip|none] [-s]\n                [-j] [-t max_cell_length]\n                [sql [sql ...]]\n\nMagic that works both as %sparksql and as %%sparksql\n\npositional arguments:\n  sql                   SQL statement to execute\n\noptional arguments:\n  -l max_rows, --limit max_rows\n                        The maximum number of rows to display. A value of zero\n                        is equivalent to `--output skip`\n  -r <all|local|none>, --refresh <all|local|none>\n                        Force the regeneration of the schema cache file. The\n                        `local` option will only update tables/views created\n                        in the local Spark context.\n  -d name, --dataframe name\n                        Capture dataframe in a local variable named `name`\n  -c, --cache           Cache dataframe\n  -e, --eager           Cache dataframe with eager load\n  -v name, --view name  Create or replace a temporary view named `name`\n  -o <sql|json|html|grid|text|schema|skip|none>, --output <sql|json|html|grid|text|schema|skip|none>\n                        Output format. Defaults to html. The `sql` option\n                        prints the SQL statement that will be executed (useful\n                        to test jinja templated statements)\n  -s, --show-nonprinting\n                        Replace none printable characters with their ascii\n                        codes (LF -> )\n  -j, --jinja           Enable Jinja templating support\n  -t max_cell_length, --truncate max_cell_length\n                        Truncate output\nFile:      /data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/jupyterlab_sql_editor/ipython_magic/sparksql/sparksql.py\n
In\u00a0[24]: Copied!
%%sparksql --limit 1 --output grid\nSELECT\n    id,\n    rand() AS f1,\n    rand() AS f2,\n    rand() AS f3,\n    rand() AS f4,\n    rand() AS f5,\n    TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats\nFROM\nRANGE\n    (1, 400000, 1, 100)\nUNION\nSELECT\n    id,\n    rand() AS f1,\n    rand() AS f2,\n    rand() AS f3,\n    rand() AS f4,\n    rand() AS f5,\n    TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats\nFROM\nRANGE\n    (1, 40000, 1, 100)\n
%%sparksql --limit 1 --output grid SELECT id, rand() AS f1, rand() AS f2, rand() AS f3, rand() AS f4, rand() AS f5, TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats FROM RANGE (1, 400000, 1, 100) UNION SELECT id, rand() AS f1, rand() AS f2, rand() AS f3, rand() AS f4, rand() AS f5, TRANSFORM(SEQUENCE(1, 512), x -> rand()) AS data -- array of 512 floats FROM RANGE (1, 40000, 1, 100)
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
only showing top 1 row
Execution time: 10.18 seconds
In\u00a0[26]: Copied!
%%sparksql\nSELECT\n    mes.`first Name`,\n    mes.`last Name`,\n    mes.messages,\n    mes.messages.body,\n    mes.messages.time\nFROM\n    contacts_table AS con\n    INNER JOIN messages_table AS mes ON mes.`first Name` = con.`first Name`\n
%%sparksql SELECT mes.`first Name`, mes.`last Name`, mes.messages, mes.messages.body, mes.messages.time FROM contacts_table AS con INNER JOIN messages_table AS mes ON mes.`first Name` = con.`first Name`
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Output()
first Namelast Namemessagesbodytime RackJackon[{hello, 2022-01-15}, {you there, 2022-01-16}][hello, you there][2022-01-15, 2022-01-16]
Execution time: 0.15 seconds
"},{"location":"example/SparkConfigurationUsage/#configuration-and-usage","title":"Configuration and Usage\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#press-tab-to-trigger-auto-completions-and-ctrl-q-to-format-cell","title":"Press tab to trigger auto completions and Ctrl-Q to format cell\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#create-a-temporary-view-with-the-view-option","title":"Create a temporary view with the --view option\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#use-temporary-view-in-subsequent-queries-with-autocomplet-suggestions","title":"Use temporary view in subsequent queries with autocomplet suggestions\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#create-a-dataframe-variable-to-use-in-pypark","title":"Create a dataframe variable to use in pypark\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#continue-developing-your-query-using-dataframe-api","title":"Continue developing your query using dataframe API\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#edit-sql-within-python-strings","title":"Edit SQL within python strings\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#enjoy-the-same-functionality-as-a-code-cell","title":"Enjoy the same functionality as a code cell\u00b6","text":""},{"location":"example/SparkConfigurationUsage/#long-running-query-show-progress-bar-and-link-to-spark-ui","title":"Long running query show progress bar and link to Spark UI\u00b6","text":""},{"location":"example/SparkDataframe/","title":"SparkDataframe","text":"In\u00a0[1]: Copied!
from pyspark.sql import SparkSession\nimport ipywidgets as widgets\nout = widgets.Output()\nwith out:\n    spark = SparkSession.builder.getOrCreate()\n
from pyspark.sql import SparkSession import ipywidgets as widgets out = widgets.Output() with out: spark = SparkSession.builder.getOrCreate() In\u00a0[2]: Copied!
df = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\")\ndf\n
df = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\") df Out[2]:
DataFrame[id: bigint, uuid(): string]
In\u00a0[3]: Copied!
from jupyterlab_sql_editor.ipython.sparkdf import register_display\nfrom jupyterlab_sql_editor.outputters.outputters import _display_results\nregister_display()\n
from jupyterlab_sql_editor.ipython.sparkdf import register_display from jupyterlab_sql_editor.outputters.outputters import _display_results register_display() In\u00a0[4]: Copied!
# change default display behaviour\ndf = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\")\ndf\n
# change default display behaviour df = spark.sql(\"SELECT id, uuid() FROM RANGE (1, 1000)\") df
SparkSchemaWidget(nodes=(Node(close_icon='angle-down', close_icon_style='danger', icon='project-diagram', icon\u2026
Open Spark UI \u2b50 pyspark-shell
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Out[4]:
 In\u00a0[5]: Copied! 
pdf = df.limit(1).toPandas()\n
pdf = df.limit(1).toPandas() In\u00a0[6]: Copied!
# _display_results lets you configure the output\n_display_results(pdf, output=\"html\", show_nonprinting=False)\n
# _display_results lets you configure the output _display_results(pdf, output=\"html\", show_nonprinting=False) iduuid() 19d977b7e-e4b2-4ce5-9f5b-184054f7542d In\u00a0[7]: Copied!
_display_results(pdf, output=\"text\")\n
_display_results(pdf, output=\"text\")
+---+------------------------------------+\n|id |uuid()                              |\n+---+------------------------------------+\n|1  |9d977b7e-e4b2-4ce5-9f5b-184054f7542d|\n+---+------------------------------------+\n\n
In\u00a0[8]: Copied!
df = spark.read.json(\"file:/path/to/contacts.json\")\n_display_results(pdf, output=\"json\")\n
df = spark.read.json(\"file:/path/to/contacts.json\") _display_results(pdf, output=\"json\")
<IPython.core.display.JSON object>
"},{"location":"example/SparkSQLEscapeControlChars/","title":"Escaping Control Characters","text":"In\u00a0[\u00a0]: Copied!
from pyspark.sql import SparkSession\n\nspark = SparkSession.builder.getOrCreate()\n\n%load_ext jupyterlab_sql_editor.ipython_magic.sparksql\n
from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() %load_ext jupyterlab_sql_editor.ipython_magic.sparksql In\u00a0[2]: Copied!
spark.sql('''\n--start-sparksql\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n--end-sparksql\n''').show()\n
spark.sql(''' --start-sparksql SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab --end-sparksql ''').show()
+------+---+------------+---+---------------+-----------------+-------------------+---------------------+\n|     a|num|   str_array|tab|backslash_and_t|backslash_and_tab|two_backslash_and_t|two_backslash_and_tab|\n+------+---+------------+---+---------------+-----------------+-------------------+---------------------+\n|\n\t\n|  1|[\t123, ab\nc]|  \t|              \t|                \t|                 \\t|                   \\\t|\n+------+---+------------+---+---------------+-----------------+-------------------+---------------------+\n\n
In\u00a0[3]: Copied!
%%sparksql --output html --show-nonprinting\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n
%%sparksql --output html --show-nonprinting SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab
TTL -1 seconds expired, re-generating schema file: /tmp/sparkdb.schema.json\nGenerating schema file: /tmp/sparkdb.schema.json\nSchema file updated: /tmp/sparkdb.schema.json\n
Out[3]: anumstr_arraytabbackslash_and_tbackslash_and_tabtwo_backslash_and_ttwo_backslash_and_tab\\x0d\\x09\\x08\\x0a\\x09\\x0a1['\\t123', 'ab\\nc']\\x09\\t\\\\x09\\\\t\\\\\\x09 In\u00a0[4]: Copied!
%%sparksql --output grid --show-nonprinting\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n
%%sparksql --output grid --show-nonprinting SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab
TTL -1 seconds expired, re-generating schema file: /tmp/sparkdb.schema.json\nGenerating schema file: /tmp/sparkdb.schema.json\nSchema file updated: /tmp/sparkdb.schema.json\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
In\u00a0[5]: Copied!
%%sparksql --output json --show-nonprinting\nSELECT\n    '\\r\\t\\b\\n\\t\\n' AS a,\n    1 AS num,\n    ARRAY('\\t123', 'ab\\nc') AS str_array,\n    '\\t' AS tab,\n    '\\\\t' AS backslash_and_t,\n    '\\\\\\t' AS backslash_and_tab,\n    '\\\\\\\\t' AS two_backslash_and_t,\n    '\\\\\\\\\\t' AS two_backslash_and_tab\n
%%sparksql --output json --show-nonprinting SELECT '\\r\\t\\b\\n\\t\\n' AS a, 1 AS num, ARRAY('\\t123', 'ab\\nc') AS str_array, '\\t' AS tab, '\\\\t' AS backslash_and_t, '\\\\\\t' AS backslash_and_tab, '\\\\\\\\t' AS two_backslash_and_t, '\\\\\\\\\\t' AS two_backslash_and_tab
TTL -1 seconds expired, re-generating schema file: /tmp/sparkdb.schema.json\nGenerating schema file: /tmp/sparkdb.schema.json\nSchema file updated: /tmp/sparkdb.schema.json\n
Out[5]:
<IPython.core.display.JSON object>
"},{"location":"example/SparkSQLEscapeControlChars/#escaping-control-characters","title":"Escaping Control Characters\u00b6","text":""},{"location":"example/SparkSyntaxDemo/","title":"Spark Syntax Demo","text":"In\u00a0[\u00a0]: Copied!
%%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60\n-- cell magic\nSELECT *\nFROM student AS cellmagic\n
%%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60 -- cell magic SELECT * FROM student AS cellmagic In\u00a0[\u00a0]: Copied!
%sparksql -d df SELECT * from student where x=1\n
%sparksql -d df SELECT * from student where x=1 In\u00a0[\u00a0]: Copied!
%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60 SELECT * from student -- line magic using no argument options like --eager\n
%sparksql --dataframe df -c --eager -v MY_VIEW --limit 12 -f adir/out.json -t 60 SELECT * from student -- line magic using no argument options like --eager In\u00a0[\u00a0]: Copied!
%sparksql SELECT * FROM tab WHERE x = 1\n\n\n# select is no longer highlighted, but it does require a blank line (line 3 above)\n%sparksql  --cache   SELECT * FROM tab WHERE x = 2\n\n%sparksql --cache SELECT * FROM tab WHERE x = 3\n
%sparksql SELECT * FROM tab WHERE x = 1 # select is no longer highlighted, but it does require a blank line (line 3 above) %sparksql  --cache   SELECT * FROM tab WHERE x = 2 %sparksql --cache SELECT * FROM tab WHERE x = 3 In\u00a0[\u00a0]: Copied!
# mix python and SQL\n\n# python import\nimport pyspark\n\n%sparksql SELECT s.age FROM student AS linemagic1 -- line magic in a mix cell\n\n%sparksql SELECT s.age FROM student AS linemagic2 -- another line magic in a mix cell\n\n# a python string with SQL within it\nsql = '''\n--start-sparksql anything here is ignored and is not included in the SQL statement for LSP\nselect s.age from student as frompythonstring\n--end-sparksql\n'''\n\n# back to python\nprint(sql)\n\nspark.sql(sql).show()\n
# mix python and SQL # python import import pyspark %sparksql SELECT s.age FROM student AS linemagic1 -- line magic in a mix cell %sparksql SELECT s.age FROM student AS linemagic2 -- another line magic in a mix cell # a python string with SQL within it sql = ''' --start-sparksql anything here is ignored and is not included in the SQL statement for LSP select s.age from student as frompythonstring --end-sparksql ''' # back to python print(sql) spark.sql(sql).show()"},{"location":"example/SparkSyntaxDemo/#spark-syntax-demo","title":"Spark Syntax Demo\u00b6","text":""},{"location":"example/SupersetJinjaTestHarness/","title":"Superset Tests Harness using Jinja","text":"In\u00a0[2]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.trino\n%config Trino.host='localhost'\n%config Trino.port=8080\n%config Trino.httpScheme='http'\n%config Trino.auth=None\n%config Trino.user='the-user'\n\n%config Trino.cacheTTL=3600\n%config Trino.outputFile=\"/tmp/trinodb.schema.json\"\n\n# comma-separated list of catalogs to cache in the schema file\n%config Trino.catalogs=\"system,tpch\"\n
%load_ext jupyterlab_sql_editor.ipython_magic.trino %config Trino.host='localhost' %config Trino.port=8080 %config Trino.httpScheme='http' %config Trino.auth=None %config Trino.user='the-user' %config Trino.cacheTTL=3600 %config Trino.outputFile=\"/tmp/trinodb.schema.json\" # comma-separated list of catalogs to cache in the schema file %config Trino.catalogs=\"system,tpch\" In\u00a0[3]: Copied!
%%trino --limit 3 --output grid\nSELECT\n    *\nFROM\n      tpch.tiny.orders\n
%%trino --limit 3 --output grid SELECT * FROM tpch.tiny.orders
\n\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Only showing top 3 row(s)\n
In\u00a0[4]: Copied!
# Superset function to retrieve filters\ndef filter_values(column: str, default=None, remove_filter: bool = False):\n    return VALUE_LIST\n\nVALUE_LIST = ['Clerk#00000036', 'Clerk#000000779']\n
# Superset function to retrieve filters def filter_values(column: str, default=None, remove_filter: bool = False): return VALUE_LIST VALUE_LIST = ['Clerk#00000036', 'Clerk#000000779'] In\u00a0[5]: Copied!
%%trino --limit 1 --jinja --output sql\nSELECT\n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ({{ \"'\" + \"','\".join(filter_values('clerk')) + \"'\" }})\n
%%trino --limit 1 --jinja --output sql SELECT * FROM tpch.tiny.orders WHERE orderkey in ({{ \"'\" + \"','\".join(filter_values('clerk')) + \"'\" }})
\n\n
Out[5]:
SELECT \n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ('Clerk#00000036','Clerk#000000779')\n
In\u00a0[10]: Copied!
VALUE_LIST = ['Clerk#00000036', 'Clerk#000000779']\n\n# Superset function to retrieve filters\ndef filter_values(column: str, default=None, remove_filter: bool = False):\n    return VALUE_LIST\n\ndef quote_value(v):\n    if isinstance(v, str):\n        # escape quotes found in value\n        v = v.replace(\"'\", \"''\")\n        # quote string values\n        v = f\"'{v}'\"\n    return str(v)\n\ndef sql_filter_value_list(column: str, default=None):\n    \"\"\"\n        Build the SQL string representation of a list of values,\n        taking the value type into consideration. Strings get quoted\n        but numbers do not, and quotes within strings are escaped.\n    \"\"\"\n    values = filter_values(column, default, True)\n    if len(values) > 0:\n        quoted_values = [quote_value(v) for v in values]\n        return \",\".join(quoted_values)\n    return None\n
VALUE_LIST = ['Clerk#00000036', 'Clerk#000000779'] # Superset function to retrieve filters def filter_values(column: str, default=None, remove_filter: bool = False): return VALUE_LIST def quote_value(v): if isinstance(v, str): # escape quotes found in value v = v.replace(\"'\", \"''\") # quote string values v = f\"'{v}'\" return str(v) def sql_filter_value_list(column: str, default=None): \"\"\" Build the SQL string representation of a list of values, taking the value type into consideration. Strings get quoted but numbers do not, and quotes within strings are escaped. \"\"\" values = filter_values(column, default, True) if len(values) > 0: quoted_values = [quote_value(v) for v in values] return \",\".join(quoted_values) return None In\u00a0[11]: Copied!
%%trino --limit 1 --jinja --output sql\nSELECT\n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ({{sql_filter_value_list('clerk')}})\n
%%trino --limit 1 --jinja --output sql SELECT * FROM tpch.tiny.orders WHERE orderkey in ({{sql_filter_value_list('clerk')}})
\n\n
Out[11]:
SELECT \n    *\nFROM\n  tpch.tiny.orders\nWHERE\n    orderkey in ('Clerk#00000036','Clerk#000000779')\n
In\u00a0[\u00a0]: Copied!
\n
In\u00a0[\u00a0]: Copied!
\n
"},{"location":"example/SupersetJinjaTestHarness/#superset-tests-harness-using-jinja","title":"Superset Tests Harness using Jinja\u00b6","text":""},{"location":"example/SupersetJinjaTestHarness/#elaborate-a-function-to-build-list-of-values","title":"Elaborate a function to build list of values\u00b6","text":"

This function can then be registered in Superset and re-used in virtual datasets.

"},{"location":"example/TrinoConfigurationUsage/","title":"Configuration and Usage","text":"

Normally, IPython displays only the output of the last statement. However, it can be handy to run multiple SQL magics in a single cell and see the output of each execution. Setting ast_node_interactivity to all enables that.

In\u00a0[1]: Copied!
# Display all cell outputs in notebook\nfrom IPython.core.interactiveshell import InteractiveShell\nInteractiveShell.ast_node_interactivity = 'all'\n
# Display all cell outputs in notebook from IPython.core.interactiveshell import InteractiveShell InteractiveShell.ast_node_interactivity = 'all' In\u00a0[2]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.trino\n
%load_ext jupyterlab_sql_editor.ipython_magic.trino In\u00a0[3]: Copied!
%config Trino.host='localhost'\n%config Trino.port=8080\n%config Trino.httpScheme='http'\n%config Trino.auth=None\n%config Trino.user='the-user'\n\n%config Trino.cacheTTL=3600\n%config Trino.outputFile=\"/tmp/trinodb.schema.json\"\n\n# comma-separated list of catalogs to cache in the schema file\n%config Trino.catalogs=\"system,tpch\"\n
%config Trino.host='localhost' %config Trino.port=8080 %config Trino.httpScheme='http' %config Trino.auth=None %config Trino.user='the-user' %config Trino.cacheTTL=3600 %config Trino.outputFile=\"/tmp/trinodb.schema.json\" # comma-separated list of catalogs to cache in the schema file %config Trino.catalogs=\"system,tpch\"

In a production environment, you will want to pass in an authentication object:

import trino\n%config Trino.auth=trino.auth.BasicAuthentication(\"principal id\", \"password\")\n%config Trino.user=None\n

See https://github.com/trinodb/trino-python-client/blob/master/trino/auth.py for more details.

In\u00a0[4]: Copied!
%trino --refresh all\n
%trino --refresh all
Exporting functions: [########################################] 100.0%\nSchema file updated: /tmp/trinodb.schema.json\n
In\u00a0[5]: Copied!
%trino SELECT 'hello'\n
%trino SELECT 'hello'
\n\n
_col0hello In\u00a0[6]: Copied!
#%trino SHOW CATALOGS\n
#%trino SHOW CATALOGS In\u00a0[7]: Copied!
%%trino --limit 2 --output sql\nSELECT *\nFROM\n    tpch.tiny.orders AS ord\n
%%trino --limit 2 --output sql SELECT * FROM tpch.tiny.orders AS ord
\n\n
Out[7]:
SELECT *\nFROM\n    tpch.tiny.orders AS ord\n
In\u00a0[8]: Copied!
%%trino --limit 2 --dataframe x --output grid\nSELECT\n    ord.orderkey,\n    ord.custkey,\n    ord.orderstatus,\n    ord.totalprice,\n    ord.orderdate,\n    ord.orderpriority,\n    ord.clerk,\n    ord.shippriority,\n    ord.comment\nFROM\n    tpch.tiny.orders AS ord\n
%%trino --limit 2 --dataframe x --output grid SELECT ord.orderkey, ord.custkey, ord.orderstatus, ord.totalprice, ord.orderdate, ord.orderpriority, ord.clerk, ord.shippriority, ord.comment FROM tpch.tiny.orders AS ord
\n\nSaved results to pandas dataframe named `x`\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
Only showing top 2 row(s)\n
In\u00a0[9]: Copied!
%%trino --catalog tpch --schema sf1000\n\nSELECT * FROM lineitem\n
%%trino --catalog tpch --schema sf1000 SELECT * FROM lineitem
tpch\nsf1000\n
orderkeypartkeysuppkeylinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment3750000001756054533105475139.052832.130.070.07AF1994-03-301994-05-231994-04-04TAKE BACK RETURNAIRdolites above the even, b37500000011008318115831832215.026066.550.090.05AF1994-03-191994-04-161994-03-25COLLECT CODRAIL regular dependencies. entici37500000021288501128850113149.051727.830.020.07AF1995-03-081995-01-261995-03-09TAKE BACK RETURNRAILly alongside of the re3750000002183830552883058925.07366.80.040.08RF1994-12-181995-01-021995-01-09NONEMAILusly regular accoun37500000024008246482465344.063556.240.020.04RF1994-12-201995-01-211994-12-26COLLECT CODRAILt have to wake blithely r37500000026595701995703244.04290.880.050.07RF1995-02-031995-01-131995-02-24TAKE BACK RETURNSHIPges cajole furiously across the sl37500000021396112322111246513.014771.250.060.08AF1994-12-131995-02-261995-01-02COLLECT CODREG AIRuests sleep furiously slyly special excus3750000002459695133469526644.069529.680.010.06RF1995-01-201995-01-301995-02-08NONEAIRironic braids across t37500000031484709158470916138.071382.620.070.07AF1993-10-021993-09-191993-10-20TAKE BACK RETURNAIRke blithely. furiously bold accounts dete37500000031506635723163588247.071817.880.080.06AF1993-09-061993-11-051993-09-19COLLECT CODAIRnding orbits ought to nag evenly express s37500000032641545264154637.010404.870.010.08RF1993-11-281993-09-201993-12-07COLLECT CODSHIPnding warhorses wake slyly instr37500000031964445431444582424.035465.280.050.03RF1993-09-181993-09-261993-10-18COLLECT CODAIReans: carefully express a375000000468826735632675419.014924.610.10.07AF1993-07-271993-09-201993-07-29TAKE BACK RETURNMAILar foxes. 375000000417092818992819026.07251.840.040.04AF1993-10-301993-08-251993-11-02DELIVER IN PERSONMAILiously up the final notornis. depe37500000051567707611770792126.047422.180.040.02RF1993-12-101993-11-011993-12-26COLLECT CODREG AIR furiously final deposits. 
fluffily375000000546178756117876525.09162.250.020.05AF1993-10-121993-11-051993-10-14DELIVER IN PERSONMAILside the always special accounts37500000051263431026343103330.034163.70.050.07AF1993-11-251993-11-121993-11-30DELIVER IN PERSONFOBly. slyly regular dolphins cajole blithel37500000054849922299922748.09750.40.040.02AF1993-12-011993-11-021993-12-21TAKE BACK RETURNTRUCKfluffily even dependencies. reg37500000061465966339096648146.079226.260.00.01RF1994-03-121994-02-161994-04-05DELIVER IN PERSONFOB pending requests. 3750000006496102627110275238.044451.640.00.03RF1994-01-231994-01-031994-02-03DELIVER IN PERSONREG AIRong the regular, express packages
Only showing top 20 row(s)\n
In\u00a0[10]: Copied!
%%trino\nSELECT\n    lin.orderkey,\n    lin.partkey,\n    lin.suppkey,\n    lin.linenumber,\n    lin.quantity,\n    lin.extendedprice,\n    lin.discount,\n    lin.tax,\n    lin.returnflag,\n    lin.linestatus,\n    lin.shipdate,\n    lin.commitdate,\n    lin.receiptdate,\n    lin.shipinstruct,\n    lin.shipmode,\n    lin.comment,\n    ord.orderpriority\nFROM\n    tpch.sf1000.lineitem AS lin\n    INNER JOIN tpch.sf1.orders AS ord ON ord.orderkey = lin.orderkey\n
%%trino SELECT lin.orderkey, lin.partkey, lin.suppkey, lin.linenumber, lin.quantity, lin.extendedprice, lin.discount, lin.tax, lin.returnflag, lin.linestatus, lin.shipdate, lin.commitdate, lin.receiptdate, lin.shipinstruct, lin.shipmode, lin.comment, ord.orderpriority FROM tpch.sf1000.lineitem AS lin INNER JOIN tpch.sf1.orders AS ord ON ord.orderkey = lin.orderkey
\n\n
orderkeypartkeysuppkeylinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecommentorderpriority11551893457689361117.024252.030.040.02NO1996-03-131996-02-121996-03-22DELIVER IN PERSONTRUCKegular courts above the5-LOW1673090807309081236.039085.920.090.06NO1996-04-121996-02-281996-04-20TAKE BACK RETURNMAILly final dependencies: slyly bold 5-LOW163699776369977738.014180.720.10.02NO1996-01-291996-03-051996-01-31TAKE BACK RETURNREG AIRriously. regular, express dep5-LOW121314954631496428.042738.920.090.06NO1996-04-211996-03-301996-05-16NONEAIRlites. fluffily even de5-LOW1240266341526641524.037426.320.10.04NO1996-03-301996-03-141996-04-01NONEFOB pending foxes. slyly re5-LOW115634450634453632.044277.440.070.02NO1996-01-301996-02-071996-02-03DELIVER IN PERSONMAILarefully slyly ex5-LOW21061697221169743138.067883.960.00.05NO1997-01-281997-01-141997-02-02TAKE BACK RETURNRAILven requests. deposits breach a1-URGENT342969621796963145.088143.750.060.0RF1994-02-021994-01-041994-02-23NONEAIRongside of the furiously brave acco5-LOW3190354296535433249.066810.030.10.0RF1993-11-091993-12-201993-11-24TAKE BACK RETURNRAIL unusual accounts. eve5-LOW31284482293448254327.031611.60.060.07AF1994-01-161993-11-221994-01-23DELIVER IN PERSONSHIPnal foxes wake. 5-LOW329379610187961342.03376.30.010.06AF1993-12-041994-01-071994-01-01NONETRUCKy. fluffily pending d5-LOW3183094077594132528.029733.760.040.0RF1993-12-141994-01-101994-01-01TAKE BACK RETURNFOBages nag slyly pending5-LOW3621425919642610626.042392.740.10.02AF1993-10-291993-12-181993-11-04TAKE BACK RETURNRAILges sleep after the caref5-LOW4880346845534709130.048428.40.030.08NO1996-01-101995-12-141996-01-18DELIVER IN PERSONREG AIR- quickly regular packages sleep. 
idly5-LOW51085692838569284115.020202.90.020.04RF1994-10-311994-08-311994-11-20NONEAIRts wake furiously 5-LOW51239267893926790226.047049.340.070.08RF1994-10-161994-09-251994-10-19NONEFOBsts use slyly quickly special instruc5-LOW53753018030184350.060415.50.080.03AF1994-08-081994-10-131994-08-26DELIVER IN PERSONAIReodolites. fluffily unusual5-LOW61396354552135469137.051188.390.080.03AF1992-04-271992-05-151992-05-02TAKE BACK RETURNTRUCKp furiously special foxes4-NOT SPECIFIED71820518399551894112.021380.760.070.03NO1996-05-071996-03-131996-06-03TAKE BACK RETURNFOBss pinto beans wake against th2-HIGH7145242743774275829.015106.320.080.08NO1996-02-011996-03-021996-02-19TAKE BACK RETURNSHIPes. instructions2-HIGH
Only showing top 20 row(s)\n
In\u00a0[11]: Copied!
%%trino --output json\nSELECT\n    1 AS a,\n    'abc' as b,\n    1.2 as c,\n    ARRAY[1,2] as d,\n    ARRAY[1, null, 4] as e,\n    ARRAY[ARRAY[1,2],ARRAY[5,4]] as f,\n    CAST(ROW(1,23,456) as ROW(k1 INT, k2 INT, k3 INT)) as g,\n    CAST(ROW(1,'abc',true,null) as ROW(k1 INT, k2 VARCHAR, k3 BOOLEAN, k4 VARCHAR)) as h\n
%%trino --output json SELECT 1 AS a, 'abc' as b, 1.2 as c, ARRAY[1,2] as d, ARRAY[1, null, 4] as e, ARRAY[ARRAY[1,2],ARRAY[5,4]] as f, CAST(ROW(1,23,456) as ROW(k1 INT, k2 INT, k3 INT)) as g, CAST(ROW(1,'abc',true,null) as ROW(k1 INT, k2 VARCHAR, k3 BOOLEAN, k4 VARCHAR)) as h
\n\n
<IPython.core.display.JSON object>
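The nested ARRAY and ROW types in the query above flatten into JSON in the expected way. A hedged sketch of the shape of that single result row (not captured output; field values are taken from the SELECT itself):

```python
import json

# Rough mapping used by the JSON renderer: ARRAY -> JSON list,
# ROW(k1, k2, ...) -> JSON object keyed by field name, NULL -> null.
row = {
    'a': 1,
    'b': 'abc',
    'c': 1.2,
    'd': [1, 2],
    'e': [1, None, 4],
    'f': [[1, 2], [5, 4]],
    'g': {'k1': 1, 'k2': 23, 'k3': 456},
    'h': {'k1': 1, 'k2': 'abc', 'k3': True, 'k4': None},
}
print(json.dumps(row))
```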
In\u00a0[12]: Copied!
%%trino?\n
%%trino?
Docstring:\n::\n\n  %trino [-l max_rows] [-r all|local|none] [-d name]\n             [-o sql|json|html|grid|text|skip|none] [-s] [-x] [-c catalogname]\n             [-m schemaname] [-j]\n             [sql [sql ...]]\n\nMagic that works both as %trino and as %%trino\n\npositional arguments:\n  sql                   SQL statement to execute\n\noptional arguments:\n  -l max_rows, --limit max_rows\n                        The maximum number of rows to display. A value of zero\n                        is equivalent to `--output skip`\n  -r <all|local|none>, --refresh <all|local|none>\n                        Force the regeneration of the schema cache file. The\n                        `local` option will only update tables/views created\n                        in the local Spark context.\n  -d name, --dataframe name\n                        Capture results in a pandas dataframe\n  -o <sql|json|html|grid|text|skip|none>, --output <sql|json|html|grid|text|skip|none>\n                        Output format. Defaults to html. The `sql` option\n                        prints the SQL statement that will be executed (useful\n                        to test jinja templated statements)\n  -s, --show-nonprinting\n                        Replace non-printable characters with their ASCII\n                        codes (LF -> )\n  -x, --raw             Run the statement as-is. Do not wrap the statement\n                        with a limit. Use this option to run statements which\n                        can't be wrapped in a SELECT/LIMIT statement, for\n                        example EXPLAIN, SHOW TABLES, SHOW CATALOGS.\n  -c catalogname, --catalog catalogname\n                        Trino catalog to use\n  -m schemaname, --schema schemaname\n                        Trino schema to use\n  -j, --jinja           Enable Jinja templating support\nFile:      /data/dev/jupyterlab-sql-editor/ipython_magic/trino/trino.py\n
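The interaction between `--limit` and `--raw` described above can be illustrated with a small sketch (the `wrap_with_limit` helper is hypothetical; the actual wrapping logic lives in the magic's source):

```python
def wrap_with_limit(sql, max_rows):
    # Sketch only: unless --raw is given, the magic runs the user's statement
    # as a subquery capped by a LIMIT so at most max_rows rows come back.
    # Statements like EXPLAIN or SHOW CATALOGS cannot be nested this way,
    # which is what the -x/--raw flag is for.
    return 'SELECT * FROM (' + sql + ') LIMIT ' + str(max_rows)

print(wrap_with_limit('SELECT orderkey FROM tpch.tiny.orders', 3))
# SELECT * FROM (SELECT orderkey FROM tpch.tiny.orders) LIMIT 3
```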
"},{"location":"example/TrinoConfigurationUsage/#configuration-and-usage","title":"Configuration and Usage\u00b6","text":""},{"location":"example/TrinoJinjaTemplate/","title":"Jinja Templating with trino","text":"In\u00a0[1]: Copied!
%load_ext jupyterlab_sql_editor.ipython_magic.trino\n\n%config Trino.host='localhost'\n%config Trino.port=8080\n%config Trino.httpScheme='http'\n%config Trino.auth=None\n%config Trino.user='the-user'\n\n%config Trino.cacheTTL=3600\n%config Trino.outputFile=\"/tmp/trinodb.schema.json\"\n\n# comma-separated list of catalogs to cache in the schema file\n%config Trino.catalogs=\"system,tpch\"\n
%load_ext jupyterlab_sql_editor.ipython_magic.trino %config Trino.host='localhost' %config Trino.port=8080 %config Trino.httpScheme='http' %config Trino.auth=None %config Trino.user='the-user' %config Trino.cacheTTL=3600 %config Trino.outputFile=\"/tmp/trinodb.schema.json\" # comma-separated list of catalogs to cache in the schema file %config Trino.catalogs=\"system,tpch\" In\u00a0[2]: Copied!
table_name = \"tpch.tiny.orders\"\n
table_name = \"tpch.tiny.orders\" In\u00a0[3]: Copied!
%%trino --limit 3 --jinja\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --limit 3 --jinja SELECT * FROM {{ table_name }}
\n\nOnly showing top 3 row(s)\n
Out[3]:
orderkey | custkey | orderstatus | totalprice | orderdate | orderpriority | clerk | shippriority | comment
7492 | 511 | O | 64295.07 | 1996-06-26 | 5-LOW | Clerk#000000810 | 0 | f the pending deposits. express, ironic deposits
7493 | 205 | F | 73649.68 | 1994-12-18 | 2-HIGH | Clerk#000000279 | 0 | riously even instructions haggle agains
7494 | 1231 | O | 68212.31 | 1996-04-02 | 4-NOT SPECIFIED | Clerk#000000244 | 0 | fily express packages. blithely regular requests across
In\u00a0[4]: Copied!
%%trino --output sql --jinja\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --output sql --jinja SELECT * FROM {{ table_name }}
\n\n
Out[4]:
SELECT\n    *\nFROM\n    tpch.tiny.orders\n
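The `--jinja` substitution shown above can be reproduced directly with jinja2 (a sketch of what the magic does when it renders the cell body against the notebook namespace):

```python
from jinja2 import Template

table_name = 'tpch.tiny.orders'

# The magic renders the cell through Jinja before sending it to Trino,
# so {{ table_name }} is replaced with the notebook variable's value.
sql = Template('SELECT * FROM {{ table_name }}').render(table_name=table_name)
print(sql)
# SELECT * FROM tpch.tiny.orders
```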
In\u00a0[5]: Copied!
%%trino --limit 3 --jinja\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --limit 3 --jinja /* this is a comment which happens to contain a jinja template variable {{x}} */ SELECT * FROM {{ table_name }}
\n\n\nA Jinja template variable named {x} was located in your SQL statement.\n\nHowever, Jinja was unable to substitute its value because the variable \"x\" was not found in your IPython kernel.\n\nOption 1: If you intended to use a template variable, make sure to assign a value to \"x\".\n\n\nOption 2: If you intended to include \"{{\" in your statement, then you'll need to escape this special Jinja variable delimiter.\n\nYou can have Jinja ignore parts it would otherwise handle as variables or blocks. For example, if, with the default syntax, you want to use {{ as a raw string in a template and not start a variable, you have to use a trick.\n\nThe easiest way to output a literal variable delimiter \"{{\" is by using a variable expression:\n\n{{ '{{' }}\n\nFor bigger sections, it makes sense to mark a block raw. For example, to include example Jinja syntax in a template, you can use this snippet:\n\n%%trino --limit 3\n{% raw %}\n/*\nThis is a comment which happens to contain a jinja template\nvariable {{x}} that we want to keep as is.\n*/\n{% endraw %}\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n\n\nRaising an error to prevent the statement from being executed incorrectly.\n
\n---------------------------------------------------------------------------\nUndefinedError                            Traceback (most recent call last)\n/tmp/ipykernel_18939/3769332149.py in <module>\n----> 1 get_ipython().run_cell_magic('trino', '--limit 3 --jinja', '/*\\nthis is a comment which happens to contain a jinja template\\nvariable {{x}}\\n*/\\n\\nSELECT\\n    *\\nFROM\\n    {{ table_name }}\\n')\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)\n   2417             with self.builtin_trap:\n   2418                 args = (magic_arg_s, cell)\n-> 2419                 result = fn(*args, **kwargs)\n   2420             return result\n   2421 \n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/decorator.py in fun(*args, **kw)\n    230             if not kwsyntax:\n    231                 args, kw = fix(args, kw, sig)\n--> 232             return caller(func, *(extras + args), **kw)\n    233     fun.__name__ = func.__name__\n    234     fun.__doc__ = func.__doc__\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)\n    185     # but it's overkill for just that one bit of state.\n    186     def magic_deco(arg):\n--> 187         call = lambda f, *a, **k: f(*a, **k)\n    188 \n    189         if callable(arg):\n\n/data/dev/jupyterlab-sql-editor/ipython_magic/trino/trino.py in trino(self, line, cell, local_ns)\n     76                 print(f'Invalid refresh option given {args.refresh}. 
Valid refresh options are [all|local|none]')\n     77 \n---> 78         sql = self.get_sql_statement(cell, args.sql, args.jinja)\n     79         if not sql:\n     80             return\n\n/data/dev/jupyterlab-sql-editor/ipython_magic/common/base.py in get_sql_statement(self, cell, sql_argument, use_jinja)\n     93             print('No sql statement to execute')\n     94         elif use_jinja:\n---> 95             sql = self.bind_variables(sql, self.user_ns)\n     96         return sql\n     97 \n\n/data/dev/jupyterlab-sql-editor/ipython_magic/common/base.py in bind_variables(query, user_ns)\n     78     def bind_variables(query, user_ns):\n     79         template = Template(query, undefined=ExplainUndefined)\n---> 80         return template.render(user_ns)\n     81 \n     82     def get_catalog_array(self):\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/jinja2/environment.py in render(self, *args, **kwargs)\n   1289             return concat(self.root_render_func(ctx))  # type: ignore\n   1290         except Exception:\n-> 1291             self.environment.handle_exception()\n   1292 \n   1293     async def render_async(self, *args: t.Any, **kwargs: t.Any) -> str:\n\n/data/dev/jupyterlab-sql-editor/venv/lib/python3.8/site-packages/jinja2/environment.py in handle_exception(self, source)\n    923         from .debug import rewrite_traceback_stack\n    924 \n--> 925         raise rewrite_traceback_stack(source=source)\n    926 \n    927     def join_path(self, template: str, parent: str) -> str:\n\n<template> in top-level template code()\n\n/data/dev/jupyterlab-sql-editor/ipython_magic/common/base.py in __str__(self)\n     58         print(HOW_TO_ESCAPE_MSG)\n     59         print(RAISING_ERROR_MSG)\n---> 60         return super().__str__(self)\n     61 \n     62 \n\nUndefinedError: 'x' is undefined
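The failure above comes from rendering the template against the notebook namespace. A minimal sketch with stock jinja2 (using `StrictUndefined` in place of the extension's own `ExplainUndefined` class) shows the behaviours the error message describes: an unbound variable raises, `{% raw %}` protects literal braces, and callables from the namespace can be invoked during rendering:

```python
from jinja2 import Template, StrictUndefined
from jinja2.exceptions import UndefinedError

# 1. An unbound variable fails fast instead of rendering as an empty string.
try:
    Template('SELECT {{ x }}', undefined=StrictUndefined).render()
except UndefinedError as err:
    print('refused to render:', err)

# 2. {% raw %} keeps a literal {{x}} inside a comment from being
#    treated as a template variable.
protected = Template(
    '{% raw %}/* contains {{x}} */{% endraw %} SELECT * FROM {{ t }}',
    undefined=StrictUndefined,
).render(t='tpch.tiny.orders')
print(protected)
# /* contains {{x}} */ SELECT * FROM tpch.tiny.orders

# 3. Functions found in the rendering namespace can be called, which is
#    how {% set testing = get_filters() %} resolves later in this notebook.
def get_filters():
    return 1

print(Template('{% set v = get_filters() %}{{ v }}').render(get_filters=get_filters))
# 1
```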
In\u00a0[\u00a0]: Copied!
%%trino --limit 3 --jinja\n{% raw %}\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n{% endraw %}\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --limit 3 --jinja {% raw %} /* this is a comment which happens to contain a jinja template variable {{x}} */ {% endraw %} SELECT * FROM {{ table_name }} In\u00a0[\u00a0]: Copied!
%%trino --output sql --jinja\n{% raw %}\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n{% endraw %}\n\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --output sql --jinja {% raw %} /* this is a comment which happens to contain a jinja template variable {{x}} */ {% endraw %} SELECT * FROM {{ table_name }} In\u00a0[\u00a0]: Copied!
def get_filters():\n    return 1\n
def get_filters(): return 1 In\u00a0[\u00a0]: Copied!
%%trino --output sql --jinja\n{% raw %}\n/*\nthis is a comment which happens to contain a jinja template\nvariable {{x}}\n*/\n{% endraw %}\n\n{% set testing = get_filters() %}\n{{testing}}\nSELECT\n    *\nFROM\n    {{ table_name }}\n
%%trino --output sql --jinja {% raw %} /* this is a comment which happens to contain a jinja template variable {{x}} */ {% endraw %} {% set testing = get_filters() %} {{testing}} SELECT * FROM {{ table_name }} In\u00a0[16]: Copied!
%%trino --limit 1 --output grid\nSELECT\n    *\nFROM (\n    SELECT\n        ARRAY[1,2,3] as an_array,\n        *\n    FROM\n      tpch.tiny.orders\n)\n
%%trino --limit 1 --output grid SELECT * FROM ( SELECT ARRAY[1,2,3] as an_array, * FROM tpch.tiny.orders )
\n\nOnly showing top 1 row(s)\n
DataGrid(auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_renderer=None, default_render\u2026
In\u00a0[29]: Copied!
\n
In\u00a0[30]: Copied!
\n
\n\n
Out[30]:
SELECT \n    *\nFROM (\n    SELECT\n        ARRAY[1,2,3] as an_array,\n        *\n    FROM\n      tpch.tiny.orders\n)\n\n\n\n  \nWHERE\n  \n    1=0\n    \n      OR (contains(an_array, '1'))\n    \n      OR (contains(an_array, '2'))\n    \n  \n
"},{"location":"example/TrinoJinjaTemplate/#jinja-templating-with-trino","title":"Jinja Templating with trino\u00b6","text":""}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index b80dbe6..c6f4336 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ