-
Notifications
You must be signed in to change notification settings - Fork 1
/
sas_snowflake.sasnb
1 lines (1 loc) · 18.6 KB
/
sas_snowflake.sasnb
1
[{"kind":1,"language":"markdown","value":"# SAS and Snowflake\n\n## Methods for Accessing Relational Database Data\nSAS/ACCESS Interface to Relational Databases is a family of interfaces that you can use to interact with data in other vendor databases from within SAS. These are available in SAS Viya Workbench, SAS9 and SAS Viya.\n\nSAS/ACCESS provides these methods for accessing relational DBMS data.\n\n**SAS/ACCESS LIBNAME Statement (Implicit Pass-Through)**\n- To assign SAS librefs to DBMS objects such as schemas and databases, you can use the LIBNAME statement with the required SAS/ACCESS database engine. After you associate a database with a libref, you can use a SAS two-level name to specify any table or view in the database. You can then work with the table or view as you would with a SAS data set.\n- This enables you to write SAS SQL queries and other SAS code through your LIBNAME statement. Depending on the SQL or SAS code you use, SAS will attempt to use implicit pass-through (convert the code to native database SQL) to push as much processing as it can directly in the database.\n\n\n**SQL Pass-Through Facility (Explicit Pass-Through)**\n- To interact with a data source using its native SQL syntax without leaving your SAS session, you can use the [SQL pass-through facility (explicit pass-through)](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/n0jhd9y0wb161rn1gp04zvjdqak7.htm). SQL statements are passed directly to the data source for processing.\n\n[Selecting a SAS/ACCESS Method](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/p1hadwfhakpxh3n1i4qvd9f09jqp.htm)","outputs":[]},{"kind":1,"language":"markdown","value":"## 1. Connect SAS to Snowflake\nRead the JSON file that contains the credentials to connect to Snowflake.\n\n\n**NOTE:** Using a JSON file with credentials to connect to Snowflake for training purposes using a demonstration environment. Please follow all company policy regarding authentication.\n\nJSON File Example:\n\n{\n\n \"account_url\": \"<account>.snowflakecomputing.com\",\n \"userName\": \"myusername\",\n \"password\": \"mypass\"\n \n}\n","outputs":[]},{"kind":1,"language":"markdown","value":"Read the JSON file into SAS and store the account URL, username and password into macro variables for authentication.","outputs":[]},{"kind":2,"language":"sas","value":"/* Specify the credentials JSON file */\nfilename myauth \"/workspaces/myfolder/WorkingWithSnowflake/snowflake_creds.json\";\n\n/* Read the JSON file into SAS */\nlibname myauth json fileref=myauth;\n\n/* Create variables to store the authentication information */\ndata _NULL_;\n set myauth.root;\n call symputx('account_url',account_url);\n call symputx('user_name', userName);\n call symputx('password',password);\nrun;\n\n/* Clear the connection to the JSON file */\nlibname myauth clear;","outputs":[]},{"kind":1,"language":"markdown","value":"### Connect to Snowflake with SAS using the SAS/ACCESS LIBNAME statement\n\nThe SAS/ACCESS LIBNAME statement extends the SAS global LIBNAME statement so that you can assign a libref to a relational DBMS. This feature lets you reference a DBMS object directly in a DATA step or SAS procedure. You can use it to read from and write to a DBMS object as if it were a SAS data set. You can associate a SAS libref with a relational DBMS database, schema, server, or group of tables and views. You can also execute implicit or explicit SQL to the database.\n\n- *libref*: specifies any SAS name that serves as an alias to associate SAS with a database, schema, server, or group of tables and views.\n- *snow*: specifies the SAS/ACCESS engine name for the Snowflake interface.\n- *user*: specifies the Snowflake user name (also called the user ID) that you use to connect to your database. If the user name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.\n- *password*: specifies the password that is associated with your Snowflake user ID. If the password contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.\n- *warehouse*: specifies the default Snowflake warehouse to use for sessions that the driver initiated.\n- *database*: specifies the default Snowflake database to use for sessions that the driver initiates and that contains the tables and views that you want to access. If the database name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.\n- *schema*: specifies the default Snowflake schema to use for sessions that the driver initiated.\n\n[SAS/ACCESS Interface to Snowflake](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/p19i7uzcbso1szn1pczxn88co3g1.htm)\n\n\n#### SAS/ACCESS engines provide a variety of options you can modify to help performance based on your specific tasks.\n**Common options include**:\n- SCANSTRINGCOLUMNS - Specifies whether to determine the maximum length of VARCHAR or large type columns in a database table or query.\n- READBUFF - Specifies the number of rows of DBMS data to read into the buffer.\n- INSERTBUFF - Specifies the number of rows in a single DBMS insert.\n- DBCOMMIT - Causes an automatic COMMIT (a permanent writing of data to the DBMS) after a specified number of rows are processed. \n- SAS/ACCESS also supports [Bulk Loading and Unloading for Snowflake](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/n0atn4p9vmo64nn1f866es1sqeyx.htm).\n - Bulk loading is the fastest way to insert large numbers of rows into a Snowflake table. \n - You can also perform bulk unloading (data retrieval) from Snowflake. \n\nCheck out the video [SAS Viya Best Practices with Snowflake Data](https://www.sas.com/en_us/partners/find-a-partner/alliance-partners/snowflake.html#:~:text=SAS%20and%20Snowflake%20help%20organizations,CAS%20server%20for%20parallel%20processing) for more information.","outputs":[]},{"kind":2,"language":"sas","value":"/* (optional) Turn off extra information from the log */\noptions nonotes;\n\n/* Connect to Snowflake using the SNOW engine */\nlibname snowssd snow server=\"&account_url\"\n user=\"&user_name\"\n password=\"&password\"\n warehouse=users_wh\n database=SNOWFLAKE_SAMPLE_DATA\n schema=TPCH_SF10;\n\n/* Turn on notes */\noptions notes;","outputs":[]},{"kind":1,"language":"markdown","value":"View available Snowflake tables in the SAS library.","outputs":[]},{"kind":2,"language":"sas","value":"/* Only return the available Snowflake tables in the results */\nods select Members; \nproc contents data=snowssd._all_ nods;\nrun;","outputs":[]},{"kind":1,"language":"markdown","value":"## 2. Implicit SQL Pass-Through\nWrite SAS SQL (PROC SQL) and SAS code using the SAS/ACCESS Interface to Snowflake. SAS will push down as much processing as it can into Snowflake to maximize efficiency and limit data movement between SAS and Snowflake based on the code used. Both SAS PROC SQL and FEDSQL enable you to use implicit pass-through. We will use PROC SQL in this demonstration.\n\nThe following statement generates trace information from a DBMS engine to the SAS log to determine where the processing is occurring.\n\n[SASTRACE= SAS System Option](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/n0732u1mr57ycrn1urf24gzo38sc.htm#p04h920kyuu1lrn1x1unl3h4a0n6)","outputs":[]},{"kind":2,"language":"sas","value":"options sastrace=',,,d' sastraceloc=saslog nostsuffix sql_ip_trace=note;","outputs":[]},{"kind":1,"language":"markdown","value":"Preview 10 rows from the **PART** Snowflake table. Be aware SAS/ACCESS to Snowflake table names are case sensitive by default. \n\nView the log. Notice that the SAS/ACCESS engine converted SAS SQL to native Snowflake SQL to run the query inside the database and only return the necessary results to SAS.\n\n**NOTE:** To view the log in a SAS Notebook, select:\n- **More Actions** *(three dots near the output)*\n- **Change Presentation**\n- **SAS Log Renderer**\n\nCheck out the documentation [SQL Procedure Reference](https://go.documentation.sas.com/doc/en/pgmsascdc/default/sqlproc/part-2.htm) for more information on PROC SQL.","outputs":[]},{"kind":2,"language":"sas","value":"proc sql;\nSELECT *\nFROM snowssd.PART(obs=10);\nquit;","outputs":[]},{"kind":1,"language":"markdown","value":"Count the total number of rows in the Snowflake **PART** table. Then use the SAS FORMAT= option to modify the displayed output by adding commas to the numeric value.\n\nView the log. Notice that the SAS/ACCESS engine converted SAS SQL to native Snowflake SQL and returned the smaller summarization to SAS.\n\n**NOTE:** The following code uses the SQL code cell to execute the SQL procedure for convenience.","outputs":[]},{"kind":2,"language":"sql","value":"SELECT count(*) format=comma16. AS TOTAL_ROWS\nFROM snowssd.PART;","outputs":[]},{"kind":1,"language":"markdown","value":"Run a query to calculate the count of rows for each distinct value of the column **P_MFGR**.\n\nView the log. Notice that the SAS/ACCESS engine converted SAS SQL to native Snowflake SQL and returned the summarization to SAS. ","outputs":[]},{"kind":2,"language":"sql","value":"SELECT \n P_MFGR, \n count(*) format=comma16. as TOTAL\nFROM snowssd.PART\nGROUP BY P_MFGR\nORDER BY TOTAL DESC;","outputs":[]},{"kind":1,"language":"markdown","value":"### Using SAS Functions in your SAS SQL Queries\n\nNot all SAS functions will pass to Snowflake. View the [Passing SAS Functions to Snowflake](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/n1d5j8d7wegfezn1irjj3hcrne1n.htm) SAS documentation for more information on SAS functions that will pass to Snowflake.\n\nUse the SAS SCAN function to extract specific words from a string based on the delimiter to obtain the manufacturer ID value from the **P_MFGR** column and count the total number of rows for each value.\n\nView the log. Notice that the SAS function was not implicitly converted to a Snowflake function. However, the SAS/ACCESS engine does efficienctly only return the necessary column from Snowflake to SAS to process the query in SAS. This avoids transfering every column from Snowflake to SAS.","outputs":[]},{"kind":2,"language":"sql","value":"SELECT \n scan(P_MFGR,2,'#') as Manufacturer_ID, /* Specific SAS function */\n count(*) format=comma16. as TOTAL\nFROM snowssd.PART\nGROUP BY Manufacturer_ID\nORDER BY TOTAL DESC;","outputs":[]},{"kind":1,"language":"markdown","value":"The PROC SQL NOEXEC option is useful for checking the syntax of your SQL statements without executing them. It allows you to review the log before running the query, maximizing query efficiency and minimizing data movement.\n\n[SQL Procedure Optional Arguments](https://go.documentation.sas.com/doc/en/pgmsascdc/default/sqlproc/p12ohgh32ffm6un13s7l2d5p9c8y.htm)","outputs":[]},{"kind":2,"language":"sql","value":"proc sql noexec; /* Add the noexec option */\nSELECT \n scan(P_MFGR,2,'#') as Manufacturer_ID, /* Specific SAS function */\n count(*) format=comma16. as TOTAL\nFROM snowssd.PART\nGROUP BY Manufacturer_ID\nORDER BY TOTAL DESC;","outputs":[]},{"kind":1,"language":"markdown","value":"### Many SAS procedures can be used for In-Database processing with the SAS/ACCESS Interface method\n\nSAS In-Database Technologies improves performance by running some code inside the data source. In contrast, under conventional processing, the SAS/ACCESS engine generates an SQL SELECT * statement that is passed to the data source. That SELECT statement fetches all the rows in the table, which can cause a considerable amount of data transfer. In-database processing greatly reduces data transfer and network latency.\n\n[Overview of SAS In-Database Processing with SAS/ACCESS Interfaces](https://go.documentation.sas.com/doc/en/pgmsascdc/default/acreldb/n1lo1d19ercldkn14vm45k13z3so.htm)\n\n\nThis following SAS code utilizes the [PROC FREQ procedure](https://go.documentation.sas.com/doc/en/pgmsascdc/default/statug/statug_freq_toc.htm) to generate frequency tables for the columns **P_BRAND**, **P_CONTAINER**, and **P_MFGR** from the **PART** Snowflake table using the **snowssd** library.\n- The *order=freq* option sorts the output by frequency of occurrence, meaning the most frequent categories will be listed first.\n- The *table* statement specifies the variables for which frequency tables will be generated.\n- The *plots=freqplot* option requests frequency plots to be included in the output.\n\nThe results show the frequency analysis and plots.\n\nView the log. Notice that the Frequency procedure was converted into native Snowflake SQL and the frequency analysis process in the database. Then the smaller summarized results are returned to SAS for the remaining processing.","outputs":[]},{"kind":2,"language":"sas","value":"ods html5 style=Illuminate; /* add a visualization style */\nproc freq data=snowssd.PART order=freq;\n table P_BRAND P_CONTAINER P_MFGR / plots=freqplot;\nrun;","outputs":[]},{"kind":1,"language":"markdown","value":"Use the SAS [SGPLOT procedure](https://go.documentation.sas.com/doc/en/pgmsascdc/default/grstatproc/n0yjdd910dh59zn1toodgupaj4v9.htm) to visualize the Snowflake **PART** table. Many procedures do not execute in database.\n\nView the log. Notice that Snowflake sent all 2 million rows and every column to SAS for processing. While this works, large data movement between a data source (Snowflake) to SAS can be inefficient.","outputs":[]},{"kind":2,"language":"sas","value":"ods html5 style=Illuminate;\nproc sgplot data=snowssd.PART;\n vbar P_MFGR;\nrun;","outputs":[]},{"kind":1,"language":"markdown","value":"Maximize the efficiency by summarizing the Snowflake table within the database using implicit pass-through and create a SAS table with the results. Then visualize the smaller summarized results using SAS.","outputs":[]},{"kind":2,"language":"sas","value":"/* Create a SAS table using implicit pass-through to Snowflake to perform data summarization in Snowflake */\nproc sql;\nCREATE TABLE work.manufacturer_summary AS\nSELECT P_MFGR, count(*) as TOTAL\nFROM snowssd.PART\nGROUP BY P_MFGR;\nquit;\n\n/* Visualize the smaller, summarized results from Snowflake in SAS */\nods html5 style=Illuminate;\nproc sgplot data=work.manufacturer_summary;\n vbar P_MFGR / response=TOTAL;\n format TOTAL comma16.;\n label \n TOTAL = 'Total Count'\n P_MFGR = 'Manufacturer';\nrun;","outputs":[]},{"kind":1,"language":"markdown","value":"## 3. Using Explicit Pass-Through\n\nSAS explicit pass-through uses SAS/ACCESS to connect to a DBMS and enables you to **send native database SQL statements** directly to the DBMS for execution. \n\nBoth FedSQL and SAS PROC SQL enable you to submit DBMS-specific SQL statements to the DBMS by using explicit SQL pass-through. Non-query DBMS SQL statements are submitted to the DBMS by using the EXECUTE statement. Queries can be submitted in the CONNECTION TO component of the SELECT statement FROM clause.\n\n[SQL Explicit Pass-Through Facility](https://go.documentation.sas.com/doc/en/pgmsascdc/v_050/acreldb/n0ijpbk23r9dk4n17ckyh6ylacsw.htm)","outputs":[]},{"kind":1,"language":"markdown","value":"Use a very simple SAS explicit pass-through to execute native Snowflake SQL on the database through SAS.","outputs":[]},{"kind":2,"language":"sql","value":"/* Connect to the Snowflake database */\nCONNECT USING snowssd;\n\n/* Specify the SAS SELECT statement to select the query results from the native Snowflake query. \nYou can use SAS features in the SAS SELECT statement. */\nSELECT *\nFROM connection to snowssd\n /* Snowflake SQL Query */\n( \n SELECT * EXCLUDE P_COMMENT \n FROM PART \n LIMIT 10 \n);\n\n/* Disconnect from Snowflake */\nDISCONNECT FROM snowssd;","outputs":[]},{"kind":1,"language":"markdown","value":"Use SAS excplicit pass-through to execute native Snowflake SQL. Then use SAS to create the final summarized table with specific SAS formats and labels for reporting. \n\n**NOTE:** Use the SQL code cell for convenience.","outputs":[]},{"kind":2,"language":"sql","value":"/* Connect to the Snowflake database */\nCONNECT USING snowssd;\n\n/* Specify the SAS SELECT statement to select the results from the native Snowflake query */\nCREATE TABLE work.manufacturer_supplycost AS\nSELECT \n MANUFACTURER_ID label='Manufacturer ID',\n TOTAL_SUPP_COST format=dollar16.2 label='Total Supply Cost'\nFROM connection to snowssd\n/* Snowflake SQL Query */\n( \n SELECT \n REGEXP_SUBSTR(P_MFGR, '\\\\d+') AS MANUFACTURER_ID, /* Extract the numeric value from the manufacturer */\n sum(ps.PS_SUPPLYCOST) AS TOTAL_SUPP_COST \n FROM PART p /* Perform a join */\n INNER JOIN PARTSUPP ps ON p.P_PARTKEY = ps.PS_PARTKEY \n GROUP BY MANUFACTURER_ID\n);\n\n/* Disconnect from Snowflake */\nDISCONNECT FROM snowssd;","outputs":[]},{"kind":1,"language":"markdown","value":"Use SAS to create the final Excel report for other business users.","outputs":[]},{"kind":2,"language":"sas","value":"/* Create Excel report summary */\n%let outxlsx = /workspaces/myfolder/WorkingWithSnowflake/manufacturer_supp_cost.xlsx;\n\nods excel file=\"&outxlsx\" \n style=ExcelIlluminate \n options(sheet_name='Supply Cost'\n sheet_interval='NONE');\n\n/* Close the output to SAS */\nods html5 close;\n\n/* Create visualization */\ntitle justify=left height=19pt \"Supply Cost by Manufacturer\";\nproc sgplot data=work.manufacturer_supplycost\n nowall noborder;\n vbar MANUFACTURER_ID / \n response=TOTAL_SUPP_COST categoryorder=respdesc;\nrun;\ntitle;\n\n/* Create table summary */\nproc sql;\n SELECT *\n FROM work.manufacturer_supplycost\n ORDER BY TOTAL_SUPP_COST desc;\nquit;\n\n/* Close output to Excel */\nods excel close;","outputs":[]},{"kind":1,"language":"markdown","value":"Clear the connection to Snowflake.","outputs":[]},{"kind":2,"language":"sas","value":"libname snowssd clear;","outputs":[]},{"kind":1,"language":"markdown","value":"### Summary\nThat was a quick overview of working with external databases with SAS in SAS Viya Workbench. It's worth noting that these techniques can be applied not SAS9 and the SAS Viya platform. Regardless of the specific database you're working with, the principles of implicit and explicit pass-through remain the same, and limiting data movement is key for efficient data processing.","outputs":[]},{"kind":1,"language":"markdown","value":"### Deeper Dive in SAS and Databases\n**SAS offers eLearning or Instructor Based Training**\n- [SAS® Programming Methods to Read, Process, and Write Database Tables](https://learn.sas.com/course/view.php?id=139) - Database fundamentals, 14 hours\n- [Efficiency Tips for Database Programming in SAS®](https://learn.sas.com/course/view.php?id=136) - Focuses on efficiency tips, 3.5 hours","outputs":[]}]