RudderStack Profile Builder (PB) is a CLI tool that allows you to create customer 360 profiles by defining configurations in a SQL like meta language. PB can stitch data together from multiple sources right in your Snowflake data warehouse (Redshift coming soon). PB can stitch user identities and user features from multiple sources, including RudderStack ETL, Fivetran ETL, or other tables in your Snowflake data warehouse. The resulting customer 360 tables can be used to send customer data to downstream tools such as email marketing, chat, or CRM, along with many other destinations using RudderStack Reverse ETL. Profile Builder is also very flexible and can also be used to create profiles for users, companies, sessions, or any other entity you choose.
https://github.com/rudderlabs/rudderstack-profiles-basic-example.git
Make sure the items listed below are in your .gitignore file for hygiene and security purposes (this is done for you by default):
.DS_Store
output
logs
In a terminal shell, install profile builder using python pip:
pipx install profiles-rudderstack # isolated virtual environment
-or -
pip3 install profiles-rudderstack
Validate your installation of pb:
pb version
pb --help # view list of commands
Create the configuration file for connection to your warehouse: this can be modified later at (/Users/your_user/.pb/siteconfig.yaml)
pb init connection
Enter your information into the prompts (example is for Snowflake):
Select a warehouse. (Enter s for Snowflake): s
Enter Connection Name: your_connection_name <rs-profiles-test> # you will need this to ref your connection later
Enter target: (default:dev): # Just press enter, leaving it to default
Enter account: your_account.your_region <abc12345.us-east-1>
Enter warehouse: your_warehouse_name
Enter dbname: your_db_name
Enter schema: your_schema_name #create a schema then ref here
Enter user: your_user_name
Enter password: your_password
Enter role: your_role
Append to /Users/<user_name>/.pb/siteconfig.yaml? [y/N]
yes
After installing PB and configuring your connections, you need to update inputs.yaml with the names of your source tables. Navigate to the inputs.yaml file and update the table:
information for both tables. Keep the table names and only change the schema and database if you want to use the sample data.
- name: rsIdentifies
table: <your_warehouse_db>.<your_warehouse_schema>.<table_name> # change this to your fully qualified input table name
- name: rsTracks
table: <your_warehouse_db>.<your_warehouse_schema>.<table_name> # change this to your fully qualified input table name
occurred_at_col: timestamp
Notice the names are mentioned as edge_sources in profiles.yaml and define specs for creating ID stitcher/feature table. This has been done for you.
ID Stitcher Example: edge_sources and user_id stitcher in profiles.yaml
models:
- name: user_id_stitcher
model_type: id_stitcher
model_spec:
edge_sources:
- from: inputs/rsIdentifies
- from: inputs/rsTracks
Feature Table Example: in profiles.yaml
- name: user_profile
model_type: feature_table_model
model_spec:
entity_key: user
vars:
- entity_var:
name: first_seen
select: min(timestamp::date)
from: inputs/rsTracks
If you plan to use the sample data, run the command below. It will insert two sample tables into the database and schema you defined during setup. (Be sure you are in the project directory in your terminal first):
pb insert
Use this command to validate that your project will be able to access the warehouse specified in pb init connection
and create objects in that warehouse.
pb validate access
You can use this command to generate the SQL that will run in your warehouse, and this will also tell you if there are syntax errors in your model YAML file.
pb compile
If there are no errors, you can use this command to create the output table in your warehouse. If using the sample data, this should execute in about 60 seconds:
pb run
The query below will give you the user profiles view generated by PB. This view is pointed to your most up-to-date user profile table. PB automatically maintains a history of profiles tables so you can see what a profile looked like at a particular snapshot in the past.
select * from YOUR_DB.YOUR_SCHEMA.USER_PROFILE limit 5
USER_MAIN_ID | VALID_AT | FIRST_SEEN | USER_LIFESPAN | DAYS_ACTIVE |
---|---|---|---|---|
rid93c0681d775e73f01830351e693a610e | 2023-06-30 18:50:11.685 | 2022-11-14 | 4 | 2 |
ridcb1b32379f00d727ee6648777534b8e5 | 2023-06-30 18:50:11.685 | 2022-11-15 | 59 | 9 |
rid0379ebf6a4cc85cedbf436efe9bb422d | 2023-06-30 18:50:11.685 | 2022-11-18 | 56 | 11 |
rid1bdbc498de7458039510d81b565ef6ba | 2023-06-30 18:50:11.685 | 2022-05-13 | 0 | 1 |
rid168ce3120988c676d8c3604c0971d632 | 2023-06-30 18:50:11.685 | 2022-11-28 | 11 | 8 |
The query below will provide a sample of the Other IDs connected together to create the user profiles. The table below shows 3 anonymous_ids, 1 user_id, and 1 email as the Other IDs that were stitched into 1 profile ID. Note: The email record was added for illustration purposes and is not in the sample dataset. The next query will show the total number of Other IDs stitched together.
select * from YOUR_DB.YOUR_SCHEMA.USER_ID_STITCHER limit 5
USER_MAIN_ID | OTHER_ID | OTHER_ID_TYPE | VALID_AT |
---|---|---|---|
rid00e6b900e23df0c9aba09928ffcd0d31 | 089511773507192a39cbf1f94e34e366 | anonymous_id | 2022-06-06 19:16:45.000 |
rid00e6b900e23df0c9aba09928ffcd0d31 | 99c6d8b0d3afc5650d3ad9b5eaa06780 | anonymous_id | 2022-06-06 19:16:45.000 |
rid00e6b900e23df0c9aba09928ffcd0d31 | 1ef94c5bf009d0da48ac7a227aeb43be | anonymous_id | 2022-06-06 19:16:45.000 |
rid00e6b900e23df0c9aba09928ffcd0d31 | 1945306b10849bbe946a738f6fd9372f | user_id | 2022-06-06 19:16:45.000 |
rid00e6b900e23df0c9aba09928ffcd0d31 | [email protected] | 2022-06-06 19:16:45.000 |
This query shows the total number of IDs used to make each profile. id00e6b900e23df0c9aba09928ffcd0d31
has 24 different anonymous_ids
and 1 user_id
that went into the profile creation.
select USER_MAIN_ID as RUDDER_USER_ID,other_id_type,count (distinct other_id) as "OTHER_ID_COUNT"
from profiles_demo_db.rs_profiles_7_1.USER_ID_STITCHER
group by USER_MAIN_ID,other_id_type
order by user_main_id asc
limit 5
USER_MAIN_ID | OTHER_ID | COUNT_OF_IDs |
---|---|---|
rid00e6b900e23df0c9aba09928ffcd0d31 | user_id | 1 |
rid00e6b900e23df0c9aba09928ffcd0d31 | anonymous_id | 24 |
rid0379ebf6a4cc85cedbf436efe9bb422d | user_id | 1 |
rid0379ebf6a4cc85cedbf436efe9bb422d | anonymous_id | 30 |
rid0386089d15c9669fec23c6835fdf2ac6 | anonymous_id | 24 |
rid0386089d15c9669fec23c6835fdf2ac6 | user_id | 1 |
You are now up and running with Profiles! When used with the RudderStack Platform, these profiles can create audiences and sync customer360 data to tools like Salesforce, Braze, HubSpot, or Klaviyo. They can also be used for paid advertising audience creation in Google Ads, Facebook, or other tools. Lastly, they can be used for other personalizations like search powered by Algolia, chat powered by Intercom, or subscriptions managed by Stripe — along with MANY more applications.
RudderStack Profile Builder
RudderStack Warehouse Native CDP
Shopify Features
Shopify Churn Features
General eCommerce Features
Stripe Features
Join RudderStack Slack Community and post your question in the #profile-builder channel.