feat: Querybook GitHub Integration (#1511)
* feat: Add oauth flow for querybook github integration
* feat: link datadoc to github directory
* feat: Add Datadoc serializing util
* feat: Add github client
zhangvi7 authored Nov 14, 2024
1 parent 554de29 commit f5f4fca
Showing 57 changed files with 2,877 additions and 4 deletions.
114 changes: 114 additions & 0 deletions docs_website/docs/integrations/add_github_integration.mdx
@@ -0,0 +1,114 @@
---
id: add_github_integration
title: GitHub Integration Guide
sidebar_label: GitHub Integration
---

:::info
Please check the [GitHub User Guide](../user_guide/github_integration.mdx) for detailed instructions on using GitHub features.
:::

## Overview

This guide explains how to set up and configure the GitHub integration in Querybook. Follow these steps to let Querybook interact with your GitHub repositories.

> **Note:** The GitHub Integration is an experimental feature. Ensure that all configurations are correctly set to avoid setup issues.

## Implementation

To integrate GitHub with Querybook, follow the steps below. This setup involves configuring GitHub OAuth, setting up necessary environment variables, and enabling the GitHub Integration feature.

### 1. Setup GitHub OAuth Application

Before integrating GitHub with Querybook, you need to create an OAuth application on GitHub to obtain the necessary credentials.

1. **Navigate to GitHub Settings:**

    - Go to your GitHub account settings.
    - Click on **Developer settings**.
    - Select **OAuth Apps** and then click **New OAuth App**.

2. **Register a New Application:**

    - **Application Name:** Choose a name for your application, e.g., `Querybook Integration`.
    - **Homepage URL:** Enter your Querybook instance URL, e.g., `https://your-querybook-domain.com`.
    - **Authorization Callback URL:** Set this to `https://your-querybook-domain.com/github/oauth2callback`.

3. **Save the Application:**

    - After registering, GitHub shows the **Client ID** on the application page, where you can also generate a **Client Secret**. Keep both credentials secure, as they are required for the integration.
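
The Client ID and callback URL from this step feed the first leg of GitHub's OAuth web flow. As a rough sketch (not Querybook's actual implementation), the authorization URL could be assembled as shown here. The endpoint `https://github.com/login/oauth/authorize` is GitHub's documented authorization URL; the `repo` scope, the `state` handling, and the helper name `build_authorize_url` are assumptions made for illustration.

```python
import secrets
from urllib.parse import urlencode

GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, callback_url, scope="repo", state=None):
    """Return (url, state) for redirecting a user to GitHub's consent page.

    Hypothetical helper: the scope and state handling are assumptions,
    not taken from Querybook's code.
    """
    # A random state value lets the callback reject forged responses.
    state = state or secrets.token_urlsafe(16)
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": callback_url,
            "scope": scope,
            "state": state,
        }
    )
    return f"{GITHUB_AUTHORIZE_URL}?{query}", state


url, state = build_authorize_url(
    "github_app_client_id",
    "https://your-querybook-domain.com/github/oauth2callback",
)
print(url)
```

The `redirect_uri` sent here has to match the **Authorization Callback URL** registered above, otherwise GitHub rejects the request.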

### 2. Install Dependencies

Ensure that the required Python packages are installed. GitHub Integration relies on OAuth libraries and other dependencies.

Add the following line to your `requirements/local.txt`:

```plaintext
-r github.txt
```

**Note:**
The `github.txt` file includes `pygithub==2.4.0` and `cryptography==3.4.8`, which are essential for interacting with the GitHub API and securing tokens.
For more details, refer to [`infra_installation.mdx`](../configurations/infra_installation.mdx).

### 3. Configure GitHub Integration

Configure Querybook to use the GitHub feature by setting the necessary environment variables and updating configuration files.
Secrets such as `GITHUB_CLIENT_SECRET` and `GITHUB_CRYPTO_SECRET` should be stored securely in environment variables, while non-sensitive information can be placed in `querybook_config.yaml`.

1. **Set Config Variables:**

    ```env
    GITHUB_CLIENT_ID=github_app_client_id
    GITHUB_CLIENT_SECRET=github_app_client_secret
    GITHUB_CRYPTO_SECRET=crypto_secret
    GITHUB_REPO_NAME=github_username/github_repository
    GITHUB_BRANCH=main # Optional, defaults to 'main' branch
    ```

    - **GITHUB_CLIENT_ID:** The Client ID obtained from the GitHub OAuth App.
    - **GITHUB_CLIENT_SECRET:** The Client Secret obtained from the GitHub OAuth App.
    - **GITHUB_CRYPTO_SECRET:** A secret key used for encrypting GitHub tokens in the database.
    - **GITHUB_REPO_NAME:** The repository name in the format `user/repo_name` (e.g., `github_username/querybook-datadocs`).
    - **GITHUB_BRANCH:** The branch to which commits are pushed. Defaults to `main` if not set. This is the key name used in `querybook_default_config.yaml`.

**Note:**
`GITHUB_REPO_NAME` takes the form `username/repository_name`, which you can read off the repository's GitHub URL:

1. Navigate to your GitHub profile and click on **Repositories**.
2. Select the repository you want to link.
3. Copy the `username/repository_name` portion of the URL. For instance, if your repository URL is `https://github.com/username123/querybook-datadocs`, the repository name is `username123/querybook-datadocs`.
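
A misconfigured variable is easier to catch before the server starts than during a push. The sketch below is one hedged way to sanity-check the settings described above and to derive `GITHUB_REPO_NAME` from a repository URL. The helper names `repo_name_from_url` and `check_github_settings` are hypothetical, and the regular expression is a loose approximation rather than GitHub's exact naming rules.

```python
import re
from urllib.parse import urlparse

REQUIRED_VARS = (
    "GITHUB_CLIENT_ID",
    "GITHUB_CLIENT_SECRET",
    "GITHUB_CRYPTO_SECRET",
    "GITHUB_REPO_NAME",
)
# Loose "owner/repo" check; GitHub's real naming rules are stricter.
REPO_NAME_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def repo_name_from_url(url):
    """Turn https://github.com/owner/repo(.git) into 'owner/repo'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


def check_github_settings(env):
    """Return a list of problems found in a mapping of settings."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    repo_name = env.get("GITHUB_REPO_NAME")
    if repo_name and not REPO_NAME_RE.match(repo_name):
        problems.append("GITHUB_REPO_NAME must look like 'username/repository'")
    return problems


print(repo_name_from_url("https://github.com/username123/querybook-datadocs"))
# prints: username123/querybook-datadocs
```

Passing `os.environ` to `check_github_settings` returns an empty list when all four required values are present and the repository name is well formed.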

### 4. Enable the Feature in Querybook

To display the GitHub button in the Querybook UI for DataDocs, edit `querybook_public_config.yaml` to enable the GitHub Integration feature:

```yaml
github_integration:
    enabled: true
```

## Example Configuration

Below is an example configuration snippet demonstrating how to set up GitHub Integration in `querybook_config.yaml` and `querybook_public_config.yaml`:

```yaml
querybook_config:
    GITHUB_CLIENT_ID: 'your_github_client_id'
    GITHUB_CLIENT_SECRET: '---Redacted---'
    GITHUB_CRYPTO_SECRET: '---Redacted---'
    GITHUB_REPO_NAME: 'github_username/querybook-datadocs'
    GITHUB_BRANCH: 'main'
public_config:
    github_integration:
        enabled: true
```

## Additional Tips for Developers

- **Security:** Keep your GitHub OAuth credentials secure. Avoid hardcoding sensitive information in configuration files. Store secrets safely and securely using environment variables.
- **Testing:** After setting up, perform test commits to verify that the integration works as expected before deploying to production environments.

For more information, refer to the [GitHub User Guide](../user_guide/github_integration.mdx) and GitHub's [OAuth Apps Documentation](https://docs.github.com/en/developers/apps/building-oauth-apps/authorizing-oauth-apps).
81 changes: 81 additions & 0 deletions docs_website/docs/user_guide/github_integration.mdx
@@ -0,0 +1,81 @@
---
id: github_integration
title: GitHub User Guide
sidebar_label: GitHub Integration
---

## Overview

The **GitHub** feature allows you to seamlessly link your DataDocs with GitHub repositories. This integration enables you to commit DataDoc versions directly to GitHub, track changes over time, and collaborate more effectively using GitHub's version control capabilities.

> **Note:** GitHub Integration is an **experimental** feature. It may undergo significant changes in future releases.

## Getting Started

### Linking Your DataDoc to a GitHub Repository

1. **Access GitHub Integration:**

    - Open the DataDoc you wish to integrate.
    - Click on the **GitHub** icon in the DataDoc right sidebar.

    ![GitHub DataDoc Sidebar](/img/user_guide/github/github_datadoc_sidebar.png)

2. **Authorize GitHub:**

    - Click on the **Connect Now** button.
    - You will be redirected to GitHub to authorize Querybook.
    - After authorization, you'll be redirected back to Querybook.

    ![Connect GitHub](/img/user_guide/github/connect_github.png)

3. **Link Directory:**

    - Enter the **Directory Path** within the repository where DataDoc versions will be stored. You can either specify a custom directory path or use the default directory named `datadocs`.
    - Click the **Link Directory** button to finalize the process.

    ![GitHub Directory Linking](/img/user_guide/github/github_directory_linking.png)

### Committing Changes to GitHub

1. **Commit Your DataDoc:**

    - In your DataDoc, click on the **Push to GitHub** tab located at the top of the modal.
    - Enter a descriptive **Commit Message** summarizing your changes.
    - Click the **Push** button to push the new changes to the linked GitHub repository.

    ![GitHub Push](/img/user_guide/github/github_push.png)

2. **View Commit History:**

    - Navigate to the **GitHub Versions** section within your DataDoc.
    - Here, you can view the commit history, and compare and restore previous versions.

    ![Commit History](/img/user_guide/github/github_versions.png)
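
Behind the **Push** button, a commit to the linked repository has to either create the DataDoc file or update it in place. The sketch below shows that create-or-update step with PyGithub-style calls (`get_contents`, `create_file`, `update_file`). It is not Querybook's implementation: the helper name `push_file` and the example path are hypothetical, and `repo` only needs to behave like a PyGithub `Repository`.

```python
def push_file(repo, path, content, message, branch="main"):
    """Create or update one file on a branch; return "created" or "updated".

    `repo` is expected to behave like a PyGithub `Repository`
    (get_contents, create_file, update_file). Illustrative only.
    """
    try:
        existing = repo.get_contents(path, ref=branch)
    except Exception as exc:
        # PyGithub raises an exception carrying status 404 for a missing file.
        if getattr(exc, "status", None) != 404:
            raise
        repo.create_file(path, message, content, branch=branch)
        return "created"
    # Updating needs the blob SHA of the version being replaced.
    repo.update_file(path, message, content, existing.sha, branch=branch)
    return "updated"
```

With PyGithub installed, `repo` could come from `Github(auth=Auth.Token(token)).get_repo("github_username/querybook-datadocs")`, and a call might look like `push_file(repo, "datadocs/example.md", text, "Update DataDoc")`. Each successful call produces one commit, which is what the **GitHub Versions** view then lists.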

## Best Practices

- **Frequent Commits:** Commit your changes regularly to maintain a clear history of your DataDoc's evolution.
- **Descriptive Messages:** Use clear and descriptive commit messages to make it easier to understand the purpose of each commit.

## Version History and Branching

### Branching

Querybook does not support traditional branching as all edits are shared in real time. Commits are directly pushed to GitHub, eliminating the concept of local and remote changes.

### Workarounds

- **Clone the DataDoc:** Create a separate copy to experiment with changes without affecting the main version.
- **Link to a Different Repository:** Connect the DataDoc to an alternative GitHub repository for testing purposes.

By following these approaches, users can safely manage and experiment with their DataDocs while maintaining a streamlined version history.

## Troubleshooting

If you encounter issues while using GitHub Integration, consider the following steps:

- **Ensure Proper Linking:** Verify that your DataDoc is correctly linked to the intended GitHub repository.
- **Check Permissions:** Make sure the OAuth application has the necessary permissions to access and modify the repository.

For further assistance, refer to the [GitHub Integration Guide](../integrations/add_github_integration.mdx).
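
When checking permissions, it can help to look at the scopes GitHub reports for the token. GitHub returns them in the `X-OAuth-Scopes` response header of authenticated API calls (for example a request to `https://api.github.com/user`). The sketch below only parses that header value; treating `repo` as the required scope is an assumption, and both helper names are hypothetical.

```python
def granted_scopes(header_value):
    """Parse the comma-separated X-OAuth-Scopes header into a set."""
    return {s.strip() for s in (header_value or "").split(",") if s.strip()}


def missing_scopes(header_value, required=("repo",)):
    """Return required scopes absent from the header, in sorted order."""
    return sorted(set(required) - granted_scopes(header_value))


print(missing_scopes("read:user, user:email"))  # prints: ['repo']
print(missing_scopes("repo, read:user"))        # prints: []
```

A non-empty result suggests the user should re-authorize Querybook so that the token is issued with the missing scopes.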
2 changes: 2 additions & 0 deletions docs_website/sidebars.json
@@ -40,6 +40,7 @@
"integrations/add_stats_logger",
"integrations/add_surveys",
"integrations/add_ai_assistant",
"integrations/add_github_integration",
"integrations/customize_html",
"integrations/embedded_iframe"
],
@@ -54,6 +55,7 @@

"User Guide": [
"user_guide/ai_assistant",
"user_guide/github_integration",
"user_guide/api_token",
"user_guide/faq"
],
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "querybook",
-"version": "3.35.0",
+"version": "3.36.0",
"description": "A Big Data Webapp",
"private": true,
"scripts": {
7 changes: 7 additions & 0 deletions querybook/config/querybook_default_config.yaml
@@ -106,3 +106,10 @@ VECTOR_STORE_PROVIDER: ~
VECTOR_STORE_CONFIG:
    embeddings_arg_name: 'embedding_function'
    index_name: 'vector_index_v1'

# --------------- GitHub Integration ---------------
GITHUB_CLIENT_ID: ~
GITHUB_CLIENT_SECRET: ~
GITHUB_REPO_NAME: ~
GITHUB_BRANCH: 'main'
GITHUB_CRYPTO_SECRET: ''
3 changes: 3 additions & 0 deletions querybook/config/querybook_public_config.yaml
@@ -38,3 +38,6 @@ table_sampling:
    default_sample_rate: 0 # 0 means no sampling
    sample_user_guide_link: ''
    sampling_tool_tip_delay: 10000 # delay duration (ms) of sampling tool tip

github_integration:
    enabled: false
@@ -0,0 +1,56 @@
"""Add GitHub Datadoc Link
Revision ID: aa328ae9dced
Revises: f7b11b3e3a95
Create Date: 2024-10-23 21:04:55.052696
"""

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = "aa328ae9dced"
down_revision = "f7b11b3e3a95"
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        "github_link",
        sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
        sa.Column("datadoc_id", sa.Integer(), nullable=False),
        sa.Column("user_id", sa.Integer(), nullable=False),
        sa.Column(
            "directory",
            sa.String(length=255),
            nullable=False,
            server_default="datadocs",
        ),
        sa.Column(
            "created_at", sa.DateTime(), server_default=sa.text("now()"), nullable=False
        ),
        sa.Column(
            "updated_at", sa.DateTime(), server_default=sa.text("now()"), nullable=False
        ),
        sa.ForeignKeyConstraint(
            ["datadoc_id"],
            ["data_doc.id"],
        ),
        sa.ForeignKeyConstraint(
            ["user_id"],
            ["user.id"],
        ),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("datadoc_id"),
    )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table("github_link")
    # ### end Alembic commands ###
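
The `UniqueConstraint("datadoc_id")` in the migration above means each DataDoc can have at most one GitHub link, and `server_default="datadocs"` fills in the directory when none is given. The standalone sketch below reproduces just those two properties with the standard-library `sqlite3` module so they can be tried without a Querybook database. It is a simplified stand-in: foreign keys and timestamp columns are omitted, and `link_datadoc` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE github_link (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        datadoc_id INTEGER NOT NULL UNIQUE,
        user_id INTEGER NOT NULL,
        directory VARCHAR(255) NOT NULL DEFAULT 'datadocs'
    )
    """
)


def link_datadoc(conn, datadoc_id, user_id):
    """Insert a link row; return False if the DataDoc is already linked."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO github_link (datadoc_id, user_id) VALUES (?, ?)",
                (datadoc_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(link_datadoc(conn, 1, 10))  # prints: True
print(link_datadoc(conn, 1, 11))  # prints: False
print(conn.execute("SELECT directory FROM github_link").fetchone()[0])  # prints: datadocs
```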