Did you ever get a new gig and have to understand a repo? Maybe not just one repo, but maybe six or sixty repos?
Context is everything, and part of understanding a product, a team, or a codebase is getting the right understanding of its progression over time. How did it get here? How has it changed, and what were the big inflection points? How can I contribute to this project? Is it flourishing, or is it time to put this one out to pasture?
Lots of AI tools for code, at the moment, are focused on zero-to-one generation of boilerplate code. Can You Git To That (CYGTT) will organize, classify, and visualize data about your GitHub repo so that you can gain valuable context on the project's history and development processes, as well as the lifecycle of the code itself.
The screenshot above is a representative view of the output from this repo. Click to see the full screenshot.
First, copy/clone the repo to your local directory. Currently tested under Python 3.11. You can set up a venv, if you like; then run `pip install -r requirements.txt` to install the necessary libraries.
To run it, modify the settings in the tab-delimited `config.txt`. The file `example.py` is a bare-bones example that shows the paths and imports needed to run the app. You can run it with `python3 example.py` from the root of your local copy of the repo.
What's gonna happen? Well, first, the app reads `config.txt` and collects and generates a bunch of data about your repo. To collect data, it uses PyGithub to query the GitHub API (you'll need a GitHub personal access token; more info below), ultimately moving that data into a SQLite database in the `output` directory. To generate, the app uses an LLM (either OpenAI or Ollama, as configured in `config.txt`) to write plain-language summaries of commit diffs and to classify and tag file changes, storing the results in the database. The approximate accrued cost of your LLM usage is calculated and shown via logging output. As a reference, generating the example screenshot above -- processing this repo and its changes solely with `gpt-4o-mini` (7/2024) via the API -- cost just under $0.04.
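As a rough sketch of how that kind of cost accounting works (the rates below are gpt-4o-mini's published API pricing as of mid-2024, and the per-diff token counts are purely illustrative, not CYGTT's actual numbers):

```python
# Illustrative LLM cost estimate. Rates are gpt-4o-mini API pricing
# as of mid-2024 -- verify current pricing before relying on these numbers.
INPUT_COST_PER_M = 0.15   # USD per 1M input tokens
OUTPUT_COST_PER_M = 0.60  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost for a batch of LLM calls."""
    return (input_tokens / 1_000_000) * INPUT_COST_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_COST_PER_M

# e.g. summarizing ~200 commit diffs at roughly 1000 input / 100 output tokens each:
print(f"${estimate_cost(200 * 1000, 200 * 100):.4f}")  # → $0.0420
```

Which lands in the same ballpark as the ~$0.04 figure above.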
Once this process completes successfully, the next step is to run `flask --app web.server run -p 5001` from the directory containing the `web` directory (the repo root), which will serve reports from http://127.0.0.1:5001.
You'll need a GitHub access token set up in your environment variables so that CYGTT can get to the repos that you own.
To get an access token: in the upper-right corner of any page on GitHub, click your profile photo, then click **Settings** in the dropdown menu. On the page that appears, in the left sidebar (look all the way at the bottom of the sidebar; it's easy to overlook), click **Developer settings**. In the next left sidebar, under **Personal access tokens**, you can either create a new fine-grained personal access token with specific permissions, or a personal access token (classic).
For a **Fine-grained personal access token**, select the following permissions:
- Commit statuses: Read-only
- Contents: Read-only
- Pull requests: Read-only
Note that if you use a **Fine-grained access token**, you may have to own the repo as well, or ask permission to access it.
Using a GitHub **Personal Access Token (Classic)** instead, set up in the same way, will seemingly let you access any repo you have permission to access via the API.
Then set the token value in your system environment as `CYGTT_GITHUB_ACCESS_TOKEN`.
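For example, on macOS or Linux you can export it from your shell profile (the token value below is a made-up placeholder, not a real token):

```shell
# Add to ~/.bashrc, ~/.zshrc, or similar; replace the placeholder with your real token.
export CYGTT_GITHUB_ACCESS_TOKEN="ghp_yourTokenHere"
```

On Windows, set it via System Properties → Environment Variables instead.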
If you'd like to run this against a local open-source LLM, instead of using (and paying for) an API such as OpenAI's, you can use Ollama quite effectively. To optimize Ollama models for reading larger code files, you may need to extend the default context window from 2048 tokens to 8192 tokens or more. Here's how to tweak Ollama for a larger context window; use your preferred, already installed Ollama model name in place of `<model_name>` in the instructions below.
1. Export the model's current configuration:

   ```shell
   ollama show <model_name> --modelfile > model_conf.txt
   ```

2. Open `model_conf.txt` in a text editor and:
   - Add the line `PARAMETER num_ctx 8192`.
   - To make sure updates keep the change, replace the line starting with `FROM` with `FROM <model_name>:latest`.

3. Save and close the file.

4. Create a new model with the updated configuration:

   ```shell
   ollama create <new_model_name> -f model_conf.txt
   ```
Now you can call `<new_model_name>` in your `config.txt`, if you're using Ollama for CYGTT.
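After those edits, the relevant lines of `model_conf.txt` should look something like the following (using `llama3` purely as an illustrative model name; leave the other exported lines, such as `TEMPLATE` and any existing `PARAMETER` entries, as they are):

```
FROM llama3:latest
PARAMETER num_ctx 8192
```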
To add more languages to your Tree-sitter setup, you need to manually clone the language grammar repositories into a designated directory and then modify your script to include these languages in the build process.
1. Clone the language grammar repositories:

   For each language you want to add, clone the corresponding Tree-sitter grammar repository into the `vendor` directory. For example, to add JavaScript:

   ```shell
   git clone https://github.com/tree-sitter/tree-sitter-javascript vendor/tree-sitter-javascript
   ```
2. Update the `build_language_library` function:

   Modify your `languages` dictionary to include the new language(s) you cloned:

   ```python
   languages = {
       'python': 'vendor/tree-sitter-python',
       'javascript': 'vendor/tree-sitter-javascript',
       # Add more languages here
   }
   ```
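Putting that together, the build step might look like the sketch below. Note this assumes the older py-tree-sitter API (`Language.build_library` was removed in py-tree-sitter 0.22+), and the function and output-path names here are illustrative rather than CYGTT's exact internals:

```python
# Sketch: compile the cloned Tree-sitter grammars into one shared library.
# Assumes py-tree-sitter < 0.22, where Language.build_library is available.
languages = {
    'python': 'vendor/tree-sitter-python',
    'javascript': 'vendor/tree-sitter-javascript',
    # Add more languages here
}

def build_language_library(languages: dict, output_path: str = 'build/languages.so') -> str:
    """Build a single shared library from the grammar repos in `languages`."""
    # Imported lazily so this sketch can load even without tree_sitter installed.
    from tree_sitter import Language
    Language.build_library(output_path, list(languages.values()))
    return output_path
```

With the library built, each language can then be loaded via `Language(output_path, name)` for parsing.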
- If you're going to use OpenAI's models, you will need to set up an `OPENAI_API_KEY` in your environment, as usual. I would recommend the `gpt-4o-mini` model as a fast, accurate, and cheap choice. To be even cheaper, but probably not quite as fast nor as accurate, run Ollama locally.
- What's next? I'm mostly collecting ideas/todos in issues. Feel free to take a peek and opine/ideate/complain.
- The big picture: Expand indexing of code and diffs to make code and changes searchable, by providing smart context to the LLM. Being able to ask "what changed around the sixth of January such that the entire app is now in jeopardy?" and get a solid answer, for example.
- The name of this project is based on the Funkadelic song "Can You Get To That" off the Maggot Brain album (1971). Graphics used here were created with Recraft.ai, and take their inspiration from my related project Give Up The Func.
- Details on changing Ollama context size found at Nurgo Software, for their product "Brain Soup".
- Adam Tornhill's Your Code As A Crime Scene is a great resource, and the origin of a git-as-forensics approach. If you don't want to tackle a DIY approach here, consider Adam's company Code Scene.
- The code example above, to run the web view, uses port 5001 instead of the default 5000, as 5000 sometimes seems to conflict on macOS. Change it to whatever you want or need.
- Solid and fun-to-read article on RAG across multiple data sources, including lots of SQL tables, by Ryan Nguyen.