Code Graph Search is an advanced tool that leverages AI to analyze, index, and query code repositories. It creates a searchable graph representation of code structures, enabling developers to explore and understand complex codebases efficiently.
This project combines several AWS services, including Lambda, Neptune, OpenSearch, and Bedrock, to process code repositories, generate metadata, and provide powerful search capabilities. The system is designed to handle large-scale code analysis tasks and offer semantic code search functionality.
.
├── bin
│   └── code_graph_search.ts
├── client
│   ├── src
│   │   ├── App.vue
│   │   ├── components
│   │   ├── main.js
│   │   └── views
│   └── vue.config.js
├── lambda
│   ├── awslibs
│   │   ├── s3.js
│   │   └── sqs.js
│   ├── codeDownloader
│   │   └── index.js
│   ├── codeReader
│   │   └── index.js
│   ├── codeSummarizer
│   │   └── index.js
│   ├── graphSearchManagement
│   │   └── index.js
│   ├── libs
│   │   ├── bedrock
│   │   ├── constants.js
│   │   ├── embedding
│   │   ├── neptune
│   │   ├── opensearch
│   │   └── repositoryReader.js
│   └── searchCodeGraph
│       └── index.js
├── lib
│   └── code_graph_search-stack.ts
└── test
    └── code_graph_search.test.ts
- bin/code_graph_search.ts: Entry point for the CDK application
- lib/code_graph_search-stack.ts: Defines the AWS infrastructure stack
- lambda/: Contains Lambda functions for various processing steps
- client/: Vue.js frontend application
- Neptune: Graph database for storing code structure
- OpenSearch: For semantic code search capabilities
- Bedrock: AI model integration for code analysis
- S3: Storage for code repositories and processed data
- SQS: Message queues for coordinating processing steps
Prerequisites:
- Node.js v22.x
- AWS CDK v2.x
- AWS CLI configured with appropriate permissions
Steps:
- Clone the repository
- Install dependencies:
    npm install
- Deploy the infrastructure:
    npm run deployAll
- After deployment, note the CloudFront URL output for accessing the web interface.
- Use the web interface to submit a Git repository URL for analysis.
- Environment variables in cdk.json for customizing deployment
- Lambda function configurations in lib/code_graph_search-stack.ts
- Analyzing a new code repository:
  - Submit the Git URL through the web interface
  - The system will download, process, and index the code
- Searching the code graph:
  - Use the search functionality in the web interface
  - Enter natural language queries to find relevant code sections
- Exploring code relationships:
  - Navigate through the graph visualization to understand code dependencies
Run unit tests:
npm test
- Issue: Lambda function timeouts
  - Problem: Processing large repositories exceeds Lambda execution time
  - Solution:
    - Increase Lambda timeout in lib/code_graph_search-stack.ts
    - Check CloudWatch logs for specific function failures
    - Consider breaking down processing into smaller chunks
- Issue: Neptune connection failures
  - Problem: Lambda functions unable to connect to Neptune cluster
  - Diagnostic steps:
    - Verify VPC and security group configurations
    - Check Neptune cluster status in AWS console
    - Ensure Lambda functions have proper IAM permissions
  - Solution:
    - Update security group rules if necessary
    - Restart Neptune cluster if unresponsive
- Issue: OpenSearch index not updating
  - Problem: Processed code metadata not appearing in search results
  - Debugging:
    - Enable verbose logging in lambda/libs/embedding/codeMetaRag.js
    - Check CloudWatch logs for indexing errors
    - Verify OpenSearch cluster health
  - Solution:
    - Manually trigger reindexing through the management API if necessary
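The "smaller chunks" suggestion under the Lambda timeout issue could look roughly like the following. This is a minimal sketch, assuming the work can be split per file so that each batch fits within one invocation. The helpers `chunk` and `toBatchMessages` and the `BatchMessage` shape are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical helper: split a list of work items (for example file paths)
// into fixed-size batches so each batch can be handled by its own invocation.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical message shape; the real queue payload is not documented here.
interface BatchMessage {
  repository: string;
  batchIndex: number;
  files: string[];
}

// Turn one large job into several small messages, one per batch.
function toBatchMessages(
  repository: string,
  files: readonly string[],
  size: number,
): BatchMessage[] {
  return chunk(files, size).map((batch, batchIndex) => ({
    repository,
    batchIndex,
    files: batch,
  }));
}

const messages = toBatchMessages(
  "example-repo",
  ["a.js", "b.js", "c.js", "d.js", "e.js"],
  2,
);
console.log(messages.map((m) => m.files.length)); // [ 2, 2, 1 ]
```

Each message could then be sent to a queue individually, so a slow batch only delays itself rather than the whole repository.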
- Monitor Lambda execution times and memory usage
- Use AWS X-Ray for tracing request flows through the system
- Optimize Neptune queries in lambda/libs/neptune/ modules
- Adjust OpenSearch index settings for faster search performance
The Code Graph Search system processes code repositories through several stages:
- Repository Download: The codeDownloader Lambda function fetches the repository from the provided Git URL and stores it in S3.
- Code Reading: The codeReader Lambda function analyzes the downloaded code, extracting structural information and metadata.
- Code Summarization: The codeSummarizer Lambda function generates summaries for classes and functions using AI models via Bedrock.
- Graph Population: Processed data is used to populate the Neptune graph database, creating nodes for classes, functions, and their relationships.
- Search Indexing: Metadata and summaries are indexed in OpenSearch for efficient querying.
- Query Processing: User queries are processed by the searchCodeGraph Lambda function, which combines graph traversal and semantic search to find relevant code sections.
[Git Repository] -> [codeDownloader] -> [S3] -> [codeReader] -> [codeSummarizer]
                                                     |                 |
                                                     v                 v
                                                 [Neptune]       [OpenSearch]
                                                     ^                 ^
                                                     |                 |
                                             [searchCodeGraph] <---- [User Query]
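The query-processing stage combines semantic search with graph traversal. As a rough illustration of that idea only, the sketch below ranks nodes by cosine similarity and then pulls in their direct graph neighbours. It runs entirely in memory: the similarity ranking stands in for an OpenSearch vector query and the edge scan stands in for a Neptune traversal. The types, the `hybridSearch` function, and the sample data are all hypothetical and do not reflect the project's actual schema or the searchCodeGraph implementation.

```typescript
// Hypothetical node and edge shapes for classes, functions, and relationships.
interface CodeNode {
  id: string;
  kind: "class" | "function";
  embedding: number[]; // would come from an embedding model in a real system
}

interface CodeEdge {
  from: string;
  to: string;
  relation: "contains" | "calls";
}

function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Step 1 (semantic search): keep the topK nodes most similar to the query.
// Step 2 (graph traversal): add one-hop neighbours of those hits as context.
function hybridSearch(
  queryEmbedding: readonly number[],
  nodes: readonly CodeNode[],
  edges: readonly CodeEdge[],
  topK: number,
): { hits: string[]; related: string[] } {
  const hits = nodes
    .map((node) => ({
      id: node.id,
      score: cosineSimilarity(queryEmbedding, node.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.id);

  const related: string[] = [];
  const addRelated = (id: string): void => {
    if (hits.indexOf(id) === -1 && related.indexOf(id) === -1) related.push(id);
  };
  for (const edge of edges) {
    if (hits.indexOf(edge.from) !== -1) addRelated(edge.to);
    if (hits.indexOf(edge.to) !== -1) addRelated(edge.from);
  }
  return { hits, related };
}

// Made-up sample data with three-dimensional embeddings for readability.
const nodes: CodeNode[] = [
  { id: "OrderService", kind: "class", embedding: [0.9, 0.1, 0.0] },
  { id: "OrderService.placeOrder", kind: "function", embedding: [0.8, 0.2, 0.1] },
  { id: "EmailClient.send", kind: "function", embedding: [0.0, 0.1, 0.9] },
  { id: "parseConfig", kind: "function", embedding: [0.1, 0.9, 0.1] },
];
const edges: CodeEdge[] = [
  { from: "OrderService", to: "OrderService.placeOrder", relation: "contains" },
  { from: "OrderService.placeOrder", to: "EmailClient.send", relation: "calls" },
];

const result = hybridSearch([1, 0, 0], nodes, edges, 2);
console.log(result);
```

With this data the two order-related nodes are the semantic hits, and `EmailClient.send` is returned as related only because a `calls` edge links it to a hit. That is the kind of result a purely text-based search would miss.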
Prerequisites:
- AWS Account with appropriate permissions
- AWS CDK installed and configured
Steps:
- Configure AWS credentials:
    aws configure
- Deploy the stack:
    npx cdk deploy
- Note the outputs, including the CloudFront distribution URL for the web interface.
The Code Graph Search infrastructure is defined using AWS CDK in TypeScript. Key resources include:
- VPC:
  - Private subnets for Lambda functions and databases
  - VPC Endpoints for S3, Bedrock, and SQS
- Lambda:
  - codeDownloaderFunction: Downloads code from Git repositories
  - codeReaderFunction: Analyzes code structure
  - codeSummarizerFunction: Generates code summaries
  - searchCodeGraphFunction: Processes search queries
- Neptune:
  - NeptuneCluster: Stores the code graph structure
  - NeptuneInstance: Database instance for query processing
- OpenSearch:
  - OpenSearchDomain: Indexes code metadata for semantic search
- S3:
  - codeDownloadBucket: Stores downloaded code repositories
  - clientWebsiteBucket: Hosts the frontend application
- CloudFront:
  - Distribution: Serves the frontend application
- SQS:
  - codeDownloadQueue: Coordinates code download tasks
  - codeReaderQueue: Manages code analysis tasks
- IAM:
  - Roles and policies for Lambda functions and service integrations
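For orientation, a queue-driven Lambda function with an explicit timeout might be declared in CDK roughly as follows. This is an illustrative configuration fragment, not the contents of lib/code_graph_search-stack.ts. The construct IDs reuse names from the list above, but the wiring of codeReaderQueue to codeReaderFunction, the handler name, the asset path, and the timeout and memory values are all assumptions. It needs `aws-cdk-lib` and `constructs` installed, as in any CDK v2 project.

```typescript
import { App, Duration, Stack, StackProps } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { SqsEventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import * as sqs from "aws-cdk-lib/aws-sqs";
import { Construct } from "constructs";

// Hypothetical stack used only for illustration.
class CodeGraphSearchSketchStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Assumed values: 15 minutes is the maximum Lambda timeout.
    const functionTimeout = Duration.minutes(15);

    const codeReaderQueue = new sqs.Queue(this, "codeReaderQueue", {
      // The queue's visibility timeout should be at least the function
      // timeout, otherwise in-flight messages can be redelivered.
      visibilityTimeout: functionTimeout,
    });

    const codeReaderFunction = new lambda.Function(this, "codeReaderFunction", {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: "index.handler", // assumed export name
      code: lambda.Code.fromAsset("lambda/codeReader"), // assumed asset path
      timeout: functionTimeout,
      memorySize: 1024, // assumed value
    });

    // Assumed wiring: one message per invocation keeps each run short.
    codeReaderFunction.addEventSource(
      new SqsEventSource(codeReaderQueue, { batchSize: 1 }),
    );
  }
}

const app = new App();
new CodeGraphSearchSketchStack(app, "CodeGraphSearchSketch");
```

Raising `timeout` here is the kind of change the troubleshooting notes refer to when they suggest increasing the Lambda timeout in the stack definition.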