Merge pull request #34 from vuongbachdoan/feature/genai-code-security-review

feat: add a demo of using GenAI for code secure review
vanhoangkha authored Nov 15, 2024
2 parents ba095c8 + 683a184 commit 377e3b5
Showing 8 changed files with 557 additions and 0 deletions.
66 changes: 66 additions & 0 deletions AWS-GenAI-Code-Security-Review/.gitignore
@@ -0,0 +1,66 @@
source/*
report/*

# Node.js
node_modules/
npm-debug.log
yarn-error.log
package-lock.json
yarn.lock
/.pnp
.pnp.js

# Python
*.pyc
*.pyo
*.pyd
__pycache__/
env/
venv/
ENV/
VENV/

# macOS
.DS_Store
.AppleDouble
.LSOverride

# Windows
Thumbs.db
ehthumbs.db

# IDEs and Editors
.vscode/
.idea/
*.suo
*.user
*.userosscache
*.sln.docstates

# Logs
*.log

# Build Output
/dist
/build
/.next
/out
/public

# Environment files
.env
.env.local
.env.*.local

# Docker
*.docker
Dockerfile
docker-compose.yml

# OS generated files
Desktop.ini
$RECYCLE.BIN/

# Misc
*.swp
*.swo
21 changes: 21 additions & 0 deletions AWS-GenAI-Code-Security-Review/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Vuong Bach Doan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
122 changes: 122 additions & 0 deletions AWS-GenAI-Code-Security-Review/README.md
@@ -0,0 +1,122 @@
# AWS GenAI Code Security Review

An interactive web UI built with Streamlit and backed by Amazon Bedrock that reviews your code for security issues.

## Features

- Scan a single code file
- Scan an entire GitHub repository
- Uses Amazon Bedrock (Claude 3 model) to analyze source code

## Prerequisites

- Python 3.12+
- AWS Account with appropriate permissions
- Basic understanding of Python, AWS services, and generative AI

## Installation

1. Clone the repository

2. Create and activate a virtual environment:
```bash
python -m venv venv
# On Windows
venv\Scripts\activate
# On Unix or MacOS
source venv/bin/activate
```

3. Install required packages:
```bash
pip install -r requirements.txt
```

## Environment Setup

Create a `.env` file in the root directory:
````
MAX_TOKENS = 2000
ASSISTANT_ROLE = "As a security expert, you will evaluate the provided code to identify vulnerabilities and risks. Look for common attack vectors such as SQL injection, XSS, buffer overflow, and remote code execution. Examine the code for secure coding practices like input validation, output sanitization, authentication, access controls, and error handling. Based on your findings, provide recommendations for improving the code's security posture and mitigating identified risks. Your report should include:
1. A detailed description of each vulnerability found.
2. Its severity.
3. A snippet of the affected code.
4. A mitigation walkthrough in plain English.
5. The mitigated code, following security best practices.
Ensure the following output format:
- ### Vulnerability Type: <output>
- Description: <output>
- ###### Severity: <output>
- ###### A snippet of affected code:
```
<output>
```
- ###### Mitigation walkthrough:
<output>
- ###### Improved code:
```
<output>
```
---
Focus on clarity and detail to ensure that your analysis is thorough and understandable."
MODEL_ID = 'anthropic.claude-3-haiku-20240307-v1:0'
QUOTAS_FILE_ANALLYZING = 2
AWS_REGION = 'us-west-2'
````
You can change the prompt and the Bedrock model if you want, but note that the request payload format may differ between models.
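
A minimal sketch of how these settings could be read and validated at startup, assuming the variable names from the `.env` example above. `Settings` and `load_settings` are hypothetical names that are not part of this project. The function takes a plain mapping, so it can be called with `os.environ` after `load_dotenv()` has run.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    model_id: str
    aws_region: str
    max_tokens: int
    assistant_role: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required variables and fail early with a clear message."""
    required = ("MODEL_ID", "AWS_REGION", "MAX_TOKENS", "ASSISTANT_ROLE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # MAX_TOKENS arrives as a string and must be a positive integer
    max_tokens = int(env["MAX_TOKENS"])
    if max_tokens <= 0:
        raise ValueError("MAX_TOKENS must be a positive integer")

    return Settings(
        model_id=env["MODEL_ID"],
        aws_region=env["AWS_REGION"],
        max_tokens=max_tokens,
        assistant_role=env["ASSISTANT_ROLE"],
    )
```

Failing at startup with the names of the missing variables is usually easier to debug than an error raised later, inside the first model call.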

## Project Structure

```
.
├── code_review/
│   ├── bedrock_analyze.py # Logic for interacting with the Bedrock model
│   └── git_handler.py # Logic for scanning a GitHub repository for security issues
├── report/
│   └── 2024-11-13_13-46-53.md # Security report in Markdown format
├── source/
│   └── <cloned_git_repo>/ # Repository cloned locally for analysis (generated automatically)
├── .env # Environment variables
├── app.py # Main application logic
├── requirements.txt # Package dependencies
└── README.md # Documentation
```

## Running the Application

Start the application:
```bash
streamlit run app.py
```

## Development

To contribute to the project:

1. Fork the repository
2. Create a new branch (`git checkout -b feature-name`)
3. Make your changes
4. Submit a pull request

## Troubleshooting

Common issues and solutions:
- AWS credentials not working: Verify `.env` file configuration
- Import errors: Check virtual environment activation
- Page loading issues: Verify all dependencies are installed

## Support

For support:
- Open an issue in the repository
- Check existing documentation
- Contact the development team

## License

This project is licensed under the MIT License - see the LICENSE file for details.
69 changes: 69 additions & 0 deletions AWS-GenAI-Code-Security-Review/app.py
@@ -0,0 +1,69 @@
import streamlit as st
import os
from code_review.git_handler import analyze_repository, output_messages
from code_review.bedrock_analyze import analyze_file_contents
import chardet

def analyze_uploaded_file(uploaded_file):
    """Decode an uploaded file, display it, and render the model's security review."""
    try:
        file_contents = uploaded_file.read()
        detected_encoding = chardet.detect(file_contents)['encoding']
        file_contents = file_contents.decode(detected_encoding or 'utf-8')

        st.write("File loaded. Here are the contents:")
        st.code(file_contents)

        response = analyze_file_contents(file_contents)

        if response:
            for content in response['content']:
                st.markdown(content['text'])

    except UnicodeDecodeError as e:
        st.error(f"Error decoding the uploaded file: {e}")
    except Exception as e:
        st.error(f"An error occurred while analyzing the file: {e}")

def main():
    st.title('Demo Source Code Review')

    url = st.text_input('Enter the GitHub URL or Local path:')
    uploaded_file = st.file_uploader('Or upload a file:', type=['zip', 'py', 'js', 'html', 'css'])

    if st.button('Analyze'):
        if url:
            st.write('Analyzing...')
            if url.endswith('.git'):
                analyze_repository(url)
            else:
                st.warning('The URL should end with .git')
            output_messages.clear()

            st.write('Analysis complete.')

            # Reports are stored under report/<repository or folder name>/
            report_path = f"report/{url.split('/')[-1].replace('.git', '')}" if 'http' in url else f"report/{os.path.basename(url)}"
            st.write(f"Report path: {report_path}")

            if os.path.exists(report_path):
                reports = [f for f in os.listdir(report_path) if f.endswith('.md')]
                if reports:
                    for report in reports:
                        try:
                            # Read from report_path so that local paths resolve as well
                            with open(os.path.join(report_path, report), 'r') as file:
                                report_content = file.read()
                            st.markdown(report_content)
                        except Exception as e:
                            st.error(f"Error reading the report: {e}")
                else:
                    st.warning('No reports found in the selected directory.')
            else:
                st.warning(f"Report path does not exist: {report_path}")

        elif uploaded_file:
            st.write('Analyzing the uploaded file...')
            analyze_uploaded_file(uploaded_file)
        else:
            st.warning('Please enter a GitHub URL or upload a file.')

if __name__ == '__main__':
    main()
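
The report directory in `app.py` is derived from the last segment of the URL or path. A minimal sketch of that derivation as a pure function, assuming reports live under `report/<name>/` as in the code above; `repo_name_from_source` and `report_dir_for` are hypothetical helper names, and the trailing-slash handling is an addition that the original does not perform.

```python
import os
from urllib.parse import urlparse


def repo_name_from_source(source: str) -> str:
    """Return the repository or folder name for a Git URL or a local path."""
    parsed = urlparse(source)
    # For http(s) URLs use only the path component; otherwise treat the input as a local path
    path = parsed.path if parsed.scheme in ("http", "https") else source
    name = os.path.basename(path.rstrip("/"))
    # Drop a trailing ".git" suffix without touching ".git" elsewhere in the name
    return name[:-4] if name.endswith(".git") else name


def report_dir_for(source: str, root: str = "report") -> str:
    """Build the directory where reports for `source` are expected."""
    return os.path.join(root, repo_name_from_source(source))
```

Keeping this logic in one function means the directory is computed once and reused, rather than being rebuilt from the URL at each place a report is opened.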
2 changes: 2 additions & 0 deletions AWS-GenAI-Code-Security-Review/code_review/__init__.py
@@ -0,0 +1,2 @@
from . import bedrock_analyze
from . import git_handler
85 changes: 85 additions & 0 deletions AWS-GenAI-Code-Security-Review/code_review/bedrock_analyze.py
@@ -0,0 +1,85 @@
import boto3
import os
import re
import sys
import json
from dotenv import load_dotenv

load_dotenv()

bedrock_client = boto3.client("bedrock-runtime", region_name=os.getenv("AWS_REGION"))
model_id = os.getenv("MODEL_ID")
# Used both as the chunk size (in characters) and as the model's output token limit;
# falls back to the README's example value when the variable is unset
max_tokens = int(os.getenv("MAX_TOKENS", "2000"))
assistant_role = os.getenv("ASSISTANT_ROLE")


def analyze_file_contents(file_contents):
    """
    Uses the Claude model configured in MODEL_ID (Claude 3 Haiku in the README example)
    on Amazon Bedrock to analyze the contents of a file.
    Removes whole-line comments from the file contents, splits the text into chunks of at most
    max_tokens characters, calls the Bedrock API to analyze each chunk, and returns the response.
    Args:
    - file_contents (str): The contents of the file to analyze.
    Returns:
    - The response body from the Claude model, with the content blocks of every chunk
      merged into its "content" list, or None if an error occurs.
    """

    # Strip docstrings and whole-line comments (# and //) from the file contents
    file_contents = re.sub(
        r'^\s*"""[\s\S]*?"""\s*$', "", file_contents, flags=re.MULTILINE
    )
    file_contents = re.sub(r"^\s*#[\s\S]*?\s*$", "", file_contents, flags=re.MULTILINE)
    file_contents = re.sub(r"^\s*//[\s\S]*?\s*$", "", file_contents, flags=re.MULTILINE)

    print("Splitting file contents into chunks of at most max_tokens characters")

    # Slice by characters; this only approximates a token budget
    message_chunks = [
        file_contents[i : i + max_tokens]
        for i in range(0, len(file_contents), max_tokens)
    ]

    combined_response = None
    try:
        for chunk in message_chunks:
            # Create payload for Bedrock API
            payload = {
                "modelId": model_id,
                "contentType": "application/json",
                "accept": "application/json",
                "body": json.dumps(
                    {
                        "anthropic_version": "bedrock-2023-05-31",
                        "max_tokens": max_tokens,
                        # The instructions go in the system prompt; the messages
                        # list must start with a user turn
                        "system": assistant_role,
                        "messages": [
                            {"role": "user", "content": chunk},
                        ],
                    }
                ),
            }

            # Call the Bedrock API
            response = bedrock_client.invoke_model(**payload)
            response_body = json.loads(response["body"].read())

            # Keep the first response and append the content blocks of later chunks,
            # so that every chunk is analyzed instead of returning after the first one
            if combined_response is None:
                combined_response = response_body
            else:
                combined_response["content"].extend(response_body.get("content", []))

    except KeyboardInterrupt:
        print("KeyboardInterrupt caught. Exiting...")
        sys.exit()
    except Exception as e:
        print(f"An error occurred during analysis: {e}")
        return None

    return combined_response
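
The chunking and payload construction in `analyze_file_contents` can be separated from the network call so they can be checked without AWS access. The following is a minimal sketch under that assumption, not the project's implementation: `split_into_chunks` and `build_request_body` are hypothetical names, and the body uses the Anthropic Messages format on Bedrock with the instructions in the top-level `system` field and a single user turn.

```python
import json
from typing import List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive slices of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def build_request_body(system_prompt: str, chunk: str, max_tokens: int) -> str:
    """Serialize one Anthropic Messages request body for Bedrock."""
    return json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            # The reviewer instructions travel in the top-level system field
            "system": system_prompt,
            # The conversation itself starts with a user turn
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": chunk}]}
            ],
        }
    )
```

The resulting string would be passed as the `body` argument of `invoke_model`. Note that `chunk_size` counts characters, so it only approximates a token budget.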