Merge pull request #34 from vuongbachdoan/feature/genai-code-security-review

feat: add a demo of using GenAI for code secure review
aws-samples · Nov 15, 2024 · 377e3b5
2 parents ba095c8 + 683a184
commit 377e3b5
Showing 8 changed files with 557 additions and 0 deletions.
diff --git a/AWS-GenAI-Code-Security-Review/.gitignore b/AWS-GenAI-Code-Security-Review/.gitignore
@@ -0,0 +1,66 @@
+source/*
+report/*
+
+# Node.js
+node_modules/
+npm-debug.log
+yarn-error.log
+package-lock.json
+yarn.lock
+/.pnp
+.pnp.js
+
+# Python
+*.pyc
+*.pyo
+*.pyd
+__pycache__/
+env/
+venv/
+ENV/
+VENV/
+
+# macOS
+.DS_Store
+.AppleDouble
+.LSOverride
+
+# Windows
+Thumbs.db
+ehthumbs.db
+
+# IDEs and Editors
+.vscode/
+.idea/
+*.suo
+*.user
+*.userosscache
+*.sln.docstates
+
+# Logs
+*.log
+
+# Build Output
+/dist
+/build
+/.next
+/out
+/public
+
+# Environment files
+.env
+.env.local
+.env.*.local
+
+# Docker
+*.docker
+Dockerfile
+docker-compose.yml
+
+# OS generated files
+Desktop.ini
+$RECYCLE.BIN/
+
+# Misc
+*.swp
+*.swo
diff --git a/AWS-GenAI-Code-Security-Review/LICENSE b/AWS-GenAI-Code-Security-Review/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Vuong Bach Doan
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/AWS-GenAI-Code-Security-Review/README.md b/AWS-GenAI-Code-Security-Review/README.md
@@ -0,0 +1,122 @@
+# AWS GenAI Code Security Review
+
+An interactive web UI built with Streamlit and backed by Amazon Bedrock that reviews your code for security issues.
+
+## Features
+
+- Scan a single code file
+- Scan an entire GitHub repository
+- Uses a Claude 3 model on Amazon Bedrock to analyze source code
+
+## Prerequisites
+
+- Python 3.12+
+- AWS Account with appropriate permissions
+- Basic understanding of Python, AWS services, and generative AI
+
+## Installation
+
+1. Clone the repository
+
+2. Create and activate a virtual environment:
+```bash
+python -m venv venv
+# On Windows
+venv\Scripts\activate
+# On Unix or MacOS
+source venv/bin/activate
+```
+
+3. Install required packages:
+```bash
+pip install -r requirements.txt
+```
+
+## Environment Setup
+
+Create a `.env` file in the root directory:
+````
+MAX_TOKENS = 2000
+
+ASSISTANT_ROLE = "As a security expert, you will evaluate the provided code to identify vulnerabilities and risks. Look for common attack vectors such as SQL injection, XSS, buffer overflow, and remote code execution. Examine the code for secure coding practices like input validation, output sanitization, authentication, access controls, and error handling. Based on your findings, provide recommendations for improving the code's security posture and mitigating identified risks. Your report should include:
+
+1. A detailed description of each vulnerability found.
+2. Its severity.
+3. A snippet of the affected code.
+4. A mitigation walkthrough in plain English.
+5. The mitigated code, following security best practices.
+
+Ensure the following output format:
+- ### Vulnerability Type: <output>
+- Description: <output>
+- ###### Severity: <output>
+- ###### A snippet of affected code:
+```
+<output>
+```
+- ###### Mitigation walkthrough:
+<output>
+- ###### Improved code:
+```
+<output>
+```
+---
+
+Focus on clarity and detail to ensure that your analysis is thorough and understandable."
+
+MODEL_ID = 'anthropic.claude-3-haiku-20240307-v1:0'
+QUOTAS_FILE_ANALLYZING = 2
+AWS_REGION = 'us-west-2'
+````
+You can change the prompt and the Bedrock model, but note that the request payload format can differ between models.
+
+## Project Structure
+
+```
+.
+├── code_review/
+│   ├── bedrock_analyze.py          # Logic for calling the Bedrock model
+│   └── git_handler.py              # Logic for scanning a GitHub repo for security issues
+├── report/
+│   └── 2024-11-13_13-46-53.md      # Security report in Markdown format
+├── source/
+│   └── <cloned_git_repo>/          # Local clone of the repo under analysis (generated automatically)
+├── .env                            # Environment variables
+├── app.py                          # Main application logic
+├── requirements.txt                # Package dependencies
+└── README.md                       # Documentation
+```
+
+## Running the Application
+
+Start the application:
+```bash
+streamlit run app.py
+```
+
+## Development
+
+To contribute to the project:
+
+1. Fork the repository
+2. Create a new branch (`git checkout -b feature-name`)
+3. Make your changes
+4. Submit a pull request
+
+## Troubleshooting
+
+Common issues and solutions:
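+- AWS credentials not working: confirm that boto3 can find credentials (for example, run `aws configure`) and that `AWS_REGION` and `MODEL_ID` in `.env` are correct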
+- Import errors: Check virtual environment activation
+- Page loading issues: Verify all dependencies are installed
+
+## Support
+
+For support:
+- Open an issue in the repository
+- Check existing documentation
+- Contact the development team
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
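Review note on the README's Environment Setup section: the values in `.env` arrive as strings, and a missing key surfaces only as a `TypeError` at import time or as a failed Bedrock call. A minimal sketch of validating them up front, assuming a hypothetical `Settings` dataclass and `load_settings` helper that are not part of this PR (key names and defaults are taken from the README example):

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of the PR
    model_id: str
    aws_region: str
    max_tokens: int
    assistant_role: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from a mapping such as os.environ, failing early on bad values."""
    missing = [key for key in ("MODEL_ID", "ASSISTANT_ROLE") if not env.get(key)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")

    raw_max_tokens = env.get("MAX_TOKENS", "2000")  # default from the README example
    try:
        max_tokens = int(raw_max_tokens)
    except ValueError:
        raise ValueError(
            f"MAX_TOKENS must be an integer, got {raw_max_tokens!r}"
        ) from None
    if max_tokens <= 0:
        raise ValueError("MAX_TOKENS must be positive")

    return Settings(
        model_id=env["MODEL_ID"],
        aws_region=env.get("AWS_REGION", "us-west-2"),  # default from the README example
        max_tokens=max_tokens,
        assistant_role=env["ASSISTANT_ROLE"],
    )
```

After `load_dotenv()` has run, something like `load_settings(os.environ)` would report a misconfigured `.env` with a readable message before any request is sent.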
diff --git a/AWS-GenAI-Code-Security-Review/app.py b/AWS-GenAI-Code-Security-Review/app.py
@@ -0,0 +1,69 @@
+import streamlit as st
+import os
+from code_review.git_handler import analyze_repository, output_messages
+from code_review.bedrock_analyze import analyze_file_contents
+import chardet
+
+def analyze_uploaded_file(uploaded_file):
+    try:
+        file_contents = uploaded_file.read()
+        detected_encoding = chardet.detect(file_contents)['encoding']
+        file_contents = file_contents.decode(detected_encoding or 'utf-8')
+
+        st.write("File loaded. Here are the contents:")
+        st.code(file_contents) 
+
+        response = analyze_file_contents(file_contents)
+
+        if response:
+            for content in response['content']:
+                st.markdown(content['text'])
+
+    except UnicodeDecodeError as e:
+        st.error(f"Error decoding the uploaded file: {e}")
+    except Exception as e:
+        st.error(f"An error occurred while analyzing the file: {e}")
+
+def main():
+    st.title('Demo Source Code Review')
+
+    url = st.text_input('Enter the GitHub URL or Local path:')
+    uploaded_file = st.file_uploader('Or upload a file:', type=['py', 'js', 'html', 'css'])  # text files only; the upload is decoded as text
+
+    if st.button('Analyze'):
+        if url:
+            st.write('Analyzing...')
+            if url.endswith('.git'):
+                analyze_repository(url)
+                st.write('Analysis complete.')
+            else:
+                st.warning('The URL should end with .git')
+            output_messages.clear()
+
+
+            report_path = f"report/{url.split('/')[-1].replace('.git', '')}" if 'http' in url else f"report/{os.path.basename(url)}"
+            st.write(f"Report path: {report_path}") 
+
+            if os.path.exists(report_path):
+                reports = [f for f in os.listdir(report_path) if f.endswith('.md')]
+                if reports:
+                    for report in reports:
+                        try:
+                            with open(os.path.join(report_path, report), 'r') as file:
+                                report_content = file.read()
+                                st.markdown(report_content) 
+                        except Exception as e:
+                            st.error(f"Error reading the report: {e}")
+                else:
+                    st.warning('No reports found in the selected directory.')
+            else:
+                st.warning(f"Report path does not exist: {report_path}")
+
+        elif uploaded_file:
+            st.write('Analyzing the uploaded file...')
+            analyze_uploaded_file(uploaded_file)
+        else:
+            st.warning('Please enter a GitHub URL or upload a file.')
+
+if __name__ == '__main__':
+    main()
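Review note on `main()`: the report directory name is derived from the last path segment of the input, with `.git` removed for URLs. A minimal sketch of that rule as one testable helper, assuming a hypothetical name `report_dir_for` that is not part of this PR and the same `report/<name>` layout used above. Unlike `str.replace('.git', '')`, it drops only a trailing `.git`, so a name such as `my.github.io` stays intact. The URLs in the usage lines are placeholders.

```python
import os


def report_dir_for(url_or_path: str, root: str = "report") -> str:
    """Return the report directory for a Git URL or a local path.

    Hypothetical helper: takes the last path segment and, for URLs,
    removes a trailing '.git' suffix only.
    """
    cleaned = url_or_path.strip().rstrip("/")
    if "http" in cleaned:
        name = cleaned.split("/")[-1]
        if name.endswith(".git"):
            name = name[: -len(".git")]
    else:
        # Local paths keep their directory name unchanged.
        name = os.path.basename(cleaned)
    return f"{root}/{name}"


print(report_dir_for("https://example.com/org/demo.git"))
print(report_dir_for("https://example.com/org/my.github.io.git"))
print(report_dir_for("/tmp/my_project/"))
```

Computing the directory once and reusing the result for both the existence check and the file reads keeps URL and local-path inputs consistent.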
diff --git a/AWS-GenAI-Code-Security-Review/code_review/__init__.py b/AWS-GenAI-Code-Security-Review/code_review/__init__.py
@@ -0,0 +1,2 @@
+from . import bedrock_analyze
+from . import git_handler
diff --git a/AWS-GenAI-Code-Security-Review/code_review/bedrock_analyze.py b/AWS-GenAI-Code-Security-Review/code_review/bedrock_analyze.py
@@ -0,0 +1,85 @@
+import boto3
+import os
+import re
+import sys
+import json
+from dotenv import load_dotenv
+
+load_dotenv()
+
+bedrock_client = boto3.client("bedrock-runtime", region_name=os.getenv("AWS_REGION"))
+model_id = os.getenv("MODEL_ID")
+max_tokens = int(os.getenv("MAX_TOKENS", "2000"))  # fall back to the README's example value if unset
+assistant_role = os.getenv("ASSISTANT_ROLE")
+
+
+def analyze_file_contents(file_contents):
+    """
+    Uses Amazon Bedrock's Claude 3 Haiku model to analyze the contents of a file.
+    Removes any comments from the file contents, splits the contents into chunks of at most max_tokens characters,
+    calls the Bedrock API to analyze each chunk, and returns the response.
+
+    Args:
+    - file_contents (str): The contents of the file to analyze.
+
+    Returns:
+    - The response from the Claude model, or None if an error occurs.
+    """
+
+    # Strip comments from the file contents
+    file_contents = re.sub(
+        r'^\s*"""[\s\S]*?"""\s*$', "", file_contents, flags=re.MULTILINE
+    )
+    file_contents = re.sub(r"^\s*#[\s\S]*?\s*$", "", file_contents, flags=re.MULTILINE)
+    file_contents = re.sub(r"^\s*//[\s\S]*?\s*$", "", file_contents, flags=re.MULTILINE)
+
+    print("Splitting messages into smaller chunks of at most max_tokens characters")
+
+    # Split the message into chunks of at most max_tokens characters (a character count, not model tokens)
+    message_chunks = [
+        file_contents[i : i + max_tokens]
+        for i in range(0, len(file_contents), max_tokens)
+    ]
+
+    result = None
+    try:
+        for chunk in message_chunks:
+            # Create payload for Bedrock API
+            payload = {
+                "modelId": model_id,
+                "contentType": "application/json",
+                "accept": "application/json",
+                "body": json.dumps(
+                    {
+                        "anthropic_version": "bedrock-2023-05-31",
+                        "max_tokens": max_tokens,
+                        # Instructions go in the top-level "system" field;
+                        # the conversation itself starts with a user turn.
+                        "system": assistant_role,
+                        "messages": [
+                            {
+                                "role": "user",
+                                "content": [{"type": "text", "text": chunk}],
+                            },
+                        ],
+                    }
+                ),
+            }
+
+            # Call the Bedrock API
+            response = bedrock_client.invoke_model(**payload)
+            response_body = json.loads(response["body"].read())
+            # Keep the first response body and append the content blocks
+            # of later chunks so every chunk's findings are returned.
+            if result is None:
+                result = response_body
+            else:
+                result["content"].extend(response_body.get("content", []))
+    except KeyboardInterrupt:
+        print("KeyboardInterrupt caught. Exiting...")
+        sys.exit()
+    except Exception as e:
+        print(f"An error occurred during analysis: {e}")
+        return None
+
+    return result
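Review note on the chunking step: `file_contents[i : i + max_tokens]` slices by characters, so a chunk can end in the middle of a statement. A minimal sketch of splitting on line boundaries instead, assuming a hypothetical helper `split_into_chunks` that is not part of this PR and a budget measured in characters (not model tokens):

```python
from typing import List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars characters, breaking at line ends.

    A single line longer than max_chars is hard-split so no chunk exceeds the budget.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split oversized lines into pieces that fit the budget.
        pieces = [line[i : i + max_chars] for i in range(0, len(line), max_chars)]
        for piece in pieces:
            # Start a new chunk when adding this piece would exceed the budget.
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


sample = "a = 1\nb = 2\nc = 3\n"
for chunk in split_into_chunks(sample, 12):
    print(repr(chunk))
```

Joining the returned chunks reproduces the input exactly, so nothing is lost between requests. A budget in real tokens would need a tokenizer, which this sketch deliberately leaves out.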