anshuman decode-Images-and-Videos-with-OCI-GenAI
jesusbrasero committed Nov 28, 2024
1 parent 0487189 commit 984c6b8
Showing 7 changed files with 307 additions and 1 deletion.
2 changes: 1 addition & 1 deletion ai/generative-ai-service/README.md
@@ -58,7 +58,7 @@ Reviewed: 13.11.2024

## Reusable Assets Overview
- [Podcast Generator](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/ai-speech/podcast-generator)

- [Decode Images and Videos with OCI GenAI](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI)

# Useful Links

@@ -0,0 +1,98 @@

# Decode Images and Videos with OCI GenAI

This AI-powered application uses Oracle Cloud Infrastructure (OCI) Generative AI services to analyze images and videos and generate detailed summaries in multiple languages. Whether you are a content creator, researcher, or media enthusiast, it helps you interpret visual content with ease.

<img src="./image.png">

---

## Features

### 🌍 **Multi-Language Support**
- Receive summaries in your preferred language: English, French, Arabic, Spanish, Italian, German, Portuguese, Japanese, Korean, or Chinese.

### 🎥 **Customizable Frame Processing for Videos**
- Extract video frames at user-defined intervals.
- Analyze specific frame ranges to tailor your results for precision.
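
The interval-based sampling and range selection described above can be sketched in plain Python. This is an illustration only: `sampled_frame_indices` and `select_range` are hypothetical helper names, not functions from the app, and the sketch assumes a constant frame rate.

```python
def sampled_frame_indices(total_frames, fps, interval_seconds):
    """Return the indices of the frames kept when sampling one frame per interval."""
    # Keep the step at 1 or more so a reported frame rate of 0 cannot break sampling.
    step = max(1, int(fps * interval_seconds))
    return list(range(0, total_frames, step))


def select_range(items, start, end):
    """Return items[start..end], with both ends inclusive."""
    return items[start:end + 1]


# A 10-second clip at 30 fps, sampled every 2 seconds.
indices = sampled_frame_indices(total_frames=300, fps=30, interval_seconds=2)
print(indices)                      # [0, 60, 120, 180, 240]
print(select_range(indices, 1, 3))  # [60, 120, 180]
```

A longer interval lowers the number of frames sent for analysis, which trades detail for speed.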

### **Parallel Processing**
- Analyzes the selected frames concurrently with a thread pool, so the total processing time is shorter than analyzing frames one by one.
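
As a rough sketch of this pattern, the snippet below runs a stand-in function over several frames with `ThreadPoolExecutor`. `describe_frame` is a hypothetical placeholder for the per-frame model call, and `max_workers=4` is an arbitrary choice for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def describe_frame(frame_id):
    """Stand-in for a per-frame model call (hypothetical)."""
    return f"summary of frame {frame_id}"


def describe_frames_parallel(frame_ids, max_workers=4):
    """Run describe_frame on every frame concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(executor.map(describe_frame, frame_ids))


results = describe_frames_parallel([0, 1, 2])
print(results)
```

Threads suit this workload because each call mostly waits on a network response rather than using the CPU.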

### 🖼️ **Image Analysis**
- Upload images to generate detailed summaries based on your input prompt.

### 🧠 **Cohesive Summaries**
- Combines individual frame insights to create a seamless, cohesive summary of the video’s overall theme, events, and key details.
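
One way to picture the combining step is as prompt construction: per-frame summaries are joined into a single prompt for a text model. The sketch below is an assumption-laden illustration. `build_summary_prompt` is a hypothetical helper, and dropping entries that start with the app's "Error processing frame" prefix is a suggested refinement, not something the app does.

```python
ERROR_PREFIX = "Error processing frame"  # prefix the app uses for failed frames


def build_summary_prompt(frame_summaries):
    """Join usable per-frame summaries into one summarization prompt."""
    # Skip empty entries and failed frames so errors do not leak into the summary.
    usable = [
        s.strip()
        for s in frame_summaries
        if s and not s.startswith(ERROR_PREFIX)
    ]
    instructions = (
        "Below are summaries of individual frames extracted from a video. "
        "Write one cohesive, concise summary of the video as a whole."
    )
    return instructions + "\n\n" + "\n".join(usable)


prompt = build_summary_prompt([
    "A dog runs.",
    "Error processing frame: timeout",
    "The dog catches a ball.",
])
print(prompt)
```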

---

## Technologies Used
- **[Streamlit](https://streamlit.io/):** For building an interactive user interface.
- **[Oracle Cloud Infrastructure (OCI) Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm):** For powerful image and video content analysis.
- **[OpenCV](https://opencv.org/):** For video frame extraction and processing.
- **[Pillow (PIL)](https://pillow.readthedocs.io/):** For image handling and processing.
- **[tqdm](https://tqdm.github.io/):** For progress visualization in parallel processing.

---

## Installation

1. **Clone the repository:**


2. **Install dependencies:**
Make sure you have Python 3.8+ installed. Then, install the required libraries:
```bash
pip install -r requirements.txt
```

3. **Configure OCI:**
- Set up your OCI configuration by creating or updating the `~/.oci/config` file with your credentials and profile.
- Replace placeholders like `compartmentId`, `llm_service_endpoint`, and `visionModel` in the code with your actual values.
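
Instead of editing values in the source file, the settings could be read from environment variables. The following is a minimal sketch under that assumption. The variable names (`OCI_COMPARTMENT_ID`, `OCI_GENAI_ENDPOINT`, `OCI_CONFIG_PROFILE`) and both helper functions are hypothetical and are not read by the app as committed.

```python
import os

# Default endpoint taken from the application code in this commit.
DEFAULT_ENDPOINT = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"


def load_settings(env=None):
    """Collect settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return {
        "compartment_id": env.get("OCI_COMPARTMENT_ID", ""),        # hypothetical name
        "endpoint": env.get("OCI_GENAI_ENDPOINT", DEFAULT_ENDPOINT),  # hypothetical name
        "profile": env.get("OCI_CONFIG_PROFILE", "DEFAULT"),         # hypothetical name
    }


def missing_settings(settings):
    """Return the names of settings that are still empty."""
    return [name for name, value in settings.items() if not value]


settings = load_settings({"OCI_COMPARTMENT_ID": "ocid1.compartment.oc1..example"})
print(missing_settings(settings))  # []
```

Checking for missing values at startup gives a clearer error than a failed API call later on.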

---

## Usage

1. **Run the application:**
```bash
streamlit run app.py
```

2. **Upload a file:**
- Use the sidebar to upload an image (`.png`, `.jpg`, `.jpeg`) or a video (`.mp4`, `.avi`, `.mov`).

3. **Set parameters:**
- For videos, adjust the frame extraction interval and select specific frame ranges for analysis.

4. **Analyze and summarize:**
- Enter a custom prompt to guide the AI in generating a meaningful summary.
- Choose the output language from the sidebar.

5. **Get results:**
- View detailed image summaries or cohesive video summaries directly in the app.
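
The app chooses between image and video analysis based on the file extension. A small sketch of that decision is shown below. `classify_upload` is a hypothetical helper written for illustration, and it uses `os.path.splitext` rather than the string split used in the app.

```python
import os

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def classify_upload(filename):
    """Return 'image', 'video', or 'unsupported' based on the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in IMAGE_EXTENSIONS:
        return "image"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"


print(classify_upload("holiday.JPG"))  # image
print(classify_upload("clip.mov"))     # video
print(classify_upload("notes.txt"))    # unsupported
```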

---

## Screenshots
### Image Analysis
<img src="./image2.png">

### Video Analysis
<img src="./image3.png">

---


## Acknowledgments
- Oracle Cloud Infrastructure Generative AI for enabling state-of-the-art visual content analysis.
- Open-source libraries like OpenCV, Pillow, and Streamlit for providing powerful tools to build this application.

---

## Contact
If you have questions or feedback, feel free to reach out via [[email protected]](mailto:[email protected]).
@@ -0,0 +1,203 @@
# Author: Ansh
import streamlit as st
import oci
import base64
import cv2
from PIL import Image
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# OCI Configuration
compartmentId = "ocid1.compartment.oc1..XXXXXXXXXXXXXxxxxxxxxxxxxxxxxxxxxxxxxxxx"
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
CONFIG_PROFILE = "DEFAULT"
visionModel = "meta.llama-3.2-90b-vision-instruct"
summarizeModel = "cohere.command-r-plus-08-2024"
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)
llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240)
)

# Functions for Image Analysis
def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Functions for Video Analysis
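def encode_cv2_image(frame):
    """JPEG-encode an RGB frame and return it as a base64 string."""
    # extract_frames() stores frames as RGB, but cv2.imencode expects BGR,
    # so convert back first to avoid swapped red and blue channels.
    _, buffer = cv2.imencode('.jpg', cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    return base64.b64encode(buffer).decode("utf-8")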

# Common Functions
def get_message(encoded_image=None, user_prompt=None):
    """Build a user message with the prompt text and an optional image."""
    content1 = oci.generative_ai_inference.models.TextContent()
    content1.text = user_prompt

    message = oci.generative_ai_inference.models.UserMessage()
    message.content = [content1]

    if encoded_image:
        content2 = oci.generative_ai_inference.models.ImageContent()
        image_url = oci.generative_ai_inference.models.ImageUrl()
        image_url.url = f"data:image/jpeg;base64,{encoded_image}"
        content2.image_url = image_url
        message.content.append(content2)
    return message

def get_chat_request(encoded_image=None, user_prompt=None):
    """Build a generic-format chat request for the vision model."""
    chat_request = oci.generative_ai_inference.models.GenericChatRequest()
    chat_request.messages = [get_message(encoded_image, user_prompt)]
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
    chat_request.num_generations = 1
    chat_request.is_stream = False
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = -1
    chat_request.frequency_penalty = 1.0
    return chat_request

def cohere_chat_request(encoded_image=None, user_prompt=None):
    """Build a Cohere-format chat request for the summarization model."""
    chat_request = oci.generative_ai_inference.models.CohereChatRequest()
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_COHERE
    message = get_message(encoded_image, user_prompt)
    chat_request.message = message.content[0].text
    chat_request.is_stream = False
    # lang_type is the module-level value chosen in the sidebar selectbox below.
    chat_request.preamble_override = "Make sure you answer only in " + lang_type
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = 0
    chat_request.frequency_penalty = 1.0
    return chat_request


def get_chat_detail(chat_request, model):
    """Wrap a chat request with the serving mode and compartment."""
    chat_detail = oci.generative_ai_inference.models.ChatDetails()
    chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id=model)
    chat_detail.compartment_id = compartmentId
    chat_detail.chat_request = chat_request
    return chat_detail

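def extract_frames(video_path, interval=1):
    """Return one RGB frame per `interval` seconds of video."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
    # Some files report a frame rate of 0; keep the step at 1 or more
    # so the modulo below cannot divide by zero.
    step = max(1, frame_rate * interval)
    success, frame = cap.read()
    count = 0

    while success:
        if count % step == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        success, frame = cap.read()
        count += 1
    cap.release()
    return frames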

def process_frame(llm_client, frame, prompt):
    """Send one frame to the vision model and return its text response."""
    encoded_image = encode_cv2_image(frame)
    try:
        llm_request = get_chat_request(encoded_image, prompt)
        llm_payload = get_chat_detail(llm_request, visionModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.choices[0].message.content[0].text
    except Exception as e:
        return f"Error processing frame: {str(e)}"

def process_frames_parallel(llm_client, frames, prompt):
    """Process all frames concurrently; results keep the order of `frames`."""
    with ThreadPoolExecutor() as executor:
        results = list(tqdm(
            executor.map(lambda frame: process_frame(llm_client, frame, prompt), frames),
            total=len(frames),
            desc="Processing frames"
        ))
    return results

def generate_final_summary(llm_client, frame_summaries):
    """Combine per-frame summaries into one video summary via the Cohere model."""
    combined_summaries = "\n".join(frame_summaries)
    final_prompt = (
        "You are a video content summarizer. Below are summaries of individual frames extracted from a video. "
        "Using these frame summaries, create a cohesive and concise summary that describes the content of the video as a whole. "
        "Focus on providing insights about the overall theme, events, or key details present in the video, and avoid referring to individual frames or images explicitly.\n\n"
        f"{combined_summaries}"
    )
    try:
        llm_request = cohere_chat_request(user_prompt=final_prompt)
        llm_payload = get_chat_detail(llm_request, summarizeModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.text
    except Exception as e:
        return f"Error generating final summary: {str(e)}"

# Streamlit UI
st.title("Decode Images and Videos with OCI GenAI")
uploaded_file = st.sidebar.file_uploader("Upload an image or video", type=["png", "jpg", "jpeg", "mp4", "avi", "mov"])
user_prompt = st.text_input("Enter your prompt for analysis:", value="Describe the content of this image.")
lang_type = st.sidebar.selectbox("Output Language", ["English", "French", "Arabic", "Spanish", "Italian", "German", "Portuguese", "Japanese", "Korean", "Chinese"])

if uploaded_file:
    if uploaded_file.name.split('.')[-1].lower() in ["png", "jpg", "jpeg"]:
        # Image Analysis
        temp_image_path = "temp_uploaded_image.jpg"
        with open(temp_image_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        st.image(temp_image_path, caption="Uploaded Image", width=500)

        if st.button("Generate Image Summary"):
            with st.spinner("Analyzing the image..."):
                try:
                    encoded_image = encode_image(temp_image_path)
                    llm_request = get_chat_request(encoded_image, user_prompt)
                    llm_payload = get_chat_detail(llm_request, visionModel)
                    llm_response = llm_client.chat(llm_payload)
                    llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
                    st.success("OCI GenAI Response:")
                    st.write(llm_text)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
    elif uploaded_file.name.split('.')[-1].lower() in ["mp4", "avi", "mov"]:

        # Video Analysis
        temp_video_path = "temp_uploaded_video.mp4"
        # Save the upload first so the preview below reads the current file
        # (reading before writing fails on first run or shows a stale video).
        with open(temp_video_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        with open(temp_video_path, "rb") as f:
            encoded_video = base64.b64encode(f.read()).decode()
        video_html = (
            '<video width="600" height="300" controls>'
            f'<source src="data:video/mp4;base64,{encoded_video}" type="video/mp4">'
            'Your browser does not support the video tag.'
            '</video>'
        )
        st.markdown(video_html, unsafe_allow_html=True)

        # st.video(temp_video_path)
        st.write("Processing the video...")

        frame_interval = st.sidebar.slider("Frame extraction interval (seconds)", 1, 10, 1)
        frames = extract_frames(temp_video_path, interval=frame_interval)
        num_frames = len(frames)
        st.write(f"Total frames extracted: {num_frames}")

        frame_range = st.sidebar.slider("Select frame range for analysis", 0, num_frames - 1, (0, num_frames - 1))

        if st.button("Generate Video Summary"):
            with st.spinner("Analyzing selected frames..."):
                try:
                    # The slider range is inclusive on both ends.
                    selected_frames = frames[frame_range[0]:frame_range[1] + 1]
                    waiting_message = st.empty()
                    waiting_message.write(f"Selected {len(selected_frames)} frames for processing.")
                    frame_summaries = process_frames_parallel(llm_client, selected_frames, user_prompt)
                    waiting_message.empty()
                    waiting_message.write("Generating final video summary...")
                    final_summary = generate_final_summary(llm_client, frame_summaries)
                    waiting_message.empty()
                    st.success("Video Summary:")
                    st.write(final_summary)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
@@ -0,0 +1,5 @@
streamlit==1.33.0
oci==3.50.1
Pillow
opencv-python-headless==4.10.0.84
tqdm==4.66.1
