Microsoft Code Without Barriers Hackathon 2023 - Winner for PETRONAS Problem Statement
There are 8 different types of publicly available PETRONAS reports (e.g. Integrated & Annual Reports, Financial Reports and Sustainability Reports). These reports contain a wealth of information, but their complex format and large volume make it challenging for users to quickly identify key topics and generate insights. How can we use Microsoft AI-related services to develop a solution that automatically extracts and organizes relevant information from these PETRONAS reports, helping users quickly find and understand the topics they are interested in?
- Leverage Microsoft AI-related services to extract and categorize text and images from PETRONAS reports, and identify the key topics within each report.
- Develop a landing page with a search bar that uses Natural Language Understanding (NLU) to let users search for topics of interest within the reports.
- For each search query, the tool should surface the relevant documents and highlight specific keywords in the matching content across multiple reports.
- The tool should also generate a knowledge graph that visualises the relevant entities and their relationships, to help users understand the context of the topics they are interested in.
- View the full problem statement here
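Azure Cognitive Search can return this kind of keyword highlighting itself: setting the `highlight` parameter of a query to one or more searchable fields adds an `@search.highlights` entry to each result, with matched terms wrapped in `<em>` tags by default. The sketch below only mimics that output shape in memory to illustrate the idea; the `search_and_highlight` function, its naive hit-count ranking, and the sample page IDs and text are all hypothetical and not part of PetroNet.

```python
import re
from typing import Dict, List


def search_and_highlight(
    docs: Dict[str, str], query: str, pre: str = "<em>", post: str = "</em>"
) -> List[dict]:
    """Return documents containing any query term, with every hit wrapped in pre/post tags.

    Hypothetical stand-in for Azure Cognitive Search hit highlighting; results are
    ranked by the number of matched terms (a naive score, not the service's ranking).
    """
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    # One case-insensitive pattern matching any query term as a whole word.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    results = []
    for doc_id, text in docs.items():
        hits = pattern.findall(text)
        if not hits:
            continue
        highlighted = pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)
        results.append({"id": doc_id, "score": len(hits), "highlight": highlighted})
    # Most hits first; ties broken by document id for deterministic output.
    return sorted(results, key=lambda r: (-r["score"], r["id"]))


if __name__ == "__main__":
    # Made-up page snippets standing in for extracted report pages.
    pages = {
        "report-a-p1": "Net zero targets and carbon capture are discussed here.",
        "report-b-p7": "This page mentions carbon once.",
        "report-c-p2": "Nothing relevant on this page.",
    }
    for hit in search_and_highlight(pages, "net zero carbon"):
        print(hit["id"], hit["score"], hit["highlight"])
```

Searching for "net zero carbon" returns `report-a-p1` (three hits) ahead of `report-b-p7` (one hit) and skips `report-c-p2` entirely.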
PetroNet is an AI-powered search platform that uses Microsoft Azure cloud services and a GPT large language model to streamline how you navigate PETRONAS reports and unlock the key insights within them. Key functionalities include intelligent information extraction, key topic summarisation, robust search capability and an interactive knowledge graph.
- View PetroNet devpost here
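One simple way to derive the relationships for a knowledge graph like this is entity co-occurrence: two entities are linked when they appear on the same page, and the edge weight counts how many pages they share. This is an assumed technique for illustration, not necessarily the one PetroNet uses; the per-page entity lists stand in for what an entity-recognition step (for example, the entity recognition skill available in Azure Cognitive Search skillsets) might return, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def build_cooccurrence_graph(pages: Iterable[List[str]]) -> Counter:
    """Count how many pages each unordered pair of entities shares."""
    edges: Counter = Counter()
    for entities in pages:
        # Deduplicate within a page and sort so (A, B) and (B, A) map to the same edge.
        unique = sorted(set(entities))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges: Counter, entity: str, min_weight: int = 1) -> Dict[str, int]:
    """Entities linked to `entity` with their edge weights, strongest first."""
    linked = {}
    for (a, b), weight in edges.items():
        if weight < min_weight:
            continue
        if a == entity:
            linked[b] = weight
        elif b == entity:
            linked[a] = weight
    # Sort by descending weight, then alphabetically for stable output.
    return dict(sorted(linked.items(), key=lambda kv: (-kv[1], kv[0])))


if __name__ == "__main__":
    # Hypothetical entity lists, one per report page.
    pages = [["PETRONAS", "LNG", "Malaysia"], ["PETRONAS", "LNG"], ["Malaysia", "Carbon capture"]]
    graph = build_cooccurrence_graph(pages)
    print(neighbours(graph, "PETRONAS"))
```

The resulting weighted edge list can be handed to any graph visualisation library; raising `min_weight` prunes weak links so the rendered graph stays readable.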
- Create the required Azure resources via the Azure Portal: Azure Cognitive Search, Azure Blob Storage and Azure Functions
- Run preprocessing/extract_n_upload_pdf.py to extract pages from the PDFs (so that each file stays within the Azure Cognitive Search Basic tier limits) and upload them to Azure Blob Storage
- Update the 'STORAGE_ACCT_NAME', 'STORAGE_ACCT_KEY', 'STORAGE_CONTAINER_NAME' variables in a .env file
- Run `topic-modelling/TopicModellingAzureFunction/__init__.py` and publish the topic modelling function as an Azure Function App
- Run search-index-pipeline/create-search.cmd to create the Azure Cognitive Search index
- Update the 'url' and 'admin_key' variables in the .env file
- Update the connectionString in search-datasource.json and the Azure Function URL in search-skillset.json
- Open web-app-frontend/CognitiveSearch.Template.sln in Visual Studio to load the front end
- Update appsettings.json with your own configuration
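The Azure Function URL in search-skillset.json suggests that the topic modelling function is called by the indexer as a custom skill. Custom skills exchange JSON with a `values` array: each input record has a `recordId` and a `data` object, and the response should return one record per `recordId` with its own `data`, `errors` and `warnings`. The sketch below covers only that request/response mapping in plain Python; the `text` input, the `topics` output, the stopword list and the word-frequency `extract_topics` helper are hypothetical placeholders for the repository's actual topic modelling.

```python
import json
import re
from collections import Counter
from typing import Any, Dict, List

# Hypothetical, deliberately tiny stopword list for the placeholder extractor.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "for", "on", "is", "are", "with", "by"}


def extract_topics(text: str, top_n: int = 3) -> List[str]:
    """Placeholder 'topic' extractor: the most frequent non-stopword terms."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS and len(w) > 2]
    # most_common() orders equal counts by first occurrence.
    return [w for w, _ in Counter(words).most_common(top_n)]


def handle_skill_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Map a custom-skill request body to a response body, one output record per recordId."""
    results = []
    for record in body.get("values", []):
        output = {"recordId": record.get("recordId"), "data": {}, "errors": [], "warnings": []}
        text = (record.get("data") or {}).get("text")
        if not isinstance(text, str):
            # Report the problem for this record only; other records still succeed.
            output["errors"].append({"message": "Missing 'text' input."})
        else:
            output["data"]["topics"] = extract_topics(text)
        results.append(output)
    return {"values": results}


if __name__ == "__main__":
    request = {
        "values": [
            {"recordId": "0", "data": {"text": "Gas demand and gas supply shape LNG growth."}},
            {"recordId": "1", "data": {}},
        ]
    }
    print(json.dumps(handle_skill_request(request), indent=2))
```

In an Azure Functions HTTP trigger, the handler would parse the body with `req.get_json()` and return `func.HttpResponse(json.dumps(handle_skill_request(body)), mimetype="application/json")`. Keeping the mapping in a plain function makes it testable without deploying the Function App.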
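The contents of create-search.cmd are not reproduced here, but Azure Cognitive Search objects are typically created through the REST API with calls such as `PUT https://<service>.search.windows.net/indexes/<name>?api-version=...`, authenticated by an `api-key` header, which is presumably why the 'url' and 'admin_key' values are needed. The sketch below builds, but does not send, such a request using only the standard library; the index name, the field list, the API version and the `build_create_index_request` helper are assumptions rather than values taken from the repository.

```python
import json
import urllib.request
from typing import Dict, List

API_VERSION = "2020-06-30"  # a GA REST API version; adjust to the one your service uses


def build_index_definition(name: str) -> Dict:
    """Hypothetical minimal index: one key field plus searchable page content."""
    fields: List[Dict] = [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "keyphrases", "type": "Collection(Edm.String)", "searchable": True, "facetable": True},
    ]
    return {"name": name, "fields": fields}


def build_create_index_request(service_url: str, admin_key: str, index_name: str) -> urllib.request.Request:
    """Prepare (but do not send) a create-or-update index call to the REST API."""
    url = f"{service_url.rstrip('/')}/indexes/{index_name}?api-version={API_VERSION}"
    body = json.dumps(build_index_definition(index_name)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_create_index_request("https://example.search.windows.net", "<admin-key>", "example-index")
    print(req.get_method(), req.full_url)
```

Sending the request would be `urllib.request.urlopen(req)`; the data source, skillset and indexer are created the same way under the `/datasources`, `/skillsets` and `/indexers` paths.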
Azure Cognitive Search
- Microsoft Learn - AI-900 Lab: create-cognitive-search-solution
- GitHub - Azure-Samples/azure-search-knowledge-mining
- GitHub - Azure-Samples/azure-search-power-skills
- Docs - Tips for AI Enrichment in Azure Cognitive Search
- Docs - Add a scoring profile
Azure Blob Storage
Azure Functions