Using Custom Data with Large Language Models (LLMs) in Azure
Overview
This is a walkthrough on how to use your own data with Large Language Models (LLMs), such as GPT, by leveraging Azure AI Search as a vector store for Retrieval Augmented Generation (RAG).
The typical data flow follows the ingestion of data from one or more data sources, some form of ETL process, the creation of a vector store, and the use of that vector store with an LLM:
For this walkthrough there are three parts involved: Blob Storage (document storage), Azure AI Search (vector database) and Azure OpenAI (LLM). Together these form a hybrid and semantic search environment for RAG:
Example application flow:
This guide walks through creating and configuring Blob Storage, Azure AI Search and Azure OpenAI, along with how the Azure OpenAI Studio can be used for testing.
Walkthrough
1. Create Azure Storage
The blob storage will be the storage location for the text files holding the content for the knowledge base/own data (documentation, policy documents, guidance, ...).
- Create a storage account: Create a storage account - Azure Storage
- Create a container: Quickstart: Upload, download, and list blobs - Azure portal - Azure Storage
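Documents can be uploaded through the portal as described above, but a short script is often easier when there are many files. The sketch below is one possible approach using the azure-storage-blob Python package; the container name (`knowledge-base`), the local `./documents` folder and the environment variable name are assumptions for illustration.

```python
# A minimal upload sketch, assuming the storage account and a container named
# "knowledge-base" already exist, and a connection string is held in an env var.
import os
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
container = blob_service.get_container_client("knowledge-base")

# Upload every file in a local folder to the container as a blob.
for file_name in os.listdir("./documents"):
    with open(os.path.join("./documents", file_name), "rb") as data:
        container.upload_blob(name=file_name, data=data, overwrite=True)
```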
2. Create Azure OpenAI Deployments
The Azure OpenAI Service is the integration of OpenAI's language models and services into the Microsoft Azure platform.
Before configuring Azure AI Search, a ChatGPT model and an Embedding model are required:
- Prerequisites:
  - An Azure AI Service needs to have been created
  - An Azure OpenAI Service needs to have been created within the Azure AI Service
- Within the Azure OpenAI Service, open the Azure OpenAI Studio
- Within the Azure OpenAI Studio, create two deployments: one for the ChatGPT model and one for the Embedding model (see the verification sketch below).
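Once both deployments exist, they can be verified from code before moving on. The sketch below uses the openai Python package (v1+) with the AzureOpenAI client; the deployment names, API version and environment variable names are assumptions, so substitute your own.

```python
# A minimal check of the two deployments. "gpt-35-turbo" and "text-embedding-ada-002"
# are placeholder deployment names; use whatever names you gave your deployments.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # adjust to the API version you are targeting
)

# Chat deployment responds to a simple prompt.
chat = client.chat.completions.create(
    model="gpt-35-turbo",  # the deployment name, not the underlying model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.choices[0].message.content)

# Embedding deployment returns a vector.
embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Hello",
)
print(len(embedding.data[0].embedding))
```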
3. Create and Configure Azure AI Search
- Create an instance of Azure AI Search: Create a search service in the portal - Azure AI Search
  NOTE: When selecting the Pricing Tier, the Basic offering is required, because Semantic Ranking is needed.
- Within the Azure AI Search instance select Import and vectorize data:
  - Set up data connection
  - Vectorize and enrich data
    NOTE: In production, set the authentication type to System assigned to use managed identities.
    NOTE: When selecting Schedule indexing, if the requirement is to re-index every 5 minutes (for example), make sure the required scaling and Search Units (SUs) are available, as SUs are needed for both querying and indexing and too few will cause conflicts.
  - Review and create
When the import is created, Azure AI Search creates the index and runs the indexer. The indexer needs to have finished before a search query can be executed.
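One way to poll for that, rather than watching the portal, is a small script with the azure-search-documents Python package; the indexer name and environment variable names below are assumptions (the Import and vectorize data wizard generates the indexer name for you).

```python
# A minimal sketch for checking that the indexer has finished running.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

indexer_client = SearchIndexerClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

# "my-indexer" is a placeholder; use the indexer name the wizard created.
status = indexer_client.get_indexer_status("my-indexer")
print("Indexer status:", status.status)
print("Last run:", status.last_result.status if status.last_result else "no runs yet")
```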
- To query the index within the Azure AI Search instance, use the Search explorer (or run the query from code, as in the sketch below).
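Below is one possible sketch of a hybrid (keyword + vector) query with semantic ranking using the azure-search-documents Python package; the index name, vector field (`text_vector`), content field (`chunk`) and semantic configuration name are assumptions, so check them against the index the wizard created.

```python
# A minimal hybrid + semantic query sketch. Field and index names are placeholders.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="my-index",  # placeholder index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

query = "What is SFI?"

# Vectorize the query text with the embedding deployment created earlier.
query_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

results = search_client.search(
    search_text=query,  # keyword half of the hybrid query
    vector_queries=[
        VectorizedQuery(vector=query_vector, k_nearest_neighbors=3, fields="text_vector")
    ],
    query_type="semantic",  # enable semantic ranking
    semantic_configuration_name="my-semantic-config",  # placeholder name
    top=3,
)

for result in results:
    print(result["chunk"])  # placeholder content field name
```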
4. Test within Azure OpenAI Studio
- Within the Azure OpenAI Service, open the Azure OpenAI Studio
- Click on Chat
- Within the Configuration pane, select the required deployment model
- Enter a user query for an item of text within the document store.
  - For example, the document text I uploaded contained information regarding the Sustainable Farming Initiative. If we ask the deployment model “What is SFI?”, the response is:
”SFI stands for the Sustainable Forestry Initiative. It is an independent, non-profit organization that promotes sustainable forest management through the development of standards, certification, and conservation programs. The SFI program is based on principles that promote responsible environmental behavior and sound business practices. SFI's standards are designed to ensure that forests are managed in a way that protects wildlife, watersheds, and the overall health of forest ecosystems.”
This is incorrect. The reason is that the playground has not yet been configured to use the vectorized data from Azure AI Search.
NOTE: Any configuration applied within the playground is for the playground only and has no effect on any configuration outside it.
- Add your own data
  - Select Add your data within the Setup section in the playground
  - Click Add a data source
  - Select data source:
  - Data Management:
  - Review and finish
- Re-test
  - Enter a user query for an item of text within the document store. For example, the document text I uploaded contained information regarding the Sustainable Farming Initiative. If I ask the deployment model “What is SFI?”, the response is:
“The Sustainable Farming Incentive (SFI) is a scheme that pays farmers for actions that support food production and can help improve farm productivity and resilience, while also protecting and improving the environment. It is being rolled out incrementally and is currently open to farmers who were eligible for the Basic Payment Scheme (BPS) on 16 May 2022. The scheme is straightforward to apply to, payments are received quickly, and it is less prescriptive, allowing farmers flexibility to focus on delivering outcomes that matter. The full offer will be in place by the start of 2025, and agreements last for three years with payments made quarterly.”
This is correct. Along with the response, citations are returned identifying the reference data used to create the returned response. This validates the use of the “grounded data” from the RAG.
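Outside the playground, the same grounded behaviour can be reproduced in code. One possible approach with the openai Python package is Azure OpenAI's "on your data" feature, passing the search index as a data source via extra_body; the deployment name, index name, API version and key-based authentication below are assumptions.

```python
# A minimal sketch of a grounded chat completion using Azure OpenAI "on your data".
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # chat deployment name (placeholder)
    messages=[{"role": "user", "content": "What is SFI?"}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],
                    "index_name": "my-index",  # placeholder index name
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_SEARCH_KEY"],
                    },
                },
            }
        ]
    },
)

# The answer is grounded in the indexed documents; citations are returned in the
# message's "context" field alongside the content.
print(response.choices[0].message.content)
```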
References
- RAG and generative AI - Azure AI Search
- Semantic ranking - Azure AI Search
- Vector search - Azure AI Search
- Azure OpenAI Service embeddings tutorial - Azure OpenAI
- Prompt engineering techniques with Azure OpenAI - Azure OpenAI Service