Commit c54b819

Generated markdown tutorials from Jupyter Notebooks
Generated from: couchbase-examples/vector-search-cookbook
1 parent 2eecd97 commit c54b819

6 files changed: +4352 -0 lines changed

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
---
# frontmatter
path: "/tutorial-mistralai-couchbase-vector-search-with-fts"
title: Using Mistral AI Embeddings with Couchbase Vector Search using FTS service
short_title: Mistral AI with Couchbase Vector Search using FTS service
description:
  - Learn how to generate embeddings using Mistral AI and store them in Couchbase using FTS service.
  - This tutorial demonstrates how to use Couchbase's vector search capabilities with Mistral AI embeddings.
  - You'll understand how to perform vector search to find relevant documents based on similarity.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - FTS
  - Artificial Intelligence
  - Mistral AI
sdk_language:
  - python
length: 30 Mins
---

<!--- *** WARNING ***: Autogenerated markdown file from jupyter notebook. ***DO NOT EDIT THIS FILE***. Changes should be made to the original notebook file. See commit message for source repo. -->

[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/fts/mistralai.ipynb)

# Introduction

In this guide, we walk you through building a semantic search engine that uses Couchbase as the backend database and [Mistral AI](https://mistral.ai/) as the provider of the embedding model. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions for creating a working semantic search system from scratch. Alternatively, if you want to perform semantic search using the GSI index, see [the GSI version of this tutorial](https://developer.couchbase.com/tutorial-mistralai-couchbase-vector-search-with-global-secondary-index).

Couchbase is a distributed NoSQL document (JSON) database with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises.

Mistral AI is a research lab that builds open source and commercial LLMs. Its developer platform, La Plateforme, enables developers and enterprises to build new products and applications powered by these models.

The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via:

- [Text generation](https://docs.mistral.ai/capabilities/completion/), which supports streaming so that partial model results can be displayed in real time
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), which supports code generation tasks, including fill-in-the-middle and code completion
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), which represent the meaning of text as a list of numbers and are useful for RAG
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), which enables Mistral models to connect to external tools
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), which enables developers to create customized and specialized models
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), which enables developers to set the response format to `json_object`
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), which enables developers to enforce policies at the system level of Mistral models
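
Because an embedding is just a list of numbers, the "closeness" of two texts can be measured with ordinary vector arithmetic. The following is a minimal sketch using cosine similarity on made-up three-dimensional vectors; real `mistral-embed` vectors are much longer, and the function name and sample values here are illustrative assumptions rather than part of the Mistral or Couchbase APIs.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" with invented values, for illustration only.
database_text = [0.9, 0.1, 0.0]
database_query = [0.8, 0.2, 0.1]
cooking_text = [0.0, 0.2, 0.9]

# The query should score higher against the related text than the unrelated one.
print(cosine_similarity(database_query, database_text))
print(cosine_similarity(database_query, cooking_text))
```

Texts with related meaning are expected to produce vectors that point in similar directions, which is what a vector search index exploits.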

# Before you start

## Get Credentials for Mistral AI

Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) with read and write access to the bucket used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the cluster from the IP address on which the application is running.

# Install necessary libraries

```python
!pip install couchbase==4.3.5 mistralai==1.7.0
```

    [Output too long, omitted for brevity]

# Imports

```python
import uuid
from datetime import timedelta
from pathlib import Path

import couchbase.search as search
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import (ClusterOptions, ClusterTimeoutOptions,
                               QueryOptions, SearchOptions)
from couchbase.vector_search import VectorQuery, VectorSearch
from mistralai import Mistral
```

# Prerequisites

```python
import getpass

# Connection details are read interactively; the password is not echoed.
couchbase_cluster_url = input("Cluster URL: ")
couchbase_username = input("Couchbase username: ")
couchbase_password = getpass.getpass("Couchbase password: ")
couchbase_bucket = input("Couchbase bucket: ")
couchbase_scope = input("Couchbase scope: ")
couchbase_collection = input("Couchbase collection: ")
```

    Cluster URL: localhost
    Couchbase username: Administrator
    Couchbase password: ········
    Couchbase bucket: mistralai
    Couchbase scope: _default
    Couchbase collection: mistralai

# Couchbase Connection

```python
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
```

```python
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
# Block until the cluster is reachable, or fail after 5 seconds.
cluster.wait_until_ready(timedelta(seconds=5))

bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
```

# Creating Couchbase Vector Search Index
To run vector searches over Mistral embeddings stored in a Couchbase cluster, a vector search index needs to be created first. We included a sample index definition that works with this tutorial in the `mistralai_index.json` file. The definition can be used to create a vector index using the Couchbase Server web console. For more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html).

```python
# Look up the index by its fully qualified name to confirm that it exists.
search_index_name = couchbase_bucket + "._default.vector_test"
search_index = cluster.search_indexes().get_index(search_index_name)
```
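
For orientation, the part of a Search index definition that matters most for this tutorial is the mapping of the `vector` field. The sketch below builds such a fragment as a Python dictionary. It is an illustrative assumption and does not reproduce the contents of `mistralai_index.json`: the helper name is hypothetical, the `dot_product` similarity is a guess, and `dims=1024` reflects the length of `mistral-embed` vectors. Treat the provided JSON file as the source of truth.

```python
import json


def vector_field_mapping(field_name="vector", dims=1024, similarity="dot_product"):
    """Build an illustrative mapping fragment for a single vector field.

    The layout mirrors the general shape of a Search index field mapping;
    the real definition for this tutorial lives in mistralai_index.json.
    """
    return {
        field_name: {
            "enabled": True,
            "dynamic": False,
            "fields": [
                {
                    "name": field_name,
                    "type": "vector",
                    "dims": dims,              # must equal the embedding length
                    "similarity": similarity,  # assumed metric, check your index
                    "index": True,
                }
            ],
        }
    }


mapping = vector_field_mapping()
print(json.dumps(mapping, indent=2))
```

Whatever definition is used, the `dims` value has to match the length of the vectors written to the `vector` field, otherwise the documents will not be indexed as expected.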

# Mistral Connection

```python
MISTRAL_API_KEY = getpass.getpass("Mistral API Key:")
mistral_client = Mistral(api_key=MISTRAL_API_KEY)
```
151+
152+
# Embedding Documents
153+
Mistral client can be used to generate vector embeddings for given text fragments. These embeddings represent the sentiment of corresponding fragments and can be stored in Couchbase for further retrieval. A custom embedding text can also be added into the embedding texts array by running this code block:
154+
155+
156+
```python
157+
texts = [
158+
"Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
159+
"It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
160+
input("custom embedding text")
161+
]
162+
embeddings = mistral_client.embeddings.create(
163+
model="mistral-embed",
164+
inputs=texts,
165+
)
166+
167+
print("Output embeddings: " + str(len(embeddings.data)))
168+
```
169+
170+
The output `embeddings` is an EmbeddingResponse object with the embeddings and the token usage information:
171+
172+
```
173+
EmbeddingResponse(
174+
id='eb4c2c739780415bb3af4e47580318cc', object='list', data=[
175+
Data(object='embedding', embedding=[-0.0165863037109375,...], index=0),
176+
Data(object='embedding', embedding=[-0.0234222412109375,...], index=1)],
177+
Data(object='embedding', embedding=[-0.0466222735279375,...], index=2)],
178+
model='mistral-embed', usage=EmbeddingResponseUsage(prompt_tokens=15, total_tokens=15)
179+
)
180+
```
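
Each entry in `data` carries an `index` that identifies which input text it belongs to. As a small sketch of how that field can be used, the helper below pairs texts with their vectors by `index` instead of relying on list position. The helper name is hypothetical, and `SimpleNamespace` objects stand in for the response items so that the example runs without an API key.

```python
from types import SimpleNamespace


def pair_texts_with_vectors(texts, data):
    """Return (text, vector) pairs, matching each item to its text by `index`."""
    if len(data) != len(texts):
        raise ValueError("expected exactly one embedding per input text")
    pairs = [None] * len(texts)
    for item in data:
        pairs[item.index] = (texts[item.index], item.embedding)
    return pairs


# Stand-ins for response items, deliberately listed out of order.
fake_data = [
    SimpleNamespace(index=1, embedding=[0.2, 0.3]),
    SimpleNamespace(index=0, embedding=[0.1, 0.0]),
]
pairs = pair_texts_with_vectors(["first", "second"], fake_data)
print(pairs)
```

With a real response, the same helper could be called as `pair_texts_with_vectors(texts, embeddings.data)`.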
181+
182+
# Storing Embeddings in Couchbase
183+
Each embedding needs to be stored as a couchbase document. According to provided search index, embedding vector values need to be stored in the `vector` field. The original text of the embedding can be stored in the same document:
184+
185+
186+
```python
187+
for i in range(0, len(texts)):
188+
doc = {
189+
"id": str(uuid.uuid4()),
190+
"text": texts[i],
191+
"vector": embeddings.data[i].embedding,
192+
}
193+
collection.upsert(doc["id"], doc)
194+
```
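
Because the loop above generates a fresh random ID with `uuid.uuid4()` on every run, re-running the cell stores the same texts again under new keys. One possible alternative, sketched here as an assumption rather than as the tutorial's method, is to derive the ID from the text with `uuid.uuid5`, so that an upsert of an unchanged text overwrites the existing document. The namespace string and function name are hypothetical.

```python
import uuid

# Hypothetical namespace for this tutorial's documents; any fixed UUID works.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "mistralai-tutorial-docs")


def document_id_for(text):
    """Derive a stable document ID from the text itself."""
    return str(uuid.uuid5(DOC_NAMESPACE, text))


first = document_id_for("Couchbase Server is a distributed database.")
again = document_id_for("Couchbase Server is a distributed database.")
other = document_id_for("A different fragment.")

# Same text gives the same ID; different text gives a different ID.
print(first == again, first == other)
```

The trade-off is that two documents with identical text can no longer coexist, which is usually the desired behavior for a deduplicated corpus.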

# Searching For Embeddings
Embeddings stored in Couchbase can later be searched using the vector index, for example to find the text fragments that are most relevant to a user-entered prompt:

```python
# Embed the prompt with the same model that was used for the stored documents.
search_embedding = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=["name a multipurpose database with distributed capability"],
).data[0]

# MatchNoneQuery contributes no text matches, so results come from the vector query only.
search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(
        VectorQuery(
            "vector", search_embedding.embedding, num_candidates=1
        )
    )
)
result = scope.search(
    "vector_test",
    search_req,
    SearchOptions(
        limit=13,
        fields=["vector", "id", "text"]
    )
)
# Fetch each matching document by key to print its original text.
for row in result.rows():
    print("Found answer: " + row.id + "; score: " + str(row.score))
    doc = collection.get(row.id)
    print("Answer text: " + doc.value["text"])
```

    Found answer: 7a4c24dd-393f-4f08-ae42-69ea7009dcda; score: 1.7320726542316662
    Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.

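
Conceptually, a vector query ranks the stored vectors by their similarity to the query vector and returns the closest ones. The brute-force sketch below illustrates that ranking on toy data. It is not how Couchbase implements the search, dot-product scoring is an assumption that depends on the index's similarity setting, and the document IDs and vectors are invented.

```python
import heapq


def top_k_by_dot_product(query, stored, k=1):
    """Return the k (doc_id, score) pairs with the highest dot product."""
    def score(vector):
        return sum(q * v for q, v in zip(query, vector))

    scored = ((doc_id, score(vector)) for doc_id, vector in stored.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Invented three-dimensional vectors standing in for stored embeddings.
stored = {
    "doc-database": [0.9, 0.1, 0.0],
    "doc-usage": [0.5, 0.5, 0.1],
    "doc-custom": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

for doc_id, similarity in top_k_by_dot_product(query, stored, k=2):
    print(doc_id, round(similarity, 3))
```

A vector index avoids scanning every stored vector like this, which is what makes the search practical for large collections.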
