Understanding the Anatomies of LLM Prompts: How To Structure Your Prompts To Get Better LLM Responses
While prompt engineering may or may not be an actual discipline, prompts are what we use to...
Recall that the emerging LLM tech stack consists of four main layers: Data, Model, Orchestration, and Operational. This stack uses in-context learning to tailor pre-trained Large Language Models (LLMs), such as OpenAI's GPT-4 and Meta's Llama 3, for your specific applications. As part of the Data layer, vector databases play a crucial role in Retrieval-Augmented Generation (RAG), allowing you to enhance an LLM's knowledge with contextual or proprietary data.
In this article, we’ll get a deeper understanding of vector databases:
Storage and search mechanisms
Differences between vector databases vs. traditional databases
Differences between vector databases vs. vector search add-ons
Common challenges of vector databases
An embedding is a numerical representation that captures semantic meaning and is expressed as a vector. Embeddings can be compared to see how related they are. Embeddings of unstructured data can be quickly classified and searched with specialized techniques.
A diagram showing vector data. Source: Pinecone The Rise of Vector Data
A transformer is typically used to create the embeddings. For example, OpenAI offers embedding models for turning text, such as natural language or code, into embeddings. However, if the input data is huge in size, then chunking is needed to break up the input data into smaller fragments. At the time of writing, OpenAI's text embedding models have a maximum input size of 8,192 tokens.
Vector databases use a combination of different algorithms to enable the approximate nearest neighbor (ANN) approach. A common pipeline involves the following stages: indexing, querying, and filtering.
Vector databases focus on storing and retrieving high-dimensional embeddings, alongside the original content chunks. Unlike relational databases that create indexes based on the table columns, vector databases utilize specialized algorithms for indexing.
The indexing algorithms help improve vector search in two main ways:
Reduce search scope: This is done by partitioning and organizing vectors into data structures.
Cluster-based index (e.g. IVF)
Hash-based index (e.g. LSH)
Tree-based index (e.g. ANNOY)
Graph-based index (e.g. HNSW)
Reduce vector size: This is done by reducing the dimensions of the vectors or reducing the number of bits representing vector values.
Compression-based index (e.g. RP, PQ, SQ)
Read on to learn more about each indexing algorithm.
Stores vectors in their original form.
Separates the vector space into clusters, where each cluster is represented by a centroid (geometric center or arithmetic mean position). During indexing, vectors are assigned to the nearest centroid, creating an inverted index that maps centroids to lists of vectors. When searching, the query vector is first matched to the closest centroid, narrowing the search to vectors within that cluster. More clusters can improve accuracy but require additional memory and computation to maintain the index.
An illustration of Inverted File indexing: Source
Maps similar vectors into the same hash tables using a set of hash functions. When searching, the query vector is hashed via the same functions and compared to vectors in the same table. By reducing the search space to vectors in the same hash table, searches are quicker. The search is non-exhaustive and provides approximate results. More hash functions improve accuracy but increase search computation.
An illustration of Locality-Sensitive Hashing indexing: Source
Builds multiple binary trees by recursively splitting the vector space with random hyperplanes. During indexing, vectors are assigned to nodes based on their position relative to these hyperplanes, continuing until leaf nodes contain only a few vectors. When searching, multiple trees are traversed in parallel, using a priority queue to explore likely branches. The candidate vectors are then ranked by their true distance to the query vector. More trees improve accuracy but increase memory and build time.
An illustration of hyperplanes used in Approximate Nearest Neighbors Oh Yeah indexing: Source
An illustration of binary trees used in Approximate Nearest Neighbors Oh Yeah indexing: Source
Creates a hierarchical, multi-layered graph structure of nodes representing vectors. Similar nodes are connected, with the top layers containing long-range connections and the bottom layers containing localized connections. During a search, the algorithm navigates down the hierarchy to find vectors most similar to the query vector. The hierarchical nature allows for efficient navigation through the search space. More connections between nodes improve accuracy but at the cost of increased memory usage and computation.
An illustration of a layered Hierarchical Navigable Small World indexing: Source
Projects high-dimensional vectors to lower-dimensional vectors using a matrix of random numbers. Calculating the dot product of the input vectors and matrix results in a projected matrix containing lower-dimensional vectors while keeping their similarity. When searching, the query vector is also simplified with the same random matrix. By reducing the data complexity while maintaining a good approximation of their relationships, searches are significantly faster. Generating a more random matrix improves the projection quality.
An illustration of Random Projection indexing: Source
Splits high-dimensional vectors into smaller parts, represents each part with a code, and then reassembles the parts. The code representations are generated by performing k-means clustering and collectively form a codebook. When searching, the query vector is simplified via the same process and matched against the same codebook. More codes in the codebook improve accuracy but increase search computation.
An illustration of Product Quantization indexing: Source
Compresses vectors by converting each component from a higher-precision format (e.g. 32-bit float) to a lower-precision format (e.g. 8-bit integer). This process maps continuous values to a finite set of discrete values. During indexing and search, all vectors undergo the same quantization, which reduces memory usage while approximately preserving vector relationships. Using more bits per component improves precision but increases memory requirements.
An illustration of Scalar Quantization mapping: Source
Providers of vector databases and add-ons offer different indexing options. As this is an evolving space, check out the providers' latest documentation on current indexing options available.
A list of available indexing options for some popular providers (non-exhaustive, as of 11/29/2023): Source
Selecting the suitable vector index depends on your specific application requirements. You need to consider various factors, such as dataset size, search quality, search speed, and memory usage. For the highest accuracy, use a Flat index, because there is no approximation error introduced from vector reduction. However, as the dataset size increases, using a Flat index means query time becomes longer. HNSW index offers good search quality and quick speed but at the cost of high memory usage. IVF index seems to be a good scalable option that offers high search quality with acceptable speed and memory usage. Some providers also offer composite indexing options that are helpful for very large datasets, such as IVF_PQ for memory-constrained applications and HNSW_SQ for high-recall applications.
A table of vector search metrics (memory, query time, recall) for various indexes from Pinecone: Source
In addition to storing the embeddings, metadata associated with each embedding can be stored. Examples of metadata are content type, category, source, etc. Hence, vector databases typically maintain a vector index and a metadata index. Traditional indexing methods are used for indexing the structured metadata. Some common indexing methods include:
Inverted Indexing: Creates an index that maps each unique term or metadata property to the list of records containing it, allowing for efficient keyword searches.
Hash Indexing: Uses hash functions to map metadata keys to the storage locations of records, enabling fast exact-match lookups.
B-tree Indexing: Creates a balanced tree structure that stores structured metadata in sorted hierarchical order, allowing for efficient range queries and ordered traversals.
Unlike traditional databases (SQL and NoSQL) which look for exact matches, vector databases look for similar matches. The following are some common measures of similarity for comparing two vectors:
Euclidean or L2 Distance: Determines similarity by measuring the straight-line distance between two vectors in a vector space. The value ranges from 0 to infinity, where 0 represents identical vectors and larger values represent increasingly dissimilar vectors.
Manhattan or L1 Distance: Determines similarity by measuring the sum of the absolute differences between the coordinates of two vectors in a vector space. The value ranges from 0 to infinity, where 0 represents identical vectors and larger values represent increasingly dissimilar vectors.
Cosine Similarity: Determines similarity by measuring the cosine of the angle between two vectors in a vector space. Imagine the two vectors as two arrows inside a room. The two arrows can point in opposite directions, at right angles, in slightly different directions, or in the same direction. The more closely the arrows point in the same direction, the more similar the vectors are. The value ranges from -1 to 1, where -1 represents vectors that are diametrically opposed, 0 represents orthogonal vectors, and 1 represents identical vectors.
Dot Product: Determines similarity by measuring the product of the magnitude of two vectors and the cosine of the angle between them. The value ranges from minus infinity to infinity, where a negative value represents vectors that point in opposite directions, 0 represents orthogonal vectors, and a positive value represents vectors that point in the same direction.
2D illustrations of vector similarity measures: Source
Aside from searching for similar vectors, vector databases can also filter the results based on a metadata query. The metadata filtering can be done before or after the vector similarity search.
Pre-filtering: Metadata filtering is performed before the vector search, which can reduce the search space. The drawback is that relevant results failing the metadata filter criteria may be overlooked.
Post-filtering: Metadata filtering is performed after the vector search, which ensures that relevant results aren't overlooked. However, this may add computational overhead to filter out the irrelevant results.
A table comparing database types across several dimensions. Source: Author
Deciding which type of database to use depends on the type of data to store and retrieve:
Structured Data: Also referred to as quantitative data, this kind of data follows a predefined model or schema. For example, flight reservations follow a rigid model of reservation number, flight number, passenger name, etc.
Unstructured Data: Also referred to as qualitative data, this kind of data doesn't have an internal structure. It can consist of text, video, and images. For example, customer reviews and product photos can vary a lot and don't follow a rigid schema.
Semi-structured Data: An in-between of structured and unstructured data. It lacks a predefined structure but uses metadata to define data points. For example, JSON and XML objects have defined properties or tags.
Different types of databases have distinct priorities:
SQL Databases: Focus on ACID principles of atomicity, consistency, isolation, and durability. Data distributed across partitions must be consistent before subsequent queries which limits horizontal scaling.
NoSQL Databases: Emphasize BASE principles of basically available, soft state, and eventual consistency. As eventual consistency is acceptable, subsequent queries can occur before data partitions are all consistent which improves horizontal scaling. Tolerate some imprecision in query results.
Vector Databases: Designed to handle large amounts of data and queries, excel at horizontal scaling. Fault tolerance is enabled to ensure high availability.
A mapping of database types among main database properties: Source
Based on the data type and prioritization, different database types are suitable for specific scenarios:
SQL Databases: Appropriate for critical data and transactions. For example, applications in banking and healthcare that handle sensitive data and require transactional integrity.
NoSQL Databases: Useful for higher volumes of data with exact lookups. For example, applications in social media to manage the numerous posts and connections.
Vector Databases: Befitting applications in AI/ML, NLP, and image recognition. For example, applications in ecommerce to analyze customer reviews and search for products by image.
Vector Database |
Vector Search Extension / Integration |
|
Proprietary |
||
Open-Source |
A table of available providers of vector databases and vector search add-ons (non-exhaustive, as of 07/25/2024). Source: Author
Using a vector search extension or integration to an existing traditional database (SQL or NoSQL) is generally an easier and quicker transition process because it leverages current database infrastructure and expertise. However, this approach may work best for simple vector search functionality.
As previously mentioned, traditional databases and vector databases are architectured based on different priorities. For instance, relational databases value consistency and isolation of transactions. Whereas vector databases value scalability, search speed, and availability. Having a dedicated vector database also means its operation and maintenance are separate from the existing database, allowing for modularity. Some vector databases may also offer robust monitoring and access control.
Retrofitting vector search capability into traditional databases may compromise performance. Although some traditional database providers are improving their vector search offerings. It depends on your vector dataset (size, dimensions, etc.) and the provider. There are benchmarking tools and libraries available, such as open-source VectorDBBench. Some common vector search performance metrics are:
Queries Per Second (QPS): The number of queries that the database can process per second.
Query Latency: The time it takes to execute a query and receive a response, usually expressed as percentile metrics (e.g. 95th and 99th percentiles).
Recall: A measure of how accurate a query is in retrieving similar results.
The following are some benchmarking results from Zilliz, which offers a vector database solution, and Redis, which offers vector search integration.
A graph of benchmarking mainstream vector databases and pgvector based on queries per second by Zilliz: Source
A graph of benchmarking mainstream vector databases and Redis based on throughput speed by Redis: Source
Ultimately, the decision to use a vector database vs. a vector search add-on depends on your application's specific needs for embeddings storage and retrieval. For a small number of embeddings requiring simple vector search capability or to build a prototype, perhaps begin with an extension or integration. As your production system scales, consider setting up a separate vector database for performance optimization and flexibility.
Due to their purpose and architecture, vector databases face the following challenges:
Scalability and Performance: As the volume of high-dimensional embeddings grows, vector databases need to enhance data distribution and parallel processing to maintain performance and fault tolerance. Additionally, they may incorporate caching for frequently accessed data and memory optimization for reducing dimensionality to improve search speed.
Real-time Updates: Vector databases need to provide "fresh data," meaning that newly inserted data are queryable with low latency (within a few seconds). There's also the need to balance the tradeoffs between search performance and metadata filtering accuracy.
Cost Optimization: To reduce costs, computation resources should be used only when needed, which means "decoupling the index storage from queries and searching only what is needed." This is difficult when trying to keep latency low across partitions of the search space.
Vector databases, crucial for implementing RAG to enhance an LLM's knowledge, store and retrieve high-dimensional embeddings. They use specialized indexing algorithms for efficient ANN searches, such as IVF, LSH, ANNOY, HNSW, RP, PQ, and SQ. Unlike traditional databases that look for exact matches, vector databases search for similar matches using measures like Euclidean distance, Manhattan distance, cosine similarity, or dot product. Compared to SQL and NoSQL databases, vector databases prioritize scalability and availability, thus optimized for handling unstructured data in AI/ML applications. While vector search add-ons offer a quicker transition, dedicated vector databases may provide better performance and flexibility. Common challenges for vector databases include scaling, fresh data, and containing costs. This is an evolving space with rapid changes, so stay updated.
Diana Cheung (ex-LinkedIn software engineer, USC MBA, and Codesmith alum) is a technical writer on technology and business. She is an avid learner and has a soft spot for tea and meows.
While prompt engineering may or may not be an actual discipline, prompts are what we use to...
Introduction In a previous article, we learned that prompting is how we communicate with LLMs, such...
Introduction Introduction Pre-trained Large Language Models (LLMs), such as OpenAI’s GPT-4 and...