What is Vector Search?
Vector search is an AI search technique used to find information in a way that is more intuitive and effective than traditional search methods. Here is a breakdown of what it is, how it works, its benefits, and its limitations.
Vector search is a technique used to find and retrieve data based on similarity rather than relying solely on keywords or phrases. It uses mathematical representations (vectors) of text to determine which documents are relevant given a certain search query.
Patent texts are first converted into vectors, after which the search runs over those vectors to find relevant documents. A vector is a list of numbers that represents the patent's meaning in a high-dimensional space called vector space. Transforming data into vectors is often done with embedding models. These models capture the meaning of words and the relationships between them.
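To make "a vector is a list of numbers" concrete, here is a minimal sketch that turns a piece of text into a fixed-length vector. The function `toy_embed` is a hypothetical stand-in written for illustration only: it hashes words into buckets and does not capture meaning the way a trained embedding model does. The dimension count of 8 is likewise an assumption chosen for readability.

```python
import hashlib
import math

DIMENSIONS = 8  # assumption for readability; real models use far more dimensions


def toy_embed(text: str, dimensions: int = DIMENSIONS) -> list[float]:
    """Map text to a fixed-length unit vector.

    Toy stand-in for an embedding model: each word is hashed into one of
    `dimensions` buckets. It shows the shape of the output, not real semantics.
    """
    vector = [0.0] * dimensions
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dimensions] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Normalize to length 1 so that texts of different lengths are comparable.
    return [v / norm for v in vector] if norm else vector


patent_vector = toy_embed("A battery pack with liquid cooling channels")
print(len(patent_vector))
```

Whatever the length of the input text, the output always has the same number of dimensions, which is what makes it possible to compare any two texts in the same vector space.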
Imagine a map where every patent family has a coordinate (vector). The coordinates of each patent family are determined by its meaning (vector values). Similar families are placed closer together, while dissimilar families are farther apart.
When you perform a search, your query is also transformed into a vector. Please note that the maximum input is 384 words; anything beyond the first 384 words is not vectorized and therefore does not affect the search outcome. The search algorithm then looks for the vectors (patent families) in the space that are closest to your query vector. The distance between vectors is measured, and the closest matching documents are retrieved and ranked from most to least relevant.
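The search step described above can be sketched as follows. The text does not say which distance measure is used, so cosine similarity is an assumption here, as is counting words by splitting on whitespace. The names `truncate_query`, `cosine_similarity`, and `rank_by_similarity`, and the three-dimensional example vectors, are hypothetical and chosen only for illustration.

```python
import math

MAX_QUERY_WORDS = 384  # input limit stated in the text


def truncate_query(query: str, max_words: int = MAX_QUERY_WORDS) -> str:
    """Keep only the first `max_words` words; the rest would be ignored."""
    return " ".join(query.split()[:max_words])


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, family_vectors):
    """Score every patent family against the query and sort, most similar first."""
    scored = [
        (family_id, cosine_similarity(query_vector, vector))
        for family_id, vector in family_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors for three patent families (illustrative values only).
family_vectors = {
    "family_A": [0.9, 0.1, 0.0],
    "family_B": [0.1, 0.9, 0.2],
    "family_C": [0.7, 0.3, 0.1],
}
query_vector = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vector, family_vectors)
for family_id, score in ranking:
    print(family_id, round(score, 3))
```

In this toy example, `family_A` points in almost the same direction as the query and ranks first, `family_C` follows, and `family_B` comes last. A production system would typically use an approximate nearest-neighbor index rather than scoring every vector, but the ranking idea is the same.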
Vector search finds results based on meaning and context rather than exact keyword matches. For example, a search for "dog" will also return results that mention "labrador" or "puppy", even if the word "dog" itself never appears in them. You can also use a phrase, an entire abstract, or any other piece of text to match against our patent database. Simply put, vector search retrieves the documents that are most closely related to the text you search with.
Handling Synonyms and Variations
Vector search understands that different words can mean the same thing (e.g., "car" and "automobile"). It retrieves relevant results even if the words used in the patent family do not match the words you searched for.
Better with Complex Data
Vector search works well with complex data such as patent families. For instance, it overcomes the fundamental limitations of Boolean and classification code searches because it does not rely on exact classification codes or keywords.
Enhanced Coverage
Vector search provides broader coverage in its search results, reducing the chance that you miss relevant patent families.
Although vector search typically returns more, and more relevant, results for most search queries, it is not flawless. Vector search doesn't know when to stop. In some sense, all patent families are related to each other in one way or another. Because vector search ranks by similarity, less relevant patent families start to appear further down your search results. For example, if you search for "Labrador", at some point "Golden retriever" will also appear in your list. Therefore, to get a satisfactory dataset for a given search query, you may want to fine-tune the results with some of the available filtering options.
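One common way to deal with a ranking that never stops is to cut it off by score, by result count, or both. The sketch below is an illustration of that idea only: the text does not list the filtering options that are actually available, so the function `apply_cutoff`, its parameters `min_score` and `max_results`, and the scores in the example are all hypothetical.

```python
def apply_cutoff(ranked, min_score=0.5, max_results=100):
    """Trim a ranked list of (family_id, score) pairs.

    `ranked` is assumed to be sorted from most to least similar.
    Results below `min_score` are dropped, then at most `max_results` are kept.
    """
    kept = [(family_id, score) for family_id, score in ranked if score >= min_score]
    return kept[:max_results]


# Made-up similarity scores for a "Labrador" query (illustrative values only).
ranked_results = [
    ("labrador_leash", 0.93),
    ("labrador_harness", 0.88),
    ("golden_retriever_toy", 0.61),
    ("cat_litter_box", 0.22),
]

shortlist = apply_cutoff(ranked_results, min_score=0.7, max_results=3)
print(shortlist)
```

With a threshold of 0.7, only the two Labrador families remain. Choosing the threshold is a trade-off: a higher value gives a cleaner list but risks dropping relevant families, while a lower value keeps more of the long tail.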