Searching with Odin
How to filter search results?
LLM filters

Landscaping patents is challenging. Finding the right combination of keywords to map out an entire technological domain is complex and time-consuming. You might miss important patents because you forgot to include a keyword, or add noise because your keywords also match unintended domains. To help with this problem, we have introduced a new feature called LLM filters.

When to use LLM filters

LLM filters are particularly useful in the following scenarios:

Beyond traditional filtering: Use LLM filters when traditional keyword or vector search filters are not sufficient to construct an accurate dataset.
Complex queries: They're ideal for finding patent families with complex queries. For example, "Find patents related to pyrolysis, a thermochemical decomposition process that operates under an inert atmosphere, typically reaching temperatures between 400°C and 650°C."
Natural language filtering: Use them when you want to filter a dataset using natural language rather than structured search terms.

How do LLM filters work internally?
LLM filters use your instructions to decide which patents should pass the filter. Here's a step-by-step breakdown:

1. User instruction: The process begins when you provide specific instructions detailing the kinds of patents you want to filter.
2. LLM processing: The large language model (LLM) reviews each patent within the dataset. It scans each patent's title, abstract, claims, and description and classifies the patent as either "relevant" or "irrelevant" based on your instructions. The duration of this step depends on the size of your dataset and can range from 5 to 60 minutes.
3. Classification and storage: After the LLM has finished its evaluation, the publication numbers of the patents deemed "relevant" or "irrelevant" are collected and stored in our database.
4. Filter application: Once the LLM filter process is complete, you can apply it to the dataset. Because the classifications are already stored, filtering is quick and gives you direct access to the patents that matter most.

What are the limitations of LLM filters?
While LLM filters can be incredibly useful, it's important to be aware of their limitations:

Dataset size restriction: Due to the cost and time associated with running LLM filters, only the first 5,000 results in a dataset are considered.
Processing speed: The speed of LLM filters is inherently tied to the speed of the LLM itself. Although we employ parallelization techniques to boost efficiency, the average processing speed is approximately 3 patents per second.
Family level: We process only the representative member of a patent family. This is the member that best matches your filters and is displayed by default on the family card. Consequently, details from other family members cannot be matched using an LLM filter.
Static datasets: Datasets that have an "applied" LLM filter are "static". This means that newly created patents are not automatically added to the dataset. To update your dataset with new incoming patents, you need to "re-run" your LLM filter. This process skips already processed patents and updates the list with any newly added ones.

What are best practices when using LLM filtering?
To make the most of LLM filtering, follow these best practices:

Optimize dataset size: Use traditional filtering techniques, like keywords or vector search, to narrow down your dataset before applying LLM filtering. This helps exclude irrelevant patents and speeds up the filtering process.
Be clear in your instructions: Clearly state what you're looking for. Avoiding vague instructions prevents misunderstandings by the LLM and helps it filter exactly what you need.

Vector search in filters

Just as you can use vector search to create initial search results, you can also use it to filter them. The vector search filter under advanced filters allows you to filter search results using phrases, sentences, or even entire paragraphs. Unlike keyword search, it does not search for exact words or combinations thereof, but filters based on meaning.

For example, if your initial search query was "car seat", you might enter "artificial leather, also known as synthetic leather or faux leather, is a material designed to etc." into the vector search filter to filter for patent families that describe artificial leather car seats, without the normal limitations associated with keyword search.

The vector search filter works better with concepts than with keywords, so consider entering sentences or paragraphs rather than single keywords to get the most out of this filter.

Please note that the maximum input for the vector search filter is 384 words. Any input after the first 384 words will not be vectorized and will thus not affect the filtering outcome.

Positive

Use the positive field to indicate what you do want in your search results. Multiple entries in the positive field are interpreted as having an AND relationship. In other words, multiple entries will include patents that are similar to input A and input B.

Negative

Use the negative field to indicate what you don't want in your search results. Multiple entries in the negative field are interpreted as having an OR relationship. In other words, multiple entries will exclude patents that are similar to input A or input B.

Keyword filtering

Keyword search allows you to filter your dataset for patents that contain certain keywords. The keyword filtering section has an include and an exclude option.

Include

Every keyword you include must be present in a patent family for it to still show up in your search results. If you add multiple keywords in include, the default relationship between the keywords is AND. In other words, if you add "bike" and "tire" as keywords, only patent families that contain both words will be displayed in your results.

Exclude

In exclude, the default relationship between keywords is OR. Any patent family that contains an excluded keyword will be ignored. In other words, if you exclude "saddle" and "gears", all patent families that contain either word will be excluded.

The default relationships differ because you typically want to zoom in on one particular concept in the dataset with include, while you want to rule out many different aspects with exclude.

Patent family segments to match against

Below include/exclude you can select which segments of the patent families your keyword filters are matched against. Keyword search runs on the patent sections invention title, abstract, claims, and description. Typically, the subject of a patent is mentioned in the title, abstract, or claims.

💡 To improve search results, consider unchecking "description", as this section often contains many words that are not directly associated with the topic of the invention itself.

Creating custom boolean keyword filters

Normally, when you enter multiple keywords (e.g., bike tyre) in an "include" field, they're combined with an AND operator by
default, meaning the system looks for patents or documents containing both words. To override that default, you can type bool before your keywords and manually include other boolean operators.

Example

bool bike or bicycle (in the "include" field)

This searches for items that contain either "bike" or "bicycle" (or both). This can be useful to cover various ways of referring to the same concept, because it bypasses the default requirement that both words must be present.

Important boolean operators for custom boolean filters

Wildcards * and ?

Asterisk (*): Replaces zero or more characters.
Example: combus* finds documents mentioning "combus", "combust", "combustion", etc.

Question mark (?): Replaces a single character.
Example: te?t can match "test", "tent", "text", etc.

Warning: Using too many wildcards (e.g., a* b* c*) can overload the system, as it has to search for every possible combination of letters matching those patterns.

Grouping keywords with parentheses ()

Parentheses are useful to ensure that multiple words appear exactly in sequence.
Example: (bike tire) returns items mentioning "bike tire" as a phrase. In other words, "tire" must directly follow "bike".

❗ When the bool modifier is applied, the sequence requirement is ignored.
Example: bool (bike tire) returns items mentioning both "bike" and "tire", without enforcing that "tire" follows "bike".

Exact phrase search ""

Surrounding your search terms with quotation marks looks for the exact phrase in the same order.
Example: "high catalytic activity of carbonic anhydrase" finds documents where all these words appear in exactly that order.

Proximity searches with tilde ~

Proximity queries let the words appear within a certain distance of each other, in any order.
Example: "machine learning"~5 matches documents where "machine" and "learning" occur within five words of each other (in any order).

Custom boolean filtering summary

Use bool to override the default AND and manually apply other operators like OR.
Use * and ? to allow flexible character matching, but be cautious with wildcards to avoid performance issues.
Use () to group words that must appear right next to each other.
Use "" for exact phrases in a fixed order.
Use ~ to allow two words to appear near each other, within a specified distance.

By combining these tools, you gain fine-grained control over your searches, enabling you to include or exclude results more precisely.

Publication numbers

The publication numbers filter under advanced filters helps you find specific patents in the search results using their publication number. The standard patent publication format in Odin is "2-digit country code, patent number, kind code", for example "US 20180226168 A1".

Organization

To filter the search results by organization, use the organization filter. There are two ways to use this filter: by selecting or deselecting an organization from the list, or by entering a specific name in the organization search field and then selecting or deselecting that organization. Searching for an organization's name allows you to select that particular organization's portfolio within your search set. Simply tick or untick the organizations you want to include or exclude and hit 'filter' to update your dataset.

Ultimate owner

With the ultimate owner filter, organizations are grouped by their ultimate owners. Use this filter to make sure you find all patent families owned by a particular organization, even if specific patent families are assigned to a subsidiary. There are two ways to use this filter: by selecting or deselecting an ultimate owner from the list, or by entering a specific organization name in the ultimate owner search field and then selecting or
deselecting that ultimate owner. Searching for an ultimate owner name allows you to select that particular organization's portfolio within your search set. Simply tick or untick the organizations you want to include or exclude and hit 'filter' to update your dataset.

Patent offices

If you are only interested in patent families with members in specific patent offices, use the patent office filter. As with organizations, there are two ways to use it: selecting or deselecting a patent office from the list, or searching for a specific patent office and then selecting or deselecting it. Hit 'filter' when you have made your selection to update your dataset.

Status

If you only want to see patent families that have members with a certain status, you can filter by status. Open the status filter and tick or untick the statuses you want to include in your dataset. There are 5 statuses: 'active', 'in force', 'pending', 'inactive', and 'unknown'.

In force: Families that have at least 1 member that is currently granted and still in force.
Pending: Families where all patent family members are currently still pending.
Active: Families that are either in force or pending.
Inactive: Families where all patent family members have expired for any reason.
Unknown: Families for which we do not have good information from the patent office about the current state of the patent family members, and thus of the family as a whole.

Hit 'filter' to update your dataset when you have made your selection.

Publication year

With the publication year filter, you can select a predetermined or custom publication year range for your dataset. The standard filters are last year, last 3 years, last 5 years, last 10 years, and last 20 years. You can also add a custom range by entering the year range (e.g. 2010-2017) you want to apply to your dataset. Hit 'filter' to update your dataset when you have entered your selection.

Expiration year

With the expiration year filter, you can select a predetermined or custom expiration year range for your dataset. The standard filters are next year, next 3 years, next 5 years, next 10 years, and next 20 years. You can also add a custom range by entering the year range (e.g. 2025-2030) you want to apply to your dataset. Hit 'filter' to update your dataset when you have entered your selection.

Similarity

The similarity (%) filter allows you to set a similarity threshold for your dataset, i.e. the minimum similarity score a patent family needs to still be included in your search results. You can adjust it by entering a specific number in the input field or by dragging the bar. The height of the bars on the chart indicates the number of patents at each similarity level. For example, a similarity threshold of 83.7 will only include patent families in your search results if they are at least 83.7% similar to your query.

Similarity thresholds are an important part of vector search. Since vector search works by grouping and including documents that are similar to your query, at some point you will start getting less desirable results. For example, if you search for 'dog food', your top results are likely to all contain dog food patent family members. Further down the similarity list, however, there may be 'cat food' patent families, as cat food is at least somewhat related to dog food.

To determine the appropriate similarity threshold for your search results, consider scrolling through your list of results to identify the similarity score at which the results start becoming irrelevant. If you identify such a point, that score is a good value for your similarity threshold.
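The threshold-picking procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Family` record, the `suggest_threshold` and `apply_similarity_threshold` helpers, and the sample scores are all hypothetical and do not represent Odin's data model or API. The sketch walks the results from most to least similar, stops at the first result you judge irrelevant, and uses the last relevant score as the threshold.

```python
from dataclasses import dataclass


@dataclass
class Family:
    # Hypothetical record; field names are illustrative, not Odin's data model.
    title: str
    similarity: float  # similarity to the query, in percent (0-100)


def apply_similarity_threshold(families, threshold):
    """Keep only families that are at least `threshold` percent similar."""
    return [f for f in families if f.similarity >= threshold]


def suggest_threshold(families, is_relevant):
    """Walk results from most to least similar and return the score of the
    last relevant family seen before the first irrelevant one.
    Returns None if even the top result is irrelevant."""
    last_relevant_score = None
    for family in sorted(families, key=lambda f: f.similarity, reverse=True):
        if not is_relevant(family):
            break
        last_relevant_score = family.similarity
    return last_relevant_score


# Made-up results for a 'dog food' query.
results = [
    Family("Dry dog food with added probiotics", 91.2),
    Family("Dog food kibble coating", 88.5),
    Family("Canine dental chew", 84.1),
    Family("Cat food pouch", 79.9),
    Family("Bird seed dispenser", 71.3),
]


def looks_relevant(family):
    # Stand-in for the manual judgement you make while scrolling the list.
    title = family.title.lower()
    return "cat" not in title and "bird" not in title


threshold = suggest_threshold(results, looks_relevant)
kept = apply_similarity_threshold(results, threshold)
```

In practice the relevance judgement is made by eye while scrolling the result list; the function here only stands in for that manual step.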