Multilingual and Multimodal Vector Search with Hardware Acceleration

Dmitry Kan · Published in Muves · May 2, 2022

Authors: Dmitry Kan, Aarne Talman

We recently partnered with GSI Technology Inc. to develop a vector search demo that utilizes their APU backend for hardware-accelerated similarity search. In this blog post we describe the journey and how we implemented the demo using our Muves vector search software stack. You can access the demo on our main website: https://muves.io/.

Video walk-through of the multilingual and multimodal vector search demo with Muves and GSI APU

Why vector search?

Traditional keyword-based search relies on matching search terms against text in an inverted index. This makes it difficult to find items that have a similar meaning but contain different keywords. Keyword-based search also often returns results that do not respect word order and therefore miss the intended meaning. Consider, for example, the query Visa from Canada to Finland. A typical keyword-based search engine would return results like Visa from Finland to Canada as relevant, but clearly this is not what the user wanted.

Another major issue with keyword search is that it is not directly suitable for multimodal or multilingual search, for example finding images that have no indexed textual description, or finding results when the query is in a different language. Searching across multiple modalities can bring a lot of benefit to search engines with sparse text coverage (say, product descriptions) but an abundance of images. Augmenting text descriptions with image semantics (for instance, whether a t-shirt has flowers painted on it) can significantly increase the recall of relevant items and lead to more user engagement with your product.

In contrast to pure keyword search, vector search uses modern neural network models to represent objects (like text and images) and queries as high-dimensional vectors, and ranks results by vector similarity. This makes it possible to find items with a similar meaning, or items of a different modality.

Let’s look at an example.

Keyword search:

Query: A bear eating fish by a river

Result: heron eating fish

Picture of a heron eating fish

Vector search:

Query: nehir kenarında balık yiyen ayı (a bear eating fish by a river — Turkish)

Query vector: [0.072893, -0.277076, 0.201384, …]

Result vector: [0.004142, -0.022811, 0.019714 …]

Result:

Picture of a bear eating fish by a river

The example illustrates the problem with keyword search on the one hand and the strength of vector search on the other. In the keyword search example the search engine has found multiple matching terms between the query and the result; however, it misses the main search term “bear”. In the vector search example the search engine returns an image of a bear eating a fish by a river, even though the index contained only the embedding vectors of images, without any textual description, and the query was in Turkish.
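Under the hood, “vector similarity” in this post means cosine similarity between the query embedding and each document embedding. The snippet below is a minimal, self-contained sketch of that ranking step using NumPy; the vectors and the 512-dimensional size are purely illustrative, not taken from the demo code.

```python
import numpy as np

# Toy document embeddings and a toy query embedding (illustrative values only).
doc_vectors = np.random.rand(5, 512).astype(np.float32)   # 5 indexed items, 512 is just an example size
query_vector = np.random.rand(512).astype(np.float32)

# Cosine similarity is the dot product of L2-normalized vectors.
doc_norm = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_norm = query_vector / np.linalg.norm(query_vector)
scores = doc_norm @ query_norm

# Rank documents by similarity, highest first, and keep the top 3.
top_k = np.argsort(-scores)[:3]
print(top_k, scores[top_k])
```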

Goals

Many users in the search world are familiar with Elasticsearch, and more recently OpenSearch, which is based on Elasticsearch, has been enjoying active development and production adoption of its own.

Our early experiment comparing various approaches to KNN/ANN search in Elasticsearch demonstrated that GSI’s plugin performed best in terms of latency. In addition, the heavy-duty memory allocation and the vector search itself are handled on the GSI APU side, which moves this scalability concern outside Elasticsearch / OpenSearch. These two benefits convinced us to implement our demo with GSI’s plugin. The ability to stay within the search engine we had already adopted in our projects was also a game changer.

In this demo we wanted to show how to build a production-quality multilingual and multimodal vector search application on a hardware-accelerated compute backend, using the Muves software stack and GSI’s APU technology. We also wanted to make sure our search scales to millions of vectors and supports retrieval-time filtering, as well as pure keyword search as a comparison point between dense and sparse retrieval.

Architecture

The architecture of the demo consists of five major components, which we briefly list and explain below. Diagram (1) illustrates the demo architecture.

System architecture of Neural Search with the APU, powered by the Muves search and indexing app
  1. Muves is a state-of-the-art search engine software stack based on the latest natural language processing research, allowing rapid implementation and deployment of multilingual and multimodal vector search applications. We started developing Muves a few months ago when we realized that a lot of the common vector search application functionality we were repeatedly re-inventing could be packaged into reusable software components. Muves contains a search web application, common search application logic, as well as query and indexing interfaces for common search products like Elasticsearch, OpenSearch and Solr.
  2. OpenSearch is a recent open source fork of the widely used Elasticsearch search suite.
  3. GSI OpenSearch Plugin provides the ability to connect an OpenSearch index to the GSI APU backend, allowing vector indexing and hardware-accelerated vector similarity search.
  4. GSI APU (Associative Processing Unit) is a patented in-place parallel computing unit designed to remove the I/O bottleneck between the processor and memory. In the APU, data is processed directly in place in the memory array without having to cross the I/O. This allows blazing-fast parallel computation of vector similarity, a feature essential for production-quality vector search applications.
  5. GSI OpenSearch Webapp provides a simple to use web user interface for index admin tasks, like uploading indexed vectors to the APU backend along with fields for filtering.

Implementation details

At the core of Muves is the search API, which provides a FastAPI-based implementation of common search functionality, including query pre-processing, filtering and preparation of the results. The API supports both single queries and batch queries. The user interface was developed using the Bootstrap library, HTML, CSS and jQuery.
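As a rough illustration, a FastAPI search endpoint along these lines could sit at the core of such an API; the route names, request fields and search modes below are hypothetical, not the actual Muves API.

```python
from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query: str
    mode: str = "text_embedding"      # hypothetical modes: "keyword", "text_embedding", "image_embedding"
    prefilter: Optional[dict] = None  # optional retrieval-time filter
    top_k: int = 10

@app.post("/search")
def search(request: SearchRequest):
    # 1) pre-process the query, 2) embed it for the neural modes,
    # 3) query the OpenSearch / APU backend, 4) prepare and return the results.
    return {"query": request.query, "hits": []}   # placeholder response

@app.post("/batch-search")
def batch_search(batch: List[SearchRequest]):
    # Same flow, but all queries are embedded and sent to the backend in one batch.
    return [{"query": r.query, "hits": []} for r in batch]
```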

For our demo’s dataset we use a subset of the LAION-400M dataset, the world’s largest openly available image-text-pair dataset with 400 million samples. For this demo we used 10M image-text pairs and the related metadata available in the dataset.
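For orientation, the LAION-400M metadata is distributed as parquet shards, and selecting a subset might look roughly like the sketch below. The file name and the column names (SAMPLE_ID, URL, TEXT) are assumptions based on the public release, not taken from the demo code.

```python
import pandas as pd

# Hypothetical local path to one downloaded metadata shard.
shard = pd.read_parquet("laion400m_metadata_shard_0000.parquet")

# Keep only the fields we need and drop rows without a caption.
# Column names are assumed from the public LAION-400M release and may differ in your copy.
subset = shard[["SAMPLE_ID", "URL", "TEXT"]].dropna(subset=["TEXT"])
print(len(subset), "image-text pairs in this shard")
```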

For image and text embeddings we use a multilingual CLIP model available in the Hugging Face Models repository.
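The exact checkpoint is not spelled out here, but one commonly used pairing is the sentence-transformers multilingual CLIP text encoder with the matching CLIP image encoder, both mapping into the same 512-dimensional space. Treat the model names below as an assumption:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer

# Assumed checkpoints; the demo only requires "a multilingual CLIP model from Hugging Face".
text_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")
image_model = SentenceTransformer("clip-ViT-B-32")

# Text queries in any supported language land in the same vector space as images.
text_vector = text_model.encode("nehir kenarında balık yiyen ayı")   # 512-dim query vector
image_vector = image_model.encode(Image.open("bear.jpg"))            # 512-dim image vector ("bear.jpg" is a placeholder)
```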

For indexing and querying, Muves contains interfaces for OpenSearch, Elasticsearch (developed on top of the official Elasticsearch and OpenSearch libraries) and Solr.

To enable the use of the APU backend, indexing needs to be done using the following settings and mappings.
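A minimal sketch of such a mapping with the opensearch-py client is shown here; the index name, the extra fields and the knn setting are illustrative assumptions, and the exact options may differ per plugin version. The 512 dimension assumes a ViT-B/32-style CLIP model.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Sketch of index settings and mappings. "knn_vector" marks the field that holds the embeddings.
index_body = {
    "settings": {"index": {"knn": True}},   # assumption; the GSI plugin may not require this flag
    "mappings": {
        "properties": {
            "caption":   {"type": "text"},
            "image_url": {"type": "keyword"},
            "vector":    {"type": "knn_vector", "dimension": 512},
        }
    },
}
client.indices.create(index="laion-text-vectors", body=index_body)
```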

Here the “knn_vector” type tells the GSI plugin that the field “vector” is the vector field to be uploaded to the APU.

In order to query the APU backend, queries need to have the required structure. See an example below:
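The dictionary below is only a hypothetical sketch of such a query body: the query-type key (“gsi_knn”) and the parameter names are assumptions rather than the plugin’s documented schema, and it reuses the text_model and client objects from the sketches above.

```python
query_vector = text_model.encode("A bear eating fish by a river")  # text_model from the CLIP sketch above

# Hypothetical query body; key and parameter names are assumptions.
query_body = {
    "size": 10,
    "query": {
        "gsi_knn": {
            "field": "vector",
            "vector": query_vector.tolist(),
        }
    },
    # Optional filtering, described below:
    # "prefilter": {"NSFW": "UNLIKELY"},   # hypothetical field/value
    # "filterOperator": "AND",
}
response = client.search(index="laion-text-vectors", body=query_body)
```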

The “prefilter” and “filterOperator” fields are optional and tell the APU backend to return only those results with the specified value in the specified field.

As the current version of the GSI OpenSearch Plugin does not use the standard OpenSearch/Elasticsearch endpoint for batch search, i.e. http://<SERVER_NAME>:9200/<INDEX_NAME>/_search, but has a custom endpoint of the form http://<SERVER_NAME>:9200/gsi/search/multiquery/<INDEX_NAME>/<VECTOR_FIELD>, we implemented a custom function for batch queries and the related response processing. So instead of calling the OpenSearch search function, we used the requests library:
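The sketch below follows the endpoint format quoted above; the payload keys (topK, queriesList) and the response handling are assumptions, not the plugin’s documented contract.

```python
import requests

def apu_batch_search(query_vectors, server="localhost",
                     index="laion-text-vectors", vector_field="vector", top_k=10):
    """Send a batch of query vectors to the GSI plugin's custom multiquery endpoint."""
    url = f"http://{server}:9200/gsi/search/multiquery/{index}/{vector_field}"
    payload = {
        "topK": top_k,                                                  # assumed parameter name
        "queriesList": [list(map(float, v)) for v in query_vectors],   # assumed parameter name
    }
    response = requests.post(url, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()   # expected: document ids with similarity scores per query
```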

Batch queries to the APU backend require a specific format. Here’s a truncated sample:
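A hypothetical, heavily truncated payload in the spirit of the helper above could look like this (field names are again assumptions):

```python
# Hypothetical, truncated batch payload: in practice each query carries a full-length vector.
batch_payload = {
    "topK": 10,
    "queriesList": [
        [0.072893, -0.277076, 0.201384],   # query 1 vector, truncated from the full dimension
        [0.004142, -0.022811, 0.019714],   # query 2 vector, truncated from the full dimension
    ],
}
```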

The APU backend returns a list of document ids together with their cosine similarity to the query vector (internally the APU uses Hamming distance to rank the vectors before returning them to OpenSearch). For this reason, to retrieve the actual documents and related metadata, we send the list of document ids to OpenSearch. Although this is less than optimal from a simplicity and latency point of view, in practice querying OpenSearch with a list of document ids is very fast. We are also told that GSI is looking into optimizing the user experience of such multiqueries.
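Fetching the documents for the returned ids is a standard OpenSearch call; a minimal sketch with opensearch-py’s mget follows, where the ids are placeholders and client is the object from the indexing sketch above.

```python
# Ids returned by the APU for one query (placeholder values).
doc_ids = ["1042", "88213", "500771"]

# Fetch the documents and their metadata from OpenSearch in a single round trip.
docs = client.mget(index="laion-text-vectors", body={"ids": doc_ids})
hits = [d["_source"] for d in docs["docs"] if d.get("found")]
```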

This diagram summarizes the query workflow at a high level, covering every use case except pure keyword search:

Muves — APU query workflow for neural search scenarios

In this diagram you can see the step that retrieves the top-K document indices and distances. These distances are computed using cosine similarity between the query vector and a document vector.

Demo walkthrough

To get a taste of neural search, it is best to compare it to the familiar sparse (keyword) search, so both options were implemented. We went beyond this and enabled neural retrieval across both text and images, which helps to compare the role of text vs image data in relevance.

In the following list of screenshots, we are using the same query “A bear eating fish by a river” in three scenarios:

  1. Keyword search
  2. Text embedding search
  3. Image embedding search

As you can see, the sparse search captures “eating a fish”, but brings results with “heron” instead of the expected “bear”.

All of the scenarios below are also covered in the video walk-through embedded at the top of this post.

Keyword search

In the keyword search scenario the search app takes the query and passes it to the OpenSearch backend, which returns a list of results using the default BM25 ranking algorithm, an enhanced version of the term frequency-inverse document frequency (tf-idf) method. The basic idea behind tf-idf is that it looks at the frequency of a term in the document (the more, the better) and at its inverse document frequency (common words are less important). In our case the documents are the indexed image captions and the search terms are the terms in the query. To simplify, with keyword search the demo returns those images whose captions have the most term matches with the query.
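For comparison with the neural scenarios, this is an ordinary BM25 match query over the caption field, with no vectors involved; a minimal sketch reusing the client, index and field names assumed in the mapping sketch above:

```python
keyword_body = {
    "size": 10,
    "query": {"match": {"caption": "A bear eating fish by a river"}},
}
keyword_hits = client.search(index="laion-text-vectors", body=keyword_body)
for hit in keyword_hits["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["caption"])
```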

Sparse (keyword) search over image captions with OpenSearch

Text embedding search

In the text embedding search scenario the search query is passed to a CLIP embedding model. The CLIP model returns a multidimensional vector, which is passed to the OpenSearch backend. The GSI OpenSearch plugin directs the query to the APU, which performs similarity search against the indexed text vectors computed from the image captions. For the final similarity ordering, the APU and OpenSearch backend use cosine similarity. The GSI OpenSearch plugin then enriches the results with metadata from OpenSearch and returns them to Muves.

Neural search over captions with OpenSearch and APU

Image embedding search

Image embedding search works similarly to the text embedding scenario. However, in this case the query is directed to the index that contains the image vectors instead of the text vectors.

Multimodal search with CLIP model in OpenSearch and APU

Multilingual search

The demo supports queries in more than 50 languages by using a multilingual CLIP model, which embeds queries in the supported languages into the same vector space, allowing similarity search across languages. This works in the other direction as well: if an image caption contains text in any of the 50+ languages, it is searchable using vector similarity search against the text embedding index.

Multilingual neural search with OpenSearch and APU

Batch search

In the batch search scenario we wanted to enable uploading multiple queries to the search app, to demonstrate the ability to handle multiple queries in parallel. For this we implemented the ability to upload a text file with search queries, one query per line; our test file contained 20 queries.
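A minimal sketch of how such an uploaded file could be processed (the file name is hypothetical; text_model and apu_batch_search are the ones sketched in the implementation section):

```python
# Read one query per line from the uploaded file (hypothetical file name).
with open("batch_queries.txt", encoding="utf-8") as f:
    queries = [line.strip() for line in f if line.strip()]

# Embed all queries in one call and send them to the APU multiquery endpoint as a single batch.
query_vectors = text_model.encode(queries)      # text_model from the CLIP sketch above
results = apu_batch_search(query_vectors)       # helper sketched in the implementation section
```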

As you can see from the screenshot below, the Muves search app and the APU backend are able to handle parallel similarity search for 20 queries with a relatively small increase in query time: 284 ms. While the overall latency increases a little in the multiquery case, the per-query latency amortized over the batch of queries is lower.

Multi-query multimodal and multilingual search with OpenSearch and APU

If we compare this with batch keyword search using OpenSearch as the backend, we see that the query times are very similar.

Batch keyword search with OpenSearch

So by using the APU backend we can match the keyword-search latency of OpenSearch in our vector search application, which is quite cool.

Results

During this implementation we learnt several key lessons:

  • Result 1: Building multilingual and multimodal search with today’s existing models is straightforward, as long as you operate in-domain.
  • Result 2: With Muves we implemented our neural search demo with the APU much faster than if we had figured out all the details from scratch.
  • Result 3: The argument that vector search is not very high-performing and does not scale easily is not true. With the APU we were able to serve even a batch query (with 20 concurrent queries in it) with latency similar to that of traditional OpenSearch keyword retrieval. With this demo we show that implementing neural search at scale in production is feasible and does not require moving to a completely new vector search DB if you are already using Elasticsearch / OpenSearch.

We also believe that:

  1. Vector search can significantly improve the recall for your database.
  2. Choosing a dataset with high-resolution images was crucial for successful application of the CLIP model. In particular, Amazon’s product dataset did not satisfy this criterion and did not work with CLIP at all. This tells us that fine-tuning the model on your dataset might be necessary.
  3. The APU gives you the ability to scale vector search to tens of millions and even billions of documents effortlessly, which is a big boost when launching at production scale.
  4. Batched queries are a way to support interesting scenarios in your product, like loading results all at once for cross-examination, or email alerts for predefined user queries.
  5. In a production setting you will most likely need filter support (for colour, size or item popularity, say), so libraries like FAISS alone will not be sufficient. OpenSearch stood out with general filter support, but it does not implement pre-filtering, i.e. first narrowing down the space of documents and then running neural search on them. The APU addresses this issue and allows neural search with symbolic filtering at scale by implementing its own pre-filter algorithm in its efficient hardware backend.
  6. With Muves you can build neural search architectures by re-using the key components for indexing, querying and displaying results in Elasticsearch, OpenSearch and Solr. If you have a project where you’d like to implement neural search, we’d be happy to talk about it and about where Muves can help.

We are at the beginning of our journey with Muves. With this demo showcasing the power of multimodal, multilingual vector search on top of the GSI APU hardware backend, our vision is to bring such scenarios to life more quickly, building on top of your existing Elasticsearch / OpenSearch setup.

For more information about GSI’s APU solution for OpenSearch, you can contact them at opensearch@gsitechnology.com. Read more: https://www.searchium.ai/

To get more information / request a demo of Muves, please reach out to info@muves.io.

References

  1. Muves
  2. OpenSearch
  3. GSI Technology Inc.
  4. GSI APU
  5. LAION-400M
  6. Dmitry Kan (2021), Speeding up BERT Search in Elasticsearch in Towards Data Science
  7. CLIP: https://openai.com/blog/clip/

