Performing Semantic Search on Unstructured TRIRIGA Data

Wednesday, March 6, 2024

NOTE: You can find the entire code repository covered in this post over here


I'm a big advocate of using local models for data privacy. When I encountered this project, my mind was blown! Within my browser, I can find pieces of text by similarity. I had to explore a way to implement a similar flow with TRIRIGA.

Although TRIRIGA contains a lot of good structured data, there are scenarios where users are required to search through raw text input. Some examples include lease clause texts, survey results, work task resolution descriptions, service request descriptions, reservation notes and general comments on move or capital projects. This tool can be purpose-built to give power users another way of querying and filtering for data without typing in the exact terms.

This UX application combines Transformers.js, a HuggingFace sentence transformer, the Voy in-memory vector database and TRIRIGA data to process user queries. The flow is as follows:


flowchart TD
    A[User Query] --> B(Generate Embedding)
    C[Report Data] -->D(Generate Embedding Per Row)
    D -->|Store| E[Vector Database]
    B -->|Input| F(Calculate Similarity)
    E -->|Input| F
    F --> G(Return Top X Results)


Everything runs in the browser. The model is downloaded when a report is loaded, and the vector database runs using WebAssembly (WASM). I mainly built this to learn about WASM and to dig deeper into vector databases. It's clear that we have many new technologies at our disposal within a user's browser. As systems become more powerful, it will be possible to run small LLMs within a browser!


Quick Demo

You can test drive this in your browser right here! Please see below for a quick video demo:

Running on recycled hardware from my closet