ToOne: transform documents into answers
How can businesses rapidly extract, understand, and use insights buried within oceans of unstructured data? Think about contracts, specifications, instruction manuals, policies… To address this challenge, we've created 'ToOne', an AI-powered business analyst application designed to retrieve precise information from extensive documentation in seconds. Simply by typing a query into a chat interface, users get an accurate answer instantly. With ToOne, insights are just one query away.
Flexible document management with secure access
Upload and organize standard business formats (.txt, .pdf, .doc, .docx, .md, .html) into dedicated project spaces with granular access controls. Thanks to ToOne's modular design, the system can easily be extended to support additional file types and integrate with various data sources such as Jira tickets, wikis, and email systems. Team members see only what they need, while you maintain the flexibility to expand document coverage as your needs grow.
Trust every answer with source-backed responses
Every response comes with direct links to the source documents, letting you verify information instantly and dive deeper when needed. This transparent approach ensures you can always trace answers back to their original context.
Under the hood: ToOne's core capabilities
How does ToOne search for the right answer?
ToOne uses a Retrieval-Augmented Generation (RAG) architecture to process queries against the document base. Document chunks are embedded and semantically indexed during ingestion, enabling contextually relevant passage retrieval at query time. The retrieved passages are then used to augment Large Language Model prompts, ensuring responses are both accurate and anchored in the source documentation.
Documents are organized into projects, and each document carries metadata that helps resolve potential consistency issues. To ensure optimal processing, documents are split into smaller chunks before embedding.
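As an illustration only (the character-based splitting, chunk size, and overlap below are assumptions, not ToOne's actual settings), the splitting step could look roughly like this:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping chunks of roughly chunk_size characters."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context that straddles chunk boundaries
    return chunks
```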
The app converts individual document chunks into embeddings and stores them in a vector database. Embeddings can be generated either through the OpenAI API or with local open-source models. Because embeddings are numerical vectors, the system can compare texts by meaning, measure similarity, and retrieve the passages most relevant to a question.
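A hedged sketch of this ingestion step using the OpenAI embeddings API (the model name is an illustrative choice, and the in-memory list stands in for whatever vector database a real deployment would use):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Convert text chunks into embedding vectors via the OpenAI API."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative model choice
        input=chunks,
    )
    return [item.embedding for item in response.data]

# A real deployment would write into a vector database; a plain list is
# enough here to show the shape of the stored records.
chunks = ["First document chunk...", "Second document chunk..."]
vector_store = [
    {"text": chunk, "embedding": vec}
    for chunk, vec in zip(chunks, embed_chunks(chunks))
]
```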
Users interact with ToOne by asking questions. The application embeds each question and searches the vector database for the most relevant passages. The top-ranked passages are then sent to OpenAI, where the LLM interprets them and composes the answer.
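Putting the query side together, a minimal retrieval-and-answer sketch might look like the following (cosine similarity over the in-memory store from the previous example; the chat model name is again an assumption):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def answer(question: str, vector_store: list[dict], top_k: int = 5) -> str:
    """Embed the question, rank stored chunks by cosine similarity, and ask the LLM."""
    q = np.array(
        client.embeddings.create(
            model="text-embedding-3-small", input=[question]
        ).data[0].embedding
    )

    def score(item: dict) -> float:
        v = np.array(item["embedding"])
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    top = sorted(vector_store, key=score, reverse=True)[:top_k]
    context = "\n\n".join(item["text"] for item in top)

    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```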
How is the Azure DevOps integration set up?
Authentication
The application implements the OAuth2 authentication protocol to connect with Azure DevOps organizations. Persistent access to the Azure DevOps REST API is maintained through authorization tokens, minimizing authentication overhead.
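As a sketch of what a call against the Azure DevOps REST API looks like once an OAuth2 access token has been obtained (the token-acquisition flow itself is omitted, and the organization name is a placeholder):

```python
import requests

def list_projects(organization: str, access_token: str) -> list[dict]:
    """List the projects in an Azure DevOps organization using a bearer token."""
    url = f"https://dev.azure.com/{organization}/_apis/projects?api-version=7.0"
    response = requests.get(url, headers={"Authorization": f"Bearer {access_token}"})
    response.raise_for_status()
    return response.json()["value"]
```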
Permissions management
The system inherits the Azure DevOps permission structure with two primary access tiers: administrative (Project Manager, Business Analyst) and general user. This model preserves existing organizational access controls while supporting ingestion of various Azure DevOps artifacts.
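A minimal sketch of how the two tiers could be resolved from a user's Azure DevOps roles (the helper below is hypothetical and only illustrates the mapping):

```python
# Roles that map to ToOne's administrative tier; everything else is a general user.
ADMIN_ROLES = {"Project Manager", "Business Analyst"}

def access_tier(user_roles: set[str]) -> str:
    """Resolve a ToOne access tier from a user's Azure DevOps roles."""
    return "administrative" if user_roles & ADMIN_ROLES else "general"
```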
Main challenges
Avoiding GPT hallucinations.
We aimed for precise, document-grounded answers without compromising the conversational, human-like output: in other words, minimal hallucinations from the GPT models. To achieve this, we relied on several prompting techniques, such as assigning the model a role and demonstrating example answers. This approach led to an improvement in output accuracy.
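As a rough sketch of those prompting techniques (the wording of the system role and the demonstration pair are illustrative, not ToOne's production prompt):

```python
def build_messages(context: str, question: str) -> list[dict]:
    """Assemble a prompt that assigns a role and demonstrates the expected behaviour."""
    return [
        # Role assignment: constrain the model to the supplied documents.
        {"role": "system", "content": (
            "You are a business analyst. Answer strictly from the provided documents. "
            "If they do not contain the answer, say so instead of guessing."
        )},
        # Demonstration: one example showing how to refuse when context is missing.
        {"role": "user", "content": "Documents: <none>\nQuestion: What is the 2025 budget?"},
        {"role": "assistant", "content": "The provided documents do not contain that information."},
        # The actual query, grounded in the retrieved passages.
        {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
    ]
```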
OpenAI embeddings as a technical solution.
We opted for OpenAI embeddings over other embedding approaches because of their ability to process vast amounts of data, a capability that provides a strategic advantage in data management.
Document preparation for embedding.
Preparing business documents for effective embedding brought another hurdle our way. Our solution involved breaking large documents into smaller, digestible chunks and tidying up their formatting, for example by deleting empty lines.
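The formatting part of that preparation can be as simple as the following clean-up helper (an illustrative sketch, complementing the chunking example shown earlier):

```python
def clean_document(text: str) -> str:
    """Strip surrounding whitespace and drop empty lines before chunking."""
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```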
Data synchronization with AzDo.
Handling extensive datasets demands intelligent synchronization strategies because of REST API limitations. We are looking into ways to sync only the data that has changed, a task we find both challenging and crucial.
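One way to fetch only the delta is a WIQL query filtered by change date, sketched below (this uses the standard Azure DevOps REST API WIQL endpoint; the date handling and token plumbing are simplified assumptions):

```python
import requests

def changed_work_item_ids(organization: str, project: str, token: str, since: str) -> list[int]:
    """Ask Azure DevOps (via WIQL) for work items changed since a given date,
    so only the delta needs to be re-ingested and re-embedded."""
    url = f"https://dev.azure.com/{organization}/{project}/_apis/wit/wiql?api-version=7.0"
    wiql = {"query": (
        "SELECT [System.Id] FROM WorkItems "
        f"WHERE [System.ChangedDate] > '{since}'"
    )}
    response = requests.post(url, json=wiql, headers={"Authorization": f"Bearer {token}"})
    response.raise_for_status()
    return [item["id"] for item in response.json()["workItems"]]
```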
Development roadmap
Broader third-party integration.
As the application already successfully integrates with AzDo, connecting it with other popular platforms such as Google Workspace, Jira, and GitHub can extend its versatility and utility. This will enable users to retrieve data from various sources, simplifying the data analysis process.
Company database connection.
We plan to connect ToOne to companies' internal databases, extending the reach of its data search.
Use cases
Government organizations
Fintech companies
Healthtech companies
Software development firms
Law firms
Researchers
Consulting firms
Help desks
Customer support
Product recommendation companies
Book clubs