Creating an indexed file

If you need to index a file for semantic search just pass the index: true parameter when creating the file.

import { Client } from '@botpress/client'

const client = new Client({
  token: process.env.BOTPRESS_PAT,
  botId: process.env.BOTPRESS_BOT_ID,
})

const file = await client.uploadFile({
  key: 'test.txt',
  content: 'This is a test file',
  index: true,
})

Supported file formats

The following file formats are supported for indexing:

FormatFile ExtensionMIME Type
PDF.pdfapplication/pdf
HTML.htmltext/html
Text.txttext/plain
Markdown.mdtext/markdown

Notes on indexing

  • The file it will initially have a status of “indexing_pending” and will be indexed asynchronously. The time it takes to index the file will depend on the file size and the current load on the system.
  • You can check the status of the file by calling the Get File endpoint and checking that the file.status property has changed to “indexing_completed”.
  • If the indexing failed the status will be set to “indexing_failed” and the reason of the failure will be available in the failedStatusReason property of the file.

Searching files

To run a semantic search on your bot’s files you can use the Search Files API endpoint. This is particularly useful for RAG (Retrieval Augmented Generation) implementations.

const res = await client.searchFiles({
  query: 'what are the most popular products of your company?', // This is the natural language query (e.g. a user's question) to be used for running a semantic search on the files of the bot.
  contextDepth: 1, // Optional (default: 0), allows you prepend and append the surrounding context to each matching passage in the search results
  tags: {
    // Optional, allows you to only search files that match the specified tags. Useful if you only want to search a subset of the bot's files.
    category: 'Support',
    subcategory: 'Technical',
  },
  limit: 30, // Default is 20, maximum is 50 results
})

const passages = res.data.passages

for (const passage of passages) {
  // You can use passage.content here to retrieve the content (including surrounding context, if any) of each matching passage
}

You can check the API reference for more details on the properties available for each passage object returned in the response.

Using the search results for RAG

You can use the search results to provide relevant information to a chatbot for generating a response to a user’s question.

Here’s an example of a simple RAG (Retrieval Augmented Generation) implementation that shows how you can use the ChatGPT large language model (through the OpenAI API) and the search results provided by our API to answer a user’s question:

import OpenAI from 'openai' // You'll need to install this package first using your favorite package manager

const openai = new OpenAI({ apiKey: 'YOUR_OPENAI_API_KEY' })) // Change this for your own OpenAI API key

const userQuestion = 'Which are the most popular products of your company?' // This will come from the user's input

const res = await client.searchFiles({
  query: userQuestion,
  contextDepth: 1, // This number can be increased to pass more context to the model for each matching passage
})

const relevantInformation = res.data.passages.map(passage => passage.content).join("\n\n----------\n\n")

const answer = await openai.chat.completions.create({
    model: 'gpt-4o-mini', //  This can be changed for any other OpenAI model available here: https://platform.openai.com/docs/models
    messages: [
        {
            role: 'system',
            // You can change the text below to include any additional instructions to the model
            content: "You are a helpful assistant that answers the user's question based only on the relevant information you are provided along with the question.",
        },
        {
            role: 'user',
            content: `## RELEVANT INFORMATION: \n\n ${relevantInformation}\n\n` +
                     `## ANSWER THE FOLLOWING USER'S QUESTION: \n\n ${userQuestion}`,
        },
    ],
})