Scribbly

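RAG stands for Retrieval-Augmented Generation: generative AI grounded in retrieved documents.

Instead of generating an answer directly from the question,
RAG first retrieves relevant information from external documents (databases, PDFs, wikis, and so on)
and then generates the answer based on what it retrieved.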

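Question input
The user enters a question.
Retrieval step
Embedding vectors are used to find similar documents or passages (for example, with ChromaDB or Pinecone).
Generation step
The retrieved content is inserted into the prompt, and the LLM generates the answer.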
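The three steps above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the three-dimensional vectors, the sample texts, and the names `StoredChunk`, `retrieve`, and `buildPrompt` are made up for this example, and a real setup would get embeddings from a model and send the prompt to an LLM.

```typescript
// Toy in-memory RAG flow: retrieve the closest chunks, then build the prompt.
// All data and function names here are illustrative assumptions.
type StoredChunk = { text: string; embedding: number[] };

// Cosine similarity between two vectors of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieval step: rank stored chunks by similarity to the query embedding.
function retrieve(queryEmbedding: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return [...store]
    .sort(
      (x, y) =>
        cosineSimilarity(queryEmbedding, y.embedding) -
        cosineSimilarity(queryEmbedding, x.embedding)
    )
    .slice(0, k);
}

// Generation step (input side): insert the retrieved text into the prompt.
function buildPrompt(question: string, retrieved: StoredChunk[]): string {
  const context = retrieved.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const store: StoredChunk[] = [
  { text: "Chapter 1 covers exponents and logarithms.", embedding: [1, 0, 0] },
  { text: "Chapter 2 covers trigonometric functions.", embedding: [0, 1, 0] },
  { text: "Chapter 3 covers sequences.", embedding: [0, 0, 1] },
];

// A query vector that points mostly along the first axis.
const top = retrieve([0.9, 0.1, 0], store, 1);
const prompt = buildPrompt("What does chapter 1 cover?", top);
```

The prompt string is what would be passed to the LLM in the generation step.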

What is ChromaDB?

ChromaDB is a vector database for use in AI systems.
It stores unstructured data such as text and images together with their embedding vectors and supports search based on vector similarity.

For example, creating a collection named 수능특강 stores data in the following form.

Collection name: 수능특강
document: "이 책의 차례 1. 지수와 로그..."
embedding: [234.2342, 1452.234, ...]
metadata:

{
  "year": 2025,
  "subject": "수학 1",
  "chunk_index": 0,
  "source": "2026 수능특강 수학 1",
  "start_index": 0,
  "end_index": 983
}

Internally, each record consists of three fields (document, embedding, and metadata), and the metadata can be used to filter searches.
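To make the record shape and the idea of metadata filtering concrete, here is a minimal in-memory sketch. It is not the chromadb client API: `ChunkRecord` and `filterByMetadata` are hypothetical names, the sample records are invented, and only exact-match filtering is mimicked.

```typescript
// In-memory stand-in for a collection; not the chromadb client API.
type ChunkMetadata = Record<string, string | number | boolean>;

interface ChunkRecord {
  document: string;
  embedding: number[];
  metadata: ChunkMetadata;
}

// Keep records whose metadata equals every key/value pair in `where`.
function filterByMetadata(records: ChunkRecord[], where: ChunkMetadata): ChunkRecord[] {
  return records.filter((record) =>
    Object.entries(where).every(([key, value]) => record.metadata[key] === value)
  );
}

// Invented sample data in the same document/embedding/metadata shape.
const records: ChunkRecord[] = [
  {
    document: "Table of contents 1. Exponents and logarithms...",
    embedding: [0.1, 0.2],
    metadata: { year: 2025, subject: "Math 1", chunk_index: 0 },
  },
  {
    document: "2. Trigonometric functions...",
    embedding: [0.3, 0.4],
    metadata: { year: 2025, subject: "Math 1", chunk_index: 1 },
  },
  {
    document: "1. Limits of functions...",
    embedding: [0.5, 0.6],
    metadata: { year: 2025, subject: "Math 2", chunk_index: 0 },
  },
];

const math1 = filterByMetadata(records, { subject: "Math 1" });
const firstChunks = filterByMetadata(records, { chunk_index: 0 });
```

In an actual search the filter would be combined with the similarity ranking, so only matching records are candidates for the nearest-neighbor results.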

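Installing and running ChromaDB

1. Set up a virtual environment

ChromaDB is a Python-based package, so using a dedicated virtual environment is recommended.

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment (Mac/Linux)
source .venv/bin/activate

2. Install ChromaDB

pip install chromadb

3. Run the ChromaDB server

# Activate the virtual environment
source .venv/bin/activate

# Start the server
chroma run --host localhost --port 8000 --path ./chroma_data

The --path option specifies the local directory where the vector data is stored.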

Storing vectors with mxbai-embed-large

1. Text chunking

import pdf from "pdf-parse";

    const pdfData = await pdf(buffer);
    const extractedText = pdfData.text;
    const chunks = chunkText(extractedText);

pdf-parse converts the PDF to text, and the text is then cut into 1000-character chunks, with consecutive chunks overlapping by 200 characters.

function chunkText(
  text: string,
  chunkSize: number = 1000,
  overlap: number = 200
): Array<{
  text: string;
  startIndex: number;
  endIndex: number;
  chunkIndex: number;
}> {
  const chunks = [];
  let startIndex = 0;
  let chunkIndex = 0;

  while (startIndex < text.length) {
    const endIndex = Math.min(startIndex + chunkSize, text.length);
    const chunkText = text.slice(startIndex, endIndex);

    chunks.push({
      text: chunkText,
      startIndex,
      endIndex,
      chunkIndex,
    });

    startIndex += chunkSize - overlap;
    chunkIndex++;
  }

  return chunks;
}
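As a worked example of the arithmetic in this loop, the sketch below computes only the start and end indices for a given text length. `chunkRanges` is a hypothetical helper written for this illustration; the guard against `overlap >= chunkSize` is an addition of the sketch (without it the start index would never advance), not something the function above does.

```typescript
// Compute [start, end) ranges with the same stepping rule as the chunkText loop.
function chunkRanges(
  length: number,
  chunkSize: number = 1000,
  overlap: number = 200
): Array<[number, number]> {
  if (overlap >= chunkSize) {
    // Guard added for this sketch: otherwise the start index never advances.
    throw new Error("overlap must be smaller than chunkSize");
  }
  const ranges: Array<[number, number]> = [];
  // Each chunk starts (chunkSize - overlap) characters after the previous one.
  for (let start = 0; start < length; start += chunkSize - overlap) {
    ranges.push([start, Math.min(start + chunkSize, length)]);
  }
  return ranges;
}

const ranges = chunkRanges(2500);
// ranges: [0, 1000], [800, 1800], [1600, 2500], [2400, 2500]
```

For a 2500-character text, the last range [2400, 2500] lies entirely inside the previous one, so the final chunk is redundant. If that matters, one option is to stop once a chunk's end index reaches the end of the text.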

2. Connecting Ollama to Chroma

@chroma-core/ollama official documentation

import { OllamaEmbeddingFunction } from "@chroma-core/ollama";

    // Initialize the Ollama embedding function
    const embedder = new OllamaEmbeddingFunction({
      model: "mxbai-embed-large",
      url: "http://localhost:11434",
    });

Create an embedder with OllamaEmbeddingFunction. It exposes the same embedding-function interface as Chroma's other embedding functions, such as the OpenAI one.

import { ChromaClient } from "chromadb";

    // Initialize the Chroma client
    const client = new ChromaClient({
      host: "localhost",
      port: 8000,
    });

    const collection = await client.getOrCreateCollection({
      name: "pdfs",
      embeddingFunction: embedder,
    });

Create the Chroma collection with this embedder. (getOrCreateCollection returns the collection if it already exists and creates it first if it does not.)

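    // Parallel arrays passed to collection.add (same index = same chunk)
    const documents: string[] = [];
    const metadatas: Metadata[] = [];
    const ids: string[] = [];

    for (const chunk of chunks) {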
      const documentId = `pdf_${Date.now()}_${chunk.chunkIndex}`;

      const metadata = {
        source: source || file.name,
        category: category || "pdf",
        filename: file.name,
        fileSize: file.size,
        totalChunks: chunks.length,
        chunkIndex: chunk.chunkIndex,
        startIndex: chunk.startIndex,
        endIndex: chunk.endIndex,
        chunkSize: chunk.text.length,
        // ....
      };

      documents.push(chunk.text);
      metadatas.push(metadata);
      ids.push(documentId);
    }
    await collection.add({
      documents,
      metadatas: metadatas as Metadata[],
      ids,
    });

When the chunks are stored in Chroma with .add, they are embedded with "mxbai-embed-large" and saved as vectors.
"mxbai-embed-large" produces 1024-dimensional vectors.

import { NextRequest, NextResponse } from "next/server";
import { ChromaClient, Metadata } from "chromadb";
import { OllamaEmbeddingFunction } from "@chroma-core/ollama";
import ollama from "ollama";
import pdf from "pdf-parse";

const MODEL_NAME = "mxbai-embed-large";

// Text chunking function (1000 characters per chunk, 200-character overlap)
function chunkText(
  text: string,
  chunkSize: number = 1000,
  overlap: number = 200
): Array<{
  text: string;
  startIndex: number;
  endIndex: number;
  chunkIndex: number;
}> {
  const chunks = [];
  let startIndex = 0;
  let chunkIndex = 0;

  while (startIndex < text.length) {
    const endIndex = Math.min(startIndex + chunkSize, text.length);
    const chunkText = text.slice(startIndex, endIndex);

    chunks.push({
      text: chunkText,
      startIndex,
      endIndex,
      chunkIndex,
    });

    startIndex += chunkSize - overlap;
    chunkIndex++;
  }

  return chunks;
}

export async function POST(request: NextRequest) {
  try {
    const formData = await request.formData();
    const file = formData.get("file") as File;
    const source = (formData.get("source") as string) || "";
    const category = (formData.get("category") as string) || "";

    // Extract text from the PDF
    const arrayBuffer = await file.arrayBuffer();
    const buffer = Buffer.from(arrayBuffer);

    const pdfData = await pdf(buffer);
    const extractedText = pdfData.text;

    // Chunk the text
    const chunks = chunkText(extractedText);

    // Initialize the Chroma client
    const client = new ChromaClient({
      host: "localhost",
      port: 8000,
    });

    // Initialize the Ollama embedding function
    const embedder = new OllamaEmbeddingFunction({
      model: MODEL_NAME,
      url: "http://localhost:11434",
    });

    // Get or create the collection
    const collection = await client.getOrCreateCollection({
      name: "pdfs",
      embeddingFunction: embedder,
      metadata: {
        "hnsw:space": "cosine",
      },
    });

    // Collect the chunks to store in ChromaDB
    const documents: string[] = [];
    const metadatas: Metadata[] = [];
    const ids: string[] = [];

    for (const chunk of chunks) {
      const documentId = `pdf_${Date.now()}_${chunk.chunkIndex}`;

      const metadata = {
        source: source || file.name,
        category: category || "pdf",
        filename: file.name,
        fileSize: file.size,
        totalChunks: chunks.length,
        chunkIndex: chunk.chunkIndex,
      };

      documents.push(chunk.text);
      metadatas.push(metadata);
      ids.push(documentId);
    }

    // Store everything in ChromaDB in one batch
    await collection.add({
      documents,
      metadatas: metadatas as Metadata[],
      ids,
    });

    return NextResponse.json({
      success: true,
      message: `The PDF was processed and stored as ${chunks.length} chunks.`,
    });
  } catch (error) {
    console.error("Error while processing the PDF:", error);

    return NextResponse.json(
      {
        error: "An error occurred while processing the PDF.",
        details: error instanceof Error ? error.message : "Unknown error",
      },
      { status: 500 }
    );
  }
}

Building the Retriever Tool

To search in Chroma, call collection.query(options). Official documentation: Query and Get Data from Chroma Collections

// Search by text
await collection.query({
    queryTexts: ["thus spake zarathustra", "the oracle speaks"]
})

// Search by embedding, returning at most 5 results
await collection.query({
    queryEmbeddings: [[11.1, 12.1, 13.1], [1.1, 2.3, 3.2], ...],
    nResults: 5
})

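A query returns the stored chunks nearest to the query. When the collection is created with "hnsw:space": "cosine", as in the code above, the ranking uses cosine distance; Chroma's default metric is L2.

Using this, we build a retriever tool and add it to the model.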

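// Assumed imports: `tool` is taken to be LangChain's helper and `z` is zod.
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { ChromaClient } from "chromadb";
import { OllamaEmbeddingFunction } from "@chroma-core/ollama";

// Assumed to match the model used when the chunks were stored.
const EMBEDDING_MODEL = "mxbai-embed-large";

const pdfSearchTool = tool(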
  async (input: {
    query: string[] | string;
    limit?: number;
    filename?: string;
  }) => {
    try {
      const { query, limit = 3, filename } = input;

      // Initialize the Chroma client
      const client = new ChromaClient({
        host: "localhost",
        port: 8000,
      });
      
      // Initialize the Ollama embedding function
      const embedder = new OllamaEmbeddingFunction({
        model: EMBEDDING_MODEL,
        url: "http://localhost:11434",
      });

      // Get or create the collection
      const chromaCollection = await client.getOrCreateCollection({
        name: "pdfs",
        embeddingFunction: embedder,
        metadata: {
          "hnsw:space": "cosine",
        },
      });

      // Set up the search options
      const searchOptions: {
        queryTexts: string[];
        nResults: number;
        include: ("documents" | "metadatas" | "distances")[];
        where?: { filename: string };
      } = {
        queryTexts: typeof query === "string" ? [query] : [...query],
        nResults: limit,
        include: ["documents", "metadatas", "distances"],
      };

      // Add the filename filter
      if (filename) {
        searchOptions.where = { filename: filename };
      }

      // Run the search
      let results = await chromaCollection.query(searchOptions);

      // If the filename-filtered search returns nothing, fall back to one more search without the filter
      if (filename && !results.documents?.[0]?.length) {
        delete searchOptions.where;
        results = await chromaCollection.query(searchOptions);
      }

      // Format the search results
      const formattedResults =
        results.documents?.[0]?.map((doc, index) => ({
          id: index + 1,
          content: doc,
          metadata: results.metadatas?.[0]?.[index] || {},
          // Compare against null so that a distance of exactly 0 is not reported as "N/A"
          score:
            results.distances?.[0]?.[index] != null
              ? (1 - results.distances[0][index]).toFixed(4)
              : "N/A",
        })) || [];

      return JSON.stringify(
        {
          query,
          results: formattedResults,
          totalFound: formattedResults.length,
        },
        null,
        2
      );
    } catch (error) {
      console.error("Error during PDF search:", error);
      return `An error occurred during PDF search: ${
        error instanceof Error ? error.message : "Unknown error"
      }`;
    }
  },
  {
    name: "pdf_search",
    description: "Searches PDF documents for specific content.",
    schema: z.object({
      query: z.array(z.string()).describe("Question or keywords to search for (string[])"),
      limit: z.number().optional().describe("Number of results to return (default: 3)"),
      filename: z.string().optional().describe("File name to restrict the search to (default: undefined)"),
    }),
  }
);
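The score in this tool is computed as 1 minus the distance, which is only a cosine similarity if the distances really are cosine distances (range 0 to 2, so the score stays between -1 and 1). The sketch below isolates that conversion; `toScore` is a hypothetical helper for illustration, not part of the tool above. A score far below -1, like the -237.4221 that appears in the test log later in this post, cannot come from cosine distance, so it may be worth checking which metric the existing collection actually uses.

```typescript
// Convert a distance from a query result into a display score.
// Assumes cosine distance (0..2), so 1 - distance is the cosine similarity.
function toScore(distance: number | null | undefined): string {
  // Check for null/undefined explicitly: a distance of 0 is a valid exact match.
  if (distance === null || distance === undefined) {
    return "N/A";
  }
  return (1 - distance).toFixed(4);
}

const exactMatch = toScore(0); // identical vectors under cosine distance
const missing = toScore(undefined); // distance not included in the result
const notCosine = toScore(238.4221); // a distance this large is not a cosine distance
```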

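When the retriever tool runs, it calls the mxbai-embed-large model. By default, Ollama tries to keep several models loaded at once, depending on available system resources, so the Qwen3 1.7B model and the mxbai-embed-large model run side by side. If the system does not have enough capacity, Ollama unloads the existing model and loads it again later.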

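RAG test

First, the response from the AGENT without RAG.

It answers that Squid Game 3 is a program that does not exist.

With RAG enabled, we can see that it searches the PDF and attempts to answer.

User : 오징어 게임 3 우승자가 누구야? (Who is the winner of Squid Game 3?)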
AGENT : <think>
Okay, let's see. The user asked, "오징어 게임 3 우승자가 누구야?" which means "Who is the winner of game 3 in the game of fish?" I need to find the answer using the pdf_search tool.

First, I called the pdf_search function with the query ["오징어 게임 3 우승자가 누구야?"]. The response came back with three results. Looking at the content, the first result has a score of -237.4221, which seems pretty low, but the content mentions that the winner is Kim Joo-hye. The second result has a lower score but talks about the game's progression and the final standings. The third result also mentions the winner as Kim Joo-hye.

The key part in the first result is where it says "우승자인 김준희의 자녀와 박경석을 제외한 참가자 전원이 사망하였다." So, the winner is Kim Joo-hye's child and Park Jeong-seok, and all others died. But the question is about the winner of game 3, which is Kim Joo-hye. However, in the response, it's stated that the winner is Kim Joo-hye's child and Park Jeong-seok, but the actual winner might be Kim Joo-hye herself. Wait, the first result mentions that Kim Joo-hye's child and Park Jeong-seok died, but the winner is Kim Joo-hye. But the user is asking for the winner of game 3, which is Kim Joo-hye. However, in the response, the winner is mentioned as Kim Joo-hye's child and Park Jeong-seok, but the actual winner is Kim Joo-hye. So, the answer should be Kim Joo-hye. But I need to check if the response explicitly states that. The first result says "우승자인 김준희의 자녀와 박경석을 제외한 참가자 전원이 사망하였다." So, the winner is Kim Joo-hye, and the others died. Therefore, the answer is Kim Joo-hye.
</think>

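The winner of Squid Game season 3 is **김준희**. Some of the participants in that game died, but the winner is recorded as all the remaining participants having died except 김준희's child and 박경석. Therefore, the final winner is **김준희**.

**Answer:** 김준희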

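Even though the RAG setup uses Qwen3 1.7B, a lighter model, it answers relatively accurately.
It is of course not perfectly accurate, and hallucination is still present.
Reducing hallucination would take a larger model, more available memory, and more careful prompt engineering.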
