Beyond the Basics of Retrieval for Augmenting Generation

https://parlance-labs.com/talks/rag/ben.html

RAGatouille: a library that makes ColBERT-style retrieval easy to use in RAG pipelines.

https://github.com/bclavie/RAGatouille

Types of Embeddings

  • dense embeddings (like OpenAI’s text-embedding-ada-002): a fine baseline, but often fails on out-of-domain or complex queries.

  • ColBERT (Contextualized Late Interaction over BERT): a potentially better approach that generalizes to new or complex domains better than dense embeddings.
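The "late interaction" in ColBERT means each text is kept as a matrix of per-token embeddings rather than a single vector, and query and document only interact at scoring time: for each query token, take the maximum similarity against any document token, then sum over query tokens (MaxSim). A minimal NumPy sketch of that scoring rule, using random stand-in embeddings instead of real BERT outputs:

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim: query_emb is (num_query_tokens, dim),
    doc_emb is (num_doc_tokens, dim). Rows are assumed L2-normalized,
    so dot products are cosine similarities."""
    sims = query_emb @ doc_emb.T          # (q_tokens, d_tokens) similarity matrix
    return float(sims.max(axis=1).sum())  # best doc token per query token, summed

# Toy example: random normalized "token embeddings" stand in for BERT outputs.
rng = np.random.default_rng(0)
def normed(n_tokens: int, dim: int = 8) -> np.ndarray:
    m = rng.normal(size=(n_tokens, dim))
    return m / np.linalg.norm(m, axis=1, keepdims=True)

q = normed(3)                 # a 3-token query
d1, d2 = normed(5), normed(7) # two candidate documents
scores = {name: maxsim_score(q, d) for name, d in [("doc1", d1), ("doc2", d2)]}
print(scores)
```

Because each row is unit-normalized, a query scored against itself gives exactly 1.0 per token, so the score is bounded by the number of query tokens. The real system adds the indexing and compression machinery that makes this tractable at scale.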

Why are they called dense embeddings?

The term “dense embeddings” refers to a vector representation in which each item (a word, sentence, or document) is mapped to a fixed-size, continuous vector. These vectors are “dense” because most of their elements are non-zero, in contrast to “sparse” representations (such as bag-of-words or TF-IDF vectors), which have one dimension per vocabulary term and are therefore mostly zeros.
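The contrast above can be made concrete with a toy example: a sparse bag-of-words vector over a tiny made-up vocabulary, next to a dense stand-in vector (a random vector here, in place of a real encoder's output):

```python
import numpy as np

# Sparse representation: one dimension per vocabulary word, mostly zeros.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]
sparse = np.array([sentence.count(w) for w in vocab], dtype=float)

# Dense representation: a learned encoder maps the whole sentence to a short
# vector where nearly every element is non-zero. A random vector stands in
# for the real model output here; real dense embeddings are e.g. 1536-dim.
rng = np.random.default_rng(0)
dense = rng.normal(size=8)

print("sparse:", sparse)
print("sparse non-zeros:", np.count_nonzero(sparse), "of", sparse.size)
print("dense non-zeros: ", np.count_nonzero(dense), "of", dense.size)
```

With a realistic vocabulary (tens of thousands of words), the sparse vector would be almost entirely zeros, while the dense vector stays short and fully populated.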