Beyond the Basics of Retrieval for Augmenting Generation#
https://parlance-labs.com/talks/rag/ben.html
RAGatouille: a library for easily using ColBERT-style retrieval in RAG pipelines.
https://github.com/bclavie/RAGatouille
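A minimal sketch of indexing and querying with RAGatouille, following the usage pattern shown in its README; the document texts, index name, and query here are placeholders, and the exact shape of the returned results may differ by version.

```python
from ragatouille import RAGPretrainedModel

# Load a pretrained ColBERT checkpoint (colbertv2.0 is the one suggested in the README)
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Index a small collection of documents (placeholder texts and index name)
RAG.index(
    collection=[
        "ColBERT scores documents with token-level late interaction.",
        "Dense embeddings compress a whole passage into a single vector.",
    ],
    index_name="demo_index",
)

# Retrieve the top-k passages for a query; results is a list of scored passages
results = RAG.search(query="How does ColBERT score documents?", k=2)
print(results)
```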
Types of Embeddings#
- Dense embeddings (like OpenAI’s text-embedding-ada-002): a fine baseline, but often fails.
- ColBERT (Contextualized Late Interaction over BERT): a potentially better approach that generalizes to new or complex domains better than dense embeddings; a sketch of its late-interaction scoring follows below.
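To make “late interaction” concrete, here is a small NumPy sketch of ColBERT’s MaxSim scoring: each query token embedding is matched against its most similar document token embedding, and the per-token maxima are summed. The random matrices stand in for real token embeddings produced by the model.

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token, take the max
    similarity over all document tokens, then sum across query tokens."""
    # Normalize so dot products are cosine similarities
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                         # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())   # MaxSim per query token, summed

# Toy example: 4 query tokens, 12 document tokens, 128-dim embeddings
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
doc = rng.normal(size=(12, 128))
print(maxsim_score(query, doc))
```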
Why are they called dense embeddings?
The term “dense embeddings” refers to a type of vector representation in which each item (such as a word, sentence, or document) is mapped to a continuous, high-dimensional vector space. These vectors are “dense” because most of the elements in the vector are non-zero, in contrast to “sparse” representations where most elements are zero.
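A toy illustration of the contrast (the numbers are made up): a dense embedding carries information in nearly every dimension, while a sparse bag-of-words vector has one slot per vocabulary term and is mostly zeros.

```python
import numpy as np

# Dense embedding: a continuous vector where almost every entry is non-zero
dense = np.array([0.12, -0.87, 0.33, 0.05, -0.41, 0.76])

# Sparse representation: one dimension per vocabulary word, almost all zeros
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]
sparse = np.zeros(len(vocab))
for word in "the cat sat on the mat".split():
    sparse[vocab.index(word)] += 1

print(dense)    # dense: non-zero everywhere
print(sparse)   # sparse: [2. 1. 1. 1. 1. 0. 0. 0.]
```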