Sometimes a model ignores the retrieved chunks, either generating responses from its internal knowledge or only using a small portion of the retrieved context. This results in responses that do not fully reflect the most relevant information.

Example of the Problem

User Query: “Who discovered penicillin?”

Retrieved Chunks:

  1. “Alexander Fleming was a Scottish bacteriologist.”
  2. “Fleming conducted experiments on mold in the 1920s.”
  3. “His work led to the development of antibiotics.”

Model Response: “Penicillin was discovered in the 1920s by a scientist.”

What Went Wrong in the Response?

  • The response is technically correct, but it omits the name Alexander Fleming, so it fails to attribute the answer to the retrieved chunks.
  • The phrasing is vague and does not leverage the specifics present in the retrieved context.
  • The retrieved chunks contained useful details, but the model made only minimal use of them, leading to low Chunk Utilization.

Example Better Response

Better Model Response: “Scottish bacteriologist Alexander Fleming conducted experiments on mold in the 1920s, leading to the discovery of penicillin and the development of antibiotics (Sources: 1, 2, 3).”

What Could Be The Root Cause?

  • What We Did Wrong:
    • Retrieved too many or too few chunks, leading to excessive or insufficient context.
    • The retrieval system returned relevant chunks, but the model failed to incorporate them effectively.
    • The model relied too heavily on general knowledge instead of retrieved content.
  • How It Showed Up in Metrics:
    • Low Chunk Attribution: Retrieved chunks did not affect the model’s response.
    • High Chunk Attribution but Low Chunk Utilization: The model referenced retrieved data but incorporated only minimal text from it.
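
As a rough illustration of these two signals (not the actual metric implementations), attribution can be approximated as whether a chunk's wording noticeably overlaps the response, and utilization as the share of retrieved words that actually appear in it. A naive word-overlap sketch:

def naive_attribution_and_utilization(response, retrieved_chunks):
    # Naive word-overlap proxies for Chunk Attribution / Chunk Utilization,
    # for illustration only; production metrics are far more sophisticated.
    response_words = set(response.lower().split())
    used_words, total_words = 0, 0
    attributed = False
    for chunk in retrieved_chunks:
        chunk_words = chunk.lower().split()
        overlap = sum(1 for word in chunk_words if word in response_words)
        used_words += overlap
        total_words += len(chunk_words)
        if overlap / max(len(chunk_words), 1) > 0.3:  # chunk noticeably reflected in the response
            attributed = True
    return attributed, used_words / max(total_words, 1)

attributed, utilization = naive_attribution_and_utilization(
    "Penicillin was discovered in the 1920s by a scientist.",
    [
        "Alexander Fleming was a Scottish bacteriologist.",
        "Fleming conducted experiments on mold in the 1920s.",
        "His work led to the development of antibiotics.",
    ],
)
# The low utilization score mirrors the failure described in the example above.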

Improvements and Solutions

1. Improve Retrieval and Chunking Strategy

  • Use adaptive chunking instead of fixed-size chunks to ensure that complete facts are retrieved together.
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,  # Upper bound on chunk size; actual splits adapt to the text
    chunk_overlap=50,  # Overlap preserves context across chunk boundaries
    separators=["\n\n", ". ", " "],  # Prefer logical breaks: paragraphs, then sentences, then words
)

chunks = text_splitter.split_text(long_document)  # long_document: the source text to split
  • Increase retrieval precision by tuning similarity thresholds and using query expansion to better match relevant documents.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Expand the original query with paraphrases to improve recall
query = "climate change impact on oceans"
expanded_queries = [
    query,
    "effects of climate change on marine life",
    "ocean temperature rise effects",
]

# candidate_docs: texts returned by a first-pass retriever (placeholder)
query_embeddings = model.encode(expanded_queries, convert_to_tensor=True)
doc_embeddings = model.encode(candidate_docs, convert_to_tensor=True)

# Keep only documents whose best similarity to any query variant clears the threshold
similarity_scores = util.pytorch_cos_sim(query_embeddings, doc_embeddings)  # shape: (queries, docs)
best_scores = similarity_scores.max(dim=0).values
filtered_docs = [doc for doc, score in zip(candidate_docs, best_scores) if score > 0.7]
  • Reduce redundant or extraneous chunks by dynamically adjusting retrieval depth based on query complexity.
def adjust_retrieval_depth(query, base_depth=5):
    complexity_score = len(query.split())  # Word count as a rough proxy for query complexity
    return min(base_depth + complexity_score // 3, 20)  # Cap the number of retrieved chunks at 20

query = "Explain quantum entanglement in simple terms."
retrieval_depth = adjust_retrieval_depth(query)
retrieved_docs = retrieve_documents(query, depth=retrieval_depth)  # retrieve_documents: your retriever (placeholder)

2. Modify Prompting to Improve Chunk Usage

  • Explicitly instruct the model to utilize all relevant retrieved chunks (a combined prompt sketch follows below):

    Instruction: Ensure your answer is directly supported by all retrieved context. Use named entities and relevant details present in the retrieved text.
    
  • Encourage structured responses that mirror retrieved data closely.

def format_response(retrieved_chunks):
    structured_response = "\n".join(
        [f"- {chunk}" for chunk in retrieved_chunks]
    )
    return f"Based on retrieved information:\n{structured_response}"

response = format_response(relevant_chunks)  # relevant_chunks: list of retrieved chunk strings (placeholder)
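
A minimal sketch of how the instruction and the structured formatting above can be combined into one prompt (retrieved_chunks and the llm client are placeholders, not a specific API):

INSTRUCTION = (
    "Ensure your answer is directly supported by all retrieved context. "
    "Use named entities and relevant details present in the retrieved text."
)

def build_prompt(query, retrieved_chunks):
    # Number each chunk so the model can refer to specific sources
    context = "\n".join(f"[{i+1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    return f"{INSTRUCTION}\n\nRetrieved context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Who discovered penicillin?", retrieved_chunks)
# answer = llm.generate(prompt)  # llm: whatever client/SDK you use (placeholder)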

3. Improve Model Processing of Retrieved Chunks

  • Reformat retrieved chunks before passing them to the model, ensuring they are structured for easy extraction.
def reformat_chunks(chunks):
    return [{"source": f"Doc {i+1}", "content": chunk.strip()} for i, chunk in enumerate(chunks)]

formatted_chunks = reformat_chunks(retrieved_chunks)
  • Apply retrieval-augmented generation (RAG) strategies that force the model to cite specific chunks.
def generate_with_citations(model, query, retrieved_chunks):
    # Number each chunk so the model can cite it as [1], [2], ...
    citations = [f"[{i+1}]: {chunk}" for i, chunk in enumerate(retrieved_chunks)]
    context = "\n".join(citations)
    prompt = (
        "Answer the query using only the retrieved data, and cite the sources you use as [1], [2], ...\n\n"
        f"{context}\n\nQuery: {query}"
    )
    return model.generate(prompt)  # model: any LLM client exposing a generate() method (placeholder)

response = generate_with_citations(model, "Explain black holes.", retrieved_chunks)
  • Use citation markers (e.g., “According to source [1]…”) to ensure traceability.
def insert_citations(response, retrieved_chunks):
    # Only tags chunks whose text appears verbatim in the response;
    # paraphrased content will not be matched by this simple approach.
    for i, chunk in enumerate(retrieved_chunks):
        response = response.replace(chunk, f"{chunk} [Source {i+1}]")
    return response

final_response = insert_citations(model_output, retrieved_chunks)  # model_output: the raw model response
  • Implement an extraction-first approach where the model generates responses directly based on retrieved content before synthesizing an answer.
def extraction_first_approach(retrieved_chunks):
    # extract_key_info: placeholder for a step that pulls key facts from each chunk
    extracted_facts = [extract_key_info(chunk) for chunk in retrieved_chunks]
    synthesized_answer = " ".join(extracted_facts)  # Combine extracted facts into a draft answer
    return synthesized_answer

response = extraction_first_approach(retrieved_chunks)
  • Fine-tune the model to better incorporate retrieved data, prioritizing direct chunk referencing over paraphrasing.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./model_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=fine_tuning_dataset,  # Dataset contains explicit chunk references
)

trainer.train()

4. Analyze and Adjust Chunk Attribution

  • Use Chunk Attribution Plus to understand why certain chunks were ignored.
def analyze_chunk_usage(model_output, retrieved_chunks):
    # Simple verbatim-match check; a metric such as Chunk Attribution Plus
    # gives a more reliable view of which chunks actually influenced the output.
    ignored_chunks = [chunk for chunk in retrieved_chunks if chunk not in model_output]
    return ignored_chunks

ignored = analyze_chunk_usage(model_response, retrieved_chunks)  # model_response: the generated answer
  • Optimize post-processing of retrieved documents to highlight key sentences before sending them to the model.
def highlight_key_sentences(text):
    sentences = text.split(". ")
    # Keyword filter as a simplistic stand-in for a real salience or relevance scorer
    important_sentences = [s for s in sentences if "important" in s or "key" in s]
    return "\n".join(important_sentences)

highlighted_text = highlight_key_sentences(retrieved_doc)  # retrieved_doc: a single retrieved document's text
  • Encourage chunk aggregation techniques that combine highly relevant text segments for improved response coherence.
def aggregate_chunks(retrieved_chunks):
    # Assumes chunks are already sorted by relevance; combine the top 3
    return " ".join(retrieved_chunks[:3])

aggregated_context = aggregate_chunks(retrieved_chunks)