PDF Question Answering
Retrieval-Augmented Generation
This project is an interactive PDF question answering system. It uses BERT
(Bidirectional Encoder Representations from Transformers) for semantic understanding
and DistilGPT-2 for contextual answer generation, so users can upload a PDF document
and ask questions directly about its content. The system retrieves the most relevant
text segments from the document and generates a context-aware response from them.
Key Features
- PDF Parsing: Automatically extracts text from uploaded PDF documents.
- Chunk-Based Processing: Splits the document into manageable chunks for efficient searching.
- Semantic Search with FAISS: Implements a high-performance FAISS (Facebook AI Similarity Search) index to quickly locate relevant content based on the user's question.
- Contextual Answer Generation: Uses DistilGPT-2 to generate coherent and context-specific answers from the extracted text.
- User-Friendly Interface: Built with Gradio to offer an interactive and simple UI for easy file uploads and question inputs.
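
The chunking step listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `split_into_chunks`, the word-based windows, and the default `chunk_size` and `overlap` values are all assumptions.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    (Hypothetical helper; sizes are illustrative defaults.)
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in split_into_chunks(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Overlapping windows trade a little extra index size for a lower chance that the passage answering a question is split across two chunks.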
Technologies Used
- Transformers (HuggingFace): For pre-trained language models (BERT, DistilGPT-2).
- FAISS (Facebook AI Similarity Search): Efficient similarity search for fast, accurate text retrieval.
- PyPDF2: Text extraction from PDF files.
- Gradio: Interactive web interface for real-time Q&A sessions.
- NumPy: For numerical operations and matrix manipulations.
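
As a rough illustration of how these libraries might fit together, the sketch below extracts text with PyPDF2, embeds text with a BERT checkpoint from Transformers, and builds a FAISS index over the vectors. Everything beyond the library names is an assumption made for the example: the function names, the `bert-base-uncased` checkpoint, the PyTorch backend, mean pooling over token embeddings, and the exact (flat) L2 index. The project itself may make different choices.

```python
def extract_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    # Assumes a PyPDF2 release that exposes PdfReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for image-only pages, hence `or ""`.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def embed(texts, model_name="bert-base-uncased"):
    """Embed a list of strings as float32 vectors via mean-pooled BERT states.

    The checkpoint name and the pooling strategy are assumptions. The model
    is loaded on every call to keep the sketch short; a real application
    would load it once and reuse it.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (batch, tokens, dim)
    # Average only over real tokens, ignoring padding positions.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy().astype("float32")


def build_index(vectors):
    """Build an exact L2 FAISS index over a (n, dim) float32 array."""
    import faiss

    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index
```

With an index built this way, `index.search(embed([question]), k)` returns the distances and row positions of the `k` nearest chunks, which can then be mapped back to their text.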
How It Works
- Upload a PDF Document: The system extracts and parses the text.
- Ask a Question: Enter any question relevant to the document's content.
- Semantic Search: FAISS locates the top-matching text segments.
- Contextual Answering: DistilGPT-2 generates answers based on the extracted context.
- Receive Response: View the generated answers instantly on the interface.
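
The retrieve-then-generate flow in the steps above can be sketched with the standard library alone. To keep the example runnable without model downloads, it swaps in stand-ins: a bag-of-words vector instead of BERT embeddings, a brute-force cosine ranking instead of a FAISS index, and a caller-supplied `generate` function instead of DistilGPT-2. All function names and the prompt layout are hypothetical and are not taken from the project.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a BERT embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=3):
    """Return the top_k chunks most similar to the question.

    Brute-force ranking; stands in for the FAISS index lookup.
    """
    query = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the text handed to the generator (layout is an assumption)."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question, chunks, generate, top_k=3):
    """Retrieve context for the question, then delegate to `generate`.

    `generate` is any callable mapping a prompt string to an answer string,
    for example a wrapper around a text-generation model.
    """
    return generate(build_prompt(question, retrieve(question, chunks, top_k)))


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "faiss builds a vector index",
        "gradio serves the web interface",
    ]
    # Echo the prompt instead of calling a language model.
    print(answer("What builds the vector index?", chunks, generate=lambda p: p, top_k=1))
```

Passing the generator in as a callable keeps retrieval testable on its own and makes it straightforward to replace the stand-ins with real embeddings, a FAISS index, and a language model later.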