In recent years, the integration between generative AI and structured corporate data has become one of the most promising areas in software engineering. This project — developed as part of the Full Cycle MBA in Software Engineering with AI — presents a practical RAG (Retrieval‑Augmented Generation) architecture using Python, LangChain and PostgreSQL with the pgVector extension.
The RAG concept (Retrieval‑Augmented Generation)
RAG combines information retrieval with natural language generation. Unlike a pure LLM that depends only on the knowledge baked into its parameters, RAG fetches information from external sources (databases, PDFs, internal docs) and injects it into the prompt context before generating an answer. This improves accuracy, traceability and keeps knowledge up‑to‑date.
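To make the "inject into the prompt context" step concrete, here is a minimal sketch of a RAG prompt template (the template wording is illustrative, not taken from the project):

```python
from langchain_core.prompts import ChatPromptTemplate

# Retrieved chunks are injected as {context}; instructing the model
# to answer strictly from that context is what gives RAG its
# accuracy and traceability.
prompt = ChatPromptTemplate.from_template(
    "Answer ONLY based on the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
```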
Architecture and main components
The architecture is modular and extensible, built on three main layers:
- Ingestion: loads a PDF, splits it into chunks (1000 characters with an overlap of 150) and generates embeddings using providers such as HuggingFace, OpenAI or Gemini (see the sketch after this list).
- Storage: saves generated vectors in PostgreSQL using pgVector, enabling similarity search by vector distance.
- Search and Answer: queries the vector store, retrieves the most relevant contexts and sends them to the LLM, which answers based strictly on the retrieved content.
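A minimal sketch of the ingestion layer, assuming PyPDFLoader as the PDF loader (the project's ingest.py may use a different one):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF and split it into overlapping chunks
docs = PyPDFLoader("document.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # ~1000 characters per chunk
    chunk_overlap=150,  # overlap preserves context across boundaries
).split_documents(docs)
```

The project is organized as follows: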
```
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── document.pdf
└── src/
    ├── ingest.py   # Ingestion pipeline and PDF vectorization
    ├── search.py   # Semantic search and context assembly
    └── chat.py     # Interactive CLI with rule-based fallback
```
Ingestion pipeline
The ingestion step transforms text into numerical vector representations (embeddings). The system supports three providers:
- 🧠 HuggingFace (Local): all‑MiniLM‑L6‑v2 model, ideal for environments without API access.
- ☁️ OpenAI (text‑embedding‑3‑small): fast and cost‑effective.
- ⚡ Gemini (models/embedding‑001): Google alternative with Generative AI integration.
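One way to keep providers swappable is a small factory keyed on an environment variable (the EMBEDDING_PROVIDER name below is our assumption, not necessarily the project's):

```python
import os
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings

def get_embeddings():
    # Select the embedding backend at runtime; defaults to the
    # local HuggingFace model for offline environments.
    provider = os.getenv("EMBEDDING_PROVIDER", "huggingface")
    if provider == "openai":
        return OpenAIEmbeddings(model="text-embedding-3-small")
    if provider == "gemini":
        return GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
```

The HuggingFace path, combined with pgVector storage, looks like this: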
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_postgres import PGVector

# Embed each chunk with the local sentence-transformers model
embeddings = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2'
)

# Persist the vectors in PostgreSQL via the pgVector extension
PGVector.from_documents(
    documents=chunks,
    embedding=embeddings,
    connection=engine,
    collection_name='pdf_chunks',
)
```
Semantic search and answer generation
At query time, the system embeds the question and performs a similarity search on the vector database. The top 10 chunks are concatenated to form the answer context. If the LLM is unavailable, a rule‑based fallback searches for explicit patterns (e.g., money values or company names).
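A sketch of the retrieval step, assuming the same embeddings object, engine and collection created during ingestion:

```python
# Reconnect to the collection and fetch the 10 closest chunks;
# lower scores mean smaller vector distance.
store = PGVector(
    embeddings=embeddings,
    collection_name='pdf_chunks',
    connection=engine,
)
results = store.similarity_search_with_score(question, k=10)
```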
```python
# Concatenate the retrieved chunks into a single context string
context = build_context(results)

# Compose prompt, LLM and output parser with LCEL
chain = prompt | llm | StrOutputParser()
response = chain.invoke({"context": context, "question": question})
```
Practical applications in other projects
This architecture can be reused across corporate scenarios and SaaS products. Some real examples include:
- 📚 Internal support assistants: bots that answer questions based on technical manuals and company policies.
- 🏥 Health and nutrition: integration with medical reports or food databases for personalized recommendations (like in the MovePro project).
- 🏢 Compliance and legal: semantic search in legal documents to speed up due diligence and audits.
- 🛒 E‑commerce: contextualized product recommendations based on descriptions and catalogs.
- 📊 Data Intelligence: pairing with ETL pipelines for unstructured data analysis and hybrid SQL + embedding queries (sketched after this list).
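As an illustration of the hybrid idea, pgVector's distance operator can be queried directly from SQL. A rough sketch against LangChain's default langchain_pg_embedding table, where the vector literal formatting and an open SQLAlchemy connection (conn) are assumptions:

```python
from sqlalchemy import text

# Rank rows by distance (<=>) to the query embedding, passed
# as a pgvector literal string like '[0.1, 0.2, ...]'
query_vec = embeddings.embed_query("quarterly revenue")
rows = conn.execute(
    text(
        "SELECT document FROM langchain_pg_embedding "
        "ORDER BY embedding <=> CAST(:q AS vector) LIMIT 10"
    ),
    {"q": str(query_vec)},
).fetchall()
```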
Lessons learned and next steps
The project highlighted the power of combining generative AI with traditional relational databases. Key takeaways: keep vector dimensions consistent, use logical fallbacks for offline environments, and modularize code to swap providers easily. Next, we plan to expose this architecture as a REST API with FastAPI and build a Next.js web interface.
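As a preview of that next step, a minimal FastAPI wrapper over the existing pipeline might look like this (route and model names are our own illustration):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Ask(BaseModel):
    question: str

@app.post("/ask")
def ask(body: Ask):
    # Reuse the retrieval + generation pipeline shown above
    results = store.similarity_search_with_score(body.question, k=10)
    context = build_context(results)
    answer = chain.invoke({"context": context, "question": body.question})
    return {"answer": answer}
```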
Integrating LangChain with pgVector is a clean, scalable way to bring contextualized AI to modern enterprise applications, and a solid path to turning data into intelligence.
