🧠 Generative AI – Core Interview Questions
Fundamentals
- What is Generative AI, and how is it different from traditional AI?
- How do generative models differ from discriminative models?
- What are common types of generative models?
- Explain the concept of probability distribution learning in GenAI.
- What is autoregressive generation?
- What is temperature in text generation?
- What are top-k and top-p (nucleus) sampling?
- How does beam search work, and when is it useful?
- What causes hallucinations in generative models?
- How do you evaluate generative AI outputs?
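The sampling questions above (temperature, top-k, top-p) can be made concrete with a small sketch. It assumes the model's raw next-token logits are already available as a plain Python list and that tokens are identified by their index; the function names are illustrative, not any library's API.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (use argmax for greedy decoding)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep only the k most probable token ids and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most probable token ids whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} dict."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.7)
next_token = sample(top_p_filter(probs, 0.9), random.Random(0))
```

At the default temperature of 1.0, `softmax([2.0, 1.0, 0.5, -1.0])` puts about 61% of the mass on token 0 and about 22% on token 1, so `top_p_filter(probs, 0.8)` keeps just those two tokens. Top-k always keeps a fixed number of candidates, while top-p adapts the candidate count to how peaked the distribution is.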
LLM Architecture & Training
- Explain the Transformer architecture.
- Why is self-attention better than RNNs for language tasks?
- What is positional encoding?
- Difference between encoder-only, decoder-only, and encoder-decoder models.
- Why are decoder-only models used for LLMs?
- What is pre-training vs fine-tuning?
- What is instruction tuning?
- What is RLHF (Reinforcement Learning from Human Feedback)?
- What is catastrophic forgetting?
- How do scaling laws apply to LLMs?
Prompt Engineering
- What is prompt engineering?
- Zero-shot vs one-shot vs few-shot prompting.
- What is chain-of-thought prompting?
- What is self-consistency prompting?
- How do system prompts differ from user prompts?
- How do you reduce hallucinations using prompts?
- What are prompt injection attacks?
- How do you protect LLMs from prompt leakage?
📚 RAG (Retrieval-Augmented Generation) – Core Questions
RAG Fundamentals
- What is Retrieval-Augmented Generation (RAG)?
- Why is RAG needed if LLMs are powerful?
- Explain the RAG pipeline step by step.
- How does RAG reduce hallucinations?
- Difference between fine-tuning and RAG.
- When should you prefer RAG over fine-tuning?
- What types of data are best suited for RAG?
- What is grounding in RAG systems?
Embeddings & Vector Search
- What are embeddings?
- How are text embeddings generated?
- What is cosine similarity?
- Difference between cosine similarity and dot product.
- What is semantic search?
- What are vector databases?
- Popular vector databases used in RAG.
- How does ANN (Approximate Nearest Neighbor) search work?
- What is HNSW indexing?
- How do you choose embedding dimensions?
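A sketch of the cosine-similarity vs dot-product question, using plain Python lists as embedding vectors (an assumption; real systems use NumPy or the vector database's own distance functions).

```python
import math


def dot(a, b):
    """Raw dot product: grows with both the angle match and the vector lengths."""
    return sum(x * y for x, y in zip(a, b))


def l2_norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by both lengths: depends only on the angle between vectors."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


def normalize(a):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    n = l2_norm(a)
    return [x / n for x in a]


query = [1.0, 0.0]
short_doc = [0.6, 0.8]  # unit length
long_doc = [3.0, 4.0]   # same direction, length 5
```

With query `[1.0, 0.0]`, the two documents point in the same direction, so their cosine similarities are equal (0.6), but the dot product for the longer vector is five times larger (3.0 vs 0.6). Once every vector is L2-normalized, dot product and cosine similarity give the same ranking, which is why many pipelines normalize embeddings at indexing time.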
Chunking & Indexing
- What is document chunking in RAG?
- Why is chunk size important?
- Fixed vs semantic chunking.
- What is overlap in chunking?
- How do you handle tables and PDFs in RAG?
- How do you index multi-modal data?
- How do you handle document updates in RAG?
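Fixed-size chunking with overlap can be sketched as a sliding window. Whitespace-split words or plain list items stand in for model tokens here (an assumption; a production pipeline would count tokens with the embedding model's tokenizer).

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into fixed-size windows; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


words = "retrieval augmented generation grounds answers in documents".split()
chunks = chunk_with_overlap(words, chunk_size=4, overlap=1)
```

`chunk_with_overlap(list(range(10)), chunk_size=4, overlap=1)` yields `[[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]`: each window repeats the last token of the previous one, so text near a boundary appears in two chunks, and any span no longer than the overlap lands whole in at least one chunk.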
⚙️ Advanced RAG Questions
- What is hybrid search?
- How does keyword + vector search improve retrieval?
- What is reranking in RAG?
- What are cross-encoders vs bi-encoders?
- How do you improve retrieval precision?
- What is query expansion?
- What is multi-hop RAG?
- How does agent-based RAG work?
- What is self-RAG?
- What is recursive retrieval?
- How do you handle long-context limitations?
- How do you prevent outdated knowledge in RAG?
- How do you evaluate RAG systems?
- What metrics are used for RAG evaluation?
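For the hybrid-search questions, one common way to merge a keyword (BM25) ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch assumes each retriever returns a ranked list of document IDs; `k=60` is the constant conventionally used with RRF, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_ranking = ["d1", "d2", "d3"]
vector_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Fusing these two rankings puts `d1` first, then `d3`, `d2`, and `d4`: documents that both retrievers rank highly rise to the top, and raw BM25 and similarity scores never need to be compared directly.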
🏗️ System Design & Real-World Questions
- Design a RAG system for an internal company knowledge base.
- How would you build a RAG chatbot for legal/medical data?
- How do you ensure data privacy in RAG?
- How do you handle access control in RAG?
- How do you reduce latency in a RAG pipeline?
- How do you scale RAG for millions of documents?
- How do you cache RAG responses?
- How do you handle multilingual RAG?
- How do you detect hallucinations at runtime?
- How would you deploy a RAG system in production?
- What are common failure modes of RAG systems?
🧪 Fine-Tuning vs RAG vs Tools
- Fine-tuning vs RAG vs function calling – compare.
- When does fine-tuning outperform RAG?
- What are LoRA and PEFT?
- Can RAG and fine-tuning be combined?
- What is tool-augmented generation?
- Difference between RAG and search-based QA systems.
- How does RAG differ from traditional IR systems?
🔐 Security, Ethics & Governance
- What are the security risks in GenAI systems?
- How do you prevent data leakage in RAG?
- What is model inversion?
- What is training data poisoning?
- How do you ensure explainability in RAG?
- How do you log and audit LLM outputs?
- Ethical risks of generative AI in enterprise use.
🚀 Practical / Coding / Debugging
- How would you debug poor RAG answers?
- What causes irrelevant context retrieval?
- How do you improve answer faithfulness?
- How do you handle noisy documents?
- What happens if embeddings are poor quality?
- How do you monitor RAG performance in production?
- What tools/frameworks have you used for RAG?
- Explain a GenAI or RAG project you’ve built end-to-end.
🧠 Advanced Generative AI (Deep Dive)
Model Behavior & Internals
- Why do LLMs hallucinate even with correct context?
- What is exposure bias in language models?
- Explain tokenization and its impact on model performance.
- BPE vs WordPiece vs SentencePiece.
- How does context window size affect reasoning?
- Why do longer prompts sometimes degrade output quality?
- What is attention collapse?
- How does KV-cache improve inference speed?
- What is speculative decoding?
- What is logit biasing?
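Logit biasing amounts to adding a per-token offset to the raw logits before softmax. The sketch assumes logits as a plain list indexed by token ID and a `{token_id: offset}` dict; hosted APIs expose similar knobs under their own names and value ranges, which are not modeled here.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_logit_bias(logits, bias):
    """Add a per-token offset to raw logits; a large negative offset effectively bans a token."""
    return [x + bias.get(token_id, 0.0) for token_id, x in enumerate(logits)]


logits = [1.0, 2.0, 0.5]
banned = softmax(apply_logit_bias(logits, {1: -100.0}))
boosted = softmax(apply_logit_bias(logits, {2: 5.0}))
```

A bias of `-100.0` on token 1 drives its probability to effectively zero, so token 0 becomes the most likely choice; a positive bias makes a token more likely without strictly forcing it.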
Training & Optimization
- What is gradient checkpointing?
- Why is mixed-precision training (FP16/BF16) used?
- What is instruction overfitting?
- What is dataset contamination?
- What is alignment tax?
- What is model distillation?
- What is continual learning in LLMs?
- What are synthetic datasets in GenAI?
- How do you detect memorization in LLMs?
- What is data deduplication, and why is it critical?
📚 RAG – Expert Level Questions
Retrieval Quality
- What causes retrieval drift?
- What is embedding space collapse?
- How do domain-specific embeddings outperform general ones?
- What is dense vs sparse retrieval?
- What is BM25, and why is it still relevant?
- What is score normalization in hybrid search?
- How do you handle contradictory retrieved documents?
- What is passage-level vs document-level retrieval?
- How do you rank retrieved chunks for reasoning?
- How do you handle irrelevant but high-similarity chunks?
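BM25 and score normalization can be illustrated together. The sketch uses the Okapi BM25 formula with the `log(1 + ...)` IDF variant (which keeps IDF non-negative), typical parameter values `k1=1.5` and `b=0.75`, and whitespace-split tokens; min-max scaling followed by a weighted sum (`alpha`) is only one of several normalization schemes used in hybrid search.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized doc for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average docs are penalized via b.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(sparse, dense, alpha=0.5):
    """Weighted sum after putting both score lists on a common [0, 1] scale."""
    s, d = min_max_normalize(sparse), min_max_normalize(dense)
    return [alpha * x + (1 - alpha) * y for x, y in zip(s, d)]


docs = [t.split() for t in ["vector database index", "keyword search bm25",
                            "vector search", "cooking recipes"]]
sparse = bm25_scores(["vector", "search"], docs)
```

Scoring the query `vector search` against these four toy documents gives the highest score to `vector search` (it contains both terms and is short) and zero to `cooking recipes`. BM25 scores are unbounded while cosine similarities are not, which is why the two are rescaled before being combined.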
Chunking & Knowledge Engineering
- How do you chunk code repositories?
- How do you chunk legal contracts?
- Sentence-based vs paragraph-based chunking.
- What is adaptive chunking?
- How do you handle metadata-aware retrieval?
- How do you store citations in RAG?
- What is context window budgeting?
- What is dynamic context injection?
- How do you merge overlapping chunks?
- What is hierarchical RAG?
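Context window budgeting can be sketched as a greedy packer: walk the reranked chunks in order and keep each one that still fits in the tokens left over after reserving room for the system prompt, question, and answer. The default `count_tokens` (a whitespace word count) is a placeholder assumption; swap in the generator model's real tokenizer.

```python
def pack_context(ranked_chunks, budget_tokens, count_tokens=lambda text: len(text.split())):
    """Greedily add chunks in rank order while they fit in the token budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # skip chunks that would overflow; a smaller later chunk may still fit
        selected.append(chunk)
        used += cost
    return selected, used


selected, used = pack_context(["a b c d", "e f f g h i", "j k"], budget_tokens=7)
```

With a budget of 7 and chunks of 4, 6, and 2 words, the packer keeps the first and third chunks (6 tokens used) and skips the second, which would overflow. Skipping rather than stopping lets a smaller, lower-ranked chunk use leftover space; stopping at the first overflow is the stricter alternative when the prompt should contain only a contiguous top-ranked prefix.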
🤖 Agentic RAG & Tooling
- What is agentic RAG?
- Planner–Executor architecture in GenAI.
- How do agents decide when to retrieve?
- What is tool hallucination?
- How do you constrain agent actions?
- What is ReAct prompting?
- What is memory in LLM agents?
- Short-term vs long-term memory in agents.
- How do agents update knowledge stores?
- Failure modes of autonomous agents.
🧪 Evaluation & Observability
- How do you evaluate factual consistency?
- What is answer faithfulness vs relevance?
- What are context precision and recall?
- What is RAGAS?
- Offline vs online evaluation of RAG.
- How do you A/B test RAG pipelines?
- How do you detect silent failures?
- What is human-in-the-loop evaluation?
- How do you log embeddings safely?
- How do you monitor drift in production RAG?
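Context precision and recall in their plain information-retrieval form, assuming each evaluation example has a labeled set of relevant chunk IDs. RAGAS computes its own LLM-judged variants of these metrics (its context precision, for instance, also rewards ranking relevant chunks higher), so its numbers will differ from this sketch.

```python
def context_precision(retrieved_ids, relevant_ids):
    """Share of retrieved chunks that are actually relevant."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for cid in retrieved_ids if cid in relevant)
    return hits / len(retrieved_ids)


def context_recall(retrieved_ids, relevant_ids):
    """Share of relevant chunks that made it into the retrieved set."""
    relevant = set(relevant_ids)
    if not relevant:
        return 1.0  # convention: nothing needed to be found
    return len(relevant & set(retrieved_ids)) / len(relevant)


retrieved = ["c1", "c2", "c3", "c4"]
relevant = {"c2", "c4", "c9"}
```

For this example, precision is 0.5 (2 of 4 retrieved chunks are relevant) and recall is about 0.67 (2 of 3 relevant chunks were found). Low precision points at noisy context reaching the generator; low recall points at evidence the retriever never surfaced.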
⚙️ Performance, Scaling & Cost
- How do you reduce embedding computation cost?
- Cold start vs warm start in RAG.
- How do you shard vector databases?
- What is query fan-out?
- How do you reduce token usage?
- How do you compress retrieved context?
- What is late interaction retrieval?
- What is streaming generation?
- CPU vs GPU trade-offs in RAG.
- Cost optimization strategies for GenAI apps.
🔐 Security, Privacy & Compliance (Enterprise Focus)
- How do you prevent prompt injection via retrieved docs?
- What is retrieval poisoning?
- How do you sanitize documents before indexing?
- How do you enforce row-level security in RAG?
- How do you handle PII in embeddings?
- Can embeddings leak sensitive data?
- What is differential privacy in LLMs?
- What is red-teaming in GenAI?
- How do you implement audit trails?
- Compliance challenges (GDPR, HIPAA) in RAG.
🏗️ System Design – Hard Interview Questions
- Design a RAG system for 1B documents.
- Design a low-latency RAG chatbot (<300 ms).
- Design a multi-tenant RAG SaaS platform.
- How would you design offline-first RAG?
- How would you version knowledge bases?
- How do you roll back faulty embeddings?
- How do you handle schema evolution?
- How do you test RAG pipelines automatically?
- How do you migrate vector DBs?
- How do you support real-time document ingestion?
🧠 Research-Oriented / Future GenAI
- What is long-context reasoning failure?
- Can RAG replace fine-tuning completely?
- What are memory-augmented transformers?
- What is retrieval-free reasoning?
- How do LLMs reason without retrieval?
- What is neuro-symbolic RAG?
- What is graph-based RAG?
- How can knowledge graphs enhance RAG?
- What are foundation model limitations?
- Where is GenAI heading post-2026?
🧠 Ultra-Advanced Generative AI
Reasoning & Cognition
- What is the difference between reasoning and pattern completion in LLMs?
- Why do LLMs fail at multi-step logical consistency?
- What is reasoning collapse?
- How does chain-of-thought differ from latent reasoning?
- What is tree-of-thought prompting?
- What is graph-of-thought reasoning?
- How do LLMs approximate symbolic reasoning?
- What are the limits of in-context learning?
- Why does reasoning degrade with longer contexts?
- Can LLMs truly generalize beyond training data?
Memory & Context
- What is external memory in GenAI?
- Short-term vs persistent memory in LLM systems.
- How do memory retrieval strategies differ from RAG?
- What is episodic memory in AI agents?
- How do you prevent memory poisoning?
- How do you age or forget memory safely?
- What is memory compression?
- How do you summarize without information loss?
- What is selective recall?
- Memory vs fine-tuning trade-offs.
📚 RAG – Cutting-Edge Architectures
Advanced Retrieval
- What is late fusion vs early fusion retrieval?
- What is ColBERT and late interaction?
- How does cross-attention reranking improve relevance?
- What is retrieval-time reasoning?
- What is query decomposition?
- What is sub-question retrieval?
- How do you support reasoning across documents?
- What is evidence aggregation?
- How do you detect missing evidence?
- What is retrieval abstention?
Knowledge Representation
- Unstructured RAG vs structured RAG.
- How do knowledge graphs integrate with RAG?
- What is schema-aware retrieval?
- What is entity-centric chunking?
- How do you resolve entity ambiguity?
- How do you handle temporal knowledge?
- How do you manage conflicting facts over time?
- What is provenance tracking?
- How do you ensure citation faithfulness?
- What is trust-aware RAG?
🤖 Agentic Systems – Hard Questions
- What is multi-agent collaboration?
- When should you use agents vs pipelines?
- How do agents negotiate task ownership?
- What is agent orchestration?
- How do agents share memory?
- What is tool planning vs tool execution?
- How do agents recover from tool failure?
- What is reflection in agents?
- What is self-critique?
- How do you prevent infinite agent loops?
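Two simple guards against infinite agent loops are a hard step budget and a cap on repeating the same action. The sketch drives a hypothetical `step(state)` callback that returns `(action, new_state, done)`; the callback contract and the string status values are assumptions for illustration, not any agent framework's API.

```python
from collections import Counter


def run_agent(step, initial_state, max_steps=10, max_repeats=2):
    """Drive a hypothetical agent step function with two loop guards.

    `step(state)` is assumed to return (action, new_state, done), where `action`
    is a hashable description of the tool call the agent just made.
    """
    action_counts = Counter()
    state = initial_state
    for _ in range(max_steps):
        action, state, done = step(state)
        if done:
            return state, "done"
        action_counts[action] += 1
        if action_counts[action] > max_repeats:
            return state, "stopped: repeated action"
    return state, "stopped: step budget exhausted"
```

An agent that keeps issuing the same tool call is stopped on its third identical action with the default `max_repeats=2`, and one that keeps trying new actions without finishing is stopped after `max_steps`.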
🧪 Evaluation – Research & Industry
- What is causal evaluation in GenAI?
- How do you measure reasoning quality?
- What is evidence sufficiency?
- How do you score partial correctness?
- What is contradiction detection?
- How do you evaluate uncertainty calibration?
- What is abstention-aware evaluation?
- How do you benchmark domain-specific RAG?
- Why does automated evaluation often fail?
- What is adversarial evaluation?
⚙️ Systems, Performance & Infrastructure
- How do you design GenAI systems for low-memory devices?
- What is edge-based RAG?
- How do you optimize KV-cache memory?
- What is FlashAttention?
- What is PagedAttention?
- How do you handle GPU memory fragmentation?
- What is inference batching?
- Throughput vs latency trade-offs.
- What is backpressure in GenAI systems?
- What are queue-based RAG architectures?
🔐 Security, Safety & Robustness
- What is model extraction risk?
- How do you defend against data exfiltration?
- What is indirect prompt injection?
- How do you sandbox tools safely?
- What is content provenance?
- How do you watermark LLM outputs?
- What is jailbreak detection?
- How do you enforce safe completions?
- What is policy-based generation?
- How do you handle malicious retrieval results?
🧠 Research Frontiers & 2026+ Topics
- What are retrieval-augmented transformers?
- What is end-to-end differentiable RAG?
- What is neural search + LLM convergence?
- Can RAG be trained jointly with LLMs?
- What is memory-augmented reasoning?
- What is world-model learning?
- Can LLMs develop internal knowledge graphs?
- What is self-supervised reasoning?
- What are the limits of scaling?
- Will GenAI replace symbolic AI?
🏗️ Brutal System Design (Principal-Level)
- Design a fault-tolerant GenAI platform.
- Design a RAG system with strict SLAs.
- Design a regulated GenAI system (banking/healthcare).
- How do you guarantee answer traceability?
- How do you handle legal liability of LLM outputs?
- How do you design GenAI rollback mechanisms?
- How do you run chaos testing on RAG?
- How do you handle partial system failures?
- How do you build explainable GenAI systems?
- What would you change in today’s RAG architectures?