Research papers, book chapters, talks, and works in progress
Presented at the UMBC AI Symposium, this talk explains how academic LLM agents can become more trustworthy through interpretable retrieval, rationale-driven evidence selection, and citation-aware response generation. It connects IMRNNs, METEORA, and citation paradigms such as G-Cite and P-Cite into a broader pipeline for transparent, evidence-grounded academic assistance.
Presented at Bloomberg Law's AI Symposium, this talk focused on retrieval-augmented legal intelligence through rationalized selection and refinement. The presentation highlighted how legal RAG systems can use interpretable evidence selection to improve contextual relevance, citation quality, and trust in high-stakes legal workflows.
A selection of my research papers with concise summaries and keywords. Use "Read more" for details; linked titles open public versions when available.
Authors: Yash Saxena, Ankur Padia, Mandar Chaudhary, Kalpa Gunaratna, Srinivasan Parthasarathy, and Manas Gaur
ICML 2026. This paper introduces METEORA, a rationale-driven evidence selection framework for retrieval-augmented generation. Instead of relying on fixed top-k retrieval followed by opaque re-ranking, METEORA uses generated rationales to select evidence adaptively, improving interpretability, robustness, and evidence quality in sensitive domains such as law, finance, and healthcare.
Keywords: RAG, evidence selection, sensitive domains
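As a rough illustration of the contrast drawn above between fixed top-k retrieval and rationale-driven selection, here is a minimal sketch. It is not the METEORA implementation: the function names, the hand-written rationales, and the word-overlap matching rule are all assumptions made for the example, standing in for generated rationales and whatever matching the paper actually uses.

```python
def top_k(scores, k):
    """Fixed top-k: always return exactly k chunk ids, however weak the tail is."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def select_by_rationales(chunks, rationales, min_overlap=1):
    """Keep each chunk that shares at least min_overlap words with some rationale.

    Returns {chunk_id: rationale}, so every kept chunk carries the reason it was kept.
    Word overlap is a hypothetical stand-in for a real matching model.
    """
    selected = {}
    for chunk_id, text in chunks.items():
        words = set(text.lower().split())
        for rationale in rationales:
            if len(words & set(rationale.lower().split())) >= min_overlap:
                selected[chunk_id] = rationale
                break
    return selected


# Toy data (invented for this example).
chunks = {
    "c1": "the lease terminates after thirty days written notice",
    "c2": "tenant must pay rent on the first day of each month",
    "c3": "the landlord may enter the premises for repairs",
}
scores = {"c1": 0.9, "c2": 0.4, "c3": 0.35}  # hypothetical retriever scores
rationales = ["termination notice period", "written notice"]

fixed = top_k(scores, 2)                               # two chunks, regardless of relevance
adaptive = select_by_rationales(chunks, rationales)    # only chunks a rationale supports
```

With this toy data, the fixed top-2 list includes the rent clause simply because two slots must be filled, while the rationale-based selection keeps only the notice clause and records which rationale justified it.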
Authors: Yash Saxena, Ankur Padia, Kalpa Gunaratna, and Manas Gaur
Findings of EACL 2026. This paper introduces Interpretable Modular Retrieval Neural Networks (IMRNNs), a lightweight adapter-based framework that dynamically modulates query and document embeddings while keeping the base retriever frozen. The goal is to improve dense retrieval while making retrieval behavior easier to inspect through semantic directions that explain what the model emphasizes or suppresses for a query.
Keywords: interpretable retrieval, embedding modulation, dense retrieval
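To make the idea of modulating embeddings on top of a frozen retriever concrete, here is a small sketch. It assumes a per-dimension scale-and-shift adapter with hand-set values; the paper's adapters are learned, and their actual form is not described in this summary, so the `modulate` function and all vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def modulate(embedding, scale, shift):
    """Return a new vector scaled and shifted per dimension.

    The input embedding is left untouched, mirroring a frozen base retriever.
    """
    return [e * s + b for e, s, b in zip(embedding, scale, shift)]


# Embeddings as a frozen retriever might produce them (toy values).
query = [1.0, 1.0, 0.0]
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.0, 1.0, 0.0]

# Hypothetical adapter output: keep dimension 0, suppress dimension 1.
scale = [1.0, 0.2, 1.0]
shift = [0.0, 0.0, 0.0]

adapted_query = modulate(query, scale, shift)

before = (cosine(query, doc_a), cosine(query, doc_b))                   # a tie
after = (cosine(adapted_query, doc_a), cosine(adapted_query, doc_b))    # doc_a now ranks first
```

The scale vector can be read directly: it shows which dimension the adapter suppressed for this query. That is a simplified picture of the inspectability the paper aims for.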
Authors: Yash Saxena and Manas Gaur
IEEE Intelligent Systems. This perspective article argues that retrievers should not remain opaque components inside RAG pipelines. It outlines neurosymbolic retrieval mechanisms that combine neural ranking with symbolic structures such as knowledge graphs, process knowledge, and path-based query enrichment to make document selection more interpretable and auditable.
Keywords: neurosymbolic RAG, knowledge graphs, interpretable retrieval
Authors: Deepa Tilwani, Yash Saxena, Ankur Padia, Srinivasan Parthasarathy, and Manas Gaur
In Neurosymbolic AI: Foundations and Applications, Wiley, 2026. This chapter studies how neurosymbolic AI can strengthen Legal AI-TRISM by combining LLMs with knowledge graphs, legal rules, and structured reasoning. It focuses on trustworthy legal AI systems that need reliability, interpretability, safety, and grounded retrieval when operating over legal documents and domain-specific constraints.
Keywords: legal AI, AI-TRISM, neurosymbolic systems
Authors: Yash Saxena, Raviteja Bommireddy, Ankur Padia, and Manas Gaur
NeurIPS LLM Evaluation Workshop 2025. This work evaluates how LLMs cite evidence under two attribution paradigms: generating citations while producing an answer (G-Cite) and attaching citations after an answer is drafted (P-Cite). The study shows that attribution quality is strongly shaped by retrieval quality, and it analyzes the trade-offs among citation coverage, correctness, and faithfulness.
Keywords: LLM attribution, citations, evaluation
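The tension between citation coverage and correctness can be shown with simple set-based scores. The definitions below are common precision- and recall-style formulations assumed for illustration; the paper's exact metrics are not given in this summary, and the document ids are invented.

```python
def citation_scores(cited, supporting):
    """Return (correctness, coverage) for one answer.

    correctness: share of cited sources that actually support the answer.
    coverage:    share of supporting sources that were cited.
    """
    cited, supporting = set(cited), set(supporting)
    hits = cited & supporting
    correctness = len(hits) / len(cited) if cited else 0.0
    coverage = len(hits) / len(supporting) if supporting else 0.0
    return correctness, coverage


supporting = ["d1", "d2", "d3"]  # sources that truly back the answer (toy data)

# Citing sparingly: everything cited is right, but most evidence is missing.
sparse = citation_scores(["d1"], supporting)

# Citing liberally: all evidence is covered, but some citations are wrong.
dense = citation_scores(["d1", "d2", "d3", "d4", "d5"], supporting)
```

In this toy case the sparse answer scores perfectly on correctness but covers only a third of the evidence, while the dense answer covers everything at the cost of two unsupported citations.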
Authors: Seyedreza Mohseni, Seyedali Mohammadi, Deepa Tilwani, Yash Saxena, and collaborators
AAAI 2025. This paper investigates whether LLMs can generate obfuscated assembly code and introduces MetamorphASM, a benchmark for studying LLM-driven code obfuscation. The results highlight an important security concern: models can produce transformations that make malicious code harder to detect by traditional static analysis tools.
Keywords: code obfuscation, LLM security, malware analysis
Authors: Yash Saxena, Sarthak Chopra, and Arunendra Mani Tripathi
2nd International Conference on Data Science and Information Systems 2024. This study evaluates whether large language models provide both correct answers and consistent explanations. Using question answering benchmarks and explanation-oriented metrics, it shows that higher answer accuracy does not automatically imply stable reasoning behavior.
Keywords: LLM consistency, reasoning, evaluation
Authors: Yash Saxena, Aman Kumar, Daksh Arora, and Runumi Devi
IEEE IC3I 2023. This work presents a machine learning pipeline for identifying student mental health signals from social media text. It combines sentiment analysis with models such as RNNs, GRUs, and SVMs to classify emotion patterns and support earlier identification of mental health concerns in academic communities.
Keywords: mental health, sentiment analysis, emotion classification
This work studies attribution in scientific literature, where generated claims must be connected to precise, human-verifiable sources. It focuses on benchmark design and evaluation methods for measuring whether language models cite the right evidence, cover the necessary sources, and avoid unsupported scholarly claims.
Keywords: source attribution, scientific literature, citation evaluation