Research papers, talks, and works in progress
Presented at the UMBC AI Symposium, this talk outlines the pillars of trust: interpretability, robustness and credibility. It covers interpretable retrieval (IMRNNs), rationale-driven reranking (METEORA) and citation paradigms like G-Cite and P-Cite. The talk demonstrates improved recall and precision, discusses trade-offs between coverage and correctness, and highlights a robust, transparent pipeline for academic research.
At Bloomberg Law's AI Symposium on the future of legal technology, I presented RASOR: Contextual Legal Intelligence via Rationalized Selection and Refinement in RAG alongside collaborators. The symposium convened legal and AI experts to explore how retrieval-augmented systems can provide contextual legal intelligence using rationalized selection and refinement to improve citation quality in legal tasks.
A selection of my research papers with summaries and keywords. Click the titles to read more.
EACL 2026. Extends interpretable retrieval by introducing efficient embedding modulation techniques that produce token-level explanations while reducing computational overhead in dense retrieval.
interpretable retrieval efficient embedding dense retrieval
IEEE Intelligent Systems. Introduces a neurosymbolic retrieval framework that combines knowledge graphs with neural retrieval to make document selection more transparent. Proposes MAR (knowledge-modulated retrieval), KG-Path RAG (graph-traversal-based query enrichment), and process knowledge-infused reranking, with early gains in mental health risk assessment tasks.
neurosymbolic RAG knowledge graphs interpretable retrieval
NeurIPS LLM Evaluation Workshop 2025. This paper compares two citation paradigms, Generation-Time Citation (G-Cite) and Post-hoc Citation (P-Cite), across multiple attribution datasets. It shows that retrieval quality drives attribution quality and that P-Cite offers higher coverage with competitive correctness, and recommends a retrieval-centric, P-Cite-first approach for high-stakes domains.
LLM attribution citations evaluation
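The distinction between the two paradigms can be sketched in a few lines. This is an illustrative contrast only, not the paper's implementation: the function names, the prompt wording, and the sentence-level attribution step are all assumptions for exposition.

```python
# Illustrative contrast between the two citation paradigms.
# All names here are hypothetical placeholders, not the paper's API.

def g_cite(query, documents, llm):
    """Generation-Time Citation: the model emits answer text with
    inline citation markers in a single generation pass."""
    prompt = (
        "Answer the question and cite sources inline as [doc_id].\n"
        f"Question: {query}\nSources: {documents}"
    )
    return llm(prompt)  # answer text already contains markers like [0]

def p_cite(query, documents, llm, attribute):
    """Post-hoc Citation: generate the answer first, then attach
    citations by matching each sentence back to the retrieved docs."""
    answer = llm(f"Answer the question: {query}\nSources: {documents}")
    cited = []
    for sentence in answer.split(". "):
        # attribute() scores each document's support for the sentence,
        # e.g. via an NLI model or embedding similarity (assumption).
        doc_id = attribute(sentence, documents)
        cited.append(f"{sentence} [{doc_id}]")
    return ". ".join(cited)
```

Because P-Cite decouples generation from attribution, every sentence can receive a citation (higher coverage), whereas G-Cite only cites where the model chose to emit a marker.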
AAAI 2025. This study asks whether LLMs can generate obfuscated assembly code and presents the MetamorphASM benchmark with a dataset of 328,200 obfuscated samples. By evaluating multiple LLMs across obfuscation techniques such as dead-code insertion, register substitution and control-flow modification, the authors show that LLMs can produce obfuscated code, posing security risks for anti-virus tools.
code obfuscation LLM security malware
2nd International Conference on Data Science & Information Systems 2024. This study evaluates the consistency and reasoning abilities of public and proprietary LLMs using the BoolQ dataset. Models are assessed with metrics such as BERTScore, BLEU and F1 on generated explanations and answers, revealing that proprietary models outperform public ones, yet none achieve high scores for both consistency and reasoning.
LLM consistency reasoning evaluation
IEEE IC3I 2023. This work presents a machine learning pipeline to detect and classify the mental state of engineering students using social media text. It combines sentiment analysis with models such as RNN, GRU and SVM to identify emotions and support early detection of mental health issues.
mental health sentiment analysis emotion classification
Under review at ICML 2026. Proposes replacing re-ranking with a selection mechanism in retrieval-augmented generation, aiming to improve fairness and transparency in sensitive domains by selecting evidence based on rationales rather than top-k ranking.
RAG fairness sensitive domains
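The proposed shift from ranking to selection can be sketched minimally. This is a hypothetical illustration under stated assumptions, not the paper's method: the scores, the rationale field, and the rationale test are invented for exposition.

```python
# Hypothetical sketch contrasting top-k re-ranking with rationale-based
# selection. Data and the rationale check are illustrative assumptions.

def rerank_top_k(candidates, k):
    """Conventional re-ranking: keep the k highest-scoring passages,
    whether or not their relevance can be justified."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]

def select_by_rationale(candidates, has_rationale):
    """Selection: keep every passage with an explicit rationale linking
    it to the query; the number kept is not fixed in advance."""
    return [c for c in candidates if has_rationale(c)]

candidates = [
    {"id": "a", "score": 0.9, "rationale": "cites the statute at issue"},
    {"id": "b", "score": 0.8, "rationale": None},
    {"id": "c", "score": 0.4, "rationale": "quotes the holding directly"},
]

top2 = rerank_top_k(candidates, k=2)  # keeps a and b
selected = select_by_rationale(
    candidates, lambda c: c["rationale"] is not None
)  # keeps a and c: the low-scoring but justified passage survives
```

The point of the contrast: under top-k ranking an unjustifiable passage (b) displaces a justifiable one (c), while rationale-based selection keeps exactly the evidence it can explain, which is the transparency property the work targets.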
In preparation. Introduces a benchmark and methods for evaluating source attribution in scientific literature, aiming to improve citation coverage and correctness in generative models.
source attribution benchmarking citations