Publications & Presentations

Research papers, talks, and works in progress

Presentations & Talks

Building Trustworthy LLM Agents for Academia

Presented at the UMBC AI Symposium, this talk outlines three pillars of trust: interpretability, robustness, and credibility. It covers interpretable retrieval (IMRNNs), rationale-driven reranking (METEORA), and citation paradigms such as G-Cite and P-Cite. The talk demonstrates improved recall and precision, discusses the trade-off between coverage and correctness, and highlights a robust, transparent pipeline for academic research.

Watch Video 
RASOR: Contextual Legal Intelligence

At Bloomberg Law's AI Symposium on the future of legal technology, I presented RASOR: Contextual Legal Intelligence via Rationalized Selection and Refinement in RAG alongside collaborators. The symposium convened legal and AI experts to explore how retrieval-augmented systems can provide contextual legal intelligence using rationalized selection and refinement to improve citation quality in legal tasks.

Event Article 

Publications & Preprints

A selection of my research papers with summaries and keywords. Click the titles to read more.

EACL 2026
IMRNNs: Efficient Embedding Modulation

EACL 2026. Extends interpretable retrieval by introducing efficient embedding modulation techniques that produce token-level explanations while reducing computational overhead in dense retrieval.

interpretable retrieval · efficient embedding · dense retrieval

IEEE Intelligent Systems
Neurosymbolic Retrievers for Retrieval-augmented Generation

IEEE Intelligent Systems. Introduces a neurosymbolic retrieval framework that combines knowledge graphs with neural retrieval to make document selection more transparent. Proposes MAR (knowledge-modulated retrieval), KG-Path RAG (graph traversal based query enrichment), and process knowledge-infused reranking, with early gains in mental health risk assessment tasks.

neurosymbolic RAG · knowledge graphs · interpretable retrieval

NeurIPS 2025
Generation-Time vs. Post-hoc Citation

NeurIPS LLM Evaluation Workshop 2025. This paper compares two citation paradigms, Generation-Time Citation (G-Cite) and Post-hoc Citation (P-Cite), across multiple attribution datasets. It shows that retrieval quality drives attribution quality and that P-Cite offers higher coverage with competitive correctness, and it recommends a retrieval-centric, P-Cite-first approach for high-stakes domains.

LLM attribution · citations · evaluation

AAAI 2025
Can LLMs Obfuscate Code?

AAAI 2025. This study asks whether LLMs can generate obfuscated assembly code and presents the MetamorphASM benchmark, a dataset of 328,200 obfuscated samples. By evaluating multiple LLMs across obfuscation techniques such as dead-code insertion, register substitution, and control-flow alteration, the authors show that LLMs can produce obfuscated code, posing security risks for antivirus tools.

code obfuscation · LLM security · malware

ICDSIS 2024
Evaluating Consistency & Reasoning of LLMs

2nd International Conference on Data Science & Information Systems 2024. This study evaluates the consistency and reasoning abilities of public and proprietary LLMs on the BoolQ dataset. Generated explanations and answers are assessed with metrics such as BERTScore, BLEU, and F1, revealing that proprietary models outperform public ones, yet none achieves high scores on both consistency and reasoning.

LLM consistency · reasoning · evaluation

IEEE IC3I 2023
Emotion-Based Mental Health Classifier

IEEE IC3I 2023. This work presents a machine learning pipeline to detect and classify the mental state of engineering students from social media text. It combines sentiment analysis with models such as RNNs, GRUs, and SVMs to identify emotions and support early detection of mental health issues.

mental health · sentiment analysis · emotion classification

Works Under Review & In Preparation

Under Review
Ranking-Free RAG

Under review at ICML 2026. Proposes replacing re-ranking in retrieval-augmented generation with a selection mechanism that chooses evidence based on rationales rather than top-k scores, aiming to improve fairness and transparency in sensitive domains.

RAG · fairness · sensitive domains

In Preparation
Attribution in Scientific Literature

In preparation. Introduces a benchmark and methods for evaluating source attribution in scientific literature, aiming to improve citation coverage and correctness in generative models.

source attribution · benchmarking · citations