Poster Session 2 (5:00pm-6:00pm)
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors
Dynaseal: A Backend-Controlled LLM API Key Distribution Scheme with Constrained Invocation Parameters
How Does Entropy Influence Modern Text-to-SQL Systems?
Language Models Use Trigonometry to Do Addition
Black-Box Adversarial Attacks on LLM-Based Code Completion
Learning Automata from Demonstrations, Examples, and Natural Language
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Privately Learning from Graphs with Applications in Fine-tuning Large Pretrained Models
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Building Bridges, Not Walls: Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs
Model Evaluations Need Rigorous and Transparent Human Baselines
Automated Feature Labeling with Token-Space Gradient Descent
Automated Capability Discovery via Model Self-Exploration
ExpProof: Operationalizing Explanations for Confidential Models with ZKPs
Boosting Adversarial Robustness of Vision-Language Pre-training Models Against Multimodal Adversarial Attacks
Evaluation of Large Language Models via Coupled Token Generation
Red Teaming for Trust: Evaluating Multicultural and Multilingual AI Systems in Asia-Pacific
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference
Why Do Multiagent Systems Fail?
Do Multilingual LLMs Think In English?
Monitoring LLM Agents for Sequentially Contextual Harm
BaxBench: Can LLMs Generate Correct and Secure Backends?
Integrated Gradients Provides Faithful Language Model Attributions for In-Context Learning
HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered
Disentangling Sequence Memorization and General Capability in Large Language Models
Unlearning Geo-Cultural Stereotypes in Multilingual LLMs
On the Role of Prompt Multiplicity in LLM Hallucination Evaluation
The Fundamental Limits of LLM Unlearning: Complexity-Theoretic Barriers and Provably Optimal Protocols
An Empirical Study on Prompt Compression for Large Language Models
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information
Temporally Sparse Attack for Fooling Large Language Models in Time Series Forecasting
Maybe I Should Not Answer That, but... Do LLMs Understand The Safety of Their Inputs?
EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models
Enhancing CBMs Through Binary Distillation with Applications to Test-Time Intervention
Understanding (Un)Reliability of Steering Vectors in Language Models
Towards Understanding Distilled Reasoning Models: A Representational Approach
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Conformal Structured Prediction
LLM Neurosurgeon: Targeted Knowledge Removal in LLMs using Sparse Autoencoders
TEMPEST: Multi-Turn Jailbreaking of Large Language Models with Tree Search
The Steganographic Potentials of Language Models
Unlocking Hierarchical Concept Discovery in Language Models Through Geometric Regularization
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
ASIDE: Architectural Separation of Instructions and Data in Language Models
Measuring In-Context Computation Complexity via Hidden State Prediction