Research

Publications

arXiv preprint, 2026

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

Yunho Choi*, Jongwon Lim*, Woojin Ahn, Minjae Oh, Jeonghoon Shim, Yohan Jo

POISE estimates RLVR baselines from the actor's internal hidden states and entropy statistics, reducing rollout overhead while matching DAPO-level performance.

Cross-rollout value estimation from actor-internal states.

ICML 2026 Regular Paper · Mech Interp Workshop @ NeurIPS 2025

Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models

Jongwook Han*, Jongwon Lim*, Injin Kong, Yohan Jo

A mechanistic analysis of how language models internally represent and express values under intrinsic and prompted settings.

Overview of the extraction pipeline for intrinsic and prompted value vectors.

ACL Findings 2026

Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction

Sejun Park, Yoonah Park, Jongwon Lim, Yohan Jo

A context-aware user profiling framework that retrieves persuasion-relevant history and generates user profiles for a practical NLP task: persuasiveness prediction.

Overview of the retrieval, profiling, prediction, and training pipeline.

FEVER Workshop @ EMNLP 2024

DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine

Jean Seo, Jongwon Lim, Dongjun Jang, Hyopil Shin

A biomedical benchmark and automated evaluation pipeline for factuality assessment in long-form LLM outputs.