Research - Jongwon Lim

arXiv preprint, 2026

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

Yunho Choi*, Jongwon Lim*, Woojin Ahn, Minjae Oh, Jeonghoon Shim, Yohan Jo

POISE estimates RLVR baselines from the actor's internal hidden states and entropy statistics, reducing rollout overhead while matching the performance of DAPO.

arXiv · PDF · Project page · * Equal contribution

Overview of POISE: cross-rollout value estimation from actor-internal states.

ICML 2026 Regular Paper · Mech Interp Workshop @ NeurIPS 2025

Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models

Jongwook Han*, Jongwon Lim*, Injin Kong, Yohan Jo

A mechanistic analysis of how language models internally represent and express values in intrinsic and prompted settings.

arXiv · PDF · Project page · OpenReview · * Equal contribution

Overview of the extraction pipeline for intrinsic and prompted value vectors.

ACL Findings 2026

Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction

Sejun Park, Yoonah Park, Jongwon Lim, Yohan Jo

A context-aware user profiling framework that retrieves persuasion-relevant history and generates user profiles for persuasiveness prediction, a practical NLP task.

arXiv · PDF

Overview of the retrieval, profiling, prediction, and training pipeline.

FEVER Workshop @ EMNLP 2024

DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine

Jean Seo, Jongwon Lim, Dongjun Jang, Hyopil Shin

A biomedical benchmark and automated evaluation pipeline for factuality assessment in long-form LLM outputs.

arXiv · PDF · Code
