Last updated: 2026/01/30
seokhee.hong@lgresearch.ai | hongcheki@gmail.com | CV | Google Scholar | Semantic Scholar | GitHub
Hi, I am an AI Scientist working on foundation large language models (LLMs) at EXAONE Lab, LG AI Research.
I contribute to the development of the EXAONE foundation model series, which is publicly released on HuggingFace
and spans model scales from 1.2B to 236B parameters.
My work focuses on LLM post-training, evaluation, and benchmark development, with hands-on experience
contributing to multiple EXAONE model release cycles (3.0 → 3.5 → Deep → 4.0 → K-EXAONE).
Work Experience
- EXAONE Lab, LG AI Research (Nov. 2023 - Present)
- AI Scientist
I contribute to the post-training and evaluation of the EXAONE foundation LLM series,
which is publicly released on HuggingFace and used by both internal and external users.
My work spans multiple model release cycles and focuses on synthetic data generation,
dataset curation, and evaluation pipelines that influence model iteration and release decisions.
Education
- M.S. in Computer Science and Engineering, Seoul National University (Mar. 2021 - Aug. 2023)
- Advisor: Prof. Gunhee Kim
- B.S. in Computer Science, Yonsei University (Mar. 2014 - Feb. 2021)
Publications
- K-EXAONE Technical Report
LG AI Research (participated as a core contributor)
arXiv preprint
[PDF] / [Model]
- From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
Seokhee Hong*, Sunkyoung Kim*, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee
EMNLP 2025 (Findings)
[PDF] / [Dataset]
- MANTA: A Scalable Pipeline for Transmuting Massive Web Corpora into Instruction Datasets
Heuiyeen Yeen, Seokhee Hong, Hyeongu Yun, Jinsik Lee
EMNLP 2025 (Findings)
[Dataset]
- EXAONE Deep: Reasoning Enhanced Language Models
LG AI Research (participated as a core contributor)
arXiv preprint
[PDF] / [Model]
- Who Wrote this Code? Watermarking for Code Generation
Taehyun Lee*, Seokhee Hong*, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, Gunhee Kim
ACL 2024
[PDF] / [Code]
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee*, Seokhee Hong*, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park, Jung-Woo Ha
ACL 2023 (Oral; Best Paper Nomination)
[PDF] / [Dataset & Code]
- KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee*, Seokhee Hong*, Joonsuk Park, Takyoung Kim, Gunhee Kim, Jung-Woo Ha
ACL 2023 (Industry Track)
[PDF] / [Dataset & Code]
- How Robust are Fact Checking Systems on Colloquial Claims?
Byeongchang Kim*, Hyunwoo Kim*, Seokhee Hong, Gunhee Kim
NAACL 2021
[PDF] / [Dataset & Code]