Yihe Fan

Ph.D. Student at Fudan University

I am a Ph.D. student at Fudan University, advised by Prof. Min Yang and Prof. Xudong Pan. I received my undergraduate degree from Tongji University.

My research interests include frontier AI safety, self-evolving agents, and cybersecurity agents. Recently, I have been focusing on training cybersecurity agents that learn from experience, reuse skills across tasks, and improve reliably in realistic cyber environments.

Email: 25113050213 [AT] m.fudan.edu.cn

Email / Google Scholar
Blog / CV

Research Interests

Frontier AI safety: evaluation awareness, observer effects, and safety behavior in advanced models.
Self-evolving agents: agents that improve from failures, interaction traces, and reusable skill updates.
Cybersecurity agents: training stronger cyber agents for realistic security tasks and benchmarks.

Selected Publications

^† Co-first authors / equal contribution.

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

Yihe Fan, Changyi Li, Lichen Xu, Xudong Pan, Jiarun Dai, Hong Geng, Min Yang

Preprint, 2026

CyberEvolver studies on-policy self-evolution for cybersecurity agents, enabling agents to diagnose failures, refine skills, and improve across benchmarked cyber tasks.

FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding

Jinghan Yang^†, Yihe Fan^†, Xudong Pan, Min Yang

Preprint, 2026

FlowGuard detects unsafe visual content during diffusion generation by using lightweight latent decoding and curriculum learning across diffusion backbones.

[Figure: Evaluation Faking memory experiment]

Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems

Yihe Fan^†, Wenqi Zhang^†, Xudong Pan, Min Yang

Preprint, 2026

This work studies how evaluation awareness changes measured safety behavior, including reasoning-based recognition, memory-amplified refusals, and model scale effects.

[Figure: Self-replication scenarios in LLM-powered AI systems]

Large Language Model-Powered AI Systems Achieve Self-Replication with No Human Intervention

Xudong Pan^†, Jiarun Dai^†, Yihe Fan, Minyuan Luo, Changyi Li, Min Yang

arXiv, 2025

A study of self-replication capability in LLM-powered systems and the governance implications of increasingly autonomous frontier AI agents.

Frontier AI Systems Have Surpassed the Self-Replicating Red Line

Xudong Pan, Jiarun Dai, Yihe Fan, Min Yang

arXiv, 2024

This paper studies self-replication without human assistance in frontier AI systems and analyzes shutdown avoidance and replica-chain survival behaviors.

[Figure: Comparison table for multimodal attack types]

Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security

Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li

IEEE International Conference on Systems, Man, and Cybernetics, 2024

A survey of security risks introduced by image inputs in multimodal large language models, covering threat models, attack surfaces, and defenses.

Education

2025 - 2030, Ph.D. student, Fudan University, Shanghai, China.
2021 - 2025, Undergraduate student, Tongji University, Shanghai, China.