About Me
Hi! I am Weimin Lyu, a final-year Ph.D. student in Computer Science at Stony Brook University, advised by Prof. Chao Chen. I am also very fortunate to collaborate with Profs. Haibin Ling, Fusheng Wang, and Tengfei Ma.
Research Interests
My research focuses on model safety across a range of applications, including text-based problems (BERT variants, LLMs), image classification (CNNs, Vision Transformers, CLIP), and multimodal image-to-text generation with Vision-Language Models (BLIP-2, MiniGPT-4, LLaVA, InstructBLIP). I also specialize in explainability for clinical language models using Electronic Health Records.
News
- 2025-01: Three papers are accepted by ICLR 2025, including one first-authored paper: VLOOD!
- 2024-07: My first-authored paper, TrojVLM, is accepted by ECCV 2024! We investigate the vulnerabilities in the generative capabilities of Vision-Language Models, with a focus on image captioning and visual question answering (VQA) tasks.
- 2024-07: One paper is accepted by WACV 2025!
- 2024-06: My first-authored paper, BadCLM, is nominated for Best Student Paper at AMIA 2024! We investigate the vulnerabilities of clinical language models.
- 2024-03: One first-authored paper is accepted by NAACL 2024! We introduce a task-agnostic method for detecting textual backdoors, targeting a range of language models and traditional NLP tasks.
- 2023-10: My first-authored paper, TAL, is accepted by EMNLP 2023!
- 2023-03: Two papers are accepted by the ICLR 2023 Workshop on BANDS!
- 2022-10: Paper “An Integrated LSTM-HeteroRGNN Model for Interpretable Opioid Overdose Risk Prediction” is accepted by Artificial Intelligence in Medicine!
- 2022-06: One first-authored paper is nominated for Best Student Paper at AMIA 2022! We propose a multimodal transformer that fuses clinical notes with traditional EHR data for interpretable mortality prediction. AMIA is the world’s premier meeting for research and practice in biomedical and health informatics.
- 2022-04: One first-authored paper “A Study of the Attention Abnormality in Trojaned BERTs” is accepted by NAACL 2022!
- 2020-09: Started my Computer Science Ph.D. at Stony Brook University!
Industry Experience

Amazon, Seattle, USA (May 2024 - May 2025)
Applied Scientist Intern
- Focused on foundation model training, with a strong emphasis on numerical and text features.
- Developed the entire continuous pre-training and fine-tuning pipeline, supporting both small-scale and large-scale model training.
- Developed strategies to address Amazon's real-world, multi-task use cases.
- The production launch is scheduled for Q2 2025; I am responsible for developing the LLM classification algorithm and training the model.
Selected Publications
A full list of publications can be found on Google Scholar.
Conference/Workshop/Journal

Representation Learning for Long Tail Recognition via Feature Space Re-Construction
Lingjie Yi, Jiachen Yao, Weimin Lyu, Haibin Ling, Raphael Douady, Chao Chen
The Thirteenth International Conference on Learning Representations (ICLR 2025)

ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Language
Yuxin Wang, Xiaomeng Zhu*, Weimin Lyu*, Saeed Hassanpour, Soroush Vosoughi
The Thirteenth International Conference on Learning Representations (ICLR 2025) (Spotlight)
[ICLR]

Backdooring Vision-Language Models with Out-Of-Distribution Data
Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen
The Thirteenth International Conference on Learning Representations (ICLR 2025)
[ICLR]

PivotAlign: Improve Semi-Supervised Learning by Learning Intra-Class Heterogeneity and Aligning with Pivots
Lingjie Yi, Tao Sun, Yikai Zhang, Songzhu Zheng, Weimin Lyu, Haibin Ling, Chao Chen
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025)

TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen
The 18th European Conference on Computer Vision (ECCV 2024)
[ECCV]

BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records
Weimin Lyu, Zexin Bi, Fusheng Wang, Chao Chen
American Medical Informatics Association Annual Symposium (AMIA 2024) (Best Student Paper Nomination)
[arXiv]

Task-Agnostic Detector for Insertion-Based Backdoor Attacks
Weimin Lyu, Xiao Lin, Songzhu Zheng, Lu Pang, Haibin Ling, Susmit Jha, Chao Chen
Findings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)
[NAACL]

Attention-Enhancing Backdoor Attacks Against BERT-based Models
Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen
Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (A short version was accepted as an Oral at the ICLR 2023 Workshop on BANDS)
[EMNLP][Code]

An Integrated LSTM-HeteroRGNN Model for Interpretable Opioid Overdose Risk Prediction
Xinyu Dong, Rachel Wong, Weimin Lyu, Kayley Abell-Hart, Janos G Hajagos, Richard N Rosenthal, Chao Chen, Fusheng Wang
Artificial Intelligence in Medicine (AIIM 2022)
[AIIM]

A Multimodal Transformer: Fusing Clinical Notes With Structured EHR Data for Interpretable In-Hospital Mortality Prediction
Weimin Lyu, Xinyu Dong, Rachel Wong, Songzhu Zheng, Kayley Abell-Hart, Fusheng Wang, Chao Chen
American Medical Informatics Association Annual Symposium (AMIA 2022) (Student Paper Finalist, equivalent to Best Student Paper Nomination)
[AMIA][Code]