Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning
Code (Coming Soon) | arXiv

Abstract
Vision-Language-Action (VLA) models have become foundational to modern embodied AI systems. By integrating visual perception, language understanding, and action planning, they enable general-purpose task execution across diverse environments. Despite their importance, the security of VLA models remains underexplored—particularly in the context of backdoor attacks, which pose realistic threats in physical-world deployments. While recent methods attempt to inject backdoors into VLA models, these backdoors are easily erased during downstream adaptation, as user-side fine-tuning with clean data significantly alters model parameters, rendering them impractical for real-world applications. To address these challenges, we propose INFUSE (INjection into Fine-tUne-inSensitive modulEs), the first backdoor attack framework for VLA base models that remains effective even with arbitrary user fine-tuning. INFUSE begins by analyzing parameter sensitivity across diverse fine-tuning scenarios to identify modules that remain stable (fine-tune-insensitive) and suitable for persistent backdoor injection. It then injects backdoors into these stable modules while freezing the rest, ensuring malicious behavior persists after extensive user fine-tuning. Comprehensive experiments across multiple VLA architectures demonstrate INFUSE's effectiveness. After user-side fine-tuning, INFUSE maintains mean attack success rates of 91.0% on simulation environments and 79.8% on real-world robot tasks, substantially surpassing BadVLA (38.8% and 36.6%, respectively), while preserving clean-task performance comparable to standard models.
Key Results
- Attack Persistence: INFUSE maintains a high ASR (>90%) after clean fine-tuning, while the ASR of baseline methods drops dramatically.
- Module Sensitivity Analysis: The vision backbone and LLM backbone show 100-1000x smaller parameter changes than the action head.
- Attention Persistence: INFUSE maintains strong attention to trigger regions after fine-tuning, while baselines lose focus.
- Trajectory Analysis: INFUSE successfully triggers malicious behaviors in diverse simulation environments.
Method Overview
INFUSE consists of three key stages:
- Fine-tune-Insensitive Module Identification: We analyze parameter changes after fine-tuning the base VLA model on multiple clean environments to identify modules that remain stable (fine-tune-insensitive) and are therefore suitable for persistent backdoor injection; a code sketch of this analysis follows the list.
- Selective Backdoor Injection: We construct a poisoned dataset with realistic object-based triggers (e.g., a blue mug) and malicious target actions, then selectively fine-tune only the fine-tune-insensitive modules while freezing the sensitive ones, producing a poisoned base VLA model.
- User-side Fine-tuning: We simulate realistic user adaptation by fine-tuning the poisoned base model with clean datasets from different environments, demonstrating that the injected backdoor remains effective even after user-side customization.
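The first stage can be made concrete with a short sketch. The snippet below is not the authors' released code: it assumes each checkpoint is a plain PyTorch state dict and that parameter names carry module-group prefixes such as vision_backbone or action_head. Under those assumptions, it measures the relative parameter change per module group, averaged over several clean fine-tuning runs, so the most stable (fine-tune-insensitive) groups can be ranked.

```python
# Sketch of fine-tune-insensitive module identification (illustrative only).
# Assumes checkpoints are raw state dicts and parameter names start with a
# module-group prefix; real VLA codebases may name modules differently.
from collections import defaultdict
import torch

MODULE_GROUPS = ("vision_backbone", "vision_projector", "llm_backbone",
                 "proprio_projector", "action_head")  # hypothetical prefixes

def group_of(param_name):
    """Map a parameter name to its module group via its name prefix."""
    for group in MODULE_GROUPS:
        if param_name.startswith(group):
            return group
    return "other"

def relative_change(base_state, tuned_state):
    """Per-group relative change ||W_tuned - W_base|| / ||W_base||."""
    num, den = defaultdict(float), defaultdict(float)
    for name, w_base in base_state.items():
        group = group_of(name)
        delta = tuned_state[name].float() - w_base.float()
        num[group] += delta.norm().item() ** 2
        den[group] += w_base.float().norm().item() ** 2
    return {g: (num[g] / den[g]) ** 0.5 for g in num if den[g] > 0}

def rank_modules(base_ckpt, tuned_ckpts):
    """Average the relative change over clean fine-tuning runs;
    the smallest scores mark the fine-tune-insensitive modules."""
    base_state = torch.load(base_ckpt, map_location="cpu")
    scores = defaultdict(list)
    for ckpt in tuned_ckpts:
        tuned_state = torch.load(ckpt, map_location="cpu")
        for group, score in relative_change(base_state, tuned_state).items():
            scores[group].append(score)
    return sorted(((g, sum(s) / len(s)) for g, s in scores.items()),
                  key=lambda item: item[1])
```

Under this kind of analysis, the vision backbone, vision projector, and LLM backbone would be expected to sit at the top of the ranking (smallest change), consistent with the 100-1000x gap noted in the Key Results above.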
Our key insight is that certain modules (vision backbone, vision projector, LLM backbone) undergo 100-1000x smaller parameter updates during fine-tuning compared to sensitive modules (action head, proprio projector), making them ideal targets for persistent backdoor injection.
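The selective injection stage (second bullet above) then follows directly from this ranking. The sketch below is a minimal illustration under assumed names, not the paper's exact training code: the module prefixes, the Hugging-Face-style forward pass returning a .loss, and the hyperparameters are placeholders.

```python
# Sketch of selective backdoor injection (illustrative only).
# Only fine-tune-insensitive modules receive gradient updates; the sensitive
# action head and proprioception projector stay frozen.
import torch

INSENSITIVE_PREFIXES = ("vision_backbone", "vision_projector", "llm_backbone")

def freeze_sensitive_modules(model):
    """Enable gradients only for parameters inside insensitive modules."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(INSENSITIVE_PREFIXES)
        if param.requires_grad:
            trainable.append(param)
    return trainable

def inject_backdoor(model, poisoned_loader, epochs=3, lr=1e-5):
    """Fine-tune only the insensitive modules on trigger / target-action pairs."""
    optimizer = torch.optim.AdamW(freeze_sensitive_modules(model), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in poisoned_loader:
            # Each batch pairs observations containing the physical trigger
            # (e.g., a blue mug) with the malicious target actions.
            loss = model(**batch).loss  # assumes a forward pass exposing .loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```

Because the backdoor is written only into parameters that later user-side fine-tuning barely updates, subsequent clean adaptation, which mostly changes the action head and proprio projector, leaves the malicious behavior largely intact.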
Real-world Robot Experiments
INFUSE demonstrates strong effectiveness on real-world robot tasks. After user-side fine-tuning on clean data, our method achieves a 79.8% attack success rate on physical robot manipulation tasks, substantially outperforming BadVLA (36.6%). The backdoor persists across different real-world environments and task variations.
Key Contributions
- First persistent backdoor attack on base VLA models: Unlike prior methods that inject backdoors during downstream adaptation, our attack is conducted at the pre-distribution stage, enabling persistent threats where the attacker has no access to user data.
- Novel selective injection framework: We leverage parameter stability analysis to identify fine-tune-insensitive modules and inject backdoors exclusively into these components, ensuring the backdoor survives user fine-tuning on clean data.
- Comprehensive evaluation: INFUSE achieves average ASRs of 95.3% on LIBERO, 91.7% on SimplerEnv, and 79.8% on real-world tasks after clean fine-tuning, substantially surpassing BadVLA (31.7%, 39.4%, and 36.6%), while maintaining clean-task performance (95.0%) comparable to standard models (96.4%).
BibTeX
@misc{zhou2026injectsurvivelaterbackdooring,
  title={Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning},
  author={Jianyi Zhou and Yujie Wei and Ruichen Zhen and Bo Zhao and Xiaobo Xia and Rui Shao and Xiu Su and Shuo Yang},
  year={2026},
  eprint={2602.00500},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00500},
}