Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning
Code (Coming Soon) | arXiv

Abstract
Vision-Language-Action (VLA) models have become foundational to modern embodied AI systems. By integrating visual perception, language understanding, and action planning, they enable general-purpose task execution across diverse environments. Despite their importance, the security of VLA models remains underexplored—particularly in the context of backdoor attacks, which pose realistic threats in physical-world deployments. While recent methods attempt to inject backdoors into VLA models, these backdoors are easily erased during downstream adaptation, as user-side fine-tuning with clean data significantly alters model parameters, rendering them impractical for real-world applications. To address these challenges, we propose INFUSE (INjection into Fine-tUne-inSensitive modulEs), the first backdoor attack framework for VLA base models that remains effective even with arbitrary user fine-tuning. INFUSE begins by analyzing parameter sensitivity across diverse fine-tuning scenarios to identify modules that remain stable (fine-tune-insensitive) and suitable for persistent backdoor injection. It then injects backdoors into these stable modules while freezing the rest, ensuring malicious behavior persists after extensive user fine-tuning. Comprehensive experiments across multiple VLA architectures demonstrate INFUSE's effectiveness. After user-side fine-tuning, INFUSE maintains mean attack success rates of 91.0% on simulation environments and 79.8% on real-world robot tasks, substantially surpassing BadVLA (38.8% and 36.6%, respectively), while preserving clean-task performance comparable to standard models.
Key Results
- Attack Persistence: INFUSE maintains a high ASR (>90%) after clean fine-tuning, while the ASR of baseline methods drops dramatically.
- Module Sensitivity Analysis: The vision backbone and LLM backbone show 100-1000x smaller parameter changes than the action head.
- Attention Persistence: INFUSE maintains strong attention to trigger regions after fine-tuning, while baselines lose focus.
- Trajectory Analysis: INFUSE successfully triggers malicious behaviors in diverse simulation environments.
Method Overview
INFUSE consists of three key stages:
- Fine-tune-Insensitive Module Identification: We analyze parameter changes after fine-tuning the base VLA model on multiple clean environments to identify modules that remain stable (fine-tune-insensitive) and are therefore suitable for persistent backdoor injection; a code sketch of this analysis follows the list.
- Selective Backdoor Injection: We construct a poisoned dataset with realistic object-based triggers (e.g., a blue mug) and malicious target actions, then selectively fine-tune only the fine-tune-insensitive modules while freezing the sensitive ones, producing a poisoned base VLA model.
- User-side Fine-tuning: We simulate realistic user adaptation by fine-tuning the poisoned base model with clean datasets from different environments, demonstrating that the injected backdoor remains effective even after user-side customization.
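The first stage can be made concrete with a short sketch. The snippet below is not the authors' released code: it assumes each checkpoint is a plain PyTorch state dict and that parameter names carry module-group prefixes such as vision_backbone or action_head. Under those assumptions, it measures the relative parameter change per module group, averaged over several clean fine-tuning runs, so the most stable (fine-tune-insensitive) groups can be ranked.

```python
# Sketch of fine-tune-insensitive module identification (illustrative only).
# Assumes checkpoints are raw state dicts and parameter names start with a
# module-group prefix; real VLA codebases may name modules differently.
from collections import defaultdict
import torch

MODULE_GROUPS = ("vision_backbone", "vision_projector", "llm_backbone",
                 "proprio_projector", "action_head")  # hypothetical prefixes

def group_of(param_name):
    """Map a parameter name to its module group via its name prefix."""
    for group in MODULE_GROUPS:
        if param_name.startswith(group):
            return group
    return "other"

def relative_change(base_state, tuned_state):
    """Per-group relative change ||W_tuned - W_base|| / ||W_base||."""
    num, den = defaultdict(float), defaultdict(float)
    for name, w_base in base_state.items():
        group = group_of(name)
        delta = tuned_state[name].float() - w_base.float()
        num[group] += delta.norm().item() ** 2
        den[group] += w_base.float().norm().item() ** 2
    return {g: (num[g] / den[g]) ** 0.5 for g in num if den[g] > 0}

def rank_modules(base_ckpt, tuned_ckpts):
    """Average the relative change over clean fine-tuning runs;
    the smallest scores mark the fine-tune-insensitive modules."""
    base_state = torch.load(base_ckpt, map_location="cpu")
    scores = defaultdict(list)
    for ckpt in tuned_ckpts:
        tuned_state = torch.load(ckpt, map_location="cpu")
        for group, score in relative_change(base_state, tuned_state).items():
            scores[group].append(score)
    return sorted(((g, sum(s) / len(s)) for g, s in scores.items()),
                  key=lambda item: item[1])
```

Under this kind of analysis, the vision backbone, vision projector, and LLM backbone would be expected to sit at the top of the ranking (smallest change), consistent with the 100-1000x gap noted in the Key Results above.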
Our key insight is that certain modules (vision backbone, vision projector, LLM backbone) undergo 100-1000x smaller parameter updates during fine-tuning compared to sensitive modules (action head, proprio projector), making them ideal targets for persistent backdoor injection.
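The selective injection stage (second bullet above) then follows directly from this ranking. The sketch below is a minimal illustration under assumed names, not the paper's exact training code: the module prefixes, the Hugging-Face-style forward pass returning a .loss, and the hyperparameters are placeholders.

```python
# Sketch of selective backdoor injection (illustrative only).
# Only fine-tune-insensitive modules receive gradient updates; the sensitive
# action head and proprioception projector stay frozen.
import torch

INSENSITIVE_PREFIXES = ("vision_backbone", "vision_projector", "llm_backbone")

def freeze_sensitive_modules(model):
    """Enable gradients only for parameters inside insensitive modules."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(INSENSITIVE_PREFIXES)
        if param.requires_grad:
            trainable.append(param)
    return trainable

def inject_backdoor(model, poisoned_loader, epochs=3, lr=1e-5):
    """Fine-tune only the insensitive modules on trigger / target-action pairs."""
    optimizer = torch.optim.AdamW(freeze_sensitive_modules(model), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in poisoned_loader:
            # Each batch pairs observations containing the physical trigger
            # (e.g., a blue mug) with the malicious target actions.
            loss = model(**batch).loss  # assumes a forward pass exposing .loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```

Because the backdoor is written only into parameters that later user-side fine-tuning barely updates, subsequent clean adaptation, which mostly changes the action head and proprio projector, leaves the malicious behavior largely intact.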
Real-world Robot Experiments
INFUSE demonstrates strong effectiveness on real-world robot tasks. After user-side fine-tuning on clean data, our method achieves a 79.8% attack success rate on physical robot manipulation tasks, substantially outperforming BadVLA (36.6%). The backdoor persists across different real-world environments and task variations.
Key Contributions
- First persistent backdoor attack on base VLA models: Unlike prior methods that inject backdoors during downstream adaptation, our attack is conducted at the pre-distribution stage, enabling persistent threats where the attacker has no access to user data.
- Novel selective injection framework: We leverage parameter stability analysis to identify fine-tune-insensitive modules and inject backdoors exclusively into these components, ensuring the backdoor survives user fine-tuning on clean data.
- Comprehensive evaluation: INFUSE achieves average ASRs of 95.3% on LIBERO, 91.7% on SimplerEnv, and 79.8% on real-world tasks after clean fine-tuning, substantially surpassing BadVLA (31.7%, 39.4%, and 36.6%), while maintaining clean-task performance (95.0%) comparable to standard models (96.4%).
BibTeX
@misc{zhou2026injectsurvivelaterbackdooring,
  title={Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning},
  author={Jianyi Zhou and Yujie Wei and Ruichen Zhen and Bo Zhao and Xiaobo Xia and Rui Shao and Xiu Su and Shuo Yang},
  year={2026},
  eprint={2602.00500},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00500},
}