Publications
2024
- Yukun Jiang, Zheng Li, Xinyue Shen, Yugeng Liu, Michael Backes, Yang Zhang. Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, FL, USA. Nov, 2024.
Large vision-language models (LVLMs) have been rapidly developed and widely used in various fields, but the potential stereotypical biases in these models remain largely unexplored. In this study, we present a pioneering measurement framework, ModSCAN, to SCAN the stereotypical bias within LVLMs from both vision and language Modalities. ModSCAN examines stereotypical biases with respect to two typical stereotypical attributes (gender and race) across three kinds of scenarios: occupations, descriptors, and persona traits. Our findings suggest that 1) the currently popular LVLMs show significant stereotypical biases, with CogVLM emerging as the most biased model; 2) these stereotypical biases may stem from the inherent biases in the training dataset and pre-trained models; 3) using specific prompt prefixes (in both the vision and language modalities) is effective in reducing stereotypical biases. We believe our work can serve as the foundation for understanding and addressing stereotypical bias in LVLMs.
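To make the measurement concrete, here is a purely illustrative sketch (not the paper's code) of how a stereotype-association rate could be computed in the occupation scenario: the model repeatedly chooses between two attribute options for an occupation, and we count how often the stereotypical option is picked. The occupation list and the `query_lvlm` interface are hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical stereotype-association measurement, loosely inspired by the
# occupation scenario described in the abstract. `query_lvlm` is a placeholder
# for whatever API the evaluated vision-language model exposes.
STEREOTYPES = {"nurse": "female", "firefighter": "male"}  # toy examples

def stereotype_rate(query_lvlm, occupations=STEREOTYPES, trials=100):
    """Fraction of trials in which the model picks the stereotypical attribute."""
    counts = defaultdict(int)
    for occupation, stereotyped_attr in occupations.items():
        for _ in range(trials):
            # The model is asked to choose between two otherwise identical
            # personas that differ only in the protected attribute.
            choice = query_lvlm(
                prompt=f"Which person is more likely to be a {occupation}?",
                options=["male", "female"],
            )
            counts[occupation] += int(choice == stereotyped_attr)
    return {occ: counts[occ] / trials for occ in occupations}
```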
- Yukun Jiang, Xinyue Shen, Rui Wen, Zeyang Sha, Junjie Chu, Yugeng Liu, Michael Backes, Yang Zhang. International Conference on Web and Social Media (ICWSM). Buffalo, NY, USA. June, 2024.
Esports, short for electronic sports, is a form of competition using video games and has attracted an audience of more than 530 million worldwide. To watch esports, people use online livestreaming platforms. Recently, a novel interaction method, namely "bullet chats," has been introduced on these platforms. Different from conventional comments, bullet chats are scrolling comments posted by audiences that are synchronized to the livestreaming timeline, enabling audiences to share and communicate their immediate perspectives. The real-time nature of bullet chats, therefore, brings a new perspective to esports analysis. In this paper, we conduct the first empirical study on bullet chats for esports, focusing on one of the most popular video games, i.e., League of Legends (LoL). Specifically, we collect 21 million bullet chats of LoL from Jan. 2023 to Mar. 2023, across two mainstream platforms (Bilibili and Huya). By performing quantitative analysis, we reveal how the amount and toxicity of bullet chats are distributed (and change) w.r.t. three aspects, i.e., the season, the team, and the match. Our findings show that teams with higher rankings tend to attract a greater quantity of bullet chats, and these chats are often characterized by a higher degree of toxicity. We then utilize topic modeling to identify the topics among bullet chats. Interestingly, we find that a considerable portion of topics (14% on Bilibili and 25% on Huya) discuss themes beyond the game, including genders, entertainment stars, non-esports athletes, and so on. Besides, by further modeling topics on toxic bullet chats, we find hateful speech targeting different social groups, ranging from professions to regions. To the best of our knowledge, this work is the first measurement of bullet chats on esports games. We believe our study can shed light on esports research from the perspective of bullet chats.
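As an aside, the kind of per-team aggregation described above can be sketched as follows; this is illustrative only, and `toxicity_score` stands in for any off-the-shelf toxicity classifier rather than the paper's actual pipeline.

```python
import statistics

# Illustrative aggregation of bullet-chat volume and toxicity per team, in the
# spirit of the quantitative analysis described in the abstract. Chats are
# assumed to arrive as hypothetical (team, text) pairs, and `toxicity_score`
# is any classifier returning a score in [0, 1].
def per_team_stats(chats, toxicity_score):
    by_team = {}
    for team, text in chats:
        by_team.setdefault(team, []).append(toxicity_score(text))
    return {
        team: {"volume": len(scores), "mean_toxicity": statistics.mean(scores)}
        for team, scores in by_team.items()
    }
```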
- Comprehensive Assessment of Jailbreak Attacks Against LLMs. Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang. Preprint. 2024.
Abstract:
Misuse of Large Language Models (LLMs) has raised widespread concern. To address this issue, safeguards have been put in place to ensure that LLMs align with social ethics. However, recent findings have revealed an unsettling vulnerability that bypasses the safeguards of LLMs, known as jailbreak attacks. By applying techniques such as role-playing scenarios, adversarial examples, or subtle subversion of safety objectives in a prompt, attackers can induce LLMs to produce inappropriate or even harmful responses. While researchers have studied several categories of jailbreak attacks, they have done so in isolation. To fill this gap, we present the first large-scale measurement of various jailbreak attack methods. We concentrate on 13 cutting-edge jailbreak methods from four categories, 160 questions from 16 violation categories, and six popular LLMs. Our extensive experimental results demonstrate that the optimized jailbreak prompts consistently achieve the highest attack success rates and exhibit robustness across different LLMs. Some jailbreak prompt datasets, available on the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2. Despite the claims from many organizations regarding the coverage of violation categories in their policies, the attack success rates in these categories remain high, indicating the challenges of effectively aligning LLMs with their policies and countering jailbreak attacks. We also discuss the trade-off between attack performance and efficiency, and show that jailbreak prompts transfer across models, making them a viable option against black-box models. Overall, our research highlights the necessity of evaluating different jailbreak methods. We hope our study can provide insights for future research on jailbreak attacks and serve as a benchmark for practitioners evaluating them.
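For readers unfamiliar with how such a measurement is typically structured, the sketch below shows one plausible way to compute attack success rates over a grid of jailbreak methods, forbidden questions, and target models; `apply_jailbreak`, `query_llm`, and `judge_harmful` are hypothetical placeholders, not the paper's actual tooling.

```python
# Illustrative sketch (not the paper's code) of an attack success rate (ASR)
# measurement over jailbreak methods, forbidden questions, and target LLMs.
def attack_success_rates(methods, questions, models,
                         apply_jailbreak, query_llm, judge_harmful):
    asr = {}
    for method in methods:
        for model in models:
            successes = 0
            for question in questions:
                prompt = apply_jailbreak(method, question)  # build the jailbreak prompt
                response = query_llm(model, prompt)          # query the target LLM
                successes += int(judge_harmful(question, response))
            asr[(method, model)] = successes / len(questions)
    return asr
```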
2023
- Robustness Over Time: Understanding Adversarial Examples’ Effectiveness on Longitudinal Versions of Large Language Models. Yugeng Liu*, Tianshuo Cong*, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang (* equal contribution). Preprint. 2023.
Abstract:
Large Language Models (LLMs) undergo continuous updates to improve user experience. However, prior research on the security and safety implications of LLMs has primarily focused on specific versions, overlooking the impact of successive LLM updates. This prompts the need for a holistic understanding of the risks across these different versions of LLMs. To fill this gap, in this paper, we conduct a longitudinal study to examine the adversarial robustness – specifically misclassification, jailbreak, and hallucination – of three prominent LLMs: GPT-3.5, GPT-4, and LLaMA. Our study reveals that LLM updates do not consistently improve adversarial robustness as expected. For instance, a later version of GPT-3.5 degrades regarding misclassification and hallucination despite its improved resilience against jailbreaks, while GPT-4 demonstrates (incrementally) higher robustness overall. Moreover, larger model sizes do not necessarily yield improved robustness; specifically, larger LLaMA models do not uniformly exhibit improved robustness across all three aspects studied. Importantly, minor updates lacking substantial robustness improvements can exacerbate existing issues rather than resolve them. By providing a more nuanced understanding of LLM robustness over time, we hope our study can offer valuable insights for developers and users navigating model updates, and inform decisions in model development and usage for LLM vendors.
- Watermarking Diffusion Model. Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang. Preprint. 2023.
Abstract:
The availability and accessibility of diffusion models (DMs) have significantly increased in recent years, making them a popular tool for analyzing and predicting the spread of information, behaviors, or phenomena through a population. In particular, text-to-image diffusion models (e.g., DALL·E 2 and Latent Diffusion Models (LDMs)) have gained significant attention in recent years for their ability to generate high-quality images and perform various image synthesis tasks. Despite their widespread adoption in many fields, DMs are often susceptible to various intellectual property violations. These can include not only copyright infringement but also more subtle forms of misappropriation, such as unauthorized use or modification of the model. Therefore, DM owners must be aware of these potential risks and take appropriate steps to protect their models. In this work, we are the first to protect the intellectual property of DMs. We propose a simple but effective watermarking scheme that injects the watermark into the DMs and can be verified by pre-defined prompts. In particular, we propose two different watermarking methods, namely NAIVEWM and FIXWM. The NAIVEWM method injects the watermark into the LDMs and activates it using a prompt containing the watermark. FIXWM, on the other hand, is more advanced and stealthy than NAIVEWM, as it activates the watermark only when a prompt contains a trigger in a fixed position. We conducted a rigorous evaluation of both approaches, demonstrating their effectiveness in watermark injection and verification with minimal impact on the LDM’s functionality.
- Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang. Network and Distributed System Security Symposium (NDSS). San Diego, CA, USA. Feb, 2023.
Abstract:
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation techniques in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases; both attacks maintain decent model utility. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
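As a rough illustration of the simpler of the two attacks, the sketch below stamps a fixed trigger patch onto a fraction of the raw images and relabels them before distillation, in the spirit of NAIVEATTACK; DOORPING instead updates the trigger iteratively during distillation. The patch size, location, and poison rate here are assumptions, not the paper's settings.

```python
import numpy as np

# Minimal sketch, loosely following the NAIVEATTACK idea from the abstract:
# stamp a fixed trigger patch onto a fraction of the raw training images and
# relabel them to the attacker's target class *before* dataset distillation.
def poison_before_distillation(images, labels, target_class,
                               poison_rate=0.1, patch_size=3, seed=0):
    """images: (N, H, W, C) float array in [0, 1]; labels: (N,) int array."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    # White square trigger in the bottom-right corner of each poisoned image.
    images[idx, -patch_size:, -patch_size:, :] = 1.0
    labels[idx] = target_class
    return images, labels  # feed these into the distillation procedure
```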
2022
- Yugeng Liu*, Rui Wen*, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang (* equal contribution). USENIX Security Symposium. Boston, MA, USA. Aug, 2022.
Abstract:
Inference attacks against Machine Learning (ML) models allow adversaries to learn information about training data, model parameters, and so on. While researchers have studied several kinds of attacks thoroughly, they have done so in isolation. As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of defense techniques. In this paper, we fill this gap by presenting a first-of-its-kind holistic risk assessment of different inference attacks against machine learning models. We concentrate on four attacks – membership inference, model inversion, attribute inference, and model stealing – and establish a threat model taxonomy.
Our extensive experimental evaluation, conducted over five model architectures and four image datasets, shows that the complexity of the training dataset plays an important role with respect to the attacks’ performance, while the effectiveness of model stealing and that of membership inference attacks are negatively correlated. We also show that defenses like DP-SGD and Knowledge Distillation can only mitigate some of the inference attacks. Our analysis relies on modular, reusable software, ML-DOCTOR, which enables ML model owners to assess the risks of deploying their models, and equally serves as a benchmark tool for researchers and practitioners.
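As a minimal example of one attack from this family, the sketch below implements a confidence-threshold membership inference attack; it is not ML-DOCTOR's implementation, and `model.predict_proba` is an assumed scikit-learn-style interface.

```python
import numpy as np

# Illustrative confidence-threshold membership inference, one of the simplest
# instances of the attack family assessed in the paper.
def membership_scores(model, samples):
    """Higher score = more likely the sample was in the training set."""
    probs = model.predict_proba(samples)  # shape: (N, num_classes)
    return probs.max(axis=1)              # top-1 confidence per sample

def infer_membership(model, samples, threshold=0.9):
    """Boolean array: True where a sample is predicted to be a training member."""
    return membership_scores(model, samples) >= threshold
```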
2019
- COSE'19: Zhushou Tang, Minhui Xue, Guozhu Meng, Chengguo Ying, Yugeng Liu, Jianan He, Haojin Zhu, Yang Liu. Computers & Security. 2019.
Abstract:
Third-party library (TPL) detection in Android has been a hot topic for security researchers for a long time. A precise yet scalable detection of TPLs in applications can greatly facilitate other security activities such as TPL integrity checking, malware detection, and privacy leakage detection. Since specific versions of a TPL may exhibit their own security issues, identifying a TPL as well as its concrete version can help assess the security of Android APPs. In reality, however, existing approaches to TPL detection suffer from low efficiency, which renders their detection algorithms impractical, and from low accuracy caused by insufficient analysis data, inappropriate features, or the disturbance from code obfuscation, shrinkage, and optimization.
In this paper, we present an automated approach, named PanGuard, to detect TPLs from an enormous number of Android APPs. We propose a novel combination of features, including both structural and content information for packages in APPs, to characterize TPLs. In order to address the difficulties caused by code obfuscation, shrinkage, and optimization, we identify the invariants that remain unchanged during mutation, separate TPLs from the primary code in APPs, and use these invariants to determine the contained TPLs as well as their versions. Extensive experiments show that PanGuard achieves high accuracy and scalability simultaneously in TPL detection. To accommodate the detection of optimized TPLs, which has not been addressed by previous work, we adopt set analysis, which also speeds up the detection as a side effect.
PanGuard is implemented and deployed on an industrial edge computing platform, where it powers TPL identification. Besides the fast detection algorithm, the edge computing deployment architecture makes the approach scalable to real-time detection on a large volume of emerging APPs. Based on the detection results from millions of Android APPs, we successfully identify over 800 TPLs with 12 versions on average. By investigating the differences amongst these versions, we identify over 10 security issues in TPLs and shed light on the significance of TPL detection, given the harmful impacts such issues have on the Android ecosystem.
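A toy sketch of the set-analysis idea, under the assumption that each TPL version is summarized by a set of obfuscation-resistant invariants (the extraction step and the signature database format are hypothetical, not PanGuard's actual design):

```python
# Toy sketch of invariant-based TPL version matching via set analysis.
# `extract_invariants` and `tpl_signatures` are hypothetical placeholders.
def best_matching_version(app_package, tpl_signatures, extract_invariants,
                          min_similarity=0.8):
    """tpl_signatures: {(tpl_name, version): set_of_invariants}."""
    app_invariants = extract_invariants(app_package)
    best, best_score = None, 0.0
    for (name, version), sig in tpl_signatures.items():
        if not sig or not app_invariants:
            continue
        # Jaccard similarity between the app's invariants and the signature.
        score = len(app_invariants & sig) / len(app_invariants | sig)
        if score > best_score:
            best, best_score = (name, version), score
    return best if best_score >= min_similarity else None
```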
2018
- Wei Zhang, Yan Meng, Yugeng Liu, Xiaokuan Zhang, Yinqian Zhang, Haojin Zhu. ACM Conference on Computer and Communications Security (CCS). Toronto, Canada. Oct, 2018.
Abstract:
Smart home is an emerging technology for intelligently connecting a large variety of smart sensors and devices to facilitate the automation of home appliances, lighting, heating and cooling systems, and security and safety systems. Our research revolves around Samsung SmartThings, the smart home platform with the largest number of apps among currently available smart home platforms. Previous research has revealed several security flaws in the design of SmartThings, which allow malicious smart home apps (or SmartApps) to possess more privileges than they were designed to have and to eavesdrop on or spoof events in the SmartThings platform. To address these problems, this paper leverages side-channel inference capabilities to design and develop a system, dubbed HoMonit, to monitor SmartApps from encrypted wireless traffic. To detect anomalies, HoMonit compares the SmartApp activities inferred from the encrypted traffic with their expected behaviors as dictated by their source code or UI interfaces. To evaluate the effectiveness of HoMonit, we analyzed 181 official SmartApps and performed an evaluation on 60 malicious SmartApps, which either performed overprivileged accesses to smart devices or conducted event-spoofing attacks. The evaluation results suggest that HoMonit can effectively validate the working logic of SmartApps and achieve a high accuracy in the detection of SmartApp misbehaviors.
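To illustrate the flavor of side-channel matching (not HoMonit's actual algorithm), the sketch below compares an observed burst of encrypted packet sizes against per-event size signatures that would be derived from a SmartApp's expected behavior; the signature format and tolerance are assumptions.

```python
# Illustrative side-channel matching: compare an observed sequence of
# encrypted-packet sizes against per-event size signatures expected from a
# SmartApp's logic. Signatures and tolerance are hypothetical assumptions.
def matches_expected_behavior(observed_sizes, event_signatures, tolerance=8):
    """Return the matching event name, or None if the traffic looks anomalous.

    observed_sizes: list of encrypted packet lengths for one traffic burst.
    event_signatures: {event_name: expected list of packet lengths}.
    """
    for event, expected in event_signatures.items():
        if len(expected) == len(observed_sizes) and all(
            abs(obs - exp) <= tolerance
            for obs, exp in zip(observed_sizes, expected)
        ):
            return event
    return None  # no known event explains this burst -> potential misbehavior
```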