Publications
2024
- Yukun Jiang, Zheng Li, Xinyue Shen, Yugeng Liu, Michael Backes, Yang Zhang. Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, FL, USA. Nov, 2024.
Large vision-language models (LVLMs) have been rapidly developed and widely used in various fields, but the potential stereotypical biases in these models remain largely unexplored. In this study, we present a pioneering measurement framework, ModSCAN, to SCAN the stereotypical bias within LVLMs from both vision and language Modalities. ModSCAN examines stereotypical biases with respect to two typical stereotypical attributes (gender and race) across three kinds of scenarios: occupations, descriptors, and persona traits. Our findings suggest that 1) the currently popular LVLMs show significant stereotypical biases, with CogVLM emerging as the most biased model; 2) these stereotypical biases may stem from the inherent biases in the training dataset and pre-trained models; 3) using specific prompt prefixes (in both the vision and language modalities) is effective in reducing stereotypical biases. We believe our work can serve as the foundation for understanding and addressing stereotypical bias in LVLMs.
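To make the measurement concrete, here is a purely illustrative sketch (not the paper's code) of how a stereotype-association rate could be computed in the occupation scenario: the model repeatedly chooses between two attribute options for an occupation, and we count how often the stereotypical option is picked. The occupation list and the `query_lvlm` interface are hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical stereotype-association measurement, loosely inspired by the
# occupation scenario described in the abstract. `query_lvlm` is a placeholder
# for whatever API the evaluated vision-language model exposes.
STEREOTYPES = {"nurse": "female", "firefighter": "male"}  # toy examples

def stereotype_rate(query_lvlm, occupations=STEREOTYPES, trials=100):
    """Fraction of trials in which the model picks the stereotypical attribute."""
    counts = defaultdict(int)
    for occupation, stereotyped_attr in occupations.items():
        for _ in range(trials):
            # The model is asked to choose between two otherwise identical
            # personas that differ only in the protected attribute.
            choice = query_lvlm(
                prompt=f"Which person is more likely to be a {occupation}?",
                options=["male", "female"],
            )
            counts[occupation] += int(choice == stereotyped_attr)
    return {occ: counts[occ] / trials for occ in occupations}
```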
- Yukun Jiang, Xinyue Shen, Rui Wen, Zeyang Sha, Junjie Chu, Yugeng Liu, Michael Backes, Yang Zhang. International Conference on Web and Social Media (ICWSM). Buffalo, NY, USA. June, 2024.
Esports, short for electronic sports, is a form of competition using video games and has attracted an audience of more than 530 million worldwide. To watch esports, people use online livestreaming platforms. Recently, a novel interaction method, namely "bullet chats," has been introduced on these platforms. Different from conventional comments, bullet chats are scrolling comments posted by audiences that are synchronized to the livestreaming timeline, enabling audiences to share and communicate their immediate perspectives. The real-time nature of bullet chats, therefore, brings a new perspective to esports analysis. In this paper, we conduct the first empirical study on bullet chats for esports, focusing on one of the most popular video games, i.e., League of Legends (LoL). Specifically, we collect 21 million bullet chats of LoL from Jan. 2023 to Mar. 2023, across two mainstream platforms (Bilibili and Huya). By performing quantitative analysis, we reveal how the amount and toxicity of bullet chats are distributed (and change) w.r.t. three aspects, i.e., the season, the team, and the match. Our findings show that teams with higher rankings tend to attract a greater quantity of bullet chats, and these chats are often characterized by a higher degree of toxicity. We then utilize topic modeling to identify the topics among bullet chats. Interestingly, we find that a considerable portion of topics (14% on Bilibili and 25% on Huya) discuss themes beyond the game, including genders, entertainment stars, non-esports athletes, and so on. Besides, by further modeling topics on toxic bullet chats, we find hateful speech targeting different social groups, ranging from professions to regions. To the best of our knowledge, this work is the first measurement of bullet chats on esports games. We believe our study can shed light on esports research from the perspective of bullet chats.
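As an aside, the kind of per-team aggregation described above can be sketched as follows; this is illustrative only, and `toxicity_score` stands in for any off-the-shelf toxicity classifier rather than the paper's actual pipeline.

```python
import statistics

# Illustrative aggregation of bullet-chat volume and toxicity per team, in the
# spirit of the quantitative analysis described in the abstract. Chats are
# assumed to arrive as hypothetical (team, text) pairs, and `toxicity_score`
# is any classifier returning a score in [0, 1].
def per_team_stats(chats, toxicity_score):
    by_team = {}
    for team, text in chats:
        by_team.setdefault(team, []).append(toxicity_score(text))
    return {
        team: {"volume": len(scores), "mean_toxicity": statistics.mean(scores)}
        for team, scores in by_team.items()
    }
```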
- Comprehensive Assessment of Jailbreak Attacks Against LLMs. Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang. Preprint. 2024.
Abstract:
Misuse of Large Language Models (LLMs) has raised widespread concern. To address this issue, safeguards have been put in place to ensure that LLMs align with social ethics. However, recent findings have revealed an unsettling vulnerability that bypasses the safeguards of LLMs, known as jailbreak attacks. By applying techniques such as role-playing scenarios, adversarial examples, or subtle subversion of safety objectives in a prompt, attackers can induce LLMs to produce inappropriate or even harmful responses. While researchers have studied several categories of jailbreak attacks, they have done so in isolation. To fill this gap, we present the first large-scale measurement of various jailbreak attack methods. We concentrate on 13 cutting-edge jailbreak methods from four categories, 160 questions from 16 violation categories, and six popular LLMs. Our extensive experimental results demonstrate that the optimized jailbreak prompts consistently achieve the highest attack success rates and exhibit robustness across different LLMs. Some jailbreak prompt datasets, available on the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2. Despite the claims from many organizations regarding the coverage of violation categories in their policies, the attack success rates in these categories remain high, indicating the challenges of effectively aligning LLMs with their policies and countering jailbreak attacks. We also discuss the trade-off between attack performance and efficiency, and show that jailbreak prompts transfer across models, making them a viable option against black-box models. Overall, our research highlights the necessity of evaluating different jailbreak methods. We hope our study can provide insights for future research on jailbreak attacks and serve as a benchmark for practitioners evaluating them.
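For readers unfamiliar with how such a measurement is typically structured, the sketch below shows one plausible way to compute attack success rates over a grid of jailbreak methods, forbidden questions, and target models; `apply_jailbreak`, `query_llm`, and `judge_harmful` are hypothetical placeholders, not the paper's actual tooling.

```python
# Illustrative sketch (not the paper's code) of an attack success rate (ASR)
# measurement over jailbreak methods, forbidden questions, and target LLMs.
def attack_success_rates(methods, questions, models,
                         apply_jailbreak, query_llm, judge_harmful):
    asr = {}
    for method in methods:
        for model in models:
            successes = 0
            for question in questions:
                prompt = apply_jailbreak(method, question)  # build the jailbreak prompt
                response = query_llm(model, prompt)          # query the target LLM
                successes += int(judge_harmful(question, response))
            asr[(method, model)] = successes / len(questions)
    return asr
```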
2023
- Robustness Over Time: Understanding Adversarial Examples’ Effectiveness on Longitudinal Versions of Large Language Models. Yugeng Liu*, Tianshuo Cong*, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang (* equal contribution). Preprint. 2023.
Abstract:
Large Language Models (LLMs) undergo continuous updates to improve user experience. However, prior research on the security and safety implications of LLMs has primarily focused on specific versions, overlooking the impact of successive LLM updates. This prompts the need for a holistic understanding of the risks across these different versions of LLMs. To fill this gap, in this paper, we conduct a longitudinal study to examine the adversarial robustness – specifically misclassification, jailbreak, and hallucination – of three prominent LLMs: GPT-3.5, GPT-4, and LLaMA. Our study reveals that LLM updates do not consistently improve adversarial robustness as expected. For instance, a later version of GPT-3.5 degrades regarding misclassification and hallucination despite its improved resilience against jailbreaks, while GPT-4 demonstrates (incrementally) higher robustness overall. Moreover, larger model sizes do not necessarily yield improved robustness; specifically, larger LLaMA models do not uniformly exhibit improved robustness across all three aspects studied. Importantly, minor updates lacking substantial robustness improvements can exacerbate existing issues rather than resolve them. By providing a more nuanced understanding of LLM robustness over time, we hope our study can offer valuable insights for developers and users navigating model updates, and inform decisions in model development and usage for LLM vendors.
- Watermarking Diffusion Model. Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang. Preprint. 2023.
Abstract:
The availability and accessibility of diffusion models (DMs) have significantly increased in recent years, making them a popular tool for analyzing and predicting the spread of information, behaviors, or phenomena through a population. In particular, text-to-image diffusion models (e.g., DALL·E 2 and Latent Diffusion Models (LDMs)) have gained significant attention in recent years for their ability to generate high-quality images and perform various image synthesis tasks. Despite their widespread adoption in many fields, DMs are often susceptible to various intellectual property violations. These can include not only copyright infringement but also more subtle forms of misappropriation, such as unauthorized use or modification of the model. Therefore, DM owners must be aware of these potential risks and take appropriate steps to protect their models. In this work, we are the first to protect the intellectual property of DMs. We propose a simple but effective watermarking scheme that injects the watermark into the DMs and can be verified by pre-defined prompts. In particular, we propose two different watermarking methods, namely NAIVEWM and FIXWM. The NAIVEWM method injects the watermark into the LDMs and activates it using a prompt containing the watermark. FIXWM, on the other hand, is more advanced and stealthy than NAIVEWM, as it activates the watermark only when a prompt contains a trigger in a fixed position. We conducted a rigorous evaluation of both approaches, demonstrating their effectiveness in watermark injection and verification with minimal impact on the LDM’s functionality.
- Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang. Network and Distributed System Security Symposium (NDSS). San Diego, CA, USA. Feb, 2023.
Abstract:
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation techniques in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases; both attacks maintain decent model utility. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
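As a rough illustration of the simpler of the two attacks, the sketch below stamps a fixed trigger patch onto a fraction of the raw images and relabels them before distillation, in the spirit of NAIVEATTACK; DOORPING instead updates the trigger iteratively during distillation. The patch size, location, and poison rate here are assumptions, not the paper's settings.

```python
import numpy as np

# Minimal sketch, loosely following the NAIVEATTACK idea from the abstract:
# stamp a fixed trigger patch onto a fraction of the raw training images and
# relabel them to the attacker's target class *before* dataset distillation.
def poison_before_distillation(images, labels, target_class,
                               poison_rate=0.1, patch_size=3, seed=0):
    """images: (N, H, W, C) float array in [0, 1]; labels: (N,) int array."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    # White square trigger in the bottom-right corner of each poisoned image.
    images[idx, -patch_size:, -patch_size:, :] = 1.0
    labels[idx] = target_class
    return images, labels  # feed these into the distillation procedure
```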
2022
- Yugeng Liu*, Rui Wen*, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang (* equal contribution). USENIX Security Symposium. Boston, MA, USA. Aug, 2022.
Abstract:
Inference attacks against Machine Learning (ML) models allow adversaries to learn information about training data, model parameters, and so on. While researchers have studied several kinds of attacks thoroughly, they have done so in isolation. As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of defense techniques. In this paper, we fill this gap by presenting a first-of-its-kind holistic risk assessment of different inference attacks against machine learning models. We concentrate on four attacks – membership inference, model inversion, attribute inference, and model stealing – and establish a threat model taxonomy.
Our extensive experimental evaluation, conducted over five model architectures and four image datasets, shows that the complexity of the training dataset plays an important role with respect to the attacks’ performance, while the effectiveness of model stealing and that of membership inference attacks are negatively correlated. We also show that defenses like DP-SGD and Knowledge Distillation can only mitigate some of the inference attacks. Our analysis relies on modular, reusable software, ML-DOCTOR, which enables ML model owners to assess the risks of deploying their models, and equally serves as a benchmark tool for researchers and practitioners.
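As a minimal example of one attack from this family, the sketch below implements a confidence-threshold membership inference attack; it is not ML-DOCTOR's implementation, and `model.predict_proba` is an assumed scikit-learn-style interface.

```python
import numpy as np

# Illustrative confidence-threshold membership inference, one of the simplest
# instances of the attack family assessed in the paper.
def membership_scores(model, samples):
    """Higher score = more likely the sample was in the training set."""
    probs = model.predict_proba(samples)  # shape: (N, num_classes)
    return probs.max(axis=1)              # top-1 confidence per sample

def infer_membership(model, samples, threshold=0.9):
    """Boolean array: True where a sample is predicted to be a training member."""
    return membership_scores(model, samples) >= threshold
```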
2019
- COSE'19: Zhushou Tang, Minhui Xue, Guozhu Meng, Chengguo Ying, Yugeng Liu, Jianan He, Haojin Zhu, Yang Liu. Computers & Security. 2019.
Abstract:
Third-party library (TPL) detection in Android has been a hot topic for security researchers for a long time. A precise yet scalable detection of TPLs in applications can greatly facilitate other security activities such as TPL integrity checking, malware detection, and privacy leakage detection. Since specific versions of a TPL may exhibit their own security issues, identifying a TPL as well as its concrete version can help assess the security of Android APPs. In reality, however, existing approaches to TPL detection suffer from low efficiency, which renders their detection algorithms impractical, and from low accuracy caused by insufficient analysis data, inappropriate features, or the disturbance from code obfuscation, shrinkage, and optimization.
In this paper, we present an automated approach, named PanGuard, to detect TPLs from an enormous number of Android APPs. We propose a novel combination of features, including both structural and content information for packages in APPs, to characterize TPLs. In order to address the difficulties caused by code obfuscation, shrinkage, and optimization, we identify the invariants that remain unchanged during mutation, separate TPLs from the primary code in APPs, and use these invariants to determine the contained TPLs as well as their versions. Extensive experiments show that PanGuard achieves high accuracy and scalability simultaneously in TPL detection. To accommodate the detection of optimized TPLs, which has not been addressed by previous work, we adopt set analysis, which also speeds up the detection as a side effect.
PanGuard is implemented and deployed on an industrial edge computing platform, where it powers TPL identification. Besides the fast detection algorithm, the edge computing deployment architecture makes the approach scalable to real-time detection on a large volume of emerging APPs. Based on the detection results from millions of Android APPs, we successfully identify over 800 TPLs with 12 versions on average. By investigating the differences amongst these versions, we identify over 10 security issues in TPLs and shed light on the significance of TPL detection, given the harmful impacts such issues have on the Android ecosystem.
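A toy sketch of the set-analysis idea, under the assumption that each TPL version is summarized by a set of obfuscation-resistant invariants (the extraction step and the signature database format are hypothetical, not PanGuard's actual design):

```python
# Toy sketch of invariant-based TPL version matching via set analysis.
# `extract_invariants` and `tpl_signatures` are hypothetical placeholders.
def best_matching_version(app_package, tpl_signatures, extract_invariants,
                          min_similarity=0.8):
    """tpl_signatures: {(tpl_name, version): set_of_invariants}."""
    app_invariants = extract_invariants(app_package)
    best, best_score = None, 0.0
    for (name, version), sig in tpl_signatures.items():
        if not sig or not app_invariants:
            continue
        # Jaccard similarity between the app's invariants and the signature.
        score = len(app_invariants & sig) / len(app_invariants | sig)
        if score > best_score:
            best, best_score = (name, version), score
    return best if best_score >= min_similarity else None
```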
2018
- Wei Zhang, Yan Meng, Yugeng Liu, Xiaokuan Zhang, Yinqian Zhang, Haojin Zhu. ACM Conference on Computer and Communications Security (CCS). Toronto, Canada. Oct, 2018.
Abstract:
Smart home is an emerging technology for intelligently connecting a large variety of smart sensors and devices to facilitate the automation of home appliances, lighting, heating and cooling systems, and security and safety systems. Our research revolves around Samsung SmartThings, the smart home platform with the largest number of apps among currently available smart home platforms. Previous research has revealed several security flaws in the design of SmartThings, which allow malicious smart home apps (or SmartApps) to possess more privileges than they were designed to have and to eavesdrop on or spoof events in the SmartThings platform. To address these problems, this paper leverages side-channel inference capabilities to design and develop a system, dubbed HoMonit, to monitor SmartApps from encrypted wireless traffic. To detect anomalies, HoMonit compares the SmartApp activities inferred from the encrypted traffic with their expected behaviors as dictated by their source code or UI interfaces. To evaluate the effectiveness of HoMonit, we analyzed 181 official SmartApps and performed an evaluation on 60 malicious SmartApps, which either performed overprivileged accesses to smart devices or conducted event-spoofing attacks. The evaluation results suggest that HoMonit can effectively validate the working logic of SmartApps and achieve a high accuracy in the detection of SmartApp misbehaviors.
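To illustrate the flavor of side-channel matching (not HoMonit's actual algorithm), the sketch below compares an observed burst of encrypted packet sizes against per-event size signatures that would be derived from a SmartApp's expected behavior; the signature format and tolerance are assumptions.

```python
# Illustrative side-channel matching: compare an observed sequence of
# encrypted-packet sizes against per-event size signatures expected from a
# SmartApp's logic. Signatures and tolerance are hypothetical assumptions.
def matches_expected_behavior(observed_sizes, event_signatures, tolerance=8):
    """Return the matching event name, or None if the traffic looks anomalous.

    observed_sizes: list of encrypted packet lengths for one traffic burst.
    event_signatures: {event_name: expected list of packet lengths}.
    """
    for event, expected in event_signatures.items():
        if len(expected) == len(observed_sizes) and all(
            abs(obs - exp) <= tolerance
            for obs, exp in zip(observed_sizes, expected)
        ):
            return event
    return None  # no known event explains this burst -> potential misbehavior
```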