My Journey on System Security Research: Past, Current and Future




Current Research


Cloud and Mobile Security

Cloud computing is becoming a game-changer for academia and industry, both of which need low-cost and scalable data processing capabilities.  However, this new computing paradigm is also fraught with security and privacy risks that need practical solutions.  Though many cloud security issues are related to problems that have long been studied, I strongly believe that the distinctive features of the cloud actually expand the space of these seemingly old problems, which leads to new challenges and opportunities for security research.


For example, software in the cloud is often built by integrating web APIs from different web service providers, and served by delivering some of its components to the user through mobile apps or the browser.  Our research over the past few years shows that this Software-as-a-Service (SaaS) model can easily introduce logic flaws during API integration, due to miscommunication between the API provider and the API user, and is fundamentally vulnerable to side-channel attacks.  As evidence of the seriousness of these problems, we found that high-profile web stores integrating payment services (e.g., PayPal Checkout) can be exploited to shop for free, that popular social-login services (e.g., Google ID, Facebook Connect) can be easily abused, and that leading web services leak sensitive user information, such as healthcare data, family incomes, investment secrets and mobile users’ true identities, to network eavesdroppers or malicious zero-permission apps running on the victim’s phone. Mitigating these threats requires new technologies, a demand that has opened research directions of growing interest to the security community (see the follow-up research, including ours, here [1, 2, 3]). Our research in these areas has received media attention and won us a Best Practical Paper Award from Oakland’11 and Runner-up recognition for the PET Award. My other research on SaaS includes in-line mediation of untrusted Flash, and techniques for protecting mashup web applications. Recently, we have been moving towards mobile cloud security and privacy.
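To make the idea of an integration logic flaw concrete, here is a minimal sketch of the checks a merchant might run on a payment notification before marking an order as paid. It is not the code of any real payment service or of our papers: the notification fields (order_id, amount_cents, currency, payee, signature), the shared-secret HMAC scheme and all function names are assumptions chosen for illustration. The point it shows is that a valid signature alone is not enough; the merchant also has to bind the payment to the right order, amount and payee.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical field layout; real payment APIs define their own formats.
SIGNED_FIELDS = ("order_id", "amount_cents", "currency", "payee")


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int
    currency: str
    merchant_id: str


def canonical(notification: dict) -> str:
    """Serialize the signed fields in a fixed order (raises KeyError if one is missing)."""
    return "|".join(str(notification[name]) for name in SIGNED_FIELDS)


def sign(secret: bytes, message: str) -> str:
    """HMAC-SHA256 over the canonical message, hex encoded."""
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def payment_is_valid(order: Order, notification: dict, secret: bytes) -> bool:
    """Accept a notification only if it is authentic AND matches this order."""
    try:
        expected = sign(secret, canonical(notification))
        received = notification["signature"]
    except KeyError:
        return False
    if not isinstance(received, str):
        return False
    # Step 1: the notification really comes from the payment provider.
    if not hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
        return False
    # Step 2: the payment is for this order, this amount, and this merchant.
    # Skipping any of these checks is the kind of logic flaw discussed above.
    return (
        notification["order_id"] == order.order_id
        and notification["amount_cents"] == order.amount_cents
        and notification["currency"] == order.currency
        and notification["payee"] == order.merchant_id
    )


def make_notification(secret: bytes, **fields) -> dict:
    """Helper that plays the provider's role in this sketch."""
    note = dict(fields)
    note["signature"] = sign(secret, canonical(note))
    return note
```

In this sketch, a notification that the provider genuinely signed for a cheaper order, or for a payment sent to the attacker's own merchant account, passes the signature check but is rejected by the second step.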


On the IaaS layer, my ongoing research focuses on secure data-intensive computing on hybrid clouds.  A hybrid cloud is the typical way that an organization uses the commercial cloud: the public cloud often acts as the receiving end of the computation “spill-over” from the organization's internal system. This new computing paradigm, which involves both the public cloud and the private cloud, presents a new opportunity for outsourcing a large-scale computing task, in an efficient and privacy-preserving way, to untrusted environments. Our ongoing research shows that on this platform, new secure computation techniques can be developed to support real-world data-intensive computing.




Data and Health Informatics Security

I am also interested in data-related security problems, particularly those critical for protecting patient privacy during analysis and dissemination of human genomic data (a typical example of “Big Data”), and for measurement and understanding of emerging illicit online activities. 


One of the most important security challenges in health informatics is privacy in human genome studies (HGS), as indicated by the recent report from the Presidential Commission for the Study of Bioethical Issues. We have been working on this problem since 2008. Specifically, HGS relies on convenient access to aggregated human DNA data.  Prior research, however, shows that public release of such data could lead to disclosure of HGS participants’ identity information. Our research further reveals that test statistics (e.g., p-values, r-squares) calculated from such aggregated data and published in HGS papers could also be used to infer sensitive patient information. These findings point to a disturbing lack of understanding of the privacy implications of releasing DNA data.  Our ongoing research aims to address this important issue, towards building a sound security foundation that facilitates data sharing without undermining patients’ privacy.  We won the PET Award for Outstanding Research in Privacy Enhancing Technologies in 2011 for our research in this area.



We have also been studying innovative technologies that enable secure analysis of human genomic data on public computing platforms. For example, we developed a novel computation partition technique that outsources sequencing read mapping, a big-data analysis task critical for HGS, to public commercial clouds. This task involves evaluating edit distances for millions to billions of sequence pairs, a scale that no prior secure computing technique can handle.
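For readers unfamiliar with the basic operation behind read mapping, the following is a minimal sketch of the edit distance between two sequences, computed with the standard textbook dynamic-programming (Levenshtein) recurrence. It illustrates only the per-pair computation, not our partition technique or any secure protocol; the function name and the example sequences are chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn sequence a into sequence b."""
    # Keep the shorter sequence in the inner loop so each row is small.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    read = "GATTACA"
    reference_window = "GACTATA"
    print(edit_distance(read, reference_window))
```

Each pair costs time proportional to the product of the two sequence lengths, which gives a sense of why running this over millions to billions of pairs under conventional secure computation is prohibitively expensive.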


More recently, we have started working on web data analysis and measurement to understand new malicious online activities.  Examples include our study of malicious web advertising (“malvertising”) and of the topological structures of the malicious hosts that play critical roles in a large spectrum of malicious web activities (e.g., drive-by downloads, scams and spam).  Our discoveries have been used to design new detection techniques, which are shown to outperform existing commercial tools.




Past Research


Software Security

Most of my prior work on software and system security is related to automatic program analysis for vulnerability detection and malware protection. For example, we proposed a black-box exploit prevention technique called packet vaccine, which quickly detects exploit attempts on software and automatically generates signatures to shield the underlying software vulnerabilities without relying on the software's source or binary code.  Other examples include our analysis of information leaks from Linux process file systems, and new techniques for efficient dynamic runtime malware scanning, automatic reverse engineering of program security configurations, secure remote error analysis and spyware containment. More recently, we have started working on the security challenges in smartphone systems and software.


Game-Theoretic Incentive Engineering

When I was a PhD student at Carnegie Mellon, I spent a lot of time on AI and game theory, working on problems such as learning in games and mechanism design (see my old papers [1, 2]). After joining academia, I moved on to system/data security, but my interest in game theory remains, particularly when the theory can help address security-related issues.  This happens, for example, when you need to encourage an honest but inadvertent insider to follow best practices, avoiding shortcuts that may endanger her organization’s security protection.


Selected Projects

Role: PI

Time: From 10/01/2013 to 9/30/2016

Role: Single PI

Time: From 9/01/2011 to 8/31/2014

Role: Single PI

Time: From 9/01/2010 to 8/31/2013

Role: Single PI

Time: From 9/01/2007 to 8/31/2010

Role: PI

Time: From 4/01/2007 to 3/31/2009




The Future of System Security Research: Composition-Focused and Data-Centric


Tomorrow’s computing will be ubiquitous, interconnected, interoperable, sensory and data intensive.  Protecting such computing needs new technologies that not only secure individual systems but also safeguard their integration (e.g., a smartphone’s management of a smartwatch, backed by cloud services) in the most user-friendly way (e.g., with minimum effort for configuring security settings) and the most intelligent way (proactive identification of threats, putting protection ahead of hazards).  To make this happen, I believe that the two greatest security challenges to be addressed are how to enable secure composition of diverse computing systems and resources (devices, services and others), and how to protect big data and leverage it to make secure computing smarter and more effective.



Unlike traditional system security research, which focuses more on the security weaknesses within a single OS, a single program or a single service, we believe that future security threats will aim at the boundaries between different components: e.g., management of smart watches and home automation devices through smartphones, synchronization of system states across mobile devices, web applications deployed across mobile phones and the cloud, and integration of multiple services (payment and SSO) into a single web service.  Such boundaries (the ways those systems and resources are integrated) expose a huge attack surface and are increasingly hard to secure.  Our prior research shows that service compositions involving leading web services (Google, Amazon, PayPal, Facebook) and mobile systems (phone-controlled IoT, etc.) are all vulnerable, often due to misunderstandings about what different components can protect and what they cannot.  Even more challenging is how to integrate these systems in the most user-friendly way without undermining the security protection in place. Ideally, one should be able to connect her phone to her watch, her smart fridge and her medical devices without any configuration, and to quickly integrate PayPal and Google SSO into her code with a single line of instruction.  In practice, as found in our work, this cannot be done using today’s techniques without security consequences.  Since future computing will enable the user to freely move her computing across devices and use services and resources from different sources, fully understanding the security risks of piecing those individual puzzle pieces (systems, resources) together, and developing automated techniques to support this process, will be at the center of security innovation.



Also important here is the availability of a huge amount of data and the progress of big-data analytics. The privacy challenge here is significant, since the amount of private information that can be derived from such data is unprecedented, as pointed out by the 2014 report to the President.  Today’s cryptographic techniques are not designed for protecting data at such a scale.  Our research shows that techniques tailored to the unique features of the data (such as human genome data) have great potential to move security techniques towards the practical end.  In the meantime, the availability of big data also presents security researchers with a great opportunity to better understand the adversaries: what they have done, what they are about to do, what their strategies and infrastructures are, etc. Leveraging such information, we can seize the opportunity to revolutionize security technologies, making them smarter and more proactive.  As an example, our Android malware detection system demonstrates that we can achieve a very high detection rate and capture unknown malware without resorting to the malware signatures and known malicious behaviors used by commercial AV systems today.