I am a research scientist/senior director in the Data Analytics and Intelligence Lab (DAIL) at Alibaba Group. Prior to joining Alibaba, I was a researcher at Microsoft Research. I completed my Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign under the supervision of Prof. Jiawei Han, my M.Phil. at The Chinese University of Hong Kong, advised by Jeffery Xu Yu, and my B.S. at Renmin University of China, advised by Shan Wang and Qing Zhu.

We are hiring research scientists, engineers, and research interns for our lab! Please drop me a line if you are interested.

Research Interests and Projects

My recent research focuses on three areas: data privacy (including definitions, algorithms, and systems); federated data management, analytics, and learning (e.g., algorithms such as federated matrix factorization, and systems such as FederatedScope); and making systems intelligent and efficient with machine learning and optimization techniques. These systems include physical computing systems, such as databases and machine learning systems, as well as social systems.

I am actively working on the following areas and projects:

DPaaS (Data Privacy as a Service): Developing a series of privacy-preserving techniques for data collection, sharing, analysis, and learning. Examples include multi-dimensional and multi-source data sharing and OLAP under local differential privacy and/or secure multi-party computation (MPC), and federated learning with formal privacy guarantees for vertically collaborative learning and device-server collaborative learning. Building a system with which these techniques can be deployed on various data platforms.

FederatedScope (a federated learning platform): We have open-sourced an easy-to-use federated learning package, FederatedScope (media coverage 1, media coverage 2, ...). It provides comprehensive functionality, including privacy protection, personalization, and auto-tuning of federated machine learning models. It is built on a programming framework with which one can conveniently develop and deploy one's own federated models under various distributions of data and models, e.g., vertically/horizontally collaborative learning and cross-silo/cross-device federated learning services. Our team is actively contributing to the package and enriching its functionality for different types of algorithms and scenarios, e.g., FederatedScope-GNN for federated learning of graph neural networks and FederatedScope-Real for large-scale cross-device federated learning. You are welcome to join us in this exciting project.

[Open-source Project] FederatedScope: an easy-to-use federated learning platform providing comprehensive functionalities.

System4AI: Optimizing the infrastructure and pipeline for training machine learning models, including large language models, from data pre-processors to model evaluators. Earlier, we developed a series of automated machine learning (AutoML) techniques to enable developers and data scientists with limited machine learning expertise and resources to train high-quality models. These techniques include automatic hyperparameter tuning, automatic feature selection, and neural architecture search. Some of them have been deployed in Alibaba’s cloud AutoML products.

[Open-source Project] Data-Juicer: a one-stop data processing system to provide higher-quality, juicier, and more digestible data and data recipes for training and tuning large language models.

AI4System: Is it possible for models to learn to be a statistician, a database administrator, an index, a query processor, or a query optimizer? We develop a series of "learning-to-be" techniques for different database components, as well as a system framework to deploy these learned components into real databases.

[Open-source Project] PilotScope: a middleware and a programming framework to bridge the gaps between AI4DB (Artificial Intelligence for Databases) algorithms and actual database systems.

AI4SocialScience: We conduct interdisciplinary research at the intersection of social science (starting with, e.g., economics) and machine learning. In addition to traditional theoretical studies, we are building a multi-agent reinforcement learning platform to run simulation studies and to investigate scenarios that are too complicated to be formulated as clean, solvable theoretical or mathematical problems. For example, in recommendation systems, we study how users would react to different data privacy policies, and how the utility of e-commerce platforms and the utility of individual users are affected accordingly.