Publication: Fidelity, Fairness and Responsibility through the Lens of Sequential Decision Making
Abstract
Methods of artificial intelligence are becoming increasingly important for supporting robust decision making: deciding how to act on the basis of the right data, learning to act over time while supporting fairness to participants, and helping individuals make better sequential decisions.
This thesis expands in these directions, developing algorithms for enhancing decision-making processes, ensuring fairness in automated decisions, and optimizing user engagement. Motivating settings come from financial time series generation and portfolio optimization, the study of reinforcement learning with fairness constraints in the context of making loans, and the formulation of user engagement optimization in online platforms.
First, I introduce the decision-aware time-series conditional generative adversarial network (DAT-CGAN), a new method for time-series generation that is aware of the way in which the generated data will be used. In particular, the framework adopts a multi-Wasserstein loss on decision-related quantities and is designed to support decision making. DAT-CGAN uses an overlapped block-sampling approach for sample efficiency. The main results characterize the generalization properties of DAT-CGAN, and the method is applied to financial time series and a multi-period portfolio choice problem. Compared with GAN-based baselines, the proposed method demonstrates better training stability and better generative quality for both raw data and decision-related quantities.
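As a rough illustration of what overlapped block sampling can look like, the sketch below cuts a series into fixed-length windows whose start points are closer together than the window length, so that one series yields many training blocks. The function name `overlapped_blocks`, the `stride` parameter, and the toy `prices` series are assumptions made for this example; the exact sampling scheme used by DAT-CGAN is not specified in this abstract.

```python
from typing import List, Sequence


def overlapped_blocks(series: Sequence[float], block_len: int, stride: int = 1) -> List[List[float]]:
    """Return every length-`block_len` window whose start index is a multiple of `stride`.

    With stride < block_len, consecutive windows share observations (they overlap).
    Hypothetical helper for illustration only.
    """
    if block_len <= 0 or stride <= 0:
        raise ValueError("block_len and stride must be positive")
    last_start = len(series) - block_len  # negative when the series is too short
    return [list(series[i:i + block_len]) for i in range(0, last_start + 1, stride)]


# Toy series (illustrative values, not data from the thesis).
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
blocks = overlapped_blocks(prices, block_len=3, stride=1)
disjoint = overlapped_blocks(prices, block_len=3, stride=3)
```

In this toy case, overlapping windows give three blocks from five observations, whereas disjoint windows of the same length give only one, which is the sense in which overlap can help sample efficiency.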
Second, I introduce the study of reinforcement learning (RL) with stepwise fairness constraints, which requires group fairness at each time step. This problem is motivated by the increasing use of AI methods in societally important settings, ranging from credit to employment to housing, where it is crucial that automated decisions be fair. Moreover, many such settings are dynamic, with populations responding to sequential decision policies. In the case of tabular episodic RL, I provide a learning algorithm with a strong theoretical guarantee on policy optimality and fairness violations. The experimental results also show that the proposed algorithm outperforms strong learning-based baselines.
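To make the idea of a fairness requirement "at each time step" concrete, the following sketch computes a per-step gap in approval rates between two groups and flags the steps where the gap exceeds a tolerance. It assumes demographic parity as the group-fairness notion, which this abstract does not specify; the record layout, the group labels `"A"` and `"B"`, and the tolerance `epsilon` are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Each record: (time_step, group, decision), with decision 1 = approve, 0 = deny.
Record = Tuple[int, str, int]


def stepwise_parity_gaps(records: List[Record], group_a: str, group_b: str) -> Dict[int, float]:
    """Absolute difference in approval rates between two groups, per time step.

    Assumes demographic parity as the fairness notion (an assumption of this sketch).
    Steps where either group has no records are skipped.
    """
    counts: Dict[int, Dict[str, List[int]]] = {}
    for t, group, decision in records:
        if group not in (group_a, group_b):
            continue
        per_group = counts.setdefault(t, {group_a: [0, 0], group_b: [0, 0]})
        per_group[group][0] += decision  # number of approvals
        per_group[group][1] += 1         # number of decisions
    gaps: Dict[int, float] = {}
    for t, per_group in sorted(counts.items()):
        (pos_a, n_a), (pos_b, n_b) = per_group[group_a], per_group[group_b]
        if n_a == 0 or n_b == 0:
            continue
        gaps[t] = abs(pos_a / n_a - pos_b / n_b)
    return gaps


# Toy loan decisions over two time steps (illustrative, not thesis data).
records = [(0, "A", 1), (0, "A", 0), (0, "B", 1), (0, "B", 1), (1, "A", 1), (1, "B", 1)]
gaps = stepwise_parity_gaps(records, "A", "B")
epsilon = 0.1  # hypothetical tolerance
violations = [t for t, gap in gaps.items() if gap > epsilon]
```

A stepwise constraint of this kind must hold at every step individually, so a policy that is fair on average over an episode can still register violations at particular steps.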
Third, I formulate and solve a learning problem that handles content recommendation while also learning when to recommend that a user take a break during a session. User engagement optimization plays a crucial role in online platforms, with platform designers putting great effort into recommending interesting content to attract users. At the same time, blindly pushing users to extend a session can lead to burnout and regret, which is harmful to users' long-term well-being. In response, many platforms now provide a service that reminds users to take a break. However, the timing of this reminder is typically set manually, which motivates an interest in algorithms that decide automatically when to show it. Technically, I formulate the problem as an optimal stopping problem for a Markov decision process, and give an offline Q-learning based algorithm with a rigorous theoretical guarantee. I demonstrate the effectiveness of the algorithm on click-stream data from an online shopping setting.
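The optimal stopping view can be sketched with a small tabular example: from logged "continue" transitions, learn the value of continuing in each state, and stop wherever the value of stopping is at least as large. This is a generic textbook-style sketch, not the algorithm from the thesis; the state encoding, rewards, `stop_value`, and the hyperparameters `gamma`, `alpha`, and `sweeps` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Logged transition for the "continue" action: (state, reward, next_state).
Transition = Tuple[int, float, int]


def offline_stopping_q(
    transitions: List[Transition],
    stop_value: Dict[int, float],
    gamma: float = 0.9,
    alpha: float = 0.5,
    sweeps: int = 200,
) -> Dict[int, float]:
    """Estimate the value of continuing in each state from a fixed log.

    The bootstrap target uses max(stop now, keep going) at the next state,
    which is the Bellman equation for an optimal stopping problem.
    """
    q = {s: 0.0 for s in stop_value}
    for _ in range(sweeps):
        for s, r, s_next in transitions:
            target = r + gamma * max(stop_value[s_next], q[s_next])
            q[s] += alpha * (target - q[s])
    return q


def stop_policy(q: Dict[int, float], stop_value: Dict[int, float]) -> Dict[int, bool]:
    """True means 'suggest a break' in that state."""
    return {s: stop_value[s] >= q[s] for s in q}


# Toy session: states 0, 1, 2 stand for increasing fatigue (hypothetical).
log = [(0, 1.0, 1), (1, 0.2, 2), (2, -1.0, 2)]
stop_value = {0: 0.0, 1: 0.0, 2: 0.0}
q = offline_stopping_q(log, stop_value)
policy = stop_policy(q, stop_value)
```

In this toy log, continuing is still worthwhile in states 0 and 1 but has negative value in state 2, so the learned rule suggests a break only once the user reaches state 2.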