Time Zone Notice. All times are listed in US Central Time (America/Chicago).
We introduce a family of tensor train truncated Schatten-$p$ norms for tensors, defined by applying the matrix Schatten-$p$ norm to a specific matricization of the tensor train decomposition. We establish the foundational theory for these norms, proving that they are well-defined, convex, and gauge-invariant, and we establish the duality relationship between the tensor train truncated nuclear norm $\|\cdot\|_{t,*}^{(p)}$ and the tensor train truncated spectral norm $\|\cdot\|_{t,\sigma}^{(p)}$. Theoretically, we demonstrate that these norms provide a more focused regularization than the overlapped nuclear norm. Computationally, they bridge the gap between the tractability of convex optimization and the representational efficiency of the tensor train format, offering a principled approach for high-dimensional tensor completion and denoising.
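As a rough illustration of the building block mentioned in this abstract, the sketch below computes an ordinary matrix Schatten-$p$ norm on one unfolding of a small tensor. It is not the truncated norm defined in the talk: the choice of unfolding (first `k` modes as rows), the function names, and the rank-one test tensor are all assumptions made for the example.

```python
import numpy as np

def schatten_p_norm(matrix, p):
    """Schatten-p norm: the l_p norm of the singular values of `matrix`."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if np.isinf(p):
        return float(singular_values.max())  # p = inf gives the spectral norm
    return float((singular_values ** p).sum() ** (1.0 / p))

def unfold_first_k_modes(tensor, k):
    """Reshape so the first k modes index rows and the remaining modes index columns.

    This is one common matricization used with tensor trains; it is an
    assumption here, not necessarily the matricization used in the talk.
    """
    rows = int(np.prod(tensor.shape[:k]))
    return tensor.reshape(rows, -1)

rng = np.random.default_rng(0)
# Hypothetical test tensor: an outer product of three vectors (rank one).
a = rng.standard_normal(4)
b = rng.standard_normal(5)
c = rng.standard_normal(6)
tensor = np.einsum("i,j,k->ijk", a, b, c)

unfolding = unfold_first_k_modes(tensor, 1)       # shape (4, 30)
nuclear = schatten_p_norm(unfolding, 1)           # sum of singular values
frobenius = schatten_p_norm(unfolding, 2)         # Frobenius norm
spectral = schatten_p_norm(unfolding, np.inf)     # largest singular value
```

Because the test tensor is rank one, its unfolding has a single nonzero singular value, so the three norms coincide up to rounding. For higher-rank tensors they differ, with the nuclear norm the largest and the spectral norm the smallest.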
Tensor decompositions have become an important tool for representing and computing with high-dimensional data, yet many existing methods face challenges in scalability, robustness, and interpretability. In this talk, I will discuss recent developments in tensor CUR for low-rank approximation and recovery. These approaches extend the classical CUR idea from matrices to tensors by constructing structured approximations from carefully selected tensor fibers, slices, or subtensors, leading to computationally efficient and data-aware representations. I will present algorithmic ideas and theoretical results for tensor CUR-type decompositions in several settings, including low-rank approximation, recovery under sparse corruptions, and structured factorization. I will also highlight connections between these methods and broader themes in tensor-based high-dimensional approximation. The talk will include representative numerical results illustrating the effectiveness of tensor CUR for scalable recovery and approximation tasks.
Entropic optimal transport (EOT) has emerged as one of the most important problems in the mathematical foundations of machine learning, allowing us to quickly compute approximate Wasserstein distances between probability measures via the celebrated Sinkhorn algorithm. At its core, this is the classical matrix scaling algorithm, which iteratively fits the row and column sums of a suitable input matrix to the target margins. In the first half of this talk, I will give a friendly overview of these topics, including Kantorovich duality and a short convergence analysis of the Sinkhorn algorithm. I will also present some cool applications in machine learning, such as image translation and knowledge distillation. In the second half of the talk, I will discuss some new perspectives on related topics, including EOT on large graphs, a random coordinate variant of the Sinkhorn algorithm, and the scaling limit of random matrices rescaled (or conditioned) to a given margin. This talk is based on joint work with my students (William Powell, Danny Duan, Rahul Choudhari, and Shuqi Bi) and collaborators (Sumit Mukherjee and Jakwang Kim).
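The matrix scaling iteration described in this abstract can be sketched in a few lines. This is the standard textbook form of Sinkhorn, not the random coordinate variant or any other method from the talk; the cost matrix, the marginals, the regularization strength `epsilon`, and the iteration count are assumptions chosen for the example.

```python
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, epsilon=0.5, num_iters=500):
    """Alternately rescale rows and columns of exp(-cost / epsilon).

    Returns a nonnegative matrix whose row sums approximate `row_marginal`
    and whose column sums approximate `col_marginal`.
    """
    kernel = np.exp(-cost / epsilon)          # the "suitable input matrix"
    u = np.ones_like(row_marginal)
    v = np.ones_like(col_marginal)
    for _ in range(num_iters):
        u = row_marginal / (kernel @ v)       # fit the row sums
        v = col_marginal / (kernel.T @ u)     # fit the column sums
    return u[:, None] * kernel * v[None, :]

# Hypothetical example: two uniform measures on grids in [0, 1],
# with squared distance as the transport cost.
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
cost = (x[:, None] - y[None, :]) ** 2
row_marginal = np.full(5, 1.0 / 5)
col_marginal = np.full(4, 1.0 / 4)

plan = sinkhorn(cost, row_marginal, col_marginal)
transport_cost = float((plan * cost).sum())   # approximate transport cost under the plan
```

Smaller values of `epsilon` bring the plan closer to an unregularized optimal transport plan but slow convergence and can underflow in `np.exp`; log-domain implementations are the usual remedy in that regime.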
This talk presents some recent results aimed at the rigorous mathematical understanding of how and why supervised learning works. We construct explicit global minimizers for both underparametrized and strongly overparametrized deep networks. For the latter, we derive deterministic generalization bounds that depend on the geometry of the training and test data, but not on the network architecture. The work presented includes collaborations with Patricia Munoz Ewald, Andrew G. Moore, and C.-K. Kevin Chien.
We propose a wavelet-based numerical framework for solving variable-order time-delay optimal control problems. This type of problem is prevalent in many real-world applications where memory effects, hereditary properties, and delayed system responses are present. We consider a model with fractional operators of variable order and time-delay terms in the state dynamics, which makes the problem highly nonlinear and difficult to analyze. To overcome these difficulties, we develop an efficient wavelet approximation method for the numerical treatment of the governing optimality system. The operational matrices of the wavelet basis functions, which are suitable for approximating the state and control variables, are used to transform the optimal control problem into a system of algebraic equations. This transformation significantly reduces the computational cost without sacrificing accuracy. The existence and uniqueness of solutions of the optimal control model are proved, and necessary and sufficient conditions for optimality are derived, which provide a theoretical basis for the proposed method. Several illustrative examples are discussed to demonstrate the effectiveness, accuracy, and computational efficiency of the proposed wavelet method. The numerical results show excellent agreement with exact or benchmark solutions and confirm that the method can treat variable-order dynamics and delay effects simultaneously.
While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets.
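For readers unfamiliar with the CUR side of this abstract, the sketch below shows the classical CUR approximation of a low-rank matrix from sampled rows and columns. It illustrates plain CUR sampling only; it is neither the CCS sampling model nor the ICURC algorithm. The matrix sizes, the rank, the number of sampled rows and columns, and the pseudo-inverse cutoff are assumptions for the example.

```python
import numpy as np

def cur_approximation(matrix, row_indices, col_indices):
    """Classical CUR: A is approximated by C @ pinv(W) @ R.

    C holds the sampled columns, R the sampled rows, and W their intersection.
    The reconstruction is exact when rank(W) equals rank(A).
    """
    C = matrix[:, col_indices]
    R = matrix[row_indices, :]
    W = matrix[np.ix_(row_indices, col_indices)]
    # rcond drops singular values of W that are numerically zero.
    return C @ np.linalg.pinv(W, rcond=1e-10) @ R

rng = np.random.default_rng(1)
# Hypothetical rank-2 matrix of size 8 x 7.
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 7))

# Sample 4 rows and 4 columns uniformly without replacement.
row_indices = rng.choice(8, size=4, replace=False)
col_indices = rng.choice(7, size=4, replace=False)

A_cur = cur_approximation(A, row_indices, col_indices)
relative_error = np.linalg.norm(A - A_cur) / np.linalg.norm(A)
```

Here every entry of the sampled rows and columns is observed, which is what distinguishes CUR sampling from uniform entrywise sampling in the matrix completion setting.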
In supervised learning, the primary objective is to transport initial data toward target data through the layers of a neural network. We propose a novel perspective: the supervised learning task for convolutional neural networks (CNNs) can be interpreted as a control problem for partial differential equations (PDEs). Although there is a substantial body of literature on control and optimization problems for parabolic and hyperbolic systems, the specific problem of controlling and optimizing the \emph{coefficients} of the associated operators within such systems has not yet been systematically addressed in the control theory of PDEs. We propose a dual system formulation for the control and optimization problem associated with parabolic PDEs, laying the groundwork for the development of efficient numerical schemes in future research. We also provide a theoretical proof showing that the control and optimization problem for parabolic PDEs admits minimizers.
This talk presents a mathematical overview of optimization theory and its central role in data analytics and machine learning. It introduces the fundamental components of optimization problems, including objective functions, constraints, and decision variables, along with key concepts such as stationary points, gradients, Hessians, and conditions for optimality, with special emphasis on convexity and its importance in ensuring globally optimal solutions. Building on these mathematical foundations, the talk connects them to data analytics by covering its major types (descriptive, diagnostic, predictive, and prescriptive), while also reviewing core statistical concepts such as variance, covariance, and correlation, and briefly outlining the architecture of modern data analytics pipelines. Finally, it highlights real-world applications of optimization in domains such as recommendation systems, fraud detection, demand forecasting, advertising, and healthcare analytics. The presentation emphasizes that optimization forms the mathematical foundation of modern data-driven intelligence.
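A small worked example may help fix the vocabulary of stationary points, Hessians, and convexity used in this abstract. The quadratic objective below, including the matrix `Q` and vector `b`, is a hypothetical choice made for illustration and does not come from the talk.

```python
import numpy as np

# Hypothetical objective: f(x) = 0.5 * x^T Q x - b^T x
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def objective(x):
    return 0.5 * x @ Q @ x - b @ x

def gradient(x):
    return Q @ x - b          # the Hessian of f is the constant matrix Q

# Convexity check: all Hessian eigenvalues positive means f is strictly convex.
hessian_eigenvalues = np.linalg.eigvalsh(Q)
is_strictly_convex = bool(np.all(hessian_eigenvalues > 0))

# Stationary point: solve gradient(x) = 0, that is, Q x = b.
x_star = np.linalg.solve(Q, b)
```

Since the Hessian is positive definite, the stationary point `x_star` is the unique global minimizer; without convexity, a zero gradient would only identify a candidate that could also be a saddle point or a local maximum.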