2025: Xinhao Lin: Comparison of Efficiency of Several Stochastic Approximation Algorithms. Advisor: Jim Spall
This work compares the efficiency of three stochastic approximation (SA) methods: simultaneous perturbation SA (SPSA), random direction SA (RDSA), and the truncated Cauchy smoothed functional algorithm (TCSF). The derivation of asymptotic normality and mean square error (MSE) for SPSA and RDSA is reviewed, and the comparison shows that TCSF has an asymptotically biased estimator, and hence the claim that it outperforms SPSA is invalid. Modified ways of comparing the MSEs of SPSA and RDSA are studied, and the results show that for general loss functions SPSA tends to outperform RDSA, not necessarily deterministically but with a probability exceeding 1/2.

2025: Samba Njie: Probabilistic and Neural Models for Estimating Coupling Dynamics of Circadian Gene Regulatory Networks. Advisor: Tom Woolf
Circadian oscillators form a crucial component of biological systems. In particular, a circadian molecular clock is a set of genes that help regulate the behavior of an organism in a 24-hour cycle (hence the name circadian). These genes, which we refer to as clock core genes, are transcribed (DNA converted to mRNA) and translated (mRNA to protein) in a feedback loop that is regulated by these constituent genes and by external factors (called zeitgebers). Transcription and translation cause the gene expression levels to oscillate cyclically, producing rhythmic, periodic patterns. We formulate this as a dynamical system and explore methods to reproduce the dynamics using actual gene regulatory network data and to infer the coupling. Using probabilistic machine learning, physics-informed deep learning, signature methods, and data-driven dynamic mode decomposition as novel applications to gene regulatory network data, we estimate the coupling dynamics without the need for a specified governing differential equation. We compare the efficacy of each method and discuss our computational simulations and results as a basis for better understanding how genes interact.

2025: Savaphol Hiruntiaranakul: SPSA-augmented Hamiltonian Monte Carlo. Advisor: Stacy Hill
Hamiltonian Monte Carlo (HMC) is an efficient Markov Chain Monte Carlo (MCMC) algorithm that converges quickly and scales well in the dimension of the target density. Nevertheless, its gradient requirement makes implementation challenging in large-scale problems. To compute a gradient for one leapfrog integration update, 2d measurements of the density are required, where d is the dimension of the target. This study introduces HMC where gradients are obtained by means of the Simultaneous Perturbation Stochastic Approximation algorithm (SPSA). This approximation scheme requires only 2 density measurements per gradient evaluation and thus can facilitate simulations in high-dimensional settings. We prove convergence of the SPSA-HMC algorithm by extending the general framework in Zou and Gu [1] for unbiased gradient estimates. Furthermore, we analyze how two variance reduction methods further improve the computational efficiency of the SPSA-HMC algorithm.
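The measurement counts quoted in this abstract (2d versus 2 per gradient) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the 2d figure refers to two-sided finite differences and that SPSA uses symmetric Bernoulli (±1) perturbations, and the names `fd_gradient`, `spsa_gradient`, and the step size `c` are hypothetical choices made for illustration.

```python
import numpy as np


def fd_gradient(f, x, c=1e-4):
    """Two-sided finite-difference gradient: 2*d evaluations of f."""
    d = x.size
    grad = np.empty(d)
    for i in range(d):
        step = np.zeros(d)
        step[i] = c  # perturb one coordinate at a time
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * c)
    return grad


def spsa_gradient(f, x, c=1e-4, rng=None):
    """SPSA gradient estimate: 2 evaluations of f, whatever the dimension.

    Assumption: perturbations are i.i.d. symmetric Bernoulli (+1/-1),
    a common choice for SPSA.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.choice([-1.0, 1.0], size=x.size)
    # All coordinates are perturbed simultaneously along the random vector delta.
    diff = f(x + c * delta) - f(x - c * delta)
    return diff / (2.0 * c * delta)
```

A single SPSA estimate is noisy, since every component shares the same two measurements; it is only correct on average over the random perturbation, which is consistent with the abstract's interest in variance reduction.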

2024: Scott Einsidler: Approximating and Showing the Existence of Closed Orbits in Planar Differential Systems. Advisor: Kurt Stein
This thesis investigates how to find closed orbits and limit cycles in continuous-time dynamical systems in the plane. The work uses a geometric approach to identify regions of the plane that may contain a closed-orbit trajectory. It is applied to quadratic systems that have been proven to have three or four limit cycles. Aspects of this geometric approach yield an algorithm that applies Poincaré maps sequentially to find closed orbits of several known systems. The results give tools to find closed orbits within a certain level of tolerance and to find small regions of the plane where closed orbits and limit cycles could be found.

2024: Geng Zhang: Relative Performance of Two Forms of Fisher Information in Statistical Inference. Advisor: Jim Spall
Maximum likelihood estimation (MLE) is a well-known technique used to make statistical predictions. In practical terms, figuring out how accurate these predictions are is crucial. This involves constructing the confidence region for the MLE, which is a way to measure how much we can trust the estimate. Standard statistical theory shows that the normalized MLE is asymptotically normally distributed with the covariance matrix being the inverse of the Fisher information matrix (FIM) at the unknown parameter. There are two main approximations to this covariance matrix: the inverse of the observed FIM (which is the same as the inverse Hessian of the negative log-likelihood) or the inverse of the expected FIM (the same as the inverse FIM). Both approximations rely on evaluations made at the MLE based on the sample data. In this thesis, we show that under reasonable conditions, similar to those typically applied to MLE, the expected FIM provides a better approximation for the MLE's confidence region than the observed FIM. Specifically, in an asymptotic context, the eigenvalues and eigenvectors of the expected FIM, when evaluated at the MLE, have a lower mean squared error relative to the true covariance matrix than those derived from the observed FIM. This conclusion is supported by theoretical explanations and numerical experiments across two distinct problems.
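The two approximations compared in this abstract can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the thesis: $\ell_n(\theta)$ is the log-likelihood of a sample of size $n$, $\hat{\theta}_n$ is the MLE, $H_n$ is the observed FIM, and $F_n$ is the expected FIM.

```latex
% Observed FIM: Hessian of the negative log-likelihood for the data at hand.
H_n(\theta) = -\frac{\partial^2 \ell_n(\theta)}{\partial\theta\,\partial\theta^{\mathsf T}}

% Expected FIM: expectation of the same quantity over the data distribution.
F_n(\theta) = \mathbb{E}\!\left[-\frac{\partial^2 \ell_n(\theta)}{\partial\theta\,\partial\theta^{\mathsf T}}\right]

% Two plug-in approximations to the covariance of the MLE,
% both evaluated at the estimate itself:
\operatorname{cov}(\hat{\theta}_n) \approx H_n(\hat{\theta}_n)^{-1}
\qquad\text{or}\qquad
\operatorname{cov}(\hat{\theta}_n) \approx F_n(\hat{\theta}_n)^{-1}
```

Both matrices are evaluated at $\hat{\theta}_n$ because the true parameter is unknown; the abstract's claim concerns which plug-in choice is closer, in mean squared error, to the true covariance.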

2024. Cristofer Caballeros: Advisor: Anthony Johnson

2024. Lauren Kimpel: Some Fellow-Traveler Properties on Finite Graphs. Advisor: Nandi Leslie
This thesis investigates an application of the k-fellow-traveler property for groups to finite graphs (which may or may not be Cayley), and what analyzing a graph's k-fellow-traveler constant may reveal about its structure. We prove that, if G is a finite graph with κ(G) ≥ 2, then diam(G) − 1 ≤ k_G ≤ diam(G).

2024. Donald Grage: Optimization of Rocket and Signature Techniques Using Synthetic Patient Data to Predict Type 2 Diabetes. Advisor: Thomas Woolf
This thesis explores the application of Time Series Classification (TSC) methods, specifically ROCKET and the signature method, for predicting Type 2 Diabetes from synthetic patient data. The project attempts to improve the prediction of diabetes diagnoses by analyzing medical observation data through advanced mathematical and programming techniques.

2024. Victoria Rose: Graphical Analysis of Recurrent Surface Temperature Trends Using the Gromov-Wasserstein Distance Metric. Advisor: Thomas Woolf
Graphical networks serve as powerful models for interpreting complex systems by abstracting complicated scenarios into a simple network. This work leverages such networks to model the patterns in surface temperature observed within the Gulf of Mexico. To quantify seasonal dynamics, Voronoi diagrams are used to capture sub-regions that share common characteristics. These diagrams can then be analyzed as graphs, and the Gromov-Wasserstein distance captures in a single number how different a daily graph is from a standard reference graph.

2023. Jonah Bregstone: Modeling Pollinator Behavior Through Survival Analysis. Advisor: Thomas Woolf
This study applies survival analysis to pollinator-plant relationships, modeling floral-resource utilization by pollinators. Three types of survival analysis models are compared: the Cox proportional hazards model, a binary classification model using stacking, and a Logistic Hazards neural network model.

2023. Frederick Day-Lewis: Advisor: David Schug

2023. Sarah Miller: Advisor: Cetin Savkli

2023. Ernest Friedel: Advisor: Kurt Stein

2023. Tulio Tablada: Advisor: Cleon Davis

2023. Jack Moody: Advisor: Thomas Woolf

2023. Lanfranco Bonghi: Advisor: Christine Nickel

2022. Michael Baeder: Manifold Learning for Empirical Asset Pricing. Advisor: Burhan Sadiq
This thesis develops a methodology for applying modern manifold embedding algorithms to the problem of empirical asset pricing. Our technique combines traditional linear compression with geometric dimensionality reduction in order to characterize the time-evolving distribution of a time series of nonconstant dimension using a small number of latent factors.

2022. Alyssa Columbus: Sleep Duration as a Neural Survival Model. Advisor: Thomas Woolf
This thesis focuses on modeling sleep duration with the most prominent medical model for sleep, the two-process model, in conjunction with a novel mathematical architecture that aspires to capture both the circular nature and the complexity of daily schedules and habits. Specifically, the two aims of this thesis are (1) to develop a theoretical mathematical framework to describe cyclical sleep and wake patterns and (2) to test this framework computationally with empirical patient data.

2022. Joseph Avila: Advisor: Nandi Leslie

2022. Anjelika Klamp: Advisor: Thomas Woolf

2022. Timothy Davison: Advisor: David Schug

2021. Richard Shea. Building a Dynamic Hawkes Graph. Advisor: Thomas Woolf.
We couple a multivariate description with a time-dependent Hawkes/INAR(p) process. This model can be updated by sensors and is essentially a Kalman filter for INAR-coupled data streams. This is a way to automatically interrogate an incoming stream of data for change-points and to move from one stationary distribution to a new one if and when the underlying stream of data is seen to have changed.

2021. William Glad: Path Signature Area-Based Causal Discovery in Coupled Time Series. Advisor: Thomas Woolf.
There are many techniques available to recover causal relationships from data, such as Granger causality, convergent cross mapping, and causal graph structure learning approaches such as PCMCI. Path signatures and their associated signed areas provide a new way to approach the analysis of causally linked dynamical systems, particularly in informing a model-free, data-driven approach to algorithmic causal discovery. With this paper, we explore the use of path signatures in causal discovery and propose the application of confidence sequences to analyze the significance of the magnitude of the signed area between two variables.

2021. Lucas McCabe. Markov Decision Processes for Node Immunization. Advisor: Thomas Woolf.
In this work, we consider the challenge of node immunization where information regarding network topology is inferred only through agent exploration along an unbiased random walk. In the first part, we formulate this as a Markov decision process problem and derive heuristic-based policies for scale-free and uncorrelated networks. We demonstrate empirical evidence that these policies achieve their objectives near-optimally and provide a policy-of-policies for situations where information about network family does not exist. In the second part, we introduce our open-source contagion package and use it to illustrate immunization policy performance with contagion simulations.

2021. Adam Byerly. Enhanced Uniform Manifold Approximation and Projection via Simultaneous Perturbation Stochastic Approximation. Advisor: Stacy Hill.
This thesis introduces the UMAP-SPSA algorithm to perform the UMAP dimension reduction without the need for the smooth approximator. Further, we analyze the algorithm’s computational performance and embedding accuracy.

2021. James Howard: Predicting Sepsis Onset with Survival Analysis Over Signature Transformations. Advisor: Thomas Woolf
Predicting clinical outcomes from time-series medical data is a complex but essential endeavor. In this study, we propose a novel approach that combines traditional survival models like Cox proportional hazards, logistic, and multi-task logistic regression (MTLR) with the robust mathematical framework of signature methods. These methods are particularly effective in capturing the underlying dynamics of time-series data with stochastic error. We introduce the concept of rough paths to provide a foundational understanding of how these techniques can capture not only the data’s deterministic aspects but also its stochastic nature, thereby enriching the feature set used for making more accurate predictions.

2020. Erick Galinkin. Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View. Advisor: Cleon Davis.
Our results show that since mutual information remains invariant under homeomorphism, only feature engineering methods that alter the entropy of the dataset will change the outcome of the neural network. This means that for some datasets and tasks, neural networks require meaningful, human-driven feature engineering or changes in architecture to provide enough information for the neural network to generate a sufficient statistic.

2019. Dominic Michael Padova. Batlas: variational reconstruction of a digital, three-dimensional atlas of the big brown bat (Eptesicus fuscus). Advisors: Cleon Davis and J. Tilak Ratnanather.
We define Sobolev and total variation priors on image smoothness, which control the derivatives of the images, to regularize (i.e. reduce complexity by removing unreasonable parameter choices from) the high-dimensional parameter space prescribed by the rigid motion dimensions and the diffeomorphism dimensions. We show that the quality of rigid slice alignment brought by introducing a Sobolev prior on the image intensity of a phantom and the bat brain data is superior to that of the total variation priors.