Spring 2025
About
The UBC/SFU Joint Statistics Seminar is jointly hosted by the graduate students of the UBC Department of Statistics and the SFU Department of Statistics and Actuarial Science. The Spring 2025 event is the second of two events taking place in the 2024/2025 academic year. The Fall 2024 event was organized by graduate students from SFU, and the Spring 2025 event is organized by graduate students from UBC. Over its 20-year history, the event has offered Statistics and Actuarial Science graduate and undergraduate students at both schools an opportunity to network with their peers and to attend accessible talks about the research work of their fellow students and faculty.
The Spring 2025 event includes talks given by six students (three from UBC and three from SFU) and one faculty member from UBC.
Check out more events hosted by the UBC Statistics Graduate Student Association.
Registration
This term’s event will be hosted at UBC’s Earth Sciences Building (ESB 5104) on March 8, 2025. The event starts at 10:00 am. Register now through the registration form! If you are interested in presenting, please contact Johnny or Naitong.
Schedule
Breakfast
10:00am - 10:30am
Welcome Message
10:30am - 10:35am
Agam Sanghera (UBC)
10:35am - 11:00am
An Incremental Learning Framework, Based on Smartphone Image and Sensory Data, for Road Surface Quality Assessment
Road defects such as potholes and pits are prevalent and severe in several Asian countries, such as Indonesia and India, where roadways are the most accessible means of transportation. The roadways carry a substantial number of heavy vehicles, which results in accelerated road wear and damage. This problem affects people on a daily basis in some of the most densely populated places on the planet. To tackle this pressing issue, the study uses real-world data obtained from cameras and vibration sensors embedded in smartphones. This research would facilitate the development of an incremental learning system that would recurrently gather data, adapt to changes in road conditions, and optimise itself to assess road defects.
George Thomas (SFU)
11:05am - 11:30am
Uncovering Moderator Variables in a British Columbia Psychological Study with Causal Random Forests
Heterogeneous treatment effect estimation is underrepresented in the context of randomized experiments in psychology. The population-level causal effect of a treatment may not always capture variation across population subgroups, but methods for individual-level effect estimation can determine characteristics of subjects who benefit from the treatment the most. In the modern literature, machine learning methods for heterogeneous effect estimation are gaining traction and are applied in many randomized clinical trials across health sciences but not in the context of psychology. We apply an extension of the original random forest algorithm, developed by Wager and Athey, that allows for estimation and asymptotic inference of heterogeneous treatment effects to a randomized controlled trial with measured psychological outcomes. The trial, known as the British Columbia Healthy Connections Project (BCHCP), measures outcomes at two years postpartum for socioeconomically disadvantaged children and their mothers across British Columbia pertaining to child injury events, cognition, language and maternal subsequent pregnancies. It is based on similar trials run under the Nurse Family Partnership (NFP) banner in the United States. We obtain individual-level treatment effect estimates and perform a regression analysis with generalized additive models to uncover significant maternal baseline moderator variables. Results show multiple significant sources of heterogeneity and may be useful for future targeted implementation of psychological health supports for underprivileged mothers and their children.
Seren Lee (UBC)
11:35am - 12:00pm
Issues of posterior inference in partially identified models and R package with tools for Bayesian inference with partially identified models
With rapid improvements in the computational performance of modern computing machines, the range of Bayesian inference applications has grown considerably, and Bayesian statistical analysis is used more frequently. In Bayesian inference, most computational resources are applied to running Markov chain Monte Carlo (MCMC) algorithms to obtain samples from posterior distributions. The MCMC algorithm is the main route to implement Bayesian inference. It allows for high-dimensional and flexible sampling. However, researchers can experience poorer computational performance when Bayesian statistical inference is performed using some specific families of models, namely partially identified models. This is because the good computational performance of the MCMC algorithm is not guaranteed. The parameters of a partially identified model are not uniquely identified, which makes it hard for an off-the-shelf MCMC algorithm to sample from posterior distributions. Importance sampling with transparent reparameterization (ISTP) is a good computational remedy for posterior inference with partially identified models. With the ISTP algorithm, researchers could obtain better and more stable computational performance while having samples in their original parameterization. We suggest a method to alleviate the computational challenge of Bayesian inference with partially identified models, explore its universal usability, and develop a corresponding package in the statistical programming language R to help users perform Bayesian inference with partially identified models.
Lunch
12:00pm - 1:00pm
Hashan Peiris (SFU)
1:00pm - 1:25pm
Development of Telematics Risk Scores in Accordance with Regulatory Compliance
This paper proposes a ratemaking framework for claim frequency that utilizes informative telematics data and complies with the "discount only" regulatory requirement mandated by New York Bill A7614. The proposed framework uses a feedforward neural network to extract a one-dimensional risk score from multi-dimensional telematics features and integrates it with traditional features in generalized linear models. To meet the "discount only" requirement, we impose constraints on the risk score and its regression parameter. The results show that the proposed models, with a suitable risk score function, can outperform the standard GLMs in both in-sample goodness-of-fit and out-of-sample prediction performance. Furthermore, the analysis reveals that while the "discount only" constraint may drive insurers to raise base premiums to offset revenue losses from the relativity cap, the regulation could achieve its intended goal in scenarios with strong favorable selection.
Rachel Lobay (UBC)
1:30pm - 1:55pm
Refining COVID-19 Infection Estimates Using Finalized Data
Estimates of COVID-19 infections based on finalized data can improve understanding of the pandemic and provide a more meaningful quantification of disease patterns and burden. Therefore, we retrospectively estimated daily incident infections for each U.S. state prior to Omicron. We then used a serology-driven model to scale these deconvolved cases to unreported infections. The resulting infections incorporated variant-specific incubation periods, reinfections, and waning antigenic immunity. In comparison to the gold-standard case data, we found a disease burden that appeared earlier and more extensively than indicated by cases alone. Most notably, in 44 states, fewer than one-third of infections appeared as case reports, and there were sustained periods where surges in infections were virtually undetectable through reported cases. We extended this approach to the Omicron era by incorporating wastewater data and state-level publicly available reinfection records. These sources helped refine estimates during the Delta-Omicron transition. The improved estimates, along with wastewater concentration data adjusted for limited coverage, were used to calculate shedding ratios, which informed daily infection estimates going forward. Finally, we derived key epidemiological quantities such as effective reproduction numbers and growth rates to track transmission dynamics and short-term changes in infection patterns for each state.
Hasitha Jayaneththi (SFU)
2:00pm - 2:25pm
Three Methods of Generating Space-Filling Designs: a comparison
Space-filling designs are suitable for computer experiments. Unlike traditional experimental designs that focus on factor effects and interactions, space-filling designs aim to cover the experimental domain uniformly, ensuring that the entire range of inputs is well represented. We compare three methods of generating space-filling designs. While the first two methods make use of orthogonal arrays and strong orthogonal arrays, the third method is based on random balanced designs. The criteria used for evaluating space-filling designs are orthogonality, distance and discrepancy. We consider six-level designs with 18 runs, which allows us to examine the behaviour of these methods in a moderate-sized experimental setup while maintaining computational feasibility.
Break
2:25pm - 2:35pm
Prof. Geoff Pleiss (UBC)
2:35pm - 3:35pm
Lessons Learned from Developing and Maintaining Open Source Software
Open-source software is an integral part of modern statistical research. The factors contributing to a software project's success extend far beyond writing and releasing code, a lesson that statisticians often learn the hard way. In this talk, I will discuss what I learned from my experiences developing libraries like GPyTorch and LinearOperator, contributing to large-scale projects like PyTorch and BoTorch, and collaborating with other software developers on projects such as CoLA. I will examine key decisions in open-source development, including releasing a standalone library versus a code repository, best practices for maintenance, and strategies for engaging users and contributors. Additionally, I will explore challenges associated with project scoping, long-term maintenance, balancing software development with other academic responsibilities, and gracefully deprecating projects that are no longer sustainable. Finally, I will discuss the importance of (tedious but) good software development habits like version control hygiene, test-driven development, continual deployment, and pair programming.
Networking and Drinks at Browns!
3:40pm
Past Seminars
Fall 2024 | Spring 2024 | Fall 2023 | Spring 2023 | Fall 2022 | Spring 2022 | Fall 2021 | Spring 2021 | Fall 2020 | Spring 2020