ECE Seminar Lecture Series

Towards High Performance Deep Learning Models Inference and Training: From Algorithm to Hardware

Shaoyi Huang, Assistant Professor, Computer Science, Stevens Institute of Technology

Wednesday, November 19, 2025
Noon–1 p.m.

601 Computer Studies Building

In recent years, significant advances in artificial intelligence have been driven by the development of Deep Neural Networks (DNNs) and Transformer-based models, including BERT, GPT-3, and more recent Large Language Models (LLMs). These technologies have catalyzed innovation in fields such as autonomous driving, recommendation systems, chatbot applications, and electronic design automation (EDA). DNNs are increasingly designed with deeper, more complex structures and demand ever-larger computational resources. As these demands escalate, model sparsification has emerged as a promising way to reduce model size and computational load during execution. Even with the evolution of high-performance computing platforms, particularly advanced GPUs, achieving end-to-end DNN runtime speedup from model sparsification remains an appealing but difficult goal because of the intricacies of sparsity, which may require changes to matrix and kernel configurations.
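For attendees unfamiliar with the idea, the sketch below is a minimal, hypothetical illustration of one common form of model sparsification, magnitude-based weight pruning. It is not taken from the work presented in this talk, and the matrix size and sparsity level are arbitrary assumptions chosen only for illustration.

```python
# Illustrative sketch only: magnitude-based weight pruning, one common form
# of model sparsification. Matrix size and sparsity level are hypothetical.
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so roughly `sparsity` of them are zero."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

# Example: prune a random 1024x1024 weight matrix to ~90% zeros.
w = torch.randn(1024, 1024)
w_sparse = magnitude_prune(w, sparsity=0.9)
print(f"Nonzero fraction: {w_sparse.count_nonzero().item() / w.numel():.3f}")
```

The sparse matrix alone does not guarantee faster execution; as noted above, realizing end-to-end speedup typically depends on how the sparsity pattern maps onto matrix storage formats and GPU kernels.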

In this talk, I will present our recent efforts to accelerate model inference and training at both the algorithm and hardware levels. The talk focuses on three innovations: (1) a novel self-attention architecture with attention-specific primitives and an attention-aware pruning design for inference acceleration of Transformer-based models; (2) our recent work on exploiting sparsity and parallelism in sparse training, which unlocks the performance potential of sparse graph neural network (GNN) models; and (3) our recent study on AI-driven circuit GNNs for chip design.

Bio: Shaoyi is a tenure-track assistant professor in the Computer Science department at Stevens Institute of Technology. She received her PhD in Computer Engineering from the School of Computing at the University of Connecticut. She was selected as a 2024 Machine Learning and Systems Rising Star. She was awarded the Marion and Frederick Buckman Engineering Fellowship for outstanding academic achievement in 2024, as well as the Predoctoral Prize for Research Excellence and the GE Fellowship for Excellence from UConn in 2023 and 2022. She was the recipient of the Eversource Energy Graduate Fellowship in 2022, the Synchrony Fellowship in 2022, and the Cigna Graduate Fellowship in 2021. Her work on FPGA-based acceleration of language models through sparse attention and dynamic pipelining won the DAC 2022 Publicity Paper Award.