Fall Term Schedule
Only courses with a DSC course number are listed on this page. See MS program for all of the required and elective courses for the degree.
Fall 2024
Number | Title | Instructor | Time |
---|---|---|---|
DSCC 401-01
Brendan Mort
MW 9:00AM - 10:15AM
|
This course provides a hands-on introduction to widely-used tools for data science. Topics include Linux; languages and packages for statistical analysis and visualization; cluster and parallel computing including GPUs; Hadoop and Spark; libraries for machine learning; NoSQL databases; and cloud services. PREREQUISITES: CSC 161, CSC 171 or some equivalent programming experience strongly recommended.
|
DSCC 420-01
Gonzalo Mateos Buckstein
MW 4:50PM - 6:05PM
|
The goal of this course is to learn how to model, analyze and simulate stochastic systems, found at the core of a number of disciplines in engineering, for example communication systems, stock options pricing and machine learning. This course is divided into five thematic blocks: Introduction, Probability review, Markov chains, Continuous-time Markov chains, and Gaussian, Markov and stationary random processes. Prerequisites: ECE 242 or equivalent
|
DSCC 435-1
Jiaming Liang
TR 9:40AM - 10:55AM
|
This course primarily focuses on algorithms for large-scale optimization problems arising in machine learning and data science applications. The first part will cover first-order methods including gradient and subgradient methods, mirror descent, proximal gradient method, accelerated gradient method, Frank-Wolfe method, and inexact proximal point methods. The second part will introduce algorithms for nonconvex optimization, stochastic optimization, distributed optimization, manifold optimization, reinforcement learning, and those beyond first-order.
|
DSCC 440-02
Monika Polak
TR 2:00PM - 3:15PM
|
Fundamental concepts and techniques of data mining, including data attributes, data visualization, data pre-processing, mining frequent patterns, association and correlation, classification methods, and cluster analysis. Advanced topics include outlier detection, stream mining, and social media data mining. CSC 440, a graduate-level course, requires additional readings and a course project.
|
DSCC 461-1
Eustrat Zhupa
MW 12:30PM - 1:45PM
|
This course presents the fundamental concepts of database design and use. It provides a study of data models, data description languages, and query facilities including relational algebra and SQL, data normalization, transactions and their properties, physical data organization and indexing, security issues and object databases. It also looks at the new trends in databases. The knowledge of the above topics will be applied in the design and implementation of a database application using a target database management system as part of a semester-long group project.
|
DSCC 462-02
Anson Kahng
TR 4:50PM - 6:05PM
|
This course will cover foundational concepts in descriptive analyses, probability, and statistical inference. Topics to be covered include data exploration through descriptive statistics (with a heavy emphasis on using R for such analyses), elementary probability, diagnostic testing, combinatorics, random variables, elementary distribution theory, statistical inference, and statistical modeling. The inference portion of the course will focus on building and applying hypothesis tests and confidence intervals for population means, proportions, variances, and correlations. Non-parametric alternatives will also be introduced. The modeling portion of the course will include ANOVA, and simple and multiple regression and their respective computational methods. Students will be introduced to the R statistical computing environment. PREREQUISITES: MTH 150 or MTH 150A; AND MTH 142 or MTH 161 or MTH 171 (or equivalent discrete math and calculus coursework)
|
DSCC 465-01
Yukun Ma
TR 2:00PM - 3:15PM
|
The course provides an introduction to modern machine learning concepts, techniques, and algorithms. Topics discussed include regression, clustering and classification, kernels, support vector machines, feature selection, goodness of fit, and neural networks. Programming assignments emphasize putting theory into practice through applications on real-world data sets. Students will be expected to work in the Python programming environment to complete the assignments. PREREQUISITES: DSCC/CSC 462 or equivalent introductory statistics background.
|
DSCC 475-1
Ajay Anand
TR 11:05AM - 12:20PM
|
Time series analysis is a valuable data analysis technique in a variety of industrial (e.g., prognostics and health management), business (e.g., financial data analysis) and healthcare (e.g., disease progression modeling) applications. Moreover, forecasting in time series is an essential component of predictive analytics. The course will begin with an introduction to practical aspects relevant to time series data analysis such as data collection, characterization, and preprocessing. Topics covered will include smoothing methods (moving average, exponential smoothing), trend and seasonality in regression models, autocorrelation, and AR and ARIMA models applied to time series data. Deep learning models including feedforward, recurrent, gated and convolutional architectures will also be studied. Students will work on projects with time-series data sets using modeling tools in Python. PREREQUISITES: Introductory Statistics (DSC 262/STT 212/STT 213 or equivalent), Linear Algebra and Differential Equations (MTH 165 or equivalent), and applied Python programming (CSC 161 or equivalent)
|
DSCC 483-01
Ajay Anand; Cantay Caliskan
MW 10:25AM - 11:40AM
|
The capstone/practicum provides an experience for data science majors/MS candidates to apply the core knowledge and skills attained during their program to a tangible data science focused project. Students will work in small teams on a project that applies data science methods to the analysis of a real-world problem. The instructor will guide each team in developing a topic that makes use of the knowledge the team members gained through their application area courses. The identified projects or problems and data sets will cover a range of application areas and reflect real-world needs from industry, medicine and government. Each student will be required to write a paper about their project, which satisfies one upper-level writing requirement for majors and Plan B for master's. PREREQUISITES: DSC 240/440 (Data Mining) AND an introductory statistics course such as DSCC 262/462, STT212 or STT213 or equivalent. DSC 261/461 (Database Systems) strongly recommended prior but may be taken concurrently. FOR DSC GRADUATING SENIORS and MS CANDIDATES. GRADUATING STUDENTS this semester have priority for eligibility/instructor permission. PERMISSION REQUEST: To seek instructor permission/eligibility, follow directions on UR Student.
|
DSCC 491-02
Heather Reyes
7:00PM - 7:00PM
|
The Digital Health Innovation (DHI) Track is an immersive experience, originally designed for resident trainees, that now includes nurse leaders and data science graduate students. It is supported by the Department of Pediatrics and the UR Health Lab. The DHI Track aims to answer three big questions: 1) What is Digital Health?, 2) How does Digital Health exist in academia?, and 3) How can one become an entrepreneur? These questions are addressed through a variety of learning experiences including lectures, workshops, site visits, and networking. Specific topics include: Telehealth & Digital Patient Monitoring, Clinical Informatics, AI & ML in Medicine, Ethics of AI & Data Use, QI in Digital Health, Engineering Student Collaboration, Path from IP to Commercialization, Regulatory Considerations, Business Cases for Digital Health Solutions, etc. Evaluation: The commitment for the Digital Health Innovation Track is the two weeks of bootcamp (September 16-27), plus a few follow-up sessions afterward to discuss benefits, challenges, and ideas for future cohorts (schedule TBD). Students are expected to come ready to learn and participate in the group lectures/discussions with the other participants. At the end of the program, students must submit a reflection (paper, diagram, list, etc.) outlining their recommendations for future cohorts. Data science students will also be able to contribute to projects led by residents and nurses, but this is not required. Instructor Permission Required.
|
DSCC 491-1
7:00PM - 7:00PM
|
To register for Independent Study, contact program advisor before registering.
|
DSCC 494-04
Ajay Anand
7:00PM - 7:00PM
|
For master’s students. Experience in an applied setting, supervised on site. Internships are approved, sponsored, and graded by a member of the University faculty based on mutually agreed upon requirements.
|
DSCC 495-01
7:00PM - 7:00PM
|
Contact program coordinator and faculty before registering research for credit. Must complete a research contract.
|
DSCC 495-02
Mayk Caldas Ramos
7:00PM - 7:00PM
|
Large language models have demonstrated strengths in diverse chemical tasks. This project aims to develop a language model capable of generating new molecules following a general molecular description. To accomplish this goal, we will extract information and descriptions about molecules from available open-source scientific papers from the Semantic Scholar database. A model will then be trained to map elements of the molecular description to the SMILES representation of the molecule. Evaluation is based on meeting once a week, presenting at two group meetings, and a final report.
|
DSCC 495-03
Nebojsa Duric
7:00PM - 7:00PM
|
Ultrasound tomography uses full-waveform inversion to reconstruct the sound speed profile in tissue by matching signals produced by simulations to measured recordings. However, if the initial guess is far from reality, simulated waveforms may be more than a half-cycle away from the measured waveform, resulting in convergence to false local minima. One way to avoid this cycle skipping is to use low-frequency data that lengthen this half-cycle window. Therefore, we aim to develop a deep learning algorithm that can extrapolate low-frequency data from existing data. Weekly meetings reporting results will be used to gauge progress and determine final grade.
|
DSCC 495-04
Ram Haddas
7:00PM - 7:00PM
|
Human Motion Lab: Recognize common human motion laboratory tools (e.g., human motion capture, force plate, EMG) and the types of data that are output from those devices. The final evaluation for this course will consist of the following: (1) development of a large-scale database to store all of the clinical and research data collected from patients/subjects in the human motion laboratory and clinical standard of care; (2) development of an efficient process to recall a subset of the database to be used for clinical reporting and research purposes; (3) development of an efficient process to store and recall control patient/subject data to be used for clinical reporting and publication purposes.
|
DSCC 495-05
Caitlin Dreisbach
7:00PM - 7:00PM
|
Execute a research study examining the risk profiles of alcohol use among sexual minority groups and how key demographic features impact their risk profiles based upon the Minority Stress Model, using the All of Us Research Program data. All analyses will be conducted in the Research Workbench. Evaluation: Required meetings once per week to detail progress and determine next steps. Result of the semester will be a publishable manuscript.
|
DSCC 495-06
Sukardi Suba
7:00PM - 7:00PM
|
The project focuses on applying natural language processing (NLP) to extract the date and time of symptom onset from clinical notes in the electronic health record data of patients with acute coronary syndrome. The objectives for the student's research are: (1) to develop, test, and compare two NLP models, regular expression and parsedatetime, to extract date/time information from unstructured clinical texts; (2) to determine the best-performing model (highest accuracy) when compared with manually annotated notes; and (3) to apply the best model to ~1,000 patients' notes. Evaluation based on: weekly meetings (45 min); required readings; final report at the end of the semester.
|
DSCC 495-07
Dongmei Li
7:00PM - 7:00PM
|
Recent studies have shown that COVID-19 patients with comorbidities have an increased risk of severe illness from COVID-19. Numerous studies have shown the link between smoking and comorbidities, as smoking is a well-known risk factor for common comorbidities. However, the association between smoking and severity of COVID-19 is still unclear, with inconclusive results from recent studies. Given the association of comorbidities with both smoking and COVID-19, we propose to investigate the moderation effects of smoking in the association of comorbidities and COVID-19 using de-identified data (level 2) from the National COVID Cohort Collaborative (N3C). Our research question is whether smoking has moderation effects in the association of comorbidities with COVID-19 outcomes such as whether a patient is hospitalized, admitted to the ICU, or dies due to COVID-19. The proposed study will contribute to the literature on the direct and indirect association of both smoking/vaping and comorbidities with COVID-19 outcomes. Further, we also plan to determine the health disparity relationship with smoking/vaping in COVID-19. The student will meet with the mentor weekly and generate a final report on the data analysis results. We expect the student to write up the data analysis results as a manuscript for submission to peer-reviewed journals.
|
DSCC 495-08
Zhiyao Duan
7:00PM - 7:00PM
|
Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing: Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is caused by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness lacks consideration of the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without enrollment. Our proposed system outperforms existing state-of-the-art single systems with a relative improvement of 38% on equal error rate (EER) on the ASVspoof2019 LA evaluation set. This project will extend the previous SAMO conference paper (https://ieeexplore.ieee.org/document/10094704) into a journal paper with additional experiments and new settings, including on the most recently collected data. Evaluation Basis: 6+ hours/week of work, attending weekly meetings, completing a final presentation or report.
|
DSCC 495-09
Timothy Dye
7:00PM - 7:00PM
|
Comparative global self-identified race and ethnicity classification project: We captured self-described open-ended race and ethnicity in a large global COVID study from 170 nations in seven languages (English, French, Spanish, Italian, Arabic, Chinese, Hindi), and have such data for about 10,000 people in an unstructured format. We will identify and study the impact of context-specific social and cultural determinants of health issues around the world. One of the challenges in examining inequality, oppression, and discrimination worldwide is that such constructs operate relative to the social, racial, and ethnic identities common in a region. Course Evaluation: Weekly team meetings and progress reports, final progress report and presentation, peer and instructor reviews
|
DSCC 495-16
Sobhit Kumar Singh
7:00PM - 7:00PM
|
Data-Driven Discovery of Quantum Materials: This research project aims to employ advanced data-driven techniques, including deep learning and line graph neural networks, to investigate various physical properties of quantum materials. The focus will be on properties such as superconductivity, magnetism, ferroelectricity, and elasticity, with a particular emphasis on layered van der Waals materials exhibiting unique electronic features. By leveraging large open-source materials databases such as JARVIS and materials-project, we will develop models to predict and characterize novel quantum materials, with the aim of discovering interesting layered 2D materials for practical applications. Course evaluation will be based on the frequency of 1:1 meetings, progress reports, and a final research
|
DSCC 495-17
Marvin Doyley
7:00PM - 7:00PM
|
This project aims to develop a writing tool that mimics an individual's writing style (here, my own) using large language models (LLMs) and Retrieval-Augmented Generation (RAG). The tool will be capable of generating responses, essays, or other forms of written content in a style that closely matches the user's unique writing patterns, tone, and vocabulary. Evaluation is based on a presentation at a group meeting.
|
DSCC 495-18
Mujdat Cetin
7:00PM - 7:00PM
|
This project is about the electroencephalography (EEG) based brain-computer interface (BCI) system at the Signal, Data, and Imaging Sciences (SDIS) Laboratory. The project will involve an assessment of the various software tools used in this system followed by recommendation, implementation, and testing of new tools that improve upon the existing software and deliver a unified, modular solution for real-time BCI experiments.
|
DSCC 495-19
Florian Jaeger
7:00PM - 7:00PM
|
The project compares two competing models of speech perception against human behavior. Models of exemplar theory and auditory normalization will be implemented, trained on a phonetic database, and evaluated against data from a human perception experiment. The goal is to determine whether human perception is best described by exemplar theory, normalization, or a combination of both. Two data sets are available for this comparison: data from the perception of vowels (i, e, a, o, etc.; Persson et al., 2024) or fricatives (f, v, s, z, etc.; McMurray & Jongman, 2011). Meetings: a brief weekly meeting (30 min) to address blockers and a longer biweekly meeting (1 h) to discuss results and readings. Grading: 30% consistency and clarity of weekly progress reports; 30% lab book that documents your work product, e.g., in the form of R Markdown, a Jupyter notebook with text, or some other form; 40% a final short write-up of your project of about 4 pages plus references in ACL or similar conference proceedings format.
|
DSCC 495-20
Tong Geng
7:00PM - 7:00PM
|
Optimizing the Large Language Models: The project focuses on the integration of large language models (LLMs) and diffusion models in scientific research, with an emphasis on evaluating their trustworthiness, managing hallucinations, and enhancing creativity. Additionally, the role involves supporting PhD students in devising smart quantization techniques aimed at reducing the operational costs of LLMs, while simultaneously improving their creative capabilities.
|
DSCC 511-01
Hangfeng He
MW 9:00AM - 10:15AM
|
This seminar offers an introduction to Large Language Models (LLMs), covering essential concepts such as Transformers, BERT, GPT-3, InstructGPT, prompting & decoding, and emergent abilities. Students will engage with a range of topics through paper presentations on themes such as Tool-Augmented LLMs, Multimodal Learning, LLMs for Science, Social and Ethical Concerns, Superintelligence Concerns, and Democratizing LLMs. Participants are required to present and discuss papers, write critical literature reviews, reproduce paper results, and collaborate on team projects. This seminar aims to provide a thorough understanding of LLMs, exploring their origins, opportunities, and concerns to enhance professional expertise in the field.
|
DSCC 895-1
7:00PM - 7:00PM
|
|
DSCC 897-1
Ajay Anand
7:00PM - 7:00PM
|
Please see advisor before enrolling.
|
DSCC 899-1
7:00PM - 7:00PM
|
Please see advisor before enrolling.
|