Institute News

Gain, refresh your skills in data science—part time

Published
April 13, 2022
Brooke Brehm, at left, and Sarah Siddiqui both say they’ve benefited from the data science skills they’ve learned from the Advanced Certificate in Data Science program offered through the Goergen Institute for Data Science.

Advanced certificate program is geared to working professionals

Sarah Siddiqui, a University of Rochester STEM librarian, helps scientists at the Laboratory for Laser Energetics, mechanical engineers, and earth and environmental faculty members with their research.

She also curates collections of science books and scholarly journals and conducts workshops for students.

Increasingly, Siddiqui’s work involves data science (DS)—the creation and application of novel techniques to collect, curate, analyze, and make discoveries from large sets of data.

The Advanced Certificate in Data Science she completed last fall through the University’s Goergen Institute for Data Science helped her expand her DS skill set with tools such as Linux and learn new concepts in computational statistics, including how to use R, a computing environment optimized for statistical analysis and data visualization.

Even before she completed the certificate, Siddiqui says, the knowledge she gained helped her co-mentor a PhD data science fellow working with the University on a project through the LEADING program. Now Siddiqui is ready to “go into new areas where I can apply some of these skills in combination with what I am currently doing,” she says.

Would she recommend the certificate program to others?

“Oh, definitely,” she says. “It was really fun.”

Skills for one of the fastest growing fields

Data science is “considered one of the fastest growing, best paid professions in the United States,” says Ajay Anand, deputy director of the Goergen Institute and an associate professor of data science. “A lot of this is being fueled by the massive amounts of data that companies and organizations are collecting, and the need to harmonize, store, and use that data in a form that is readily available for analysis and insight.”

The advanced certificate program, launched in 2020, is a graduate-level, part-time program to help working professionals gain or refresh skills in the field. Participants must have a bachelor’s degree (in any discipline), one-year equivalent coursework in undergraduate calculus and linear algebra, and introductory knowledge of Python or Java. (Coursera and summer courses are available for applicants to gain these equivalencies.)

“You earn the certificate by completing essentially four graduate-level courses,” Anand says. “We structure it so that it can be completed in a relatively efficient time frame, two to four semesters, depending on how you pace yourself and how many courses you take each semester. One of the motivations for the program was to provide individuals who have a working knowledge of data science problems and solutions through their professional work an opportunity to formalize their training with a deeper mastery of fundamental concepts in the field.”

Three different tracks are offered based on a student’s background. Students with limited experience, for example, are required to complete these four courses:

  • Computational Introduction to Statistics (462): foundational concepts in descriptive analyses, probability, and statistical inference, including data exploration through descriptive statistics (with a heavy emphasis on using R for such analyses), elementary probability, diagnostic testing, combinatorics, random variables, elementary distribution theory, statistical inference, and statistical modeling.
  • Introduction to Statistical Machine Learning (465): modern machine learning concepts, techniques, and algorithms, including regression, clustering and classification, kernels, support vector machines, feature selection, goodness of fit, and neural networks.
  • Tools for Data Science (401): hands-on introduction to computational hardware and Linux; languages and packages for statistical analysis and visualization; parallel computing and Spark; libraries for machine learning and deep learning; databases including NoSQL; and cloud services.
  • Data Mining (440): fundamental concepts and techniques, including data attributes, data visualization, data pre-processing, mining frequent patterns, association and correlation, classification methods, and cluster analysis. Advanced topics include outlier detection, stream mining, and social media data mining.

Participants with prior academic experience in computational statistics or data science programming can choose one of two other tracks that allow them to take an elective course.

The courses are also taken by full-time students at the University, including undergraduates, and are scheduled primarily during the day. “So, a little flexibility from your employer, depending on your circumstance, may be needed,” says Lisa Altman, the Goergen Institute’s education program coordinator.

For many certificate students, this is their first academic experience in many years. Altman sets up support groups to help the seasoned professionals in the program readjust to academia and give them an opportunity to share experiences with classmates in the same position.

‘I’ve learned so much valuable information’

The 10 participants who will have completed the program this year include PhD students and working professionals in jobs such as bioinformatics decision support analyst, analyst/programmer, and data management senior developer.

They also include Brooke Brehm, an academic records specialist with the University, who earned a BA in music at Rochester. She became interested in learning more about data science because of her experiences while completing a master’s degree in arts administration at Drexel University and working as a ticketing services associate at the Walnut Street Theater in Philadelphia.

Her previous master’s thesis, for example, was on ticketing in opera companies, “but the data just wasn’t readily available to me,” Brehm says. “I wanted to learn ways to access and process data, to do the kind of analysis I was interested in.” At the theater, she adds, “I was running a lot of reports on the customer base, and I was interested in learning the backend of what I was actually doing with that reporting. So, I got interested in learning SQL (Structured Query Language).”

The desire to learn more increased when she began her current job at Rochester. Brehm is responsible for entering and maintaining data on all the transfer credits from incoming and current students, including AP courses and tests from high school and credits from students who study abroad or transfer in from another college. She is also responsible for entering and maintaining data on students who fail to meet requirements for any of the basic science sequences, such as calculus, and making sure the students are notified.

After taking an online course in SQL and an undergraduate course in Python, Brehm was accepted into the advanced certificate program in 2020.

“The general programming logic that I’ve learned makes a lot of sense in my work, and it has helped me think of ways to improve my workflow,” she says. Working with one of her supervisors, for example, she has created different notifications that rely on coding logic. “Instead of personally sending tons of notification emails to students, I can just hit a notification button and send them an email and save something in their file.”

In addition to completing the certificate this spring, she will complete a master’s degree in data science this fall.

And after that?

“I can definitely see myself moving into more of a data analysis position or even more data-heavy type of position at the University. I’m excited to see where it takes me and my skills,” Brehm says.

Like Siddiqui, she encourages others to take the certificate program. “I’ve learned so much valuable information,” Brehm says.

To learn more or find out about sitting in on a class session, contact Lisa Altman.