There will be 5 homework sets/laboratories for this class, each of which is in the form of a Colab notebook with programming/computational questions for you to get some hands-on training on network analysis of real-world data. At times there will be some optional analytical problems for extra credit. We will use the Python programming language for all laboratories in this course. As discussed in the first lecture of the semester, our expectation is that many of you will have some experience with Python and numpy. In case you need to catch up with this prerequisite knowledge, here is an excellent tutorial from Stanford's course on convolutional neural networks. The tutorial will serve as a quick crash course on both the Python programming language and its use for scientific computing using notebooks such as Colab.
The list of laboratories follows. You can refer to the end of this page for matters about logistics and the policy on grading and collaboration.
Laboratory 1 - Manipulating graphs, introduction to NetworkX and PyTorch Geometric
In this first laboratory we will work with a real dataset, generate a network graph and analyze it using the Python package NetworkX. We will also introduce pandas, an excellent library to load and process datasets efficiently. A third goal of this assigment is to start familiarizing ourselves with PyTorch Geometric, a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to network data. You are required to submit a report with your solutions (a single pdf file you will upload to Gradescope) including your answers and discussions, the Python code you developed as well as the corresponding outputs including figures. If you work on the (optional) analytical problem, as part of the report include a writeup of your solutions (typing is not mandatory). Due Wednesday February 7 via Gradescope.
Laboratory 2 - Descriptive analysis of network graph characteristics
In this second laboratory we will re-examine some of the structural properties of large-scale networks we discussed in class (for instance, power-law degree distributions and assortative mixing). In particular, we will corroborrate that these properties indeed arise with some real-world networks encountered across diverse domains. Moreover, we will implement a few of the classic spectral-based algorithms for graph partitioning or network community detection. Various metrics to assess performance of these unsupervised node clustering algorithms will be introduced, which we will bring to bear in our experiments with the workhorse Zachary's Karate Club network and a political blog graph from the 2004 US presidential election. You are required to submit a report with your solutions (a single pdf file you will upload to Gradescope) including your answers and discussions, the Python code you developed as well as the corresponding outputs including figures. If you work on the (optional) analytical problem, as part of the report include a writeup of your solutions (typing is not mandatory). Due Friday March 22 via Gradescope.
Laboratory 3 - Models for network graphs and their applications
In this third laboratory we explore some classical models for undirected and unweighted random graphs as well as their mutual relationships. Specifically, we will consider the Erdos-Renyi model, Stochastic Block Models and the more general Random Dot Product Graphs (RDPGs). The latter is an instance of a latent positions model, and offers a segway to node embeddings for graph representation learning. That is, each node is represented by a latent vector from which, in a sense, the network structure can be "recovered". If each node is represented by a vector, then it is natural to perform e.g., clustering from this representation (or tackle any other relevant task by bringing to bear geometric tools for data analysis). In particular, we will use the RDPG model to detect communities in a social network constructed from historical games played among national soccer teams. You are required to submit a report with your solutions (a single pdf file you will upload to Gradescope) including your answers and discussions, the Python code you developed as well as the corresponding outputs including figures. If you work on the (optional) analytical problem, as part of the report include a writeup of your solutions (including code and plots, typing is not mandatory). Due Wednesday April 17 via Gradescope.
Laboratory 4 - Network topology inference
In this fourth laboratory we will get some hands-on exposure to methods for network topology inference, meaning the problem of recoverying the topology of a sparse graph from nodal observations. Suppose we have access to the vertices of an undirected and weighted graph, but the pairwise relationships among said vertices are unknown (i.e., we do not get to observe edge status or the edge weights). Accordingly, a fundamental network analytic question is how to use observations of nodal signals (attributes or features) to learn the underlying network structure, or, a judicious network model of said data facilitating efficient signal representation, visualization, prediction, (nonlinear) dimensionality reduction, and (spectral) clustering. It is clear that one must assume some relation between the signals and the unknown underlying graph, since otherwise, the topology inference exercise would be hopeless. We will henceforth focus on a few key approaches where this relation is given by statistical generative priors as well as by properties of the signals with respect to the underlying graph such as smoothness. In particular, we will implement some of these algorithms and gain insights on how to apply them in various settings. First, we will work with synthetic data generated from a known graph we seek to recover. In the second part of this laboratory we will perform experiments with a real dataset of images of digits, and will learn a graph that captures the inherent structure in the data. You are required to submit a report with your solutions (a single pdf file you will upload to Gradescope) including your answers and discussions, the Python code you developed as well as the corresponding outputs including figures. Optional assignment not for credit. No need to submit.
Laboratory 5 - Graph neural networks for recommendation systems
The objective of this final laboratory is to design a recommendation system that predicts the ratings that customers would give to a certain product. Say, the rating that a moviegoer would give to a specific movie, or the rating that an online shopper would give to a particular offering. To make these predictions we leverage the ratings that customers have given to this or similar products in the past. This approach to ratings predictions is called collaborative filtering. This assignment is particularly useful to gain additional hands-on exposure with important mothodology towards learning with graphs. First, we will focus on the construction of the network. In this setting the data are not natively in the form of a graph, or at least, as we will see there are several ways to generate a graph from the data. Once we have constructed the graph, we will develop, train, and evaluate a graph neural network (GNN) model for making these rating predictions. We will use the small version of the Grouplens group movie rating dataset, which provides 100,000 ratings given to 9000 movies by 600 users. You are required to submit a report with your solutions (a single pdf file you will upload to Gradescope) including your answers and discussions, the Python code you developed as well as the corresponding outputs including figures. Optional assignment not for credit. No need to submit.
Logistics
Specification of the deliverables will be clearly indicated in each assignment. Laboratories are typically due by Wednesdays as per the calendar above. I ask that you please make your best effort to meet the deadline. If you can't, for whatever reason, I have no problem granting extensions. That said, do not abuse that prerogative, if you are late once in the semester, that's OK. If you are late more than that you need to organize your time better.
All your solutions should be submitted via Gradescope. You should prepare and upload a report as a single pdf file with your answers to all questions in each laboratory (including the code you developed, outputs, plots, discussions, and whatever you deem relevant). For the coding parts of the assignments you should also include the printouts of scripts you prepared to run your simulations, as well as all plots that are requested (with clear axes labels, curve legends, figure caption, etc.). Please do not submit a copy of your Colab notebook (.ipynb), since we won't be running your code. Especially when it comes to answers of analytical (paper and pencil) questions, there is no need to type everything using LaTeX or a word processing software (of course you can type if you want). But whatever you hand in should be clear, readable, concise, and your arguments/logic well justified and articulated. For these few and often optional questions that involve mathematical derivations or explanations you can use e.g., a tablet (if you have one) to write your solutions and generate the pdf. Otherwise, you can scan your writeup. If you user your phone, I suggest you rely on a scanning App and avoid taking photos of your pages. Again, make sure to have everything in a single pdf file and that your name is written in the first page.
The class has a TA to help you out with the laboratories. The name of the TA is Chang. His office hours can be found here.
Policy on grading and collaboration
Each of the 5 laboratories contributes 30/5=6% towards your final grade. The purpose of the laboratories is to help you in the learning process. Therefore, it is foolish of you and also unethical to hand in the solution of others as your own. While I am quite confident this will not be an issue, if you engage in unethical behavior you will get no points for the assignment and I will recommend that you drop the class and retake it next year. These cases will also be dealt with according to the University of Rochester's Academic Honesty Policy.
Going back to the topic of learning, collaboration with peers is allowed and encouraged. If you are struggling with the material, your classmates are possibly a more valuable resource than myself. If you are doing well in the class, there is no better use of your time than helping your peers. It is a very surprising fact of life that happiness is to be found when you give to someone, not when you receive from someone. Work on the laboratories and try to make progress. If you're stuck, go talk with a classmate. If you are still stuck go talk with your TA. Then come talk with me (though it is unlikely I will be able to help with coding questions); or do not hesitate to send me an email with a concrete question that I will reply within the day.