The class comprises 10 homework sets, each of which typically has two parts: (i) a few problems for you to get some training on the material covered in class; and (ii) a small project that studies a particular stochastic system. For (ii) you are typically required to answer some questions about properties of the stochastic system, write a Matlab simulation and use this simulation to understand some properties of the system. Matlab can be obtained following these instructions, and a starter's guide can be found here. The list of homework sets and solutions follows. You can refer to the end of the page for matters about logistics and the policy on grading, collaboration and use of posted solutions.


Introduction


Disclaimer: In this first assignment you analyze a simple stochastic system. It is a little unconventional since we are asking you to perform the type of analysis this class is supposed to teach you. Therefore, some parts of it might not be very clear to you but will become clearer as we progress in the class. The idea is to gain an appreciation for what the challenges are in analyzing a stochastic system. I also want you to appreciate how it is possible to state facts about the system's behavior even if randomness implies that anything is possible.

Homework 1 - Lower-bounded biased random walk (Due Wednesday September 4, submit via Gradescope). This example considers a certain game in a certain casino where your chances of winning are slightly better than your chances of loosing. More specifically, you place 1-dollar bets, with probability p you are paid 1 dollar and with probability 1-p you loose you bet. The good news is that p>1/2. The catch is that you have to keep playing forever or until you loose all your money. If you have w dollars is it a good idea to play this game or not? You have to balance the fact that while you are more likely to win than loose, there is always a chance of getting unlucky a few times and losing all your money. And since you have to keep playing forever the latter possibility cannot be ignored heedlessly. We will also study the case p<1/2. If you are having trouble with this homework, this piece of code simulates one experiment. This other code repeats the experiment a hundred times and the numerical analysis of outcomes is undertaken here. You are well advised to first try things on your own. Solutions are available here.


Probabilty review


Homework 2 - Relationship between Bernoulli, binomial, Poisson and normal random variables. (Due Monday September 16, submit via Gradescope) On a fundamental level all random phenomena depend on something happening or not happening. Thus, all random phenomena are coin tosses. If that is true, then why do we study probability? The answer is that random phenomena tend to generate regular structure. If you throw a coin a thousand times, you can expect quite certainly to get at least 400 heads. We started with an uncertain system but ended up with a certain statement. This regularity appears in the limit of sequences of random variables and is exploited by anyone doing serious applications of probability to real world problems. In the end, as Kolmogorov said the fact that randomness generates structure is the core epistemological value of the theory of probability. To help you appreciate the beauty of that ;) this small project starts with sequences of coin tosses (Bernoulli random variables) and considers the number of heads in the sequence. As the number of tosses increases, you'll see how Poisson and normal random variables appear naturally. These two limit behaviors are examples of random phenomena with a particular structure (sequences of Bernoulli, described by binomial distributions) that when considered in a large scale give rise to a different type of structure (normal or Poisson). The complete solutions are available here.


Homework 3 - Decision making. (Due Friday September 27, submit via Gradescope) Say you are selling your car for which you are going to receive J offers. Offers will be of different value, and if all J of them were presented upfront, the car would be sold to the highest bidder. Alas, potential buyers make their offers in a random order and if not accepted they withdraw their bid -- presumably, they can find a different seller willing to accep their offer. The issue here is the design of a decision mechanism that allows you to sell your car at a reasonable price. In principle, it makes sense to discard the first few bids since you do not know where they fit in the pool of potential offers. Then, you can choose the first offer that fares comparatively well against those that you rejected. The questions here are how many bids do we need to discard and what does it mean for an offer to fare comparatively well with respect to others. The complete solutions are available here.


Markov chains


Homework 4 - Propagation of mitochondrial DNA types. (Due Monday October 7, submit via Gradescope). Mitochondrial DNA is passed from mother to children without genetic contribution from the father. All the variability in mitochondrial DNA is due to random mutations accumulated over time. Using estimates of the mutation rate and differences on mitochondrial DNA between humans it becomes possible to estimate the time at which groups became distinct populations. We are not computing these times here but a few facts about a Markov chain that models the propagation of mitochondrial DNA. In particular, you are asked to follow your mitochondrial DNA for the next few centuries. The Markov chain that we use here belongs to the class of branching processes that are also used, for example, to model population dynamics, chain reactions, the growth of links to web pages and the emergence of hub airports. The complete solutions are available here.


Homework 5 - Aloha protocol for random access communication. (Due Wednesday October 16, submit via Gradescope). A pervasive situation in communication systems is to have a common infrastructure access point (AP) serving a group of physically distributed devices. This is, for example, the situation of your cellphone sending information to the cellular base station, your laptop transmitting to the 802.11 wireless router, or a group of satellites communicating with a common ground station. In these three cases there is a common AP, the base station, the router, or the ground station, serving a group of distributed devices, cellphones, laptops, or satellites. A problem in the uplink communication from physically distributed terminals to the AP is how to separate the information transmitted by different devices. One possibility is to assign different times or frequencies to different terminals. This is called time or frequency division multiplexing. Another possibility is to let terminals transmit at random and hope for luck to avoid simultaneous transmissions. This is called random access. Random access is advantageous because it requires little coordination between terminals. However, it is not clear that random access is a viable communication strategy. You will see in this problem that random access actually performs surprisingly well. A minor variation of this protocol is used in WiFi standards. Cellphones use this protocol to reserve resources for a voice call (the actual voice is transmitted using a different protocol). Random access goes under the picturesque name of Aloha because it was first applied to provide satellite communication services to the islands of Hawaii. The complete solutions are available here.


Homework 6 - Google's PageRank algorithm. (Due Wednesday October 23, submit via Gradescope). The most popular algorithms to rank pages in a web search are stochastic. Consider a web surfer that visits a page and clicks on any of the page's links at random. She repeats this process forever. What fraction of her time will be spent on a given page? The answer to this question is the rank assigned to the page. The same idea can be used to understand the structure of networks in different settings. For example, we can use this algorithm to extract connectivity information from a social graph. Say we choose a student and ask her to direct us to any of her friends selected randomly. We then go to this friend repeat the question and are directed to this new student. This is no different from the random web surfer model. Repeating this process forever we can therefore infer the degree of connectedness of students in the class from the average number of visits to each of them. This is not a pointless exercise. To market products, for example, it is worthwhile to concentrate the effort in the individuals that are most connected to other persons. The important insight here is that the network possesses knowledge that individuals do not. You will need the following homework collaboration matrix connecting students that cooperate in homework assignments. The complete solutions are available here.


Continuous-time Markov chains


Homework 7 - Arrival of passengers at a subway station. (Due Monday November 11, submit via Gradescope). In order to dimension public transportation systems we need to determine the number of customers that arrive at different stations. Since this a random quantity what we actually need to determine is the probability distribution of the number of customers that arrive in a certain time interval. In this problem you will see that as long as we can think of customers as behaving independently, the number of customers that arrive at a train station can be modeled as a Poisson process. Notice how a very unassuming hypothesis - customers behave independently, leads to a rich structure - Poisson arrivals. The complete solutions are available here.


Homework 8 - Cash flow of an insurance company. (Due Wednesday November 20, submit via Gradescope). The purpose of this problem is study a simplified version of the cash flow of an insurance company, and, more specifically, to determine the probability that the company pays dividends in a particular quarter. At any point in time three things might happen: (i) a customer pays a premium, (ii) a claim is paid to a client, or (iii) the company pays dividends. The probabilities of these events are different as are the amounts of money involved. A further constraint on the payment of dividends is that they are paid only if the cash on hand exceeds a certain amount. You will build a continuous-time Markov chain model of the company's cash flow and use it to determine the probability that the company pays dividends in a particular quarter. This homework set is challenging and very important. If you do it correctly, you will become very proficient on continuous-time Markov chains. The complete solutions are available here.


Gaussian, Markov, and stationary random processes


Homework 9 - White Gaussian noise. (Due Monday December 2, submit via Gradescope). White Gaussian noise (WGN) is probably the most common stochastic model used in engineering applications. The idea is to model a random process X(t) for which individual values are normally distributed and values X(t1) and X(t2) for different times t1, t2 are independent. It is not difficult to believe that this is a very simple model. It simply represents the drawing of independent normal random variables at different times. In the first part of this problem you will develop a model of WGN. In the second part you will use WGN to model somewhat more complex systems, for instance the integral of WGN leading to a Brownian motion process. The goal is to observe that while WGN is very simple, it can be used to model complex stochastic systems. Solutions of Problem 7 will be posted Friday November 29, the rest after the due date.


Homework 10 - Arbitrages in sports betting and pricing of stock options. (Due Monday December 9, submit via Gradescope). Consider a random system in which you can place bets on different outcomes. An arbitrage is an opportunity to realize a gain independently of the outcome. More specifically, consider a sports event in which you can bet x dollars on team A and y dollars on team B. If team A wins you are paid ax dollars and realize a profit ax-(x+y) (the profit is actually a loss if this number if negative). If team B wins, you are paid by dollars for a profit of by-(x+y) dollars. Given a and b, an arbitrage is possible if there exist a combination of bets x and y such that both profits are positive. Since bookies are not in the business of giving money away, this is in general impossible. However, it is often possible to bet x with one bookie and y with a different bookie to exploit an arbitrage opportunity. You will explore this possibility in the first part of this assignment. The goal of the second part of this homework is to determine the worth of a European style option to buy a stock. A european style option on a stock with price X(t) at time t is a contract to have the option to buy the stock at a predetermined price on a predetermined future time. The option is described by the strike price K, the strike time t and its price c. Paying c to buy an option at time 0 gives us the opportunity to buy the stock for the strike price K at the strike time t. At time t the worth of the option depends on the value of the stock X(t). If the stock has fallen below the strike price K, i.e., if X(t)≤K the option becomes worthless. If, on the contrary, the price has risen beyond K, i.e., if X(t)>K we can realize a gain X(t)-K b y exercising the option to buy the stock at price K. Given historic data about the stock's value, it is possible to have some expectations about the value of the stock at time t. The relevant question here is how to use these expectations to determine the price c that should be payed to buy the option at time 0. In this problem we use the closing price of Cisco Systems (CSCO) for a past year to determine the price c of an option to buy CSCO. We are using actual data, as you can corroborate from information available in Google finance (click on 1yr). Solutions of Problem 2 will be posted Friday December 6, the rest after the due date.


Logistics


Homework sets are typically due by Mondays or Wednesdays as per the calendar above. I ask that you please make your best effort to meet the deadline. If you can't, for whatever reason, I have no problem granting homework extensions. If you can't make the deadline, don't give me an excuse. Just tell me you need a few more days and that's it. That said, do not abuse that prerogative, if you are late a couple of times in the term, that's OK. If you are late more than that you need to organize your time better.

All your homework solutions should be submitted via Gradescope. You should upload a single pdf file with your answers to all problems in each homework assignment. There is no need to type everything using LaTeX or a word processing software (of course you can type if you want), but whatever you hand in should be clear, readable, concise, and your arguments/logic well justified and well articulated. For the questions that involve mathematical derivations or explanations you can use e.g., a tablet (if you have one) to write your solutions and generate the pdf. Otherwise, you can scan your writeup. If you user your phone, I suggest you rely on a scanning App and avoid taking photos of your pages. For the coding parts of the assignments you should also include the printouts of the Matlab scripts you prepared to run your simulations, as well as all plots that are requested (with clear axes labels, curve legends, figure caption, etc.). Again, make sure to have everything in a single pdf file and that your name is written in the first page.

The class has a TA to help you out with homework. The name of the TA is Hamed Ajorlou. TA office hours can be found here.


Policy on grading, collaboration and use of posted solutions


The purpose of homework is to help you learn. As long as you consistently do you homework, their implication in your final grade is mostly irrelevant. The points are essentially given for hard work, and we will be more than happy to give you the credit when you try hard. Therefore, it is foolish of you to use collaboration or posted solutions to avoid doing work. It'll prevent you from learning and you won't basically gain any advantage with respect to your peers. Despite of this we are aware that some of you may, for example, hand in the posted solution as your own. This is not only foolish but unethical. If you engage in unethical behavior you will get no points for the assignment and I will recommend that you drop the class and retake it next year. These cases will also be dealt with according to the University of Rochester's Academic Honesty Policy.

Going back to the topic of learning, collaboration with peers is allowed and encouraged. If you are struggling with the material, your classmates are a more valuable resource than the teaching staff. If you are doing well in the class, there is no better use of your time than helping your peers. It is a very surprising fact of life that happiness is to be found when you give to someone, not when you receive from someone. The appropriate use of the posted solution is whatever you think helps you learn best. In general, that means you do not look at it until after you submit your solution. Work on the assignment and try to make progress. If you're stuck, go talk with a classmate. If you are still stuck go talk with your TA. Then come talk with me. After you submit your solution look at ours, it may teach you one or two extra things.