Module three: Threads in Go. Topic 3.1: Communication.
So far, we've been talking about creating goroutines and a little bit of synchronization, meaning waiting on goroutines to exit. But goroutines also communicate. Generally, goroutines work together to perform a bigger task, so they're typically not completely independent. You're making a bigger program that has semi-independent pieces, and each goroutine does one of those pieces. If they were completely independent, they would be entirely different programs; you wouldn't even call them the same program. Usually, each one is doing a smaller piece of a bigger task.
As an example of where you might use goroutines, say you're making a web server. It's very common to make web servers multi-threaded, because you never know how many people are going to connect at a time. For each person that connects, meaning each browser that connects, there's an interaction: you've got to receive a message, respond to that message, and so on. There's basically a conversation that the server has with each browser that connects to it. Maybe there are a thousand different browsers connected to this web server at the same time, looking at different pages or different pieces of a page. Since they're all doing the same basic type of interaction, sending HTTP messages back and forth, it makes sense to have a multi-threaded program: every time a new browser connects, you create a new thread to handle that connection and the communication back and forth with that browser.
Then, a new browser connects and you make another thread to handle that. It's a convenient way to keep the state separate, because the state of the communication with one browser can be completely separate, or largely separate, from the state of communication with another browser. So, you want to be able to separate them into different threads. But remember that they are not completely independent. With this web server example, all these thousand connections, these thousand different threads that are running, are all serving the same set of pages. Say this is UCI's web server, so they're serving UCI's web pages. They're sharing that web page data. One browser might post some data to a web page, and then another browser should be able to view that. So, there are interactions between these threads, since they're all sharing this web page data. This is a typical scenario: the goroutines are not completely independent, they work together periodically to perform some bigger task, and they have to trade information. They need to send data and receive data in order to collaborate.
I'll give you a toy example. Rather than the big web server example, I'm going to work with something small just to get the point across. Say you want to find the product of four integers, and you decide to do this with two goroutines. One goroutine multiplies two of the integers, and the other goroutine multiplies the other two. Then both of those goroutines send their results back to the main goroutine, the main goroutine multiplies their results, and the final result is the product of all four integers.
In this case, I've really got three goroutines: the main goroutine, plus two more goroutines that do the sub-problems. One multiplies two integers, the other multiplies the other two, and the main goroutine combines them. In order to do this, you need to send data, specifically integers, from the main goroutine to the two sub-goroutines. You've got to give these two new goroutines the numbers you want them to multiply. Then the results of the sub-goroutines have to come back to the main goroutine, because main has to get those pieces of data and multiply them together to get the final result. So, you can see the data transfer that's going on between these goroutines, and it's pretty straightforward. This is the most straightforward kind of data communication, where the communication happens at the beginning of the execution of a goroutine and at the end. The sub-goroutines get their initial data right at the start, the integers they're going to multiply, and then they send results back at the end. But note that communication doesn't have to happen only at the start and the end of a goroutine. It can happen right in the middle. In a more complicated example, a goroutine might decide to send some data back to the main goroutine in the middle of its execution. Right now, we're just dealing with the simple version: at the beginning, you send the separate goroutines data, and at the end, you get some back. To do something like this, you need communication, and communication between goroutines is done using channels. Channels are used to transfer data between goroutines, and channels are typed.
When you create a channel, you create it with a certain type. Maybe this channel handles integers, this one handles strings, this one handles structs of a certain type, and so on. So, channels are typed and they transfer typed data. You use make to create a channel. In the example here, I'm making a channel called c. I say c := make(chan int). That creates a channel of type int, which transfers integer data. Once you've got this channel c, you can send and receive data using the arrow operator. When I say arrow, it's literally a less-than sign and a dash, <-, which looks sort of like an arrow. You use this arrow operator on both ends, to send data on the channel and to receive data from the channel. To send data on a channel, say I've got my channel c that I just made and I want to send the integer three. I write c <- 3. If you follow the arrow direction, the three goes into c: the thing on the right is going into the channel. To receive data from a channel, you can see here that the arrow is leaving the channel. I write x := <-c, with colon-equals in this case. The data comes out of the channel c and the arrow sends it into x. That's how you read it. So, you can send data on a channel and receive data from a channel using this same arrow operator.
Here's a little example of the code I just described, where I want to do this multiplication of four numbers. To do it, I'm going to make two goroutines. Each goroutine is going to multiply a pair of numbers. The main goroutine will start those two, have them multiply pairs, then take their results, multiply those together, and print the product. Let's start with the function prod, for product.
Basically, prod takes two integers, v1 and v2, and computes their product. The third thing it takes is the channel that we want to communicate on. So, it takes that as an argument, computes v1 times v2, and then sends the product onto the channel. Now we look at main. Main creates the channel c, then it starts two goroutines: go prod, go prod, passing different hard-coded numbers in there. It doesn't matter which, but I hard-coded one and two for the first goroutine, and three and four for the second. Notice that they're both getting the same channel in this case, so they're both communicating on the same channel. After it starts the two goroutines, the next two instructions receive the results from those goroutines on the channel: a := <-c and b := <-c. So a gets the first thing that comes on the channel, b gets the second thing that comes on the channel, and then the main goroutine prints the product, a times b. So this is a simple example of how you use channels to communicate.
One thing before I go on. I said that channels are used to send data back and forth between goroutines, and that's true, and we're using one here. But there's another way to send data to a goroutine, specifically when you create it. Here, where I say go prod(1, 2, c) and go prod(3, 4, c), I am sending those two goroutines data, which are the arguments to the function. With go prod(1, 2, c), I'm sending that goroutine the one, the two, and the channel c through those arguments. So that's another way to send data to a goroutine, and it only works when you start it. When you pass arguments to its function, those arguments are passed to the goroutine, okay?
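The prod function and the main goroutine described above can be sketched as the following program. This is a reconstruction from the spoken description rather than the slide itself; the helper name product4 is hypothetical and is there only so the logic can be called with other numbers.

```go
package main

import "fmt"

// prod multiplies v1 and v2 and sends the result on channel c.
func prod(v1 int, v2 int, c chan int) {
	c <- v1 * v2
}

// product4 is a hypothetical wrapper around the steps the lecture
// puts in main: start two goroutines on the same channel, receive
// both partial products, and multiply them together.
func product4(w, x, y, z int) int {
	c := make(chan int) // unbuffered channel of ints

	// The initial data reaches each goroutine as function arguments.
	go prod(w, x, c)
	go prod(y, z, c)

	// The results come back over the channel. Either goroutine may
	// send first, which does not matter for a product.
	a := <-c
	b := <-c
	return a * b
}

func main() {
	fmt.Println(product4(1, 2, 3, 4)) // prints 24
}
```

Because multiplication is commutative, the final result does not depend on which goroutine sends first.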
So with go prod(1, 2, c), I'm sending one, two, and c to the new goroutine I'm creating, and I'm not using a channel to do it, okay? That's another way to send data, and it's common. But that's just the initial data. Any time after the goroutine has started, if you're going to send data back and forth, you have to use a channel, like we do here, where we receive the results on this channel. Thank you.
Module three: Threads in Go. Topic 3.2: Blocking on Channels.
By default, a channel is unbuffered. When you create a channel, it's unbuffered, and unbuffered channels cannot hold data in transit. So when we call make and we say make(chan int) and we don't pass in any other arguments, we're making a channel that carries integers, but it's unbuffered, so it can't hold data in transit. The implication is that, since we don't want to lose data, the sending instruction has to block until the data is received on the receiving end, and the receiving instruction has to block until the data is sent. Here's what I mean by this. Say I've got two tasks, Task one and Task two. Task one is going to send three onto this channel, and Task two is going to read whatever comes on the channel and put it into the variable x. You can see that there. Say Task one hits its send first. It reaches the send and tries to send three on the channel. It will block; it will sit there until Task two reaches its receive instruction. If that's one hour later, Task one will sit there for an hour, or however long it takes, until Task two reaches the receiving instruction. Once it does, the data can be transmitted from Task one to Task two, and Task one can then continue. Task one doesn't want data to be lost in transit, because remember, this channel can't hold any data. Right?
If Task one were to continue and Task two weren't there to receive the data, then the data would go away. Right? That can't happen. So if Task one reaches its sending instruction first, it has to block and wait for Task two to receive. I didn't show this example here, but the same thing happens if their ordering is switched and Task two hits its receive first. There's nothing to receive, right? It just blocks. Task two will block there until, say an hour later, Task one finally sends three onto the channel. When that happens, Task two can continue. Okay? Either way, these two instructions, send and receive, are blocking instructions. On an unbuffered channel, the send will block until the receive instruction happens in the other thread, and the receive will block until a send instruction happens. It has to do this because there's no buffering by default. We'll talk about buffering in a second, but by default there's none, so you have to wait so that you don't lose the data.
Note that when you use a channel in this way, the channel is allowing you to communicate, to send data between two different threads, two different goroutines, but it's also doing synchronization, right? Because Task one has to wait for Task two to receive, or Task two has to wait for Task one to send. Remember we talked about wait groups. I said, "I want my main goroutine to wait for this goroutine to complete first," and I made a wait group. This is doing a similar thing. It's another way to do the same thing, because Task two has to wait for Task one to reach its send instruction before Task two's receive can continue. So there's also waiting going on here.
So communication with channels is also synchronous; the synchronization is built in. What that also means is that you can use channel communication just for the synchronization and throw away the received result. Here's the example I've got: Task one, Task two. Task one sends the number three onto the channel, okay? For Task two, notice that it's got the arrow and then the c, which means receive something from channel c. But it's not assigning it to any variable. It's not saying x := <-c. It just says <-c. So in this case, Task two is receiving something off the channel, but it's not using the data. It's basically throwing away the data that it receives. In a situation like this, since you're discarding the data, all you're really doing is synchronizing the two tasks. Task two has to wait for Task one to do the send. So this is like a wait with a wait group; it's another way to implement a wait. Even though Task two is throwing away the data that it receives, it still has to wait for Task one to do the send. You're waiting for an event, just like you would with a wait group. So synchronization is built into the channel communication constructs. Thank you.
Module three: Threads in Go. Topic 3.3: Buffered Channels.
The channels that we've talked about so far are the default channels. They have no capacity to hold data in transit. But channels can have some capacity, so a channel can contain a limited number of objects in transit if you want. The default capacity for a channel is zero, so it's unbuffered, but you can make what are called buffered channels, which actually have some capacity to hold data in transit. The capacity of a channel is the number of objects it can hold in transit. You define it with an optional argument to the make function.
When you create the channel, you can set its capacity. In this example, I say c := make(chan int, 3). I give it that second argument, three, which says that the buffer has size three. Note that the default size is zero. When you don't give that second argument, it's as if you made the second argument equal to zero, which says, "Look, I can't hold anything in transit." But in this case, I'm saying, "Okay, three. I've got a buffer of size three." Now, when you have a channel with a certain amount of capacity, the sends and receives block under different conditions. Not really different, but they seem different. What I mean is, sending only blocks if the buffer is full. So if I have a channel of capacity three, I can do three sends without blocking; I can do three sends with no receives. Say goroutine one is sending to goroutine two, and goroutine two has not done any receives and doesn't do a receive for a long time. Goroutine one can do three sends and still not block. What happens is, when it does its first send, even though the receiver is not there to receive it, the buffer, since it has capacity three, can hold the data temporarily and allow goroutine one to continue its execution. This can happen up to three times, up to the capacity of the buffer, right? Eventually, I get to that fourth send. If the receiver still hasn't done a receive, then I can't write the fourth thing into the buffer, because it only has capacity three. At that point, it blocks. So sending only blocks if the buffer is full, and you have some sending that you can do without having to block on the side of the writing goroutine, which is good, right? Because blocking is, for the most part, a bad thing. Blocking reduces your concurrency. If you block, that means you can't execute, which means you might be wasting processor resources.
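A minimal sketch of the capacity-three channel described above, assuming a single goroutine so that it is obvious no receiver is running during the sends. The helper name fillAndDrain is hypothetical; len and cap are Go built-ins that report how many values a channel currently holds and its capacity.

```go
package main

import "fmt"

// fillAndDrain is a hypothetical helper: it performs three sends on
// a channel of capacity three with no receiver running, records how
// full the buffer is, and then receives the three values back.
func fillAndDrain() (int, int, []int) {
	c := make(chan int, 3) // buffered channel, capacity 3

	// None of these sends block, because the buffer has room.
	c <- 1
	c <- 2
	c <- 3
	length, capacity := len(c), cap(c)

	// A fourth send here would block, since the buffer is full. With
	// no other goroutine to receive, it would never be unblocked.

	// The buffered values come back out in the order they were sent.
	received := []int{}
	for i := 0; i < 3; i++ {
		received = append(received, <-c)
	}
	return length, capacity, received
}

func main() {
	length, capacity, received := fillAndDrain()
	fmt.Println(length, capacity, received) // prints 3 3 [1 2 3]
}
```

With make(chan int) in place of make(chan int, 3), the very first send would block, because nothing else in this program is ready to receive.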
So, you don't generally want to block. You would like to be able to do a send and then, whether or not the receiver is there, continue on your merry way and keep executing. Having channel capacity allows that to happen to some extent. It's limited, because memory is finite and the buffer can only have a finite size, but you can continue for more time without blocking, so it can help you sometimes. The same thing happens on the receive end. Receiving only blocks if the buffer is empty. If the buffer is full and has three objects in it, you can do three receives without any sends, because you can receive the three things that are in the buffer. On the fourth receive, once the buffer is actually empty, that receive will have to block, okay? So, this is what it means to give a channel capacity. You make it a buffered channel and give it a size. The size means that the channel is storing data in transit, so the sender and receiver don't have to block as often, because they can use the buffer as a holding space.
To give a little example, I've got my channel with capacity one, and I've got two tasks, two threads or two goroutines, T1 and T2. In between there is a channel with capacity one, so it has one space, right? T1 is doing the sending and T2 is doing the receiving. T1 is writing one integer into c, into the channel. Then T2 is going to read two things out of the channel. First note that this can't work, right? You can't send one thing and receive two. What happens exactly depends on which executes first, the send or the first receive. But the first receive has to block until the send occurs. So if T2 is executing faster and hits its first receive before T1 has issued the send, then T2 is going to block, because the buffer is empty, okay? But once T1 executes and fills that buffer, the first receive can continue.
Now, the second receive in this scenario blocks forever. I'm assuming that T1 doesn't do any more writes, okay? So, this would be an error. If T1 writes just one thing into the buffer and T2 tries to read two things, the second receive has to block, right? In fact, the Go runtime would probably report an error when you run it, since it can detect a deadlock when every goroutine ends up blocked. In this case, this is a problem. T2 is just going to block forever, because it's waiting on T1. Now, we can see a similar thing happening in the other direction. This time, T1 is actually writing two things, three and four, to the channel, and T2 is going to receive once. The first send won't block; it'll just put something into the buffer. Eventually, the receiver will receive that from the buffer. For the second send, if the receive has already happened, then the buffer is empty, so the second send can write something into the buffer and then continue. But if the receive has not happened, then the second send is going to block, because the first send has filled up the buffer and this buffer is only size one. So the second send will have to block until the receive happens.
So, why use buffering? The main reason to use buffering is so that the sender and the receiver don't need to operate at exactly the same speed. What I mean by that is, if you use an unbuffered channel, then the sender and the receiver have to be in lockstep. The send blocks until the receive happens, and the receive blocks until the send happens. They have to be in lockstep, which reduces the amount of concurrency that you can have. Buffering relaxes that. Say you've got an empty buffer and a producer. I should define this first; it's a classic problem called producer-consumer.
It's a concurrent example where you've got two different threads, one referred to as a producer and one referred to as a consumer. The producer is basically generating data. We don't know how it's generating data. Maybe it's reading temperatures from sensors and sending them to the consumer to be processed, okay? Something like that. But it's generating data on a regular basis. A common case is that T1 is connected to a microphone and you're sampling the sound. Periodically, you have to record a sample and store it, so the producer is just talking to the microphone, grabbing samples periodically. Then it sends them to the consumer, and the consumer is doing some kind of audio processing task. It takes the samples and, who knows, it does the [inaudible] transform, something like this, okay? But the general idea is that you've got a producer that is producing data from some source, over and over again over a long period of time, and a consumer that is consuming data over and over again over a long period of time and doing something with it. Now, a buffer is useful in this situation. Let's say that sometimes the producer is a little fast at producing its data, right? Maybe the consumer is consuming at a certain rate, but the producer is producing a little faster. In that situation, if you had no buffer, the producer would have to block. If it tries to produce something before the consumer is ready to receive it, then the producer blocks until the consumer reaches its receive, and then they can continue. So, the producer gets slowed down to match the speed of the consumer. And vice versa: if the consumer is consuming faster than the producer produces, then the consumer's receive gets blocked, and the consumer slows down to the rate of the producer, right?
So, if one of the producer and consumer is a little faster than the other, its rate is forced to slow down when you don't have any buffer. Now, if you have a buffer, then a speed mismatch, or at least a temporary speed mismatch, between the producer and consumer is acceptable. What will happen is, if the producer is a little bit too fast, it'll just fill up the buffer. If the consumer is a little bit too fast, it'll just draw data from the buffer. Since the buffer has finite size, this can't go on forever. If the producer is forever too fast, it will fill up the buffer and then it'll still have to block. But take a scenario where the producer and the consumer are on average at the same speed, but sometimes they speed up and slow down. This certainly happens in a lot of scenarios, where they're not exactly in lockstep: sometimes the producer is a little faster, sometimes a little slower, and so on. Then a buffer is really helpful, because as long as you don't fill up the buffer, it allows you to continue executing, sending and receiving without having to block, since you've got this buffer in between. But still, on average, the speeds have to match. Otherwise, the buffer will fill up or become empty, and one side ends up blocking anyway. Thank you.
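The producer-consumer arrangement described above might look roughly like the sketch below. The names producer, consumer, and runPipeline, the idea of summing the values as a stand-in for real processing, and the buffer size of four are all assumptions made for illustration; the lecture does not give code for this part.

```go
package main

import "fmt"

// producer generates the values 1..n and sends each one on out.
// A send blocks only while the channel's buffer is full.
func producer(n int, out chan int) {
	for i := 1; i <= n; i++ {
		out <- i
	}
}

// consumer receives exactly n values from in, adds them up as a
// stand-in for real processing, and sends the total on result.
// A receive blocks only while the buffer is empty.
func consumer(n int, in chan int, result chan int) {
	sum := 0
	for i := 0; i < n; i++ {
		sum += <-in
	}
	result <- sum
}

// runPipeline is a hypothetical helper that connects a producer and
// a consumer with a channel of the given capacity and returns the
// consumer's total.
func runPipeline(n int, capacity int) int {
	data := make(chan int, capacity) // the buffer between the two goroutines
	result := make(chan int)
	go producer(n, data)
	go consumer(n, data, result)
	return <-result // main waits here for the consumer to finish
}

func main() {
	fmt.Println(runPipeline(10, 4)) // prints 55
}
```

The total is the same with any capacity, including zero. What changes is how often the two goroutines have to wait for each other: with capacity four, the producer can run up to four values ahead before it blocks.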