[SOUND] Transactions have been a basis for much of computing, especially in parallel systems, for a while. So revisiting transactions and talking about consistency that isn't perfect is perhaps a little strange, but the reason for doing it is that cloud computing nowadays really does depend upon scalable solutions, and those scalable solutions can make transactions very difficult to implement. So we're going to be talking about eventual consistency, a form of consistency that works very well for large-scale computing and overcomes some of the difficulties that we get with transactions.

Why is consistency such a big deal? Our model for computing is state-based Turing machines: step by step, transaction oriented. But that's on a single machine. Now we've got machines with multiple cores and lots and lots of pipelines within them, and then we replicate that across multiple machines, say 10,000 machines. And so there is a question about how you achieve consistency. If you want consistency between machines at that sort of scale, it can be quite slow, because of the messages needed to actually conduct a transaction. So the natural question is: is a transaction really necessary, come what may? If, for example, the systems will never get too far out of step, but will eventually reach a consistent state, could that not be used instead of the transaction? Can we design the systems to eventually reach consistency, and not have to worry about putting transactions into the cloud? A good question is just where this would apply: to which parts of cloud computing? Do we have to apply it to file servers? Do we have to apply it to scheduling? Where exactly do we need to worry about this?

As I said, there are lots of existing models. To list them: ACID is the classic transactional type of consistency. BASE is a way of programming systems to make them behave as consistently as possible. Paxos is a distributed protocol for achieving eventual consistency. You can't be sure how long it will take, but eventually Paxos will allow you to decide on a consistent view of the state of the system, and it uses a protocol to do that. Paxos is the basis for a whole range of different solutions which are now used in big cloud systems, because what it allows you to do is have parallel systems, with lots of asynchronous updating and so on, while keeping everything consistent. The protocols within Paxos are organized to maintain eventual consistency in the way that they operate. How are components in a cloud built using these solutions? What are the practical issues in building consensus and consistency inside systems?
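To make that concrete, here is a minimal sketch of eventual consistency, assuming a simple last-writer-wins merge rule; the Replica class and its merge method are invented for the example, not something from the lecture. Each replica accepts writes with no coordination, so reads can briefly disagree, but once the replicas exchange state they converge:

```python
import time

class Replica:
    """One node's copy of a key-value store. Writes are applied locally
    and exchanged later, so replicas can disagree for a while."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value):
        # Accept the write immediately, with no coordination round.
        self.store[key] = (time.time(), value)

    def read(self, key):
        # May return a stale value until replicas have exchanged state.
        entry = self.store.get(key)
        return entry[1] if entry else None

    def merge(self, other):
        # Anti-entropy with a last-writer-wins rule: once every replica
        # has merged with every other, all reads agree.
        for key, (ts, val) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, val)

a, b = Replica("a"), Replica("b")
a.write("x", 1)        # applied on a only; b has not seen it yet
print(b.read("x"))     # None: the replicas currently disagree
b.merge(a)             # a gossip / anti-entropy exchange
print(b.read("x"))     # 1: the replicas have converged
```

The point is that no messages are needed at write time; the cost is the window during which b's read is out of date.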
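And here is a rough single-decree sketch of the kind of protocol Paxos uses, with failures, retries, and real messaging stripped out; the names Acceptor and propose are just illustrative. A proposer first gathers promises from a majority of acceptors, must adopt any value that majority has already accepted, and only then asks them to accept; that adoption rule is what keeps every decided value consistent:

```python
class Acceptor:
    """One acceptor in single-decree Paxos, with messaging elided."""

    def __init__(self):
        self.promised = 0           # highest ballot promised so far
        self.accepted = (0, None)   # (ballot, value) last accepted

    def prepare(self, ballot):
        # Phase 1b: promise to ignore lower ballots, and report any
        # value already accepted so the proposer must adopt it.
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted
        return None                 # rejected

    def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    # Phase 1a: ask every acceptor to promise this ballot.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [p for p in promises if p is not None]
    if len(granted) <= len(acceptors) // 2:
        return None                 # no majority; retry with a higher ballot
    # If any acceptor already accepted a value, adopt the one with the
    # highest ballot; this adoption rule is what keeps decisions safe.
    prior_ballot, prior_value = max(granted, key=lambda p: p[0])
    chosen = prior_value if prior_value is not None else value
    # Phase 2a: ask the acceptors to accept the chosen value.
    votes = sum(a.accept(ballot, chosen) for a in acceptors)
    return chosen if votes > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "update-x"))  # agreed value: "update-x"
```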
Eric Brewer made a great impact on the world by challenging some of the assumptions about consistency in big data systems. What he queried was: can you achieve consistency at the same time as availability and partition tolerance? If you want consistency, can you make it fast, so that you achieve consistency quickly? Because otherwise, while the system is achieving consistency, you don't have the values available to do the sorts of things that you would like to do. And then partitions: as you're trying to achieve consistency, is your network partitioned? Is it broken, so that there are two sets of systems performing operations on data, and they can't be kept consistent with each other? So this is a big problem.

Data centers need to be very responsive, and they need to be able to cope with transient faults making it hard to reach some service. You still need to be able to get the data. They can use cached data to respond faster, even if the cached entries can't be validated, but that data might be stale. How exactly will these data centers operate? And the notion is, well, you could weaken consistency for faster response. So you use cached values, provided you don't do anything wrong. If you are, say, adding things to people's accounts, in many ways it might be okay to delay adding some money to an account. What would be obviously wrong is to add it to the wrong account, or to let a deficit occur before adding something to the account; that would cause a problem in the whole system. So as long as you manage to get some sort of weak consistency in the system, you can use it. But there's always the question of which conditions you want to avoid; that's what constitutes the meaning of weak consistency, that you avoid those [INAUDIBLE] conditions.

So Peter Deutsch wrote down some of the assumptions that people make about large cloud and large distributed systems. That the network is reliable. That latency is pretty much zero, which of course it is not. That bandwidth is infinite, and of course it's not, because our networking uses wires and they're restricted by the speed of light; and not only that, the bandwidth is often shared between lots of users, so it's even more restricted. That the network is secure. That the topology doesn't change; and of course, failures do change topologies, so you do need to worry about topology. That there is one administrator, rather than two administrators deciding what to do and coming into conflict. That transport cost is next to nothing; but again, with bottlenecks and so on, transport costs can be a real challenge. And that the network is homogeneous: everything works in the same way, at roughly the same speed, with roughly the same properties. If you make those kinds of assumptions, because right now you see a working system, you fall into a number of fallacies, and those fallacies can lead you to make mistakes in implementing the system.

So if you allow unbounded traffic, for example, you can increase the number of dropped packets and the amount of wasted bandwidth on the network, you can increase the latency and increase packet loss, and that will greatly affect your applications. Next, network security: believing that everything works well and is all safely contained allows malicious users to probe the security measures you've got, adapt to what's there, and bring your system down, because you didn't expect the system to behave in a particular way. When you've got multiple administrators, they may not necessarily agree; they may have conflicting policies, or they may compete over bandwidth, so you have problems with multiple authorities controlling the system. When you're building and maintaining subnetworks, there can be large hidden costs, such as copying data in and out of memory and so on, so understanding that is important. And then there are bandwidth limits on where traffic concentrates: creating bottlenecks over frequency-multiplexed media, or bringing lots of data into one particular server, can overflow queues and create huge, long queues that add to latency. All of these things can challenge those four attributes that you would like out of the system.
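To make the caching idea concrete, here is a minimal sketch of weakening consistency for a faster response; CachedReader, max_staleness, and the backing-store callable are assumed names for the example, not from the lecture. Reads within the staleness bound never touch the network, and, because the network is not reliable, a timeout falls back to the stale entry rather than failing:

```python
import time

class CachedReader:
    """Serve reads from a local cache; only consult the slower,
    less reliable backing store when an entry is too old."""

    def __init__(self, backing_store, max_staleness=5.0):
        self.backing_store = backing_store  # callable: key -> value
        self.max_staleness = max_staleness  # seconds of staleness we accept
        self.cache = {}                     # key -> (fetched_at, value)

    def read(self, key):
        entry = self.cache.get(key)
        if entry and time.time() - entry[0] < self.max_staleness:
            return entry[1]                 # fast path: possibly stale
        try:
            value = self.backing_store(key) # slow path: authoritative
            self.cache[key] = (time.time(), value)
            return value
        except TimeoutError:
            # The network is not reliable: prefer a stale answer
            # over no answer at all, if we have one.
            return entry[1] if entry else None

reader = CachedReader(lambda key: "value-of-" + key)
print(reader.read("x"))  # fetched from the backing store
print(reader.read("x"))  # served from cache for the next five seconds
```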
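Before Brewer's conjecture is stated more carefully in a moment, here is a minimal sketch of the trade-off it points at, under the simplifying assumption that a replica knows whether it can reach its peers; the class and its CP/AP mode labels are invented for illustration. Once partitioned, a replica must either refuse requests to stay consistent, or answer from its local copy to stay available:

```python
class PartitionedReplica:
    """Sketch of the CAP trade-off: once this replica is cut off from
    its peers, it can stay consistent or stay available, not both."""

    def __init__(self, mode):
        self.mode = mode        # "CP" or "AP": which guarantee to keep
        self.value = None       # this replica's local copy of the data
        self.connected = True   # can we still reach the other replicas?

    def read(self):
        if self.connected:
            return self.value   # no partition, so no trade-off needed
        if self.mode == "CP":
            # Consistent but not available: refuse, rather than risk
            # returning data the other side may have overwritten.
            raise RuntimeError("partitioned: refusing a possibly stale read")
        # Available but not consistent: answer from the local copy,
        # accepting it may be stale until the partition heals.
        return self.value

replica = PartitionedReplica("AP")
replica.value = 42
replica.connected = False    # the network breaks in two
print(replica.read())        # AP mode: responds, possibly with stale data
```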
So Brewer's Conjecture, which was stated very, very informally, is the CAP theorem, and it was later shown to hold in theory. It's known as Brewer's Theorem, and it states that it's impossible for a distributed system to provide all three of the following guarantees: consistency, in the sense of transactions, where all the nodes see the same data at the same time; availability, the guarantee that every request receives a response about whether it succeeded or failed; and partition tolerance, so that if the network breaks in two, the system can survive that sort of condition. And as a consequence, we've introduced components into big data systems, into cloud systems, into big data centers, to overcome the problems associated with Brewer's Conjecture, and they do that by implementing eventual consistency. In the next slides, we will talk about how that's done. [MUSIC]