Hi and welcome to relational databases. In this class, we're going to teach you about Postgres. Postgres is a really cool database. It's both open source, but it has a lot of features. So you might wonder why it is that we are so obsessed with teaching about databases. And for those of us who've been around a long time and started like in the 70s when I started, it's quite amazing how fast databases are. We just use these things so much and we take them completely for granted. When you log into a system like your learning management system, that system probably has 50 terabytes of data and you type your name and your password, and a quarter of a second later it knows what classes you're in, and what your notifications are and all these things. And you probably don't even wonder how it is that it can go through terabytes of data in a quarter of a second. And it wasn't always that way. And so I'd like to start with a bit of a history lesson talking about what things were like before databases. And honestly, you kind of have done this kind of work if you learned Python and you've done simple data mining and we said read this file and do something. So the problem that we have is it's not so bad to read a file, although in the old days we had to use magnetic tapes. Let's see if I got this right. Magnetic tapes. So this right here is a magnetic tape and in the old days we didn't have spinning disks that could hold all the data for an organization. The memory of the computers was small, the disk drives were small, and so we stored data on tape, and the problem that tape has is it's a linear thing and if you're looking at the tape here, it takes seconds or minutes to move to look at the data here. So the biggest delay that we had if we had to store a lot of data was the access time to go from one piece of data to another piece of data. And so we had to still solve the problem like if you're at a bank of people sort of depositing and withdrawing money from the bank and then your database is on this sequential media. This tape that takes perhaps an hour to go from one end to the other. And so we came up with this idea that we would put all of your old data on a tape sorted by account number and then we'd have a printout in each bank that had your balance as of midnight last night and you would come in and you would do a deposit. They would fill a loads deposit slip and you put your account number on it and you might take out a withdrawal. They might check the little piece of paper to see if you had the right balance, had enough balance to withdraw. But then you'd end up with these transactions and these transactions are the daily activity. And then what we would do there's this thing called a card sorter. We would literally punch them on cards and then we would physically sort them so that they were in an ascending order. So that the lowest account number was at the beginning of the cards and at the beginning of the tape. And then at night after the bank closed, we would have all these things sorted and we would mount one of the tapes on one tape drive, and then we would kind of pull it through the computer. I think this is a card reader here. Pull it through the computer and then write a new tape. So that's what we would do. And so this new one. But we took advantage of the fact that the data was sorted and we kept the data sorted forever all the time. So the way it would work is a program would read one record from the tape and it would read one record from the cards from the transactions and it would compare them. And if this card record was greater than the first tape record, which it most likely was, it simply copied the record and then it went to the next record and then it would read the next record, and at some point it would copy enough records to the point where it got to that first record and these two match. At that point you update and put the new record in and then you advance to the next transaction in your input data. And you'd see that then you are advancing through this when they match, but you'd never go farther. You'd never go ahead. You always look at the current transaction. So if you think about the amount of data that actually had to be stored inside the computer's memory at any given minute, this could be hundreds of thousands of records. This could be thousands of transactions. But literally the amount of data you had store in the memory of your computer was one transaction. One transaction from the tape and one transaction from the input. And we called this thing a sequential master update and the way it ultimately worked was you would do this sort of blending of these things. They would be sorted and they would end up with a new sorted one, and then you would take this and the next night you'd make that be the old sorted one and then the next day's transactions. So we had these tapes lined up in tape racks. This was last night, and the night before, and the night before, and the night before, and the night before, and then there's the tomorrow. Day after tomorrow and a backup of two weeks ago and that's how we handled data that needed to be modified and searched. Okay? And so that's the way it was. But of course that's not true today. Today, we have disk drives. Or even faster, we have SSDs, solid state disks, that are even faster. And so the key thing that happened in disk drives or solid state disks was that your data is still sort of permanently magnetically stored somewhere and there's a series of rings here on this disk drive and you had this little servo that would move the head in and out. And so if you look at disk drives, and disk drives are becoming more and more rare, there was the speed at which the head could move and the rotational delay and so the revolutions per minute of a disk drive and the access time. Those are the two things that we categorized disks. And so as these became more common and the size of disk drives got bigger way faster than the memory, and so you literally could store all of the account data on a disk drive. Now, disks were faster than tapes and you can imagine that the first thing we would do is we would do like a sequential master update. So you'd have like your old data somewhere. Then you'd have your new data and you'd read through it slowly but surely and read all the transactions, except that you're not really taking advantage of the fact that this literally can move from, and if look at all of your data, all your data sitting here, you could move from here to here in 100th of a second or several thousandths of a second. So you didn't have to have any delay and the distance here didn't so much matter. It mattered a little bit if it was kind of coming around, but that's a subtle optimization, meaning that you could literally if you could think of this as your one terabyte of data, this whole thing, you could hop anywhere, no matter if it was close or it was far. But the cost to go from one piece of data to the next was the same. So then the question comes, how would you build software so that now we can make it so that your account balance is updated the instant you take the money out or make the deposit? And sequential master update is not best idea, although sorting still turns out to be important. And so in the '60s as these technologies and then the '70s as these technologies became common, a lot of companies started building the notion of databases. And we've ended up with the results of decades of research in these amazing pieces of software that we call relational databases. The relational databases are instead of reading the data sequentially, you know how to properly bounce around in those data. So we come up with this data that ends up being stored as a network of data on this disk drives that you can bounce through really fast. So it is an amazing thing. It's grounded in mathematics and there is mathematics that make these relational databases that are the reason they work so well. But it took a long time before we had software that's sophisticated enough to meet all the needs. Now, as these databases evolved in the '60s and the '70s, there became like vendors. And these vendors were sort of powerful, like IBM and Burroughs and others, and some companies that don't even exist anymore. And they came up with strategies for building a database. And each company sort of had brilliant people who were building really cool databases and then they would get something cool working and then they would try to convince everybody to use IBM hardware or Burroughs hardware. And if you bought IBM hardware, you had to use the IBM database. And if you bought Burroughs hardware, you had to use the Burroughs database, and it turned out that they didn't really have a common model. So IBM might try one strategy and this is not necessarily because they're bad. It was because everyone was researching it. So at some point there was the need to standardize on databases and the standard that now we use and take for granted that came out of the National Institute of Standards and Technology, NIST, is called SQL, Structured Query Language or some in the old days I think they call it Simple Query Language. And the interesting thing, and you can see this video from Elizabeth somewhere else in this course, it was perfectly timed, meaning that the vendors had built things that they knew were good, they knew good ideas, and they had good ideas and there were different ideas from different vendors that were all pretty good. But yet they had not gotten so mature that they were just going to fight each other. And so the National Institute of Standards and Technology basically said to all the vendors, "Look, we're not going to tell you, you know more about databases than we do, we're not going to tell you how to build a database. What we're going to tell you is we're going to stop buying your software unless you come up with a standard and we'll help you with meetings and we'll run the meetings and take notes. And eventually there will be this standard." And this SQL, the Structured Query Language, came out of that standard. We were present at the beginning of it. SQL is a beautiful thing. It is a beautiful programming language. It is my favorite programming language and when I'm not allowed to use it because I'm using an Object Relational Mapper or some magic thing that makes SQL easier, I'm like Stop it. SQL is beautiful. And the reason that SQL is so beautiful is that it's non-procedural. It's the most powerful language that I've ever used that's non-procedural. And what's the difference between procedural and non-procedural language? A procedural language you say start here, do this next, do this next, do this next, oh wait, go back up, do this next, do this next, go back up. Okay we're done here. Now oh skip this part, do that. All that sequence, all that like I call it like GPS navigation, turn left, turn right, go straight. That's procedural. And actually that's how computers really work and so at some level you still have to write procedural. But what SQL does is says, You know what? I've got these tables of this stuff over here, please assemble it to me. I don't really care how, what you do first or second. It doesn't matter. All I want is I expect this is what I express the fact that this is what I want, and then the database system completely optimizes that and it is an abstraction. Our expression of what we want is very different than the action that is taken by the database system to assemble that. And it can actually watch what I'm asking for and it can optimize later and move things around. And some databases systems do that. They actually figure out the pattern of usage and then reorganize the data, so that the kind of things that I'm asking for are handled more efficiently. And so the key thing about SQL is it's not the way that they build databases, but it is the way that we talk to databases. And it's high enough and abstract enough that every one of the database vendors is able to build a really cool implementation and they can sort of compete with each other and build a better implementation. So if Postgres is better than Oracle or Oracle is better than Postgres, they can fight, they can compete. But then we simply use SQL to talk to it. Now, the SQL turned out to be quite early. And so there's a few extensions and we'll see those things but a lot of the core SQL is either the same or very, very similar. So the critical core bit, and we'll talk about this a lot, is called CRUD. The critical notion is that there are four basic things that a system to store data is you're going to create some data and put it in, you're going to pull some data back out, you're going to change data, and then you may delete data. So that's the thing that we're going to focus on making fast and simple to express. So as I mentioned, databases have a mathematical basis. The relation is like a math term, so relational databases is like, I hear that as like databases that have the theory of math built into them. And if you read database documentation, you can tell whether they were written by someone who's kind of a software developer or someone who's more of a software "theory" person. So the theory people will say things like relations and tuples and attributes, and people like me who are programmers will call them tables, rows, and columns. In there is the truth and don't feel bad if you see weird words that describe something to you that seems very, very familiar. So ultimately, I said earlier that the data is modeled as a network but it's also modeled as a connection. And so it's not just like a road system. It is more sort of an abstract connection point. The combination of a row and a column has got data at it. And we have attributes and rows and columns and tuples and relations. We're not going to worry too much about that because we just use this software and it's magical. I talk about this a little bit just to appreciate those that came before us that built this wonderful software that we're going to take advantage of. As a programmer, I think of this as just like a spreadsheet. The tables are like the little tabs across the bottom. The columns are the attributes and the rows are the tuples and in a spreadsheet we commonly have that first row have some metadata about the rows, and sometimes even if you're sorting a spreadsheet, it's like do you want to not sort the first row because it sure looks like the names of things and that's actually something that we do inside of databases as well. We call that the schema and it's much more formalized than hey let's use the first row to represent the metadata about what's in the row. But really you can think about these things as a large spreadsheet as a software developer and that's the beauty of the abstraction, is that you just sort of say, you know what, Oracle is a multi-billion-dollar company and they spent decades building something really fast. I'm going to pretend it's a big fast spreadsheet and that's how it makes our lives easier as a software developer. So I've already mentioned Oracle. There's a lot of different database systems out there and I've taught most of these and on this screen I've used every single one of these at one point or another in my career. And that's one of the cool things about SQL, is that you learn like 80 percent of it, 60 percent of it, and literally in all these things it's the same. I've taught SQLite to use in Python. I've taught MySQL to use in my Web Applications for Everybody class, and I'm teaching Postgres, and I've used Oracle professionally and I've used SqlServer, professionally. And HSQL we use in open source projects. So everything on it, I have used every one of these. So, SqlServer. I'll start from bottom. SqlServer from Microsoft. It's one of Microsoft's very strong products and so don't feel bad if you end up on Microsoft. I usually feel bad when I'm using Microsoft stuff. But the Microsoft SqlServer is a solid piece of software. MySQL used to be my favorite but Oracle bought it. And so it's technically open source but there is a certain fear in the marketplace that Oracle is potentially going to like bend it towards commercial and say "Well, here's the crappy open source version and if you want the speedy version, that you have to pay for." They haven't done that yet. Oracle is the gold standard. It is large, it's commercial, it's enterprise scale, it's very tweakable. Among all these it's by far the most annoying and complex to maintain. I dislike Oracle. I mean, I've run all these and every time I touch Oracle it's like, "I got to write myself long documentation about how to do the simplest of things in Oracle." I'm sure if you're an Oracle expert, you're like "Oh, but it's so easy." I'm like "Yeah." The other to me Oracle is harder than all the other ones combined together as far as I'm concerned. But in commercial situations, it's what's going on. Now Postgres is the one I am teaching most recently. As a matter of fact, right now. I never have used Postgres professionally. I've never used it in an open source project. But because of the questions that have been raised about MySQL, I think the market is gently drifting toward PostgreSQL. So companies that I know are going to Postgres because it's free and open source and it doesn't have the Cloud the way MySQL does. And it has a lot of features, kind of like Oracle. And so historically, Postgres has more features. Oracle and Postgres always had more features. MySQL is catching up but in the catching up you might have features that only come in a commercial version of it. So it's like let's just use PostgreSQL going forward. And so that's why I'm teaching you because what I want to teach you in this class is some of the more advanced features that are part of PostgreSQL and I want to teach you an open source product. So we're learning PostgreSQL. It's all open source. There's unlikely that it'll be anything but open source going forward and it's a very feature-rich database. So coming up next we're going to talk about SQL Architecture and how it is that we can start writing SQL statements.