You know by now how to deal with sequences and sets. In this session, we'll get to know the third fundamental collection type, namely maps. The class Map[Key, Value] takes two type parameters, a key type and a value type. It is a data structure that associates keys of type Key with values of type Value. Here are two examples of maps. The first, romanNumerals, is a map that goes from the letter I to 1, the letter V to 5, and the letter X to 10. You can extend that to further letters, of course. That's a map with strings as the key type and numbers as the value type. The second map we have here is capitalOfCountry. That's written Map("US" -> "Washington", "Switzerland" -> "Bern"). That's a map from strings to strings: the key type and the value type are both String.
Maps are a special subtype of iterables. The type Map[Key, Value] extends the collection type Iterable[(Key, Value)], an iterable of pairs of keys and values. Since maps are iterables, they support the same collection operations as other iterables do. For instance, we can map a map itself. We can write val countryOfCapital = capitalOfCountry.map((country, capital) => (capital, country)). The function takes the key and the value, so that's the country and the capital, and it swaps the two. Now we have the capital first and the country second, and that gives the new map countryOfCapital. If we print it out, it's Map(Washington -> US, Bern -> Switzerland).
Note that maps extend iterables of key-value pairs. In fact, the syntax key -> value is just an alternative way to write the pair (key, value). The right arrow here is implemented as an extension method in Predef, which is an object that's implicitly imported into every Scala program. So the right arrow is not special syntax; it's essentially just another method definition. You can think of the extension method as being defined like this: extension [K, V](k: K) def ->(v: V) = (k, v). It takes arbitrary type parameters K and V, a key k of type K, and a value v of type V, and it returns the pair of k and v.
We've seen that maps are iterables and we've put that to good use. But maps are also functions.
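As a minimal sketch of the examples so far, the two maps and the key-value swap could be written as follows. The explicit type annotations and the names arrowPair and plainPair are additions for illustration, and the pattern-matching form `case (country, capital)` is used here only because it also compiles on older Scala versions.

```scala
// The two example maps from the lecture.
val romanNumerals: Map[String, Int] = Map("I" -> 1, "V" -> 5, "X" -> 10)
val capitalOfCountry: Map[String, String] =
  Map("US" -> "Washington", "Switzerland" -> "Bern")

// A map is an Iterable of pairs, so `map` can swap key and value.
val countryOfCapital: Map[String, String] =
  capitalOfCountry.map { case (country, capital) => (capital, country) }

// `k -> v` builds the same value as the tuple (k, v).
val arrowPair: (String, String) = "US" -> "Washington"
val plainPair: (String, String) = ("US", "Washington")
```

Both pair values compare equal, which is why a Map can be built from either notation.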
The class Map[Key, Value] also extends the function type Key => Value, so that means that maps can be used everywhere functions can. In particular, maps can be applied to key arguments. We can write capitalOfCountry("US") and that would give you "Washington". What happens if you apply a map to a key that's not defined in the map? Well, applying a map to a non-existing key gives you an error. If you tried capitalOfCountry("Andorra"), you would get an exception, a java.util.NoSuchElementException, which would tell you furthermore that the key Andorra was not found. You should use the application syntax for maps only if you're sure that the key is in fact in the map. But in most cases, you're not.
To query a map without knowing beforehand whether it contains a given key, use the get operation, which doesn't throw an exception if the key is not in the map. Here's how it's used: capitalOfCountry.get("US") gives you a value Some("Washington"). It says there's something there, and it's Washington. If you say capitalOfCountry.get("Andorra"), then you get a different value, which reads None.
The result of a get operation, which has these values Some and None, is of type Option. The Option type is defined as follows. It's a trait Option[+A] with a single type parameter A, which is covariant. It has two cases, Some and None. Some is a case class that takes a value of type A and extends Option[A]. None is an object and it extends Option[Nothing]. An expression map.get(key) returns None if the map does not contain the given key, and Some(x) if the map associates the given key with the value x.
How do you work with options? Well, since options are defined as case classes, they can be decomposed using pattern matching. Here's a typical method. Let's say we want to show the capital of a given country and not fail with an exception if the country is actually not in the map.
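The difference between applying a map and calling get can be sketched like this. The value names (direct, found, missing, failed) are made up for illustration, and wrapping the failing call in scala.util.Try is just one way to observe the exception without crashing; it is not part of the lecture.

```scala
import scala.util.Try

val capitalOfCountry = Map("US" -> "Washington", "Switzerland" -> "Bern")

// Application syntax: fine when the key is known to be present.
val direct: String = capitalOfCountry("US")

// get returns an Option instead of throwing.
val found: Option[String] = capitalOfCountry.get("US")        // Some("Washington")
val missing: Option[String] = capitalOfCountry.get("Andorra") // None

// Applying the map to a missing key throws; Try captures the exception here.
val failed: Try[String] = Try(capitalOfCountry("Andorra"))
```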
Here we would say capitalOfCountry.get(country) match. If we found the capital, then we report that. If we found nothing, then we return "missing data" instead. With that, showCapital("US") would give us "Washington", and showCapital("Andorra") would give us "missing data". Options also support quite a few operations of the other collections, even though they are not a collection type: common to all collections is that you can add an arbitrary amount of data to them, and since an option has only zero or one element, it's not a collection in that sense. But it supports quite a few operations such as map, fold, or filter. I invite you to look at the Scaladoc page for Option and try them out.
You might ask, why does capitalOfCountry, or any map really, return an Option in the get method? In some other languages, it would return either the capital or null in case of a missing key. Why not use null? It turns out that null is actually really dangerous, because if any value can be null, then you never know beforehand whether a certain operation on that value is defined or not. If the value is null, you get a NullPointerException. That's why the inventor of null, Tony Hoare, who was also the inventor of Quicksort and many other things, called it his billion-dollar mistake. In Scala, null is actually available. We need it for interoperation with Java, but it's generally considered bad style to use it. This course will generally not use null and will replace it with safer alternatives such as Option.
Why is Option safer than null? After all, with an option, too, I have either something or nothing. Well, it's safer because the types force you to handle both cases. If you do capitalOfCountry.get(country), the result doesn't pretend to be a String. It's an Option[String], and you have to process it further, typically by pattern matching on the Some case.
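The showCapital method described above could look roughly like this. The second method, showCapitalUpper, is a hypothetical addition that is not in the lecture; it only illustrates that Option supports collection-like operations such as map.

```scala
val capitalOfCountry = Map("US" -> "Washington", "Switzerland" -> "Bern")

// Pattern match on the Option returned by get, covering both cases.
def showCapital(country: String): String =
  capitalOfCountry.get(country) match {
    case Some(capital) => capital
    case None          => "missing data"
  }

// Hypothetical variant: transform the value inside the Option,
// then fall back to a default when nothing is there.
def showCapitalUpper(country: String): String =
  capitalOfCountry.get(country).map(_.toUpperCase).getOrElse("missing data")
```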
If you forget the None case, then the compiler will complain and say the pattern match is not exhaustive. It will essentially nudge you to really handle both of the cases; that's what makes Option so much safer than null.
Back to maps. We've seen how we can query a map and how we can define a map from a sequence of key-value pairs. But can we also update a map? In fact, yes. We can have functional updates of a map, and they are done with the + and ++ operations. The operation m + (k -> v) gives you the map that takes key k to value v and is otherwise equal to m. So k might already be defined in m, in which case its binding is overridden and the new binding maps k to v, or it might not yet exist in m, in which case we add a new binding for k. Then there's also a bulk operation, m ++ kvs. Here kvs is itself a collection of pairs of keys and values, and the result is the map m updated via + with all the pairs in the kvs collection. Of course, these operations are purely functional. They don't change the map m; they create a new map. For instance, say I define a map m1 = Map("red" -> 1, "blue" -> 2) and I add "blue" -> 3 to that map with +. Then I get a new map where red maps to 1 and blue now maps to 3. But the old map m1 is still the same: red maps to 1, blue maps to 2.
You might wonder how we can implement maps like that, so that the old map stays in place while we create the updated one. Well, for small sizes, maps are essentially single objects and we copy the whole object; that holds for sizes up to four. For larger sizes, we essentially use a scheme similar to the vector scheme.
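The functional update operations described above can be sketched as follows. The map m1 is the one from the lecture; m2, m3 and the extra bindings passed to ++ are made up for illustration.

```scala
val m1 = Map("red" -> 1, "blue" -> 2)

// + returns a new map; the existing binding for "blue" is overridden.
val m2 = m1 + ("blue" -> 3)

// ++ applies + for every pair in the given collection.
val m3 = m1 ++ List("green" -> 4, "red" -> 0)

// m1 itself is unchanged by both operations.
```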
We have essentially arrays of arrays of a shallow depth, up to five, and in each array we have key-value pairs, and we use a hash function to select either the right sub-array or the right element in that array. Similar to vectors, if we update, the cost is proportional to the depth of the tree, which grows logarithmically with the number of elements. Basically, we copy between one and five of these sub-arrays, which is still a reasonably bounded cost for updating a map.
If you know SQL from databases, then maybe by now you have realized that a lot of the collection operations we have really correspond to what you can do in database queries as well. That's no accident, because both of them are essentially rooted in the same theory of relational calculus. Two other common operations from SQL queries are groupBy and orderBy. OrderBy we don't have directly on collections, but it can be expressed using the functions sortWith and sorted. Let me present them with an example. Let's say we have this fruit list, and then we write fruit.sortWith(_.length < _.length). That sorts the list so that shorter strings appear first. Consequently, that gives you the list pear, apple, orange, pineapple. There's also another function, which is written sorted. fruit.sorted sorts with the standard comparison function for the underlying type. Here, the standard comparison function for String is lexicographic ordering. Essentially, we compare the first letters; if they are the same, we compare the second letters, and so on. That gives you this ordering instead: apple comes first because it starts with an a, then orange, pear, and pineapple.
So much for orderBy, or rather sortWith and sorted. What about groupBy? groupBy is directly available on Scala collections. What it does is partition a collection into a map of collections according to a discriminator function f. Again, it's best explained with an example.
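Before the groupBy example, here is a small sketch of the two sorting calls just described. The contents and order of the fruit list are not spelled out in the transcript; they are inferred from the results it quotes, so treat them as an assumption. The names byLength and alphabetical are made up.

```scala
// Assumed contents, reconstructed from the results quoted in the lecture.
val fruit = List("apple", "pear", "orange", "pineapple")

// Sort with an explicit comparison: shorter strings first.
val byLength = fruit.sortWith(_.length < _.length)

// Sort with the standard (lexicographic) ordering on String.
val alphabetical = fruit.sorted
```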
Let's say we have the fruit list and we want to group it by its head, so we write fruit.groupBy(_.head). That means we take the head of each element, its first character. Here we have an a, a p, an o, and another p, and we create a map from these heads to the lists of original elements. What we get here is that p maps to the list of pear and pineapple, a maps to a single-element list, apple, and o maps to a single-element list, orange.
Let's use these operations in an extended example, which is to construct a class for polynomials. One way to look at a polynomial is as a map from exponents to coefficients. For instance, the polynomial x^3 - 2x + 5 could be expressed with the map that says exponent 0 maps to 5, exponent 1 maps to -2, and exponent 3 maps to 1. Based on this observation, let's design a class Polynom that represents polynomials as maps.
In fact, there's one other operation that comes in handy for this use case, and that has to do with default values. We've seen previously that maps are partial functions: applying a map to a key can lead to an exception if the key is not stored in the map. But for polynomials that's actually not very useful, because it does make sense to ask, well, what is the coefficient of exponent 2 for a polynomial? Even if it's not given, you could just say it's zero. Basically, you could say x^3 + 0x^2 is the same thing as x^3. We can treat missing coefficients as just coefficients of zero. The fact that these maps are partial and might fail is actually inconvenient. We would like the map to just give us back a zero for the cases where it's not defined, and we can do that with the operation withDefaultValue. The operation withDefaultValue turns a map into a total function.
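Two ideas from this passage, the groupBy call and the map view of x^3 - 2x + 5, can be sketched as follows. The fruit list contents are again an assumption inferred from the quoted results, and the names byHead and poly are made up for illustration.

```scala
// Assumed contents, reconstructed from the results quoted in the lecture.
val fruit = List("apple", "pear", "orange", "pineapple")

// Partition the list by the first character of each string.
val byHead: Map[Char, List[String]] = fruit.groupBy(_.head)

// x^3 - 2x + 5 as a plain map from exponents to coefficients.
// Exponent 2 has no entry, so this map is still a partial function.
val poly: Map[Int, Double] = Map(0 -> 5.0, 1 -> -2.0, 3 -> 1.0)
```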
Here we have capitalOfCountry, which was a normal map that could fail, and we can give it a default value, or rather we can create a new map cap1 that is the old capitalOfCountry map but now with the default value "unknown". That means we can now apply cap1 to "Andorra" directly, and we just get back the string "unknown" instead of an exception that an element was not found. With these tools, I think we have enough now to embark on the design of the polynomial class.
Okay, I've given you the outline of the class Polynom. It takes a Map[Int, Double] as its parameter, which is a normal map that I call nonZeroTerms here. Essentially, those are the terms, consisting of exponent and coefficient, whose coefficients are not zero. The map maps the exponents 0, 1, 2, 3, and so on to the coefficient, which is a Double, and we have terms, which is nonZeroTerms with a default value of 0.0. That should actually be a val, because we will probably use it several times. Then we want to define two operations. One is the addition, +, which adds two polynomials, giving a polynomial, and the other is toString. That should show a polynomial in a nicely formatted way.
Let's do toString first. Let's start with something really simple. Let's say the toString of a polynomial is the toString of its terms map. We can test it by creating a polynomial. Let's say this one here. It indeed prints as the map in the way we would expect, but that's of course not very nice, because the toString method exposes the internal implementation of a polynomial in terms of a map. We would rather want to see something that has recognizable coefficients and exponents. How do we do that? One thing we could do is first define the right coefficient-exponent string for each of the terms that we have in the map, and then create a string that just concatenates them all with pluses in between. Let's do that.
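As a sketch of where things stand at this point, here is the default-value map and a first outline of the class with the naive toString. The coefficients of the test polynomial p are an assumption based on the output quoted later in the lecture, and `new` is written out only so that the sketch also compiles on older Scala versions.

```scala
val capitalOfCountry = Map("US" -> "Washington", "Switzerland" -> "Bern")

// A new map that answers "unknown" for missing keys instead of throwing.
val cap1 = capitalOfCountry.withDefaultValue("unknown")

// First outline: the naive toString simply exposes the underlying map.
class Polynom(nonZeroTerms: Map[Int, Double]) {
  val terms: Map[Int, Double] = nonZeroTerms.withDefaultValue(0.0)
  override def toString: String = terms.toString
}

// Assumed test polynomial: x^2 + 3x + 2.
val p = new Polynom(Map(0 -> 2.0, 1 -> 3.0, 2 -> 1.0))
```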
We improve that as just described, so I delete the old version and write the new one. What happens here is we take the terms, we convert them to a list, and we sort it. That gives us the lowest exponent first. Then we reverse the sorted list. That gives us the highest exponents first. Now we let exponent and coefficient range over those terms. We print the exponent as x^exponent, we print the coefficient as it is, and then we concatenate the whole thing with pluses in between. Now my polynomial reads like this: 1.0x^2 + 3.0x^1 + 2.0x^0.
Now there's room for improvement. In particular, x^0 is just 1, and we typically don't print that. Let's try to change that. We simply say, if the exponent equals 0, then we don't print an exponent, and otherwise we print it as before. So that one looks better now. Another thing to improve would be what happens if we take the zero polynomial. That would be the empty map, with no coefficients at all. That would give us the empty string, which is not ideal. What do we do about that? We have to create another special case and say, well, if the terms map is empty, so there are no coefficients at all, then let's print a 0 as the only element, and otherwise print termStrings.mkString(" + ") as before. Now we have handled that as well.
A third improvement, which is a little bit more involved, would be to handle negative coefficients gracefully. Let's say I actually have -3 here. What would I get? Well, I would get + -3.0x^1, which is legible. But it could be improved by removing the minus here and replacing the plus with a minus. I leave that up to you as a polishing exercise. Let's turn now towards the plus operation.
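The toString with both special cases could be sketched as below. One deliberate deviation: the lecture sorts the list of pairs with sorted, while this sketch uses sortBy on the exponent, which gives the same order because exponents are unique and avoids relying on an implicit ordering for Double. The negative-coefficient polish is left out, as in the lecture.

```scala
class Polynom(nonZeroTerms: Map[Int, Double]) {
  val terms: Map[Int, Double] = nonZeroTerms.withDefaultValue(0.0)

  override def toString: String = {
    // One string per term, highest exponent first.
    val termStrings =
      for ((exp, coeff) <- terms.toList.sortBy(_._1).reverse) yield {
        // x^0 is 1, so the exponent part is omitted for exponent 0.
        val exponent = if (exp == 0) "" else s"x^$exp"
        s"$coeff$exponent"
      }
    // The zero polynomial has no terms at all; print it as "0".
    if (terms.isEmpty) "0" else termStrings.mkString(" + ")
  }
}
```

With the assumed test polynomial Map(0 -> 2.0, 1 -> 3.0, 2 -> 1.0), this prints 1.0x^2 + 3.0x^1 + 2.0.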
To add two polynomials, we have to combine their terms. If a term has an exponent which is only in one of the polynomials, we can just keep it as it is in the result. But if a term has an exponent that figures in both this polynomial and the other polynomial, then we have to add the coefficients. Here's a way to do this; let's analyze this expression in detail. We return a polynomial built from some map. That map is the current map, terms, updated via ++ with a second map. The second map consists of the terms of the other polynomial, mapped so that for each exponent with its coefficient in other.terms, we take the exponent together with the coefficient plus the value of the current map terms at this exponent. So we add up the coefficients of both maps. We do that for all terms that are in both this map and the other map. If a term is only in the other map, then terms(exponent) is zero, so that effectively returns exactly the same term that we had in the other map. If a term is only in terms, then it isn't updated; again, it stays as it is. We can test the operation by, for instance, taking x + x, or x + x plus a zero polynomial. That gives you a polynomial that has all coefficients doubled, as expected.
There's one further refinement I'd like to make. It's a bit awkward to have to write polynomials like this, because again, it exposes the implementation strategy, namely that we implement a polynomial in terms of a map. We have this nested call, Polynom(Map(...)), where it should be clear that if we want to create a polynomial, we want to create it with just the bindings that are listed here. The question is, could we get rid of the Map call and write Polynom directly with the bindings? For the moment, of course not; there's no such operation. The compiler says it found three pairs in parentheses where it needs a map. What we can do is give Polynom a secondary constructor that works with these parameters.
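The addition analyzed at the start of this passage can be sketched as follows. The class is cut down to what + needs, and the test values x, onlyHigh, doubled and extended are made up for illustration.

```scala
class Polynom(nonZeroTerms: Map[Int, Double]) {
  val terms: Map[Int, Double] = nonZeroTerms.withDefaultValue(0.0)

  // Take the other polynomial's terms, add this polynomial's coefficient
  // for the same exponent (0.0 if absent), and let ++ override our bindings.
  def +(other: Polynom): Polynom =
    new Polynom(terms ++ other.terms.map { case (exp, coeff) =>
      (exp, terms(exp) + coeff)
    })
}

val x = new Polynom(Map(0 -> 2.0, 1 -> 3.0, 2 -> 1.0))
val onlyHigh = new Polynom(Map(5 -> 1.0))
val doubled = x + x
val extended = x + onlyHigh
```

Terms that exist only on one side pass through unchanged: extended keeps the three terms of x and gains the exponent-5 term.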
What we would do here is define a secondary constructor, this, and that would take, well, what will it take? The problem is that we have to deal with a varying number of parameters. A polynomial could have any number of terms that should be given in this parameter list. For that we're going to introduce another Scala language feature that you haven't seen yet, and that's repeated parameters.
We've seen that it's quite inconvenient to have to write Polynom(Map(...)) everywhere. Can one do without the Map? The problem is that the number of key-value pairs passed to Map can vary. But we can accommodate this pattern using a repeated parameter. The idea here is that we can give a definition Polynom(bindings: (Int, Double)*) that takes a list of (Int, Double) pairs; the repetition is represented by the star here. A repeated parameter is given by a parameter type followed by an asterisk. What that gives you internally is a sequence, and what we can do then is convert the sequence to a map; there's a handy toMap function for that. We give it a default value of zero, and that creates a polynomial with the correct map. If we do that, then we can write Polynom and then just the bindings, for example three pairs. That matches the type (Int, Double)* and invokes this method definition of Polynom. Inside the Polynom function, bindings is seen as a Seq[(Int, Double)]. Generally, a repeated parameter like this produces a sequence of the element type that precedes the asterisk.
Back to the worksheet. In my worksheet, I'm going to do it slightly differently: instead of creating a separate def that creates a polynomial, I create a secondary constructor directly, which takes the bindings. With that secondary constructor, we now see that our polynomial syntax works and we don't need the intermediate Map anymore.
Here's an exercise for you. The plus operation on Polynom used map concatenation with ++.
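The secondary constructor with a repeated parameter, as described above for the worksheet, might look like this. The test values p and q are made up, coefficients are written as Double literals to keep the types explicit, and `new` is spelled out so the sketch also compiles on older Scala versions.

```scala
class Polynom(nonZeroTerms: Map[Int, Double]) {
  // Secondary constructor: any number of (exponent, coefficient) pairs.
  // Inside, `bindings` is a Seq[(Int, Double)], which toMap converts.
  def this(bindings: (Int, Double)*) = this(bindings.toMap)

  val terms: Map[Int, Double] = nonZeroTerms.withDefaultValue(0.0)
}

// No intermediate Map(...) call is needed any more.
val p = new Polynom(0 -> 2.0, 1 -> 3.0, 2 -> 1.0)

// The primary constructor still accepts a map directly.
val q = new Polynom(Map(1 -> 1.0))
```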
Design another version of plus in terms of foldLeft. That version would look like this: plus takes another polynomial, creates a polynomial, and essentially folds over the other terms with a start value that you have to fill in and an addTerm method. The type of the addTerm method is known from the outline here: it has to take a Map[Int, Double] and a term, which is an (Int, Double) pair, and it has to create the map that consists of the terms map overwritten with the additional term. Once you've solved the exercise, try to answer the following question. Which of the two versions do you think is more efficient, the previous version using ++ or the version using foldLeft?
I've added the outline from the slide to the worksheet. We have the plus operation, which is defined in terms of foldLeft, and we have the addTerm operation, which also remains to be defined. Let's try to fill in the triple question marks here first. If you do a foldLeft, what's the map we start with? That's our own map, which is terms. How do we add a term to a map? One way to do it is to first decompose the term: we say val (exponent, coefficient) = term. That just gives me a handy way to talk about the two parts. Then the result is terms plus a binding where the exponent maps to terms(exponent), which is what I have in the current map, plus the coefficient that I have been given here, and that should be it. Let's see whether my example still works. Yes, everything still works. Judging by these tests, the new implementation of plus is equivalent to the old one.
But is it faster or slower? I would think it's probably faster. The reason is that with the new implementation of plus, we go directly over the terms and do a single scan with the foldLeft. We add each term in a single scan.
Whereas the previous implementation built up a map of all the common terms, and building up a map is costly; then we were left with two maps, and those maps were combined with ++. This intermediate step of going through a second map probably costs some time. I would imagine that the foldLeft version is faster than the other.
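The foldLeft solution walked through above can be sketched as follows. The sketch relies on the fact that adding a binding with + to a map created by withDefaultValue keeps the default, so terms(exp) inside addTerm yields 0.0 for exponents not seen yet. The test values x, y and sum are made up for illustration.

```scala
class Polynom(nonZeroTerms: Map[Int, Double]) {
  def this(bindings: (Int, Double)*) = this(bindings.toMap)

  val terms: Map[Int, Double] = nonZeroTerms.withDefaultValue(0.0)

  // Start from our own terms and add the other polynomial's terms one by one.
  def +(other: Polynom): Polynom =
    new Polynom(other.terms.foldLeft(terms)(addTerm))

  // Overwrite the binding for the term's exponent with the summed coefficient.
  def addTerm(terms: Map[Int, Double], term: (Int, Double)): Map[Int, Double] = {
    val (exp, coeff) = term
    terms + (exp -> (terms(exp) + coeff))
  }
}

val x = new Polynom(0 -> 2.0, 1 -> 3.0, 2 -> 1.0)
val y = new Polynom(1 -> 1.0, 5 -> 4.0)
val sum = x + y
```

Whether this version is actually faster than the ++ version is the lecturer's expectation, not something this sketch measures.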