To conclude this week, let's put the pieces together. You're going to use what you've learned about collection operations, together with some of the other functional techniques, to solve a non-trivial programming problem. The task we want to solve is the following. Once upon a time, before smartphones, phone keys had mnemonics assigned to them. The mnemonics were always the same: each digit from 2 to 9 had a string of three or four letters assigned to it. The purpose of these assignments was to make phone numbers easier to remember. For instance, if I wanted you to remember the rather long phone number 722-524-7386, I could give you a phrase instead, in this case maybe the phrase "scala is fun." You would then type the digit corresponding to every letter of the phrase, and in the end that would give you the phone number that I wanted you to dial. You would type the letter S, which corresponds to the digit seven, so that gives you the first digit of the number. C and A both map to two, so that gives you 22. Then L maps to five, and so on; you get the principle. It's sometimes easier to remember a phrase like that than to remember a phone number. Of course, at about the time that phones had these mnemonics, there were no URLs or internet addresses, so the way to contact a business was to dial a phone number. That's why a lot of radio ads and the like gave you these mnemonic strings instead of the real numbers, which nobody can remember.
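As a rough sketch of the keypad mapping just described, the snippet below writes the standard letter assignment as a Scala map and turns a phrase back into digits. The object name PhoneKeys and the helpers digitFor and digitsFor are made up for this illustration; they are not part of the Coder class developed in this session.

```scala
object PhoneKeys {
  // Standard keypad assignment: each digit from 2 to 9 carries three or four letters.
  val mnemonics: Map[Char, String] = Map(
    '2' -> "ABC", '3' -> "DEF", '4' -> "GHI", '5' -> "JKL",
    '6' -> "MNO", '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")

  // Hypothetical helper: finds the key that carries the given uppercase letter, if any.
  def digitFor(letter: Char): Option[Char] =
    mnemonics.collectFirst { case (digit, letters) if letters.contains(letter) => digit }

  // Hypothetical helper: translates a phrase letter by letter,
  // silently skipping spaces and anything else that is not on a key.
  def digitsFor(phrase: String): String =
    phrase.toUpperCase.toList.flatMap(c => digitFor(c)).mkString
}
```

Under these assumptions, PhoneKeys.digitsFor("scala is fun") should evaluate to "7225247386", which is the number from the example.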
The aim now is that you are given a dictionary of words that you can use and a phone number, and you should encode the phone number, that is, give me all phrases of words that can serve as mnemonics for that number. Typically, there would be more than one phrase. I want them all so that I can pick the one which is nicest or cutest or easiest to remember. I've given you the outline of a class to do that here. We are going to proceed to the solution in four easy steps. The class is called Coder. It takes the dictionary, which is a list of strings, as a parameter. It has a field mnemonics, which is exactly as it was given before, so I haven't bothered to repeat the definition. Then we have four definitions that we should implement one by one.
The first, charCode, maps a letter to the digit it represents. mnemonics was a map from digits to letters in these strings; now we want to go the other way. That amounts to essentially inverting the map. We want to swap keys and values, but it's a little bit more complex because one half is buried in the string. What we do is we let digit and string range over all the pairs in mnemonics, and we let letter range over all the letters in string, and then we return the pair of letter to digit. Each letter that we find in the string corresponds to this digit, of course, so we just return these bindings as individual pairs.
Good. Now that we know charCode, let's scale that up to wordCode. We want to map a word to the digit string it can represent. Here is the implementation. Essentially we map each letter in the word with charCode. For each letter in the word, we want to know the digit it can represent, and we want to concatenate all these digits, which gives us the digit string that the word as a whole represents. Before doing that, however, we should convert all the letters in the word to uppercase, because charCode is only defined for uppercase characters. That leads to this right-hand side here.
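The two definitions just described could look roughly like this. This is a sketch rather than the exact slide code: the wrapper object Codes is a name invented for the example, and the mnemonics map is assumed to be the standard keypad assignment.

```scala
object Codes {
  // Assumed keypad assignment (the field the transcript calls mnemonics).
  val mnemonics: Map[Char, String] = Map(
    '2' -> "ABC", '3' -> "DEF", '4' -> "GHI", '5' -> "JKL",
    '6' -> "MNO", '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")

  // Inverts mnemonics: for every (digit, string) pair and every letter
  // in the string, emit the pair letter -> digit.
  val charCode: Map[Char, Char] =
    for ((digit, str) <- mnemonics; letter <- str) yield letter -> digit

  // Maps a word to the digit string it can represent.
  // charCode only knows 'A' to 'Z', so the word is uppercased first;
  // any other character would make the lookup throw.
  def wordCode(word: String): String =
    word.toUpperCase.map(charCode)
}
```

With these definitions, Codes.wordCode("Scala") should give "72252". A Map[Char, Char] can be passed to map directly because a Scala map is also a function from keys to values.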
The third definition is wordsForNum. What we want is a map from strings to lists of strings that maps a digit string to all words in the dictionary that represent it. It's again, in a sense, the inverse of what we've just done. With wordCode, we are given a word and we know the digit string it can represent; now, given a digit string, we want all words that represent that digit string. Is there an operation in the collection library that corresponds to that? The answer is yes, that's a groupBy. It's words.groupBy(wordCode). Indeed, if we look at it, we say, okay, for each word, take the word code, that's the digit string represented by it. Then that becomes the key of the map. The value for that key is the list of all the words that have this wordCode; that's exactly groupBy. Furthermore, we want wordsForNum to be a total map. We want to say that even if a digit string doesn't appear in the map, we still want to map it to something. In this case, we map it to Nil. That means that the digit string doesn't correspond to any word in my dictionary.
Now we're just left with the encode method. encode is the method that does the main work. It takes a number, a digit string, and it should give us back a collection of phrases, each of which maps to the number. We want to have a set of solutions because we said we're interested in all the possible word combinations that map to the given number. Our result type is a set of lists of strings. It's a set of phrases, and each phrase is represented as a list of strings. To implement encode, we use a principle which is very common in functional programming and which, in fact, we have used time and time again without calling it as such. The principle is called divide and conquer. The idea with divide and conquer is that we split the problem into two or more easier sub-problems. We solve the sub-problems, and then we put the solutions back together for the final solution. What are two subcases for splitting a number?
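To make the groupBy step concrete, here is a small sketch of wordsForNum together with the definitions it depends on. The wrapper object WordIndex and the sample dictionary are assumptions made for the example; any list of words would do.

```scala
object WordIndex {
  // Assumed keypad assignment, repeated so the example is self-contained.
  val mnemonics: Map[Char, String] = Map(
    '2' -> "ABC", '3' -> "DEF", '4' -> "GHI", '5' -> "JKL",
    '6' -> "MNO", '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")

  val charCode: Map[Char, Char] =
    for ((digit, str) <- mnemonics; letter <- str) yield letter -> digit

  def wordCode(word: String): String = word.toUpperCase.map(charCode)

  // Sample dictionary, chosen only for illustration.
  val words: List[String] =
    List("Scala", "Python", "Ruby", "C", "rocks", "socks", "sucks", "works", "pack")

  // groupBy collects all words that share a digit string under that string;
  // withDefaultValue(Nil) makes the map total, so unknown digit strings yield Nil
  // instead of throwing.
  val wordsForNum: Map[String, List[String]] =
    words.groupBy(wordCode).withDefaultValue(Nil)
}
```

With this dictionary, WordIndex.wordsForNum("76257") should contain both "rocks" and "socks", while a digit string that matches no word, such as "0000", should give Nil.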
I guess the first distinction would be: is the number empty or not? We have to handle either case. If the number is empty, what do we return? We want to have a phrase that corresponds to the empty number. That's easy. That's just the empty phrase. We get a set consisting of a single solution, which is the empty list. What do we do if the number is not empty? We have to find another divide criterion to split the problem into two simpler sub-problems. One divide criterion could be to say, well, we have this digit string, let's say 722563. Then we want to say, well, how many digits do we use for the first word? There's a choice: it could be one, or it could be the whole digit string, or anything in between. That would be a possible split point, and we work from there. That leads to the following outline. We say we take the split point, which starts at one and ranges up to the length of the number. That gives us a point where we want to split it. Then we compute the word that covers the digits up to the split point. We compute the rest of the phrase, which covers the digits following the split point, and we compose the word and the rest together to form our solution. That's the outline. Why did I write .toSet here? Well, it's because, in the end, I need a set of lists of strings, whereas here I have a sequence. I start with a range, and that would get transformed into other sequences, normally vectors. But that's not what I want in the end. I could either have applied .toSet to the whole for-expression, which would have worked, or, as I did here, put the toSet immediately around the range. Now I'm working with sets here. In these for-expressions, essentially the collection you start with is also the collection of the result. The result in this case will then again be a set. But I still have two things to fill in, namely how to compute the leading word and how to compute the rest.
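The effect of the split point and of toSet can be tried out in isolation. The sketch below only enumerates the ways of cutting a digit string into a leading part and a remainder; the names Splits and splits are hypothetical and do not appear in the Coder class. The explicit [Int] on toSet is an addition that helps type inference in Scala 2 and should be harmless in Scala 3.

```scala
object Splits {
  // All ways to cut number into a non-empty leading part and the rest.
  // Starting the for-expression from a Set makes the result a Set as well.
  def splits(number: String): Set[(String, String)] =
    for (splitPoint <- (1 to number.length).toSet[Int])
      yield (number.take(splitPoint), number.drop(splitPoint))
}
```

For example, Splits.splits("722") should produce the three pairs ("7", "22"), ("72", "2") and ("722", ""). Dropping the toSet would yield an indexed sequence of the same pairs instead of a set.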
To fill in the first triple question mark, here's the solution: we take the digits of number up to the split point and pass them to wordsForNum. wordsForNum will give us back a list of words that correspond to these digits, and we let word range over the elements of that list, so each word is a possible solution for the first part of the phrase. How do we define rest? rest is just a recursive call where we now encode the number without the first splitPoint digits. What we've done is reduce the encode problem, by two case distinctions, to a simpler problem. One distinction was whether the number is empty or not, and the other was where we put the split point. The simpler problem is a recursive call of encode where the number is smaller: we have dropped at least one digit from our number, here in the argument to encode. To conquer is then essentially putting the things together. That was just word cons rest, that is, word :: rest.
Here's a little test program. We can have a function code which takes a number, and we build up a Coder with some sample words in the dictionary. We encode the number, and that gives us a set of lists. What we do is, for each element of the set, we form a phrase from the list by putting spaces between the words in the list. If you call code with a sample number, then that would be the resulting set that you get.
This example actually has a history. It was taken from a paper by Lutz Prechelt called "An Empirical Comparison of Seven Programming Languages." The choice of languages also shows when it was written. The task was for students to solve this problem in different languages. Each group of students had a different language, with several groups per language. The languages were Tcl, Python, Perl, Rexx, Java, C++ and C. Essentially, the questions were: how long are the programs? How correct are the programs? How many bugs? How fast do they run? The interesting bit was the code size of the solutions.
The code size medians were about 100 lines of code for the scripting languages, so essentially the first four here, and 200 to 300 lines of code for the others. What was also interesting was the runtime. You would naturally expect that C, C++, and maybe Java would run a lot faster than the scripting languages. If you take the average running speed of the solutions, then the big surprise was that the scripting languages were actually quite competitive. Why was that? Well, it was because scripting languages have built-in collection data structures like maps and lists and all the things we were using, whereas C++ and C did not, and Java had them only in a limited form. That meant that people using scripting languages were relying on the standard building blocks and not going terribly wrong, whereas in C and C++ people would tend to build their own data structures, and sometimes they would get it horribly wrong in terms of performance. That would give you very slow solutions that would then ruin the average. The fastest solutions, of course, were again in C and C++; that has to be said. But what's interesting in our case now is that we have something which performs very well and is even much shorter than the scripting-language solutions. We ended up with maybe 15 lines of code instead of 100. We have strong type safety and pure functional programming as additional bonuses. What we've seen is that Scala's immutable collections are very easy to use because there are only a few steps to do the job. We have powerful operations such as groupBy, where the same work in other languages sometimes takes a long sequence of steps. They're very concise: a single word, like map, replaces a whole loop. They're safe because the type checker is really good at catching errors. They're fast because collection ops are tuned and can also be parallelized, and they are universal, so there's one vocabulary to work on all kinds of collections.
We've seen that essentially the same words, filter and map, can work on sequences, all kinds of sequences, even degenerate sequences such as strings or [inaudible], but they can also work on sets and on maps. That's not the end of it. We'll see in the next units that this idea of collection-like structures actually extends to a lot more things than just a narrow notion of collections. We have essentially a universal vocabulary to attack many problems in a safe and fast way. That makes collections a very attractive tool for software development.
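To recap the four steps of this session in one place, here is one way the pieces could be assembled. Treat it as a sketch under a few assumptions: the dictionary passed to Coder and the object name CoderDemo are chosen for illustration, the mnemonics map is the standard keypad assignment, and the code uses a brace style (plus an explicit [Int] on toSet) so that it should compile under both Scala 2.13 and Scala 3.

```scala
class Coder(words: List[String]) {
  // Assumed keypad assignment.
  val mnemonics: Map[Char, String] = Map(
    '2' -> "ABC", '3' -> "DEF", '4' -> "GHI", '5' -> "JKL",
    '6' -> "MNO", '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")

  // Step 1: maps a letter to the digit it represents.
  private val charCode: Map[Char, Char] =
    for ((digit, str) <- mnemonics; letter <- str) yield letter -> digit

  // Step 2: maps a word to the digit string it can represent.
  private def wordCode(word: String): String = word.toUpperCase.map(charCode)

  // Step 3: maps a digit string to all dictionary words that represent it.
  private val wordsForNum: Map[String, List[String]] =
    words.groupBy(wordCode).withDefaultValue(Nil)

  // Step 4: all ways to encode a number as a list of words.
  def encode(number: String): Set[List[String]] =
    if (number.isEmpty) Set(Nil) // the empty number has exactly one phrase: the empty one
    else
      for {
        splitPoint <- (1 to number.length).toSet[Int]   // where the first word ends
        word <- wordsForNum(number.take(splitPoint))    // candidates for the first word
        rest <- encode(number.drop(splitPoint))         // phrases for the remaining digits
      } yield word :: rest
}

object CoderDemo {
  // Test function: encodes a number and joins each phrase with spaces.
  def code(number: String): Set[String] =
    new Coder(List("Scala", "Python", "Ruby", "C",
                   "rocks", "socks", "sucks", "works", "pack"))
      .encode(number)
      .map(_.mkString(" "))
}
```

With this sample dictionary, CoderDemo.code("7225276257") should return the four phrases "Scala rocks", "Scala socks", "pack C rocks" and "pack C socks".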