This lecture is about DNA. Since the introduction of DNA technology a few decades ago, forensic science has been revolutionized, and Edmond Locard's statement that "Every contact leaves a trace" has really come true, or almost completely true. And with DNA technology, those traces can often be individualized to a particular person. One area where DNA technology has made an enormous change is in sex crimes. Previously, in a sexual assault or rape case, there might be only the victim and the perpetrator present, and of course, if it's a rape-murder, there's only the perpetrator left alive. But DNA technology means that forensic scientists can identify the perpetrator of these crimes, even though there is no witness. Well, let's talk about what DNA is. A human being is made up of an enormous number of cells, and inside every single cell there's a nucleus. Well, every single cell except for your red blood cells. Your genetic material is inside the nucleus of each of those cells. In the genetic material inside the nucleus of a single cell, there are the complete instructions to create a human being, and it's amazing to think that all that information is packaged inside your little bit of DNA, which weighs about seven picograms. So you may think that your new laptop packs a fantastic amount of memory into a small space, but that is really nothing compared to how densely information is stored in the DNA molecule. While we're thinking about where the DNA is in the cell, there's an important point to make about DNA at the crime scene. You never find DNA directly at the crime scene. You can't go round a crime scene with a very big magnifying glass, picking up pieces of DNA. What is collected at the crime scene is biological material that contains DNA. The DNA can then be extracted from that material back in the laboratory.
Examples of such materials would be blood, semen, saliva, skin cells, hair (though that's often not a very good source), and of course, body parts. Now, back to the DNA. The DNA in the nucleus of your cell is packaged into chromosomes, so let's look at the chromosomes. We humans have 23 pairs of chromosomes, for a grand total of 46. Different species have different numbers of chromosomes. The pea plant manages with a total of 14 chromosomes, whereas dogs need a bigger total of 78. Well, out of our 23 pairs of chromosomes, 22 pairs are ordinary chromosomes, called autosomes, containing genetic information. The remaining pair contains the information that determines sex. So of course, if you have the XY combination of chromosomes, you'll be male; if you have the XX combination, you'll be female. So forensic scientists, by looking at these chromosomes, can determine the sex of the person. The different chromosomes, which are shown here in this scheme, have different sizes and different shapes. And when they've been dyed, they show up with particular distinctive patterns, so you can tell them apart. They are numbered from 1, the biggest, through 22, and then there are the X and the Y. Your genes are contained within the different chromosomes. Okay. Now, an important point that we'll come back to later in the lecture is that some of this is inherited from the mother and some of it is inherited from the father. Let's take a close look at a chromosome. There's a chromosome. It's basically a length of DNA which is wrapped up into a particular shape, and it's wrapped up in this particular shape with the assistance of small protein molecules which are called histones. So basically, you're keeping your DNA tidy and organized. So those are the chromosomes and, as we said, the genes are contained within the chromosomes. So let's have a look at the genes.
Now, one of the surprising things about DNA is that your genetic information, the information to create a human being, appears to be contained in only part of the DNA. That part of the DNA is known as the coding region, which has the genes. The rest of the DNA is known as the non-coding region. The non-coding regions make up a lot of your DNA: 90 or 95% of your DNA is these non-coding regions. Why do we have them? We don't really know, but they're there, and they're often referred to as junk DNA. Presumably, it has a purpose, but we don't really know what it is, so we call it junk DNA. Okay, now the genes that are contained within these chromosomes: just as the number of chromosomes varies from species to species, so does the number of genes. And typically, simple organisms have a much smaller number of genes, and complex organisms have a lot. So bacteria have a relatively small number, and humans, it's estimated, have somewhere around 30,000. Now, what are genes made of? Okay, genes are made of what are called base pairs, and a single gene will have about 1,000 to 10,000 base pairs. Okay, let's get down to the molecular level, so that we can understand this concept of a base pair. And let's look at DNA at the molecular level to see what it's made of. And it turns out to be very, very simple. DNA is made of a sugar. Not the ordinary sugar that we put in coffee, and tea, and so forth. It's a sugar based on a simpler sugar molecule called ribose. But it's not exactly ribose. It's actually a derivative of ribose in which the hydroxyl group at the 2 position of the molecule is absent, so it's called 2-deoxyribose. And it's the D of deoxyribose that gives us the D of DNA. In addition to 2-deoxyribose, we have a phosphate. So if we combine 2-deoxyribose with phosphoric acid at exactly the right position, we get a phosphate ester of 2-deoxyribose. All right, now the third component is the bases. Where do the bases go?
Okay, the bases are attached to the 2-deoxyribose, and now we have the basic building block of DNA containing the sugar, the phosphate and the base, and this molecule is called a nucleotide. Okay, now which bases are they? The bases are heterocyclic, nitrogen-containing aromatic molecules, and there are four of them used in DNA. They are adenine, guanine, cytosine, and thymine. So we have 2-deoxyribose, we have phosphate, and we have these four bases. Therefore, we have four possible nucleotides, and these four nucleotides are known by their initial letters, A, G, C, and T. So how do we take these four nucleotides and build them up into a molecule of DNA? The key is the deoxyribose portion and the phosphate portion. DNA is actually a polymer of the deoxyribose phosphate ester with the bases attached on the side. So we form a molecule which is a long chain of alternating sugar, phosphate, sugar, phosphate, sugar, phosphate, and along the chain we have the bases attached to the sugars. In the example shown here, the sequence is an A, a G, a C and a T. But because the base is not involved in forming the chain, we can actually make any sequence we want. Now, you've probably heard of this concept of the double helix. What does it mean? Well, DNA consists of two chains, not just one, and these are wound together, and these two chains are held together by a chemical interaction which is known as a hydrogen bond. Now, a hydrogen bond is not a very strong interaction, but it occurs all the way down the DNA chain, so the double helix is stable under normal conditions. Now, hydrogen bonds are actually quite common in chemistry. Okay, and they are in fact what makes water a liquid. If you simply look at an ordinary water molecule, and look at just the properties of a water molecule, you would think that water should be a gas at room temperature and pressure. But it's not, it's a liquid.
And the reason water is a liquid is that the molecules stick together through hydrogen bonds. That is an interaction between a hydrogen atom of one water molecule and an oxygen atom of another water molecule. And it's the same kind of interaction that holds the DNA chains together. Now, these hydrogen bonds in DNA are not just between any old bits of the molecule. They are very specifically between the bases on the different chains. But it's more than just sticking the chains together through the bases: there are specific interactions, so that the bases go in pairs. So if you have a nucleotide with an A on one chain and a T on the other chain, the A and the T will interact strongly together. In fact, A and T will recognize each other and bind together. Similarly, C and G will bind together through hydrogen bonds. And you can see from the diagram that A and T complement each other very nicely. G and C also complement each other very nicely. So, this is called complementary base pairing. So in human DNA, there are about three billion base pairs. Okay, and once again, this varies from species to species, and it is not clear why some species have so many more base pairs than others. For instance, this cute little plant here, Paris japonica, is believed to have the most base pairs of any species, about 150 billion base pairs. And if you took that DNA and stretched it out straight, it would be 91 meters long. And why this plant needs so many base pairs is a mystery. Okay. So we know why different nucleotides can stick to each other. It's complementary base pairing. So what happens in the DNA molecule is that you have this complementary base pairing where you match up the base pairs. This forms the helix, and then further hydrogen bonds fix it in this double helix. Now, let's take a quick look at the information in the genetic code. So this is a quick look at what goes on in the coding region.
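The complementary base pairing rule described above (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration, not something from the lecture: the names `PAIRS` and `complement` are made up here, and the sketch matches bases position by position without modeling the direction of the two chains.

```python
# Pairing rule from the lecture: A binds T, and C binds G.
# PAIRS and complement are hypothetical names chosen for this sketch.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base that sits opposite each base of `strand` on the other chain."""
    return "".join(PAIRS[base] for base in strand)


# The four-letter example sequence mentioned earlier: A, G, C, T.
print(complement("AGCT"))
```

Applying `complement` twice gives back the original strand, which is one way to see that each chain carries enough information to rebuild the other.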
The coding region stores the information in the sequence of the base pairs. So it's like having an alphabet with four letters, A, T, C, and G, and DNA manages perfectly well with just four letters in its alphabet, whereas if you speak English, you need a total of 26. Okay, so here, for instance, is a strand of DNA with the sequence -C-C-T-G-A-G-G-A-G-. Now, three nucleotides together code for an amino acid. So, this is part of the code to make a protein molecule: CCT corresponds to a proline, GAG corresponds to a glutamate, and then glutamate is repeated in this particular part of the code. Now this sequence, proline-glutamate-glutamate, happens to be part of the code to make the hemoglobin molecules which are responsible for transporting oxygen in your blood, so it's essential that the code is faithfully translated to make the protein. And if the protein is made wrong, there's going to be something wrong with your blood. In fact, suppose you have just one letter wrong. Suppose that middle A is actually a T, because of a mutation in the DNA. This then no longer codes for proline-glutamate-glutamate; it actually codes for proline-valine-glutamate. So the protein structure is different, and if you have a valine in that position, then you're going to have a disease called sickle cell anemia. This is how errors in the coding sequence give rise to genetic diseases.
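The hemoglobin example above can be followed step by step in a short Python sketch. It is an illustration under stated assumptions rather than a full translation tool: `CODON_TABLE` and `translate` are hypothetical names, and the table holds only the triplets named in the lecture, plus GTG for valine, which is the triplet produced when the middle A is replaced by a T.

```python
# Partial lookup table: only the DNA triplets that appear in the lecture example.
# CODON_TABLE and translate are hypothetical names for this sketch.
CODON_TABLE = {
    "CCT": "proline",
    "GAG": "glutamate",
    "GTG": "valine",
}


def translate(dna):
    """Split `dna` into consecutive triplets and look up each amino acid."""
    usable = len(dna) - len(dna) % 3  # ignore any incomplete trailing triplet
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


normal = "CCTGAGGAG"
# Replace the middle A (the fifth letter, index 4) with a T.
mutant = normal[:4] + "T" + normal[5:]

print(normal, translate(normal))
print(mutant, translate(mutant))
```

A single changed letter turns the second triplet from GAG into GTG, which is why the second amino acid changes from glutamate to valine while the other two stay the same.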