In this lecture I will talk to you about gathering and analyzing large data sets, which has become a defining characteristic of how systems biology experiments and analyses are conducted. Technologies developed over the past two decades have allowed for the simultaneous measurement of many cellular components, such as all the mRNAs being expressed in a cell or all the proteins in a cell. Measurement of mRNAs is done by DNA microarrays, and measurement of proteins is typically done by mass spectrometry. These types of experiments can provide information about how the cellular landscape changes upon perturbation. Perturbations can be of different kinds: a cell receives a hormone signal or a neurotransmitter signal, or a cell undergoes a state change, be it differentiation, movement towards proliferative status as in cancer, or the onset of other diseased states such as hypertrophy. Here I have shown two pictures: one shows what a DNA microarray looks like, and the other is a schematic of what proteomics can do for us in analyzing cellular systems.
So let me start with genomics. Genomics has become a very large area of multivariate analysis, both experimentally and computationally. And it all got started with the development of microarrays. Microarrays are based on Southern blots, which were used in molecular biology for quite a long time. A Southern blot is a DNA-DNA hybridization, where a probe that contains DNA oligonucleotides of a known sequence is hybridized to some DNA from a chromosome that may have an unknown sequence. Based on the hybridization, one can identify specific DNA sequences. In building these microarrays, what people did was to spot, or print, these oligonucleotides in an arrayed fashion, originally on a glass slide or on nitrocellulose paper. From the location of each spot, one knows what the sequence is.
By looking at many, many of these oligonucleotide probes, one is able to measure the levels of many mRNAs. Typically, microarrays have been used to identify and quantify mRNA expression profiles under changing conditions, a field that is called transcriptomics. The basic approach is to extract mRNAs from cells under some control and perturbed conditions, convert the mRNA to cDNA, or copy DNA, by reverse transcriptase, couple these cDNAs to dyes, hybridize, visualize the hybridized products, which is what is shown clearly in this microarray, and then computationally separate signal from noise. In a typical microarray like the one shown here, each spot can represent a gene, but sometimes, if multiple oligos are used per gene, multiple spots may represent a gene. A classic paper that I highlighted in my first lecture, and bring back again, is one in Science where Iyer et al., from Pat Brown's lab, studied the response of human fibroblasts to serum stimulation. The changes in 517 or more genes gave a broad picture of the cellular program that moved the fibroblasts to a proliferative state.
Measurements in genomics. DNA sequencing technologies and also microarray technologies have grown rapidly in the last few years, and there is a whole range of measurements that can be done in genomics. Of course, the most definitive of these is sequencing the whole genome. Now it is sometimes called massively parallel DNA sequencing, simply because one can sequence many parts of the genome simultaneously. This new approach, often called next-generation sequencing, enables whole genome sequencing to be done in about a week. Sometimes the sequencing is called deep sequencing.
The term deep sequencing means repeated sequencing of a DNA fragment or a region of interest in a chromosome. This repeated sequencing of a region greatly increases the sensitivity and accuracy. This kind of sequencing tells you everything about the genome in terms of sequence, and what changes might be occurring in relationship to the onset or progression of any disease. This kind of sequencing produces a very large amount of data, and there is a need for extensive computational analysis to separate signal from noise. We will deal with the computational analysis as we go along.
I want to introduce a couple of terms here in relation to whole genome sequencing. One is SNPs, or single nucleotide polymorphisms. These are single base variations in the human genome that occur with a relatively high frequency. Almost no two genomes are really identical. There are a lot of SNPs in each genome, and identifying the SNPs and understanding how they may be correlated with disease states is an important area of study in the genomics of diseases. [INAUDIBLE] Genome-wide association studies. I will describe that briefly a little later. The other measurement from sequencing the whole genome is copy number variation, or CNV: alterations in DNA structure, such as a region of a chromosome that is abnormally duplicated or deleted. This may be one gene or multiple genes. Both SNPs and CNVs can be detected by whole genome sequencing, and in some cases there are also chips that allow SNPs and CNVs to be detected by hybridization technologies.
Exome sequencing. This is sequencing of the expressed genome. Here one separates out the part of the whole genome that codes for proteins, that is, the exons, and then sequences them. Often this type of exome sequencing is used to study genetic variations involved in disease states.
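As a rough illustration of why repeated sequencing of a region improves accuracy, and of what a SNP looks like against a reference, here is a minimal Python sketch. It rests on toy assumptions that are not from the lecture: all reads are the same length and already aligned to the same region, errors are rare and independent, and the sequences and function names are made up for illustration. Real pipelines use aligners and statistical variant callers rather than a simple majority vote.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote base at each position across equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*reads)
    )


def call_variants(reference, sample):
    """List (position, reference_base, sample_base) for every single-base difference."""
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical reference and five reads covering the same region.
# The sample truly differs from the reference at position 4 (A -> G);
# three of the reads also carry one isolated sequencing error each.
reference = "ACGTACGTAC"
reads = [
    "ACGTGCGTAC",
    "ACGTGCGTAC",
    "ACCTGCGTAC",  # error at position 2
    "ACGTGCGTAT",  # error at position 9
    "ACGTGCGAAC",  # error at position 7
]

sample_consensus = consensus(reads)
variants = call_variants(reference, sample_consensus)
print(sample_consensus)  # ACGTGCGTAC
print(variants)          # [(4, 'A', 'G')]
```

Any single read here may carry an error (the third read on its own would suggest a spurious variant at position 2), but the vote across five reads keeps only the difference at position 4 that all reads share.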
ChIP-seq, or sequencing of transcription factor bound DNA. ChIP stands for chromatin immunoprecipitation: the Ch for chromatin and the IP for immunoprecipitation. Typically here, one cross-links the DNA to the transcription factor, uses an antibody against the transcription factor of interest to pull down, or precipitate, the transcription factor along with the DNA fragment attached to it, and then sequences the DNA fragment to identify what had been bound to the transcription factor of interest.
The other kind of measurement in genomics that has become very popular over the last few years is called RNA-seq. RNA-seq is, I wouldn't say starting to replace microarrays, but certainly being used widely for sequencing expressed mRNA, and it provides better sensitivity and quantitative precision as compared to microarrays. Here, typically, mRNAs are extracted and fragmented, the fragments are converted to cDNAs, and the sequenced DNA fragments are then mapped onto a reference genome to identify the mRNA from which they came. The advantage of mRNA-seq is that it can be very quantitative and very precise. The dynamic range is very large, and for genes that are expressed at moderate and low levels, this precision and these quantitative measurements are really quite useful. There are limitations, though. Many reads are required for accurate measurements of levels. That means there need to be many of these reads that map onto different regions, which allows one to identify an mRNA with great certainty. In an analysis of the fly genome, even after 50 million of these little pieces, the reads, were mapped, not all the transcripts had been found, and it is presumed that there are about eight to ten thousand transcripts per cell in such a genome.
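The mapping-and-counting idea behind RNA-seq quantification can be sketched in a few lines of Python. This is only a toy under loud assumptions: reads are matched by exact substring search against two made-up transcript sequences (`geneA` and `geneB` are hypothetical names), whereas real aligners map to a reference genome and tolerate mismatches and splicing.

```python
def count_reads(reads, transcripts):
    """Count reads per transcript using exact substring matching.

    A read is counted only if it matches exactly one transcript; reads that
    match none, or more than one, are tallied as unassigned.
    """
    counts = {name: 0 for name in transcripts}
    unassigned = 0
    for read in reads:
        hits = [name for name, seq in transcripts.items() if read in seq]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1
    return counts, unassigned


# Hypothetical transcript sequences and sequenced fragments.
transcripts = {
    "geneA": "ATGGCCATTGTAATGGGCCGC",
    "geneB": "ATGAAACGCATTAGCACCACC",
}
reads = ["GCCATTGT", "TAATGGGC", "AACGCATT", "GGGCCGC", "TTTTTTTT"]

counts, unassigned = count_reads(reads, transcripts)
print(counts)      # {'geneA': 3, 'geneB': 1}
print(unassigned)  # 1
```

With only a handful of reads, a transcript expressed at a low level may receive one read or none at all, which is one way to see why many reads are needed before the counts become reliable estimates of expression.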
Currently, mRNA-seq is several times costlier than microarrays, so whether one chooses to use mRNA-seq or microarrays depends on the kind of knowledge one needs.
The third kind of measurement in genomics is related to what is called epigenomics, which covers DNA methylation and microRNAs. Both of these control the level of mRNAs that can be expressed, and DNA methylation can sometimes silence genes completely. DNA methylation occurs by the addition of methyl groups to C or A in DNA. Typically, cytosines in CpG dinucleotides are methylated at the 5 position, and this methylation leads to inhibition of gene expression. Genome-wide bisulfite sequencing allows for identification of methylated residues in the genome. Bisulfite converts C to U, cytosine to uracil, but does not convert methylated cytosine. So, by sequencing, one can find the cytosines that were not converted to U, and thus infer that they are methylated. The method has some limitations. I won't go into these, but it is really useful to know which genes are likely to be expressed and which genes are likely to be silenced.
MicroRNAs. MicroRNAs are small, 21 to 25 nucleotide stretches of RNA that regulate gene expression by binding to mRNA and inhibiting translation, and by a couple of other mechanisms. These microRNAs can be sequenced using RNA-seq, starting with size-selected RNAs. There are about 1,100 human microRNAs, or miRs, and typically they are named with numbers, and I am making these up, such as mir-1001 or mir-123. The interesting part about microRNAs is that each microRNA can regulate multiple mRNAs, and each mRNA can be regulated by several microRNAs, so there is combinatorial complexity that arises from microRNA interactions with mRNA.
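The many-to-many relationship between microRNAs and mRNAs can be pictured as a small lookup table that is read in both directions. The Python sketch below uses entirely hypothetical names (`mir-A`, `mRNA-1`, and so on are placeholders, not real annotations) and simply inverts a microRNA-to-targets table to show which microRNAs converge on each mRNA.

```python
from collections import defaultdict

# Hypothetical targeting table: each microRNA lists the mRNAs it regulates.
mirna_targets = {
    "mir-A": ["mRNA-1", "mRNA-2", "mRNA-3"],
    "mir-B": ["mRNA-2", "mRNA-3"],
    "mir-C": ["mRNA-3", "mRNA-4"],
}


def regulators_by_mrna(targets):
    """Invert a microRNA -> mRNAs table into an mRNA -> microRNAs table."""
    regulators = defaultdict(list)
    for mirna, mrnas in targets.items():
        for mrna in mrnas:
            regulators[mrna].append(mirna)
    return dict(regulators)


regulators = regulators_by_mrna(mirna_targets)
for mrna, mirnas in sorted(regulators.items()):
    print(mrna, "is regulated by", ", ".join(mirnas))
```

Even with three microRNAs and four mRNAs in this made-up table, one mRNA is targeted by all three microRNAs while another is targeted by only one, so the effect of changing a single microRNA depends on which others share its targets.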
The combinatorial complexity of microRNA interactions with mRNAs allows multiple microRNAs to be associated with disease states, such as cancer and heart disease.
So why are these types of genomic measurements essential for a systems-level understanding? All changes in cell state, both normal and disease related, are at least in part due to changes in gene expression. Hence surveying mRNA expression patterns is really informative and gives us a lot of clues about the underlying mechanisms by which cells respond and change state. Both the levels and the activities of proteins are regulated by genomic and epigenomic characteristics, and the mechanisms that control the levels of proteins are, again, a critical factor in understanding disease initiation and progression. It is true that genomics is not the only thing that drives cell state or disease initiation or progression. One can often get environmental factors, environmental [INAUDIBLE], that are critical. However, the genomic characteristics are one bookend, or one limiting factor, that either increases or decreases the propensity for either homeostasis or cell state change, and so for either prevention of disease or initiation and progression of disease. Examples include mutated genes that produce continuously active proteins: proteins like mutant Ras are involved in lung and colon cancer. Changes in the levels of proteins caused by methylation or miRs also regulate normal and disease processes. This is why understanding genomics at a broad level is quite critical for understanding how homeostasis occurs and how change occurs in cells, tissues, and organs.
Proteomics. Proteomics is the measurement of many proteins simultaneously, most often by mass spectrometry. By measuring the levels of proteins, one can identify all the proteins within a protein complex or identify the proteins in an organelle.
One can also determine the protein composition of a cell in a certain state. So there are all kinds of information one can get from proteomics. By measuring protein state, one can also understand the dynamics of cellular activities. This is done by what is called phosphoproteomics, where one measures proteins phosphorylated at serine, threonine, or tyrosine residues. One can measure other kinds of modifications too, such as acetylation at lysine residues. In the cartoon in this slide, taken from a review written by Matthias Mann and his colleagues, one can see the pipeline of how proteins from different sources go through a series of steps to give information about a variety of cellular processes.
So why do we need proteomics? I have given you one example from a recent paper, which I think is a very good example of how proteomics provides an additional layer of knowledge. This is a study where the researchers carried out a global analysis of the genome, transcriptome, and proteome in response to aneuploidy in human cells. Aneuploidy is when a cell has an extra copy of a chromosome, and this is actually a very serious defect. Down's syndrome, for instance, is caused by trisomy, or three copies, of chromosome 21. This study checked what happened in these cells with extra chromosomes to mRNA levels, by microarray, and to protein levels, by mass spectrometry. In the figure here, one can see that increasing the copy number of chromosome 5 indeed increases the number of copies of the genes, and it also increases the levels of mRNA. However, when they started to look at the proteins, what they found was that the cells often tried to compensate at the level of protein. In fact, the amount of protein did not increase in proportion to the amount of DNA or mRNA, and this was for several reasons.
One reason is that when some proteins of a complex are present as free subunits, these free subunits are often degraded, and this was found to be the case. The process called autophagy, or degradation of excess protein in the lysosomes, is also stimulated. All of this allowed these cells to control protein levels. So some proteins are increased in level, but many proteins are not. This kind of information about the effect of increased levels of genes, such as by gene dosage or copy number variation, on the levels of protein is an important aspect of understanding how genomic changes can or cannot result in functional changes.
The other type of large-scale measurement is metabolomics. Metabolomics is the measurement of all the metabolites found in a cell, tissue, or organ. This is an area that is still growing and has yet to reach its full potential for providing us with understanding. Often, it is useful in understanding how phenotypic changes may or may not occur in response to genomic and environmental changes. Here, in this cartoon taken from this review, is an example of how a small change in DNA, such as a point mutation, leads to no change in mRNA activity but to a large change in metabolites due to a change in metabolism. This can be the case when metabolic imbalances and disease states are clearly related to function. In contrast, a genomic change or an environmental change may result in a large change at, say, the transcriptome level, that is, changes in the levels of mRNA. This in turn may result in changes in the levels of protein.
But since these changes in proteins can be compensated for, either by degradation, as I told you before, or by changing the activity of a functionally related or functionally compensatory protein, one can get metabolic homeostasis, so that the change in protein levels is not reflected in a change in metabolites. So metabolomics, just like proteomics, can provide additional insights into how signals from the genome or signals from the environment may or may not result in changes in functional activity.