As a computational atmospheric scientist, my ultimate goal is to represent a real tornado with a bunch of numbers so I can construct a realistic simulation in the computer. With a virtual tornado that behaves like the real thing, I can help move some of the research that happens in my field away from conditions that are dangerous and difficult to measure. But there are an infinite number of characteristics of a real tornado that I can measure out in the field. How fast is the wind? What is the temperature? How much moisture or dirt is in the air? How many trees were knocked down? What were the weather conditions before the storm? How do I choose the measurements that are important to represent a tornado in a computer?

When trying to understand scientific events on a computer, scientists study data that falls into three categories: statistical data, observational data, and simulation data.

Statistical data are best at representing scientific events that are fairly consistent and too large to measure or store on a computer. Examples of this include things like the composition of star types in the universe or the types of atoms in a drop of water. In this visualization, we're flying through a virtual Milky Way galaxy. Earth-based telescopes can't penetrate the gas and dust of a galaxy, so we can't directly observe all the stars in our own galaxy. The stars we see in this movie as we leave our local stellar neighborhood are represented statistically. Their sizes, colors, and positions are based on the stars that we can observe from Earth.

Observational data are exactly what they sound like: data that have been collected by observational instruments. Some of these instruments are complex, like telescopes, particle accelerators, and lidar scanners. Others are more familiar, and they include things like cameras, microphones, and even our own eyes and ears.
This visualization over the surface of Venus was generated from radar data collected by the Magellan spacecraft as it orbited the planet. This dataset was simply stored as an image file where the pixels of the image corresponded to locations on the planet's surface.

Simulation data is the type of data we get from building a virtual laboratory in the computer, trying to capture the dynamics of a physical process as completely as possible. This is how we might go about creating a tornado in the computer. Scientists also refer to this process as building a computational model of a tornado. Simulation data is particularly rich for 3D visualization. This type of data can be found in many different fields, from biology to geology to astronomy. A numerical simulation might try to approximate how atoms and molecules interact with each other, how clouds move with the wind, or how gravity affects massive clouds of stars and interstellar dust. These computational models exist across many different scales of space and time. They format their data output in lots of innovative ways to translate highly abstract ideas into the mathematical engine of a computer.

Perhaps the most tangible of these formats is the one we've already discussed: surfaces. We've seen how you can collect data that represents a teapot. But surfaces made of connected triangles can represent any physical object with a hard surface, like stone artifacts, or they can represent the boundaries of a probabilistic event. In this molecular dataset, we aren't seeing individual atoms but rather an average of the atoms' locations over time. For datasets formatted as surfaces, sometimes the shape itself directly addresses the scientist's questions. However, the scientist frequently needs more information than just the shape, so they will store that information on the vertices. In this visualization of a field of soy plants, you can see that even individual leaves might have a variety of colors.
These colors are driven by data values called attributes, which are numerical characteristics specific to a dataset. Here, the attribute driving the color is absorbed sunlight, which is stored along with the position of each vertex in the leaf geometry. It bears mentioning that the common OBJ data format has a very limited ability to store values on vertices like I'm describing. If you're interested in learning more about storing data on surface vertices, I recommend looking at a format like the PLY geometry format.

Another data format that you will be familiar with is image data. If you think about it, every photo you take is its own dataset. Across an image, there are millions of pixels, each with its own color value. Images are often useful to paint the ground plane of your visualization. The ground plane of this visualization shows a map built from data assembled into what scientists call Geographic Information Systems, or GIS. While Geographic Information Systems are able to store color like all images, they are also able to store a wealth of other information at each pixel of an image. In this visualization, we analyze a GIS dataset that includes data about the population in a small city, the types of homes that exist there, and the severity of damage a tornado did to the city in 2011. You can think of an image as just a massive 2D spreadsheet, as wide and as tall as the image itself, where each pixel is a cell in the spreadsheet. We often visualize the values of the cells as colors, but they can represent anything, even velocity or time.

There are a couple of data formats that will be less familiar. One of the most common data formats used in computational science is called a point cloud. A point cloud is simply a collection of points with positions in XYZ coordinate space. Usually, the points will have other attributes associated with them, like size or color or some kind of identity.
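To make the idea of points with attributes concrete, here is a minimal sketch of a point cloud stored as a NumPy structured array. The field names (`position`, `size`, `color`, `id`), the random values, and the helper `make_point_cloud` are all hypothetical choices for illustration; real datasets define their own attributes and file formats.

```python
import numpy as np

# Hypothetical layout: every point has an XYZ position plus a few attributes.
point_dtype = np.dtype([
    ("position", np.float32, (3,)),  # x, y, z coordinates
    ("size", np.float32),            # e.g. a radius used when drawing the point
    ("color", np.uint8, (3,)),       # red, green, blue
    ("id", np.int32),                # some kind of identity
])

def make_point_cloud(n_points, seed=0):
    """Build a random point cloud, standing in for real measured or simulated data."""
    rng = np.random.default_rng(seed)
    cloud = np.zeros(n_points, dtype=point_dtype)
    cloud["position"] = rng.uniform(-1.0, 1.0, size=(n_points, 3))
    cloud["size"] = rng.uniform(0.1, 2.0, size=n_points)
    cloud["color"] = rng.integers(0, 256, size=(n_points, 3), dtype=np.uint8)
    cloud["id"] = np.arange(n_points)
    return cloud

cloud = make_point_cloud(1000)

# Attributes can drive selections: keep only the larger points.
big_points = cloud[cloud["size"] > 1.0]
```

Nothing in this array says which points connect to which, and that is the defining feature of a point cloud: it is a table of independent points, one row per point and one column per attribute.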
Point clouds are ideal for representing physical objects that are distant from each other and unconnected, like stars or atoms. When the points in a point cloud interact with the environment and each other based on traits like gravity or pressure or buoyancy, they are called a particle system. For instance, in this particle system, each of the points represents a star with some mass, and each star is bound to the others through gravity.

A common source of point cloud data is 3D scanning. Techniques like laser scanning and photogrammetry collect data from solid objects as dense point clouds rather than as connected surface geometry. A point cloud can be used to generate a surface if you can generate information on which points connect to each other. This is also the reason that we don't call it a vertex cloud. A point is only considered to be a vertex if it is describing a surface.

Sometimes, the most important thing about a point cloud is how the points are related to each other. This visualization of a planet collision was created from point cloud data. The points represent parcels of matter that are continuously changing from solids to liquids to gases and back. One point over the course of its lifetime might transition several times between a solid or liquid surface and a fuzzy gas particle.

There's another data format that is focused more on how parcels of matter influence each other in 3D space. A volume is a 3-dimensional grid that stores values inside well-defined boundaries, like this square glass vase. These boundaries don't always encompass the entire object. For instance, a tornado volume might clip off the edges of the storm cloud from which it forms. Volumes are represented by evenly dividing the space inside the boundaries into cells. They are another type of mesh, like triangulated surfaces, but it might be more helpful to think of a volume as a 3D image. In fact, each cell of a volume is called a voxel, which is a play on the word pixel.
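The "3D image" picture of a volume can be sketched as a 3D array plus a rule for finding which voxel contains a given position. This is only an illustration: the boundaries, the resolution, and the helper `voxel_index` are assumptions made up for the example, not values from any dataset shown here.

```python
import numpy as np

# Hypothetical boundaries of the volume, in arbitrary world units.
bounds_min = np.array([0.0, 0.0, 0.0])
bounds_max = np.array([10.0, 10.0, 20.0])
resolution = (32, 32, 64)  # number of voxels along x, y, z

# One value per voxel, just like one value per pixel in an image.
volume = np.zeros(resolution, dtype=np.float32)

def voxel_index(position):
    """Return the (i, j, k) voxel containing a position, or None if it is outside."""
    position = np.asarray(position, dtype=float)
    if np.any(position < bounds_min) or np.any(position > bounds_max):
        return None  # the volume clips anything beyond its boundaries
    fraction = (position - bounds_min) / (bounds_max - bounds_min)
    index = np.floor(fraction * np.array(resolution)).astype(int)
    # A position exactly on the upper boundary belongs to the last voxel.
    index = np.minimum(index, np.array(resolution) - 1)
    return tuple(int(i) for i in index)

# Store a value in the voxel at the center of the volume.
center = voxel_index([5.0, 5.0, 10.0])
volume[center] = 1.0
```

Because the cells divide the space evenly, finding a voxel is simple arithmetic, and anything outside the boundaries simply has nowhere to be stored.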
Each voxel stores numerical values of scientific interest. While point clouds are great for representing the individual particles of matter in a gas cloud, volumes are better at representing the statistical probability of finding particulate matter across that gas cloud. This visualization shows different types of clouds in the early universe. The visualization exposes the 3-dimensional grids that produced these clouds. Often, volume voxels contain more than one value, in the same way that the image data we discussed can have color, population, and damage values stored in a pixel. Here, we see many overlapping types of cloud. Each overlapping attribute of a volume is referred to as a field. Volumes can also contain fields that represent important characteristics of a gas, such as temperature or wind direction. The cells of the 3D grid can refer to just about anything that's distributed in space, as long as it's within the boundaries of the grid.

A common type of volume data is an MRI scan, which detects the water in your body. We often think of an MRI as a series of 2D images because, when discussing medical procedures, doctors tend to present 2D slices through the 3D data.

Volumes, point clouds, images, and surfaces don't have to exist on their own. Often, they interact with and enhance each other, especially in visualization contexts. Let's return to this visualization of a field of soy plants. The shapes of these plants are surface data, but the leaves are colored with photographic image data, and each plant in the field has been copied onto a different point in a point cloud with a different identity, which makes each plant look unique.

These various data types can often represent the same phenomena. There are methods for converting each one into the others. Here we see rising smoke represented as a particle system, an evolving surface, and a volumetric simulation.
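One of these conversions, from a particle system to a volume, can be sketched by counting how many particles fall into each voxel. This is a simple binning approach chosen for illustration, not necessarily how the smoke example was produced. The function name `particles_to_volume`, the field names, and the random test data are assumptions.

```python
import numpy as np

def particles_to_volume(positions, temperature, bounds, resolution):
    """Bin particles into a regular grid and return two fields on that grid.

    positions:   (N, 3) array of particle XYZ coordinates
    temperature: (N,) array, one hypothetical attribute per particle
    bounds:      three (low, high) pairs, the boundaries of the volume
    resolution:  number of voxels along x, y, z
    """
    # Field 1: number of particles inside each voxel.
    counts, _ = np.histogramdd(positions, bins=resolution, range=bounds)
    # Field 2: average particle temperature inside each voxel.
    temp_sum, _ = np.histogramdd(
        positions, bins=resolution, range=bounds, weights=temperature
    )
    mean_temp = np.divide(
        temp_sum, counts, out=np.zeros_like(temp_sum), where=counts > 0
    )
    return {"count": counts, "temperature": mean_temp}

# Random particles standing in for a real particle system.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(5000, 3))
temperature = rng.uniform(250.0, 300.0, size=5000)

fields = particles_to_volume(
    positions,
    temperature,
    bounds=[(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)],
    resolution=(8, 8, 8),
)
```

The result is a volume with two fields that share the same grid. The trade-off is the one described above: the individual particles are gone, and what remains is how much matter, at what average temperature, sits in each cell.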
It's possible to see how, with these different scientific data structures, we can start to recreate events in astronomy, biology, and weather. It's important to note that these highly complex research datasets are typically not assembled manually point by point, but by computational instruments and algorithms that are guided by researchers. This is why 3D scientific visualization is considered extremely useful as an analytical tool. It helps scientists investigate highly complex data that is impossible to understand comprehensively, even for the person who created it.

This tornado visualization features this kind of complex volumetric data. The simulation has been augmented to include a particle system in the eye of the storm and surface data in the form of tubes and cones demonstrating the wind direction. Finally, we have the ability to represent a real 3D tornado inside the computer. This virtual experimentation can be faster, safer, and more flexible than field experiments. It can be the source of some rich data for visualization.

Fun fact: in the film industry, point clouds are frequently used to store motion capture data. Small tracking dots are placed on an actor's face and body. Dozens of cameras watch those dots, and computers combine the camera imagery to calculate 3D coordinates for each dot. Once these virtual dots are created, they can then be transformed to fit the face and body of any digital character. These performances can feel so real that they are creepy, though, and the filmmakers actually have to remove some details so they don't scare the audience.