We now turn to the general case, where we are looking at a general three-dimensional scene through a two-dimensional picture. As we move around in the three-dimensional scene we take different pictures of the world, and we would like to know: where am I? That is, how am I positioned relative to the objects in the world, and how am I rotated in that space? This again assumes we have a calibrated camera with a known camera calibration matrix K, and we would like to estimate rotation and translation as we move through space.

We also assume something is known about the three-dimensional scene. This is shown in the illustration on the left: point clouds reconstructed through methods very similar to what we covered in the previous lecture on epipolar geometry. This point cloud is a set of three-dimensional points that we estimated with a camera moving through space; we triangulated the points to obtain their three-dimensional positions, and we recorded the appearance of each corner or feature point in image space, with a small picture attached to it. Now suppose we again move through space, say a robot flying through space, and it takes the picture on the right. Can we identify the exact location of this image in the three-dimensional space, as well as the orientation of the camera in the space shown on the left? This is the general camera 3D registration problem, and the algorithm I am going to describe is a simple form of the perspective-n-point algorithm.

Mathematically the setting is the following. We have a set of three-dimensional point clouds: points in three-dimensional space, (X, Y, Z), each one attached to a descriptor that identifies it. We took a picture, and from the image we are able to extract feature points as well, marked in blue. Those are two-dimensional points, (x, y), in the view of my camera, and again I extract a descriptor from each of those feature points. Through a matching algorithm I match the descriptor of each 2D point with the descriptor of a 3D point, which again is the appearance of that feature in an image; we match those either automatically or by hand, as shown here. Now we have established a set of correspondences between the three-dimensional points in space and the two-dimensional points in the image.

The goal now is to locate the camera center in the three-dimensional space of the point cloud, as well as to figure out the rotation of the camera relative to the three-dimensional points. You can imagine doing this by hand: you have a picture, you have points in that picture, and you simply take that picture and move it around in space until the rays going from the optical center through all the feature points intersect the three-dimensional points. This can be done by ad hoc methods, spinning and moving in space until everything lines up, but as you can see, this is very tedious. Instead we have a simple method to compute it, through the least-squares algorithm we have seen earlier.

The way we do it, again, is to set up a set of equations of the following form. We have a two-dimensional point, little x1, corresponding to a three-dimensional point, big X1, in 3D space. We have a camera projection equation that takes this three-dimensional X1, with a 1 attached to it in homogeneous coordinates, multiplies it by a 3-by-4 P matrix, and equals little x1 in 2D, with a 1 attached to it in homogeneous coordinates, up to a scale factor lambda.
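To make this projection equation concrete, here is a minimal sketch in numpy, assuming an already-known calibration K and pose R, t; the numerical values below are made-up placeholders, not from the lecture.

```python
import numpy as np

# Minimal sketch of the projection equation: lambda * [x; 1] = P @ [X; 1],
# with P = K [R | t]. K, R, t, and the 3D point are illustrative example values.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # example rotation (identity)
t = np.array([[0.1], [0.0], [2.0]])    # example translation

P = K @ np.hstack([R, t])              # 3x4 projection matrix

X = np.array([0.5, -0.2, 3.0, 1.0])    # homogeneous 3D point [X, Y, Z, 1]
x_h = P @ X                            # homogeneous image point, lambda * [x, y, 1]

lam = x_h[2]                           # lambda: scale of the point along the optical ray
x = x_h[:2] / lam                      # inhomogeneous pixel coordinates
print(lam, x)
```

Reading lambda off the third homogeneous coordinate is exactly the quantity discussed next: how far along the optical ray the 3D point sits.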
The lambda, in fact, tells me how far that three-dimensional point is along the optical ray from the optical center. P, recall, consists of the calibration matrix K and the rotation and translation. So if we know K, and we know the P matrix, then we can recover the rotation matrix R and the translation t. The goal, again: we have one set of constraints for each corresponding point, marked in black here, and the unknown variable P is what we want to estimate.

We recall the trick we used last time, which is that the cross product of a vector with itself is zero. So we take the cross product of [x1; 1], the optical ray in my field of view, with the P matrix times the 3D point [X1; 1], and set it to zero. Again, what is marked in black is known. Further, we write the P matrix out in three rows: P is a 3-by-4 matrix, so we have three rows, each row four elements long, and we take the dot product between P1 and X tilde, where X tilde is X1 with a 1 attached to it. As we saw earlier, this dot product is a scalar, which can therefore be transposed. Using the trick we saw earlier, we can move the vector X tilde to the left of P and collect all the unknown variables of P into a column vector, where the first four elements are the first row of P transposed, followed by P2, the second row of P transposed, followed by the third row, P3 transposed. The known quantity X tilde, which is the three-dimensional point in the world, is then shifted to the middle, where we have three rows, each carrying a 1-by-4 block of X tilde transposed on the diagonal. Again, the two-dimensional point, forming a three-dimensional optical ray, has its cross product written as a skew-symmetric matrix.

Combining the elements that are known, the three-dimensional points and the two-dimensional image points, we obtain the following equation, where the purple marks the known measurements. In this case it is going to be three rows and twelve columns. It is twelve columns because each X tilde is a four-dimensional vector, the three-dimensional x, y, z with a 1 attached to it, and we have three copies of it. So the matrix is twelve columns wide, and P has a total of 3-by-4 elements in it, so the unknown we want to find is a twelve-dimensional vector. Again, I have the purple matrix times this unknown vector equal to zero. So I have formed the system A x = 0. This is a least-squares problem we can solve, except that if I have just one point, we do not have enough constraints to pin down the solution. In fact you can see that every corresponding point, from 3D to 2D, gives a 3-by-12 matrix, but it in fact only has rank 2, because only two of its rows are independent: we only have x, y measurements. To obtain enough constraints for this problem we need about six points, each point giving two constraints; a total of twelve constraints can be produced from six corresponding points. Given six points, we construct a combined A matrix by taking each 3-by-12 block and stacking them on top of each other, so this matrix has dimension 3 times 6, that is 18, by 12. Then we look for the solution of this combined A times x equal to 0. The solution to this problem A x = 0 is again obtained by taking the singular value decomposition of the matrix A into U D V transpose.
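Below is a minimal sketch of this least-squares setup in numpy; the function name estimate_P and the array layout of the correspondences are my own choices, not from the lecture.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[    0, -v[2],  v[1]],
                     [ v[2],     0, -v[0]],
                     [-v[1],  v[0],     0]])

def estimate_P(pts_2d, pts_3d):
    """DLT estimate of the 3x4 projection matrix from n >= 6 correspondences.

    pts_2d: (n, 2) image points, pts_3d: (n, 3) world points.
    """
    rows = []
    for x, X in zip(pts_2d, pts_3d):
        x_h = np.append(x, 1.0)          # [x, y, 1], the optical ray
        X_h = np.append(X, 1.0)          # [X, Y, Z, 1]
        # Three rows with X_h^T placed block-diagonally, one block per row of P.
        M = np.kron(np.eye(3), X_h)      # 3 x 12
        rows.append(skew(x_h) @ M)       # cross-product constraint, rank 2
    A = np.vstack(rows)                  # 3n x 12 (18 x 12 for six points)

    # Solve A p = 0: the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    p = Vt[-1]
    return p.reshape(3, 4)               # consecutive groups of four form the rows of P
```

With six or more matched points, estimate_P stacks the 3-by-12 blocks into the combined A and reads the solution off the last column of V, which is where the next step picks up.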
We take the last column of V, and that gives me the solution for x. The last column of V, in fact, contains the entries of the matrix P: the first four elements are the first row of P, the next four are the second row of P, and the last four are the third row of P. So now we have obtained the 3-by-4 projection matrix P through this least-squares computation.

Recall, we have to take this P matrix and extract the camera calibration matrix K. Assuming K is known, we can take K inverse times P to obtain a matrix that consists only of rotation and translation. Can we just take the first three columns of that matrix and define them as R? Not quite. This is because a rotation has a special form: in fact, a rotation can only be written as an orthonormal matrix U times V transpose. Therefore the matrix R we obtain by taking K inverse of the P we estimated has to go through a further processing step, where we take the SVD of that matrix and extract a cleaner version of it by ensuring that the diagonal elements of D become one. You can think of this as a clean-up operation, and it ensures that the rotation matrix we obtain in fact satisfies the proper rank and orthonormality constraints. Now that we have a rotation matrix, how do we obtain t? We know at this point we have a scale ambiguity. In fact, this scale can be recovered from the diagonal elements of the SVD of R: we simply divide by the largest singular value, sigma 1, and t is simply the fourth column of K inverse times P, divided by sigma 1. Putting it together: given the P obtained from the least-squares computation, we first remove the calibration matrix K by taking K inverse of it; we then extract the rotation and translation through a further singular value decomposition step that cleans up the rotation matrix and extracts the scale factor, both using elements from the SVD of the R matrix. A short code sketch of this decomposition appears at the end of this section.

So now we have seen several methods we can use to locate the camera itself in a three-dimensional environment. As we move around a 3D space, if there are known vanishing points, we can orient ourselves relative to those vanishing points. If we have a planar object, we can use that planar object as a reference to guide ourselves as we move through space. And in the most general case, we have many three-dimensional estimation algorithms that allow us to compute 3D shape in the environment, and that set of three-dimensional shapes allows me to orient myself if I return to the space at another time. Imagine we are walking around campus: I have seen the buildings, I can estimate the 3D positions of those buildings as I walk by, and later today, as I walk into the same scene, I can use those three-dimensional points to help me localize myself in the three-dimensional space.
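Finally, here is the decomposition step described above as a minimal sketch, again in numpy and assuming a known K; the function name and the sign handling for the homogeneous solution are my own additions, not from the lecture.

```python
import numpy as np

def decompose_P(P, K):
    """Recover R and t from a DLT-estimated P, assuming the calibration K is known.

    Follows the clean-up described above: SVD the rotation block, force the
    singular values to one, and use the largest singular value as the scale.
    """
    M = np.linalg.inv(K) @ P              # ideally sigma * [R | t], up to an overall sign
    U, s, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt                            # nearest orthonormal matrix (diagonal of D set to 1)
    sigma1 = s[0]                         # scale factor: largest singular value
    t = M[:, 3] / sigma1                  # fourth column divided by sigma 1
    if np.linalg.det(R) < 0:              # homogeneous solutions are defined only up to sign
        R, t = -R, -t
    return R, t
```

Used together with the earlier sketch, R, t = decompose_P(estimate_P(pts_2d, pts_3d), K) gives the camera orientation and position relative to the point cloud.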