
One of the problems with object detection as you've seen it so far is that each of the grid cells can detect only one object. What if a grid cell wants to detect multiple objects? Here's what you can do: you can use the idea of anchor boxes.

Let's start with an example. Let's say you have an image like this, and for this example I'm going to continue to use a 3 by 3 grid. Notice that the midpoint of the pedestrian and the midpoint of the car are in almost the same place, and both of them fall into the same grid cell. So for that grid cell, if y outputs this vector, where you are detecting three classes (pedestrians, cars, and motorcycles), it won't be able to output two detections, so it has to pick one of the two detections to output.

With the idea of anchor boxes, what you're going to do is predefine two different shapes, called anchor boxes or anchor box shapes, and what you're now going to be able to do is associate two predictions with the two anchor boxes. In general you might use more anchor boxes, maybe five or even more, but for this video I'm just going to use two anchor boxes to make the description easier.

So what you do is define the class label to be, instead of this vector on the left, basically that vector repeated twice. You would have Pc, bx, by, bh, bw, c1, c2, c3, and these are the eight outputs associated with anchor box 1. Then you repeat that, Pc, bx, and so on down to c1, c2, c3, as another eight outputs associated with anchor box 2.

Because the shape of the pedestrian is more similar to the shape of anchor box 1 than anchor box 2, you can use these eight numbers to encode that Pc is 1 (yes, there is a pedestrian), use these to encode the bounding box around the pedestrian, and then use these to encode that the object is a pedestrian. And then, because the box around the car is more similar to the shape of anchor box 2 than anchor box 1, you can use these to encode that the second object here is the car, with the bounding box and so on being all the parameters associated with the detected car.

So, to summarize: previously, before you were using anchor boxes, you did the following.

Each object in a training set image was assigned to the grid cell that corresponds to that object's midpoint. So the output y was 3 by 3 by 8, because you have a 3 by 3 grid, and for each grid position we had that output vector, which is Pc, then the bounding box, then c1, c2, c3.

With anchor boxes, you now do the following.

Now each object is assigned to the same grid cell as before, the grid cell that contains the object's midpoint, but it is also assigned to the anchor box with the highest IoU with the object's shape. So if you have two anchor boxes and an object with this shape, what you do is take your two anchor boxes (maybe anchor box 1 is this shape and anchor box 2 is this shape) and see which of the two has a higher IoU with the ground truth bounding box. Whichever it is, that object then gets assigned not just to a grid cell but to a (grid cell, anchor box) pair, and that's how that object gets encoded in the target label.

So now the output y is going to be 3 by 3 by 16, because, as you saw on the previous slide, y is now 16-dimensional. Or if you want, you can also view this as 3 by 3 by 2 by 8, because there are now two anchor boxes and y is 8-dimensional for each of them.
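To make the assignment rule concrete, here is a minimal sketch in plain Python. It assumes boxes are given as fractions of the image size and that "IoU with the object's shape" means comparing width and height only, as if both boxes shared the same center. The names `shape_iou`, `assign`, and `ANCHORS`, and all the numbers, are hypothetical and chosen just for illustration.

```python
def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes compared by shape only (as if they shared a center)."""
    w, h = box_wh
    aw, ah = anchor_wh
    inter = min(w, aw) * min(h, ah)
    union = w * h + aw * ah - inter
    return inter / union


def assign(obj, anchors, grid=3):
    """Return the (row, col, anchor_index) triple for one ground truth object.

    obj is (x_mid, y_mid, width, height), all as fractions of the image size.
    """
    x, y, w, h = obj
    # Grid cell that contains the object's midpoint.
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    # Anchor box whose shape has the highest IoU with the object's shape.
    best = max(range(len(anchors)), key=lambda i: shape_iou((w, h), anchors[i]))
    return row, col, best


# Hypothetical anchor shapes (width, height): one tall and skinny, one wide.
ANCHORS = [(0.15, 0.45), (0.45, 0.20)]

# Hypothetical pedestrian and car whose midpoints land in the same grid cell.
pedestrian = (0.50, 0.80, 0.12, 0.40)
car = (0.52, 0.82, 0.50, 0.22)

print(assign(pedestrian, ANCHORS))  # (2, 1, 0): lower middle cell, anchor box 1
print(assign(car, ANCHORS))         # (2, 1, 1): same cell, anchor box 2
```

Both objects map to the same grid cell, but to different anchor boxes, so each one gets its own eight-number slot in the label.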

The dimension of y being 8 was because we have three object classes; if you have more object classes, then the dimension of y will be even higher.

So let's go through a concrete example. For this grid cell, let's specify what y is. The pedestrian is more similar to the shape of anchor box 1, so we're going to assign the pedestrian to the top half of this vector: yes, there is an object, there will be some bounding box associated with the pedestrian, and if a pedestrian is class 1, then c1 is 1, and then 0, 0. The shape of the car is more similar to anchor box 2, so the rest of this vector will be 1, then the bounding box associated with the car, and then, since the car is class 2, 0, 1, 0. So that's the label y for that lower middle grid cell, the one this arrow is pointing to.

Now, what if this grid cell only had a car and no pedestrian? Assuming that the shape of the bounding box around the car is still more similar to anchor box 2, then the target label y, if there was just a car there and the pedestrian had gone away, would still be the same for the anchor box 2 component. Remember, that is the part of the vector corresponding to anchor box 2. For the part of the vector corresponding to anchor box 1, you just say there is no object there, so Pc is 0, and the rest of these would be don't cares.
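The two labels from this example can be written out as a short sketch. It assumes the anchor box for each object has already been chosen, uses `None` to stand for a don't care, and uses made-up bounding box numbers; `encode_cell`, `ped_box`, and `car_box` are hypothetical names, not anything from the slides.

```python
NUM_CLASSES = 3                  # pedestrian, car, motorcycle
PER_ANCHOR = 5 + NUM_CLASSES     # Pc, bx, by, bh, bw, c1, c2, c3


def encode_cell(objects, num_anchors=2):
    """Build the target label for one grid cell.

    objects is a list of (anchor_index, (bx, by, bh, bw), class_index).
    """
    y = []
    for _ in range(num_anchors):
        # Default slot: Pc = 0 and the remaining entries are don't cares.
        y += [0] + [None] * (PER_ANCHOR - 1)
    for anchor, box, cls in objects:
        start = anchor * PER_ANCHOR
        one_hot = [1 if k == cls else 0 for k in range(NUM_CLASSES)]
        y[start:start + PER_ANCHOR] = [1, *box, *one_hot]
    return y


# Hypothetical bounding box values (bx, by, bh, bw), for illustration only.
ped_box = (0.5, 0.4, 1.2, 0.4)
car_box = (0.6, 0.5, 0.6, 1.5)

# Pedestrian (class 1) on anchor box 1, car (class 2) on anchor box 2.
both = encode_cell([(0, ped_box, 0), (1, car_box, 1)])
# Car only: the anchor box 1 slot has Pc = 0 followed by don't cares.
car_only = encode_cell([(1, car_box, 1)])

# The same 16 numbers viewed as 2 by 8, one row per anchor box.
as_2_by_8 = [both[i:i + PER_ANCHOR] for i in range(0, len(both), PER_ANCHOR)]

print(both)
print(car_only)
print(as_2_by_8)
```

Note that the anchor box 2 half of the vector is identical in both cases; only the anchor box 1 half changes when the pedestrian goes away.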

Just a few additional details. What if you have two anchor boxes but three objects in the same grid cell? That's one case that this algorithm doesn't handle well. Hopefully it won't happen, but if it does, this algorithm doesn't have a great way of handling it, so you just implement some default tiebreaker for that case. Or what if you have two objects associated with the same grid cell, but both of them have the same anchor box shape? Again, that's another case that this algorithm doesn't handle well, and you can implement some default way of tiebreaking if that happens. Hopefully, with your data set, this won't happen much at all, so it shouldn't affect performance much.
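The video doesn't say what the default tiebreaker should be, so the following is only one possible choice, sketched under that assumption: within a grid cell, hand out anchor boxes greedily in order of decreasing IoU, and drop any object that is left without an anchor box. The function names and numbers are hypothetical.

```python
def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes compared by shape only (as if they shared a center)."""
    w, h = box_wh
    aw, ah = anchor_wh
    inter = min(w, aw) * min(h, ah)
    return inter / (w * h + aw * ah - inter)


def resolve_cell(objects_wh, anchors):
    """Greedy tiebreaker for the objects that fall into one grid cell.

    Returns ({anchor_index: object_index}, [indices of dropped objects]).
    This is an assumed default, not a rule taken from the algorithm itself.
    """
    pairs = sorted(
        ((shape_iou(o, a), oi, ai)
         for oi, o in enumerate(objects_wh)
         for ai, a in enumerate(anchors)),
        key=lambda p: p[0],
        reverse=True,
    )
    taken, placed = {}, set()
    for _, oi, ai in pairs:
        if ai not in taken and oi not in placed:
            taken[ai] = oi
            placed.add(oi)
    dropped = [oi for oi in range(len(objects_wh)) if oi not in placed]
    return taken, dropped


ANCHORS = [(0.15, 0.45), (0.45, 0.20)]   # hypothetical tall and wide shapes

# Case 1: three objects, two anchor boxes. One object has to be dropped.
three = [(0.12, 0.40), (0.14, 0.44), (0.50, 0.22)]
print(resolve_cell(three, ANCHORS))      # ({0: 1, 1: 2}, [0])

# Case 2: two tall objects that both prefer anchor box 1.
# The weaker match falls back to anchor box 2.
two_tall = [(0.12, 0.40), (0.14, 0.44)]
print(resolve_cell(two_tall, ANCHORS))   # ({0: 1, 1: 0}, [])
```

Other defaults, such as keeping the larger object or the first one encountered, would be equally reasonable; as noted above, these cases should be rare enough that the choice has little effect.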

Even though I've motivated anchor boxes as a way to deal with what happens if two objects appear in the same grid cell, in practice that happens quite rarely, especially if you use a 19 by 19 rather than a 3 by 3 grid. The chance of two objects having their midpoints in the same one of these 361 cells is small; it does happen, but it doesn't happen that often. Maybe an even better motivation, or an even better result that anchor boxes give you, is that they allow your learning algorithm to specialize better. In particular, if your data set has some tall, skinny objects like pedestrians and some wide objects like cars, then this allows your learning algorithm to specialize, so that some of the outputs can specialize in detecting wide, fat objects like cars, and some of the output units can specialize in detecting tall, skinny objects like pedestrians.

How do you choose the anchor boxes? People used to just choose them by hand: you choose maybe five or ten anchor box shapes that span a variety of shapes that seem to cover the types of objects you want to detect. As a much more advanced version (just an advanced comment for those of you who have other knowledge of machine learning), an even better way to do this, from one of the later YOLO research papers, is to use a k-means algorithm to group together the types of object shapes you tend to get, and then use that to select a set of anchor boxes that is most stereotypically representative of the maybe multiple, maybe dozens of object classes you're trying to detect. That's a more advanced way to automatically choose the anchor boxes. But if you just choose by hand a variety of shapes that reasonably spans the set of object shapes you expect to detect, some tall, skinny ones and some fat, wide ones, that should work fine as well.
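For those who want to see the k-means idea spelled out, here is a rough sketch in plain Python. It clusters (width, height) pairs and treats a higher shape IoU as "closer", which is an assumption about the distance measure; the video only says to use k-means. The initialization from the first k boxes, the function names, and the sample boxes are also hypothetical simplifications.

```python
def shape_iou(a, b):
    """IoU of two (width, height) shapes, as if they shared a center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=20):
    """Cluster (width, height) pairs into k anchor box shapes."""
    centers = list(boxes[:k])   # simple deterministic start (an assumption)
    for _ in range(iters):
        # Assignment step: each box joins the center it overlaps most.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: shape_iou(b, centers[i]))
            clusters[best].append(b)
        # Update step: each center becomes the mean shape of its cluster.
        new_centers = []
        for i, c in enumerate(clusters):
            if c:
                new_centers.append((sum(w for w, _ in c) / len(c),
                                    sum(h for _, h in c) / len(c)))
            else:
                new_centers.append(centers[i])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical ground truth shapes: three tall, skinny boxes and three wide ones.
boxes = [(0.10, 0.40), (0.50, 0.20), (0.12, 0.44),
         (0.46, 0.22), (0.14, 0.42), (0.54, 0.18)]

anchors = kmeans_anchors(boxes, k=2)
print(anchors)   # roughly one tall shape near (0.12, 0.42), one wide near (0.50, 0.20)
```

On real data you would run this over the width and height of every ground truth box in the training set, with k set to the number of anchor boxes you want.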

That's it for anchor boxes. In the next video, let's take everything we've seen and tie it back together into the YOLO algorithm.