We've been talking about what's called a full-map directory, where there's a bit per processor core. If you have 1,000 cores in the system, that's a very large bit map. And it's pretty uncommon that a thousand cores in the system will all be reading all of the data, or all of one particular cache line. So that can be wasteful, and your directory in a full-map format grows at order N. That's not particularly good. So people have looked into different protocols here. I just want to touch on this; it's largely an area of ongoing research and future directions.
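To make that order-N growth concrete, here's a small sketch (the 64-byte line size and the function name are my own illustrative choices, not from the lecture) of the per-line storage overhead of a full-map directory:

```python
def full_map_overhead(num_cores, line_size_bytes=64):
    """Directory storage per cache line for a full bit-map sharer list:
    one presence bit per core (state bits ignored for simplicity)."""
    directory_bits = num_cores            # one bit per possible sharer
    line_bits = line_size_bytes * 8
    return directory_bits / line_bits     # overhead as a fraction of the line

# The sharer bit map grows linearly with core count (order N):
print(full_map_overhead(64))    # 64 bits over a 512-bit line -> 0.125
print(full_map_overhead(1024))  # 1,024 bits over a 512-bit line -> 2.0
```

At 1,000-plus cores the directory entry is bigger than the cache line it tracks, which is exactly the waste being described.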
One idea here is to have a limited-pointer-based directory. So instead of having a bit mask for the sharer list, you keep a short list of pointers, each a base-two encoding of a node number. This is why it's called a limited directory, or limited-pointer directory: you can't have all the nodes in the system on the list. There's only a fixed number of entries here, but each entry can name any node, because it's a base-two encoding of the actual node number. And if the sharer list outgrows the entries, let's say you have five entries and a sixth sharer wants to take a read-only copy of the line, there's usually an overflow bit which says there are more sharers than the pointers can track. Once that bit is set, any invalidation (for instance, on a transition to Modified) just sends a broadcast invalidate to every single cache in the system. But usually this is a good trade-off, because it's pretty uncommon to have extremely widely shared lines.
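A rough sketch of how a limited-pointer entry might behave (the class name, the five-pointer limit, and the method names are my own illustrative choices, not from any particular machine):

```python
class LimitedPointerEntry:
    """Sketch of a limited-pointer directory entry: up to `max_ptrs`
    sharer node IDs, plus an overflow bit that forces broadcast."""

    def __init__(self, max_ptrs=5):
        self.max_ptrs = max_ptrs
        self.sharers = []          # base-two-encoded node IDs (plain ints here)
        self.overflow = False

    def add_sharer(self, node):
        if self.overflow or node in self.sharers:
            return
        if len(self.sharers) < self.max_ptrs:
            self.sharers.append(node)
        else:
            self.overflow = True   # a sixth sharer arrives: stop tracking precisely

    def invalidate_targets(self, all_nodes):
        """On a write (transition to Modified): precise invalidates if we
        still know the sharers, otherwise broadcast to every cache."""
        return list(all_nodes) if self.overflow else list(self.sharers)

entry = LimitedPointerEntry(max_ptrs=5)
for node in [3, 7, 9]:
    entry.add_sharer(node)
print(entry.invalidate_targets(range(1000)))       # precise: [3, 7, 9]
for node in [11, 13, 17]:                          # overflow past five pointers
    entry.add_sharer(node)
print(len(entry.invalidate_targets(range(1000))))  # broadcast: 1000
```

The storage cost is now a handful of log2(N)-bit pointers instead of N bits, at the price of broadcast traffic in the rare widely-shared case.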
So it's an interesting trade-off: storage space versus sending more messages in the system. Likewise, there's an interesting protocol here called LimitLESS. It's the same idea, a limited directory with an overflow bit, but if it overflows you start to keep the sharer list in software, in a structure in main memory. And this requires you to have very fast traps, so that when the sharer list overflows while the directory is servicing a cache line, you can interrupt the main processor and have it provide the rest of the sharer list. And there's a bunch of designs in between; research is still being done actively in this space. Beyond simple directory coherence, we have some pretty cool on-chip coherence. This is why this is actually being studied a lot right now: people built these massively parallel directory-based machines in the past. They got some use, and they were very good for some applications. But now we're starting to put more and more cores onto a given chip, so we start to see on-chip coherence, and people are figuring out how to leverage the fast on-chip communication alongside directories to make more interesting coherence protocols.
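The LimitLESS overflow path described above can be sketched the same way; this is my own toy model of the idea (a Python set stands in for the software-maintained structure in main memory, and nothing here models the actual trap mechanism):

```python
class LimitlessEntry:
    """Sketch of a LimitLESS-style entry: a few hardware pointers, and on
    overflow the full sharer list is kept by software in main memory
    (modeled here as a set standing in for the trap handler's structure)."""

    def __init__(self, max_hw_ptrs=4):
        self.max_hw_ptrs = max_hw_ptrs
        self.hw_ptrs = []
        self.sw_list = None        # software-maintained list after overflow

    def add_sharer(self, node):
        if self.sw_list is not None:
            self.sw_list.add(node)             # slow software path (after a trap)
        elif len(self.hw_ptrs) < self.max_hw_ptrs:
            self.hw_ptrs.append(node)          # fast hardware path
        else:
            # Overflow: trap to the processor, which takes over the list.
            self.sw_list = set(self.hw_ptrs) | {node}

    def invalidate_targets(self):
        """Unlike the broadcast scheme, the sharer list stays precise."""
        if self.sw_list is not None:
            return sorted(self.sw_list)
        return sorted(self.hw_ptrs)

entry = LimitlessEntry(max_hw_ptrs=4)
for node in [2, 5, 8, 11, 14, 17]:
    entry.add_sharer(node)
print(entry.invalidate_targets())   # precise even after overflow
```

The design point here is different from the broadcast scheme: overflow costs processor time on a trap rather than extra invalidation messages, and the common few-sharer case still runs entirely in hardware.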
There's something called COMA systems, or Cache-Only Memory Architectures, where instead of having the data live in a fixed main memory, you don't have main memory at all, and the data and its directory state migrate around between the caches. That's beyond the scope of this course, but look at the KSR-1 if you want to read about that kind of research machine.
I briefly wanted to talk about the most up-to-date versions of these things. We have the SGI UV 1000, which is a descendant of the Origin and the Origin 2000 machines from SGI; lots of cores here,