In this video, I'm going to give you a brief
introduction to the field of educational data mining, which
is a fairly recent field where we analyze data that comes from educational settings.
And so, just to give a quick introduction,
I think it's no big statement to say that computers nowadays are pretty much everywhere.
You are probably looking at this video on a computer,
you might have a computer in your pocket in the form of your phone,
you might have a computer on your watch,
so computers are pretty much everywhere and they're a big part of our lives nowadays,
of our daily lives.
And so, it's not a surprise that they're also very present in the context of education.
So, that might be just in terms of institutions collecting data on computers,
but it can also be from students using computers to learn,
engaging in educational video games,
intelligent tutoring systems, scientific simulations or online courses.
And so, with all these computers involved in education,
there's a lot of data that can be collected about
the process of how students learn in those digital learning environments,
and this data can come from many different sources
including log files with detailed traces of what the student is doing,
webcam feeds of the facial expressions of the student,
cameras that look at the postures of the student
or even audio files of what the students are saying.
And so, in the field of educational data mining,
what we want to do is see how we can leverage that data
to inform the study of education and to improve learning environments,
so we can use that to study how students interact with digital learning environments,
how we can build models of our students and of
their different behaviors in those environments, and
we can study the impact of those behaviors on learning outcomes.
And so, as I said earlier in the introduction of this video,
educational data mining is a fairly recent field, and there are
two main research communities that focus on big data and education.
The first one is
the International Educational Data Mining Society,
which had its first official conference in 2008,
and the second one is the Society for Learning Analytics Research, which had its initial conference in 2011.
So, the International Educational Data Mining Society defines
educational data mining as an emerging discipline concerned with developing
methods for exploring the unique and increasingly large-scale data that come from
educational settings, and using those methods to better
understand students and the settings in which they learn.
And on the other hand, the Society for Learning Analytics Research defines learning
analytics as the measurement, collection, analysis and reporting of
data about learners and
their contexts, for purposes of understanding and optimizing learning
and the environments in which it occurs,
so you can see that there's a lot of overlap in
those two communities even though they are still distinct.
So, they have a lot of overlap,
they both have a joint goal of exploring how this big data can be used to
support learners and learning and they also share a lot of techniques and methods.
So, what I want to do now is just quickly go over a couple of techniques that
have been used in educational data mining and learning analytics,
just to give you a brief overview.
So, many of the techniques that we use in
educational data mining and learning analytics come from computer science,
mostly the fields of machine learning and data mining, where they try to use
computers to analyze large amounts of data in ways that are not easy for humans to do.
And so, educational data mining and learning
analytics are really interested in how we can apply those techniques
to educational data to study learning or to
provide better real-time support for learners as they are learning.
And so, one of the first types of
techniques that we can use is what we call predictive modeling,
and more specifically two types of predictive modeling, classification and regression.
So, in the context of predictive modeling,
what we want to do is we want to develop a model that can infer
one specific aspect of the data, which we call the predicted variable, from other aspects of the data,
and this prediction can be about future events, for example,
trying to predict what the score of a student is going to
be on a future standardized exam, or
it can be about something happening in the moment that
the learning environment doesn't give us direct information about.
So, for example, is the student disengaging from the learning environment at this moment?
And so, regression and classification are very
similar, and how they differ is in what type of data we're going to be predicting.
So, for regression, we're looking at predicting
continuous variables such as what is going to be the score of a student on an exam,
whereas for classification, we're trying to detect and
predict variables that are going to be categorical.
For example, is the student currently disengaged or is the student currently engaged?
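To make that distinction a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the toy numbers are made up, the feature names (hours of practice, idle seconds) are hypothetical, and the two models (a one-feature least-squares line and a nearest-centroid classifier) are just simple stand-ins for whatever regression or classification algorithm you would actually use.

```python
# Regression: the predicted variable is a continuous number (an exam score).
# Toy data, made up for illustration: hours of practice -> exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 71.0, 79.0]


def fit_line(xs, ys):
    """Ordinary least squares with one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(hours, scores)


def predict_score(h):
    """Return a continuous prediction for a student with h hours of practice."""
    return slope * h + intercept


# Classification: the predicted variable is a category (engaged / disengaged).
# Toy data, made up for illustration: idle seconds between actions -> label.
idle_seconds = [2.0, 3.0, 4.0, 40.0, 55.0, 60.0]
labels = ["engaged", "engaged", "engaged",
          "disengaged", "disengaged", "disengaged"]


def fit_centroids(xs, ys):
    """Nearest-centroid classifier: store the mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


centroids = fit_centroids(idle_seconds, labels)


def predict_engagement(idle):
    """Return the label whose centroid is closest to the observed value."""
    return min(centroids, key=lambda label: abs(idle - centroids[label]))


print(predict_score(6.0))        # a number
print(predict_engagement(50.0))  # a category
```

So the two functions take the same kind of input, some observed feature of the student, and the only real difference is that one returns a number and the other returns a label.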
Another type of analysis that we do a lot in
educational data mining is what we call latent knowledge estimation.
So, the idea behind latent knowledge estimation is that we want to
get an estimation of what the student knows and what the student doesn't know.
And so, in order to do that,
we assess the student's knowledge of a specific knowledge component using
observations of when the student succeeded
or failed at applying that knowledge component in the past.
And so, one of the popular algorithms that have been used to do latent knowledge estimation
is called knowledge tracing.
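As a rough illustration of how that works, here is a minimal sketch of one common formulation, usually called Bayesian Knowledge Tracing. The video doesn't say which variant it has in mind, so treat this as an assumption. The four parameters (initial knowledge, learn, guess, slip) and their default values are made up for the example; in practice they would be fit to data for each knowledge component.

```python
def bkt_update(p_known, correct, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """One step of Bayesian Knowledge Tracing for a single knowledge component.

    p_known: probability the student knew the skill before this attempt.
    correct: True if the student applied the skill successfully.
    Parameter values are illustrative assumptions, not fitted estimates.
    """
    if correct:
        # A correct answer comes from knowing and not slipping, or from guessing.
        num = p_known * (1 - p_slip)
        den = num + (1 - p_known) * p_guess
    else:
        # A wrong answer comes from knowing but slipping, or from not guessing.
        num = p_known * p_slip
        den = num + (1 - p_known) * (1 - p_guess)
    posterior = num / den
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def trace(responses, p_init=0.3, **params):
    """Return the knowledge estimate after each observed response."""
    p = p_init
    history = []
    for correct in responses:
        p = bkt_update(p, correct, **params)
        history.append(p)
    return history


# Made-up response sequence: one failure followed by three successes.
for step, estimate in enumerate(trace([False, True, True, True]), start=1):
    print(step, round(estimate, 3))
```

The estimate goes down after a failure and up after a success, which is exactly the idea of using past successes and failures to estimate what the student knows.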
Another type of analysis that we can do is structure discovery,
and that includes algorithms such as clustering,
factor analysis, domain structure discovery or network analysis.
And unlike with predictive modeling,
when we're doing structure discovery,
we don't know exactly what the model is going to give us.
So, we have maybe an idea of
what we want to study but we don't know what we're going to find.
And so, the algorithms are going to try to pick up on
structures that emerge naturally from that data.
So, one of the approaches that we can use is what we call clustering.
What it's going to do is it's going to try to look at different data points,
for example different students,
and then it's going to look at the data for each of
those students and it's going to try to find
students that group together because they're similar.
One classical example of how that's been applied in online courses is
using clustering to try to identify major profiles of engagement behaviors in online courses.
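Here is a minimal sketch of that idea using k-means, which is one common clustering algorithm (the video doesn't name a specific one, so that choice is an assumption). The student data is made up, and the two features, fraction of videos watched and fraction of assignments submitted, are hypothetical. The centroids are initialized from the first k points just to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (cluster index per point, centroids)."""
    # Deterministic start for the sketch: the first k points are the centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Made-up students: (fraction of videos watched, fraction of assignments done).
students = [
    (0.90, 0.95), (0.10, 0.05), (0.85, 0.90),
    (0.15, 0.10), (0.95, 0.85), (0.05, 0.00),
]

assignment, centroids = kmeans(students, k=2)
print(assignment)
print(centroids)
```

Notice that we never told the algorithm what the groups should be. It is up to us to look at the centroids afterwards and decide whether they correspond to meaningful engagement profiles.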
A second type of structure discovery method is
social network analysis, which is the study of
social interactions between students or other actors,
and here again, we want to try to identify
patterns that are going to emerge from those interactions.
So, one source of data we might have for social network analysis is
data about how students interact
together in online courses in their discussion forums,
so we might try to look at,
are there any communities of students that form,
and then what are the learning outcomes of members of each of those communities?
So, do people that succeed tend to group together or do
they also interact with people that tend to not succeed?
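A minimal sketch of that kind of question might look like the following. The forum replies, student names and pass/fail outcomes are all made up, and connected components are used here as a deliberately simple stand-in for community detection. Real social network analysis would typically use a more refined community detection method.

```python
from collections import defaultdict

# Made-up forum data: each pair means one student replied to the other.
replies = [("ana", "ben"), ("ben", "cy"), ("dee", "eli")]

# Made-up learning outcomes: did the student pass the course?
passed = {"ana": True, "ben": True, "cy": False, "dee": False, "eli": False}


def build_graph(edges):
    """Undirected interaction graph as an adjacency-set dictionary."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Group students who are linked by any chain of forum interactions."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


communities = connected_components(build_graph(replies))

# Learning outcome per community: the fraction of members who passed.
pass_rates = [sum(passed[s] for s in c) / len(c) for c in communities]

for community, rate in zip(communities, pass_rates):
    print(sorted(community), round(rate, 2))
```

Comparing the pass rates across communities is one simple way to start asking whether students who succeed tend to group together.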
Another family of analyses is relationship mining, which
includes techniques like association rule mining,
correlation mining, sequential pattern mining and causal data mining.
So again, similarly to structure discovery,
when we're doing relationship mining,
we don't know exactly what's going to come out of the models that we build,
or of the analyses that we run.
What we want to do is we want to allow
the techniques to discover meaningful and unexpected relationships in the data.
So, for example, we might use association rule mining to
discover conditional rules of the form if something, then something else.
So for example, we might look at the courses that a student takes and identify
using association rule mining that if a student takes a class on educational data mining,
then they're also very likely to take a class on digital learning environments,
which would make sense because
educational data mining can be easily applied to digital learning environments.
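To sketch what such a rule looks like in practice, here is a small Python example that scores the rule "if a student takes EDM, then they take DLE" using support and confidence, the two standard measures in association rule mining. The enrollment data and the course codes (EDM, DLE, Stats, HCI) are made up for illustration.

```python
# Made-up enrollment records: the set of courses each student took.
enrollments = [
    {"EDM", "DLE", "Stats"},
    {"EDM", "DLE"},
    {"EDM", "DLE", "HCI"},
    {"EDM", "Stats"},
    {"HCI", "Stats"},
]


def count(itemset, transactions):
    """Number of students whose courses include every course in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all students who took every course in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among students who took the antecedent, the fraction who also took
    the consequent: an estimate of P(consequent | antecedent)."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)


rule_support = support({"EDM", "DLE"}, enrollments)
rule_confidence = confidence({"EDM"}, {"DLE"}, enrollments)

print("support:", rule_support)
print("confidence:", rule_confidence)
```

A full association rule mining algorithm would search over all candidate rules and keep only those above chosen support and confidence thresholds; this sketch only evaluates a single rule.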
We also have
sequential pattern mining, which is similar to association rule mining,
but with the added component that there's a temporal association between the events.
So, here there's really a time component to the relationship, for example,
if the student took the educational data mining class and
succeeded in completing the class,
then maybe they are more likely to then
publish at an educational data mining conference.
So, here there's a clear component of time,
you need to succeed at the course and then you will publish.
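Here is a minimal sketch of how the time component changes the counting. The event names and student histories are made up. The function only measures the support of one ordered pattern, whereas a real sequential pattern mining algorithm would search for all frequent patterns.

```python
# Made-up student histories: events listed in the order they happened.
histories = [
    ["enroll_edm", "complete_edm", "publish_edm"],
    ["enroll_edm", "complete_edm"],
    ["publish_edm", "enroll_edm", "complete_edm"],
    ["enroll_edm", "drop_edm"],
]


def contains_in_order(sequence, pattern):
    """True if the pattern's events all occur in the sequence in that order
    (other events are allowed in between)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match, so the
    # next pattern step can only match events that happen later.
    return all(step in remaining for step in pattern)


def sequential_support(sequences, pattern):
    """Fraction of sequences that contain the pattern in order."""
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)


def unordered_support(sequences, events):
    """Fraction of sequences that contain all events, ignoring order."""
    return sum(set(events) <= set(s) for s in sequences) / len(sequences)


pattern = ["complete_edm", "publish_edm"]
print("ordered:", sequential_support(histories, pattern))
print("unordered:", unordered_support(histories, pattern))
```

In this toy data, two students both completed the class and published, but only one of them did it in that order, which is the distinction that sequential pattern mining picks up on.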
And finally, another type of analysis we can do is
we can try to distill the data for human judgment,
for example using visualization.
So, the idea here is that we have a lot of data and we can't just look at the data in
its raw form and actually get
some meaningful information out of it because there's so much data.
So, what we do is we try to come up with
visualizations that allow us to, at a quick glance,
understand how that data is structured.
So, that might be for researchers, for them to
understand how students interact with the digital learning environment,
but it might also be for students or for teachers for them to understand what's
going on in their classroom and then use that information to plan future lessons.
And this is just an overview of different types of techniques.
There are a lot of other techniques that can be used, including text mining algorithms,
analysis of video data,
analysis of audio data.
For example, with video analysis,
we might look at the facial expressions of a student
as they're engaging with learning content and then we can
use those facial expressions to try to identify
whether the student is confused, frustrated or bored.
We could look at data from
Kinect sensors that are going to give us information about the posture of the student,
and that are going to be able to track their movement,
so we can look at whether a student leans in or leans backward.
What does that mean?
Does that mean that often when people lean in,
that's because they're more engaged with the content?
So we can detect that.
And so, just to quickly conclude, this was a really,
really brief overview of what educational data mining and learning analytics are.
They are very recent fields of study,
but they are growing very rapidly due to the larger and larger amounts
of digital data that come from educational settings,
and there are a lot of applications of educational data mining,
including the scientific study of the learning process itself,
providing us insight on how to better design digital learning environments,
or using those models to automatically adapt digital learning environments.