So this is a.
And here I've just shown you a little bit about the structure of this BLAST record
that we're getting back.
And you can find more about this if you look further into the Biopython documentation.
So the BLAST results themselves are going to give you lots of information, actually.
You can get voluminous output from one call to BLAST.
So first you get some descriptions of what BLAST found.
That is, what species it hit against.
And you also get a list of alignments.
For each species, or each sequence, that a particular query matched,
you'll get the alignments of those sequences to the references.
And then for each of these individual matches you're going to have a score.
And that score is a bit score.
And there's also an E value, shown here just as E, which is supposed to be
a representation of the likelihood that you would get this same result by chance.
So if that value is very, very small, that means the match was not accidental;
it's a genuine match.
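To see why a high bit score goes together with a tiny E value, here is a minimal sketch. It assumes the usual relation E = m * n * 2**(-S'), where S' is the bit score and m and n are the query and database lengths. The function name and the numbers are made up for illustration; BLAST itself uses adjusted (effective) lengths, so don't expect these to match a real report exactly.

```python
def expected_chance_hits(bit_score, query_len, db_len):
    """Estimate E = m * n * 2**(-S') for a bit score S' (illustrative only)."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Made-up search: a 100-base query against a 1,000,000-base database.
weak = expected_chance_hits(bit_score=20, query_len=100, db_len=1_000_000)
strong = expected_chance_hits(bit_score=120, query_len=100, db_len=1_000_000)

# Each extra bit of score halves the number of chance hits you would expect.
print(f"20 bits:  E = {weak:.3g}")
print(f"120 bits: E = {strong:.3g}")
```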
And then you get some number representing how many alignments were found against
that particular query.
And on the alignments themselves, you get a number of things.
For each particular alignment you have something that's called the HSP record,
which stands for high-scoring segment pair.
So for all the alignments of your query against the target, which might be
a genome, it'll show you the score of each of those separate alignments, and it'll
show you the actual matches and mismatches. All of this is captured in this elaborate record.
So you don't want to parse that yourself; you want to use these tools to parse
it so you can print things out in a nice human-readable way.
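The nesting just described (a record holds descriptions and alignments, and each alignment holds its HSPs) can be sketched like this. It's only a stand-in built by hand so it runs without Biopython or a network connection: the attribute names are meant to mirror the parsed record, but the values, the title, and the outline function are made up for illustration.

```python
from types import SimpleNamespace

# Hand-built stand-in for one parsed BLAST record; no search is run here.
hsp = SimpleNamespace(score=100.0, bits=91.5, expect=2e-29,
                      query="ACGTACGT", match="||||||||", sbjct="ACGTACGT")
alignment = SimpleNamespace(title="example hit (made-up title)",
                            length=18959, hsps=[hsp])
description = SimpleNamespace(title=alignment.title, score=hsp.bits,
                              e=hsp.expect, num_alignments=1)
blast_record = SimpleNamespace(descriptions=[description],
                               alignments=[alignment])


def outline(record):
    """Return an indented outline: record -> alignments -> HSPs."""
    lines = [f"{len(record.descriptions)} description(s), "
             f"{len(record.alignments)} alignment(s)"]
    for aln in record.alignments:
        lines.append(f"  alignment: {aln.title} (length {aln.length})")
        for h in aln.hsps:
            lines.append(f"    HSP: score={h.score}, bits={h.bits}, "
                         f"expect={h.expect}")
    return lines


print("\n".join(outline(blast_record)))
```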
So, parsing the BLAST output: first let's look and
see what we've got in the BLAST record we have captured.
So one of the things in that BLAST record is the list of alignments.
So I can check how many there are by typing
len(blast_record.alignments), and that will show me how long that list is.
It's got 50 items in it, so I got 50 matches back.
And by the way, BLAST by default limits your output to 50 matches.
That's why there's 50 here.
There probably were more than 50.
It will give me the 50 best ones.
So now I want to print out just the matches that gave me a significant hit.
So I am going to set a threshold for the E value.
That's the expectation value: roughly, how many matches this good you'd expect to see by chance.
And I want to make it pretty small, so
we'll say e_value_thresh = 0.01, meaning I don't want to see any matches
that don't have an E value
below that threshold of 0.01.
So they are unlikely to be chance matches.
So what I'd like to do is loop through all the alignments that BLAST returned, and
just print out the ones that have a good evalue.
So here's how we'll do that.
We'll get the alignments out of our BLAST record,
which is blast_record.alignments.
That gets all of them.
We want to loop through them, so we'll say for
alignment in blast_record.alignments, okay?
So now the variable alignment
is bound, each time through the loop, to one of our alignments.
And we then want to look at the high-scoring pairs, the HSPs,
which are a subrecord of alignment.
So we're going to look at each of the HSPs, which
are, in a sense, each of the sub-alignments within alignment.
So we do that by looping again, with for hsp in alignment.hsps.
And then each of those is going to have an E value attached.
We write if hsp.expect < e_value_thresh, so
if the evalue that we got back from BLAST is smaller than the threshold we set,
then we are going to print it out.
So here's how we print it out.
We just print some text, say the word alignment, and then we print the sequence.
The sequence name, that's called the title in the BLAST record.
So we print alignment.title.
The length is alignment.length and the E value is hsp.expect.
So we're going to print out each of those with some plain text, so
that we can recognize them in the output: the sequence, the length,
the E value. And then we print out the query sequence itself, hsp.query;
that was the sequence that we provided to BLAST as our query.
We want to print out how it matches, which is hsp.match, and
you'll see in the next slide how that looks.
And we'll print out the target, the subject, which is hsp.sbjct.
So this is going to print out all of our significant matches
in a nice format.
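Put together, the loop just described might look like the sketch below. To keep it runnable without Biopython or a live BLAST call, it builds a small stand-in record by hand; the titles, sequences, and E values are made up, and report_significant and make_hit are hypothetical helper names, not part of Biopython. The attribute names (alignments, hsps, title, length, expect, query, match, sbjct) follow the parsed record as described above.

```python
from types import SimpleNamespace


def report_significant(record, e_value_thresh=0.01):
    """Return one text block per HSP whose E value is below the threshold."""
    blocks = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_thresh:
                blocks.append("\n".join([
                    "****Alignment****",
                    f"sequence: {alignment.title}",
                    f"length: {alignment.length}",
                    f"e value: {hsp.expect}",
                    hsp.query,
                    hsp.match,
                    hsp.sbjct,
                ]))
    return blocks


def make_hit(title, length, expect, match, sbjct, query="ACGTACGT"):
    """Build a made-up alignment with a single HSP (illustration only)."""
    hsp = SimpleNamespace(expect=expect, query=query, match=match, sbjct=sbjct)
    return SimpleNamespace(title=title, length=length, hsps=[hsp])


# Hand-built stand-in for the parsed record: one strong hit, one chance hit.
blast_record = SimpleNamespace(alignments=[
    make_hit("strong hit (made up)", 18959, 2e-29, "||||||||", "ACGTACGT"),
    make_hit("chance hit (made up)", 5000, 3.5, "||| ||||", "ACGAACGT"),
])

for block in report_significant(blast_record):
    print(block)
    print()
```

With a real search, you would pass in the record you parsed from the BLAST result instead of the stand-in; only the strong hit is printed here because the other one has an E value above 0.01.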
So, let's see what we got back.
So, here we are. For each of these alignments,
the text ****alignments prints out first.
And then you see that we have the sequence, which has
a fancy [INAUDIBLE] number and some other information, and an E value, which is a very
small number. You see the first one is less than 2 times 10 to the minus 29.
So a very, very small number.
So a highly significant match.
Then you see the two sequences, one above the other,
with little vertical bars in between showing where they match.
So a vertical bar means there's a match.
If there's no vertical bar there, that would be a mismatch.
But as you see, this sequence is a perfect match.
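If you want to see how that middle line of vertical bars relates to the two sequences, here is a minimal sketch that rebuilds it. It assumes a nucleotide-style display, where a bar marks an identical position and a space marks a mismatch or a gap; the function names and the sequences are made up for illustration.

```python
def match_line(query, subject):
    """Put '|' under identical positions and a space everywhere else."""
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query, subject)
    )


def count_identities(query, subject):
    """Count the positions where the two aligned sequences agree."""
    return match_line(query, subject).count("|")


query = "ACGTACGT"
subject = "ACGAACGT"  # made-up subject with one mismatch

print(query)
print(match_line(query, subject))
print(subject)
print(count_identities(query, subject), "of", len(query), "positions match")
```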
And in the sequence name on the first line there,
you can see that the name of this is Ebola virus:
an Ebola virus isolate.
So that's what we have.
So our patient was in fact very sick.
As for what the patient had, with the sequence we see right away that it matches Ebola
virus, and if you scroll down through the matches,
all of them are to different strains of the Ebola virus.
So, voilà, we found out what the patient had.
If you want more help with Biopython, this is just one example.
There are, literally, thousands of programs now in Biopython,
all prewritten for you. There's no reason to write any of them yourself;
it's much more useful for you to learn about what programs are available, and
then you can simply call them from within your Python program.
So there's a Biopython tutorial available at this link
I'm putting here on the screen.
And then there's a FAQ I also recommend you look at.