I’m working and collaborating with several people who are interested in explaining Bayesian reasoning, especially Bayesian networks (see below). We have a small group that meets monthly to discuss this, and last week we ended up also talking about why people would use Bayesian models instead of neural networks, focusing on medical decision support (not NLP). I found this very interesting and thought-provoking.

One relatively obvious reason to avoid black-box neural models is justifiability (which relates to explanation). I.e., if I build a medical diagnosis system as a Bayesian model, I can explain to doctors (and regulators) exactly what the model does, and also the underlying data, evidence, or knowledge behind each part of the model. This is not possible with black-box neural models.

A perhaps less obvious reason is flexibility and modularity. With a Bayesian network, we can train some parts of the network from data, but we can also base parts of the network on clinical trials or human judgement. In medicine, evidence from a careful randomised controlled clinical trial (RCT) is trusted more than patterns inferred from data by a machine learning system. So if you have relevant RCT results, you need to include them in your model; you can’t replace them with a neural network! This is much easier with a Bayesian network than a neural one. Also, if we have insufficient data to train the complete network, we can get human experts to specify part of the network, which reduces the amount of data needed.
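To make the modularity point concrete, here is a minimal sketch of a three-node network (vaccinated → severe disease → ICU) where each part comes from a different source. The structure, the variable names, and all the numbers are my own made-up assumptions for illustration; they are not clinical estimates and not a real model.

```python
# Toy Bayesian network: vaccinated -> severe -> icu.
# All names and numbers are hypothetical, chosen only for illustration.

def cpt_from_counts(counts):
    """Estimate P(child=True | parent) from (positives, total) counts."""
    return {parent: pos / total for parent, (pos, total) in counts.items()}

# Part 1: learned from (hypothetical) observational data.
# P(icu=True | severe)
p_icu_given_severe = cpt_from_counts({True: (30, 100), False: (1, 100)})

# Part 2: copied from a (hypothetical) RCT result, not learned.
# P(severe=True | vaccinated)
p_severe_given_vaccinated = {True: 0.02, False: 0.10}

# Part 3: specified by a (hypothetical) domain expert.
# P(vaccinated=True)
p_vaccinated = 0.7

def p_icu(vaccinated):
    """P(icu=True | vaccinated), summing out the unobserved 'severe' node."""
    p_sev = p_severe_given_vaccinated[vaccinated]
    return (p_sev * p_icu_given_severe[True]
            + (1 - p_sev) * p_icu_given_severe[False])

def p_icu_marginal():
    """P(icu=True), summing out 'vaccinated' as well."""
    return p_vaccinated * p_icu(True) + (1 - p_vaccinated) * p_icu(False)

if __name__ == "__main__":
    print("P(icu | vaccinated)     =", round(p_icu(True), 4))
    print("P(icu | not vaccinated) =", round(p_icu(False), 4))
    print("P(icu)                  =", round(p_icu_marginal(), 5))
```

Each table can be traced back to its source (data, trial, or expert), which is also what makes the model justifiable to doctors and regulators.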

Flexibility is especially important when the world is changing (what ML people call “domain shift”). We talked about Covid in particular, where things are changing incredibly fast. The virus is mutating (new variants), the population’s reaction to the virus is changing (because of vaccination), and new treatments are becoming available. Also, the data available is changing rapidly, on key issues such as the effectiveness of different vaccines on different variants in different populations. So models need to be updated constantly, and this is easier to do when we can update chunks of a model separately.

For example, suppose I have a model which predicts the likelihood that a Covid patient will need ICU care, and I get some new data on the effectiveness of the Pfizer vaccine against the Covid Delta variant in 20-30 year old Asian males. With a well-structured Bayesian system, I can just slot in this data without needing to touch the rest of the model. I suspect it would be much harder to do this in a neural network.
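As a rough sketch of what “slotting in” new data could look like, assume the model stores its conditional probability tables as plain dictionaries. The `slot_in` and `predict_icu` helpers, the subgroup labels, and every number here are hypothetical; a real system would use a proper Bayesian network library and real evidence.

```python
from copy import deepcopy

# Hypothetical model with made-up probabilities (not clinical estimates).
model = {
    # P(severe=True | vaccine, variant, subgroup)
    "p_severe": {
        ("pfizer", "delta", "asian_male_20_30"): 0.05,
        ("pfizer", "delta", "other"): 0.06,
        ("none", "delta", "asian_male_20_30"): 0.12,
        ("none", "delta", "other"): 0.15,
    },
    # P(icu=True | severe)
    "p_icu": {True: 0.30, False: 0.01},
}

def predict_icu(model, vaccine, variant, subgroup):
    """P(icu=True | vaccine, variant, subgroup), summing out 'severe'."""
    p_sev = model["p_severe"][(vaccine, variant, subgroup)]
    return p_sev * model["p_icu"][True] + (1 - p_sev) * model["p_icu"][False]

def slot_in(model, key, new_probability):
    """Return a copy of the model with a single CPT entry replaced."""
    if not 0.0 <= new_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    updated = deepcopy(model)
    updated["p_severe"][key] = new_probability
    return updated

# New (hypothetical) evidence for one vaccine/variant/subgroup combination.
key = ("pfizer", "delta", "asian_male_20_30")
updated = slot_in(model, key, 0.03)

if __name__ == "__main__":
    print("before:", round(predict_icu(model, *key), 4))
    print("after: ", round(predict_icu(updated, *key), 4))
    # Predictions for every other group are untouched.
    print("other: ", round(predict_icu(updated, "pfizer", "delta", "other"), 4))
```

Only one table entry changes; the ICU table and the predictions for all other groups stay exactly as they were, which is the kind of local update that I suspect is hard to do in a neural model.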

To put this another way, using a Bayesian network to identify images of cats is stupid; neural models are much better in contexts where there is lots of training data, no gold-standard “clinical trial” knowledge, no need to justify results, and where today’s model can be expected to work just as well in a month’s time. But in fast-changing medical situations such as Covid, there are real drawbacks to neural models, and alternatives need to be considered.

I’m sure the above is well-known and obvious to many people, but it’s not something I have thought much about before. From a software engineering perspective, it’s perhaps related to “non-functional” requirements. I.e., in a medical/Covid context, we not only want models that are accurate (functional requirement), we also want models that are justifiable to domain experts and regulators, and which can easily be updated in a rapidly changing world (non-functional requirements). I’ve seen loads of academic papers, competitions, etc about accuracy/performance, but much less about “non-functional” requirements. But non-functional requirements are very important; we cannot ignore them.

The above discussion was about clinical decision support, not about NLP or NLG! I suspect most NLP tasks are similar to identifying images of cats, in that they don’t require justification in the above sense, and domain shift is slow (languages evolve, but much more slowly than the Covid virus). But in an NLG context, I could imagine medical or legal use cases where regulators would insist on explicit justification of an NLG system’s behaviour. Also, many of the NLG applications I’ve been involved with produce novel texts which are different from anything written by people; in such cases we need a flexible/modular approach where we can update bits of the system as we get feedback and data from users.

### Explaining Bayesian Models

As mentioned, the above discussion came out of a group whose main focus is on explaining Bayesian networks, which I think is a really interesting topic. From an explanation perspective, Bayesian models are transparent (i.e., you can explain what the model is doing, unlike a neural network) but not intuitive (people struggle to understand complex probabilistic reasoning). We know exactly what the network is doing (it’s mathematically well founded), but how do we explain it to people, especially given that most people struggle to understand probabilities or complex explanations? Most of the discussion last week was not about the above “why use Bayesian networks” topic, but rather about Jaime Sevilla’s ideas on easier-to-explain approximations of Bayesian reasoning (Jaime is one of my PhD students). In our previous meeting, Miruna Clinciu discussed her ExBAN corpus of human-written explanations of Bayesian networks, and Sameen Maruf discussed experiments she is doing on adapting explanations to user expectations.

I also wanted to say that Alberto Bugarín Diz, another member of our group, has funding and is looking for a PhD student to work on explaining Bayesian models; I will co-supervise this student. This is a great opportunity for anyone interested in this topic!