All right, I tend to roll my eyes every time someone in the corporate headquarters tries to inspire me with a sports-related teamwork quote. Most people associate teamwork discussions with mediocre hotel conference room retreats and trust-fall team building activities led by some kind of thought leader. Basically, I always imagine the character Michael Scott from the popular television series The Office telling me how awesome teamwork is. But underneath the cheesy quotes about team building, there is a universal truth that, for all you fellow cynics out there, applies really well to a diverse, multifaceted field like machine learning and healthcare. The reality is that if you're working on a machine learning project in some aspect of healthcare, you will either be part of a team or you should plan to put together a multidisciplinary team. >> And team members should be specifically chosen for the skill sets and talents that they bring to the table, so that the group can formally think through the important decisions and stages of the project together. One common mistake is to focus too much on putting together a team dominated by machine learning expertise. While machine learning skills and experience are clearly important, there are other critically important disciplines and skills that tend to be neglected early on, and that neglect will come back to haunt projects. There are many stories of computer science teams that set out to solve healthcare without expertise in clinical medicine, clinical trials or statistical study design, healthcare finance and incentives, data privacy and biases, or even a comprehensive understanding of the end user environment and ecosystem. And failing to build a comprehensive team ultimately leads to failure.
On the other hand, the mistake at the other end of the spectrum is putting together a team that's dominated by, let's say, clinicians, without team members who have strong machine learning expertise. In this case, even if you've collected an amazing data set, the team could be hampered by an inability to realize the full potential of machine learning or, worse, make fundamental mistakes or poor decisions in developing the machine learning model that prevent generalizability or real-world utility. >> So the main point is that it is important to build a team with diverse expertise. Not everyone on the team needs to be able to write out the mathematical derivation of the weight updates in backpropagation, for example. At the same time, the entire team benefits if everyone has an understanding of machine learning concepts and principles, because that foundation of knowledge serves as a common language, allowing everyone's unique expertise and experience to be applied to the problem. >> So let's consider some archetypal areas of expertise that are important in any applied machine learning healthcare project. First, let's cover some obvious needs, while also clarifying some often-confusing terminology around the roles of data scientist and machine learning engineer. The first question that comes up all the time is, what's the difference between the roles of machine learning engineer and data scientist? And also, why would you want both roles on your team? Well, given that the two professions do tend to overlap, there's some understandable fluidity in how a machine learning engineer is defined and how a data scientist is defined. And an individual can certainly have expertise spanning both types of roles. >> But to help with this potential confusion, let's work through a hypothetical scenario. Let's say, for example, your team plans to create a clinical decision support tool based on a machine learning model.
The model would predict pneumonia using real-time EHR data and integrate into the clinical workflow. The person with the data scientist skills on the team would be focused on data mining, feature engineering, analytics, and metrics of model performance. In particular, they'd have a deep knowledge of working with healthcare data in this role, which is critically important because the role requires delivering and manipulating the data that the rest of the team works with. This may include feature engineering, preprocessing, and other tasks that will be key to a successful project, like preliminary simple model building to get a sense of which machine learning approach is best or which features to use or not use. And it is important to note that while most people who can perform these tasks tend to come from the biostatistical disciplines or hold clinical informatics degrees, there are large swaths of data science tasks that don't require advanced-degree, research-oriented skills, and many who operate in this role, even at the highest level, may have unrelated degrees. There's a huge amount of impact that you can have by leveraging skills that are better built in industry settings as well. >> By contrast, the machine learning engineer is typically an expert in computer science who would ideally team up with the data scientists in co-developing the model, but would focus more on the machine learning techniques needed to obtain high-performing models. They may also play a leading role in writing more formal code for final software deployment and the entire workflow pipeline. In other words, building out formal production-ready models and setting up the tools to integrate them with the rest of the clinical enterprise.
And often the machine learning engineer is knowledgeable in more advanced machine learning techniques, especially deep learning, computer vision, and natural language processing, and would take the features and model performance analytics from the data scientists and then apply them to advanced machine learning approaches. In one type of setup, a data scientist could provide preliminary or baseline simple models and data analytics results, and a machine learning engineer could then use these as a starting point to explore advanced machine learning approaches, which could also include developing or improving existing model architectures and approaches and implementing libraries in code. Most people expect the machine learning engineer role to be held by someone with a master's degree in computer science or a very similar discipline. However, as we alluded to earlier, there's certainly an overlap between people who identify as data scientists and people who identify as machine learning engineers or scientists, and we've primarily highlighted the distinct skill sets a multidisciplinary team will need that are most often associated with each of these fields. Beyond those two roles, consider what happens when your machine learning system's results look great after you report them on one data set. You might see high fives going around. But as we have discussed, at this point all you can safely conclude is that it worked on that specific data set. There are still critical questions that need to be tackled before setting a project loose on an unsuspecting population, and biostatistics and epidemiology skill sets are best suited to take these responsibilities on. >> For example, is your pilot trial designed with adequate statistical power? Will the model work when you're running it prospectively in your healthcare system? What about a different healthcare system? What are the baseline epidemiologic considerations of the population that you will deploy to? Does it change over time?
How will you decide to launch or pull the plug? These are just some of the important considerations that have to be baked in. So you need some very specific skills to deal with those questions: statistical skills. And the main thing to remember is that we won't necessarily have all the answers to these questions, but that's the point. If we want to make serious decisions where we don't have perfect facts, biostatistical skills will help us draw conclusions safely beyond the data analyzed and back them up with either trial design or statistical analysis. >> In a lot of circumstances, the statistical skills, especially as they relate to model evaluation and metrics, sound a lot like our discussion of data scientists evaluating pilot model performance. After all, there's a lot of statistical knowledge needed to evaluate models and review metrics, and in many non-clinical scenarios it may still be fine to rely on the model-based metrics. But the important differentiation here, particularly in an applied healthcare or clinical setting, is that statisticians have experience in clinical trial and statistical study design rather than programming. That's a critical skill set for clinical applications: evaluating, powering, and designing clinical trials, setting up evaluations for population thresholds, and so on. >> Another skill set involves integration and deployment in a healthcare environment. It's incredibly common for teams to build models that work, only to hit a long delay in integration. It's important, especially for models that are geared towards clinical deployment, to engage healthcare IT professionals very early on in the model development process. As with the biostatistical skill set, healthcare IT experience matters here, including knowledge about the details of when and where certain data become available.
It is also very important to know whether the mechanics of data availability and access are compatible with the model being constructed or used, and to understand the important interactions within the existing healthcare ecosystem. Okay, so now that you have an idea of some key roles and skills for machine learning teams, let's talk, last but not least, about the domain expert. The domain expert is the centerpiece of any machine learning and healthcare team. This is a role that can be occupied by more than one individual, and increasingly the domain expert also has additional experience in statistics, data science, or, more recently, machine learning engineering. >> This role essentially must be filled by an individual, or even several individuals, who can provide context and guide the development of the overall application. They can help decide on metrics and make key development decisions, including where to set a threshold, which populations and data should be included, and how deployment might take shape. Preferably, the domain expert has extensive on-the-ground experience in the area or application that aligns with the model or use case. As a result, the specific background and skill set of the domain expert are heavily dependent on the use case, and there are many examples of domain experts to consider depending on the type of model and application.
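To make the point about choosing a threshold a little more concrete, here is a minimal sketch of how a domain expert's input might feed into that decision. It is only an illustration under stated assumptions, not a method described in this lecture: the function names (`sensitivity_specificity`, `pick_threshold`), the idea of a clinician-supplied minimum sensitivity, and the toy labels and scores are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdResult:
    threshold: float
    sensitivity: float
    specificity: float


def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are flagged positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def pick_threshold(y_true, scores, min_sensitivity):
    """Pick the highest threshold whose sensitivity meets the required floor.

    Sensitivity can only stay flat or rise as the threshold drops, so the
    first qualifying threshold (scanning high to low) also has the best
    specificity among all thresholds that satisfy the floor.
    Returns None if no candidate threshold meets the floor.
    """
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(y_true, scores, t)
        if sens >= min_sensitivity:
            return ThresholdResult(t, sens, spec)
    return None


# Toy data, invented for illustration only (1 = pneumonia, 0 = no pneumonia).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

# Hypothetical clinical requirements supplied by the domain expert.
result = pick_threshold(y_true, scores, min_sensitivity=0.75)
strict = pick_threshold(y_true, scores, min_sensitivity=1.0)
print(result)
print(strict)
```

The design choice here is that the sensitivity floor is an input rather than something the code optimizes on its own. How many missed cases are tolerable, and how many false alerts a clinical workflow can absorb, are exactly the kind of judgment the domain expert brings, and tightening the floor in this toy example visibly costs specificity.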