State of Machine Learning 2023
Everyone says data is eating the world: every team within every organization is trying to harness the power of data to drive new revenue opportunities, insights, and operational improvements. Certainly within the cybersecurity industry, we’ve seen a wave of startups selling products to help security teams get more value and insights out of their existing security data.
But as someone building cloud security software, I am a bit skeptical of the way our industry builds data-driven software. It is plagued with vendors selling AI snake oil that generates more alert fatigue than real insight or value.
So I decided to speak with data science and machine learning professionals across a range of industries and company sizes to learn more about how they are building stochastic software and tracking the success of their ML programs. Here’s what I learned:
- The maturity of an organization’s ML operations is tightly linked to the availability of ground truth datasets. It’s not necessarily linked to the size of their ML team.
- More mature ML teams are performing rigorous testing and monitoring of models, whereas less mature ML teams are more likely to be running blind, doing very little testing or monitoring of models.
- Adoption of ML-specific observability tools is low across the entire industry. ML teams are more likely to have built their own solutions for testing and monitoring.
What do machine learning engineers do?
The roles and responsibilities of machine learning engineers and data scientists vary from organization to organization. But I found that across all organizations, to be successful in their roles, these individuals need to be capable of data engineering, software development, and machine learning, and also need some domain expertise in their field of work.
- ML engineers, even in organizations with robust MLOps teams and platforms, are still likely to be involved in the entire ML lifecycle, from data engineering to hardware performance to monitoring in production.
- ML engineers typically spend a lot of time on data engineering work as well. Even in organizations with robust data engineering teams, the experimental nature of feature engineering means ML engineers prefer to stay involved in the data engineering work themselves.
- ML engineers also spend a decent amount of time dealing with hardware performance headaches. Machine learning frameworks, in theory, should abstract the complexity of the underlying hardware, but in reality, ML engineers continue to deal with a lot of issues related to hardware performance and kernel version compatibility.
- ML engineers generally enjoy being involved in the full lifecycle. ML engineers tend to be curious by nature and enjoy rolling up their sleeves to solve whatever problems come their way.
- ML engineers do need to understand the behavior of the underlying datasets they are trying to model in order to be successful. For that reason, it is particularly difficult to find good ML talent in industries such as security or healthcare, where the data scientists need to also have some foundational domain knowledge.
What is the technology stack of ML engineers today?
ML teams typically use a major public cloud provider’s data science platform or Databricks, with some adoption of open source tools. Gaps tend to be filled with homebrew solutions rather than additional paid ML tools, and there is a general trend towards tool consolidation within organizations.
- ML application code is typically deployed in containers, usually orchestrated by Kubernetes. ML teams try to follow SDLC best practices where possible (a sketch of what such a containerized service might look like follows this list).
- Organizations will typically use the ML platform of whichever cloud they already use for compute and storage (e.g. AWS SageMaker, Google Vertex AI, or Azure AutoML). There is some adoption of third-party ML platforms, such as Databricks.
- Gaps in MLOps tooling tend to be filled with homebrew solutions or open source offerings rather than additional third-party ML tools. Some organizations that had adopted other popular ML tools, such as Weights & Biases, have recently moved away from them due to IT consolidation.
- More mature ML teams tend to have more sophisticated homebrew platforms and higher adoption of paid ML tools. Less advanced ML teams are likely heavily reliant on public cloud infrastructure and open source tooling.
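To make the container-and-Kubernetes pattern from the first bullet concrete, here is a minimal sketch of the kind of inference service a team might package into a container image and run as a Kubernetes Deployment. It assumes a scikit-learn binary classifier serialized with joblib and served with FastAPI; the model path, endpoint name, and feature schema are illustrative choices, not specifics from any team I spoke with.

```python
# Minimal sketch of an ML inference service of the kind teams containerize and
# run on Kubernetes. Assumes a scikit-learn binary classifier serialized with
# joblib; the artifact path and request schema are hypothetical.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical artifact baked into the image


class PredictRequest(BaseModel):
    features: List[float]


class PredictResponse(BaseModel):
    score: float


@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # Take the probability of the positive class from the classifier.
    score = float(model.predict_proba([req.features])[0][1])
    return PredictResponse(score=score)
```

From there, the usual SDLC practices the teams described would apply: the image gets built in CI, versioned, and rolled out through the team’s standard Kubernetes deployment pipeline.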
What factors influence the maturity of an organization’s ML operations?
- The maturity of an organization’s ML operations is tightly linked to the availability of ground truth datasets. In other words, organizations that can measure “success” in every prediction have the most advanced ML operations, and the less feasible it is for an organization to quantitatively measure success in every prediction, the less mature its ML operations tend to be.
- Examples of organizations with the most sophisticated ML operations are ecommerce companies, which can measure gross merchandise sold, or major streaming platforms, which can measure content viewed.
- Examples of mid-level sophistication are healthcare companies, which might have sufficient but inconsistently labelled data on patient outcomes.
- Security companies tend to be the least advanced, with very limited ground truth on adversarial behavior. Email security tends to be the exception, as end users will typically flag spam emails or emails accidentally labelled as spam, resulting in sufficient ground truth.
- ML engineers are often not able to move the needle on the availability of ground truth for a range of reasons. They may not control the data collection process, or it may not be humanly feasible to label a large enough portion of the data.
- In organizations that don’t have sufficient ground truth, little attention is currently paid to improving or addressing these limitations in their machine learning efforts.
- No organization I spoke with was using services like Scale AI to support their data labelling efforts.
- The maturity of an organization’s ML operations is not necessarily linked to the size of their ML teams. There are many organizations with large teams of ML engineers but relatively immature MLOps processes.
- This appears to be particularly true of cybersecurity, where there may be sizeable ML teams within an organization, but these teams do very little testing or monitoring of the models they are building.
How do ML engineers test and implement model enhancements?
Testing practices across machine learning teams seem to vary greatly from industry to industry, and even organization to organization.
- More sophisticated ML organizations have very strict processes around how and when model improvements can be released into production, and any model updates will be rolled out incrementally through rigorous testing.
- Organizations at the more sophisticated end have built their own tools to run A/B tests when rolling out model improvements or enhancements (a sketch of this kind of traffic splitting follows this list).
- Less sophisticated teams often have inter-related data pipelines across their models, where a change in one model might have downstream consequences for other models, and the impact might only be detected after a change is made.
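To illustrate what that homebrew A/B tooling often boils down to, here is a minimal sketch of hash-based traffic splitting for an incremental rollout: entities are deterministically bucketed, a small fraction of traffic is routed to the challenger model, and the variant is logged with each prediction so per-variant metrics can be compared before widening the rollout. The rollout fraction, model interface, and logging are illustrative assumptions, not any particular team’s implementation.

```python
# Minimal sketch of hash-based traffic splitting for an incremental model
# rollout. The challenger fraction and the predict() interface are assumptions.
import hashlib
import logging

logger = logging.getLogger("model_rollout")

CHALLENGER_FRACTION = 0.05  # start by routing 5% of traffic to the new model


def assign_variant(entity_id: str) -> str:
    """Deterministically bucket an entity into the control or challenger variant."""
    # Hashing the id keeps assignments stable, so the same entity always hits
    # the same model variant for the duration of the test.
    bucket = int(hashlib.sha256(entity_id.encode()).hexdigest(), 16) % 10_000
    return "challenger" if bucket < CHALLENGER_FRACTION * 10_000 else "control"


def score(entity_id: str, features, control_model, challenger_model) -> float:
    variant = assign_variant(entity_id)
    model = challenger_model if variant == "challenger" else control_model
    prediction = float(model.predict([features])[0])
    # Log the variant alongside the prediction so per-variant metrics can be
    # compared offline before widening the rollout.
    logger.info("entity=%s variant=%s prediction=%s", entity_id, variant, prediction)
    return prediction
```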
How do ML engineers monitor models running in production?
- Adoption of ML-specific observability tools is low. No ML engineer I spoke with was on a team already using an ML-specific observability tool, but one organization was in the process of implementing Arize.
- ML engineers had largely built their own instrumentation to monitor model performance and alert when undesired behavior was detected, and they leveraged some of the testing and monitoring capabilities of the ML platforms already in use within their organizations (a sketch of this kind of homebrew check follows below).
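As an example of what that homebrew instrumentation often looks like, here is a minimal sketch that compares the distribution of live prediction scores against a reference window using the population stability index (PSI) and alerts past a threshold. The bucketing scheme and the 0.2 threshold are common heuristics, not values taken from anyone’s production setup.

```python
# Minimal sketch of a homebrew drift monitor: compare live prediction scores
# against a reference window using the population stability index (PSI).
import numpy as np


def psi(reference: np.ndarray, live: np.ndarray, buckets: int = 10) -> float:
    """Population stability index between reference and live score distributions."""
    # Bucket edges come from the reference window's quantiles; assumes roughly
    # continuous scores so the edges are distinct.
    edges = np.quantile(reference, np.linspace(0, 1, buckets + 1))
    # Clip both windows into the reference range so outliers land in the edge buckets.
    reference = np.clip(reference, edges[0], edges[-1])
    live = np.clip(live, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    live_frac = np.histogram(live, bins=edges)[0] / len(live)
    # Floor the fractions to avoid division by zero and log(0) on empty buckets.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))


def check_prediction_drift(reference_scores, live_scores, threshold: float = 0.2) -> None:
    """Alert when the live score distribution has drifted from the reference window."""
    value = psi(np.asarray(reference_scores, dtype=float), np.asarray(live_scores, dtype=float))
    if value > threshold:
        # In a real pipeline this would page someone or post to a chat channel.
        print(f"ALERT: prediction score drift detected (PSI={value:.3f})")
```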