The Machine Learning Steward, a Role for the Future

The wave of companies chasing digital transformation shows no sign of ending, and as they pursue it, their organizations shift and evolve to meet new needs. Some roles disappear, others are heavily augmented, and some brand new ones start to emerge.

Data stewards are a common and valuable role in organizations, tasked with being the data governance arm of a group. They ensure data sets and metadata remain in compliance with standards. This is different from putting statements of direction on a file share or someone’s laptop; they organize and enforce standards to the benefit of the organization as a whole. After all, accurate metadata can save a data scientist hours, days, or even weeks in the data discovery and exploration process.

The role of a machine learning steward builds on the success of a data steward. While models are increasingly democratized and made generally available by forward-thinking companies like Microsoft (see their Cognitive Services), companies are also creating an abundance of their own algorithms. As these are shared across teams, the organization becomes, in time, a data-driven one. A machine learning steward is tasked with ensuring company policies and standards are maintained in all models.

Machine learning stewards should maintain relevant information about a model, such as:

  • Size and source of the training dataset, if applicable,
  • Pipelines used for cleaning or preparing data,
  • Creation date, last re-train date, and other time particulars,
  • Measures of performance, including accuracy, precision, and recall,
  • And most important of all, a history of reviews and discussion on the models, from a diverse set of data scientists around the company.
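
As a rough illustration of what such a record might look like, here is a minimal Python sketch. Every name in it (ModelRecord, Review, classification_metrics, and all field names) is hypothetical and chosen only for this example; it is not tied to any particular catalog tool, and a real steward would likely adapt the fields to company policy.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Review:
    """One entry in a model's review history (hypothetical structure)."""
    reviewer: str
    reviewed_on: date
    notes: str


@dataclass
class ModelRecord:
    """Metadata a steward might keep for one model (hypothetical structure)."""
    name: str
    created_on: date
    # Training data details are optional, since they do not apply to every model.
    training_source: Optional[str] = None
    training_rows: Optional[int] = None
    # Names of the pipelines used to clean or prepare the data.
    pipelines: List[str] = field(default_factory=list)
    last_retrained_on: Optional[date] = None
    # Performance measures such as accuracy, precision, and recall.
    metrics: Dict[str, float] = field(default_factory=dict)
    # History of reviews and discussion.
    reviews: List[Review] = field(default_factory=list)

    def distinct_reviewers(self) -> Set[str]:
        """Return the set of people who have reviewed this model."""
        return {review.reviewer for review in self.reviews}


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Example usage with made-up values.
record = ModelRecord(
    name="churn-classifier",
    created_on=date(2019, 1, 15),
    training_source="crm_export",
    training_rows=100_000,
    pipelines=["dedupe", "normalize"],
    metrics=classification_metrics(tp=8, fp=2, tn=85, fn=5),
)
record.reviews.append(Review("reviewer_a", date(2019, 2, 1), "Checked feature leakage."))
record.reviews.append(Review("reviewer_b", date(2019, 2, 8), "Questioned class balance."))
```

Keeping the reviews alongside the metrics in one record means the discussion history travels with the model, and a helper like distinct_reviewers gives a quick, if crude, signal of how many different people have looked at it.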

The last point especially warrants emphasis. Algorithms inherit the biases of their authors, just as all code is shaped by the experiences and worldviews of those who build it. There is also a growing concern that the excessive hype around AI and machine learning is leading to a credibility crisis. Together, these concerns underline the impact that respectful, thorough reviews can have in establishing a credible, high-performing data science arsenal.

If the data science advocate is the cheerleader and promoter of data science in a company, promoting an Innersource culture, then the machine learning steward is the librarian.

Organizations that have robust deployments of both Azure Notebooks and GitHub are poised for success in this area, because they can use built-in features meant for collaboration, tagging, and management of solutions in an enterprise setting. The barrier to entry for these tools is extremely low, so if your organization recognizes the benefits of this proposed role but doesn’t have the tools yet, you can easily start by adopting them as a catalyst for the cultural change.

The value this role can deliver to mathematicians, data science generalists, developers, and domain experts is substantial.