The importance of statisticians in SaaS

If you're going to explore data science strategies for your SaaS business, you'd be well-served to learn about "ROC curves".

Why?

Because ROC curves assess the quality of data science output. Think of them as a report card: they help you visualize how good the data science deliverable on your desk actually is.

For example, let's say your data science team (or consultants) builds a model to help your sales team identify which prospects are most likely to buy. We'll call it a "Propensity To Buy" score. And since businesses love lingo, we'll call it a "PTB" score. Acronyms, FTW.


Two models walk into a startup

To take a quick step back: data science models typically fall into two camps: 1) regression, which tries to predict a continuous outcome or variable, or 2) classification, which tries to predict a binary outcome. Our fictitious PTB score is therefore . . . you guessed it, a "classification" model. Nicely done. Now we're getting somewhere.
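
To make that concrete, here's a minimal sketch of the two camps using scikit-learn and a completely made-up dataset of prospects. The features, weights, and thresholds below are hypothetical, not from any real PTB model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical prospect features: [emails_opened, demo_requests, tenure]
X = rng.random((200, 3))

# Camp 1, regression: predict a continuous outcome (say, deal size in dollars).
deal_size = 1000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 50, size=200)
reg = LinearRegression().fit(X, deal_size)
print(reg.predict(X[:1]))  # a dollar amount, not a yes/no

# Camp 2, classification: predict a binary outcome (bought vs. didn't buy).
bought = (X[:, 0] + X[:, 1] > 1.0).astype(int)
clf = LogisticRegression().fit(X, bought)

# A "PTB" score is just the classifier's predicted probability of "buy".
ptb_scores = clf.predict_proba(X)[:, 1]
print(ptb_scores[:3])  # one score per prospect, between 0 and 1
```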

But how do you objectively assess the quality of something very smart people produced by ingesting dozens if not hundreds of variables and training sets? The ROC curve. Boom.

We can thank WWII radar engineers for the lengthy name: Receiver Operating Characteristic. But their intent was much simpler: they needed a way to know how much of the good stuff their receiver captured (true positive rate/TPR) vs. how much of the bad stuff it also captured (false positive rate/FPR).

For example:

  • TPR: Radar imaging model captures a Nazi battalion of Panzer IV tanks = nice work

  • FPR: Radar imaging model captures a herd of very large French cows = needs work
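
Both rates boil down to four counts. Here's a minimal sketch of the arithmetic, using made-up labels and predictions (1 = tank, or buyer; 0 = cow, or non-buyer):

```python
import numpy as np

actual    = np.array([1, 1, 1, 0, 0, 0, 0, 1])  # what was really out there
predicted = np.array([1, 1, 0, 1, 0, 0, 0, 1])  # what the model called

tp = np.sum((predicted == 1) & (actual == 1))  # tanks flagged as tanks
fn = np.sum((predicted == 0) & (actual == 1))  # tanks we missed
fp = np.sum((predicted == 1) & (actual == 0))  # cows flagged as tanks
tn = np.sum((predicted == 0) & (actual == 0))  # cows correctly ignored

tpr = tp / (tp + fn)  # share of real positives the model caught
fpr = fp / (fp + tn)  # share of negatives the model miscalled

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # here: TPR = 0.75, FPR = 0.25
```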

Same goes for business: how many of your prospects are being correctly classified (TPR) vs. incorrectly classified (FPR)? If you want to see what ROC curves look like in the wild, the sketch below will draw one for you:
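
This is only a sketch, assuming scikit-learn and matplotlib are installed, and it uses a synthetic dataset rather than real prospect data:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # our stand-in "PTB" scores

# Sweep the decision threshold and record (FPR, TPR) at each step.
fpr, tpr, _ = roc_curve(y_test, scores)

plt.plot(fpr, tpr, label=f"model (AUC = {roc_auc_score(y_test, scores):.2f})")
plt.plot([0, 1], [0, 1], "--", label="coin flip")  # the diagonal baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

The dashed diagonal is a model no better than a coin flip; the closer your curve hugs the top-left corner, the better the model.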

We'll dig into this topic much more deeply in future posts, but for now we just wanted to make sure the DBT readership is aware of this crucial tool for assessing data science output.
