You bought a watch that tells the correct time 90% of the time you look at it. The other 10% of the time, it might show you an obviously incorrect time (say, 11PM during the day) or be just slightly off, enough to make you miss that important meeting.
After using this watch for a while, you notice that its correctness depends on several factors: the time of day, your longitude (but not latitude), the angle at which you view the watch face, what you had for breakfast the previous day, and so on. When you need to find out whether you can catch the next bus, for example, you learn to first move to a spot with the optimal ambient lighting, tilt your head just so, and raise your wrist at just the right speed to ensure a correct reading.
You go back to the store to exchange the watch for a better one that is correct 95% of the time. However, it comes with a different set of quirks for getting the correct time: you need to remember to rub your right cheek before reading it; otherwise it could be anywhere from 5 minutes to 2 hours off.
For this unreliable watch, would you pay 90% of the full price of a watch that is always correct? The story is a parody, but the unreliability, and the inability to manage that unreliability (to control when and how much error occurs), are very real when we deal with machine learning models trained through statistical methods.
Because telling time is a problem with better solutions, you wouldn't rely on a watch like the one in the story. But there are problems that do not seem to have better solutions (such as predicting the weather), and we are applying statistical models to more and more of them.
Even for these problems, the important thing is to recognize that statistical methods alone (and contemporary machine learning in particular) may or may not be the right approach. There are problems where a "good enough" solution is indeed good enough. If I am building a machine learning model to predict the number of fallen leaves in my yard, it is probably fine if the result is only somewhat close to the correct number, even with a wide margin of error. But if I am building a machine learning model to help doctors diagnose medical conditions, I might not be willing to accept "good enough" as the quality bar.