Essays on Technology and Culture

Big Data, Little Context

I’m still new to the whole Quantified Self thing. The only wearable I have is a FitBit One, and I track all my food and water intake manually (when I remember). I use RescueTime to track what I do on my Mac. [1] I have Moves running to see where I went each day, and how I got there. I use Datalove to track how many words I write each day, and the numbers aren’t great. GoodReads tracks my books—and not well. That’s about it. I still end up collecting a lot of data about myself and my activities, but why? Data alone is useless. If RescueTime says I was 41% productive last week, but 24% productive this week, what does it mean? [2]

Data without context is meaningless. One of the reasons why personal fitness and Quantified Self applications go so well together is that if you’re trying to get healthier, knowing how much you move around during the day helps. If you get home, plop on the couch, and see you’ve only taken 3238 steps during the day, it might motivate you to try and move around more. When I step onto the treadmill after work, or even just go for my post-lunch walk, I know it’s doing me more good than just returning to my seat and decomposing. I have a goal, and the data lets me know if I’m getting there or not. There’s no better measurement than how I feel (and writing this after a trip to the gym, I don’t feel great), but data helps back things up.

But correlating “steps taken” and “calories consumed” with general health is far simpler and easier to understand than most other massive data-focused endeavors. So much of the talk around “big data” reminds me of Max Cohen’s assumptions in the movie π. The idea rings true: if we have enough data, and set enough sets of eyes on it, or enough algorithms to parse it, we can discover patterns and gain insights into the future of whatever the data is about. It plays to the innate human predilection for pattern recognition. We’re good at it, and by extension computers are good at it.

There are just two problems. One: we often read patterns where no real patterns exist, as do our computer programs. Two: this can lead us down the wrong rabbit hole, as we overgeneralize the pattern we discover without noticing when it changes. By way of example, look at Google Flu Trends, and how it’s become increasingly out of whack with reality. The “big data” hypothesis, much like the “quantified self” hypothesis, is that the more information we have about something, the more insight we get into it. The problems above prove that this isn’t the case. Data alone does not lead to understanding. As a wise man once said, “You can use facts to prove anything that’s even remotely true.” Algorithms are just as subject to biases and ignorance as the people who make them. As long as that’s the case (and it will always be the case), we’ll have to do a lot more interpreting to find the answers, if they exist.

  1. What I wouldn’t do for RescueTime-esque functionality on my iPhone and iPad. Except jailbreak, I suppose. Maybe iOS 8 will support it…
  2. It means I bought the SimCity 4 re-release from the App Store is what it means.