It seemed like such a good idea at the time.
People with the flu (the influenza virus, that is) will probably go online to find out how to treat it, or to search for other information about the flu. So Google decided to track such behavior, hoping it might be able to predict flu outbreaks even faster than traditional health authorities such as the Centers for Disease Control and Prevention (CDC).
Instead, as the authors of a new article in Science explain, we got “big data hubris.” David Lazer and colleagues explain that:
“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.
The folks at Google figured that, with all their massive data, they could outsmart anyone.
The problem is that most people don’t know what “the flu” is, and relying on Google searches by people who may be utterly ignorant about the flu does not produce useful information. Or to put it another way, a huge collection of misinformation cannot produce a small gem of true information. Like it or not, a big pile of dreck can only produce more dreck. Garbage in, garbage out (GIGO).
Google’s scientists first announced Google Flu in a Nature article in 2009. With what now seems to be a textbook definition of hubris, they wrote:
“…we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.”
They obtained this remarkable accuracy entirely from analyzing Google searches. Impressive, if true.
Ironically, just a few months after Google Flu was announced, the world was hit with the 2009 swine flu pandemic, caused by a novel strain of H1N1 influenza. Google Flu missed it.
The failures have continued. As Lazer et al. show in their Science study, Google Flu was wrong for 100 out of 108 weeks since August 2011.
One problem is that Google’s scientists have never revealed what search terms they actually use to track the flu. A paper they published in 2011 declares that Google Flu does a great job. The official Google blog last October makes it appear that they do a nearly perfect job predicting the flu for previous years.
Haven’t these guys been paying attention? It’s easy to predict the past. Does anyone remember the University of Colorado professors who had a model that correctly predicted every election since 1980? In August 2012, they confidently announced that their model showed Mitt Romney winning in a landslide. Hmm.
A bigger problem with Google Flu, though, is that most people who think they have “the flu” do not. The vast majority of doctors’ office visits for flu-like symptoms turn out to be caused by other viruses. The CDC tracks these visits under “influenza-like illness” because so many turn out to be something else. For example, the CDC reports that in the most recent week for which data is available, only 8.8% of specimens tested positive for influenza.
When 80-90% of people visiting the doctor for “flu” don’t really have it, you can hardly expect their internet searches to be a reliable source of information.
Google Flu is still there, and you can still look at its predictions, even though we know they are wrong. I recommend the CDC website instead, which is based on actual data about the influenza virus collected from real patients. Big data can be great, but not when it’s bad data.