This is an engaging and informative book about the huge amount of data available online and what it tells us about society. I read it alongside Dataclysm and found Everybody Lies to be by far the better of the two, presenting a wealth of information in a cohesive fashion and making fewer unfounded assumptions. The author was a data scientist at Google, and draws in large part on the searches people make on the site, along with information from sites including Facebook and Pornhub.
There’s a lot of interesting stuff in the data, from the rate of racist searches in the Rust Belt predicting the rise of Donald Trump, to common body anxieties and whether they actually matter to the opposite sex, to an estimate of how many men are gay and whether that varies by geography (it appears not), to rates of self-induced abortions. This is a great book to read if you love unusual factoids, whether on sexual proclivities or how sports fans are made.
The author also writes in a compelling way about the uses of Big Data itself, and while he waxes evangelical about it (evidently preferring to spend all his time immersed in statistically significant data, he finds novels and biographies too “small and unrepresentative” and therefore uninteresting), there are certainly a lot of possibilities there. In health, for instance, combining early searches about symptoms with later searches for how to handle a diagnosis can help doctors detect pancreatic cancer at an earlier stage, while epidemics can be tracked through symptom searches. The author is also interested in how applying data can revolutionize a field, discussing at length the data that predicted the success of the racehorse American Pharoah. (By “at length” I mean 9 pages; this is a book that moves through a broad range of topics quickly.)
Overall, the writing is engaging and the book hangs together well, being informative while mostly resisting the urge to speculate. But the author does make a couple of assumptions worth pointing out. One is that people’s Google searches are made in earnest and for personal reasons. Certainly, you might search for “depression symptoms” out of concern that you or someone you know is depressed. But you also might want to be prepared in advance to identify warning signs, or might have encountered something in the media that sparked your interest, or you might be a student writing a paper on the topic. On the other hand, if you’re intimately familiar with depression already, you’re unlikely to google the symptoms. None of this means the author’s finding of a 40% difference in rates of depression symptom searches between Chicago and Hawaii isn’t relevant, but data that’s both over- and under-inclusive serves better as a starting point for research than as a definitive conclusion. It’s certainly not proof that better geography is twice as effective as antidepressants, as the author suggests.
The other assumption is that everybody lies: the book insists on it, based largely on the fact that typically rosy social media posts fail to reflect all those unhappy or hateful searches. Selectively sharing information doesn’t necessarily seem to me to be lying, but the author appears invested in proving the book’s title. For instance, he discusses a particular type of tax fraud: in areas where few tax professionals or people eligible for the scheme live, only 2% of the people who could benefit from the lie actually tell it, while in areas with high concentrations of both, the rate of cheating is around 30%. The author concludes that “the key isn’t determining who is honest and who is dishonest. It is determining who knows how to cheat and who doesn’t.” This bleak view of the world fails to account for the 70% who don’t cheat even in areas with high levels of knowledge; finding that significant numbers of people cheat if they know how is a far cry from finding that everyone does.
So, like the author of Dataclysm, Stephens-Davidowitz is probably a better statistician than sociologist. But if you’re interested in Big Data, or in getting a peek at the thoughts and anxieties people ask Google about because they’re not comfortable sharing with others, this is the book I recommend. You’ll certainly get a lot of interesting tidbits from it, along with perhaps new inhibitions about typing things into Google!