
Tagging Bullsh*t

A reader’s reflection on “Calling Bullsh*t, The art of scepticism in a data-driven world” by Carl T. Bergstrom and Jevin D. West, Allen Lane, 2020.

Dear Sirs,

Ever since mankind misled itself into believing it could use linguistic tools effectively, there has been a lot of bullshit around, I guess. But, surely, it didn’t help that in this Age of Information ever more people found themselves in a position in which they could record, duplicate, retouch and spread whatever bullshit they couldn’t help excreting, in whatever medium available, into whatever sewer still indiscriminately accepted serious analysis, frivolous fiction, ads, spam and verbal vomit.

In fact, it took me more time to read your book than I wished for after reading its title on Amazon, not due to late delivery (this took one day and seven hours after ordering it), but because it was delivered at a time in which millions of senders were spreading facts, opinions, prognostications, speculations, revelations, and as side effects deep insights into the convolutions of their own minds, with COVID-19 and Trump versus Biden as seemingly venerable triggers for spreading words and figures the world could have done without. It turned out that, being very weak of character, I exposed myself to nonsense about pandemics and vaccines, the Irish Channel, and the trivial spasms of the United States as a democratic World power rather than reading a book that might have offered me a vaccine against self-imposed debilitation.

But now – it’s X-mas – the people of the world rejoice in the victory of Biden, the delivery of 60 billion USD worth of COVID-19 vaccines, and – just in time – a 2000-page agreement between the European Union and the United Kingdom about a disagreement that the newspapers invariably believed to have summarized successfully in one column, and the 248 channels didn’t offer any X-mas movie I hadn’t seen a couple of decades ago, so now, finally, I had the time to read your book.[1]

Well, your book is entertaining enough to remain readable until near the end. Moreover, it contains illustrative examples, some of which may be new to some readers. And, to avoid any misunderstanding, your aim to fight bullshit is praiseworthy. But as an analysis of bullshit, or nonsense, or baloney, it has shortcomings that, all things considered, render its publication superfluous.

One shortcoming is that the excursions whose apparent purpose is to give the reader deeper insight mainly serve to distract the reader. For example, I was intrigued by your story about mantis shrimps and Corvidae (page 4 and onwards). Examples taken from biology have an intrinsic quality to stupefy readers, especially those readers who have convinced themselves of either the truth of evolution theory or the Truth of biblical creation. But the example of devious deceiving ravens leads the reader astray in an analysis of bullshitting. Whatever bullshitting is, it cannot be identified with devious deception. To deceive people into believing that all their problems were caused by the Jews – as the Nazis did – or by Muslims – as contemporary populists try to do – is not a case of bullshitting. Although bullshit may be instrumental in being deceptive, it is neither a necessary nor a sufficient condition for deceitfulness. Given this, it makes little sense to throw shrimps and ravens into the bullshit-equation. Moreover, the example is counterproductive since the deceitfulness of ravens is allegedly a feature that has been evolutionarily beneficial to the species. The example could have been used in an analysis that sets out to defend bullshit or reinstate it as a venerable feature of mankind. But as far as I have understood your purpose in writing this book, this is not what you have been aiming at.

Elsewhere you discuss the graph that allegedly shows that being a musician in an ‘old’ genre is safer than entertaining people with a ‘new’ genre (page 126 and onwards). It is fair to suspect that anyone who is confronted with this thesis will immediately retort that musicians in relatively new genres haven’t had the time yet to become old, if only because this explanation was suggested in the article surrounding the graph. Instead of leaving it at that, you introduce a fictitious study about the lives of a rare species of chameleons in Madagascar to produce an explanation of fictitious data that is identical to the explanation anyone would give of the non-fictitious data.
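The objection that a young genre has not yet had time to produce old dead musicians can be made concrete with a short simulation. This is only a sketch under invented assumptions: the debut age, the set of lifespans, the founding years of the two genres and the year of observation are all hypothetical and are not taken from the book or from the graph it discusses. Both genres are given exactly the same lifespans, so any difference in the observed averages comes from counting only those who have already died.

```python
# Illustrative sketch only: every number below is invented, not taken from the book.
DEBUT_AGE = 20                  # assumed age at which every musician starts out
LIFESPANS = range(25, 91, 5)    # assumed ages at death, identical for every genre


def mean_observed_age_at_death(genre_start, observed_in=2015):
    """Mean age at death among musicians who have already died by `observed_in`."""
    ages = []
    for debut_year in range(genre_start, observed_in + 1):
        birth_year = debut_year - DEBUT_AGE
        for lifespan in LIFESPANS:
            # A death only enters the statistics if it happened before we looked.
            if birth_year + lifespan <= observed_in:
                ages.append(lifespan)
    return sum(ages) / len(ages)


true_mean = sum(LIFESPANS) / len(LIFESPANS)
old_genre = mean_observed_age_at_death(genre_start=1900)
new_genre = mean_observed_age_at_death(genre_start=1985)

print(f"true mean age at death:    {true_mean:.1f}")
print(f"observed in the old genre: {old_genre:.1f}")
print(f"observed in the new genre: {new_genre:.1f}")
```

Under these assumptions the young genre shows a markedly lower average age at death than the old one, although nobody in it lives a day shorter: its long-lived members are simply still alive when the count is made.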

Another shortcoming of your book is that the examples given of bullshit cannot be covered by a definition. Indeed, the train of examples does not lead to a somewhat coherent perspective. Here is a list of the examples of bullshit you discuss in the first three chapters.

  • Deceiving ravens.
  • A study published in The Lancet in 1998 in which it is claimed that it is possible that a specific form of autism is related to a specific vaccine.
  • A story on Twitter about an eight-year-old girl from Sandy Hook Elementary School who was killed in the Boston Marathon Bombing of 2013.
  • Endlessly repeating real or phony news that was introduced on the internet, such as the story that Taylor Swift was dating Joseph McCarthy (page 29) or the story that the Israeli Defense Minister will nuke Pakistan (page 30) or Russian propaganda for or against a candidate in 2016 (page 31).
  • Fake reactions to a proposal about net neutrality (page 34).
  • Fake news made to look very real by digital trickery (page 35).
  • Eloquent window dressing of claims that no one understands (page 40).
  • The thesis that cat people earn more than dog people (page 41).
  • A 2016 paper in which it is claimed that machine learning can distinguish criminals from the rest by facial features (page 45).

In later chapters you expand the list of examples in various directions:

  • Confusing correlation and causation (chapter 4).
  • Switching between percentages and percentage points and other elementary mathematics tricks (chapter 5).
  • ‘Mathiness’: concocting formulas that suggest a quantitative relation between factors whereas the relation is – at most – qualitative (chapter 5).
  • Selection bias (chapter 6).
  • Drawing misleading graphs (chapter 7).
  • Overreaching claims about the possibilities of machine learning due to overfitting and other statistical traps, including the claim of Wang and Kosinski in 2017 that neural networks can spot sexual orientation better than humans can (chapter 8).
  • The self-fulfilling bias of machine learning (chapter 8).
  • The failure of Darwinian evolution to account for epigenetic factors (chapter 9).[2]
  • Fraudulent scientists (chapter 9).
  • The prosecutor’s fallacy (chapter 9).
  • P-hacking (chapter 9).
  • Publication bias / the file drawer effect (chapter 9).
  • John Ioannidis’ 2005 article on the file drawer effect (chapter 9).
  • Clickbait science (chapter 9).

In addition, at some point (I haven’t found the page yet) you suggest that the geocentric perspective on the universe can be classified as a case of bullshit (…still looking…).

If bullshit is to be identified on the basis of these examples, you have created for yourself a very tall order for chapters 10 (“Spotting bullshit”) and 11 (“Refuting bullshit”). Spotting bullshit would amount to spotting the truth and nothing but the truth. And in refuting bullshit you would take the burden upon yourself to show to everyone, including fervent believers in some piece of bullshit, that they have not just been wrong but been bullshittingly wrong. It seems to me that you are too knowledgeable to believe that you are able to spot the truth and nothing but the truth. In case you do believe you can succeed where no man succeeded before, I have to be rude on the second account: no one ever succeeded in getting a believer in bullshit to stop believing the bullshit.

However, there is reason to hold that you need not fill in an order that is thus tall. Though you seem to be convinced that your book is about one phenomenon that can be circumscribed somewhat neatly, viz. bullshit, the examples of bullshit point towards various phenomena that are very loosely related at most. For instance, it is undeniable that the internet is a factor in spreading falsities and nonsense. Though I understand that in writing a book on bullshit you felt the urge to get into the nonsense that is spread over the asocial media, I am missing the analytical point of doing so. After all, it is equally undeniable that science has been instrumental in spreading both nonsense and veracities.

Let’s take the example that I have not yet traced back in your text. True, the geocentric perspective on the universe is at odds with current astronomy. But one can acknowledge that the geocentric perspective is not true or – more precisely – is not the best way to account for the empirical data, without holding that those who believed in it were bullshitting. In general, most ideas about reality that many of our contemporaries are inclined to classify as nonsense had a strong grip on the minds of both earlier generations and other contemporaries.

Similarly, though phenomena like p-hacking, publication bias and distorting data in visualization deserve to be exposed and deconstructed, to dub them ‘bullshit’ is not an effective means to that end. Alas, most contemporary scientists have surrendered themselves to the publish or perish dogma. The side effect of this dogma is that the focus of science has shifted from correspondence to the data to approval by peers.[3] Approval by peers, however, has a terribly bad track record as a guide to truth.

You do not seem to have appreciated the abyss of approvability in current science and, possibly as a consequence of this oversight, you are mistaken in believing that fraudulent scientists are selling bullshit. Consider the case of Diederik Stapel.[4] It is apt that the Stapel-affair is mentioned as a case of scientific mishandling of the data. Unfortunately, most commentators have let themselves be misled into believing that Stapel was the culprit. If there is any truth in the idea of ‘objective science’ and the idea that science develops through replication, it wouldn’t have mattered much whether Stapel had invented his data or had used real data. In either case the merits of his analyses – or their lack of merits – would have surfaced in further research by colleagues. However, in this age of ‘science by approval’ hardly any scientist invests time, energy, and funds in replicating endeavors. Such investments make no sense any longer since (i) a scientist cannot get much approval from his peers by reproducing the results of others, and (ii) science has left the areas where data can be somewhat decisive regarding the truth of scientific theory. To put it bluntly, if people who intend to be serious about science are willing to consider social psychology a case of proper science, anything goes. The bulk of contemporary science is bullshit. That most scientists try to behave according to the ‘scientific method’ is inconsequential.

On a more serious analytical level, your perspective on data analysis is incoherent. True, many promises of Big Data, Artificial Intelligence and Machine Learning are somewhat nonsensical. And, indeed, chances are slight that an algorithm that processes garbage will produce analytical jewels. But while rehearsing the ‘garbage in, garbage out’-meme, you fail to appreciate that science proper aims to do precisely what this meme deems impossible: to get fine, analytically pristine results from sloppy, seemingly worthless data.

Though it is tempting – and not unfashionable in popular lectures – to depict the difference between, say, Peter Norvig and Gary Smith as a shadow of the rift between seventeenth-century empiricists and their rationalist counterparts, there is a common denominator in empiricism and rationalism from, say, Descartes and Locke to, say, Quine and Chomsky. This common denominator is that science has to deal with data, with ‘what is given’. Originally, empiricists overrated the purity of data whereas rationalists underrated their richness. Throughout the often exaggerated difference between empiricists and rationalists, the common core was that in dealing with the data a scientist could not excuse himself for producing wrong, incoherent or inconsistent theories by just saying that the data given to him were sloppy, one-sided, ambiguous or biased. The main if not the sole task of a scientist is to get proper results from data, no matter how sloppy they are.

Around 1660 the symbiosis of the rationalistic and empiricist perspectives was formulated in a little essay that was published posthumously in 1677 in the Opera Posthuma of the Dutch sceptic Benedictus de Spinoza. In his Tractatus de intellectus emendatione he put forward the strong – though not yet proven – thesis that, no matter the idea one sets out with and no matter what data one is given, bullshit will defeat itself in the end. If one pursues bullshit indefinitely, it will be exposed as bullshit. This thesis was later taken up by philosophers and scientists as diverse as G.W.F. Hegel, Charles Darwin, Karl Marx, Charles Sanders Peirce, Ludwig Wittgenstein and W.V.O. Quine. The idea underlying this thesis is that truths and falsities are distinguished by a marker that need not be explicit. Whereas current techniques of machine learning and deep learning heavily rely on explicit markers – the tags of the training data – skeptics in the empiricist tradition insist that empirical knowledge cannot depend crucially on explicit markers, because the data presented to the senses do not come with explicit markers. On some level learning is a tag-less process.

If this is so, bullshit will be tag-less, too.

This in itself – the inadequacy of your analysis of bullshit, knowledge and science – does not support the verdict that the publication of your book is superfluous. The main obstacle for books that try to fight bullshit and other mistakes is that they are not consumed by those whose ideas are most in need of being corrected. This very large group does not consist solely of conspiracy theorists, creationists or white supremacists. A couple of months ago I tried to explain to a journalist that the ‘results’ of Wang and Kosinski should not be taken for granted – and that the rumors about the Chinese government using facial recognition software to read the emotions on the faces of certain Islamic minorities in China were urban myths that had been abstracted from companies that tried to sell facial recognition software – but this attempt was to no avail. Similarly, it is of no use to point out to journalists, consultants or renowned experts in privacy laws that the definition of ‘personal data’ in the GDPR is analytically unsound because a simple and unequivocal term like ‘personal data’ is defined by means of a very difficult and ambiguous term like ‘information’.

Your book makes a nice gift for people who consider themselves to be rational. Unfortunately, to consider oneself rational is itself a paradigm of bullshit.

Dr. W.W.

December 27th, 2020

Vught, The Netherlands

[1] To be fair and veracious, I have seen a couple of oldies on DVD while stuffing myself with glorified X-mas dishes, viz. Billy Wilder’s Witness for the Prosecution (1957), Joseph L. Mankiewicz’ version of Sleuth (1972) and Robert Aldrich’s vilification of Mike Hammer in Kiss Me Deadly (1955). Speaking about bullshit, rumor and serious talk has it that between 1955 and 1997 Aldrich’s movie wasn’t shown with the original and intended ending in which Mike and Velda flee the bungalow and look back at its explosion from the relative safety of the ocean. Allegedly, Glenn Erickson recovered the original ending in 1997, after which the full version was distributed on DVD. Though Erickson undoubtedly has recovered clips from files and has restored them in the copies that circulated in the United States, the original version has stayed in circulation in Europe since the fifties. At least, that is my deduction from the fact that the version of Kiss Me Deadly I saw on a German TV-channel (ZDF) in the seventies included the narrow escape of Mike and Velda.

[2] Though it is not clear whether you consider this failure to be a case of bullshit, it is certain that there are numerous evolutionists around who consider epigenetics to be a case of bullshit.

[3] This shift to approval by peers has reinstated the idea of ‘probability’ to its original meaning. Cf. Ian Hacking, The emergence of probability, Cambridge University Press, 1984 (original edition).

[4] Being familiar with the corridors of Tilburg University, I have followed Stapel’s successful attempts to get the attention of the media and his later downfall when the media found that they had tried to increase their ratings with the help of his alleged analyses of his concocted data.