The Art of Statistics: Learning from Data

Spiegelhalter (David)

__Back Cover Blurb__

- How can statistics help us understand the world?
- Can we come to reliable conclusions when data is imperfect?
- How is statistics changing in the age of data science?
- Sir David John Spiegelhalter is a British statistician and Chair of the Winton Centre for Risk and Evidence Communication in the Statistical Laboratory at the University of Cambridge. Spiegelhalter is one of the most cited and influential researchers in his field, and was elected as President of the Royal Statistical Society for 2017-18.

**Does going to University increase the risk of getting a brain tumour?**

- An ambitious study conducted on over 4 million Swedish men and women whose tax and health records were linked over eighteen years enabled researchers to report that men with a higher socioeconomic position had a slightly increased rate of being diagnosed with a brain tumour.
- But did all that sweating in the library overheat the brain and lead to some strange cell mutations? The authors of the paper doubted it: ‘Completeness of cancer registration and detection bias are potential explanations for the findings.’ In other words, wealthy people with higher education are more likely to be diagnosed and get their tumour registered, an example of ascertainment bias.

**How many sexual partners have people in Britain *really* had?**

- Plotting the responses from a recent UK survey revealed various features, including a (very) long tail, a tendency to use round numbers such as 10 and 20, and more partners reported by men than women. It is incredibly easy to just claim that what these respondents say accurately represents what is really going on in the country. Media surveys about sex, where people volunteer to say what they get up to behind closed doors, do this all the time.

**What is the risk of cancer from bacon sandwiches?**

- An IARC report concluded that, normally, 6 in every 100 people who do not eat bacon daily would be expected to get bowel cancer. If 100 similar people ate a bacon sandwich every single day of their lives, the IARC would expect an 18% increase in cases of bowel cancer, i.e. a rise from 6 to 7 cases out of 100. That is one extra case in all those 100 lifetime bacon-eaters, which does not sound as impressive as the relative risk (an 18% increase) and might serve to put this hazard into perspective.
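
The arithmetic behind the bacon example can be checked in a few lines. This is only a sketch: the 6-in-100 baseline and the 18% relative increase come from the text above, while the function name `exposed_risk` is made up for illustration.

```python
def exposed_risk(baseline_risk: float, relative_increase: float) -> float:
    """Absolute risk in the exposed group, given a baseline risk and a relative increase."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.06            # 6 in 100 people who do not eat bacon daily (from the text)
relative_increase = 0.18   # 18% relative increase (from the text)

exposed = exposed_risk(baseline, relative_increase)
extra_cases_per_100 = (exposed - baseline) * 100

print(f"Risk with daily bacon: {exposed * 100:.2f} per 100")
print(f"Extra cases per 100 lifetime bacon-eaters: {extra_cases_per_100:.2f}")
```

The relative risk (18%) and the absolute change (about one extra case per 100) describe the same data, which is why the blurb suggests quoting the absolute figure for perspective.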

**Do busier hospitals have higher survival rates?**

- There is considerable interest in the so-called ‘volume effect’ in surgery – the claim that busier hospitals get better survival rates, possibly since they achieve greater efficiency and have more experience.
- When considering English hospitals conducting children’s heart surgery in the 1990s, and plotting the number of cases against their survival, the high correlation showed that bigger hospitals were associated with lower mortality. But we cannot conclude that the higher survival rates were in any sense caused by the increased number of cases – in fact it could even be the other way round: better hospitals simply attracted more patients.

- List of Figures – ix
- List of Tables – xiii
- Acknowledgements – xv
- Introduction – 1
- Getting Things in Proportion: Categorical Data and Percentages – 19
- Summarizing and Communicating Numbers. Lots of Numbers – 39
- Why Are We Looking at Data Anyway? Populations and Measurement – 73
- What Causes What? – 95
- Modelling Relationships Using Regression – 121
- Algorithms, Analytics and Prediction – 143
- How Sure Can We Be About What Is Going On? Estimates and Intervals – 189
- Probability - the Language of Uncertainty and Variability – 205
- Putting Probability and Statistics Together – 229
- Answering Questions and Claiming Discoveries – 253
- Learning from Experience the Bayesian Way – 305
- How Things Go Wrong – 341
- How We Can Do Statistics Better – 361
- In Conclusion – 379
- Glossary – 381
- Notes – 407

**Introduction**

- Turning experiences into data is not straightforward, and data is inevitably limited in its capacity to describe the world.
- Statistical science has a long and successful history, but is now changing in the light of increased availability of data.
- Skill in statistical methods is an important part of being a data scientist.
- Teaching statistics is changing from a focus on mathematical methods to one based on an entire problem-solving cycle.
- The PPDAC cycle provides a convenient framework: Problem - Plan - Data - Analysis - Conclusion and communication.
- Data literacy is a key skill for the modern world.

**Getting Things in Proportion: Categorical Data and Percentages**

- Binary variables are yes/no questions, sets of which can be summarized as proportions.
- Positive or negative framing of proportions can change their emotional impact.
- Relative risks tend to convey an exaggerated importance, and absolute risks should be provided for clarity.
- Expected frequencies promote understanding and an appropriate sense of importance.
- Odds ratios arise from scientific studies but should not be used for general communication.
- Graphics need to be chosen with care and awareness of their impact.
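
One reason odds ratios are awkward for general communication is that they track the relative risk only when the outcome is rare. A small sketch with purely hypothetical counts (none of these numbers are from the book) shows the two measures drifting apart as the outcome becomes common:

```python
def relative_risk(exposed_events: int, unexposed_events: int, group_size: int) -> float:
    """Ratio of the two risks, assuming both groups have the same size."""
    return (exposed_events / group_size) / (unexposed_events / group_size)


def odds_ratio(exposed_events: int, unexposed_events: int, group_size: int) -> float:
    """Ratio of the two odds (events / non-events), assuming equal group sizes."""
    exposed_odds = exposed_events / (group_size - exposed_events)
    unexposed_odds = unexposed_events / (group_size - unexposed_events)
    return exposed_odds / unexposed_odds


# Hypothetical groups of 100 people each.
rr_rare = relative_risk(7, 6, 100)       # rare outcome: 7 vs 6 cases
or_rare = odds_ratio(7, 6, 100)
rr_common = relative_risk(70, 60, 100)   # common outcome: 70 vs 60 cases
or_common = odds_ratio(70, 60, 100)

print(f"Rare outcome:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"Common outcome: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

In both scenarios the relative risk is about 1.17, but the odds ratio moves from roughly 1.18 to roughly 1.56, so reading an odds ratio as if it were a relative risk would overstate the effect in the second case.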

**Summarizing and Communicating Numbers. Lots of Numbers**

- A variety of statistics can be used to summarize the empirical distribution of data-points, including measures of location and spread.
- Skewed data distributions are common, and some summary statistics are very sensitive to outlying values.
- Data summaries always hide some detail, and care is required so that important information is not lost.
- Single sets of numbers can be visualized in strip-charts, box-and-whisker plots and histograms.
- Consider transformations to better reveal patterns, and use the eye to detect patterns, outliers, similarities and clusters.
- Look at pairs of numbers as scatter-plots, and time-series as line-graphs.
- When exploring data, a primary aim is to find factors that explain the overall variation.
- Graphics can be both interactive and animated.
- Infographics highlight interesting features and can guide the viewer through a story, but should be used with awareness of their purpose and their impact.
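
The point about summary statistics being sensitive to outlying values is easy to see with the mean and the median. The data below are invented for illustration, loosely in the spirit of a long-tailed survey response:

```python
import statistics

# Hypothetical survey responses (not from the book).
responses = [1, 2, 2, 3, 3, 4, 5]
with_outlier = responses + [500]   # one extreme value in the long tail

mean_before = statistics.mean(responses)
median_before = statistics.median(responses)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"Without outlier: mean = {mean_before:.2f}, median = {median_before}")
print(f"With outlier:    mean = {mean_after:.2f}, median = {median_after}")
```

A single extreme response drags the mean from under 3 to 65, while the median stays at 3, which is why the median is often preferred as a summary of skewed data.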

**Why Are We Looking at Data Anyway? Populations and Measurement**

- Inductive inference requires working from our data, through study sample and study population, to a target population.
- Problems and biases can crop up at each stage of this path.
- The best way to proceed from sample to study population is to have drawn a random sample.
- A population can be thought of as a group of individuals, but also as providing the probability distribution for a random observation drawn from that population.
- Populations can be summarized using parameters that mirror the summary statistics of sample data.
- Often data does not arise as a sample from a literal population. When we have all the data there is, then we can imagine it drawn from a metaphorical population of events that could have occurred, but didn’t.

**What Causes What?**

- Causation, in the statistical sense, means that when we intervene, the chances of different outcomes are systematically changed.
- Causation is difficult to establish statistically, but well-designed randomized trials are the best available framework.
- Principles of blinding, intention-to-treat and so on have enabled large-scale clinical trials to identify moderate but important effects.
- Observational data may have background factors influencing the apparent observed relationships between an exposure and an outcome, which may be either observed confounders or lurking factors.
- Statistical methods exist for adjusting for other factors, but judgement is always required as to the confidence with which causation can be claimed.

**Modelling Relationships Using Regression**

- Regression models provide a mathematical representation between a set of explanatory variables and a response variable.
- The coefficients in a regression model indicate how much we expect the response to change when the explanatory variable is observed to change.
- Regression-to-the-mean occurs when more extreme responses revert to nearer the long-term average, since a contribution to their previous extremeness was pure chance.
- Regression models can incorporate different types of response variable, explanatory variables and non-linear relationships.
- Caution is required in interpreting models, which should not be taken too literally: ‘All models are wrong, but some are useful.’
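
As a rough illustration of what a regression coefficient means, here is a least-squares fit of a straight line with one explanatory variable, written out by hand. The data points are hypothetical and the helper name `fit_line` is an assumption for this sketch:

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one explanatory variable)."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical observations (not from the book).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"Fitted line: y = {intercept:.2f} + {slope:.2f} * x")
```

The slope (about 1.99 here) is the expected difference in the response between observations whose explanatory variable differs by one unit. As the summary warns, that is a statement about what is observed, not necessarily about what would happen if we intervened.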

**Algorithms, Analytics and Prediction**

- Algorithms built from data can be used for classification and prediction in technological applications.
- It is important to guard against over-fitting an algorithm to training data, essentially fitting to noise rather than signal.
- Algorithms can be evaluated by the classification accuracy, their ability to discriminate between groups, and their overall predictive accuracy.
- Complex algorithms may lack transparency, and it may be worth trading off some accuracy for comprehension.
- The use of algorithms and artificial intelligence presents many challenges, and insight into both the power and limitations of machine-learning methods is vital.
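
The three kinds of evaluation mentioned above can be sketched with some common measures. The choice of sensitivity and specificity for discrimination, and of a mean squared error of predicted probabilities (often called a Brier score) for predictive accuracy, is my assumption about suitable examples; all counts and probabilities are hypothetical.

```python
def classification_summary(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, sensitivity and specificity from the four cells of a confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # proportion classified correctly
        "sensitivity": tp / (tp + fn),     # true cases correctly flagged
        "specificity": tn / (tn + fp),     # non-cases correctly cleared
    }


def brier_score(probabilities, outcomes) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical test-set results (not from the book).
summary = classification_summary(tp=40, fn=10, fp=5, tn=45)
score = brier_score([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 0])

print(summary)
print(f"Brier score: {score:.3f}")
```

To guard against over-fitting, measures like these should be computed on data that was not used to build the algorithm.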

**How Sure Can We Be About What Is Going On? Estimates and Intervals**

- Uncertainty intervals are an important part of communicating statistics.
- Bootstrapping a sample consists of creating new data sets of the same size by resampling the original data, with replacement.
- Sample statistics calculated from bootstrap resamples tend towards a normal distribution for larger data sets, regardless of the shape of the original data distribution.
- Uncertainty intervals based on bootstrapping take advantage of modern computer power, do not require assumptions about the mathematical form of the population and do not require complex probability theory.
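
The bootstrap procedure described above can be sketched with the standard library alone. This is a simple percentile interval, one of several ways of turning resamples into an interval. The data, the number of resamples and the function name are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_interval(data, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `stat`, resampling `data` with replacement."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical long-tailed sample (not from the book).
sample = [0, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 10, 10, 12, 15, 20, 20, 30, 50]

lower, upper = bootstrap_interval(sample)
print(f"Sample mean: {statistics.mean(sample):.2f}")
print(f"95% bootstrap interval for the mean: ({lower:.2f}, {upper:.2f})")
```

Each resample has the same size as the original data and is drawn with replacement, so no assumption about the mathematical form of the population is needed. Passing `stat=statistics.median` would give an interval for the median instead.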

**Probability - the Language of Uncertainty and Variability**

- The theory of probability provides a formal language and mathematics for dealing with chance phenomena.
- The implications of probability are not intuitive, but insights can be improved by using the idea of expected frequencies.
- The ideas of probability are useful even when there is no explicit use of a randomizing mechanism.
- Many social phenomena show a remarkable regularity in their overall pattern, while individual events are entirely unpredictable.

**Putting Probability and Statistics Together**

- Probability theory can be used to derive the sampling distribution of summary statistics, from which formulae for confidence intervals can be derived.
- A 95% confidence interval is the result of a procedure that, in 95% of cases in which its assumptions are correct, will contain the true parameter value. It cannot be claimed that a specific interval has 95% probability of containing the true value.
- The Central Limit Theorem implies that sample means and other summary statistics can be assumed to have a normal distribution for large samples.
- Margins of error usually do not incorporate systematic error due to non-random causes – external knowledge and judgement are required to assess these.
- Confidence intervals can be calculated even when we observe all the data, which then represent uncertainty about the parameters of an underlying metaphorical population.
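
The "procedure" reading of a 95% confidence interval can be illustrated by simulation: draw many samples from a population whose mean is known, compute an interval from each, and count how often the interval contains the true value. The population (normal, mean 10, standard deviation 2), the sample size and the use of the normal-approximation multiplier 1.96 are all assumptions made for this sketch.

```python
import math
import random
import statistics


def normal_interval(sample, z=1.96):
    """Approximate 95% interval for the mean: estimate plus or minus z standard errors."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


def coverage(true_mean=10.0, true_sd=2.0, n=50, repeats=2000, seed=1):
    """Proportion of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = normal_interval(sample)
        hits += low <= true_mean <= high
    return hits / repeats


rate = coverage()
print(f"Proportion of intervals containing the true mean: {rate:.3f}")
```

The proportion should come out close to 0.95. Any single interval either contains the true value or it does not, so the 95% describes the long-run behaviour of the procedure rather than one specific interval.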

**Answering Questions and Claiming Discoveries**

- Tests of null hypotheses - default assumptions about statistical models - form a major part of statistical practice.
- A P-value is a measure of the incompatibility between the observed data and a null hypothesis: formally it is the probability of observing such an extreme result, were the null hypothesis true.
- Traditionally, P-value thresholds of 0.05 and 0.01 have been set to declare ‘statistical significance’.
- These thresholds need to be adjusted if multiple tests are conducted, for example on different subsets of the data or multiple outcome measures.
- There is a precise correspondence between confidence intervals and P-values: if, say, the 95% interval excludes 0, we can reject the null hypothesis of 0 at P<0.05.
- Neyman-Pearson theory specifies an alternative hypothesis, and fixes Type I and Type II error rates for the two possible kinds of errors in a hypothesis test.
- Separate forms of hypothesis tests have been developed for sequential testing.
- P-values are often misinterpreted: in particular they do not convey the probability that the null hypothesis is true, nor does a non-significant result imply that the null hypothesis is true.
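
A small worked case may help with the definition of a P-value and with the adjustment for multiple tests. The scenario (60 heads in 100 tosses, tested one-sidedly against a fair coin) and the use of a Bonferroni-style adjustment, which divides the threshold by the number of tests, are my choices for illustration, not taken from the book.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): a one-sided P-value under the null value p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical data: 60 heads in 100 tosses; null hypothesis: the coin is fair.
p_value = binomial_upper_tail(60, 100)

threshold = 0.05
number_of_tests = 5                              # hypothetical number of tests run
adjusted_threshold = threshold / number_of_tests # Bonferroni-style adjustment

print(f"One-sided P-value: {p_value:.4f}")
print(f"Significant at {threshold}? {p_value < threshold}")
print(f"Significant at adjusted threshold {adjusted_threshold}? {p_value < adjusted_threshold}")
```

The P-value of roughly 0.028 is the probability of seeing 60 or more heads if the coin were fair. It is not the probability that the coin is fair. The same result that passes the 0.05 threshold on its own no longer does once it is treated as one of five tests.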

**Learning from Experience the Bayesian Way**

- Bayesian methods combine evidence from data (summarized by the likelihood) with initial beliefs (known as the prior distribution) to produce a posterior probability distribution for the unknown quantity.
- Bayes’ theorem for two competing hypotheses can be expressed as posterior odds = likelihood ratio x prior odds.
- The likelihood ratio expresses the relative support for two hypotheses from an item of evidence, and is sometimes used to summarize forensic evidence in criminal trials.
- When the prior distribution comes from some physical sampling process, Bayesian methods are uncontroversial. However, a degree of judgement is generally necessary.
- Hierarchical models allow evidence to be pooled across multiple small analyses that are assumed to have parameters in common.
- Bayes factors are the equivalent of likelihood ratios for scientific hypotheses, and are a controversial substitute for null-hypothesis significance testing.
- The theory of statistical inference has a long history of controversy, but issues of quality of data and scientific reliability are more important.
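
The odds form of Bayes’ theorem quoted above (posterior odds = likelihood ratio x prior odds) is simple enough to work through directly. The screening-test numbers below are hypothetical: a 1% prior probability, a test that is positive for 90% of true cases and for 5% of non-cases.

```python
def posterior_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Update a prior probability using posterior odds = likelihood ratio x prior odds."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical screening example (not from the book).
prior = 0.01                   # 1 in 100 people have the condition
likelihood_ratio = 0.90 / 0.05 # P(positive | condition) / P(positive | no condition)

posterior = posterior_probability(prior, likelihood_ratio)
print(f"Likelihood ratio: {likelihood_ratio:.0f}")
print(f"Probability of the condition after a positive test: {posterior:.3f}")
```

Even with a likelihood ratio of 18, the posterior probability is only about 15% because the prior odds were so low. The expected-frequency view gives the same answer: of 1,000 people, about 9 true positives against about 50 false positives.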

**How Things Go Wrong**

- Poor statistical practice has some responsibility for the crisis in the reproducibility of science.
- Deliberate fabrication of data appears to be fairly rare, but errors in statistical methods are frequent.
- An even greater problem is questionable research practices that tend to lead to exaggerated claims of statistical significance.
- In the pipeline by which statistical evidence reaches the public, press offices, journalists and editors add to the flow of unjustified statistical claims through their use of questionable interpretation and communication practices.

**How We Can Do Statistics Better**

- Producers, communicators and audiences all have a role in improving the way that statistical science is used in society.
- Producers need to ensure that science is reproducible. To demonstrate trustworthiness, information should be accessible, intelligible, assessable and useable.
- Communicators need to be wary of trying to fit statistical stories into standard narratives.
- Audiences need to call out poor practice by asking questions about the trustworthiness of their numbers, their source and their interpretation.
- When faced with a claim based on statistical evidence, first feel whether it seems plausible.

**In Conclusion**

- To put it bluntly, statistics can be difficult. Although I have tried to tackle underlying issues in this book rather than getting embroiled in technical detail, the narrative has unavoidably had to rely on some challenging concepts. So, congratulations for reaching the end.
- Rather than trying to boil down the past chapters into a shortlist of pieces of wise advice, I can take advantage of the following ten simple rules for effective statistical practice. These came from a group of senior statisticians who, mirroring this book, are keen to emphasize the non-technical issues which are generally not taught in statistics courses. I have added my own comments. These ‘rules’ should be fairly self-evident, and rather neatly summarize the issues tackled in this book.
  1. *Statistical methods should enable data to answer scientific questions*: Ask ‘why am I doing this?’, rather than focusing on which particular technique to use.
  2. *Signals always come with noise*: It is trying to separate out the two that makes the subject interesting. Variability is inevitable, and probability models are useful as an abstraction.
  3. *Plan ahead, really ahead*: This includes the idea of pre-specification in confirmatory experiments - avoiding researcher degrees of freedom.
  4. *Worry about data quality*: Everything rests on the data.
  5. *Statistical analysis is more than a set of computations*: Do not just plug into formulae or run procedures in software, without knowing why you are doing so.
  6. *Keep it simple*: The main communication should be as basic as possible - do not show off skills in complex modelling unless they are really necessary.
  7. *Provide assessments of variability*: With the warning that margins of error are generally bigger than claimed.
  8. *Check your assumptions*: And make clear when this has not been possible.
  9. *When possible, replicate!*: Or encourage others to do so.
  10. *Make your analysis reproducible*: Others should be able to access your data and code.

- Statistical science plays an important role in all our lives, and is constantly changing in response to the increasing quantity and depth of data becoming available. But the study of statistics does not just have an impact on society in general but on individuals in particular. From a purely personal perspective, putting this book together has made me realize how much my life has been enriched by engaging with statistics. I hope that you might feel the same - if not now, then in the future.

- I was alerted to the book via excerpts in Aeon.
- While this is a new paperback, it's a fairly horrible edition – small and bound so that any attempt to open it flat risks snapping the spine.
- Having said that, it’s more robust than expected as I’ve successfully cc’d the Chapter Summaries. I expect it’ll get more brittle with age.
- Now that I’ve read it, I can vouch for the text being much better than the fabric. However, it can’t really be read like a novel. It’ll certainly – for me – require a second reading.

- “From the Publisher” – via Amazon.

- I added these as I read the Chapters, though I had to do a catch-up after Chapter 4.
- I ought also to note any interesting snippets …
- One that immediately comes to mind is that the statistics for the use of statins are based on prescription rather than use; i.e. they may appear less effective because they are not always taken. There may be a 50% reduction in CHD rather than the published 25%.

Pelican (13 Feb. 2020)

© Theo Todman, June 2007 - Sept 2023. Please address any comments on this page to theo@theotodman.com.