Modern Data Analysis: A First Course in Applied Statistics

Hamilton (Lawrence)

This Page provides (where held) the Abstract of the above Book and those of all the Papers contained in it.


**BOOK ABSTRACT:** __Structure__

- This book … should prove useful whether you stay a consumer of other people's statistical claims or go on to do original research yourself.
- The book is divided into four parts:-
- Part I, Chapters 1-6, introduces methods for exploring and describing the way in which variables vary. The question of just how variables vary is central to all further statistical analysis. We begin by analyzing variables one at a time, within small sets of data.
- Part II, Chapters 7-11, looks at statistical inference. This is the process of generalizing from the data at hand (the sample) to the larger universe (population) from which the sample came. Statistical inference requires a marriage of descriptive techniques and mathematical theory. The fit between theory and data becomes a matter of some concern.
- Part III, Chapters 12-15, examines how descriptive, exploratory, and inferential methods are applied to understanding relationships between two variables. This is where the interesting questions of causality are first addressed.
- Part IV consists only of Chapter 16, which extends the methods of previous chapters to the analysis of relationships among three or more variables. It provides a preview of more advanced statistics, for which earlier chapters have built up a foundation.

- UNIVARIATE DESCRIPTIVE ANALYSIS
- A First Look at Data – 3
- Graphing Variable Distributions – 23
- Summarizing Distributions: Measures of Center – 59
- Summarizing Distributions: Measures of Spread – 91
- Comparing Variable Distributions – 115
- Coping with Outliers and Skewness – 149

- INFERENCE IN UNIVARIATE ANALYSIS – 175
- An Introduction to Probability – 177
- Random Sampling and Sampling Distributions – 207
- Inference Using the Normal Distribution – 241
- Large-Sample Hypothesis Tests – 277
- Small-Sample Inference for Means and Proportions – 311

- BIVARIATE ANALYSIS: RELATIONSHIPS BETWEEN TWO VARIABLES – 355
- Two Categorical Variables: Crosstabulation and the Chi-Square Test – 359
- One Categorical and One Measurement Variable: Comparisons – 397
- Two Measurement Variables: Regression Analysis – 457
- Inference and Criticism in Two-Variable Regression – 503

- MULTIVARIATE ANALYSIS – 561
- An Introduction to Multiple Regression – 563

- APPENDICES
- Answers to In-Chapter Problems – 609
- Statistical Tables – 667
- References – 675
- Index – 681

- Introductory Undergraduate
- Wadsworth (Brooks/Cole Publishing Company), Pacific Grove, California, 1990

- *Modern Data Analysis* is intended for use with a first course in statistics. It starts with the elementary question "What are data?", and ends sixteen chapters later with an introduction to multiple regression. The level of mathematics is kept low, and the emphasis is on applications and real-world data, rather than theory. The book aims to provide students with an experience of statistical analysis as a process of scientific discovery.
- *Modern Data Analysis* reflects a belief that most students are more interested in applied statistics than they are in statistical theory. All concepts are introduced in the context of real-world examples, and reinforced by exercises in which students are asked to think out analyses for themselves. I hope to make the concepts more accessible in this way, and also to show that these techniques are actually useful. Students are exposed from the start to the fact that real data are often messy, complex, and ambiguous.
- "Introductory statistics" need not be the same as "old-fashioned statistics." Statistical analysis is presented here as an interactive, exploratory process with a large graphical component. This is in keeping with the philosophical changes wrought by recent developments including econometric regression methods, Tukey's exploratory data analysis, robust estimation, criticism and influence analysis, computer graphics, and interactive computing in general. Without going into the technical details of these methods, I lay a foundation for them. Exposure to modern methods should provide all students, including the majority who do not go on into research, with both more sophistication as information consumers and with a strengthened appreciation for the logic of scientific research.

- A variable is an attribute that varies, instead of being always the same. … Variables can be used to describe almost any attribute of anything.
- Statistics provides systematic methods for the study of variables. These methods derive from the same basic ideas, whatever the objects of study. Similar statistical methods can be found in research journals from fields as diverse as business, social science, medicine, biology, or astronomy. Statistics is a common language for many areas of science. … You will likely encounter articles that cannot even be read without advanced training in statistics.
- Our news media shower us with statistical information. Daily newspapers have two sections — sports scores and stock market reports — that consist entirely of tiny-print data and statistical summaries. News stories, advertisements, and political campaigns frequently use or misuse statistical information. Their level of sophistication is low, but they still may confuse (sometimes intentionally) a statistically naive consumer. Even people with no interest in science or research should have some background in statistical thinking.
- In:-
- Chapter 1 we start with the elementary question, "What do data look like?" The question is not so simple as it first appears.
- Chapters 2-5 introduce ways for finding out and describing how variables vary.
- Chapter 6 concludes this first section with a discussion of what to do about data that are found to be "ill behaved."

- A First Look at Data – 3
- 1.1 A Sample Data Set – 4
- 1.2 Frequency Distributions – 6
- 1.3 Statistical Summaries and Types of Variables
- 1.4 Aggregate Data – 12
- 1.5 Measurement Errors and Missing Values – 17
- 1.6 Univariate, Bivariate, and Multivariate Analysis

Summary – 20

Problems – 21

Notes – 22

- Graphing Variable Distributions – 23
- 2.1 Histograms – 24
- 2.2 Frequency Polygons and Ogives – 27
- 2.3 Stem-and-Leaf Displays – 31
- 2.4 Double-Stem and Five-Stem Versions – 35
- 2.5 Graphing Negative Numbers – 40
- 2.6 Time Plots – 43
- 2.7 Time Series Smoothing – 46
- 2.8 The Uses of Graphs – 52

Summary – 53

Problems – 54

Notes – 57

- Summarizing Distributions: Measures of Center – 59
- 3.1 The Mode – 60
- 3.2 The Median – 63
- 3.3 Other Order Statistics – 67
- 3.4 The Mean – 70
- 3.5 Comparing the Median and the Mean – 71
- 3.6 Mean, Median, and Distributional Shape – 73
- 3.7 Choosing a Measure of Center – 76
- 3.8 Means and Medians from Grouped Data – 78
- 3.9 Weighted Means – 83

Summary – 86

Problems – 86

Notes – 89

- Summarizing Distributions: Measures of Spread – 91
- 4.1 Measures of Spread Based on Order Statistics – 91
- 4.2 Deviations from the Mean – 96
- 4.3 Variance and Standard Deviation – 99
- 4.4 Sample and Population – 101
- 4.5 Calculating the Standard Deviation – 103
- 4.6 The Pseudo-Standard Deviation – 106
- 4.7 Understanding Measures of Spread – 109

Summary – 111

Problems – 111

Notes – 113

- Comparing Variable Distributions – 115
- 5.1 Constructing Box Plots – 116
- 5.2 Reading Box Plots – 122
- 5.3 Speed of Personal Computers – 126
- 5.4 True and Self-Reported Test Scores – 130
- 5.5 Seasonal Patterns in Water Use – 134
- 5.6 Notes on Comparing Distributions – 139

Summary – 142

Problems – 142

Notes – 147

- Coping with Outliers and Skewness – 149
- 6.1 Deleting Outliers – 150
- 6.2 When Should Outliers Be Deleted? – 154
- 6.3 Logarithms – 155
- 6.4 Per Capita Gross National Product – 158
- 6.5 Other Nonlinear Transformations – 163
- 6.6 Using Results from Transformed Data – 166

Summary – 169

Problems – 169

Notes – 173

- An Introduction to Probability – 177
- 7.1 Basic Concepts of Probability – 178
- 7.2 Reasoning with Probability: An Example – 184
- 7.3 Tree Diagrams – 186
- 7.4 Probability Distributions for Categorical Variables – 188
- 7.5 Probability Distributions for Measurement Variables – 193
- 7.6 Normal Distributions – 196

Summary – 198

Problems – 199

Notes – 205

- Random Sampling and Sampling Distributions – 207
- 8.1 Random Numbers – 208
- 8.2 Simple Random Samples – 212
- 8.3 Random Samples from Computer Data Files – 216
- 8.4 Sample and Population: An Example – 219
- 8.5 Inferences About the Population – 223
- 8.6 A Computer Experiment – 225
- 8.7 Sampling Distributions and Standard Errors – 229
- 8.8 The Uses of Standard Errors – 231

Summary – 232

Problems – 233

Notes – 238

- Inference Using the Normal Distribution – 241
- 9.1 Definition of the Normal Distribution – 242
- 9.2 Parameters of the Normal Curve – 247
- 9.3 The Standard Normal Distribution – 250
- 9.4 Using Tables of the Standard Normal Distribution – 252
- 9.5 Z-Scores and Non-normal Distributions – 256
- 9.6 Confidence Intervals for Means – 260
- 9.7 An Illustration with Repeated Sampling – 263
- 9.8 Confidence Intervals for Proportions and Percentages – 265
- 9.9 Using Confidence Intervals to Test Hypotheses – 268

Summary – 270

Problems – 271

Notes – 275

- Large-Sample Hypothesis Tests – 277
- 10.1 Statistical Hypotheses – 278
- 10.2 A Hypothesis Test – 279
- 10.3 The Logic of Hypothesis Testing – 282
- 10.4 Type I and Type II Errors – 285
- 10.5 Large-Sample Tests for Proportions – 288
- 10.6 Large-Sample Tests for Means – 293
- 10.7 Z Test for Means: An Example – 294
- 10.8 One-Sided Tests – 297
- 10.9 Sampling from Finite Populations – 301

Summary – 305

Problems – 306

Notes – 308

- Small-Sample Inference for Means and Proportions – 311
- 11.1 A Small-Sample Problem – 312
- 11.2 The t Distribution – 313
- 11.3 The Normality Assumption – 316
- 11.4 Confidence Intervals Based on Small-Sample Means – 319
- 11.5 A Second Look at Table 9.5 – 323
- 11.6 Small-Sample Hypothesis Tests for Means – 325
- 11.7 Inferences About Means of Non-normal Distributions – 329
- 11.8 The Binomial Distribution – 333
- 11.9 The Poisson Distribution – 342

Summary – 346

Problems – 347

Notes – 352

- Two Categorical Variables: Crosstabulation and the Chi-Square Test – 359
- 12.1 Cell Frequencies and Percentages in Crosstabulation – 360
- 12.2 The Independence Hypothesis and Expected Frequencies – 364
- 12.3 The Chi-Square Test – 366
- 12.4 Degrees of Freedom and the Chi-Square Distribution – 370
- 12.5 Parenthood and Opinions About Water Quality – 374
- 12.6 Chi-Square as a "Badness-of-Fit" Test – 377
- 12.7 The Problem of Thin Cells – 381
- 12.8 Continuity Correction for Thin Cells – 384
- 12.9 Sample Size and Significance in Chi-Square Analysis – 386

Summary – 388

Problems – 389

Notes – 393

- One Categorical and One Measurement Variable: Comparisons – 397
- 13.1 Overview of Comparison Issues – 398
- 13.2 Testing Hypotheses About Means – 403
- 13.3 Two-Sample Problems: Difference-of-Means Tests – 407
- 13.4 Confidence Intervals for Differences of Means – 413
- 13.5 Paired-Difference Tests – 415
- 13.6 K-Sample Problems: Analysis of Variance – 420
- 13.7 Error-Bar Plots – 428
- 13.8 IQ Scores and Reading Ability – 432
- 13.9 Dealing with Distributional Problems – 439
- 13.10 Nonparametric Tests and Rank Transformations – 440

Summary – 446

Problems – 447

Notes – 454

- Two Measurement Variables: Regression Analysis – 457
- 14.1 Scatter Plots – 458
- 14.2 Regression Line and Regression Equation – 462
- 14.3 Summary Statistics for Two Measurement Variables – 468
- 14.4 Predicted Values and Residuals – 473
- 14.5 Assessing Fit in Regression – 477
- 14.6 Correlation Coefficients – 480
- 14.7 Physician Problems and Hospital Size – 483
- 14.8 Predicting State SAT Scores – 488
- 14.9 Outliers and Influence in Regression Analysis – 492
- 14.10 Notes on Calculation – 495

Summary – 498

Problems – 498

Notes – 502

- Inference and Criticism in Two-Variable Regression – 503
- 15.1 Inference in Regression – 504
- 15.2 Standard Errors in Regression – 505
- 15.3 Income and Homicide Rate – 508
- 15.4 t Tests for Regression Coefficients – 513
- 15.5 t Tests for Hypotheses Other Than beta = 0 – 516
- 15.6 Confidence Intervals for Regression Coefficients – 517
- 15.7 Confidence Intervals for Regression Predictions – 521
- 15.8 F Tests in Two-Variable Regression – 523
- 15.9 Assumptions and Problems in Regression Analysis – 526
- 15.10 Scatter Plots for Regression Criticism – 529
- 15.11 Coping with Problems in Regression – 535
- 15.12 Understanding Curvilinear Regression – 541
- 15.13 Alternative Explanations – 550

Summary – 552

Problems – 553

Notes – 559

- An Introduction to Multiple Regression – 563
- 16.1 The Multiple Regression Equation – 563
- 16.2 Inference and Multiple Regression – 566
- 16.3 Reading Regression Output – 568
- 16.4 Predicting Teaching Evaluations from Grades and Class Size – 571
- 16.5 Standardized Regression Coefficients ["Beta Weights"] – 574
- 16.6 Interaction Effects – 576
- 16.7 How Many X Variables? – 580
- 16.8 Polynomial Regression – 583
- 16.9 Graphs for Multiple Regression – 584
- 16.10 Birth Rate, GNP and Child Mortality – 588
- 16.11 Dummy Variable Regression – 596

Summary – 599

Problems – 600

Notes – 604

- Answers to In-Chapter Problems – 609
- Statistical Tables – 667
- Table A.1: Probabilities for the Standard Normal Distribution – 668
- Table A.2: Critical Values for the Standard Normal Distribution – 669
- Table A.3: Critical Values for the Student's t Distribution – 670
- Table A.4: Critical Values for the Chi-Square Distribution – 671
- Table A.5: Critical Values for the F Distribution – 672

- References – 675
- Index – 681


© Theo Todman, June 2007 - August 2017. Please address any comments on this page to theo@theotodman.com.
