Year : 2016  |  Volume : 32  |  Issue : 4  |  Page : 421-423

The American Statistical Association statement on P-values explained

Department of Anaesthesiology and Intensive Care, PGIMER, Chandigarh, India

Date of Web Publication: 25-Nov-2016

Correspondence Address:
Lakshmi Narayana Yaddanapudi
Department of Anaesthesiology and Intensive Care, PGIMER, Chandigarh - 160 012

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/0970-9185.194772

How to cite this article:
Yaddanapudi LN. The American Statistical Association statement on P-values explained. J Anaesthesiol Clin Pharmacol 2016;32:421-3

For better or worse, Null Hypothesis Significance Testing (NHST), with its associated “P” values, has become the standard for most published medical literature. However, “P” values are difficult to understand and interpret, even for established researchers. This has attracted a lot of unfavorable attention in the recent past,[1] especially in the context of research misconduct. The P-value has become a perennial target of scientific cartoonists such as Randall Munroe [Figure 1].[2] It is against this background that, for the first time in its 177-year history, the American Statistical Association released a “Statement on Statistical Significance and P-values” with six principles. Wasserstein and Lazar have explained the context, process, and purpose of this statement in The American Statistician.[3]
Figure 1: “If all else fails, use 'significant at P > 0.05 level' and hope no one notices.” (Randall Munroe, Creative Commons Attribution-NonCommercial 2.5 License)

As practicing physician-scientists, it is important for us to understand the context and “significance” of this statement. This editorial attempts to explain the salient features of this statement from the perspective of Indian anesthesiology research, based on the explanations provided by Wasserstein and Lazar.

Let me postulate a specific clinical research scenario for this purpose. A new antiemetic, Nopov, has been tested against placebo in a sample of patients undergoing day care gynecological surgery. The number of patients vomiting on the first postoperative day was lower in the treatment group (45/100) than in the placebo group (60/100), with a P-value of 0.03.
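The editorial does not name the test behind the P-value of 0.03. As a minimal sketch, assuming a two-sided pooled z-test for two proportions without continuity correction (our assumption, chosen because it gives a figure close to the quoted one), the calculation can be written with the Python standard library. The function name is ours and purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P-value for a difference between two proportions.

    Assumption: pooled z-test without continuity correction; the
    editorial does not state which test was actually used.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Under the null hypothesis both groups share one vomiting rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Nopov: 45 of 100 vomited; placebo: 60 of 100 vomited.
p_value = two_proportion_p_value(45, 100, 60, 100)
print(f"P = {p_value:.3f}")
```

Under this assumption the result rounds to the 0.03 quoted above. A test with a continuity correction, or an exact test, would give a somewhat larger value.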

Principle 1: P-values can indicate how incompatible the data are with a specified statistical model.

A P-value is one way of summarizing the incompatibility between the observed data and a proposed model for the data. The most common model we use is the so-called “null hypothesis,” which in practice essentially proposes that Nopov has no effect whatsoever. The smaller the P-value, the larger the incompatibility of the data with the null hypothesis. A P-value of 0.03 says that a difference in vomiting at least as large as the one observed (45% vs. 60%) would arise by chance in about 3% of samples drawn from a population in which Nopov has no effect. It does not say anything about a population in which Nopov has an effect.
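What “samples drawn from a population in which Nopov has no effect” means can be made concrete with a small simulation. The sketch below assumes that, under the null hypothesis, every patient vomits with the pooled probability of 52.5% (105 of 200) regardless of group. The number of simulated trials and the random seed are arbitrary choices of ours.

```python
import random


def simulate_null(n_per_group=100, null_rate=0.525, observed_diff=15,
                  n_trials=5000, seed=1):
    """Fraction of simulated null trials in which the two groups differ
    by at least `observed_diff` vomiting patients, in either direction."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_trials):
        # Both groups share the same vomiting rate: Nopov does nothing.
        group_a = sum(rng.random() < null_rate for _ in range(n_per_group))
        group_b = sum(rng.random() < null_rate for _ in range(n_per_group))
        if abs(group_a - group_b) >= observed_diff:
            extreme += 1
    return extreme / n_trials


fraction = simulate_null()
print(f"Null trials at least as extreme as the observed one: {fraction:.3f}")
```

Only a few percent of the simulated trials show a gap of 15 or more patients, which is the sense in which a small P-value marks the data as unusual under the null hypothesis. The simulated fraction need not match 0.03 exactly, because patient counts are whole numbers while the z-test uses a smooth approximation.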

Principle 2: P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

A P-value of 0.03 does not mean that the probability of Nopov not having any effect is 3%. Neither does it mean that 45% of patients in the Nopov group vomited by chance. As the last two sentences under Principle 1 explain, a P-value is a statement about the data in the context of a null hypothesis. It does not say anything about the null hypothesis or the alternate hypothesis. It is not an error probability.

Principle 3: Scientific conclusions and business or policy decisions should not be based only on whether a P-value passes a specific threshold.

The results of this study alone should not lead you to conclude that Nopov reduces postoperative vomiting. A single study does not justify a binary yes/no conclusion. Researchers should consider the context of the study while deriving scientific inferences. This context includes the design of the study (e.g., improper randomization or poor concealment of allocation, leading to selection or other types of bias), the quality of measurements (e.g., if vomiting is not documented in real time but is based on recollection at an interview conducted a week later, leading to recall bias), the external evidence for the phenomenon under study (e.g., studies showing that Nopov does not penetrate the blood–brain barrier, or that it does, or that it causes profound sedation), and the validity of the statistical assumptions that underlie the data analysis. A low P-value should never be the sole basis for a scientific claim.

Principle 4: Proper inference requires full reporting and transparency.

All the analyses done should be reported fully. Conducting multiple statistical tests on the same data and reporting only those with low P-values makes the reported analysis uninterpretable. For example, in the given study, the number of patients vomiting may have been analyzed for the period that the patient was in the postanesthesia care unit, or the first 6, 24, 48, or 72 h. The number of episodes of vomiting per patient in all these given periods could have been tested instead of the number of patients. That gives us ten possible tests. If all ten are conducted, and only the ones with a P < 0.05 are reported, as commonly happens, it amounts to research misconduct going by various names such as cherry-picking, data dredging, p-hacking, significance chasing, or more politely “selective inference.” This is one of the main causes of the spurious excess of statistical significance in published literature. “Valid scientific conclusions based on P-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including P-values) were selected for reporting.”[3]
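A short calculation shows why selective reporting of ten tests is so misleading. The sketch below assumes the ten tests are statistically independent, which is a simplification of ours: tests on the same patients at overlapping time points are correlated, so the true figure would differ, but the direction of the problem is the same.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of `n_tests` independent tests gives
    P < alpha when the null hypothesis is true for every one of them."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10):
    chance = chance_of_any_false_positive(n_tests)
    print(f"{n_tests:2d} test(s): {chance:.1%} chance of at least one P < 0.05")
```

Under this independence assumption, running all ten tests gives roughly a 40% chance of at least one “significant” result even if Nopov does nothing at all.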

Principle 5: A P-value, or statistical significance, does not measure the size of an effect or the importance of a result.

The threshold of statistical significance that is commonly used is a P-value of 0.05. This is conventional and arbitrary. It does not convey any meaningful evidence of the size of the effect. A P-value of 0.01 does not mean the effect size is larger than with a P-value of 0.03. The P-value would have been 0.000002 if we had sampled 1000 patients instead of 200 in the Nopov study and obtained the same proportions (i.e., the same effect size). Similarly, the P-value can cross the threshold merely because an effect is measured with greater precision. For example, if in the above study the incidence of postoperative vomiting in the Nopov group had been 46%, the P-value would have been 0.047, which is considered statistically significant. If we then measure the incidence more precisely and get a value of 46.4%, the P-value would become 0.054, considered nonsignificant. A similar change can take place in the opposite direction as well.
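The figures in this paragraph can be checked with a few lines of Python. The sketch assumes the same two-sided pooled z-test without continuity correction as before (our assumption, not stated in the editorial) and equal group sizes. The helper name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_rates(rate_nopov, rate_placebo, n_per_group):
    """Two-sided pooled z-test P-value for two equally sized groups."""
    pooled = (rate_nopov + rate_placebo) / 2  # valid because groups are equal
    standard_error = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(rate_nopov - rate_placebo) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


p_original = p_value_for_rates(0.45, 0.60, 100)   # the study as described
p_large = p_value_for_rates(0.45, 0.60, 500)      # same rates, 1000 patients
p_46 = p_value_for_rates(0.46, 0.60, 100)         # Nopov incidence 46%
p_464 = p_value_for_rates(0.464, 0.60, 100)       # Nopov incidence 46.4%

print(f"200 patients, 45% vs 60%:    P = {p_original:.3f}")
print(f"1000 patients, 45% vs 60%:   P = {p_large:.1e}")
print(f"200 patients, 46% vs 60%:    P = {p_46:.3f}")
print(f"200 patients, 46.4% vs 60%:  P = {p_464:.3f}")
```

The effect size is identical in the first two lines, yet the P-values differ by several orders of magnitude. In the last two lines a change of 0.4 percentage points moves the result across the 0.05 threshold.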

Statistical significance does not automatically equate to scientific, human, or economic significance. Even if Nopov does, in fact, reduce the incidence of vomiting by 15 percentage points (from 60% to 45%), this might not be clinically relevant. It might not matter to the patient if, for example, the drug produces severe dysphoria which is more uncomfortable for the patient. It might not make economic sense to use the drug if, for example, it costs 10,000 rupees to treat one patient.
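One way to put the economic argument into numbers is the number needed to treat (NNT). The NNT is not used in the editorial; it is our illustration, computed from the study's own proportions and the hypothetical cost of 10,000 rupees per patient mentioned above.

```python
risk_placebo = 60 / 100          # proportion vomiting on placebo
risk_nopov = 45 / 100            # proportion vomiting on Nopov
cost_per_patient_rupees = 10_000  # hypothetical cost from the text

# Absolute risk reduction: the 15-percentage-point difference.
absolute_risk_reduction = risk_placebo - risk_nopov

# Patients who must be treated for one fewer patient to vomit.
number_needed_to_treat = 1 / absolute_risk_reduction

# Drug cost incurred for each patient spared from vomiting.
cost_per_patient_spared = cost_per_patient_rupees * number_needed_to_treat

print(f"Absolute risk reduction: {absolute_risk_reduction:.0%}")
print(f"Number needed to treat:  {number_needed_to_treat:.1f}")
print(f"Cost per patient spared: {cost_per_patient_spared:,.0f} rupees")
```

On these assumptions, about seven patients would need to receive Nopov, at a combined cost of roughly 67,000 rupees, for one fewer patient to vomit. Whether that is worthwhile is a clinical and economic judgment that no P-value can make.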

Principle 6: By itself, a P-value does not provide a good measure of evidence regarding a model or hypothesis.

A context-less P-value without any other evidence provides very limited information. A large P-value is not evidence in favor of the null hypothesis, since many other hypotheses may be equally or more consistent with the observed data. Data analysis should not conclude with the calculation of the P-value. Correct and careful interpretation of statistical tests requires examining the sizes of effect estimates and confidence limits as well as precise P-values. Other approaches may provide more direct evidence of the size of an effect or the correctness of a hypothesis, albeit with further statistical assumptions. These approaches include methods emphasizing estimation over testing, Bayesian methods, likelihood ratios, and others.
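As an illustration of reporting an effect estimate with confidence limits, the sketch below computes the risk difference for the Nopov study with a 95% confidence interval. It assumes the simple Wald (normal approximation) interval, which is one of several available methods and is our choice for brevity.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_placebo, n_placebo, events_drug, n_drug,
                       level=0.95):
    """Risk difference (placebo minus drug) with a Wald confidence interval."""
    p_placebo = events_placebo / n_placebo
    p_drug = events_drug / n_drug
    difference = p_placebo - p_drug
    # Unpooled standard error, as is usual for an interval estimate.
    standard_error = sqrt(p_placebo * (1 - p_placebo) / n_placebo
                          + p_drug * (1 - p_drug) / n_drug)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return difference, difference - z * standard_error, difference + z * standard_error


difference, lower, upper = risk_difference_ci(60, 100, 45, 100)
print(f"Risk difference: {difference:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The interval runs from a reduction of about 1 percentage point to one of about 29 percentage points. It conveys the direction of the effect and how imprecisely a trial of 200 patients measures it, information that “P = 0.03” alone does not carry.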

The fundamental problem discussed here is the implicit practice of defining success as passing an arbitrarily defined threshold. If this is followed, biases will occur regardless of whether the threshold being considered is a P-value, a 95% confidence interval, a Bayes factor, a false discovery rate, or any other measure. It is better to promote transparency in study design, conduct, and reporting than to rely on a single binary criterion of whatever type.

At the present time, as responsible scientists, we should do the following at a minimum. In the above-mentioned study, we should specify what exactly the null hypothesis was and what the alternate hypothesis was. We should specifically document the effect size that we considered clinically and economically important and relevant to the patients in question. We should calculate the sample size required based on this effect size and on appropriate α and β values, which may not be 0.05 and 0.20 as in most studies. We should define in adequate detail measures to reduce bias, such as choosing an appropriate population, randomization, and concealment of allocation. Moreover, we should implement them during the conduct of the study. We should document beforehand the outcomes of interest, and the methods and time of their measurement. If we want to perform NHST, we should define a priori what significance tests we want to perform on what data. We should interpret the results in context, with an understanding of the underlying phenomena, and with complete reporting of all the analyses performed. Finally, we should supplement the data summaries and the P-values with estimates of the effect sizes and measures of their uncertainty, and with other methods such as likelihood ratios and confidence, credible, or prediction intervals.
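The sample size step can be sketched as follows, assuming the common normal-approximation formula for comparing two proportions with equal group sizes and a two-sided test. The formula choice and the function name are ours. The rates of 60% and 45% stand in for the effect size judged clinically important.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_group(rate_control, rate_treated, alpha=0.05, beta=0.20):
    """Patients needed in each group to detect the given difference.

    Assumption: normal-approximation formula for two proportions,
    two-sided alpha, power = 1 - beta, equal allocation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    mean_rate = (rate_control + rate_treated) / 2
    null_term = z_alpha * sqrt(2 * mean_rate * (1 - mean_rate))
    alt_term = z_beta * sqrt(rate_control * (1 - rate_control)
                             + rate_treated * (1 - rate_treated))
    return ceil((null_term + alt_term) ** 2
                / (rate_control - rate_treated) ** 2)


conventional = patients_per_group(0.60, 0.45)                      # alpha 0.05, beta 0.20
stricter = patients_per_group(0.60, 0.45, alpha=0.01, beta=0.10)   # stricter error rates

print(f"alpha = 0.05, beta = 0.20: {conventional} patients per group")
print(f"alpha = 0.01, beta = 0.10: {stricter} patients per group")
```

With the conventional values this formula asks for considerably more than the 100 patients per group in the example study, and stricter α and β values raise the requirement further. This is why both values should be chosen deliberately rather than by habit.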

References

1. Siegfried T. Odds are, it's wrong: Science fails to face the shortcomings of statistics. Sci News 2010;177:26. [Last accessed on 2016 Nov 1].
2. Licensed under a Creative Commons Attribution-NonCommercial 2.5 License. Available from: 1478/. [Last accessed on 2016 Nov 1].
3. Wasserstein RL, Lazar NA. The ASA's statement on P-values: Context, process, and purpose. Am Stat 2016;70:129-33. Available from: Vt2XIOaE2MN. [Last accessed on 2016 Nov 11].

