


Year : 2016  |  Volume : 32  |  Issue : 3  |  Page : 333-338

Reliability and validity of a tool to assess airway management skills in anesthesia trainees

Department of Anesthesiology, Aga Khan University, Karachi, Pakistan

Date of Web Publication: 22-Aug-2016

Correspondence Address:
Dr. Aliya Ahmed
Department of Anesthesiology, Aga Khan University, P.O. Box 3500, Stadium Road, Karachi 74800

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/0970-9185.168171


Background and Aims: Gaining expertise in procedural skills is essential for achieving clinical competence during anesthesia training. Supervisors have the important responsibility of deciding when the trainee can be allowed to perform various procedures without direct supervision while ensuring patient safety. This requires robust and reliable assessment techniques. Airway management with bag-mask ventilation and tracheal intubation is routinely performed by anesthesia trainees at induction of anesthesia and to save lives during cardiorespiratory arrest. The purpose of this study was to evaluate the construct validity and the inter-rater and test-retest reliability of a tool designed to assess competence in bag-mask ventilation followed by tracheal intubation in anesthesia trainees.
Material and Methods: Informed consent was obtained from all participants. Tracheal intubation and bag-mask ventilation skills in 10 junior and 10 senior anesthesia trainees were assessed by two investigators on two occasions, 3-4 weeks apart, using a procedure-specific assessment tool.
Results: The average kappa value for inter-rater reliability was 0.91 and 0.99 for the first and second assessments, respectively, with an average agreement of 95%. The average agreement for test-retest reliability was 82%, with a kappa value of 0.39. Senior trainees obtained higher scores than junior trainees in all areas of assessment, with a significant difference for patient positioning, preoxygenation, and laryngoscopy technique, indicating good construct validity.
Conclusion: The tool designed to assess bag-mask ventilation and tracheal intubation skills in anesthesia trainees demonstrated excellent inter-rater reliability, fair test-retest reliability, and good construct validity. The authors recommend its use for formative and summative assessment of junior anesthesia trainees.

Keywords: Airway management, clinical competence, tracheal intubation

How to cite this article:
Ahmed A, Khan FA, Ismail S. Reliability and validity of a tool to assess airway management skills in anesthesia trainees. J Anaesthesiol Clin Pharmacol 2016;32:333-8

How to cite this URL:
Ahmed A, Khan FA, Ismail S. Reliability and validity of a tool to assess airway management skills in anesthesia trainees. J Anaesthesiol Clin Pharmacol [serial online] 2016 [cited 2020 Feb 24];32:333-8. Available from:

Introduction

Learning and mastering procedural skills are major challenges in anesthesia practice and are essential for achieving clinical competence.[1],[2] Anesthesiologists carry out many complex clinical tasks in their routine work, which trainees are expected to learn and master during training. Increased public awareness of healthcare-related issues has led to greater accountability of healthcare professionals and, rightly, to an increasing focus on patient safety in clinical practice. Supervisors have the important responsibility of deciding when a trainee can be allowed to perform various procedures without direct supervision while ensuring patient safety. Supervisors and trainers must accept that not all trainees learn equally quickly or become equally competent in performing practical procedures;[3],[4] reliable and objective assessment is therefore mandatory.

Airway management is an inherent part of the routine day-to-day work of anesthesiologists, who are required to perform it not only in the operation theater but also in the Intensive Care Unit, the wards, and the Emergency Department. Failure to perform the technique promptly and correctly can have serious consequences, including death. It is important to ensure that an anesthesia trainee is capable of performing tracheal intubation independently before he or she is included in a cardiac arrest team, where direct supervision by a senior colleague is not always possible. This requires robust and reliable assessment techniques, such as direct observation by senior anesthesiologists using procedure-specific tools while the trainee performs the procedure on actual patients.[2],[5]

When constructing an assessment tool, it is important to search the literature for an existing instrument that is appropriate and has established reliability and validity.[6],[7] We found tools for assessing procedures performed by anesthesiologists, including rapid sequence induction of anesthesia and management of difficult airways,[1],[3],[4],[8],[9],[10] as well as generic tools for assessing various anesthetic procedures. However, we could not identify a structured tool with established reliability and validity for assessing routine airway management. We therefore constructed a procedure-specific tool for this purpose. The objectives of this study were to evaluate the inter-rater and test-retest reliability and the construct validity of a tool designed to assess competence in bag-mask ventilation and tracheal intubation. The reliability of a tool is its ability to assess skills consistently across different assessors and different occasions, while construct validity is the ability of the tool to differentiate among varying levels of expertise.[6],[7],[11],[12],[13]

Material and Methods

Approval was granted by the University Ethics Review Committee (1398-Ane-ERC-09), and written informed consent was obtained from all participants. A total of 20 anesthesia trainees, 10 junior and 10 senior, were recruited. Junior trainees were defined as those with more than 2 but fewer than 4 months of anesthesia training, while senior trainees were those in the fourth year of training who were already performing airway management independently. The study protocol was presented at a departmental faculty meeting to share it with all faculty members. The purpose of the study was explained to the participating residents at the time of informed consent. The tool was not shared with the residents before the assessments.

The participants' bag-mask ventilation and tracheal intubation skills were assessed using a structured, procedure-specific assessment tool. All three authors participated in constructing the tool, and advice was sought from two other senior anesthesia consultants.

The tool comprised five major categories, each with further sub-categories, so that the trainee's performance could be evaluated in all the essential steps of the procedure [Table 1]. A simple 3-point scale was used to rate each step:

  • 1 (one): “step not performed”
  • 2 (two): “performance below expectations”
  • 3 (three): “performance meets expectations”

A separate column was provided for steps that were “not applicable” during the performance.

Table 1: Steps of bag-mask ventilation and tracheal intubation assessed by direct observation in anesthesia trainees
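The rating scheme above can be sketched as a small scoring routine. This is a minimal Python illustration under stated assumptions: the step labels are hypothetical placeholders (the real items are in Table 1), and excluding “not applicable” steps from a category sum is our assumption, since the text does not say how such steps were handled.

```python
from typing import Dict, Optional

# 3-point scale used by the tool:
#   1 = step not performed, 2 = performance below expectations,
#   3 = performance meets expectations.
# None stands for the "not applicable" column (an assumed encoding).
VALID_SCORES = {1, 2, 3}


def category_sum(ratings: Dict[str, Optional[int]]) -> int:
    """Sum one rater's sub-category scores for a major category.

    Steps rated "not applicable" (None) are skipped; this handling is
    an assumption, not something the study specifies.
    """
    total = 0
    for step, score in ratings.items():
        if score is None:
            continue
        if score not in VALID_SCORES:
            raise ValueError(f"{step}: expected 1, 2, 3 or None, got {score!r}")
        total += score
    return total


def category_maximum(ratings: Dict[str, Optional[int]]) -> int:
    """Highest attainable sum: 3 points for every applicable step."""
    return 3 * sum(score is not None for score in ratings.values())


# Hypothetical ratings for one category with three sub-steps.
example = {"step_a": 3, "step_b": 2, "step_c": None}
print(category_sum(example), "/", category_maximum(example))  # 5 / 6
```

Reporting the attainable maximum alongside the sum keeps category scores comparable when different numbers of steps are marked not applicable.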

“Performance below expectations” was defined in the tool as an unsuccessful attempt or an incorrectly performed step, while “performance meets expectations” was defined as a step performed adequately and successfully. The procedural steps used to assess bag-mask ventilation and tracheal intubation skills are listed in [Table 1]. Before finalizing the tool, we conducted a pilot study to identify any missing steps and to assess the practicality of using the tool in the operation theater. The pilot study provided a final check on content validity and served to train the investigators in rating trainees' performance by direct observation. The authors also attended a half-day workshop on direct observation of procedural skills.

The residents were assessed while working in their assigned operation theater under the supervision of the assigned consultant anesthesiologist, and only while anesthetizing patients undergoing elective procedures that required tracheal intubation. Routine preoperative assessment was done for each patient. Trainees were not assessed if the patient was pregnant or had any of the following: oral, faciomaxillary, or neck pathology or anatomic anomaly; obesity (body mass index > 30 kg/m²); rheumatoid arthritis; ankylosing spondylitis; a history of difficult airway; limited mouth opening; buck teeth; a short, thick neck with limited mobility; or Mallampati Grade III or IV.
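The exclusion criteria above amount to a simple screening rule applied before each assessment. The Python sketch below encodes them as a predicate; the record type and its field names are hypothetical, while the criteria themselves are taken from the paragraph above. Note that a body mass index of exactly 30 is not excluded, matching the “> 30” in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AirwayScreen:
    """Hypothetical pre-induction record; field names are illustrative only."""
    pregnant: bool
    oral_faciomaxillary_or_neck_pathology: bool  # includes anatomic anomaly
    bmi: float  # kg/m^2
    rheumatoid_arthritis: bool
    ankylosing_spondylitis: bool
    previous_difficult_airway: bool
    limited_mouth_opening: bool
    buck_teeth: bool
    short_thick_neck_limited_mobility: bool
    mallampati_grade: int  # 1-4


def eligible_for_assessment(p: AirwayScreen) -> bool:
    """True only if none of the study's exclusion criteria applies."""
    excluded = (
        p.pregnant
        or p.oral_faciomaxillary_or_neck_pathology
        or p.bmi > 30
        or p.rheumatoid_arthritis
        or p.ankylosing_spondylitis
        or p.previous_difficult_airway
        or p.limited_mouth_opening
        or p.buck_teeth
        or p.short_thick_neck_limited_mobility
        or p.mallampati_grade >= 3  # Grade III or IV
    )
    return not excluded


baseline = AirwayScreen(False, False, 24.0, False, False, False, False, False, False, 2)
print(eligible_for_assessment(baseline))                                # True
print(eligible_for_assessment(replace(baseline, bmi=31.5)))             # False
print(eligible_for_assessment(replace(baseline, mallampati_grade=3)))   # False
```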

The assessment was done simultaneously by two of the investigators, who are senior consultant anesthesiologists and registered supervisors for anesthesia training. Both assessors completed the structured assessment tool independently. The trainee was observed while managing the airway with bag-mask ventilation and intubating the trachea. Assessment time began once the patient had been transferred to the operating table for induction of anesthesia and the monitors had been attached, and ended when the tracheal tube position had been confirmed and the tube fixed. Any decision to take over the procedure, if the trainee was unable to intubate the patient's trachea, was left to the discretion of the supervising consultant. Two attempts at laryngoscopy and intubation were to be allowed; if the trainee was unsuccessful after two attempts, this was to be recorded as a failed attempt. Each resident was observed performing the same procedure again 3-4 weeks later by the same assessors to evaluate the test-retest reliability of the tool.

Sample size was calculated using PASS version 11 (NCSS LLC, Kaysville, Utah). For a test of agreement between two raters using the kappa statistic, a sample size of 20 subjects achieves 80% power to detect a true kappa value of 0.90 when testing H0: kappa = 0.50 versus H1: kappa ≠ 0.50 at a two-tailed significance level of 0.05.
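The text does not describe how PASS computes this power. As a rough cross-check, the sketch below estimates power by Monte Carlo simulation instead, which is a swapped-in technique rather than the PASS method. It assumes binary ratings with a 50% prevalence for both raters; because the result depends on that assumption and on the discreteness of kappa with only 20 subjects, it need not reproduce the 80% figure exactly.

```python
import random


def kappa_binary(pairs):
    """Cohen's kappa for two raters giving binary (0/1) ratings."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p1 = sum(a for a, _ in pairs) / n  # rater 1: proportion rated 1
    p2 = sum(b for _, b in pairs) / n  # rater 2: proportion rated 1
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    if p_exp == 1.0:  # both raters used one category only; kappa undefined
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def simulate_pairs(n, kappa, rng):
    """Draw n rating pairs whose population kappa equals `kappa`.

    Assumes both raters rate "1" with probability 0.5, so that
    P(1,1) = P(0,0) = 0.25 * (1 + kappa), P(1,0) = P(0,1) = 0.25 * (1 - kappa).
    """
    agree = 0.25 * (1 + kappa)
    disagree = 0.25 * (1 - kappa)
    cells = [(1, 1), (0, 0), (1, 0), (0, 1)]
    return rng.choices(cells, weights=[agree, agree, disagree, disagree], k=n)


def mc_power(n=20, kappa0=0.5, kappa1=0.9, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo power of a two-sided test of H0: kappa = kappa0."""
    rng = random.Random(seed)
    # Simulated null distribution of the sample kappa; its tails give
    # approximate critical values (approximate because kappa is discrete).
    null = sorted(kappa_binary(simulate_pairs(n, kappa0, rng)) for _ in range(reps))
    lower = null[int(reps * alpha / 2)]
    upper = null[int(reps * (1 - alpha / 2)) - 1]
    # Share of samples drawn under the alternative that fall outside them.
    rejections = sum(
        not (lower <= kappa_binary(simulate_pairs(n, kappa1, rng)) <= upper)
        for _ in range(reps)
    )
    return rejections / reps


print(round(mc_power(), 2))
```

Simulating the null distribution avoids relying on a large-sample standard error for kappa, which is questionable with only 20 subjects.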

Data analysis

Statistical analysis was performed using the Statistical Package for the Social Sciences version 19 (SPSS Inc., Chicago, IL, USA). Inter-rater and test-retest reliability were computed using percent agreement and the kappa statistic. For each item of the structured assessment form, kappa was used to evaluate the level of agreement between the two assessors' ratings and between the same assessor's ratings at two points in time. Kappa is positive when the observed agreement exceeds the agreement expected by chance, and negative when the observed agreement is less than chance agreement. Kappa values were interpreted as follows: 0.0-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.0, almost perfect or perfect agreement. Percent agreement and kappa were computed for each assessment criterion, and the average agreement and average kappa value were also calculated. For construct validity, the scores of the sub-categories of each main criterion were summed for each rater, and the scores of junior and senior residents were compared using the independent samples t-test or the Mann-Whitney U-test, depending on whether the data were normally distributed. A P value ≤0.05 was taken as statistically significant.
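The per-item reliability statistics described above can be sketched in a few lines of Python. This is an illustration, not the SPSS procedure the authors used; the ratings are hypothetical, and kappa is computed with the standard Cohen formula, kappa = (p_o - p_e) / (1 - p_e), where p_e is the chance agreement implied by each rater's marginal frequencies.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Sequence


def percent_agreement(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Percentage of items on which the two sets of ratings are identical."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e from the marginals."""
    n = len(r1)
    p_o = percent_agreement(r1, r2) / 100.0
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)


def interpret(kappa: float) -> str:
    """Map kappa onto the rating bands listed in the Data analysis section."""
    if kappa < 0:
        return "less than chance agreement"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical scores (1-3) from two raters for one item across 10 trainees.
rater_a = [3, 3, 2, 3, 1, 3, 3, 2, 3, 3]
rater_b = [3, 3, 2, 3, 2, 3, 3, 2, 3, 3]
k = cohen_kappa(rater_a, rater_b)
print(percent_agreement(rater_a, rater_b), round(k, 2), interpret(k))  # 90.0 0.78 substantial

# Averaging item-level kappas, as the study did (values here are hypothetical).
print(round(mean([0.78, 1.0, 0.95]), 2))
```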

Results

Twenty anesthesia trainees, 10 junior and 10 senior residents, participated in the study. The average time taken for an assessment was 9 min. There was no failed attempt at tracheal intubation. Inter-rater agreement between scores at the two assessments is presented in [Table 2]. Percent agreement and kappa values between the two assessors were high for patient positioning, bag-mask ventilation, chin lift/jaw thrust, and leak around the facemask, and the items for absence of a CO2 trace and difficulty in bag-mask ventilation showed 100% agreement. Assessment of professionalism also did not show any significant difference between the raters. The average kappa value for inter-rater reliability was 0.91 for the first assessment session and 0.99 for the second, with an average agreement of 95% [Table 2].
Table 2: Inter-rater reliability of the tool for assessment of bag-mask ventilation and tracheal intubation (percentage agreement and kappa values)


Kappa values and percent agreement for test-retest reliability are presented in [Table 3]. The average agreement for test-retest reliability was 82% with a kappa value of 0.39. Determination of construct validity [Table 4] showed that senior trainees obtained higher scores compared to the junior trainees in all areas of assessment. This difference was statistically significant for the sums of scores for patient positioning, preoxygenation, and laryngoscopy technique.
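The junior-versus-senior comparison behind these construct validity results can be sketched with the Python standard library. This is a minimal illustration on hypothetical summed scores, not the study's data or its SPSS output. It computes the pooled-variance (Student's) t statistic, whose P value needs the t distribution from a statistics package, and the Mann-Whitney U statistic with a tie-corrected normal-approximation P value; choosing between them according to normality follows the Data analysis section, and the normality check itself is not shown.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t(x, y):
    """Independent-samples (Student's, pooled-variance) t statistic and df."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def midranks(values):
    """Ranks 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """U statistic for x and a two-sided P value (normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:  # every score tied: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical summed category scores (not the study's data).
junior = [10, 11, 9, 12, 10, 11, 9, 10, 12, 11]
senior = [12, 13, 12, 14, 13, 12, 14, 13, 12, 13]
print(pooled_t(junior, senior))
print(mann_whitney(junior, senior))
```

Midranks are used because summed checklist scores on a 1-3 scale produce many ties, which the tie-corrected variance accounts for.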
Table 3: Test-retest reliability of the tool for assessment of bag-mask ventilation and tracheal intubation (percentage agreement and kappa values)

Table 4: Construct validity of the assessment tool for bag-mask ventilation and tracheal intubation


Discussion

Competence in cognitive knowledge, judgment, and communication, including history taking and physical examination, is routinely assessed by written and oral examinations and by Objective Structured Clinical Examinations.[6] Procedural skills, however, have historically been assessed through subjective evaluations by senior colleagues and supervisors without well-defined criteria, or through procedure logs maintained by trainees.[13] Work has been done on defining the minimum number of procedures required to attain competence in anesthetic procedures.[3],[4],[8] However, the relationship between experience, as judged by the number of procedures performed, and competence is difficult to define and differs markedly among trainees.[4]

End-of-rotation global rating forms are often completed by supervising faculty members who have not directly observed trainees performing the procedure on patients.[6],[7],[14] This form of assessment cannot reliably assess procedural skills in their entirety and cannot justify decisions to allow trainees to perform procedures without direct supervision. Direct observation of the trainee performing a procedure on an actual patient is recommended for a more reliable assessment of competence in procedural skills, to enhance the quality of clinical training and ensure patient safety.[5],[15],[16] Procedure-specific assessment tools are therefore required for all complex procedural skills.[2],[5],[16] It is essential to ensure that the trainee masters the principal components of airway management before he/she is allowed to perform this procedure without direct supervision.[1] The tool employed in this study was designed specifically for novices in anesthesia; hence, the technique was broken down into its basic steps, forming a checklist with a simple 1-3 rating scale, so that the procedure could be assessed in its entirety as recommended for assessment of procedural skills.[17]

The inter-rater reliability of the tool was high. During their training, anesthesia trainees work at multiple sites with multiple consultants who are responsible for their assessment and for providing feedback. Good inter-rater reliability is therefore a basic requirement for this assessment tool, as it allows the tool to be used by different assessors in different locations depending on the trainee's initial rotations. Many other researchers studying the inter-rater reliability of procedure-specific assessment tools for medical trainees have obtained good to excellent results.[18],[19],[20]

The test-retest reliability of the assessment tool did not show agreement or kappa values as high as those for inter-rater reliability. The most likely reason appears to be a learning effect over the 3-4 week interval between the two assessment sessions. Anesthesia trainees perform bag-mask ventilation and tracheal intubation daily and thus get adequate practice to learn and master these skills in the early months of their training. Their performance may therefore have improved in the 3-4 weeks between the two assessments in this study.

Senior trainees obtained higher scores than junior trainees in all areas of bag-mask ventilation and intubation, with a significant difference for patient positioning, preoxygenation, and laryngoscopy technique [Table 4]. This indicates that this procedure-specific structured assessment tool can discriminate between junior and senior trainees, demonstrating good construct validity. Naik et al.[19] obtained similar results when testing the validity and reliability of an assessment tool for brachial plexus regional anesthesia performance, and recommended their tool for routine use during anesthesia training. The tool employed in the current study will mainly be used to assess junior anesthesia trainees in their first 6 months of training. Bag-mask ventilation and tracheal intubation are among the first procedural skills that anesthesia trainees learn, and they use them throughout their professional careers. The authors hope to use the instrument for formative assessment of novices and for judging competence to perform the procedure without direct supervision. The average score obtained by the senior trainees could be used to set the score that junior trainees must reach before they are trained and assessed in the more advanced airway management skills required for difficult intubation and rapid sequence induction.

Both percent agreement and the kappa statistic were used to analyze the reliability of the tool, to strengthen the analysis. Percent agreement does not account for agreement that occurs by chance, for example when raters guess on some scores because of uncertainty, and may therefore overestimate the true agreement between raters. It is therefore advised to calculate both percent agreement and kappa when analyzing inter-rater reliability.[21] A limitation of our study is that the assessments were done in real time, and the assessors were therefore not blinded to the identity of the trainees being assessed. This could have been a source of bias in the assessment scores. Similar studies of assessment tools have assessed videotaped performances of procedural skills with the trainees' identities masked, or have employed assessors who did not know the trainees and vice versa.[11],[12],[18],[19] We were not able to use this methodology because of a lack of funds. Efforts were made to reduce this bias by including residents who were not rotating with either of the two assessors at the time of assessment. Another limitation of this study is the relatively long interval between the two assessment sessions. This could have affected the test-retest reliability because of a learning effect, which is the main shortcoming of test-retest reliability studies.[22] We recommend that, for frequently performed procedures such as tracheal intubation, the second assessment be done after a shorter interval to ascertain the test-retest reliability of the assessment tool. The absence of criteria for passing or failing the assessment may be considered a limitation of the tool. We have addressed this by adding the statement “demonstrates ability to perform all aspects of the procedure independently,” with a yes/no option, at the end of the procedural steps. This section must be completed carefully by the assessors, as it records whether the candidate was able to perform the entire procedure successfully and thus whether he/she has “passed” or “not passed” the skill.
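The point about percent agreement can be made concrete. When most ratings fall into one category, the agreement expected by chance is high, so percent agreement can look good while kappa stays low. This may be one reason an average agreement of 82% accompanied a test-retest kappa of only 0.39, although the study does not examine this. The hypothetical ratings below are constructed so that both pairs of raters agree on 8 of 10 trainees.

```python
from collections import Counter


def agreement_and_kappa(r1, r2):
    """Return (proportion of agreement, Cohen's kappa) for two rating lists."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical ratings spread evenly over "2" and "3".
balanced_1 = [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
balanced_2 = [3, 3, 3, 3, 2, 2, 2, 2, 2, 3]
# Hypothetical ratings where nearly every trainee scores "3".
skewed_1 = [3, 3, 3, 3, 3, 3, 3, 3, 3, 2]
skewed_2 = [3, 3, 3, 3, 3, 3, 3, 3, 2, 3]

for first, second in [(balanced_1, balanced_2), (skewed_1, skewed_2)]:
    p_o, kappa = agreement_and_kappa(first, second)
    print(f"agreement {p_o:.0%}, kappa {kappa:.2f}")
# agreement 80%, kappa 0.60
# agreement 80%, kappa -0.11
```

Both pairs agree equally often, but in the skewed case chance alone would produce 82% agreement, so the observed 80% is slightly worse than chance.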

Simulation-based assessment is now being described for evaluating residents' ability to perform anesthetic skills.[23] However, financial constraints are a limiting factor in developing countries, where reliable and valid assessment tools such as ours would be feasible and practical for the routine assessment of trainees. As stated by Cuschieri et al., assessment of trainees is a form of quality assurance for the future.[24] Objective, procedure-specific tools for evaluating procedural skills, and their integration into training programs, are urgently needed. We believe that objective assessment by direct observation using well-defined criteria and rating scales has the potential to greatly improve the assessment of procedural skills. Future research should examine whether implementing procedure-specific assessment tools in anesthesia training programs improves procedural skills and the quality of patient care.

Conclusion

Our results show that the tool designed by us to assess bag-mask ventilation and tracheal intubation skills in anesthesia trainees demonstrates good construct validity, excellent inter-rater reliability, and fair test-retest reliability. We recommend its use for formative and summative assessment of junior anesthesia trainees.


We are very grateful to Dr. Ali Asghar for his assistance in obtaining informed consent from the participants.

Financial support and sponsorship
None.

Conflicts of interest

There are no conflicts of interest.

References

1. Whymark C, Moores A, MacLeod AD. A Scottish National Prospective Study of airway management skills in new-start SHOs. Br J Anaesth 2006;97:473-5.
2. Curriculum for a CCT in Anaesthetics, August 2010. The Royal College of Anaesthetists. Available from: [Last accessed on 2015 Mar 11].
3. Konrad C, Schüpfer G, Wietlisbach M, Gerber H. Learning manual skills in anesthesiology: Is there a recommended number of cases for anesthetic procedures? Anesth Analg 1998;86:635-9.
4. de Oliveira Filho GR. The construction of learning curves for basic skills in anesthetic procedures: An application for the cumulative sum method. Anesth Analg 2002;95:411-6.
5. Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: A systematic review. JAMA 2009;302:1316-26.
6. Kern DE, Thomas PA, Hughes MT, editors. Curriculum Development for Medical Education: A Six Step Approach. Baltimore, USA: The Johns Hopkins University Press; 2009.
7. Streiner DL, Norman GR, editors. Health Measurement Scales: A Practical Guide to Their Development and Use. Norfolk, UK: Oxford University Press; 2003.
8. Kathirgamanathan A, Woods L. Educational tools in the assessment of trainees in anesthesia. Contin Educ Anaesth Crit Care Pain 2011;11:138-42.
9. Stringer KR, Bajenov S, Yentis SM. Training in airway management. Anaesthesia 2002;57:967-83.
10. UCSF. Examples of Focused Assessment Tools: Emergency Medicine Residency Program Airway Management Competency Form. Appendix C. Available from: [Last accessed on 2015 Mar 11].
11. Moorthy K, Munz Y, Jiwanji M, Bann S, Chang A, Darzi A. Validity and reliability of a virtual reality upper gastrointestinal simulator and cross validation using structured assessment of individual performance with video playback. Surg Endosc 2004;18:328-33.
12. Watson MJ, Wong DM, Kluger R, Chuan A, Herrick MD, Ng I, et al. Psychometric evaluation of a direct observation of procedural skills assessment tool for ultrasound-guided regional anaesthesia. Anaesthesia 2014;69:604-12.
13. Bould MD, Crabtree NA, Naik VN. Assessment of procedural skills in anaesthesia. Br J Anaesth 2009;103:472-83.
14. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273-8.
15. Epstein RM. Assessment in medical education. N Engl J Med 2007;356:387-96.
16. Rosenblatt MA, Fishkind D. Proficiency in interscalene anesthesia-how many blocks are necessary? J Clin Anesth 2003;15:285-8.
17. CCT in Anaesthetics: Assessment Guidance (2010 Curriculum); The Royal College of Anaesthetists. Available from: [Last accessed on 2015 Mar 11].
18. Aggarwal R, Grantcharov T, Moorthy K, Milland T, Darzi A. Toward feasible, valid, and reliable video-based assessments of technical surgical skills in the operating room. Ann Surg 2008;247:372-9.
19. Naik VN, Perlas A, Chandra DB, Chung DY, Chan VW. An assessment tool for brachial plexus regional anesthesia performance: Establishing construct validity and reliability. Reg Anesth Pain Med 2007;32:41-5.
20. Davoudi M, Osann K, Colt HG. Validation of two instruments to assess technical bronchoscopic skill using virtual reality simulation. Respiration 2008;76:92-101.
21. McHugh ML. Interrater reliability: The kappa statistic. Biochem Med (Zagreb) 2012;22:276-82.
22. Rousson V, Gasser T, Seifert B. Assessing intrarater, interrater and test-retest reliability of continuous measurements. Stat Med 2002;21:3431-46.
23. Murray DJ, Boulet JR, Avidan M, Kras JF, Henrichs B, Woodhouse J, et al. Performance of residents and anesthesiologists in a simulation-based skill assessment. Anesthesiology 2007;107:705-13.
24. Cuschieri A, Francis N, Crosby J, Hanna GB. What do master surgeons think of surgical competence and revalidation? Am J Surg 2001;182:110-6.



This article has been cited by
1 Resuscitation Education Science: Educational Strategies to Improve Outcomes From Cardiac Arrest: A Scientific Statement From the American Heart Association
Adam Cheng,Vinay M. Nadkarni,Mary Beth Mancini,Elizabeth A. Hunt,Elizabeth H. Sinz,Raina M. Merchant,Aaron Donoghue,Jonathan P. Duff,Walter Eppich,Marc Auerbach,Blair L. Bigham,Audrey L. Blewer,Paul S. Chan,Farhan Bhanji
Circulation. 2018; 138(6)