This critique on simulation-based assessment was written by Alice Gray, a PGY 4 in Emergency Medicine at The University of Toronto and 2017 SHRED [Simulation, Health Sciences, Resuscitation for the Emergency Department] Fellow.
You like to run simulations. You have become adept at creating innovative and insightful simulations. You have honed your skills in leading a constructive debrief. So what’s next? You now hope to be able to measure the impact of your simulation. How do you design a study to measure the effectiveness of your simulation on medical trainee education?
There are numerous decisions to make when designing a sim-based assessment study. For example, who is included in the study? Do you use direct observation or videotape recording or both? Who will evaluate the trainees? How do you train your raters to achieve acceptable inter-rater reliability? What are you assessing – team-based performance or individual performance?
One key decision is the evaluation tool used for assessing participants. A tool ideally should:
- Have high inter-rater reliability
- Have high construct validity
- Be feasible to administer
- Be able to discriminate between different levels of trainees
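As a concrete illustration of the first criterion, agreement between two raters on a dichotomous item is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a minimal pure-Python version; the function name `cohens_kappa` and the ratings are hypothetical and are not taken from any study cited here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same participants."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of participants")
    n = len(rater_a)

    # Observed agreement: proportion of participants given the same score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_chance == 1:
        # Both raters used a single identical category for everyone.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 1 = item performed, 0 = item not performed,
# scored by two raters for the same ten learners.
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 learners, but kappa is noticeably lower than 0.8 because much of that agreement would be expected by chance. Ordinal tools such as a GRS are usually analysed with a weighted kappa or an intraclass correlation instead; this sketch covers only the simplest case.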
Two commonly used sim-based assessment tools are Global Rating Scales (GRS) and Checklists. Here, these tools will be compared to evaluate their role for the assessment of simulation in medical education.
Global Rating Scales vs Checklists
GRS are tools that allow raters to judge participants’ overall performance and/or provide an overall impression of performance on specific sub-tasks.1 Checklists are lists of specific actions or items that are to be performed by the learner. Checklists prompt raters to attest to directly observable actions.1
Many GRS ask raters to use a summary item to rate overall ability or to give a “global impression” of learners. This summary item can be a scale from fail to excellent, as in Figure 1.2 Another GRS may assess learners’ abilities to perform a task independently by having raters mark learners on a scale from “not competent” to “performs independently”. In studies, the overall GRS has been shown to be more sensitive than checklists at discriminating between learners’ levels of experience.3,4,5 Other research has shown that GRS demonstrate superior inter-item and inter-station reliability and validity compared with checklists.1,6,7,8 GRS can be used across multiple tasks and may be better able to measure expertise levels in learners.1
GRS also have pitfalls: they can be quite subjective, and they rely on “expert” opinion to grade learners effectively and reliably.
Figure 1: Assessment tool used by Hall et al. in their study evaluating a simulation-based assessment tool for emergency medicine residents using both a checklist and a global assessment rating.2
Checklists, on the other hand, are thought to be less subjective, though some studies dispute this because the language used in a checklist can itself be subjective.10 If designed well, however, checklists provide clear step-by-step outlines for raters to mark observable behaviours. A well-designed checklist should be easy to administer, so that any teacher can use it without relying on experts. By measuring defined and specific behaviours, checklists may also help to guide feedback to learners.
However, checklists have pitfalls. High checklist scores have not been shown to rule out “incompetence” and therefore may not accurately reflect skill level.9,10 Checklists may also cover multiple areas of competence, which may contribute to lower inter-item reliability.1 Other studies have found that, although checklists are theoretically easy to use, their inter-rater reliability was consistently low.9 In contrast, a systematic review of the literature found that checklists performed similarly to GRS in terms of inter-rater reliability.1
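To make the first pitfall concrete, the sketch below scores a checklist in which every item is dichotomous and equally weighted. The checklist items, the learner's results, and the 1 to 7 global rating are all hypothetical, invented purely for illustration and not drawn from any of the cited tools.

```python
def checklist_score(items_done):
    """Fraction of dichotomous (done / not done) checklist items completed."""
    return sum(items_done.values()) / len(items_done)


# Hypothetical checklist for a resuscitation scenario (illustrative only).
learner = {
    "calls for help": True,
    "applies monitors": True,
    "assesses airway": True,
    "gives oxygen": True,
    "obtains IV access": True,
    "checks glucose": True,
    "orders ECG": True,
    "communicates plan to team": True,
    "reassesses patient": True,
    "defibrillates shockable rhythm": False,  # the one missed step
}

score = checklist_score(learner)
missed = [item for item, done in learner.items() if not done]

# Hypothetical expert global rating on a 1-7 scale for the same performance.
global_rating = 2

print(f"Checklist score: {score:.0%}, missed: {missed}")
print(f"Global rating: {global_rating}/7")
```

Because each item counts equally, this invented learner scores 90% despite omitting a single step that an expert rater might weigh heavily. That is one way a high checklist score can fail to rule out poor overall performance.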
TABLE 1: Pros and Cons of Global Rating Scales and Checklists

| Tool | Pros | Cons |
|---|---|---|
| Global Rating Scales | Higher internal reliability; more sensitive in defining level of training; higher inter-station reliability and generalizability | Less precise; subjective rater judgement and decision making; may require experts or more rater training in order to rate learners |
| Checklists | Good for the measurement of defined steps or specific components of performance; possibly more objective; easy to administer; easy to identify defined actions for learner feedback | Possibly lower reliability; requires dichotomous ratings, possibly resulting in loss of information |
With the move towards competency-based education, simulation will play an important role in evaluating learners’ competencies. Simulation-based assessments allow for direct evaluation of individuals’ knowledge, technical skills, clinical reasoning, and teamwork. Assessment tools are an important component of medical education.
An optimal assessment tool for evaluating simulation would be reliable, valid, and comprehensive, and would allow for discrimination between learners’ abilities. Global Rating Scales and Checklists each have their own advantages and pitfalls, and each may be used for the assessment of specific outcome measures. Studies suggest that GRS have some important advantages over checklists, yet the evidence for checklists appears somewhat stronger than previously thought. Whichever tool is chosen, it is critical to design and test the tool to ensure that it appropriately assesses the desired outcome. If feasible, using both a Checklist and a Global Rating Scale would help to optimize the effectiveness of the sim-based education.
1 Ilgen JS et al. A systematic review of validity evidence for checklists versus global rating scales in simulation-based assessment. Med Educ. 2015 Feb;49(2):161-73
2 Hall AK et al. Development and evaluation of a simulation-based resuscitation scenario assessment tool for emergency medicine residents. CJEM. 2012 May;14(3):139-46
3 Hodges B et al. Analytic global OSCE ratings are sensitive to level of training. Med Educ. 2003;37:1012–6
4 Morgan PJ et al. A comparison of global ratings and checklist scores from an undergraduate assessment using an anesthesia simulator. Acad Med. 2001;76(10):1053-5
5 Tedesco MM et al. Simulation-based endovascular skills assessment: the future of credentialing? J Vasc Surg. 2008 May;47(5):1008-11
6 Hodges B et al. OSCE checklists do not capture increasing levels of expertise. Acad Med. 1999;74:1129–1134
7 Hodges B and McIlroy JH. Analytic global OSCE ratings are sensitive to level of training. Med Educ. 2003;37:1012–1016
8 Regehr G et al. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med. 1998;73:993-7
9 Walzak A et al. Diagnosing technical competence in six bedside procedures: comparing checklists and a global rating scale in the assessment of resident performance. Acad Med. 2015 Aug;90(8):1100-8
10 Ma IW et al. Comparing the use of global rating scale with checklists for the assessment of central venous catheterization skills using simulation. Adv Health Sci Educ Theory Pract. 2012;17:457–470