This critique on validity and how it relates to simulation teaching was written by Alia Dharamsi, a PGY 4 in Emergency Medicine at The University of Toronto and 2017 SHRED [Simulation, Health Sciences, Resuscitation for the Emergency Department] Fellow.
When designing simulation exercises that will ultimately lead to the assessment and evaluation of a learner’s competency for a given skill, the validity of the simulation as a teaching tool should be addressed on a variety of levels. This is especially relevant when creating simulation exercises for competencies outside of the medical expert realm, such as communication, team training, and problem solving.
As a budding resuscitationist and simulationist, understanding validity is vital to ensuring that the simulation exercises I create actually measure what they intend to measure; that is, that they are valid (Devitt et al.). As we look ahead to Competency Based Medical Education (CBME), it will become increasingly important to develop simulation exercises that are not only interesting and high-yield with respect to training residents in critical skills, but that also have high validity with respect to reproducibility and to the translation of skills into real-world resuscitation and patient care.
To better illustrate the various types of validity and how they can affect simulation design, I will present an example of an exercise I implemented when I was tasked with teaching a 5-year-old to tie her shoelaces. To do so, I taught her using a model very similar to this one I found on Pinterest:
We first learned the rhyme, then used this template to practice over and over again. The idea behind using the model was to provide the reference of the poem right next to the shoes, but also to enlarge the scale of the shoes and laces, since her tiny feet meant tiny laces on shoes that were difficult for her to manipulate. We could also do this exercise at the table, which allowed us to be comfortable as we learned. At the end of the exercise, I gave her a “test” and asked her to tie the cardboard shoes to see if she remembered what we had learned. While there was no rigorous evaluation scheme, the standard was that she should be able to tie the knot to completion (competency), leading to two loops at the end.
I applied my simulation learning to this experience to assess the validity of this exercise in improving her ability to tie her laces. The test involved her tying these laces by herself, without prompting.
Face validity: Does this exercise appear to test the skills we want it to?
Very similar to “at face value,” face validity is the degree to which a test or exercise looks like it will measure what it intends to measure. This can be assessed from an “outsider” perspective, such as asking her mom whether she felt this test could measure her child’s ability to tie a shoe. Whether the test works or not is not the concern of face validity; rather, it is whether it looks like it will work (Andale). Her mom thought this exercise would be useful for learning how to tie shoes, so face validity was achieved.
Content validity: Does the content of this test or exercise reflect the knowledge the learner needs to display?
Content validity is the extent to which the content in the simulation exercise is relevant to what you are trying to evaluate (Hall, Pickett and Dagnone). Content validity requires an understanding of the content required to either learn a skill or perform a task. In Emergency Medicine, content validity is easily understood when considering a simulation exercise designed to teach learners to treat a Vfib arrest: the content is established by the ACLS guidelines, and best practices have been clearly laid out. For more nebulous skill sets (communication, complex resuscitations, rare but critical skills like bougie-assisted cricothyroidotomies, problem solving, team training), the content is not as well defined, and may require surveys of experts, panels, and reviews by independent groups (Hall, Pickett and Dagnone). For my shoelace-tying learner, the content was defined as a single way to tie her shoelaces; it did not include the initial lacing of the shoes or how to tell which shoe is right or left, and, most importantly, the final test did not include these components. Had I tested her on lacing or on appropriately choosing right from left, I would not have had content or face validity. This speaks to choosing appropriate objectives for a simulation exercise: objectives are the foundation upon which learners develop a scaffolding for their learning. If instructors are going to use simulation to evaluate learners, the objectives need to clearly drive the content, and in turn the evaluation.
Construct Validity: Is the test structured in a way that actually measures what it claims to?
In short, construct validity is assessing if you are measuring what you intend to measure.
My hypothesis for our exercise was that any measurable improvement in her ability to tie her shoelaces would be attributable to the exercise, and that with this exercise she would improve her ability to complete the steps required to tie her shoelaces. At the beginning of the shoelace-tying exercise, she could pick up the laces, one in each hand, and then looked at me mostly blankly for the next steps. At the end of the exercise, and for the final “test,” she was able to hold the laces and complete the teepee so it’s “closed tight” without any prompting. The fact that she improved is evidence to support the construct; however, construct validity is an iterative process and requires different forms of evaluation to prove the construct. To verify construct validity, other tests with similar qualities can be used. For this shoelace-tying exercise, we might say that shoelace tying is a product of fine motor dexterity, and the theory would predict that as her ability to perform other dexterity-based exercises (tying a bow, threading beads onto a string) improves, so should her performance on the test. To validate our construct, we could then perform the exercise over time and see if her performance improves as her motor skills develop, or compare her performance on the test to that of an older child or adult, who would have better motor skills and should perform better on the test.
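The “compare against related measures” step above can be sketched numerically. Below is a minimal, purely illustrative Python sketch: all of the scores are invented, and a real construct-validation study would need proper measurement instruments and far more data. It simply shows how one might look for convergent evidence by correlating shoelace-test scores with scores on other fine-motor-dexterity tasks.

```python
# Hypothetical sketch of convergent evidence for construct validity.
# All numbers are invented for illustration; "dexterity" and "shoelace"
# are assumed 0-10 performance scores across five practice sessions.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

dexterity = [2, 3, 5, 6, 8]   # invented bead-threading / bow-tying composite
shoelace = [1, 2, 5, 4, 7]    # invented shoelace-test scores, same sessions

r = pearson_r(dexterity, shoelace)
print(f"correlation between dexterity and shoelace scores: r = {r:.2f}")
```

Under this (assumed) fine-motor-dexterity theory, a strongly positive correlation would be one modest piece of convergent evidence for the construct, while a near-zero correlation would suggest the test is measuring something other than fine motor dexterity.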
External validity: Can the results of this exercise or study be generalized to other populations or settings, and if so, which ones?
With this shoelace-tying exercise, should the results be tested and a causal relationship established between the exercise and the ability to tie shoes, the next step would be to see whether the results can be generalized to other learners in different environments. This would require further study and careful selection of participant groups and participants to reduce bias. This would also be an opportunity to vary the context of the exercise and the level of difficulty, and to introduce variables to see if the cardboard model could be extrapolated to actual shoe tying.
Internal validity: Is there another cause that could explain my observations?
With this exercise, her ability to tie laces improved over the course of the day. To assess internal validity, it is important to ask whether any improvement or change in behaviour could be attributed to another, external factor (Shuttleworth). For this exercise, there was only one instructor and one student in a consistent environment. If we had reproduced this exercise with several novice shoelace-tiers and several different instructors, it might have added confounders to the experiment, making it less clear whether improvements in shoelace tying were attributable to the exercise or to the instructors. Selection bias can also affect internal validity: for example, selecting participants who were older (and therefore had more motor dexterity to begin with) or who had previous shoelace-tying training would likely affect the outcome. For simulation exercises, internal validity can be confounded by multiple instructors, differences in the mannequin or simulation lab, and different instructor styles, which may lead to differences in learning. Overcoming these challenges to internal validity is partly achieved by robust design, but also by repeating the exercise to ensure that the outcomes are reproducible across a wider variety of participants than the sample cohort.
There are many types of validity, and robust research projects require an understanding of validity to guide the initial design of a study or exercise. Through this exercise in validity, I was able to take the somewhat abstract concepts of face validity and internal validity and ground them in practice through a relatively simple exercise. I have found that doing this has helped me form a foundation in validity theory, which I can now expand into evaluating the simulation exercises that I create.
1) Andale. “Face Validity: Definition and Examples.” Statistics How To. Statistics How to 2015. Web. October 20 2017.
2) Devitt, J. H., et al. “The Validity of Performance Assessments Using Simulation.” Anesthesiology 95.1 (2001): 36-42. Print.
3) Hall, A. K., W. Pickett, and J. D. Dagnone. “Development and Evaluation of a Simulation-Based Resuscitation Scenario Assessment Tool for Emergency Medicine Residents.” CJEM 14.3 (2012): 139-46. Print.
4) Shuttleworth, M. “Internal Validity.” Explorable.com, Jul 5, 2009. Web. October 26 2017. https://explorable.com/internal-validity
5) Shuttleworth, M. “External Validity.” Explorable.com, Aug 7, 2009. Web. October 26 2017. https://explorable.com/external-validity