1 INTRODUCTION
Lifeboat training is normally performed in controlled
conditions to minimize the risk to trainees and
equipment. Trainees are given limited or no
opportunity to practice skills in operational scenarios
that represent offshore emergencies. For this reason,
human performance in emergencies is difficult to
predict due to the limited data that is available.
Forecasts of coxswains’ skill transfer to real-life
operational scenarios have relied on expert opinion,
and there is limited information on how well skills
learned in lifeboat training transfer to adverse
weather conditions. The scarcity of human
performance data has prevented the modelling of
human performance in harsh environments.
With the advent of lifeboat simulator technology, it
is now possible for trainees to practice in weather
conditions typical of their location of operation and to
apply their skills in realistic emergency scenarios.
Simulation allows trainees to apply knowledge in
highly contextualized environments that are
representative of plausible emergencies. Research has
shown that practice in realistic scenarios helps
develop mental models that improve performance
(Klein, 2008). Simulation has been used to study
human performance in other domains, including flight
(McClernon et al. 2011), medical (Stefandis et al. 2007)
and marine (Sellberg, 2017) training. Lifeboat training
data can now be collected to assess the amount of
practice needed to acquire skills and to evaluate how
skills learned in practice transfer to new scenarios
(Billard, 2019).
Using Bayesian Networks to Model Competence of
Lifeboat Coxswains
R. Billard
Virtual Marine, St. John’s, Newfoundland, Canada
J. Smith, M. Masharraf & B. Veitch
Memorial University of Newfoundland, St. John’s, Newfoundland, Canada
ABSTRACT: The assessment of lifeboat coxswain performance in operational scenarios representing offshore
emergencies has been prohibitive due to risk. For this reason, human performance in plausible emergencies is
difficult to predict due to the limited data that is available. The advent of lifeboat simulation provides a means
to practice in weather conditions representative of an offshore emergency. In this paper, we present a
methodology to create probabilistic models to study this new problem space using Bayesian Networks (BNs) to
formulate a model of competence. We combine expert input and simulator data to create a BN model of the
competence of slow-speed maneuvering (SSM). We demonstrate how the model is improved using data
collected in an experiment designed to measure performance of coxswains in an emergency scenario. We
illustrate how this model can be used to predict performance and diagnose background information about the
student. The methodology demonstrates the use of simulation and probabilistic methods to increase domain
awareness where limited data is available. We discuss how the methodology can be applied to improve
predictions and adapt training using machine learning.
The International Journal on Marine Navigation and Safety of Sea Transportation (http://www.transnav.eu)
Volume 14, Number 3, September 2020
DOI: 10.12716/1001.14.03.09
Data collected from a lifeboat simulator allow us to
assess performance on tasks that were prohibitive to
perform, even in calm-water training. These new data can be
used to model learning and skill acquisition using
probabilistic methods. We can study the interaction
between tasks using Bayesian Networks (BN) to
derive models of student competence (Millán and
Pérez De-la-Cruz, 2002). These models can be used to
study the relationship between training factors and to
examine how practice on related tasks impacts
performance. Due to the scarcity of human performance
data, initial models of competence can be formed with
expert input (Groth et al., 2014). Performance data
collected from simulator studies can provide evidence
to inform models of trainee competence and validate
their predictive accuracy. Bayesian methods have
been used to model performance on lifeboat launch
and manoeuvring tasks in initial training in calm
weather conditions (Billard et al., 2020). Similar
approaches can be applied to model performance in
more adverse weather conditions.
In this paper, we present a methodology to form
probabilistic models of human performance that can
be used to study this new problem space. We use a
BN to define a model of the competence of slow-
speed maneuvering (SSM) based on tasks performed
in adverse weather conditions during an offshore
emergency. The model is derived from a combination
of expert prediction and data collected from an
experimental study.
The methodology is used to investigate the
following research goals:
1 How to formulate a BN model of competency using
knowledge of task type and available performance
measures; and,
2 How to combine expert knowledge and data
collected from simulator exercises to improve the
model’s predictive accuracy.
We evaluate the model using available data sets
from a simulator study on lifeboat coxswain
performance. We demonstrate how this model can be
used to 1) predict performance as trainees practice
skills in simulator scenarios, and 2) diagnose
background information about the student.
The paper presents an approach that is relevant to
training providers and researchers. We discuss how
to apply the methodology and resultant models to
study performance, improve expert assumptions, and
extend to training applications where new data sets
are being created. The models can be used to improve
training programs, adapt training exercises to
individual needs, and investigate human performance
in new scenarios.
2 BACKGROUND
2.1 Competence in Slow Speed Maneuvering
We demonstrate the methodology of creating a BN
model of competence using evidence captured in an
experiment designed to study lifeboat training.
We must first frame our definition of competence
considering our research goals and the objective
measures that can be made. Competence is a broad
concept with many definitions. For our purposes, we consider how
competence is normally measured in marine training
through completion of demonstrable tasks specific to
learning objectives (IMO 2014, STCW 2010). We
consider competence the “existence of learnable
cognitive abilities and skills which are needed for
problem solving” as identified in research on skill
acquisition (Weinert, 2001). We assume that
completing tasks that require similar cognitive or
physical skills demonstrates competence.
We construct a model of competence for the skill
of Slow Speed Maneuvering (SSM), as demonstrated
by the ability to complete tasks related to stopping a
lifeboat next to an object in the water. It is expected
that trained lifeboat operators have this required
competence to perform in an emergency. The
completion of tasks in an emergency scenario can
include stopping next to a number of objects
including a life raft, a person in the water (PIW), a
small vessel for transfer of personnel, or a large vessel
for securing the lifeboat for recovery. All tasks
considered under the competence of SSM require a
similar application of skills and similar performance
measures.
We assume there is a relationship between the
SSM tasks based on the type of skill needed to
perform the task. The maneuvering and stopping of a
lifeboat is primarily a physical task and requires
application of psychomotor skills to control the
lifeboat, including manipulation of lifeboat throttle,
steering, and making visual observations. There are
also cognitive skills, including deciding angles of
approach and judging distance from a target object.
Practice on SSM tasks within a practice scenario is
expected to improve performance on related SSM
tasks based on the similarity of the tasks and type of
skill that is applied.
2.2 Simulator exercise and experiment
We use data collected from a simulator scenario to
formulate our model and provide evidence that can
be used to inform and evaluate our methodology.
Data were taken from an experiment that used a
lifeboat simulator to study skill acquisition and
transfer in lifeboat coxswains. The experiment was
designed to evaluate how skills acquired in different
training programs transferred to a plausible
emergency event that required the launch and
maneuvering of a lifeboat in weather conditions
typical of offshore operations. Participants completed
training using different approaches over a year-long
period and then participated in a new simulator
exercise for assessment purposes. The assessment
scenario included a combination of launch tasks and
on-water tasks. Details of the scenario are provided in
Figure 1. Additional details on the experimental test
plan and simulator used in the study can be found in
Billard et al. (2019).
In real scenarios or in simulator exercises, SSM
tasks form a part of the whole training exercise. Other
tasks may need to be completed, including inspecting
the lifeboat, launching the lifeboat, and navigating the
lifeboat. These tasks require application of different
skills and have different measures, as described in
previous research (Billard et al. 2018, Billard et al.
2020). As such, these tasks are not part of the
SSM competence and are excluded from the BN
model, as practice on these tasks is not expected
to affect SSM competence.
Figure 1. Simulator assessment scenario with SSM tasks
The data collected from the assessment scenario
provided evidence to evaluate SSM competence
modelled in a BN. The scenario contained 4 slow
speed maneuvering tasks including, in order,
stopping next to a Life Raft for inspection (LR),
picking up two persons in the water (PIW1, PIW2),
and stopping next to a Fast Rescue Craft (FRC) for
transfer of personnel. These tasks provide evidence
for the assessment of the SSM competence.
All participants completed the scenario at least
two times and data was collected for the maneuvering
tasks for each attempt. Tasks were completed in the
same order with each attempt. A total of 39
participants completed the study.
2.2.1 Measuring Performance
The rubric used to define completion of the SSM
task was derived from recognized training standards
and is based on expected performance identified by
Subject Matter Experts (SMEs). Each task requires
approaching an object from a preferred direction,
stopping close to the target, and maintaining a
stopping speed. The specific parameters used to
measure success differed slightly for each task (e.g.
light contact with a vessel is acceptable for coming
alongside a vessel, but not allowed for a PIW). Table 1
provides an outline of task objectives and the
corresponding measures used in the simulator
exercise. Completion of tasks was based on several
simultaneous measures captured by the simulator,
each of which had to be performed correctly to be
considered a successful completion. Additional
details on the scoring measures and rubric have been
presented previously (Billard et al. 2018).
2.2.2 Bayesian Network Modelling
Bayesian Networks (BN) use a graphical structure
to represent the relationship between several random
variables as represented in a directed acyclic graph
(DAG). A sample BN DAG is provided in Figure 2.
Nodes (a,b,c,d,e) represent the variables and arcs
(arrows) represent the probabilistic relationship
between the variables. Bayesian inference algorithms
compute the probabilities of latent variables, which
cannot be observed directly, from the states of the
observed variables.
Figure 2. Sample Bayesian network DAG
Table 1. Slow speed maneuvering competence tasks
__________________________________________________________________________________________________
Task Identifier | Task Description | Task Objective | Measures
__________________________________________________________________________________________________
LR | Stop at a Life Raft | Approach a static object accounting for wind and wave direction. Use a speed to allow stopping. Stop close to Life Raft (2-3 boat lengths) and maintain position. | direction of approach; speed at stop; time stopped
PIW | Recover a Person in the Water (PIW) | Approach a drifting PIW accounting for wind and waves to minimize chance of contact. Use a speed to allow stopping. Stop close enough to PIW to allow pickup and maintain position in waves. | contact speed; heading at stop; number of attempts
FRC | Come Alongside a Fast Response Craft (FRC) | Approach an FRC accounting for wind and wave direction. Use a speed to allow stopping. Stop close to vessel (less than 0.5 meters) and at an angle to allow personnel transfer and maintain position. |
__________________________________________________________________________________________________
Building a BN includes the following steps:
1 Defining the variables that are being studied, both
latent and observable, creating the nodes of the
BN.
2 Defining the relationships between variables using
arcs. The arcs represent a causal influence between
the variables. Variables in the network that are not
graphically connected are conditionally
independent of each other (e.g. a and b are
conditionally independent).
3 For each of the variables, defining the probability
of its states conditioned on its parent variables through
Conditional Probability Tables (CPTs). The
probabilities can be learned from real data or
defined by experts.
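As a minimal sketch of the three steps above, assuming binary variables and a hypothetical three-node network (a latent node `C` with two observable children `E1` and `E2`) whose probabilities are placeholders rather than values from this study, the structure and CPTs can be held in plain Python dictionaries. The sketch checks that the graph is acyclic and that each CPT entry is a valid probability, then evaluates a joint probability with the chain rule, $P(X_1, \ldots, X_n) = \prod_k P(X_k \mid \mathrm{parents}(X_k))$.

```python
from itertools import product

# Steps 1-2: hypothetical binary nodes and their parents (arcs point parent -> child).
PARENTS = {
    "C": [],        # latent competence
    "E1": ["C"],    # observable task outcomes
    "E2": ["C"],
}

# Step 3: CPTs as P(node = True | parent values); keys are tuples of parent values.
# The numbers are illustrative placeholders only.
CPT = {
    "C": {(): 0.6},
    "E1": {(True,): 0.7, (False,): 0.3},
    "E2": {(True,): 0.6, (False,): 0.2},
}

def topological_order(parents):
    """Return nodes ordered parents-first; raise ValueError if the graph has a cycle."""
    order, visiting, done = [], set(), set()

    def visit(node):
        if node in done:
            return
        if node in visiting:
            raise ValueError(f"cycle detected at {node}")
        visiting.add(node)
        for parent in parents[node]:
            visit(parent)
        visiting.discard(node)
        done.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order

def check_cpts(parents, cpt):
    """Every parent configuration must have a probability in [0, 1]."""
    for node, pars in parents.items():
        for config in product([True, False], repeat=len(pars)):
            p = cpt[node][config]
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"invalid probability for {node}{config}")

def joint(assignment, parents=PARENTS, cpt=CPT):
    """Chain rule: product over nodes of P(node | parents)."""
    prob = 1.0
    for node in topological_order(parents):
        config = tuple(assignment[p] for p in parents[node])
        p_true = cpt[node][config]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

check_cpts(PARENTS, CPT)
print(joint({"C": True, "E1": True, "E2": False}))  # approximately 0.168 (= 0.6 * 0.7 * 0.4)
```

Summing `joint` over all eight assignments returns 1, which is a quick sanity check that the CPTs define a proper distribution.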
Detailed descriptions of BNs and how they are
created are provided in other literature (de Clerk et al.,
2013, Millán et al., 2010).
Creating a BN to use observable evidence to study
an inherent competence has applications in training
frameworks including Intelligent Tutoring Systems
(ITS) (Millán and Pérez De-la-Cruz, 2002, Käser et al.
2017) and Evidence Centered Design (ECD) (Mislevy
et al., 2004). In these frameworks, the BN forms a
model of the competency that is being investigated
(the student model) and identifies the relationships to
the performance measures (the evidence) in the
practice scenario (the activity). The relationships form
a construct of competence, a latent variable, that can
be measured through the collection of performance
data, an observable variable.
In our case, we use the observable completion of
SSM tasks to quantify the latent variable of SSM
competence using evidence collected through a
simulation study.
3 METHODOLOGY
We use a BN methodology to model competence and
predict the performance of lifeboat operators as they
apply skills learned in training to a new scenario. We
create a BN model using observable measures from a
simulation scenario designed to evaluate coxswain
performance in a plausible emergency. We use a
combination of expert prediction and simulator data
to create and revise our model. The methodology
creates a student model of SSM competence that can
be used for the prediction of performance on tasks
and the diagnostic study of causal relationships
between model variables.
The steps in the methodology include the
following, as outlined in Figure 3:
1 Defining a generic BN student model of
competence - based on completion of tasks that are
considered similar in the type of skill applied
2 Characterizing the BN model as a SSM competence
student model - based on the evidence gathered in
a simulator practice exercise
3 Creating the initial CPTs of the model nodes based
on expert estimates
4 Refining the CPTs based on experimental data -
using the simulator experimental data to tune the
model parameters
5 Validating the model accuracy for predictive and
diagnostic use cases using simulator data
Figure 3. Methodology of creating and validating a SSM
competence Bayesian network
We perform two validation cases to show how the
BN model can be applied and how the model changes
with new data or variables. We first demonstrate how
the predictive accuracy of the model changes as the
methodology is applied. We evaluate the predictive
accuracy of the model first formed with expert
estimates and then re-evaluate the predictive accuracy
after data have been used to refine the CPTs. We then
present an example of how new variables can be
added to the model and show how the model can be
applied to diagnose the relationship between the new
variable and observable evidence. The validation of
models is discussed in Section 4.
3.1 Step 1 - Defining a generic BN student model of
competence
We first describe the types of variables and
relationship assumptions for the BN student model.
We assume a latent variable of competence (C) and
relate it to task evidence nodes ($E_i$), which can be
measured or observed in a scenario. The tasks are
related by the type of skills needed to complete the
tasks successfully.
To create the DAG, we assume a structure where
observable evidence of completing tasks changes the
probability of the competence, as described in
previous research (Millán and Pérez De-la-Cruz,
2002). The generic model is presented in Figure 4. In
the model structure, we assume a causal relationship
where the latent variable (C) causes the evidence
$E_1, E_2, E_3, \ldots, E_n$. In this relationship, evidence about
mastering a task changes the probability of the latent
parent. Consequently, evidence about mastering C
changes the probability of its children ($E_i$), and
evidence about mastering a task affects the
probability of mastering the rest of the tasks on the
same level. This model assumes conditional
independence of the $E_i$ given C (for each $i = 1, \ldots, n$). In
this DAG, the CPT parameters that need to be
identified are the prior probability of the competence,
$P(C)$, and the conditional probabilities of the evidence
nodes, $P(E_i \mid C)$.
Figure 4. Competence model BN DAG
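The propagation of evidence in this structure can be illustrated by exact enumeration over the two states of C. The sketch below assumes three binary tasks with hypothetical values for $P(C)$ and $P(E_i = Yes \mid C)$; the names `posterior_c` and `predict_task` are illustrative, and this is a hand-written example of Bayesian updating, not the inference engine used in the study.

```python
# Hypothetical parameters for the generic model of Figure 4:
# P(C = Yes) and, for each task, (P(E = Yes | C = Yes), P(E = Yes | C = No)).
PRIOR_C = 0.6
TASKS = {"E1": (0.7, 0.3), "E2": (0.6, 0.2), "E3": (0.7, 0.3)}

def likelihood(evidence, c):
    """P(observed tasks | C = c), using conditional independence of the E_i given C."""
    prob = 1.0
    for task, completed in evidence.items():
        p_yes = TASKS[task][0] if c else TASKS[task][1]
        prob *= p_yes if completed else 1.0 - p_yes
    return prob

def posterior_c(evidence):
    """P(C = Yes | evidence) by Bayes' rule over the two states of C."""
    yes = PRIOR_C * likelihood(evidence, True)
    no = (1.0 - PRIOR_C) * likelihood(evidence, False)
    return yes / (yes + no)

def predict_task(task, evidence):
    """P(task = Yes | evidence): child probabilities weighted by the posterior of C."""
    pc = posterior_c(evidence)
    p_if_c, p_if_not_c = TASKS[task]
    return pc * p_if_c + (1.0 - pc) * p_if_not_c

print(posterior_c({}))                                 # prior of C, 0.6
print(posterior_c({"E1": True}))                       # rises after one success
print(predict_task("E3", {}))                          # before any evidence
print(predict_task("E3", {"E1": True, "E2": True}))    # after two successes
```

A success on E1 and E2 raises the predicted probability of completing E3 only through C, which is the "same level" effect described above.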
3.2 Step 2 - Characterizing the BN Competence Model as
a SSM competence student model
We design the BN model to match the activity, in this
case the slow speed maneuvering exercises performed
in the simulator study.
Figure 5 shows the DAG for the experimental
study consisting of two scenarios, each having 4
evidence nodes. In the simulator study, the trainee
practiced the same scenario twice, creating two sets of
evidential nodes, as the trainee completed the same
tasks with each attempt. As an input of evidence in
the BN, the task was either considered to be
completed (Yes) or not completed (No) based on the
performance requirements set by SMEs to measure
successful completion of task.
Figure 5. Bayesian network DAG Simulator assessment
scenario
The structure of the model assumes a learning
effect with tasks practiced in a training session
consisting of multiple simulation exercises. We use a
dynamic model in which the trainee’s competence
can be measured with each simulator exercise
attempt. We define a relationship between the
measure of competence in the first attempt ($SSM_1$)
and the measure of competence in the second attempt
($SSM_2$). The relationship assumes the measure of
competence in the first attempt affects the
probability of competence in the second attempt
through a defined CPT, $P(SSM_2 \mid SSM_1)$. Based on
the similarity of the task types, it is expected that
practice on any of the task types can improve the
performance on other tasks, including future attempts
at the same task using the same scenario.
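The link between attempts can be illustrated numerically. Assuming the expert estimates listed in Table 2 for the first attempt and for $P(SSM_2 \mid SSM_1)$, the hypothetical helpers below first update $SSM_1$ from attempt-1 task outcomes and then marginalize over $SSM_1$ to obtain $P(SSM_2)$. This is a sketch of one forward step in a two-slice model, not a reproduction of the study's software.

```python
# Expert estimates from Table 2 (attempt 1 and the SSM_1 -> SSM_2 link).
P_SSM1 = 0.6
P_SSM2_GIVEN_SSM1 = {True: 0.7, False: 0.3}   # P(SSM_2 = Yes | SSM_1)
ATTEMPT1 = {                                   # task -> (1 - s, g)
    "LR": (0.7, 0.3), "PIW1": (0.6, 0.2), "PIW2": (0.6, 0.2), "FRC": (0.7, 0.3),
}

def posterior_ssm1(evidence):
    """P(SSM_1 = Yes | attempt-1 task outcomes)."""
    yes, no = P_SSM1, 1.0 - P_SSM1
    for task, completed in evidence.items():
        hit, guess = ATTEMPT1[task]
        yes *= hit if completed else 1.0 - hit
        no *= guess if completed else 1.0 - guess
    return yes / (yes + no)

def predict_ssm2(evidence):
    """Sum over SSM_1 of P(SSM_2 = Yes | SSM_1) * P(SSM_1 | evidence)."""
    p1 = posterior_ssm1(evidence)
    return p1 * P_SSM2_GIVEN_SSM1[True] + (1.0 - p1) * P_SSM2_GIVEN_SSM1[False]

print(predict_ssm2({}))                      # before any task is observed
all_done = {task: True for task in ATTEMPT1}
print(predict_ssm2(all_done))                # after completing every attempt-1 task
```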
3.3 Step 3 - Creating initial CPTs based on expert
estimates
The structure of the BN requires the definition of
CPTs including the prior probabilities of the SSM
competence and the conditional probability of
completing the evidence nodes (tasks) given the
competence.
For each of the tasks, we make predictions on the
relationship between having the SSM competence and
the ability to complete tasks. As defined in modelling
of human performance (Millán et al., 2002), we use
estimates of slip and guess to define the conditional
probabilities. In our context, a slip is the probability of
not being able to complete the task successfully
despite having the competence. The probability of
completing the task successfully when having the
competence, $P(Task_i = Y \mid SSM_i = Y)$, is therefore
$1 - s$, where $s$ is the slip factor. A guess ($g$) is the
probability of completing the task successfully without
having the competence. The CPTs require definition of the
probability of completing the task whilst having the
competence ($1 - s$) and the probability of completing
the task while not having the competence ($g$).
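For a single task with prior $p = P(SSM_i = Y)$, the slip and guess parameters determine both the expected completion rate and how one observed outcome updates the competence estimate. The following is a standard application of the law of total probability and Bayes' rule, included as a worked illustration rather than a formula stated by the authors:

```latex
P(Task_i = Y) = (1 - s)\,p + g\,(1 - p)

P(SSM_i = Y \mid Task_i = Y) = \frac{(1 - s)\,p}{(1 - s)\,p + g\,(1 - p)}

P(SSM_i = Y \mid Task_i = N) = \frac{s\,p}{s\,p + (1 - g)\,(1 - p)}
```

For example, with $s = 0.3$, $g = 0.3$ and $p = 0.6$, the task is expected to be completed with probability $0.7 \times 0.6 + 0.3 \times 0.4 = 0.54$, and a successful completion raises the competence estimate to $0.42 / 0.54 \approx 0.78$.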
We estimate the CPT parameters for each of the
evidence nodes and the conditional probabilities for
each of the competence variables. The probabilities of
slip and guess were estimated by SMEs and took into
account the following:
1 The participants in the study had received initial
training and refreshed skills over a one-year
period. It was expected that some participants had
acquired enough skill to achieve competence.
2 The simulator scenario in the study had not been
practiced before and had challenging weather
conditions (moderate sea states). These factors
impact the probability of completing tasks that had
been practiced in previous training events in less
adverse weather.
3 The task of stopping next to a PIW is more difficult
to complete than stopping next to a life raft or
stopping next to an FRC (Billard et al. 2020). We
assume the probability of a slip is higher and the
probability of a guess is lower for the PIW task.
4 The performance of tasks in the simulator, either
successfully or unsuccessfully, is considered
practice. Competence is expected to increase as the
scenario is repeated. The probability of slip on
tasks is expected to reduce and the probability of a
guess is expected to increase.
In considering the type of task and the
environmental conditions, SMEs estimated that there
is a reasonable chance of slip given the difficulty of
the task and the expectation that people could make
errors despite having the competence. The
irregularity of wind, wave, and propulsion forces
creates some variability in performance.
Environmental forces could have a sudden negative
impact (e.g. causing the vessel to overshoot position)
resulting in a slip. The environmental forces can also
increase the chance of success of an inexperienced
driver (e.g. helping slow and stop a vessel that is
approaching too fast), creating a successful guess.
Table 2 provides a breakdown of the probabilities
used in the BN. These are considered an initial
estimate of the probabilities based on an expert
prediction. The assumed initial probability of having
the competence of SSM is estimated to be 60%, and
increases in probability in the second scenario. For the
evidence nodes, the probability of a successful
completion of task is assumed to be lower for tasks
that are more difficult. The probability of completing
the LR and FRC tasks when having the competence
was assumed to be 70%. The corresponding
probability for the PIW tasks was estimated as 60%
due to the increase in slip factor, as the task is more
challenging. Similarly, the probability of a guess for
the LR and FRC tasks was assumed to be 30%, and the
probability of a guess for the PIW tasks was estimated
as 20%. To account for the effect of practice, the SSM
competence is expected to increase for the second
scenario. The assumed probability of a successful
completion for each task was increased by 10
percentage points, and the guess rate for each task was
also increased by 10 percentage points.
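The rule used for the second attempt (adding 10 percentage points to both $1 - s$ and $g$) can be checked against Table 2 with a short sketch. It rebuilds the attempt-2 task parameters from the attempt-1 values and computes the completion rate each set of estimates implies before any evidence is entered, $P(Task = Y) = (1 - s)\,P(SSM) + g\,(1 - P(SSM))$. Variable and function names are hypothetical.

```python
# Expert estimates for attempt 1 (Table 2): task -> (1 - s, g).
ATTEMPT1 = {"LR": (0.70, 0.30), "PIW1": (0.60, 0.20), "PIW2": (0.60, 0.20), "FRC": (0.70, 0.30)}
PRACTICE_STEP = 0.10  # both 1 - s and g raised by 10 percentage points for attempt 2

ATTEMPT2 = {task: (round(hit + PRACTICE_STEP, 2), round(guess + PRACTICE_STEP, 2))
            for task, (hit, guess) in ATTEMPT1.items()}

P_SSM1 = 0.60
P_SSM2 = P_SSM1 * 0.70 + (1 - P_SSM1) * 0.30  # marginalize P(SSM_2 | SSM_1) from Table 2

def expected_completion(cpts, p_competent):
    """P(task = Yes) = (1 - s) * P(SSM) + g * (1 - P(SSM)) for each task."""
    return {task: round(hit * p_competent + guess * (1 - p_competent), 3)
            for task, (hit, guess) in cpts.items()}

print(ATTEMPT2)
print(expected_completion(ATTEMPT1, P_SSM1))
print(expected_completion(ATTEMPT2, P_SSM2))
```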
These estimates are an initial guess of expected
outcomes provided by subject matter experts. The
estimates are based on expert prediction as they could
not be derived from data. The next step in the
methodology uses experimental data to refine the
CPTs used in the BN.
3.4 Step 4 - Refining CPTs based on experimental data
The BN model was created in modelling software,
GeNIe, developed by the Decision Systems Laboratory of
the University of Pittsburgh. The DAG was based on
the relationship diagram provided in Section 3.2, and
the probabilities outlined in Section 3.3 were used to
create the CPTs for each of the nodes.
Data were collected in a simulator exercise, with
evidence collected for each of the 39 participants who
completed the two scenarios. The data set was split
randomly into two groups: a learning data set and a
validation data set. One set of the data (19 records)
was used to adjust the parameters of the BN model (the
learning data) and the second data set (20 records)
was used to evaluate the predictive accuracy of the
model (the validation data).
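A random learning/validation split of this kind can be sketched as follows. The participant IDs and the seed are hypothetical; the study does not report how its split was drawn.

```python
import random

def split_records(record_ids, n_learning, seed=None):
    """Randomly partition record IDs into a learning set and a validation set."""
    rng = random.Random(seed)          # local generator so the split is reproducible
    shuffled = list(record_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_learning], shuffled[n_learning:]

participants = list(range(1, 40))      # hypothetical IDs for the 39 participants
learning, validation = split_records(participants, n_learning=19, seed=2020)
print(len(learning), len(validation))  # 19 20
```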
Conducting parameter learning in the Bayesian
Network is often termed training the BN. In this
exercise, the parameters of the BN CPTs are adjusted
in an effort to match the BN model predictions to the
outcomes of the learning data set. This exercise is
performed in the GeNIe modelling software, which
uses an EM algorithm to learn parameters from data
(Dempster, 1977). In our use case, we start training
the BN with the probabilities set by the experts. As we
have a small data set, we assume a low level of
confidence in the parameters (20%) to allow the
parameters to be flexible to change.
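Parameter learning with EM can be sketched for a single attempt of the competence model. As an assumption, the sketch treats the expert CPTs as pseudo-counts with a chosen weight (`prior_weight`), a common way to let an expert prior yield to data; it does not reproduce GeNIe's confidence setting, whose exact semantics are not described here. The records are invented for illustration and are not the study's data.

```python
# Hypothetical EM sketch for one attempt of the competence model: latent C
# (competent or not) with conditionally independent binary task outcomes.
TASKS = ["LR", "PIW1", "PIW2", "FRC"]

def em(records, prior_c, cpt, prior_weight, iterations=50):
    """Re-estimate P(C = Yes) and each task's (1 - s, g) from complete records.

    records: list of dicts mapping every task in TASKS to True (completed) or False.
    prior_c, cpt: expert starting values; cpt maps task -> (1 - s, g).
    prior_weight: pseudo-count weight (> 0) given to the expert values (assumption).
    """
    expert_c, expert_cpt = prior_c, dict(cpt)
    for _ in range(iterations):
        # E-step: responsibility P(C = Yes | record) under the current parameters.
        resp = []
        for rec in records:
            yes, no = prior_c, 1.0 - prior_c
            for task in TASKS:
                hit, guess = cpt[task]
                yes *= hit if rec[task] else 1.0 - hit
                no *= guess if rec[task] else 1.0 - guess
            resp.append(yes / (yes + no))
        # M-step: expected counts blended with expert pseudo-counts.
        n_yes = sum(resp)
        n_no = len(records) - n_yes
        prior_c = (n_yes + prior_weight * expert_c) / (len(records) + prior_weight)
        new_cpt = {}
        for task in TASKS:
            e_hit, e_guess = expert_cpt[task]
            done_if_c = sum(r for r, rec in zip(resp, records) if rec[task])
            done_if_not_c = sum(1.0 - r for r, rec in zip(resp, records) if rec[task])
            new_cpt[task] = (
                (done_if_c + prior_weight * e_hit) / (n_yes + prior_weight),
                (done_if_not_c + prior_weight * e_guess) / (n_no + prior_weight),
            )
        cpt = new_cpt
    return prior_c, cpt

expert_cpt = {"LR": (0.7, 0.3), "PIW1": (0.6, 0.2), "PIW2": (0.6, 0.2), "FRC": (0.7, 0.3)}
records = [  # invented outcomes, for illustration only
    {"LR": True, "PIW1": True, "PIW2": False, "FRC": True},
    {"LR": True, "PIW1": False, "PIW2": False, "FRC": False},
    {"LR": False, "PIW1": False, "PIW2": False, "FRC": True},
    {"LR": True, "PIW1": True, "PIW2": True, "FRC": True},
]
p_c, learned = em(records, 0.6, expert_cpt, prior_weight=4.0)
print(round(p_c, 3))
for task, (hit, guess) in learned.items():
    print(task, round(hit, 3), round(guess, 3))
```

A smaller `prior_weight` lets a small data set move the parameters further from the expert values; a very large one leaves them essentially unchanged.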
We are now able to make comparisons between
the original BN model, based on expert predictions,
and the updated model, trained with experimental
data.
4 VALIDATION CASES
4.1 Validation Case 1 - Evaluating model predictive
capability using task evidence
The validation data set is used to measure the predictive
accuracy of the BNs. The initial models developed by
expert prediction and the trained models are applied
to a new data set (the validation data) to compare
each model’s predicted outcomes with evidence
provided in the data set.
Two validation steps are performed to show how
the methodology resulted in an improved BN model:
1 Testing the predictive accuracy of the BN with
initial expert predictions of CPTs: this step
evaluates the suitability of the probabilities
estimated by the SMEs.
2 Testing the predictive accuracy of the BN after
using the simulation data: this validation shows
the impact of using additional simulator data to
revise the model parameters.
The validation demonstrates the use of BN for
prediction, as the model attempts to identify the most
likely state of the evidence nodes. For each of
the validation exercises we consider the model’s
ability to predict the outcome of the final two tasks in
the simulation exercise ($PIW2_2$ and $FRC_2$). These
two evidence nodes are selected as they are the last
two tasks performed in the simulator exercise.
Performance on these tasks is more likely to reflect
competence gained through practice than a random
slip or guess. We compare the
predicted outcome of the evidence nodes from the BN
model to the actual outcome from the data set.
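The prediction step can be sketched by enumerating the two latent nodes of the Figure 5 model. Assuming the expert CPTs of Table 2 and a hypothetical validation record with all tasks observed except $PIW2_2$ and $FRC_2$, the helper `predict` returns $P(\text{task} = Y \mid \text{evidence})$, and the predicted outcome is taken as Yes when that probability is at least 0.5. The 0.5 threshold and the record are assumptions for illustration.

```python
from itertools import product

# Expert estimates from Table 2; each task depends on the SSM node of its own attempt.
P_SSM1 = 0.6
P_SSM2_GIVEN_SSM1 = {True: 0.7, False: 0.3}   # P(SSM_2 = Yes | SSM_1)
CPT = {                                        # (attempt, task) -> (1 - s, g)
    (1, "LR"): (0.7, 0.3), (1, "PIW1"): (0.6, 0.2), (1, "PIW2"): (0.6, 0.2), (1, "FRC"): (0.7, 0.3),
    (2, "LR"): (0.8, 0.4), (2, "PIW1"): (0.7, 0.3), (2, "PIW2"): (0.7, 0.3), (2, "FRC"): (0.8, 0.4),
}

def bern(p_yes, value):
    """Probability of a binary outcome given P(Yes)."""
    return p_yes if value else 1.0 - p_yes

def joint_weight(ssm1, ssm2, evidence):
    """P(SSM_1 = ssm1, SSM_2 = ssm2, observed task outcomes)."""
    weight = bern(P_SSM1, ssm1) * bern(P_SSM2_GIVEN_SSM1[ssm1], ssm2)
    for (attempt, task), completed in evidence.items():
        hit, guess = CPT[(attempt, task)]
        competent = ssm1 if attempt == 1 else ssm2
        weight *= bern(hit if competent else guess, completed)
    return weight

def predict(target, evidence):
    """P(target task = Yes | evidence), summing out SSM_1 and SSM_2."""
    hit, guess = CPT[target]
    numerator = denominator = 0.0
    for ssm1, ssm2 in product([True, False], repeat=2):
        weight = joint_weight(ssm1, ssm2, evidence)
        competent = ssm1 if target[0] == 1 else ssm2
        denominator += weight
        numerator += weight * (hit if competent else guess)
    return numerator / denominator

record = {  # hypothetical validation record with the two target tasks withheld
    (1, "LR"): True, (1, "PIW1"): False, (1, "PIW2"): False, (1, "FRC"): True,
    (2, "LR"): True, (2, "PIW1"): True,
}
for target in [(2, "PIW2"), (2, "FRC")]:
    p_yes = predict(target, record)
    print(target, round(p_yes, 3), "Yes" if p_yes >= 0.5 else "No")
```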
Table 2. Inputs to BN - Expert estimates
__________________________________________________________________________________________________
Scenario Attempt 1
__________________________________________________________________________________________________
$P(SSM_1)$ = 60.0%

$SSM_1$ | $P(LR_1 \mid SSM_1)$ | $P(PIW1_1 \mid SSM_1)$ | $P(PIW2_1 \mid SSM_1)$ | $P(FRC_1 \mid SSM_1)$
Y (1 - s) | 70.0% | 60.0% | 60.0% | 70.0%
N (g) | 30.0% | 20.0% | 20.0% | 30.0%
__________________________________________________________________________________________________
Scenario Attempt 2
__________________________________________________________________________________________________
$SSM_1$ | $P(SSM_2 \mid SSM_1)$
Y (1 - s) | 70.0%
N (g) | 30.0%

$SSM_2$ | $P(LR_2 \mid SSM_2)$ | $P(PIW1_2 \mid SSM_2)$ | $P(PIW2_2 \mid SSM_2)$ | $P(FRC_2 \mid SSM_2)$
Y (1 - s) | 80.0% | 70.0% | 70.0% | 80.0%
N (g) | 40.0% | 30.0% | 30.0% | 40.0%
__________________________________________________________________________________________________
A benchmark comparison is made with a BN that
uses a uniform distribution for initial CPT parameters
for all latent and observable nodes. We use this BN to
make a comparison with a model that is formed with
no expert input and driven only by available data.
This approach disregards the expert predictions and
assumes an equal probability (50%) for completing or
not completing tasks, and related slip and guess
probabilities. The parameters are adjusted using the
same learning data and using the same learning
algorithm as in the expert prediction.
Table 3 shows the differences in prediction
accuracy of the BN models that were investigated.
The Table indicates the number of times the model
and validation set had a common outcome on
successful completion of task (Yes) or when tasks
were not successfully completed (No) for the 20
records in the set. The predictive accuracy of the BN
based on expert guesses was 75%, indicating the
expert informed probabilities were reasonable. The
predictive accuracy of the model increased slightly to
78% when trained with experimental data. The
approach of using expert input showed a much
higher predictive accuracy (75% before and 78% after
training) than a model trained from uniform
parameters (48%). This outcome suggests that the
expert estimates were needed to generate a suitable
model given the amount of available data.
Table 3. BN model predictions and comparisons
_______________________________________________
Node / outcome | Initial Expert Estimate | Expert Estimate Trained | Uniform Trained
_______________________________________________
Overall | 75% (30/40) | 78% (31/40) | 48% (19/40)
$PIW2_2$ Combined | 80% (16/20) | 80% (16/20) | 50% (10/20)
$PIW2_2$ Yes | 80% (8/10) | 80% (8/10) | 0% (0/10)
$PIW2_2$ No | 80% (8/10) | 80% (8/10) | 100% (10/10)
$FRC_2$ Combined | 70% (14/20) | 75% (15/20) | 45% (9/20)
$FRC_2$ Yes | 100% (11/11) | 73% (8/11) | 0% (0/11)
$FRC_2$ No | 33% (3/9) | 78% (7/9) | 100% (9/9)
_______________________________________________
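The percentages in Table 3 are agreement counts between predicted and observed outcomes. A small sketch recomputing the $FRC_2$ rows also shows that the Uniform Trained column matches a model that predicted No for every record, which yields 0% on completed tasks and 100% on non-completed ones. The function name is hypothetical.

```python
def agreement(correct_yes, total_yes, correct_no, total_no):
    """Share of records where the predicted outcome matched the observed one."""
    return {
        "Yes": correct_yes / total_yes,
        "No": correct_no / total_no,
        "Combined": (correct_yes + correct_no) / (total_yes + total_no),
    }

# FRC_2 counts from Table 3: 11 completed and 9 not completed in the validation set.
print(agreement(11, 11, 3, 9))   # initial expert estimate
print(agreement(8, 11, 7, 9))    # expert estimate after training
print(agreement(0, 11, 9, 9))    # uniform model, equivalent to always predicting No
```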
The method also allows us to investigate how the
data set changed the BN CPTs from the initial expert
estimates. These changes provide insights on the
predicted competence and task difficulty, as a
refinement to the estimates initially made by the
SMEs. Table 4 presents the change in CPT from the
initial estimates provided in Table 2. The outcomes
show the initial probability of SSM competence
($SSM_1$) was lowered by 13 percentage points, indicating the initial
estimate of competence was too high. The outcomes
also show that most of the probability parameters for
successful PIW pickup for each attempt had to be
lowered, suggesting this task was more difficult than
predicted. The probabilities for stopping at a life raft
were increased for each attempt.
Given the limited amount of data that is available,
it is difficult to make conclusive remarks about the
final probabilities of the BN model. Additional data
are expected to further change the CPTs and increase
the predictive accuracy of the BNs.
4.2 Validation Case 2 - Investigating the diagnostic causal
relationship of background training
In this section we discuss how the BN can be used as
a diagnostic tool and identify causes given a set of
observations. We incorporate additional information
about the test participants and show how the model
can be used to associate performance to the new
information. We introduce a new evidence node,
Background Training (BT), to indicate whether the
participants received hands-on training during their
regular practice prior to performing the simulator
exercise. Participants who received hands-on training
in regular practice sessions were more likely to be
able to complete on-water tasks compared to those
who did not (Billard et al. 2019). This information is
known for all participants who completed the
simulator scenario and the related validation data
sets. 26 of 39 participants received hands-on training;
13 did not.
The updated BN for this model is provided in
Figure 6. The BT node is introduced and forms a
causal relationship having an influence on the starting
competence of the trainee (SSM1).
We again define the conditional probabilities for
the influence of training on competence using an
expert estimate as there were no existing data
available. It is assumed that those who received
hands-on training had a higher probability of having
the competence, but not greater than 60% as training
had not been received in the weather conditions used
in the assessment scenario. It was assumed the
participants who had not received hands-on training
had a lower probability of having the competence,
having not received any scenario-based practice. The
prior probability of having received hands-on training, P(BT),
was set to 50%, making the prior uninformative. This
allows the model to predict the causal effect based on
the evidence nodes from the simulator experiments
and inherent relationships. Table 5 shows the new
CPT values defined in the BN.
Table 4. Change in BN probabilities, trained model
__________________________________________________________________________________________________
Scenario Attempt 1
__________________________________________________________________________________________________
$P(SSM_1)$ = 47% (-13%)

$SSM_1$      $P(LR_1 \mid SSM_1)$   $P(PIW1_1 \mid SSM_1)$   $P(PIW2_1 \mid SSM_1)$   $P(FRC_1 \mid SSM_1)$
Y (1 - s)    76.1% (+6.1%)          57.4% (-2.6%)            50.1% (-9.9%)            63.7% (-6.3%)
N (g)        41.5% (+11.5%)         16.6% (-3.4%)            13.4% (-6.6%)            23.8% (-6.2%)
__________________________________________________________________________________________________
Scenario Attempt 2
__________________________________________________________________________________________________
$SSM_1$      $P(SSM_2 \mid SSM_1)$
Y (1 - s)    67.7% (-2.3%)
N (g)        25.6% (-4.4%)

$SSM_2$      $P(LR_2 \mid SSM_2)$   $P(PIW1_2 \mid SSM_2)$   $P(PIW2_2 \mid SSM_2)$   $P(FRC_2 \mid SSM_2)$
Y (1 - s)    83.8% (+3.8%)          69.3% (-0.7%)            70.4% (+0.4%)            81.2% (+1.2%)
N (g)        48.4% (+8.4%)          26.4% (-3.6%)            28.6% (-1.4%)            32.1% (+2.1%)
__________________________________________________________________________________________________
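To make the attempt-to-attempt structure of Table 4 concrete, the sketch below propagates the trained probabilities through a two-attempt chain. It assumes the structure implied by the table: SSM1 → SSM2, with each attempt's task nodes (LR, PIW1, PIW2, FRC) conditionally independent given that attempt's SSM node, and it reads the "Y" rows as SSM = yes. The function names `update`, `predict_next` and `p_success` are illustrative only and are not part of the study's software.

```python
# Minimal sketch: exact inference in a two-attempt chain SSM1 -> SSM2,
# using the trained probabilities listed in Table 4.
# Assumption: task outcomes are conditionally independent given SSM.

# (P(success | SSM = yes), P(success | SSM = no)) for each task node
CPT_ATTEMPT_1 = {"LR": (0.761, 0.415), "PIW1": (0.574, 0.166),
                 "PIW2": (0.501, 0.134), "FRC": (0.637, 0.238)}
CPT_ATTEMPT_2 = {"LR": (0.838, 0.484), "PIW1": (0.693, 0.264),
                 "PIW2": (0.704, 0.286), "FRC": (0.812, 0.321)}
P_SSM1 = 0.47                  # prior P(SSM1 = yes)
TRANSITION = (0.677, 0.256)    # P(SSM2 = yes | SSM1 = yes), P(SSM2 = yes | SSM1 = no)


def update(prior: float, observations: dict, cpt: dict) -> float:
    """Posterior P(SSM = yes) after observing task outcomes (True = success)."""
    like_yes, like_no = 1.0, 1.0
    for task, success in observations.items():
        p_yes, p_no = cpt[task]
        like_yes *= p_yes if success else 1.0 - p_yes
        like_no *= p_no if success else 1.0 - p_no
    joint_yes = prior * like_yes
    return joint_yes / (joint_yes + (1.0 - prior) * like_no)


def predict_next(p_yes: float, transition: tuple) -> float:
    """P(SSM_{t+1} = yes) by summing over the two states of SSM_t."""
    stay, gain = transition
    return p_yes * stay + (1.0 - p_yes) * gain


def p_success(p_yes: float, task: str, cpt: dict) -> float:
    """Predicted probability that a task is completed, marginalising over SSM."""
    p_s_yes, p_s_no = cpt[task]
    return p_yes * p_s_yes + (1.0 - p_yes) * p_s_no


# Example: the trainee stops at the life raft (LR) on attempt 1.
post_1 = update(P_SSM1, {"LR": True}, CPT_ATTEMPT_1)   # about 0.62
prior_2 = predict_next(post_1, TRANSITION)              # about 0.52
lr_2 = p_success(prior_2, "LR", CPT_ATTEMPT_2)          # about 0.67
```

A single success on attempt 1 raises the belief in the competence from 47% to about 62%, which in turn raises the predicted chance of stopping at the life raft on attempt 2.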
Figure 6. BN with training evidence introduced
Table 5. Background training (BT) conditional probabilities
_______________________________________________
$P(BT)$ = 50%

BT           $P(SSM_1 \mid BT)$
Y (1-s)      60%
N (g)        40%
_______________________________________________
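Diagnostic use of the BN applies Bayes' rule against the direction of the causal arrows. The sketch below enumerates a three-node chain BT → SSM1 → LR1 using the Table 5 values. The LR1 values (70% success with the competence, 30% without) are assumed here for illustration; they match the expert estimates implied by subtracting the changes listed in Table 4 from the trained values.

```python
# Minimal sketch: diagnostic inference P(BT | LR1) by enumeration over the
# latent SSM1 node in the chain BT -> SSM1 -> LR1.
P_BT_YES = 0.5                            # Table 5 prior P(BT = yes)
P_SSM_GIVEN_BT = {True: 0.6, False: 0.4}  # Table 5: P(SSM1 = yes | BT)
P_LR_GIVEN_SSM = {True: 0.7, False: 0.3}  # assumed expert CPT for LR1


def p_lr_given_bt(bt: bool, lr_success: bool) -> float:
    """P(LR1 outcome | BT), summing out the latent SSM1."""
    total = 0.0
    for ssm in (True, False):
        p_ssm = P_SSM_GIVEN_BT[bt] if ssm else 1.0 - P_SSM_GIVEN_BT[bt]
        p_lr = P_LR_GIVEN_SSM[ssm] if lr_success else 1.0 - P_LR_GIVEN_SSM[ssm]
        total += p_ssm * p_lr
    return total


def p_bt_given_lr(lr_success: bool) -> float:
    """Posterior P(BT = yes | LR1 outcome) by Bayes' rule."""
    yes = P_BT_YES * p_lr_given_bt(True, lr_success)
    no = (1.0 - P_BT_YES) * p_lr_given_bt(False, lr_success)
    return yes / (yes + no)


print(round(p_bt_given_lr(True), 2))   # 0.54
print(round(p_bt_given_lr(False), 2))  # 0.46
```

Under these assumptions a single observed success moves P(BT = Yes) only from 50% to 54%, so one task outcome carries limited diagnostic weight on its own; combining several evidence nodes sharpens the posterior.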
We follow the validation procedure outlined in
Section 4.1, comparing the BN model's prediction of
BT to the evidence from the validation data set. The
evidence in this case is knowledge of whether the
trainee had received hands-on training (Yes) or
not (No).
Table 6 shows that the model correctly identified
whether background training had been received for
65% of the records (13 of 20) in the validation data
set, with higher accuracy for participants without
background training (86%) than for those with it
(54%). This outcome suggests that additional data or
a revised expert estimate is needed to refine the
model and increase the predictive accuracy for this
evidence node. As highlighted in Table 7, the
conditional probabilities of having the SSM1
competence decreased in both cases (with and
without background training) when data were used to
train the model. These changes in probability can be
used to refine the expert estimate or initial CPT for
new data sets.
Table 6. Diagnostic accuracy background training
_______________________________________________
Expert Estimate Trained
_______________________________________________
BT
Overall 65% (13/20)
Yes 54% (7/13)
No 86% (6/7)
_______________________________________________
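The accuracy figures in Table 6 are proportions of correctly classified records. The sketch below recomputes them by restating the table's counts as (predicted, observed) pairs; it reproduces the arithmetic only and makes no assumption about how the BN posterior was converted into a Yes/No prediction.

```python
from collections import Counter

# Table 6 counts restated as (predicted, observed) pairs for the 20 validation
# records: 7 of 13 "Yes" records and 6 of 7 "No" records were classified correctly.
records = ([("Yes", "Yes")] * 7 + [("No", "Yes")] * 6 +
           [("No", "No")] * 6 + [("Yes", "No")] * 1)


def accuracy(pairs):
    """Overall accuracy and accuracy per observed class."""
    correct = Counter(obs for pred, obs in pairs if pred == obs)
    total = Counter(obs for _, obs in pairs)
    overall = sum(correct.values()) / len(pairs)
    per_class = {cls: correct[cls] / total[cls] for cls in total}
    return overall, per_class


overall, per_class = accuracy(records)
print(f"Overall {overall:.0%}, Yes {per_class['Yes']:.0%}, No {per_class['No']:.0%}")
# Overall 65%, Yes 54%, No 86%
```

Reporting per-class accuracy alongside the overall figure shows that the 65% overall value hides a large gap between the two classes.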
Table 7. Change in SSM1 CPTs
_______________________________________________
BT           $P(SSM_1 \mid BT)$
_______________________________________________
Y (1-s) 55.4% (-4.6%)
N (g) 35.3% (-4.7%)
_______________________________________________
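Table 7 reports how training on data changed the expert CPT for SSM1. Because SSM1 is latent, its CPT cannot be estimated by simple counting; one standard option is the EM algorithm (Dempster et al., 1977). The sketch below is a minimal EM for a hypothetical network BT → SSM1 → {LR1, FRC1}, initialised with the Table 5 expert values and 70%/30% task values (the expert estimates implied by Table 4). The ten records are invented toy data, not the study's, and the sketch is not claimed to reproduce the authors' training procedure or the Table 7 values.

```python
import math

# Hypothetical toy records (bt, lr1, frc1); 1 = yes/success, 0 = no/failure.
# Invented for illustration only; these are NOT the study's data.
RECORDS = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0),
           (0, 1, 0), (0, 0, 0), (1, 0, 1), (0, 0, 1), (1, 1, 1)]


def em_step(p_ssm, p_task, records):
    """One EM iteration for BT -> SSM1 -> {LR1, FRC1} with SSM1 latent.

    p_ssm[bt]    = P(SSM1 = yes | BT = bt), bt in {0, 1}
    p_task[k][s] = P(task k succeeds | SSM1 = s), k = 0 (LR1), 1 (FRC1)
    Returns the updated parameters and the log-likelihood of the data under
    the parameters passed in.
    """
    n_bt = [0.0, 0.0]                  # records per BT state
    e_yes = [0.0, 0.0]                 # expected SSM1 = yes per BT state
    e_s = [0.0, 0.0]                   # expected records with SSM1 = s
    e_succ = [[0.0, 0.0], [0.0, 0.0]]  # expected successes of task k with SSM1 = s
    loglik = 0.0
    for bt, *tasks in records:
        # E-step: joint weight of each latent state, then normalise.
        weights = []
        for s in (0, 1):
            w = p_ssm[bt] if s == 1 else 1.0 - p_ssm[bt]
            for k, obs in enumerate(tasks):
                w *= p_task[k][s] if obs == 1 else 1.0 - p_task[k][s]
            weights.append(w)
        z = weights[0] + weights[1]
        loglik += math.log(z)
        post = [w / z for w in weights]
        n_bt[bt] += 1.0
        e_yes[bt] += post[1]
        for s in (0, 1):
            e_s[s] += post[s]
            for k, obs in enumerate(tasks):
                e_succ[k][s] += post[s] * obs
    # M-step: maximum-likelihood estimates from the expected counts.
    new_p_ssm = [e_yes[b] / n_bt[b] for b in (0, 1)]
    new_p_task = [[e_succ[k][s] / e_s[s] for s in (0, 1)] for k in (0, 1)]
    return new_p_ssm, new_p_task, loglik


# Initialise with expert estimates: Table 5 for SSM1, 70%/30% for the tasks.
p_ssm = [0.4, 0.6]                  # P(SSM1 = yes | BT = no), P(SSM1 = yes | BT = yes)
p_task = [[0.3, 0.7], [0.3, 0.7]]   # [g, 1 - s] for LR1 and FRC1
history = []
for _ in range(20):
    p_ssm, p_task, ll = em_step(p_ssm, p_task, RECORDS)
    history.append(ll)
print([round(p, 3) for p in p_ssm])
```

Starting EM from the expert estimate is one way to carry domain knowledge into the trained model: the log-likelihood never decreases from one iteration to the next, and with few records the result can remain close to the initial values.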
5 DISCUSSION
The methodology presented in this paper uses
available information and expert experience to
create probabilistic models of human performance in
scenarios for which few data are available. The
approach can be applied to training applications that
aim to investigate how observable measures of
performance affect skill acquisition and competence.
We chose lifeboat coxswain training because
simulation has extended its training capabilities and
provides data from new scenarios with which to study
this problem area.
We presented a method to develop a student
model of lifeboat competence that integrates expert
prediction and evidence from a simulator experiment.
We derived the BN model for SSM competence using
a framework, previously applied in ITS and ECD, in
which observable evidence from a simulation
assessment informs the design of the model.
how the BN model can be used to predict
performance and diagnose causal relationships,
illustrating how the model can be applied to
investigate relationships between latent and
observable variables.
The validation examples indicate that embedding
expertise in the model can result in a high initial
predictive accuracy, despite using a small data set.
The model’s predictive accuracy was further
increased as simulator data were used to inform the
BN probabilities. This outcome indicates that domain
knowledge is valuable in initializing probabilistic
models in cases where there is limited data. It is
expected that the model’s predictive accuracy would
improve further if the CPTs were trained with a larger
data set of user performance data.
The scalability of the BN model is a strength that
can be further explored. We presented a model of
lifeboat coxswain competence that is very narrow (a
single competence) and derived from a scenario with
fixed weather and tasks. For this study, the modelling
of competency is specific to the environmental
conditions used in the scenario. In a training program
involving multiple practice exercises, the number and
order of task types can be varied, and the level of
difficulty can change with environmental conditions
(e.g. an increase in wave height or wind, or day
versus night). The probabilities are expected to
differ in scenarios that are easier or more difficult.
Additional background information can also be
considered, including time between training events
and student training experience. Relationships with
other competencies can also be established (e.g.
practice in maintaining heading in seakeeping
exercises may improve control of the vessel in SSM).
Figure 7 shows an example of how the BN could
be expanded to explore causal relationships between
variables as more information on the student is
known and as evidence is gathered through a training
program. These BNs can become complex as they
form a detailed model of student competence. These
models can be used to investigate factors that affect
performance while gaining insights on human
performance limitations.
Figure 7. Sample BN with expanded relationships
representing a lifeboat training program
The formation of a student model using BNs offers
additional means to apply probabilistic models to
improve training. We have presented a model to
study performance based solely on assessment of task
performance (i.e. was the task completed successfully
or not). The model can be expanded to investigate the
specific behaviours performed by the participant in
completing the task to study which actions result in
the highest probability of success. This type of model
tracing is possible given the measures identified in the
rubric. The outcomes can be used to model novice
and expert performance as inputs to ITS (Millán et al.,
2010). The probabilistic modelling of the BN can be
integrated with machine learning algorithms to build
adaptive training applications that customize training
material to an individual's strengths and weaknesses
based on evidence gathered in training.
To conclude the discussion, we make four
recommendations to researchers who wish to use the
methodology to study human performance and
training for situations that have limited data. First, we
advise building the student model as early as
practicable so that the student BN can be informed by
the evidence that will be collected. This approach
aligns the student model with the research objectives
and allows scenarios to be designed to study
relationships of interest. Second,
we recommend a balance of expert and data-driven
input in the probabilistic models. As demonstrated,
the modelling of CPTs using expert input can provide
a model with suitable predictive accuracy. In cases
where data are being collected for scenarios with
limited initial data, the expert prediction is an informed
guess. Probabilistic models derived from large data sets
are expected to have a higher predictive accuracy.
Third, we suggest that users consider the extended
uses of the BN approach for modelling relationships.
The BN models can be restructured, and new variables
(latent or observable) added, to investigate causal
relationships and the influence of new information.
Finally, we suggest the use of simulation to perform
assessments and collect data for situations that are
normally prohibitive due to risk. Simulation scenarios
extend studies to new operating conditions and
provide a consistent measure of performance. Digital
measures from a simulator exercise can be input
directly into probabilistic models such as BNs,
allowing machine learning to be applied and training
to be adapted in real time.
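The second recommendation above, balancing expert and data-driven input, can be illustrated with a standard Beta-Bernoulli update in which the expert estimate acts as a prior worth a chosen number of pseudo-observations. This is a generic technique, not necessarily the one used in this study; the prior weight of 10 and the data sets in the example are arbitrary, hypothetical choices.

```python
def blended_estimate(expert_p: float, prior_weight: float,
                     successes: int, trials: int) -> float:
    """Posterior mean of a Beta prior centred on the expert estimate.

    The expert estimate counts as `prior_weight` pseudo-observations, so the
    data dominate once `trials` is much larger than `prior_weight`.
    """
    alpha = expert_p * prior_weight + successes
    beta = (1.0 - expert_p) * prior_weight + (trials - successes)
    return alpha / (alpha + beta)


# Expert estimate of 60% with no data, then with hypothetical data sets
# in which half of the trials succeed.
for successes, trials in [(0, 0), (5, 10), (50, 100)]:
    print(trials, round(blended_estimate(0.6, 10, successes, trials), 3))
# 0 0.6
# 10 0.55
# 100 0.509
```

With no data the estimate equals the expert value; as trials accumulate it moves towards the observed success rate, which mirrors the shift from expert-driven to data-driven CPTs recommended above.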
ACKNOWLEDGEMENTS
We thank Petroleum Research Newfoundland and Labrador
and the Industrial Research Assistance Program of the
National Research Council who sponsored the study. The
authors acknowledge with gratitude the support of the
NSERC/Husky Energy Industrial Research Chair in Safety
at Sea.
REFERENCES
Billard, R., Smith, J.J.E. (2018). Using simulation to assess
performance in emergency lifeboat launches.
Proceedings of the Interservice/Industry Training,
Simulation, and Education Conference (I/ITSEC), Paper
number 19179.
Billard, R., Smith, J., Veitch, B. (2019). Assessing lifeboat
coxswain training alternatives using a simulator. The
Journal of Navigation. Published online by Cambridge
University Press, 19 September 2019.
Billard, R., Musharraf, M., Smith, J., Veitch, B. (2020). Using
Bayesian methods and simulator data to model lifeboat
coxswain performance. WMU Journal of Maritime
Affairs. Published May 2020.
https://doi.org/10.1007/s13437-020-00204-0
de Klerk, S., Veldkamp, B.P., Eggen, T. (2015). Psychometric
analysis of the performance data of simulation-based
assessment: A systematic review and a Bayesian
network example. Computers & Education, 85, 23-34.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum
likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, Series B
(Methodological), 39(1), 1-38.
Groth, K., Smith, C., Swiler, L. (2014). A Bayesian method for
using simulator data to enhance human error
probabilities assigned by existing HRA methods.
Reliability Engineering & System Safety, 128, 32-40.
International Maritime Organization, & International
Conference on Training and Certification of Seafarers
(2010). STCW including 2010 Manila Amendments, 2017
Edition.
International Maritime Organization. (2014). International
Convention for the Safety of Life at Sea (SOLAS),
Consolidated Edition. London: International Maritime
Organization.
Käser, T., Klingler, S., Schwing, A., Gross, M. (2017).
Dynamic Bayesian networks for student modeling. IEEE
Transactions on Learning Technologies, 10(4),
Oct.-Dec. 2017.
Klein, G. (2008). Naturalistic decision making. Human
Factors: The Journal of the Human Factors and
Ergonomics Society, 50(3), 456-460.
McClernon, C. K., McCauley, M. E., O’Connor, P. E., &
Warm, J. S. (2011). Stress training improves performance
during a stressful flight. Human Factors: The Journal of
the Human Factors and Ergonomics Society, 53(3), 207-
218.
Millán, E., Perez-de-la-Cruz, J.L. (2002). A Bayesian
diagnostic algorithm for student modeling and its
evaluation. User Modeling and User-Adapted
Interaction, 12, 281-330. Kluwer Academic Publishers,
Netherlands.
Millán, E., Loboda, T., Perez-de-la-Cruz, J.L. (2010).
Bayesian networks for student model engineering.
Computers & Education, 55, 1663-1683.
Mislevy, R. J., Almond, R. G., & Lukas, J. (2004). A brief
introduction to evidence-centered design. CSE Technical
Report. Los Angeles: The National Center for Research
on Evaluation, Standards, and Student Testing
(CRESST). Retrieved from
http://www.cse.ucla.edu/products/reports/r632.pdf.
Sellberg, C. (2017). Simulators in bridge operations training
and assessment: a systematic review and qualitative
synthesis. WMU Journal of Maritime Affairs, 16(2), 247-
263.
Stefanidis, D., Korndorffer, J.R., Markley, S., Sierra, R.,
Heniford, B.T., & Scott, D.J. (2007). Closing the gap in
operative performance between novices and experts:
does harder mean better for laparoscopic simulator
training? Journal of the American College of Surgeons,
205(2), 307-313.
Weinert, F. E. (2001). Competencies and key competencies:
Educational perspective. International Encyclopedia of
the Social and Behavioral Sciences, vol. 4, Elsevier,
2433-2436.