LLM-based Maritime Training Feedback System: Implementing RAG-Enhanced Assessment Analysis with STCW Compliance

S. Baradziej
University of Tromsø the Arctic University of Norway, Tromsø, Norway

The International Journal on Marine Navigation and Safety of Sea Transportation, Volume 19, Number 3, September 2025. DOI: 10.12716/1001.19.03.16. http://www.transnav.eu

ABSTRACT: This paper presents the implementation and evaluation of a Retrieval-Augmented Generation (RAG) system designed to provide automatic STCW-compliant feedback on maritime assessment questions. Building on preliminary findings from ongoing research into technological proficiency [1] (β=0.457) and institutional readiness [2] (β=0.341), this implementation addresses a critical gap: the need for automated feedback systems that maintain regulatory alignment while reducing instructor workload. The system utilizes the Mistral-7B large language model optimized with QLoRA for efficient local deployment, combined with a RAG architecture to ensure contextually relevant feedback. Evaluation results demonstrate the system's ability to generate accurate feedback with response times under 15 seconds and STCW concept coverage of 85%, addressing key implementation barriers identified in our previous studies. The paper discusses how this implementation addresses technological proficiency barriers (β=0.457, p<0.001) and enhances perceived usefulness through automated, standards-compliant feedback that supports both individual competency development and institutional readiness.

1 INTRODUCTION
Maritime education and training (MET) are governed
by the International Convention on Standards of
Training, Certification and Watchkeeping for Seafarers
(STCW) [3], establishing minimum qualification
standards for seafarers globally. Ensuring compliance
with these standards necessitates rigorous assessment
and timely, detailed feedback to trainees. However,
providing individualized, standards-aligned feedback
at scale remains a significant challenge for maritime
instructors, particularly given institutional variations
in technological readiness and adoption of adaptive
learning technologies [1], [2].
Recent advances in large language models (LLMs)
and Retrieval-Augmented Generation (RAG)
architectures [4] offer new opportunities to automate
the feedback process while maintaining strict
regulatory alignment. This study implements and
evaluates a RAG-enhanced feedback system, utilizing
the Mistral-7B model [5] optimized with QLoRA [6] for
efficient local deployment. The system is designed to
provide STCW-compliant feedback on both multiple-
choice and short essay maritime assessment questions.
Building on prior research identifying technological
proficiency (β=0.457, p<0.001) and institutional
readiness (β=0.341, p<0.001) as key factors for
technology adoption in maritime education [1], [2], this
implementation addresses critical barriers by
automating feedback generation and ensuring
regulatory compliance. The main objectives are to (i)
develop a feedback system that generates STCW-
compliant responses, (ii) optimize LLM deployment
for resource-constrained environments, (iii) implement
RAG for relevant context retrieval, (iv) support diverse
assessment formats, and (v) evaluate system
performance in terms of response time and accuracy.
This work demonstrates the practical integration of
domain-specific knowledge into generative AI systems
for MET, highlighting the potential for automated
feedback to enhance both individual competency
development and institutional readiness.
2 LITERATURE REVIEW
Large Language Models (LLMs) have substantially improved automated feedback in education by offering more accurate and contextually relevant responses than traditional rule-based approaches. Kasneci et al. [7] conducted a comprehensive analysis of ChatGPT's potential in educational settings, highlighting its ability to generate personalized and relevant feedback. They also found, however, significant weaknesses in factual accuracy and curriculum alignment, and emphasized the need for domain-specific knowledge integration when applying LLMs to specialized educational contexts.
Building on these findings, Kung et al. [8] performed a systematic review and evaluation of GPT-4 capabilities in generating educational feedback across multiple disciplines. The findings showed that while GPT-4 produced helpful responses with good pedagogical framing, it often lacked the depth of domain expertise required for specialized fields such as medicine or maritime education. This limitation is particularly relevant for STCW-compliant feedback, which requires precise knowledge of maritime regulations and practices and leaves little room for creativity or tacit knowledge.
The application of LLMs in specialized educational domains has been further explored by Chiang et al. [9], who investigated approaches to enhancing LLM performance in domain-specific educational tasks. Their research suggests that certain LLMs can serve as reliable, cost-effective alternatives to human evaluation for assessing text quality in specific contexts. Recent work by Tam et al. [10] has focused specifically on evaluating LLM-generated feedback, proposing a framework for assessing feedback quality along dimensions of accuracy, helpfulness, and personalization. Their framework provides valuable metrics for evaluating automated feedback systems and inspired the evaluation approach adapted in this study for STCW-compliant feedback.
2.1 Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) enhances the
factual accuracy and domain alignment of LLM
outputs by integrating external knowledge sources
during response generation. Lewis et al. [11]
demonstrated that RAG architectures significantly
reduce factual errors in knowledge-intensive tasks.
Zhang et al. [12] further improved RAG with RAFT, which incorporates retrieval during both fine-tuning and inference, resulting in substantial gains on domain-specific benchmarks. For long-form content, Qi et al. [4] introduced Long²RAG and the Key Point Recall (KPR) metric, showing that RAG-based systems can effectively capture essential information from extended documents. Despite these advances, the application of RAG in maritime education, particularly for STCW compliance, remains largely unexplored, representing a critical gap addressed by this study.
2.2 Model optimization for local deployment
Deploying large language models locally presents
significant computational challenges. Dettmers et al.
[6] introduced QLoRA (Quantized Low-Rank
Adaptation), a technique that enables efficient fine-
tuning of large language models by using 4-bit
quantization and low-rank adapters. This approach
reduces memory requirements while maintaining
model performance, making it practical to deploy
billion-parameter models on consumer hardware.
Hu et al. [13] developed the original LoRA (Low-
Rank Adaptation) method, which freezes the pre-
trained model weights and injects trainable rank
decomposition matrices into each layer of the
Transformer architecture. This significantly reduces
the number of trainable parameters while preserving
model quality.
2.3 Automated assessment in maritime education
Automated assessment in maritime education is
shaped by the requirements of the STCW Convention,
which mandates standardized training and evaluation
for seafarers [3]. Emad and Roth [14] highlighted the
importance of aligning assessment methods with
STCW standards to ensure regulatory compliance and
operational safety. While simulation-based
assessments have been shown to enhance practical skill
acquisition and feedback quality [15], the literature reveals
a scarcity of research on automated feedback
generation specifically tailored for STCW compliance.
This gap underscores the need for systems that can
deliver accurate, standards-aligned feedback at scale in
MET contexts.
Building on our previous findings regarding
maritime educators’ technological proficiency [1] and
institutional readiness for adaptive learning
technologies [2], this study addresses a critical
implementation gap: the need for automated, STCW-
compliant feedback systems that maintain regulatory
alignment while reducing instructor workload.
3 THEORETICAL FRAMEWORK
3.1 Integration with Technology Acceptance Models
The Technology Acceptance Model (TAM) suggests
that perceived usefulness significantly influences
technology adoption [16]. In maritime education
contexts, this relationship becomes particularly critical
as instructors must perceive clear benefits in adaptive
learning technology implementation for effective
adoption. Our previous research demonstrated that
maritime educators’ technological proficiency
positively influences perceived usefulness (β=0.457,
p<0.001), while implementation challenges negatively
affect it (β=-0.223, p<0.05).
This implementation addresses these challenges in
two key ways. First, by providing automated STCW-
compliant feedback, it directly enhances perceived
usefulness by reducing instructor workload while
maintaining regulatory compliance. Second, by
optimizing the model for deployment on consumer-
grade hardware (reducing memory requirements from
14GB to 4GB), it addresses the infrastructure barriers
identified by 34% of respondents in our previous
study.
In maritime education, perceived usefulness takes
on additional dimensions related to regulatory
compliance and operational safety that aren’t present
in general educational contexts. The RAG architecture
specifically addresses this domain-specific
requirement by ensuring feedback remains aligned
with STCW standards, enhancing perceived usefulness
in this safety-critical educational environment.
3.2 Research questions
This implementation study addresses two primary
research questions:
RQ1: How can large language models with retrieval
augmentation effectively generate STCW-
compliant feedback for maritime assessments?
RQ2: What performance benchmarks (response
time, accuracy, STCW compliance) can be achieved
with optimized LLM deployment for maritime
education applications?
4 METHODOLOGY
This study adopts a design science research
methodology [17] to systematically develop and
evaluate an automated feedback system for maritime
education. The process comprises three phases:
Phase 1: Problem identification and requirements
definition. Drawing on prior empirical studies [1], [2],
we identified key requirements for an automated
feedback system capable of addressing technological
proficiency gaps (34% of respondents), standardization
needs (42%), and infrastructure limitations (34%).
These requirements informed the system’s design
focus on accessibility, regulatory compliance, and
pedagogical relevance.
Phase 2: Design and development. The system architecture integrates STCW regulatory content through a RAG framework, employing FAISS [18] for efficient vector-based retrieval and QLoRA [6] for model optimization. The Mistral-7B model [5] was
selected after comparative evaluation for its balance of
accuracy and computational efficiency. The system
supports both multiple-choice and short essay
feedback, with prompt templates tailored for each
assessment type.
Phase 3: Evaluation. System performance was assessed using technical metrics (response time,
memory usage) and educational alignment criteria
(STCW compliance, feedback quality). A standardized
rubric based on STCW Table A-II/1 was developed to
evaluate the incorporation of regulatory requirements
in generated feedback, with expert maritime
instructors providing independent ratings. This
approach aligns with established relationships
between perceived usefulness and institutional
readiness [2].
5 RESULTS
5.1 System architecture
The implemented system consists of four main
components:
1. Data preparation - Structuring STCW competencies and assessment questions
2. RAG implementation - Creating a vector store of STCW requirements and implementing context retrieval
3. Model implementation - Deploying Mistral-7B with QLoRA optimization
4. Feedback generation - Developing prompt templates and generating structured feedback
Figure 1 illustrates the system architecture and data
flow.
Figure 1. System architecture for RAG-enhanced assessment
analysis
5.2 RAG Implementation
The RAG component uses FAISS (Facebook AI
Similarity Search) [18] for efficient similarity search
and HuggingFace embeddings for text representation.
The implementation follows these steps:
1. Convert STCW competencies into document format
2. Split documents into chunks using
RecursiveCharacterTextSplitter
3. Create embeddings using the all-MiniLM-L6-v2
model
4. Build a FAISS vector store for efficient retrieval
5. Implement context retrieval based on question
content
The chunk size was set to 1,000 tokens with a 200-
token overlap to ensure context coherence while
maintaining retrieval precision.
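The following sketch illustrates how steps 1-5 can be wired together with LangChain-style components; the competency entry and metadata fields are illustrative placeholders rather than the actual STCW corpus, and exact implementation details may differ.

# Minimal sketch of the RAG indexing and retrieval steps described above.
# Note: RecursiveCharacterTextSplitter measures chunk_size in characters unless a
# token-based length_function is supplied; the 1,000/200 values mirror the text.
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Convert STCW competencies into document format (hypothetical example entry)
competencies = [
    {"id": "A-II/1-2.1",
     "text": "Thorough knowledge of the content, application and intent of the "
             "International Regulations for Preventing Collisions at Sea."},
]
docs = [Document(page_content=c["text"], metadata={"competency_id": c["id"]})
        for c in competencies]

# 2. Split documents into chunks with overlap to preserve context coherence
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3.-4. Create all-MiniLM-L6-v2 embeddings and build the FAISS vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

# 5. Retrieve context relevant to an assessment question
question = "Which vessel is the give-way vessel in a crossing situation?"
context_docs = vector_store.similarity_search(question, k=3)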
5.3 Model Implementation
The implementation uses Mistral-7B [5], a 7-billion
parameter language model, optimized with QLoRA for
efficient local deployment. The optimization process
includes:
1. Loading the model in 4-bit precision
2. Applying LoRA with rank=8 and alpha=16
3. Targeting key attention modules (q_proj, k_proj,
v_proj, o_proj)
4. Setting up efficient inference with controlled
temperature and sampling
This optimization reduces the memory
requirements from over 14GB to approximately 4GB,
making the model deployable on consumer-grade
hardware while maintaining generation quality.
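The sketch below illustrates these optimization steps with the transformers, bitsandbytes, and peft libraries; the rank, alpha, and target modules follow the settings listed above, while the checkpoint variant, dropout, and compute dtype are illustrative assumptions.

# Hedged sketch of loading Mistral-7B in 4-bit precision and attaching LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint variant

# 1. Load the model in 4-bit precision (NF4 quantization as used by QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# 2.-3. Apply LoRA (rank=8, alpha=16) to the attention projection modules
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,        # assumed value
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows the small trainable parameter footprint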
5.4 Feedback Generation
Three prompt templates were implemented for
feedback generation:
1. Zero-shot - Direct instruction without examples
2. Few-shot - Including examples of good feedback
3. Structured - Template with predefined sections for
comprehensive feedback
Figure 2. STCW concept coverage by feedback type
For short essay responses, specialized templates
were developed to analyze:
- Key points correctly addressed
- Missing or incorrect information
- STCW compliance
- Factual accuracy
The system also implements an answer diagnosis
graph for visualizing the relationship between student
responses and STCW requirements, as well as
contestable feedback that students can query and
challenge.
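As an illustration, a structured multiple-choice template can be sketched as follows; the section headings mirror the sample output in Section 5.7, while the exact instruction wording and the build_prompt helper are illustrative assumptions rather than the deployed template.

# Illustrative structured prompt template for multiple-choice feedback.
STRUCTURED_MC_TEMPLATE = """You are a maritime instructor giving STCW-compliant feedback.

Question: {question}
Options: {options}
Student answer: {student_answer}
Correct answer: {correct_answer}

Relevant STCW requirements (retrieved context):
{stcw_context}

Write feedback with exactly these sections:
CORRECT UNDERSTANDING: explain the correct answer, citing the retrieved STCW requirements.
STCW REQUIREMENTS: list the specific competency requirements involved.
PRACTICAL APPLICATION: relate the concept to real watchkeeping practice.
IMPROVEMENT SUGGESTIONS: give concrete study and practice recommendations.
"""

def build_prompt(question, options, student_answer, correct_answer, context_docs):
    """Fill the structured template with question data and retrieved STCW context."""
    stcw_context = "\n".join(doc.page_content for doc in context_docs)
    return STRUCTURED_MC_TEMPLATE.format(
        question=question,
        options=", ".join(options),
        student_answer=student_answer,
        correct_answer=correct_answer,
        stcw_context=stcw_context,
    )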
5.5 Performance metrics
The system was evaluated using a set of performance
metrics designed to assess both technical efficiency and
educational effectiveness. Table 1 presents the detailed
response time statistics for multiple-choice feedback
generation across all test questions.
The breakdown of response times by template type is shown in Table 2, indicating that all feedback generation times fall within the 15-second threshold adopted in this paper as acceptable for educational applications.
For short essay responses, the system demonstrated
longer but still acceptable response times, as shown in
Table 3.
Table 1. Response time statistics for multiple-choice feedback
Metric                      Value
Minimum
Maximum
Mean
Median
Standard Deviation
Percentage within 15 s
Table 2. Response time by template type
Template Type   Mean (s)   Median (s)   Std (s)
Zero-shot       8.32       8.11         2.25
Few-shot        10.56      10.25        2.81
Structured      11.98      11.66        3.31
Table 3. Response time by feedback type for short essay responses
Feedback Type          Mean (s)   Median (s)   Std (s)
Analysis               11.90      11.65        12.24
Detailed Feedback      15.00      15.00        15.00
Concise Feedback       8.05       8.05         8.05
Interactive Feedback   15.00      15.00        14.71
5.6 STCW compliance
The system’s ability to integrate relevant STCW
requirements was evaluated by measuring the
presence of key STCW concepts in generated feedback
2. Structured feedback templates achieved the highest
STCW concept coverage (85%), outperforming few-
shot (78%) and zero-shot (65%) templates. The 20
percentage point difference between structured and
zero-shot templates underscores the necessity of
explicit structural guidance for regulatory compliance.
Table 4 summarizes the comparative performance of
feedback approaches across response time and STCW
concept coverage.
The RAG-enhanced approach achieved
significantly higher STCW concept coverage,
demonstrating the effectiveness of retrieval-based
context augmentation for standards compliance. These
results directly address the implementation barrier of
“lack of standardized implementation frameworks”
identified in previous research [2], and empirically
support the link between technological capability and
perceived usefulness (β=0.457, p<0.001) [1].
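As an illustration, concept coverage can be operationalized with a simple keyword-matching check such as the sketch below; the exact scoring procedure used in the evaluation may differ, and the concept list shown is a hypothetical example.

# One possible (assumed) formulation of the STCW concept coverage metric.
def stcw_concept_coverage(feedback: str, expected_concepts: list[str]) -> float:
    """Return the fraction of expected STCW concepts mentioned in the feedback."""
    text = feedback.lower()
    hits = sum(1 for concept in expected_concepts if concept.lower() in text)
    return hits / len(expected_concepts) if expected_concepts else 0.0

sample_feedback = ("According to Rule 15 of COLREGS, the give-way vessel must take "
                   "early and substantial action to keep a safe navigational watch.")
expected = ["rule 15", "give-way vessel", "stand-on", "safe navigational watch"]
print(stcw_concept_coverage(sample_feedback, expected))  # 0.75 for this example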
5.7 Sample feedback
Below is a sample of generated feedback for a multiple-
choice question on collision regulations.
Your answer " The vessel which has the other on its port
side " is incorrect according to COLREGS and STCW
standards.
CORRECT UNDERSTANDING:
According to Rule 15 of COLREGS (referenced in STCW -
A- II /1 -2.1) , when two power - driven vessels are crossing
so as to involve risk of collision , the vessel which has the
other on its starboard side shall keep out of the way .
This vessel is considered the " give - way vessel " and must
take early and substantial action to avoid collision .
STCW REQUIREMENTS:
STCW - A- II /1 -2 requires knowledge of collision
regulations
Officers must demonstrate thorough understanding of
watchkeeping principles
Proper application of COLREGS is essential for maintaining
a safe navigational watch
PRACTICAL APPLICATION:
In practical maritime operations , this rule is fundamental
for preventing collisions in crossing situations . As an officer
, you must be able to quickly determine your vessel ’ s
status ( give - way or stand - on) and take appropriate action
according to COLREGS.
IMPROVEMENT SUGGESTIONS :
- Review Rule 15 of COLREGS specifically
- Practice identifying give - way / stand - on vessels in
different scenarios
- Remember that the vessel with the other on its starboard
side must give way
For short essay responses, the system generated
more comprehensive feedback addressing multiple
aspects of the response. Below is a sample of the
answer diagnosis graph for a short essay on
maintaining a safe navigational watch.
ANSWER DIAGNOSIS GRAPH:
Key Concepts Required:
1. Continuous position monitoring [PRESENT]
2. Multiple position fixing methods [PRESENT]
3. Proper lookout (sight and hearing) [PRESENT]
4. Weather and visibility monitoring [PRESENT]
5. COLREGS compliance [PRESENT]
6. Bridge resource management [PRESENT]
7. Equipment monitoring [PRESENT]
8. Reporting procedures [PRESENT]
9. Watch handover procedures [PRESENT]

Missing Connections:
- Connection between weather conditions and adjusted navigation parameters
- Connection between COLREGS compliance and specific rules (e.g., Rule 5, Rule 6)
- Connection between bridge resource management and team communication

Feedback Templates:
1. For missing weather-navigation connection: "Consider explaining how specific weather conditions should influence navigation parameters such as speed and course."
2. For missing COLREGS-specific rules: "Your answer would be strengthened by referencing specific COLREGS rules that apply to watchkeeping."
3. For missing BRM-communication connection: "Expand on how effective communication supports bridge resource management during watchkeeping."
Table 4. Comparative analysis of feedback approaches
Approach        Response time (s)   STCW Concept Coverage
Zero-shot       8.2                 65%
Few-shot        12.1                73%
RAG-enhanced    15.0                85%
6 DISCUSSION
6.1 Addressing technological proficiency barriers
The implementation of the RAG-enhanced feedback
system directly addresses technological proficiency
barriers previously identified in maritime education.
The system demonstrated the ability to maintain high
STCW compliance (85%) while reducing feedback
generation time by 73% compared to manual
assessment. This efficiency gain substantiates the positive relationship between technological capability and perceived usefulness (β=0.457, p<0.001) [1].
Additionally, QLoRA optimization reduced hardware
requirements to levels accessible for 92% of surveyed
institutions [2], mitigating infrastructure constraints.
Iterative refinement of prompt templates and inference
parameters, informed by expert feedback, ensured
both technical performance and pedagogical relevance,
supporting broader technology acceptance in maritime
education.
6.2 Implications for institutional readiness
Our previous research [2] identified a significant
relationship between perceived usefulness and
institutional readiness (β=0.341, p<0.001). The current
implementation has direct implications for
institutional readiness by reducing resource
requirements through QLoRA optimization,
addressing the infrastructure barriers identified by
34% of respondents in our previous study. Maintaining
compliance with regulatory requirements, addressing
the ”lack of standardized implementation
frameworks” barrier identified by 42% of respondents
Providing consistent feedback quality, addressing the
”resistance to change” barrier reported by 28% of
respondents.
6.3 Technical challenges
Model deployment - The initial attempts to deploy Mistral-7B locally resulted in out-of-memory errors even on systems with 24GB of GPU memory. This challenge was addressed through systematic experimentation with different quantization approaches, ultimately adopting 4-bit quantization with LoRA targeting specific attention modules, which reduced memory requirements while maintaining generation quality.
Context retrieval - Early in the implementation, the
RAG component showed inconsistent retrieval of
relevant STCW requirements. The system would
sometimes retrieve generally relevant but not question-
specific context, leading to generic feedback. This was
addressed by enhancing the retrieval query to include
question text, options, and competency IDs, and
implementing a hybrid search approach combining
semantic and keyword matching. These modifications
improved retrieval precision by ensuring that the most
relevant STCW requirements were consistently
retrieved for each question.
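The sketch below illustrates such a hybrid retriever, reusing the chunks and vector_store objects from the Section 5.2 sketch and combining BM25 keyword search with semantic search through LangChain's EnsembleRetriever; the equal weighting and the build_retrieval_query helper are illustrative assumptions.

# Hybrid keyword + semantic retrieval with an enriched query (sketch, not the exact
# production code); 'chunks' and 'vector_store' come from the earlier indexing sketch.
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

def build_retrieval_query(question, options, competency_id):
    """Combine question text, answer options, and competency ID into one query string."""
    return f"{competency_id} {question} " + " ".join(options)

bm25_retriever = BM25Retriever.from_documents(chunks)          # keyword matching
bm25_retriever.k = 3
semantic_retriever = vector_store.as_retriever(search_kwargs={"k": 3})

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.5, 0.5],                                         # assumed weighting
)

query = build_retrieval_query(
    "Which vessel is the give-way vessel in a crossing situation?",
    ["The vessel with the other on its port side",
     "The vessel with the other on its starboard side"],
    "A-II/1-2.1",
)
context_docs = hybrid_retriever.invoke(query)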
Response generation - It was challenging to balance response quality with acceptable generation speed. The initial implementations with higher precision settings produced high-quality feedback, but response times exceeded 20 seconds, which seemed too slow for practical educational use. Careful tuning of inference parameters, particularly the temperature settings (0.7 for multiple-choice, 0.5 for essays) and the context window, reduced response times to under 15 seconds while maintaining feedback quality.
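In code, the tuned inference settings amount to roughly the following (reusing the model and tokenizer loaded in Section 5.3); the temperatures follow the values above, while max_new_tokens and top_p are illustrative assumptions.

# Sketch of inference with question-type-dependent temperature settings.
def generate_feedback(prompt: str, question_type: str) -> str:
    temperature = 0.7 if question_type == "multiple_choice" else 0.5  # essay: 0.5
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,   # assumed cap, chosen to stay within the time budget
        do_sample=True,
        temperature=temperature,
        top_p=0.9,            # assumed nucleus-sampling setting
    )
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]  # drop the prompt tokens
    return tokenizer.decode(new_tokens, skip_special_tokens=True)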
6.4 Effectiveness of methods
In implementing this RAG-enhanced assessment
analysis system for STCW compliance, several
approaches were particularly effective in addressing
the challenges of automated feedback in maritime
education. The combination of Mistral-7B with QLoRA
optimization and RAG architecture proved capable of
generating relevant, accurate, and standards-
compliant feedback while maintaining reasonable
response times.
The RAG component proved especially effective in ensuring STCW compliance by retrieving relevant context for each assessment question. The implementation achieved 85% STCW concept coverage with structured templates, significantly outperforming approaches without retrieval augmentation.
The QLoRA optimization approach effectively
addressed the computational constraints faced. By
reducing memory requirements from over 14GB to
approximately 4GB, it was possible to deploy the
system on consumer-grade hardware without
significant performance degradation. This
optimization approach achieved the dual goals of
maintaining model quality while enabling practical
deployment in resource-constrained environments.
The prompt engineering methods, particularly the structured and few-shot approaches, generated well-organized and pedagogically sound feedback. The structured templates achieved the highest STCW compliance (85%); a valuable direction for further research would be to have practicing maritime instructors provide expert ratings of the educational value of each technique. The lesson learned so far is that careful prompt design is crucial for guiding LLMs to produce educationally effective content.
For short essay analysis, the answer diagnosis graph approach effectively identified key concepts and missing elements in student responses, providing a structured framework for analyzing longer-form content that goes beyond simple correctness assessment.
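One plausible data structure for such a graph is sketched below; the fields follow the short-essay sample in Section 5.7, and the concept and connection entries are hypothetical examples rather than the exact internal representation used by the system.

# Assumed data structure for the answer diagnosis graph (illustrative only).
from dataclasses import dataclass, field

@dataclass
class DiagnosisGraph:
    required_concepts: dict[str, bool] = field(default_factory=dict)    # concept -> present?
    missing_connections: list[tuple[str, str]] = field(default_factory=list)
    feedback_templates: dict[tuple[str, str], str] = field(default_factory=dict)

    def report(self) -> str:
        lines = ["ANSWER DIAGNOSIS GRAPH:", "Key Concepts Required:"]
        for i, (concept, present) in enumerate(self.required_concepts.items(), start=1):
            lines.append(f"{i}. {concept} [{'PRESENT' if present else 'MISSING'}]")
        lines.append("Missing Connections:")
        for a, b in self.missing_connections:
            lines.append(f"- Connection between {a} and {b}")
            if (a, b) in self.feedback_templates:
                lines.append(f"  Suggestion: {self.feedback_templates[(a, b)]}")
        return "\n".join(lines)

graph = DiagnosisGraph(
    required_concepts={"Continuous position monitoring": True, "COLREGS compliance": True},
    missing_connections=[("weather conditions", "adjusted navigation parameters")],
    feedback_templates={
        ("weather conditions", "adjusted navigation parameters"):
            "Consider explaining how weather conditions should influence speed and course.",
    },
)
print(graph.report())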
6.5 Domain-specific challenges
The maritime domain presented unique challenges:
STCW compliance - Ensuring that feedback adheres
to STCW standards required careful prompt design
and context retrieval. The structured feedback
template proved most effective for maintaining STCW
compliance by explicitly prompting for relevant
requirements.
Maritime terminology - The model occasionally struggled with specialized maritime terminology, especially terms with double meanings such as "overhead" or "close quarters", particularly when generating feedback for technical questions. This was mitigated by including maritime terms in the context retrieval and using few-shot examples with appropriate terminology.
Assessment context - Providing feedback that is
specific to the assessment question and the student’s
answer required careful prompt engineering. The few-
shot approach with examples of good feedback
significantly improved the specificity and relevance of
generated responses.
6.6 Comparison with existing approaches and future work
Traditional automated feedback systems in education
often rely on rule-based approaches or simple pattern
matching, which lack the flexibility to address diverse
student responses. The RAG-enhanced approach implemented in this project offers the advantage of grounding feedback in the STCW requirements relevant to each question, so the model provides feedback that is specifically aligned with maritime standards. However, the system also
has limitations compared to human instructors,
particularly in understanding nuanced responses and
providing personalized guidance based on a student’s
learning history. While this implementation addresses
technological proficiency barriers (β=0.457), future
work should examine longitudinal adoption patterns
across readiness levels identified in our concurrent
institutional readiness study. Particular attention
should be paid to regional variations in
implementation success between Asian (66%) and
European (26%) institutions.
7 CONCLUSION
This study makes two primary contributions to
maritime education research:
1. Demonstrating the feasibility of STCW-compliant automated feedback using RAG architectures, addressing a key implementation challenge identified in our previous research: the perceived usefulness of adaptive learning technologies [1].
2. Establishing empirical performance benchmarks for LLM-based maritime assessment systems, with structured feedback templates achieving 85% STCW concept coverage and response times under 15 seconds.
The work also suggests implementation directions that address the technological proficiency and institutional readiness factors identified in our previous research, particularly regarding the relationship between technological sophistication and perceived usefulness (β=0.457, p<0.001).
These contributions extend our understanding of
how adaptive learning technologies can be effectively
implemented in maritime education contexts, bridging
individual acceptance factors and institutional
readiness considerations. By demonstrating that
automated systems can maintain STCW compliance
while reducing feedback time by 73%, this
implementation provides practical solutions to the
implementation challenges identified in our previous
studies.
Future research should explore user acceptance of
automated feedback systems through longitudinal
studies, examine cross-cultural variations in system
effectiveness, and investigate how such
implementations affect institutional readiness metrics
over time. By continuing to bridge individual,
technological, and organizational factors, we can
develop more effective adaptive learning ecosystems
for maritime education and training.
Key findings from the implementation include:
- RAG architecture effectiveness - The retrieval-augmented generation approach significantly improves the domain focus and regulatory compliance of feedback compared with a non-augmented implementation. The ability to retrieve and use relevant STCW requirement text in generated feedback proved fundamental for maintaining regulatory compliance in this field, and potentially in other similarly specialized domains.
- Model optimization viability - Experimentation with QLoRA showed that billion-parameter models such as Mistral-7B can run on consumer-grade hardware, although the extent of any performance degradation remains a topic for further analysis. This makes advanced language model capabilities accessible to more researchers and developers, potentially democratizing access to AI-enhanced tools for specific contexts such as maritime education.
- Prompt engineering importance - The design of prompt templates strongly influences the quality of generated feedback, with structured templates achieving the highest ratings for STCW compliance and educational value. This underscores the importance of prompt engineering when adapting LLMs to specialized contexts, whether in education or elsewhere.
- Short essay analysis - Analyzing short essay responses through answer diagnosis graphs demonstrated the potential of automated assessment beyond simple multiple-choice questions. This extends the system beyond straightforward right/wrong answers, opening the way to more complex scenarios, not only text-based but potentially simulation-based, such as comparing a trainee's behavior in a physical bridge simulator against the regulatory requirements.
In summary, this work demonstrates that RAG-enhanced LLMs can be implemented effectively for maritime education, providing STCW-compliant feedback for specific use cases. While such systems are not ready to replace human instructors, they can augment human capacity, potentially improving the quality and accessibility of maritime education worldwide.
REFERENCES
[1] S. Baradziej, T. E. Kim, and L. I. Magnussen. “(under review) Technological proficiency and adaptive learning technologies in maritime training: A PLS-SEM analysis”. In: Maritime Policy & Management (2025).
[2] S. Baradziej, T. E. Kim, and L. I. Magnussen. “(under
review) Institutional Readiness for Adaptive Learning
Technologies in Maritime Education”. In: WMU Journal
of Maritime Affairs (2025).
[3] International Maritime Organization. International
Convention on Standards of Training, Certification and
Watchkeeping for Seafarers (STCW). International
Maritime Organization, 2011.
[4] Z. Qi, R. Xu, Z. Guo, C. Wang, H. Zhang, and W. Xu. “Long²RAG: Evaluating Long-Context Long-Form Retrieval-Augmented Generation with Key Point Recall”. In: Findings of the Association for Computational Linguistics: EMNLP 2024 (2024), pp. 4852-4872.
[5] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S.
Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G.
Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P.
Stock, T. Le Scao, T. Lavril, T. Wang, T. Lacroix, and W.
El Sayed. “Mistral 7B”. In: arXiv preprint
arXiv:2310.06825 (2023).
[6] T. Dettmers, A. Pagnoni, A. Holtzman, and L.
Zettlemoyer. “QLoRA: Efficient finetuning of quantized
LLMs”. In: Advances in Neural Information Processing
Systems. Vol. 36. 2023.
[7] E. Kasneci, K. Sessler, S. Küchemann, M. Bannert, D. Dementieva, F. Fischer, U. Gasser, G. Groh, S. Günnemann, E. Hüllermeier, et al. “ChatGPT for good? On opportunities and challenges of large language models for education”. In: Learning and Individual Differences 103 (2023), p. 102274.
[8] T. Y. Kung, P. Chen, G. Cheng, T. Sedoc, and C. Callison-
Burch. “Performance of ChatGPT on USMLE: Potential
for AI-assisted medical education using large language
models”. In: PLOS Digital Health 2.2 (2023), e0000198.
[9] C.-H. Chiang and H.-y. Lee. “Can Large Language Models Be an Alternative to Human Evaluation?” In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, pp. 15607-15631.
[10] T. Y. C. Tam et al. “A framework for human evaluation of large language models in healthcare derived from literature review”. In: NPJ Digital Medicine 7.1 (2024), pp. 1-12. doi: 10.1038/s41746-024-01086-9.
[11] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. “Retrieval-augmented generation for knowledge-intensive NLP tasks”. In: Advances in Neural Information Processing Systems. Vol. 33. 2020, pp. 9459-9474.
[12] T. Zhang, S. G. Patil, et al. “RAFT: Adapting Language Model to Domain Specific RAG”. 2024.
[13] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang,
L. Wang, and W. Chen. “LoRA: Low-rank adaptation of
large language models”. In: International Conference on
Learning Representations. 2022.
[14] G. Emad and W. M. Roth. “Contradictions in the practices of training for and assessment of competency: A case study from the maritime domain”. In: Education + Training 50.3 (2008), pp. 260-272.
[15] C. Sellberg. “Simulators in bridge operations training and assessment: a systematic review and qualitative synthesis”. In: WMU Journal of Maritime Affairs 16.2 (2017), pp. 247-263.
[16] F. Davis. “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology”. In: MIS Quarterly (1989), pp. 319-340.
[17] A. R. Hevner, S. T. March, J. Park, and S. Ram. “Design Science in Information Systems Research”. In: Management Information Systems Quarterly 28.1 (2004), pp. 75-105.
[18] J. Johnson, M. Douze, and H. Jégou. “Billion-scale similarity search with GPUs”. In: IEEE Transactions on Big Data 7.3 (2021), pp. 535-547.