regarding the relationship between technological
sophistication and perceived usefulness (β=0.457,
p<0.001).
These contributions extend our understanding of
how adaptive learning technologies can be effectively
implemented in maritime education contexts, bridging
individual acceptance factors and institutional
readiness considerations. By demonstrating that
automated systems can maintain STCW compliance
while reducing feedback time by 73%, this work
provides practical solutions to the implementation
challenges identified in our previous studies.
Future research should explore user acceptance of
automated feedback systems through longitudinal
studies, examine cross-cultural variations in system
effectiveness, and investigate how such
implementations affect institutional readiness metrics
over time. By continuing to bridge individual,
technological, and organizational factors, we can
develop more effective adaptive learning ecosystems
for maritime education and training.
Key findings from the implementation include:
− RAG architecture effectiveness - the retrieval-
augmented generation approach significantly
improves the domain focus and regulatory
compliance of feedback compared with a
non-augmented implementation. The ability to
retrieve relevant STCW requirement text and
incorporate it into generated feedback proved
fundamental to maintaining regulatory
compliance, and is likely to transfer to other
specialized domains beyond maritime education.
− Model optimization viability - experimentation
with QLoRA showed that billion-parameter
models such as Mistral-7B can run on
consumer-grade hardware, although the associated
performance degradation warrants further analysis.
This makes advanced language model capabilities
accessible to more researchers and developers,
potentially democratizing access to AI-enhanced
tools built for specific contexts such as ours.
− Prompt engineering importance - the design of
prompt templates strongly affects the quality of
generated feedback, with structured templates
achieving the highest ratings for STCW
compliance and educational value. This
underscores the importance of prompt engineering
when adapting LLMs to specialized contexts, in
education and beyond.
− Short essay analysis - analysing short essay
responses through answer diagnosis graphs
demonstrated the potential for automated
assessment beyond simple multiple-choice
questions. This extends the system beyond
clear-cut right/wrong answers, opening the door
to complex scenarios that are not only text-based
but potentially simulation-based, such as
comparing a trainee's behaviour in a physical
bridge simulator against regulatory requirements.
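The retrieve-then-prompt pattern behind the first and third findings can be sketched as follows. This is a minimal illustration only: the STCW excerpts, the token-overlap scoring heuristic, and the template wording are assumptions for the sketch, not the system's actual corpus, retriever, or prompts, and the generation step itself is omitted.

```python
# Sketch of retrieval-augmented, template-structured feedback prompting.
# Corpus snippets and template text are illustrative placeholders.

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, corpus, k=1):
    """Rank corpus snippets by naive token overlap with the query."""
    q = tokenize(query)
    scored = sorted(corpus, key=lambda s: len(q & tokenize(s)), reverse=True)
    return scored[:k]

# Structured template: fixed sections for regulatory context and answer.
PROMPT_TEMPLATE = """You are a maritime training assessor.
Regulatory context (STCW excerpt):
{context}

Trainee answer:
{answer}

Give feedback that cites the excerpt where relevant."""

def build_feedback_prompt(answer, corpus):
    """Ground the prompt in retrieved STCW text before generation."""
    context = "\n".join(retrieve(answer, corpus))
    return PROMPT_TEMPLATE.format(context=context, answer=answer)

corpus = [
    "STCW Table A-II/1: maintain a safe navigational watch using radar and lookout",
    "STCW Table A-VI/1: personal survival techniques and use of lifejackets",
]

prompt = build_feedback_prompt(
    "During the watch I relied on radar and a visual lookout.", corpus
)
```

In a real pipeline the overlap scorer would be replaced by a dense retriever over embedded STCW passages, and the assembled prompt would be passed to the fine-tuned model; the point here is only that the retrieved regulatory text is injected verbatim into a fixed template section.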
In summary, it was demonstrated that RAG-
enhanced LLMs can be implemented effectively for
maritime education, providing STCW-compliant
feedback in specific use cases. While such systems
are not ready to replace human instructors, they can
augment instructor capacity, potentially improving
the quality and accessibility of maritime education
worldwide.
REFERENCES
[1] S. Baradziej, T. E. Kim, and L. I. Magnussen.
“Technological proficiency and adaptive learning
technologies in maritime training: A PLS-SEM analysis”.
In: Maritime Policy & Management (2025). Under review.
[2] S. Baradziej, T. E. Kim, and L. I. Magnussen.
“Institutional Readiness for Adaptive Learning
Technologies in Maritime Education”. In: WMU Journal
of Maritime Affairs (2025). Under review.
[3] International Maritime Organization. International
Convention on Standards of Training, Certification and
Watchkeeping for Seafarers (STCW). International
Maritime Organization, 2011.
[4] Z. Qi, R. Xu, Z. Guo, C. Wang, H. Zhang, and W. Xu.
“Long2RAG: Evaluating Long-Context Long-Form
Retrieval-Augmented Generation with Key Point Recall”.
In: Findings of the Association for Computational
Linguistics: EMNLP 2024 (2024), pp. 4852–4872.
[5] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S.
Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G.
Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P.
Stock, T. Le Scao, T. Lavril, T. Wang, T. Lacroix, and W.
El Sayed. “Mistral 7B”. In: arXiv preprint
arXiv:2310.06825 (2023).
[6] T. Dettmers, A. Pagnoni, A. Holtzman, and L.
Zettlemoyer. “QLoRA: Efficient finetuning of quantized
LLMs”. In: Advances in Neural Information Processing
Systems. Vol. 36. 2023.
[7] E. Kasneci, K. Sessler, S. Küchemann, M. Bannert, D.
Dementieva, F. Fischer, U. Gasser, G. Groh, S.
Günnemann, E. Hüllermeier, et al. “ChatGPT for good?
On opportunities and challenges of large language
models for education”. In: Learning and Individual
Differences 103 (2023), p. 102274.
[8] T. Y. Kung, P. Chen, G. Cheng, T. Sedoc, and C. Callison-
Burch. “Performance of ChatGPT on USMLE: Potential
for AI-assisted medical education using large language
models”. In: PLOS Digital Health 2.2 (2023), e0000198.
[9] C.-H. Chiang and H.-y. Lee. “Can Large Language Models
Be an Alternative to Human Evaluation?” In: Proceedings
of the 61st Annual Meeting of the Association for
Computational Linguistics. 2023, pp. 15607–15631.
[10] Q. Collaborative. “A framework for human evaluation of
large language models in healthcare derived from
literature review”. In: NPJ Digital Medicine 7.1 (2024), pp.
1–12. doi: 10.1038/s41746-024-01086-9.
[11] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N.
Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S.
Riedel, and D. Kiela. “Retrieval-augmented generation
for knowledge-intensive NLP tasks”. In: Advances in
Neural Information Processing Systems. Vol. 33. 2020, pp.
9459–9474.
[12] T. Zhang, S. G. Patil, et al. “RAFT: Adapting Language
Model to Domain Specific RAG”. In: (2024).
[13] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang,
L. Wang, and W. Chen. “LoRA: Low-rank adaptation of
large language models”. In: International Conference on
Learning Representations. 2022.
[14] G. Emad and W. M. Roth. “Contradictions in the
practices of training for and assessment of competency: A
case study from the maritime domain”. In: Education +
Training 50.3 (2008), pp. 260–272.
[15] C. Sellberg. “Simulators in bridge operations training
and assessment: a systematic review and qualitative
synthesis”. In: WMU Journal of Maritime Affairs 16.2
(2017), pp. 247–263.