Abstract: Tactical decision-making lies at the core of officer education, yet current training methods remain constrained by structural and pedagogical limitations. Traditional wargames offer valuable experience but are organisationally difficult, time-intensive, and often emphasise flows of play over cognitive decision-making. Tactical Decision Games (TDGs) partially address this gap, but their effectiveness is limited by instructor capacity, single-point decisions, limited scenario diversity, and significant instructor-to-trainee bias. Recent advances in large language models (LLMs) and conversational agents offer a promising avenue to mitigate these shortcomings. AI-driven platforms could generate numerous, varied, and realistic tactical scenarios; support multi-step decision cycles with adaptive adversaries; visualise battlefield dynamics; and provide transparent, personalised feedback.
Problem statement: What are the success factors ensuring that an AI-enabled TDG platform can properly support the tactical decision-making training of junior officers?
So what?: Similar to how repeated exposure to challenging levels in video games can develop procedural “muscle memory,” iterative engagement with tactical scenarios can strengthen decision-making under pressure. The effectiveness of AI-enabled TDG platforms depends on key design factors. At the system level, this includes structured interaction, controlled abstraction, and constrained AI behaviour, defined jointly by military educators and developers. At the user level, such platforms enable cadets to train decision-making repeatedly and independently, maximising learning within a limited time and fostering the development of decision-making patterns comparable to cognitive “muscle memory.”

Wargaming, PME and AI
Improvements in professional military education (PME) enabled by technology have been plentiful since the Cold War. Combat simulators of increasing complexity have been made available, allowing formations to train decision-making and processes, such as ELTAM in Switzerland, and for cadet officers to experiment with the complexity of the modern battlefield. Digitalised wargames have also proliferated, representing, with increasing detail, how air, naval and ground operations could be conducted, with franchises such as Command: Modern Operations or Flashpoint Campaigns.[1], [2], [3] Yet three issues remain with those platforms, as Sabin raises.[4] First, they are not necessarily more expedient than traditional wargames when it comes to mastering them. Second, they lack the flexibility required by officer training; scenarios can only be customised within preset program boundaries. Third, digitisation of wargaming requires computer infrastructure that is not readily available to all PME institutions. Additionally, digitised wargames focus on a narrow set of military tasks, do not provide the instructor with feedback, and have a largely opaque adjudication system.
Artificial intelligence (AI) and conversational agents, in particular, offer new perspectives for PME: an AI-based platform could provide a wide variety of scenarios and personalised feedback, and avoid lengthy tutorials thanks to built-in explanations. Yet, just as computers were not inherently good educational tools, neither is AI on its own.
Wargaming and PME
The literature on PME converges on one idea: tactical decision-making training is effective when the tools (TDG, wargames, serious games) are designed as comprehensive learning activities, specifically linked to target skills, framed by a pedagogical process and robustly evaluated. This framework clarifies the requirements for a TDG platform that supports diverse scenarios and multiple decision cycles, with integrated evaluation mechanisms.
Wargaming is a proven tool that reinforces learning through lived experiences.[5] Its potency and success are tied to the compelling, immersive narratives it creates, bringing the player into the “magic circle” where the suspension of disbelief yields visceral, authentic command experiences.[6] Given that time is a frequent constraint, Perla advocates PME wargames that are simple, lively, accessible to non-wargamers, illustrative of good practice, diverse in scenarios, and available to personnel.[7] Furthermore, curricula must combine frequent wargames, exercises, analysis and other educational tools for maximal effect.[8]
At the program level, Enstad and Hagen highlight the heterogeneous nature of 21st-century PME and the lack of a common understanding of how junior officers’ education should be organised and what skills are expected of them.[9] Best practice in PME encourages activities structured into three phases (planning, delivery, post-delivery) and oriented towards critical thinking, openness, and diversity of opinion, as argued by Goode.[10] Thus, every learning outcome needs to demonstrate alignment among objectives, activities, and assessments, and be subject to systematic debriefing to ensure quality and fair feedback. Learning outcomes are usually organised by levels of learning in higher education and require specific types of wargames.
According to Kollars and Rosen, educational wargames can take several forms, but the most relevant use-cases for an AI platform are formative-illustrative ones, that is, wargames focused on learning concepts by actively applying them to specific cases.[11] Furthermore, Fowler links levels of learning in higher education to wargame types, assessing that undergraduate studies are best supported by experiential wargames that foster the understanding and application of ideas.[12]
Combe reasons that the literature still lacks a holistic approach to designing and integrating wargames as tools for adult education, and he therefore frames educational wargaming as a form of experiential learning grounded in Kolb and Kolb’s theory.[13], [14] On this view, learning in wargaming arises across preparation, conduct, and debriefing, as different phases engage different combinations of concrete experience, reflection, conceptualisation, and experimentation, and thus resonate with different learning styles.[15], [16]
Walters reinforces this logic by pointing out that integrating wargaming into PME requires formulating learning outcomes, choosing, modifying or designing appropriate games (often with expert support), and anticipating recurring implementation pitfalls.[17] Moreover, he argues that repeating decisions in wargames builds confidence in decision-making and a bias toward action, while exposing learners to constructive criticism and feedback. Priority must be given to military judgement (problem framing, mental imaging, critical thinking, reasoning) and quick decision-making with incomplete or contradictory information, rather than planning and products.[18], [19] Overvaluing products at the expense of decision-making must be avoided during execution, especially when plans become obsolete upon contact with adversaries.
Walters further distinguishes between the contribution of TDGs and decision-forcing cases, which are useful for practising the formulation of estimates, orders and rationales, and that of wargames, whose strength lies in the density of micro-decisions in a continuously unfolding situation, itself conducive to developing ease and confidence.[20] The envisioned platform requires scenarios with multiple decision cycles, combining elements of role-playing games for immersion, TDGs for decision-forcing scenarios and solitaire games to reduce the need for instructor input.[21] This allows successive decisions to be fed into a reasoning-focused debriefing.
Ultimately, the credibility of the tool depends on evaluation and adoption by instructors. Kuehn argues that there is little research on evaluating learning in complex environments. She identifies six challenges in wargame assessment: gamesmanship, lack of control, multiple roles of the teacher, receptivity to feedback, evaluation of individuals in teams, and fairness.[22] Elg argues wargames require instructors’ buy-in, as their acceptance of wargaming (or lack thereof) trickles into the audience.[23] In other words, explicit criteria, stable rubrics, usable game traces to justify feedback, and mechanisms to reduce bias are required to overcome these challenges. Concurrently, instructor scepticism must be overcome by allowing for customisation, oversight and familiarity.
AI in Wargaming
Classical forms of wargaming may be insufficient to represent the complexity of modern-day fighting; Bojor and Grigore argue that games benefit from greater realism, enabled by digital game-based learning.[24] Hogan and Brennen see generative AI as a potent tool for scenario generation, adjudication, and post-game analysis, with software architectures enabling exclusively human, semi-automated, and fully automated handling.[25] Some practitioners are already using LLMs for scenario generation and adjudication, although the latter raises questions about datasets, AI training methods, and user acceptance of the tool.[26], [27] AI can also act as a credible opponent or “sparring partner” when provided with sufficient data.[28] An AI platform can do so at scale, increasing replayability.[29] Knack and Powell balance the benefits and risks of AI in wargaming, advocating for pragmatic use with methodological validation and safeguards (e.g., AI generation followed by human refinement).[30] The benefits of AI are thus clear, but technical feasibility needs to be considered.
Recent literature suggests that tactical conversational agents require more than general linguistic competence. Such agents must be aligned with tasks through explicit intents, constraints and procedures, while remaining robust in environments characterised by uncertainty, variations and adversarial interaction.[31] Classical dialogue architectures based on an NLU/NLG pipeline, combining intent classification and slot filling, have demonstrated their usefulness when communicative acts are well defined and when organisational control and auditability are required.[32]
Specialising LLMs for PME use requires consistency, accuracy and relevance, which several techniques can support. Retrieval-augmented generation (RAG) retrieves relevant information from external databases, such as doctrine or relevant literature, before generation, grounding outputs in identifiable sources and thereby helping to reduce hallucinations. Structured narrative and role-play prompting work by instructing LLMs to mimic a role or operate within a specified narrative, thus structuring model behaviour. Finally, self-refinement, in which a model iteratively critiques and revises its own output, can support scenario generation, notably when scenarios are decomposed into smaller tasks, increasing the usefulness of outputs.[33] For decision-making tasks, hybrid architectures that combine LLMs with explicit environmental representations and validation mechanisms further enhance reliability and verifiability.[34]
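The retrieval step of RAG can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus and ranks passages by simple word overlap, which stands in for the embedding-based search a real system would more likely use. The passage texts, the source labels (DOC-A, DOC-B) and all function names are invented placeholders, not actual doctrine or an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical citation label, not a real doctrine reference
    text: str


def tokenize(text: str) -> set[str]:
    """Lower-cased words longer than three characters, punctuation stripped."""
    return {w.strip(".,;:()?!").lower() for w in text.split() if len(w) > 3}


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the query (stand-in for vector search)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query: str, passages: list[Passage]) -> str:
    """Prepend the retrieved passages, with labels, so the answer can cite them."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        "Answer using only the sources below and cite them by label.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


# Invented placeholder passages for illustration only.
corpus = [
    Passage("DOC-A 3.2", "A hasty defence trades preparation time for speed of occupation."),
    Passage("DOC-A 4.1", "Reserves are committed to exploit success or restore the defence."),
    Passage("DOC-B 1.4", "Obstacle belts channel the attacker into engagement areas."),
]
question = "When should reserves be committed in the defence?"
hits = retrieve(question, corpus)
prompt = build_prompt(question, hits)
```

Because every passage carries a label, the generated answer can be checked against the sources that were actually supplied to the model.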
However, data availability remains a major limitation. The scarcity of realistic public datasets for tactical applications constrains the development and evaluation of conversational agents in military contexts. WGSR-Bench contributes through a strategic reasoning benchmark inspired by wargames and structured around situation understanding, adversary modelling and policy generation.[35] Although not a dialogue dataset, it captures uncertainty and antagonism more representative of TDG than purely textual benchmarks. For military dialogue, most realistic datasets remain proprietary and inaccessible. Consequently, open contributions focus on methodological aspects rather than data release. For example, Chuang & Cheng illustrate how synthetic military dialogues can be systematically designed and annotated for intent classification and response generation without relying on operational data.[36] Similarly, recent work on LLMs in wargaming and decision simulation discusses methodologies, use cases, and robustness considerations, but generally does not provide reusable datasets.[37] Taken together, these studies suggest that generating controlled synthetic data with domain-expert involvement constitutes a pragmatic alternative when high-fidelity public datasets are unavailable.
Beyond technical considerations, the deployment of conversational agents in military organisations raises challenges related to governance and integration into existing processes. In training and simulation environments, such agents often replace human role players. They must therefore maintain strict consistency in role, terminology and contextual awareness, while retaining the ability to request clarification when information is incomplete or ambiguous.[38] Furthermore, studies on technology-focused wargaming highlight a structural tension between generating exploratory insights and ensuring the replicability and comparability of results.[39] For LLM-based tactical dialogue systems, this tension translates into the need for systematic logging, source traceability—such as RAG approaches with explicit citations—and regression testing on standardised scenarios.
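As a rough illustration of systematic logging and regression testing, the sketch below keeps an append-only run log with optional source labels and replays standardised prompts against an agent. The agent is a stub function standing in for an LLM call, and every name, message and label here is a hypothetical choice made for the example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable


@dataclass
class LogEntry:
    turn: int
    role: str  # e.g. "player", "opponent", "control"
    message: str
    sources: list[str] = field(default_factory=list)  # citation labels, if any


class RunLog:
    """Append-only record of one game run, exportable as JSON lines."""

    def __init__(self) -> None:
        self.entries: list[LogEntry] = []

    def record(self, entry: LogEntry) -> None:
        self.entries.append(entry)

    def uncited(self, role: str) -> list[LogEntry]:
        """Entries by `role` that carry no source, flagged for human review."""
        return [e for e in self.entries if e.role == role and not e.sources]

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)


def regression_failures(agent: Callable[[str], str],
                        cases: dict[str, list[str]]) -> list[str]:
    """Return the standard prompts whose reply misses a required term."""
    failures = []
    for prompt, required in cases.items():
        reply = agent(prompt).lower()
        if not all(term in reply for term in required):
            failures.append(prompt)
    return failures


def stub_agent(prompt: str) -> str:
    # Stand-in for the LLM call; a real agent would be queried here.
    return "Request clarification: the grid reference is missing."


log = RunLog()
log.record(LogEntry(1, "player", "Platoon moves to the ridge."))
log.record(LogEntry(1, "control", "Movement takes 20 minutes.", ["DOC-A 2.3"]))
log.record(LogEntry(2, "control", "The bridge is destroyed."))

# Standardised case: an order without a location should trigger a clarification.
cases = {"Move to the objective.": ["clarification"]}
failures = regression_failures(stub_agent, cases)
```

Term matching is a deliberately crude check; it only shows where a stricter, rubric-based comparison would plug in.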
Overall, the literature indicates that LLM-based conversational agents can support multi-turn interaction, contextual reasoning and controlled behaviour when appropriately constrained and integrated. Where reinforcement learning is used, training policies need to prioritise long-term rewards so that sequential decisions are optimised through trial and error.[40] However, significant gaps remain, notably the absence of public high-fidelity datasets, shared metrics for evaluating decision quality beyond linguistic correctness, and robust mechanisms for auditability and operation under incomplete information. Addressing these gaps requires the development of standardised TDGs, structured annotation schemes and continuous red-teaming practices.
Characteristics of the Platform
The operational and educational need has been identified: design a wargame for junior officers that delivers and reads both text and images, runs simulations lasting several decision cycles, and uses an interface that allows cadets to make decisions by analysing the mission and operating environment. This AI platform sits between TDGs and role-playing games, except that it is played in solitaire and provides personalised feedback after each run. Pedagogically, the game is formative-illustrative and focuses on understanding and applying doctrine and TTPs (tactics, techniques and procedures). Its scenarios are varied, personalised and subject to human safeguards, ensuring replayability and consistency with learning outcomes.
For a junior officer, the use of the platform would begin with a mission briefing that combines written orders, maps, and visual information about the operating environment. The officer would analyse the mission, assess terrain, enemy and own forces, and then make a series of tactical decisions across several simulated decision cycles, with each choice shaping the next situation presented by the system. After the run, the platform would provide personalised feedback showing how the officer’s decisions related to doctrine and TTPs, highlighting both strengths and errors in judgment. Repeated play through varied but pedagogically controlled scenarios would allow the officer to practise applying concepts to concrete cases while receiving consistent formative feedback.
Since the literature on wargames is abundant and scattered, identifying success factors for development and employment is not straightforward. Without clear requirements, the design risks producing a system that is overly complex, insufficiently realistic or of little educational value.
Operationalising Requirements
To design this platform, the chosen approach is qualitative, with support from an LLM. The game is open-ended: it is unconstrained by closed decision trees and emerges through interactions shaped by rules, roles and narrative context. LLMs can facilitate adjudication, enable large-scale replayability, and provide qualitative logic, unlike earlier models, which relied on discretised, “quantified” formats.[41] Repeated runs support wargaming as a tool for knowledge creation, transfer, and learning. At the same time, multiple decision cycles help surface implicit assumptions, decision points, and second-order effects that would otherwise remain invisible.[42], [43]
Rather than positioning players against AI, the agent is assigned an educational role as a calibrated sparring partner and coach, providing structured feedback, points of attention, and alternatives. To prevent inconsistent or unjustifiable adjudication, a frequent issue with generative systems in wargaming, control mechanisms such as rules, logging, and arbitration criteria are embedded.[44], [45] A reflective, LLM-based agent is therefore valuable insofar as it enables more autonomous, yet auditable and consistent, operation over time.
Initially, this LLM is assumed to lack domain-specific knowledge about wargaming, rules, doctrine, and educational objectives. This knowledge must be constructed through the agent architecture, namely, memory, reflection, and safeguards, and through structured scenarios and data. The following parts identify three sets of technical requirements necessary to turn the platform into a useful PME tool. First, the initial setup is covered, shaping how the LLM understands and interacts with its environment. Second, scenario generation is covered. Third comes analysis, including conduct of the game, feedback and logging. Taken together, these requirements lay the groundwork for the successful development of said platform.
Agent Structure, Uncertainty and Action
Three roles are required: a Game Control, which acts as a referee; an Opponent; and a Player. The Game Control and the Opponent are LLM-powered; the Player is the sole human-in-the-loop and the target audience. The two AI agents support learning by making interactions between the Player and the Opponent fair, while also improving security and auditability by limiting the Opponent’s autonomy. Realistic wargaming involves information asymmetry: each side observes only a subset of the world state and therefore makes decisions under partial observability. Consequently, the Opponent reasons over a belief state, a structured representation of what it thinks is true, rather than the full underlying state. To constrain the Opponent without dictating its tactics, two cumulative architectural choices are required: 1) side-specific memory and traceability of observations and disclosures, and 2) consistent update mechanisms to model deception, uncertainty, and delays.[46] The former provides a per-side, provenance-aware record of what was observed or disclosed, while the latter uses that record to revise each side’s beliefs under deception, uncertainty, and delays. The Game Control maintains ground truth and controls what information each side receives and when. The Opponent acts within this fog of war and must thus make decisions under the same conditions as the Player, i.e., with partial information and risk of mistakes. To maintain continuity and internal coherence for the Game Control and the Opponent, agentic schemes that couple reasoning and action are used, for example, Reason+Act (ReAct), with reflection loops involving self-critique and revision. The ReAct framework enables AI agents to alternate between reasoning and acting, so that observations from each action inform the next reasoning step.
For wargaming, these structures are useful because they organise decisions into episodes, moving from observation to hypothesis to action to feedback to adjustment, and provide a basis for auditable adjudication and controlled correction of inconsistencies.[47]
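The separation between ground truth and per-side beliefs can be sketched as follows. This is a simplified illustration under stated assumptions: a grid world, a single sensor-range rule for disclosure, and a belief that is simply the latest reported position per unit. Deception and reporting delays are left out, and all class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    turn: int
    unit: str
    position: tuple[int, int]
    provenance: str  # how the information was obtained


@dataclass
class SideMemory:
    """Per-side record of what was disclosed, plus the belief derived from it."""
    observations: list[Observation] = field(default_factory=list)

    def belief(self) -> dict[str, tuple[int, int]]:
        """Latest reported position per unit; it may be stale."""
        latest: dict[str, Observation] = {}
        for obs in self.observations:
            known = latest.get(obs.unit)
            if known is None or obs.turn >= known.turn:
                latest[obs.unit] = obs
        return {unit: obs.position for unit, obs in latest.items()}


class GameControl:
    """Holds ground truth and decides what each side gets to see."""

    def __init__(self, truth: dict[str, tuple[int, int]], sensor_range: int) -> None:
        self.truth = truth
        self.sensor_range = sensor_range
        self.memory = {"player": SideMemory(), "opponent": SideMemory()}

    def disclose(self, side: str, observer: tuple[int, int], turn: int) -> None:
        """Reveal only units within sensor range (Manhattan distance) of the observer."""
        for unit, pos in self.truth.items():
            distance = abs(pos[0] - observer[0]) + abs(pos[1] - observer[1])
            if distance <= self.sensor_range:
                self.memory[side].observations.append(
                    Observation(turn, unit, pos, provenance="direct observation"))


control = GameControl({"blue_1": (2, 2), "blue_2": (9, 9)}, sensor_range=3)
control.disclose("opponent", observer=(3, 3), turn=1)
control.truth["blue_1"] = (5, 5)  # the unit moves without being observed
opponent_belief = control.memory["opponent"].belief()
```

In this run the Opponent still believes blue_1 is at its last observed position and knows nothing of blue_2, which is the kind of gap between belief and ground truth that the Game Control is meant to arbitrate.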
Generation Requirements: Threats, Scenario Diversification and Briefings
To produce plausible scenarios, the platform must draw on a knowledge base covering threat taxonomies, doctrinal frameworks, and contextual constraints (terrain, infrastructure, and rules), as well as the political and societal environment. Work on AI in wargaming notes that the expected gains in speed of preparation, branch diversity, and analysis depend heavily on the quality of the framing and reference data. A key challenge is keeping scenarios demanding without drifting into the fanciful. A potential solution is a hybrid generation mechanism: first, controlled LLM generation based on structured “threat files” creates grounded scenarios, which are then verified against official threat descriptions; second, those scenarios are expanded and diversified through variations and injections, thereby covering a wider set of cases.
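The two-step mechanism (grounded generation checked against a threat file, then diversification) can be sketched as below. The threat file contents, field names and value ranges are invented for illustration, and the LLM generation step is replaced by a hand-written base scenario.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Scenario:
    threat: str
    terrain: str
    enemy_strength: int  # number of platoon-sized elements (illustrative unit)
    weather: str = "clear"


# Hypothetical threat files: allowed terrain and strength range per threat type.
THREAT_FILES = {
    "mechanised raid": {"terrain": {"open", "urban fringe"}, "strength": (2, 4)},
    "sabotage cell": {"terrain": {"urban", "urban fringe"}, "strength": (1, 1)},
}


def violations(s: Scenario) -> list[str]:
    """List every way the scenario departs from its threat file."""
    spec = THREAT_FILES.get(s.threat)
    if spec is None:
        return [f"unknown threat '{s.threat}'"]
    problems = []
    if s.terrain not in spec["terrain"]:
        problems.append(f"terrain '{s.terrain}' not plausible for {s.threat}")
    low, high = spec["strength"]
    if not low <= s.enemy_strength <= high:
        problems.append(f"strength {s.enemy_strength} outside {low}-{high}")
    return problems


def diversify(base: Scenario, weathers: list[str], strengths: list[int]) -> list[Scenario]:
    """Vary weather and strength, keeping only variants that still pass the check."""
    variants = (replace(base, weather=w, enemy_strength=n)
                for w, n in product(weathers, strengths))
    return [v for v in variants if not violations(v)]


base = Scenario("mechanised raid", "open", 3)  # stand-in for an LLM-generated draft
variants = diversify(base, ["clear", "fog"], [1, 3, 4])
rejected = violations(Scenario("sabotage cell", "open", 5))
```

Running the same check on every variant means diversification cannot quietly reintroduce an implausible combination that the first step would have rejected.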
Contemporary threats are evolving rapidly, particularly due to dual-use technologies. For an educational platform, this requires recognising the multiple end-uses of a technology and generating plausible uses without crossing the threshold of undue operational assistance. In practice, a “constrained creativity” approach could address this issue: the agent proposes threat developments based on categories such as capability, intent, opportunity, logistical constraints, and expected effects, then submits the proposals to human safeguards responsible for checking consistency. This need for robustness is even more important given that LLMs can be sensitive to misleading inputs and generation errors if controls are insufficient.
Furthermore, modern conflicts feature many non-military facets, such as civilian actors, informational spaces, infrastructure, and cyber, particularly in the context of so-called “hybrid” threats. Credible modelling, therefore, requires agent roles capable of heterogeneous objectives, legal and political constraints, and distinct rationalities. Generative AI enables iteration on existing actor profiles across several runs, with memory, preferences, resources, and plausible social interactions, while maintaining strong control over objectives and constraints.
The platform must generate mission briefings that are consistent with the level of play; for example, by formulating a brief order in an institutional format, such as the orient-intention-mission format. The objective is not to “provide the solution,” but to provide a usable context: mission, constraints, resources, time frames, assumptions, and frictions. This module also serves to calibrate the assessment: an initial order that is too vague reduces the educational value, while one that is too directive biases it.
Analysis Requirements: Narrative Consistency, Debriefings and Iterations
Once the scenario has been generated, it must unfold consistently through causal continuity, plausible reactions, realistic tempo, and friction. Qualitative wargames rely on narrative adjudication; thus, the challenge for an LLM is stable and non-contradictory progression over several cycles. Multi-agent automation is possible, but requires control mechanisms such as roles, instructions, logs, and post-analysis.[48] Reference frameworks such as doctrine and TTPs can help the platform in adjudicating decisions.
Over-generalisations or AI hallucinations pose additional risks. These undermine reliability and credibility but can be remedied through validation protocols such as human review and grounding. This can be achieved through a three-output evaluation loop: 1) diagnosis of what is consistent, fragile, or missing; 2) reasoned improvements offering alternative options plus trade-offs; and 3) explicit no-gos identifying constraint violations, major inconsistencies, and uncovered risks. This scheme is in line with the “AI-in-wargaming” recommendations, emphasising actionable and traceable analysis tools rather than simple verdicts.[49]
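One possible shape for the three-output evaluation loop is sketched below. The rule checks are deliberately simplistic placeholders (required plan elements, a "tbc" marker for fragile entries, a set of forbidden routes) and are assumptions made for the example rather than doctrinal criteria.

```python
from dataclasses import dataclass, field

# Hypothetical list of elements a plan is expected to contain.
REQUIRED = ("mission", "main_effort", "reserve", "timings")


@dataclass
class Evaluation:
    diagnosis: dict[str, list[str]] = field(
        default_factory=lambda: {"consistent": [], "fragile": [], "missing": []})
    improvements: list[tuple[str, str]] = field(default_factory=list)  # (option, trade-off)
    no_gos: list[str] = field(default_factory=list)

    @property
    def needs_human_review(self) -> bool:
        return bool(self.no_gos)


def evaluate(plan: dict[str, str], forbidden_routes: set[str]) -> Evaluation:
    ev = Evaluation()
    # Output 1: diagnosis of what is consistent, fragile, or missing.
    for key in REQUIRED:
        value = plan.get(key, "")
        if not value:
            ev.diagnosis["missing"].append(key)
        elif value.lower().startswith("tbc"):
            ev.diagnosis["fragile"].append(key)
        else:
            ev.diagnosis["consistent"].append(key)
    # Output 2: reasoned improvements, each paired with its trade-off.
    if "reserve" in ev.diagnosis["missing"]:
        ev.improvements.append(
            ("hold one section in reserve", "less combat power forward"))
    # Output 3: explicit no-gos for constraint violations.
    if plan.get("route") in forbidden_routes:
        ev.no_gos.append(f"route '{plan['route']}' violates a stated constraint")
    return ev


plan = {"mission": "seize bridge", "main_effort": "north axis",
        "timings": "tbc", "route": "minefield east"}
result = evaluate(plan, forbidden_routes={"minefield east"})
```

Keeping the three outputs as separate fields makes each piece of feedback traceable to the check that produced it.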
When a game run is over, a debriefing is required. Yet LLM-based agents often remain opaque in their decision-making, raising an educational challenge: how can a Player understand what option is preferable and under which conditions? Additionally, high-stakes contexts demand systems that are interpretable by design rather than reliant on potentially misleading explanations produced after the fact.[50] Making the system’s reasoning legible can address these issues by explaining the relevant signals, linking them to applicable rules, indicating uncertainty, and stating the conditions under which the output is valid. The debriefing then provides calibrated confidence and helps avoid both automation bias and participant mistrust.
Finally, one must ensure the platform can be reused. Multiple iterations allow for different courses of action to be explored, bifurcations to be compared, and invariants to be identified. The open-ended nature of the platform enables this by multiplying runs, while data capture through logs, decisions, and justifications allows traceability for the agent and the Player. Consequently, the platform provides intelligent tutoring consistent with the literature: adaptive systems can produce measurable learning gains when they provide individualised and iterative feedback.
Assessment of Key Success Factors
Requirements for a useful TDG platform are numerous, but three stand out. The first is a multi-agent architecture that allows partial observability and simultaneous analysis of each iteration. The second is control and oversight of the generation process; contemporary scenarios call for multiple actor types operating within concurrent domains, and these can be difficult for unconstrained LLMs to generate credibly without human and internal controls. The third is that analysis and LLM interactions with Players must account for existing military concepts; otherwise, they risk drifting into hallucinations or weak feedback bereft of useful substance.
Technical Challenges
This section analyses the technical feasibility of developing the previously discussed platform, focusing on its suitability for PME contexts rather than on the technology’s full maturity.[51], [52], [53]
What is Already Technically Possible?
From a technological perspective, several fundamental elements of the proposed platform are already available or sufficiently mature to support experimental scenario-based training applications without requiring bespoke AI development.[54], [55] Recent research shows that LLMs can enhance wargaming and digital game-based learning across multiple phases, such as scenario design, adjudication, and debriefing, when they are carefully constrained and embedded within a structured pedagogical framework. AI systems trained or adapted for operational and tactical contexts, including military and security settings, already exist. However, these systems are usually developed in closed environments with access to sensitive or classified data and are thus not transferable or replicable in open training contexts. Their existence demonstrates technical feasibility while highlighting limitations in transparency, auditability and scalability.[56]
In a PME context, AI systems can already be provided with a broad set of non-sensitive inputs, such as public military doctrine, TTPs, standard operating procedures, formalised decision-making processes, order formats and conceptual models. These elements allow the generation of credible and coherent scenarios without reliance on classified information, a distinction heavily emphasised in the literature on AI-enabled wargaming; however, scenarios generated in this way may carry over the institutional biases embedded in the non-sensitive inputs on which they rely.[57], [58] If this issue is not explicitly addressed, the analytical relevance of engagement data risks becoming marginal due to circular reasoning. With that caveat, AI can already function as a training counterpart to the Player, provided that its scope of application and level of abstraction are explicitly defined.
Multi-Turn Interaction as a Central Challenge
While the importance of multi-turn decision cycles has been established earlier, the technical limitation lies in the absence of explicit world-state representations within LLMs, which necessitates external state management. This logic is reflected in frameworks such as Reason+Act (ReAct) and Reflexion, which interleave reasoning, action, and critical feedback, thereby improving performance on multi-step problem-solving tasks.[59] For wargaming, their relevance lies less in improved autonomy than in structuring sequential decisions under uncertainty.
In a PME context, however, such reflective mechanisms should not be interpreted as a step towards full AI autonomy. Rather, their primary value lies in improving the consistency of agent behaviour under controlled conditions. Empirical studies on LLM-based wargaming systems indicate that, despite these architectural advances, maintaining coherence across multiple interaction turns remains challenging, particularly when systems operate without explicit representations of the evolving scenario state.[60]
From a technical standpoint, multi-turn interactions are therefore partially feasible but subject to structural limitations. While LLMs can retain conversational context across several decision cycles, they tend to degrade over time: relevant information may be lost, earlier decisions may be reinterpreted inconsistently, and initial constraints may be progressively ignored.[61] These issues worsen in tactical environments, where state evolution follows causal relationships rather than purely narrative logic, a challenge also highlighted in recent benchmarking efforts on strategic reasoning under uncertainty.[62] The core limitation lies in the absence of an explicit representation of the simulated world state within the LLM itself. Consequently, entrusting the model alone with managing scenario dynamics entails a high risk of inconsistency.
A technically realistic solution is to integrate the LLM with an external state management system that tracks objective variables such as unit positions, resources, timing, and intermediate outcomes. In this configuration, partial visualisation can be provided to the cadet through a constrained and abstract interface—for example, a static or semi-static map combined with a limited set of symbolic objects representing units, terrain features, and key events. Such visuals do not aim to simulate the full environment, but rather to externalise the shared state of the scenario, supporting situational awareness while preserving the exercise’s primarily cognitive and decision-focused nature.
Within this architecture, the AI does not autonomously determine the objective state of the simulated environment. Even if multiple AI roles are implemented (e.g., opponent, control, or feedback functions), the underlying state transitions remain governed by the external state management system. The AI, therefore, operates primarily as a controlled interaction and narrative layer: it interprets and communicates state changes produced by the simulation core, ensuring narrative continuity, qualitative feedback, and pedagogical coherence rather than independently generating outcomes.
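A minimal sketch of this division of labour, assuming a grid map and a single fuel constraint: the state manager alone validates and applies orders, while a template function stands in for the LLM narrative layer and only describes the result. Unit names, the fuel rule and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WorldState:
    turn: int = 0
    positions: dict[str, tuple[int, int]] = field(default_factory=dict)
    fuel: dict[str, int] = field(default_factory=dict)


@dataclass(frozen=True)
class MoveOrder:
    unit: str
    target: tuple[int, int]


class StateManager:
    """Sole owner of objective variables; every transition is validated here."""

    def __init__(self, state: WorldState) -> None:
        self.state = state
        self.history: list[tuple[int, MoveOrder, str]] = []

    def apply(self, order: MoveOrder) -> str:
        s = self.state
        if order.unit not in s.positions:
            result = "rejected: unknown unit"
        else:
            x, y = s.positions[order.unit]
            cost = abs(order.target[0] - x) + abs(order.target[1] - y)
            if cost > s.fuel[order.unit]:
                result = "rejected: insufficient fuel"
            else:
                s.positions[order.unit] = order.target
                s.fuel[order.unit] -= cost
                result = "moved"
        s.turn += 1
        self.history.append((s.turn, order, result))
        return result


def narrate(order: MoveOrder, result: str) -> str:
    # Stand-in for the LLM narrative layer: it describes, it never changes state.
    if result == "moved":
        return f"{order.unit} reaches {order.target}."
    return f"{order.unit} cannot comply ({result.split(': ')[1]})."


manager = StateManager(WorldState(positions={"alpha": (0, 0)}, fuel={"alpha": 5}))
first = manager.apply(MoveOrder("alpha", (2, 2)))
second = manager.apply(MoveOrder("alpha", (5, 5)))
report = narrate(MoveOrder("alpha", (5, 5)), second)
```

Because the narrative function receives only the order and the verdict, a fluent but wrong description cannot alter unit positions or resources.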
At the same time, the AI is responsible for developing the opponent’s responses within the constraints defined by the current state, generating plausible adversarial behaviour without directly altering objective variables. Feedback can be delivered either synchronously at predefined decision points or asynchronously as intermediate updates, depending on the exercise’s pedagogical design. Alternatively, or in combination with external state management, multi-turn interaction can be structured through predefined decision-making phases inspired by formalised military processes. Structuring interaction into phases such as analysis, decision, execution, and evaluation reflects established principles of educational wargaming, where learning emerges from guided decision cycles rather than from unconstrained play.[63], [64], [65] This approach deliberately limits generative freedom while increasing traceability and educational value.
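The phased structure can be expressed as a small state machine, sketched here under the assumption that each phase admits a fixed set of action types. The action names are invented for illustration; a real platform would derive them from its own interface.

```python
from enum import Enum


class Phase(Enum):
    ANALYSIS = "analysis"
    DECISION = "decision"
    EXECUTION = "execution"
    EVALUATION = "evaluation"


ORDER = list(Phase)  # Enum members iterate in definition order

# Hypothetical action types permitted in each phase.
ALLOWED = {
    Phase.ANALYSIS: {"ask_question", "request_map"},
    Phase.DECISION: {"issue_order"},
    Phase.EXECUTION: {"report"},
    Phase.EVALUATION: {"request_feedback"},
}


class DecisionCycle:
    """Tracks the current phase and records every attempted action."""

    def __init__(self) -> None:
        self.phase = Phase.ANALYSIS
        self.cycle = 1
        self.trace: list[tuple[int, str, str, bool]] = []

    def act(self, action: str) -> bool:
        """Accept the action only if the current phase permits it; log either way."""
        accepted = action in ALLOWED[self.phase]
        self.trace.append((self.cycle, self.phase.value, action, accepted))
        return accepted

    def advance(self) -> Phase:
        """Move to the next phase; a new cycle starts after evaluation."""
        index = ORDER.index(self.phase)
        if index == len(ORDER) - 1:
            self.cycle += 1
        self.phase = ORDER[(index + 1) % len(ORDER)]
        return self.phase


game = DecisionCycle()
too_early = game.act("issue_order")  # rejected: still in the analysis phase
game.advance()
on_time = game.act("issue_order")    # accepted in the decision phase
for _ in range(3):
    game.advance()                    # execution, evaluation, then a new cycle
```

Rejected actions are logged as well as accepted ones, so the trace shows where a cadet tried to decide before analysing.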
Data, Operational Domains and Their Origin
A critical constraint is the limited availability of operational data, as most realistic military data is classified and cannot be directly used in AI-enabled wargaming.[66], [67] As a result, AI training must rely on non-sensitive inputs such as doctrine, abstract models, fictitious scenarios and historical data, while excluding information on current capabilities, operational plans and specific vulnerabilities.[68] However, these sources are inherently insufficient to capture the contingent, adaptive and adversarial dynamics of real operations, as they tend to reflect institutionalised assumptions, idealised procedures, and past contexts rather than the frictions, uncertainty, and improvisation characteristic of contemporary military decision-making.[69], [70]
Three approaches can help overcome this gap. The first relies on military cooperation, enabling the sharing of abstracted or anonymised training data among partners, albeit under significant political, legal and security constraints.[71] The second approach involves generating synthetic data, which has been identified as a pragmatic solution in the absence of high-fidelity public datasets for military dialogue and decision-making.[72], [73] A third, deductive approach is to build on analytical models and rule-based simulations inspired by operations research and military studies. In this configuration, scenario evolution is computed externally, based on explicit theoretical assumptions, while the AI translates structured outcomes into narrative explanations and qualitative feedback. This separation of roles is intended to reduce the risk of incoherent adjudication and to support explainability.
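As an illustration of the third approach, the Python sketch below computes an engagement outcome with Lanchester's square law, a classic operations-research attrition model. The choice of this model is an assumption made for the example, since the text does not name a specific one. The effectiveness coefficients, the break point, and the names `adjudicate`, `Outcome` and `to_prompt` are likewise hypothetical. The outcome is computed deterministically outside the LLM and then passed to the narrative layer as fixed facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    blue_left: float
    red_left: float
    steps: int
    winner: str


def adjudicate(blue, red, blue_eff, red_eff, dt=0.1, max_steps=1000, break_point=0.5):
    """Euler-integrate Lanchester's square law until a side falls to its break point.

    dBlue/dt = -red_eff * Red,  dRed/dt = -blue_eff * Blue
    The break point (fraction of initial strength) is an illustrative assumption.
    """
    blue_floor, red_floor = blue * break_point, red * break_point
    steps = 0
    while blue > blue_floor and red > red_floor and steps < max_steps:
        # Both sides are updated simultaneously from the previous step's values.
        blue, red = blue - red_eff * red * dt, red - blue_eff * blue * dt
        steps += 1
    blue_broken, red_broken = blue <= blue_floor, red <= red_floor
    if red_broken and not blue_broken:
        winner = "blue"
    elif blue_broken and not red_broken:
        winner = "red"
    else:
        winner = "none"  # both broke in the same step, or the step limit was reached
    return Outcome(round(max(blue, 0.0), 1), round(max(red, 0.0), 1), steps, winner)


def to_prompt(outcome):
    """Turn the computed outcome into fixed facts the narrative layer must not alter."""
    return (
        "Describe this engagement result without changing any number: "
        f"blue strength {outcome.blue_left}, red strength {outcome.red_left}, "
        f"duration {outcome.steps} steps, winner {outcome.winner}."
    )


# Usage: equal strengths, but blue is assumed twice as effective per unit.
result = adjudicate(blue=100, red=100, blue_eff=0.1, red_eff=0.05)
print(to_prompt(result))
```

Because every assumption sits in an explicit parameter, an instructor can inspect or change the adjudication rule, which an outcome generated freely by an LLM would not allow.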
From a feasibility perspective, a hybrid strategy combining these approaches appears to be the most sustainable and compatible with military security constraints.
Overall Assessment of Feasibility
Overall, the proposed platform is technically feasible, albeit clearly constrained by structural limitations. While the core technologies required for its development are already available, their effective deployment depends less on further advances in AI capabilities than on careful system design and pedagogical integration at the platform level. This conclusion is consistent with broader findings in PME and wargaming research, which emphasise that technological tools must remain subordinate to educational objectives and governance mechanisms.[74], [75], [76]
Conclusion
This paper has sought to identify what an AI-powered TDG platform for junior officers should do, how it could be built, and what would break it. Drawing on the wargaming, PME and AI literature, it argues that multiple runs, each with multiple decision cycles and personalised feedback, enhance junior officer tactical training. Known AI issues, such as hallucinations, inadequate answers, and memory loss, must also be controlled. To ensure credibility and utility, four success factors have been identified. First, a multi-agent architecture is required. By separating the referee and opponent roles into two distinct agents, the player can face an adversary that also operates under partial observability, ensuring fair and realistic behaviour. Second, adequate data sets must be used to ensure realistic outcomes. Doctrine and TTPs can establish general rules and causalities, while combat data, synthetic or not, enables credible adjudication. Third, a coherent flow of events must be ensured across decision cycles to enable multi-turn interactions. An external object management system can keep track of assets and pre-empt memory loss, while enabling visual representation. Alternatively, pre-phased scenarios offer higher coherence at the cost of freedom of action. Fourth, feedback must be relevant to player decisions within and across runs, and to doctrine and TTPs. A logging system can support such personalisation by recording current and past player performance and progress.
Developing an operational platform, however, is only the first step in its deployment. Further questions related to appropriate learning objectives, curriculum structure, interaction with other teaching methods, and opportunity costs, while not addressed in this paper, can prove as decisive as the technical features of an AI TDG platform. This paper can nonetheless serve as a starting point for further investigation into AI-supported officer training.
[1] Armasuisse, “Remise des simulateurs de tir et de combat ainsi que du simulateur tactique à Thoune,” News.Admin.ch, June 29, 2010, https://www.news.admin.ch/fr/nsb?id=33998.
[2] MatrixGames, “Command Professional Edition,” Command.MatrixGames.com, February 10, 2026, https://command.matrixgames.com/?page_id=3822.
[3] MatrixProSims, “Flashpoint Campaigns,” MatrixProSims.com, February 10, 2026, https://www.matrixprosims.com/game/flashpoint-campaigns-professional-edition.
[4] P. Sabin, “The Benefits and Limits of Computerization in Conflict Simulation,” Literary and Linguistic Computing 26, no. 3 (2011): 323–25, https://doi.org/10.1093/llc/fqr024.
[5] Amanda M. Rosen and Lisa Kerr, “Wargaming for Learning: How Educational Gaming Supports Student Learning and Perspectives,” Journal of Political Science Education 20, no. 2 (2024): 330–31, https://doi.org/10.1080/15512169.2024.2304769.
[6] James Fielder, “Innovation in PME Wargaming for Innovation in Warfare,” Journal of Advanced Military Studies, 2025, 231–33, https://doi.org/10.56686/9798987336281.
[7] Peter Perla, “Wargaming and the Cycle of Research and Learning,” Scandinavian Journal of Military Studies 5, no. 1 (2022): 204–5, https://doi.org/10.31374/sjms.124.
[8] Perla, “Wargaming and the Cycle of Research and Learning,” 206–7.
[9] Kjetil Enstad, “Professional Knowledge through Wargames and Exercises,” Scandinavian Journal of Military Studies 5, no. 1 (2022): 15–16, https://doi.org/10.31374/sjms.130.
[10] Claire Goode, “Best Practice Principles for Professional Military Education: A Literature Review,” Journal of Defense Resources Management 10, no. 2 (2019): 12–13.
[11] Nina A. Kollars and Amanda M. Rosen, “Simulations as Active Assessment?: Typologizing by Purpose and Source,” Journal of Political Science Education 9, no. 2 (2013): 148–50, https://doi.org/10.1080/15512169.2013.770983.
[12] Michael Fowler, “Wargames as Pedagogical Tools: Using Wargames for Higher Education,” Journal of Political Science Education 21, no. 1 (2025): 165–69, https://doi.org/10.1080/15512169.2024.2349549.
[13] Peter C. Combe, “Educational Wargaming: Design and Implementation into Professional Military Education,” Journal of Advanced Military Studies 12, no. 2 (2021): 121, https://doi.org/10.21140/mcuj.20211202003.
[14] Alice Y. Kolb and David A. Kolb, “Learning Styles and Learning Spaces: Enhancing Experiential Learning in Higher Education,” Academy of Management Learning & Education 4, no. 2 (2005): 194–96, https://doi.org/10.5465/AMLE.2005.17268566.
[15] Kolb and Kolb, “Learning Styles and Learning Spaces,” 196–201.
[16] Combe, “Educational Wargaming,” 120–27.
[17] Eric M. Walters, “Developing Self-Confidence in Military Decision Making: An Imperative for Wargaming,” Journal of Advanced Military Studies 12, no. 2 (2021): 170–72, https://doi.org/10.21140/mcuj.20211202003.
[18] Walters, “Developing Self-Confidence in Military Decision Making,” 168–70.
[19] Perla, “Wargaming and the Cycle of Research and Learning,” 201–3.
[20] Eric M. Walters, “Wargaming in Professional Military Education: Challenges and Solutions,” Journal of Advanced Military Studies 12, no. 2 (2021): 82–88, https://doi.org/10.21140/mcuj.20211202003.
[21] Walters, “Wargaming in Professional Military Education,” 84–89.
[22] Kate Kuehn, “Assessment Strategies for Educational Wargames,” Journal of Advanced Military Studies 12, no. 2 (2021): 146–51, https://doi.org/10.21140/mcuj.20211202003.
[23] Johan Elg, “Instructor Buy-In: Pitfalls and Opportunities in Wargaming,” Kungliga Krigsvetenskapsakademiens Handlingar och Tidskrift, no. 2 (June 2019): 9–12.
[24] Laviniu Bojor and Laurenţiu Grigore, “Mission: Education—Achieving Tactical Skills through Digital Game-Based Learning,” International Conference KBO 29, no. 2 (2023): 145–46, https://doi.org/10.2478/kbo-2023-0049.
[25] Daniel P. Hogan and Andrea Brennen, “Open-Ended Wargames with Large Language Models,” arXiv, April 17, 2024, 4–6, https://doi.org/10.48550/arXiv.2404.11446.
[26] Robert A. Coombs, “AI Integration for Scenario Development—Training the Whole-of-Force,” Military Review, Online Exclusive, May 2024, 5–7.
[27] Patrick Hinton, “Generative AI and Wargaming: What Is It Good For?,” RUSI Journal 168, no. 7 (2023): 37–40, https://doi.org/10.1080/03071847.2023.2282863.
[28] James Goodman, Sebastian Risi, and Simon Lucas, “AI and Wargaming,” arXiv, September 18, 2020, 17–20, https://doi.org/10.48550/arXiv.2009.08922.
[29] Hogan and Brennen, “Open-Ended Wargames with Large Language Models,” 2–3.
[30] Anna Knack and Rosamund Powell, Artificial Intelligence in Wargaming: An Evidence-Based Assessment of AI Applications, CETaS Research Reports (London: Alan Turing Institute, 2023), 26–28, https://cetas.turing.ac.uk/publications/artificial-intelligence-wargaming.
[31] Yuwei Chen and Chu Shiyong, “Large Language Models in Wargaming: Methodology, Application, and Robustness,” September 27, 2024, 2894–995, https://doi.org/10.1109/CVPRW63382.2024.00295.
[32] Hsiu-Min Chuang and Ding-Wei Cheng, “Conversational AI over Military Scenarios Using Intent Detection and Response Generation,” Applied Sciences 12, no. 5 (2022): 14–19, https://doi.org/10.3390/app12052494.
[33] Gordon Smith et al., “Enhancing Medical Training with AI-Driven Scenario Generation,” in 2025 IEEE International Conference on Digital Health (ICDH) (2025), 13–19, https://doi.org/10.1109/ICDH67620.2025.00012.
[34] Lin Guan et al., “Leveraging Pre-Trained Large Language Models to Construct and Utilize World Models for Model-Based Task Planning,” arXiv, May 24, 2023, 6–7, https://doi.org/10.48550/arXiv.2305.14909.
[35] Qiyue Yin et al., “WGSR-Bench: Wargame-Based Game-Theoretic Strategic Reasoning Benchmark for Large Language Models,” arXiv, June 12, 2025, 9–10, https://doi.org/10.48550/arXiv.2506.10264.
[36] Chuang and Cheng, “Conversational AI over Military Scenarios,” 3–7.
[37] Chen and Shiyong, “Large Language Models in Wargaming,” 2901.
[38] Joost van Oijen and Olivier Claessen, “Building Conversational Agents for Military Training: Towards a Virtual Wingman,” in Artificial Intelligence in HCI, ed. Helmut Degen and Stavroula Ntoa (Cham: Springer International Publishing, 2021), 11–13, https://doi.org/10.1007/978-3-030-77772-2_34.
[39] Hansruedi Bircher et al., Technology Wargaming: Experiencing Future Technologies Combining Multiple Approaches (NATO Science and Technology Organization, 2021), 3–5, https://www.sto.nato.int/document/technology-wargaming-experiencing-future-technologies-combining-multiple-approaches/.
[40] Yin et al., “WGSR-Bench,” 7–9.
[41] Hogan and Brennen, “Open-Ended Wargames with Large Language Models,” 2–3.
[42] Perla, “Wargaming and the Cycle of Research and Learning,” 202–4.
[43] Dagfinn Vatne et al., “Wargaming for the Purpose of Knowledge Development: Lessons Learned from Studying Allied Courses of Action,” Scandinavian Journal of Military Studies 5, no. 1 (2022): 3–5, https://doi.org/10.31374/sjms.122.
[44] Chen and Shiyong, “Large Language Models in Wargaming,” 2–4.
[45] Knack and Powell, Artificial Intelligence in Wargaming, 29–30.
[46] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, “Planning and Acting in Partially Observable Stochastic Domains,” Artificial Intelligence 101, no. 1 (1998): 7–10, https://doi.org/10.1016/S0004-3702(98)00023-X.
[47] Shunyu Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models,” arXiv, October 6, 2022, 1–4, https://doi.org/10.48550/arXiv.2210.03629.
[48] Hogan and Brennen, “Open-Ended Wargames with Large Language Models,” 6.
[49] Knack and Powell, Artificial Intelligence in Wargaming, 39.
[50] Cynthia Rudin, “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead,” Nature Machine Intelligence 1 (2019): 208–9, https://doi.org/10.1038/s42256-019-0048-x.
[51] Goode, “Best Practice Principles for Professional Military Education,” 8–12.
[52] Hinton, “Generative AI and Wargaming,” 36–39.
[53] Sabin, “Benefits and Limits of Computerization,” 324–27.
[54] Chen and Shiyong, “Large Language Models in Wargaming,” 2896–900.
[55] Van Oijen and Claessen, “Building Conversational Agents for Military Training,” 516–31.
[56] Hinton, “Generative AI and Wargaming,” 36–39.
[57] Knack and Powell, Artificial Intelligence in Wargaming.
[58] Sabin, “Benefits and Limits of Computerization,” 324–27.
[59] Yao et al., “ReAct.”
[60] Chuang and Cheng, “Conversational AI over Military Scenarios.”
[61] Chen and Shiyong, “Large Language Models in Wargaming,” 2896–900.
[62] Yin et al., “WGSR-Bench.”
[63] Kollars and Rosen, “Simulations as Active Assessment?,” 147–51.
[64] Perla, “Wargaming and the Cycle of Research and Learning,” 201–5.
[65] Walters, “Developing Self-Confidence in Military Decision Making,” 170–73.
[66] Hinton, “Generative AI and Wargaming,” 36–39.
[67] Knack and Powell, Artificial Intelligence in Wargaming.
[68] Sabin, “Benefits and Limits of Computerization,” 324–27.
[69] Carl von Clausewitz, On War, ed. and trans. Michael Howard and Peter Paret (Princeton, NJ: Princeton University Press, 1976), 119–21.
[70] Perla, “Wargaming and the Cycle of Research and Learning,” 201–5.
[71] Knack and Powell, Artificial Intelligence in Wargaming.
[72] Chuang and Cheng, “Conversational AI over Military Scenarios.”
[73] Chen and Shiyong, “Large Language Models in Wargaming,” 2896–900.
[74] Hinton, “Generative AI and Wargaming,” 36–39.
[75] Goode, “Best Practice Principles for Professional Military Education,” 8–12.
[76] Walters, “Developing Self-Confidence in Military Decision Making,” 170–76.








