5 Questions to Ask Before Trusting AI-Generated Simulation

Johan Andreasson

04/16/2026

AI can now generate simulation models from natural language prompts. It can derive equations, write code, configure experiments, and produce results. For engineers evaluating these capabilities, the interesting question is no longer, “Can AI do it?” It almost always can.

The harder question is: “Should I trust the result enough to make a decision?”

That question matters because simulation results drive real consequences. A chiller plant sized using simulation will operate for 20 years. A suspension configuration validated in simulation goes into a vehicle that carries people. A control strategy tested in a virtual environment gets deployed on physical equipment. The gap between an impressive demo and a production-ready decision tool is not about AI capability. It is about engineering trust. Here are five questions that help distinguish one from the other.

1. Where did the physics come from?

An AI that derives governing equations from first principles may produce results that look correct. The equations may compile, the simulation may converge, and the plots may show reasonable trends.

But “reasonable” is not the same as “validated.” A compressor model that assumes ideal gas behavior will give plausible coefficient-of-performance (COP) values at moderate conditions and diverge from reality at high pressure ratios. A heat exchanger model that neglects fouling will consistently overpredict performance over the system’s lifetime. These errors do not announce themselves. They sit quietly in the results until someone makes a decision based on them.

Validated library components take a different approach. The physics has been verified against experimental data. The parameter ranges are documented. The known limitations are explicit. When an AI selects a validated component, it inherits that verification without needing to reproduce it.
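One way to make that inheritance concrete is to attach the validation metadata to the component itself, so out-of-range use is flagged automatically. A minimal sketch; the class path, data source, and ranges below are illustrative, not from any real library:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ValidatedComponent:
    """A library component carrying its validation provenance (illustrative)."""
    class_path: str        # e.g. a Modelica class path
    validated_against: str # the experimental data set used for verification
    param_ranges: dict = field(default_factory=dict)  # name -> (min, max) validated range

    def check(self, params: dict) -> list:
        """Return warnings for parameters set outside their validated range."""
        warnings = []
        for name, value in params.items():
            lo, hi = self.param_ranges.get(name, (float("-inf"), float("inf")))
            if not lo <= value <= hi:
                warnings.append(f"{name}={value} outside validated range [{lo}, {hi}]")
        return warnings

# Hypothetical compressor component, validated only up to pressure ratio 6
compressor = ValidatedComponent(
    class_path="ExampleLib.Compressors.Scroll",        # illustrative class path
    validated_against="manufacturer test data, 2019",  # illustrative provenance
    param_ranges={"pressure_ratio": (1.5, 6.0)},
)

print(compressor.check({"pressure_ratio": 9.0}))
```

The point is not the mechanism but the asymmetry: an AI that derives physics in-session has no such metadata to check against, while an AI that selects from a validated library does.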

What to look for: can the tool tell you where each equation came from and what data it was validated against? Or did the AI derive it during the session?

2. Who checks the AI’s work?

A single AI agent that builds a model, runs a simulation, and presents results has no check on its own reasoning. If it selects the wrong component type, sets a parameter outside its valid range, or misinterprets a boundary condition, nothing catches the error before the engineer sees the result.

Engineering organizations have always required peer review of calculations. The same principle applies to AI-assisted workflows. A well-designed system separates the agent that proposes from the agent that verifies. Parameter values are checked against documented ranges. Unit consistency is enforced. Results are compared to expected magnitudes before being presented.
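The propose/verify split can be as simple as a second, independent pass that checks the proposal before anything is presented. A hedged sketch; the check types, field names, and threshold values are illustrative assumptions:

```python
def verify_proposal(proposal: dict, expected: dict) -> list:
    """Independent checks applied to a proposed configuration and its results
    before they reach the engineer. Returns a list of findings (illustrative)."""
    findings = []

    for name, (value, unit) in proposal["quantities"].items():
        # 1. Unit consistency: every quantity must carry the expected unit.
        if unit != expected["units"].get(name):
            findings.append(f"{name}: unit '{unit}' does not match expected "
                            f"'{expected['units'].get(name)}'")
        # 2. Magnitude check: results must fall in an expected range.
        lo, hi = expected["magnitudes"][name]
        if not lo <= value <= hi:
            findings.append(f"{name}={value} {unit} outside expected range [{lo}, {hi}]")

    return findings

# A proposed result with a plausible-looking but implausible COP (illustrative values)
proposal = {"quantities": {"COP": (14.2, "1"), "cooling_capacity": (350.0, "kW")}}
expected = {
    "units": {"COP": "1", "cooling_capacity": "kW"},
    "magnitudes": {"COP": (2.0, 8.0), "cooling_capacity": (50.0, 2000.0)},
}
for finding in verify_proposal(proposal, expected):
    print(finding)
```

What matters is that the verifier uses documented ranges and expectations as its reference, not the proposing agent's own reasoning.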

This is not about distrusting AI. It is about applying the same discipline to AI-assisted work that we already apply to human-assisted work.

What to look for: is there a verification step built into the workflow? Or does the output go directly from generation to presentation?

3. Can I trace every number back to a source?

When simulation results support a design decision, the audience needs to know what model was used, what parameters were set, where those values came from, and what assumptions were made. This is not paperwork. It is how organizations maintain accountability across projects that span years and involve dozens of people.

AI models generated from first principles during a conversation have limited traceability. The derivation happened in a session that may not be logged. The parameter values were chosen by the AI based on training data of unknown provenance. The assumptions may not be explicitly stated.

Models assembled from documented library components, on a platform that records configurations and experiment definitions, have inherent traceability. The component class path identifies exactly what was used. The parameter values can be traced to documented sources. The experiment history is preserved.
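A traceability record does not need to be elaborate; what matters is that every number carries its source and that the record outlives the session that produced it. A minimal sketch, with illustrative field names, class paths, and sources:

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ParameterRecord:
    name: str
    value: float
    unit: str
    source: str   # where the value came from: datasheet, measurement, standard...

@dataclass(frozen=True)
class StudyRecord:
    """Everything needed to answer 'where did this number come from?' later."""
    component_class: str   # exact library class path used
    parameters: tuple      # ParameterRecord entries
    experiment: str        # experiment definition identifier
    timestamp: str

record = StudyRecord(
    component_class="ExampleLib.HeatExchangers.PlateHX",  # illustrative path
    parameters=(
        ParameterRecord("UA", 12500.0, "W/K", "vendor datasheet rev. C"),
        ParameterRecord("fouling_factor", 0.0002, "m2.K/W", "design guideline"),
    ),
    experiment="chiller_sizing_sweep_v3",
    timestamp="2026-04-16T10:00:00Z",
)

# Serialize so the record survives independently of any AI session.
archived = json.dumps(asdict(record), indent=2)
```

On a platform that records configurations and experiment definitions, this capture happens as a side effect of running the study rather than as extra paperwork.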

What to look for: if someone asks “where did this number come from?” in six months, can you answer without reconstructing the original AI session?

4. Does the AI spend its effort on engineering or on infrastructure?

There is a practical cost dimension that is easy to overlook. Every AI interaction consumes computational resources, whether measured in tokens, API calls, or time. How those resources are distributed between infrastructure work and actual engineering work determines the productivity of the system.

An AI that derives physics, writes solver code, and builds parameter sweep logic from scratch spends most of its effort on model construction before any engineering insight is produced. An AI that works with pre-validated components and a platform that handles experiment execution spends its effort on the engineering problem itself: what to vary, what to compare, what the results imply for design.

This is not just an efficiency question. It is a quality question. Effort spent on reinventing infrastructure is effort not spent on study design, sensitivity analysis, and result interpretation: the parts that actually require engineering intelligence.

What to look for: when the AI finishes, how much of the session was engineering conversation vs. debugging generated code?

5. Does the system get better with use?

A tool that starts from zero every session is useful but limited. A system that learns from each study (which parameters mattered, which configurations were explored, which results led to decisions) becomes more valuable over time.

This does not mean the AI should learn unsupervised. It means that the structured data from each workflow (component selections, parameter ranges, study configurations, engineering conclusions) should be captured in a form that informs future work. When a team has run 200 chiller studies, study number 201 should benefit from that history.
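One hedged sketch of what "capturing structured data" could look like: an append-only store of study summaries that later studies query for priors. The schema, system names, and conclusions below are invented for illustration:

```python
from collections import defaultdict

class StudyKnowledgeBase:
    """Append-only store of completed studies; later studies query it for
    priors such as which parameters mattered most (illustrative schema)."""

    def __init__(self):
        self._studies = []

    def record(self, system: str, influential_params: list, conclusion: str):
        self._studies.append({"system": system,
                              "influential_params": influential_params,
                              "conclusion": conclusion})

    def suggest_params(self, system: str, top_n: int = 3) -> list:
        """Rank parameters by how often past studies of this system found them influential."""
        counts = defaultdict(int)
        for study in self._studies:
            if study["system"] == system:
                for p in study["influential_params"]:
                    counts[p] += 1
        return sorted(counts, key=counts.get, reverse=True)[:top_n]

kb = StudyKnowledgeBase()
kb.record("chiller", ["condenser_UA", "compressor_speed"], "UA dominates at part load")
kb.record("chiller", ["condenser_UA", "chilled_water_setpoint"], "setpoint reset helps")
kb.record("suspension", ["damper_rate"], "rate tradeoff at high frequency")

# Study 201 starts from what studies 1..200 already learned.
print(kb.suggest_params("chiller"))
```

Note that nothing here involves retraining a model: the learning is in the curated, queryable record, which also keeps the process supervised and auditable.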

For organizations where experienced engineers retire or move to other roles, this is particularly relevant. The knowledge that makes senior engineers productive does not have to leave with them if it is captured in the infrastructure rather than only in their heads.

What to look for: is each study a standalone event, or does it contribute to an organizational knowledge base that compounds over time?

The Bottom Line

AI for engineering simulation is advancing rapidly, and it should. The ability to move from engineering intent to simulation results in minutes rather than days is genuinely transformative.

But speed without trust is not useful for production decisions. The tools that will matter for industry are not the ones with the most impressive demos. They are the ones that can answer these five questions with evidence rather than promises.

The Modelica community has spent nearly three decades building open modeling languages and standard interfaces. Modelon has spent more than two decades building validated libraries on that foundation. AI does not replace any of it. It makes it dramatically more accessible and more productive, but only when it is connected to engineering infrastructure that was built for trust.
