Research
Experimental research on LLM agent performance in industrial engineering decision environments.
Published
SUPPLY CHAIN · HYBRID ARCHITECTURE · EXPERIMENT WRITEUP
Three AI models controlled the safety stock multiplier in a hybrid architecture; a mathematical formula handled the order quantity. All four hypotheses failed across three information conditions and 20 replications per condition. Every AI condition produced higher order variance than doing nothing. Context made two out of three models worse. Memory caused the advanced reasoning model to collapse.
SUPPLY CHAIN · SOVEREIGN MODEL · EXPERIMENT WRITEUP
India’s sovereign model showed no measurable difference from GPT OSS on this task: OVAR 4.504 vs. 4.52. Neither model detected Indian seasonal demand patterns. Exponential smoothing outperformed both by approximately 8×. The GPT OSS figure is a reference value carried over from Agentic Bullwhip Effect Version 2, not the result of a co-run comparison.
SUPPLY CHAIN · EXPERIMENT WRITEUP
Four LLM configurations were evaluated against three heuristic baselines across 20 replications and 11,520 LLM calls. Every heuristic outperformed every LLM on both order variance and stockout count simultaneously. All seven hypotheses were rejected.
SUPPLY CHAIN · EXPERIMENT WRITEUP
All four configurations amplified demand variability. The context × reasoning condition produced a fully inverted tier pattern (OEM as the noisiest tier, Component as the quietest), reversing the standard upstream cascade.
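For readers new to the metric, amplification of demand variability is conventionally measured as the ratio of the variance of orders placed to the variance of demand observed; a ratio above 1 means the tier amplified variability. The sketch below is a minimal illustration of that standard definition, not the code used in these experiments. The function name `bullwhip_ratio` and the sample series are hypothetical.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Return Var(orders) / Var(demand) using population variance.

    A value above 1 indicates that orders vary more than the demand
    that triggered them, i.e. the tier amplified variability.
    """
    demand_var = pvariance(demand)
    if demand_var == 0:
        # The ratio is undefined when demand never changes.
        raise ValueError("demand has zero variance; ratio is undefined")
    return pvariance(orders) / demand_var


# Hypothetical series, for illustration only (not experimental data).
demand = [10, 12, 9, 11, 10, 13, 8, 11]
orders = [10, 15, 6, 13, 9, 17, 4, 12]

ratio = bullwhip_ratio(orders, demand)
print(f"bullwhip ratio: {ratio:.2f}")
```

Computing this ratio separately for each tier, each against the orders it receives from the tier below, is one way to compare which tier is noisiest.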