Personal AI Research Initiative
Can AI agents operationalise the fundamentals of industrial engineering? I'm finding out — one micro-experiment at a time.
About
AI and LLMs have fundamentally changed how we approach problem-solving. With years of hands-on shop-floor experience in Indian manufacturing, and a current focus on managing hyperscaler relationships from a techno-strategic standpoint, I wanted to explore how these two worlds intersect and how AI can add value to the foundational concepts of Industrial Engineering. This program is how I explore that intersection.
The core question: can agentic AI operationalise IE theory? I explore it through micro-experiments.
Each experiment takes a foundational IE concept (a supply chain dynamic, a maintenance framework, an inventory model) and turns it into a controlled simulation environment where LLM agents make decisions. This goes beyond model comparisons or benchmarks: it places LLMs in full-blown Industrial Engineering environments. Results are broken down, analysed, and shared with fellow AI researchers and domain peers.
All experiments are personal, use personal compute locally or in the cloud, and involve entirely fictional scenarios.
Experiments
SUPPLY CHAIN · 2×2 FACTORIAL · GPT-4.1-MINI vs O1
All four configurations amplified demand variability. Context reduced amplification for the lightweight model — and increased it for the reasoning model. The most capable configuration produced a pattern that classical bullwhip theory would not predict.
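The amplification referred to here is the classic bullwhip measure: the variance of the orders an agent places divided by the variance of the demand it observes, with a ratio above 1 indicating amplification. A minimal sketch of that metric (the function name and data are illustrative, not taken from the experiment's code):

```python
import statistics

def bullwhip_ratio(demand, orders):
    """Variance amplification: Var(orders) / Var(demand). > 1 means bullwhip."""
    return statistics.variance(orders) / statistics.variance(demand)

demand = [100, 102, 98, 101, 99, 103, 97, 100]
orders = [100, 106, 92, 104, 96, 110, 88, 102]  # an overreacting ordering policy
print(bullwhip_ratio(demand, orders) > 1)  # True: orders swing wider than demand
```

Classical bullwhip theory predicts this ratio grows as you move upstream in the chain; "a pattern theory would not predict" means the measured ratios departed from that monotone amplification.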
SUPPLY CHAIN · 4 EXPERIMENTS · GPT-4.1-MINI · O4-MINI · PHI-4-REASONING-PLUS
V1 compared AI configurations against each other. V2 asks a harder question: do AI agents beat simple heuristics at all — and if so, which configuration gets closest and why?
TPM / PREDICTIVE MAINTENANCE · FRAGMENTED RECORDS · VERNACULAR NORMALISATION
Tests whether AI agents can support TPM workflows when reasoning over fragmented, realistic maintenance records, including a condition that simulates a vernacular input normalisation layer upstream.
Methodology
01
Analytical control baselines
Every experiment pairs LLM agent performance against a non-LLM analytical benchmark — not just model-vs-model. Deviation from theory is the signal.
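For the inventory experiments, one plausible shape for such a non-LLM benchmark is a textbook order-up-to (base-stock) policy; the site does not specify which baselines each experiment uses, so this is a hedged illustration rather than the project's actual control:

```python
def base_stock_order(inventory_position, order_up_to):
    """Order-up-to heuristic: order exactly the gap to the target level."""
    return max(0, order_up_to - inventory_position)

# Once inventory settles, a policy like this passes demand through roughly
# one-for-one, so it draws a clean reference line for the LLM agents to beat.
print(base_stock_order(inventory_position=60, order_up_to=100))  # 40
```

The point is that the baseline is analytical and deterministic: any gap between the agents' behaviour and this line is attributable to the agents, not to noise in the benchmark.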
02
Controlled simulation environments
Synthetic but calibrated parameters derived from public literature. All scenarios are entirely fictional with no proprietary data involved.
03
Multi-model comparison
Experiments compare across model tiers and reasoning architectures, with 50–100 replications per cell to support statistical inference.
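With 50–100 replications per cell, a normal-approximation confidence interval on each cell mean is a reasonable way to support that inference. A minimal sketch (the function name and the synthetic per-replication metric are illustrative assumptions):

```python
import statistics

def mean_ci95(replications):
    """Normal-approximation 95% confidence interval for the mean."""
    m = statistics.mean(replications)
    se = statistics.stdev(replications) / len(replications) ** 0.5
    return (m - 1.96 * se, m + 1.96 * se)

# e.g. a per-replication bullwhip ratio from 50 simulated runs of one cell
runs = [1.8 + 0.01 * (i % 7) for i in range(50)]
lo, hi = mean_ci95(runs)
print(lo < statistics.mean(runs) < hi)  # True: the mean sits inside its CI
```

Non-overlapping intervals across cells are then evidence that a configuration difference is real rather than replication noise.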
Stack
Writing
Experiment writeups and methodology notes are published on the blog. The first post — Agentic Bullwhip Effect — Version 1 — is live. Code and data for each experiment are on GitHub.