A model that hits 98% accuracy in a notebook tells you almost nothing about whether it will survive a factory floor. I learned this building a computer-vision quality-control platform: the model was maybe a fifth of the work. The rest was acquisition, ingestion, traceability, and building the data flywheel that makes the next model better. Here is how I think about that gap.
From One Camera to a Distributed Pipeline
A notebook reads one image and prints a result. A line inspects parts continuously across multiple stations, and the camera that captures the image is rarely the machine that should run inference. I separate acquisition from processing early.
Acquisition nodes do one job: grab frames and drop them somewhere durable. A processing tier picks them up, runs the model, and writes results. This separation means a GPU box can serve several cameras, and a slow model never stalls the line.
The cleanest decoupling I have used is a file-watcher ingestion pattern. Acquisition writes an image to a watched directory; ingestion reacts. It is dead simple, survives a processing restart without dropping frames, and gives you a natural audit trail of raw inputs.
- Acquisition stays trivial and reliable — write a file, nothing else.
- Processing can crash and restart without losing unprocessed frames.
- The raw image directory becomes your evidence locker for disputes.
Decoupling acquisition from inference is the single architectural decision that let the system scale past one station without a rewrite.
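As a rough illustration of the file-watcher pattern, here is a minimal polling sketch that uses only the standard library. The directory names, the `.png` suffix, and the helpers `write_frame` and `ingest_once` are assumptions made for this example, not the platform's actual layout. Acquisition writes to a temporary name and then renames, so ingestion never sees a half-written file. Ingestion moves a frame out of the watched directory only after its handler returns, so a crash leaves unprocessed frames where they are.

```python
import os
from pathlib import Path
from typing import Callable


def write_frame(incoming: Path, name: str, data: bytes) -> Path:
    """Acquisition side: write atomically so the watcher never sees a partial file."""
    incoming.mkdir(parents=True, exist_ok=True)
    tmp = incoming / (name + ".tmp")
    tmp.write_bytes(data)
    final = incoming / name
    os.replace(tmp, final)  # atomic rename when both paths are on one filesystem
    return final


def ingest_once(incoming: Path, processed: Path,
                handle: Callable[[Path], None]) -> int:
    """Ingestion side: process every complete frame currently waiting.

    A frame is moved to `processed` only after `handle` returns, so a crash
    mid-batch leaves the remaining frames in `incoming` for the next run.
    """
    processed.mkdir(parents=True, exist_ok=True)
    count = 0
    for frame in sorted(incoming.glob("*.png")):  # in-flight *.tmp files do not match
        handle(frame)
        os.replace(frame, processed / frame.name)  # raw image is kept, not deleted
        count += 1
    return count
```

A long-running service would call `ingest_once` in a loop with a short sleep, or swap the polling for an OS-level file notification library. Polling is used here only to keep the sketch dependency-free.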
Preprocessing Has to Be Deterministic
The reason notebook accuracy does not transfer is usually that lighting, focus, and framing drift on a real line. Before inference I run a fixed preprocessing pass and reject frames that fall outside known bounds rather than feeding the model garbage.
```python
import cv2

# Laplacian-variance threshold below which a frame is rejected as out of focus.
MIN_FOCUS = 80.0


def prepare_frame(path: str):
    """Return (image, info); image is None when the frame is rejected."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise ValueError(f"unreadable frame: {path}")

    # Reject out-of-focus frames before they reach the model
    focus = cv2.Laplacian(img, cv2.CV_64F).var()
    if focus < MIN_FOCUS:
        return None, {"reason": "low_focus", "focus": float(focus)}

    # Fixed size and contrast range so the model always sees the same input shape
    img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_AREA)
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
    return img, {"reason": "ok", "focus": float(focus)}
```

A rejected frame is not a failure; it is a signal that the station needs attention, and it keeps the model's input distribution honest. Logging why a frame was rejected turns a vague "the system is flaky" complaint into a focus calibration ticket.
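To show how logged rejection reasons can become something actionable, here is a small sketch that computes the rejection rate per station and reason. The `(station, info)` event pairs, the station names, and the `reject_rate_by_station` helper are hypothetical. `info` is assumed to be the dictionary that `prepare_frame` returns.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def reject_rate_by_station(
    events: Iterable[Tuple[str, dict]],
) -> Dict[str, Dict[str, float]]:
    """Return, per station, the fraction of frames rejected for each reason."""
    totals: Counter = Counter()
    reasons: Dict[str, Counter] = defaultdict(Counter)
    for station, info in events:
        totals[station] += 1
        if info["reason"] != "ok":
            reasons[station][info["reason"]] += 1
    return {
        station: {r: n / totals[station] for r, n in reasons[station].items()}
        for station in totals
    }
```

A station whose `low_focus` rate climbs over a shift is a candidate for a calibration ticket. What rate counts as too high is a per-line decision that this sketch does not make.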
PASS/FAIL Needs to Be Traceable
A bare classification is not enough for quality control. When a part is marked FAIL and someone disputes it, you need to reconstruct the decision. For every inspection I persist the raw image, the preprocessed input, the model version, the score, the threshold in effect, and the resulting verdict, all keyed to the part and timestamp.
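One possible shape for that record, sketched with a dataclass and SQLite from the standard library. The table name, column names, and helper functions are assumptions for illustration. The fields themselves follow the list above.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class InspectionRecord:
    part_id: str
    inspected_at: str          # ISO 8601 timestamp
    raw_image_path: str
    prepared_image_path: str
    model_version: str
    score: float
    threshold: float           # threshold in effect at inspection time
    verdict: str               # "PASS" or "FAIL"


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (and if needed create) the inspection table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS inspections (
               part_id TEXT, inspected_at TEXT,
               raw_image_path TEXT, prepared_image_path TEXT,
               model_version TEXT, score REAL, threshold REAL, verdict TEXT,
               PRIMARY KEY (part_id, inspected_at))"""
    )
    return conn


def save(conn: sqlite3.Connection, record: InspectionRecord) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO inspections VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            astuple(record),
        )


def lookup(conn: sqlite3.Connection, part_id: str) -> List[InspectionRecord]:
    """Return every stored inspection for a part, oldest first."""
    rows = conn.execute(
        "SELECT * FROM inspections WHERE part_id = ? ORDER BY inspected_at",
        (part_id,),
    )
    return [InspectionRecord(*row) for row in rows]
```

Storing paths rather than image bytes keeps the table small. The images themselves stay in whatever durable storage the acquisition tier already writes to.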
This matters for two reasons. First, audits: a customer or regulator can ask why part #48213 was rejected eight months ago, and you can show them the exact frame and score. Second, threshold tuning: when you suspect the model is too aggressive, you replay stored scores against a new threshold and see the effect on historical parts before changing anything in production.
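The threshold replay can be sketched in a few lines. This example assumes the score is a defect score and that a part fails when its score is at or above the threshold. The post does not state the direction, so flip the comparison if your model scores the other way. The `replay_threshold` helper and the sample values are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def replay_threshold(
    history: Iterable[Tuple[float, float]], new_threshold: float
) -> Dict[str, int]:
    """Count how stored verdicts would change under a new threshold.

    `history` holds (score, threshold_in_effect) pairs from past inspections.
    Assumption: a part FAILs when score >= threshold.
    """
    summary = {"unchanged": 0, "fail_to_pass": 0, "pass_to_fail": 0}
    for score, old_threshold in history:
        was_fail = score >= old_threshold
        now_fail = score >= new_threshold
        if was_fail == now_fail:
            summary["unchanged"] += 1
        elif was_fail:
            summary["fail_to_pass"] += 1
        else:
            summary["pass_to_fail"] += 1
    return summary


history = [(0.95, 0.85), (0.87, 0.85), (0.50, 0.85), (0.86, 0.85)]
print(replay_threshold(history, new_threshold=0.90))
# {'unchanged': 2, 'fail_to_pass': 2, 'pass_to_fail': 0}
```

Because each stored record carries the threshold that was in effect, the replay works even when the history spans several earlier threshold changes.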
Every Inspection Is Training Data
The strategic payoff of all this storage is the dataset. Each inspected part, with its image and verdict, is a candidate training example for the next model, provided you capture annotations cleanly. I route low-confidence and disputed verdicts to a human review queue. The corrected labels that come back are some of the most valuable data you can collect, because they target exactly where the current model is weak.
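A minimal sketch of that routing rule follows. It treats "low confidence" as a score within a fixed margin of the threshold, which is one possible definition and not necessarily the one used on the platform. The margin value, the in-memory queue, and the function names are all hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

# Hypothetical margin: scores this close to the threshold count as low confidence.
REVIEW_MARGIN = 0.05


def needs_review(score: float, threshold: float,
                 disputed: bool = False, margin: float = REVIEW_MARGIN) -> bool:
    """True when a verdict is disputed or the score sits near the threshold."""
    return disputed or abs(score - threshold) < margin


def route(part_id: str, score: float, threshold: float,
          review_queue: Deque[Tuple[str, float]], disputed: bool = False) -> str:
    """Queue borderline or disputed parts for a human; let the rest through."""
    if needs_review(score, threshold, disputed):
        review_queue.append((part_id, score))
        return "review"
    return "auto"
```

In production the queue would be a durable table or message queue, not an in-memory `deque`. The in-memory version just keeps the routing decision easy to read.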
A few habits make this flywheel actually turn:
- Store the model version with every prediction so you can segment performance by release.
- Capture human corrections as first-class labels, not as ad-hoc spreadsheet edits.
- Keep both the raw and the preprocessed images, so you can retrain with a different preprocessing pipeline later.
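As an example of what storing the model version makes possible, the sketch below measures how often human reviewers agreed with the model, broken down by release. The `(model_version, model_verdict, human_label)` row shape and the `agreement_by_version` helper are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_by_version(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, float]:
    """Fraction of reviewed parts where the human label matched the model verdict."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for version, model_verdict, human_label in rows:
        totals[version] += 1
        if model_verdict == human_label:
            hits[version] += 1
    return {version: hits[version] / totals[version] for version in totals}
```

If only low-confidence and disputed parts reach review, this number describes the model's hardest cases, not its overall accuracy. It is best used to compare releases against each other.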
The model gets the attention because it is the interesting part. But what decides whether computer vision succeeds in production is the unglamorous machinery around it: reliable acquisition, deterministic preprocessing, traceable decisions, and a dataset that compounds. Build that, and improving the model becomes a routine iteration rather than a research project every time.