AI Sparks

How We Helped a Global FMCG Brand Process 100,000+ Quality Check Documents with AI Without Losing Accuracy

When a global FMCG manufacturer approached us, the problem wasn’t a lack of data. It was too much, stuck in the wrong format.

Every batch of product that left the client's manufacturing facilities came with a quality inspection document: a PDF containing test results, inspector sign-offs, and compliance data. Multiply that across many production lines and years of operation, and the number exceeds 100,000 documents. Each one needed to be reviewed, have its key fields extracted, and be validated against quality standards before it could be archived or entered into downstream systems.

Manually, this was done by a quality and compliance team reading each PDF, copying values into spreadsheets or record systems, and checking for errors. It worked, but it didn't scale. Every new product line, every audit cycle, and every new market added more documents to an already long queue.

The brief from the client was straightforward: automate the extraction and validation of these documents using AI document processing, without compromising the accuracy their compliance processes depend on.

Why this is not a simple OCR problem

On paper, the solution looks straightforward: run the PDFs through an AI extraction pipeline, check the confidence score, then approve. We've seen that assumption before, and it doesn't survive contact with actual production documentation.

FMCG quality inspection documents are messy in ways that standard OCR tools are not built for. Some are clean, printed lab reports. Many are not. Inspectors write corrections by hand in the margins. Values are struck through and rewritten when retesting occurs. Stamps overlap printed text. Some documents are scans of barely legible carbon-copy forms. A pipeline that only handled clean printed text would have failed on a meaningful portion of this client's real-world documents, and in a compliance-driven industry like FMCG, a "meaningful portion" is not a number you can write off.

So before writing the extraction logic, we spent time with the client's quality team understanding what these documents look like in practice, not just what they look like in the best-case scenario.

The confidence score trap in document AI

The first version of any document AI pipeline tends to use one confidence score per document: if the model is 90% confident overall, accept it; if not, send it to a human.

This method failed early on, and it is worth explaining why. A document can achieve 95% overall confidence while one specific field, say, a test result that determines pass/fail, is incorrect. The overall score is the average across all fields on the page. A handful of simple, clearly printed fields can pull the average up and hide the one field that is both hard to read and really important.
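To make the averaging problem concrete, here is a small illustration in Python. The field names, confidence values, and the `accept_document` helper are invented for this example; they are not taken from the client's documents or from any particular extraction tool.

```python
from statistics import mean

# Hypothetical per-field confidences for a single inspection document.
field_confidence = {
    "batch_number": 0.99,
    "product_code": 0.99,
    "inspection_date": 0.98,
    "inspector_name": 0.97,
    "line_id": 0.99,
    "test_result": 0.62,  # the pass/fail field, e.g. partly hidden by a stamp
}

# Document-level score as a plain average of the field scores.
document_confidence = mean(field_confidence.values())

# The field the average is hiding.
weakest_field = min(field_confidence, key=field_confidence.get)


def accept_document(confidence: float, threshold: float = 0.90) -> bool:
    """Document-level rule: accept when the averaged score clears the threshold."""
    return confidence >= threshold


print(f"document confidence: {document_confidence:.2f}")
print(f"accepted: {accept_document(document_confidence)}")
print(f"weakest field: {weakest_field} ({field_confidence[weakest_field]:.2f})")
```

With these made-up numbers the document clears a 0.90 threshold even though the one field that decides pass/fail sits far below it.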

We moved to field-level confidence thresholds instead. Each field, not the document as a whole, gets its own confidence score. Only fields that fall below the defined threshold are forwarded to a human reviewer. Everything else is automatically approved.

This one change had a huge impact. It meant that reviewers weren't opening every document to double-check fields the AI was already extracting correctly. They were looking at the two or three fields, often out of twenty or more, that the system wasn't sure about. That's the difference between a reviewer who rereads the entire document and a reviewer who spends fifteen seconds on a flagged field.
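A minimal sketch of field-level routing might look like the following. The `ExtractedField` type, the threshold values, and the stricter per-field override for `test_result` are assumptions made for illustration; the actual thresholds used on the project are not stated here.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float


# Hypothetical thresholds: one default, plus a stricter bar for a critical field.
DEFAULT_THRESHOLD = 0.90
FIELD_THRESHOLDS = {"test_result": 0.98}


def route_fields(fields, thresholds=FIELD_THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Split fields into (auto_approved, needs_review) by per-field confidence."""
    auto_approved, needs_review = [], []
    for field in fields:
        limit = thresholds.get(field.name, default)
        if field.confidence < limit:
            needs_review.append(field)
        else:
            auto_approved.append(field)
    return auto_approved, needs_review


fields = [
    ExtractedField("batch_number", "B-1042", 0.99),
    ExtractedField("inspection_date", "2023-04-11", 0.97),
    ExtractedField("test_result", "PASS", 0.93),
]
approved, review = route_fields(fields)
print([f.name for f in approved])
print([f.name for f in review])
```

In this sketch only `test_result` reaches a reviewer: its score would clear the default bar but not the stricter one assumed for a pass/fail field.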

Handling handwriting, stamps, and strikethroughs

Quality documents in production facilities are working documents, not clean forms. Inspectors strike out the original value and write the corrected value next to it. Stamps reading “endorse” or “retest” partially obscure the underlying field. Scan resolution varies greatly depending on which facility produced the document.

We built specific handling for each of these patterns rather than treating them as OCR noise to be averaged out. One detail surprised us during testing: in a few cases, the AI correctly identified the struck-through value and chose not to extract it, treating it as superseded. That is correct behavior, but it meant the validation workflow needed a clear way to say “correction made here” instead of treating every strikethrough as an extraction failure. We added a separate review flag for documents with visible corrections, so reviewers can quickly verify that the AI picked up the corrected value rather than the original.
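One way to keep "a correction is present" distinct from "the extraction is uncertain" is to carry them as separate flags. The sketch below is illustrative only: `FieldReading`, `ReviewFlag`, the `struck_value` attribute, and the 0.90 threshold are hypothetical names and values, and it assumes the extraction step can report the crossed-out original alongside the corrected value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class ReviewFlag(Enum):
    LOW_CONFIDENCE = "low_confidence"          # the model is unsure of the value
    CORRECTION_PRESENT = "correction_present"  # a struck-out original was seen


@dataclass
class FieldReading:
    name: str
    value: str                           # the value the pipeline extracted
    confidence: float
    struck_value: Optional[str] = None   # crossed-out original, if one was detected


def flags_for(reading: FieldReading, threshold: float = 0.90) -> Set[ReviewFlag]:
    """Return every review flag that applies; an empty set means auto-approve."""
    flags = set()
    if reading.confidence < threshold:
        flags.add(ReviewFlag.LOW_CONFIDENCE)
    if reading.struck_value is not None:
        # Not a failure: ask the reviewer to confirm the corrected value was used.
        flags.add(ReviewFlag.CORRECTION_PRESENT)
    return flags


corrected = FieldReading("moisture_pct", "4.2", 0.96, struck_value="4.8")
clean = FieldReading("batch_number", "B-1042", 0.99)
print(flags_for(corrected))
print(flags_for(clean))
```

Because the two flags are independent, a confidently read correction still gets a quick human check without being counted as an extraction failure.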

Keeping people in the loop, on purpose

None of this was about removing the quality team from the process. It was about changing what they spent their time on.

Before the pipeline went live, the entire workflow was validated by the client's domain experts against their existing manual process. We tested the AI output against documents that had already been manually reviewed, comparing field by field, until extraction accuracy was consistently high across different document types, sources, and scan qualities. That validation phase took real time, and we considered it non-negotiable: a pipeline that looks accurate in a demo and a pipeline that's accurate enough to be trusted by a compliance-driven industry are not the same thing.

Once live, the reviewer's role changed. Instead of reading every document end-to-end, reviewers worked from a queue of flagged fields, the ones the AI wasn't confident about, and documents with visible corrections or unusual formatting. The skilled people in the team do the same kind of judgment work they always have; they just apply it to a much smaller, more relevant set of cases.
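A field-by-field comparison of this kind can be sketched as follows. The record layout (a `doc_type` plus a `fields` mapping per document ID), the whitespace and case normalization, and the sample values are assumptions for illustration, not the client's actual schema or benchmark data.

```python
from collections import defaultdict


def normalize(value):
    """Compare values ignoring case and extra whitespace; None stays None."""
    if value is None:
        return None
    return " ".join(str(value).split()).casefold()


def field_accuracy_by_type(ai_docs, manual_docs):
    """Share of manually reviewed fields the AI matched, grouped by document type.

    Both arguments map doc_id -> {"doc_type": str, "fields": {name: value}}.
    A field missing from the AI output counts as a miss.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for doc_id, manual in manual_docs.items():
        ai_fields = ai_docs.get(doc_id, {}).get("fields", {})
        for name, expected in manual["fields"].items():
            totals[manual["doc_type"]] += 1
            if normalize(ai_fields.get(name)) == normalize(expected):
                matches[manual["doc_type"]] += 1
    return {doc_type: matches[doc_type] / totals[doc_type] for doc_type in totals}


manual_docs = {
    "d1": {"doc_type": "lab_report", "fields": {"batch": "B-1042", "result": "PASS"}},
    "d2": {"doc_type": "carbon_copy", "fields": {"batch": "B-2001", "result": "FAIL"}},
}
ai_docs = {
    "d1": {"doc_type": "lab_report", "fields": {"batch": "b-1042", "result": "PASS"}},
    "d2": {"doc_type": "carbon_copy", "fields": {"batch": "B-2001", "result": "PASS"}},
}
print(field_accuracy_by_type(ai_docs, manual_docs))
```

Grouping by document type (or by source or scan quality) matters because a single overall accuracy figure can hide a weak category, in the same way a document-level confidence score can hide a weak field.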

The result: an 80% reduction in manual review effort

  • More than 100,000 documents have been processed.
  • Extraction accuracy has reached over 90% at field level, verified against the client's manual benchmarks.
  • Manual review effort decreased by nearly 80%, as reviewers were only looking at flagged fields and flagged documents rather than every page.
  • Processing time, which had been a constant bottleneck for the compliance team, has come down significantly.
  • The backlog that had been growing with every new audit cycle has stopped growing.

The number that mattered most to the client, though, was not a percentage. Their quality team, the people who spent years building the judgment to know when a value is off, could finally apply that judgment to the documents that really needed it.

What we would tell anyone taking on AI document processing

If there's one lesson from this project that's worth repeating, it's that document AI in a regulated or quality-sensitive industry is not a model selection problem. It's a workflow design problem. The extraction model is one part of a system that also requires confidence thresholds at the field level, a clear escalation path for ambiguous cases like handwriting and corrections, and a validation layer strong enough that the people who rely on it actually trust it.

That combination is what turns “we used AI on our documents” into a process that the compliance team will stand behind.

If you're sitting on a backlog of documents, whether quality reports, compliance records, lab results, or anything similar, and manual review has become a bottleneck, we'd be happy to talk about what a workflow like this might look like for your data.

