In one of our recent papers on delivering an intelligent search solution, we identified the components that make up such a solution.

One area where we see companies struggle is deciding when to use AI/ML techniques to read content. There are many techniques available, and you need to be able to select and run the right tools at the right time.

We recently worked with a client who (due to a major acquisition) had a tsunami of content heading their way, consisting of hundreds of millions of files that needed to be organised. The company had access to an AI toolkit, both in the cloud and on premises, for reading content. On average it took 1-2 minutes to process each document. At that rate, a full ingestion and categorisation was estimated to take 350-450 years!
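The scale of that estimate is easy to check with some rough arithmetic. The sketch below is illustrative only: the file count of 200 million is an assumed figure (the project is described only as "hundreds of millions of files"), and it assumes documents are processed one after another at the lower bound of 1 minute each.

```python
import math

MINUTES_PER_YEAR = 60 * 24 * 365


def sequential_years(file_count: int, minutes_per_file: float) -> float:
    """Years needed to process every file one after another."""
    return file_count * minutes_per_file / MINUTES_PER_YEAR


# Assumed file count: the article only says "hundreds of millions".
assumed_files = 200_000_000
estimate = sequential_years(assumed_files, minutes_per_file=1.0)

# Parallel workers needed to fit the same workload into 18 months (1.5 years),
# ignoring any coordination overhead.
workers_needed = math.ceil(estimate / 1.5)

print(f"Sequential estimate: {estimate:.0f} years")
print(f"Parallel workers needed for 18 months: {workers_needed}")
```

Under these assumptions the sequential figure lands inside the 350-450 year range quoted above, and meeting an 18-month deadline by brute force would mean running hundreds of document readers in parallel for the whole period.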

Their tools required too much processing power and time to analyse such a huge dataset. The project drivers also included uninterrupted operations and completion of the content migration within 18 months.

So how were we able to help?

Flare took a different approach, starting with a light scan of all of the content based on readily available metadata. The initial focus was on file paths and file names, to which we applied a natural language processor, a technical language processor and a knowledge base. Together, these allowed us to categorise the dataset automatically.
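To give a flavour of what a metadata-only scan can look like, here is a minimal sketch that tags a file using nothing but its path. It is not Flare's implementation: the knowledge base entries, the document type dictionary, the example path and the function names are all hypothetical, and a simple token lookup stands in for the natural language and technical language processing described above.

```python
import re

# Hypothetical knowledge base: asset type -> known asset names (lowercase).
KNOWLEDGE_BASE = {
    "country": {"norway", "angola"},
    "field": {"alpha", "bravo"},
    "well": {"a-12", "b-07"},
}

# Hypothetical dictionary mapping file-name abbreviations to document types.
DOCUMENT_TYPES = {"fwr": "final well report", "log": "well log"}


def tokens(path: str) -> list[str]:
    """Split a path into lowercase tokens; hyphens are kept so 'a-12' survives."""
    return [t for t in re.split(r"[\s_./\\]+", path.lower()) if t]


def categorise(path: str) -> dict[str, list[str]]:
    """Tag a file from its path alone, without opening the file."""
    found: dict[str, set[str]] = {}
    for tok in tokens(path):
        for asset_type, names in KNOWLEDGE_BASE.items():
            if tok in names:
                found.setdefault(asset_type, set()).add(tok)
        if tok in DOCUMENT_TYPES:
            found.setdefault("document_type", set()).add(DOCUMENT_TYPES[tok])
    return {key: sorted(values) for key, values in found.items()}


tags = categorise("/Norway/Alpha/A-12/A-12_FWR_2009.pdf")
print(tags)
```

Because nothing is opened or read, a scan like this costs a string split and a few set lookups per file, which is why it can run across an entire file listing where a full AI read of every document cannot.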

This approach leveraged the client’s existing knowledge of the assets associated with the data (countries, fields, wells) to automatically tag and disambiguate the dataset, and perform an initial sort. Following this, it was possible to prioritise content migration and to identify duplicates and data requiring archival or deletion.
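Duplicate identification can also start from metadata alone. The sketch below is one possible approach, not a description of the method used on this project: it assumes a hypothetical FileRecord holding each file's path and size, and groups files that share a name (ignoring case) and a byte size.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record taken from a file listing."""
    path: str
    size_bytes: int


def duplicate_candidates(records: list[FileRecord]) -> list[list[str]]:
    """Group paths that share a file name (case-insensitive) and size."""
    groups: dict[tuple[str, int], list[str]] = defaultdict(list)
    for rec in records:
        # Take the last path segment, accepting either separator style.
        name = rec.path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        groups[(name, rec.size_bytes)].append(rec.path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


records = [
    FileRecord("/a/Report.pdf", 1024),
    FileRecord("/b/report.PDF", 1024),
    FileRecord("/c/report.pdf", 2048),
]
candidates = duplicate_candidates(records)
print(candidates)
```

Matches found this way are only candidates. Two files with the same name and size can still differ, so a content comparison (for example a checksum) on the much smaller candidate set would be a sensible check before anything is archived or deleted.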

A subset of content was then identified for further categorisation. On this occasion, a number of high-value document types were selected for manual tagging. Final well reports received additional tagging with the client’s AI tools, and a tranche of well log data was enhanced with Flare’s industry dictionaries.

All of this was completed within the allotted timeframe because the right tools for the job were selected at the right time.

So, with the right approach and the appropriate tools, it becomes faster and cheaper to make sense of your information. Different processes, whether manual or automatic, can be combined to achieve the desired results. This effort can then become part of a continual improvement programme that is rerun whenever new knowledge is gained or improved classification techniques are developed.

