Capitalising on a massive documentary heritage thanks to generative AI
Artelia, an engineering and consulting group, wanted to make better use of its extensive documentary heritage in a context of increasing digitisation and project complexity. The challenge was to turn this mass of unstructured documents into a useful, accessible and exploitable resource through a conversational interface based on generative AI.
In a few figures...
1 centralised and historical data hub
JEMS has introduced a unified data platform into the overall architecture, designed to address several types of use cases.
21 subsidiaries to be unified
The project was intended to enable all group entities to share a common repository and a smoother flow of information.
1 first use case deployed in production
The Bodywork catalogue served as a concrete demonstrator to offer all subsidiaries a unified view of parts availability and prices.
The project
Our approach
JEMS supported Artelia in setting up a generative AI system for getting more value from unstructured documents. The engagement combined a Proof of Value, the implementation of a data-centric architecture, and the deployment of two concrete use cases, with Snowflake as the technology partner and Streamlit for the application layer.
The diagnosis
Artelia faced a major document management and operational challenge. With over 100 million documents spanning engineering projects, reports, technical plans, correspondence, and administrative records, the difficulty lay not only in storing the information but also in finding, understanding, and reusing it quickly. Artelia therefore needed a system capable of querying this massive corpus smoothly and returning relevant results, while also paving the way for new business uses.
The key deliverables
- Delivery of a Proof of Value for the generative AI-powered chatbot
- Setup of a data-centric architecture
- Development of a natural language conversational interface
- Implementation of two business use cases around document search and content generation
- Deployment of a solution based on Snowflake and Streamlit
How can Artelia capitalise on over 100 million documents thanks to generative AI?
The benefits
Much faster access to information
Users can find relevant documents without manually browsing through millions of files.
Better exploitation of unstructured documents
The conversational interface allows for dynamic querying of a complex and heterogeneous documentary corpus.
Practical assistance on legal matters
One of the use cases concerns the instant search for documentary evidence, which is particularly useful in a legal context.
Faster document production
The tool can also generate new content for tenders from historical data, particularly technical specifications.
More efficient business processes
The solution improves the efficiency and quality of Artelia's processes by making the documentary heritage more dynamic and usable.
In video: how Artelia unlocks the value of 100 million documents with generative AI
Discover how Artelia, with the support of JEMS, is transforming a massive documentary heritage into an exploitable resource through a conversational interface powered by generative AI.
The 6-step approach
1. Start from a massive documentary heritage
The project was built around a clear challenge: to make a library of over 100 million documents usable.
2. Validate the value with a Proof of Value
JEMS first concretely demonstrated the benefits of the chatbot before considering a wider deployment.
3. Implement the data-centric foundation
A data-centric architecture was deployed to ensure the integration, quality, and availability of information.
4. Develop the conversational interface
JEMS and Snowflake designed a natural language interface capable of querying the documentary corpus in real time.
5. Deploy initial use cases
Two use cases were implemented: searching for documentary evidence and generating content for tenders.
6. Transform data into an operational resource
Together, these components turn a massive, unstructured corpus from an archive into an exploitable business resource.
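To make the document search use case more concrete, here is a minimal sketch of the retrieval step behind a conversational interface: ranking documents against a natural language question. It is an illustration only, not the solution deployed at Artelia. The corpus, the document identifiers, and the function names (`build_index`, `search`) are hypothetical, and a simple TF-IDF score with cosine similarity stands in for whatever search and generative models the Snowflake-based platform actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the real document base.
DOCUMENTS = {
    "report_001": "Geotechnical report for the bridge foundation project",
    "letter_017": "Correspondence with the client about the delivery schedule",
    "spec_042": "Technical specifications for the water treatment plant tender",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Return TF-IDF vectors per document and the IDF table."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    # Smoothed IDF: rarer terms get a higher weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf, top_k=2):
    """Return up to top_k (doc_id, score) pairs, best match first."""
    counts = Counter(tokenize(query))
    # Terms unknown to the corpus are ignored.
    q_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0.0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:top_k]]


if __name__ == "__main__":
    vectors, idf = build_index(DOCUMENTS)
    for doc_id, score in search("technical specifications for a tender", vectors, idf):
        print(doc_id, score)
```

In a chatbot of the kind described here, the top-ranked passages would typically be passed to a generative model as context, so that answers and generated tender content stay grounded in the historical documents.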
