A platform-based Natural Language processing-driven strategy for digitalising regulatory compliance processes for the built environment

Kruiper, Ruben and Kumar, Bimal and Watson, Richard and Sadeghineko, Farhad and Gray, Alasdair and Konstas, Ioannis (2024) A platform-based Natural Language processing-driven strategy for digitalising regulatory compliance processes for the built environment. Advanced Engineering Informatics, 62 (Pt. B). 102653. ISSN 1474-0346 (https://doi.org/10.1016/j.aei.2024.102653)

Text. Filename: Kruiper-etal-AEI-2024-A-platform-based-Natural-Language-processing-driven-strategy-for-digitalising-regulatory-compliance-processes.pdf
Final Published Version
License: Creative Commons Attribution 4.0

Abstract

The digitalisation of the regulatory compliance process has been an active area of research for several decades, and activity in this area has recently increased considerably. In the UK, the tragic Grenfell fire of 2017 has been a major catalyst for this, as the Hackitt report’s recommendations placed much of the blame on the country’s broken regulatory regime. The Hackitt report emphasises the need to overhaul the building regulations, but the approach to do so remains an open research question. Existing work in this space tends to overlook the processing of actual regulatory documents, or limits its scope to solving a relatively small subtask. This paper presents i-ReC (intelligent Regulatory Compliance), a new comprehensive platform approach to the digitalisation of regulatory compliance that takes into consideration the enormous diversity of all the stakeholders’ activities. A historical perspective on research in this area is presented first, which identifies the challenges in such an endeavour and the gaps in the state of the art. After enumerating the challenges in implementing a platform-based approach to digitalising the regulatory compliance process, we describe the implementation of some parts of the platform. Our research demonstrates that the identification and extraction of all relevant requirements from the corpus of several hundred regulatory documents is a key step that underlies the entire process, from authoring to the eventual compliance checking of designs. Some of the issues that need addressing in this endeavour include ambiguous language, inconsistent use of terms, contradicting requirements and handling multi-word expressions. The implementation of these tools is driven by NLP, ML and Semantic Web technologies.
A semantic search engine was developed and validated against other popular and comparable engines with a corpus of 420 (out of about 800) documents used in the UK for compliance checking of building designs. In every search scenario, our search engine performed better on all objective criteria. Limitations of the approach are discussed, including the challenges around licensing for all the documents in the corpus. Further work includes improving the performance of SPaR.txt (the tool created to identify multi-word expressions) and of the information retrieval engine by increasing the dataset and providing the model with examples from more diverse formats of regulations. There is also a need to develop and align strategies to collect a comprehensive set of domain vocabularies to be combined in a Knowledge Graph.

ORCID iDs

Kruiper, Ruben; Kumar, Bimal (ORCID: https://orcid.org/0000-0002-2539-4902); Watson, Richard; Sadeghineko, Farhad; Gray, Alasdair; Konstas, Ioannis