<p dir="ltr">Water distribution systems (WDS) are increasingly challenged by aging infrastructure, climate variability and the growing demand for sustainable operation. Pipeline failures are one of the most critical issues, resulting in service disruptions and high maintenance costs. This study proposes a machine learning (ML) framework to classify municipalities into two risk categories, “Priority” and “No Priority”, based on historical failure trends. This classification can support decision-makers in planning the rehabilitation of WDS. A Sliding Window approach is applied to a dataset of monthly failure records from 22 municipalities managed by a utility company in Southern Italy, spanning more than seven years. Each training instance includes six months of failure rate data as input features, and the failure rate of the following month as the target variable. The failure data are normalized by network length to account for differences in system size, and class labels are assigned based on percentile thresholds. The models are trained and tested by combining hold-out and cross-validation strategies. Several algorithms are benchmarked, including decision trees, support vector machines (SVMs), logistic regression, and ensemble methods. Among all tested classifiers, the Coarse Gaussian SVM achieved the highest performance, with a test accuracy of 85.2%, a recall of 87.5%, and an F1-score of 85.6% for the “Priority” class. Cost-sensitive learning was applied to penalize false negatives more heavily, in alignment with operational needs. The comparison between the actual and predicted failure labels confirms the effectiveness of the model in estimating further maintenance in these new metered areas.</p><p dir="ltr">This paper was presented at the 21st Computing and Control in the Water Industry Conference (CCWI 2025) at the University of Sheffield (1st - 3rd September 2025).</p>
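The Sliding Window construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example failure counts, the 100 km network length, and the 75th-percentile cut-off are all assumptions made for illustration, since the abstract does not state the percentile thresholds actually used.

```python
from statistics import quantiles

WINDOW = 6  # months of failure-rate history per instance (as stated in the abstract)


def failure_rates(monthly_failures, network_km):
    """Normalize monthly failure counts by network length (failures per km)."""
    return [count / network_km for count in monthly_failures]


def sliding_windows(rates, window=WINDOW):
    """Build (features, target) pairs: `window` months in, the next month out."""
    return [
        (rates[i:i + window], rates[i + window])
        for i in range(len(rates) - window)
    ]


def label_targets(targets, percentile=75):
    """Label targets above the chosen percentile as "Priority".

    The 75th percentile is a hypothetical threshold, not the one used in the study.
    """
    cut = quantiles(targets, n=100, method="inclusive")[percentile - 1]
    return ["Priority" if t > cut else "No Priority" for t in targets]


# Made-up data for one municipality: 12 months of failure counts, 100 km of network.
counts = [4, 6, 5, 7, 3, 8, 9, 2, 10, 5, 6, 12]
rates = failure_rates(counts, network_km=100.0)
instances = sliding_windows(rates)
labels = label_targets([target for _, target in instances])

for (features, target), label in zip(instances, labels):
    print(features, "->", round(target, 3), label)
```

Each instance pairs six consecutive monthly failure rates with the rate of the seventh month, so a series of twelve months yields six instances. In the study the labelled instances would then be passed to the benchmarked classifiers; that step is omitted here.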
Methodology, headings and units
Headings and units are explained in the files
Policy
The data complies with the institution and funders' policies on access and sharing
Sharing and access restrictions
The uploaded data can be shared openly
Data description
The file formats are open or commonly used
Responsibility
The depositor is responsible for the content and sharing of the attached files
Ethics
There is no personal data or any that requires ethical approval