Preprint
Article

Predicting Motor Insurance Claims Using Telematics Data—XGBoost vs. Logistic Regression

This version is not peer-reviewed.

Submitted: 09 May 2019

Posted: 10 May 2019

A peer-reviewed article of this preprint also exists.

Abstract
XGBoost is recognized as an algorithm with exceptional predictive capacity. Models for a binary response indicating the existence of accident claims vs. no claims can be used to identify the determinants of traffic accidents. We compare the relative performance of logistic regression and XGBoost for predicting the existence of accident claims from telematics data. The dataset, provided by an insurance company, describes individuals’ driving patterns, including total annual distance driven and the percentage of total distance driven in urban areas. Our findings show that logistic regression is a suitable model given its interpretability and good predictive capacity. XGBoost requires numerous model-tuning procedures to match the predictive performance of the logistic regression model, and it demands greater effort to interpret.
Keywords: dichotomous response; predictive model; tree boosting; GLM; machine learning
Subject: Business, Economics and Management - Econometrics and Statistics
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
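
As a rough illustration of the binary-response setup described in the abstract, the sketch below fits a logistic regression for claim vs. no claim on two telematics-style covariates. It is not the authors' code or data: the data are simulated, the covariate names (standardized annual distance and urban share), the generating coefficients, and the helper functions `simulate`, `fit_logistic`, and `predict_proba` are all assumptions made for this example, and plain gradient descent stands in for whatever fitting routine the paper used. Only the logistic side of the comparison is shown; XGBoost is not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, rng):
    """Simulate n hypothetical drivers.

    Each row is (standardized annual distance, standardized urban share);
    each label is 1 if the driver filed a claim, else 0.
    """
    rows, labels = [], []
    for _ in range(n):
        distance = rng.gauss(0.0, 1.0)
        urban = rng.gauss(0.0, 1.0)
        # Assumed generating coefficients, chosen only for illustration.
        p_claim = sigmoid(-2.0 + 0.8 * distance + 0.5 * urban)
        rows.append((distance, urban))
        labels.append(1 if rng.random() < p_claim else 0)
    return rows, labels


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss.

    Returns [intercept, coef_distance, coef_urban].
    """
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (d, u), y in zip(rows, labels):
            err = sigmoid(w[0] + w[1] * d + w[2] * u) - y
            grad[0] += err
            grad[1] += err * d
            grad[2] += err * u
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def predict_proba(w, row):
    """Predicted claim probability for one (distance, urban) row."""
    d, u = row
    return sigmoid(w[0] + w[1] * d + w[2] * u)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
rows, labels = simulate(2000, rng)
weights = fit_logistic(rows, labels)

# The fitted coefficients can be read directly as effects on the log-odds
# of a claim, which is the interpretability advantage the abstract refers to.
print("intercept, distance, urban:", [round(v, 3) for v in weights])
print("P(claim | high distance):", round(predict_proba(weights, (2.0, 0.0)), 3))
print("P(claim | low distance): ", round(predict_proba(weights, (-2.0, 0.0)), 3))
```

A boosted-tree model fitted to the same rows would produce comparable probabilities but no single coefficient per covariate, which is one way to read the abstract's remark about the extra effort needed for interpretation.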

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2025 MDPI (Basel, Switzerland) unless otherwise stated