Working Paper, Article, Version 1. This version is not peer-reviewed.

Synthetic Dataset Generation of Driver Telematics

Version 1 : Received: 29 January 2021 / Approved: 1 February 2021 / Online: 1 February 2021 (12:51:02 CET)

A peer-reviewed article of this Preprint also exists.

So, B.; Boucher, J.-P.; Valdez, E.A. Synthetic Dataset Generation of Driver Telematics. Risks 2021, 9, 58.


This article describes the techniques used to produce a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset has 100,000 policies, each with observations on the driver's claims experience together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can be used to advance models for assessing risk in usage-based insurance. It follows a three-stage process using machine learning algorithms. The first stage simulates values for the number of claims as multiple binary classifications using feedforward neural networks. The second stage simulates values for the aggregated amount of claims as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. In the final stage, a synthetic portfolio of the space of feature variables is generated by applying an extended SMOTE algorithm. The resulting dataset is evaluated by comparing the synthetic and real datasets when Poisson and gamma regression models are fitted to the respective data. Other visualizations and data summaries produce remarkably similar statistics between the two datasets. We hope that researchers interested in obtaining telematics datasets to calibrate models or learning algorithms will find our work valuable.
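
For readers unfamiliar with SMOTE, the interpolation idea behind the final stage can be sketched as follows. This is a minimal illustration of basic SMOTE on continuous features only, not the extended algorithm the authors use. The function name `smote_sample`, its parameters, and the random stand-in data are assumptions made for this example.

```python
import numpy as np

def smote_sample(X, n_new, k=5, seed=0):
    """Basic SMOTE-style sampling (illustrative; continuous features only).

    Each synthetic row is a random point on the line segment between a
    randomly chosen real row and one of its k nearest neighbours.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances; a row is never its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # index of the real "seed" row
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbours to use
    mate = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation weight in [0, 1)
    return X[base] + gap * (X[mate] - X[base])

# Hypothetical stand-in for a real feature matrix (200 policies, 3 features).
real = np.random.default_rng(1).normal(size=(200, 3))
synthetic = smote_sample(real, n_new=1000)
print(synthetic.shape)
```

Because every synthetic row is a convex combination of two real rows, it stays within the range of the observed features. Categorical or integer-valued risk variables would need separate handling, which this sketch does not attempt.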


Bayesian optimization; Gaussian process; Neural network; SMOTE; Usage-based insurance (UBI); Vehicle telematics


Computer Science and Mathematics, Algebra and Number Theory
