Comparison of Results

Finally, we can compare the results of the different models. Both approaches performed well on initial inspection, but we will use several metrics to compare them quantitatively. In Figure 1 we reproduce the machine learning model predictions, and in Figure 2 the survival function predictions, for the data set with claims filed.

ML lifespan predictions
Figure 1 - Predicted lifespans from our machine learning algorithms, for the data with claims filed. Overall, each algorithm captures the trend over several orders of magnitude. For data with lifespans of less than 100 days there is considerable spread, as well as some clear systematic deviations for lifespans greater than 1000 days.



Predicted lifespans for policies with claims
Figure 2 - Predicted lifespans for policies with claims, using survival functions. The blue points are predictions using Cox's model, and the green points using Aalen's. The Cox model performs well, though it is slightly skewed from the 1:1 red line. There is clear spread at low lifespans, and systematic deviations at the high lifespan end, much like with the machine learning models.

We will use three metrics to describe our curves. For those unfamiliar with them, we provide a very brief overview (a minimal computation sketch follows the list):

  • R2 - a metric that describes how much of the variation in the data our model captures. A value of 1 is perfect; lower values are worse (typically greater than 0, though it can be negative). While useful, a curve such as the AAF fit in Figure 2 can still score highly, because the predicted lifespan increases with the real value even though the fit clearly misses important variation.
  • Root mean squared error (RMSE) - similar to the standard deviation, and equivalent to it in our case; smaller values are better (the best score is 0). The value indicates the typical spread of predictions around the true values, so roughly 68% of points will fall within one RMSE of their prediction.
  • Root mean squared log error (RMSLE) - like the root mean squared error, but computed on the logarithms of the values, which makes it much more useful when the data spans many orders of magnitude. The lower the value the better, with 0 being the best possible score.
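
As a rough illustration of how these metrics can be computed, the snippet below is a minimal sketch using scikit-learn; y_true and y_pred are small placeholder arrays standing in for actual and predicted lifespans in days, not values from our data.

```python
# Minimal sketch of computing R2, RMSE, and RMSLE with scikit-learn.
# y_true and y_pred are illustrative placeholders for actual and predicted
# lifespans in days, not results from our models.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_squared_log_error

y_true = np.array([12.0, 85.0, 430.0, 1800.0])   # example actual lifespans (days)
y_pred = np.array([20.0, 70.0, 500.0, 1500.0])   # example predicted lifespans (days)

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))  # values must be non-negative

print(f"R2:    {r2:.3f}")
print(f"RMSE:  {rmse:.1f} days")
print(f"RMSLE: {rmsle:.3f}")
```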

Now, let's look at how these metrics compare across all of our tests.

Claims Filed

Metric        AAF      CPH      RFR      NNR      SVR      AVG
R2            0.932    0.988    0.992    0.999    0.981    0.996
RMSE (days)   258.8    109.6    86.7     34.6     133.5    61.2
RMSLE         0.223    0.096    0.029    0.059    0.047    0.036

No Claims Filed

Metric        AAF      CPH      RFR      NNR      SVR      AVG
R2            0.951    0.983    0.990    0.953    0.960    0.993
RMSE (days)   220.5    128.4    97.8     206.9    191.6    80.5
RMSLE         0.190    0.103    0.067    0.070    0.071    0.065



As can be seen in the table, all predictions have a very good R2 value. This is because the lifespan spans a large range of values and the predictions roughly track that range. Even the Aalen Additive Fitter, which shows strong systematic deviations, has a value of 0.93, which suggests R2 is not the most reliable metric for this analysis.
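
To illustrate how this can happen, here is a small synthetic sketch (not our policy data): when the target spans several orders of magnitude, even a prediction with an obvious systematic bias can still score a high R2.

```python
# Synthetic illustration: a clearly biased prediction can still achieve a
# high R2 when the target spans several orders of magnitude.
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
lifespan = 10 ** rng.uniform(0, 3.5, size=1000)   # lifespans from ~1 to ~3000 days
biased_pred = 0.7 * lifespan + 50                  # systematically skewed prediction

print(f"R2 despite the obvious skew: {r2_score(lifespan, biased_pred):.3f}")
```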

The RMSE is easy to interpret, and it gives a better measure of the goodness of fit. For the policies with claims filed, we can expect roughly 68% of the data to fall within about 35 days of the predicted value with the Neural Network Regressor, whereas the Support Vector Regressor places the same fraction within roughly 130 days. By this metric, the Neural Network Regressor and the average of the three machine learning algorithms perform best for policies with claims filed, while the Random Forest Regressor and the average perform best for policies with no claims filed.
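
As a quick sanity check of the "roughly 68% within one RMSE" reading, the snippet below uses synthetic, approximately normal residuals (not our actual model errors) with a spread similar to the Neural Network Regressor's.

```python
# Sanity check: for zero-mean, roughly normal residuals, about 68% of points
# fall within one RMSE of their prediction. The residuals here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
residuals = rng.normal(loc=0.0, scale=35.0, size=10_000)  # ~35 day spread
rmse = np.sqrt(np.mean(residuals ** 2))

print(f"RMSE: {rmse:.1f} days")
print(f"Fraction within one RMSE: {np.mean(np.abs(residuals) <= rmse):.2f}")
```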

The RMSLE shows a slightly different trend, with the Random Forest Regressor and Support Vector Regressor shining through on the claims filed set, while on the no claims filed set all of the machine learning algorithms perform roughly equally well (the differences in the values are relatively small).

These results show an overall successful ability to predict the lifespan of policies. All of the algorithms perform relatively well, but notably the CPH, RFR, and NNR each perform best on different data sets and metrics. With more data we could perform more training and testing, but this is a very satisfactory performance.

From here, a more rigorous test would be to compute the standard deviation of the residuals (the difference between the predicted and real values). If these residuals are approximately normally distributed, we could combine them with the predicted values to estimate the likelihood of a policy being cancelled at any given time, including January 2017. However, without the final testing data we are unable to perform this last step. The lifetime predictions are a sufficient test for our project, and they were very successful.
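
For reference, here is a minimal sketch of what that final step might look like, assuming the residuals really are roughly normal; predicted_lifespan, residual_std, and days_until_jan_2017 are illustrative placeholders rather than values from our data.

```python
# Sketch of the proposed final test: treat the actual lifespan as normally
# distributed around the prediction, with spread given by the residual
# standard deviation, and read off the probability of cancellation by a date.
# All numbers below are illustrative placeholders, not values from our data.
from scipy.stats import norm

predicted_lifespan = 400.0     # predicted lifespan for one policy (days)
residual_std = 90.0            # standard deviation of (prediction - actual) residuals
days_until_jan_2017 = 365.0    # days from policy start until January 2017

p_cancelled = norm.cdf(days_until_jan_2017, loc=predicted_lifespan, scale=residual_std)
print(f"Estimated probability of cancellation by January 2017: {p_cancelled:.2f}")
```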
