While machine-learning techniques can improve business processes, predict future outcomes, and save money, they also increase modeling risk because of their complex and opaque features. In this article, Milliman’s Jonathan Glowacki and Martin Reichhoff discuss how model validation techniques can mitigate the potential pitfalls of machine-learning algorithms.
Here is an excerpt:
An independent model validation carried out by knowledgeable professionals can mitigate the risks associated with new modeling techniques. In spite of the novelty of machine-learning techniques, there are several methods to safeguard against overfitting and other modeling flaws. The most important requirement for model validation is for the team performing the model validation to understand the algorithm. If the validator does not understand the theory and assumptions behind the model, then they are unlikely to perform an effective validation of the process. After demonstrating an understanding of the model theory, the following procedures are helpful in performing the validation.
Outcomes analysis refers to comparing modeled results to actual data. For advanced modeling techniques, outcomes analysis becomes a very simple yet useful approach to understanding model interactions and pitfalls. One way to understand model results is to simply plot the range of the independent variable against both the actual and predicted outcome along with the number of observations. This allows the user to visualize the univariate relationship within the model and understand if the model is overfitting to sparse data. To evaluate possible interactions, cross plots can also be created looking at results in two dimensions as opposed to a single dimension. Dimensionality beyond two dimensions becomes difficult to evaluate, but looking at simple interactions does provide an initial useful understanding of how the model behaves with independent variables….
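The univariate comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the function name `outcomes_by_bin`, the bin edges, and the sample values are all hypothetical. It groups observations by ranges of one independent variable and reports the observation count alongside the mean actual and mean predicted outcome, which are the numbers one would then plot.

```python
from bisect import bisect_right

def outcomes_by_bin(x, actual, predicted, edges):
    """Summarize actual vs. predicted outcomes over bins of one variable.

    edges holds ascending bin boundaries; bin i covers [edges[i], edges[i+1]),
    except the last bin, which also includes its right endpoint.
    Observations outside the edges are ignored.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    sum_actual = [0.0] * n_bins
    sum_pred = [0.0] * n_bins
    for xi, ai, pi in zip(x, actual, predicted):
        i = bisect_right(edges, xi) - 1
        if i == n_bins and xi == edges[-1]:
            i = n_bins - 1  # place the right endpoint in the last bin
        if 0 <= i < n_bins:
            counts[i] += 1
            sum_actual[i] += ai
            sum_pred[i] += pi
    rows = []
    for i in range(n_bins):
        n = counts[i]
        rows.append({
            "bin": (edges[i], edges[i + 1]),
            "count": n,
            "mean_actual": sum_actual[i] / n if n else None,
            "mean_predicted": sum_pred[i] / n if n else None,
        })
    return rows

# Hypothetical sample: the last bin holds a single observation.
x = [1, 2, 3, 11, 12, 25]
actual = [0, 1, 0, 1, 1, 1]
predicted = [0.2, 0.4, 0.3, 0.8, 0.6, 0.1]
for row in outcomes_by_bin(x, actual, predicted, edges=[0, 10, 20, 30]):
    print(row)
```

Reporting the count next to the two means matters: a large gap between actual and predicted values in a bin with very few observations is less informative than the same gap in a well-populated bin.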
…Cross-validation is a common strategy to help ensure that a model isn’t overfitting the sample data it’s being developed with. Cross-validation has been used to help ensure the integrity of other statistical methods in the past, and with the rising popularity of machine-learning techniques, it has become even more important. In cross-validation, a model is fitted using only a portion of the sample data. The model is then applied to the other portion of the data to test performance. Ideally, a model will perform equally well on both portions of the data. If it doesn’t, it’s likely that the model has been overfit.
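As a rough sketch of that idea, the Python below repeats the fit-on-one-portion, test-on-the-other step across k folds (k-fold cross-validation, a common generalization of the single split described above). Everything here is an assumption made for illustration: the one-variable least-squares line stands in for whatever model is being validated, and the names `fit_line`, `mse`, and `k_fold_errors` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_errors(xs, ys, k=5):
    """Return one (training error, held-out error) pair per fold."""
    n = len(xs)
    results = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th observation
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        test_x = [xs[i] for i in sorted(held_out)]
        test_y = [ys[i] for i in sorted(held_out)]
        model = fit_line(train_x, train_y)  # fit on one portion only
        results.append((mse(model, train_x, train_y),
                        mse(model, test_x, test_y)))
    return results

# Hypothetical data: a linear trend with a small alternating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
for train_err, test_err in k_fold_errors(xs, ys, k=5):
    print(f"train MSE = {train_err:.3f}, held-out MSE = {test_err:.3f}")
```

A held-out error that is consistently much larger than the training error is the warning sign the excerpt describes. In practice the folds would usually be assigned at random rather than by position.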
Registration for the 2017 Data Science Game is officially open. The Data Science Game is a two-phase competition showcasing teams of data science students from universities around the world. An online qualifier will take place on April 15 with the final stage happening in September.
Milliman’s Pixel is a web-based, competitive analytics platform that helps insurers use objective and comprehensive information to grow their business.
In this video, Milliman actuaries Nancy Watkins, Peggy Brinkman, and Cody Webb discuss how Pixel helps insurers compare their premiums with those of competitors, identify market sectors where they might be experiencing adverse selection, and access competitive information needed to make sound pricing decisions.
In July, teams of data science students from more than 50 universities around the globe competed in the qualification phase of the 2016 Data Science Game. Over 140 teams of four students were asked to develop an algorithm that could recognize the orientation of a roof from a satellite photograph by building on more than 10,000 photographs of roofs categorized through crowdsourcing.
Twenty-two teams have qualified for the final phase. The top three ranking teams were Jonquille (University Pierre and Marie Curie), PolytechNique (Ecole Polytechnique), and The Nerd Herd (University of Amsterdam). The final is being held in Paris on September 10 and 11, where the teams will compete in a big data analysis challenge.
Milliman is a sponsor of the 2016 Data Science Game.
Milliman is a sponsor of the 2016 Data Science Game, a two-phase competition showcasing teams of data science students from universities around the world. After an online elimination challenge, the best 20 teams will be invited to a two-day competition in Paris.
Last year, teams competed to solve a machine learning challenge created by Google. Students from Moscow State University won the competition. Who will win this year?
Teams can register at www.datasciencegame.com. The deadline to register is May 31. The online challenge will take place in June while the two-day competition is scheduled for September.
In his article “Analysing competitor tariffs with machine learning,” Milliman consultant Bernhard Konig provides a sample analysis demonstrating how machine learning can help insurers better understand their competitors’ tariffs and premium rates. The excerpt below explains some advantages of the machine learning technique.
Machine learning techniques provide a flexible tool set to derive accurate estimates of competitor premiums without any knowledge about the underlying tariff structure. The machine learning approach we developed as part of our research is faster and much less expensive than exhaustive web scraping or mystery shopping. It [enables] insurance executives to make better informed decisions about not only tariff changes, but also marketing campaigns and commercial discounts for certain customer segments. The impact of a tariff change on profitability and business volume can certainly be much better assessed in the presence of competitor premiums. In an ideal scenario, a company has an estimate of the competitor premiums at the point of sale. This allows adjusting one’s own quote to increase either the probability of conversion (by lowering the quote) or the profitability (by increasing the quote).