Text mining narrative data can help uncover valuable information for insurers that might not be captured in conventional data systems. In a new article, Phil Borba shows how text mining can identify auto policyholders who may have been driving under the influence of a drug at the time of an accident.
His analysis finds a measurable increase in severity when a medication, prescription, or illegal drug was present for one or more drivers in an accident. While the finding may not be new, the way Borba extracted information from the text of the accident descriptions is novel. Mining the text descriptions can help improve an insurer’s claims and underwriting practices in many ways.
This excerpt explains the methodology and results of the text mining analysis:
Narrative descriptions for approximately 7,000 NHTSA (National Highway Traffic Safety Administration) accidents were broken into phrases, and similar phrases were grouped together using analytical models. After removing prepositions and uninformative prepositional phrases, the result was a data file with more than 13 million phrases.
Next, we used four different themes for identifying the presence of a medication, prescription, drug, or illegal narcotic. First, we identified phrases with a “taking medications” theme. We joined phrases with the word “medications” that indicated a driver may have been taking medications. For example, we joined “on many” and “taking pain” to form “on many medications” and “taking pain medications,” respectively.
The second theme followed the same process, replacing “medications” with “prescriptions,” which gave us phrases such as “on many prescriptions” and “taking pain prescriptions.” These two themes produced approximately 1,100 phrases.
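The joining step for the first two themes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix list holds just the two fragments quoted above, and the function name `build_theme_phrases` is hypothetical. The excerpt does not describe the tooling actually used.

```python
from itertools import product

# Prefix fragments quoted in the excerpt; the real list is far longer.
PREFIXES = ["on many", "taking pain"]
# Theme nouns for the first and second themes.
THEME_NOUNS = ["medications", "prescriptions"]


def build_theme_phrases(prefixes, nouns):
    """Join every prefix fragment with every theme noun, grouped by noun."""
    phrases = {noun: [] for noun in nouns}
    for prefix, noun in product(prefixes, nouns):
        phrases[noun].append(f"{prefix} {noun}")
    return phrases


theme_phrases = build_theme_phrases(PREFIXES, THEME_NOUNS)
for noun, joined in theme_phrases.items():
    print(noun, joined)
```

Grouping the joined phrases by noun keeps the two themes separate, so each can later feed its own indicator variable.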
The third theme joined an action and a drug name. The result from these joins was a long list of phrases with “had taken [drug name],” “was on [drug name],” and so on, replacing [drug name] with the names of drugs. For the present analysis, we worked with 3,590 phrases with a drug name. The fourth theme was a list of 52 references to illegal narcotics that we considered red flags when seen in an accident description. This list included “cocaine,” “heroin,” and “marijuana.”
In sum, the first two themes were general references to medications and prescriptions, the third theme captured references to drug names, and the fourth theme was a list of illegal narcotics that we considered “red flags” for a driver being under the influence of a drug. For each theme, a binary (0/1) variable was created to capture whether the presence of medications, prescriptions, a drug name, or an illegal narcotic was mentioned in the accident description.
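One way to picture the four binary variables is a simple phrase match against each narrative. The sketch below is an assumption-laden illustration, not the paper's implementation: the drug names `oxycodone` and `diazepam`, the theme keys, and the helper names `compile_theme` and `flag_narrative` are all hypothetical, and the phrase lists are tiny stand-ins for the roughly 1,100 phrases, 3,590 phrases, and 52 narcotic references described above.

```python
import re

ACTIONS = ["had taken", "was on"]          # action fragments quoted in the excerpt
DRUG_NAMES = ["oxycodone", "diazepam"]     # hypothetical stand-ins for the drug-name list

# Small illustrative phrase lists, one per theme.
THEMES = {
    "medication": ["on many medications", "taking pain medications"],
    "prescription": ["on many prescriptions", "taking pain prescriptions"],
    "drug_name": [f"{action} {drug}" for action in ACTIONS for drug in DRUG_NAMES],
    "illegal_narcotic": ["cocaine", "heroin", "marijuana"],
}


def compile_theme(phrases):
    """Build one case-insensitive, whole-word pattern matching any phrase."""
    alternation = "|".join(re.escape(phrase) for phrase in phrases)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


PATTERNS = {theme: compile_theme(phrases) for theme, phrases in THEMES.items()}


def flag_narrative(text):
    """Return a 0/1 indicator per theme for a single accident narrative."""
    return {theme: int(bool(pattern.search(text))) for theme, pattern in PATTERNS.items()}


# Invented narrative, for illustration only.
example = "The driver of V1 stated he was taking pain medications and had taken diazepam."
flags = flag_narrative(example)
print(flags)
```

Exact phrase matching is the simplest possible choice here. The excerpt mentions grouping similar phrases with analytical models, which would catch variants that this sketch misses.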
An injury was reported in 73% of the 6,949 accidents in the NHTSA database. We found a reference to taking or being on a medication in approximately 16% of the accidents, and an injury occurred in 82% of these accidents. Similarly, we found a reference to taking or being on a prescription or a drug in approximately 6.5% of the accidents, and an injury occurred in 80% of this subset. Finally, we found a reference to an illegal narcotic in 2.4% of the accidents, and an injury occurred in 89% of these accidents.
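The comparison above amounts to computing an injury rate for all accidents and again for each flagged subset. A minimal sketch follows, assuming each accident is a record with a 0/1 `injury` field plus the theme indicators. The field names, the `injury_rate` helper, and the six toy records are invented for illustration and are not data from the study.

```python
def injury_rate(records, flag=None):
    """Share of accidents with an injury, optionally limited to a flagged subset."""
    subset = [r for r in records if flag is None or r[flag] == 1]
    if not subset:
        return None  # no accidents carry this flag
    return sum(r["injury"] for r in subset) / len(subset)


# Toy records, NOT study data: one dict per accident.
records = [
    {"injury": 1, "medication": 1, "illegal_narcotic": 0},
    {"injury": 0, "medication": 0, "illegal_narcotic": 0},
    {"injury": 1, "medication": 0, "illegal_narcotic": 1},
    {"injury": 0, "medication": 1, "illegal_narcotic": 0},
    {"injury": 0, "medication": 0, "illegal_narcotic": 0},
    {"injury": 1, "medication": 1, "illegal_narcotic": 0},
]

overall = injury_rate(records)
medication = injury_rate(records, "medication")
narcotic = injury_rate(records, "illegal_narcotic")
print(overall, medication, narcotic)
```

Comparing each flagged rate against the overall rate mirrors the 82%, 80%, and 89% figures being read against the 73% baseline.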