A methodology is developed to classify ethnic status by name using a simple probabilistic model. The method considers four rules that may be used to classify individuals using three name components (first, middle and last names). To do this, conditional probabilities of ethnic status are estimated from a sample in which ethnic status is known. Using a split-sample technique, the sensitivity and specificity of the methodology were examined in a data set of death registrations. Each of the classification rules performed well on the data from which it was constructed but was less efficient when applied to another population. Nevertheless, a linear model, in which the conditional probabilities of the three name components are summed, achieved a sensitivity and specificity of 97% and 100%, respectively, in males, and 89% and 100% in females.
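The linear rule can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. For each name position, P(group | name) is estimated as the fraction of training records carrying that name who belong to the group. Names unseen in training contribute 0. Names are compared case-insensitively. A record is classified as belonging to the group when the summed score reaches a user-chosen threshold. The function names, the record layout `(first, middle, last, is_member)` and the threshold are hypothetical. Only the summed-probability (linear) rule is shown, not the other three rules.

```python
from collections import defaultdict

# Hypothetical record layout: (first, middle, last, is_member)

def estimate_conditional_probs(records):
    """Estimate P(member | name) separately for each of the 3 name positions."""
    # For each position, map name -> [member count, total count]
    counts = [defaultdict(lambda: [0, 0]) for _ in range(3)]
    for *names, is_member in records:
        for i, name in enumerate(names):
            c = counts[i][name.lower()]
            c[1] += 1
            if is_member:
                c[0] += 1
    return [{name: m / n for name, (m, n) in pos.items()} for pos in counts]

def linear_score(probs, names):
    """Linear rule: sum the conditional probabilities of the name components.
    Assumption: a name not seen in training contributes 0."""
    return sum(probs[i].get(name.lower(), 0.0) for i, name in enumerate(names))

def classify(probs, names, threshold):
    return linear_score(probs, names) >= threshold

def sensitivity_specificity(probs, records, threshold):
    """Evaluate on a held-out half of a split sample."""
    tp = fn = tn = fp = 0
    for *names, is_member in records:
        predicted = classify(probs, names, threshold)
        if is_member:
            tp, fn = (tp + 1, fn) if predicted else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if predicted else (fp, tn + 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

# Toy split sample (invented names, not the study's data)
train = [("Ravi", "Kumar", "Patel", True), ("Anil", "Kumar", "Shah", True),
         ("John", "Paul", "Smith", False), ("Mary", "Anne", "Jones", False)]
test = [("Ravi", "Anil", "Shah", True), ("Sunil", "Kumar", "Jones", True),
        ("John", "Kumar", "Smith", False), ("Paul", "Anne", "Jones", False)]

probs = estimate_conditional_probs(train)
print(sensitivity_specificity(probs, test, threshold=1.5))  # (0.5, 1.0)
print(sensitivity_specificity(probs, test, threshold=1.0))  # (1.0, 0.5)
```

Raising the threshold trades sensitivity for specificity. The toy test set also shows the abstract's warning that probabilities estimated on one population may transfer imperfectly to another, because names unseen in training carry no information under this scheme.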