Abstract
A methodology is developed to classify ethnic status by name using a simple probabilistic model. The method considers four rules that may be used to classify individuals using three name components (first, middle and last names). To do this, conditional probabilities of ethnic status given each name component are estimated from a sample in which ethnic status is known. Using a split-sample technique, the sensitivity and specificity of the method were examined in a data set of death registrations. Each of the classification rules performed well on the data from which it was constructed but was less efficient when applied to another population. Nevertheless, a linear model, in which the conditional probabilities for each name component are summed, achieved a sensitivity and specificity of 97% and 100%, respectively, in males, and 89% and 100% in females.
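The linear rule can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the abstract does not specify: ethnic status is treated as a binary label, a name never seen in the training half contributes a probability of 0, and the decision cut-off on the summed score (`threshold`) is a hypothetical value. The function names, the input format and the placeholder names (`f1`, `s1`, ...) are likewise illustrative only. The sketch estimates P(ethnic status | name) separately for the first, middle and last names on one half of the sample, sums these probabilities to obtain a score, and reports sensitivity and specificity on the other half.

```python
from collections import Counter

COMPONENTS = ("first", "middle", "last")


def fit_conditional_probs(records):
    """Estimate P(ethnic status | name) separately for each name component.

    records: iterable of (names, is_ethnic), where names maps 'first',
    'middle' and 'last' to a name string (hypothetical input format).
    """
    totals = {c: Counter() for c in COMPONENTS}
    hits = {c: Counter() for c in COMPONENTS}
    for names, is_ethnic in records:
        for c in COMPONENTS:
            name = names.get(c)
            if name:  # skip missing components, e.g. no middle name
                totals[c][name] += 1
                if is_ethnic:
                    hits[c][name] += 1
    # Proportion of bearers of each name who belong to the ethnic group
    return {c: {n: hits[c][n] / totals[c][n] for n in totals[c]}
            for c in COMPONENTS}


def linear_score(probs, names):
    # Sum of the conditional probabilities of the three components;
    # unseen names contribute 0 (an assumption, not from the abstract).
    return sum(probs[c].get(names.get(c), 0.0) for c in COMPONENTS)


def classify(probs, names, threshold=1.0):
    # threshold is a hypothetical cut-off on the summed score (range 0 to 3)
    return linear_score(probs, names) >= threshold


def sensitivity_specificity(probs, records, threshold=1.0):
    tp = fn = tn = fp = 0
    for names, is_ethnic in records:
        predicted = classify(probs, names, threshold)
        if is_ethnic:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return tp / (tp + fn), tn / (tn + fp)


# Toy split sample with placeholder names (not real data)
train_half = [
    ({"first": "f1", "middle": "m1", "last": "s1"}, True),
    ({"first": "f1", "middle": "m2", "last": "s1"}, True),
    ({"first": "f2", "middle": "m2", "last": "s2"}, False),
    ({"first": "f2", "middle": "m1", "last": "s2"}, False),
]
test_half = [
    ({"first": "f1", "middle": "m3", "last": "s1"}, True),
    ({"first": "f2", "middle": "m3", "last": "s2"}, False),
]

probs = fit_conditional_probs(train_half)
print(sensitivity_specificity(probs, test_half))
```

Fitting on one half and scoring the other mirrors the split-sample evaluation described above; in this toy case the unseen middle name `m3` simply adds nothing to the score. Fitting separate tables for males and females would be a straightforward extension if the sexes are analysed separately.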