Article Text

Download PDFPDF
The classification of ethnic status using name information.
  1. A J Coldman,
  2. T Braun,
  3. R P Gallagher
  1. Cancer Control Agency of British Columbia, Vancouver, Canada.


    Methodology is developed to classify ethnic status by name using a simple probabilistic model. This method involves the consideration of four rules which may be used to classify individuals using three name components (first, middle and last names). In order to do this, conditional probabilities of ethnic status are estimated from a sample in which the ethnic status is known. Using a split sample technique the sensitivity and specificity of this methodology were examined in a data set of death registrations. Each of the classification rules performed well on the data from which they were constructed but were not as efficient when applied to another population. Nevertheless a model (linear), in which the sum of the conditional probabilities of each home component is used, achieved a sensitivity and specificity of 97% and 100% respectively in males and 89% and 100% in females.

    Statistics from

    Request Permissions

    If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.