Reproducibility of the diagnosis of dysplasia in Barrett esophagus: A reaffirmation

Elizabeth Montgomery, Mary P. Bronner, John R. Goldblum, Joel K. Greenson, Marian M. Haber, John Hart, Laura W. Lamps, Gregory Y. Lauwers, Audrey J. Lazenby, David N. Lewin, Marie E. Robert, Alicia Y. Toledano, Yu Shyr, Kay Washington

Research output: Contribution to journal › Article › peer-review

Morphologic assessment of dysplasia in Barrett esophagus, despite its limitations, remains the basis of treatment. We rigorously tested modified 1988 criteria, assessing intraobserver and interobserver reproducibility. Participants submitted slides of Barrett mucosa negative for dysplasia (BE), indefinite for dysplasia (IND), with low-grade dysplasia (LGD), with high-grade dysplasia (HGD), and with carcinoma. Two hundred fifty slides were divided into 2 groups of 125. The first 125 slides were reviewed on 2 occasions by 12 gastrointestinal pathologists, without knowledge of the prior diagnoses and without prior discussion of criteria. Results were analyzed by κ statistics, which correct for chance agreement. A consensus meeting was then held at which the criteria outlined herein were established by group review of the index 125 slides. The second 125-slide set was then reviewed twice by each of the same 12 pathologists, and follow-up κ statistics were calculated. When statistical analysis was performed using 2 broad diagnostic categories (BE, IND, and LGD vs HGD and carcinoma), intraobserver agreement was near perfect both before and after the consensus meeting (mean κ = 0.82 and 0.80, respectively). Interobserver agreement was substantial (κ = 0.66) and improved after the consensus meeting (κ = 0.70; P = .02). When statistical analysis was performed using 4 clinically relevant categories (BE; IND and LGD; HGD; carcinoma), mean intraobserver κ improved from 0.64 to 0.68 (both substantial) after the consensus meeting, and mean interobserver κ improved from 0.43 to 0.46 (both moderate agreement). When statistical analysis was performed using 4 diagnostic categories that required distinction between LGD and IND (BE; IND; LGD; HGD and carcinoma), the mean intraobserver κ before the consensus meeting was 0.60 (substantial agreement), improving to 0.65 after the meeting (P < .05). Interobserver agreement was poorer, with mean values unchanged before and after the meeting (κ = 0.43 at both times).
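The results above depend on two steps: collapsing the five diagnoses into 2 or 4 groups, and computing a κ that discounts chance agreement. The abstract does not give formulas, so the following is a minimal Python sketch assuming unweighted Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), for two readings of the same slides. The short codes (BE, IND, LGD, HGD, CA), the scheme names, and the example readings are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical short codes for the five diagnostic categories in the abstract
# (BE = negative for dysplasia, CA = carcinoma). Each scheme maps the five codes
# onto the coarser groups used in one of the three analyses.
SCHEMES: Dict[str, Dict[str, str]] = {
    # 2 broad categories: BE, IND, LGD vs HGD, carcinoma
    "two_tier": {"BE": "low", "IND": "low", "LGD": "low", "HGD": "high", "CA": "high"},
    # 4 clinically relevant categories: BE; IND and LGD; HGD; carcinoma
    "clinical_four": {"BE": "BE", "IND": "IND/LGD", "LGD": "IND/LGD", "HGD": "HGD", "CA": "CA"},
    # 4 categories separating IND from LGD: BE; IND; LGD; HGD and carcinoma
    "ind_vs_lgd_four": {"BE": "BE", "IND": "IND", "LGD": "LGD", "HGD": "HGD/CA", "CA": "HGD/CA"},
}


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa, (p_o - p_e) / (1 - p_e), for two ratings of the same slides."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of slides given the same category both times.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two readings' marginal frequencies, summed over categories.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both readings use one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def collapsed_kappa(a: Sequence[str], b: Sequence[str], scheme: str) -> float:
    """Map both readings through a collapsing scheme, then compute Cohen's kappa."""
    mapping = SCHEMES[scheme]
    return cohen_kappa([mapping[x] for x in a], [mapping[x] for x in b])


# Two readings of eight hypothetical slides by one pathologist (made-up data).
first = ["BE", "IND", "LGD", "HGD", "CA", "BE", "LGD", "IND"]
second = ["BE", "LGD", "LGD", "HGD", "HGD", "BE", "IND", "IND"]
for name in SCHEMES:
    print(name, round(collapsed_kappa(first, second, name), 3))
```

In the made-up readings, the only disagreements are IND vs LGD and HGD vs carcinoma, so κ is 1.0 under the two-tier scheme, about 0.81 under the clinically relevant scheme, and about 0.67 when IND must be separated from LGD. Applying `cohen_kappa` to every pair of pathologists and averaging is one plausible, unconfirmed way to obtain a mean interobserver κ.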
Interobserver agreement was substantial for HGD/carcinoma (κ = 0.65), moderate to substantial for BE (κ = 0.58), fair for LGD (κ = 0.32), and slight for IND (κ = 0.15). Intraobserver reproducibility for the diagnosis of dysplasia in Barrett esophagus was substantial. Interobserver reproducibility was substantial at the ends of the spectrum (BE and HGD/carcinoma) but slight for IND. Both intraobserver and interobserver agreement improved overall after application of a modified grading system developed at a consensus conference, but not in the separation of BE, IND, and LGD. The criteria used by the group are presented.
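The per-category interobserver values and the verbal labels (slight, fair, moderate, substantial, near perfect) can be illustrated with a multi-rater statistic. The abstract does not state which one was used; the sketch below assumes Fleiss' kappa, which yields both an overall value and a category-specific value for each diagnosis, and maps values to words using the Landis and Koch (1977) cut points. The function names, the input layout (one list of labels per slide, one label per pathologist), and the example table are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fleiss_kappa(ratings: Sequence[Sequence[str]],
                 categories: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Overall Fleiss' kappa plus a category-specific kappa for each category.

    ratings[i] holds every rater's label for slide i; all slides need the same
    number (>= 2) of ratings, and every label must appear in `categories`.
    """
    if not ratings:
        raise ValueError("no slides given")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every slide needs the same number (>= 2) of ratings")
    # table[i][j]: how many raters put slide i in category j
    table: List[List[int]] = [[sum(r == c for r in row) for c in categories] for row in ratings]
    if any(sum(counts) != n_raters for counts in table):
        raise ValueError("found a label that is not in `categories`")
    n_slides = len(table)
    pairs = n_raters * (n_raters - 1)
    # p[j]: share of all assignments that went to category j
    p = [sum(counts[j] for counts in table) / (n_slides * n_raters) for j in range(len(categories))]
    # Mean observed agreement: fraction of agreeing rater pairs, averaged over slides
    p_bar = sum(sum(x * (x - 1) for x in counts) / pairs for counts in table) / n_slides
    p_e = sum(pj * pj for pj in p)  # agreement expected by chance
    overall = (p_bar - p_e) / (1 - p_e) if p_e < 1 else math.nan
    per_category: Dict[str, float] = {}
    for j, cat in enumerate(categories):
        denom = n_slides * pairs * p[j] * (1 - p[j])
        if denom == 0:
            per_category[cat] = math.nan  # category never used, or used for every rating
            continue
        disagreements = sum(counts[j] * (n_raters - counts[j]) for counts in table)
        per_category[cat] = 1 - disagreements / denom
    return overall, per_category


def agreement_label(kappa: float) -> str:
    """Landis and Koch (1977) bands; they call the top band 'almost perfect'."""
    if math.isnan(kappa):
        return "undefined"
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "near perfect"


# Made-up example: 4 slides, each read by the same 3 pathologists.
cats = ["BE", "IND", "HGD"]
slides = [
    ["BE", "BE", "BE"],
    ["IND", "BE", "IND"],
    ["HGD", "HGD", "HGD"],
    ["IND", "HGD", "BE"],
]
overall, per_cat = fleiss_kappa(slides, cats)
print(round(overall, 3), agreement_label(overall))
for cat, k in per_cat.items():
    print(cat, round(k, 3), agreement_label(k))
```

On this made-up table, the overall κ is 17/47 (about 0.36, "fair"), and the category-specific values are about 0.31 for BE, 0.11 for IND, and 0.625 for HGD. Fleiss' overall κ equals the average of the category-specific values weighted by p_j(1 - p_j), which is why one poorly reproduced category (such as IND) can pull the overall figure down. With these cut points, 0.60 sits at the top of "moderate" and 0.80 at the top of "substantial", so verbal labels for values exactly on a boundary depend on which side the cut is drawn.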

Original language: English
Pages (from-to): 368-378
Number of pages: 11
Journal: Human Pathology
Issue number: 4
Publication status: Published - 2001 Jan 1

All Science Journal Classification (ASJC) codes

  • Pathology and Forensic Medicine
