Morphologic assessment of dysplasia in Barrett esophagus, despite limitations, remains the basis of treatment. We rigorously tested modified 1988 criteria, assessing intraobserver and interobserver reproducibility. Participants submitted slides of Barrett mucosa negative (BE) and indefinite (IND) for dysplasia, with low-grade dysplasia (LGD) and high-grade dysplasia (HGD), and with carcinoma. Two hundred fifty slides were divided into 2 groups. The first 125 slides were reviewed, without knowledge of the prior diagnoses, on 2 occasions by 12 gastrointestinal pathologists without prior discussion of criteria. Results were analyzed by κ statistics, which correct for agreement by chance. A consensus meeting was then held, establishing, by group review of the index 125 slides, the criteria outlined herein. The second 125-slide set was then reviewed twice by each of the same 12 pathologists, and follow-up κ statistics were calculated. When statistical analysis was performed using 2 broad diagnostic categories (BE, IND, and LGD vs HGD and carcinoma), intraobserver agreement was near perfect both before and after the consensus meeting (mean κ = 0.82 and 0.80). Interobserver agreement was substantial (κ = 0.66) and improved after the consensus meeting (κ = 0.70; P = .02). When statistical analysis was performed using 4 clinically relevant separations (BE; IND and LGD; HGD; carcinoma), mean intraobserver κ improved from 0.64 to 0.68 (both substantial) after the consensus meeting, and mean interobserver κ improved from 0.43 to 0.46 (both moderate agreement). When statistical analysis was performed using 4 diagnostic categories that required distinction between LGD and IND (BE; IND; LGD; HGD and carcinoma), the pre-consensus meeting mean intraobserver κ was 0.60 (substantial agreement), improving to 0.65 after the meeting (P < .05). Interobserver agreement was poorer, with premeeting and postmeeting mean values unchanged (κ = 0.43 at both times).
Interobserver agreement was substantial for HGD/carcinoma (κ = 0.65), moderate to substantial for BE (κ = 0.58), fair for LGD (κ = 0.32), and slight for IND (κ = 0.15). The intraobserver reproducibility for the diagnosis of dysplasia in Barrett esophagus was substantial. Interobserver reproducibility was substantial at the ends of the spectrum (BE and HGD/carcinoma) but slight for IND. Both intraobserver and interobserver agreement improved overall after the application of a modified grading system developed at a consensus conference, but not in the separation of BE, IND, and LGD. The criteria used by the group are presented.
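As a reading aid, the chance correction behind a κ statistic can be sketched in a few lines of Python. This is a minimal illustration that assumes the two-rater (Cohen) form of κ; the abstract does not state which κ variant was used for the 12 pathologists. The function name `cohen_kappa`, the helper `collapse`, and the ten slide readings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of ratings of the same items (assumed variant)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both lists use a single identical category
    return (p_o - p_e) / (1.0 - p_e)


def collapse(ratings):
    """Hypothetical 2-group collapse: BE, IND, LGD vs HGD."""
    return ["high" if r == "HGD" else "low" for r in ratings]


# Hypothetical readings of 10 slides (illustrative only, not study data).
first_read = ["BE", "BE", "IND", "LGD", "HGD", "HGD", "BE", "LGD", "IND", "HGD"]
second_read = ["BE", "BE", "LGD", "LGD", "HGD", "HGD", "BE", "IND", "IND", "HGD"]

kappa_four = cohen_kappa(first_read, second_read)
kappa_two = cohen_kappa(collapse(first_read), collapse(second_read))
print(f"4 categories: {kappa_four:.2f}; 2 categories: {kappa_two:.2f}")
```

In this toy data the only disagreements are between IND and LGD, so κ is about 0.73 with the four separate categories and 1.0 once BE, IND, and LGD are pooled. This mirrors, in miniature, why agreement in the abstract is higher for the 2-category grouping than for groupings that require separating IND from LGD.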