On evaluating brain tissue classifiers without a ground truth

Title: On evaluating brain tissue classifiers without a ground truth
Author: Bouix, Sylvain; Martin-Fernandez, Marcos; Ungar, Lida; Nakamura, Motoaki; Koo, Min-Seong; McCarley, Robert William; Shenton, Martha Elizabeth (ORCID: 0000-0003-4235-7879)

Note: Order does not necessarily reflect citation order of authors.

Citation: Bouix, Sylvain, Marcos Martin-Fernandez, Lida Ungar, Motoaki Nakamura, Min-Seong Koo, Robert W. McCarley, and Martha E. Shenton. 2007. “On Evaluating Brain Tissue Classifiers Without a Ground Truth.” NeuroImage 36 (4) (July): 1207–1224. doi:10.1016/j.neuroimage.2007.04.031.
Abstract: In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Because a gold standard is difficult to establish for this type of data, we focus on methods that do not require a ground truth but instead rely on a common agreement principle. Three different techniques are presented: the Williams’ index, a measure of common agreement; STAPLE, an Expectation Maximization algorithm that simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique for exploring similarity data. We apply these evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings, and compare the evaluations with and without a ground truth. Our findings show that comparing classifiers without a gold standard can provide a great deal of useful information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers.
Published Version: doi:10.1016/j.neuroimage.2007.04.031
Other Sources: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2702211/
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:28552567
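The Williams’ index mentioned in the abstract compares how well one rater agrees with the rest of the group against how well the rest of the group agrees among itself. The minimal Python sketch below computes it for one segmentation in a list of binary masks. It assumes the Dice coefficient as the pairwise agreement measure; the function names `dice` and `williams_index` are hypothetical, and this is an illustration of the general index, not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice overlap of two binary masks (assumed agreement measure)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    # Two empty masks are treated as perfect agreement.
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def williams_index(segs, j, sim=dice):
    """Williams' index of rater j (hypothetical helper).

    Numerator: mean agreement of rater j with each of the other r - 1 raters.
    Denominator: mean pairwise agreement among those other raters.
    """
    r = len(segs)
    if r < 3:
        raise ValueError("Williams' index needs at least three raters")
    others = [k for k in range(r) if k != j]
    num = np.mean([sim(segs[j], segs[k]) for k in others])
    den = np.mean([sim(segs[a], segs[b]) for a, b in combinations(others, 2)])
    return float(num / den)


if __name__ == "__main__":
    # Toy 1-D "segmentations" from three raters.
    segs = [
        np.array([1, 1, 1, 0, 0]),
        np.array([1, 1, 0, 0, 0]),
        np.array([1, 1, 1, 1, 0]),
    ]
    for j in range(len(segs)):
        print(j, round(williams_index(segs, j), 4))
```

An index above 1 suggests that the rater agrees with the group at least as well as the group members agree with one another, while values well below 1 flag a possible outlier. In the toy run, rater 0 (the "middle" mask) scores above 1 and the other two score below 1.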