HIBAG – HLA Genotype Imputation with Attribute Bagging

R/Bioconductor Package: HIBAG
Email: Xiuwen Zheng or Bruce S. Weir
This page was last updated on Mar 1, 2017

Introduction

HIBAG is a state-of-the-art software package for imputing HLA types from SNP data, implemented in the R statistical programming language. It is highly accurate and computationally tractable, and researchers can use it with published parameter estimates (provided for subjects of European, Asian, Hispanic and African ancestries) instead of needing access to large training datasets. HIBAG combines attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging improves the accuracy and stability of classifier ensembles by building them with bootstrap aggregating and random variable selection.
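To make the idea of attribute bagging concrete, here is a minimal Python sketch of the two ingredients named above: each ensemble member is trained on a bootstrap sample of the individuals and on a random subset of the variables, and the ensemble predicts by majority vote. This is not HIBAG's implementation. The member classifier here is a simple lookup table chosen for brevity (an assumption for illustration; HIBAG's classifiers are based on haplotype inference), and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_member(X, y, n_features, rng):
    """Train one ensemble member on a bootstrap sample and a random feature subset."""
    n = len(X)
    # Bootstrap aggregating: draw n rows with replacement.
    rows = [rng.randrange(n) for _ in range(n)]
    # Random variable selection: pick a subset of feature indices.
    feats = sorted(rng.sample(range(len(X[0])), n_features))
    # Toy classifier (assumption): remember the label counts seen for each
    # pattern of values on the selected features.
    table = defaultdict(Counter)
    for i in rows:
        key = tuple(X[i][j] for j in feats)
        table[key][y[i]] += 1
    # Fallback for unseen patterns: most common label in the bootstrap sample.
    fallback = Counter(y[i] for i in rows).most_common(1)[0][0]
    return feats, table, fallback


def predict_member(member, x):
    """Predict a label for one sample with a single ensemble member."""
    feats, table, fallback = member
    key = tuple(x[j] for j in feats)
    if key in table:
        return table[key].most_common(1)[0][0]
    return fallback


def train_ensemble(X, y, n_members=25, n_features=2, seed=0):
    """Build an ensemble of independently trained members."""
    rng = random.Random(seed)
    return [train_member(X, y, n_features, rng) for _ in range(n_members)]


def predict(ensemble, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(predict_member(m, x) for m in ensemble)
    return votes.most_common(1)[0][0]
```

Because every member sees different individuals and different variables, the errors of the members are less correlated than they would be if all members were trained on the same data, which is what makes the vote more stable than any single member.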

Features

Citation

If you use HIBAG in a published analysis, please report the HIBAG version used and cite the appropriate publication or publications listed below:

Download HIBAG

R/Bioconductor: http://bioconductor.org/packages/HIBAG

Download Published Parameter Estimates

The published parameters were estimated from HLA and SNP genotypes from multiple GlaxoSmithKline clinical trials (referred to as “HLARES”) and from HapMap Phase 2. The HIBAG models were built from SNP markers common to the Illumina 1M Duo, OmniQuad, OmniExpress, 660K and 550K platforms. The training data consist of 1) HLARES data of European ancestry, 2) HLARES data of Asian ancestry (East and South Asia) plus HapMap CHB+JPT, 3) HLARES data of Hispanic ancestry, and 4) HLARES data of African American ancestry plus 60 African parents from HapMap YRI.

HLA Nomenclature Updates (important update: April 2010)

Summary of training data set:

Ethnic-specific models of two-field (4-digit) resolution, presented in Zheng et al. (2014):



Overall model performance was assessed by prediction accuracy, defined as the number of chromosomes with correctly predicted HLA alleles divided by the total number of chromosomes. The standard statistical quantities of prediction quality for a specific HLA allele H:

       training sample size
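The overall prediction accuracy defined above can be sketched in a few lines of Python. The sketch assumes that each individual carries two alleles at a locus and that the two alleles are compared as an unordered pair, so a prediction with the alleles in the opposite order still counts as fully correct. The function names and the allele labels in the example are hypothetical and used only for illustration.

```python
from collections import Counter


def correct_chromosomes(true_pair, pred_pair):
    """Count correctly predicted alleles (0, 1 or 2) for one individual.

    The pairs are treated as unordered (assumption), so the count is the
    size of the multiset intersection of the two pairs.
    """
    overlap = Counter(true_pair) & Counter(pred_pair)
    return sum(overlap.values())


def prediction_accuracy(true_genotypes, pred_genotypes):
    """Correctly predicted chromosomes divided by the total number of chromosomes."""
    if len(true_genotypes) != len(pred_genotypes):
        raise ValueError("true and predicted genotypes must have the same length")
    total = 2 * len(true_genotypes)  # two chromosomes per individual
    if total == 0:
        raise ValueError("at least one individual is required")
    correct = sum(
        correct_chromosomes(t, p) for t, p in zip(true_genotypes, pred_genotypes)
    )
    return correct / total


# Hypothetical example with three individuals at one locus.
truth = [("01:01", "02:01"), ("03:01", "03:01"), ("01:01", "24:02")]
predicted = [("02:01", "01:01"), ("03:01", "11:01"), ("01:01", "24:02")]
accuracy = prediction_accuracy(truth, predicted)  # 5 of 6 chromosomes correct
```

In this example the first and third individuals are fully correct and the second has one of two alleles correct, so five of the six chromosomes are predicted correctly.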

Examples

Version History
