Data
For the following parameter estimates, published by Degenhardt and Wendorff et al. (1), two different data collections were used:
1.) IKMB dataset: A new collection of 312 African American, 162 German, 140 Chinese, 143 Indian, 132 Iranian, 189 Japanese, 122 South Korean and 160 Maltese samples, genotyped with either the Illumina Immunochip (all populations except Malta; 196,524 markers targeting immune-relevant genes) or the Illumina Infinium ImmunoArray-24 (Malta only; 253,702 markers).
Full-context four-digit information for all classical HLA class I and class II genes (HLA‑A, ‑B, ‑C, ‑DQA1, ‑DQB1, ‑DPA1, ‑DPB1, ‑DRB1 and ‑DRB3/4/5) was available through NGS-based typing as published by Wittig et al. (2, 3).
Table 1: Dataset IKMB (4-digit, full context)
Populations | AA | CHN | GER | IND | IRN | JPN | KOR | MLT | Σ
# samples | 312 | 140 | 162 | 143 | 132 | 189 | 122 | 160 | 1360
2.) 1KG dataset: The 1000 Genomes Phase 3 reference dataset (version 20130502) with 174,538 phased SNPs that overlap with the Illumina Immunochip (162 samples of African ancestry, 193 samples of South American ancestry, 260 samples of East Asian ancestry and 322 samples of European ancestry). The HLA allele information for this dataset is publicly available only at 4-digit G group level and does not include HLA‑DPA1, ‑DPB1, ‑DQA1 or ‑DRB3/4/5 allele calls.
(Side note: The HapMap data is a part of the 1000 Genomes data set.)
Table 2: Dataset 1KG (4-digit, G groups)
Populations | AFR | AMR | EAS | EUR | Σ
# samples | 162 | 193 | 260 | 322 | 937

Subpopulations | ASW | LWK | YRI | CLM | MXL | PUR | CHB | CHS | JPT | CEU | FIN | GBR | TSI | Σ
# samples | 41 | 75 | 46 | 67 | 56 | 70 | 82 | 92 | 86 | 52 | 95 | 86 | 89 | 937
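As a quick consistency check on Table 2, the subpopulation counts can be summed to the population-level totals. The sketch below is only an illustration: the counts are transcribed from the table, the grouping of subpopulations into AFR, AMR, EAS and EUR follows the standard 1000 Genomes super-population assignment, and the names `SUBPOPULATIONS`, `REPORTED_TOTALS` and `superpopulation_totals` are hypothetical helpers, not part of the published data or any package.

```python
# Sample counts transcribed from Table 2 (1KG dataset).
# The nesting reflects the standard 1000 Genomes super-population grouping.
SUBPOPULATIONS = {
    "AFR": {"ASW": 41, "LWK": 75, "YRI": 46},
    "AMR": {"CLM": 67, "MXL": 56, "PUR": 70},
    "EAS": {"CHB": 82, "CHS": 92, "JPT": 86},
    "EUR": {"CEU": 52, "FIN": 95, "GBR": 86, "TSI": 89},
}

# Population-level totals as reported in the first block of Table 2.
REPORTED_TOTALS = {"AFR": 162, "AMR": 193, "EAS": 260, "EUR": 322}


def superpopulation_totals(subpops):
    """Sum the subpopulation sample counts within each population."""
    return {pop: sum(counts.values()) for pop, counts in subpops.items()}


totals = superpopulation_totals(SUBPOPULATIONS)
grand_total = sum(totals.values())

# Report any population whose subpopulation sum differs from the table.
mismatches = {
    pop: (totals[pop], REPORTED_TOTALS[pop])
    for pop in REPORTED_TOTALS
    if totals[pop] != REPORTED_TOTALS[pop]
}
print("per-population totals:", totals)
print("grand total:", grand_total)
print("mismatches:", mismatches)
```

The same pattern applies to Table 1 by replacing the dictionaries with the eight IKMB population counts.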
(i) Primary model: Multi-ethnic reference panel in full four-digit context (multiethnic_IKMB.RData)
(ii) Multi-ethnic reference panel combined with the 1000 Genomes data set on G group level (multiethnic_IKMB_1KG.RData)
(iii) Multi-ethnic reference panel on G group level (multiethnic_IKMB_g.RData)
For quality control of the datasets and further details, please see (1): (i) Figure 3, Table 2 and Table 3; (ii) Supplementary Material, Fig. S2 and Table S1; (iii) Supplementary Material, Fig. S3 and Table S2.
1) Degenhardt, F., Wendorff, M., Wittig, M., Ellinghaus, E., Datta, L.W., Schembri, J., Ng, S.C., Rosati, E., Hubenthal, M., Ellinghaus, D. et al. (2018) Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles. Hum Mol Genet, in press.
2) Wittig, M., Anmarkrud, J.A., Kassens, J.C., Koch, S., Forster, M., Ellinghaus, E., Hov, J.R., Sauer, S., Schimmler, M., Ziemann, M. et al. (2015) Development of a high-resolution NGS-based HLA-typing and analysis pipeline. Nucleic Acids Res, 43, e70.
3) Wittig, M., Juzenas, S., Vollstedt, M. and Franke, A. (2018) High-Resolution HLA-Typing by Next-Generation Sequencing of Randomly Fragmented Target DNA. Methods Mol Biol, 1802, 63-88.