www.nature.com/scientificreports OPEN

Pattern Discovery in Brain Imaging Genetics via SCCA Modeling with a Generic Non-convex Penalty

Received: 5 May 2017 Accepted: 2 October 2017 Published: xx xx xxxx

Lei Du1, Kefei Liu2, Xiaohui Yao2, Jingwen Yan2, Shannon L. Risacher2, Junwei Han1, Lei Guo1, Andrew J. Saykin2, Li Shen2 & the Alzheimer's Disease Neuroimaging Initiative*

Brain imaging genetics intends to uncover associations between genetic markers and neuroimaging quantitative traits. Sparse canonical correlation analysis (SCCA) can discover bi-multivariate associations and select relevant features, and is becoming popular in imaging genetic studies. The ℓ1-norm function is not only convex but also singular at the origin, which is a necessary condition for sparsity. Thus most SCCA methods impose the ℓ1-norm onto individual features or onto the structure level of features to pursue the corresponding sparsity. However, the ℓ1-norm penalty over-penalizes large coefficients and may incur estimation bias. A number of non-convex penalties have been proposed to reduce the estimation bias in regression tasks, but using them in SCCA remains largely unexplored. In this paper, we design a unified non-convex SCCA model, based on seven non-convex functions, for simultaneous unbiased estimation and stable feature selection. We also propose an efficient optimization algorithm. The proposed method obtains both higher correlation coefficients and better canonical loading patterns. Specifically, these SCCA methods with non-convex penalties discover a strong association between the APOE e4 rs429358 SNP and the hippocampus region of the brain. Both are Alzheimer's disease related biomarkers, indicating the potential and power of the non-convex methods in brain imaging genetics.

By identifying the associations between genetic factors and brain imaging measurements, brain imaging genetics intends to model and understand how genetic factors influence the structure or function of the human brain1–14.
Both genetic biomarkers, such as single nucleotide polymorphisms (SNPs), and brain imaging measurements, such as imaging quantitative traits (QTs), are multivariate. To address this, bi-multivariate association models have been widely used to uncover the joint effect of multiple SNPs on one or multiple QTs. Examples include multiple linear regression15, reduced rank regression16–18, parallel independent component analysis19, partial least squares regression20,21, canonical correlation analysis (CCA)22 and their sparsity-inducing variants23. Among them, SCCA (sparse CCA), which can discover bi-multivariate relationships and extract relevant features, is becoming popular in brain imaging genetics.

CCA was introduced several decades ago24. However, it performs well only when the number of observations is larger than the combined number of features of the two views. Unfortunately, biomedical and biological studies usually pose large-p-small-n problems. The situation is even worse for CCA, which faces a large-(p + q)-small-n problem. To overcome this limitation, sparse CCA (SCCA)25–36 employs a sparsity-inducing regularization term to select a small set of relevant features, and it has received increasing attention. […] However, the ℓ1-norm penalty exhibits a conflict between optimal prediction and consistent feature selection37. In penalized least squares modeling, Fan and Li38 showed that a good penalty function should meet three properties.

1School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.
2Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, 46202, USA. *A comprehensive list of consortium members appears at the end of the paper. Correspondence and requests for materials should be addressed to L.D. (email: dulei@nwpu.edu.cn) or L.S. (email: shenli@iu.edu)

Scientific Reports | 7: 14052 | DOI:10.1038/s41598-017-13930-y 1 www.nature.com/scientificreports/

First, the penalty function should be singular at the origin to produce sparse results. Second, it should produce continuous models for stable model selection. Third, it should not penalize large coefficients […]

Table 1.  The seven non-convex penalty functions and their supergradients.

$$
\begin{array}{lll}
\text{Penalty} & P_{\lambda,\gamma}(u) & \text{Supergradient } \partial P_{\lambda,\gamma}(u) \\ \hline
\gamma\text{-norm} & \lambda|u|^{\gamma} & \begin{cases} +\infty, & u = 0 \\ \lambda\gamma|u|^{\gamma-1}, & u \neq 0 \end{cases} \\
\text{Geman} & \dfrac{\lambda|u|}{|u|+\gamma} & \dfrac{\lambda\gamma}{(|u|+\gamma)^{2}} \\
\text{SCAD}^{38} & \begin{cases} \lambda|u|, & |u| \le \lambda \\ \dfrac{-u^{2} + 2\gamma\lambda|u| - \lambda^{2}}{2(\gamma-1)}, & \lambda \le |u| \le \gamma\lambda \\ \dfrac{\lambda^{2}(\gamma+1)}{2}, & |u| \ge \gamma\lambda \end{cases} & \begin{cases} \lambda, & |u| \le \lambda \\ \dfrac{\gamma\lambda - |u|}{\gamma-1}, & \lambda \le |u| \le \gamma\lambda \\ 0, & |u| \ge \gamma\lambda \end{cases} \\
\text{Laplace}^{44} & \lambda\left(1 - \exp\left(-\dfrac{|u|}{\gamma}\right)\right) & \dfrac{\lambda}{\gamma}\exp\left(-\dfrac{|u|}{\gamma}\right) \\
\text{MCP}^{45} & \begin{cases} \lambda|u| - \dfrac{u^{2}}{2\gamma}, & |u| \le \gamma\lambda \\ \dfrac{\gamma\lambda^{2}}{2}, & |u| \ge \gamma\lambda \end{cases} & \begin{cases} \lambda - \dfrac{|u|}{\gamma}, & |u| \le \gamma\lambda \\ 0, & |u| \ge \gamma\lambda \end{cases} \\
\text{ETP}^{46} & \dfrac{\lambda}{1 - \exp(-\gamma)}\left(1 - \exp(-\gamma|u|)\right) & \dfrac{\lambda\gamma}{1 - \exp(-\gamma)}\exp(-\gamma|u|) \\
\text{Logarithm}^{47} & \dfrac{\lambda}{\log(\gamma+1)}\log(\gamma|u| + 1) & \dfrac{\lambda\gamma}{(\gamma|u| + 1)\log(\gamma+1)}
\end{array}
$$

$$\mathcal{L}(u,v) = -u^\top X^\top Y v + \Omega_{nc}(u) + \Omega_{nc}(v) + \frac{\alpha_1}{2}\left(\|Xu\|_2^2 - 1\right) + \frac{\alpha_2}{2}\left(\|Yv\|_2^2 - 1\right), \tag{4}$$

which, from the point of view of optimization, is equivalent to

$$\mathcal{L}(u,v) = -u^\top X^\top Y v + \Omega_{nc}(u) + \Omega_{nc}(v) + \frac{\alpha_1}{2}\|Xu\|_2^2 + \frac{\alpha_2}{2}\|Yv\|_2^2. \tag{5}$$

Here α1, α2, λ1, λ2 and γ are nonnegative tuning parameters. Next we show how to solve this non-convex problem.

The first term $-u^\top X^\top Y v$ on the right-hand side of equation (5) is biconvex in u and v. $\|Xu\|_2^2$ is convex in u, and $\|Yv\|_2^2$ is convex in v. It remains to approximate both $\Omega_{nc}(u)$ and $\Omega_{nc}(v)$ and transform them into convex functions. The local quadratic approximation (LQA) technique was introduced to express the SCAD penalty quadratically38. Based on LQA, we now show how to represent these non-convex penalties in a unified way. First, we take the first-order Taylor expansion of $P_{\lambda,\gamma}(\sqrt{\mu})$, viewed as a function of μ, at μ0:

$$P_{\lambda,\gamma}(\sqrt{\mu}) \approx P_{\lambda,\gamma}(\sqrt{\mu_0}) + \frac{P'_{\lambda,\gamma}(\sqrt{\mu_0})}{2\sqrt{\mu_0}}\,(\mu - \mu_0), \tag{6}$$

where μ and μ0 are neighbors, e.g., the estimates at two successive iterations during optimization. Substituting $\mu = u_i^2$ and $\mu_0 = (u_i^t)^2$ into (6), we have

$$P_{\lambda,\gamma}(|u_i|) \approx P_{\lambda,\gamma}(|u_i^t|) + \frac{P'_{\lambda,\gamma}(|u_i^t|)}{2|u_i^t|}\left(u_i^2 - (u_i^t)^2\right), \tag{7}$$

with $P'_{\lambda,\gamma}(|u_i^t|)$ being the supergradient of $P_{\lambda,\gamma}$ at $|u_i^t|$ (as shown in Table 1). Then we obtain the quadratic approximation to $\Omega_{nc}(u)$:

$$\Omega_{nc}(u) = \sum_{i=1}^{p} P_{\lambda_1,\gamma}(|u_i|) \approx \sum_{i=1}^{p} \frac{P'_{\lambda_1,\gamma}(|u_i^t|)}{2|u_i^t|}\,u_i^2 + C_u, \tag{8}$$

where $C_u = \sum_{i=1}^{p}\big[P_{\lambda_1,\gamma}(|u_i^t|) - \tfrac{1}{2}P'_{\lambda_1,\gamma}(|u_i^t|)\,|u_i^t|\big]$ is not a function of u and thus does not contribute to the optimization. In a similar way, we construct a quadratic approximation to $\Omega_{nc}(v)$:

$$\Omega_{nc}(v) = \sum_{j=1}^{q} P_{\lambda_2,\gamma}(|v_j|) \approx \sum_{j=1}^{q} \frac{P'_{\lambda_2,\gamma}(|v_j^t|)}{2|v_j^t|}\,v_j^2 + C_v, \tag{9}$$

where $C_v = \sum_{j=1}^{q}\big[P_{\lambda_2,\gamma}(|v_j^t|) - \tfrac{1}{2}P'_{\lambda_2,\gamma}(|v_j^t|)\,|v_j^t|\big]$ is not a function of v and makes no contribution to the optimization.

Figure 1.  Illustration of the ℓ0, ℓ1 and seven non-convex functions. All the non-convex penalty functions share two common properties: they are singular at the origin, concave and monotonically decreasing on (−∞,0], and concave and monotonically increasing on [0,∞).

Denote the estimates of u and v in the t-th iteration as $u^t$ and $v^t$, respectively.
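The penalties of Table 1 and the LQA surrogate of equation (7) can be checked numerically. The Python sketch below transcribes Table 1 and evaluates the right-hand side of (7). The helper names `penalty`, `supergradient` and `lqa_surrogate` are illustrative and do not come from the paper. For these seven penalties, P′(t)/t is non-increasing in t, so $P_{\lambda,\gamma}(\sqrt{\mu})$ is concave in μ. The surrogate should therefore touch the penalty at $u_i^t$ and lie above it elsewhere. That property is our own observation, and the sketch uses it only as a sanity check.

```python
import math

def penalty(name, u, lam, gam):
    """P_{lam,gam}(|u|), transcribed from Table 1."""
    t = abs(u)
    if name == "gamma-norm":
        return lam * t ** gam
    if name == "geman":
        return lam * t / (t + gam)
    if name == "scad":
        if t <= lam:
            return lam * t
        if t <= gam * lam:
            return (-t * t + 2 * gam * lam * t - lam * lam) / (2 * (gam - 1))
        return lam * lam * (gam + 1) / 2
    if name == "laplace":
        return lam * (1 - math.exp(-t / gam))
    if name == "mcp":
        return lam * t - t * t / (2 * gam) if t <= gam * lam else gam * lam * lam / 2
    if name == "etp":
        return lam / (1 - math.exp(-gam)) * (1 - math.exp(-gam * t))
    if name == "log":
        return lam / math.log(gam + 1) * math.log(gam * t + 1)
    raise ValueError(f"unknown penalty: {name}")

def supergradient(name, u, lam, gam):
    """P'_{lam,gam}(|u|): the supergradient column of Table 1 (w.r.t. |u|)."""
    t = abs(u)
    if name == "gamma-norm":
        return math.inf if t == 0 else lam * gam * t ** (gam - 1)
    if name == "geman":
        return lam * gam / (t + gam) ** 2
    if name == "scad":
        if t <= lam:
            return lam
        if t <= gam * lam:
            return (gam * lam - t) / (gam - 1)
        return 0.0
    if name == "laplace":
        return lam / gam * math.exp(-t / gam)
    if name == "mcp":
        return lam - t / gam if t <= gam * lam else 0.0
    if name == "etp":
        return lam * gam / (1 - math.exp(-gam)) * math.exp(-gam * t)
    if name == "log":
        return lam * gam / ((gam * t + 1) * math.log(gam + 1))
    raise ValueError(f"unknown penalty: {name}")

def lqa_surrogate(name, u, ut, lam, gam):
    """Right-hand side of Eq. (7): a quadratic in u, expanded at ut (ut != 0)."""
    w = supergradient(name, ut, lam, gam) / (2 * abs(ut))  # LQA weight
    return penalty(name, ut, lam, gam) + w * (u * u - ut * ut)
```

For example, `lqa_surrogate("mcp", 0.05, 0.3, 1.0, 0.1)` simply returns P(0.3) = γλ²/2. The MCP supergradient vanishes once |u| exceeds γλ, so that coordinate receives zero LQA weight.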
To update the estimates of u and v in the (t + 1)-th iteration, we substitute the approximations of $\Omega_{nc}(u)$ and $\Omega_{nc}(v)$ from equations (8) and (9) into $\mathcal{L}(u,v)$ in (5). We then solve the resulting approximate version of the original problem:

$$\arg\min_{u,v}\; -u^\top X^\top Y v + \sum_{i=1}^{p} \frac{P'_{\lambda_1,\gamma}(|u_i^t|)}{2|u_i^t|}\,u_i^2 + \sum_{j=1}^{q} \frac{P'_{\lambda_2,\gamma}(|v_j^t|)}{2|v_j^t|}\,v_j^2 + \frac{\alpha_1}{2}\|Xu\|_2^2 + \frac{\alpha_2}{2}\|Yv\|_2^2. \tag{10}$$

Equation (10) is quadratic and biconvex in u and v, i.e., it is convex in u for fixed v, and vice versa. Following the alternate convex search (ACS) method, which is designed to solve biconvex problems48, the (t + 1)-th estimates of u and v can be calculated via

$$u^{t+1} = \arg\min_{u}\; -u^\top X^\top Y v^t + \sum_{i=1}^{p} \frac{P'_{\lambda_1,\gamma}(|u_i^t|)}{2|u_i^t|}\,u_i^2 + \frac{\alpha_1}{2}\|Xu\|_2^2,$$
$$v^{t+1} = \arg\min_{v}\; -(u^{t+1})^\top X^\top Y v + \sum_{j=1}^{q} \frac{P'_{\lambda_2,\gamma}(|v_j^t|)}{2|v_j^t|}\,v_j^2 + \frac{\alpha_2}{2}\|Yv\|_2^2. \tag{11}$$

Both equations above are quadratic, and thus closed-form solutions exist.

Figure 2.  Canonical loadings estimated on four synthetic data sets. The first column shows results for Data1, the second column for Data2, and so forth. The first row is the ground truth, and each remaining row corresponds to an SCCA method: (1) Ground Truth. (2) L1-SCCA. (3) L1-NSCCA. (4) L1-S2CCA. (5) γ-norm, and so forth. For each data set and each method, the estimated weights of u are shown on the left panel, and those of v on the right. In each individual heat map, the x-axis indicates the indices of elements in u or v; the y-axis indicates the indices of the cross-validation folds.

Table 2.  Participant characteristics.
Group | Num | Gender (M/F) | Handedness (R/L) | Age (years, mean ± std) | Education (years, mean ± std)
HC | 204 | 111/93 | 190/14 | 76.07 ± 4.99 | 16.15 ± 2.73
MCI | 363 | 235/128 | 329/34 | 74.88 ± 7.37 | 15.72 ± 2.30
AD | 176 | 95/81 | 166/10 | 75.60 ± 7.50 | 14.84 ± 3.12
Taking the partial derivatives of $\mathcal{L}(u,v)$ with respect to u and v, with $\Omega_{nc}(u)$ and $\Omega_{nc}(v)$ replaced by their quadratic approximations (8) and (9), and setting the results to zero, we have

$$0 \in -X^\top Y v + (D_1^t + \alpha_1 X^\top X)\,u, \tag{12}$$
$$0 \in -Y^\top X u + (D_2^t + \alpha_2 Y^\top Y)\,v, \tag{13}$$

where $D_1^t$ is a diagonal matrix whose i-th diagonal element is $P'_{\lambda_1,\gamma}(|u_i^t|)/|u_i^t|$ (i ∈ [1, p]). Similarly, $D_2^t$ is a diagonal matrix whose j-th diagonal element is $P'_{\lambda_2,\gamma}(|v_j^t|)/|v_j^t|$ (j ∈ [1, q]). These elements are undefined when $u_i^t = 0$ (or $v_j^t = 0$). Therefore, the i-th diagonal element of $D_1^t$ is computed as

$$D_1^t(i,i) = \frac{P'_{\lambda_1,\gamma}(|u_i^t|)}{|u_i^t| + \zeta}, \tag{14}$$

where ζ is a tiny positive number, and $D_2^t$ is modified in the same way. Hunter and Li50 showed that this modification still guarantees the optimization of equation (11). Then we have the updating expressions at the (t + 1)-th iteration:

$$u^{t+1} = (D_1^t + \alpha_1 X^\top X)^{-1} X^\top Y v^t, \tag{15}$$
$$v^{t+1} = (D_2^t + \alpha_2 Y^\top Y)^{-1} Y^\top X u^{t+1}. \tag{16}$$

We alternate between the above two equations to gradually refine the estimates of u and v until convergence. The pseudo code of the non-convex SCCA algorithm is described in Algorithm 1.

Computational Analysis.  In Algorithm 1, Step 3 and Step 6 are linear in the dimensions of u and v, respectively, and are easy to compute. Step 4 and Step 7 are the critical steps of the proposed algorithm. Since we have closed-form updating expressions, these steps can be calculated by solving a system of linear equations with quadratic complexity, which avoids computing the matrix inverse with cubic complexity. Steps 5 and 8 are re-scaling steps and are very easy to calculate. Therefore, the whole algorithm is efficient.

Table 3.  The searching range of optimal γ for each non-convex penalty.
Penalty | Range of γ
γ-norm | 0.1, 0.2, 0.3
SCAD | 3.7
Geman, Laplace, MCP | 0.1, 0.01, 0.001
ETP, Log | 10, 100, 1000

Data Availability.
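The alternating updates (14)–(16) translate almost line for line into NumPy. The sketch below is not the authors' Algorithm 1, and it rests on several assumptions:

- the re-scaling steps normalize u and v so that ‖Xu‖₂ = ‖Yv‖₂ = 1;
- the iteration starts from constant vectors;
- the Laplace supergradient from Table 1 is used;
- iteration stops with the max-change rule described under Termination Criterion.

As the Computational Analysis suggests, each update calls `np.linalg.solve` instead of forming the inverse in (15) or (16). The names `nonconvex_scca` and `laplace_supergradient` are hypothetical.

```python
import numpy as np

def laplace_supergradient(t, lam, gam):
    """Laplace row of Table 1: (lam / gam) * exp(-t / gam), with t = |u| >= 0."""
    return lam / gam * np.exp(-t / gam)

def nonconvex_scca(X, Y, supergrad=laplace_supergradient, lam=(1.0, 1.0), gam=0.1,
                   alpha=(1.0, 1.0), zeta=1e-8, eps=1e-5, max_iter=500):
    """Alternate Eqs. (14)-(16); returns (u, v) rescaled so ||Xu|| = ||Yv|| = 1."""
    XtY, XtX, YtY = X.T @ Y, X.T @ X, Y.T @ Y
    u = np.ones(X.shape[1]) / np.linalg.norm(X @ np.ones(X.shape[1]))
    v = np.ones(Y.shape[1]) / np.linalg.norm(Y @ np.ones(Y.shape[1]))
    for _ in range(max_iter):
        # Eq. (14): perturbed LQA weights P'(|u_i|) / (|u_i| + zeta) on the diagonal.
        D1 = np.diag(supergrad(np.abs(u), lam[0], gam) / (np.abs(u) + zeta))
        # Eq. (15), computed with a linear solve instead of an explicit inverse.
        u_new = np.linalg.solve(D1 + alpha[0] * XtX, XtY @ v)
        u_new /= np.linalg.norm(X @ u_new)          # assumed re-scaling step
        D2 = np.diag(supergrad(np.abs(v), lam[1], gam) / (np.abs(v) + zeta))
        # Eq. (16) uses the freshly updated u.
        v_new = np.linalg.solve(D2 + alpha[1] * YtY, XtY.T @ u_new)
        v_new /= np.linalg.norm(Y @ v_new)          # assumed re-scaling step
        converged = (np.max(np.abs(u_new - u)) <= eps
                     and np.max(np.abs(v_new - v)) <= eps)
        u, v = u_new, v_new
        if converged:
            break
    return u, v
```

The Laplace penalty is the default because its supergradient is strictly positive. This keeps $D_1^t + \alpha_1 X^\top X$ positive definite even when p > n. With SCAD or MCP, some diagonal entries of $D_1^t$ can be exactly zero, and the linear system may then be singular.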
The synthetic data sets generated in this work are available from the corresponding authors' web sites, http://www.escience.cn/people/dulei/code.html and http://www.iu.edu/ shenlab/tools/ncscca/. The real data set is publicly available in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database repository, http://adni.loni.usc.edu.

Experiments and Datasets

Data Description.  Synthetic Dataset.  There are four data sets with sparse true signals for both u and v, i.e., only a small subset of features is nonzero. The numbers of features in both u and v are larger than the number of observations, to simulate a large-(p + q)-small-n task. The generating process is as follows. We first generate u and v […] We show the true signal of every data set in Fig. 2 (top row).

Real Neuroimaging Genetics Dataset.  Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), etc., led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org.
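The large-(p + q)-small-n setting described above is easy to mimic for experimentation. The sketch below does not follow the paper's generating recipe for Data1–Data4. It uses an assumed latent-variable design in which a shared latent vector z drives the nonzero features of both views, plus independent Gaussian noise. The function `make_synthetic` and its parameters are hypothetical, and the sketch does not reproduce the paper's covariance structure or sample sizes.

```python
import numpy as np

def make_synthetic(n, p, q, u_support, v_support, noise=1.0, seed=0):
    """Hypothetical two-view generator with sparse ground-truth loadings u and v."""
    rng = np.random.default_rng(seed)
    u = np.zeros(p)
    u[list(u_support)] = 1.0                 # nonzero entries = true signal in u
    v = np.zeros(q)
    v[list(v_support)] = 1.0                 # nonzero entries = true signal in v
    z = rng.standard_normal(n)               # shared latent variable, one value per sample
    X = np.outer(z, u) + noise * rng.standard_normal((n, p))
    Y = np.outer(z, v) + noise * rng.standard_normal((n, q))
    return X, Y, u, v

# n = 80 observations, p + q = 220 features: a large-(p + q)-small-n task.
X, Y, u_true, v_true = make_synthetic(n=80, p=100, q=120,
                                      u_support=range(10), v_support=range(20, 30))
```

Only the features in the supports share the latent signal, so an SCCA method should place its weight there. The ground-truth vectors `u_true` and `v_true` can then serve as the reference for scoring estimated loadings.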
The study protocols were approved by the Institutional Review Boards of all participating centers (Northwestern Polytechnical University, Indiana University and ADNI; a complete list of ADNI sites is available at http://www.adni-info.org/). Written informed consent was obtained from all participants or authorized representatives. All the analyses were performed on the de-identified ADNI data, and were determined by the Indiana University Human Subjects Office as IU IRB Review Not Required. The real neuroimaging genetics data set was collected from 743 participants, whose characteristics are presented in Table 2. The genotyping data contained 163 candidate SNP biomarkers from AD-risk genes, e.g., APOE. The structural MRI scans were processed with voxel-based morphometry (VBM) in SPM851,52. Briefly, scans were aligned to a T1-weighted template image and segmented into gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) maps. They were then normalized to MNI space and smoothed with an 8 mm FWHM kernel. We subsampled the whole brain and generated 465 voxels spanning the whole-brain ROIs. Regression was employed to remove the effects of baseline age, gender, education, and handedness from these VBM measures. The aim of this study is to evaluate the correlation between the SNPs and the VBM measures, and further to identify which SNPs and ROIs are associated.

Table 4.  Performance comparison on synthetic data sets. The AUC (area under the curve) values (mean ± std) of estimated canonical loadings u and v.
Loading | Data | L1-SCCA | L1-S2CCA | L1-NSCCA | γ-norm | Geman | SCAD | Laplace | MCP | ETP | Log
u | Data1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.80 ± 0.45 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00
u | Data2 | 0.75 ± 0.00 | 0.38 ± 0.00 | 0.30 ± 0.41 | 0.75 ± 0.00 | 0.75 ± 0.00 | 0.75 ± 0.00 | 0.75 ± 0.00 | 0.75 ± 0.00 | 0.75 ± 0.00 | 0.75 ± 0.00
u | Data3 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.80 ± 0.45 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00
u | Data4 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.40 ± 0.55 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00
v | Data1 | 1.00 ± 0.00 | 0.75 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00
v | Data2 | 0.74 ± 0.10 | 0.75 ± 0.00 | 0.65 ± 0.15 | 0.76 ± 0.04 | 0.74 ± 0.01 | 0.74 ± 0.02 | 0.75 ± 0.02 | 0.76 ± 0.04 | 0.76 ± 0.04 | 0.75 ± 0.02
v | Data3 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00
v | Data4 | 1.00 ± 0.00 | 0.75 ± 0.00 | 0.80 ± 0.27 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00

Table 5.  Training and testing correlation coefficients (mean ± std) of 5-fold cross-validation on synthetic data sets. The best values are shown in boldface.
Set | Data | L1-SCCA | L1-S2CCA | L1-NSCCA | γ-norm | Geman | SCAD | Laplace | MCP | ETP | Log
Training | data1 | 0.65 ± 0.03 | 0.51 ± 0.25 | 0.62 ± 0.04 | 0.62 ± 0.04 | 0.62 ± 0.04 | 0.62 ± 0.04 | 0.62 ± 0.04 | 0.62 ± 0.04 | 0.62 ± 0.04 | 0.66 ± 0.03
Training | data2 | 0.83 ± 0.03 | 0.67 ± 0.30 | 0.80 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.01
Training | data3 | 0.65 ± 0.05 | 0.63 ± 0.28 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.75 ± 0.01 | 0.76 ± 0.01
Training | data4 | 0.66 ± 0.04 | 0.32 ± 0.15 | 0.65 ± 0.02 | 0.65 ± 0.02 | 0.65 ± 0.02 | 0.65 ± 0.03 | 0.65 ± 0.02 | 0.65 ± 0.02 | 0.65 ± 0.02 | 0.68 ± 0.03
Testing | data1 | 0.59 ± 0.14 | 0.55 ± 0.23 | 0.61 ± 0.17 | 0.61 ± 0.17 | 0.62 ± 0.17 | 0.61 ± 0.17 | 0.61 ± 0.17 | 0.61 ± 0.17 | 0.61 ± 0.17 | 0.62 ± 0.14
Testing | data2 | 0.82 ± 0.05 | 0.68 ± 0.28 | 0.80 ± 0.04 | 0.84 ± 0.02 | 0.83 ± 0.02 | 0.84 ± 0.02 | 0.83 ± 0.02 | 0.84 ± 0.02 | 0.84 ± 0.02 | 0.83 ± 0.03
Testing | data3 | 0.59 ± 0.25 | 0.53 ± 0.29 | 0.73 ± 0.13 | 0.73 ± 0.13 | 0.72 ± 0.13 | 0.73 ± 0.13 | 0.73 ± 0.13 | 0.73 ± 0.13 | 0.73 ± 0.13 | 0.73 ± 0.12
Testing | data4 | 0.62 ± 0.08 | 0.24 ± 0.20 | 0.65 ± 0.10 | 0.66 ± 0.10 | 0.66 ± 0.10 | 0.66 ± 0.10 | 0.66 ± 0.10 | 0.66 ± 0.10 | 0.66 ± 0.10 | 0.67 ± 0.08

Experimental Setup.  Benchmarks.
In this paper, we are mainly interested in whether these non-convex […] and the LQA based method32. The latter two were proposed for capturing group or network structure. However, they can easily be reformulated as ℓ1-norm constrained methods, e.g., by setting the parameters associated with the structure penalty to zero29. Therefore, to make the comparison fair and convincing, we choose them as benchmarks. With a slight abuse of notation, we use the penalty name to refer to a non-convex SCCA method, e.g., ETP for the ETP based SCCA method. We call the ℓ1-norm based methods L1-SCCA25, L1-S2CCA32, and L1-NSCCA29.

Parameter Tuning.  The non-convex SCCA methods have four parameters, λi (i = 1, 2) and αi (i = 1, 2), and one pivotal parameter γ. According to their equations, these non-convex penalties can approximate […] significantly different. Thus the tuning range of γ is not continuous. Besides, we set γ = 3.7 for the SCAD penalty, since Fan and Li38 suggested that this is a very reasonable choice. The tuning range for each penalty is detailed in Table 3. For λi and αi, we simply set them to 1 in this study.

Termination Criterion.
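Because the γ grid in Table 3 is discrete and all methods share one five-fold partition, the tuning loop is short. The sketch below encodes Table 3, draws a single reusable split, and picks γ by a user-supplied score. This section does not specify the selection criterion, so the criterion (for example, mean testing correlation) and the callable `cv_score` are assumptions for illustration. The helper names are hypothetical.

```python
import numpy as np

# Search ranges for gamma transcribed from Table 3 (SCAD is fixed at 3.7).
# lambda_i and alpha_i are not tuned: the text sets them to 1.
GAMMA_GRID = {
    "gamma-norm": [0.1, 0.2, 0.3],
    "scad": [3.7],
    "geman": [0.1, 0.01, 0.001],
    "laplace": [0.1, 0.01, 0.001],
    "mcp": [0.1, 0.01, 0.001],
    "etp": [10, 100, 1000],
    "log": [10, 100, 1000],
}

def five_fold_partition(n, seed=0):
    """One fixed random 5-fold split; reusing it gives every method the same folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), 5)

def select_gamma(name, cv_score):
    """Return the grid value of gamma with the highest cross-validated score.

    cv_score is a hypothetical callable mapping gamma -> score (higher is better).
    """
    return max(GAMMA_GRID[name], key=cv_score)
```

Fixing the seed once and passing the same folds to every method mirrors the "same partition of the five-fold cross-validation" requirement stated in the Termination Criterion paragraph.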
We use $\max_i |u_i^{t+1} - u_i^t| \le \varepsilon$ and $\max_j |v_j^{t+1} - v_j^t| \le \varepsilon$ as the termination condition for Algorithm 1, where ε is a user-defined error bound. In this study, we set $\varepsilon = 10^{-5}$ based on experiments. All methods use the same setup, i.e., the same partition of the five-fold cross-validation, running on the same platform.

Figure 3.  Canonical loadings estimated on real imaging genetics data. Each row corresponds to an SCCA method: (1) L1-SCCA, (2) L1-NSCCA, (3) L1-S2CCA, (4) γ-norm, and so forth. For each method, the estimated u is shown on the left panel, and v on the right one. In each individual heat map, the x-axis indicates the indices of elements in u or v (i.e., SNPs or ROIs); the y-axis indicates the indices of the cross-validation folds.

Figure 4.  Mapping the averaged canonical weights v estimated by every SCCA method onto the brain. The left and right panels each show five methods, where each row corresponds to an SCCA method. L1-SCCA identifies the most signals, followed by L1-NSCCA and L1-S2CCA. All the proposed methods identify a clean signal that helps further investigation.

Results on Synthetic Data.  Figure 2 shows the heat maps of canonical loadings estimated by all SCCA methods, where each row corresponds to an experimental method. The non-convex SCCA methods and L1-SCCA correctly identify signal positions identical to the ground truth across the four data sets. Besides the true signals, however, L1-SCCA introduces several undesired signals, which makes it inferior to our methods. By contrast, L1-NSCCA recovers only part of the ground truth, and L1-S2CCA performs unstably, as it fails on some folds. Moreover, we evaluate these methods using the AUC (area under the ROC curve) criterion in Table 4, where a higher value indicates better performance. The non-convex SCCA methods have the highest score in almost every case. L1-SCCA scores similarly to the proposed methods, but it pays the price of reduced prediction ability, as shown next. Table 5 presents the estimated correlation coefficients on both training and testing data, where the best values are shown in boldface. The proposed SCCA methods take turns achieving the best value, and the Log method wins most often. This demonstrates that the non-convex methods outperform the ℓ1-norm based SCCA methods in terms of prediction power. In summary, the proposed methods identify accurate and sparse canonical loading patterns and obtain high correlation coefficients simultaneously, while the ℓ1-norm based SCCA methods cannot.

Table 6.  Performance comparison on real data set. Training and testing correlation coefficients (mean ± std) of 5-fold cross-validation are shown. The best value is shown in boldface.
Method | Training | Testing | Training-Testing Gap
L1-SCCA | 0.27 ± 0.01 | 0.18 ± 0.04 | 0.09
L1-S2CCA | 0.29 ± 0.02 | 0.25 ± 0.10 | 0.04
L1-NSCCA | 0.27 ± 0.01 | 0.22 ± 0.07 | 0.05
γ-norm | 0.28 ± 0.02 | 0.26 ± 0.09 | 0.02
Geman | 0.27 ± 0.02 | 0.26 ± 0.10 | 0.01
SCAD | 0.29 ± 0.02 | 0.27 ± 0.09 | 0.02
Laplace | 0.27 ± 0.02 | 0.26 ± 0.10 | 0.01
MCP | 0.28 ± 0.02 | 0.26 ± 0.09 | 0.02
ETP | 0.28 ± 0.02 | 0.26 ± 0.09 | 0.02
Log | 0.33 ± 0.03 | 0.27 ± 0.11 | 0.06

Results on Real Neuroimaging Genetics Data.  In this real data study, the genotyping data is denoted by X, and the imaging data is denoted by Y.
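The AUC scores in Table 4 treat each estimated loading vector as a ranking of features against the ground-truth support. The text does not state the exact convention (for example, per-fold averaging). The minimal sketch below assumes that features are ranked by absolute weight and that ties count one half. The name `loading_auc` is hypothetical.

```python
def loading_auc(weights, truth):
    """AUC of ranking features by |weight| against a 0/1 ground-truth support.

    Over all (signal, non-signal) feature pairs, count how often the signal
    feature has the larger |weight|; ties count one half (Mann-Whitney form).
    """
    scores = [abs(w) for w in weights]
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("truth must contain both signal and non-signal features")
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(pos) * len(neg))
```

A loading vector that zeroes out a true signal feature is penalized only through ties and misorderings. This is one reason a sparse but incomplete estimate can still score well below 1.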
Here u is the vector of weights for all SNPs, and v is the vector of weights for all imaging markers. The canonical correlation coefficient is defined as the Pearson correlation coefficient between Xu and Yv, i.e., $(Xu)^\top Yv / (\|Xu\|_2\,\|Yv\|_2)$.

Figure 3 presents the heat maps of the canonical loadings estimated from the training set. In this figure, each row shows the two weight vectors of an SCCA method, where a larger weight indicates greater importance. The weights associated with the SNPs are on the left panel, and those associated with the voxels are on the right. The proposed non-convex SCCA methods obtain very clean and sparse weights for both u and v. The largest signal on the genetic side is the APOE e4 SNP rs429358, which has previously been reported to be related to AD53. On the right panel, the largest signal for all SCCA methods comes from the hippocampus. The hippocampus is one of the most notable biomarkers of AD, since its atrophy has been shown to be related to brain atrophy and neuron loss measured with MRI in AD cohorts53. In addition, the L1-S2CCA and SCAD methods identify a weak signal from the parahippocampal gyrus, which has previously been reported as an early biomarker of AD54. On some folds, the Log method also identifies the lingual region, the parahippocampal gyrus, and the vermis. Interestingly, all three regions have been shown to be correlated with AD. They could therefore be considered as indicative biomarkers that can be observed prior to a dementia diagnosis. For example, Sjöbeck and Englund reported that molecular layer gliosis and atrophy in the vermis are clearly more severe in AD patients than in healthy controls55. This is meaningful, since the non-convex SCCA methods identify the correct clues for further investigation. On this account, neither L1-SCCA nor L1-NSCCA is a good choice, since they identify too many signals, which may misguide subsequent investigation.
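The canonical correlation definition above can be computed directly. The sketch below centers the canonical variates before applying the formula. This is an assumption: $(Xu)^\top Yv / (\|Xu\|_2\,\|Yv\|_2)$ equals the Pearson correlation only when Xu and Yv have zero mean, e.g., after column-centering X and Y. The name `canonical_correlation` is hypothetical.

```python
import numpy as np

def canonical_correlation(X, Y, u, v):
    """Pearson correlation between the canonical variates Xu and Yv."""
    a = X @ u
    b = Y @ v
    a = a - a.mean()   # a no-op when the columns of X are already centered
    b = b - b.mean()   # likewise for Y
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the value is invariant to rescaling u or v, it does not depend on how the re-scaling steps of the algorithm normalize the loadings.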
The figure shows that L1-S2CCA could be an alternative choice for sparse imaging genetics analysis, but it performs unstably across the five folds. Moreover, the non-convex methods are more consistent and stable than the ℓ1-SCCA methods. To show the results more clearly, we map the canonical weights (averaged across 5 folds) of the imaging measurements from each SCCA method onto the brain in Fig. 4. The figure confirms that L1-SCCA and L1-NSCCA identify many signals, i.e., their estimates are not sparse. L1-S2CCA identifies fewer signals than both L1-SCCA and L1-NSCCA, but more than all the non-convex SCCA methods. Each non-convex SCCA method highlights only a small region of the whole brain. This again reveals that the proposed methods yield better canonical weights, which reduces the effort of further investigation.

Besides, we report both training and testing correlation coefficients (mean and standard deviation) in Table 6. The training results of all methods are similar, with the Log method achieving the highest value of 0.33 ± 0.03. As for the testing results, which are our primary interest, all the non-convex SCCA methods obtain better values than the ℓ1-SCCA methods. […] The results on this real imaging genetics data reveal that the proposed SCCA methods can extract more accurate and sparser canonical weights for both genetic and imaging biomarkers. They also obtain higher correlation coefficients than the ℓ1-SCCA methods.

Conclusion

We have proposed a unified non-convex SCCA model and an efficient optimization algorithm using a family of non-convex penalty functions. These penalties are concave and piecewise continuous, and thus piecewise differentiable. We approximate these non-convex penalties by an ℓ2 function via the local quadratic approximation (LQA)38. Therefore, the proposed algorithm is effective and runs fast. We compare the non-convex methods with three state-of-the-art ℓ1-SCCA methods using both simulation data and real imaging genetics data. The simulation data have different ground truth structures. The results on the simulation data show that the non-convex SCCA methods identify cleaner and better canonical loadings than the three ℓ1-SCCA methods, i.e., L1-SCCA25, L1-S2CCA32, and L1-NSCCA29. These non-convex methods also recover higher correlation coefficients than the ℓ1-SCCA methods. This demonstrates that the ℓ1-SCCA methods have suboptimal prediction capability, as they may over-penalize large coefficients. The results on the real data show that the proposed methods discover a pair of meaningful genetic and brain imaging biomarkers, while the ℓ1-SCCA methods return too many irrelevant signals. The correlation coefficients show that the non-convex SCCA methods hold better testing values. This verifies our motivation that the non-convex penalty can improve the prediction ability, and thus has better generalization capability. The parameter γ plays a key role in these non-convex penalties. In future work, we will investigate how to choose a reasonable γ. We will also explore how to incorporate structure information into the model, as structure information extraction is an important task for brain imaging genetics as well as biology studies.

References

1. Hibar, D. P., Kohannim, O., Stein, J. L., Chiang, M.-C. & Thompson, P. M. Multilocus genetic analysis of brain images. Frontiers in Genetics 2, 73 (2011). 2. Hariri, A. R., Drabant, E. M. & Weinberger, D. R. Imaging genetics: perspectives from studies of genetically driven variation in serotonin function and corticolimbic affective processing. Biological psychiatry 59, 888–897 (2006). 3. Viding, E., Williamson, D. E. & Hariri, A. R.
Developmental imaging genetics: challenges and promises for translational research. Development and Psychopathology 18, 877–892 (2006).
4. Mattay, V. S., Goldberg, T. E., Sambataro, F. & Weinberger, D. R. Neurobiology of cognitive aging: insights from imaging genetics. Biological Psychology 79, 9–22 (2008).
5. Bigos, K. L. & Weinberger, D. R. Imaging genetics - days of future past. NeuroImage 53, 804–809 (2010).
6. Scharinger, C., Rabl, U., Sitte, H. H. & Pezawas, L. Imaging genetics of mood disorders. NeuroImage 53, 810–821 (2010).
7. Potkin, S. G. et al. Genome-wide strategies for discovering genetic influences on cognition and cognitive disorders: methodological considerations. Cognitive Neuropsychiatry 14, 391–418 (2009).
8. Kim, S. et al. Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel. PLoS One 8, e70269 (2013).
9. Shen, L. et al. Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: A study of the ADNI cohort. NeuroImage 53, 1051–1063 (2010).
10. Winkler, A. M. et al. Cortical thickness or grey matter volume? The importance of selecting the phenotype for imaging genetics studies. NeuroImage 53, 1135–1146 (2010).
11. Meda, S. A. et al. A large scale multivariate parallel ICA method reveals novel imaging–genetic relationships for Alzheimer’s disease in the ADNI cohort. NeuroImage 60, 1608–1621 (2012).
12. Nho, K. et al. Whole-exome sequencing and imaging genetics identify functional variants for rate of change in hippocampal volume in mild cognitive impairment. Molecular Psychiatry 18, 781 (2013).
13. Shen, L. et al. Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging and Behavior 8, 183–207 (2014).
14. Saykin, A. J. et al. Genetic studies of quantitative MCI and AD phenotypes in ADNI: Progress, opportunities, and plans. Alzheimer’s & Dementia 11, 792–814 (2015).
15. Wang, H. et al. Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort. Bioinformatics 28, 229–237 (2012).
16. Vounou, M., Nichols, T. E. & Montana, G. Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach. NeuroImage 53, 1147–1159 (2010).
17. Vounou, M. et al. Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer’s disease. NeuroImage 60, 700–716 (2012).
18. Zhu, X., Suk, H.-I., Huang, H. & Shen, D. Structured sparse low-rank regression model for brain-wide and genome-wide associations. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 344–352 (Springer, 2016).
19. Liu, J. et al. Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA. Human Brain Mapping 30, 241–255 (2009).
20. Geladi, P. & Kowalski, B. R. Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1–17 (1986).
21. Grellmann, C. et al. Comparison of variants of canonical correlation analysis and partial least squares for combined analysis of MRI and genetic data. NeuroImage 107, 289–310 (2015).
22. Hardoon, D., Szedmak, S. & Shawe-Taylor, J. Canonical correlation analysis: An overview with application to learning methods. Neural Computation 16, 2639–2664 (2004).
23. Hardoon, D. R. & Shawe-Taylor, J. Sparse canonical correlation analysis. Machine Learning 83, 331–353 (2011).
24. Hotelling, H. Relations between two sets of variates. Biometrika 28, 321–377 (1936).
25. Witten, D. M., Tibshirani, R. & Hastie, T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515–534 (2009).
26. Witten, D. M. & Tibshirani, R. J. Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology 8, 1–27 (2009).
27. Parkhomenko, E., Tritchler, D. & Beyene, J. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology 8, 1–34 (2009).
28. Chen, X., Liu, H. & Carbonell, J. G. Structured sparse canonical correlation analysis. In International Conference on Artificial Intelligence and Statistics, 199–207 (2012).
29. Chen, X. & Liu, H. An efficient optimization algorithm for structured sparse CCA, with applications to eQTL mapping. Statistics in Biosciences 4, 3–26 (2012).
30. Chen, J., Bushman, F. D. et al. Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics 14, 244–258 (2013).
31. Lin, D., Calhoun, V. D. & Wang, Y.-P. Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Medical Image Analysis 18, 891–902 (2014).
32. Du, L. et al. A novel structure-aware sparse learning algorithm for brain imaging genetics. In International Conference on Medical Image Computing and Computer Assisted Intervention, 329–336 (2014).
33. Yan, J. et al. Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm. Bioinformatics 30, i564–i571 (2014).
34. Du, L. et al. Structured sparse canonical correlation analysis for brain imaging genetics: An improved graphnet method. Bioinformatics 32, 1544–1551 (2016).
35. Du, L. et al. Sparse canonical correlation analysis via truncated ℓ1-norm with application to brain imaging genetics. In IEEE International Conference on Bioinformatics and Biomedicine, 707–711 (IEEE, 2016).
36. Du, L. et al. Identifying associations between brain imaging phenotypes and genetic factors via a novel structured SCCA approach. In International Conference on Information Processing in Medical Imaging, 543–555 (Springer, 2017).
37. Meinshausen, N. & Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Annals of Statistics 1436–1462 (2006).
38. Fan, J. & Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360 (2001).
39. Zou, H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429 (2006).
40. Shen, X., Pan, W. & Zhu, Y. Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association 107, 223–232 (2012).
41. Fung, G. & Mangasarian, O. Equivalence of minimal ℓ0- and ℓp-norm solutions of linear equalities, inequalities and linear programs for sufficiently small p. Journal of Optimization Theory and Applications 151, 1–10 (2011).
42. Frank, L. E. & Friedman, J. H. A statistical view of some chemometrics regression tools. Technometrics 35, 109–135 (1993).
43. Geman, D. & Yang, C. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing 4, 932–946 (1995).
44. Trzasko, J. & Manduca, A. Highly undersampled magnetic resonance image reconstruction via homotopic ℓ0-minimization. IEEE Transactions on Medical Imaging 28, 106–121 (2009).
45. Zhang, C. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38, 894–942 (2010).
46. Gao, C., Wang, N., Yu, Q. & Zhang, Z. A feasible nonconvex relaxation approach to feature selection. In AAAI, 356–361 (2011).
47. Friedman, J. H. Fast sparse regression and classification. International Journal of Forecasting 28, 722–738 (2012).
48. Gorski, J., Pfeuffer, F. & Klamroth, K. Biconvex sets and optimization with biconvex functions: a survey and extensions. Mathematical Methods of Operations Research 66, 373–407 (2007).
49. Lu, C., Tang, J., Yan, S. & Lin, Z. Generalized nonconvex nonsmooth low-rank minimization. In IEEE Conference on Computer Vision and Pattern Recognition, 4130–4137 (2014).
50. Hunter, D. R. & Li, R. Variable selection using MM algorithms. Annals of Statistics 33, 1617 (2005).
51. Ashburner, J. & Friston, K. J. Voxel-based morphometry–the methods. NeuroImage 11, 805–821 (2000).
52. Risacher, S. L., Saykin, A. J. et al. Baseline MRI predictors of conversion from MCI to probable AD in the ADNI cohort. Current Alzheimer Research 6, 347–361 (2009).
53. Hampel, H. et al. Core candidate neurochemical and imaging biomarkers of Alzheimer’s disease. Alzheimer’s & Dementia 4, 38–48 (2008).
54. Echavarri, C. et al. Atrophy in the parahippocampal gyrus as an early biomarker of Alzheimer’s disease. Brain Structure and Function 215, 265–271 (2011).
55. Sjöbeck, M. & Englund, E. Alzheimer’s disease and the cerebellum: a morphologic study on neuronal and glial changes. Dementia and Geriatric Cognitive Disorders 12, 211–218 (2001).

Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F.
Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. L. Du was supported by the National Natural Science Foundation of China (61602384); the Natural Science Basic Research Plan in Shaanxi Province of China (2017JQ6001); the China Postdoctoral Science Foundation (2017M613202); and the Fundamental Research Funds for the Central Universities (3102016OQD0065) at Northwestern Polytechnical University. This work was also supported by the National Institutes of Health R01 EB022574, R01 LM011360, U01 AG024904, P30 AG10133, R01 AG19771, UL1 TR001108, R01 AG042437, R01 AG046171, R01 AG040770; the Department of Defense W81XWH-14-2-0151, W81XWH-13-1-0259, W81XWH-12-2-0012; and the National Collegiate Athletic Association 14132004 at Indiana University.

Author Contributions

L.D., L.G. and L.S. conceived and designed the research. L.D., K.L. and J.H. carried out the study analysis. X.Y., J.Y., S.L.R. and A.J.S. collected the data from the ADNI database. L.D., K.L., L.S. and A.J.S. analyzed the results and wrote the paper.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report.

Additional Information

Competing Interests: The authors declare that they have no competing interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2017

Consortia

Alzheimer’s Disease Neuroimaging Initiative

Michael W. Weiner3, Paul Aisen4, Ronald Petersen5, Clifford R. Jack5, William Jagust6, John Q. Trojanowski7, Arthur W. Toga4, Laurel Beckett8, Robert C. Green9, John Morris10, Leslie M.
Shaw7, Zaven Khachaturian11, Greg Sorensen12, Maria Carrillo13, Lew Kuller14, Marc Raichle10, Steven Paul15, Peter Davies16, Howard Fillit17, Franz Hefti18, David Holtzman10, M. Marcel Mesulam19, William Potter20, Peter Snyder21, Adam Schwartz22, Tom Montine23, Ronald G. Thomas24, Michael Donohue24, Sarah Walter24, Devon Gessert24, Tamie Sather24, Gus Jiminez24, Archana B. Balasubramanian24, Jennifer Mason24, Iris Sim24, Danielle Harvey8, Matthew Bernstein5, Nick Fox25, Paul Thompson26, Norbert Schuff3, Charles DeCarli8, Bret Borowski5, Jeff Gunter5, Matt Senjem5, Prashanthi Vemuri5, David Jones5, Kejal Kantarci5, Chad Ward5, Robert A. Koeppe27, Norm Foster28, Eric M. Reiman29, Kewei Chen29, Chet Mathis14, Susan Landau6, Nigel J. Cairns10, Erin Franklin10, Lisa Taylor-Reinwald10, Virginia Lee7, Magdalena Korecka7, Michal Figurski7, Karen Crawford4, Scott Neu4, Tatiana M. Foroud2, Steven Potkin30, Kelley Faber2, Sungeun Kim2, Kwangsik Nho2, Leon Thal24, Neil Buckholtz31, Marilyn Albert32, Richard Frank33, John Hsiao31, Jeffrey Kaye34, Joseph Quinn34, Lisa Silbert34, Betty Lind34, Raina Carter34, Sara Dolen34, Lon S. Schneider4, Sonia Pawluczyk4, Mauricio Beccera4, Liberty Teodoro4, Bryan M. Spann4, James Brewer24, Helen Vanderswag24, Adam Fleisher24, Judith L. Heidebrink27, Joanne L. Lord27, Sara S. Mason5, Colleen S. Albers5, David Knopman5, Kris Johnson5, Rachelle S. Doody35, Javier Villanueva-Meyer35, Valory Pavlik35, Victoria Shibley35, Munir Chowdhury35, Susan Rountree35, Mimi Dang35, Yaakov Stern36, Lawrence S. Honig36, Karen L. Bell36, Beau Ances10, Maria Carroll10, Mary L. Creech10, Erin Franklin10, Mark A. Mintun10, Stacy Schneider10, Angela Oliver10, Daniel Marson37, David Geldmacher37, Marissa Natelson Love37, Randall Griffith37, David Clark37, John Brockington37, Erik Roberson37, Hillel Grossman38, Effie Mitsis38, Raj C. Shah39, Leyla deToledo-Morrell39, Ranjan Duara40, Maria T.
Greig-Custo40, Warren Barker40, Chiadi Onyike32, Daniel D’Agostino32, Stephanie Kielb32, Martin Sadowski41, Mohammed O. Sheikh41, Anaztasia Ulysse41, Mrunalini Gaikwad41, P. Murali Doraiswamy42, Jeffrey R. Petrella42, Salvador Borges-Neto42, Terence Z. Wong42, Edward Coleman42, Steven E. Arnold7, Jason H. Karlawish7, David A. Wolk7, Christopher M. Clark7, Charles D. Smith43, Greg Jicha43, Peter Hardy43, Partha Sinha43, Elizabeth Oates43, Gary Conrad43, Oscar L. Lopez14, Mary Ann Oakley14, Donna M. Simpson14, Anton P. Porsteinsson44, Bonnie S. Goldstein44, Kim Martin44, Kelly M. Makino44, M. Saleem Ismail44, Connie Brand44, Adrian Preda30, Dana Nguyen30, Kyle Womack45, Dana Mathews45, Mary Quiceno45, Allan I. Levey46, James J. Lah46, Janet S. Cellar46, Jeffrey M. Burns47, Russell H. Swerdlow47, William M. Brooks47, Liana Apostolova26, Kathleen Tingus26, Ellen Woo26, Daniel H. S. Silverman26, Po H. Lu26, George Bartzokis26, Neill R. Graff-Radford48, Francine Parfitt48, Kim Poki-Walker48, Martin R. Farlow2, Ann Marie Hake2, Brandy R. Matthews2, Jared R. Brosch2, Scott Herring2, Christopher H. van Dyck49, Richard E. Carson49, Martha G. MacAvoy49, Pradeep Varma49, Howard Chertkow50, Howard Bergman50, Chris Hosein50, Sandra Black51, Bojana Stefanovic51, Curtis Caldwell51, Ging-Yuek Robin Hsiung52, Benita Mudge52, Vesna Sossi52, Howard Feldman52, Michele Assaly52, Elizabeth Finger53, Stephen Pasternack53, Irina Rachisky53, John Rogers53, Dick Trost53, Andrew Kertesz53, Charles Bernick54, Donna Munic54, Emily Rogalski19, Kristine Lipowski19, Sandra Weintraub19, Borna Bonakdarpour19, Diana Kerwin19, Chuang-Kuo Wu19, Nancy Johnson19, Carl Sadowsky55, Teresa Villena55, Raymond Scott Turner56, Kathleen Johnson56, Brigid Reynolds56, Reisa A. Sperling9, Keith A. Johnson9, Gad Marshall9, Jerome Yesavage57, Joy L. Taylor57, Barton Lane57, Allyson Rosen57, Jared Tinklenberg57, Marwan N. Sabbagh58, Christine M. Belden58, Sandra A. Jacobson58, Sherye A.
Sirrel58, Neil Kowall59, Ronald Killiany59, Andrew E. Budson59, Alexander Norbash59, Patricia Lynn Johnson59, Thomas O. Obisesan60, Saba Wolday60, Joanne Allard60, Alan Lerner61, Paula Ogrocki61, Curtis Tatsuoka61, Parianne Fatica61, Evan Fletcher8, Pauline Maillard8, John Olichney8, Charles DeCarli8, Owen Carmichael8, Smita Kittur62, Michael Borrie63, T.-Y. Lee63, Rob Bartha63, Sterling Johnson64, Sanjay Asthana64, Cynthia M. Carlsson64, Pierre Tariot29, Anna Burke29, Ann Marie Milliken29, Nadira Trncic29, Adam Fleisher29, Stephanie Reeder29, Vernice Bates65, Horacio Capote65, Michelle Rainka65, Douglas W. Scharre66, Maria Kataki66, Brendan Kelly66, Earl A. Zimmerman67, Dzintra Celmins67, Alice D. Brown67, Godfrey D. Pearlson68, Karen Blank68, Karen Anderson68, Laura A. Flashman69, Marc Seltzer69, Mary L. Hynes69, Robert B. Santulli69, Kaycee M. Sink70, Leslie Gordineer70, Jeff D. Williamson70, Pradeep Garg70, Franklin Watkins70, Brian R. Ott71, Geoffrey Tremont71, Lori A. Daiello71, Stephen Salloway72, Paul Malloy72, Stephen Correia72, Howard J. Rosen3, Bruce L. Miller3, David Perry3, Jacobo Mintzer73, Kenneth Spicer73, David Bachman73, Nunzio Pomara74, Raymundo Hernando74, Antero Sarrael74, Susan K. Schultz75, Karen Ekstam Smith75, Hristina Koleva75, Ki Won Nam75, Hyungsub Shim75, Norman Relkin15, Gloria Chiang15, Michael Lin15, Lisa Ravdin15, Amanda Smith76, Balebail Ashok Raj76 & Kristin Fargher76

3University of California, San Francisco, USA. 4University of Southern California, Los Angeles, USA. 5Mayo Clinic, Rochester, Minnesota, USA. 6University of California, Berkeley, Berkeley, USA. 7University of Pennsylvania, Philadelphia, USA. 8University of California, Davis, Davis, USA. 9Brigham and Women’s Hospital/Harvard Medical School, Boston, USA. 10Washington University St. Louis, St. Louis, USA. 11Prevent Alzheimer’s Disease, 2020, Rockville, USA.
12Siemens, Munich, Germany. 13Alzheimer’s Association, Illinois, USA. 14University of Pittsburgh, Pennsylvania, USA. 15Cornell University, New York, USA. 16Albert Einstein College of Medicine of Yeshiva University, New York, USA. 17AD Drug Discovery Foundation, New York, USA. 18Acumen Pharmaceuticals, California, USA. 19Northwestern University, Illinois, USA. 20National Institute of Mental Health, Maryland, USA. 21Brown University, Rhode Island, USA. 22Eli Lilly, Indiana, USA. 23University of Washington, Washington, USA. 24University of California, San Diego, California, USA. 25University of London, London, UK. 26University of California, Los Angeles, California, USA. 27University of Michigan, Michigan, USA. 28University of Utah, Utah, USA. 29Banner Alzheimer’s Institute, Arizona, USA. 30University of California, Irvine, California, USA. 31National Institute on Aging, Maryland, USA. 32Johns Hopkins University, Maryland, USA. 33Richard Frank Consulting, New Hampshire, USA. 34Oregon Health and Science University, Oregon, USA. 35Baylor College of Medicine, Texas, USA. 36Columbia University Medical Center, New York, USA. 37University of Alabama-Birmingham, Alabama, USA. 38Mount Sinai School of Medicine, New York, USA. 39Rush University Medical Center, Rush University, Illinois, USA. 40Wien Center, Florida, USA. 41New York University, New York, USA. 42Duke University Medical Center, North Carolina, USA. 43University of Kentucky, Kentucky, USA. 44University of Rochester Medical Center, New York, USA. 45University of Texas Southwestern Medical School, Texas, USA. 46Emory University, Georgia, USA. 47University of Kansas, Medical Center, Kansas, USA. 48Mayo Clinic, Jacksonville, Florida, USA. 49Yale University School of Medicine, Connecticut, USA. 50McGill University, Montreal Jewish General Hospital, Quebec, Canada. 51Sunnybrook Health Sciences, Ontario, Canada. 52U.B.C. Clinic for AD & Related Disorders, British Columbia, Canada. 53Cognitive Neurology-St.
Joseph’s, Ontario, Canada. 54Cleveland Clinic Lou Ruvo Center for Brain Health, Ohio, USA. 55Premiere Research Inst (Palm Beach Neurology), Florida, USA. 56Georgetown University Medical Center, Washington, D.C., USA. 57Stanford University, California, USA. 58Banner Sun Health Research Institute, Arizona, USA. 59Boston University, Massachusetts, USA. 60Howard University, Washington, D.C., USA. 61Case Western Reserve University, Ohio, USA. 62Neurological Care of CNY, New York, USA. 63Parkwood Hospital, Pennsylvania, USA. 64University of Wisconsin, Wisconsin, USA. 65Dent Neurologic Institute, New York, USA. 66Ohio State University, Ohio, USA. 67Albany Medical College, New York, USA. 68Hartford Hospital, Olin Neuropsychiatry Research Center, Connecticut, USA. 69Dartmouth-Hitchcock Medical Center, New Hampshire, USA. 70Wake Forest University Health Sciences, North Carolina, USA. 71Rhode Island Hospital, Rhode Island, USA. 72Butler Hospital, Rhode Island, USA. 73Medical University of South Carolina, South Carolina, USA. 74Nathan Kline Institute, New York, USA. 75University of Iowa College of Medicine, Iowa, USA. 76USF Health Byrd Alzheimer’s Institute, University of South Florida, Florida, USA.

Scientific Reports | 7: 14052 | DOI:10.1038/s41598-017-13930-y