Risk homozygous haplotype regions for autism identifies population-specific ten genes for numerous pathways

Recessive homozygous haplotype (rHH) mapping is a reliable tool for identifying recessive genes by detecting homozygous segments of identical haplotype structure that are shared at a higher frequency among probands than among parental controls. Finding such rHH blocks in autism subjects can help in deciphering the etiology of the disorder. The study aims to detect rHH segments of identical haplotype structure shared at a higher frequency in autism subjects than in controls, in order to identify recessive genes responsible for autism manifestation. In the present study, genotype data for 426 unrelated autism probands and 232 parents (116 trios) were obtained from the Gene Expression Omnibus (GEO) database. Homozygosity mapping analyses were performed on the samples with standardized algorithms, using the Affymetrix GeneChip® 500K SNP Nsp and Sty mapping array datasets. A total of 38 homozygous haplotype blocks were revealed across the sample datasets. Upon downstream analysis, 10 autism genes were identified based on the selected autism candidate gene criteria. Further, expression Quantitative Trait Loci (eQTL) analysis of SNPs revealed various binding sites for the regulatory proteins CBX3, FOS, BACH1, MYC, JUND, MAFK, POU2F2, RBBP5, RUNX3, and SMARCA4, impairing the essential autism genes CEP290, KITLG, CHD8, and INS2. Pathways and processes vital to autism manifestation, such as adherens junction, dipeptidase activity, and platelet-derived growth factor, were identified with varied clustered protein-protein interactions. These findings reveal various population clusters with significant rHH genes and suggest the existence of common but population-specific risk alleles in related autism subjects.


Background
Autism is a heritable, heterogeneous neurodevelopmental condition that affects information processing in the brain; fewer than 15% of cases have a known genetic cause. It has a worldwide prevalence of 1 in 59 children [1]. It alters the connections and organization of nerve cells and their synapses, impairing the overall cognition and the emotional, social, and physical health of affected individuals [2].
Recessive risk gene loci are studied through extended runs of homozygosity (ROH), a genomic feature useful for mapping recessive disease genes in outbred populations [3,4].
In complex disorders, an unusually high number of affected individuals are expected to carry the identical haplotype in the region surrounding a disease variant [5,6]. Therefore, a rare pathogenic variant and its surrounding haplotype are often enriched in frequency in a group of affected individuals compared to a cohort of unaffected controls [7]. Consequently, homozygous haplotypes (HH) shared by multiple affected individuals can point to recessive genes in diseases such as autism. The recessive risk gene loci approach has proven helpful in understanding autism genetics and behavioral severity.
Gene mapping of rare recessive conditions in a large outbred population has been a herculean task due to the lack of multiple affected individuals in families. Homozygosity mapping is an efficient gene mapping method applicable to rare recessive disorders, since small chromosomal regions tend to be transmitted whole. Affected individuals will also carry identical-by-descent alleles at markers located near the disease locus and will thus be homozygous at these markers [8,9]. The basic idea is therefore to look for regions of homozygosity shared among different affected individuals. This study seeks to identify disease-causing mutations by pursuing a hypothesis-free, genome-wide search for homozygosity blocks through an efficient homozygosity scan, using single nucleotide polymorphism (SNP) chip-based genotyping platforms: the Affymetrix GeneChip® 500K SNP Nsp and Sty mapping arrays. Given this, the present study aims to detect homozygous segments of identical haplotype structure shared at a higher frequency in autism subjects, in order to distill out the recessive genes essential to autism manifestation.
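The basic idea of homozygosity mapping can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the genotype coding ("AA", "AB", "BB"), the function names, and the `min_snps` parameter are assumptions made for illustration, and any call other than "AA" or "BB" is simply treated as breaking a run.

```python
from typing import List, Sequence, Tuple


def runs_of_homozygosity(genotypes: Sequence[str], min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) marker indices of homozygous runs.

    genotypes: per-SNP calls in chromosomal order ("AA", "AB", "BB").
    Anything other than "AA"/"BB" ends the current run (assumption).
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, call in enumerate(genotypes):
        homozygous = call in ("AA", "BB")
        if homozygous and start is None:
            start = i  # a new run begins here
        elif not homozygous and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # close a run that extends to the last marker
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


def shared_homozygous_markers(cases: Sequence[Sequence[str]], min_snps: int = 3) -> List[int]:
    """Markers lying inside a run in every case AND carrying the same call in all cases."""
    shared = None
    for genotypes in cases:
        covered = set()
        for start, end in runs_of_homozygosity(genotypes, min_snps):
            covered.update(range(start, end + 1))
        shared = covered if shared is None else shared & covered
    shared = shared or set()
    # identical homozygous call across cases approximates an identical haplotype
    return sorted(i for i in shared if len({g[i] for g in cases}) == 1)


case1 = ["AA", "BB", "AA", "AB", "AA", "AA", "AA", "AA", "AB", "BB"]
case2 = ["AA", "AA", "AB", "AA", "AA", "AA", "BB", "AA", "AA", "AA"]
print(runs_of_homozygosity(case1))               # [(0, 2), (4, 7)]
print(shared_homozygous_markers([case1, case2])) # [4, 5, 7]
```

Requiring the same homozygous call in all cases is a crude stand-in for "identical haplotype structure"; a real analysis would also account for genotyping error and marker spacing.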

Methods
The study included 426 unrelated probands analyzed along with 232 parents (116 trios), obtained from the international public repository Gene Expression Omnibus (GEO) database (accession GSE9222). The unrelated autism samples were collected at the Hospital for Sick Children, Toronto, by the primary investigators. Chromosomal abnormalities had been identified in the datasets; however, their complete etiology was unknown. The autism index cases were genotyped from blood-derived DNA on the Affymetrix GeneChip® 500K single nucleotide polymorphism (SNP) mapping array (Nsp/Sty arrays) [10]. Out of 426 probands and 232 parents, 334 probands and 122 parents (61 trios) were selected based on the type of microarray used for genome variation profiling and SNP genotyping. This selection of 334 probands and 122 parents was based on initial screening using the Autism Diagnostic Observation Schedule [11] and Autism Diagnostic Interview-Revised [12] criteria on a clinical best estimate. The detailed case history of each case was used to select cases and controls with classical autism and without comorbidities such as intellectual disability, epilepsy, and mild autism, to avoid noise in the datasets. Information on all comorbidities was considered as reported in the primary investigation. The controls were the parents, wherever available.
If parental samples were not available, the Yoruba population was used as the control. All samples were screened for Fragile X syndrome using karyotyping techniques, and positive samples were excluded. Based on the selection criteria, the samples genotyped using Nsp and Sty arrays were selected and analyzed by homozygosity mapping (Table 1). The study aims to identify risk homozygous regions and the encompassed recessive genes for autism manifestation.
Accordingly, all subject samples in autism trios were analyzed against the parental samples as controls; where parental samples were unavailable, the HapMap Yoruba dataset was used as the control for the autism proband cases. The samples were divided into categories for ease of understanding. Category A comprises trios with matched parental samples, with subcategory A1: 61 trios with 122 controls on the Affymetrix Mapping 250K Nsp SNP Array, and A2: 61 trios with 122 controls on the Affymetrix Mapping 250K Sty SNP Array. Category B contains individual autism subjects, with subcategory B1: 273 autism subjects with HapMap Yoruba controls on the Affymetrix Mapping 250K Nsp SNP Array, and B2: 273 autism cases with HapMap Yoruba controls on the Affymetrix Mapping 250K Sty SNP Array. The subcategories were divided based on the genotyping arrays used for analysis.
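The cohort partition can be written down as a small data structure to check that the subcategory counts add up to the 334 probands and 122 parents selected for analysis. The sketch below is only a bookkeeping aid; the class and field names are hypothetical, while the counts are those stated above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Subcategory:
    name: str
    array: str           # "Nsp" or "Sty" (Affymetrix Mapping 250K arrays)
    probands: int
    parents: int         # 0 where no parental samples were available
    control_source: str


SUBCATEGORIES: List[Subcategory] = [
    Subcategory("A1", "Nsp", 61, 122, "parents"),
    Subcategory("A2", "Sty", 61, 122, "parents"),
    Subcategory("B1", "Nsp", 273, 0, "HapMap Yoruba"),
    Subcategory("B2", "Sty", 273, 0, "HapMap Yoruba"),
]


def totals_for_array(array: str) -> Tuple[int, int]:
    """Return (probands, parents) summed over the subcategories run on one array."""
    rows = [s for s in SUBCATEGORIES if s.array == array]
    return sum(s.probands for s in rows), sum(s.parents for s in rows)


print(totals_for_array("Nsp"))  # (334, 122)
print(totals_for_array("Sty"))  # (334, 122)
```

Each array therefore covers the same 334 probands: 61 from trios and 273 analyzed against HapMap Yoruba controls.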
Whole-genome genotyping was performed on the autism subjects and controls using Genotyping Console to generate output CEL Files using select probe set summary. The genotyping console helped integrate SNP genotyping, generate genotyping calls, loss of heterozygosity (LOH) data, and quality control metrics for sample data.
Output CEL file datasets generated by the Genotyping Console were used to analyze the homozygous regions through Homozygosity Mapper (www.homozygositymapper.org) [13,14]. Homozygosity Mapper is a web-based approach to homozygosity mapping to analyze and detect homozygous stretches. It provides an intuitive graphical interface to visualize the results. The homozygous regions in affected subjects were visualized as peaks chromosome-wise with underlying genotypes. Homozygosity scores were plotted against the physical position with a threshold of ≥ 4000. The length of the homozygous block (in SNPs) at each marker for each sample was calculated.
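The scoring step can be sketched as follows. This is a simplified illustration under stated assumptions, not Homozygosity Mapper's actual implementation: it assumes the score at a marker is the sum, over samples, of the length of the homozygous block containing that marker, with block lengths capped at a maximum. The function names and the "AA"/"AB"/"BB" coding are hypothetical.

```python
from typing import List, Sequence


def block_lengths(genotypes: Sequence[str]) -> List[int]:
    """Per marker: length in SNPs of the homozygous block containing it (0 if heterozygous)."""
    n = len(genotypes)
    lengths = [0] * n
    i = 0
    while i < n:
        if genotypes[i] in ("AA", "BB"):
            j = i
            while j < n and genotypes[j] in ("AA", "BB"):
                j += 1
            for k in range(i, j):   # every marker in the block gets the block length
                lengths[k] = j - i
            i = j
        else:
            i += 1
    return lengths


def homozygosity_scores(samples: Sequence[Sequence[str]], max_block_length: int = 1000) -> List[int]:
    """Sum capped block lengths across samples at each marker (assumed scoring rule)."""
    per_sample = [block_lengths(g) for g in samples]
    return [sum(min(length, max_block_length) for length in column)
            for column in zip(*per_sample)]


def markers_at_or_above(scores: Sequence[int], threshold: int) -> List[int]:
    """Indices of markers whose score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


samples = [["AA", "AA", "AB", "BB"], ["AA", "BB", "AA", "AA"]]
scores = homozygosity_scores(samples)
print(scores)                          # [6, 6, 4, 5]
print(markers_at_or_above(scores, 5))  # [0, 1, 3]
```

On real array data the same thresholding would be applied with the score cut-off of 4000 given above; the toy threshold of 5 is used only because the example has four markers.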
The Golden Helix GenomeBrowse® visualization tool (Version 2.x) was used to visualize and browse the entire genome with annotated data, including gene prediction and structure, protein product, and gene variation for SNP visualization [17]. In parallel, expression Quantitative Trait Loci (eQTL) analysis was performed. eQTLs are genomic regions with DNA sequence variants that influence the expression level of one or more genes. The analysis was performed on the variants of the homozygous recessive genes, identified by reference SNP identifiers (rsIDs), using RegulomeDB (https://regulomedb.org/regulome-search/). The chromosomal regions, bound proteins, affected genes, and enhancers were identified. rsIDs with a RegulomeDB score of 1 or 2 were selected [18]. RegulomeDB was used to identify known functional variants and to characterize their function. It interprets genomic regulatory variants using computational predictions and manual annotations.
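The selection of rsIDs by RegulomeDB score can be sketched as a simple filter. The example assumes the RegulomeDB results have already been exported into records with `rsid` and `rank` fields; these field names, the function name, and the example records are hypothetical placeholders, not data from this study. Ranks are assumed to be strings such as "1a", "2b", or "5", so that scores 1 and 2 correspond to ranks beginning with "1" or "2".

```python
from typing import Dict, List


def select_score_1_or_2(variants: List[Dict[str, str]]) -> List[str]:
    """Keep rsIDs whose RegulomeDB rank begins with 1 or 2 (e.g. '1a', '1f', '2b')."""
    return [v["rsid"] for v in variants if v["rank"][:1] in ("1", "2")]


# Placeholder records for illustration only (not real annotations).
example_variants = [
    {"rsid": "rs_example_1", "rank": "1f"},
    {"rsid": "rs_example_2", "rank": "2b"},
    {"rsid": "rs_example_3", "rank": "4"},
    {"rsid": "rs_example_4", "rank": "6"},
]

print(select_score_1_or_2(example_variants))  # ['rs_example_1', 'rs_example_2']
```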
Genes identified from the homozygous blocks of the cases were assessed against the following selection criteria for an autism gene. A gene should be a novel autism candidate gene expressed in the brain; participate in neuronal development; interact with known autism genes; be non-homozygous in controls; be de novo in origin; overlap in two or more unrelated samples; be recurrent in two or more unrelated samples; be expressed during brain development; and participate in neuronal migration, axon growth, neurite outgrowth, synaptic plasticity, and cell adhesion.
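Two of these criteria, recurrence in two or more unrelated samples and absence of homozygosity in controls, are positional and can be sketched in code. The example below is a minimal illustration under assumptions: homozygous blocks and gene regions are represented as (chromosome, start, end) tuples, and the function names, gene names, and coordinates are hypothetical. The remaining criteria depend on functional annotation and are not modeled here.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def recurrent_case_only_genes(
    gene_regions: Dict[str, Interval],
    case_blocks: Dict[str, List[Interval]],
    control_blocks: Dict[str, List[Interval]],
    min_cases: int = 2,
) -> List[str]:
    """Genes covered by homozygous blocks in >= min_cases cases and in no control."""
    selected = []
    for gene, region in gene_regions.items():
        n_cases = sum(
            any(overlaps(region, block) for block in blocks)
            for blocks in case_blocks.values()
        )
        in_controls = any(
            overlaps(region, block)
            for blocks in control_blocks.values()
            for block in blocks
        )
        if n_cases >= min_cases and not in_controls:
            selected.append(gene)
    return selected


# Hypothetical genes and blocks for illustration only.
genes = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr1", 500, 600), "GENE_C": ("chr2", 100, 200)}
cases = {
    "case1": [("chr1", 50, 250), ("chr1", 450, 650)],
    "case2": [("chr1", 150, 300), ("chr1", 550, 700)],
    "case3": [("chr2", 100, 150)],
}
controls = {"ctrl1": [("chr1", 580, 590)]}

print(recurrent_case_only_genes(genes, cases, controls))  # ['GENE_A']
```

Here GENE_B is dropped because a control is homozygous across it, and GENE_C because it is covered in only one case.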
Physical interactors of the identified affected genes were predicted through the web-based application GeneMANIA (https://genemania.org/) [19]. Further, pathways were constructed with the enriched candidate genes and associated genes related to autism using Ingenuity® Systems IPA software (www.ingenuity.com) [20]. The genes and chemicals search was used to explore information on protein families, protein signaling, metabolic pathways, and the regular cellular activity of each protein. Genes and their protein products are shown based on their location. This analysis was performed using methods adapted from Veerappa and colleagues, 2014 [21].

Results
The study group comprises both males and females of average intellectual level, with no significant emotional or behavioral problems. The confidence interval of the category A data is considerably higher than that of the category B datasets. For the category B datasets, 47.36% of the homozygous regions were significant exclusively in autism subjects rather than in controls and were enriched with autism-specific genes.
An integrated approach of whole-genome genotyping and homozygosity mapping revealed 38 homozygous regions bearing 308 genes. After genome-wide downstream analysis and application of the autism gene selection criteria, the homozygosity block analysis showed 16 candidate autism recessive genes in cases, partaking in neuronal development. Stringent filtration of risk homozygous haplotypes (rHH) with the homozygosity mapping-based pipeline in Homozygosity Mapper revealed 12, 11, 9, and 6 homozygous regions for the category A1, A2, B1, and B2 datasets, respectively (max. score range 4500-25000; max. block length = 1000). Applying the criteria for candidate gene selection revealed 6, 6, 3, and 3 homozygous regions with 5, 3, 3, and 2 autism genes for subcategories A1, A2, B1, and B2, respectively, on varied chromosomes (Table 1). eQTL analysis of the homozygous blocks in subcategory A1 revealed 4, 3, 2, and 3 polymorphisms deregulating CEP290, KITLG, and PTPRJ genes (Table 2). Several polymorphisms were identified in subcategory A2: rs7078127, rs11866251, and rs16945839, impacting the RERE, CHD8, and EMSY genes (Table 2). eQTL analysis in the homozygous blocks of subcategory B1 identified two polymorphisms, rs1681625 and rs12292520, deregulating the significant autism candidate genes PTPRJ-JAK2 and INS2, which are involved in brain development. Subcategory B2 revealed polymorphisms including rs17051043 and rs2675835, impairing the EP300 and MAPK3 genes, which are expressed in the brain from the developmental stage (Table 2). These polymorphisms had valid RegulomeDB scores and ranks, affecting chromatin state and motifs at varied levels.
Pathway network analysis revealed a significant overrepresentation of these genes in physiological pathways relevant to autism: arginine and proline metabolism, protein farnesylation, protein geranylation, protein prenylation, aminoacylase activity, hydrolase activity, urea cycle and metabolism of amino groups, and GTPase binding. Further, pathway analysis of the significant genes found in homozygous regions with eQTL analysis, using the Ingenuity Pathway Analysis pipeline, identified notable pathways and processes relevant to autism, viz., migration of Purkinje cells; morphology, formation, and hypoplasia of the brain; and development of the head and central nervous system. Several gene hits such as AUTS2, ADGRA2, CELSR1, FZD9, MBD1, RERE, and OPN1 were localized at various levels of these pathways (Fig. 3).

Discussion
Identification of susceptibility loci is crucial for better understanding the underlying mechanisms of autism, thus aiding the development of its treatment and management [22]. Due to the high heritability of autism, various common genetic risk variants have surfaced, yet there is a long way to go in finding rare variants that account for the heritability of autism [23]. The homozygous haplotype mapping approach is complementary to genome-wide association studies and Next Generation Sequencing in studying the complexity and heterogeneity of autism [6]. Homozygosity mapping analysis adds the missing pieces of the puzzle for complex disorders in terms of heritability and recessive gene burden [6]. The present study reports an rHH mapping analysis to identify candidate autism gene variants, particularly recessive gene loci involved in autism manifestation. The idea was to apply the concept of homozygosity mapping to the trios sample cohort and understand the role of haplotype blocks in unrelated subjects.
Studies have used genome-wide surveys of runs of homozygosity to investigate bone mineral density in Caucasian and Chinese populations [5]. Similarly, in the present study, an integrated approach of whole-genome genotyping and homozygosity mapping revealed 38 rHH regions with a much higher degree of haplotype sharing. Oversharing of a haplotype indicates a disease locus, an observation that forms the basis of the current study. The authors observed that these cases shared 18 homozygous blocks bearing 24 recessive genes related to autism. Previous studies have reported recessive autism genes such as ABHD14A, CADM2, CHRFAM7A, EPHA3, FGF10, GRIK2, GRM3, KCND2, and PDZK1 in the haplotype regions [6]. ABHD14A was also identified in the present study as a gene implicated in autism pathophysiology. eQTL analysis showed that polymorphisms in the homozygous blocks affected many downstream genes known for autism manifestation, such as CEP290, KITLG, PTPRJ-JAK2, RERE, EMSY, CHD8, INS2, EP300, and MAPK3, identified in subcategories A1, A2, B1, and B2. These nine genes have been exclusively identified in the current study. However, the CNTN4, CADPS2, SUMF1, SLC9A9, and NTRK3 genes have been identified and implicated in autism elsewhere [24]. These genes show affected binding of regulatory proteins, which deregulates autism genes and impairs downstream gene functionality and processes. Among the regulatory proteins, CBX3 [25] and BACH1 [26] function as repressors, while FOS, BACH1, MYC, JUND, MAFK, POU2F2, RBBP5, RUNX3, and SMARCA4 function as activators. These are involved in the activation of autism genes through various cellular functions, including chromatin organization, G protein-coupled receptor (GPCR) signaling, cell cycle regulation, homeostasis, signaling pathways, and cellular stress response, which play a vital role in the severity of autism symptoms.
Ubiquitination of the calcium signaling pathway in the endoplasmic reticulum across all body cells results in the deficits observed in fibroblasts of autism subjects through neuronal functional impairment [27]. Intracellular GPCRs are linked to synaptogenesis, memory and learning, behavior, and cognition, and thus have pathophysiological roles in autism [28]. Cellular endoplasmic reticulum stress can lead to apoptosis, resulting in autism [29].
The establishment of the autism gene risk loci enrichment pathway led to the identification of notable genes with a promising role in its pathogenesis. These include AUTS2, JMJD1C, PCGF5, PCGF3, CEP152, ABHD14A, CEACAM21, A4GALT, OPN1MW, CELSR1, RYK, ADGRA2, FZD3, FZD9, NF608, and RERE. These genes clustered at varied levels, affecting downstream processes. ABHD14A has been implicated as a novel gene in a previous relevant autism haplotype study [6]. In yet another study, a family diagnosed with intellectual disability, including a male obligate carrier, was reported to carry two changes in ABHD14A, a gene involved in cerebellar development [30]. Further studies on each of these novel gene variants are merited to decipher the complete cascade in haplotype analysis. The pathway identified could be divided into seven gene pathway clusters: AUTS2, ADGRA2, CELSR1, FZD9, MBD1, RERE, and OPN1. Each of these genes physically interacts with autism-specific genes, bears receptors and ligands relevant to autism, and affects the downstream cascade of processes such as formation, morphology, and hypoplasia of the brain, development of the head and central nervous system, migration of Purkinje cells, and pervasive developmental disorder. ADGRA2 and CELSR1 share similar ectodomain structures and are implicated in neural tube formation, as is also evident in the network analysis of the current study [31]. Extensive studies have established the roles of these genes in autism, directly or indirectly, as follows: (1) Cytoplasmic AUTS2 regulates the actin cytoskeleton to control neuronal migration and neurite extension, which are vital to autism [32]. (2) ADGRA2 plays a vital role during brain angiogenesis and functions as a WNT7A/7B-specific co-activator of beta-catenin signaling in the brain endothelium [33]. (3) CELSR1 protein is involved in the processes by which neural progenitor cells in the basal compartment decide their fate during development of the cerebral cortex [34]. (4) Homozygous deletion of FZD9 resulted in an acute deficiency in learning and memory, leading to apoptosis in the dentate gyrus and a lowered seizure threshold [35]. (5) Abnormal serotonin systems resulting from a deletion in MBD1 have been linked to autism through higher levels of synthesis of the receptor Htr2c, which acts as an important link between MBD1 and autism-like behavior [36]. (6) Based on a discovery resource of rare copy number variations in individuals with autism, de novo mutations have been associated with proximal 1p36 deletions and with disruption of transcriptional, synaptic, and chromatin genes in autism [37]. (7) OPN1 is a rho-linked protein for mental retardation that controls synaptic vesicle endocytosis via endophilin A1, vital to autism [38]. These seven gene pathways bring out multiple population clusters with significant rHH genes, suggesting the existence of common population-specific risk alleles.

Conclusion
Homozygous haplotype analysis is an essential tool for uncovering the missing pieces of disease heritability. Replicative homozygous haplotypes in autism subjects showed enrichment for previously identified autism candidate genes, validating our approach. The reported regions are promising genomic regions containing highly plausible candidate genes. Further studies in a larger cohort are warranted to validate the findings.