Below are common terms and acronyms used within this book and across the field of Psychiatric Genetics.
A
| Admixture | 
Genetic ancestry that comes from distinct populations | 
 | 
| Allele | 
Variant forms of a gene that occupy a specific position on a chromosome. | 
 | 
| Antisense | 
A molecule or strand of nucleic acid that is complementary to a specific RNA sequence, often used in gene regulation. | 
 | 
| Area under the ROC curve (AUC) | 
A measure of a classifier’s ability to distinguish between classes in a binary classification problem. AUC ranges from 0 to 1, where 0.5 corresponds to chance-level prediction and 1 to perfect prediction. | 
 | 
| Assembly (Genome) | 
 | 
 | 
| Autosome | 
Any chromosome that is not a sex chromosome, responsible for carrying genetic information unrelated to sex determination. | 
 | 
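As an illustration of the AUC entry above, AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy scores are hypothetical, and labels are assumed to be coded 1 (case) and 0 (control).

```python
def auc(scores, labels):
    """AUC as the fraction of case-control pairs in which the case scores higher.

    Ties count as half a win. Labels are assumed to be 1 (case) or 0 (control).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: three of the four case-control pairs are ranked correctly.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(example_auc)
```

The pairwise loop is quadratic in sample size, so it suits small examples only; rank-based formulas give the same value more efficiently.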
B
| Base pair | 
The fundamental unit of DNA and RNA, consisting of two nucleotides held together by hydrogen bonds. | 
 | 
| Biallelic (SNP) | 
Single Nucleotide Polymorphism with two possible alleles at a specific genomic position. | 
 | 
| Bonferroni adjustment | 
Statistical correction for multiple hypothesis testing that divides the significance threshold by the number of statistical tests performed (equivalently, multiplies each p-value by that number). | 
 | 
| Build (Genome) | 
 | 
 | 
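The Bonferroni adjustment above can be sketched in a few lines. The function names are hypothetical, and the figure of one million tests is only an illustrative assumption about the number of independent tests in a genome-wide analysis.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: each p-value times the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# With alpha = 0.05 and an assumed one million tests, the threshold is 5e-8.
print(bonferroni_threshold(0.05, 1_000_000))
print(bonferroni_adjust([0.01, 0.04, 0.5]))
```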
C
| Candidate Gene Study | 
Research focusing on specific genes to determine their association with a trait or disease. | 
 | 
| Chromatin | 
Complex of DNA, RNA, and proteins forming the chromosome’s structure. | 
 | 
| Chromatin Immunoprecipitation Sequencing (ChIPseq) | 
Technique to identify DNA regions bound by specific proteins using antibodies. | 
 | 
| Chromosome | 
Thread-like structure in cells containing DNA and genes. | 
 | 
| Codon | 
Three-nucleotide sequence in mRNA that encodes a specific amino acid during protein synthesis. | 
 | 
| Common variant | 
Frequently occurring genetic variant in a population. | 
 | 
| Congenital | 
Present at birth, often referring to medical conditions or traits. | 
 | 
| Contig | 
 | 
 | 
| Copy DNA (cDNA) | 
DNA synthesized from an mRNA template, used in molecular biology research. | 
 | 
| Copy Number Variant (CNV) | 
Variation in the number of copies of a DNA segment between individuals. | 
 | 
| Crossover | 
Exchange of genetic material between homologous chromosomes during meiosis. | 
 | 
D
| Deletion | 
Removal of a segment of DNA, leading to the loss of genetic material. | 
 | 
| Duplication | 
Copying of a DNA segment, resulting in extra genetic material. | 
 | 
E
| Effect estimate | 
Statistical measurement of the impact of a factor on an outcome in research (e.g. odds ratio, beta value) | 
 | 
| Elastic Net Regression | 
Statistical learning method combining Lasso and Ridge regressions to select relevant variables | 
 | 
| Enhancer | 
DNA region influencing gene transcription. | 
 | 
| Epidemiology | 
Study of the patterns and causes of disease and health in populations. | 
 | 
| Epigenetics | 
Study of heritable changes in gene expression not involving alterations in DNA sequence. | 
 | 
| Epigenome | 
Overall epigenetic modifications in an organism’s DNA. | 
 | 
| Epigenome-Wide Association Study (EWAS) | 
Investigation of epigenetic variations associated with traits or diseases across the genome | 
 | 
| Epistasis | 
Interaction between different genetic variants affecting a trait’s expression. | 
 | 
| Exome | 
The portion of the genome containing protein-coding genes. | 
 | 
| Exon | 
Part of a gene that will form mRNA. Some exons are coding, with information for making a protein. | 
 | 
F
| False Discovery Rate (FDR) | 
Expected proportion of false positives among the results declared significant in multiple testing; also refers to correction methods that keep this proportion below a chosen acceptable level. | 
 | 
| FASTA file | 
Format for representing nucleotide or amino acid sequences in bioinformatics. | 
 | 
| Frameshift mutation | 
Insertion or deletion altering the reading frame during translation. | 
 | 
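One widely used way to control the False Discovery Rate described above is the Benjamini-Hochberg step-up procedure. The glossary does not name a specific method, so the choice of procedure here is an assumption, as are the function name and the example p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking tests declared significant at FDR level q.

    Benjamini-Hochberg: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, and declare the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

Because the procedure is step-up, a p-value can be declared significant even if it misses its own rank threshold, provided a larger p-value meets its threshold.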
G
| Gene-Environment Interaction (GxE) | 
Joint effect of genetic and environmental factors on traits. | 
 | 
| Genetic Correlation | 
Proportion of variance that two traits share from common genetic causes | 
 | 
| Genome | 
Complete set of genetic material in an organism. | 
 | 
| Genomic Inflation factor (lambda) | 
Measure of inflation in test statistics due to population stratification. | 
 | 
| Genotype | 
Genetic makeup of an individual at a specific locus. | 
 | 
| Genome-Wide Association Study (GWAS) | 
Investigation of genetic variants across the genome to identify associations with traits. | 
Chapter 1.2; Chapter 5; Software Tutorial: GWAS | 
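The genomic inflation factor listed above is commonly computed as the median of the observed association test statistics divided by the median expected under the null. A minimal sketch, assuming the inputs are chi-square statistics with 1 degree of freedom (the function name and the input values are hypothetical):

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Ratio of the observed median chi-square statistic to its null expectation."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Values near 1 suggest little inflation; values above 1 suggest inflated statistics.
print(genomic_inflation([0.1, 0.5, 2.0]))
```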
H
| Haplotype | 
Set of alleles on a single chromosome inherited together. | 
 | 
| Hardy-Weinberg Equilibrium (HWE) | 
Model predicting that allele and genotype frequencies remain constant across generations in a non-evolving population. | 
 | 
| Heterogeneity | 
Variability or diversity in a population or dataset. | 
 | 
| Heterozygosity | 
Possessing different alleles at a specific genetic locus. | 
 | 
| Histone modification | 
Chemical alteration of histone proteins influencing gene expression. | 
 | 
| Homozygosity | 
Possessing identical alleles at a specific genetic locus. | 
 | 
| Horizontal pleiotropy | 
Situation in which a genetic variant affects multiple traits independently. | 
 | 
| Hyperparameters | 
Parameters set before a machine learning algorithm runs. | 
 | 
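For a biallelic locus, Hardy-Weinberg Equilibrium (see the entry above) implies genotype frequencies of p², 2pq, and q², where p and q = 1 - p are the two allele frequencies. A short sketch with a hypothetical function name and genotype labels:

```python
def hwe_expected(p):
    """Expected genotype frequencies under HWE for allele frequencies p and q = 1 - p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# With p = 0.7, roughly 49% AA, 42% Aa, and 9% aa are expected.
print(hwe_expected(0.7))
```

Comparing observed genotype counts against these expectations is a common quality-control check in genetic studies.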
I
| Identity-by-descent (IBD) | 
Shared ancestry for a specific genomic region between individuals. | 
 | 
| Imputation | 
Predicting missing genetic information based on known data. | 
 | 
| INFO score | 
Measure of imputation quality for genetic variants | 
 | 
| Insertion | 
Addition of a DNA segment into a genome. | 
 | 
| Insertion/Deletion (Indel) | 
Genetic variation involving insertions or deletions of nucleotides. | 
 | 
| Intergenic | 
DNA regions between genes. | 
 | 
| Intron | 
Region of DNA between the exons in a gene | 
 | 
K
| Kilobase (Kb) | 
Unit of length used in molecular biology, equal to 1000 base pairs. | 
 | 
| Kinship | 
Genetic relatedness between individuals. | 
 | 
L
| Lasso regression | 
Regression method promoting sparsity in feature selection. | 
 | 
| LD Score Regression (LDSC) | 
Technique to estimate genetic correlation from genome-wide summary statistics. | 
 | 
| Linkage | 
Physical proximity of genetic loci on a chromosome. | 
 | 
| Linkage disequilibrium (LD) | 
Non-random association of alleles at two or more loci. | 
 | 
| Locus | 
Specific position on a chromosome. | 
 | 
M
| Manhattan plot | 
Graph showing results of association tests in a genome-wide analysis, where the x-axis is genomic position and the y-axis is -log10(p-value) | 
 | 
| Mendelian Randomization | 
Method using genetic variants as instrumental variables to infer causality. | 
 | 
| Mendelian Trait | 
 | 
 | 
| Messenger RNA (mRNA) | 
RNA molecule transcribed from DNA and carrying protein-coding information. | 
 | 
| Methylation | 
Addition of a methyl group to DNA, often affecting gene expression. | 
 | 
| Microbiome | 
Community of microorganisms in a specific environment. | 
 | 
| Minor allele frequency | 
Frequency of the less common allele at a genetic locus in a population. | 
 | 
| Missense mutation | 
Point mutation altering a codon, resulting in a different amino acid. | 
 | 
| Mitochondrial DNA (mtDNA) | 
DNA located in mitochondria, inherited maternally. | 
 | 
| Mosaicism | 
Presence of genetically distinct cell populations within an organism. | 
 | 
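As a worked example of the Minor allele frequency entry above, the frequency can be derived from genotype counts at a biallelic locus. The function and argument names are hypothetical, and the counts are invented for illustration.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Minor allele frequency from genotype counts at a biallelic locus.

    Each individual carries two alleles, so the total allele count is twice
    the number of individuals.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt_freq = (2 * n_hom_alt + n_het) / total_alleles
    # The minor allele is whichever of the two alleles is less common.
    return min(alt_freq, 1.0 - alt_freq)


# 60 reference homozygotes, 30 heterozygotes, 10 alternative homozygotes.
print(minor_allele_frequency(60, 30, 10))
```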
N
| Nagelkerke R2 | 
Measure of explained variance in logistic regression. | 
 | 
| Nonsense mutation | 
Point mutation leading to a premature stop codon. | 
 | 
| Nucleosome | 
DNA wrapped around histone proteins, forming chromatin. | 
 | 
| Null hypothesis | 
Hypothesis stating no significant effect or difference. | 
 | 
O
| Observational study | 
Research investigating associations without intervention. | 
 | 
| Odds ratio | 
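Measure of association between an exposure (e.g. carrying an allele) and an outcome, calculated as the odds of the outcome in the exposed group divided by the odds in the unexposed group; commonly reported in case-control studies. | 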
 | 
| Open Reading Frame | 
DNA sequence potentially encoding a protein. | 
 | 
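The Odds ratio entry above can be illustrated with a 2x2 table of exposure (for example, carrying a risk allele) by case-control status. The function name and counts below are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 table: odds of exposure in cases over odds in controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# 20 of 100 cases and 10 of 100 controls are exposed.
print(odds_ratio(20, 80, 10, 90))
```

An odds ratio of 1 indicates no association; values above 1 indicate that exposure is more common among cases.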
P
| Pedigree | 
Family tree showing genetic relationships. | 
 | 
| Phasing | 
Assigning the alleles at an individual’s heterozygous sites to the maternally or paternally inherited chromosome, thereby reconstructing haplotypes. | 
 | 
| Phenome | 
Complete set of an individual’s traits and characteristics. | 
 | 
| Phenome-wide Association Study (pheWAS) | 
Exploration of genetic variants across multiple traits. | 
 | 
| Phenotype | 
Observable traits and characteristics of an individual. | 
 | 
| Pleiotropy | 
Single gene influencing multiple traits. | 
 | 
| Point mutation | 
Single nucleotide change in DNA. | 
 | 
| Polygenic Risk Score (PRS) | 
Combined effect of multiple genetic variants on a trait. | 
 | 
| Polygenic trait | 
Trait influenced by multiple genes. | 
 | 
| Polymorphism | 
Genetic variation within a population. | 
 | 
| Population stratification | 
Population substructure leading to confounding in genetic studies. | 
 | 
| Power (Genomic) | 
Probability of detecting a true effect in a study (i.e. of correctly rejecting a false null hypothesis). | 
 | 
| Power analysis | 
Calculating the required sample size based on the desired statistical power and the expected effect size. | 
 | 
| Precision Medicine | 
Tailoring medical treatment to an individual’s genetic and health characteristics. | 
 | 
| Principal Component Analysis (PCA) | 
Dimensionality reduction technique, used to control for population stratification. | 
 | 
| Proband | 
Index individual in a genetic study. | 
 | 
| Promoter | 
DNA region controlling gene transcription initiation. | 
 | 
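In its simplest form, the Polygenic Risk Score defined above is a weighted sum of an individual's allele counts, with weights taken from association effect estimates. The sketch below assumes dosages coded as 0, 1, or 2 copies of the effect allele and uses invented weights; practical pipelines also handle variant selection and linkage disequilibrium, which are omitted here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) across variants."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual with three variants and illustrative effect estimates.
print(polygenic_score([0, 1, 2], [0.10, -0.05, 0.20]))
```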
Q
| Quantile-quantile plot (qqplot) | 
Plot showing how observed test statistics in a genome-wide analysis depart from the distribution expected under the null hypothesis. | 
 | 
R
| R2 | 
Coefficient of determination indicating the proportion of variance explained. | 
 | 
| Rare variant | 
Infrequently occurring genetic variant in a population. | 
 | 
| REF/ALT | 
Reference and alternative alleles at a genetic locus. | 
 | 
S
| Scaffold | 
 | 
 | 
| Sensitivity | 
True positive rate in diagnostic testing. | 
 | 
| Single Nucleotide Polymorphism (SNP) | 
Single base-pair variation in DNA sequence. | 
 | 
| Single Nucleotide Variant (SNV) | 
Single base-pair genetic variation, including SNPs. | 
 | 
| SNP Heritability (h2SNP) | 
Proportion of trait variance explained by SNPs. | 
 | 
| Specificity | 
True negative rate in diagnostic testing. | 
 | 
| Structural variant | 
Genetic variation involving larger DNA segments. | 
 | 
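The Sensitivity and Specificity entries above correspond to simple ratios of counts from a confusion matrix. A minimal sketch with hypothetical function names and counts:

```python
def sensitivity(true_positives, false_negatives):
    """True positive rate: proportion of affected individuals correctly identified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """True negative rate: proportion of unaffected individuals correctly identified."""
    return true_negatives / (true_negatives + false_positives)


# 80 of 100 affected and 90 of 100 unaffected individuals are classified correctly.
print(sensitivity(80, 20), specificity(90, 10))
```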
T
| Transcriptome | 
Complete set of RNA transcripts expressed in a cell, tissue, or organism. | 
 | 
| Transcriptomic Imputation | 
The process of predicting missing gene expression data using available information. | 
 | 
| Transcriptomic Risk Score (TRS) | 
A score calculated from transcriptomic data to assess the risk of a specific outcome. | 
 | 
| Type I error | 
False positive result in hypothesis testing, where a null hypothesis is incorrectly rejected. | 
 | 
| Type II error | 
False negative result in hypothesis testing, where a false null hypothesis is not rejected. Type II error rate = 1 - power. | 
 | 
V
| Variant | 
A specific form of a genetic locus differing from the reference sequence. | 
 | 
| Variance explained | 
The portion of trait variability accounted for by a given factor, often a genetic variant. | 
 | 
| Variant Call Format (VCF) | 
A standard file format for storing genetic variant information. | 
 | 
| Vertical pleiotropy | 
Situation in which a genetic variant affects multiple traits due to a shared biological pathway. | 
 | 
W
| Whole Exome Sequencing (WES) | 
Technique for sequencing only the protein-coding regions of the genome. | 
 | 
| Whole Genome Sequencing (WGS) | 
Method for sequencing an individual’s entire genome. | 
 | 
XYZ
| X-linked | 
Genetic trait located on the X chromosome, leading to sex-specific inheritance patterns. | 
 | 
*This Glossary was constructed with the help of ChatGPT.
OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.com/chat
For other glossaries of genetic terms, please see: