Glossary

Below are common terms and acronyms used within this book and across the field of Psychiatric Genetics.

A

Term	Definition	Reference chapter
Admixture	Genetic ancestry that comes from distinct populations
Allele	One of two or more variant forms of a gene or DNA sequence found at a specific position on a chromosome.
Antisense	A molecule or strand of nucleic acid that is complementary to a specific RNA sequence, often used in gene regulation.
Area under the ROC curve (AUC)	A measure of a classifier’s ability to distinguish between classes in a binary classification problem. AUC usually ranges from 0.5 (no better than chance) to 1 (perfect discrimination).
Assembly (Genome)
Autosome	Any chromosome that is not a sex chromosome, responsible for carrying genetic information unrelated to sex determination.

B

Base pair	The fundamental unit of DNA and RNA, consisting of two nucleotides held together by hydrogen bonds.
Biallelic (SNP)	Single Nucleotide Polymorphism with two possible alleles at a specific genomic position.
Bonferroni adjustment	Statistical correction for multiple hypothesis testing in which the significance threshold is divided by the number of statistical tests performed (equivalently, each p-value is multiplied by the number of tests).
Build (Genome)
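
As a rough illustration of the Bonferroni adjustment defined above, the sketch below divides a significance level by the number of tests and flags the p-values that pass. The function name `bonferroni` and the example p-values are made up for this illustration. The same logic underlies the conventional genome-wide significance threshold of 5e-8, which corresponds to 0.05 divided by roughly one million independent tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold, adjusted p-values, and significance flags.

    Hypothetical helper for illustration only.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    threshold = alpha / m                            # stricter per-test threshold
    adjusted = [min(1.0, p * m) for p in p_values]   # adjusted p-values, capped at 1
    significant = [p < threshold for p in p_values]
    return threshold, adjusted, significant


# Example with four made-up p-values
threshold, adjusted, significant = bonferroni([0.001, 0.02, 0.04, 0.3])
print(threshold, adjusted, significant)
```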

C

Candidate Gene Study	Research focusing on specific genes to determine their association with a trait or disease.
Chromatin	Complex of DNA, RNA, and proteins forming the chromosome’s structure.
Chromatin Immunoprecipitation Sequencing (ChIP-seq)	Technique to identify DNA regions bound by specific proteins using antibodies.
Chromosome	Thread-like structure in cells containing DNA and genes.
Codon	Three-nucleotide sequence in mRNA that encodes a specific amino acid during protein synthesis.
Common variant	Frequently occurring genetic variant in a population.
Congenital	Present at birth, often referring to medical conditions or traits.
Contig
Copy DNA (complementary DNA, cDNA)	DNA synthesized from an mRNA template, used in molecular biology research.
Copy Number Variant (CNV)	Variation in the number of copies of a DNA segment between individuals.
Crossover	Exchange of genetic material between homologous chromosomes during meiosis.

D

Deletion	Removal of a segment of DNA, leading to the loss of genetic material.
Duplication	Copying of a DNA segment, resulting in extra genetic material.

E

Effect estimate	Statistical measurement of the impact of a factor on an outcome in research (e.g. odds ratio, beta value)
Elastic Net Regression	Statistical learning method combining Lasso and Ridge regressions to select relevant variables
Enhancer	DNA region influencing gene transcription.
Epidemiology	Study of the patterns and causes of disease and health in populations.
Epigenetics	Study of heritable changes in gene expression not involving alterations in DNA sequence.
Epigenome	Overall epigenetic modifications in an organism’s DNA.
Epigenome-Wide Association Study (EWAS)	Investigation of epigenetic variations associated with traits or diseases across the genome.
Epistasis	Interaction between different genetic variants affecting a trait’s expression.
Exome	The portion of the genome made up of exons, the protein-coding regions of genes.
Exon	Part of a gene that is retained in the mature mRNA after splicing. Some exons are coding, with information for making a protein.

F

False Discovery Rate (FDR)	Expected proportion of false positives among the results declared significant. Used to correct for multiple testing by keeping this proportion at an acceptable level.
FASTA file	Format for representing nucleotide or amino acid sequences in bioinformatics.
Frameshift mutation	Insertion or deletion altering the reading frame during translation.
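
One common way to control the False Discovery Rate is the Benjamini-Hochberg step-up procedure. The glossary does not name a specific method, so the choice of procedure here is an assumption. The sketch below is a minimal version; the function name and example p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one True/False flag per p-value: True if rejected at the given FDR.

    Hypothetical helper implementing the Benjamini-Hochberg step-up rule.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Example with four made-up p-values
print(benjamini_hochberg([0.03, 0.035, 0.5, 0.02]))
```

Because the procedure is "step-up", a p-value that misses its own rank threshold can still be rejected when a larger p-value passes at a higher rank, as the two smallest p-values in the example do.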

G

Gene-Environment Interaction (GxE)	Joint effect of genetic and environmental factors on traits.
Genetic Correlation	Correlation between the genetic effects on two traits, reflecting the extent to which the traits share common genetic causes; ranges from -1 to 1.
Genome	Complete set of genetic material in an organism.
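Genomic Inflation factor (lambda)	Measure of inflation in genome-wide test statistics, calculated as the ratio of the observed median test statistic to its expected value under the null hypothesis. Values well above 1 can indicate population stratification.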
Genotype	Genetic makeup of an individual at a specific locus.
Genome-Wide Association Study (GWAS)	Investigation of genetic variants across the genome to identify associations with traits.	Chapter 1.2; Chapter 5; Software Tutorial: GWAS
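
The genomic inflation factor is commonly computed as the median observed chi-square statistic (1 degree of freedom) divided by the median expected under the null hypothesis (about 0.455). The sketch below does this from a list of p-values using only the Python standard library. The function name is hypothetical, and p-values are assumed to lie in (0, 1].

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Estimate lambda from p-values (assumed to satisfy 0 < p <= 1).

    Hypothetical helper for illustration only.
    """
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z squared,
    # where z is the normal quantile at p / 2
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution under the null (about 0.455)
    expected_median = nd.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median


# Evenly spaced p-values mimic a null (uniform) distribution, so lambda is close to 1
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```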

H

Haplotype	Set of alleles on a single chromosome inherited together.
Hardy-Weinberg Equilibrium (HWE)	Model predicting that allele and genotype frequencies remain constant across generations in a non-evolving population.
Heterogeneity	Variability or diversity in a population or dataset.
Heterozygosity	Possessing different alleles at a specific genetic locus.
Histone modification	Chemical alteration of histone proteins influencing gene expression.
Homozygosity	Possessing identical alleles at a specific genetic locus.
Horizontal pleiotropy	Situation in which a genetic variant affects multiple traits independently.
Hyperparameters	Parameters set before a machine learning algorithm runs.
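
To make Hardy-Weinberg Equilibrium concrete: for a biallelic locus with allele frequencies p and q = 1 - p, the expected genotype frequencies are p², 2pq and q². The sketch below compares observed genotype counts with these expectations using a simple chi-square statistic. The function name and counts are made up, and genetic analysis software often uses an exact test rather than this approximation.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (allele frequency p, expected genotype counts, chi-square statistic).

    Hypothetical helper; arguments are observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Sum of (observed - expected)^2 / expected, skipping empty expected classes
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


# Counts that match HWE exactly give a statistic of zero
print(hwe_chi_square(25, 50, 25))
# A complete lack of heterozygotes gives a large statistic
print(hwe_chi_square(50, 0, 50))
```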

I

Identity-by-descent (IBD)	Sharing of a specific genomic region between individuals because it was inherited from a common ancestor.
Imputation	Predicting missing genetic information based on known data.
INFO score	Measure of imputation quality for genetic variants
Insertion	Addition of a DNA segment into a genome.
Insertion/Deletion (Indel)	Genetic variation involving insertions or deletions of nucleotides.
Intergenic	DNA regions between genes.
Intron	Region of DNA between the exons in a gene

J

K

Kilobase (Kb)	Unit of length used in molecular biology, equal to 1000 base pairs.
Kinship	Genetic relatedness between individuals.

L

Lasso regression	Regression method promoting sparsity in feature selection.
LD Score Regression (LDSC)	Technique to estimate SNP heritability and genetic correlation from genome-wide summary statistics.
Linkage	Tendency of genetic loci that are physically close on a chromosome to be inherited together.
Linkage disequilibrium (LD)	Non-random association of alleles at two or more loci.
Locus	Specific position on a chromosome.

M

Manhattan plot	Graph showing the results of association tests in a genome-wide analysis, with genomic position on the x-axis and -log10(p-value) on the y-axis.
Mendelian Randomization	Method using genetic variants as instrumental variables to infer causality.
Mendelian Trait
Messenger RNA (mRNA)	RNA molecule transcribed from DNA and carrying protein-coding information.
Methylation	Addition of a methyl group to DNA, often affecting gene expression.
Microbiome	Community of microorganisms in a specific environment.
Minor allele frequency	Frequency of the less common allele at a genetic locus in a population.
Missense mutation	Point mutation altering a codon, resulting in a different amino acid.
Mitochondrial DNA (mtDNA)	DNA located in mitochondria, inherited maternally.
Mosaicism	Presence of genetically distinct cell populations within an organism.

N

Nagelkerke R²	Pseudo-R² measure of explained variance in logistic regression.
Nonsense mutation	Point mutation leading to a premature stop codon.
Nucleosome	DNA wrapped around histone proteins, forming chromatin.
Null hypothesis	Hypothesis stating no significant effect or difference.

O

Observational study	Research investigating associations without intervention.
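Odds ratio	Measure of association between an exposure and an outcome: the odds of the outcome in one group divided by the odds in another. Commonly reported in case-control studies.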
Open Reading Frame	DNA sequence potentially encoding a protein.
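
As a small worked example of an odds ratio, the sketch below computes it from a 2x2 table of cases and controls by exposure status (for instance, carriers and non-carriers of a risk allele). The function name and the counts are invented for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls.

    Hypothetical helper; all four cell counts are assumed to be non-zero.
    """
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# 20 of 100 cases and 10 of 100 controls carry the allele (made-up numbers)
print(odds_ratio(20, 80, 10, 90))
```

An odds ratio above 1 suggests the exposure is more common among cases, below 1 that it is less common, and 1 indicates no association.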

P

Pedigree	Family tree showing genetic relationships.
Phasing	Determining which alleles lie together on the same parental chromosome (haplotype) in an individual.
Phenome	Complete set of an individual’s traits and characteristics.
Phenome-wide Association Study (PheWAS)	Investigation of associations between a genetic variant and many traits.
Phenotype	Observable traits and characteristics of an individual.
Pleiotropy	Single gene influencing multiple traits.
Point mutation	Single nucleotide change in DNA.
Polygenic Risk Score (PRS)	Score summarizing the combined effect of many genetic variants on a trait, typically calculated as a weighted sum of an individual's risk alleles.
Polygenic trait	Trait influenced by multiple genes.
Polymorphism	Genetic variation within a population.
Population stratification	Population substructure leading to confounding in genetic studies.
Power (Genomic)	Probability of detecting a true effect in a study.
Power analysis	Calculating the sample size required to reach a desired statistical power, given the expected effect size.
Precision Medicine	Tailoring medical treatment to an individual’s genetic and health characteristics.
Principal Component Analysis (PCA)	Dimensionality reduction technique, used to control for population stratification.
Proband	Index individual in a genetic study.
Promoter	DNA region controlling gene transcription initiation.
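
A polygenic risk score is often computed as a weighted sum of allele dosages, with weights taken from GWAS effect estimates. The sketch below shows only that core arithmetic for one individual. The function name, dosages and weights are invented, and real pipelines add steps such as variant selection and LD adjustment that are not shown here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    Hypothetical helper; `weights` are per-variant effect estimates (e.g. betas).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# One individual, three variants (made-up values)
print(polygenic_score([0, 1, 2], [0.10, -0.05, 0.20]))
```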

Q

Quantile-quantile plot (qqplot)	Plot showing how observed test statistics in a genome-wide analysis depart from expected distribution.

R

R²	Coefficient of determination indicating the proportion of variance explained.
Rare variant	Infrequently occurring genetic variant in a population.
REF/ALT	Reference and alternative alleles at a genetic locus.

S

Scaffold
Sensitivity	True positive rate in diagnostic testing.
Single Nucleotide Polymorphism (SNP)	Single base-pair variation in DNA sequence.
Single Nucleotide Variant (SNV)	Single base-pair genetic variation, including SNPs.
SNP Heritability (h²_SNP)	Proportion of trait variance explained by SNPs.
Specificity	True negative rate in diagnostic testing.
Structural variant	Genetic variation involving larger DNA segments.

T

Transcriptome	Complete set of RNA transcripts present in a cell, tissue, or organism.
Transcriptomic Imputation	The process of predicting missing gene expression data using available information.
Transcriptomic Risk Score (TRS)	A score calculated from transcriptomic data to assess the risk of a specific outcome.
Type I error	False positive result in hypothesis testing, where a null hypothesis is incorrectly rejected.
Type II error	False negative result in hypothesis testing, where a false null hypothesis is not rejected. Type II error rate = 1 - power.

U

V

Variant	A specific form of a genetic locus differing from the reference sequence.
Variance explained	The portion of trait variability accounted for by a given factor, often a genetic variant.
Variant Call Format (VCF)	A standard file format for storing genetic variant information.
Vertical pleiotropy	Situation in which a genetic variant affects one trait, which in turn influences another trait along the same causal pathway.

W

Whole Exome Sequencing (WES)	Technique for sequencing only the protein-coding regions of the genome.
Whole Genome Sequencing (WGS)	Method for sequencing an individual’s entire genome.

XYZ

X-linked	Genetic trait located on the X chromosome, leading to sex-specific inheritance patterns.

*This Glossary was constructed with the help of ChatGPT.

OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.com/chat

For other glossaries of genetic terms, please see: