Lecture Notes: Introduction to Bioinformatics
AI3073 Introduction to Bioinformatics
Dr. Jiaxing CHEN
Chapter 2 Part 1: Gene and RNA
1. Fundamentals
Gene: a locus (region) of DNA that encodes a functional protein or RNA product; the molecular unit of heredity.
Central Dogma: DNA → RNA → Protein.
- Transcription: DNA → pre-mRNA
- Splicing: pre-mRNA → mRNA (introns removed, exons retained)
- Translation: mRNA → Protein
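The transcription and splicing steps can be sketched in a few lines of Python. This is a minimal illustration, not a biological model: the toy sequence and the exon coordinates are made up, and the input is assumed to be the coding (sense) strand, so transcription reduces to replacing T with U.

```python
def transcribe(coding_strand: str) -> str:
    """Return the pre-mRNA for a DNA coding (sense) strand: T is replaced by U."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons, given as 0-based half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy gene: exon 1 + intron + exon 2 (coordinates are assumptions)
dna = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

pre_mrna = transcribe(dna)
mrna = splice(pre_mrna, exons)
print(pre_mrna)  # AUGGCCGUAAGUUUUCAGAAAUGA
print(mrna)      # AUGGCCAAAUGA
```

Real splicing is guided by splice-site signals in the sequence; here the exon boundaries are simply given as input.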
2. Genome Characteristics
- Human genome length: ~3.2 billion base pairs (bp).
- Size scale: spans a very wide range, from viruses (nm range) to human cells (~10 μm) (see the diagram on page 10).
- Codon table (page 12): 3 bases (one codon) = 1 amino acid. Stop codons: UAA, UAG, UGA.
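To make the codon table concrete, the sketch below builds the standard genetic code as a Python dictionary and uses it to translate an mRNA. The compact 64-letter string lists the amino acids in U, C, A, G order for each codon position, with `*` marking stop codons; the names `CODON_TABLE` and `translate` are illustrative choices, not part of the lecture.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA read from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(stop_codons)                # ['UAA', 'UAG', 'UGA']
print(translate("AUGGCCAAAUGA"))  # MAK
```

The function assumes the reading frame starts at the first base and that the input contains only A, C, G, and U.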
3. Computational Gene Prediction
Challenges:
- Mammalian genomes are large (~3 billion bp).
- Less than 2% of the DNA codes for protein.
- Non-coding RNAs are hard to predict.
Approaches:
- Ab initio: relies on statistical features of the sequence itself (e.g., codon usage, ORFs)
- Homology-based: relies on similarity to known genes
- Hybrid: combines both approaches
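As one example of a statistical feature an ab initio method might use, the sketch below computes relative codon frequencies for a coding sequence. It is only an illustration: the function name `codon_usage` and the toy sequence are assumptions, and a real predictor would compare such frequencies against a model trained on known genes.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in a coding sequence.

    The sequence is read in frame from position 0; trailing bases that do
    not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy coding sequence: ATG GCC GCC GCT AAA TGA
usage = codon_usage("ATGGCCGCCGCTAAATGA")
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 3))
```

In this toy sequence alanine is encoded by GCC twice and by GCT once, which is the kind of skew that codon usage bias refers to.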
Key features:
- ORF (open reading frame): runs from a start codon (ATG) to the first in-frame stop codon
- Codon usage bias
- Motifs (promoters, enhancers, UTRs)
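The ORF feature can be illustrated with a simple scanner. This is a minimal sketch under stated assumptions: it searches only the three forward-strand reading frames, takes the first ATG in each frame as the start, and uses a hypothetical `min_codons` threshold; real gene finders also scan the reverse strand and handle introns.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Find forward-strand ORFs as (start, end, sequence) tuples.

    Coordinates are 0-based and half-open; each ORF runs from an ATG to the
    first in-frame stop codon and includes that stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # close the ORF and keep scanning
    return sorted(orfs)


print(find_orfs("CCATGGCCAAATGACC"))  # [(2, 14, 'ATGGCCAAATGA')]
```

An ATG that is never followed by an in-frame stop codon is not reported, which is why the second ATG in the example produces no ORF.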
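4. RNA Types
mRNA: encodes protein
Non-coding RNA (ncRNA):
- tRNA: adaptor that carries amino acids (~80 nt, cloverleaf structure)
- rRNA: makes up the ribosome and catalyzes protein synthesis
- miRNA (~22 nt): regulates gene expression (silencing); has a hairpin-shaped precursor
- lncRNA (>200 nt): regulatory roles; acts as a scaffold, guide, or enhancer
- circRNA: circular and stable, resistant to degradation; can originate from protein-coding genes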
5. RNA Secondary Structure
Levels:
- Primary structure: the nucleotide sequence
- Secondary structure: hairpins, stem-loops, etc. (page 49)
- Tertiary structure: interactions between secondary-structure elements (page 47)
Prediction:
- Co-variation analysis: uses a multiple sequence alignment to find conserved base pairs
- Single-sequence prediction: minimum free energy (MFE)
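Single-sequence prediction is usually done by dynamic programming. The sketch below is not an MFE method: it implements the simpler Nussinov-style recursion, which maximizes the number of base pairs instead of minimizing free energy, and is shown only because it has the same overall shape. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum loop length of 3 are assumptions of this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure with the maximum number of nested pairs."""
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


print(nussinov("GGGAAAUCC"))  # (((...)))
```

The output is a hairpin: a three-pair stem closing a three-nucleotide loop. An MFE method replaces the pair count with experimentally derived energy terms for stacks and loops but fills a similar table.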
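6. Key Takeaways
- Genes = functional units of DNA
- Central Dogma: DNA → RNA → Protein
- Gene prediction: hard, because most of the genome is non-coding
- ncRNAs: many types (tRNA, rRNA, miRNA, lncRNA, circRNA) with complex functions
- RNA structure: from primary to tertiary; prediction relies on computational methods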
https://tosakaucw.github.io/lecture-notes-introduction-to-bioinformatics/