PSY2013 Lecture 07 Notes
PSY2013 Research Methods in Psychology
Statistics in Psychology
Part 1: Descriptive Statistics 描述性统计
Basic Concepts 基本概念
Variables 变量
- Dependent Variable (DV) 因变量: The variable that is measured.
- Independent Variable (IV) 自变量: The variable that is manipulated.
- Scales 测量尺度: Nominal, Ordinal, Interval, Ratio 名义、顺序、区间、比率
Population and Sample 总体与样本
- Population 总体: The entire group of individuals or instances about whom we hope to learn.
- Sample 样本: A subset of the population examined in the study.
Descriptive Statistics 描述性统计
Definition 定义
- Methods for organizing and summarizing data.
- 组织和总结数据的方法。
- Examples: Tables, graphs, and descriptive values (e.g., average score).
- 例如:表格、图表和描述性值(如平均分)。
Parameters and Statistics 参数与统计量
- Parameter 参数: A descriptive value for a population.
- Statistic 统计量: A descriptive value for a sample.
- 参数:总体的描述值。
- 统计量:样本的描述值。
Four Types of Measurement Scales 四种测量尺度
Nominal Scale 名义尺度
- Unordered set of categories identified only by name.
- 仅以名称标识的无序类别集合。
- Example: Gender, ethnicity 性别、种族
Ordinal Scale 顺序尺度
- Ordered set of categories.
- 有序类别集合。
- Example: Class rankings 班级排名
Interval Scale 区间尺度
- Ordered series of equal-sized categories; arbitrary zero point.
- 等间距有序系列;零点是任意的。
- Example: Temperature in Celsius 摄氏温度
Ratio Scale 比率尺度
- Interval scale with a true zero point.
- 具有真实零点的区间尺度。
- Example: Height, weight 身高、体重
Frequency Distributions 频数分布
Definition 定义
- Organized tabulation showing the number of individuals located in each category.
- 显示每个类别中个体数量的有组织的列表。
- Example: Frequency table 频数表
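A frequency table can be built by counting how often each score occurs. The sketch below is illustrative only: the `scores` list is made-up data, not taken from the lecture.

```python
from collections import Counter

# Hypothetical quiz scores (illustrative data only)
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8]

# Count how many individuals obtained each score
frequency = Counter(scores)

# Print the table from the highest score to the lowest
print("X   f")
for x in sorted(frequency, reverse=True):
    print(f"{x:<3} {frequency[x]}")
```

The frequencies always sum to the number of individuals, which is a quick check that no score was dropped.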
Types 类型
- Histograms 直方图: Bars touch, used for interval or ratio scales.
- 条形接触,用于区间或比率尺度。
- Polygons 多边形图: Dots connected by lines.
- 点通过线连接。
- Bar Graphs 条形图: Bars do not touch, used for nominal or ordinal scales.
- 条形不接触,用于名义或顺序尺度。
Central Tendency 集中趋势
- Definition 定义
- Measures that determine a single value to describe the center of a distribution.
- 确定单个值来描述分布中心的测量。
- Mean 平均数: Sum of scores divided by the number of scores.
- 总分数除以分数个数。
- $$\text{Mean} (\mu) = \frac{\sum X}{N}$$
- Median 中位数: Middle score in a distribution.
- 分布中的中间分数。
- Mode 众数: Most frequently occurring score.
- 出现最频繁的分数。
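The three measures of central tendency can be computed with Python's standard `statistics` module. The `scores` list is hypothetical and chosen so that the three values differ.

```python
import statistics

scores = [3, 5, 5, 6, 8, 9]  # hypothetical data, N = 6

mean = statistics.mean(scores)      # sum of scores / number of scores = 36 / 6
median = statistics.median(scores)  # middle of the sorted scores: (5 + 6) / 2
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```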
Variability 变异性
- Definition 定义
- Measure of how spread out the scores are in a distribution.
- 测量分数在分布中的分散程度。
- Range 范围: Difference between the highest and lowest scores.
- 最高分和最低分之间的差。
- Standard Deviation 标准差: Square root of the variance; approximately the average distance between each score and the mean.
- 方差的平方根;近似于每个分数与平均数之间的平均距离。
- $$\text{Standard Deviation} (\sigma) = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
- Variance 方差: Mean of the squared deviations.
- 平均平方偏差。
- $$\text{Variance} (\sigma^2) = \frac{\sum (X - \mu)^2}{N}$$
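The range, variance, and standard deviation formulas can be traced step by step in code. This is a minimal sketch on a hypothetical population of eight scores; it uses the population formulas (dividing by N), matching the notation in this section.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, N = 8

N = len(scores)
mu = sum(scores) / N                             # population mean
score_range = max(scores) - min(scores)          # highest score minus lowest score
sum_sq_dev = sum((x - mu) ** 2 for x in scores)  # sum of squared deviations
variance = sum_sq_dev / N                        # mean of the squared deviations
std_dev = math.sqrt(variance)                    # square root of the variance

# Cross-check against the standard library's population formulas
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(std_dev, statistics.pstdev(scores))

print(score_range, variance, std_dev)
```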
Correlation 相关性
- Pearson Correlation 皮尔逊相关系数
- Measures the direction and strength of the linear relationship between two variables.
- 测量两个变量之间线性关系的方向和强度。
- Formula 公式:
- $$r = \frac{\sum (X - \overline{X})(Y - \overline{Y})}{\sqrt{\sum (X - \overline{X})^2 \sum (Y - \overline{Y})^2}}$$
- Range 范围: -1 to 1
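The Pearson formula translates directly into code. The function name `pearson_r` and the data are hypothetical, chosen only to illustrate the calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from the deviation-score formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]  # hypothetical paired scores
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

Here the sum of products is 6 and the sums of squares are 10 and 6, so r = 6 / √60, a fairly strong positive relationship.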
Regression 回归分析
- Linear Regression 线性回归
- Determines the equation for the best-fitting line.
- 确定最佳拟合线的方程。
- Equation 方程:
- $$Y = bX + a$$
- $b$: Slope 斜率
- $a$: Y-intercept 截距
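The notes give only the form of the line. One common way to find the best-fitting line is least squares, where b = SP / SS_X and a = M_Y − b·M_X. These formulas are an assumption here (they are not stated in the notes), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y = bX + a (assumed method)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (M_X, M_Y)
    return b, a

x = [1, 2, 3, 4, 5]  # hypothetical paired scores
y = [2, 4, 5, 4, 5]
b, a = fit_line(x, y)

# Predicted Y for a new X value
predicted = b * 6 + a
print(b, a, predicted)
```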
Part 2: Probability and Samples 概率与样本
Inferential Statistics 推论统计
- Definition 定义
- Methods for using sample data to make general conclusions about populations.
- 使用样本数据对总体做出一般结论的方法。
z-Scores 标准分数
- Formula 公式
- $$z = \frac{X - \mu}{\sigma}$$
- $$X = \mu + z\sigma$$
- Purpose 目的
- Specifies the precise location of each X value within a distribution.
- 指定每个X值在分布中的精确位置。
- Example 例子
- Mean $\mu$ = 100, Standard Deviation $\sigma$ = 10, X = 130 → z = 3
- 平均数 $\mu$ = 100,标准差 $\sigma$ = 10,X = 130 → z = 3
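Both z-score formulas can be checked against the example above. The function names are hypothetical.

```python
def z_score(x, mu, sigma):
    """Convert a raw score X into a z-score."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Convert a z-score back into a raw score X."""
    return mu + z * sigma

z = z_score(130, mu=100, sigma=10)  # (130 - 100) / 10
x = raw_score(z, mu=100, sigma=10)  # 100 + 3 * 10
print(z, x)
```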
Probability 概率
- Definition 定义
- Method for measuring the likelihood of obtaining a specific sample.
- 测量获得特定样本的可能性的方法。
- Formula 公式:
- $$P(A) = \frac{\text{number of outcomes classified as A}}{\text{total number of possible outcomes}}$$
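As a small illustration of the probability formula, consider drawing one card from a standard 52-card deck (an example chosen here, not taken from the lecture). `Fraction` keeps the result exact.

```python
from fractions import Fraction

def probability(outcomes_in_a, total_outcomes):
    """P(A) = outcomes classified as A / total possible outcomes."""
    return Fraction(outcomes_in_a, total_outcomes)

p_king = probability(4, 52)    # 4 kings in a 52-card deck
p_heart = probability(13, 52)  # 13 hearts in a 52-card deck
print(p_king, p_heart)
```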
Sampling and Probability 抽样与概率
Random Sampling 随机抽样
- Each member of a population has an equal chance of being selected.
- 总体的每个成员都有同等的被选中机会。
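A simple random sample can be sketched with the standard `random` module. The population of 100 ID numbers is hypothetical. Note that `random.sample` draws without replacement, which is one of several possible sampling schemes.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))          # hypothetical participant IDs 1..100
sample = random.sample(population, k=10)  # every ID is equally likely to be chosen

print(sample)
```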
Normal Distribution 正态分布
- Symmetrical, bell-shaped distribution.
- 对称的钟形分布。
- Properties 特性
- Mean = Median = Mode
- 平均数 = 中位数 = 众数
- Empirical Rule 经验法则: ~68% within 1 SD, ~95% within 2 SDs, ~99.7% within 3 SDs
- ~68%在1个标准差内,~95%在2个标准差内,~99.7%在3个标准差内
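The empirical rule can be verified from the normal cumulative distribution function, using `statistics.NormalDist` from the standard library. `proportion_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
```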
Central Limit Theorem 中心极限定理
- Definition 定义
- The distribution of sample means approaches a normal distribution as the sample size increases; n ≥ 30 is a common rule of thumb for "large enough".
- 随着样本量增大,样本均值的分布趋近于正态分布;n ≥ 30 是“足够大”的常用经验标准。
- Properties 特性
- Mean of sample means = Population mean
- 样本均值的平均数 = 总体均值
- Standard Error 标准误差:
- $$\sigma_M = \frac{\sigma}{\sqrt{n}}$$
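A short simulation can illustrate both properties. The sketch below assumes a uniform population on [0, 1] (deliberately non-normal, chosen for illustration) and compares the simulated sample means with the theoretical mean and standard error.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 30              # size of each sample
num_samples = 2000  # number of samples drawn

# Assumed population: uniform on [0, 1], so mu = 0.5 and sigma = sqrt(1/12)
mu = 0.5
sigma = math.sqrt(1 / 12)

# Draw many samples and record the mean of each
sample_means = [
    statistics.mean([random.uniform(0, 1) for _ in range(n)])
    for _ in range(num_samples)
]

theoretical_se = sigma / math.sqrt(n)           # sigma_M = sigma / sqrt(n)
observed_se = statistics.pstdev(sample_means)   # spread of the simulated means
mean_of_means = statistics.mean(sample_means)   # should be close to mu

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"standard error: observed {observed_se:.4f}, theoretical {theoretical_se:.4f}")
```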
Part 3: Hypothesis Testing 假设检验
Hypothesis Testing 假设检验
Definition 定义
- Statistical method that uses sample data to evaluate a hypothesis about a population.
- 使用样本数据评估总体假设的统计方法。
- Steps 步骤
- State hypothesis about the population.
- 陈述关于总体的假设。
- Use hypothesis to predict the characteristics the sample should have.
- 使用假设预测样本应具有的特征。
- Obtain a sample from the population.
- 从总体中获取样本。
- Compare data with the hypothesis prediction.
- 将数据与假设预测进行比较。
Null Hypothesis (H0) 零假设
- States that there is no change or effect.
- 声明没有变化或效果。
Alternative Hypothesis (H1) 备择假设
- States that there is a change or effect.
- 声明存在变化或效果。
Critical Regions and Alpha Level 临界区和显著性水平
- Critical Region 临界区
- Extreme sample values that are very unlikely to occur if H0 is true.
- 如果零假设成立,极不可能发生的样本值。
- Alpha Level (α) 显著性水平
- Probability value used to define “very unlikely” outcomes.
- 用于定义“极不可能”结果的概率值。
- Common α values 常用α值: 0.05, 0.01, 0.001
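For a test based on the normal distribution (a z test, assumed here for illustration), the boundary of the critical region follows from the alpha level. `critical_z` is a hypothetical helper built on `statistics.NormalDist`.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """z value that cuts off the critical region for a given alpha level."""
    # A two-tailed test splits alpha equally between the two tails
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: reject H0 if |z| > {critical_z(alpha):.3f}")
```

Smaller alpha levels push the boundary further into the tails, so more extreme data are needed to reject H0.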
Errors in Hypothesis Testing 假设检验中的错误
Type I Error (α) 一类错误
- Rejecting H0 when it is true.
- 拒绝成立的零假设。
- False positive 假阳性
Type II Error (β) 二类错误
- Failing to reject H0 when it is false.
- 未拒绝错误的零假设。
- False negative 假阴性
Statistical Power 统计功效
- Definition 定义
- Probability that the test will correctly reject a false null hypothesis.
- 检验正确拒绝错误的零假设的概率。
- Formula 公式:
- $$\text{Power} = 1 - \beta$$
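Power can be computed exactly in a simple case. The sketch below assumes a one-tailed z test with a known population standard deviation and a true mean above the null value. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one-tailed (upper) z test; assumes mu1 > mu0 and known sigma."""
    std_normal = NormalDist()
    standard_error = sigma / math.sqrt(n)
    z_critical = std_normal.inv_cdf(1 - alpha)
    # Distance between the true mean and the null mean, in standard-error units
    shift = (mu1 - mu0) / standard_error
    # beta: probability of failing to reject H0 when the true mean is mu1
    beta = std_normal.cdf(z_critical - shift)
    return 1 - beta

power = z_test_power(mu0=100, mu1=105, sigma=10, n=25)
print(f"power = {power:.2f}")
```

With these assumed values the power is about 0.80. Under the same assumptions, increasing n or the size of the effect raises power.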
Measuring Effect Size 测量效果大小
- Cohen’s d 科恩的d
- Measures the size of the mean difference in terms of standard deviation units.
- 以标准差单位衡量平均差异的大小。
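In its simplest form, Cohen's d is the mean difference divided by the standard deviation. The sketch below assumes a single-sample comparison against a known population mean; the values are hypothetical.

```python
def cohens_d(mean_difference, standard_deviation):
    """Mean difference expressed in standard deviation units."""
    return mean_difference / standard_deviation

# Hypothetical example: treated sample mean 108, population mean 100, SD 10
d = cohens_d(108 - 100, 10)
print(d)
```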
Part 4: t-Tests t检验
The t Statistic t统计量
- Definition 定义
- Used to test hypotheses about an unknown population mean when the population standard deviation is also unknown.
- 用于在总体标准差未知的情况下检验未知总体均值的假设。
- Formula 公式:
- $$t = \frac{M - \mu}{s_M}$$
- $s_M$: Estimated standard error 估计标准误差
Degrees of Freedom 自由度
- Definition 定义
- Number of scores in a sample that are independent and free to vary.
- 样本中独立且自由变化的分数数量。
- Formula 公式:
- $$df = n - 1$$
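The t statistic and its degrees of freedom can be computed together. This sketch assumes the usual estimated standard error, s_M = √(s²/n) with s² = SS/df, which the notes name but do not spell out. `one_sample_t` and the data are hypothetical.

```python
import math

def one_sample_t(scores, mu):
    """Return (t, df) for a single sample tested against a hypothesized mean mu."""
    n = len(scores)
    df = n - 1
    m = sum(scores) / n                     # sample mean M
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    s_squared = ss / df                     # sample variance divides by df, not n
    s_m = math.sqrt(s_squared / n)          # estimated standard error
    return (m - mu) / s_m, df

t, df = one_sample_t([6, 8, 10, 12, 14], mu=7)
print(f"t({df}) = {t:.3f}")
```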
Hypothesis Tests with the t Statistic 使用t统计量的假设检验
- Steps 步骤
- State the hypotheses and select an alpha level.
- 陈述假设并选择显著性水平。
- Locate the critical region using df and alpha level.
- 使用自由度和显著性水平找到临界区。
- Calculate the test statistic.
- 计算检验统计量。
- Make a decision.
- 做出决定。
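The four steps can be followed in a short script. The data are hypothetical, and the critical value 2.306 (two-tailed, α = .05, df = 8) is read from a standard t table rather than computed, because Python's standard library has no t distribution.

```python
import math
import statistics

scores = [13, 15, 12, 16, 14, 17, 13, 15, 11]  # hypothetical sample, n = 9

# Step 1: state the hypotheses and select alpha (H0: mu = 12, alpha = .05, two-tailed)
mu_h0 = 12

# Step 2: locate the critical region using df and alpha (value from a t table)
df = len(scores) - 1
t_critical = 2.306  # two-tailed critical t for df = 8, alpha = .05

# Step 3: calculate the test statistic
m = statistics.mean(scores)
s_m = statistics.stdev(scores) / math.sqrt(len(scores))  # estimated standard error
t = (m - mu_h0) / s_m

# Step 4: make a decision
reject_h0 = abs(t) > t_critical
print(f"t({df}) = {t:.3f}, reject H0: {reject_h0}")
```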
Independent-Measures t Test 独立样本t检验
- Purpose 目的
- Determine whether the sample mean difference indicates a real mean difference between two populations.
- 确定样本均值差异是否表示两个总体之间的真实均值差异。
Pooled Variance 合并方差
- Definition 定义
- Provides an unbiased basis for calculating the standard error.
- 提供计算标准误差的无偏基础。
- Formula 公式:
- $$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$$
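The pooled variance combines the two samples' sums of squares and degrees of freedom. The sketch also carries it forward into an independent-measures t, assuming the usual standard error √(s_p²/n₁ + s_p²/n₂), which is not stated in the notes. The data and helper names are hypothetical.

```python
import math

def sum_of_squares(scores):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

group1 = [3, 5, 7, 9]     # hypothetical sample 1, n = 4
group2 = [1, 2, 3, 4, 5]  # hypothetical sample 2, n = 5

df1, df2 = len(group1) - 1, len(group2) - 1
pooled_variance = (sum_of_squares(group1) + sum_of_squares(group2)) / (df1 + df2)

# Assumed standard error of the mean difference, based on the pooled variance
standard_error = math.sqrt(pooled_variance / len(group1) + pooled_variance / len(group2))
m1 = sum(group1) / len(group1)
m2 = sum(group2) / len(group2)
t = (m1 - m2) / standard_error

print(f"pooled variance = {pooled_variance:.3f}, t({df1 + df2}) = {t:.3f}")
```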
Repeated-Measures t Test 重复测量t检验
- Definition 定义
- Evaluate the mean difference between two treatment conditions using data from a single sample.
- 使用单个样本的数据评估两个处理条件之间的平均差异。
- Difference Score 差异分数:
- $$D = X_2 - X_1$$
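A repeated-measures analysis starts from the difference scores. The sketch below computes D for each participant and then, as an assumption beyond the notes, a t statistic that tests whether the mean difference is zero. The data are hypothetical.

```python
import math
import statistics

first = [10, 12, 9, 11]    # hypothetical scores in condition 1 (X1)
second = [13, 14, 10, 15]  # same participants in condition 2 (X2)

# Difference score for each participant: D = X2 - X1
differences = [x2 - x1 for x1, x2 in zip(first, second)]

n = len(differences)
df = n - 1
mean_d = statistics.mean(differences)
s_md = statistics.stdev(differences) / math.sqrt(n)  # estimated standard error of M_D

# Assumed test: H0 states that the population mean difference is 0
t = (mean_d - 0) / s_md
print(differences, f"t({df}) = {t:.3f}")
```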
Confidence Intervals 置信区间
- Definition 定义
- Range of values, centered on the sample mean, used to estimate the unknown population mean.
- 估计未知总体均值的值范围。
- Formula 公式:
- $$\mu = M \pm t(s_M)$$
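The interval is the sample mean plus and minus a margin of t times the estimated standard error. The numbers below are hypothetical; the t value 2.306 corresponds to 95% confidence with df = 8 and is taken from a standard t table.

```python
def confidence_interval(m, s_m, t_value):
    """Return (lower, upper) limits of the interval M ± t * s_M."""
    margin = t_value * s_m
    return m - margin, m + margin

# Hypothetical values: M = 14, s_M = 0.6455, t = 2.306 (95% confidence, df = 8)
lower, upper = confidence_interval(m=14, s_m=0.6455, t_value=2.306)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

A larger t value (higher confidence) or a larger standard error widens the interval.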