pbj0812의 코딩 일기

[통계학] z-score 를 python, MySQL 로 구현하기 본문

Science/통계학

[통계학] z-score 를 python, MySQL 로 구현하기

pbj0812 2023. 2. 15. 01:48

0. 목표

 - z-score 를 python, MySQL 로 구현하기

1. 이론

 - 데이터의 평균을 0.0 으로 표준편차를 1.0 으로 만드는 기법

위키 참조

2. 구현

 1) scipy 로 구현

from scipy import stats

x = [i for i in range(1, 10)]
z_score = stats.zscore(x)
print(z_score)

 

 2) 그냥 python 으로 구현

import math

x = [i for i in range(1, 10)]

len_x = len(x) # 길이

x_mean = sum(x) / len_x # 평균

x_var = 0
for i in x:
    x_var += (i - x_mean) ** 2
x_var = x_var / len_x # 분산

x_std = math.sqrt(x_var) # 표준편차

z_score = []
for i in x: 
    z_score.append((i - x_mean) / x_std)
    
print(z_score)

 3) MySQL

CREATE TABLE sql_study.z_score
(
     number int
);

INSERT INTO sql_study.z_score(number) VALUES(1);
INSERT INTO sql_study.z_score(number) VALUES(2);
INSERT INTO sql_study.z_score(number) VALUES(3);
INSERT INTO sql_study.z_score(number) VALUES(4);
INSERT INTO sql_study.z_score(number) VALUES(5);
INSERT INTO sql_study.z_score(number) VALUES(6);
INSERT INTO sql_study.z_score(number) VALUES(7);
INSERT INTO sql_study.z_score(number) VALUES(8);
INSERT INTO sql_study.z_score(number) VALUES(9);

SELECT * FROM sql_study.z_score;

# 평균
WITH x_mean AS (
	SELECT
		AVG(number) AS x_mean
    FROM sql_study.z_score 
),

# 분산
x_var AS (
	SELECT 
        SUM(POW(z.number - x.x_mean, 2)) / (SELECT COUNT(*) FROM sql_study.z_score) AS x_var
    FROM sql_study.z_score AS z, x_mean AS x
),

# 표준편차
x_std AS (
	SELECT
		SQRT(x_var) AS x_std
	FROM x_var
),

# z-score
z_score AS (
	SELECT 
		(number - (SELECT x_mean FROM x_mean)) / (SELECT x_std FROM x_std) AS z_score
    FROM sql_study.z_score
)

SELECT * FROM z_score;

3. 참고

 - scipy.stats.zscore

 - [MYSQL] MYSQL 숫자 및 수학 함수

Comments