[Building AI Fundamentals] 5. Core Deep Learning Basics (9)

now is better than never

김초송 2023. 3. 13. 19:04

7-1) Tips

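- MLE (Maximum Likelihood Estimation)

Maximum likelihood estimation

A thumbtack dropped on the floor can land in 2 ways = 2 values to predict
-> each throw follows a Bernoulli distribution (the count over n throws follows a binomial distribution) -> binary classification
Thrown 100 times, class 1 came up 27 times = observation

Binomial distribution : P(K = k) = C(n, k) θ^k (1 - θ)^(n - k), here n = 100 and k = 27
What we want to know : θ
If the data followed a Gaussian distribution (continuous) instead, θ would be µ and σ

The value of that function for a given θ = likelihood
The point θ where y (the likelihood) reaches its maximum = the θ that best explains the observation
MLE : the process of finding the parameter of a probability distribution that best explains the observed data
If the slope is positive, move θ to a larger value
If the slope is negative, move θ to a smaller value = Gradient Ascent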
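
The thumbtack example can be worked through numerically. The sketch below is only an illustration, not code from the lecture: it assumes the mean Bernoulli log-likelihood as the objective, and the names `log_likelihood_grad`, `theta`, `lr`, the starting value 0.5 and the step size 0.01 are all choices made here.

```python
# Gradient ascent on the Bernoulli log-likelihood for the thumbtack example.
# n and k come from the notes (100 throws, class 1 observed 27 times);
# the start value, step size and step count are illustrative assumptions.
n, k = 100, 27

def log_likelihood_grad(theta):
    """Derivative of (k*log(theta) + (n-k)*log(1-theta)) / n w.r.t. theta."""
    return (k / theta - (n - k) / (1 - theta)) / n

theta = 0.5   # initial guess
lr = 0.01     # step size
for _ in range(1000):
    # positive slope -> theta increases, negative slope -> theta decreases
    theta += lr * log_likelihood_grad(theta)

print(round(theta, 4))
```

The estimate approaches k / n = 0.27, which is the closed-form maximum likelihood estimate for this model.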

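- Optimization via Gradient Descent

Given a loss function L with parameter θ and data x,
subtract from the current θ the derivative of the loss with respect to θ, multiplied by the learning rate α
-> θ update : θ ← θ - α ∇θ L(x; θ)

Finding a local minimum = optimization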
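
A minimal sketch of the update rule, assuming a toy loss L(θ) = (θ - 3)^2 whose gradient can be written by hand. The loss, the start value and `alpha` are illustrative assumptions, not values from the lecture.

```python
# Gradient descent on a toy loss L(theta) = (theta - 3)^2.
def loss(theta):
    return (theta - 3.0) ** 2

def grad(theta):
    # dL/dtheta, derived by hand for this toy loss
    return 2.0 * (theta - 3.0)

theta = 0.0    # assumed starting point
alpha = 0.1    # assumed learning rate
history = []
for step in range(100):
    theta = theta - alpha * grad(theta)   # theta <- theta - alpha * dL/dtheta
    history.append(loss(theta))

print(theta, history[-1])
```

Here the loss has a single minimum at θ = 3, so the local minimum that gradient descent finds is also the global one; for a neural network that is generally not guaranteed.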

- Overfitting

Blue (in the lecture figure) : Decision Boundary
Red : Maximum Likelihood Estimation
-> fitting the given data too closely = OVERFITTING
How to minimize overfitting? Split the data into train and test sets
training set + (dev set) + test set = observation
Ratio : 0.8 / (0~0.1) / 0.1~0.2
development (validation) set : keeps the model from overfitting to the test set
train -> validate on dev -> test
In practice, though, a dev set is often not available

Ways to prevent overfitting
1. More Data
2. Fewer Features
3. Regularization
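
The split ratios above can be sketched with plain index shuffling. This is a hypothetical helper (`split_indices` is not from the lecture), and the 0.8 / 0.1 / 0.1 ratios and the seed are just one choice within the ranges listed.

```python
import random

def split_indices(num_samples, train_ratio=0.8, dev_ratio=0.1, seed=1):
    """Shuffle sample indices and cut them into train / dev / test parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)       # seeded for reproducibility
    n_train = round(num_samples * train_ratio)
    n_dev = round(num_samples * dev_ratio)
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]           # whatever remains
    return train, dev, test

train_idx, dev_idx, test_idx = split_indices(100)
print(len(train_idx), len(dev_idx), len(test_idx))
```

Shuffling before cutting keeps the three parts drawn from the same distribution, as long as the samples themselves are independent.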

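- Regularization

Early Stopping : stop training when the validation loss no longer decreases
Reducing Network Size : (deep learning only) reduce the capacity of the neural network
Weight Decay : limit the magnitude of the neural network's weight parameters
Dropout : deep learning
Batch Normalization : deep learning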
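
Early stopping is the easiest of these to show without a framework. The sketch below assumes a `patience` rule (stop after that many epochs without a new best validation loss); the function name, the patience value and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # new best validation loss: remember it and reset the counter
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# made-up validation losses: improving at first, then getting worse
stop_epoch, best_epoch = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.9])
print(stop_epoch, best_epoch)
```

In practice the parameters saved at `best_epoch` would be the ones kept, since later epochs only fit the training data more closely.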

- Basic Approach to Train DNN

Make a neural network architecture.
Train and check whether the model is overfitted.
1. If not, increase the model size (layers: deeper and wider).
2. If so, add regularization such as dropout or batch normalization.
  Sign of overfitting : validation loss is increasing while training loss is decreasing
3. Repeat from the train-and-check step.

- Training and Test Dataset

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# for reproducibility
torch.manual_seed(1)

x_train = torch.FloatTensor([[1, 2, 1],
                             [1, 3, 2],
                             [1, 3, 4], 
                             [1, 5, 5], 
                             [1, 7, 5],
                             [1, 2, 5],
                             [1, 6, 6],
                             [1, 7, 7]])
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

x_test = torch.FloatTensor([[2, 1, 1], [3, 1, 2], [3, 3, 4]])
y_test = torch.LongTensor([2, 2, 2])

|x_train| = (m, 3)
|y_train| = (m, )
|x_test| = (m', 3)
|y_test| = (m', )
Train and test are the "same data" in the sense that both are drawn from the same distribution

# model
class SoftMaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3)
    def forward(self, x):
        return self.linear(x)
        
model = SoftMaxClassifierModel()

# set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

nn.Linear(3, 3) : 3 input features -> 3 class scores
|x| = (m, 3) -> (m, 3)

# Training
def train(model, optimizer, x_train, y_train):
    nb_epochs = 20
    for epoch in range(nb_epochs):
        
        # compute H(x)
        prediction = model(x_train)
        
        # compute the cost
        cost = F.cross_entropy(prediction, y_train)
        
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

|x_train| = (m, 3) -> |prediction| = (m, 3)
|y_train| = (m, )

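# Test (Validation)
def test(model, optimizer, x_test, y_test):
    # optimizer is unused here; it is kept so the call under "# Run" stays the same
    prediction = model(x_test)
    # index of the largest score in each row = predicted class
    predicted_classes = prediction.max(1)[1]
    correct_count = (predicted_classes == y_test).sum().item()
    cost = F.cross_entropy(prediction, y_test)
    
    print('Accuracy: {}% Cost: {:.6f}'.format(
        correct_count / len(y_test) * 100, cost.item()
    ))

|x_test| = (m', 3) -> |prediction| = (m', 3)
predicted_classes : index of the largest score in each row, i.e. the predicted class per sample; |predicted_classes| = (m', )
cross_entropy : how close the predictions are to the true labels (lower is closer)
cost.item() : the current loss value as a Python number
- during training, check that it keeps decreasing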
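
What `prediction.max(1)[1]` and `F.cross_entropy` compute can be spelled out without PyTorch. The sketch below is a plain-Python re-implementation for illustration only: the helper names and the scores are made up, and it assumes the default mean reduction of `F.cross_entropy`.

```python
import math

def softmax(row):
    # subtract the row maximum first for numerical stability
    shifted = [v - max(row) for v in row]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_classes(logits):
    # index of the largest score in each row
    return [row.index(max(row)) for row in logits]

def cross_entropy(logits, targets):
    # mean negative log-probability assigned to the true class
    losses = [-math.log(softmax(row)[t]) for row, t in zip(logits, targets)]
    return sum(losses) / len(losses)

logits = [[2.0, 1.0, 0.1], [0.1, 0.2, 3.0]]   # made-up scores for 2 samples
targets = [0, 2]                               # made-up true classes
print(predicted_classes(logits), cross_entropy(logits, targets))
```

Minimizing this quantity is the same as maximizing the likelihood of the true labels under the softmax model, which is why the training loop can be read as MLE.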

# Run
train(model, optimizer, x_train, y_train)
test(model, optimizer, x_test, y_test)

train : the optimizer searches for the neural network parameters that best explain x_train and y_train
= θ by MLE (Maximum Likelihood Estimation)
test : the test loss comes out higher than the last training loss
= the model is already overfitted

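- Learning Rate

Gradient Descent : update θ in the direction that minimizes the loss function
If the gradient is small, a larger α lets each update learn more
Learning Rate : controls the speed of learning

If the learning rate is too large, training diverges and the cost keeps growing = OVERSHOOTING
If the learning rate is too small, the cost barely decreases
The right learning rate differs from dataset to dataset, so start from a reasonable value, make it smaller if training diverges and larger if the cost does not decrease, and repeat until a good learning rate is found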
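
The three cases can be reproduced on a toy cost. This sketch assumes the cost θ^2 (gradient 2θ); the function name and the three learning rates are chosen here only to make each behavior visible.

```python
def run_gradient_descent(lr, steps=20, start=1.0):
    """Minimize the toy cost theta**2 and return the cost after each step."""
    theta = start
    costs = []
    for _ in range(steps):
        theta -= lr * 2.0 * theta   # gradient of theta**2 is 2*theta
        costs.append(theta ** 2)
    return costs

too_large = run_gradient_descent(lr=1.1)     # overshoots: cost grows every step
too_small = run_gradient_descent(lr=1e-5)    # cost barely moves
reasonable = run_gradient_descent(lr=0.1)    # cost shrinks steadily

print(too_large[-1], too_small[-1], reasonable[-1])
```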

- Data Preprocessing

Preprocessing the data in advance so that it is easy to learn from is very important

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

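Regression problem -> train the output to approach a target value (MSE)
|x_train| = (m, 3)
|y_train| = (m, 1)

standardization : rescale each feature to zero mean and unit variance, x'_j = (x_j - µ_j) / σ_j
σ : standard deviation
µ : mean
If each feature is assumed to be normally distributed, computing µ and σ and applying the formula gives values that follow the standard normal distribution N(0, 1)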

mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)
norm_x_train = (x_train - mu) / sigma # standardize each feature column
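
As a sanity check, the same standardization can be done by hand on the x_train values above. This is a plain-Python sketch rather than the lecture's code; `statistics.stdev` is used because it divides by n - 1, which matches the default behavior of `torch.std`.

```python
import statistics

x_train = [[73, 80, 75],
           [93, 88, 93],
           [89, 91, 90],
           [96, 98, 100],
           [73, 66, 70]]

columns = list(zip(*x_train))                        # one tuple per feature
mu = [statistics.mean(col) for col in columns]
sigma = [statistics.stdev(col) for col in columns]   # sample std (n - 1)

# subtract the column mean and divide by the column std, row by row
norm_x_train = [[(v - m) / s for v, m, s in zip(row, mu, sigma)]
                for row in x_train]

for row in norm_x_train:
    print([round(v, 4) for v in row])
```

After the transform each column has mean 0 and standard deviation 1, so all three features live on a comparable scale.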

- Training with Preprocessed Data

class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)
        
    def forward(self, x):
        return self.linear(x)
        
model = MultivariateLinearRegressionModel()
optimizer = optim.SGD(model.parameters(), lr=1e-1)

nn.Linear(3, 1) : one linear layer with 3 inputs and 1 output
learning rate = 0.1

def train(model, optimizer, x_train, y_train):
    nb_epochs = 20
    
    for epoch in range(nb_epochs):
        
        # compute H(x)
        prediction = model(x_train)
        
        # compute the cost
        cost = F.mse_loss(prediction, y_train)
        
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))
        
train(model, optimizer, norm_x_train, y_train)

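|x_train| = (m, 3)
|prediction| = (m, 1)
mse_loss : used for regression -> measures the difference (distance) between prediction and y_train

Without preprocessing, the optimization would struggle
|y_train| = (m, 1) : one value per sample
If instead |y_train| = (m, 2), and the model is trained to predict very large values in the first column and very small values in the second, the neural network focuses only on the large values
Preprocessing : using µ and σ brings the values into the same range
Exploratory data analysis -> understand the properties and shape of the data -> then preprocess accordingly; this is very important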
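
Why the network would "focus only on the large values" can be seen from the loss itself. The sketch below uses made-up targets with two columns on very different scales and made-up predictions that are off by 10% in both; `mse_per_column` is a hypothetical helper, not a PyTorch function.

```python
def mse_per_column(predictions, targets):
    """Mean squared error computed separately for each output column."""
    num_cols = len(targets[0])
    return [
        sum((p[c] - t[c]) ** 2 for p, t in zip(predictions, targets)) / len(targets)
        for c in range(num_cols)
    ]

# made-up targets: column 0 is around 1000, column 1 is around 0.001
targets = [[1000.0, 0.001], [2000.0, 0.002], [1500.0, 0.0015]]
# predictions that are off by 10% in both columns
predictions = [[t0 * 1.1, t1 * 1.1] for t0, t1 in targets]

large_col, small_col = mse_per_column(predictions, targets)
print(large_col, small_col)
```

The relative error is identical in both columns, yet almost all of the summed MSE comes from the first one, so its gradients dominate the update unless the columns are rescaled first.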
