How the cost-minimization algorithm for Linear Regression works

Simplified hypothesis

The original hypothesis H(x) = Wx + b with the bias b removed, which simplifies the cost function as well.

Written out, with m training examples:

H(x) = W * x
cost(W) = (1/m) * sum over i of (W * x_i - y_i)^2

With no bias, the cost was evaluated for several values of W on the following training data set.

x	y
1	1
2	2
3	3

W = 1, cost(W) = 0

( (1*1-1)^2 + (1*2-2)^2 + (1*3-3)^2 ) / 3 = 0 / 3 = 0

W = 0, cost(W) = 4.67

( (0*1-1)^2 + (0*2-2)^2 + (0*3-3)^2) / 3 = 14 / 3 = 4.67

W = 2, cost(W) = 4.67

( (2*1-1)^2 + (2*2-2)^2 + (2*3-3)^2) / 3 = 14 / 3 = 4.67
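
The three hand calculations above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper name cost is chosen here for illustration and does not come from the course material.

```python
def cost(w, xs, ys):
    """Mean squared error of the bias-free hypothesis H(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3]
ys = [1, 2, 3]

# Evaluate the cost at the same three W values used in the text
costs = {w: cost(w, xs, ys) for w in (0, 1, 2)}
for w, c in costs.items():
    print(f"W = {w}, cost(W) = {c:.2f}")
# W = 0, cost(W) = 4.67
# W = 1, cost(W) = 0.00
# W = 2, cost(W) = 4.67
```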

The cost is lowest at W = 1 and grows as W moves away from 1 in either direction.

Plotting cost(W) against W gives a U-shaped curve with its minimum at W = 1.

Goal: find the W that makes the cost as small as possible.

-> This can be done with the gradient descent algorithm.

Gradient descent algorithm

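Literally: an algorithm that descends along a slope.
This algorithm can be used to minimize a cost function.
Measuring the slope (gradient) of the cost curve above gives:
- a positive value at W = 5
- a negative value at W = -5
- 0 at W = 1, the minimum
Start anywhere and change W a little at a time in the direction that lowers the cost, so that the slope shrinks toward 0.
For this cost function, any starting point converges to the point where the slope is 0.
The slope is obtained by differentiating the cost function.
Differentiating cost(W) with respect to W leads to the update rule W := W - learning_rate * (derivative of cost at W).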
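
The derivation can be sketched as follows. It assumes the common convention of scaling the cost by 1/(2m) so that the factor 2 from differentiation cancels; this scaling does not move the minimum, and it matches the gradient used in the code further down. The symbol \alpha stands for the learning rate and is a notation chosen here.

```latex
\begin{aligned}
\mathrm{cost}(W) &= \frac{1}{2m}\sum_{i=1}^{m}\left(W x_i - y_i\right)^2 \\
\frac{\partial}{\partial W}\,\mathrm{cost}(W) &= \frac{1}{m}\sum_{i=1}^{m}\left(W x_i - y_i\right) x_i \\
W &:= W - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(W x_i - y_i\right) x_i
\end{aligned}
```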

Convex function

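The problem: for some cost functions, the W that gradient descent arrives at depends on where it starts.
With the cost function and hypothesis introduced above, the cost is convex, so the end point is always the same.
When designing a cost function, always check whether it is convex.
If it is, gradient descent (with a suitably small learning rate) is guaranteed to reach the W with the lowest cost.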
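
To make the "same end point from any start" claim concrete, here is a minimal plain-Python sketch of gradient descent on the same data set. The function names gradient and descend, the three starting values, and the step count of 100 are all assumptions made for this illustration.

```python
def gradient(w, xs, ys):
    """Mean of (w*x - y) * x, the slope of the cost curve at w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def descend(w_start, xs, ys, learning_rate=0.1, steps=100):
    """Apply the update W := W - learning_rate * gradient repeatedly."""
    w = w_start
    for _ in range(steps):
        w -= learning_rate * gradient(w, xs, ys)
    return w

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Start on both sides of the minimum and far from it
results = {w0: descend(w0, xs, ys) for w0 in (-5.0, 0.0, 5.0)}
for w0, w in results.items():
    print(f"start W = {w0:5.1f} -> final W = {w:.6f}")
```

Because the cost is a convex parabola in W, every starting value ends up at W = 1 here. With a non-convex cost the same loop could stop at different local minima.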

The following code was written to try out the ideas above.

It sweeps W from -3.0 to 4.9 in steps of 0.1 and plots the cost for each value of W.

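# Written for the TensorFlow 1.x API (tf.placeholder, tf.Session).
# On TensorFlow 2.x, import tensorflow.compat.v1 as tf and call tf.disable_v2_behavior().
import tensorflow as tf
import matplotlib.pyplot as plt

X = [1, 2, 3]
Y = [1, 2, 3]

# W is fed in from outside so the cost can be evaluated at any chosen value
W = tf.placeholder(tf.float32)

# Simplified hypothesis without bias: H(x) = W * x
hypothesis = X * W

# Mean squared error
cost = tf.reduce_mean(tf.square(hypothesis - Y))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

W_val = []
cost_val = []

# Sweep W from -3.0 to 4.9 in steps of 0.1
for i in range(-30, 50):
    feed_W = i * 0.1
    curr_cost, curr_W = sess.run([cost, W], feed_dict={W: feed_W})
    W_val.append(curr_W)
    cost_val.append(curr_cost)

# Plot cost against W
plt.plot(W_val, cost_val)
plt.show()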

The resulting plot is a U-shaped curve with its minimum at W = 1.

The following code applies gradient descent to find W.

The whole gradient descent step can be assigned to a single op named update.

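import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)

x_data = [1, 2, 3]
y_data = [1, 2, 3]

# W starts at a random value and is updated by gradient descent
W = tf.Variable(tf.random_normal([1]), name='weight')
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

hypothesis = X * W
# Reported cost: the sum of squared errors (not the mean)
cost = tf.reduce_sum(tf.square(hypothesis - Y))

# Minimize: gradient descent using the derivative: W -= learning_rate * derivative
learning_rate = 0.1
# Derivative of (1/2m) * sum((W*x - y)^2) with respect to W
gradient = tf.reduce_mean((W * X - Y) * X)
descent = W - learning_rate * gradient
update = W.assign(descent)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Each iteration applies one update, then prints step, cost and W
for step in range(21):
    sess.run(update, feed_dict={X: x_data, Y: y_data})
    print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W))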

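The result is as follows. Each line shows the step, the cost, and W; the cost falls toward 0 as W converges to 1.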

0 1.2159512 [0.70529056]
1 0.34587038 [0.84282166]
2 0.09838092 [0.91617155]
3 0.027983889 [0.9552915]
4 0.007959878 [0.97615546]
5 0.002264138 [0.98728293]
6 0.00064401084 [0.9932176]
7 0.00018318668 [0.9963827]
8 5.2105963e-05 [0.9980708]
9 1.4820023e-05 [0.9989711]
10 4.216036e-06 [0.9994512]
11 1.1988791e-06 [0.99970734]
12 3.4110508e-07 [0.9998439]
13 9.709889e-08 [0.99991673]
14 2.7621713e-08 [0.9999556]
15 7.847621e-09 [0.99997634]
16 2.2354243e-09 [0.99998736]
17 6.375167e-10 [0.99999326]
18 1.7905677e-10 [0.9999964]
19 5.0931703e-11 [0.9999981]
20 1.4740209e-11 [0.999999]
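
The printed W values can be cross-checked by hand. For this data set the gradient is (W - 1) * 14/3, so each update multiplies the distance from 1 by 1 - 0.1 * 14/3, roughly 0.533. The plain-Python sketch below restarts from the step-0 value in the log above; the names next_w and trace are only illustrative.

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
learning_rate = 0.1

def next_w(w):
    """One gradient descent update: W - learning_rate * mean((W*x - y) * x)."""
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - learning_rate * grad

w = 0.70529056  # W printed at step 0 in the log above
trace = [w]
for _ in range(3):
    w = next_w(w)
    trace.append(w)

print(trace)
```

The computed values should agree with steps 1 to 3 of the log to about six decimal places. Small differences in the last digits are expected because the TensorFlow run used float32 while Python floats are 64-bit.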

※ These notes summarize the inflearn course 모두를 위한 딥러닝 (Deep Learning for Everyone).
