Simple Linear Regression

The goal of this project was to build a linear regression model from the ground up using NumPy.

    %matplotlib inline

    # Imports
    from numpy import *
    import matplotlib.pyplot as plt

Import the data

Here, we're using a dataset with two columns: the number of hours each student studied and the test score that student achieved.

    points = genfromtxt('data.csv', delimiter=',')

    # Extract columns
    x = array(points[:, 0])
    y = array(points[:, 1])

    # Plot the dataset
    plt.scatter(x, y)
    plt.xlabel('Hours of study')
    plt.ylabel('Test scores')
    plt.title('Dataset')
    plt.show()
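Before fitting, it can help to confirm that the file was parsed into the expected two-column array, since genfromtxt fills unparseable or missing fields with NaN. The sketch below is an illustration only: it reads a few made-up rows from an in-memory string instead of data.csv, and the names csv_text, has_two_columns and has_missing_values are assumptions, not part of the notebook.

```python
import io
import numpy as np

# Made-up rows standing in for data.csv: hours studied, test score
csv_text = "1.0,52.0\n2.5,61.5\n4.0,70.0\n"
points = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# The notebook's code indexes points[:, 0] and points[:, 1],
# so it expects a 2-D array with exactly two columns
has_two_columns = points.ndim == 2 and points.shape[1] == 2

# genfromtxt marks missing or unparseable fields as NaN
has_missing_values = bool(np.isnan(points).any())

print(points.shape, has_two_columns, has_missing_values)
```

The same two checks could be run on the array loaded from data.csv before plotting it.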

Defining the hyperparameters

    # Hyperparameters
    learning_rate = 0.0001
    initial_b = 0
    initial_m = 0
    num_iterations = 10
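The learning rate sets how far each update moves the parameters, which is why it has to be chosen with some care. The toy sketch below is not taken from the notebook: it runs gradient descent on the one-variable function f(w) = w**2 to show that a small rate shrinks w toward the minimum at 0 while a rate that is too large makes the iterates grow. The helper run_descent and the two rates are assumptions chosen for illustration.

```python
def run_descent(learning_rate, num_iterations, start=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(num_iterations):
        # Standard update: step against the gradient
        w = w - learning_rate * 2 * w
    return w

# Each step multiplies w by (1 - 2 * learning_rate)
w_small_rate = run_descent(learning_rate=0.1, num_iterations=10)  # factor 0.8
w_large_rate = run_descent(learning_rate=1.1, num_iterations=10)  # factor -1.2

print(abs(w_small_rate) < 1.0)  # moved toward the minimum
print(abs(w_large_rate) > 1.0)  # moved away from the minimum
```

The safe range of rates depends on the scale of the data, so the values here say nothing about which rate suits the study-hours dataset.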

Define cost function

    def compute_cost(b, m, points):
        total_cost = 0
        N = float(len(points))

        # Compute sum of squared errors
        for i in range(0, len(points)):
            x = points[i, 0]
            y = points[i, 1]
            total_cost += (y - (m * x + b)) ** 2

        # Return average of squared error
        return total_cost / N
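The cost above is the mean squared error: the average over all N points of (y - (m * x + b)) ** 2. As a sketch, the same quantity can be computed without an explicit loop using NumPy array operations. The name compute_cost_vectorized and the three-point demo array are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def compute_cost_vectorized(b, m, points):
    """Mean squared error of the line y = m * x + b over all points."""
    x = points[:, 0]
    y = points[:, 1]
    residuals = y - (m * x + b)
    return np.mean(residuals ** 2)

# Made-up points that lie exactly on y = 2 * x
demo_points = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])

perfect_fit_cost = compute_cost_vectorized(0.0, 2.0, demo_points)
zero_line_cost = compute_cost_vectorized(0.0, 0.0, demo_points)

print(perfect_fit_cost, zero_line_cost)
```

For an N x 2 array this should agree with the loop version up to floating-point rounding, and it avoids Python-level iteration on larger datasets.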

Define Gradient Descent functions

    def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
        b = starting_b
        m = starting_m
        cost_graph = []

        # For every iteration, optimize b, m and compute its cost
        for i in range(num_iterations):
            cost_graph.append(compute_cost(b, m, points))
            b, m = step_gradient(b, m, array(points), learning_rate)

        return [b, m, cost_graph]

    def step_gradient(b_current, m_current, points, learning_rate):
        m_gradient = 0
        b_gradient = 0
        N = float(len(points))

        # Calculate gradient
        for i in range(0, len(points)):
            x = points[i, 0]
            y = points[i, 1]
            m_gradient += - (2/N) * x * (y - (m_current * x + b_current))
            b_gradient += - (2/N) * (y - (m_current * x + b_current))

        # Update current m and b
        m_updated = m_current - learning_rate * m_gradient
        b_updated = b_current - learning_rate * b_gradient

        # Return updated parameters
        return b_updated, m_updated
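The two sums in step_gradient are the partial derivatives of the mean squared error: with respect to m it is -(2/N) * sum(x * (y - (m * x + b))), and with respect to b it is -(2/N) * sum(y - (m * x + b)). A vectorized sketch of one update step follows. The name step_gradient_vectorized, the demo points and the learning rate of 0.01 are assumptions for illustration.

```python
import numpy as np

def step_gradient_vectorized(b_current, m_current, points, learning_rate):
    """One gradient descent update for the line y = m * x + b."""
    x = points[:, 0]
    y = points[:, 1]
    residuals = y - (m_current * x + b_current)

    # Partial derivatives of the mean squared error
    m_gradient = -2.0 * np.mean(x * residuals)
    b_gradient = -2.0 * np.mean(residuals)

    # Step against the gradient
    b_updated = b_current - learning_rate * b_gradient
    m_updated = m_current - learning_rate * m_gradient
    return b_updated, m_updated

# Made-up points that lie exactly on y = 2 * x
demo_points = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
b_next, m_next = step_gradient_vectorized(0.0, 0.0, demo_points, 0.01)

print(b_next, m_next)
```

Starting from b = 0 and m = 0, both parameters increase after one step, which is the expected direction for data with a positive slope.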

Run gradient_descent_runner() to get optimized parameters b and m

    b, m, cost_graph = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)

    # Print optimized parameters
    print('Optimized b:', b)
    print('Optimized m:', m)

    # Print error with optimized parameters
    print('Minimized cost:', compute_cost(b, m, points))

    Optimized b: 0.0296393478747
    Optimized m: 1.47741737555
    Minimized cost: 112.655851815
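One way to judge how close these parameters are to the best possible line is to compare them with a closed-form least-squares fit. The sketch below uses np.polyfit with degree 1, which returns the slope followed by the intercept. The helper least_squares_line and the demo points are assumptions for illustration; no result for data.csv is claimed here.

```python
import numpy as np

def least_squares_line(points):
    """Closed-form least-squares fit, returned as (b, m) like the notebook."""
    x = points[:, 0]
    y = points[:, 1]
    # polyfit returns coefficients from the highest power down: [slope, intercept]
    m_ls, b_ls = np.polyfit(x, y, 1)
    return b_ls, m_ls

# Made-up points that lie exactly on y = 2 * x + 1
demo_points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
b_ls, m_ls = least_squares_line(demo_points)

print(b_ls, m_ls)
```

On the notebook's data, passing the result of least_squares_line(points) to compute_cost would give a reference cost against which the gradient descent result can be compared.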

Plotting the cost per iteration

    plt.plot(cost_graph)
    plt.xlabel('No. of iterations')
    plt.ylabel('Cost')
    plt.title('Cost per iteration')
    plt.show()

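The cost stops decreasing noticeably after about 5 iterations. Because the mean squared error of a linear model is convex in b and m, the minimum being approached is the global one; there are no separate local minima.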
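The iteration at which the cost curve flattens can also be estimated programmatically instead of being read off the plot. The sketch below reports the first iteration whose relative improvement over the previous one falls below a tolerance. The helper first_flat_iteration, the tolerance of 1e-3 and the demo cost values are all assumptions for illustration, not values from the notebook.

```python
def first_flat_iteration(cost_graph, tolerance=1e-3):
    """Index of the first cost whose relative drop from the previous
    cost is below tolerance, or None if the curve never flattens."""
    for i in range(1, len(cost_graph)):
        previous, current = cost_graph[i - 1], cost_graph[i]
        if previous == 0:
            # Cost is already zero, so it cannot improve further
            return i
        if abs(previous - current) / previous < tolerance:
            return i
    return None

# Made-up cost values that drop quickly and then level off
demo_costs = [100.0, 50.0, 30.0, 29.99, 29.989]
flat_at = first_flat_iteration(demo_costs)

print(flat_at)
```

A flat cost curve only shows that further steps at the current learning rate change the cost very little; it does not by itself prove that b and m have reached their optimal values.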

Plot line of best fit

    # Plot dataset
    plt.scatter(x, y)

    # Predict y values
    pred = m * x + b

    # Plot predictions as line of best fit
    plt.plot(x, pred, c='r')
    plt.xlabel('Hours of study')
    plt.ylabel('Test scores')
    plt.title('Line of best fit')
    plt.show()
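The same expression, m * x + b, can be used to predict a score for a number of study hours that is not in the dataset. The sketch below wraps it in a small helper and plugs in the parameters printed earlier. The helper name predict and the input of 10 hours are assumptions for illustration.

```python
import numpy as np

def predict(hours, b, m):
    """Predicted test score for the given hours of study (scalar or array)."""
    return m * np.asarray(hours, dtype=float) + b

# Parameters as printed by the notebook run
b_fit = 0.0296393478747
m_fit = 1.47741737555

single_prediction = predict(10, b_fit, m_fit)
batch_predictions = predict([1, 2, 3], b_fit, m_fit)

print(single_prediction, batch_predictions)
```

Predictions far outside the range of hours seen in the data extrapolate the fitted line and should be treated with caution.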

philipphundertmark/simple-linear-regression-notebook
