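A Linear Regression Example in Python (with Figures)
This post implements a simple linear regression example in Python.
Assume the two variables below are linearly related. We therefore look for a linear function that predicts the response value (y) as accurately as possible as a function of the feature, or independent variable, (x).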
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
y | 1 | 3 | 2 | 5 | 7 | 8 | 8 | 9 | 10 | 12 |
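In general, we define:
x as the feature vector, i.e. x = [x_1, x_2, …, x_n],
y as the response vector, i.e. y = [y_1, y_2, …, y_n]
for n observations (in the example above, n = 10).
The following Python code draws a scatter plot of the dataset above, estimates the least-squares coefficients b_0 and b_1, and overlays the fitted line.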
    # -*- coding: utf-8 -*-
    """
    Created on Thu Mar 26 18:48:49 2020

    @author: Bean029
    """
    import numpy as np
    import matplotlib.pyplot as plt


    def estimate_coef(x, y):
        # number of observations/points
        n = np.size(x)

        # mean of x and y vector
        m_x, m_y = np.mean(x), np.mean(y)

        # calculating cross-deviation and deviation about x
        SS_xy = np.sum(y*x) - n*m_y*m_x
        SS_xx = np.sum(x*x) - n*m_x*m_x

        # calculating regression coefficients
        b_1 = SS_xy / SS_xx
        b_0 = m_y - b_1*m_x

        return b_0, b_1


    def plot_regression_line(x, y, b):
        # plotting the actual points as scatter plot
        plt.scatter(x, y, color="m", marker="o", s=30)

        # predicted response vector
        y_pred = b[0] + b[1]*x

        # plotting the regression line
        plt.plot(x, y_pred, color="g")

        # putting labels
        plt.xlabel('x')
        plt.ylabel('y')

        # function to show plot
        plt.show()


    def main():
        # observations
        x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
        y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

        # estimating coefficients
        b = estimate_coef(x, y)
        print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))

        # plotting regression line
        plot_regression_line(x, y, b)


    if __name__ == "__main__":
        main()
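The resulting plot is shown below:
The task is to find the line that best fits the scatter plot above, so that the response can be predicted for any new feature value (that is, an x value not present in the dataset).
This line is called the regression line.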
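As a minimal sketch of how the regression line is used for prediction, the snippet below refits the ten observations with NumPy's `np.polyfit` (a library routine swapped in here, not the `estimate_coef` function from the listing above) and evaluates the line at x = 10, a value that is not in the dataset. The helper name `predict` and the choice x = 10 are illustrative assumptions.

```python
import numpy as np

# Observations from the table above.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# Degree-1 least-squares fit. polyfit returns the highest power first,
# so the slope comes before the intercept.
b_1, b_0 = np.polyfit(x, y, deg=1)


def predict(x_new):
    """Hypothetical helper: evaluate the fitted line at x_new."""
    return b_0 + b_1 * x_new


print(f"b_0 = {b_0:.4f}, b_1 = {b_1:.4f}")
print(f"prediction at x = 10: {predict(10.0):.4f}")
```

The coefficients should agree with the closed-form estimates (b_0 ≈ 1.2364, b_1 ≈ 1.1697), since both approaches minimize the same sum of squared residuals.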
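Multiple linear regression models the relationship between two or more features and a response by fitting a linear equation to the observed data.
It is simply an extension of simple linear regression.
Consider a dataset with p features (independent variables) and one response (dependent variable).
The dataset also contains n rows, or observations.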
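To make the shapes concrete, here is a minimal sketch that fits a multiple linear regression on synthetic data with n = 50 rows and p = 3 features, using `np.linalg.lstsq`. The data, the "true" coefficients and all variable names are assumptions made up for illustration; none of them come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 3                          # n observations, p features (illustrative)
X = rng.normal(size=(n, p))           # feature matrix, shape (n, p)
true_b = np.array([2.0, -1.0, 0.5])   # assumed feature coefficients
true_b0 = 4.0                         # assumed intercept
y = true_b0 + X @ true_b + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept is estimated together
# with the p feature coefficients.
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ b ~= y.
b_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("intercept:", b_hat[0])
print("coefficients:", b_hat[1:])
```

With p = 1 the design matrix has just two columns, and the same call reduces to the simple linear regression case.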
    # -*- coding: utf-8 -*-
    """
    Created on Thu Mar 26 18:53:13 2020

    @author: Bean029
    """
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import datasets, linear_model, metrics
    from sklearn.model_selection import train_test_split

    # load the boston dataset
    # NOTE: load_boston was removed in scikit-learn 1.2, so this script
    # needs an older scikit-learn release to run as written.
    boston = datasets.load_boston(return_X_y=False)

    # defining feature matrix(X) and response vector(y)
    X = boston.data
    y = boston.target

    # splitting X and y into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=1)

    # create linear regression object
    reg = linear_model.LinearRegression()

    # train the model using the training sets
    reg.fit(X_train, y_train)

    # regression coefficients
    print('Coefficients: \n', reg.coef_)

    # score() returns R^2 on the test set: 1 means perfect prediction
    print('Variance score: {}'.format(reg.score(X_test, y_test)))

    # plot for residual error

    ## setting plot style
    plt.style.use('fivethirtyeight')

    ## plotting residual errors in training data
    plt.scatter(reg.predict(X_train), reg.predict(X_train) - y_train,
                color="red", s=10, label='Train data')

    ## plotting residual errors in test data
    plt.scatter(reg.predict(X_test), reg.predict(X_test) - y_test,
                color="blue", s=10, label='Test data')

    ## plotting line for zero residual error
    plt.hlines(y=0, xmin=0, xmax=50, linewidth=2)

    ## plotting legend
    plt.legend(loc='upper right')

    ## plot title
    plt.title("Residual errors")

    ## function to show plot
    plt.show()
Result of running the program:
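The "variance score" printed by the script is the value returned by `LinearRegression.score`, which is the coefficient of determination R². As a rough illustration of what that number and the residual plot measure, the sketch below computes R² and the residuals by hand with NumPy on the ten-point dataset from the first example. The function name `r2_score_manual` is hypothetical, and `np.polyfit` is used here only as a convenient way to obtain a fitted line.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)

# Fit a straight line and compute its predictions.
b_1, b_0 = np.polyfit(x, y, deg=1)
y_pred = b_0 + b_1 * x

# Same sign convention as the residual plot: prediction minus observation.
residuals = y_pred - y

r2 = r2_score_manual(y, y_pred)
print("R^2:", r2)
print("residuals:", np.round(residuals, 3))
```

A perfect fit gives R² = 1, and for a least-squares line with an intercept the residuals sum to zero, which is why the points in a residual plot scatter around the horizontal zero line.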
Reposted from: http://ypvdi.baihongyu.com/