霍开拓
Data Analysis: Fitting
2019-4-13 19:15



1. Root finding

Bisection, Newton's method, and simple (fixed-point) iteration.

Newton's method

The iteration formula is

X^{(k+1)}=X^{(k)}-F^{\prime}\left(X^{(k)}\right)^{-1} F\left(X^{(k)}\right)\\

where F'(X^{(k)}) is the Jacobian matrix of F(X):

\boldsymbol{F}^{\prime}\left(\boldsymbol{X}^{(k)}\right)=\left[ \begin{array}{cccc}{\frac{\partial f_{1}}{\partial x_{1}}} & {\frac{\partial f_{1}}{\partial x_{2}}} & {\cdots} & {\frac{\partial f_{1}}{\partial x_{n}}} \\ {\frac{\partial f_{2}}{\partial x_{1}}} & {\frac{\partial f_{2}}{\partial x_{2}}} & {\cdots} & {\frac{\partial f_{2}}{\partial x_{n}}} \\ {\vdots} & {\vdots} & {\cdots} & {\vdots} \\ {\frac{\partial f_{n}}{\partial x_{1}}} & {\frac{\partial f_{n}}{\partial x_{2}}} & {\cdots} & {\frac{\partial f_{n}}{\partial x_{n}}}\end{array}\right]\\
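As a minimal sketch of the iteration above (not code from the original post), the Jacobian can be approximated by finite differences and the Newton step repeated until it becomes small. The function name newton_system and the example system are made up for illustration.

import numpy as np

def newton_system(F, x0, tol=1e-10, max_iter=50, h=1e-6):
    # Solve F(x) = 0 by Newton's method with a central-difference Jacobian.
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        n = len(x)
        J = np.zeros((n, n))
        for j in range(n):                       # build the Jacobian column by column
            xp, xm = x.copy(), x.copy()
            xp[j] += h/2
            xm[j] -= h/2
            J[:, j] = (F(xp) - F(xm)) / h
        dx = np.linalg.solve(J, Fx)              # solve J dx = F(x)
        x -= dx                                  # Newton update: x <- x - J^{-1} F(x)
        if np.sum(np.abs(dx)) < tol:
            break
    return x

# Illustrative system (not from the text): x^2 + y^2 = 4 and x*y = 1
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
print(newton_system(F, [2.0, 0.5]))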

2. Least-squares fitting

Least-squares fitting is the method of choice for fitting data that contain statistical uncertainties to a formula (a "theory").


The deviation of the fitting function p(x) from the data can be measured by the sum of absolute deviations or, as in least squares, by the sum of squared deviations:

\begin{array}{l}{\sum_{i=1}^{m}\left|p\left(x_{i}\right)-y_{i}\right|} \\ {\sum_{i=1}^{m}\left|p\left(x_{i}\right)-y_{i}\right|^{2}}\end{array}\\

Given data points (x_i, y_i), we look for S*(x) in a function space \Phi such that

\sum_{i=0}^{m} \omega_{i}\left[S^{*}\left(x_{i}\right)-y_{i}\right]^{2}=\min _{s(x) \in \Phi} \sum_{i=0}^{m} \omega_{i}\left[S\left(x_{i}\right)-y_{i}\right]^{2}

where \omega_i is the weight at the point x_i.

In the continuous space C[a, b], choose n+1 linearly independent basis functions; the subspace they span is denoted

\begin{aligned} \Phi &=\operatorname{span}\left\{\varphi_{0}(x), \varphi_{1}(x), \cdots, \varphi_{n}(x)\right\} \\ &=\left\{\varphi(x) | \varphi(x)=\sum_{k=0}^{n} \alpha_{k} \varphi_{k}(x), \forall \alpha_{0}, \alpha_{1}, \cdots, \alpha_{n} \in R\right\} \end{aligned}\\

Linear fitting

If there exists \varphi^{*}(x)=\sum_{k=0}^{n} \alpha_{k}^{*} \varphi_{k}(x) \in \Phi such that

\sum_{i=0}^{m}\left[y_{i}-\varphi^{*}\left(x_{i}\right)\right]^{2}=\min _{\varphi(x) \in \Phi} \sum_{i=0}^{m}\left[y_{i}-\varphi\left(x_{i}\right)\right]^{2}\\

then \varphi^{*}(x) is called the least-squares fit of the discrete data \left\{x_{i}, y_{i}\right\}_{i=0}^{m} on the subspace \Phi.

For the chosen basis functions \left\{\varphi_{k}(x)\right\}_{k=0}^{n}, the fitting curve \varphi(x)=\sum_{k=0}^{n} \alpha_{k} \varphi_{k}(x) is a linear function of the n+1 parameters, so this is called linear least squares.

At the data points the fitting function takes the values

\varphi\left(x_{i}\right)=\sum_{k=0}^{n} \alpha_{k} \varphi_{k}\left(x_{i}\right), \quad i=0,1,2, \cdots, m\\

and the sum of squared deviations is

I\left(\alpha_{0}, \alpha_{1}, \cdots, \alpha_{n}\right)=\sum_{i=0}^{m}\left[y_{i}-\sum_{k=0}^{n} \alpha_{k} \varphi_{k}\left(x_{i}\right)\right]^{2}\\

The task is then to find the parameters at which I attains its minimum:

I\left(\alpha_{0}^{*}, \alpha_{1}^{*}, \cdots, \alpha_{n}^{*}\right)=\min _{\alpha_{0}, \alpha_{1}, \cdots, \alpha_{n} \in R} I\left(\alpha_{0}, \alpha_{1}, \cdots, \alpha_{n}\right)\\

The parameters \alpha_k^* are found by setting the partial derivatives to zero:

\frac{\partial I\left(\alpha_{0}, \alpha_{1}, \cdots, \alpha_{n}\right)}{\partial \alpha_{l}}=-2 \sum_{i=0}^{m}\left[y_{i}-\sum_{k=0}^{n} \alpha_{k} \varphi_{k}\left(x_{i}\right)\right] \varphi_{l}\left(x_{i}\right)=0, \quad l=0,1, \cdots, n\\

\sum_{k=0}^{n} \alpha_{k}\left(\sum_{i=0}^{m} \varphi_{k}\left(x_{i}\right) \varphi_{l}\left(x_{i}\right)\right)=\sum_{i=0}^{m} y_{i} \varphi_{l}\left(x_{i}\right), \quad l=0,1, \cdots, n\\

Define the (m+1)-dimensional vectors

\left\{\begin{array}{l}{\varphi_{k}=\left[\varphi_{k}\left(x_{0}\right), \varphi_{k}\left(x_{1}\right), \cdots, \varphi_{k}\left(x_{m}\right)\right]^{T}, \quad k=0,1,2, \cdots, n} \\ {y=\left(y_{0}, y_{1}, \cdots, y_{m}\right)^{T}}\end{array}\right.  \\

and write the sums compactly using inner-product notation:

\left\{\begin{array}{l}{\left(\varphi_{k}, \varphi_{l}\right)=\sum_{i=0}^{m} \varphi_{k}\left(x_{i}\right) \varphi_{l}\left(x_{i}\right), \quad k, l=0,1, \cdots, n} \\ {\left(y, \varphi_{l}\right)=\sum_{i=0}^{m} y_{i} \varphi_{l}\left(x_{i}\right), \quad l=0,1, \cdots, n}\end{array}\right.\\

\sum_{k=0}^{n} \alpha_{k}\left(\varphi_{k}, \varphi_{l}\right)=\left(y, \varphi_{l}\right), \quad l=0,1, \cdots, n\\

\left( \begin{array}{cccc}{\left(\varphi_{0}, \varphi_{0}\right)} & {\left(\varphi_{1}, \varphi_{0}\right)} & {\cdots} & {\left(\varphi_{n}, \varphi_{0}\right)} \\ {\left(\varphi_{0}, \varphi_{1}\right)} & {\left(\varphi_{1}, \varphi_{1}\right)} & {\cdots} & {\left(\varphi_{n}, \varphi_{1}\right)} \\ {\cdots} & {\cdots} & {\cdots} & {\cdots} \\ {\left(\varphi_{0}, \varphi_{n}\right)} & {\left(\varphi_{1}, \varphi_{n}\right)} & {\cdots} & {\left(\varphi_{n}, \varphi_{n}\right)}\end{array}\right) \left( \begin{array}{c}{\alpha_{0}} \\ {\alpha_{1}} \\ {\vdots} \\ {\alpha_{n}}\end{array}\right)=\left( \begin{array}{c}{\left(y, \varphi_{0}\right)} \\ {\left(y, \varphi_{1}\right)} \\ {\vdots} \\ {\left(y, \varphi_{n}\right)}\end{array}\right)\\
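A minimal sketch of assembling and solving this normal system with numpy; the names lsq_fit, basis, xs, ys and the synthetic sample data are illustrative, not from the text.

import numpy as np

def lsq_fit(basis, xs, ys, w=None):
    # Solve sum_k alpha_k (phi_k, phi_l) = (y, phi_l) for the coefficients alpha_k.
    w = np.ones_like(ys) if w is None else w
    Phi = np.array([[phi(x) for phi in basis] for x in xs])   # Phi[i, k] = phi_k(x_i)
    G = (Phi * w[:, None]).T @ Phi                            # Gram matrix (phi_k, phi_l)
    b = (Phi * w[:, None]).T @ ys                             # right-hand side (y, phi_l)
    return np.linalg.solve(G, b)

# Illustrative basis {1, x, sin x} and synthetic noisy samples
basis = [lambda x: 1.0, lambda x: x, np.sin]
xs = np.linspace(0, 5, 30)
ys = 2.0 + 0.5*xs + 1.5*np.sin(xs) + 0.1*np.random.randn(30)
print(lsq_fit(basis, xs, ys))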

Polynomial fitting

\varphi(x)=\alpha_{0}+\alpha_{1} x+\alpha_{2} x^{2}+\cdots+\alpha_{n} x^{n}


\left[ \begin{array}{cccc}{\sum_{i=0}^{m} \omega_{i}} & {\sum_{i=0}^{m} \omega_{i} x_{i}} & {\cdots} & {\sum_{i=0}^{m} \omega_{i} x_{i}^{n}} \\ {\sum_{i=0}^{m} \omega_{i} x_{i}} & {\sum_{i=0}^{m} \omega_{i} x_{i}^{2}} & {\cdots} & {\sum_{i=0}^{m} \omega_{i} x_{i}^{n+1}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\sum_{i=0}^{m} \omega_{i} x_{i}^{n}} & {\sum_{i=0}^{m} \omega_{i} x_{i}^{n+1}} & {\cdots} & {\sum_{i=0}^{m} \omega_{i} x_{i}^{2 n}}\end{array}\right] \left[ \begin{array}{c}{a_{0}} \\ {a_{1}} \\ {\vdots} \\ {a_{n}}\end{array}\right]=\left[ \begin{array}{c}{\sum_{i=0}^{m} \omega_{i} y_{i}} \\ {\sum_{i=0}^{m} \omega_{i} x_{i} y_{i}} \\ {\vdots} \\ {\sum_{i=0}^{m} \omega_{i} x_{i}^{n} y_{i}}\end{array}\right]

where \omega_i is the weight function.
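For the unweighted case (all \omega_i = 1), numpy's polyfit solves exactly this kind of system; a short sketch with made-up data follows. (polyfit also accepts a w argument, but its weights multiply the residuals before squaring, so reproducing the \omega_i above would require passing their square roots.)

import numpy as np

# Synthetic data roughly following a quadratic (for illustration only)
xi = np.linspace(-2, 2, 15)
yi = 1.0 - 2.0*xi + 0.5*xi**2 + 0.05*np.random.randn(15)

coeffs = np.polyfit(xi, yi, deg=2)   # coefficients [a_2, a_1, a_0], highest power first
print(coeffs)
print(np.polyval(coeffs, 0.5))       # evaluate the fitted polynomial at x = 0.5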


Examples

Linear fitting

The normal equations take the form Ax = b with

A=\left[ \begin{array}{lll}{S} & {S_{x}} & {S_{x x}} \\ {S_{x}} & {S_{x x}} & {S_{x x x}} \\ {S_{x x}} & {S_{x x x}} & {S_{x x x x}}\end{array}\right], \quad \mathbf{x}=\left[ \begin{array}{c}{a_{1}} \\ {a_{2}} \\ {a_{3}}\end{array}\right], \quad \mathbf{b}=\left[ \begin{array}{c} {S_{y}} \\ {S_{x y}} \\ {S_{x x y}}\end{array}\right] \\

where S=\sum_{i} \omega_{i}, S_{x}=\sum_{i} \omega_{i} x_{i}, S_{x x}=\sum_{i} \omega_{i} x_{i}^{2}, S_{y}=\sum_{i} \omega_{i} y_{i}, and so on (the n = 2 case of the polynomial normal equations above).
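A sketch of building this system from data arrays and solving it with numpy; the function name quadratic_fit and its argument names are illustrative.

import numpy as np

def quadratic_fit(x, y, w=None):
    # Assemble the 3x3 normal equations A a = b for a fit a1 + a2*x + a3*x^2 and solve them.
    w = np.ones_like(y, dtype=float) if w is None else w
    S = [np.sum(w * x**k) for k in range(5)]        # S, S_x, S_xx, S_xxx, S_xxxx
    A = np.array([[S[0], S[1], S[2]],
                  [S[1], S[2], S[3]],
                  [S[2], S[3], S[4]]])
    b = np.array([np.sum(w*y), np.sum(w*x*y), np.sum(w*x**2*y)])
    return np.linalg.solve(A, b)                    # [a1, a2, a3]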



In 1929 Edwin Hubble examined data relating the radial velocity v of 24 extragalactic nebulae to their distance r from our galaxy. Although there was considerable scatter in the data, he fit them with a straight line:

v=Hr\\

A linear fit to these data gives

y(x) = a0 + a1 x, with a0 = 0.032 and a1 = 0.9.


Nonlinear fitting

Next, we fit a data set with a Lorentzian function.

The data are measured total cross sections, in millibarns, of neutron-carbon scattering as a function of neutron energy in keV (the values appear as the E and S arrays in the code below).

The Lorentzian has the form

\sigma_{T}(E)=\frac{\sigma_{0}}{\left(E-E_{r}\right)^{2}+\Gamma^{2} / 4} \\

which we write more briefly as

g(x)=\frac{a_{1}}{\left(x-a_{2}\right)^{2}+a_{3}}, \quad a_{1}=\sigma_{0}, \quad a_{2}=E_{R}, \quad a_{3}=\Gamma^{2} / 4, \quad x=E\\

The partial derivatives with respect to the three parameters are

\frac{\partial g}{\partial a_{1}}=\frac{1}{\left(x-a_{2}\right)^{2}+a_{3}}, \quad \frac{\partial g}{\partial a_{2}}=\frac{2 a_{1}\left(x-a_{2}\right)}{\left[\left(x-a_{2}\right)^{2}+a_{3}\right]^{2}}, \quad \frac{\partial g}{\partial a_{3}}=\frac{-a_{1}}{\left[\left(x-a_{2}\right)^{2}+a_{3}\right]^{2}}\\

Imposing the least-squares conditions gives

\sum_{i=1}^{9} \frac{\left[y_{i}-g\left(x_{i}\right)\right]}{\sigma_{i}^{2}} \frac{\partial g\left(x_{i}\right)}{\partial a_{m}}=0, \quad(m=1,2,3)\\

Newton's method is then used to solve these equations for the parameters:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

plt.figure(figsize=(20,16), dpi=100)
plt.rc('font',family='Times New Roman') 
matplotlib.rcParams['font.family']='STSong'
plt.rcParams['axes.unicode_minus']=False 
matplotlib.rcParams['font.size']=20


ND = 9                                    # number of data points
num = 3                                   # number of fit parameters
E = np.array([0, 13, 25, 38, 50, 63, 75, 87, 100])                      # neutron energy (keV)
S = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])     # cross section (mb)
sig = np.ones_like(S)                     # measurement errors (all set to 1 here)
#sig = np.array([9.34, 17.9, 41.5, 85.5, 51.5, 21.5, 10.8, 6.29, 4.14])  # error bars
x = np.array([20000, 40, 220], dtype=float)   # initial guess for (a1, a2, a3)
plt.plot(E, S, 'bo', markersize=20, alpha=0.4, label="data points")

plt.title('Least-Squares Fit of a Lorentzian to the Data', fontsize=20)
plt.xlabel('E', fontsize=32);  plt.ylabel(r'$\sigma$', fontsize=32);  plt.grid(True)
def Fun(x, num):
    # Least-squares conditions: F[m] = sum_i [y_i - g(x_i)]/sig_i^2 * dg/da_m,
    # all of which must vanish at the best-fit parameters.
    F = np.zeros((num), dtype=float)
    for j in range(ND):
        F[0] += (S[j]-x[0]/((E[j]-x[1])**2+x[2]))/sig[j]**2/((E[j]-x[1])**2+x[2])
        F[1] += 2*x[0]*(E[j]-x[1])*(S[j]-x[0]/((E[j]-x[1])**2+x[2]))/sig[j]**2/((E[j]-x[1])**2+x[2])**2
        F[2] += -x[0]*(S[j]-x[0]/((E[j]-x[1])**2+x[2]))/sig[j]**2/((E[j]-x[1])**2+x[2])**2
    return F

def dfun(x, num):
    # Numerical Jacobian of Fun via central differences; returns its inverse.
    df = np.zeros((num, num), dtype=float)
    dx = 0.00001                           # finite-difference step
    for i in range(0, num):
        for j in range(0, num):
            x1_plus = np.copy(x)
            x1_minus = np.copy(x)
            x1_plus[j] = x1_plus[j] + dx/2            # x + dx/2
            x1_minus[j] = x1_minus[j] - dx/2          # x - dx/2
            df[i,j] = (Fun(x1_plus,num)[i]-Fun(x1_minus,num)[i])/dx   # (F(x+dx/2)-F(x-dx/2))/dx
    df_1 = np.linalg.inv(df)               # invert the Jacobian
    return df_1

def Newton(x, num):
    x1 = np.copy(x)
    i = 0
    delta = np.copy(x)
    while np.sum(abs(delta)) > 1.e-6 and i < 100:   # stop when the step is tiny or after 100 iterations
        x1 = x - np.dot(dfun(x,num), Fun(x,num))    # Newton update: x <- x - J^{-1} F(x)
        delta = x1 - x
        x = x1
        i = i + 1
        print(x)
    return x

                    
a = Newton(x, num)
print(a)

EX = np.linspace(0, 100, 1000)
G_E = x[0]/((EX-x[1])**2+x[2])        # curve from the initial guess
G = a[0]/((EX-a[1])**2+a[2])          # curve from the fitted parameters
points = a[0]/((E-a[1])**2+a[2])      # fitted curve at the data points
plt.plot(EX, G_E, 'k', linewidth=3.0, label="initial guess")
plt.plot(EX, G, 'r:', linewidth=5.0, label="fitted curve")
plt.plot(E, points, 'ro', markersize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.legend()
plt.show()


Results:

[ 14956.38460591 38.99233378 177.84417855]
[ 16568.5542651 38.80318039 196.28768571]
[ 16786.04125645 38.81295456 198.89084396]
[ 16789.85161833 38.81293923 198.94072711]
[ 16789.85282055 38.81293926 198.94074454]
[ 16789.85282055 38.81293926 198.94074454]
[ 16789.85282055 38.81293926 198.94074454]

The choice of the initial guess is important: since Newton's method is used to solve for the fit parameters, an unsuitable starting point simply keeps the iteration from converging and the fit fails. First plot the data points, then estimate the parameters from the form of the Lorentzian. a2 sits inside the squared term (the square makes the function symmetric about it), so a2 is roughly the position of the peak (about 40). a3 is roughly the square of half the full width at half maximum (about 200). Finally, since the peak height equals a1/a3, a1 is roughly the peak value (about 80) times a3, i.e. about 16000.
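These rough estimates can also be read off the data arrays directly; a small sketch follows (it reuses the E and S arrays from the code above, while the eyeballed width of about 28 keV is an assumption, not a computed quantity).

import numpy as np

E = np.array([0, 13, 25, 38, 50, 63, 75, 87, 100], dtype=float)
S = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])

a2_0 = E[np.argmax(S)]          # peak position, about 38
peak = S.max()                  # peak height, about 83.5
a3_0 = (28.0/2)**2              # (half of an eyeballed FWHM ~ 28 keV)^2, about 200
a1_0 = peak * a3_0              # since the peak height is a1/a3, about 16000
print(a1_0, a2_0, a3_0)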

Using the scipy library

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

def func(x, p):
    # Lorentzian with parameters p = (a1, a2, a3)
    a1, a2, a3 = p
    return a1/((x-a2)**2+a3)

def residuals(p, y, x):
    return y - func(x, p)

x = np.linspace(0, 100, 1000)

p0 = [16000, 40, 200]                       # initial guess for (a1, a2, a3)
x0 = np.array([0, 13, 25, 38, 50, 63, 75, 87, 100])
y0 = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])
y1 = y0 + 2*np.random.randn(len(x0))        # add Gaussian noise to the data

plsq = leastsq(residuals, p0, args=(y1, x0))

print("fitted parameters", plsq[0])
plt.title('Least-squares fit of a Lorentzian', fontsize=20)
plt.plot(x0, y0, 'bo', markersize=20, alpha=0.4, label="original data")
plt.plot(x0, y1, 'y*', markersize=20, alpha=0.4, label="data with added noise")
plt.plot(x, func(x, plsq[0]), 'r:', linewidth=5.0, label="fitted curve")
plt.plot(x, func(x, p0), 'k', linewidth=3.0, label="initial guess")
plt.plot(x0, func(x0, plsq[0]), 'ro', markersize=20)
plt.xlabel('E', fontsize=32);  plt.ylabel(r'$\sigma$', fontsize=32);  plt.grid(True)
plt.legend()
plt.show()




Please contact the original author for permission before reposting; this article is from 霍开拓's ScienceNet blog.

Link: https://wap.sciencenet.cn/blog-566204-1173092.html?mobile=1
