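Many factors can affect a stock, so how do we tell whether a given factor is an effective one, that is, whether it actually influences returns? A small example illustrates the idea. A securities firm analyzes the daily number of new accounts opened at its branches in 5 regions. Data from 10 branch offices are collected for each region, giving the following: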
D1 [126,124,120,92,125,142,29,26,123,29]
D2 [40,45,66,22,41,30,23,70,90,111]
D3 [10,11,13,11,9,8,13,11,6,7]
D4 [8,11,6,7,7,9,12,15,10,13]
D5 [7,6,8,8,13,6,10,7,5,9]
Test whether the daily number of new accounts differs significantly across the 5 regions, using a significance level of 0.05.
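For reference, the question above is a standard one-way ANOVA, which can be sketched as follows. The notation ($\mu_i$, $\bar{x}_i$, $n_i$, $k$, $N$) is an assumption introduced here and does not come from the original text; $k = 5$ regions and $N = 50$ observations follow from the data.

```latex
% Hypotheses: all regional means are equal vs. at least one differs
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5
\qquad
H_1:\ \text{not all } \mu_i \text{ are equal}

% Between-group and within-group sums of squares
SS_B = \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})^2,
\qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2

% Test statistic; under H_0 it follows an F distribution with (k-1, N-k) = (4, 45) degrees of freedom
F = \frac{SS_B / (k - 1)}{SS_W / (N - k)}
```

We reject $H_0$ when the p-value of $F$ is below 0.05.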
# Only the imports this example actually uses are kept
import pandas as pd
from statsmodels.formula.api import ols
import statsmodels.stats.anova as anova

# One DataFrame per region: 'num' is the daily number of new accounts, 'locate' is the region label
d1 = pd.DataFrame({"num":[126,124,120,92,125,142,29,26,123,29],'locate':['D1']*10})
d2 = pd.DataFrame({"num":[40,45,66,22,41,30,23,70,90,111],'locate':['D2']*10})
d3 = pd.DataFrame({"num":[10,11,13,11,9,8,13,11,6,7],'locate':['D3']*10})
d4 = pd.DataFrame({"num":[8,11,6,7,7,9,12,15,10,13],'locate':['D4']*10})
d5 = pd.DataFrame({"num":[7,6,8,8,13,6,10,7,5,9],'locate':['D5']*10})

# Stack the regions into one long-format table
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
d = pd.concat([d1, d2, d3, d4, d5], ignore_index=True)

# Fit num against the categorical factor 'locate' and build the ANOVA table
model = ols('num ~ C(locate)', data=d).fit()
table = anova.anova_lm(model)
print(table)
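The output reports a p-value of 1.729214e-10, far below the 0.05 significance level, so we reject the null hypothesis that the mean number of new accounts is the same in every region and conclude that it differs by region. This confirms our intuition: the location of the branch is an important factor in the number of accounts opened.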
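As a cross-check on the ANOVA table, the F statistic can also be computed directly from the between-group and within-group sums of squares and compared with `scipy.stats.f_oneway`. This is a minimal sketch rather than part of the original analysis; the helper name `one_way_anova` and the `groups` dictionary are hypothetical names introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Same data as above, keyed by region (the dict layout is an illustrative choice)
groups = {
    "D1": [126, 124, 120, 92, 125, 142, 29, 26, 123, 29],
    "D2": [40, 45, 66, 22, 41, 30, 23, 70, 90, 111],
    "D3": [10, 11, 13, 11, 9, 8, 13, 11, 6, 7],
    "D4": [8, 11, 6, 7, 7, 9, 12, 15, 10, 13],
    "D5": [7, 6, 8, 8, 13, 6, 10, 7, 5, 9],
}


def one_way_anova(samples):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    arrays = [np.asarray(s, dtype=float) for s in samples]
    all_values = np.concatenate(arrays)
    grand_mean = all_values.mean()
    k = len(arrays)       # number of groups
    n = all_values.size   # total number of observations

    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(a.size * (a.mean() - grand_mean) ** 2 for a in arrays)
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum(((a - a.mean()) ** 2).sum() for a in arrays)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    # Upper-tail probability of the F distribution with (k-1, n-k) degrees of freedom
    p_value = stats.f.sf(f_stat, k - 1, n - k)
    return f_stat, p_value


f_manual, p_manual = one_way_anova(groups.values())
f_scipy, p_scipy = stats.f_oneway(*groups.values())

print(f"manual: F = {f_manual:.4f}, p = {p_manual:.6e}")
print(f"scipy : F = {f_scipy:.4f}, p = {p_scipy:.6e}")
```

Both computations should agree with each other and with the `C(locate)` row of the statsmodels table, since all three implement the same one-way ANOVA.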