<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<!-- Begin Jekyll SEO tag v2.5.0 -->
<title>fluentQSAR</title>
<meta name="generator" content="Jekyll v3.8.5" />
<meta property="og:title" content="fluentQSAR" />
<meta property="og:locale" content="en_US" />
<link rel="canonical" href="https://zhangshd.github.io/fluentQSAR/" />
<meta property="og:url" content="https://zhangshd.github.io/fluentQSAR/" />
<meta property="og:site_name" content="fluentQSAR" />
<script type="application/ld+json">
{"@type":"WebSite","url":"https://zhangshd.github.io/fluentQSAR/","headline":"fluentQSAR","name":"fluentQSAR","@context":"http://schema.org"}</script>
<!-- End Jekyll SEO tag -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#157878">
<link rel="stylesheet" href="./HTML_files/style.css">
</head>
<body>
<section class="page-header">
<h1 class="project-name">fluentQSAR</h1>
<h2 class="project-tagline"></h2>
<a href="https://github.com/zhangshd/fluentQSAR" class="btn">View on GitHub</a>
<a href="https://github.com/zhangshd/fluentQSAR/archive/master.zip" class="btn">Download ZIP</a>
</section>
<section class="main-content">
<ul>
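<li><a href="#qsar_package使用说明">QSAR_package User Guide</a>
<ul>
<li><a href="#运行环境配置需求">Runtime environment requirements</a></li>
<li><a href="#1-使用前准备">1. Before you start</a></li>
<li><a href="#2-提取数据随机划分训练集测试集必选步骤三种方式三选一">2. Extracting data / randomly splitting into training and test sets (required; choose one of three ways)</a>
<ul>
<li><a href="#21-输入一个总描述符文件采用随机划分的方式产生训练集和测试集">2.1 Input a single descriptor file and split it randomly into training and test sets</a></li>
<li><a href="#22-根据训练集和测试集标签文件提取训练集和测试集">2.2 Extract the training and test sets using a train/test label file</a></li>
<li><a href="#23-训练集测试集放于两个文件中">2.3 Training and test sets stored in two separate files</a></li>
</ul>
</li>
<li><a href="#3-通过txt文件提取部分特征可选步骤">3. Extracting a subset of features via a txt file (optional)</a></li>
<li><a href="#4-pearson相关性筛选rfe排序数据压缩">4. Pearson correlation filtering / RFE ranking / data scaling</a>
<ul>
<li><a href="#41-pearson相关性筛选按训练集数据筛选可选步骤一般都会用上">4.1 Pearson correlation filtering (based on training-set data): optional, but almost always used</a></li>
<li><a href="#42-数据压缩必要步骤">4.2 Data scaling (required)</a></li>
<li><a href="#43-rfe递归消除法排序可选步骤">4.3 RFE (recursive feature elimination) ranking (optional)</a></li>
</ul>
</li>
<li><a href="#5-参数寻优">5. Hyperparameter search</a>
<ul>
<li><a href="#51-不带描述符数量的重复网格寻优">5.1 Repeated grid search without the number of descriptors</a></li>
<li><a href="#52-带描述符数量的重复网格寻优">5.2 Repeated grid search with the number of descriptors</a></li>
<li><a href="#53-early_stop策略降低过拟合程度">5.3 Early_stop strategy: reducing overfitting</a></li>
</ul>
</li>
<li><a href="#6-拟合模型评价模型保存结果">6. Fitting the model / evaluating the model / saving results</a></li>
<li><a href="#7-重新载入模型进行预测">7. Reloading a model for prediction</a></li>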
</ul>
</li>
</ul>
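<h1 id="qsar_package使用说明">QSAR_package User Guide</h1>
<p>After reading the instructions below, you can use the example script (<a href="https://github.com/zhangshd/fluentQSAR/blob/master/Pipeline_single.py">view the example script [for reference only]</a>) as a template for a calling script that fits your own needs. Enjoy!</p>
<h2 id="运行环境配置需求">Runtime environment requirements:</h2>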
<ul>
<li>python3</li>
<li>scikit-learn</li>
<li>numpy</li>
<li>pandas</li>
<li>matplotlib</li>
</ul>
<p>Installing and using <a href="https://www.anaconda.com/distribution/">Anaconda3</a> is recommended.</p>
<h2 id="1-使用前准备">1. Before you start</h2>
<p>Download all the scripts (<a href="https://github.com/zhangshd/fluentQSAR/archive/master.zip">click here to download the ZIP directly</a>), extract them, and place all the files in one directory, e.g. <code class="highlighter-rouge">$/myPackage/</code></p>
<p><img src="./HTML_files/Snipaste_2019-04-25_12-47-38.png" alt="Sample" width="800" />
<img src="./HTML_files/Snipaste_2019-04-26_10-37-48.png" alt="Sample" width="800" /></p>
<p>Create a new text file, paste the path of the directory above into it, then change the file extension to <code class="highlighter-rouge">.pth</code>, e.g. <code class="highlighter-rouge">myPackage.pth</code></p>
<p><img src="./HTML_files/Snipaste_2019-04-26_10-47-04.png" alt="Sample" width="800" /></p>
<p>Open cmd and type <code class="highlighter-rouge">python</code> to start the Python interactive interpreter</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">sys</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sys</span><span class="o">.</span><span class="n">path</span>
</code></pre></div></div>
<p>Find a path that looks like <code class="highlighter-rouge">..\\lib\\site-packages</code></p>
<p><img src="./HTML_files/Snipaste_2019-04-26_10-51-27.png" alt="Sample" width="800" /></p>
<p>Then open that folder and put the <code class="highlighter-rouge">myPackage.pth</code> file you just created into it.</p>
<p><img src="./HTML_files/Snipaste_2019-04-26_11-08-25.png" alt="Sample" width="800" /></p>
<p>The purpose of these steps is to add your own script library directory to Python's module search path (<code class="highlighter-rouge">sys.path</code>).</p>
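<p>The manual steps above can also be scripted. The following is a minimal sketch using only the standard library; the helper name <code class="highlighter-rouge">write_pth</code> and the scratch-directory dry run are illustrative assumptions, not part of QSAR_package.</p>

```python
import sysconfig
import tempfile
from pathlib import Path


def write_pth(package_dir, target_dir, name="myPackage.pth"):
    """Write a .pth file that points at package_dir into target_dir."""
    pth_file = Path(target_dir) / name
    # A .pth file is plain text: each line is a directory added to sys.path at startup.
    pth_file.write_text(str(Path(package_dir).resolve()) + "\n", encoding="utf-8")
    return pth_file


# The site-packages directory of the running interpreter.
site_packages = sysconfig.get_paths()["purelib"]
print("site-packages:", site_packages)

# Dry run in a scratch directory; pass site_packages as target_dir instead
# to install the file for real (this may require write permission).
scratch = tempfile.mkdtemp()
pth_path = write_pth("./myPackage", scratch)
print(pth_path.read_text(encoding="utf-8").strip())
```

<p>After the file is placed in the real site-packages directory, a newly started interpreter should list the package directory in <code class="highlighter-rouge">sys.path</code>.</p>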
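<h2 id="2-提取数据随机划分训练集测试集必选步骤三种方式三选一">2. Extracting data / randomly splitting into training and test sets (required; choose one of three ways)</h2>
<p><font color='#ca0c16'><strong>In the file that stores the descriptor data, the label column must be placed immediately before the first feature (descriptor) column.</strong></font></p>
<h3 id="21-输入一个总描述符文件采用随机划分的方式产生训练集和测试集">2.1 Input a single descriptor file and split it randomly into training and test sets</h3>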
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.data_split</span> <span class="kn">import</span> <span class="n">randomSpliter</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">file_name</span> <span class="o">=</span> <span class="s">"./spla2_296_rdkit2d.csv"</span> <span class="c"># path to the descriptor data file</span>
<span class="n">spliter</span> <span class="o">=</span> <span class="n">randomSpliter</span><span class="p">(</span><span class="n">test_size</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span><span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">spliter</span><span class="o">.</span><span class="n">ExtractTotalData</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span><span class="n">label_name</span><span class="o">=</span><span class="s">'Activity'</span><span class="p">)</span> <span class="c"># be sure to pass the name of the label (activity) column</span>
<span class="n">spliter</span><span class="o">.</span><span class="n">SplitData</span><span class="p">()</span>
<span class="n">tr_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_x</span>
<span class="n">tr_y</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_y</span>
<span class="n">te_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_x</span>
<span class="n">te_y</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_y</span>
</code></pre></div></div>
<p>To save the training/test set labels, add the following code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spliter</span><span class="o">.</span><span class="n">saveTrainTestLabel</span><span class="p">(</span><span class="s">'./sPLA2_296_trOte0.csv'</span><span class="p">)</span> <span class="c"># the argument is the output path</span>
</code></pre></div></div>
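<p>A preview of the saved file is shown below. The table has a single column: the first row is the header, and each following row holds the training-set label "tr" or the test-set label "te" for the corresponding sample. The samples are in the same order as in the original input file. <br />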
<img src="./HTML_files/Snipaste_2019-04-28_18-43-13.png" alt="Sample" width="80" /></p>
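<p>To make the label-file idea concrete, here is a dependency-free sketch of how a "tr"/"te" column could be generated and then used to split samples. The function names, the rounding of the test-set size and the random generator are assumptions made for illustration; the package's own split may differ in these details.</p>

```python
import random


def make_split_labels(n_samples, test_size=0.25, random_state=0):
    """Return one "tr" or "te" label per sample, in input order."""
    n_test = round(n_samples * test_size)  # assumed rounding rule
    rng = random.Random(random_state)      # seeded, so the split is reproducible
    test_idx = set(rng.sample(range(n_samples), n_test))
    return ["te" if i in test_idx else "tr" for i in range(n_samples)]


def split_by_labels(rows, labels):
    """Split rows into (train, test) using the parallel label list."""
    train = [row for row, lab in zip(rows, labels) if lab == "tr"]
    test = [row for row, lab in zip(rows, labels) if lab == "te"]
    return train, test


samples = [f"mol_{i}" for i in range(8)]
labels = make_split_labels(len(samples), test_size=0.25, random_state=0)
train, test = split_by_labels(samples, labels)
print(labels)
print(len(train), len(test))
```

<p>Because the labels keep the original sample order, the same label list can be reused later to reproduce exactly the same split, which is the purpose of the saved label file.</p>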
<h3 id="22-根据训练集和测试集标签文件提取训练集和测试集">2.2 Extract the training and test sets using a train/test label file</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.data_split</span> <span class="kn">import</span> <span class="n">extractData</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data_path</span> <span class="o">=</span> <span class="s">'./spla2_296_rdkit2d.csv'</span> <span class="c"># path to the descriptor data file</span>
<span class="n">trOte_path</span> <span class="o">=</span> <span class="s">'./sPLA2_296_trOte0.csv'</span> <span class="c"># path to the train/test label file</span>
<span class="n">spliter</span> <span class="o">=</span> <span class="n">extractData</span><span class="p">()</span>
<span class="n">spliter</span><span class="o">.</span><span class="n">ExtractTrainTestFromLabel</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="n">trOte_path</span><span class="p">,</span> <span class="n">label_name</span><span class="o">=</span><span class="s">'Activity'</span><span class="p">)</span> <span class="c"># be sure to pass the name of the label (activity) column</span>
<span class="n">tr_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_x</span>
<span class="n">tr_y</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_y</span>
<span class="n">te_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_x</span>
<span class="n">te_y</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_y</span>
</code></pre></div></div>
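<h3 id="23-训练集测试集放于两个文件中">2.3 Training and test sets stored in two separate files</h3>
<p>If the data has already been split and the training set and test set are stored in two separate files, use the following code:</p>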
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.data_split</span> <span class="kn">import</span> <span class="n">extractData</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_path</span> <span class="o">=</span> <span class="s">'./train_312_maccs.csv'</span> <span class="c"># path to the training-set data file</span>
<span class="n">test_path</span> <span class="o">=</span> <span class="s">'./test_140_maccs.csv'</span> <span class="c"># path to the test-set data file</span>
<span class="n">spliter</span> <span class="o">=</span> <span class="n">extractData</span><span class="p">()</span>
<span class="n">spliter</span><span class="o">.</span><span class="n">ExtractTrainTestData</span><span class="p">(</span><span class="n">train_path</span><span class="p">,</span> <span class="n">test_path</span><span class="p">,</span> <span class="n">label_name</span><span class="o">=</span><span class="s">'Activity'</span><span class="p">)</span> <span class="c"># be sure to pass the name of the label (activity) column</span>
<span class="n">tr_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_x</span>
<span class="n">tr_y</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_y</span>
<span class="n">te_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_x</span>
<span class="n">te_y</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_y</span>
</code></pre></div></div>
<h2 id="3-通过txt文件提取部分特征可选步骤">3. Extracting a subset of features via a txt file (optional)</h2>
<p>Once the training and test data have been extracted (from files or by random splitting), a subset of the features can be selected using a txt file that lists descriptor names; descriptors not listed in the txt file are excluded. Usage:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">feature_txt_path</span> <span class="o">=</span> <span class="s">'./descriptors.txt'</span> <span class="c"># path to the file listing the required descriptor names</span>
<span class="n">spliter</span><span class="o">.</span><span class="n">ExtractFeatureFromTXT</span><span class="p">(</span><span class="n">feature_txt_path</span><span class="p">)</span>
<span class="n">tr_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">tr_x</span> <span class="c"># reassign tr_x after running ExtractFeatureFromTXT</span>
<span class="n">te_x</span> <span class="o">=</span> <span class="n">spliter</span><span class="o">.</span><span class="n">te_x</span> <span class="c"># reassign te_x after running ExtractFeatureFromTXT</span>
</code></pre></div></div>
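<p>As an illustration of the selection step, the sketch below reads descriptor names from a text file and keeps only those columns. It assumes one descriptor name per line, which the source does not specify, and represents the data as a plain dict of columns; the helper names are hypothetical.</p>

```python
import tempfile
from pathlib import Path


def read_feature_names(txt_path):
    """Read descriptor names from a text file (assumed: one name per line)."""
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def keep_features(columns, names):
    """Keep only the listed columns, in the order given by names."""
    missing = [n for n in names if n not in columns]
    if missing:
        raise KeyError(f"descriptors not found in data: {missing}")
    return {n: columns[n] for n in names}


# Write a small example list to a scratch file, then use it to filter the data.
txt_path = Path(tempfile.mkdtemp()) / "descriptors.txt"
txt_path.write_text("MolLogP\nTPSA\n", encoding="utf-8")

data = {"MolWt": [180.2, 94.1], "MolLogP": [1.3, 1.5], "TPSA": [63.6, 20.2]}
selected = keep_features(data, read_feature_names(txt_path))
print(list(selected))
```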
<h2 id="4-pearson相关性筛选rfe排序数据压缩">4. Pearson correlation filtering / RFE ranking / data scaling</h2>
<h3 id="41-pearson相关性筛选按训练集数据筛选可选步骤一般都会用上">4.1 Pearson correlation filtering (based on training-set data): optional, but almost always used</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.feature_preprocess</span> <span class="kn">import</span> <span class="n">correlationSelection</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">corr</span> <span class="o">=</span> <span class="n">correlationSelection</span><span class="p">()</span>
<span class="n">corr</span><span class="o">.</span><span class="n">PearsonXX</span><span class="p">(</span><span class="n">tr_x</span><span class="p">,</span> <span class="n">tr_y</span><span class="p">,</span><span class="n">threshold_low</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">threshold_up</span><span class="o">=</span><span class="mf">0.9</span><span class="p">)</span>
</code></pre></div></div>
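<p>The descriptors in the filtered result are already sorted by their Pearson correlation with the activity, from high to low. The filtered data is available as <code class="highlighter-rouge">corr.selected_tr_x</code>, a DataFrame, which is then passed on to the data scaling step.</p>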
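<p>The source does not spell out what the two thresholds do, so the following plain-Python sketch rests on an assumed reading: descriptors whose absolute correlation with the activity is below <code class="highlighter-rouge">threshold_low</code> are dropped, and of any two descriptors whose mutual absolute correlation exceeds <code class="highlighter-rouge">threshold_up</code> only the one more correlated with the activity is kept. It is not the package's implementation.</p>

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def pearson_filter(columns, y, threshold_low=0.1, threshold_up=0.9):
    """Return the kept descriptor names, most activity-correlated first."""
    # Stage 1 (assumed): require |r| with the activity to reach threshold_low.
    relevance = {name: abs(pearson(vals, y)) for name, vals in columns.items()}
    ranked = sorted((n for n in relevance if relevance[n] >= threshold_low),
                    key=lambda n: relevance[n], reverse=True)
    # Stage 2 (assumed): walk from most to least relevant and skip a descriptor
    # if it correlates above threshold_up with one that is already kept.
    kept = []
    for name in ranked:
        if not any(abs(pearson(columns[name], columns[k])) > threshold_up for k in kept):
            kept.append(name)
    return kept


y = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "d1": [1.1, 1.9, 3.2, 3.9, 5.1],    # strongly related to y
    "d2": [1.0, 2.3, 2.9, 4.2, 4.8],    # nearly redundant with d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],    # weakly related to y
    "d4": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with y
}
print(pearson_filter(columns, y))
```

<p>Only training-set data enters the computation, which matches the section title: the test set plays no part in deciding which descriptors survive.</p>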
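<h3 id="42-数据压缩必要步骤">4.2 Data scaling (required)</h3>
<p>The data scaling module <code class="highlighter-rouge">dataScale</code> rescales all descriptor data into a specified interval (e.g. 0.1 to 0.9). Here the scaler is fitted directly on the training data produced by the Pearson filtering step, <code class="highlighter-rouge">corr.selected_tr_x</code>, and then applied to the test data. The module automatically distinguishes continuous descriptors from fingerprint descriptors, and fingerprint data passes through scaling unchanged. To keep code changes small and variable names consistent, fingerprint descriptors can therefore be sent through the scaling step as well.</p>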
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.data_scale</span> <span class="kn">import</span> <span class="n">dataScale</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scaler</span> <span class="o">=</span> <span class="n">dataScale</span><span class="p">(</span><span class="n">scale_range</span><span class="o">=</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">))</span>
<span class="n">tr_scaled_x</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">FitTransform</span><span class="p">(</span><span class="n">corr</span><span class="o">.</span><span class="n">selected_tr_x</span><span class="p">)</span>
<span class="n">te_scaled_x</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">Transform</span><span class="p">(</span><span class="n">te_x</span><span class="p">,</span><span class="n">DataSet</span><span class="o">=</span><span class="s">'test'</span><span class="p">)</span> <span class="c"># scaling automatically picks from te_x the columns whose names appear in tr_scaled_x</span>
</code></pre></div></div>
<p>In the code above, if the <code class="highlighter-rouge">DataSet</code> parameter is 'train', the scaled data is stored in the attribute <code class="highlighter-rouge">scaler.tr_scaled_x</code>; if it is 'test', the scaled data is stored in <code class="highlighter-rouge">scaler.te_scaled_x</code>. <code class="highlighter-rouge">scaler.FitTransform</code> is equivalent to calling <code class="highlighter-rouge">scaler.Fit</code> followed by <code class="highlighter-rouge">scaler.Transform</code>, as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scaler</span> <span class="o">=</span> <span class="n">dataScale</span><span class="p">(</span><span class="n">scale_range</span><span class="o">=</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">))</span>
<span class="n">scaler</span><span class="o">.</span><span class="n">Fit</span><span class="p">(</span><span class="n">corr</span><span class="o">.</span><span class="n">selected_tr_x</span><span class="p">)</span>
<span class="n">tr_scaled_x</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">Transform</span><span class="p">(</span><span class="n">corr</span><span class="o">.</span><span class="n">selected_tr_x</span><span class="p">,</span><span class="n">DataSet</span><span class="o">=</span><span class="s">'train'</span><span class="p">)</span>
<span class="n">te_scaled_x</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">Transform</span><span class="p">(</span><span class="n">te_x</span><span class="p">,</span><span class="n">DataSet</span><span class="o">=</span><span class="s">'test'</span><span class="p">)</span>
</code></pre></div></div>
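<p>The fit-on-train, transform-on-test pattern can be sketched in plain Python as follows. The min-max formula is the standard one; treating a column that contains only 0 and 1 as a fingerprint bit is an assumption about how the detection might work, and the function names are hypothetical.</p>

```python
def fit_min_max(train_columns):
    """Record (min, max) for every continuous training column."""
    params = {}
    for name, values in train_columns.items():
        if set(values).issubset({0, 1}):
            params[name] = None  # assumed fingerprint bit: leave untouched
        else:
            params[name] = (min(values), max(values))
    return params


def transform(columns, params, scale_range=(0.1, 0.9)):
    """Scale the columns named in params using training-set statistics only."""
    low, high = scale_range
    scaled = {}
    for name, stats in params.items():  # columns not seen during fitting are dropped
        values = columns[name]
        if stats is None or stats[0] == stats[1]:
            scaled[name] = list(values)  # fingerprint or constant column
        else:
            lo, hi = stats
            scaled[name] = [low + (v - lo) * (high - low) / (hi - lo) for v in values]
    return scaled


train = {"logP": [1.0, 2.0, 3.0], "bit_7": [0, 1, 1]}
test = {"logP": [2.0, 4.0], "bit_7": [1, 0], "unused": [9.0, 9.0]}
params = fit_min_max(train)
train_scaled = transform(train, params)
test_scaled = transform(test, params)
print(train_scaled)
print(test_scaled)
```

<p>Because the minimum and maximum come from the training set alone, a test value outside the training range (4.0 here) lands outside the target interval; that is expected and avoids leaking test-set information into the scaler.</p>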
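<h3 id="43-rfe递归消除法排序可选步骤">4.3 RFE (recursive feature elimination) ranking (optional)</h3>
<p>After scaling, the data can be fed straight into the hyperparameter search. If you also want to reorder the descriptors according to an RFE (recursive feature elimination) ranking, run the following code:</p>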
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.feature_preprocess</span> <span class="kn">import</span> <span class="n">RFE_ranking</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rfe</span> <span class="o">=</span> <span class="n">RFE_ranking</span><span class="p">(</span><span class="n">estimator</span><span class="o">=</span><span class="s">'SVC'</span><span class="p">)</span> <span class="c"># "SVC" is the learner (classification or regression algorithm) used for the RFE ranking</span>
<span class="n">rfe</span><span class="o">.</span><span class="n">Fit</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">,</span> <span class="n">tr_y</span><span class="p">)</span>
<span class="n">tr_ranked_x</span> <span class="o">=</span> <span class="n">rfe</span><span class="o">.</span><span class="n">tr_ranked_x</span>
<span class="n">te_ranked_x</span> <span class="o">=</span> <span class="n">te_scaled_x</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span><span class="n">tr_ranked_x</span><span class="o">.</span><span class="n">columns</span><span class="p">]</span>
</code></pre></div></div>
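<p>The learners that can currently be specified by string are "SVC" (classification), "RFC" (classification), "SVR" (regression) and "RFR" (regression). To try another learner, set the <code class="highlighter-rouge">estimator</code> parameter to a custom learner object, provided that object exposes a <code class="highlighter-rouge">coef_</code> or <code class="highlighter-rouge">feature_importances_</code> attribute; see the <a href="https://scikit-learn.org/stable/modules/feature_selection.html#rfe">introduction to the RFE algorithm in the sklearn documentation</a>.</p>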
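<p>For readers who want to see what an RFE ranking looks like with scikit-learn directly, here is a small sketch on synthetic data. It uses a linear-kernel <code class="highlighter-rouge">SVC</code> because RFE needs <code class="highlighter-rouge">coef_</code>; whether <code class="highlighter-rouge">RFE_ranking</code> configures its "SVC" option the same way is an assumption, and the descriptor names are made up.</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for scaled training descriptors (not real QSAR data).
X, y = make_classification(n_samples=60, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
names = np.array([f"desc_{i}" for i in range(X.shape[1])])

# Eliminate one feature per round until a single feature remains,
# so that every feature receives a distinct rank.
selector = RFE(SVC(kernel="linear"), n_features_to_select=1, step=1)
selector.fit(X, y)

# ranking_ is 1 for the last surviving feature, 2 for the one removed just
# before it, and so on; sorting by it orders the columns from best to worst.
order = np.argsort(selector.ranking_)
ranked_names = names[order]
X_ranked = X[:, order]
print(list(ranked_names))
```

<p>The same column order would then be applied to the test data, as the <code class="highlighter-rouge">te_scaled_x.loc[:, tr_ranked_x.columns]</code> line in the example above does.</p>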
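<h2 id="5-参数寻优">5. Hyperparameter search</h2>
<h3 id="51-不带描述符数量的重复网格寻优">5.1 Repeated grid search without the number of descriptors</h3>
<ul>
<li>
<p>The <code class="highlighter-rouge">gridSearchBase</code> module accepts a user-supplied estimator, parameter grid and scorer object and runs a repeated grid search. The example below searches the parameters of the <code class="highlighter-rouge">SVC</code> algorithm.</p>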
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">from</span> <span class="nn">QSAR_package.grid_search</span> <span class="kn">import</span> <span class="n">gridSearchBase</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">SVC</span>
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">accuracy_score</span><span class="p">,</span><span class="n">make_scorer</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">grid_estimator</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">()</span> <span class="c"># estimator object</span>
<span class="n">grid_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s">'C'</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="mf">0.01</span><span class="p">],</span><span class="s">'gamma'</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="mf">0.01</span><span class="p">]}</span> <span class="c"># parameter grid for that estimator</span>
<span class="n">grid_scorer</span> <span class="o">=</span> <span class="n">make_scorer</span><span class="p">(</span><span class="n">accuracy_score</span><span class="p">,</span><span class="n">greater_is_better</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c"># scorer object</span>
<span class="n">grid</span> <span class="o">=</span> <span class="n">gridSearchBase</span><span class="p">(</span><span class="n">fold</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">grid_estimator</span><span class="o">=</span><span class="n">grid_estimator</span><span class="p">,</span> <span class="n">grid_dict</span><span class="o">=</span><span class="n">grid_dict</span><span class="p">,</span> <span class="n">grid_scorer</span><span class="o">=</span><span class="n">grid_scorer</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">grid</span><span class="o">.</span><span class="n">Fit</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">,</span><span class="n">tr_y</span><span class="p">)</span>
</code></pre></div> </div>
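<p>Here <code class="highlighter-rouge">fold</code> is the number of cross-validation folds used in the grid search and <code class="highlighter-rouge">repeat</code> is the number of times the grid search is repeated.
Afterwards, the best parameters are available through <code class="highlighter-rouge">grid.best_params</code> and the fitted estimator through <code class="highlighter-rouge">grid.best_estimator</code>.</p>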
</li>
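<li>The <code class="highlighter-rouge">gridSearchPlus</code> module lets you select a predefined estimator, together with its parameter grid and scorer, directly by name. The algorithms currently supported are "SVC", "DTC", "RFC", "SVR" and "RFR". The call (search without the number of descriptors) looks like this: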
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">from</span> <span class="nn">QSAR_package.grid_search</span> <span class="kn">import</span> <span class="n">gridSearchPlus</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">grid</span> <span class="o">=</span> <span class="n">gridSearchPlus</span><span class="p">(</span><span class="n">grid_estimatorName</span><span class="o">=</span><span class="s">'SVC'</span><span class="p">,</span> <span class="n">fold</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">grid</span><span class="o">.</span><span class="n">Fit</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">,</span><span class="n">tr_y</span><span class="p">)</span>
</code></pre></div> </div>
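<p>Afterwards, the best parameters are available through <code class="highlighter-rouge">grid.best_params</code> and the fitted estimator through <code class="highlighter-rouge">grid.best_estimator</code>.</p>
<h3 id="52-带描述符数量的重复网格寻优">5.2 Repeated grid search with the number of descriptors</h3>
<p>As described earlier, the descriptors can be ranked by Pearson correlation or by RFE, which gives two-dimensional data (a DataFrame or numpy array) whose columns are ordered. The number of descriptors <code class="highlighter-rouge">n</code> can therefore be treated as one more hyperparameter in the search: an outer loop wraps the grid search, each iteration takes the data for the first <code class="highlighter-rouge">n</code> descriptors and runs the repeated grid search on it, and the combination of descriptor count and parameters with the highest cross-validation score is selected.</p>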
</li>
<li>Repeated grid search with the number of descriptors, using the <code class="highlighter-rouge">gridSearchBase</code> module
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">from</span> <span class="nn">QSAR_package.grid_search</span> <span class="kn">import</span> <span class="n">gridSearchBase</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">grid_estimator</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">()</span> <span class="c"># estimator object</span>
<span class="n">grid_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s">'C'</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="mf">0.01</span><span class="p">],</span><span class="s">'gamma'</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="mf">0.01</span><span class="p">]}</span> <span class="c"># parameter grid for that estimator</span>
<span class="n">grid_scorer</span> <span class="o">=</span> <span class="n">make_scorer</span><span class="p">(</span><span class="n">accuracy_score</span><span class="p">,</span><span class="n">greater_is_better</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c"># scorer object</span>
<span class="n">grid</span> <span class="o">=</span> <span class="n">gridSearchBase</span><span class="p">(</span><span class="n">fold</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">grid_estimator</span><span class="o">=</span><span class="n">grid_estimator</span><span class="p">,</span> <span class="n">grid_dict</span><span class="o">=</span><span class="n">grid_dict</span><span class="p">,</span> <span class="n">grid_scorer</span><span class="o">=</span><span class="n">grid_scorer</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">grid</span><span class="o">.</span><span class="n">FitWithFeaturesNum</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">,</span> <span class="n">tr_y</span><span class="p">,</span><span class="n">features_range</span><span class="o">=</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="mi">20</span><span class="p">))</span> <span class="c"># features_range is the range of descriptor counts to iterate over: a tuple or list of two integers, the lower bound first and the upper bound second</span>
</code></pre></div> </div>
<p>Afterwards, the best parameters are available through <code class="highlighter-rouge">grid.best_params</code>, the fitted estimator through <code class="highlighter-rouge">grid.best_estimator</code>, and the names of the descriptors finally selected through <code class="highlighter-rouge">grid.best_features</code>.</p>
</li>
<li>Repeated grid search with the number of descriptors, using the <code class="highlighter-rouge">gridSearchPlus</code> module
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">from</span> <span class="nn">QSAR_package.grid_search</span> <span class="kn">import</span> <span class="n">gridSearchPlus</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">grid</span> <span class="o">=</span> <span class="n">gridSearchPlus</span><span class="p">(</span><span class="n">grid_estimatorName</span><span class="o">=</span><span class="s">'SVC'</span><span class="p">,</span> <span class="n">fold</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">grid</span><span class="o">.</span><span class="n">FitWithFeaturesNum</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">,</span> <span class="n">tr_y</span><span class="p">,</span><span class="n">features_range</span><span class="o">=</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="mi">20</span><span class="p">))</span>
</code></pre></div> </div>
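<p>Afterwards, the best parameters are available through <code class="highlighter-rouge">grid.best_params</code>, the fitted estimator through <code class="highlighter-rouge">grid.best_estimator</code>, and the names of the descriptors finally selected through <code class="highlighter-rouge">grid.best_features</code>.</p>
<h3 id="53-early_stop策略降低过拟合程度">5.3 Early_stop strategy: reducing overfitting</h3>
<p>Normally the grid search selects the parameter combination with the highest mean cross-validation score (mean_test_score). With the Early_stop strategy, the search instead starts from the highest mean_test_score and walks downward (scores sorted in descending order) until it reaches a suboptimal combination whose score differs significantly from the top score. The difference counts as significant when the absolute gap between that score and the top score, divided by that score, exceeds the specified early_stop value. The combination finally selected is the one ranked immediately before that suboptimal combination. Both <code class="highlighter-rouge">gridSearchBase</code> and <code class="highlighter-rouge">gridSearchPlus</code> accept an <code class="highlighter-rouge">early_stop</code> parameter; it defaults to <code class="highlighter-rouge">None</code>, and valid values are floats between <code class="highlighter-rouge">0</code> and <code class="highlighter-rouge">1</code>. For example:</p>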
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">grid_estimator</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">()</span> <span class="c"># estimator object</span>
<span class="n">grid_dict</span> <span class="o">=</span> <span class="p">{</span><span class="s">'C'</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="mf">0.01</span><span class="p">],</span><span class="s">'gamma'</span><span class="p">:[</span><span class="mi">1</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="mf">0.01</span><span class="p">]}</span> <span class="c"># parameter grid for that estimator</span>
<span class="n">grid_scorer</span> <span class="o">=</span> <span class="n">make_scorer</span><span class="p">(</span><span class="n">accuracy_score</span><span class="p">,</span><span class="n">greater_is_better</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="c"># scorer object</span>
<span class="n">grid</span> <span class="o">=</span> <span class="n">gridSearchBase</span><span class="p">(</span><span class="n">fold</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">grid_estimator</span><span class="o">=</span><span class="n">grid_estimator</span><span class="p">,</span> <span class="n">grid_dict</span><span class="o">=</span><span class="n">grid_dict</span><span class="p">,</span> <span class="n">grid_scorer</span><span class="o">=</span><span class="n">grid_scorer</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">early_stop</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span>
<span class="o">...</span>
</code></pre></div> </div>
<p>or</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">grid</span> <span class="o">=</span> <span class="n">gridSearchPlus</span><span class="p">(</span><span class="n">grid_estimatorName</span><span class="o">=</span><span class="s">'SVC'</span><span class="p">,</span> <span class="n">fold</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">repeat</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">early_stop</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span>
<span class="o">...</span>
</code></pre></div> </div>
</li>
</ul>
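<p>For comparison, the same parameter grid and scorer can be tried with plain scikit-learn. This is a minimal sketch, not the implementation of <code class="highlighter-rouge">gridSearchBase</code>: it assumes <code class="highlighter-rouge">GridSearchCV</code> with <code class="highlighter-rouge">RepeatedStratifiedKFold</code> as a stand-in for the <code class="highlighter-rouge">fold</code> and <code class="highlighter-rouge">repeat</code> arguments, uses synthetic data, and has no counterpart for <code class="highlighter-rouge">early_stop</code>.</p>

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption); replace with scaled descriptors and labels.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

grid_dict = {'C': [1, 0.1, 0.01], 'gamma': [1, 0.1, 0.01]}
grid_scorer = make_scorer(accuracy_score, greater_is_better=True)

# 5 folds repeated 3 times; fewer repeats than repeat=10 to keep the sketch quick.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
search = GridSearchCV(SVC(), param_grid=grid_dict, scoring=grid_scorer, cv=cv)
search.fit(X, y)

best_params = search.best_params_   # best combination found on the grid
best_score = search.best_score_     # mean cross-validated accuracy for that combination
print(best_params, round(best_score, 4))
```

<p>Repeating the K-fold split averages out the randomness of any single partition, which is presumably why the package exposes a <code class="highlighter-rouge">repeat</code> argument.</p>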
<h2 id="6-拟合模型评价模型保存结果">6. Fitting the Model / Evaluating the Model / Saving Results</h2>
<ul>
<li>The <code class="highlighter-rouge">modeling</code> module takes an estimator object and a matching set of hyperparameters. It fits the estimator on the training set (<code class="highlighter-rouge">modeling.Fit</code>), predicts the test-set samples (<code class="highlighter-rouge">modeling.Predict</code>), and cross-validates on the training set (<code class="highlighter-rouge">modeling.CrossVal</code>, implemented with <code class="highlighter-rouge">cross_val_predict</code> from sklearn's <code class="highlighter-rouge">model_selection</code> module). For classification tasks the evaluation metrics are <code class="highlighter-rouge">Accuracy</code>, <code class="highlighter-rouge">MCC</code>, <code class="highlighter-rouge">SE</code>, <code class="highlighter-rouge">SP</code>, <code class="highlighter-rouge">tp</code>, <code class="highlighter-rouge">tn</code>, <code class="highlighter-rouge">fp</code> and <code class="highlighter-rouge">fn</code>; for regression tasks they are <code class="highlighter-rouge">R2</code>, <code class="highlighter-rouge">RMSE</code> and <code class="highlighter-rouge">MAE</code>. <code class="highlighter-rouge">modeling.ShowResults</code> prints the evaluation results; to also see a scatter plot of the training-set and test-set predictions (regression tasks), set <code class="highlighter-rouge">make_fig=True</code> (the default is <code class="highlighter-rouge">False</code>). <code class="highlighter-rouge">modeling.SaveResults</code> saves the evaluation results and the model's hyperparameters by appending them to a csv file. If <code class="highlighter-rouge">make_fig=True</code> was set in <code class="highlighter-rouge">modeling.ShowResults</code>, the scatter plot is saved as well (tif format). The model file for this set of results is also saved (with a .model suffix); if it is not needed, set <code class="highlighter-rouge">save_model=False</code> in <code class="highlighter-rouge">modeling.SaveResults</code>.
<ul>
<li>
<p>The <code class="highlighter-rouge">modeling</code> module can directly take the results of the grid search from the previous step (<code class="highlighter-rouge">grid.best_estimator</code>, <code class="highlighter-rouge">grid.best_params</code>, <code class="highlighter-rouge">grid.best_features</code>). Usage example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.model_evaluation</span> <span class="kn">import</span> <span class="n">modeling</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">modeling</span><span class="p">(</span><span class="n">estimator</span><span class="o">=</span><span class="n">grid</span><span class="o">.</span><span class="n">best_estimator</span><span class="p">,</span><span class="n">params</span><span class="o">=</span><span class="n">grid</span><span class="o">.</span><span class="n">best_params</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">Fit</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span><span class="n">grid</span><span class="o">.</span><span class="n">best_features</span><span class="p">],</span> <span class="n">tr_y</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">Predict</span><span class="p">(</span><span class="n">te_scaled_x</span><span class="o">.</span><span class="n">loc</span><span class="p">[:,</span><span class="n">grid</span><span class="o">.</span><span class="n">best_features</span><span class="p">],</span><span class="n">te_y</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">CrossVal</span><span class="p">(</span><span class="n">cv</span><span class="o">=</span><span class="s">"LOO"</span><span class="p">)</span> <span class="c"># cv can be 'LOO', a positive integer, or a cross-validation generator object such as `KFold` or `LeaveOneOut`</span>
<span class="n">model</span><span class="o">.</span><span class="n">ShowResults</span><span class="p">(</span><span class="n">show_cv</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">make_fig</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">SaveResults</span><span class="p">(</span><span class="s">'./results.csv'</span><span class="p">,</span><span class="n">notes</span><span class="o">=</span><span class="s">'any notes you want to record'</span><span class="p">)</span>
</code></pre></div> </div>
</li>
<li>
<p>The <code class="highlighter-rouge">modeling</code> module also accepts an externally defined estimator object and its hyperparameter dictionary. Taking <code class="highlighter-rouge">SVC</code> as an example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.model_evaluation</span> <span class="kn">import</span> <span class="n">modeling</span>
<span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">SVC</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">estimator</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">()</span>
<span class="n">params</span> <span class="o">=</span> <span class="p">{</span><span class="s">"C"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="s">"gamma"</span><span class="p">:</span><span class="mf">0.1</span><span class="p">}</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">modeling</span><span class="p">(</span><span class="n">estimator</span><span class="o">=</span><span class="n">estimator</span><span class="p">,</span><span class="n">params</span><span class="o">=</span><span class="n">params</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">Fit</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">,</span> <span class="n">tr_y</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">Predict</span><span class="p">(</span><span class="n">te_scaled_x</span><span class="p">,</span><span class="n">te_y</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">CrossVal</span><span class="p">(</span><span class="n">cv</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span> <span class="c"># cv can be 'LOO', a positive integer, or a cross-validation generator object such as `KFold` or `LeaveOneOut`</span>
<span class="n">model</span><span class="o">.</span><span class="n">ShowResults</span><span class="p">(</span><span class="n">show_cv</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">make_fig</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">SaveResults</span><span class="p">(</span><span class="s">'./results.csv'</span><span class="p">,</span><span class="n">notes</span><span class="o">=</span><span class="s">'any notes you want to record'</span><span class="p">)</span>
</code></pre></div> </div>
</li>
</ul>
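<p>The fit, predict and cross-validate steps described above can be sketched with scikit-learn alone. This is an illustration under stated assumptions, not the source of <code class="highlighter-rouge">modeling</code>: the data are synthetic, and the only detail taken from the description is that cross-validated predictions come from <code class="highlighter-rouge">cross_val_predict</code>, which lives in <code class="highlighter-rouge">sklearn.model_selection</code>.</p>

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import LeaveOneOut, cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for scaled descriptors and class labels (an assumption).
X, y = make_classification(n_samples=80, n_features=6, random_state=0)
tr_x, te_x, tr_y, te_y = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

params = {"C": 1, "gamma": 0.1}
estimator = SVC(**params)

estimator.fit(tr_x, tr_y)          # fit on the training set
te_pred = estimator.predict(te_x)  # predict the held-out test set

# Leave-one-out cross-validation on the training set; cv=5 would give 5-fold instead.
cv_pred = cross_val_predict(SVC(**params), tr_x, tr_y, cv=LeaveOneOut())

test_accuracy = accuracy_score(te_y, te_pred)
cv_accuracy = accuracy_score(tr_y, cv_pred)
cv_mcc = matthews_corrcoef(tr_y, cv_pred)
print(test_accuracy, cv_accuracy, cv_mcc)
```

<p><code class="highlighter-rouge">cross_val_predict</code> returns one out-of-fold prediction per training sample, so the cross-validation metrics are computed on the whole training set at once rather than averaged per fold.</p>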
</li>
<li><code class="highlighter-rouge">modelEvaluator</code> is a standalone model-evaluation module. Pass it the true values <code class="highlighter-rouge">y_true</code> and the predicted values <code class="highlighter-rouge">y_pred</code> to get classification or regression metrics; it detects from <code class="highlighter-rouge">y_true</code> whether the data are classification or regression data. For classification tasks the metrics are <code class="highlighter-rouge">Accuracy</code>, <code class="highlighter-rouge">MCC</code>, <code class="highlighter-rouge">SE</code>, <code class="highlighter-rouge">SP</code>, <code class="highlighter-rouge">tp</code>, <code class="highlighter-rouge">tn</code>, <code class="highlighter-rouge">fp</code> and <code class="highlighter-rouge">fn</code>; for regression tasks they are <code class="highlighter-rouge">R2</code>, <code class="highlighter-rouge">RMSE</code> and <code class="highlighter-rouge">MAE</code>. All of them are available as attributes of the <code class="highlighter-rouge">modelEvaluator</code> instance. The evaluation methods of the <code class="highlighter-rouge">modeling</code> module are inherited from <code class="highlighter-rouge">modelEvaluator</code>. Usage (taking the evaluation of regression predictions as an example):
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">from</span> <span class="nn">QSAR_package.model_evaluation</span> <span class="kn">import</span> <span class="n">modelEvaluator</span>
</code></pre></div> </div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">evaluator</span> <span class="o">=</span> <span class="n">modelEvaluator</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span><span class="n">y_pred</span><span class="p">)</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">r2</span>
<span class="n">rmse</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">rmse</span>
<span class="n">mae</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">mae</span>
</code></pre></div> </div>
<p>All metrics can also be read at once from the instance's attribute dictionary:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">evaluator</span> <span class="o">=</span> <span class="n">modelEvaluator</span><span class="p">(</span><span class="n">y_true</span><span class="p">,</span><span class="n">y_pred</span><span class="p">)</span>
<span class="n">all_metrics</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">evaluator</span><span class="o">.</span><span class="n">__dict__</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="n">all_metrics</span><span class="p">)</span>
</code></pre></div> </div>
<blockquote>
<p>{'r2': 0.7373, 'rmse': 0.6008, 'mae': 0.5395}</p>
</blockquote>
</li>
</ul>
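<p>The examples above cover regression only. For the classification metrics listed in this section, the sketch below shows how they follow from the four confusion-matrix counts. It assumes the usual binary definitions (<code class="highlighter-rouge">SE</code> as sensitivity, <code class="highlighter-rouge">SP</code> as specificity); the helper <code class="highlighter-rouge">classification_metrics</code> and its <code class="highlighter-rouge">positive</code> argument are hypothetical and not part of <code class="highlighter-rouge">QSAR_package</code>.</p>

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Binary-classification metrics from paired true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "se": tp / (tp + fn) if (tp + fn) else 0.0,  # sensitivity
        "sp": tn / (tn + fp) if (tn + fp) else 0.0,  # specificity
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

<p>With three of four positives and three of four negatives predicted correctly, accuracy, SE and SP are all 0.75 and MCC is 0.5.</p>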
<h2 id="7-重新载入模型进行预测">7. Reloading a Model for Prediction</h2>
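<p>If a model file was saved while using the <code class="highlighter-rouge">modeling</code> module, the model can be reloaded later and used to predict a dataset. Note that the feature data used for prediction must have the same columns as the original training data of the loaded model, and must go through the same scaling as the original training data. Example:</p>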
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">joblib</span> <span class="c"># sklearn.externals.joblib was removed in scikit-learn 0.23; on older versions use `from sklearn.externals import joblib`</span>
<span class="kn">from</span> <span class="nn">QSAR_package.data_scale</span> <span class="kn">import</span> <span class="n">dataScale</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">scaler</span> <span class="o">=</span> <span class="n">dataScale</span><span class="p">(</span><span class="n">scale_range</span><span class="o">=</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">))</span>
<span class="n">tr_scaled_x</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">FitTransform</span><span class="p">(</span><span class="n">tr_x</span><span class="p">)</span> <span class="c"># the descriptors in tr_x must be exactly those used when the model was built, otherwise the results cannot be reproduced</span>
<span class="n">te_scaled_x</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">Transform</span><span class="p">(</span><span class="n">te_x</span><span class="p">,</span><span class="n">DataSet</span><span class="o">=</span><span class="s">'test'</span><span class="p">)</span> <span class="c"># te_x only needs to contain every descriptor that appears in tr_x; the scaling step extracts the required columns automatically</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">joblib</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">"./SVR.model"</span><span class="p">)</span>
<span class="n">tr_pred_y</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">tr_scaled_x</span><span class="p">)</span>
<span class="n">te_pred_y</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">te_scaled_x</span><span class="p">)</span>
</code></pre></div></div>
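<p>The column-matching and scaling requirement can be illustrated without the package. In this sketch <code class="highlighter-rouge">MinMaxScaler</code> stands in for <code class="highlighter-rouge">dataScale</code> (an assumption about what <code class="highlighter-rouge">scale_range=(0.1, 0.9)</code> does), the data frames are made up, and the model is written with <code class="highlighter-rouge">joblib.dump</code> to a temporary directory instead of by <code class="highlighter-rouge">modeling.SaveResults</code>.</p>

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Made-up descriptor tables (assumptions for illustration only).
tr_x = pd.DataFrame({"d1": [0.0, 1.0, 2.0, 3.0], "d2": [10.0, 20.0, 30.0, 40.0]})
tr_y = [0.1, 0.9, 2.1, 2.9]
# The new data has an extra column and a different column order.
te_x = pd.DataFrame({"extra": [5.0, 6.0], "d2": [15.0, 35.0], "d1": [0.5, 2.5]})

# Fit the scaler on the training descriptors only.
scaler = MinMaxScaler(feature_range=(0.1, 0.9))
tr_scaled_x = scaler.fit_transform(tr_x)
model = SVR().fit(tr_scaled_x, tr_y)

# Save and reload the fitted model.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "SVR.model")
    joblib.dump(model, model_path)
    reloaded = joblib.load(model_path)

# Select the training columns, in training order, before applying the fitted scaler.
te_scaled_x = scaler.transform(te_x[tr_x.columns])
te_pred_y = reloaded.predict(te_scaled_x)
print(te_pred_y)
```

<p>Refitting the scaler on the new data instead of reusing the one fitted on the training set would map the descriptors to a different range, and the reloaded model's predictions would no longer be comparable.</p>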
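<p>To evaluate the predictions as well, use the <code class="highlighter-rouge">modelEvaluator</code> module; it only needs the true values and the predicted values:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">QSAR_package.model_evaluation</span> <span class="kn">import</span> <span class="n">modelEvaluator</span>
<span class="n">tr_eva</span> <span class="o">=</span> <span class="n">modelEvaluator</span><span class="p">(</span><span class="n">tr_y</span><span class="p">,</span> <span class="n">tr_pred_y</span><span class="p">)</span>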
<span class="n">te_eva</span> <span class="o">=</span> <span class="n">modelEvaluator</span><span class="p">(</span><span class="n">te_y</span><span class="p">,</span> <span class="n">te_pred_y</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">tr_eva</span><span class="o">.</span><span class="n">r2</span><span class="p">,</span><span class="n">tr_eva</span><span class="o">.</span><span class="n">rmse</span><span class="p">,</span><span class="n">tr_eva</span><span class="o">.</span><span class="n">mae</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">te_eva</span><span class="o">.</span><span class="n">r2</span><span class="p">,</span><span class="n">te_eva</span><span class="o">.</span><span class="n">rmse</span><span class="p">,</span><span class="n">te_eva</span><span class="o">.</span><span class="n">mae</span><span class="p">)</span>
</code></pre></div></div>
<blockquote>
<p>0.9263 0.3442 0.4126<br />
0.5377 0.8584 0.6257</p>
</blockquote>
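<p>For reference, the three regression metrics can be computed by hand as below. This is a sketch assuming the standard definitions (coefficient of determination, root mean squared error, mean absolute error); <code class="highlighter-rouge">modelEvaluator</code> may define <code class="highlighter-rouge">R2</code> differently, and the helper <code class="highlighter-rouge">regression_metrics</code> is hypothetical.</p>

```python
import math

def regression_metrics(y_true, y_pred):
    """R2, RMSE and MAE from paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

<p>Under these definitions RMSE is never smaller than MAE for the same set of predictions, which gives a quick sanity check on reported values.</p>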
<footer class="site-footer">
<span class="site-footer-owner"><a href="https://github.com/zhangshd/fluentQSAR">fluentQSAR</a> is maintained by <a href="https://github.com/zhangshd">zhangshd</a>.</span>
<span class="site-footer-credits">This page was generated by <a href="https://pages.github.com">GitHub Pages</a>.</span>
</footer>
</section>
</body>
</html>