zysNLP/machine-learning

A simple machine learning example repository for sharing, along with some useful tools.

About machine_learning_example.py

Overview of methods

1. For unordered discrete (nominal) variables, feature engineering uses pandas' pd.get_dummies to convert the features to one-hot form.

2. For continuous data, feature engineering applies standardization. The script also compares treating a feature as ordered discrete versus continuous.

3. For ordered discrete (ordinal) data, the relationship between the dependent variable and the feature is explored first; whether another data type would suit the feature better is then decided case by case.
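The three treatments above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the code from machine_learning_example.py: the column names (`color`, `income`, `grade`, `target`) and values are assumptions, and the standardization is written out by hand with the population standard deviation.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],  # unordered discrete (nominal)
    "income": [30.0, 45.0, 60.0, 45.0],          # continuous
    "grade": [1, 2, 3, 2],                       # ordered discrete (ordinal)
    "target": [0, 1, 1, 0],                      # dependent variable
})

# 1. One-hot encode the nominal column; other columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

# 2. Standardize the continuous column to zero mean and unit variance
#    (population standard deviation, ddof=0).
income = df["income"]
encoded["income"] = (income - income.mean()) / income.std(ddof=0)

# 3. Check how the target varies across the ordinal levels before deciding
#    whether to keep "grade" numeric or to one-hot encode it as well.
target_by_grade = df.groupby("grade")["target"].mean()
```

If the target mean moves roughly monotonically with `grade`, keeping the feature numeric is usually reasonable; an irregular pattern is one sign that one-hot encoding may fit better.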

Modeling process

1. Data exploration: reading the data, missing values, variable types, data types, etc.;

2. Data cleaning: variable classification, type conversion, normalization or standardization;

3. Feature engineering: feature correlation analysis, feature construction, and feature selection;

4. Model construction: initial model, ensemble model, reference model, cross-validation;

5. Model evaluation: F1 score, AUC analysis, memory usage;
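As a rough sketch of how these five steps might fit together, the example below runs them end to end with pandas and scikit-learn. Everything specific here is an assumption made for illustration: the synthetic data from `make_classification`, the feature names, the choice of logistic regression as the initial model and a random forest as the ensemble model, and selecting the five features most correlated with the target. It does not reproduce the repository's script.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption; the real script reads its own dataset).
X_arr, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# 1. Data exploration: missing values and column types.
missing_per_column = X.isna().sum()
column_dtypes = X.dtypes

# 2-3. Cleaning and feature engineering: correlation with the target, then a
#      simple selection of the 5 most correlated features. For brevity this is
#      done on the full data; in practice it belongs inside the CV folds.
corr_with_target = X.corrwith(pd.Series(y))
selected = corr_with_target.abs().nlargest(5).index.tolist()

# 4. Model construction: an initial model and an ensemble model,
#    each scored with 5-fold cross-validation.
models = {
    "baseline": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "ensemble": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5. Model evaluation: mean F1 and ROC AUC per model, plus memory usage.
results = {}
for name, model in models.items():
    scores = cross_validate(model, X[selected], y, cv=5, scoring=["f1", "roc_auc"])
    results[name] = {
        "f1": scores["test_f1"].mean(),
        "auc": scores["test_roc_auc"].mean(),
    }

memory_bytes = int(X.memory_usage(deep=True).sum())
```

Printing `results` and `memory_bytes` gives a side-by-side view of the two models and the size of the feature frame in memory.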

About

A machine learning repo.
