A simple machine learning repo for sharing, with some useful tools.
1. For unordered discrete (nominal) variables, feature engineering uses pandas' pd.get_dummies to convert them into one-hot features.
2. For continuous data, feature engineering applies standardization. Features are also compared when treated as ordered discrete data versus continuous data.
3. For ordered discrete (ordinal) data, the relationship between the dependent variable and the feature is explored first; whether to treat the feature as another data type is then decided case by case.
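
The three cases above can be sketched on a toy table. This is only an illustration, not the repo's actual code: the DataFrame, its column names (color, size, weight) and the variable names are hypothetical, and standardization is written out by hand using the population standard deviation.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],  # unordered discrete
    "size": [1, 3, 2, 3],                        # ordered discrete
    "weight": [10.0, 12.0, 14.0, 16.0],          # continuous
})

# Case 1: one-hot encode the unordered discrete column.
encoded = pd.get_dummies(df, columns=["color"])

# Case 2: standardize the continuous column to zero mean and unit
# variance (population standard deviation, ddof=0).
w = encoded["weight"]
encoded["weight"] = (w - w.mean()) / w.std(ddof=0)

# Case 3: an ordered discrete column can stay as integer codes
# (treated like a continuous feature, as in `encoded["size"]`),
# or be one-hot encoded like a nominal feature for comparison.
size_onehot = pd.get_dummies(df["size"], prefix="size")
```

Which treatment of the ordered column works better depends on how the target varies across its levels, so both variants are usually worth comparing.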
1. Data exploration: reading data, missing values, variable types, data types, etc.;
2. Data cleaning: variable classification, type conversion, normalization or standardization;
3. Feature engineering: feature correlation analysis, feature construction and feature selection;
4. Model construction: initial model, ensemble model, reference model, cross-validation;
5. Model evaluation: F1 score, AUC analysis, memory usage.
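
A compact sketch of the five steps above, assuming scikit-learn is available. The synthetic dataset, the two model choices (logistic regression as the initial model, a random forest as the ensemble model) and all variable names are assumptions made for illustration; the repo itself may use different data and models.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])

# 1. Data exploration: missing values and data types.
missing = df.isna().sum()
dtypes = df.dtypes

# 2. Data cleaning: type conversion (float64 -> float32 saves memory).
df = df.astype("float32")

# 3. Feature engineering: correlation of each feature with the target.
target_corr = df.corrwith(pd.Series(y))

# 4. Model construction: an initial model and an ensemble model.
models = {
    "initial": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "ensemble": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5. Model evaluation: 5-fold cross-validated F1 and AUC.
results = {}
for name, model in models.items():
    cv = cross_validate(model, df, y, cv=5, scoring=["f1", "roc_auc"])
    results[name] = {
        "f1": cv["test_f1"].mean(),
        "auc": cv["test_roc_auc"].mean(),
    }

# Memory usage of the feature table, in bytes.
memory_bytes = int(df.memory_usage(deep=True).sum())
```

Putting the scaler inside the pipeline means it is refit on each training fold, which keeps information from the validation fold out of the preprocessing step.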