# Remove irrelevant features to reduce overfitting
To prevent overfitting, improve the data by removing irrelevant features.
## Recursive Feature Elimination
The `RFE` (Recursive Feature Elimination) class from the `feature_selection` module of the Python library scikit-learn selects features by recursively considering smaller and smaller sets of features. It first trains the classifier on the initial set of features. After each training, it computes the importance of each feature and eliminates the least important one (one feature per iteration by default) from the current set. It then retrains on the reduced set and repeats the procedure until the desired number of features is reached. In this way, RFE finds a combination of features that contributes to the prediction. To use it, import `RFE` from `sklearn.feature_selection`, then indicate which classifier model to use and the number of features to select.
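To make the procedure concrete, here is a minimal hand-rolled sketch of the loop described above. It is not the library's implementation: the function name `eliminate_recursively` is hypothetical, and scoring each feature by its summed squared coefficient is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC


def eliminate_recursively(X, y, n_features_to_select):
    """Illustrative sketch (hypothetical helper): drop one feature per round."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    while len(remaining) > n_features_to_select:
        # Retrain the classifier on the features that are still in play.
        model = LinearSVC(max_iter=10000).fit(X[:, remaining], y)
        # Assumed importance score: squared weights summed over the classes.
        importance = (model.coef_ ** 2).sum(axis=0)
        # Eliminate the feature with the lowest score.
        remaining.pop(int(np.argmin(importance)))
    return remaining


X, y = load_iris(return_X_y=True)
kept = eliminate_recursively(X, y, n_features_to_select=2)
print(kept)  # indices of the two columns that survived
```

In practice the `RFE` class does this bookkeeping for you, so a loop like this is only useful for understanding what happens at each round.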

Here is how to use the `RFE` class to find the combination of important features.

We will use this basic example: [recursive_feature_elimination.py](recursive_feature_elimination.py)

Load the `LinearSVC` class from the scikit-learn library.
`LinearSVC` performs classification and is similar to `SVC` with the parameter `kernel='linear'`. It finds the linear separator that maximizes the distance between itself and the nearest data points.
```
>>> from sklearn.svm import LinearSVC
```
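As a rough illustration of how close `LinearSVC` is to `SVC` with `kernel='linear'`, the sketch below fits both on the iris data and prints their training accuracy. The two classes are not identical: by default `LinearSVC` uses the squared hinge loss and a one-vs-rest scheme, while `SVC` uses the hinge loss and one-vs-one, so the scores may differ slightly. The `max_iter=10000` setting is an assumption made here to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)

# Fit both linear classifiers on the same data.
linear_svc = LinearSVC(max_iter=10000).fit(X, y)
svc_linear = SVC(kernel="linear").fit(X, y)

# Accuracy on the training data, for a quick side-by-side comparison only.
linear_svc_acc = linear_svc.score(X, y)
svc_linear_acc = svc_linear.score(X, y)
print(linear_svc_acc, svc_linear_acc)
```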
Load `RFE` (Recursive Feature Elimination), which is used to remove features.
```
>>> from sklearn.feature_selection import RFE
```
Load the iris dataset.
```
from sklearn import datasets
dataset = datasets.load_iris()
```
The dataset has 150 items, and each item has 4 features (sepal length, sepal width, petal length, petal width).
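One way to check these numbers yourself is to inspect the loaded object. This is a small sketch; the variable names `n_items` and `n_features` are introduced here for illustration only.

```python
from sklearn import datasets

dataset = datasets.load_iris()

# dataset.data is a 2-D array: one row per item, one column per feature.
n_items, n_features = dataset.data.shape
print(n_items, n_features)    # 150 4

# The feature names are stored alongside the data.
print(dataset.feature_names)
```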