
Commit 38a12b4

Update README.md
1 parent 71e1833 commit 38a12b4

File tree

1 file changed: +0 −59 lines changed

README.md

Lines changed: 0 additions & 59 deletions
@@ -3,62 +3,3 @@ Please visit the [wiki](https://github.com/ksator/Machine_Learning_with_Python/w
# Documentation structure

- [manipulate dataset with pandas](#manipulate-dataset-with-pandas)
- [Remove irrelevant features to reduce overfitting](#remove-irrelevant-features-to-reduce-overfitting)
- [Recursive Feature Elimination](#recursive-feature-elimination)

# Remove irrelevant features to reduce overfitting
To prevent overfitting, improve the data by removing irrelevant features.
## Recursive Feature Elimination
The `RFE` (Recursive Feature Elimination) class from the `feature_selection` module of the scikit-learn library selects features by recursively considering smaller and smaller feature sets. It first trains the classifier on the full set of features and computes the importance of each feature; the least important feature is then eliminated from the current set. This procedure is repeated until the desired number of features is reached. RFE can therefore find a combination of features that contribute to the prediction: you just import `RFE` from `sklearn.feature_selection` and indicate which classifier model to use and how many features to select.
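Conceptually, each round trains the model, scores the features, and drops the weakest one. Here is a minimal sketch of that loop, assuming a linear model and using the sum of squared coefficients across classes as the importance score (a simplified illustration, not the scikit-learn implementation):

```
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC

dataset = datasets.load_iris()
remaining = list(range(dataset.data.shape[1]))  # start with all 4 features

# eliminate one feature per round until 3 features are left
while len(remaining) > 3:
    clf = LinearSVC(max_iter=5000).fit(dataset.data[:, remaining], dataset.target)
    importance = (clf.coef_ ** 2).sum(axis=0)  # one score per remaining feature
    remaining.pop(int(np.argmin(importance)))  # drop the least important feature

print([dataset.feature_names[i] for i in remaining])
```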
Here's how to use the `RFE` class to find the combination of important features.
We will use this basic example [recursive_feature_elimination.py](recursive_feature_elimination.py)
Load the `LinearSVC` class from the scikit-learn library.

LinearSVC performs classification. It is similar to `SVC` with the parameter `kernel='linear'`: it finds the linear separator that maximizes the distance between itself and the nearest data points.
```
>>> from sklearn.svm import LinearSVC
```
Load the `RFE` (Recursive Feature Elimination) class. RFE is used to remove features.

```
>>> from sklearn.feature_selection import RFE
```
Load the iris dataset.

```
>>> from sklearn import datasets
>>> dataset = datasets.load_iris()
```
The dataset has 150 items; each item has 4 features (sepal length, sepal width, petal length, petal width).

```
>>> dataset.data.shape
(150, 4)
>>> dataset.feature_names
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
```
Instantiate the `LinearSVC` class.

```
>>> svm = LinearSVC(max_iter=5000)
```
Instantiate the `RFE` class: indicate which classifier model to use and the number of features to keep (3 in this example).

```
>>> rfe = RFE(svm, n_features_to_select=3)
```
Use the iris dataset and fit.

```
>>> rfe = rfe.fit(dataset.data, dataset.target)
```
Print summaries of the feature selection. `support_` is a boolean mask of the selected features; `ranking_` assigns rank 1 to selected features and higher ranks to eliminated ones.

```
>>> print(rfe.support_)
[False True True True]
>>> print(rfe.ranking_)
[2 1 1 1]
```
So, sepal length is not selected. The 3 selected features are sepal width, petal length, petal width.
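A fitted `RFE` object also works as a transformer: `rfe.transform` keeps only the selected columns, so the reduced dataset can be passed directly to a classifier. A quick sketch, assuming the `rfe` fitted above:

```
>>> rfe.transform(dataset.data).shape
(150, 3)
```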
