# 4ml

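**K-Fold Cross-Validation**: the data is split into k subsets. In each round, one subset is the test data and the other k-1 are the training data. This is repeated k times, so every subset serves as the test set once.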
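
To make the splitting concrete, here is a minimal pure-Python sketch of how the k train/test splits could be formed. The helper name `k_fold_indices` is hypothetical, and the sketch assumes contiguous, unshuffled folds; in practice scikit-learn's `KFold` does this for you.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("test:", test_idx, "train:", train_idx)
```

Each sample index appears in exactly one test fold, which is what lets every observation be used for both training and evaluation.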

## Evaluation:

Accuracy = # predicted correctly / total # of predictions

Precision = # predicted positive that are actually positive / # predicted positive
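
As a rough illustration of the two formulas, the sketch below computes them by hand on a made-up toy example. The function names and the labels are assumptions for illustration only; returning 0.0 when nothing is predicted positive is also an assumption, since the ratio is undefined in that case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, pos_label):
    """Of the samples predicted as pos_label, the fraction that truly are."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == pos_label]
    if not predicted_pos:
        return 0.0  # assumption: report 0.0 when nothing is predicted positive
    return sum(t == pos_label for t in predicted_pos) / len(predicted_pos)


# Toy labels, invented for this example.
y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "spam", "ham", "ham"]
print(accuracy(y_true, y_pred))           # 4 of 5 predictions are correct
print(precision(y_true, y_pred, "spam"))  # 2 of 3 predicted "spam" are truly "spam"
```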

## Random Forest

* Ensemble method that creates multiple simple models and combines them
* Constructs a collection of decision trees and then aggregates the predictions of each tree to determine the final prediction
* Easily handles outliers, missing values, and different input types
* Outputs feature importances
* Can do classification and regression

  ```python
  # Input is the TfidfVectorizer output for the data
  from sklearn.ensemble import RandomForestClassifier
  # The fitted model's feature_importances_ is great for understanding the model
  # max_features='auto' (same as 'sqrt' for classifiers) and min_impurity_split were
  # removed in newer scikit-learn releases, so 'sqrt' is used and the latter is dropped
  RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2,
                         min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt',
                         max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True,
                         oob_score=False, n_jobs=1, random_state=None, verbose=0,
                         warm_start=False, class_weight=None)
  ```
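
The aggregation step described in the list above can be sketched as a simple majority vote over per-tree predictions. This is only an illustration of the idea: the function name `majority_vote` and the vote data are hypothetical, and scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree class predictions for one sample into a final label."""
    # most_common(1) returns [(label, count)] for the most frequent label.
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical predictions from five trees for a single message.
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))
```

For regression, the analogous aggregation would be the mean of the trees' numeric predictions.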

## Cross Validation

```python
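from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# X_features and labels are the feature matrix and the target labels
rf = RandomForestClassifier(n_jobs=-1)  # parallelize tree building
k_fold = KFold(n_splits=5)
cross_val_score(rf, X_features, labels, cv=k_fold, scoring='accuracy', n_jobs=-1)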
```

## Holdout Test Set

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X_features, labels, test_size=0.2)

rf = RandomForestClassifier(n_estimators=50, max_depth=20, n_jobs=-1)
rf_model = rf.fit(X_train, y_train)
# Ten most important features (assumes X_train is a pandas DataFrame)
sorted(zip(rf_model.feature_importances_, X_train.columns), reverse=True)[0:10]
y_pred = rf_model.predict(X_test)
precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
```

## Grid Search

Define a grid of hyperparameter values and evaluate the model on every combination.

```python
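from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.model_selection import train_test_split

# Uses X_train, X_test, y_train, y_test from the holdout split
def train_RF(n_est, depth):
  rf = RandomForestClassifier(n_estimators=n_est, max_depth=depth, n_jobs=-1)
  rf_model = rf.fit(X_train, y_train)
  y_pred = rf_model.predict(X_test)
  precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
  # The original elided the print arguments; this output format is an assumption
  print(f'Est: {n_est} / Depth: {depth} ---- Precision: {precision:.3f} / Recall: {recall:.3f}')

# Try every combination of the candidate values
for n_est in [10, 50, 100]:
  for depth in [10, 20, 30, None]:
    train_RF(n_est, depth)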
```
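
The nested loops above visit every combination of the candidate values. As a small sketch of the same enumeration, `itertools.product` can build the list of combinations from a dictionary; the variable names `param_grid` and `combos` are assumptions for illustration, and no model is trained here.

```python
from itertools import product

# Same candidate values as the nested loops above.
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [10, 20, 30, None],
}

# One dict per combination, in the same order the nested loops would visit them.
names = list(param_grid)
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(combos))  # 3 * 4 combinations
print(combos[0])
```

The number of model fits grows multiplicatively with each added hyperparameter, which is worth keeping in mind when choosing the grid.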

## Grid Search And Cross Validation

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rf = RandomForestClassifier()
param = {'n_estimators': [10, 150, 300], 'max_depth': [30, 60, 90, None]}
gs = GridSearchCV(rf, param, cv=5, n_jobs=-1)
gs.fit(X_tfidf_feat, labels)
pd.DataFrame(gs.cv_results_).sort_values('mean_test_score', ascending=False)[0:5]
```

