K-Fold Cross-Validation: split the data into k subsets; one subset is the test data and the remaining k-1 are training data. Repeated k times so each subset serves as the test set once.
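A minimal sketch of the split loop using scikit-learn's KFold (the toy X and y here are made-up data, not from the notes):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy feature matrix (hypothetical)
y = np.array([0, 1] * 5)          # toy labels (hypothetical)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    # one fold is the test set, the remaining k-1 folds are the training set
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # fit a model on (X_train, y_train), evaluate on (X_test, y_test)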
Evaluation:
Accuracy = # predicted correctly / total
Precision = # predicted positive that are actually positive / # predicted positive
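A quick sketch of both metrics with sklearn.metrics (the y_true/y_pred values are made-up for illustration):

from sklearn.metrics import accuracy_score, precision_score

y_true = [1, 0, 1, 1, 0, 1]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 1, 1]  # hypothetical model predictions

print(accuracy_score(y_true, y_pred))   # 4 of 6 correct -> 0.667
print(precision_score(y_true, y_pred))  # 3 of 4 predicted positives are positive -> 0.75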
Random Forest
Ensemble method that creates multiple simple models and combines them.
Constructs a collection of decision trees and then aggregates the predictions of each tree to determine the final prediction.
Easily handles outliers, missing values, and different input types.
Outputs feature importances.
Can do classification and regression
# Input is a TfidfVectorizer of the data
from sklearn.ensemble import RandomForestClassifier

# RandomForestClassifier.feature_importances_ is great for understanding the model
RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None,
                       min_samples_split=2, min_samples_leaf=1,
                       min_weight_fraction_leaf=0.0, max_features='sqrt',
                       max_leaf_nodes=None, min_impurity_decrease=0.0,
                       bootstrap=True, oob_score=False, n_jobs=1,
                       random_state=None, verbose=0, warm_start=False,
                       class_weight=None)
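A possible end-to-end usage sketch tying the pieces together, assuming TF-IDF features as the input (the docs/labels corpus is made-up):

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["good product", "bad service", "great value", "terrible quality"]  # toy corpus
labels = [1, 0, 1, 0]                                                      # toy labels

vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse TF-IDF matrix; random forests accept it directly

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, labels)

# pair each term with its importance to see what drives the model
for term, imp in zip(vec.get_feature_names_out(), clf.feature_importances_):
    print(term, imp)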