2. Vectorizing
Overview
Count Vectorizer
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
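def clean_text(text):
    ...  # cleaning steps elided in the original notes; should return a list of tokens

# A callable analyzer replaces CountVectorizer's built-in preprocessing and tokenization
count_vect = CountVectorizer(analyzer=clean_text)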
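# fit() only learns the vocabulary; fit_transform() learns it and also returns the document-term matrix
X_counts = count_vect.fit_transform(data)  # data: the collection of raw text documents
Analysis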
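The inspection steps in this section ask for the matrix shape and the feature names. To show what those contain, here is a minimal, dependency-free sketch of a count vectorizer (not scikit-learn's implementation). The tokenizer `simple_tokens` is a hypothetical stand-in for the elided `clean_text` (lowercase, keep runs of word characters), and the two-document corpus is made up. Like CountVectorizer, the sketch sorts the vocabulary alphabetically, but it returns a dense list of lists instead of a sparse matrix.

```python
import re
from collections import Counter


def simple_tokens(text):
    # Hypothetical cleaner: lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def count_vectorize(docs, analyzer=simple_tokens):
    # "fit": collect every token seen in the corpus, sorted alphabetically.
    tokenized = [analyzer(doc) for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    # "transform": one row per document, one column per vocabulary term.
    rows = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        rows.append(row)
    return vocabulary, rows


docs = ["I love NLP.", "NLP loves data, and data loves NLP!"]
features, matrix = count_vectorize(docs)
print((len(matrix), len(features)))  # (2, 6): documents x features
print(features)  # ['and', 'data', 'i', 'love', 'loves', 'nlp']
for row in matrix:
    print(row)  # [0, 0, 1, 1, 0, 1] then [1, 2, 0, 0, 2, 2]
```

Most entries of a real document-term matrix are zero, which is why scikit-learn returns a sparse matrix and `toarray()` is needed before building a DataFrame.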
X_counts.shape
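count_vect.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
# X_counts is a sparse matrix, so convert it to a dense array before building the DataFrame
X_counts_df = pd.DataFrame(X_counts.toarray())
X_counts_df.columns = count_vect.get_feature_names_out()
N-grams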
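An n-gram is a run of n consecutive tokens; counting bigrams alongside single words lets the vectorizer keep some local word order. A minimal stdlib sketch of the extraction step follows; the helper name `ngrams` is hypothetical. In scikit-learn the same idea is exposed through CountVectorizer's `ngram_range=(min_n, max_n)` parameter, but that parameter only applies when `analyzer` is not a callable, so with `analyzer=clean_text` the n-grams would have to be produced inside the analyzer itself.

```python
def ngrams(tokens, min_n=1, max_n=2):
    # Return all n-grams for min_n <= n <= max_n, shorter ones first,
    # joining the tokens of each n-gram with a single space.
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


tokens = ["nlp", "is", "fun"]
print(ngrams(tokens, 1, 2))  # ['nlp', 'is', 'fun', 'nlp is', 'is fun']
print(ngrams(tokens, 2, 3))  # ['nlp is', 'is fun', 'nlp is fun']
```

Composing this with a tokenizer gives a callable that returns a list of features, which is the shape of output a callable `analyzer` is expected to produce.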
Term Frequency-Inverse Document Frequency (TF-IDF)
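TF-IDF reweights raw counts so that a term scores high when it is frequent in one document but rare across the corpus. One common textbook form is w(i, j) = tf(i, j) * log(N / df(i)), where tf(i, j) is the count of term i in document j divided by the number of terms in j, N is the number of documents, and df(i) is the number of documents containing i. The stdlib sketch below implements exactly that form on pre-tokenized documents; the function name `tf_idf` and the toy corpus are illustrative. scikit-learn's TfidfVectorizer uses a different default (raw counts as tf, idf = ln((1 + N) / (1 + df)) + 1, then L2 normalization of each row), so its numbers will not match.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    # Textbook TF-IDF: tf = count / document length, idf = ln(N / df).
    n_docs = len(tokenized_docs)
    df = Counter()
    for toks in tokenized_docs:
        df.update(set(toks))  # each document counts at most once per term
    weights = []
    for toks in tokenized_docs:
        counts = Counter(toks)
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["nlp", "is", "fun"], ["nlp", "is", "hard", "hard"]]
for w in tf_idf(docs):
    print({term: round(score, 3) for term, score in w.items()})
# {'nlp': 0.0, 'is': 0.0, 'fun': 0.231}
# {'nlp': 0.0, 'is': 0.0, 'hard': 0.347}
```

Terms that appear in every document (here `nlp` and `is`) get weight 0 because log(N / N) = 0, which is the down-weighting of common words that TF-IDF is designed for.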