Pandas
import pandas as pd
fullCorpus = pd.DataFrame({
'label': labelList,
'body_list': textList
})
fullCorpus.head() #print first 5 rows
Access
df['date'] #just date column
df[df['date'] > '2017-03-20'] # just date columns that meet criteria
Filter
df = df.drop(['high','low','close','volume'], axis=1) #get rid of columns
Indexing
Use X.iloc[0]
instead of X[0]
Read in seperated data
fullCorpus = pd.read_csv("file.tsv", sep="\t", header=None)
# header default assumes first row is the column names
fullsCorpus.columns = ['label', 'body_text'] # add columns
Create New Column
pdData['clean_text'] = data['body_text'].apply(lambda x: remove_punct(x))
Last updated