3. Feature Engineering
Creating new features to get the most out of the data; this can be a complex topic.
Creating New Features
Example: length of the title
Check whether the feature is actually important
Let's make some histograms
Err on the side of leaving the feature in the model to see if it's good
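A minimal sketch of the title-length idea, assuming a toy list of (title, label) pairs. The data, the `title_length` and `histogram` helpers, and the bin width are all hypothetical and only illustrate the check: build the feature, then compare its histogram per label to see whether it separates the groups at all.

```python
from collections import Counter

BIN_WIDTH = 10  # assumed bin size, chosen only for this toy example

# Hypothetical toy data: (title, label) pairs, not taken from the notes.
rows = [
    ("Win a FREE prize now, click here today!!!", 1),
    ("Meeting notes", 0),
    ("Limited offer: claim your reward before midnight", 1),
    ("Lunch?", 0),
    ("Quarterly report", 0),
    ("You have been selected for an exclusive cash bonus", 1),
]

def title_length(title):
    """New feature: number of characters in the title."""
    return len(title)

def histogram(values, bin_width=BIN_WIDTH):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter((v // bin_width) * bin_width for v in values)

# Group the new feature by label so the two distributions can be compared.
lengths_by_label = {}
for title, label in rows:
    lengths_by_label.setdefault(label, []).append(title_length(title))

# Print one text histogram per label.
for label, lengths in sorted(lengths_by_label.items()):
    print(f"label={label}")
    for lower, count in sorted(histogram(lengths).items()):
        print(f"  {lower:>3}-{lower + BIN_WIDTH - 1:<3} {'#' * count}")
```

If the two histograms overlap almost completely, the feature is probably weak, but per the note above it can still be left in the model to confirm.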
Transformations
Why?
If the data is right-skewed (values bunched on the left, long tail to the right), a log transform pulls the tail in toward the middle. Otherwise the model might dig too much into the tail instead of exploring the differences among the majority.
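A small sketch of that effect, assuming a made-up right-skewed feature. The `skewness` helper (mean of cubed z-scores) and the sample values are assumptions for illustration, and `log1p` is used here instead of a plain log only so that a value of 0 would not break the example.

```python
import math
import statistics

def skewness(values):
    """Population skewness: mean of cubed z-scores. Positive means a long right tail."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical right-skewed feature: most values are small, a few are huge.
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 150, 1000]

# log1p(x) = log(1 + x), which stays defined when a value is 0.
logged = [math.log1p(v) for v in raw]

print(f"skew before: {skewness(raw):.2f}")
print(f"skew after:  {skewness(logged):.2f}")
```

The transformed values still lean right, but the largest values no longer dominate the spread.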
Where
Prime candidates: dramatic skew with a long tail, or a few outliers
A bimodal distribution that isn't heavily skewed and has no clear outliers is not a good candidate
Box-Cox Power Transformation
Usually uses exponents: y^x, where y is the value and x is the exponent.
Aim for a normal distribution; don't worry about 0
Test a range of exponents and compare them with a measurement criterion
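The exponent search can be sketched as below. The transform is the standard Box-Cox form, (y^x - 1) / x for a nonzero exponent and log(y) when the exponent is 0, and it requires strictly positive values. The candidate exponents, the sample data, and the use of absolute skewness as the measurement criterion are assumptions made for this sketch; library implementations of Box-Cox typically pick the exponent by maximizing a log-likelihood instead.

```python
import math
import statistics

def box_cox(y, lam):
    """Box-Cox transform of one strictly positive value y with exponent lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)  # limiting case as lam approaches 0
    return (y ** lam - 1) / lam

def skewness(values):
    """Population skewness: mean of cubed z-scores. 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical right-skewed feature and an assumed grid of exponents to try.
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 150, 1000]
candidates = [-2, -1, -0.5, 0, 0.5, 1, 2]

# Assumed criterion: the closer |skewness| is to 0, the more symmetric the result.
scores = {
    lam: abs(skewness([box_cox(y, lam) for y in raw]))
    for lam in candidates
}
best = min(scores, key=scores.get)

for lam in candidates:
    print(f"exponent {lam:>4}: |skew| = {scores[lam]:.2f}")
print(f"best exponent by this criterion: {best}")
```

An exponent of 1 only shifts the data, so its score doubles as the untransformed baseline to compare against.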