Foreword
Datascienceischangingthewaywegoaboutourdailylivesatanunprecedentedpace.
Therecommendationsyouseeone-commercewebsites,thetechnologiesthatprevent
creditcardfraud,thelogicbehindairlineitineraryandrouteselections,theproductsand
discountsyouseeinretailstores,andmanymoredecisionsarelargelypoweredbydata
science.Futuristicsoundingapplicationslikeself-drivingcars,robotstodohousehold
chores,smartwearabletechnologies,andsoonarebecomingareality,thanksto
innovationsindatascience.
Predictiveanalyticsisabranchofdatascience,usedtopredictunknownfutureevents
basedonhistoricaldata.Itusesanumberoftechniquesfromdatamining,statistical
modellingandmachinelearningtohelpmakeforecastswithanacceptablelevelof
reliability.
Pythonisahigh-level,object-orientedprogramminglanguage.Ithasgainedpopularity
becauseofitsclearsyntaxandreadability,andbeginnerscanpickupthelanguageeasily.
Itcomeswithalargelibraryofmodulesthatcanbeusedtodoamultitudeoftasksranging
fromdatacleaningtobuildingcomplexpredictivemodellingalgorithms.
I’maco-founderatTigerAnalytics,afirmspecializinginprovidingdatascienceand
predictiveanalyticssolutionstobusinesses.Overthelastdecade,Ihaveworkedwith
clientsatnumerousFortune100companiesandstart-upsalike,andarchitectedavariety
ofdatasciencesolutionframeworks.AshishKumar,theauthorofthisbook,iscurrentlya
buddingdatascientistatourcompany.Hehasworkedonseveralpredictiveanalytics
engagements,andunderstandshowbusinessesareusingdatatobringinscientificdecision
makingtotheirorganizations.Beingayoungpractitioner,Ashishrelatestosomeonewho
wantstolearnpredictiveanalyticsfromscratch.Thisisclearlyreflectedinthewayhe
presentsseveralconceptsinthebook.
Whetheryouareabeginnerindatasciencelookingtobuildacareerinthisarea,ora
weekendenthusiastcurioustoexplorepredictiveanalyticsinahands-onmanner,youwill
needtostartfromthebasicsandgetagoodhandleonthebuildingblocks.Thisbookhelps
youtakethefirststepsinthisbravenewworld;itteachesyouhowtouseandimplement
predictivemodellingalgorithmsusingPython.Thebookdoesnotassumepriorknowledge
inanalyticsorprogramming.Itdifferentiatesitselffromothersuchprogramming
cookbooksasitusespubliclyavailabledatasetsthatcloselyrepresentdataencounteredin
businessscenarios,andwalksyouthroughtheanalysisstepsinaclearmanner.
Thereareninechaptersinthebook.Thefirstfewchaptersfocusondataexplorationand
cleaning.Itiswrittenkeepingbeginnerstoprogramminginmind—byexplainingdifferent
datastructuresandthengoingdeeperintovariousmethodsofdataprocessingand
cleaning.Subsequentchapterscoverthepopularpredictivemodellingalgorithmslike
linearregression,logisticregression,clustering,decisiontrees,andsoon.Eachchapter
broadlycoversfouraspectsoftheparticularmodel—mathbehindthemodel,different
typesofthemodel,implementingthemodelinPython,andinterpretingtheresults.