Monday, November 30, 2015
Linux Performance Analysis in 60,000 Milliseconds
You log in to a Linux server with a performance issue: what do you check in the first minute?
At Netflix we have a massive EC2 Linux cloud, and numerous performance analysis tools to monitor and
investigate its performance. These include Atlas for cloud-wide monitoring, and Vector for on-demand
instance analysis. While those tools help us solve most issues, we sometimes need to log in to an instance
and run some standard Linux performance tools.
In this post, the Netflix Performance Engineering team will show you the first 60 seconds of an optimized
performance investigation at the command line, using standard Linux tools you should have available.
In 60 seconds you can get a high level idea of system resource usage and running processes by running the
following ten commands. Look for errors and saturation metrics, as they are both easy to interpret, and then
resource utilization. Saturation is where a resource has more load than it can handle, and can be exposed
either as the length of a request queue, or time spent waiting.
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
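As written, several of the commands above (vmstat 1, mpstat -P ALL 1, and so on) keep printing until interrupted. One way to run the whole sequence unattended is to give each interval tool a sample count and run top once in batch mode. The following is a minimal sketch under those assumptions, not part of the original post: the SAMPLES variable and both function names are hypothetical, and tools that are not installed are skipped.

```shell
#!/bin/sh
# Sketch: the ten commands, each bounded so that it exits on its own.
# SAMPLES, first_minute_commands and run_first_minute are hypothetical names.
SAMPLES="${SAMPLES:-5}"

# Print the ten commands, one per line, with a sample count appended
# to the interval-based tools and top switched to a single batch run.
first_minute_commands() {
    cat <<EOF
uptime
dmesg | tail
vmstat 1 $SAMPLES
mpstat -P ALL 1 $SAMPLES
pidstat 1 $SAMPLES
iostat -xz 1 $SAMPLES
free -m
sar -n DEV 1 $SAMPLES
sar -n TCP,ETCP 1 $SAMPLES
top -b -n 1
EOF
}

# Run each command in order, skipping any whose tool is not on PATH.
run_first_minute() {
    first_minute_commands | while IFS= read -r cmd; do
        tool=${cmd%% *}                    # first word of the command line
        printf '\n### %s\n' "$cmd"
        if command -v "$tool" >/dev/null 2>&1; then
            sh -c "$cmd" </dev/null        # keep the command off the loop's stdin
        else
            echo "skipped: $tool not found (is sysstat installed?)"
        fi
    done
}
```

Calling run_first_minute prints each command's output under a "###" header. Running the commands by hand and reading each one as it scrolls is still the approach the post describes; the wrapper is only a convenience for capturing one snapshot.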
Some of these commands require the sysstat package installed. The metrics these commands expose will
help you complete some of the USE Method: a methodology for locating performance bottlenecks. This
involves checking utilization, saturation, and error metrics for all resources (CPUs, memory, disks, etc.).
Also pay attention to when you have checked and exonerated a resource, as by process of elimination this
narrows the targets to study, and directs any follow-on investigation.
The following sections summarize these commands, with examples from a production system. For more
information about these tools, see their man pages.
$ uptime
23:51:26 up 21:31, 1 user, load average: 30.02, 26.43, 19.02
This is a quick way to view the load averages, which indicate the number of tasks (processes) wanting to
run. On Linux systems, these numbers include processes wanting to run on CPU, as well as processes
blocked in uninterruptible I/O (usually disk I/O). This gives a high level idea of resource load (or demand), but
can’t be properly understood without other tools. Worth a quick look only.
The three numbers are exponentially damped moving sum averages with a 1 minute, 5 minute, and 15
minute constant. The three numbers give us some idea of how load is changing over time. For example, if
you’ve been asked to check a problem server, and the 1 minute value is much lower than the 15 minute
value, then you might have logged in too late and missed the issue.
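To get a feel for what "exponentially damped" means in practice, the sketch below simulates the 1 minute average for a system that goes from idle to a constant number of runnable tasks. It assumes the usual Linux scheme of sampling every 5 seconds and decaying the previous value by exp(-5/60). It uses floating point where the kernel uses fixed-point arithmetic, and the function name damped_load is hypothetical.

```shell
#!/bin/sh
# Sketch: simulate the 1 minute load average rising from idle.
# $1: constant number of runnable tasks, $2: seconds since the load began.
damped_load() {
    awk -v n="$1" -v secs="$2" 'BEGIN {
        decay = exp(-5 / 60)      # 5 s sample interval, 60 s time constant
        load = 0                  # start from an idle system
        for (t = 5; t <= secs; t += 5)
            load = load * decay + n * (1 - decay)
        printf "%.2f\n", load
    }'
}

damped_load 1 60     # prints 0.63
damped_load 1 300    # prints 0.99
```

Under these assumptions, one runnable task shows up as only about 0.63 after a full minute, and takes several minutes to approach 1.0. The "1 minute" figure is therefore a time constant rather than a strict one-minute window, which is one more reason to treat the numbers as a rough trend.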
In the example above, the load averages show a recent increase, hitting 30 for the 1 minute value, compared
to 19 for the 15 minute value. That the numbers are this large means a lot of something: probably CPU
demand; vmstat or mpstat will confirm, which are commands 3 and 4 in this sequence.
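The comparison of the 1 minute and 15 minute values can be scripted. The following is a minimal sketch that pulls the three numbers out of an uptime line and labels the trend. The function name load_trend and the 20% threshold for "rising" or "falling" are assumptions made for illustration, not anything the post or uptime itself defines.

```shell
#!/bin/sh
# Sketch: classify the load trend from an uptime line.
# $1: a line of uptime output; defaults to running uptime on this host.
load_trend() {
    line=${1:-$(uptime)}
    printf '%s\n' "$line" | awk -F'load average: ' '{
        split($2, la, ", ")               # la[1]=1 min, la[2]=5 min, la[3]=15 min
        one = la[1] + 0
        fifteen = la[3] + 0
        if (one > fifteen * 1.2)      trend = "rising"    # 20% threshold: arbitrary
        else if (one < fifteen * 0.8) trend = "falling"
        else                          trend = "steady"
        printf "1min=%.2f 15min=%.2f trend=%s\n", one, fifteen, trend
    }'
}

load_trend '23:51:26 up 21:31, 1 user, load average: 30.02, 26.43, 19.02'
# prints: 1min=30.02 15min=19.02 trend=rising
```

A "falling" result matches the case described earlier, where the issue may already have passed by the time you logged in. The label says nothing about which resource is loaded; that still needs vmstat or mpstat.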
First 60 Seconds: Summary
1. uptime
2. dmesg | tail