[This is my personal blog, so all opinions expressed here are mine. I am a product, scalability, operations and monetization advisor, currently employed as Director of Business Operations & Technical Strategy for a top-50 website that delivers billions of page views per month. I was a keynote panelist for the Scaling Up or Out keynote at the MySQL Conference and speak regularly at conferences and user groups.]
Next session is High-performance Ajax Applications by Julien Lecomte (Yahoo!).
Plan for performance from day 1
work closely with designers and product managers
understand design rationale
explain the tradeoffs between design and performance
offer alternatives and show what is possible (prototypes)
as a last resort, simplify design
Engineering high performance: a few basic rules:
don't do anything unnecessary
less is more
break the rules
work on improving perceived performance
users can generally deal with a little bit of discomfort if they can see something is happening.
can't compromise security, but some other things can be compromised
in general avoid presentational markup
Measuring performance:
test performance using a setup similar to your users' environment
profile your code during development
automate profiling/performance testing
keep historical records of how features perform
consider keeping a small amount of profiling code in production
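One way to keep profiling code in production cheap is to time only a small random sample of calls. A minimal TypeScript sketch; `sampledTimer` and the reporter callback are hypothetical names, and where the durations end up (a beacon, a log) is left open:

```typescript
// Hypothetical reporter signature: receives a label and a duration in ms.
type Reporter = (label: string, ms: number) => void;

// Wraps fn so that only a fraction (sampleRate) of calls are timed.
// `random` is injectable so the sampling decision can be tested.
function sampledTimer<A extends unknown[], R>(
  label: string,
  fn: (...args: A) => R,
  sampleRate: number,
  report: Reporter,
  random: () => number = Math.random,
): (...args: A) => R {
  return (...args: A): R => {
    // Unsampled calls run untouched, keeping production overhead small.
    if (random() >= sampleRate) {
      return fn(...args);
    }
    const start = Date.now();
    const result = fn(...args);
    report(label, Date.now() - start);
    return result;
  };
}

// Usage: time roughly 1% of calls and keep the durations for later reporting.
const durations: number[] = [];
const timedSum = sampledTimer(
  "sumTo",
  (n: number): number => {
    let total = 0;
    for (let i = 1; i <= n; i++) total += i;
    return total;
  },
  0.01,
  (_label, ms) => {
    durations.push(ms);
  },
);
timedSum(1000);
```

Sampling keeps the cost near zero for most users while still producing real-world timings over a large audience.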
Yahoo!'s exceptional performance rules
make fewer http requests
use a content delivery network
Asset optimization:
minify CSS and JS files
combine CSS and JS files
optimize image assets
Reduce unminified code size:
loading and parsing HTML, CSS and JS code is costly
be concise and write less code
make good use of javascript features
consider splitting large JS files into smaller files (bundles) when parsing and compiling the script takes an excessive amount of time
load code (HTML, CSS and JS) on demand
Optimize initial rendering: misc tips
consider rendering the first view on the server (the server should generate the markup)
this will speed up the initial rendering.
close your HTML tags to speed up parsing
consider flushing the Apache buffer very early on
load only essential assets; load the rest on a delay or on demand.
don't always wait for onload
most DOM operations can be accomplished before the onload event has fired
post load script loading:
a well designed site should be fully functional even without the JS enabled
therefore you may be able to load scripts on a delay
conditional preloading:
preload assets that you know user is likely to need very shortly
however, one must be smart about when the preloading takes place. Otherwise the preloading may actually worsen the user experience.
Part 3: High Performance Javascript:
a scope-chain lookup is performed in JS every time a variable is accessed.
declare variables with the var keyword, use variables in the same scope whenever possible, and avoid global variables at all costs.
never use the with keyword as it prevents the compiler from generating code for fast access to local variables.
cache the results of expensive lookups in local variables.
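A TypeScript illustration of caching an expensive lookup in a local variable. The nested `app.settings.display.scale` object is a hypothetical stand-in for a deep global lookup; `const`/`let` declarations inside the function are local, just like `var` declarations:

```typescript
// Hypothetical nested configuration standing in for a deep global lookup
// such as window.app.settings.display (names are illustrative).
const app = { settings: { display: { scale: 2 } } };

// Walks app -> settings -> display -> scale on every iteration,
// and re-reads values.length each time the loop condition is checked.
function scaleAllSlow(values: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < values.length; i++) {
    out.push(values[i] * app.settings.display.scale);
  }
  return out;
}

// Resolves the chain once and keeps the result (and the length) in locals.
function scaleAllFast(values: number[]): number[] {
  const scale = app.settings.display.scale;
  const n = values.length;
  const out: number[] = [];
  for (let i = 0; i < n; i++) {
    out.push(values[i] * scale);
  }
  return out;
}
```

Both functions return the same result; the second simply avoids repeating the lookup inside the loop.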
The prototype chain:
accessing a member found directly on the object (rather than further up the prototype chain) is about 25% faster
optimize object instantiation:
if you need to create many objects, consider adding members to the prototype instead.
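In TypeScript (or ES2015+ JavaScript), methods declared in a class body live on the prototype, while functions assigned in the constructor are created once per instance. A small sketch with a hypothetical Point type:

```typescript
// Each instance receives its own copy of norm: one closure per `new`.
class PointOwnMethod {
  x: number;
  y: number;
  norm: () => number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
    this.norm = () => Math.sqrt(this.x * this.x + this.y * this.y);
  }
}

// norm lives once on PointShared.prototype and is shared by every instance.
class PointShared {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
  norm(): number {
    return Math.sqrt(this.x * this.x + this.y * this.y);
  }
}
```

This is the tradeoff from the two notes above: an own member may be slightly faster to reach, but when you create many objects, the prototype version avoids allocating a new function for each one.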
Don't use eval
eval is evil
the string passed to eval (and its relatives: the Function constructor, setTimeout and setInterval) needs to be compiled and interpreted, which is extremely slow.
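The common uses of eval each have a string-free alternative. A TypeScript sketch; the function names and the sample settings are illustrative only:

```typescript
// Data: JSON.parse only parses; eval("(" + text + ")") would compile and run it.
function parseSettings(text: string): { retries: number; endpoint: string } {
  return JSON.parse(text);
}

// Dynamic member access: bracket notation instead of eval("obj." + name).
function readField(obj: Record<string, number>, name: string): number | undefined {
  return obj[name];
}

// Timers: pass a function reference, not a source string like "refresh()".
function scheduleRefresh(refresh: () => void, delayMs: number): void {
  setTimeout(refresh, delayMs);
}
```

None of these paths hands source text to the compiler at runtime, which also removes the code-injection risk that comes with eval.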
optimize string concatenation:
on IE, concatenating two strings causes a new string to be allocated and the two original strings to be copied.
therefore, it is much faster on IE to append strings to an array and then use Array.join
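The array-and-join pattern, sketched in TypeScript with a hypothetical `buildList` helper that assembles an HTML list:

```typescript
// Collects fragments in an array and joins once at the end, instead of
// growing one string with += on every iteration.
function buildList(items: string[]): string {
  const parts: string[] = ["<ul>"];
  for (let i = 0; i < items.length; i++) {
    parts.push("<li>", items[i], "</li>");
  }
  parts.push("</ul>");
  return parts.join("");
}
```

Modern engines usually optimize `+=` on strings well, so this is mainly an IE-era recommendation; measure before rewriting existing code.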
optimize regular expressions.
don't use the RegExp constructor unless your regular expression is assembled at runtime. Instead, use regular expression literals.
use the test method if all you want to do is test for a pattern (the exec method carries a small performance penalty).
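Both regular expression tips in TypeScript: a literal for a fixed pattern, the RegExp constructor only for a pattern built at runtime, and test() for a yes/no answer. The ZIP-code pattern and `containsWord` helper are illustrative examples:

```typescript
// Fixed pattern: write it as a literal (no global flag, so test() has no lastIndex state).
const ZIP_CODE = /^\d{5}(-\d{4})?$/;

function isZipCode(s: string): boolean {
  // test() returns a boolean; exec() would build a match array we don't need.
  return ZIP_CODE.test(s);
}

// Runtime pattern: the RegExp constructor is justified here.
function containsWord(text: string, word: string): boolean {
  // Escape regex metacharacters in the caller-supplied word.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("\\b" + escaped + "\\b", "i").test(text);
}
```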
Caching
caching can be justified when there is a high cost associated with getting/accessing the data and when the data doesn't change over time. The tradeoff: it increases memory consumption.
memoization
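Memoization is caching applied to a function: remember each result keyed by its argument. A minimal TypeScript sketch (the `memoize` helper is illustrative, not a library API); the cache grows without bound, which is exactly the memory tradeoff noted above:

```typescript
// Wraps a single-argument function so each result is computed once and reused.
function memoize<K, V>(fn: (key: K) => V): (key: K) => V {
  const cache = new Map<K, V>();
  return (key: K): V => {
    if (cache.has(key)) return cache.get(key) as V;
    const value = fn(key);
    cache.set(key, value);
    return value;
  };
}
```

This only pays off when the wrapped function is pure (same input, same output) and expensive compared to a Map lookup.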
long-running JavaScript processes (longer than 300 ms):
the entire browser UI is frozen.
to maintain a decent user experience make sure that JS threads never take more than 300 ms.
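The usual way to stay under that limit is to split a big job into slices and yield to the browser between them with setTimeout. A TypeScript sketch of that pattern; the 50 ms per-slice budget is an assumption (chosen to stay well below 300 ms), and `yieldFn` is injectable only so the code can be tested outside a browser:

```typescript
// Processes items in slices, yielding between slices so the UI stays responsive.
function processInChunks<T>(
  items: T[],
  work: (item: T) => void,
  done: () => void,
  budgetMs: number = 50,
  yieldFn: (next: () => void) => void = (next) => {
    setTimeout(next, 0);
  },
): void {
  let index = 0;
  function runSlice(): void {
    const start = Date.now();
    // Always process at least one item, then continue while budget remains.
    while (index < items.length) {
      work(items[index++]);
      if (Date.now() - start >= budgetMs) break;
    }
    if (index < items.length) {
      yieldFn(runSlice); // give the browser a chance to repaint and handle input
    } else {
      done();
    }
  }
  runSlice();
}
```

Processing at least one item per slice guarantees progress even if a single item takes longer than the budget.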
misc:
function calls have overhead associated with them
consider using primitive operations since they are often faster than the corresponding function calls
if possible, avoid using try..catch in performance critical sections
if possible, avoid for...in in performance critical sections
branch outside, not inside, whenever the branching condition does not change
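A TypeScript sketch of moving a loop-invariant branch out of the loop; the formatting task is a made-up example. Modern JIT compilers may do this transformation themselves, so treat it as a pattern to measure rather than a guaranteed win:

```typescript
// Branch inside: asPercent never changes, yet it is tested on every iteration.
function formatInside(values: number[], asPercent: boolean): string[] {
  const out: string[] = [];
  for (let i = 0; i < values.length; i++) {
    if (asPercent) {
      out.push((values[i] * 100).toFixed(0) + "%");
    } else {
      out.push(values[i].toFixed(2));
    }
  }
  return out;
}

// Branch outside: decide once, then run a loop whose body has no condition.
function formatOutside(values: number[], asPercent: boolean): string[] {
  const out: string[] = [];
  if (asPercent) {
    for (let i = 0; i < values.length; i++) out.push((values[i] * 100).toFixed(0) + "%");
  } else {
    for (let i = 0; i < values.length; i++) out.push(values[i].toFixed(2));
  }
  return out;
}
```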
Next session is Stress, Load and Performance Testing in Quality Assurance by Goranka Bjedov of Google.
I have been wanting to hear Goranka for some time now, as her sessions usually end up becoming the highlight of the event. For the record, she passionately hates PowerPoint (I don't blame her).
I couldn't find a video of her Velocity talk but here is a video from her previous talk that's equally interesting.
Goranka spends all her time doing performance testing at Google. She tests AdWords and AdSense and hates any kind of presentation tool.
she focuses on the backend; Steve Souders is the client-side performance guy.
she works on the servers.
what are the bottlenecks?
QA people should be able to tell you what to expect.
2 purposes: understand what is going on in the application, and monitor the application for changes.
a small code change can cause tremendous performance decline.
first thing: if a mistake is made, everyone should know right away. Finding it later can cost a lot.
80% of the performance problems can be worked out with one front end and one backend as long as you have the right database.
figure out what is happening with important transactions
if you don't know what the important transactions are, make a guess. It's better than nothing. Don't be paralyzed; worry about perfection later.
there is no such thing as perfection in performance testing. She cannot guarantee exact results. All tests are run as statistical tests, about 5 times each.
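Since each test is run several times rather than once, results are best reported as a distribution instead of a single number. A minimal TypeScript sketch of summarizing repeated runs, assuming response times in milliseconds; `summarizeRuns` is a hypothetical helper and the sample values are invented:

```typescript
interface RunSummary {
  mean: number;
  stdDev: number;
  min: number;
  max: number;
}

// Summarizes repeated runs of the same test (e.g. five runs).
function summarizeRuns(latenciesMs: number[]): RunSummary {
  if (latenciesMs.length < 2) throw new Error("need at least two runs");
  const n = latenciesMs.length;
  const mean = latenciesMs.reduce((a, b) => a + b, 0) / n;
  // Sample variance (divide by n - 1): the runs are a sample of possible outcomes.
  const variance = latenciesMs.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  return {
    mean,
    stdDev: Math.sqrt(variance),
    min: Math.min(...latenciesMs),
    max: Math.max(...latenciesMs),
  };
}

// Five hypothetical runs of one transaction.
const summary = summarizeRuns([100, 102, 98, 101, 99]);
```

Comparing the spread between builds, not just the mean, is what makes a small performance regression distinguishable from run-to-run noise.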
big proponent of open source tools: JMeter, Grinder and FunkLoad. In a Windows environment, look at OpenSTA.
Vendor tools do a reasonable job (and solve the problem of having too much money).
Open source tools do exactly the same job. They are not completely free, as they require time.
if you're not willing to spend time on an open source tool, then why even spend half a million dollars?
she is happy that OS tools don't have monitoring built into it.
monitoring is absolutely essential and must be done separately.
for majority of things you can troubleshoot and benchmark within 3 weeks.
check the Google blog for her posts on open source performance testing tools
Next session is Clouds Are No Substitute for Competence by Javier Soltero of Hyperic.
The promise of cloud computing:
Cloud computing is the next big thing: Because it is green, easy, scalable, available and disposable.
Cloud computing adds complexity:
clouds allow you to run your applications, but mask the performance of the infrastructure powering them. NYT is not going to stop running their own infrastructure just because they had success with one project on EC2.
when a problem happens, where is the source of the problem: the cloud or your own app?
cloud, by definition is always available and the status is always green.
how quickly can I provision new servers?
what is the throughput in the regions I use?
what latency am I getting for my messages?
How do you answer?
"Is it my application, or is it the cloud?"
Hyperic is introducing cloudstatus.com, which shows the performance, availability and health of Amazon's Web services. On CloudStatus.com, you can monitor EC2, S3, SQS, SDB and FPS (the 5 most popular and critical services of AWS). You can look at performance metrics such as deployment latency. They are firing up Amazon instances and monitoring response times.
After the break, the next session is Energy Efficient Operations: Some Challenges and Opportunities. Luiz Barroso from Google is the presenter. I got in a couple of minutes late as I had to pick up my charger.
Server electricity usage in perspective:
worldwide electricity usage of servers is around 1% of total electricity consumption.
usage doubled between 2000 and 2005
could increase by 40%-76% by 2010.
PC energy consumption likely higher:
installed base for servers in 2005 - 27M
installed base for PCs in 2005: 870M
Measuring computing energy efficiency
harder for computers than for refrigerators
efficiency = work done / energy used = computing speed / power
biggest thing you can do for energy efficiency is write fast code. it can have really big impact.
from measurement standpoint, it is useful to break down the energy efficiency/budget equation
breaking it down:
efficiency = (work done / energy used in chips) * (energy used in chips / energy provided to servers) * (energy provided to servers / energy entering the building)
first: computing efficiency
second: server efficiency
third: datacenter efficiency, or 1/PUE (power usage effectiveness)
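Written out, the intermediate terms cancel, which is why the three factors multiply to the overall efficiency. As a rough worked example using numbers from later in the talk (the LBNL average PUE of 1.83 and, as an assumption, treating the 25% power-supply loss and the 25% voltage-regulator loss as compounding), only about 31% of the energy entering the building would reach the chips, before computing efficiency is even considered. This is an illustration, not a figure quoted by the speaker:

$$
\text{Efficiency} = \frac{\text{work done}}{\text{energy entering the building}}
= \underbrace{\frac{\text{work done}}{\text{energy used in chips}}}_{\text{computing efficiency}}
\times \underbrace{\frac{\text{energy used in chips}}{\text{energy provided to servers}}}_{\text{server efficiency}}
\times \underbrace{\frac{\text{energy provided to servers}}{\text{energy entering the building}}}_{1/\mathrm{PUE}}
$$

$$
\frac{1}{\mathrm{PUE}} = \frac{1}{1.83} \approx 0.546, \qquad
0.75 \times 0.75 = 0.5625, \qquad
0.546 \times 0.5625 \approx 0.31
$$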
Energy efficiency opportunities:
datacenter energy efficiency
LBNL survey of 24 facilities shows avg PUE of 1.83
underutilized data centers
wasted power provisioning investment
makes cooling and power distribution less efficient
server energy efficiency
typical server power supplies dissipate 25% of total energy
DC-to-DC voltage regulators can lose another 25%
computing efficiency
servers have poor energy efficiency in their most common usage range
Plan for today:
datacenter efficiency
the power provisioning efficiency: what can you achieve if you utilize all the energy in your data center?
two key energy related costs:
10 year energy costs ($9/watt)
cost of building a datacenter ($10-22/watt)
Facility costs are as important as energy consumption costs
Datacenter buildout costs can be larger than the energy costs themselves.
Efficiency provisioning playbook:
consolidate workloads into the minimum number of machines needed for peak usage requirements
smart scheduling or virtualization help here
measure actual power usage of devices
nameplates lie!
study activity trends and investigate the oversubscription potential
the subject of our ISCA 07 article
Six month power monitoring study at Google (ISCA 07)
Basic setup
model based power monitoring scheme
measure usage statistics at rack, PDU and cluster levels
4 different workloads over 5k servers
More servers leads to higher oversubscription potential.
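The reason more servers means more headroom: individual machines rarely hit their peak at the same moment, so the peak of the combined load is lower than the sum of individual peaks (and far below the sum of nameplate ratings). The toy TypeScript calculation below illustrates that ratio on hypothetical power traces; it is not the model-based monitoring scheme from the ISCA 07 study:

```typescript
// traces[s][t] = measured power (watts) of server s at sample time t.
// All traces are assumed to cover the same sample times.
function oversubscriptionRatio(traces: number[][]): number {
  if (traces.length === 0 || traces[0].length === 0) {
    throw new Error("need at least one non-empty trace");
  }
  // Provisioning for every server's individual peak.
  let sumOfPeaks = 0;
  for (const trace of traces) {
    sumOfPeaks += Math.max(...trace);
  }
  // Provisioning for the highest combined draw actually observed.
  let peakOfSum = 0;
  for (let t = 0; t < traces[0].length; t++) {
    let total = 0;
    for (const trace of traces) total += trace[t];
    peakOfSum = Math.max(peakOfSum, total);
  }
  // A ratio above 1 means more servers fit under the same power budget.
  return sumOfPeaks / peakOfSum;
}

// Two servers whose peaks never coincide: 400 W of individual peaks,
// but the pair never draws more than 300 W at once.
const ratio = oversubscriptionRatio([
  [100, 200, 150],
  [200, 100, 150],
]);
```

With thousands of machines and mixed workloads, peaks are even less likely to line up, which is the effect the study measured.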
Safely oversubscribing power
oversubscribe at the datacenter level, not at the server or rack levels
profile power usage of applications: learn what to expect
mix workloads
manage overload
provision a sizeable 'best effort' workload; victimize it first
use applications with QoS stack
good news: time constants to react are long
Energy-proportional computing: (an article was published in December of last year)
look at datacenter as a device you have to lower power for
he calls the datacenter a "land-held"
CPU activity distribution over six months (graph)
real production systems don't run full blast all the time.
systems run 10% to 50% of their full capacity most of the time.
fraction of time these servers are doing nothing is very small.
A datacenter and a laptop are indeed different
Characteristics of well designed internet services:
high performance and high availability requires
load balancing and wide data distribution -> no useful idle intervals, lots of low activity intervals
example: Google file system:
replicas distributed across multiple machines
reads are load-balanced across replicas; writes need to reach all of them.
Key implications:
sleep or power-down strategies are much less useful in servers
focus on energy efficiency at peak performance is misguided
Power varies with the amount of activity in servers. When a machine is completely idle, it still uses roughly half of its peak power. At 1/3 of peak activity, power efficiency is halved.
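A linear power model reproduces both numbers quoted here: if idle power is half of peak, then at one-third load the machine draws two-thirds of peak power while doing one-third of the work, so work per unit of energy is half of what it is at full load. The TypeScript sketch below assumes that linear model (real servers are not exactly linear) and also evaluates the energy-proportional ideal, where idle power is zero:

```typescript
// Fraction of peak power drawn at utilization u (0..1), assuming power rises
// linearly from idleFraction (at u = 0) to 1.0 (at u = 1).
function powerFraction(u: number, idleFraction: number): number {
  return idleFraction + (1 - idleFraction) * u;
}

// Work done per unit of energy, normalized so that full utilization scores 1.
function relativeEfficiency(u: number, idleFraction: number): number {
  if (u === 0) return 0; // no work done, so no useful efficiency
  return u / powerFraction(u, idleFraction);
}

// Typical server per the talk: idle draws about half of peak power.
const atThird = relativeEfficiency(1 / 3, 0.5); // about 0.5
// Energy-proportional ideal: no work, no power.
const proportional = relativeEfficiency(1 / 3, 0); // 1
```

Since production servers spend most of their time at 10% to 50% utilization, the gap between these two curves in that range is where most of the waste sits.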
Energy-proportional computing: (the idea)
no work, no power consumed
some work, some power consumed
lots of work, lots of power consumed
That would be the end of power management software.
What if we could build machines with a wide activity range? He shows a graph.
Estimated impact of energy proportionality is very large, based on another graph.
Conclusion:
write fast code!
the software engineer's biggest contribution to energy efficiency
Last session before the break is "Innovation That Drives Opportunity for the Web Infrastructure" by John Fowler (Sun Microsystems). John is responsible for hardware at Sun.
Applications are built in different ways.
Three things Sun is working on:
Computing
Open Storage
focusing on $/performance
Networking
huge bandwidth
He is talking about Web 2.0 architectures. The software running today wasn't there 10 years ago. Almost everyone is horizontally scaled which brings up a host of technology issues.
Sun's Web 2.0 kit: a set of performance and benchmarking applications. Sun will be open sourcing this and other tools. The tools tested are web/app server, cache layers, database and storage.
It's driving you crazy:
power, heat, space
scale
understanding the infrastructure
performance
Compute:
relatively straightforward
clock rates not going up. everyone scaling horizontally
lower memory latency
how can you have a high degree of concurrency
Cores and threads are on the move. Sun is working on 16 cores per socket. Future is higher and higher degrees of concurrency.
Open Storage (Servers + Storage + Open Solaris) :
built on OpenSolaris
performance of ZFS and SSDs
cost efficiency of volume hardware
scale easily
millions of files
gigabytes / sec
management simplicity
analytics with DTrace
diagnostics with FMA
Why Applications Don't Perform:
Waiting for DATA
Future: Enterprise SSD:
up to 5,000 - 8,000 write IOPS
up to 30,000 to 40,000 read IOPS
32 GB
$ per IOPS $0.08 compared to $2.43 traditionally.
New generation of flash is quite reliable and has no moving parts. Power consumption of SSDs is 2 watts compared to 13 watts for traditional HDD.
ZFS Hybrid Storage Pool Model:
High Performance Read and Write Cache Pool
ZFS combines main memory and SSDs for read caching
Notes from Velocity Conference continue: Next up is a Keynote by Artur Bergman (Wikia). Wikia runs 7000 Wikis and has 400 million page views per month.
Google, Yahoo and Amazon are what people rely on
Friendster.com, Twitter and boo.com have had serious reliability problems
Value of performance/ reliability
brand value (they rely on you)
more page views (users spend a fixed amount of time, so a faster site serves more pages in that time)
Match user expectations:
World of Warcraft:
$520 million in profit last year
99% reliable
down every week, scheduled
server crash
"We pay them money, so we have to accept the downtime."
Operations:
efficient use of resources
end user performance
reliability
bad operations waste R&D and cost-of-sale money
Business
cost per pageview. How many actually know this?
cost per page
revenue CPM - cost per pageview
Gross margin
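These business metrics combine into a simple calculation. CPM is revenue per 1,000 page views, so the cost side has to be put on the same per-1,000 basis before subtracting. A minimal TypeScript sketch with made-up inputs (none of these figures come from Wikia):

```typescript
// Infrastructure cost per 1,000 page views (same basis as a CPM).
function costPerThousandViews(monthlyCost: number, monthlyPageViews: number): number {
  if (monthlyPageViews <= 0) throw new Error("page views must be positive");
  return (monthlyCost / monthlyPageViews) * 1000;
}

// Gross margin per 1,000 page views: revenue CPM minus cost per 1,000 views.
function grossMarginPerThousand(revenueCpm: number, costPerThousand: number): number {
  return revenueCpm - costPerThousand;
}

// Hypothetical month: $50,000 of infrastructure serving 100 million page views,
// monetized at a $2.00 revenue CPM.
const cost = costPerThousandViews(50000, 100000000); // about $0.50
const margin = grossMarginPerThousand(2.0, cost); // about $1.50
```

Performance work shows up here twice: lower cost per page view from using fewer resources, and more page views from a faster site.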
Reliability example:
20% of Wikia pages:
load time went from 200 ms to 15 s
35% reduction in page views on the slow pages
15% reduction in page views on the fast pages
slow pages made people abandon the site.
Happy users - Lower cost
for the users: service industry
for the business: cost per page view, lower capex.
VCs love to give money to IBM, HP, Dell
better for the environment.
May performance project:
50% cpu usage cost
delay investment 6 months
3 engineers - 4 weeks
Cache miss
300ms -> 190 ms
Perception
Ads
Ads are slow
Load ads after content load
Dramatic change:
significant % increase in pageviews
We lose money
but edits increase
Stay in the loose and simple area; stay away from complexity.
Next up is Scott Ruthfield (WhitePages.com) talking about Jiffy: Open Source Performance Measurement and Instrumentation. WhitePages.com is a people search powerhouse. They own 411.com. They have data on 180 million people and handle 2 billion searches per year, with 500 searches per second at peak. A top-50 comScore site.
Very important performance lessons: Scott says "Slow is bad." Customers just don't want you to be slow. "We're slow." A detailed analysis on their end revealed that the slowness wasn't their fault.
Reflection:
YSMV: Your slowness may vary
YCMWYCM: You can't manage what you can't measure
Jiffy:
means: Small unit of time; tick between system clock interrupts
is an end-to-end system for measuring and reporting on page load activity
Four goals:
real data at scale: what are customers seeing?
measure anything.
real/near time reporting
~0 impact on page performance
he says "it works"
What are the components of Jiffy?:
jiffy.js - library for instrumenting your pages and reporting measurements
Apache (httpd.conf) config: receive and log measurements
Better late than never. At Velocity Conference I took a lot of notes and didn't get to publish them earlier. Now that I have some time on my hands, I am going to go ahead and publish them.
At Velocity, there will be two product launches. Vik Chaudhary (Keynote Systems, Inc.) and Abelardo Gonzalez (Keynote Systems) are on stage now.
The first product is KITE (Keynote Internet Testing Environment). Keynote has been in business for 12 years. Providing a single performance testing environment for everyone (web developers, QA and system administrators) has always been a challenge for both startups and major players. This presentation sounds very promising.
Keynote collects 200 million internet measurements every day.
With KITE, you can test from the desktop to the internet cloud.
Keynote built it to provide a single performance testing environment for web developers, QA and IT operations teams.
Sites are making upgrades every single day. Customers want to be able to test them from multiple locations.
KITE is ideal for 3 things:
recording, scripting and playback of web transactions
instant web tests from desktop
scheduled testing for higher productivity
KITE Web Performance Engine allows for:
performance analysis for multi-page transactions, not just single pages
Javascript programmability for scripting and DOM analysis
testing from the desktop, Last Mile and the Internet Cloud
Native IE integration, which in turn allows for easy analysis of AJAX, Flash and JavaScript
Recording of test scripts, which can be played back in "burst" mode and shared
Next is a demo by Abelardo Gonzalez.
You get a script repository that you can save for performance benchmarks.
There is a record button and a free global test button.
They will be testing iGoogle.
You start a session then as you take actions, they are recorded in the console.
KITE helps with three main things:
user experience time
network time
cache network time (for repeat visitors)
You can drill down very deep into each page and each performance metric. Tests from the Global Network are also possible, where you can replay exactly the steps you made from your desktop from all supported locations (e.g. Atlanta).
Next up is Bill Coleman (Cassatt Corporation), who is the B in BEA. He is also credited for his work on Solaris. Currently he is CEO of Cassatt Corporation. The talk is about Green Data Centers.
What we are doing today in data centers is unsustainable. He calls them 'your father's data center'
Concerns
first is energy cost
second is operations cost. IDC says it has gone from 25% to 75%.
everything is a lot more complex today than it was 15 years ago.
How did we get here? This is a consequence of innovation. In 1990, people were putting networks in data centers. Then came storage, followed by software people who wanted multi-tiered applications. Then came DBAs :)
Then came virtualization. Is it the end of IT? We are still doing things as if it were the 1960s. There is no automation involved; everything must be changed physically.
We are at end of sustainability of data centers as we know it today.
Virtualization makes scale a little bit better. All we are doing is pushing back the ends.
1.0 of cloud: I can build a greenfield application with proprietary
2.0 of cloud: functions of PC now exist in cloud. it will still be proprietary.
Apple invented PC but didn't commoditize it.
Very low utilization rates. The next phase of cloud computing will offer higher utilization rates.
Thanks Bill for a great insight into green data centers.
I arrived here last night after a very long flight and am sitting with Anthony Lopez and Dan White of Cafe Mom. Steve Souders (Google) and Jesse Robbins (O'Reilly Radar) are now on the stage. They are talking about how Velocity got started. They completed a five-minute speech in two (hey, it's a performance conference).
If all goes well, I will be blogging about several sessions at Velocity.