mashraqi

> velocity08

[ This is my personal blog so all opinions expressed here are mine. I am a product, scalability, operations and monetization advisor and currently employed as Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month. I was a keynote panelist for Scaling Up or Out keynote at MySQL Conference and speak regularly at conferences and user groups. ]
Farhan "Frank" Mashraqi

Saturday, July 05, 2008

High-performance Ajax Applications

Yet more notes from Velocity.

Next session is High-performance Ajax Applications by Julien Lecomte (Yahoo!).
  • Plan for performance from day 1
  • work closely with designers and product managers
  • understand design rationale
  • explain the tradeoffs between design and performance
  • offer alternatives and show what is possible (prototypes)
  • as a last resort, simplify design
Engineering high performance: a few basic rules:
  • don't do anything unnecessary
  • less is more
  • break the rules
  • work on improving perceived performance
    • users can generally deal with a little bit of discomfort if they can see something is happening.
  • you can't compromise on security, but some other things can be compromised
  • in general avoid presentational markup
Measuring performance:
  • test performance using a setup similar to your user's environment
  • profile your code during development
  • automate profiling/performance testing
  • keep historical records of how features perform
  • consider keeping a small amount of profiling code in production
Yahoo!'s exceptional performance rules
  • make fewer http requests
  • use a content delivery network
Asset optimization:
  • minify CSS and JS files
  • combine CSS and JS files
  • optimize image assets
Reduce unminified code size:
  • loading and parsing HTML, CSS and JS code is costly
  • be concise and write less code
  • make good use of javascript features
  • consider optimizing your large JS files into smaller files (bundles) when the parsing and compilation of the script takes an excessive amount of time
  • load code (HTML, CSS and JS) on demand
Optimize initial rendering: misc tips
  • consider rendering the first view on the server. (server should generate the markup)
    • this will speed up the initial rendering.
  • close your HTML tags to speed up parsing
  • consider flushing the apache buffer very early on
  • load only essential assets/load assets on a delay or on demand.
  • don't always wait for onload
  • most DOM operations can be accomplished before the onload event has fired
  • post load script loading:
    • a well designed site should be fully functional even without the JS enabled
    • therefore you may be able to load scripts on a delay
  • conditional preloading:
    • preload assets that you know user is likely to need very shortly
    • however, one must be smart about when the preloading takes place. Otherwise the preloading may actually worsen the user experience.
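To make the post-load idea concrete, here is a minimal sketch (my own illustration, not from the slides). The function name loadScriptLater, the script path and the injectable doc and schedule parameters are all assumptions; passing them in simply lets the same function be tried outside a browser.

```javascript
// Sketch: append a <script> element once a scheduler decides it is time.
// `doc` and `schedule` are injected (hypothetical parameters) so the
// function does not depend on a real browser environment.
function loadScriptLater(src, doc, schedule) {
  schedule(function () {
    var el = doc.createElement('script');
    el.src = src;
    el.async = true;
    doc.body.appendChild(el);
  });
}

// In a browser this might be called as (path is hypothetical):
// loadScriptLater('/js/extras.js', document, function (fn) {
//   window.addEventListener('load', fn);
// });
```

Whether a delay helps depends on the page: non-essential scripts are good candidates, while anything needed for the first view is not.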
Part 3: High Performance Javascript:
  • a lookup is performed in JS every time a variable is accessed.
  • declare variables with the var keyword and use them in the same scope whenever possible; avoid global variables at all costs.
  • never use the with keyword as it prevents the compiler from generating code for fast access to local variables.
  • cache the results of expensive lookups in local variables.
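A small sketch of caching lookups in local variables (my own example; the config object and function name are made up for illustration):

```javascript
// Stands in for any global or deeply nested object.
var config = { limits: { max: 3 } };

function countOverLimit(values) {
  var max = config.limits.max; // resolve the lookup chain once, not per iteration
  var len = values.length;     // cache the length in a local as well
  var count = 0;
  for (var i = 0; i < len; i++) {
    if (values[i] > max) {
      count++;
    }
  }
  return count;
}
```

How much this saves depends heavily on the engine; it mattered most on the browsers of the time.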
The prototype chain:
  • accessing a member found on the object itself is about 25% faster than accessing one found further up the prototype chain
  • optimize object instantiation:
    • if you need to create many objects, consider adding members to the prototype instead of assigning them to each instance.
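As a sketch of the instantiation tip (my own example, with a made-up Point type): the method lives on the prototype once instead of being created again for every object.

```javascript
function Point(x, y) {
  // Only per-instance data is assigned in the constructor.
  this.x = x;
  this.y = y;
}

// Shared by every Point; not re-created on each `new Point(...)`.
Point.prototype.norm = function () {
  return Math.sqrt(this.x * this.x + this.y * this.y);
};

var a = new Point(3, 4);
var b = new Point(6, 8);
```

The tradeoff matches the note above: instantiation gets cheaper, while each call to norm goes through one prototype lookup.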
Don't use eval
  • eval is evil
  • the string passed to eval (and its relatives: the Function constructor, setTimeout and setInterval) needs to be compiled and interpreted (extremely slow).
optimize string concatenation:
  • on IE, concatenating two strings causes a new string to be allocated and the two original strings to be copied.
  • therefore, it is much faster on IE to append strings to an array and then use Array.join
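A minimal sketch of the array-and-join pattern (my own example; buildList is a made-up helper):

```javascript
function buildList(items) {
  var parts = [];
  for (var i = 0; i < items.length; i++) {
    // Collect the pieces instead of growing one long string step by step.
    parts.push('<li>' + items[i] + '</li>');
  }
  // One join at the end produces the final string.
  return '<ul>' + parts.join('') + '</ul>';
}
```

This targets the IE behaviour described in the talk; other engines may handle repeated concatenation well enough that the difference is small.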
optimize regular expressions.
  • don't use the RegExp constructor unless your regular expression is assembled at runtime. Instead, use regular expression literals.
  • use the test method if all you want to do is test for a pattern (the exec method carries a small performance penalty).
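Both regular expression tips in one short sketch (my own example; the pattern and helper names are made up):

```javascript
// Literal: the pattern is fixed, so no constructor call is needed.
var ZIP = /^\d{5}$/;

function isZip(s) {
  // test() returns a boolean and skips building a match array.
  return ZIP.test(s);
}

// The constructor is appropriate here because the pattern is assembled at
// runtime. Assumes `word` contains no regex metacharacters.
function makeWordMatcher(word) {
  return new RegExp('\\b' + word + '\\b', 'i');
}
```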
Caching
  • caching can be justified when there is a high cost associated with getting/accessing the data and when the data doesn't change over time.
  • it increases memory consumption (tradeoff)
memoization
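Memoization is caching applied to a function: results are stored by argument so the expensive work runs once per distinct input. A minimal sketch (my own, for single-argument functions only; memoize and slowSquare are made-up names):

```javascript
function memoize(fn) {
  var cache = {}; // grows with every distinct argument: the memory tradeoff
  return function (n) {
    if (!Object.prototype.hasOwnProperty.call(cache, n)) {
      cache[n] = fn(n);
    }
    return cache[n];
  };
}

var calls = 0;
function slowSquare(n) {
  calls++; // counts how often the "expensive" work really runs
  return n * n;
}
var fastSquare = memoize(slowSquare);
```

This only makes sense for functions whose result depends solely on the argument and does not change over time.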

long-running JavaScript processes (longer than 300 ms):
  • the entire browser UI is frozen.
  • to maintain a decent user experience, make sure that JS threads never run for more than 300 ms.
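One common way to stay under that budget is to split the work into chunks and yield between them. A minimal sketch (my own, not from the slides; processInChunks and its parameters are made-up names, and the scheduler defaults to setTimeout):

```javascript
function processInChunks(items, handle, chunkSize, schedule, done) {
  var index = 0;
  // Default scheduler: yield to the browser, then continue.
  schedule = schedule || function (fn) { setTimeout(fn, 0); };

  function runChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handle(items[index]);
    }
    if (index < items.length) {
      schedule(runChunk); // more work left: give the UI a chance to update
    } else if (done) {
      done();
    }
  }

  runChunk();
}
```

The chunk size has to be tuned by measurement so that a single chunk stays well below the 300 ms mark on the slowest browser you care about.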
misc:
  • function calls have overhead associated with them
  • consider using primitive operations since they are often faster than the corresponding function calls
  • if possible, avoid using try...catch in performance-critical sections
  • if possible, avoid for...in in performance-critical sections
  • branch outside, not inside, whenever the branching condition does not change
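"Branch outside, not inside" in a small sketch (my own example; scaleAll is a made-up function): the condition does not change during the loop, so it is tested once and each loop body stays branch-free.

```javascript
function scaleAll(values, factor, round) {
  var out = new Array(values.length);
  var i;
  if (round) {
    // The check on `round` happens once, not on every iteration.
    for (i = 0; i < values.length; i++) {
      out[i] = Math.round(values[i] * factor);
    }
  } else {
    for (i = 0; i < values.length; i++) {
      out[i] = values[i] * factor;
    }
  }
  return out;
}
```

The price is some duplicated code, so this is worth doing only in sections that profiling shows to be hot.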


Stress, Load and Performance Testing in Quality Assurance

Next session is Stress, Load and Performance Testing in Quality Assurance by Goranka Bjedov of Google.

I have been wanting to hear Goranka for some time now as her sessions usually end up becoming the highlight of the event. For the record, she passionately hates PowerPoint (I don't blame her).

I couldn't find a video of her Velocity talk but here is a video from her previous talk that's equally interesting.



Goranka spends all her time doing performance testing at Google. She tests AdWords and AdSense, and hates any kind of presentation tool.
  • focus on the backend. Steve Souders is the client side performance guy.
  • she works on the servers.
  • what are the bottlenecks?
  • QA people should be able to tell you what to expect.
  • 2 purposes: what is going on in application? and monitor application for changes.
  • a small code change can cause tremendous performance decline.
  • first thing is if a mistake is made, everyone should know right away. Finding it later can cost a lot
  • 80% of the performance problems can be worked out with one front end and one backend as long as you have the right database.
  • figure out what is happening with important transactions
  • if you don't know what the important transactions are, make a guess. It's better than nothing. Don't be paralyzed; worry about perfection later
  • there is no such thing as perfection in performance testing. She cannot guarantee the exact results. All tests are run as statistical tests and run 5 times or so.
  • big proponent of open source tools: JMeter, Grinder and FunkLoad. In a Windows environment look at OpenSTA.
  • Vendor tools do a reasonable job (and solve the problem of having too much money)
  • Open source tools do exactly the same job. They are not completely free, as time is required.
  • if you're not willing to spend time on an open source tool, then why even spend half a million dollars?
  • she is happy that open source tools don't have monitoring built into them.
  • monitoring is absolutely essential and must be done separately.
  • for majority of things you can troubleshoot and benchmark within 3 weeks.
  • check google blog for her posts: open source performance testing tools
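Since every test is run about five times and treated statistically, a result is really a mean with a spread rather than a single number. A minimal sketch of that summary (my own illustration; the function name and the sample response times are made up, not figures from the talk):

```javascript
// Summarize repeated runs of the same test: mean and sample standard deviation.
function summarize(samples) {
  var n = samples.length;
  var mean = samples.reduce(function (a, b) { return a + b; }, 0) / n;
  var squared = samples.reduce(function (a, x) {
    return a + (x - mean) * (x - mean);
  }, 0);
  var stddev = n > 1 ? Math.sqrt(squared / (n - 1)) : 0;
  return { runs: n, mean: mean, stddev: stddev };
}

// Hypothetical response times in ms from five runs of one transaction.
var result = summarize([100, 110, 90, 105, 95]);
```

Comparing the mean of a new build against the old one, with the spread in mind, is what lets a small code change be flagged as a real regression instead of noise.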

Actionable Logging for Smoother Operation and Faster Recovery

Notes from Velocity.

Next session is Actionable Logging for Smoother Operation and Faster Recovery by Mandi Walls (AOL).

Actionable logging:
  • no nonsense logging
  • concise, easy to understand
  • express symptoms of production issues
  • anything that makes the log needs to be fixed
Why it's important?
  • expending resources on production systems
  • the point of logging in production
  • diagnosis of issues
  • the 4am test
Logging goals:
  • diagnosis and recovery
  • statistics and monitoring
  • provide insight into the behavior of the application
  • indicate potential issues, and areas for improvement
  • not the same goals as development and QA environments.
Types of logs:
  • access log
  • server log, i.e. catalina.out
  • application logs
  • special use logs for recording specific groups of activities
Log file location:
  • where logs are located on the system should be predictable and obvious
  • it may be helpful to locate logs on different disk partitions but link them back to the app.
  • keep older logs in an obvious place. Having a couple of days' worth of logs helps you converge on the problem.
Log file management:
  • everyone has their own method
  • roll logs into files with timestamps
    • host-01.log.003 vs host-01.log.06202008
  • roll all the logs at the same time for a given app to make coordination of events easier
  • roll when the app needs logs rolled: hourly, daily, weekly.
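A sketch of the timestamped naming from the example above (my own illustration; rolledLogName is a made-up helper, and using UTC for the date is an assumption):

```javascript
function pad2(n) {
  return (n < 10 ? '0' : '') + n;
}

// Builds names in the MMDDYYYY style of "host-01.log.06202008".
function rolledLogName(base, date) {
  return base + '.' +
    pad2(date.getUTCMonth() + 1) +
    pad2(date.getUTCDate()) +
    date.getUTCFullYear();
}
```

Using the same suffix for every log of an app, rolled at the same moment, is what makes it easy to line up events across files.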
Logging quality information
  • logs should be expressive but not overly verbose
  • keys to making logs more actionable:
    • appropriate formats
Quality information: format
  • Timestamping: what not to do:
    • 1213988938:tvdata shows/617/306
    • 1213988939:tvdata shows/618/307
  • Timestamps that mean something:
    • Jun 19,2009 4:20:25PM
  • Good timestamps give context for linking to external events like network outages or traffic anomalies
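For comparison, a one-line conversion of an epoch value like the ones above into something a person can read at 4am (my own sketch; ISO 8601 in UTC is an assumed format, not the one shown in the talk):

```javascript
function readableTimestamp(epochSeconds) {
  // Date expects milliseconds; toISOString always reports UTC.
  return new Date(epochSeconds * 1000).toISOString();
}

// readableTimestamp(1213988938) → "2008-06-20T19:08:58.000Z"
```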
Other considerations in log file format include:
  • creating a common format for multiple products and log types.
  • limiting the number of log entries that write to multiple log lines for faster parsing
  • deciding how much is too much information
Quality info:
  • bad: messages with only numerical error codes in them
  • bad: debugging messages in production logs.
  • bad: log messages that give you nothing to go on.
  • bad: multiple line error messages.
  • good: module_id with descriptive message
Making messages useful:
  • don't list non fatal messages as fatal
  • incorrect / misleading severity
  • not logging anything for fatal errors
Improving log messages:
  • log at the first point an error is encountered: don't log a timeout to a backend as a parse error of data expected from the request
  • messages include the method name and key variables to speed up fixes
  • suppress anything in the log that isn't actionable
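Pulling these points together, a single-line message with a severity, module and method name, a descriptive text and the key variables might be produced like this (my own sketch; every field name and the " | " separator are assumptions, not a format from the talk):

```javascript
function formatLogLine(entry) {
  var vars = Object.keys(entry.vars || {}).map(function (k) {
    return k + '=' + entry.vars[k];
  }).join(' ');
  // One line per event keeps parsing simple.
  return [
    entry.time,
    entry.severity,
    entry.module + '.' + entry.method,
    entry.message,
    vars
  ].filter(Boolean).join(' | ');
}

var line = formatLogLine({
  time: '2008-06-20T19:08:58Z',
  severity: 'ERROR',
  module: 'tvdata',
  method: 'fetchShows', // hypothetical method name
  message: 'backend timeout after 3000ms',
  vars: { showId: 617 }
});
```

The message names the actual failure (a backend timeout) at the point where it happened, which is the part that speeds up the fix.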
log messages convergence:
  • actively managing, parsing and pruning logs makes new errors more obvious
  • check the logs after every install for new messages
things to avoid in logs:
  • usernames, passwords, database logins
    • these provide crib notes for anyone getting into your system.
how much is too much:
  • more than 25% of the number of access log entries is a hindrance. Even 10% may be too much in most environments
  • if you log more in the server log than in the access log, then it's a problem
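The rule of thumb is easy to turn into a check (my own sketch; the function name and the verdict strings are made up, and the access log count is assumed to be greater than zero):

```javascript
function logVolumeVerdict(serverLogEntries, accessLogEntries) {
  var ratio = serverLogEntries / accessLogEntries;
  if (ratio > 0.25) {
    return 'hindrance';          // more than 25% of access log volume
  }
  if (ratio > 0.10) {
    return 'probably too much';  // even 10% may be too much
  }
  return 'ok';
}
```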
conclusion
  • the log is first line of information when a problem occurs
  • a production log should be focused on providing information to operations staff, not developers.
  • when, where and how messages are logged can help or hinder recovery

Clouds Are No Substitute for Competence

More notes from Velocity.

Next session is Clouds Are No Substitute for Competence by Javier Soltero of Hyperic.


The promise of cloud computing:

Cloud computing is the next big thing:
Because it is green, easy, scalable, available and disposable.

Cloud computing adds complexity:
  • clouds allow you to run your applications, but mask the performance of the infrastructure powering them. NYT is not going to stop their own infrastructure just because they had success with one project on EC2.
  • when a problem happens, where is the source of the problem? cloud or your own app.
  • cloud, by definition is always available and the status is always green.
  • how quickly can I provision new servers?
  • what is the throughput in the regions I use?
  • what latency am I getting for my messages?

How do you answer?
  • "is it my application, or is it the cloud?"
Hyperic is introducing cloudstatus.com, which shows the performance, availability and health of Amazon's Web services. On CloudStatus.com, you can monitor EC2, S3, SQS, SDB and FPS (the 5 most popular and critical services of AWS). You can look at performance metrics such as deployment latency. They are firing up Amazon instances and monitoring response times.


Energy Efficient Operations: Some Challenges and Opportunities

Yet more notes from Velocity.

After the break, the next session is Energy Efficient Operations: Some Challenges and Opportunities. Luiz Barroso from Google is the presenter. I arrived a couple of minutes late as I had to pick up my charger.



Server electricity usage in perspective:
  • worldwide electricity usage of servers is around 1% of total electricity consumption.
  • usage doubled between 2000 and 2005
  • could increase by 40%-76% by 2010.
PC energy consumption is likely higher:
  • installed base for servers in 2005: 27M
  • installed base for PCs in 2005: 870M
Measuring computing energy efficiency
  • harder for computers than for refrigerators
  • efficiency = work done / energy used = computing speed / power
  • biggest thing you can do for energy efficiency is write fast code. it can have really big impact.
  • from measurement standpoint, it is useful to break down the energy efficiency/budget equation
  • breaking it down:
    • efficiency = (work done / energy used in chips) * (energy used in chips / energy provided to servers) * (energy provided to servers / energy entering the building)
    • first: computing efficiency
    • second: server efficiency
    • third: datacenter efficiency or 1/PUE (power usage effectiveness)
Energy efficiency opportunities:
  • datacenter energy efficiency
    • LBNL survey of 24 facilities shows avg PUE of 1.83
  • underutilized data centers
    • wasted power provisioning investment
    • makes cooling and power distribution less efficient
  • server energy efficiency
    • typical server power supplies dissipate 25% of total energy
    • DC-to-DC voltage regulators can lose another 25%
  • computing efficiency
    • servers have poor energy efficiency in their most common usage range
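Plugging the figures above into the second and third factors of the breakdown gives a feel for how much of the energy entering the building reaches the chips. This is my own back-of-the-envelope sketch; it assumes the two 25% losses compound one after the other, which the talk did not spell out.

```javascript
// Fraction of building energy that reaches the chips:
// datacenter efficiency (1/PUE) times server efficiency.
function deliveredFraction(pue, supplyLoss, regulatorLoss) {
  var datacenterEfficiency = 1 / pue;
  // Assumption: each stage loses its share of whatever energy reaches it.
  var serverEfficiency = (1 - supplyLoss) * (1 - regulatorLoss);
  return datacenterEfficiency * serverEfficiency;
}

// Average PUE of 1.83, 25% lost in the power supply, 25% in the regulators.
var fraction = deliveredFraction(1.83, 0.25, 0.25); // roughly 0.31
```

Under these assumptions less than a third of the energy is available for computing, before the computing efficiency factor is even considered.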
Plan for today:
  • datacenter efficiency
    • the power provisioning efficiency: What can you achieve if you utilize all energy in your data center.
  • two key energy related costs:
    • 10 year energy costs ($9/watt)
    • cost of building a datacenter ($10-22/watt)
  • Facility costs are as important as energy consumption costs
TCO components: Rough cost breakdown: datacenter (28%) hardware (50%) energy (22%)

Datacenter buildout can be larger than energy itself.

Efficiency provisioning playbook:
  • consolidate workloads into the minimum number of machines needed for peak usage requirements
    • smart scheduling or virtualization help here
  • measure actual power usage of devices
    • nameplates lie!
  • study activity trends and investigate the oversubscription potential
    • the subject of our ISCA 07 article
Six month power monitoring study at Google (ISCA 07)
  • Basic setup
    • model based power monitoring scheme
    • measure usage statistics at rack, PDU and cluster levels
    • 4 different workloads over 5k servers
More servers leads to higher oversubscription potential.

Safely oversubscribing power
  • oversubscribe at the datacenter level, not at the server or rack levels
  • profile power usage of applications: learn what to expect
  • mix workloads
  • manage overload
    • provision a sizeable 'best effort' workload; victimize it first
    • use applications with QoS stack
    • good news: time constants to react are long
Energy-proportional computing: (an article was published in December of last year)
  • look at datacenter as a device you have to lower power for
  • he calls the datacenter: a land-held
  • CPU activity distribution over six months (graph)
    • real production systems don't run full blast all the time.
    • systems run 10% to 50% of their full capacity most of the time.
  • fraction of time these servers are doing nothing is very small.
  • A datacenter and a laptop are indeed different
Characteristics of well designed internet services:
  • high performance and high availability requires
    • load balancing and wide data distribution -> no useful idle intervals, lots of low activity intervals
  • example: Google file system:
    • replicas distributed across multiple machines
    • reads load balancing across replicas, writes need to reach all.
Key implications:
  • sleep or power-down strategies are much less useful in servers
  • focus on energy efficiency at peak performance is misguided
Power varies with the amount of activity in servers. When a machine is completely idle, it still uses roughly half of its peak power. At 1/3 of peak load, power efficiency is halved.
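Those two numbers fit together if power is assumed to grow linearly from the idle level to the peak. The linear model is my assumption for illustration, not something shown in the talk:

```javascript
// Power draw relative to peak at utilization u (0..1),
// assuming a linear ramp from the idle level up to peak.
function relativePower(u, idleFraction) {
  return idleFraction + (1 - idleFraction) * u;
}

// Work done per unit of energy, relative to running at peak.
function relativeEfficiency(u, idleFraction) {
  return u / relativePower(u, idleFraction);
}

// Idle at half of peak power, one third of peak load:
var today = relativeEfficiency(1 / 3, 0.5);       // about 0.5, i.e. halved
// An energy-proportional machine would draw nothing when idle:
var proportional = relativeEfficiency(1 / 3, 0);  // about 1
```

Since servers spend most of their time between 10% and 50% utilization, that is exactly the range where the idle power hurts most.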

Energy-proportional computing: (the idea)
  • no work, no power consumed
  • some work, some power consumed
  • lots of work, lots of power consumed
That would be the end of power management software.

What if we could build machines with a wide activity range? He shows a graph.

Estimated impact of energy proportionality is quite huge based on another graph.

Conclusion:
  • write fast code!
    • the software engineer's biggest contribution to energy efficiency
  • consider reduction of all energy-related costs
    • electricity, and datacenter provisioning
Some Google initiatives
  • carbon neutrality
  • 1.6MW solar panel installation in Mtn. View
  • plug-in hybrids (http://rechargeit.org)
One of the best presentations at Velocity.


Innovation That Drives Opportunity for the Web Infrastructure

More notes from Velocity.

Last session before the break is "Innovation That Drives Opportunity for the Web Infrastructure" by John Fowler (Sun Microsystems). John is responsible for hardware at Sun.

Applications are built in different ways.

Three things Sun is working on:
  1. Computing
  2. Open Storage
    • focusing on $/performance
  3. Networking
    • huge bandwidth



He is talking about Web 2.0 architectures. The software running today wasn't there 10 years ago. Almost everyone is horizontally scaled which brings up a host of technology issues.

Sun's Web 2.0 kit: a set of performance and benchmarking applications. Sun will be open sourcing this and other tools. The tools tested are web/app server, cache layers, database and storage.

It's driving you crazy:
  • power, heat, space
  • scale
  • understanding the infrastructure
  • performance
Compute:
  • relatively straightforward
  • clock rates not going up. everyone scaling horizontally
  • lower memory latency
  • how can you have a high degree of concurrency
Cores and threads are on the move. Sun is working on 16 cores per socket. Future is higher and higher degrees of concurrency.

Open Storage (Servers + Storage + Open Solaris) :
  • built on OpenSolaris
  • performance of ZFS and SSDs
  • cost efficiency of volume hardware
  • scale easily
    • millions of files
    • gigabytes / sec
  • management simplicity
    • analytics with DTrace
    • diagnostics with FMA
Why Applications Don't Perform:
  • Waiting for DATA
Future:
Enterprise SSD:
  • up to 5,000 - 8,000 write IOPS
  • up to 30,000 to 40,000 read IOPS
  • 32 GB
  • $ per IOPS $0.08 compared to $2.43 traditionally.
New generation of flash is quite reliable and has no moving parts. Power consumption of SSDs is 2 watts compared to 13 watts for traditional HDD.

ZFS Hybrid Storage Pool Model:
  • High Performance Read and Write Cache Pool
    • ZFS combines main memory and SSDs for read caching
  • Adaptive Replacement Cache (ARC)
ZFS Intent Log pool
  • ZFS is a transactional file system
  • intent log stores small transactions
New Server memory hierarchy includes flash. Before 2010, flash will be in every server.


Importance of Operations and Performance

Notes from Velocity Conference continue:
Next up is a Keynote by Artur Bergman (Wikia). Wikia runs 7000 Wikis and has 400 million page views per month.
  • Google, Yahoo and Amazon are what people rely on
  • Friendster.com, Twitter and boo.com have serious reliability problems



Value of performance/ reliability
  • brand value (they rely on you)
  • more page views (fixed amount of time + faster site)
Match user expectations:
  • World of Warcraft:
    • $520 million in profit last year
    • 99% reliable
    • down every week, scheduled
    • server crash
"We pay them money, so we have to accept the downtime."

Operations:
  • efficient use of resources
  • end user performance
  • reliability
  • bad operations waste R&D and cost-of-sale money
Business
  • cost per pageview. How many actually know this?
  • cost per page
  • revenue CPM - cost per pageview
  • Gross margin
Reliability example:
  • 20% of wikia pages
  • 200ms -> 15s to load
  • 35% reduction of page views out of the slow pages
  • 15% reduction of the fast pages
  • slow pages made people abandon the site.
Happy users - Lower cost
  • for the users: service industry
  • for the business: cost per page view, lower capex.
  • VC love to give money to IBM, HP, Dell
  • better for the environment.
May performance project:
  • 50% cpu usage cost
  • delay investment 6 months
  • 3 engineers - 4 weeks
Cache miss
  • 300ms -> 190 ms
Perception
  • Ads
  • Ads are slow
  • Load ads after content load
Dramatic change:
  • significant % increase in pageviews
  • We lose money
    • but edits increase
  • Stay in loose and simple area, stay away from the complexity.
Yet another great presentation at Velocity!


Jiffy: Real World Performance Measurement

Continuation of my notes from Velocity Conference.

Next up is Scott Ruthfield (WhitePages.com) talking about Jiffy: Open Source Performance Measurement and Instrumentation. WhitePages.com is a people search power. They own 411.com. They have data on 180 million people doing 2 billion searches / year and 500 searches per second during peak. A top-50 comscore site.



Very important performance lessons:
Scott says "Slow is bad." Customers just don't want you to be slow. "We're slow." A detailed analysis on their end revealed that the slowness wasn't their fault.

Reflection:
  • YSMV: Your slowness may vary
  • YCMWYCM: You can't manage what you can't measure
Jiffy:
  • means: Small unit of time; tick between system clock interrupts
  • is an end-to-end system for measuring and reporting on page load activity
Four goals:
  • real data at scale: what are customers seeing?
  • measure anything.
  • real/near time reporting
  • ~0 impact on page performance
    • he says "it works"
What are the components of Jiffy?:
  • jiffy.js - library for instrumenting your pages and reporting measurements
  • apache (httpd.conf) config - receive and log measurements
  • database schema and rollup scripts
  • ingestor: parse logs and store in any db
  • reporting toolset
  • firebug plugin.
Jiffy: mark and measure
  • mark - start timing from this point
  • measure - report elapsed time since mark.
  • one mark : many measures
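The mark and measure idea can be sketched in a few lines. To be clear, this is my own illustration of the concept and not Jiffy's actual API; createTimer, its methods and the injectable clock are all made-up names.

```javascript
function createTimer(now) {
  now = now || Date.now; // the clock is injectable so it can be faked
  var marks = {};
  var measures = [];
  return {
    // mark: start timing from this point
    mark: function (name) {
      marks[name] = now();
    },
    // measure: record elapsed time since the named mark.
    // Measuring against a mark that was never set yields NaN.
    measure: function (event, markName) {
      var m = { event: event, mark: markName, elapsedMs: now() - marks[markName] };
      measures.push(m);
      return m;
    },
    // everything collected so far, ready for an immediate or batched submit
    report: function () {
      return measures.slice();
    }
  };
}
```

Because a mark is just a stored timestamp, any number of measures can be taken against the same mark, which is the "one mark : many measures" relationship.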
misc:
  • immediate or batch submits
  • default browser event measurements
What is available today?:
  • code.whitepages.com (project link + slides)
  • Open Source. Apache 2.0 license
  • Firebug plugin by Bill Scott of Netflix
What's planned?:
  • more database support
  • automated configuration
  • auto-timeout tools
Great work, Scott!


KITE: Keynote Internet Testing Environment Launch at Velocity

Better late than never. At Velocity Conference I took a lot of notes and didn't get to publish them earlier. Now that I have some time on my hands, I am going to go ahead and publish them.

At Velocity, there will be two product launches. Vik Chaudhary (Keynote Systems, Inc.) and Abelardo Gonzalez (Keynote Systems) are on stage now.

The first product is KITE (Keynote Internet Testing Environment). Keynote has been in business for 12 years. Providing a single performance testing environment for everyone (web developers, QA and system administrators) has always been a challenge for both startups and major players. This presentation sounds very promising.
  • Keynote collects 200 million internet measurements everyday.
  • With KITE, you can test from the desktop to the internet cloud.
  • Keynote built it to provide a single performance testing environment for web developers, QA and IT operations team.
  • Sites are making upgrades every single day. Customers want to be able to test it from multiple locations

KITE: Keynote Internet Testing Environment

KITE is ideal for 3 things:
  1. recording , scripting and playback of web transactions
  2. instant web tests from desktop
  3. scheduled testing for higher productivity
KITE Web Performance Engine allows for:
  • performance analysis for multi-page transactions, not just single pages
  • Javascript programmability for scripting and DOM analysis
  • testing from the desktop, Last Mile and the Internet Cloud
  • Native IE integration, which in turn allows for easy analysis of AJAX, Flash and Javascript
  • Recording of test scripts which can be played back in "burst" mode and share scripts
Next is a demo by Abelardo Gonzalez.
  • You get a script repository that you can save for performance benchmarks.
  • There is a record button and a free global test button.
  • They will be testing iGoogle.
  • You start a session then as you take actions, they are recorded in the console.
KITE: Tests from 5 Global Cities
KITE helps with three main things:
  1. user experience time
  2. network time
  3. cache network time (for repeat visitors)
You can drill down very deep into each page and each performance metric. Tests from the Global Network are also possible, where you can replay exactly the steps you made on your desktop from all supported areas (e.g. Atlanta).

KITE
  • early adopter program
  • starts in August 2008
  • signup at http://kite.keynote.com
  • KITE is FREE!
Great product. Definitely worth checking out.


Monday, June 23, 2008

Green Data Centers

Next up is Bill Coleman (Cassatt Corporation), who is the B in BEA. He is also credited for his work on Solaris. Currently he is CEO of Cassatt Corporation. The talk is about Green Data Centers.

  • What we are doing today in data centers is unsustainable. He calls them 'your father's data center'
  • Concerns
    • first is energy cost
    • second is operations cost. IDC says it has gone from 25% to 75%.
  • everything is a lot more complex today than it was 15 years ago.
  • how did we get here? This is a consequence of innovation. In 1990, people were putting networks in data centers. Then came storage, followed by software people who wanted multi-tiered applications. Then came DBAs :)
  • Then came virtualization. Is it the end of IT? We are still doing things as if it were the 1960s. There is no automation involved; everything must be changed physically.
  • We are at the end of the sustainability of data centers as we know them today.
  • Virtualization makes scale a little bit better. All we are doing is pushing back the ends.
  • 1.0 of cloud: i can build a green field application with proprietary
  • 2.0 of cloud: functions of PC now exist in cloud. it will still be proprietary.
  • Apple invented PC but didn't commoditize it.
  • Very low utilization rates. The next phase of cloud computing will offer higher utilization rates.
Thanks Bill for a great insight into green data centers.


Velocity Conference - I'm Speaking

Velocity Conference is about to start.

I arrived here last night after a very long flight and am sitting with Anthony Lopez and Dan White of Cafe Mom. Steve Souders (Google) and Jesse Robbins (O'Reilly Radar) are now on the stage. They are talking about how Velocity got started. They completed a five-minute speech in two (hey, it's a performance conference).

If all goes well, I will be blogging about several sessions at Velocity.

I am on a panel tomorrow, Success: A Survival Guide, along with Adam Jacob (HJK Solutions), Shayan Zadeh (Zoosk, Inc. ), Brian Moon (dealnews.com), Don MacAskill (SmugMug), John Allspaw (Flickr (Yahoo!)) and Michael Halligan (BitPusher, LLC).

Overall the conference looks great. Jesse and O'Reilly have done a great job putting a very nice schedule together and there are a lot of brains here.



    © 2006 The Mashraqi's.