[ This is my personal blog, so all opinions expressed here are mine. I am a product, scalability, operations and monetization advisor, currently employed as Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month. I was a panelist for the Scaling Up or Out keynote at the MySQL Conference, and I speak regularly at conferences and user groups.
]
After the break, the next session is Energy Efficient Operations: Some Challenges and Opportunities. Luiz Barroso from Google is the presenter. I arrived a couple of minutes late because I had to pick up my charger.
Server electricity usage in perspective:
worldwide electricity usage of servers is around 1% of total electricity consumption.
usage doubled between 2000 and 2005
could increase by 40%-76% by 2010.
PC energy consumption is likely higher:
installed base for servers in 2005 - 27M
installed base for PCs in 2005: 870M
Measuring computing energy efficiency
harder for computers than for refrigerators
efficiency = work done / energy used = computing speed / power
biggest thing you can do for energy efficiency is write fast code. it can have really big impact.
from measurement standpoint, it is useful to break down the energy efficiency/budget equation
breaking it down:
efficiency = (work done / energy used in chips) * (energy used in chips / energy provided to servers) * (energy provided to servers / energy entering the building)
first: computing efficiency
second: server efficiency
third: datacenter efficiency or 1/PUE (power usage effectiveness)
Energy efficiency opportunities:
datacenter energy efficiency
LBNL survey of 24 facilities shows avg PUE of 1.83
underutilized data centers
wasted power provisioning investment
makes cooling and power distribution less efficient
server energy efficiency
typical server power supplies dissipate 25% of total energy
DC-to-DC voltage regulators can lose another 25%
computing efficiency
servers have poor energy efficiency in their most common usage range
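To get a feel for how these losses stack up, here is a rough sketch of my own (not from the talk) that chains the datacenter and server factors from the breakdown above. Treating the two 25% losses as independent multiplicative stages is my assumption, and the function name is made up for illustration.

```python
# Hypothetical worked example: multiply the datacenter and server efficiency
# factors from the talk's breakdown. Treating the power supply and voltage
# regulator losses as independent multiplicative stages is an assumption.

def delivered_fraction(pue, psu_loss, regulator_loss):
    """Fraction of the energy entering the building that reaches the chips."""
    datacenter_eff = 1.0 / pue  # energy provided to servers / energy entering the building
    server_eff = (1.0 - psu_loss) * (1.0 - regulator_loss)  # energy in chips / energy to servers
    return datacenter_eff * server_eff

# Figures quoted in the talk: average PUE of 1.83, 25% lost in the power
# supply, and another 25% lost in the DC-to-DC regulators.
fraction = delivered_fraction(pue=1.83, psu_loss=0.25, regulator_loss=0.25)
print(f"{fraction:.1%} of building energy reaches the chips")
```

Under those assumptions, less than a third of the energy entering the building does any computing, and that is before computing efficiency is considered.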
Plan for today:
datacenter efficiency
the power provisioning efficiency: What can you achieve if you utilize all energy in your data center.
two key energy related costs:
10 year energy costs ($9/watt)
cost of building a datacenter ($10-22/watt)
Facility costs are as important as energy consumption costs.
The cost of building out a datacenter can be larger than the cost of the energy itself.
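Here is a small sketch of why underutilized provisioning hurts, using the per-watt figures above. The cost model is my own simplification, not something presented in the talk: I assume the build cost is paid for every provisioned watt while the 10-year energy cost is paid only for watts actually drawn. The function name is hypothetical.

```python
# Hypothetical cost model: build cost is paid per provisioned watt, energy
# cost per watt actually drawn. Both are simplifying assumptions.

def cost_per_used_watt(build_cost_per_watt, energy_cost_per_watt, utilization):
    """Ten-year cost for each watt the facility actually draws."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return build_cost_per_watt / utilization + energy_cost_per_watt

# Low end of the quoted build cost ($10/watt) and the quoted energy cost ($9/watt).
full = cost_per_used_watt(10.0, 9.0, utilization=1.0)
half = cost_per_used_watt(10.0, 9.0, utilization=0.5)
print(f"fully used: ${full:.2f}/W, half used: ${half:.2f}/W")
```

In this simplified model, a facility that draws only half of its provisioned power pays twice the build cost for every watt it uses.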
Efficiency provisioning playbook:
consolidate workloads into the minimum number of machines needed for peak usage requirements
smart scheduling or virtualization help here
measure actual power usage of devices
nameplates lie!
study activity trends and investigate the oversubscription potential
the subject of our ISCA 07 article
Six month power monitoring study at Google (ISCA 07)
Basic setup
model based power monitoring scheme
measure usage statistics at rack, PDU and cluster levels
4 different workloads over 5k servers
More servers leads to higher oversubscription potential.
Safely oversubscribing power
oversubscribe at the datacenter level, not at the server or rack level
profile power usage of applications: learn what to expect
mix workloads
manage overload
provision a sizeable 'best effort' workload; victimize it first
use applications with QoS stack
good news: time constants to react are long
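The "victimize the best-effort workload first" idea could look something like the sketch below. This is my own illustration, not Google's implementation; the Job type, the job names, the wattages and the largest-first shedding order are all made up.

```python
from dataclasses import dataclass

# Hypothetical sketch of overload management: when the measured draw exceeds
# the power cap, pause best-effort jobs first. All names and numbers are
# invented for illustration.

@dataclass
class Job:
    name: str
    watts: float
    best_effort: bool

def shed_load(jobs, cap_watts):
    """Pause best-effort jobs (largest draw first) until the total fits the cap.

    Returns (kept, victims, total_watts). If pausing every best-effort job
    is still not enough, total_watts stays above the cap.
    """
    total = sum(job.watts for job in jobs)
    kept = list(jobs)
    victims = []
    candidates = sorted((j for j in jobs if j.best_effort),
                        key=lambda j: j.watts, reverse=True)
    for job in candidates:
        if total <= cap_watts:
            break
        kept.remove(job)
        victims.append(job)
        total -= job.watts
    return kept, victims, total

jobs = [
    Job("search-frontend", 400.0, best_effort=False),
    Job("ads-serving", 300.0, best_effort=False),
    Job("batch-indexing", 250.0, best_effort=True),
    Job("log-crunching", 150.0, best_effort=True),
]
kept, victims, total = shed_load(jobs, cap_watts=900.0)
print([job.name for job in victims], total)
```

Because the time constants are long, a simple periodic check like this would probably be enough; nothing here needs to react in milliseconds.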
Energy-proportional computing: (An article was published in December of last year)
look at datacenter as a device you have to lower power for
he calls the datacenter: a land-held
CPU activity distribution over six months (graph)
real production systems don't run full blast all the time.
systems run 10% to 50% of their full capacity most of the time.
fraction of time these servers are doing nothing is very small.
A datacenter and a laptop are indeed different
Characteristics of well designed internet services:
high performance and high availability require
load balancing and wide data distribution -> no useful idle intervals, lots of low activity intervals
example: Google file system:
replicas distributed across multiple machines
reads are load balanced across replicas; writes need to reach all of them.
Key implications:
sleep or power-down strategies are much less useful in servers
focus on energy efficiency at peak performance is misguided
Power varies with the amount of activity in servers. Even when a machine is completely idle, it still draws roughly half of its peak power. At 1/3 of peak load, power efficiency is halved.
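Those two numbers are consistent with a simple linear power model, which I sketch below. The linear shape is my assumption (the talk showed measured curves), and the function names are made up.

```python
# Hypothetical linear power model: a server draws idle_fraction of its peak
# power at zero load and ramps linearly to peak power at full load.

def relative_power(utilization, idle_fraction=0.5):
    """Power draw as a fraction of peak power."""
    return idle_fraction + (1.0 - idle_fraction) * utilization

def relative_efficiency(utilization, idle_fraction=0.5):
    """Work per unit of energy, relative to running at full load."""
    return utilization / relative_power(utilization, idle_fraction)

for u in (0.1, 1 / 3, 0.5, 1.0):
    print(f"load {u:.0%}: power {relative_power(u):.0%} of peak, "
          f"efficiency {relative_efficiency(u):.0%} of peak")
```

At one third of peak load the model draws two thirds of peak power, so efficiency is exactly half of what it is at full load. Setting idle_fraction to 0 in the same model gives a machine whose efficiency stays constant across the whole load range.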
Energy-proportional computing: (the idea)
no work, no power consumed
some work, some power consumed
lots of work, lots of power consumed
That would be the end of power management software.
What if we could build machines with a wide activity range? He shows a graph.
Estimated impact of energy proportionality is quite huge based on another graph.
Conclusion:
write fast code!
the software engineer's biggest contribution to energy efficiency
Last session before the break is "Innovation That Drives Opportunity for the Web Infrastructure" by John Fowler (Sun Microsystems). John is responsible for hardware at Sun.
Applications are built in different ways.
Three things Sun is working on:
Computing
Open Storage
focusing on $/performance
Networking
huge bandwidth
He is talking about Web 2.0 architectures. The software running today wasn't there 10 years ago. Almost everyone is horizontally scaled which brings up a host of technology issues.
Sun's Web 2.0 kit: a set of performance and benchmarking applications. Sun will be open sourcing this and other tools. The tools tested are web/app server, cache layers, database and storage.
It's driving you crazy:
power, heat, space
scale
understanding the infrastructure
performance
Compute:
relatively straightforward
clock rates not going up. everyone scaling horizontally
lower memory latency
how can you have a high degree of concurrency
Cores and threads are on the move. Sun is working on 16 cores per socket. Future is higher and higher degrees of concurrency.
Open Storage (Servers + Storage + OpenSolaris):
built on OpenSolaris
performance of ZFS and SSDs
cost efficiency of volume hardware
scale easily
millions of files
gigabytes / sec
management simplicity
analytics with DTrace
diagnostics with FMA
Why Applications Don't Perform:
Waiting for DATA
Future: Enterprise SSD:
up to 5,000 - 8,000 write IOPS
up to 30,000 to 40,000 read IOPS
32 GB
$ per IOPS $0.08 compared to $2.43 traditionally.
New generation of flash is quite reliable and has no moving parts. Power consumption of SSDs is 2 watts compared to 13 watts for traditional HDD.
ZFS Hybrid Storage Pool Model:
High Performance Read and Write Cache Pool
ZFS combines main memory and SSDs for read caching
Bernard Kerr, the designer for Delicious, talks about threads, tags and visualizations at Innovations Forum. It's a very interesting presentation covering, among other things, IBM's Remail, thread arcs, and Tag Orbitals.
Some lessons he wants the audience members to take home:
- When you are visualizing multidimensional data, you cannot use all dimensions; focus on one or two. If you try to show more than that, you'll end up with spaghetti.
- Get real data into the visualization as soon as you can.
- Try Illustrator for quick visualizations.
Animoto is experiencing YouTube and Facebook style growth. The difference is that Animoto actually has a proper business model!
Move over, Slide and RockYou! Here comes Animoto with a totally kick-ass slide show experience. They take your photos and your music and do magic to create slide shows that "feel" the music.
To sign up and save $5, click on the graphic above with code: oxqeiaug
Animoto, which is currently experiencing explosive growth and went from 25,000 users to 250,000 users in just three days, runs on Amazon EC2 and uses S3 for storage.
A demo of Animoto produced slide show is below:
At Startup School 2008, Jeff Bezos, Amazon CEO, talked about Animoto. The video of his talk follows:
According to RightScale, which Animoto uses to manage its EC2 instances, Animoto has been adding 40 EC2 instances per minute during peak times.
The upshot is that there are a lot of moving parts! Each one of the subsystems consists of many servers and everything needs to scale-up as the load increases. What Animoto CTO Stevie Clifton did really well is to connect all the operations using queues, many of them in SQS. One queue contains work items that list photo URLs to fetch from other sites, such as Facebook, Flickr, etc., and that is processed by one array of worker instances. Another queue has the list of render jobs and each work item in there points to the set of photos sitting at the ready in S3 and at the music files also on S3. All of these queues are held in Amazon SQS and the arrays of worker instances are managed by RightScale. This allows the monitoring part of our service to detect when the queue gets too large and more instances need to be launched. What’s nice about using queues is that it decouples the various parts of the site, so if the renderers get backlogged the queue simply builds up and users have to wait a little longer for their video to be produced. Waiting is not good, but dropping requests on the floor is much worse!
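To make the queue idea concrete, here is a minimal sketch of the scaling decision. It is my own illustration, not RightScale's or Animoto's code: I use an in-memory deque as a stand-in for an SQS queue, and the function name, the bucket path, the throughput figure and the drain target are all hypothetical.

```python
import math
from collections import deque

# Hypothetical sketch: size a worker array from the backlog in a job queue.
# An in-memory deque stands in for the real queue; no AWS calls are made.

def workers_needed(queue_depth, jobs_per_worker_per_min, target_drain_min,
                   min_workers=1, max_workers=1000):
    """Workers required to drain the current backlog within the target time."""
    capacity_per_worker = jobs_per_worker_per_min * target_drain_min
    needed = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))

# Producers only enqueue work items; they never talk to the renderers directly.
render_queue = deque()
for i in range(1200):
    render_queue.append({"photos": f"s3://example-bucket/photo-set-{i}"})

# The monitor looks only at queue depth to decide how many renderers to run.
target = workers_needed(len(render_queue),
                        jobs_per_worker_per_min=2, target_drain_min=10)
print(f"backlog of {len(render_queue)} jobs -> run {target} render workers")
```

The point of the design is visible even in this toy version: if the renderers fall behind, the queue just grows and the monitor asks for more workers, while the producers carry on unaffected.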
The RightScale blog also touches on some of the important lessons learned from Animoto's growth:
First of all, when you scale 10x and then 10x again to run on thousands of servers every little problem turns into a large one. That insignificant error rate of 0.1% gets multiplied by 1000x per second and you end up with an error a second, and actually, the error rate typically increases in itself too because of the added load on the system. So suddenly it’s not something you can ignore anymore. An example for this was having exponential backoff for uploads to S3 when using curl, but forgetting that the 5th retry exceeds the S3 connection timeout. Normally, this happens only once in a blue moon, but when tens of uploader instances are banging hard on one S3 bucket the S3 error rate goes up a bit and suddenly uploads are failing left and right. Once we changed this to a constant retry timeout it all went smooth again.
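Here is a small sketch of both effects described above. The numbers are my own assumptions for illustration (1,000 requests per second, a 1 second base delay that doubles each time, and a 15 second timeout); the post does not give the real values, and I am reading "the 5th retry exceeds the timeout" as the wait before the fifth retry being longer than the timeout.

```python
# Hypothetical numbers throughout: the original post does not give the base
# delay, the backoff factor, the request rate or the actual S3 timeout.

def exponential_delays(base_s, retries, factor=2.0):
    """Wait before each retry when the delay grows by `factor` every time."""
    return [base_s * factor ** attempt for attempt in range(retries)]

def first_retry_exceeding(delays, timeout_s):
    """1-based index of the first retry whose wait exceeds the timeout, or None."""
    for attempt, delay in enumerate(delays, start=1):
        if delay > timeout_s:
            return attempt
    return None

# A 0.1% error rate at an assumed 1,000 requests per second.
errors_per_second = 0.001 * 1000
print(f"{errors_per_second:.0f} error(s) per second")

backoff = exponential_delays(base_s=1.0, retries=5)
print(backoff, "-> first retry past the timeout:",
      first_retry_exceeding(backoff, timeout_s=15.0))

# The fix described in the post: a constant wait between retries.
constant = [2.0] * 5
print(constant, "-> first retry past the timeout:",
      first_retry_exceeding(constant, timeout_s=15.0))
```

With these assumed values the first four waits stay under the timeout and only the fifth goes over, which is exactly the kind of bug that stays hidden until retries become common under load.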
If you are passionate about calculus, you should watch "I will Derive." Although I should warn you that this is one of the dorkiest dances you'll see. Enjoy!
Part of my responsibility at my current job is to lead the database team that manages the largest and most active MySQL on Solaris 10/Sun hardware deployment in the world. We have been using Sun hardware to deploy MySQL since the time when most MySQL database experts used to frown on this combination.
Of course, Sun's acquisition of MySQL made me really, really happy.
I was recently interviewed by Sun Microsystems about our Sun and MySQL usage and what I find so interesting about Sun's UltraSPARC T2 (Niagara 2) processors. At work, I am in the process of replacing most of our database servers with CMT (chip multi-threading) enabled Niagara 2 servers. The video of my discussion is embedded below. Enjoy.
This year, I was part of a keynote panel, Scaling Up or Out, at the MySQL Conference and Expo along with Jeff Rothschild of Facebook/Accel Partners and Paul Tuckfield of YouTube. More than 2,000 people attended the event.
Keynote Panelists from left to right: Monty Taylor (MySQL), Matt Ingenthron (Sun), John Allspaw (Flickr), me, Domas Mituzas (MySQL/Wikipedia), Jeff Rothschild (Facebook) and Paul Tuckfield (YouTube)
Sheeri took a video of the keynote session that can be viewed below.
To read more about this scalability keynote, view pics and to watch more videos, see my Scaling Up Or Out Keynote post.
Reid Hoffman, Chairman and President, Products at LinkedIn, talks in this business and marketing keynote, presented at Graphing Social Patterns 2007, about the Business of Facebook Applications.
He addresses five themes:
Social Networks and Platforms
What makes social networks platforms? What sets Web 2.0 social networks apart? Is social networking a feature, an application, or a platform? He talks about Friendster being referred to as Match.com with a friends list added on top, which is why it was considered a feature. He credits MySpace's allowing users to "hack in 'widgets'" as the start of the social network as a platform. He clarifies that, as he understands it, this wasn't a 'deliberate design decision' by MySpace but rather an 'artifact of their platform,' which didn't turn off or filter JavaScript. This accident led to a 'robust ecosystem' through which users included rich media and more. Ning then created a 'different conception of social networking as a platform,' allowing users to build any kind of social network. Facebook then 'launched the first platform on a large social graph.' Facebook applications could then rely on that social graph.
He believes that social networks are platforms and has invested in several different social networks.
What makes a social network interesting? What creates a robust and interesting environment? "The key thing is that a social network takes patterns of important relationships we have in real life and then puts them on the web in ways that empowers important applications." People still care more about their offline world so if you can "import the relationships that matter to people here and make them available to [either] generate thin or light app yourselves which most of these networks do or provide them as a platform for other people, it actually enables applications that can really change people's lives."
The key elements 'from a sociological perspective' of Facebook's platforms are "extending functions of profiles, communications and messaging and the newsfeed." He goes on to say that Facebook's newsfeed is an example of many to many messaging. Then, you can 'integrate general web applications with data, relationships and communications'
He then goes on to compare and contrast several social networks, including Facebook, MySpace and Ning. Facebook offered developers a massive social graph that allows them to acquire customers, leverage key relationships and leverage an existing communication scheme. With Ning, although you can build your own social network feature-wise, you are also left to build your own user base. For developers, the key thing was that if they built something, Facebook could get them in front of millions of potential customers. As an example of leveraging the existing communication scheme, most Facebook apps have spread through messaging, email or the newsfeed.
The remaining four themes of the video:
- Social Networks and Professional Networks
- One graph to rule them all?
- Facebook Platform: some of the opportunities enabled
- What does the social platform mean for the evolution of the web?
Ever since seeing Davey's videos on Vimeo, I've been a fan. Today Sox, my friend who now works at Vimeo, passed along a link to Davey's NYC MTA dance. You gotta see it to get a sense of how much fun it is.