mashraqi

+1.408.FRANKMASH (408.372-6562)
> scalability

[ This is my personal blog so all opinions expressed here are mine. I am a product, scalability, operations and monetization advisor and currently employed as Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month. I was a panelist for the Scaling Up or Out keynote at the MySQL Conference and speak regularly at conferences and user groups. ]
Farhan "Frank" Mashraqi

Tuesday, September 23, 2008

Startonomics: Startup Scalability Strategies

12 Tips for Building a Scalable Startup

Since scalability is considered a non-functional requirement, it is often overlooked in the hopes of decreasing time to market. Deferring scalability work can shorten the time to market, but only by taking on significant technical debt.

Balancing performance and scalability vs. fast iteration and cost efficiency can be a significant challenge for startups. The good news is that achieving this balance is not impossible.
Here are a few tips to help you build a scalable startup.

1. Scalable startups start with the right foundation:
A scalable foundation means that as you experience growth, you can throw hardware at the problem in a cost-efficient way giving you the most bang for your buck. Scaling up can become very expensive very quickly.

2. Scalable startups choose the right language and platform:
Choosing the right development language is critical to building a scalable startup. If you choose a platform that gets the product out the door quickly but is inherently un-scalable, you have made the wrong bet.

3. Scalable startups invest in the right talent:
Also important is being able to secure the right talent. This goes hand in hand with choosing the right language. Choosing a bleeding edge platform that’s more developer friendly than product friendly can create difficulties down the road when you need to find talent to help scale it.
Hiring or seeking advice from an architect in the early days can help save significantly down the road.

4. Scalable startups worry about the right things:
When building your startup, you should worry more about the foundation than sweating over performance increases.

5. Scalable startups stay away from building synchronous coupling:
The main problem with synchronous coupling is that it makes users wait and ties up system resources that would have otherwise been released. In addition, synchronous operations require infrastructure to be scaled for peak load.
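One way to picture the alternative to synchronous coupling is a work queue sitting between the request handler and the slow operation. The sketch below is a minimal, in-process illustration using Python's standard `queue` and `threading` modules. The names `handle_request` and `worker`, and the welcome-email job, are hypothetical; a real deployment would more likely put a separate message broker between the two sides.

```python
import queue
import threading

jobs = queue.Queue()   # hypothetical in-process stand-in for a message queue
sent = []              # records the work the background worker completed


def handle_request(user_id):
    """Enqueue the slow work and return right away so the user is not kept waiting."""
    jobs.put(user_id)
    return "accepted"


def worker():
    """Drain the queue in the background at the pace the backend can sustain."""
    while True:
        user_id = jobs.get()
        try:
            if user_id is None:  # sentinel value: stop the worker
                return
            sent.append(f"welcome email for user {user_id}")
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
statuses = [handle_request(uid) for uid in (1, 2, 3)]
jobs.put(None)   # ask the worker to stop once the backlog is drained
jobs.join()      # wait here only so the example finishes deterministically
```

Because the handler only enqueues, a burst of requests grows the backlog instead of tying up request-serving resources, so the worker side can be sized for average load rather than peak load.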

6. Scalable startups worry more about constant response time than latency:
Low latency is important but constant response time is even more important. The latter can be achieved by building a foundation that allows for horizontal growth and distribution of load.
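To make the difference between low latency and constant response time concrete, the sketch below compares two sets of response times. The numbers are invented for illustration, and `percentile` is a small helper written for this example rather than a library function.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct% of n, in integer math
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, 100 requests each.
spiky = [50] * 95 + [2000] * 5   # usually fast, occasionally very slow
steady = [80] * 100              # slower on a typical request, but constant

for name, samples in (("spiky", spiky), ("steady", steady)):
    print(name,
          "median:", statistics.median(samples),
          "p99:", percentile(samples, 99),
          "stdev:", round(statistics.pstdev(samples), 1))
```

The spiky service wins on the median but loses badly at the 99th percentile, which is the kind of inconsistency users tend to notice.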

7. Scalable startups measure utilization first and then performance
Measuring utilization of resources is crucial as it can indicate stress on a system and to what extent resources are being utilized. Measuring performance only helps in being able to speed up response time and is not a qualified measure of ability to handle additional growth.
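As a rough worked example of why utilization says more about growth headroom than response time does, the sketch below applies the utilization law (utilization = arrival rate × service time, divided by the number of servers). The traffic figures and function names are assumptions made up for this illustration.

```python
def utilization(arrival_rate_per_s, service_time_s, servers=1):
    """Utilization law: fraction of capacity in use."""
    return arrival_rate_per_s * service_time_s / servers


def max_throughput(service_time_s, servers=1):
    """Requests per second the servers could sustain at 100% utilization."""
    return servers / service_time_s


# Hypothetical numbers: 200 requests/s, 4 ms of server time per request.
one_server = utilization(200, 0.004)              # about 0.8: fast, but little headroom
two_servers = utilization(200, 0.004, servers=2)  # about 0.4 after adding a server
ceiling = max_throughput(0.004)                   # about 250 requests/s on one server
```

A 4 ms response looks healthy on a performance dashboard, yet at roughly 80% utilization the single server in this example can absorb only about 25% more traffic.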

8. Scalable startups go stateless
Maintaining state is expensive. While it is possible to build an infrastructure that can maintain state using a shared system, it is best for scalable startups to build stateless applications.
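One common way to keep application servers stateless is to have the client carry a signed token, so that any server can verify it without looking up a shared session store. The following is a minimal sketch using Python's standard `hmac` module. `SECRET`, `issue_token` and `verify_token` are hypothetical names, and a production token would also need an expiry time and key rotation.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # assumption: shared by every app server


def _sign(user_id):
    return hmac.new(SECRET, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user_id):
    """Return a token that carries the user id plus a signature over it."""
    return f"{user_id}.{_sign(user_id)}"


def verify_token(token):
    """Return the user id if the signature checks out, otherwise None."""
    user_id, _, signature = token.rpartition(".")
    expected = _sign(user_id)
    # Compare as bytes in constant time to avoid leaking where the mismatch is.
    if hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None


token = issue_token("alice")
```

Any server holding the secret can call `verify_token(token)`, so requests from the same user do not have to land on the same machine.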

9. Scalable startups virtualize/abstract everything
Abstracting development modules helps keep things manageable. Among other things it allows developers with various skill sets to work on their own layers. For instance, by abstracting the layer that implements and manages database partitioning, you can hide the “ugly” partitioning (sharding) details from the application. This makes it easier to migrate and move logical shards within physical shards.
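The partitioning layer described above could look something like the sketch below: keys hash to a fixed set of logical shards, and a separate table maps each logical shard to a physical host. The class name `ShardRouter`, the host names, and the choice of CRC32 as the hash are all assumptions for illustration.

```python
import zlib


class ShardRouter:
    """Hides the key -> logical shard -> physical host mapping from the application."""

    def __init__(self, logical_shards, placement):
        self.logical_shards = logical_shards
        self.placement = dict(placement)  # logical shard id -> physical host

    def logical_shard(self, key):
        # A stable hash keeps a key on the same logical shard across processes.
        return zlib.crc32(key.encode("utf-8")) % self.logical_shards

    def host_for(self, key):
        return self.placement[self.logical_shard(key)]

    def move(self, logical_id, new_host):
        """Repoint one logical shard at a different physical host."""
        self.placement[logical_id] = new_host


router = ShardRouter(4, {0: "db-a", 1: "db-a", 2: "db-b", 3: "db-b"})
shard = router.logical_shard("user:42")
router.move(shard, "db-c")  # the application keeps calling host_for() unchanged
```

Because the application only ever calls `host_for`, moving a logical shard to new hardware is a change to the placement table rather than to application code. Copying the data itself is outside the scope of this sketch.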

10. Scalable startups build using APIs
APIs are an essential part of building startups as they allow for growth opportunities. Startups that are built using APIs find it easier to scale development and deployment and to open up their service.

11. Scalable startups know which shift they belong to:
Within the world of scalability, you are either a redshift startup or a blueshift startup. Redshift startups grow faster than Moore’s Law and eventually need to be scaled out. The absolute number of hardware components for redshift startups will continue to increase. Blueshift startups grow at the rate of Moore’s Law or slower and as a result, the infrastructure required for them will continue to shrink. Building an infrastructure that can scale out is highly recommended for redshift startups.
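The redshift/blueshift distinction can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that per-server capacity doubles every 24 months and that the startup begins with 10 servers. Neither figure comes from the post, and `servers_needed` is a hypothetical helper.

```python
import math


def servers_needed(months, load_doubling_months, initial_servers=10,
                   capacity_doubling_months=24):
    """Servers required after `months` if load and per-server capacity both grow exponentially."""
    load_growth = 2 ** (months / load_doubling_months)
    capacity_growth = 2 ** (months / capacity_doubling_months)
    return math.ceil(initial_servers * load_growth / capacity_growth)


redshift = servers_needed(48, load_doubling_months=12)   # load outpaces hardware
blueshift = servers_needed(48, load_doubling_months=48)  # hardware outpaces load
```

Under these assumptions the redshift startup goes from 10 servers to 40 in four years, while the blueshift startup drops from 10 to 5.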

12. Scalable startups cache everything effectively
One of my favorite quotes is “the best IO is no IO,” meaning essentially that the best way to serve data is when you don’t have to hit the disk. This can be achieved easily by effectively caching everything that can benefit from it.
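A minimal sketch of the idea, using the cache-aside pattern: check the cache first and fall back to the slower store only on a miss. The in-process dictionary here is a stand-in for a real cache such as memcached, and `CacheAside` and `slow_lookup` are hypothetical names.

```python
class CacheAside:
    """Serve reads from memory when possible; hit the backing store only on a miss."""

    def __init__(self, load_from_backend):
        self._load = load_from_backend
        self._cache = {}
        self.backend_reads = 0  # counts how often we paid for real IO

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.backend_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the underlying row changes."""
        self._cache.pop(key, None)


def slow_lookup(key):
    # Placeholder for a database query or disk read.
    return key.upper()


cache = CacheAside(slow_lookup)
first = cache.get("profile:7")
second = cache.get("profile:7")  # served from memory, no backend read
```

The second `get` never touches `slow_lookup`. The cost of that saving is having to call `invalidate` whenever the underlying data changes.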


Thursday, September 18, 2008

Scaling Synchronous Web Apps: Lessons learned from Meebo

Sandy Jen, co-founder of Meebo, is on the stage talking about Scaling Synchronous Web Apps at Web 2.0 Expo. About 40 million unique people use Meebo, and the company has around 50 people.

The presentation is about tips and practices they have learned about how to scale now vs in 3 years. Scalability is very subjective.

the "hole"
- multi-platform (lots of browsers)
- spotty network connections
- there are only 2 simultaneous open HTTP requests allowed (for now)
- page views: Meebo uses 1 page so it is really hard to measure unless you do it on your own. On average a person stays on Meebo for two hours.
- static content
- no downloads

the "peg": the thing that you're trying to shove into the hole.
  • instantaneous data transfer: How do you deal with that?
  • long polling: browsers suck up memory
  • making the browser do work: Javascript requires the browser to do something. Trying to balance what you make the browser do and what you let the server do is really important.
  • seamless user experience: you're not going to use a product you don't like. Overall that's the thing you want to strive for.
What is Synchronous?: What part of your app is synchronous and what can you get away with NOT being synchronous? The more you try to dump into the synchronous category, the more issues you are going to be faced with.

She is more involved with the server side so the talk will be server focused.

Don't underestimate the server side architecture! In the start you won't know where the bottlenecks will be.

The type of application you are building determines the type of synchronous scaling you are going to need.

Peg helpers:
  • long polling (COMET): to keep connections open with Web servers
  • web servers: using the right technology. Meebo started with Apache, which turned out to be not very good. They tried lighttpd and it turned out to be a great help, as lighttpd is single-threaded and event-based.
  • compiled vs. interpreted: Meebo is written in C/C++. It's a tradeoff. Hiring is difficult for them.
  • databases: can be really expensive or cheap. They use MySQL. Their schemas are simple.
  • memcache:
  • load balancers: really expensive and you have to buy them in pairs.
Simple is usually better (unless you can dish out a lot of $$$). Ask yourself whether the data is cacheable. Can you use DNS round robin instead of load balancers? Meebo started out using www1, www2, etc., but that caused headaches for users whenever the server behind a bookmarked URL went down.

Can you use FastCGI vs. web modules vs. PHP? They ended up writing a module directly into the server.

Do you need to save state? Is it persistent? Meebo didn't have user accounts for a long time. Launching feature-light is not a bad thing.

Tug of war: There is always a delicate balance between what you can push to the browser and what you can let the server handle. Ask yourself where the workload makes sense. Browsers can be SLOW (just because you can do it in JavaScript doesn't mean you should). Meebo users mostly use IE, so it's optimized for that. Your users will sadly know more about your product than you do because they use it more. Also, don't ignore the importance of efficient data transfer.

Good enough vs perfect: Perfection is enough simplicity in the system to allow for adaptation. Release and iterate often.

Think ahead but not too much: You shouldn't over-design for the unknown, as over-designed code can be worse than hacky code: it is difficult to roll back an entire design. Think about building horizontally and see whether you can throw more servers at the problem initially. Adding servers is expensive but has a shorter lead time; architecture takes longer but is "free" (although development time is not free). Think about striking the right balance. You won't know where your bottlenecks are until you let people use it.

Nothing simulates real life: You're probably not the end user. Contingency plans are key. Keep a few tricks up your sleeve that you haven't used yet. In the end you'll always miss something, but that's ok.

Don't build flood gates, but instead build dams so you will be able to enable/disable components or change operating parameters during runtime. This includes front-end and back-end. When you roll something out to users, let them know that it may be buggy and you may need to pull it back.
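The "dams" idea maps naturally onto runtime switches. Below is a minimal sketch, assuming a single process and a hypothetical `Dams` class; in practice the switch state would live somewhere all servers can read, such as a config service or database.

```python
import threading


class Dams:
    """Hypothetical runtime switchboard: components can be opened or closed without a deploy."""

    def __init__(self, **defaults):
        self._lock = threading.Lock()
        self._open = dict(defaults)

    def is_open(self, name):
        with self._lock:
            return self._open.get(name, False)  # unknown components stay closed

    def set(self, name, is_open):
        with self._lock:
            self._open[name] = is_open


dams = Dams(file_transfer=True)


def send_file(filename):
    """Example component that checks its dam on every call."""
    if not dams.is_open("file_transfer"):
        return "file transfer is temporarily unavailable"
    return f"sending {filename}"
```

Calling `dams.set("file_transfer", False)` while the process is running turns the component off for the next request, which gives operators a way to pull a buggy feature back without a redeploy.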

In the end your users will behave in ways you never imagined.

Use your own product: don't be afraid to find bugs. You should overcome the fear of breaking your own product. It helps to keep your finger on the pulse of the community. Trust your users. Use blogs and scale customer service appropriately to stay in touch with users. 70% of Meebo's users use IE. Make sacrifices for your users and use what they are using. Go where your users are going.

Being Big Brother: You should be aware of what is going on and monitor key areas: zabbix, nagios, hyperic, ganglia, etc. Eventually you WILL have to build something custom. Don't go overboard on monitoring, or you will learn to ignore your alerts. Ignoring what your system tells you (not just the hardware) is extremely dangerous. The goal of monitoring is basically asking: "is your app healthy?"

Final thoughts: There's no magic solution to scalability. It's very important that you understand the nooks and crannies of your architecture. You should be able to correlate effects to any system changes you make. Always keep the goal in mind, don't lose sight of it. What are you scaling? Remember, everyone scales differently!


Wednesday, June 25, 2008

Harnessing Explosive Growth: Infrastructure Strategies and Tactics

The session I am now waiting for is Harnessing Explosive Growth: Infrastructure Strategies and Tactics. The official description of the session is:

What worked in the garage can rarely be scaled effectively for the boardroom. This panel will bring together some of the biggest names in web infrastructure to share their thoughts, insights and tactics for harnessing explosive growth, with a focus that goes beyond simply which technologies are available but how to best deploy them. This panel is not to be missed
Panelists include:
How much of scalability is architecture, and how much is throwing servers at the problem?

SJ: 99% architecture
JH: Product is what drives architecture. We have more than 10,000 servers. For chat, they built a separate network. www.facebook.com/eblog.

JR: The biggest consideration is how their servers work with Facebook.

When's it broken? When did you know your first architecture was broken?
AG: First million users. Then they started focusing on caching and sharding.

JB: eBay had a few catastrophic issues near 2000. People were very forgiving of availability issues. They give refunds when there is downtime.

JR: They moved a lot of their technology to EDGE.

SJ: Since Meebo is just one page, JavaScript delivery is a major issue for them. They are not generating pages. Dynamic loading and background computation are really important.

RP: Applications and infrastructure are going to break. If you look hard enough you can find where the scalability issues are.

JR: They are very metric-driven. It's not often they see outages. They have a number of Facebook's ops folks on their IM lists and they talk continuously.

JH: We work through problems together.

JH: They have had to turn off some applications because other applications were being affected.

JB: At eBay they have created a central application logging system. They can flag and identify problems really quickly. If you don't have it, you're shooting in the dark.

Rolling your own stuff? Off-the-shelf vs. custom:
Did you roll your own? Do you regret it?

SJ: For a cash-constrained startup, off-the-shelf can work. But for scale, you'd want to build it yourself. Open Source is awesome. No one can scale your system as well as you can. Off-the-shelf can be bulky. You have to get your hands dirty.

JR: We built our own caching backend. Invest time in core stuff, anything that's not core, don't focus there.

What else needs to scale as you grow? What non-technology things did you have to scale?

JB: Need to scale out your business as well as technology.

AG: Building anti-spam features into the product that are scalable.

JH: Make community part of the process in translations as you grow.

JR: They introduced user moderation for photos. It's hard to find what's porn and what's not. About a dozen people look at photos full-time to hunt down porn.

How should we handle the fallout? If you were Twitter what would you have done last month?
JB: You have to be transparent. Tell them what's going on. Setup message boards for communication. You've got to communicate.

AG: Setting realistic timelines is very important

JH: We do it often. We roll out in small chunks and if things don't look right, we roll back.

JB: You cannot operate a large system without the ability to turn things on and off.

JR: If you have an aggressive competitor, you don't have the luxury of downtime.

RP: You can't roll out something that can't be rolled back.




Thursday, May 29, 2008

Slides from my Memcached Webinar - 560+ registrants

Today, I presented my memcached webinar, which was hosted by MySQL/Sun. More than 560 users registered for the webinar. The topic of the webinar was Designing and Implementing Scalable Applications with Memcached and MySQL.


The recording and slides from the webinar are now available for download:
The call volume was quite high. I had to wait about 7-8 minutes before I could join as a panelist. I don't know why WebEx started using a live operator to verify users instead of using an automated system. That created quite a bottleneck and several users had to wait before they got connected. This is not something I expected from WebEx. My apologies to all users who experienced connectivity issues. Thanks to everyone who attended and sent feedback.

Overall, the webinar went great. Monty Taylor, my fellow panelist, did a great job. He, however, had a mishap and his line dropped. Jimmy Guerrero then picked up from that point on. Upon reconnecting, Monty generously answered attendees' questions. Many thanks, Monty and Jimmy!

More details on my MySQL blog.

Related links: memcached, mysql, sun.



    © 2006 The Mashraqi's.