mashraqi

+1.408.FRANKMASH (408.372-6562)

[ This is my personal blog so all opinions expressed here are mine. I am a product, scalability, operations and monetization advisor and currently employed as Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month. I was a panelist for the Scaling Up or Out keynote at the MySQL Conference and speak regularly at conferences and user groups. ]
Farhan "Frank" Mashraqi

Tuesday, April 21, 2009

MySQL at Google by Mark Callaghan

Mark Callaghan is taking the stage to present his keynote at the MySQL Conference and Expo, “This is not a web app: The continuing evolution of MySQL at Google.”

I am going to take notes as fast as I can. Excuse any typos etc.

Mark worked on DBMS internals at Informix and then at Oracle. He worked on embedding BerkeleyDB at a startup. He joined Google in August 2005. At Google his team is working to enhance MySQL and to support a large production deployment. He blogs at mysqlha.blogspot.com and has helped publish the Google patch for MySQL at code.google.com. In addition, he agitates for MySQL.

What is MySQL at Google?

He will give details but some numbers he won't give. It is a large MySQL deployment. The QPS rate is tremendous. The number of machines they use is reasonably large. MySQL is used in a large, important enterprise deployment. They run many commodity machines. Google depends on replication, InnoDB and stability. MySQL is sharded with many replicas per shard. At Google, database service must always be available. They have been successful with it and happy with the results.

The database itself is providing change management. If you just push changes, you are more than likely to have a debugging nightmare.

A number of replicas can be connected to MySQL without crashing the master. You'll be surprised at how many replicas can be deployed.

MySQL is solid and easy to improve. InnoDB from Heikki Tuuri and company is amazing. Inspiration provided by Yasufumi Kinoshita and Percona. InnoDB is the most beautiful database software Mark has worked on and he has worked for a few database companies.

Prehistory
  • MyISAM and another RDBMS were used.
  • Not much is known about that time.
  • Search for "xooglers mysql"
Recorded history begins
  • MySQL 4.0 and InnoDB arrive

Consistency matters most. When choosing between consistency and availability, you want to be consistent. You shouldn't have two servers claiming to be masters. Generally, the full schema is understood by few people.

Audit is a big concern. Who is making what change? Legacy is another concern.

Control is an issue. You have to show you can control access to the database.

Finally, the focus is on transactions; they don't want to lose any data.

Data quality is important to Google.

How do we build this?
A bad build ruins everything. He inherited a dedicated build machine. They moved to hermetic builds and cross-compilation fun and eventually learned to love autoconf.

How do we test this?
MySQL has a suite of regression tests but they are easy to pass. They have queries running in production, how can they use those? They sample queries in production using a Python script and then replay them to simulate sample production workloads.
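The talk did not show the script itself, so the following is only a minimal sketch of the sample-and-replay idea. The function names, the sampling rate and the captured statements are all hypothetical, and an in-memory SQLite database stands in for a test MySQL server.

```python
import random
import sqlite3


def sample_queries(log_lines, rate, seed=None):
    """Keep each non-empty captured statement with probability `rate` (0.0 to 1.0)."""
    rng = random.Random(seed)
    return [line.strip() for line in log_lines
            if line.strip() and rng.random() < rate]


def replay(queries, conn):
    """Run each sampled statement on a test connection; return (ok, failed) counts."""
    ok = failed = 0
    for sql in queries:
        try:
            conn.execute(sql).fetchall()
            ok += 1
        except sqlite3.Error:
            failed += 1
    return ok, failed


# Hypothetical captured workload; a real one would come from production.
captured = ["SELECT 1", "SELECT 2 + 2", "", "SELECT no_such_column"]
test_db = sqlite3.connect(":memory:")  # stand-in for a test MySQL server
print(replay(sample_queries(captured, rate=1.0), test_db))
```

Sampling keeps the captured volume manageable while still reflecting the mix of statements that production really sees, which is the point of replaying it against a new build.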

They built stress tests generally around replication. If you kill a slave, it can come back and start from where it left off.

Use valgrind

Eventually they realized that MySQL has valgrind support and started using it.

They also discovered the value of compiler warnings.

How do we deploy this?
The simple approach is to put it out there and hope for the best. Searching the error log files is automated. On a daily basis, crashes are categorized. Machines are automatically removed from service. Finally, they have automated the replacement of machines.
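As a rough illustration of the automated log search and daily crash categorization, here is a small sketch. The category names and the pattern list are my own assumptions for the example, not what Google uses; a real deployment would maintain its own list.

```python
import re
from collections import Counter

# Illustrative patterns only (assumed for this example).
CRASH_PATTERNS = [
    ("signal", re.compile(r"mysqld got signal (\d+)")),
    ("assertion", re.compile(r"InnoDB: Assertion failure")),
    ("out_of_memory", re.compile(r"Out of memory", re.IGNORECASE)),
]


def categorize(line):
    """Return the first matching crash category for an error-log line, or None."""
    for name, pattern in CRASH_PATTERNS:
        if pattern.search(line):
            return name
    return None


def daily_report(log_lines):
    """Count crash-related log lines per category."""
    counts = Counter()
    for line in log_lines:
        category = categorize(line)
        if category is not None:
            counts[category] += 1
    return counts
```

A report like this, produced every day per machine, is the kind of input that could drive the automatic removal of a machine from service.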

How do we monitor this?
He has a feature request: SHOW USER STATISTICS.

They archive the output of SHOW PROCESSLIST and SHOW STATUS, and they added SHOW USER_STATS and SHOW TABLE_STATS. It's amazing what you can do with awk and bash. They prefer to take a top-down approach to monitoring. They generate daily and weekly load reports, including QPS. QPS on critical servers was doubling every year. After deploying a better monitoring tool, they determined that the growth came from queries that weren't really crucial for those servers.
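To make the QPS report concrete: given two archived SHOW STATUS samples, the average query rate between them is the change in the Questions counter divided by the change in Uptime. The sketch below assumes the samples were saved as tab-separated name/value lines (the batch-mode output of the mysql client); the function names are hypothetical.

```python
def parse_status(text):
    """Parse tab-separated SHOW STATUS output into a dict of integer counters."""
    status = {}
    for line in text.splitlines():
        name, _, value = line.partition("\t")
        if value.isdigit():  # skips the header row and non-numeric variables
            status[name] = int(value)
    return status


def qps(earlier, later):
    """Average queries per second between two archived samples."""
    elapsed = later["Uptime"] - earlier["Uptime"]
    if elapsed <= 0:
        raise ValueError("server restarted or samples out of order")
    return (later["Questions"] - earlier["Questions"]) / elapsed
```

For example, samples with Questions 1000 and 4000 taken 60 seconds of uptime apart give 50 queries per second.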

How do we improve this?
Understand your problems and deploy what you build. If you are just building and not deploying, you are not going to learn the tradeoffs. Also, monitor to learn what the problems are.

Replication features added

At a high level, they are slowly moving towards self-healing.

At a low level, they have somewhat crash-safe slaves. They use a mirror binlog, which keeps a copy of the master binlog on the slave. Other features include semi-sync replication, binlog event checksums and global transaction IDs. They are currently working toward fully crash-safe slaves.

They have monotonically increasing global transaction IDs.
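The talk did not go into how the IDs are used, so this is only my assumption about why a monotonically increasing ID helps: comparing how far two slaves have progressed becomes a plain integer comparison, which is handy during failover. The names and numbers below are hypothetical.

```python
def most_advanced(replicas):
    """Return the name of the replica that has applied the highest global ID.

    `replicas` maps a replica name to the last global transaction ID it applied.
    """
    if not replicas:
        raise ValueError("no replicas")
    return max(replicas, key=replicas.get)


def missing_range(replica_id, new_master_id):
    """IDs a replica still has to fetch after being repointed to a new master."""
    return range(replica_id + 1, new_master_id + 1)


applied = {"replica-a": 10, "replica-b": 12, "replica-c": 11}
candidate = most_advanced(applied)
print(candidate, list(missing_range(applied["replica-a"], applied[candidate])))
```

Without a global ID, the same comparison needs per-server binlog file names and offsets, which do not line up across machines.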

Performance features added

high level
  • run (much) faster on multi-core and multi-disk servers

low level
  • faster read-write mutex for InnoDB
  • less contention for InnoDB mutexes
  • faster IO for InnoDB
  • faster parser
  • integration with tcmalloc
  • backport connection pool from MySQL 6 to 5.0.
Manageability features added

high level
  • make it easier to support MySQL
  • a UI is nice, but never enough; they focus more on the command line.
low level
  • SHOW USER_STATS, SHOW TABLE_STATS
  • archive and query SHOW PROCESSLIST samples
  • roles for authentication and access control
  • tools for monitoring servers.
Other features added
  • row-change logging
  • online data drift checking (based on Maatkit but with more features)
  • Flapulator (to look for crashes)
  • Query tester (runs queries in parallel and compares the results to ensure new binary is getting same results as old binary).
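The query tester idea can be sketched in a few lines. This is an illustration under my own assumptions, not Google's tool: `run_old` and `run_new` are hypothetical callables that execute a statement on the old and new binary and return a list of row tuples, and here they are faked with dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_binaries(queries, run_old, run_new):
    """Run each query on both servers at once; return the queries whose results differ."""
    mismatches = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for sql in queries:
            old = pool.submit(run_old, sql)
            new = pool.submit(run_new, sql)
            # Sort rows so ordering differences without ORDER BY are ignored.
            if sorted(old.result()) != sorted(new.result()):
                mismatches.append(sql)
    return mismatches


# Fake servers: each maps a statement to the rows it would return.
old_server = {"SELECT a FROM t": [(1,), (2,)], "SELECT COUNT(*) FROM t": [(2,)]}
new_server = {"SELECT a FROM t": [(2,), (1,)], "SELECT COUNT(*) FROM t": [(3,)]}
print(compare_binaries(list(old_server), old_server.get, new_server.get))
```

Sorting before comparing is a design choice: it avoids false alarms from row order, at the cost of not checking queries where order is part of the expected result.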

Row-change logging
  • keep MySQL and something else in sync
  • similar to row-based replication
  • fast parallel queries
  • change notification is possible
  • full audit log.
  • allows for real time replication
You can do batch extraction but then your data is stale.

Online data drift:
  • how do you compare continuously updated tables?
  • technique is similar to mk-table-checksum
  • deployment is more complicated
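The core of a checksum-based comparison is to split a table into chunks by primary key, checksum each chunk on both servers, and look only at the chunks that disagree. The sketch below shows just that core on two static copies; the function names and chunk size are assumptions, and it does not solve the hard part raised above, which is doing this while the tables keep changing.

```python
import zlib


def chunk_checksums(rows, chunk_size):
    """Map chunk number -> running CRC32 of the rows whose key falls in that chunk.

    `rows` is an iterable of (primary_key, value) tuples.
    """
    sums = {}
    for key, value in sorted(rows):
        chunk = key // chunk_size
        data = f"{key}:{value};".encode()
        sums[chunk] = zlib.crc32(data, sums.get(chunk, 0))
    return sums


def drifted_chunks(master_rows, replica_rows, chunk_size=1000):
    """Return the chunk numbers whose checksums differ between the two copies."""
    a = chunk_checksums(master_rows, chunk_size)
    b = chunk_checksums(replica_rows, chunk_size)
    return sorted(c for c in a.keys() | b.keys() if a.get(c) != b.get(c))
```

Working in chunks means a mismatch narrows the search to a small key range instead of forcing a row-by-row comparison of the whole table.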

More to life than software development
Engineers at Google
  • SRE: Site reliability engineer
  • SWE: Software engineer
  • SREs focus on running services in production.
MySQL has a new set of problems for SREs

Production crises
  • OOM v1
  • unstable kernel
  • yes we have no core files
  • we don't need no stack traces
  • underpowered slaves
  • database load growth
  • query storms
  • corrupt relay log events
  • failover
  • long lock waits
  • OOM v2
open problems

new features
  • online schema change
  • failover automation
  • frameworks to manage large deployments
  • synchronous replication
  • minimize replication delay on a loaded server

new hardware
  • performance on multi-core and many-core servers
  • performance with SSD
  • performance with too much RAM.
Soap box:
  • run a server in production
  • work at a large systems company
  • learn about a DBMS other than MySQL

out of time. :)

Thank you for sharing, Mark! This was an informative session (of course, it would be great to actually get some numbers but still ...)


Tuesday, March 17, 2009

Cloud Computing and Community One East

Community One East is happening this week and I will definitely be attending. The event is taking place at the Marriott Marquis Hotel, New York, NY. I am especially looking forward to the announcements tomorrow, which sound very interesting :). Unfortunately, I can't go into details about what Sun Microsystems is announcing.

The first day of Community One is a free event featuring. The second day of the event is focused on Deep Dives with two half-day sessions on MySQL and two full-day sessions on Java and Web development. I will be attending the session, "Using Java EE and SOA to Architect and Design Robust Enterprise Applications."

Following the conference, I will be a panelist at a Cloud Computing Panel, "How Cloud Computing Affects Small Business," at the Microsoft office in NY.

After the cloud computing panel, I must rush home to attend a conference call. It's going to be a long but exciting day!

It will be great to catch up with old and new friends at the event. I will also be Twitter-ing the event on my Twitter account.


Monday, March 02, 2009

Cloud Computing Seminar with Werner Vogels and Marten Mickos

Amazon, RightScale and MySQL (Sun Microsystems) will be hosting the Executive Seminar on Cloud Computing at NASDAQ MarketSite (NY) tomorrow.

Since I have been busy developing our new startup on EC2, it would be great to catch up with Mårten Mickos and Dr. Werner Vogels whom I originally met at Structure 08.

Cloud Computing is probably the most abused buzzword, but if anyone rightly deserves to be in the cloud computing space, it's Amazon.

Speakers at the seminar include Dr. Werner Vogels (CTO Amazon), Mårten Mickos (ex-CEO of MySQL) and Michael Crandell (CEO RightScale). I am hoping to catch up with Mårten Mickos during the event. He is a great guy and probably the best-liked Open Source CEO.

I find it funny that the event site still shows Mårten's title as "SVP of Sun Microsystems’ Database Group," even though Mårten resigned in February.

8:30am Registration and breakfast

9:00am Ahead in the Cloud — The Power of Infrastructure as a Service
Dr. Werner Vogels, CTO, Amazon.com
Werner Vogels will reveal the efficiency principles behind Amazon’s drive to develop Infrastructure as a Service, and discuss why the current economic climate is rapidly changing the IT perspective on cloud computing.

10:00am Growing Your Business in the Cloud with Open Source
Marten Mickos, SVP Database Group, Sun Microsystems
Marten Mickos will discuss why fast-growing businesses are taking advantage of the combined benefits of open source and cloud computing to accelerate the delivery of new applications, reducing overall risk and scaling infrastructure consumption up and down to meet demand.

10:45am Coffee Break

11:00am Smart Enterprises Moving to the Cloud
Michael Crandell, CEO, RightScale
Michael Crandell will talk about how enterprises are experimenting with the cloud today. You’ll learn methods for deploying applications to the cloud, and be guided through RightScale’s best practices for designing, configuring, deploying and managing the lifecycle of multi-server cloud deployments.

11:45am Cloud Computing and the 2008 Beijing Olympics
Lotta Latsuo, COO, Starcut Ltd.
Lotta Latsuo will discuss how Starcut Ltd., a professional services firm that specializes in digital media and marketing, adopted cloud technology to build the NBC Olympics Mobile website for NBC’s coverage of the 2008 Beijing Olympics from Beijing.

12:15pm Closing remarks

In case you're attending and spot me, please say hi!


Wednesday, July 09, 2008

Memcached for MySQL: Advanced Use Cases Webinar

My second webinar, Memcached for MySQL: Advanced Use Cases, which I presented for Sun/MySQL on July 3rd, had more than 526 registrants (despite the fact that many people were on vacation at that time).

A big thanks to everyone who attended. The recording of the webinar (with sound) is now available on the MySQL website. Also thanks to Jimmy Guerrero, Rich Taylor, Alex Roedling, Edwin DeSouza and Sun/MySQL for inviting me for the second time to give this webinar.

If you registered previously for the memcached webinar, you can simply log in to access the WebEx recording. Otherwise, you will have to register for the on-demand webinar.

After watching/listening to the webinar, in case you have any questions or suggestions for my next webinar, please feel free to email them to me at fmashraqi [at] yahoo dot com.

The description of the webinar follows:

Memcached for MySQL: Advanced Use Cases

Join us for this in-depth technical webinar where memcached guru Frank Mashraqi of Fotolog will demonstrate several use cases on how to leverage memcached to increase the performance and scalability of MySQL-driven web sites and applications. Memcached is the open source distributed memory caching system used by some of the biggest websites in the world, like YouTube, Facebook, LiveJournal and Wikipedia. Use cases explored include: non-deterministic caches, deterministic caches, a replacement/add-on for file system caches and more. We will also provide an overview of memcached production support available with a MySQL Enterprise subscription. The presentation will conclude with a question and answer period where you can "ask the expert" about the technical details of memcached. Attendees will also receive a technical white paper.
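For readers who have not seen the pattern, here is a minimal sketch of a non-deterministic cache, assuming the term means that a key may or may not be present, so the application checks the cache first and falls back to the database on a miss. A plain dict stands in for a memcached client, and `get_user` and `load_from_db` are hypothetical names.

```python
def get_user(user_id, cache, load_from_db):
    """Return a user record, reading the database only on a cache miss."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:
        value = load_from_db(user_id)  # miss: go to MySQL
        cache[key] = value             # populate for the next reader
    return value


db_reads = []


def load_from_db(user_id):
    """Stand-in for a MySQL query; records each call so misses can be counted."""
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}


cache = {}  # stand-in for a memcached client
get_user(7, cache, load_from_db)
get_user(7, cache, load_from_db)
print(len(db_reads))  # the second call is served from the cache
```

A deterministic cache, by contrast, would be one the application treats as always populated, so the fallback branch would not exist.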

WHO:
  • Farhan "Frank" Mashraqi, Director of Business Operations and Technical Strategy - Fotolog Inc
  • Jimmy Guerrero, Sr Product Marketing Manager, Sun Microsystems – Database Group
WHAT:
  • Memcached for MySQL: Advanced Use Cases web presentation.


Thursday, July 03, 2008

Memcached Webinar: Advanced Use Cases

Today at 1PM EST I will be presenting the second part of the Memcached Webinar for Sun/MySQL. Like the first memcached webinar, this one will focus on memcached use cases, but it will be more technical.


Saturday, May 31, 2008

Twitter's Meager Architecture?

Michael Arrington of TechCrunch demands answers to some tough questions from Twitter. He potentially exposes Twitter's architecture, implying that Twitter may be running on a grand total of three MySQL machines.

“Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?”

For those out of the loop, Twitter has lately been experiencing some serious disruptions in service.


Thursday, May 29, 2008

Slides from my Memcached Webinar - 560+ registrants

Today, I presented my memcached webinar, which was hosted by MySQL/Sun. More than 560 users registered for the webinar. The topic of the webinar was Designing and Implementing Scalable Applications with Memcached and MySQL.


The recording and slides from the webinar are now available for download:
The call volume was quite high. I had to wait about 7-8 minutes before I could join as a panelist. I don't know why WebEx started using a live operator to verify users instead of using an automated system. That created quite a bottleneck and several users had to wait before they got connected. This is not something I expected from WebEx. My apologies to all users who experienced connectivity issues. Thanks to everyone who attended and sent feedback.

Overall, the webinar went great. Monty Taylor, my fellow panelist, did a great job. He, however, had a mishap and his line dropped. Jimmy Guerrero then picked up from that point on. Upon reconnecting, Monty generously answered the attendees' questions. Many thanks, Monty and Jimmy!

More details on my MySQL blog.

Related links: memcached, mysql, sun.


Wednesday, May 21, 2008

Interviewed by Sun - Running 13th largest site in the world on MySQL and Sun hardware

Part of my responsibility at my current job is to lead the database team that manages the largest and most active MySQL on Solaris 10/Sun hardware deployment in the world. We have been using Sun hardware to deploy MySQL since the time when most MySQL database experts used to frown on this combination.

Of course, Sun's acquisition of MySQL made me really, really happy.

I was recently interviewed by Sun Microsystems about our Sun and MySQL usage and what I find so interesting about Sun's UltraSparc T2 Niagara 2 processors. At my work, I am in the process of replacing most of our database servers with CMT (chip multi-threading) enabled Niagara 2 servers. The video of my discussion is embedded below. Enjoy.



    © 2006 The Mashraqi's.