mashraqi

> operations

[ This is my personal blog so all opinions expressed here are mine. I am a product, scalability, operations and monetization advisor and currently employed as Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month. I was a keynote panelist for Scaling Up or Out keynote at MySQL Conference and speak regularly at conferences and user groups. ]
Farhan "Frank" Mashraqi

Saturday, July 05, 2008

Actionable Logging for Smoother Operation and Faster Recovery

Notes from Velocity.

Next session is Actionable Logging for Smoother Operation and Faster Recovery by Mandi Walls (AOL).

Actionable logging:
  • no nonsense logging
  • concise, easy to understand
  • express symptoms of production issues
  • anything that makes it into the log needs to be fixed
Why is it important?
  • expending resources on production systems
  • the point of logging in production
  • diagnosis of issues
  • the 4am test
Logging goals:
  • diagnosis and recovery
  • statistics and monitoring
  • provide insight into the behavior of the application
  • indicate potential issues, and areas for improvement
  • not the same goals as development and QA environments.
Types of logs:
  • access log
  • server log, e.g. catalina.out
  • application logs
  • special use logs for recording specific groups of activities
Log file location:
  • where logs are located on the system should be predictable and obvious
  • it may be helpful to locate logs on different disk partitions but link them back to the app.
  • keep older logs in an obvious place. Having a couple of days' worth of logs helps you converge on the problem.
Log file management:
  • everyone has their own method
  • roll logs into files with timestamps
    • host-01.log.003 vs host-01.log.06202008
  • roll all the logs at the same time for a given app to make coordination of events easier
  • roll when the app needs logs rolled: hourly, daily, weekly.
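
A minimal sketch of the date-stamped naming above, in Python. The helper names `rolled_name` and `roll_names` and the three log file names are my own assumptions for illustration, not anything shown in the talk; the MMDDYYYY suffix simply mirrors the host-01.log.06202008 example.

```python
from datetime import date


def rolled_name(base: str, day: date) -> str:
    """Return the archive name for a log rolled on the given day."""
    # MMDDYYYY mirrors the host-01.log.06202008 example in the notes.
    return f"{base}.{day.strftime('%m%d%Y')}"


def roll_names(bases: list[str], day: date) -> dict[str, str]:
    """Give every log of one app the same date suffix so they roll as a set."""
    return {base: rolled_name(base, day) for base in bases}


# Hypothetical log names for a single application.
names = roll_names(["access.log", "server.log", "app.log"], date(2008, 6, 20))
for original, archived in names.items():
    print(original, "->", archived)
```

Unlike a counter suffix such as .003, the date in the name tells you which day a file covers without opening it.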
Logging quality information
  • logs should be expressive but not overly verbose
  • keys to making logs more actionable:
    • appropriate formats
Quality information: format
  • Timestamping: what not to do:
    • 1213988938:tvdata shows/617/306
    • 1213988939:tvdata shows/618/307
  • Timestamps that mean something:
    • Jun 19,2009 4:20:25PM
  • Good timestamps give context for linking to external events like network outages or traffic anomalies
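
To illustrate the difference, here is a small Python sketch that rewrites an epoch-prefixed line like the ones above into something a human can read. The function names and the choice of UTC and of this particular output format are my assumptions; the talk did not prescribe a format.

```python
from datetime import datetime, timezone


def readable(epoch_seconds: int) -> str:
    """Format epoch seconds as a human-readable UTC timestamp."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


def rewrite_line(line: str) -> str:
    """Replace the leading 'epoch:' prefix of a log line with a readable timestamp."""
    epoch, _, rest = line.partition(":")
    return f"{readable(int(epoch))} {rest}"


print(rewrite_line("1213988938:tvdata shows/617/306"))
# prints: 2008-06-20 19:08:58 UTC tvdata shows/617/306
```

With the readable form, lining the entry up against a network outage report takes a glance instead of a calculation.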
Other considerations in log file format include:
  • creating a common format for multiple products and log types.
  • limiting the number of log entries that write to multiple log lines for faster parsing
  • deciding how much is too much information
Quality info:
  • bad: messages with only numerical error codes in them
  • bad: debugging messages in production logs.
  • bad: messages that give you no lead to follow.
  • bad: error messages that span multiple lines.
  • good: module_id with descriptive message
Making messages useful:
  • don't list non-fatal messages as fatal
  • avoid incorrect or misleading severity
  • don't let fatal errors go unlogged
Improving log messages:
  • log at the first point an error is encountered - don't log a timeout to a backend as a parse error on the data expected from the request
  • messages should include the method name and key variables to speed up fixes
  • suppress anything in the log that isn't actionable
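
The points above can be sketched with Python's standard logging module. This is only an illustration under my own assumptions: the logger name "tvdata", the `fetch_listing` function, the "listings-db" backend and the 5s timeout are all made up. It keeps each entry on one line, includes the module and function name plus key variables, and leaves debug messages out of the production log.

```python
import io
import logging

# One line per entry: readable timestamp, severity, module.function, message.
FORMAT = "%(asctime)s %(levelname)s %(module)s.%(funcName)s: %(message)s"


def build_logger(stream) -> logging.Logger:
    """Build a production-style logger that writes to the given stream."""
    logger = logging.getLogger("tvdata")  # hypothetical application name
    logger.setLevel(logging.WARNING)      # debug chatter stays out of production
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT, datefmt="%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    return logger


def fetch_listing(logger: logging.Logger, show_id: int, backend: str) -> None:
    """Pretend the backend timed out and log it where it happened."""
    # Report the timeout itself, with the key variables, not a later parse error.
    logger.error("timeout after 5s calling backend=%s show_id=%d", backend, show_id)
    logger.debug("raw response buffer dump")  # suppressed at WARNING level


buffer = io.StringIO()
fetch_listing(build_logger(buffer), 617, "listings-db")
print(buffer.getvalue(), end="")
```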
Log message convergence:
  • actively managing, parsing and pruning logs makes new errors more obvious
  • check the logs after every install for new messages
Things to avoid in logs:
  • usernames, passwords, database logins
    • these provide crib notes for anyone getting into your system.
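
One way to enforce this, sketched in Python: a logging filter that masks credential-looking key=value pairs before they reach the log. The key names in the pattern, the `RedactSecrets` class and the sample message are assumptions of mine, and a real application would need a pattern matched to its own messages.

```python
import io
import logging
import re

# Assumed key names; extend to whatever your application actually logs.
SECRET = re.compile(r"\b(user(?:name)?|password|passwd|login)=\S+", re.IGNORECASE)


class RedactSecrets(logging.Filter):
    """Mask credential values in a record before it is written."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as arguments are masked too.
        record.msg = SECRET.sub(r"\1=[redacted]", record.getMessage())
        record.args = None
        return True  # keep the record, just without the secrets


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.addFilter(RedactSecrets())

logger = logging.getLogger("redaction-demo")  # hypothetical logger name
logger.propagate = False
logger.handlers.clear()
logger.addHandler(handler)

logger.warning("db connect failed user=%s password=%s", "frank", "hunter2")
print(buffer.getvalue(), end="")
```

A filter like this is a safety net; not writing credentials into messages in the first place is still the better fix.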
How much is too much:
  • more than 25% of the number of access log entries is a hindrance. Even 10% may be too much in most environments
  • if you log more in the server log than in the access log, then it's a problem
Conclusion
  • the log is the first line of information when a problem occurs
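
The thresholds above are easy to turn into a quick check. A rough Python sketch, where the function names and verdict strings are mine and the line counts are made-up inputs:

```python
def log_volume_ratio(server_log_lines: int, access_log_lines: int) -> float:
    """Return server log entries as a fraction of access log entries."""
    if access_log_lines <= 0:
        raise ValueError("need at least one access log entry to compare against")
    return server_log_lines / access_log_lines


def verdict(ratio: float) -> str:
    """Classify the ratio using the 10%, 25% and 100% marks from the notes."""
    if ratio > 1.0:
        return "problem: more server log entries than access log entries"
    if ratio > 0.25:
        return "hindrance"
    if ratio > 0.10:
        return "probably too much"
    return "ok"


# Made-up counts: 300 server log lines against 1,000 access log lines.
ratio = log_volume_ratio(300, 1000)
print(f"{ratio:.0%} -> {verdict(ratio)}")
```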
  • a production log should be focused on providing information to operations staff, not developers.
  • when, where and how messages are logged can help or hinder recovery

Importance of Operations and Performance

Notes from Velocity Conference continue:
Next up is a Keynote by Artur Bergman (Wikia). Wikia runs 7000 Wikis and has 400 million page views per month.
  • Google, Yahoo and Amazon are what people rely on
  • Friendster.com, Twitter and boo.com have serious reliability problems



Value of performance/ reliability
  • brand value (they rely on you)
  • more page views (fixed amount of time + faster site)
Match user expectations:
  • World of Warcraft:
    • $520 million in profit last year
    • 99% reliable
    • down every week, scheduled
    • server crash
"We pay them money, so we have to accept the downtime."

Operations:
  • efficient use of resources
  • end user performance
  • reliability
  • bad operations waste R&D and cost-of-sale money
Business
  • cost per pageview. How many actually know this?
  • cost per page
  • revenue CPM - cost per pageview
  • Gross margin
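
For anyone who wants to work these numbers out, here is a back-of-the-envelope sketch in Python. It is my reading of the slide, not the speaker's formula: I assume revenue CPM is revenue per thousand page views, so cost is scaled to per thousand before subtracting. The $200,000 monthly cost and $2.00 CPM are invented figures, not Wikia's; only the 400 million page views comes from the talk.

```python
def cost_per_pageview(monthly_cost: float, monthly_pageviews: int) -> float:
    """Total operating cost divided by page views served."""
    return monthly_cost / monthly_pageviews


def margin_per_thousand(revenue_cpm: float, cost_pp: float) -> float:
    """Revenue per thousand page views minus cost per thousand page views."""
    return revenue_cpm - 1000 * cost_pp


def gross_margin(revenue_cpm: float, cost_pp: float) -> float:
    """Margin as a fraction of revenue."""
    return margin_per_thousand(revenue_cpm, cost_pp) / revenue_cpm


# Hypothetical inputs: $200,000/month to serve 400 million page views at a $2.00 CPM.
cpp = cost_per_pageview(200_000, 400_000_000)
print(f"cost per pageview: ${cpp:.4f}")
print(f"margin per thousand: ${margin_per_thousand(2.00, cpp):.2f}")
print(f"gross margin: {gross_margin(2.00, cpp):.0%}")
```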
Reliability example:
  • 20% of wikia pages
  • 200ms -> 15s to load
  • 35% reduction in page views on the slow pages
  • 15% reduction in page views on the fast pages
  • slow pages made people abandon the site.
Happy users - Lower cost
  • for the users: service industry
  • for the business: cost per page view, lower capex.
  • VCs love to give money to IBM, HP, Dell
  • better for the environment.
May performance project:
  • 50% cpu usage cost
  • delay investment 6 months
  • 3 engineers - 4 weeks
Cache miss
  • 300ms -> 190 ms
Perception
  • Ads
  • Ads are slow
  • Load ads after content load
Dramatic change:
  • significant % increase in pageviews
  • We lose money
    • but edits increase
  • Stay in the loose and simple area; stay away from complexity.
Yet another great presentation at Velocity!


Friday, February 29, 2008

The Strategy Chronicles #1

This is the first edition of a new series of blog posts that I will try to write regularly (no hardcore promises though). I will be calling it, for lack of better imagination and because it's 3:09 in the morning, The Strategy Chronicles, or TSC. The primary purpose of this series is to help me keep track of what's happening each week. So whether you like it or hate it, please let me know. So, let's get started.

Microsoft makes Yahoo! investors very unhappy: At least some of the Yahoo! investors are now worried about an increased Microsoft bid to acquire Yahoo!. The primary reason is that several Yahoo! investors are also Microsoft investors. Separately, several shareholders are blasting Yahoo! for rejecting Microsoft's bid.

Yahoo! losing more ground: Opera has now decided to dump Yahoo! in favor of Google for its mobile browser.
- "every month, Opera Mini users browse more than 1.7 billion pages"
- most of the page views were related to search. Yahoo! has a serious relevancy problem in addition to the problem of effectively monetizing its search traffic. I can see how easy it was for Opera decision makers to ditch Yahoo! in favor of Google.
- Google has been Opera's choice for desktop browser for 7 years.

Even more troubles for Yahoo!: Remember the time when Yahoo! handed over information about its Chinese users to the authorities? Well, that issue continues to haunt Yahoo! as yet another lawsuit was filed against the company by Chinese dissidents. Evidence submitted by Yahoo! led to the one-year imprisonment of a plaintiff.

Finally!: Meanwhile, Google has finally re-launched JotSpot as Google Sites. Exactly how that will hurt Microsoft's similar initiatives is yet to be seen; however, the outlook, according to the media, doesn't look so promising as far as Microsoft is concerned.

Clean console!: Consumerist reports about an incident where a passionate Xbox user lost his beloved collection of autographs and custom artwork drawn on the console by prominent members of the gaming community. Perhaps, the Microsoft employee responsible for this thought the collectible autographs were making the Xbox unit look dirty.

Bill Gates now LinkedIn!: Facebook's loss is LinkedIn's gain. Bill Gates recently quit Facebook despite Microsoft's recent investment in the social networking powerhouse, and decided to start a profile on LinkedIn. Once his profile was created, Gates then asked a question that received more than 1,000 answers. Interestingly enough the same day I noticed the featured question, Facebook changed their site's theme.

Sliding the Social way!: Slide, the media and gadget distribution powerhouse, announced recently that it will create new applications for MySpace users using the MySpace Developer Platform. This could increase Slide.com's profile and reach significantly. Earlier this year, Slide announced that it raised $50 million in its latest funding round.


Friday, October 05, 2007

Beef recall: Single recall kills 67 year old Topps Meat

How disastrous can one recall be after 67 years in your industry? If I say "potentially fatal," I won't be exaggerating. After all, that's exactly what has happened to Topps Meat, the largest supplier of frozen ground beef products in the US. They had one recall, just one, and now it has been announced that Topps Meat is going to be closing down. The beef recall consisted of 21.7 million pounds of ground beef products. The recalled hamburger patties were believed to be contaminated with the potentially fatal E. coli bacteria strain O157:H7. According to the CDC website:
E. coli O157:H7 is one of hundreds of strains of the bacterium Escherichia coli. Although most strains are harmless, this strain produces a powerful toxin that can cause severe illness. E. coli O157:H7 has been found in the intestines of healthy cattle, deer, goats, and sheep.

E. coli O157:H7 was first recognized as a cause of illness in 1982 during an outbreak of severe bloody diarrhea; the outbreak was traced to contaminated hamburgers. Since then, more infections in the United States have been caused by eating undercooked ground beef than by any other food.


The company issued the following statement regarding the beef recall:

“This is tragic for all concerned,” said Anthony D’Urso, Chief Operating Officer. “In one week we have gone from the largest U.S. manufacturer of frozen hamburgers to a company that cannot overcome the economic reality of a recall this large. We sincerely regret the impact this will have on our employees, our customers and suppliers, and the community. Most of all, we regret that our products have been linked by public health agencies to recently reported illnesses. We hope and pray for the full recovery of those individuals.”...

As the Company assesses and addresses the impact of the closing on all affected parties, individuals with business-related questions regarding Topps Meat Company can call 888-734-0451 or email info@toppsmeat.com. Employees and government inquiries can call (888) 240-2734.


Information Sources:
- Topps Meat to close after beef recall



    © 2006 The Mashraqi's.