Learning Organization

“Problems are not a never-ending plague to be endured but a never-ending guide to improvement” – Steven Spear

Twice as good, Half the effort
In the 1990s Toyota was famous for having twice the capacity with half the workers, half the inventory, and half the resources when compared to the big three US auto manufacturers. The subject of many case studies, Toyota’s production system was abbreviated TPS and came to be known by that name.

How did TPS achieve such incredible results, especially for a company which started out with horrible cars in the 1950s (on hills the cars worked better in reverse)? Many case studies credit TPS for creating a company and a culture that continued to evolve and grow. Toyota saw problems as opportunities, fixed those problems, learned from its mistakes, and shared knowledge throughout the organization.

Where you start isn’t important. How fast you improve compared to the competition is very important.

One of the fundamental keys to learning and continuous improvement is being specific. Consider the following:

Imagine you are an engineer, and you build a new web service. The business requires a service which will handle 3000 requests per second (RPS) at a reasonable cost. 10 virtual machines (VMs) would be a reasonable cost, so you target 300 RPS per VM for your service.

  • If testing shows 200 RPS per VM, that would be bad: something unplanned happened and fixes are required.
  • If testing shows 400 RPS per VM, something unplanned also happened.

400 RPS represents additional throughput. Additional throughput over and above the designed capacity would be a surprising outcome. The “extra” RPS had to come from somewhere. Anything surprising is a problem. Each of the following is a learning opportunity.

  • Perhaps there was a mistake in the testing
  • Perhaps we didn’t understand our own design and there is a bug
  • Perhaps we didn’t understand our own design and there is a new efficiency to be shared

Being specific provides the opportunity to be surprised. These surprises create teachable moments. For software we own and design we should have specific high and low tolerances. In addition, we need to have an understanding of what happens when those tolerances are crossed. We need to use the scientific method to create a hypothesis and test it during failures of excess and failures of sufficiency.
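To make the idea of specific high and low tolerances concrete, here is a minimal sketch in Python. The 300 RPS target comes from the example above; the ±10% band and the names `Tolerance`, `classify`, and `PER_VM` are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Expected per-VM throughput with explicit low and high bounds."""
    target_rps: float
    low_rps: float
    high_rps: float


def classify(measured_rps: float, tol: Tolerance) -> str:
    """Return 'deficit', 'excess', or 'expected' for a measured throughput."""
    if measured_rps < tol.low_rps:
        return "deficit"   # failure of sufficiency: fixes are required
    if measured_rps > tol.high_rps:
        return "excess"    # failure of excess: still a surprise to investigate
    return "expected"


# Assumed band of +/-10% around the 300 RPS per-VM target (hypothetical values).
PER_VM = Tolerance(target_rps=300, low_rps=270, high_rps=330)

for measured in (200, 300, 400):
    print(measured, classify(measured, PER_VM))
```

The point of the sketch is that 400 RPS is flagged just as loudly as 200 RPS. Both results fall outside the band, so both call for a hypothesis and a test.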

Following those guidelines will give us a better understanding of the systems we build and manage. A better understanding will allow us to move at an increasingly faster cadence, manage increasing levels of risk, and execute with progressively higher quality.

Take a look at your software and your processes. Do you have learning processes in place? Are you specific about how your software should behave?

Emphasizing the Small C in MVC

In web platforms there are three schools of thought. Originally there were two.

One camp believed a web page done today was better than two done tomorrow. Let’s call them the Hackers.

The other camp believed in building the foundation, the very plumbing first, and then enabling others to build on top. Let’s call this group the Platformists.

The third group is a recent development. Fundamentally hackers in their beliefs, one hacker will go off and build a platform. Let’s call this last group the Frameworkers. Frameworkers are a strange mix: they do the work of a Platformist, yet they act as Hackers, building for today and not tomorrow.

This is an important introduction, and critical if we want to understand the Model View Controller (MVC) pattern. To better understand MVC take as an example a web application that tells the temperature.
• Model – The objects storing the data, the current temperature in degrees Celsius stored every minute.
• Controller – The web container which renders the page and converts between Celsius and Fahrenheit.
• View – The presentation of the data, the code which generates the HTML given a temperature.
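As a rough sketch of how the three roles split in the temperature example, here is a small Python version. The class and function names (`TemperatureModel`, `render_view`, `controller`) are hypothetical, and a real web container would add request handling around them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TemperatureModel:
    """Model: stores readings in degrees Celsius (one per minute in the example)."""
    readings_c: List[float] = field(default_factory=list)

    def record(self, celsius: float) -> None:
        self.readings_c.append(celsius)

    def current_c(self) -> float:
        return self.readings_c[-1]


def render_view(value: float, unit: str) -> str:
    """View: generates the HTML for a given temperature."""
    return f"<p>Current temperature: {value:.1f} &deg;{unit}</p>"


def controller(model: TemperatureModel, unit: str = "C") -> str:
    """Controller: reads the model, converts units if asked, and hands off to the view."""
    celsius = model.current_c()
    if unit == "F":
        return render_view(celsius * 9 / 5 + 32, "F")
    return render_view(celsius, "C")


model = TemperatureModel()
model.record(20.0)
print(controller(model, "F"))  # prints <p>Current temperature: 68.0 &deg;F</p>
```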

Hackers typically build web platforms out of interpreted languages like PHP and Python. The code is relatively straightforward, and loosely typed to allow maximum flexibility. Hackers rarely if ever consider the MVC pattern. They use a templating library (e.g. Smarty) which expands macros and loops through lists as the view/HTML-Generator. Inside there may be calls to objects or even direct database calls hard coded into the view. There really isn’t a controller. Instead the libraries extend the language to perform controller-like tasks at any point in the execution. This is done by storing the context of the HTTP request in memory during the life of the request.

Platformists relish the MVC pattern and tend to follow it to the letter. Platformists typically use languages like C# and Java. The code is broken into discrete chunks with an eye to reuse. The calling card of a Platformist is chained controllers. In this setup, controllers call other controllers, each of which updates the web context, and returns to the view with a single object or method to render. This ends up creating a lot of complexity, complete with pluggable objects whose true origins are hard to deduce.

Frameworkers tend to strictly adhere to the MVC pattern. Their hacker mentality keeps the controllers simple as singleton methods which cannot be chained. Frameworkers use frameworks like Rails.

Where do I stand in these camps? I guess I’m a Hacker. I’m not fully in the hacker camp, as I’m not too comfortable with all that free coding. I do like to adhere to some structure. That’s why I’m a big fan of the MV-little-C pattern. Big fat model, simple template view is enough for me. I’ll go even further; I like to keep my views clean using only primitive types, plus arrays and maps. Why no controller?
• Controllers are hard to test – requires emulating the web context, ugh
• Controllers are dumb; most of the time they just redirect or render
• Controllers are not explicitly called. You need to dig through config to find the right controller.

So what do people put in controllers, and what do you really need a controller for? The following things are put into controllers:
• Multi-part form handling and validation
• Access controls
• Routing (pick the right view template)
• Cookie setting
• Localization
• Caching
• URL encoding and URL parameter parsing

My projects typically don’t have multi-part form handling, even for tools. I haven’t done one of these in the last three years.

Access controls, routing, and caching can and should be handled at a layer above the application, in a single uber or root controller. If you don’t keep this separated out, things get messy fast, and chained controllers take root. So yes, it’s a controller, but it’s a controller which is not dumb, and it is always called.

Cookie setting, Localization, and URL parameters can all be handled in the view. Languages have expanded to include facilities to handle these use cases inside the view-template.
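Here is a minimal sketch of what MV-little-C could look like in Python: one root controller that handles access control, routing, and caching, a fat model, and a plain template view fed only primitives and maps. Every name here (`root_controller`, `temperature_model`, the `/temperature` path, the sample readings) is made up for illustration and not taken from any framework.

```python
from string import Template

# View: a plain template that only ever sees primitive values from a map.
VIEWS = {"/temperature": Template("<p>$city: $celsius C</p>")}


def temperature_model(params):
    """Fat model: data access and business logic live here (sample data only)."""
    readings = {"Paris": 18.5, "Oslo": 7.0}  # hypothetical readings
    city = params.get("city", "Paris")
    return {"city": city, "celsius": readings[city]}


MODELS = {"/temperature": temperature_model}
_page_cache = {}


def root_controller(path, params, user):
    """The single root controller: access control, routing, and caching."""
    if user is None:                      # access control
        return 403, "Forbidden"
    if path not in VIEWS:                 # routing
        return 404, "Not Found"
    key = (path, tuple(sorted(params.items())))
    if key not in _page_cache:            # caching of rendered pages
        data = MODELS[path](params)
        _page_cache[key] = VIEWS[path].substitute(data)
    return 200, _page_cache[key]


print(root_controller("/temperature", {"city": "Oslo"}, "alice"))
```

Because the root controller is always called and the model returns a plain map, the model and view can each be tested without emulating the web context.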

There you have it: my thoughts on why MVC should become MV-little-C. If we unite perhaps we can form our own tribe. I’ve got the secret handshake ready.

Speaking at Lean SSC

Very excited. I just found out my paper has been accepted, and I’ll be a speaker at Lean SSC http://lssc11.leanssc.org this May 3-6th in Long Beach, California.


kanban Value Trumps Flow

Interesting presentation on Kanban from David Joyce, a development manager at the BBC. The best slide is 25.

  • Value Trumps Flow
  • Flow Trumps Waste
  • Eliminate Waste to Improve

He also states that bottlenecks are a far greater indication of constrained value. Traditionally we’ve all looked at flow. I agree! Time to get rid of local optimums.

http://www.slideshare.net/stephenellliott/kanban-overview-and-experience-report-export

Why We Write Bad Code

I have a personal problem with caching. Caching code has gone horribly wrong so many times, yet caching isn’t bad. In fact, I would say caching is absolutely necessary to scale web applications. It both improves capacity and reduces latency.

Let me recount the ways caching has gone wrong. These aren’t edge cases; each of these has happened more than once with different teams.

  • Locking was so thoroughly done that the application ended up single threaded
  • The code pathway was so convoluted that stale items could overwrite fresh items
  • Failure to account for a thundering herd of requests created a backup of processes all waiting for a key resource
  • Geo-distributed caches weren’t synchronized and different locations showed different results
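To illustrate the first and third failures, here is a hedged sketch of one common way to tame a thundering herd inside a single process: a lock per key, so that only one caller recomputes a missing value while unrelated keys stay concurrent. `SingleFlightCache` and `slow_loader` are hypothetical names, and the sketch leaves out expiry, eviction, and anything distributed.

```python
import threading
import time


class SingleFlightCache:
    """In-memory cache where only one caller per key recomputes a missing value."""

    def __init__(self):
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dicts, held only briefly

    def get(self, key, loader):
        with self._guard:
            if key in self._values:
                return self._values[key]
            # One lock per key, so unrelated keys never wait on each other.
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the key while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = loader(key)  # the expensive call runs once per missing key
            with self._guard:
                self._values[key] = value
            return value


calls = []


def slow_loader(key):
    """Stand-in for an expensive backend call (hypothetical)."""
    calls.append(key)
    time.sleep(0.05)
    return key.upper()


cache = SingleFlightCache()
threads = [threading.Thread(target=cache.get, args=("home", slow_loader))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # prints 1: twenty requests, one backend call
```

The waiting callers still queue behind the one loader, so this limits load on the backend rather than removing the wait. The per-key lock table also grows without bound here, which a real implementation would need to address.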

These are severe issues. They have occurred at several different companies with several different teams. Both architects with 15 years of experience and software engineers with 3 years of experience have been casualties of caching. No one wants to write bad code, so why does this happen?

  • Pushing Our Competency – As professionals it is only natural that we want to push to the edge of our competency. Engineers take on synchronization tasks because they are challenging. This is a good trait, with possibly bad consequences.
  • Quality is Hard to Define for New-Complex Systems – Rare and complex tasks get little oversight. Quality processes and conventional wisdom are usually focused on familiar areas. Some issues like herding are hard to test. This leads to a lack of feedback or identification of causality in certain areas.
  • Desire for High Velocity - The desire to move quickly can create a tendency to ignore small problems until they become severe problems.

Caching in itself isn’t the problem; rather, the spectacular failures related to caching are instructive. If you are an engineer there are three things you can do to be significantly better.

  • Seek Perspective - Talk to other people and try to figure out what the important edge cases are. Then plan to deal with them. In caching an example would be identifying herding/hotspotting as an issue. Make sure your solution isn’t worse than the problem it’s intended to resolve.
  • Constant Feedback – It isn’t enough to see and solve the big pitfalls. Small problems are often indications we don’t understand the system. There are lots of ways to get feedback (code reviews, unit tests, nightly builds and tests). Engage your efforts early and get feedback frequently.
  • Solicit Vigorous Feedback - Seek feedback from people who will teach the harsh reality of mistakes. Bad code has consequences. This is the hardest one. It could be called egoless programming. Establishing the cause and effect relationship between choices and outcomes will make you a better engineer.

As a manager there is an important take-away underlying these solutions. Hard problems like caching do not have an immutable level of risk associated with them. In fact, the severity and frequency of problems related to caching may be reduced to almost zero. Therefore it would be a mistake to assume that better risk controls are needed. Adding new controls makes the false assumption that risk is immutable and that the problem statement is constant and well defined. Instead, rapid advances are driving increased complexity in systems, and the types and severity of potential risks are constantly changing. The key to making a high performance organization is to focus on becoming a learning organization (faster, more frequent feedback loops).

  • Share Knowledge - As a manager establish practices to share hard won experience. The conversations need to be set up and facilitated to cut to the core of the issue, and expose some of the tensions which lead to the learning. This will help keep the dialog interesting and informative.
  • Create a Culture of Causality – Create an environment where people can admit to failure as a badge of honor. Create an environment where feedback from management on performance is specific, credible, and immediate. Care is needed to make this an empowered culture, not a punitive culture.
  • Shine a Light - Create a measurable goal and use it to examine processes and interactions large and small. For example, set a goal of zero code rollbacks while increasing code deployments to once a week. Use this goal to turn everyone into everyday problem solvers. This will drive continuous improvements and empower people to fix issues.

These solutions have been implemented and shown to work in places like Alcoa and the US Navy’s Nuclear Reactor program. A good follow-up book is Chasing the Rabbit if you would like to explore the real world examples.


Someday Websockets Will Change the World

Bottom Line: Websockets are very cool. Since the spec is changing, implementations have been in flux. If the community can rally around the latest (76/00) spec we’ll be on our way to one of the most important advancements since XHR.
PS For those of you who like pictures, here is a great websockets slide-show from OSCON July 22, 2010.

Websockets is part of HTML5. It enables bi-directional communication between browser and server over a single TCP connection. Websockets would allow for things like streaming data from the browser, or event notification from the server. Some examples of what is possible with Websockets:

  • Chat/IM like interfaces inside the browser
  • Streaming video/audio enabling video calls inside a browser
  • Simultaneously viewing hundreds of stock tickers inside a single browser window
  • Multiplayer games like Quake

Currently on the web we use Flash or long-polling as a poor workaround. Using Flash to manage a TCP connection is like using a bazooka to kill a fly. Long-polling can fail silently, it ties up an HTTP connection, and it can’t be shared between multiple widgets.

So where are Websockets now?

Anything written with protocol-75 is not compatible with protocol-00. So we’ve seen lots of previously released implementations get pulled back. Webkit continues to lead the way, and code for the new handshake is already in the trunk.

Where are the servers now? A Websocket connection starts as an HTTP request with an upgrade header. A normal HTTP/1.1 server won’t accept the upgrade, and besides, normal HTTP doesn’t handle the nonce.
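For the curious, here is a sketch of why the nonce trips up ordinary HTTP servers, based on my reading of the 76/00 draft handshake. The client sends two `Sec-WebSocket-Key1`/`Sec-WebSocket-Key2` headers plus 8 raw bytes after the headers, and the server must answer with a 16-byte MD5 digest. The function names are my own, the header values below are illustrative samples, and later revisions of the protocol changed this handshake, so treat it as a snapshot of the draft rather than a reference implementation.

```python
import hashlib
import struct


def key_number(key: str) -> int:
    """Digits in the key, read as one number, divided by the number of spaces."""
    digits = int("".join(ch for ch in key if ch in "0123456789"))
    spaces = key.count(" ")
    if spaces == 0 or digits % spaces != 0:
        raise ValueError("malformed Sec-WebSocket-Key header")
    return digits // spaces


def challenge_response(key1: str, key2: str, key3: bytes) -> bytes:
    """16-byte handshake body: MD5 over both key numbers and the 8-byte nonce."""
    if len(key3) != 8:
        raise ValueError("the request body must be exactly 8 bytes")
    # Two big-endian 32-bit integers followed by the raw 8 bytes.
    packed = struct.pack(">II", key_number(key1), key_number(key2)) + key3
    return hashlib.md5(packed).digest()


# Illustrative sample values for the two headers and the 8-byte body.
KEY1 = "4 @1  46546xW%0l 1 5"
KEY2 = "12998 5 Y3 1  .P00"
KEY3 = b"^n:ds[4U"

print(key_number(KEY1), key_number(KEY2))
print(len(challenge_response(KEY1, KEY2, KEY3)))
```

Those 8 bytes arrive after the headers with no Content-Length, which is exactly the kind of thing a stock HTTP/1.1 server or proxy has no reason to read.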

Finally, if you want a cool walkthrough of the webkit websockets code, check this out.

Third Way of Software Development

I was recently interviewed by the CEO of a company. He astutely observed the dilemma with software in one sentence, “That’s the problem, either you buy software and it doesn’t fit your needs, or you build software and after a few years no one knows how it works”.

The frustration is real. The impacts are real. Why does this happen and what can you do?

Familiar Problems

  • Purchased solutions require massive customization; the product you end up with doesn’t even match what the manual describes
  • Professional services who customize the software take on a life of their own, and an ever expanding role. Many times the vendor even gets a sales office right inside the building!
  • Engineering teams are afraid to ask executives to make decisions. The resulting ambiguity leads to massive pluggable platforms, and engineering heroes who operate with minimum oversight.
  • Everyone (engineers, product managers, sales, marketing) wants to add new features, creating a cultural disincentive to take care of the existing infrastructure

To sum this up.
Technical managers, engineers, and project managers with technical proficiency often lack the political savvy to tackle issues. Often there is simply an unwillingness to escalate issues to the “right” level. At the same time project sponsors, business managers, and project managers have trouble engaging, as every issue is described in technical detail which is frankly hard to decode. Often an assumption is made that technology is just hard, and it’s best to leave decisions to the experts.

Let’s work through an example.
Say we are building a content management system for freelancers. The UI team completes their mock-ups, and the engineering team is shocked to discover a new “pay me now” button. Typically the engineers complain to their manager of the unplanned work this will create. Originally, payment was handled every 30 days, and double approval was done every month as a control to prevent the misuse of funds. Although payment in 24 hours is possible, the old double approval control is too onerous, and a new control like a reconciliation report run every 2-4 weeks is needed. In addition, the back-office team needs to add some new functionality, which requires additional communication between computer systems, which requires additional testing. The back-office team is wary of making last minute changes because if they goof up real money is at stake.

Typically, the engineering manager talks to the UI folks and gets them to water down “pay me now” by having it show up only when the dollar amount is more than $100. Then the schedule is massaged to make way for the extra work. At the weekly meeting the project manager explains that the schedule has been pushed out 2 weeks. Many reasons are cited for the schedule change, and an important chance to raise a timely concern around the increased volume of double approvals for the finance team is lost.

What should happen.
The UI change should be placed on an intake queue and discussed at a change control meeting. To make a good decision the following three things need to be identified:

  • The effort and scope of work
  • The teams involved
  • The level of approvals needed

The decision should be encapsulated as options, and it should be recorded and signed off at the appropriate level.

  • option one: Pay-Me-Now is a good idea, do all of the work now, which will cause a meaningful delay. The CFO needs to be aware of and possibly approve the updated controls on the finance side.
  • option two: Pay-Me-Now will not go into this release, but the back-office team will start to work on the changes, and testing them. The CFO needs to be aware of and possibly approve the controls on the finance side. The UI team will need to re-work the mockups.
  • option three: We don’t have time for the change, let’s keep to the existing schedule, and consider this for a future release. The UI team will need to re-work the mockups.

Here are the important take-aways

  • There will always be changes and adjustments, and there needs to be a specific and formal process for managing changes.
  • Quality of decisions is important, making decisions at the right level assures better accountability, and helps the organization learn
  • The change needs to be specified in a way which characterizes the desired output, as opposed to describing how the change will be executed

They can be restated more elegantly as follows:

  • Management is responsible for sharing experience and learnings throughout the organization
  • Swarm on problems, get the right people involved early and often
  • Specify the output


Flashcache good idea tilted tweets

Some developers at Facebook have developed a new block level cache for Linux. It’s called flashcache.

Lots of tweets have been going around proclaiming flashcache to be a MySQL InnoDB accelerator, which isn’t quite true. Twitter’s great and all, but people are just mindlessly re-posting other tweets.

This is my small attempt to set the record straight.


HTML5 is Coming

Picked this up from Ajaxian. The best slides on HTML5.

http://apirocks.com/html5/html5.html

It’s best to download an HTML5 compatible browser like Safari or Opera so you can see the demos in action.

You have to check out the new JavaScript APIs. Warning: the links point to boring W3C specs.


What’s Useful in Spring 3

I’m having a hard time figuring out what is new and useful in Spring 3. Looks like the Spring code base was cleaned up to make better use of Java 5 (generics, autoboxing, varargs, annotations). REST HTTP support was added.

The biggest thing seems to be uniform support for lots of annotations (inject code by tagging it).

http://video.google.com/googleplayer.swf?docId=-366510744777874848

