transmissions from a free roaming agent of kaos: The Trial Environment - Innovation Infrastructure with an Enterprise wrapper

Introduction

A "trial" environment is a high risk production environment that sits within a low risk Enterprise environment.

Circumstances that might drive you to set up a trial environment:

Business has revenues derived from enterprise production systems it wants to protect through risk management, but...
Business wants to move fast and be innovative, and...
Business wants to work with third parties, some of which are "two-guys-and-a-dog" start-ups who can't afford to focus on making their systems enterprise friendly.

The trial environment helps a technical team balance these potentially conflicting requirements and deliver both risk-managed and risk-embracing services to the business.

(NB: Like most articles on this blog, the trial environment was conceived within a context of internet delivery systems and online gambling. Please keep that context in mind as you read.)

Rationale

But why would a production IT service want to let lunatic startup companies and me-me-me product managers into a carefully risk-managed production environment?

One reason is to foster innovation. New products tend to focus on core features and not the "-ilities" such as scalability, stability, and security. The new product team shouldn't be spending time specifying and justifying a large enterprise environment when there are crucial features to be coded. If we can provide infrastructure and costs that play by the same rules as cheeky little start-ups, we can limit their ability to end-run us.

Another reason is that, from a business case perspective, there is no reason to spend on big expensive kit to meet fanciful revenue forecasts. It's much better to trend off real data and provide an environment that can scale to a medium level quickly.

But mostly it's just nice to be able to say "yes" when the panicked bizdev guy comes over to you in desperation so he can close a deal tomorrow, as opposed to "that will take 6 sign-offs, 3 months to order the kit, and will cost £400,000." Operational process and cost intensity should match revenue upside and product complexity as closely as possible.

The Business Owner's Point of View

How the trial environment is presented to the people in the business, perhaps product managers, who have to pay for it:

Suitable for working with small, entrepreneurial, and/or external companies/teams
You can move quickly with it
Fewer sign-offs, less paperwork
Cheap (after a base environment is set up)
Enables you to focus on initial product bring-up and delivery and not overspend on an unproven product.
It's billed back to your project as you use it, so no big up-front costs; if your project stops, we don't have leftover dead kit
Suitable for lower concurrent users and transaction volumes
Good for proof of concept projects - if the project is not signed off, there is no big capital investment
More risky (less stable, scalable, secure) than our enterprise environment
Your first point of contact if there are technical problems is the small entrepreneurial company you're working with and not IT support
Not particularly secure
Not PCI DSS friendly (so don't store related data or encode related processes in trial)
Only small to medium sized products can use trial - we only have so much capacity standing by
If there is a failure in the trial environment, it will generally be the responsibility of the third party to fix it. We won't know much about it. We'll only take care of power, connectivity and hardware.
At a practical level, a failure in the trial environment might mean several days of downtime
If your revenue goes up for a product running from trial, we recommend moving it from trial to enterprise. That will be your call, as it's for you to manage your revenue risks.
A new product that is failing will still accrue operational costs. Pull the plug if you need to; with trial, shutting down a member environment is trivial.

What happens if things take off for a product in the trial environment? It's up to the small team or company the product manager is working with to identify this and initiate a project to "enterprise" their product.

From the Entrepreneurial Point of View

How the trial environment is offered as a production option to small, entrepreneurial, and/or external companies or teams:

We'll give you an infrastructure that you're comfortable working with that doesn't have the usual enterprise computing overheads
We'll take responsibility for deploying and fixing the hardware, power, and connectivity - everything else is yours.
Quickly receive the one or more servers you need to get your product going - no paperwork and no waiting around for kit to show up
We only have a few types of servers on offer - likely a "small" one for web/app servers and a "big" one for a database server. We'll recommend some options if you're not sure what you need. The servers are not redundant, fault-tolerant kit. If you want that in trial, you'll need to build it into your application.
Tell us what OS you want. We have 3 standard OSs (Linux, Solaris, and maybe Windows), and if you want something else it's going to be more difficult for everyone.
Tell us how much storage you need. You'll get a little bit local on the server and a flexible capacity will be mounted on your server. The flexible capacity can grow over time without any retooling or paperwork.
Tell us how much network capacity you'll need. We'll QoS at that level. Maybe no bursting allowed.
Your servers will be on their own subnet, just one flat LAN for everything. No DMZ or multi-layer firewalls.
You get a firewall in front of you. Tell us what inbound and outbound ports you want open for each server you request. 80, 443, and 22 are easy for us; everything else will make us raise an eyebrow.
Beyond simple firewalling, you manage your own security, e.g., locking down ports/services and OS patching
No content switch. They're expensive and you're clever enough to use Apache to figure that out I'm sure.
Put your own monitoring in place, we're not going to watch it for you. If you need to go from a "small" to "large" server or need more servers, you'll need to let us know.
Put your own backups in place. Specify some flexible storage for them on one of the servers. We won't be backing up anything.
All change control sits with you. We have no oversight.
We don't expect to need to provide remote hands.
If you're doing anything that affects production or other members of trial, your servers will be powered down immediately.
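
To make the firewall item above concrete, here is a minimal sketch of how a port request might be triaged. The function name `review_port_request`, the report layout, and the server name are all made up for illustration; the only thing taken from the list is that 80, 443, and 22 are the easy ports and anything else gets a second look.

```python
# Ports described above as "easy"; anything else is flagged for a human.
EASY_PORTS = frozenset({22, 80, 443})


def review_port_request(server, inbound, outbound):
    """Split a member's requested ports into approved and needs-review lists.

    Hypothetical helper: the name and report layout are assumptions.
    """
    report = {"server": server}
    for direction, ports in (("inbound", inbound), ("outbound", outbound)):
        requested = set(ports)
        bad = sorted(p for p in requested if not 1 <= p <= 65535)
        if bad:
            raise ValueError(f"{server}: invalid {direction} ports {bad}")
        report[direction] = {
            "approved": sorted(requested & EASY_PORTS),
            "needs_review": sorted(requested - EASY_PORTS),
        }
    return report


# Example request for one (hypothetical) web server.
result = review_port_request("web1", inbound=[80, 443, 22], outbound=[443, 3306])
print(result)
```

In this example the outbound request for 3306 is the one that would raise an eyebrow; whether it is then opened is still a human decision.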

The Production Operations Team Point of View

How the trial environment is managed by the production operations team:

Beneath the edge network, the trial environment is on hardware fully separate and distinct from production.
Production operations owns and is responsible for the hardware, network, and power - both initial and on-going. We provision a base OS and hand over the keys to the product team. That's it.
Fairly generous SLA on responding to HW, network, power failures reported to production support.
The trial environment is ideally implemented with some type of in-house cloud service and/or VMware. If that's not possible, you'll have to manage inventory box by box so that you always have a few unused boxes of each type ready to commission. Effective maintenance of slack, and procurement to backfill it, is essential.
Create two server types, small and large. Decide on cores, memory, disk space for each. You will need to change this view over time, so re-evaluate it every 6-12 months.
Establish maybe 3 standard OS installs. We don't own patching or securing the OS.
Use a SAN to enable flexible filesystem provisioning
Fixed maximum allocation of internet bandwidth for all members of trial, then fixed allocation to each member. No trial member should be able to stomp on other trial members or anything in production. QoS implemented. Bursting is debatable.
Dedicated edge firewall.
Network to enable a separate subnet for each user of trial. Each user of trial can generally see only their own network and servers. Holes/routing between subnets and between enterprise and trial subnets may be conditionally opened for API (and only API; no e.g. DB) access.
No content switch or load balancer
No backups
No hardware RNG
Some single points of failure are OK
We own firmware updates for hardware
We don't monitor or alert on any virtual servers. We do monitor and alert on underlying hardware, including the network kit and SAN.
May use a second tier hosting location for trial kit
It might be possible to use older kit being decommissioned from production for the trial environment. While this would likely increase day-to-day operational costs (heterogeneous and older kit), it would bring down initial capital investment in trial. Also consider used/refurb kit. Think cheap.
Keep a basic overview of trial and its services updated on the intranet. Make sure all product managers and bizdev types are educated about it.
Periodically review trial usage with each business owner.
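
The bandwidth rule in the list above (a fixed cap for trial as a whole, then a fixed slice per member) is easy to sanity-check before it goes anywhere near the network kit. The following is only a sketch: the function name, the member names, and the Mbps figures are invented, and it checks the plan on paper rather than configuring any QoS.

```python
def check_bandwidth_plan(trial_cap_mbps, member_mbps):
    """Check fixed per-member allocations against the fixed trial-wide cap.

    Hypothetical helper; all names and numbers here are illustrative.
    """
    for member, mbps in member_mbps.items():
        if mbps <= 0:
            raise ValueError(f"{member}: allocation must be positive, got {mbps}")
    allocated = sum(member_mbps.values())
    return {
        "allocated_mbps": allocated,
        "headroom_mbps": trial_cap_mbps - allocated,
        "oversubscribed": allocated > trial_cap_mbps,
    }


# Three made-up trial members sharing a made-up 100 Mbps cap.
plan = check_bandwidth_plan(100, {"team_a": 20, "team_b": 30, "team_c": 40})
print(plan)
```

If bursting is not allowed, an oversubscribed plan means some member cannot get its slice, so the sum of the slices should stay at or under the cap.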

As the production team evolves the trial environment offer, it's likely that some of the "we don't do this in trial" items above will change as cost effective and lightweight ways are found to deliver them into trial. Possible examples are backups, more sophisticated networking (load balancer), a provision of fault tolerant disk on server instances, or a shared (between trial members) database instance.

Other Considerations

Some other things to keep in mind to make the trial environment successful:

Aspects of production may be accessed via e.g. an API. This introduces a point of vulnerability to production. There are good design practices for hardening APIs that are exposed to trial such as logging, monitoring, authentication, rate limiting, and kill switches to protect what is on the production side of the API.
It may be cost efficient to spin up a single instance of an expensive service (e.g., Oracle) that can be shared between multiple trial members. This introduces a fair amount of complexity to manage the DB itself including QoS, security, and change control.
The trial environment won't build and run itself for free. Technical operations staff are required. The number of staff should be proportional to the level of change and the size of the environment.
If a third party is involved, they must have an internal business representative championing their product or service, someone who understands the product and will champion it regularly. A bizdevy, bring-the-external-party-in, hurl-over-the-wall-to-IT-ops approach doesn't work.
A product or service typically requires other functional contributions as well: game platform operations, marketing, account management and sales, handling of amended contracts for the new product, on-going product management to improve the product, and website integration and updates.
Trial could also be used to spin up staging or pre-production test environments.
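
Two of the API hardening practices mentioned in the list above, rate limiting and kill switches, can be sketched in a few lines. This is a minimal illustration using a token bucket, which is one common way to rate limit and not necessarily what any given shop would use; the class name `ApiGuard` and its parameters are assumptions, and a real deployment would also want the logging, monitoring, and authentication pieces.

```python
import time


class ApiGuard:
    """Token-bucket rate limiter with a kill switch for one trial member.

    Hypothetical sketch: sits in front of a production API exposed to trial.
    """

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens held at once
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock
        self.last = clock()
        self.killed = False

    def kill(self):
        """Cut this member off from production until someone resets it."""
        self.killed = True

    def allow(self):
        """Return True if the next API call may proceed."""
        if self.killed:
            return False
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
guard = ApiGuard(rate_per_sec=1, burst=2, clock=lambda: fake_now[0])
first_three = [guard.allow() for _ in range(3)]  # burst of 2, then refused
fake_now[0] = 1.0
after_refill = guard.allow()                     # one token refilled
guard.kill()
after_kill = guard.allow()                       # kill switch wins
print(first_three, after_refill, after_kill)
```

One guard per trial member keeps a misbehaving member from starving the others, and the kill switch gives production a way to cut access without touching the member's servers.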

Conclusion

The trial environment can be used to provide a low cost alternative for startup, experimental, speculative, and just plain insane product ideas. It's a hosting option that edgy product managers and bizdevs will like because of the lightweight commitment and speed of delivery. Entrepreneurial teams and startups will like it because it'll feel like something they're used to and won't slow them down. The production support team may feel uncomfortable with it initially because trial violates a lot of "best practices" in production. But in the long run they'll see how it becomes a business enabler that fosters innovation in a cost effective way.

Good luck and let me know if you manage to establish a trial environment in your shop!

05 March 2011
