Scalability is a question that comes up when evaluating businesses and their supporting technologies, especially in a high-growth area like online gambling. This post focuses on the scalability of a business's technical systems, particularly the gaming delivery platforms used by online gambling businesses. I won't be covering non-technical aspects of business scalability in this post.
While whole books are written on the subject, the following is an abbreviated version of what to look for when evaluating the scalability of any technical system (software and supporting systems) from a high-level architectural point of view. My goal is to provide you with some practical, real-life knowledge to better evaluate software vendors flogging their betting exchange, sportsbook, poker, or casino products.
To assist the discussion, we’ll use the following general purpose view of an internet based service delivery architecture:
Tier 1: Client, e.g.:
A browser-based client using AJAX and/or Flash
A downloadable “heavy” client
Tier 2: Application Server
Tier 3: Database Server
This is a standard three tier architecture, and is a common reference tool when discussing internet-based applications. There are other interesting facets of this architecture like fault tolerance and security which I won’t be covering here.
Tier 2 can be quite complicated and sub-divide in many ways. Sometimes this subdivision results in a system called an “n-tier” or multi-tier architecture.
1. General separation of delivery framework
The delivery framework is a set of tools used to deliver a solution to a customer. It isn’t the specific application like Party Gaming’s Poker software on your PC or betfair’s browser-based trading interface. It is the toolset that companies use to build their applications.
A typical delivery framework for a website like betfair might be:
Tier 2A: Web server (e.g., Apache)
Tier 2B: J2EE compliant application server (e.g., JBoss)
Tier 2C: JMS messaging service
Tier 3: Database server (e.g., Oracle)
A typical delivery framework for a PC-based internet poker game like Party Poker might be:
Tier 1: A “heavy” client (“heavy” means it isn’t based in your browser and contains code that runs directly on your PC) written in a language like C++ using Microsoft’s application development tools and supporting functions
Tier 2: Microsoft .Net application server, including many discrete components such as player, lobby, table, and chat management
Tier 3: Microsoft SQL Server
A system can more easily scale when each of these tiers can be separated. I emphasize “can be” as for cost reasons you may put application and database server software on the same hardware when you start out, but it is a well-understood and simple migration process to pull them apart at a later time when you need and can afford greater performance.
At a practical level, for Internet gambling, major software components (e.g., application server and database) should be separated out into multiple hardware platforms, even when you’re just starting out.
As a buyer evaluating systems, the thing you look for here is the use of a fairly standard delivery framework, and not some cobbled together proprietary Frankenstein framework that will be difficult to support.
2. Functional partitioning of system components
System components are created by developers in the context of the delivery frameworks they have chosen. Functional partitioning of one component from another is useful because different components can be run separately on their own hardware allowing for greater scalability.
To evaluate partitioning, logically divide up the various components involved in using the product.
Consider a poker system. Does the same component in the system support both player handling AND table play? Intuitively, managing player logins and running the logic of a poker game around a table are two very different activities and could be separated.
A classical example of this is a backoffice reporting system that is used by an operator to report on game platform activity. The reporting should be completely separated from the systems that deliver customer game play, so that heavy reporting activity doesn’t jeopardize the customer’s game play by slowing it down.
How well a product does in this area comes down to how well the supplier designed its architecture. Some suppliers, who evolved from a two-guys-and-their-dog software effort compounded with a lack of experience (or benefit of hindsight anyway), really fail in this area. Look for “monolithic” architectures in Tier 2 (that is, all the logical components you might guess should be in the system are balled into one big entity that handles middle tier responsibilities) as a sign of trouble. Ever stay awake at night wondering why your poker network solution tops out at 6000 players? It’s probably in this area, and unfortunately, there isn’t much you (or your supplier) can do about it without a massive rewrite of the software to fix the fundamental design flaws.
Sometimes complex components are difficult to subdivide. For example, consider a large multi-table poker tournament that has 10,000 registered players. The logic required to create the seating between one round and the next is probably difficult to subdivide. In that case it may be best to have a specialty component whose sole job is to take the results of one round and then create the player and table structure for the next round, and that’s it.
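A minimal sketch of such a specialty component is shown below. The function name `seat_next_round` and the nine-seat default are assumptions made for illustration, not part of any real poker platform. The point is only that the component has a single job: survivors in, table structure out.

```python
import random
from typing import Dict, List, Optional


def seat_next_round(
    survivors: List[str],
    seats_per_table: int = 9,
    rng: Optional[random.Random] = None,
) -> Dict[int, List[str]]:
    """Build balanced tables for the next round from the surviving players."""
    if seats_per_table < 1:
        raise ValueError("seats_per_table must be at least 1")
    rng = rng or random.Random()
    players = list(survivors)  # copy so the caller's list is left untouched
    rng.shuffle(players)       # random redraw of seats
    # Smallest number of tables that can hold everyone (ceiling division).
    table_count = -(-len(players) // seats_per_table)
    tables: Dict[int, List[str]] = {t: [] for t in range(1, table_count + 1)}
    # Deal players out one table at a time so table sizes differ by at most one.
    for index, player in enumerate(players):
        tables[(index % table_count) + 1].append(player)
    return tables
```

Because the component keeps nothing between calls, it can run on its own hardware and be invoked only between rounds, without touching the components that run live table play.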
3. Caching
Computing takes time. One part of a system asks another part of the system to do something, and then waits to get the result. Caching is the act of keeping that result around so you don’t have to take the time to reproduce it over and over again.
Caching is a key part of a high performance architecture, and the good news is that the delivery framework being used often provides various types of caching essentially for free. Caching is also the kind of thing that can sometimes be retrofit into a system to get a later performance boost without a lot of effort.
To understand caching, you need to understand how frequently what you might cache changes and how important absolute accuracy is. You’d be surprised – accuracy often isn’t that important.
Let’s use the example of a sportsbook homepage that shows a list of football matches and prices. On the backoffice side you have odds being changed through a set of automated rules in the system and human traders watching the market. Now imagine that there are 20 users every second browsing to that home page to check the odds. Should the system have to go all the way to the database to find the price to display to each of those 20 users? Definitely not. The system should generate that list of matches and prices once every (for example) 5 seconds and let all users that request the data during those 5 seconds see the same cached list.
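The sportsbook example can be sketched as a small time-based cache. Everything named here (`TimedCache`, `load_prices_from_database`, the sample match) is hypothetical, and the sketch ignores thread safety, which a real application server would have to handle.

```python
import time
from typing import Any, Callable, Optional


class TimedCache:
    """Keeps one computed value and rebuilds it at most once per TTL window."""

    def __init__(
        self,
        loader: Callable[[], Any],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # the expensive call, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: do the expensive work once.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def load_prices_from_database() -> list:
    # Hypothetical stand-in for a real Tier 3 query.
    return [{"match": "Home v Away", "home": 2.1, "draw": 3.2, "away": 3.6}]


homepage_prices = TimedCache(load_prices_from_database, ttl_seconds=5.0)
```

With 20 requests a second and a 5 second window, roughly 100 page views share a single trip to the database. The price a user sees can be up to 5 seconds old, which is the accuracy trade-off described above.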
An extreme example of this is the betting and display activity right before the off of the Grand National on betfair. Betfair is probably processing 100s of transactions each second, and the prices and amounts available at a given price are changing perhaps every 20 milliseconds (20/1000 of a second). What you see when you hit refresh on that page isn’t a true representation of the market at that moment; it’s just an approximation.
At a practical level, things typically cached are whole web pages, parts of web pages, data sets extracted from the database, and derived data sets, as calculated in Tier 2.
Another use of caching on a global scale is a service like Akamai. This service keeps copies of your content geographically close to your customers so that they have (at least the appearance of) faster page load times. That way if your servers are in Costa Rica and your customers are in Russia, many aspects of your web pages can load from Akamai’s servers in Russia, which will be a lot faster than your servers in Costa Rica.
4. Message passing
Message passing is the act of one component giving one or more (other) components some information asynchronously. Asynchronous information transfer means that the component giving the information doesn’t waste time waiting around for the information to be delivered. It relies on a delivery framework (e.g., JMS, the Java Message Service) to make sure that the receiving component(s) get the information.
The great thing here is that the information sender and receiver can be completely separate from each other allowing them to (potentially) sit on different hardware platforms, or at least putting a multi-processor server to better use.
Consider a poker system. A component decides that the ace of spades will come up as the River card to be seen by 9 players around a poker table. The component sends that message out to each of the 9 players' heavy clients via the messaging service and then continues processing without waiting around to see the information delivered to all 9. The message service handles the actual delivery process.
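The River card example can be sketched with an in-process queue and a background delivery thread. This is a toy stand-in for a real messaging service, not JMS itself, and the `MessageBus` class and its method names are assumptions made for illustration.

```python
import queue
import threading
from typing import Dict, List


class MessageBus:
    """Toy in-process messaging service: senders enqueue, a worker delivers."""

    def __init__(self) -> None:
        self._outbox = queue.Queue()
        self._inboxes: Dict[str, List[str]] = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._deliver_forever, daemon=True).start()

    def subscribe(self, client_id: str) -> None:
        with self._lock:
            self._inboxes[client_id] = []

    def publish(self, message: str) -> None:
        # Returns as soon as the message is queued; the sender never waits
        # for the subscribers to receive it.
        self._outbox.put(message)

    def _deliver_forever(self) -> None:
        while True:
            message = self._outbox.get()
            with self._lock:
                for inbox in self._inboxes.values():
                    inbox.append(message)
            self._outbox.task_done()

    def wait_until_delivered(self) -> None:
        # Only useful in demos and tests; a real sender would not call this.
        self._outbox.join()

    def inbox(self, client_id: str) -> List[str]:
        with self._lock:
            return list(self._inboxes[client_id])


bus = MessageBus()
for seat in range(1, 10):
    bus.subscribe(f"seat-{seat}")
bus.publish("river: ace of spades")  # the table component carries on immediately
```

In a real deployment the queue would live in a separate messaging tier and the inboxes would be network connections to the heavy clients, but the division of labour is the same.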
Message passing has long been a critical component of high performance architectures. When evaluating a software provider, particularly for products like poker and betting exchanges where there is a lot of player interaction, you should take care to understand whether message passing has been implemented.
5. Statelessness
For our purposes, the state of something is a description of its current status. In computing, state takes time and resources to maintain. When something is stateless, it doesn’t remember anything between one request and the next.
Imagine a conversation between a poker server and 10,000 customers, each running a poker client on their PC. Each player is distinct from the others, and each is about to do one of a number of different things (e.g., fold, call, or raise!). If the software on the customer’s PC keeps track of the customer’s state, it keeps track of exactly one customer's state. If the poker server keeps track of the players’ state, it has to keep track of 10,000 different players' states. Clearly keeping the state with the client is much more scalable.
The most common example of this is web browsing. The system that supplies you with the page you just requested can only serve you the page you request. It doesn’t remember the previous page you were on. It is stateless. It is up to your web browser (typically working in unison with an application server behind the web server) to pass state information to the web server, so the server knows what you want.
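As a rough illustration, a stateless handler receives everything it needs with the request and hands back the new state, remembering nothing afterwards. The `ClientState` fields and the `handle_raise` function are hypothetical, and the sketch assumes the state sent by the client can be trusted; a real gambling system would have to verify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientState:
    """State carried by the client and sent along with each request."""
    player_id: str
    chips: int
    current_bet: int


def handle_raise(state: ClientState, amount: int) -> ClientState:
    """Stateless handler: uses only its arguments and stores nothing."""
    if amount <= 0 or amount > state.chips:
        raise ValueError("invalid raise amount")
    # Return a new state for the client to hold until its next request.
    return ClientState(
        player_id=state.player_id,
        chips=state.chips - amount,
        current_bet=state.current_bet + amount,
    )
```

Because the handler keeps no memory of its own, any copy of it on any server can process any player's next request.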
Other than resource handling scalability, the characteristic of statelessness also helps enable #8 below.
When evaluating a system, you can quickly identify potential performance bottlenecks by understanding where state is maintained.
6. Resource pooling
Resource pooling is the act of pre-creating a set of resources that are used by many components, particularly when there are many more components wanting resources (but not at the same time) than there are resources. The resources are allocated from the resource pool, used, returned to the resource pool, and then recycled. The resources aren’t created and then destroyed, only to be created again when needed. The resources are pre-created at system startup (and perhaps on demand) and then (re)used as needed.
An example of resource pooling is something called connection pooling. When an application server (Tier 2) needs data from the database (Tier 3), it has to establish a connection (like a two-way pipe) to the database. Connection setup has a cost in terms of computing time and resources on both the application server and database sides. To reduce costs, most application and database servers maintain a ready pool of database connections, just waiting to be used. Once the application server has made its request to and received its data from the database and is ready to move on, it returns the connection to the pool so that it can be re-used again later (likely for a completely different data request).
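A minimal sketch of a connection pool follows. The `ConnectionPool` class is an assumption for illustration, and `create_connection` stands in for whatever expensive setup call a real database driver provides; production frameworks usually ship their own pooling.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class ConnectionPool:
    """Pre-creates a fixed number of connections and lends them out."""

    def __init__(self, create_connection: Callable[[], Any], size: int = 5) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the setup cost once, at startup.
            self._idle.put(create_connection())

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[Any]:
        # Blocks if every connection is currently in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection for reuse instead of destroying it.
            self._idle.put(conn)
```

A caller borrows a connection with `with pool.connection() as conn:` and it goes back to the pool when the block exits, even if the request raised an error.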
At a practical level, delivery frameworks handle common points of resource pooling, and very little is required on a software developer’s part to utilize them. In order to evaluate how well the vendor’s system makes use of resource pooling, you have to dig pretty far into the architecture, which likely won’t be very practical. However, if you do identify a component that is logically very complex to set up and is frequently used, it’s worth understanding whether that component is set up and discarded over and over again, or is part of a resource pool.
7. APIs, services and loose coupling of components
Loose coupling of components means that each component doesn’t know much about or share much with other components. It is an architectural way of thinking about how to separate and sometimes replicate data and functionality between components to maximize scalability.
Loose coupling makes use of and is a consequence of implementing most of the performance techniques covered above. Conversely, an architectural imperative of loose coupling would drive you to use most of the above techniques.
The reason why I’ve separated this performance technique out is to highlight the use of APIs (Application Programming Interfaces) to access loosely coupled services. An API is (hopefully!) a well understood way of accessing a service that is being provided by one component to another component. For example, in the online gambling world, connecting to a third party payment gateway like Neteller is an important business enabler. Neteller provides a very well defined way (API) to access Neteller funds transfer services. Neteller doesn’t know much about your customers and your gambling platform doesn’t know anything about the banking network under Neteller. If the two components (your gambling platform and Neteller) are loosely coupled, it should be a simple matter to unplug Neteller and plug in FirePay (at least from a technical, if not necessarily a commercial, point of view!).
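The unplug-and-replace idea can be sketched as follows. All class and method names here are invented for illustration; they do not reflect the API of any real payment provider.

```python
from abc import ABC, abstractmethod


class PaymentGateway(ABC):
    """The only thing the gambling platform knows about a payment provider."""

    @abstractmethod
    def deposit(self, account_id: str, amount_cents: int) -> str:
        """Move funds into the player's account and return a receipt id."""


class FakeWalletGateway(PaymentGateway):
    # Hypothetical provider; a real one would call the vendor's documented API.
    def deposit(self, account_id: str, amount_cents: int) -> str:
        return f"wallet:{account_id}:{amount_cents}"


class FakeCardGateway(PaymentGateway):
    # A second hypothetical provider with the same interface.
    def deposit(self, account_id: str, amount_cents: int) -> str:
        return f"card:{account_id}:{amount_cents}"


class Cashier:
    """Platform-side component, coupled only to the interface."""

    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway

    def fund_account(self, account_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("deposit must be positive")
        return self._gateway.deposit(account_id, amount_cents)
```

Swapping providers means constructing `Cashier` with a different gateway object; nothing else in the platform changes.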
At a practical level, third parties like Neteller that want to make it easy to use their services do a good job with APIs and their documentation. They provide examples in various programming languages so developers can almost just cut and paste in the necessary components to access the services.
On the other hand, your primary gambling platform supplier may not be too keen on having you hook to other parties, so they might make it difficult or impossible for you to do so by not exposing or documenting their APIs. Also, some gambling platforms may be so poorly designed (especially in area #2 above) they can’t expose an API because there simply isn’t one.
Another telltale sign of loose coupling and API design is how easily you can get to the data you see within the application. That is, is the presentation of the data only loosely coupled with the production of that data? If the presentation layer (e.g., how the web page is coded to display the data) can be easily changed to display the same core chunk of data in different ways, the data is probably loosely coupled from the presentation.
8. Clustering (horizontal scalability)
Clustering is the ability to take one component and create multiple copies of it, potentially across multiple hardware platforms. Each copy of the component is equal to all the other copies – they are all peers. When these peers are together in a common pool and can provide services equally, they are considered to be clustered. Horizontal scalability means that I can keep adding in peers (e.g., additional hardware) to increase the performance of that peer group.
Like loose coupling, clustering is another derivative of some of the areas above.
The clustering concept is similar to resource pooling (#6 above). However, with resource pooling, you’re trying to avoid the costs of repeatedly creating and destroying components. With clustering, you’re creating a variable size pool of components to match your performance requirements.
We’ll consider two practical aspects of clustering – system level and component level.
At a system level, you want to be able to create a cluster of database, application, and web servers. As your performance needs go up, you can add a second server, then a third, and so on. Each of these servers can run on a new hardware platform. This can be tricky in the database tier, but is certainly well understood in the application and web tiers.
Things get interesting and complex at a component level. Consider an extremely popular football match on betfair, a few minutes before the start of the game. A good design suggests that there should be a cluster of components that handles markets, and a single “market” component (of a cluster of market components) can be dedicated to a single hot market. An even higher performance model would be a stateless one where a cluster of market component peers can handle any market. It would be a design disaster for one component to have to handle all markets.
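The two routing models can be sketched side by side. The `MarketCluster` class and the peer names are hypothetical, and the hashing scheme is just one simple way to pin a market to a peer.

```python
import itertools
import zlib
from typing import List


class MarketCluster:
    """Routes work across a pool of identical market-handling peers."""

    def __init__(self, peers: List[str]) -> None:
        if not peers:
            raise ValueError("a cluster needs at least one peer")
        self._peers = list(peers)
        self._round_robin = itertools.cycle(self._peers)

    def peer_for_market(self, market_id: str) -> str:
        """Dedicated model: a stable hash pins each market to one peer."""
        index = zlib.crc32(market_id.encode("utf-8")) % len(self._peers)
        return self._peers[index]

    def next_peer(self) -> str:
        """Stateless model: any peer can take any request, so just rotate."""
        return next(self._round_robin)


cluster = MarketCluster(["market-1", "market-2", "market-3"])
```

In the dedicated model, adding a peer changes which peer some markets map to, so any state a peer holds has to move with them. The stateless model avoids that problem, which is part of why it scales further.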
It is quite possible that as an operator evaluating your current or potential future online gambling software you will never get to ask these questions. Vendors will cry "proprietary", "competitive advantage" and other bollocks. They will ask you not to look at the man behind the curtain. This will change over time as the market becomes more competitive and using good design practices and high performance frameworks becomes a competitive advantage.