21 March 2010

Using wget to ask jspwiki to re-index its search DB

For whatever reason, our installation of jspwiki (v2.8.2) decides to ignore or lose pages out of its index (hey, what do you want for free?!).  With our jspwiki hitting 2000 pages, search is the main tool to find pages.  Unfortunately, I've taken to keeping my own links page to important pages just so I don't lose them, as the search indexing seems to break regularly.  A re-index solves the problem, but it requires going into the site, authenticating, and clicking a button - way too much work.

Here is a quick way to use wget to log in to jspwiki and force a re-indexing of pages:

# POST to log in and get login and session cookies
wget --verbose --save-cookies=cookie --keep-session-cookies --post-data="j_username=myuid&j_password=mypw&redirect=Main&submitlogin=Login" "http://wiki.mydomain.com/JSPWiki/Login.jsp" --output-document "MainPostLogin.html"

# POST to kick off reindexing using cookies
wget --verbose --load-cookies=cookie --post-data="tab-admin=core&tab-core=Search+manager&bean=com.ecyrd.jspwiki.ui.admin.beans.SearchManagerBean&searchmanagerbean-reload=Force+index+reload" --output-document "PostFromForceIndexReload.html"  "http://wiki.mydomain.com/JSPWiki/admin/Admin.jsp"


Tweak myuid, mypw, and wiki.mydomain.com in the above to match your setup.  Drop the output files once you're comfortable it's working (I was saving them in the above to make sure I could see artifacts of being authenticated in the output).

Put the above into a cron'ed script and run it hourly.
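As a minimal sketch of such a script (the file name, cookie location, and crontab line here are just illustrative placeholders - substitute your own credentials and wiki URL as above):

#!/bin/sh
# reindex-jspwiki.sh - log in to JSPWiki and force a search index reload
# Example hourly crontab entry:  0 * * * * /usr/local/bin/reindex-jspwiki.sh
COOKIES=/tmp/jspwiki-reindex-cookies

# Log in and capture the login and session cookies
wget --quiet --save-cookies="$COOKIES" --keep-session-cookies \
     --post-data="j_username=myuid&j_password=mypw&redirect=Main&submitlogin=Login" \
     --output-document=/dev/null \
     "http://wiki.mydomain.com/JSPWiki/Login.jsp"

# Kick off the re-index using those cookies
wget --quiet --load-cookies="$COOKIES" \
     --post-data="tab-admin=core&tab-core=Search+manager&bean=com.ecyrd.jspwiki.ui.admin.beans.SearchManagerBean&searchmanagerbean-reload=Force+index+reload" \
     --output-document=/dev/null \
     "http://wiki.mydomain.com/JSPWiki/admin/Admin.jsp"

rm -f "$COOKIES"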

Note that not all versions of wget are created equal: 1.10 didn't seem to work, but 1.10.2 and 1.12 worked fine for the above.

QCon London 2010 - Miscellaneous Topics

There was no shortage of interesting topics at QCon London 2010.  Although I'm writing in some depth about a few of them due to personal interest and/or applicability to Internet gambling, there are many others I'll highlight here briefly.

Shared nothing architecture
- Each node of a system is stand-alone and shares nothing with other nodes
- Great horizontal scalability
- Shared databases, data stores, caches are constraining
- Great for stateless, single-shot request-response, and content oriented services; less so for multi-state transactional systems

Industry Consolidation driving big-boy architectures
- Internet traits such as the network effect and rapid feedback loops accelerate consolidation onto a single market-dominant (de facto monopoly) service (e.g., eBay, Betfair)
- Big consolidated services require big compute capacity.  Market convergence on a single supplier isn't possible if that supplier can't scale to meet demand.
- Big compute capacity requires a lot more thought on the "-ilities" (non-functional attributes) of service delivery.  Functionality itself becomes a commodity.
- Web technologies are embracing "traditional" approaches to increase compute capacity: asynchronous message oriented design, greater attention to maximizing hardware
- Consolidation also means longer life of legacy software
- The CAP theorem (see below) is coming into play for big systems that want to be highly available and need to massively scale

Programmer Quality of Life wins over Abstraction and Separation
- XML is painful
- Co-locate configuration with code (annotations)
- Convention over configuration (even Java coming on board with apps like Roo on Spring)
- Repetitive coding requirements should be built in (no boilerplate or scaffolding) - aspects come relatively free
- (Where does that leave dependency specification?  Hello maven pom.xml my nemesis!)

HTML 5, CSS 3, and Javascript versus rich client interface technologies
- Native executables (Wintel binaries) used to be way ahead of the browser on usability and richness but HTML/CSS/JS continues to move the browser experience closer to native executable experience
- Major new browser advances are right around the corner
- Flash, Air, Silverlight - great interfaces but browser continues to advance
- Mobile causing a renaissance of RIA and native executables - but browser continues to advance
- Innovation areas will tend to use an RIA and then the browser will catch up
- High touch experience (e.g., game graphics with high performance requirements) will require native executable performance for some time to come
- For most enterprise and business requirements, the browser experience is already sufficient today

Power efficiency, carbon credits and trading
- Assuming carbon trading advances, we might see a day where well written (more efficient, less energy consumptive) applications are important again
- Energy efficient HW (e.g., Sparc v Intel) may be more valued
- Some odd things may happen such as shifting compute capacity (carbon emission) to third world "carbon dumping grounds" due to economic incentives

Right tool for the right job versus efficiency from limited technology choices
- Although Java is dominant in the enterprise, Ruby is making inroads.  Recognition of productivity boost of a pleasant coding environment that encourages DRY and good programming techniques.
- Functional languages that facilitate multi-core (parallel) computing are increasing in popularity as currently popular languages in the enterprise do not (Java!)
- Advent of language neutral information passing protocols to better enable innovation within components (but not forcing between components)
- As of today, homogeneous technology choices for the enterprise are still winning

Software Developer can "do it all"
- Moving test into development through TDD (and from unit to functional and some end-to-end)
- Cloud services abstracting operational systems (the specific HW and OS don't matter)
- Moving live deployment into development (Continuous Integration leading to Continuous Deployment)
- Better to use a shared nothing architecture under developer's control than reliance on specialty approaches like a cache in BigIP F5s or a shared in-memory cache

And a grab bag of others:
  • OSGi and Java.  JARs lack versioning and dependency declarations and therefore lack safe coupling.  OSGi defines bundles to make integration/upgrade safer.  Feels complicated versus using a Convention over Configuration approach.  Could we use co-located annotations in the code instead to describe dependencies?  What about dependencies outside a specific application/JVM?
  • SOA (Service Oriented Architecture) is dead, long live SOA!  (No one seems to like SOA but a lot of practices from SOA are in prevalent and growing use)
  • TDD (Test Driven Development) is pretty much assumed now even for the smallest teams and projects.  CI in varying states but clearly the next development practice that will be an assumption shortly
  • Log everything (Google, Facebook) - both customer actions and internal systems and be able to compare anything to anything
  • CPU clocks hitting speed limits.  Until some new as yet unidentified technology breakthrough, CPU clock speeds have hit about as fast as they're going to be.  From now forward it will be about parallel processing on a growing number of cores.
  • DDD (Domain Driven Design) - Design software with specific stakeholders' interests at heart, using the stakeholders' terms ("Ubiquitous Language").  Let the stakeholder interest area ("Bounded Context") warp a "perfect" implementation into one that is tailored to the stakeholder's needs.  In a complex system, identify the Domains of interest, and design around each of them in parallel with figuring out how to glue these Domains together.
  • CAP theorem - pick 2: Consistency, Availability, and Partition tolerance (CAP).  Business will generally pick Availability and Partition tolerance, so that leaves Consistency as the odd man out and implies that more attention is then needed on identifying and recovering from inconsistent states.  Eventual consistency for some functions is sufficient.
  • New persistence models - Social networks with their many-to-many relationships in the data are driving the use of new persistence models to supplement their relational databases
  • Dreyfus model of skill acquisition - a good way to take a view on how people pick up skills and as a way to assess how skilled/mature your staff actually is

17 March 2010

QCon London 2010 - Cloud Computing

Cloud computing and virtualization was a popular topic at QCon London 2010.

Background/primer/proposition:
  • Cloud marketing suggests that hardware and/or systems administration is now a commodity that you shouldn't have to think about too much and can safely outsource. 
  • Just like TDD (Test Driven Development) decreases the need for QA, CI (Continuous Integration) with direct deployments into an operational environment will decrease the need for systems administration.
  • Outsourced pay-as-you-use cloud propositions will likely shift the budget for computing capacity from capex to opex (traditionally HW and SW sat in capex)
  • Grossly simplifying, there are four interesting cloud propositions available:
    • In-house hardware virtualization - cloud under your control, in your data centre (e.g., VMware, Xen, Solaris Zones)
    • Outsourced hardware virtualization (IaaS - Infrastructure as a Service) - cloud as an "infinite capacity" of generic computing where you define the systems from the OS up (e.g., Amazon's AWS EC2)
    • Outsourced compute capacity (PaaS - Platform as a Service) - cloud as a place to deploy software components into a fairly tightly defined (constrained) operating environment (e.g., Google's App Engine)
    • Pure services (SaaS - Software as a Service) - cloud as a source of "commoditized" services to be used when you construct an application (e.g., Google's web analytics, Facebook OAuth API for user credential management, AWS's S3 for storage)
  • Cloud means that you can cost effectively create and delete computing resources as needed for parts of your IT environment that don't require regular use - for example, testing and in particular load testing.
  • Non-tech business types get excited by cloud because:
    • If you're an entrepreneur type, you get bonus points for running your infrastructure from the cloud when looking for funding (more so in the last two years, though this is declining some now)
    • Finance and P&L owners get excited any time they can commoditize something to drive down costs.  Tech has mixed feelings about this as "drive down costs" tends to imply redundancies.
    • Easier to justify upfront costs for a new business case if you only pay for what you use (a failure is easy to delete, with no sunk capex)
  • Both tech and non-tech types get excited about not having to generate a lot of paperwork and then wait for authorizations and shipping times to get new kit.  Assuming company bureaucracy doesn't shackle down cloud controls too vigorously, a new virtual platform can be made available very quickly and at low cost.
  • If you can maximize utilization of the HW you buy, then it's no different from buying cloud resources (and likely cheaper)

General Observations on Cloud and Virtualization

Virtualization enables us to achieve that solutions architecture ideal of "one box one purpose"; it's just that it's become "one virtual box one purpose".

Virtualization enables applications that don't have a good threading model to take advantage of boxes with many cores and use up all the cores (one application per VM; VMs added until all cores are utilized)

Cloud does imply a lack of control over your core infrastructure.  Do you need this control?

The cloud is still just a bunch of hardware systems in a data centre.  There is no magic.  Their DC and systems admins will have their share of problems as well.  If the cloud sysadmins can provide more uptime than your own techops can provide at a similar cost point, the argument for cloud increases.

Similarly, there is debate over how good the SLAs are for cloud.  But really, how enforceable are the SLAs you have anyway?

Your choice of virtualization or cloud will enforce a way of creating applications and handling services.  You may not like it.  Conversely, it may force you to be disciplined in a new way otherwise missing when you create applications.

You will make an investment to learn the systems and make your applications work in the cloud environment.  This will cost and create some lock-in.  This is more true for PaaS than IaaS.

The cloud is being used to "long tail" a number of services.  Service "particles" are appearing that you can use to provide an aspect of functionality in your overall solution.  The more of these partners you use that are in the same cloud as you, the greater the efficiencies and hence the lower the costs.  Combined with first mover advantage and vendor lock-in, this is a network effect that should drive toward having just a few cloud suppliers in a few years.

Relating Cloud to Internet Gambling Business

The use of an in-house cloud like VMware makes good sense.  We're regularly adding new products that need to undergo development and test, yet we don't need permanent capacity to service these requirements.  While a VMware setup can't fully proxy a production environment (unless you use VMware in production as well), it is very suitable for most types of functional verification other than load and low-level device compatibility.

Being able to hand over the keys to a set of virtualized servers enables more entrepreneurial behavior.  For example, if you have a larger business that has a heavy layer of process, you can still work effectively with start-up partners.  Give them the keys to their own set of systems and they can do whatever they want with them without impacting your core systems.  Once they're proven successful, their revenue stream can justify improved risk management.

Handling flash crowds with cloud probably isn't possible for our industry today.  In-house clouds don't really handle flash crowds (Why not just have the capacity there anyway?  What do you want to cripple to support that big marketing campaign?).  Outsourced cloud generally isn't possible as the bigger cloud providers may not allow internet gambling to be run within their clouds (an AWS restriction anyway; and yes, this will likely ease up at some point - just look at Akamai's behavior on Internet Gambling).  Also, a CDN (Content Distribution Network; a SaaS of sorts) will take care of a lot of the flash crowd load we experience.

Using an outsourced cloud PaaS for data analytics doesn't seem likely.  Data analytics crunching benefits from close proximity to the data set being crunched.  Uploading big data sets into the cloud from locations with high connectivity costs (lots of internet gambling is run from offshore locations with expensive ISP costs) doesn't make sense.

SaaS, however, is quite interesting.  Services like Google Analytics that enable almost real-time data analysis are clearly the way to go for an Internet gambling site.  Highly bespoke business analytics will likely stay inside the business, with SaaS used for commodity analytics.

Depending on who you ask, the following may be real risks or just FUD:
  • Taxation - as services are sourced from someplace other than the tax-advantaged place you have your business in, you are at risk of emerging taxation implications
  • Centralized point for governments to enforce legal compliance.  By hosting in the cloud (which is actually going to be one or more physical data centres), you've given the governments that have oversight of those data centres a good choke point to use against you.  They could act on grounds of taxation, inappropriate content, or services not in compliance with regulation.

Conclusion

Virtualization makes complete sense for Internet gambling companies, all the way from development through to production.  That's not news; most in our sector have been using virtualization for a few years now.

On Cloud/IaaS provisions, AWS (a clear IaaS market leader) has flatly disallowed any internet gambling related operations inside their service.  While it is likely you could get away with internal use (dev, test) of cloud in these services, do you want to create a dependency and then have it suddenly shut off on you?  AWS of course isn't the only show in town for IaaS.  There are other providers - you would have to evaluate them against related risk factors and the re-development costs of integrating their use into your environment.

There is no clear use yet of Cloud/PaaS for standard Internet gambling products.

There are plenty of emergent opportunities to use Cloud/SaaS for Internet gambling.

(Index of emergent technologies applied to Internet Gambling)

QCon London 2010 - Themes and Trends

Last week I had the good fortune to attend QCon London which bills itself as an "enterprise software development conference".

I thought the conference struck a good balance of maybe 40% academic/futures/ideas from the ivory towers versus 60% practical, grunty software development from the trenches.  That of course varied by which sessions you attended, as there were various tracks and tutorials available.

QCon was fairly software development centric.  Although there were tracks on technical operations and QA, both felt more like "what software development thinks techops and QA should look like" rather than hardcore QA and techops experts running the tracks and presenting.

Although billed as "enterprise" software development, QCon was new media centric - less about enterprise and more about entrepreneurship using (recently) new tools and techniques to deliver and manage software.  I found this quite suitable for igaming, which is still more entrepreneur land than it is enterprise.

The following are themes and trends that were in the air at QCon and that captured my interest.  Some will be old and familiar (yet receiving continued attention), others are relatively emergent in the last year or so.  Each item may eventually lead to a blog entry with detailed commentary on the subject and, as relevant, a view on how it applies to internet gambling systems.
  • Cloud Computing
  • NoSQL versus Relational Databases
  • RESTful architecture
  • Functional programming languages
  • Post Scrum
  • Mobile Computing
  • Event based architectures, asynchronous messaging
  • DevOps, particularly Continuous Deployment
  • Miscellaneous topics