I'm pretty keen on designing for user experience, on user experience testing, and on user experience monitoring in production. There's some background reading on Wikipedia, of course, and IBM has an interesting paper here, largely about the "ergonomics" and usability aspects of the issue. The User Experience Network also looks interesting.
A bad customer user experience will lose you customers—and they won't forgive you just because you can show that data was coming out of your database like sh++ off a shovel at the time (perhaps your database was performing so well only because almost no-one could actually reach it from outside your firewall). And, good user experience can't be bolted on to a badly designed system; it must be designed in, often by recognising design "antipatterns" for poor performance, at an early stage, and eliminating them.
There are quite a few tools that recognise user experience now but the key thing, to my mind, is that they should integrate across the whole lifecycle from design to production. And, of course, there are many aspects of "user experience"; as well as the usability issues the references above concern themselves with, a consistent performance level is fundamental to providing a good user experience, although performance by itself is not sufficient. And, it is important that you consider the holistic end-to-end performance experienced by end-users, not just the technical performance of a part of the system.
One interesting tool that does this, and which I've just come across, is dynaTrace. I was talking to Bernd Greifeneder, CTO and Founder of dynaTrace software—and once CTO at Segue, which is a good provenance to have. Our conversation started around the current trends towards service orientation, software as a service (SaaS), virtualisation and so on, which are making business applications fundamentally more complex. Software these days runs remotely, sometimes on platforms you don't control, and this is much harder to manage than the old in-house mainframe applications with a real customer-facing person to handle the customer's "user experience". Then again, Internet interactions are now loosely coupled and asynchronous, which makes recovery from error situations very much harder (you can't simply back out an interaction you no longer want, as people may have used its intermediate results already).
Two dysfunctional effects of this complexity, which particularly interest Bernd, are long application problem resolution times and the emergence of production performance problems which can't easily be traced back to their root causes using manual techniques.
He is proposing a new approach called Continuous Application Performance Management which fits with current trends and enables companies to do more, in the way of fixing production problems, with less, in the way of people. And, most important of all, it helps them to fix problems quickly. dynaTrace, which implements Continuous Application Performance Management, follows individual transactions down to code level, collecting CPU usage metrics, method arguments and return values, SQL invocations, messages, logs, exceptions and so on. Moreover, Bernd claims extremely low overhead (embedded lightweight agents just collect data and send it asynchronously to a centralised Diagnostics Server for real-time, off-line analysis). The proof of this pudding will be in the eating, but I thought that the obvious possible issues seemed to be addressed well: dynaTrace monitors its own overhead and configures itself accordingly; it prioritises production throughput over monitoring (so if something goes wrong in dynaTrace, there may be a gap in the monitoring but production shouldn't be affected); and all transactions are monitored, not just a subset (which, of course, might not include those with problems).
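To make the "production throughput first" idea concrete, here is a minimal sketch of how an in-process agent might hand data to a collector without ever blocking the application. It is not dynaTrace's implementation, which I haven't seen; the class name `LightweightAgent`, the `send` callable and the `max_pending` limit are all hypothetical names invented for illustration.

```python
import queue
import threading


class LightweightAgent:
    """Hypothetical monitoring agent (an illustration, not dynaTrace's code).

    Events are queued in memory and shipped by a background thread, so the
    application thread never waits on the monitoring server.
    """

    def __init__(self, send, max_pending=1000):
        self._send = send  # assumed callable that ships one event to the server
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0        # events discarded because the buffer was full
        self.send_failures = 0  # events lost because shipping raised an error
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event):
        """Called from application code; returns immediately in all cases."""
        try:
            self._pending.put_nowait(event)
        except queue.Full:
            # Prefer a gap in the monitoring data over slowing production.
            self.dropped += 1

    def _drain(self):
        """Background loop: ship queued events until the stop marker arrives."""
        while True:
            event = self._pending.get()
            if event is None:  # stop marker placed by close()
                break
            try:
                self._send(event)
            except Exception:
                # A failing server must never propagate into the application.
                self.send_failures += 1

    def close(self):
        """Flush what is already queued, then stop the background thread."""
        self._pending.put(None)
        self._worker.join()
```

An application would call `agent.record(...)` on its hot path and `agent.close()` at shutdown. If the server is slow or down, `dropped` and `send_failures` grow, which is the "gap in the monitoring" trade-off described above, while the application carries on unaffected.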
What dynaTrace calls PurePath maps the transaction's precise execution path, containing relevant sequence, timing, resource usage and contextual information for each method/step the transaction executes, across multiple servers, whether they run on the same machine or on different ones. The mainframe isn't fully supported at present, which is a pity: dynaTrace runs there when WebSphere does but can't trace into COBOL code.
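As a rough picture of what such a trace might hold, the sketch below models a transaction as a tree of timed steps and finds the step that spent the most time in its own code. This is my own simplification, not the PurePath format; `Step`, `self_time_ms` and `hottest_step` are hypothetical names, and the sample methods and hosts are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One method call or step in a traced transaction (hypothetical model)."""
    method: str
    host: str
    duration_ms: float  # total elapsed time, including child steps
    children: List["Step"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        """Time spent in this step itself, excluding its children."""
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def hottest_step(root: Step) -> Step:
    """Walk the whole trace and return the step with the largest self time."""
    best = root
    stack = [root]
    while stack:
        step = stack.pop()
        if step.self_time_ms() > best.self_time_ms():
            best = step
        stack.extend(step.children)
    return best


# A made-up transaction spanning three servers.
trace = Step("checkout", "web01", 120.0, [
    Step("loadCart", "app01", 30.0),
    Step("chargeCard", "app02", 80.0, [
        Step("runQuery", "db01", 70.0),
    ]),
])
slowest = hottest_step(trace)
print(slowest.method, slowest.host, slowest.self_time_ms())
```

Because every step carries its host and timing, the walk points at the database call on `db01` rather than at the web tier, where the user actually noticed the delay.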
What this means, to an organisation, is that its IT staff understand the dynamic behaviour of its applications both in development and production, and can therefore anticipate and correct performance problems before the business is affected. And, if something does go wrong, "time to repair" is reduced because the problem transaction can be quickly reconstructed from captured data, traced to the underlying "root cause" code and repaired (Bernd claims) in minutes, not hours or days, often reducing cost per defect by as much as 100 times.
That's all well and good and you can check it out on dynaTrace's website, but where next? Well, one sideline is OEMing dynaTrace as part of ALM tool suites from other vendors such as Borland. This provides a real "proof of concept" for dynaTrace monitoring.
However, the really interesting question is whether dynaTrace can address dysfunctional development cultures. Can it reduce the dysfunctional gaps between developers, operations and business users? Unsurprisingly, Bernd says it already does, because the traceability between user experience and code supports communication between the different stakeholders in a problem. dynaTrace makes its information available via business-oriented dashboards but, more than that, it provides real-time role-based dashboards for all stakeholders. These facilities are being developed further and should promote increased awareness of business user experience amongst developers and help developers build business-friendly systems "right first time" that meet users' "working experience" as well as technical needs.
Another possible future, according to Bernd, lies in moving the dynaTrace offering up a level by looking for design "antipatterns" which are likely to result in poor user experience and suggesting refactorings that will address the potential issue. If root cause code analysis for production problems makes fixing them in the code cheaper, identifying potential design problems and fixing them before you write any code at all will be orders of magnitude cheaper still.