Wednesday, September 21, 2005

The Enterprise Walk: Technology is not an issue

I was recently summoned by a large financial institution, located somewhere in the world, to review the architecture of a core, mission-critical project it has outsourced to a major software development house.
While listening to the software house's architects explain the solution's functional components and their interactions, I felt an ever-growing uneasiness. These guys were no doubt domain experts in the Microsoft world; they recited all the right words: Whidbey, Yukon, BizTalk 2005, and (the inevitable) XML and Web Services. They also presented a layered architecture with UI on top, followed by a business logic layer, a connectivity layer and so on. So they talked the talk; but they didn't walk the (Enterprise) walk.

What these guys missed altogether is that they were not architecting a software application, but rather an Enterprise Solution. When a software house builds an Enterprise Solution (and we're talking about a multi-million US$ Enterprise Solution), the Enterprise does not really care whether the Solution is developed using BizTalk or Whidbey, nor would it give any approving nod had it been J2EE or AJAX. Technology is not an issue in the architecture of an Enterprise Solution. Technology might be relevant for the software developers (i.e. the software house), but in itself, detached from a specific Enterprise Context, Technology does not matter! The software house architects confused Technology and software development principles with Enterprise Solution Architecture, and that's what was wrong.

So what's Enterprise Solution Architecture? There are two kinds of architectures: functional and non-functional. Both comply with the same principles of order, clarity, simplicity, ease of use and manageability.

The functional architecture layers the business processes, i.e. the Solution functionality, in a coherent, logical, business-wise fashion. A good example of that would be the TeleManagement Forum's eTOM (enhanced Telecom Operations Map), where Telecom Business Processes are layered across multiple dimensions of Enterprise Life-Cycle Stages (such as Fulfillment, Assurance and Billing) and Business Entities (such as Customer, Service, Resource and Supplier). So when an Enterprise Solution is designed, it's not a bad idea to lay the Solution's business processes over a larger Enterprise Functional Architecture to see where the Solution fits. Actually, most of the big Enterprise Software ISVs, such as Oracle, PeopleSoft, JD Edwards and Siebel ( :) ) have started to release versions of their products adapted to the functional architecture of specific Verticals (PeopleSoft for communications; SAP for automobile, banking and so forth).
So bear in mind: even though no two Enterprises are created equal, there is a good chance that Enterprises from the same Vertical adhere to a similar functional architecture.

The non-functional architecture lays the foundations for the life-cycle management of an application or an Enterprise. The same architectural principles mentioned above (order, clarity, simplicity, manageability etc.) must apply here as well. But unlike the functional architecture, which is determined by the business domain and the specific Enterprise needs, the non-functional architecture is shared across all Verticals and Enterprises. It might even have a checklist!

Here is a short version of the non-functional architecture checklist. Bolded subjects indicate an impact on the Solution's design and coding. A subject in regular style is usually realized outside the Solution. Note that most of the subjects are bolded: non-functional architecture is at the heart of any Enterprise Solution.

1. Change Management
Changes to all of the Solution's components (software code, infrastructures and applistructures - not just the software code!) must be managed in the following manner:
a. Impact Analysis: a critical factor in today's Enterprises. Who's who in the specific Enterprise Solution, across its entire life-cycle: requirements, code, tests, applistructures, services, servers, storage, versions, stakeholders and so forth. Obviously, inter-relations are necessary: which element is related to what?
This information must be formally represented, automatically updated, and easily accessed.
b. Version Management: we all know that for software. Is it maintained for the Solution's other elements? It must be. It probably can't be managed in the software version management package, but all the Solution's components must be versioned in accordance with the Solution's own version.
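To make "which element is related to what?" concrete, here is a minimal sketch of impact analysis as a dependency graph that is walked from the changed element. The class name ImpactGraph, its methods and the element names are hypothetical illustrations, not part of any product mentioned in this post.

```python
from collections import defaultdict, deque


class ImpactGraph:
    """Hypothetical registry of Solution elements and their inter-relations."""

    def __init__(self):
        # Maps an element to the set of elements that depend on it.
        self._dependents = defaultdict(set)

    def add_dependency(self, element, depends_on):
        """Record that `element` depends on `depends_on`."""
        self._dependents[depends_on].add(element)

    def impacted_by(self, changed):
        """Return every element that directly or transitively depends on `changed`."""
        seen = set()
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


# Illustrative element names only.
graph = ImpactGraph()
graph.add_dependency("service:billing", depends_on="server:db-01")
graph.add_dependency("test:invoice-regression", depends_on="service:billing")
graph.add_dependency("stakeholder:finance", depends_on="service:billing")
graph.add_dependency("service:reporting", depends_on="storage:archive")

print(sorted(graph.impacted_by("server:db-01")))
```

In a real Enterprise the graph would be fed automatically from the version management, deployment and test tooling rather than by hand; the traversal itself stays this simple.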

2. Dev/Test/Integration/Prod Environments
Normally, there will be multiple change tracks to a single Solution, each at a different development stage. In my previous company we had more than a hundred dev/test/integration environments representing the same set of core applications in different stages and versions.
What's expected from the Solution provider is an installation kit that supports both from-scratch installs and delta updates of those different environments.

3. Regression testing
The management of a set of tests validating that "the rest of the functionality" is still in good shape. Do note: it's the management of this set, not the set itself. Automatically updating the content of the set; presenting the content of the set; running the set; showing its historical run outcomes – all that is part of the regression testing management. Some of the best Enterprise Solutions I know had this regression testing management incorporated inside them!
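A minimal sketch of what "managing the set" could mean in code: a registry that can present its content, run it, and keep the history of run outcomes. RegressionSuite and the two sample checks are hypothetical names made up for this illustration.

```python
from datetime import datetime, timezone


class RegressionSuite:
    """Hypothetical manager for a set of regression tests."""

    def __init__(self):
        self._tests = {}
        self.history = []  # one entry per run, oldest first

    def register(self, name, test_fn):
        """Add or replace a test in the set."""
        self._tests[name] = test_fn

    def contents(self):
        """Present the current content of the set."""
        return sorted(self._tests)

    def run(self):
        """Run every test and append the outcome to the history."""
        results = {}
        for name in self.contents():
            try:
                self._tests[name]()
                results[name] = "pass"
            except AssertionError:
                results[name] = "fail"
        self.history.append({"ran_at": datetime.now(timezone.utc), "results": results})
        return results


def check_totals():
    assert sum([1, 2]) == 3


def check_casing():
    assert "abc".upper() == "abc"  # deliberately failing check


suite = RegressionSuite()
suite.register("totals", check_totals)
suite.register("casing", check_casing)
results = suite.run()
print(results)
```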

4. Integration
A solution usually consists of different modules and/or applications and usually it interacts with other enterprise solutions. Information is, therefore, exchanged internally and externally.
Every Enterprise Solution must, therefore, contain an internal Sub-Solution that handles the integration (there's an exception, see later on). All the Solution components must use the integration sub-solution for information exchange, be it external or internal. They must access the integration sub-solution using the same request payload standard, and they should receive standardized reply payloads from it. I insist on the term sub-solution (rather than sub-component) to make it clear that the sub-solution must comply with the non-functional architecture subjects exactly as if it were a standalone solution.
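As a rough sketch of "same request payload standard, standardized reply payloads", assume every component talks to one hub using one request shape and always gets one reply shape back, even on failure. Request, Reply, IntegrationHub and the "billing" target are hypothetical names invented for this illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Request:
    """The single request shape every component uses (hypothetical)."""
    target: str
    operation: str
    body: Dict[str, Any]
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Reply:
    """The single reply shape every component receives (hypothetical)."""
    message_id: str
    status: str  # "ok" or "error"
    body: Dict[str, Any]


class IntegrationHub:
    """Hypothetical integration sub-solution: routes requests, normalizes replies."""

    def __init__(self):
        self._handlers: Dict[str, Callable[[Request], Dict[str, Any]]] = {}

    def register(self, target, handler):
        self._handlers[target] = handler

    def send(self, request: Request) -> Reply:
        handler = self._handlers.get(request.target)
        if handler is None:
            return Reply(request.message_id, "error", {"reason": "unknown target"})
        try:
            return Reply(request.message_id, "ok", handler(request))
        except Exception as exc:
            # Failures come back in the same standardized reply shape.
            return Reply(request.message_id, "error", {"reason": str(exc)})


hub = IntegrationHub()
hub.register("billing", lambda req: {"invoice_total": sum(req.body["amounts"])})

ok_reply = hub.send(Request("billing", "total", {"amounts": [10, 20]}))
bad_reply = hub.send(Request("shipping", "track", {}))
print(ok_reply.status, ok_reply.body)
print(bad_reply.status, bad_reply.body)
```

The point of the envelope is that callers never need to know whether the target is internal or external, or how it failed.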

5. Service Orientation
The Solution must be designed in a way that makes it possible to outsource its internal sub-solutions or functions (thus enabling the reuse of existing components and reducing vendor lock-in). If, for instance, there's an Enterprise Integration Architecture in place to which all Enterprise components adhere, the Solution must be able to use it instead of its own integration sub-solution.
Do note: there's absolutely no need to think "Web Services" whenever you see Service Oriented Architecture. As I mentioned earlier, technology is not an issue, and "Web Services" is just one out of many alternatives to technically realize SOA.
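One technology-neutral way to picture this is a module that depends only on a service contract, so the Solution's own implementation can be swapped for an Enterprise-wide one. This is a sketch under that assumption; IntegrationService, BuiltInIntegration, EnterpriseIntegration and OrderModule are hypothetical names.

```python
from typing import Protocol


class IntegrationService(Protocol):
    """The contract the Solution's modules depend on (hypothetical)."""

    def publish(self, topic: str, payload: dict) -> str:
        ...


class BuiltInIntegration:
    """The Solution's own integration sub-solution."""

    def publish(self, topic: str, payload: dict) -> str:
        return f"built-in:{topic}"


class EnterpriseIntegration:
    """Stand-in for an existing Enterprise Integration Architecture."""

    def publish(self, topic: str, payload: dict) -> str:
        return f"enterprise:{topic}"


class OrderModule:
    """A business module that knows only the contract, not the provider."""

    def __init__(self, integration: IntegrationService):
        self._integration = integration

    def place_order(self, order_id: str) -> str:
        return self._integration.publish("orders", {"order_id": order_id})


standalone = OrderModule(BuiltInIntegration())
outsourced = OrderModule(EnterpriseIntegration())
print(standalone.place_order("A1"))
print(outsourced.place_order("A1"))
```

Nothing here involves Web Services; the contract could be realized over any transport.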

6. Scalability
This is the ability to scale out easily. Note that scale-up is no longer acceptable, as real-time Enterprises are adopting the internet architecture to provide streamlined operations. Scale-out practically translates into cloning and partitioning. Different modules of the solution can have multiple, concurrently running instances (cloning), each taking care of a subset of the workload (partitioning). There must be a coordinator that either distributes the workload or performs effort de-duplication (I'll explain this better in another post).
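A minimal sketch of a coordinator that distributes the workload, assuming partitioning by a stable hash of a business key so that the same key always lands on the same clone. The names partition_for and Coordinator, and the customer keys, are hypothetical; a real coordinator would also have to handle clones joining and leaving.

```python
import hashlib


def partition_for(key: str, clone_count: int) -> int:
    """Map a business key to a clone index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % clone_count


class Coordinator:
    """Hypothetical coordinator: one work queue per running clone."""

    def __init__(self, clone_count: int):
        self.queues = [[] for _ in range(clone_count)]

    def dispatch(self, key: str, work_item: dict) -> int:
        """Append the work item to the queue of the clone that owns `key`."""
        index = partition_for(key, len(self.queues))
        self.queues[index].append(work_item)
        return index


coordinator = Coordinator(clone_count=4)
first = coordinator.dispatch("customer-17", {"op": "rate-call"})
second = coordinator.dispatch("customer-17", {"op": "issue-invoice"})
coordinator.dispatch("customer-42", {"op": "rate-call"})

print(first == second)  # the same customer is always handled by the same clone
print([len(queue) for queue in coordinator.queues])
```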

7. Availability
This refers to the Solution's ability to assure non-stop operation, regardless of failures and utilization peaks. I sincerely believe, and my experience confirms, that clusters are no longer relevant for today's mission-critical Enterprise Solutions. The failover time is far too long. Actually, Real-Time Enterprises cannot tolerate failovers. Designing an Enterprise Solution for streamlined operations with no failover is not a trivial mission, and vendors should thoroughly explain how they cover this.

8. Data Consistency
Most of today's and tomorrow's Solutions exchange asynchronous messages to get and set information (internally and externally). Guaranteeing that every message reaches its destination, is processed once and only once, and has its reply properly returned to the requester is a tedious task. Failures can happen all along the integration chain; queues and databases can get corrupted; systems can fail in the middle of processing. These are no longer the days of XA transactions with an automatic rollback - oh, no! Those days are gone for good.
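One common way to approximate "once and only once" when messages may be redelivered is an idempotent consumer that remembers which message IDs it has already handled. The sketch below assumes each message carries a unique ID and keeps its record in memory; IdempotentConsumer and apply_debit are hypothetical names, and a real Solution would need to persist the record durably and atomically with the processing itself.

```python
class IdempotentConsumer:
    """Hypothetical consumer that processes each message ID at most once."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # message_id -> result (in-memory for this sketch)

    def receive(self, message_id, payload):
        # A redelivered message gets the stored reply; the handler is not re-run.
        if message_id in self._results:
            return self._results[message_id]
        # If the handler raises, nothing is recorded, so a redelivery retries it.
        result = self._handler(payload)
        self._results[message_id] = result
        return result


balances = {"acct-1": 100}


def apply_debit(payload):
    balances[payload["account"]] -= payload["amount"]
    return balances[payload["account"]]


consumer = IdempotentConsumer(apply_debit)
debit = {"account": "acct-1", "amount": 30}
first_reply = consumer.receive("msg-001", debit)
second_reply = consumer.receive("msg-001", debit)  # simulated redelivery

print(first_reply, second_reply, balances["acct-1"])
```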

9. Monitoring (don't think of HP OpenView, please...)
Monitoring is often confused with 3rd party monitoring frameworks. That's a great mistake. 3rd party frameworks do not understand the Solution! They might recognize the underlying hardware or applistructures of the Solution, but that's basically it. The Solution must manage its own well-being counters, as defined by the business stakeholders of the application. In the design review I described at the beginning of this post, the business stakeholder defined 20 or 30 response-time thresholds for different use cases. Each of these response-time thresholds must have a counter that maintains threshold information and communicates it to the outside world. If you follow Microsoft's Dynamic Systems Initiative, you'll see that this has become a pillar of their "Design for Operations" architecture.
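A minimal sketch of such a well-being counter, assuming one counter per use case with the threshold supplied by the business stakeholder. ResponseTimeCounter, its fields and the "open-account" use case are hypothetical; how the snapshot is published to the outside world is left open.

```python
from dataclasses import dataclass


@dataclass
class ResponseTimeCounter:
    """Hypothetical per-use-case counter for a response-time threshold."""
    use_case: str
    threshold_seconds: float
    samples: int = 0
    breaches: int = 0
    worst_seconds: float = 0.0

    def record(self, elapsed_seconds: float) -> None:
        """Register one measured response time."""
        self.samples += 1
        self.worst_seconds = max(self.worst_seconds, elapsed_seconds)
        if elapsed_seconds > self.threshold_seconds:
            self.breaches += 1

    def snapshot(self) -> dict:
        """What the Solution would communicate to the outside world."""
        return {
            "use_case": self.use_case,
            "threshold_seconds": self.threshold_seconds,
            "samples": self.samples,
            "breaches": self.breaches,
            "worst_seconds": self.worst_seconds,
        }


counter = ResponseTimeCounter("open-account", threshold_seconds=2.0)
for elapsed in (0.5, 2.5, 1.0):
    counter.record(elapsed)

print(counter.snapshot())
```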

10. Operations
Given the complexity and distributed nature of today's Enterprise Solutions, it becomes highly desirable, if not a must, to have an Operator's console as part of the Solution. Through this console the Operator can control the different elements of the solution, for example by performing start/stop; examining current happenings (for instance, the current queue state, the number of processed files etc.); and investigating past states, alerts, logs, trends and so on, all from one central console.

11. Problem Resolution, Log & Audit, Debugging
If the Solution is aware of its own state (through the use of KPI and KQI counters), it becomes possible to direct the support teams toward the potential source of a production problem. A consolidated logging architecture, i.e. one in which all sub-solutions and sub-components log in a standard manner into a unified, designated location, dramatically reduces problem resolution time.
Non-intrusive switching of logging levels should be enabled across all sub-solutions and sub-components, so that debugging a distributed, asynchronous Solution becomes easier.
Combining Problem Resolution techniques with Centralized Operations capabilities is crucial for smooth operations. "Transferring knowledge" to operation teams is impractical and futile in the recursive nature of today's Enterprises; providing them with the adequate tools and UIs is simply a must.
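As a small sketch of consolidated logging with non-intrusive level switching, the example below uses Python's standard logging module: every sub-component logs through a child of one parent logger, which owns the single handler, and the level is changed at runtime without touching the components. The logger names and the set_log_level helper are hypothetical, and an in-memory buffer stands in for the unified, designated location.

```python
import io
import logging

# Stand-in for the unified, designated log location.
log_sink = io.StringIO()
handler = logging.StreamHandler(log_sink)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

# One parent logger owns the handler; sub-components log through its children.
solution_log = logging.getLogger("solution")
solution_log.addHandler(handler)
solution_log.setLevel(logging.INFO)
solution_log.propagate = False

billing_log = logging.getLogger("solution.billing")
queue_log = logging.getLogger("solution.queue")


def set_log_level(level_name: str) -> None:
    """Switch the level for all sub-components at runtime, with no restart."""
    solution_log.setLevel(level_name.upper())


billing_log.debug("rating details")   # suppressed while the level is INFO
set_log_level("debug")                # e.g. triggered from an operator console
queue_log.debug("queue depth is 42")  # now recorded

print(log_sink.getvalue(), end="")
```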

(12. And if you insist: security.)

The beauty of the non-functional architecture is that it repeats itself across all possible dimensions, from the component level to the Enterprise level. As such, it lends itself well to service orientation. Most of the non-functional architecture elements can be outsourced into Enterprise Services that provide the required functionality for all components, sub-solutions and solutions across the entire Enterprise. In the future, Intelligent Enterprise Services will be used as part of the non-functional Enterprise Architecture, so the Enterprise will become autonomic and self-managed. This will prepare the ground for the rise of the machines.

