Client-side Service Composition Using Generic Service Representative

I’m back for a second day at CASCON, attending the technical papers session focused on service oriented systems.

First of the three papers was Client-side Service Composition Using Generic Service Representative by Mehran Najafi and Kamran Sartipi of McMaster University; the approach is intended to avoid the privacy and bandwidth problems that can occur when data is passed to a server-side composition. It relies on a client-side stateless task service and a generic service representative, doing whatever processing can be done locally before resorting to a remote web service call. Using the task services approach rather than a Javascript or RIA-based approach provides more flexibility in terms of local service composition. McMaster has a big medical school, and the example that he discussed was based on clinical data, where privacy is a big concern; keeping the patient data only on the client, rather than having it flow through a server-side composition, reduces the privacy concerns as well as improving performance.
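A minimal sketch of how this might look in code, assuming a simple decision between local and remote execution; the names (ServiceRepresentative, TaskService, and so on) are my own illustration rather than anything from the paper:

```java
// Hypothetical sketch of a client-side service representative that keeps
// sensitive data local and only calls a remote web service when necessary;
// names and interfaces are illustrative, not from the paper.
import java.util.Map;

interface TaskService {
    boolean canHandleLocally(Map<String, Object> request);
    Map<String, Object> execute(Map<String, Object> request);
}

interface RemoteServiceClient {
    Map<String, Object> call(Map<String, Object> request);
}

class ServiceRepresentative {
    private final TaskService localTask;      // stateless task service running on the client
    private final RemoteServiceClient remote; // fallback to the server-side web service

    ServiceRepresentative(TaskService localTask, RemoteServiceClient remote) {
        this.localTask = localTask;
        this.remote = remote;
    }

    Map<String, Object> invoke(Map<String, Object> request) {
        // Keep patient (or other sensitive) data on the client whenever the
        // task can be completed locally; otherwise resort to the remote call.
        if (localTask.canHandleLocally(request)) {
            return localTask.execute(request);
        }
        return remote.call(request);
    }
}
```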

I’ve seen this paradigm in use in a couple of different BPM systems that provide client-side screen flow (usually in Javascript at the client or in the web tier) within a single process activity, including TIBCO AMX/BPM’s Page Flow, Salesforce’s Visual Process Manager and OutSystems. Obviously, the service composition presented in the paper today is a more flexible approach, and is truly client-side rather than on the web tier, but the ideas of local process management are already appearing in some BPM products.

There were some interesting questions about support for this approach on mobile platforms (possible as mobile OSs become more capable), and a discussion of what we give up in terms of loose coupling by binding a particular orchestration of multiple services to a single client-side activity.

CASCON Keynote: 20th Anniversary, Big Data and a Smarter Planet

With the morning workshop (and lunch) behind us, the first part of the afternoon is the opening keynote, starting with Judy Huber, who oversees the 5,000 people at the IBM Canada software labs, which includes the Centre for Advanced Studies (CAS) technology incubation lab that spawned this conference. This is the 20th year of CASCON, and some of the attendees have been here since the beginning, but there are a lot of younger faces who were barely born when CASCON started.

To recognize the achievements over the years, Joanna Ng, head of research at CAS, presented awards for the high-impact papers from the first decade of CASCON, one each for 1991 to 2000 inclusive. Many of the authors of those papers were present to receive the award. Ng also presented an award to Hausi Müller from University of Victoria for driving this review and selection process. The theme of this year’s conference is smarter technology for a smarter planet – I’ve seen that theme at all three IBM conferences that I’ve attended this year – and Ng challenged the audience to step up to making the smarter planet vision a reality. Echoing the words of Brenda Dietrich that I heard last week, she stated that it’s a great time to be in this type of research because of the exciting things that are happening, and the benefits that are accruing.

Following the awards, Rod Smith, VP of IBM emerging internet technologies and an IBM fellow, gave the keynote address. His research group, although it hasn’t been around as long as CAS, has a 15-year history of looking at emerging technology, with a current focus on “big data” analytics, mobile, and browser application environments. Since they’re not a product group, they’re able to take their ideas out to customers 12-18 months in advance of marketplace adoption to test the waters and fine-tune the products that will result from this.

They see big data analytics as a new class of application on the horizon, since they’re hearing customers ask for the ability to search, filter, remix and analyze vast quantities of data from disparate sources: something that the customers thought of as Google’s domain. Part of IBM’s BigInsights project (which I heard about a bit last week at IOD) is BigSheets, an insight engine for enabling ad hoc discovery for business users, on a web scale. It’s like a spreadsheet view on the web, which is a metaphor easily understood by most business users. They’re using the Hadoop open source project to power all of the BigInsights projects.

It wouldn’t be a technical conference in 2010 if someone didn’t mention Twitter, and this is no exception: Smith discussed using BigSheets to analyze and visualize Twitter streams related to specific products or companies. They also used IBM Content Analytics to create the analysis model, particularly to find tweets related to mobile phones with a “buy signal” in the message. They’ve also done work on a UK web archive for the British Library, automating the web page classification and making 128 TB of data available to researchers. In fact, any organization that has a lot of data, mostly unstructured, and wants to open it up for research and analysis is a target for this sort of big data solution. It stands to reason that the more often you can generate business insights from the massive quantity of data constantly being generated, the greater the business value.

Next up was Christian Couturier, co-chair of the conference and Director General of the Institute of Information Technology at Canada’s National Research Council. NRC provides some of the funding to IBM Canada CAS Research, driven by the government’s digital economy strategy, which includes not just improving business productivity but creating high-paying jobs within Canada. He mentioned that Canadian businesses lag behind other countries in adoption of certain technologies, and I’m biting my tongue so as not to repeat my questions of two years ago at IT360, where I challenged the Director General of Industry Canada on what they were doing about the excessively high price of broadband and the complete lack of net neutrality in Canada.

The program co-chairs presented the award for best paper at this show, on Testing Sequence Diagram to Colored Petri Nets Transformation, and the best student paper, on Integrating MapReduce and RDBMSs; I’ll check these out in the proceedings, along with a number of other interesting-looking papers, even if I don’t get to the presentations.

Oh yeah, and in addition to being a great, free conference, there’s birthday cake to celebrate 20 years!

CASCON Workshop: Accelerate Service Integration In Your BPM and SOA Applications

I’m attending a workshop on the first morning of CASCON, the conference on software research hosted by IBM Canada. There’s quite a bit of good work done at the IBM Toronto software lab, and this annual conference gives them a chance to engage the academic and corporate community and present this research.

The focus of this workshop is service integration, including enabling new services from existing applications and creating new services by composing from existing services. Hacking together a few services into a solution is fairly simple, but your results may not be all that predictable; industrial-strength service integration is a bit more complex, and is concerned with everything from reusability to service level agreements. As Allen Chan of IBM put it when introducing the session: “How do we enable mere mortals to create a service integration solution with predictable results and enterprise-level reliability?”

The first presentation was by Mannie Kagan, an IBMer who is working with TD Bank on their service strategy and implementation; he walked us through a real-life example of how to integrate services into a complex technology environment that includes legacy systems as well as newer technologies. Based on this, and a large number of other engagements by IBM, they are able to discern patterns in service integration that can greatly aid implementation. Patterns can appear at many levels of granularity, which they classify as primitive, subflow, flow, distributed flow, and connectivity topology. From there, they have created an ESB framework pattern toolkit: an Eclipse-based toolkit that allows for the creation of exemplars (templates) of service integration that can then be adapted for use in a specific instance.

He discussed two patterns that they’ve found particularly useful: web service notification (effectively, pub-sub over web services) and SCRUD (search, create, read, update, delete); think of these as basic building blocks for many of the types of service integrations that you might want to create. This was presented in a specific IBM technology context, as you might imagine: DataPower SOA appliances for processing XML messages and legacy message transformations, and WebSphere Service Registry and Repository (WSRR) for service governance.
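To make the SCRUD building block a bit more concrete, here is a minimal sketch of what such a generic service contract might look like; this is my own illustration, not the actual interface from the ESB framework pattern toolkit:

```java
// Illustrative SCRUD (search, create, read, update, delete) service contract;
// not the actual interface from the ESB framework pattern toolkit.
import java.util.List;

interface ScrudService<T, ID> {
    List<T> search(String query); // search for entities matching a query expression
    ID create(T entity);          // create a new entity, returning its identifier
    T read(ID id);                // read a single entity by identifier
    void update(ID id, T entity); // replace an existing entity
    void delete(ID id);           // remove an entity
}
```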

In his wrapup, he pointed out that not all patterns need to be created at the start, and that patterns can be created as required when there is evidence of reuse potential. Since patterns take more resources to create than a simple service integration, you need to be sure that there will be reuse before it is worth creating a template and adding it to the framework.

Next up was Hans-Arno Jacobsen of University of Toronto discussing their research in managing SLAs across services. He started with a business process example of loan application processing that included automated credit check services, and had an SLA in terms of parameters such as total service subprocess time, service roundtrip time, service cost and service uptime. They’re looking at how SLAs can guide the efficient execution of processes, based in large part on event processing to detect and determine the events within the process (published state transitions). He gave quite a detailed description of content-based routing and publish-subscribe models, which underlie event-driven BPM, and of their PADRES ESB stack, which hides the intricacies of the underlying network and system events from the business process execution by creating an overlay of pub-sub brokers that filters and distributes those events. In addition to the usual efficiencies created by the event pub-sub model, this allows (for example) the correlation of network slowdowns with business process delays, so that the root cause of a delay can be understood. Real-time business analytics can also be driven from the pub-sub brokers.
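As a rough illustration of content-based routing (a simplification of the idea, not the PADRES implementation), a broker can match published process events against subscriber-supplied predicates over message content:

```java
// Minimal content-based publish/subscribe sketch (not the PADRES implementation):
// subscribers register predicates over event attributes, and the broker forwards
// each published event only to the subscribers whose predicates match.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class ContentBasedBroker {
    private record Subscription(Predicate<Map<String, Object>> filter,
                                Consumer<Map<String, Object>> handler) {}

    private final List<Subscription> subscriptions = new ArrayList<>();

    public void subscribe(Predicate<Map<String, Object>> filter,
                          Consumer<Map<String, Object>> handler) {
        subscriptions.add(new Subscription(filter, handler));
    }

    public void publish(Map<String, Object> event) {
        for (Subscription s : subscriptions) {
            if (s.filter().test(event)) {
                s.handler().accept(event);
            }
        }
    }

    // Example: forward only state transitions from a credit-check step that
    // exceed an illustrative SLA threshold to a monitoring handler.
    public static void main(String[] args) {
        ContentBasedBroker broker = new ContentBasedBroker();
        broker.subscribe(
                e -> "creditCheck".equals(e.get("step")) && (long) e.get("elapsedMs") > 5000,
                e -> System.out.println("SLA warning: " + e));
        Map<String, Object> event = Map.of("step", "creditCheck", "elapsedMs", 7200L);
        broker.publish(event);
    }
}
```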

He finished by discussing how business processes can actually be guided by SLAs, that is, runtime use of SLAs rather than just monitoring processes against them. If the process can be allocated to multiple resources in a fine-grained manner, then the ESB broker can dynamically determine the assignment of process parts to resources based on how well those resources are meeting their SLAs, or on expected performance based on other factors such as location of data or minimization of traffic. He gave an example of optimization based on minimizing traffic by measuring message hops, which takes into account both the rate of message hops and the distance between execution engines. This requires that the distributed execution engines include engine profiling capabilities that allow an engine to determine not only its own load and capacity, but that of other engines with which it communicates, in order to minimize cost over the entire distributed process. To fine-tune this sort of model, process steps that have a high probability of occurring in sequence can be dynamically bound to the same execution engine. In this situation, they’ve seen a 47% reduction in traffic, and a 50% reduction in cost relative to the static deployment model.
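A toy sketch of the idea of SLA-guided assignment, assuming each engine can report its current load and an estimated messaging cost; the weighting is purely illustrative and not the algorithm from the talk:

```java
// Toy sketch of SLA-guided assignment of a process step to an execution engine:
// pick the engine with the lowest combined load and estimated messaging cost.
// The weighting is my own simplification, not the algorithm from the talk.
import java.util.Comparator;
import java.util.List;

record Engine(String id, double currentLoad, double hopCostToDataSource) {}

class SlaAwareDispatcher {
    private static final double LOAD_WEIGHT = 0.5;    // illustrative weights; a real
    private static final double TRAFFIC_WEIGHT = 0.5; // deployment would measure these

    Engine selectEngine(List<Engine> engines) {
        return engines.stream()
                .min(Comparator.comparingDouble((Engine e) ->
                        LOAD_WEIGHT * e.currentLoad() + TRAFFIC_WEIGHT * e.hopCostToDataSource()))
                .orElseThrow(() -> new IllegalStateException("no execution engines available"));
    }
}
```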

After a brief break, Ignacio Silva-Lepe from IBM Research presented on federated SOA. SOA today is mostly used in a single domain within an organization, that is, it is fairly siloed in spite of the potential for services to be reused across domains. Whereas a single domain will typically have its own registry and repository, a federated SOA can’t assume that is the case, and must be able to discover and invoke services across multiple registries. This requires a federation manager to establish bridges across domains in order to make the service group shareable, and inject any cross-domain proxies required to invoke services across domains.

It’s not always appropriate to have a designated centralized federation manager, so there is also the need for domain autonomy, where each domain can decide what services to share and specify the services that it wants to reuse. The resulting cross-domain service management approach allows for this domain autonomy, while preserving location transparency, dynamic selection and other properties expected from federated SOA. In order to enable domain autonomy, the domain registry must not only have normal service registry functionality, but also references to required services that may be in other domains (possibly in multiple locations). The registries then need to be able to do a bilateral dissemination and matching of interest and availability information: it’s like internet dating for services.
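Conceptually, that bilateral matching of interest and availability across domain registries might look something like the sketch below; the names and data structures are my own simplification, not the actual research prototype:

```java
// Purely illustrative sketch of bilateral matching between domain registries:
// each domain advertises the services it offers and the services it requires,
// and a binding is made when one domain's requirement names another's offering.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DomainRegistry {
    final String domain;
    final Set<String> offeredServices = new HashSet<>();
    final Set<String> requiredServices = new HashSet<>();

    DomainRegistry(String domain) {
        this.domain = domain;
    }
}

class FederationMatcher {
    /** For each cross-domain requirement, find a domain that can provide the service. */
    Map<String, String> match(Set<DomainRegistry> registries) {
        Map<String, String> providerByService = new HashMap<>();
        for (DomainRegistry r : registries) {
            for (String offered : r.offeredServices) {
                providerByService.putIfAbsent(offered, r.domain);
            }
        }
        Map<String, String> bindings = new HashMap<>();
        for (DomainRegistry r : registries) {
            for (String needed : r.requiredServices) {
                String provider = providerByService.get(needed);
                if (provider != null && !provider.equals(r.domain)) {
                    bindings.put(r.domain + " needs " + needed, provider);
                }
            }
        }
        return bindings;
    }
}
```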

They have quite a bit of work planned for the future, beyond the fairly simple matching of interest to availability: allowing domains to restrict visibility of service specifications to authorized parties without using a centralized authority, for example.

Marsha Chechik, also from University of Toronto, gave a presentation on automated integration determination; like Jacobsen, she collaborates with IBM Research on middleware and SOA; unlike Jacobsen, however, she is presenting research at a much earlier stage. She started with a general description of integration, where a producer and a consumer share some interface characteristics. She went on to discuss interface characteristics (what already exists) and service exposition characteristics (what we want): the as-is and to-be states of service interfaces. For example, there may be a requirement for idempotence, where multiple “submit” events over an unreliable communications medium would yield only a single result. In order to resolve the differences in characteristics between the as-is and to-be, we can consider typical service interface patterns, such as data aggregation, mapping or choreography, to describe the resolution of any conflicts. The problem, however, is that there are too many patterns, too many choices and too many dependencies; the goal of their research is to identify essential integration characteristics and make a language out of them, identify a methodology for describing aspects of integration, identify the order in which patterns can be determined, identify decision trees for integration pattern determination, and determine cases where integration is impossible.
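As a small aside on the idempotence example, a common way to get that behaviour is to key each submit on a request identifier so that retries return the original result; this sketch is my own illustration, not something from the presentation:

```java
// Sketch of an idempotent "submit" operation: retried requests carrying the same
// request ID return the original result instead of producing a duplicate.
// Illustrative only; not taken from the presentation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentSubmitService {
    private final Map<String, String> resultsByRequestId = new ConcurrentHashMap<>();

    String submit(String requestId, String payload) {
        // computeIfAbsent runs the processing at most once per request ID,
        // even if the client retries over an unreliable connection.
        return resultsByRequestId.computeIfAbsent(requestId, id -> process(payload));
    }

    private String process(String payload) {
        return "confirmation-for-" + payload; // stand-in for the real business logic
    }
}
```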

Their first insight was to separate pattern-related concerns between physical and logical characteristics; every service has elements of both. They have a series of questions that begin to form a language for describing the service characteristics, and a classification for the results from those questions. The methodology contains a number of steps:

  1. Determine principal data flow
  2. Determine data integrity data flow, e.g., stateful versus stateless
  3. Determine reliability flow, e.g., mean time between failure
  4. Determine efficiency, e.g., response time
  5. Determine maintainability

Each of these steps determines characteristics and mapping to integration patterns; once a step is completed and decisions made, revisiting it should be minimized while performing later steps.

It’s not always possible to provide a specific characteristic for any particular service; their research is working on generating decision trees for determining whether a service requirement can be fulfilled. This results in a pattern decision tree based on types of interactions, which provides a logical view but no information on how to actually implement the patterns. From there, however, patterns can be mapped to implementation alternatives. They are starting to see the potential for automated determination of integration patterns based on the initial language-constrained questions, but aren’t seeing any hard results yet. It will be interesting to see this research a year from now to see how it progresses, especially if they’re able to bring in some targeted domain knowledge.

Last up in the workshop was Vadim Berestetsky of IBM’s ESB tools development group, presenting on support for patterns in IBM integration offerings. He started with a very brief description of an ESB, and WebSphere Message Broker as an example of an ESB that routes messages from anywhere to anywhere, doing transformations and mapping along the way. He basically walked through the usage of the product for creating and using patterns, and gave a demo (where I could see vestiges of the MQ naming conventions). A pattern specification typically includes some descriptive text and solution diagrams, and provides the ability to create a new instance from this pattern. The result is a service integration/orchestration map with many of the properties already filled in; obviously, if this is close to what you need, it can save you a lot of time, like any other template approach.

In addition to demonstrating pattern usage (instantiation), he also showed pattern creation by specifying the exposed properties, artifacts, points of variability, and (developer) user interface. Looks good, but nothing earth-shattering relative to other service and message broker application development environments.

There was an interesting question that goes to the heart of SOA application development: is there any control over what patterns are created and published to ensure that they are useful as well as unique? The answer, not surprisingly, is no: that sort of governance isn’t enforced in the tool since architects and developers who guide the purchase of this tool don’t want that sort of control over what they do. However, IBM may see very similar patterns being created by multiple customer organizations, and choose to include a general version of that pattern in the product in future. A discussion about using social collaboration to create and approve patterns followed, with Berestetsky hinting that something like that might be in the works.

That’s it for the workshop; we’re off to lunch. Overall, a great review of the research being done in the area of service integration.

This afternoon, there’s the keynote and a panel that I’ll be attending. Tomorrow, I’ll likely pop in for a couple of the technical papers and to view the technology showcase exhibits, then I’m back Wednesday morning for the workshop on practical ontologies, and the women in technology lunch panel. Did I mention that this is a great conference? And it’s free?