February 2007
February 16, 2007
Can, or should SOA be implemented without web services?
Ram asks if SOA can be implemented without web services, which seems to be a valid question once we’ve understood that web services do not equal SOA. He focuses this question on the issues of homogeneity and interoperability. Given that architecture, the ‘A’ of SOA, is technology independent, and that Ram’s post even gives examples where an SOA is implemented on a single technology, maybe the question should be: should SOA be implemented without web services? Or, in other words, from a project/risk standpoint, does it make sense not to use web services?
[Originally published here]
The bottom line of Ram’s post is that “web services are essential to implement SOA if the environment changes to a heterogeneous one…”. I guess the question is: what actually constitutes a web service? Is SOAP enough? If so, are we talking about SOAP 1.0, 1.1, or 1.2? If not, do we also need more of the WS specs? If so, which ones? WS-ReliableMessaging? WS-AtomicTransaction? WS-Addressing? WS-Topics? That’s what’s so great about standards – there are so many to choose from. Of course, different vendors support different subsets of these standards, so interoperability is still something of a crapshoot.
I actually want to take this discussion in a slightly different direction. At the message-payload level, interoperability is handled well enough by XML and XSD, although RELAX NG looks so much more elegant for schemas. The question now becomes one of communication – what does a message look like, and how are the various communication patterns represented? At the simplest level, we have HTTP, which is interoperable across (almost?) all platforms but also quite lacking in higher-level features. Going up, we see things like JMS, queuing, and other middleware. We actually have a reasonably good ability to stitch these together across platforms. In my opinion, this would be a good place to start for enterprise and other reasonably complex systems. Going even higher, we get into things like spaces, which support architectural styles other than SOA, have a richer feature set, and are often much more expensive.
One thing that I think bridges many of the concerns here is to create an abstraction layer between your service logic and the communications infrastructure. This abstraction layer would be implemented in the same technology as the service. When using platforms that support things like interfaces (.NET, Java, C++ pure virtual classes), this often yields two actual layers – one for the interface, and another for the mapping to the specific technology. Add to that the use of dependency injection and you can evolve your communications along with the specs and platforms. You should expect to have such an abstraction layer for each of the technologies found among your services.
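To make the two layers concrete, here is a minimal sketch in Python. All of the names (`Bus`, `InMemoryBus`, `OrderService`, the "orders" destination) are hypothetical and made up for illustration; the in-memory class merely stands in for a real mapping onto JMS, MSMQ, HTTP, or whatever the project chooses.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Bus(ABC):
    """Layer 1: the technology-neutral interface the service logic depends on."""

    @abstractmethod
    def send(self, destination: str, message: dict) -> None: ...

    @abstractmethod
    def on_message(self, destination: str, handler: Handler) -> None: ...


class InMemoryBus(Bus):
    """Layer 2: one mapping of the interface onto a concrete technology.

    This stand-in delivers in-process; a real mapping would wrap a
    messaging product's client library instead.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def send(self, destination: str, message: dict) -> None:
        for handler in self._handlers.get(destination, []):
            handler(message)

    def on_message(self, destination: str, handler: Handler) -> None:
        self._handlers.setdefault(destination, []).append(handler)


class OrderService:
    """Service logic: the bus is injected, so no transport is ever imported here."""

    def __init__(self, bus: Bus) -> None:
        self._bus = bus

    def place_order(self, order_id: str) -> None:
        self._bus.send("orders", {"type": "OrderPlaced", "order_id": order_id})
```

Swapping the transport then means writing another `Bus` implementation and injecting it; `OrderService` itself stays untouched.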
Once again, from a risk/project perspective this often yields the best of both worlds. We can start with a single platform to keep things simple, say a JMS provider, but by remaining decoupled from it we are able to evolve with the technology while maintaining most of the investment made in our services. I think that is one aspect of the agility expected from SOA.
On the projects I consult on, web services are really just an implementation detail and not a primary architectural concern. The communications interface is well defined logically, and the mapping to the chosen technology is sometimes simple and sometimes complex – say, if the technology doesn’t support pub/sub and we have to implement it ourselves in the mapping layer. Whether we use WS-Addressing or WS-Topics really isn’t relevant. Other specs, like WS-AtomicTransaction, are something that we’ll likely never use, since flowing transactions between services ruins their autonomy.
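As an illustration of the "complex" case, here is a rough sketch of pub/sub emulated in the mapping layer when the transport only offers point-to-point sends. The `PubSubOverQueues` class and the `send(queue_name, message)` shape are assumptions for the sake of the example, not any product's API, and a real implementation would also need to persist subscriptions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Set

# Assumed shape of the only primitive the transport gives us:
# send(queue_name, message) delivers to exactly one queue.
SendFn = Callable[[str, dict], None]


class PubSubOverQueues:
    """Emulates topics by fanning a message out to each subscriber's own queue."""

    def __init__(self, send: SendFn) -> None:
        self._send = send
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)

    def subscribe(self, topic: str, subscriber_queue: str) -> None:
        self._subscribers[topic].add(subscriber_queue)

    def unsubscribe(self, topic: str, subscriber_queue: str) -> None:
        self._subscribers[topic].discard(subscriber_queue)

    def publish(self, topic: str, message: dict) -> int:
        """Send one copy per subscriber and return the number of copies sent."""
        targets = sorted(self._subscribers[topic])  # sorted only for determinism
        for queue_name in targets:
            self._send(queue_name, message)
        return len(targets)
```

The service logic still just calls `publish`; it never learns that the topic is really a set of queues underneath.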
Anyway, my bottom line here is that SOA can be implemented without web services in both homogeneous and heterogeneous environments, and each project needs to analyze for itself whether or not it should. However, this technology question should in no way impact any of the architectural decisions made – it’s a communications issue. Also, be aware that not all inter-service interactions should be the same: some need reliability, others need to move tons of messages, and others need to interoperate well with external partners. Plan and schedule for these special cases; they take time to get right, and no WS spec will magically make everything alright.
Posted by Udi Dahan at 04:41 AM Permalink
|
February 12, 2007
So, how many machines/CPUs do we need?
Regu posted an interesting question recently: "Is scalability a factor of the number of machines/CPUs?". His answer can ultimately be summed up as "yes, but..." -- it was qualified in terms of threads: "... scalability in a well designed system is a factor of number of threads that can be efficiently executed in parallel". The word "efficiently" means that the threads are actually doing work and not just waiting. However, the question of how many machines we need is a hard one. Nick calls out a very important point on this: “An asymmetric farm, with machines of varying capabilities, is really hard to tune.” In all cases, we find that load-leveling mechanisms like queues are good for scalability.
[Originally published here]
Just as a slight sidebar for anybody who deals with systems where work needs to be divided up and run in parallel to meet latency requirements: we have to deal with all the above problems and more. For instance, say we have to process images and must finish processing each image within one minute. We have an algorithm that can process part of an image at a speed of 1MB per second, single-threaded, on a dedicated machine with a standard 3GHz processor. So, how can we process a 1GB image in 60 seconds? Simple, get 17 processors, right? Well, if you were running a 16 or 32 way SMP machine then probably yes. But what if you want to scale out, say, because you’re receiving one image every 2 seconds on average? Well, once we scale out, time is impacted quite significantly by the cost of just moving data between servers – assuming that cost is zero is one of the fallacies of distributed computing. It becomes a much more difficult problem – the kind that I just love sinking my teeth into :)
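For what it’s worth, the back-of-the-envelope arithmetic looks something like the sketch below. It assumes 1GB means 1000MB, perfectly parallelizable work, and zero cost for moving data, which is exactly the assumption that breaks once we scale out. The function names are mine, not from any library.

```python
import math


def workers_for_latency(image_mb: float, rate_mb_per_s: float, deadline_s: float) -> int:
    """Workers needed to finish ONE image within the deadline (no transfer cost)."""
    return math.ceil(image_mb / (rate_mb_per_s * deadline_s))


def workers_for_arrival_rate(image_mb: float, rate_mb_per_s: float,
                             seconds_between_images: float) -> int:
    """Workers needed just to keep up with the incoming stream of images."""
    return math.ceil(image_mb / (rate_mb_per_s * seconds_between_images))


# One 1GB image in 60 seconds at 1MB/s per worker:
print(workers_for_latency(1000, 1, 60))        # 17
# One such image arriving every 2 seconds on average:
print(workers_for_arrival_rate(1000, 1, 2))    # 500
```

With 1GB taken as 1024MB the first number becomes 18, and the second number shows why a single SMP box stops being an option: keeping up with one image every 2 seconds takes on the order of 500 such workers before a single byte has been shipped between machines.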
Anyway, a lot of us aren’t dealing in these massively parallel problem spaces but are just looking for good scalability advice. Well, one of the characteristics of a scalable system is that load is evenly distributed between machines (up to a point – if we have more machines than work that needs to be done, some will be idle). Load can be broken up in terms of resource usage – CPU, memory, disk, network, etc. – and we should be looking at all of these parameters. I’ve noticed a tendency for people to focus only on CPU usage. One case I consulted on was a system that was having performance problems although average CPU utilization was around 50%. They did a costly hardware upgrade at the time, from single-CPU machines to all double-CPU, hoping to drive down the utilization and improve performance. They only succeeded halfway – CPU utilization did drop, but performance (in terms of response time and throughput) didn’t improve – quite simply because the network was the bottleneck, and not processor power. As Dan so eloquently states: “Latency exists, Cope!”
If you use the Pipeline architectural pattern (page 5), which is so well known in the embedded/real-time space, at the macro level (inside the service, not between services – that’s SOA), and SEDA (Staged Event-Driven Architecture) at the micro level, you can create an environment where you know, with a high degree of accuracy, the amount of resources you need to buy/provision for the expected load. An additional, maybe even more important, benefit has to do with the resiliency of such a system. If there is a degradation in resource performance or availability, the system won’t come crashing down but will rather “limp along”. Likewise, if load continues to increase beyond expected maxima, the performance (in terms of throughput) of such a system would not degrade. By monitoring response time per request, you could notice the upward trend and provision more resources. If you were working with a grid-like infrastructure, you could set these rules up so that they would be executed automatically. These are the building blocks of “self healing” systems – one of my current favorite areas of interest.
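To give a feel for the micro level, here is a deliberately simplified sketch of a SEDA-style stage: a bounded inbox plus its own fixed pool of worker threads, optionally feeding a downstream stage. The `Stage` class is hypothetical; as I understand it, SEDA proper also has controllers that adjust each stage’s resources at runtime, which this sketch leaves out.

```python
import queue
import threading
from typing import Any, Callable, Optional

_STOP = object()  # sentinel telling a worker thread to exit


class Stage:
    """One stage: bounded inbox + dedicated workers + optional downstream stage."""

    def __init__(self, handler: Callable[[Any], Any], workers: int, capacity: int,
                 downstream: Optional["Stage"] = None) -> None:
        self.inbox: queue.Queue = queue.Queue(maxsize=capacity)
        self._handler = handler
        self._downstream = downstream
        self._threads = [threading.Thread(target=self._run) for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, event: Any) -> None:
        # Blocks while the inbox is full, so overload pushes back on the
        # producer instead of piling up unbounded work.
        self.inbox.put(event)

    def _run(self) -> None:
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self._handler(event)
            if self._downstream is not None:
                self._downstream.submit(result)

    def shutdown(self) -> None:
        # One sentinel per worker, queued behind any pending events.
        for _ in self._threads:
            self.inbox.put(_STOP)
        for t in self._threads:
            t.join()
```

Because every stage has a known capacity and a known number of workers, the queue depth per stage is something you can measure and provision against, and a slow stage shows up as a full inbox rather than as a crash.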
Bottom line, I’ve found the layered-architecture/tiered-distribution pair to be rather limited in terms of scalability (in terms of load). I would say that the solution isn’t necessarily to move to a Space-Based Architecture, as Guy mentions in this post, although many of the event-based concepts are definitely broadly applicable. Werner Vogels (Amazon’s CTO) mentions the CAP (consistency, availability, partition tolerance – choose 2) model for distributed systems in this podcast, which I think is critical in analyzing the different parts of a complex system. On the flip side, Patrick does an excellent job of warning about the dangers of other appealing, siren-esque paths – follow them at your peril.
I’m afraid that there aren’t any easy answers, but at least we have some models that have proven themselves viable in the most strenuous scenarios. These models sometimes contradict popular architectural styles and it’s good to be aware of that. At the end of the day, it is our job to make the difficult technical tradeoffs.
Posted by Udi Dahan at 05:13 PM Permalink
|
February 10, 2007
Problems with SOA Vocabulary
After I finished reading Arnon’s SOA definitions post I felt a distinct distaste for many of the common terms in the industry’s SOA vocabulary. Let’s say that communication between my services occurs over a pub/sub channel – a topic. One service publishes messages on that channel, another receives them via its subscription. Let’s go over the SOA terms in this context.
[Originally published here]
Contract: Who owns the message type being published? The publisher or the subscriber? Common SOA knowledge would say that the message belongs to the contract of the service that receives it. However, would that receiver “control” that message type? Would it be in charge of versioning it? I would put my money on the publisher of that message type. I think that the concept of contract is more far-reaching than just which messages a service receives. Rather, contract seems to be tied quite closely to the business-level responsibilities of the service. This brings me to my next point:
Endpoint: From Arnon’s post, “… a specific place where the service can be found and consumed. A specific contract can be exposed at a specific endpoint.” A service could probably have more than one endpoint, and it would make sense that not all of the contract would necessarily be exposed at each one. But what about the topic described above? Is it an endpoint? If so, does it belong to the receiving service(s) or the publisher? It doesn’t make sense to have an endpoint that is shared between services, does it? Or maybe that’s how we define service consumers?
Service Consumer: “A service doesn’t mean much if there isn’t someone/something in the world that uses it.” Is the publishing service “using” the subscriber when it publishes a message? I don’t think so, and the subscriber definitely isn’t using the publisher at that point either. So, we’ve got some inter-service message-based communication going on and it isn’t clear if we even have a service consumer. In fact, if all a service ever did was subscribe to some topics and publish messages on other topics, it looks like we’d have very loose coupling but be straying from the common SOA wisdom.
And I guess that that’s my bottom line. Patterns for building loosely-coupled, large-scale systems existed prior to the SOA tagline, and SOA (or EDA, or whatever TLA you want) has come to stand for those very patterns. However, somewhere along the line vendors appear to have gotten hold of the discussion around SOA and have apparently polluted it with terms that only cloud the original message. I know that Arnon agrees with me on many of these points (seeing as we’ve done some projects together according to these exact principles) so I don’t want this to come off the wrong way. But we, as an industry, really have to get back to our roots here, because I see this new vocabulary steering us in all sorts of sub-optimal directions.
Posted by Udi Dahan at 04:20 PM Permalink
|
February 05, 2007
Queues, Scalability, & Availability
Dr. Nick has a great post up on scalability, queues, and WCF. For some reason, everybody’s always talking about scalability, but availability gets much less play. For many systems, availability is actually the more important *ility.
[Originally published here.]
Anyway, when it comes to scaling out, queues help a lot. Although not explicitly mentioned in the above post, having multiple machines feeding off of the same queue is the key to scalability and is known as the Competing Consumers pattern. The added benefit of such a design is that you get availability without any additional work, given that you have more than one consuming machine per queue.
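Here is a small in-process sketch of that pattern, with threads standing in for the consuming machines and Python’s `queue.Queue` standing in for the shared queue. The function name is made up, and a real consumer would block waiting for more work rather than exit when the queue is empty.

```python
import queue
import threading
from typing import Dict, Iterable, List


def run_competing_consumers(messages: Iterable, consumer_count: int) -> Dict[int, List]:
    """Drain one shared queue with several consumers; return {consumer_id: handled}."""
    shared: queue.Queue = queue.Queue()
    for message in messages:
        shared.put(message)

    handled: Dict[int, List] = {i: [] for i in range(consumer_count)}

    def consume(consumer_id: int) -> None:
        while True:
            try:
                message = shared.get_nowait()  # each message goes to exactly one consumer
            except queue.Empty:
                return  # drained; a long-running consumer would keep waiting instead
            handled[consumer_id].append(message)  # each list has a single writer

    threads = [threading.Thread(target=consume, args=(i,)) for i in range(consumer_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

How the messages split between consumers varies from run to run, which is the point: whoever is free takes the next one, and if one consumer disappears the others simply pick up the slack.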
One thing to keep in mind about the Microsoft platform today is that MSMQ does not currently support remote, transactional receives. What this means is that, in the above design, if one of the servers fails while processing a message from the queue, you cannot guarantee that the message will return to the queue. For some kinds of messages this isn’t a big deal (like stock prices), but in other cases (like money transfers) this isn’t acceptable.
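To show what is being given up, here is a toy, in-memory simulation of a transactional receive: the message only leaves the queue for good if processing completes. This is not the MSMQ API in any way; `TransactionalQueue` is a made-up class, and a Python exception is standing in for a server failure (with a real transport, the aborted or timed-out transaction is what puts the message back).

```python
from collections import deque
from contextlib import contextmanager
from typing import Iterable, Iterator


class TransactionalQueue:
    """Toy queue whose receive either commits or puts the message back."""

    def __init__(self, messages: Iterable = ()) -> None:
        self._messages = deque(messages)

    def __len__(self) -> int:
        return len(self._messages)

    @contextmanager
    def receive(self) -> Iterator:
        message = self._messages.popleft()
        try:
            yield message  # the caller processes the message inside the with-block
        except Exception:
            # Roll back: return the message to the front of the queue, then re-raise.
            self._messages.appendleft(message)
            raise
        # Leaving the with-block normally is the "commit": the message stays removed.
```

Used as `with q.receive() as msg: ...`, a failure inside the block leaves the money transfer on the queue for another consumer; without transactional receives, that same failure silently loses it.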
So, the bottom line is that queues and other asynchronous transports (JMSs and topics) enable robust systems to be built using proven patterns, but be aware of any limitations of the technology and what ramifications they may have.
Posted by Udi Dahan at 12:02 PM Permalink