June 23, 2007
BI and SOA: A Question
A few months ago I wrote here about solving the mismatch between Service Oriented Architecture (SOA) and Business Intelligence (BI).
I recently got this question from Ben:
One major question I have is around large data sets. As an experienced BI/DW architect and developer I have worked on a number of large scale data warehouses. Retrieving large data sets (i.e. millions of records) doesn't seem to fit well into SOA. As you state in your article, we could have another point-to-point interface, where the service which houses data we need gets a request and writes out a batch file (xml or plain ascii text). Then using typical ETL, we grab the file and load it. The underlying source system (service) can use optimization in generating a large data set (vs. record by record) and the data warehouse can correspondingly load in bulk.
Like most architectural questions, the answer is "it depends". For instance, if you run a run-of-the-mill ETL as a one-time setup then it is just that -- a one-time setup, and I don't see any contradiction between that and SOA goals or tenets.
I do think that it is better to enhance SOA with Event-Driven Architecture (EDA) interactions to provide a long-term solution to the BI problem. You can also have a dedicated component that aggregates the information flowing in through these events and builds batch files suited to the ETL you used during the setup phase (mentioned above).
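To make the "dedicated component" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my choosing: the class name EventBatchAggregator, the on_event hook, the CSV format and the batch_NNNNN.csv file naming are all hypothetical, and the way events actually reach the component (message bus, queue, feed) is left out entirely.

```python
import csv
import os
from typing import Dict, List


class EventBatchAggregator:
    """Buffers incoming business events and writes them out as CSV batch files.

    Hypothetical component: the names and the file layout are assumptions
    made for this sketch, not part of any particular product.
    """

    def __init__(self, out_dir: str, fields: List[str], batch_size: int = 1000):
        self.out_dir = out_dir
        self.fields = fields            # columns the ETL job expects
        self.batch_size = batch_size    # events per batch file
        self._buffer: List[Dict[str, str]] = []
        self._batch_no = 0
        self.written_files: List[str] = []

    def on_event(self, event: Dict[str, str]) -> None:
        """Called once per published event (assumed subscription hook)."""
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write buffered events to a new batch file; does nothing if empty."""
        if not self._buffer:
            return
        self._batch_no += 1
        path = os.path.join(self.out_dir, f"batch_{self._batch_no:05d}.csv")
        with open(path, "w", newline="") as f:
            # Keys not listed in `fields` are dropped so the file layout stays stable.
            writer = csv.DictWriter(f, fieldnames=self.fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(self._buffer)
        self.written_files.append(path)
        self._buffer.clear()
```

The services keep publishing events as usual, while the existing ETL job only sees the bulk files this component drops into out_dir. A real version would also need to flush on a timer and cope with duplicate or out-of-order events.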
It is true though that moving an already-in-place SOA to EDA is not a small feat, but adding EDA layers does not have to mean that the old interfaces go away -- especially not immediately (remember to treat services as products).
If you have a business that generates millions of records on a daily basis, then the situation is more complicated. Now you have to weigh "compromising" SOA by adding a dedicated interface (or a backdoor to the database) for the ETL against the performance, bandwidth, transition costs, ROI, etc. of pushing that information with EDA. I believe in pragmatism and the "no-silver-bullet" approach, so I can't say that EDA is always the best solution. (As an aside, this is part of the reason my book refers to "patterns," not "best-practices guidance.") You may find that ETL is the best trade-off in your situation. Yes, I know that this isn't a definitive answer, but real life is (usually) a little more complicated than black-and-white solutions. As architects we need to find the best trade-off for the situation at hand.
By the way, if you have a question regarding anything I write here or anything else related to software architecture and you want to hear what I think about it, feel free to send it to ask@rgoarchitects.com.
Posted by Arnon Rotem-Gal-Oz at 04:13 AM