Site Archive (Complete)
DATABASE
EXCEPTION::QUERY

A Blog About Database Products and Technology.

by Kevin Carlson

October 2006


October 24, 2006

Stylus Studio 2007


Mario Morejon reviews the latest iteration of a favored XML toolkit, Stylus Studio 2007. New in this edition is the so-called XML Pipeline system, which provides graphical tools for describing and automating a complex series of combinations and transformations from XML and non-XML data sources into various target document types. In case you're new to the pleasure of banging your head against the desktop, trying to figure out how to code some obscure XSLT transformation, you'll also appreciate the active and informed Stylus online community.

Posted by John Jainschigg at 10:51 AM


October 18, 2006

Black Duck's exportIP


Black Duck does open source software provenance analysis. The company maintains a huge database covering the contents of multiple open source, public, and private repositories, including full licensing information. Its software (called protexIP in application form) and service (called protexIP OnDemand in online form) analyze your source code, determine where every line of it came from, identify and summarize the licensing issues affecting every scrap, and flag areas of potential exposure. The tools can be used to scan masses of source (as when doing due diligence prior to an acquisition), or be deployed downstream from checkin, identifying "cut and paste" issues as they appear, without permitting unchecked code to become part of a current build or (Heaven forfend) release. Last week, Black Duck introduced exportIP -- a further spin on the same great idea -- which analyzes source for compliance with export restrictions on strong encryption.

exportIP scans your code and comes back instantly with a list of areas affected by crypto compliance regs (and which regulations are applicable). It then streamlines the process of filling out government notification documents, and provides the necessary audit trail to substantiate claims of due diligence. The system understands a wide range of programming and scripting languages. The underlying database incorporates hundreds of open source and private crypto libraries, and the system can also identify crypto-aware components that exploit resources external to the application. It can also, says Black Duck, heuristically analyze code to find "hidden" cryptographic functionality. You basically dial in your export intentions, and it gives back all the reports relevant to that set of conditions.

Posted by John Jainschigg at 08:14 AM


October 11, 2006

Google Code Search - RegExp on Google


I'm fast becoming addicted to Google Code Search, even though there's no site-specific meta (e.g., the equivalent of site: in a standard Google search phrase), and I'm going to have to go through deep circles of the Inferno to get the massive ddj.com code archives indexed. At least part of the appeal, here, is that GCS uses regular expressions -- letting you drill, drill, drill down to the precise snippet of code you want to review. This morning, I was using it to track down C++ implementations of the Mersenne Twister RNG (because I need to translate it into a very weird and obscure scripting language used by the front-end for a popular MMORPG, so my consortium can play Texas Hold 'Em in virtual reality ... everybody has a hobby, right?).

If your grasp of RegExp is limited, or if you just want to stop pounding your head and start producing regular expressions that work the first time, then you need to run ... do not walk ... to this site and buy (for very little money, which you will never begrudge) a copy of RegexBuddy. (I'm convinced that somewhere, there's a happy land filled with programmers who have no trouble remembering and flawlessly applying core and extended RegExp syntax and semantics, and who can also effortlessly trans-escape and intermogrify their 'standard' expressions to work in environments like PHP and JavaScript ... I just don't know any of those people, personally.) RegexBuddy is a very beautiful piece of software that parses regular expressions into a nice, clear tree of English-language statements, showing you exactly what your '\/[0-9]*{4,}blah/gi' does; lets you run it against sample data; and lets you cut and paste working RegExp flawlessly, to and from the escaped and intermogrified formats required by many popular language environments.
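For a rough, tool-free taste of the "tree of English-language statements" idea, here is a minimal Python sketch that annotates a pattern piece by piece using the standard library's verbose mode. The pattern is a hypothetical simplification of the one quoted above (the stacked `*` is dropped and the `/gi` flags become Python flags), and the function name find_matches is made up for illustration.

```python
import re

# Hypothetical tidy-up of the quoted pattern, spelled out one piece per line.
# re.VERBOSE ignores the whitespace and lets each piece carry a comment.
PATTERN = re.compile(
    r"""
    /          # a literal forward slash (no escaping needed in Python)
    [0-9]{4,}  # four or more decimal digits
    blah       # the literal text "blah"
    """,
    re.IGNORECASE | re.VERBOSE,  # IGNORECASE plays the role of the /i flag
)


def find_matches(text):
    """Return every non-overlapping match, like the /g flag would."""
    return PATTERN.findall(text)
```

Running it against sample data is then a one-liner, e.g. find_matches("see /12345BLAH and /12blah"), which is the same "try it before you paste it" habit the tool encourages.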

Posted by John Jainschigg at 12:41 PM


October 04, 2006

Great Book on High-Dimensional Indexing


Anyone dealing professionally with problems of abstracting and working with large masses of complex data -- from stock- and market analysis to the esoterica of temperature-analog "annealing" in back-prop neural networks -- will be familiar with the practice of converting data features to high-dimensional points. In some cases, the goal is visualization in a "terrain map" or similar format. But while visualization can certainly be useful, it's just one way we can imagine wanting to extract information and insight from such transformed data.

But it's hard to do similarity searches and other forms of knowledge discovery with high-dimensional datasets. In many applications, you swiftly encounter a combinatorial explosion (trees! everywhere, trees!) that disqualifies B+-tree "low-dimensional" indexing strategies, and even makes R-tree and similar multi-dimensional tools hard to scale.

No problemo (mei guanxi), says professor Cui Yu of the Dept. of Computer Science, Monmouth University. In her 2002 monograph, High-Dimensional Data Indexing (Springer, preface online here as PDF), Prof. Yu proposes a set of improved transformative indexing, space-pruning, iteration-constraining and similarity-"distance"-judging strategies and algorithms that can make searches of even very dense HD datasets readily doable using a B+-tree indexing architecture. Her iMinMax(theta) strategy maps each point in a high-dimensional space to a single one-dimensional value, chosen by reference to the point's minimum and maximum values across all of its dimensions. The process is "tunable" for different data distributions by modifying the 'theta' value. The resulting one-dimensional values can be efficiently indexed with a B+-tree.
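To make the mapping concrete, here is a minimal Python sketch of an iMinMax(theta)-style key function. It follows the formulation as I recall it from the published description, so treat the details as assumptions rather than as Prof. Yu's exact method: points are normalized to the unit hypercube, the key is the index of the chosen dimension times a range constant c plus the value on that dimension, and theta shifts the choice between the "min edge" and the "max edge". The names iminmax_key, build_index and range_scan are hypothetical, and a sorted list stands in for the B+-tree.

```python
import bisect


def iminmax_key(point, theta=0.0, c=1.0):
    """Map a point in [0, 1]^d to one float (assumed iMinMax-style rule)."""
    dims = range(len(point))
    d_min = min(dims, key=lambda i: point[i])  # dimension holding the minimum
    d_max = max(dims, key=lambda i: point[i])  # dimension holding the maximum
    x_min, x_max = point[d_min], point[d_max]
    # theta biases the choice: strongly negative favors the min edge,
    # strongly positive favors the max edge.
    if x_min + theta < 1.0 - x_max:
        return d_min * c + x_min
    return d_max * c + x_max


def build_index(points, theta=0.0):
    """Sorted (key, point) pairs; a stand-in for a B+-tree on the keys."""
    return sorted((iminmax_key(p, theta), p) for p in points)


def range_scan(index, low, high):
    """Return the points whose one-dimensional key lies in [low, high]."""
    keys = [k for k, _ in index]
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [p for _, p in index[lo:hi]]
```

Note that range_scan only scans a key interval. Turning a real range or similarity query into the right set of key intervals, one per dimension, is where the space-pruning strategies in the book come in, and that part is not attempted here.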

Dr. Yu has product-architecture and development as well as academic and theoretical experience, and prioritizes real-world applications -- both in the generalized (i.e., she'd like to see standard database products and libraries incorporate these tools) and specific sense. Her writing is terse and charming, she explains hard things really well, and the applications of her work are undeniably cool. Definitely worth a read.

Posted by John Jainschigg at 11:14 AM


