Database Blog
2008-03-06T16:15:54Z
Copyright (c) 2008, Nik Hemdal
Nik Hemdal http://i.cmpnet.com/ddj/images/headshots/nhemdal.jpg niklas@hemdal.com Freelancer2 Blog

Evans Data Market Research database survey from Dec 2007
2008-03-06T16:15:54Z 2008-03-06T16:04:58Z

The survey, compiled from 1470 developers and IT managers, compared Oracle, DB2, MySQL, Informix Dynamic Server, PostgreSQL, Microsoft SQL Server, and Sybase Adaptive Server Enterprise on 13 categories.

The complete survey results may be downloaded from here.

Microsoft SQL Server Data Services
2008-03-05T21:14:50Z 2008-03-05T20:54:36Z

SSDS promises scalable, on-demand data storage and query processing web services that are pay-as-you-go. There are no restrictions on the amount of data storage. It supports REST and SOAP interfaces, and it is no surprise that it uses LINQ as a query language. SSDS is still in private beta, but a public beta is coming.

Also see Amazon SimpleDB.

Personal Disaster Recovery on a stick
2008-03-04T07:14:47Z 2008-03-04T06:42:39Z

What started out as a simple knowledge management life-hacking exercise has now blossomed into a full-blown personal disaster recovery solution. Aside from the traditional paper filing system, a redundant electronic version of my life’s inventory was nonexistent. I lacked a personal electronic data management policy: one where the documents that represent my life are secure and stored on my person at all times. With the help of a scanner and a custom JDBC application to store and retrieve BLOBs, I’ve corralled all of my personal documents into an encrypted Apache Derby database. An Apache Derby database is stored in platform-independent files in a directory of the same name as the database. Apache Derby encryption provides complete encryption of on-disk data: indexes, tables, transaction log file, table data, metadata, etc. Using Apache Derby also allows me to provide my relatives with secure backups of my life’s inventory.

I’m still shopping for ruggedized USB sticks to use as actual containers. I need USB sticks that will withstand accidental washings and foul weather if I am caught in a storm while riding my bicycle, have heightened crush resistance, etc. USB sticks like the IronKey or those that utilize biometric information look interesting. HD Tach reported that my Verbatim 2GB Store ‘n’ Go was the fastest USB stick that I had lying around; however, at 1.0” by 0.5”, my ADATA 4GB offered the best form factor, especially when attached to a dog tag style personal ID.

Everyone talks about the desert island applications, tools, pictures, and music that they carry with them on their USB stick, but not many people consider using a USB stick to communicate information to first responders. In addition to including an ICE (In Case of Emergency) text document for first responders, I also run XAMPP/MediaWiki directly from the USB stick to manage my daily brain-dumps.


In Case Of Emergency

So why include first responder ICE data on my USB stick when I don’t have any medical conditions to speak of? Some of my personal activities (bicycle racing and training) are dangerous, and I prefer to carry a product like RoadID instead of a wallet for these activities. I want emergency personnel to have immediate access to my emergency information in the event of an accident. The information includes: full name, all emergency contacts, doctors, blood type, conditions, regular medications, allergies, and basic health insurance info (name and phone only, no IDs). I will have a USB stick affixed to my RoadID whenever I head out of the house, and the RoadID includes a line that says “EMERGENCY SEE USB”. A simple text document named “In Case Of Emergency.txt” on my USB stick will provide quick access to my emergency information. There are USB products for this purpose of course, but there is no reason that a DIY USB stick labeled “In Case Of Emergency” is any less effective or visible to a first responder. In fact, I asked the Police Chief of my little Pennsylvania Borough about first responders and USB sticks, and here is his response:

“Currently, there is no protocol to look for USB flash drives on an injured, unconscious, and/or Alzheimer type persons. The only thing that first responders have been asked to look for in these situations are the commercially available Medic Alert bracelets and or necklaces.
Through the years as an Assistant Fire Chief, EMT/Paramedic Assistant, and Police Officer, I’ve received a lot of info on this issue. As of yet, we have not been told to look for USB flash drives. We do look for a wallet, purse, mobile phone or the Road ID type of info shown on the site that you provided etc. and then go through it to check for any info such as medical issues and contact information.

I am certain if a person is in the unfortunate position that you described, and we found a USB flash drive marked as you indicated, we would quickly plug it into our car computer to see what we could find.”

While I appreciate the willingness of RoadID, Google, Microsoft Corp, and Revolution Health Group, LLC to manage my personal emergency information online and possibly make it available to first responders, I plan on keeping this information as close to me as possible. It is interesting to note that only 14% of medical practices keep records electronically, so at some point my encrypted Apache Derby database will expand to include some scanned medical records as well.


Knowledge Portfolio

Let’s move on to brain-dumps. While I invest regularly in my knowledge portfolio, I have not done the greatest job of electronically centralizing all those wonderful little nuggets of continuous self-improvement. Problem solved – “Wiki on a Stick”. I downloaded two compressed files (XAMPP Lite and MediaWiki), decompressed both files, and copied the contents to my USB stick. After some simple property settings using my browser, my wiki was alive and serving all of my desktops.


Household Inventory

The most important piece of this discussion is managing my life’s inventory in an embedded Apache Derby database. This is the data that will help me rebuild my world in the event of some personal disaster such as a fire, tornado, or earthquake. I’m talking about storing scanned documents that include the following information:
• all account information and recent bills
• paystubs
• important receipts and canceled checks
• birth/marriage certificates
• deeds
• tax papers
• insurance policies
• stock/bond certificates
• professional appraisals
• photographs of possessions that include a member of the family holding the item
• photographs of the house, every room, every closet, basement, garage, and automobiles


Testing Apache Derby Encryption

Is my data safe in an Apache Derby BLOB? To establish an encryption test baseline, I inserted scanned documents into an unencrypted Derby database. I then made a dd image of the USB stick using FTK Imager Lite. Finally, using scalpel, I was able to carve out the scanned PDF documents stored in Apache Derby in under a minute. As expected, the scanned PDF documents were easily retrievable from an unencrypted Apache Derby database regardless of database user authentication settings. I repeated the same process for an encrypted Apache Derby database and was unable to carve out the scanned documents using scalpel. I now have a warm fuzzy feeling that my life’s inventory is safe.
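
The baseline test can be approximated without forensic tooling. The sketch below is not scalpel; it is a minimal Python stand-in that scans a raw image for the PDF header and trailer signatures (%PDF- and %%EOF), which is the kind of pattern a file carver keys on. The function name carve_pdfs and the sample bytes are hypothetical, and the XOR step is a toy transformation standing in for encrypted pages, not real encryption.

```python
def carve_pdfs(image: bytes, max_size: int = 10_000_000) -> list[bytes]:
    """Return byte ranges that look like PDFs inside a raw image.

    Simplified carving: find each '%PDF-' header and cut at the next
    '%%EOF' trailer that appears within max_size bytes of it.
    """
    header, trailer = b"%PDF-", b"%%EOF"
    found = []
    pos = image.find(header)
    while pos != -1:
        end = image.find(trailer, pos, pos + max_size)
        if end == -1:
            # Header without a trailer in range: skip it and keep looking.
            pos = image.find(header, pos + len(header))
            continue
        end += len(trailer)
        found.append(image[pos:end])
        pos = image.find(header, end)
    return found


# Hypothetical stand-ins for a dd image of the stick.
pdf = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF"
plain_image = b"\x00" * 64 + pdf + b"\xff" * 64
# Same layout, but the document bytes are scrambled (toy XOR, not encryption).
opaque_image = b"\x00" * 64 + bytes(b ^ 0x5A for b in pdf) + b"\xff" * 64

print(len(carve_pdfs(plain_image)))   # PDFs recovered from the plaintext image
print(len(carve_pdfs(opaque_image)))  # PDFs recovered from the scrambled image
```

A real carver handles many more file types and edge cases; the point here is only that plaintext BLOB contents keep their file signatures on disk, while encrypted pages do not.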

I'll post the JDBC application source shortly.

My Life On A Stick
2008-02-01T21:41:14Z 2008-02-01T20:58:35Z

Ok, so I have come to the realization that my memory isn’t quite what it used to be. My wife came to this realization a decade ago, but that is a topic for another day and another blog for that matter. I used to have an uncanny knack for storage and retrieval of both professional and personal data using just my brain. I am basing this perceived waning of my data storage and retrieval capabilities on the fact that I find myself clicking “Forgot Password” or “Forgot Username” more often. I don’t think that I have a data capacity issue, but you never know. To combat this, I’ve enacted a “No Data Left Behind” policy. This means that, provided there is a Linux or Windows PC available, I will always have access to all the data (URLs, login credentials, account numbers, etc.) that I can no longer seem to store and retrieve efficiently using just my brain. The additional value proposition of storing images of important documents and receipts will come in handy as well. Oh yeah, the biggest win for me is that all the data is portable and searchable. It really is amazing that I will get all of this on a simple USB flash drive – essentially “my life on a stick”.

Now I could keep things simple and use TrueCrypt and flat files, but I decided to use the freely available Apache Derby and the security that it provides as my digital wallet instead. By the way, a new release of TrueCrypt is due out next week and will include Windows system partition encryption with pre-boot authentication, a Mac OS X version, a Linux GUI, etc. In addition to maintaining a data store that is forensically limiting, I also have some concern about the mean time to failure of my USB flash drive. This is something that I have not quantified, but I have read that read-write cycles run as high as 100,000 for better quality drives and as low as 25,000 for the cheap ones. I have also ignored drive transfer speeds and focused more on the total storage size of the flash drives that I have purchased. It may be prudent to carry redundant flash drives.

My digital wallet data security is job one. In addition to the typical user authentication database access restrictions, Apache Derby provides complete encryption of on-disk data. Everything is encrypted: tables, indexes, transaction log, table data, temporary files, system metadata, and so forth. Out of the box encryption strength is 56-bit DES but this is easily switched to another encryption algorithm. I do plan on periodically verifying/validating physical data file security with FTK Imager Lite and WinHex, or some other combination of cyber forensics tools. Come to think about it, the default 56-bit DES is probably enough considering that I regularly entrust waitrons with my credit card information, and retail staff with my driver’s license information for check verification purposes.
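
For reference, Derby's encryption is switched on through attributes on the JDBC connection URL when the database is created. The helper below is a hypothetical Python sketch that only assembles such a URL string; the database name and boot password are made up. The attribute names create, dataEncryption, bootPassword, and encryptionAlgorithm are Derby connection attributes, and the resulting string is what a JDBC client would hand to DriverManager.getConnection.

```python
def derby_url(db_path: str, boot_password: str, algorithm: str = "") -> str:
    """Build a JDBC URL that creates an encrypted embedded Derby database."""
    # Derby expects a boot password of at least eight characters.
    if len(boot_password) < 8:
        raise ValueError("bootPassword must be at least 8 characters")
    attrs = ["create=true", "dataEncryption=true", f"bootPassword={boot_password}"]
    if algorithm:
        # Optional override of the default DES cipher,
        # e.g. "DESede/CBC/NoPadding" for triple DES.
        attrs.append(f"encryptionAlgorithm={algorithm}")
    return f"jdbc:derby:{db_path};" + ";".join(attrs)


# "wallet" and the passphrase are placeholders for illustration only.
print(derby_url("wallet", "s3cretPassphrase"))
print(derby_url("wallet", "s3cretPassphrase", "DESede/CBC/NoPadding"))
```

Leaving the algorithm empty keeps the out-of-the-box default described above; passing one is the "easily switched" case.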

Apache Derby is a fully functional RDBMS written entirely in Java. It runs in any JVM (version 1.4 or higher). For now, I plan on using the Apache Derby ij JDBC application with Linux and Windows scripting to manage my digital wallet data. I may also incorporate the use of the SQuirreL SQL universal client. I haven’t had issues with either on my openSUSE or Windows PCs. In my next post, we’ll explore this project further.

Adding comments, or remarks, to database schema objects.
2007-11-28T14:17:08Z 2007-11-28T01:18:41Z

Last week I was asked a question about the ability to provide explanatory remarks to database schema objects. The question focused on the typical remarks that elucidate the intent or purpose of the schema objects; however, keep in mind that remarks can also specify justification of standards or best practice violations, stored procedure limitations, use of undocumented features, column data units, conversion information, and so forth. An experienced database designer, or database developer, will make every attempt to create self-documenting schema objects. In the event that there is information that cannot be expressed by the name of the schema object itself, the database designer/developer will apply an appropriate schema object remark. The remarks are stored in the system catalog. In addition to providing an essential inline database reference for the developers and/or maintenance team, remarks make you think about the design.

While this capability is not part of the SQL standard, it is supported by the major DBMS vendors. Every major vendor except Microsoft supports adding explanatory remarks to their respective system catalogs through the use of the COMMENT statement. DB2 provides support for up to 254 characters while Oracle supports up to 2K. DB2 stores the remarks inline with the other metadata in the system catalog. For example, to access remarks in the DB2 system catalog you would supply the following SQL:

SELECT tabname, remarks FROM syscat.tables WHERE tabschema = 'DEV'

Neither Oracle nor Microsoft stores the remarks inline with the other metadata in the system catalog. You access Oracle remarks using the system views USER_TAB_COMMENTS and USER_COL_COMMENTS. In Microsoft SQL Server, you access remarks through the sys.extended_properties catalog view. Microsoft SQL Server does not limit you to applying only one piece of explanatory information per schema object; however, the stored procedures to create and manipulate schema object remarks are laborious when compared to the other vendors.

Do yourself a favor and spend some time reviewing the schema object documentation capabilities of your DBMS. Remember the remarks that you supply to the system catalog are available to the team when they are working with the physical database. The team will love you for it, especially when you are not around.

Preserving XML document order when using XQuery.nodes in relational table joins
2007-11-15T19:56:14Z 2007-11-15T17:48:10Z

If you are not taking advantage of the XML support in SQL Server 2005, then shame on you. Among other things, XML is the perfect data representation format for passing simple collections (i.e. arrays and lists) to stored procedures, whether you want to easily optimize your CRUD operations, take advantage of stored procedure parameter validation using typed XML, or both. Furthermore, XQuery is great at shredding the XML into a relational format for use within your stored procedures.

Unlike the relational model, XML documents have an implied order. By default, the XQuery ordering mode in SQL Server 2005 is ordered. In other words, the node sequences returned by the path expressions are in XML document order. It is never safe to assume that the SQL Server Query Processor will preserve the XML document order when joining it with a relational table. In most of the testing that I conducted, the ordering was preserved; however, in the following circumstance it was not.

Consider the following DDL:

CREATE TABLE Test (
col1 UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED DEFAULT NEWSEQUENTIALID(),
col2 VARCHAR(100) NOT NULL
)
GO

CREATE PROCEDURE TestProc
@list XML
AS
BEGIN
SELECT Test.col1, Test.col2
FROM @list.nodes('/List/Value') List(col1)
INNER JOIN Test ON Test.col1 = List.col1.value('@col1', 'UNIQUEIDENTIFIER')
END
GO

and the following Table data (in INSERT order):

col1 col2
------------------------------------ ---------------------------------
D443AD7A-9293-DC11-9042-00065B83FA16 One
D543AD7A-9293-DC11-9042-00065B83FA16 Two
D643AD7A-9293-DC11-9042-00065B83FA16 Three
D743AD7A-9293-DC11-9042-00065B83FA16 Four
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D943AD7A-9293-DC11-9042-00065B83FA16 Six
DA43AD7A-9293-DC11-9042-00065B83FA16 Seven
DB43AD7A-9293-DC11-9042-00065B83FA16 Eight
DC43AD7A-9293-DC11-9042-00065B83FA16 Nine
DD43AD7A-9293-DC11-9042-00065B83FA16 Ten

and finally the following batch:

DECLARE @list XML
SET @list = '<List>
<Value col1="D843AD7A-9293-DC11-9042-00065B83FA16" />
<Value col1="D943AD7A-9293-DC11-9042-00065B83FA16" />
<Value col1="D843AD7A-9293-DC11-9042-00065B83FA16" />
<Value col1="D943AD7A-9293-DC11-9042-00065B83FA16" />
<Value col1="D843AD7A-9293-DC11-9042-00065B83FA16" />
<Value col1="D943AD7A-9293-DC11-9042-00065B83FA16" />
</List>'

EXEC TestProc @list

SELECT t.col1, t.col2
FROM @list.nodes('/List/Value') List(col1)
INNER JOIN Test t ON t.col1 = List.col1.value('@col1','uniqueidentifier')

While the SELECT t.col1, t.col2… produced the desired resultset, EXEC TestProc @list produced a resultset not in XML order because the optimizer chose a plan that specified a Merge Join algorithm.

col1 col2
------------------------------------ ---------------------------------
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D943AD7A-9293-DC11-9042-00065B83FA16 Six
D943AD7A-9293-DC11-9042-00065B83FA16 Six
D943AD7A-9293-DC11-9042-00065B83FA16 Six

To preserve the XML document order within our stored procedure, we can simply use the SQL Server 2005 ranking functions. So TestProc now looks like this:

ALTER PROCEDURE TestProc
@list XML
AS
BEGIN
SELECT t.col1, t.col2
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY preserveCount) AS rowNumber, PreserveOrder.col1
FROM
(SELECT List.col1.value('@col1','uniqueidentifier') AS col1
, 0 AS preserveCount
FROM @list.nodes('/List/Value') List(col1)) PreserveOrder
) OrderedList
INNER JOIN Test t ON t.col1 = OrderedList.col1
ORDER BY OrderedList.rowNumber ASC;
END


EXEC TestProc @list will now produce the proper results.

col1 col2
------------------------------------ ---------------------------------
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D943AD7A-9293-DC11-9042-00065B83FA16 Six
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D943AD7A-9293-DC11-9042-00065B83FA16 Six
D843AD7A-9293-DC11-9042-00065B83FA16 Five
D943AD7A-9293-DC11-9042-00065B83FA16 Six
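
The underlying idea, independent of SQL Server, is to turn document position into data and sort on it explicitly rather than rely on the join to keep it. The sketch below restates that idea in Python with the standard sqlite3 and xml.etree modules; it is not T-SQL and not the procedure above. The shortened keys, the wanted table, and the ordinal column are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened stand-ins for the GUID keys used in the post.
doc = """<List>
  <Value col1="D8" /><Value col1="D9" />
  <Value col1="D8" /><Value col1="D9" />
  <Value col1="D8" /><Value col1="D9" />
</List>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (col1 TEXT PRIMARY KEY, col2 TEXT NOT NULL)")
conn.executemany("INSERT INTO Test VALUES (?, ?)", [("D8", "Five"), ("D9", "Six")])

# Shred the XML and record each node's position in the document.
conn.execute("CREATE TABLE wanted (ordinal INTEGER PRIMARY KEY, col1 TEXT NOT NULL)")
values = ET.fromstring(doc).findall("Value")
conn.executemany(
    "INSERT INTO wanted VALUES (?, ?)",
    [(i, v.get("col1")) for i, v in enumerate(values, start=1)],
)

# The join alone promises no order; the ORDER BY on the stored position does.
rows = conn.execute(
    "SELECT t.col2 FROM wanted w JOIN Test t ON t.col1 = w.col1 ORDER BY w.ordinal"
).fetchall()
result = [r[0] for r in rows]
print(result)
```

Whatever join algorithm the engine picks, the final sort on the recorded position restores document order.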

Applying Sprint Updates to a Database Codeline
2007-11-13T23:41:38Z 2007-10-25T04:47:02Z

The previous posts discussed a generic codeline folder structure and PowerShell management scripts for database schema version control. This post discusses the management of active database development work in the Mainline codeline.

For every database in our Mainline codeline folder structure there are two child folders: Previous and Sprint. The Previous folder contains DDL files for every database schema object, including all BCP domain/default data files, necessary to restore a reference image of the last released database schema. In the folder structure below, the Previous folder contents would construct a 2007.2 equivalent database schema for MyDatabase2. The Sprint folder contains the DDL and DML necessary for active development work. In older codelines, the Sprint folder provides a quick snapshot of what happened in the database schema for that particular release.



Project
   2007.2
   Mainline
      Business Logic
      Unit Tests
      Database
         bin
         Reference
         MyDatabase1
         MyDatabase2
            Previous
               Tables
               StoredProcedures
               …
            Sprint

Sprint database schema object updates are applied using this PowerShell/SMO script and controlled via an XML manifest. Every DDL or DML modification is listed in the manifest. A SprintUpdate XML Element in the manifest identifies one modification, and is decorated with appropriate sprint and sprint backlog information XML attributes; furthermore, there is an XML attribute to control whether or not the modification is actually applied to the specified database during script execution. The manifest will contain DDL and DML database schema modifications for all Sprints until the Release Sprint. When a Release Sprint completes, the Mainline codeline is renamed to the appropriate release version id and a new Mainline codeline is constructed.
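
The manifest format itself is not shown here, so the following Python sketch invents one purely for illustration: the attribute names sprint, backlogItem, file, and apply, the Manifest root element, and the file names are all assumptions. It shows how a script could read SprintUpdate elements in order and select only the modifications flagged for application.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: the SprintUpdate element name comes from the post,
# every attribute and file name below is assumed.
MANIFEST = """<Manifest>
  <SprintUpdate sprint="1" backlogItem="101" file="AddCustomerEmail.sql" apply="true" />
  <SprintUpdate sprint="1" backlogItem="104" file="SeedOrderStatus.sql" apply="false" />
  <SprintUpdate sprint="2" backlogItem="117" file="AlterGetOrdersProc.sql" apply="true" />
</Manifest>"""


def updates_to_apply(manifest_xml: str) -> list[dict]:
    """Return the attributes of each SprintUpdate flagged apply="true", in manifest order."""
    root = ET.fromstring(manifest_xml)
    return [
        dict(update.attrib)
        for update in root.iter("SprintUpdate")
        if update.get("apply", "false").lower() == "true"
    ]


for update in updates_to_apply(MANIFEST):
    print(f"sprint {update['sprint']}, item {update['backlogItem']}: {update['file']}")
```

Keeping the skipped entries in the manifest, rather than deleting them, preserves the record of what happened in each sprint.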

Powershell and SMO scripts to support Version Control
2007-10-25T04:21:34Z 2007-10-22T22:25:50Z

In the previous post we established a generic version control folder structure for our codelines. I have created two scripts: ScriptDatabase.ps1 and CreateSchema.ps1 to support this codeline structure. The ScriptDatabase.ps1 script will extract all of the database objects from a specified database into our Previous folder in the codeline structure. The CreateSchema.ps1 will create a new database schema from the codeline folders on the specified Microsoft SQL Server instance.

I have incorporated log4net in each PowerShell script. You can view the log4net configuration file here. Just drop the log4net.dll in the same directory as the PowerShell script and you are good to go. log4net is configured for Console output, but I have included the Appender for logging to a file.

Invoking the scriptdatabase.ps1 script:

C:\> powershell .\scriptdatabase.ps1 MyInstanceName MyDatabaseName C:\Project\MainLine\Database\MyDatabaseName\Previous

where "MyInstanceName" is the name of a Microsoft SQL Server instance, "MyDatabaseName" is the name of an existing database on the specified instance, and "C:\Project\MainLine\Database\MyDatabaseName\Previous" is the root path for the codeline folders. In other words, the script will create/populate C:\Project\MainLine\Database\MyDatabaseName\Previous\StoredProcedures, C:\Project\MainLine\Database\MyDatabaseName\Previous\ForeignKeys, C:\Project\MainLine\Database\MyDatabaseName\Previous\Tables, etc.

In the next post I will explain how Sprint updates are handled.

[The source code is also available as one zip file: ddj071022hemdal.zip.]

Database Version Control for Agile teams using Scrum
2007-10-13T06:05:11Z 2007-10-13T05:34:26Z

The key challenges of maintaining database schemas for agile teams are:

1. easy restoration of previous versions
2. a repeatable process that supports autonomous work
3. concurrent updates to the database schema objects
4. synchronization with the application code to ensure stable builds
5. storage of production regression test data
6. storage of default domain data
7. quick deployment of a specific version of a database schema through reference objects
8. a quick view of what happened to the database schema for a particular release

Supporting a database-centric COTS or hosted product line typically involves troubleshooting customer/production issues with previous database schema versions that can be many releases old. Central to easily reverting to previous database schema versions is designing an appropriate codeline structure in the version control repository. Codelines are branches of development that contain the application source files/artifacts and the database schema artifacts. An agile version control repository will contain a codeline for each released version of an application, and a single codeline named “Mainline”. All of the active development work is performed in the “Mainline” codeline, and the other release codelines are immutable. The only time a release codeline is not immutable is if a patch or maintenance work is required in a released version prior to the conclusion of the Release Sprint of the “Mainline” codeline. The last release is branched into a new release codeline, and upon conclusion any maintenance modifications are merged back into the “Mainline” codeline. The codeline structure that follows addresses the key challenges identified earlier.

Sample codeline structure:


Project
   2007.1.0
   2007.1.1
   2007.1.2
   2007.2
   Mainline
      Business Logic
      Unit Tests
      Database
         Library
         Reference
         MyDatabase1
         MyDatabase2
            Previous
               Tables
               Foreign Keys
               Defaults
               DML Triggers
               DDL Triggers
               Check Constraints
               Functions
               XML Schemas
               Stored Procedures
               Views
               Options
               Security
               Indexes
               Synonyms
            Data
            Sprint

The Database folder structure in each codeline above is similar in format. The folder contains a Library folder, a Reference Folder, and a folder for each database in the codeline. The Library folder contains the PowerShell scripts and other files necessary for providing a repeatable database schema restoration process. The Reference folder contains pre-built implementation(s) of each database contained in the codeline. The pre-built database(s) do not have any Sprint updates applied, but they are populated with data. The Sprint folder contains the DDL and DML scripts for active development work. Use descriptive script names for DDL and DML scripts to provide a quick view of database schema modifications in previous releases. I also include a reference to the Sprint Backlog Item in the name for traceability.
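
As a rough illustration of the layout just described, the Python sketch below creates the Database folder tree for one codeline. The folder names are taken from the sample structure; the function name create_database_skeleton and the use of a temporary directory are assumptions made so the example is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Object-type folders under Previous, as listed in the sample structure.
PREVIOUS_FOLDERS = [
    "Tables", "Foreign Keys", "Defaults", "DML Triggers", "DDL Triggers",
    "Check Constraints", "Functions", "XML Schemas", "Stored Procedures",
    "Views", "Options", "Security", "Indexes", "Synonyms",
]


def create_database_skeleton(codeline: Path, databases: list[str]) -> list[Path]:
    """Create the Database folder tree for one codeline and return the new folders."""
    db_root = codeline / "Database"
    folders = [db_root / "Library", db_root / "Reference"]
    for name in databases:
        folders += [db_root / name / "Previous" / sub for sub in PREVIOUS_FOLDERS]
        folders += [db_root / name / "Data", db_root / name / "Sprint"]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


with tempfile.TemporaryDirectory() as tmp:
    mainline = Path(tmp) / "Project" / "Mainline"
    created = create_database_skeleton(mainline, ["MyDatabase1", "MyDatabase2"])
    sprint_exists = (mainline / "Database" / "MyDatabase2" / "Sprint").is_dir()
    print(len(created), sprint_exists)
```

Generating the tree from one list keeps every codeline's Database folder in the same shape, which is what makes the restoration scripts repeatable.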

I will post PowerShell/SMO scripts to support this codeline structure shortly.

SQL Server 2005 Forensics
2007-10-02T06:30:55Z 2007-10-02T06:25:15Z

Check out this recent addition to the SANS InfoSec Forensics Reading Room. It contains an interesting report that includes some methods and techniques to uncover digital evidence related to a SQL Server 2005 security incident. The report follows proper cyber forensics investigative procedures, and the data acquisition section will definitely increase your personal knowledge portfolio.

SMO and PowerShell 1.0
2007-09-29T03:00:00Z 2007-09-29T00:42:18Z

This is an introductory post, in a series of posts, where I intend to use PowerShell and SMO to describe a database schema versioning process that supports the needs of your Agile development team. I'm not going to bore anyone with a PowerShell or SMO primer, so if you have any questions please contact me at niklas@hemdal.com for assistance.

For this post, I just wanted to expose a little PowerShell script to demonstrate how amazingly powerful this technology is. The following code enumerates the schema objects for a user-supplied Microsoft SQL Server database, and generates the corresponding T-SQL create script files for objects whose names match a user-supplied regular expression. I chose to enumerate the objects based upon the object name, but I have left script comments in to specify the enumeration using the schema object type (i.e. Stored Procedure, Table, Foreign Key, etc.). To create T-SQL create script files for all database schema objects simply supply ".*" as the regular expression to match.

param (
[string] $serverName,
[string] $dbName,
[string] $objectPattern,
[string] $outputPath
)

function ScriptSqlObject([String]$obj,[string]$objType,[string]$targetPath)
{
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | out-null
$server = new-object ('Microsoft.SqlServer.Management.Smo.Server') $serverName
$db = $server.Databases[$dbName]

$o = new-object ( 'Microsoft.SqlServer.Management.Smo.Scripter') ($server)
$o.Options.WithDependencies = $false

switch ( $objType.Trim() )
{
"U" { $actualObject = $db.Tables[$obj]; $extension = ".tbl"; break; }
"P" { $actualObject = $db.StoredProcedures[$obj]; $extension = ".sp"; break; }
Default { return; }
}

$f = [System.IO.Path]::Combine($targetPath, $actualObject.Name + $extension)
if ( [System.IO.File]::Exists($f) -eq $true )
{
[System.IO.File]::Delete($f)
}
$o.Options.FileName = $f
$o.Options.AppendToFile = $true
$o.Options.ScriptDrops = $true
$o.Options.IncludeIfNotExists = $true
# script the drop
$o.Script($actualObject.Urn)

$o.Options.DriPrimaryKey = $true
$o.Options.ScriptDrops = $false
$o.Options.IncludeIfNotExists = $false
# script the create
$o.Script($actualObject.Urn)
}

$cn = new-object System.Data.SqlClient.SqlConnection
$cn.ConnectionString = "Server=$serverName;Database=$dbName;Integrated
Security=True"
$cmd = new-object System.Data.SqlClient.SqlCommand $cmd.CommandText = "SELECT * FROM sys.objects"
$cmd.Connection = $cn
$a = new-object System.Data.SqlClient.SqlDataAdapter
$a.SelectCommand = $cmd
$ds = new-object System.Data.DataSet
$a.Fill($ds)
$cn.Close()
$names = @{}
$ds.Tables[0] | %{if($_.Name -match [regex]$objectPattern) { $names[$_.Name] = $_.Type } }
# to enumerate all table objects specify “U\s” as input to $objectPattern
#$ds.Tables[0] | %{if($_.Type -match [regex]$objectPattern) { $names[$_.Name] = $_.Type } }
foreach ( $key in $names.Keys )
{
ScriptSqlObject $key $names[$key] $outputPath
}

#$sr=new-object System.IO.StreamReader("C:\")
#$script=sr.ReadToEnd
#$db.ExecuteNonQuery($script)


]]>
An introduction to SQLite. 2007-08-21T16:56:01Z 2007-08-21T16:24:25Z tag:,2007:/49.25761 2007-08-21T16:24:25Z SQLite is used by some of the biggest names in IT: Apple, Adobe, and Google to name a few (Apple in the iPhone, Adobe in AIR for building and deploying web applications on the desktop, and Google in the Gears... Nik Hemdal http://i.cmpnet.com/ddj/images/headshots/nhemdal.jpg niklas@hemdal.com Freelancer2 Blog SQLite is used by some of the biggest names in IT: Apple, Adobe, and Google to name a few (Apple in the iPhone, Adobe in AIR for building and deploying web applications on the desktop, and Google in the Gears browser extension). It may already be the most widely distributed database in the world. It truly is a zero-configuration, almost SQL-92 compliant, public domain database that runs on pretty much every operating system. Little-endian or big-endian makes no difference: database files can be freely shared between platforms. It certainly is not an enterprise database replacement, but there are quite a few situations where SQLite works very well. Dr. Richard Hipp, the primary author of SQLite, provided a Google TechTalk briefing in early 2006; legible slides are found here.

While the SQLite library core is C code, the number of language bindings available is staggering. As my SQLite introduction, I developed a simple .NET 2.0 assembly that uses SQLite to aggregate Internet Explorer favorites from all of my PCs. The supplied command-line administration tool can quickly generate a consumable HTML file of the aggregated links. What is nice is that I can use the power of SQL to filter the aggregated links by title, date, grouping, etc. It is no surprise that Mozilla's Firefox 3 is moving to SQLite for storage of its bookmarks, among other things.
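To make the "filter with SQL" idea concrete, here is a minimal sketch in Python's standard `sqlite3` module rather than the .NET assembly described above. The `link` table, its columns, the sample rows, and the `find_links` helper are all hypothetical names invented for illustration; the original source listing may be organized quite differently.

```python
import sqlite3

def build_links_db():
    """Create an in-memory database holding favorites gathered from several PCs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the URL is the key, so the same favorite
    # collected from two machines is stored only once.
    conn.execute(
        "CREATE TABLE link ("
        " url TEXT PRIMARY KEY,"
        " title TEXT,"
        " folder TEXT,"
        " created TEXT)"  # ISO-8601 dates sort correctly as text
    )
    favorites = [
        ("http://example.com/sqlite", "SQLite Home Page", "Databases", "2007-08-01"),
        ("http://example.com/derby", "Apache Derby", "Databases", "2007-06-15"),
        ("http://example.com/sqlite", "SQLite (copy from laptop)", "Misc", "2007-08-10"),
        ("http://example.com/sqlite/lang", "SQLite SQL syntax", "Databases", "2007-08-05"),
    ]
    # INSERT OR IGNORE silently skips rows whose URL is already present.
    conn.executemany("INSERT OR IGNORE INTO link VALUES (?, ?, ?, ?)", favorites)
    return conn

def find_links(conn, title_part, since):
    """Return (title, url) pairs whose title contains title_part, newest first."""
    return conn.execute(
        "SELECT title, url FROM link"
        " WHERE title LIKE ? AND created >= ?"
        " ORDER BY created DESC",
        (f"%{title_part}%", since),
    ).fetchall()

if __name__ == "__main__":
    db = build_links_db()
    for title, url in find_links(db, "sqlite", "2007-08-01"):
        print(title, url)
```

Storing dates as ISO-8601 text is a deliberate choice in this sketch: it keeps `ORDER BY` and range filters correct without relying on a dedicated date type.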

One distinctive SQLite feature that did lead to a data defect was column datatype affinity. While traditional databases use static datatyping on columns, SQLite treats the declared column datatype as a recommendation. In other words, you can store a value of any datatype in any column (except a column declared INTEGER PRIMARY KEY). I had improperly formatted the creationTime attribute for insertion, which created a problem when I later tried to use it in an ORDER BY. Besides column type affinity, it's also important to keep in mind that SQLite does not enforce referential integrity (RI) and complicates table evolution with its limited ALTER TABLE options.
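The affinity behavior is easy to reproduce. The following is a small sketch using Python's standard `sqlite3` module; the `favorite` table and its sample values are assumptions made up for the demonstration, with only the `creationTime` column name borrowed from the post.

```python
import sqlite3

def demo_affinity():
    """Show how INTEGER affinity treats mixed values and how that affects ORDER BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE favorite ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT,"
        " creationTime INTEGER)"
    )
    conn.executemany(
        "INSERT INTO favorite (title, creationTime) VALUES (?, ?)",
        [
            ("epoch as int", 1187713441),               # stored as an integer
            ("formatted text", "2007-08-21 16:24:25"),  # not numeric, stays text
            ("epoch as text", "1187000000"),            # numeric-looking text is converted
        ],
    )
    # typeof() reports the storage class actually used for each value.
    ordered = conn.execute(
        "SELECT title, typeof(creationTime) FROM favorite ORDER BY creationTime"
    ).fetchall()

    # INTEGER PRIMARY KEY is the exception: non-integer values are rejected.
    try:
        conn.execute("INSERT INTO favorite (id, title) VALUES ('not-a-number', 'x')")
        pk_rejected = False
    except sqlite3.Error:
        pk_rejected = True

    conn.close()
    return ordered, pk_rejected

if __name__ == "__main__":
    rows, rejected = demo_affinity()
    for title, storage in rows:
        print(storage, title)
    print("primary key rejected text:", rejected)
```

Because SQLite sorts all numeric values ahead of all text values, the badly formatted row lands at the end of the result regardless of the date it represents, which is the kind of ORDER BY surprise described above.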

SQLite was extremely simple to work with, and it deserves a closer look for use in other endeavors. You can read the source listing in HTML here.

]]>
Assessing Production Data Quality 2007-08-09T00:22:31Z 2007-08-08T06:25:06Z tag:,2007:/49.25476 2007-08-08T06:25:06Z In a recent Agile Newsletter, Scott Ambler highlighted the need to validate data quality via testing (Questioning Traditional Data Management). At a minimum, he proposed regression testing things like column domain value rules, column default value rules, value existence rules,... Nik Hemdal http://i.cmpnet.com/ddj/images/headshots/nhemdal.jpg niklas@hemdal.com Freelancer2 Blog In a recent Agile Newsletter, Scott Ambler highlighted the need to validate data quality via testing (Questioning Traditional Data Management). At a minimum, he proposed regression testing things like column domain value rules, column default value rules, value existence rules, row value rules, and size rules to help ensure data quality. Scott also points out that constraints set up to prohibit data quality errors are easily dropped or reworked.

]]> Existing constraints are easily checked with commercial schema comparison tools and a reference database; however, the quality of the data in a production system is a different story. Keep in mind that data quality issues can incubate over time in the rush to quickly deploy new features when fundamental design and modeling practices are skirted, or (shudder) when a Production DBA relaxes constraints to merge in some new data.

I started with a question: how can I quickly assess data quality in 300 of my production Microsoft SQL Server databases, at any time, using some of Scott’s regression testing criteria? I wanted these assessment tests to live close to the database itself: a simple table with a few stored procedures. For this post I kept it simple and tested column default values. The first thing I needed was a stored procedure, ExecLiteral, that would automatically execute intermediate results, similar to what sp_execresultset did in Microsoft SQL Server 2000. Next I needed to generate the reference XML data required to validate column default values and column value existence for future tests in other databases. The idea was a simple stored procedure that used the INFORMATION_SCHEMA.COLUMNS view and FOR XML EXPLICIT to generate the XML that I needed to validate the column defaults and to test the existing column data for value existence. In future posts, we may evolve this to include actual default value INSERT tests. The stored procedure that identifies missing defaults uses the new T-SQL EXCEPT operator to compare the reference defaults against the defaults in the current schema. ExecLiteral is then invoked to test for the existence of a value in each column that has a specified default.
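The EXCEPT-based comparison can be illustrated outside SQL Server. The sketch below is not the stored procedures from the source listing: it swaps in SQLite through Python's standard `sqlite3` module, reads defaults with `PRAGMA table_info` instead of INFORMATION_SCHEMA.COLUMNS, and keeps the reference data in a plain table instead of XML. The `reference_default` and `orders` tables and all function names are hypothetical.

```python
import sqlite3

def actual_defaults(conn, table):
    """Return (table, column, default) for every column that declares a default."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so it must come from a trusted source.
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(table, row[1], row[4]) for row in info if row[4] is not None]

def missing_defaults(conn, table):
    """Reference defaults that are absent from, or differ in, the live schema."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS actual_default (tbl TEXT, col TEXT, dflt TEXT)"
    )
    conn.execute("DELETE FROM actual_default")
    conn.executemany(
        "INSERT INTO actual_default VALUES (?, ?, ?)", actual_defaults(conn, table)
    )
    # EXCEPT keeps the reference rows that have no identical row in the live set.
    return conn.execute(
        "SELECT tbl, col, dflt FROM reference_default WHERE tbl = ?"
        " EXCEPT"
        " SELECT tbl, col, dflt FROM actual_default"
        " ORDER BY col",
        (table,),
    ).fetchall()

def null_count(conn, table, column):
    """Value existence check: how many rows have no value in the column."""
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    return conn.execute(query).fetchone()[0]

def build_demo():
    """Build a live table that has lost one of its reference defaults."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reference_default (tbl TEXT, col TEXT, dflt TEXT)")
    conn.executemany(
        "INSERT INTO reference_default VALUES (?, ?, ?)",
        [("orders", "status", "0"), ("orders", "qty", "1")],
    )
    # The qty column was created without its DEFAULT 1.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status INTEGER DEFAULT 0, qty INTEGER)"
    )
    conn.execute("INSERT INTO orders (status, qty) VALUES (NULL, 5)")  # explicit NULL
    conn.execute("INSERT INTO orders (qty) VALUES (2)")                # default applies
    return conn

if __name__ == "__main__":
    db = build_demo()
    print("missing defaults:", missing_defaults(db, "orders"))
    print("rows without a status:", null_count(db, "orders", "status"))
```

Comparing whole rows with EXCEPT means a changed default is reported the same way as a dropped one, which suits a regression check where any drift from the reference is worth flagging.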

Let’s see where this takes us. You can read the source listing in HTML and text.

]]>
Katmai and the End of DMO Support 2007-08-07T20:04:54Z 2007-08-07T19:13:09Z tag:,2007:/49.25463 2007-08-07T19:13:09Z SQL Server 2008 (Katmai) is set for launch on February 27, 2008, and Microsoft had previously announced that this would be the last version to support SQL Distributed Management Objects (SQL-DMO). However, Allen White notes in his SQLJunkies blog that... jdorsey https://i.cmpnet.com/ddj/images/headshots/jdorsey.jpg jdorsey@cmp.com Editors Blog SQL Server 2008 (Katmai) is set for launch on February 27, 2008, and Microsoft had previously announced that this would be the last version to support SQL Distributed Management Objects (SQL-DMO). However, Allen White notes in his SQLJunkies blog that the July CTP release of Katmai contains a warning that the Express version of SQL Server 2008 will not support DMO. Developers should instead use the SQL Server Management Objects (SMO) library introduced in SQL Server 2005.

]]>
New Blog by Niklas Hemdal 2007-08-02T19:01:06Z 2007-08-02T18:58:51Z tag:,2007:/49.25366 2007-08-02T18:58:51Z Nik designs secure and efficient Enterprise and COTS data solutions in Washington, DC. His career includes database development/administration and data warehousing using vendor products from Sybase, Oracle, IBM, and Microsoft on a variety of platforms. You can reach him at... Nik Hemdal http://i.cmpnet.com/ddj/images/headshots/nhemdal.jpg niklas@hemdal.com Freelancer2 Blog Nik designs secure and efficient Enterprise and COTS data solutions in Washington, DC. His career includes database development/administration and data warehousing using vendor products from Sybase, Oracle, IBM, and Microsoft on a variety of platforms. You can reach him at niklas@hemdal.com.

]]>