Windows .NET Blog: In the valley of filename validation

Practicing .NET

Improving developer productivity and software quality

by Mark M. Baker
September 19, 2006

In the valley of filename validation

Yesterday I was working on some code and realized I needed to validate a filename in .NET. It seemed like a straightforward thing to do. I started rummaging around in the File, Directory and Path classes in System.IO, looking for something like "IsValid" that I could use.

Nothing. Thus started my journey into the valley of filenames.

I then thought I might be able to lean on the behavior of something like Path.GetFileName: pass it a path containing a bogus filename and see whether it objects. Although the method throws an ArgumentException when the path contains invalid characters, it only checks against the invalid *path* characters, so it doesn't catch every bad filename.

Next I moved on to Path.GetInvalidFileNameChars, which returns an array of characters that are *potentially* invalid in filenames. Hmm. This seemed like a worthy candidate. Then I read through the MSDN help for the method and discovered this helpful note:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

Well, at least they documented the fact that it isn't reliable.
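To make that limitation concrete, here is a minimal character-blacklist check, sketched in Python rather than C#. It only illustrates the idea and is not the .NET implementation. The set mirrors the characters the MSDN note lists (ASCII 0 through 31, which covers null, backspace and tab, plus `"`, `<`, `>` and `|`) and adds `\`, `/`, `:`, `*` and `?`, which Windows also disallows in file names; that addition is an assumption on my part. The helper name `has_invalid_chars` is hypothetical.

```python
# Characters the quoted MSDN note lists as potentially invalid on Windows
# desktop platforms: ASCII 0-31 (covers \0, \b and \t) plus " < > |.
DOCUMENTED_INVALID = {chr(c) for c in range(32)} | set('"<>|')

# Assumption: separators and wildcard/drive characters that Windows file
# names also cannot contain.
WINDOWS_NAME_RESERVED = set('\\/:*?')

INVALID_FILENAME_CHARS = frozenset(DOCUMENTED_INVALID | WINDOWS_NAME_RESERVED)


def has_invalid_chars(name: str) -> bool:
    """Return True if name contains any character from the known-bad set.

    A False result does NOT prove the name is valid: like
    Path.GetInvalidFileNameChars, this list is not guaranteed complete.
    """
    return any(ch in INVALID_FILENAME_CHARS for ch in name)
```

For example, `has_invalid_chars("a|b.txt")` is `True`, while `has_invalid_chars("CON")` is `False` even though `CON` is a reserved device name on Windows. That is exactly the kind of gap a per-character list cannot cover.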

So I did what most engineers do when they're stumped: I hit Google. It didn't take long to find a blog post by Brian Dewey at Microsoft. Here is the relevant passage:

A common question for people starting to program on Windows is, “What makes a valid Windows file name?” You want to use this information to make simplifying assumptions in your code: that names can be no longer than MAX_PATH, that two names won't differ only by case, etc. Unfortunately, the answer to what makes a valid file name in Windows is not simple.

He goes on to describe peculiarities of NTFS and the POSIX subsystem, which allow almost any Unicode character in a name, whereas the Win32 subsystem does not, most likely for historical reasons (i.e., its Win16/DOS heritage).
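Some of those Win32 restrictions apply to whole names rather than individual characters. The Python sketch below encodes two of them as described in Microsoft's file-naming guidance: the classic reserved device names (`CON`, `PRN`, `AUX`, `NUL`, `COM1` to `COM9`, `LPT1` to `LPT9`, in any case and even with an extension, as in `nul.txt`) and the advice not to end a name with a space or a period. The function name `violates_win32_name_rules` is hypothetical, the reserved list may not be exhaustive, and the check deliberately ignores MAX_PATH and case-insensitive collisions.

```python
# Hypothetical helper covering two Win32 naming rules that a character
# list cannot express. Deliberately incomplete: no MAX_PATH or case checks.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def violates_win32_name_rules(name: str) -> bool:
    """Return True if name breaks one of the whole-name rules below."""
    if not name:
        return True  # an empty string is never a usable file name
    # Win32 guidance: do not end a file name with a space or a period.
    if name[-1] in (" ", "."):
        return True
    # Microsoft's guidance says to avoid device names in any case, even
    # with an extension ("nul.txt"), so compare the part before the first dot.
    stem = name.split(".", 1)[0].upper()
    return stem in RESERVED_DEVICE_NAMES
```

Note that `CONSOLE.log` passes, because only the exact device name is reserved, while `Com1` and `nul.txt` are both rejected.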

By this time, it became apparent there was only one good, reliable and simple way to test for a valid filename: try to create it. So I wrote some code that takes the filename and creates a file with that name in the temp directory. If that worked, the filename was OK (for now I'm ignoring the case where the file already exists and is read-only). If not, the filename was invalid on that platform.
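The try-it-and-see approach might look something like this in Python. The post doesn't show the original .NET code, so this is only a sketch of the idea. Two choices here are my own assumptions: the file is created inside a fresh private temporary directory with exclusive-create semantics (`os.O_EXCL`), which sidesteps the existing, read-only file problem, and names containing a path separator are rejected up front so the test cannot write outside that directory. `is_creatable_filename` is a hypothetical name.

```python
import os
import tempfile


def is_creatable_filename(name: str) -> bool:
    """Sketch of the 'just try to create it' test, run in a scratch dir."""
    # Reject empty names, "." / "..", and anything with a path separator,
    # so the name cannot escape the scratch directory.
    seps = {os.sep} | ({os.altsep} if os.altsep else set())
    if not name or name in (".", "..") or any(s in name for s in seps):
        return False

    # A fresh, empty directory means the name cannot already exist there.
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, name)
        try:
            # O_EXCL: fail rather than open a file that already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except (OSError, ValueError):
            # OSError: the OS or file system rejected the name.
            # ValueError: Python refuses names with an embedded NUL.
            return False
        os.close(fd)
        return True  # the directory and file are removed on exit
```

The answer is only as good as the file system holding the temp directory: a name that works there may still fail on a FAT-formatted USB stick or a network share. On Windows, Win32 path normalization also strips trailing periods and spaces, so `notes.` can appear to succeed by creating `notes` instead.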

Then it occurred to me that this dependence on the platform and file system may be the reason there is no File.IsValid method in .NET. It made sense, after all.

Posted by Mark M. Baker at 07:13 PM




 