What characters are allowed in a filename

The file system that an operating system uses determines how it handles files and which file naming conventions apply. Most file systems follow the same top-level convention for individual filenames: a base file name and an optional extension, separated by a period. The extension normally identifies the type of content in the file and, on Windows operating systems, associates the file with an application program that can process that content.
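As a small illustration of the base name and extension convention, the sketch below splits a filename with Python's standard `os.path.splitext`. The helper name `split_name` is hypothetical and introduced only for this example.

```python
import os.path


def split_name(filename):
    """Split a bare filename into (base, extension); the extension has no dot."""
    base, ext = os.path.splitext(filename)
    # splitext keeps the leading period on the extension, so drop it here.
    return base, ext[1:]


print(split_name("test.doc"))        # ('test', 'doc')
print(split_name("archive.tar.gz"))  # ('archive.tar', 'gz')
print(split_name("README"))          # ('README', '')
```

Only the last period counts as the separator, which is why `archive.tar.gz` yields the extension `gz`.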

The characters allowed in a filename depend heavily on the operating system (OS) and on the underlying file system:

  • Windows-based operating systems such as Windows XP prohibit specific characters from appearing in filenames. The < > : " / \ | ? * characters are forbidden, as are the ASCII control characters. The older 8.3 naming scheme used by MS-DOS on FAT was stricter and also rejected characters such as [ ] ; = and the comma. The forbidden characters have explicit significance in Windows, and as such cannot be allowed in filenames: for instance, the backslash \ separates the components of a path (e.g. C:\Documents\), the colon follows a drive letter, and * and ? act as wildcards. The period is allowed and conventionally separates the base file name from the extension (e.g. test.doc), but a filename cannot end with a period or a space. Whenever the user tries to type a forbidden character in a filename, Windows XP displays an error message listing the characters that cannot be used. Certain reserved device names are also prohibited as filenames, e.g. CON, PRN, AUX, NUL, COM1 through COM9 and LPT1 through LPT9, since these refer to computer devices or ports: COM1 refers to the first PC serial port and LPT1 to the first PC parallel port. Allowing these as filenames would create ambiguity for the OS.
  • Unix-like operating systems (such as Unix, Linux and Solaris) take a different approach and prohibit only the null character and the path separator / from appearing in filenames. Because so few characters are forbidden, a filename can contain characters that Unix shells and scripting languages like Perl treat specially, such as spaces, < and >. When such characters are used, they must be quoted or escaped, i.e. enclosed in quotes or preceded by an escape character (typically the backslash), to identify their use and avoid ambiguity.
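The two sets of rules can be sketched as a pair of validation functions. This is a minimal illustration, not a complete validator: it assumes the reserved character set documented for the Win32 API (< > : " / \ | ? * plus control characters) and the device names CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9. The function names are hypothetical. The last lines use the standard `shlex.quote` to show how a name that is legal on Unix can be quoted for a shell.

```python
import re
import shlex

# Assumed Win32 rule set: reserved punctuation plus ASCII control characters 0-31.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Assumed reserved device names; they are matched case-insensitively.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_valid_windows_name(name):
    """Check a single filename (not a path) against the assumed Windows rules."""
    if not name or WINDOWS_FORBIDDEN.search(name):
        return False
    if name[-1] in " .":  # a trailing space or period is not allowed
        return False
    # A device name stays reserved even when an extension follows (e.g. CON.txt).
    stem = name.split(".", 1)[0].upper()
    return stem not in WINDOWS_RESERVED


def is_valid_unix_name(name):
    """Only the null character and the path separator are forbidden."""
    return bool(name) and "/" not in name and "\0" not in name


print(is_valid_windows_name("test.doc"))   # True
print(is_valid_windows_name("a:b.txt"))    # False
print(is_valid_windows_name("com1.txt"))   # False
print(is_valid_unix_name("a:b*?.txt"))     # True
print(shlex.quote("my file <1>.txt"))      # 'my file <1>.txt'
```

The final call returns the name wrapped in single quotes, which stops the shell from splitting it at the space or treating < and > as redirections.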

It is also worth looking at how different operating systems handle lower and upper case characters in filenames, and at the maximum number of characters allowed per file name and per absolute path:

  • Microsoft Windows operating systems using the FAT or NTFS file systems treat upper and lower case characters in filenames as equivalent, so the file system is said to be case insensitive. However, these operating systems are also said to be case preserving: case information is not discarded, so when a file’s name is viewed (for instance in Windows Explorer), it is presented in the capitalization used when the file was created.
  • For Unix-like operating systems using the ext2 or ext3 file systems, the file system is case sensitive: upper case and lower case characters in filenames are considered different, so README and readme name two different files.
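The difference between case insensitive but case preserving and case sensitive can be modelled with a toy in-memory directory. This is only a sketch: the class name `CaseInsensitiveDirectory` is hypothetical, and `str.casefold` merely approximates the case mapping a real file system applies.

```python
class CaseInsensitiveDirectory:
    """Toy model of a case-insensitive, case-preserving directory."""

    def __init__(self):
        # Maps the folded name to the name exactly as it was first written.
        self._entries = {}

    def create(self, name):
        key = name.casefold()
        if key in self._entries:
            # Names differing only in case collide, as they do on FAT or NTFS.
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name):
        # Any capitalization finds the file; the stored spelling is returned.
        return self._entries.get(name.casefold())


d = CaseInsensitiveDirectory()
d.create("ReadMe.TXT")
print(d.lookup("readme.txt"))  # ReadMe.TXT
```

A case-sensitive directory, by contrast, would use the name itself as the key, so `ReadMe.TXT` and `readme.txt` could exist side by side.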

The older MS-DOS and Win16 operating systems using FAT were limited to a name of at most 8 characters and an extension of at most 3 characters. The newer NTFS file system used by Win32 and Win64 operating systems like Windows XP and Vista supports filenames up to 255 characters long and paths up to about 32,767 characters long. By default, however, the Windows API limits a full path to 260 characters (MAX_PATH) including the terminating null character, which leaves a maximum of 259 usable characters.
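The length limits above can be expressed as two simple checks. This is a sketch under the figures quoted in the text (8.3 names, 255 characters per name, 259 usable characters per path); the function and constant names are hypothetical, and the 8.3 check looks only at lengths, not at which characters are allowed.

```python
MAX_NAME_LENGTH = 255  # longest single file or directory name
MAX_PATH_USABLE = 259  # default path limit, excluding the terminating null


def fits_8_3(filename):
    """True if the name fits the MS-DOS scheme: base of 1-8, extension of 0-3."""
    base, _, ext = filename.partition(".")
    # A second period is not possible in an 8.3 name.
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in ext


def within_default_windows_limits(path):
    """True if the whole path and each of its components are short enough."""
    components = path.split("\\")
    return len(path) <= MAX_PATH_USABLE and all(
        len(c) <= MAX_NAME_LENGTH for c in components
    )


print(fits_8_3("AUTOEXEC.BAT"))                                  # True
print(fits_8_3("longfilename.txt"))                              # False
print(within_default_windows_limits("C:\\Documents\\test.doc"))  # True
```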
