Tuesday, April 5, 2011

What characters should be restricted from a Unix file name?

Consider a Save As dialog with a text field where the user types a file name, then clicks a Save button. The software then validates the file name and saves the file if the name is valid.

On a Unix file system, what rules should be applied in the validation such that:

  • The name will not be difficult to manipulate later in terms of escaping special characters, etc.
  • The rules are not so restrictive that saving a file becomes non-user-friendly.

So basically, what is the minimum set of characters that should be restricted from a Unix file name?

From Stack Overflow
  • Firstly, what you're describing is blacklisting. A better option is to whitelist characters, since it is easier (from a user's perspective) to have permitted characters added later than to have them taken away.

    In terms of what would be good in a unix environment:

  • a-z
  • A-Z
  • 0-9
  • underscore (_)
  • dash (-)
  • period (.)

    That should cover the basics. Spaces can be okay, but they make things difficult. Windows users love them; Unix/Linux users don't. So choose according to your target audience.
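The whitelist above could be enforced with something like the following. This is only a minimal sketch: the function name `is_whitelisted_name` is made up for illustration, and rejecting "." and ".." is an extra assumption (they pass the character test but are directory entries, not usable file names).

```python
import re

# Whitelist from the answer: a-z, A-Z, 0-9, underscore, dash, period.
_ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")

def is_whitelisted_name(name: str) -> bool:
    """Return True if every character of name is on the whitelist."""
    # Assumption: "." and ".." are refused even though their characters are allowed.
    if name in (".", ".."):
        return False
    # fullmatch() requires the whole string to match, so an empty name
    # or a name with a trailing newline is rejected as well.
    return _ALLOWED.fullmatch(name) is not None
```

With this check, "report_2011-04.txt" is accepted and "my file.txt" is refused because of the space.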

    Pim Jager : Then what characters should be whitelisted?
    workmad3 : The characters given so far sound good. A hyphen and a period would be good additions as well.
    Gavin Miller : @workmad3 - good suggestions; I made the answer wiki, feel free to add.
    Jonathan Leffler : Newlines are a nuisance. Commas are pretty harmless. Colons do no damage on Unix, but are problematic if the name is copied to Windows - or if the 'file' is a directory that might need to be added to PATH.
    Jonathan Leffler : There is some room to argue that any characters classified as 'isalpha()' in the current locale are OK - that allows people to use accented characters in the names. It complicates the story, though.
    hop : I for one will regard anything that prohibits accented characters as user-unfriendly
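One way to accommodate the accented-character comments is to widen the whitelist to any alphanumeric character. The sketch below is an assumption-laden substitute for the locale-based isalpha() idea: Python's `str.isalnum()` follows Unicode character properties rather than the current locale, and the name `is_friendly_name` is hypothetical.

```python
# Punctuation kept from the original whitelist: underscore, dash, period.
EXTRA_ALLOWED = set("_-.")

def is_friendly_name(name: str) -> bool:
    """Accept Unicode letters and digits plus underscore, dash and period."""
    # Assumption: empty names and the "." / ".." directory entries are refused.
    if name in ("", ".", ".."):
        return False
    return all(ch.isalnum() or ch in EXTRA_ALLOWED for ch in name)
```

Note that a name written with combining accents (decomposed form) would be refused by this check, since combining marks are not alphanumeric; normalizing the input first would be one option.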
  • The minimum is slash ('/') and NUL ('\0').

    workmad3 : The minimum is /, ; and | to avoid the user running arbitrary commands (assuming it's not escaped :))
    Andrew Medico : This. No characters besides '/' should be disallowed.
    Jonathan Leffler : And ASCII NUL '\0' since that marks the end of the file name :D
    Jonathan Leffler : This is the rigorous answer. The application should be coded to assume that the user was this unconstrained (so when opening files, it should accept any name). It isn't such a good answer for saving (new) files; it is reasonable to put some limits on the file names.
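For the "open anything" side of that comment, the strict minimum might look like the sketch below. The name `is_legal_unix_name` is made up, and refusing the empty string, "." and ".." is an added assumption on top of the two characters named in the answer.

```python
def is_legal_unix_name(name: str) -> bool:
    """Check only the two characters a Unix file name cannot contain."""
    # Assumption: "", "." and ".." are not usable as new file names.
    if name in ("", ".", ".."):
        return False
    # '/' separates path components and NUL terminates the name.
    return "/" not in name and "\0" not in name
```

Under this rule a name such as "blah;rm -rf" is legal, which is exactly why the comments argue for a stricter rule when saving new files.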
  • Let the user enter whatever name he wants. Artificially restricting the range of characters will only annoy the users and serve no real purpose.

    workmad3 : sounds good... I'll enter a file called 'blah;rm -rf /' ;)
    Gavin Miller : +1 for the comment!
    Jonathan Leffler : Or, better: '$(rm -fr $HOME)' (minus the single quotes) as the file name? That will wreak havoc sooner rather than later. Backticks and $(...) are particularly pernicious as they 'work' when the file name is quoted, unlike most of the other special characters. Embedded quotes are tricky, too.
    Bombe : Those are all non-issues when saving the filename. fopen() doesn’t care about your filenames. When using a graphical shell (e.g. konqueror) it doesn’t care about your filenames. When you use auto-completion in the shell it doesn’t care about your filenames. So what are your points? :)
    le dorfier : @Bombe, what one user might want in many cases will alienate other users, regardless of the havoc it plays with your UI development process. Bad idea.
    Bombe : That’s my point: choosing strange names will not wreak havoc with anything—unless your “anything” is badly written. None of the standard tools of UNIX is badly written. Again: what’s your point?
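The disagreement above comes down to whether a shell ever parses the name. A rough sketch of the two safe routes, with hypothetical helper names (`echo_without_shell`, `shell_command_for`) and `ls -l --` chosen only as an example command:

```python
import shlex
import subprocess
import sys

def echo_without_shell(name: str) -> str:
    """Pass name as a single argv entry; no shell ever interprets it."""
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.argv[1])", name]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout

def shell_command_for(name: str) -> str:
    """Build a shell command line in which name is quoted as one word."""
    # shlex.quote() wraps the name in single quotes, so $(...) and
    # backticks are not expanded by a POSIX shell.
    return "ls -l -- " + shlex.quote(name)
```

A name like '$(rm -fr $HOME)' only becomes dangerous when it is pasted unquoted into a shell command string; an argument list or `shlex.quote()` avoids that, which is roughly Bombe's point about well-written tools.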
  • Do not forget the dot (.) so that you can hide files and folders... Otherwise, I'd follow the UN*X naming convention (from Wikipedia):

    Most UNIX file systems

    • Case handling: case-sensitive, case-preserving
    • Allowed character set: any
    • Reserved characters: / and NUL
    • Max length: 255
    • Notes: A leading . indicates that ls and file managers will not by default show the file

    Link to wikipedia article about file names
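The length limit and the leading-dot rule from that table could be checked as sketched below. Two assumptions: the 255 limit is counted in bytes (as it is on common Linux file systems such as ext4), and names are stored as UTF-8. The helper names are hypothetical.

```python
MAX_NAME_BYTES = 255  # limit from the table; assumed to be bytes, not characters

def fits_name_limit(name: str) -> bool:
    """Return True if the UTF-8 encoded name fits within the limit."""
    # Assumption: the file system stores names as UTF-8.
    return len(name.encode("utf-8")) <= MAX_NAME_BYTES

def is_hidden(name: str) -> bool:
    """Return True if ls and file managers would hide the file by default."""
    return name.startswith(".")
```

The byte count matters for accented names: 128 copies of "é" are only 128 characters but 256 bytes in UTF-8, so they would not fit.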

  • Often forgotten: the colon (:) is not a good idea, since it's commonly used in stuff like $PATH, i.e. the list of directories where executables are found "automatically". This can cause confusion with DOS/Windows directory names, where of course the colon is used in drive names.
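To illustrate the $PATH problem, here is a small sketch that joins and splits a PATH-style value the way a shell would. The directory names are invented for the example.

```python
def join_path_var(dirs):
    """Join directories into a PATH-style string, as a shell would expect."""
    return ":".join(dirs)

def split_path_var(value):
    """Split a PATH-style string back into directories."""
    return value.split(":")

# Hypothetical directories; the second one contains a colon.
dirs = ["/usr/bin", "/opt/my:tools/bin"]
round_trip = split_path_var(join_path_var(dirs))
# round_trip is ['/usr/bin', '/opt/my', 'tools/bin']: the colon inside
# the directory name is indistinguishable from the separator.
```

There is no escaping mechanism for the colon in $PATH, so a directory with a colon in its name simply cannot be listed there.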

  • Please don't use spaces! They work, but can be a big pain to work with on the command line.
