Escaping strings for use at any command line

Okay, I have finally sussed this problem on both Windows and Linux.

The following code is written in Perl but it can be quite easily adapted to work for pretty much any programming language.

Procedure for escaping an arbitrary argument for use at a command line

sub escape_arg {
	my $arg = shift;

	# Windows cmd.exe:
	if($^O eq "MSWin32") {

		# Sequence of backslashes followed by a double quote:
		# double up all the backslashes and escape the double quote
		$arg =~ s/(\\*)"/$1$1\\"/g;
		
		# Sequence of backslashes followed by the end of the string
		# (which will become a double quote later):
		# double up all the backslashes
		# (\z anchors at the very end, even if the string ends in a newline)
		$arg =~ s/(\\*)\z/$1$1/;

		# All other backslashes occur literally

		# Quote the whole thing:
		$arg = "\"".$arg."\"";

		# Escape shell metacharacters:
		$arg =~ s/([()%!^"<>&|;, ])/\^$1/g;
	}

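	# Unix shells:
	else {
		# An empty argument has to be quoted, or it vanishes entirely:
		return "''" if $arg eq "";

		# Backslash-escape any hairy characters (newlines are handled below):
		$arg =~ s/([^a-zA-Z0-9_\n])/\\$1/g;

		# Backslash-newline is a line continuation in the shell,
		# so newlines get single-quoted instead:
		$arg =~ s/\n/'\n'/g;
	}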

	return $arg;
}
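
To see what the Windows branch actually produces without needing a Windows machine, here is a minimal sketch that follows the same steps with the $^O check removed. The name escape_arg_cmd is made up for this illustration and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: the cmd.exe steps of escape_arg without the
# platform check, so the output can be inspected on any system.
sub escape_arg_cmd {
	my $arg = shift;

	# Double the backslashes before each double quote, then escape the quote
	$arg =~ s/(\\*)"/$1$1\\"/g;

	# Double any trailing backslashes (they will sit before the closing quote)
	$arg =~ s/(\\*)\z/$1$1/;

	# Quote the whole thing
	$arg = "\"" . $arg . "\"";

	# Caret-escape cmd.exe metacharacters, including the quotes just added
	$arg =~ s/([()%!^"<>&|;, ])/\^$1/g;

	return $arg;
}

# A few of the test strings from this page
for my $test ('yes', 'argument 1', 'C:\\Program Files\\', 'Hello"world') {
	printf "%-20s => %s\n", $test, escape_arg_cmd($test);
}
# yes                  => ^"yes^"
# argument 1           => ^"argument^ 1^"
# C:\Program Files\    => ^"C:\Program^ Files\\^"
# Hello"world          => ^"Hello\^"world^"
```

Note that the surrounding double quotes end up caret-escaped too, so cmd.exe never sees a quoted region. That is why every metacharacter inside has to carry its own caret.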

Procedure for escaping the name of an arbitrary program for use at a command line

That is, the 0th argument of the call. On Windows, this needs different treatment from the actual arguments.

sub escape_prog {
	my $prog = shift;

	# Windows cmd.exe: needs special treatment
	if($^O eq "MSWin32") {
		# Escape shell metacharacters
		$prog =~ s/([()%!^"<>&|;, ])/\^$1/g;
	}
	
	# Unix shells: same procedure as for arguments
	else {
		$prog = escape_arg($prog);
	}

	return $prog;
}

Procedure for escaping an arbitrary command

The command is supplied as a program name followed by a series of arguments for that program. Returns a single string, ready to hand to the shell.

sub escape_cmd {
	die "No call supplied\n" unless scalar @_ > 0;

	my @escaped = ();

	push @escaped, escape_prog($_[0]);
	push @escaped, map { escape_arg($_) } @_[ 1 .. $#_ ];

	return join " ", @escaped;
}

Tests

These subroutines worked on my Windows machine and the Linux machine which hosts this site. If you find faults or want to suggest some more test strings, be my guest.

The complete list of strings I used for unit tests is:

yes
no
child.exe
argument 1
Hello, world
Hello"world
\some\path with\spaces
C:\Program Files\
she said, "you had me at hello"
arg;,ument"2
\some\directory with\spaces\
"
\
\\
\\\
\\\\
\\\\\
"\
"\T
"\\T
!1
!A
"!\/'"
"Jeff's!"
$PATH
%PATH%
&
<>|&^
()%!^"<>&|
>\\.\nul
malicious argument"&whoami
*@$$A$@#?-_
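
For anyone who wants to rerun strings like these, here is a rough sketch of a round-trip check for the Unix side only. It is not the test harness I used: it assumes a POSIX sh and a printf command are available, and the names escape_arg_unix and round_trip are invented for the example.

```perl
use strict;
use warnings;

# Unix branch of escape_arg, repeated so that this sketch stands alone
sub escape_arg_unix {
	my $arg = shift;
	$arg =~ s/([^a-zA-Z0-9_])/\\$1/g;
	return $arg;
}

# Hypothetical helper: hand one escaped argument to the shell and
# return exactly what printf received
sub round_trip {
	my $arg = shift;
	my $cmd = "printf %s " . escape_arg_unix($arg);
	my $out = `$cmd`;
	return defined $out ? $out : '';
}

# A handful of the test strings, one per line, taken literally
my @tests = split /\n/, <<'END';
yes
argument 1
Hello"world
C:\Program Files\
she said, "you had me at hello"
"Jeff's!"
$PATH
%PATH%
()%!^"<>&|
malicious argument"&whoami
*@$$A$@#?-_
END

my $failures = 0;
for my $test (@tests) {
	my $got = round_trip($test);
	next if $got eq $test;
	$failures++;
	print "FAIL: [$test] came back as [$got]\n";
}
print $failures ? "$failures failure(s)\n" : "All strings survived the round trip\n";
```

A string passes if the child process receives it byte for byte, which is the only property that really matters here.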

Discussion (7)

2012-02-22 20:33:39 by qntm:

It seems that this procedure has a fault on Windows, when trying to invoke a program whose name has a space in it. I'm unable to figure out a workaround for this. In particular, I can't find any way to invoke a program named "foo %PATH%.exe" at the command line. Any ideas, anybody?

2012-10-23 00:11:27 by Phil:

Trivial: invoke "foo "%"PATH"%".exe"

2012-11-30 16:52:22 by Johan:

I've made a JavaScript version of it: http://jsbin.com/anitaz/11/ You can use it without installation of software.

2013-08-06 15:55:51 by RP:

I am trying to run "tf changeset /collection:tfsapp.dotcom.blabla.org/Misc 765 /noprompt" as a command and unfortunately the escaping doesn't work. It would be great if you wrote a post about what rules you are trying to implement for escaping characters.

2015-10-27 14:48:53 by Resuna:

Um. First, a general solution is not possible even in principle on Windows, because the command line is passed to the program as a single string, not a series of strings. This means that the parsing is not guaranteed to be handled by a component provided by Microsoft. It may be completely ad-hoc. If you need to do this on Windows, god help you.

Second, if you need to use this function in UNIX, I mean if you THINK you need to use this in UNIX, the first thing you need to do is take a step back and look at what you're doing. Because in UNIX, you should be using exec() to call programs. If you use exec(), you don't need to quote anything. So in most cases, you shouldn't need to use this. There are a few cases where you do (for example, pasting a file name into a terminal window) but mostly the solution is refactoring.
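
In Perl terms, the shell-free route described above is the list form of system, exec or open. A minimal sketch, assuming a Unix-like system with a printf command on the PATH (run_without_shell is a made-up helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: run a program with its arguments passed as a list,
# so no shell ever parses them, and return the program's standard output
sub run_without_shell {
	my ($prog, @args) = @_;

	# List-form pipe open: Perl forks and execs $prog directly
	open(my $fh, '-|', $prog, @args)
		or die "Cannot run $prog: $!\n";
	my $out = do { local $/; <$fh> };   # slurp everything the child prints
	close($fh);

	return defined $out ? $out : '';
}

my $nasty = 'malicious argument"&whoami';
my $out = run_without_shell('printf', '%s', $nasty);
print $out eq $nasty ? "argument arrived intact\n" : "argument was mangled\n";
```

No escaping function is involved at all, because the argument never passes through a shell.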

2019-09-20 17:59:34 by Artoria2e5:

Ran the tests on Windows 10.0.18362.10019. escape_prog does not have to be so complicated for cmd, mainly because the good ol' Win32 filename restrictions already rule out [<>:"/\\|?*]. The rest of the stuff, [()%!^&;, ], is simply silenced by surrounding it with double quotes. In other words, on Win32, just throw double quotes around the name and you are done.

The complexity of escape_arg is somewhat justified given the Windows processing. However, it is still incorrect: running an argv-printing toy compiled with MSVC on cmd does not quite return the expected result with "^(^)^%^!\^"^<^>^&^|^;^ a": it gives ^(^)^%^\^!^<>&|;\na. In other words, the quote is still treated as the end of a quoting-block and as a result only the rest needs quoting. There's also the very strange thing about "^ " becoming a newline.

We already know that we are dealing with a heterogeneous set of cmdline-to-argv parsers here, but the most common one is still MS's magical formula kinda documented at [0]. It has a tiny undocumented caveat that a pair of "", when found in quotes, means a single ". And guess what? "()%!""<>&|;, a" is taken up perfectly.

[0]: https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments

As a quick sidenote, yes, the unix shell escaping is too complicated as well. The quick path to escaping it is to replace all ' with '\'' and put the whole thing in a pair of single quotes. The POSIX shell single quote takes everything literally and does not process escapes. (MS PowerShell single quotes are similar, but a pair of single quotes may be used to denote a literal single quote in them.)

2019-09-21 08:04:13 by Mingye Wang:

Correction: escape_arg for CMD is, well, not incorrect. The confusion stems from the fact that it quotes the entire thing and then escapes the quotes, making cmd not recognize the fact that something has been quoted. That justifies all the escaping of all the metacharacters since they are outside quotes. Yes it is still overcomplicated and all but it ain't wrong. The newline thing is a mistake on my end; I was puts()-ting all the argvs and that was no good for telling the difference between argument boundaries and actual newlines.
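
The single-quote approach suggested in the comments above can be sketched in Perl as follows. This is an illustration only, assuming a POSIX shell and a printf command; escape_arg_sq is an invented name, not something from the article.

```perl
use strict;
use warnings;

# Hypothetical helper: POSIX-shell escaping via single quotes.
# Every ' becomes '\'' (close quote, escaped quote, reopen quote),
# then the whole string is wrapped in single quotes.
sub escape_arg_sq {
	my $arg = shift;
	$arg =~ s/'/'\\''/g;
	return "'" . $arg . "'";
}

# Round-trip a few awkward strings through the shell, including the
# empty string and an embedded newline
for my $test ('yes', '', "Jeff's!", '$PATH', "two\nlines") {
	my $escaped = escape_arg_sq($test);
	my $got = `printf %s $escaped`;
	print((defined $got && $got eq $test) ? "ok\n" : "FAIL\n");
}
```

Because nothing is special inside single quotes, this form also copes with the empty string and with newlines, which plain backslash-escaping of every character does not.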
