
BASH - WordSplitting

The shell's parser performs several operations on your commands before finally executing them.

Understanding how your original command will be transformed by the shell is of paramount importance in writing robust scripts.

NOTE: Shell commands execute some program with a specific set of arguments (as well as setting up environment variables, opening file descriptors, etc.).

  • Word splitting is part of the process that determines what each of those arguments will be.
    • After word splitting and pathname expansion, every resulting word becomes an argument to the program that the shell executes.

Word splitting is performed on the results of almost all unquoted expansions.

  • The result of the expansion is broken into separate words based on the characters of the IFS variable.
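The word "almost" matters: some contexts, such as the right-hand side of a variable assignment, do not undergo word splitting at all. The following is a minimal sketch assuming the default IFS; count_args and the variable names are made up for illustration.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

var="This is a variable"

# Unquoted expansion used as a command argument: split into four words.
unquoted=$(count_args $var)

# Quoted expansion: stays a single word.
quoted=$(count_args "$var")

# Assignment: no word splitting happens here, even without quotes.
copy=$var

echo "unquoted=$unquoted quoted=$quoted copy=<$copy>"
```

Quoting the assignment anyway is harmless and keeps the habit consistent.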

From the bash man page:

The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion 
and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion. 
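The ordering has visible consequences. As a rough sketch (count_args and the variable names are illustrative, and the default IFS is assumed), a tilde stored in a variable is never expanded because tilde expansion has already run by the time the variable is substituted, while spaces in the value are still split because word splitting runs afterwards:

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Tilde expansion runs before parameter expansion, so a tilde that
# arrives through a variable stays a literal "~".
dir='~'
set -- $dir
tilde_result=$1

# Word splitting runs after parameter expansion, so the spaces inside
# the value produce three separate words.
words='a b c'
split_count=$(count_args $words)

echo "tilde_result=<$tilde_result> split_count=$split_count"
```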

For additional information on word splitting and argument handling in Bash, consider reading Arguments.

What is Word Splitting?

Let's write a little helper script that will show us the arguments as passed by the shell:

test.sh
#!/bin/sh -
printf "%d args:" "$#"
# With no arguments, skip this line; otherwise printf would print a stray " <>".
[ "$#" -eq 0 ] || printf " <%s>" "$@"
echo

Make it executable

chmod +x test.sh

Run it

./test.sh hello world "how are you?"

returns:

3 args: <hello> <world> <how are you?>

NOTE: The helper program above receives the argument list as constructed by the shell, and shows it to us.
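For quick experiments in an interactive shell, the same helper could also be written as a function. This is only a sketch; the name show_args is an assumption and does not appear elsewhere in these notes.

```shell
# Hypothetical function equivalent of test.sh.
show_args() {
    printf '%d args:' "$#"
    # Skip the second printf when there are no arguments; otherwise it
    # would print a misleading " <>".
    [ "$#" -eq 0 ] || printf ' <%s>' "$@"
    echo
}

show_args hello world "how are you?"
```

A function sees exactly the argument list the shell built for it, so it shows the same splitting behaviour as the external script.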


Test when IFS is not SET

If IFS is not set, word splitting is performed as if IFS contained a space, a tab, and a newline.

For example:

var="This is a variable"
./test.sh $var

returns:

4 args: <This> <is> <a> <variable>

Test when IFS is SET

An example using IFS:

log=/var/log/qmail/current IFS=/
./test.sh $log

returns:

5 args: <> <var> <log> <qmail> <current>

Unset IFS

unset IFS
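
Changing IFS for the whole shell and unsetting it afterwards is easy to forget. One alternative, sketched here on the assumption that the value contains no glob characters, is to confine the change to a subshell. The function name split_path is made up for illustration.

```shell
# Hypothetical helper. The body is wrapped in ( ... ), so it runs in a
# subshell and the IFS change never reaches the calling shell.
split_path() (
    IFS=/
    set -- $1          # unquoted on purpose: split the value on "/"
    printf '%d args:' "$#"
    printf ' <%s>' "$@"
    echo
)

log=/var/log/qmail/current
result=$(split_path "$log")
echo "$result"
```

After the call, IFS in the calling shell is whatever it was before, so no unset is needed.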

Test with Command Substitution

ls -l

returns:

total 2864
-rw-r--r-- 1 greg greg 2919154 2001-05-23 00:48 Yello - Oh Yeah.mp3

Now run against the test script:

./test.sh $(ls -l)

returns:

13 args: <total> <2864> <-rw-r--r--> <1> <greg> <greg> <2919154> <2001-05-23> <00:48> <Yello> <-> <Oh> <Yeah.mp3>
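
When the goal is to process filenames, one common approach is to let a glob produce them instead of parsing ls output. The sketch below builds its own scratch directory so it can run anywhere; it assumes mktemp -d is available, and the variable names are illustrative.

```shell
# Work in a scratch directory so the example is self-contained.
tmpdir=$(mktemp -d)
touch "$tmpdir/Yello - Oh Yeah.mp3"

count=0
last=
# The glob expands to one word per file. Results of pathname expansion
# are not split again, so the spaces in the name survive.
for f in "$tmpdir"/*.mp3; do
    [ -e "$f" ] || continue    # the glob matched nothing
    count=$((count + 1))
    last=${f##*/}              # strip the directory part
done

rm -rf "$tmpdir"
echo "$count file(s), last: <$last>"
```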

Controlling Word Splitting

As you can see above, we usually do not want to let word splitting occur when filenames are involved.

Double quoting an expansion suppresses word splitting, except in the special cases of "$@" and "${array[@]}":

var="This is a variable"; ./test.sh "$var"

returns:

1 args: <This is a variable>

array=(testing, testing, "1 2 3"); ./test.sh "${array[@]}"

returns:

3 args: <testing,> <testing,> <1 2 3>

NOTE: "$@" causes each positional parameter to be expanded to a separate word; its array equivalent likewise causes each element of the array to be expanded to a separate word.
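
To make the difference concrete, here is a small sketch comparing "$@" with "$*" and with an unquoted $@. It assumes the default IFS; count and demo are hypothetical names.

```shell
# Hypothetical helper: prints how many arguments it received.
count() { echo "$#"; }

demo() {
    quoted_at=$(count "$@")     # each parameter stays one word
    quoted_star=$(count "$*")   # all parameters joined into a single word
    unquoted_at=$(count $@)     # every parameter is split again
}

demo "1 2 3" testing
echo "quoted_at=$quoted_at quoted_star=$quoted_star unquoted_at=$unquoted_at"
```

Only the "$@" form passes the two original parameters through unchanged.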

There are very complicated rules involving whitespace characters in IFS. Quoting the man page again:

  • If IFS is unset, or its value is exactly <space><tab><newline>, the default, then any sequence of IFS characters serves to delimit words.
  • If IFS has a value other than the default, then sequences of the whitespace characters space and tab are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character).
  • Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field.
  • A sequence of IFS whitespace characters is also treated as a delimiter.
  • If the value of IFS is null, no word splitting occurs.
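
Two of these rules are easy to check directly. The sketch below uses a hypothetical count helper and made-up variable names; each IFS change lives inside a command substitution, so it does not leak into the surrounding shell.

```shell
# Hypothetical helper: prints how many arguments it received.
count() { echo "$#"; }

value='one , two'

# IFS is space plus comma: the comma together with the spaces around it
# forms a single delimiter, so there are two fields and no empty one.
mixed=$(IFS=' ,'; count $value)

# IFS is null: no word splitting occurs, so the value stays one word.
unsplit=$(IFS=; count $value)

echo "mixed=$mixed unsplit=$unsplit"
```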

We won't explore those rules in depth here, except to note the part about sequences of non-whitespace characters.


IFS with non-whitespace characters

If IFS contains non-whitespace characters, then empty words can be generated:

getent passwd sshd

returns:

sshd:x:100:65534::/var/run/sshd:/usr/sbin/nologin

Set IFS

IFS=:; ./test.sh $(getent passwd sshd)

returns:

7 args: <sshd> <x> <100> <65534> <> </var/run/sshd> </usr/sbin/nologin>

Unset IFS

unset IFS
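
When the fields are wanted in named variables, an alternative worth knowing is to put the IFS assignment in front of read, which limits it to that one command. This is a sketch using a literal copy of the line above rather than a live getent call; the variable names are assumptions.

```shell
line='sshd:x:100:65534::/var/run/sshd:/usr/sbin/nologin'

# The IFS=: prefix is in effect only while read runs, and read does not
# perform pathname expansion on the fields it assigns.
IFS=: read -r user pass uid gid gecos home login_shell <<EOF
$line
EOF

echo "user=<$user> gecos=<$gecos> home=<$home>"
```

The empty fifth field ends up as an empty gecos variable, and IFS in the shell itself is left untouched.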

NOTE: There was another empty word generated in one of our previous examples, where IFS was set to /.

  • Non-whitespace IFS characters are not ignored at the beginning and end of expansions, the way whitespace IFS characters are.
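
The edge cases can be checked with a short sketch (count_args and the sample values are illustrative). Note one subtlety: a delimiter terminates a field, so a single trailing delimiter does not add an empty word, while a doubled one does.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

lead=':a:b'
trail='a:b:'
double='a:b::'

leading=$(IFS=:; count_args $lead)      # <> <a> <b>
trailing=$(IFS=:; count_args $trail)    # <a> <b>
doubled=$(IFS=:; count_args $double)    # <a> <b> <>

echo "leading=$leading trailing=$trailing doubled=$doubled"
```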

Whitespace IFS characters get consolidated.

  • Multiple spaces in a row, for example, have the same effect as a single space, when IFS contains a space (or is not set at all).
  • Newlines also count as whitespace for this purpose, which has important ramifications when attempting to load an array with lines of input.
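
The ramification for line-oriented input is that empty lines vanish when a value is split on newlines. The sketch below contrasts that with a while read loop, which keeps them; the variable names are made up for illustration.

```shell
text='first line

third line'

nl='
'

# Splitting on newline: consecutive newlines collapse into one
# delimiter, so the empty second line is lost.
split_count=$(IFS=$nl; set -f; set -- $text; echo "$#")

# Reading line by line: every line is seen, including the empty one.
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done <<EOF
$text
EOF

echo "split_count=$split_count read_count=$read_count"
```

In bash 4 and later, the mapfile builtin with the -t option is another way to load lines into an array without losing empty ones.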

Pathname expansion happens after word splitting, and can produce some very shocking results:

getent passwd qmaild

returns:

qmaild:*:994:998::/var/qmail:/sbin/nologin

Set IFS

IFS=:; ./test.sh $(getent passwd qmaild)

returns:

737 args: <qmaild> <00INDEX.lsof> <03> <037_ftpd.patch> ...

Unset IFS

unset IFS

The * word, produced by the shell's word splitting, was then expanded as a glob, resulting in several hundred new and exciting words.

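A glob stored in a variable behaves the same way. Each word produced by splitting is expanded separately, and by default a pattern that matches nothing is left as-is:

files='*.mp3 *.ogg'
./test.sh $files

returns:

2 args: <Yello - Oh Yeah.mp3> <*.ogg>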

NOTE: Pathname expansion can be disabled with set -f or set -o noglob, though this can lead to surprising and confusing code.
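
One way to keep set -f from affecting the rest of a script is to apply it inside a subshell together with the IFS change. The following is a sketch under that assumption, using a literal copy of the qmaild line; split_fields is a hypothetical name.

```shell
line='qmaild:*:994:998::/var/qmail:/sbin/nologin'

# Hypothetical helper; the ( ... ) body keeps both changes local.
split_fields() (
    IFS=:
    set -f             # disable pathname expansion in this subshell only
    set -- $1          # unquoted on purpose: split the value on ":"
    printf '%d args:' "$#"
    printf ' <%s>' "$@"
    echo
)

result=$(split_fields "$line")
echo "$result"
```

With globbing off, the * field comes through as a literal asterisk, whatever files exist in the current directory.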


Notes