--- bash:files:read_a_file:read_fields_from_a_file [2021/01/26 13:22] – created peter
+++ bash:files:read_a_file:read_fields_from_a_file [2021/01/26 13:44] (current) – [Field splitting, white-space trimming, and other input processing] peter
====== BASH - Files - Read a file - Read fields from a file ======
To read individual fields within each line of a file, pass additional variable names to **read**:
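----
===== Fields are separated with white-space (space or tab characters only) =====
Suppose an input file has three columns separated by white-space (space or tab characters only). With the default **IFS**, **read** splits each line on runs of spaces and tabs, so each column lands in its own variable: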
<code bash>
while read -r first_name last_name phone; do
  # Only print the last name (second column).
  printf '%s\n' "$last_name"
done < "$file"
</code>
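The following is a self-contained sketch of the loop above that uses made-up sample data. The input comes from a here-string instead of **$file** and mixes spaces and tabs between the columns. The last names are collected into an array (**last_names**, a name chosen for illustration) instead of being printed inside the loop:
<code bash>
# Hypothetical sample data: first name, last name, phone,
# separated by a mix of spaces and tabs.
sample=$'Bob   Smith\t555-0100\nAlice\tJones 555-0199'

last_names=()
while read -r first_name last_name phone; do
  # Keep only the last name (second column).
  last_names+=("$last_name")
done <<< "$sample"

# Prints "Smith" and "Jones", one per line.
printf '%s\n' "${last_names[@]}"
</code>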
----
===== Fields are NOT separated with white-space =====
If the field delimiters are not white-space, set **IFS** (the Internal Field Separator) to the delimiter for the **read** command:
<code bash>
# Extract the username and its shell from /etc/passwd:
while IFS=: read -r user pass uid gid gecos home shell; do
  printf '%s: %s\n' "$user" "$shell"
done < /etc/passwd
</code>
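<WRAP info>
**NOTE:**  **IFS** is set to a colon, **:**, because the fields in the passwd file are separated by colons. Prefixing **read** with the assignment changes **IFS** for that command only; the shell's own **IFS** is left unchanged.
</WRAP>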
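To try the colon-separated case without depending on the contents of the local **/etc/passwd**, this sketch parses a single made-up line in passwd format from a here-string. It also saves **IFS** beforehand (in **saved_ifs**, an illustrative name) to show that the prefixed assignment does not outlive the **read** command:
<code bash>
# A hypothetical account line in /etc/passwd format.
line='alice:x:1000:1000:Alice Example,,,:/home/alice:/bin/bash'

saved_ifs=$IFS
# IFS=: applies only to this read command.
IFS=: read -r user pass uid gid gecos home shell <<< "$line"

printf '%s: %s\n' "$user" "$shell"   # alice: /bin/bash
# Spaces are not delimiters here, so the GECOS field stays whole.
printf 'GECOS: %s\n' "$gecos"        # GECOS: Alice Example,,,

# The shell's own IFS is unchanged afterwards.
[ "$IFS" = "$saved_ifs" ] && echo 'IFS unchanged'
</code>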
----
===== Tab-delimited files =====
<WRAP info>
**NOTE:**  For tab-delimited files, use **IFS=$'\t'**.
<WRAP important>
**WARNING:**  Tab is a white-space character, so multiple consecutive tab characters in the input are treated as a single delimiter. Empty fields are therefore lost, and the fields after them shift into earlier variables. The **IFS=$'\t\t'** workaround used in ksh93 and zsh does not work in Bash.
</WRAP>
</WRAP>
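The warning above can be reproduced with a single made-up row whose middle column is empty. This sketch only demonstrates the pitfall. It is not a workaround:
<code bash>
# Hypothetical row with three tab-separated columns: "x", "" (empty), "z".
row=$'x\t\tz'

IFS=$'\t' read -r col1 col2 col3 <<< "$row"

# The two tabs collapse into one delimiter, so the empty middle field
# disappears and "z" shifts into col2.
printf 'col1=[%s] col2=[%s] col3=[%s]\n' "$col1" "$col2" "$col3"
# col1=[x] col2=[z] col3=[]
</code>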
----
===== Not knowing how many fields a line contains =====
You do not need to know in advance how many fields each line of input contains:
  * If you supply more variables than there are fields, the extra variables are set to empty.
  * If you supply fewer, the last variable receives the rest of the line (the remaining fields together with the delimiters between them) after the preceding variables have been filled.
For example:
<code bash>
read -r first last junk <<< 'Bob Smith 123 Main Street Saint Helier Jersey'
</code>
<WRAP info>
**NOTE:**
  * **first** contains "Bob".
  * **last** contains "Smith".
  * **junk** contains everything else: "123 Main Street Saint Helier Jersey".
</WRAP>
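Both rules can be checked directly with here-strings. The first input string is arbitrary sample data. The second **read** is the example from this section:
<code bash>
# More variables than fields: the extras are set to the empty string.
read -r a b c d <<< 'one two'
printf 'a=[%s] b=[%s] c=[%s] d=[%s]\n' "$a" "$b" "$c" "$d"
# a=[one] b=[two] c=[] d=[]

# Fewer variables than fields: the last one takes the rest of the line.
read -r first last junk <<< 'Bob Smith 123 Main Street Saint Helier Jersey'
printf 'first=[%s] last=[%s] junk=[%s]\n' "$first" "$last" "$junk"
# first=[Bob] last=[Smith] junk=[123 Main Street Saint Helier Jersey]
</code>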
----
===== Throwaway variable =====
<code bash>
read -r _ _ first middle last _ <<< "$record"
</code>
<WRAP info>
**NOTE:**  The throwaway variable **_** can be used as a "junk variable" to ignore fields.
  * Like any other variable, it can be used more than once in a single **read** command when its contents do not matter.
  * The first two fields are skipped.
  * The next three fields are read into **first**, **middle**, and **last**.
  * The final **_** absorbs any remaining fields on the line.
    * It does not need to be repeated for each of them.
<WRAP important>
**WARNING:**  This usage of **_** is only guaranteed to work in Bash.
  * Many other shells use **_** for other purposes; at best this will not have the desired effect, and at worst it can break the script entirely.
  * It is better to choose a unique variable name that is not used elsewhere in the script, even though **_** is a common Bash convention.
</WRAP>
</WRAP>
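This sketch uses a made-up **$record** value. It has two leading fields to skip, three wanted fields, and trailing fields to discard. The second **read** follows the warning and uses a dedicated variable instead of **_**. That variable is named **ignored**, an arbitrary choice for illustration:
<code bash>
# Hypothetical record: date, ID, first/middle/last name, then extra fields.
record='2021-01-26 ID042 John Quincy Public ext 1234'

# Bash idiom: "_" swallows the fields we do not care about.
read -r _ _ first middle last _ <<< "$record"
printf '%s %s %s\n' "$first" "$middle" "$last"
# John Quincy Public

# Same split with a dedicated throwaway variable instead of "_".
# The last "ignored" ends up holding the rest of the line: "ext 1234".
read -r ignored ignored first middle last ignored <<< "$record"
printf '%s %s %s\n' "$first" "$middle" "$last"
# John Quincy Public
</code>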
----