Unix commands are great for manipulating data and files, and they get even better when combined in shell pipelines. The following are a few of my go-tos – I’ll list each command with an example or two. While many of these commands can be used standalone, the examples assume the input is piped in, because that’s how you’d use them in a pipeline. Lastly, most of these commands are pretty simple, and that is by design – the Unix philosophy focuses on simple, modular programs that can be composed to perform more complex operations.
Note: if you’re using a Mac, the built-in tools shipped with macOS might behave a little differently from the most recent GNU versions. You can get more recent builds of these tools by running brew install coreutils. The typical usage is then ghead instead of head, gtail instead of tail, gpaste instead of paste, etc.
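For example, assuming a file somefile.txt, the GNU version of head is invoked as:
cat somefile.txt | ghead -n 5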
Here are the commands, each with a one-sentence description:
Please forgive the “useless use of cat”. I’m using cat to show the pipe-able versions of the piped-to commands.
Print the first 5 lines:
cat somefile.txt | head -n 5
Print the last 5 lines:
cat somefile.json | tail -n 5
Print all but the first line:
cat somefile.csv | tail -n +2
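The +2 means “start at line 2”. For example, with a hypothetical three-line input:
$ printf 'header\nrow1\nrow2\n' | tail -n +2
row1
row2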
Send all lines of output as arguments to echo:
cat somefile.xml | xargs echo
Send each line individually as an argument to echo:
cat somefile.yaml | xargs -n 1 echo
Send two arguments at a time to echo:
cat somefile | xargs -n 2 echo
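To make the difference between these xargs forms concrete, here’s a hypothetical three-line input:
$ printf 'a\nb\nc\n' | xargs echo
a b c
$ printf 'a\nb\nc\n' | xargs -n 1 echo
a
b
c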
Print only column two of each line (assumes whitespace between columns):
cat somefile | awk '{print $2}'
Print only column two of each line, using , as the separator:
cat somefile | awk -F',' '{print $2}'
or
cat somefile | cut -d',' -f 2
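Both forms produce the same result on simple comma-separated input, for example:
$ echo 'a,b,c' | awk -F',' '{print $2}'
b
$ echo 'a,b,c' | cut -d',' -f 2
b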
Replace the first occurrence of 1, 2, or 3 on each line with the letter ‘x’, using a regex:
cat somefile.txt | sed 's/[1-3]/x/'
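Note that s/[1-3]/x/ only replaces the first match on each line; add the g flag to replace every match:
cat somefile.txt | sed 's/[1-3]/x/g'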
Count the number of times each line appears in the file:
cat somefile.txt | sort | uniq -c
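A common follow-up is a reverse numeric sort, so the most frequent lines appear first:
cat somefile.txt | sort | uniq -c | sort -rn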
Write out an intermediate result to a file in the middle of a pipeline using tee:
cat somefile.txt | grep "mystring" | tee newfile.txt | wc -l
The above are a bunch of “tools” to file away for when you need them. I like to store them along with a few keywords or a sentence describing what each does for easy searching and recall.
Now, let’s consider the following example file myfile.txt:
uuid,name
72e925a8-58fb-11e9-87f4-fbe50933ad95,name1
237fd8a4-58fb-11e9-8bb7-bf5388556288,name2
91834624-58fb-11e9-8c01-2393beecfc80,name3
223c7438-58fc-11e9-a629-7707a76a95c5,name4
a0362ab0-58fb-11e9-8aed-1f7ca90ae318,name5
f3f153aa-58fb-11e9-a37c-3ffed25b518b,name6
09f6cdb0-58fc-11e9-9663-93ede25c7774,name7
21f4c2ec-58fc-11e9-baf6-7792114ee968,name8
We want to create files in the current folder using the names in column two, where the uuids in column one start with a “2”.
Break down the transformation into parts:
- remove the first line (the header)
- filter for lines starting with 2
- grab the second column using a , separator
- use the result to create the files with touch
tail -n +2 myfile.txt | grep "^2" | awk -F',' '{print $2}' | xargs touch
Confirm it worked:
$ ls name*
name2 name4 name8
As you add new tools to your toolbox, you can plug them into your shell one-liners to manipulate data streams.
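For instance, reusing myfile.txt from above, the same building blocks can count how many uuids start with each digit (cut -c1 grabs the first character of each line; the exact padding of the counts from uniq -c varies by platform):
$ tail -n +2 myfile.txt | cut -c1 | sort | uniq -c
   1 0
   3 2
   1 7
   1 9
   1 a
   1 f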
Some other useful commands for text processing worth looking into: