Piping Basics
In a UNIX environment, if you wanted to find all files last modified in April, you might do:
ls -l | grep Apr
The first part of this command, ls -l, returns a listing like:
-rw-r--r-- 1 karl us 2573 Jan 20 09:44 2011-1-10-Why-Markdown-And-Why-Not-Word.html
-rw-r--r-- 1 karl us 8119 Mar 30 16:52 2011-7-5-Rethink-your-Data-Model.html
-rw-r--r-- 1 karl us 3594 Mar 30 16:49 2012-02-03-Node-Require-and-Exports.html
-rw-r--r-- 1 karl us 2816 Apr 3 18:32 2012-4-2-Is-Kindle-The-Next-Rim.html
-rw-r--r-- 1 karl us 3504 Apr 4 19:04 2012-4-4-You-Really-Should-Log-Client-Side-Error.html
The second part, grep Apr, filters the lines passed to its standard input (think of calling Console.ReadLine from code), keeping only those that match the provided pattern (in this case Apr). The pipe operator | redirects the output of one command into the input of the next. Therefore, the above output from ls becomes the input for grep.
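To see the pipe at work without depending on what is in your current directory, here is a small sketch that feeds a fixed listing into grep. The three lines and their filenames are made up for illustration. Note that the last one is dated March but still matches, because grep looks at the whole line, filename included.

```shell
# A hypothetical three-line listing standing in for the output of ls -l.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 karl us 2573 Jan 20 09:44 notes.html' \
  '-rw-r--r-- 1 karl us 2816 Apr  3 18:32 kindle.html' \
  '-rw-r--r-- 1 karl us 3594 Mar 30 16:49 Apr-plans.html')

# The pipe hands printf's output to grep's standard input.
matches=$(printf '%s\n' "$listing" | grep Apr)
printf '%s\n' "$matches"

# Count the surviving lines (tr strips the padding some wc versions add).
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "matched: $match_count"
```

So grep Apr is a quick approximation of "modified in April", not an exact filter.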
To better understand this, let's look at something that won't work. Say you wanted to delete all markdown files. You might be tempted to try:
find . -iname "*.md" | rm
The first part does what we expect: it finds all files with a markdown extension. However, piping find's output to rm doesn't do anything other than make rm print an error or usage message. Why is that? Remember, a pipe redirects one program's output to another program's input. rm, however, does not read standard input; it takes the files to delete as command-line arguments. In C#, that's the difference between Console.ReadLine() and the args[] parameter.
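The same distinction can be shown in the shell itself. This is a rough sketch built around a hypothetical function, show_inputs, which reports how many command-line arguments it received and how many lines arrived on its standard input. It is not part of any real tool.

```shell
# Hypothetical helper: report what arrived as arguments vs. on standard input.
show_inputs() {
  echo "args: $#"
  count=0
  # Read standard input line by line until it is exhausted.
  while IFS= read -r _line; do
    count=$((count + 1))
  done
  echo "stdin lines: $count"
}

# Piped: the names arrive on standard input, and there are no arguments.
piped=$(printf '%s\n' a.md b.md | show_inputs)
echo "$piped"

# Passed: the names arrive as arguments, and standard input is empty.
passed=$(show_inputs a.md b.md < /dev/null)
echo "$passed"
```

rm only ever looks at the "args" side, which is why piping filenames to it accomplishes nothing.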
The solution to this problem is to use a utility that converts standard input into command-line arguments. This is what xargs does. Unfortunately, xargs can be quite different from platform to platform, but all we need right now is the simplest thing:
find . -iname "*.md" | xargs rm
If you've been following along, you can guess that xargs takes data from standard input (hence data can be piped to it) and converts it to command-line parameters for whatever program you specify (rm in this case).
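If you want to see what xargs is about to run before anything gets deleted, one common trick is to put echo in front of the command. The sketch below does that, then shows a more defensive variant for filenames containing spaces. It assumes your find and xargs support -print0 and -0, which the GNU and BSD versions do, and it works in a scratch directory with made-up filenames.

```shell
# Dry run: echo prints the command line that xargs builds instead of running it.
preview=$(printf '%s\n' a.md b.md | xargs echo rm)
echo "$preview"

# Scratch directory with hypothetical files, one of them with a space in its name.
workdir=$(mktemp -d)
touch "$workdir/plain.md" "$workdir/with space.md" "$workdir/keep.txt"

# Separate names with NUL bytes so the space is not treated as a delimiter.
find "$workdir" -iname "*.md" -print0 | xargs -0 rm

# Only the non-markdown file should be left.
remaining=$(ls "$workdir")
echo "$remaining"
rm -r "$workdir"
```

By default xargs splits its input on whitespace, so the simple version would hand rm "with" and "space.md" as two separate names.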
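How do you know if a program takes data from standard input vs the command-line? Well, if you look at the help message from grep, you'll see that it takes its input from [FILE], whereas rm takes it from file. The square brackets mark the argument as optional; when no file is given, grep reads standard input instead. It's a subtle difference, and the exact wording of these messages varies from platform to platform.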
To wrap it up, we can also look at the redirection operator >. Rather than sending standard output to another program's standard input like a pipe, the redirection operator sends standard output to a file, overwriting any previous contents (you can append using >>). If we wanted to save the list of markdown files (rather than delete them), we'd do:
find . -iname "*.md" > markdown.list
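The difference between > and >> is easy to check for yourself. Here is a minimal sketch that writes to a throwaway markdown.list in a temporary directory; the words being written are placeholders.

```shell
# Work in a scratch directory so no real file gets clobbered.
workdir=$(mktemp -d)
list="$workdir/markdown.list"

echo "first" > "$list"    # creates the file
echo "second" > "$list"   # overwrites it, so "first" is gone
echo "third" >> "$list"   # appends a new line after "second"

contents=$(cat "$list")
echo "$contents"

# Count the lines (tr strips the padding some wc versions add).
line_count=$(wc -l < "$list" | tr -d ' ')
rm -r "$workdir"
```

Only "second" and "third" survive, since the second > replaced the file before >> added to it.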