Sometimes you may wonder why a simple grep -R pattern /my/path suddenly stalls, with no obvious reason and no disk activity.
The first thing to do (after nervously restarting the command, of course ;-)): check the process state with ps:
$ ps uwp 2821
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
loic 2821 0.1 0.0 13368 3148 pts/2 S+ 18:22 0:00 grep -R cfe.png /srv/
The STAT column is the interesting one here, since it gives the process state (from man ps):
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
S means the process is waiting for an event to complete, but not for I/O as I expected; otherwise the state would have been D. (The trailing + in S+ only says that the process is in the foreground process group.)
Before rerunning the command under strace, let’s have ps tell us more about the event our grep process is waiting for: this is the waiting channel, or wchan.
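On Linux, ps gets this letter from procfs. As a minimal Linux-only sketch (assuming /proc is mounted), the same state can be read straight from the State: line of /proc/PID/status; a throwaway background sleep stands in for the stalled grep here.

```shell
# A throwaway background process stands in for the stalled grep
sleep 30 &
pid=$!
sleep 0.2   # give it time to reach its sleep call

# The State: line holds the same letter ps shows in STAT, e.g. "State:  S (sleeping)"
state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
echo "PID $pid is in state $state"

# Clean up the helper process (wait reports the SIGTERM, so ignore its status)
kill "$pid"
wait "$pid" 2>/dev/null || true
```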
Waiting channel
A waiting channel is the kernel function in which the process is sleeping, waiting for it to complete before resuming. According to man ps, the wchan field shows the name of that kernel function (nwchan shows its address), and running tasks display a dash ('-') in this column.
Just ask ps to display this field:
$ ps -o user,pid,stat,wchan,command -p 2821
USER PID STAT WCHAN COMMAND
loic 2821 S+ pipe_w grep -R cfe.png /srv/
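The default WCHAN column is narrow, so long kernel function names get cut. A minimal sketch, again using a throwaway sleep as a stand-in: procps accepts a width suffix on an output field (wchan:32 is an arbitrary width chosen here), and on Linux the same name can be read from /proc/PID/wchan. Depending on kernel settings and privileges, that file may only show 0.

```shell
sleep 30 &
pid=$!
sleep 0.2

# Widen the WCHAN column so the kernel function name is not truncated
ps -o user,pid,stat,wchan:32,command -p "$pid"

# procfs exposes the same information directly (may read "0" if restricted)
wchan=$(cat "/proc/$pid/wchan")
echo "wchan of $pid: $wchan"

kill "$pid"
wait "$pid" 2>/dev/null || true
```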
pipe_w? (The name is truncated to fit the column.) At first sight, "pipe" suggests the shell | character, used to stream output from one command to the next. But I launched a standalone grep command, not followed by any pipe, so wtf?
Remember that the pipe mechanism also hides behind another concept: the FIFO, or named pipe, which is nothing more than a pipe with an entry in the file system.
Things become clearer: there is a leftover FIFO somewhere in my tree, and grep is waiting for something to read from it. strace gives the final details:
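To see what such a file looks like, here is a minimal sketch that creates a FIFO in a temporary directory (the demo_fifo name is purely illustrative), checks its type, and passes a line through it. It assumes GNU coreutils, whose stat prints the type in words with %F.

```shell
dir=$(mktemp -d)
mkfifo "$dir/demo_fifo"

# ls -l marks a FIFO with a leading "p" in the mode string
ls -l "$dir/demo_fifo"

# GNU stat spells the file type out with %F
kind=$(stat -c %F "$dir/demo_fifo")
echo "type: $kind"

# Two processes that only share the file name exchange data through it
echo "hello through the FIFO" > "$dir/demo_fifo" &
msg=$(cat "$dir/demo_fifo")
wait
echo "read: $msg"

rm -r "$dir"
```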
$ strace grep -R cfe.png /srv/
(...)
read(3, "", 32768) = 0
close(3) = 0
stat("/srv/tmp/.licq/licq_fifo", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
openat(AT_FDCWD, "/srv/tmp/.licq/licq_fifo", O_RDONLY
(hangs here)
grep is stuck on the /srv/tmp/.licq/licq_fifo file, waiting for data that will never come, since nothing is written to this FIFO.
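Since a single leftover FIFO is enough to hang the whole search, it can be worth listing them up front with find’s -type p test. A minimal sketch against a throwaway tree that mimics the path above (on the real system this would simply be find /srv -type p):

```shell
# Rebuild a tiny tree mimicking the layout above (paths are illustrative)
root=$(mktemp -d)
mkdir -p "$root/tmp/.licq"
mkfifo "$root/tmp/.licq/licq_fifo"
echo "not a fifo" > "$root/tmp/notes.txt"

# -type p matches named pipes only; find stats them without opening them
found=$(find "$root" -type p)
echo "$found"

rm -r "$root"
```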
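Still, why does openat() block, instead of a subsequent read() call? Because opening a FIFO for reading normally blocks until some other process opens the same FIFO for writing (see man 7 fifo), so grep never even reaches read().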
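This is easy to reproduce without grep or strace. A minimal sketch, assuming GNU coreutils’ timeout (which exits with status 124 when it has to kill the command): cat on a FIFO with no writer never gets past open(), while the same cat returns as soon as a writer shows up.

```shell
dir=$(mktemp -d)
mkfifo "$dir/demo_fifo"

# No writer: cat blocks inside open(), so timeout kills it after 1 second
status=0
timeout 1 cat "$dir/demo_fifo" || status=$?
echo "exit status without a writer: $status"   # 124 means the timeout fired

# Once a writer opens the other end, open() returns and the data flows
echo ping > "$dir/demo_fifo" &
reply=$(cat "$dir/demo_fifo")
wait
echo "read with a writer: $reply"

rm -r "$dir"
```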
Grep and FIFOs
Of course, grep handles this case: a quick glance at its manpage shows that its behavior with special files can be tweaked:
-D ACTION, --devices=ACTION
If an input file is a device, FIFO or socket, use ACTION to process it.
By default, ACTION is read, which means that devices are read just as if
they were ordinary files. If ACTION is skip, devices are silently skipped.
Finally, with this option my command works as expected:
$ grep -D skip -R cfe.png /srv/
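Both behaviours can be reproduced on a throwaway tree. A minimal sketch, assuming GNU grep and coreutils’ timeout; the file names are illustrative. Recent GNU grep releases may already skip FIFOs met while recursing, so -D read is passed explicitly to force the "read" default described in the manpage excerpt above.

```shell
# Throwaway tree: one matching file plus a leftover FIFO
root=$(mktemp -d)
echo "cfe.png is referenced here" > "$root/index.html"
mkfifo "$root/leftover_fifo"

# Forcing -D read reproduces the hang; timeout gives up after 2 seconds
status=0
timeout 2 grep -D read -R cfe.png "$root" > /dev/null || status=$?
echo "exit status with -D read: $status"

# -D skip silently skips the FIFO, so the search completes
out=$(grep -D skip -R cfe.png "$root")
echo "$out"

rm -r "$root"
```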
I’m starting to consider actually reading the manpage of every command I use day to day: many gems are hidden there!