Why does the wc utility generate multiple lines with "total"?

I am using the wc utility in a shell script that I run from Cygwin, and I noticed that there is more than one line with "total" in its output.

The following function is used to count the number of lines in my source files:

count_curdir_src() {
    find . '(' -name '*.vb' -o -name '*.cs' ')' \
        -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 |
    xargs -0 wc -l
}

But its output for a certain directory looks like this:

$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 | xargs -0 wc -l
     19 ./dirA/fileABC.cs
    640 ./dirA/subdir1/fileDEF.cs
    507 ./dirA/subdir1/fileGHI.cs
   2596 ./dirA/subdir1/fileJKL.cs
(...many others...)
     58 ./dirB/fileMNO.cs
     36 ./dirB/subdir1/filePQR.cs
 122200 total
  6022 ./dirB/subdir2/subsubdir/fileSTU.cs
    24 ./dirC/fileVWX.cs
(...)
    36 ./dirZ/Properties/AssemblyInfo.cs
    88 ./dirZ/fileYZ.cs
 25236 total

It looks like wc resets somewhere in the process. It cannot be caused by space characters in filenames or directory names, because I use the -print0 option. And it only happens when I run it on my largest source tree.

So, is this a bug in wc, or in Cygwin? Or something else? The wc manpage says:

Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified.

It doesn't mention anything about multiple total lines (intermediate total counts or something), so who's to blame here?


You're calling wc multiple times - once for each "batch" of input arguments provided by xargs. You're getting one total per batch.

One alternative is to use a temporary file and the --files0-from option for wc:

$ find . '(' -name '*.vb' -o -name '*.cs' ')' -a '!' -iname '*.Designer.*' -a \
    '!' -iname '.svn' -print0 > files

$ wc -l --files0-from=files

What's happening is that xargs is running wc multiple times. By default, xargs packs as many arguments as it thinks will fit into each invocation of the command it's supposed to run; if there are too many files for one invocation, it runs the command again on each remaining subset of the files, and every run of wc prints its own total.
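The batching is easy to reproduce on a small scale. The sketch below is an illustration, not part of the original setup: it assumes mktemp -d is available, and the function name demo_wc_batches is made up for this example. It uses the -n option of xargs to cap each invocation at two arguments, so wc runs twice over four files and prints two "total" lines.

```shell
# Hypothetical demo: force xargs to batch so the effect is visible
# even with a handful of files.
demo_wc_batches() {
    tmp=$(mktemp -d) || return 1
    for i in 1 2 3 4; do
        printf 'x\n' > "$tmp/f$i.txt"   # each file holds exactly one line
    done
    # -n 2 limits every wc invocation to two file arguments, so xargs
    # starts wc twice and each run appends its own "total" line.
    find "$tmp" -name '*.txt' -print0 | xargs -0 -n 2 wc -l
    rm -rf "$tmp"
}

demo_wc_batches
```

On a large source tree the same thing happens without -n, because the argument list no longer fits into a single command line.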

There are a couple of ways I see to fix this. The first, which will break if you have too many files, is to skip xargs and let the shell build the argument list. This may not work well on Cygwin, but would look like this:

wc -l $(find . '(' -name '*.vb' -o -name '*.cs' ')' \
    -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' )

You also lose the protection of -print0: a file name containing whitespace will be split into several arguments.

The other is to use an awk (or perl) script to process the output of your find/xargs combo, skip the "total" lines, and sum up the total yourself.
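A minimal sketch of the awk approach might look like the following. The function name count_curdir_src_total is an assumption for illustration, and the filter relies on find prefixing every path with "./", so a line whose second and last field is the bare word "total" can only be a wc summary line. File names containing newlines would still confuse it.

```shell
# Hypothetical variant of the question's function that prints one grand total.
count_curdir_src_total() {
    find . '(' -name '*.vb' -o -name '*.cs' ')' \
        -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 |
    xargs -0 wc -l |
    awk '
        # Drop the per-batch summary lines that wc prints ("<count> total").
        NF == 2 && $2 == "total" { next }
        # Pass every per-file line through and accumulate its line count.
        { sum += $1; print }
        # Emit a single grand total once all batches have been read.
        END { printf "%7d total\n", sum }
    '
}
```

Because the sum is computed from the per-file lines, the result does not depend on how many batches xargs decides to use.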


The command-line length is much more limited under Cygwin than on a standard Linux box, and xargs must split the input to respect those limits. You can check the limits with xargs --show-limits:

On Cygwin:

$ xargs --show-limits < /dev/null
Your environment variables take up 4913 bytes
POSIX upper limit on argument length (this system): 25039
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 20126
Size of command buffer we are actually using: 25039

On CentOS:

$ xargs --show-limits < /dev/null
Your environment variables take up 1816 bytes
POSIX upper limit on argument length (this system): 2617576
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2615760
Size of command buffer we are actually using: 131072

And to build on @JonSkeet's answer: you don't need to create an additional file. You can pipe your find results directly to wc by passing - as the argument to --files0-from:

find . -name '*.vb' -print0 | wc -l --files0-from=-
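
Applied to the function from the question, this could be sketched as follows. It assumes GNU wc (the --files0-from option is a GNU coreutils extension), and the function name count_curdir_src_files0 is made up for this example.

```shell
# Hypothetical rewrite of the question's function without xargs.
count_curdir_src_files0() {
    find . '(' -name '*.vb' -o -name '*.cs' ')' \
        -a '!' -iname '*.Designer.*' -a '!' -iname '.svn' -print0 |
    # wc reads the NUL-separated file names from standard input, so it runs
    # exactly once and prints a single "total" line.
    wc -l --files0-from=-
}
```

Since the names never pass through a command line, the argument-length limit shown above no longer applies.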