Tuesday, July 21, 2009

Binmode, textmode, cygwin, msys and grep

In the UNIX world, I guess, not many people care whether a file is opened
in text mode or binary mode. In fact, most of us are probably just using
binary mode. But when it comes to the Windows world, the difference
becomes a pain in the arse, because on Windows, when you open a file in
text mode, every line of text you write ends up with
'CRLF' ('\r\n') instead of 'LF' ('\n'). I don't intend to give a detailed explanation here
and reinvent the wheel. The best explanation is here:
http://www.cygwin.com/cygwin-ug-net/using-textbinary.html
But that link only talks about Cygwin. How about MSYS? Is MSYS doing the same?
Yes, it is. By default, MSYS is in binmode, so whatever is passed in
passes through untouched (through the shell, I mean, including pipes and redirected file
descriptors). So what I was trying to find out, and am putting down here,
is: if you run a Python program (using the Windows build of Python) and its output passes through
a Cygwin/MSYS shell pipe, what is the end-of-line marker? My non-authoritative
answer is: whatever the Python program outputs, and when you run the Windows
version of Python, every text line is output with a 'CRLF' end-of-line.
So this is 'fine', as long as it is consistent on Windows. What drives
you mad is when some programs do the correct thing, that is, strip the
'\r' off the input/output lines. Which programs fall into this category? So far I
have found one for sure: grep, one of the most useful POSIX tools. The MSYS grep
silently strips the '\r' (CR) from each line, which the Cygwin one does not.
Imagine what this can cost you! So to be safe, I have decided to always use the
-U (--binary) switch.
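
To make the two effects above concrete, here is a minimal sketch in Python 3 (not the Python 2 used in the test script further down). It does not touch a real Windows console or a real grep. It only simulates text-mode newline translation with io.TextIOWrapper, and CR stripping with a plain byte replace. The helper names write_in_text_mode and strip_cr are made up for this illustration.

```python
import io

def write_in_text_mode(lines, newline):
    """Write lines through a text-mode wrapper; return the bytes produced."""
    raw = io.BytesIO()
    # newline="\r\n" makes the wrapper translate each "\n" into "\r\n",
    # which mimics what text mode does on Windows.
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    for line in lines:
        text.write(line + "\n")
    text.flush()
    data = raw.getvalue()
    text.detach()  # leave the BytesIO alone instead of closing it
    return data

def strip_cr(data):
    """Turn every CRLF into LF, roughly what a CR-stripping filter does."""
    return data.replace(b"\r\n", b"\n")

windows_style = write_in_text_mode(["hello world"], newline="\r\n")
unix_style = write_in_text_mode(["hello world"], newline="\n")

print(repr(windows_style))                    # b'hello world\r\n'
print(repr(unix_style))                       # b'hello world\n'
print(strip_cr(windows_style) == unix_style)  # True
```

The last line is the whole problem in miniature: once a filter in the pipe strips the CR, the Windows-style output becomes indistinguishable from the UNIX-style one.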

Below is a test program to verify this. I wrote a Python script, test.py:

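import os
import sys

# Python 2 script. First print goes through the default (text-mode) stdout.
print 'hello world'

# Wrap file descriptor 1 (stdout) in a binary-mode, unbuffered file object.
f = os.fdopen(1, 'wb', 0)
f.write('hello world\n')    # explicit LF
f.write('hello world\r\n')  # explicit CRLF

# Replace sys.stdout with the binary-mode file object.
sys.stdout.close()
sys.stdout = f

# Second print now goes through the binary-mode object.
print 'hello world'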

Run it with this command:

python test.py | xxd

and look at the line endings in the hex dump (0d is CR, 0a is LF).

Run it under MSYS with this command:

python test.py | grep -v 'asdf' | xxd

You will see that every line ends with 'LF'.
Now run it with this:

python test.py | grep -U -v 'asdf' | xxd

You will see that the 'CRLF' endings are preserved.

Try the two commands on Cygwin: both produce the same output.
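
If you would rather not read hex dumps by eye, the endings can also be counted. The following is only a sketch, written for Python 3 (not the Python 2 of the test script), and the names count_line_endings and main are made up for this illustration. It assumes the data arrives as raw bytes. The sample bytes in the demo are hand-written, not captured from a real run.

```python
import sys

def count_line_endings(data):
    """Count CRLF and bare-LF line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    bare_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "LF": bare_lf}

def main():
    # sys.stdin.buffer is the binary view of stdin in Python 3,
    # so the bytes are read without newline translation.
    counts = count_line_endings(sys.stdin.buffer.read())
    print("CRLF: %(CRLF)d, LF: %(LF)d" % counts)

# Demo on hand-written bytes: two CRLF lines and one bare-LF line.
sample = b"hello world\r\nhello world\nhello world\r\n"
print(count_line_endings(sample))  # {'CRLF': 2, 'LF': 1}
```

To use it at the end of a pipe in place of xxd, add a call to main() at the bottom of the script. Comparing the counts with and without -U should then show directly whether grep touched the line endings.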
