Wednesday, October 12, 2011

Microsoft's ifstream automatically removes carriage returns

By default, the ifstream class in Microsoft's STL automatically converts each carriage return and line feed pair (0x0d 0x0a, or CRLF) to a single line feed (0x0a) while reading a text file.

Take the following code for example:

    #include    <cstdlib>
    #include    <fstream>
    #include    <iostream>

    using namespace std;

    int main ( int argc, char *argv[] )
    {
        ifstream fs("crlf.txt");        // default open mode, no ios::binary
        if (!fs) {
            cerr << "cannot open crlf.txt" << endl;
            return EXIT_FAILURE;
        }
        // use the end position as the file size
        fs.seekg(0, ios::end);
        int len = static_cast<int>(fs.tellg());
        fs.seekg(0, ios::beg);

        // one extra zero-initialized byte so buf can be printed as a C string
        char* buf = new char[len + 1]();
        fs.read(buf, len);
        cout << hex;
        // dump buf in hex format
        for(int i = 0; i < len; ++i)
            cout << static_cast<int>(buf[i]) << " ";
        cout << endl << dec << buf << endl;
        cout << "file size: " << len
            << " actual read len: " << fs.gcount() << endl;
        delete[] buf;
        return EXIT_SUCCESS;
    }               // ----------  end of function main  ----------

And let's suppose crlf.txt contains the following two lines, each terminated by CRLF (14 bytes in total):

    hello
    world

If we compile the code with Microsoft's VC++ compiler and run the executable against the preceding text file, we get the output below:

68 65 6c 6c 6f a 77 6f 72 6c 64 a 0 0

file size: 14 actual read len: 12

As we can see, each 0x0d 0x0a pair has been changed to a single 0x0a, and the number of bytes actually read is 12 rather than 14. The last two slots of the buffer were never filled. But if we compile the code with g++ and run the test, we get different output:

68 65 6c 6c 6f d a 77 6f 72 6c 64 d a

file size: 14 actual read len: 14

The number of bytes actually read now equals the text file's size, and the bytes read into memory are identical to those in the original file on disk.
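
The gap between 14 and 12 is just the two dropped carriage returns. The sketch below simulates that translation on an in-memory string so the arithmetic can be checked on any platform. It is only an illustration, not the library's actual implementation; collapse_crlf is a made-up helper name, and it assumes a lone CR (one not followed by LF) is left untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: mimic the CRLF -> LF translation described above.
// Assumption: only a CR that is immediately followed by LF is dropped.
std::string collapse_crlf(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        // Skip the CR of a CR LF pair; the LF is copied on the next pass.
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n')
            continue;
        out += raw[i];
    }
    return out;
}
```

Feeding it the 14 bytes of crlf.txt ("hello\r\nworld\r\n") yields the 12-byte string "hello\nworld\n", which matches the read length reported by the VC++ build.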

In very rare cases we may appreciate this behavior of Microsoft's ifstream, since it saves us from doing the conversion ourselves. But in most cases it has negative consequences and introduces subtle bugs that are hard to track down.
I'm not sure why it behaves this way; just be aware of this specific behavior in Microsoft's STL.
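
One way to get the same bytes from both compilers, assuming the difference comes from newline translation in the stream's default (text) open mode, is to open the file with ios::binary. Below is a minimal sketch; read_raw is a hypothetical helper name, not something from the original program.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes.
// ios::binary asks the stream not to translate line endings,
// so CR LF pairs should arrive exactly as stored on disk.
std::vector<char> read_raw(const std::string& path)
{
    std::ifstream fs(path.c_str(), std::ios::in | std::ios::binary);
    if (!fs)
        return std::vector<char>();     // could not open: empty result

    // Use the end position as the file size, then rewind.
    fs.seekg(0, std::ios::end);
    const std::streamoff size = fs.tellg();
    fs.seekg(0, std::ios::beg);
    if (size <= 0)
        return std::vector<char>();

    std::vector<char> buf(static_cast<std::size_t>(size));
    fs.read(&buf[0], size);
    // Keep only what was actually read.
    buf.resize(static_cast<std::size_t>(fs.gcount()));
    return buf;
}
```

Calling read_raw("crlf.txt") on the 14-byte file above should return a 14-element vector with 0x0d and 0x0a still present at the end of each line.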

1 comment:

anti said...

It's because, by default, C++ opens your file in text mode :)
You have to open it in binary mode, and then you will get all the bytes!