Tom 7 Radar: Screwed like me, thanks to libc

personal: Screwed like me, thanks to libc (04 May 2004 at 19:50)

The UNIX C library is dumb beyond what I already knew. Here's what the man page says for fread (which is a function that tries to read nmemb blocks of length size from the file stream into ptr):

size_t fread( void *ptr, size_t size, size_t nmemb, FILE *stream);

... RETURN VALUE fread and fwrite return the number of items successfully read or written (i.e., not the number of characters). If an error occurs, or the end-of-file is reached, the return value is a short item count (or zero).

Based on this spec, what do you suppose the following code should print? I'm asking to read 1 block of zero bytes from "test.c", which exists and is non-empty.

FILE *ff = fopen("test.c", "r");
char r[100];
printf("fread(r, 0, 1, ff) = %zu\n", fread(r, 0, 1, ff));

If you're me reading the spec and writing this code, then fread should return 1. Why? Well, I've asked for it to read 1 block of size 0, and it will return the number of items successfully read, unless there is an error or the end of file is reached. It doesn't need to read any bytes, so it certainly won't "reach" the end of file. It doesn't make sense to deliberately read fewer items than is possible (the "If an error occurs..." clause seems to imply, though not require, that a short item count is only returned if it can't read all of the items requested). Therefore, I believe the fread above should always return 1, since it can successfully read one zero-byte element from the file into r.

I think this reading of the spec will be obvious to most of the people I work with; if you disagree, I welcome your interpretation.

Well, what does it do? It returns zero. In fact, there is a special case of the implementation to deal with this case. Check out glibc-2.3.2/libio/iofread.c:

_IO_size_t
_IO_fread (buf, size, count, fp)
     void *buf;
     _IO_size_t size;
     _IO_size_t count;
     _IO_FILE *fp;
{
  _IO_size_t bytes_requested = size * count;
  ...
  if (bytes_requested == 0)
    return 0;
  ...
  /* continue sanely */

At best, the specification is ambiguous. I don't actually think that it is, unless you consider a 0-byte argument an occurrence of an "error." At worst, this is another example of one of UNIX's worst philosophies (I suppose I project my negative opinions of Perl onto UNIX in general, but this example seems to bear that out!): finding cases where I'm doing something that's seemingly useless or degenerate, and then replacing that with "more useful" behavior. I suppose the argument that the designers of the C library would use (if they indeed intended this behavior and it is not just glibc's interpretation of the apparent ambiguity) is, "Why would you ever want to read blocks of zero length?" And so they "helpfully" give me an "error" instead of just doing what I asked.

(Aside: I also discovered recently that an empty \begin{enumerate} \end{enumerate} is illegal in LaTeX, which is precisely the same kind of stupid misdesign/bug.)

I don't believe that a reasonable use scenario should be a prerequisite for a library function having a general specification. Implementing the spec even in seemingly useless corner cases should just be what you do. But having a reasonable use for such "corner" cases makes the situation even worse, and in this case I have one. (The difficulty of tracking down a bug in real code to this problem is the reason that I'm bothering to compose this rant in the first place.) Here it is. Suppose you have a file format where you store data objects of arbitrary length (say, strings) by first writing their length n, then writing n bytes. (Chunk-based file formats like MIDI and WAV use this scheme, among others.) How do you read such a format? Well, fread an integer, and then, perhaps, fread(buffer, n, 1, ff). Well, if n is ever zero, as it is in some of my files, then you are screwed like me, thanks to libc.
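A minimal sketch of a reader for such a length-prefixed format that sidesteps the zero-length case. The names read_record and MAX_RECORD_LEN are hypothetical, and the 32-bit host-byte-order length field is an assumption made for illustration, not something the formats above prescribe.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sanity cap on record length (an assumption of this sketch). */
#define MAX_RECORD_LEN (1u << 20)

/* Hypothetical helper: reads one record stored as a 32-bit length n
   (host byte order, assumed) followed by n bytes of payload.
   Returns a malloc'd, NUL-terminated buffer and stores n in *len_out,
   or returns NULL on a short read, oversized length, or failed malloc. */
char *read_record(FILE *f, uint32_t *len_out) {
    uint32_t n;
    if (fread(&n, sizeof n, 1, f) != 1)
        return NULL;
    if (n > MAX_RECORD_LEN)
        return NULL;

    char *buf = malloc((size_t)n + 1);  /* +1 also avoids malloc(0) */
    if (buf == NULL)
        return NULL;

    /* fread(buf, 0, 1, f) returns 0, which is indistinguishable from a
       failed read, so only call fread when there are bytes to read. */
    if (n != 0 && fread(buf, n, 1, f) != 1) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    *len_out = n;
    return buf;
}
```

The explicit n != 0 test is exactly the special case the general spec would have made unnecessary; without it, every empty string in the file looks like a read error.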

comment: Andrew (yale128036074100.student.yale.edu) – 05.04.04 21:49:23

Clearly the correct solution is to write a macro that checks if n is 0 and returns 1 if it is. Fight fire with fire!

comment: Tom 7 (h-66-167-250-95.phlapafg.dynamic.covad.net) – 05.04.04 22:31:44

Of course! What was I thinking?

comment: Adam (c-24-3-25-91.client.comcast.net) – 05.05.04 00:52:17

It looks like it's a case where the specification was ambiguous long ago, someone chose a concrete implementation, and we're stuck with it.

Curiously, the documentation you read is in error. ISO C requires that 0 be returned only on error, even when size or nmemb are 0. UNIX documentation says the opposite.

Here is a bug report filed in 1998 about this:
http://www.opengroup.org/platform/resolutions/bwg98-007.html

Still, try this with VC++. I bet you'll get the same thing.

comment: Mike (dialn-async447.dial.net.nyu.edu) – 05.05.04 06:12:17

I don't know... whether or not you can "successfully read" 0 bytes seems like a philosophical question to me. Specifically, since the contents of the buffer you're reading into don't change, a "success" would have the exact same effect as a "failure." Suppose you were doing this:

size_t size = get_size_somehow();
char *buffer = (char*)malloc(size);
if (fread(buffer, size, 1, file))
{
// do something with a lot of overhead
}

I think this is what the authors were thinking: in most cases, somebody is going to have to check for zero somewhere, so we might as well do it in the library. If they return 0, then somebody writing this code doesn't need to check before the big-overhead part; if they return 1, then somebody writing code like yours doesn't need to check. I think you could make a case for either.

Not that there's any excuse for not following the standard, or for having bad documentation, though. I think the spec should state explicitly what happens in this case.

comment: Tom 7 (h-66-167-250-95.phlapafg.dynamic.covad.net) – 05.05.04 09:47:28

I don't doubt that there are uses for the special case, but that's not important to me. (It's not prima facie obvious to me that even in your example, the programmer wouldn't want to do the overhead block even on zero-sized objects. I do in my code, which is basically the same. Of course, he'd also have to deal with the special situation of malloc(0), which is at least documented!)

How can we "fail" to read 0 bytes? That's what I don't understand. If it's not a failure, then the function doesn't implement the spec. You argue that the observable effects of failure and success are the same (other than the return value). I agree. But why would fread then choose failure instead of success (which it can obviously also attain)... sheer pessimism? That doesn't make sense to me.

I've been pointed to other specifications for this function that do point out the special case. If it's well-known that there are special cases to "help" you, then programmers can use those (maybe to benefit) and it's not *as* bad. I would have spent less time debugging this had it been documented. But it still makes it harder to understand and write code, because you have to remember the special cases in addition to the general specification. Therefore I contend that general specifications are always better.

Adam: it does also crash on win32, but Microsoft is not responsible for the design of the ANSI C library. ;) In any case, this argues that there is some kind of standard surrounding this behavior, but that it is not documented well.

comment: nothings (adsl-63-203-75-155.dsl.snfc21.pacbell.net) – 05.06.04 00:06:53

I was going to say that you could kind of see the design making sense if it was a specification that was trying to avoid a divide-by-zero: that you might at some point multiply size * num_elem to get the total bytes to read, and later divide bytes_read / size to get the actual num_elem read. But that argument doesn't really convince me.

On the other hand, I think you're looking at this solely from the point of view of varying the size of the element, leading to a 'natural' conclusion. But if we look at the behavior of fread() relative to feof(), we could also assert a pretty plausible behavior: fread() should always return 0 if feof() is true.

If you buy that behavior, you're now stuck with fread(buffer,0,1,f) returning 1 if not at EOF and 0 if at EOF, which is probably not what you want either. You could argue that the 0-lengthness should win out over the feof()ness--"i'm not actually asking for any data, so it doesn't matter if there's any data there or not"--but I'm not convinced that in practical code this might not be the opposite of what you want. Certainly if you fclose(f), I don't think fread(buffer,0,1,f) should return 1, although I guess that's probably undefined anyway.
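For what it's worth, the divide-by-zero concern can be sketched with a toy fread-alike over an in-memory buffer. Everything here (struct mem_src, toy_fread) is hypothetical and is not how glibc is actually written; it only illustrates why an implementation that converts bytes back to elements wants a guard on size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source standing in for a FILE. */
struct mem_src {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Toy fread-alike: computes the total byte count, copies whatever is
   available, then converts bytes back to elements by dividing by size. */
size_t toy_fread(void *ptr, size_t size, size_t nmemb, struct mem_src *s) {
    if (size == 0 || nmemb == 0)
        return 0;                   /* guard: the division below needs size != 0 */

    size_t want = size * nmemb;     /* overflow check omitted in this sketch */
    size_t avail = s->len - s->pos;
    size_t got = want < avail ? want : avail;

    memcpy(ptr, s->data + s->pos, got);
    s->pos += got;
    return got / size;              /* number of whole elements read */
}
```

Asking for four 2-byte elements from a 7-byte source yields 3 here, and the guard at the top is the only thing standing between size == 0 and a division by zero.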

comment: Tom 7 (h-66-167-250-95.phlapafg.dynamic.covad.net) – 05.06.04 01:31:52

I think there's an argument to be made if you're actually at the end of the file, yeah. But the spec says it fails if the end of file is "reached" and I don't see how we can reach anything if we don't move. In addition, if one reads a single byte file in a call to fread(b, 1, 1, f), this "reaches" the end of file in some sense, but of course should succeed.

comment: Anonymous (p3ee35d7a.dip.t-dialin.net) – 05.15.09 06:39:29

imho reading zero blocks of size n or n blocks of size zero is an error. this should be avoided by any serious programmer.
a program should not do meaningless stuff like reading blocks of length zero.
if you debug programs you have to decide what is an error and what is intended by the programmer. at first glance i would assume that reading zero bytes is not intended to happen and is therefore a possible error.

comment: Tom 7 (mobile-032-148-046-153.mycingular.net) – 05.15.09 13:16:52

I would claim to be a "serious programmer" and strongly disagree, for the reasons I explained. Special cases are the rough surfaces that make it difficult to put large software together.

comment: Anonymous (194.128.253.254) – 02.15.10 09:51:57

The C language Standard (C99) says:

"The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered. If size or nmemb is zero, fread returns zero and the contents of the array and the state of the stream remain unchanged."

So the "UNIX C library" works that way because if it didn't it would be broken. Any recent C implementation must work like this because that is what the C standard requires, at least since 1999.

Another point here is that fread() is defined to read into an array of objects, where the size parameter gives the size of each object. C defines the term object (not like an OO object; in C it is roughly a thing that can store values). An object as C defines it cannot have zero size, so passing 0 for the size parameter is best viewed as an error.
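The C99 wording quoted above can be checked directly. This is a small sketch, with the hypothetical name zero_request_is_noop, that assumes a seekable stream so that ftell is a usable proxy for "the state of the stream".

```c
#include <stdio.h>

/* Hypothetical check: returns 1 if both kinds of zero-sized request
   return 0 and leave the file position where it was, 0 otherwise. */
int zero_request_is_noop(FILE *f) {
    char buf[16];
    long before = ftell(f);

    size_t zero_size = fread(buf, 0, 1, f);   /* one element of size 0 */
    size_t zero_count = fread(buf, 1, 0, f);  /* zero elements of size 1 */

    long after = ftell(f);
    return zero_size == 0 && zero_count == 0 &&
           before == after && !ferror(f);
}
```

On an implementation that follows C99, this returns 1 for any open, readable, seekable file, which is also why a return of 0 alone cannot tell a zero-sized request apart from a failed read.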
