Thu, Oct. 15th, 2009, 12:07 pm
print() experiencing deja vu

I'm running a very multithreaded Python 3 app with a whole bunch of calls to print() in it. I don't generally do multithreading, but this app spins up a whole bunch of servers to check whether they're working right. I have another test which is non-multithreaded and reproducible, and will of course have another test that actually runs everything on multiple machines, but this is the intermediate step.

Anyhow, my problem is that print() sometimes appears to print the same line repeatedly. I'm loath to conclude that the underlying libraries are misbehaving, but I've taken the following precautions:

The amount of data it's spitting out is extensive
The code is asserting that the exact string hasn't been printed before
The thread and object IDs are included in what's printed
There's a call to randrange() included at the end, just to make sure

And yet, I'm still getting identical lines of output.

My questions are:

What the fuck?

Is there some config setup to make this stop?

If not, is there a workaround?

My next step is to try making everything put the objects to be printed on a queue and have a single thread do all the printing, in the hope that the duplicates will go away, or at least end up next to each other instead of obnoxiously interleaved. But first I'm taking a break for lunch; I've already wasted a day on this crap.
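A minimal sketch of that queue-plus-printer-thread idea, assuming the worker threads only need to hand off finished strings. The names (`printer`, `run_workers`, the message format) are hypothetical. `queue.Queue` does its own locking, and because only one thread ever calls `emit`, lines can't be interleaved mid-write. Messages from any single worker also come out in the order that worker queued them.

```python
import queue
import threading

_STOP = object()  # sentinel telling the printer thread to exit


def printer(q, emit):
    # The only thread that ever touches the output.
    while True:
        item = q.get()
        if item is _STOP:
            break
        emit(item)


def run_workers(n_workers, n_msgs, emit=print):
    q = queue.Queue()  # thread-safe FIFO shared by all workers
    out_thread = threading.Thread(target=printer, args=(q, emit))
    out_thread.start()

    def worker(wid):
        for i in range(n_msgs):
            q.put(f"worker={wid} msg={i}")  # hypothetical status message

    workers = [threading.Thread(target=worker, args=(w,))
               for w in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    q.put(_STOP)  # every worker has finished, so this is the last item
    out_thread.join()


if __name__ == "__main__":
    run_workers(4, 3)
```

Passing `emit` as a parameter (defaulting to `print`) is just a convenience so the output can be captured, e.g. `run_workers(4, 3, lines.append)`.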

Thu, Oct. 15th, 2009 09:00 pm (UTC)

One possible cause: when you call print(), the string may get buffered before it's actually output. If you launch a thread at that point, both threads may believe it's their job to output what's in the buffer.

I've definitely seen this happen with fork, but I'm not positive it should happen with threads.
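The fork case is easy to reproduce. Here is a minimal sketch (POSIX only, since it relies on `os.fork()`; the function name is hypothetical). Text is written to a block-buffered file over a pipe, the process forks while the text is still sitting in the user-space buffer, and then parent and child each flush their own copy. This only demonstrates the fork behaviour; it says nothing about whether threads can do the same.

```python
import os


def fork_with_pending_output():
    """Return what reaches a pipe when an unflushed buffer survives fork()."""
    r, w = os.pipe()
    out = os.fdopen(w, "w", buffering=8192)  # block-buffered, like redirected stdout
    out.write("hello\n")  # still in this process's buffer, not yet in the pipe
    pid = os.fork()
    if pid == 0:
        # Child: inherited a copy of the unflushed buffer and flushes it.
        out.flush()
        os._exit(0)  # skip normal interpreter cleanup
    os.waitpid(pid, 0)
    out.flush()  # parent flushes its own copy of the same text
    out.close()
    with os.fdopen(r) as pipe_in:
        return pipe_in.read()


if __name__ == "__main__":
    print(repr(fork_with_pending_output()))  # 'hello\nhello\n'
```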

Thu, Oct. 15th, 2009 09:31 pm (UTC)
peterschuller: Have you seen a claim that sys.stdout is thread-safe?

Check out pythonrun.c. On cursory examination I don't see anything special going on with stdout; it's just an io object like any other. Why would anyone expect it to be thread-safe, especially given that it's defined to use stdio-style buffering? Is there a claim that IO objects in Python 3 should be thread-safe? (That seems strange.)

sys.stdout in Python 3.0 (based on runtime inspection) is a TextIOWrapper. I don't see any synchronization going on in io.py, which is what I would expect.

As far as I can tell this behavior is expected.

(I didn't find where print() itself is defined, though any thread-safety added there wouldn't jibe well with other uses of stdout anyway.)

Thu, Oct. 15th, 2009 09:32 pm (UTC)
peterschuller: And on the topic of a work-around

You might use the logging API, which is thread-safe (though I don't remember whether it's defined to be or just happens to be).
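For what it's worth, the standard library's logging documentation says the module is intended to be thread-safe and uses threading locks internally; each handler holds its own lock while emitting a record. A minimal sketch, assuming a hypothetical logger name `servercheck` and writing to whatever stream is passed in (sys.stdout in real use):

```python
import io
import logging
import threading


def log_from_threads(stream, n_threads, n_msgs):
    # Hypothetical logger name; propagate=False keeps records away from the root logger.
    logger = logging.getLogger("servercheck")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))
    logger.addHandler(handler)
    try:
        def worker(wid):
            for i in range(n_msgs):
                logger.info("worker=%d msg=%d", wid, i)

        threads = [threading.Thread(target=worker, args=(w,), name=f"w{w}")
                   for w in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        logger.removeHandler(handler)


if __name__ == "__main__":
    buf = io.StringIO()
    log_from_threads(buf, 4, 3)
    print(buf.getvalue(), end="")
```

The `%(threadName)s` field also gives you the thread identification for free, instead of formatting it into every message by hand.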

Thu, Oct. 15th, 2009 09:35 pm (UTC)
peterschuller: Re: And on the topic of a work-around

(Or make sys.stdout thread-safe by wrapping it before loading any other modules that might grab their own reference to sys.stdout. I'm not sure whether there is a standard thread-safe wrapper in Py3 or whether you'd have to implement it. I'm also not sure whether print() can be counted on to look up sys.stdout on every call; I would presume so.)
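A sketch of the wrapping approach, assuming you write the wrapper yourself (`LineLockedStream` and `demo()` are hypothetical names, not a standard library API). One detail matters: print() makes several separate write() calls per invocation (each argument, each separator, then `end`), so putting a lock around each individual write() would still let lines from different threads interleave. This wrapper instead buffers each thread's text until a newline and writes whole lines while holding a lock. When no `file=` argument is given, print() does look up sys.stdout at call time, so replacing sys.stdout early is enough for print(); modules that already hold a reference to the old object are unaffected.

```python
import io
import sys
import threading


class LineLockedStream:
    """Hypothetical wrapper that serializes whole lines to a text stream."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()
        self._local = threading.local()  # per-thread partial line

    def write(self, text):
        pending = getattr(self._local, "pending", "") + text
        *lines, rest = pending.split("\n")
        self._local.pending = rest  # keep the unfinished tail for this thread
        if lines:
            with self._lock:
                for line in lines:
                    self._stream.write(line + "\n")
        return len(text)

    def flush(self):
        # Emits the calling thread's partial line, if any, then flushes.
        pending = getattr(self._local, "pending", "")
        self._local.pending = ""
        with self._lock:
            if pending:
                self._stream.write(pending)
            self._stream.flush()


def demo(n_threads, n_msgs):
    buf = io.StringIO()
    out = LineLockedStream(buf)

    def worker(wid):
        for i in range(n_msgs):
            # print() issues one write() per argument, per separator and for end.
            print("worker", wid, "msg", i, file=out)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf.getvalue().splitlines()


if __name__ == "__main__":
    # Install the wrapper before other modules grab a reference to sys.stdout.
    sys.stdout = LineLockedStream(sys.stdout)
    print("\n".join(demo(2, 3)))
```

Note that this only prevents interleaving between threads; it wouldn't help with the fork-style duplication described in the first comment.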