Monday, April 25, 2011

Call recv() on the same blocking socket from two threads

What happens if I have one socket, s, there is no data currently available on it, it is a blocking socket, and I call recv on it from two threads at once? Will one of the threads get the data? Will both get it? Will the 2nd call to recv return with an error?

From stackoverflow
  • Socket implementations should be thread-safe, so exactly one thread should get the data when it becomes available. The other call should just block.
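
A minimal sketch of that behaviour, assuming Python's `socket` module as a thin wrapper over the same blocking recv() call. The helper names `recv_once` and `race_two_readers` are made up for illustration. Two threads block on one stream socket; a single send wakes one of them, and the other stays blocked until the writer closes and it sees end-of-stream.

```python
import socket
import threading

def recv_once(sock, results):
    # Block until data (or end-of-stream) arrives, then record what we got.
    results.append(sock.recv(1024))

def race_two_readers():
    writer, reader = socket.socketpair()      # a connected pair of stream sockets
    results = []
    threads = [threading.Thread(target=recv_once, args=(reader, results))
               for _ in range(2)]
    for t in threads:
        t.start()
    writer.sendall(b"hello")   # only one of the two readers receives this
    writer.close()             # end-of-stream releases the other with b""
    for t in threads:
        t.join()
    reader.close()
    return sorted(results)

if __name__ == "__main__":
    print(race_two_readers())
```

Which thread ends up with the data is not specified, so the sketch sorts the results rather than relying on thread order.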

  • One thread will get it, and there's no way to tell which.

    This doesn't seem like a reasonable design. Is there a reason why you need two threads calling recv() on the same socket?

    Claudiu : No, I'm implementing this myself for a project and I wanted to know what should happen.
    dwc : I asked because I can't see a good reason to design it that way. One of the benefits of threading is to isolate things like sockets to simplify handling. Sharing reads on a socket between threads introduces lots of complexity instead.
    bdonlan : Reading from the same socket from two threads makes sense if it's a UDP socket for a connectionless protocol such as DNS. Each thread then independently works on incoming requests.
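
The UDP case from the last comment can be sketched as follows. This is only an illustration under assumptions: the `req-N` payloads, the `STOP` sentinel, and the function names are invented here, and loopback delivery is assumed to be reliable enough for a small demo. Each worker blocks in recvfrom() on the same socket, and every datagram is handed whole to exactly one of them.

```python
import socket
import threading

def udp_worker(sock, seen, lock):
    # Each recvfrom() returns one whole datagram to exactly one thread.
    while True:
        data, _addr = sock.recvfrom(2048)
        if data == b"STOP":                 # hypothetical shutdown sentinel
            return
        with lock:
            seen.append(data)

def run_udp_workers(n_requests=20, n_workers=2):
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))           # let the OS pick a free port
    addr = server.getsockname()
    seen, lock = [], threading.Lock()
    threads = [threading.Thread(target=udp_worker, args=(server, seen, lock),
                                daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(n_requests):
        client.sendto(b"req-%d" % i, addr)
    for _ in range(n_workers):              # one stop datagram per worker
        client.sendto(b"STOP", addr)
    for t in threads:
        t.join(timeout=5)                   # UDP is unreliable; don't wait forever
    client.close()
    server.close()
    return sorted(seen)

if __name__ == "__main__":
    print(len(run_udp_workers()), "datagrams handled")
```

Because datagram boundaries are preserved, no worker ever sees half of another worker's request, which is what makes this design workable for UDP.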
  • I can't find a reference for this, but here's my understanding:

    A vendor's guarantee of thread-safety may mean only that multiple threads can each safely use their own sockets; it does not guarantee atomicity across a single call, and it doesn't promise any particular allocation of the socket's data among multiple threads.

    Suppose thread A calls recv() on a socket that's receiving TCP data streaming in at a high rate. If recv() needs to be an atomic call, then thread A could block all other threads from executing, because it needs to be running continuously to pull in all the data (until its buffer is full, anyway.) That wouldn't be good. Hence, I would not assume that recv() is immune to context switching.

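    Conversely, suppose thread A makes a blocking call to recv() on a TCP socket, and the data is coming in slowly. The call then returns as soon as some data is available, usually with fewer bytes than A asked for. (On a non-blocking socket, or a blocking one with a receive timeout, it could instead fail with errno set to EAGAIN.)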

    In either of these cases, suppose thread B calls recv() on the same socket while thread A is still receiving data. When does thread A stop getting data handed to it so that thread B can start receiving data? I don't know of a Unix implementation that will try to remember that thread A was in the middle of an operation on the socket; instead, it's up to the application (threads A and B) to negotiate their use of it.

    Generally, it's best to design the app so that only one of the threads will call recv() on a single socket.

    Zan Lynx : I think that you are wrong about your assumptions here. recv() gets one packet of data and is usually used with UDP. With TCP it still gets one packet of data, but unless you were very careful on the sender side you may get the contents of two writes in one recv().
    Zan Lynx : Okay, I just checked the docs and it is a bit more complicated than one packet depending on flags, but that is what I have observed as the usual case in Linux apps I have worked on.
    Dan Breslau : UDP datagrams are certainly safer than TCP streams wrt threads. I'm not certain that they're 100% thread-safe, though -- if you get an EINTR (does that still happen?), then thread A returns from recv() prematurely, giving thread B a chance to jump in.
    Dan Breslau : But TCP is not packetized; that's why we refer to TCP *streams*.
    Zan Lynx : Signals generally only interrupt blocking system calls on a Unix. recv() doesn't block unless you ask it to, it is only copying from already received packet buffers. I don't think EINTR would happen mid-packet.
    Zan Lynx : TCP *is* packetized because IP is packetized, run a network analyzer like Wireshark and see. What I have seen is that recv() will tend to (not always) return one IP packet worth of TCP data.
    Dan Breslau : But IP packets aren't visible to the application -- and vice-versa. When using TCP, it's up to the callers to mark and handle message boundaries. However, UDP "knows" that a packet of data was passed to send(), and so it "knows" when that packet is complete on the recv() side.
    Zan Lynx : A case I see very often: application waiting with select/poll, then does a recv() on a TCP socket. The very first complete data available to the Linux OS is one packet, which will be returned by recv. This is under Internet speed conditions, not local gigabit.
    Dan Breslau : It's been a while since I've done socket-level programming. But many years ago, the experience of writing a Jabber client in a single-threaded environment pretty much burned into my brain that TCP is *not* packetized at the socket level.
    Dan Breslau : In your app, perhaps the sender is delaying even a short time between sends? That could explain why you get a complete packet (and no other data) in your recv() calls.
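
Since the stream itself carries no message boundaries, the application has to add them, as noted in the comments above. One common way is a length prefix. The sketch below assumes a made-up wire format (a 4-byte big-endian length followed by the payload), and the helper names `recv_exact`, `send_msg`, and `recv_msg` are hypothetical, not part of any socket API.

```python
import socket
import struct

def recv_exact(sock, n):
    # recv() may return fewer bytes than requested, so loop until we have n.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_msg(sock, payload):
    # Assumed framing: 4-byte big-endian length, then the payload bytes.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def framing_demo():
    a, b = socket.socketpair()
    send_msg(a, b"first")
    send_msg(a, b"second message")
    msgs = [recv_msg(b), recv_msg(b)]       # boundaries survive regardless of
    a.close()                               # how recv() chunks the stream
    b.close()
    return msgs

if __name__ == "__main__":
    print(framing_demo())
```

With framing like this, it no longer matters whether a given recv() happens to return one packet's worth of data, less, or more.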
  • From the man page on recv

    A recv() on a SOCK_STREAM socket returns as much available information as the size of the buffer supplied can hold.

    Let's assume you are using TCP, since it was not specified in the question. So suppose you have thread A and thread B both blocking on recv() for socket s. Once s has some data to be received it will unblock one of the threads, let's say A, and return the data. The data returned will be of some random size as far as we are concerned. Thread A inspects the data received and decides if it has a complete "message", where a message is an application level concept.

    Thread A decides it does not have a complete message, so it calls recv() again. BUT in the meantime B was already blocking on the same socket, and has received the rest of the "message" that was intended for thread A. I am using intended loosely here.

    Now both thread A and thread B have an incomplete message, and will, depending on how the code is written, throw the data away as invalid, or cause weird and subtle errors.

    I wish I could say I didn't know this from experience.

    So while recv() itself is technically thread-safe, it is a bad idea to have two threads calling it simultaneously if you are using it for TCP.

    As far as I know it is completely safe when you are using UDP.

    I hope this helps.
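
One way to avoid the split-message problem described in this answer is to let a single thread own recv() and hand complete messages to the others. The sketch below is an assumption-laden illustration, not anyone's actual code from the thread: it assumes newline-terminated messages, and the names `reader_loop`, `queue_worker`, and `single_reader_demo` are invented.

```python
import queue
import socket
import threading

def reader_loop(sock, inbox):
    # The only thread that ever calls recv() on this socket.
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:                       # peer closed the connection
            break
        buf += chunk
        while b"\n" in buf:                 # hand off every complete line
            line, buf = buf.split(b"\n", 1)
            inbox.put(line)

def queue_worker(inbox, handled, lock):
    # Workers only ever see whole messages, never raw socket data.
    while True:
        msg = inbox.get()
        if msg is None:                     # sentinel: no more messages
            return
        with lock:
            handled.append(msg)

def single_reader_demo(n_workers=2):
    a, b = socket.socketpair()
    inbox, handled, lock = queue.Queue(), [], threading.Lock()
    workers = [threading.Thread(target=queue_worker, args=(inbox, handled, lock))
               for _ in range(n_workers)]
    reader = threading.Thread(target=reader_loop, args=(b, inbox))
    for t in workers + [reader]:
        t.start()
    a.sendall(b"alpha\nbr")                 # "bravo" is split across two sends
    a.sendall(b"avo\ncharlie\n")
    a.close()
    reader.join()
    for _ in workers:
        inbox.put(None)
    for t in workers:
        t.join()
    b.close()
    return sorted(handled)

if __name__ == "__main__":
    print(single_reader_demo())
```

The workers still run in parallel, but reassembly happens in one place, so a message can never be divided between two threads.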
