Lou Langholtz wrote:
>
> Paul Clements wrote:
>
> >>Except that in the error case, the send basically didn't succeed. So no
> >>need to worry about receiving a reply and no race possibility in that case.
> >
> >As long as the request is on the queue, it is possible for nbd-client to
> >die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
> >and leaving us with a race between the free and do_nbd_request()
> >accessing the request structure.
>
> Quite right. I missed that case in this last patch (when nbd_do_it has
> returned and NBD_DO_IT is about to call nbd_clear_que [1]). Just moving
> the errors increment (near the end of nbd_send_req) to within the
> semaphore protected region would fix this particular case. An even
> larger race window exists with the request getting free'd when
> nbd-client is used to disconnect in which it calls NBD_CLEAR_QUE before
> NBD_DISCONNECT [2]. In this case, moving the errors increment doesn't
> help of course since the nbd_clear_queue in 2.6.0-test2 doesn't bother
> to check the tx_lock semaphore anyway. I believe reference counting the
> request (as you suggest) would protect against both these windows though.
> Will you be working on closing the other clear-queue race also then?

Here's the patch to fix up several race conditions in nbd. It requires
reverting the already included (but admittedly incomplete)
nbd-race-fix.patch that's in -mm5.

Andrew, please apply.

Thanks,
Paul
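
For readers following the thread, the reference-counting idea discussed above can be illustrated with a minimal userspace sketch. This is not the patch and not kernel code: struct req, req_alloc, req_get, req_put and req_frees are hypothetical names invented for the illustration, and C11 atomics stand in for whatever locking the driver actually uses. The point is only that the request is freed by whoever drops the last reference, so a clear-queue path cannot free it while the send path still holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued request; NOT the kernel's struct request. */
struct req {
    atomic_int refs;   /* holders that may still touch this request */
    int errors;        /* error count the send path updates late */
};

/* Illustration only: counts how many requests have actually been freed. */
static int req_frees;

/* Allocate a request holding one reference (the queue's). */
static struct req *req_alloc(void)
{
    struct req *r = malloc(sizeof *r);
    if (!r)
        return NULL;
    atomic_init(&r->refs, 1);
    r->errors = 0;
    return r;
}

/* Take an extra reference before using a request that others may drop. */
static void req_get(struct req *r)
{
    int old = atomic_fetch_add(&r->refs, 1);
    assert(old > 0);   /* caller must already hold a valid reference */
}

/* Drop a reference; only the final put frees the request. */
static void req_put(struct req *r)
{
    if (atomic_fetch_sub(&r->refs, 1) == 1) {
        free(r);
        req_frees++;
    }
}
```

In this sketch the send path would call req_get() before touching the request and req_put() after its last access (including the errors increment), while a clear-queue path would simply req_put() the queue's reference. Whichever put comes last performs the free, regardless of ordering.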