Lou Langholtz wrote:
> 
> Paul Clements wrote:
> 
> >>Except that in the error case, the send basically didn't succeed. So no
> >>need to worry about recieving a reply and no race possibility in that case.
> >
> >As long as the request is on the queue, it is possible for nbd-client to
> >die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
> >and leaving us with a race between the free and do_nbd_request()
> >accessing the request structure.
>
> Quite right. I missed that case in this last patch (when nbd_do_it has
> returned and NBD_DO_IT is about to call nbd_clear_que [1]). Just moving
> the errors increment (near the end of nbd_send_req) to within the
> semaphore protected region would fix this particular case. An even
> larger race window exists with the request getting free'd when
> nbd-client is used to disconnect in which it calls NBD_CLEAR_QUE before
> NBD_DISCONNECT [2]. In this case, moving the errors increment doesn't
> help of course since the nbd_clear_queue in 2.6.0-test2 doesn't bother
> to check the tx_lock semaphore anyway. I believe reference counting the
> request (as you suggest) would protect against both these windows though.

> Will you be working on closing the other clear-queue race also then?

Here's the patch to fix up several race conditions in nbd. It requires
reverting the already included (but admittedly incomplete)
nbd-race-fix.patch that's in -mm5.

Andrew, please apply.

Thanks,
Paul