On Monday, 08.06.2020, 11:24 +0900, Tetsuo Handa wrote:

Hi,

sorry for the late reply. I have had an emergency to take care of.

> On 2020/05/31 0:47, Alan Stern wrote:
> > On Sat, May 30, 2020 at 05:25:11PM +0200, Oliver Neukum wrote:
> > > On Thursday, 28.05.2020, 16:58 -0400, Alan Stern wrote:
> > > > This sounds like a bug in the driver.  What would it do if someone had a
> > >
> > > Arguably yes. I will introduce a timeout. Unfortunately flush()
> > > requires a non-interruptible sleep, as you cannot sanely return EAGAIN.
> >
> > But maybe you can kill some URBs and drop some data.
>
> You mean call usb_kill_urb() via kill_urbs() ?

I have to correct myself. We can return -EINTR. But that is ultimately
no solution. We could not close the fd, though we would not hang.

> As far as I tested, it seems that usb_kill_urb() sometimes fails to call
> wdm_out_callback() despite the comment for usb_kill_urb() says
>
>  * This routine cancels an in-progress request.  It is guaranteed that
>  * upon return all completion handlers will have finished and the URB
>  * will be totally idle and available for reuse.  These features make
>  * this an ideal way to stop I/O in a disconnect() callback or close()
>  * function.  If the request has not already finished or been unlinked
>  * the completion handler will see urb->status == -ENOENT.

It looks like it does exactly as the description says. Cancelling an URB
is by necessity a race condition. It can always finish before you can
kill it.

> . Is something still wrong? Or just replacing
>
>   BUG_ON(test_bit(WDM_IN_USE, &desc->flags) &&
>          !test_bit(WDM_DISCONNECTING, &desc->flags));
>
> with
>
>   wait_event(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
>              test_bit(WDM_DISCONNECTING, &desc->flags));
>
> in the patch shown below is sufficient?
>
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index e3db6fbeadef..3e92e79ce0a0 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -151,7 +151,7 @@ static void wdm_out_callback(struct urb *urb)
>  	kfree(desc->outbuf);
>  	desc->outbuf = NULL;
>  	clear_bit(WDM_IN_USE, &desc->flags);
> -	wake_up(&desc->wait);
> +	wake_up_all(&desc->wait);
>  }
>
>  static void wdm_in_callback(struct urb *urb)
> @@ -424,6 +424,7 @@ static ssize_t wdm_write
>  	if (rv < 0) {
>  		desc->outbuf = NULL;
>  		clear_bit(WDM_IN_USE, &desc->flags);
> +		wake_up_all(&desc->wait);
>  		dev_err(&desc->intf->dev, "Tx URB error: %d\n", rv);
>  		rv = usb_translate_errors(rv);
>  		goto out_free_mem_pm;
> @@ -587,15 +588,16 @@ static int wdm_flush(struct file *file, fl_owner_t id)
>  {
>  	struct wdm_device *desc = file->private_data;
>
> -	wait_event(desc->wait,
> -			/*
> -			 * needs both flags. We cannot do with one
> -			 * because resetting it would cause a race
> -			 * with write() yet we need to signal
> -			 * a disconnect
> -			 */
> -			!test_bit(WDM_IN_USE, &desc->flags) ||
> -			test_bit(WDM_DISCONNECTING, &desc->flags));
> +	/*
> +	 * needs both flags. We cannot do with one because resetting it would
> +	 * cause a race with write() yet we need to signal a disconnect
> +	 */
> +	if (!wait_event_timeout(desc->wait, !test_bit(WDM_IN_USE, &desc->flags) ||
> +				test_bit(WDM_DISCONNECTING, &desc->flags), 20 * HZ)) {
> +		kill_urbs(desc);

No. We cannot just kill all URBs just because one fd's owner wants to flush.
In fact we have multiple code paths that can reach the same hang. Could you
test the attached patches?

	Regards
		Oliver
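
For illustration only, the "wait on both flags, but give up after a
timeout" idea discussed in this thread can be sketched in userspace with
pthreads. This is not the driver code and not the attached patches: the
names flush_state, flush_wait() and flush_complete() are hypothetical, the
two booleans merely stand in for WDM_IN_USE and WDM_DISCONNECTING, and
pthread_cond_timedwait() is used as a rough analogue of
wait_event_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in; not the layout of struct wdm_device. */
struct flush_state {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	bool in_use;        /* plays the role of WDM_IN_USE */
	bool disconnecting; /* plays the role of WDM_DISCONNECTING */
};

static void flush_state_init(struct flush_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->wait, NULL);
	s->in_use = false;
	s->disconnecting = false;
}

/*
 * Wait until the write is no longer in flight or a disconnect is
 * signalled. Returns 0 on success, -ETIMEDOUT if neither happened
 * within timeout_ms. The caller decides what to do on timeout.
 */
static int flush_wait(struct flush_state *s, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* Compute an absolute deadline, as pthread_cond_timedwait() needs. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	/* Needs both flags, mirroring the condition quoted above. */
	while (s->in_use && !s->disconnecting) {
		if (pthread_cond_timedwait(&s->wait, &s->lock,
					   &deadline) == ETIMEDOUT) {
			/* Re-check once: completion may have raced the timeout. */
			if (s->in_use && !s->disconnecting)
				rc = -ETIMEDOUT;
			break;
		}
	}
	pthread_mutex_unlock(&s->lock);
	return rc;
}

/* Completion path: clear the in-flight flag and wake every waiter. */
static void flush_complete(struct flush_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->in_use = false;
	pthread_cond_broadcast(&s->wait);
	pthread_mutex_unlock(&s->lock);
}
```

The re-check after the timeout reflects the point made above that
completion can always win the race. What to do on -ETIMEDOUT is
deliberately left to the caller, since cancelling everything on behalf of
one waiter is exactly what is objected to in the reply.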