From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754845Ab3A2LLq (ORCPT ); Tue, 29 Jan 2013 06:11:46 -0500
Received: from h1446028.stratoserver.net ([85.214.92.142]:43384 "EHLO
	mail.ahsoftware.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751720Ab3A2LLo (ORCPT ); Tue, 29 Jan 2013 06:11:44 -0500
Message-ID: <5107AE4F.9000809@ahsoftware.de>
Date: Tue, 29 Jan 2013 12:11:11 +0100
From: Alexander Holler
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
To: Andrew Morton
CC: linux-kernel@vger.kernel.org, linux-fbdev@vger.kernel.org,
	Florian Tobias Schandinat, Bernie Thompson, Steve Glendinning,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/3 v2] fb: udlfb: fix hang at disconnect
References: <50F2A310.5010006@ahsoftware.de>
	<1359139768-32294-1-git-send-email-holler@ahsoftware.de>
	<1359139768-32294-2-git-send-email-holler@ahsoftware.de>
	<20130128162238.7fba92fe.akpm@linux-foundation.org>
	<51071E21.9030008@ahsoftware.de>
	<5107A5ED.7020009@ahsoftware.de>
In-Reply-To: <5107A5ED.7020009@ahsoftware.de>
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 29.01.2013 11:35, Alexander Holler wrote:
> On 29.01.2013 01:56, Alexander Holler wrote:
>> On 29.01.2013 01:22, Andrew Morton wrote:
>>> On Fri, 25 Jan 2013 19:49:27 +0100
>>> Alexander Holler wrote:
>>>
>>>> When a device is disconnected, the driver may hang waiting for
>>>> urbs it will never get. Fix this by using a timeout while waiting
>>>> for the semaphore in use.
>>>>
>>>> There is still a memory leak if a timeout happens, but at least the
>>>> driver now continues its disconnect routine.
>>>>
>>>> ...
>>>>
>>>> --- a/drivers/video/udlfb.c
>>>> +++ b/drivers/video/udlfb.c
>>>> @@ -1832,8 +1832,9 @@ static void dlfb_free_urb_list(struct dlfb_data *dev)
>>>>  	/* keep waiting and freeing, until we've got 'em all */
>>>>  	while (count--) {
>>>>
>>>> -		/* Getting interrupted means a leak, but ok at disconnect */
>>>> -		ret = down_interruptible(&dev->urbs.limit_sem);
>>>> +		/* Timeout likely occurs at disconnect (resulting in a leak) */
>>>> +		ret = down_timeout_killable(&dev->urbs.limit_sem,
>>>> +					    FREE_URB_TIMEOUT);
>>>>  		if (ret)
>>>>  			break;
>>>
>>> This is rather a hack. Do you have an understanding of the underlying
>>> bug? Why is the driver waiting for things which will never happen?
>
> To add a bit more explanation:
>
> I've experienced that bug after moving the fb-damage-handling into a
> workqueue (to make the driver usable as console). This has likely
> increased the chance that an urb gets missed when the usb-stack
> calls the (usb-)disconnect function of the driver. But I don't know for
> sure: I couldn't use the driver (as fbcon) before, so I don't really
> have a comparison.
>
> What currently happens here is something like this:
>
> fb -> damage -> workload which sends urb and waits for answer
> device disconnect -> dlfb_usb_disconnect() -> stall (no answer to the
> above urb)
>
> I don't know why the disconnect waits for all urbs. The code looks like
> it does that just to free the allocated memory. As I'm not very familiar
> with the usb-stack, I would have to read up on the urb-handling to
> find out how to free the memory otherwise.
>
> As the previous comment in the code suggests that urbs already got
> missed (on shutdown) before, I assume that even without my patch, which
> moved the damage handling into a workqueue, the problem could occur and
> then prevent a shutdown, as there is no timeout.
> As I've experienced that problem not only on disconnect but on
> shutdown too (no shutdown was possible), I have to assume that the
> previously used down_interruptible() didn't get a signal on shutdown
> (if the driver is used as fbcon); therefore I consider the timeout
> necessary.

To explain the problem on shutdown a bit further, I think the following
happens (usb and driver are statically linked and started by the kernel):

shutdown -> kill signal -> usb stack shuts down -> udlfb waits (forever)
for a kill or an urb which it doesn't get.

Maybe the sequence is different if the usb-stack and udlfb are built as
modules and/or udlfb is used only for X/fb. I'm not sure what actually
shuts down the usb-stack in such a case, but maybe more than one kill
signal gets thrown around.

Regards,

Alexander
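
For illustration only, here is a rough userspace analogue of the
wait-with-timeout loop discussed above. It is a sketch under stated
assumptions, not the udlfb code: a POSIX semaphore and sem_timedwait()
stand in for the kernel semaphore and the patch's
down_timeout_killable(), and the names buf_pool, pool_init, pool_take,
pool_free_all, down_timeout_ms, POOL_SIZE and FREE_TIMEOUT_MS are all
hypothetical. It is single-threaded, so the free list needs no lock here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdlib.h>
#include <time.h>

#define POOL_SIZE       4    /* hypothetical number of buffers ("urbs") */
#define FREE_TIMEOUT_MS 100  /* hypothetical value, not the driver's */

struct buf_pool {
	sem_t limit_sem;             /* counts buffers sitting in free_list */
	void *free_list[POOL_SIZE];
	int free_count;              /* entries currently in free_list */
	int total;                   /* buffers allocated at init */
};

/* Allocate the buffers; returns how many were actually allocated. */
static int pool_init(struct buf_pool *p)
{
	if (sem_init(&p->limit_sem, 0, 0) != 0)
		return -1;
	p->free_count = 0;
	for (int i = 0; i < POOL_SIZE; i++) {
		void *b = malloc(64);
		if (!b)
			break;
		p->free_list[p->free_count++] = b;
		sem_post(&p->limit_sem);
	}
	p->total = p->free_count;
	return p->total;
}

/* Take a buffer without blocking, like submitting an urb. */
static void *pool_take(struct buf_pool *p)
{
	if (sem_trywait(&p->limit_sem) != 0)
		return NULL;
	assert(p->free_count > 0);
	return p->free_list[--p->free_count];
}

/* Wait for the semaphore for at most ms milliseconds; 0 on success. */
static int down_timeout_ms(sem_t *sem, long ms)
{
	struct timespec ts;
	int ret;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return -1;
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	/* Unlike a killable wait, this sketch simply retries on a signal. */
	while ((ret = sem_timedwait(sem, &ts)) == -1 && errno == EINTR)
		continue;
	return ret;
}

/* Keep waiting and freeing; give up (and leak the rest) on timeout. */
static int pool_free_all(struct buf_pool *p)
{
	int count = p->total;
	int freed = 0;

	while (count--) {
		if (down_timeout_ms(&p->limit_sem, FREE_TIMEOUT_MS) != 0)
			break;  /* a buffer never came back: stop waiting */
		assert(p->free_count > 0);
		free(p->free_list[--p->free_count]);
		freed++;
	}
	sem_destroy(&p->limit_sem);
	return freed;
}
```

If one buffer is taken with pool_take() and never returned (the
analogue of an urb whose completion never arrives after disconnect),
pool_free_all() frees the remaining ones, waits once for the timeout,
and then returns instead of blocking forever. The buffer still in
flight is leaked, which is the trade-off the patch description mentions.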