From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756913Ab2DXRfW (ORCPT ); Tue, 24 Apr 2012 13:35:22 -0400 Received: from mx1.redhat.com ([209.132.183.28]:5507 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756863Ab2DXRfU (ORCPT ); Tue, 24 Apr 2012 13:35:20 -0400 Message-ID: <4F96E44F.9060808@redhat.com> Date: Tue, 24 Apr 2012 13:35:11 -0400 From: Don Dutile User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111104 Red Hat/3.1.16-2.el6_1 Thunderbird/3.1.16 MIME-Version: 1.0 To: Bjorn Helgaas CC: "Rafael J. Wysocki" , Linus Torvalds , Mikko Vinni , "linux-input@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Dmitry Torokhov , Allen Kay , Jesse Barnes , "linux-pci@vger.kernel.org" Subject: Re: [PATCH] PCI: Fix regression in pci_restore_state() References: <1334310754.17013.YahooMailNeo@web161804.mail.bf1.yahoo.com> <201204152140.40747.rjw@sisk.pl> <201204232153.57351.rjw@sisk.pl> <4F95B67D.7020706@redhat.com> <4F96CEDB.3060300@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/24/2012 01:01 PM, Bjorn Helgaas wrote: > On Tue, Apr 24, 2012 at 10:03 AM, Don Dutile wrote: >> On 04/23/2012 06:33 PM, Bjorn Helgaas wrote: >>> >>> On Mon, Apr 23, 2012 at 2:07 PM, Don Dutile wrote: >>>> >>>> On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote: >>>>> >>>>> >>>>> On Monday, April 23, 2012, Bjorn Helgaas wrote: >>>>>> >>>>>> >>>>>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On Sunday, April 15, 2012, Linus Torvalds wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> mdelay(10) doesn't really look good either to me in this case, >>>>>>>>> though. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Oh, I agree. What kind of ass-backwards device actually needs that >>>>>>>> kind of crazy delays? It is almost certainly buggy. >>>>>>>> >>>>>>>> With retries, 10ms delays are totally unacceptable. There's something >>>>>>>> wrong. >>>>>>>> >>>>>>>> A single ms *may* be ok. >>>>>>>> >>>>>>>> Anyway, can you also split the actual "write _one_ register with >>>>>>>> retry" into a function of its own? The code looks like crap with >>>>>>>> those >>>>>>>> multiple levels of looping, with conditionals inside them etc. With a >>>>>>>> simple helper function, you could change the break into return, and >>>>>>>> it >>>>>>>> would look much better, I bet. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sure. It appears cleaner this way. >>>>>>> >>>>>>> --- >>>>>>> From: Rafael J. Wysocki >>>>>>> Subject: PCI: Fix regression in pci_restore_state(), v3 >>>>>>> >>>>>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1 >>>>>>> >>>>>>> PCI: check for pci bar restore completion and retry >>>>>>> >>>>>>> attempted to address problems with PCI BAR restoration on systems >>>>>>> where FLR had not been completed before pci_restore_state() was >>>>>>> called, but it did that in an utterly wrong way. >>>>>>> >>>>>>> First off, instead of retrying the writes for the BAR registers >>>>>>> only, it did that for all of the PCI config space of the device, >>>>>>> including the status register (whose value after the write quite >>>>>>> obviously need not be the same as the written one). Second, it >>>>>>> added arbitrary delay to pci_restore_state() even for systems >>>>>>> where the PCI config space restoration was successful at first >>>>>>> attempt. Finally, the mdelay(10) it added to every iteration of the >>>>>>> writing loop was way too much of a delay for any reasonable device. >>>>>>> >>>>>>> All of this actually caused resume failures for some devices on >>>>>>> the Mikko's system. >>>>>>> >>>>>>> To fix the regression, make pci_restore_state() only retry the >>>>>>> writes for BAR registers and only wait if the first read from >>>>>>> the register doesn't return the written value. Additionaly, make >>>>>>> it wait for 1 ms, instead of 10 ms, after every failing attempt >>>>>>> to write into config space. >>>>>>> >>>>>>> Reported-by: Mikko Vinni >>>>>>> Signed-off-by: Rafael J. Wysocki >>>>>>> --- >>>>>>> drivers/pci/pci.c | 57 >>>>>>> ++++++++++++++++++++++++++++++++++++------------------ >>>>>>> 1 file changed, 39 insertions(+), 18 deletions(-) >>>>>>> >>>>>>> Index: linux/drivers/pci/pci.c >>>>>>> =================================================================== >>>>>>> --- linux.orig/drivers/pci/pci.c >>>>>>> +++ linux/drivers/pci/pci.c >>>>>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev) >>>>>>> return 0; >>>>>>> } >>>>>>> >>>>>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int >>>>>>> offset, >>>>>>> + u32 saved_val, int retry) >>>>>>> +{ >>>>>>> + u32 val; >>>>>>> + >>>>>>> + pci_read_config_dword(pdev, offset,&val); >>>>>>> + if (val == saved_val) >>>>>>> + return; >>>>>>> + >>>>>>> + for (;;) { >>>>>>> + dev_dbg(&pdev->dev, "restoring config space at offset >>>>>>> " >>>>>>> + "%#x (was %#x, writing %#x)\n", offset, val, >>>>>>> saved_val); >>>>>>> + pci_write_config_dword(pdev, offset, saved_val); >>>>>>> + if (retry--<= 0) >>>>>>> + return; >>>>>>> + >>>>>>> + pci_read_config_dword(pdev, offset,&val); >>>>>>> + if (val == saved_val) >>>>>>> + return; >>>>>>> + >>>>>>> + mdelay(1); >>>>>>> + } >>>>>>> +} >>>>>>> + >>>>>>> +static void pci_restore_config_space(struct pci_dev *pdev, int start, >>>>>>> int end, >>>>>>> + int retry) >>>>>>> +{ >>>>>>> + int index; >>>>>>> + >>>>>>> + for (index = end; index>= start; index--) >>>>>>> + pci_restore_config_dword(pdev, 4 * index, >>>>>>> + >>>>>>> pdev->saved_config_space[index], >>>>>>> + retry); >>>>>>> +} >>>>>>> + >>>>>>> /** >>>>>>> * pci_restore_state - Restore the saved state of a PCI device >>>>>>> * @dev: - PCI device that we're dealing with >>>>>>> */ >>>>>>> void pci_restore_state(struct pci_dev *dev) >>>>>>> { >>>>>>> - int i; >>>>>>> - u32 val; >>>>>>> - int tries; >>>>>>> - >>>>>>> if (!dev->state_saved) >>>>>>> return; >>>>>>> >>>>>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d >>>>>>> pci_restore_pcie_state(dev); >>>>>>> pci_restore_ats_state(dev); >>>>>>> >>>>>>> + pci_restore_config_space(dev, 10, 15, 0); >>>>>>> /* >>>>>>> * The Base Address register should be programmed before the >>>>>>> command >>>>>>> * register(s) >>>>>>> */ >>>>>>> - for (i = 15; i>= 0; i--) { >>>>>>> - pci_read_config_dword(dev, i * 4,&val); >>>>>>> - tries = 10; >>>>>>> - while (tries&& val != dev->saved_config_space[i]) { >>>>>>> >>>>>>> - dev_dbg(&dev->dev, "restoring config " >>>>>>> - "space at offset %#x (was %#x, writing >>>>>>> %#x)\n", >>>>>>> - i, val, >>>>>>> (int)dev->saved_config_space[i]); >>>>>>> - pci_write_config_dword(dev,i * 4, >>>>>>> - dev->saved_config_space[i]); >>>>>>> - pci_read_config_dword(dev, i * 4,&val); >>>>>>> - mdelay(10); >>>>>>> - tries--; >>>>>>> - } >>>>>>> - } >>>>>>> + pci_restore_config_space(dev, 4, 9, 10); >>>>>>> + pci_restore_config_space(dev, 0, 3, 0); >>>>>>> + >>>>>>> pci_restore_pcix_state(dev); >>>>>>> pci_restore_msi_state(dev); >>>>>>> pci_restore_iov_state(dev); >>>>>> >>>>>> >>>>>> >>>>>> I'd feel better about this if there were a way to delay in the FLR >>>>>> path instead. If we delay in the restore path, we're only fixing one >>>>>> of the many ways config space can be written. Other paths that write >>>>>> config space will just silently fail. >>>>>> >>>>>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR >>>>>> value for Completion Timeout," but I don't see anything that looks >>>>>> like that in pcie_flr() or pci_af_flr(). Are there any other direct >>>>>> ways we can detect when the FLR is complete? >>>>> >>>>> >>>>> >>>>> I'm not aware of any. >>>>> >>>>> Thanks, >>>>> Rafael >>>> >>>> >>>> I don't think so, either. >>>> I believe an ECN is being worked in the PCI-SIG >>>> to add such a notification, though. >>>> Even if adopted, need to wait for another crank of the hw before >>>> the notification can be used. >>> >>> >>> I agree, we can't do something that works only on new hardware -- we >>> have to make the existing hardware in the field work. >>> >>> What about the "waiting for as much time as the pre-FLR value for >>> Completion Timeout" part? >>> >>> Or can we do something like asserting FLR, sleeping 100ms, then >>> attempting a write to something in config space and retrying until it >>> sticks? It's kludgy, but I'm not sure it's any worse than putting the >>> retries in the restore path, and it would have the advantage that >>> other writers of config space wouldn't have to worry. >> >> Depending on system config, reading a port that is being FLR'd >> can cause AERs, which if a driver is registered for the endpoint, >> it will get AERs reported to the driver and potentially complicate the >> FLRhandling. >> >> This implies a hook to temp-disable AER during FLR, then turning it >> back on (hw&/or sw). > > Don't we have the same possibility of causing AERs if we add this > retry stuff in pci_restore_state()? It sounds like the same > possibility exists regardless of whether the config access happens in > the FLR path or the restore path. Dont know; have to re-read thread on where/when pci_restore_state() is recalled wrt link synch/enable state(s).