Date: Wed, 31 Jul 2013 12:09:45 -0400
From: Vivek Goyal
To: Bjorn Helgaas
Cc: Takao Indoh, "linux-kernel@vger.kernel.org", "linux-pci@vger.kernel.org", "open list:INTEL IOMMU (VT-d)", "kexec@lists.infradead.org", "ishii.hironobu@jp.fujitsu.com", Don Dutile, "Sumner, William", "alex.williamson@redhat.com", Haren Myneni
Subject: Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA
Message-ID: <20130731160944.GA1577@redhat.com>

On Thu, Jul 25, 2013 at 11:00:46AM -0600, Bjorn Helgaas wrote:
> On Wed, Jul 24, 2013 at 12:29 AM, Takao Indoh wrote:
> > Sorry for letting this discussion slide, I was busy on other works:-(
> > Anyway, the summary of previous discussion is:
> > - My patch adds new initcall(fs_initcall) to reset all PCIe endpoints on
> >   boot. This expects PCI enumeration is done before IOMMU
> >   initialization as follows.
> >     (1) PCI enumeration
> >     (2) fs_initcall ---> device reset
> >     (3) IOMMU initialization
> > - This works on x86, but does not work on other architecture because
> >   IOMMU is initialized before PCI enumeration on some architectures. So,
> >   device reset should be done where IOMMU is initialized instead of
> >   initcall.
> > - Or, as another idea, we can reset devices in first kernel(panic kernel)
> >
> > Resetting devices in panic kernel is against kdump policy and seems not to
> > be good idea. So I think adding reset code into iommu initialization is
> > better. I'll post patches for that.
>
> Of course nobody *wants* to do anything in the panic kernel. But
> simply saying "it's against kdump policy and seems not to be a good
> idea" is not a technical argument. There are things that are
> impractical to do in the kdump kernel, so they have to be done in the
> panic kernel even though we know the kernel is unreliable and the
> attempt may fail.

I think resetting all devices in the crashed kernel is really a lot of code.
If there is a small piece of code, it can still be considered.

I don't know much about IOMMU or PCI or PCIe. But I am taking one step
back to discuss again the idea of not resetting the IOMMU in the second
kernel.

I think resetting the bus is a good idea, but just resetting PCIe will
solve only part of the problem, and we will have the same issues with
devices on other buses.

So what sounds more appealing is if we could fix this particular problem
at the IOMMU level first (and continue to develop patches for resetting
various buses).

In the past, ideas have also been proposed to continue using the
translation table from the first kernel: retain those mappings and don't
reset the IOMMU. Reserve some space for kdump mappings in the first kernel
and use that reserved mapping space in the second kernel. It never got
implemented, though.

Bjorn, so what's the fundamental problem with this idea?

Also, what's wrong with a DMAR error? If some device tried to do DMA, and
the DMA was blocked because the IOMMU got reset and the mappings are no
longer there, why does it lead to failure? Shouldn't we just rate limit
error messages in such a case? If the device is needed, the driver will
reset it anyway.

The other problem mentioned in this thread is PCI SERR. What is it?
Is it some kind of error a device reports if it can't do DMA successfully?
Can these errors simply be ignored in the kdump kernel? This problem sounds
similar to a device keeping an interrupt asserted in the second kernel,
where the kernel simply disables the interrupt line if nobody claims the
interrupt.

IOW, it feels to me that we should handle the issue (DMAR error) at the
IOMMU level first (instead of trying to make sure that by the time we
get to initialize the IOMMU, all devices in the system have been quiesced
and nobody is doing DMA).

Thanks
Vivek
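
To make the "just rate limit error messages" suggestion in the message above concrete, here is a minimal userspace sketch of a burst-per-interval limiter. It is only an illustration, not code from the patch under discussion or from the VT-d driver: struct fault_ratelimit, fault_ratelimit_init() and fault_ratelimit_allow() are hypothetical names, and time is an abstract counter rather than jiffies. The kernel already has a facility of this general shape (struct ratelimit_state, printk_ratelimited()), so real code would presumably use that instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical limiter for fault reports: at most `burst` messages are
 * allowed per `interval` time units; everything else is only counted.
 */
struct fault_ratelimit {
	uint64_t interval;     /* window length, in abstract time units */
	unsigned int burst;    /* messages allowed per window */
	uint64_t window_start; /* time at which the current window began */
	unsigned int printed;  /* messages allowed in the current window */
	unsigned int missed;   /* messages suppressed in the current window */
};

static void fault_ratelimit_init(struct fault_ratelimit *rl,
				 uint64_t interval, unsigned int burst)
{
	assert(interval > 0 && burst > 0);
	rl->interval = interval;
	rl->burst = burst;
	rl->window_start = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Returns true if a fault seen at time `now` may be logged. */
static bool fault_ratelimit_allow(struct fault_ratelimit *rl, uint64_t now)
{
	/* Start a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: stay quiet, but remember how much was dropped. */
	rl->missed++;
	return false;
}
```

A fault handler would call fault_ratelimit_allow() before printing each fault report and stay silent when it returns false. Note that this only keeps the log usable while stale DMA keeps faulting; it does not stop the device from issuing DMA.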