* [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
@ 2021-02-12 16:41 Greg Kurz
From: Greg Kurz @ 2021-02-12 16:41 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, Cédric Le Goater, Greg Kurz,
	lvivier, stable

Depending on the number of online CPUs in the original kernel, it is
likely that CPU #0 is offline in a kdump kernel. The IRQs associated
with the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as is by design for managed IRQs.
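
To illustrate (a rough sketch of the generic IRQ core behaviour, from
memory, not verbatim code): when none of the CPUs in a managed IRQ's
pre-computed mask is online, the startup path leaves the IRQ shut down:

	/* Sketch: managed-IRQ startup decision (see kernel/irq/chip.c) */
	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
		/*
		 * No online CPU in the managed affinity mask: leave the
		 * IRQ shut down until CPU hotplug brings one online.
		 */
		return IRQ_STARTUP_ABORT;
	}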

This can be a problem with multi-queue block devices driven by blk-mq:
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()), which
causes the device to remain silent and likely hangs the guest at some
point.
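
For reference, the single-queue enforcement in blk_mq_alloc_tag_set()
looks roughly like this (paraphrased from memory, not a verbatim quote):

	/*
	 * kdump runs in a memory constrained environment, so blk-mq
	 * falls back to a single hardware queue with a small depth.
	 */
	if (is_kdump_kernel()) {
		set->nr_hw_queues = 1;
		set->nr_maps = 1;
		set->queue_depth = min(64U, set->queue_depth);
	}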

This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.
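
For completeness, the XICS workaround boils down to something like the
sketch below (names and placement are approximate, from memory):

	/*
	 * Sketch: "noirqdistrib" disables interrupt distribution, so
	 * every IRQ stays on the default server, which is always online.
	 */
	static int distribute_irqs = 1;

	static int __init xics_noirqdistrib_setup(char *str)
	{
		distribute_irqs = 0;
		return 1;
	}
	__setup("noirqdistrib", xics_noirqdistrib_setup);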

The issue comes from a combination of factors:
- a discrepancy between the number of queues detected by the multi-queue
  block driver, which was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU #0 being offline (i.e. kdump always succeeds with CPU #0 online)

Given that I couldn't reproduce this on x86, which seems to always have
CPU #0 online even during kdump, I'm not sure where this should be
fixed. Hence going for another approach: fine-grained affinity is about
performance and we don't really care about that during kdump. Simply
revert to the previous working behavior of ignoring affinity masks in
this case only.

Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: lvivier@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index b3ac2455faad..29d04b83288d 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}
 
-		virq = irq_create_mapping_affinity(NULL, hwirq,
-						   entry->affinity);
+		/*
+		 * Depending on the number of online CPUs in the original
+		 * kernel, it is likely for CPU #0 to be offline in a kdump
+		 * kernel. The associated IRQs in the affinity mappings
+		 * provided by irq_create_affinity_masks() are thus not
+		 * started by irq_startup(), as per-design for managed IRQs.
+		 * This can be a problem with multi-queue block devices driven
+		 * by blk-mq : such a non-started IRQ is very likely paired
+		 * with the single queue enforced by blk-mq during kdump (see
+		 * blk_mq_alloc_tag_set()). This causes the device to remain
+		 * silent and likely hangs the guest at some point.
+		 *
+		 * We don't really care for fine-grained affinity when doing
+		 * kdump actually : simply ignore the pre-computed affinity
+		 * masks in this case and let the default mask with all CPUs
+		 * be used when creating the IRQ mappings.
+		 */
+		if (is_kdump_kernel())
+			virq = irq_create_mapping(NULL, hwirq);
+		else
+			virq = irq_create_mapping_affinity(NULL, hwirq,
+							   entry->affinity);
 
 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.26.2


* Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
From: Laurent Vivier @ 2021-02-12 16:51 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, Cédric Le Goater, stable

On 12/02/2021 17:41, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
> 
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
> 
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
> 
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeed with CPU#0)
> 
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
> 
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
> Cc: lvivier@redhat.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>  
> -		virq = irq_create_mapping_affinity(NULL, hwirq,
> -						   entry->affinity);
> +		/*
> +		 * Depending on the number of online CPUs in the original
> +		 * kernel, it is likely for CPU #0 to be offline in a kdump
> +		 * kernel. The associated IRQs in the affinity mappings
> +		 * provided by irq_create_affinity_masks() are thus not
> +		 * started by irq_startup(), as per-design for managed IRQs.
> +		 * This can be a problem with multi-queue block devices driven
> +		 * by blk-mq : such a non-started IRQ is very likely paired
> +		 * with the single queue enforced by blk-mq during kdump (see
> +		 * blk_mq_alloc_tag_set()). This causes the device to remain
> +		 * silent and likely hangs the guest at some point.
> +		 *
> +		 * We don't really care for fine-grained affinity when doing
> +		 * kdump actually : simply ignore the pre-computed affinity
> +		 * masks in this case and let the default mask with all CPUs
> +		 * be used when creating the IRQ mappings.
> +		 */
> +		if (is_kdump_kernel())
> +			virq = irq_create_mapping(NULL, hwirq);
> +		else
> +			virq = irq_create_mapping_affinity(NULL, hwirq,
> +							   entry->affinity);
>  
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> 

Reviewed-by: Laurent Vivier <lvivier@redhat.com>


* Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
From: Cédric Le Goater @ 2021-02-12 17:27 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman; +Cc: linuxppc-dev, linux-kernel, lvivier, stable

On 2/12/21 5:41 PM, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
> 
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
> 
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
> 
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeed with CPU#0)
> 
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
> 
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
> Cc: lvivier@redhat.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Greg Kurz <groug@kaod.org>


Reviewed-by: Cédric Le Goater <clg@kaod.org>

Thanks for tracking this issue. 

This layer needs a rework. Patches adding a MSI domain should be ready 
in a couple of releases. Hopefully. 

C. 

> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>  
> -		virq = irq_create_mapping_affinity(NULL, hwirq,
> -						   entry->affinity);
> +		/*
> +		 * Depending on the number of online CPUs in the original
> +		 * kernel, it is likely for CPU #0 to be offline in a kdump
> +		 * kernel. The associated IRQs in the affinity mappings
> +		 * provided by irq_create_affinity_masks() are thus not
> +		 * started by irq_startup(), as per-design for managed IRQs.
> +		 * This can be a problem with multi-queue block devices driven
> +		 * by blk-mq : such a non-started IRQ is very likely paired
> +		 * with the single queue enforced by blk-mq during kdump (see
> +		 * blk_mq_alloc_tag_set()). This causes the device to remain
> +		 * silent and likely hangs the guest at some point.
> +		 *
> +		 * We don't really care for fine-grained affinity when doing
> +		 * kdump actually : simply ignore the pre-computed affinity
> +		 * masks in this case and let the default mask with all CPUs
> +		 * be used when creating the IRQ mappings.
> +		 */
> +		if (is_kdump_kernel())
> +			virq = irq_create_mapping(NULL, hwirq);
> +		else
> +			virq = irq_create_mapping_affinity(NULL, hwirq,
> +							   entry->affinity);
>  
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> 


* Re: [PATCH] powerpc/pseries: Don't enforce MSI affinity with kdump
From: kernel test robot @ 2021-02-12 21:05 UTC (permalink / raw)
  To: Greg Kurz, Michael Ellerman
  Cc: kbuild-all, linuxppc-dev, linux-kernel, Cédric Le Goater,
	Greg Kurz, lvivier, stable

[-- Attachment #1: Type: text/plain, Size: 6268 bytes --]

Hi Greg,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master v5.11-rc7 next-20210211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Greg-Kurz/powerpc-pseries-Don-t-enforce-MSI-affinity-with-kdump/20210213-004658
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/1e5f7523fcfc57ab9437b8c7b29a974b62bde79d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Greg-Kurz/powerpc-pseries-Don-t-enforce-MSI-affinity-with-kdump/20210213-004658
        git checkout 1e5f7523fcfc57ab9437b8c7b29a974b62bde79d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/pseries/msi.c: In function 'rtas_setup_msi_irqs':
>> arch/powerpc/platforms/pseries/msi.c:478:7: error: implicit declaration of function 'is_kdump_kernel' [-Werror=implicit-function-declaration]
     478 |   if (is_kdump_kernel())
         |       ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
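
A likely follow-up (an assumption, not something tested here):
is_kdump_kernel() is declared in linux/crash_dump.h, so the patch
probably just needs an extra include at the top of
arch/powerpc/platforms/pseries/msi.c:

	#include <linux/crash_dump.h>	/* for is_kdump_kernel() */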


vim +/is_kdump_kernel +478 arch/powerpc/platforms/pseries/msi.c

   369	
   370	static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
   371	{
   372		struct pci_dn *pdn;
   373		int hwirq, virq, i, quota, rc;
   374		struct msi_desc *entry;
   375		struct msi_msg msg;
   376		int nvec = nvec_in;
   377		int use_32bit_msi_hack = 0;
   378	
   379		if (type == PCI_CAP_ID_MSIX)
   380			rc = check_req_msix(pdev, nvec);
   381		else
   382			rc = check_req_msi(pdev, nvec);
   383	
   384		if (rc)
   385			return rc;
   386	
   387		quota = msi_quota_for_device(pdev, nvec);
   388	
   389		if (quota && quota < nvec)
   390			return quota;
   391	
   392		if (type == PCI_CAP_ID_MSIX && check_msix_entries(pdev))
   393			return -EINVAL;
   394	
   395		/*
   396		 * Firmware currently refuse any non power of two allocation
   397		 * so we round up if the quota will allow it.
   398		 */
   399		if (type == PCI_CAP_ID_MSIX) {
   400			int m = roundup_pow_of_two(nvec);
   401			quota = msi_quota_for_device(pdev, m);
   402	
   403			if (quota >= m)
   404				nvec = m;
   405		}
   406	
   407		pdn = pci_get_pdn(pdev);
   408	
   409		/*
   410		 * Try the new more explicit firmware interface, if that fails fall
   411		 * back to the old interface. The old interface is known to never
   412		 * return MSI-Xs.
   413		 */
   414	again:
   415		if (type == PCI_CAP_ID_MSI) {
   416			if (pdev->no_64bit_msi) {
   417				rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
   418				if (rc < 0) {
   419					/*
   420					 * We only want to run the 32 bit MSI hack below if
   421					 * the max bus speed is Gen2 speed
   422					 */
   423					if (pdev->bus->max_bus_speed != PCIE_SPEED_5_0GT)
   424						return rc;
   425	
   426					use_32bit_msi_hack = 1;
   427				}
   428			} else
   429				rc = -1;
   430	
   431			if (rc < 0)
   432				rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, nvec);
   433	
   434			if (rc < 0) {
   435				pr_debug("rtas_msi: trying the old firmware call.\n");
   436				rc = rtas_change_msi(pdn, RTAS_CHANGE_FN, nvec);
   437			}
   438	
   439			if (use_32bit_msi_hack && rc > 0)
   440				rtas_hack_32bit_msi_gen2(pdev);
   441		} else
   442			rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
   443	
   444		if (rc != nvec) {
   445			if (nvec != nvec_in) {
   446				nvec = nvec_in;
   447				goto again;
   448			}
   449			pr_debug("rtas_msi: rtas_change_msi() failed\n");
   450			return rc;
   451		}
   452	
   453		i = 0;
   454		for_each_pci_msi_entry(entry, pdev) {
   455			hwirq = rtas_query_irq_number(pdn, i++);
   456			if (hwirq < 0) {
   457				pr_debug("rtas_msi: error (%d) getting hwirq\n", rc);
   458				return hwirq;
   459			}
   460	
   461			/*
   462			 * Depending on the number of online CPUs in the original
   463			 * kernel, it is likely for CPU #0 to be offline in a kdump
   464			 * kernel. The associated IRQs in the affinity mappings
   465			 * provided by irq_create_affinity_masks() are thus not
   466			 * started by irq_startup(), as per-design for managed IRQs.
   467			 * This can be a problem with multi-queue block devices driven
   468			 * by blk-mq : such a non-started IRQ is very likely paired
   469			 * with the single queue enforced by blk-mq during kdump (see
   470			 * blk_mq_alloc_tag_set()). This causes the device to remain
   471			 * silent and likely hangs the guest at some point.
   472			 *
   473			 * We don't really care for fine-grained affinity when doing
   474			 * kdump actually : simply ignore the pre-computed affinity
   475			 * masks in this case and let the default mask with all CPUs
   476			 * be used when creating the IRQ mappings.
   477			 */
 > 478			if (is_kdump_kernel())
   479				virq = irq_create_mapping(NULL, hwirq);
   480			else
   481				virq = irq_create_mapping_affinity(NULL, hwirq,
   482								   entry->affinity);
   483	
   484			if (!virq) {
   485				pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
   486				return -ENOSPC;
   487			}
   488	
   489			dev_dbg(&pdev->dev, "rtas_msi: allocated virq %d\n", virq);
   490			irq_set_msi_desc(virq, entry);
   491	
   492			/* Read config space back so we can restore after reset */
   493			__pci_read_msi_msg(entry, &msg);
   494			entry->msg = msg;
   495		}
   496	
   497		return 0;
   498	}
   499	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 72574 bytes --]
