All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Add machanism to limit msi allocation for Loongson
@ 2023-05-27  5:46 Huacai Chen
  2023-05-27  5:46 ` [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs() Huacai Chen
  2023-05-27  5:46 ` [PATCH 2/2] irqchip/loongson-pch-msi: Add machanism to limit msi allocation Huacai Chen
  0 siblings, 2 replies; 17+ messages in thread
From: Huacai Chen @ 2023-05-27  5:46 UTC (permalink / raw)
  To: Thomas Gleixner, Marc Zyngier, Bjorn Helgaas
  Cc: linux-kernel, loongson-kernel, Xuefeng Li, Huacai Chen,
	Jiaxun Yang, Huacai Chen, Juxin Gao

Loongson machines can have as many as 256 logical cpus, but the maximum
of msi vectors in one irqchip is also 256 (practically that is less than
256, because pch-pic consumes some of them). Even on a 64-core machine,
256 irqs can be easily exhausted if there are several NICs (NICs usually
allocate msi irqs depending on the number of online cpus). So we want to
limit the msi allocation.

Patch-1 adjusts the return value semanteme of msi_domain_prepare_irqs(),
allowing us to modify the input "nvec" by overriding the msi_domain_ops
::msi_prepare().
    
Patch-2 adds a machanism to limit msi allocation:
1, Modify input "nvec" by overriding the msi_domain_ops::msi_prepare();
2, The default limit is 256, which is compatible with the old behavior;
3, Add a cmdline parameter "loongson_msi_limit=xxx" to control the limit.

Huacai Chen and Juxin Gao(2):
 PCI: Omit pci_disable_device() in .shutdown().
 PCI: loongson: Improve the MRRS quirk for LS7A.

Signed-off-by: Juxin Gao <gaojuxin@loongson.cn> 
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
2.27.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-27  5:46 [PATCH 0/2] Add machanism to limit msi allocation for Loongson Huacai Chen
@ 2023-05-27  5:46 ` Huacai Chen
  2023-05-27 14:03   ` Thomas Gleixner
  2023-05-27  5:46 ` [PATCH 2/2] irqchip/loongson-pch-msi: Add machanism to limit msi allocation Huacai Chen
  1 sibling, 1 reply; 17+ messages in thread
From: Huacai Chen @ 2023-05-27  5:46 UTC (permalink / raw)
  To: Thomas Gleixner, Marc Zyngier, Bjorn Helgaas
  Cc: linux-kernel, loongson-kernel, Xuefeng Li, Huacai Chen,
	Jiaxun Yang, Huacai Chen

Adjust the return value semanteme of msi_domain_prepare_irqs(), which
allows us to modify the input nvec by overriding the msi_domain_ops::
msi_prepare(). This is necessary for the later patch.

Before:
0 on success, others on error.

After:
= 0: Success;
> 0: The modified nvec;
< 0: Error code.

Callers are also updated.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/base/platform-msi.c |  2 +-
 kernel/irq/msi.c            | 10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index f37ad34c80ec..e4a517c144e7 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -298,7 +298,7 @@ __platform_msi_create_device_domain(struct device *dev,
 
 	platform_msi_set_proxy_dev(&data->arg);
 	err = msi_domain_prepare_irqs(domain->parent, dev, nvec, &data->arg);
-	if (err)
+	if (err < 0)
 		goto free_domain;
 
 	return domain;
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7a97bcb086bf..d151936aec05 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1058,6 +1058,12 @@ bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
 	return ret;
 }
 
+/*
+ * Return Val:
+ * = 0: Success;
+ * > 0: The modified nvec;
+ * < 0: Error code.
+ */
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {
@@ -1260,8 +1266,10 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 	int i, ret, virq;
 
 	ret = msi_domain_prepare_irqs(domain, dev, ctrl->nirqs, &arg);
-	if (ret)
+	if (ret < 0)
 		return ret;
+	if (ret > 0)
+		ctrl->nirqs = ret;
 
 	/*
 	 * This flag is set by the PCI layer as we need to activate
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] irqchip/loongson-pch-msi: Add machanism to limit msi allocation
  2023-05-27  5:46 [PATCH 0/2] Add machanism to limit msi allocation for Loongson Huacai Chen
  2023-05-27  5:46 ` [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs() Huacai Chen
@ 2023-05-27  5:46 ` Huacai Chen
  1 sibling, 0 replies; 17+ messages in thread
From: Huacai Chen @ 2023-05-27  5:46 UTC (permalink / raw)
  To: Thomas Gleixner, Marc Zyngier, Bjorn Helgaas
  Cc: linux-kernel, loongson-kernel, Xuefeng Li, Huacai Chen,
	Jiaxun Yang, Huacai Chen, Juxin Gao

Loongson machines can have as many as 256 logical cpus, but the maximum
of msi vectors in one irqchip is also 256 (practically that is less than
256, because pch-pic consumes some of them). Even on a 64-core machine,
256 irqs can be easily exhausted if there are several NICs (NICs usually
allocate msi irqs depending on the number of online cpus). So we want to
limit the msi allocation.

In this patch we add a machanism to limit msi allocation:
1, Modify input "nvec" by overriding the msi_domain_ops::msi_prepare();
2, The default limit is 256, which is compatible with the old behavior;
3, Add a cmdline parameter "loongson_msi_limit=xxx" to control the limit.

Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/irqchip/irq-loongson-pch-msi.c | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/irqchip/irq-loongson-pch-msi.c b/drivers/irqchip/irq-loongson-pch-msi.c
index 6e1e1f011bb2..85e2e3468b8c 100644
--- a/drivers/irqchip/irq-loongson-pch-msi.c
+++ b/drivers/irqchip/irq-loongson-pch-msi.c
@@ -85,9 +85,36 @@ static void pch_msi_compose_msi_msg(struct irq_data *data,
 	msg->data = data->hwirq;
 }
 
+#define DEFAULT_MSI_LIMITS 256
+
+static int pch_msi_limits = DEFAULT_MSI_LIMITS;
+
+static int __init pch_msi_limit(char *str)
+{
+	get_option(&str, &pch_msi_limits);
+
+	if (pch_msi_limits <= 0)
+		pch_msi_limits = DEFAULT_MSI_LIMITS;
+
+	return 0;
+}
+
+early_param("loongson_msi_limit", pch_msi_limit);
+
+static int pch_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg)
+{
+	memset(arg, 0, sizeof(*arg));
+	return clamp_val(nvec, 0, pch_msi_limits);
+}
+
+static struct msi_domain_ops pch_msi_ops = {
+	.msi_prepare	= pch_msi_prepare,
+};
+
 static struct msi_domain_info pch_msi_domain_info = {
 	.flags	= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
 		  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
+	.ops	= &pch_msi_ops,
 	.chip	= &pch_msi_irq_chip,
 };
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-27  5:46 ` [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs() Huacai Chen
@ 2023-05-27 14:03   ` Thomas Gleixner
  2023-05-28  3:42     ` Huacai Chen
  2023-05-28  7:47     ` Marc Zyngier
  0 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-27 14:03 UTC (permalink / raw)
  To: Huacai Chen, Marc Zyngier, Bjorn Helgaas
  Cc: linux-kernel, loongson-kernel, Xuefeng Li, Huacai Chen,
	Jiaxun Yang, Huacai Chen

On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> Adjust the return value semanteme of msi_domain_prepare_irqs(), which
> allows us to modify the input nvec by overriding the msi_domain_ops::
> msi_prepare(). This is necessary for the later patch.
>
> Before:
> 0 on success, others on error.
>
> After:
> = 0: Success;
>> 0: The modified nvec;
> < 0: Error code.

This explains what the patch does, but provides zero justification for
this nor any analysis why this is correct for the existing use cases.

That longsoon MSI domain is a PCI MSI domain. PCI/MSI has already a
mechanism to return the actual possible number of vectors if the
underlying space is exhausted.

Why is that not sufficient for your problem at hand?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-27 14:03   ` Thomas Gleixner
@ 2023-05-28  3:42     ` Huacai Chen
  2023-05-29  7:44       ` Thomas Gleixner
  2023-05-28  7:47     ` Marc Zyngier
  1 sibling, 1 reply; 17+ messages in thread
From: Huacai Chen @ 2023-05-28  3:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Huacai Chen, Marc Zyngier, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Hi, Thomas,

On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> > Adjust the return value semanteme of msi_domain_prepare_irqs(), which
> > allows us to modify the input nvec by overriding the msi_domain_ops::
> > msi_prepare(). This is necessary for the later patch.
> >
> > Before:
> > 0 on success, others on error.
> >
> > After:
> > = 0: Success;
> >> 0: The modified nvec;
> > < 0: Error code.
>
> This explains what the patch does, but provides zero justification for
> this nor any analysis why this is correct for the existing use cases.
I checked all msi_prepare() callbacks and none of them return positive
values now, so I think it is correct.

>
> That longsoon MSI domain is a PCI MSI domain. PCI/MSI has already a
> mechanism to return the actual possible number of vectors if the
> underlying space is exhausted.
>
> Why is that not sufficient for your problem at hand?
Hmm, maybe I should make things clearer. We want to do some proactive
throttling here. For example, if we have two NICs, we want both of
them to get 32 msi vectors, not one get 64 vectors, and  the other
fallback to use legacy irq.

Huacai
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-27 14:03   ` Thomas Gleixner
  2023-05-28  3:42     ` Huacai Chen
@ 2023-05-28  7:47     ` Marc Zyngier
  2023-05-28 12:07       ` Huacai Chen
  1 sibling, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2023-05-28  7:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Huacai Chen, Bjorn Helgaas, linux-kernel, loongson-kernel,
	Xuefeng Li, Huacai Chen, Jiaxun Yang

On Sat, 27 May 2023 15:03:29 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> > Adjust the return value semanteme of msi_domain_prepare_irqs(), which
> > allows us to modify the input nvec by overriding the msi_domain_ops::
> > msi_prepare(). This is necessary for the later patch.
> >
> > Before:
> > 0 on success, others on error.
> >
> > After:
> > = 0: Success;
> >> 0: The modified nvec;
> > < 0: Error code.
> 
> This explains what the patch does, but provides zero justification for
> this nor any analysis why this is correct for the existing use cases.
> 
> That longsoon MSI domain is a PCI MSI domain. PCI/MSI has already a
> mechanism to return the actual possible number of vectors if the
> underlying space is exhausted.
> 
> Why is that not sufficient for your problem at hand?

I've already made that point, but it seems that the argument is
falling on deaf ears.

Being able to allocate MSIs is not a guarantee, and is always
opportunistic. If some drivers badly fail because the they don't get
the number of MSIs they need, then they need fixing.

I really don't see the point in papering over this at the lowest level
of the stack.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-28  7:47     ` Marc Zyngier
@ 2023-05-28 12:07       ` Huacai Chen
  2023-05-29  9:27         ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Huacai Chen @ 2023-05-28 12:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Hi, Marc,

On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Sat, 27 May 2023 15:03:29 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> > > Adjust the return value semanteme of msi_domain_prepare_irqs(), which
> > > allows us to modify the input nvec by overriding the msi_domain_ops::
> > > msi_prepare(). This is necessary for the later patch.
> > >
> > > Before:
> > > 0 on success, others on error.
> > >
> > > After:
> > > = 0: Success;
> > >> 0: The modified nvec;
> > > < 0: Error code.
> >
> > This explains what the patch does, but provides zero justification for
> > this nor any analysis why this is correct for the existing use cases.
> >
> > That longsoon MSI domain is a PCI MSI domain. PCI/MSI has already a
> > mechanism to return the actual possible number of vectors if the
> > underlying space is exhausted.
> >
> > Why is that not sufficient for your problem at hand?
>
> I've already made that point, but it seems that the argument is
> falling on deaf ears.
I'm very sorry that I didn't answer your question directly.

>
> Being able to allocate MSIs is not a guarantee, and is always
> opportunistic. If some drivers badly fail because the they don't get
> the number of MSIs they need, then they need fixing.
Yes, I know allocating MSIs is not a guarantee, and most existing
drivers will fallback to use legacy irqs when failed. However, as I
replied in an early mail, we want to do some proactive throttling in
the loongson-pch-msi irqchip driver, rather than consume msi vectors
aggressively. For example, if we have two NICs, we want both of them
to get 32 msi vectors; not one exhaust all available vectors, and the
other fallback to use legacy irq.

I hope I have explained clearly, thanks.

Huacai

>
> I really don't see the point in papering over this at the lowest level
> of the stack.
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-28  3:42     ` Huacai Chen
@ 2023-05-29  7:44       ` Thomas Gleixner
  2023-05-29  9:35         ` Huacai Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-29  7:44 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Marc Zyngier, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

On Sun, May 28 2023 at 11:42, Huacai Chen wrote:
> On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
>> > After:
>> > = 0: Success;
>> >> 0: The modified nvec;
>> > < 0: Error code.
>>
>> This explains what the patch does, but provides zero justification for
>> this nor any analysis why this is correct for the existing use cases.
> I checked all msi_prepare() callbacks and none of them return positive
> values now, so I think it is correct.

Still you failed to tell so in the changelog. It's not helpful if you
think it is correct. The point is that you have to make clear why it
_IS_ correct.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-28 12:07       ` Huacai Chen
@ 2023-05-29  9:27         ` Thomas Gleixner
  2023-05-29  9:36           ` Huacai Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-29  9:27 UTC (permalink / raw)
  To: Huacai Chen, Marc Zyngier
  Cc: Huacai Chen, Bjorn Helgaas, linux-kernel, loongson-kernel,
	Xuefeng Li, Jiaxun Yang

On Sun, May 28 2023 at 20:07, Huacai Chen wrote:
> On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <maz@kernel.org> wrote:
>>
>> Being able to allocate MSIs is not a guarantee, and is always
>> opportunistic. If some drivers badly fail because the they don't get
>> the number of MSIs they need, then they need fixing.
>
> Yes, I know allocating MSIs is not a guarantee, and most existing
> drivers will fallback to use legacy irqs when failed. However, as I
> replied in an early mail, we want to do some proactive throttling in
> the loongson-pch-msi irqchip driver, rather than consume msi vectors
> aggressively. For example, if we have two NICs, we want both of them
> to get 32 msi vectors; not one exhaust all available vectors, and the
> other fallback to use legacy irq.

By default you allow up to 256 interrupts to be allocated, right? So to
prevent vector exhaustion, the admin needs to reboot the machine and set
a command line parameter to limit this, right? As that parameter is not
documented the admin is going to dice a number. That's impractical and
just a horrible bandaid.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-29  7:44       ` Thomas Gleixner
@ 2023-05-29  9:35         ` Huacai Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Huacai Chen @ 2023-05-29  9:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Huacai Chen, Marc Zyngier, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Hi, Thomas,

On Mon, May 29, 2023 at 3:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, May 28 2023 at 11:42, Huacai Chen wrote:
> > On Sat, May 27, 2023 at 10:03 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> On Sat, May 27 2023 at 13:46, Huacai Chen wrote:
> >> > After:
> >> > = 0: Success;
> >> >> 0: The modified nvec;
> >> > < 0: Error code.
> >>
> >> This explains what the patch does, but provides zero justification for
> >> this nor any analysis why this is correct for the existing use cases.
> > I checked all msi_prepare() callbacks and none of them return positive
> > values now, so I think it is correct.
>
> Still you failed to tell so in the changelog. It's not helpful if you
> think it is correct. The point is that you have to make clear why it
> _IS_ correct.
OK, I think I should add more information in the next version, thanks.

Huacai
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-29  9:27         ` Thomas Gleixner
@ 2023-05-29  9:36           ` Huacai Chen
  2023-05-29 20:19             ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Huacai Chen @ 2023-05-29  9:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Hi, Thomas,

On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, May 28 2023 at 20:07, Huacai Chen wrote:
> > On Sun, May 28, 2023 at 3:47 PM Marc Zyngier <maz@kernel.org> wrote:
> >>
> >> Being able to allocate MSIs is not a guarantee, and is always
> >> opportunistic. If some drivers badly fail because the they don't get
> >> the number of MSIs they need, then they need fixing.
> >
> > Yes, I know allocating MSIs is not a guarantee, and most existing
> > drivers will fallback to use legacy irqs when failed. However, as I
> > replied in an early mail, we want to do some proactive throttling in
> > the loongson-pch-msi irqchip driver, rather than consume msi vectors
> > aggressively. For example, if we have two NICs, we want both of them
> > to get 32 msi vectors; not one exhaust all available vectors, and the
> > other fallback to use legacy irq.
>
> By default you allow up to 256 interrupts to be allocated, right? So to
> prevent vector exhaustion, the admin needs to reboot the machine and set
> a command line parameter to limit this, right? As that parameter is not
> documented the admin is going to dice a number. That's impractical and
> just a horrible bandaid.
OK, I think I should update the documents in the new version.

Huacai
>
> Thanks,
>
>         tglx
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-29  9:36           ` Huacai Chen
@ 2023-05-29 20:19             ` Thomas Gleixner
  2023-05-30  8:34               ` Huacai Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-29 20:19 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Huacai!

On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> By default you allow up to 256 interrupts to be allocated, right? So to
>> prevent vector exhaustion, the admin needs to reboot the machine and set
>> a command line parameter to limit this, right? As that parameter is not
>> documented the admin is going to dice a number. That's impractical and
>> just a horrible bandaid.
>
> OK, I think I should update the documents in the new version.

Updating documentation neither makes it more practical (it still
requires a reboot) nor does it justify the abuse of the msi_prepare()
callback.

The only reason why this hack "works" is that there is a historical
mechanism which tells the PCI/MSI core that the number of requested
vectors cannot be allocated, but that there would be $N vectors
possible. But even that return value has no guarantee.

This mechanism is ill defined and really should go away.

Adding yet another way to limit this via msi_prepare() is just
proliferating this ill defined mechanism and I have zero interest in
that.

Let's take a step back and look at the larger picture:

 1) A PCI/MSI irqdomain is attached to a PCI bus  

 2) The number of PCI devices on that PCI bus is usually known at boot
    time _before_ the first device driver is probed.

    That's not entirely true for PCI hotplug devices, but that's hardly
    relevant for an architecture which got designed less than 10 years
    ago and the architects decided that 256 MSI vectors are good enough
    for up to 256 CPUs. The concept of per CPU queues was already known
    at that time, no?

So the irqdomain can tell the PCI/MSI core the maximum number of vectors
available for a particular bus, right?

The default, i.e if the irqdomain does not expose that information,
would be "unlimited", i.e. ULONG_MAX.

Now take that number and divide it by the number of devices on the bus
and you get at least a sensible limit which does not immediately cause
vector exhaustion.

That limit might be suboptimal if there are lots of other devices on
that bus which just require one or two vectors, but that's something
which can be optimized via a generic command line option or even a sysfs
mechanism.

Thanks,

        tglx






^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-29 20:19             ` Thomas Gleixner
@ 2023-05-30  8:34               ` Huacai Chen
  2023-05-30 12:22                 ` Thomas Gleixner
  2023-05-30 15:03                 ` Thomas Gleixner
  0 siblings, 2 replies; 17+ messages in thread
From: Huacai Chen @ 2023-05-30  8:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Hi, Thomas,

On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Huacai!
>
> On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> > On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> By default you allow up to 256 interrupts to be allocated, right? So to
> >> prevent vector exhaustion, the admin needs to reboot the machine and set
> >> a command line parameter to limit this, right? As that parameter is not
> >> documented the admin is going to dice a number. That's impractical and
> >> just a horrible bandaid.
> >
> > OK, I think I should update the documents in the new version.
>
> Updating documentation neither makes it more practical (it still
> requires a reboot) nor does it justify the abuse of the msi_prepare()
> callback.
>
> The only reason why this hack "works" is that there is a historical
> mechanism which tells the PCI/MSI core that the number of requested
> vectors cannot be allocated, but that there would be $N vectors
> possible. But even that return value has no guarantee.
>
> This mechanism is ill defined and really should go away.
>
> Adding yet another way to limit this via msi_prepare() is just
> proliferating this ill defined mechanism and I have zero interest in
> that.
>
> Let's take a step back and look at the larger picture:
>
>  1) A PCI/MSI irqdomain is attached to a PCI bus
>
>  2) The number of PCI devices on that PCI bus is usually known at boot
>     time _before_ the first device driver is probed.
>
>     That's not entirely true for PCI hotplug devices, but that's hardly
>     relevant for an architecture which got designed less than 10 years
>     ago and the architects decided that 256 MSI vectors are good enough
>     for up to 256 CPUs. The concept of per CPU queues was already known
>     at that time, no?
Does this solution depend on the per-device msi domain? Can we do that
if we use the global msi domain?

>
> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
> available for a particular bus, right?
>
> The default, i.e if the irqdomain does not expose that information,
> would be "unlimited", i.e. ULONG_MAX.
OK, thanks, but how to expose? By msi_domain_info::hwsize?

>
> Now take that number and divide it by the number of devices on the bus
> and you get at least a sensible limit which does not immediately cause
> vector exhaustion.
>
> That limit might be suboptimal if there are lots of other devices on
> that bus which just require one or two vectors, but that's something
> which can be optimized via a generic command line option or even a sysfs
> mechanism.
Hmm, if we still use the command line, then we still have some similar
drawbacks.

Huacai
>
> Thanks,
>
>         tglx
>
>
>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-30  8:34               ` Huacai Chen
@ 2023-05-30 12:22                 ` Thomas Gleixner
  2023-05-30 15:03                 ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-30 12:22 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Now take that number and divide it by the number of devices on the bus
>> and you get at least a sensible limit which does not immediately cause
>> vector exhaustion.
>>
>> That limit might be suboptimal if there are lots of other devices on
>> that bus which just require one or two vectors, but that's something
>> which can be optimized via a generic command line option or even a sysfs
>> mechanism.
> Hmm, if we still use the command line, then we still have some similar
> drawbacks.

Only for optimization. Without the optimization the limit might end up
being overbroad, but it definitely prevents vector exhaustion. For quite
some cases this might be even the proper limit which does not need any
further tweaking.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-30  8:34               ` Huacai Chen
  2023-05-30 12:22                 ` Thomas Gleixner
@ 2023-05-30 15:03                 ` Thomas Gleixner
  2023-06-01 15:18                   ` Huacai Chen
  2023-06-02  1:10                   ` bibo, mao
  1 sibling, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2023-05-30 15:03 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Let's take a step back and look at the larger picture:
>>
>>  1) A PCI/MSI irqdomain is attached to a PCI bus
>>
>>  2) The number of PCI devices on that PCI bus is usually known at boot
>>     time _before_ the first device driver is probed.
>>
>>     That's not entirely true for PCI hotplug devices, but that's hardly
>>     relevant for an architecture which got designed less than 10 years
>>     ago and the architects decided that 256 MSI vectors are good enough
>>     for up to 256 CPUs. The concept of per CPU queues was already known
>>     at that time, no?
> Does this solution depend on the per-device msi domain? Can we do that
> if we use the global msi domain?

In principle it should not depend on per-device MSI domains, but I
really don't want to add new functionality to the old operating models
as that does not create an incentive for people to convert their stuff
over.

>> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
>> available for a particular bus, right?
>>
>> The default, i.e if the irqdomain does not expose that information,
>> would be "unlimited", i.e. ULONG_MAX.
> OK, thanks, but how to expose? By msi_domain_info::hwsize?

Probably. Needs a proper helper around it.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-30 15:03                 ` Thomas Gleixner
@ 2023-06-01 15:18                   ` Huacai Chen
  2023-06-02  1:10                   ` bibo, mao
  1 sibling, 0 replies; 17+ messages in thread
From: Huacai Chen @ 2023-06-01 15:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang

Hi, Thomas,

On Tue, May 30, 2023 at 11:03 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
> > On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Let's take a step back and look at the larger picture:
> >>
> >>  1) A PCI/MSI irqdomain is attached to a PCI bus
> >>
> >>  2) The number of PCI devices on that PCI bus is usually known at boot
> >>     time _before_ the first device driver is probed.
> >>
> >>     That's not entirely true for PCI hotplug devices, but that's hardly
> >>     relevant for an architecture which got designed less than 10 years
> >>     ago and the architects decided that 256 MSI vectors are good enough
> >>     for up to 256 CPUs. The concept of per CPU queues was already known
> >>     at that time, no?
> > Does this solution depend on the per-device msi domain? Can we do that
> > if we use the global msi domain?
>
> In principle it should not depend on per-device MSI domains, but I
> really don't want to add new functionality to the old operating models
> as that does not create an incentive for people to convert their stuff
> over.
Thank you for your advice, but for our scenario, its effect is no
better than this patch (because not all devices are aggressive
devices), so we give up. :)

And as Jason said in another thread, this problem can be solved by
simply reducing the number of queues by ethtool. So let's ignore this
patch since it is not acceptable.

Huacai
>
> >> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
> >> available for a particular bus, right?
> >>
> >> The default, i.e if the irqdomain does not expose that information,
> >> would be "unlimited", i.e. ULONG_MAX.
> > OK, thanks, but how to expose? By msi_domain_info::hwsize?
>
> Probably. Needs a proper helper around it.
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()
  2023-05-30 15:03                 ` Thomas Gleixner
  2023-06-01 15:18                   ` Huacai Chen
@ 2023-06-02  1:10                   ` bibo, mao
  1 sibling, 0 replies; 17+ messages in thread
From: bibo, mao @ 2023-06-02  1:10 UTC (permalink / raw)
  To: Thomas Gleixner, Huacai Chen
  Cc: Marc Zyngier, Huacai Chen, Bjorn Helgaas, linux-kernel,
	loongson-kernel, Xuefeng Li, Jiaxun Yang



在 2023/5/30 23:03, Thomas Gleixner 写道:
> On Tue, May 30 2023 at 16:34, Huacai Chen wrote:
>> On Tue, May 30, 2023 at 4:19 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> Let's take a step back and look at the larger picture:
>>>
>>>  1) A PCI/MSI irqdomain is attached to a PCI bus
>>>
>>>  2) The number of PCI devices on that PCI bus is usually known at boot
>>>     time _before_ the first device driver is probed.
>>>
>>>     That's not entirely true for PCI hotplug devices, but that's hardly
>>>     relevant for an architecture which got designed less than 10 years
>>>     ago and the architects decided that 256 MSI vectors are good enough
>>>     for up to 256 CPUs. The concept of per CPU queues was already known
>>>     at that time, no?
>> Does this solution depend on the per-device msi domain? Can we do that
>> if we use the global msi domain?
> 
> In principle it should not depend on per-device MSI domains, but I
> really don't want to add new functionality to the old operating models
> as that does not create an incentive for people to convert their stuff
> over.
> 
>>> So the irqdomain can tell the PCI/MSI core the maximum number of vectors
>>> available for a particular bus, right?
>>>
>>> The default, i.e if the irqdomain does not expose that information,
>>> would be "unlimited", i.e. ULONG_MAX.
>> OK, thanks, but how to expose? By msi_domain_info::hwsize?
> 
> Probably. Needs a proper helper around it.

It is not common issue, command line and documentation explanation
is not suitable here.

Can we add weak function like this?
int __weak arch_set_max_msix_vectors(void)

Regards
Bibo, mao
> 
> Thanks,
> 
>         tglx


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-02  1:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-27  5:46 [PATCH 0/2] Add machanism to limit msi allocation for Loongson Huacai Chen
2023-05-27  5:46 ` [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs() Huacai Chen
2023-05-27 14:03   ` Thomas Gleixner
2023-05-28  3:42     ` Huacai Chen
2023-05-29  7:44       ` Thomas Gleixner
2023-05-29  9:35         ` Huacai Chen
2023-05-28  7:47     ` Marc Zyngier
2023-05-28 12:07       ` Huacai Chen
2023-05-29  9:27         ` Thomas Gleixner
2023-05-29  9:36           ` Huacai Chen
2023-05-29 20:19             ` Thomas Gleixner
2023-05-30  8:34               ` Huacai Chen
2023-05-30 12:22                 ` Thomas Gleixner
2023-05-30 15:03                 ` Thomas Gleixner
2023-06-01 15:18                   ` Huacai Chen
2023-06-02  1:10                   ` bibo, mao
2023-05-27  5:46 ` [PATCH 2/2] irqchip/loongson-pch-msi: Add machanism to limit msi allocation Huacai Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.