* [PATCH 1/2] nvme-pci: add module param for io queue number
@ 2018-12-21  6:04 Shan Hai
  2018-12-21  6:04 ` [PATCH 2/2] nvme-pci: take the io_queue_number into account when setting number of io queues Shan Hai
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Shan Hai @ 2018-12-21  6:04 UTC (permalink / raw)


Defaulting to num_possible_cpus() io queues can cause an irq vector
shortage on a large system when hotplugging cpus. Add a module parameter
so the number of io queues can be set according to the system
configuration, fixing the issue.

Signed-off-by: Shan Hai <shan.hai at oracle.com>
---
 drivers/nvme/host/pci.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c33bb20..0d60451 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -64,6 +64,16 @@ MODULE_PARM_DESC(sgl_threshold,
 		"Use SGLs when average request segment size is larger or equal to "
 		"this size. Use 0 to disable SGLs.");
 
+static int io_queue_number_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops io_queue_number_ops = {
+	.set = io_queue_number_set,
+	.get = param_get_uint,
+};
+
+static unsigned int io_queue_number = UINT_MAX;
+module_param_cb(io_queue_number, &io_queue_number_ops, &io_queue_number, 0644);
+MODULE_PARM_DESC(io_queue_number, "set io queue number, should >= 2");
+
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
 static const struct kernel_param_ops io_queue_depth_ops = {
 	.set = io_queue_depth_set,
@@ -123,6 +133,17 @@ struct nvme_dev {
 	void **host_mem_desc_bufs;
 };
 
+static int io_queue_number_set(const char *val, const struct kernel_param *kp)
+{
+	unsigned int n = 0, ret;
+
+	ret = kstrtouint(val, 10, &n);
+	if (ret != 0 || n < 2)
+		return -EINVAL;
+
+	return param_set_uint(val, kp);
+}
+
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 {
 	int n = 0, ret;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] nvme-pci: take the io_queue_number into account when setting number of io queues
  2018-12-21  6:04 [PATCH 1/2] nvme-pci: add module param for io queue number Shan Hai
@ 2018-12-21  6:04 ` Shan Hai
  2018-12-21 15:02 ` [PATCH 1/2] nvme-pci: add module param for io queue number Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Shan Hai @ 2018-12-21  6:04 UTC (permalink / raw)


Add a wrapper around num_possible_cpus() to take the newly added module
parameter into account when setting the number of io queues.

Signed-off-by: Shan Hai <shan.hai at oracle.com>
---
 drivers/nvme/host/pci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0d60451..e359e90 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -219,6 +219,11 @@ struct nvme_iod {
 	struct scatterlist inline_sg[0];
 };
 
+static inline unsigned int nvme_io_queue_number(void)
+{
+	return min_t(unsigned int, io_queue_number, num_possible_cpus());
+}
+
 /*
  * Check we didin't inadvertently grow the command struct
  */
@@ -241,7 +246,7 @@ static inline void _nvme_check_size(void)
 
 static inline unsigned int nvme_dbbuf_size(u32 stride)
 {
-	return ((num_possible_cpus() + 1) * 8 * stride);
+	return ((nvme_io_queue_number() + 1) * 8 * stride);
 }
 
 static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
@@ -1923,7 +1928,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 		.pre_vectors = 1
 	};
 
-	nr_io_queues = num_possible_cpus();
+	nr_io_queues = nvme_io_queue_number();
 	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
 	if (result < 0)
 		return result;
@@ -2512,7 +2517,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!dev)
 		return -ENOMEM;
 
-	dev->queues = kcalloc_node(num_possible_cpus() + 1,
+	dev->queues = kcalloc_node(nvme_io_queue_number() + 1,
 			sizeof(struct nvme_queue), GFP_KERNEL, node);
 	if (!dev->queues)
 		goto free;
-- 
2.7.4
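The effect of the clamp is easiest to see with concrete numbers. The figures
below (384 possible CPUs, nvme.io_queue_number=8) are assumptions for
illustration only, mirrored in a small stand-alone C program:

#include <stdio.h>
#include <limits.h>

/* Stand-alone illustration of nvme_io_queue_number()'s min_t() clamp. */
static unsigned int clamp_queues(unsigned int io_queue_number,
                                 unsigned int possible_cpus)
{
        return io_queue_number < possible_cpus ? io_queue_number : possible_cpus;
}

int main(void)
{
        /* parameter left at its UINT_MAX default: behaviour is unchanged */
        printf("default: %u io queues\n", clamp_queues(UINT_MAX, 384));
        /* booted with nvme.io_queue_number=8: 8 io queues, so each
         * controller asks for only 8 + 1 interrupt vectors */
        printf("param=8: %u io queues\n", clamp_queues(8, 384));
        return 0;
}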

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-21  6:04 [PATCH 1/2] nvme-pci: add module param for io queue number Shan Hai
  2018-12-21  6:04 ` [PATCH 2/2] nvme-pci: take the io_queue_number into account when setting number of io queues Shan Hai
@ 2018-12-21 15:02 ` Bart Van Assche
  2018-12-24  1:10   ` Shan Hai
  2018-12-23  0:38 ` Ming Lei
  2018-12-26 10:23 ` Ming Lei
  3 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2018-12-21 15:02 UTC (permalink / raw)


On 12/20/18 10:04 PM, Shan Hai wrote:
> The num_possible_cpus() number of io queues by default would cause
> irq vector shortage problem on a large system when hotplugging cpus,
> add a module parameter to set number of io queues according to the
> system configuration to fix the issue.

Is it possible to achieve this without introducing a new configuration 
knob? I think it will be much more convenient for users of large systems 
that they don't have to discover a knob like this one first in order to 
make the NVMe driver work on their systems.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-21  6:04 [PATCH 1/2] nvme-pci: add module param for io queue number Shan Hai
  2018-12-21  6:04 ` [PATCH 2/2] nvme-pci: take the io_queue_number into account when setting number of io queues Shan Hai
  2018-12-21 15:02 ` [PATCH 1/2] nvme-pci: add module param for io queue number Bart Van Assche
@ 2018-12-23  0:38 ` Ming Lei
  2018-12-24  1:02   ` Shan Hai
  2018-12-26 10:23 ` Ming Lei
  3 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2018-12-23  0:38 UTC (permalink / raw)


Hi Shanhai,

On Fri, Dec 21, 2018@2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
>
> The num_possible_cpus() number of io queues by default would cause
> irq vector shortage problem on a large system when hotplugging cpus,
> add a module parameter to set number of io queues according to the
> system configuration to fix the issue.

Yeah, the default nr_io_queues is num_possible_cpus(), which can be a bit
big on some systems that support only a small number of irq vectors.

But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
again until it succeeds.

Could you share us what the actual issue is?
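
The shrink-and-retry behaviour mentioned above can be pictured with the
simplified sketch below; it is illustrative only, not the driver's exact
code, and the decrement-by-one policy is an assumption:

#include <linux/errno.h>
#include <linux/interrupt.h>
#include <linux/pci.h>

/* Ask for one vector per io queue plus one pre-vector for the admin
 * queue; if the PCI core cannot provide that many, retry with fewer
 * io queues until something fits. */
static int setup_irqs_sketch(struct pci_dev *pdev, unsigned int nr_io_queues)
{
        struct irq_affinity affd = { .pre_vectors = 1 };
        int result;

        do {
                unsigned int nvecs = nr_io_queues + 1;

                result = pci_alloc_irq_vectors_affinity(pdev, nvecs, nvecs,
                                PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
                if (result > 0)
                        return result;  /* number of vectors obtained */
        } while (--nr_io_queues > 0);

        return -ENOSPC;
}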

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-23  0:38 ` Ming Lei
@ 2018-12-24  1:02   ` Shan Hai
  2018-12-24  1:47     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Shan Hai @ 2018-12-24  1:02 UTC (permalink / raw)


Hi Minglei,

On 2018/12/23 8:38 AM, Ming Lei wrote:
> Hi Shanhai,
>
> On Fri, Dec 21, 2018@2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
>> The num_possible_cpus() number of io queues by default would cause
>> irq vector shortage problem on a large system when hotplugging cpus,
>> add a module parameter to set number of io queues according to the
>> system configuration to fix the issue.
> Yeah, the default nr_io_queues is num_possible_cpus(), which can be a bit
> big on some systems which supports small number of irq vectors.
>
> But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
> again until it succeeds.
>
> Could you share us what the actual issue is?


On an 8-way NUMA system with 384 CPUs in total and multiple NVMe storage
devices installed, the CPU offline operation will fail when the number of
online CPUs drops to a certain value. The failure is caused by CPU
interrupt vector exhaustion, because the irqs of the NVMe have to be
migrated to the online CPUs.


Thanks

Shan Hai

>
> Thanks,
> Ming Lei

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-21 15:02 ` [PATCH 1/2] nvme-pci: add module param for io queue number Bart Van Assche
@ 2018-12-24  1:10   ` Shan Hai
  2019-01-04 18:09     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Shan Hai @ 2018-12-24  1:10 UTC (permalink / raw)


Hi Bart,

On 2018/12/21 11:02 PM, Bart Van Assche wrote:
> On 12/20/18 10:04 PM, Shan Hai wrote:
>> The num_possible_cpus() number of io queues by default would cause
>> irq vector shortage problem on a large system when hotplugging cpus,
>> add a module parameter to set number of io queues according to the
>> system configuration to fix the issue.
>
> Is it possible to achieve this without introducing a new configuration 
> knob? I think it will be much more convenient for users of large 
> systems that they don't have to discover a knob like this one first in 
> order to make the NVMe driver work on their systems.
>

The problem occurs when offlining CPUs aggressively on a system with a
large number of cores, for instance offlining 368 out of 384 cores. The
hotplug failure is caused by device irq migration on cpu hotplugging, and
the situation becomes worse when multiple NVMe devices are present in the
system.

I didn't find a simpler way to achieve this than the solution provided in
this patchset.


Thanks

Shan Hai

> Thanks,
>
> Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-24  1:02   ` Shan Hai
@ 2018-12-24  1:47     ` Ming Lei
  2018-12-24  2:12       ` Shan Hai
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2018-12-24  1:47 UTC (permalink / raw)


On Mon, Dec 24, 2018@9:02 AM Shan Hai <shan.hai@oracle.com> wrote:
>
> Hi Minglei,
>
> On 2018/12/23 8:38 AM, Ming Lei wrote:
> > Hi Shanhai,
> >
> > On Fri, Dec 21, 2018@2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
> >> The num_possible_cpus() number of io queues by default would cause
> >> irq vector shortage problem on a large system when hotplugging cpus,
> >> add a module parameter to set number of io queues according to the
> >> system configuration to fix the issue.
> > Yeah, the default nr_io_queues is num_possible_cpus(), which can be a bit
> > big on some systems which supports small number of irq vectors.
> >
> > But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
> > again until it succeeds.
> >
> > Could you share us what the actual issue is?
>
>
> On an 8-way NUMA with total 384 CPUs system installed with multiple NVME
> storage devices the CPU
>
> offline operation will fail when the online CPU numbers drop to a
> certain value, the failure is caused by
>
> cpu interrupt vector exhaustion because the irqs of the NVME have to be
> migrated to the online CPUs.

I can understand there being an issue when the whole system has very
limited irq vectors: some NVMe devices may then consume too many irq
vectors, and the remaining NVMe devices may not get any irq vectors at
all. Is this your case?

But I don't understand 'the irqs of the NVMe have to be migrated to the
online CPUs.'; in theory one IRQ vector is enough to drive NVMe, so could
you explain it a bit?

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-24  1:47     ` Ming Lei
@ 2018-12-24  2:12       ` Shan Hai
  2018-12-24  2:46         ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Shan Hai @ 2018-12-24  2:12 UTC (permalink / raw)




On 2018/12/24 9:47 AM, Ming Lei wrote:
> On Mon, Dec 24, 2018@9:02 AM Shan Hai <shan.hai@oracle.com> wrote:
>>
>> Hi Minglei,
>>
>> On 2018/12/23 8:38 AM, Ming Lei wrote:
>>> Hi Shanhai,
>>>
>>> On Fri, Dec 21, 2018@2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
>>>> The num_possible_cpus() number of io queues by default would cause
>>>> irq vector shortage problem on a large system when hotplugging cpus,
>>>> add a module parameter to set number of io queues according to the
>>>> system configuration to fix the issue.
>>> Yeah, the default nr_io_queues is num_possible_cpus(), which can be a bit
>>> big on some systems which supports small number of irq vectors.
>>>
>>> But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
>>> again until it succeeds.
>>>
>>> Could you share us what the actual issue is?
>>
>>
>> On an 8-way NUMA with total 384 CPUs system installed with multiple NVME
>> storage devices the CPU
>>
>> offline operation will fail when the online CPU numbers drop to a
>> certain value, the failure is caused by
>>
>> cpu interrupt vector exhaustion because the irqs of the NVME have to be
>> migrated to the online CPUs.
> 
> I can understand there is issue when the whole system has very limited
> irq vectors,
> then some NVMe may consume too many irq vectors, and the remained NVMe
> may not get any irq vectors left. Is this your case?
> 

The problem only occurs on cpu offlining.

> But I don't understand ' the irqs of the NVME have to be  migrated to
> the online CPUs.',
> in theory one IRQ vector is enough to drive NVMe, so could you explain it a bit?
> 

Oops, it's not the migration of the NVMe interrupts, sorry.
The interrupt migration failure occurs on other multi-queue devices,
like NICs, which do not use the managed irq feature yet; the migration
of these devices' interrupts fails because the NVMe devices consume
far more vectors.

Thanks
Shan Hai

> Thanks,
> Ming Lei
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-24  2:12       ` Shan Hai
@ 2018-12-24  2:46         ` Ming Lei
  2018-12-24  3:05           ` Shan Hai
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2018-12-24  2:46 UTC (permalink / raw)


On Mon, Dec 24, 2018@10:12 AM Shan Hai <shan.hai@oracle.com> wrote:
>
>
>
> On 2018/12/24 9:47 AM, Ming Lei wrote:
> > On Mon, Dec 24, 2018@9:02 AM Shan Hai <shan.hai@oracle.com> wrote:
> >>
> >> Hi Minglei,
> >>
> >> On 2018/12/23 8:38 AM, Ming Lei wrote:
> >>> Hi Shanhai,
> >>>
> >>> On Fri, Dec 21, 2018@2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
> >>>> The num_possible_cpus() number of io queues by default would cause
> >>>> irq vector shortage problem on a large system when hotplugging cpus,
> >>>> add a module parameter to set number of io queues according to the
> >>>> system configuration to fix the issue.
> >>> Yeah, the default nr_io_queues is num_possible_cpus(), which can be a bit
> >>> big on some systems which supports small number of irq vectors.
> >>>
> >>> But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
> >>> again until it succeeds.
> >>>
> >>> Could you share us what the actual issue is?
> >>
> >>
> >> On an 8-way NUMA with total 384 CPUs system installed with multiple NVME
> >> storage devices the CPU
> >>
> >> offline operation will fail when the online CPU numbers drop to a
> >> certain value, the failure is caused by
> >>
> >> cpu interrupt vector exhaustion because the irqs of the NVME have to be
> >> migrated to the online CPUs.
> >
> > I can understand there is issue when the whole system has very limited
> > irq vectors,
> > then some NVMe may consume too many irq vectors, and the remained NVMe
> > may not get any irq vectors left. Is this your case?
> >
>
> The problem only occurs on cpu offlining.
>
> > But I don't understand ' the irqs of the NVME have to be  migrated to
> > the online CPUs.',
> > in theory one IRQ vector is enough to drive NVMe, so could you explain it a bit?
> >
>
> Oops, it's not the migration of the NVME interrupts, sorry.
> The interrupt migration failure occurs on other multi-queue devices
> like NICs which has not use managed irq feature yet, so the migration
> of interrupts of theses devices will fail because the NVMEs consume
> much more vectors.

OK, I guess the NICs may allocate irq vectors in case of migration.

BTW, do you have any logs of this failure? Then we can easily recognize
this kind of issue if it is reported by someone else.

Yeah, for NVMe, in the case of a big system with lots of CPU cores, it
doesn't look fair to do the 1:1 mapping, because one IRQ vector is
actually allocated for each CPU core, and it shouldn't take that many
CPUs just to serve IO.

So far it looks fine to introduce a module parameter to limit the
allocation for this issue, even though it isn't flexible.

Another candidate approach might be to support it via the multi queue
mapping style: we may introduce one new parameter, 'default_queues', for
this purpose, just like 'write_queues' and 'poll_queues', as sketched
below.
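
A rough sketch of what that could look like, assuming the then-new
'write_queues'/'poll_queues' parameters and treating them as plain uints
for brevity (hypothetical, not merged code):

#include <linux/cpumask.h>
#include <linux/moduleparam.h>

static unsigned int write_queues;
module_param(write_queues, uint, 0644);

static unsigned int poll_queues;
module_param(poll_queues, uint, 0644);

/* Hypothetical knob in the same style: 0 keeps the current
 * one-queue-per-possible-CPU default. */
static unsigned int default_queues;
module_param(default_queues, uint, 0644);
MODULE_PARM_DESC(default_queues,
        "Number of default (read/write) io queues, 0 = one per possible CPU");

static unsigned int max_io_queues(void)
{
        unsigned int defq = default_queues ? default_queues : num_possible_cpus();

        return defq + write_queues + poll_queues;
}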

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-24  2:46         ` Ming Lei
@ 2018-12-24  3:05           ` Shan Hai
  0 siblings, 0 replies; 13+ messages in thread
From: Shan Hai @ 2018-12-24  3:05 UTC (permalink / raw)




On 2018/12/24 10:46 AM, Ming Lei wrote:
> On Mon, Dec 24, 2018@10:12 AM Shan Hai <shan.hai@oracle.com> wrote:
>>
>>
>>
>> On 2018/12/24 9:47 AM, Ming Lei wrote:
>>> On Mon, Dec 24, 2018@9:02 AM Shan Hai <shan.hai@oracle.com> wrote:
>>>>
>>>> Hi Minglei,
>>>>
>>>> On 2018/12/23 8:38 AM, Ming Lei wrote:
>>>>> Hi Shanhai,
>>>>>
>>>>> On Fri, Dec 21, 2018@2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
>>>>>> The num_possible_cpus() number of io queues by default would cause
>>>>>> irq vector shortage problem on a large system when hotplugging cpus,
>>>>>> add a module parameter to set number of io queues according to the
>>>>>> system configuration to fix the issue.
>>>>> Yeah, the default nr_io_queues is num_possible_cpus(), which can be a bit
>>>>> big on some systems which supports small number of irq vectors.
>>>>>
>>>>> But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
>>>>> again until it succeeds.
>>>>>
>>>>> Could you share us what the actual issue is?
>>>>
>>>>
>>>> On an 8-way NUMA with total 384 CPUs system installed with multiple NVME
>>>> storage devices the CPU
>>>>
>>>> offline operation will fail when the online CPU numbers drop to a
>>>> certain value, the failure is caused by
>>>>
>>>> cpu interrupt vector exhaustion because the irqs of the NVME have to be
>>>> migrated to the online CPUs.
>>>
>>> I can understand there is issue when the whole system has very limited
>>> irq vectors,
>>> then some NVMe may consume too many irq vectors, and the remained NVMe
>>> may not get any irq vectors left. Is this your case?
>>>
>>
>> The problem only occurs on cpu offlining.
>>
>>> But I don't understand ' the irqs of the NVME have to be  migrated to
>>> the online CPUs.',
>>> in theory one IRQ vector is enough to drive NVMe, so could you explain it a bit?
>>>
>>
>> Oops, it's not the migration of the NVME interrupts, sorry.
>> The interrupt migration failure occurs on other multi-queue devices
>> like NICs which has not use managed irq feature yet, so the migration
>> of interrupts of theses devices will fail because the NVMEs consume
>> much more vectors.
> 
> OK, I guess NICs may allocate irq vectors in case of migration.
> 
> BTW, do you have any logs about this failure? So we can easily recognize
> this kind of issue if it is reported by someone else.
> 

OK, I'll include a log in the comments of the v2 patches, thanks for the
suggestion.

> Yeah, for NVMe, in case of big system with lots CPU cores, it looks not fair
> to do the 1:1 mapping, because actually one IRQ vector is allocated for each
> CPU core, and it shouldn't take so many CPUs just for serving IO.
> 
> So far, looks it is fine to introduce module parameter to limit the allocation
> for this issue, even though it isn't flexible.
> 
> Another candidate approach might be to support it via multi queue mapping
> style, we may introduce one new parameter of 'default_queues' for this purpose,
> just like 'write_queues' and 'poll_queues'.
> 

Agreed, but in my opinion it needs more effort on rebuilding the cpu to
hw queue mappings, etc.; I will think about it anyway.

Thanks
Shan Hai

> Thanks,
> Ming Lei
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-21  6:04 [PATCH 1/2] nvme-pci: add module param for io queue number Shan Hai
                   ` (2 preceding siblings ...)
  2018-12-23  0:38 ` Ming Lei
@ 2018-12-26 10:23 ` Ming Lei
  3 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2018-12-26 10:23 UTC (permalink / raw)


On Fri, Dec 21, 2018@02:04:12PM +0800, Shan Hai wrote:
> The num_possible_cpus() number of io queues by default would cause
> irq vector shortage problem on a large system when hotplugging cpus,
> add a module parameter to set number of io queues according to the
> system configuration to fix the issue.
> 
> Signed-off-by: Shan Hai <shan.hai at oracle.com>
> ---
>  drivers/nvme/host/pci.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c33bb20..0d60451 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -64,6 +64,16 @@ MODULE_PARM_DESC(sgl_threshold,
>  		"Use SGLs when average request segment size is larger or equal to "
>  		"this size. Use 0 to disable SGLs.");
>  
> +static int io_queue_number_set(const char *val, const struct kernel_param *kp);
> +static const struct kernel_param_ops io_queue_number_ops = {
> +	.set = io_queue_number_set,
> +	.get = param_get_uint,
> +};
> +
> +static unsigned int io_queue_number = UINT_MAX;
> +module_param_cb(io_queue_number, &io_queue_number_ops, &io_queue_number, 0644);
> +MODULE_PARM_DESC(io_queue_number, "set io queue number, should >= 2");

I suggest naming it 'default_queues', and 'queue_count_ops' can be
reused too.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2018-12-24  1:10   ` Shan Hai
@ 2019-01-04 18:09     ` Christoph Hellwig
  2019-01-05  0:18       ` Shan Hai
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2019-01-04 18:09 UTC (permalink / raw)


On Mon, Dec 24, 2018@09:10:41AM +0800, Shan Hai wrote:
> The problem occurs when offlining CPUs aggressively on a system with large
> number of cores,
> 
> for instance offlining 368 out of 384 total cores, the failure of hotplug
> is caused by device irq migration
> 
> on cpu hotplugging, the situation becomes worse when there are multiple NVME
> devices are available

NVMe uses managed interrupts, which should never be migrated.  What
exact kernel version do you see this on?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] nvme-pci: add module param for io queue number
  2019-01-04 18:09     ` Christoph Hellwig
@ 2019-01-05  0:18       ` Shan Hai
  0 siblings, 0 replies; 13+ messages in thread
From: Shan Hai @ 2019-01-05  0:18 UTC (permalink / raw)




On 2019/1/5 2:09 AM, Christoph Hellwig wrote:
> On Mon, Dec 24, 2018@09:10:41AM +0800, Shan Hai wrote:
>> The problem occurs when offlining CPUs aggressively on a system with large
>> number of cores,
>>
>> for instance offlining 368 out of 384 total cores, the failure of hotplug
>> is caused by device irq migration
>>
>> on cpu hotplugging, the situation becomes worse when there are multiple NVME
>> devices are available
> 
> NVMe uses managed interrupts, which should never be migrated.  What
> exact kernel version do you see this on?
> 

Yes, the NVMe itself is not a problem, but there are NIC drivers which have
not adopted the managed interrupts feature, and their IRQs will be migrated
on hotplug.

The CPU-hotplug failure is caused by the failed NIC IRQ migration. We can
tolerate fewer NVMe queues, which does not hurt performance too much, in
order to keep the NIC intact, but we didn't find another way to set the
affinity of the NVMe IRQs.
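
The distinction can be sketched roughly as below; this is illustrative
only (the handler, names and affinity choice are made up), not code taken
from nvme-pci or from any NIC driver:

#include <linux/cpumask.h>
#include <linux/interrupt.h>
#include <linux/pci.h>

static irqreturn_t nic_handler(int irq, void *data)
{
        return IRQ_HANDLED;     /* placeholder handler for the sketch */
}

static int irq_styles_sketch(struct pci_dev *nvme_pdev, struct pci_dev *nic_pdev,
                             unsigned int nr_queues, void *nic)
{
        int i, nvecs;

        /* Managed (what nvme-pci does): the core spreads and owns the
         * affinity; on CPU offline these vectors are shut down rather
         * than migrated. */
        nvecs = pci_alloc_irq_vectors(nvme_pdev, 1, nr_queues,
                                      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
        if (nvecs < 0)
                return nvecs;

        /* Non-managed (the NIC case above): plain vectors with a driver-
         * chosen affinity hint; on CPU offline the kernel must find a
         * free vector on a surviving CPU for each of these, which is the
         * step that fails once the vector space is nearly exhausted. */
        nvecs = pci_alloc_irq_vectors(nic_pdev, 1, nr_queues, PCI_IRQ_MSIX);
        for (i = 0; i < nvecs; i++) {
                int irq = pci_irq_vector(nic_pdev, i);

                if (request_irq(irq, nic_handler, 0, "nic-queue", nic))
                        break;
                irq_set_affinity_hint(irq, cpumask_of(i % num_online_cpus()));
        }
        return 0;
}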

Thanks
Shan Hai

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-01-05  0:18 UTC | newest]

Thread overview: 13+ messages
2018-12-21  6:04 [PATCH 1/2] nvme-pci: add module param for io queue number Shan Hai
2018-12-21  6:04 ` [PATCH 2/2] nvme-pci: take the io_queue_number into account when setting number of io queues Shan Hai
2018-12-21 15:02 ` [PATCH 1/2] nvme-pci: add module param for io queue number Bart Van Assche
2018-12-24  1:10   ` Shan Hai
2019-01-04 18:09     ` Christoph Hellwig
2019-01-05  0:18       ` Shan Hai
2018-12-23  0:38 ` Ming Lei
2018-12-24  1:02   ` Shan Hai
2018-12-24  1:47     ` Ming Lei
2018-12-24  2:12       ` Shan Hai
2018-12-24  2:46         ` Ming Lei
2018-12-24  3:05           ` Shan Hai
2018-12-26 10:23 ` Ming Lei
