From: shan.hai@oracle.com (Shan Hai)
Subject: [PATCH 1/2] nvme-pci: add module param for io queue number
Date: Mon, 24 Dec 2018 11:05:33 +0800
Message-ID: <3b6e64af-fbd7-713c-6b1f-b5efa0df86fb@oracle.com>
In-Reply-To: <CACVXFVPNSE4wM5C0bZgvYOx+mLsjRSQRk9_wv2V=+VVGGY7G7A@mail.gmail.com>



On 2018/12/24 10:46 AM, Ming Lei wrote:
> On Mon, Dec 24, 2018 at 10:12 AM Shan Hai <shan.hai@oracle.com> wrote:
>>
>>
>>
>> On 2018/12/24 9:47 AM, Ming Lei wrote:
>>> On Mon, Dec 24, 2018 at 9:02 AM Shan Hai <shan.hai@oracle.com> wrote:
>>>>
>>>> Hi Minglei,
>>>>
>>>> On 2018/12/23 8:38 AM, Ming Lei wrote:
>>>>> Hi Shanhai,
>>>>>
>>>>> On Fri, Dec 21, 2018 at 2:05 PM Shan Hai <shan.hai@oracle.com> wrote:
>>>>>> The default of num_possible_cpus() io queues can cause an irq
>>>>>> vector shortage on a large system when hotplugging cpus; add a
>>>>>> module parameter to set the number of io queues according to the
>>>>>> system configuration and fix the issue.
>>>>> Yeah, the default nr_io_queues is num_possible_cpus(), which can be a
>>>>> bit big on systems that support only a small number of irq vectors.
>>>>>
>>>>> But nvme_setup_irqs() may decrease nr_io_queues and try to allocate
>>>>> again until it succeeds.
>>>>>
>>>>> Could you share with us what the actual issue is?
>>>>
>>>> On an 8-way NUMA system with 384 CPUs in total and multiple NVMe
>>>> storage devices installed, the CPU offline operation fails once the
>>>> number of online CPUs drops below a certain value; the failure is
>>>> caused by cpu interrupt vector exhaustion, because the irqs of the
>>>> NVMe devices have to be migrated to the remaining online CPUs.
>>>
>>> I can understand there being an issue when the whole system has very
>>> limited irq vectors: some NVMe devices may consume too many irq
>>> vectors, and the remaining NVMe devices may be left with none. Is this
>>> your case?
>>>
>>
>> The problem only occurs when offlining cpus.
>>
>>> But I don't understand "the irqs of the NVMe devices have to be
>>> migrated to the remaining online CPUs"; in theory one IRQ vector is
>>> enough to drive an NVMe device, so could you explain it a bit?
>>>
>>
>> Oops, it's not the migration of the NVMe interrupts, sorry.
>> The interrupt migration failure occurs on other multi-queue devices,
>> like NICs, which do not use the managed irq feature yet; migrating the
>> interrupts of these devices fails because the NVMe devices consume far
>> more vectors.
> 
> OK, I guess the NICs may have to allocate irq vectors in case of migration.
> 
> BTW, do you have any logs of this failure? They would help us easily
> recognize this kind of issue if it is reported by someone else.
> 

OK, I'll include a log in the comments of the v2 patches, thanks for the
suggestion.

> Yeah, for NVMe on a big system with lots of CPU cores, the 1:1 mapping
> looks unfair, because one IRQ vector is actually allocated for each CPU
> core, and serving IO shouldn't take up so many vectors.
> 
> So far it looks fine to introduce a module parameter to limit the
> allocation for this issue, even though it isn't flexible.
> 
> Another candidate approach might be to support it in the multi-queue
> mapping style: we may introduce a new 'default_queues' parameter for
> this purpose, just like 'write_queues' and 'poll_queues'.
> 

Agreed, but in my opinion it needs more effort on rebuilding the cpu to
hw queue mappings, etc. I will think about it anyway.
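
For the record, below is a minimal sketch of the 'default_queues' idea,
modeled on the existing 'write_queues'/'poll_queues' parameters; the
parameter name and the nvme_max_io_queues() helper are placeholders I
made up for illustration, not a final patch:

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/cpumask.h>
#include <linux/kernel.h>

/* 0 (the default) keeps the current behaviour: one io queue per possible CPU */
static unsigned int default_queues;
module_param(default_queues, uint, 0644);
MODULE_PARM_DESC(default_queues,
	"Number of queues to use for regular IO; 0 means one per possible CPU");

/*
 * Cap nr_io_queues instead of always requesting num_possible_cpus()
 * vectors, so that large boxes do not exhaust the irq vector space
 * when CPUs are offlined.
 */
static unsigned int nvme_max_io_queues(void)
{
	if (default_queues)
		return min_t(unsigned int, default_queues,
			     num_possible_cpus());
	return num_possible_cpus();
}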

Thanks
Shan Hai

> Thanks,
> Ming Lei
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
> 


Thread overview: 13+ messages
2018-12-21  6:04 [PATCH 1/2] nvme-pci: add module param for io queue number Shan Hai
2018-12-21  6:04 ` [PATCH 2/2] nvme-pci: take the io_queue_number into account when setting number of io queues Shan Hai
2018-12-21 15:02 ` [PATCH 1/2] nvme-pci: add module param for io queue number Bart Van Assche
2018-12-24  1:10   ` Shan Hai
2019-01-04 18:09     ` Christoph Hellwig
2019-01-05  0:18       ` Shan Hai
2018-12-23  0:38 ` Ming Lei
2018-12-24  1:02   ` Shan Hai
2018-12-24  1:47     ` Ming Lei
2018-12-24  2:12       ` Shan Hai
2018-12-24  2:46         ` Ming Lei
2018-12-24  3:05           ` Shan Hai [this message]
2018-12-26 10:23 ` Ming Lei
