From: Long Li <longli@microsoft.com>
To: Sagi Grimberg <sagi@grimberg.me>, Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@fb.com>, Hannes Reinecke <hare@suse.com>,
John Garry <john.garry@huawei.com>,
Bart Van Assche <bvanassche@acm.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
LKML <linux-kernel@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
Keith Busch <keith.busch@intel.com>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Christoph Hellwig <hch@lst.de>
Subject: RE: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Date: Tue, 24 Sep 2019 00:57:39 +0000 [thread overview]
Message-ID: <CY4PR21MB0741BC9992A7A945A0D4A62CCE840@CY4PR21MB0741.namprd21.prod.outlook.com> (raw)
In-Reply-To: <100d001a-1dda-32ff-fa5e-c18b121444d9@grimberg.me>
>Thanks for the clarification.
>
>The problem with what Ming is proposing, in my mind (and it's a problem
>that exists today), is that nvme takes precedence over anything else until
>it absolutely cannot hog the cpu in hardirq.
>
>In the thread, Ming referenced a case where today, if the cpu core has net
>softirq activity, it cannot make forward progress. So with Ming's
>suggestion, net softirq will eventually make progress, but it creates an
>inherent fairness issue. Who said that nvme completions should come faster
>than the net rx/tx or another I/O device (or hrtimers or sched events...)?
>
>As much as I'd like nvme to complete as soon as possible, I might have
>other activities in the system that are as important, if not more so. So I
>don't think we can solve this with something that is not cooperative or
>fair with the rest of the system.
>
>>> If we are context switching too much, it means the soft-irq operation
>>> is not efficient, not necessarily that the completion path is running
>>> in soft-irq.
>>>
>>> Is your kernel compiled with full preemption or voluntary preemption?
>>
>> The tests are based on the Ubuntu 18.04 kernel configuration. Here are
>> the parameters:
>>
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>
>I see, so it seems that irq_poll_softirq is still not efficient in reaping
>completions. Reaping the completions is pretty much the same in hard and
>soft irq, so it's really the scheduling part that is creating the overhead
>(which does not exist in hard irq).
>
>Question:
>when you test without the patch (completions are coming in hard-irq), do
>the fio threads that run on the cpu cores handling interrupts get
>substantially lower throughput than the rest of the fio threads? I would
>expect the fio threads running on the first 32 cores to get very low iops
>(overpowered by the nvme interrupts) and the rest to do much more, given
>that nvme has almost no limit on how much time it can spend on processing
>completions.
>
>If need_resched() is causing us to context switch too aggressively, does
>changing that to local_softirq_pending() make things better?
>--
>diff --git a/lib/irq_poll.c b/lib/irq_poll.c
>index d8eab563fa77..05d524fcaf04 100644
>--- a/lib/irq_poll.c
>+++ b/lib/irq_poll.c
>@@ -116,7 +116,7 @@ static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
> 		/*
> 		 * If softirq window is exhausted then punt.
> 		 */
>-		if (need_resched())
>+		if (local_softirq_pending())
> 			break;
> 	}
>--
>
>Although, this can potentially prevent other threads from making forward
>progress... If it is better, perhaps we also need a time limit.
Thanks for this patch. The IOPS was about the same (it tends to fluctuate
more, but stays within 3% variation).

I captured the following from one of the CPUs; all CPUs tend to have similar
numbers. The numbers were captured over 5 seconds and averaged:
                           Context switches/s   Process migrations/s
Without any patch                           5                    0.6
With the previous patch                   640                    104
With this patch                           522                    121
>
>Perhaps we should add statistics/tracing on how many completions we are
>reaping per invocation...
I'll look a bit more into the completions. From the numbers, I think the
increased context switches/migrations are what hurt performance the most.
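
As a first step, an untested sketch like the one below in irq_poll_softirq()
could show how many completions each invocation reaps; iop->poll() already
returns the amount of work done, and trace_printk() is only for quick
measurement:

	unsigned int reaped = 0;
	...
	work = iop->poll(iop, weight);	/* existing poll call */
	reaped += work;
	...
	trace_printk("irq_poll_softirq: reaped %u completions\n", reaped);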
Thanks
Long