linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@fb.com>, Hannes Reinecke <hare@suse.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-scsi@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Long Li <longli@microsoft.com>,
	John Garry <john.garry@huawei.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-nvme@lists.infradead.org,
	Keith Busch <keith.busch@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Date: Tue, 3 Sep 2019 11:30:08 +0800	[thread overview]
Message-ID: <20190903033001.GB23861@ming.t460p> (raw)
In-Reply-To: <alpine.DEB.2.21.1908281605190.23149@nanos.tec.linutronix.de>

On Wed, Aug 28, 2019 at 04:07:19PM +0200, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Ming Lei wrote:
> > On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote:
> > > On Wed, 28 Aug 2019, Ming Lei wrote:
> > > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > > > > Also how is that supposed to work when sched_clock is jiffies based?
> > > > > > 
> > > > > > Good catch, looks ktime_get_ns() is needed.
> > > > > 
> > > > > And what is ktime_get_ns() returning when the only available clocksource is
> > > > > jiffies?
> > > > 
> > > > IMO, it isn't one issue. If the only clocksource is jiffies, we needn't to
> > > > expect high IO performance. Then it is fine to always handle the irq in
> > > > interrupt context or thread context.
> > > > 
> > > > However, if it can be recognized runtime, irq_flood_detected() can
> > > > always return true or false.
> > > 
> > > Right. The clocksource is determined at runtime. And if there is no high
> > > resolution clocksource then that function will return crap.
> > 
> > This patch still works even though the only clocksource is jiffies.
> 
> Works by some definition of works, right?

I am not sure there is such system which doesn't provide any high resolution
clocksource, meantime there is one high performance storage device
attached, and expect top IO performance can be reached.

Suppose there is such system: I mean that irq_flood_detected() returns either
true or false, then the actual IO performance should be accepted on
system without high resolution clocksource from user view.

> 
> > > Well, yes. But it's trivial enough to utilize parts of it for your
> > > purposes.
> > 
> > >From the code of kernel/irq/timing.c:
> > 
> > 1) record_irq_time() only records the start time of one irq, and not
> > consider the time taken in interrupt handler, so we can't figure out
> > the real interval between two do_IRQ() on one CPU
> 
> I said utilize and that means that the infrastructure can be used and
> extended. I did not say that it solves your problem, right?

The infrastructure is for predicating when the next interrupt comes,
which is used in PM cases(usually for mobile phone or power sensitive
cases). However, IRQ flood is used in high performance system(usually
enterprise case). The two use cases are actually orthogonal, also:

1) if the irq timing infrastructure is used, we have to apply the
management code on irq flood detection, for example, we have to
build the irq timing code in kernel and enable it. Then performance
regression might be caused for enterprise application.

2) irq timing's runtime overload is much higher, irq_timings_push()
touches much more memory footprint, since it records recent 32
irq's timestamp. That isn't what IRQ flood detection wants, also
not enough for flood detection.

3) irq flood detection itself is very simple, just one EWMA
calculation, see the following code:

irq_update_interval() (called from irq_enter())
        int cpu = raw_smp_processor_id();
        struct irq_interval *inter = per_cpu_ptr(&avg_irq_interval, cpu);
        u64 delta = sched_clock_cpu(cpu) - inter->last_irq_end;

        inter->avg = (inter->avg * IRQ_INTERVAL_EWMA_PREV_FACTOR +
                delta * IRQ_INTERVAL_EWMA_CURR_FACTOR) /
                IRQ_INTERVAL_EWMA_WEIGHT;

bool irq_flood_detected(void) (called from __handle_irq_event_percpu())
{
        return raw_cpu_ptr(&avg_irq_interval)->avg <= IRQ_FLOOD_THRESHOLD_NS;
}

irq_exit()
	inter->last_irq_end = sched_clock_cpu(smp_processor_id());

So there is basically nothing shared between the two, only one percpu
variable is needed for detecting irq flood.

> 
> > 2) irq/timing doesn't cover softirq
> 
> That's solvable, right?

Yeah, we can extend irq/timing, but ugly for irq/timing, since irq/timing
focuses on hardirq predication, and softirq isn't involved in that
purpose.

>  
> > Daniel, could you take a look and see if irq flood detection can be
> > implemented easily by irq/timing.c?
> 
> I assume you can take a look as well, right?

Yeah, I have looked at the code for a while, but I think that irq/timing
could become complicated unnecessarily for covering irq flood detection,
meantime it is much less efficient for detecting IRQ flood.

thanks,
Ming

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

       reply	other threads:[~2019-09-03  3:30 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190827085344.30799-1-ming.lei@redhat.com>
     [not found] ` <20190827085344.30799-2-ming.lei@redhat.com>
     [not found]   ` <alpine.DEB.2.21.1908271633450.1939@nanos.tec.linutronix.de>
     [not found]     ` <20190827225827.GA5263@ming.t460p>
     [not found]       ` <alpine.DEB.2.21.1908280104330.1939@nanos.tec.linutronix.de>
     [not found]         ` <20190828110633.GC15524@ming.t460p>
     [not found]           ` <alpine.DEB.2.21.1908281316230.1869@nanos.tec.linutronix.de>
     [not found]             ` <20190828135054.GA23861@ming.t460p>
     [not found]               ` <alpine.DEB.2.21.1908281605190.23149@nanos.tec.linutronix.de>
2019-09-03  3:30                 ` Ming Lei [this message]
2019-09-03  5:59                   ` [PATCH 1/4] softirq: implement IRQ flood detection mechanism Daniel Lezcano
2019-09-03  6:31                     ` Ming Lei
2019-09-03  6:40                       ` Daniel Lezcano
2019-09-03  7:28                         ` Ming Lei
2019-09-03  7:50                           ` Daniel Lezcano
2019-09-03  9:30                             ` Ming Lei
2019-09-04 17:07                             ` Bart Van Assche
2019-09-04 17:31                               ` Daniel Lezcano
2019-09-04 17:38                                 ` Bart Van Assche
2019-09-04 18:02                                   ` Peter Zijlstra
2019-09-04 19:47                                     ` Bart Van Assche
2019-09-05  9:11                                       ` Ming Lei
2019-09-05  9:06                                 ` Ming Lei
2019-09-05 10:37                                   ` Daniel Lezcano
2019-09-06  1:22                                     ` Long Li
2019-09-06  4:36                                       ` Daniel Lezcano
2019-09-06  4:44                                         ` Long Li
2019-09-06  1:48                                     ` Ming Lei
2019-09-06  5:14                                       ` Daniel Lezcano
2019-09-06 18:30                                         ` Sagi Grimberg
2019-09-06 18:52                                           ` Keith Busch
2019-09-07  0:01                                           ` Ming Lei
2019-09-10  3:10                                             ` Sagi Grimberg
2019-09-18  0:00                                               ` Long Li
2019-09-20 17:14                                                 ` Sagi Grimberg
2019-09-20 19:12                                                   ` Long Li
2019-09-20 20:45                                                     ` Sagi Grimberg
2019-09-24  0:57                                                       ` Long Li
2019-09-18 14:37                                               ` Ming Lei
2019-09-20 17:09                                                 ` Sagi Grimberg
2019-09-06 14:18                                       ` Keith Busch
2019-09-06 17:50                                         ` Long Li
2019-09-06 22:19                                           ` Ming Lei
2019-09-06 22:25                                             ` Keith Busch
2019-09-06 23:13                                               ` Ming Lei
2019-09-10  0:24                                             ` Ming Lei
2019-09-03  8:09                           ` Thomas Gleixner
2019-09-03  9:24                             ` Ming Lei
     [not found] ` <20190827085344.30799-5-ming.lei@redhat.com>
2019-09-06  8:50   ` [PATCH 4/4] genirq: use irq's affinity for threaded irq with IRQF_RESCUE_THREAD John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190903033001.GB23861@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@fb.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=john.garry@huawei.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sagi@grimberg.me \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).