Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

* Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
       [not found]               ` <alpine.DEB.2.21.1908281605190.23149@nanos.tec.linutronix.de>
@ 2019-09-03  3:30                 ` Ming Lei
  2019-09-03  5:59                   ` Daniel Lezcano
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2019-09-03  3:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jens Axboe, Hannes Reinecke, Daniel Lezcano, Sagi Grimberg,
	linux-scsi, Peter Zijlstra, Long Li, John Garry, LKML,
	linux-nvme, Keith Busch, Ingo Molnar, Christoph Hellwig

On Wed, Aug 28, 2019 at 04:07:19PM +0200, Thomas Gleixner wrote:
> On Wed, 28 Aug 2019, Ming Lei wrote:
> > On Wed, Aug 28, 2019 at 01:23:06PM +0200, Thomas Gleixner wrote:
> > > On Wed, 28 Aug 2019, Ming Lei wrote:
> > > > On Wed, Aug 28, 2019 at 01:09:44AM +0200, Thomas Gleixner wrote:
> > > > > > > Also how is that supposed to work when sched_clock is jiffies based?
> > > > > > 
> > > > > > Good catch, looks ktime_get_ns() is needed.
> > > > > 
> > > > > And what is ktime_get_ns() returning when the only available clocksource is
> > > > > jiffies?
> > > > 
> > > > IMO, it isn't one issue. If the only clocksource is jiffies, we needn't to
> > > > expect high IO performance. Then it is fine to always handle the irq in
> > > > interrupt context or thread context.
> > > > 
> > > > However, if it can be recognized runtime, irq_flood_detected() can
> > > > always return true or false.
> > > 
> > > Right. The clocksource is determined at runtime. And if there is no high
> > > resolution clocksource then that function will return crap.
> > 
> > This patch still works even though the only clocksource is jiffies.
> 
> Works by some definition of works, right?

I am not sure there is such system which doesn't provide any high resolution
clocksource, meantime there is one high performance storage device
attached, and expect top IO performance can be reached.

Suppose there is such system: I mean that irq_flood_detected() returns either
true or false, then the actual IO performance should be accepted on
system without high resolution clocksource from user view.

> 
> > > Well, yes. But it's trivial enough to utilize parts of it for your
> > > purposes.
> > 
> > >From the code of kernel/irq/timing.c:
> > 
> > 1) record_irq_time() only records the start time of one irq, and not
> > consider the time taken in interrupt handler, so we can't figure out
> > the real interval between two do_IRQ() on one CPU
> 
> I said utilize and that means that the infrastructure can be used and
> extended. I did not say that it solves your problem, right?

The infrastructure is for predicating when the next interrupt comes,
which is used in PM cases(usually for mobile phone or power sensitive
cases). However, IRQ flood is used in high performance system(usually
enterprise case). The two use cases are actually orthogonal, also:

1) if the irq timing infrastructure is used, we have to apply the
management code on irq flood detection, for example, we have to
build the irq timing code in kernel and enable it. Then performance
regression might be caused for enterprise application.

2) irq timing's runtime overload is much higher, irq_timings_push()
touches much more memory footprint, since it records recent 32
irq's timestamp. That isn't what IRQ flood detection wants, also
not enough for flood detection.

3) irq flood detection itself is very simple, just one EWMA
calculation, see the following code:

irq_update_interval() (called from irq_enter())
        int cpu = raw_smp_processor_id();
        struct irq_interval *inter = per_cpu_ptr(&avg_irq_interval, cpu);
        u64 delta = sched_clock_cpu(cpu) - inter->last_irq_end;

        inter->avg = (inter->avg * IRQ_INTERVAL_EWMA_PREV_FACTOR +
                delta * IRQ_INTERVAL_EWMA_CURR_FACTOR) /
                IRQ_INTERVAL_EWMA_WEIGHT;

bool irq_flood_detected(void) (called from __handle_irq_event_percpu())
{
        return raw_cpu_ptr(&avg_irq_interval)->avg <= IRQ_FLOOD_THRESHOLD_NS;
}

irq_exit()
	inter->last_irq_end = sched_clock_cpu(smp_processor_id());

So there is basically nothing shared between the two, only one percpu
variable is needed for detecting irq flood.

> 
> > 2) irq/timing doesn't cover softirq
> 
> That's solvable, right?

Yeah, we can extend irq/timing, but ugly for irq/timing, since irq/timing
focuses on hardirq predication, and softirq isn't involved in that
purpose.

>  
> > Daniel, could you take a look and see if irq flood detection can be
> > implemented easily by irq/timing.c?
> 
> I assume you can take a look as well, right?

Yeah, I have looked at the code for a while, but I think that irq/timing
could become complicated unnecessarily for covering irq flood detection,
meantime it is much less efficient for detecting IRQ flood.

thanks,
Ming

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 40+ messages in thread