Re: [PATCH 0/4] nvme: Threaded interrupt handling improvements

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Christoph Hellwig <hch@lst.de>
Cc: sagi@grimberg.me, linux-nvme@lists.infradead.org,
	ming.lei@redhat.com, helgaas@kernel.org,
	Keith Busch <kbusch@kernel.org>,
	tglx@linutronix.de
Subject: Re: [PATCH 0/4] nvme: Threaded interrupt handling improvements
Date: Mon, 2 Dec 2019 20:57:30 +0100
Message-ID: <20191202195730.bzzldihtv37odsie@linutronix.de>
In-Reply-To: <20191202171239.GA8547@lst.de>

On 2019-12-02 18:12:39 [+0100], Christoph Hellwig wrote:
> On Mon, Dec 02, 2019 at 06:05:38PM +0100, Sebastian Andrzej Siewior wrote:
> > That might be a misunderstanding. I think if your threaded-IRQ handler
> > is running legitimately for a longer period of time (and making progress)
> > and IRQ core's "nobody-care" detector shuts it down then the detector
> > might need a tweak.
> > The worst thing that could happen, is that the RT tasks run for too long
> > and the scheduler punishes them to protect against run-away-tasks (the
> > default limit is at 950ms RT task time within 1 second,
> > sched_rt_runtime_us).
> 
> The problem is that by doing the aggressive polling we can keep one
> CPU busy just running the irq handler and starve processes on that
> CPU if an NVMe queue serves multiple CPUs.

and this is bad? The scheduler will move everything to other CPUs unless
it is pinned to this CPU. You can offload even RCU these days :)
Performance-wise it might be better to dedicate one CPU to doing this work
instead of spreading it over four CPUs, each doing a fraction of it and
using the same cache lines, which bounce from one CPU to the next.

> That's why I had the previous idea of one irq thread per cpu that
> is assigned to the irq.  We'd have to encode a relative index into
> the hardirq handler return value which we get from bits encoded in
> the NVMe command ID, but that should be doable.  At that point we
> shouldn't need the cond_resched.  I can try to hack that up, but
> I'm not an expert on the irq thread code.

there is always one IRQ-thread per-CPU/Interrupt.
You could create a worker with kthread_create_worker_on_cpu() on each of
several CPUs and feed them with work from your interrupt. And if you make
them SCHED_FIFO then you should be able to run your completions on multiple
CPUs from one interrupt.
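
A minimal, untested sketch of the per-CPU worker idea, written as a
standalone module so it is self-contained. It is not taken from the NVMe
driver. All demo_* names are hypothetical, CPU hotplug is ignored, and how
the hard-IRQ handler picks the target CPU is left open. It assumes a kernel
from around the time of this thread (v5.4): later kernels offer
sched_set_fifo() in place of sched_setscheduler() for in-kernel users, and
the kthread_create_worker_on_cpu() name-format argument has changed since.

```c
// SPDX-License-Identifier: GPL-2.0
/* Sketch only: one SCHED_FIFO kthread worker per CPU, fed from an IRQ. */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <uapi/linux/sched/types.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/err.h>

struct demo_cpu_ctx {			/* hypothetical per-CPU state */
	struct kthread_worker *worker;
	struct kthread_work work;
};

static DEFINE_PER_CPU(struct demo_cpu_ctx, demo_ctx);

/* Runs in the worker thread bound to one CPU; completions would be reaped here. */
static void demo_complete(struct kthread_work *work)
{
}

/*
 * Would be called from the hard-IRQ handler once it has decided which CPU
 * should process the completions. Queueing an already pending work item is
 * a no-op, so repeated interrupts do not pile up.
 */
static void __maybe_unused demo_kick(int cpu)
{
	struct demo_cpu_ctx *ctx = per_cpu_ptr(&demo_ctx, cpu);

	kthread_queue_work(ctx->worker, &ctx->work);
}

static void demo_stop(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct demo_cpu_ctx *ctx = per_cpu_ptr(&demo_ctx, cpu);

		if (ctx->worker) {
			/* Flushes pending work, then stops the thread. */
			kthread_destroy_worker(ctx->worker);
			ctx->worker = NULL;
		}
	}
}

static int __init demo_init(void)
{
	/* Priority 1 is an arbitrary choice for this sketch. */
	struct sched_param param = { .sched_priority = 1 };
	int cpu;

	for_each_online_cpu(cpu) {
		struct demo_cpu_ctx *ctx = per_cpu_ptr(&demo_ctx, cpu);
		struct kthread_worker *w;

		w = kthread_create_worker_on_cpu(cpu, 0, "demo_cq/%d", cpu);
		if (IS_ERR(w)) {
			demo_stop();
			return PTR_ERR(w);
		}
		sched_setscheduler(w->task, SCHED_FIFO, &param);
		kthread_init_work(&ctx->work, demo_complete);
		ctx->worker = w;
	}
	return 0;
}

static void __exit demo_exit(void)
{
	demo_stop();
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

The interrupt handler stays short in this arrangement: it only calls
demo_kick() for the chosen CPU, and the completion work itself runs in that
CPU's worker under SCHED_FIFO.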

Sebastian
