linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Guenter Roeck <linux@roeck-us.net>
To: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <keith.busch@intel.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: utilize two queue maps, one for reads and one for writes
Date: Thu, 15 Nov 2018 10:28:33 -0800	[thread overview]
Message-ID: <20181115182833.GA15729@roeck-us.net> (raw)

Hi Jens,

On Wed, Nov 14, 2018 at 10:12:44AM -0700, Jens Axboe wrote:
> On 11/13/18 9:52 PM, Guenter Roeck wrote:
> > On Tue, Nov 13, 2018 at 05:51:08PM -0700, Jens Axboe wrote:
> >> On 11/13/18 5:41 PM, Guenter Roeck wrote:
> >>> Hi,
> >>>
> >>> On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> >>>> NVMe does round-robin between queues by default, which means that
> >>>> sharing a queue map for both reads and writes can be problematic
> >>>> in terms of read servicing. It's much easier to flood the queue
> >>>> with writes and reduce the read servicing.
> >>>>
> >>>> Implement two queue maps, one for reads and one for writes. The
> >>>> write queue count is configurable through the 'write_queues'
> >>>> parameter.
> >>>>
> >>>> By default, we retain the previous behavior of having a single
> >>>> queue set, shared between reads and writes. Setting 'write_queues'
> >>>> to a non-zero value will create two queue sets, one for reads and
> >>>> one for writes, the latter using the configurable number of
> >>>> queues (hardware queue counts permitting).
> >>>>
> >>>> Reviewed-by: Hannes Reinecke <hare@suse.com>
> >>>> Reviewed-by: Keith Busch <keith.busch@intel.com>
> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>
> >>> This patch causes hangs when running recent versions of
> >>> -next with several architectures; see the -next column at
> >>> kerneltests.org/builders for details.  Bisect log below; this
> >>> was run with qemu on alpha. Reverting this patch as well as
> >>> "nvme: add separate poll queue map" fixes the problem.
> >>
> >> I don't see anything related to what hung, the trace, and so on.
> >> Can you clue me in? Where are the test results with dmesg?
> >>
> > alpha just stalls during boot. parisc reports a hung task
> > in nvme_reset_work. sparc64 reports EIO when instantiating
> > the nvme driver, called from nvme_reset_work, and then stalls.
> > In all three cases, reverting the two mentioned patches fixes
> > the problem.
> 
> I think the below patch should fix it.
> 
Sorry I wasn't able to test this earlier. Looks like it does
fix the problem; the problem is no longer seen in next-20181115.
Minor comment below.

Guenter

> > https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio
> > 
> > is an example log for parisc.
> > 
> > I didn't check if the other boot failures (ppc looks bad)
> > have the same root cause.
> > 
> >> How to reproduce?
> >>
> > parisc:
> > 
> > qemu-system-hppa -kernel vmlinux -no-reboot \
> > 	-snapshot -device nvme,serial=foo,drive=d0 \
> > 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
> > 	-nographic -monitor null
> > 
> > alpha:
> > 
> > qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \
> > 	-snapshot -device nvme,serial=foo,drive=d0 \
> > 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
> > 	-m 128M -nographic -monitor null -serial stdio
> > 
> > sparc64:
> > 
> > qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \
> > 	-snapshot -device nvme,serial=foo,drive=d0,bus=pciB \
> > 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > 	-kernel arch/sparc/boot/image -no-reboot \
> > 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
> > 	-nographic -monitor none
> > 
> > The root file systems are available from the respective subdirectories
> > of:
> > 
> > https://github.com/groeck/linux-build-test/tree/master/rootfs
> 
> This is useful, thanks! I haven't tried it yet, but I was able to
> reproduce on x86 with MSI turned off.
> 
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 8df868afa363..6c03461ad988 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2098,7 +2098,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
>  		.nr_sets = ARRAY_SIZE(irq_sets),
>  		.sets = irq_sets,
>  	};
> -	int result;
> +	int result = 0;
>  
>  	/*
>  	 * For irq sets, we have to ask for minvec == maxvec. This passes
> @@ -2113,9 +2113,16 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
>  			affd.nr_sets = 1;
>  
>  		/*
> -		 * Need IRQs for read+write queues, and one for the admin queue
> +		 * Need IRQs for read+write queues, and one for the admin queue.
> +		 * If we can't get more than one vector, we have to share the
> +		 * admin queue and IO queue vector. For that case, don't add
> +		 * an extra vector for the admin queue, or we'll continue
> +		 * asking for 2 and get -ENOSPC in return.
>  		 */
> -		nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
> +		if (result == -ENOSPC && nr_io_queues == 1)
> +			nr_io_queues = 1;

Setting nr_io_queues to 1 when it already is set to 1 doesn't really do
anything. Is this for clarification ?

> +		else
> +			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
>  
>  		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
>  				nr_io_queues,
> 
> -- 
> Jens Axboe
> 

             reply	other threads:[~2018-11-15 18:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-15 18:28 Guenter Roeck [this message]
2018-11-15 18:38 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2018-11-15 22:46 Guenter Roeck
2018-11-15 23:03 ` Jens Axboe
2018-11-15 19:11 Guenter Roeck
2018-11-15 19:29 ` Jens Axboe
2018-11-15 19:38   ` Guenter Roeck
2018-11-15 19:40     ` Jens Axboe
2018-11-15 19:43       ` Jens Axboe
2018-11-15 22:06         ` Guenter Roeck
2018-11-15 22:12           ` Jens Axboe
2018-11-15 19:36 ` Guenter Roeck
2018-11-15 19:39   ` Jens Axboe
2018-11-14  0:41 Guenter Roeck
2018-11-14  0:51 ` Jens Axboe
2018-11-14  4:52   ` Guenter Roeck
2018-11-14 17:12     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181115182833.GA15729@roeck-us.net \
    --to=linux@roeck-us.net \
    --cc=axboe@kernel.dk \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).