* nvme-5.3 ssd performance regression
@ 2019-07-16  1:04 Benjamin Herrenschmidt
  2019-07-16  7:12 ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-16  1:04 UTC (permalink / raw)


Hi !

Something I noticed while testing those patches for the Apple drives.

With 5.2.0 (and those patches):

[benh@mini ~]$ sudo hdparm -tf /dev/nvme0n1

/dev/nvme0n1:
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 5960 MB in  3.00 seconds = 1986.22 MB/sec

(The numbers are reasonably stable across multiple runs)

With nvme-5.3 (and those patches & the ctrl->opts NULL fix):

[benh@mini linux]$ sudo hdparm -tf /dev/nvme0n1

/dev/nvme0n1:
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 4520 MB in  3.00 seconds = 1506.63 MB/sec
[benh@mini linux]$ sudo hdparm -tf /dev/nvme0n1

Here too, the numbers are quite stable during a given boot, but
interestingly they seem to change from boot to boot; I also observed
1700 MB/s, for example. It's a rather major regression.

The .config is identical in both cases (an x86_64-defconfig with a
small tweak or two: mostly removing some drivers and building nvme as
a module).

In both cases the io scheduler is mq-deadline. The device has a single
IO queue.

Cheers,
Ben.


* nvme-5.3 ssd performance regression
  2019-07-16  1:04 nvme-5.3 ssd performance regression Benjamin Herrenschmidt
@ 2019-07-16  7:12 ` Ming Lei
  2019-07-16  9:36   ` Christoph Hellwig
  2019-07-17  0:15   ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 12+ messages in thread
From: Ming Lei @ 2019-07-16  7:12 UTC (permalink / raw)


On Tue, Jul 16, 2019 at 9:05 AM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> Hi !
>
> Something I noticed while testing those patches for the Apple drives.
>
> With 5.2.0 (and those patches):
>
> [benh@mini ~]$ sudo hdparm -tf /dev/nvme0n1
>
> /dev/nvme0n1:
>  HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
>  Timing buffered disk reads: 5960 MB in  3.00 seconds = 1986.22 MB/sec
>
> (The numbers are reasonably stable across multiple runs)
> 
> With nvme-5.3 (and those patches & the ctrl->opts NULL fix):
>
> [benh@mini linux]$ sudo hdparm -tf /dev/nvme0n1
>
> /dev/nvme0n1:
>  HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
>  Timing buffered disk reads: 4520 MB in  3.00 seconds = 1506.63 MB/sec
> [benh@mini linux]$ sudo hdparm -tf /dev/nvme0n1
>
> Here too, the numbers are quite stable during a given boot, but
> interestingly they seem to change from boot to boot; I also observed
> 1700 MB/s, for example. It's a rather major regression.
> 
> The .config is identical in both cases (an x86_64-defconfig with a
> small tweak or two: mostly removing some drivers and building nvme as
> a module).
>
> In both cases the io scheduler is mq-deadline. The device has a single
> IO queue.

It should be caused by the following commit: that patch uses the
single-mapping size to limit max_hw_sectors, and that approach looks
wrong. For example, on qemu, max_hw_sectors is decreased to 512. You
can try reverting the patch and see if it makes a difference.

I feel we might need the max segment size limit too.

commit 7637de311bd2124b298a072852448b940d8a34b9
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Jul 3 09:54:44 2019 -0700

    nvme-pci: limit max_hw_sectors based on the DMA max mapping size
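
From memory, the change amounts to something like the sketch below (not
the exact diff; nvme_max_hw_sectors() here is just a throwaway helper
for illustration):

#include <linux/kernel.h>
#include <linux/dma-mapping.h>

#define NVME_MAX_KB_SZ  4096    /* the driver's existing 4 MB request cap */

/* Throwaway helper, only to show the effect of the commit above. */
static u32 nvme_max_hw_sectors(struct device *dma_dev)
{
        /*
         * With swiotlb initialized, dma_max_mapping_size() reports the
         * swiotlb slot limit of 256 KB, so this works out to 512 sectors
         * (256 KB requests) instead of 8192 sectors (4 MB), which matches
         * the drop I see on qemu.
         */
        return min_t(u32, NVME_MAX_KB_SZ << 1,
                     dma_max_mapping_size(dma_dev) >> 9);
}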



Thanks,
Ming Lei


* nvme-5.3 ssd performance regression
  2019-07-16  7:12 ` Ming Lei
@ 2019-07-16  9:36   ` Christoph Hellwig
  2019-07-16 10:55     ` Benjamin Herrenschmidt
  2019-07-16 12:47     ` Ming Lei
  2019-07-17  0:15   ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2019-07-16  9:36 UTC (permalink / raw)


On Tue, Jul 16, 2019 at 03:12:06PM +0800, Ming Lei wrote:
> It should be caused by the following commit: that patch uses the
> single-mapping size to limit max_hw_sectors, and that approach looks
> wrong. For example, on qemu, max_hw_sectors is decreased to 512. You
> can try reverting the patch and see if it makes a difference.
> 
> I feel we might need the max segment size limit too.

No, with swiotlb it really is the whole request size that's limited
by the swiotlb buffer size.  Similar for potential IOMMUs, where
again it is the whole thing.

But looking at the implementation of dma_direct_max_mapping_size I
think we need to relax it - it currently limits the size as soon
as swiotlb is enabled, and not just if we need it for a given device.

We have some open-coded versions of those checks elsewhere; let me
cook up a series to sort that mess out.
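
Something along these lines, as a rough sketch only (the naive 64-bit
mask test below is a stand-in for a proper helper, and it ignores the
swiotlb=force case):

#include <linux/kernel.h>
#include <linux/dma-mapping.h>
#include <linux/swiotlb.h>

size_t dma_direct_max_mapping_size(struct device *dev)
{
        /* swiotlb not initialized at all: nothing to bounce into */
        if (!swiotlb_max_segment())
                return SIZE_MAX;

        /* a device that can address all 64 bits never needs bouncing */
        if (dma_get_mask(dev) == DMA_BIT_MASK(64))
                return SIZE_MAX;

        /* otherwise the whole mapping has to fit the swiotlb slots */
        return swiotlb_max_mapping_size(dev);
}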

Ben, in the meantime can you try a revert of the commit Ming identified?
That is obviously not the proper fix, but it would help validating the
hypothesis.


* nvme-5.3 ssd performance regression
  2019-07-16  9:36   ` Christoph Hellwig
@ 2019-07-16 10:55     ` Benjamin Herrenschmidt
  2019-07-16 12:47     ` Ming Lei
  1 sibling, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-16 10:55 UTC (permalink / raw)


On Tue, 2019-07-16 at 11:36 +0200, Christoph Hellwig wrote:
> On Tue, Jul 16, 2019 at 03:12:06PM +0800, Ming Lei wrote:
> > It should be caused by the following commit: that patch uses the
> > single-mapping size to limit max_hw_sectors, and that approach looks
> > wrong. For example, on qemu, max_hw_sectors is decreased to 512. You
> > can try reverting the patch and see if it makes a difference.
> > 
> > I feel we might need the max segment size limit too.
> 
> No, with swiotlb it really is the whole request size that's limited
> by the swiotlb buffer size.  Similar for potential IOMMUs, where
> again it is the whole thing.
> 
> But looking at the implementation of dma_direct_max_mapping_size I
> think we need to relax it - it currently limits the size as soon
> as swiotlb is enabled, and not just if we need it for a given device.
> 
> We have some open-coded versions of those checks elsewhere; let me
> cook up a series to sort that mess out.
> 
> Ben, in the meantime can you try a revert of the commit Ming identified?
> That is obviously not the proper fix, but it would help validating the
> hypothesis.

Sure, I was going to do it anyway. I screwed up my remote access
tonight, so it will have to wait for tomorrow though.

Cheers,
Ben.


* nvme-5.3 ssd performance regression
  2019-07-16  9:36   ` Christoph Hellwig
  2019-07-16 10:55     ` Benjamin Herrenschmidt
@ 2019-07-16 12:47     ` Ming Lei
  2019-07-16 19:12       ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Ming Lei @ 2019-07-16 12:47 UTC (permalink / raw)


On Tue, Jul 16, 2019 at 11:36:17AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 16, 2019 at 03:12:06PM +0800, Ming Lei wrote:
> > It should be caused by the following commit: that patch uses the
> > single-mapping size to limit max_hw_sectors, and that approach looks
> > wrong. For example, on qemu, max_hw_sectors is decreased to 512. You
> > can try reverting the patch and see if it makes a difference.
> > 
> > I feel we might need the max segment size limit too.
> 
> No, with swiotlb it really is the whole request size that's limited
> by the swiotlb buffer size.  Similar for potential IOMMUs, where
> again it is the whole thing.

Documentation/DMA-API.txt:

		size_t
		dma_max_mapping_size(struct device *dev);
	
	Returns the maximum size of a mapping for the device. The size parameter
	of the mapping functions like dma_map_single(), dma_map_page() and
	others should not be larger than the returned value.

And dma_map_single() & dma_map_page() are usually used for mapping a
single element of an SGL, not the whole SGL.
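
To make the distinction concrete, purely as an illustration (none of
this is quoted from the nvme driver; the function and its arguments
are placeholders):

#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/dma-mapping.h>

/* Placeholder, only to contrast the two readings of the limit. */
static void apply_dma_limit(struct request_queue *q, struct device *dma_dev)
{
        /* per-element reading: cap each SG segment at the mapping limit */
        blk_queue_max_segment_size(q,
                min_t(size_t, dma_max_mapping_size(dma_dev), UINT_MAX));

        /* whole-request reading (what the commit does): cap the request,
         * in 512-byte sectors; real code would also clamp this to the
         * driver's own maximum */
        blk_queue_max_hw_sectors(q, dma_max_mapping_size(dma_dev) >> 9);
}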

Thanks,
Ming


* nvme-5.3 ssd performance regression
  2019-07-16 12:47     ` Ming Lei
@ 2019-07-16 19:12       ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2019-07-16 19:12 UTC (permalink / raw)


On Tue, Jul 16, 2019 at 08:47:42PM +0800, Ming Lei wrote:
> > No, with swiotlb it really is the whole request size that's limited
> > by the swiotlb buffer size.  Similar for potential IOMMUs, where
> > again it is the whole thing.
> 
> Documentation/DMA-API.txt:
> 
> 		size_t
> 		dma_max_mapping_size(struct device *dev);
> 	
> 	Returns the maximum size of a mapping for the device. The size parameter
> 	of the mapping functions like dma_map_single(), dma_map_page() and
> 	others should not be larger than the returned value.
> 
> And dma_map_single() & dma_map_page() are usually used for mapping a
> single element of an SGL, not the whole SGL.

... and others.  And if you look at the actual swiotlb code, it is
pretty clear that the whole request counts.  But I'll make sure to
clarify the documentation when fixing the issue.


* nvme-5.3 ssd performance regression
  2019-07-16  7:12 ` Ming Lei
  2019-07-16  9:36   ` Christoph Hellwig
@ 2019-07-17  0:15   ` Benjamin Herrenschmidt
  2019-07-17  4:26     ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-17  0:15 UTC (permalink / raw)


On Tue, 2019-07-16 at 15:12 +0800, Ming Lei wrote:
> I feel we might need the max segment size limit too.
> 
> commit 7637de311bd2124b298a072852448b940d8a34b9
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Wed Jul 3 09:54:44 2019 -0700
> 
>     nvme-pci: limit max_hw_sectors based on the DMA max mapping size

Yes. Reverting this brings the speed back to 2GB/s

Cheers,
Ben.


* nvme-5.3 ssd performance regression
  2019-07-17  0:15   ` Benjamin Herrenschmidt
@ 2019-07-17  4:26     ` Christoph Hellwig
  2019-07-17  4:33       ` Benjamin Herrenschmidt
  2019-07-17  6:19       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2019-07-17  4:26 UTC (permalink / raw)


On Wed, Jul 17, 2019 at 10:15:00AM +1000, Benjamin Herrenschmidt wrote:
> Yes. Reverting this brings the speed back to 2GB/s

Thanks.  Can you try this series as well:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-max-mapping-size-fix


* nvme-5.3 ssd performance regression
  2019-07-17  4:26     ` Christoph Hellwig
@ 2019-07-17  4:33       ` Benjamin Herrenschmidt
  2019-07-17  4:40         ` Christoph Hellwig
  2019-07-17  6:19       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-17  4:33 UTC (permalink / raw)


On Wed, 2019-07-17 at 06:26 +0200, Christoph Hellwig wrote:
> On Wed, Jul 17, 2019 at 10:15:00AM +1000, Benjamin Herrenschmidt
> wrote:
> > Yes. Reverting this brings the speed back to 2GB/s
> 
> Thanks.  Can you try this series as well:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-max-mapping-size-fix

With or without

 "nvme-pci: limit max_hw_sectors based on the DMA max mapping size"

?

Ben.


* nvme-5.3 ssd performance regression
  2019-07-17  4:33       ` Benjamin Herrenschmidt
@ 2019-07-17  4:40         ` Christoph Hellwig
  2019-07-17  5:50           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2019-07-17  4:40 UTC (permalink / raw)


On Wed, Jul 17, 2019 at 02:33:59PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2019-07-17 at 06:26 +0200, Christoph Hellwig wrote:
> > On Wed, Jul 17, 2019 at 10:15:00AM +1000, Benjamin Herrenschmidt
> > wrote:
> > > Yes. Reverting this brings the speed back to 2GB/s
> > 
> > Thanks.  Can you try this series as well:
> > 
> > 
> > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-max-mapping-size-fix
> 
> With or without
> 
>  "nvme-pci: limit max_hw_sectors based on the DMA max mapping size"
> 
> ?

With.  Just pull in the branch :)


* nvme-5.3 ssd performance regression
  2019-07-17  4:40         ` Christoph Hellwig
@ 2019-07-17  5:50           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-17  5:50 UTC (permalink / raw)


On Wed, 2019-07-17 at 06:40 +0200, Christoph Hellwig wrote:

> > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-max-mapping-size-fix
> > 
> > With or without
> > 
> >  "nvme-pci: limit max_hw_sectors based on the DMA max mapping size"
> > 
> > ?
> 
> With.  Just pull in the branch :)

Ah oops :-) Brain not engaged clearly ...

I'll get back to you when I get past the lockups caused by the Intel
graphics driver... Something is trying to do synchronous module loads
from async init calls and it's not happy.

Cheers,
Ben.


* nvme-5.3 ssd performance regression
  2019-07-17  4:26     ` Christoph Hellwig
  2019-07-17  4:33       ` Benjamin Herrenschmidt
@ 2019-07-17  6:19       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2019-07-17  6:19 UTC (permalink / raw)


On Wed, 2019-07-17 at 06:26 +0200, Christoph Hellwig wrote:
> On Wed, Jul 17, 2019 at 10:15:00AM +1000, Benjamin Herrenschmidt
> wrote:
> > Yes. Reverting this brings the speed back to 2GB/s
> 
> Thanks.  Can you try this series as well:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-max-mapping-size-fix

This works. The performance is normal.

Cheers,
Ben.

