Linux-Block Archive on lore.kernel.org
From: Alan Stern <stern@rowland.harvard.edu>
To: Christoph Hellwig <hch@lst.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>,
	Linux-Renesas <linux-renesas-soc@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>
Subject: Re: How to resolve an issue in swiotlb environment?
Date: Tue, 11 Jun 2019 10:51:14 -0400 (EDT)
Message-ID: <Pine.LNX.4.44L0.1906110956510.1535-100000@iolanthe.rowland.org>
In-Reply-To: <20190611064158.GA20601@lst.de>

On Tue, 11 Jun 2019, Christoph Hellwig wrote:

> Hi Alan,
> 
> thanks for the explanation.  It seems like what usb wants is to:
> 
>  - set sg_tablesize to 1 for devices that can't handle scatterlist at all

Hmmm.  usb-storage (and possibly other drivers too) currently handles
such controllers by setting up an SG transfer as a series of separate
URBs, one for each scatterlist entry.  But this is not the same thing,
for two reasons:

	It has less I/O overhead than setting sg_tablesize to 1 because 
	it sets up the whole transfer as a single SCSI command, which 
	requires much less time and traffic on the USB bus than sending 
	multiple commands.

	It has that requirement about each scatterlist element except
	the last being a multiple of the maximum packet size in length.
	(This is because the USB protocol says that a transfer ends
	whenever a less-than-maximum-size packet is encountered.)
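
	To illustrate the second point, here is a minimal userspace
	sketch (not kernel code) of what happens when each scatterlist
	entry is sent as its own URB.  The function name and the sample
	lengths are hypothetical; the only rule modeled is the one
	stated above, that a packet shorter than the maximum packet
	size ends the transfer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: walk the element lengths in order, as if each
 * one were submitted as a separate URB, and return how many bytes
 * have been sent when the transfer ends.  The transfer ends at
 * the first element whose length is not a multiple of maxp (its final
 * packet is short), or after the last element if none is.
 */
size_t bytes_before_transfer_ends(const size_t *lens, size_t n, size_t maxp)
{
	size_t sent = 0;

	assert(maxp > 0);
	for (size_t i = 0; i < n; i++) {
		sent += lens[i];
		/* A trailing packet shorter than maxp terminates the transfer. */
		if (lens[i] % maxp != 0)
			break;
	}
	return sent;
}
```

	With a maximum packet size of 512, lengths of 1024, 512 and
	1024 deliver all 2560 bytes, whereas lengths of 1024, 768 and
	1024 stop after 1792 bytes because the 768-byte element ends in
	a 256-byte packet.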

We would like to avoid the extra I/O overhead for host controllers that
can't handle SG.  In fact, switching to sg_tablesize = 1 would probably
be considered a regression.

>  - set the virt boundary as-is for devices supporting "basic" scatterlist,
>    although that still assumes they can rejiggle them because for example
>    you could still get a smaller than expected first segment ala (assuming
>    a 1024 byte packet size and thus 1023 virt_boundary_mask):
> 
>         | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |
> 
>    as the virt_boundary does not guarantee that the first segment is
>    the same size as all the mid segments.

But that is exactly the problem we need to solve.

The issue which prompted the commit this thread is about arose in a
situation where the block layer set up a scatterlist containing buffer
sizes something like:

	4096 4096 1536 1024

and the maximum packet size was 1024.  The situation was a little 
unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't 
matter much in normal practice because:

	Block devices normally have a block size of 512 bytes or more.
	Smaller values are very uncommon.  So scatterlist element sizes
	are always divisible by 512.

	xHCI is the only USB host controller type with a maximum packet 
	size larger than 512, and xHCI hardware can do full 
	scatter-gather so it doesn't care what the buffer sizes are.

So another approach would be to fix vhci-hcd and then trust that the
problem won't arise again, for the reasons above.  We would be okay so
long as nobody tried to use a USB-SCSI device with a block size of 256
bytes or less.
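
As a worked check of the reasoning above, the following userspace
sketch (hypothetical helper name, not taken from any driver) tests
whether every scatterlist element except the last is a multiple of the
maximum packet size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first non-final element
 * whose length is not a multiple of maxp, or -1 if all of them are.
 * The final element is exempt because a short packet there is the
 * legitimate end of the transfer.
 */
long first_bad_element(const size_t *lens, size_t n, size_t maxp)
{
	assert(maxp > 0);
	for (size_t i = 0; i + 1 < n; i++) {
		if (lens[i] % maxp != 0)
			return (long)i;
	}
	return -1;
}
```

For the sizes 4096 4096 1536 1024 this returns 2 with a maximum packet
size of 1024 (1536 = 1024 + 512) and -1 with a maximum packet size of
512.  A 256-byte element followed by anything else fails with a
maximum packet size of 512, which is the small-block-size case
mentioned above.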

>  - do not set any limit on xhci
> 
> But that just goes back to the original problem, and that is that with
> swiotlb we are limited in the total dma mapping size, and recent block
> layer changes in the way we handle the virt_boundary mean we now build
> much larger requests by default.  For SCSI ULDs to take that into
> account I need to call dma_max_mapping_size() and use that as the
> upper bound for the request size.  My plan is to do that in scsi_lib.c,
> but for that we need to expose the actual struct device that the dma
> mapping is performed on to the scsi layer.  If that device is different
> from the sysfs hierarchy struct device, which it is for usb the ULDD
> needs to scsi_add_host_with_dma and pass the dma device as well.  How
> do I get at the dma device (aka the HCDs pci_dev or similar) from
> usb-storage/uas?

From usb_stor_probe2(): us->pusb_dev->bus->sysdev.
From uas_probe(): udev->bus->sysdev.

The ->sysdev field points to the device used for DMA mapping.  It is
often the same as ->controller, but sometimes it is
->controller->parent because of the peculiarities of some platforms.
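
The pointer chain can be pictured with a small userspace mock.  The
struct layouts below are stand-ins that model only the fields named in
this message (they are not the real kernel definitions), and
usb_dma_device() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types: only the fields discussed here are modeled. */
struct device {
	const char *name;
	struct device *parent;
};

struct usb_bus {
	struct device *controller;	/* the host controller device */
	struct device *sysdev;		/* the device used for DMA mapping */
};

struct usb_device {
	struct usb_bus *bus;
};

/* Hypothetical helper: the DMA device for a USB device is bus->sysdev. */
struct device *usb_dma_device(const struct usb_device *udev)
{
	assert(udev != NULL && udev->bus != NULL);
	return udev->bus->sysdev;
}
```

In the common case a mock bus would be set up with sysdev equal to
controller; on the platforms mentioned above it would instead be set
to controller->parent, and the helper returns whichever was stored.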

Alan Stern


Thread overview: 26+ messages
2019-06-03  6:42 Yoshihiro Shimoda
2019-06-07 12:00 ` Yoshihiro Shimoda
2019-06-10  7:31   ` Biju Das
2019-06-10 11:13   ` Yoshihiro Shimoda
2019-06-10 12:32     ` Christoph Hellwig
2019-06-10 18:46       ` Alan Stern
2019-06-11  6:41         ` Christoph Hellwig
2019-06-11 14:51           ` Alan Stern [this message]
2019-06-12  7:30             ` Christoph Hellwig
2019-06-12  8:52               ` Yoshihiro Shimoda
2019-06-12 11:31                 ` Christoph Hellwig
2019-06-13  4:52                   ` Yoshihiro Shimoda
2019-06-12 11:46               ` Oliver Neukum
2019-06-12 12:06                 ` Christoph Hellwig
2019-06-12 14:43                   ` Alan Stern
2019-06-13  7:39                     ` Christoph Hellwig
2019-06-13 16:57                       ` Martin K. Petersen
2019-06-13 17:16                       ` Alan Stern
2019-06-13 18:18                         ` Greg KH
2019-06-13 23:01                         ` shuah
2019-06-14 14:44                           ` Alan Stern
2019-06-18 15:28                             ` shuah
2019-06-19 20:23                               ` shuah
2019-06-19 21:05                                 ` Alan Stern
2019-06-21 17:43                                   ` Suwan Kim
2019-06-11  6:49         ` Yoshihiro Shimoda
