linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* sg_dma_page_iter offset & length considerations
@ 2019-04-24  7:22 Daniel Drake
  2019-04-24  7:57 ` Arnd Bergmann
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Daniel Drake @ 2019-04-24  7:22 UTC (permalink / raw)
  To: jgg, imre.deak, Linux Kernel, linux-mmc; +Cc: Oleksij Rempel

Hi,

In drivers/mmc/alcor.c we're working with a MMC controller which
supports DMA transfers split up into page-sized chunks. A DMA transfer
consists of multiple pages, after each page is transferred there is an
interrupt so that the driver can program the DMA address of the next
page in the sequence. All pages must be complete in length, only the
last one can be a partial transfer.

I thought that the sg_dma_page_iter API looked like a great fit here:
the driver can accept any old sglist, and then use this new API to
collapse it into a list of pages that can be easily iterated over, and
fed to the hardware one at a time.

But looking closer I think I may have made some bad assumptions, and
I'm left with some fundamental questions about this API.

Specifically I can see userspace generates requests which present a
sglist such as:
 - first entry with offset=1536 length=2560
 - 7 entries with offset=0 length=4096
 - last entry with offset=0 length=1536

I gather that dma_map_sg() will take care off the offsets, i.e. any
physical address I get with sg_page_iter_dma_address() will already
have the offset applied, so I don't have to worry about tracking that.

But what about the length? For every page returned by the iterator, I
can't assume that I am being asked to work with the full page, right?
Such as the first and last page in the above example. I need to go
back to the sglist to check the corresponding length variable, and
having to go back to check the sglist seems to defeat the convenience
of having the sglist collapsed into a list of pages by the iterator.

Any comments?

Thanks
Daniel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: sg_dma_page_iter offset & length considerations
  2019-04-24  7:22 sg_dma_page_iter offset & length considerations Daniel Drake
@ 2019-04-24  7:57 ` Arnd Bergmann
  2019-04-24 11:21 ` Jason Gunthorpe
  2019-04-25  8:02 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Arnd Bergmann @ 2019-04-24  7:57 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Jason Gunthorpe, Imre Deak, Linux Kernel, linux-mmc, Oleksij Rempel

On Wed, Apr 24, 2019 at 9:22 AM Daniel Drake <drake@endlessm.com> wrote:
> In drivers/mmc/alcor.c we're working with a MMC controller which
> supports DMA transfers split up into page-sized chunks. A DMA transfer
> consists of multiple pages, after each page is transferred there is an
> interrupt so that the driver can program the DMA address of the next
> page in the sequence. All pages must be complete in length, only the
> last one can be a partial transfer.
>
> I thought that the sg_dma_page_iter API looked like a great fit here:
> the driver can accept any old sglist, and then use this new API to
> collapse it into a list of pages that can be easily iterated over, and
> fed to the hardware one at a time.
>
> But looking closer I think I may have made some bad assumptions, and
> I'm left with some fundamental questions about this API.
>
> Specifically I can see userspace generates requests which present a
> sglist such as:
>  - first entry with offset=1536 length=2560
>  - 7 entries with offset=0 length=4096
>  - last entry with offset=0 length=1536
>
> I gather that dma_map_sg() will take care off the offsets, i.e. any
> physical address I get with sg_page_iter_dma_address() will already
> have the offset applied, so I don't have to worry about tracking that.
>
> But what about the length? For every page returned by the iterator, I
> can't assume that I am being asked to work with the full page, right?
> Such as the first and last page in the above example. I need to go
> back to the sglist to check the corresponding length variable, and
> having to go back to check the sglist seems to defeat the convenience
> of having the sglist collapsed into a list of pages by the iterator.
>
> Any comments?

I would assume that 4K aligned data is the common case, as that
is the smallest page size we allow in the page cache, and SDHC
cards expect FAT32 with at least 4K aligned clusters as well
in order to get into their fast path.

I'd keep that assumption by default, and would suggest you fall
back to to a kmalloc() bounce buffer if you get a buffer from user
space that is not fully aligned. I suppose the only way to create
those would be using O_DIRECT writes.

     Arnd

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: sg_dma_page_iter offset & length considerations
  2019-04-24  7:22 sg_dma_page_iter offset & length considerations Daniel Drake
  2019-04-24  7:57 ` Arnd Bergmann
@ 2019-04-24 11:21 ` Jason Gunthorpe
  2019-04-25  7:38   ` Daniel Drake
  2019-04-25  8:02 ` Christoph Hellwig
  2 siblings, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2019-04-24 11:21 UTC (permalink / raw)
  To: Daniel Drake; +Cc: imre.deak, Linux Kernel, linux-mmc, Oleksij Rempel

On Wed, Apr 24, 2019 at 03:22:18PM +0800, Daniel Drake wrote:
> Hi,
> 
> In drivers/mmc/alcor.c we're working with a MMC controller which
> supports DMA transfers split up into page-sized chunks.

Keep in mind that sg_page_iter_page splits into PAGE_SIZE chuncks, so
if you HW needs exactly a 4k chunk or something then it is not the
right API.

I have in mind the idea to move this code:

https://patchwork.kernel.org/patch/10909379/

Into the global scatterlist someday if there are other users in the
tree that need arbitary splitting..

> Specifically I can see userspace generates requests which present a
> sglist such as:
>  - first entry with offset=1536 length=2560
>  - 7 entries with offset=0 length=4096
>  - last entry with offset=0 length=1536
> 
> I gather that dma_map_sg() will take care off the offsets, i.e. any
> physical address I get with sg_page_iter_dma_address() will already
> have the offset applied, so I don't have to worry about tracking that.

Well the DMA iter aligns everything to pages so all the offsets are
lost.

> But what about the length? For every page returned by the iterator, I
> can't assume that I am being asked to work with the full page,
> right?

So far no user has required the length/offset, but it would be easy
enough to make a function to calculate these values for the current
step.

This is because this API is used by drivers building page lists for
HW, and they usually have additional information outside the SGL that
indicates what the start/end offsets are.

A driver that simply wants a SGL with a capped max size (ie 4k?)
should use the dma_set_max_seg_size() API and just never get a SGE
with a larger length. 

But the SGE may still cross a page boundary..

Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: sg_dma_page_iter offset & length considerations
  2019-04-24 11:21 ` Jason Gunthorpe
@ 2019-04-25  7:38   ` Daniel Drake
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Drake @ 2019-04-25  7:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: imre.deak, Linux Kernel, linux-mmc, Oleksij Rempel, Arnd Bergmann

On Wed, Apr 24, 2019 at 7:22 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
> A driver that simply wants a SGL with a capped max size (ie 4k?)
> should use the dma_set_max_seg_size() API and just never get a SGE
> with a larger length.

That sounds like exactly what we want here. However I tried that
function and it seems to have no effect. Larger SGEs still appear just
as readily as before.

I tried to find the code that honours the value set here, but didn't
find anything. Sure there are some non-x86 archs and some iommu
drivers that appear to do something with it, but in this case
(standard Intel x86_64) it seems like nothing observes/honours this
value.

That did lead me back to the similarly-named mmc_host.max_seg_size
though, which is passed through to blk_queue_max_segment_size().
That is documented as "Enables a low level driver to set an upper
limit on the size of a coalesced segment", but perhaps I don't fully
grasp what it means by "coalesced segment".

The alcor driver currently has:
    mmc->max_segs = 64;
    mmc->max_seg_size = 240 * 512;

which I had interpreted to mean "we can accept a sglist with maximum
64 entries, and if you sum the size of all entries in that sglist, the
total must not exceed 240 sectors". i.e. with "coalesced" referring to
the overall sum; the description does not suggest to me that we can
influence the size of an individual segment here.

Other mmc drivers may also share a similar interpretation, e.g. see atmel-mci.c:
    mmc->max_seg_size = mmc->max_blk_size * mmc->max_segs;

but I just experimented with the driver again, and setting
mmc_host.max_seg_size to 4096 does seem to do what we want here. Now
we get multi-entry sg lists but each sge is max 4096 bytes in size.

I'll do some more verification and then send a patch along these
lines. Thanks for your comments!

Daniel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: sg_dma_page_iter offset & length considerations
  2019-04-24  7:22 sg_dma_page_iter offset & length considerations Daniel Drake
  2019-04-24  7:57 ` Arnd Bergmann
  2019-04-24 11:21 ` Jason Gunthorpe
@ 2019-04-25  8:02 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2019-04-25  8:02 UTC (permalink / raw)
  To: Daniel Drake; +Cc: jgg, imre.deak, Linux Kernel, linux-mmc, Oleksij Rempel

Given that mmc uses block layer helpers to build the sg list you
just have to set the right block layer and DMA layer (in case an
iommu merges during map_sg) dma_boundary paramters (PAGE_SIZE - 1),
and you should get sglists formatted to your requirements, no need
to use an iterator.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-04-25  8:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-24  7:22 sg_dma_page_iter offset & length considerations Daniel Drake
2019-04-24  7:57 ` Arnd Bergmann
2019-04-24 11:21 ` Jason Gunthorpe
2019-04-25  7:38   ` Daniel Drake
2019-04-25  8:02 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).