From: Matthew Wilcox <willy@infradead.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Ming Lei <tom.leiming@gmail.com>,
	linux-block <linux-block@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"open list:XFS FILESYSTEM" <linux-xfs@vger.kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Lameter <cl@linux.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: block: DMA alignment of IO buffer allocated from slab
Date: Mon, 24 Sep 2018 20:28:26 -0700
Message-ID: <20180925032826.GA4110@bombadil.infradead.org>
In-Reply-To: <20180925001615.GA14386@ming.t460p>

On Tue, Sep 25, 2018 at 08:16:16AM +0800, Ming Lei wrote:
> On Mon, Sep 24, 2018 at 11:57:53AM -0700, Matthew Wilcox wrote:
> > On Mon, Sep 24, 2018 at 09:19:44AM -0700, Bart Van Assche wrote:
> > You're not supposed to use kmalloc memory for DMA.  This is why we have
> > dma_alloc_coherent() and friends.  Also, from DMA-API.txt:
> 
> Please take a look at the USB drivers, storage drivers, or the SCSI layer. Lots
> of DMA buffers are allocated via kmalloc().

Then we have lots of broken places.  I mean, this isn't new.  We used
to have lots of broken places that did DMA to the stack.  And then
the stack was changed to be vmalloc'ed and all those places got fixed.
The difference this time is that it's only certain rare configurations
that are broken, and the brokenness is only found by corruption in some
fairly unlikely scenarios.

> Also see the following description in DMA-API-HOWTO.txt:
> 
> 	If the device supports DMA, the driver sets up a buffer using kmalloc() or
> 	a similar interface, which returns a virtual address (X).  The virtual
> 	memory system maps X to a physical address (Y) in system RAM.  The driver
> 	can use virtual address X to access the buffer, but the device itself
> 	cannot because DMA doesn't go through the CPU virtual memory system.

Sure, but that's not addressing the cacheline coherency problem.
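
To make the hazard concrete, here is a minimal sketch (my illustration, not
code from any driver) of what can go wrong on a non-cache-coherent machine
with 64-byte cachelines when two kmalloc-32 objects share a line:

        #include <linux/slab.h>
        #include <linux/string.h>
        #include <linux/dma-mapping.h>

        static void cacheline_sharing_hazard(struct device *dev)
        {
                char *a = kmalloc(32, GFP_KERNEL);  /* may share a 64-byte */
                char *b = kmalloc(32, GFP_KERNEL);  /* cacheline with 'a'  */
                dma_addr_t handle;

                /* error handling omitted for brevity */
                handle = dma_map_single(dev, a, 32, DMA_FROM_DEVICE);
                /* ... the device DMAs into 'a' ... */
                memset(b, 0, 32);   /* CPU dirties the shared cacheline */
                dma_unmap_single(dev, handle, 32, DMA_FROM_DEVICE);
                /* the unmap may invalidate the whole line; the store to
                 * 'b' made while the DMA was in flight can be lost */

                kfree(a);
                kfree(b);
        }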

Regardless of what the docs did or didn't say, let's try answering
the question: what makes for a more useful system?

A: A kmalloc implementation which always returns an address suitable
for mapping using the DMA interfaces

B: A kmalloc implementation which is more efficient, but requires drivers
to use a different interface for allocating space for the purposes of DMA

I genuinely don't know the answer to this question, and I think there are
various people in this thread who believe A or B quite strongly.
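
For what it's worth, option B would look roughly like this in a driver.  The
names below are made up; the only kernel interfaces used are
kmem_cache_create() and dma_get_cache_alignment(), and the point is only that
the driver asks for DMA-safe alignment explicitly instead of assuming
kmalloc() provides it:

        #include <linux/slab.h>
        #include <linux/errno.h>
        #include <linux/dma-mapping.h>

        static struct kmem_cache *foo_dma_cache;  /* hypothetical driver cache */

        static int foo_create_dma_cache(void)
        {
                /* align objects to the DMA cacheline size rather than
                 * relying on kmalloc()'s ARCH_KMALLOC_MINALIGN */
                foo_dma_cache = kmem_cache_create("foo_dma_buf", 512,
                                                  dma_get_cache_alignment(),
                                                  0, NULL);
                return foo_dma_cache ? 0 : -ENOMEM;
        }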

I would also like to ask people who believe in A what should happen in
this situation:

        blocks = kmalloc(4, GFP_KERNEL);
        sg_init_one(&sg, blocks, 4);
...
        result = ntohl(*blocks);
        kfree(blocks);

(this is just one example; there are others).  Because if we have to
round all allocations below 64 bytes up to 64 bytes, that's going to be
a memory consumption problem.  On my laptop:

# name           <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-96         11527  15792     96   42    1 : slabdata    376    376      0
kmalloc-64         54406  62912     64   64    1 : slabdata    983    983      0
kmalloc-32         80325  84096     32  128    1 : slabdata    657    657      0
kmalloc-16         26844  30208     16  256    1 : slabdata    118    118      0
kmalloc-8          17141  21504      8  512    1 : slabdata     42     42      0

I make that an extra 1799 pages (7MB).  Not the end of the world, but
not free either.
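
(Purely for illustration, and not how the slab code is actually written:
option A effectively means that every request below the DMA cacheline size
occupies a full cacheline-sized slot, which is where those extra pages come
from.)

        #include <linux/types.h>
        #include <linux/dma-mapping.h>

        /* illustrative only: what option A implies for small requests */
        static size_t dma_safe_object_size(size_t requested)
        {
                size_t min = dma_get_cache_alignment();  /* 64 on this laptop */

                return requested < min ? min : requested;
        }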

Thread overview: 45+ messages
2018-09-19  9:15 block: DMA alignment of IO buffer allocated from slab Ming Lei
2018-09-19  9:41 ` Vitaly Kuznetsov
2018-09-19 10:02   ` Ming Lei
2018-09-19 11:15     ` Vitaly Kuznetsov
2018-09-20  1:28       ` Ming Lei
2018-09-20  3:59         ` Yang Shi
2018-09-20  6:32         ` Christoph Hellwig
2018-09-20  6:31 ` Christoph Hellwig
2018-09-21 13:04   ` Vitaly Kuznetsov
2018-09-21 13:05     ` Christoph Hellwig
2018-09-21 15:00       ` Jens Axboe
2018-09-24 16:06       ` Christopher Lameter
2018-09-24 17:49         ` Jens Axboe
2018-09-24 18:00           ` Christopher Lameter
2018-09-24 18:09             ` Jens Axboe
2018-09-25  7:49               ` Dave Chinner
2018-09-25 15:44                 ` Jens Axboe
2018-09-25 21:04                 ` Matthew Wilcox
2018-09-23 22:42     ` Ming Lei
2018-09-24  9:46       ` Andrey Ryabinin
2018-09-24 14:19         ` Bart Van Assche
2018-09-24 14:43           ` Andrey Ryabinin
2018-09-24 15:08             ` Bart Van Assche
2018-09-24 15:52               ` Andrey Ryabinin
2018-09-24 15:58                 ` Bart Van Assche
2018-09-24 16:07                   ` Andrey Ryabinin
2018-09-24 16:19                     ` Bart Van Assche
2018-09-24 16:47                       ` Christopher Lameter
2018-09-24 18:57                       ` Matthew Wilcox
2018-09-24 19:56                         ` Bart Van Assche
2018-09-24 20:41                           ` Matthew Wilcox
2018-09-24 20:54                             ` Bart Van Assche
2018-09-24 21:09                               ` Matthew Wilcox
2018-09-25  0:16                         ` Ming Lei
2018-09-25  3:28                           ` Matthew Wilcox [this message]
2018-09-25  4:10                             ` Bart Van Assche
2018-09-25  4:44                               ` Matthew Wilcox
2018-09-25  6:55                                 ` Ming Lei
2018-09-24 15:17           ` Christopher Lameter
2018-09-25  0:20             ` Ming Lei
2018-09-20 14:07 ` Bart Van Assche
2018-09-21  1:56 ` Dave Chinner
2018-09-21  7:08   ` Christoph Hellwig
2018-09-21  7:25     ` Ming Lei
2018-09-21 14:59       ` Jens Axboe
