Linux-Fsdevel Archive on lore.kernel.org
From: Christoph Hellwig <hch@infradead.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	john.hubbard@gmail.com, Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, Doug Ledford <dledford@redhat.com>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Christian Benvenuti <benve@cisco.com>
Subject: Re: [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call
Date: Mon, 1 Oct 2018 08:51:46 -0700
Message-ID: <20181001155146.GA30236@infradead.org> (raw)
In-Reply-To: <20181001152929.GA21881@bombadil.infradead.org>

On Mon, Oct 01, 2018 at 08:29:29AM -0700, Matthew Wilcox wrote:
> I don't understand the dislike of the sg list.  Other than for special
> cases which we shouldn't be optimising for (ramfs, brd, loopback
> filesystems), when we get a page to do I/O, we're going to want a dma
> mapping for them.  It makes sense to already allocate space to store
> the mapping at the outset.

We don't actually need the space - the scatterlist forces it on us;
otherwise we could translate directly into the on-disk format and
save that duplicate space.  I have prototypes for NVMe and RDMA that do
away with the scatterlist entirely.
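
A minimal sketch of the space argument, as a userspace model rather than
kernel code.  Every name here (model_seg, model_sg, model_desc,
model_dma_map, MODEL_DMA_OFFSET) is hypothetical, and the "mapping" is a
fixed offset standing in for a real DMA mapping.  It only shows that the
intermediate list stores the mapping a second time before it is copied
into the descriptor the device consumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input segment: an address standing in for page + offset. */
struct model_seg {
	uintptr_t addr;
	uint32_t len;
};

/* Hypothetical intermediate entry: CPU-side fields plus room for the mapping. */
struct model_sg {
	uintptr_t addr;
	uint32_t len;
	uint64_t dma_addr;
	uint32_t dma_len;
};

/* Hypothetical descriptor in the format the device actually consumes. */
struct model_desc {
	uint64_t addr;
	uint32_t len;
};

/* The intermediate entry is strictly larger than the final descriptor. */
_Static_assert(sizeof(struct model_sg) > sizeof(struct model_desc),
	       "intermediate entry duplicates the mapping");

#define MODEL_DMA_OFFSET 0x1000u

/* Fake mapping (assumption): bus address = CPU address + fixed offset. */
static uint64_t model_dma_map(uintptr_t addr)
{
	return (uint64_t)addr + MODEL_DMA_OFFSET;
}

/* Two steps: fill the intermediate list, then copy it into descriptors. */
static void build_via_sg(const struct model_seg *in, struct model_sg *sg,
			 struct model_desc *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		sg[i].addr = in[i].addr;
		sg[i].len = in[i].len;
		sg[i].dma_addr = model_dma_map(in[i].addr);
		sg[i].dma_len = in[i].len;
	}
	for (size_t i = 0; i < n; i++) {
		out[i].addr = sg[i].dma_addr;
		out[i].len = sg[i].dma_len;
	}
}

/* One step: map each segment straight into the descriptor. */
static void build_direct(const struct model_seg *in, struct model_desc *out,
			 size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i].addr = model_dma_map(in[i].addr);
		out[i].len = in[i].len;
	}
}
```

Both builders produce identical descriptors; only the first needs the
extra model_sg array.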

And even if we are still using the scatterlist as we do right now we'd
need a second scatterlist at least for block / file system based I/O,
as we can't plug the scatterlist into the I/O stack (never mind that,
due to splitting and merging, the lower one might not map 1:1 to the
upper one).

> [1] Can we ever admit that the bio_vec and the skb_frag_t are actually
> the same thing?

When I brought this up years ago the networking folks insisted that
their use of u16 offset/size fields was important for performance,
while for bio_vecs we needed the larger ones for some cases.  Since
then networking switched to 32-bit fields for what is now the fast
path, so it might be worth giving it another spin.
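
A rough illustration of why the field width was the sticking point,
again as a userspace model.  The struct and function names below
(model_bio_vec, model_frag16, model_frag32, frag32_to_bvec,
bvec_to_frag16) are hypothetical simplifications, not the kernel
definitions.  With 32-bit offset/size fields the two models carry the
same three values and convert losslessly; with u16 fields the
conversion has to be able to fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct page;	/* opaque stand-in, never dereferenced here */

/* Simplified model of a bio_vec: page plus 32-bit length and offset. */
struct model_bio_vec {
	struct page *bv_page;
	uint32_t bv_len;
	uint32_t bv_offset;
};

/* Simplified model of a network fragment with u16 offset/size fields. */
struct model_frag16 {
	struct page *page;
	uint16_t page_offset;
	uint16_t size;
};

/* Simplified model of a network fragment with 32-bit offset/size fields. */
struct model_frag32 {
	struct page *page;
	uint32_t page_offset;
	uint32_t size;
};

/* With 32-bit fields the two models have the same size. */
_Static_assert(sizeof(struct model_frag32) == sizeof(struct model_bio_vec),
	       "32-bit fragment and bio_vec models should match in size");

/* 32-bit fragment to bio_vec: always representable. */
static struct model_bio_vec frag32_to_bvec(struct model_frag32 f)
{
	struct model_bio_vec bv = {
		.bv_page = f.page,
		.bv_len = f.size,
		.bv_offset = f.page_offset,
	};
	return bv;
}

/* bio_vec to 16-bit fragment: returns -1 if a field does not fit in u16. */
static int bvec_to_frag16(const struct model_bio_vec *bv,
			  struct model_frag16 *out)
{
	if (bv->bv_len > UINT16_MAX || bv->bv_offset > UINT16_MAX)
		return -1;
	out->page = bv->bv_page;
	out->page_offset = (uint16_t)bv->bv_offset;
	out->size = (uint16_t)bv->bv_len;
	return 0;
}
```

For example, a 70000-byte segment fits the 32-bit models but is
rejected by bvec_to_frag16(), while a 4096-byte segment fits both.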

That should also help with using my new bio_vec based dma-mapping
helpers to batch iommu mappings in networking, which Jesper had on
his todo list as all the indirect calls are causing performance
issues.


Thread overview: 29+ messages
2018-09-28  5:39 [PATCH 0/4] get_user_pages*() and RDMA: first steps john.hubbard
2018-09-28  5:39 ` [PATCH 1/4] mm: get_user_pages: consolidate error handling john.hubbard
2018-09-28  5:39 ` [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call john.hubbard
2018-09-28 15:39   ` Jason Gunthorpe
2018-09-29  3:12     ` John Hubbard
2018-09-29 16:21       ` Matthew Wilcox
2018-09-29 19:19         ` Jason Gunthorpe
2018-10-01 12:50         ` Christoph Hellwig
2018-10-01 15:29           ` Matthew Wilcox
2018-10-01 15:51             ` Christoph Hellwig [this message]
2018-10-01 14:35       ` Dennis Dalessandro
2018-10-03  5:40         ` John Hubbard
2018-10-03 16:27       ` Jan Kara
2018-10-03 23:19         ` John Hubbard
2018-09-28  5:39 ` [PATCH 2/4] mm: introduce put_user_page(), placeholder version john.hubbard
2018-10-03 16:22   ` Jan Kara
2018-10-03 23:23     ` John Hubbard
2018-09-28  5:39 ` [PATCH 4/4] goldfish_pipe/mm: convert to the new release_user_pages() call john.hubbard
2018-09-28 15:29 ` [PATCH 0/4] get_user_pages*() and RDMA: first steps Jerome Glisse
2018-09-28 19:06   ` John Hubbard
2018-09-28 21:49     ` Jerome Glisse
2018-09-29  2:28       ` John Hubbard
2018-09-29  8:46         ` Jerome Glisse
2018-10-01  6:11           ` Dave Chinner
2018-10-01 12:47             ` Christoph Hellwig
2018-10-02  1:14               ` Dave Chinner
2018-10-03 16:21                 ` Jan Kara
2018-10-01 15:31             ` Jason Gunthorpe
2018-10-03 16:08           ` Jan Kara
