All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>, Chris Mason <clm@fb.com>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH 4/6] iomap: add struct iomap_ctx
Date: Tue, 17 Dec 2019 17:25:16 -0800	[thread overview]
Message-ID: <20191218012516.GA12752@magnolia> (raw)
In-Reply-To: <9941995e-19c5-507b-9339-b8d2cb568932@kernel.dk>

On Tue, Dec 17, 2019 at 05:15:46PM -0700, Jens Axboe wrote:
> On 12/17/19 1:26 PM, Linus Torvalds wrote:
> > On Tue, Dec 17, 2019 at 11:39 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> 'loff_t length' is not right.
> > 
> > Looking around, it does seem to get used that way. Too much, though.
> > 
> >>> +       loff_t pos = data->pos;
> >>> +       loff_t length = pos + data->len;
> >>
> >> And WTH is that? "pos + data->len" is not "length", that's end. And this:
> >>
> >>>         loff_t end = pos + length, done = 0;
> >>
> >> What? Now 'end' is 'pos+length', which is 'pos+pos+data->len'.
> > 
> > But this is unrelated to the crazy types. That just can't bve right.
> 
> Yeah, I fixed that one up, that was my error.
> 
> >> Is there some reason for this horrible case of "let's allow 64-bit sizes?"
> >>
> >> Because even if there is, it shouldn't be "loff_t". That's an
> >> _offset_. Not a length.
> > 
> > We do seem to have a lot of these across filesystems. And a lot of
> > confusion. Most of the IO reoutines clearly take or return a size_t
> > (returning ssize_t) as the IO size. And then you have the
> > zeroing/truncation stuff that tends to take loff_t. Which still smells
> > wrong, and s64 would look like a better case, but whatever.
> > 
> > The "iomap_zero_range() for truncate" case really does seem to need a
> > 64-bit value, because people do the difference of two loff_t's for it.
> > In fact, it almost looks like that function should take a "start ,
> > end" pair, which would make loff_t be the _right_ thing.

Yeah.  "loff_t length" always struck me as a little odd, but until now I
hadn't heard enough complaining about it to put any effort into fixing
the iomap_apply code that (afaict) mostly worked ok.  But it shouldn't
be a difficult change.

> > Because "length" really is just (a positive) size_t normally.

However, I don't think it's a good idea to reduce the @length argument
to size_t (and the iomap_apply return value to ssize_t) because they're
32-bit values and doing that will force iomap to clamp lengths and
return values to S32_MAX.  Instituting a ~2G max on read and write calls
is fine because those operate directly on file data (== slow), but the
vfs already clamps the length before the iov gets to iomap.

For the other iomap users that care more about the mappings and less
about the data in those mappings (seek hole, seek data, fiemap, swap) it
doesn't make much sense.  If the filesystem can send back a 100GB extent
map (e.g. holes in a sparse file, or we just have superstar allocation
strategies), the fs should send that straight to the iomap actor
function without having to cut that into 50x loop iterations.  Looking
ahead to things like file mapping leases, a (formerly wealthy) client
should be able to request a mmap lease on 100GB worth of pmem and get
the whole lease if the fs can allocate 100G at once.

I like the idea of making the length parameter and the return value
int64_t instead of loff_t.  Is int64_t the preferred typedef or s64?  I
forget.

> Honestly, I'd much rather leave the loff_t -> size_t/ssize_t to
> Darrick/Dave, it's really outside the scope of this patch, and I'd
> prefer not to have to muck with it. They probably feel the same way!

Don't forget Christoph.  Heh, we /did/ forget Christoph. :(
Maybe they have better historical context since they invented this iomap
mechanism for pnfs or something long before I came along.

--D

> -- 
> Jens Axboe
> 

  reply	other threads:[~2019-12-18  1:25 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-17 14:39 [PATCHSET v5 0/6] Support for RWF_UNCACHED Jens Axboe
2019-12-17 14:39 ` [PATCH 1/6] fs: add read support " Jens Axboe
2019-12-17 15:16   ` Guoqing Jiang
2019-12-17 16:42     ` Jens Axboe
2019-12-17 15:57   ` Matthew Wilcox
2019-12-17 16:41     ` Jens Axboe
2019-12-18  3:17   ` Dave Chinner
2019-12-17 14:39 ` [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb Jens Axboe
2019-12-17 14:39 ` [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
2019-12-17 14:39 ` [PATCH 4/6] iomap: add struct iomap_ctx Jens Axboe
2019-12-17 19:39   ` Linus Torvalds
2019-12-17 19:39     ` Linus Torvalds
2019-12-17 20:26     ` Linus Torvalds
2019-12-17 20:26       ` Linus Torvalds
2019-12-18  0:15       ` Jens Axboe
2019-12-18  1:25         ` Darrick J. Wong [this message]
2019-12-26  9:49   ` [iomap] 5c6f67c138: WARNING:at_fs/iomap/buffered-io.c:#iomap_readpages kernel test robot
2019-12-26  9:49     ` kernel test robot
2019-12-17 14:39 ` [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
2019-12-18  1:52   ` Darrick J. Wong
2019-12-18  3:18     ` Jens Axboe
2019-12-18  4:15       ` Darrick J. Wong
2019-12-17 14:39 ` [PATCH 6/6] xfs: don't do delayed allocations for uncached " Jens Axboe
2019-12-18  1:57   ` Darrick J. Wong
2019-12-18  1:57     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191218012516.GA12752@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.