All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Schoenebeck <linux_oss@crudebyte.com>
To: Al Viro <viro@zeniv.linux.org.uk>,
	Dominique Martinet <asmadeus@codewreck.org>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	David Howells <dhowells@redhat.com>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH 01/44] 9p: handling Rerror without copy_from_iter_full()
Date: Fri, 01 Jul 2022 18:02:31 +0200	[thread overview]
Message-ID: <6628265.VPEUYjqhpI@silver> (raw)
In-Reply-To: <Yr6TbVQvu+noSzc8@codewreck.org>

On Freitag, 1. Juli 2022 08:25:49 CEST Dominique Martinet wrote:
> (sigh, I'm tired -- said I'd add Christian in Ccs and promply forgot to
> do it. Sorry for double send to everyone else.)
> 
> +Christian Schoenebeck in Ccs as that concerns qemu as well.
> 
> The patch I'm replying to is at
> https://lkml.kernel.org/r/20220622041552.737754-1-viro@zeniv.linux.org.uk
> 
> Al Viro wrote on Wed, Jun 22, 2022 at 05:15:09AM +0100:
> >         p9_client_zc_rpc()/p9_check_zc_errors() are playing fast
> > 
> > and loose with copy_from_iter_full().
> > 
> > 	Reading from file is done by sending Tread request.  Response
> > 
> > consists of fixed-sized header (including the amount of data actually
> > read) followed by the data itself.
> > 
> > 	For zero-copy case we arrange the things so that the first
> > 
> > 11 bytes of reply go into the fixed-sized buffer, with the rest going
> > straight into the pages we want to read into.
> > 
> > 	What makes the things inconvenient is that sglist describing
> > 
> > what should go where has to be set *before* the reply arrives.  As
> > the result, if reply is an error, the things get interesting.  On success
> > we get
> > 
> > 	size[4] Rread tag[2] count[4] data[count]
> > 
> > For error layout varies depending upon the protocol variant -
> > in original 9P and 9P2000 it's
> > 
> > 	size[4] Rerror tag[2] len[2] error[len]
> > 
> > in 9P2000.U
> > 
> > 	size[4] Rerror tag[2] len[2] error[len] errno[4]
> > 
> > in 9P2000.L
> > 
> > 	size[4] Rlerror tag[2] errno[4]
> > 	
> > 	The last case is nice and simple - we have an 11-byte response
> > 
> > that fits into the fixed-sized buffer we hoped to get an Rread into.
> > In other two, though, we get a variable-length string spill into the
> > pages we'd prepared for the data to be read.
> > 
> > 	Had that been in fixed-sized buffer (which is actually 4K),
> > 
> > we would've dealt with that the same way we handle non-zerocopy case.
> > However, for zerocopy it doesn't end up there, so we need to copy it
> > from those pages.
> > 
> > 	The trouble is, by the time we get around to that, the
> > 
> > references to pages in question are already dropped.  As the result,
> > p9_zc_check_errors() tries to get the data using copy_from_iter_full().
> > Unfortunately, the iov_iter it's trying to read from might *NOT* be
> > capable of that.  It is, after all, a data destination, not data source.
> > In particular, if it's an ITER_PIPE one, copy_from_iter_full() will
> > simply fail.
> > 
> > 	In ->zc_request() itself we do have those pages and dealing with
> > 
> > the problem in there would be a simple matter of memcpy_from_page()
> > into the fixed-sized buffer.  Moreover, it isn't hard to recognize
> > the (rare) case when such copying is needed.  That way we get rid of
> > p9_zc_check_errors() entirely - p9_check_errors() can be used instead
> > both for zero-copy and non-zero-copy cases.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> I ran basic tests with this, should be ok given the code path is never
> used on normal (9p2000.L) workloads.

I haven't read this patch in detail yet, but upfront: POSIX error strings are 
like what, max. 128 bytes, no? So my expectation therefore would be that this 
patch could be further simplified.

Apart from that, I would rename handle_rerror() to something that reflects 
better what it actually does, e.g. unsparse_error() or cp_rerror_to_sdata().

> I also tried 9p2000.u for principle and ... I have no idea if this works
> but it didn't seem to blow up there at least.
> The problem is that 9p2000.u just doesn't work well even without these
> patches, so I still stand by what I said about 9p2000.u and virtio (zc
> interface): we really can (and I think should) just say virtio doesn't
> support 9p2000.u.
> (and could then further simplify this)
>
> If you're curious, 9p2000.u hangs without your patch on at least two
> different code paths (trying to read a huge buffer aborts sending a
> reply because msize is too small instead of clamping it, that one has a
> qemu warning message; but there are others ops like copyrange that just
> fail silently and I didn't investigate)

Last time I tested 9p2000.u was with the "remove msize limit" (WIP) patches:
https://lore.kernel.org/all/cover.1640870037.git.linux_oss@crudebyte.com/
Where I did not observe any issue with 9p2000.u.

What msize are we talking about, or can you tell a way to reproduce?

> I'd rather not fool someone into believing we support it when nobody has
> time to maintain it and it fails almost immediately when user requests
> some unusual IO patterns... And I definitely don't have time to even try
> fixing it.
> I'll suggest the same thing to qemu lists if we go that way.

Yeah, the situation with 9p2000.u in QEMU is similar in the sense that 
9p2000.u is barely used, little contributions, code not in good shape (e.g 
slower in many aspects in comparison to 9p2000.L), and for that reason I 
discussed with Greg to deprecate 9p2000.u in QEMU (not done yet). We are not 
aware about any serious issue with 9p2000.u though.

Best regards,
Christian Schoenebeck



  reply	other threads:[~2022-07-01 16:02 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22  4:10 [RFC][CFT][PATCHSET] iov_iter stuff Al Viro
2022-06-22  4:15 ` [PATCH 01/44] 9p: handling Rerror without copy_from_iter_full() Al Viro
2022-06-22  4:15   ` [PATCH 02/44] No need of likely/unlikely on calls of check_copy_size() Al Viro
2022-06-22  4:15   ` [PATCH 03/44] teach iomap_dio_rw() to suppress dsync Al Viro
2022-06-22  4:15   ` [PATCH 04/44] btrfs: use IOMAP_DIO_NOSYNC Al Viro
2022-06-22  4:15   ` [PATCH 05/44] struct file: use anonymous union member for rcuhead and llist Al Viro
2022-06-22  4:15   ` [PATCH 06/44] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
2022-06-22  4:15   ` [PATCH 07/44] keep iocb_flags() result cached in struct file Al Viro
2022-06-22  4:15   ` [PATCH 08/44] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
2022-06-27 18:31     ` Jeff Layton
2022-06-28 12:32     ` Christian Brauner
2022-06-28 18:36       ` Al Viro
2022-06-22  4:15   ` [PATCH 09/44] new iov_iter flavour - ITER_UBUF Al Viro
2022-06-27 18:47     ` Jeff Layton
2022-06-28 18:41       ` Al Viro
2022-06-28 12:38     ` Christian Brauner
2022-06-28 18:44       ` Al Viro
2022-07-28  9:55     ` [PATCH 9/44] " Alexander Gordeev
2022-07-29 17:21       ` Al Viro
2022-07-29 21:12         ` Alexander Gordeev
2022-07-30  0:03           ` Al Viro
2022-06-22  4:15   ` [PATCH 10/44] switch new_sync_{read,write}() to ITER_UBUF Al Viro
2022-06-22  4:15   ` [PATCH 11/44] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
2022-06-27 18:48     ` Jeff Layton
2022-06-28 12:40     ` Christian Brauner
2022-06-22  4:15   ` [PATCH 12/44] fix short copy handling in copy_mc_pipe_to_iter() Al Viro
2022-06-27 19:15     ` Jeff Layton
2022-06-28 12:42     ` Christian Brauner
2022-06-22  4:15   ` [PATCH 13/44] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
2022-06-27 19:17     ` Jeff Layton
2022-06-28 12:43     ` Christian Brauner
2022-06-22  4:15   ` [PATCH 14/44] ITER_PIPE: helper for getting pipe buffer by index Al Viro
2022-06-28 10:38     ` Jeff Layton
2022-06-28 12:45     ` Christian Brauner
2022-06-22  4:15   ` [PATCH 15/44] ITER_PIPE: helpers for adding pipe buffers Al Viro
2022-06-28 11:32     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 16/44] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives Al Viro
2022-06-22  4:15   ` [PATCH 17/44] ITER_PIPE: fold push_pipe() into __pipe_get_pages() Al Viro
2022-06-22  4:15   ` [PATCH 18/44] ITER_PIPE: lose iter_head argument of __pipe_get_pages() Al Viro
2022-06-22  4:15   ` [PATCH 19/44] ITER_PIPE: clean pipe_advance() up Al Viro
2022-06-22  4:15   ` [PATCH 20/44] ITER_PIPE: clean iov_iter_revert() Al Viro
2022-06-22  4:15   ` [PATCH 21/44] ITER_PIPE: cache the type of last buffer Al Viro
2022-06-22  4:15   ` [PATCH 22/44] ITER_PIPE: fold data_start() and pipe_space_for_user() together Al Viro
2022-06-22  4:15   ` [PATCH 23/44] iov_iter_get_pages{,_alloc}(): cap the maxsize with MAX_RW_COUNT Al Viro
2022-06-28 11:41     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 24/44] iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper Al Viro
2022-06-28 11:45     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 25/44] iov_iter_get_pages(): sanity-check arguments Al Viro
2022-06-28 11:47     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 26/44] unify pipe_get_pages() and pipe_get_pages_alloc() Al Viro
2022-06-28 11:49     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 27/44] unify xarray_get_pages() and xarray_get_pages_alloc() Al Viro
2022-06-28 11:50     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 28/44] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts Al Viro
2022-06-28 11:54     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 29/44] ITER_XARRAY: don't open-code DIV_ROUND_UP() Al Viro
2022-06-28 11:54     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 30/44] iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() Al Viro
2022-06-28 11:56     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 31/44] iov_iter: first_{iovec,bvec}_segment() - simplify a bit Al Viro
2022-06-28 11:58     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 32/44] iov_iter: massage calling conventions for first_{iovec,bvec}_segment() Al Viro
2022-06-28 12:06     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 33/44] found_iovec_segment(): just return address Al Viro
2022-06-28 12:09     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 34/44] fold __pipe_get_pages() into pipe_get_pages() Al Viro
2022-06-28 12:11     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 35/44] iov_iter: saner helper for page array allocation Al Viro
2022-06-28 12:12     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 36/44] iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Al Viro
2022-06-28 12:13     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 37/44] block: convert to " Al Viro
2022-06-28 12:16     ` Jeff Layton
2022-06-30 22:11     ` [block.git conflicts] " Al Viro
2022-06-30 22:39       ` Al Viro
2022-07-01  2:07         ` Keith Busch
2022-07-01 17:40           ` Al Viro
2022-07-01 17:53             ` Keith Busch
2022-07-01 18:07               ` Al Viro
2022-07-01 18:12                 ` Al Viro
2022-07-01 18:38                   ` Keith Busch
2022-07-01 19:08                     ` Al Viro
2022-07-01 19:28                       ` Keith Busch
2022-07-01 19:43                         ` Al Viro
2022-07-01 19:56                           ` Keith Busch
2022-07-02  5:35                             ` Al Viro
2022-07-02 21:02                               ` Keith Busch
2022-07-01 19:05                 ` Keith Busch
2022-07-01 21:30             ` Jens Axboe
2022-06-30 23:07       ` Jens Axboe
2022-07-10 18:04     ` Sedat Dilek
2022-06-22  4:15   ` [PATCH 38/44] iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() Al Viro
2022-06-28 12:18     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 39/44] af_alg_make_sg(): " Al Viro
2022-06-28 12:18     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 40/44] 9p: convert to advancing variant of iov_iter_get_pages_alloc() Al Viro
2022-07-01  9:01     ` Dominique Martinet
2022-07-01 13:47     ` Christian Schoenebeck
2022-07-06 22:06       ` Christian Schoenebeck
2022-06-22  4:15   ` [PATCH 41/44] ceph: switch the last caller " Al Viro
2022-06-28 12:20     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 42/44] get rid of non-advancing variants Al Viro
2022-06-28 12:21     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 43/44] pipe_get_pages(): switch to append_pipe() Al Viro
2022-06-28 12:23     ` Jeff Layton
2022-06-22  4:15   ` [PATCH 44/44] expand those iov_iter_advance() Al Viro
2022-06-28 12:23     ` Jeff Layton
2022-07-01  6:21   ` [PATCH 01/44] 9p: handling Rerror without copy_from_iter_full() Dominique Martinet
2022-07-01  6:25   ` Dominique Martinet
2022-07-01 16:02     ` Christian Schoenebeck [this message]
2022-07-01 21:00       ` Dominique Martinet
2022-07-03 13:30         ` Christian Schoenebeck
2022-08-01 12:42   ` [PATCH 09/44] new iov_iter flavour - ITER_UBUF David Howells
2022-08-01 21:14     ` Al Viro
2022-08-01 22:54     ` David Howells
2022-06-23 15:21 ` [RFC][CFT][PATCHSET] iov_iter stuff David Howells
2022-06-23 20:32   ` Al Viro
2022-06-28 12:25 ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6628265.VPEUYjqhpI@silver \
    --to=linux_oss@crudebyte.com \
    --cc=asmadeus@codewreck.org \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=dhowells@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.