linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Henriques <lhenriques@suse.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Gregory Farnum <gfarnum@redhat.com>, Sage Weil <sage@redhat.com>,
	Ilya Dryomov <idryomov@gmail.com>, "Yan, Zheng" <zyan@redhat.com>,
	ceph-devel <ceph-devel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v4] ceph: use 'copy-from2' operation in copy_file_range
Date: Tue, 14 Jan 2020 15:09:02 +0000	[thread overview]
Message-ID: <20200114150902.GA28556@brahms.Home> (raw)
In-Reply-To: <46c92e6678906fa065b18e418044647e7cdb47e1.camel@kernel.org>

On Tue, Jan 14, 2020 at 06:57:56AM -0500, Jeff Layton wrote:
> On Tue, 2020-01-14 at 09:55 +0000, Luis Henriques wrote:
> > On Mon, Jan 13, 2020 at 09:10:01AM -0800, Gregory Farnum wrote:
> > > On Thu, Jan 9, 2020 at 5:06 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > On Wed, 2020-01-08 at 10:03 +0000, Luis Henriques wrote:
> > > > > Instead of using the 'copy-from' operation, switch copy_file_range to the
> > > > > new 'copy-from2' operation, which allows to send the truncate_seq and
> > > > > truncate_size parameters.
> > > > > 
> > > > > If an OSD does not support the 'copy-from2' operation it will return
> > > > > -EOPNOTSUPP.  In that case, the kernel client will stop trying to do
> > > > > remote object copies for this fs client and will always use the generic
> > > > > VFS copy_file_range.
> > > > > 
> > > > > Signed-off-by: Luis Henriques <lhenriques@suse.com>
> > > > > ---
> > > > > Hi Jeff,
> > > > > 
> > > > > This is a follow-up to the discussion in [1].  Since PR [2] has been
> > > > > merged, it's now time to change the kernel client to use the new
> > > > > 'copy-from2'.  And that's what this patch does.
> > > > > 
> > > > > [1] https://lore.kernel.org/lkml/20191118120935.7013-1-lhenriques@suse.com/
> > > > > [2] https://github.com/ceph/ceph/pull/31728
> > > > > 
> > > > >  fs/ceph/file.c                  | 13 ++++++++++++-
> > > > >  fs/ceph/super.c                 |  1 +
> > > > >  fs/ceph/super.h                 |  3 +++
> > > > >  include/linux/ceph/osd_client.h |  1 +
> > > > >  include/linux/ceph/rados.h      |  2 ++
> > > > >  net/ceph/osd_client.c           | 18 ++++++++++++------
> > > > >  6 files changed, 31 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > > > index 11929d2bb594..1e6cdf2dfe90 100644
> > > > > --- a/fs/ceph/file.c
> > > > > +++ b/fs/ceph/file.c
> > > > > @@ -1974,6 +1974,10 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> > > > >       if (ceph_test_mount_opt(src_fsc, NOCOPYFROM))
> > > > >               return -EOPNOTSUPP;
> > > > > 
> > > > > +     /* Do the OSDs support the 'copy-from2' operation? */
> > > > > +     if (!src_fsc->have_copy_from2)
> > > > > +             return -EOPNOTSUPP;
> > > > > +
> > > > >       /*
> > > > >        * Striped file layouts require that we copy partial objects, but the
> > > > >        * OSD copy-from operation only supports full-object copies.  Limit
> > > > > @@ -2101,8 +2105,15 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
> > > > >                       CEPH_OSD_OP_FLAG_FADVISE_NOCACHE,
> > > > >                       &dst_oid, &dst_oloc,
> > > > >                       CEPH_OSD_OP_FLAG_FADVISE_SEQUENTIAL |
> > > > > -                     CEPH_OSD_OP_FLAG_FADVISE_DONTNEED, 0);
> > > > > +                     CEPH_OSD_OP_FLAG_FADVISE_DONTNEED,
> > > > > +                     dst_ci->i_truncate_seq, dst_ci->i_truncate_size,
> > > > > +                     CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ);
> > > > >               if (err) {
> > > > > +                     if (err == -EOPNOTSUPP) {
> > > > > +                             src_fsc->have_copy_from2 = false;
> > > > > +                             pr_notice("OSDs don't support 'copy-from2'; "
> > > > > +                                       "disabling copy_file_range\n");
> > > > > +                     }
> > > > >                       dout("ceph_osdc_copy_from returned %d\n", err);
> > > > >                       if (!ret)
> > > > >                               ret = err;
> > > > 
> > > > The patch itself looks fine to me. I'll not merge yet, since you sent it
> > > > as an RFC, but I don't have any objection to it at first glance. The
> > > > only other comment I'd make is that you should probably split this into
> > > > two patches -- one for the libceph changes and one for cephfs.
> > > > 
> > > > On a related note, I wonder if we'd get better performance out of large
> > > > copy_file_range calls here if you were to move the wait for all of these
> > > > osd requests after issuing them all in parallel?
> > > > 
> > > > Currently we're doing:
> > > > 
> > > > copy_from
> > > > wait
> > > > copy_from
> > > > wait
> > > > 
> > > > ...but figure that the second copy_from might very well be between osds
> > > > that are not involved in the first copy. There's no reason to do them
> > > > sequentially. It'd be better to issue all of the OSD requests first, and
> > > > then wait on all of the replies in turn:
> > > 
> > > If this is added (good idea in general) it should be throttled — we
> > > don’t want users accidentally trying to copy a 1TB file and setting
> > > off 250000 simultaneous copy_from2 requests!
> > 
> > Good point, thanks for the input Greg.  I'll have this in consideration.
> > That'll probably require another kernel module knob for setting this
> > throttling value.
> > 
> > 
> 
> Yes, we probably do need some sort of limit here. It'd be nice to avoid
> adding new knobs for it though. Maybe we could make this value some
> multiple of min(rsize,wsize) ?

Yeah, that's probably a reasonable approach.  I'll see what I can come up
with, once I get to it.  First, /me wants to find out if this will
_really_ improve performance.

Cheers,
--
Luís

      reply	other threads:[~2020-01-14 15:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-08 10:03 [RFC PATCH v4] ceph: use 'copy-from2' operation in copy_file_range Luis Henriques
2020-01-09 13:06 ` Jeff Layton
2020-01-09 14:30   ` Luis Henriques
2020-01-09 14:53     ` Ilya Dryomov
2020-01-09 16:05       ` Jeff Layton
2020-01-09 16:28         ` Luis Henriques
2020-01-13 17:10   ` Gregory Farnum
2020-01-14  9:55     ` Luis Henriques
2020-01-14 11:57       ` Jeff Layton
2020-01-14 15:09         ` Luis Henriques [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200114150902.GA28556@brahms.Home \
    --to=lhenriques@suse.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=gfarnum@redhat.com \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sage@redhat.com \
    --cc=zyan@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).