From: Luis Henriques <lhenriques@suse.com>
To: Jeff Layton <jlayton@kernel.org>, Sage Weil <sage@redhat.com>,
	Ilya Dryomov <idryomov@gmail.com>, "Yan, Zheng" <zyan@redhat.com>,
	Gregory Farnum <gfarnum@redhat.com>
Cc: ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Luis Henriques <lhenriques@suse.com>
Subject: [PATCH v3 0/1] parallel 'copy-from' Ops in copy_file_range
Date: Mon,  3 Feb 2020 16:51:16 +0000
Message-ID: <20200203165117.5701-1-lhenriques@suse.com>

Hi,

Here's another re-spin of the patch to improve copy_file_range performance
by parallelizing the OSD requests.

Changelog since v2:

- Refactored the copy loop to ensure 'ret' is updated with the number of
  bytes already copied only after ceph_osdc_wait_requests() returns.  This
  meant changing that helper to also return the number of successful
  requests (see the sketch after the changelogs).  Error handling was also
  fixed.

- Use a counter for the number of copies instead of re-calculating ncopies
  (now renamed to copy_count) every time it reaches zero.

- Dropped 'dout' in ceph_osdc_wait_requests

- Keep the ceph_osdc_wait_requests and ceph_osdc_wait_request functions
  close together

Changelog since v1 (the RFC):

- Dropped synchronous version of ceph_osdc_copy_from
  * This was the reason for merging the patchset into a single patch, as
    changing ceph_osdc_copy_from definition would break bisectability

- Moved wait_copy_from_reqs into libceph, renaming it to
  ceph_osdc_wait_requests

- Use ->r_private_item instead of adding a new list_head into struct
  ceph_osd_request

- Dropped the module parameter (used for testing) and added the throttling
  mechanism using the formula suggested by Ilya.
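
To make the batching easier to picture, here's a rough sketch of the shape
of the copy loop described above.  This is illustrative only:
submit_copy_from() is a made-up stand-in for the asynchronous 'copy-from'
submission, and the ceph_osdc_wait_requests() signature used here is an
assumption, not necessarily the one in the patch.

/*
 * Queue up to copy_count asynchronous 'copy-from' requests on a list
 * (via ->r_private_item), wait for the whole batch, and only then add
 * the successfully copied bytes to 'ret'.  copy_count == 0 would mean
 * "send everything and wait only at the end".
 */
static ssize_t copy_objects_batched(struct file *src, struct file *dst,
                                    loff_t src_off, loff_t dst_off,
                                    u64 object_size, u64 len, u32 copy_count)
{
        LIST_HEAD(osd_reqs);
        struct ceph_osd_request *req;
        ssize_t ret = 0;
        u32 pending = 0, done = 0;
        int err = 0;

        while (len >= object_size) {
                req = submit_copy_from(src, dst, src_off, dst_off, object_size);
                if (IS_ERR(req)) {
                        err = PTR_ERR(req);
                        break;
                }
                list_add_tail(&req->r_private_item, &osd_reqs);
                src_off += object_size;
                dst_off += object_size;
                len -= object_size;

                if (++pending == copy_count) {
                        /* wait for the batch; 'done' = successful requests */
                        err = ceph_osdc_wait_requests(&osd_reqs, &done);
                        ret += (ssize_t)(done * object_size);
                        pending = 0;
                        if (err)
                                break;
                }
        }
        if (pending) {
                /* drain whatever is still in flight */
                err = ceph_osdc_wait_requests(&osd_reqs, &done);
                ret += (ssize_t)(done * object_size);
        }
        return ret ? ret : err;
}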

For reference, here's the original RFC cover letter (fixed typo in results
table: 'throttle=5' should be 'throttle=0').

--------------------------------------------------------------------------

As discussed here[1] I'm sending an RFC patchset that does the
parallelization of the requests sent to the OSDs during a copy_file_range
syscall in CephFS.

  [1] https://lore.kernel.org/lkml/20200108100353.23770-1-lhenriques@suse.com/

I also have some performance numbers that I wanted to share.  Here's a
description of the very simple tests I ran:

 - create a file with 200 objects in it
   * i.e. tests with different object sizes mean different file sizes
 - drop all caches and umount the filesystem
 - Measure:
   * mount filesystem
   * full file copy (with copy_file_range)
   * umount filesystem

Tests were repeated several times and the average value was used for
comparison.
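
For reference, the "full file copy" step boils down to driving the
copy_file_range() syscall in a loop from user space until the whole file
has been copied.  A minimal sketch (purely illustrative, not the actual
test script; paths come from argv, glibc >= 2.27):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        struct stat st;
        int src, dst;

        if (argc != 3)
                return 1;
        src = open(argv[1], O_RDONLY);
        dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
                perror("setup");
                return 1;
        }
        while (st.st_size > 0) {
                /* NULL offsets: the kernel advances both file positions */
                ssize_t n = copy_file_range(src, NULL, dst, NULL, st.st_size, 0);
                if (n <= 0) {
                        perror("copy_file_range");
                        return 1;
                }
                st.st_size -= n;
        }
        close(src);
        close(dst);
        return 0;
}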

  DISCLAIMER:
  These numbers are only indicative, and different clusters and client
  configs will certainly show different performance!  More rigorous tests
  would be required to validate these results.

Having as baseline a full read+write (basically, a copy_file_range
operation within a filesystem mounted without the 'copyfrom' option),
here are some values for different object sizes:

			  8M	  4M	  1M	  65k
read+write		100%	100%	100%	 100%
sequential		 51%	 52%	 83%	>100%
parallel (throttle=1)	 51%	 52%	 83%	>100%
parallel (throttle=0)	 17%	 17%	 83%	>100%

Notes:

- 'parallel (throttle=0)' was a test where *all* the requests (i.e. 200
  requests to copy the 200 objects in the file) were sent to the OSDs, and
  the wait for their completion was done only at the end.

- 'parallel (throttle=1)' was just a control test, where the wait for
  completion is done immediately after a request is sent.  It was expected
  to be very similar to the non-optimized ('sequential') tests.

- These tests were executed on a cluster with 40 OSDs, spread across 5
  (bare-metal) nodes.

- The tests with object size of 65k show that copy_file_range definitely
  doesn't scale to files with small object sizes.  '> 100%' actually means
  more than 10x slower.

Measuring the mount+copy+umount masks the actual difference between
different throttle values due to the time spent in mount+umount.  Thus,
there was no real difference between throttle=0 (send all and wait) and
throttle=20 (send 20, wait, send 20, ...).  But here's what I observed
when measuring only the copy operation (4M object size):

read+write		100%
parallel (throttle=1)	 56%
parallel (throttle=5)	 23%
parallel (throttle=10)	 14%
parallel (throttle=20)	  9%
parallel (throttle=0)	  5%

Anyway, I'll still need to revisit patch 0003 as it doesn't follow Jeff's
suggestion to *not* add another knob to fine-tune the throttle value --
this patch adds a kernel parameter for a knob that I wanted to use in my
testing to try out different values of this throttle limit.

The goal is probably to drop this patch and do the throttling in patch
0002.  I just need to come up with a decent heuristic.  Jeff's suggestion
was to use rsize/wsize, which are set to 64M by default IIRC.  Somehow I
feel that it should instead be related to the number of OSDs in the
cluster, but I'm not sure how.  And testing this sort of heuristic would
require different clusters, which aren't particularly easy to get.
Anyway, comments are welcome!
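
To make the rsize/wsize idea a bit more concrete, the heuristic could end
up being something as simple as the sketch below.  Purely illustrative --
this is *not* what the patch currently does, and copy_from_throttle() is a
made-up name:

/* needs <linux/math64.h> for div64_u64() */
static u32 copy_from_throttle(u64 wsize, u64 object_size)
{
        /* e.g. default wsize of 64M and 4M objects -> 16 requests in flight */
        u64 count = div64_u64(wsize, object_size);

        return count ? (u32)count : 1;
}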

Cheers,
--
Luis

Luis Henriques (1):
  ceph: parallelize all copy-from requests in copy_file_range

 fs/ceph/file.c                  | 45 +++++++++++++++++++++-----
 include/linux/ceph/osd_client.h |  6 +++-
 net/ceph/osd_client.c           | 56 +++++++++++++++++++++++++--------
 3 files changed, 85 insertions(+), 22 deletions(-)


Thread overview: 14+ messages
2020-02-03 16:51 Luis Henriques [this message]
2020-02-03 16:51 ` [PATCH v3 1/1] ceph: parallelize all copy-from requests in copy_file_range Luis Henriques
2020-02-04 10:56   ` Ilya Dryomov
2020-02-04 15:11     ` Luis Henriques
2020-02-04 18:06       ` Ilya Dryomov
2020-02-05 11:00         ` Luis Henriques
2020-02-05 12:01           ` Ilya Dryomov
2020-02-05 13:12             ` Jeff Layton
2020-02-05 19:41               ` Luis Henriques
2020-02-04 13:30   ` Jeff Layton
2020-02-04 15:30     ` Luis Henriques
2020-02-04 19:42       ` Jeff Layton
2020-02-04 20:30         ` Gregory Farnum
2020-02-04 21:39         ` Ilya Dryomov
