LKML Archive on lore.kernel.org
From: Nikos Tsironis <ntsironis@arrikto.com>
To: snitzer@redhat.com, agk@redhat.com, dm-devel@redhat.com
Cc: mpatocka@redhat.com, paulmck@linux.ibm.com, hch@infradead.org,
	iliastsi@arrikto.com, linux-kernel@vger.kernel.org
Subject: [PATCH v3 0/6] dm snapshot: Improve performance using a more fine-grained locking scheme
Date: Sun, 17 Mar 2019 14:22:52 +0200
Message-ID: <20190317122258.21760-1-ntsironis@arrikto.com>

dm-snapshot uses a single mutex to serialize every access to the
snapshot state, including accesses to the exception hash tables. This
mutex is a bottleneck that prevents dm-snapshot from scaling as the
number of threads doing IO increases.

The major contention points are __origin_write()/snapshot_map() and
pending_complete(), i.e., the submission and completion of pending
exceptions.

This patchset substitutes the single mutex with:

  * A read-write semaphore, which protects the mostly-read fields of
    the snapshot structure.

  * Per-bucket bit spinlocks, which protect accesses to the exception
    hash tables.
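
To illustrate the second point, here is a minimal userspace sketch of a
hash table with per-bucket bit locks. It is not the dm-snapshot code:
it uses C11 atomics in place of the kernel's bit spinlock primitives,
and struct exception, struct bucket, NBUCKETS, LOCK_BIT and the
table_*() functions are hypothetical names chosen for illustration. The
read-write semaphore is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64              /* hypothetical size; must be a power of two */
#define LOCK_BIT ((uintptr_t)1)  /* bit 0 of the head word is the lock */

struct exception {               /* hypothetical old -> new chunk mapping */
	struct exception *next;
	uint64_t old_chunk;
	uint64_t new_chunk;
};

/* One word per bucket: first-node pointer with the lock in bit 0. */
struct bucket {
	_Atomic uintptr_t head;
};

static void bucket_lock(struct bucket *b)
{
	/* Spin until this caller is the one that sets the lock bit. */
	while (atomic_fetch_or_explicit(&b->head, LOCK_BIT,
					memory_order_acquire) & LOCK_BIT)
		;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_fetch_and_explicit(&b->head, ~LOCK_BIT, memory_order_release);
}

/* Must be called with the bucket locked. */
static struct exception *bucket_first(struct bucket *b)
{
	uintptr_t v = atomic_load_explicit(&b->head, memory_order_relaxed);

	return (struct exception *)(v & ~LOCK_BIT);
}

static void table_init(struct bucket *table)
{
	for (size_t i = 0; i < NBUCKETS; i++)
		atomic_init(&table[i].head, 0);
}

static void table_insert(struct bucket *table, struct exception *e)
{
	struct bucket *b = &table[e->old_chunk & (NBUCKETS - 1)];

	/* The node address must leave bit 0 free for the lock. */
	assert(((uintptr_t)e & LOCK_BIT) == 0);

	bucket_lock(b);
	e->next = bucket_first(b);
	/* Keep the lock bit set while publishing the new first node. */
	atomic_store_explicit(&b->head, (uintptr_t)e | LOCK_BIT,
			      memory_order_relaxed);
	bucket_unlock(b);
}

static bool table_lookup(struct bucket *table, uint64_t old_chunk,
			 uint64_t *new_chunk)
{
	struct bucket *b = &table[old_chunk & (NBUCKETS - 1)];
	bool found = false;

	bucket_lock(b);
	for (struct exception *e = bucket_first(b); e; e = e->next) {
		if (e->old_chunk == old_chunk) {
			*new_chunk = e->new_chunk;
			found = true;
			break;
		}
	}
	bucket_unlock(b);
	return found;
}
```

Threads that hash to different buckets never touch the same lock word,
so they do not serialize against each other the way they would on one
table-wide mutex. The lock costs no extra memory because it lives in
the otherwise unused low bit of the bucket head pointer.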

fio benchmarks using the null_blk device show significant performance
improvements as the number of worker processes increases. Write latency
is almost halved and write IOPS are nearly doubled.

The relevant patch provides detailed benchmark results.

A summary of the patchset follows:

  1. The first patch removes an unnecessary use of WRITE_ONCE() in
     hlist_add_behind().

  2. The second patch adds two helper functions to linux/list_bl.h,
     which are used to implement the per-bucket bit spinlocks in
     dm-snapshot.

  3. The third patch removes the need to sleep while holding the
     snapshot lock in pending_complete(), thus allowing us to replace
     the mutex with the per-bucket bit spinlocks.

  4. Patches 4, 5 and 6 change the locking scheme, as described
     previously.
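
For readers unfamiliar with the list helpers touched by patches 1 and
2, the following userspace sketch shows what "add behind" and "add
before" mean for an hlist-style list, where each node stores a pointer
to the previous node's next field. It assumes plain (non-annotated)
stores throughout and does not model the lock bit that the list_bl
variants keep in the head pointer. struct hnode, struct hhead and the
hl_*() functions are hypothetical names, not the kernel API.

```c
#include <stddef.h>

struct hnode {
	struct hnode *next;
	struct hnode **pprev;  /* address of the pointer that points at us */
};

struct hhead {
	struct hnode *first;
};

/* Insert n at the front of the list. */
static void hl_add_head(struct hhead *h, struct hnode *n)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Insert n immediately after prev; the list head is not touched. */
static void hl_add_behind(struct hnode *n, struct hnode *prev)
{
	n->next = prev->next;
	prev->next = n;
	n->pprev = &prev->next;
	if (n->next)
		n->next->pprev = &n->next;
}

/* Insert n immediately before next. */
static void hl_add_before(struct hnode *n, struct hnode *next)
{
	n->pprev = next->pprev;
	n->next = next;
	next->pprev = &n->next;
	*n->pprev = n;  /* updates either the head or the previous node */
}
```

With helpers like these a caller can place a new entry at a chosen
position inside a bucket, for example to keep the bucket ordered,
instead of always inserting at the head.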

Changes in v3:
  - Don't use WRITE_ONCE() in hlist_bl_add_behind(), as it's not needed.
  - Fix hlist_add_behind() to also not use WRITE_ONCE().
  - Use uintptr_t instead of unsigned long in hlist_bl_add_before().

v2: https://www.redhat.com/archives/dm-devel/2019-March/msg00007.html

Changes in v2:
  - Split third patch of v1 into three patches: 3/5, 4/5, 5/5.

v1: https://www.redhat.com/archives/dm-devel/2018-December/msg00161.html

Nikos Tsironis (6):
  list: Don't use WRITE_ONCE() in hlist_add_behind()
  list_bl: Add hlist_bl_add_before/behind helpers
  dm snapshot: Don't sleep holding the snapshot lock
  dm snapshot: Replace mutex with rw semaphore
  dm snapshot: Make exception tables scalable
  dm snapshot: Use fine-grained locking scheme

 drivers/md/dm-exception-store.h |   3 +-
 drivers/md/dm-snap.c            | 359 +++++++++++++++++++++++++++-------------
 include/linux/list.h            |   2 +-
 include/linux/list_bl.h         |  26 +++
 4 files changed, 269 insertions(+), 121 deletions(-)

-- 
2.11.0

Thread overview: 11+ messages
2019-03-17 12:22 Nikos Tsironis [this message]
2019-03-17 12:22 ` [PATCH v3 1/6] list: Don't use WRITE_ONCE() in hlist_add_behind() Nikos Tsironis
2019-03-18 15:39   ` Paul E. McKenney
2019-03-17 12:22 ` [PATCH v3 2/6] list_bl: Add hlist_bl_add_before/behind helpers Nikos Tsironis
2019-03-18 15:41   ` Paul E. McKenney
2019-03-20 17:44     ` Nikos Tsironis
2019-03-17 12:22 ` [PATCH v3 3/6] dm snapshot: Don't sleep holding the snapshot lock Nikos Tsironis
2019-03-17 12:22 ` [PATCH v3 4/6] dm snapshot: Replace mutex with rw semaphore Nikos Tsironis
2019-03-17 12:22 ` [PATCH v3 5/6] dm snapshot: Make exception tables scalable Nikos Tsironis
2019-03-17 12:22 ` [PATCH v3 6/6] dm snapshot: Use fine-grained locking scheme Nikos Tsironis
2019-03-20 19:06 ` [PATCH v3 0/6] dm snapshot: Improve performance using a more " Mikulas Patocka