linux-bcache.vger.kernel.org archive mirror
From: Coly Li <colyli@suse.de>
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.01.org,
	axboe@kernel.dk, jianpeng.ma@intel.com, qiaowei.ren@intel.com,
	hare@suse.com, jack@suse.cz, dan.j.williams@intel.com,
	Coly Li <colyli@suse.de>
Subject: [PATCH v7 00/16] bcache: support NVDIMM for journaling
Date: Sat, 10 Apr 2021 00:43:27 +0800
Message-ID: <20210409164343.56828-1-colyli@suse.de>

This is the 7th version of the series adding NVDIMM support for bcache
journaling since the first nvm-pages series was posted.

This series is a combination of the v7 nvm-pages allocator developed by
Intel developers and the related bcache changes from me.

The nvm-pages allocator is a buddy-like allocator, which allocates
power-of-2 numbers of pages from the NVDIMM namespace. The user space
tool 'bcache' has a newly added '-M' option to format an NVDIMM namespace
and register it via the sysfs interface as a bcache meta device. The
nvm-pages kernel code does a DAX mapping to map the whole namespace into
the system's memory address range, and allocates pages to requesters
like a typical buddy allocator does. The major difference is that the
nvm-pages allocator tracks the pages allocated to each requester in an
owner list, which is stored on the NVDIMM too. The owner list of each
requester is identified by a pre-defined UUID. All the pages tracked in
any owner list are treated as allocated busy pages and won't be
initialized into the buddy system after the system reboots.
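
To illustrate the power-of-2 arithmetic a buddy-style allocator relies
on, here is a minimal user space sketch. It is not code from this
series: the helper names pages_to_order() and buddy_pgoff() are
hypothetical, and page offsets are assumed to be counted in pages from
the start of the namespace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest order such that (1 << order) pages
 * cover nr_pages. A buddy allocator rounds each request up like this.
 */
static unsigned int pages_to_order(uint64_t nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages > 0);
	while (((uint64_t)1 << order) < nr_pages)
		order++;
	return order;
}

/*
 * Hypothetical helper: page offset of the buddy of the block that
 * starts at pgoff and spans (1 << order) pages. The block must be
 * aligned to its own size; its buddy differs only in bit 'order'.
 */
static uint64_t buddy_pgoff(uint64_t pgoff, unsigned int order)
{
	assert((pgoff & (((uint64_t)1 << order) - 1)) == 0);
	return pgoff ^ ((uint64_t)1 << order);
}
```

For example, assuming 4KB pages, a 256MB range is 65536 pages, so
pages_to_order(65536) yields order 16, and buddy_pgoff(0, 16) yields
65536, the block that would be merged with it when both are free.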

The bcache journal code may request a power-of-2 sized block of pages
from the nvm-pages allocator; normally it is a contiguous range of 256MB
or 512MB. During meta data journaling, the in-memory jsets are copied to
the calculated NVDIMM page locations by the kernel memcpy routine. So
the journaling I/Os no longer go to the block device (e.g. SSD); the
writes and reads of journal jsets happen on the NVDIMM.
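
The idea of journaling by memory copy rather than by block I/O can be
sketched in user space as follows. This is only an illustration under
stated assumptions, not the code of this series: struct
nvm_journal_area and journal_area_write() are hypothetical names, an
ordinary buffer stands in for the DAX mapped range, and durability
concerns such as cache flushing are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of the page range reserved for the journal. */
struct nvm_journal_area {
	unsigned char *base;	/* start of the mapped range */
	size_t size;		/* length of the range in bytes */
	size_t write_off;	/* next free byte, always <= size */
};

/*
 * Copy one serialized journal set into the area with memcpy().
 * Returns the offset it was written at, or (size_t)-1 if it does not
 * fit, in which case nothing is modified.
 */
static size_t journal_area_write(struct nvm_journal_area *area,
				 const void *jset, size_t len)
{
	size_t off;

	assert(area->write_off <= area->size);
	if (len > area->size - area->write_off)
		return (size_t)-1;

	off = area->write_off;
	memcpy(area->base + off, jset, len);
	area->write_off += len;
	return off;
}
```

Reading a jset back for replay is then a plain memory read at
base + offset, which is why no bio has to be submitted to a block
device on this path.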

The nvm-pages on-NVDIMM data structures are defined as legacy in-memory
objects, because they ARE in-memory objects directly referenced by
linear addresses, both in system DRAM and NVDIMM. They are defined in
the following patch,
- bcache: add initial data structures for nvm pages

Intel developers Jianpeng Ma and Qiaowei Ren composed the initial code of
nvm-pages; the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvm_alloc_pages() of the buddy
- bcache: bch_nvm_free_pages() of the buddy
- bcache: get allocated pages from specific owner
All the code depends on the Linux libnvdimm and dax drivers; the bcache
nvm-pages allocator can be treated as a user of these two drivers.

I modified the bcache code to recognize the nvm meta device feature,
initialize the journal on NVDIMM, and do journal I/Os on NVDIMM in the
following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device

Also, during code integration and testing, some issues were fixed by
the following patches,
- bcache: nvm-pages fixes for bcache integration testing
- bcache: use div_u64() in init_owner_info()
- bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig
- bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled
The above patches can be added or merged into the nvm-pages code, so that
they can be dropped in the next version of this series.

The current series works as expected. Of course it is not perfect, but
its state is fine as a code base for further improvement, for example
power failure tolerance for nvm-pages owner list operations, more error
handling in the journal code, and moving the B+ tree node I/Os onto
NVDIMM.

All the code is EXPERIMENTAL; it won't be enabled by default until we
feel the NVDIMM support is complete and stable.

Any comments and suggestions are warmly welcome :-)

Thank you in advance.

Coly Li

---
Changelog:
v7: Refine the nvm-pages allocator code to operate on the owner list
    directly in dax mapped NVDIMM pages, and remove the meta data copy
    from DRAM.
v6: The series was submitted but not merged in the Linux 5.12 merge
    window.
v1-v5: RFC patches of bcache nvm-pages.


Coly Li (11):
  bcache: add initial data structures for nvm pages
  bcache: nvm-pages fixes for bcache integration testing
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
    in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
    meta device
  bcache: use div_u64() in init_owner_info()
  bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig
  bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvm_alloc_pages() of the buddy
  bcache: bch_nvm_free_pages() of the buddy
  bcache: get allocated pages from specific owner

 drivers/md/bcache/Kconfig       |   9 +
 drivers/md/bcache/Makefile      |   2 +-
 drivers/md/bcache/btree.c       |   6 +-
 drivers/md/bcache/features.h    |   9 +
 drivers/md/bcache/journal.c     | 317 +++++++++++---
 drivers/md/bcache/journal.h     |   2 +-
 drivers/md/bcache/nvm-pages.c   | 744 ++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h   |  95 ++++
 drivers/md/bcache/super.c       |  73 +++-
 include/uapi/linux/bcache-nvm.h | 208 +++++++++
 10 files changed, 1392 insertions(+), 73 deletions(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h
 create mode 100644 include/uapi/linux/bcache-nvm.h

-- 
2.26.2


