Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
From: Nabeel M Mohamed <nmeeramohide@micron.com>
To: <linux-kernel@vger.kernel.org>, <linux-block@vger.kernel.org>,
	<linux-nvme@lists.infradead.org>, <linux-mm@kvack.org>,
	<linux-nvdimm@lists.01.org>
Cc: <smoyer@micron.com>, <gbecker@micron.com>, <plabat@micron.com>,
	<jgroves@micron.com>, Nabeel M Mohamed <nmeeramohide@micron.com>
Subject: [PATCH v2 00/22] add Object Storage Media Pool (mpool)
Date: Mon, 12 Oct 2020 11:27:14 -0500
Message-ID: <20201012162736.65241-1-nmeeramohide@micron.com> (raw)

This patch series introduces the mpool object storage media pool driver.
Mpool implements a simple transactional object store on top of block
storage devices.

Mpool was developed for the Heterogeneous-Memory Storage Engine (HSE)
project, which is a high-performance key-value storage engine designed
for SSDs. HSE stores its data exclusively in mpool.

Mpool is readily applicable to other storage systems built on immutable
objects. For example, the many databases that store records in
immutable SSTables organized as an LSM-tree or similar data structure.

We developed mpool for HSE storage, versus using a file system or raw
block device, for several reasons.

A primary motivator was the need for a storage model that maps naturally
to conventional block storage devices, as well as to emerging device
interfaces we plan to support in the future, such as
* NVMe Zoned Namespaces (ZNS)
* NVMe Streams
* Persistent memory accessed via CXL or similar technologies

Another motivator was the need for a storage model that readily supports
multiple classes of storage devices or media in a single storage pool,
such as
* QLC SSDs for storing the bulk of objects, and
* 3DXP SSDs or persistent memory for storing objects requiring
  low-latency access

The mpool object storage model meets these needs. It also provides
other features that benefit storage systems built on immutable objects,
including
* Facilities to memory-map a specified collection of objects into a
  linear address space
* Concurrent access to object data directly and memory-mapped to greatly
  reduce page cache pollution from background operations such as
  LSM-tree compaction
* Proactive eviction of object data from the page cache, based on
  object-level metrics, to avoid excessive memory pressure and its
  associated performance impacts
* High concurrency and short code paths for efficient access to
  low-latency storage devices

HSE takes advantage of all these mpool features to achieve high
throughput with low tail-latencies.

Mpool is implemented as a character device driver where
* /dev/mpoolctl is the control file (minor number 0) supporting mpool
  management ioctls
* /dev/mpool/<mpool-name> are mpool files (minor numbers >0), one per
  mpool, supporting object management ioctls

CLI/UAPI access to /dev/mpoolctl and /dev/mpool/<mpool-name> are
controlled by their UID, GID, and mode bits. To provide a familiar look
and feel, the mpool management model and CLI are intentionally aligned
to those of LVM to the degree practical.

An mpool is created with a block storage device specified for its
required capacity media class, and optionally a second block storage
device specified for its staging media class. We recommend virtual
block devices (such as LVM logical volumes) to aggregate the performance
and capacity of multiple physical block devices, to enable sharing of
physical block devices between mpools (or for other uses), and to
support extending the size of a block device used for an mpool media
class. The libblkid library recognizes mpool formatted block devices as
of util-linux v2.32.

Mpool implements a transactional object store with two simple object
abstractions: mblocks and mlogs.

Mblock objects are containers comprising a linear sequence of bytes that
can be written exactly once, are immutable after writing, and can be
read in whole or in part as needed until deleted. Mblocks in a media
class are currently fixed size, which is configured when an mpool is
created, though the amount of data written to mblocks will differ.

Mlog objects are containers for record logging. Records of arbitrary
size can be appended to an mlog until it is full. Once full, an mlog
must be erased before additional records can be appended. Mlog records
can be read sequentially from the beginning at any time. Mlogs in a
media class are always a multiple of the mblock size for that media
class.

Mblock and mlog writes avoid the page cache. Mblocks are written,
committed, and made immutable before they can be read either directly
(avoiding the page cache) or mmaped. Mlogs are always read and updated
directly (avoiding the page cache) and cannot be mmaped.

Mpool also provides the metadata container (MDC) APIs that clients can
use to simplify storing and maintaining metadata. These MDC APIs are
helper functions built on a pair of mlogs per MDC.

The mpool Wiki contains full details on the
* Management model in the "Configure mpools" section
* Object model in the "Develop mpool Applications" section
* Kernel module architecture in the "Explore mpool Internals" section,
  which provides context for reviewing this patch series

See https://github.com/hse-project/mpool/wiki

The mpool UAPI and kernel module (not the patchset) are available on
GitHub at:

https://github.com/hse-project/mpool

https://github.com/hse-project/mpool-kmod

The HSE key-value storage engine is available on GitHub at:

https://github.com/hse-project/hse

Changes in v2:

- Fixes build errors/warnings reported by bot on ARCH=m68k
Reported-by: kernel test robot <lkp@intel.com>

- Addresses review comments from Randy and Hillf:
  * Updates ioctl-number.rst file with mpool driver's ioctl code
  * Fixes issues in the usage of printk_timed_ratelimit()

- Fixes a readahead issue found by internal testing

Nabeel M Mohamed (22):
  mpool: add utility routines and ioctl definitions
  mpool: add in-memory struct definitions
  mpool: add on-media struct definitions
  mpool: add pool drive component which handles mpool IO using the block
    layer API
  mpool: add space map component which manages free space on mpool
    devices
  mpool: add on-media pack, unpack and upgrade routines
  mpool: add superblock management routines
  mpool: add pool metadata routines to manage object lifecycle and IO
  mpool: add mblock lifecycle management and IO routines
  mpool: add mlog IO utility routines
  mpool: add mlog lifecycle management and IO routines
  mpool: add metadata container or mlog-pair framework
  mpool: add utility routines for mpool lifecycle management
  mpool: add pool metadata routines to create persistent mpools
  mpool: add mpool lifecycle management routines
  mpool: add mpool control plane utility routines
  mpool: add mpool lifecycle management ioctls
  mpool: add object lifecycle management ioctls
  mpool: add support to mmap arbitrary collection of mblocks
  mpool: add support to proactively evict cached mblock data from the
    page-cache
  mpool: add documentation
  mpool: add Kconfig and Makefile

 .../userspace-api/ioctl/ioctl-number.rst      |    3 +-
 drivers/Kconfig                               |    2 +
 drivers/Makefile                              |    1 +
 drivers/mpool/Kconfig                         |   28 +
 drivers/mpool/Makefile                        |   11 +
 drivers/mpool/assert.h                        |   25 +
 drivers/mpool/init.c                          |  126 +
 drivers/mpool/init.h                          |   17 +
 drivers/mpool/mblock.c                        |  432 +++
 drivers/mpool/mblock.h                        |  161 +
 drivers/mpool/mcache.c                        | 1072 +++++++
 drivers/mpool/mcache.h                        |  104 +
 drivers/mpool/mclass.c                        |  103 +
 drivers/mpool/mclass.h                        |  137 +
 drivers/mpool/mdc.c                           |  486 +++
 drivers/mpool/mdc.h                           |  106 +
 drivers/mpool/mlog.c                          | 1667 ++++++++++
 drivers/mpool/mlog.h                          |  212 ++
 drivers/mpool/mlog_utils.c                    | 1352 ++++++++
 drivers/mpool/mlog_utils.h                    |   63 +
 drivers/mpool/mp.c                            | 1086 +++++++
 drivers/mpool/mp.h                            |  231 ++
 drivers/mpool/mpcore.c                        |  987 ++++++
 drivers/mpool/mpcore.h                        |  354 +++
 drivers/mpool/mpctl.c                         | 2747 +++++++++++++++++
 drivers/mpool/mpctl.h                         |   58 +
 drivers/mpool/mpool-locking.rst               |   90 +
 drivers/mpool/mpool_ioctl.h                   |  636 ++++
 drivers/mpool/mpool_printk.h                  |   43 +
 drivers/mpool/omf.c                           | 1316 ++++++++
 drivers/mpool/omf.h                           |  593 ++++
 drivers/mpool/omf_if.h                        |  381 +++
 drivers/mpool/params.h                        |  116 +
 drivers/mpool/pd.c                            |  424 +++
 drivers/mpool/pd.h                            |  202 ++
 drivers/mpool/pmd.c                           | 2046 ++++++++++++
 drivers/mpool/pmd.h                           |  379 +++
 drivers/mpool/pmd_obj.c                       | 1569 ++++++++++
 drivers/mpool/pmd_obj.h                       |  499 +++
 drivers/mpool/reaper.c                        |  686 ++++
 drivers/mpool/reaper.h                        |   71 +
 drivers/mpool/sb.c                            |  625 ++++
 drivers/mpool/sb.h                            |  162 +
 drivers/mpool/smap.c                          | 1031 +++++++
 drivers/mpool/smap.h                          |  334 ++
 drivers/mpool/sysfs.c                         |   48 +
 drivers/mpool/sysfs.h                         |   48 +
 drivers/mpool/upgrade.c                       |  138 +
 drivers/mpool/upgrade.h                       |  128 +
 drivers/mpool/uuid.h                          |   59 +
 50 files changed, 23194 insertions(+), 1 deletion(-)
 create mode 100644 drivers/mpool/Kconfig
 create mode 100644 drivers/mpool/Makefile
 create mode 100644 drivers/mpool/assert.h
 create mode 100644 drivers/mpool/init.c
 create mode 100644 drivers/mpool/init.h
 create mode 100644 drivers/mpool/mblock.c
 create mode 100644 drivers/mpool/mblock.h
 create mode 100644 drivers/mpool/mcache.c
 create mode 100644 drivers/mpool/mcache.h
 create mode 100644 drivers/mpool/mclass.c
 create mode 100644 drivers/mpool/mclass.h
 create mode 100644 drivers/mpool/mdc.c
 create mode 100644 drivers/mpool/mdc.h
 create mode 100644 drivers/mpool/mlog.c
 create mode 100644 drivers/mpool/mlog.h
 create mode 100644 drivers/mpool/mlog_utils.c
 create mode 100644 drivers/mpool/mlog_utils.h
 create mode 100644 drivers/mpool/mp.c
 create mode 100644 drivers/mpool/mp.h
 create mode 100644 drivers/mpool/mpcore.c
 create mode 100644 drivers/mpool/mpcore.h
 create mode 100644 drivers/mpool/mpctl.c
 create mode 100644 drivers/mpool/mpctl.h
 create mode 100644 drivers/mpool/mpool-locking.rst
 create mode 100644 drivers/mpool/mpool_ioctl.h
 create mode 100644 drivers/mpool/mpool_printk.h
 create mode 100644 drivers/mpool/omf.c
 create mode 100644 drivers/mpool/omf.h
 create mode 100644 drivers/mpool/omf_if.h
 create mode 100644 drivers/mpool/params.h
 create mode 100644 drivers/mpool/pd.c
 create mode 100644 drivers/mpool/pd.h
 create mode 100644 drivers/mpool/pmd.c
 create mode 100644 drivers/mpool/pmd.h
 create mode 100644 drivers/mpool/pmd_obj.c
 create mode 100644 drivers/mpool/pmd_obj.h
 create mode 100644 drivers/mpool/reaper.c
 create mode 100644 drivers/mpool/reaper.h
 create mode 100644 drivers/mpool/sb.c
 create mode 100644 drivers/mpool/sb.h
 create mode 100644 drivers/mpool/smap.c
 create mode 100644 drivers/mpool/smap.h
 create mode 100644 drivers/mpool/sysfs.c
 create mode 100644 drivers/mpool/sysfs.h
 create mode 100644 drivers/mpool/upgrade.c
 create mode 100644 drivers/mpool/upgrade.h
 create mode 100644 drivers/mpool/uuid.h

-- 
2.17.2


             reply index

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-12 16:27 Nabeel M Mohamed [this message]
2020-10-12 16:27 ` [PATCH v2 01/22] mpool: add utility routines and ioctl definitions Nabeel M Mohamed
2020-10-12 16:45   ` Randy Dunlap
2020-10-12 16:48     ` Randy Dunlap
2020-10-12 16:27 ` [PATCH v2 02/22] mpool: add in-memory struct definitions Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 03/22] mpool: add on-media " Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 04/22] mpool: add pool drive component which handles mpool IO using the block layer API Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 05/22] mpool: add space map component which manages free space on mpool devices Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 06/22] mpool: add on-media pack, unpack and upgrade routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 07/22] mpool: add superblock management routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 08/22] mpool: add pool metadata routines to manage object lifecycle and IO Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 09/22] mpool: add mblock lifecycle management and IO routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 10/22] mpool: add mlog IO utility routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 11/22] mpool: add mlog lifecycle management and IO routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 12/22] mpool: add metadata container or mlog-pair framework Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 13/22] mpool: add utility routines for mpool lifecycle management Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 14/22] mpool: add pool metadata routines to create persistent mpools Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 15/22] mpool: add mpool lifecycle management routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 16/22] mpool: add mpool control plane utility routines Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 17/22] mpool: add mpool lifecycle management ioctls Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 18/22] mpool: add object " Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 19/22] mpool: add support to mmap arbitrary collection of mblocks Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 20/22] mpool: add support to proactively evict cached mblock data from the page-cache Nabeel M Mohamed
2020-10-12 16:27 ` [PATCH v2 21/22] mpool: add documentation Nabeel M Mohamed
2020-10-12 16:53   ` Randy Dunlap
2020-10-12 16:27 ` [PATCH v2 22/22] mpool: add Kconfig and Makefile Nabeel M Mohamed
2020-10-15  8:02 ` [PATCH v2 00/22] add Object Storage Media Pool (mpool) Christoph Hellwig
2020-10-16 21:58   ` [EXT] " Nabeel Meeramohideen Mohamed (nmeeramohide)
2020-10-16 22:11     ` Dan Williams
2020-10-19 22:30       ` Nabeel Meeramohideen Mohamed (nmeeramohide)
2020-10-20 21:35         ` Dan Williams
2020-10-21 17:10           ` Nabeel Meeramohideen Mohamed (nmeeramohide)
2020-10-21 17:48             ` Dan Williams
2020-10-21 14:24       ` Mike Snitzer
2020-10-21 16:24         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201012162736.65241-1-nmeeramohide@micron.com \
    --to=nmeeramohide@micron.com \
    --cc=gbecker@micron.com \
    --cc=jgroves@micron.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=plabat@micron.com \
    --cc=smoyer@micron.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git