From: Andy Rudoff <andy@rudoff.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: lsf-pc <lsf-pc@lists.linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	jmoyer <jmoyer@redhat.com>, david <david@fromorbit.com>,
	Chris Mason <clm@fb.com>, Jens Axboe <axboe@kernel.dk>,
	Bryan E Veal <bryan.e.veal@intel.com>,
	Annie Foong <annie.foong@intel.com>
Subject: Re: [LSF/MM TOPIC] atomic block device
Date: Sat, 15 Feb 2014 10:55:34 -0700
Message-ID: <CABBL8ELycRzfyDGtKWk1nFySh9-a5Rh5uZXdgGEwMYHxCQzO3Q@mail.gmail.com>
In-Reply-To: <CAA9_cmf7Y1TL8XqR7dYUn=Pv-En2e0X0FM0zdpkiBkUuNBGKfQ@mail.gmail.com>

On Sat, Feb 15, 2014 at 8:04 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> In response to Dave's call [1] and highlighting Jeff's attend request
> [2] I'd like to stoke a discussion on an emulation layer for atomic
> block commands.  Specifically, SNIA has laid out their position on the
> command set an atomic block device may support (NVM Programming Model
> [3]) and it is a good conversation piece for this effort.  The goal
> would be to review the proposed operations, identify the capabilities
> that would be readily useful to filesystems / existing use cases, and
> tear down a straw man implementation proposal.
...
> The argument for not doing this as a
> device-mapper target or stacked block device driver is to ease
> provisioning and make the emulation transparent.  On the other hand,
> the argument for doing this as a virtual block device is that the
> "failed to parse device metadata" is a known failure scenario for
> dm/md, but not sd for example.


Hi Dan,

Like Jeff, I'm a member of the NVMP workgroup and I'd like to weigh in
here with a couple of observations.  I think the most interesting cases
where atomics provide a benefit are those where storage is RAIDed
across multiple devices.  Part of the argument for atomic writes on
SSDs is that databases and file systems can save bandwidth and
complexity by avoiding write-ahead logging.  But even if every SSD
supported it, the majority of production databases span multiple
devices for capacity, performance, or, most likely, high-availability
reasons.  So in my opinion, that very much supports the idea of doing
atomics at a layer where they apply to SW RAIDed storage (as I believe
Dave and others are suggesting).
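
To make the write-ahead-logging point concrete, here is a minimal C
sketch of the two patterns.  It is only illustrative: atomic_pwritev()
below is a made-up stand-in for whatever atomic multi-block interface
this discussion settles on (no such syscall exists today), and the
stub simply falls back to a plain pwritev() so the sketch compiles.

#define _GNU_SOURCE             /* for pwritev() */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Today: to update N blocks atomically, a database first logs the new
 * contents, flushes the log, then writes the blocks in place -- two
 * writes and two flushes per transaction (short-write handling
 * omitted for brevity).
 */
static int wal_update(int logfd, int datafd,
                      const struct iovec *iov, int iovcnt, off_t off)
{
        if (writev(logfd, iov, iovcnt) < 0)         /* 1. write-ahead log */
                return -1;
        if (fdatasync(logfd) < 0)                   /* 2. log is durable  */
                return -1;
        if (pwritev(datafd, iov, iovcnt, off) < 0)  /* 3. write in place  */
                return -1;
        return fdatasync(datafd);                   /* 4. data is durable */
}

/*
 * With a block-level atomic multi-write, steps 1 and 2 disappear
 * because the device (or an emulation layer sitting above SW RAID)
 * guarantees all-or-nothing behaviour.
 */
static int atomic_pwritev(int fd, const struct iovec *iov, int iovcnt,
                          off_t off)
{
        return pwritev(fd, iov, iovcnt, off);       /* placeholder only   */
}

static int atomic_update(int datafd,
                         const struct iovec *iov, int iovcnt, off_t off)
{
        if (atomic_pwritev(datafd, iov, iovcnt, off) < 0)
                return -1;
        return fdatasync(datafd);
}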

On the other side of the coin, I remember Dave talking about this
during our NVM discussion at LSF last year, and I got the impression
that the size and number of writes he would need supported before he
could really stop using his journaling code were potentially large.
Dave: perhaps you can restate the number of writes, and their total
size, that block-level atomics would have to support in order for
them to be worth using by XFS?

Finally, I think atomics for file system use are interesting, but
exposing them for database use is also very interesting.  That means
exposing the supported size and number of writes to the app, and
making the file system able to turn around and leverage those
capabilities when a database app tries to use them via the file
system.  This has been the primary focus of the NVMP workgroup:
helping ISVs determine what features they can leverage in a uniform
way.  So my point here is that we get the most out of atomics by
exposing them both in-kernel for file systems and in user space for
apps.
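
Just to make the "exposing the size and number of writes supported"
part concrete, here is a rough sketch of what such a query could look
like from user space.  The struct, the ioctl number, and the field
names are all hypothetical -- the NVM Programming Model describes the
attributes abstractly and leaves the OS interface undefined -- so
treat this as a shape, not a proposal.

#include <sys/ioctl.h>
#include <linux/ioctl.h>

/*
 * Hypothetical per-file capability query, filled in by the file
 * system after it consults the block layer (or the atomics emulation
 * layer) underneath it.  None of these names exist in any kernel
 * header today.
 */
struct atomic_write_caps {
        unsigned long long max_write_bytes; /* largest single atomic write        */
        unsigned int max_vectors;           /* discontiguous ranges per operation */
        unsigned int granularity;           /* required alignment, in bytes       */
};

#define FS_IOC_GET_ATOMIC_CAPS  _IOR('f', 0x7f, struct atomic_write_caps)

static int query_atomic_caps(int fd, struct atomic_write_caps *caps)
{
        /*
         * A database would call this once at startup and decide which
         * transactions fit inside the advertised limits (and so can
         * skip the write-ahead log), falling back to logging for the
         * rest.
         */
        return ioctl(fd, FS_IOC_GET_ATOMIC_CAPS, caps);
}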

-andy

Thread overview: 18+ messages
2014-02-15 15:04 [LSF/MM TOPIC] atomic block device Dan Williams
2014-02-15 17:55 ` Andy Rudoff [this message]
2014-02-15 18:29   ` Howard Chu
2014-02-15 18:31     ` Howard Chu
2014-02-15 18:02 ` James Bottomley
2014-02-15 18:15   ` Andy Rudoff
2014-02-15 20:25     ` James Bottomley
2014-03-20 20:10       ` Jeff Moyer
     [not found] ` <CABBL8E+r+Uao9aJsezy16K_JXQgVuoD7ArepB46WTS=zruHL4g@mail.gmail.com>
2014-02-15 21:35   ` Dan Williams
2014-02-17  8:56   ` Dave Chinner
2014-02-17  9:51     ` [Lsf-pc] " Jan Kara
2014-02-17 10:20       ` Howard Chu
2014-02-18  0:10         ` Dave Chinner
2014-02-18  8:59           ` Alex Elsayed
2014-02-18 13:17             ` Dave Chinner
2014-02-18 14:09               ` Theodore Ts'o
2014-02-17 13:05 ` Chris Mason
2014-02-18 19:07   ` Dan Williams
