From: Dan Williams <dan.j.williams@intel.com>
Subject: Re: [LSF/MM TOPIC] atomic block device
Date: Sat, 15 Feb 2014 13:35:30 -0800
Message-ID: <CAPcyv4ivNz-DtpZgCXWLQe1+azcN2pSA3mQG6X6pOPt=tsiWSg@mail.gmail.com>
References: <CAA9_cmf7Y1TL8XqR7dYUn=Pv-En2e0X0FM0zdpkiBkUuNBGKfQ@mail.gmail.com>
	<CABBL8E+r+Uao9aJsezy16K_JXQgVuoD7ArepB46WTS=zruHL4g@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: lsf-pc@lists.linux-foundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>, jmoyer@redhat.com,
	david@fromorbit.com, Chris Mason <clm@fb.com>,
	Jens Axboe <axboe@kernel.dk>,
	Bryan E Veal <bryan.e.veal@intel.com>,
	Annie Foong <annie.foong@intel.com>
To: Andy Rudoff <andy@rudoff.com>
In-Reply-To: <CABBL8E+r+Uao9aJsezy16K_JXQgVuoD7ArepB46WTS=zruHL4g@mail.gmail.com>

On Sat, Feb 15, 2014 at 9:47 AM, Andy Rudoff <andy@rudoff.com> wrote:
> On Sat, Feb 15, 2014 at 8:04 AM, Dan Williams <dan.j.williams@intel.com>
> wrote:
>>
>> In response to Dave's call [1] and highlighting Jeff's attend request
>> [2] I'd like to stoke a discussion on an emulation layer for atomic
>> block commands.  Specifically, SNIA has laid out their position on the
>> command set an atomic block device may support (NVM Programming Model
>> [3]) and it is a good conversation piece for this effort.  The goal
>> would be to review the proposed operations, identify the capabilities
>> that would be readily useful to filesystems / existing use cases, and
>> tear down a straw man implementation proposal.
>
> ...
>>
>> The argument for not doing this as a
>> device-mapper target or stacked block device driver is to ease
>> provisioning and make the emulation transparent.  On the other hand,
>> the argument for doing this as a virtual block device is that the
>> "failed to parse device metadata" is a known failure scenario for
>> dm/md, but not sd for example.
>
>
> Hi Dan,

Hi Andy.

> Like Jeff, I'm a member of the NVMP workgroup and I'd like to chime in here
> with a couple observations.  I think the most interesting cases where
> atomics provide a benefit are cases where storage is RAIDed across multiple
> devices.  Part of the argument for atomic writes on SSDs is that databases
> and file systems can save bandwidth and complexity by avoiding
> write-ahead-logging.  But even if every SSD supported it, the majority of
> production databases span across devices for either capacity, performance,
> or, most likely, high availability reasons.

The primary Facebook database server (Type 3 [1]) is single-device.
Are they an outlier?  I would think scale-out architectures in general
handle database capacity and availability by scaling at the node
level... that said, I don't doubt that some are dependent on
multi-device configurations.

[1]: http://opencompute.org/summit/ (slide 12)

> So in my opinion, that very
> much supports the idea of doing atomics at a layer where it applies to SW
> RAIDed storage (as I believe Dave and others are suggesting).

Sure, this can expand to a multi-device capability, but that is
incremental to the single-device use case.
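To make the single-device case concrete, here is a toy model of what an emulation layer might do underneath an atomic multi-block write. It journals the new block images, writes a commit record, and then updates the blocks in place. On recovery it replays only the committed batches. This is a sketch under stated assumptions, not a proposal for the actual block-layer or dm interface. The in-memory "media" and "log", the 4096-byte block size, and the names AtomicBlockEmulator, atomic_write() and recover() are all hypothetical. Growing it to multiple devices would presumably mean a single commit record covering writes to several members.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed logical block size in bytes


@dataclass
class AtomicBlockEmulator:
    """Hypothetical in-memory model of multi-block atomic write emulation.

    Not a kernel or device-mapper interface. `media` stands in for the home
    locations of blocks and `log` for a persistent journal area; both are
    assumed to survive a simulated crash.
    """
    nblocks: int
    media: dict = field(default_factory=dict)  # lba -> block contents
    log: list = field(default_factory=list)    # journal records, in order
    next_txid: int = 0

    def atomic_write(self, writes, crash_point=None):
        """Write a batch of (lba, payload) pairs all-or-nothing.

        crash_point simulates power loss: "before_commit" stops after the
        block images are journaled, "before_apply" stops after the commit
        record is journaled but before the in-place update.
        """
        for lba, payload in writes:
            if not 0 <= lba < self.nblocks or len(payload) != BLOCK_SIZE:
                raise ValueError(f"invalid write to lba {lba}")
        txid = self.next_txid
        self.next_txid += 1
        # 1. Journal the new block images.
        self.log.append(("data", txid, list(writes)))
        if crash_point == "before_commit":
            return
        # 2. The commit record is the point at which the batch becomes durable.
        self.log.append(("commit", txid))
        if crash_point == "before_apply":
            return
        # 3. Update the blocks in their home locations.
        self._apply(writes)

    def _apply(self, writes):
        for lba, payload in writes:
            self.media[lba] = payload

    def recover(self):
        """Replay committed batches in log order and drop uncommitted ones."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "data" and rec[1] in committed:
                self._apply(rec[2])  # idempotent if the batch was already applied
        self.log.clear()


dev = AtomicBlockEmulator(nblocks=8)
a, b = b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE
dev.atomic_write([(0, a), (5, b)], crash_point="before_commit")
dev.recover()                      # neither block lands
dev.atomic_write([(0, a), (5, b)], crash_point="before_apply")
dev.recover()                      # both blocks land
```

The commit record is the single point that decides all-or-nothing. A crash before it leaves no trace on the media, and a crash after it is repaired by replay.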

> On the other side of the coin, I remember Dave talking about this during our
> NVM discussion at LSF last year and I got the impression the size and number
> of writes he'd need supported before he could really stop using his
> journaling code was potentially large.  Dave: perhaps you can re-state the
> number of writes and their total size that would have to be supported by
> block level atomics in order for them to be worth using by XFS?

...and that's the driving example of the value of having a solution
like this upstream: beat up on a common layer to determine the
minimum practical requirements across different use cases.
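The "minimum practical requirements" question boils down to the two numbers Andy asks Dave about: how many writes a batch may contain, and their total size. Below is a minimal sketch of how a filesystem (or an app going through it) could use such limits to choose between the device's atomic path and its existing journal. The names AtomicLimits, fits_atomic() and choose_commit_path() are hypothetical. The example limit values are made up for illustration and do not come from any device or from XFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicLimits:
    """Hypothetical per-device atomic write limits (not a real kernel API)."""
    max_writes: int       # maximum number of writes in one atomic batch
    max_total_bytes: int  # maximum combined payload of one atomic batch


def fits_atomic(limits, write_sizes):
    """Return True if a batch with these per-write sizes fits the limits."""
    return (len(write_sizes) <= limits.max_writes
            and sum(write_sizes) <= limits.max_total_bytes)


def choose_commit_path(limits, write_sizes):
    """Use device atomics when the batch fits; otherwise keep journaling."""
    return "atomic" if fits_atomic(limits, write_sizes) else "journal"


limits = AtomicLimits(max_writes=16, max_total_bytes=64 * 1024)
small = choose_commit_path(limits, [4096] * 8)         # fits both limits
too_many = choose_commit_path(limits, [4096] * 32)     # exceeds max_writes
too_big = choose_commit_path(limits, [65536, 4096])    # exceeds max_total_bytes
```

If the common layer reports limits that are too small for a filesystem's typical transaction, every commit falls through to "journal". That outcome is exactly what would show the limits are below the practical minimum for that use case.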

> Finally, I think atomics for file system use is interesting, but also
> exposing them for database use is very interesting.  That means exposing the
> size and number of writes supported to the app and making the file system
> able to turn around and leverage those when a database app tries to use them
> via the file system.  This has been the primary focus of the NVMP workgroup,
> helping ISVs determine what features they can leverage in a uniform way.  So
> my point here is we get the most use out of atomics by exposing them both
> in-kernel for file systems and in user space for apps.

*nod*