From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Williams
Subject: Re: [LSF/MM TOPIC] atomic block device
Date: Sat, 15 Feb 2014 13:35:30 -0800
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel, jmoyer@redhat.com, david@fromorbit.com, Chris Mason, Jens Axboe, Bryan E Veal, Annie Foong
To: Andy Rudoff
Return-path:
Received: from mail-ve0-f175.google.com ([209.85.128.175]:43895 "EHLO mail-ve0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753862AbaBOVfd (ORCPT ); Sat, 15 Feb 2014 16:35:33 -0500
Received: by mail-ve0-f175.google.com with SMTP id c14so10684743vea.20 for ; Sat, 15 Feb 2014 13:35:31 -0800 (PST)
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Sat, Feb 15, 2014 at 9:47 AM, Andy Rudoff wrote:
> On Sat, Feb 15, 2014 at 8:04 AM, Dan Williams
> wrote:
>>
>> In response to Dave's call [1] and highlighting Jeff's attend request
>> [2] I'd like to stoke a discussion on an emulation layer for atomic
>> block commands. Specifically, SNIA has laid out their position on the
>> command set an atomic block device may support (NVM Programming Model
>> [3]) and it is a good conversation piece for this effort. The goal
>> would be to review the proposed operations, identify the capabilities
>> that would be readily useful to filesystems / existing use cases, and
>> tear down a straw man implementation proposal.
>
> ...
>>
>> The argument for not doing this as a
>> device-mapper target or stacked block device driver is to ease
>> provisioning and make the emulation transparent. On the other hand,
>> the argument for doing this as a virtual block device is that the
>> "failed to parse device metadata" is a known failure scenario for
>> dm/md, but not sd for example.
>
>
> Hi Dan,

Hi Andy.

> Like Jeff, I'm a member of the NVMP workgroup and I'd like to ring in here
> with a couple observations. I think the most interesting cases where
> atomics provide a benefit are cases where storage is RAIDed across multiple
> devices. Part of the argument for atomic writes on SSDs is that databases
> and file systems can save bandwidth and complexity by avoiding
> write-ahead-logging. But even if every SSD supported it, the majority of
> production databases span across devices for either capacity, performance,
> or, most likely, high availability reasons.

The primary Facebook database server (Type 3 [1]) is single-device,
are they an outlier? I would think scale-out architectures in general
handle database capacity and availability by scaling at the node
level... that said I don't doubt that some are dependent on
multi-device configurations.

[1]: http://opencompute.org/summit/ (slide 12)

> So in my opinion, that very
> much supports the idea of doing atomics at a layer where it applies to SW
> RAIDed storage (as I believe Dave and others are suggesting).

Sure this can expand to a multi-device capability, but that is
incremental to the single device use case.

> On the other side of the coin, I remember Dave talking about this during our
> NVM discussion at LSF last year and I got the impression the size and number
> of writes he'd need supported before he could really stop using his
> journaling code was potentially large. Dave: perhaps you can re-state the
> number of writes and their total size that would have to be supported by
> block level atomics in order for them to be worth using by XFS?

...and that's the driving example of the value of having a solution
like this upstream. Beat up on a common layer to determine the
minimum practical requirements across different use cases.

> Finally, I think atomics for file system use is interesting, but also
> exposing them for database use is very interesting. That means exposing the
> size and number of writes supported to the app and making the file system
> able to turn around and leverage those when a database app tries to use them
> via the file system. This has been the primary focus of the NVMP workgroup,
> helping ISVs determine what features they can leverage in a uniform way. So
> my point here is we get the most use out of atomics by exposing them both
> in-kernel for file systems and in user space for apps.

*nod*