From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1423129AbdD0VgZ (ORCPT ); Thu, 27 Apr 2017 17:36:25 -0400 Received: from mail-it0-f42.google.com ([209.85.214.42]:34056 "EHLO mail-it0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1032716AbdD0VgQ (ORCPT ); Thu, 27 Apr 2017 17:36:16 -0400 MIME-Version: 1.0 In-Reply-To: References: <149307780135.7155.11108531648914675756.stgit@dwillia2-desk3.amr.corp.intel.com> <149315140303.23340.14688142799059150805.stgit@dwillia2-desk3.amr.corp.intel.com> From: Dan Williams Date: Thu, 27 Apr 2017 14:36:15 -0700 Message-ID: Subject: Re: [PATCH v3] libnvdimm, region: sysfs trigger for nvdimm_flush() To: Jeff Moyer Cc: "linux-nvdimm@lists.01.org" , Masayoshi Mizuma , Linux ACPI , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 27, 2017 at 1:02 PM, Dan Williams wrote: > On Thu, Apr 27, 2017 at 12:40 PM, Jeff Moyer wrote: >> Dan Williams writes: >> >>> On Thu, Apr 27, 2017 at 11:41 AM, Jeff Moyer wrote: >>>> Dan Williams writes: >>>> >>>>>> The sentiment is that programs shouldn't have to grovel around in sysfs >>>>>> to do stuff related to an open file descriptor or mapping. I don't take >>>>>> issue with the name. I do worry that something like 'wpq_drain' may be >>>>>> too platform specific, though. The NVM Programming Model specification >>>>>> is going to call this "deep flush", so maybe that will give you >>>>>> some inspiration if you do want to change the name. >>>>> >>>>> I'll change to "deep_flush", and I quibble that this is related to a >>>>> single open file descriptor or mapping. It really is a "region flush" >>>>> for giving extra protection for global metadata, but the persistence >>>>> of individual fds or mappings is handled by ADR. I think an ioctl >>>>> might give the false impression that every time you flush a cacheline >>>>> to persistence you need to call the ioctl. >>>> >>>> fsync, for example, may affect more than one fd--all data in the drive >>>> write cache will be flushed. I don't see how this is so different. I >>>> think a sysfs file is awkward because it requires an application to >>>> chase down the correct file in the sysfs hierarchy. If the application >>>> already has an open fd or a mapping, it should be able to operate on >>>> that. >>> >>> I'm teetering, but still leaning towards sysfs. The use case that >>> needs this is device-dax because we otherwise silently do this behind >>> the application's back on filesystem-dax for fsync / msync. >> >> We may yet get file system support for flush from userspace (NOVA, for >> example). So I don't think we should restrict ourselves to only >> thinking about the device dax use case. >> >>> A device-dax ioctl would be straightforward, but 'deep flush' assumes >>> that the device-dax instance is fronting persistent memory. There's >>> nothing persistent memory specific about device-dax except that today >>> only the nvdimm sub-system knows how to create them, but there's >>> nothing that prevents other memory regions from being mapped this way. >> >> You're concerned that applications operating on device dax instances >> that are not backed by pmem will try to issue a deep flush? Why would >> they do that, and why can't you just return failure from the ioctl? >> >>> So I'd rather this persistent memory specific mechanism stay with the >>> persistent memory specific portion of the interface rather than plumb >>> persistent memory details out through the generic device-dax interface >>> since we have no other intercept point like we do in the >>> filesystem-dax case to hide this flush. >> >> Look at the block layer. You can issue an ioctl on a block device, and >> if the generic block layer can handle it, it does. If not, it gets >> passed down to lower layers until either it gets handled, or it bubbles >> back up because nobody knew what to do with it. I think you can do the >> same thing here, and that solves your layering violation. > > So this is where I started. I was going to follow the block layer. > Except recently the block layer has been leaning away from ioctls and > implementing support for syscalls directly. The same approach for > device-dax fallocate() support got NAKd, so I opted for sysfs out of > the gate. > > However, since there really is no analog for "deep flush" in the > syscall namespace lets (*gag*) implement an ioctl for this. So I think about it for 2 seconds and now I'm veering back to sysfs. We don't want applications calling this ioctl on filesystem-dax fds. I simply can't bring myself to do the work to pick a unique ioctl number that is known to be unique for the full filsystem and block-layer paths when we can put this interface right where it belongs in nvdimm specific sysfs.