From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-nvdimm-bounces@lists.01.org>
Date: Tue, 13 Sep 2016 07:34:36 +1000
From: Dave Chinner <david@fromorbit.com>
Subject: Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in
 /proc/self/smaps)
Message-ID: <20160912213435.GD30497@dastard>
References: <CAPcyv4iDra+mRqEejfGqapKEAFZmUtUcg0dsJ8nt7mOhcT-Qpw@mail.gmail.com>
 <20160908225636.GB15167@linux.intel.com>
 <20160912052703.GA1897@infradead.org>
 <CAOSf1CHaW=szD+YEjV6vcUG0KKr=aXv8RXomw9xAgknh_9NBFQ@mail.gmail.com>
 <20160912075128.GB21474@infradead.org>
 <20160912180507.533b3549@roar.ozlabs.ibm.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20160912180507.533b3549@roar.ozlabs.ibm.com>
List-Unsubscribe: <https://lists.01.org/mailman/options/linux-nvdimm>,
 <mailto:linux-nvdimm-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/linux-nvdimm/>
List-Post: <mailto:linux-nvdimm@lists.01.org>
List-Help: <mailto:linux-nvdimm-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/linux-nvdimm>,
 <mailto:linux-nvdimm-request@lists.01.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm" <linux-nvdimm-bounces@lists.01.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Yumei Huang <yuhuang@redhat.com>, Michal Hocko <mhocko@suse.com>, Xiao Guangrong <guangrong.xiao@linux.intel.com>, KVM list <kvm@vger.kernel.org>, Dave Hansen <dave.hansen@intel.com>, Gleb Natapov <gleb@kernel.org>, "linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>, mtosatti@redhat.com, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, Christoph Hellwig <hch@infradead.org>, Linux MM <linux-mm@kvack.org>, Stefan Hajnoczi <stefanha@redhat.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, Paolo Bonzini <pbonzini@redhat.com>, Andrew Morton <akpm@linux-foundation.org>
List-ID: <linux-nvdimm@lists.01.org>

On Mon, Sep 12, 2016 at 06:05:07PM +1000, Nicholas Piggin wrote:
> On Mon, 12 Sep 2016 00:51:28 -0700
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Mon, Sep 12, 2016 at 05:25:15PM +1000, Oliver O'Halloran wrote:
> > > What are the problems here? Is this a matter of existing filesystems
> > > being unable/unwilling to support this or is it just fundamentally
> > > broken?  
> > 
> > It's a fundamentally broken model.  See Dave's post that actually was
> > sent slightly earlier then mine for the list of required items, which
> > is fairly unrealistic.  You could probably try to architect a file
> > system for it, but I doubt it would gain much traction.
> 
> It's not fundamentally broken, it just doesn't fit well existing
> filesystems.
> 
> Dave's post of requirements is also wrong. A filesystem does not have
> to guarantee all that, it only has to guarantee that is the case for
> a given block after it has a mapping and page fault returns, other
> operations can be supported by invalidating mappings, etc.

Sure, but filesystems are completely unaware of what is mapped at
any given time, or what constraints that mapping might have. Trying
to make filesystems aware of per-page mapping constraints seems like
a fairly significant layering violation based on a flawed
assumption. i.e. that operations on other parts of the file do not
affect the block that requires immutable metadata.

e.g an extent operation in some other area of the file can cause a
tip-to-root extent tree split or merge, and that moves the metadata
that points to the mapped block that we've told userspace "doesn't
need fsync".  We now need an fsync to ensure that the metadata is
consistent on disk again, even though that block has not physically
been moved. IOWs, the immutable data block updates are now not
ordered correctly w.r.t. other updates done to the file, especially
when we consider crash recovery....

All this will expose is an unfixable problem with ordering of stable
data + metadata operations and their synchronisation. As such, it
seems like nothing but a major cluster-fuck to try to do mapping
specific, per-block immutable metadata - it adds major complexity
and even more untractable problems.

Yes, we /could/ try to solve this but, quite frankly, it's far
easier to change the broken PMEM programming model assumptions than
it is to implement what you are suggesting. Or to do what Christoph
suggested and just use a wrapper around something like device
mapper to hand out chunks of unchanging, static pmem to
applications...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757894AbcILVen (ORCPT <rfc822;w@1wt.eu>);
        Mon, 12 Sep 2016 17:34:43 -0400
Received: from ipmail06.adl6.internode.on.net ([150.101.137.145]:27790 "EHLO
        ipmail06.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752858AbcILVel (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 12 Sep 2016 17:34:41 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: A2CMHQDzHtdXEAI1LHldHAEBBAEBCgEBgzkBAQEBAR6BU4J6g3mcNAEBBox5hhmEEoYXAgIBAQKBQk0BAgEBAQEBAgYBAQEBAQEBATdAhGIBAQQ6HBYKAxAIAw4KCSUPBSUDBxoTiEm+NQEBAQEGAQEBASMehUqFGIdugi8FmWOPQo9sjFWDe4N5gUgqNIZwAQEB
Date: Tue, 13 Sep 2016 07:34:36 +1000
From: Dave Chinner <david@fromorbit.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
        "Oliver O'Halloran" <oohall@gmail.com>,
        Yumei Huang <yuhuang@redhat.com>, Michal Hocko <mhocko@suse.com>,
        Xiao Guangrong <guangrong.xiao@linux.intel.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        KVM list <kvm@vger.kernel.org>, Linux MM <linux-mm@kvack.org>,
        Gleb Natapov <gleb@kernel.org>,
        "linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
        mtosatti@redhat.com,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Dave Hansen <dave.hansen@intel.com>,
        Stefan Hajnoczi <stefanha@redhat.com>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>,
        Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in
 /proc/self/smaps)
Message-ID: <20160912213435.GD30497@dastard>
References: <CAPcyv4iDra+mRqEejfGqapKEAFZmUtUcg0dsJ8nt7mOhcT-Qpw@mail.gmail.com>
 <20160908225636.GB15167@linux.intel.com>
 <20160912052703.GA1897@infradead.org>
 <CAOSf1CHaW=szD+YEjV6vcUG0KKr=aXv8RXomw9xAgknh_9NBFQ@mail.gmail.com>
 <20160912075128.GB21474@infradead.org>
 <20160912180507.533b3549@roar.ozlabs.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160912180507.533b3549@roar.ozlabs.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 12, 2016 at 06:05:07PM +1000, Nicholas Piggin wrote:
> On Mon, 12 Sep 2016 00:51:28 -0700
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Mon, Sep 12, 2016 at 05:25:15PM +1000, Oliver O'Halloran wrote:
> > > What are the problems here? Is this a matter of existing filesystems
> > > being unable/unwilling to support this or is it just fundamentally
> > > broken?  
> > 
> > It's a fundamentally broken model.  See Dave's post that actually was
> > sent slightly earlier then mine for the list of required items, which
> > is fairly unrealistic.  You could probably try to architect a file
> > system for it, but I doubt it would gain much traction.
> 
> It's not fundamentally broken, it just doesn't fit well existing
> filesystems.
> 
> Dave's post of requirements is also wrong. A filesystem does not have
> to guarantee all that, it only has to guarantee that is the case for
> a given block after it has a mapping and page fault returns, other
> operations can be supported by invalidating mappings, etc.

Sure, but filesystems are completely unaware of what is mapped at
any given time, or what constraints that mapping might have. Trying
to make filesystems aware of per-page mapping constraints seems like
a fairly significant layering violation based on a flawed
assumption. i.e. that operations on other parts of the file do not
affect the block that requires immutable metadata.

e.g an extent operation in some other area of the file can cause a
tip-to-root extent tree split or merge, and that moves the metadata
that points to the mapped block that we've told userspace "doesn't
need fsync".  We now need an fsync to ensure that the metadata is
consistent on disk again, even though that block has not physically
been moved. IOWs, the immutable data block updates are now not
ordered correctly w.r.t. other updates done to the file, especially
when we consider crash recovery....

All this will expose is an unfixable problem with ordering of stable
data + metadata operations and their synchronisation. As such, it
seems like nothing but a major cluster-fuck to try to do mapping
specific, per-block immutable metadata - it adds major complexity
and even more untractable problems.

Yes, we /could/ try to solve this but, quite frankly, it's far
easier to change the broken PMEM programming model assumptions than
it is to implement what you are suggesting. Or to do what Christoph
suggested and just use a wrapper around something like device
mapper to hand out chunks of unchanging, static pmem to
applications...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
Date: Tue, 13 Sep 2016 07:34:36 +1000
From: Dave Chinner <david@fromorbit.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Oliver O'Halloran <oohall@gmail.com>,
	Yumei Huang <yuhuang@redhat.com>, Michal Hocko <mhocko@suse.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KVM list <kvm@vger.kernel.org>, Linux MM <linux-mm@kvack.org>,
	Gleb Natapov <gleb@kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	mtosatti@redhat.com,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in
 /proc/self/smaps)
Message-ID: <20160912213435.GD30497@dastard>
References: <CAPcyv4iDra+mRqEejfGqapKEAFZmUtUcg0dsJ8nt7mOhcT-Qpw@mail.gmail.com>
 <20160908225636.GB15167@linux.intel.com>
 <20160912052703.GA1897@infradead.org>
 <CAOSf1CHaW=szD+YEjV6vcUG0KKr=aXv8RXomw9xAgknh_9NBFQ@mail.gmail.com>
 <20160912075128.GB21474@infradead.org>
 <20160912180507.533b3549@roar.ozlabs.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160912180507.533b3549@roar.ozlabs.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Mon, Sep 12, 2016 at 06:05:07PM +1000, Nicholas Piggin wrote:
> On Mon, 12 Sep 2016 00:51:28 -0700
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Mon, Sep 12, 2016 at 05:25:15PM +1000, Oliver O'Halloran wrote:
> > > What are the problems here? Is this a matter of existing filesystems
> > > being unable/unwilling to support this or is it just fundamentally
> > > broken?  
> > 
> > It's a fundamentally broken model.  See Dave's post that actually was
> > sent slightly earlier then mine for the list of required items, which
> > is fairly unrealistic.  You could probably try to architect a file
> > system for it, but I doubt it would gain much traction.
> 
> It's not fundamentally broken, it just doesn't fit well existing
> filesystems.
> 
> Dave's post of requirements is also wrong. A filesystem does not have
> to guarantee all that, it only has to guarantee that is the case for
> a given block after it has a mapping and page fault returns, other
> operations can be supported by invalidating mappings, etc.

Sure, but filesystems are completely unaware of what is mapped at
any given time, or what constraints that mapping might have. Trying
to make filesystems aware of per-page mapping constraints seems like
a fairly significant layering violation based on a flawed
assumption. i.e. that operations on other parts of the file do not
affect the block that requires immutable metadata.

e.g an extent operation in some other area of the file can cause a
tip-to-root extent tree split or merge, and that moves the metadata
that points to the mapped block that we've told userspace "doesn't
need fsync".  We now need an fsync to ensure that the metadata is
consistent on disk again, even though that block has not physically
been moved. IOWs, the immutable data block updates are now not
ordered correctly w.r.t. other updates done to the file, especially
when we consider crash recovery....

All this will expose is an unfixable problem with ordering of stable
data + metadata operations and their synchronisation. As such, it
seems like nothing but a major cluster-fuck to try to do mapping
specific, per-block immutable metadata - it adds major complexity
and even more untractable problems.

Yes, we /could/ try to solve this but, quite frankly, it's far
easier to change the broken PMEM programming model assumptions than
it is to implement what you are suggesting. Or to do what Christoph
suggested and just use a wrapper around something like device
mapper to hand out chunks of unchanging, static pmem to
applications...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>