From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Dan Williams <dan.j.williams@intel.com>,
	Andreas Dilger <adilger@dilger.ca>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Matthew Wilcox <willy@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	X86 ML <x86@kernel.org>, XFS Developers <xfs@oss.sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v2 03/11] pmem: enable REQ_FUA/REQ_FLUSH handling
Date: Mon, 16 Nov 2015 16:29:27 -0700	[thread overview]
Message-ID: <20151116232927.GA5582@linux.intel.com> (raw)
In-Reply-To: <20151116221412.GV19199@dastard>

On Tue, Nov 17, 2015 at 09:14:12AM +1100, Dave Chinner wrote:
> On Mon, Nov 16, 2015 at 03:05:26PM +0100, Jan Kara wrote:
> > On Mon 16-11-15 14:37:14, Jan Kara wrote:
> > > On Fri 13-11-15 18:32:40, Dan Williams wrote:
> > > > On Fri, Nov 13, 2015 at 4:43 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> > > > > On Nov 13, 2015, at 5:20 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >>
> > > > >> On Fri, Nov 13, 2015 at 4:06 PM, Ross Zwisler
> > > > >> <ross.zwisler@linux.intel.com> wrote:
> > > > >>> Currently the PMEM driver doesn't accept REQ_FLUSH or REQ_FUA bios.  These
> > > > >>> are sent down via blkdev_issue_flush() in response to a fsync() or msync()
> > > > >>> and are used by filesystems to order their metadata, among other things.
> > > > >>>
> > > > >>> When we get an msync() or fsync() it is the responsibility of the DAX code
> > > > >>> to flush all dirty pages to media.  The PMEM driver then just has to issue
> > > > >>> a wmb_pmem() in response to the REQ_FLUSH to ensure that all the flushed
> > > > >>> data has been durably stored on the media before we return.
> > > > >>>
> > > > >>> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > > >>
> > > > >> Hmm, I'm not seeing why we need this patch.  If the actual flushing of
> > > > >> the cache is done by the core, why does the driver need to support
> > > > >> REQ_FLUSH?  Especially since it's just a couple of instructions.  REQ_FUA
> > > > >> only makes sense if individual writes can bypass the "drive" cache,
> > > > >> but no I/O submitted to the driver proper is ever cached; we always
> > > > >> flush it through to media.
> > > > >
> > > > > If the upper level filesystem gets an error when submitting a flush
> > > > > request, then it assumes the underlying hardware is broken and cannot
> > > > > be as aggressive in IO submission, but instead has to wait for in-flight
> > > > > IO to complete.
> > > > 
> > > > Upper level filesystems won't get errors when the driver does not
> > > > support flush.  Those requests are ended cleanly in
> > > > generic_make_request_checks().  Yes, the fs still needs to wait for
> > > > outstanding I/O to complete but in the case of pmem all I/O is
> > > > synchronous.  There's never anything to await when flushing at the
> > > > pmem driver level.
> > > > 
> > > > > Since FUA/FLUSH is basically a no-op for pmem devices,
> > > > > it doesn't make sense _not_ to support this functionality.
> > > > 
> > > > Seems to be a nop either way.  Given that DAX may lead to dirty data
> > > > pending to the device in the cpu cache that a REQ_FLUSH request will
> > > > not touch, it's better to leave it all to the mm core to handle.  I.e.
> > > > it doesn't make sense to call the driver just for two instructions
> > > > (sfence + pcommit) when the mm core is taking on the cache flushing.
> > > > Either handle it all in the mm or the driver, not a mixture.
> > > 
> > > So I think REQ_FLUSH requests *must* end up doing sfence + pcommit because
> > > e.g. journal writes going through block layer or writes done through
> > > dax_do_io() must be on permanent storage once REQ_FLUSH request finishes
> > > and the way the driver does IO doesn't guarantee this, does it?
> > 
> > Hum, and looking into how dax_do_io() works and what drivers/nvdimm/pmem.c
> > does, I'm indeed wrong because they both do wmb_pmem() after each write
> > which seems to include sfence + pcommit. Sorry for the confusion.
> 
> Which I want to remove, because it makes DAX IO 3x slower than
> buffered IO on ramdisk based testing.
> 
> > But a question: Won't it be better to do sfence + pcommit only in response
> > to REQ_FLUSH request and don't do it after each write? I'm not sure how
> > expensive these instructions are but in theory it could be a performance
> > win, couldn't it? For filesystems this is enough wrt persistency
> > guarantees...
> 
> I'm pretty sure it would be, because all of the overhead (and
> therefore latency) I measured is in the cache flushing instructions.
> But before we can remove the wmb_pmem() from dax_do_io(), we need
> the underlying device to support REQ_FLUSH correctly...

By "support REQ_FLUSH correctly" do you mean call wmb_pmem() as I do in my
set?  Or do you mean something that also involves cache flushing such as the
"big hammer" that flushes everything or something like WBINVD?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

