All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences
Date: Sat, 6 Feb 2016 22:27:23 -0700	[thread overview]
Message-ID: <20160207052723.GA26420@linux.intel.com> (raw)
In-Reply-To: <20160206231512.GF31407@dastard>

On Sun, Feb 07, 2016 at 10:15:12AM +1100, Dave Chinner wrote:
> On Thu, Feb 04, 2016 at 10:15:58AM +0100, Jan Kara wrote:
> > On Wed 03-02-16 13:13:28, Ross Zwisler wrote:
> > > Here is the comment from Dave Chinner that had me move to having the calls to
> > > dax_writeback_mapping_range() into the generic mm code:
> > > 
> > >    > Lastly, this flushing really needs to be inside
> > >    > filemap_write_and_wait_range(), because we call the writeback code
> > >    > from many more places than just fsync to ensure ordering of various
> > >    > operations such that files are in known state before proceeding
> > >    > (e.g. hole punch).
> > > https://lkml.org/lkml/2015/11/16/847
> .....
> > > > So revisiting the decision I see two options:
> > > > 
> > > > 1) Move the DAX flushing code from filemap_write_and_wait() into
> > > > ->writepages() fs callback. There the filesystem can provide all the
> > > > information it needs including bdev, get_block callback, or whatever.
> > > 
> > > This seems fine as long as we add it to ->fsync as well since ->writepages is
> > > never called in that path, and as long as we are okay with skipping DAX
> > > writebacks on hole punch, truncate, and block relocation.
> > 
> > Look at ext4_sync_file() -> filemap_write_and_wait_range() ->
> > __filemap_fdatawrite_range() -> do_writepages(). Except those nrpages > 0
> > checks which would need to be changed.
> 
> Just to be clear: this is pretty much what I was implying was
> necessary when I said that the DAX flushing needed to be "inside
> filemap_write_and_wait_range".  And that's what I thought Ross was
> planning on doing after that round discussion.  i.e. Ross said:
> 
> "If the race described above isn't an issue then I agree moving this
> call out of the filesystems and down into the generic page writeback
> code is probably the right thing to do."
> 
> https://lkml.org/lkml/2015/11/17/718
> 
> Realistically, what Jan is saying in this thread is exactly what I
> said we needed to do way back when I first pointed out that fsync
> was broken and dirty tracking in the mapping radix tree was still
> needed for fsync to work effectively.

Here, let me try and quickly summarize what is going on.

1) The DAX fsync set was merged into v4.5-rc1, it does use the radix tree for
tracking dirty PTE and PMD pages, and we do currently call into the DAX sync
code via filemap_write_and_wait_range() as you initially suggested.

2) During testing of raw block devices + DAX I noticed that the struct
block_device that we were using for DAX operations was incorrect.  For the
fault handlers, etc. we can just get the correct bdev via get_block(), which
is passed in as a function pointer, but for the flushing code we don't have
access to get_block().  This is also an issue for XFS real-time devices,
whenever we get those working.

In short, somehow we need to get dax_writeback_mapping_range() a valid bdev.
Right now it is called via filemap_write_and_wait_range(), which can't provide
either the bdev nor a get_block() function pointer.  So, our options seem to
be:
  a) Move the calls to dax_writeback_mapping_range() into the filesystems
  (what Jan is suggesting, i.e. ->writepages())
  b) Keep the calls to dax_writeback_mapping_range() in the mm code, and
  provide a generic way to ask a filesystem for an inode's bdev.  I did a
  version of this using a superblock operation here:
  https://lkml.org/lkml/2016/2/2/941

3) During the review and discussion for the above problems, Jan noticed that
the flushing code wasn't being called for sync() and syncfs().  Clearly from
your other response (https://lkml.org/lkml/2016/2/6/168) you think this is
incorrect.  Regardless, the above issue remains -
dax_writeback_mapping_range() needs a bdev.  Do we move the calls into the
filesystem so the fs can provide a bdev, or do we we create a generic method
for DAX to ask the fs for the correct bdev for an inode?

  reply	other threads:[~2016-02-07  5:27 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-28 19:35 [PATCH 1/2] block: fix pfn_mkwrite() DAX fault handler Ross Zwisler
2016-01-28 19:35 ` Ross Zwisler
2016-01-28 19:35 ` [PATCH 2/2] dax: fix bdev NULL pointer dereferences Ross Zwisler
2016-01-28 19:35   ` Ross Zwisler
2016-01-28 20:21   ` Dan Williams
2016-01-28 20:21     ` Dan Williams
2016-01-28 21:38   ` Christoph Hellwig
2016-01-29 18:28     ` Ross Zwisler
2016-01-29 23:34       ` Ross Zwisler
2016-01-30  0:18         ` Dan Williams
2016-01-31 22:44         ` Dave Chinner
2016-01-30  5:28       ` Matthew Wilcox
2016-01-30  6:01         ` Dan Williams
2016-01-30  7:08           ` Jared Hulbert
2016-01-31  2:32           ` Matthew Wilcox
2016-01-31  6:12             ` Ross Zwisler
2016-01-31 10:55               ` Matthew Wilcox
2016-01-31 16:38                 ` Dan Williams
2016-01-31 18:07                   ` Matthew Wilcox
2016-01-31 18:18                     ` Dan Williams
2016-01-31 18:27                       ` Matthew Wilcox
2016-01-31 18:50                         ` Dan Williams
2016-01-31 19:51                     ` Dan Williams
2016-02-01 13:44             ` Matthew Wilcox
2016-02-01 14:51         ` Jan Kara
2016-02-01 20:49           ` Matthew Wilcox
2016-02-01 21:47           ` Dave Chinner
2016-02-02  6:06             ` Jared Hulbert
2016-02-02  6:46               ` Dan Williams
2016-02-02  8:05                 ` Jared Hulbert
2016-02-02 16:51                   ` Dan Williams
2016-02-02 21:46                     ` Jared Hulbert
2016-02-03  0:34                       ` Matthew Wilcox
2016-02-03  1:21                         ` Jared Hulbert
2016-02-02 11:17             ` Jan Kara
2016-02-02 16:33               ` Dan Williams
2016-02-02 16:46                 ` Jan Kara
2016-02-02 17:10                   ` Dan Williams
2016-02-02 17:34                     ` Ross Zwisler
2016-02-02 17:46                       ` Dan Williams
2016-02-02 17:47                         ` Dan Williams
2016-02-02 18:24                           ` Ross Zwisler
2016-02-02 18:46                         ` Matthew Wilcox
2016-02-02 18:59                           ` Dan Williams
2016-02-02 20:14                             ` Matthew Wilcox
2016-02-03 11:09                           ` Jan Kara
2016-02-03 10:46                       ` Jan Kara
2016-02-03 20:13                         ` Ross Zwisler
2016-02-04  9:15                           ` Jan Kara
2016-02-04 23:38                             ` Ross Zwisler
2016-02-06 23:15                             ` Dave Chinner
2016-02-07  5:27                               ` Ross Zwisler [this message]
2016-02-04 19:56                         ` Ross Zwisler
2016-02-04 20:29                           ` Jan Kara
2016-02-04 22:19                             ` Ross Zwisler
2016-02-05 22:25                             ` Ross Zwisler
2016-02-06 23:40                               ` Dave Chinner
2016-02-07  6:43                                 ` Ross Zwisler
2016-02-08 13:48                                   ` Jan Kara
2016-02-07  8:38                               ` Christoph Hellwig
2016-02-08 15:55                                 ` Ross Zwisler
2016-02-02 18:41               ` Ross Zwisler
2016-02-02 18:53                 ` Ross Zwisler
2016-02-02  0:02     ` Ross Zwisler
2016-02-02  7:10       ` Dave Chinner
2016-02-02 10:34       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160207052723.GA26420@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.