All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jared Hulbert <jaredeh@gmail.com>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences
Date: Tue, 2 Feb 2016 17:21:55 -0800	[thread overview]
Message-ID: <CA+ZsKJ5MXN7_rAp5y_HYor9zrrAG6ZW6Grix4fou5XDegup_gg@mail.gmail.com> (raw)
In-Reply-To: <20160203003416.GD3260@linux.intel.com>

On Tue, Feb 2, 2016 at 4:34 PM, Matthew Wilcox <willy@linux.intel.com> wrote:
> On Tue, Feb 02, 2016 at 01:46:06PM -0800, Jared Hulbert wrote:
>> On Tue, Feb 2, 2016 at 8:51 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> >> The filesystem I'm concerned with is AXFS
>> >> (https://www.kernel.org/doc/ols/2008/ols2008v1-pages-211-218.pdf).
>> >> Which I've been planning on trying to merge again due to a recent
>> >> resurgence of interest.  The device model for AXFS is... weird.  It
>> >> can use one or two devices at a time of any mix of NOR MTD, NAND MTD,
>> >> block, and unmanaged physical memory.  It's a terribly useful model
>> >> for embedded.  Anyway AXFS is readonly so hacking in a read only
>> >> dax_fault_nodev() and dax_file_read() would work fine, looks easy
>> >> enough.  But... it would be cool if similar small embedded focused RW
>> >> filesystems were enabled.
>> >
>> > Are those also out of tree?
>>
>> Of course.  Merging embedded filesystems is little merging regular
>> filesystems except 98% of you reviewers don't want it merged.
>
> You should at least be able to get it into staging these days.  I mean,
> look at some of the junk that's in staging ... and I don't think AXFS was
> nearly as bad.

Thanks....? ;)

>> IMO you're making DAX more complex by overly coupling to the bdev and
>> I think it could bite you later.  I submit this rework of the radix
>> tree and confusion about where to get the real bdev as evidence.  I'm
>> guessing that it won't be the last time.  It's unnecessary to couple
>> it like this, and in fact is not how the vfs has been layered in the
>> past.
>
> Huh?  The rework to use the radix tree for PFNs was done with one eye
> firmly on your usage case.  Just because I had to thread the get_block
> interface through it for the moment doesn't mean that I didn't have
> the "how do we get rid of get_block entirely" question on my mind.

Oh yeah.  I think we're on the same page.  But I'm not sure Dan is.  I
get the need to phase this in too.

> Using get_block seemed like the right idea three years ago.  I didn't
> know just how fundamentally ext4 and XFS disagree on how it should be
> used.

Sure.  I can see that.

>> To look at the the downside consider dax_fault().  Its called on a
>> fault to a user memory map, uses the filesystems get_block() to lookup
>> a sector so you can ask a block device to convert it to an address on
>> a DIMM.  Come on, that's awkward.  Everything around dax_fault() is
>> dripping with memory semantic interfaces, the dax_fault() call are
>> fundamentally about memory, the pmem calls are memory, the hardware is
>> memory, and yet it directly calls bdev_direct_access().  It's out of
>> place.
>
> What was out of place was the old 'get_xip_mem' in address_space
> operations.  Returning a kernel virtual address and a PFN from a
> filesystem operation?  That looks awful.

Yes.  Yes it does!  But at least my big hack was just one line. ;)
Nobody really even seemed to notice at the time.

>  All the other operations deal
> in struct pages, file offsets and occasionally sectors.  Of course, we
> don't have a struct page, so a pfn makes sense, but the kernel virtual
> address being returned was a gargantuan layering problem.

Well yes, but it was an expedient hack.

>> The legacy vfs/mm code didn't have this layering problem either.  Even
>> filemap_fault() that dax_fault() is modeled after doesn't call any
>> bdev methods directly, when it needs something it asks the filesystem
>> with a ->readpage().  The precedence is that you ask the filesystem
>> for what you need.  Look at the get_bdev() thing you've concluded you
>> need.  It _almost_ makes my point.  I just happen to be of the opinion
>> that you don't actually want or need the bdev, you want the pfn/kaddr
>> so you can flush or map or memcpy().
>
> You want the pfn.  The device driver doesn't have enough information to
> give you a (coherent with userspace) kaddr.  That's what (some future
> arch-specific implementation of) dax_map_pfn() is for.  That's why it
> takes 'index' as a parameter, so you can calculate where it'll be mapped
> in userspace, and determine an appropriate kernel virtual address to
> use for it.

Oh.... I think I'm just beginning to catch your vision for
dax_map_pfn().  I still don't get why we can't just do semi-arch
specific flushing instead of the alignment thing.  But that just might
be epic ignorance on my part.  Either way flush or magic alignments
dax_(un)map_pfn() would handle it, right?

  reply	other threads:[~2016-02-03  1:21 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-28 19:35 [PATCH 1/2] block: fix pfn_mkwrite() DAX fault handler Ross Zwisler
2016-01-28 19:35 ` Ross Zwisler
2016-01-28 19:35 ` [PATCH 2/2] dax: fix bdev NULL pointer dereferences Ross Zwisler
2016-01-28 19:35   ` Ross Zwisler
2016-01-28 20:21   ` Dan Williams
2016-01-28 20:21     ` Dan Williams
2016-01-28 21:38   ` Christoph Hellwig
2016-01-29 18:28     ` Ross Zwisler
2016-01-29 23:34       ` Ross Zwisler
2016-01-30  0:18         ` Dan Williams
2016-01-31 22:44         ` Dave Chinner
2016-01-30  5:28       ` Matthew Wilcox
2016-01-30  6:01         ` Dan Williams
2016-01-30  7:08           ` Jared Hulbert
2016-01-31  2:32           ` Matthew Wilcox
2016-01-31  6:12             ` Ross Zwisler
2016-01-31 10:55               ` Matthew Wilcox
2016-01-31 16:38                 ` Dan Williams
2016-01-31 18:07                   ` Matthew Wilcox
2016-01-31 18:18                     ` Dan Williams
2016-01-31 18:27                       ` Matthew Wilcox
2016-01-31 18:50                         ` Dan Williams
2016-01-31 19:51                     ` Dan Williams
2016-02-01 13:44             ` Matthew Wilcox
2016-02-01 14:51         ` Jan Kara
2016-02-01 20:49           ` Matthew Wilcox
2016-02-01 21:47           ` Dave Chinner
2016-02-02  6:06             ` Jared Hulbert
2016-02-02  6:46               ` Dan Williams
2016-02-02  8:05                 ` Jared Hulbert
2016-02-02 16:51                   ` Dan Williams
2016-02-02 21:46                     ` Jared Hulbert
2016-02-03  0:34                       ` Matthew Wilcox
2016-02-03  1:21                         ` Jared Hulbert [this message]
2016-02-02 11:17             ` Jan Kara
2016-02-02 16:33               ` Dan Williams
2016-02-02 16:46                 ` Jan Kara
2016-02-02 17:10                   ` Dan Williams
2016-02-02 17:34                     ` Ross Zwisler
2016-02-02 17:46                       ` Dan Williams
2016-02-02 17:47                         ` Dan Williams
2016-02-02 18:24                           ` Ross Zwisler
2016-02-02 18:46                         ` Matthew Wilcox
2016-02-02 18:59                           ` Dan Williams
2016-02-02 20:14                             ` Matthew Wilcox
2016-02-03 11:09                           ` Jan Kara
2016-02-03 10:46                       ` Jan Kara
2016-02-03 20:13                         ` Ross Zwisler
2016-02-04  9:15                           ` Jan Kara
2016-02-04 23:38                             ` Ross Zwisler
2016-02-06 23:15                             ` Dave Chinner
2016-02-07  5:27                               ` Ross Zwisler
2016-02-04 19:56                         ` Ross Zwisler
2016-02-04 20:29                           ` Jan Kara
2016-02-04 22:19                             ` Ross Zwisler
2016-02-05 22:25                             ` Ross Zwisler
2016-02-06 23:40                               ` Dave Chinner
2016-02-07  6:43                                 ` Ross Zwisler
2016-02-08 13:48                                   ` Jan Kara
2016-02-07  8:38                               ` Christoph Hellwig
2016-02-08 15:55                                 ` Ross Zwisler
2016-02-02 18:41               ` Ross Zwisler
2016-02-02 18:53                 ` Ross Zwisler
2016-02-02  0:02     ` Ross Zwisler
2016-02-02  7:10       ` Dave Chinner
2016-02-02 10:34       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+ZsKJ5MXN7_rAp5y_HYor9zrrAG6ZW6Grix4fou5XDegup_gg@mail.gmail.com \
    --to=jaredeh@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.