From: Jared Hulbert <jaredeh@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences
Date: Tue, 2 Feb 2016 13:46:06 -0800
Message-ID: <CA+ZsKJ4rrgQNnnrdvmnTP2GcrZna83+yUV_GFBhEQ6HDKqd7HA@mail.gmail.com>
In-Reply-To: <CAPcyv4iM9NGO3SbYqEg4HfT6f-ant-debNHpAaBDvXj+-hAjRQ@mail.gmail.com>

On Tue, Feb 2, 2016 at 8:51 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Feb 2, 2016 at 12:05 AM, Jared Hulbert <jaredeh@gmail.com> wrote:
> [..]
>> Well... CONFIG_BLOCK was not required with filemap_xip.c for a
>> decade.  This CONFIG_BLOCK dependency is a result of an incremental
>> feature from a certain point of view ;)
>>
>> The obvious 'driver' is physical RAM without a particular driver.
>> Remember please I'm talking about embedded.  RAM measured in MiB and
>> funky one-off hardware etc.  In the embedded world there are lots of
>> ways that persistent memory has been supported in device specific ways
>> without the new fancypants NFIT and Intel instructions, so frankly
>> they don't fit in the PMEM stuff.  Maybe they could be supported in
>> PMEM but not without effort to bring embedded players to the table.
>
> Not sure what you're trying to say here.  An ACPI NFIT only feeds the
> generic libnvdimm device model.  You don't need NFIT to get pmem.

Right... I'm just not seeing how the libnvdimm device model fits, is
relevant, or is useful to a persistent SRAM in embedded.  Therefore I
don't expect some of these users to have a driver at all.

>> The other drivers are the MTD drivers, probably as read-only for now.
>> But the paradigm there isn't so different from what PMEM looks like
>> with asymmetric read/write capabilities.
>>
>> The filesystem I'm concerned with is AXFS
>> (https://www.kernel.org/doc/ols/2008/ols2008v1-pages-211-218.pdf).
>> Which I've been planning on trying to merge again due to a recent
>> resurgence of interest.  The device model for AXFS is... weird.  It
>> can use one or two devices at a time of any mix of NOR MTD, NAND MTD,
>> block, and unmanaged physical memory.  It's a terribly useful model
>> for embedded.  Anyway AXFS is readonly so hacking in a read only
>> dax_fault_nodev() and dax_file_read() would work fine, looks easy
>> enough.  But... it would be cool if similar small embedded focused RW
>> filesystems were enabled.
>
> Are those also out of tree?

Of course.  Merging embedded filesystems is like merging regular
filesystems except 98% of you reviewers don't want it merged.

>> I don't expect you to taint DAX with design requirements for this
>> stuff that it wasn't built for, nobody ends up happy in that case.
>> However, if enabling the filesystem to manage the bdev_direct_access()
>> interactions solves some of the "alternate device" problems you are
>> discussing here, then there is a chance we can accommodate both.
>> Sometimes that works.
>>
>> So... Forget CONFIG_BLOCK=n entirely I didn't want that to be the
>> focus anyway.  Does it help to support the weirder XFS and btrfs
>> device models to enable the filesystem to handle the
>> bdev_direct_access() stuff?
>
> It's not clear that it does.  We just clarified with xfs and ext4 that
> we can rely on get_blocks().  That solves the immediate concern with
> multi-device filesystems.

IMO you're making DAX more complex by overly coupling to the bdev and
I think it could bite you later.  I submit this rework of the radix
tree and confusion about where to get the real bdev as evidence.  I'm
guessing that it won't be the last time.  It's unnecessary to couple
it like this, and in fact is not how the vfs has been layered in the
past.

The trouble with vfs work has been that it straddles the line between
mm and block.  Unfortunately that line is a dark chasm with ill-defined
boundaries.  DAX is even more exciting because it's trying to duct
tape the filesystem even closer to the mm system; one could argue it's
actually in some respects enabling the filesystem to bypass the mm
code.  On top of that DAX is designed to enable block-based
filesystems to use RAM-like devices.

Bolting the block device interface on to NVDIMM is a brilliant hack
and the right design choice, but it's still a hack.  The upside is it
enables the reuse of all this glorious legacy filesystem code which
does a pretty amazing job of handling what the pmem device
applications need considering they were designed to manage data on
platters of slow spinning rust.  What would DAX look like if it were
developed with a filesystem purpose-built for pmem?

To look at the downside consider dax_fault().  It's called on a
fault to a user memory map, and it uses the filesystem's get_block() to
look up a sector so you can ask a block device to convert it to an
address on a DIMM.  Come on, that's awkward.  Everything around
dax_fault() is dripping with memory semantic interfaces: the
dax_fault() calls are fundamentally about memory, the pmem calls are
memory, the hardware is memory, and yet it directly calls
bdev_direct_access().  It's out of place.
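
To make the detour concrete, here is a toy user-space C model of that
path.  It is only a sketch: every toy_* name is made up for this
example, the block-to-sector mapping is arbitrary, and the real
dax_fault(), get_block() and bdev_direct_access() have different
signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user-space model, not kernel code: every toy_* name is hypothetical. */
#define TOY_SECTOR_SIZE 512u
#define TOY_BLOCK_SIZE  4096u

/* Stands in for the memory on the DIMM. */
static unsigned char toy_pmem[16 * TOY_BLOCK_SIZE];

struct toy_bdev {
	unsigned char *base;	/* start of the memory behind the "device" */
	size_t size;		/* size of that memory in bytes */
};

/* Filesystem step: map a file block to a device sector.
 * The mapping (file block N lives in device block N + 1) is arbitrary. */
static uint64_t toy_get_block(uint64_t file_block)
{
	return (file_block + 1) * (TOY_BLOCK_SIZE / TOY_SECTOR_SIZE);
}

/* Block-device step: convert the sector back into a memory address. */
static void *toy_bdev_direct_access(struct toy_bdev *bdev, uint64_t sector)
{
	uint64_t off = sector * TOY_SECTOR_SIZE;

	if (off >= bdev->size)
		return NULL;
	return bdev->base + off;
}

/* Fault step: a memory question goes in and a memory answer comes out,
 * but the path in between is expressed in sectors. */
static void *toy_dax_fault(struct toy_bdev *bdev, uint64_t file_offset)
{
	uint64_t sector;

	assert(bdev != NULL);
	sector = toy_get_block(file_offset / TOY_BLOCK_SIZE);
	return toy_bdev_direct_access(bdev, sector);
}
```

Note that the fault handler itself has to know about the bdev and about
sectors, even though neither its caller nor its result cares about
either.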

The legacy vfs/mm code didn't have this layering problem either.  Even
filemap_fault(), which dax_fault() is modeled after, doesn't call any
bdev methods directly; when it needs something it asks the filesystem
with a ->readpage().  The precedent is that you ask the filesystem
for what you need.  Look at the get_bdev() thing you've concluded you
need.  It _almost_ makes my point.  I just happen to be of the opinion
that you don't actually want or need the bdev, you want the pfn/kaddr
so you can flush or map or memcpy().
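
As a rough illustration of that layering, and nothing more, here is a
user-space C sketch in which the fault path asks the filesystem for a
kaddr through an operations table and never sees a bdev.  All sk_*
names, including the direct_access op, are hypothetical and do not
exist in the kernel; a real filesystem would consult its own block
mapping rather than the flat layout assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space sketch; none of these sk_* names are kernel APIs. */
#define SK_PAGE_SIZE 4096u

struct sk_fs;

struct sk_fs_ops {
	/* The filesystem answers the memory question itself. */
	void *(*direct_access)(struct sk_fs *fs, uint64_t file_offset);
};

struct sk_fs {
	const struct sk_fs_ops *ops;
	unsigned char *mem;	/* memory backing the file data */
	size_t size;		/* size of that memory in bytes */
};

/* A trivial filesystem whose file data sits contiguously in memory.
 * How it finds the page (bdev, MTD, raw RAM) is its own business. */
static void *sk_ramfs_direct_access(struct sk_fs *fs, uint64_t file_offset)
{
	uint64_t off = file_offset & ~(uint64_t)(SK_PAGE_SIZE - 1);

	if (off >= fs->size)
		return NULL;
	return fs->mem + off;
}

static const struct sk_fs_ops sk_ramfs_ops = {
	.direct_access = sk_ramfs_direct_access,
};

/* Fault path: ask the filesystem for the page address, nothing else. */
static void *sk_fault(struct sk_fs *fs, uint64_t file_offset)
{
	assert(fs != NULL && fs->ops != NULL);
	return fs->ops->direct_access(fs, file_offset);
}
```

With this shape a filesystem backed by a block device, an MTD, or
unmanaged physical memory could each supply its own direct_access
without the fault code changing.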
