From: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: "Kani, Toshimitsu" <toshi.kani-ZPxbGqLxI0U@public.gmane.org>
Cc: "axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org"
	<axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
	"sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
	<sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
	<snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"axboe-b10kYP2dOMg@public.gmane.org"
	<axboe-b10kYP2dOMg@public.gmane.org>,
	"linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org"
	<linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
	<dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org"
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	"agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
	<agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
Date: Wed, 22 Jun 2016 12:15:25 -0700
Message-ID: <CAPcyv4gqiNQ-FRqCV3WxBzjUBZNY6eZVA9ioc0q+Lm=oG8bWAg@mail.gmail.com>
In-Reply-To: <1466616868.3504.320.camel-ZPxbGqLxI0U@public.gmane.org>

On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote:
> On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote:
>> On Tue, Jun 21 2016 at 11:44am -0400,
>> Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote:
>> >
>> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
>> > >
>> > > On Mon, Jun 20 2016 at  6:22pm -0400,
>> > > Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>  :
>> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
>> > > rather than establish GENHD_FL_DAX on the genhd?
>> > >
>> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
>> > > check for a queue flag.
>> >
>> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request
>> > queue, except for protecting the underlying device from being disabled
>> > while direct_access() is called (b2e0d1625e19).
>>
>> The devices in question have a request_queue.  All bio-based devices have
>> a request_queue.
>
> DAX-capable devices have two operation modes, bio-based and DAX.  I agree that
> bio-based operation is associated with a request queue, and its capabilities
> should be set to it.  DAX, on the other hand, is rather independent from a
> request queue.
>
>> I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
>> that such block device capabilities are generally advertised in terms of
>> a QUEUE_FLAG.
>
> I do not have a strong opinion, but it feels a bit odd to associate DAX with a
> request queue.

Given that we do not support dax to a raw block device [1] it seems a
gendisk flag is more misleading than a request_queue flag that specifies
what requests can be made of the device.

[1]: acc93d30d7d4 Revert "block: enable dax for raw block devices"


>> > About protecting direct_access, this patch assumes that the underlying
>> > device cannot be disabled until dtr() is called.  Is this correct?  If
>> > not, I will need to call dax_map_atomic().
>>
>> One of the big design considerations for DM is that a DM device can be
>> suspended (with or without flush) and any new IO will be blocked until
>> the DM device is resumed.
>>
>> So ideally DM should be able to have the same capability even if using
>> DAX.
>
> Supporting suspend for DAX is challenging since it allows user applications to
> access a device directly.  Once a device range is mmap'd, there is no kernel
> intervention to access the range, unless we invalidate user mappings.  This
> isn't done today even after a driver is unbound from a device.
>
>> But that is different than what commit b2e0d1625e19 is addressing.  For
>> DM, I wouldn't think you'd need the extra protections that
>> dax_map_atomic() is providing given that the underlying block device
>> lifetime is managed via DM core's dm_get_device/dm_put_device (see also:
>> dm.c:open_table_device/close_table_device).
>
> I thought so as well.  But I realized that there is (almost) nothing that can
> prevent the unbind operation.  It cannot fail, either.  The unbind proceeds
> even when a device is in use.  In the case of a pmem device, it is only
> protected by pmem_release_queue(), which is called when a pmem device is being
> deleted and calls blk_cleanup_queue() to serialize a critical section between
> blk_queue_enter() and blk_queue_exit() per b2e0d1625e19.  This prevents a
> kernel DTLB fault, but does not prevent a device from disappearing while in use.
>
> Protecting DM's underlying device with blk_queue_enter() (or something
> similar) requires more thought...  blk_queue_enter() to a DM device cannot be
> redirected to its underlying device.  So, this is TBD for now.  But I do not
> think this is a blocking issue since unbinding an underlying device is
> quite harmful no matter what we do - even if it is protected with
> blk_queue_enter().

I still have the "block device removed" notification patches on my
todo list.  It's not a blocker, but there are scenarios where we can
keep accessing memory via dax of a disabled device leading to memory
corruption.  I'll bump that up in my queue now that we are looking at
additional scenarios where letting DAX mappings leak past the
reconfiguration of a block device could lead to trouble.
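
For readers following the blk_queue_enter()/blk_queue_exit() discussion
above, the enter/exit protection can be sketched roughly as follows. This
is a single-threaded user-space model in C, not kernel code: every model_*
name is hypothetical, and it only illustrates the idea that teardown has to
wait for accessors that entered before it began. It does not cover the case
raised in the thread where user mappings bypass the guard entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not the kernel structure. */
struct model_queue {
	bool dying;	/* set once teardown has begun */
	int users;	/* accessors currently inside the critical section */
};

/* Models the role of blk_queue_enter(): refuse new users once dying. */
static int model_enter(struct model_queue *q)
{
	if (q->dying)
		return -1;
	q->users++;
	return 0;
}

/* Models the role of blk_queue_exit(): leave the critical section. */
static void model_exit(struct model_queue *q)
{
	assert(q->users > 0);
	q->users--;
}

/*
 * Models teardown: block new entries, then report whether it is safe to
 * finish, i.e. whether no accessor is still inside.
 */
static bool model_try_cleanup(struct model_queue *q)
{
	q->dying = true;
	return q->users == 0;
}

/* A direct access that only touches memory while holding the guard. */
static int model_direct_access(struct model_queue *q, const char *mem,
			       char *out)
{
	if (model_enter(q))
		return -1;	/* device is going away; do not touch mem */
	*out = mem[0];
	model_exit(q);
	return 0;
}
```

In this model an access that starts after teardown begins is refused, and
teardown cannot complete while an earlier access is still in flight. A
mapping handed out to user space is outside both checks, which is the gap
the "block device removed" notification is meant to address.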
