From: "Jan Beulich" <JBeulich@suse.com>
To: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	Juergen Gross <JGross@suse.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	andrew.cooper3@citrix.com,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Stefano Stabellini <stefano@aporeto.com>,
	David Vrabel <david.vrabel@citrix.com>,
	xen-devel@lists.xenproject.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
Date: Thu, 13 Oct 2016 03:08:03 -0600
Message-ID: <57FF6B130200007800116F96@prv-mh.provo.novell.com>
In-Reply-To: <20161013085344.ulju7pnnbvufc4em@hz-desktop>

>>> On 13.10.16 at 10:53, <haozhong.zhang@intel.com> wrote:
> On 10/13/16 02:34 -0600, Jan Beulich wrote:
>>>>> On 12.10.16 at 18:19, <dan.j.williams@intel.com> wrote:
>>> On Wed, Oct 12, 2016 at 9:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 12.10.16 at 17:42, <dan.j.williams@intel.com> wrote:
>>>>> On Wed, Oct 12, 2016 at 8:39 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 12.10.16 at 16:58, <haozhong.zhang@intel.com> wrote:
>>>>>>> On 10/12/16 05:32 -0600, Jan Beulich wrote:
>>>>>>>>>>> On 12.10.16 at 12:33, <haozhong.zhang@intel.com> wrote:
>>>>>>>>> The layout is shown as the following diagram.
>>>>>>>>>
>>>>>>>>> +---------------+-----------+-------+----------+--------------+
>>>>>>>>> | whatever used | Partition | Super | Reserved | /dev/pmem0p1 |
>>>>>>>>> |  by kernel    |   Table   | Block | for Xen  |              |
>>>>>>>>> +---------------+-----------+-------+----------+--------------+
>>>>>>>>>                 \_____________________ _______________________/
>>>>>>>>>                                   V
>>>>>>>>>                              /dev/pmem0
>>>>>>>>
>>>>>>>>I have to admit that I dislike this, for not being OS-agnostic.
>>>>>>>>Neither should there be any Xen-specific region, nor should the
>>>>>>>>"whatever used by kernel" one be restricted to just Linux. What
>>>>>>>>I could see is an OS-reserved area ahead of the partition table,
>>>>>>>>the exact usage of which depends on which OS is currently
>>>>>>>>running (and in the Xen case this might be both Xen _and_ the
>>>>>>>>Dom0 kernel, arbitrated by a tbd protocol). After all, when
>>>>>>>>running under Xen, the Dom0 may not have a need for as much
>>>>>>>>control data as it has when running on bare hardware, for it
>>>>>>>>controlling less (if any) of the actual memory ranges when Xen
>>>>>>>>is present.
>>>>>>>>
>>>>>>>
>>>>>>> Isn't this OS-reserved area still not OS-agnostic, as it requires OS
>>>>>>> to know where the reserved area is?  Or do you mean it's not if it's
>>>>>>> defined by a protocol that is accepted by all OSes?
>>>>>>
>>>>>> The latter - we clearly won't get away without some agreement on
>>>>>> where to retrieve position and size of this area. I was simply
>>>>>> assuming that such a protocol already exists.
>>>>>>
>>>>>
>>>>> No, we should not mix the struct page reservation that the Dom0 kernel
>>>>> may actively use with the Xen reservation that the Dom0 kernel does
>>>>> not consume.  Explain again what is wrong with the partition approach?
>>>>
>>>> Not sure what was unclear in my previous reply. I don't think there
>>>> should be a priori knowledge of whether Xen is (going to be) used on
>>>> a system, and even if it gets used, but just occasionally, it would
>>>> (apart from the abstract considerations already given) be a waste
>>>> of resources to set something aside that could be used for other
>>>> purposes while Xen is not running. Static partitioning should only be
>>>> needed for persistent data.
>>>
>>> The reservation needs to be persistent / static even if the data is
>>> volatile, as is the case with struct page, because we can't have the
>>> size of the device change depending on use.  So, from the aspect of
>>> wasting space while Xen is not in use, both partitions and the
>>> intrinsic reservation approach suffer the same problem. Setting that
>>> aside I don't want to mix 2 different use cases into the same
>>> reservation.
>>
>>Then you didn't understand what I've said: I certainly didn't mean
>>the reservation to vary from a device perspective. However, when
>>Xen is in use I don't see why part of that static reservation couldn't
>>be used by Xen, and another part by the Dom0 kernel. The kernel
>>obviously would need to ask the hypervisor how much of the space
>>is left, and where that area starts.
>>
> 
> I think Dan means that there should be a clear separation between
> reservations for different usages (kernel/xen/...). The libnvdimm
> driver is for the linux kernel and only needs to maintain the
> reservation for kernel functionality. For others including xen/dm/...,
> if they want reservation for their own purpose, they should maintain
> their own reservations out of libnvdimm driver and avoid bothering the
> libnvdimm driver (e.g. add specific handling in libnvdimm driver).
> 
> IIUC, one existing example is device-mapper device (dm) which needs to
> reserve on-device area for its own meta-data. Its choice is to store
> the meta-data on the block device (/dev/pmemN) provided by the
> libnvdimm driver.
> 
> I think we can do something similar for Xen: layer another pseudo
> device on /dev/pmem and make the reservation there, as in option 2 of
> my previous reply.

Well, my opinion certainly doesn't count much here, but I continue to
consider this a bad idea. For entities like drivers it may well be
appropriate, but I think there ought to be an independent concept
of "OS reserved", and in the Xen case this could then be shared
between hypervisor and Dom0 kernel. Or if we were to consider Dom0
"just a guest", things should even be the other way around: Xen gets
all of the OS reserved space, and Dom0 needs something custom.
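
To make the sharing idea a bit more concrete, here is a toy model, a
sketch only and not a proposal for an actual interface. It assumes one
statically sized OS-reserved area whose size never changes from the
device's point of view, and a hypothetical policy where Xen, when
present, takes a prefix of that area and the Dom0 kernel is told where
the remainder starts and how large it is. Region, kernel_share, and the
64 MiB / 48 MiB figures are all made up for illustration; nothing like
this exists in Xen or libnvdimm today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A byte range [start, start + size) on the pmem device."""
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size


def kernel_share(os_reserved: Region, xen_claim: Optional[int]) -> Region:
    """Return the part of the OS-reserved area left for the kernel.

    xen_claim is None on bare hardware. Otherwise it is the number of
    bytes Xen takes from the front of the area (a hypothetical policy;
    the real split would be arbitrated by a protocol still to be defined).
    """
    if xen_claim is None:
        # No hypervisor: the kernel may use the whole reservation.
        return os_reserved
    if not 0 <= xen_claim <= os_reserved.size:
        raise ValueError("Xen claim does not fit in the OS-reserved area")
    # The reservation itself is unchanged; only its internal split moves.
    return Region(os_reserved.start + xen_claim, os_reserved.size - xen_claim)


MiB = 1 << 20
reserved = Region(start=0, size=64 * MiB)  # illustrative size only

bare_metal = kernel_share(reserved, None)               # kernel gets it all
under_xen = kernel_share(reserved, 48 * MiB)            # shared with Xen
dom0_as_guest = kernel_share(reserved, reserved.size)   # Xen takes everything
```

The last case corresponds to treating Dom0 as "just a guest": the
remainder is empty, so Dom0 would have to find its space elsewhere.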

Jan