Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
WARNING: multiple messages refer to this Message-ID
From: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	KVM list <kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	David Hildenbrand <david-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-nvdimm
	<linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org>,
	Jason Wang <jasowang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Qemu Developers
	<qemu-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org>,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	adilger kernel
	<adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>,
	Ross Zwisler <zwisler-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Eric Blake <eblake-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	darrick wong
	<darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	"Michael S. Tsirkin"
	<mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Matthew Wilcox <willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Linux ACPI <linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-ext4 <linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Rik van Riel <riel-ebMLmSuQjDVBDgjK7y7TUQ@public.gmane.org>,
	Stefan Hajnoczi
	<stefanha-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Igor Mammedov <imammedo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	lcapitulino-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Kevin Wolf <kwolf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Nitesh Narayan Lal
	<nilal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
	xiaoguangrong eric
	<xiaoguangrong.eric-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Subject: Re: [PATCH v3 0/5] kvm "virtio pmem" device
Date: Tue, 15 Jan 2019 00:35:06 -0500 (EST)
Message-ID: <1684638419.64320214.1547530506805.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20190114222132.GH4205@dastard>


> > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote:
> > > >
> > > > > > Until you have images (and hence host page cache) shared between
> > > > > > multiple guests. People will want to do this, because it means they
> > > > > > only need a single set of pages in host memory for executable
> > > > > > binaries rather than a set of pages per guest. Then you have
> > > > > > multiple guests being able to detect residency of the same set of
> > > > > > pages. If the guests can then, in any way, control eviction of the
> > > > > > pages from the host cache, then we have a guest-to-guest
> > > > > > information
> > > > > > leak channel.
> > > > >
> > > > > I don't think we should ever be considering something that would
> > > > > allow a
> > > > > guest to evict page's from the host's pagecache [1].  The guest
> > > > > should
> > > > > be able to kick its own references to the host's pagecache out of its
> > > > > own pagecache, but not be able to influence whether the host or
> > > > > another
> > > > > guest has a read-only mapping cached.
> > > > >
> > > > > [1] Unless the guest is allowed to modify the host's file; obviously
> > > > > truncation, holepunching, etc are going to evict pages from the
> > > > > host's
> > > > > page cache.
> > > >
> > > > This is so correct. Guest does not not evict host page cache pages
> > > > directly.
> > >
> > > They don't right now.
> > >
> > > But someone is going to end up asking for discard to work so that
> > > the guest can free unused space in the underlying spares image (i.e.
> > > make use of fstrim or mount -o discard) because they have workloads
> > > that have bursts of space usage and they need to trim the image
> > > files afterwards to keep their overall space usage under control.
> > >
> > > And then....
> > 
> > ...we reject / push back on that patch citing the above concern.
> 
> So at what point do we draw the line?
> 
> We're allowing writable DAX mappings, but as I've pointed out that
> means we are going to be allowing  a potential information leak via
> files with shared extents to be directly mapped and written to.
> 
> But we won't allow useful admin operations that allow better
> management of host side storage space similar to how normal image
> files are used by guests because it's an information leak vector?

First of all Thank you for all the useful discussions. 
I am summarizing here:

- We have to live with the limitation to not support fstrim and 
  mount -o discard options with virtio-pmem as they will evict 
  host page cache pages. We cannot allow this for virtio-pmem
  for security reasons. These filesystem commands will just zero out 
  unused pages currently.

- If alot of space is unused and not freed guest can request host 
  Administrator for truncating the host backing image. 
  We are also planning to support qcow2 sparse image format at 
  host side with virtio-pmem.

- There is no existing solution for Qemu persistent memory 
  emulation with write support currently. This solution provides 
  us the paravartualized way of emulating persistent memory. It 
  does not emulate of ACPI structures instead it just uses VIRTIO 
  for communication between guest & host. It is fast because of its
  asynchronous nature and it works well. This makes use of at guest 
  side libnvdimm API's 
  
- If disk size freeing problem with guest files trim truncate is 
  very important for users, they can still use real hardware which 
  will provide them both (advance disk features & page cache by pass).

Considering all the above reasons I think this feature is useful
from virtualization point of view. As Dave rightly said we should
be careful and I think now we are careful with the security implications
of this device. 

Thanks again for all the inputs.

Best regards,
Pankaj  


> 
> That's splitting some really fine hairs there...
> 
> > > > In case of virtio-pmem & DAX, guest clears guest page cache exceptional
> > > > entries.
> > > > Its solely decision of host to take action on the host page cache
> > > > pages.
> > > >
> > > > In case of virtio-pmem, guest does not modify host file directly i.e
> > > > don't
> > > > perform hole punch & truncation operation directly on host file.
> > >
> > > ... this will no longer be true, and the nuclear landmine in this
> > > driver interface will have been armed....
> > 
> > I agree with the need to be careful when / if explicit cache control
> > is added, but that's not the case today.
> 
> "if"?
> 
> I expect it to be "when", not if. Expect the worst, plan for it now.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
> 

From: Pankaj Gupta <pagupta@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	Qemu Developers <qemu-devel@nongnu.org>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>, Jan Kara <jack@suse.cz>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	Nitesh Narayan Lal <nilal@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ross Zwisler <zwisler@kernel.org>,
	vishal l verma <vishal.l.verma@intel.com>,
	dave jiang <dave.jiang@intel.com>,
	David Hildenbrand <david@redhat.com>, jmoyer <jmoyer@redhat.com>,
	xiaoguangrong eric <xiaoguangrong.eric@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	lcapitulino@redhat.com, Igor Mammedov <imammedo@redhat.com>,
	Eric Blake <eblake@redhat.com>, Theodore Ts'o <tytso@mit.edu>,
	adilger kernel <adilger.kernel@dilger.ca>,
	darrick wong <darrick.wong@oracle.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH v3 0/5] kvm "virtio pmem" device
Date: Tue, 15 Jan 2019 00:35:06 -0500 (EST)
Message-ID: <1684638419.64320214.1547530506805.JavaMail.zimbra@redhat.com> (raw)
Message-ID: <20190115053506.NS86bdCRruk7wDWavjaU12ASKlGw05yltcLt7vgoql4@z> (raw)
In-Reply-To: <20190114222132.GH4205@dastard>


> > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote:
> > > >
> > > > > > Until you have images (and hence host page cache) shared between
> > > > > > multiple guests. People will want to do this, because it means they
> > > > > > only need a single set of pages in host memory for executable
> > > > > > binaries rather than a set of pages per guest. Then you have
> > > > > > multiple guests being able to detect residency of the same set of
> > > > > > pages. If the guests can then, in any way, control eviction of the
> > > > > > pages from the host cache, then we have a guest-to-guest
> > > > > > information
> > > > > > leak channel.
> > > > >
> > > > > I don't think we should ever be considering something that would
> > > > > allow a
> > > > > guest to evict page's from the host's pagecache [1].  The guest
> > > > > should
> > > > > be able to kick its own references to the host's pagecache out of its
> > > > > own pagecache, but not be able to influence whether the host or
> > > > > another
> > > > > guest has a read-only mapping cached.
> > > > >
> > > > > [1] Unless the guest is allowed to modify the host's file; obviously
> > > > > truncation, holepunching, etc are going to evict pages from the
> > > > > host's
> > > > > page cache.
> > > >
> > > > This is so correct. Guest does not not evict host page cache pages
> > > > directly.
> > >
> > > They don't right now.
> > >
> > > But someone is going to end up asking for discard to work so that
> > > the guest can free unused space in the underlying spares image (i.e.
> > > make use of fstrim or mount -o discard) because they have workloads
> > > that have bursts of space usage and they need to trim the image
> > > files afterwards to keep their overall space usage under control.
> > >
> > > And then....
> > 
> > ...we reject / push back on that patch citing the above concern.
> 
> So at what point do we draw the line?
> 
> We're allowing writable DAX mappings, but as I've pointed out that
> means we are going to be allowing  a potential information leak via
> files with shared extents to be directly mapped and written to.
> 
> But we won't allow useful admin operations that allow better
> management of host side storage space similar to how normal image
> files are used by guests because it's an information leak vector?

First of all Thank you for all the useful discussions. 
I am summarizing here:

- We have to live with the limitation to not support fstrim and 
  mount -o discard options with virtio-pmem as they will evict 
  host page cache pages. We cannot allow this for virtio-pmem
  for security reasons. These filesystem commands will just zero out 
  unused pages currently.

- If alot of space is unused and not freed guest can request host 
  Administrator for truncating the host backing image. 
  We are also planning to support qcow2 sparse image format at 
  host side with virtio-pmem.

- There is no existing solution for Qemu persistent memory 
  emulation with write support currently. This solution provides 
  us the paravartualized way of emulating persistent memory. It 
  does not emulate of ACPI structures instead it just uses VIRTIO 
  for communication between guest & host. It is fast because of its
  asynchronous nature and it works well. This makes use of at guest 
  side libnvdimm API's 
  
- If disk size freeing problem with guest files trim truncate is 
  very important for users, they can still use real hardware which 
  will provide them both (advance disk features & page cache by pass).

Considering all the above reasons I think this feature is useful
from virtualization point of view. As Dave rightly said we should
be careful and I think now we are careful with the security implications
of this device. 

Thanks again for all the inputs.

Best regards,
Pankaj  


> 
> That's splitting some really fine hairs there...
> 
> > > > In case of virtio-pmem & DAX, guest clears guest page cache exceptional
> > > > entries.
> > > > Its solely decision of host to take action on the host page cache
> > > > pages.
> > > >
> > > > In case of virtio-pmem, guest does not modify host file directly i.e
> > > > don't
> > > > perform hole punch & truncation operation directly on host file.
> > >
> > > ... this will no longer be true, and the nuclear landmine in this
> > > driver interface will have been armed....
> > 
> > I agree with the need to be careful when / if explicit cache control
> > is added, but that's not the case today.
> 
> "if"?
> 
> I expect it to be "when", not if. Expect the worst, plan for it now.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> 

  parent reply index

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-09 14:47 Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 1/5] libnvdimm: nd_region flush callback support Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 2/5] virtio-pmem: Add virtio pmem driver Pankaj Gupta
2019-01-14 15:54   ` Michael S. Tsirkin
2019-01-14 15:54     ` Michael S. Tsirkin
     [not found]     ` <20190114105314-mutt-send-email-mst-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2019-01-15  6:33       ` Pankaj Gupta
2019-01-15  6:33         ` Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 4/5] ext4: disable map_sync for virtio pmem Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 5/5] xfs: " Pankaj Gupta
2019-01-09 14:47   ` Pankaj Gupta
2019-01-09 16:26   ` Darrick J. Wong
2019-01-09 18:08     ` Pankaj Gupta
     [not found] ` <20190109144736.17452-1-pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-10  1:26   ` [PATCH v3 0/5] kvm "virtio pmem" device Dave Chinner
2019-01-10  1:26     ` Dave Chinner
2019-01-10  2:40     ` Rik van Riel
2019-01-10  2:40       ` Rik van Riel
2019-01-10 10:17     ` Jan Kara
2019-01-10 10:17       ` Jan Kara
     [not found]       ` <20190110101757.GC15790-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2019-01-13  1:38         ` Pankaj Gupta
2019-01-13  1:38           ` Pankaj Gupta
     [not found]           ` <1354249849.63357171.1547343519970.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-13  1:43             ` Dan Williams
2019-01-13  1:43               ` Dan Williams
     [not found]               ` <CAPcyv4hwcgTUpgNCefCGu4DvgkYBp5b=f+hJ+FC=s5APYKoycg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-13  2:17                 ` [Qemu-devel] " Pankaj Gupta
2019-01-13  2:17                   ` Pankaj Gupta
     [not found]                   ` <540171952.63371441.1547345866585.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-14  9:55                     ` Jan Kara
2019-01-14  9:55                       ` Jan Kara
     [not found]                       ` <20190114095520.GC13316-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2019-01-14 10:16                         ` Pankaj Gupta
2019-01-14 10:16                           ` Pankaj Gupta
2019-01-11  7:45     ` Pankaj Gupta
2019-01-11  7:45       ` Pankaj Gupta
     [not found]       ` <1326478078.61913951.1547192704870.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-13 23:29         ` Dave Chinner
2019-01-13 23:29           ` Dave Chinner
2019-01-13 23:38           ` Matthew Wilcox
2019-01-13 23:38             ` Matthew Wilcox
     [not found]             ` <20190113233820.GX6310-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2019-01-14  2:50               ` Dave Chinner
2019-01-14  2:50                 ` Dave Chinner
2019-01-14  7:15               ` Pankaj Gupta
2019-01-14  7:15                 ` Pankaj Gupta
     [not found]                 ` <942065073.64011540.1547450140670.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-14 21:25                   ` Dave Chinner
2019-01-14 21:25                     ` Dave Chinner
2019-01-14 21:35                     ` Dan Williams
2019-01-14 21:35                       ` Dan Williams
     [not found]                       ` <CAPcyv4jtPcLV-s0sKNHwwk0ug7GLBV6699dpm1h3r2xSo879dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-14 22:21                         ` Dave Chinner
2019-01-14 22:21                           ` Dave Chinner
2019-01-15  2:19                           ` Michael S. Tsirkin
2019-01-15  2:19                             ` Michael S. Tsirkin
     [not found]                             ` <20190114205031-mutt-send-email-mst-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2019-01-15  5:37                               ` [Qemu-devel] " Pankaj Gupta
2019-01-15  5:37                                 ` Pankaj Gupta
2019-01-15  5:35                           ` Pankaj Gupta [this message]
2019-01-15  5:35                             ` Pankaj Gupta
     [not found]                             ` <1684638419.64320214.1547530506805.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-15 20:42                               ` Dave Chinner
2019-01-15 20:42                                 ` Dave Chinner
2019-02-04 22:56 ` security implications of caching with virtio pmem (was Re: [PATCH v3 0/5] kvm "virtio pmem" device) Michael S. Tsirkin
2019-02-05  7:29   ` [Qemu-devel] " Pankaj Gupta
2019-02-06 14:00   ` David Hildenbrand
2019-02-06 18:01     ` Michael S. Tsirkin
2019-02-11  7:29   ` [Qemu-devel] " Pankaj Gupta
2019-02-11 22:29     ` Dave Chinner
2019-02-11 22:58       ` David Hildenbrand
2019-02-11 23:07         ` Michael S. Tsirkin
  -- strict thread matches above, loose matches on Subject: below --
2019-01-09 13:50 [PATCH v3 0/5] kvm "virtio pmem" device Pankaj Gupta

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1684638419.64320214.1547530506805.JavaMail.zimbra@redhat.com \
    --to=pagupta-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org \
    --cc=darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org \
    --cc=david-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=eblake-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=imammedo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=jack-AlSwsSmVLrQ@public.gmane.org \
    --cc=jasowang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=kwolf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=lcapitulino-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org \
    --cc=mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=nilal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=qemu-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org \
    --cc=riel-ebMLmSuQjDVBDgjK7y7TUQ@public.gmane.org \
    --cc=stefanha-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=tytso-3s7WtUTddSA@public.gmane.org \
    --cc=virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=xiaoguangrong.eric-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=zwisler-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox