Re: [PATCH] hw/vfio: let readonly flag take effect for mmaped regions

From: Alex Williamson <alex.williamson@redhat.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Zeng, Xin" <xin.zeng@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [PATCH] hw/vfio: let readonly flag take effect for mmaped regions
Date: Mon, 30 Mar 2020 08:59:23 -0600
Message-ID: <20200330085923.19d7345f@w520.home>
In-Reply-To: <20200330063402.GE30683@joy-OptiPlex-7040>

On Mon, 30 Mar 2020 02:34:02 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Mon, Mar 30, 2020 at 09:35:27AM +0800, Yan Zhao wrote:
> > On Sat, Mar 28, 2020 at 01:25:37AM +0800, Alex Williamson wrote:  
> > > On Fri, 27 Mar 2020 11:19:34 +0000
> > > yan.y.zhao@intel.com wrote:
> > >   
> > > > From: Yan Zhao <yan.y.zhao@intel.com>
> > > > 
> > > > currently, vfio regions without VFIO_REGION_INFO_FLAG_WRITE are only
> > > > read-only when VFIO_REGION_INFO_FLAG_MMAP is not set.
> > > > 
> > > > regions with flag VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_MMAP
> > > > are only read-only in host page table for qemu.
> > > > 
> > > > This patch sets corresponding ept page entries read-only for regions
> > > > with flag VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_MMAP.
> > > > 
> > > > accordingly, it ignores guest write when guest writes to the read-only
> > > > regions are trapped.
> > > > 
> > > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > > Signed-off-by: Xin Zeng <xin.zeng@intel.com>
> > > > ---  
> > > 
> > > Currently we set the r/w protection on the mmap, do I understand
> > > correctly that the change in the vfio code below results in KVM exiting
> > > to QEMU to handle a write to a read-only region and therefore we need
> > > the memory.c change to drop the write?  This prevents a SIGBUS or
> > > similar?  
> > yes, correct. the change in memory.c is to prevent a SIGSEGV in host as
> > it's mmaped to read-only. we think it's better to just drop the writes
> > from guest rather than corrupt the qemu.
> >   
> > > 
> > > Meanwhile vfio_region_setup() uses the same vfio_region_ops for all
> > > regions and vfio_region_write() would still allow writes, so if the
> > > device were using x-no-mmap=on, I think we'd still get a write to this
> > > region and expect the vfio device to drop it.  Should we prevent that
> > > write in QEMU as well?  
> > yes, it expects vfio device to drop it right now.
> > As the driver sets the flag without VFIO_REGION_INFO_FLAG_WRITE, it should
> > handle it properly.
> > both dropping in qemu and dropping in vfio device are fine to us.
> > we wonder which one is your preference :)

The kernel and device should always do the right thing; we cannot rely
on the user to honor the mapping.  It's also a reasonable response from
the kernel to kill the process with a SIGSEGV if the user ignores the
protections.  So I don't think it's an either/or: the kernel needs to
do the right thing for itself, and in this case QEMU should do the
right thing for itself, which is to drop writes to regions that don't
support them.  So in general, I agree with your patch.
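
As a rough illustration of "drop writes for regions that don't support
it", here is a minimal sketch, not the actual patch.  The struct and
function names are hypothetical stand-ins for QEMU's region state and
write handler, the flag values mirror VFIO_REGION_INFO_FLAG_* from
<linux/vfio.h>, and a small byte array stands in for the device:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same bit values as VFIO_REGION_INFO_FLAG_* in <linux/vfio.h>,
 * redefined under hypothetical names so the sketch builds anywhere. */
#define SKETCH_REGION_FLAG_READ  (1u << 0)
#define SKETCH_REGION_FLAG_WRITE (1u << 1)
#define SKETCH_REGION_FLAG_MMAP  (1u << 2)

/* Hypothetical stand-in for the per-region state; "backing" replaces
 * the real device so the example is self-contained. */
struct sketch_region {
    uint32_t flags;       /* flags reported for the region */
    uint8_t backing[16];  /* pretend device storage */
    unsigned dropped;     /* writes dropped because WRITE was not set */
};

/* Returns true if the write was forwarded, false if it was dropped. */
static bool sketch_region_write(struct sketch_region *r, uint64_t addr,
                                const void *data, unsigned size)
{
    assert(r != NULL && data != NULL);

    /* Region does not advertise WRITE: drop the access in userspace
     * rather than passing it on and relying on the device. */
    if (!(r->flags & SKETCH_REGION_FLAG_WRITE)) {
        r->dropped++;
        return false;
    }

    /* Reject accesses that fall outside the pretend device storage. */
    if (size > sizeof(r->backing) || addr > sizeof(r->backing) - size) {
        return false;
    }

    memcpy(r->backing + addr, data, size);
    return true;
}
```

Calling sketch_region_write() on a region whose flags are only
READ | MMAP leaves the backing storage untouched and bumps the dropped
counter; the same call on a READ | WRITE region stores the data.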

> > > Can you also identify what device and region requires this so that we
> > > can decide whether this is QEMU 5.0 or 5.1 material?  PCI BARs are of
> > > course always R/W and the ROM uses different ops and doesn't support
> > > mmap, so this is a device specific region of some sort.  Thanks,
> > >   
> > It's a virtual mdev device for which we want to emulate a virtual
> > read-only MMIO BAR.
> > Is there any consideration that PCI BARs have to be R/W ?
> > we didn't find it out in PCI specification.

What the device chooses to do with writes to a BAR is its own business,
the PCI spec shouldn't try to define that.  There's also no PCI spec
mechanism to declare the access protections for an entire BAR, that's
device specific behavior.  The current QEMU vfio-pci behavior therefore
implicitly relies on this for a directly assigned device: we mmap the
device and expect writes to unwritable registers within that mapping to
be dropped.

For an mdev device, we can't rely on the user honoring the access
protections, i.e. the user shouldn't be able to exploit the kernel or
device by ignoring them, but I also agree that QEMU, as a friendly vfio
user, should avoid unsupported operations and protect itself from how
the kernel may handle the fault.
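
To make the "friendly vfio user" idea concrete, a hedged sketch of how
the region flags could be turned into a mapping policy: the mmap
protection follows READ/WRITE, and a region without WRITE is also
treated as read-only toward the guest.  The struct and function names
are hypothetical; only the flag bit values (from <linux/vfio.h>) and
the PROT_* constants are real:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>

/* Same bit values as VFIO_REGION_INFO_FLAG_* in <linux/vfio.h>,
 * redefined under hypothetical names so the sketch builds anywhere. */
#define SKETCH_REGION_FLAG_READ  (1u << 0)
#define SKETCH_REGION_FLAG_WRITE (1u << 1)
#define SKETCH_REGION_FLAG_MMAP  (1u << 2)

/* The policy starts from "no access" and ORs permission bits in. */
static_assert(PROT_NONE == 0, "PROT_NONE is expected to be zero");

/* Hypothetical summary of how a region should be mapped. */
struct sketch_mapping_policy {
    bool can_mmap;        /* region may be mmaped at all */
    int prot;             /* protection to request from mmap() */
    bool guest_readonly;  /* guest writes should trap and be dropped */
};

static struct sketch_mapping_policy sketch_policy_for(uint32_t flags)
{
    struct sketch_mapping_policy p = { false, PROT_NONE, false };

    p.can_mmap = (flags & SKETCH_REGION_FLAG_MMAP) != 0;
    if (flags & SKETCH_REGION_FLAG_READ) {
        p.prot |= PROT_READ;
    }
    if (flags & SKETCH_REGION_FLAG_WRITE) {
        p.prot |= PROT_WRITE;
    }
    /* Without WRITE the host mapping is read-only, so the guest view
     * should be read-only too; otherwise a guest write would fault on
     * the host mapping. */
    p.guest_readonly = (flags & SKETCH_REGION_FLAG_WRITE) == 0;
    return p;
}
```

For READ | MMAP this yields prot == PROT_READ with guest_readonly set,
which is the case the patch is concerned with.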

Since this mdev device doesn't exist yet, I'm thinking this is QEMU
v5.1 material though.

> looks MMIO regions in vfio platform are also possible to be read-only and
> mmaped.

Yes.  Thanks,

Alex