All of lore.kernel.org
 help / color / mirror / Atom feed
From: Grzegorz Jaszczyk <jaz@semihalf.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Dominik Behr <dbehr@google.com>,
	Dominik Behr <dbehr@chromium.org>,
	linux-kernel@vger.kernel.org, dmy@semihalf.com, tn@semihalf.com,
	upstream@semihalf.com, dtor@google.com, jgg@ziepe.ca,
	kevin.tian@intel.com, cohuck@redhat.com, abhsahu@nvidia.com,
	yishaih@nvidia.com, yi.l.liu@intel.com, kvm@vger.kernel.org,
	libvir-list@redhat.com
Subject: Re: [PATCH] vfio/pci: Propagate ACPI notifications to the user-space
Date: Wed, 22 Mar 2023 10:47:56 +0100	[thread overview]
Message-ID: <CAH76GKMVquX7eLoGYtwMo06ar0dE9J3LtJEnYJKY0-3X5XFwow@mail.gmail.com> (raw)
In-Reply-To: <CAH76GKP+W9JUvQpqvjLHADMeRORPUf0d8vn5gCgE5fjxz0YkPQ@mail.gmail.com>

Hi Alex

Could you please refer to my previous message?

Thank you in advance,
Grzegorz

czw., 9 mar 2023 o 14:41 Grzegorz Jaszczyk <jaz@semihalf.com> napisał(a):
>
> czw., 9 mar 2023 o 00:38 Alex Williamson <alex.williamson@redhat.com>
> napisał(a):
> >
> > On Wed, 8 Mar 2023 14:44:28 -0800
> > Dominik Behr <dbehr@google.com> wrote:
> >
> > > On Wed, Mar 8, 2023 at 12:06 PM Alex Williamson
> > > <alex.williamson@redhat.com> wrote:
> > > >
> > > > On Wed, 8 Mar 2023 10:45:51 -0800
> > > > Dominik Behr <dbehr@chromium.org> wrote:
> > > >
> > > > > It is the same interface as other ACPI events like AC adapter LID etc
> > > > > are forwarded to user-space.
> > > > >  ACPI events are not particularly high frequency like interrupts.
> > > >
> > > > I'm not sure that's relevant, these interfaces don't proclaim to
> > > > provide isolation among host processes which manage behavior relative
> > > > to accessories.  These are effectively system level services.  It's only
> > > > a very, very specialized use case that places a VMM as peers among these
> > > > processes.  Generally we don't want to grant a VMM any privileges beyond
> > > > what it absolutely needs, so letting a VMM managing an assigned NIC
> > > > really ought not to be able to snoop host events related to anything
> > > > other than the NIC.
> > > How is that related to the fact that we are forwarding VFIO-PCI events
> > > to netlink? Kernel does not grant any privileges to VMM.
> > > There are already other ACPI events on netlink. The implementer of the
> > > VMM can choose to allow VMM to snoop them or not.
> > > In our case our VMM (crosvm) does already snoop LID, battery and AC
> > > adapter events so the guest can adjust its behavior accordingly.
> > > This change just adds another class of ACPI events that are forwarded
> > > to netlink.
> >
> > That's true, it is the VMM choice whether to allow snooping netlink,
> > but this is being proposed as THE solution to allow VMMs to receive
> > ACPI events related to vfio assigned devices.  If the solution
> > inherently requires escalating the VMM privileges to see all netlink
> > events, that's a weakness in the proposal.  As noted previously,
> > there's also no introspection here, the VMM can't know whether it
> > should listen to netlink for ACPI events or include AML related to a
> > GPE for the device.  It cannot determine if either the kernel supports
> > this feature or if the device has an ACPI companion that can generate
> > these events.
>
> To be precise the VMM doesn't listen to all netlink events: it listens
> only to "acpi_event" family and acpi related multicast group, which
> means it listens to all events generated through
> acpi_bus_generate_netlink_event.
>
> Before sending this patch I thought about using eventfd instead
> netalink which will actually provide a channel associated with a given
> device and therefore such notifications will be received only by the
> VMM associated with such a device. Nevertheless, it seems like eventfd
> will allow to signalize events happening (notify on a given device)
> but is not capable of sending any payload so in our case there is no
> room for propagating notification value via eventfd. Maybe there is
> other mechanism eventfd-like which will allow to achieve above?
>
> If there is no such mechanism, maybe instead of using existing acpi
> netlink events, which are associated with "acpi_event" netlink family
> and acpi multicast group, we could create per vfio-pci a different
> netlink family or probably reuse "acpi_event" family but use different
> multicast group, so each device will have dedicated netlink family.
> Does it seem reasonable?
>
> >
> > > >
> > > > > > > > What sort of ACPI events are we expecting to see here and what does user space do with them?
> > > > > The use we are looking at right now are D-notifier events about the
> > > > > GPU power available to mobile discrete GPUs.
> > > > > The firmware notifies the GPU driver and resource daemon to
> > > > > dynamically adjust the amount of power that can be used by the GPU.
> > > > >
> > > > > > The proposed interface really has no introspection, how does the VMM
> > > > > > know which devices need ACPI tables added "upfront"?  How do these
> > > > > > events factor into hotplug device support, where we may not be able to
> > > > > > dynamically inject ACPI code into the VM?
> > > > >
> > > > > The VMM can examine PCI IDs and the associated firmware node of the
> > > > > PCI device to figure out what events to expect and what ACPI table to
> > > > > generate to support it but that should not be necessary.
> > > >
> > > > I'm not entirely sure where your VMM is drawing the line between the VM
> > > > and management tools, but I think this is another case where the
> > > > hypervisor itself should not have privileges to examine the host
> > > > firmware tables to build its own.  Something like libvirt would be
> > > > responsible for that.
> > > Yes, but that depends on the design of hypervisor and VMM and is not
> > > related to this patch.
> >
> > It is very much related to this patch if it proposes an interface to
> > solve a problem which is likely not compatible with the security model
> > of other VMMs.  We need a single solution to support all VMMs.
> >
> > > >
> > > > > A generic GPE based ACPI event forwarder as Grzegorz proposed can be
> > > > > injected at VM init time and handle any notification that comes later,
> > > > > even from hotplug devices.
> > > >
> > > > It appears that forwarder is sending the notify to a specific ACPI
> > > > device node, so it's unclear to me how that becomes boilerplate AML
> > > > added to all VMs.  We'll need to notify different devices based on
> > > > different events, right?
> > > Valid point. The notifications have a "scope" ACPI path.
> > > In my experience these events are consumed without looking where they
> > > came from but I believe the patch can be extended to
> > > provide ACPI path, in your example "_SB.PCI0.GPP0.PEGP" instead of
> > > generic vfio_pci which VMM could use to translate an equivalent ACPI
> > > path in the guest and pass it to a generic ACPI GPE based notifier via
> > > shared memory. Grzegorz could you chime in whether that would be
> > > possible?
> >
> > So effectively we're imposing the host ACPI namespace on the VM, or at
> > least a mapping between the host and VM namespace?  The generality of
> > this is not improving.
>
> Yes, in the example VMM implementation we have mapping between the
> host pci device address and guest pci device. Therefore VMM knows,
> based on device name (BDF) sent via netlink, to which guest device
> this notification should be propagated. The boilerplate AML is added
> to each vfio-pci device which belongs to VMM and each vfio-pci device
> has associated pre-allocated GPE so the VMM knows which GPE should be
> triggered to replicate notification for a given device. BTW this is
> only current WIP VMM implementation - this could probably be optimized
> if needed.
>
> Handling hotplug devices is more problematic. I see that the kernel
> provides some runtime ACPI patching mechanism:
> https://www.kernel.org/doc/html/latest/firmware-guide/acpi/method-customizing.html
> (which I never tried) but not even sure how VMM could take advantage
> of it. BTW this realized me that the same problem with hotplug applies
> to other vfio-pci use-cases e.g. runtime PM, which relies on guest
> virtual ACPI method:
> https://patchwork.kernel.org/project/linux-pm/patch/20220829114850.4341-5-abhsahu@nvidia.com/.
> Generating virtual ACPI content for hotplug devices seems like a more
> generic issue.
>
> >
> > > > > > The acpi_bus_generate_netlink_event() below really only seems to form a
> > > > > > u8 event type from the u32 event.  Is this something that could be
> > > > > > provided directly from the vfio device uAPI with an ioeventfd, thus
> > > > > > providing introspection that a device supports ACPI event notifications
> > > > > > and the ability for the VMM to exclusively monitor those events, and
> > > > > > only those events for the device, without additional privileges?
> > > > >
> > > > > From what I can see these events are 8 bit as they come from ACPI.
> > > > > They also do not carry any payload and it is up to the receiving
> > > > > driver to query any additional context/state from the device.
> > > > > This will work the same in the VM where driver can query the same
> > > > > information from the passed through PCI device.
> > > > > There are multiple other netflink based ACPI events forwarders which
> > > > > do exactly the same thing for other devices like AC adapter, lid/power
> > > > > button, ACPI thermal notifications, etc.
> > > > > They all use the same mechanism and can be received by user-space
> > > > > programs whether VMMs or others.
> > > >
> > > > But again, those other receivers are potentially system services, not
> > > > an isolated VM instance operating in a limited privilege environment.
> > > > IMO, it's very different if the host display server has access to lid
> > > > or power events than it is to allow some arbitrary VM that happens to
> > > > have an unrelated assigned device that same privilege.
> > > Therefore these VFIO related ACPI events could be received by a system
> > > service via this netlink event and selectively forwarded to VMM if
> > > such is a desire of whoever implements the userspace.
> > > This is outside the scope of this patch. In our case our VMM does
> > > receive these LID, AC or battery events.
> >
> > But this is backwards, we're presupposing the choice to use netlink
> > based on the convenience of one VMM, which potentially creates
> > obstacles, maybe even security isolation issues for other VMMs.  The
> > method of delivering ACPI events to a VMM is very much within the scope
> > of this proposal.  Thanks,
> >
> > Alex
> >
>
> regarding:
> > > > On my laptop, I see multiple _GPE scopes, each apparently very unique
> > > > to the devices:
> > > >
> > > >    Scope (_GPE)
> > > >    {
> > > >        Method (_L0C, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >        {
> > > >            Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
> > > >        }
> > > >
> > > >        Method (_L0D, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >        {
> > > >             Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
> > > >        }
> > > >
> > > >        Method (_L0F, 0, Serialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >        {
> > > >             Notify (\_SB.PCI0.GPP0.PEGP, 0x81) // Information Change
> > > >         }
> > > >     }
> > > >
> > > >     Scope (_GPE)
> > > >     {
> > > >         Method (_L19, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >             Notify (\_SB.PCI0.GP17, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GP17.XHC0, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
> > > >             Notify (\_SB.PWRB, 0x02) // Device Wake
> > > >         }
> > > >
> > > >         Method (_L08, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
> > > >         {
> > > >            Notify (\_SB.PCI0.GP18, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GPP0, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GPP1, 0x02) // Device Wake
> > > >             Notify (\_SB.PCI0.GPP5, 0x02) // Device Wake
> > > >         }
> > > >     }
> > > >
> > > > At least one more even significantly more extensive, calling methods
> > > > that interact with OpRegions.  So how does a simple stub of a
> > > > GPE block replicate this sort of behavior in the host AML?  Thanks,
>
> The simple stub of GPE block will work to replicate the ACPI
> notification only: as mentioned earlier GPE handler will be generated
> per-vfio device so in your example if let assume that only:
> - \_SB.PCI0.GPP0.PEGP
> - \_SB.PCI0.GP17.XHC1
> will be pass-through to the guest, the generated AML code for VM will
> look more-less like below:
>
>         Scope (_GPE)
>         {
>             Method (_E00, 0, NotSerialized)
>             {
>                 Local0 =  \_SB.PCI0.GPP0.PEGP.NOTY
>                 Notify ( \_SB.PCI0.GPP0.PEGP, Local0)
>             }
>         }
>         Scope (_GPE)
>         {
>             Method (_E01, 0, NotSerialized)
>             {
>                 Local0 =\_SB.PCI0.GP17.XHC1.NOTY
>                 Notify (\_SB.PCI0.GP17.XHC1, Local0)
>             }
>         }
>
> So each pass-through device will have associated GPE (0 for
> \_SB.PCI0.GPP0.PEGP and 1 for \_SB.PCI0.GP17.XHC1). The path in Notify
> could actually be different and related to guest pci hierarchy (but
> associated to those host devices). Please also note that in this case
> we use GPE in order to "inform" guest about notification coming and we
> do not try to replicate host GPE scope description.
>
> Above we assumed that other devices (like \_SB.PCI0.GPP0/1) are not
> pass-through to the guest and notification are handled in host as
> usual (they are not binded to pci-vfio) and there is no need to
> generate AML code, allocate GPE for them and so on.
>
> Thank you,
> Grzegorz

  reply	other threads:[~2023-03-22  9:49 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07 22:05 [PATCH] vfio/pci: Propagate ACPI notifications to the user-space Grzegorz Jaszczyk
2023-03-07 23:41 ` Alex Williamson
2023-03-08  8:11   ` Tian, Kevin
2023-03-08 11:41   ` Grzegorz Jaszczyk
2023-03-08 17:49     ` Alex Williamson
2023-03-08 18:45       ` Dominik Behr
2023-03-08 20:06         ` Alex Williamson
2023-03-08 22:44           ` Dominik Behr
2023-03-08 23:38             ` Alex Williamson
2023-03-09  1:51               ` Dominik Behr
2023-03-09 18:25                 ` Jason Gunthorpe
2023-03-09 13:41               ` Grzegorz Jaszczyk
2023-03-22  9:47                 ` Grzegorz Jaszczyk [this message]
2023-03-23 17:07                 ` Alex Williamson
2023-03-24 12:29                   ` Grzegorz Jaszczyk
2023-03-08 14:40 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAH76GKMVquX7eLoGYtwMo06ar0dE9J3LtJEnYJKY0-3X5XFwow@mail.gmail.com \
    --to=jaz@semihalf.com \
    --cc=abhsahu@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=dbehr@chromium.org \
    --cc=dbehr@google.com \
    --cc=dmy@semihalf.com \
    --cc=dtor@google.com \
    --cc=jgg@ziepe.ca \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=libvir-list@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tn@semihalf.com \
    --cc=upstream@semihalf.com \
    --cc=yi.l.liu@intel.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.