linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Graf <graf@amazon.com>
To: Ajay Kaher <akaher@vmware.com>, Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	Srivatsa Bhat <srivatsab@vmware.com>,
	"srivatsa@csail.mit.edu" <srivatsa@csail.mit.edu>,
	Alexey Makhalov <amakhalov@vmware.com>,
	Vasavi Sirnapalli <vsirnapalli@vmware.com>,
	"er.ajay.kaher@gmail.com" <er.ajay.kaher@gmail.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"Nadav Amit" <namit@vmware.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"jailhouse-dev@googlegroups.com" <jailhouse-dev@googlegroups.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"acrn-dev@lists.projectacrn.org" <acrn-dev@lists.projectacrn.org>,
	"helgaas@kernel.org" <helgaas@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor
Date: Thu, 29 Sep 2022 10:12:08 +0100	[thread overview]
Message-ID: <1f1beb97-2a36-dcc0-f09a-59af19663ae2@amazon.com> (raw)
In-Reply-To: <B64FD502-E794-4E94-A267-D690476C57EE@vmware.com>


On 29.09.22 07:36, Ajay Kaher wrote:
>> On 13/09/22, 7:05 PM, "Vitaly Kuznetsov" <vkuznets@redhat.com> wrote:
>>> Thanks Vitaly for your response.
>>>
>>> 1. we have multiple objects of struct pci_raw_ops, 2. adding 'priority' field to struct pci_raw_ops
>>> doesn't seems to be appropriate as need to take decision which object of struct pci_raw_ops has
>>> to be used, not something with-in struct pci_raw_ops.
>> I'm not sure I follow, you have two instances of 'struct pci_raw_ops'
>> which are called 'raw_pci_ops' and 'raw_pci_ext_ops'. What if you do
>> something like (completely untested):
>>
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index 70533fdcbf02..fb8270fa6c78 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -116,6 +116,7 @@ extern void (*pcibios_disable_irq)(struct pci_dev *dev);
>> extern bool mp_should_keep_irq(struct device *dev);
>>
>> struct pci_raw_ops {
>> +       int rating;
>>           int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                                 int reg, int len, u32 *val);
>>           int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index ddb798603201..e9965fd11576 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
>>   int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                                  int reg, int len, u32 *val)
>> {
>> -       if (domain == 0 && reg < 256 && raw_pci_ops)
>> +       if (domain == 0 && reg < 256 && raw_pci_ops &&
>> +           (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating))
>>                  return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
>>          if (raw_pci_ext_ops)
>>                  return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
>> @@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
>>   int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                                  int reg, int len, u32 val)
>> {
>> -       if (domain == 0 && reg < 256 && raw_pci_ops)
>> +       if (domain == 0 && reg < 256 && raw_pci_ops &&
>> +           (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating))
>>                  return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
>>           if (raw_pci_ext_ops)
>>                  return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
>>
>> and then somewhere in Vmware hypervisor initialization code
>> (arch/x86/kernel/cpu/vmware.c) you do
>>
>>   raw_pci_ext_ops->rating = 100;
> Thanks Vitaly, for your review and helping us to improve the code.
>
> I was working to make changes as you suggested, but before sending v3 would like to
> discuss on following:
>
> If we add rating with-in struct pci_raw_ops then we can't have pci_mmcfg as const,
> and following change is must in arch/x86/pci/mmconfig_64.c:
>
> -const struct pci_raw_ops pci_mmcfg = {
> +struct pci_raw_ops pci_mmcfg = {
>          .read =         pci_mmcfg_read,
>          .write =        pci_mmcfg_write,
> };
>
> So to avoid this change, is it fine to have global bool prefer_raw_pci_ext_ops?
>
> And raw_pci_read() will have following change:
>
> -       if (domain == 0 && reg < 256 && raw_pci_ops)
> +       if (domain == 0 && reg < 256 && raw_pci_ops &&
> +            (!prefer_raw_pci_ext_ops ||  !raw_pci_ext_ops)
>
>> why wouldn't it work?
>>
>> (diclaimer: completely untested, raw_pci_ops/raw_pci_ext_ops
>> initialization has to be checked so 'rating' is not garbage).
>>
>>> It's a generic solution for all hypervisor (sorry for earlier wrong
>>> Subject), not specific to VMware. Further looking for feedback if it's
>>> impacting to any hypervisor.
>> That's the tricky part. We can check modern hypervisor versions, but
>> what about all other versions in existence? How can we know that there's
>> no QEMU/Hyper-V/... version out there where MMIO path is broken? I'd
>> suggest we limit the change to Vmware hypervisor, other hypervisors may
>> use the same mechanism (like the one above) later (but the person
>> suggesting the patch is always responsible for the research why it is
>> safe to do so).
> Ok, as of now we will make this change specific to VMware hypervisor.


Is there a way we can make it an ACPI property in MCFG to have the 
environment self-describe the fact that it's safe to do ECAM access for 
config space access over legacy PIO? That way we don't need to patch 
guests every time a hypervisor decides that it's safe to prefer ECAM.

Also, Michael (CC'ed) mentioned that according to spec, your PCIe host 
bridge with PCI_COMMAND.MEMORY=0 would stop responding to its ECAM 
window. Given that most ARM systems have no PIO fallback path, we want 
to make sure we never hit that condition.


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



  reply	other threads:[~2022-09-29  9:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-13 12:47 [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor Ajay Kaher
2022-09-13 13:34 ` Vitaly Kuznetsov
2022-09-29  5:36   ` Ajay Kaher
2022-09-29  9:12     ` Alexander Graf [this message]
2022-10-03 15:03     ` Vitaly Kuznetsov
2022-10-03 17:34       ` Nadav Amit
2022-10-03 21:06         ` H. Peter Anvin
2022-10-03 21:28           ` Nadav Amit
2022-10-03 23:51             ` H. Peter Anvin
2022-10-04  0:19               ` Nadav Amit
2022-10-04  8:22         ` Alexander Graf
2022-10-04 18:48           ` Nadav Amit
2022-10-10 14:58             ` Nadav Amit
2022-10-10 17:05             ` Michael S. Tsirkin
2022-10-04  8:30         ` Vitaly Kuznetsov
2022-10-03 21:01       ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1f1beb97-2a36-dcc0-f09a-59af19663ae2@amazon.com \
    --to=graf@amazon.com \
    --cc=acrn-dev@lists.projectacrn.org \
    --cc=akaher@vmware.com \
    --cc=amakhalov@vmware.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=er.ajay.kaher@gmail.com \
    --cc=helgaas@kernel.org \
    --cc=hpa@zytor.com \
    --cc=jailhouse-dev@googlegroups.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mst@redhat.com \
    --cc=namit@vmware.com \
    --cc=rostedt@goodmis.org \
    --cc=srivatsa@csail.mit.edu \
    --cc=srivatsab@vmware.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=vsirnapalli@vmware.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).