linux-pci.vger.kernel.org archive mirror
From: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Logan Gunthorpe <logang@deltatee.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: Multitude of resource assignment functions
Date: Sun, 30 Jun 2019 02:40:36 +0000
Message-ID: <SL2P216MB01871C19CA4C0477105B567E80FE0@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM>
In-Reply-To: <1a0e2012fd26685819cb1ee83180405717f690be.camel@kernel.crashing.org>

Thank you for the reply. I have been mulling it over for a while.

On Thu, Jun 27, 2019 at 06:48:35PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2019-06-27 at 07:40 +0000, Nicholas Johnson wrote:
> > Unfortunately, the operating system is designed to let the firmware do 
> > things. In my mind, ACPI should not need to exist, and the operating 
> > system should start with a clean state with PCI and re-enumerate 
> > everything at boot time. The PCI allocation is so broken and 
> > inconsistent (as you have noted) because it tries to combine the two, 
> > when firmware enumeration and native enumeration should be mutually 
> > exclusive. I have attempted to re-write large chunks of probe.c, pci.c 
> > and setup-bus.c to completely disregard firmware enumeration and clean 
> > everything up. Unfortunately, I get stuck in probe.c with the double 
> > recursive loop which assigns bus numbers - I cannot figure out how to 
> > re-write it successfully. Plus, I feel like nobody will be ready for 
> > such a drastic change - I am having trouble selling minor changes that 
> > fix actual use cases, as opposed to code reworking.
> 
> Well... so a lot of platforms are happy to do a full re-assignment,
> though they use the current code today which leads to rather
> substandard results when it comes to hotplug bridges.
> 
> All the embedded platforms today are like that, and all of ARM64 though
> the latter will somewhat change, all DT based ARM64 will probably
> remain that way.
> 
> > My next proposal might be a kernel parameter for PCI to set various 
> > levels of disregard for firmware
> 
> Well, at least ACPI has this _DSM #5 thingy that can tell us that we
> are allowed to disregard firmware for selected bits and pieces
> (hopefully that tends to be whole hierarchies but I don't know how well
> it's used in practice).
I will need to find out more about this - can you suggest any 
particularly good resources on learning about ACPI?

> 
> > , from none to complete, which can be 
> > added to incrementally to do more and more (rather than all in one patch 
> > series).
> 
> So there are a number of reasons to honor what the firmware did.
> 
> First, today (but that's fixable), we suck at setting up reasonable
> space for hotplug by default.
What annoys me more is the following:

a) BIOS vendors don't provide a means to configure this in the BIOS,
and if they do, it is through hidden options which require you to
re-flash the BIOS or use the dumped IFRs and the EFI shell to modify
the variables.

b) Even the few motherboards that expose the Thunderbolt options
without resorting to (a) limit them to 4096M.

c) Motherboards are still cramming us into the 32-bit address space in
case somebody is still using a 32-bit OS. There is the "above 4G
decoding" option available on most motherboards, but I am not sure if
that completely fixes the issue. Given that Microsoft said you need
Windows 10 to run on the latest hardware, I do not see many people using
a 32-bit OS on the latest hardware.

d) These options are especially needed because Windows cannot override
anything whatsoever. It cannot even disregard _OSC, as pcie_ports=native
does on Linux.

> 
> But there are more insidious ones. There are platforms where you can't
> move things (typically virtualized platforms with specific hypervisors,
> such as IBM pseries).
I cannot argue with this.

> 
> There are platforms where the *runtime* firmware (SMM or equivalent or
> even ACPI AML bits) will be poking at some system devices and those
> really must not be moved. (In fact there's a theoretical problem with
> such devices becoming temporarily inaccessible during BAR sizing today
> but we mostly get lucky).
I think SMM is a nasty back door. Unfortunately the precedent set is 
that the firmware makers can do what they want and we are expected to 
honour that in the kernel. In an ideal world, it would default to the OS 
assigning things and the firmware vendors getting blamed when things 
break if they insist on using runtime firmware.
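
As an aside on the BAR sizing problem mentioned above, here is a minimal 
userspace sketch in C of why a device is briefly unreachable while its 
BAR is being sized. It only simulates a 32-bit memory BAR; struct 
fake_bar and the fake_bar_* helpers are hypothetical names made up for 
illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 32-bit memory BAR. Real code would go
 * through config-space accessors; this models only register behaviour. */
struct fake_bar {
	uint32_t size;   /* decoded region size in bytes, power of two >= 16 */
	uint32_t value;  /* current register contents */
};

/* Only the address bits at or above the region size are writable; the
 * low bits always read back as zero in this simplified model. */
static void fake_bar_write(struct fake_bar *bar, uint32_t v)
{
	assert(bar->size >= 16 && (bar->size & (bar->size - 1)) == 0);
	bar->value = v & ~(bar->size - 1);
}

static uint32_t fake_bar_read(const struct fake_bar *bar)
{
	return bar->value;
}

/* Sizing probe: save, write all ones, read back, restore. Between the
 * all-ones write and the restore, the BAR no longer decodes its original
 * address, so anything poking the device in that window would miss it. */
static uint32_t fake_bar_probe_size(struct fake_bar *bar)
{
	uint32_t saved = fake_bar_read(bar);
	uint32_t mask;

	fake_bar_write(bar, 0xffffffffu);
	mask = fake_bar_read(bar) & ~0xfu;  /* ignore the four flag bits */
	fake_bar_write(bar, saved);

	return ~mask + 1;                   /* lowest writable bit gives the size */
}
```

If runtime firmware touches the device between the two writes in 
fake_bar_probe_size(), it sees the wrong address, which is why such 
devices are risky to size, let alone move.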

In my ideal world, motherboards would have the absolute bare minimum in 
BIOS to initialise DRAM and the tricky stuff, and then boot a coreboot 
Linux kernel off a MicroSD slot on the board. This could easily be 
updated constantly (for example, to add NVMe support to old boards) and 
it would be impossible to brick the motherboard by changing this, as the 
SD card could be removed and restored.

This would fix the following:
- No more need for PCI option ROMs and their security issues
- Open source / free firmware
- Will not need firmware updates to add NVMe boot support
- Allow target OS booted with kexec to assign resources as required
- Set up IOMMU for Thunderbolt (and all DMA ports) at boot time without 
special BIOS updates required
- Etc

I am sure there are problems with what I am saying, but I do find it 
frustrating that the industry is so unable to move on from legacy.

When you have an arch, you expect that the same bytecode will run on the 
next system with that same arch. I don't understand why it stops there - 
I believe two systems of the same arch should be indistinguishable - 
without all of the firmware differences, and I hope to influence this 
during my career.

> 
> There are other "interesting" cases, like EFI giving us the framebuffer
> address to use if we don't have a native driver... which happens to be
> off a PCI BAR somewhere. Now we *could* probably try to special case
> that and detect when we move that BAR but today we'll probably break if
> we move it.
Also fixed by coreboot which will have the Linux kernel and all the 
drivers - no need for legacy services like this.

> 
> x86 historically has other nasty "hidden" devices. There are historical
> cases of devices that break if they move after initial setup, etc...
> Most of these things are ancient but we have to ensure we keep today's
> policy for old platforms at least.
Sometimes I think that we need a fork of Linux. Although that would be 
the same as saying "for old systems, support ends on this kernel version 
and you are unlikely to need the new features of the latest kernels on 
the oldest hardware". They did drop the older x86 recently, I believe.

> 
> >  This can supersede pci=realloc. The realloc command is so 
> > broken because once the system has loaded drivers, it becomes next to 
> > impossible to free and reallocate a resource to fit another device in - 
> > because it will upset existing devices. The realloc command is only 
> > useful in early boot because nothing is yet assigned, so it works. 
> > However, the same effect can be achieved by releasing all the resources 
> > on the root port before anything happens. I think it was 
> > pci_assign_unassigned_resources(), and I did verify this experimentally. 
> > This switch could be part of such a new kernel parameter to ignore 
> > firmware influence on PCI.
> 
> We should see what ACPI gives us in _DSM #5 on x86 these days.. if it's
> meaningful on enough machines we could use that as an indication that a
> given tree can be reallocated.
> 
> > I hope that somehow we can transition to ignoring the firmware - because 
> > firmware and native enumeration need to be mutually exclusive, and we 
> > need native enumeration for PCI hotplug. If anybody has any ideas how, I 
> > would love to hear.
> 
> We'll probably have to live with an "in-between" forever on x86 and
> maybe arm64, but with some luck, the static devices will only be the
> on-board stuff, and we can go wild below bridges...
The rest was just speculation and thoughts. My real question here is: 
What path do we have towards modernisation? We cannot replace the PCI 
code to handle everything natively and disregard the firmware for modern 
architectures like the emerging RISC-V because that code will screw up 
x86. So do we have to have pci-old and pci-new subsystems which can be 
selected by each arch?
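
To make the incremental kernel parameter idea from earlier in this 
message a bit more concrete, here is a rough sketch in C of what ordered 
levels of firmware disregard might look like. Everything in it is an 
assumption for discussion: the enum, the level names ("none", "bridges", 
"complete") and both functions are hypothetical and do not exist in the 
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical levels, ordered from trusting firmware to ignoring it. */
enum pci_fw_disregard {
	PCI_FW_DISREGARD_NONE,          /* keep everything firmware assigned */
	PCI_FW_DISREGARD_BELOW_BRIDGES, /* keep root-bus devices, reassign below bridges */
	PCI_FW_DISREGARD_COMPLETE,      /* release everything, assign natively */
};

struct level_name {
	const char *name;
	enum pci_fw_disregard level;
};

static const struct level_name level_names[] = {
	{ "none",     PCI_FW_DISREGARD_NONE },
	{ "bridges",  PCI_FW_DISREGARD_BELOW_BRIDGES },
	{ "complete", PCI_FW_DISREGARD_COMPLETE },
};

/* Parse the option value; returns 0 on success, -1 on an unknown name. */
static int parse_fw_disregard(const char *arg, enum pci_fw_disregard *out)
{
	size_t i;

	assert(arg != NULL && out != NULL);
	for (i = 0; i < sizeof(level_names) / sizeof(level_names[0]); i++) {
		if (strcmp(arg, level_names[i].name) == 0) {
			*out = level_names[i].level;
			return 0;
		}
	}
	return -1;
}

/* Policy check: may the OS reassign resources for a device, given the
 * chosen level and whether the device sits directly on the root bus? */
static int may_reassign(enum pci_fw_disregard level, int on_root_bus)
{
	if (level == PCI_FW_DISREGARD_COMPLETE)
		return 1;
	if (level == PCI_FW_DISREGARD_BELOW_BRIDGES)
		return !on_root_bus;
	return 0;
}
```

The middle level is meant to match the "static on-board devices, go wild 
below bridges" compromise. Further levels could be added later without 
changing the meaning of the existing ones.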

> 
> BTW: I'd like us to discuss that f2f at Plumbers in a miniconf if
> enough of us can go.
Please explain this as I have no idea what f2f, Plumbers and miniconf 
are.

Cheers,
Nicholas

> 
> Cheers,
> Ben.
> 
> 
