linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Shevchenko <andriy.shevchenko@intel.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: linux-pci@vger.kernel.org, "Bjorn Helgaas" <bhelgaas@google.com>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	"Rob Herring" <robh@kernel.org>,
	"Krzysztof Wilczyński" <kw@linux.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Mika Westerberg" <mika.westerberg@linux.intel.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
	linux-kernel@vger.kernel.org,
	"Lidong Wang" <lidong.wang@intel.com>
Subject: Re: [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules
Date: Tue, 7 May 2024 16:49:49 +0300	[thread overview]
Message-ID: <ZjoxfYEi9mMQnRBh@smile.fi.intel.com> (raw)
In-Reply-To: <20240507102523.57320-9-ilpo.jarvinen@linux.intel.com>

On Tue, May 07, 2024 at 01:25:23PM +0300, Ilpo Järvinen wrote:
> During remove & rescan cycle, PCI subsystem will recalculate and adjust
> the bridge window sizing that was initially done by "BIOS". The size
> calculation is based on the required alignment of the largest resource
> among the downstream resources as per pbus_size_mem() (unimportant or
> zero parameters marked with "..."):
> 
> 	min_align = calculate_mem_align(aligns, max_order);
> 	size0 = calculate_memsize(size, ..., min_align);
> 
> inside calculate_memsize(), for the largest alignment:
> 	min_align = align1 >> 1;
> 	...
> 	return min_align;
> 
> and then in calculate_memsize():
> 	return ALIGN(max(size, ...), align);
> 
> If the original bridge window sizing tried to conserve space, this will
> lead to massive increase of the required bridge window size when the
> downstream has a large disparity in BAR sizes. E.g., with 16MiB and
> 16GiB BARs this results in 24GiB bridge window size even if 16MiB BAR
> does not require gigabytes of space to fit.
> 
> When doing remove & rescan for a bus that contains such a PCI device, a
> larger bridge window is suddenly required on rescan but when there is a
> bridge window upstream that is already assigned based on the original
> size, it cannot be enlarged to the new requirement. This causes the
> allocation of the bridge window to fail (0x600000000 > 0x400ffffff):
> 
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:01:00.0: PCI bridge to [bus 02-04]
> pci 0000:01:00.0:   bridge window [mem 0x40400000-0x406fffff]
> pci 0000:01:00.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> 
> pci 0000:03:00.0: device released
> pci 0000:02:01.0: device released
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0
> pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]
> pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]
> 
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
> pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space
> pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
> pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
> pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
> pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
> pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> 
> This is a major surprise for users who are suddenly left with a PCIe
> device that was working fine with the original bridge window sizing.
> 
> Even if the already assigned bridge window could be enlarged by
> reallocation in some cases (something the current code does not attempt
> to do), it is not possible in general case and the large amount of
> wasted space at the tail of the bridge window may lead to other
> resource exhaustion problems on Root Complex level (think of multiple
> PCIe cards with VFs and BAR size disparity in a single system).
> 
> PCI specifications only expect natural alignment for BARs (PCI Express
> Base Specification, rev. 6.1 sect. 7.5.1.2.1) and minimum of 1MiB
> alignment for the bridge window (PCI Express Base Specification,
> rev 6.1 sect. 7.5.1.3). The current bridge window tail alignment rule
> was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI
> allocation code (alpha, arm, parisc) [2/2]") that only states:
> "pbus_size_mem: core stuff; tested with randomly generated sets of
> resources". It does not explain the motivation for the extra tail space
> allocated that is not truly needed by the downstream resources. As
> such, it is far from clear if it ever has been required by any HW.
> 
> To prevent PCIe cards with BAR size disparity from becoming unusable
> after remove & rescan cycle, attempt to do a truly minimal allocation
> for memory resources if needed. First check if the normally calculated
> bridge window will not fit into an already assigned upstream resource.
> In such case, try with relaxed bridge window tail sizing rules instead
> where no extra tail space is requested beyond what the downstream
> resources require. Only enforce the alignment requirement of the bridge
> window itself (normally 1MiB).
> 
> With this patch, the resources are successfully allocated:
> 
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
> pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
> pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules
> pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned
> pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
> pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned
> pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned
> pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
> pci 0000:02:01.0: PCI bridge to [bus 03]
> pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
> pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
> 
> This patch draws inspiration from the initial investigations and work
> by Mika Westerberg.

...

> +		min_align = 1ULL << (max_order + __ffs(SZ_1M));

In case of a new version of the series, this can utilise BIT_ULL().

-- 
With Best Regards,
Andy Shevchenko



  reply	other threads:[~2024-05-07 13:50 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07 10:25 [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 1/8] PCI: Fix resource double counting on remove & rescan Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 2/8] resource: Rename find_resource() to find_resource_space() Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 3/8] resource: Document find_resource_space() and resource_constraint Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 4/8] resource: Use typedef for alignf callback Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 5/8] resource: Handle simple alignment inside __find_resource_space() Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 6/8] resource: Export find_resource_space() Ilpo Järvinen
2024-05-07 10:25 ` [PATCH v3 7/8] PCI: Make minimum bridge window alignment reference more obvious Ilpo Järvinen
2024-05-07 10:36   ` Mika Westerberg
2024-05-07 10:50     ` Ilpo Järvinen
2024-05-07 14:01   ` Andy Shevchenko
2024-05-07 10:25 ` [PATCH v3 8/8] PCI: Relax bridge window tail sizing rules Ilpo Järvinen
2024-05-07 13:49   ` Andy Shevchenko [this message]
2024-05-28 13:10 ` [PATCH v3 0/8] PCI: Solve two bridge window sizing issues Ilpo Järvinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjoxfYEi9mMQnRBh@smile.fi.intel.com \
    --to=andriy.shevchenko@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=bhelgaas@google.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=imammedo@redhat.com \
    --cc=kw@linux.com \
    --cc=lidong.wang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=rafael@kernel.org \
    --cc=robh@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).