From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753032AbbBXEh5 (ORCPT ); Mon, 23 Feb 2015 23:37:57 -0500
Received: from numascale.com ([213.162.240.84]:49156 "EHLO numascale.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752244AbbBXEhz (ORCPT ); Mon, 23 Feb 2015 23:37:55 -0500
Message-ID: <54EC0013.7000100@numascale.com>
Date: Tue, 24 Feb 2015 12:37:39 +0800
From: Daniel J Blueman
Organization: Numascale AS
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Bjorn Helgaas, Jiang Liu
CC: Ingo Molnar, H Peter Anvin, Thomas Gleixner, Linux Kernel, Steffen Persvold, "x86@kernel.org", Yinghai Lu
Subject: Re: PCIe 32-bit MMIO exhaustion
References: <54C8A10B.3070207@numascale.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - cpanel21.proisp.no
X-AntiAbuse: Original Domain - vger.kernel.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - numascale.com
X-Get-Message-Sender-Via: cpanel21.proisp.no: authenticated_id: daniel@numascale.com
X-Source:
X-Source-Args:
X-Source-Dir:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Bjorn, Jiang,

On 29/01/2015 23:23, Bjorn Helgaas wrote:
> Hi Daniel,
>
> On Wed, Jan 28, 2015 at 2:42 AM, Daniel J Blueman wrote:
>> With systems with a large number of PCI devices, we're seeing lack of 32-bit
>> MMIO space, eg one quad-port NetXtreme-2 adapter takes 128MB of space [1].
>>
>> An errata to the PCIe 2.1 spec provides guidance on limitations with 64-bit
>> non-prefetchable BARs (since bridges have only 32-bit non-prefetchable
>> ranges) stating that vendors can enable the prefetchable bit in BARs under
>> certain circumstances to allow 64-bit allocation [2].
>>
>> The problem with that, is that vendors can't know apriori what hosts their
>> products will be in, so can't just advertise prefetchable 64-bit BARs. What
>> can be done, is system firmware can use the 64-bit prefetchable BAR in
>> bridges, and assign a 64-bit non-prefetchable device BAR into that area,
>> where it is safe to do so (following the guidance).
>>
>> At present, linux denies such allocations [3] and disables the BARs. It
>> seems a practical solution to allow them if the firmware believes it is
>> safe.
>
> This particular message ([3]):
>
>> pci 0002:01:00.0: BAR 0: [mem size 0x00002000 64bit] conflicts with PCI Bus
>> 0002:00 [mem 0x10020000000-0x10027ffffff pref]
>
> is misleading at best and likely a symptom of a bug. We printed the
> *size* of BAR 0, not an address, which means we haven't assigned space
> for the BAR. That means it should not conflict with anything.
>
> We already do revert to firmware assignments in some situations when
> Linux can't figure out how to assign things itself. But apparently
> not in *this* situation.
>
> Without seeing the whole picture, it's hard for me to figure out
> what's going on here. Could you open a bug report at
> http://bugzilla.kernel.org (category drivers/PCI) and attach a
> complete dmesg and "lspci -vv" output? Then we can look at what
> firmware did and what Linux thought was wrong with it.

Done a while back: https://bugzilla.kernel.org/show_bug.cgi?id=92671

An interesting question popped up: I find the kernel doesn't accept IO
BARs and bridge windows above address 0xffff, though the PCI spec and
modern hardware allow 32-bit decode.
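
To make the BAR attributes in this thread concrete, here is a minimal sketch (Python, not kernel code) of how the low bits of a BAR register encode I/O vs memory space, 32- vs 64-bit decode and the prefetchable flag, plus a check of an I/O range against the 0xffff limit mentioned above. The helper names `decode_bar` and `io_range_fits_16bit` and the constant `IO_LIMIT_16BIT` are made up for illustration; only the bit layout comes from the PCI spec.

```python
# Sketch only: helper names are hypothetical, not taken from the kernel.

IO_LIMIT_16BIT = 0xFFFF  # highest I/O port address in the 16-bit range


def decode_bar(raw: int) -> dict:
    """Decode the flag bits in the low dword of a PCI BAR."""
    if raw & 0x1:
        # Bit 0 set: I/O space BAR; bits [31:2] hold the base address.
        return {"space": "io", "base": raw & 0xFFFFFFFC}
    mem_type = (raw >> 1) & 0x3  # 0b00 = 32-bit decode, 0b10 = 64-bit decode
    return {
        "space": "mem",
        "is_64bit": mem_type == 0x2,
        "prefetchable": bool(raw & 0x8),  # bit 3
        # Low dword only; a 64-bit BAR keeps its upper half in the next BAR.
        "base": raw & 0xFFFFFFF0,
    }


def io_range_fits_16bit(base: int, size: int) -> bool:
    """True if [base, base + size - 1] lies entirely at or below 0xffff."""
    return size > 0 and base + size - 1 <= IO_LIMIT_16BIT
```

With this layout, a 64-bit non-prefetchable memory BAR reads back with low bits 0x4 and a 64-bit prefetchable one with 0xc, which is the distinction the erratum guidance above turns on.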
Thus for practical reasons, our NumaConnect firmware doesn't set up IO
BARs/windows beyond the first PCI domain (which is the only one with
legacy support, and no drivers seem to require their IO BARs anyway),
and we get conflicts and warnings [1]:

pnp 00:00: disabling [io 0x0061] because it overlaps 0001:05:00.0 BAR 0 [io 0x0000-0x00ff]
pci 0001:03:00.0: BAR 13: no space for [io size 0x1000]
pci 0001:03:00.0: BAR 13: failed to assign [io size 0x1000]

Is there a cleaner way of dealing with this, in our firmware and/or the
kernel? E.g., I guess if IO BARs aren't assigned (value 0) on PCI domains
without IO bridge windows in the ACPI AML, there's no need to flag
conflicts or attempt assignment?

Many thanks!
  Daniel

[1] https://bugzilla.kernel.org/attachment.cgi?id=165831
-- 
Daniel J Blueman
Principal Software Engineer, Numascale