From: Bjorn Helgaas <bhelgaas@google.com>
To: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@au1.ibm.com>,
	linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	gwshan@linux.vnet.ibm.com, Donald Dutile <ddutile@redhat.com>,
	Myron Stowe <myron.stowe@redhat.com>
Subject: Re: [PATCH V9 03/18] PCI: Add weak pcibios_iov_resource_size() interface
Date: Wed, 19 Nov 2014 10:23:50 -0700
Message-ID: <20141119172350.GC23467@google.com>
In-Reply-To: <20141119092740.GA12872@richard>

On Wed, Nov 19, 2014 at 05:27:40PM +0800, Wei Yang wrote:
> On Tue, Nov 18, 2014 at 09:26:01PM -0700, Bjorn Helgaas wrote:
> >On Wed, Nov 19, 2014 at 11:21:00AM +0800, Wei Yang wrote:
> >> On Wed, Nov 19, 2014 at 01:15:32PM +1100, Benjamin Herrenschmidt wrote:
> >> >On Tue, 2014-11-18 at 18:12 -0700, Bjorn Helgaas wrote:

> >> But the HW
> >> must map 256 segments of the same size.  This leads to a situation like
> >> this:
> >> 
> >>    +------+------+        +------+------+------+------+
> >>    |VF#0  |VF#1  |   ...  |      |VF#N-1|PF#A  |PF#B  |
> >>    +------+------+        +------+------+------+------+
> >> 
> >> Suppose N = 254 and the HW maps these 256 segments to their corresponding PE#s.
> >
> >I guess these 256 segments are regions of CPU physical address space, and
> >they are being mapped to bus address space?  Is there some relationship
> >between a PE and part of the bus address space?
> >
> 
> A PE is an entity for EEH, which may include a whole bus or a single PCI device.

Yes, I've read that many times.  What's missing is the connection between a
PE and the things in the PCI specs (buses, devices, functions, MMIO address
space, DMA, MSI, etc.).  Presumably the PE structure imposes constraints on
how the core uses the standard PCI elements, but we don't really have a
clear description of those constraints yet.

> When a device encounters an error, we need to identify which PE it belongs
> to, so we have some HW to map between PE# and MMIO/DMA/MSI addresses.
> 
> The HW mentioned in the previous mail is the one that maps MMIO addresses to
> a PE#, and this HW must map the range as 256 equal segments.  And yes, this
> is mapped to bus address space.
> ...
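
If I'm following, the mapping HW is then just a fixed division of the
MMIO window.  Something like this sketch (the names here are made up
for illustration; this is not the real PHB register interface):

  #include <linux/types.h>

  #define NUM_PE_SEGMENTS 256

  /*
   * Hypothetical: the HW splits one MMIO window into 256 equal
   * segments and routes segment i to PE# i.
   */
  static unsigned int mmio_addr_to_pe(u64 addr, u64 window_base,
                                      u64 window_size)
  {
          u64 segment_size = window_size / NUM_PE_SEGMENTS;

          return (addr - window_base) / segment_size;
  }

So a 4KB VF BAR forces the window (and hence the PF resource) to cover
256 * 4KB, which is where the expansion below comes from.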

> >> The difference after our expansion is that the IOV BAR size is 256*4KB
> >> instead of 16KB.  So it will look like this:
> >> 
> >>   PF  pci_dev->resource[7] = [mem 0x00000000-0x000fffff] (1024KB)
> >
> >Is the idea that you want this resource to be big enough to cover all 256
> >segments?  I think I'm OK with increasing the size of the PF resources to
> >prevent overlap.  That part shouldn't be too ugly.
> >
> 
> Yes, big enough to cover all 256 segments.
> 
> Sorry for making it ugly :-(

I didn't mean that what you did was ugly.  I meant that increasing the size
of the PF resource can be done cleanly.
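
(Concretely, I assume the expanded size is just the per-VF segment size
times the number of PE segments, e.g. 4KB * 256 = 1024KB, which matches
the resource[7] size in your example above.)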

By the way, when you do this, it would be nice if the dmesg showed the
standard PF IOV BAR sizing, and then a separate line showing the resource
expansion to deal with the PE constraints.  I don't think even the standard
output is very clear -- I think we currently get something like this:

  pci 0000:00:00.0 reg 0x174: [mem 0x00000000-0x00000fff]

But that is only the size of a single VF BAR aperture.  Then sriov_init()
multiplies that by the number of possible VFs, but I don't think we print
the overall size of that PF resource.  I think we should, because it's
misleading to print only the smaller piece.  Maybe something like this:

  pci 0000:00:00.0 VF BAR0: [mem 0x00000000-0x00003fff] (for 4 VFs)

And then you could do something like:

  pci 0000:00:00.0 VF BAR0: [mem 0x00000000-0x000fffff] (expanded for PE alignment)
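
In sriov_init(), that could be something like the following (just a
sketch; the variable names are illustrative, and the exact wording is
up for grabs):

  struct resource *res = &dev->resource[i + PCI_IOV_RESOURCES];

  dev_info(&dev->dev, "VF BAR%d: %pR (for %d VFs)\n", i, res, total_VFs);

%pR already prints the resource range, so once the resource itself is
expanded, the message would show the expanded size for free.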

> >>   VF1 pci_dev->resource[0] = [mem 0x00000000-0x00000fff]
> >>   VF2 pci_dev->resource[0] = [mem 0x00001000-0x00001fff]
> >>   VF3 pci_dev->resource[0] = [mem 0x00002000-0x00002fff]
> >>   VF4 pci_dev->resource[0] = [mem 0x00003000-0x00003fff]
> >>   ...
> >>   with the remaining 252 4KB segments left unused.
> >> 
> >> So the start address and size of each VF will not change, but the PF's IOV
> >> BAR will be expanded.
> >
> >I'm really dubious about this change to use pci_iov_resource_size().  I
> >think you might be doing that because if you increase the PF resource size,
> >dividing that increased size by total_VFs will give you garbage.  E.g., in
> >the example above, you would compute "size = 1024KB / 4", which would make
> >the VF BARs appear to be 256KB instead of 4KB as they should be.
> 
> Yes, your understanding is correct.
> 
> >I think it would be better to solve that problem by decoupling the PF
> >resource size and the VF BAR size.  For example, we could keep track of the
> >VF BAR size explicitly in struct pci_sriov, instead of computing it from
> >the PF resource size and total_VFs.  This would keep the VF BAR size
> >completely platform-independent.
> 
> Hmm... this is another solution.
> 
> If you prefer this one, I will make a change accordingly.

Yes, I definitely prefer to track the VF BAR size explicitly.  I think that
will make the code much clearer.
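
Something along these lines (a sketch, not a patch; "barsz" is just a
placeholder name):

  /* in struct pci_sriov */
  resource_size_t barsz[PCI_SRIOV_NUM_BARS];    /* VF BAR size */

  /* filled in once by sriov_init(), so the query becomes a lookup: */
  resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
  {
          if (!dev->is_physfn)
                  return 0;

          return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
  }

Then the VF BAR size stays correct no matter how a platform decides to
size the PF resource.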

Bjorn
