linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Wei Yang <weiyang@linux.vnet.ibm.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org, Wei Yang <weiyang@linux.vnet.ibm.com>,
	benh@au1.ibm.com, linuxppc-dev@lists.ozlabs.org,
	gwshan@linux.vnet.ibm.com
Subject: Re: [PATCH V9 08/18] powrepc/pci: Refactor pci_dn
Date: Thu, 20 Nov 2014 15:20:57 +0800	[thread overview]
Message-ID: <20141120072057.GC8562@richard> (raw)
In-Reply-To: <20141119233024.GF23467@google.com>

On Wed, Nov 19, 2014 at 04:30:24PM -0700, Bjorn Helgaas wrote:
>On Sun, Nov 02, 2014 at 11:41:24PM +0800, Wei Yang wrote:
>> From: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> 
>> pci_dn is the extension of PCI device node and it's created from
>> device node. Unfortunately, VFs that are enabled dynamically by
>> PF's driver and they don't have corresponding device nodes, and
>> pci_dn. The patch refactors pci_dn to support VFs:
>> 
>>    * pci_dn is organized as a hierarchy tree. VF's pci_dn is put
>>      to the child list of pci_dn of PF's bridge. pci_dn of other
>>      device put to the child list of pci_dn of its upstream bridge.
>> 
>>    * VF's pci_dn is expected to be created dynamically when applying
>>      final fixup to PF. VF's pci_dn will be destroyed when releasing
>>      PF's pci_dev instance. pci_dn of other device is still created
>>      from device node as before.
>> 
>>    * For one particular PCI device (VF or not), its pci_dn can be
>>      found from pdev->dev.archdata.firmware_data, PCI_DN(devnode),
>>      or parent's list. The fast path (fetching pci_dn through PCI
>>      device instance) is populated during early fixup time.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>> ...
>
>> +struct pci_dn *add_dev_pci_info(struct pci_dev *pdev)
>> +{
>> +#ifdef CONFIG_PCI_IOV
>> +	struct pci_dn *parent, *pdn;
>> +	int i;
>> +
>> +	/* Only support IOV for now */
>> +	if (!pdev->is_physfn)
>> +		return pci_get_pdn(pdev);
>> +
>> +	/* Check if VFs have been populated */
>> +	pdn = pci_get_pdn(pdev);
>> +	if (!pdn || (pdn->flags & PCI_DN_FLAG_IOV_VF))
>> +		return NULL;
>> +
>> +	pdn->flags |= PCI_DN_FLAG_IOV_VF;
>> +	parent = pci_bus_to_pdn(pdev->bus);
>> +	if (!parent)
>> +		return NULL;
>> +
>> +	for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
>> +		pdn = add_one_dev_pci_info(parent, NULL,
>> +					   pci_iov_virtfn_bus(pdev, i),
>> +					   pci_iov_virtfn_devfn(pdev, i));
>
>I'm not sure this makes sense, but I certainly don't know this code, so
>maybe I'm missing something.
>
>pci_iov_virtfn_bus() and pci_iov_virtfn_devfn() depend on
>pdev->sriov->stride and pdev->sriov->offset.  These are read from VF Stride
>and First VF Offset in the SR-IOV capability by sriov_init(), which is
>called before add_dev_pci_info():
>
>  pci_scan_child_bus
>    pci_scan_slot
>      pci_scan_single_device
>	pci_device_add
>	  pci_init_capabilities
>	    pci_iov_init(PF)
>	      sriov_init(PF, pos)
>		pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, 0)
>		pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, &offset)
>		pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, &stride)
>		iov->offset = offset
>		iov->stride = stride
>
>  pci_bus_add_devices
>    pci_bus_add_device
>      pci_fixup_device(pci_fixup_final)
>	add_dev_pci_info
>	  pci_iov_virtfn_bus
>	    return ... + sriov->offset + (sriov->stride * id) ...
>
>But both First VF Offset and VF Stride change when ARI Capable Hierarchy or
>NumVFs changes (SR-IOV spec sec 3.3.9, 3.3.10).  We set NumVFs to zero in
>sriov_init() above.  We will change NumVFs to something different when a
>driver calls pci_enable_sriov():
>
>  pci_enable_sriov
>    sriov_enable
>      pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn)
>
>Now First VF Offset and VF Stride have changed from what they were when we
>called pci_iov_virtfn_bus() above.
>

Oops, I see the ARI would affect those value, while missed the NumVFs also
would.

Let's look at the problem one by one.

1. The ARI capability.
===============================================================================
The kernel initialize the capability like this:

pci_init_capabilities()
	pci_configure_ari()
	pci_iov_init()
		iov->offset = offset
		iov->stride = stride

When offset/stride is retrieved at this point, the ARI capability is taken
into consideration.

2. The PF's NumVFs field
===============================================================================
2.1 Potential problem in current code
===============================================================================
First, is current pci code has some potential problem?

sriov_enable()
	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &offset);
	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &stride);
	iov->offset = offset;
	iov->stride = stride;
	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn);
	virtfn_add()
		...
		virtfn->devfn = pci_iov_virtfn_devfn(dev, id);

The sriov_enable() retrieve the offset/stride then write the NumVFs. According
to the SPEC, at this moment the offset/stride may change. While I don't see
some code to retrieve and store those value again. And these fields will be
used in virtfn_add().

If my understanding is correct, I suggest to move the retrieve and store
operation after NumVFs is written.

2.2 The IOV bus range may not be correct in pci_scan_child_bus()?
===============================================================================
In current pci core, when enumerating the pci tree, we do like this:

pci_scan_child_bus()
	pci_scan_slot()
		pci_scan_single_device()
			pci_device_add()
				pci_init_capabilities()
					pci_iov_init()
	max += pci_iov_bus_range(bus);
		busnr = pci_iov_virtfn_bus(dev, dev->sriov->total_VFs - 1);
	max = pci_scan_bridge(bus, dev, max, pass);

>From this point, we see pci core reserve some bus range for VFs. This
calculation is based on the offset/stride at this moment. And do the
enumeration with the new bus number.

sriov_enable() could be called several times from driver to enable SRIOV, and
with different nr_virtfn. If each time the NumVFs written, the offset/stride
will change. This means we may try to address an extra bus we didn't reserve?
Or this means it is out of control?

Do I miss something?

2.3 How can I reserve bus range in FW?
===============================================================================
This question comes from the previous one.

Based on my understanding, current pci core will rely on the bus number in HW
if pcibios_assign_all_busses() is not set. If we want to support those VFs
sits on different bus with PF, we need to reserve bus range and write the
correct secondary/subordinate in bridge. Otherwise, those VFs on different bus
may not be addressed.

Currently I am writing the code in FW to reserve the range with the same
mechanism in pci core. While as you mentioned the offset/stride may change
after sriov_enable(), I am confused whether this is the correct way.

2.4 The potential problem for [Patch 08/18]
===============================================================================
According to the SPEC, the offset/stride will change after each
sriov_enable(). This means the bus/devfn will change after each
sriov_enable().

My current thought is to fix it up in virtfn_add(). If the total VF number
will not change, we could create those pci_dn at the beginning and fix the
bus/devfn at each time the VF is truely created.

-- 
Richard Yang
Help you, Help me

  parent reply	other threads:[~2014-11-20  7:21 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-02 15:41 [PATCH V9 00/18] Enable SRIOV on PowerNV Wei Yang
2014-11-02 15:41 ` [PATCH V9 01/18] PCI/IOV: Export interface for retrieve VF's BDF Wei Yang
2014-11-19 23:35   ` Bjorn Helgaas
2014-11-02 15:41 ` [PATCH V9 02/18] PCI: Add weak pcibios_iov_resource_alignment() interface Wei Yang
2014-11-02 15:41 ` [PATCH V9 03/18] PCI: Add weak pcibios_iov_resource_size() interface Wei Yang
2014-11-19  1:12   ` Bjorn Helgaas
2014-11-19  2:15     ` Benjamin Herrenschmidt
2014-11-19  3:21       ` Wei Yang
2014-11-19  4:26         ` Bjorn Helgaas
2014-11-19  9:27           ` Wei Yang
2014-11-19 17:23             ` Bjorn Helgaas
2014-11-19 20:51               ` Benjamin Herrenschmidt
2014-11-20  5:40                 ` Wei Yang
2014-11-20  5:39               ` Wei Yang
2014-11-02 15:41 ` [PATCH V9 04/18] PCI: Take additional PF's IOV BAR alignment in sizing and assigning Wei Yang
2014-11-02 15:41 ` [PATCH V9 05/18] powerpc/pci: Add PCI resource alignment documentation Wei Yang
2014-11-02 15:41 ` [PATCH V9 06/18] powerpc/pci: Don't unset pci resources for VFs Wei Yang
2014-11-02 15:41 ` [PATCH V9 07/18] powerpc/pci: Define pcibios_disable_device() on powerpc Wei Yang
2014-11-02 15:41 ` [PATCH V9 08/18] powrepc/pci: Refactor pci_dn Wei Yang
2014-11-19 23:30   ` Bjorn Helgaas
2014-11-20  1:02     ` Gavin Shan
2014-11-20  7:25       ` Wei Yang
2014-11-20  7:20     ` Wei Yang [this message]
2014-11-20 19:05       ` Bjorn Helgaas
2014-11-21  0:04         ` Gavin Shan
2014-11-25  9:28           ` Wei Yang
2014-11-21  1:46         ` Wei Yang
2014-11-02 15:41 ` [PATCH V9 09/18] powerpc/pci: remove pci_dn->pcidev field Wei Yang
2014-11-02 15:41 ` [PATCH V9 10/18] powerpc/powernv: Use pci_dn in PCI config accessor Wei Yang
2014-11-02 15:41 ` [PATCH V9 11/18] powerpc/powernv: Allocate pe->iommu_table dynamically Wei Yang
2014-11-02 15:41 ` [PATCH V9 12/18] powerpc/powernv: Expand VF resources according to the number of total_pe Wei Yang
2014-11-02 15:41 ` [PATCH V9 13/18] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv Wei Yang
2014-11-02 15:41 ` [PATCH V9 14/18] powerpc/powernv: Implement pcibios_iov_resource_size() " Wei Yang
2014-11-02 15:41 ` [PATCH V9 15/18] powerpc/powernv: Shift VF resource with an offset Wei Yang
2014-11-02 15:41 ` [PATCH V9 16/18] powerpc/powernv: Allocate VF PE Wei Yang
2014-11-02 15:41 ` [PATCH V9 17/18] powerpc/powernv: Expanding IOV BAR, with m64_per_iov supported Wei Yang
2014-11-02 15:41 ` [PATCH V9 18/18] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3 Wei Yang
2014-11-18 23:11 ` [PATCH V9 00/18] Enable SRIOV on PowerNV Gavin Shan
2014-11-18 23:40   ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141120072057.GC8562@richard \
    --to=weiyang@linux.vnet.ibm.com \
    --cc=benh@au1.ibm.com \
    --cc=bhelgaas@google.com \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).