Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: Serge Semin <fancer.lancer@gmail.com>,
	linux-kernel@vger.kernel.org, linux-ntb@googlegroups.com,
	linux-pci@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-kselftest@vger.kernel.org, Jon Mason <jdmason@kudzu.us>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Joerg Roedel <joro@8bytes.org>, Allen Hubbe <allenbh@gmail.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Eric Pilmore <epilmore@gigaio.com>
Subject: Re: [PATCH v2 07/12] NTB: Introduce functions to calculate multi-port resource index
Date: Wed, 6 Mar 2019 12:11:11 -0700
Message-ID: <bd72f24f-5982-0fe7-59df-2fbbfe9f798a@deltatee.com> (raw)
In-Reply-To: <20190306012420.wjeatxgb7nwq3j5q@mobilestation>



On 2019-03-05 6:24 p.m., Serge Semin wrote:
>> + * In a 5 peer system, this function will return the following matrix
>> + *
>> + * pidx \ port    0    1    2    3    4
>> + * 0              0    0    1    2    3
>> + * 1              0    1    2    3    4
>> + * 2              0    1    2    3    4
>> + * 3              0    1    2    3    4
>> + *

Oh, first, oops: looks like I copied this down wrong anyway; the code
was what I had intended, but the documented example should have been:

pidx \ local_port     0    1    2    3    4
 0                    0    0    1    2    3
 1                    0    1    1    2    3
 2                    0    1    2    2    3
 3                    0    1    2    3    3

And this is definitely the correct table we are aiming for.
ntb_peer_resource_idx() is supposed to return the result of
ntb_peer_port_idx(ntb, local_port) when run on the peer specified by pidx.

Note: this table also makes sense because it only uses 4 resources for 5
ports which is the best case scenario. (In other words, to communicate
between N ports, N-1 resources are required on each peer).

> This table is too simplified to represent a generic case of port-index
> mapping table. In particular the IDT PCIe switch got it ports numbered
> with uneven integers like: 0 2 4 6 8 12 16 20 or 0 8 16, and so on.
> Moreover some of the ports might be disabled or may have NTB functions
> deactivated, in which case these ports shouldn't be considered by NTB subsystem
> at all. Basically we may have any increasing subset of that port
> numbers depending on the current IDT PCIe-switch ports setup.

Yes, I did not consider situations where there would be gaps in the
"port number" space. It wasn't at all clear from the code that this was
possible. Switchtec hardware could be configured for such an
arrangement, but I don't know why anyone would do that as it just
needlessly complicates everything.

As you point out, with a gap, we end up with something that is wrong:

pidx \ port     0    1    3    4    5
 0              0    0    2    3    4
 1              0    1    2    3    4
 2              0    1    3    3    4
 3              0    1    3    4    4

Here, the relationship between ntb_peer_resource_idx() and
ntb_peer_port_idx() is not maintained and it seems to prescribe 5
resources for 5 ports. If there were more gaps it would be even more wrong.

>> +static inline int ntb_peer_resource_idx(struct ntb_dev *ntb, int pidx)
>> +{
>> +	int local_port, peer_port;
>> +
>> +	if (pidx >= ntb_peer_port_count(ntb))
>> +		return -EINVAL;
>> +
>> +	local_port = ntb_port_number(ntb);
>> +	peer_port = ntb_peer_port_number(ntb, pidx);
>> +
>> +	if (peer_port < local_port)
>> +		return local_port - 1;
>> +	else
>> +		return local_port;
>> +}
>> +
> 
> Instead of redefining the port-index table we can just fix the
> ntb_peer_resource_idx() method, so it would return a global port index
> instead of some number based on the port number. It can be done just by
> the next modification:
> 
> +     if (peer_port <= local_port)
> +             return pidx;
> +     else
> +             return pidx + 1;
> 

This creates a table that looks like:

pidx \ port     0    1    2    3    4
 0              1    0    0    0    0
 1              2    2    1    1    1
 2              3    3    3    2    2
 3              4    4    4    4    3

Which is not correct. In fact, it seems to require 5 resources for 5
ports. This appears to be what is done in the current ntb_perf and I
think I figured it out several months ago but it's way too messy and
hard to understand and I don't want to spend the time to figure it out
again.

IMO, in order to support gaps, we'd need to, on some layer, create an
un-gapped numbering scheme for the ports. I think the easiest thing is
to just have Logical and Physical port numbers; so we would have
something like:

Physical Port Number: 0  2  4  6  8  12  16  20
Logical Port Number:  0  1  2  3  4  5   6   7
Peer Index (Port 0):  x  0  1  2  3  4   5   6
Port Index (Port 8):  0  1  2  3  x  4   5   6
(etc)

Where the Physical Port Number is whatever the hardware uses and the
logical port number is a numbering scheme starting with zero with no
gaps. Then the port indexes are still as we currently have them. If we
say that the port numbers we have now are the Logical Port Number, then
ntb_peer_resource_idx() is correct.

I would strongly argue that the clients don't need to know anything
about the Physical Port Number and these should be handled strictly
inside the drivers. If multiple drivers need to do something similar to
map the logical to physical port numbers then we should introduce helper
functions to allow them to do so. If the Physical Numbers are not
contained in the driver than the API would need to be expanded to expose
which numbers are actually used to avoid needing to constantly loop
through all the indexes to find this out.

On a similar vein, I'd  suggest that most clients shouldn't even really
need to do anything with the Logical Port Number and should deal largely
with Port Indexes. Ideally, Logical Port Numbers should only be used by
helper code in the common layer to help assign resources used by the
clients (like ntb_peer_resource_idx()).

This world view isn't far off from what we have now, though you *may*
need to adjust your IDT driver and we will have to eventually clean up
the existing test clients to use the new helper functions.

> Personally I'd prefer the first solution even though it may lead to the
> "Unsupported TLP" errors and cause a greater code changes. Here is why:
> 1) the error might be IDT-specific, so we shouldn't limit the API due to
> one particular hardware peculiarity,
> 2) port-index table with global indexes implementation shall simplify the IDT
> NTB hw driver and provide a cleaner NTB API with simpler shared resources
> utilization code.

> The final decision is after the NTB subsystem maintainers. If they agree with
> solution #1 I'll send a corresponding patchset on this week, so you can
> alter this patchset being based on it.

I think what we have right now is close enough and we just have to clean
up the code and fix things. I don't think we need to do another big
change to the semantics. I *certainly* don't want to risk breaking
everything again to do it.
 Logan

  reply index

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-13 17:54 [PATCH v2 00/12] Support using MSI interrupts in ntb_transport Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 01/12] iommu/vt-d: Implement dma_[un]map_resource() Logan Gunthorpe
2019-02-13 17:57   ` Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 02/12] NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 03/12] iommu/vt-d: Add helper to set an IRTE to verify only the bus number Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 04/12] iommu/vt-d: Allow interrupts from the entire bus for aliased devices Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 05/12] PCI/MSI: Support allocating virtual MSI interrupts Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 06/12] PCI/switchtec: Add module parameter to request more interrupts Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 07/12] NTB: Introduce functions to calculate multi-port resource index Logan Gunthorpe
2019-03-06  1:24   ` Serge Semin
2019-03-06 19:11     ` Logan Gunthorpe [this message]
2019-03-06 22:45       ` Serge Semin
2019-03-06 23:22         ` Logan Gunthorpe
2019-03-12 20:42           ` Serge Semin
2019-03-12 21:30             ` Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 08/12] NTB: Rename ntb.c to support multiple source files in the module Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 09/12] NTB: Introduce MSI library Logan Gunthorpe
2019-03-06 20:26   ` Serge Semin
2019-03-06 21:35     ` Logan Gunthorpe
2019-03-06 23:13       ` Serge Semin
2019-02-13 17:54 ` [PATCH v2 10/12] NTB: Introduce NTB MSI Test Client Logan Gunthorpe
2019-03-06 20:44   ` Serge Semin
2019-03-06 21:39     ` Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 11/12] NTB: Add ntb_msi_test support to ntb_test Logan Gunthorpe
2019-02-13 17:54 ` [PATCH v2 12/12] NTB: Add MSI interrupt support to ntb_transport Logan Gunthorpe
2019-02-26  9:34 ` [PATCH v2 00/12] Support using MSI interrupts in ntb_transport Joerg Roedel
2019-02-26 16:18   ` Logan Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd72f24f-5982-0fe7-59df-2fbbfe9f798a@deltatee.com \
    --to=logang@deltatee.com \
    --cc=allenbh@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=dave.jiang@intel.com \
    --cc=epilmore@gigaio.com \
    --cc=fancer.lancer@gmail.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jdmason@kudzu.us \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-ntb@googlegroups.com \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git