All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin LaHaise <bcrl@kvack.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jon Mason <mason@myri.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@suse.de>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH 2/3] pci: Clamp pcie_set_readrq() when using "performance" settings
Date: Tue, 4 Oct 2011 12:44:30 -0400	[thread overview]
Message-ID: <20111004164430.GH19130@kvack.org> (raw)
In-Reply-To: <1317745197.29415.244.camel@pasglop>

On Tue, Oct 04, 2011 at 06:19:57PM +0200, Benjamin Herrenschmidt wrote:
> On Tue, 2011-10-04 at 11:59 -0400, Benjamin LaHaise wrote:
...
> > My concern, which I forgot to put in the original message is that allowing 
> > a bridge to have a large MPS than the endpoints will result in things 
> > failing when a large write occurs.  Aiui, we don't restrict the size of 
> > memcpy_toio() type functions, and there are PCIe devices which do not 
> > perform DMA.  Clamping MPS on the bridge is a requirement of correctness.
> 
> Yes, this is a problem if your bridge can generate large writes. Ours
> can't so this is safe. Our bridges can't write combine beyond 128 bytes.

> Originally, I had that whole logic implemented in arch specific code,
> but when Jon started fiddling with MPS/MRRS in generic code, I suggested
> to have the generic code have the option of doing what I want. Maybe we
> should remove it completely as a user-visible option and leave it back
> to platform code only. In any case, it isn't the default.

Ah, I see now.  Yes, if that can be put into arch specific code, it makes 
far more sense since things will work correctly assuming the bridge also 
limits read completions with data to 128 bytes?

> Our performance guys say that between 128 and 512 or 2048 which is what
> we face in some cases here, it's worth it. I haven't yet had a chance to
> verify by myself.

Just to give you an idea of the problems with generating small read 
read requests: the on-the-wire cost of sending a 1500 byte ethernet packet 
with 128 byte read requests is the difference between 1x 12-16 byte TLP 
on the transmit link of the endpoint vs 12x 12-16 byte TLPs (the overhead 
for a TLP is at least 8 bytes of header and CRC iirc, maybe more for the 
DLLP flag bytes) or going from 20-24 bytes per tx packet to 256-288 bytes 
of read requests.  This won't matter much for devices with lots of excess 
bandwidth, but for PCIe 1x devices (ie common gige chipsets) that really 
hurts.  It also means that only 2 or 3 packets can have read requests 
outstanding with the default limit of 32 tags, but hopefully the device 
supports extended tags.

		-ben

      reply	other threads:[~2011-10-04 16:44 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-03 14:50 [PATCH 2/3] pci: Clamp pcie_set_readrq() when using "performance" settings Jon Mason
2011-10-03 20:56 ` Linus Torvalds
2011-10-04 15:40   ` Benjamin Herrenschmidt
2011-10-04 15:48     ` Linus Torvalds
2011-10-04 15:56       ` Bjorn Helgaas
2011-10-04 16:08       ` Benjamin Herrenschmidt
2011-10-04 16:51         ` Linus Torvalds
2011-10-04 17:30           ` Benjamin Herrenschmidt
2011-10-04 17:36             ` Linus Torvalds
2011-10-05  7:01               ` Benjamin Herrenschmidt
2011-10-05 14:49                 ` Linus Torvalds
2011-10-05 16:26                   ` Jesse Barnes
2011-10-04 17:41             ` Benjamin LaHaise
2011-10-03 21:55 ` Jon Mason
2011-10-04 14:42   ` Benjamin LaHaise
2011-10-04 15:37     ` Benjamin LaHaise
2011-10-04 15:52     ` Benjamin Herrenschmidt
2011-10-04 15:59       ` Benjamin LaHaise
2011-10-04 16:19         ` Benjamin Herrenschmidt
2011-10-04 16:44           ` Benjamin LaHaise [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111004164430.GH19130@kvack.org \
    --to=bcrl@kvack.org \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=gregkh@suse.de \
    --cc=jbarnes@virtuousgeek.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mason@myri.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.