linux-kernel.vger.kernel.org archive mirror
* pci_set_mwi() ... why isn't it used more?
@ 2003-01-20 18:41 David Brownell
  2003-01-20 19:00 ` Jeff Garzik
  0 siblings, 1 reply; 10+ messages in thread
From: David Brownell @ 2003-01-20 18:41 UTC (permalink / raw)
  To: linux-kernel

I was looking at some new hardware and noticed that it's
got explicit support for the PCI Memory Write and Invalidate
command ... enabled (in part) under Linux by pci_set_mwi().

However, very few Linux drivers use that routine.  Given
that it can lead to improved performance, and that devices
don't have to implement that enable bit, I'm curious what
the story is...

  - Just laziness or lack-of-education on the part of
    driver writers?

  - Iffy support in motherboard chipsets or CPUs?  If so,
    which ones?

  - Flakey support in PCI devices, so that enabling it
    leads to trouble?

  - Something else?

  - Combination of all the above?

Briefly, MWI can avoid some cache flushes, thereby reducing
memory bus contention.  It can also enable longer PCI bursts
(since the dma master won't stop writing mid-cacheline).

And calling pci_set_mwi() makes sure that the device knows
the correct cache line size, which can make Memory Read
Multiple (and Memory Read Line) commands work better (also
with longer PCI bursts) by hinting to bridges when prefetch
would be a Fine Thing ... likewise reducing memory bus
contention.  Those benefits can happen even if the hardware
doesn't support MWI; on my systems I noticed that the
cacheline size is always set too small by default, which
seems like a PCI initialization bug.
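
Easy to check, by the way; from a driver, a sketch like this
dumps what firmware left behind:

	u8 clsz;

	pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &clsz);
	/* register counts 32-bit words: 0x08 == 32 bytes, 0x20 == 128 */
	printk(KERN_INFO "%s: PCI cacheline size = %d bytes\n",
		dev->slot_name, clsz * 4);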

So what's the story ... is there some reason Linux isn't
trying to enable such PCI features more often?  And why
it doesn't set the cacheline size correctly by default?

- Dave



* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-20 18:41 pci_set_mwi() ... why isn't it used more? David Brownell
@ 2003-01-20 19:00 ` Jeff Garzik
  2003-01-20 19:37   ` David Brownell
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2003-01-20 19:00 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-kernel

On Mon, Jan 20, 2003 at 10:41:35AM -0800, David Brownell wrote:
> I was looking at some new hardware and noticed that it's
> got explicit support for the PCI Memory Write and Invalidate
> command ... enabled (in part) under Linux by pci_set_mwi().
> 
> However, very few Linux drivers use that routine.  Given
> that it can lead to improved performance, and that devices
> don't have to implement that enable bit, I'm curious what
> the story is...
> 
>  - Just laziness or lack-of-education on the part of
>    driver writers?
> 
>  - Iffy support in motherboard chipsets or CPUs?  If so,
>    which ones?
> 
>  - Flakey support in PCI devices, so that enabling it
>    leads to trouble?
> 
>  - Something else?
> 
>  - Combination of all the above?

You missed the reason entirely ;-)

pci_set_mwi() is brand new, I just added it.  Hasn't filtered down to
drivers yet.  The few drivers that cared prior to its addition, like
drivers/net/acenic.c, just hand-coded the workarounds needed for proper
MWI support on all chipsets.

pci_set_mwi() would not exist at all, were it not for the existing
hardware quirks.  (If hardware were sane, drivers would just
individually twiddle the _INVALIDATE bit in PCI_COMMAND, and never call
functions other than pci_{read,write}_config_word.)
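
On sane hardware that would be, roughly, just:

	u16 cmd;

	pci_read_config_word(dev, PCI_COMMAND, &cmd);
	pci_write_config_word(dev, PCI_COMMAND, cmd | PCI_COMMAND_INVALIDATE);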

	Jeff

* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-20 19:00 ` Jeff Garzik
@ 2003-01-20 19:37   ` David Brownell
  2003-01-30 13:52     ` Anton Blanchard
  0 siblings, 1 reply; 10+ messages in thread
From: David Brownell @ 2003-01-20 19:37 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

Jeff Garzik wrote:
> On Mon, Jan 20, 2003 at 10:41:35AM -0800, David Brownell wrote:
> 
>>I was looking at some new hardware and noticed that it's
>>got explicit support for the PCI Memory Write and Invalidate
>>command ... enabled (in part) under Linux by pci_set_mwi().
>>
>>However, very few Linux drivers use that routine.  Given
>>that it can lead to improved performance, and that devices
>>don't have to implement that enable bit, I'm curious what
>>the story is...
> 
> You missed the reason entirely ;-)

What, with a "covers everything" choice like "something else"? ;)


But to confirm:  you're saying there's no particular reason not to
use it pretty generally?  (Or at least, no known reason?)

I'd mostly be concerned about potential bridge/cpu chipset problems,
since those are the class of problems I'd have very little chance
of noticing, with only a handful of test platforms.  If individual
devices have broken MWI, it'd be easy for their drivers not to enable it.
But if they have to cope with buggy platform implementations...

I suppose the potential for broken PCI devices is exactly why MWI
isn't automatically enabled when DMA mastering is enabled, though
I don't understand why the cacheline size doesn't get fixed then
(unless it's that same issue).  Devices can use the cacheline size
to get better Memory Read Line/Read Multiple throughput; setting
it shouldn't be tied exclusively to enabling MWI.


> pci_set_mwi() is brand new, I just added it.  Hasn't filtered down to
> drivers yet.  The few drivers that cared prior to its addition, like
> drivers/net/acenic.c, just hand-coded the workarounds needed for proper
> MWI support on all chipsets.

Yep, I noticed that it grew from acenic.  Didn't check back too many
kernel revs though, I guess "new" is relative ... 2.4 and 2.5 both
have it today.


> pci_set_mwi() would not exist at all, were it not for the existing
> hardware quirks.  (if hardware were sane, drivers would just
> individually twiddle the _INVALIDATE bit in PCI_COMMAND, and never call
> functions other than pci_{read,write}_config_word.

Actually I sort of prefer having the extra logic (set cacheline size,
twiddle that bit) out of drivers; there's no reason to have two copies
of that, particularly given there's already one arch-specific tweak.

Not that it's complex code, but it's easier for driver writers to
just know "call pci_set_mwi() if you're using DMA, unless you know
the hardware is buggy in that way" than to replicate its logic.
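
Probe code then reduces to something like this sketch:

	if (pci_enable_device(pdev))
		return -ENODEV;
	pci_set_master(pdev);
	pci_set_mwi(pdev);	/* on failure we just don't get MWI bursts */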

- Dave

* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-20 19:37   ` David Brownell
@ 2003-01-30 13:52     ` Anton Blanchard
  2003-01-30 16:25       ` David Brownell
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Blanchard @ 2003-01-30 13:52 UTC (permalink / raw)
  To: David Brownell; +Cc: Jeff Garzik, linux-kernel

 
> I suppose the potential for broken PCI devices is exactly why MWI
> isn't automatically enabled when DMA mastering is enabled, though
> I don't understand why the cacheline size doesn't get fixed then
> (unless it's that same issue).  Devices can use the cacheline size
> to get better Memory Read Line/Read Multiple throughput; setting
> it shouldn't be tied exclusively to enabling MWI.

There are a number of cases where we can't enable MWI, either because
the PCI card doesn't support the CPU cacheline size or because we have
to set the PCI card cacheline size to something smaller due to various bugs.

E.g. I understand earlier versions of the e100 don't support a 128 byte
cacheline (and the top bits are read only, so setting it up for 128 bytes
will result in it being set to 0).  Not good for read multiple/line,
and even worse if we decide to enable MWI :)

Anton


* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-30 13:52     ` Anton Blanchard
@ 2003-01-30 16:25       ` David Brownell
  2003-01-30 16:59         ` Ivan Kokshaysky
  0 siblings, 1 reply; 10+ messages in thread
From: David Brownell @ 2003-01-30 16:25 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Jeff Garzik, linux-kernel

Anton Blanchard wrote:
>  
> 
>>I suppose the potential for broken PCI devices is exactly why MWI
>>isn't automatically enabled when DMA mastering is enabled, though
>>I don't understand why the cacheline size doesn't get fixed then
>>(unless it's that same issue).  Devices can use the cacheline size
>>to get better Memory Read Line/Read Multiple throughput; setting
>>it shouldn't be tied exclusively to enabling MWI.
> 
> 
> There are a number of cases where we can't enable MWI, either because
> the PCI card doesn't support the CPU cacheline size or because we have
> to set the PCI card cacheline size to something smaller due to various bugs.

At least the former case seems like it should be easily detectable.
The cacheline setting "must" be supported (sez the PCI spec) if MWI
can be enabled ... and may be supported in other cases too.


> E.g. I understand earlier versions of the e100 don't support a 128 byte
> cacheline (and the top bits are read only, so setting it up for 128 bytes
> will result in it being set to 0).  Not good for read multiple/line,
> and even worse if we decide to enable MWI :)

At least on 2.5.59, the pci_generic_prep_mwi() code doesn't check
for that type of error:  it just writes the cacheline size, and
doesn't verify that setting it worked as expected.  Checking for
that kind of problem would make it safer to call pci_set_mwi() in
such cases ... e.g. using it on a P4 with 128 byte L1 cachelines
would fail cleanly, while on an Athlon (64 byte L1) it might work
(depending on which top bits are unusable).
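
The check itself would be tiny; a sketch, untested:

	u8 want = SMP_CACHE_BYTES >> 2;		/* register counts dwords */
	u8 got;

	pci_write_config_byte(dev, PCI_CACHE_LINE_SIZE, want);
	pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &got);
	if (got != want)
		return -EINVAL;			/* leave MWI disabled */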

- Dave

* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-30 16:25       ` David Brownell
@ 2003-01-30 16:59         ` Ivan Kokshaysky
  2003-01-30 18:35           ` David Brownell
  0 siblings, 1 reply; 10+ messages in thread
From: Ivan Kokshaysky @ 2003-01-30 16:59 UTC (permalink / raw)
  To: David Brownell; +Cc: Anton Blanchard, Jeff Garzik, linux-kernel

On Thu, Jan 30, 2003 at 08:25:07AM -0800, David Brownell wrote:
> At least on 2.5.59, the pci_generic_prep_mwi() code doesn't check
> for that type of error:  it just writes the cacheline size, and
> doesn't verify that setting it worked as expected.  Checking for
> that kind of problem would make it safer to call pci_set_mwi() in
> such cases ... e.g. using it on a P4 with 128 byte L1 cachelines
> would fail cleanly, while on an Athlon (64 byte L1) it might work
> (depending on which top bits are unusable).

Hmm, what happens if you boot a kernel configured for 80386
on a P4 and enable MWI?

Ivan.


* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-30 16:59         ` Ivan Kokshaysky
@ 2003-01-30 18:35           ` David Brownell
  2003-01-30 23:34             ` Ivan Kokshaysky
  0 siblings, 1 reply; 10+ messages in thread
From: David Brownell @ 2003-01-30 18:35 UTC (permalink / raw)
  To: Ivan Kokshaysky; +Cc: Anton Blanchard, Jeff Garzik, linux-kernel

Ivan Kokshaysky wrote:
> 
> Hmm, what happens if you boot a kernel configured for 80386
> on a P4 and enable MWI?

Which answer is better?

  - It works safely:  the PCI cache line size is never less than the real one.

  - There's I/O-pattern dependent breakage caused by that config goof.

I think the first answer is better, but it looks like 2.5.59 will
set the pci cache line size to 16 bytes not 128 bytes in that case.

The generic pci code to set cacheline size uses SMP_CACHE_BYTES,
which is usually L1_CACHE_BYTES ... and looks incorrect at least on
non-SMP ia64.  Only sparc64 seems to have non-generic code to set
the line size, but it's done for all devices as they're probed.
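
Paraphrasing (not the literal source), the generic path boils down to:

	/*
	 * Compile-time constant, not the booted CPU's real line size;
	 * a kernel configured for 80386 has SMP_CACHE_BYTES == 16, so
	 * this writes 0x04 (16 bytes) even when running on a P4.
	 */
	pci_write_config_byte(dev, PCI_CACHE_LINE_SIZE, SMP_CACHE_BYTES >> 2);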

Maybe i386 should define HAVE_ARCH_PCI_MWI, smart enough to use the
actual CPU's cache line size (boot code saves that, yes?) and maybe
even check for that particular class of breakage Anton mentioned.

Another option would be to do like SPARC64 and set the cacheline
sizes as part of DMA enable (which is what I'd first thought of).
And have the breakage test in the ARCH_PCI_MWI code -- something
that sparc64 doesn't do, fwiw.

- Dave

* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-30 18:35           ` David Brownell
@ 2003-01-30 23:34             ` Ivan Kokshaysky
  2003-01-31  0:11               ` Jeff Garzik
  2003-01-31  0:51               ` David Brownell
  0 siblings, 2 replies; 10+ messages in thread
From: Ivan Kokshaysky @ 2003-01-30 23:34 UTC (permalink / raw)
  To: David Brownell
  Cc: Ivan Kokshaysky, Anton Blanchard, Jeff Garzik, linux-kernel

On Thu, Jan 30, 2003 at 10:35:25AM -0800, David Brownell wrote:
> I think the first answer is better, but it looks like 2.5.59 will
> set the pci cache line size to 16 bytes not 128 bytes in that case.

Yes, and it looks dangerous as the device would transfer incomplete
cache lines with MWI...

> Another option would be to do like SPARC64 and set the cacheline
> sizes as part of DMA enable (which is what I'd first thought of).
> And have the breakage test in the ARCH_PCI_MWI code -- something
> that sparc64 doesn't do, fwiw.

Actually I think there's nothing wrong with trying to be a bit
more aggressive with MWI and moving all of this into the generic
pci_set_master().
To do it safely, we need:
- a "broken_mwi" field of sorts in struct pci_dev for buggy devices;
  it can be set either by PCI quirks or by the driver before the
  pci_set_master() call;
- an arch-specific pci_cache_line_size() function/macro (instead of
  SMP_CACHE_BYTES) that returns either the actual CPU cache line size
  or some other safe value (including 0, which means "don't enable MWI");
- a check that the device does support the desired cache line size, i.e.
  read back the value we've written into the PCI_CACHE_LINE_SIZE
  register, and if it's zero (or dev->broken_mwi == 1) don't enable MWI.
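
In code, roughly (a sketch only; pci_cache_line_size() and broken_mwi
are the names proposed above, nothing that exists today):

	u8 lnsz = pci_cache_line_size();	/* bytes; 0 == don't try MWI */
	u8 got;
	u16 cmd;

	if (lnsz && !dev->broken_mwi) {
		/* the config register counts 32-bit words, not bytes */
		pci_write_config_byte(dev, PCI_CACHE_LINE_SIZE, lnsz >> 2);
		pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &got);
		if (got == (lnsz >> 2)) {
			pci_read_config_word(dev, PCI_COMMAND, &cmd);
			pci_write_config_word(dev, PCI_COMMAND,
					      cmd | PCI_COMMAND_INVALIDATE);
		}
	}
	/* ... then the existing bus master enable ... */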

Thoughts?

Ivan.


* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-30 23:34             ` Ivan Kokshaysky
@ 2003-01-31  0:11               ` Jeff Garzik
  2003-01-31  0:51               ` David Brownell
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Garzik @ 2003-01-31  0:11 UTC (permalink / raw)
  To: Ivan Kokshaysky; +Cc: David Brownell, Anton Blanchard, linux-kernel

Ivan Kokshaysky wrote:
> Thoughts?


Seems sane to me...  I agree it's aggressive, because you're changing
the behavior of pci_set_master().  But given that the MWI stuff in pci.c
is so new, I think that's probably ok.



* Re: pci_set_mwi() ... why isn't it used more?
  2003-01-30 23:34             ` Ivan Kokshaysky
  2003-01-31  0:11               ` Jeff Garzik
@ 2003-01-31  0:51               ` David Brownell
  1 sibling, 0 replies; 10+ messages in thread
From: David Brownell @ 2003-01-31  0:51 UTC (permalink / raw)
  To: Ivan Kokshaysky; +Cc: Anton Blanchard, Jeff Garzik, linux-kernel

Ivan Kokshaysky wrote:
> On Thu, Jan 30, 2003 at 10:35:25AM -0800, David Brownell wrote:
> 
>>I think the first answer is better, but it looks like 2.5.59 will
>>set the pci cache line size to 16 bytes not 128 bytes in that case.
> 
> 
> Yes, and it looks dangerous as the device would transfer incomplete
> cache lines with MWI...

... while invalidating complete lines -- yes, that's what I meant about
it causing breakage.


>>Another option would be to do like SPARC64 and set the cacheline
>>sizes as part of DMA enable (which is what I'd first thought of).
>>And have the breakage test in the ARCH_PCI_MWI code -- something
>>that sparc64 doesn't do, fwiw.

It also sets latencies ... one could get the impression that most
PCI code on Linux hasn't been tuned for bus utilization yet!


> Actually I think there's nothing wrong with trying to be a bit
> more aggressive with MWI and moving all of this into the generic
> pci_set_master().
> To do it safely, we need:
> - a "broken_mwi" field of sorts in struct pci_dev for buggy devices;
>   it can be set either by PCI quirks or by the driver before the
>   pci_set_master() call;
> - an arch-specific pci_cache_line_size() function/macro (instead of
>   SMP_CACHE_BYTES) that returns either the actual CPU cache line size
>   or some other safe value (including 0, which means "don't enable MWI");
> - a check that the device does support the desired cache line size, i.e.
>   read back the value we've written into the PCI_CACHE_LINE_SIZE
>   register, and if it's zero (or dev->broken_mwi == 1) don't enable MWI.
> 
> Thoughts?

Sounds plausible to me, but then I was asking about this because
it was evident there were a few more things going on here than I
was aware of.  That implies removing pci_{set,clear}_mwi() too.

I certainly agree that setting the cache line size automatically
should happen; and having that be correct means there's got to
be some arch-dependent code.  That alone might help improve
throughput on PCI_DMA_TODEVICE, even on MWI-incapable devices.

It might be appropriate to make aggressively setting MWI a
Kconfig (or boot) option.  That it "should not" be problematic
doesn't mean some 2.6 users won't find trouble.

Also, benchmarks from folk with really busy PCI busses would
be interesting.  All this stuff can be tweaked with "setpci",
so they could be generated without needing kernel patches.
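
(For instance -- device address made up -- "setpci -s 02:04.0
CACHE_LINE_SIZE=10" forces a 64 byte line, since that register
counts dwords.)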

- Dave
