linux-kernel.vger.kernel.org archive mirror
* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
       [not found] <20030907195557.GK14436@fs.tum.de.suse.lists.linux.kernel>
@ 2003-09-07 20:30 ` Andi Kleen
  2003-09-07 21:39   ` Dave Jones
       [not found] ` <Pine.LNX.4.30.0309072228110.9987-100000@swamp.bayern.net.suse.lists.linux.kernel>
  1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2003-09-07 20:30 UTC
  To: Adrian Bunk; +Cc: marcelo.tosatti, linux-kernel

Adrian Bunk <bunk@fs.tum.de> writes:

> With CONFIG_M686 CONFIG_X86_L1_CACHE_SHIFT was set to 5, but a Pentium 4 
> requires 7.

It doesn't require 7, it just prefers 7. 

> The patch below does:
> - set CONFIG_X86_L1_CACHE_SHIFT 7 for all Intel processors (needed for 
>   the Pentium 4)
> - set CONFIG_X86_L1_CACHE_SHIFT 6 for the K6 (needed for the Athlon)

I think these changes should only be made when CONFIG_X86_GENERIC is set.

Otherwise the people who want kernels really optimized for their CPUs
won't get the full benefit. On UP it does not make that much difference,
but on an SMP kernel a bigger-than-needed cache line size wastes a lot
of memory.
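
To put rough numbers on that trade-off, here is a small Python sketch
(not kernel code). It assumes the usual relationship that the line size
in bytes is 1 << CONFIG_X86_L1_CACHE_SHIFT and that a cache-line-aligned
object is rounded up to a multiple of that size. The 40-byte object and
all helper names are made up for illustration.

```python
def l1_cache_bytes(shift):
    """Cache line size in bytes implied by a given L1 cache shift."""
    return 1 << shift


def padded_size(obj_size, shift):
    """Round obj_size up to the next multiple of the cache line size."""
    line = l1_cache_bytes(shift)
    return (obj_size + line - 1) & ~(line - 1)


def padding_waste(obj_size, shift):
    """Bytes of pure padding added when the object is cache-line aligned."""
    return padded_size(obj_size, shift) - obj_size


if __name__ == "__main__":
    obj_size = 40  # hypothetical size of a cache-line-aligned structure
    for shift in (5, 6, 7):
        print(shift, l1_cache_bytes(shift),
              padded_size(obj_size, shift), padding_waste(obj_size, shift))
```

With these assumed numbers, the same object occupies 64 bytes at shift 5
and 128 bytes at shift 7, so the padding alone grows from 24 to 88 bytes
per object.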

-Andi


* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-07 20:30 ` [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT Andi Kleen
@ 2003-09-07 21:39   ` Dave Jones
  2003-09-08  8:15     ` Peter Daum
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Jones @ 2003-09-07 21:39 UTC
  To: Andi Kleen; +Cc: Adrian Bunk, marcelo.tosatti, linux-kernel, Peter Daum

On Sun, Sep 07, 2003 at 10:30:52PM +0200, Andi Kleen wrote:
 > Adrian Bunk <bunk@fs.tum.de> writes:
 > 
 > > With CONFIG_M686 CONFIG_X86_L1_CACHE_SHIFT was set to 5, but a Pentium 4 
 > > requires 7.
 > It doesn't require 7, it just prefers 7. 

*nod*. This 'fix' also papers over the bug instead of fixing it.
Likely it's something like a network card driver setting its cacheline
size incorrectly. Peter, what NIC did you see the problem on?

I thought Ivan's PCI cacheline sizing fixes from 2.6
(see arch/i386/pci/common.c) already made it into 2.4,
but from a quick grep, it seems that didn't happen.

 > > The patch below does:
 > > - set CONFIG_X86_L1_CACHE_SHIFT 7 for all Intel processors (needed for 
 > >   the Pentium 4)
 > > - set CONFIG_X86_L1_CACHE_SHIFT 6 for the K6 (needed for the Athlon)
 > I think these changes should only be made when CONFIG_X86_GENERIC is set.
 > Otherwise the people who want kernels really optimized for their CPUs
 > won't get the full benefit. On UP it does not make that much difference,
 > but on an SMP kernel a bigger-than-needed cache line size wastes a lot
 > of memory.

ACK.

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk


* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
       [not found] ` <Pine.LNX.4.30.0309072228110.9987-100000@swamp.bayern.net.suse.lists.linux.kernel>
@ 2003-09-07 21:57   ` Andi Kleen
  0 siblings, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2003-09-07 21:57 UTC
  To: Peter Daum; +Cc: Marcelo Tosatti, Adrian Bunk, linux-kernel

peter_daum@t-online.de (Peter Daum) writes:

> ... actually, the problems also occurred when running on machines
> with Pentium II/Pentium Pro CPUs - even on these machines, I
> could only use kernels compiled with "CONFIG_MPENTIUM4".
> 
> Adrian's patch does fix these problems. What is amazing is that
> in kernel version 2.4.20, the same values were used for
> "CONFIG_X86_L1_CACHE_SHIFT". The problems that I described,
> however, occur only with 2.4.22 - the same machines with the same
> configuration work just fine with 2.4.20. Maybe there's
> something else involved, too?

Yes, it very much sounds like some memory corruption that is just
masked by the bigger cacheline padding.

Maybe you should try to compile with CONFIG_DEBUG_SLAB on 
and see if it triggers something?

The padding itself is a pure optimization; if it changes any behaviour,
that's a bug.

-Andi


* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-07 21:39   ` Dave Jones
@ 2003-09-08  8:15     ` Peter Daum
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Daum @ 2003-09-08  8:15 UTC
  To: Dave Jones; +Cc: Andi Kleen, Adrian Bunk, marcelo.tosatti, linux-kernel

Hi,

On Sun, 7 Sep 2003, Dave Jones wrote:

> *nod*. This 'fix' also papers over the bug instead of fixing it.
> Likely it's something like a network card driver setting its cacheline
> size incorrectly. Peter, what NIC did you see the problem on?

All the machines have Forerunner LE ATM NICs and use LAN
Emulation. I made an attempt to check whether the problems also occur
with ethernet, but for some reason the ethernet card also
didn't seem to work with 2.4.22. Maybe I should give this another
try ...

As mentioned, Adrian's patch for "CONFIG_X86_L1_CACHE_SHIFT"
seems to fix my current networking problems, but maybe the real
cause is something else.

Since somebody here mentioned "memory corruption": for years I
have been plagued by a bug somewhere in the ATM/LANE code
that causes the machines to crash from time to time (see
http://sourceforge.net/tracker/index.php?func=detail&aid=445059&group_id=7812&atid=107812).

I could not discover any pattern in when and under which
circumstances these crashes happen (usually, they occur
several months apart). Several times I managed to get at
least a stack trace, but the actual crashes occurred at different
places in the code (which, I guess, could mean that the real
problem is somebody overwriting somebody else's memory). Could
there be any connection?

If somebody has a good idea how to find out what is going on,
I'll be glad to investigate this further. At least, with my
current networking problems (see the thread "2.4.22 with
CONFIG_M686: networking broken") I have a test case ...

Regards,
               Peter Daum



* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-08 19:45       ` Manfred Spraul
@ 2003-09-09 14:49         ` Peter Daum
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Daum @ 2003-09-09 14:49 UTC
  To: Manfred Spraul; +Cc: Jeff Garzik, Jamie Lokier, Adrian Bunk, linux-kernel

On Mon, 8 Sep 2003, Manfred Spraul wrote:

> Context: Peter experiences very bad network performance with 2.4.22 - it
> looks like 99% packet drop or something like that. The packet drop
> disappears if CONFIG_X86_L1_CACHE_SHIFT is set to 7 (i.e. 128 byte cache
> line size). 2.4.21 works.
> The network cards are some kind of ATM cards. Several systems are
> affected - at least Pentium II and PPro systems.
>
> Peter: what's the exact brand and NIC driver that you use? Could you try
> to figure out what exactly breaks? I'd use "ping -f -s 1500", perhaps
> together with "tcpdump -s 1500 -x" on both ends.

Meanwhile, I could verify that the problems do not occur when I use
an ethernet network adapter. All the machines have Marconi/Fore Systems
Forerunner LE ATM cards (/proc/pci: ATM network controller: Integrated
Device Tech IDT77211 ATM Adapter (rev 3)) with LAN emulation.

Figuring out what exactly breaks is the hard part: there is no general
malfunction, I only found some particular test cases. The easiest of these
is: "wget ftp://ftp.nai.com/pub/datfiles/english/dat-4291.zip". With a
vanilla 2.4.22 kernel, the data connection dies with a timeout after
transferring some kbytes. When capturing the connection with tcpdump, the
only unusual thing I can discover is that the time interval between two
incoming data packets is unusually high. I don't have the slightest
idea what is really going wrong. Transferring data from some other ftp
server (e.g. ftp.kernel.org) works as usual.

As esoteric as it sounds, the whole issue is 100% reproducible and
disappears with CONFIG_X86_L1_CACHE_SHIFT set to 7. (The example with wget
is only the easiest test I could find - for my purposes, the fact that
sendmail and samba don't work correctly makes the kernel almost useless.)
"Ping" doesn't show anything unusual (no dropped packets). Since the
problem did not occur with older kernels, the "CONFIG_X86_L1_CACHE_SHIFT"
setting can hardly be the real problem.

Regards,
               Peter




* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-08 17:24     ` Jeff Garzik
@ 2003-09-08 19:45       ` Manfred Spraul
  2003-09-09 14:49         ` Peter Daum
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2003-09-08 19:45 UTC
  To: Jeff Garzik; +Cc: Jamie Lokier, Adrian Bunk, linux-kernel, peter_daum

Jeff Garzik wrote:

>Yes; I've lost the specific context of the thread, but I have been
>working on MWI/cacheline size issues along with IvanK for a while.
>  
>
Context: Peter experiences very bad network performance with 2.4.22 - it
looks like 99% packet drop or something like that. The packet drop
disappears if CONFIG_X86_L1_CACHE_SHIFT is set to 7 (i.e. 128 byte cache
line size). 2.4.21 works.
The network cards are some kind of ATM cards. Several systems are
affected - at least Pentium II and PPro systems.

Peter: what's the exact brand and NIC driver that you use? Could you try
to figure out what exactly breaks? I'd use "ping -f -s 1500", perhaps
together with "tcpdump -s 1500 -x" on both ends.

--
    Manfred



* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-08 17:07   ` Jamie Lokier
@ 2003-09-08 17:24     ` Jeff Garzik
  2003-09-08 19:45       ` Manfred Spraul
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Garzik @ 2003-09-08 17:24 UTC
  To: Jamie Lokier; +Cc: Adrian Bunk, Manfred Spraul, linux-kernel, peter_daum

On Mon, Sep 08, 2003 at 06:07:51PM +0100, Jamie Lokier wrote:
> Adrian Bunk wrote:
> > > Why requires? On x86, the cpu caches are fully coherent. A too small L1 
> > > cache shift results in false sharing on SMP, but it shouldn't cause the 
> > > described problems.
> > >...
> > 
> > Thanks for the correction, I wrongly assumed that CONFIG_X86_L1_CACHE_SHIFT
> > does something other than what it actually does.
> 
> Were there any changes in the kernel to do with PCI MWI settings?

Yes; I've lost the specific context of the thread, but I have been
working on MWI/cacheline size issues along with IvanK for a while.

It's apparently the responsibility of the OS to fill in correct
PCI_CACHE_LINE_SIZE values, which in the case of generic kernels must be
filled in at runtime (pci_cache_line_size), not at compile time
(SMP_CACHE_BYTES, etc.).

If you don't call pci_set_mwi() for a PCI device, which triggers the
cacheline size fixups and other checks, then using
Memory-Write-Invalidate (MWI) is definitely wrong.  Or on an older
kernel, without the latest MWI changes, you could wind up programming
cacheline size to a value smaller than that of your current processor
(again, due to generic kernels).

If a feature/device/whatever can be programmed with cacheline size at
runtime, that will always be the preference.  With a compile-time
constant for cacheline size, you are _guaranteed_ it will be wrong in
some case.
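
As a rough illustration of the runtime approach, the Python sketch below
models only the decision, not the kernel's actual code. It assumes that
the PCI cache line size register counts 32-bit words rather than bytes,
and that a programmed value is acceptable when it is at least the CPU's
line size and a multiple of it. The function names are hypothetical.

```python
def bytes_to_pci_units(line_bytes):
    """Convert a cache line size in bytes to 32-bit-word units."""
    return line_bytes // 4


def cacheline_ok_for_mwi(device_units, cpu_line_bytes):
    """True if the device's programmed value covers whole CPU cache lines.

    device_units is the value found in the device's cache line size
    register; cpu_line_bytes is the line size detected at runtime.
    """
    cpu_units = bytes_to_pci_units(cpu_line_bytes)
    return device_units >= cpu_units and device_units % cpu_units == 0


if __name__ == "__main__":
    # Device left at a 32-byte setting, running on a CPU with 128-byte lines.
    print(cacheline_ok_for_mwi(bytes_to_pci_units(32), 128))
    # Device programmed at runtime to match the CPU.
    print(cacheline_ok_for_mwi(bytes_to_pci_units(128), 128))
```

In this model a value baked in at compile time for a 32-byte line fails
the check on a 128-byte-line processor, which is the generic-kernel case
described above.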

	Jeff





* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-08 14:20 ` Adrian Bunk
@ 2003-09-08 17:07   ` Jamie Lokier
  2003-09-08 17:24     ` Jeff Garzik
  0 siblings, 1 reply; 12+ messages in thread
From: Jamie Lokier @ 2003-09-08 17:07 UTC
  To: Adrian Bunk; +Cc: Manfred Spraul, linux-kernel, peter_daum

Adrian Bunk wrote:
> > Why requires? On x86, the cpu caches are fully coherent. A too small L1 
> > cache shift results in false sharing on SMP, but it shouldn't cause the 
> > described problems.
> >...
> 
> Thanks for the correction, I wrongly assumed that CONFIG_X86_L1_CACHE_SHIFT
> does something other than what it actually does.

Were there any changes in the kernel to do with PCI MWI settings?

(MWI == memory write and invalidate)

If MWI is set incorrectly, I think PCI DMA is capable of breaking x86
cache coherence.

-- Jamie



* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-07 20:36 Manfred Spraul
@ 2003-09-08 14:20 ` Adrian Bunk
  2003-09-08 17:07   ` Jamie Lokier
  0 siblings, 1 reply; 12+ messages in thread
From: Adrian Bunk @ 2003-09-08 14:20 UTC
  To: Manfred Spraul; +Cc: linux-kernel, peter_daum

On Sun, Sep 07, 2003 at 10:36:19PM +0200, Manfred Spraul wrote:
> Adrian wrote:
> 
> >With CONFIG_M686 CONFIG_X86_L1_CACHE_SHIFT was set to 5, but a Pentium 4 
> >requires 7.
> > 
> >
> Why requires? On x86, the cpu caches are fully coherent. A too small L1 
> cache shift results in false sharing on SMP, but it shouldn't cause the 
> described problems.
>...

Thanks for the correction, I wrongly assumed that CONFIG_X86_L1_CACHE_SHIFT
does something other than what it actually does.

>    Manfred

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
  2003-09-07 19:55 Adrian Bunk
@ 2003-09-07 20:59 ` Peter Daum
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Daum @ 2003-09-07 20:59 UTC
  To: linux-kernel; +Cc: Marcelo Tosatti, Adrian Bunk

Hi,

On Sun, 7 Sep 2003, Adrian Bunk wrote:

> Peter Daum reported in the "2.4.22 with CONFIG_M686: networking broken"
> thread some problems when using a kernel with CONFIG_M686 on a
> Pentium 4.

... actually, the problems also occurred when running on machines
with Pentium II/Pentium Pro CPUs - even on these machines, I
could only use kernels compiled with "CONFIG_MPENTIUM4".

Adrian's patch does fix these problems. What is amazing is that
in kernel version 2.4.20, the same values were used for
"CONFIG_X86_L1_CACHE_SHIFT". The problems that I described,
however, occur only with 2.4.22 - the same machines with the same
configuration work just fine with 2.4.20. Maybe there's
something else involved, too?

Regards,
                Peter




* Re: [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
@ 2003-09-07 20:36 Manfred Spraul
  2003-09-08 14:20 ` Adrian Bunk
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2003-09-07 20:36 UTC
  To: Adrian Bunk; +Cc: linux-kernel, peter_daum

Adrian wrote:

>With CONFIG_M686 CONFIG_X86_L1_CACHE_SHIFT was set to 5, but a Pentium 4 
>requires 7.
>  
>
Why "requires"? On x86, the CPU caches are fully coherent. An L1 cache
shift that is too small results in false sharing on SMP, but it shouldn't
cause the described problems.

And obviously, Pentium II CPUs have a 32-byte cache line, so increasing
the L1 setting to 128 bytes only helps by chance.

My bet is that someone overwrites critical memory structures, and with 
more padding, the critical stuff is further away.
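
A toy model of that masking effect, in Python rather than kernel C: two
neighbouring objects are each aligned to the cache line size, and the
first one is overrun by a few bytes. The object size, the overrun length
and the helper names are all hypothetical. The only point is that the
same overrun can hit the neighbour with 32-byte alignment and land in
padding with 128-byte alignment.

```python
def align_up(offset, shift):
    """Round offset up to the next multiple of (1 << shift)."""
    line = 1 << shift
    return (offset + line - 1) & ~(line - 1)


def overrun_hits_neighbour(obj_size, overrun, shift):
    """Does writing `overrun` bytes past object A reach object B?

    A starts at offset 0 and is obj_size bytes long; B starts at the
    next cache-line boundary after the end of A.
    """
    b_start = align_up(obj_size, shift)
    return obj_size + overrun > b_start


if __name__ == "__main__":
    # Hypothetical 28-byte object overrun by 8 bytes.
    for shift in (5, 7):
        print(shift, overrun_hits_neighbour(28, 8, shift))
```

The bug is present in both cases; with the larger shift it simply
scribbles over padding instead of live data.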

--
    Manfred




* [2.4 patch] fix CONFIG_X86_L1_CACHE_SHIFT
@ 2003-09-07 19:55 Adrian Bunk
  2003-09-07 20:59 ` Peter Daum
  0 siblings, 1 reply; 12+ messages in thread
From: Adrian Bunk @ 2003-09-07 19:55 UTC
  To: Marcelo Tosatti; +Cc: linux-kernel, Peter Daum

Hi Marcelo,

Peter Daum reported in the "2.4.22 with CONFIG_M686: networking broken" 
thread some problems when using a kernel with CONFIG_M686 on a
Pentium 4.

With CONFIG_M686 CONFIG_X86_L1_CACHE_SHIFT was set to 5, but a Pentium 4 
requires 7.

The problem comes from the fact that in 2.4 selecting a processor means
"this processor and all better processors are supported". Without
breaking these semantics within a kernel series, the only solution is to
raise CONFIG_X86_L1_CACHE_SHIFT for older processors.

The patch below does:
- set CONFIG_X86_L1_CACHE_SHIFT 7 for all Intel processors (needed for 
  the Pentium 4)
- set CONFIG_X86_L1_CACHE_SHIFT 6 for the K6 (needed for the Athlon)
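
For reference, the per-processor values touched by the patch can be
tabulated as byte sizes. This Python sketch copies the old and new shifts
from the diff in this message; the dictionary and function names are only
illustrative.

```python
# (old shift, new shift) per processor option, as changed by the patch.
SHIFT_CHANGES = {
    "CONFIG_M386": (4, 7),
    "CONFIG_M486": (4, 7),
    "CONFIG_M586": (5, 7),
    "CONFIG_M586TSC": (5, 7),
    "CONFIG_M586MMX": (5, 7),
    "CONFIG_M686": (5, 7),
    "CONFIG_MPENTIUMIII": (5, 7),
    "CONFIG_MK6": (5, 6),
}


def line_bytes(shift):
    """Cache line size in bytes for a given shift."""
    return 1 << shift


def summarize(changes):
    """Map each option to (old line size, new line size) in bytes."""
    return {opt: (line_bytes(old), line_bytes(new))
            for opt, (old, new) in changes.items()}


if __name__ == "__main__":
    for opt, (old, new) in summarize(SHIFT_CHANGES).items():
        print(f"{opt}: {old} -> {new} bytes")
```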

This issue was already resolved in 2.6.

Please apply
Adrian

--- linux-2.4.23-pre3-full/arch/i386/config.in.old	2003-09-07 17:10:31.000000000 +0200
+++ linux-2.4.23-pre3-full/arch/i386/config.in	2003-09-07 17:11:47.000000000 +0200
@@ -51,7 +51,7 @@
 if [ "$CONFIG_M386" = "y" ]; then
    define_bool CONFIG_X86_CMPXCHG n
    define_bool CONFIG_X86_XADD n
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 4
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_RWSEM_GENERIC_SPINLOCK y
    define_bool CONFIG_RWSEM_XCHGADD_ALGORITHM n
    define_bool CONFIG_X86_PPRO_FENCE y
@@ -67,21 +67,21 @@
    define_bool CONFIG_RWSEM_XCHGADD_ALGORITHM y
 fi
 if [ "$CONFIG_M486" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 4
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_USE_STRING_486 y
    define_bool CONFIG_X86_ALIGNMENT_16 y
    define_bool CONFIG_X86_PPRO_FENCE y
    define_bool CONFIG_X86_F00F_WORKS_OK n
 fi
 if [ "$CONFIG_M586" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 5
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_USE_STRING_486 y
    define_bool CONFIG_X86_ALIGNMENT_16 y
    define_bool CONFIG_X86_PPRO_FENCE y
    define_bool CONFIG_X86_F00F_WORKS_OK n
 fi
 if [ "$CONFIG_M586TSC" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 5
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_USE_STRING_486 y
    define_bool CONFIG_X86_ALIGNMENT_16 y
    define_bool CONFIG_X86_HAS_TSC y
@@ -89,7 +89,7 @@
    define_bool CONFIG_X86_F00F_WORKS_OK n
 fi
 if [ "$CONFIG_M586MMX" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 5
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_USE_STRING_486 y
    define_bool CONFIG_X86_ALIGNMENT_16 y
    define_bool CONFIG_X86_HAS_TSC y
@@ -98,7 +98,7 @@
    define_bool CONFIG_X86_F00F_WORKS_OK n
 fi
 if [ "$CONFIG_M686" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 5
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_HAS_TSC y
    define_bool CONFIG_X86_GOOD_APIC y
    bool 'PGE extensions (not for Cyrix/Transmeta)' CONFIG_X86_PGE
@@ -107,7 +107,7 @@
    define_bool CONFIG_X86_F00F_WORKS_OK y
 fi
 if [ "$CONFIG_MPENTIUMIII" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 5
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 7
    define_bool CONFIG_X86_HAS_TSC y
    define_bool CONFIG_X86_GOOD_APIC y
    define_bool CONFIG_X86_PGE y
@@ -123,7 +123,7 @@
    define_bool CONFIG_X86_F00F_WORKS_OK y
 fi
 if [ "$CONFIG_MK6" = "y" ]; then
-   define_int  CONFIG_X86_L1_CACHE_SHIFT 5
+   define_int  CONFIG_X86_L1_CACHE_SHIFT 6
    define_bool CONFIG_X86_ALIGNMENT_16 y
    define_bool CONFIG_X86_HAS_TSC y
    define_bool CONFIG_X86_USE_PPRO_CHECKSUM y

