* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  [not found] ` <1060772769.8009.4.camel@localhost.localdomain.suse.lists.linux.kernel>
@ 2003-08-13 11:17 ` Andi Kleen
  [not found] ` <20030813042544.5064b3f4.akpm@osdl.org.suse.lists.linux.kernel>
  1 sibling, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2003-08-13 11:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: akpm, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> Put the likely(pos) in the asm/prefetch for Athlon until someone can out what is going on with some specific Athlons, 2.6 and certain
> kernels (notably 4G/4G)

You can use the same workaround as x86-64. add an exception handler and
just jump back. Advantage is that it is completely outside the fast path.

But note you also have to add runtime sorting of __ex_table when you do this,
otherwise the __ex_table becomes unsorted when someone uses list_for_each
(which does prefetch) in a __init function

(all code is available in x86-64, just needs to be ported over)

-Andi

^ permalink raw reply [flat|nested] 24+ messages in thread
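The point about runtime sorting can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct ex_entry`, `sort_ex_table` and `search_ex_table` are hypothetical names for illustration, not the kernel's actual definitions. It shows why a table that is looked up by binary search has to be sorted by instruction address first, which is the concern raised above for entries that come from `__init` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one exception table entry (not the kernel's type):
 * 'insn' is the address of an instruction that may fault,
 * 'fixup' is the address where execution resumes if it does. */
struct ex_entry {
    unsigned long insn;
    unsigned long fixup;
};

/* Order entries by the address of the faulting instruction. */
static int cmp_ex_entry(const void *a, const void *b)
{
    const struct ex_entry *x = a;
    const struct ex_entry *y = b;

    if (x->insn < y->insn)
        return -1;
    if (x->insn > y->insn)
        return 1;
    return 0;
}

/* Sort the table once at startup so that lookups can use binary search. */
void sort_ex_table(struct ex_entry *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    if (n > 1)
        qsort(tab, n, sizeof(*tab), cmp_ex_entry);
}

/* Binary search for 'addr'. Returns the fixup address, or 0 if the
 * faulting address has no entry. The table must already be sorted. */
unsigned long search_ex_table(const struct ex_entry *tab, size_t n,
                              unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].insn == addr)
            return tab[mid].fixup;
        if (tab[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

In this model, skipping `sort_ex_table` on a table whose entries are out of address order would let the binary search miss entries that are present, so a fault that should have been fixed up would be treated as fatal.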
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) [not found] ` <1060774803.8008.24.camel@localhost.localdomain.suse.lists.linux.kernel> @ 2003-08-13 12:10 ` Andi Kleen 2003-08-13 12:48 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2003-08-13 12:10 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel Alan Cox <alan@lxorguk.ukuu.org.uk> writes: > On Mer, 2003-08-13 at 12:25, Andrew Morton wrote: > > Like this? > > > > What happens if someone runs a K6 kernel on a K7? > > Or various other CPU types? What is the matrix here? > > Beats me, but then the prefetch code in 2.6 seems broken from > 5 seconds of inspection anyway. We are testing the XMM feature > and using prefetchnta for Athlon, thats wrong for lots of athlon > processors that dont have XMM but do have prefetch/prefetchw, > (which btw also seem to work properly on all these processors > while prefetchnta seems to do funky things) The early Athlon Specific test was not done to avoid too much bloat. (three alternatives instead of two) Most Athlons in existence should have XMM already and the rest works. You can hardly call that broken. I would be surprised if prefetch behaves differently than prefetchnta on Athlon. If the bug is similar to what happens on Opteron then I bet it won't make a difference. > For Athlon we should be testing 3Dnow, and using prefetch/prefetchw > for Intel cases we want to go for prefetchnta if XMM is set (PIII, PIV) That's done for write prefetches correctly. (as Intel does not have a write prefetch) -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) 2003-08-13 12:10 ` Andi Kleen @ 2003-08-13 12:48 ` Alan Cox 2003-08-13 13:14 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Alan Cox @ 2003-08-13 12:48 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel Mailing List On Mer, 2003-08-13 at 13:10, Andi Kleen wrote: > > Beats me, but then the prefetch code in 2.6 seems broken from > > 5 seconds of inspection anyway. We are testing the XMM feature > > and using prefetchnta for Athlon, thats wrong for lots of athlon > > processors that dont have XMM but do have prefetch/prefetchw, > > (which btw also seem to work properly on all these processors > > while prefetchnta seems to do funky things) > > The early Athlon Specific test was not done to avoid too much bloat. > (three alternatives instead of two) Lets replace working code with broken macros whoooo.. progress. Lots of Athlons don't have XMM, most of the older ones where prefetch has the most impact in fact. (The XMM using ones have the hw prefetcher too). > Most Athlons in existence should have XMM already and the rest works. Lots don't have XMM > You can hardly call that broken. I just did. Its worse than 2.4 behaviour. > That's done for write prefetches correctly. > (as Intel does not have a write prefetch) Actually its iffy too. 3Dnow doesnt imply prefetchw. You must test 3Dnow && vendor==AMD && Athlon. (K6 prefetchw is slower than not using it, other 3Dnow chips dont have it eg the Cyrix MII which may explain a couple of things. I don't see anywhere we mask the 3Dnow property by these but I've not dug through the CPU code right now to see if we have a "3Dnowplus" type definition we can check. 
I suspect the best way to do prefetch cleanly would be something
like this

#if defined(CONFIG_MK7)
alternative_input("prefetch" or "prefetchnta")
#else
alternative_input(ASM_NOP4 or "prefetchnta");
#endif

Ideally we want a 3 way patch table to fix up at boot time but the if
case at least gets us back to desirable situations. Also if I remember
the prefetch exception thing rightly you can misalign the prefetch
instruction as a workaround.

Alan
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) 2003-08-13 12:48 ` Alan Cox @ 2003-08-13 13:14 ` Andi Kleen 2003-08-13 14:09 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2003-08-13 13:14 UTC (permalink / raw) To: Alan Cox; +Cc: Andi Kleen, Linux Kernel Mailing List On Wed, Aug 13, 2003 at 01:48:45PM +0100, Alan Cox wrote: > On Mer, 2003-08-13 at 13:10, Andi Kleen wrote: > > > Beats me, but then the prefetch code in 2.6 seems broken from > > > 5 seconds of inspection anyway. We are testing the XMM feature > > > and using prefetchnta for Athlon, thats wrong for lots of athlon > > > processors that dont have XMM but do have prefetch/prefetchw, > > > (which btw also seem to work properly on all these processors > > > while prefetchnta seems to do funky things) > > > > The early Athlon Specific test was not done to avoid too much bloat. > > (three alternatives instead of two) > > Lets replace working code with broken macros whoooo.. progress. Lots of > Athlons don't have XMM, most of the older ones where prefetch has the > most impact in fact. (The XMM using ones have the hw prefetcher too). hw prefetch has nothing to do with how the linux kernel uses prefetch. It's only using it for data structures that cannot be handled by the auto prefetcher. [except the broken 3dnow! copy that was never enabled] > > > Most Athlons in existence should have XMM already and the rest works. > > Lots don't have XMM All XPs have. > > > You can hardly call that broken. > > I just did. Its worse than 2.4 behaviour. In 2.4 distribution users never got anything. That is what was really broken. > > > That's done for write prefetches correctly. > > (as Intel does not have a write prefetch) > > Actually its iffy too. 3Dnow doesnt imply prefetchw. You My AMD manual lists it as part of 3dnow. If an CPU advertises 3dnow! but doesn't have the instruction it's broken. > must test 3Dnow && vendor==AMD && Athlon. 
(K6 prefetchw > is slower than not using it, other 3Dnow chips dont have it > eg the Cyrix MII which may explain a couple of things. I don't I would consider the MII broken then. setup should clear the 3dnow bit. > see anywhere we mask the 3Dnow property by these but I've not > dug through the CPU code right now to see if we have a "3Dnowplus" > type definition we can check. there is 3dnowext, set on Athlons, but K6 has prefetchw too and it But if you only want Athlon you can check for X86_FEATURE_K7. The problem is that it doesn't include K8 and K8 has prefetchw too (alternative currently only allows a single bit, not a bitmask). Better is to either clear 3dnow on the MII or define a new pseudo bit that defines working and useful prefetchw > > I suspect the best way to do prefetch cleanly would be something > like this > > #if defined(CONFIG_MK7) > alternative_input("prefetch" or "prefetchnta") > #else > alternative_input(ASM_NOP4 or "prefetchnta"); > #endif No for weird combinations you define a new pseudo CPUID capability bit, check for that in the CPU detection and use that in the alternative. If you really want 3 way alternative you can just define a macro for it. The basic data structure supports it - the macro just needs to have two .altinstructions records and two replacement codes. But I have my doubt it is worth it for this case. No stinkin' ifdefs please, that would break the whole concept. > > Ideally we want a 3 way patch table to fix up at boot time but the if > case at least gets us back to desirable situations. Also if I remember > the prefetch exception thing rightly you can misalign the prefetch > instruction as a workaround. Nope, no misalignment. All it does is to just handle the exception using __ex_table and jumps to the next instruction. [the exceptions are very rare, they need very specific circumstances in the CPU to trigger, so it's ok to make it slow] Only trap is that you have to add the exception table sorting too... 
-Andi
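The "pseudo capability bit" idea from the message above can be sketched in plain C. This is a user-space model only, under loud assumptions: the flag names, `detect_pseudo_bits` and `pick_prefetch` are hypothetical, and the family test (AMD family 6 and later, i.e. K7 and K8 but not K6) is one reading of the "3Dnow && vendor==AMD && Athlon" condition discussed in the thread. The real mechanism patches instructions at boot through `.altinstructions`; here the choice is simply returned as an enum.

```c
#include <assert.h>

/* Feature flags for the model. The first two mirror CPUID-reported
 * features; the third is a hypothetical pseudo bit that detection
 * code sets when the 3DNow! prefetch is both present and worth using. */
enum {
    FEAT_XMM           = 1u << 0, /* SSE, which provides prefetchnta */
    FEAT_3DNOW         = 1u << 1, /* 3DNow! as reported by the CPU   */
    FEAT_GOOD_PREFETCH = 1u << 2  /* hypothetical pseudo bit         */
};

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

enum prefetch_kind { PF_NOP, PF_3DNOW, PF_NTA };

/* CPU detection step: derive the pseudo bit once, so that the patching
 * code only has to test a single bit later on. Assumption: AMD family 6
 * and above (K7, K8) qualify, K6 (family 5) does not. */
unsigned detect_pseudo_bits(enum vendor v, unsigned family, unsigned features)
{
    assert(family > 0);
    if ((features & FEAT_3DNOW) && v == VENDOR_AMD && family >= 6)
        features |= FEAT_GOOD_PREFETCH;
    return features;
}

/* Three-way choice for the read prefetch: 3DNow! prefetch where it is
 * useful, prefetchnta where SSE exists, otherwise a NOP. */
enum prefetch_kind pick_prefetch(unsigned features)
{
    if (features & FEAT_GOOD_PREFETCH)
        return PF_3DNOW;
    if (features & FEAT_XMM)
        return PF_NTA;
    return PF_NOP;
}
```

Keeping the vendor and family logic in the detection step is what lets the selection test a single bit, which matches the constraint mentioned above that an alternative currently only allows one feature bit rather than a bitmask.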
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) 2003-08-13 13:14 ` Andi Kleen @ 2003-08-13 14:09 ` Alan Cox 2003-08-13 14:20 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Alan Cox @ 2003-08-13 14:09 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel Mailing List On Mer, 2003-08-13 at 14:14, Andi Kleen wrote: > hw prefetch has nothing to do with how the linux kernel uses prefetch. > It's only using it for data structures that cannot be handled by > the auto prefetcher. > > [except the broken 3dnow! copy that was never enabled] Not broken in 2.4, although the 2.4-ac kernel uses movntq instead for Athlon as it is faster than mmx_memcpy, which we use for Cyrix/VIA/IDT processors where it is a win. > > > Most Athlons in existence should have XMM already and the rest works. > > > > Lots don't have XMM > > All XPs have. And what about all the pre MP/XP ones, lots of those. > My AMD manual lists it as part of 3dnow. If an CPU advertises 3dnow! > but doesn't have the instruction it's broken. My AMD docs list it as part of the AMD extended 3dnow. The original 3dnow as done by AMD/Cyrix does not have it > I would consider the MII broken then. setup should clear the 3dnow > bit. "Mummy it doesnt work like I personally have decreed it shall lets break it and screw all the users". Thats the Dan Bernstein school of charm theory of software development. > there is 3dnowext, set on Athlons, but K6 has prefetchw too and > it 3dnowext is what we want here. It might end up doing a prefetchw on K6 but at least K6 actually has the instruction... > But if you only want Athlon you can check for X86_FEATURE_K7. > The problem is that it doesn't include K8 and K8 > has prefetchw too (alternative currently only allows a single > bit, not a bitmask). Better is to either clear 3dnow on the MII > or define a new pseudo bit that defines working and useful > prefetchw We want a pseudobit - otherwise we'll break other code that checks 3dnow is present properly. 
> > #if defined(CONFIG_MK7) > > alternative_input("prefetch" or "prefetchnta") > > #else > > alternative_input(ASM_NOP4 or "prefetchnta"); > > #endif > > No for weird combinations you define a new pseudo CPUID capability > bit, check for that in the CPU detection and use that in the alternative. Ok > If you really want 3 way alternative you can just define a macro > for it. The basic data structure supports it - the macro > just needs to have two .altinstructions records and two replacement codes. > But I have my doubt it is worth it for this case. prefetching is a big win on older Athlon because the CPU is fast and the chipset/ram sucks hugely relative to it > No stinkin' ifdefs please, that would break the whole concept. Ok > > case at least gets us back to desirable situations. Also if I remember > > the prefetch exception thing rightly you can misalign the prefetch > > instruction as a workaround. > > Nope, no misalignment. All it does is to just handle the exception > using __ex_table and jumps to the next instruction. If you misalign the instruction you don't seem to get the exception on Athlon, dunno about the Opteron errata or if the opteron errata bites in 32bit. If it does I guess we should clear mmx, xmm for Opteron by your arguments ;) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) 2003-08-13 14:09 ` Alan Cox @ 2003-08-13 14:20 ` Andi Kleen 2003-08-13 15:20 ` Alan Cox 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2003-08-13 14:20 UTC (permalink / raw) To: Alan Cox; +Cc: Andi Kleen, Linux Kernel Mailing List On Wed, Aug 13, 2003 at 03:09:55PM +0100, Alan Cox wrote: > On Mer, 2003-08-13 at 14:14, Andi Kleen wrote: > > hw prefetch has nothing to do with how the linux kernel uses prefetch. > > It's only using it for data structures that cannot be handled by > > the auto prefetcher. > > > > [except the broken 3dnow! copy that was never enabled] > > Not broken in 2.4, although the 2.4-ac kernel uses movntq instead for > Athlon as it is faster than mmx_memcpy, which we use for Cyrix/VIA/IDT > processors where it is a win. movntq for a memcpy? That ise a very bad idea. It wins in micro benchmarks, but the destination is pushed out of cache and the next code accessing the destination will suffer badly from the cache misses. All the NT stuff is basically useless in the kernel because it only helps with data sets significantly bigger than your cache, and we usually only deal with 4K chunks of everything. [I did the same mistake early on Opteron/x86-64 for copy_page, but later fixed it] > > My AMD manual lists it as part of 3dnow. If an CPU advertises 3dnow! > > but doesn't have the instruction it's broken. > > My AMD docs list it as part of the AMD extended 3dnow. The original > 3dnow as done by AMD/Cyrix does not have it The x86-64 manuals lists it as part of the 3dnow feature set. The K6 has it, right? Is there a "more original" 3dnow that what has been in the K6? > > I would consider the MII broken then. setup should clear the 3dnow > > bit. > > "Mummy it doesnt work like I personally have decreed it shall lets break > it and screw all the users". Thats the Dan Bernstein school of charm It doesn't work like the AMD instruction reference manual describes it. 
> theory of software development. Being a bit touchy from the heat today ? @) Of course it should be fixed, but the fix as it is a bug workaround doesn't have to be very fast. So it would be ok to just clear the 3dnow bit. But then to handle the K6 case (which is interesting, I didn't know) too it would be probably better to define a separate bit. > > The problem is that it doesn't include K8 and K8 > > has prefetchw too (alternative currently only allows a single > > bit, not a bitmask). Better is to either clear 3dnow on the MII > > or define a new pseudo bit that defines working and useful > > prefetchw > > We want a pseudobit - otherwise we'll break other code that checks > 3dnow is present properly. Ok. I will do that when I'm back next week unless someone beats me to it ;-) > > If you really want 3 way alternative you can just define a macro > > for it. The basic data structure supports it - the macro > > just needs to have two .altinstructions records and two replacement codes. > > But I have my doubt it is worth it for this case. > > prefetching is a big win on older Athlon because the CPU is fast and the > chipset/ram sucks hugely relative to it Hmm ok. So it probably needs an alternative3(). It's not hard to do, just a bit ugly because the macro will have a lot of arguments. > > Nope, no misalignment. All it does is to just handle the exception > > using __ex_table and jumps to the next instruction. > > If you misalign the instruction you don't seem to get the exception on > Athlon, dunno about the Opteron errata or if the opteron errata bites in > 32bit. If it does I guess we should clear mmx, xmm for Opteron by your > arguments ;) I didn't know about the misalignment bit. Interesting. Misalignment to what boundary? But is it slower than an aligned execution? If yes I would prefer my solution because it keeps the fast path as fast as possible. -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) 2003-08-13 14:20 ` Andi Kleen @ 2003-08-13 15:20 ` Alan Cox 2003-08-13 15:32 ` Andi Kleen 2003-08-13 16:39 ` Dave Jones 0 siblings, 2 replies; 24+ messages in thread From: Alan Cox @ 2003-08-13 15:20 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel Mailing List On Mer, 2003-08-13 at 15:20, Andi Kleen wrote: > stuff is basically useless in the kernel because it only helps with data > sets significantly bigger than your cache, and we usually only deal > with 4K chunks of everything. Could be. I didnt write that code. I think Manfred also played with the copy tricks that came from the AMD slides. > The K6 has it, right? > Is there a "more original" 3dnow that what has been in the K6? K6-II/III does. I don't know about original K6. but I believe it doesn't. The original 3Dnow was a joint Cyrix/AMD thing and it lacks several instructions later added (including prefetch). The later Cyrix also has a couple of the additional ones but not prefetch. > > "Mummy it doesnt work like I personally have decreed it shall lets break > > it and screw all the users". Thats the Dan Bernstein school of charm > > It doesn't work like the AMD instruction reference manual describes it. Well there is a suprise, AMD didn't design it 8) > Of course it should be fixed, but the fix as it is a bug workaround > doesn't have to be very fast. So it would be ok to just clear the 3dnow bit. > But then to handle the K6 case (which is interesting, I didn't know) too it > would be probably better to define a separate bit. What else checks the 3Dnow bit ? > > We want a pseudobit - otherwise we'll break other code that checks > > 3dnow is present properly. > > Ok. 
I will do that when I'm back next week unless someone beats me > to it ;-) Some kind of "has prefetch and its actually useful" 8) > > If you misalign the instruction you don't seem to get the exception on > > Athlon, dunno about the Opteron errata or if the opteron errata bites in > > 32bit. If it does I guess we should clear mmx, xmm for Opteron by your > > arguments ;) > > I didn't know about the misalignment bit. Interesting. Misalignment to > what boundary? I'll have to go check again. Its something RH internal testing found when people were going "uh what the hell is going on here" 8) > But is it slower than an aligned execution? If yes I would prefer my > solution because it keeps the fast path as fast as possible. Has AMD confirmed that your solution is ok for the K7 as well as K8 - ie that if we hit the errata the fixup recovers the CPU from whatever lunatic state it is now in ? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?) 2003-08-13 15:20 ` Alan Cox @ 2003-08-13 15:32 ` Andi Kleen 2003-08-13 18:44 ` Alan Cox 2003-08-13 16:39 ` Dave Jones 1 sibling, 1 reply; 24+ messages in thread From: Andi Kleen @ 2003-08-13 15:32 UTC (permalink / raw) To: Alan Cox; +Cc: Andi Kleen, Linux Kernel Mailing List On Wed, Aug 13, 2003 at 04:20:11PM +0100, Alan Cox wrote: > On Mer, 2003-08-13 at 15:20, Andi Kleen wrote: > > stuff is basically useless in the kernel because it only helps with data > > sets significantly bigger than your cache, and we usually only deal > > with 4K chunks of everything. > > Could be. I didnt write that code. I think Manfred also played with the > copy tricks that came from the AMD slides. The AMD slides assume all very big data sets ;-) I would recommend to remove it. > > Of course it should be fixed, but the fix as it is a bug workaround > > doesn't have to be very fast. So it would be ok to just clear the 3dnow bit. > > But then to handle the K6 case (which is interesting, I didn't know) too it > > would be probably better to define a separate bit. > > What else checks the 3Dnow bit ? Nothing in kernel AFAIK, but it's possible that it is used by user space reading /proc/cpuinfo. > > > > Ok. I will do that when I'm back next week unless someone beats me > > to it ;-) > > Some kind of "has prefetch and its actually useful" 8) X86_FEATURE_PREFETCHW X86_FEATURE_PREFETCH3DNOW (note I didn't volunteer to write alternative3 for the later, someone else has to do that if they want it ;-) > > But is it slower than an aligned execution? If yes I would prefer my > > solution because it keeps the fast path as fast as possible. > > Has AMD confirmed that your solution is ok for the K7 as well as K8 - ie > that if we hit the errata the fixup recovers the CPU from whatever > lunatic state it is now in ? 
My solution is a fix as the problem is described in the Opteron
Specification Update (and also as our own testing showed - we discovered
the problem originally)

The Errata is basically: When there is a prefetch and a load for the
same address in flight and the load faults and the CPU is in a very
specific complicated state then the Exception is reported on the
prefetch, not the fault.

The fix just handles the exception and doesn't crash.

At least on Opteron it can be also fixed with a magic bit in the BIOS,
maybe that's possible on XP too. But I opted to work around it in the
kernel to not force all people to get a new BIOS.

BTW we saw it mainly in the x86-64 copy_*_user and csum_copy_* functions
which do also prefetches. LTP would sometimes trigger it when it tests
how the kernel behaves with invalid addresses. But it happened very
rarely in the dcache hash too. But still it's hard to trigger, the
linked list one is very hard to hit. I tried to reproduce it in user space,
but failed. The LTP one is much easier, but still not that common.

-Andi
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 15:32 ` Andi Kleen
@ 2003-08-13 18:44 ` Alan Cox
  2003-08-13 18:53 ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Cox @ 2003-08-13 18:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel Mailing List

On Mer, 2003-08-13 at 16:32, Andi Kleen wrote:
> The AMD slides assume all very big data sets ;-)
>
> I would recommend to remove it.

I'll do some timings when I get a moment - the prefetching mmx copy
was a win (and faster than the others for small data as well as large
on the K7-550 (really a K7 not "Athlon" 8)) way back when.

> > What else checks the 3Dnow bit ?
>
> Nothing in kernel AFAIK, but it's possible that it is used by user space
> reading /proc/cpuinfo.

DaveJ and your docs are right on 3dnow it turns out so sorry about that
and ignore me on prefetchw, it's just the prefetch side that's 3 way.

> BTW we saw it mainly in the x86-64 copy_*_user and csum_copy_* functions
> which do also prefetches. LTP would sometimes trigger it when it tests
> how the kernel behaves with invalid addresses. But it happened very
> rarely in the dcache hash too. But still it's hard to trigger, the
> linked list one is very hard to hit. I tried to reproduce it in user space,
> but failed. The LTP one is much easier, but still not that common.

Thanks
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 18:44 ` Alan Cox
@ 2003-08-13 18:53 ` Andi Kleen
  0 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2003-08-13 18:53 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Linux Kernel Mailing List

On Wed, Aug 13, 2003 at 07:44:24PM +0100, Alan Cox wrote:
> On Mer, 2003-08-13 at 16:32, Andi Kleen wrote:
> > The AMD slides assume all very big data sets ;-)
> >
> > I would recommend to remove it.
>
> I'll do some timings when I get a moment - the prefetching mmx copy

Microbenchmarks are useless for this. You have to bench the users too,
otherwise you don't recognize the additional cache misses.

> was a win (and faster than the others for small data as well as large
> on the K7-550 (really a K7 not "Athlon" 8)) way back when.

Possible. In my experience the best copy functions vary widely between
different steppings. Just optimizing for a single one is probably not a
good idea, especially not for a very old one (except when you add dynamic
patches for the different steppings, but then it quickly gets ugly with
too many variants)

When in doubt use the more simple function.

-Andi
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 15:20 ` Alan Cox
  2003-08-13 15:32 ` Andi Kleen
@ 2003-08-13 16:39 ` Dave Jones
  2003-08-13 18:34 ` Alan Cox
  2003-08-13 18:37 ` Alan Cox
  1 sibling, 2 replies; 24+ messages in thread
From: Dave Jones @ 2003-08-13 16:39 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Linux Kernel Mailing List

On Wed, Aug 13, 2003 at 04:20:11PM +0100, Alan Cox wrote:
> K6-II/III does. I don't know about original K6. but I believe it
> doesn't. The original 3Dnow was a joint Cyrix/AMD thing and it lacks
> several instructions later added (including prefetch). The later Cyrix
> also has a couple of the additional ones but not prefetch.

Which Cyrixen are you talking about ?
C3's up to and including Ezra-T should DTRT when it comes to
3dnow prefetch instruction, and pre-VIA Cyrixen didn't have 3dnow
at all iirc.

Dave

--
Dave Jones http://www.codemonkey.org.uk
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 16:39 ` Dave Jones
@ 2003-08-13 18:34 ` Alan Cox
  2003-08-13 20:12 ` Dave Jones
  2003-08-13 18:37 ` Alan Cox
  1 sibling, 1 reply; 24+ messages in thread
From: Alan Cox @ 2003-08-13 18:34 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andi Kleen, Linux Kernel Mailing List

On Mer, 2003-08-13 at 17:39, Dave Jones wrote:
> > several instructions later added (including prefetch). The later Cyrix
> > also has a couple of the additional ones but not prefetch.

Ok I got this crossed - the Cyrix/AMD thing was the extended MMX stuff,
they both did 3Dnow! but the Jalapeno old style Cyrix CPU with 3dnow was
canned. Ok so there is a different reason why my Cyrix crashes on boot
with 2.6test.

Andi is right that 3Dnow safely implies prefetch. My docs list it as part
of extended MMX not 3dnow although the Cyrix seems to not possess the
instruction anyway. So 3dnow == prefetch/prefetchw is ok but not useful
on K6.

> Which Cyrixen are you talking about ?
> C3's up to and including Ezra-T should DTRT when it comes to
> 3dnow prefetch instruction, and pre-VIA Cyrixen didn't have 3dnow
> at all iirc.

pre VIA Cyrixen have MMX and CXMMX. The CPU also set bit 31 but doesn't
have 3dnow (which fooled me but the kernel does know about). C3's seem
to have prefetch/prefetchw (but not prefetchnta). I don't have a nemeiah
but I assume Nemeiah has prefetchnta too ?

I've tried building a summary list. Additional contributions welcomed

MMX: Pentium (later only), Cyrix MediaGX (later only), Cyrix 6x86/MII
     Intel PII/PIII/PIV, AMD K6/Athlon/Opteron, VIA Cyrix III, VIA C3
CXMMX: Extended MMX - Cyrix MII/AMD K6(II+ ?)/K7/Opteron
3DNOW: AMD K6-II/III(not original K6),K7/,Opteron, VIA Cyrix III,
       VIA C3 (pre Nemiah only ??)
"Enhanced" 3DNow: Athlon Tbird
SSE: Intel PII, PIII, Athlon (XP, Duron >=1Gz only)
SSE2: Pentium IV

So the prefetch fallback is needed for pre Nemiah C3, Duron < 1Ghz and
pre T-Bird Athlon if my table is right.
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 18:34 ` Alan Cox
@ 2003-08-13 20:12 ` Dave Jones
  0 siblings, 0 replies; 24+ messages in thread
From: Dave Jones @ 2003-08-13 20:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Linux Kernel Mailing List

On Wed, Aug 13, 2003 at 07:34:36PM +0100, Alan Cox wrote:
> pre VIA Cyrixen have MMX and CXMMX. The CPU also set bit 31 but doesn't
> have 3dnow (which fooled me but the kernel does know about). C3's seem
> to have prefetch/prefetchw (but not prefetchnta). I don't have a nemeiah
> but I assume Nemeiah has prefetchnta too ?

With Nehemiah, they dropped 3dnow, and went with SSE.

> MMX: Pentium (later only), Cyrix MediaGX (later only), Cyrix 6x86/MII
> Intel PII/PIII/PIV, AMD K6/Athlon/Opteron, VIA Cyrix III, VIA C3
> CXMMX: Extended MMX - Cyrix MII/AMD K6(II+ ?)/K7/Opteron
> 3DNOW: AMD K6-II/III(not original K6),K7/,Opteron, VIA Cyrix III,
> VIA C3 (pre Nemiah only ??)

"Nehemiah".
+ Winchip-2A (though as mentioned, prefetch is a nop, the rest of 3dnow
worked though iirc).

> "Enhanced" 3DNow: Athlon Tbird
> SSE: Intel PII, PIII, Athlon (XP, Duron >=1Gz only)
> SSE2: Pentium IV

Dave

--
Dave Jones http://www.codemonkey.org.uk
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 16:39 ` Dave Jones
  2003-08-13 18:34 ` Alan Cox
@ 2003-08-13 18:37 ` Alan Cox
  1 sibling, 0 replies; 24+ messages in thread
From: Alan Cox @ 2003-08-13 18:37 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andi Kleen, Linux Kernel Mailing List

And one other update

Winchip C6 - MMX, extended MMX
Winchip II+ , MMX, extended MMX, 3Dnow (dunno if it has prefetch I don't
have one of these)
* RE: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
@ 2003-08-13 18:59 richard.brunner
  0 siblings, 0 replies; 24+ messages in thread
From: richard.brunner @ 2003-08-13 18:59 UTC (permalink / raw)
  To: ak, alan; +Cc: linux-kernel

See below.

] -Rich ...
] AMD Fellow
] richard.brunner at amd com

> From: Andi Kleen [mailto:ak@suse.de]
> On Wed, Aug 13, 2003 at 04:20:11PM +0100, Alan Cox wrote:
> > On Mer, 2003-08-13 at 15:20, Andi Kleen wrote:
> > Has AMD confirmed that your solution is ok for the K7 as well as K8 -
> > ie that if we hit the errata the fixup recovers the CPU from whatever
> > lunatic state it is now in ?
>
> My solution is a fix as the problem is described in the
> Opteron Specification Update (and also as our own testing
> showed - we discovered the problem originally)

Hi, AMD has not confirmed anything with respect to this issue on
K7/Athlon. We are currently trying to get the code that reproduces this
bug into AMD so we can see what triggers it.

Andi's workaround for Opteron (before a BIOS fix was available) is
probably a fine *short-term* workaround until we can get back to you on
this. AMD believes that dismissing an exception on prefetch as spurious
on Athlon will work around the current problem. Linking the Opteron bug
to an Athlon bug is premature at this point.

> The Errata is basically: When there is a prefetch and a load for the
> same address in flight and the load faults and the CPU is in
> a very specific complicated state then the Exception is
> reported on the prefetch, not the fault.
>
> The fix just handles the exception and doesn't crash.
>
> At least on Opteron it can be also fixed with a magic bit in
> the BIOS, maybe that's possible on XP too. But I opted to
> work around it in the kernel to not force all people to get a
> new BIOS.

Let us get back to you, ok? I am starting up our internal validation
people to go poke at this.
* 2.6.0-test3-mm1: scheduling while atomic (ext3?)
@ 2003-08-13 4:56 Jurriaan
  2003-08-13 8:47 ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Jurriaan @ 2003-08-13 4:56 UTC (permalink / raw)
  To: linux-kernel

Aug 13 06:47:48 middle -- MARK --
Aug 13 06:53:03 middle kernel: printing eip:
Aug 13 06:53:03 middle kernel: c016c14a
Aug 13 06:53:03 middle kernel: Oops: 0000 [#1]
Aug 13 06:53:03 middle kernel: PREEMPT
Aug 13 06:53:03 middle kernel: CPU: 0
Aug 13 06:53:03 middle kernel: EIP: 0060:[<c016c14a>] Not tainted VLI
Aug 13 06:53:03 middle kernel: EFLAGS: 00010286
Aug 13 06:53:03 middle kernel: EIP is at find_inode_fast+0x1a/0x60
Aug 13 06:53:03 middle kernel: eax: f7b7e000 ebx: 000d5ff4 ecx: e68e9a48 edx: 00000000
Aug 13 06:53:03 middle kernel: esi: f7b7e000 edi: c1a50d80 ebp: f2f41e14 esp: f2f41e04
Aug 13 06:53:03 middle kernel: ds: 007b es: 007b ss: 0068
Aug 13 06:53:03 middle kernel: Process make (pid: 9500, threadinfo=f2f40000 task=eb0966a0)
Aug 13 06:53:03 middle kernel: Stack: f0c05cc0 f2f40000 f0271cc0 000d5ff4 f2f41e38 c016c7c0 f7b7e000 c1a50d80
Aug 13 06:53:03 middle kernel: 000d5ff4 c1a50d80 000d5ff4 f0271cc0 f7b7e000 f2f41e58 c018fc92 f7b7e000
Aug 13 06:53:03 middle kernel: 000d5ff4 c3600234 fffffff4 dddd2a74 dddd2a08 f2f41e7c c0160b10 dddd2a08
Aug 13 06:53:03 middle kernel: Call Trace:
Aug 13 06:53:03 middle kernel: [<c016c7c0>] iget_locked+0x50/0xc0
Aug 13 06:53:03 middle kernel: [<c018fc92>] ext3_lookup+0x62/0xd0
Aug 13 06:53:03 middle kernel: [<c0160b10>] real_lookup+0xc0/0xf0
Aug 13 06:53:03 middle kernel: [<c0160d84>] do_lookup+0x84/0x90
Aug 13 06:53:03 middle kernel: [<c0161211>] link_path_walk+0x481/0x870
Aug 13 06:53:03 middle kernel: [<c0161abe>] __user_walk+0x3e/0x60
Aug 13 06:53:03 middle kernel: [<c015cdce>] vfs_stat+0x1e/0x60
Aug 13 06:53:03 middle kernel: [<c015d43b>] sys_stat64+0x1b/0x40
Aug 13 06:53:03 middle kernel: [<c03ca78f>] syscall_call+0x7/0xb
Aug 13 06:53:03 middle kernel:
Aug 13 06:53:03 middle kernel: Code: 75 ca eb c6 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 04 8b 5d 10 8b 7d 0c 8b 75 08 8b 0f 85 c9 74 13 8b 11 <0f> 18 02 90 39 59 18 89 c8 74 10 85 d2 89 d1 75 ed 31 c0 83 c4
Aug 13 06:53:03 middle kernel: <6>note: make[9500] exited with preempt_count 1
Aug 13 06:53:03 middle kernel: Call Trace:
Aug 13 06:53:03 middle kernel: [<c011c29e>] schedule+0x58e/0x5a0
Aug 13 06:53:03 middle kernel: [<c0143f31>] unmap_page_range+0x41/0x70
Aug 13 06:53:03 middle kernel: [<c014411f>] unmap_vmas+0x1bf/0x220
Aug 13 06:53:03 middle kernel: [<c0147e39>] exit_mmap+0x79/0x190
Aug 13 06:53:03 middle kernel: [<c011dd9a>] mmput+0x7a/0xe0
Aug 13 06:53:03 middle kernel: [<c0121a88>] do_exit+0x118/0x3f0
Aug 13 06:53:03 middle kernel: [<c011a1e0>] do_page_fault+0x0/0x46b
Aug 13 06:53:03 middle kernel: [<c010b6c9>] die+0xf9/0x100
Aug 13 06:53:03 middle kernel: [<c011a30d>] do_page_fault+0x12d/0x46b
Aug 13 06:53:03 middle kernel: [<c0155b38>] __getblk+0x28/0x50
Aug 13 06:53:03 middle kernel: [<c018b8e2>] ext3_getblk+0x92/0x290
Aug 13 06:53:03 middle kernel: [<c01542b1>] wake_up_buffer+0x11/0x30
Aug 13 06:53:03 middle kernel: [<c01542fe>] unlock_buffer+0x2e/0x50
Aug 13 06:53:03 middle kernel: [<c0157afd>] ll_rw_block+0x4d/0x80
Aug 13 06:53:03 middle kernel: [<c018f957>] ext3_find_entry+0x307/0x3c0
Aug 13 06:53:03 middle kernel: [<c011a1e0>] do_page_fault+0x0/0x46b
Aug 13 06:53:03 middle kernel: [<c03cb19b>] error_code+0x2f/0x38
Aug 13 06:53:03 middle kernel: [<c016c14a>] find_inode_fast+0x1a/0x60
Aug 13 06:53:03 middle kernel: [<c016c7c0>] iget_locked+0x50/0xc0
Aug 13 06:53:03 middle kernel: [<c018fc92>] ext3_lookup+0x62/0xd0
Aug 13 06:53:03 middle kernel: [<c0160b10>] real_lookup+0xc0/0xf0
Aug 13 06:53:03 middle kernel: [<c0160d84>] do_lookup+0x84/0x90
Aug 13 06:53:03 middle kernel: [<c0161211>] link_path_walk+0x481/0x870
Aug 13 06:53:03 middle kernel: [<c0161abe>] __user_walk+0x3e/0x60
Aug 13 06:53:03 middle kernel: [<c015cdce>] vfs_stat+0x1e/0x60
Aug 13 06:53:03 middle kernel: [<c015d43b>] sys_stat64+0x1b/0x40
Aug 13 06:53:03 middle kernel: [<c03ca78f>] syscall_call+0x7/0xb
Aug 13 06:53:03 middle kernel:

Kind regards,
Jurriaan
--
All lies all lies all schemes all schemes
Every winner means a loser in the western dream.
News Model Army - Western Dream
Debian (Unstable) GNU/Linux 2.6.0-test3-mm1 4276 bogomips load av: 0.00 0.26 0.26
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13  4:56 Jurriaan
@ 2003-08-13  8:47 ` Andrew Morton
  2003-08-13  9:19   ` Jurriaan on adsl-gate
  2003-08-13 11:06   ` Andi Kleen
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-08-13  8:47 UTC
To: thunder7; +Cc: linux-kernel, Andi Kleen, Zwane Mwaikambo, Dave Jones

Jurriaan <thunder7@xs4all.nl> wrote:
>
> Aug 13 06:47:48 middle -- MARK --
> Aug 13 06:53:03 middle kernel: printing eip:
> Aug 13 06:53:03 middle kernel: c016c14a
> Aug 13 06:53:03 middle kernel: Oops: 0000 [#1]
> Aug 13 06:53:03 middle kernel: PREEMPT
> Aug 13 06:53:03 middle kernel: CPU: 0
> Aug 13 06:53:03 middle kernel: EIP: 0060:[<c016c14a>] Not tainted VLI
> Aug 13 06:53:03 middle kernel: EFLAGS: 00010286
> Aug 13 06:53:03 middle kernel: EIP is at find_inode_fast+0x1a/0x60
> Aug 13 06:53:03 middle kernel: eax: f7b7e000 ebx: 000d5ff4 ecx: e68e9a48 edx: 00000000
> Aug 13 06:53:03 middle kernel: esi: f7b7e000 edi: c1a50d80 ebp: f2f41e14 esp: f2f41e04
> Aug 13 06:53:03 middle kernel: ds: 007b es: 007b ss: 0068
> Aug 13 06:53:03 middle kernel: Process make (pid: 9500, threadinfo=f2f40000 task=eb0966a0)
> Aug 13 06:53:03 middle kernel: Stack: f0c05cc0 f2f40000 f0271cc0 000d5ff4 f2f41e38 c016c7c0 f7b7e000 c1a50d80
> Aug 13 06:53:03 middle kernel: 000d5ff4 c1a50d80 000d5ff4 f0271cc0 f7b7e000 f2f41e58 c018fc92 f7b7e000
> Aug 13 06:53:03 middle kernel: 000d5ff4 c3600234 fffffff4 dddd2a74 dddd2a08 f2f41e7c c0160b10 dddd2a08
> Aug 13 06:53:03 middle kernel: Call Trace:
> Aug 13 06:53:03 middle kernel: [<c016c7c0>] iget_locked+0x50/0xc0
> Aug 13 06:53:03 middle kernel: [<c018fc92>] ext3_lookup+0x62/0xd0
> Aug 13 06:53:03 middle kernel: [<c0160b10>] real_lookup+0xc0/0xf0
> Aug 13 06:53:03 middle kernel: [<c0160d84>] do_lookup+0x84/0x90
> Aug 13 06:53:03 middle kernel: [<c0161211>] link_path_walk+0x481/0x870
> Aug 13 06:53:03 middle kernel: [<c0161abe>] __user_walk+0x3e/0x60
> Aug 13 06:53:03 middle kernel: [<c015cdce>] vfs_stat+0x1e/0x60
> Aug 13 06:53:03 middle kernel: [<c015d43b>] sys_stat64+0x1b/0x40
> Aug 13 06:53:03 middle kernel: [<c03ca78f>] syscall_call+0x7/0xb
> Aug 13 06:53:03 middle kernel:
> Aug 13 06:53:03 middle kernel: Code: 75 ca eb c6 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 04 8b 5d 10 8b 7d 0c 8b 75 08 8b 0f 85 c9 74 13 8b 11 <0f> 18 02 90 39 59 18 89 c8 74 10 85 d2 89 d1 75 ed 31 c0 83 c4

You oopsed here:

Code;  c016c144 No symbols available
  25:   85 c9                     test   %ecx,%ecx
Code;  c016c146 No symbols available
  27:   74 13                     je     3c <_EIP+0x3c>
Code;  c016c148 No symbols available
  29:   8b 11                     mov    (%ecx),%edx

This decode from eip onwards should be reliable

Code;  c016c14a No symbols available
00000000 <_EIP>:
Code;  c016c14a No symbols available   <=====
   0:   0f 18 02                  prefetchnta (%edx)   <=====
Code;  c016c14d No symbols available
   3:   90                        nop
Code;  c016c14e No symbols available
   4:   39 59 18                  cmp    %ebx,0x18(%ecx)
Code;  c016c151 No symbols available

And indeed, your %edx is zero.

But if a prefetch of zero oopses then we should be oopsing in there all the
time.

hlist_for_each() is completely assuming that prefetch(0) is safe, and you
undoubtedly oopsed doing it.

Colour me confused, and let me Cc lots of x86 guys ;)

Exactly what sort of CPU are you using?

* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13  8:47 ` Andrew Morton
@ 2003-08-13  9:19   ` Jurriaan on adsl-gate
  2003-08-13  9:55     ` Andrew Morton
  2003-08-13 11:06   ` Andi Kleen
  1 sibling, 1 reply; 24+ messages in thread
From: Jurriaan on adsl-gate @ 2003-08-13  9:19 UTC
To: Andrew Morton; +Cc: linux-kernel

On Wed, Aug 13, 2003 at 01:47:46AM -0700, Andrew Morton wrote:
> Jurriaan <thunder7@xs4all.nl> wrote:
> >
> > Aug 13 06:47:48 middle -- MARK --
> > Aug 13 06:53:03 middle kernel: printing eip:
> > Aug 13 06:53:03 middle kernel: c016c14a
> > Aug 13 06:53:03 middle kernel: Oops: 0000 [#1]
> > Aug 13 06:53:03 middle kernel: PREEMPT
> > Aug 13 06:53:03 middle kernel: CPU: 0
> > Aug 13 06:53:03 middle kernel: EIP: 0060:[<c016c14a>] Not tainted VLI
> > Aug 13 06:53:03 middle kernel: EFLAGS: 00010286
> > Aug 13 06:53:03 middle kernel: EIP is at find_inode_fast+0x1a/0x60
>
> And indeed, your %edx is zero.
>
> But if a prefetch of zero oopses then we should be oopsing in there all the
> time.
>
> hlist_for_each() is completely assuming that prefetch(0) is safe, and you
> undoubtedly oopsed doing it.
>
>
> Colour me confused, and let me Cc lots of x86 guys ;)
>
> Exactly what sort of CPU are you using?
> -
AMD Athlon XP2400+ on a VIA KT400 chipset, single CPU-system.

Kind regards,
Jurriaan

* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13  9:19 ` Jurriaan on adsl-gate
@ 2003-08-13  9:55   ` Andrew Morton
  2003-08-13 11:06     ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-08-13  9:55 UTC
To: thunder7; +Cc: linux-kernel

Jurriaan on adsl-gate <thunder7@xs4all.nl> wrote:
>
> > Exactly what sort of CPU are you using?
> > -
> AMD Athlon XP2400+ on a VIA KT400 chipset, single CPU-system.

OK, thanks. The word is that Athlons will, very occasionally, take a fault
when prefetching from an unmapped address.

 include/linux/list.h |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff -puN include/linux/list.h~hlist_for_each-fix include/linux/list.h
--- 25/include/linux/list.h~hlist_for_each-fix 2003-08-13 02:29:32.000000000 -0700
+++ 25-akpm/include/linux/list.h 2003-08-13 02:37:33.000000000 -0700
@@ -504,11 +504,15 @@ static __inline__ void hlist_add_after(s
 
 #define hlist_entry(ptr, type, member) container_of(ptr,type,member)
 
-/* Cannot easily do prefetch unfortunately */
-#define hlist_for_each(pos, head) \
-        for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; }); \
-             pos = pos->next)
+#define hlist_for_each(pos, head)                               \
+        for (   pos = (head)->first;                            \
+                likely(pos) && ({                               \
+                        if (likely(pos->next))                  \
+                                prefetch(pos->next);            \
+                        1; });                                  \
+                pos = pos->next)
 
+/* Cannot easily do prefetch unfortunately */
 #define hlist_for_each_safe(pos, n, head) \
         for (pos = (head)->first; n = pos ? pos->next : 0, pos; \
              pos = n)

_

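The idea behind the hlist_for_each change above (only prefetch the next node
when there is one) can be tried outside the kernel. What follows is a minimal
userspace sketch, not the kernel code: likely() and prefetch() are stand-ins
built on the GCC/Clang builtins __builtin_expect and __builtin_prefetch, and
struct node and guarded_walk_sum() are hypothetical names made up for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's likely() and prefetch(); these use GCC/Clang
 * builtins and are assumptions for this sketch, not the kernel definitions. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define prefetch(p) __builtin_prefetch(p)

/* Hypothetical node type, loosely modelled on a singly linked hlist chain. */
struct node {
    struct node *next;
    int value;
};

/* Walk the chain and sum the values. The next node is prefetched only when
 * it exists, so prefetch() is never issued on a NULL pointer. */
int guarded_walk_sum(struct node *first)
{
    int sum = 0;
    struct node *pos;

    for (pos = first; pos; pos = pos->next) {
        if (likely(pos->next))
            prefetch(pos->next);
        sum += pos->value;
    }
    return sum;
}
```

The cost of this form is one extra test and branch per node on every CPU,
which is the objection raised in the replies that follow.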
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13  9:55 ` Andrew Morton
@ 2003-08-13 11:06   ` Alan Cox
  2003-08-13 11:25     ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Cox @ 2003-08-13 11:06 UTC
To: Andrew Morton; +Cc: thunder7, Linux Kernel Mailing List

On Mer, 2003-08-13 at 10:55, Andrew Morton wrote:
> Jurriaan on adsl-gate <thunder7@xs4all.nl> wrote:
> >
> > > Exactly what sort of CPU are you using?
> > > -
> > AMD Athlon XP2400+ on a VIA KT400 chipset, single CPU-system.
>
> OK, thanks. The word is that Athlons will, very occasionally,
> take a fault when prefetching from an unmapped address.

Page zero in the kernel is mapped in 4Mb paging mode (which is what the
Athlon uses). Also your likely(pos) pretty much wiped out the point of
prefetching and punishes other processors because it is in the wrong
place.

For that matter we could add a LIST_NULL that pointed somewhere safe and
wasn't NULL per se in 2.7.

Put the likely(pos) in the asm/prefetch for Athlon until someone can
figure out what is going on with some specific Athlons, 2.6 and certain
kernels (notably 4G/4G).

Long term we really do need to start supporting a zero page mapped at
0->64K when not debugging the kernel, then you can let the compiler do
NULL dereferences which is a _huge_ win because you can move stuff
around a lot of natural C conditionals to get better unrolling and
instruction scheduling.

The alternative is to start doing multipointer lists which is messier
and uses more memory (ie each node has next, prev, "several nodes on")

Alan

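The LIST_NULL suggestion above, a list terminator that points somewhere safe
instead of being NULL, might look roughly like this in userspace C. This is a
sketch under assumptions: the message names LIST_NULL but gives no
implementation, so the self-pointing sentinel, struct node and
sentinel_walk_sum() are all hypothetical, and prefetch() is a stand-in for the
GCC/Clang builtin __builtin_prefetch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's prefetch(); an assumption for this sketch. */
#define prefetch(p) __builtin_prefetch(p)

/* Hypothetical node type for the illustration. */
struct node {
    struct node *next;
    int value;
};

/* Hypothetical sentinel: it terminates every list and points at itself, so
 * following or prefetching its next pointer always lands on mapped memory. */
struct node list_null = { &list_null, 0 };
#define LIST_NULL (&list_null)

/* Walk the chain until the sentinel is reached. The prefetch needs no NULL
 * test because pos->next is at worst LIST_NULL. */
int sentinel_walk_sum(struct node *first)
{
    int sum = 0;
    struct node *pos;

    for (pos = first; pos != LIST_NULL; pos = pos->next) {
        prefetch(pos->next);
        sum += pos->value;
    }
    return sum;
}
```

The trade-off is that every list must be terminated with LIST_NULL rather than
NULL, so all code that tests a node pointer against NULL would need to change.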
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 11:06 ` Alan Cox
@ 2003-08-13 11:25   ` Andrew Morton
  2003-08-13 11:40     ` Alan Cox
  2003-08-18 12:08     ` Pavel Machek
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-08-13 11:25 UTC
To: Alan Cox; +Cc: thunder7, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> Put the likely(pos) in the asm/prefetch for Athlon until someone can
> figure out what is going on with some specific Athlons, 2.6 and certain
> kernels (notably 4G/4G).

<riffles through random config options>

Like this?

What happens if someone runs a K6 kernel on a K7?
Or various other CPU types? What is the matrix here?

I don't like the way this is headed...

--- 25/include/asm-i386/processor.h~athlon-prefetch-fix 2003-08-13 04:21:01.000000000 -0700
+++ 25-akpm/include/asm-i386/processor.h 2003-08-13 04:22:10.000000000 -0700
@@ -568,6 +568,10 @@ static inline void rep_nop(void)
 #define ARCH_HAS_PREFETCH
 extern inline void prefetch(const void *x)
 {
+#ifdef CONFIG_MK7
+        if (unlikely(x == NULL))
+                return;         /* athlons like to oops in prefetch(0) */
+#endif
         alternative_input(ASM_NOP4,
                           "prefetchnta (%1)",
                           X86_FEATURE_XMM,

_

* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 11:25 ` Andrew Morton
@ 2003-08-13 11:40   ` Alan Cox
  2003-08-18 12:08   ` Pavel Machek
  1 sibling, 0 replies; 24+ messages in thread
From: Alan Cox @ 2003-08-13 11:40 UTC
To: Andrew Morton; +Cc: thunder7, Linux Kernel Mailing List

On Mer, 2003-08-13 at 12:25, Andrew Morton wrote:
> Like this?
>
> What happens if someone runs a K6 kernel on a K7?
> Or various other CPU types? What is the matrix here?

Beats me, but then the prefetch code in 2.6 seems broken from
5 seconds of inspection anyway. We are testing the XMM feature
and using prefetchnta for Athlon, thats wrong for lots of athlon
processors that dont have XMM but do have prefetch/prefetchw,
(which btw also seem to work properly on all these processors
while prefetchnta seems to do funky things)

Perhaps someone should fix prefetch() before they worry about the
rest of the mess ?

For Athlon we should be testing 3Dnow, and using prefetch/prefetchw
for Intel cases we want to go for prefetchnta if XMM is set (PIII, PIV)

Alan

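The selection rule proposed in the message above (test 3DNow! and use
prefetch/prefetchw; otherwise use prefetchnta when XMM is set) can be written
down as a small decision function. This is only a sketch: the FEAT_* bits and
both function names are hypothetical, not the kernel's X86_FEATURE_* values,
and checking 3DNow! before XMM on a CPU that reports both is an assumption
read into the message rather than something it states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature bits for illustration only. */
#define FEAT_3DNOW 0x1u
#define FEAT_XMM   0x2u

/* Pick the mnemonic for a read prefetch. 3DNow! is tested first (an
 * assumption), then XMM; with neither feature the prefetch becomes a nop. */
const char *pick_read_prefetch(unsigned features)
{
    if (features & FEAT_3DNOW)
        return "prefetch";
    if (features & FEAT_XMM)
        return "prefetchnta";
    return "nop";
}

/* Pick the mnemonic for a write prefetch. Only the 3DNow! family provides
 * prefetchw in this model; everything else falls back to a nop. */
const char *pick_write_prefetch(unsigned features)
{
    if (features & FEAT_3DNOW)
        return "prefetchw";
    return "nop";
}
```

Compared with a single XMM test, this adds a third alternative per prefetch
site.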
* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13 11:25 ` Andrew Morton
  2003-08-13 11:40   ` Alan Cox
@ 2003-08-18 12:08   ` Pavel Machek
  1 sibling, 0 replies; 24+ messages in thread
From: Pavel Machek @ 2003-08-18 12:08 UTC
To: Andrew Morton; +Cc: Alan Cox, thunder7, linux-kernel

Hi!

> > Put the likely(pos) in the asm/prefetch for Athlon until someone can
> > figure out what is going on with some specific Athlons, 2.6 and certain
> > kernels (notably 4G/4G).
>
> <riffles through random config options>
>
> Like this?
> What happens if someone runs a K6 kernel on a K7?

You break things :-(.

Also prefetch with test for null does probably more harm than good.
What about simply assuming K7 can not do prefetch?

Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

* Re: 2.6.0-test3-mm1: scheduling while atomic (ext3?)
  2003-08-13  8:47 ` Andrew Morton
  2003-08-13  9:19   ` Jurriaan on adsl-gate
@ 2003-08-13 11:06   ` Andi Kleen
  1 sibling, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2003-08-13 11:06 UTC
To: Andrew Morton
Cc: thunder7, linux-kernel, Andi Kleen, Zwane Mwaikambo, Dave Jones

> But if a prefetch of zero oopses then we should be oopsing in there all the
> time.
>

If it's an Opteron then it's a known errata (#91 iirc). Update your BIOS
in this case. The x86-64 kernel port also has a workaround for this
(adding exception handling to the prefetches)

-Andi
