linux-kernel.vger.kernel.org archive mirror
* Re: 2.4 mm trouble [possible lru race]
@ 2002-10-01 14:20 Richard.Zidlicky
  2002-10-01 15:12 ` Daniel Phillips
  2002-10-01 15:29 ` Daniel Phillips
  0 siblings, 2 replies; 14+ messages in thread
From: Richard.Zidlicky @ 2002-10-01 14:20 UTC (permalink / raw)
  To: zippel, phillips; +Cc: linux-m68k, linux-kernel


> 
> The theoretical lru race possibly spotted in the wild...
> 
> >
> > Now I am wondering if that is just coincidence or why m68k hit that 
> > error so reliably.. is it supposed to have any effect at all on
> > UP?
> 
> Are you running UP+preempt?

no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).

Richard


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 14:20 2.4 mm trouble [possible lru race] Richard.Zidlicky
@ 2002-10-01 15:12 ` Daniel Phillips
  2002-10-01 15:29 ` Daniel Phillips
  1 sibling, 0 replies; 14+ messages in thread
From: Daniel Phillips @ 2002-10-01 15:12 UTC (permalink / raw)
  To: Richard.Zidlicky, zippel; +Cc: linux-m68k, linux-kernel

On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:
> > 
> > The theoretical lru race possibly spotted in the wild...
> > 
> > >
> > > Now I am wondering if that is just coincidence or why m68k hit that 
> > > error so reliably.. is it supposed to have any effect at all on
> > > UP?
> > 
> > Are you running UP+preempt?
> 
> no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).

I'm having real trouble spotting a substantive change in the patch that
would affect a UP kernel.  I believe you when you say it fixes your
problem, but we don't know why, and it is worth making some effort to
find out.

Ah wait, I see one candidate, would you like to try:

                 * the page as well.
                 */
                if (page->buffers) {
                        /* avoid to free a locked page */
-                       get_page(page);
                        spin_unlock(&pagemap_lru_lock);
+                       get_page(page);
 
and see if your bug comes back?  There are a couple of other changes
that could be considered substantive by stretching one's imagination
enough, but this is the leading candidate.
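
The reasoning behind this test can be sketched in user space. What follows is a minimal toy model, not the kernel code: toy_page, toy_get_page, toy_put_page and toy_window_frees_page are hypothetical names, and the "other context" is simulated by a plain function call at the point where the lock would be dropped. It only illustrates why pinning the page before the unlock keeps it alive, and why the swapped order opens a window.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; only what this sketch needs. */
struct toy_page {
    int count;   /* reference count */
    bool freed;  /* set once the last reference is dropped */
};

static void toy_get_page(struct toy_page *p)
{
    p->count++;
}

static void toy_put_page(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = true;
}

/*
 * Model one step of the LRU scan. The page starts with a single
 * reference owned by some other context, which drops it as soon as
 * the (imaginary) lock is released. Returns true if the page was
 * freed before the scanner managed to pin it.
 */
static bool toy_window_frees_page(bool pin_before_unlock)
{
    struct toy_page page = { .count = 1, .freed = false };
    bool freed_in_window;

    /* lock is held here */
    if (pin_before_unlock)
        toy_get_page(&page);

    /* unlock: the other context runs and drops its reference */
    toy_put_page(&page);
    freed_in_window = page.freed;

    if (!pin_before_unlock)
        toy_get_page(&page);  /* too late: may pin a freed page */

    return freed_in_window;
}
```

In this model the page survives when it is pinned before the unlock and is freed in the window when the order is swapped. Whether anything can actually run in that window on a given kernel configuration is a separate question that the model does not answer.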

Oh wait, you could also try this, a little further down:

+                                       page_cache_release(page);
                                        spin_lock(&pagemap_lru_lock);
-                                       put_page_nofree(page);

By the way, the original patch you posted was reversed and your editor
apparently took the liberty of cleaning up some whitespace in the file.
Generally, we try to avoid patch chunks that just, e.g., change bogus
spaces to tabs, and save those for official whitespace patches.

-- 
Daniel


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 14:20 2.4 mm trouble [possible lru race] Richard.Zidlicky
  2002-10-01 15:12 ` Daniel Phillips
@ 2002-10-01 15:29 ` Daniel Phillips
  2002-10-01 16:56   ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Phillips @ 2002-10-01 15:29 UTC (permalink / raw)
  To: Richard.Zidlicky, zippel; +Cc: linux-m68k, linux-kernel

On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:
> > 
> > The theoretical lru race possibly spotted in the wild...
> > 
> > >
> > > Now I am wondering if that is just coincidence or why m68k hit that 
> > > error so reliably.. is it supposed to have any effect at all on
> > > UP?
> > 
> > Are you running UP+preempt?
> 
> no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).

Vanilla would be CONFIG_SMP=y, is that what you have?  Otherwise please
disregard the post just above (which hasn't appeared on the list yet)
because spin_lock/unlock would be null, and the tests I suggested would
have no effect.

We would then be left with a *very* small number of candidates, which
we will test in accordance with the "what remains must be the truth"
principle.
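
Why reordering around spin_unlock would be a no-op on a UP build can be shown with a small sketch. It assumes the usual arrangement where the lock macros expand to nothing when SMP is not configured; toy_lock, toy_spin_lock, toy_spin_unlock and TOY_CONFIG_SMP are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>

/* Hypothetical toy lock, not the kernel's spinlock_t. */
struct toy_lock {
    int held;
};

#ifdef TOY_CONFIG_SMP
/* "SMP" build: the lock operations really touch the lock. */
#define toy_spin_lock(l)   do { (l)->held = 1; } while (0)
#define toy_spin_unlock(l) do { (l)->held = 0; } while (0)
#else
/* "UP" build: the lock operations expand to nothing useful. */
#define toy_spin_lock(l)   do { (void)(l); } while (0)
#define toy_spin_unlock(l) do { (void)(l); } while (0)
#endif

/* Returns the lock state observed inside the "critical section". */
static int toy_lock_state_inside(void)
{
    struct toy_lock lru_lock = { 0 };
    int seen;

    toy_spin_lock(&lru_lock);
    seen = lru_lock.held;
    toy_spin_unlock(&lru_lock);
    return seen;
}
```

Built without TOY_CONFIG_SMP, the function sees the lock as never taken, so moving a statement from one side of the unlock to the other changes nothing in the generated code path.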

-- 
Daniel


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 15:29 ` Daniel Phillips
@ 2002-10-01 16:56   ` Rik van Riel
  2002-10-01 17:10     ` Daniel Phillips
  0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2002-10-01 16:56 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tue, 1 Oct 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:

> > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
>
> Vanilla would be CONFIG_SMP=y, is that what you have?

Somehow I doubt Linux supports m68k SMP machines ;)

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/



* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 16:56   ` Rik van Riel
@ 2002-10-01 17:10     ` Daniel Phillips
  2002-10-01 17:31       ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Phillips @ 2002-10-01 17:10 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:
> 
> > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> >
> > Vanilla would be CONFIG_SMP=y, is that what you have?
> 
> Somehow I doubt Linux supports m68k SMP machines ;)

CONFIG_SMP=y works perfectly well on single cpu machines - it forces
the spinlocks to actually exist.  It's not supposed to change any
behaviour, but you never know.  Behaviour is obviously changing here.

-- 
Daniel


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 17:10     ` Daniel Phillips
@ 2002-10-01 17:31       ` Jens Axboe
  2002-10-01 18:01         ` Daniel Phillips
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2002-10-01 17:31 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Rik van Riel, Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tue, Oct 01 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> > On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:
> > 
> > > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> > >
> > > Vanilla would be CONFIG_SMP=y, is that what you have?
> > 
> > Somehow I doubt Linux supports m68k SMP machines ;)
> 
> CONFIG_SMP=y works perfectly well on single cpu machines - it forces
> the spinlocks to actually exist.  It's not supposed to change any
> behaviour, but you never know.  Behaviour is obviously changing here.

Again, m68k was the target.

-- 
Jens Axboe



* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 17:31       ` Jens Axboe
@ 2002-10-01 18:01         ` Daniel Phillips
  2002-10-01 18:04           ` Jens Axboe
  2002-10-02  9:45           ` Richard Zidlicky
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Phillips @ 2002-10-01 18:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Rik van Riel, Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> On Tue, Oct 01 2002, Daniel Phillips wrote:
> > On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> > > On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > > > On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:
> > > > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> > > >
> > > > Vanilla would be CONFIG_SMP=y, is that what you have?
> > > 
> > > Somehow I doubt Linux supports m68k SMP machines ;)
> > 
> > CONFIG_SMP=y works perfectly well on single cpu machines - it forces
> > the spinlocks to actually exist.  It's not supposed to change any
> > behaviour, but you never know.  Behaviour is obviously changing here.
> 
> Again, m68k was the target.

Sure fine, no good reason to be cryptic about it though.

   #error "m68k doesn't do SMP yet"

So SMP must be off or the compile would abort.  Well, the only interesting
difference remaining is the extra count for the LRU.  I actually had that
parameterized at one time so you could turn it on/off easily, but akpm
complained about #ifdef's so I took that out ;-)
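
The "extra count for the LRU" idea can be illustrated with a toy reference-count model. This is only a sketch under stated assumptions, not the patch itself (its body is elided in this thread): toy_page, toy_put, toy_lru_add, toy_lru_del and toy_freed_while_on_lru are hypothetical names, and the lru_holds_ref flag stands in for the on/off parameterization mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; only what this sketch needs. */
struct toy_page {
    int count;    /* reference count */
    bool on_lru;  /* linked on the (imaginary) LRU list */
    bool freed;   /* set once the last reference is dropped */
};

static void toy_put(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = true;
}

/* Link the page; optionally let the list own one reference. */
static void toy_lru_add(struct toy_page *p, bool lru_holds_ref)
{
    p->on_lru = true;
    if (lru_holds_ref)
        p->count++;
}

/* Unlink the page; drop the list's reference if it had one. */
static void toy_lru_del(struct toy_page *p, bool lru_holds_ref)
{
    p->on_lru = false;
    if (lru_holds_ref)
        toy_put(p);
}

/* Returns true if the page was freed while still linked on the LRU. */
static bool toy_freed_while_on_lru(bool lru_holds_ref)
{
    struct toy_page page = { .count = 1, .on_lru = false, .freed = false };
    bool bad;

    toy_lru_add(&page, lru_holds_ref);
    toy_put(&page);                  /* the only user drops its reference */
    bad = page.freed && page.on_lru;
    toy_lru_del(&page, lru_holds_ref);
    return bad;
}
```

With the list holding its own reference, the page cannot reach a count of zero until it has been unlinked; without it, the last user's put frees a page that is still on the list.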

Richard, before I go making a test patch for you (it's not completely
straightforward) can you confirm that your bug comes back when you back
the lru race patch out?

-- 
Daniel


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 18:01         ` Daniel Phillips
@ 2002-10-01 18:04           ` Jens Axboe
  2002-10-01 18:14             ` Daniel Phillips
  2002-10-02  9:45           ` Richard Zidlicky
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2002-10-01 18:04 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Rik van Riel, Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tue, Oct 01 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > On Tue, Oct 01 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> > > > On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > > > > On Tuesday 01 October 2002 16:20, Richard.Zidlicky@stud.informatik.uni-erlangen.de wrote:
> > > > > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> > > > >
> > > > > Vanilla would be CONFIG_SMP=y, is that what you have?
> > > > 
> > > > Somehow I doubt Linux supports m68k SMP machines ;)
> > > 
> > > CONFIG_SMP=y works perfectly well on single cpu machines - it forces
> > > the spinlocks to actually exist.  It's not supposed to change any
> > > behaviour, but you never know.  Behaviour is obviously changing here.
> > 
> > Again, m68k was the target.
> 
> Sure fine, no good reason to be cryptic about it though.
> 
>    #error "m68k doesn't do SMP yet"
> 
> So SMP must be off or the compile would abort.  Well, the only interesting

There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
beating of dead horse :)

-- 
Jens Axboe



* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 18:04           ` Jens Axboe
@ 2002-10-01 18:14             ` Daniel Phillips
  2002-10-01 18:22               ` Rik van Riel
  2002-10-02 12:07               ` Geert Uytterhoeven
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Phillips @ 2002-10-01 18:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Rik van Riel, Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tuesday 01 October 2002 20:04, Jens Axboe wrote:
> On Tue, Oct 01 2002, Daniel Phillips wrote:
> > On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > > Again, m68k was the target.
> > 
> > Sure fine, no good reason to be cryptic about it though.
> > 
> >    #error "m68k doesn't do SMP yet"
> > 
> > So SMP must be off or the compile would abort.  Well, the only interesting
> 
> There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
> beating of dead horse :)

The horse isn't dead yet, it's still twitching a little.  At this
point we still need to speculate about why anyone would want an SMP
Dragonball machine ;-)

-- 
Daniel


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 18:14             ` Daniel Phillips
@ 2002-10-01 18:22               ` Rik van Riel
  2002-10-02 12:07               ` Geert Uytterhoeven
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2002-10-01 18:22 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Jens Axboe, Richard.Zidlicky, zippel, linux-m68k, linux-kernel

On Tue, 1 Oct 2002, Daniel Phillips wrote:

> The horse isn't dead yet, it's still twitching a little.  At this
> point we still need to speculate about why anyone would want an
> SMP Dragonball machine ;-)

I've seen an SMP 68k box, a DIAB DATA machine. I think
Bull shipped them, too.

What is that coloured spot on the pavement ?
Could it be a horse died there, long ago ?
Now, stop beating the pavement.

cheers,

Rik
-- 
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/		http://distro.conectiva.com/



* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 18:01         ` Daniel Phillips
  2002-10-01 18:04           ` Jens Axboe
@ 2002-10-02  9:45           ` Richard Zidlicky
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Zidlicky @ 2002-10-02  9:45 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Jens Axboe, Rik van Riel, zippel, linux-m68k, linux-kernel

On Tue, Oct 01, 2002 at 08:01:10PM +0200, Daniel Phillips wrote:

> Richard, before I go making a test patch for you (it's not completely
> straightforward) can you confirm that your bug comes back when you back
> the lru race patch out?

bad luck, the disappearance of the bug was rather accidental - I have
switched to a different swap partition in the meantime. So backing out
the changes doesn't make the bug reappear, restoring previous IDE
configuration does.

Very likely interrupt related trouble, somewhere a missing spin_lock_irqsave
perhaps. The bug manifests itself as pages from the wrong processes getting
swapped in for some process; however, I have also had the luck to crash the
kernel (no Oops) so it is not likely to be one of the TLB/cache problems.

What strikes me is that it is always related to swap, and I've never got any
strange dmesg output so far.
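
The kind of hazard guessed at here, an interrupt arriving in the middle of an unprotected update, can be illustrated in user space with POSIX signals standing in for interrupts. This is only an analogy and says nothing about where the real bug is: the toy_* names are hypothetical, and sigprocmask plays the role that saving and disabling interrupts would play in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t toy_handler_runs;
static volatile sig_atomic_t toy_in_critical;
static volatile sig_atomic_t toy_handler_saw_critical;

/* The "interrupt handler": records whether it ran mid-update. */
static void toy_handler(int sig)
{
    (void)sig;
    toy_handler_runs++;
    if (toy_in_critical)
        toy_handler_saw_critical = 1;
}

/*
 * Run a critical section during which a signal arrives.
 * Returns 1 if the handler interrupted the critical section,
 * 0 if it did not, and -1 if setup failed.
 */
static int toy_run(int mask_signal)
{
    struct sigaction sa;
    sigset_t block, saved;

    toy_handler_runs = 0;
    toy_in_critical = 0;
    toy_handler_saw_critical = 0;

    sa.sa_handler = toy_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (mask_signal && sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    toy_in_critical = 1;
    raise(SIGUSR1);          /* the "interrupt" arrives mid-update */
    toy_in_critical = 0;

    /* Restoring the mask lets the pending signal be delivered now. */
    if (mask_signal && sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;

    return toy_handler_saw_critical;
}
```

Without masking, the handler runs inside the critical section; with masking, it is deferred until the mask is restored and still runs exactly once.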

Richard


* Re: 2.4 mm trouble [possible lru race]
  2002-10-01 18:14             ` Daniel Phillips
  2002-10-01 18:22               ` Rik van Riel
@ 2002-10-02 12:07               ` Geert Uytterhoeven
  1 sibling, 0 replies; 14+ messages in thread
From: Geert Uytterhoeven @ 2002-10-02 12:07 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Jens Axboe, Rik van Riel, Richard.Zidlicky, Roman Zippel,
	Linux/m68k, Linux Kernel Development

On Tue, 1 Oct 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 20:04, Jens Axboe wrote:
> > On Tue, Oct 01 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > > > Again, m68k was the target.
> > > 
> > > Sure fine, no good reason to be cryptic about it though.
> > > 
> > >    #error "m68k doesn't do SMP yet"
> > > 
> > > So SMP must be off or the compile would abort.  Well, the only interesting
> > 
> > There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
> > beating of dead horse :)
> 
> The horse isn't dead yet, it's still twitching a little.  At this
> point we still need to speculate about why anyone would want an SMP
> Dragonball machine ;-)

Dragonballs don't have an MMU, so they would run uClinux/m68k, not Linux/m68k.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: 2.4 mm trouble [possible lru race]
@ 2002-10-20 14:37 Richard Zidlicky
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Zidlicky @ 2002-10-20 14:37 UTC (permalink / raw)
  To: axboe, phillips; +Cc: riel, zippel, linux-m68k, linux-kernel

> 
> On Tuesday 01 October 2002 20:04, Jens Axboe wrote:
> > On Tue, Oct 01 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > > > Again, m68k was the target.
> > > 
> > > Sure fine, no good reason to be cryptic about it though.
> > > 
> > >    #error "m68k doesn't do SMP yet"
> > > 
> > > So SMP must be off or the compile would abort.  Well, the only interesting
> > 
> > There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
> > beating of dead horse :)
> 
> The horse isn't dead yet, it's still twitching a little.  At this
> point we still need to speculate about why anyone would want an SMP
> Dragonball machine ;-)

not on Dragonball, but there were many 68040 SMP systems around long
before Intel had anything SMP capable. In the late '80s those were
considered real number crunchers :)

Richard



* Re: 2.4 mm trouble [possible lru race]
       [not found]   ` <20021001112229.A235@linux-m68k.org>
@ 2002-10-01 10:26     ` Daniel Phillips
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Phillips @ 2002-10-01 10:26 UTC (permalink / raw)
  To: Richard Zidlicky, Roman Zippel; +Cc: Linux m68k, linux-kernel

The theoretical lru race possibly spotted in the wild...

On Tuesday 01 October 2002 11:22, Richard Zidlicky wrote:
> On Sat, Sep 28, 2002 at 04:38:50PM +0200, Roman Zippel wrote:
> > Hi,
> > 
> > On Wed, 25 Sep 2002, Richard Zidlicky wrote:
> > 
> > > First I suspected a stale TLB entry so I've added quite a few
> > > extra flushes throughout the code. This greatly reduces the
> > > risk of the problem, but the problem still happens if swapping is
> > > increased so it might very well be something else.
> > >
> > > Any ideas?
> > 
> > It sounds like a cache problem. I had to fix one in early 2.4, so maybe there
> > is another one.
> > It would help a lot if you could reproduce it within gdb to get some more
> > information about the context of the crash (e.g. invalid data or code, the
> > type of mapping (from /proc/<pid>/maps)).
> 
> hm, I am now testing the appended patch, which is a backport to 2.4.19
> of this:
> 
<<<<<<<<<<<<
>
> From: Daniel Phillips <phillips@arcor.de>
> To: Marcelo Tosatti <marcelo@conectiva.com.br>,
>         "Christian Ehrhardt" <ehrhardt@mathematik.uni-ulm.de>
> Subject: [CFT] [PATCH] LRU race fix
> Date: 	Tue, 17 Sep 2002 19:03:19 +0200
> X-Mailer: KMail [version 1.3.2]
> Cc: <linux-kernel@vger.kernel.org>
> 
> This patch against 2.4.20-pre7 fixes a theoretical race where a page could
> possibly be freed while still on the lru list.  The details have been
> discussed at length earlier, see "[RFC] [PATCH] Include LRU in page count".
> 
> The race may not even be that theoretical, it's just so rare that when it
> does happen, it might be dismissed as a driver problem or similar...
>
> [...]
>
>>>>>>>>>>>>

> Somehow this does completely fix my problem, I have taken out all
> the tlb related hacks and the testcase that caused the problem 100%
> now runs without any sign of problems :))
>
> Now I am wondering if that is just coincidence or why m68k hit that 
> error so reliably.. is it supposed to have any effect at all on
> UP?

Are you running UP+preempt?

-- 
Daniel



Thread overview: 14+ messages
2002-10-01 14:20 2.4 mm trouble [possible lru race] Richard.Zidlicky
2002-10-01 15:12 ` Daniel Phillips
2002-10-01 15:29 ` Daniel Phillips
2002-10-01 16:56   ` Rik van Riel
2002-10-01 17:10     ` Daniel Phillips
2002-10-01 17:31       ` Jens Axboe
2002-10-01 18:01         ` Daniel Phillips
2002-10-01 18:04           ` Jens Axboe
2002-10-01 18:14             ` Daniel Phillips
2002-10-01 18:22               ` Rik van Riel
2002-10-02 12:07               ` Geert Uytterhoeven
2002-10-02  9:45           ` Richard Zidlicky
  -- strict thread matches above, loose matches on Subject: below --
2002-10-20 14:37 Richard Zidlicky
     [not found] <20020925122439.C198@linux-m68k.org>
     [not found] ` <Pine.LNX.4.44.0209281634240.338-100000@serv>
     [not found]   ` <20021001112229.A235@linux-m68k.org>
2002-10-01 10:26     ` Daniel Phillips
