linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* page_launder() bug
@ 2001-05-06 21:08 BERECZ Szabolcs
  2001-05-06 21:59 ` Jonathan Morton
                   ` (3 more replies)
  0 siblings, 4 replies; 104+ messages in thread
From: BERECZ Szabolcs @ 2001-05-06 21:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

Hi!

there is a bug in page_launder introduced with kernel 2.4.3-ac12.
if the swapfile is on a filesystem, then after swapping out some
pages, the system locks up. sometimes it writes an oops message.
I don't know exactly what's the problem, but with the attached
patch it works. (this is just a reverse patch to go back to
pre 2.4.3-ac12, and it's against 2.4.4-ac5)
please fix the bug, or apply this patch until fixed properly.

a question:
why don't you include lkcd or something like that in the
mainstream kernel? it would be much easier to save those annoying
oopses :)

Bye,
Szabi


diff -Nur linux/mm/vmscan.c linux.swapfix/mm/vmscan.c
--- linux/mm/vmscan.c	Sun May  6 15:59:22 2001
+++ linux.swapfix/mm/vmscan.c	Sun May  6 16:07:09 2001
@@ -448,15 +448,9 @@
 	maxscan = nr_inactive_dirty_pages;
 	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
 				maxscan-- > 0) {
-		int dead_swap_page;
-
 		page = list_entry(page_lru, struct page, lru);
 		zone = page->zone;

-		dead_swap_page =
-			(PageSwapCache(page) &&
-			 page_count(page) == (1 + !!page->buffers));
-
 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
 			printk("VM: page_launder, wrong page on list.\n");
@@ -467,10 +461,9 @@
 		}

 		/* Page is or was in use?  Move it to the active list. */
-		if (!dead_swap_page &&
-		    (PageTestandClearReferenced(page) || page->age > 0 ||
-		     (!page->buffers && page_count(page) > 1) ||
-		     page_ramdisk(page))) {
+		if (PageTestandClearReferenced(page) || page->age > 0 ||
+				(!page->buffers && page_count(page) > 1) ||
+				page_ramdisk(page)) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
 			continue;
@@ -512,11 +505,8 @@
 			if (!writepage)
 				goto page_active;

-			/* First time through? Move it to the back of the list,
-			 * but not if it is a dead swap page. We want to reap
-			 * those as fast as possible.
-			 */
-			if (!launder_loop && !dead_swap_page) {
+			/* First time through? Move it to the back of the list */
+			if (!launder_loop) {
 				list_del(page_lru);
 				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-06 21:08 page_launder() bug BERECZ Szabolcs
@ 2001-05-06 21:59 ` Jonathan Morton
  2001-05-06 22:07   ` BERECZ Szabolcs
  2001-05-07  4:55 ` David S. Miller
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 104+ messages in thread
From: Jonathan Morton @ 2001-05-06 21:59 UTC (permalink / raw)
  To: BERECZ Szabolcs, linux-kernel; +Cc: linux-mm

>-			 page_count(page) == (1 + !!page->buffers));

Two inversions in a row?  I'd like to see that made more explicit,
otherwise it looks like a bug to me.  Of course, if it IS a bug...

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-06 21:59 ` Jonathan Morton
@ 2001-05-06 22:07   ` BERECZ Szabolcs
  0 siblings, 0 replies; 104+ messages in thread
From: BERECZ Szabolcs @ 2001-05-06 22:07 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: linux-kernel, linux-mm

Hi!

On Sun, 6 May 2001, Jonathan Morton wrote:

> >-			 page_count(page) == (1 + !!page->buffers));
>
> Two inversions in a row?  I'd like to see that made more explicit,
> otherwise it looks like a bug to me.  Of course, if it IS a bug...
it's not a bug.
if page->buffers is zero, than the page_count(page) is 1, and if
page->buffers is other than zero, page_count(page) is 2.
so it checks if page is really used by something.
maybe this last line is not true, but the !!page->buffers is not a bug.

Bye,
Szabi



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-06 21:08 page_launder() bug BERECZ Szabolcs
  2001-05-06 21:59 ` Jonathan Morton
@ 2001-05-07  4:55 ` David S. Miller
  2001-05-07  5:19   ` Aaron Lehmann
                     ` (6 more replies)
  2001-05-07 17:59 ` Linus Torvalds
  2001-05-07 22:44 ` David S. Miller
  3 siblings, 7 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-07  4:55 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: BERECZ Szabolcs, linux-kernel, linux-mm


Jonathan Morton writes:
 > >-			 page_count(page) == (1 + !!page->buffers));
 > 
 > Two inversions in a row?

It is the most straightforward way to make a '1' or '0'
integer from the NULL state of a pointer.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
@ 2001-05-07  5:19   ` Aaron Lehmann
  2001-05-07  6:26   ` Tobias Ringstrom
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 104+ messages in thread
From: Aaron Lehmann @ 2001-05-07  5:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jonathan Morton, BERECZ Szabolcs, linux-kernel, linux-mm

On Sun, May 06, 2001 at 09:55:26PM -0700, David S. Miller wrote:
> 
> Jonathan Morton writes:
>  > >-			 page_count(page) == (1 + !!page->buffers));
>  > Two inversions in a row?
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

page_count(page) == (1 + (page->buffers != 0));


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
  2001-05-07  5:19   ` Aaron Lehmann
@ 2001-05-07  6:26   ` Tobias Ringstrom
  2001-05-07  8:59     ` Helge Hafting
                       ` (3 more replies)
  2001-05-07  8:54   ` David S. Miller
                     ` (4 subsequent siblings)
  6 siblings, 4 replies; 104+ messages in thread
From: Tobias Ringstrom @ 2001-05-07  6:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jonathan Morton, BERECZ Szabolcs, linux-kernel, linux-mm

On Sun, 6 May 2001, David S. Miller wrote:
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

But is it really specified in the C "standards" to be exctly zero or one,
and not zero and non-zero?

IMHO, the ?: construct is way more readable and reliable.

/Tobias


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
  2001-05-07  5:19   ` Aaron Lehmann
  2001-05-07  6:26   ` Tobias Ringstrom
@ 2001-05-07  8:54   ` David S. Miller
  2001-05-07 15:12     ` Tobias Ringstrom
  2001-05-07 14:52   ` Horst von Brand
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 104+ messages in thread
From: David S. Miller @ 2001-05-07  8:54 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: Jonathan Morton, BERECZ Szabolcs, linux-kernel, linux-mm


Tobias Ringstrom writes:
 > But is it really specified in the C "standards" to be exctly zero or one,
 > and not zero and non-zero?

I'm pretty sure it does.

 > IMHO, the ?: construct is way more readable and reliable.

Well identical code has been there for several months just a few lines
away.

I've seen this idiom used in many places (even the GCC sources :-),
so I'm rather surprised people are seeing it for the first time.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  6:26   ` Tobias Ringstrom
@ 2001-05-07  8:59     ` Helge Hafting
  2001-05-07 19:02       ` J . A . Magallon
  2001-05-07 10:52     ` Alan Cox
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 104+ messages in thread
From: Helge Hafting @ 2001-05-07  8:59 UTC (permalink / raw)
  To: Tobias Ringstrom; +Cc: linux-kernel

Tobias Ringstrom wrote:
> 
> On Sun, 6 May 2001, David S. Miller wrote:
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> But is it really specified in the C "standards" to be exctly zero or one,
> and not zero and non-zero?

!0 is 1.  !(anything else) is 0.  It is zero and one, not
zero and "non-zero".  So a !! construction gives zero if you have
zero, and one if you had anything else.  There's no doubt about it.
> 
> IMHO, the ?: construct is way more readable and reliable.

Thats your opinion.  There are many others.  Some don't like the
?: at all, for example.  And some like all valid C.

Helge Hafting

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  6:26   ` Tobias Ringstrom
  2001-05-07  8:59     ` Helge Hafting
@ 2001-05-07 10:52     ` Alan Cox
  2001-05-07 13:49     ` Daniel Phillips
  2001-05-07 13:53     ` H. Peter Anvin
  3 siblings, 0 replies; 104+ messages in thread
From: Alan Cox @ 2001-05-07 10:52 UTC (permalink / raw)
  To: Tobias Ringstrom
  Cc: David S. Miller, Jonathan Morton, BERECZ Szabolcs, linux-kernel,
	linux-mm

> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> But is it really specified in the C "standards" to be exctly zero or one,
> and not zero and non-zero?

Yes. (Fortunately since when this argument occurred Linus said he would eat
his underpants if he was wrong)

Alan


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  6:26   ` Tobias Ringstrom
  2001-05-07  8:59     ` Helge Hafting
  2001-05-07 10:52     ` Alan Cox
@ 2001-05-07 13:49     ` Daniel Phillips
  2001-05-07 13:53     ` H. Peter Anvin
  3 siblings, 0 replies; 104+ messages in thread
From: Daniel Phillips @ 2001-05-07 13:49 UTC (permalink / raw)
  To: Tobias Ringstrom, David S. Miller
  Cc: Jonathan Morton, BERECZ Szabolcs, linux-kernel, linux-mm

On Monday 07 May 2001 08:26, Tobias Ringstrom wrote:
> On Sun, 6 May 2001, David S. Miller wrote:
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
>
> But is it really specified in the C "standards" to be exctly zero or
> one, and not zero and non-zero?

Yes, and if we did not have this stupid situation where the C language 
standard is not freely available online then you would not have had to 
ask.</rant>

> IMHO, the ?: construct is way more readable and reliable.

There is no difference in reliability.  Readability is a matter of 
opinion - my opinion is that they are equally readable.  To its credit, 
gcc produces the same ia32 code in either case:

	int foo = 999;
	return 1 + !!foo;

<main+6>:	movl   $0x3e7,0xfffffffc(%ebp)
<main+13>:	cmpl   $0x0,0xfffffffc(%ebp)
<main+17>:	je     0x80483e0 <main+32>
<main+19>:	mov    $0x2,%eax
<main+24>:	jmp    0x80483e5 <main+37>
<main+26>:	lea    0x0(%esi),%esi
<main+32>:	mov    $0x1,%eax
<main+37>:	mov    %eax,%eax

	int foo = 999;
	return foo? 2: 1;

<main+6>:	movl   $0x3e7,0xfffffffc(%ebp)
<main+13>:	cmpl   $0x0,0xfffffffc(%ebp)
<main+17>:	je     0x80483e0 <main+32>
<main+19>:	mov    $0x2,%eax
<main+24>:	jmp    0x80483e5 <main+37>
<main+26>:	lea    0x0(%esi),%esi
<main+32>:	mov    $0x1,%eax
<main+37>:	mov    %eax,%eax

--
Daniel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  6:26   ` Tobias Ringstrom
                       ` (2 preceding siblings ...)
  2001-05-07 13:49     ` Daniel Phillips
@ 2001-05-07 13:53     ` H. Peter Anvin
  3 siblings, 0 replies; 104+ messages in thread
From: H. Peter Anvin @ 2001-05-07 13:53 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.33.0105070823060.24073-100000@svea.tellus>
By author:    Tobias Ringstrom <tori@tellus.mine.nu>
In newsgroup: linux.dev.kernel
>
> On Sun, 6 May 2001, David S. Miller wrote:
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> But is it really specified in the C "standards" to be exctly zero or one,
> and not zero and non-zero?
> 

Yes it is.

> 
> IMHO, the ?: construct is way more readable and reliable.
> 

In C99 one can write (bool)foo which is more readable than either,
especially since that is really what one is trying to do.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
                     ` (2 preceding siblings ...)
  2001-05-07  8:54   ` David S. Miller
@ 2001-05-07 14:52   ` Horst von Brand
       [not found]     ` <davem@redhat.com>
  2001-05-08 17:59   ` page_launder() bug Kai Henningsen
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 104+ messages in thread
From: Horst von Brand @ 2001-05-07 14:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jonathan Morton, BERECZ Szabolcs, linux-kernel, linux-mm

"David S. Miller" <davem@redhat.com> said:
> Jonathan Morton writes:
>  > >-			 page_count(page) == (1 + !!page->buffers));
>  > 
>  > Two inversions in a row?
> 
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

IMVHO, it is clearer to write:

  page_count(page) == 1 + (page->buffers != NULL)

At least, the original poster wouldn't have wondered, and I wouldn't have
had to think a bit to find out what it meant... If gcc generates worse code
for this, it should be fixed.
-- 
Dr. Horst H. von Brand                       mailto:vonbrand@inf.utfsm.cl
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  8:54   ` David S. Miller
@ 2001-05-07 15:12     ` Tobias Ringstrom
  0 siblings, 0 replies; 104+ messages in thread
From: Tobias Ringstrom @ 2001-05-07 15:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jonathan Morton, BERECZ Szabolcs, linux-kernel, linux-mm

On Mon, 7 May 2001, David S. Miller wrote:
>
> I've seen this idiom used in many places (even the GCC sources :-),
> so I'm rather surprised people are seeing it for the first time.

Or perhaps some of us have seen it one time too many?  :-)

/Tobias


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-06 21:08 page_launder() bug BERECZ Szabolcs
  2001-05-06 21:59 ` Jonathan Morton
  2001-05-07  4:55 ` David S. Miller
@ 2001-05-07 17:59 ` Linus Torvalds
  2001-05-07 21:22   ` Marcelo Tosatti
  2001-05-07 22:44 ` David S. Miller
  3 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-07 17:59 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.A41.4.31.0105062307290.59664-100000@pandora.inf.elte.hu>,
BERECZ Szabolcs  <szabi@inf.elte.hu> wrote:
>
>there is a bug in page_launder introduced with kernel 2.4.3-ac12.

Yes.

The whole "dead_swap_page" optimization in the -ac tree is apparentrly
completely bogus.  It caches a value that is not valid: you cannot
reliably look at whether the page has buffers etc without holding the
page locked. 

So calculating "dead_swap_page" without locking the page first is a sure
way to cause trouble.

I can see why the bug was introduced: standard kernels _optimistically_
test whether the condition might be true before they lock the page, and
decide to not even try to touch pages that look like they are probably
going to be considered active.

But it is important to re-calculate the deadness after getting the lock.
Before, it was just an informed guess. After the lock, it is knowledge.
And you can use informed guesses for heuristics, but you must _not_ use
them for any serious decisions.

Alan, please revert.

		Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  8:59     ` Helge Hafting
@ 2001-05-07 19:02       ` J . A . Magallon
  2001-05-08  7:52         ` Helge Hafting
  2001-05-10 10:51         ` Anuradha Ratnaweera
  0 siblings, 2 replies; 104+ messages in thread
From: J . A . Magallon @ 2001-05-07 19:02 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Tobias Ringstrom, linux-kernel


On 05.07 Helge Hafting wrote:
> Tobias Ringstrom wrote:
> > 
> > On Sun, 6 May 2001, David S. Miller wrote:
> > > It is the most straightforward way to make a '1' or '0'
> > > integer from the NULL state of a pointer.
> > 
> > But is it really specified in the C "standards" to be exctly zero or one,
> > and not zero and non-zero?
> 
> !0 is 1.  !(anything else) is 0.  It is zero and one, not
> zero and "non-zero".  So a !! construction gives zero if you have
> zero, and one if you had anything else.  There's no doubt about it.
> >

Isn't this asking for trouble with the optimizer ? It could kill both
!!. Using that is like trusting on a certain struct padding-alignment.

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Linux Mandrake release 8.1 (Cooker) for i586
Linux werewolf 2.4.4-ac5 #1 SMP Sat May 5 01:17:07 CEST 2001 i686


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 17:59 ` Linus Torvalds
@ 2001-05-07 21:22   ` Marcelo Tosatti
  2001-05-07 23:23     ` Linus Torvalds
  2001-05-07 23:31     ` David S. Miller
  0 siblings, 2 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-07 21:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On 7 May 2001, Linus Torvalds wrote:

> But it is important to re-calculate the deadness after getting the
> lock. Before, it was just an informed guess. After the lock, it is
> knowledge. And you can use informed guesses for heuristics, but you
> must _not_ use them for any serious decisions.

And thats what swap_writepage() is doing:

static int swap_writepage(struct page *page)
{
        /* One for the page cache, one for this user, one for page->buffers */
        if (page_count(page) > 2 + !!page->buffers)
                goto in_use;
        if (swap_count(page) > 1)
                goto in_use;

...
}




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 23:23     ` Linus Torvalds
@ 2001-05-07 21:50       ` Marcelo Tosatti
  2001-05-07 23:52         ` Linus Torvalds
  2001-05-08  0:06         ` page_launder() bug David S. Miller
  0 siblings, 2 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-07 21:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On Mon, 7 May 2001, Linus Torvalds wrote:

> 
> On Mon, 7 May 2001, Marcelo Tosatti wrote:
> > 
> > On 7 May 2001, Linus Torvalds wrote:
> > 
> > > But it is important to re-calculate the deadness after getting the
> > > lock. Before, it was just an informed guess. After the lock, it is
> > > knowledge. And you can use informed guesses for heuristics, but you
> > > must _not_ use them for any serious decisions.
> > 
> > And thats what swap_writepage() is doing:
> 
> Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
> on it.

So the "dead_swap_page" logic is _not_ buggy and you are full of shit when
telling Alan to revert the change. (sorry, I could not avoid this one)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 23:52         ` Linus Torvalds
@ 2001-05-07 22:26           ` Marcelo Tosatti
  2001-05-08  2:29             ` Linus Torvalds
  2001-05-08  0:16           ` David S. Miller
  1 sibling, 1 reply; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-07 22:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On Mon, 7 May 2001, Linus Torvalds wrote:

> 
> On Mon, 7 May 2001, Marcelo Tosatti wrote:
> > 
> > So the "dead_swap_page" logic is _not_ buggy and you are full of shit when
> > telling Alan to revert the change. (sorry, I could not avoid this one)
> 
> Well, the problem is that the patch _is_ buggy. 
> 
> swap_writepage() does it right. And dead_swap_page does it wrong. It
> doesn't look at the swap counts, for one thing.

So lets fix it and make it look for the swap counts. 

> The patch should be reverted. The fact that other parts of the system do
> it _right_ is not an argument for mm/vmscan.c to do it wrong.

My point is that its _ok_ for us to check if the page is a dead swap cache
page _without_ the lock since writepage() will recheck again with the page
_locked_. Quoting you two messages back: 

"But it is important to re-calculate the deadness after getting the lock.
Before, it was just an informed guess. After the lock, it is knowledge."

See ? 






^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-06 21:08 page_launder() bug BERECZ Szabolcs
                   ` (2 preceding siblings ...)
  2001-05-07 17:59 ` Linus Torvalds
@ 2001-05-07 22:44 ` David S. Miller
  2001-05-08  1:00   ` Horst von Brand
  3 siblings, 1 reply; 104+ messages in thread
From: David S. Miller @ 2001-05-07 22:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


Linus Torvalds writes:
 > The whole "dead_swap_page" optimization in the -ac tree is apparentrly
 > completely bogus.  It caches a value that is not valid: you cannot
 > reliably look at whether the page has buffers etc without holding the
 > page locked. 

It caches a value controlling heuristics, not "state".  Specifically
it controls whether we:

1) Ignore the referenced bit, this is fine.

2) Allow writepage() operations in the first pass.  This is fine too.

All normal checks are redone, only heuristics are changed.

Please show me how this is illegal.  Everyone comes to this conclusion
when the first read the code, that I am doing something illegal, then
when I explain what that dead_swap_page thing is doing and they read
it a second time (how shocking! :-) they go "oh, I see".

If the patch is causing problems, it is due to some other bug not my
patch itself.

I do not argue that my patch is "the" way to solve the dead swap page
problem, to the contrary.  Stephen has something which seems to try to
attack this issue in a much nicer way.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 21:22   ` Marcelo Tosatti
@ 2001-05-07 23:23     ` Linus Torvalds
  2001-05-07 21:50       ` Marcelo Tosatti
  2001-05-07 23:31     ` David S. Miller
  1 sibling, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-07 23:23 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> On 7 May 2001, Linus Torvalds wrote:
> 
> > But it is important to re-calculate the deadness after getting the
> > lock. Before, it was just an informed guess. After the lock, it is
> > knowledge. And you can use informed guesses for heuristics, but you
> > must _not_ use them for any serious decisions.
> 
> And thats what swap_writepage() is doing:

Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
on it.

If the page isn't locked there, then THAT is a bug. A major one.

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 21:22   ` Marcelo Tosatti
  2001-05-07 23:23     ` Linus Torvalds
@ 2001-05-07 23:31     ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-07 23:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel


Linus Torvalds writes:
 > 
 > On Mon, 7 May 2001, Marcelo Tosatti wrote:
 > > And thats what swap_writepage() is doing:
 > 
 > Ehh.. swap_writepage() is called with the page locked. So it _can_ depend
 > on it.
 > 
 > If the page isn't locked there, then THAT is a bug. A major one.

Linus, he's trying to point out that writepage() will recheck (under
lock) what you think my dead_swap_page thing is not checking :-)

My changes always call writepage() and always redo all the checks
under the proper locks.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 21:50       ` Marcelo Tosatti
@ 2001-05-07 23:52         ` Linus Torvalds
  2001-05-07 22:26           ` Marcelo Tosatti
  2001-05-08  0:16           ` David S. Miller
  2001-05-08  0:06         ` page_launder() bug David S. Miller
  1 sibling, 2 replies; 104+ messages in thread
From: Linus Torvalds @ 2001-05-07 23:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> So the "dead_swap_page" logic is _not_ buggy and you are full of shit when
> telling Alan to revert the change. (sorry, I could not avoid this one)

Well, the problem is that the patch _is_ buggy. 

swap_writepage() does it right. And dead_swap_page does it wrong. It
doesn't look at the swap counts, for one thing.

The patch should be reverted. The fact that other parts of the system do
it _right_ is not an argument for mm/vmscan.c to do it wrong.

What do you expect me to do? The patch is buggy. It should be reverted.
What's your problem?

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 21:50       ` Marcelo Tosatti
  2001-05-07 23:52         ` Linus Torvalds
@ 2001-05-08  0:06         ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-08  0:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel


Linus Torvalds writes:
 > What do you expect me to do? The patch is buggy. It should be reverted.
 > What's your problem?

I think the problem he has is that you are acting as if the patch
causes corruptions and will end in failures.  This is how you are
coming across, at least.

Really, your problem with the patch seems to be aesthetic in nature
and that the patch is not doing things cleanly at all.  To this I will
not contest, it surely is not the way to fix this in the end.

If my analysis up to this point is correct, the patch should in fact
not be reverted.  Many patches in Alan's tree are not done in the most
aesthetically pleasing way to you, we all know this. But they do solve
problems for people.  Half of Alan's tree should be reverted if we
followed this rule, really.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 23:52         ` Linus Torvalds
  2001-05-07 22:26           ` Marcelo Tosatti
@ 2001-05-08  0:16           ` David S. Miller
  2001-05-08  2:34             ` Linus Torvalds
                               ` (2 more replies)
  1 sibling, 3 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-08  0:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, linux-kernel


Marcelo Tosatti writes:
 > My point is that its _ok_ for us to check if the page is a dead swap cache
 > page _without_ the lock since writepage() will recheck again with the page
 > _locked_. Quoting you two messages back: 
 > 
 > "But it is important to re-calculate the deadness after getting the lock.
 > Before, it was just an informed guess. After the lock, it is knowledge."
 > 
 > See ? 

In fact my patch isn't changing writepage behavior wrt. that page, it
is changing behavior with respect to laundering policy for that page.

Here, let's talk code a little bit so there are no misunderstandings,
I really want to put this to rest:

+		int dead_swap_page;
+
 		page = list_entry(page_lru, struct page, lru);
 
+		dead_swap_page =
+			(PageSwapCache(page) &&
+			 page_count(page) == (1 + !!page->buffers));
+

Calculate dead_swap_page outside of lock.

 		/* Page is or was in use?  Move it to the active list. */
-		if (PageTestandClearReferenced(page) || page->age > 0 ||
-				(!page->buffers && page_count(page) > 1) ||
-				page_ramdisk(page)) {
+		if (!dead_swap_page &&
+		    (PageTestandClearReferenced(page) || page->age > 0 ||
+		     (!page->buffers && page_count(page) > 1) ||
+		     page_ramdisk(page))) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
 			continue;

If dead_swap_page, ignore referenced bit heuristics.

-			/* First time through? Move it to the back of the list */
-			if (!launder_loop) {
+			/* First time through? Move it to the back of the list,
+			 * but not if it is a dead swap page. We want to reap
+			 * those as fast as possible.
+			 */
+			if (!launder_loop && !dead_swap_page) {
 				list_del(page_lru);
 				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);

If dead_swap_page, ignore launder_loop.  Again, another heuristic
test, not a "state correctness" test.  "launder_loop" is not
protecting "state correctness" of what we do to the page.

Really, what does this have to do with swap counts and page counts?

It's a heuristic. In fact it even seems stupid to me to recalculate
dead_swap_page after we get the lock just for the sake of these
heuristics.

Maybe I should have diguised this bit as:

if (dead_swap_page)
	do_writepage_first_pass = 1;

To divert people's brains to what the intent was :-)

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 22:44 ` David S. Miller
@ 2001-05-08  1:00   ` Horst von Brand
  0 siblings, 0 replies; 104+ messages in thread
From: Horst von Brand @ 2001-05-08  1:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, Linus Torvalds

"David S. Miller" <davem@redhat.com> said:
> Linus Torvalds writes:
>  > The whole "dead_swap_page" optimization in the -ac tree is apparentrly
>  > completely bogus.  It caches a value that is not valid: you cannot
>  > reliably look at whether the page has buffers etc without holding the
>  > page locked. 

[...]

> Please show me how this is illegal.  Everyone comes to this conclusion
> when the first read the code, that I am doing something illegal, then
> when I explain what that dead_swap_page thing is doing and they read
> it a second time (how shocking! :-) they go "oh, I see".

Then it might be a better use of your time to place a comment in there
telling the gentle reader what is going on, and not tell them each time
they come asking/screaming that it is wrong. If even Linus gets confused by
it, it is mandatory IMVVHO
-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Vin~a del Mar, Chile                               +56 32 672616

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  2:47             ` David S. Miller
@ 2001-05-08  1:34               ` Marcelo Tosatti
  2001-05-08  3:18               ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08  1:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linus Torvalds, linux-kernel


On Mon, 7 May 2001, David S. Miller wrote:

> 
> Linus Torvalds writes:
>  > YOUR HEURISTIC IS WRONG!
> 
> Please start the conversation this way next time.
> 
>  > I call that a bug. You don't. Fine.
> 
> You made it sound like a data corrupter, a kernel crasher, and that
> any bug against a kernel with that patch indicates my patch caused it.
> There is an important distinction between "this is doing something
> silly" and "this will scramble your disk and crash the kernel".
> 
> The latter is the conclusion several people came to.
> 
> And I wanted a clarification on this, nothing more.
> 
> I wanted this clarification from you _BECAUSE_ the original posting in
> this thread saw data corruption which went away after reverting my
> patch.  But there is no possible connection between my patch and the
> crashes he saw.

Ugh, there is.

I just thought about this case:
  
We find a dead swap cache page, so dead_swap_page goes to 1.

We call swap_writepage(), but in the meantime the swapin readahead code   
got a reference on the swap map for the page.

We write the page out because "(swap_count(page) > 1)", and we may
not have __GFP_IO set in the gfp_mask. Boom.




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  2:34             ` Linus Torvalds
@ 2001-05-08  1:40               ` Marcelo Tosatti
  2001-05-08  3:46                 ` Linus Torvalds
  2001-05-08  6:50                 ` David S. Miller
  2001-05-08  3:22               ` David S. Miller
  1 sibling, 2 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08  1:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, linux-kernel



On Mon, 7 May 2001, Linus Torvalds wrote:

> > To divert people's brains to what the intent was :-)
> 
> I can see the intent.
> 
> I can also see that the code doesn't match up to the intent.
> 
> I call that a bug. You don't. Fine.
> 
> But that code isn't coming anywhere _close_ to my tree until the two
> match. And I stand by my assertion that it should be reverted from Alans
> tree too.

I was wrong. The patch is indeed buggy because of the __GFP_IO thing.

So what about moving the check for a dead swap cache page from
swap_writepage() to page_launder() (+ PageSwapCache() check) just before
the "if (!launder_loop)" ? 

Yes, its ugly special casing. Any other suggestion ? 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:18               ` David S. Miller
@ 2001-05-08  1:47                 ` Marcelo Tosatti
  2001-05-08  3:24                 ` Linus Torvalds
  2001-05-08  3:29                 ` David S. Miller
  2 siblings, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08  1:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linus Torvalds, linux-kernel


On Mon, 7 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > I just thought about this case:
>  >   
>  > We find a dead swap cache page, so dead_swap_page goes to 1.
>  > 
>  > We call swap_writepage(), but in the meantime the swapin readahead code   
>  > got a reference on the swap map for the page.
>  > 
>  > We write the page out because "(swap_count(page) > 1)", and we may
>  > not have __GFP_IO set in the gfp_mask. Boom.
> 
> Hmmm, can't this happen without my patch?

No. We will never call writepage() without __GFP_IO without your patch.

> Nothing stops people from getting references to the page
> between the "Page is or was in use?" test and the line
> which does "TryLockPage(page)".

I don't see any problem with people getting a reference to the page there.



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 22:26           ` Marcelo Tosatti
@ 2001-05-08  2:29             ` Linus Torvalds
  2001-05-13 16:08               ` Rik van Riel
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08  2:29 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> So lets fix it and make it look for the swap counts. 

Ehh.

Which you MUST NOT do without holding the page lock.

Hint: it needs "page->index", and without holding the page lock you don't
know what it could be. An out-of-bounds page index could do anything,
including oopsing the kernel. It so happens that right now you'll only get
a printk from swap_count(), because of defensive programming, but the fact
remains that you _cannot_ decide on whether a page is a dead swap-cache
entry until you've gotten the page lock.

> My point is that its _ok_ for us to check if the page is a dead swap cache
> page _without_ the lock since writepage() will recheck again with the page
> _locked_. Quoting you two messages back: 

Yes. But MY point is that the patch is buggy, and should be reverted. 

Move the swap_count check into the page lock. It's such a heavy operation
that you should, anyway. 

My message is true: you can do some "early out" optimizations. The code
has always done that. But if you look at the old code, it never did any
swap-count calculations or anything like that: it only looked at simple
bits in the "struct page" structure and the page count.

But once the level of complexity reaches actually doing function calls
etc, you should not call them early-out optimizations any more. Especially
not on data that could be stale, and that is used to dereference other
data structures (ie the swap count maps).

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  0:16           ` David S. Miller
@ 2001-05-08  2:34             ` Linus Torvalds
  2001-05-08  1:40               ` Marcelo Tosatti
  2001-05-08  3:22               ` David S. Miller
  2001-05-08  2:47             ` David S. Miller
  2001-05-08 12:33             ` Mikulas Patocka
  2 siblings, 2 replies; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08  2:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, linux-kernel


On Mon, 7 May 2001, David S. Miller wrote:
> 
> Here, let's talk code a little bit so there are no misunderstandings,
> I really want to put this to rest:
> 
> Calculate dead_swap_page outside of lock.

NO. That's not what you're doing at all. You're calculating something
completely different that "dead swap page". You're calculating "do we have
a swap cache entry that is not mapped into any virtual memory"?

> If dead_swap_page, ignore referenced bit heuristics.

Which is complete crap. Those reference bits are valid and important
data. You have not computed anything that says otherwise. You have
computed a random number that doesn't tell you anything about whether the
page is dead or not. 

> Really, what does this have to do with swap counts and page counts?
> 
> It's a heuristic. In fact it even seems stupid to me to recalculate
> dead_swap_page after we get the lock just for the sake of these
> heuristics.

YOUR HEURISTIC IS WRONG!

> Maybe I should have diguised this bit as:
> 
> if (dead_swap_page)
> 	do_writepage_first_pass = 1;

So tell me: what does the above help?

I repeat: your "dead_swap_page" variable is a random number with
absolutely no meaning. ANYTHING that uses it is buggy. It doesn't help in
the least if you use the first random state to set another random
state: the amount of randomness does not increase or decrease.

See?

> To divert people's brains to what the intent was :-)

I can see the intent.

I can also see that the code doesn't match up to the intent.

I call that a bug. You don't. Fine.

But that code isn't coming anywhere _close_ to my tree until the two
match. And I stand by my assertion that it should be reverted from Alans
tree too.

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:46                 ` Linus Torvalds
@ 2001-05-08  2:37                   ` Marcelo Tosatti
  2001-05-08 21:16                   ` Marcelo Tosatti
  1 sibling, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08  2:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, linux-kernel



On Mon, 7 May 2001, Linus Torvalds wrote:

> In fact, it might even clean stuff up. Who knows? At least
> page_launder() would not need to know about magic dead swap pages, because
> the decision would be entirely in writepage().
> 
> And there aren't that many writepage() implementations in the kernel to
> fix up (and the fixup tends to be two simple added lines of code for most
> of them - just the "if (!priority) return").
> 
> Also note how some filesystems might use the writepage() callback even
> with a zero priority as a hint that things are approaching the point where
> we need to start flushing, which might make a difference for deciding when
> to try to write a log entry, for example.

Moreover, the filesystem may want to return "-1" even if "priority" is
non-zero --- think about delayed allocations. (the XFS guys were just
complaining about page_launder() not checking the return value of
writepage() so they could not do this kind of thing with delayed
allocations sometime ago).

> Now, I'm not saying this is _the_ solution to it, but I don't see any
> really clean alternatives.

I like it --- it pushes down control to the pagers so they can be smarter.

Will send a patch later.



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  0:16           ` David S. Miller
  2001-05-08  2:34             ` Linus Torvalds
@ 2001-05-08  2:47             ` David S. Miller
  2001-05-08  1:34               ` Marcelo Tosatti
  2001-05-08  3:18               ` David S. Miller
  2001-05-08 12:33             ` Mikulas Patocka
  2 siblings, 2 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-08  2:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel


Linus Torvalds writes:
 > YOUR HEURISTIC IS WRONG!

Please start the conversation this way next time.

 > I call that a bug. You don't. Fine.

You made it sound like a data corrupter, a kernel crasher, and that
any bug against a kernel with that patch indicates my patch caused it.
There is an important distinction between "this is doing something
silly" and "this will scramble your disk and crash the kernel".

The latter is the conclusion several people came to.

And I wanted a clarification on this, nothing more.

I wanted this clarification from you _BECAUSE_ the original posting in
this thread saw data corruption which went away after reverting my
patch.  But there is no possible connection between my patch and the
crashes he saw.

Many people have already concluded that "Linus says davem's patch is
crap so it must have been causing the corruptions of the original
reporter."

Like you and I, most people are lazy.  And a lazy person is likely
to treat "bug goes away with reverting patch" plus "Linus says the
patch is crap" as "get rid of the patch, this'll fix the bug".  And
that'll be the end of it, nobody will investigate the true cause of
the problem until it shows up again later for some different reason.

 > But that code isn't coming anywhere _close_ to my tree until the two
 > match. And I stand by my assertion that it should be reverted from Alans
 > tree too.

I have no intent to ever submit that patch to your tree, I know it's
not the right way to solve this problem.

In fact I submitted that patch to you as an "[RFC]" a long time ago
and it fell on deaf ears.  Or perhaps you happened to have laser eye
surgery that particular day, who knows. :-)

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  2:47             ` David S. Miller
  2001-05-08  1:34               ` Marcelo Tosatti
@ 2001-05-08  3:18               ` David S. Miller
  2001-05-08  1:47                 ` Marcelo Tosatti
                                   ` (2 more replies)
  1 sibling, 3 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-08  3:18 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, linux-kernel


Marcelo Tosatti writes:
 > I just thought about this case:
 >   
 > We find a dead swap cache page, so dead_swap_page goes to 1.
 > 
 > We call swap_writepage(), but in the meantime the swapin readahead code   
 > got a reference on the swap map for the page.
 > 
 > We write the page out because "(swap_count(page) > 1)", and we may
 > not have __GFP_IO set in the gfp_mask. Boom.

Hmmm, can't this happen without my patch?

Nothing stops people from getting references to the page
between the "Page is or was in use?" test and the line
which does "TryLockPage(page)".

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  2:34             ` Linus Torvalds
  2001-05-08  1:40               ` Marcelo Tosatti
@ 2001-05-08  3:22               ` David S. Miller
  2001-05-08  3:26                 ` Linus Torvalds
  1 sibling, 1 reply; 104+ messages in thread
From: David S. Miller @ 2001-05-08  3:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, linux-kernel


Marcelo Tosatti writes:
 > I was wrong. The patch is indeed buggy because of the __GFP_IO thing.

What about the __GFP_IO thing?

Specifically, what protects the __GFP_IO thing from happening without
my patch?

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:18               ` David S. Miller
  2001-05-08  1:47                 ` Marcelo Tosatti
@ 2001-05-08  3:24                 ` Linus Torvalds
  2001-05-08  3:29                 ` David S. Miller
  2 siblings, 0 replies; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08  3:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, linux-kernel


On Mon, 7 May 2001, David S. Miller wrote:
> 
> Marcelo Tosatti writes:
>  > I just thought about this case:
>  >   
>  > We find a dead swap cache page, so dead_swap_page goes to 1.
>  > 
>  > We call swap_writepage(), but in the meantime the swapin readahead code   
>  > got a reference on the swap map for the page.
>  > 
>  > We write the page out because "(swap_count(page) > 1)", and we may
>  > not have __GFP_IO set in the gfp_mask. Boom.

Yes. That looks a lot easier to trigger than my "slow memory
leak" schenario.

> Hmmm, can't this happen without my patch?

No. The old code would never try to write anything if __GFP_IO wasn't set,
because "launder_loop" would never become non-zero.

			Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:22               ` David S. Miller
@ 2001-05-08  3:26                 ` Linus Torvalds
  0 siblings, 0 replies; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08  3:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, linux-kernel


On Mon, 7 May 2001, David S. Miller wrote:
> 
> Marcelo Tosatti writes:
>  > I was wrong. The patch is indeed buggy because of the __GFP_IO thing.
> 
> What about the __GFP_IO thing?
> 
> Specifically, what protects the __GFP_IO thing from happening without
> my patch?

This:

                        /* First time through? Move it to the back of the list */
                        if (!launder_loop) {
                                list_del(page_lru);
                                list_add(page_lru, &inactive_dirty_list);
                                UnlockPage(page);
                                continue;
                        }


        if (can_get_io_locks && !launder_loop && free_shortage()) {
                launder_loop = 1;
		...


Notice? If "can_get_io_locks" is not true, we will never call
"writepage()" on the page, because if "can_get_io_locks" is false,
"launder_loop" will always be false too..

And "can_get_io_locks" depends on __GFP_IO.

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:18               ` David S. Miller
  2001-05-08  1:47                 ` Marcelo Tosatti
  2001-05-08  3:24                 ` Linus Torvalds
@ 2001-05-08  3:29                 ` David S. Miller
  2001-05-08 10:36                   ` BERECZ Szabolcs
  2 siblings, 1 reply; 104+ messages in thread
From: David S. Miller @ 2001-05-08  3:29 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, linux-kernel


Marcelo Tosatti writes:
 > > Hmmm, can't this happen without my patch?
 > 
 > No. We will never call writepage() without __GFP_IO without your patch.
 > 

I see, because launder_loop never progresses to 1 in that case.

My patch is crap and can cause corruptions, there is not argument
about it now :-)

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  1:40               ` Marcelo Tosatti
@ 2001-05-08  3:46                 ` Linus Torvalds
  2001-05-08  2:37                   ` Marcelo Tosatti
  2001-05-08 21:16                   ` Marcelo Tosatti
  2001-05-08  6:50                 ` David S. Miller
  1 sibling, 2 replies; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08  3:46 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: David S. Miller, linux-kernel


On Mon, 7 May 2001, Marcelo Tosatti wrote:
> 
> So what about moving the check for a dead swap cache page from
> swap_writepage() to page_launder() (+ PageSwapCache() check) just before
> the "if (!launder_loop)" ? 
> 
> Yes, its ugly special casing. Any other suggestion ? 

My most favourite approach by far is to just remove the magic for
different writepage's altogether, and just unconditionally do a
writepage. But passing in enough information so that the writepage can
come to the right decision.

So take the old code, and remove the code that does

	if (!launder_loop) {
		.. move to head ..
		continue;
	}
	writepage(page);

and instead make it do something like

	if (writepage(page, launderloop)) {
		.. not able to write, move to head ..
		continue;
	}

where "launderloop" is passed in to the writepage function as a priority.

Then, a regular "writepage()" would just do

	if (!priority)
		return -1;
	.. do real writepage here ..
	return 0;

while the special casing of dead swap cache entries can be moved into
swap_writepage() (which would do the above _too_, but would first try to
just drop the page if it can tell that the page is dead).

Now, add in the whole "accessed recently" as part of the priority (ie a
page that had the PG_referenced bit set will always get a "priority 0",
even if we could have laundered it, because we don't actually want to
write it out all that badly), and I think you'll get to where David wanted
to get _without_ any special cases.

In fact, it might even clean stuff up. Who knows? At least
page_launder() would not need to know about magic dead swap pages, because
the decision would be entirely in writepage().

And there aren't that many writepage() implementations in the kernel to
fix up (and the fixup tends to be two simple added lines of code for most
of them - just the "if (!priority) return").

Also note how some filesystems might use the writepage() callback even
with a zero priority as a hint that things are approaching the point where
we need to start flushing, which might make a difference for deciding when
to try to write a log entry, for example.

Now, I'm not saying this is _the_ solution to it, but I don't see any
really clean alternatives.

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  1:40               ` Marcelo Tosatti
  2001-05-08  3:46                 ` Linus Torvalds
@ 2001-05-08  6:50                 ` David S. Miller
  2001-05-08  7:40                   ` Linus Torvalds
  2001-05-08  8:29                   ` David S. Miller
  1 sibling, 2 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-08  6:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel


Linus Torvalds writes:
 > My most favourite approach by far is to just remove the magic for
 > different writepage's altogether, and just unconditionally do a
 > writepage. But passing in enough information so that the writepage can
 > come to the right decision.
 ...
 > In fact, it might even clean stuff up. Who knows? At least
 > page_launder() would not need to know about magic dead swap pages, because
 > the decision would be entirely in writepage().

Sure this would work.

The only downside would be that the formerly "quick case" in the loop
of dealing with referenced pages would now need to go inside the page
lock.  It's probably a non-issue...

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  6:50                 ` David S. Miller
@ 2001-05-08  7:40                   ` Linus Torvalds
  2001-05-08 18:53                     ` Marcelo Tosatti
  2001-05-08  8:29                   ` David S. Miller
  1 sibling, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08  7:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, linux-kernel


On Mon, 7 May 2001, David S. Miller wrote:
> 
> The only downside would be that the formerly "quick case" in the loop
> of dealing with referenced pages would now need to go inside the page
> lock.  It's probably a non-issue...

It might easily be an issue. That function will touch pretty much every
single page that we ever want to free, and it might be worthwhile to know
what the pressure is.

However, the point is probably moot. I found a problem with my approach:
using writepage() to try to get rid of swap cache pages early on (ie not
doing the "if it is accessed, put it back on the list" thing early)
doesn't work all that well: it doesn't handle the case of _clean_
swap-cache pages at all. And those can be quite common, although usually
not in the simple benchmarks which just dirty as quickly as they can.

[ The way to get a clean swap-cache page is to dirty it early in the
  process lifetime, and then use the page read-only later on over
  time. Maybe it's not common enough to worry about. ]

Ho humm. 

Maybe we just can't avoid special-casing the swap cache in page_launder,
and letting it know about things like swap counts (well, we obviously
_can_ avoid the special casing, as that's what it does now. But we might
be losing out by not doing so - right now we at least avoid doing
unnecessary writes, but we might be trying too hard to free memory that
really _should_ be more easily free'd).

Maybe it's academic. Do we know that any of this actually makes any
performance difference at all?

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 19:02       ` J . A . Magallon
@ 2001-05-08  7:52         ` Helge Hafting
  2001-05-10 12:19           ` Ingo Oeser
  2001-05-10 10:51         ` Anuradha Ratnaweera
  1 sibling, 1 reply; 104+ messages in thread
From: Helge Hafting @ 2001-05-08  7:52 UTC (permalink / raw)
  To: J . A . Magallon; +Cc: linux-kernel

"J . A . Magallon" wrote:
> 
> On 05.07 Helge Hafting wrote:

> > !0 is 1.  !(anything else) is 0.  It is zero and one, not
> > zero and "non-zero".  So a !! construction gives zero if you have
> > zero, and one if you had anything else.  There's no doubt about it.
> > >
> 
> Isn't this asking for trouble with the optimizer ? It could kill both
> !!. Using that is like trusting on a certain struct padding-alignment.

No, this won't cause trouble with the optimizer, because the
optimizer isn't supposed to do _wrong_ things.

The optimizer will not remove two !! in a row, simply because that
_isn't_ a valid optimization as you just have seen.  
"!" is a logical not, it isn't a bitwise not.  The result of
!!(something)
is different from just (something) whenever (something) is
neither 0 or 1, the optimizer knows that very well.

Use gcc -S to get readable assembly.
Try these yourself, they compile to different code:
int main(int argc){return argc;}
int main(int argc){return !!argc;}
They have to be different, as argc >= 2 gives different results.

And try:
int main(int argc){return !argc;}
int main(int argc){return !!!argc;}

These gives the same code with or without optimization with
gcc 2.95.4 on i386, as they are equivalent.  The first ! normalize
to 1 or 0, the two others are then removeable.

Helge Hafting

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  6:50                 ` David S. Miller
  2001-05-08  7:40                   ` Linus Torvalds
@ 2001-05-08  8:29                   ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-08  8:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel


Linus Torvalds writes:
 > Maybe it's academic. Do we know that any of this actually makes any
 > performance difference at all?

We know that dirty swap pages can accumulate to the point where the
swapper starves before it gets to enough of the "second pass" cases of
the page_launder loop to run in order to get rid of them.

That was the behavior I saw that let me to hacking up my broken patch.

Even simple "memory hog" test programs can trigger this behavior.
Just write a program which "grows almost completely into swap then
subsides" over and over again.

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:29                 ` David S. Miller
@ 2001-05-08 10:36                   ` BERECZ Szabolcs
  0 siblings, 0 replies; 104+ messages in thread
From: BERECZ Szabolcs @ 2001-05-08 10:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, Linus Torvalds, linux-kernel

On Mon, 7 May 2001, David S. Miller wrote:

> My patch is crap and can cause corruptions, there is not argument
> about it now :-)
is it the only bug in the swap handling?
or why is this bug triggered so heavily if the swap is on a filesystem?
I had oopses when I used a swapfile on a partition, but that was really
rare. I even don't think it's becouse page_launder().
so what's so different if the swap sits on a filesystem?

Bye,
Szabi



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  0:16           ` David S. Miller
  2001-05-08  2:34             ` Linus Torvalds
  2001-05-08  2:47             ` David S. Miller
@ 2001-05-08 12:33             ` Mikulas Patocka
  2001-05-13 16:24               ` Rik van Riel
  2 siblings, 1 reply; 104+ messages in thread
From: Mikulas Patocka @ 2001-05-08 12:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, Linus Torvalds, linux-kernel

>  > My point is that its _ok_ for us to check if the page is a dead swap cache
>  > page _without_ the lock since writepage() will recheck again with the page
>  > _locked_. Quoting you two messages back: 
>  > 
>  > "But it is important to re-calculate the deadness after getting the lock.
>  > Before, it was just an informed guess. After the lock, it is knowledge."
>  > 
>  > See ? 
> 
> In fact my patch isn't changing writepage behavior wrt. that page, it
> is changing behavior with respect to laundering policy for that page.
> 
> Here, let's talk code a little bit so there are no misunderstandings,
> I really want to put this to rest:
> 
> +		int dead_swap_page;
> +
>  		page = list_entry(page_lru, struct page, lru);
>  
> +		dead_swap_page =
> +			(PageSwapCache(page) &&
> +			 page_count(page) == (1 + !!page->buffers));
> +
> 
> Calculate dead_swap_page outside of lock.
> 
>  		/* Page is or was in use?  Move it to the active list. */
> -		if (PageTestandClearReferenced(page) || page->age > 0 ||
> -				(!page->buffers && page_count(page) > 1) ||
> -				page_ramdisk(page)) {
			     ^^^^^^^^^^^^^^^^^
> +		if (!dead_swap_page &&
> +		    (PageTestandClearReferenced(page) || page->age > 0 ||
> +		     (!page->buffers && page_count(page) > 1) ||
> +		     page_ramdisk(page))) {
		^^^^^^^^^^^^^^^^^^^^^^
>  			del_page_from_inactive_dirty_list(page);
>  			add_page_to_active_list(page);
>  			continue;

#define page_ramdisk(page) \
        (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR))

Are you sure that no one will release buffers under your hands?

Mikulas



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
                     ` (3 preceding siblings ...)
  2001-05-07 14:52   ` Horst von Brand
@ 2001-05-08 17:59   ` Kai Henningsen
  2001-05-09  2:32   ` Rusty Russell
  2001-05-09  3:36   ` Jonathan Morton
  6 siblings, 0 replies; 104+ messages in thread
From: Kai Henningsen @ 2001-05-08 17:59 UTC (permalink / raw)
  To: linux-kernel

vonbrand@inf.utfsm.cl (Horst von Brand)  wrote on 07.05.01 in <200105071452.f47Eq2jn008611@pincoya.inf.utfsm.cl>:

> "David S. Miller" <davem@redhat.com> said:
> > Jonathan Morton writes:
> >  > >-			 page_count(page) == (1 + !!page->buffers));
> >  >
> >  > Two inversions in a row?
> >
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
>
> IMVHO, it is clearer to write:
>
>   page_count(page) == 1 + (page->buffers != NULL)
>
> At least, the original poster wouldn't have wondered, and I wouldn't have
> had to think a bit to find out what it meant... If gcc generates worse code
> for this, it should be fixed.

Huh. IMO, that is significantly *less* readable. And incidentally I'd be  
less certain that it actually does what you want - it is rather easy to  
convince yourself that !! has to do the right thing.

MfG Kai

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  7:40                   ` Linus Torvalds
@ 2001-05-08 18:53                     ` Marcelo Tosatti
  0 siblings, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08 18:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, linux-kernel



On Tue, 8 May 2001, Linus Torvalds wrote:

> 
> On Mon, 7 May 2001, David S. Miller wrote:
> > 
> > The only downside would be that the formerly "quick case" in the loop
> > of dealing with referenced pages would now need to go inside the page
> > lock.  It's probably a non-issue...
> 
> It might easily be an issue. That function will touch pretty much every
> single page that we ever want to free, and it might be worthwhile to know
> what the pressure is.
> 
> However, the point is probably moot. I found a problem with my approach:
> using writepage() to try to get rid of swap cache pages early on (ie not
> doing the "if it is accessed, put it back on the list" thing early)
> doesn't work all that well: it doesn't handle the case of _clean_
> swap-cache pages at all. And those can be quite common, although usually
> not in the simple benchmarks which just dirty as quickly as they can.
> 
> [ The way to get a clean swap-cache page is to dirty it early in the
>   process lifetime, and then use the page read-only later on over
>   time. Maybe it's not common enough to worry about. ]

What about swapin readahead ? 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  3:46                 ` Linus Torvalds
  2001-05-08  2:37                   ` Marcelo Tosatti
@ 2001-05-08 21:16                   ` Marcelo Tosatti
  2001-05-08 23:38                     ` Linus Torvalds
  1 sibling, 1 reply; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08 21:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, linux-kernel



On Mon, 7 May 2001, Linus Torvalds wrote:

> 
> On Mon, 7 May 2001, Marcelo Tosatti wrote:
> > 
> > So what about moving the check for a dead swap cache page from
> > swap_writepage() to page_launder() (+ PageSwapCache() check) just before
> > the "if (!launder_loop)" ? 
> > 
> > Yes, its ugly special casing. Any other suggestion ? 
> 
> My most favourite approach by far is to just remove the magic for
> different writepage's altogether, and just unconditionally do a
> writepage. But passing in enough information so that the writepage can
> come to the right decision.
> 
> So take the old code, and remove the code that does
> 
> 	if (!launder_loop) {
> 		.. move to head ..
> 		continue;
> 	}
> 	writepage(page);
> 
> and instead make it do something like
> 
> 	if (writepage(page, launderloop)) {
> 		.. not able to write, move to head ..
> 		continue;
> 	}
> 
> where "launderloop" is passed in to the writepage function as a priority.

There are two issues which I missed yesterday: we have to get a reference
on the page, mark it clean, drop the locks and then call writepage(). If
the writepage() fails, we'll have to set_page_dirty(page).

I guess this is too much overhead for the common case, don't you? 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08 21:16                   ` Marcelo Tosatti
@ 2001-05-08 23:38                     ` Linus Torvalds
  2001-05-08 23:53                       ` Marcelo Tosatti
  2001-05-09  2:13                       ` David S. Miller
  0 siblings, 2 replies; 104+ messages in thread
From: Linus Torvalds @ 2001-05-08 23:38 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: David S. Miller, linux-kernel



On Tue, 8 May 2001, Marcelo Tosatti wrote:
>
> There are two issues which I missed yesterday: we have to get a reference
> on the page, mark it clean, drop the locks and then call writepage(). If
> the writepage() fails, we'll have to set_page_dirty(page).

We can move the "mark it clean" into writepage, which would actually
simplify the error cases for shared memory writepage (no need to mark it
dirty again etc).

> I guess this is too much overhead for the common case, don't you?

You could easily be right.

On the other hand, remember that a noticeable part of the time you should
be seeing a real write too, so the CPU overhead compared to the IO might
not be prohibitive. Ie, let's assuem that 10% of the time we actually end
up doing writes, then that 10% is going to be _soo_ much more than the
extra 10 cycles 90% of the time that the cleanup may well be worth it.

Especially if the cleanup means that we can avoid doing some of the real
writes altogether, by being better able to release dead memory to the
system.

Tradeoffs..

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08 23:38                     ` Linus Torvalds
@ 2001-05-08 23:53                       ` Marcelo Tosatti
  2001-05-09  2:13                       ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-08 23:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, lkml



On Tue, 8 May 2001, Linus Torvalds wrote:

> 
> 
> On Tue, 8 May 2001, Marcelo Tosatti wrote:
> >
> > There are two issues which I missed yesterday: we have to get a reference
> > on the page, mark it clean, drop the locks and then call writepage(). If
> > the writepage() fails, we'll have to set_page_dirty(page).
> 
> We can move the "mark it clean" into writepage, which would actually
> simplify the error cases for shared memory writepage (no need to mark it
> dirty again etc).
> 
> > I guess this is too much overhead for the common case, don't you?
> 
> You could easily be right.
> 
> On the other hand, remember that a noticeable part of the time you should
> be seeing a real write too, so the CPU overhead compared to the IO might
> not be prohibitive. Ie, let's assuem that 10% of the time we actually end
> up doing writes, then that 10% is going to be _soo_ much more than the
> extra 10 cycles 90% of the time that the cleanup may well be worth it.
> 
> Especially if the cleanup means that we can avoid doing some of the real
> writes altogether, by being better able to release dead memory to the
> system.

Ok, this patch implements thet thing and also changes ext2+swap+shm
writepage operations (so I could test the thing).

The performance is better with the patch on my restricted swapping tests.

In case you don't have any problems with this I'll fix the other
writepage's (so tell me if its ok for you).


diff -Nur --exclude-from=exclude linux.orig/fs/buffer.c linux/fs/buffer.c
--- linux.orig/fs/buffer.c	Mon May  7 20:47:26 2001
+++ linux/fs/buffer.c	Tue May  8 22:04:00 2001
@@ -1933,12 +1933,17 @@
 	return err;
 }
 
-int block_write_full_page(struct page *page, get_block_t *get_block)
+int block_write_full_page(struct page *page, get_block_t *get_block, int priority)
 {
 	struct inode *inode = page->mapping->host;
 	unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
 	unsigned offset;
 	int err;
+
+	if (!priority)
+		return -1;
+
+	ClearPageDirty(page);
 
 	/* easy case */
 	if (page->index < end_index)
diff -Nur --exclude-from=exclude linux.orig/fs/ext2/inode.c linux/fs/ext2/inode.c
--- linux.orig/fs/ext2/inode.c	Mon May  7 20:47:26 2001
+++ linux/fs/ext2/inode.c	Tue May  8 20:46:54 2001
@@ -650,9 +650,9 @@
 	return NULL;
 }
 
-static int ext2_writepage(struct page *page)
+static int ext2_writepage(struct page *page, int priority)
 {
-	return block_write_full_page(page,ext2_get_block);
+	return block_write_full_page(page,ext2_get_block,priority);
 }
 static int ext2_readpage(struct file *file, struct page *page)
 {
diff -Nur --exclude-from=exclude linux.orig/include/linux/fs.h linux/include/linux/fs.h
--- linux.orig/include/linux/fs.h	Tue May  8 16:45:42 2001
+++ linux/include/linux/fs.h	Tue May  8 22:22:38 2001
@@ -362,7 +362,7 @@
 struct address_space;
 
 struct address_space_operations {
-	int (*writepage)(struct page *);
+	int (*writepage)(struct page *, int);
 	int (*readpage)(struct file *, struct page *);
 	int (*sync_page)(struct page *);
 	int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
@@ -1268,7 +1268,7 @@
 /* Generic buffer handling for block filesystems.. */
 extern int block_flushpage(struct page *, unsigned long);
 extern int block_symlink(struct inode *, const char *, int);
-extern int block_write_full_page(struct page*, get_block_t*);
+extern int block_write_full_page(struct page*, get_block_t*, int);
 extern int block_read_full_page(struct page*, get_block_t*);
 extern int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
 extern int cont_prepare_write(struct page*, unsigned, unsigned, get_block_t*,
diff -Nur --exclude-from=exclude linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c	Mon May  7 20:47:26 2001
+++ linux/mm/filemap.c	Tue May  8 22:22:50 2001
@@ -411,7 +411,7 @@
  */
 void filemap_fdatasync(struct address_space * mapping)
 {
-	int (*writepage)(struct page *) = mapping->a_ops->writepage;
+	int (*writepage)(struct page *, int) = mapping->a_ops->writepage;
 
 	spin_lock(&pagecache_lock);
 
@@ -430,8 +430,7 @@
 		lock_page(page);
 
 		if (PageDirty(page)) {
-			ClearPageDirty(page);
-			writepage(page);
+			writepage(page, 1);
 		} else
 			UnlockPage(page);
 
diff -Nur --exclude-from=exclude linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c	Mon May  7 20:47:26 2001
+++ linux/mm/shmem.c	Tue May  8 22:23:01 2001
@@ -221,13 +221,16 @@
  * once.  We still need to guard against racing with
  * shmem_getpage_locked().  
  */
-static int shmem_writepage(struct page * page)
+static int shmem_writepage(struct page * page, int priority)
 {
 	int error = 0;
 	struct shmem_inode_info *info;
 	swp_entry_t *entry, swap;
 	struct inode *inode;
 
+	if (!priority)
+		return -1;
+
 	if (!PageLocked(page))
 		BUG();
 	
@@ -243,6 +246,8 @@
 	if (!swap.val)
 		goto out;
 
+	ClearPageDirty(page);	
+
 	spin_lock(&info->lock);
 	entry = shmem_swp_entry(info, page->index);
 	if (IS_ERR(entry))	/* this had been allocted on page allocation */
@@ -265,7 +270,6 @@
 
 	spin_unlock(&info->lock);
 out:
-	set_page_dirty(page);
 	UnlockPage(page);
 	return error;
 }
diff -Nur --exclude-from=exclude linux.orig/mm/swap_state.c linux/mm/swap_state.c
--- linux.orig/mm/swap_state.c	Mon May  7 20:47:26 2001
+++ linux/mm/swap_state.c	Tue May  8 21:39:44 2001
@@ -21,7 +21,7 @@
  * We may have stale swap cache pages in memory: notice
  * them here and get rid of the unnecessary final write.
  */
-static int swap_writepage(struct page *page)
+static int swap_writepage(struct page *page, int priority)
 {
 	/* One for the page cache, one for this user, one for page->buffers */
 	if (page_count(page) > 2 + !!page->buffers)
@@ -30,10 +30,14 @@
 		goto in_use;
 
 	/* We could remove it here, but page_launder will do it anyway */
+	ClearPageDirty(page);
 	UnlockPage(page);
 	return 0;
 
 in_use:
+	if (!priority) 
+		return -1;
+	ClearPageDirty(page);
 	rw_swap_page(WRITE, page);
 	return 0;
 }
diff -Nur --exclude-from=exclude linux.orig/mm/vmscan.c linux/mm/vmscan.c
--- linux.orig/mm/vmscan.c	Mon May  7 20:47:26 2001
+++ linux/mm/vmscan.c	Tue May  8 22:23:32 2001
@@ -472,33 +472,30 @@
 		}
 
 		/*
-		 * Dirty swap-cache page? Write it out if
+		 * Dirty page? Write it out if
 		 * last copy..
 		 */
 		if (PageDirty(page)) {
-			int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
+			int (*writepage)(struct page *, int) = 
+				page->mapping->a_ops->writepage;
 
 			if (!writepage)
 				goto page_active;
-
-			/* First time through? Move it to the back of the list */
-			if (!launder_loop) {
-				list_del(page_lru);
-				list_add(page_lru, &inactive_dirty_list);
+			
+			page_cache_get(page);
+			list_del(page_lru);
+			list_add(page_lru, &inactive_dirty_list);
+			spin_unlock(&pagemap_lru_lock);
+		
+			if (writepage(page, launder_loop)) {
+				spin_lock(&pagemap_lru_lock);
 				UnlockPage(page);
+				page_cache_release(page);
 				continue;
 			}
-
-			/* OK, do a physical asynchronous write to swap.  */
-			ClearPageDirty(page);
-			page_cache_get(page);
-			spin_unlock(&pagemap_lru_lock);
-
-			writepage(page);
-			page_cache_release(page);
-
-			/* And re-start the thing.. */
+		
 			spin_lock(&pagemap_lru_lock);
+			page_cache_release(page);
 			continue;
 		}
 







^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08 23:38                     ` Linus Torvalds
  2001-05-08 23:53                       ` Marcelo Tosatti
@ 2001-05-09  2:13                       ` David S. Miller
  2001-05-09 17:38                         ` Marcelo Tosatti
                                           ` (3 more replies)
  1 sibling, 4 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-09  2:13 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml


Marcelo Tosatti writes:
 > Ok, this patch implements thet thing and also changes ext2+swap+shm
 > writepage operations (so I could test the thing).
 > 
 > The performance is better with the patch on my restricted swapping tests.

Nice.  Now the only bit left is moving the referenced bit
checking and/or state into writepage as well.  This is still
part of the plan right?

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
                     ` (4 preceding siblings ...)
  2001-05-08 17:59   ` page_launder() bug Kai Henningsen
@ 2001-05-09  2:32   ` Rusty Russell
  2001-05-09  8:43     ` Martin Dalecki
  2001-05-09  3:36   ` Jonathan Morton
  6 siblings, 1 reply; 104+ messages in thread
From: Rusty Russell @ 2001-05-09  2:32 UTC (permalink / raw)
  To: linux-kernel, linux-mm

In message <15094.10942.592911.70443@pizda.ninka.net> you write:
> 
> Jonathan Morton writes:
>  > >-			 page_count(page) == (1 + !!page->buffers));
>  > 
>  > Two inversions in a row?
> 
> It is the most straightforward way to make a '1' or '0'
> integer from the NULL state of a pointer.

Overall, I'd have to say that this:

-		dead_swap_page =
-			(PageSwapCache(page) &&
-			 page_count(page) == (1 + !!page->buffers));
-

Is nicer as:

		int dead_swap_page = 0;

		if (PageSwapCache(page)
		    && page_count(page) == (page->buffers ? 1 : 2))
			dead_swap_page = 1;

After all, the second is what the code *means* (1 and 2 are magic
numbers).

That said, anyone who doesn't understand the former should probably
get some more C experience before commenting on others' code...

Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07  4:55 ` David S. Miller
                     ` (5 preceding siblings ...)
  2001-05-09  2:32   ` Rusty Russell
@ 2001-05-09  3:36   ` Jonathan Morton
  6 siblings, 0 replies; 104+ messages in thread
From: Jonathan Morton @ 2001-05-09  3:36 UTC (permalink / raw)
  To: Rusty Russell, linux-kernel, linux-mm

>That said, anyone who doesn't understand the former should probably
>get some more C experience before commenting on others' code...

I understood it, but it looked very much like a typo.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09  2:32   ` Rusty Russell
@ 2001-05-09  8:43     ` Martin Dalecki
  0 siblings, 0 replies; 104+ messages in thread
From: Martin Dalecki @ 2001-05-09  8:43 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, linux-mm

Rusty Russell wrote:
> 
> In message <15094.10942.592911.70443@pizda.ninka.net> you write:
> >
> > Jonathan Morton writes:
> >  > >-                  page_count(page) == (1 + !!page->buffers));
> >  >
> >  > Two inversions in a row?
> >
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> Overall, I'd have to say that this:
> 
> -               dead_swap_page =
> -                       (PageSwapCache(page) &&
> -                        page_count(page) == (1 + !!page->buffers));
> -
> 
> Is nicer as:
> 
>                 int dead_swap_page = 0;
> 
>                 if (PageSwapCache(page)
>                     && page_count(page) == (page->buffers ? 1 : 2))
>                         dead_swap_page = 1;
> 
> After all, the second is what the code *means* (1 and 2 are magic
> numbers).
> 
> That said, anyone who doesn't understand the former should probably
> get some more C experience before commenting on others' code...

Basically Amen.

But there are may be better chances that the compiler does do
better job at branch prediction in the second case? 
Wenn anyway objdump -S should show it...

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09  2:13                       ` David S. Miller
@ 2001-05-09 17:38                         ` Marcelo Tosatti
  2001-05-09 20:05                         ` David S. Miller
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-09 17:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linus Torvalds, lkml


On Tue, 8 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > Ok, this patch implements thet thing and also changes ext2+swap+shm
>  > writepage operations (so I could test the thing).
>  > 
>  > The performance is better with the patch on my restricted swapping tests.
> 
> Nice.  Now the only bit left is moving the referenced bit
> checking and/or state into writepage as well.  This is still
> part of the plan right?

Hum...

You want writepage() to check/clean the referenced bit and move the page
to the active list itself ?


 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09 20:05                         ` David S. Miller
@ 2001-05-09 18:40                           ` Marcelo Tosatti
  2001-05-09 21:08                           ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-09 18:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linus Torvalds, lkml



On Wed, 9 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > You want writepage() to check/clean the referenced bit and move the page
>  > to the active list itself ?
> 
> Well, that's the other part of what my patch was doing.
> 
> Let me state it a different way, how is the new writepage() framework
> going to do things like ignore the referenced bit during page_launder
> for dead swap pages?

Its not able to ignore the referenced bit. 

I know we want that, but I can't see any clean way of doing that. 

Suggestions ? 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09 21:08                           ` David S. Miller
@ 2001-05-09 19:50                             ` Marcelo Tosatti
  0 siblings, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-09 19:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linus Torvalds, lkml


On Wed, 9 May 2001, David S. Miller wrote:

> 
> Marcelo Tosatti writes:
>  > > Let me state it a different way, how is the new writepage() framework
>  > > going to do things like ignore the referenced bit during page_launder
>  > > for dead swap pages?
>  > 
>  > Its not able to ignore the referenced bit. 
>  > 
>  > I know we want that, but I can't see any clean way of doing that. 
> 
> Unfortunately, one way would involve pushing the referenced bit check
> into the writepage() method.  But that is the only place we have the
> kind of information necessary to make these decisions.
> 
> This is exactly the kind of issue Linus and myself were talking
> about when the "cost analysis" parts of thie discussion began.

I'm not convinced that moving the referenced bit check inside writepage()
is worth it. We will duplicate code which is purely a VM job (ie check the
referenced bit and possibly remove the page from the inactive dirty list
and add it to the active list, _with_ the pagemap_lru_lock held), not a
pager job.

The PageDirty bit handling is different IMHO --- the pager is the one who
knows if it actually wrote the page or not. 











^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09  2:13                       ` David S. Miller
  2001-05-09 17:38                         ` Marcelo Tosatti
@ 2001-05-09 20:05                         ` David S. Miller
  2001-05-09 18:40                           ` Marcelo Tosatti
  2001-05-09 21:08                           ` David S. Miller
  2001-05-13 16:34                         ` Rik van Riel
  2001-05-13 17:52                         ` David S. Miller
  3 siblings, 2 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-09 20:05 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml


Marcelo Tosatti writes:
 > You want writepage() to check/clean the referenced bit and move the page
 > to the active list itself ?

Well, that's the other part of what my patch was doing.

Let me state it a different way, how is the new writepage() framework
going to do things like ignore the referenced bit during page_launder
for dead swap pages?

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09 20:05                         ` David S. Miller
  2001-05-09 18:40                           ` Marcelo Tosatti
@ 2001-05-09 21:08                           ` David S. Miller
  2001-05-09 19:50                             ` Marcelo Tosatti
  1 sibling, 1 reply; 104+ messages in thread
From: David S. Miller @ 2001-05-09 21:08 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, lkml


Marcelo Tosatti writes:
 > > Let me state it a different way, how is the new writepage() framework
 > > going to do things like ignore the referenced bit during page_launder
 > > for dead swap pages?
 > 
 > Its not able to ignore the referenced bit. 
 > 
 > I know we want that, but I can't see any clean way of doing that. 

Unfortunately, one way would involve pushing the referenced bit check
into the writepage() method.  But that is the only place we have the
kind of information necessary to make these decisions.

This is exactly the kind of issue Linus and myself were talking
about when the "cost analysis" parts of thie discussion began.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-07 19:02       ` J . A . Magallon
  2001-05-08  7:52         ` Helge Hafting
@ 2001-05-10 10:51         ` Anuradha Ratnaweera
  1 sibling, 0 replies; 104+ messages in thread
From: Anuradha Ratnaweera @ 2001-05-10 10:51 UTC (permalink / raw)
  To: J . A . Magallon; +Cc: Helge Hafting, Tobias Ringstrom, linux-kernel


On Mon, 7 May 2001, J . A . Magallon wrote:

> 
> On 05.07 Helge Hafting wrote:
> >
> > !0 is 1.  !(anything else) is 0.  It is zero and one, not
> > zero and "non-zero".  So a !! construction gives zero if you have
> > zero, and one if you had anything else.  There's no doubt about it.
> > >
> 
> Isn't this asking for trouble with the optimizer ? It could kill both
> !!. Using that is like trusting on a certain struct padding-alignment.
> 

It isn't, or rather it can't. Because !!x is not x unless x is one or
zero.

Anuradha


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  7:52         ` Helge Hafting
@ 2001-05-10 12:19           ` Ingo Oeser
  0 siblings, 0 replies; 104+ messages in thread
From: Ingo Oeser @ 2001-05-10 12:19 UTC (permalink / raw)
  To: Helge Hafting; +Cc: J . A . Magallon, linux-kernel

On Tue, May 08, 2001 at 09:52:15AM +0200, Helge Hafting wrote:
> > Isn't this asking for trouble with the optimizer ? It could kill both
> > !!. Using that is like trusting on a certain struct padding-alignment.
> 
> No, this won't cause trouble with the optimizer, because the
> optimizer isn't supposed to do _wrong_ things.
 
Right. The optimizer proves equivalence of terms and exchange the
one that are bad for the optimization goal (e.g performance,
speed, size) against the one that works more towards this goal.

Everything else is an optimizer BUG, which should be reported and
fixed.

The C{89,99} standard now defines the syntax and semantics of
theses terms. 

Relevant for the optimizer: possible values of terms, assumptions
made on the static and dynamic behavior of these terms (add
anything I forgot).

So the optimizer should NEVER cause trouble if you write
completely valid C{89,99} and the compiler and environment
implement 100% of the semantics of it.

Compiler specific features should be seen as an addition to the
standard on this compiler. They follow the same rules stated
above.

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08  2:29             ` Linus Torvalds
@ 2001-05-13 16:08               ` Rik van Riel
  2001-05-13 19:29                 ` Linus Torvalds
  0 siblings, 1 reply; 104+ messages in thread
From: Rik van Riel @ 2001-05-13 16:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel

On Mon, 7 May 2001, Linus Torvalds wrote:

> Which you MUST NOT do without holding the page lock.
> 
> Hint: it needs "page->index", and without holding the page lock you
> don't know what it could be.

Wouldn't that be the pagecache_lock ?

Remember that the semantics for find_swap_page() and
friends got changed recently to first test PageUptodate
and only try to lock the page if that didn't work out.

This means that the swapin path (and the same path for
other pagecache pages) doesn't take the page lock and
the page lock doesn't protect us from other people using
the page while we have it locked.

What _does_ protect us, however, is the fact that
reclaim_page() grabs the pagecache_lock...

[OTOH, I could be out to lunch here since I was away for
almost a week and haven't checked if the discussed changes
really were integrated into the kernel]

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-08 12:33             ` Mikulas Patocka
@ 2001-05-13 16:24               ` Rik van Riel
  2001-05-13 21:02                 ` Another VM race? (was: page_launder() bug) Mikulas Patocka
  0 siblings, 1 reply; 104+ messages in thread
From: Rik van Riel @ 2001-05-13 16:24 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David S. Miller, Marcelo Tosatti, Linus Torvalds, linux-kernel

On Tue, 8 May 2001, Mikulas Patocka wrote:

> > +		if (!dead_swap_page &&
> > +		    (PageTestandClearReferenced(page) || page->age > 0 ||
> > +		     (!page->buffers && page_count(page) > 1) ||
> > +		     page_ramdisk(page))) {
> 		^^^^^^^^^^^^^^^^^^^^^^
> >  			del_page_from_inactive_dirty_list(page);
> >  			add_page_to_active_list(page);
> >  			continue;
> 
> #define page_ramdisk(page) \
>         (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR))
> 
> Are you sure that no one will release buffers under your hands?

Two things can happen:

1) the page gets ramdisk buffers _after_ we look at it first,
   in this case the page isn't freeable and will be moved to
   the active list on the next page_launder() loop

2) the page loses its ramdisk buffers after we look at it,
   now the page is freeable, but we won't see it again until
   it is moved from the active list to the inactive_dirty
   list again

Any side effects harmful enough to warrant complicating this
test ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09  2:13                       ` David S. Miller
  2001-05-09 17:38                         ` Marcelo Tosatti
  2001-05-09 20:05                         ` David S. Miller
@ 2001-05-13 16:34                         ` Rik van Riel
  2001-05-13 19:34                           ` Linus Torvalds
  2001-05-13 17:52                         ` David S. Miller
  3 siblings, 1 reply; 104+ messages in thread
From: Rik van Riel @ 2001-05-13 16:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, Linus Torvalds, lkml

On Tue, 8 May 2001, David S. Miller wrote:
> Marcelo Tosatti writes:
>  > Ok, this patch implements thet thing and also changes ext2+swap+shm
>  > writepage operations (so I could test the thing).
>  > 
>  > The performance is better with the patch on my restricted swapping tests.
> 
> Nice.  Now the only bit left is moving the referenced bit
> checking and/or state into writepage as well.  This is still
> part of the plan right?

Why the hell would we want this ?

If the page is referenced, it should be moved back to the
active list and should never be a candidate for writeout.

I'm very happy we got rid of the horribly intertwined VM
code in 2.2 and went to a more separated design in 2.4...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-09  2:13                       ` David S. Miller
                                           ` (2 preceding siblings ...)
  2001-05-13 16:34                         ` Rik van Riel
@ 2001-05-13 17:52                         ` David S. Miller
  2001-05-13 17:55                           ` Rik van Riel
  2001-05-13 18:00                           ` David S. Miller
  3 siblings, 2 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-13 17:52 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marcelo Tosatti, Linus Torvalds, lkml


Rik van Riel writes:
 > On Tue, 8 May 2001, David S. Miller wrote:
 > > Nice.  Now the only bit left is moving the referenced bit
 > > checking and/or state into writepage as well.  This is still
 > > part of the plan right?
 > 
 > Why the hell would we want this ?

Because if it's a dead swap page the referenced bit is meaningless
and we should just kill off the page immediately.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 17:52                         ` David S. Miller
@ 2001-05-13 17:55                           ` Rik van Riel
  2001-05-13 18:00                           ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: Rik van Riel @ 2001-05-13 17:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: Marcelo Tosatti, Linus Torvalds, lkml

On Sun, 13 May 2001, David S. Miller wrote:
> Rik van Riel writes:
>  > On Tue, 8 May 2001, David S. Miller wrote:
>  > > Nice.  Now the only bit left is moving the referenced bit
>  > > checking and/or state into writepage as well.  This is still
>  > > part of the plan right?
>  > 
>  > Why the hell would we want this ?
> 
> Because if it's a dead swap page the referenced bit is meaningless
> and we should just kill off the page immediately.

Then I'd rather check this in a visible place in page_launder()
itself. Granted, this is a special case, but I don't think this
one is worth obfuscating the code for...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 17:52                         ` David S. Miller
  2001-05-13 17:55                           ` Rik van Riel
@ 2001-05-13 18:00                           ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: David S. Miller @ 2001-05-13 18:00 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marcelo Tosatti, Linus Torvalds, lkml


Rik van Riel writes:
 > Then I'd rather check this in a visible place in page_launder()
 > itself. Granted, this is a special case, but I don't think this
 > one is worth obfuscating the code for...

I think Linus's scheme is just fine, controlling the new 'priority'
argument to writepage() using the referenced bit as an input.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 16:08               ` Rik van Riel
@ 2001-05-13 19:29                 ` Linus Torvalds
  2001-05-14 22:05                   ` Marcelo Tosatti
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-13 19:29 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Marcelo Tosatti, linux-kernel


On Sun, 13 May 2001, Rik van Riel wrote:
> 
> This means that the swapin path (and the same path for
> other pagecache pages) doesn't take the page lock and
> the page lock doesn't protect us from other people using
> the page while we have it locked.

You can test for swap cache deadness without holding the page cache lock:
if the swap count is 1, then we know that nobody else has this swap entry
in its page tables, and thus there can not be any concurrent lookups
either.

Now, it may well be that we need to make sure that there is some proper
ordering (nobody must decrement the swap count before they increment the
page count or something). I think that is the case anyway (and I _think_
that everybody that mucks with the swap count always hold the page count -
this might be a good thing to check).

So while you're right in general, there are some things we can do with
less global locking. Just holding the page lock will guarantee that at
least nobody changes the index etc from under us, so that the things we
test are at least fairly well-defined.

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 16:34                         ` Rik van Riel
@ 2001-05-13 19:34                           ` Linus Torvalds
  2001-05-13 19:39                             ` Rik van Riel
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-13 19:34 UTC (permalink / raw)
  To: Rik van Riel; +Cc: David S. Miller, Marcelo Tosatti, lkml


On Sun, 13 May 2001, Rik van Riel wrote:
> 
> Why the hell would we want this ?

You've missed about half the discussion, it seems..

> If the page is referenced, it should be moved back to the
> active list and should never be a candidate for writeout.

Wrong.

There are
 (a) dead swap pages, where it doesn't matter one _whit_ whether it is
     referenced or not, because we know with 100% certainty that nobody
     will ever reference it again. This _may_ be true in other cases too,
     but we know it is true for swap pages that have lost all references.
 (b) filesystems and memory allocators that might want to get feedback on
     the fact that we're even _looking_ at their pages, and that we're
     aging them down. They might easily use these things for starting
     background activity like deciding to close the logs..

The high-level VM layer simply doesn't have that kind of information.

		Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 19:34                           ` Linus Torvalds
@ 2001-05-13 19:39                             ` Rik van Riel
  2001-05-13 20:42                               ` Linus Torvalds
  0 siblings, 1 reply; 104+ messages in thread
From: Rik van Riel @ 2001-05-13 19:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, Marcelo Tosatti, lkml

On Sun, 13 May 2001, Linus Torvalds wrote:
> On Sun, 13 May 2001, Rik van Riel wrote:
> > 
> > Why the hell would we want this ?
> You've missed about half the discussion, it seems..

True, I was away at a conference ;)

> > If the page is referenced, it should be moved back to the
> > active list and should never be a candidate for writeout.
> 
> Wrong.
> 
> There are
>  (a) dead swap pages, where it doesn't matter one _whit_ whether it is
>      referenced or not, because we know with 100% certainty that nobody
>      will ever reference it again. This _may_ be true in other cases too,
>      but we know it is true for swap pages that have lost all references.
>  (b) filesystems and memory allocators that might want to get feedback on
>      the fact that we're even _looking_ at their pages, and that we're
>      aging them down. They might easily use these things for starting
>      background activity like deciding to close the logs..
> 
> The high-level VM layer simply doesn't have that kind of information.

Agreed.  I'd like to make sure, however, that we keep the
high-level VM cleanly separated from the lower layers so
we can keep the VM maintainable and predictable...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 19:39                             ` Rik van Riel
@ 2001-05-13 20:42                               ` Linus Torvalds
  2001-05-14  7:05                                 ` Kai Henningsen
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2001-05-13 20:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: David S. Miller, Marcelo Tosatti, lkml


On Sun, 13 May 2001, Rik van Riel wrote:
> > 
> > The high-level VM layer simply doesn't have that kind of information.
> 
> Agreed.  I'd like to make sure, however, that we keep the
> high-level VM cleanly separated from the lower layers so
> we can keep the VM maintainable and predictable...

This is by far my biggest issue.

This is also why I much prefer to not have page_launder() know about swap
counts and issues like that - adding logic to page_launder() to find and
dismiss dead swap pages would have been the straightforward approach, but
would have made swap more special than I think it is.

Right now I'm _fairly_ happy with how generic the page cache handling is,
and how swapping and shared memory are much better integrated in the
overall design than they used to be. Swapping (and especially shared
memory) used to be a ratsnest of special cases, with magic function calls
and innate knowledge of how things worked by the core VM code.

These days, the page cache and the notion of "address spaces" have taken
up a lot of the special case logic, and most of it is gone. I'm looking
forward to the day when we could drop the "PG_swapcache" bit in the page
flags structure, and not have any special logic for swapping at
all.

Already some of these special cases are really unnecessary, and could
validly be replaced with checking for mapping matches instead. See for
example the lonely use in mm/filemap.c - it would make more sense to
verify that "page->mapping == mapping" instead of testing a
special-purpose bit: suddenly __find_get_swapcache_page and
__find_get_page actually end up doing pretty much the same thing.

It's not worth doing those kinds of small details now, but what I want to
have in the long run is to have _less_ special cases for swapping etc -
we've already pretty clearly shown that swapping isn't really anything
different from shared mappings, it's just a different allocation strategy.

And that's why I'd rather have generic support for _any_ page mapping that
wants to drop pages early than have specific logic for swapping.
Historically, we've always had very good results from trying to avoid
having special cases and instead trying to find what the common rules
might be.

			Linus


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Another VM race? (was: page_launder() bug)
  2001-05-13 16:24               ` Rik van Riel
@ 2001-05-13 21:02                 ` Mikulas Patocka
  2001-05-13 23:04                   ` Rik van Riel
  0 siblings, 1 reply; 104+ messages in thread
From: Mikulas Patocka @ 2001-05-13 21:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: David S. Miller, Marcelo Tosatti, Linus Torvalds, linux-kernel

> > > +		if (!dead_swap_page &&
> > > +		    (PageTestandClearReferenced(page) || page->age > 0 ||
> > > +		     (!page->buffers && page_count(page) > 1) ||
> > > +		     page_ramdisk(page))) {
> > 		^^^^^^^^^^^^^^^^^^^^^^
> > >  			del_page_from_inactive_dirty_list(page);
> > >  			add_page_to_active_list(page);
> > >  			continue;
> > 
> > #define page_ramdisk(page) \
> >         (page->buffers && (MAJOR(page->buffers->b_dev) == RAMDISK_MAJOR))
> > 
> > Are you sure that no one will release buffers under your hands?
> 
> Two things can happen:
> 
> 1) the page gets ramdisk buffers _after_ we look at it first,
>    in this case the page isn't freeable and will be moved to
>    the active list on the next page_launder() loop
> 
> 2) the page loses its ramdisk buffers after we look at it,
>    now the page is freeable, but we won't see it again until
>    it is moved from the active list to the inactive_dirty
>    list again
> 
> Any side effects harmful enough to warrant complicating this
> test ?

I mean this: Let's have a page with buffers. It does not care whether the
buffers are on ramdisk or not. 

CPU 0				CPU 1
is executing the code marked	is executing try_to_free_buffers on
above with ^^^^^^^:		the same page (it can be, because CPU 0
				did not lock the page)

(page->buffers &&

				page->buffers = NULL

MAJOR(page->buffers->b_dev) == 
	RAMDISK_MAJOR)) ===> Oops, NULL pointer dereference!



Maybe compiler CSE optimization will eliminate the double load of
page->buffers, but we must not rely on it. If the compiler doesn't
optimize it, it can produce random oopses.

Mikulas


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Another VM race? (was: page_launder() bug)
  2001-05-13 21:02                 ` Another VM race? (was: page_launder() bug) Mikulas Patocka
@ 2001-05-13 23:04                   ` Rik van Riel
  2001-05-14  9:53                     ` Mikulas Patocka
  0 siblings, 1 reply; 104+ messages in thread
From: Rik van Riel @ 2001-05-13 23:04 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David S. Miller, Marcelo Tosatti, Linus Torvalds, linux-kernel

On Sun, 13 May 2001, Mikulas Patocka wrote:

> CPU 0				CPU 1
> is executing the code marked	is executing try_to_free_buffers on
> above with ^^^^^^^:		the same page (it can be, because CPU 0
> 				did not lock the page)
> 
> (page->buffers &&
> 
> 				page->buffers = NULL
> 
> MAJOR(page->buffers->b_dev) == 
> 	RAMDISK_MAJOR)) ===> Oops, NULL pointer dereference!
> 
> 
> 
> Maybe compiler CSE optimization will eliminate the double load of
> page->buffers, but we must not rely on it. If the compiler doesn't
> optimize it, it can produce random oopses.

You're right, this should be fixed. Do you happen to have a
patch ? ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 20:42                               ` Linus Torvalds
@ 2001-05-14  7:05                                 ` Kai Henningsen
  0 siblings, 0 replies; 104+ messages in thread
From: Kai Henningsen @ 2001-05-14  7:05 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds)  wrote on 13.05.01 in <Pine.LNX.4.21.0105131330350.20613-100000@penguin.transmeta.com>:

> And that's why I'd rather have generic support for _any_ page mapping that
> wants to drop pages early than have specific logic for swapping.
> Historically, we've always had very good results from trying to avoid
> having special cases and instead trying to find what the common rules
> might be.

Just a thought: isn't a dead swap page a rather similar condition to a  
page in a file without any links, open file descriptors, or open mmaps? In  
both cases, writeback really makes no sense.

MfG Kai

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Another VM race? (was: page_launder() bug)
  2001-05-13 23:04                   ` Rik van Riel
@ 2001-05-14  9:53                     ` Mikulas Patocka
  0 siblings, 0 replies; 104+ messages in thread
From: Mikulas Patocka @ 2001-05-14  9:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: David S. Miller, Marcelo Tosatti, Linus Torvalds, linux-kernel

> > CPU 0				CPU 1
> > is executing the code marked	is executing try_to_free_buffers on
> > above with ^^^^^^^:		the same page (it can be, because CPU 0
> > 				did not lock the page)
> > 
> > (page->buffers &&
> > 
> > 				page->buffers = NULL
> > 
> > MAJOR(page->buffers->b_dev) == 
> > 	RAMDISK_MAJOR)) ===> Oops, NULL pointer dereference!
> > 
> > 
> > 
> > Maybe compiler CSE optimization will eliminate the double load of
> > page->buffers, but we must not rely on it. If the compiler doesn't
> > optimize it, it can produce random oopses.
> 
> You're right, this should be fixed. Do you happen to have a
> patch ? ;)

You can apply this one.

Mikulas

--- linux/mm/vmscan.c_	Mon May 14 11:41:42 2001
+++ linux/mm/vmscan.c	Mon May 14 11:44:54 2001
@@ -454,8 +454,7 @@
 
 		/* Page is or was in use?  Move it to the active list. */
 		if (PageTestandClearReferenced(page) || page->age > 0 ||
-				(!page->buffers && page_count(page) > 1) ||
-				page_ramdisk(page)) {
+				(!page->buffers && page_count(page) > 1)) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
 			continue;
@@ -470,6 +469,9 @@
 			list_add(page_lru, &inactive_dirty_list);
 			continue;
 		}
+
+		/* M.P.: We must not call page_ramdisk for unlocked page */
+		if (page_ramdisk(page)) goto page_active;
 
 		/*
 		 * Dirty swap-cache page? Write it out if



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
  2001-05-13 19:29                 ` Linus Torvalds
@ 2001-05-14 22:05                   ` Marcelo Tosatti
  0 siblings, 0 replies; 104+ messages in thread
From: Marcelo Tosatti @ 2001-05-14 22:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rik van Riel, linux-kernel



On Sun, 13 May 2001, Linus Torvalds wrote:

> 
> On Sun, 13 May 2001, Rik van Riel wrote:
> > 
> > This means that the swapin path (and the same path for
> > other pagecache pages) doesn't take the page lock and
> > the page lock doesn't protect us from other people using
> > the page while we have it locked.
> 
> You can test for swap cache deadness without holding the page cache lock:
> if the swap count is 1, then we know that nobody else has this swap entry
> in its page tables, and thus there can not be any concurrent lookups
> either.
> 
> Now, it may well be that we need to make sure that there is some proper
> ordering (nobody must decrement the swap count before they increment the
> page count or something). I think that is the case anyway (and I _think_
> that everybody that mucks with the swap count always hold the page count -
> this might be a good thing to check).

Swapin readahead _first_ increases the swap map count for the given page
and then increases the page count.



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:16       ` [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode Jeff V. Merkey
@ 2001-11-21  6:22         ` David S. Miller
  2001-11-21  6:47           ` David S. Miller
  2001-11-21  7:33           ` Jeff V. Merkey
  2001-11-21  6:47         ` Kai Henningsen
  2001-11-21  8:49         ` Alan Cox
  2 siblings, 2 replies; 104+ messages in thread
From: David S. Miller @ 2001-11-21  6:22 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel, jmerkey


Your code is violating one of the assertions in
kmem_cache_create(), check the BUG(); calls in
mm/slab.c:kmem_cache_create() to try and figure out
which one you are firing off.

Probably either you are trying to send it debug flags
but CONFIG_SLAB_DEBUG is not defined _OR_ you are trying
to create the same SLAB cache twice (forgetting to destroy
it on module unload perhaps)?

Slab if fine, it's your code which is busted :)

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:16       ` [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode Jeff V. Merkey
  2001-11-21  6:22         ` David S. Miller
@ 2001-11-21  6:47         ` Kai Henningsen
  2001-11-21 18:28           ` Jeff Merkey
  2001-11-21  8:49         ` Alan Cox
  2 siblings, 1 reply; 104+ messages in thread
From: Kai Henningsen @ 2001-11-21  6:47 UTC (permalink / raw)
  To: linux-kernel

jmerkey@vger.timpanogas.org (Jeff V. Merkey)  wrote on 21.11.01 in <20011121003304.A683@vger.timpanogas.org>:

> download pre7, apply my patch, and do the build.  I went back
> over how I did the build, and this is the result of the build
> if you have unpacked, patched, then run "make oldconfig."  If I
> do a "make dep" then this problem does not occur, and the build

Isn't that exactly the FAQ Keith points out every other day or so (usually  
because of a modprobe "symbol not found"), one of the design bugs that  
kbuild 2.5 fixes (i.e., the kernel does not notice when it needs to make  
dep, so kbuild 2.5 handles dependencies differently)?

MfG Kai

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:22         ` David S. Miller
@ 2001-11-21  6:47           ` David S. Miller
  2001-11-21  6:54             ` Jeff Merkey
  2001-11-21  6:56             ` David S. Miller
  2001-11-21  7:33           ` Jeff V. Merkey
  1 sibling, 2 replies; 104+ messages in thread
From: David S. Miller @ 2001-11-21  6:47 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel, jmerkey

   From: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
   Date: Wed, 21 Nov 2001 00:33:04 -0700
   
   No. I think the build in linux is broken.  The Linux tree should 
   not generate garbase opcodes from the includes is make dep 
   has not been run and someone is simply building a module against
   the include files.

It executes a bogus opcode because that is how we signal
an assertion failure, see the BUG() macro define in
include/asm-i386/page.h

If it only fails as a module, then most likely (as I stated in my
original mail, which you decided not to read) you are trying to create
a SLAB cache of the same name twice and it is giving you an OOPS to
let you know about it.

On module unload you have to kmem_cache_destroy or else you'll
hit this assertion failure the next time you load the module.

If you aren't going to look at the things I've asked you to look at to
try and determine the problem, and will merely complain about the
"garbage opcodes" without looking at what put those opcodes there in
the kernel image, then your problem is one that I cannot solve.

I said: "A BUG() assertion is being triggered in slab.c"
You retort: "Nothing should make garbage opcodes execute."

I am now saying: "Go look at the BUG() definition, it is a garbage
opcode and it is on purpose".

Are you now going to say: "Linux is still broken, nothing should make
garbage opcodes, the build in Linux is broken"

???

You are really a fucking pain in the ass to help Jeff.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:33           ` Jeff V. Merkey
@ 2001-11-21  6:54             ` Chris Abbey
  2001-11-21  7:05               ` Jeff Merkey
  0 siblings, 1 reply; 104+ messages in thread
From: Chris Abbey @ 2001-11-21  6:54 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel

Today, Jeff V. Merkey wrote:
> [...] I went back
> over how I did the build, and this is the result of the build
> if you have unpacked, patched, then run "make oldconfig."  If I
> do a "make dep" then this problem does not occur, [....]

umm... lemme see if I understand you correctly, you patched the
kernel and soemthing breaks if you don't run make dep after
patching? Unless you can prove 100% that nothing in that
patch affects the dependency structure of the code, nor any of
the other things that are generated during the make dep stage,
then what we have here is user error. The directions say, quite
clearly, make oldconfig, make dep, make vmlinux, etc. Unless my
memory is totally shot tonight the last thing make oldconfig
spits out is in fact the direction to run make dep.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:47           ` David S. Miller
@ 2001-11-21  6:54             ` Jeff Merkey
  2001-11-21  6:56             ` David S. Miller
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21  6:54 UTC (permalink / raw)
  To: jmerkey, David S. Miller; +Cc: linux-kernel


----- Original Message -----
From: "David S. Miller" <davem@redhat.com>
To: <jmerkey@vger.timpanogas.org>
Cc: <linux-kernel@vger.kernel.org>; <jmerkey@timpanogas.org>
Sent: Tuesday, November 20, 2001 11:47 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


>    From: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
>    Date: Wed, 21 Nov 2001 00:33:04 -0700
>
>    No. I think the build in linux is broken.  The Linux tree should
>    not generate garbase opcodes from the includes is make dep
>    has not been run and someone is simply building a module against
>    the include files.
>
> It executes a bogus opcode because that is how we signal
> an assertion failure, see the BUG() macro define in
> include/asm-i386/page.h
>
> If it only fails as a module, then most likely (as I stated in my
> original mail, which you decided not to read) you are trying to create
> a SLAB cache of the same name twice and it is giving you an OOPS to
> let you know about it.
>

No dave, I read it, and I am not trying to create a slab cache twice.  I
only
create it once.  I did go look at the code and I got the part of using an
invalid
opcode to generate an exception.  Sort of like an int3 embedded in the
code (the old NetWare/NT way).  :-)

> On module unload you have to kmem_cache_destroy or else you'll
> hit this assertion failure the next time you load the module.
>
> If you aren't going to look at the things I've asked you to look at to
> try and determine the problem, and will merely complain about the
> "garbage opcodes" without looking at what put those opcodes there in
> the kernel image, then your problem is one that I cannot solve.
>
> I said: "A BUG() assertion is being triggered in slab.c"
> You retort: "Nothing should make garbage opcodes execute."
>
> I am now saying: "Go look at the BUG() definition, it is a garbage
> opcode and it is on purpose".
>
> Are you now going to say: "Linux is still broken, nothing should make
> garbage opcodes, the build in Linux is broken"
>
> ???
>
> You are really a fucking pain in the ass to help Jeff.

Dave,  I went and looked at this stuff.  I have been running this code for
over a year on 2.4 and I AM NOT CREATING A SLAB CACHE TWICE!!!!
I am building an NWFS module external of the kernel tree, and unless make
dep
has been run, the default behavior of the includes causes me to drop into
the
BUG() trap.

Jeff


> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:47           ` David S. Miller
  2001-11-21  6:54             ` Jeff Merkey
@ 2001-11-21  6:56             ` David S. Miller
  2001-11-21  7:03               ` Jeff Merkey
                                 ` (2 more replies)
  1 sibling, 3 replies; 104+ messages in thread
From: David S. Miller @ 2001-11-21  6:56 UTC (permalink / raw)
  To: jmerkey; +Cc: jmerkey, linux-kernel

   From: "Jeff Merkey" <jmerkey@timpanogas.org>
   Date: Tue, 20 Nov 2001 23:54:21 -0700
   
   I am building an NWFS module external of the kernel tree, and unless make
   dep
   has been run, the default behavior of the includes causes me to drop into
   the
   BUG() trap.

When you change configuration options, you have to run make
dep again, that is a known requirement of the 2.4.x build system
like it or not :-)

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:56             ` David S. Miller
@ 2001-11-21  7:03               ` Jeff Merkey
  2001-11-21  7:49                 ` arjan
  2001-11-21  7:09               ` David S. Miller
  2001-11-21  7:28               ` Stuart Young
  2 siblings, 1 reply; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21  7:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: jmerkey, linux-kernel


----- Original Message -----
From: "David S. Miller" <davem@redhat.com>
To: <jmerkey@timpanogas.org>
Cc: <jmerkey@vger.timpanogas.org>; <linux-kernel@vger.kernel.org>
Sent: Tuesday, November 20, 2001 11:56 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


>    From: "Jeff Merkey" <jmerkey@timpanogas.org>
>    Date: Tue, 20 Nov 2001 23:54:21 -0700
>
>    I am building an NWFS module external of the kernel tree, and unless
make
>    dep
>    has been run, the default behavior of the includes causes me to drop
into
>    the
>    BUG() trap.
>
> When you change configuration options, you have to run make
> dep again, that is a known requirement of the 2.4.x build system

OK.  Cool.  Now we are making progress.  I think this is a nasty problem.
There
are numerous RPMs that will build against the kernel tree and be busted.  I
would
expect an rpm -ba on your DEFAULT kernel in Redhat with the sources
contained
in the kernel.rpm files to also be broken unless someone has done this.  You
probably should have someone check this out.  I just built the SCI drivers
against
2.4.15-pre7 and they blow up as well.

Jeff

> like it or not :-)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:54             ` Chris Abbey
@ 2001-11-21  7:05               ` Jeff Merkey
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21  7:05 UTC (permalink / raw)
  To: Chris Abbey, Jeff V. Merkey; +Cc: linux-kernel


This happens as well with the SCI drivers, which are not dependant on any
patches.

Jeff

----- Original Message -----
From: "Chris Abbey" <linux@cabbey.net>
To: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
Cc: <linux-kernel@vger.kernel.org>
Sent: Tuesday, November 20, 2001 11:54 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> Today, Jeff V. Merkey wrote:
> > [...] I went back
> > over how I did the build, and this is the result of the build
> > if you have unpacked, patched, then run "make oldconfig."  If I
> > do a "make dep" then this problem does not occur, [....]
>
> umm... lemme see if I understand you correctly, you patched the
> kernel and soemthing breaks if you don't run make dep after
> patching? Unless you can prove 100% that nothing in that
> patch affects the dependency structure of the code, nor any of
> the other things that are generated during the make dep stage,
> then what we have here is user error. The directions say, quite
> clearly, make oldconfig, make dep, make vmlinux, etc. Unless my
> memory is totally shot tonight the last thing make oldconfig
> spits out is in fact the direction to run make dep.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:56             ` David S. Miller
  2001-11-21  7:03               ` Jeff Merkey
@ 2001-11-21  7:09               ` David S. Miller
  2001-11-21  7:14                 ` Jeff Merkey
  2001-11-21  7:28               ` Stuart Young
  2 siblings, 1 reply; 104+ messages in thread
From: David S. Miller @ 2001-11-21  7:09 UTC (permalink / raw)
  To: jmerkey; +Cc: jmerkey, linux-kernel

   From: "Jeff Merkey" <jmerkey@timpanogas.org>
   Date: Wed, 21 Nov 2001 00:03:15 -0700
   
   OK.  Cool.  Now we are making progress.  I think this is a nasty problem.
   There
   are numerous RPMs that will build against the kernel tree and be busted.

If you patch sources files of the main kernel, you have to
rebuild the dependencies.

Why does this seem illogical to you?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:09               ` David S. Miller
@ 2001-11-21  7:14                 ` Jeff Merkey
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21  7:14 UTC (permalink / raw)
  To: David S. Miller; +Cc: jmerkey, linux-kernel

>
> Why does this seem illogical to you?

Philisophical.  Kind of like Linus hating kernel debuggers or something.  If
someone
is building applcations or modules, etc. in a "commerical" software world (I
just opened
the door to get my head bitten off) where I came from, doing stuff like this
was totally
forbidden.  There's a sort of "shell shocked" conditioning folks get into
who have been
in software companies like where I came from where anything that makes it
difficult for
a vendor, partner, developer, etc. to build and maintain code is considered
a serious
defect.

This type of a problem could cause a partner or vendor to spend a lot of
time trying to
figure out what was wrong.  Even more so since the way I stumbled across the
problem
was building a driver on one system with modversions turned off, then
loading the
module on a target system and watching it crash -- very annoying and
wasteful of
time.  It's just a philisophical kind of thing.  i.e. the tools and code
shoudl not have
"easter eggs" hidden in it that make it harder to maintain code.

:-)

Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
@ 2001-11-21  7:16       ` Jeff V. Merkey
  2001-11-21  6:22         ` David S. Miller
                           ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Jeff V. Merkey @ 2001-11-21  7:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: jmerkey

[-- Attachment #1: Type: text/plain, Size: 246 bytes --]



Here's really strange one.  Building a module against 2.4.15-pre7 
seems to generate invalid opcodes (???) from the kernel includes.
THe code looks very strange.  Perhaps someone who owns kmem_cache_create
has some idea?

Oops attached.

Jeff


[-- Attachment #2: kmem.oops --]
[-- Type: text/plain, Size: 3533 bytes --]

ksymoops 2.4.0 on i686 2.4.15-pre7.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.15-pre7/ (default)
     -m /boot/System.map-2.4.15-pre7 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Nov 21 00:11:14 vger kernel: invalid operand: 0000
Nov 21 00:11:14 vger kernel: CPU:    0
Nov 21 00:11:14 vger kernel: EIP:    0010:[<c012b0eb>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Nov 21 00:11:14 vger kernel: EFLAGS: 00010246
Nov 21 00:11:14 vger kernel: eax: 00000000   ebx: c13da430   ecx: c13da590   edx: c13da684
Nov 21 00:11:14 vger kernel: esi: c13da58d   edi: cc9d8a40   ebp: c13da590   esp: c90e9ef8
Nov 21 00:11:14 vger kernel: ds: 0018   es: 0018   ss: 0018
Nov 21 00:11:14 vger kernel: Process nwfs-async0 (pid: 657, stackpage=c90e9000)
Nov 21 00:11:14 vger kernel: Stack: c90e9efc 00000040 c9265a90 cc9ed5c0 00000000 00000000 cc9b8f17 cc9d8a2f 
Nov 21 00:11:14 vger kernel:        00000060 00000020 00002000 00000000 00000000 00000000 00000000 00000000 
Nov 21 00:11:14 vger kernel:        00000000 00000000 cc9b97c5 00000008 00000200 00000008 c1242bc0 c930c818 
Nov 21 00:11:14 vger kernel: Call Trace: [<cc9ed5c0>] [<cc9b8f17>] [<cc9d8a2f>] [<cc9b97c5>] [<cc9ed5c0>] 
Nov 21 00:11:14 vger kernel:    [<cc9b6314>] [<cc9b6241>] [<cc9ddea0>] [<c0105bdc>] [<cc9ac4e7>] [<cc9d4f69>] 
Nov 21 00:11:14 vger kernel:    [<c0105616>] [<cc9ac478>] 
Nov 21 00:11:14 vger kernel: Code: 0f 0b 89 d1 8b 01 89 c2 0f 18 02 81 f9 4c 2b 2e c0 75 d2 8d 

>>EIP; c012b0eb <kmem_cache_create+2fb/340>   <=====
Trace; cc9ed5c0 <[nwfs]__module_using_checksums+9772/1c771>
Trace; cc9b8f17 <[nwfs]nwfs_get_bh+27/64>
Trace; cc9d8a2f <[nwfs].rodata.start+398f/8d3f>
Trace; cc9b97c5 <[nwfs]aReadDiskSectors+c9/210>
Trace; cc9ed5c0 <[nwfs]__module_using_checksums+9772/1c771>
Trace; cc9b6314 <[nwfs]process_asynch_io+174/274>
Trace; cc9b6241 <[nwfs]process_asynch_io+a1/274>
Trace; cc9ddea0 <[nwfs]asynch_io_sem0+8/14>
Trace; c0105bdc <__down+bc/d0>
Trace; cc9ac4e7 <[nwfs]nwfs_asynch_io_process+6f/cc>
Trace; cc9d4f69 <[nwfs].text.end+2e/165>
Trace; c0105616 <kernel_thread+26/30>
Trace; cc9ac478 <[nwfs]nwfs_asynch_io_process+0/cc>
Code;  c012b0eb <kmem_cache_create+2fb/340>
00000000 <_EIP>:
Code;  c012b0eb <kmem_cache_create+2fb/340>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012b0ed <kmem_cache_create+2fd/340>
   2:   89 d1                     mov    %edx,%ecx
Code;  c012b0ef <kmem_cache_create+2ff/340>
   4:   8b 01                     mov    (%ecx),%eax
Code;  c012b0f1 <kmem_cache_create+301/340>
   6:   89 c2                     mov    %eax,%edx
Code;  c012b0f3 <kmem_cache_create+303/340>
   8:   0f 18 02                  prefetchnta (%edx)
Code;  c012b0f6 <kmem_cache_create+306/340>
   b:   81 f9 4c 2b 2e c0         cmp    $0xc02e2b4c,%ecx
Code;  c012b0fc <kmem_cache_create+30c/340>
  11:   75 d2                     jne    ffffffe5 <_EIP+0xffffffe5> c012b0d0 <kmem_cache_create+2e0/340>
Code;  c012b0fe <kmem_cache_create+30e/340>
  13:   8d 00                     lea    (%eax),%eax


1 warning issued.  Results may not be reliable.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:56             ` David S. Miller
  2001-11-21  7:03               ` Jeff Merkey
  2001-11-21  7:09               ` David S. Miller
@ 2001-11-21  7:28               ` Stuart Young
  2 siblings, 0 replies; 104+ messages in thread
From: Stuart Young @ 2001-11-21  7:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: David S. Miller

At 11:09 PM 20/11/01 -0800, David S. Miller wrote:
>If you patch sources files of the main kernel, you have to
>rebuild the dependencies.
>
>Why does this seem illogical to you?

Maybe the way to stop this sort of problem is to enforce it in the kernel 
build procedure/Makefiles (and/or in the module's build procedure, if it's 
separate to the kernel's).

Eg: Anything that produces output code (bzImage, modules, etc) should fail 
if .config is newer than any of the dependencies. Maybe even spit out a 
"You need to run 'make dep' before using 'make bzImage'" or something like 
that. If they really want to get round it, they can play with 'touch'.

This seems perfectly logical to me.


AMC Enterprises P/L    - Stuart Young
First Floor            - Network and Systems Admin
3 Chesterville Rd      - sgy@amc.com.au
Cheltenham Vic 3192    - Ph:  (03) 9584-2700
http://www.amc.com.au/ - Fax: (03) 9584-2755


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:22         ` David S. Miller
  2001-11-21  6:47           ` David S. Miller
@ 2001-11-21  7:33           ` Jeff V. Merkey
  2001-11-21  6:54             ` Chris Abbey
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff V. Merkey @ 2001-11-21  7:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, jmerkey

On Tue, Nov 20, 2001 at 10:22:03PM -0800, David S. Miller wrote:
> 
> Your code is violating one of the assertions in
> kmem_cache_create(), check the BUG(); calls in
> mm/slab.c:kmem_cache_create() to try and figure out
> which one you are firing off.
> 
> Probably either you are trying to send it debug flags
> but CONFIG_SLAB_DEBUG is not defined _OR_ you are trying
> to create the same SLAB cache twice (forgetting to destroy
> it on module unload perhaps)?
> 
> Slab if fine, it's your code which is busted :)

David,

I need some help here (big surprise).  I did nothing other than 
download pre7, apply my patch, and do the build.  I went back 
over how I did the build, and this is the result of the build 
if you have unpacked, patched, then run "make oldconfig."  If I
do a "make dep" then this problem does not occur, and the build 
works fine.  If I build the external module (this bug only 
shows up when building the external module file system driver,
not the kernel patched version of NWFS).

No. I think the build in linux is broken.  The Linux tree should 
not generate garbase opcodes from the includes is make dep 
has not been run and someone is simply building a module against
the include files.

:-)

Jeff


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:03               ` Jeff Merkey
@ 2001-11-21  7:49                 ` arjan
  2001-11-21 18:31                   ` Jeff Merkey
  0 siblings, 1 reply; 104+ messages in thread
From: arjan @ 2001-11-21  7:49 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: linux-kernel

In article <003401c1725a$975ad4e0$f5976dcf@nwfs> you wrote:

> OK.  Cool.  Now we are making progress.  I think this is a nasty problem.
> There are numerous RPMs that will build against the kernel tree and be
> busted.  I would expect an rpm -ba on your DEFAULT kernel in Redhat with
> the sources contained in the kernel.rpm files to also be broken unless
> someone has done this.

That's why Red Hat ships the kernel-source RPM; you can build external
modules against that and it has the "make dep" information for all kernels
Red Hat ships for that platform (with a smart "if" that selects the
currently running one)........... But note the word "external". You build in
another directory and don't touch the original .config file or tree......
Unless you need core changes, that's perfectly possible for almost all
modules....

Greetings,
    Arjan van de Ven


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:16       ` [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode Jeff V. Merkey
  2001-11-21  6:22         ` David S. Miller
  2001-11-21  6:47         ` Kai Henningsen
@ 2001-11-21  8:49         ` Alan Cox
  2001-11-21 18:28           ` Jeff Merkey
  2 siblings, 1 reply; 104+ messages in thread
From: Alan Cox @ 2001-11-21  8:49 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel, jmerkey

> Here's really strange one.  Building a module against 2.4.15-pre7 
> seems to generate invalid opcodes (???) from the kernel includes.

You hit a BUG(). If you rebuild the kernel with verbose BUG reporting 
included you'll get a line and file to work back from

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  8:49         ` Alan Cox
@ 2001-11-21 18:28           ` Jeff Merkey
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 18:28 UTC (permalink / raw)
  To: Jeff V. Merkey, Alan Cox; +Cc: linux-kernel


----- Original Message -----
From: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
To: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
Cc: <linux-kernel@vger.kernel.org>; <jmerkey@timpanogas.org>
Sent: Wednesday, November 21, 2001 1:49 AM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> > Here's really strange one.  Building a module against 2.4.15-pre7
> > seems to generate invalid opcodes (???) from the kernel includes.
>
> You hit a BUG(). If you rebuild the kernel with verbose BUG reporting
> included you'll get a line and file to work back from

This may help me determine which include file is breaking the tree.  I know
I hit a
BUG() but I should not have.  Looks like a hole somewhere.  I will attempt
to track
this down today.

Jeff


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  6:47         ` Kai Henningsen
@ 2001-11-21 18:28           ` Jeff Merkey
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 18:28 UTC (permalink / raw)
  To: Kai Henningsen, linux-kernel


----- Original Message -----
From: "Kai Henningsen" <kaih@khms.westfalen.de>
To: <linux-kernel@vger.kernel.org>
Sent: Tuesday, November 20, 2001 11:47 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> jmerkey@vger.timpanogas.org (Jeff V. Merkey)  wrote on 21.11.01 in
<20011121003304.A683@vger.timpanogas.org>:
>
> > download pre7, apply my patch, and do the build.  I went back
> > over how I did the build, and this is the result of the build
> > if you have unpacked, patched, then run "make oldconfig."  If I
> > do a "make dep" then this problem does not occur, and the build
>
> Isn't that exactly the FAQ Keith points out every other day or so (usually
> because of a modprobe "symbol not found"), one of the design bugs that
> kbuild 2.5 fixes (i.e., the kernel does not notice when it needs to make
> dep, so kbuild 2.5 handles dependencies differently)?
>
> MfG Kai

This is good news on the dependency methods.

Jeff

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21  7:49                 ` arjan
@ 2001-11-21 18:31                   ` Jeff Merkey
  2001-11-21 19:06                     ` Doug Ledford
  2001-11-21 19:16                     ` Arjan van de Ven
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 18:31 UTC (permalink / raw)
  To: arjan; +Cc: linux-kernel


----- Original Message -----
From: <arjan@fenrus.demon.nl>
To: "Jeff Merkey" <jmerkey@timpanogas.org>
Cc: <linux-kernel@vger.kernel.org>
Sent: Wednesday, November 21, 2001 12:49 AM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> In article <003401c1725a$975ad4e0$f5976dcf@nwfs> you wrote:
>
> > OK.  Cool.  Now we are making progress.  I think this is a nasty
problem.
> > There are numerous RPMs that will build against the kernel tree and be
> > busted.  I would expect an rpm -ba on your DEFAULT kernel in Redhat with
> > the sources contained in the kernel.rpm files to also be broken unless
> > someone has done this.
>
> That's why Red Hat ships the kernel-source RPM; you can build external
> modules against that and it has the "make dep" information for all kernels
> Red Hat ships for that platform (with a smart "if" that selects the
> currently running one)........... But note the word "external". You build
in
> another directory and don't touch the original .config file or tree......
> Unless you need core changes, that's perfectly possible for almost all
> modules....
>

I would anticipate seeing this problem with their kernel source RPM.  In
fact, I do,
you have to do a make distclean before you can use it because of the way
their rpm
script munges all the versioned trees into a tmp area during RPM creation.
There's only
one source tree (usually the last one they built) and lots of binary rpm
versions from the
one tree (i.e. i386, i686, etc.).

Jeff

> Greetings,
>     Arjan van de Ven


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 18:31                   ` Jeff Merkey
@ 2001-11-21 19:06                     ` Doug Ledford
  2001-11-21 19:51                       ` Jeff Merkey
  2001-11-21 19:16                     ` Arjan van de Ven
  1 sibling, 1 reply; 104+ messages in thread
From: Doug Ledford @ 2001-11-21 19:06 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: arjan, linux-kernel

Jeff Merkey wrote:

> ----- Original Message -----
> From: <arjan@fenrus.demon.nl>
> To: "Jeff Merkey" <jmerkey@timpanogas.org>
> Cc: <linux-kernel@vger.kernel.org>
> Sent: Wednesday, November 21, 2001 12:49 AM
> Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
> opcode
> 
> 
> 
>>In article <003401c1725a$975ad4e0$f5976dcf@nwfs> you wrote:
>>
>>
>>>OK.  Cool.  Now we are making progress.  I think this is a nasty
>>>
> problem.
> 
>>>There are numerous RPMs that will build against the kernel tree and be
>>>busted.  I would expect an rpm -ba on your DEFAULT kernel in Redhat with
>>>the sources contained in the kernel.rpm files to also be broken unless
>>>someone has done this.
>>>
>>That's why Red Hat ships the kernel-source RPM; you can build external
>>modules against that and it has the "make dep" information for all kernels
>>Red Hat ships for that platform (with a smart "if" that selects the
>>currently running one)........... But note the word "external". You build
>>
> in
> 
>>another directory and don't touch the original .config file or tree......
>>Unless you need core changes, that's perfectly possible for almost all
>>modules....
>>
>>
> 
> I would anticipate seeing this problem with their kernel source RPM.  In
> fact, I do,
> you have to do a make distclean before you can use it because of the way
> their rpm
> script munges all the versioned trees into a tmp area during RPM creation.



This is not true at all.  To see what I'm referring to, download the 
module build kit from my website (http://people.redhat.com/dledford) and 
see how it uses box stock kernel include files from our kernel-source 
package to build *all* of the needed modules from one source tree (i386, 
i386smp, i386BOOT, i586, i586SMP, i686, i686SMP, i686enterprise).


> There's only
> one source tree (usually the last one they built) and lots of binary rpm
> versions from the
> one tree (i.e. i386, i686, etc.).


If you look in the linux/include/linux/modversions/*.h files, you will 
see that the different RPM build package versions have all their symbols 
in those files #ifdef'ed so that you get the ones you need to match the 
running kernel (or if you trick the rhversion.h file, whichever version 
you request, see my Makefile).


> Jeff
> 
> 
>>Greetings,
>>    Arjan van de Ven
>>
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 



-- 

  Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
       Please check my web site for aic7xxx updates/answers before
                       e-mailing me about problems


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 18:31                   ` Jeff Merkey
  2001-11-21 19:06                     ` Doug Ledford
@ 2001-11-21 19:16                     ` Arjan van de Ven
  2001-11-21 19:53                       ` Jeff Merkey
  1 sibling, 1 reply; 104+ messages in thread
From: Arjan van de Ven @ 2001-11-21 19:16 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: linux-kernel

On Wed, Nov 21, 2001 at 11:31:15AM -0700, Jeff Merkey wrote:

> I would anticipate seeing this problem with their kernel source RPM.  In
> fact, I do, you have to do a make distclean before you can use it because
> of the way their rpm script munges all the versioned trees into a tmp area
> during RPM creation. There's only one source tree (usually the last one
> they built) and lots of binary rpm versions from the one tree (i.e. i386,
> i686, etc.).

Yes and during the build the modversions and depenency info  etc for each
version is nicely stored in separate directories which is later combined
into one tree with #if's for the proper currently running kernel.

Have you even looked at the kernel-source RPM ?

Greetings,
  Arjan van de Ven
  Red Hat Linux kernel maintainer

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 19:06                     ` Doug Ledford
@ 2001-11-21 19:51                       ` Jeff Merkey
  2001-11-21 19:58                         ` J Sloan
  2001-11-21 20:38                         ` Doug Ledford
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 19:51 UTC (permalink / raw)
  To: Doug Ledford; +Cc: arjan, linux-kernel

Doug,

I have seen some problems with the rpm build and default install of your
kernel sources.
NWFS and the SCI drivers will **NOT** build against it since you post in a
linux and linux-up kernel for lilo during boot.  People using these drivers
who email me always have to do a "make distclean" to get stuff to build.  I
am very familiar with the kernel.h
changes you guys put in that are different from stock kernels, but despite
this, it's
far from "plug and play" for a customer building third party kernel modules
on your rpms.
I am not saying this is bad or anything, but it does require that the
customer A) have a
Linux consultant to do the installation or B) be a competent Linux
programmer.  Kind of
tough this to expect a secretary to do without a little help.

This is way off topic at this point.  This was originally related to BUG()
getting called from builds against a virgin source tree.  Alan Cox has asked
me to look into the code and determine just where the BUG() message is
getting generated from.  I am pursuing this at present.

Jeff

:-)

Jeff


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 19:16                     ` Arjan van de Ven
@ 2001-11-21 19:53                       ` Jeff Merkey
  2001-11-21 20:36                         ` Doug Ledford
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 19:53 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel


----- Original Message -----
From: "Arjan van de Ven" <arjan@fenrus.demon.nl>
To: "Jeff Merkey" <jmerkey@timpanogas.org>
Cc: <linux-kernel@vger.kernel.org>
Sent: Wednesday, November 21, 2001 12:16 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> On Wed, Nov 21, 2001 at 11:31:15AM -0700, Jeff Merkey wrote:
>
> > I would anticipate seeing this problem with their kernel source RPM.  In
> > fact, I do, you have to do a make distclean before you can use it
because
> > of the way their rpm script munges all the versioned trees into a tmp
area
> > during RPM creation. There's only one source tree (usually the last one
> > they built) and lots of binary rpm versions from the one tree (i.e.
i386,
> > i686, etc.).
>
> Yes and during the build the modversions and depenency info  etc for each
> version is nicely stored in separate directories which is later combined
> into one tree with #if's for the proper currently running kernel.
>
> Have you even looked at the kernel-source RPM ?

Yes.  I based a Linux distribution on RedHat's 6.2 last year, and I am
**VERY** familiar with your anaconda installer and kernel.src.rpm build
modules.  I know the 7.X stuff got a hell of a lot better, but customers
still have to sterilize the build area are your rpm gets installed in order
to build external kernel modules.

Jeff

>
> Greetings,
>   Arjan van de Ven
>   Red Hat Linux kernel maintainer


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 19:51                       ` Jeff Merkey
@ 2001-11-21 19:58                         ` J Sloan
  2001-11-21 20:38                         ` Doug Ledford
  1 sibling, 0 replies; 104+ messages in thread
From: J Sloan @ 2001-11-21 19:58 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: linux-kernel

Jeff Merkey wrote:

>  Kind of
> tough this to expect a secretary to do without a little help.

You have secretaries developing kernel modules?

cu

jjs



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 19:53                       ` Jeff Merkey
@ 2001-11-21 20:36                         ` Doug Ledford
  2001-11-21 21:16                           ` Jeff Merkey
  2001-11-21 21:28                           ` Robert Love
  0 siblings, 2 replies; 104+ messages in thread
From: Doug Ledford @ 2001-11-21 20:36 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: Arjan van de Ven, linux-kernel

Jeff Merkey wrote:

>>Have you even looked at the kernel-source RPM ?
>>
> 
> Yes.  I based a Linux distribution on RedHat's 6.2 last year, and I am
> **VERY** familiar with your anaconda installer and kernel.src.rpm build
> modules.  I know the 7.X stuff got a hell of a lot better, but customers
> still have to sterilize the build area are your rpm gets installed in order
> to build external kernel modules.


<sigh>  Again, this isn't true.  I build modules against our 
kernel-source RPM tree all the time, and I *never* do a make distclean. 
  If I did, it would screw the tree permanently.  If you are basing your 
arguments about what you saw with 6.2, then you are sorely out of date 
(hell, that was still a 2.2 kernel system).  Things have improved a lot 
since then.  The one overridding rule of working with a tree like we 
ship though, is *NEVER* do anything in the tree itself.  That tree is 
assembled to provide *ALL* the kernel versions and includes for all the 
kernels we ship.  However, even doing a make dep in the tree will blow 
important parts away.  Download my module build kit and see what I'm 
talking about because you currently obviously *don't* know what I'm 
talking about.




-- 

  Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
       Please check my web site for aic7xxx updates/answers before
                       e-mailing me about problems


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 19:51                       ` Jeff Merkey
  2001-11-21 19:58                         ` J Sloan
@ 2001-11-21 20:38                         ` Doug Ledford
  2001-11-21 21:17                           ` Jeff Merkey
  1 sibling, 1 reply; 104+ messages in thread
From: Doug Ledford @ 2001-11-21 20:38 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: arjan, linux-kernel

Jeff Merkey wrote:

> Doug,
> 
> I have seen some problems with the rpm build and default install of your
> kernel sources.
> NWFS and the SCI drivers will **NOT** build against it since you post in a
> linux and linux-up kernel for lilo during boot. 


It would if you used my module build kit.

> People using these drivers
> who email me always have to do a "make distclean" to get stuff to build. 


They (and you) think they do, but they don't.

> I
> am very familiar with the kernel.h
> changes you guys put in that are different from stock kernels, but despite
> this, it's
> far from "plug and play" for a customer building third party kernel modules
> on your rpms.



See my build kit (which has been available since 6.2 incidentally). 
Very plug and play for a kernel developer.






-- 

  Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
       Please check my web site for aic7xxx updates/answers before
                       e-mailing me about problems


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 20:36                         ` Doug Ledford
@ 2001-11-21 21:16                           ` Jeff Merkey
  2001-11-21 21:28                           ` Robert Love
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 21:16 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Arjan van de Ven, linux-kernel

Which kernel version?  Neither NWFS or SCI will build against "stock"
installed kernel sources provided with Seawolf.

Jeff


----- Original Message -----
From: "Doug Ledford" <dledford@redhat.com>
To: "Jeff Merkey" <jmerkey@timpanogas.org>
Cc: "Arjan van de Ven" <arjan@fenrus.demon.nl>;
<linux-kernel@vger.kernel.org>
Sent: Wednesday, November 21, 2001 1:36 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> Jeff Merkey wrote:
>
> >>Have you even looked at the kernel-source RPM ?
> >>
> >
> > Yes.  I based a Linux distribution on RedHat's 6.2 last year, and I am
> > **VERY** familiar with your anaconda installer and kernel.src.rpm build
> > modules.  I know the 7.X stuff got a hell of a lot better, but customers
> > still have to sterilize the build area are your rpm gets installed in
order
> > to build external kernel modules.
>
>
> <sigh>  Again, this isn't true.  I build modules against our
> kernel-source RPM tree all the time, and I *never* do a make distclean.
>   If I did, it would screw the tree permanently.  If you are basing your
> arguments about what you saw with 6.2, then you are sorely out of date
> (hell, that was still a 2.2 kernel system).  Things have improved a lot
> since then.  The one overridding rule of working with a tree like we
> ship though, is *NEVER* do anything in the tree itself.  That tree is
> assembled to provide *ALL* the kernel versions and includes for all the
> kernels we ship.  However, even doing a make dep in the tree will blow
> important parts away.  Download my module build kit and see what I'm
> talking about because you currently obviously *don't* know what I'm
> talking about.
>
>
>
>
> --
>
>   Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
>        Please check my web site for aic7xxx updates/answers before
>                        e-mailing me about problems


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 20:38                         ` Doug Ledford
@ 2001-11-21 21:17                           ` Jeff Merkey
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Merkey @ 2001-11-21 21:17 UTC (permalink / raw)
  To: Doug Ledford; +Cc: arjan, linux-kernel


----- Original Message -----
From: "Doug Ledford" <dledford@redhat.com>
To: "Jeff Merkey" <jmerkey@timpanogas.org>
Cc: <arjan@fenrus.demon.nl>; <linux-kernel@vger.kernel.org>
Sent: Wednesday, November 21, 2001 1:38 PM
Subject: Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid
opcode


> Jeff Merkey wrote:
>
> > Doug,
> >
> > I have seen some problems with the rpm build and default install of your
> > kernel sources.
> > NWFS and the SCI drivers will **NOT** build against it since you post in
a
> > linux and linux-up kernel for lilo during boot.
>
>
> It would if you used my module build kit.
>

Where is this kit?  Is is included in the distribution.  This is way OT,
lets take this discussion off line regarding specific red hat issues.

Jeff

> > People using these drivers
> > who email me always have to do a "make distclean" to get stuff to build.
>
>
> They (and you) think they do, but they don't.
>
> > I
> > am very familiar with the kernel.h
> > changes you guys put in that are different from stock kernels, but
despite
> > this, it's
> > far from "plug and play" for a customer building third party kernel
modules
> > on your rpms.
>
>
>
> See my build kit (which has been available since 6.2 incidentally).
> Very plug and play for a kernel developer.
>
>
>
>
>
>
> --
>
>   Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
>        Please check my web site for aic7xxx updates/answers before
>                        e-mailing me about problems


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode
  2001-11-21 20:36                         ` Doug Ledford
  2001-11-21 21:16                           ` Jeff Merkey
@ 2001-11-21 21:28                           ` Robert Love
  1 sibling, 0 replies; 104+ messages in thread
From: Robert Love @ 2001-11-21 21:28 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: Doug Ledford, Arjan van de Ven, linux-kernel

On Wed, 2001-11-21 at 16:16, Jeff Merkey wrote:
> Which kernel version?  Neither NWFS or SCI will build against "stock"
> installed kernel sources provided with Seawolf.

You are using the kernel-source RPM and not the kernel SRPM, right? 
I've seen that mistake.  But, I've never seen a problem compiling
against kernel-source.

	Robert Love



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: page_launder() bug
@ 2001-05-07  0:32 Jonathan Lundell
  0 siblings, 0 replies; 104+ messages in thread
From: Jonathan Lundell @ 2001-05-07  0:32 UTC (permalink / raw)
  To: linux-kernel

At 12:07 AM +0200 2001-05-07, BERECZ Szabolcs wrote:
>On Sun, 6 May 2001, Jonathan Morton wrote:
>
>  > >-			 page_count(page) == (1 + !!page->buffers));
>>
>>  Two inversions in a row?  I'd like to see that made more explicit,
>>  otherwise it looks like a bug to me.  Of course, if it IS a bug...
>it's not a bug.
>if page->buffers is zero, than the page_count(page) is 1, and if
>page->buffers is other than zero, page_count(page) is 2.
>so it checks if page is really used by something.
>maybe this last line is not true, but the !!page->buffers is not a bug.

There's something to be said for expressing it a little more clearly:

	page_count(page) == (page->buffers ? 2 : 1);

(sorry, I don't remember the relative precedence of == and ?:)
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2001-11-21 21:30 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-05-06 21:08 page_launder() bug BERECZ Szabolcs
2001-05-06 21:59 ` Jonathan Morton
2001-05-06 22:07   ` BERECZ Szabolcs
2001-05-07  4:55 ` David S. Miller
2001-05-07  5:19   ` Aaron Lehmann
2001-05-07  6:26   ` Tobias Ringstrom
2001-05-07  8:59     ` Helge Hafting
2001-05-07 19:02       ` J . A . Magallon
2001-05-08  7:52         ` Helge Hafting
2001-05-10 12:19           ` Ingo Oeser
2001-05-10 10:51         ` Anuradha Ratnaweera
2001-05-07 10:52     ` Alan Cox
2001-05-07 13:49     ` Daniel Phillips
2001-05-07 13:53     ` H. Peter Anvin
2001-05-07  8:54   ` David S. Miller
2001-05-07 15:12     ` Tobias Ringstrom
2001-05-07 14:52   ` Horst von Brand
     [not found]     ` <davem@redhat.com>
2001-11-21  7:16       ` [VM/MEMORY-SICKNESS] 2.4.15-pre7 kmem_cache_create invalid opcode Jeff V. Merkey
2001-11-21  6:22         ` David S. Miller
2001-11-21  6:47           ` David S. Miller
2001-11-21  6:54             ` Jeff Merkey
2001-11-21  6:56             ` David S. Miller
2001-11-21  7:03               ` Jeff Merkey
2001-11-21  7:49                 ` arjan
2001-11-21 18:31                   ` Jeff Merkey
2001-11-21 19:06                     ` Doug Ledford
2001-11-21 19:51                       ` Jeff Merkey
2001-11-21 19:58                         ` J Sloan
2001-11-21 20:38                         ` Doug Ledford
2001-11-21 21:17                           ` Jeff Merkey
2001-11-21 19:16                     ` Arjan van de Ven
2001-11-21 19:53                       ` Jeff Merkey
2001-11-21 20:36                         ` Doug Ledford
2001-11-21 21:16                           ` Jeff Merkey
2001-11-21 21:28                           ` Robert Love
2001-11-21  7:09               ` David S. Miller
2001-11-21  7:14                 ` Jeff Merkey
2001-11-21  7:28               ` Stuart Young
2001-11-21  7:33           ` Jeff V. Merkey
2001-11-21  6:54             ` Chris Abbey
2001-11-21  7:05               ` Jeff Merkey
2001-11-21  6:47         ` Kai Henningsen
2001-11-21 18:28           ` Jeff Merkey
2001-11-21  8:49         ` Alan Cox
2001-11-21 18:28           ` Jeff Merkey
2001-05-08 17:59   ` page_launder() bug Kai Henningsen
2001-05-09  2:32   ` Rusty Russell
2001-05-09  8:43     ` Martin Dalecki
2001-05-09  3:36   ` Jonathan Morton
2001-05-07 17:59 ` Linus Torvalds
2001-05-07 21:22   ` Marcelo Tosatti
2001-05-07 23:23     ` Linus Torvalds
2001-05-07 21:50       ` Marcelo Tosatti
2001-05-07 23:52         ` Linus Torvalds
2001-05-07 22:26           ` Marcelo Tosatti
2001-05-08  2:29             ` Linus Torvalds
2001-05-13 16:08               ` Rik van Riel
2001-05-13 19:29                 ` Linus Torvalds
2001-05-14 22:05                   ` Marcelo Tosatti
2001-05-08  0:16           ` David S. Miller
2001-05-08  2:34             ` Linus Torvalds
2001-05-08  1:40               ` Marcelo Tosatti
2001-05-08  3:46                 ` Linus Torvalds
2001-05-08  2:37                   ` Marcelo Tosatti
2001-05-08 21:16                   ` Marcelo Tosatti
2001-05-08 23:38                     ` Linus Torvalds
2001-05-08 23:53                       ` Marcelo Tosatti
2001-05-09  2:13                       ` David S. Miller
2001-05-09 17:38                         ` Marcelo Tosatti
2001-05-09 20:05                         ` David S. Miller
2001-05-09 18:40                           ` Marcelo Tosatti
2001-05-09 21:08                           ` David S. Miller
2001-05-09 19:50                             ` Marcelo Tosatti
2001-05-13 16:34                         ` Rik van Riel
2001-05-13 19:34                           ` Linus Torvalds
2001-05-13 19:39                             ` Rik van Riel
2001-05-13 20:42                               ` Linus Torvalds
2001-05-14  7:05                                 ` Kai Henningsen
2001-05-13 17:52                         ` David S. Miller
2001-05-13 17:55                           ` Rik van Riel
2001-05-13 18:00                           ` David S. Miller
2001-05-08  6:50                 ` David S. Miller
2001-05-08  7:40                   ` Linus Torvalds
2001-05-08 18:53                     ` Marcelo Tosatti
2001-05-08  8:29                   ` David S. Miller
2001-05-08  3:22               ` David S. Miller
2001-05-08  3:26                 ` Linus Torvalds
2001-05-08  2:47             ` David S. Miller
2001-05-08  1:34               ` Marcelo Tosatti
2001-05-08  3:18               ` David S. Miller
2001-05-08  1:47                 ` Marcelo Tosatti
2001-05-08  3:24                 ` Linus Torvalds
2001-05-08  3:29                 ` David S. Miller
2001-05-08 10:36                   ` BERECZ Szabolcs
2001-05-08 12:33             ` Mikulas Patocka
2001-05-13 16:24               ` Rik van Riel
2001-05-13 21:02                 ` Another VM race? (was: page_launder() bug) Mikulas Patocka
2001-05-13 23:04                   ` Rik van Riel
2001-05-14  9:53                     ` Mikulas Patocka
2001-05-08  0:06         ` page_launder() bug David S. Miller
2001-05-07 23:31     ` David S. Miller
2001-05-07 22:44 ` David S. Miller
2001-05-08  1:00   ` Horst von Brand
2001-05-07  0:32 Jonathan Lundell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).