linux-kernel.vger.kernel.org archive mirror
* Bug with shared memory.
@ 2002-05-14 15:13 Martin Schwidefsky
  2002-05-14 19:33 ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Schwidefsky @ 2002-05-14 15:13 UTC (permalink / raw)
  To: linux-kernel

Hi,
we managed to hang the kernel with a db/2 stress test on s/390. The test
was done on 2.4.7 but the problem is present on all recent 2.4.x and 2.5.x
kernels (all architectures). In short, a schedule is done while holding
the shm_lock of a shared memory segment. The system call that caused
this was sys_ipc with IPC_RMID, and from there the call chain is
as follows: sys_shmctl, shm_destroy, fput, dput, iput, truncate_inode_pages,
truncate_list_pages, schedule. The scheduler picked a process that called
sys_shmat. It tries to get the lock and hangs.

One way to fix this is to remove the schedule call from truncate_list_pages:

--- linux-2.5/mm/filemap.c~   Tue May 14 17:04:14 2002
+++ linux-2.5/mm/filemap.c    Tue May 14 17:04:33 2002
@@ -237,11 +237,6 @@

                  page_cache_release(page);

-                 if (need_resched()) {
-                       __set_current_state(TASK_RUNNING);
-                       schedule();
-                 }
-
                  write_lock(&mapping->page_lock);
                  goto restart;
            }

Another way is to free the lock before calling fput in shm_destroy, but the
comment says that this function has to be called with shp and shm_ids.sem
locked. Comments?

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-14 15:13 Bug with shared memory Martin Schwidefsky
@ 2002-05-14 19:33 ` Andrew Morton
  2002-05-15 22:42   ` Mike Kravetz
  2002-05-20  4:30   ` Andrea Arcangeli
  0 siblings, 2 replies; 29+ messages in thread
From: Andrew Morton @ 2002-05-14 19:33 UTC (permalink / raw)
  To: Martin Schwidefsky; +Cc: linux-kernel

Martin Schwidefsky wrote:
> 
> Hi,
> we managed to hang the kernel with a db/2 stress test on s/390. The test
> was done on 2.4.7 but the problem is present on all recent 2.4.x and 2.5.x
> kernels (all architectures). In short a schedule is done while holding
> the shm_lock of a shared memory segment. The system call that caused
> this has been sys_ipc with IPC_RMID and from there the call chain is
> as follows: sys_shmctl, shm_destroy, fput, dput, iput, truncate_inode_pages,
> truncate_list_pages, schedule. The scheduler picked a process that called
> sys_shmat. It tries to get the lock and hangs.

There's no way the kernel can successfully hold a spinlock
across that call chain.

> One way to fix this is to remove the schedule call from truncate_list_pages:
> 
> --- linux-2.5/mm/filemap.c~   Tue May 14 17:04:14 2002
> +++ linux-2.5/mm/filemap.c    Tue May 14 17:04:33 2002
> @@ -237,11 +237,6 @@
> 
>                   page_cache_release(page);
> 
> -                 if (need_resched()) {
> -                       __set_current_state(TASK_RUNNING);
> -                       schedule();
> -                 }
> -
>                   write_lock(&mapping->page_lock);
>                   goto restart;
>             }
> 
> Another way is to free the lock before calling fput in shm_destroy but the
> comment says that this functions has to be called with shp and shm_ids.sem
> locked. Comments?

Maybe ipc_ids.ary should become a semaphore?

-

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-14 19:33 ` Andrew Morton
@ 2002-05-15 22:42   ` Mike Kravetz
  2002-05-15 23:07     ` Andrew Morton
  2002-05-17 17:53     ` Bill Davidsen
  2002-05-20  4:30   ` Andrea Arcangeli
  1 sibling, 2 replies; 29+ messages in thread
From: Mike Kravetz @ 2002-05-15 22:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin Schwidefsky, linux-kernel

On Tue, May 14, 2002 at 12:33:23PM -0700, Andrew Morton wrote:
> Martin Schwidefsky wrote:
> >                                          The system call that caused
> > this has been sys_ipc with IPC_RMID and from there the call chain is
> > as follows: sys_shmctl, shm_destroy, fput, dput, iput, truncate_inode_pages,
> > truncate_list_pages, schedule. The scheduler picked a process that called
> > sys_shmat. It tries to get the lock and hangs.
> 
> There's no way the kernel can successfully hold a spinlock
> across that call chain.
> 

Is adding a check for this type of situation (under CONFIG_DEBUG_SPINLOCK
of course) worth the effort?  One would simply add a 'locks_held' count
for each task and check for zero at certain places such as return to
user mode, and during context switching.

It appears that this was done for 'sparc64', but no other architectures.
I would consider doing this for i386, if anyone would actually use it.

One would think these types of things are easily found, but this example
suggests otherwise.  Has anyone run the kernel through an extensive
(stress) test suite with any of the kernel debug options enabled?

-- 
Mike

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-15 22:42   ` Mike Kravetz
@ 2002-05-15 23:07     ` Andrew Morton
  2002-05-17 17:53     ` Bill Davidsen
  1 sibling, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2002-05-15 23:07 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Martin Schwidefsky, linux-kernel

Mike Kravetz wrote:
> 
> On Tue, May 14, 2002 at 12:33:23PM -0700, Andrew Morton wrote:
> > Martin Schwidefsky wrote:
> > >                                          The system call that caused
> > > this has been sys_ipc with IPC_RMID and from there the call chain is
> > > as follows: sys_shmctl, shm_destroy, fput, dput, iput, truncate_inode_pages,
> > > truncate_list_pages, schedule. The scheduler picked a process that called
> > > sys_shmat. It tries to get the lock and hangs.
> >
> > There's no way the kernel can successfully hold a spinlock
> > across that call chain.
> >
> 
> Is adding a check for this type of situation (under CONFIG_DEBUG_SPINLOCK
> of course) worth the effort?  One would simply add a 'locks_held' count
> for each task and check for zero at certain places such as return to
> user mode, and during context switching.

I think it would be worth the effort.  One approach would be to
create a `can_sleep()' macro.  Add that to functions which may
schedule.  It's useful for documentation purposes as well as runtime
checks.

The Stanford checker caught a lot of these, but it seems that
the (high) amount of source-level obfuscation in the ipc code
defeated it.

> One would think these types of things are easily found, but this example
> suggests otherwise.  Has anyone run the kernel through an extensive
> (stress) test suite with any of the kernel debug options enabled?

There are at present no tools in the tree to trap this
problem.

-

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-15 22:42   ` Mike Kravetz
  2002-05-15 23:07     ` Andrew Morton
@ 2002-05-17 17:53     ` Bill Davidsen
  2002-05-17 20:07       ` Mike Kravetz
  1 sibling, 1 reply; 29+ messages in thread
From: Bill Davidsen @ 2002-05-17 17:53 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrew Morton, Martin Schwidefsky, linux-kernel

On Wed, 15 May 2002, Mike Kravetz wrote:


> It appears that this was done for 'sparc64', but no other architectures.
> I would consider doing this for i386, if anyone would actually use it.
> 
> One would think these types of things are easily found, but this example
> suggests otherwise.  Has anyone run the kernel through an extensive
> (stress) test suite with any of the kernel debug options enabled?

Does this imply that the option:
  CONFIG_DEBUG_SPINLOCK=y
doesn't work on x86? Or works poorly? I'm not sure what changes you're
proposing, but if they will make this more robust I'll certainly use them!
SMP lockups are the bane of my existence, although
19-pre8-ac4+preempt+iowait has yet to take that route.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-17 17:53     ` Bill Davidsen
@ 2002-05-17 20:07       ` Mike Kravetz
  2002-05-17 20:29         ` Anton Blanchard
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Kravetz @ 2002-05-17 20:07 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

On Fri, May 17, 2002 at 01:53:10PM -0400, Bill Davidsen wrote:
> On Wed, 15 May 2002, Mike Kravetz wrote:
> 
> > It appears that this was done for 'sparc64', but no other architectures.
> > I would consider doing this for i386, if anyone would actually use it.
> > 
> > One would think these types of things are easily found, but this example
> > suggests otherwise.  Has anyone run the kernel through an extensive
> > (stress) test suite with any of the kernel debug options enabled?
> 
> Does this imply that the option:
>   CONFIG_DEBUG_SPINLOCK=y
> doesn't work on x86? Or works poorly?

No, I did not intend to imply this. AFAIK 'CONFIG_DEBUG_SPINLOCK=y'
works fine on x86 (although I'm not a user myself).  My intention
was only to add, for x86, additional features that appear to exist
only for sparc64.

-- 
Mike

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-17 20:07       ` Mike Kravetz
@ 2002-05-17 20:29         ` Anton Blanchard
  0 siblings, 0 replies; 29+ messages in thread
From: Anton Blanchard @ 2002-05-17 20:29 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Bill Davidsen, linux-kernel

 
> No I did not intend to imply this. AFAIK 'CONFIG_DEBUG_SPINLOCK=y'
> works fine on x86 (although I'm not a user myself).  My intention
> was only to add additional features to x86, that appear to only exist
> for sparc64.

It would be nice to see some of these things end up in a generic
place. Now that we can wrap our spinlocks easily (preempt changed
all the spinlocks to _raw_*) we could create a generic debug option
that checked:

uninitialised lock, timeout on deadlock, double lock by same cpu,
unlock by another cpu, unlock when not locked, scheduling with lock held,
read lock when same cpu has write lock.

And I'm sure there are some more useful tests. 

Anton

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-14 19:33 ` Andrew Morton
  2002-05-15 22:42   ` Mike Kravetz
@ 2002-05-20  4:30   ` Andrea Arcangeli
  2002-05-20  5:21     ` Andrew Morton
                       ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-20  4:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

On Tue, May 14, 2002 at 12:33:23PM -0700, Andrew Morton wrote:
> Martin Schwidefsky wrote:
> > 
> > Hi,
> > we managed to hang the kernel with a db/2 stress test on s/390. The test
> > was done on 2.4.7 but the problem is present on all recent 2.4.x and 2.5.x
> > kernels (all architectures). In short a schedule is done while holding
> > the shm_lock of a shared memory segment. The system call that caused
> > this has been sys_ipc with IPC_RMID and from there the call chain is
> > as follows: sys_shmctl, shm_destroy, fput, dput, iput, truncate_inode_pages,
> > truncate_list_pages, schedule. The scheduler picked a process that called
> > sys_shmat. It tries to get the lock and hangs.
> 
> There's no way the kernel can successfully hold a spinlock
> across that call chain.

Yep, that's a blocking operation (even truncate_inode_pages can block
for reasons unrelated to an expired timeslice).

> 
> > One way to fix this is to remove the schedule call from truncate_list_pages:
> > 
> > --- linux-2.5/mm/filemap.c~   Tue May 14 17:04:14 2002
> > +++ linux-2.5/mm/filemap.c    Tue May 14 17:04:33 2002
> > @@ -237,11 +237,6 @@
> > 
> >                   page_cache_release(page);
> > 
> > -                 if (need_resched()) {
> > -                       __set_current_state(TASK_RUNNING);
> > -                       schedule();
> > -                 }
> > -
> >                   write_lock(&mapping->page_lock);
> >                   goto restart;
> >             }
> > 
> > Another way is to free the lock before calling fput in shm_destroy but the
> > comment says that this functions has to be called with shp and shm_ids.sem
> > locked. Comments?
> 
> Maybe ipc_ids.ary should become a semaphore?

The shmid spinlock is not needed across the fput, so we don't need to
switch to a semaphore. BTW, there seems to be a bit of superfluous
locking in that file (usually harmless, but not in this case). NOTE:
apparently only shm_rmid needs the shmid lock in that function (please
double check).

This should fix it:

--- 2.4.19pre8aa3/ipc/shm.c.~1~	Wed May 15 22:37:20 2002
+++ 2.4.19pre8aa3/ipc/shm.c	Mon May 20 06:18:11 2002
@@ -117,12 +117,15 @@ static void shm_open (struct vm_area_str
  *
  * @shp: struct to free
  *
- * It has to be called with shp and shm_ids.sem locked
+ * It has to be called with the id locked and the shm_ids.sem
+ * held. Before returning it unlocks the id internally. fput()
+ * is a blocking operation.
  */
-static void shm_destroy (struct shmid_kernel *shp)
+static void shm_destroy (struct shmid_kernel *shp, int id)
 {
-	shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	shm_rmid (shp->id);
+	shm_unlock(id);
+	shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	shmem_lock(shp->shm_file, 0);
 	fput (shp->shm_file);
 	kfree (shp);
@@ -149,9 +152,9 @@ static void shm_close (struct vm_area_st
 	shp->shm_nattch--;
 	if(shp->shm_nattch == 0 &&
 	   shp->shm_flags & SHM_DEST)
-		shm_destroy (shp);
-
-	shm_unlock(id);
+		shm_destroy (shp, id);
+	else
+		shm_unlock(id);
 	up (&shm_ids.sem);
 }
 
@@ -512,11 +515,11 @@ asmlinkage long sys_shmctl (int shmid, i
 			shp->shm_flags |= SHM_DEST;
 			/* Do not find it any more */
 			shp->shm_perm.key = IPC_PRIVATE;
+			shm_unlock(shmid);
 		} else
-			shm_destroy (shp);
+			shm_destroy (shp, shmid);
 
 		/* Unlock */
-		shm_unlock(shmid);
 		up(&shm_ids.sem);
 		return err;
 	}
@@ -653,8 +656,9 @@ invalid:
 	shp->shm_nattch--;
 	if(shp->shm_nattch == 0 &&
 	   shp->shm_flags & SHM_DEST)
-		shm_destroy (shp);
-	shm_unlock(shmid);
+		shm_destroy (shp, shmid);
+	else
+		shm_unlock(shmid);
 	up (&shm_ids.sem);
 
 	*raddr = (unsigned long) user_addr;

As the next thing I'll go ahead on the inode/highmem imbalance
reported by Alexey over the weekend.  Then the only pending thing
before the next -aa is the integration of the 2.5 scheduler like in
2.4-ac.  I will also soon automate the porting of all the not-yet
merged stuff into 2.5, so we don't risk forgetting anything.  I mostly
care about mainline, but I would also suggest Alan merge into -ac the
very same VM as -aa (so we also increase the testing before it gets
merged into mainline, as is happening bit by bit over time).

Alan, I'm looking at it a bit, and what you're shipping is a hybrid
between the old 2.4 VM and the current one, plus the rmap design for
unmapping pagetables.  The issue is not the rmap design; the issue is
that the hybrid gratuitously reintroduces various bugs like the kswapd
infinite loop, and it misses all the recent fixes (problems like the
kswapd infinite loop are reproducible only after the machines have
been up for months; other things, like the unfreed bh with a shortage
in the normal zone fixed in recent -aa, are reproducible only with big
irons, numa/discontigmem stuff, the definitive fix for google,
etc. etc.).

The rmap design as well cannot make nearly any positive difference
during paging (not the rmap patch: the rmap patch in -ac does a whole
lot more than just switching shrink_cache to use a reverse lookup
chain instead of triggering swap_out; it basically returns to the old
2.4 VM lru logic).  The system CPU usage during a DBMS workload
swapping 1.2G and with 1G of shm in RAM is not significant, swap_out's
linear complexity is not showing up heavily in the profiling yet (on
smaller machines with less total virtual memory it's a no-brainer),
and it instead hurts the scalability of most fast paths like page
faults and fork.  The aging changes don't look significant either; in
fact, aging the accessed bits in round robin is fairer than aging them
in vm-lru order.  All the numbers I've seen so far agree, but I'll be
very interested to know if you have reproduced any different practical
results in your lab doing -ac vs. -aa VM comparisons.

About the rmap design, I would very much appreciate it if Rik could
make a version of his patch that implements rmap on top of current -aa
(it wouldn't be a rewrite, just a port of the strict rmap feature), so
we can compare apples to apples and benchmark the effect of the rmap
patch, not rmap plus the hybrid.  Most of the slowdown during paging
is most probably due to the hybrid, not the rmap design; the rmap
design, if anything, should in fact make things a bit faster during
swapout, at the cost of being a bit slower in the more important fast
paths.  It is definitely possible to put a strict rmap on top of -aa
without the huge "hybrid" thing attached to the rmap code, so without
impacting the rest of the VM at all.  It's just a matter of adding the
try_to_unmap in shrink_cache and deleting the swap_out call (it's
almost as easy as shipping a version of Windows without a web browser
installed by default).

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20  4:30   ` Andrea Arcangeli
@ 2002-05-20  5:21     ` Andrew Morton
  2002-05-20 11:34       ` Andrey Savochkin
                         ` (2 more replies)
  2002-05-20 16:13     ` Martin J. Bligh
  2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
  2 siblings, 3 replies; 29+ messages in thread
From: Andrew Morton @ 2002-05-20  5:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

Andrea Arcangeli wrote:
> 
> ...
> As next thing I'll go ahead on the inode/highmem imbalance repored by
> Alexey in the weekend.

(Some background:  the problem is that ZONE_NORMAL gets clogged with
 inodes which are unfreeable because they have attached pagecache
 pages.  There is no VM pressure against ZONE_HIGHMEM and there is
 nothing which causes the pagecache pages to be freed.  The machine
 dies due to ZONE_NORMAL exhaustion.  Workload is (presumably) the
 reading or creation of a very large number of small files on a machine
 which has much highmem.

 I expect Andrea is looking at a solution which releases pages against
 unused inodes when there are many unused inodes, but prune_icache()
 is failing to free sufficient inodes).

I'll be interested to see your solution ;)  We need to be careful to
not over-shrink pagecache.  Also we need to be doubly-careful to
ensure that LRU order is maintained on unused_list.


I expect this is a compound bug: sure, ZONE_NORMAL is clogged with
inodes.  But I bet it's also clogged with buffer_heads.  If the
buffer_head problem was fixed then the inode problem would occur
much later. - more highmem, more/smaller files.

>  Then the only pending thing before next -aa is
> the integration of the 2.5 scheduler like in 2.4-ac. I will also soon
> automate the porting of all the not-yet merged stuff into 2.5, so we
> don't risk to forget anything.

That's good.

> I mostly care about mainline but I would
> also suggest Alan to merge in -ac the very same VM of -aa (so we also
> increase the testing before it gets merged in mainline as it is
> happening bit by bit overtime).  Alan, I'm looking at it a bit and what
> you're shipping is an hybrid between the old 2.4 vm and the current one,
> plus using the rmap design for unmapping pagetables. The issue is not
> the rmap design, the issue is that the hybrid gratuitously reintroduces
> various bugs like kswapd infinite loop and it misses all the recent
> fixes (problems like kswapd infinite loop are reproducible after months
> the machine are up, other things like unfreed bh with shortage in the
> normal zone fixed in recent -aa are reproducible only with big irons,
> numa/discontigmem stuff, definitive fix for google etc..etc..).

I've seen a vague report from the IBM team which indicates that -aa VM does
not solve the ZONE_NORMAL-full-of-buffer_heads lockup.  The workload was
specweb on a 16 gig machine, I believe.

I did a quickie patch early this month which simply strips buffers
away from pages at each and every opportunity - that fixed it of
course.  At the end of the run, ZONE_NORMAL had 250 MB free, as
opposed to zero with 2.4.18-stock.

I think this is a better approach than memclass_related_bhs().
Because it proactively makes memory available in ZONE_NORMAL
for blockdev pages.  If we minimise the amount of memory which
is consumed by buffer_heads then we maximise the amount of
memory which is available to inodes and indirects.   Sure,
the buffers cache the disk mapping.  But the inodes and indirects
cache the same information with *much* better packing density.
Sure, there's some extra CPU cost in reestablishing the mapping,
but in the no-buffer_heads testing I'm doing on 2.5 it is unmeasurable.

Currently, we're effectively allowing buffer_heads to evict useful
pagecache data.  We have to perform additional I/O to get that back.

In other words: once we've finished I/O, just get rid of the
damn things.   Bit radical for 2.4 maybe.

> ...
> 
> About rmap design I would very much appreciate if Rik could make a
> version of his patch that implements rmap on top of current -aa

That would be good to see.  I believe Rik is offline for a week
or three, so he may not see your words.

-

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20  5:21     ` Andrew Morton
@ 2002-05-20 11:34       ` Andrey Savochkin
  2002-05-20 14:15       ` Andrea Arcangeli
  2002-05-20 16:22       ` Martin J. Bligh
  2 siblings, 0 replies; 29+ messages in thread
From: Andrey Savochkin @ 2002-05-20 11:34 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

Hi,

On Sun, May 19, 2002 at 10:21:18PM -0700, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > ...
> > As next thing I'll go ahead on the inode/highmem imbalance repored by
> > Alexey in the weekend.
> 
> (Some background:  the problem is that ZONE_NORMAL gets clogged with
>  inodes which are unfreeable because they have attached pagecache
>  pages.  There is no VM pressure against ZONE_HIGHMEM and there is
>  nothing which causes the pagecache pages to be freed.  The machine
>  dies due to ZONE_NORMAL exhaustion.  Workload is (presumably) the
>  reading or creation of a very large number of small files on a machine
>  which has much highmem.
> 
>  I expect Andrea is looking at a solution which releases pages against
>  unused inodes when there are many unused inodes, but prune_icache()
>  is failing to free sufficient inodes).
> 
> I'll be interested to see your solution ;)  We need to be careful to
> not over-shrink pagecache.  Also we need to be doubly-careful to
> ensure that LRU order is maintained on unused_list.
> 
> 
> I expect this is a compound bug: sure, ZONE_NORMAL is clogged with
> inodes.  But I bet it's also clogged with buffer_heads.  If the
> buffer_head problem was fixed then the inode problem would occur
> much later. - more highmem, more/smaller files.

In fact, running the test on different hardware, I saw both cases: just huge
inode cache and inodes+buffer_heads.

If ZONE_NORMAL is clogged with inodes+buffer_heads, the system works very
slowly (`tar x' on a test archive finishes in 20+ minutes instead of 20
seconds), but it's still alive.
If the test is aborted, the system returns to the normal state.

If ZONE_NORMAL is clogged with inodes only, the system looks completely frozen,
and only Magic-SysRq shows the attempts to shrink the cache.

	Andrey

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20  5:21     ` Andrew Morton
  2002-05-20 11:34       ` Andrey Savochkin
@ 2002-05-20 14:15       ` Andrea Arcangeli
  2002-05-20 19:24         ` Rik van Riel
  2002-05-20 16:22       ` Martin J. Bligh
  2 siblings, 1 reply; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-20 14:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

On Sun, May 19, 2002 at 10:21:18PM -0700, Andrew Morton wrote:
> I've seen a vague report from the IBM team which indicates that -aa VM does
> not solve the ZONE_NORMAL-full-of-buffer_heads lockup.  The workload was
> specweb on a 16 gig machine, I believe.

I didn't see any bug report; could you pass it over?  I was confident
that bug had been gone for a few weeks, but of course if there are
still problems I'd like to know.

Note also that if the excessive number of bh is due to rawio or
O_DIRECT, that's not a VM issue.

> I did a quickie patch early this month which simply strips buffers
> away from pages at each and every opportunity - that fixed it of
> course.  At the end of the run, ZONE_NORMAL had 250 MB free, as
> opposed to zero with 2.4.18-stock.
>
> I think this is a better approach than memclass_related_bhs().
> Because it proactively makes memory available in ZONE_NORMAL
> for blockdev pages.  If we minimise the amount of memory which
> is consumed by buffer_heads then we maximise the amount of
> memory which is available to inodes and indirects.   Sure,

With memclass_related_bhs() we automatically maximized the amount of
RAM available for inodes/indirects and everything else in ZONE_NORMAL;
after that you can consider all the unlocked bh as free memory, just
as if they were other cache.  The whole point of memclass_related_bhs
is that it renders the bh an atomically shrinkable cache, so we can
leave them there until we run low on lowmem, and then we will shrink
them just fine.  Actually, I think it would be better to start freeing
the bh rather than the ZONE_NORMAL memory, since the cache in
ZONE_NORMAL is more important, so you may argue about page replacement
decisions, but the VM should just work correctly as if the bh were
free.

> the buffers cache the disk mapping.  But the inodes and indirects
> cache the same information which *much* better packing density.
> Sure, there's some extra CPU cost in reestablishing the mapping,
> but in the no-buffer_heads testing I'm doing on 2.5 it is unmeasurable.
> 
> Currently, we're effectively allowing buffer_heads to evict useful
> pagecache data.  We have to perform additional I/O to get that back.
> 
> In other words: once we've finished I/O, just get rid of the
> damn things.   Bit radical for 2.4 maybe.

Well, I'm doing a superset of that at the moment: I drop them, and I
also know when it's time to do it.  So implementing "lazily drop any
bh ASAP" would be a matter of a few lines; I can do that.  I'm only a
bit worried that it becomes measurable in some overwrite-intensive
workload: we would introduce additional non-scalable get_blocks in the
overwrite paths.  But overall I agree with you that it's almost
certainly worse to shrink the normal pagecache instead; and my
solution guarantees no runtime difference on, for example, a
ZONE_NORMAL-only machine.  Also, in 2.5 you shrunk the bh quite a bit,
so it's a lighter entity to refill.

Anyway, the point is that you shouldn't run out of the normal zone
anymore, and non-highmem machines won't see any change in behaviour
with the highmem/bh balance fix.  If that is not the case, that is the
real problem; everything else is a performance optimization.

One bad thing, actually, is that if somebody endlessly re-reads a
page and constantly keeps it in the active list, such a page may keep
its bh allocated for no good reason.  But again, if too many bh are
pinned that way, we'll eventually get more aggressive against the
active list, so it would (/should) again be mostly a performance
issue.

> > ...
> > 
> > About rmap design I would very much appreciate if Rik could make a
> > version of his patch that implements rmap on top of current -aa
> 
> That would be good to see.  I believe Rik is offline for a week

Indeed.  A strict rmap patch would also make integration into 2.5
easier.  I think rmap, like -preempt, the O(1) scheduler, your
bio-based async flushing rewrite (following the inodes, then the
dirty pages in the inode), and all the other non-showstopper issues
(unlike pte-highmem, VM fixes, etc.), are ideal items to research
during 2.5; I don't think it makes much sense to keep them on top of
2.4.

> or three, so he may not see your words.

Ok. We CC'ed Rik so I assume it won't get lost in the mail flood.

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20  4:30   ` Andrea Arcangeli
  2002-05-20  5:21     ` Andrew Morton
@ 2002-05-20 16:13     ` Martin J. Bligh
  2002-05-20 16:37       ` Andrea Arcangeli
  2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
  2 siblings, 1 reply; 29+ messages in thread
From: Martin J. Bligh @ 2002-05-20 16:13 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton
  Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

> About rmap design I would very much appreciate if Rik could make a
> version of his patch that implements rmap on top of current -aa (it
> wouldn't be a rewrite, just a porting of the strict rmap feature), 
> so we can compare apples to apples and benchmark the effect of the 
> rmap patch, not the rmap + the hybrid, most of the slowdown during 
> paging is most probably due the hybrid, not because of the rmap design, 
> the rmap design if something should make things a bit faster during 
> swapout infact, by being a bit slower in the more important fast paths. 
> It is definitely possible to put a strict rmap on top of -aa without 
> the huge "hybrid" thing attached to the rmap code, so without impacting 
> at all the rest of the vm. It's just a matter of adding the try_to_unmap 
> in shrink_cache and deleting the swap_out call (it's almost as easy as
> shipping a version of Windows without a web browser installed by default).

Is it really the rmap patch, or is this Alan's VM as a whole?
Could you take a look at http://www.surriel.com/patches/ and
see if the rmap 13 patch there is still objectionable to you?

I've been benchmarking rmap 13 against mainline (2.4.19-pre7)
and with the latest lock breakup changes performance now seems
to be about equivalent to mainline (for kernel compile on NUMA-Q).
Those changes reduced system time from 650s to 160s. The only
reason I haven't published results "officially" yet is that I 
was sorting out some timer problems with the machine.

M.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20  5:21     ` Andrew Morton
  2002-05-20 11:34       ` Andrey Savochkin
  2002-05-20 14:15       ` Andrea Arcangeli
@ 2002-05-20 16:22       ` Martin J. Bligh
  2002-05-20 19:38         ` Rik van Riel
  2 siblings, 1 reply; 29+ messages in thread
From: Martin J. Bligh @ 2002-05-20 16:22 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli
  Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

> I expect this is a compound bug: sure, ZONE_NORMAL is clogged with
> inodes.  But I bet it's also clogged with buffer_heads.  If the
> buffer_head problem was fixed then the inode problem would occur
> much later. - more highmem, more/smaller files.

This should be easy to see from slabinfo. Note that the problem is
worse on a P4 - alignment takes us from 96 to 128 bytes.
 
> I've seen a vague report from the IBM team which indicates that 
> -aa VM does not solve the ZONE_NORMAL-full-of-buffer_heads lockup.  
> The workload was specweb on a 16 gig machine, I believe.

I haven't been very clear on this, mea culpa. We have two different
machines, each with 4 procs & 16Gb (at least). Both are exhausting
memory through buffer_heads. 

1) Running Oracle apps, doing raw IO. We are running an -aa kernel
on this machine, and it doesn't help. I believe that's just because
it's raw IO, though. We have some patches we did for TPC-H to fix
the raw IO problems with buffer_heads, and we'll be trying those
out this week.

2) Running specweb, doing non-raw IO. This is the machine we tried
Andrew's patch on and it worked a treat. We haven't tried the -aa
fix for this yet, I'll see if I can get that done this week.

So, we haven't really given Andrea's patch a fair test yet. If you
guys can agree which the better approach is by just discussing it,
great. If not, we'll benchmark the hell out of it, and decide things
that way ;-)

M

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 16:13     ` Martin J. Bligh
@ 2002-05-20 16:37       ` Andrea Arcangeli
  2002-05-20 17:23         ` Martin J. Bligh
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-20 16:37 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

On Mon, May 20, 2002 at 09:13:53AM -0700, Martin J. Bligh wrote:
> Is it really the rmap patch, or is this Alan's VM as a whole?
> Could you take a look at http://www.surriel.com/patches/ and
> see if the rmap 13 patch there is still objectionable to you?

I think it's almost the same code.

> I've been benchmarking rmap 13 against mainline (2.4.19-pre7)
> and with the latest lock breakup changes performance now seems
> to be about equivalent to mainline (for kernel compile on NUMA-Q).
> Those changes reduced system time from 650s to 160s. The only

How much are you swapping in your workload? (As I said, the fast paths are
hurt a little, so it's expected that it's almost as fast as mainline
with a kernel compile, similar to the fact we also add anon pages to the
lru list.) I think you're only exercising the fast paths in your
workload, not the memory balancing that is the whole point of the change.

> reason I haven't published results "officially" yet is that I 
> was sorting out some timer problems with the machine.
> 
> M.


Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 16:37       ` Andrea Arcangeli
@ 2002-05-20 17:23         ` Martin J. Bligh
  2002-05-20 17:32           ` William Lee Irwin III
  0 siblings, 1 reply; 29+ messages in thread
From: Martin J. Bligh @ 2002-05-20 17:23 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel

>> I've been benchmarking rmap 13 against mainline (2.4.19-pre7)
>> and with the latest lock breakup changes performance now seems
>> to be about equivalent to mainline (for kernel compile on NUMA-Q).
>> Those changes reduced system time from 650s to 160s. The only
> 
> How much are you swapping in your workload? (As I said, the fast paths are
> hurt a little, so it's expected that it's almost as fast as mainline
> with a kernel compile, similar to the fact we also add anon pages to the
> lru list.) I think you're only exercising the fast paths in your
> workload, not the memory balancing that is the whole point of the change.

No swapping. We fixed the horrendous locking problem we were seeing,
but this was only one test - obviously others are needed. But I think we're
in agreement that it's time to give it a beating and see what happens ;-)

M.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 17:23         ` Martin J. Bligh
@ 2002-05-20 17:32           ` William Lee Irwin III
  0 siblings, 0 replies; 29+ messages in thread
From: William Lee Irwin III @ 2002-05-20 17:32 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrea Arcangeli, Andrew Morton, Martin Schwidefsky,
	linux-kernel, Alan Cox, Rik van Riel

At some point in the past, Andrea Arcangeli wrote:
>> How much are you swapping in your workload? (As I said, the fast paths are
>> hurt a little, so it's expected that it's almost as fast as mainline
>> with a kernel compile, similar to the fact we also add anon pages to the
>> lru list.) I think you're only exercising the fast paths in your
>> workload, not the memory balancing that is the whole point of the change.

On Mon, May 20, 2002 at 10:23:05AM -0700, Martin J. Bligh wrote:
> No swapping. We fixed the horrendous locking problem we were seeing,
> but this was only one test - obviously others are needed. But I think we're
> in agreement that it's time to give it a beating and see what happens ;-)

There's no mystery or secrecy to the locking work, really just overzealous
QA (which is a good thing where locking changes are concerned) and a
conservative release schedule.


Cheers,
Bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 14:15       ` Andrea Arcangeli
@ 2002-05-20 19:24         ` Rik van Riel
  2002-05-20 23:46           ` Andrea Arcangeli
  0 siblings, 1 reply; 29+ messages in thread
From: Rik van Riel @ 2002-05-20 19:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox

On Mon, 20 May 2002, Andrea Arcangeli wrote:

> With memclass_related_bhs() we automatically maximized the amount of ram
> available as inodes/indirects and everything else ZONE_NORMAL, after

OK, a question and a remark:

1) does memclass_related_bhs() work if the bufferheads are
   on another node?  (NUMA-Q)

2) memclass_related_bhs() will definitely not work if the
   data structure is pinned indirectly, say struct address_space,
   which is pinned by the mere existence of the page cache page
   and cannot easily be freed

> > or three, so he may not see your words.
>
> Ok. We CC'ed Rik so I assume it won't get lost in the mail flood.

I'm on holidays, don't expect patches soon :)

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 16:22       ` Martin J. Bligh
@ 2002-05-20 19:38         ` Rik van Riel
  2002-05-20 20:06           ` William Lee Irwin III
  0 siblings, 1 reply; 29+ messages in thread
From: Rik van Riel @ 2002-05-20 19:38 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, Andrea Arcangeli, Martin Schwidefsky,
	linux-kernel, Alan Cox

On Mon, 20 May 2002, Martin J. Bligh wrote:

> 1) Running Oracle apps, doing raw IO. We are running an -aa kernel
> on this machine, and it doesn't help.

> So, we haven't really given Andrea's patch a fair test yet. If you
> guys can agree which the better approach is by just discussing it,

One thing that seems to be missing in Linux is proper VM
statistics. No amount of handwaving and discussion is going
to be better than a measurement of what's going on inside
the VM.

Treating the different VM patch sets as black boxes with
benchmarks will show us which VM works best for which
benchmark, but it won't show us why or how to combine the
good features of the different VMs...

I think good statistics to start would be the traditional
page fault rate (pf), page free rate (fr), page scan rate
(sr), reclaims (re), pageout (po), pagein (pi) and some
variation of iowait stats.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 19:38         ` Rik van Riel
@ 2002-05-20 20:06           ` William Lee Irwin III
  0 siblings, 0 replies; 29+ messages in thread
From: William Lee Irwin III @ 2002-05-20 20:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Martin J. Bligh, Andrew Morton, Andrea Arcangeli,
	Martin Schwidefsky, linux-kernel, Alan Cox

On Mon, May 20, 2002 at 04:38:15PM -0300, Rik van Riel wrote:
> I think good statistics to start would be the traditional
> page fault rate (pf), page free rate (fr), page scan rate
> (sr), reclaims (re), pageout (po), pagein (pi) and some
> variation of iowait stats.

This has been on my TODO list for months. Sorry I haven't gotten
around to it sooner. I'll follow up with that in a bit.


Cheers,
Bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 19:24         ` Rik van Riel
@ 2002-05-20 23:46           ` Andrea Arcangeli
  2002-05-21  0:14             ` Martin J. Bligh
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-20 23:46 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox

On Mon, May 20, 2002 at 04:24:37PM -0300, Rik van Riel wrote:
> On Mon, 20 May 2002, Andrea Arcangeli wrote:
> 
> > With memclass_related_bhs() we automatically maximized the amount of ram
> > available as inodes/indirects and everything else ZONE_NORMAL, after
> 
> OK, a question and a remark:
> 
> 1) does memclass_related_bhs() work if the bufferheads are
>    on another node ?  (NUMA-Q)

yes of course. With all the numa fixes, all the classzone logic is
independent of the pgdat layer, not only during the
memclass_related_bhs check. The fastest way to understand how the logic
works is to read the zone_idx/memclass defines in mmzone.h.

> 2) memclass_related_bhs() will definitely not work if the
>    data structure is pinned indirectly, say struct address_space,
>    which is pinned by the mere existence of the page cache page
>    and cannot easily be freed

memclass_related_bhs is just for the bhs; it has nothing to do with the
address space. It's on the page->buffers side of the page, not the
page->mapping side, so I don't quite follow the concern.

The inode problem is a separate thing, nothing to do with
memclass_related_bhs(); the fix will be local to prune_icache.

For the memclass_related_bhs() fix in -aa, that's in Martin's testing TODO
list (on the multi-gigabyte machines); he also nicely proposed to
compare it to the other throw-away-all-bh-regardless patch from Andrew
(which I actually haven't seen floating around yet, but it's clear how it
works; it's a subset of memclass_related_bhs). However, the right way to
test memclass_related_bhs vs throw-away-all-bh is to run a rewrite
test that fits in cache, so write,fsync,write,fsync,write,fsync. specweb
or any other read-only test will obviously perform exactly the same both
ways (actually theoretically a bit cpu-faster in throw-away-all-bh
because it doesn't check the bh list).

> > > or three, so he may not see your words.
> >
> > Ok. We CC'ed Rik so I assume it won't get lost in the mail flood.
> 
> I'm on holidays, don't expect patches soon :)

you shouldn't read emails either then, you know it is strictly forbidden
to read emails during vacations :). I'm kidding of course; thanks for
the fast reply!

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-20 23:46           ` Andrea Arcangeli
@ 2002-05-21  0:14             ` Martin J. Bligh
  2002-05-21  1:40               ` Andrea Arcangeli
  0 siblings, 1 reply; 29+ messages in thread
From: Martin J. Bligh @ 2002-05-21  0:14 UTC (permalink / raw)
  To: Andrea Arcangeli, Rik van Riel
  Cc: Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox

> For the memclass_related_bhs() fix in -aa, that's in the testing TODO
> list of Martin (on the multi giga machines), he also nicely proposed to
> compare it to the other throw-away-all-bh-regardless patch from Andrew
> (which I actually haven't seen floating around yet, but it's clear how it
> works, it's a subset of memclass_related_bhs). However the right way to
> test the memclass_related_bhs vs throw-away-all-bh, is to run a rewrite
> test that fits in cache, so write,fsync,write,fsync,write,fsync. specweb
> or any other read-only test will obviously perform exactly the same both
> ways (actually theoretically a bit cpu-faster in throw-away-all-bh
> because it doesn't check the bh list).

The only thing that worries me in theory about your approach here,
Andrea, is fragmentation: if we try to shrink only when we're low on
memory, isn't there a danger that one buffer_head per page of slab
cache will be in use, and thus no pages are freeable (obviously this
is extreme, but I can certainly see a situation with lots of partially
used pages)?

With Andrew's approach, keeping things freed as we go, we should
reuse the partially allocated slab pages, which would seem (to me)
to result in less fragmentation?

Thanks,

M.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Bug with shared memory.
  2002-05-21  0:14             ` Martin J. Bligh
@ 2002-05-21  1:40               ` Andrea Arcangeli
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-21  1:40 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Rik van Riel, Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox

On Mon, May 20, 2002 at 05:14:24PM -0700, Martin J. Bligh wrote:
> > For the memclass_related_bhs() fix in -aa, that's in the testing TODO
> > list of Martin (on the multi giga machines), he also nicely proposed to
> > compare it to the other throw-away-all-bh-regardless patch from Andrew
> > (which I actually haven't seen floating around yet, but it's clear how it
> > works, it's a subset of memclass_related_bhs). However the right way to
> > test the memclass_related_bhs vs throw-away-all-bh, is to run a rewrite
> > test that fits in cache, so write,fsync,write,fsync,write,fsync. specweb
> > or any other read-only test will obviously perform exactly the same both
> > ways (actually theoretically a bit cpu-faster in throw-away-all-bh
> > because it doesn't check the bh list).
> 
> The only thing that worries me in theory about your approach here,
> Andrea, is fragmentation: if we try to shrink only when we're low on
> memory, isn't there a danger that one buffer_head per page of slab
> cache will be in use, and thus no pages are freeable (obviously this
> is extreme, but I can certainly see a situation with lots of partially
> used pages)?

well, then you should be worried about the whole /proc/slabinfo first, not
just the bh headers :) if it's a problem for the bh, it's a problem for
everything else.

The reason fragmentation is never a problem is that, as long as the
persistent slab objects can be reclaimed dynamically by the vm, we will
always be able to free all the slab pages. The only downside we run into
vs being aware of the slab fragmentation is that we risk reclaiming
more objects than necessary at the layer above the slab cache (so at the
bh layer), but dropping more bhs than necessary will never be a problem
(Andrew wants to drop them all indeed).

> With Andrew's approach, keeping things freed as we go, we should
> reuse the partially allocated slab pages, which would seem (to me)
> to result in less fragmentation?

less fragmentation because zero bhs are allocated from the slab cache :).

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-20  4:30   ` Andrea Arcangeli
  2002-05-20  5:21     ` Andrew Morton
  2002-05-20 16:13     ` Martin J. Bligh
@ 2002-05-24  7:33     ` Andrea Arcangeli
  2002-05-24  7:51       ` William Lee Irwin III
                         ` (3 more replies)
  2 siblings, 4 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-24  7:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel, kuznet,
	Andrey Savochkin

On Mon, May 20, 2002 at 06:30:40AM +0200, Andrea Arcangeli wrote:
> As the next thing I'll go ahead on the inode/highmem imbalance reported by
> Alexey over the weekend.  Then the only pending thing before the next -aa is

Here it is; you should apply it together with vm-35, which you also need
for the bh/highmem balance (or on top of 2.4.19pre8aa3). I tested it
lightly on uml and it hasn't broken so far, so be careful because it's not
very well tested yet. Along the lines of what Alexey suggested originally,
if the goal isn't reached, in a second pass we shrink the cache too, but
only if the cache is the only reason for the "pinning" behaviour of the
inode. If for example there are dirty blocks of metadata or of data
belonging to the inode we wakeup_bdflush instead and never shrink the
cache in that case. If the inode itself is dirty as well we let the two
passes fail so the work will be scheduled for keventd. This logic should
ensure we never shrink the cache without good reason and that we free
the cache only for the inodes that we actually go ahead and free.
(Basically only dirty pages set with SetPageDirty aren't trapped by the
logic before calling the invalidate, like ramfs, but that's expected of
course; those pages cannot be damaged by the non-destructive invalidate
anyway.)

Comments?

--- 2.4.19pre8aa4/fs/inode.c.~1~	Fri May 24 03:17:10 2002
+++ 2.4.19pre8aa4/fs/inode.c	Fri May 24 05:03:54 2002
@@ -672,35 +672,87 @@ void prune_icache(int goal)
 {
 	LIST_HEAD(list);
 	struct list_head *entry, *freeable = &list;
-	int count;
+	int count, pass;
 	struct inode * inode;
 
-	spin_lock(&inode_lock);
+	count = pass = 0;
 
-	count = 0;
-	entry = inode_unused.prev;
-	while (entry != &inode_unused)
-	{
-		struct list_head *tmp = entry;
+	spin_lock(&inode_lock);
+	while (goal && pass++ < 2) {
+		entry = inode_unused.prev;
+		while (entry != &inode_unused)
+		{
+			struct list_head *tmp = entry;
 
-		entry = entry->prev;
-		inode = INODE(tmp);
-		if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
-			continue;
-		if (!CAN_UNUSE(inode))
-			continue;
-		if (atomic_read(&inode->i_count))
-			continue;
-		list_del(tmp);
-		list_del(&inode->i_hash);
-		INIT_LIST_HEAD(&inode->i_hash);
-		list_add(tmp, freeable);
-		inode->i_state |= I_FREEING;
-		count++;
-		if (!--goal)
-			break;
+			entry = entry->prev;
+			inode = INODE(tmp);
+			if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
+				continue;
+			if (atomic_read(&inode->i_count))
+				continue;
+			if (pass == 2 && !inode->i_state && !CAN_UNUSE(inode)) {
+				if (inode_has_buffers(inode))
+					/*
+					 * If the inode has dirty buffers
+					 * pending, start flushing out bdflush.ndirty
+					 * worth of data even if there's no dirty-memory
+					 * pressure. Do nothing else in this
+					 * case, until all dirty buffers are gone
+					 * we can do nothing about the inode other than
+					 * to keep flushing dirty stuff. We could also
+					 * flush only the dirty buffers in the inode
+					 * but there's no API to do it asynchronously
+					 * and this simpler approach to deal with the
+					 * dirty payload shouldn't make much difference
+					 * in practice. Also keep in mind if somebody
+					 * keeps overwriting data in a flood we'd
+					 * never manage to drop the inode anyway,
+					 * and we really shouldn't do that because
+					 * it's a heavily used one.
+					 */
+					wakeup_bdflush();
+				else if (inode->i_data.nrpages)
+					/*
+					 * If we're here it means the only reason
+					 * we cannot drop the inode is its
+					 * pagecache, so go ahead and trim it
+					 * hard. If it doesn't go away it means
+					 * the pages are dirty or pinned, a la
+					 * ramfs.
+					 *
+					 * invalidate_inode_pages() is a non
+					 * blocking operation but we introduce
+					 * a dependency order between the
+					 * inode_lock and the pagemap_lru_lock,
+					 * the inode_lock must always be taken
+					 * first from now on.
+					 */
+					invalidate_inode_pages(inode);
+			}
+			if (!CAN_UNUSE(inode))
+				continue;
+			list_del(tmp);
+			list_del(&inode->i_hash);
+			INIT_LIST_HEAD(&inode->i_hash);
+			list_add(tmp, freeable);
+			inode->i_state |= I_FREEING;
+			count++;
+			if (!--goal)
+				break;
+		}
 	}
 	inodes_stat.nr_unused -= count;
+
+	/*
+	 * the unused list is hardly an LRU so it makes
+	 * more sense to rotate it so we don't bang
+	 * always on the same inodes in case they're
+	 * unfreeable for whatever reason.
+	 */
+	if (entry != &inode_unused) {
+		list_del(&inode_unused);
+		list_add(&inode_unused, entry);
+	}
 	spin_unlock(&inode_lock);
 
 	dispose_list(freeable);

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
@ 2002-05-24  7:51       ` William Lee Irwin III
  2002-05-24  8:04       ` Andrew Morton
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: William Lee Irwin III @ 2002-05-24  7:51 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, Martin Schwidefsky, linux-kernel, Alan Cox,
	Rik van Riel, kuznet, Andrey Savochkin

On Fri, May 24, 2002 at 09:33:41AM +0200, Andrea Arcangeli wrote:
> Here it is; you should apply it together with vm-35, which you also need
> for the bh/highmem balance (or on top of 2.4.19pre8aa3). I tested it
> lightly on uml and it hasn't broken so far, so be careful because it's not
> very well tested yet. Along the lines of what Alexey suggested originally,
> if the goal isn't reached, in a second pass we shrink the cache too, but
> only if the cache is the only reason for the "pinning" behaviour of the
> inode. If for example there are dirty blocks of metadata or of data
> belonging to the inode we wakeup_bdflush instead and never shrink the
> cache in that case. If the inode itself is dirty as well we let the two
> passes fail so the work will be scheduled for keventd. This logic should
> ensure we never shrink the cache without good reason and that we free
> the cache only for the inodes that we actually go ahead and free.
> (Basically only dirty pages set with SetPageDirty aren't trapped by the
> logic before calling the invalidate, like ramfs, but that's expected of
> course; those pages cannot be damaged by the non-destructive invalidate
> anyway.)
> Comments?

I haven't had the chance to give this a test run yet, but it looks very
promising. I have a slight concern about the hold time of the inode_lock
because prune_icache() already generates some amount of contention,
but what you've presented appears to be necessary to prevent lethal
cache bloat, and so that concern is secondary at most. I'll give it a
test run tomorrow if no one else on-site gets to it first, though with
the proviso that I've not been involved in workloads triggering this
specific KVA exhaustion issue, so what testing I can do is limited.


Thanks,
Bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
  2002-05-24  7:51       ` William Lee Irwin III
@ 2002-05-24  8:04       ` Andrew Morton
  2002-05-24 15:20         ` Andrea Arcangeli
  2002-05-24 11:47       ` Ed Tomlinson
  2002-05-30 11:25       ` Denis Lunev
  3 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2002-05-24  8:04 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel, kuznet,
	Andrey Savochkin

Andrea Arcangeli wrote:
> 
> On Mon, May 20, 2002 at 06:30:40AM +0200, Andrea Arcangeli wrote:
> > As the next thing I'll go ahead on the inode/highmem imbalance reported by
> > Alexey over the weekend.  Then the only pending thing before the next -aa is
> 
> Here it is, you should apply it together with vm-35 that you need too
> for the bh/highmem balance (or on top of 2.4.19pre8aa3).

Looks OK to me.  But I wonder if it should be inside some config
option - I don't think machines with saner memory architectures
would want this?

> ...
> +                                        * in practice. Also keep in mind if somebody
> +                                        * keeps overwriting data in a flood we'd
> +                                        * never manage to drop the inode anyways,
> +                                        * and we really shouldn't do that because
> +                                        * it's an heavily used one.
> +                                        */

Can anyone actually write to an inode which is on the unused
list?

> +                                       wakeup_bdflush();
> +                               else if (inode->i_data.nrpages)
> +                                       /*
> +                                        * If we're here it means the only reason
> +                                        * we cannot drop the inode is that its
> +                                        * due its pagecache so go ahead and trim it
> +                                        * hard. If it doesn't go away it means
> +                                        * they're dirty or dirty/pinned pages ala
> +                                        * ramfs.
> +                                        *
> +                                        * invalidate_inode_pages() is a non
> +                                        * blocking operation but we introduce
> +                                        * a dependency order between the
> +                                        * inode_lock and the pagemap_lru_lock,
> +                                        * the inode_lock must always be taken
> +                                        * first from now on.
> +                                        */
> +                                       invalidate_inode_pages(inode);

It seems that a call to try_to_free_buffers() has snuck into
invalidate_inode_pages().  That means that clean ext3 pages
which are on the checkpoint list won't be released.  Could you
please change that to try_to_release_page()?


-

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
  2002-05-24  7:51       ` William Lee Irwin III
  2002-05-24  8:04       ` Andrew Morton
@ 2002-05-24 11:47       ` Ed Tomlinson
  2002-05-30 11:25       ` Denis Lunev
  3 siblings, 0 replies; 29+ messages in thread
From: Ed Tomlinson @ 2002-05-24 11:47 UTC (permalink / raw)
  To: linux-kernel

I wonder if the changes I posted putting slab pages on the lru
would help.  For now the patch is against rmap (pre8-ac5).  I
would be very interested in hearing about results from testing it.
See subject:

Re: [RFC][PATCH] using page aging to shrink caches (pre8-ac5)

Comments?

Ed Tomlinson

Andrea Arcangeli wrote:

> On Mon, May 20, 2002 at 06:30:40AM +0200, Andrea Arcangeli wrote:
>> As the next thing I'll go ahead on the inode/highmem imbalance reported by
>> Alexey over the weekend.  Then the only pending thing before the next -aa is
> 
> Here it is; you should apply it together with vm-35, which you also need
> for the bh/highmem balance (or on top of 2.4.19pre8aa3). I tested it
> lightly on uml and it hasn't broken so far, so be careful because it's not
> very well tested yet. Along the lines of what Alexey suggested originally,
> if the goal isn't reached, in a second pass we shrink the cache too, but
> only if the cache is the only reason for the "pinning" behaviour of the
> inode. If for example there are dirty blocks of metadata or of data
> belonging to the inode we wakeup_bdflush instead and never shrink the
> cache in that case. If the inode itself is dirty as well we let the two
> passes fail so the work will be scheduled for keventd. This logic should
> ensure we never shrink the cache without good reason and that we free
> the cache only for the inodes that we actually go ahead and free.
> (Basically only dirty pages set with SetPageDirty aren't trapped by the
> logic before calling the invalidate, like ramfs, but that's expected of
> course; those pages cannot be damaged by the non-destructive invalidate
> anyway.)
> 
> Comments?
> 
> --- 2.4.19pre8aa4/fs/inode.c.~1~      Fri May 24 03:17:10 2002
> +++ 2.4.19pre8aa4/fs/inode.c  Fri May 24 05:03:54 2002
> @@ -672,35 +672,87 @@ void prune_icache(int goal)
>  {
>  LIST_HEAD(list);
>  struct list_head *entry, *freeable = &list;
> -     int count;
> +     int count, pass;
>  struct inode * inode;
>  
> -     spin_lock(&inode_lock);
> +     count = pass = 0;
>  
> -     count = 0;
> -     entry = inode_unused.prev;
> -     while (entry != &inode_unused)
> -     {
> -             struct list_head *tmp = entry;
> +     spin_lock(&inode_lock);
> +     while (goal && pass++ < 2) {
> +             entry = inode_unused.prev;
> +             while (entry != &inode_unused)
> +             {
> +                     struct list_head *tmp = entry;
>  
> -             entry = entry->prev;
> -             inode = INODE(tmp);
> -             if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
> -                     continue;
> -             if (!CAN_UNUSE(inode))
> -                     continue;
> -             if (atomic_read(&inode->i_count))
> -                     continue;
> -             list_del(tmp);
> -             list_del(&inode->i_hash);
> -             INIT_LIST_HEAD(&inode->i_hash);
> -             list_add(tmp, freeable);
> -             inode->i_state |= I_FREEING;
> -             count++;
> -             if (!--goal)
> -                     break;
> +                     entry = entry->prev;
> +                     inode = INODE(tmp);
> +                     if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
> +                             continue;
> +                     if (atomic_read(&inode->i_count))
> +                             continue;
> +                     if (pass == 2 && !inode->i_state && !CAN_UNUSE(inode)) {
> +                             if (inode_has_buffers(inode))
> +                                     /*
> +                                      * If the inode has dirty buffers
> +                                      * pending, start flushing out bdflush.ndirty
> +                                      * worth of data even if there's no dirty-memory
> +                                      * pressure. Do nothing else in this
> +                                      * case, until all dirty buffers are gone
> +                                      * we can do nothing about the inode other than
> +                                      * to keep flushing dirty stuff. We could also
> +                                      * flush only the dirty buffers in the inode
> +                                      * but there's no API to do it asynchronously
> +                                      * and this simpler approch to deal with the
> +                                      * dirty payload shouldn't make much difference
> +                                      * in practice. Also keep in mind if somebody
> +                                      * keeps overwriting data in a flood we'd
> +                                      * never manage to drop the inode anyways,
> +                                      * and we really shouldn't do that because
> +                                      * it's an heavily used one.
> +                                      */
> +                                     wakeup_bdflush();
> +                             else if (inode->i_data.nrpages)
> +                                     /*
> +                                      * If we're here it means the only reason
> +                                      * we cannot drop the inode is that its
> +                                      * due its pagecache so go ahead and trim it
> +                                      * hard. If it doesn't go away it means
> +                                      * they're dirty or dirty/pinned pages ala
> +                                      * ramfs.
> +                                      *
> +                                      * invalidate_inode_pages() is a non
> +                                      * blocking operation but we introduce
> +                                      * a dependency order between the
> +                                      * inode_lock and the pagemap_lru_lock,
> +                                      * the inode_lock must always be taken
> +                                      * first from now on.
> +                                      */
> +                                     invalidate_inode_pages(inode);
> +                     }
> +                     if (!CAN_UNUSE(inode))
> +                             continue;
> +                     list_del(tmp);
> +                     list_del(&inode->i_hash);
> +                     INIT_LIST_HEAD(&inode->i_hash);
> +                     list_add(tmp, freeable);
> +                     inode->i_state |= I_FREEING;
> +                     count++;
> +                     if (!--goal)
> +                             break;
> +             }
>  }
>  inodes_stat.nr_unused -= count;
> +
> +     /*
> +      * the unused list is hardly an LRU so it makes
> +      * more sense to rotate it so we don't always
> +      * bang on the same inodes in case they're
> +      * unfreeable for whatever reason.
> +      */
> +     if (entry != &inode_unused) {
> +             list_del(&inode_unused);
> +             list_add(&inode_unused, entry);
> +     }
>  spin_unlock(&inode_lock);
>  
>  dispose_list(freeable);
> 
> Andrea


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-24  8:04       ` Andrew Morton
@ 2002-05-24 15:20         ` Andrea Arcangeli
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-24 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Schwidefsky, linux-kernel, Alan Cox, Rik van Riel, kuznet,
	Andrey Savochkin

On Fri, May 24, 2002 at 01:04:35AM -0700, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> > 
> > On Mon, May 20, 2002 at 06:30:40AM +0200, Andrea Arcangeli wrote:
> > > As the next thing I'll go ahead on the inode/highmem imbalance reported
> > > by Alexey over the weekend.  Then the only pending thing before the next -aa is
> > 
> > Here it is; you should apply it together with vm-35, which you need too
> > for the bh/highmem balance (or on top of 2.4.19pre8aa3).
> 
> Looks OK to me.  But I wonder if it should be inside some config
> option - I don't think machines with saner memory architectures
> would want this?

That would assume this hurts lowmem machines, which hopefully isn't the
case. The VM bangs on the icache only when there's some remote sign of
pagecache shortage, and if the whole icache is totally pinned by clean
pagecache while there's some sign of pagecache shortage, it is probably
OK to do an invalidate to release it. We were doing it in 2.2 too, of
course (in a different, simpler manner with truncate_inode_pages,
because in 2.2 there was no problem in being destructive against the
pagecache; the dirty cache was always on the buffercache side in 2.2).

> 
> > ...
> > +                                        * in practice. Also keep in mind if somebody
> > +                                        * keeps overwriting data in a flood we'd
> > +                                        * never manage to drop the inode anyways,
> > +                                        * and we really shouldn't do that because
> > +                                        * it's a heavily used one.
> > +                                        */
> 
> Can anyone actually write to an inode which is on the unused
> list?

No, you can't (if it's in the unused list, not even a single dentry is
keeping it pinned; that means nobody can have it open, and it's not even
cached in unused dentries). But the inode can have async dirty buffers
or dirty pages (if the inode itself is also marked dirty, not only the
pages, unless it's ramfs), so it may be constantly unfreeable if bdflush
cannot keep up with the writer.  It doesn't look like a practical
problem; in such a case the inode is very hot and it isn't worth shrinking it.

> 
> > +                                       wakeup_bdflush();
> > +                               else if (inode->i_data.nrpages)
> > +                                       /*
> > +                                        * If we're here it means the only reason
> > +                                        * we cannot drop the inode is its
> > +                                        * pagecache, so go ahead and trim it
> > +                                        * hard. If it doesn't go away it means
> > +                                        * they're dirty or dirty/pinned pages ala
> > +                                        * ramfs.
> > +                                        *
> > +                                        * invalidate_inode_pages() is a non
> > +                                        * blocking operation but we introduce
> > +                                        * a dependency order between the
> > +                                        * inode_lock and the pagemap_lru_lock,
> > +                                        * the inode_lock must always be taken
> > +                                        * first from now on.
> > +                                        */
> > +                                       invalidate_inode_pages(inode);
> 
> It seems that a call to try_to_free_buffers() has snuck into
> invalidate_inode_pages().  That means that clean ext3 pages
> which are on the checkpoint list won't be released.  Could you
> please change that to try_to_release_page()?

Indeed. thanks!

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
                         ` (2 preceding siblings ...)
  2002-05-24 11:47       ` Ed Tomlinson
@ 2002-05-30 11:25       ` Denis Lunev
  2002-05-30 17:59         ` Andrea Arcangeli
  3 siblings, 1 reply; 29+ messages in thread
From: Denis Lunev @ 2002-05-30 11:25 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 146 bytes --]

Hello!

The patch itself cures my problems, but only after a small fix for an
uninitialized variable that resulted in an OOPS.

Regards,
	Denis V. Lunev


[-- Attachment #2: diff-andrea-inodes2 --]
[-- Type: text, Size: 246 bytes --]

--- linux/fs/inode.c.old	Wed May 29 20:16:17 2002
+++ linux/fs/inode.c	Wed May 29 20:17:08 2002
@@ -669,6 +669,7 @@
 	struct inode * inode;
 
 	count = pass = 0;
+	entry = &inode_unused;
 
 	spin_lock(&inode_lock);
 	while (goal && pass++ < 2) {

[-- Attachment #3: message body text --]
[-- Type: text/plain, Size: 1400 bytes --]



Andrea Arcangeli writes:
 > On Mon, May 20, 2002 at 06:30:40AM +0200, Andrea Arcangeli wrote:
 > > As the next thing I'll go ahead on the inode/highmem imbalance reported
 > > by Alexey over the weekend.  Then the only pending thing before the next -aa is
 > 
 > Here it is; you should apply it together with vm-35, which you need too
 > for the bh/highmem balance (or on top of 2.4.19pre8aa3). I tested it
 > briefly on UML and it hasn't broken so far, so be careful because it's
 > not very well tested yet. Along the lines of what Alexey suggested
 > originally, if the goal isn't reached, in a second pass we shrink the
 > cache too, but only if the cache is the only reason for the "pinning"
 > behaviour of the inode. If for example there are dirty blocks of
 > metadata or of data belonging to the inode, we wakeup_bdflush() instead
 > and never shrink the cache in that case. If the inode itself is dirty
 > as well, we let the two passes fail so the work will be scheduled for
 > keventd. This logic should ensure we never fall into shrinking the
 > cache for no good reason, and that we free the cache only for the
 > inodes that we actually go ahead and free. (Basically only dirty pages
 > set with SetPageDirty, like ramfs's, aren't trapped by the logic
 > before calling the invalidate, but that's expected of course; those
 > pages cannot be damaged by the non-destructive invalidate anyway.)
 > 
 > Comments?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: inode highmem imbalance fix [Re: Bug with shared memory.]
  2002-05-30 11:25       ` Denis Lunev
@ 2002-05-30 17:59         ` Andrea Arcangeli
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-05-30 17:59 UTC (permalink / raw)
  To: Denis Lunev; +Cc: linux-kernel, Andrey Nekrasov, rwhron, Yann Dupont

On Thu, May 30, 2002 at 03:25:04PM +0400, Denis Lunev wrote:
Content-Description: message body text
> Hello!
> 
> The patch itself cures my problems, but only after a small fix for an
> uninitialized variable that resulted in an OOPS.
> 
> Regards,
> 	Denis V. Lunev
> 

Content-Description: diff-andrea-inodes2
> --- linux/fs/inode.c.old	Wed May 29 20:16:17 2002
> +++ linux/fs/inode.c	Wed May 29 20:17:08 2002
> @@ -669,6 +669,7 @@
>  	struct inode * inode;
>  
>  	count = pass = 0;
> +	entry = &inode_unused;
>  
>  	spin_lock(&inode_lock);
>  	while (goal && pass++ < 2) {


Great spotting! This fix is certainly correct; without it the unused
list will be corrupted whenever prune_icache gets goal == 0 as a
parameter. OTOH it fixes only that case (which triggers only when the
number of unused inodes is <= vm_vfs_scan_ratio, not an extremely
common case, so I wonder if that's enough to cure all the oopses I
received today), but it's probably enough; the number of unused inodes
differs from the number of inodes allocated. At first glance I don't
see other issues (my error was to assume goal would always be something
significant). BTW, it's great that for the first showstopper bug in a
long time I got such high-quality feedback within a few hours, thank
you very much! :)

Could you test whether the one-liner below from Denis (I attached it
below too, without quotes) fixes all your problems, with 2.4.19pre9aa1
or with the single inode highmem imbalance fix? Thanks,

--- linux/fs/inode.c.old	Wed May 29 20:16:17 2002
+++ linux/fs/inode.c	Wed May 29 20:17:08 2002
@@ -669,6 +669,7 @@
 	struct inode * inode;
 
 	count = pass = 0;
+	entry = &inode_unused;
 
 	spin_lock(&inode_lock);
 	while (goal && pass++ < 2) {


Also, it seems the O(1) scheduler is doing well so far. In the next -aa
I will also include the patch from Mike Kravetz that I finished
auditing; it's really straightforward. It serializes the execution of
the reader of the pipe with the writer of the pipe if the writer
exceeds the length of the pipe buffer. That will bring pipe bandwidth
back near the pre-O(1) levels, while the tasks still run in parallel on
two CPUs if no blocking from the writer is necessary; I think it's the
best heuristic. Adding the sync behaviour with the reader as well seems
inferior: I can imagine a writer running full time on one CPU and only
sometimes posting a few bytes to the pipe, while the reader always
blocks. This way the reader will keep running on its own CPU, and it
won't interfere with the "cpu intensive" writer.

Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2002-05-30 18:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-14 15:13 Bug with shared memory Martin Schwidefsky
2002-05-14 19:33 ` Andrew Morton
2002-05-15 22:42   ` Mike Kravetz
2002-05-15 23:07     ` Andrew Morton
2002-05-17 17:53     ` Bill Davidsen
2002-05-17 20:07       ` Mike Kravetz
2002-05-17 20:29         ` Anton Blanchard
2002-05-20  4:30   ` Andrea Arcangeli
2002-05-20  5:21     ` Andrew Morton
2002-05-20 11:34       ` Andrey Savochkin
2002-05-20 14:15       ` Andrea Arcangeli
2002-05-20 19:24         ` Rik van Riel
2002-05-20 23:46           ` Andrea Arcangeli
2002-05-21  0:14             ` Martin J. Bligh
2002-05-21  1:40               ` Andrea Arcangeli
2002-05-20 16:22       ` Martin J. Bligh
2002-05-20 19:38         ` Rik van Riel
2002-05-20 20:06           ` William Lee Irwin III
2002-05-20 16:13     ` Martin J. Bligh
2002-05-20 16:37       ` Andrea Arcangeli
2002-05-20 17:23         ` Martin J. Bligh
2002-05-20 17:32           ` William Lee Irwin III
2002-05-24  7:33     ` inode highmem imbalance fix [Re: Bug with shared memory.] Andrea Arcangeli
2002-05-24  7:51       ` William Lee Irwin III
2002-05-24  8:04       ` Andrew Morton
2002-05-24 15:20         ` Andrea Arcangeli
2002-05-24 11:47       ` Ed Tomlinson
2002-05-30 11:25       ` Denis Lunev
2002-05-30 17:59         ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).