linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] truncate_inode_pages
@ 2001-06-09 15:51 Andrew Morton
  2001-06-09 17:40 ` Alexander Viro
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-06-09 15:51 UTC (permalink / raw)
  To: lkml

The ftruncate() in this program:

#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int clean = 64 * 1024 * 1024;
int dirty = 64 * 1024 * 1024;

main()
{
	int fd = open("foo", O_RDWR|O_TRUNC|O_CREAT, 0666);
	void *mem;

	ftruncate(fd, clean + dirty);
	mem = mmap(0, clean + dirty, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	memset(mem, 0, clean + dirty);
	msync(mem, clean, MS_SYNC);
	ftruncate(fd, clean);
}


takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
truncate_inode_pages().  The machine is locked up for the duration.
The patch reduces this to 20 milliseconds via an O(clean + dirty)
algorithm.


--- linux-2.4.5/mm/filemap.c	Mon May 28 13:31:49 2001
+++ linux-akpm/mm/filemap.c	Sun Jun 10 01:27:09 2001
@@ -273,15 +273,24 @@
 {
 	unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+	int complete;
 
-repeat:
 	spin_lock(&pagecache_lock);
-	if (truncate_list_pages(&mapping->clean_pages, start, &partial))
-		goto repeat;
-	if (truncate_list_pages(&mapping->dirty_pages, start, &partial))
-		goto repeat;
-	if (truncate_list_pages(&mapping->locked_pages, start, &partial))
-		goto repeat;
+	do {
+		complete = 1;
+		while (truncate_list_pages(&mapping->clean_pages, start, &partial)) {
+			spin_lock(&pagecache_lock);
+			complete = 0;
+		}
+		while (truncate_list_pages(&mapping->dirty_pages, start, &partial)) {
+			spin_lock(&pagecache_lock);
+			complete = 0;
+		}
+		while (truncate_list_pages(&mapping->locked_pages, start, &partial)) {
+			spin_lock(&pagecache_lock);
+			complete = 0;
+		}
+	} while (!complete);
 	spin_unlock(&pagecache_lock);
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-09 15:51 [patch] truncate_inode_pages Andrew Morton
@ 2001-06-09 17:40 ` Alexander Viro
  2001-06-09 22:53   ` Daniel Phillips
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Viro @ 2001-06-09 17:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml



> takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
> truncate_inode_pages().  The machine is locked up for the duration.
> The patch reduces this to 20 milliseconds via an O(clean + dirty)
> algorithm.

Unfortunately, it's _not_ O(clean + dirty).

> +		while (truncate_list_pages(&mapping->clean_pages, start, &partial)) {
> +			spin_lock(&pagecache_lock);
> +			complete = 0;
> +		}

Cool. Now think what happens if pages with large indices are in the
very end of list. Half of them. You skip clean/2 pages on each of
clean/2 passes. Hardly a linear behaviour - all you need is a different
program to trigger it.

Now, having a separate pass that would reorder the pages on list,
moving the to-kill ones in the beginning might help.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-09 17:40 ` Alexander Viro
@ 2001-06-09 22:53   ` Daniel Phillips
  2001-06-10  1:31     ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2001-06-09 22:53 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton; +Cc: lkml

On Saturday 09 June 2001 19:40, Alexander Viro wrote:
> > takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
> > truncate_inode_pages().  The machine is locked up for the duration.
> > The patch reduces this to 20 milliseconds via an O(clean + dirty)
> > algorithm.
>
> Unfortunately, it's _not_ O(clean + dirty).
>
> > +		while (truncate_list_pages(&mapping->clean_pages, start,
> > &partial)) { +			spin_lock(&pagecache_lock);
> > +			complete = 0;
> > +		}
>
> Cool. Now think what happens if pages with large indices are in the
> very end of list. Half of them. You skip clean/2 pages on each of
> clean/2 passes. Hardly a linear behaviour - all you need is a
> different program to trigger it.
>
> Now, having a separate pass that would reorder the pages on list,
> moving the to-kill ones in the beginning might help.

This is easy, just set the list head to the page about to be truncated.

--
Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-09 22:53   ` Daniel Phillips
@ 2001-06-10  1:31     ` Andrew Morton
  2001-06-10 16:40       ` Daniel Phillips
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-06-10  1:31 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Alexander Viro, lkml

Daniel Phillips wrote:
> 
> This is easy, just set the list head to the page about to be truncated.

Works for me.

--- linux-2.4.5/mm/filemap.c	Mon May 28 13:31:49 2001
+++ linux-akpm/mm/filemap.c	Sun Jun 10 11:29:19 2001
@@ -235,12 +235,13 @@
 
 		/* Is one of the pages to truncate? */
 		if ((offset >= start) || (*partial && (offset + 1) == start)) {
+			list_del(head);
+			list_add_tail(head, curr);
 			if (TryLockPage(page)) {
 				page_cache_get(page);
 				spin_unlock(&pagecache_lock);
 				wait_on_page(page);
-				page_cache_release(page);
-				return 1;
+				goto out_restart;
 			}
 			page_cache_get(page);
 			spin_unlock(&pagecache_lock);
@@ -252,11 +253,14 @@
 				truncate_complete_page(page);
 
 			UnlockPage(page);
-			page_cache_release(page);
-			return 1;
+			goto out_restart;
 		}
 	}
 	return 0;
+out_restart:
+	page_cache_release(page);
+	spin_lock(&pagecache_lock);
+	return 1;
 }
 
 
@@ -273,15 +277,18 @@
 {
 	unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+	int complete;
 
-repeat:
 	spin_lock(&pagecache_lock);
-	if (truncate_list_pages(&mapping->clean_pages, start, &partial))
-		goto repeat;
-	if (truncate_list_pages(&mapping->dirty_pages, start, &partial))
-		goto repeat;
-	if (truncate_list_pages(&mapping->locked_pages, start, &partial))
-		goto repeat;
+	do {
+		complete = 1;
+		while (truncate_list_pages(&mapping->clean_pages, start, &partial))
+			complete = 0;
+		while (truncate_list_pages(&mapping->dirty_pages, start, &partial))
+			complete = 0;
+		while (truncate_list_pages(&mapping->locked_pages, start, &partial))
+			complete = 0;
+	} while (!complete);
 	spin_unlock(&pagecache_lock);
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-10  1:31     ` Andrew Morton
@ 2001-06-10 16:40       ` Daniel Phillips
  2001-06-11 12:45         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2001-06-10 16:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexander Viro, lkml

On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> Daniel Phillips wrote:
> > This is easy, just set the list head to the page about to be truncated.
>
> Works for me.

It looks good, but it's black magic - it could use a comment along the lines 
of:

/*
 * Ensure at least one pass through all three lists with lock held
 */

There is some fuzz on the original code that could use a scrubbing.  "start" 
should not be the 1st full page needing truncating, it should be the first 
page needing truncating at all, full or partial.

The truncation condition becomes:

-		if ((offset >= start) || (*partial && (offset + 1) == start)) {
+		if (offset >= start) {
[...]
-			if (*partial && (offset + 1) == start)
+			if (*partial && offset == start)

Then the conditions are so easy to test it's not worth passing the &partial, 
just pass lstart... and... oh bother, it's not broke, why fix it ;-)

--
Daniel


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-10 16:40       ` Daniel Phillips
@ 2001-06-11 12:45         ` Andrew Morton
  2001-06-11 13:13           ` Daniel Phillips
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-06-11 12:45 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Alexander Viro, lkml

Daniel Phillips wrote:
> 
> On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > > This is easy, just set the list head to the page about to be truncated.
> >
> > Works for me.
> 
> It looks good, but it's black magic

No, it's wrong.  I'm getting BUG()s in clear_inode():

485             if (inode->i_data.nrpages)
486                     BUG();

(gdb) p inode->i_data
$5 = {clean_pages = {next = 0xd7423ae4, prev = 0xc1628280}, dirty_pages = {next = 0xd7423aec, prev = 0xd7423aec}, 
  locked_pages = {next = 0xd7423af4, prev = 0xd7423af4}, nrpages = 0x1, a_ops = 0xc02e4940, host = 0xd7423a40, 
  i_mmap = 0x0, i_mmap_shared = 0x0, i_shared_lock = {lock = 0x1}, gfp_mask = 0x5}

(gdb) p *(struct page *)0xd7423ae4
$6 = {list = {next = 0xd7423ae4, prev = 0xc1628280}, mapping = 0xd7423aec, index = 0xd7423aec, 
  next_hash = 0xd7423af4, count = {counter = 0xd7423af4}, flags = 0x1, lru = {next = 0xc02e4940, 
    prev = 0xd7423a40}, age = 0x0, wait = {lock = {lock = 0x0}, task_list = {next = 0x1, prev = 0x5}}, 
  pprev_hash = 0x0, buffers = 0x0, virtual = 0x0, zone = 0x0}
(gdb) p &inode->i_data

The lists are mangled.

I think what's happening is:

                curr = curr->next;
                offset = page->index;

                /* Is one of the pages to truncate? */
                if ((offset >= start) || (*partial && (offset + 1) == start)) {
                        list_del(head);
                        list_add_tail(head, curr);

`curr' can point to `head' when we get inside the `if'.
So we do, effectively,

	list_del(head);
	list_add_tail(head, head);

Which turns mapping->clean_pages into an empty list,
and the target page.list thinks it's on a list, but
really isn't.  A subsequent list_del() on the page
screws things up somehow.

Ho-hum.  I'll have another go tomorrow.  Probably a
simple `if (curr != head)' will cure it.

-

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-11 12:45         ` Andrew Morton
@ 2001-06-11 13:13           ` Daniel Phillips
  2001-06-11 13:28             ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2001-06-11 13:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexander Viro, lkml

On Monday 11 June 2001 14:45, Andrew Morton wrote:
> Daniel Phillips wrote:
> > On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> > > Daniel Phillips wrote:
> > > > This is easy, just set the list head to the page about to be
> > > > truncated.
> > >
> > > Works for me.
> >
> > It looks good, but it's black magic
>
> No, it's wrong.  I'm getting BUG()s in clear_inode():
> [...]
> The lists are mangled.

curr is being advanced in the wrong place.

/me makes note to self: never resist the temptation to clean things up the 
rest of the way

I'll actually apply the patch and try it this time ;-)

--
Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
  2001-06-11 13:13           ` Daniel Phillips
@ 2001-06-11 13:28             ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2001-06-11 13:28 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Alexander Viro, lkml

Daniel Phillips wrote:
> 
> On Monday 11 June 2001 14:45, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > > On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> > > > Daniel Phillips wrote:
> > > > > This is easy, just set the list head to the page about to be
> > > > > truncated.
> > > >
> > > > Works for me.
> > >
> > > It looks good, but it's black magic
> >
> > No, it's wrong.  I'm getting BUG()s in clear_inode():
> > [...]
> > The lists are mangled.
> 
> curr is being advanced in the wrong place.

Yes.

> /me makes note to self: never resist the temptation to clean things up the
> rest of the way
> 
> I'll actually apply the patch and try it this time ;-)

The bug is surprisingly hard to trigger.


Take three:


--- linux-2.4.5/mm/filemap.c	Mon May 28 13:31:49 2001
+++ linux-akpm/mm/filemap.c	Mon Jun 11 23:31:08 2001
@@ -230,17 +230,17 @@
 		unsigned long offset;
 
 		page = list_entry(curr, struct page, list);
-		curr = curr->next;
 		offset = page->index;
 
 		/* Is one of the pages to truncate? */
 		if ((offset >= start) || (*partial && (offset + 1) == start)) {
+			list_del(head);
+			list_add(head, curr);
 			if (TryLockPage(page)) {
 				page_cache_get(page);
 				spin_unlock(&pagecache_lock);
 				wait_on_page(page);
-				page_cache_release(page);
-				return 1;
+				goto out_restart;
 			}
 			page_cache_get(page);
 			spin_unlock(&pagecache_lock);
@@ -252,11 +252,15 @@
 				truncate_complete_page(page);
 
 			UnlockPage(page);
-			page_cache_release(page);
-			return 1;
+			goto out_restart;
 		}
+		curr = curr->next;
 	}
 	return 0;
+out_restart:
+	page_cache_release(page);
+	spin_lock(&pagecache_lock);
+	return 1;
 }
 
 
@@ -273,15 +277,19 @@
 {
 	unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+	int complete;
 
-repeat:
 	spin_lock(&pagecache_lock);
-	if (truncate_list_pages(&mapping->clean_pages, start, &partial))
-		goto repeat;
-	if (truncate_list_pages(&mapping->dirty_pages, start, &partial))
-		goto repeat;
-	if (truncate_list_pages(&mapping->locked_pages, start, &partial))
-		goto repeat;
+	do {
+		complete = 1;
+		while (truncate_list_pages(&mapping->clean_pages, start, &partial))
+			complete = 0;
+		while (truncate_list_pages(&mapping->dirty_pages, start, &partial))
+			complete = 0;
+		while (truncate_list_pages(&mapping->locked_pages, start, &partial))
+			complete = 0;
+	} while (!complete);
+	/* Traversed all three lists without dropping the lock */
 	spin_unlock(&pagecache_lock);
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
       [not found]   ` <01061214324600.00879@starship>
@ 2001-06-12 16:32     ` Dieter Nützel
  0 siblings, 0 replies; 10+ messages in thread
From: Dieter Nützel @ 2001-06-12 16:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel List

Am Dienstag, 12. Juni 2001 14:32 schrieb Daniel Phillips:
> On Tuesday 12 June 2001 02:00, you wrote:
> > Now with -ac13 and the third try.
> > Is it final?
>
> There have been no further problems reported, but it's Andrew Morton's
> patch, his decision.
>
> --
> Daniel

Hello Andrew,

have you forwarded it to Alan for apply?

Thanks,
	Dieter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] truncate_inode_pages
@ 2001-06-11  2:22 Dieter Nützel
       [not found] ` <200106112252.AAA19615@mail.bonn-fries.net>
  0 siblings, 1 reply; 10+ messages in thread
From: Dieter Nützel @ 2001-06-11  2:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel List

> Daniel Phillips wrote:
> > 
> > This is easy, just set the list head to the page about to be truncated.
>
> Works for me.
>
> --- linux-2.4.5/mm/filemap.c    Mon May 28 13:31:49 2001
> +++ linux-akpm/mm/filemap.c     Sun Jun 10 11:29:19 2001
> @@ -235,12 +235,13 @@

[snip]

Works for me 12 times faster on my Athlon 550 and ReiserFS.

System:

Athlon 550 (old one, 0.25 µm)
MSI MS-6167 Rev 1.0B (Irongate C4)
320 MB SDRAM PC100-2-2-2
AHA-2940UW
IBM DDYS-T18350N 18 GB (UW/U160)
Kernel 2.4.5-ac12
Glibc-2.2 (SuSE 7.1)

SunWave1>cat /proc/version
Linux version 2.4.5-ac12 (root@SunWave1) (gcc version 2.95.2 19991024 
(release)) #1 Sat Jun 9 17:41:07 CEST 2001
 
SunWave1>time ./ftruncate
0.430u 54.790s 1:00.09 91.8%    0+0k 0+0io 32887pf+0w
 
With the mm/filemap.c fix:
 
SunWave1>cat /proc/version
Linux version 2.4.5-ac12 (root@SunWave1) (gcc version 2.95.2 19991024 
(release)) #1 Sun Jun 10 22:49:07 CEST 2001
 
SunWave1>time ./ftruncate
0.220u 1.670s 0:05.13 36.8%     0+0k 0+0io 32852pf+0w

Thanks,
	Dieter

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2001-06-12 16:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-09 15:51 [patch] truncate_inode_pages Andrew Morton
2001-06-09 17:40 ` Alexander Viro
2001-06-09 22:53   ` Daniel Phillips
2001-06-10  1:31     ` Andrew Morton
2001-06-10 16:40       ` Daniel Phillips
2001-06-11 12:45         ` Andrew Morton
2001-06-11 13:13           ` Daniel Phillips
2001-06-11 13:28             ` Andrew Morton
2001-06-11  2:22 Dieter Nützel
     [not found] ` <200106112252.AAA19615@mail.bonn-fries.net>
     [not found]   ` <01061214324600.00879@starship>
2001-06-12 16:32     ` Dieter Nützel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).