* [patch] truncate_inode_pages
@ 2001-06-09 15:51 Andrew Morton
2001-06-09 17:40 ` Alexander Viro
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-06-09 15:51 UTC (permalink / raw)
To: lkml
The ftruncate() in this program:
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int clean = 64 * 1024 * 1024;
int dirty = 64 * 1024 * 1024;
main()
{
int fd = open("foo", O_RDWR|O_TRUNC|O_CREAT, 0666);
void *mem;
ftruncate(fd, clean + dirty);
mem = mmap(0, clean + dirty, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
memset(mem, 0, clean + dirty);
msync(mem, clean, MS_SYNC);
ftruncate(fd, clean);
}
takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
truncate_inode_pages(). The machine is locked up for the duration.
The patch reduces this to 20 milliseconds via an O(clean + dirty)
algorithm.
--- linux-2.4.5/mm/filemap.c Mon May 28 13:31:49 2001
+++ linux-akpm/mm/filemap.c Sun Jun 10 01:27:09 2001
@@ -273,15 +273,24 @@
{
unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+ int complete;
-repeat:
spin_lock(&pagecache_lock);
- if (truncate_list_pages(&mapping->clean_pages, start, &partial))
- goto repeat;
- if (truncate_list_pages(&mapping->dirty_pages, start, &partial))
- goto repeat;
- if (truncate_list_pages(&mapping->locked_pages, start, &partial))
- goto repeat;
+ do {
+ complete = 1;
+ while (truncate_list_pages(&mapping->clean_pages, start, &partial)) {
+ spin_lock(&pagecache_lock);
+ complete = 0;
+ }
+ while (truncate_list_pages(&mapping->dirty_pages, start, &partial)) {
+ spin_lock(&pagecache_lock);
+ complete = 0;
+ }
+ while (truncate_list_pages(&mapping->locked_pages, start, &partial)) {
+ spin_lock(&pagecache_lock);
+ complete = 0;
+ }
+ } while (!complete);
spin_unlock(&pagecache_lock);
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-09 15:51 [patch] truncate_inode_pages Andrew Morton
@ 2001-06-09 17:40 ` Alexander Viro
2001-06-09 22:53 ` Daniel Phillips
0 siblings, 1 reply; 10+ messages in thread
From: Alexander Viro @ 2001-06-09 17:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml
> takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
> truncate_inode_pages(). The machine is locked up for the duration.
> The patch reduces this to 20 milliseconds via an O(clean + dirty)
> algorithm.
Unfortunately, it's _not_ O(clean + dirty).
> + while (truncate_list_pages(&mapping->clean_pages, start, &partial)) {
> + spin_lock(&pagecache_lock);
> + complete = 0;
> + }
Cool. Now think what happens if pages with large indices are in the
very end of list. Half of them. You skip clean/2 pages on each of
clean/2 passes. Hardly a linear behaviour - all you need is a different
program to trigger it.
Now, having a separate pass that would reorder the pages on list,
moving the to-kill ones in the beginning might help.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-09 17:40 ` Alexander Viro
@ 2001-06-09 22:53 ` Daniel Phillips
2001-06-10 1:31 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2001-06-09 22:53 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton; +Cc: lkml
On Saturday 09 June 2001 19:40, Alexander Viro wrote:
> > takes 45 seconds CPU time due to the O(clean * dirty) algorithm in
> > truncate_inode_pages(). The machine is locked up for the duration.
> > The patch reduces this to 20 milliseconds via an O(clean + dirty)
> > algorithm.
>
> Unfortunately, it's _not_ O(clean + dirty).
>
> > + while (truncate_list_pages(&mapping->clean_pages, start,
> > &partial)) { + spin_lock(&pagecache_lock);
> > + complete = 0;
> > + }
>
> Cool. Now think what happens if pages with large indices are in the
> very end of list. Half of them. You skip clean/2 pages on each of
> clean/2 passes. Hardly a linear behaviour - all you need is a
> different program to trigger it.
>
> Now, having a separate pass that would reorder the pages on list,
> moving the to-kill ones in the beginning might help.
This is easy, just set the list head to the page about to be truncated.
--
Daniel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-09 22:53 ` Daniel Phillips
@ 2001-06-10 1:31 ` Andrew Morton
2001-06-10 16:40 ` Daniel Phillips
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-06-10 1:31 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Alexander Viro, lkml
Daniel Phillips wrote:
>
> This is easy, just set the list head to the page about to be truncated.
Works for me.
--- linux-2.4.5/mm/filemap.c Mon May 28 13:31:49 2001
+++ linux-akpm/mm/filemap.c Sun Jun 10 11:29:19 2001
@@ -235,12 +235,13 @@
/* Is one of the pages to truncate? */
if ((offset >= start) || (*partial && (offset + 1) == start)) {
+ list_del(head);
+ list_add_tail(head, curr);
if (TryLockPage(page)) {
page_cache_get(page);
spin_unlock(&pagecache_lock);
wait_on_page(page);
- page_cache_release(page);
- return 1;
+ goto out_restart;
}
page_cache_get(page);
spin_unlock(&pagecache_lock);
@@ -252,11 +253,14 @@
truncate_complete_page(page);
UnlockPage(page);
- page_cache_release(page);
- return 1;
+ goto out_restart;
}
}
return 0;
+out_restart:
+ page_cache_release(page);
+ spin_lock(&pagecache_lock);
+ return 1;
}
@@ -273,15 +277,18 @@
{
unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+ int complete;
-repeat:
spin_lock(&pagecache_lock);
- if (truncate_list_pages(&mapping->clean_pages, start, &partial))
- goto repeat;
- if (truncate_list_pages(&mapping->dirty_pages, start, &partial))
- goto repeat;
- if (truncate_list_pages(&mapping->locked_pages, start, &partial))
- goto repeat;
+ do {
+ complete = 1;
+ while (truncate_list_pages(&mapping->clean_pages, start, &partial))
+ complete = 0;
+ while (truncate_list_pages(&mapping->dirty_pages, start, &partial))
+ complete = 0;
+ while (truncate_list_pages(&mapping->locked_pages, start, &partial))
+ complete = 0;
+ } while (!complete);
spin_unlock(&pagecache_lock);
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-10 1:31 ` Andrew Morton
@ 2001-06-10 16:40 ` Daniel Phillips
2001-06-11 12:45 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2001-06-10 16:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexander Viro, lkml
On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> Daniel Phillips wrote:
> > This is easy, just set the list head to the page about to be truncated.
>
> Works for me.
It looks good, but it's black magic - it could use a comment along the lines
of:
/*
* Ensure at least one pass through all three lists with lock held
*/
There is some fuzz on the original code that could use a scrubbing. "start"
should not be the 1st full page needing truncating, it should be the first
page needing truncating at all, full or partial.
The truncation condition becomes:
- if ((offset >= start) || (*partial && (offset + 1) == start)) {
+ if (offset >= start) {
[...]
- if (*partial && (offset + 1) == start)
+ if (*partial && offset == start)
Then the conditions are so easy to test it's not worth passing the &partial,
just pass lstart... and... oh bother, it's not broke, why fix it ;-)
--
Daniel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-10 16:40 ` Daniel Phillips
@ 2001-06-11 12:45 ` Andrew Morton
2001-06-11 13:13 ` Daniel Phillips
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2001-06-11 12:45 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Alexander Viro, lkml
Daniel Phillips wrote:
>
> On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > > This is easy, just set the list head to the page about to be truncated.
> >
> > Works for me.
>
> It looks good, but it's black magic
No, it's wrong. I'm getting BUG()s in clear_inode():
485 if (inode->i_data.nrpages)
486 BUG();
(gdb) p inode->i_data
$5 = {clean_pages = {next = 0xd7423ae4, prev = 0xc1628280}, dirty_pages = {next = 0xd7423aec, prev = 0xd7423aec},
locked_pages = {next = 0xd7423af4, prev = 0xd7423af4}, nrpages = 0x1, a_ops = 0xc02e4940, host = 0xd7423a40,
i_mmap = 0x0, i_mmap_shared = 0x0, i_shared_lock = {lock = 0x1}, gfp_mask = 0x5}
(gdb) p *(struct page *)0xd7423ae4
$6 = {list = {next = 0xd7423ae4, prev = 0xc1628280}, mapping = 0xd7423aec, index = 0xd7423aec,
next_hash = 0xd7423af4, count = {counter = 0xd7423af4}, flags = 0x1, lru = {next = 0xc02e4940,
prev = 0xd7423a40}, age = 0x0, wait = {lock = {lock = 0x0}, task_list = {next = 0x1, prev = 0x5}},
pprev_hash = 0x0, buffers = 0x0, virtual = 0x0, zone = 0x0}
(gdb) p &inode->i_data
The lists are mangled.
I think what's happening is:
curr = curr->next;
offset = page->index;
/* Is one of the pages to truncate? */
if ((offset >= start) || (*partial && (offset + 1) == start)) {
list_del(head);
list_add_tail(head, curr);
`curr' can point to `head' when we get inside the `if'.
So we do, effectively,
list_del(head);
list_add_tail(head, head);
Which turns mapping->clean_pages into an empty list,
and the target page.list thinks it's on a list, but
really isn't. A subsequent list_del() on the page
screws things up somehow.
Ho-hum. I'll have another go tomorrow. Probably a
simple `if (curr != head)' will cure it.
-
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-11 12:45 ` Andrew Morton
@ 2001-06-11 13:13 ` Daniel Phillips
2001-06-11 13:28 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2001-06-11 13:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alexander Viro, lkml
On Monday 11 June 2001 14:45, Andrew Morton wrote:
> Daniel Phillips wrote:
> > On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> > > Daniel Phillips wrote:
> > > > This is easy, just set the list head to the page about to be
> > > > truncated.
> > >
> > > Works for me.
> >
> > It looks good, but it's black magic
>
> No, it's wrong. I'm getting BUG()s in clear_inode():
> [...]
> The lists are mangled.
curr is being advanced in the wrong place.
/me makes note to self: never resist the temptation to clean things up the
rest of the way
I'll actually apply the patch and try it this time ;-)
--
Daniel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
2001-06-11 13:13 ` Daniel Phillips
@ 2001-06-11 13:28 ` Andrew Morton
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2001-06-11 13:28 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Alexander Viro, lkml
Daniel Phillips wrote:
>
> On Monday 11 June 2001 14:45, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > > On Sunday 10 June 2001 03:31, Andrew Morton wrote:
> > > > Daniel Phillips wrote:
> > > > > This is easy, just set the list head to the page about to be
> > > > > truncated.
> > > >
> > > > Works for me.
> > >
> > > It looks good, but it's black magic
> >
> > No, it's wrong. I'm getting BUG()s in clear_inode():
> > [...]
> > The lists are mangled.
>
> curr is being advanced in the wrong place.
Yes.
> /me makes note to self: never resist the temptation to clean things up the
> rest of the way
>
> I'll actually apply the patch and try it this time ;-)
The bug is surprisingly hard to trigger.
Take three:
--- linux-2.4.5/mm/filemap.c Mon May 28 13:31:49 2001
+++ linux-akpm/mm/filemap.c Mon Jun 11 23:31:08 2001
@@ -230,17 +230,17 @@
unsigned long offset;
page = list_entry(curr, struct page, list);
- curr = curr->next;
offset = page->index;
/* Is one of the pages to truncate? */
if ((offset >= start) || (*partial && (offset + 1) == start)) {
+ list_del(head);
+ list_add(head, curr);
if (TryLockPage(page)) {
page_cache_get(page);
spin_unlock(&pagecache_lock);
wait_on_page(page);
- page_cache_release(page);
- return 1;
+ goto out_restart;
}
page_cache_get(page);
spin_unlock(&pagecache_lock);
@@ -252,11 +252,15 @@
truncate_complete_page(page);
UnlockPage(page);
- page_cache_release(page);
- return 1;
+ goto out_restart;
}
+ curr = curr->next;
}
return 0;
+out_restart:
+ page_cache_release(page);
+ spin_lock(&pagecache_lock);
+ return 1;
}
@@ -273,15 +277,19 @@
{
unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+ int complete;
-repeat:
spin_lock(&pagecache_lock);
- if (truncate_list_pages(&mapping->clean_pages, start, &partial))
- goto repeat;
- if (truncate_list_pages(&mapping->dirty_pages, start, &partial))
- goto repeat;
- if (truncate_list_pages(&mapping->locked_pages, start, &partial))
- goto repeat;
+ do {
+ complete = 1;
+ while (truncate_list_pages(&mapping->clean_pages, start, &partial))
+ complete = 0;
+ while (truncate_list_pages(&mapping->dirty_pages, start, &partial))
+ complete = 0;
+ while (truncate_list_pages(&mapping->locked_pages, start, &partial))
+ complete = 0;
+ } while (!complete);
+ /* Traversed all three lists without dropping the lock */
spin_unlock(&pagecache_lock);
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
[not found] ` <01061214324600.00879@starship>
@ 2001-06-12 16:32 ` Dieter Nützel
0 siblings, 0 replies; 10+ messages in thread
From: Dieter Nützel @ 2001-06-12 16:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Kernel List
Am Dienstag, 12. Juni 2001 14:32 schrieb Daniel Phillips:
> On Tuesday 12 June 2001 02:00, you wrote:
> > Now with -ac13 and the third try.
> > Is it final?
>
> There have been no further problems reported, but it's Andrew Morton's
> patch, his decision.
>
> --
> Daniel
Hello Andrew,
have you forwarded it to Alan for apply?
Thanks,
Dieter
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] truncate_inode_pages
@ 2001-06-11 2:22 Dieter Nützel
[not found] ` <200106112252.AAA19615@mail.bonn-fries.net>
0 siblings, 1 reply; 10+ messages in thread
From: Dieter Nützel @ 2001-06-11 2:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Kernel List
> Daniel Phillips wrote:
> >
> > This is easy, just set the list head to the page about to be truncated.
>
> Works for me.
>
> --- linux-2.4.5/mm/filemap.c Mon May 28 13:31:49 2001
> +++ linux-akpm/mm/filemap.c Sun Jun 10 11:29:19 2001
> @@ -235,12 +235,13 @@
[snip]
Works for me 12 times faster on my Athlon 550 and ReiserFS.
System:
Athlon 550 (old one, 0.25 µm)
MSI MS-6167 Rev 1.0B (Irongate C4)
320 MB SDRAM PC100-2-2-2
AHA-2940UW
IBM DDYS-T18350N 18 GB (UW/U160)
Kernel 2.4.5-ac12
Glibc-2.2 (SuSE 7.1)
SunWave1>cat /proc/version
Linux version 2.4.5-ac12 (root@SunWave1) (gcc version 2.95.2 19991024
(release)) #1 Sat Jun 9 17:41:07 CEST 2001
SunWave1>time ./ftruncate
0.430u 54.790s 1:00.09 91.8% 0+0k 0+0io 32887pf+0w
With the mm/filemap.c fix:
SunWave1>cat /proc/version
Linux version 2.4.5-ac12 (root@SunWave1) (gcc version 2.95.2 19991024
(release)) #1 Sun Jun 10 22:49:07 CEST 2001
SunWave1>time ./ftruncate
0.220u 1.670s 0:05.13 36.8% 0+0k 0+0io 32852pf+0w
Thanks,
Dieter
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2001-06-12 16:23 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-09 15:51 [patch] truncate_inode_pages Andrew Morton
2001-06-09 17:40 ` Alexander Viro
2001-06-09 22:53 ` Daniel Phillips
2001-06-10 1:31 ` Andrew Morton
2001-06-10 16:40 ` Daniel Phillips
2001-06-11 12:45 ` Andrew Morton
2001-06-11 13:13 ` Daniel Phillips
2001-06-11 13:28 ` Andrew Morton
2001-06-11 2:22 Dieter Nützel
[not found] ` <200106112252.AAA19615@mail.bonn-fries.net>
[not found] ` <01061214324600.00879@starship>
2001-06-12 16:32 ` Dieter Nützel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).