* [RFC][PATCH] mm: write_cache_pages be more sequential
@ 2009-09-07 14:07 Adrian Hunter
  2009-09-07 14:15 ` Nick Piggin
  2009-09-14 15:44 ` Artem Bityutskiy
  0 siblings, 2 replies; 7+ messages in thread
From: Adrian Hunter @ 2009-09-07 14:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-M/Helsinki),
	LKML

From 6f3bb7c26936c45d810048f59c369e8d5a5623fc Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter@nokia.com>
Date: Mon, 7 Sep 2009 10:49:11 +0300
Subject: [PATCH] mm: write_cache_pages be more sequential

If a file is written to sequentially, then writeback
should write the pages sequentially also.  However,
that does not always happen.  For example:

1) user writes pages 0, 1 and 2 but 2 is incomplete
2) write_cache_pages writes pages 0, 1 and 2 and sets
writeback_index to 3
3) user finishes writing page 2 and writes pages 3 and 4
4) write_cache_pages writes pages 3 and 4, and then cycles
back and writes page 2 again.

So the pages are written out in the order 0, 1, 2, 3, 4, 2
instead of 0, 1, 2, 2, 3, 4.

This situation was noticed on UBIFS because it writes
directly from writepage.  Hence if there is an unexpected
power-loss, a file will end up with a hole even though
the file was written sequentially by the user.

Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
---
 mm/page-writeback.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..7410b7a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -960,6 +960,8 @@ int write_cache_pages(struct address_space *mapping,
 	pagevec_init(&pvec, 0);
 	if (wbc->range_cyclic) {
 		writeback_index = mapping->writeback_index; /* prev offset */
+		if (writeback_index)
+			writeback_index -= 1;
 		index = writeback_index;
 		if (index == 0)
 			cycled = 1;
-- 
1.5.6.3
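
The ordering described in the commit message can be reproduced with a
small user-space model. This is only a sketch under stated assumptions:
struct mapping_model, writeback_pass() and run_scenario() are
hypothetical names, the "file" is a fixed array of 8 pages, and
nr_to_write limits, locking and pagevec batching are ignored. The
step_back flag stands in for the two-line change in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES  8   /* size of the modelled file, in pages (assumption) */
#define LOG_MAX 16  /* capacity of the caller's write log (assumption) */

/* Hypothetical stand-in for the parts of struct address_space used here. */
struct mapping_model {
	bool dirty[NPAGES];
	unsigned long writeback_index;
};

/*
 * One cyclic pass: scan from the start index to the end of the file,
 * then wrap around and scan the pages before the start index.
 * Written page numbers are appended to log; the new count is returned.
 */
static size_t writeback_pass(struct mapping_model *m, bool step_back,
			     int *log, size_t n)
{
	unsigned long start = m->writeback_index;
	unsigned long done_index = start;
	unsigned long k;

	if (step_back && start)
		start -= 1;	/* models the proposed change */

	for (k = 0; k < NPAGES; k++) {
		unsigned long i = (start + k) % NPAGES;

		if (!m->dirty[i])
			continue;
		assert(n < LOG_MAX);
		m->dirty[i] = false;
		log[n++] = (int)i;
		done_index = i + 1;	/* one past the last page written */
	}
	m->writeback_index = done_index;
	return n;
}

/* Replays steps 1) to 4) of the commit message. */
static size_t run_scenario(bool step_back, int *log)
{
	struct mapping_model m = { .writeback_index = 0 };
	size_t n = 0;

	m.dirty[0] = m.dirty[1] = m.dirty[2] = true;	/* step 1 */
	n = writeback_pass(&m, step_back, log, n);	/* step 2 */
	m.dirty[2] = m.dirty[3] = m.dirty[4] = true;	/* step 3 */
	n = writeback_pass(&m, step_back, log, n);	/* step 4 */
	return n;
}
```

Calling run_scenario(false, log) fills log with 0, 1, 2, 3, 4, 2, and
run_scenario(true, log) fills it with 0, 1, 2, 2, 3, 4, which matches
the two orders given in the commit message.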


* Re: [RFC][PATCH] mm: write_cache_pages be more sequential
  2009-09-07 14:07 [RFC][PATCH] mm: write_cache_pages be more sequential Adrian Hunter
@ 2009-09-07 14:15 ` Nick Piggin
  2009-09-07 14:29   ` Adrian Hunter
  2009-09-14 15:44 ` Artem Bityutskiy
  1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2009-09-07 14:15 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-M/Helsinki),
	LKML

On Mon, Sep 07, 2009 at 05:07:38PM +0300, Adrian Hunter wrote:
> >From 6f3bb7c26936c45d810048f59c369e8d5a5623fc Mon Sep 17 00:00:00 2001
> From: Adrian Hunter <adrian.hunter@nokia.com>
> Date: Mon, 7 Sep 2009 10:49:11 +0300
> Subject: [PATCH] mm: write_cache_pages be more sequential
> 
> If a file is written to sequentially, then writeback
> should write the pages sequentially also.  However,
> that does not always happen.  For example:
> 
> 1) user writes pages 0, 1 and 2 but 2 is incomplete
> 2) write_cache_pages writes pages 0, 1 and 2 and sets
> writeback_index to 3
> 3) user finishes writing page 2 and writes pages 3 and 4
> 4) write_cache_pages writes pages 3 and 4, and then cycles
> back and writes page 2 again.
> 
> So the pages are written out in the order 0, 1, 2, 3, 4, 2
> instead of 0, 1, 2, 2, 3, 4.

Why does page 2 get set dirty if the write was incomplete?

 
> This situation was noticed on UBIFS because it writes
> directly from writepage.  Hence if there is an unexpected
> power-loss, a file will end up with a hole even though
> the file was written sequentially by the user.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
> ---
> mm/page-writeback.c |    2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 81627eb..7410b7a 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -960,6 +960,8 @@ int write_cache_pages(struct address_space *mapping,
> 	pagevec_init(&pvec, 0);
> 	if (wbc->range_cyclic) {
> 		writeback_index = mapping->writeback_index; /* prev offset */
> +		if (writeback_index)
> +			writeback_index -= 1;
> 		index = writeback_index;
> 		if (index == 0)
> 			cycled = 1;

Doesn't this just break range_cyclic? range_cyclic is supposed to
work across calls to write_cache_pages, and it's there I guess so
background writeout will be able to eventually get around to writing
all pages relatively fairly in the presence of redirtying operations.



* Re: [RFC][PATCH] mm: write_cache_pages be more sequential
  2009-09-07 14:15 ` Nick Piggin
@ 2009-09-07 14:29   ` Adrian Hunter
  2009-09-07 14:45     ` Nick Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Adrian Hunter @ 2009-09-07 14:29 UTC (permalink / raw)
  To: Nick Piggin
  Cc: chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-D/Helsinki),
	LKML

Nick Piggin wrote:
> On Mon, Sep 07, 2009 at 05:07:38PM +0300, Adrian Hunter wrote:
>> >From 6f3bb7c26936c45d810048f59c369e8d5a5623fc Mon Sep 17 00:00:00 2001
>> From: Adrian Hunter <adrian.hunter@nokia.com>
>> Date: Mon, 7 Sep 2009 10:49:11 +0300
>> Subject: [PATCH] mm: write_cache_pages be more sequential
>>
>> If a file is written to sequentially, then writeback
>> should write the pages sequentially also.  However,
>> that does not always happen.  For example:
>>
>> 1) user writes pages 0, 1 and 2 but 2 is incomplete
>> 2) write_cache_pages writes pages 0, 1 and 2 and sets
>> writeback_index to 3
>> 3) user finishes writing page 2 and writes pages 3 and 4
>> 4) write_cache_pages writes pages 3 and 4, and then cycles
>> back and writes page 2 again.
>>
>> So the pages are written out in the order 0, 1, 2, 3, 4, 2
>> instead of 0, 1, 2, 2, 3, 4.
> 
> Why does page 2 get set dirty if the write was incomplete?

I meant that only part of the page was written. e.g.
write 10240 bytes, wait for writeback, then write another
10240 bytes.  The pages will be written out in the order
0, 1, 2, 3, 4, 2
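
To see why page 2 is dirtied twice, the byte ranges can be mapped to
page indexes. A minimal sketch, assuming a 4096-byte page size (the
thread does not state one); pages_touched() and struct page_span are
hypothetical names used only for illustration.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size */

struct page_span {
	unsigned long first;	/* index of the first page touched */
	unsigned long last;	/* index of the last page touched */
};

/* Returns the inclusive range of page indexes covered by a write. */
static struct page_span pages_touched(unsigned long offset, unsigned long len)
{
	struct page_span s;

	assert(len > 0);
	s.first = offset / MODEL_PAGE_SIZE;
	s.last = (offset + len - 1) / MODEL_PAGE_SIZE;
	return s;
}
```

With these assumptions, pages_touched(0, 10240) covers pages 0 to 2
(page 2 only half filled) and pages_touched(10240, 10240) covers pages
2 to 4, so the second write redirties page 2.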

>> This situation was noticed on UBIFS because it writes
>> directly from writepage.  Hence if there is an unexpected
>> power-loss, a file will end up with a hole even though
>> the file was written sequentially by the user.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
>> ---
>> mm/page-writeback.c |    2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index 81627eb..7410b7a 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -960,6 +960,8 @@ int write_cache_pages(struct address_space *mapping,
>> 	pagevec_init(&pvec, 0);
>> 	if (wbc->range_cyclic) {
>> 		writeback_index = mapping->writeback_index; /* prev offset */
>> +		if (writeback_index)
>> +			writeback_index -= 1;
>> 		index = writeback_index;
>> 		if (index == 0)
>> 			cycled = 1;
> 
> Doesn't this just break range_cyclic? range_cyclic is supposed to
> work across calls to write_cache_pages, and it's there I guess so
> background writeout will be able to eventually get around to writing
> all pages relatively fairly in the presence of redirtying operations.

I do not immediately see how it breaks range_cyclic.  Can you give an
example?


* Re: [RFC][PATCH] mm: write_cache_pages be more sequential
  2009-09-07 14:29   ` Adrian Hunter
@ 2009-09-07 14:45     ` Nick Piggin
  2009-09-07 18:12       ` Aaron Straus
  2009-09-09  8:03       ` Adrian Hunter
  0 siblings, 2 replies; 7+ messages in thread
From: Nick Piggin @ 2009-09-07 14:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-D/Helsinki),
	LKML

On Mon, Sep 07, 2009 at 05:29:07PM +0300, Adrian Hunter wrote:
> Nick Piggin wrote:
> >On Mon, Sep 07, 2009 at 05:07:38PM +0300, Adrian Hunter wrote:
> >>>From 6f3bb7c26936c45d810048f59c369e8d5a5623fc Mon Sep 17 00:00:00 2001
> >>From: Adrian Hunter <adrian.hunter@nokia.com>
> >>Date: Mon, 7 Sep 2009 10:49:11 +0300
> >>Subject: [PATCH] mm: write_cache_pages be more sequential
> >>
> >>If a file is written to sequentially, then writeback
> >>should write the pages sequentially also.  However,
> >>that does not always happen.  For example:
> >>
> >>1) user writes pages 0, 1 and 2 but 2 is incomplete
> >>2) write_cache_pages writes pages 0, 1 and 2 and sets
> >>writeback_index to 3
> >>3) user finishes writing page 2 and writes pages 3 and 4
> >>4) write_cache_pages writes pages 3 and 4, and then cycles
> >>back and writes page 2 again.
> >>
> >>So the pages are written out in the order 0, 1, 2, 3, 4, 2
> >>instead of 0, 1, 2, 2, 3, 4.
> >
> >Why does page 2 get set dirty if the write was incomplete?
> 
> I meant that only part of the page was written. e.g.
> write 10240 bytes, wait for writeback, then write another
> 10240 bytes.  The pages will be written out in the order
> 0, 1, 2, 3, 4, 2

OK...


> >>This situation was noticed on UBIFS because it writes
> >>directly from writepage.  Hence if there is an unexpected
> >>power-loss, a file will end up with a hole even though
> >>the file was written sequentially by the user.
> >>
> >>Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
> >>---
> >>mm/page-writeback.c |    2 ++
> >>1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> >>index 81627eb..7410b7a 100644
> >>--- a/mm/page-writeback.c
> >>+++ b/mm/page-writeback.c
> >>@@ -960,6 +960,8 @@ int write_cache_pages(struct address_space *mapping,
> >>	pagevec_init(&pvec, 0);
> >>	if (wbc->range_cyclic) {
> >>		writeback_index = mapping->writeback_index; /* prev offset */
> >>+		if (writeback_index)
> >>+			writeback_index -= 1;
> >>		index = writeback_index;
> >>		if (index == 0)
> >>			cycled = 1;
> >
> >Doesn't this just break range_cyclic? range_cyclic is supposed to
> >work across calls to write_cache_pages, and it's there I guess so
> >background writeout will be able to eventually get around to writing
> >all pages relatively fairly in the presence of redirtying operations.
> 
> I do not immediately see how it breaks range_cyclic.  Can you give an
> example?

Oh, I must be dyslexic, I read it as writeback_index = -1; :P
But I think it can still cause some subtle problems with error
cases.

I guess you could just make the done_index assignment more logical
and make it page->index. Then add a comment when assigning to
writeback_index that you want to start up again at the previously
written page to help this case.

Also, check to ensure the error cases are still going to work correctly.
E.g. you might want to increment done_index in the case of error.

I guess it is a reasonable workaround for the problem. It is a bit
unsatisfying to special case on a page basis like this, but anyway
I don't think there should be a realistic downside in practice.
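
One possible reading of the done_index suggestion above, as a
user-space model rather than kernel code. Everything here is an
assumption made for illustration: struct alt_mapping_model,
alt_writeback_pass() and the fail_page parameter are hypothetical, the
file is a fixed array of 8 pages, and a write error is modelled by
naming the page that fails (or -1 for none).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ALT_NPAGES  8	/* size of the modelled file, in pages (assumption) */
#define ALT_LOG_MAX 16	/* capacity of the caller's write log (assumption) */

struct alt_mapping_model {
	bool dirty[ALT_NPAGES];
	unsigned long writeback_index;
};

/*
 * One cyclic pass in which done_index records the index of the last
 * page written, so the next pass starts again at that page. On a
 * modelled error, done_index is incremented so the next pass starts
 * after the failing page. Returns 0 on success and -1 on error.
 */
static int alt_writeback_pass(struct alt_mapping_model *m, long fail_page,
			      int *log, size_t *n)
{
	unsigned long start = m->writeback_index;
	unsigned long done_index = start;
	unsigned long k;

	for (k = 0; k < ALT_NPAGES; k++) {
		unsigned long i = (start + k) % ALT_NPAGES;

		if (!m->dirty[i])
			continue;
		done_index = i;		/* page index, not index + 1 */
		if ((long)i == fail_page) {
			/* Error: leave the page dirty and step past it. */
			m->writeback_index = i + 1;
			return -1;
		}
		assert(*n < ALT_LOG_MAX);
		m->dirty[i] = false;
		log[(*n)++] = (int)i;
	}
	/* Start up again at the previously written page next time. */
	m->writeback_index = done_index;
	return 0;
}
```

In this model, writing pages 0 to 2 leaves writeback_index at 2, so a
later pass over redirtied pages 2 to 4 writes them in the order 2, 3, 4
without any adjustment at the start of the pass. If page 1 fails, the
pass stops with writeback_index at 2 and page 1 still dirty.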



* Re: [RFC][PATCH] mm: write_cache_pages be more sequential
  2009-09-07 14:45     ` Nick Piggin
@ 2009-09-07 18:12       ` Aaron Straus
  2009-09-09  8:03       ` Adrian Hunter
  1 sibling, 0 replies; 7+ messages in thread
From: Aaron Straus @ 2009-09-07 18:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Adrian Hunter, chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-D/Helsinki),
	LKML


On Sep 07 04:45 PM, Nick Piggin wrote:
> > I meant that only part of the page was written. e.g.
> > write 10240 bytes, wait for writeback, then write another
> > 10240 bytes.  The pages will be written out in the order
> > 0, 1, 2, 3, 4, 2
> 
> OK...
> 
> I guess it is a reasonable workaround for the problem. It is a bit
> unsatisfying to special case on a page basis like this, but anyway
> I don't think there should be a realistic downside in practice.

*NOTE* we also see this in NFS (I think) where a file is written to
sequentially yet readers will see holes in the file.

This is allowed behavior for NFS but annoying.  So a work-around would
be great!

					=a=


-- 
===================
Aaron Straus
aaron@merfinllc.com



* Re: [RFC][PATCH] mm: write_cache_pages be more sequential
  2009-09-07 14:45     ` Nick Piggin
  2009-09-07 18:12       ` Aaron Straus
@ 2009-09-09  8:03       ` Adrian Hunter
  1 sibling, 0 replies; 7+ messages in thread
From: Adrian Hunter @ 2009-09-09  8:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-D/Helsinki),
	LKML, Theodore Tso

Nick Piggin wrote:
> On Mon, Sep 07, 2009 at 05:29:07PM +0300, Adrian Hunter wrote:
>> Nick Piggin wrote:
>>> On Mon, Sep 07, 2009 at 05:07:38PM +0300, Adrian Hunter wrote:
>>>> >From 6f3bb7c26936c45d810048f59c369e8d5a5623fc Mon Sep 17 00:00:00 2001
>>>> From: Adrian Hunter <adrian.hunter@nokia.com>
>>>> Date: Mon, 7 Sep 2009 10:49:11 +0300
>>>> Subject: [PATCH] mm: write_cache_pages be more sequential
>>>>
>>>> If a file is written to sequentially, then writeback
>>>> should write the pages sequentially also.  However,
>>>> that does not always happen.  For example:
>>>>
>>>> 1) user writes pages 0, 1 and 2 but 2 is incomplete
>>>> 2) write_cache_pages writes pages 0, 1 and 2 and sets
>>>> writeback_index to 3
>>>> 3) user finishes writing page 2 and writes pages 3 and 4
>>>> 4) write_cache_pages writes pages 3 and 4, and then cycles
>>>> back and writes page 2 again.
>>>>
>>>> So the pages are written out in the order 0, 1, 2, 3, 4, 2
>>>> instead of 0, 1, 2, 2, 3, 4.
>>> Why does page 2 get set dirty if the write was incomplete?
>> I meant that only part of the page was written. e.g.
>> write 10240 bytes, wait for writeback, then write another
>> 10240 bytes.  The pages will be written out in the order
>> 0, 1, 2, 3, 4, 2
> 
> OK...
> 
> 
>>>> This situation was noticed on UBIFS because it writes
>>>> directly from writepage.  Hence if there is an unexpected
>>>> power-loss, a file will end up with a hole even though
>>>> the file was written sequentially by the user.
>>>>
>>>> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
>>>> ---
>>>> mm/page-writeback.c |    2 ++
>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>> index 81627eb..7410b7a 100644
>>>> --- a/mm/page-writeback.c
>>>> +++ b/mm/page-writeback.c
>>>> @@ -960,6 +960,8 @@ int write_cache_pages(struct address_space *mapping,
>>>> 	pagevec_init(&pvec, 0);
>>>> 	if (wbc->range_cyclic) {
>>>> 		writeback_index = mapping->writeback_index; /* prev offset */
>>>> +		if (writeback_index)
>>>> +			writeback_index -= 1;
>>>> 		index = writeback_index;
>>>> 		if (index == 0)
>>>> 			cycled = 1;
>>> Doesn't this just break range_cyclic? range_cyclic is supposed to
>>> work across calls to write_cache_pages, and it's there I guess so
>>> background writeout will be able to eventually get around to writing
>>> all pages relatively fairly in the presence of redirtying operations.
>> I do not immediately see how it breaks range_cyclic.  Can you give an
>> example?
> 
> Oh, I must be dyslexic, I read it as writeback_index = -1; :P
> But I think it can still cause some subtle problems with error
> cases.
> 
> I guess you could just make the done_index assignment more logical
> and make it page->index. Then add a comment when assigning to
> writeback_index that you want to start up again at the previously
> written page to help this case.

That means changing slightly the meaning of writeback_index which will
mean more analysis to avoid unexpected side-effects.

Speaking of unexpected side-effects, I glanced at ext4_da_writepages()
which contains the line:

		wbc->nr_to_write -= mpd.pages_written;

which should probably be:

		if (mpd.pages_written >= wbc->nr_to_write)
			wbc->nr_to_write = 0;
		else
			wbc->nr_to_write -= mpd.pages_written;

now that write_cache_pages() can write more than wbc->nr_to_write
pages.  What do you think?
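
The clamp proposed above can be written as a standalone helper for
experimenting outside the kernel. This is a sketch only:
consume_nr_to_write() is a hypothetical name, and the types (a signed
long budget, an unsigned long page count) are assumptions, not taken
from the ext4 source.

```c
#include <assert.h>

/*
 * Returns the remaining write budget after pages_written pages have
 * been written, never going below zero even when more pages were
 * written than the budget allowed.
 */
static long consume_nr_to_write(long nr_to_write, unsigned long pages_written)
{
	if (nr_to_write <= 0 || pages_written >= (unsigned long)nr_to_write)
		return 0;
	return nr_to_write - (long)pages_written;
}
```

A plain subtraction with the same inputs would give a negative budget
when pages_written exceeds nr_to_write; the helper returns 0 in that
case and also leaves an already exhausted budget at 0.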

> Also, check to ensure the error cases are still going to work correctly.
> E.g. you might want to increment done_index in the case of error.

Sure.

> I guess it is a reasonable workaround for the problem. It is a bit
> unsatisfying to special case on a page basis like this, but anyway
> I don't think there should be a realistic downside in practice.



* Re: [RFC][PATCH] mm: write_cache_pages be more sequential
  2009-09-07 14:07 [RFC][PATCH] mm: write_cache_pages be more sequential Adrian Hunter
  2009-09-07 14:15 ` Nick Piggin
@ 2009-09-14 15:44 ` Artem Bityutskiy
  1 sibling, 0 replies; 7+ messages in thread
From: Artem Bityutskiy @ 2009-09-14 15:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: chris.mason, david, Andrew Morton,
	Bityutskiy Artem (Nokia-M/Helsinki),
	LKML

On Mon, 2009-09-07 at 17:07 +0300, Adrian Hunter wrote:
> From 6f3bb7c26936c45d810048f59c369e8d5a5623fc Mon Sep 17 00:00:00 2001
> From: Adrian Hunter <adrian.hunter@nokia.com>
> Date: Mon, 7 Sep 2009 10:49:11 +0300
> Subject: [PATCH] mm: write_cache_pages be more sequential
> 
> If a file is written to sequentially, then writeback
> should write the pages sequentially also.  However,
> that does not always happen.  For example:
> 
> 1) user writes pages 0, 1 and 2 but 2 is incomplete
> 2) write_cache_pages writes pages 0, 1 and 2 and sets
> writeback_index to 3
> 3) user finishes writing page 2 and writes pages 3 and 4
> 4) write_cache_pages writes pages 3 and 4, and then cycles
> back and writes page 2 again.
> 
> So the pages are written out in the order 0, 1, 2, 3, 4, 2
> instead of 0, 1, 2, 2, 3, 4.
> 
> This situation was noticed on UBIFS because it writes
> directly from writepage.  Hence if there is an unexpected
> power-loss, a file will end up with a hole even though
> the file was written sequentially by the user.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>

I wonder, who would merge this patch?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


