linux-kernel.vger.kernel.org archive mirror
* Re: bdflush flushing memory mapped pages.
  2003-04-09 19:20 bdflush flushing memory mapped pages Keith Ansell
@ 2003-04-09  9:13 ` Andre Hedrick
  2003-04-09  9:27   ` Andrew Morton
  2003-04-09  9:22 ` Arjan van de Ven
  2003-04-09 10:39 ` Alan Cox
  2 siblings, 1 reply; 10+ messages in thread
From: Andre Hedrick @ 2003-04-09  9:13 UTC
  To: Keith Ansell; +Cc: linux-kernel, Jens Axboe


Funny you mention this point!

I just spent 30-45 minutes on the phone talking to Jens about this very
issue.  Jens states he can map the model into 2.5 and will give it a
fling in a bit.  This issue is a must; however, I had given up on the idea
until 2.7.  Still, the issues he and I addressed line up with your
request.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Wed, 9 Apr 2003, Keith Ansell wrote:

> help
> 
> My application uses shared (MAP_SHARED) memory-mapped files for file I/O,
> and we have observed that Linux does not flush dirty pages to disk until
> munmap or msync is called.
> 
> I would like to know whether there are any development plans to address
> this issue, or whether there is a version of bdflush which flushes
> write-required (dirty) pages to disk.
> 
> Regards
>         Keith Ansell.
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: bdflush flushing memory mapped pages.
  2003-04-09 19:20 bdflush flushing memory mapped pages Keith Ansell
  2003-04-09  9:13 ` Andre Hedrick
@ 2003-04-09  9:22 ` Arjan van de Ven
  2003-04-09 10:39 ` Alan Cox
  2 siblings, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2003-04-09  9:22 UTC
  To: Keith Ansell; +Cc: linux-kernel


On Wed, 2003-04-09 at 21:20, Keith Ansell wrote:
> help
> 
> My application uses shared (MAP_SHARED) memory-mapped files for file I/O,
> and we have observed that Linux does not flush dirty pages to disk until
> munmap or msync is called.
> 
> I would like to know whether there are any development plans to address
> this issue, or whether there is a version of bdflush which flushes
> write-required (dirty) pages to disk.

The Linux behavior is perfectly fine and conforms to the POSIX/SUS
specifications; applications that break because of this are defective
and need fixing. This is a performance optimisation that the OS is
perfectly allowed to make.



* Re: bdflush flushing memory mapped pages.
  2003-04-09  9:13 ` Andre Hedrick
@ 2003-04-09  9:27   ` Andrew Morton
  2003-04-09  9:33     ` Jens Axboe
  2003-04-10 20:09     ` Keith Ansell
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2003-04-09  9:27 UTC
  To: Andre Hedrick; +Cc: keitha, linux-kernel, axboe

Andre Hedrick <andre@linux-ide.org> wrote:
>
> 
> Funny you mention this point!
> 
> I just spent 30-45 minutes on the phone talking to Jens about this very
> issue.  Jens states he can map the model into 2.5 and will give it a
> fling in a bit.  This issue is a must; however, I had given up on the idea
> until 2.7.  Still, the issues he and I addressed line up with your
> request.

noooo.....   This isn't going to happen.  There are many reasons.

Firstly, how can bdflush even know what pages to write?  The dirtiness of
these pages is recorded *only* in some processor's hardware pte cache and/or
the software pagetables.  Someone needs to go tell all the CPUs to writeback
their pte caches into the pagetables and then someone needs to walk the
pagetables propagating the pte dirty bit into the pageframes before we can
even start the I/O.

That's what msync does, in filemap_sync().


And even if bdflush did this automagically, it's the wrong thing to do
because the application could very well be repeatedly dirtying the pages. 
Very probably.  So we've just gone and done a ton of pointless I/O, over and
over.

You can view MAP_SHARED as an IPC mechanism which uses the filesystem
namespace for naming.  No way do these people want bdflush pointlessly
hammering the disk.
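Here is a minimal sketch of the IPC view, assuming a POSIX system. The helper `share_via_named_mapping`, the 4096-byte region size and the file path are hypothetical. A child process finds the file by name, maps it `MAP_SHARED` on its own, and stores a message. The parent then reads that message through its own mapping, with no `msync()` anywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define IPC_LEN 4096 /* hypothetical size of the shared region */

/* Hypothetical helper: parent and child each reach the same file through
 * its path and map it MAP_SHARED.  The child's store becomes visible to
 * the parent without any msync(), because both mappings refer to the same
 * underlying file object; nothing here asks for disk I/O. */
int share_via_named_mapping(const char *path, char *out, size_t outlen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, IPC_LEN) != 0) {   /* back the whole region */
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, IPC_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, IPC_LEN);
        return -1;
    }
    if (pid == 0) {
        /* Child: look the object up by name and map it independently. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(1);
        char *c = mmap(NULL, IPC_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, cfd, 0);
        close(cfd);
        if (c == MAP_FAILED)
            _exit(1);
        strcpy(c, "hello from the child");
        _exit(0);
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
             WEXITSTATUS(status) == 0;
    if (ok)
        snprintf(out, outlen, "%s", p);  /* read the child's message */
    munmap(p, IPC_LEN);
    return ok ? 0 : -1;
}
```

Nothing in this exchange needs the data to reach the disk. A periodic kernel flush of such a mapping would only add I/O.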

You can also view MAP_SHARED as a (strange) way of writing files out.  If you
want to do that then fine, but you need to tell the kernel when you've
finished, just like write() does.   You do that with msync.
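Here is a minimal sketch of the write-a-file view, assuming a POSIX system. `write_file_via_mmap` is a hypothetical helper, not an existing API. The stores only dirty the pages. `msync(..., MS_SYNC)` is the explicit "I've finished": it starts the write-out and waits for it, which is roughly what `write()` followed by `fsync()` provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` through a MAP_SHARED
 * mapping, then msync(MS_SYNC) to tell the kernel "I've finished" and
 * wait for the dirty pages to be written out. */
int write_file_via_mmap(const char *path, const void *data, size_t len)
{
    if (len == 0)
        return -1;                        /* mmap() rejects zero-length maps */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) { /* the file must back every byte */
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, data, len);               /* dirties pages; no I/O implied */

    int rc = msync(map, len, MS_SYNC);    /* start write-out and wait for it */
    munmap(map, len);
    return rc;
}
```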





* Re: bdflush flushing memory mapped pages.
  2003-04-09  9:27   ` Andrew Morton
@ 2003-04-09  9:33     ` Jens Axboe
  2003-04-10 20:09     ` Keith Ansell
  1 sibling, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2003-04-09  9:33 UTC
  To: Andrew Morton; +Cc: Andre Hedrick, keitha, linux-kernel

On Wed, Apr 09 2003, Andrew Morton wrote:
> Andre Hedrick <andre@linux-ide.org> wrote:
> >
> > 
> > Funny you mention this point!
> > 
> > I just spent 30-45 minutes on the phone talking to Jens about this very
> > issue.  Jens states he can map the model into 2.5 and will give it a
> > fling in a bit.  This issue is a must; however, I had given up on the idea
> > until 2.7.  Still, the issues he and I addressed line up with your
> > request.
> 
> noooo.....   This isn't going to happen.  There are many reasons.

[snip]

This isn't Andre's point at all. He wants a way to defer completion of
requests to the block layer until you actually know they are on the platter.
I think he just tied it into this thread because it sort-of deals with
deferred errors as well.

-- 
Jens Axboe



* Re: bdflush flushing memory mapped pages.
  2003-04-09 19:20 bdflush flushing memory mapped pages Keith Ansell
  2003-04-09  9:13 ` Andre Hedrick
  2003-04-09  9:22 ` Arjan van de Ven
@ 2003-04-09 10:39 ` Alan Cox
  2 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2003-04-09 10:39 UTC
  To: Keith Ansell; +Cc: Linux Kernel Mailing List

On Wed, 2003-04-09 at 20:20, Keith Ansell wrote:
> help
> 
> My application uses shared (MAP_SHARED) memory-mapped files for file I/O,
> and we have observed that Linux does not flush dirty pages to disk until
> munmap or msync is called.

This is correct behaviour.




* bdflush flushing memory mapped pages.
@ 2003-04-09 19:20 Keith Ansell
  2003-04-09  9:13 ` Andre Hedrick
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Keith Ansell @ 2003-04-09 19:20 UTC
  To: linux-kernel

help

My application uses shared (MAP_SHARED) memory-mapped files for file I/O,
and we have observed that Linux does not flush dirty pages to disk until
munmap or msync is called.

I would like to know whether there are any development plans to address
this issue, or whether there is a version of bdflush which flushes
write-required (dirty) pages to disk.

Regards
        Keith Ansell.




* Re: bdflush flushing memory mapped pages.
  2003-04-10 20:09     ` Keith Ansell
@ 2003-04-10  8:16       ` Andre Hedrick
  2003-04-10  9:02       ` Nick Piggin
  2003-04-10 17:04       ` Alan Cox
  2 siblings, 0 replies; 10+ messages in thread
From: Andre Hedrick @ 2003-04-10  8:16 UTC
  To: Keith Ansell; +Cc: Andrew Morton, linux-kernel, axboe


Keith,

I know what you are asking for and need.
It is a requirement to be "Enterprise".
What you are seeking will take time and effort.

I have successfully explained the issues of data integrity to Jens (the
block maintainer).  If you can prove this becomes a data integrity issue,
which I know it is for the general case, your argument will have strength.

Nothing stops "fastfreenet.com" from funding the development time.

ASS-GAS-GRASS-CASH, Linux is free but my time is not.

If you want to discuss more of this offline, I will listen and help make
the case.

Cheers,

Andre Hedrick
LAD Storage Consulting Group


On Thu, 10 Apr 2003, Keith Ansell wrote:

> Thank you for your prompt replies.
> 
> I realise that Linux conforms to the letter of the specification, but maybe
> not to the spirit of it.
> 
> I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and
> AIX, where all dirty (write-required) memory-mapped files are flushed to disk
> by the system flusher.  My users have large systems (some in excess of 600
> concurrent connections), and flushing memory-mapped files is a big part of
> our systems' performance.  This ensures that in the event of a catastrophic
> system failure the customers' vital business data has been written to disk.
> 
> Keith Ansell
> 
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Andrew Morton" <akpm@digeo.com>
> To: "Andre Hedrick" <andre@linux-ide.org>
> Cc: <keitha@edp.fastfreenet.com>; <linux-kernel@vger.kernel.org>;
> <axboe@suse.de>
> Sent: Wednesday, April 09, 2003 10:27 AM
> Subject: Re: bdflush flushing memory mapped pages.
> 
> 
> > Andre Hedrick <andre@linux-ide.org> wrote:
> > >
> > >
> > > Funny you mention this point!
> > >
> > > I just spent 30-45 minutes on the phone talking to Jens about this very
> > > issue.  Jens states he can map the model into 2.5 and will give it a
> > > fling in a bit.  This issue is a must; however, I had given up on the idea
> > > until 2.7.  Still, the issues he and I addressed line up with your
> > > request.
> >
> > noooo.....   This isn't going to happen.  There are many reasons.
> >
> > Firstly, how can bdflush even know what pages to write?  The dirtiness of
> > these pages is recorded *only* in some processor's hardware pte cache and/or
> > the software pagetables.  Someone needs to go tell all the CPUs to writeback
> > their pte caches into the pagetables and then someone needs to walk the
> > pagetables propagating the pte dirty bit into the pageframes before we can
> > even start the I/O.
> >
> > That's what msync does, in filemap_sync().
> >
> >
> > And even if bdflush did this automagically, it's the wrong thing to do
> > because the application could very well be repeatedly dirtying the pages.
> > Very probably.  So we've just gone and done a ton of pointless I/O, over and
> > over.
> >
> > You can view MAP_SHARED as an IPC mechanism which uses the filesystem
> > namespace for naming.  No way do these people want bdflush pointlessly
> > hammering the disk.
> >
> > You can also view MAP_SHARED as a (strange) way of writing files out.  If you
> > want to do that then fine, but you need to tell the kernel when you've
> > finished, just like write() does.   You do that with msync.
> >
> >
> >
> >
> 



* Re: bdflush flushing memory mapped pages.
  2003-04-10 20:09     ` Keith Ansell
  2003-04-10  8:16       ` Andre Hedrick
@ 2003-04-10  9:02       ` Nick Piggin
  2003-04-10 17:04       ` Alan Cox
  2 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2003-04-10  9:02 UTC
  To: Keith Ansell; +Cc: Andrew Morton, Andre Hedrick, linux-kernel, axboe



Keith Ansell wrote:

>Thank you for your prompt replies.
>
>I realise that Linux conforms to the letter of the specification, but maybe
>not to the spirit of it.
>
>I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and
>AIX, where all dirty (write-required) memory-mapped files are flushed to disk
>by the system flusher.  My users have large systems (some in excess of 600
>concurrent connections), and flushing memory-mapped files is a big part of
>our systems' performance.  This ensures that in the event of a catastrophic
>system failure the customers' vital business data has been written to disk.
>
As Andrew mentioned, msync would do what you want. It seems
to me, though, that your database solution wants a stronger
guarantee about the safety of the data than asynchronous
writes will provide anyway.



* Re: bdflush flushing memory mapped pages.
  2003-04-10 20:09     ` Keith Ansell
  2003-04-10  8:16       ` Andre Hedrick
  2003-04-10  9:02       ` Nick Piggin
@ 2003-04-10 17:04       ` Alan Cox
  2 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2003-04-10 17:04 UTC
  To: Keith Ansell
  Cc: Andrew Morton, Andre Hedrick, Linux Kernel Mailing List, axboe

On Thu, 2003-04-10 at 21:09, Keith Ansell wrote:
> I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and
> AIX, where all dirty (write-required) memory-mapped files are flushed to disk
> by the system flusher.  My users have large systems (some in excess of 600
> concurrent connections), and flushing memory-mapped files is a big part of
> our systems' performance.  This ensures that in the event of a catastrophic
> system failure the customers' vital business data has been written to disk.

Well, maybe you should fix the other ports, because those systems aren't
required to flush that data and you may get burned nastily by it.

Also understand _why_ the policy is the way it is. Flushing mapped files
is bad for the performance of the app. It's also useless for most apps
because it implies no ordering guarantees.

If you have a case that needs it, all you have to do is fork/clone a
thread which does the needed msyncs. If we enforce broken behaviour on
applications, they can't fork a thread to stop the msyncs...




* Re: bdflush flushing memory mapped pages.
  2003-04-09  9:27   ` Andrew Morton
  2003-04-09  9:33     ` Jens Axboe
@ 2003-04-10 20:09     ` Keith Ansell
  2003-04-10  8:16       ` Andre Hedrick
                         ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Keith Ansell @ 2003-04-10 20:09 UTC
  To: Andrew Morton, Andre Hedrick; +Cc: linux-kernel, axboe

Thank you for your prompt replies.

I realise that Linux conforms to the letter of the specification, but maybe
not to the spirit of it.

I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and
AIX, where all dirty (write-required) memory-mapped files are flushed to disk
by the system flusher.  My users have large systems (some in excess of 600
concurrent connections), and flushing memory-mapped files is a big part of
our systems' performance.  This ensures that in the event of a catastrophic
system failure the customers' vital business data has been written to disk.

Keith Ansell






----- Original Message -----
From: "Andrew Morton" <akpm@digeo.com>
To: "Andre Hedrick" <andre@linux-ide.org>
Cc: <keitha@edp.fastfreenet.com>; <linux-kernel@vger.kernel.org>;
<axboe@suse.de>
Sent: Wednesday, April 09, 2003 10:27 AM
Subject: Re: bdflush flushing memory mapped pages.


> Andre Hedrick <andre@linux-ide.org> wrote:
> >
> >
> > Funny you mention this point!
> >
> > I just spent 30-45 minutes on the phone talking to Jens about this very
> > issue.  Jens states he can map the model into 2.5 and will give it a
> > fling in a bit.  This issue is a must; however, I had given up on the idea
> > until 2.7.  Still, the issues he and I addressed line up with your
> > request.
>
> noooo.....   This isn't going to happen.  There are many reasons.
>
> Firstly, how can bdflush even know what pages to write?  The dirtiness of
> these pages is recorded *only* in some processor's hardware pte cache and/or
> the software pagetables.  Someone needs to go tell all the CPUs to writeback
> their pte caches into the pagetables and then someone needs to walk the
> pagetables propagating the pte dirty bit into the pageframes before we can
> even start the I/O.
>
> That's what msync does, in filemap_sync().
>
>
> And even if bdflush did this automagically, it's the wrong thing to do
> because the application could very well be repeatedly dirtying the pages.
> Very probably.  So we've just gone and done a ton of pointless I/O, over and
> over.
>
> You can view MAP_SHARED as an IPC mechanism which uses the filesystem
> namespace for naming.  No way do these people want bdflush pointlessly
> hammering the disk.
>
> You can also view MAP_SHARED as a (strange) way of writing files out.  If you
> want to do that then fine, but you need to tell the kernel when you've
> finished, just like write() does.   You do that with msync.
>
>
>
>




Thread overview: 10+ messages
2003-04-09 19:20 bdflush flushing memory mapped pages Keith Ansell
2003-04-09  9:13 ` Andre Hedrick
2003-04-09  9:27   ` Andrew Morton
2003-04-09  9:33     ` Jens Axboe
2003-04-10 20:09     ` Keith Ansell
2003-04-10  8:16       ` Andre Hedrick
2003-04-10  9:02       ` Nick Piggin
2003-04-10 17:04       ` Alan Cox
2003-04-09  9:22 ` Arjan van de Ven
2003-04-09 10:39 ` Alan Cox
