* bdflush flushing memory mapped pages.
From: Keith Ansell @ 2003-04-09 19:20 UTC
To: linux-kernel

help

My application uses SHARED memory-mapped files for file I/O, and we have observed that Linux does not flush dirty pages to disk until munmap or msync is called.

I would like to know whether there are any development plans that would address this issue, or whether there is a version of bdflush that flushes write-required (dirty) pages to disk.

Regards
Keith Ansell.
* Re: bdflush flushing memory mapped pages.
From: Andre Hedrick @ 2003-04-09 9:13 UTC
To: Keith Ansell
Cc: linux-kernel, Jens Axboe

Funny you mention this point!

I just spent 30-45 minutes on the phone talking to Jens about this very issue. Jens states he can map the model in to 2.5. and will give it a fling in a bit. This issue is a must; however, I had given up on the idea until 2.7. However, the issues he and I addressed, in combination to your request jive in sync.

Cheers,

Andre Hedrick
LAD Storage Consulting Group
* Re: bdflush flushing memory mapped pages.
From: Andrew Morton @ 2003-04-09 9:27 UTC
To: Andre Hedrick
Cc: keitha, linux-kernel, axboe

Andre Hedrick <andre@linux-ide.org> wrote:
>
> Funny you mention this point!
>
> I just spent 30-45 minutes on the phone talking to Jens about this very
> issue. Jens states he can map the model in to 2.5. and will give it a
> fling in a bit. This issue is a must; however, I had given up on the idea
> until 2.7. However, the issues he and I addressed, in combination to your
> request jive in sync.

noooo..... This isn't going to happen. There are many reasons.

Firstly, how can bdflush even know what pages to write? The dirtiness of these pages is recorded *only* in some processor's hardware pte cache and/or the software pagetables. Someone needs to go tell all the CPUs to write back their pte caches into the pagetables and then someone needs to walk the pagetables propagating the pte dirty bit into the pageframes before we can even start the I/O.

That's what msync does, in filemap_sync().

And even if bdflush did this automagically, it's the wrong thing to do because the application could very well be repeatedly dirtying the pages. Very probably. So we've just gone and done a ton of pointless I/O, over and over.

You can view MAP_SHARED as an IPC mechanism which uses the filesystem namespace for naming. No way do these people want bdflush pointlessly hammering the disk.

You can also view MAP_SHARED as a (strange) way of writing files out. If you want to do that then fine, but you need to tell the kernel when you've finished, just like write() does. You do that with msync.
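[Editor's illustration, not part of the original thread.] Andrew's last point, that an application writing a file through MAP_SHARED has to tell the kernel when it has finished, can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the helper name write_via_mmap() is hypothetical, the file is simply resized to the payload length, and error handling is reduced to a 0 / -1 return.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (name and signature are assumptions for this sketch):
 * store `len` bytes at the start of `path` through a MAP_SHARED mapping,
 * then call msync(MS_SYNC) so the call returns only after the kernel has
 * written the dirty pages back. Returns 0 on success, -1 on failure. */
int write_via_mmap(const char *path, const void *data, size_t len)
{
    assert(len > 0);                    /* mmap() rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* The file must be at least `len` bytes long before we touch the
     * mapping; this sketch simply sets the file size to `len`. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, data, len);             /* dirties the mapped page(s) */

    /* "Tell the kernel when you've finished": synchronous writeback. */
    int rc = msync(map, len, MS_SYNC);

    munmap(map, len);
    close(fd);
    return rc;
}
```

A caller would use it as write_via_mmap("data.bin", buf, n) and treat a non-zero return as a failed write. MS_SYNC is chosen here because the original poster cares about data being on disk; MS_ASYNC would only schedule the writeback.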
* Re: bdflush flushing memory mapped pages.
From: Jens Axboe @ 2003-04-09 9:33 UTC
To: Andrew Morton
Cc: Andre Hedrick, keitha, linux-kernel

On Wed, Apr 09 2003, Andrew Morton wrote:
> noooo..... This isn't going to happen. There are many reasons.

[snip]

This isn't Andre's point at all. He wants a way to defer completion of requests to the block layer until you actually know they are on platter. I think he just tied it into this thread because it sort-of deals with deferred errors as well.

--
Jens Axboe
* Re: bdflush flushing memory mapped pages.
From: Keith Ansell @ 2003-04-10 20:09 UTC
To: Andrew Morton, Andre Hedrick
Cc: linux-kernel, axboe

Thank you for your prompt replies.

I realise that Linux conforms to the letter of the specification, but maybe not the spirit of it.

I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and AIX, where all write-required memory-mapped files are flushed to disk by the system flusher. My users have large systems (some in excess of 600 concurrent connections), and flushing memory-mapped files is a big part of our systems' performance. This ensures that in the event of a catastrophic system failure the customer's vital business data has been written to disk.

Keith Ansell
* Re: bdflush flushing memory mapped pages.
From: Andre Hedrick @ 2003-04-10 8:16 UTC
To: Keith Ansell
Cc: Andrew Morton, linux-kernel, axboe

Keith,

I know what you are asking for and need. It is a requirement to be "Enterprise". What you are seeking will take time, and effort. I have explained successfully to Jens (block maintainer) the issues of data integrity. If you can prove this becomes a data integrity issue, which I know it is for the general case, your argument will have strength.

Nothing stops "fastfreenet.com" from funding the development time. ASS-GAS-GRASS-CASH, Linux is free but my time is not.

If you want to discuss more of this offline, I will listen and help make the case.

Cheers,

Andre Hedrick
LAD Storage Consulting Group
* Re: bdflush flushing memory mapped pages.
From: Nick Piggin @ 2003-04-10 9:02 UTC
To: Keith Ansell
Cc: Andrew Morton, Andre Hedrick, linux-kernel, axboe

Keith Ansell wrote:
> Thank you for your prompt replies.
>
> I realise that Linux conforms to the letter of the specification, but maybe
> not the spirit of it.
>
> I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and
> AIX, where all write-required memory-mapped files are flushed to disk by
> the system flusher. My users have large systems (some in excess of 600
> concurrent connections), and flushing memory-mapped files is a big part of
> our systems' performance. This ensures that in the event of a catastrophic
> system failure the customer's vital business data has been written to disk.

As Andrew mentioned, msync would do what you want. It seems to me though, that your database solution wants a stronger guarantee about the safety of the data than asynchronous writes will provide anyway.
* Re: bdflush flushing memory mapped pages.
From: Alan Cox @ 2003-04-10 17:04 UTC
To: Keith Ansell
Cc: Andrew Morton, Andre Hedrick, Linux Kernel Mailing List, axboe

On Thu, 2003-04-10 at 21:09, Keith Ansell wrote:
> I am porting a database solution to Linux from Unix SVR4, SCO OpenServer and
> AIX, where all write-required memory-mapped files are flushed to disk by
> the system flusher. My users have large systems (some in excess of 600
> concurrent connections), and flushing memory-mapped files is a big part of
> our systems' performance. This ensures that in the event of a catastrophic
> system failure the customer's vital business data has been written to disk.

Well maybe you should fix the other ports, because they aren't required to flush that data and you may get burned nastily from it.

Also understand _why_ the policy is the way it is. Flushing mapped files is bad for performance of the app. It's also useless for most apps because there are no ordering guarantees implied by it.

If you have a case that needs it, all you have to do is fork/clone a thread which does the needed msyncs. If we enforce broken behaviour on applications, they can't fork a thread to stop the msyncs...
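[Editor's illustration, not part of the original thread.] Alan's suggestion of a helper thread that issues the msyncs could look roughly like the sketch below. Everything named here (struct flusher, map_shared_file(), flusher_start(), flusher_stop()) is hypothetical and chosen only for illustration, and it uses POSIX threads rather than a raw clone(). As Alan notes, a periodic flush implies no ordering between pages, so an application with transactional requirements would still need msync at its own commit points.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flusher state; none of these names come from the thread. */
struct flusher {
    void *addr;           /* start of the MAP_SHARED mapping (page-aligned) */
    size_t len;           /* length of the mapping in bytes */
    long interval_ms;     /* pause between flushes */
    atomic_int stop;      /* set to 1 to ask the thread to exit */
    atomic_int flushes;   /* number of successful msync() calls */
    pthread_t tid;
};

/* Map `len` bytes of `path` MAP_SHARED, creating and sizing the file.
 * Returns NULL on failure. The mapping remains valid after close(). */
void *map_shared_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

static void *flusher_main(void *arg)
{
    struct flusher *f = arg;
    struct timespec ts = { f->interval_ms / 1000,
                           (f->interval_ms % 1000) * 1000000L };

    while (!atomic_load(&f->stop)) {
        if (msync(f->addr, f->len, MS_SYNC) == 0)
            atomic_fetch_add(&f->flushes, 1);
        nanosleep(&ts, NULL);
    }
    /* Final flush so pages dirtied just before the stop request go out. */
    if (msync(f->addr, f->len, MS_SYNC) == 0)
        atomic_fetch_add(&f->flushes, 1);
    return NULL;
}

/* Start the background flusher. Returns 0 on success (pthread_create). */
int flusher_start(struct flusher *f, void *addr, size_t len, long interval_ms)
{
    assert(addr != NULL && len > 0 && interval_ms > 0);
    f->addr = addr;
    f->len = len;
    f->interval_ms = interval_ms;
    atomic_init(&f->stop, 0);
    atomic_init(&f->flushes, 0);
    return pthread_create(&f->tid, NULL, flusher_main, f);
}

/* Ask the flusher to exit, wait for it, and return the flush count. */
int flusher_stop(struct flusher *f)
{
    atomic_store(&f->stop, 1);
    pthread_join(f->tid, NULL);
    return atomic_load(&f->flushes);
}
```

The application keeps control of the policy: it picks the interval, can stop the thread while it is repeatedly dirtying the same pages, and the mapping must stay mapped until flusher_stop() has returned. Build with the compiler's thread option (for example -pthread) where the platform requires it.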
* Re: bdflush flushing memory mapped pages.
From: Arjan van de Ven @ 2003-04-09 9:22 UTC
To: Keith Ansell
Cc: linux-kernel

On Wed, 2003-04-09 at 21:20, Keith Ansell wrote:
> help
>
> My application uses SHARED memory-mapped files for file I/O, and we have
> observed that Linux does not flush dirty pages to disk until munmap or
> msync is called.
>
> I would like to know whether there are any development plans that would
> address this issue, or whether there is a version of bdflush that flushes
> write-required (dirty) pages to disk.

The Linux behavior is perfectly fine and conformant to the POSIX/SUS specifications; applications that break because of this are defective and need fixing. This is a performance optimisation that the OS is perfectly allowed to do.
* Re: bdflush flushing memory mapped pages.
From: Alan Cox @ 2003-04-09 10:39 UTC
To: Keith Ansell
Cc: Linux Kernel Mailing List

On Wed, 2003-04-09 at 20:20, Keith Ansell wrote:
> help
>
> My application uses SHARED memory-mapped files for file I/O, and we have
> observed that Linux does not flush dirty pages to disk until munmap or
> msync is called.

This is correct behaviour.
Thread overview: 10+ messages

2003-04-09 19:20 bdflush flushing memory mapped pages Keith Ansell
2003-04-09  9:13 ` Andre Hedrick
2003-04-09  9:27   ` Andrew Morton
2003-04-09  9:33     ` Jens Axboe
2003-04-10 20:09     ` Keith Ansell
2003-04-10  8:16       ` Andre Hedrick
2003-04-10  9:02       ` Nick Piggin
2003-04-10 17:04       ` Alan Cox
2003-04-09  9:22 ` Arjan van de Ven
2003-04-09 10:39 ` Alan Cox