* mmap-related questions
@ 2003-03-31 14:41 Kenny Simpson
2003-03-31 17:55 ` Benjamin LaHaise
0 siblings, 1 reply; 8+ messages in thread
From: Kenny Simpson @ 2003-03-31 14:41 UTC (permalink / raw)
To: linux-kernel
Greetings! I hate to ask this type of question here,
but having searched the list and Google, I have
found no good answers, so here goes...
If I use mmap to give me a sliding window view onto a
file (mmap/munmap/mmap or mremap), how can I sync all
unmapped memory associated with the file?
I read from Stevens that "the call to munmap does not
cause the contents of the mapped region to be written
to the disk file.", but I don't want to pay the
penalty of doing many msync()'s each time I move my
window.
I tested that fsync() does not seem to sync pages that
were mapped with mmap. Is there some way to sync all
data associated with the file? Is there a way which
is also portable to Solaris 2.6?
Thanks,
-Kenny
BTW: I'm using 2.4.7 (RH enterprise)
* Re: mmap-related questions
2003-03-31 14:41 mmap-related questions Kenny Simpson
@ 2003-03-31 17:55 ` Benjamin LaHaise
2003-04-01 3:25 ` Kenny Simpson
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin LaHaise @ 2003-03-31 17:55 UTC (permalink / raw)
To: Kenny Simpson; +Cc: linux-kernel
On Mon, Mar 31, 2003 at 06:41:10AM -0800, Kenny Simpson wrote:
> I tested that fsync() does not seem to sync pages that
> were mapped with mmap. Is there some way to sync all
> data associated with the file? Is there a way which
> is also portable to Solaris 2.6?
No. You must use msync(). Note that fsync() after munmap() will flush the
pages to disk under Linux.
> BTW: I'm using 2.4.7 (RH enterprise)
2.4.7 is way out of date and should be updated for the numerous bugfixes and
security errata.
-ben
* Re: mmap-related questions
2003-03-31 17:55 ` Benjamin LaHaise
@ 2003-04-01 3:25 ` Kenny Simpson
2003-04-01 17:50 ` Benjamin LaHaise
0 siblings, 1 reply; 8+ messages in thread
From: Kenny Simpson @ 2003-04-01 3:25 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: linux-kernel
--- Benjamin LaHaise <bcrl@redhat.com> wrote:
> No. You must use msync().
> Note that fsync() after
> munmap() will flush the
> pages to disk under Linux.
Sweet! Paydirt! Is this documented/guaranteed to
continue to work for a while?
Is this true for all non-mmap()ed dirty buffers for a
given file?
Just to restate what you said:
- if part of a file is mmap()ed, msync() MUST be used
to sync it.
- any non-mmap()ed portions are synched with fsync().
I'm assuming this is a per-process thing. i.e. The
above is true regardless of what other processes are
doing (e.g. even if another process has the same file
mmap()'d, I don't care).
> 2.4.7 is way out of date and should be updated for
> the numerous bugfixes and
> security errata.
I know. Unfortunately not my call. Desperately
trying to beat people with clue sticks....
Thanks!
-Kenny
* Re: mmap-related questions
2003-04-01 3:25 ` Kenny Simpson
@ 2003-04-01 17:50 ` Benjamin LaHaise
2003-04-02 3:18 ` Kenny Simpson
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin LaHaise @ 2003-04-01 17:50 UTC (permalink / raw)
To: Kenny Simpson; +Cc: linux-kernel
On Mon, Mar 31, 2003 at 07:25:46PM -0800, Kenny Simpson wrote:
> --- Benjamin LaHaise <bcrl@redhat.com> wrote:
> > No. You must use msync().
>
> > Note that fsync() after
> > munmap() will flush the
> > pages to disk under Linux.
> Sweet! Paydirt! Is this documented/guaranteed to
> continue to work for a while?
> Is this true for all non-mmap()ed dirty buffers for a
> given file?
It's only true for the pages the munmap() removes from the process' page
tables: the act of unmapping them transfers the dirty bit from the page
tables into the page cache where fsync() acts on them.
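A minimal sketch of the sliding-window pattern this implies, assuming Linux and a MAP_SHARED mapping: each window is dirtied and unmapped without a per-window msync(), and a single fsync() is issued once nothing is mapped any more. The helper name fill_with_sliding_window() and its parameters are hypothetical, chosen only for illustration, and as noted earlier in the thread this behaviour should not be assumed on other systems.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set `total` bytes of the file behind `fd` to `byte`
 * by sliding a MAP_SHARED window of `window` bytes over it.
 * No msync() is issued per window; one fsync() is issued at the end.
 * Returns 0 on success, -1 on error.
 */
int fill_with_sliding_window(int fd, size_t total, size_t window,
                             unsigned char byte)
{
    assert(window > 0);

    long page = sysconf(_SC_PAGESIZE);
    /* mmap() offsets must be page-aligned, so the window must be too. */
    if (page <= 0 || window % (size_t)page != 0)
        return -1;

    /* Make sure the file is large enough to back every window. */
    if (ftruncate(fd, (off_t)total) != 0)
        return -1;

    for (size_t off = 0; off < total; off += window) {
        size_t len = (total - off < window) ? total - off : window;

        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, (off_t)off);
        if (p == MAP_FAILED)
            return -1;

        memset(p, byte, len);   /* dirty the pages through the mapping */

        /* Per the explanation above, unmapping hands the dirty state
         * over to the page cache, where fsync() can see it (Linux). */
        if (munmap(p, len) != 0)
            return -1;
    }

    /* Nothing is mapped any more, so a single fsync() covers all windows. */
    return fsync(fd);
}
```

A caller would open the file with O_RDWR and pass a window size that is a multiple of the page size, for example fill_with_sliding_window(fd, file_size, 64 * page_size, 0). Any region that is still mapped when fsync() runs would still need msync().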
> Just to restate what you said:
> - if part of a file is mmap()ed, msync() MUST be used
> to sync it.
> - any non-mmap()ed portions are synched with fsync().
Pretty much.
> I'm assuming this is a per-process thing. i.e. The
> above is true regardless of what other processes are
> doing (e.g. even if another process has the same file
> mmap()'d, I don't care).
Right. Other processes are responsible for managing their own syncing of
dirty bits to disk at the appropriate times. The one case this breaks down
on is when the mmap()'d file is on NFS -- the reordering there can result in
writebacks from mmap()s occurring in unexpected ways. But then, nobody trusts
their data to NFS, right? ;-)
-ben
* Re: mmap-related questions
2003-04-01 17:50 ` Benjamin LaHaise
@ 2003-04-02 3:18 ` Kenny Simpson
2003-04-02 9:30 ` Jakob Oestergaard
0 siblings, 1 reply; 8+ messages in thread
From: Kenny Simpson @ 2003-04-02 3:18 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: linux-kernel
--- Benjamin LaHaise <bcrl@redhat.com> wrote:
> the act of unmapping them transfers the
> dirty bit from the page
> tables into the page cache where fsync() acts on
> them.
>
Should this info be included with Mel Gorman's
excellent doc:
http://www.csn.ul.ie/~mel/projects/vm/guide/html/understand/node31.html#SECTION009411000000000000000
Or is it there, but I missed it?
> The
> one case this breaks down
> on is when the mmap()'d file is on NFS -- the
> reordering there can result in
> writebacks from mmap()s occurring in unexpected ways.
I sometimes wish mmap was not supported on NFS, or at
least require a special MAP_NFS flag be used. It has
caused lots of pain over the years.
Thanks again for this info, it has helped greatly!
-Kenny
* Re: mmap-related questions
2003-04-02 3:18 ` Kenny Simpson
@ 2003-04-02 9:30 ` Jakob Oestergaard
2003-04-02 15:10 ` Benjamin LaHaise
0 siblings, 1 reply; 8+ messages in thread
From: Jakob Oestergaard @ 2003-04-02 9:30 UTC (permalink / raw)
To: Kenny Simpson; +Cc: Benjamin LaHaise, linux-kernel
On Tue, Apr 01, 2003 at 07:18:40PM -0800, Kenny Simpson wrote:
> --- Benjamin LaHaise <bcrl@redhat.com> wrote:
> > the act of unmapping them transfers the
> > dirty bit from the page
> > tables into the page cache where fsync() acts on
> > them.
> >
> Should this info be included with Mel Gorman's
> excellent doc:
> http://www.csn.ul.ie/~mel/projects/vm/guide/html/understand/node31.html#SECTION009411000000000000000
> Or is it there, but I missed it?
>
> > The
> > one case this breaks down
> > on is when the mmap()'d file is on NFS -- the
> > reordering there can result in
> > writebacks from mmap()s occurring in unexpected ways.
> I sometimes wish mmap was not supported on NFS, or at
> least require a special MAP_NFS flag be used. It has
> caused lots of pain over the years.
Could someone elaborate on this please?
If my client does
big_map = mmap(... some file ...)
make_dirty(big_map)
msync(first half of big_map)
msync(second half of big_map) { crash during this }
Then I am guaranteed that (unless the server crashes), the first half of
big_map *will* have reached the server, but not that all of the second
half has. Right?
Like any local-disk backed file.
Ignoring the case where the NFS *server* crashes, where could the write
ordering differ, compared to local disk files?
In other words, what does Benjamin's "unexpected ways" refer to?
Thanks,
--
................................................................
: jakob@unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:
* Re: mmap-related questions
2003-04-02 9:30 ` Jakob Oestergaard
@ 2003-04-02 15:10 ` Benjamin LaHaise
2003-04-02 23:43 ` Jakob Oestergaard
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin LaHaise @ 2003-04-02 15:10 UTC (permalink / raw)
To: Jakob Oestergaard, Kenny Simpson, linux-kernel
On Wed, Apr 02, 2003 at 11:30:50AM +0200, Jakob Oestergaard wrote:
> make_dirty(big_map)
> msync(first half of big_map)
> msync(second half of big_map) { crash during this }
>
> Then I am guaranteed that (unless the server crashes), the first half of
> big_map *will* have reached the server, but not that all of the second
> half has. Right?
Assuming you used MS_SYNC for the msync() flags. MS_ASYNC could still be
proceeding to flush the pages out in the background. And the kernel may
have triggered writeback of the second half -- it is free to do so as it
sees fit.
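The example under discussion can be sketched in C as follows, assuming a MAP_SHARED mapping and MS_SYNC on both calls. The helper name dirty_and_sync_in_halves() and its parameters are hypothetical and exist only for this illustration; it is a sketch of the pattern, not a statement about what any particular filesystem guarantees.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map 2 * pages_per_half pages of `fd`, dirty all of
 * them, then write the mapping back in two synchronous halves.
 * Returns 0 on success, -1 on error.
 */
int dirty_and_sync_in_halves(int fd, size_t pages_per_half,
                             unsigned char byte)
{
    assert(pages_per_half > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t half = pages_per_half * (size_t)page;
    size_t len = 2 * half;

    if (ftruncate(fd, (off_t)len) != 0)
        return -1;

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    memset(map, byte, len);     /* make_dirty(big_map) */

    int rc = 0;

    /* MS_SYNC blocks until writeback of the first half has completed.
     * With MS_ASYNC the flush could still be in progress on return. */
    if (msync(map, half, MS_SYNC) != 0)
        rc = -1;

    /* A crash at this point leaves the first half written back. The
     * second half may or may not be: the kernel is free to have started
     * writing it on its own. */
    if (rc == 0 && msync(map + half, half, MS_SYNC) != 0)
        rc = -1;

    if (munmap(map, len) != 0)
        rc = -1;

    return rc;
}
```

Both msync() calls start on a page boundary because the halves are whole multiples of the page size, which msync() requires of its address argument.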
> Like any local-disk backed file.
>
> Ignoring the case where the NFS *server* crashes, where could the write
> ordering differ, compared to local disk files ?
> In other words, what does Benjamin's "unexpected ways" refer to ?
All local clients will see the mmap() being updated from the time it is
dirtied, but there is no ordering of write()s with respect to the mmap
unless you explicitly msync(..MS_SYNC..) as in your example.
-ben
* Re: mmap-related questions
2003-04-02 15:10 ` Benjamin LaHaise
@ 2003-04-02 23:43 ` Jakob Oestergaard
0 siblings, 0 replies; 8+ messages in thread
From: Jakob Oestergaard @ 2003-04-02 23:43 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Kenny Simpson, linux-kernel
On Wed, Apr 02, 2003 at 10:10:06AM -0500, Benjamin LaHaise wrote:
> On Wed, Apr 02, 2003 at 11:30:50AM +0200, Jakob Oestergaard wrote:
> > make_dirty(big_map)
> > msync(first half of big_map)
> > msync(second half of big_map) { crash during this }
> >
> > Then I am guaranteed that (unless the server crashes), the first half of
> > big_map *will* have reached the server, but not that all of the second
> > half has. Right?
>
> Assuming you used MS_SYNC for the msync() flags. MS_ASYNC could still be
> proceeding to flush the pages out in the background. And the kernel may
> have triggered writeback of the second half -- it is free to do so as it
> sees fit.
Yes. MS_ASYNC is "advisory" only, as I understand it. (too bad it isn't
select()'able actually, I could use that to work wonders with a database
engine here...)
>
> > Like any local-disk backed file.
> >
> > Ignoring the case where the NFS *server* crashes, where could the write
> > ordering differ, compared to local disk files ?
>
> > In other words, what does Benjamin's "unexpected ways" refer to ?
>
> All local clients will see the mmap() being updated from the time it is
> dirtied, but there is no ordering of write()s with respect to the mmap
> unless you explicitly msync(..MS_SYNC..) as in your example.
Ok, so we're talking multiple processes reading/writing.
Now it makes a lot more sense - I was thinking one process only. Silly
simple-minded me ;)
Thanks,
--
................................................................
: jakob@unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............: