linux-kernel.vger.kernel.org archive mirror
* mm: dirty page problem
@ 2009-06-23  8:17 xue yong
  2009-06-23  9:02 ` xue yong
  0 siblings, 1 reply; 5+ messages in thread
From: xue yong @ 2009-06-23  8:17 UTC
  To: linux-kernel

I wrote a test program. It mmaps a file and writes to it, so some
pages become dirty.
When I then run "cat /proc/meminfo", I can see the dirty pages that I
have written. This happened on my home
computer, running Debian with a self-compiled 2.6.18-5 kernel.


But at my company, on the servers (SUSE kernel 2.6.16.54), after the
test program has written the data, there is no change in the
Dirty line of the "cat /proc/meminfo" output. After I killed
the test program, the dirty page count changed immediately.


I don't know why the behavior is so different. Can you help me?


We are asking because we want a program to mmap some files and, after
the files' contents have changed, have the OS write the dirty
data back to disk periodically.




Best regards!

[-- Attachment #2: mmaptest.c --]

#include <stdlib.h>
#include <stdio.h>
#include <string.h>     /* memcpy, memset */
#include <unistd.h>     /* sleep */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <time.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void)
{
    int fd;
    char *pm;
    struct stat st;
//    pid_t p;
//    p = fork();
    fd = open("/home/ice/mmap.dat", O_RDWR);
    if( fd < 0 ){
        perror("open mmap.dat error");
        exit(1);
    }
    if ( fstat(fd, &st) != 0 ){
        perror("stat error");
        exit(1);
    }
    pm = (char*) mmap( NULL, st.st_size, PROT_WRITE | PROT_READ , MAP_SHARED , fd, 0);
//    pm = (char*) mmap( NULL, st.st_size,  PROT_READ , MAP_SHARED , fd, 0);
    /* mmap() reports failure only through MAP_FAILED. */
    if ( pm == MAP_FAILED ){
        perror("mmap failed");
        exit(1);
    }

    struct timeval tBegin, tEnd;
    memset( &tBegin, 0, sizeof(tBegin) );
    memset( &tEnd, 0, sizeof(tEnd) );

    gettimeofday( &tBegin, NULL);
    off_t i;
    /* Dirty at most 100*1024 pages, but never write beyond the mapping,
       which is only st.st_size bytes long. */
    off_t npages = st.st_size / 4096;
    if ( npages > 1024*100 )
        npages = 1024*100;
    char buffer[4096];
    memset( buffer, 'x', sizeof(buffer) );   /* give the copied data a defined value */
//    read( fd, buffer, 4096);
    while(1){
        for( i = 0; i < npages; ++i )
        {
            memcpy( pm + 4096*i, buffer, 4096 );
//            pwrite( fd, buffer, 4096, i * 4096 );
        }
        /* Check the Dirty line of /proc/meminfo while the program sleeps. */
        sleep(300);
    }
//    mlockall(MCL_CURRENT);
    /* Not reached: the loop above ends only when the process is killed. */
    gettimeofday( &tEnd, NULL);
    int liTimeDiff = ((tEnd.tv_sec - tBegin.tv_sec) * 1000000 + tEnd.tv_usec - tBegin.tv_usec ) / 1000;
    printf("%d ms\n", liTimeDiff );
    printf("msync\n");
    msync( pm, st.st_size, MS_SYNC );
    munmap( pm, st.st_size );
//    sleep(600);
}
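
The message above watches the dirty page count with "cat /proc/meminfo"
by hand. A small helper can read the same counter from C while the test
program sleeps. This is only a sketch and not part of the original
attachment: the function names meminfo_kb() and proc_meminfo_kb() are
made up for illustration, and it assumes the usual "Name:   value kB"
line layout of /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/* Return the value in kB of the field `key` (for example "Dirty") read
 * from an already-open meminfo-style stream, or -1 if it is not found. */
long meminfo_kb(FILE *f, const char *key)
{
    char line[256];
    size_t klen = strlen(key);

    while (fgets(line, sizeof line, f)) {
        long kb;

        /* A matching line looks like "Dirty:        1234 kB". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &kb) == 1)
            return kb;
    }
    return -1;
}

/* Convenience wrapper: read one field from /proc/meminfo. */
long proc_meminfo_kb(const char *key)
{
    FILE *f = fopen("/proc/meminfo", "r");
    long kb;

    if (!f)
        return -1;
    kb = meminfo_kb(f, key);
    fclose(f);
    return kb;
}
```

Calling proc_meminfo_kb("Dirty") before and after the memcpy() loop
would show whether the kernel has noticed the writes yet.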


* Re: mm: dirty page problem
  2009-06-23  8:17 mm: dirty page problem xue yong
@ 2009-06-23  9:02 ` xue yong
  2009-06-23 10:33   ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: xue yong @ 2009-06-23  9:02 UTC
  To: linux-kernel

I searched the changelogs between 2.6.16 and 2.6.19.
I found this in http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.19

and I concluded that kernels older than 2.6.19 can't track shared
dirty pages, am I right?

commit edc79b2a46ed854595e40edcf3f8b37f9f14aa3f
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Mon Sep 25 23:30:58 2006 -0700

    [PATCH] mm: balance dirty pages

    Now that we can detect writers of shared mappings, throttle them.  Avoids OOM
    by surprise.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit d08b3851da41d0ee60851f2c75b118e1f7a5fc89
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Mon Sep 25 23:30:57 2006 -0700

    [PATCH] mm: tracking shared dirty pages

    Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty.  On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained.  Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows, a VMA is eligible when:
     - its shared writeable
        (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
     - it is not a 'special' mapping
        (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
     - the backing_dev_info is cap_account_dirty
        mapping_cap_account_dirty(vma->vm_file->f_mapping)
     - f_op->mmap() didn't change the default page protection

    Page from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well.  However it overrides
    the last condition.

    Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
    It can be called on any page, but is currently only implemented for mapped
    pages, if the page is found the be of a VMA that accounts dirty pages it will
    also wrprotect the PTE.

    Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
    under ->private_lock.  This seems to be safe, since ->private_lock is used to
    serialize access to the buffers, not the page itself.  This is needed because
    clear_page_dirty() will call into page_mkclean() and would thereby violate
    locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
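
The eligibility heuristic listed in the commit message above can be
written down as a single predicate. The following is a user-space model
for illustration only, not the kernel's code: struct vma_model, its
fields, and vma_tracks_dirty() are hypothetical names, and the flag bit
values are placeholders rather than the kernel's real VM_* values.

```c
#include <stdbool.h>

/* Placeholder flag bits; the kernel's real VM_* values are different. */
#define VM_WRITE      0x1u
#define VM_SHARED     0x2u
#define VM_PFNMAP     0x4u
#define VM_INSERTPAGE 0x8u

/* Hypothetical, simplified stand-in for the state the heuristic tests. */
struct vma_model {
    unsigned int vm_flags;
    bool bdi_cap_account_dirty;  /* backing device accounts dirty pages */
    bool mmap_changed_prot;      /* f_op->mmap() changed the default protection */
};

/* True when clean pages of this mapping would be write-protected so
 * that the first write to each page faults and can be accounted. */
bool vma_tracks_dirty(const struct vma_model *vma)
{
    /* Must be both writable and shared. */
    if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return false;
    /* 'Special' mappings are excluded. */
    if (vma->vm_flags & (VM_PFNMAP | VM_INSERTPAGE))
        return false;
    /* The backing device must account dirty pages. */
    if (!vma->bdi_cap_account_dirty)
        return false;
    /* The driver must not have changed the default page protection. */
    return !vma->mmap_changed_prot;
}
```

In this model a MAP_SHARED, PROT_WRITE file mapping such as the one in
the test program passes all four checks, while a private mapping fails
the first one.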




* Re: mm: dirty page problem
  2009-06-23  9:02 ` xue yong
@ 2009-06-23 10:33   ` Peter Zijlstra
       [not found]     ` <fc71709d0906230443pf8c4b34pcd4b7fa798fbf1ed@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2009-06-23 10:33 UTC
  To: xue yong; +Cc: linux-kernel

On Tue, 2009-06-23 at 17:02 +0800, xue yong wrote:
> I searched the changelogs between 2.6.16 and 2.6.19.
> I found this in http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.19
> 
> and I concluded that kernels older than 2.6.19 can't track shared
> dirty pages, am I right?
> 
> commit edc79b2a46ed854595e40edcf3f8b37f9f14aa3f
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Mon Sep 25 23:30:58 2006 -0700
> 
>     [PATCH] mm: balance dirty pages

> commit d08b3851da41d0ee60851f2c75b118e1f7a5fc89
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Mon Sep 25 23:30:57 2006 -0700
> 
>     [PATCH] mm: tracking shared dirty pages

Correct, prior to .19 we didn't have effective tracking of dirty pages.
munmap() and msync() would walk the page tables and collect dirty pages,
but without explicit action these pages would stay hidden.

These patches you found change that by mapping clean shared pages RO and
taking a fault on the dirtying write. We once again map them RO when
they'd be written out to disk.
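
The same write-protect / fault / mark-dirty cycle can be imitated in
user space, which may make the mechanism easier to picture. The sketch
below is an analogy, not kernel code: the track_* names are invented
for illustration, it assumes Linux (a write to a read-only page
delivers SIGSEGV with the faulting address in si_addr), and it calls
mprotect() from a signal handler, which POSIX does not list as
async-signal-safe even though it is a common idiom.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define TRACK_MAX_PAGES 64

static char *track_base;          /* start of the tracked region */
static size_t track_pages;        /* number of pages in the region */
static size_t track_page_size;
static volatile unsigned char track_dirty[TRACK_MAX_PAGES];

/* Fault handler: mark the page dirty and make it writable, so the
 * faulting write is retried and succeeds. */
static void track_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)track_base;
    size_t idx;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + track_pages * track_page_size)
        _exit(1);                 /* a genuine crash, not a tracked page */

    idx = (addr - base) / track_page_size;
    track_dirty[idx] = 1;
    if (mprotect(track_base + idx * track_page_size, track_page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Map `pages` anonymous pages read-only ("clean") and install the
 * handler. Returns the region's base address, or NULL on failure. */
char *track_start(size_t pages)
{
    struct sigaction sa;
    void *p;
    size_t i;

    if (pages == 0 || pages > TRACK_MAX_PAGES)
        return NULL;
    track_page_size = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, pages * track_page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    track_base = p;
    track_pages = pages;
    for (i = 0; i < TRACK_MAX_PAGES; i++)
        track_dirty[i] = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = track_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;
    return track_base;
}

/* "Write-back": forget the dirty marks and write-protect every page again. */
void track_clean(void)
{
    size_t i;

    for (i = 0; i < TRACK_MAX_PAGES; i++)
        track_dirty[i] = 0;
    mprotect(track_base, track_pages * track_page_size, PROT_READ);
}

int track_is_dirty(size_t idx)
{
    return track_dirty[idx];
}
```

After track_start(4), a store through a volatile pointer into the third
page makes track_is_dirty(2) return 1 while the other pages stay clean,
and track_clean() re-arms the protection, much as the description above
says pages are mapped RO again at write-out.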




* Re: mm: dirty page problem
       [not found]       ` <1245757970.19816.1675.camel@twins>
@ 2009-06-23 13:32         ` xue yong
  2009-06-23 13:38           ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: xue yong @ 2009-06-23 13:32 UTC
  To: Peter Zijlstra; +Cc: linux-kernel

On Tue, Jun 23, 2009 at 7:52 PM, Peter Zijlstra<peterz@infradead.org> wrote:
> On Tue, 2009-06-23 at 19:43 +0800, xue yong wrote:
>> Thanks a lot, Peter.
>> Your reply resolved my doubt.
>>
>> We have a service program (call it A) running with about 14G of mmapped data,
>> and another daemon (call it B) that calls msync(MS_SYNC) periodically.
>>
>> With this pattern, is the data flushed to disk?
>
> I don't think so.
>
> The problem is that msync() only scans the current process' page tables,
> which would be clean since B doesn't write, only A does.
>
> So you'd have to modify your program, A, to do the msync() itself --
> possibly from a thread (threads share the vm context and thus page
> tables).
>
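
The suggestion quoted above (have A call msync() itself, possibly from
a thread) could look roughly like this. It is a minimal sketch, not
code from this thread: struct flusher, flusher_start() and
flusher_stop() are hypothetical names, the interval is arbitrary, and
error handling for msync() is omitted. The point is only that the
thread runs inside the writer's address space, so msync() sees the
writer's page tables.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical state for one background flusher thread. */
struct flusher {
    void *addr;               /* start of the shared file mapping */
    size_t len;               /* length of the mapping in bytes */
    unsigned interval_ms;     /* pause between msync() calls */
    atomic_int stop;          /* set to 1 by flusher_stop() */
    pthread_t tid;
};

static void *flusher_main(void *arg)
{
    struct flusher *f = arg;
    struct timespec ts = {
        .tv_sec = f->interval_ms / 1000,
        .tv_nsec = (long)(f->interval_ms % 1000) * 1000000L,
    };

    while (!atomic_load(&f->stop)) {
        nanosleep(&ts, NULL);
        /* Same address space as the writer, so its dirty pages are found. */
        msync(f->addr, f->len, MS_SYNC);
    }
    return NULL;
}

/* Start flushing [addr, addr + len) every interval_ms milliseconds.
 * Returns 0 on success, or the pthread_create() error number. */
int flusher_start(struct flusher *f, void *addr, size_t len,
                  unsigned interval_ms)
{
    f->addr = addr;
    f->len = len;
    f->interval_ms = interval_ms;
    atomic_init(&f->stop, 0);
    return pthread_create(&f->tid, NULL, flusher_main, f);
}

/* Ask the thread to stop and wait for it to finish its current pass. */
int flusher_stop(struct flusher *f)
{
    atomic_store(&f->stop, 1);
    return pthread_join(f->tid, NULL);
}
```

In A this would be started once, right after the mmap() call, with the
mapping's address and length; a separate process such as B cannot do
the same job on a kernel without shared dirty page tracking, for the
reason given in the quoted reply.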


:)  I did suspect this, because there was little bo (blocks out),
and pmap showed that the number of dirty pages belonging to the
process kept growing.

I believe you are the authority. Your confirmation matters.

In "Understanding the Linux® Virtual Memory Manager", page 163, Mel
Gorman says that process-mapped pages are not easily swappable because
there is no way to map struct pages to PTEs except to search every page
table, which is far too expensive.
So neither kswapd nor other kernel daemons do the scan job.
Without explicit action these pages would stay hidden.



>> We also have this problem: after we stopped/restarted A, many dirty
>> pages emerged,
>> and about 4G of data was flushed out afterwards.
>>
>> Thanks for your advice on these patches. I understand it now.
>> So if we move to a newer kernel version, can we solve the
>> problem I mentioned above?
>
> Yes, I would recommend running a more recent kernel.
>


* Re: mm: dirty page problem
  2009-06-23 13:32         ` xue yong
@ 2009-06-23 13:38           ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2009-06-23 13:38 UTC
  To: xue yong; +Cc: linux-kernel

On Tue, 2009-06-23 at 21:32 +0800, xue yong wrote:
> On Tue, Jun 23, 2009 at 7:52 PM, Peter Zijlstra<peterz@infradead.org> wrote:
> > On Tue, 2009-06-23 at 19:43 +0800, xue yong wrote:
> >> Thanks a lot, Peter.
> >> Your reply resolved my doubt.
> >>
> >> We have a service program (call it A) running with about 14G of mmapped data,
> >> and another daemon (call it B) that calls msync(MS_SYNC) periodically.
> >>
> >> With this pattern, is the data flushed to disk?
> >
> > I don't think so.
> >
> > The problem is that msync() only scans the current process' page tables,
> > which would be clean since B doesn't write, only A does.
> >
> > So you'd have to modify your program, A, to do the msync() itself --
> > possibly from a thread (threads share the vm context and thus page
> > tables).
> >
> 
> 
> :)  I did suspect this, because there was little bo (blocks out),
> and pmap showed that the number of dirty pages belonging to the
> process kept growing.
> 
> I believe you are the authority. Your confirmation matters.

I'm one of the people who knows this code rather well, yes ;-)

> In "Understanding the Linux® Virtual Memory Manager", page 163, Mel
> Gorman says that process-mapped pages are not easily swappable because
> there is no way to map struct pages to PTEs except to search every page
> table, which is far too expensive.
> So neither kswapd nor other kernel daemons do the scan job.
> Without explicit action these pages would stay hidden.

While a great book to learn some of the basics from, it is severely
outdated. I think in his 2.6 chapter he does mention something about
reverse map, or rmap as it's called.

These days we do keep a data structure whereby it is easier to find all
ptes for a particular mapping (mm/rmap.c).

In particular try_to_unmap() is the routine used to remove all ptes in
order to swap a page.



