From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932153Ab3GWLp5 (ORCPT ); Tue, 23 Jul 2013 07:45:57 -0400 Received: from cantor2.suse.de ([195.135.220.15]:36675 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756297Ab3GWLp4 (ORCPT ); Tue, 23 Jul 2013 07:45:56 -0400 Date: Tue, 23 Jul 2013 13:45:50 +0200 From: Michal Hocko To: Joonsoo Kim Cc: Andrew Morton , Rik van Riel , Mel Gorman , "Aneesh Kumar K.V" , KAMEZAWA Hiroyuki , Hugh Dickins , Davidlohr Bueso , David Gibson , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim Subject: Re: [PATCH v2 07/10] mm, hugetlb: do not use a page in page cache for cow optimization Message-ID: <20130723114550.GB8677@dhcp22.suse.cz> References: <1374482191-3500-1-git-send-email-iamjoonsoo.kim@lge.com> <1374482191-3500-8-git-send-email-iamjoonsoo.kim@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1374482191-3500-8-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 22-07-13 17:36:28, Joonsoo Kim wrote: > Currently, we use a page with mapped count 1 in page cache for cow > optimization. If we find this condition, we don't allocate a new > page and copy contents. Instead, we map this page directly. > This may introduce a problem that writting to private mapping overwrite > hugetlb file directly. You can find this situation with following code. > > size = 20 * MB; > flag = MAP_SHARED; > p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); > if (p == MAP_FAILED) { > fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); > return -1; > } > p[0] = 's'; > fprintf(stdout, "BEFORE STEAL PRIVATE WRITE: %c\n", p[0]); > munmap(p, size); > > flag = MAP_PRIVATE; > p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); > if (p == MAP_FAILED) { > fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); > } > p[0] = 'c'; > munmap(p, size); > > flag = MAP_SHARED; > p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); > if (p == MAP_FAILED) { > fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); > return -1; > } > fprintf(stdout, "AFTER STEAL PRIVATE WRITE: %c\n", p[0]); > munmap(p, size); > > We can see that "AFTER STEAL PRIVATE WRITE: c", not "AFTER STEAL > PRIVATE WRITE: s". If we turn off this optimization to a page > in page cache, the problem is disappeared. It would be nice to describe the fix here as well. It is far from being intuitive and trivial. The fix seems to be correct. > Reviewed-by: Wanpeng Li > Signed-off-by: Joonsoo Kim Acked-by: Michal Hocko > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 7ca8733..8a61638 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2508,7 +2508,6 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, > { > struct hstate *h = hstate_vma(vma); > struct page *old_page, *new_page; > - int avoidcopy; > int outside_reserve = 0; > unsigned long mmun_start; /* For mmu_notifiers */ > unsigned long mmun_end; /* For mmu_notifiers */ > @@ -2518,10 +2517,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, > retry_avoidcopy: > /* If no-one else is actually using this page, avoid the copy > * and just make the page writable */ > - avoidcopy = (page_mapcount(old_page) == 1); > - if (avoidcopy) { > - if (PageAnon(old_page)) > - page_move_anon_rmap(old_page, vma, address); > + if (page_mapcount(old_page) == 1 && PageAnon(old_page)) { > + page_move_anon_rmap(old_page, vma, address); > set_huge_ptep_writable(vma, address, ptep); > return 0; > } > -- > 1.7.9.5 > -- Michal Hocko SUSE Labs