* [PATCH v2] mm: thp: fix flags for pmd migration when split
From: Peter Xu @ 2018-12-11  5:12 UTC
To: linux-kernel
Cc: peterx, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov,
	Matthew Wilcox, Michal Hocko, Dave Jiang, Aneesh Kumar K.V,
	Souptick Joarder, Konstantin Khlebnikov, linux-mm

When splitting a huge migrating PMD, we'll transfer all the existing
PMD bits and apply them again onto the small PTEs.  However we are
fetching the bits unconditionally via pmd_soft_dirty(), pmd_write()
or pmd_young() while actually they don't make sense at all when it's
a migration entry.  Fix them up by making them conditional.

Note that, if my understanding of the problem is correct, then without
the patch there is a chance of losing some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11, which is part of
the swap offset, instead of bit 2), and that could potentially corrupt
the memory of a userspace program which depends on the dirty bit.

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Michal Hocko <mhocko@suse.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
CC: Souptick Joarder <jrdr.linux@gmail.com>
CC: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v2:
- fix it up for young/write/dirty bits too [Konstantin]
---
 mm/huge_memory.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f2d19e4fe854..b00941b3d342 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2157,11 +2157,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	if (pmd_dirty(old_pmd))
-		SetPageDirty(page);
-	write = pmd_write(old_pmd);
-	young = pmd_young(old_pmd);
-	soft_dirty = pmd_soft_dirty(old_pmd);
+	if (unlikely(pmd_migration)) {
+		soft_dirty = pmd_swp_soft_dirty(old_pmd);
+		young = write = false;
+	} else {
+		if (pmd_dirty(old_pmd))
+			SetPageDirty(page);
+		write = pmd_write(old_pmd);
+		young = pmd_young(old_pmd);
+		soft_dirty = pmd_soft_dirty(old_pmd);
+	}
 
 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.
-- 
2.17.1
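
The bit mix-up described in the commit message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not kernel code: every `demo_*` name and the offset shift are hypothetical, and the bit positions 11 and 2 are simply the ones quoted in the message, not values checked against the x86_64 headers. It shows that applying the present-entry soft-dirty accessor to a swap-format entry can both miss a real soft-dirty flag and report a spurious one when the swap offset happens to have that bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only; positions are taken from the commit message,
 * not from the kernel headers. */
#define DEMO_PRESENT_SOFT_DIRTY_BIT 11 /* soft-dirty bit of a present entry */
#define DEMO_SWP_SOFT_DIRTY_BIT      2 /* soft-dirty bit of a swap-format entry */
#define DEMO_SWP_OFFSET_SHIFT        9 /* hypothetical start of the offset field */

typedef uint64_t demo_pmd_t;

/* Build a swap-format (migration) entry from an offset and a soft-dirty flag. */
demo_pmd_t demo_mk_swp_pmd(uint64_t offset, bool soft_dirty)
{
	demo_pmd_t pmd = offset << DEMO_SWP_OFFSET_SHIFT;

	if (soft_dirty)
		pmd |= 1ULL << DEMO_SWP_SOFT_DIRTY_BIT;
	return pmd;
}

/* Accessor that is only meaningful for present entries. */
bool demo_pmd_soft_dirty(demo_pmd_t pmd)
{
	return (pmd >> DEMO_PRESENT_SOFT_DIRTY_BIT) & 1;
}

/* Accessor that is meaningful for swap-format entries. */
bool demo_pmd_swp_soft_dirty(demo_pmd_t pmd)
{
	return (pmd >> DEMO_SWP_SOFT_DIRTY_BIT) & 1;
}
```

With this model, `demo_pmd_soft_dirty(demo_mk_swp_pmd(0, true))` is false even though the entry is soft-dirty, while `demo_pmd_soft_dirty(demo_mk_swp_pmd(4, false))` is true because offset bit 2 lands on bit 11 of the entry.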
* Re: [PATCH v2] mm: thp: fix flags for pmd migration when split
From: Konstantin Khlebnikov @ 2018-12-11  8:21 UTC
To: Peter Xu, linux-kernel
Cc: Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov, Matthew Wilcox,
	Michal Hocko, Dave Jiang, Aneesh Kumar K.V, Souptick Joarder,
	linux-mm

On 11.12.2018 8:12, Peter Xu wrote:
> When splitting a huge migrating PMD, we'll transfer all the existing
> PMD bits and apply them again onto the small PTEs.  However we are
> fetching the bits unconditionally via pmd_soft_dirty(), pmd_write()
> or pmd_young() while actually they don't make sense at all when it's
> a migration entry.  Fix them up by making them conditional.
>
> Note that, if my understanding of the problem is correct, then without
> the patch there is a chance of losing some of the dirty bits in the
> migrating pmd pages (on x86_64 we're fetching bit 11, which is part of
> the swap offset, instead of bit 2), and that could potentially corrupt
> the memory of a userspace program which depends on the dirty bit.
>
> CC: Andrea Arcangeli <aarcange@redhat.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> CC: Matthew Wilcox <willy@infradead.org>
> CC: Michal Hocko <mhocko@suse.com>
> CC: Dave Jiang <dave.jiang@intel.com>
> CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> CC: Souptick Joarder <jrdr.linux@gmail.com>
> CC: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> CC: linux-mm@kvack.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> v2:
> - fix it up for young/write/dirty bits too [Konstantin]
> ---
>  mm/huge_memory.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f2d19e4fe854..b00941b3d342 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2157,11 +2157,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  		page = pmd_page(old_pmd);
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	page_ref_add(page, HPAGE_PMD_NR - 1);
> -	if (pmd_dirty(old_pmd))
> -		SetPageDirty(page);
> -	write = pmd_write(old_pmd);
> -	young = pmd_young(old_pmd);
> -	soft_dirty = pmd_soft_dirty(old_pmd);
> +	if (unlikely(pmd_migration)) {
> +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> +		young = write = false;
> +	} else {
> +		if (pmd_dirty(old_pmd))
> +			SetPageDirty(page);
> +		write = pmd_write(old_pmd);
> +		young = pmd_young(old_pmd);
> +		soft_dirty = pmd_soft_dirty(old_pmd);
> +	}

Write/read-only is encoded in the migration entry.
I suppose there should be something like this:

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2151,16 +2151,21 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,

 		entry = pmd_to_swp_entry(old_pmd);
 		page = pfn_to_page(swp_offset(entry));
+		write = is_write_migration_entry(entry);
+		young = false;
+		soft_dirty = pmd_swp_soft_dirty(old_pmd);
 	} else
 #endif
+	{
 		page = pmd_page(old_pmd);
+		if (pmd_dirty(old_pmd))
+			SetPageDirty(page);
+		write = pmd_write(old_pmd);
+		young = pmd_young(old_pmd);
+		soft_dirty = pmd_soft_dirty(old_pmd);
+	}
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	if (pmd_dirty(old_pmd))
-		SetPageDirty(page);
-	write = pmd_write(old_pmd);
-	young = pmd_young(old_pmd);
-	soft_dirty = pmd_soft_dirty(old_pmd);

 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.

>
>  	/*
>  	 * Withdraw the table only after we mark the pmd entry invalid.
>
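
The point that write permission lives in the migration entry itself, rather than in a PMD permission bit, can be sketched as follows. This is a simplified user-space model and not the kernel's definitions: the `demo_*` names, the type values 30 and 31, and the 58-bit shift are assumptions made for illustration. The idea it mirrors is that read and write migration entries are distinct swap types, so a helper such as `is_write_migration_entry()` only needs to compare the type field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: entry type in the top bits, page frame number below. */
#define DEMO_TYPE_SHIFT  58
#define DEMO_OFFSET_MASK ((1ULL << DEMO_TYPE_SHIFT) - 1)

/* Hypothetical type values standing in for the read/write migration types. */
enum demo_swp_type {
	DEMO_MIGRATION_READ  = 30,
	DEMO_MIGRATION_WRITE = 31,
};

typedef struct {
	uint64_t val;
} demo_swp_entry_t;

/* Encode a migration entry: the write permission is carried by the type. */
demo_swp_entry_t demo_make_migration_entry(uint64_t pfn, bool write)
{
	uint64_t type = write ? DEMO_MIGRATION_WRITE : DEMO_MIGRATION_READ;
	demo_swp_entry_t entry = {
		.val = (type << DEMO_TYPE_SHIFT) | (pfn & DEMO_OFFSET_MASK),
	};

	return entry;
}

unsigned int demo_swp_type(demo_swp_entry_t entry)
{
	return (unsigned int)(entry.val >> DEMO_TYPE_SHIFT);
}

uint64_t demo_swp_offset(demo_swp_entry_t entry)
{
	return entry.val & DEMO_OFFSET_MASK;
}

/* Counterpart of the check suggested above: writable iff the type says so. */
bool demo_is_write_migration_entry(demo_swp_entry_t entry)
{
	return demo_swp_type(entry) == DEMO_MIGRATION_WRITE;
}
```

In this model the split path would recover `write` from `demo_is_write_migration_entry(entry)` and the page from `demo_swp_offset(entry)`, without consulting any present-entry permission bit.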
* Re: [PATCH v2] mm: thp: fix flags for pmd migration when split
From: Zi Yan @ 2018-12-11 19:07 UTC
To: Konstantin Khlebnikov, Peter Xu
Cc: linux-kernel, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov,
	Matthew Wilcox, Michal Hocko, Dave Jiang, Aneesh Kumar K.V,
	Souptick Joarder, linux-mm

On 11 Dec 2018, at 3:21, Konstantin Khlebnikov wrote:

>
> Write/read-only is encoded in the migration entry.
> I suppose there should be something like this:
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2151,16 +2151,21 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>
>  		entry = pmd_to_swp_entry(old_pmd);
>  		page = pfn_to_page(swp_offset(entry));
> +		write = is_write_migration_entry(entry);
> +		young = false;
> +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
>  	} else
>  #endif
> +	{
>  		page = pmd_page(old_pmd);
> +		if (pmd_dirty(old_pmd))
> +			SetPageDirty(page);
> +		write = pmd_write(old_pmd);
> +		young = pmd_young(old_pmd);
> +		soft_dirty = pmd_soft_dirty(old_pmd);
> +	}
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	page_ref_add(page, HPAGE_PMD_NR - 1);
> -	if (pmd_dirty(old_pmd))
> -		SetPageDirty(page);
> -	write = pmd_write(old_pmd);
> -	young = pmd_young(old_pmd);
> -	soft_dirty = pmd_soft_dirty(old_pmd);
>
>  	/*
>  	 * Withdraw the table only after we mark the pmd entry invalid.
>

This one should fix the issue. Thanks.

Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>

Fixes 84c3fc4e9c563 ("mm: thp: check pmd migration entry in common path")

Do we need to cc: stable@vger.kernel.org # 4.14+ ?

--
Best Regards,
Yan Zi
* Re: [PATCH v2] mm: thp: fix flags for pmd migration when split
From: Peter Xu @ 2018-12-12  5:15 UTC
To: Konstantin Khlebnikov
Cc: linux-kernel, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov,
	Matthew Wilcox, Michal Hocko, Dave Jiang, Aneesh Kumar K.V,
	Souptick Joarder, linux-mm

On Tue, Dec 11, 2018 at 11:21:44AM +0300, Konstantin Khlebnikov wrote:
> On 11.12.2018 8:12, Peter Xu wrote:
> > When splitting a huge migrating PMD, we'll transfer all the existing
> > PMD bits and apply them again onto the small PTEs.  However we are
> > fetching the bits unconditionally via pmd_soft_dirty(), pmd_write()
> > or pmd_young() while actually they don't make sense at all when it's
> > a migration entry.  Fix them up by making them conditional.
> >
> > Note that, if my understanding of the problem is correct, then without
> > the patch there is a chance of losing some of the dirty bits in the
> > migrating pmd pages (on x86_64 we're fetching bit 11, which is part of
> > the swap offset, instead of bit 2), and that could potentially corrupt
> > the memory of a userspace program which depends on the dirty bit.
> >
> > CC: Andrea Arcangeli <aarcange@redhat.com>
> > CC: Andrew Morton <akpm@linux-foundation.org>
> > CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > CC: Matthew Wilcox <willy@infradead.org>
> > CC: Michal Hocko <mhocko@suse.com>
> > CC: Dave Jiang <dave.jiang@intel.com>
> > CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > CC: Souptick Joarder <jrdr.linux@gmail.com>
> > CC: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > CC: linux-mm@kvack.org
> > CC: linux-kernel@vger.kernel.org
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > v2:
> > - fix it up for young/write/dirty bits too [Konstantin]
> > ---
> >  mm/huge_memory.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index f2d19e4fe854..b00941b3d342 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2157,11 +2157,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  		page = pmd_page(old_pmd);
> >  	VM_BUG_ON_PAGE(!page_count(page), page);
> >  	page_ref_add(page, HPAGE_PMD_NR - 1);
> > -	if (pmd_dirty(old_pmd))
> > -		SetPageDirty(page);
> > -	write = pmd_write(old_pmd);
> > -	young = pmd_young(old_pmd);
> > -	soft_dirty = pmd_soft_dirty(old_pmd);
> > +	if (unlikely(pmd_migration)) {
> > +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> > +		young = write = false;
> > +	} else {
> > +		if (pmd_dirty(old_pmd))
> > +			SetPageDirty(page);
> > +		write = pmd_write(old_pmd);
> > +		young = pmd_young(old_pmd);
> > +		soft_dirty = pmd_soft_dirty(old_pmd);
> > +	}
>
> Write/read-only is encoded in the migration entry.
> I suppose there should be something like this:
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2151,16 +2151,21 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>
>  		entry = pmd_to_swp_entry(old_pmd);
>  		page = pfn_to_page(swp_offset(entry));
> +		write = is_write_migration_entry(entry);
> +		young = false;
> +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
>  	} else
>  #endif
> +	{
>  		page = pmd_page(old_pmd);
> +		if (pmd_dirty(old_pmd))
> +			SetPageDirty(page);
> +		write = pmd_write(old_pmd);
> +		young = pmd_young(old_pmd);
> +		soft_dirty = pmd_soft_dirty(old_pmd);
> +	}
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	page_ref_add(page, HPAGE_PMD_NR - 1);
> -	if (pmd_dirty(old_pmd))
> -		SetPageDirty(page);
> -	write = pmd_write(old_pmd);
> -	young = pmd_young(old_pmd);
> -	soft_dirty = pmd_soft_dirty(old_pmd);
>
>  	/*
>  	 * Withdraw the table only after we mark the pmd entry invalid.
>

Oops yes, I missed the write bit.  Thanks for pointing it out.

Should I repost with your authorship and your sign-off?  Or I could
even drop the CONFIG_ARCH_ENABLE_THP_MIGRATION ifdef along with that,
since I don't see much gain in keeping it:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f2d19e4fe854..aebade83cec9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2145,23 +2145,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 */
 	old_pmd = pmdp_invalidate(vma, haddr, pmd);
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	pmd_migration = is_pmd_migration_entry(old_pmd);
-	if (pmd_migration) {
+	if (unlikely(pmd_migration)) {
 		swp_entry_t entry;
 
 		entry = pmd_to_swp_entry(old_pmd);
 		page = pfn_to_page(swp_offset(entry));
-	} else
-#endif
+		write = is_write_migration_entry(entry);
+		young = false;
+		soft_dirty = pmd_swp_soft_dirty(old_pmd);
+	} else {
 		page = pmd_page(old_pmd);
+		if (pmd_dirty(old_pmd))
+			SetPageDirty(page);
+		write = pmd_write(old_pmd);
+		young = pmd_young(old_pmd);
+		soft_dirty = pmd_soft_dirty(old_pmd);
+	}
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	if (pmd_dirty(old_pmd))
-		SetPageDirty(page);
-	write = pmd_write(old_pmd);
-	young = pmd_young(old_pmd);
-	soft_dirty = pmd_soft_dirty(old_pmd);
 
 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.

Thanks,

-- 
Peter Xu
* Re: [PATCH v2] mm: thp: fix flags for pmd migration when split
From: Konstantin Khlebnikov @ 2018-12-12 13:51 UTC
To: peterx
Cc: Константин Хлебников, Linux Kernel Mailing List, Andrea Arcangeli,
	Andrew Morton, Kirill A. Shutemov, Matthew Wilcox, Michal Hocko,
	dave.jiang, Aneesh Kumar K.V, Souptick Joarder, linux-mm

On Wed, Dec 12, 2018 at 8:15 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Dec 11, 2018 at 11:21:44AM +0300, Konstantin Khlebnikov wrote:
> > On 11.12.2018 8:12, Peter Xu wrote:
> > > When splitting a huge migrating PMD, we'll transfer all the existing
> > > PMD bits and apply them again onto the small PTEs.  However we are
> > > fetching the bits unconditionally via pmd_soft_dirty(), pmd_write()
> > > or pmd_young() while actually they don't make sense at all when it's
> > > a migration entry.  Fix them up by making them conditional.
> > >
> > > Note that, if my understanding of the problem is correct, then without
> > > the patch there is a chance of losing some of the dirty bits in the
> > > migrating pmd pages (on x86_64 we're fetching bit 11, which is part of
> > > the swap offset, instead of bit 2), and that could potentially corrupt
> > > the memory of a userspace program which depends on the dirty bit.
> > >
> > > CC: Andrea Arcangeli <aarcange@redhat.com>
> > > CC: Andrew Morton <akpm@linux-foundation.org>
> > > CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > CC: Matthew Wilcox <willy@infradead.org>
> > > CC: Michal Hocko <mhocko@suse.com>
> > > CC: Dave Jiang <dave.jiang@intel.com>
> > > CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > CC: Souptick Joarder <jrdr.linux@gmail.com>
> > > CC: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > CC: linux-mm@kvack.org
> > > CC: linux-kernel@vger.kernel.org
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > v2:
> > > - fix it up for young/write/dirty bits too [Konstantin]
> > > ---
> > >  mm/huge_memory.c | 15 ++++++++++-----
> > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index f2d19e4fe854..b00941b3d342 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2157,11 +2157,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > >  		page = pmd_page(old_pmd);
> > >  	VM_BUG_ON_PAGE(!page_count(page), page);
> > >  	page_ref_add(page, HPAGE_PMD_NR - 1);
> > > -	if (pmd_dirty(old_pmd))
> > > -		SetPageDirty(page);
> > > -	write = pmd_write(old_pmd);
> > > -	young = pmd_young(old_pmd);
> > > -	soft_dirty = pmd_soft_dirty(old_pmd);
> > > +	if (unlikely(pmd_migration)) {
> > > +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> > > +		young = write = false;
> > > +	} else {
> > > +		if (pmd_dirty(old_pmd))
> > > +			SetPageDirty(page);
> > > +		write = pmd_write(old_pmd);
> > > +		young = pmd_young(old_pmd);
> > > +		soft_dirty = pmd_soft_dirty(old_pmd);
> > > +	}
> >
> > Write/read-only is encoded in the migration entry.
> > I suppose there should be something like this:
> >
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2151,16 +2151,21 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >
> >  		entry = pmd_to_swp_entry(old_pmd);
> >  		page = pfn_to_page(swp_offset(entry));
> > +		write = is_write_migration_entry(entry);
> > +		young = false;
> > +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> >  	} else
> >  #endif
> > +	{
> >  		page = pmd_page(old_pmd);
> > +		if (pmd_dirty(old_pmd))
> > +			SetPageDirty(page);
> > +		write = pmd_write(old_pmd);
> > +		young = pmd_young(old_pmd);
> > +		soft_dirty = pmd_soft_dirty(old_pmd);
> > +	}
> >  	VM_BUG_ON_PAGE(!page_count(page), page);
> >  	page_ref_add(page, HPAGE_PMD_NR - 1);
> > -	if (pmd_dirty(old_pmd))
> > -		SetPageDirty(page);
> > -	write = pmd_write(old_pmd);
> > -	young = pmd_young(old_pmd);
> > -	soft_dirty = pmd_soft_dirty(old_pmd);
> >
> >  	/*
> >  	 * Withdraw the table only after we mark the pmd entry invalid.
> >
>
> Oops yes, I missed the write bit.  Thanks for pointing it out.
>
> Should I repost with your authorship and your sign-off?

Feel free to use this piece for your own patch.

> Or I could
> even drop the CONFIG_ARCH_ENABLE_THP_MIGRATION ifdef along with that,
> since I don't see much gain in keeping it:

Yep, this ifdef could be removed.
Without CONFIG_ARCH_ENABLE_THP_MIGRATION, is_pmd_migration_entry() is
constant 0, so the compiler should eliminate the "if" branch.

>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f2d19e4fe854..aebade83cec9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2145,23 +2145,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>  	 */
>  	old_pmd = pmdp_invalidate(vma, haddr, pmd);
>
> -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>  	pmd_migration = is_pmd_migration_entry(old_pmd);
> -	if (pmd_migration) {
> +	if (unlikely(pmd_migration)) {
>  		swp_entry_t entry;
>
>  		entry = pmd_to_swp_entry(old_pmd);
>  		page = pfn_to_page(swp_offset(entry));
> -	} else
> -#endif
> +		write = is_write_migration_entry(entry);
> +		young = false;
> +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> +	} else {
>  		page = pmd_page(old_pmd);
> +		if (pmd_dirty(old_pmd))
> +			SetPageDirty(page);
> +		write = pmd_write(old_pmd);
> +		young = pmd_young(old_pmd);
> +		soft_dirty = pmd_soft_dirty(old_pmd);
> +	}
>  	VM_BUG_ON_PAGE(!page_count(page), page);
>  	page_ref_add(page, HPAGE_PMD_NR - 1);
> -	if (pmd_dirty(old_pmd))
> -		SetPageDirty(page);
> -	write = pmd_write(old_pmd);
> -	young = pmd_young(old_pmd);
> -	soft_dirty = pmd_soft_dirty(old_pmd);
>
>  	/*
>  	 * Withdraw the table only after we mark the pmd entry invalid.
>
> Thanks,
>
> --
> Peter Xu
>
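
The argument for dropping the ifdef can be illustrated with a small user-space sketch. The names here (`DEMO_THP_MIGRATION`, `demo_is_migration_entry()`, `demo_split_path()`) are hypothetical stand-ins, not kernel symbols, and the "low bit marks a migration entry" rule is invented for the demo. The pattern is the one described above: when the feature is configured out, the predicate is an inline function returning constant 0, so the caller needs no `#ifdef` and an optimizing compiler can discard the guarded branch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Leave DEMO_THP_MIGRATION undefined to model a build without the feature.
 * In that case the predicate is a constant-false inline stub.
 */
#ifdef DEMO_THP_MIGRATION
static inline bool demo_is_migration_entry(unsigned long pmd)
{
	return pmd & 1UL; /* invented encoding, for illustration only */
}
#else
static inline bool demo_is_migration_entry(unsigned long pmd)
{
	(void)pmd;
	return false;
}
#endif

/* Caller written without any #ifdef around the migration branch. */
int demo_split_path(unsigned long pmd)
{
	if (demo_is_migration_entry(pmd))
		return 1; /* migration-entry path; dead code when stubbed out */

	return 0; /* present-pmd path */
}
```

Built without `DEMO_THP_MIGRATION`, `demo_split_path()` returns 0 for every input, and with optimization enabled the first branch is typically removed from the generated code, which is why keeping the ifdef in the caller gains little.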
* Re: [PATCH v2] mm: thp: fix flags for pmd migration when split
From: Peter Xu @ 2018-12-13  3:22 UTC
To: Konstantin Khlebnikov
Cc: Константин Хлебников, Linux Kernel Mailing List, Andrea Arcangeli,
	Andrew Morton, Kirill A. Shutemov, Matthew Wilcox, Michal Hocko,
	dave.jiang, Aneesh Kumar K.V, Souptick Joarder, linux-mm, Zi Yan

On Wed, Dec 12, 2018 at 04:51:38PM +0300, Konstantin Khlebnikov wrote:
> On Wed, Dec 12, 2018 at 8:15 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Dec 11, 2018 at 11:21:44AM +0300, Konstantin Khlebnikov wrote:
> > > On 11.12.2018 8:12, Peter Xu wrote:
> > > > When splitting a huge migrating PMD, we'll transfer all the existing
> > > > PMD bits and apply them again onto the small PTEs.  However we are
> > > > fetching the bits unconditionally via pmd_soft_dirty(), pmd_write()
> > > > or pmd_young() while actually they don't make sense at all when it's
> > > > a migration entry.  Fix them up by making them conditional.
> > > >
> > > > Note that, if my understanding of the problem is correct, then without
> > > > the patch there is a chance of losing some of the dirty bits in the
> > > > migrating pmd pages (on x86_64 we're fetching bit 11, which is part of
> > > > the swap offset, instead of bit 2), and that could potentially corrupt
> > > > the memory of a userspace program which depends on the dirty bit.
> > > >
> > > > CC: Andrea Arcangeli <aarcange@redhat.com>
> > > > CC: Andrew Morton <akpm@linux-foundation.org>
> > > > CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > > CC: Matthew Wilcox <willy@infradead.org>
> > > > CC: Michal Hocko <mhocko@suse.com>
> > > > CC: Dave Jiang <dave.jiang@intel.com>
> > > > CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > > > CC: Souptick Joarder <jrdr.linux@gmail.com>
> > > > CC: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > > CC: linux-mm@kvack.org
> > > > CC: linux-kernel@vger.kernel.org
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > > v2:
> > > > - fix it up for young/write/dirty bits too [Konstantin]
> > > > ---
> > > >  mm/huge_memory.c | 15 ++++++++++-----
> > > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > index f2d19e4fe854..b00941b3d342 100644
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -2157,11 +2157,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > > >  		page = pmd_page(old_pmd);
> > > >  	VM_BUG_ON_PAGE(!page_count(page), page);
> > > >  	page_ref_add(page, HPAGE_PMD_NR - 1);
> > > > -	if (pmd_dirty(old_pmd))
> > > > -		SetPageDirty(page);
> > > > -	write = pmd_write(old_pmd);
> > > > -	young = pmd_young(old_pmd);
> > > > -	soft_dirty = pmd_soft_dirty(old_pmd);
> > > > +	if (unlikely(pmd_migration)) {
> > > > +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> > > > +		young = write = false;
> > > > +	} else {
> > > > +		if (pmd_dirty(old_pmd))
> > > > +			SetPageDirty(page);
> > > > +		write = pmd_write(old_pmd);
> > > > +		young = pmd_young(old_pmd);
> > > > +		soft_dirty = pmd_soft_dirty(old_pmd);
> > > > +	}
> > >
> > > Write/read-only is encoded in the migration entry.
> > > I suppose there should be something like this:
> > >
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2151,16 +2151,21 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > >
> > >  		entry = pmd_to_swp_entry(old_pmd);
> > >  		page = pfn_to_page(swp_offset(entry));
> > > +		write = is_write_migration_entry(entry);
> > > +		young = false;
> > > +		soft_dirty = pmd_swp_soft_dirty(old_pmd);
> > >  	} else
> > >  #endif
> > > +	{
> > >  		page = pmd_page(old_pmd);
> > > +		if (pmd_dirty(old_pmd))
> > > +			SetPageDirty(page);
> > > +		write = pmd_write(old_pmd);
> > > +		young = pmd_young(old_pmd);
> > > +		soft_dirty = pmd_soft_dirty(old_pmd);
> > > +	}
> > >  	VM_BUG_ON_PAGE(!page_count(page), page);
> > >  	page_ref_add(page, HPAGE_PMD_NR - 1);
> > > -	if (pmd_dirty(old_pmd))
> > > -		SetPageDirty(page);
> > > -	write = pmd_write(old_pmd);
> > > -	young = pmd_young(old_pmd);
> > > -	soft_dirty = pmd_soft_dirty(old_pmd);
> > >
> > >  	/*
> > >  	 * Withdraw the table only after we mark the pmd entry invalid.
> > >
> >
> > Oops yes, I missed the write bit.  Thanks for pointing it out.
> >
> > Should I repost with your authorship and your sign-off?
>
> Feel free to use this piece for your own patch.
>
> > Or I could
> > even drop the CONFIG_ARCH_ENABLE_THP_MIGRATION ifdef along with that,
> > since I don't see much gain in keeping it:
>
> Yep, this ifdef could be removed.
> Without CONFIG_ARCH_ENABLE_THP_MIGRATION, is_pmd_migration_entry() is
> constant 0, so the compiler should eliminate the "if" branch.

Thank you, Konstantin.  I'll post v3 with the macro dropped.

Regards,

-- 
Peter Xu