* [PATCH] mm: Fix warning in insert_pfn()
@ 2018-08-24 15:45 Jan Kara
  2018-10-03 16:35 ` Theodore Y. Ts'o
  2018-10-11  0:30 ` Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Kara @ 2018-08-24 15:45 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, Ross Zwisler, Dan Williams, linux-mm, Dave Jiang, Jan Kara

In DAX mode a write pagefault can race with write(2) in the following
way:

CPU0                            CPU1
                                write fault for mapped zero page (hole)
dax_iomap_rw()
  iomap_apply()
    xfs_file_iomap_begin()
      - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - invalidates radix tree entries in given range
                                dax_iomap_pte_fault()
                                  grab_mapping_entry()
                                    - no entry found, creates empty
                                  ...
                                  xfs_file_iomap_begin()
                                    - finds already allocated block
                                  ...
                                  vmf_insert_mixed_mkwrite()
                                    - WARNs and does nothing because the
                                      zero page is still mapped in the PTE
        unmap_mapping_pages()

This race triggers the WARN_ON_ONCE() in insert_pfn() and is occasionally
hit by fstest generic/344. The race is otherwise harmless: before the
write(2) on CPU0 finishes, it invalidates the page tables properly, so
users of the mapping see the data written by write(2) from that point on.
So restrict the warning to the case where the PFN in the PTE is not the
zero page.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 83aef222f11b..e82cd2125d72 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1787,10 +1787,15 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			 * in may not match the PFN we have mapped if the
 			 * mapped PFN is a writeable COW page.  In the mkwrite
 			 * case we are creating a writable PTE for a shared
-			 * mapping and we expect the PFNs to match.
+			 * mapping and we expect the PFNs to match. If they
+			 * don't match, we are likely racing with block
+			 * allocation and mapping invalidation so just skip the
+			 * update.
 			 */
-			if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
+			if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) {
+				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte)));
 				goto out_unlock;
+			}
 			entry = *pte;
 			goto out_mkwrite;
 		} else
-- 
2.16.4
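
The patched check has three possible outcomes. The following is a minimal
userspace sketch of that decision, not kernel code. MODEL_ZERO_PFN,
model_is_zero_pfn(), model_mkwrite_check() and the enum are hypothetical
names invented for illustration, and plain unsigned long values stand in
for pte_pfn(*pte) and pfn_t_to_pfn(pfn).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the zero page's PFN (assumption, not a kernel value). */
#define MODEL_ZERO_PFN 0x1000UL

enum mkwrite_outcome {
	MKWRITE_UPGRADE,	/* PFNs match: go on to make the PTE writable */
	MKWRITE_SKIP_QUIET,	/* zero page still mapped: benign race, skip */
	MKWRITE_SKIP_WARN,	/* any other mismatch: warn, then skip */
};

/* Models is_zero_pfn() for this sketch only. */
static bool model_is_zero_pfn(unsigned long pfn)
{
	return pfn == MODEL_ZERO_PFN;
}

/*
 * Mirrors the shape of the patched branch: on a PFN mismatch the update
 * is always skipped, and a warning is raised only when the currently
 * mapped PFN is not the zero page.
 */
static enum mkwrite_outcome model_mkwrite_check(unsigned long mapped_pfn,
						unsigned long new_pfn)
{
	if (mapped_pfn != new_pfn) {
		if (!model_is_zero_pfn(mapped_pfn))
			return MKWRITE_SKIP_WARN;
		return MKWRITE_SKIP_QUIET;
	}
	return MKWRITE_UPGRADE;
}

/* Exercise each of the three cases once. */
static void model_self_check(void)
{
	assert(model_mkwrite_check(0x2000UL, 0x2000UL) == MKWRITE_UPGRADE);
	assert(model_mkwrite_check(MODEL_ZERO_PFN, 0x2000UL) == MKWRITE_SKIP_QUIET);
	assert(model_mkwrite_check(0x3000UL, 0x2000UL) == MKWRITE_SKIP_WARN);
}
```

Only the third outcome corresponds to the WARN_ON_ONCE() that the patch
keeps. The second is the race with write(2) described in the changelog.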


* Re: [PATCH] mm: Fix warning in insert_pfn()
  2018-08-24 15:45 [PATCH] mm: Fix warning in insert_pfn() Jan Kara
@ 2018-10-03 16:35 ` Theodore Y. Ts'o
  2018-10-03 16:56   ` Dan Williams
  2018-10-11  0:30 ` Andrew Morton
  1 sibling, 1 reply; 6+ messages in thread
From: Theodore Y. Ts'o @ 2018-10-03 16:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-ext4, Ross Zwisler, Dan Williams, linux-mm,
	Dave Jiang

On Fri, Aug 24, 2018 at 05:45:42PM +0200, Jan Kara wrote:
> In DAX mode a write pagefault can race with write(2) in the following
> way:
> 
> CPU0                            CPU1
>                                 write fault for mapped zero page (hole)
> dax_iomap_rw()
>   iomap_apply()
>     xfs_file_iomap_begin()
>       - allocates blocks
>     dax_iomap_actor()
>       invalidate_inode_pages2_range()
>         - invalidates radix tree entries in given range
>                                 dax_iomap_pte_fault()
>                                   grab_mapping_entry()
>                                     - no entry found, creates empty
>                                   ...
>                                   xfs_file_iomap_begin()
>                                     - finds already allocated block
>                                   ...
>                                   vmf_insert_mixed_mkwrite()
>                                     - WARNs and does nothing because there
>                                       is still zero page mapped in PTE
>         unmap_mapping_pages()
> 
> This race results in WARN_ON from insert_pfn() and is occasionally
> triggered by fstest generic/344. Note that the race is otherwise
> harmless as before write(2) on CPU0 is finished, we will invalidate page
> tables properly and thus user of mmap will see modified data from
> write(2) from that point on. So just restrict the warning only to the
> case when the PFN in PTE is not zero page.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

I don't see this in linux-next.  What's the status of this patch?

Thanks,

					- Ted


* Re: [PATCH] mm: Fix warning in insert_pfn()
  2018-10-03 16:35 ` Theodore Y. Ts'o
@ 2018-10-03 16:56   ` Dan Williams
  2018-10-04 14:35     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Williams @ 2018-10-03 16:56 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, linux-fsdevel, linux-ext4, Ross Zwisler, Linux MM, Dave Jiang

On Wed, Oct 3, 2018 at 9:40 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Fri, Aug 24, 2018 at 05:45:42PM +0200, Jan Kara wrote:
> > In DAX mode a write pagefault can race with write(2) in the following
> > way:
> >
> > CPU0                            CPU1
> >                                 write fault for mapped zero page (hole)
> > dax_iomap_rw()
> >   iomap_apply()
> >     xfs_file_iomap_begin()
> >       - allocates blocks
> >     dax_iomap_actor()
> >       invalidate_inode_pages2_range()
> >         - invalidates radix tree entries in given range
> >                                 dax_iomap_pte_fault()
> >                                   grab_mapping_entry()
> >                                     - no entry found, creates empty
> >                                   ...
> >                                   xfs_file_iomap_begin()
> >                                     - finds already allocated block
> >                                   ...
> >                                   vmf_insert_mixed_mkwrite()
> >                                     - WARNs and does nothing because there
> >                                       is still zero page mapped in PTE
> >         unmap_mapping_pages()
> >
> > This race results in WARN_ON from insert_pfn() and is occasionally
> > triggered by fstest generic/344. Note that the race is otherwise
> > harmless as before write(2) on CPU0 is finished, we will invalidate page
> > tables properly and thus user of mmap will see modified data from
> > write(2) from that point on. So just restrict the warning only to the
> > case when the PFN in PTE is not zero page.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
>
> I don't see this in linux-next.  What's the status of this patch?
>

It's in Andrew's tree. I believe we are awaiting the next -next
release to rebase on latest mmotm.


* Re: [PATCH] mm: Fix warning in insert_pfn()
  2018-10-03 16:56   ` Dan Williams
@ 2018-10-04 14:35     ` Theodore Y. Ts'o
  0 siblings, 0 replies; 6+ messages in thread
From: Theodore Y. Ts'o @ 2018-10-04 14:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-fsdevel, linux-ext4, Ross Zwisler, Linux MM, Dave Jiang

On Wed, Oct 03, 2018 at 09:56:09AM -0700, Dan Williams wrote:
> 
> It's in Andrew's tree. I believe we are awaiting the next -next
> release to rebase on latest mmotm.

Great, thanks for the update!

					- Ted


* Re: [PATCH] mm: Fix warning in insert_pfn()
  2018-08-24 15:45 [PATCH] mm: Fix warning in insert_pfn() Jan Kara
  2018-10-03 16:35 ` Theodore Y. Ts'o
@ 2018-10-11  0:30 ` Andrew Morton
  2018-10-11  0:46   ` Dan Williams
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-10-11  0:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-ext4, Ross Zwisler, Dan Williams, linux-mm,
	Dave Jiang

On Fri, 24 Aug 2018 17:45:42 +0200 Jan Kara <jack@suse.cz> wrote:

> In DAX mode a write pagefault can race with write(2) in the following
> way:
> 
> CPU0                            CPU1
>                                 write fault for mapped zero page (hole)
> dax_iomap_rw()
>   iomap_apply()
>     xfs_file_iomap_begin()
>       - allocates blocks
>     dax_iomap_actor()
>       invalidate_inode_pages2_range()
>         - invalidates radix tree entries in given range
>                                 dax_iomap_pte_fault()
>                                   grab_mapping_entry()
>                                     - no entry found, creates empty
>                                   ...
>                                   xfs_file_iomap_begin()
>                                     - finds already allocated block
>                                   ...
>                                   vmf_insert_mixed_mkwrite()
>                                     - WARNs and does nothing because there
>                                       is still zero page mapped in PTE
>         unmap_mapping_pages()
> 
> This race results in WARN_ON from insert_pfn() and is occasionally
> triggered by fstest generic/344. Note that the race is otherwise
> harmless as before write(2) on CPU0 is finished, we will invalidate page
> tables properly and thus user of mmap will see modified data from
> write(2) from that point on. So just restrict the warning only to the
> case when the PFN in PTE is not zero page.
> 
> ...
>
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1787,10 +1787,15 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  			 * in may not match the PFN we have mapped if the
>  			 * mapped PFN is a writeable COW page.  In the mkwrite
>  			 * case we are creating a writable PTE for a shared
> -			 * mapping and we expect the PFNs to match.
> +			 * mapping and we expect the PFNs to match. If they
> +			 * don't match, we are likely racing with block
> +			 * allocation and mapping invalidation so just skip the
> +			 * update.
>  			 */
> -			if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
> +			if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) {
> +				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte)));
>  				goto out_unlock;
> +			}
>  			entry = *pte;

Shouldn't we just remove the warning?  We know it happens and we know
why it happens and we know it's harmless.  What's the point in scaring
people?


* Re: [PATCH] mm: Fix warning in insert_pfn()
  2018-10-11  0:30 ` Andrew Morton
@ 2018-10-11  0:46   ` Dan Williams
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Williams @ 2018-10-11  0:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, linux-fsdevel, linux-ext4, Ross Zwisler, Linux MM, Dave Jiang

On Wed, Oct 10, 2018 at 5:37 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 24 Aug 2018 17:45:42 +0200 Jan Kara <jack@suse.cz> wrote:
>
> > In DAX mode a write pagefault can race with write(2) in the following
> > way:
> >
> > CPU0                            CPU1
> >                                 write fault for mapped zero page (hole)
> > dax_iomap_rw()
> >   iomap_apply()
> >     xfs_file_iomap_begin()
> >       - allocates blocks
> >     dax_iomap_actor()
> >       invalidate_inode_pages2_range()
> >         - invalidates radix tree entries in given range
> >                                 dax_iomap_pte_fault()
> >                                   grab_mapping_entry()
> >                                     - no entry found, creates empty
> >                                   ...
> >                                   xfs_file_iomap_begin()
> >                                     - finds already allocated block
> >                                   ...
> >                                   vmf_insert_mixed_mkwrite()
> >                                     - WARNs and does nothing because there
> >                                       is still zero page mapped in PTE
> >         unmap_mapping_pages()
> >
> > This race results in WARN_ON from insert_pfn() and is occasionally
> > triggered by fstest generic/344. Note that the race is otherwise
> > harmless as before write(2) on CPU0 is finished, we will invalidate page
> > tables properly and thus user of mmap will see modified data from
> > write(2) from that point on. So just restrict the warning only to the
> > case when the PFN in PTE is not zero page.
> >
> > ...
> >
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1787,10 +1787,15 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> >                        * in may not match the PFN we have mapped if the
> >                        * mapped PFN is a writeable COW page.  In the mkwrite
> >                        * case we are creating a writable PTE for a shared
> > -                      * mapping and we expect the PFNs to match.
> > +                      * mapping and we expect the PFNs to match. If they
> > +                      * don't match, we are likely racing with block
> > +                      * allocation and mapping invalidation so just skip the
> > +                      * update.
> >                        */
> > -                     if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
> > +                     if (pte_pfn(*pte) != pfn_t_to_pfn(pfn)) {
> > +                             WARN_ON_ONCE(!is_zero_pfn(pte_pfn(*pte)));
> >                               goto out_unlock;
> > +                     }
> >                       entry = *pte;
>
> Shouldn't we just remove the warning?  We know it happens and we know
> why it happens and we know it's harmless.  What's the point in scaring
> people?

tl;dr let's keep it.

I think this fix effectively pushes the remaining warning into "can't
happen" territory, but if it does fire, our DAX assumptions are off
somewhere else. So I think it is useful for developers hacking on the DAX
code to make sure they aren't breaking some fundamental assumption.
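
As background to the "scaring people" concern, a warn-once check reports
at most one time per call site while still returning the condition to the
caller. Below is a rough userspace sketch of that behaviour, assuming an
explicit state struct in place of the per-call-site static flag the
kernel macro uses. struct warn_once_state and model_warn_on_once() are
hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Explicit per-call-site state (assumption: stands in for a static flag). */
struct warn_once_state {
	bool warned;		/* set after the first report */
	unsigned int hits;	/* how many times the condition was true */
};

/*
 * Returns cond unchanged so the caller can still branch on it.
 * Prints a message only the first time cond is true for this state.
 */
static bool model_warn_on_once(struct warn_once_state *st, bool cond,
			       const char *what)
{
	assert(st != NULL);
	if (cond) {
		st->hits++;
		if (!st->warned) {
			st->warned = true;
			fprintf(stderr, "WARNING (model): %s\n", what);
		}
	}
	return cond;
}
```

With this shape, a developer who breaks the assumption sees one report,
and repeated hits on the same call site stay silent.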

