All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Ming Lin <mlin@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Simon Ser <contact@emersion.fr>,
	Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	virtio-fs-list <virtio-fs@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
Date: Mon, 28 Jun 2021 10:27:23 -0400	[thread overview]
Message-ID: <20210628142723.GB1803896@redhat.com> (raw)
In-Reply-To: <1622792602-40459-3-git-send-email-mlin@kernel.org>

On Fri, Jun 04, 2021 at 12:43:22AM -0700, Ming Lin wrote:
> Adds new flag MAP_NOSIGBUS of mmap() to specify the behavior of
> "don't SIGBUS on fault". Right now, this flag is only allowed
> for private mapping.
> 
> For MAP_NOSIGBUS mapping, map in the zero page on read fault
> or fill a freshly allocated page with zeroes on write fault.

I am wondering if this could be of limited use for me if MAP_NOSIGBUS
were to be supported for shared mappings as well.

When virtiofs is run with dax enabled, then it is possible that if
a file is shared between two guests, then one guest truncates the
file and second guest tries to do load/store operation. Given current
kvm architecture, there is no mechanism to propagate SIGBUS to guest
process, instead KVM retries page fault infinitely and guest cpu/process
hangs.

Ideally we want this error to propagate all the way back into the
guest and to the guest process but that solution is not in place yet.

https://lore.kernel.org/kvm/20200406190951.GA19259@redhat.com/

In the absense of a proper solution, one could think of mapping
shared file on host with MAP_NOSIGBUS, and hopefully that means
kvm will be able to resolve fault to a zero filled page and guest
will not hang. But this means that data sharing between two processes
is now broken. Writes by process A will not be visible to process B
in another once this situation happens, IIUC.

So if we were to MAP_NOSIGBUS, guest will not hang but failures resulting
from ftruncate will be silent and will be noticed sometime later. I guess
not exactly a very pleasant scenario...

Thanks
Vivek



> 
> Signed-off-by: Ming Lin <mlin@kernel.org>
> ---
>  arch/parisc/include/uapi/asm/mman.h          |  1 +
>  include/linux/mm.h                           |  2 ++
>  include/linux/mman.h                         |  1 +
>  include/uapi/asm-generic/mman-common.h       |  1 +
>  mm/memory.c                                  | 11 +++++++++++
>  mm/mmap.c                                    |  4 ++++
>  tools/include/uapi/asm-generic/mman-common.h |  1 +
>  7 files changed, 21 insertions(+)
> 
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index ab78cba..eecf9af 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,7 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  #define MAP_FIXED_NOREPLACE 0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS	0x200000	/* do not SIGBUS on fault */
>  #define MAP_UNINITIALIZED 0		/* uninitialized anonymous mmap */
>  
>  #define MS_SYNC		1		/* synchronous memory sync */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9e86ca1..100d122 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  # define VM_UFFD_MINOR		VM_NONE
>  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
>  
> +#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
> +
>  /* Bits set in the VMA until the stack is in its final location */
>  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
>  
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index b2cbae9..c966b08 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
>  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
>  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
> +	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
>  	       arch_calc_vm_flag_bits(flags);
>  }
>  
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index f94f65d..a2a5333 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -29,6 +29,7 @@
>  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
>  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
>  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
>  
>  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> diff --git a/mm/memory.c b/mm/memory.c
> index 8d5e583..6b5a897 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  	}
>  
>  	ret = vma->vm_ops->fault(vmf);
> +	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
> +		/*
> +		 * For MAP_NOSIGBUS mapping, map in the zero page on read fault
> +		 * or fill a freshly allocated page with zeroes on write fault
> +		 */
> +		ret = do_anonymous_page(vmf);
> +		if (!ret)
> +			ret = VM_FAULT_NOPAGE;
> +		return ret;
> +	}
> +
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
>  			    VM_FAULT_DONE_COW)))
>  		return ret;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 8bed547..d5c9fb5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (!len)
>  		return -EINVAL;
>  
> +	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */
> +	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
> +		return -EINVAL;
> +
>  	/*
>  	 * Does the application expect PROT_READ to imply PROT_EXEC?
>  	 *
> diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
> index f94f65d..a2a5333 100644
> --- a/tools/include/uapi/asm-generic/mman-common.h
> +++ b/tools/include/uapi/asm-generic/mman-common.h
> @@ -29,6 +29,7 @@
>  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
>  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
>  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
>  
>  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> -- 
> 1.8.3.1
> 


WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com>
To: Ming Lin <mlin@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Simon Ser <contact@emersion.fr>, Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	virtio-fs-list <virtio-fs@redhat.com>,
	linux-mm@kvack.org, linux-api@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Virtio-fs] [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap()
Date: Mon, 28 Jun 2021 10:27:23 -0400	[thread overview]
Message-ID: <20210628142723.GB1803896@redhat.com> (raw)
In-Reply-To: <1622792602-40459-3-git-send-email-mlin@kernel.org>

On Fri, Jun 04, 2021 at 12:43:22AM -0700, Ming Lin wrote:
> Adds new flag MAP_NOSIGBUS of mmap() to specify the behavior of
> "don't SIGBUS on fault". Right now, this flag is only allowed
> for private mapping.
> 
> For MAP_NOSIGBUS mapping, map in the zero page on read fault
> or fill a freshly allocated page with zeroes on write fault.

I am wondering if this could be of limited use for me if MAP_NOSIGBUS
were to be supported for shared mappings as well.

When virtiofs is run with dax enabled, then it is possible that if
a file is shared between two guests, then one guest truncates the
file and second guest tries to do load/store operation. Given current
kvm architecture, there is no mechanism to propagate SIGBUS to guest
process, instead KVM retries page fault infinitely and guest cpu/process
hangs.

Ideally we want this error to propagate all the way back into the
guest and to the guest process but that solution is not in place yet.

https://lore.kernel.org/kvm/20200406190951.GA19259@redhat.com/

In the absense of a proper solution, one could think of mapping
shared file on host with MAP_NOSIGBUS, and hopefully that means
kvm will be able to resolve fault to a zero filled page and guest
will not hang. But this means that data sharing between two processes
is now broken. Writes by process A will not be visible to process B
in another once this situation happens, IIUC.

So if we were to MAP_NOSIGBUS, guest will not hang but failures resulting
from ftruncate will be silent and will be noticed sometime later. I guess
not exactly a very pleasant scenario...

Thanks
Vivek



> 
> Signed-off-by: Ming Lin <mlin@kernel.org>
> ---
>  arch/parisc/include/uapi/asm/mman.h          |  1 +
>  include/linux/mm.h                           |  2 ++
>  include/linux/mman.h                         |  1 +
>  include/uapi/asm-generic/mman-common.h       |  1 +
>  mm/memory.c                                  | 11 +++++++++++
>  mm/mmap.c                                    |  4 ++++
>  tools/include/uapi/asm-generic/mman-common.h |  1 +
>  7 files changed, 21 insertions(+)
> 
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index ab78cba..eecf9af 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,7 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  #define MAP_FIXED_NOREPLACE 0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS	0x200000	/* do not SIGBUS on fault */
>  #define MAP_UNINITIALIZED 0		/* uninitialized anonymous mmap */
>  
>  #define MS_SYNC		1		/* synchronous memory sync */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9e86ca1..100d122 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  # define VM_UFFD_MINOR		VM_NONE
>  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
>  
> +#define VM_NOSIGBUS		VM_FLAGS_BIT(38)	/* Do not SIGBUS on fault */
> +
>  /* Bits set in the VMA until the stack is in its final location */
>  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
>  
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index b2cbae9..c966b08 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags)
>  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
>  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>  	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
> +	       _calc_vm_trans(flags, MAP_NOSIGBUS,   VM_NOSIGBUS  ) |
>  	       arch_calc_vm_flag_bits(flags);
>  }
>  
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index f94f65d..a2a5333 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -29,6 +29,7 @@
>  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
>  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
>  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
>  
>  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> diff --git a/mm/memory.c b/mm/memory.c
> index 8d5e583..6b5a897 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  	}
>  
>  	ret = vma->vm_ops->fault(vmf);
> +	if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) {
> +		/*
> +		 * For MAP_NOSIGBUS mapping, map in the zero page on read fault
> +		 * or fill a freshly allocated page with zeroes on write fault
> +		 */
> +		ret = do_anonymous_page(vmf);
> +		if (!ret)
> +			ret = VM_FAULT_NOPAGE;
> +		return ret;
> +	}
> +
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
>  			    VM_FAULT_DONE_COW)))
>  		return ret;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 8bed547..d5c9fb5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (!len)
>  		return -EINVAL;
>  
> +	/* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */
> +	if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE))
> +		return -EINVAL;
> +
>  	/*
>  	 * Does the application expect PROT_READ to imply PROT_EXEC?
>  	 *
> diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
> index f94f65d..a2a5333 100644
> --- a/tools/include/uapi/asm-generic/mman-common.h
> +++ b/tools/include/uapi/asm-generic/mman-common.h
> @@ -29,6 +29,7 @@
>  #define MAP_HUGETLB		0x040000	/* create a huge page mapping */
>  #define MAP_SYNC		0x080000 /* perform synchronous page faults for the mapping */
>  #define MAP_FIXED_NOREPLACE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +#define MAP_NOSIGBUS		0x200000	/* do not SIGBUS on fault */
>  
>  #define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> -- 
> 1.8.3.1
> 


  parent reply	other threads:[~2021-06-28 14:30 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-04  7:43 [PATCH v2 0/2] mm: support NOSIGBUS on fault of mmap Ming Lin
2021-06-04  7:43 ` [PATCH v2 1/2] mm: make "vm_flags" be an u64 Ming Lin
2021-06-04  7:43 ` [PATCH v2 2/2] mm: adds NOSIGBUS extension to mmap() Ming Lin
2021-06-04 15:24   ` Kirill A. Shutemov
2021-06-04 16:22     ` Ming Lin
2021-06-28 14:27   ` Vivek Goyal [this message]
2021-06-28 14:27     ` [Virtio-fs] " Vivek Goyal
2021-06-30 16:37     ` Ming Lin
2021-06-30 16:37       ` [Virtio-fs] " Ming Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210628142723.GB1803896@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=contact@emersion.fr \
    --cc=dgilbert@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=mlin@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=virtio-fs@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.