All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
@ 2015-11-12 15:18 Jason J. Herne
  2015-11-12 16:45 ` Christian Borntraeger
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jason J. Herne @ 2015-11-12 15:18 UTC (permalink / raw)
  To: linux-s390; +Cc: linux-mm, akpm, aarcange, borntraeger, Jason J. Herne

MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
hugepage but hugepage_madvise() takes the error path when we ask to turn
on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
new postcopy migration feature to fail on s390 because its first action is
to madvise the guest address space as NOHUGEPAGE. This patch modifies the
code so that the operation succeeds without error now.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c29ddeb..62fe06b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		/*
 		 * Be somewhat over-protective like KSM for now!
 		 */
-		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
+		if (*vm_flags & VM_NO_THP)
 			return -EINVAL;
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
@@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		/*
 		 * Be somewhat over-protective like KSM for now!
 		 */
-		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
+		if (*vm_flags & VM_NO_THP)
 			return -EINVAL;
 		*vm_flags &= ~VM_HUGEPAGE;
 		*vm_flags |= VM_NOHUGEPAGE;
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-12 15:18 [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 Jason J. Herne
@ 2015-11-12 16:45 ` Christian Borntraeger
  2015-11-13 22:58 ` David Rientjes
       [not found] ` <1447341516-18076-1-git-send-email-jjherne-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2015-11-12 16:45 UTC (permalink / raw)
  To: Jason J. Herne, linux-s390; +Cc: linux-mm, akpm, aarcange

On 11/12/2015 04:18 PM, Jason J. Herne wrote:
> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> hugepage but hugepage_madvise() takes the error path when we ask to turn
> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> new postcopy migration feature to fail on s390 because its first action is
> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> code so that the operation succeeds without error now.

maybe add

"For consistency reasons do the same for MADV_HUGEPAGE."

> 
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

Andrew, can you queue this patch?


> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c29ddeb..62fe06b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		/*
>  		 * Be somewhat over-protective like KSM for now!
>  		 */
> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
> +		if (*vm_flags & VM_NO_THP)
>  			return -EINVAL;
>  		*vm_flags &= ~VM_NOHUGEPAGE;
>  		*vm_flags |= VM_HUGEPAGE;
> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		/*
>  		 * Be somewhat over-protective like KSM for now!
>  		 */
> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
> +		if (*vm_flags & VM_NO_THP)
>  			return -EINVAL;
>  		*vm_flags &= ~VM_HUGEPAGE;
>  		*vm_flags |= VM_NOHUGEPAGE;
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-12 15:18 [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 Jason J. Herne
  2015-11-12 16:45 ` Christian Borntraeger
@ 2015-11-13 22:58 ` David Rientjes
       [not found] ` <1447341516-18076-1-git-send-email-jjherne-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2 siblings, 0 replies; 14+ messages in thread
From: David Rientjes @ 2015-11-13 22:58 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: linux-s390, linux-mm, akpm, aarcange, borntraeger

On Thu, 12 Nov 2015, Jason J. Herne wrote:

> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> hugepage but hugepage_madvise() takes the error path when we ask to turn
> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> new postcopy migration feature to fail on s390 because its first action is
> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> code so that the operation succeeds without error now.
> 
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: David Rientjes <rientjes@google.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-12 15:18 [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 Jason J. Herne
@ 2015-11-18 13:31     ` Vlastimil Babka
  2015-11-13 22:58 ` David Rientjes
       [not found] ` <1447341516-18076-1-git-send-email-jjherne-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-11-18 13:31 UTC (permalink / raw)
  To: Jason J. Herne, linux-s390-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	borntraeger-tA70FqPdS9bQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

[CC += linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org]

Since this is a kernel-user-space API change, please CC linux-api@. The kernel
source file Documentation/SubmitChecklist notes that all Linux kernel patches
that change userspace interfaces should be CCed to linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, so
that the various parties who are interested in API changes are informed. For
further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html

On 11/12/2015 04:18 PM, Jason J. Herne wrote:
> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> hugepage but hugepage_madvise() takes the error path when we ask to turn
> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> new postcopy migration feature to fail on s390 because its first action is
> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> code so that the operation succeeds without error now.
> 
> Signed-off-by: Jason J. Herne <jjherne-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Reviewed-by: Andrea Arcangeli <aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Looks like the manpage should be fine, as it wasn't very specific wrt these
madvise flags. The only thing that potentially applies is:

"EINVAL advice is not a valid."

which itself looks like it needs fixing. Valid what, value? As in completely
unknown flags, or flags not valid for the given vma?

Anyway, I agree that it doesn't make sense to fail madvise when the given flag
is already set. On the other hand, I don't think the userspace app should fail
just because of madvise failing? It should in general be an advice that the
kernel is also strictly speaking free to ignore as it shouldn't affect
correctnes, just performance. Yeah, there are exceptions today like
MADV_DONTNEED, but that shouldn't apply to hugepages?
So I think Qemu needs fixing too. Also what happens if the kernel is build
without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL, how does
Qemu handle that?

> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c29ddeb..62fe06b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		/*
>  		 * Be somewhat over-protective like KSM for now!
>  		 */
> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
> +		if (*vm_flags & VM_NO_THP)
>  			return -EINVAL;
>  		*vm_flags &= ~VM_NOHUGEPAGE;
>  		*vm_flags |= VM_HUGEPAGE;
> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		/*
>  		 * Be somewhat over-protective like KSM for now!
>  		 */
> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
> +		if (*vm_flags & VM_NO_THP)
>  			return -EINVAL;
>  		*vm_flags &= ~VM_HUGEPAGE;
>  		*vm_flags |= VM_NOHUGEPAGE;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
@ 2015-11-18 13:31     ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-11-18 13:31 UTC (permalink / raw)
  To: Jason J. Herne, linux-s390
  Cc: linux-mm, akpm, aarcange, borntraeger, linux-api, linux-man

[CC += linux-api@vger.kernel.org]

Since this is a kernel-user-space API change, please CC linux-api@. The kernel
source file Documentation/SubmitChecklist notes that all Linux kernel patches
that change userspace interfaces should be CCed to linux-api@vger.kernel.org, so
that the various parties who are interested in API changes are informed. For
further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html

On 11/12/2015 04:18 PM, Jason J. Herne wrote:
> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> hugepage but hugepage_madvise() takes the error path when we ask to turn
> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> new postcopy migration feature to fail on s390 because its first action is
> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> code so that the operation succeeds without error now.
> 
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Looks like the manpage should be fine, as it wasn't very specific wrt these
madvise flags. The only thing that potentially applies is:

"EINVAL advice is not a valid."

which itself looks like it needs fixing. Valid what, value? As in completely
unknown flags, or flags not valid for the given vma?

Anyway, I agree that it doesn't make sense to fail madvise when the given flag
is already set. On the other hand, I don't think the userspace app should fail
just because of madvise failing? It should in general be an advice that the
kernel is also strictly speaking free to ignore as it shouldn't affect
correctnes, just performance. Yeah, there are exceptions today like
MADV_DONTNEED, but that shouldn't apply to hugepages?
So I think Qemu needs fixing too. Also what happens if the kernel is build
without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL, how does
Qemu handle that?

> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c29ddeb..62fe06b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		/*
>  		 * Be somewhat over-protective like KSM for now!
>  		 */
> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
> +		if (*vm_flags & VM_NO_THP)
>  			return -EINVAL;
>  		*vm_flags &= ~VM_NOHUGEPAGE;
>  		*vm_flags |= VM_HUGEPAGE;
> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>  		/*
>  		 * Be somewhat over-protective like KSM for now!
>  		 */
> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
> +		if (*vm_flags & VM_NO_THP)
>  			return -EINVAL;
>  		*vm_flags &= ~VM_HUGEPAGE;
>  		*vm_flags |= VM_NOHUGEPAGE;
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-18 13:31     ` Vlastimil Babka
@ 2015-11-19  8:22       ` Christian Borntraeger
  -1 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2015-11-19  8:22 UTC (permalink / raw)
  To: Vlastimil Babka, Jason J. Herne, linux-s390
  Cc: linux-mm, akpm, aarcange, linux-api, linux-man, qemu-devel,
	Dr. David Alan Gilbert, Juan Quintela

On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
> [CC += linux-api@vger.kernel.org]
> 
> Since this is a kernel-user-space API change, please CC linux-api@. The kernel
> source file Documentation/SubmitChecklist notes that all Linux kernel patches
> that change userspace interfaces should be CCed to linux-api@vger.kernel.org, so
> that the various parties who are interested in API changes are informed. For
> further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html
> 
> On 11/12/2015 04:18 PM, Jason J. Herne wrote:
>> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
>> hugepage but hugepage_madvise() takes the error path when we ask to turn
>> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
>> new postcopy migration feature to fail on s390 because its first action is
>> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
>> code so that the operation succeeds without error now.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> 
> Looks like the manpage should be fine, as it wasn't very specific wrt these
> madvise flags. The only thing that potentially applies is:
> 
> "EINVAL advice is not a valid."
> 
> which itself looks like it needs fixing. Valid what, value? As in completely
> unknown flags, or flags not valid for the given vma?
> 
> Anyway, I agree that it doesn't make sense to fail madvise when the given flag
> is already set. On the other hand, I don't think the userspace app should fail
> just because of madvise failing? It should in general be an advice that the
> kernel is also strictly speaking free to ignore as it shouldn't affect
> correctnes, just performance. Yeah, there are exceptions today like
> MADV_DONTNEED, but that shouldn't apply to hugepages?
> So I think Qemu needs fixing too.

yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
Can you fix that? 

 Also what happens if the kernel is build
> without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,

Does it? To me it looks more like we would trigger a kernel bug.

mm/madvise.c:
        case MADV_HUGEPAGE:
        case MADV_NOHUGEPAGE:
                error = hugepage_madvise(vma, &new_flags, behavior);  <-----
                if (error)
                        goto out;
                break;
        }


include/linux/huge_mm.h:
static inline int hugepage_madvise(struct vm_area_struct *vma,
                                   unsigned long *vm_flags, int advice)
{
        BUG();
        return 0;
}

If we just remove the BUG() statement the code would actually be correct
in simply ignoring an MADVISE it cannot handle. If you agree, I can
spin a patch.




> how does Qemu handle that?

The normal qemu startup ignores the return value of the madvise. Only the
recent post migration changes want to disable huge pages for userfaultd.
And this code checks the return value. And yes, we should change that
in QEMU.




> 
>> ---
>>  mm/huge_memory.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c29ddeb..62fe06b 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>>  		/*
>>  		 * Be somewhat over-protective like KSM for now!
>>  		 */
>> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
>> +		if (*vm_flags & VM_NO_THP)
>>  			return -EINVAL;
>>  		*vm_flags &= ~VM_NOHUGEPAGE;
>>  		*vm_flags |= VM_HUGEPAGE;
>> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>>  		/*
>>  		 * Be somewhat over-protective like KSM for now!
>>  		 */
>> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
>> +		if (*vm_flags & VM_NO_THP)
>>  			return -EINVAL;
>>  		*vm_flags &= ~VM_HUGEPAGE;
>>  		*vm_flags |= VM_NOHUGEPAGE;
>>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
@ 2015-11-19  8:22       ` Christian Borntraeger
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2015-11-19  8:22 UTC (permalink / raw)
  To: Vlastimil Babka, Jason J. Herne, linux-s390
  Cc: aarcange, linux-man, Juan Quintela, linux-api, qemu-devel,
	Dr. David Alan Gilbert, linux-mm, akpm

On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
> [CC += linux-api@vger.kernel.org]
> 
> Since this is a kernel-user-space API change, please CC linux-api@. The kernel
> source file Documentation/SubmitChecklist notes that all Linux kernel patches
> that change userspace interfaces should be CCed to linux-api@vger.kernel.org, so
> that the various parties who are interested in API changes are informed. For
> further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html
> 
> On 11/12/2015 04:18 PM, Jason J. Herne wrote:
>> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
>> hugepage but hugepage_madvise() takes the error path when we ask to turn
>> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
>> new postcopy migration feature to fail on s390 because its first action is
>> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
>> code so that the operation succeeds without error now.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> 
> Looks like the manpage should be fine, as it wasn't very specific wrt these
> madvise flags. The only thing that potentially applies is:
> 
> "EINVAL advice is not a valid."
> 
> which itself looks like it needs fixing. Valid what, value? As in completely
> unknown flags, or flags not valid for the given vma?
> 
> Anyway, I agree that it doesn't make sense to fail madvise when the given flag
> is already set. On the other hand, I don't think the userspace app should fail
> just because of madvise failing? It should in general be an advice that the
> kernel is also strictly speaking free to ignore as it shouldn't affect
> correctnes, just performance. Yeah, there are exceptions today like
> MADV_DONTNEED, but that shouldn't apply to hugepages?
> So I think Qemu needs fixing too.

yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
Can you fix that? 

 Also what happens if the kernel is build
> without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,

Does it? To me it looks more like we would trigger a kernel bug.

mm/madvise.c:
        case MADV_HUGEPAGE:
        case MADV_NOHUGEPAGE:
                error = hugepage_madvise(vma, &new_flags, behavior);  <-----
                if (error)
                        goto out;
                break;
        }


include/linux/huge_mm.h:
static inline int hugepage_madvise(struct vm_area_struct *vma,
                                   unsigned long *vm_flags, int advice)
{
        BUG();
        return 0;
}

If we just remove the BUG() statement the code would actually be correct
in simply ignoring an MADVISE it cannot handle. If you agree, I can
spin a patch.




> how does Qemu handle that?

The normal qemu startup ignores the return value of the madvise. Only the
recent post migration changes want to disable huge pages for userfaultd.
And this code checks the return value. And yes, we should change that
in QEMU.




> 
>> ---
>>  mm/huge_memory.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c29ddeb..62fe06b 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>>  		/*
>>  		 * Be somewhat over-protective like KSM for now!
>>  		 */
>> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
>> +		if (*vm_flags & VM_NO_THP)
>>  			return -EINVAL;
>>  		*vm_flags &= ~VM_NOHUGEPAGE;
>>  		*vm_flags |= VM_HUGEPAGE;
>> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
>>  		/*
>>  		 * Be somewhat over-protective like KSM for now!
>>  		 */
>> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
>> +		if (*vm_flags & VM_NO_THP)
>>  			return -EINVAL;
>>  		*vm_flags &= ~VM_HUGEPAGE;
>>  		*vm_flags |= VM_NOHUGEPAGE;
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-19  8:22       ` [Qemu-devel] " Christian Borntraeger
  (?)
@ 2015-11-19  9:31           ` Vlastimil Babka
  -1 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-11-19  9:31 UTC (permalink / raw)
  To: Christian Borntraeger, Jason J. Herne, linux-s390-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA, qemu-devel,
	Dr. David Alan Gilbert, Juan Quintela

On 11/19/2015 09:22 AM, Christian Borntraeger wrote:
> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
>> [CC += linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org]
>> Anyway, I agree that it doesn't make sense to fail madvise when the given flag
>> is already set. On the other hand, I don't think the userspace app should fail
>> just because of madvise failing? It should in general be an advice that the
>> kernel is also strictly speaking free to ignore as it shouldn't affect
>> correctnes, just performance. Yeah, there are exceptions today like
>> MADV_DONTNEED, but that shouldn't apply to hugepages?
>> So I think Qemu needs fixing too.
>
> yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
> Can you fix that?
>
>   Also what happens if the kernel is build
>> without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,
>
> Does it? To me it looks more like we would trigger a kernel bug.
>
> mm/madvise.c:
>          case MADV_HUGEPAGE:
>          case MADV_NOHUGEPAGE:
>                  error = hugepage_madvise(vma, &new_flags, behavior);  <-----
>                  if (error)
>                          goto out;
>                  break;
>          }
>
>
> include/linux/huge_mm.h:
> static inline int hugepage_madvise(struct vm_area_struct *vma,
>                                     unsigned long *vm_flags, int advice)
> {
>          BUG();
>          return 0;
> }
>
> If we just remove the BUG() statement the code would actually be correct
> in simply ignoring an MADVISE it cannot handle. If you agree, I can
> spin a patch.

Yeah this looks suspicious at first, but the code is not reachable 
thanks to madvise_behavior_valid() returning false:

...
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
         case MADV_HUGEPAGE:
         case MADV_NOHUGEPAGE:
#endif
         case MADV_DONTDUMP:
         case MADV_DODUMP:
                 return true;

         default:
                 return false;

I think the BUG() is pointless (KSM doesn't use it) but not wrong. I 
wouldn't object to removal.

>
>
>> how does Qemu handle that?
>
> The normal qemu startup ignores the return value of the madvise. Only the
> recent post migration changes want to disable huge pages for userfaultd.
> And this code checks the return value. And yes, we should change that
> in QEMU.

Great, thanks :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
@ 2015-11-19  9:31           ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-11-19  9:31 UTC (permalink / raw)
  To: Christian Borntraeger, Jason J. Herne, linux-s390
  Cc: linux-mm, akpm, aarcange, linux-api, linux-man, qemu-devel,
	Dr. David Alan Gilbert, Juan Quintela

On 11/19/2015 09:22 AM, Christian Borntraeger wrote:
> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
>> [CC += linux-api@vger.kernel.org]
>> Anyway, I agree that it doesn't make sense to fail madvise when the given flag
>> is already set. On the other hand, I don't think the userspace app should fail
>> just because of madvise failing? It should in general be an advice that the
>> kernel is also strictly speaking free to ignore as it shouldn't affect
>> correctnes, just performance. Yeah, there are exceptions today like
>> MADV_DONTNEED, but that shouldn't apply to hugepages?
>> So I think Qemu needs fixing too.
>
> yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
> Can you fix that?
>
>   Also what happens if the kernel is build
>> without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,
>
> Does it? To me it looks more like we would trigger a kernel bug.
>
> mm/madvise.c:
>          case MADV_HUGEPAGE:
>          case MADV_NOHUGEPAGE:
>                  error = hugepage_madvise(vma, &new_flags, behavior);  <-----
>                  if (error)
>                          goto out;
>                  break;
>          }
>
>
> include/linux/huge_mm.h:
> static inline int hugepage_madvise(struct vm_area_struct *vma,
>                                     unsigned long *vm_flags, int advice)
> {
>          BUG();
>          return 0;
> }
>
> If we just remove the BUG() statement the code would actually be correct
> in simply ignoring an MADVISE it cannot handle. If you agree, I can
> spin a patch.

Yeah this looks suspicious at first, but the code is not reachable 
thanks to madvise_behavior_valid() returning false:

...
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
         case MADV_HUGEPAGE:
         case MADV_NOHUGEPAGE:
#endif
         case MADV_DONTDUMP:
         case MADV_DODUMP:
                 return true;

         default:
                 return false;

I think the BUG() is pointless (KSM doesn't use it) but not wrong. I 
wouldn't object to removal.

>
>
>> how does Qemu handle that?
>
> The normal qemu startup ignores the return value of the madvise. Only the
> recent post migration changes want to disable huge pages for userfaultd.
> And this code checks the return value. And yes, we should change that
> in QEMU.

Great, thanks :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
@ 2015-11-19  9:31           ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2015-11-19  9:31 UTC (permalink / raw)
  To: Christian Borntraeger, Jason J. Herne, linux-s390
  Cc: aarcange, linux-man, Juan Quintela, linux-api, qemu-devel,
	Dr. David Alan Gilbert, linux-mm, akpm

On 11/19/2015 09:22 AM, Christian Borntraeger wrote:
> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
>> [CC += linux-api@vger.kernel.org]
>> Anyway, I agree that it doesn't make sense to fail madvise when the given flag
>> is already set. On the other hand, I don't think the userspace app should fail
>> just because of madvise failing? It should in general be an advice that the
>> kernel is also strictly speaking free to ignore as it shouldn't affect
>> correctnes, just performance. Yeah, there are exceptions today like
>> MADV_DONTNEED, but that shouldn't apply to hugepages?
>> So I think Qemu needs fixing too.
>
> yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
> Can you fix that?
>
>   Also what happens if the kernel is build
>> without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,
>
> Does it? To me it looks more like we would trigger a kernel bug.
>
> mm/madvise.c:
>          case MADV_HUGEPAGE:
>          case MADV_NOHUGEPAGE:
>                  error = hugepage_madvise(vma, &new_flags, behavior);  <-----
>                  if (error)
>                          goto out;
>                  break;
>          }
>
>
> include/linux/huge_mm.h:
> static inline int hugepage_madvise(struct vm_area_struct *vma,
>                                     unsigned long *vm_flags, int advice)
> {
>          BUG();
>          return 0;
> }
>
> If we just remove the BUG() statement the code would actually be correct
> in simply ignoring an MADVISE it cannot handle. If you agree, I can
> spin a patch.

Yeah this looks suspicious at first, but the code is not reachable 
thanks to madvise_behavior_valid() returning false:

...
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
         case MADV_HUGEPAGE:
         case MADV_NOHUGEPAGE:
#endif
         case MADV_DONTDUMP:
         case MADV_DODUMP:
                 return true;

         default:
                 return false;

I think the BUG() is pointless (KSM doesn't use it) but not wrong. I 
wouldn't object to removal.

>
>
>> how does Qemu handle that?
>
> The normal qemu startup ignores the return value of the madvise. Only the
> recent post migration changes want to disable huge pages for userfaultd.
> And this code checks the return value. And yes, we should change that
> in QEMU.

Great, thanks :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-19  8:22       ` [Qemu-devel] " Christian Borntraeger
@ 2015-11-19  9:43         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 14+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-19  9:43 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Vlastimil Babka, Jason J. Herne, linux-s390, linux-mm, akpm,
	aarcange, linux-api, linux-man, qemu-devel, Juan Quintela

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
> > [CC += linux-api@vger.kernel.org]
> > 
> > Since this is a kernel-user-space API change, please CC linux-api@. The kernel
> > source file Documentation/SubmitChecklist notes that all Linux kernel patches
> > that change userspace interfaces should be CCed to linux-api@vger.kernel.org, so
> > that the various parties who are interested in API changes are informed. For
> > further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html
> > 
> > On 11/12/2015 04:18 PM, Jason J. Herne wrote:
> >> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> >> hugepage but hugepage_madvise() takes the error path when we ask to turn
> >> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> >> new postcopy migration feature to fail on s390 because its first action is
> >> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> >> code so that the operation succeeds without error now.
> >>
> >> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> > 
> > Looks like the manpage should be fine, as it wasn't very specific wrt these
> > madvise flags. The only thing that potentially applies is:
> > 
> > "EINVAL advice is not a valid."
> > 
> > which itself looks like it needs fixing. Valid what, value? As in completely
> > unknown flags, or flags not valid for the given vma?
> > 
> > Anyway, I agree that it doesn't make sense to fail madvise when the given flag
> > is already set. On the other hand, I don't think the userspace app should fail
> > just because of madvise failing? It should in general be an advice that the
> > kernel is also strictly speaking free to ignore as it shouldn't affect
> > correctnes, just performance. Yeah, there are exceptions today like
> > MADV_DONTNEED, but that shouldn't apply to hugepages?
> > So I think Qemu needs fixing too.
> 
> yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
> Can you fix that? 

Yes, I can change that.

>  Also what happens if the kernel is build
> > without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,
> 
> Does it? To me it looks more like we would trigger a kernel bug.

Yes, it does return EINVAL; it's a shame - there's no way to distinguish between
a kernel that doesn't have hugepage and a screwup in the addresses passed;
everything gets squashed into EINVAL.

Dave

> mm/madvise.c:
>         case MADV_HUGEPAGE:
>         case MADV_NOHUGEPAGE:
>                 error = hugepage_madvise(vma, &new_flags, behavior);  <-----
>                 if (error)
>                         goto out;
>                 break;
>         }
> 
> 
> include/linux/huge_mm.h:
> static inline int hugepage_madvise(struct vm_area_struct *vma,
>                                    unsigned long *vm_flags, int advice)
> {
>         BUG();
>         return 0;
> }
> 
> If we just remove the BUG() statement the code would actually be correct
> in simply ignoring an MADVISE it cannot handle. If you agree, I can
> spin a patch.
> 
> 
> 
> 
> > how does Qemu handle that?
> 
> The normal qemu startup ignores the return value of the madvise. Only the
> recent post migration changes want to disable huge pages for userfaultd.
> And this code checks the return value. And yes, we should change that
> in QEMU.
> 
> 
> 
> 
> > 
> >> ---
> >>  mm/huge_memory.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index c29ddeb..62fe06b 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
> >>  		/*
> >>  		 * Be somewhat over-protective like KSM for now!
> >>  		 */
> >> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
> >> +		if (*vm_flags & VM_NO_THP)
> >>  			return -EINVAL;
> >>  		*vm_flags &= ~VM_NOHUGEPAGE;
> >>  		*vm_flags |= VM_HUGEPAGE;
> >> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
> >>  		/*
> >>  		 * Be somewhat over-protective like KSM for now!
> >>  		 */
> >> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
> >> +		if (*vm_flags & VM_NO_THP)
> >>  			return -EINVAL;
> >>  		*vm_flags &= ~VM_HUGEPAGE;
> >>  		*vm_flags |= VM_NOHUGEPAGE;
> >>
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
@ 2015-11-19  9:43         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 14+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-19  9:43 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: aarcange, linux-s390, linux-man, linux-api, Juan Quintela,
	qemu-devel, linux-mm, Jason J. Herne, akpm, Vlastimil Babka

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
> > [CC += linux-api@vger.kernel.org]
> > 
> > Since this is a kernel-user-space API change, please CC linux-api@. The kernel
> > source file Documentation/SubmitChecklist notes that all Linux kernel patches
> > that change userspace interfaces should be CCed to linux-api@vger.kernel.org, so
> > that the various parties who are interested in API changes are informed. For
> > further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html
> > 
> > On 11/12/2015 04:18 PM, Jason J. Herne wrote:
> >> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> >> hugepage but hugepage_madvise() takes the error path when we ask to turn
> >> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> >> new postcopy migration feature to fail on s390 because its first action is
> >> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> >> code so that the operation succeeds without error now.
> >>
> >> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> > 
> > Looks like the manpage should be fine, as it wasn't very specific wrt these
> > madvise flags. The only thing that potentially applies is:
> > 
> > "EINVAL advice is not a valid."
> > 
> > which itself looks like it needs fixing. Valid what, value? As in completely
> > unknown flags, or flags not valid for the given vma?
> > 
> > Anyway, I agree that it doesn't make sense to fail madvise when the given flag
> > is already set. On the other hand, I don't think the userspace app should fail
> > just because of madvise failing? It should in general be an advice that the
> > kernel is also strictly speaking free to ignore as it shouldn't affect
> > correctnes, just performance. Yeah, there are exceptions today like
> > MADV_DONTNEED, but that shouldn't apply to hugepages?
> > So I think Qemu needs fixing too.
> 
> yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
> Can you fix that? 

Yes, I can change that.

>  Also what happens if the kernel is build
> > without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,
> 
> Does it? To me it looks more like we would trigger a kernel bug.

Yes, it does return EINVAL; it's a shame - there's no way to distinguish between
a kernel that doesn't have hugepage and a screwup in the addresses passed;
everything gets squashed into EINVAL.

Dave

> mm/madvise.c:
>         case MADV_HUGEPAGE:
>         case MADV_NOHUGEPAGE:
>                 error = hugepage_madvise(vma, &new_flags, behavior);  <-----
>                 if (error)
>                         goto out;
>                 break;
>         }
> 
> 
> include/linux/huge_mm.h:
> static inline int hugepage_madvise(struct vm_area_struct *vma,
>                                    unsigned long *vm_flags, int advice)
> {
>         BUG();
>         return 0;
> }
> 
> If we just remove the BUG() statement the code would actually be correct
> in simply ignoring an MADVISE it cannot handle. If you agree, I can
> spin a patch.
> 
> 
> 
> 
> > how does Qemu handle that?
> 
> The normal qemu startup ignores the return value of the madvise. Only the
> recent post migration changes want to disable huge pages for userfaultd.
> And this code checks the return value. And yes, we should change that
> in QEMU.
> 
> 
> 
> 
> > 
> >> ---
> >>  mm/huge_memory.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index c29ddeb..62fe06b 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2009,7 +2009,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
> >>  		/*
> >>  		 * Be somewhat over-protective like KSM for now!
> >>  		 */
> >> -		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
> >> +		if (*vm_flags & VM_NO_THP)
> >>  			return -EINVAL;
> >>  		*vm_flags &= ~VM_NOHUGEPAGE;
> >>  		*vm_flags |= VM_HUGEPAGE;
> >> @@ -2025,7 +2025,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
> >>  		/*
> >>  		 * Be somewhat over-protective like KSM for now!
> >>  		 */
> >> -		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
> >> +		if (*vm_flags & VM_NO_THP)
> >>  			return -EINVAL;
> >>  		*vm_flags &= ~VM_HUGEPAGE;
> >>  		*vm_flags |= VM_NOHUGEPAGE;
> >>
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-19  8:22       ` [Qemu-devel] " Christian Borntraeger
                         ` (2 preceding siblings ...)
  (?)
@ 2015-11-19 13:10       ` Dr. David Alan Gilbert
  2015-11-19 14:30         ` Jason J. Herne
  -1 siblings, 1 reply; 14+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-19 13:10 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: aarcange, Juan Quintela, qemu-devel, Jason J. Herne, akpm,
	Vlastimil Babka

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
> > [CC += linux-api@vger.kernel.org]
> > 
> > Since this is a kernel-user-space API change, please CC linux-api@. The kernel
> > source file Documentation/SubmitChecklist notes that all Linux kernel patches
> > that change userspace interfaces should be CCed to linux-api@vger.kernel.org, so
> > that the various parties who are interested in API changes are informed. For
> > further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html
> > 
> > On 11/12/2015 04:18 PM, Jason J. Herne wrote:
> >> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
> >> hugepage but hugepage_madvise() takes the error path when we ask to turn
> >> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> >> new postcopy migration feature to fail on s390 because its first action is
> >> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
> >> code so that the operation succeeds without error now.
> >>
> >> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> > 
> > Looks like the manpage should be fine, as it wasn't very specific wrt these
> > madvise flags. The only thing that potentially applies is:
> > 
> > "EINVAL advice is not a valid."
> > 
> > which itself looks like it needs fixing. Valid what, value? As in completely
> > unknown flags, or flags not valid for the given vma?
> > 
> > Anyway, I agree that it doesn't make sense to fail madvise when the given flag
> > is already set. On the other hand, I don't think the userspace app should fail
> > just because of madvise failing? It should in general be an advice that the
> > kernel is also strictly speaking free to ignore as it shouldn't affect
> > correctnes, just performance. Yeah, there are exceptions today like
> > MADV_DONTNEED, but that shouldn't apply to hugepages?
> > So I think Qemu needs fixing too.
> 
> yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
> Can you fix that? 

(cutting down cc list a bit)

Can you tell me if the following works for you:


From 545809a18fa768eccdaafe9bd842490c3390b00c Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Date: Thu, 19 Nov 2015 12:05:36 +0000
Subject: [PATCH] Assume madvise for (no)hugepage works

madvise() returns EINVAL in the case of many failures, but also
returns it in cases where the host kernel doesn't have THP enabled.
Postcopy only really cares that THP is off before it detects faults,
and turns it back on afterwards; so we're going to have
to assume that if the madvise fails then the host just doesn't do
THP and we can carry on with the postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 22d6b18..3946aa9 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -241,10 +241,7 @@ static int cleanup_range(const char *block_name, void *host_addr,
      * We turned off hugepage for the precopy stage with postcopy enabled
      * we can turn it back on now.
      */
-    if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
-        error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
-        return -1;
-    }
+    qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE);
 
     /*
      * We can also turn off userfault now since we should have all the
@@ -345,10 +342,7 @@ static int nhp_range(const char *block_name, void *host_addr,
      * do delete areas of the page, even if THP thinks a hugepage would
      * be a good idea, so force hugepages off.
      */
-    if (qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE)) {
-        error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
-        return -1;
-    }
+    qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE);
 
     return 0;
 }
-- 
2.5.0

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  2015-11-19 13:10       ` Dr. David Alan Gilbert
@ 2015-11-19 14:30         ` Jason J. Herne
  0 siblings, 0 replies; 14+ messages in thread
From: Jason J. Herne @ 2015-11-19 14:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Christian Borntraeger
  Cc: aarcange, akpm, qemu-devel, Vlastimil Babka, Juan Quintela

On 11/19/2015 08:10 AM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
>>> [CC += linux-api@vger.kernel.org]
...
> Can you tell me if the following works for you:
>
>
>  From 545809a18fa768eccdaafe9bd842490c3390b00c Mon Sep 17 00:00:00 2001
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Date: Thu, 19 Nov 2015 12:05:36 +0000
> Subject: [PATCH] Assume madvise for (no)hugepage works
>
> madvise() returns EINVAL in the case of many failures, but also
> returns it in cases where the host kernel doesn't have THP enabled.
> Postcopy only really cares that THP is off before it detects faults,
> and turns it back on afterwards; so we're going to have
> to assume that if the madvise fails then the host just doesn't do
> THP and we can carry on with the postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/postcopy-ram.c | 10 ++--------
>   1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 22d6b18..3946aa9 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -241,10 +241,7 @@ static int cleanup_range(const char *block_name, void *host_addr,
>        * We turned off hugepage for the precopy stage with postcopy enabled
>        * we can turn it back on now.
>        */
> -    if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
> -        error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> -        return -1;
> -    }
> +    qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE);
>
>       /*
>        * We can also turn off userfault now since we should have all the
> @@ -345,10 +342,7 @@ static int nhp_range(const char *block_name, void *host_addr,
>        * do delete areas of the page, even if THP thinks a hugepage would
>        * be a good idea, so force hugepages off.
>        */
> -    if (qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE)) {
> -        error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
> -        return -1;
> -    }
> +    qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE);
>
>       return 0;
>   }
>

I tried this without my madvise kernel patch, and was able to get by the 
problem as expected.

We still need the kernel patch set "Allow gmap fault to retry​" as 
posted to linux-mm to get
userfaultfd support playing nicely with s390 async page faults. But that 
is a separate problem
entirely.

Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-11-19 14:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12 15:18 [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 Jason J. Herne
2015-11-12 16:45 ` Christian Borntraeger
2015-11-13 22:58 ` David Rientjes
     [not found] ` <1447341516-18076-1-git-send-email-jjherne-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2015-11-18 13:31   ` Vlastimil Babka
2015-11-18 13:31     ` Vlastimil Babka
2015-11-19  8:22     ` Christian Borntraeger
2015-11-19  8:22       ` [Qemu-devel] " Christian Borntraeger
     [not found]       ` <564D86AE.1010305-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2015-11-19  9:31         ` Vlastimil Babka
2015-11-19  9:31           ` [Qemu-devel] " Vlastimil Babka
2015-11-19  9:31           ` Vlastimil Babka
2015-11-19  9:43       ` Dr. David Alan Gilbert
2015-11-19  9:43         ` [Qemu-devel] " Dr. David Alan Gilbert
2015-11-19 13:10       ` Dr. David Alan Gilbert
2015-11-19 14:30         ` Jason J. Herne

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.