All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>,
	"Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: linux-s390@vger.kernel.org, linux-mm@kvack.org,
	KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
Date: Wed, 11 Nov 2015 21:01:44 +0100	[thread overview]
Message-ID: <56439EA8.80505@de.ibm.com> (raw)
In-Reply-To: <20151111173044.GF4573@redhat.com>

Am 11.11.2015 um 18:30 schrieb Andrea Arcangeli:
> Hi Jason,
> 
> On Wed, Nov 11, 2015 at 10:35:16AM -0500, Jason J. Herne wrote:
>> MADV_NOHUGEPAGE processing is too restrictive. kvm already disables
>> hugepage but hugepage_madvise() takes the error path when we ask to turn
>> on the MADV_NOHUGEPAGE bit and the bit is already on. This causes Qemu's
> 
> I wonder why KVM disables transparent hugepages on s390. It sounds
> weird to disable transparent hugepages with KVM. In fact on x86 we
> call MADV_HUGEPAGE to be sure transparent hugepages are enabled on the
> guest physical memory, even if the transparent_hugepage/enabled ==
> madvise.
> 
>> new postcopy migration feature to fail on s390 because its first action is
>> to madvise the guest address space as NOHUGEPAGE. This patch modifies the
>> code so that the operation succeeds without error now.
> 
> The other way is to change qemu to keep track it already called
> MADV_NOHUGEPAGE and not to call it again. I don't have a strong
> opinion on this, I think it's ok to return 0 but it's a visible change
> to userland, I can't imagine it to break anything though. It sounds
> very unlikely that an app could error out if it notices the kernel
> doesn't error out on the second call of MADV_NOHUGEPAGE.
> 
> Glad to hear KVM postcopy live migration is already running on s390 too.

Sometimes....we have some issues with userfaultd, which we currently address.
One place is interesting: the kvm code might have to call fixup_user_fault
for a guest address (to map the page writable). Right now we do not pass
FAULT_FLAG_ALLOW_RETRY, which can trigger a warning like

[  119.414573] FAULT_FLAG_ALLOW_RETRY missing 1
[  119.414577] CPU: 42 PID: 12853 Comm: qemu-system-s39 Not tainted 4.3.0+ #315
[  119.414579]        000000011c4579b8 000000011c457a48 0000000000000002 0000000000000000 
                      000000011c457ae8 000000011c457a60 000000011c457a60 0000000000113e26 
                      00000000000002cf 00000000009feef8 0000000000a1e054 000000000000000b 
                      000000011c457aa8 000000011c457a48 0000000000000000 0000000000000000 
                      0000000000000000 0000000000113e26 000000011c457a48 000000011c457aa8 
[  119.414590] Call Trace:
[  119.414596] ([<0000000000113d16>] show_trace+0xf6/0x148)
[  119.414598]  [<0000000000113dda>] show_stack+0x72/0xf0
[  119.414600]  [<0000000000551b9e>] dump_stack+0x6e/0x90
[  119.414605]  [<000000000032d168>] handle_userfault+0xe0/0x448
[  119.414609]  [<000000000029a2d4>] handle_mm_fault+0x16e4/0x1798
[  119.414611]  [<00000000002930be>] fixup_user_fault+0x86/0x118
[  119.414614]  [<0000000000126bb8>] gmap_ipte_notify+0xa0/0x170
[  119.414617]  [<000000000013ae90>] kvm_arch_vcpu_ioctl_run+0x448/0xc58
[  119.414619]  [<000000000012e4dc>] kvm_vcpu_ioctl+0x37c/0x668
[  119.414622]  [<00000000002eba68>] do_vfs_ioctl+0x3a8/0x508
[  119.414624]  [<00000000002ebc6c>] SyS_ioctl+0xa4/0xb8
[  119.414627]  [<0000000000815c56>] system_call+0xd6/0x264
[  119.414629]  [<000003ff9628721a>] 0x3ff9628721a

I think we can rework this to use something that sets FAULT_FLAG_ALLOW_RETRY,
but this begs the question if a futex operation on userfault backed memory 
would also be broken. The futex code also does fixup_user_fault without 
FAULT_FLAG_ALLOW_RETRY as far as I can tell.

Christian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2015-11-11 20:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-11 15:35 [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390 Jason J. Herne
2015-11-11 17:30 ` Andrea Arcangeli
2015-11-11 19:47   ` Christian Borntraeger
2015-11-11 19:47     ` Christian Borntraeger
2015-11-11 20:42     ` Andrea Arcangeli
2015-11-11 20:42       ` Andrea Arcangeli
2015-11-11 20:01   ` Christian Borntraeger [this message]
2015-11-11 20:37     ` Andrea Arcangeli
2015-11-11 20:37       ` Andrea Arcangeli
2015-11-12 15:18 Jason J. Herne
2015-11-12 16:45 ` Christian Borntraeger
2015-11-13 22:58 ` David Rientjes
     [not found] ` <1447341516-18076-1-git-send-email-jjherne-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2015-11-18 13:31   ` Vlastimil Babka
2015-11-18 13:31     ` Vlastimil Babka
2015-11-19  8:22     ` Christian Borntraeger
     [not found]       ` <564D86AE.1010305-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2015-11-19  9:31         ` Vlastimil Babka
2015-11-19  9:31           ` Vlastimil Babka
2015-11-19  9:43       ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56439EA8.80505@de.ibm.com \
    --to=borntraeger@de.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=jjherne@linux.vnet.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.