From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Huth <thuth@redhat.com>
Subject: Re: [PATCH RFC 0/2] KVM: s390: avoid having to enable vm.alloc_pgste
Date: Fri, 2 Jun 2017 10:11:05 +0200
Message-ID: <20170602101105.11e7fd2c@mschwideX1>
In-Reply-To: <f758615c-babd-5a00-cb15-3a4b71e70afc@de.ibm.com>

On Fri, 2 Jun 2017 09:25:54 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 06/02/2017 09:18 AM, Christian Borntraeger wrote:
> > On 06/02/2017 09:16 AM, Martin Schwidefsky wrote:  
> >> On Fri, 2 Jun 2017 09:13:03 +0200
> >> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >>  
> >>> On 06/02/2017 09:02 AM, Heiko Carstens wrote:  
> >>>> On Thu, Jun 01, 2017 at 12:46:51PM +0200, Martin Schwidefsky wrote:    
> >>>>>> Unfortunately, converting all page tables to 4k pgste page tables is
> >>>>>> not possible without provoking various race conditions.    
> >>>>>
> >>>>> That is one approach we tried, and it was found to be buggy. The point is that
> >>>>> you are not allowed to reallocate a page table while a VMA exists that is
> >>>>> in the address range of that page table.
> >>>>>
> >>>>> Another approach we tried is to use an ELF flag on the qemu executable.
> >>>>> That does not work either because fs/exec.c allocates and populates the
> >>>>> new mm struct for the argument pages before fs/binfmt_elf.c comes into
> >>>>> play.    
> >>>>
> >>>> How about if you would fail the system call within arch_check_elf() if you
> >>>> detect that the binary requires pgstes (as indicated by elf flags) and then
> >>>> restart the system call?
> >>>>
> >>>> That is: arch_check_elf() e.g. would set a thread flag that future mm's
> >>>> should be allocated with pgstes. Then do_execve() would clean up everything
> >>>> and return to entry.S. Upon return to userspace we detect this condition
> >>>> and simply restart the system call, similar to signals vs -ERESTARTSYS.
> >>>>
> >>>> That would make do_execve() clean up everything and upon reentering it would
> >>>> allocate an mm with the pgste flag set.
> >>>>
> >>>> Maybe this is a bit over-simplified, but might work.
> >>>>
> >>>> At least I also don't like the next "hack", that is specifically designed
> >>>> to only work with how QEMU is currently implemented. It might break with
> >>>> future QEMU changes or the next user space implementation that drives the
> >>>> kvm interface, but is doing everything differently.
> >>>> Let's look for a "clean" solution that will always work. We had too many
> >>>> hacks for this problem and *all* of them were broken.    
> >>>
> >>>
> >>> The more I think about it, dropping 2k page tables and always allocating a full
> >>> page would simplify pgalloc. As far as I can see, this would also get rid of
> >>> the &mm->context.pgtable_lock.
> >>
> >> And it would waste twice the amount of memory for page tables. NAK.  
> > 
> > Yes and we spend the same amount of memory TODAY, because every distro on the
> > planet that uses KVM has sysctl.allocate_pgste set.  
> 
> Maybe today's approach is still the best (if qemu is installed, it's all
> 4k; if not, it's all 2k).

Exactly.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
