linux-kernel.vger.kernel.org archive mirror
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: David Hildenbrand <david@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Thomas Huth <thuth@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>
Subject: Re: [PATCH RFC 0/2] KVM: s390: avoid having to enable vm.alloc_pgste
Date: Thu, 1 Jun 2017 12:46:51 +0200
Message-ID: <20170601124651.3e7969ab@mschwideX1>
In-Reply-To: <20170529163202.13077-1-david@redhat.com>

Hi David,

it is nice to see that you are still working on s390-related topics.

On Mon, 29 May 2017 18:32:00 +0200
David Hildenbrand <david@redhat.com> wrote:

> Having to enable vm.alloc_pgste globally might not be the best solution.
> 4k page tables are created for all processes and running QEMU KVM guests
> is more complicated than it should be.

To run KVM guests you need to issue a single sysctl to set vm.allocate_pgste.
That is the best solution we have found so far.
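
For reference, that sysctl amounts to a one-time write of "1" to
/proc/sys/vm/allocate_pgste, the same as "sysctl vm.allocate_pgste=1".
A minimal C sketch of it follows. It assumes root privileges on an s390
kernel that exposes this knob, and the helper names write_sysctl_value
and enable_allocate_pgste are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a short string to a sysctl-style file.
 * Returns 0 on success, -1 on failure.
 */
int write_sysctl_value(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* The buffered data is written out at fclose, so a rejected
	 * write shows up as an fclose failure. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Equivalent of "sysctl vm.allocate_pgste=1"; needs root on s390. */
int enable_allocate_pgste(void)
{
	return write_sysctl_value("/proc/sys/vm/allocate_pgste", "1\n");
}
```

The setting only affects address spaces created afterwards, so it has
to be in place before QEMU is started.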

> Unfortunately, converting all page tables to 4k pgste page tables is
> not possible without provoking various race conditions.

That is one approach we tried, and it was found to be buggy. The point is that
you are not allowed to reallocate a page table while a VMA exists that is
in the address range of that page table.

Another approach we tried is to use an ELF flag on the qemu executable.
That does not work either because fs/exec.c allocates and populates the
new mm struct for the argument pages before fs/binfmt_elf.c comes into
play.

> However, we
> might be able to let 2k and 4k page tables co-exist. We only need
> 4k page tables whenever we want to expose such memory to a guest. So
> turning on 4k page table allocation at one point and only allowing such
> memory to go into our gmap (guest mapping) might be a solution.
> User space tools like QEMU that create the VM before mmap-ing any memory
> that will belong to the guest can simply use the new VM type. Proper 4k
> page tables will be created for any memory mmap-ed afterwards. And these
> can be used in the gmap without problems. Existing user space tools
> will work as before - having to enable vm.alloc_pgste explicitly.

I cannot say that I like this approach. Right now a process either uses
2K page tables or 4K page tables. With your patch the distinction is
basically per page table page. Memory areas that existed before the switch
to allocating 4K page tables cannot be mapped into the guest's gmap anymore.
There might be hidden pitfalls, e.g. with guest migration.
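
To make the concern concrete, here is a toy user-space model of mixed
page tables. It is not kernel code; the structures and the functions
toy_alloc_pgtable and toy_gmap_can_link are invented for illustration
and only mimic the rule that a page table's type is fixed when it is
allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TABLES 8

/* Toy model only: one entry per page table page of a process. */
struct toy_pgtable {
	bool has_pgste;		/* true: 4K table with pgstes, false: 2K */
};

struct toy_mm {
	bool alloc_pgste;	/* new tables get pgstes once this is set */
	struct toy_pgtable tables[TOY_MAX_TABLES];
	size_t nr_tables;
};

/* Allocate the next page table page; its type is fixed at this point. */
int toy_alloc_pgtable(struct toy_mm *mm)
{
	assert(mm);
	if (mm->nr_tables >= TOY_MAX_TABLES)
		return -1;
	mm->tables[mm->nr_tables].has_pgste = mm->alloc_pgste;
	return (int)mm->nr_tables++;
}

/* Only memory backed by a 4K pgste table may go into the gmap. */
bool toy_gmap_can_link(const struct toy_mm *mm, int idx)
{
	assert(mm && idx >= 0 && (size_t)idx < mm->nr_tables);
	return mm->tables[idx].has_pgste;
}
```

In this model a table allocated before alloc_pgste is set stays a 2K
table forever, so toy_gmap_can_link() keeps refusing it even after the
switch.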

> This should play fine with vSIE, as vSIE code works completely on the gmap.
> So if only page tables with pgste go into our gmap, we should be fine.
> 
> Not sure if this breaks important concepts, has some serious performance
> problems or I am missing important cases. If so, I guess there is really
> no way to avoid setting vm.alloc_pgste.
> 
> Possible modifications:
> - Enable this option via an ioctl (like KVM_S390_ENABLE_SIE) instead of
>   a new VM type
> - Remember if we have mixed pgtables. If !mixed, we can make maybe faster
>   decisions (if that is really a problem).

What I do not like in particular is this function:

static inline int pgtable_has_pgste(struct mm_struct *mm, unsigned long addr)
{
	struct page *page;

	if (!mm_has_pgste(mm))
		return 0;

	page = pfn_to_page(addr >> PAGE_SHIFT);
	return atomic_read(&page->_mapcount) & 0x4U;
}

The check for pgstes got more complicated. It used to be a test-under-mask
of a bit in the mm struct and a branch. Now we have an additional pfn_to_page,
an atomic_read and a bit test, and that is done multiple times for every
ptep_xxx operation.
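
The difference in work can be seen in a small user-space model of the
two checks. This is only a sketch: the sketch_* names, the 16-entry
page array and the C11 atomics are stand-ins chosen for illustration,
not the kernel's definitions. Only the 0x4 mask is taken from the
function quoted above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_NR_PAGES   16
#define SKETCH_PGSTE_BIT  0x4U	/* same mask as in the quoted function */

struct sketch_page { atomic_uint mapcount; };
struct sketch_mm { unsigned int has_pgste : 1; };

/* Stand-in for the kernel's array of struct page. */
struct sketch_page sketch_mem_map[SKETCH_NR_PAGES];

/* Old scheme: a single bit test on the mm. */
bool sketch_mm_has_pgste(const struct sketch_mm *mm)
{
	return mm->has_pgste;
}

/* New scheme: the mm bit test, then a page lookup, an atomic load
 * and a second bit test. */
bool sketch_pgtable_has_pgste(const struct sketch_mm *mm, unsigned long addr)
{
	unsigned long pfn = addr >> SKETCH_PAGE_SHIFT;

	if (!sketch_mm_has_pgste(mm))
		return false;
	assert(pfn < SKETCH_NR_PAGES);
	return atomic_load(&sketch_mem_map[pfn].mapcount) & SKETCH_PGSTE_BIT;
}
```

The second function touches a second cache line (the page structure)
in addition to the mm, which is the extra cost in question.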

Is the operational simplification of not having to set vm.allocate_pgste really
that important?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
