All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] Prototype PAPR hash page table resizing (guest side)
@ 2015-12-22  5:14 David Gibson
  2015-12-22  5:14 ` [RFC 1/3] pseries: Add hypercall wrappers for hash page table resizing David Gibson
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: David Gibson @ 2015-12-22  5:14 UTC (permalink / raw)
  To: paulus, benh, bharata; +Cc: thuth, lvivier, linuxppc-dev, David Gibson

I've discussed with Paul and Ben previously the possibility of
extending PAPR to allow changing the size of a running guest's hash
page table (HPT).  This would allow for much more flexible memory
hotplug, since the HPT wouldn't have to be sized in advance for the
maximum possible memory size of the guest.

This is a draft / prototype implementation of the guest side of this.
Or rather, the guest side of the core code for switching HPTs.  We'd
also need notification handling and CAS support for a complete
implementation, but I haven't gotten to writing that yet.

Obviously, for now it uses vendor specific hypercalls rather than
official PAPR ones.  I have a draft implementation of these in qemu
for TCG guests which I hope to post in the reasonably near future.

The design assumes that the HPT change happens in two phases:

   1) The "prepare" phase may be slow but can run asynchronously while
      the guest runs normally

   2) The "commit" phase switches to a previously prepared HPT, and
      must be run with no concurrent updates to the HPT - in practice
      that means stop_machine() for a Linux guest.

To go with that there are two (proposed) hcalls:

H_RESIZE_HPT_PREPARE:
    This starts (1) for a new HPT of a given size.  It will typically
    return H_LONG_DELAY_* and the guest must call it in a (sleeping)
    loop until it completes.

    Calling PREPARE with a different size from one already in progress
    will cancel the in-progress preparation (freeing the potential HPT
    if already allocated) and start a new one for the given size.

    As a special case calling PREPARE with shift == 0 will cancel any
    in-progress preparation and not start a new one, instead reverting
    to the existing HPT.

H_RESIZE_HPT_COMMIT:
    Switches to an HPT of the given size.  It will fail if there isn't
    a fully prepared HPT of the given size ready to go.  No HPT updats
    (H_ENTER etc.) may be run on *any* guest CPU while this is called.

    Once COMMIT returns H_SUCCESS, the guest will be operating on the
    new HPT.  On any other return it is still running on the old HPT.

    The hypervisor could cancel a prospective HPT for its own reasons
    - e.g. it could time out if the guest waits too long between
    PREPARE and COMMIT, or it could "forget" about an in-progress
    preparation due to live migration.  In that case COMMIT will fail,
    which the guest should be prepared to handle.

Both hypercalls take a flags parameter for extensibility, but I
haven't defined any flags so far.

I have two possible implementations in mind for the host side, both of
which should work with the same guest interface:

A) During the prepare phase we just allocate and clear the HPT (and
   install VRMA HPTEs for KVM).  During the commit phase we translate
   all bolted entries from the old HPT to the new then continue.

   This approach is relatively simple to implement, but could lead to
   a substantial delay during the commit phase.  Initial rough
   measurements suggest it will be around ~200ms on a POWER8 for a 1G
   HPT (128G guest).  Since typical live migration downtimes are
   300-500ms, that's probably still good enough to be useful.

B) During the prepare phase H_ENTER etc. calls are mirrored to both
   the current HPT and the prospective HPT.  Existing HPTEs are
   migrated to the new HPT in the background.  The prepare phase
   completes once the old and new HPTs are in sync.  The commit phase
   simply pivots to the new HPT.


Please comment on the proposed new PAPR interface and this
implementation.  Any information on what the next step would be in
proposing this as a formal PAPR update would be useful too.    

David Gibson (3):
  pseries: Add hypercall wrappers for hash page table resizing
  pseries: Add support for hash table resizing
  pseries: sysfs hack to trigger a hash page table resize

 arch/powerpc/include/asm/hvcall.h         |   2 +
 arch/powerpc/include/asm/plpar_wrappers.h |  12 +++
 arch/powerpc/platforms/pseries/lpar.c     | 150 ++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC 1/3] pseries: Add hypercall wrappers for hash page table resizing
  2015-12-22  5:14 [RFC 0/3] Prototype PAPR hash page table resizing (guest side) David Gibson
@ 2015-12-22  5:14 ` David Gibson
  2015-12-22  5:14 ` [RFC 2/3] pseries: Add support for hash " David Gibson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2015-12-22  5:14 UTC (permalink / raw)
  To: paulus, benh, bharata; +Cc: thuth, lvivier, linuxppc-dev, David Gibson

This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.

These are experimental "platform specific" values for now, until we have a
formal PAPR update.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/hvcall.h         |  2 ++
 arch/powerpc/include/asm/plpar_wrappers.h | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 85bc8c0..ae1fcb7 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -273,6 +273,8 @@
 
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
+#define H_RESIZE_HPT_PREPARE	0xf003
+#define H_RESIZE_HPT_COMMIT	0xf004
 
 /* "Platform specific hcalls", provided by PHYP */
 #define H_GET_24X7_CATALOG_PAGE	0xF078
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 67859ed..8f1d8fe 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -225,6 +225,18 @@ static inline long plpar_pte_protect(unsigned long flags, unsigned long ptex,
 	return plpar_hcall_norets(H_PROTECT, flags, ptex, avpn);
 }
 
+static inline long plpar_resize_hpt_prepare(unsigned long flags,
+					    unsigned long shift)
+{
+	return plpar_hcall_norets(H_RESIZE_HPT_PREPARE, flags, shift);
+}
+
+static inline long plpar_resize_hpt_commit(unsigned long flags,
+					   unsigned long shift)
+{
+	return plpar_hcall_norets(H_RESIZE_HPT_COMMIT, flags, shift);
+}
+
 static inline long plpar_tce_get(unsigned long liobn, unsigned long ioba,
 		unsigned long *tce_ret)
 {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 2/3] pseries: Add support for hash table resizing
  2015-12-22  5:14 [RFC 0/3] Prototype PAPR hash page table resizing (guest side) David Gibson
  2015-12-22  5:14 ` [RFC 1/3] pseries: Add hypercall wrappers for hash page table resizing David Gibson
@ 2015-12-22  5:14 ` David Gibson
  2015-12-22  5:14 ` [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize David Gibson
  2015-12-24  9:22 ` [RFC 0/3] Prototype PAPR hash page table resizing (guest side) Anshuman Khandual
  3 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2015-12-22  5:14 UTC (permalink / raw)
  To: paulus, benh, bharata; +Cc: thuth, lvivier, linuxppc-dev, David Gibson

This adds support for using experimental hypercalls to change the size
of the main hash page table while running as a PAPR guest.  For now these
hypercalls are only in experimental qemu versions.

The interface is two part: first H_RESIZE_HPT_PREPARE is used to allocate
and prepare the new hash table.  This may be slow, but can be done
asynchronously.  Then, H_RESIZE_HPT_COMMIT is used to switch to the new
hash table.  This requires that no CPUs be concurrently updating the HPT,
and so must be run under stop_machine().

This patch only supplies a function to execute the hash table change,
nothing yet calls it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/platforms/pseries/lpar.c | 109 ++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index b7a67e3..aadb242 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -27,6 +27,8 @@
 #include <linux/console.h>
 #include <linux/export.h>
 #include <linux/jump_label.h>
+#include <linux/delay.h>
+#include <linux/stop_machine.h>
 #include <asm/processor.h>
 #include <asm/mmu.h>
 #include <asm/page.h>
@@ -794,3 +796,110 @@ int h_get_mpp_x(struct hvcall_mpp_x_data *mpp_x_data)
 
 	return rc;
 }
+
+#define HPT_RESIZE_TIMEOUT	10000 /* ms */
+
+struct hpt_resize_state {
+	unsigned long shift;
+	int commit_rc;
+};
+
+static int pSeries_lpar_resize_hpt_commit(void *data)
+{
+	struct hpt_resize_state *state = data;
+
+	state->commit_rc = plpar_resize_hpt_commit(0, state->shift);
+	if (state->commit_rc != H_SUCCESS)
+		return -EIO;
+
+	/* Hypervisor has transitioned the HTAB, update our globals */
+	ppc64_pft_size = state->shift;
+	htab_size_bytes = 1UL << ppc64_pft_size;
+	htab_hash_mask = (htab_size_bytes >> 7) - 1;
+
+	return 0;
+}
+
+/* Must be called in user context */
+int pSeries_lpar_resize_hpt(unsigned long shift)
+{
+	struct hpt_resize_state state = {
+		.shift = shift,
+		.commit_rc = H_FUNCTION,
+	};
+	unsigned int delay, total_delay = 0;
+	int rc;
+	ktime_t t0, t1, t2;
+
+	might_sleep();
+
+	printk(KERN_INFO "lpar: Attempting to resize HPT to shift %lu\n",
+	       shift);
+
+	t0 = ktime_get();
+
+	rc = plpar_resize_hpt_prepare(0, shift);
+	while (H_IS_LONG_BUSY(rc)) {
+		delay = get_longbusy_msecs(rc);
+		total_delay += delay;
+		if (total_delay > HPT_RESIZE_TIMEOUT) {
+			/* prepare call with shift==0 cancels an
+			 * in-progress resize */
+			rc = plpar_resize_hpt_prepare(0, 0);
+			if (rc != H_SUCCESS)
+				printk(KERN_WARNING
+				       "lpar: Unexpected error %d cancelling timed out HPT resize\n",
+				       rc);
+			return -ETIMEDOUT;
+		}
+		msleep(delay);
+	};
+
+	switch (rc) {
+	case H_SUCCESS:
+		/* Continue on */
+		break;
+
+	case H_PARAMETER:
+		return -EINVAL;
+	case H_RESOURCE:
+		return -EPERM;
+	default:
+		printk(KERN_WARNING
+		       "lpar: Unexpected error %d from H_RESIZE_HPT_PREPARE\n",
+		       rc);
+		return -EIO;
+	}
+
+	t1 = ktime_get();
+
+	get_online_cpus();
+	mb();
+	rc = stop_machine(pSeries_lpar_resize_hpt_commit, &state,
+			  cpu_online_mask);
+	put_online_cpus();
+
+	t2 = ktime_get();
+
+	if (rc != 0) {
+		switch (state.commit_rc) {
+		case H_PTEG_FULL:
+			printk(KERN_WARNING
+			       "lpar: Hash collision while resizing HPT\n");
+			return -ENOSPC;
+
+		default:
+			printk(KERN_WARNING
+			       "lpar: Unexpected error %d from H_RESIZE_HPT_COMMIT\n",
+			       state.commit_rc);
+			return -EIO;
+		};
+	}
+
+	printk(KERN_INFO
+	       "lpar: HPT resize to shift %lu complete (%lld ms / %lld ms)\n",
+	       shift, (long long) ktime_ms_delta(t1, t0),
+	       (long long) ktime_ms_delta(t2, t1));
+
+	return 0;
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize
  2015-12-22  5:14 [RFC 0/3] Prototype PAPR hash page table resizing (guest side) David Gibson
  2015-12-22  5:14 ` [RFC 1/3] pseries: Add hypercall wrappers for hash page table resizing David Gibson
  2015-12-22  5:14 ` [RFC 2/3] pseries: Add support for hash " David Gibson
@ 2015-12-22  5:14 ` David Gibson
  2015-12-22 11:19   ` Denis Kirjanov
  2015-12-24  9:22 ` [RFC 0/3] Prototype PAPR hash page table resizing (guest side) Anshuman Khandual
  3 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2015-12-22  5:14 UTC (permalink / raw)
  To: paulus, benh, bharata; +Cc: thuth, lvivier, linuxppc-dev, David Gibson

This patch adds a special file in /sys/kernel/mm which can be used to view
the current size of the hash page table (as a bit shift) and to trigger
a resize of the hash table on PAPR guests.

Logically this would make more sense as a debugfs file, but that doesn't
see to provide an obvious way to have a "store" function that has an effect
other than updating a variable.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/platforms/pseries/lpar.c | 41 +++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index aadb242..2759767 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -903,3 +903,44 @@ int pSeries_lpar_resize_hpt(unsigned long shift)
 
 	return 0;
 }
+
+static ssize_t ppc64_pft_size_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%llu\n", (unsigned long long)ppc64_pft_size);
+}
+
+static ssize_t ppc64_pft_size_store(struct kobject *kobj,
+				    struct kobj_attribute *attr,
+				    const char *buf, size_t count)
+{
+	unsigned long new_shift;
+	int rc;
+
+	pr_err("lpar: ppc64_pft_size_store() count=%zd\n", count);
+
+	if (kstrtoul(buf, 0, &new_shift))
+		return -EINVAL;
+
+	rc = pSeries_lpar_resize_hpt(new_shift);
+	if (rc < 0)
+		return rc;
+
+	return count;
+}
+
+static struct kobj_attribute ppc64_pft_size_attr =
+	__ATTR(ppc64_pft_size, 0600, ppc64_pft_size_show, ppc64_pft_size_store);
+
+static int __init pseries_lpar_sysfs(void)
+{
+	int rc;
+
+	rc = sysfs_create_file(mm_kobj, &ppc64_pft_size_attr.attr);
+	if (rc)
+		pr_err("lpar: unable to create ppc64_pft_size sysfs file (%d)\n",
+		       rc);
+
+	return 0;
+}
+machine_device_initcall(pseries, pseries_lpar_sysfs);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize
  2015-12-22  5:14 ` [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize David Gibson
@ 2015-12-22 11:19   ` Denis Kirjanov
  2015-12-23  8:45     ` David Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: Denis Kirjanov @ 2015-12-22 11:19 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, benh, bharata, lvivier, thuth, linuxppc-dev

On 12/22/15, David Gibson <david@gibson.dropbear.id.au> wrote:
> This patch adds a special file in /sys/kernel/mm which can be used to view
> the current size of the hash page table (as a bit shift) and to trigger
> a resize of the hash table on PAPR guests.
>
> Logically this would make more sense as a debugfs file, but that doesn't
> see to provide an obvious way to have a "store" function that has an effect
> other than updating a variable.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/platforms/pseries/lpar.c | 41
> +++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c
> b/arch/powerpc/platforms/pseries/lpar.c
> index aadb242..2759767 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -903,3 +903,44 @@ int pSeries_lpar_resize_hpt(unsigned long shift)
>
>  	return 0;
>  }
> +
> +static ssize_t ppc64_pft_size_show(struct kobject *kobj,
> +				   struct kobj_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%llu\n", (unsigned long long)ppc64_pft_size);
> +}
> +
> +static ssize_t ppc64_pft_size_store(struct kobject *kobj,
> +				    struct kobj_attribute *attr,
> +				    const char *buf, size_t count)
> +{
> +	unsigned long new_shift;
> +	int rc;
> +
> +	pr_err("lpar: ppc64_pft_size_store() count=%zd\n", count);
Why pr_err?
> +
> +	if (kstrtoul(buf, 0, &new_shift))
> +		return -EINVAL;
> +
> +	rc = pSeries_lpar_resize_hpt(new_shift);
> +	if (rc < 0)
> +		return rc;
> +
> +	return count;
> +}
> +
> +static struct kobj_attribute ppc64_pft_size_attr =
> +	__ATTR(ppc64_pft_size, 0600, ppc64_pft_size_show, ppc64_pft_size_store);
> +
> +static int __init pseries_lpar_sysfs(void)
> +{
> +	int rc;
> +
> +	rc = sysfs_create_file(mm_kobj, &ppc64_pft_size_attr.attr);
> +	if (rc)
> +		pr_err("lpar: unable to create ppc64_pft_size sysfs file (%d)\n",
> +		       rc);
> +
> +	return 0;
I think that we need to return rc value, right?
> +}
> +machine_device_initcall(pseries, pseries_lpar_sysfs);
> --
> 2.5.0
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize
  2015-12-22 11:19   ` Denis Kirjanov
@ 2015-12-23  8:45     ` David Gibson
  2015-12-23 12:16       ` Denis Kirjanov
  0 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2015-12-23  8:45 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: paulus, benh, bharata, lvivier, thuth, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2911 bytes --]

On Tue, Dec 22, 2015 at 02:19:20PM +0300, Denis Kirjanov wrote:
> On 12/22/15, David Gibson <david@gibson.dropbear.id.au> wrote:
> > This patch adds a special file in /sys/kernel/mm which can be used to view
> > the current size of the hash page table (as a bit shift) and to trigger
> > a resize of the hash table on PAPR guests.
> >
> > Logically this would make more sense as a debugfs file, but that doesn't
> > see to provide an obvious way to have a "store" function that has an effect
> > other than updating a variable.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  arch/powerpc/platforms/pseries/lpar.c | 41
> > +++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/lpar.c
> > b/arch/powerpc/platforms/pseries/lpar.c
> > index aadb242..2759767 100644
> > --- a/arch/powerpc/platforms/pseries/lpar.c
> > +++ b/arch/powerpc/platforms/pseries/lpar.c
> > @@ -903,3 +903,44 @@ int pSeries_lpar_resize_hpt(unsigned long shift)
> >
> >  	return 0;
> >  }
> > +
> > +static ssize_t ppc64_pft_size_show(struct kobject *kobj,
> > +				   struct kobj_attribute *attr, char *buf)
> > +{
> > +	return sprintf(buf, "%llu\n", (unsigned long long)ppc64_pft_size);
> > +}
> > +
> > +static ssize_t ppc64_pft_size_store(struct kobject *kobj,
> > +				    struct kobj_attribute *attr,
> > +				    const char *buf, size_t count)
> > +{
> > +	unsigned long new_shift;
> > +	int rc;
> > +
> > +	pr_err("lpar: ppc64_pft_size_store() count=%zd\n", count);
> Why pr_err?

Uh.. good question.  That's a not particularly useful debug comment I
should just remove anyway.

> > +
> > +	if (kstrtoul(buf, 0, &new_shift))
> > +		return -EINVAL;
> > +
> > +	rc = pSeries_lpar_resize_hpt(new_shift);
> > +	if (rc < 0)
> > +		return rc;
> > +
> > +	return count;
> > +}
> > +
> > +static struct kobj_attribute ppc64_pft_size_attr =
> > +	__ATTR(ppc64_pft_size, 0600, ppc64_pft_size_show, ppc64_pft_size_store);
> > +
> > +static int __init pseries_lpar_sysfs(void)
> > +{
> > +	int rc;
> > +
> > +	rc = sysfs_create_file(mm_kobj, &ppc64_pft_size_attr.attr);
> > +	if (rc)
> > +		pr_err("lpar: unable to create ppc64_pft_size sysfs file (%d)\n",
> > +		       rc);
> > +
> > +	return 0;
> I think that we need to return rc value, right?

No.  Failing to register this sysfs file shouldn't cause the boot to fail.

> > +}
> > +machine_device_initcall(pseries, pseries_lpar_sysfs);
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize
  2015-12-23  8:45     ` David Gibson
@ 2015-12-23 12:16       ` Denis Kirjanov
  2015-12-24  6:37         ` David Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: Denis Kirjanov @ 2015-12-23 12:16 UTC (permalink / raw)
  To: David Gibson; +Cc: paulus, benh, bharata, lvivier, thuth, linuxppc-dev

On 12/23/15, David Gibson <david@gibson.dropbear.id.au> wrote:
> On Tue, Dec 22, 2015 at 02:19:20PM +0300, Denis Kirjanov wrote:
>> On 12/22/15, David Gibson <david@gibson.dropbear.id.au> wrote:
>> > This patch adds a special file in /sys/kernel/mm which can be used to
>> > view
>> > the current size of the hash page table (as a bit shift) and to trigger
>> > a resize of the hash table on PAPR guests.
>> >
>> > Logically this would make more sense as a debugfs file, but that
>> > doesn't
>> > see to provide an obvious way to have a "store" function that has an
>> > effect
>> > other than updating a variable.
>> >
>> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> > ---
>> >  arch/powerpc/platforms/pseries/lpar.c | 41
>> > +++++++++++++++++++++++++++++++++++
>> >  1 file changed, 41 insertions(+)
>> >
>> > diff --git a/arch/powerpc/platforms/pseries/lpar.c
>> > b/arch/powerpc/platforms/pseries/lpar.c
>> > index aadb242..2759767 100644
>> > --- a/arch/powerpc/platforms/pseries/lpar.c
>> > +++ b/arch/powerpc/platforms/pseries/lpar.c
>> > @@ -903,3 +903,44 @@ int pSeries_lpar_resize_hpt(unsigned long shift)
>> >
>> >  	return 0;
>> >  }
>> > +
>> > +static ssize_t ppc64_pft_size_show(struct kobject *kobj,
>> > +				   struct kobj_attribute *attr, char *buf)
>> > +{
>> > +	return sprintf(buf, "%llu\n", (unsigned long long)ppc64_pft_size);
>> > +}
>> > +
>> > +static ssize_t ppc64_pft_size_store(struct kobject *kobj,
>> > +				    struct kobj_attribute *attr,
>> > +				    const char *buf, size_t count)
>> > +{
>> > +	unsigned long new_shift;
>> > +	int rc;
>> > +
>> > +	pr_err("lpar: ppc64_pft_size_store() count=%zd\n", count);
>> Why pr_err?
>
> Uh.. good question.  That's a not particularly useful debug comment I
> should just remove anyway.
>
>> > +
>> > +	if (kstrtoul(buf, 0, &new_shift))
>> > +		return -EINVAL;
>> > +
>> > +	rc = pSeries_lpar_resize_hpt(new_shift);
>> > +	if (rc < 0)
>> > +		return rc;
>> > +
>> > +	return count;
>> > +}
>> > +
>> > +static struct kobj_attribute ppc64_pft_size_attr =
>> > +	__ATTR(ppc64_pft_size, 0600, ppc64_pft_size_show,
>> > ppc64_pft_size_store);
>> > +
>> > +static int __init pseries_lpar_sysfs(void)
>> > +{
>> > +	int rc;
>> > +
>> > +	rc = sysfs_create_file(mm_kobj, &ppc64_pft_size_attr.attr);
>> > +	if (rc)
>> > +		pr_err("lpar: unable to create ppc64_pft_size sysfs file (%d)\n",
>> > +		       rc);
>> > +
>> > +	return 0;
>> I think that we need to return rc value, right?
>
> No.  Failing to register this sysfs file shouldn't cause the boot to fail.
Then we don't need rc at all ;)
>
>> > +}
>> > +machine_device_initcall(pseries, pseries_lpar_sysfs);
>> >
>> > _______________________________________________
>> > Linuxppc-dev mailing list
>> > Linuxppc-dev@lists.ozlabs.org
>> > https://lists.ozlabs.org/listinfo/linuxppc-dev
>>
>
> --
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize
  2015-12-23 12:16       ` Denis Kirjanov
@ 2015-12-24  6:37         ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2015-12-24  6:37 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: paulus, benh, bharata, lvivier, thuth, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 3147 bytes --]

On Wed, Dec 23, 2015 at 03:16:39PM +0300, Denis Kirjanov wrote:
> On 12/23/15, David Gibson <david@gibson.dropbear.id.au> wrote:
> > On Tue, Dec 22, 2015 at 02:19:20PM +0300, Denis Kirjanov wrote:
> >> On 12/22/15, David Gibson <david@gibson.dropbear.id.au> wrote:
> >> > This patch adds a special file in /sys/kernel/mm which can be used to
> >> > view
> >> > the current size of the hash page table (as a bit shift) and to trigger
> >> > a resize of the hash table on PAPR guests.
> >> >
> >> > Logically this would make more sense as a debugfs file, but that
> >> > doesn't
> >> > see to provide an obvious way to have a "store" function that has an
> >> > effect
> >> > other than updating a variable.
> >> >
> >> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> > ---
> >> >  arch/powerpc/platforms/pseries/lpar.c | 41
> >> > +++++++++++++++++++++++++++++++++++
> >> >  1 file changed, 41 insertions(+)
> >> >
> >> > diff --git a/arch/powerpc/platforms/pseries/lpar.c
> >> > b/arch/powerpc/platforms/pseries/lpar.c
> >> > index aadb242..2759767 100644
> >> > --- a/arch/powerpc/platforms/pseries/lpar.c
> >> > +++ b/arch/powerpc/platforms/pseries/lpar.c
> >> > @@ -903,3 +903,44 @@ int pSeries_lpar_resize_hpt(unsigned long shift)
> >> >
> >> >  	return 0;
> >> >  }
> >> > +
> >> > +static ssize_t ppc64_pft_size_show(struct kobject *kobj,
> >> > +				   struct kobj_attribute *attr, char *buf)
> >> > +{
> >> > +	return sprintf(buf, "%llu\n", (unsigned long long)ppc64_pft_size);
> >> > +}
> >> > +
> >> > +static ssize_t ppc64_pft_size_store(struct kobject *kobj,
> >> > +				    struct kobj_attribute *attr,
> >> > +				    const char *buf, size_t count)
> >> > +{
> >> > +	unsigned long new_shift;
> >> > +	int rc;
> >> > +
> >> > +	pr_err("lpar: ppc64_pft_size_store() count=%zd\n", count);
> >> Why pr_err?
> >
> > Uh.. good question.  That's a not particularly useful debug comment I
> > should just remove anyway.
> >
> >> > +
> >> > +	if (kstrtoul(buf, 0, &new_shift))
> >> > +		return -EINVAL;
> >> > +
> >> > +	rc = pSeries_lpar_resize_hpt(new_shift);
> >> > +	if (rc < 0)
> >> > +		return rc;
> >> > +
> >> > +	return count;
> >> > +}
> >> > +
> >> > +static struct kobj_attribute ppc64_pft_size_attr =
> >> > +	__ATTR(ppc64_pft_size, 0600, ppc64_pft_size_show,
> >> > ppc64_pft_size_store);
> >> > +
> >> > +static int __init pseries_lpar_sysfs(void)
> >> > +{
> >> > +	int rc;
> >> > +
> >> > +	rc = sysfs_create_file(mm_kobj, &ppc64_pft_size_attr.attr);
> >> > +	if (rc)
> >> > +		pr_err("lpar: unable to create ppc64_pft_size sysfs file (%d)\n",
> >> > +		       rc);
> >> > +
> >> > +	return 0;
> >> I think that we need to return rc value, right?
> >
> > No.  Failing to register this sysfs file shouldn't cause the boot to fail.
> Then we don't need rc at all ;)

Except to report the error code in the message, which is useful.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/3] Prototype PAPR hash page table resizing (guest side)
  2015-12-22  5:14 [RFC 0/3] Prototype PAPR hash page table resizing (guest side) David Gibson
                   ` (2 preceding siblings ...)
  2015-12-22  5:14 ` [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize David Gibson
@ 2015-12-24  9:22 ` Anshuman Khandual
  2015-12-24 10:38   ` David Gibson
  3 siblings, 1 reply; 12+ messages in thread
From: Anshuman Khandual @ 2015-12-24  9:22 UTC (permalink / raw)
  To: David Gibson, paulus, benh, bharata; +Cc: lvivier, thuth, linuxppc-dev

On 12/22/2015 10:44 AM, David Gibson wrote:
> I've discussed with Paul and Ben previously the possibility of
> extending PAPR to allow changing the size of a running guest's hash
> page table (HPT).  This would allow for much more flexible memory
> hotplug, since the HPT wouldn't have to be sized in advance for the
> maximum possible memory size of the guest.

Does it include reducing the size of HPT as well ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/3] Prototype PAPR hash page table resizing (guest side)
  2015-12-24  9:22 ` [RFC 0/3] Prototype PAPR hash page table resizing (guest side) Anshuman Khandual
@ 2015-12-24 10:38   ` David Gibson
  2015-12-28  4:39     ` Anshuman Khandual
  0 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2015-12-24 10:38 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: paulus, benh, bharata, lvivier, thuth, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

On Thu, Dec 24, 2015 at 02:52:53PM +0530, Anshuman Khandual wrote:
> On 12/22/2015 10:44 AM, David Gibson wrote:
> > I've discussed with Paul and Ben previously the possibility of
> > extending PAPR to allow changing the size of a running guest's hash
> > page table (HPT).  This would allow for much more flexible memory
> > hotplug, since the HPT wouldn't have to be sized in advance for the
> > maximum possible memory size of the guest.
> 
> Does it include reducing the size of HPT as well ?

It does, but that could fail with H_PTEG_FULL if there's a collision
between bolted entries in the reduced table.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/3] Prototype PAPR hash page table resizing (guest side)
  2015-12-24 10:38   ` David Gibson
@ 2015-12-28  4:39     ` Anshuman Khandual
  2015-12-29  4:09       ` David Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: Anshuman Khandual @ 2015-12-28  4:39 UTC (permalink / raw)
  To: David Gibson; +Cc: lvivier, thuth, paulus, bharata, linuxppc-dev

On 12/24/2015 04:08 PM, David Gibson wrote:
> On Thu, Dec 24, 2015 at 02:52:53PM +0530, Anshuman Khandual wrote:
>> > On 12/22/2015 10:44 AM, David Gibson wrote:
>>> > > I've discussed with Paul and Ben previously the possibility of
>>> > > extending PAPR to allow changing the size of a running guest's hash
>>> > > page table (HPT).  This would allow for much more flexible memory
>>> > > hotplug, since the HPT wouldn't have to be sized in advance for the
>>> > > maximum possible memory size of the guest.
>> > 
>> > Does it include reducing the size of HPT as well ?
> It does, but that could fail with H_PTEG_FULL if there's a collision
> between bolted entries in the reduced table.

So in the case when we request for a reduced size HPT table, as mentioned
in the second implementation method, will we allocate the required smaller
HPT table to shadow the original or we just reduce the original HPT in
size without allocating a new one ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/3] Prototype PAPR hash page table resizing (guest side)
  2015-12-28  4:39     ` Anshuman Khandual
@ 2015-12-29  4:09       ` David Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: David Gibson @ 2015-12-29  4:09 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: lvivier, thuth, paulus, bharata, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

On Mon, Dec 28, 2015 at 10:09:52AM +0530, Anshuman Khandual wrote:
> On 12/24/2015 04:08 PM, David Gibson wrote:
> > On Thu, Dec 24, 2015 at 02:52:53PM +0530, Anshuman Khandual wrote:
> >> > On 12/22/2015 10:44 AM, David Gibson wrote:
> >>> > > I've discussed with Paul and Ben previously the possibility of
> >>> > > extending PAPR to allow changing the size of a running guest's hash
> >>> > > page table (HPT).  This would allow for much more flexible memory
> >>> > > hotplug, since the HPT wouldn't have to be sized in advance for the
> >>> > > maximum possible memory size of the guest.
> >> > 
> >> > Does it include reducing the size of HPT as well ?
> > It does, but that could fail with H_PTEG_FULL if there's a collision
> > between bolted entries in the reduced table.
> 
> So in the case when we request for a reduced size HPT table, as mentioned
> in the second implementation method, will we allocate the required smaller
> HPT table to shadow the original or we just reduce the original HPT in
> size without allocating a new one ?

The current implementation will allocate a new HPT and free the
original one once the transition is complete.  It might be possible to
avoid that and reduce the HPT in place, but doing a rollback in case
of collision will be a lot hairier.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-12-29  5:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-22  5:14 [RFC 0/3] Prototype PAPR hash page table resizing (guest side) David Gibson
2015-12-22  5:14 ` [RFC 1/3] pseries: Add hypercall wrappers for hash page table resizing David Gibson
2015-12-22  5:14 ` [RFC 2/3] pseries: Add support for hash " David Gibson
2015-12-22  5:14 ` [RFC 3/3] pseries: sysfs hack to trigger a hash page table resize David Gibson
2015-12-22 11:19   ` Denis Kirjanov
2015-12-23  8:45     ` David Gibson
2015-12-23 12:16       ` Denis Kirjanov
2015-12-24  6:37         ` David Gibson
2015-12-24  9:22 ` [RFC 0/3] Prototype PAPR hash page table resizing (guest side) Anshuman Khandual
2015-12-24 10:38   ` David Gibson
2015-12-28  4:39     ` Anshuman Khandual
2015-12-29  4:09       ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.