* powerpc hugepage bug(s) when no valid hstates?
@ 2014-03-24 23:02 ` Nishanth Aravamudan
  0 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-03-24 23:02 UTC
  To: linux-mm; +Cc: linuxppc-dev, nyc, benh, paulus, anton

In KVM guests on Power, if the guest is not backed by hugepages, we see
the following in the guest:

AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:         64 kB

This seems like a configuration issue -- why is a hstate of 64k being
registered?

I did some debugging and found that the following does trigger,
mm/hugetlb.c::hugetlb_init():

        /* Some platform decide whether they support huge pages at boot
         * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
         * there is no such support
         */
        if (HPAGE_SHIFT == 0)
                return 0;

That check is only during init-time. So we don't support hugepages, but
none of the hugetlb APIs actually check this condition (HPAGE_SHIFT ==
0), so /proc/meminfo above falsely indicates there is a valid hstate (at
least one). But note that there is no /sys/kernel/mm/hugepages meaning
no hstate was actually registered.
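
To make the mismatch above easier to spot from userspace, here is a
minimal sketch, not taken from the kernel: it pulls the Hugepagesize
value out of /proc/meminfo-style text and compares it with whether
/sys/kernel/mm/hugepages exists. The function names are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Return the "Hugepagesize:" value in kB from a /proc/meminfo-style
 * buffer, or -1 if the field is absent or malformed.
 */
long meminfo_hugepagesize_kb(const char *meminfo)
{
	const char *key = "Hugepagesize:";
	const char *p;
	long kb;

	assert(meminfo != NULL);
	p = strstr(meminfo, key);
	if (!p)
		return -1;
	if (sscanf(p + strlen(key), "%ld", &kb) != 1)
		return -1;
	return kb;
}

/* Return 1 if the hugepages sysfs directory exists, 0 otherwise. */
int hugepages_sysfs_present(void)
{
	struct stat st;

	return stat("/sys/kernel/mm/hugepages", &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * The state described above: meminfo advertises a hugepage size
 * although no hstate was registered in sysfs.
 */
int hstate_report_inconsistent(long hugepagesize_kb, int sysfs_present)
{
	return hugepagesize_kb > 0 && !sysfs_present;
}
```

On the guest described here one would expect
hstate_report_inconsistent(64, hugepages_sysfs_present()) to be true,
assuming the sysfs directory is indeed missing as reported.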

Further, it turns out that huge_page_order(default_hstate) is 0, so
hugetlb_report_meminfo is doing:

1UL << (huge_page_order(h) + PAGE_SHIFT - 10)

which ends up just doing 1 << (PAGE_SHIFT - 10) and since the base page
size is 64k, we report a hugepage size of 64k... And allow the user to
allocate hugepages via the sysctl, etc.
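
The arithmetic above can be checked in isolation. This is a standalone
sketch of the same shift expression, with the order and PAGE_SHIFT
passed in as plain parameters; the function name is made up for the
example and is not a kernel symbol.

```c
#include <assert.h>

/*
 * Same expression as in hugetlb_report_meminfo():
 *   1UL << (huge_page_order(h) + PAGE_SHIFT - 10)
 * The result is the reported hugepage size in kB.
 */
unsigned long reported_hugepagesize_kb(unsigned int order,
				       unsigned int page_shift)
{
	/* Subtracting 10 converts bytes to kB, so the shift must be >= 10. */
	assert(page_shift >= 10);
	return 1UL << (order + page_shift - 10);
}
```

With order 0 and a PAGE_SHIFT of 16 (64k base pages) this yields 64,
which matches the bogus Hugepagesize line. An order of 8 on the same
base page size would give 16384 kB, i.e. a 16M hugepage.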

What's the right thing to do here?

1) Should we add checks for HPAGE_SHIFT == 0 to all the hugetlb APIs? It
seems like HPAGE_SHIFT == 0 should be the equivalent, functionally, of
the config options being off. This seems like a lot of overhead, though,
to put everywhere, so maybe I can do it in an arch-specific macro, that
in asm-generic defaults to 0 (and so will hopefully be compiled out?).

2) What should hugetlbfs do when HPAGE_SHIFT == 0? Should it be
mountable? Obviously if it's mountable, we can't create files there
(since the fs will report insufficient space). [1]
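
As a rough illustration of the macro idea in option 1, the sketch below
shows a generic fallback that is a compile-time constant, so the guard
can be optimized away unless an architecture overrides it. All names
here (arch_hugepages_unsupported, hugepages_usable, set_nr_hugepages)
are hypothetical and do not exist in the kernel; this is userspace C
meant only to show the shape.

```c
#include <stdbool.h>

/*
 * Hypothetical hook. An architecture that decides at boot time would
 * define arch_hugepages_unsupported() as a runtime check before this
 * point; the generic fallback is the constant 0.
 */
#ifndef arch_hugepages_unsupported
#define arch_hugepages_unsupported() 0
#endif

static inline bool hugepages_usable(void)
{
	/* With the generic fallback this is always true. */
	return !arch_hugepages_unsupported();
}

/* Example caller: refuse to resize the pool when hugepages are unusable. */
int set_nr_hugepages(unsigned long *pool, unsigned long count)
{
	if (!hugepages_usable())
		return -1;
	*pool = count;
	return 0;
}
```

With the fallback definition the compiler sees a constant condition in
set_nr_hugepages(), so the early return costs nothing on architectures
that do not need the check.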

Thanks,
Nish

[1]
Currently, I am seeing the following when I `mount -t hugetlbfs /none
/dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's
related to the fact that hugetlbfs is probably not setting itself up
correctly in this state?:

Unable to handle kernel paging request for data at address 0x00000031
Faulting instruction address: 0xc000000000245710
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: pseries_rng rng_core virtio_net virtio_pci virtio_ring virtio
CPU: 0 PID: 1807 Comm: ls Not tainted 3.14.0-rc7-00066-g774868c-dirty #14
task: c00000007e804520 ti: c00000007aed4000 task.ti: c00000007aed4000
NIP: c000000000245710 LR: c00000000024586c CTR: 0000000000000000
REGS: c00000007aed74f0 TRAP: 0300   Not tainted  (3.14.0-rc7-00066-g774868c-dirty)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
CFAR: 00003fff91037760 DAR: 0000000000000031 DSISR: 40000000 SOFTE: 1
GPR00: c00000000024586c c00000007aed7770 c000000000d85420 c00000007d7a0010
GPR04: c000000000abcf20 c000000000ed7c78 0000000000000020 c000000000cbc880
GPR08: 0000000000000000 0000000000000000 0000000080000000 0000000000000002
GPR12: 0000000044002484 c00000000fe40000 0000000000000000 00000000100232f0
GPR16: 0000000000000001 0000000000000000 0000000000000000 c00000007d794a40
GPR20: 0000000000000000 0000000000000024 c00000007a49a200 c00000007a2bd000
GPR24: c00000007aed7bb8 c00000007d7a0090 0000000000014800 0000000000000000
GPR28: c00000007d7a0010 c00000007a49a210 c00000007d7a0150 0000000000000001
NIP [c000000000245710] .time_out_leases+0x30/0x100
LR [c00000000024586c] .__break_lease+0x8c/0x480
Call Trace:
[c00000007aed7770] [c0000000002434c0] .lease_alloc+0x20/0xe0 (unreliable)
[c00000007aed77f0] [c00000000024586c] .__break_lease+0x8c/0x480
[c00000007aed78e0] [c0000000001e0374] .do_dentry_open.isra.14+0xf4/0x370
[c00000007aed7980] [c0000000001e0624] .finish_open+0x34/0x60
[c00000007aed7a00] [c0000000001f519c] .do_last+0x56c/0xe40
[c00000007aed7b20] [c0000000001f5b68] .path_openat+0xf8/0x800
[c00000007aed7c40] [c0000000001f7810] .do_filp_open+0x40/0xb0
[c00000007aed7d70] [c0000000001e1f08] .do_sys_open+0x198/0x2e0
[c00000007aed7e30] [c00000000000a158] syscall_exit+0x0/0x98
Instruction dump:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [RFC PATCH] hugetlb: ensure hugepage access is denied if hugepages are not supported
  2014-03-24 23:02 ` Nishanth Aravamudan
@ 2014-03-26 15:58   ` Nishanth Aravamudan
  -1 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-03-26 15:58 UTC
  To: linux-mm; +Cc: linuxppc-dev, nyc, benh, paulus, anton

On 24.03.2014 [16:02:56 -0700], Nishanth Aravamudan wrote:
> In KVM guests on Power, if the guest is not backed by hugepages, we see
> the following in the guest:
> 
> AnonHugePages:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:         64 kB
> 
> This seems like a configuration issue -- why is a hstate of 64k being
> registered?
> 
> I did some debugging and found that the following does trigger,
> mm/hugetlb.c::hugetlb_init():
> 
>         /* Some platform decide whether they support huge pages at boot
>          * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
>          * there is no such support
>          */
>         if (HPAGE_SHIFT == 0)
>                 return 0;
> 
> That check is only during init-time. So we don't support hugepages, but
> none of the hugetlb APIs actually check this condition (HPAGE_SHIFT ==
> 0), so /proc/meminfo above falsely indicates there is a valid hstate (at
> least one). But note that there is no /sys/kernel/mm/hugepages meaning
> no hstate was actually registered.
> 
> Further, it turns out that huge_page_order(default_hstate) is 0, so
> hugetlb_report_meminfo is doing:
> 
> 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)
> 
> which ends up just doing 1 << (PAGE_SHIFT - 10) and since the base page
> size is 64k, we report a hugepage size of 64k... And allow the user to
> allocate hugepages via the sysctl, etc.
> 
> What's the right thing to do here?
> 
> 1) Should we add checks for HPAGE_SHIFT == 0 to all the hugetlb APIs? It
> seems like HPAGE_SHIFT == 0 should be the equivalent, functionally, of
> the config options being off. This seems like a lot of overhead, though,
> to put everywhere, so maybe I can do it in an arch-specific macro, that
> in asm-generic defaults to 0 (and so will hopefully be compiled out?).
> 
> 2) What should hugetlbfs do when HPAGE_SHIFT == 0? Should it be
> mountable? Obviously if it's mountable, we can't create files there
> (since the fs will report insufficient space). [1]

Here is my solution to this. Comments appreciated!

In KVM guests on Power, in a guest not backed by hugepages, we see the
following:

AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:         64 kB

HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
are not supported at boot-time, but this is only checked in
hugetlb_init(). Extract the check to a helper function, and use it in a
few relevant places.

This does make hugetlbfs not supported in this environment. I believe
this is fine, as there are no valid hugepages and that won't change at
runtime.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d19b30a..c7aa477 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1017,6 +1017,11 @@ static int __init init_hugetlbfs_fs(void)
 	int error;
 	int i;
 
+	if (!hugepages_supported()) {
+		printk(KERN_ERR "hugetlbfs: Disabling because there are no supported page sizes\n");
+		return -ENOTSUPP;
+	}
+
 	error = bdi_init(&hugetlbfs_backing_dev_info);
 	if (error)
 		return error;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8c43cc4..0aea8de 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -450,4 +450,14 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 	return ptl;
 }
 
+static inline bool hugepages_supported(void)
+{
+	/*
+	 * Some platform decide whether they support huge pages at boot
+	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
+	 * there is no such support
+	 */
+	return HPAGE_SHIFT != 0;
+}
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c01cb9f..1c99585 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1949,11 +1949,7 @@ module_exit(hugetlb_exit);
 
 static int __init hugetlb_init(void)
 {
-	/* Some platform decide whether they support huge pages at boot
-	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
-	 * there is no such support
-	 */
-	if (HPAGE_SHIFT == 0)
+	if (!hugepages_supported())
 		return 0;
 
 	if (!size_to_hstate(default_hstate_size)) {
@@ -2069,6 +2065,9 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 	unsigned long tmp;
 	int ret;
 
+	if (!hugepages_supported())
+		return -ENOTSUPP;
+
 	tmp = h->max_huge_pages;
 
 	if (write && h->order >= MAX_ORDER)
@@ -2122,6 +2121,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
 	unsigned long tmp;
 	int ret;
 
+	if (!hugepages_supported())
+		return -ENOTSUPP;
+
 	tmp = h->nr_overcommit_huge_pages;
 
 	if (write && h->order >= MAX_ORDER)
@@ -2147,6 +2149,8 @@ out:
 void hugetlb_report_meminfo(struct seq_file *m)
 {
 	struct hstate *h = &default_hstate;
+	if (!hugepages_supported())
+		return;
 	seq_printf(m,
 			"HugePages_Total:   %5lu\n"
 			"HugePages_Free:    %5lu\n"
@@ -2163,6 +2167,8 @@ void hugetlb_report_meminfo(struct seq_file *m)
 int hugetlb_report_node_meminfo(int nid, char *buf)
 {
 	struct hstate *h = &default_hstate;
+	if (!hugepages_supported())
+		return 0;
 	return sprintf(buf,
 		"Node %d HugePages_Total: %5u\n"
 		"Node %d HugePages_Free:  %5u\n"
@@ -2177,6 +2183,9 @@ void hugetlb_show_meminfo(void)
 	struct hstate *h;
 	int nid;
 
+	if (!hugepages_supported())
+		return;
+
 	for_each_node_state(nid, N_MEMORY)
 		for_each_hstate(h)
 			pr_info("Node %d hugepages_total=%u hugepages_free=%u hugepages_surp=%u hugepages_size=%lukB\n",
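
For readers who want to see the guard pattern from the patch above in
isolation, here is a userspace simulation, offered only as a sketch. The
sim_* names are invented for this example, sim_hpage_shift stands in for
HPAGE_SHIFT, and -EOPNOTSUPP is substituted for the kernel-internal
-ENOTSUPP used in the patch because the latter is not available in
userspace errno.h. A nonzero shift is assumed to be at least 10.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for HPAGE_SHIFT: 0 means no hugepage support was found. */
static unsigned int sim_hpage_shift;

void sim_set_hpage_shift(unsigned int shift)
{
	sim_hpage_shift = shift;
}

/* Mirrors the helper added by the patch: supported iff the shift is set. */
static bool sim_hugepages_supported(void)
{
	return sim_hpage_shift != 0;
}

/*
 * Reporting path: emit nothing when unsupported, like the early return
 * added to hugetlb_report_meminfo(). Returns the number of characters
 * that snprintf() reports, or 0 when unsupported.
 */
int sim_report_meminfo(char *buf, size_t len)
{
	unsigned long kb;

	if (!sim_hugepages_supported()) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	kb = 1UL << (sim_hpage_shift - 10);
	return snprintf(buf, len, "Hugepagesize:   %8lu kB\n", kb);
}

/* Control path: refuse pool changes, like the sysctl handlers. */
int sim_set_pool(unsigned long *pool, unsigned long count)
{
	if (!sim_hugepages_supported())
		return -EOPNOTSUPP;
	*pool = count;
	return 0;
}
```

With the shift left at 0, both entry points back off without touching
any hugepage state; after sim_set_hpage_shift(24) they behave as for a
16M hugepage.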


* Re: [RFC PATCH] hugetlb: ensure hugepage access is denied if hugepages are not supported
  2014-03-26 15:58   ` Nishanth Aravamudan
@ 2014-04-02 17:16     ` Nishanth Aravamudan
  -1 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-04-02 17:16 UTC
  To: linux-mm; +Cc: linuxppc-dev, nyc, benh, paulus, anton, akpm

On 26.03.2014 [08:58:15 -0700], Nishanth Aravamudan wrote:
> On 24.03.2014 [16:02:56 -0700], Nishanth Aravamudan wrote:
> > In KVM guests on Power, if the guest is not backed by hugepages, we see
> > the following in the guest:
> > 
> > AnonHugePages:         0 kB
> > HugePages_Total:       0
> > HugePages_Free:        0
> > HugePages_Rsvd:        0
> > HugePages_Surp:        0
> > Hugepagesize:         64 kB
> > 
> > This seems like a configuration issue -- why is a hstate of 64k being
> > registered?
> > 
> > I did some debugging and found that the following does trigger,
> > mm/hugetlb.c::hugetlb_init():
> > 
> >         /* Some platform decide whether they support huge pages at boot
> >          * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> >          * there is no such support
> >          */
> >         if (HPAGE_SHIFT == 0)
> >                 return 0;
> > 
> > That check is only during init-time. So we don't support hugepages, but
> > none of the hugetlb APIs actually check this condition (HPAGE_SHIFT ==
> > 0), so /proc/meminfo above falsely indicates there is a valid hstate (at
> > least one). But note that there is no /sys/kernel/mm/hugepages meaning
> > no hstate was actually registered.
> > 
> > Further, it turns out that huge_page_order(default_hstate) is 0, so
> > hugetlb_report_meminfo is doing:
> > 
> > 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)
> > 
> > which ends up just doing 1 << (PAGE_SHIFT - 10) and since the base page
> > size is 64k, we report a hugepage size of 64k... And allow the user to
> > allocate hugepages via the sysctl, etc.
> > 
> > What's the right thing to do here?
> > 
> > 1) Should we add checks for HPAGE_SHIFT == 0 to all the hugetlb APIs? It
> > seems like HPAGE_SHIFT == 0 should be the equivalent, functionally, of
> > the config options being off. This seems like a lot of overhead, though,
> > to put everywhere, so maybe I can do it in an arch-specific macro, that
> > in asm-generic defaults to 0 (and so will hopefully be compiled out?).
> > 
> > 2) What should hugetlbfs do when HPAGE_SHIFT == 0? Should it be
> > mountable? Obviously if it's mountable, we can't create files there
> > (since the fs will report insufficient space). [1]
> 
> Here is my solution to this. Comments appreciated!
> 
> In KVM guests on Power, in a guest not backed by hugepages, we see the
> following:
> 
> AnonHugePages:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:         64 kB
> 
> HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
> are not supported at boot-time, but this is only checked in
> hugetlb_init(). Extract the check to a helper function, and use it in a
> few relevant places.
> 
> This does make hugetlbfs not supported in this environment. I believe
> this is fine, as there are no valid hugepages and that won't change at
> runtime.
> 
> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

Ping on this? The patch below fixes a pretty easy-to-reproduce bug in
KVM guests on Power.

Thanks,
Nish

> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d19b30a..c7aa477 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1017,6 +1017,11 @@ static int __init init_hugetlbfs_fs(void)
>  	int error;
>  	int i;
>  
> +	if (!hugepages_supported()) {
> +		printk(KERN_ERR "hugetlbfs: Disabling because there are no supported page sizes\n");
> +		return -ENOTSUPP;
> +	}
> +
>  	error = bdi_init(&hugetlbfs_backing_dev_info);
>  	if (error)
>  		return error;
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8c43cc4..0aea8de 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -450,4 +450,14 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
>  	return ptl;
>  }
>  
> +static inline bool hugepages_supported(void)
> +{
> +	/*
> +	 * Some platform decide whether they support huge pages at boot
> +	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> +	 * there is no such support
> +	 */
> +	return HPAGE_SHIFT != 0;
> +}
> +
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c01cb9f..1c99585 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1949,11 +1949,7 @@ module_exit(hugetlb_exit);
>  
>  static int __init hugetlb_init(void)
>  {
> -	/* Some platform decide whether they support huge pages at boot
> -	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> -	 * there is no such support
> -	 */
> -	if (HPAGE_SHIFT == 0)
> +	if (!hugepages_supported())
>  		return 0;
>  
>  	if (!size_to_hstate(default_hstate_size)) {
> @@ -2069,6 +2065,9 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
>  	unsigned long tmp;
>  	int ret;
>  
> +	if (!hugepages_supported())
> +		return -ENOTSUPP;
> +
>  	tmp = h->max_huge_pages;
>  
>  	if (write && h->order >= MAX_ORDER)
> @@ -2122,6 +2121,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
>  	unsigned long tmp;
>  	int ret;
>  
> +	if (!hugepages_supported())
> +		return -ENOTSUPP;
> +
>  	tmp = h->nr_overcommit_huge_pages;
>  
>  	if (write && h->order >= MAX_ORDER)
> @@ -2147,6 +2149,8 @@ out:
>  void hugetlb_report_meminfo(struct seq_file *m)
>  {
>  	struct hstate *h = &default_hstate;
> +	if (!hugepages_supported())
> +		return;
>  	seq_printf(m,
>  			"HugePages_Total:   %5lu\n"
>  			"HugePages_Free:    %5lu\n"
> @@ -2163,6 +2167,8 @@ void hugetlb_report_meminfo(struct seq_file *m)
>  int hugetlb_report_node_meminfo(int nid, char *buf)
>  {
>  	struct hstate *h = &default_hstate;
> +	if (!hugepages_supported())
> +		return 0;
>  	return sprintf(buf,
>  		"Node %d HugePages_Total: %5u\n"
>  		"Node %d HugePages_Free:  %5u\n"
> @@ -2177,6 +2183,9 @@ void hugetlb_show_meminfo(void)
>  	struct hstate *h;
>  	int nid;
>  
> +	if (!hugepages_supported())
> +		return;
> +
>  	for_each_node_state(nid, N_MEMORY)
>  		for_each_hstate(h)
>  			pr_info("Node %d hugepages_total=%u hugepages_free=%u hugepages_surp=%u hugepages_size=%lukB\n",


* Re: [RFC PATCH] hugetlb: ensure hugepage access is denied if hugepages are not supported
@ 2014-04-02 17:16     ` Nishanth Aravamudan
  0 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-04-02 17:16 UTC (permalink / raw)
  To: linux-mm; +Cc: paulus, anton, nyc, akpm, linuxppc-dev

On 26.03.2014 [08:58:15 -0700], Nishanth Aravamudan wrote:
> On 24.03.2014 [16:02:56 -0700], Nishanth Aravamudan wrote:
> > In KVM guests on Power, if the guest is not backed by hugepages, we see
> > the following in the guest:
> > 
> > AnonHugePages:         0 kB
> > HugePages_Total:       0
> > HugePages_Free:        0
> > HugePages_Rsvd:        0
> > HugePages_Surp:        0
> > Hugepagesize:         64 kB
> > 
> > This seems like a configuration issue -- why is a hstate of 64k being
> > registered?
> > 
> > I did some debugging and found that the following does trigger,
> > mm/hugetlb.c::hugetlb_init():
> > 
> >         /* Some platform decide whether they support huge pages at boot
> >          * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> >          * there is no such support
> >          */
> >         if (HPAGE_SHIFT == 0)
> >                 return 0;
> > 
> > That check is only during init-time. So we don't support hugepages, but
> > none of the hugetlb APIs actually check this condition (HPAGE_SHIFT ==
> > 0), so /proc/meminfo above falsely indicates there is a valid hstate (at
> > least one). But note that there is no /sys/kernel/mm/hugepages meaning
> > no hstate was actually registered.
> > 
> > Further, it turns out that huge_page_order(default_hstate) is 0, so
> > hugetlb_report_meminfo is doing:
> > 
> > 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)
> > 
> > which ends up just doing 1 << (PAGE_SHIFT - 10) and since the base page
> > size is 64k, we report a hugepage size of 64k... And allow the user to
> > allocate hugepages via the sysctl, etc.
> > 
> > What's the right thing to do here?
> > 
> > 1) Should we add checks for HPAGE_SHIFT == 0 to all the hugetlb APIs? It
> > seems like HPAGE_SHIFT == 0 should be the equivalent, functionally, of
> > the config options being off. This seems like a lot of overhead, though,
> > to put everywhere, so maybe I can do it in an arch-specific macro, that
> > in asm-generic defaults to 0 (and so will hopefully be compiled out?).
> > 
> > 2) What should hugetlbfs do when HPAGE_SHIFT == 0? Should it be
> > mountable? Obviously if it's mountable, we can't great files there
> > (since the fs will report insufficient space). [1]
> 
> Here is my solution to this. Comments appreciated!
> 
> In KVM guests on Power, in a guest not backed by hugepages, we see the
> following:
> 
> AnonHugePages:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:         64 kB
> 
> HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
> are not supported at boot-time, but this is only checked in
> hugetlb_init(). Extract the check to a helper function, and use it in a
> few relevant places.
> 
> This does make hugetlbfs not supported in this environment. I believe
> this is fine, as there are no valid hugepages and that won't change at
> runtime.
> 
> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

Ping on this? The patch below fixes a pretty easy-to-reproduce bug in
guests under KVM guests on Power.

Thanks,
Nish

> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d19b30a..c7aa477 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1017,6 +1017,11 @@ static int __init init_hugetlbfs_fs(void)
>  	int error;
>  	int i;
>  
> +	if (!hugepages_supported()) {
> +		printk(KERN_ERR "hugetlbfs: Disabling because there are no supported page sizes\n");
> +		return -ENOTSUPP;
> +	}
> +
>  	error = bdi_init(&hugetlbfs_backing_dev_info);
>  	if (error)
>  		return error;
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8c43cc4..0aea8de 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -450,4 +450,14 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
>  	return ptl;
>  }
>  
> +static inline bool hugepages_supported(void)
> +{
> +	/*
> +	 * Some platforms decide whether they support huge pages at boot
> +	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> +	 * there is no such support
> +	 */
> +	return HPAGE_SHIFT != 0;
> +}
> +
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c01cb9f..1c99585 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1949,11 +1949,7 @@ module_exit(hugetlb_exit);
>  
>  static int __init hugetlb_init(void)
>  {
> -	/* Some platform decide whether they support huge pages at boot
> -	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> -	 * there is no such support
> -	 */
> -	if (HPAGE_SHIFT == 0)
> +	if (!hugepages_supported())
>  		return 0;
>  
>  	if (!size_to_hstate(default_hstate_size)) {
> @@ -2069,6 +2065,9 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
>  	unsigned long tmp;
>  	int ret;
>  
> +	if (!hugepages_supported())
> +		return -ENOTSUPP;
> +
>  	tmp = h->max_huge_pages;
>  
>  	if (write && h->order >= MAX_ORDER)
> @@ -2122,6 +2121,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
>  	unsigned long tmp;
>  	int ret;
>  
> +	if (!hugepages_supported())
> +		return -ENOTSUPP;
> +
>  	tmp = h->nr_overcommit_huge_pages;
>  
>  	if (write && h->order >= MAX_ORDER)
> @@ -2147,6 +2149,8 @@ out:
>  void hugetlb_report_meminfo(struct seq_file *m)
>  {
>  	struct hstate *h = &default_hstate;
> +	if (!hugepages_supported())
> +		return;
>  	seq_printf(m,
>  			"HugePages_Total:   %5lu\n"
>  			"HugePages_Free:    %5lu\n"
> @@ -2163,6 +2167,8 @@ void hugetlb_report_meminfo(struct seq_file *m)
>  int hugetlb_report_node_meminfo(int nid, char *buf)
>  {
>  	struct hstate *h = &default_hstate;
> +	if (!hugepages_supported())
> +		return 0;
>  	return sprintf(buf,
>  		"Node %d HugePages_Total: %5u\n"
>  		"Node %d HugePages_Free:  %5u\n"
> @@ -2177,6 +2183,9 @@ void hugetlb_show_meminfo(void)
>  	struct hstate *h;
>  	int nid;
>  
> +	if (!hugepages_supported())
> +		return;
> +
>  	for_each_node_state(nid, N_MEMORY)
>  		for_each_hstate(h)
>  			pr_info("Node %d hugepages_total=%u hugepages_free=%u hugepages_surp=%u hugepages_size=%lukB\n",

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] hugetlb: ensure hugepage access is denied if hugepages are not supported
  2014-03-26 15:58   ` Nishanth Aravamudan
@ 2014-04-03 16:19     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2014-04-03 16:19 UTC (permalink / raw)
  To: Nishanth Aravamudan, linux-mm; +Cc: paulus, linuxppc-dev, anton, nyc

Nishanth Aravamudan <nacc@linux.vnet.ibm.com> writes:

> On 24.03.2014 [16:02:56 -0700], Nishanth Aravamudan wrote:
>> In KVM guests on Power, if the guest is not backed by hugepages, we see
>> the following in the guest:
>> 
>> AnonHugePages:         0 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:         64 kB
>> 
>> This seems like a configuration issue -- why is a hstate of 64k being
>> registered?
>> 
>> I did some debugging and found that the following does trigger,
>> mm/hugetlb.c::hugetlb_init():
>> 
>>         /* Some platform decide whether they support huge pages at boot
>>          * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
>>          * there is no such support
>>          */
>>         if (HPAGE_SHIFT == 0)
>>                 return 0;
>> 
>> That check is only during init-time. So we don't support hugepages, but
>> none of the hugetlb APIs actually check this condition (HPAGE_SHIFT ==
>> 0), so /proc/meminfo above falsely indicates there is a valid hstate (at
>> least one). But note that there is no /sys/kernel/mm/hugepages meaning
>> no hstate was actually registered.
>> 
>> Further, it turns out that huge_page_order(default_hstate) is 0, so
>> hugetlb_report_meminfo is doing:
>> 
>> 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)
>> 
>> which ends up just doing 1 << (PAGE_SHIFT - 10) and since the base page
>> size is 64k, we report a hugepage size of 64k... And allow the user to
>> allocate hugepages via the sysctl, etc.
>> 
>> What's the right thing to do here?
>> 
>> 1) Should we add checks for HPAGE_SHIFT == 0 to all the hugetlb APIs? It
>> seems like HPAGE_SHIFT == 0 should be the equivalent, functionally, of
>> the config options being off. This seems like a lot of overhead, though,
>> to put everywhere, so maybe I can do it in an arch-specific macro, that
>> in asm-generic defaults to 0 (and so will hopefully be compiled out?).
>> 
>> 2) What should hugetlbfs do when HPAGE_SHIFT == 0? Should it be
>> mountable? Obviously if it's mountable, we can't create files there
>> (since the fs will report insufficient space). [1]
>
> Here is my solution to this. Comments appreciated!
>
> In KVM guests on Power, in a guest not backed by hugepages, we see the
> following:
>
> AnonHugePages:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:         64 kB
>
> HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
> are not supported at boot-time, but this is only checked in
> hugetlb_init(). Extract the check to a helper function, and use it in a
> few relevant places.
>
> This does make hugetlbfs not supported in this environment. I believe
> this is fine, as there are no valid hugepages and that won't change at
> runtime.
>
> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>


Looks good. Can you resubmit it as a proper patch? You may also want to
note in the commit message that the hugetlbfs file system will also not
be registered.
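
One way to observe from userspace whether hugetlbfs got registered is to
look for it in /proc/filesystems. The following is only a sketch under
that assumption: fs_type_listed() is a hypothetical helper that scans
text laid out like /proc/filesystems (an optional "nodev" flag, a tab,
then the type name on each line). Reading the real file into a buffer is
left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if fs_name appears as a filesystem type in text formatted
 * like /proc/filesystems. Hypothetical helper for illustration only.
 */
bool fs_type_listed(const char *text, const char *fs_name)
{
	size_t name_len = strlen(fs_name);
	const char *line = text;

	assert(name_len > 0);
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t line_len = eol ? (size_t)(eol - line) : strlen(line);
		const char *tab = memchr(line, '\t', line_len);

		/* The type name is whatever follows the tab on this line. */
		if (tab) {
			size_t rest = line_len - (size_t)(tab + 1 - line);

			if (rest == name_len &&
			    memcmp(tab + 1, fs_name, name_len) == 0)
				return true;
		}

		/* Advance past this line and its newline, if any. */
		line += line_len;
		if (*line == '\n')
			line++;
	}
	return false;
}
```

The comparison is on the whole name after the tab, so a longer type name
that merely starts with "hugetlbfs" does not match.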

>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d19b30a..c7aa477 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1017,6 +1017,11 @@ static int __init init_hugetlbfs_fs(void)
>  	int error;
>  	int i;
>  
> +	if (!hugepages_supported()) {
> +		printk(KERN_ERR "hugetlbfs: Disabling because there are no supported page sizes\n");
> +		return -ENOTSUPP;
> +	}
> +
>  	error = bdi_init(&hugetlbfs_backing_dev_info);
>  	if (error)
>  		return error;
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8c43cc4..0aea8de 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -450,4 +450,14 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
>  	return ptl;
>  }
>  
> +static inline bool hugepages_supported(void)
> +{
> +	/*
> +	 * Some platforms decide whether they support huge pages at boot
> +	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> +	 * there is no such support
> +	 */
> +	return HPAGE_SHIFT != 0;
> +}
> +
>  #endif /* _LINUX_HUGETLB_H */
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c01cb9f..1c99585 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1949,11 +1949,7 @@ module_exit(hugetlb_exit);
>  
>  static int __init hugetlb_init(void)
>  {
> -	/* Some platform decide whether they support huge pages at boot
> -	 * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> -	 * there is no such support
> -	 */
> -	if (HPAGE_SHIFT == 0)
> +	if (!hugepages_supported())
>  		return 0;
>  
>  	if (!size_to_hstate(default_hstate_size)) {
> @@ -2069,6 +2065,9 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
>  	unsigned long tmp;
>  	int ret;
>  
> +	if (!hugepages_supported())
> +		return -ENOTSUPP;
> +
>  	tmp = h->max_huge_pages;
>  
>  	if (write && h->order >= MAX_ORDER)
> @@ -2122,6 +2121,9 @@ int hugetlb_overcommit_handler(struct ctl_table *table, int write,
>  	unsigned long tmp;
>  	int ret;
>  
> +	if (!hugepages_supported())
> +		return -ENOTSUPP;
> +
>  	tmp = h->nr_overcommit_huge_pages;
>  
>  	if (write && h->order >= MAX_ORDER)
> @@ -2147,6 +2149,8 @@ out:
>  void hugetlb_report_meminfo(struct seq_file *m)
>  {
>  	struct hstate *h = &default_hstate;
> +	if (!hugepages_supported())
> +		return;
>  	seq_printf(m,
>  			"HugePages_Total:   %5lu\n"
>  			"HugePages_Free:    %5lu\n"
> @@ -2163,6 +2167,8 @@ void hugetlb_report_meminfo(struct seq_file *m)
>  int hugetlb_report_node_meminfo(int nid, char *buf)
>  {
>  	struct hstate *h = &default_hstate;
> +	if (!hugepages_supported())
> +		return 0;
>  	return sprintf(buf,
>  		"Node %d HugePages_Total: %5u\n"
>  		"Node %d HugePages_Free:  %5u\n"
> @@ -2177,6 +2183,9 @@ void hugetlb_show_meminfo(void)
>  	struct hstate *h;
>  	int nid;
>  
> +	if (!hugepages_supported())
> +		return;
> +
>  	for_each_node_state(nid, N_MEMORY)
>  		for_each_hstate(h)
>  			pr_info("Node %d hugepages_total=%u hugepages_free=%u hugepages_surp=%u hugepages_size=%lukB\n",
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] hugetlb: ensure hugepage access is denied if hugepages are not supported
  2014-04-03 16:19     ` Aneesh Kumar K.V
@ 2014-04-03 23:12       ` Nishanth Aravamudan
  -1 siblings, 0 replies; 10+ messages in thread
From: Nishanth Aravamudan @ 2014-04-03 23:12 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linux-mm, paulus, linuxppc-dev, anton, nyc

On 03.04.2014 [21:49:46 +0530], Aneesh Kumar K.V wrote:
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> writes:
> 
> > On 24.03.2014 [16:02:56 -0700], Nishanth Aravamudan wrote:
> >> In KVM guests on Power, if the guest is not backed by hugepages, we see
> >> the following in the guest:
> >> 
> >> AnonHugePages:         0 kB
> >> HugePages_Total:       0
> >> HugePages_Free:        0
> >> HugePages_Rsvd:        0
> >> HugePages_Surp:        0
> >> Hugepagesize:         64 kB
> >> 
> >> This seems like a configuration issue -- why is a hstate of 64k being
> >> registered?
> >> 
> >> I did some debugging and found that the following does trigger,
> >> mm/hugetlb.c::hugetlb_init():
> >> 
> >>         /* Some platform decide whether they support huge pages at boot
> >>          * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
> >>          * there is no such support
> >>          */
> >>         if (HPAGE_SHIFT == 0)
> >>                 return 0;
> >> 
> >> That check is only during init-time. So we don't support hugepages, but
> >> none of the hugetlb APIs actually check this condition (HPAGE_SHIFT ==
> >> 0), so /proc/meminfo above falsely indicates there is a valid hstate (at
> >> least one). But note that there is no /sys/kernel/mm/hugepages meaning
> >> no hstate was actually registered.
> >> 
> >> Further, it turns out that huge_page_order(default_hstate) is 0, so
> >> hugetlb_report_meminfo is doing:
> >> 
> >> 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)
> >> 
> >> which ends up just doing 1 << (PAGE_SHIFT - 10) and since the base page
> >> size is 64k, we report a hugepage size of 64k... And allow the user to
> >> allocate hugepages via the sysctl, etc.
> >> 
> >> What's the right thing to do here?
> >> 
> >> 1) Should we add checks for HPAGE_SHIFT == 0 to all the hugetlb APIs? It
> >> seems like HPAGE_SHIFT == 0 should be the equivalent, functionally, of
> >> the config options being off. This seems like a lot of overhead, though,
> >> to put everywhere, so maybe I can do it in an arch-specific macro, that
> >> in asm-generic defaults to 0 (and so will hopefully be compiled out?).
> >> 
> >> 2) What should hugetlbfs do when HPAGE_SHIFT == 0? Should it be
> >> mountable? Obviously if it's mountable, we can't great files there
> >> (since the fs will report insufficient space). [1]
> >
> > Here is my solution to this. Comments appreciated!
> >
> > In KVM guests on Power, in a guest not backed by hugepages, we see the
> > following:
> >
> > AnonHugePages:         0 kB
> > HugePages_Total:       0
> > HugePages_Free:        0
> > HugePages_Rsvd:        0
> > HugePages_Surp:        0
> > Hugepagesize:         64 kB
> >
> > HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
> > are not supported at boot-time, but this is only checked in
> > hugetlb_init(). Extract the check to a helper function, and use it in a
> > few relevant places.
> >
> > This does make hugetlbfs not supported in this environment. I believe
> > this is fine, as there are no valid hugepages and that won't change at
> > runtime.
> >
> > Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> 
> 
> Looks good. Can you resubmit it as a proper patch?

Will Cc you on that.


> You may also want to note in the commit message that the hugetlbfs file
> system will also not be registered.

I did that already:

> > This does make hugetlbfs not supported in this environment. I
> > believe this is fine, as there are no valid hugepages and that won't
> > change at runtime.

Thanks,
Nish
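
The structure of the patch discussed in this thread (one helper, checked
at the top of each entry point) can be modeled in userspace roughly as
below. This is a sketch, not the kernel code: every model_* name is
hypothetical, model_hpage_shift stands in for the boot-time HPAGE_SHIFT
value, and EOPNOTSUPP is used only because the kernel-internal ENOTSUPP
from the patch is not available in userspace <errno.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for HPAGE_SHIFT as decided at boot; 0 means no hugepage support. */
unsigned int model_hpage_shift;

/* Mirrors the helper from the patch: support exists iff the shift is nonzero. */
bool model_hugepages_supported(void)
{
	return model_hpage_shift != 0;
}

/*
 * Model of a sysctl-style handler: refuse the request up front when
 * hugepages are unsupported, otherwise record the requested pool size.
 */
int model_set_nr_hugepages(unsigned long *max_huge_pages,
			   unsigned long requested)
{
	assert(max_huge_pages);

	if (!model_hugepages_supported())
		return -EOPNOTSUPP;

	*max_huge_pages = requested;
	return 0;
}
```

Keeping the test in a single helper means each caller carries only a
two-line guard, and a platform where the shift is a nonzero compile-time
constant would let the compiler drop the branch.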

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-04-03 23:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-24 23:02 powerpc hugepage bug(s) when no valid hstates? Nishanth Aravamudan
2014-03-24 23:02 ` Nishanth Aravamudan
2014-03-26 15:58 ` [RFC PATCH] hugetlb: ensure hugepage access is denied if hugepages are not supported Nishanth Aravamudan
2014-03-26 15:58   ` Nishanth Aravamudan
2014-04-02 17:16   ` Nishanth Aravamudan
2014-04-02 17:16     ` Nishanth Aravamudan
2014-04-03 16:19   ` Aneesh Kumar K.V
2014-04-03 16:19     ` Aneesh Kumar K.V
2014-04-03 23:12     ` Nishanth Aravamudan
2014-04-03 23:12       ` Nishanth Aravamudan
