linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* hv_hypercall_pg page permissios
@ 2020-04-07  6:55 Christoph Hellwig
  2020-04-07  7:28 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-04-07  6:55 UTC (permalink / raw)
  To: K. Y. Srinivasan, Stephen Hemminger, Vitaly Kuznetsov,
	Andy Lutomirski, Peter Zijlstra
  Cc: x86, linux-hyperv, linux-kernel

Hi all,

The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
in the kernel using __vmalloc with exectutable persmissions, and the
only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
be readable?  Otherwise we could use vmalloc_exec and kill off
PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn off
write permission on the hypercall page") it was even mapped writable..

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-04-07  6:55 hv_hypercall_pg page permissios Christoph Hellwig
@ 2020-04-07  7:28 ` Vitaly Kuznetsov
  2020-04-07  7:38   ` Christoph Hellwig
  2020-04-07 18:10   ` Dexuan Cui
  0 siblings, 2 replies; 27+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-07  7:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: x86, linux-hyperv, linux-kernel, K. Y. Srinivasan,
	Stephen Hemminger, Andy Lutomirski, Peter Zijlstra

Christoph Hellwig <hch@lst.de> writes:

> Hi all,
>
> The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
> in the kernel using __vmalloc with exectutable persmissions, and the
> only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
> be readable?  Otherwise we could use vmalloc_exec and kill off
> PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn off
> write permission on the hypercall page") it was even mapped writable..

[There is nothing secret in the hypercall page, by reading it you can
figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
likely not the only possible way :-)]

I see no reason for hv_hypercall_pg to remain readable. I just
smoke-tested

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7581cab74acb..17845db67fe2 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -382,7 +382,7 @@ void __init hyperv_init(void)
        guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
        wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-       hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+       hv_hypercall_pg  = vmalloc_exec(PAGE_SIZE);
        if (hv_hypercall_pg == NULL) {
                wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
                goto remove_cpuhp_state;

on a Hyper-V 2016 guest and nothing broke, feel free to go ahead and
kill PAGE_KERNEL_RX.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-04-07  7:28 ` Vitaly Kuznetsov
@ 2020-04-07  7:38   ` Christoph Hellwig
  2020-04-07 21:01     ` Andy Lutomirski
  2020-04-07 18:10   ` Dexuan Cui
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-04-07  7:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Christoph Hellwig, x86, linux-hyperv, linux-kernel,
	K. Y. Srinivasan, Stephen Hemminger, Andy Lutomirski,
	Peter Zijlstra

On Tue, Apr 07, 2020 at 09:28:01AM +0200, Vitaly Kuznetsov wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > Hi all,
> >
> > The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
> > in the kernel using __vmalloc with exectutable persmissions, and the
> > only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
> > be readable?  Otherwise we could use vmalloc_exec and kill off
> > PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn off
> > write permission on the hypercall page") it was even mapped writable..
> 
> [There is nothing secret in the hypercall page, by reading it you can
> figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
> likely not the only possible way :-)]
> 
> I see no reason for hv_hypercall_pg to remain readable. I just
> smoke-tested

Thanks, I have the same in my WIP tree, but just wanted to confirm this
makes sense.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: hv_hypercall_pg page permissios
  2020-04-07  7:28 ` Vitaly Kuznetsov
  2020-04-07  7:38   ` Christoph Hellwig
@ 2020-04-07 18:10   ` Dexuan Cui
  2020-04-07 20:42     ` Wei Liu
  1 sibling, 1 reply; 27+ messages in thread
From: Dexuan Cui @ 2020-04-07 18:10 UTC (permalink / raw)
  To: vkuznets, Christoph Hellwig
  Cc: x86, linux-hyperv, linux-kernel, KY Srinivasan,
	Stephen Hemminger, Andy Lutomirski, Peter Zijlstra

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Vitaly Kuznetsov
> Sent: Tuesday, April 7, 2020 12:28 AM
> Christoph Hellwig <hch@lst.de> writes:
> 
> > Hi all,
> >
> > The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
> > in the kernel using __vmalloc with exectutable persmissions, and the
> > only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
> > be readable?  Otherwise we could use vmalloc_exec and kill off
> > PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn
> off
> > write permission on the hypercall page") it was even mapped writable..
> 
> [There is nothing secret in the hypercall page, by reading it you can
> figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
> likely not the only possible way :-)]
> 
> I see no reason for hv_hypercall_pg to remain readable. I just
> smoke-tested
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 7581cab74acb..17845db67fe2 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -382,7 +382,7 @@ void __init hyperv_init(void)
>         guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>         wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> -       hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL,
> PAGE_KERNEL_RX);
> +       hv_hypercall_pg  = vmalloc_exec(PAGE_SIZE);

If we try to write into the page, Hyper-V will kill the guest immediately
by a virtual double-fault (or triple fault?), IIRC.

> on a Hyper-V 2016 guest and nothing broke, feel free to go ahead and
> kill PAGE_KERNEL_RX.
> --
> Vitaly

This should be OK. Just remember never try to write into the page, unless
you're trying to use this as a means of forcing a guest reboot. :-)

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-04-07 18:10   ` Dexuan Cui
@ 2020-04-07 20:42     ` Wei Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Liu @ 2020-04-07 20:42 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: vkuznets, Christoph Hellwig, x86, linux-hyperv, linux-kernel,
	KY Srinivasan, Stephen Hemminger, Andy Lutomirski,
	Peter Zijlstra, Wei Liu

On Tue, Apr 07, 2020 at 06:10:42PM +0000, Dexuan Cui wrote:
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Vitaly Kuznetsov
> > Sent: Tuesday, April 7, 2020 12:28 AM
> > Christoph Hellwig <hch@lst.de> writes:
> > 
> > > Hi all,
> > >
> > > The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
> > > in the kernel using __vmalloc with exectutable persmissions, and the
> > > only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
> > > be readable?  Otherwise we could use vmalloc_exec and kill off
> > > PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn
> > off
> > > write permission on the hypercall page") it was even mapped writable..
> > 
> > [There is nothing secret in the hypercall page, by reading it you can
> > figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
> > likely not the only possible way :-)]
> > 
> > I see no reason for hv_hypercall_pg to remain readable. I just
> > smoke-tested
> > 
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 7581cab74acb..17845db67fe2 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -382,7 +382,7 @@ void __init hyperv_init(void)
> >         guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> >         wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> > 
> > -       hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL,
> > PAGE_KERNEL_RX);
> > +       hv_hypercall_pg  = vmalloc_exec(PAGE_SIZE);
> 
> If we try to write into the page, Hyper-V will kill the guest immediately
> by a virtual double-fault (or triple fault?), IIRC.
> 

The guest would get injected a #GP fault in that case FWIW.  Perhaps
that leads to further double-fault or triple-fault.

Wei.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-04-07  7:38   ` Christoph Hellwig
@ 2020-04-07 21:01     ` Andy Lutomirski
  2020-06-12  7:48       ` Dexuan Cui
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2020-04-07 21:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vitaly Kuznetsov, x86, linux-hyperv, linux-kernel,
	K. Y. Srinivasan, Stephen Hemminger, Andy Lutomirski,
	Peter Zijlstra


> On Apr 7, 2020, at 12:38 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Tue, Apr 07, 2020 at 09:28:01AM +0200, Vitaly Kuznetsov wrote:
>> Christoph Hellwig <hch@lst.de> writes:
>> 
>>> Hi all,
>>> 
>>> The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
>>> in the kernel using __vmalloc with exectutable persmissions, and the
>>> only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
>>> be readable?  Otherwise we could use vmalloc_exec and kill off
>>> PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn off
>>> write permission on the hypercall page") it was even mapped writable..
>> 
>> [There is nothing secret in the hypercall page, by reading it you can
>> figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
>> likely not the only possible way :-)]
>> 
>> I see no reason for hv_hypercall_pg to remain readable. I just
>> smoke-tested
> 
> Thanks, I have the same in my WIP tree, but just wanted to confirm this
> makes sense.

Just to make sure we’re all on the same page: x86 doesn’t normally have an execute-only mode. Executable memory in the kernel is readable unless you are using fancy hypervisor-based XO support.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: hv_hypercall_pg page permissios
  2020-04-07 21:01     ` Andy Lutomirski
@ 2020-06-12  7:48       ` Dexuan Cui
  2020-06-15  8:35         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 27+ messages in thread
From: Dexuan Cui @ 2020-06-12  7:48 UTC (permalink / raw)
  To: Andy Lutomirski, vkuznets, Christoph Hellwig, Michael Kelley,
	Ju-Hyoung Lee
  Cc: x86, linux-hyperv, linux-kernel, KY Srinivasan,
	Stephen Hemminger, Andy Lutomirski, Peter Zijlstra

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andy Lutomirski
> Sent: Tuesday, April 7, 2020 2:01 PM
> To: Christoph Hellwig <hch@lst.de>
> Cc: vkuznets <vkuznets@redhat.com>; x86@kernel.org;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <stephen@networkplumber.org>;
> Andy Lutomirski <luto@kernel.org>; Peter Zijlstra <peterz@infradead.org>
> Subject: Re: hv_hypercall_pg page permissios
> 
> 
> > On Apr 7, 2020, at 12:38 AM, Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Tue, Apr 07, 2020 at 09:28:01AM +0200, Vitaly Kuznetsov wrote:
> >> Christoph Hellwig <hch@lst.de> writes:
> >>
> >>> Hi all,
> >>>
> >>> The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
> >>> in the kernel using __vmalloc with exectutable persmissions, and the
> >>> only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
> >>> be readable?  Otherwise we could use vmalloc_exec and kill off
> >>> PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn
> off
> >>> write permission on the hypercall page") it was even mapped writable..
> >>
> >> [There is nothing secret in the hypercall page, by reading it you can
> >> figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
> >> likely not the only possible way :-)]
> >>
> >> I see no reason for hv_hypercall_pg to remain readable. I just
> >> smoke-tested
> >
> > Thanks, I have the same in my WIP tree, but just wanted to confirm this
> > makes sense.
> 
> Just to make sure we’re all on the same page: x86 doesn’t normally have an
> execute-only mode. Executable memory in the kernel is readable unless you
> are using fancy hypervisor-based XO support.

Hi hch,
The patch is merged into the mainine recently, but unluckily we noticed
a warning with CONFIG_DEBUG_WX=y (it looks typically this config is defined
by default in Linux distros, at least in Ubuntu 18.04's  
/boot/config-4.18.0-11-generic).

Should we revert this patch, or figure out a way to ask the DEBUG_WX code to
ignore this page?

[   19.387536] debug: unmapping init [mem 0xffffffff82713000-0xffffffff82886fff]
[   19.431766] Write protecting the kernel read-only data: 18432k
[   19.438662] debug: unmapping init [mem 0xffffffff81c02000-0xffffffff81dfffff]
[   19.446830] debug: unmapping init [mem 0xffffffff821d6000-0xffffffff821fffff]
[   19.522368] ------------[ cut here ]------------
[   19.527495] x86/mm: Found insecure W+X mapping at address 0xffffc90000012000
[   19.535066] WARNING: CPU: 26 PID: 1 at arch/x86/mm/dump_pagetables.c:248 note_page+0x639/0x690
[   19.539038] Modules linked in:
[   19.539038] CPU: 26 PID: 1 Comm: swapper/0 Not tainted 5.7.0+ #1
[   19.539038] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[   19.539038] RIP: 0010:note_page+0x639/0x690
[   19.539038] Code: fe ff ff 31 c0 e9 a0 fe ff ff 80 3d 39 d1 31 01 00 0f 85 76 fa ff ff 48 c7 c7 98 55 0a 82 c6 05 25 d1 31 01 01 e8 f7 c9 00 00 <0f> 0b e9 5c fa ff ff 48 83 c0 18 48 c7 45 68 00 00 00 00 48 89 45
[   19.539038] RSP: 0000:ffffc90003137cb0 EFLAGS: 00010282
[   19.539038] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
[   19.539038] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff810fa9c4
[   19.539038] RBP: ffffc90003137ea0 R08: 0000000000000000 R09: 0000000000000000
[   19.539038] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000013000
[   19.539038] R13: 0000000000000000 R14: ffffc900001ff000 R15: 0000000000000000
[   19.539038] FS:  0000000000000000(0000) GS:ffff8884dad00000(0000) knlGS:0000000000000000
[   19.539038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.539038] CR2: 0000000000000000 CR3: 0000000002210001 CR4: 00000000003606e0
[   19.539038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   19.539038] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   19.539038] Call Trace:
[   19.539038]  ptdump_pte_entry+0x39/0x40
[   19.539038]  __walk_page_range+0x5b7/0x960
[   19.539038]  walk_page_range_novma+0x7e/0xd0
[   19.539038]  ptdump_walk_pgd+0x53/0x90
[   19.539038]  ptdump_walk_pgd_level_core+0xdf/0x110
[   19.539038]  ? ptdump_walk_pgd_level_debugfs+0x40/0x40
[   19.539038]  ? hugetlb_get_unmapped_area+0x2f0/0x2f0
[   19.703692]  ? rest_init+0x24d/0x24d
[   19.703692]  ? rest_init+0x24d/0x24d
[   19.703692]  kernel_init+0x2c/0x113
[   19.703692]  ret_from_fork+0x24/0x30
[   19.703692] irq event stamp: 2840666
[   19.703692] hardirqs last  enabled at (2840665): [<ffffffff810fa9c4>] console_unlock+0x444/0x5b0
[   19.703692] hardirqs last disabled at (2840666): [<ffffffff81001ec9>] trace_hardirqs_off_thunk+0x1a/0x1c
[   19.703692] softirqs last  enabled at (2840662): [<ffffffff81c00366>] __do_softirq+0x366/0x490
[   19.703692] softirqs last disabled at (2840655): [<ffffffff8107dba8>] irq_exit+0xe8/0x100
[   19.703692] ---[ end trace 99ca90806a8e657c ]---
[   19.786235] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
[   19.793298] rodata_test: all tests were successful
[   19.798508] x86/mm: Checking user space page tables
[   19.818007] x86/mm: Checked W+X mappings: passed, no W+X pages found.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: hv_hypercall_pg page permissios
  2020-06-12  7:48       ` Dexuan Cui
@ 2020-06-15  8:35         ` Vitaly Kuznetsov
  2020-06-15 17:41           ` Dexuan Cui
  0 siblings, 1 reply; 27+ messages in thread
From: Vitaly Kuznetsov @ 2020-06-15  8:35 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Stephen Hemminger, Andy Lutomirski, Peter Zijlstra,
	Andy Lutomirski, Christoph Hellwig, Michael Kelley,
	Ju-Hyoung Lee, x86, linux-hyperv, linux-kernel, KY Srinivasan

Dexuan Cui <decui@microsoft.com> writes:

>> From: linux-hyperv-owner@vger.kernel.org
>> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andy Lutomirski
>> Sent: Tuesday, April 7, 2020 2:01 PM
>> To: Christoph Hellwig <hch@lst.de>
>> Cc: vkuznets <vkuznets@redhat.com>; x86@kernel.org;
>> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; KY Srinivasan
>> <kys@microsoft.com>; Stephen Hemminger <stephen@networkplumber.org>;
>> Andy Lutomirski <luto@kernel.org>; Peter Zijlstra <peterz@infradead.org>
>> Subject: Re: hv_hypercall_pg page permissios
>> 
>> 
>> > On Apr 7, 2020, at 12:38 AM, Christoph Hellwig <hch@lst.de> wrote:
>> >
>> > On Tue, Apr 07, 2020 at 09:28:01AM +0200, Vitaly Kuznetsov wrote:
>> >> Christoph Hellwig <hch@lst.de> writes:
>> >>
>> >>> Hi all,
>> >>>
>> >>> The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
>> >>> in the kernel using __vmalloc with exectutable persmissions, and the
>> >>> only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
>> >>> be readable?  Otherwise we could use vmalloc_exec and kill off
>> >>> PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv: Turn
>> off
>> >>> write permission on the hypercall page") it was even mapped writable..
>> >>
>> >> [There is nothing secret in the hypercall page, by reading it you can
>> >> figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
>> >> likely not the only possible way :-)]
>> >>
>> >> I see no reason for hv_hypercall_pg to remain readable. I just
>> >> smoke-tested
>> >
>> > Thanks, I have the same in my WIP tree, but just wanted to confirm this
>> > makes sense.
>> 
>> Just to make sure we’re all on the same page: x86 doesn’t normally have an
>> execute-only mode. Executable memory in the kernel is readable unless you
>> are using fancy hypervisor-based XO support.
>
> Hi hch,
> The patch is merged into the mainine recently, but unluckily we noticed
> a warning with CONFIG_DEBUG_WX=y (it looks typically this config is defined
> by default in Linux distros, at least in Ubuntu 18.04's  
> /boot/config-4.18.0-11-generic).
>
> Should we revert this patch, or figure out a way to ask the DEBUG_WX code to
> ignore this page?
>

Are you sure it is hv_hypercall_pg? AFAIU it shouldn't be W+X as we
are allocating it with vmalloc_exec(). In other words, if you revert
78bb17f76edc, does the issue go away?

> [   19.387536] debug: unmapping init [mem 0xffffffff82713000-0xffffffff82886fff]
> [   19.431766] Write protecting the kernel read-only data: 18432k
> [   19.438662] debug: unmapping init [mem 0xffffffff81c02000-0xffffffff81dfffff]
> [   19.446830] debug: unmapping init [mem 0xffffffff821d6000-0xffffffff821fffff]
> [   19.522368] ------------[ cut here ]------------
> [   19.527495] x86/mm: Found insecure W+X mapping at address 0xffffc90000012000
> [   19.535066] WARNING: CPU: 26 PID: 1 at arch/x86/mm/dump_pagetables.c:248 note_page+0x639/0x690
> [   19.539038] Modules linked in:
> [   19.539038] CPU: 26 PID: 1 Comm: swapper/0 Not tainted 5.7.0+ #1
> [   19.539038] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
> [   19.539038] RIP: 0010:note_page+0x639/0x690
> [   19.539038] Code: fe ff ff 31 c0 e9 a0 fe ff ff 80 3d 39 d1 31 01 00 0f 85 76 fa ff ff 48 c7 c7 98 55 0a 82 c6 05 25 d1 31 01 01 e8 f7 c9 00 00 <0f> 0b e9 5c fa ff ff 48 83 c0 18 48 c7 45 68 00 00 00 00 48 89 45
> [   19.539038] RSP: 0000:ffffc90003137cb0 EFLAGS: 00010282
> [   19.539038] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
> [   19.539038] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff810fa9c4
> [   19.539038] RBP: ffffc90003137ea0 R08: 0000000000000000 R09: 0000000000000000
> [   19.539038] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000013000
> [   19.539038] R13: 0000000000000000 R14: ffffc900001ff000 R15: 0000000000000000
> [   19.539038] FS:  0000000000000000(0000) GS:ffff8884dad00000(0000) knlGS:0000000000000000
> [   19.539038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   19.539038] CR2: 0000000000000000 CR3: 0000000002210001 CR4: 00000000003606e0
> [   19.539038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   19.539038] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   19.539038] Call Trace:
> [   19.539038]  ptdump_pte_entry+0x39/0x40
> [   19.539038]  __walk_page_range+0x5b7/0x960
> [   19.539038]  walk_page_range_novma+0x7e/0xd0
> [   19.539038]  ptdump_walk_pgd+0x53/0x90
> [   19.539038]  ptdump_walk_pgd_level_core+0xdf/0x110
> [   19.539038]  ? ptdump_walk_pgd_level_debugfs+0x40/0x40
> [   19.539038]  ? hugetlb_get_unmapped_area+0x2f0/0x2f0
> [   19.703692]  ? rest_init+0x24d/0x24d
> [   19.703692]  ? rest_init+0x24d/0x24d
> [   19.703692]  kernel_init+0x2c/0x113
> [   19.703692]  ret_from_fork+0x24/0x30
> [   19.703692] irq event stamp: 2840666
> [   19.703692] hardirqs last  enabled at (2840665): [<ffffffff810fa9c4>] console_unlock+0x444/0x5b0
> [   19.703692] hardirqs last disabled at (2840666): [<ffffffff81001ec9>] trace_hardirqs_off_thunk+0x1a/0x1c
> [   19.703692] softirqs last  enabled at (2840662): [<ffffffff81c00366>] __do_softirq+0x366/0x490
> [   19.703692] softirqs last disabled at (2840655): [<ffffffff8107dba8>] irq_exit+0xe8/0x100
> [   19.703692] ---[ end trace 99ca90806a8e657c ]---
> [   19.786235] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> [   19.793298] rodata_test: all tests were successful
> [   19.798508] x86/mm: Checking user space page tables
> [   19.818007] x86/mm: Checked W+X mappings: passed, no W+X pages found.
>
> Thanks,
> -- Dexuan

-- 
Vitaly


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: hv_hypercall_pg page permissios
  2020-06-15  8:35         ` Vitaly Kuznetsov
@ 2020-06-15 17:41           ` Dexuan Cui
  2020-06-15 19:49             ` Dexuan Cui
  0 siblings, 1 reply; 27+ messages in thread
From: Dexuan Cui @ 2020-06-15 17:41 UTC (permalink / raw)
  To: vkuznets, Christoph Hellwig
  Cc: Stephen Hemminger, Andy Lutomirski, Peter Zijlstra,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Monday, June 15, 2020 1:35 AM
> Dexuan Cui <decui@microsoft.com> writes:
> 
> >> From: linux-hyperv-owner@vger.kernel.org
> >> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andy Lutomirski
> >> Sent: Tuesday, April 7, 2020 2:01 PM
> >> > On Apr 7, 2020, at 12:38 AM, Christoph Hellwig <hch@lst.de> wrote:
> >> >
> >> > On Tue, Apr 07, 2020 at 09:28:01AM +0200, Vitaly Kuznetsov wrote:
> >> >> Christoph Hellwig <hch@lst.de> writes:
> >> >>
> >> >>> Hi all,
> >> >>>
> >> >>> The x86 Hyper-V hypercall page (hv_hypercall_pg) is the only allocation
> >> >>> in the kernel using __vmalloc with exectutable persmissions, and the
> >> >>> only user of PAGE_KERNEL_RX.  Is there any good reason it needs to
> >> >>> be readable?  Otherwise we could use vmalloc_exec and kill off
> >> >>> PAGE_KERNEL_RX.  Note that before 372b1e91343e6 ("drivers: hv:
> Turn
> >> off
> >> >>> write permission on the hypercall page") it was even mapped writable..
> >> >>
> >> >> [There is nothing secret in the hypercall page, by reading it you can
> >> >> figure out if you're running on Intel or AMD (VMCALL/VMMCALL) but it's
> >> >> likely not the only possible way :-)]
> >> >>
> >> >> I see no reason for hv_hypercall_pg to remain readable. I just
> >> >> smoke-tested
> >> >
> >> > Thanks, I have the same in my WIP tree, but just wanted to confirm this
> >> > makes sense.
> >>
> >> Just to make sure we’re all on the same page: x86 doesn’t normally have
> an
> >> execute-only mode. Executable memory in the kernel is readable unless you
> >> are using fancy hypervisor-based XO support.
> >
> > Hi hch,
> > The patch is merged into the mainine recently, but unluckily we noticed
> > a warning with CONFIG_DEBUG_WX=y (it looks typically this config is defined
> > by default in Linux distros, at least in Ubuntu 18.04's
> > /boot/config-4.18.0-11-generic).
> >
> > Should we revert this patch, or figure out a way to ask the DEBUG_WX code
> to
> > ignore this page?
> >
> 
> Are you sure it is hv_hypercall_pg? 
Yes, 100% sure. I printed the value of hv_hypercall_pg and and it matched the
address in the warning line " x86/mm: Found insecure W+X mapping at address".

> AFAIU it shouldn't be W+X as we
> are allocating it with vmalloc_exec(). In other words, if you revert
> 78bb17f76edc, does the issue go away?
> 
> Vitaly

Yes, the warning goes away if I revert
78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page")
88dca4ca5a93 ("mm: remove the pgprot argument to __vmalloc") 
(I have to revert the second as well with some manual adjustments, since
__vmalloc() has 2 parameters now.)

Thanks,
Dexuan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: hv_hypercall_pg page permissios
  2020-06-15 17:41           ` Dexuan Cui
@ 2020-06-15 19:49             ` Dexuan Cui
  2020-06-16  7:23               ` Christoph Hellwig
  2020-06-16  9:29               ` Vitaly Kuznetsov
  0 siblings, 2 replies; 27+ messages in thread
From: Dexuan Cui @ 2020-06-15 19:49 UTC (permalink / raw)
  To: vkuznets, Christoph Hellwig
  Cc: Stephen Hemminger, Andy Lutomirski, Peter Zijlstra,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Monday, June 15, 2020 10:42 AM
> > >
> > > Hi hch,
> > > The patch is merged into the mainine recently, but unluckily we noticed
> > > a warning with CONFIG_DEBUG_WX=y
> > >
> > > Should we revert this patch, or figure out a way to ask the DEBUG_WX
> > > code to ignore this page?
> >
> > Are you sure it is hv_hypercall_pg?
> Yes, 100% sure. I printed the value of hv_hypercall_pg and and it matched the
> address in the warning line " x86/mm: Found insecure W+X mapping at
> address".

I did this experiment:
  1. export vmalloc_exec and ptdump_walk_pgd_level_checkwx.
  2. write a test module that calls them.
  3. It turns out that every call of vmalloc_exec() triggers such a warning.

vmalloc_exec() uses PAGE_KERNEL_EXEC, which is defined as
   (__PP|__RW|   0|___A|   0|___D|   0|___G)

It looks the logic in note_page() is: for_each_RW_page, if the NX bit is unset,
then report the page as an insecure W+X mapping. IMO this explains the
warning?

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-15 19:49             ` Dexuan Cui
@ 2020-06-16  7:23               ` Christoph Hellwig
  2020-06-16 10:18                 ` Peter Zijlstra
  2020-06-16  9:29               ` Vitaly Kuznetsov
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16  7:23 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: vkuznets, Christoph Hellwig, Stephen Hemminger, Andy Lutomirski,
	Peter Zijlstra, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Mon, Jun 15, 2020 at 07:49:41PM +0000, Dexuan Cui wrote:
> I did this experiment:
>   1. export vmalloc_exec and ptdump_walk_pgd_level_checkwx.
>   2. write a test module that calls them.
>   3. It turns out that every call of vmalloc_exec() triggers such a warning.
> 
> vmalloc_exec() uses PAGE_KERNEL_EXEC, which is defined as
>    (__PP|__RW|   0|___A|   0|___D|   0|___G)
> 
> It looks the logic in note_page() is: for_each_RW_page, if the NX bit is unset,
> then report the page as an insecure W+X mapping. IMO this explains the
> warning?

It does.  But it also means every other user of PAGE_KERNEL_EXEC
should trigger this, of which there are a few (kexec, tboot, hibernate,
early xen pv mapping, early SEV identity mapping)

We really shouldn't create mappings like this by default.  Either we
need to flip PAGE_KERNEL_EXEC itself based on the needs of the above
users, or add another define to overload vmalloc_exec as there is no
other user of that for x86.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: hv_hypercall_pg page permissios
  2020-06-15 19:49             ` Dexuan Cui
  2020-06-16  7:23               ` Christoph Hellwig
@ 2020-06-16  9:29               ` Vitaly Kuznetsov
  2020-06-16  9:33                 ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Vitaly Kuznetsov @ 2020-06-16  9:29 UTC (permalink / raw)
  To: Dexuan Cui, Christoph Hellwig
  Cc: Stephen Hemminger, Andy Lutomirski, Peter Zijlstra,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan

Dexuan Cui <decui@microsoft.com> writes:

>> From: linux-hyperv-owner@vger.kernel.org
>> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
>> Sent: Monday, June 15, 2020 10:42 AM
>> > >
>> > > Hi hch,
>> > > The patch is merged into the mainine recently, but unluckily we noticed
>> > > a warning with CONFIG_DEBUG_WX=y
>> > >
>> > > Should we revert this patch, or figure out a way to ask the DEBUG_WX
>> > > code to ignore this page?
>> >
>> > Are you sure it is hv_hypercall_pg?
>> Yes, 100% sure. I printed the value of hv_hypercall_pg and and it matched the
>> address in the warning line " x86/mm: Found insecure W+X mapping at
>> address".
>
> I did this experiment:
>   1. export vmalloc_exec and ptdump_walk_pgd_level_checkwx.
>   2. write a test module that calls them.
>   3. It turns out that every call of vmalloc_exec() triggers such a warning.
>
> vmalloc_exec() uses PAGE_KERNEL_EXEC, which is defined as
>    (__PP|__RW|   0|___A|   0|___D|   0|___G)
>
> It looks the logic in note_page() is: for_each_RW_page, if the NX bit is unset,
> then report the page as an insecure W+X mapping. IMO this explains the
> warning?

Yea, bummer.

it seems we need something like PAGE_KERNEL_READONLY_EXEC but we don't
seem to have one on x86. Hypercall page is special in a way that the
guest doesn't need to write there at all. vmalloc_exec() seems to have
only one other user on x86: module_alloc() and it has other needs. On
ARM, alloc_insn_page() does the following:

arch/arm64/kernel/probes/kprobes.c:     page = vmalloc_exec(PAGE_SIZE);
arch/arm64/kernel/probes/kprobes.c-     if (page) {
arch/arm64/kernel/probes/kprobes.c-             set_memory_ro((unsigned long)page, 1);
arch/arm64/kernel/probes/kprobes.c-             set_vm_flush_reset_perms(page);
arch/arm64/kernel/probes/kprobes.c-     }

What if we do the same? (almost untested):

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e2137070386a..31aadfea589b 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -23,6 +23,7 @@
 #include <linux/kernel.h>
 #include <linux/cpuhotplug.h>
 #include <linux/syscore_ops.h>
+#include <linux/set_memory.h>
 #include <clocksource/hyperv_timer.h>
 
 void *hv_hypercall_pg;
@@ -383,6 +384,8 @@ void __init hyperv_init(void)
                wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
                goto remove_cpuhp_state;
        }
+       set_memory_ro((unsigned long)hv_hypercall_pg, 1);
+       set_vm_flush_reset_perms(hv_hypercall_pg);
 
        rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
        hypercall_msr.enable = 1;

-- 
Vitaly


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16  9:29               ` Vitaly Kuznetsov
@ 2020-06-16  9:33                 ` Christoph Hellwig
  2020-06-16  9:55                   ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16  9:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Dexuan Cui, Christoph Hellwig, Stephen Hemminger,
	Andy Lutomirski, Peter Zijlstra, Andy Lutomirski, Michael Kelley,
	Ju-Hyoung Lee, x86, linux-hyperv, linux-kernel, KY Srinivasan

On Tue, Jun 16, 2020 at 11:29:33AM +0200, Vitaly Kuznetsov wrote:
> it seems we need something like PAGE_KERNEL_READONLY_EXEC but we don't
> seem to have one on x86. Hypercall page is special in a way that the
> guest doesn't need to write there at all. vmalloc_exec() seems to have
> only one other user on x86: module_alloc() and it has other needs.

module_alloc actually is a weak function and overriden on x86 (and many
other architectures) , so it isn't used either (did I mention that I hate
weak functions?)

> On
> ARM, alloc_insn_page() does the following:



> 
> arch/arm64/kernel/probes/kprobes.c:     page = vmalloc_exec(PAGE_SIZE);
> arch/arm64/kernel/probes/kprobes.c-     if (page) {
> arch/arm64/kernel/probes/kprobes.c-             set_memory_ro((unsigned long)page, 1);
> arch/arm64/kernel/probes/kprobes.c-             set_vm_flush_reset_perms(page);
> arch/arm64/kernel/probes/kprobes.c-     }
> 
> What if we do the same? (almost untested):
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e2137070386a..31aadfea589b 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -23,6 +23,7 @@
>  #include <linux/kernel.h>
>  #include <linux/cpuhotplug.h>
>  #include <linux/syscore_ops.h>
> +#include <linux/set_memory.h>
>  #include <clocksource/hyperv_timer.h>
>  
>  void *hv_hypercall_pg;
> @@ -383,6 +384,8 @@ void __init hyperv_init(void)
>                 wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>                 goto remove_cpuhp_state;
>         }
> +       set_memory_ro((unsigned long)hv_hypercall_pg, 1);
> +       set_vm_flush_reset_perms(hv_hypercall_pg);

This should work and might be the best for 5.8, but I think we need
to sort this whole mess out for real.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16  9:33                 ` Christoph Hellwig
@ 2020-06-16  9:55                   ` Christoph Hellwig
  2020-06-16 10:08                     ` Christoph Hellwig
  2020-06-16 10:20                     ` Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16  9:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Dexuan Cui, Christoph Hellwig, Stephen Hemminger,
	Andy Lutomirski, Peter Zijlstra, Andy Lutomirski, Michael Kelley,
	Ju-Hyoung Lee, x86, linux-hyperv, linux-kernel, KY Srinivasan

Actually, what do you think of this one:

---
From a5ee278f4244c6bfc0ce2d2b2fd4f99358dbde4d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 16 Jun 2020 11:50:59 +0200
Subject: x86/hyperv: allocate the hypercall page with only read and execute
 bits

Avoid a W^X violation cause by the fact that PAGE_KERNEL_EXEC includes the
writable bit.

Fixes: 78bb17f76edc ("x86/hyperv: use vmalloc_exec for the hypercall page")
Reported-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/hyperv/hv_init.c            | 4 +++-
 arch/x86/include/asm/pgtable_types.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a54c6a401581dd..f4875bf05c17ff 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -375,7 +375,9 @@ void __init hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
+	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
+			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_RX,
+			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, __func__);
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		goto remove_cpuhp_state;
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 2da1f95b88d761..591b5c92a66249 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -194,6 +194,7 @@ enum page_cache_mode {
 #define _PAGE_TABLE_NOENC	 (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE		 (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
 #define __PAGE_KERNEL_RO	 (__PP|   0|   0|___A|__NX|___D|   0|___G)
+#define __PAGE_KERNEL_RX	 (__PP|   0|   0|___A|   0|___D|   0|___G)
 #define __PAGE_KERNEL_NOCACHE	 (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
 #define __PAGE_KERNEL_VVAR	 (__PP|   0|_USR|___A|__NX|___D|   0|___G)
 #define __PAGE_KERNEL_LARGE	 (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
@@ -219,6 +220,7 @@ enum page_cache_mode {
 #define PAGE_KERNEL_RO		__pgprot_mask(__PAGE_KERNEL_RO         | _ENC)
 #define PAGE_KERNEL_EXEC	__pgprot_mask(__PAGE_KERNEL_EXEC       | _ENC)
 #define PAGE_KERNEL_EXEC_NOENC	__pgprot_mask(__PAGE_KERNEL_EXEC       |    0)
+#define PAGE_KERNEL_RX		__pgprot_mask(__PAGE_KERNEL_RX         | _ENC)
 #define PAGE_KERNEL_NOCACHE	__pgprot_mask(__PAGE_KERNEL_NOCACHE    | _ENC)
 #define PAGE_KERNEL_LARGE	__pgprot_mask(__PAGE_KERNEL_LARGE      | _ENC)
 #define PAGE_KERNEL_LARGE_EXEC	__pgprot_mask(__PAGE_KERNEL_LARGE_EXEC | _ENC)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16  9:55                   ` Christoph Hellwig
@ 2020-06-16 10:08                     ` Christoph Hellwig
  2020-06-16 10:50                       ` Vitaly Kuznetsov
  2020-06-16 10:20                     ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 10:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Dexuan Cui, Christoph Hellwig, Stephen Hemminger,
	Andy Lutomirski, Peter Zijlstra, Andy Lutomirski, Michael Kelley,
	Ju-Hyoung Lee, x86, linux-hyperv, linux-kernel, KY Srinivasan

On Tue, Jun 16, 2020 at 11:55:49AM +0200, Christoph Hellwig wrote:
> Actually, what do you think of this one:

Plus this whole series to kill of vmalloc_exec entirely:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/vmalloc_exec-fixes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16  7:23               ` Christoph Hellwig
@ 2020-06-16 10:18                 ` Peter Zijlstra
  2020-06-16 10:23                   ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-06-16 10:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dexuan Cui, vkuznets, Stephen Hemminger, Andy Lutomirski,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 09:23:18AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 15, 2020 at 07:49:41PM +0000, Dexuan Cui wrote:
> > I did this experiment:
> >   1. export vmalloc_exec and ptdump_walk_pgd_level_checkwx.
> >   2. write a test module that calls them.
> >   3. It turns out that every call of vmalloc_exec() triggers such a warning.
> > 
> > vmalloc_exec() uses PAGE_KERNEL_EXEC, which is defined as
> >    (__PP|__RW|   0|___A|   0|___D|   0|___G)
> > 
> > It looks the logic in note_page() is: for_each_RW_page, if the NX bit is unset,
> > then report the page as an insecure W+X mapping. IMO this explains the
> > warning?
> 
> It does.  But it also means every other user of PAGE_KERNEL_EXEC
> should trigger this, of which there are a few (kexec, tboot, hibernate,
> early xen pv mapping, early SEV identity mapping)

There are only 3 users in the entire tree afaict:

arch/arm64/kernel/probes/kprobes.c:     page = vmalloc_exec(PAGE_SIZE);
arch/x86/hyperv/hv_init.c:      hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
kernel/module.c:        return vmalloc_exec(size);

And that last one is a weak function that any arch that has STRICT_RWX
ought to override.

> We really shouldn't create mappings like this by default.  Either we
> need to flip PAGE_KERNEL_EXEC itself based on the needs of the above
> users, or add another define to overload vmalloc_exec as there is no
> other user of that for x86.

We really should get rid of the two !module users of this though; both
x86 and arm64 have STRICT_RWX and sufficient primitives to DTRT.

What is HV even trying to do with that page? AFAICT it never actually
writes to it, it seens to give the physica address to an MSR (which I
suspect then writes crud into the page for us from host context).

Suggesting the page really only needs to be RX.

On top of that, vmalloc_exec() gets us a page from the entire vmalloc
range, which can be outside of the 2G executable range, which seems to
suggest vmalloc_exec() is wrong too and all this works by accident.

How about something like this:


diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a54c6a401581..82a3a4a9481f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -375,12 +375,15 @@ void __init hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
+	hv_hypercall_pg = module_alloc(PAGE_SIZE);
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		goto remove_cpuhp_state;
 	}
 
+	set_memory_ro((unsigned long)hv_hypercall_pg, 1);
+	set_memory_x((unsigned long)hv_hypercall_pg, 1);
+
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
 	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16  9:55                   ` Christoph Hellwig
  2020-06-16 10:08                     ` Christoph Hellwig
@ 2020-06-16 10:20                     ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2020-06-16 10:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vitaly Kuznetsov, Dexuan Cui, Stephen Hemminger, Andy Lutomirski,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan

On Tue, Jun 16, 2020 at 11:55:49AM +0200, Christoph Hellwig wrote:
> +	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> +			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_RX,
> +			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, __func__);

I think that's wrong, they seem to want to CALL into that page, and that
doesn't unconditionally work for any page in the vmalloc range. This
really ought to use module_alloc() to guarantees it gets a page in the
correct range.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:18                 ` Peter Zijlstra
@ 2020-06-16 10:23                   ` Christoph Hellwig
  2020-06-16 10:24                     ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 10:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Dexuan Cui, vkuznets, Stephen Hemminger,
	Andy Lutomirski, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:18:07PM +0200, Peter Zijlstra wrote:
> > It does.  But it also means every other user of PAGE_KERNEL_EXEC
> > should trigger this, of which there are a few (kexec, tboot, hibernate,
> > early xen pv mapping, early SEV identity mapping)
> 
> There are only 3 users in the entire tree afaict:
> 
> arch/arm64/kernel/probes/kprobes.c:     page = vmalloc_exec(PAGE_SIZE);
> arch/x86/hyperv/hv_init.c:      hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
> kernel/module.c:        return vmalloc_exec(size);
> 
> And that last one is a weak function that any arch that has STRICT_RWX
> ought to override.
> 
> > We really shouldn't create mappings like this by default.  Either we
> > need to flip PAGE_KERNEL_EXEC itself based on the needs of the above
> > users, or add another define to overload vmalloc_exec as there is no
> > other user of that for x86.
> 
> We really should get rid of the two !module users of this though; both
> x86 and arm64 have STRICT_RWX and sufficient primitives to DTRT.
> 
> What is HV even trying to do with that page? AFAICT it never actually
> writes to it, it seens to give the physica address to an MSR (which I
> suspect then writes crud into the page for us from host context).
> 
> Suggesting the page really only needs to be RX.
> 
> On top of that, vmalloc_exec() gets us a page from the entire vmalloc
> range, which can be outside of the 2G executable range, which seems to
> suggest vmalloc_exec() is wrong too and all this works by accident.
> 
> How about something like this:
> 
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a54c6a401581..82a3a4a9481f 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -375,12 +375,15 @@ void __init hyperv_init(void)
>  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
>  
> -	hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
> +	hv_hypercall_pg = module_alloc(PAGE_SIZE);
>  	if (hv_hypercall_pg == NULL) {
>  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  		goto remove_cpuhp_state;
>  	}
>  
> +	set_memory_ro((unsigned long)hv_hypercall_pg, 1);
> +	set_memory_x((unsigned long)hv_hypercall_pg, 1);

The changing of the permissions sucks.  I thought about adding
a module_alloc_prot with an explicit pgprot_t argument.  On x86
alone at least ftrace would also benefit from that.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:23                   ` Christoph Hellwig
@ 2020-06-16 10:24                     ` Christoph Hellwig
  2020-06-16 10:31                       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 10:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Dexuan Cui, vkuznets, Stephen Hemminger,
	Andy Lutomirski, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:23:50PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 16, 2020 at 12:18:07PM +0200, Peter Zijlstra wrote:
> > > It does.  But it also means every other user of PAGE_KERNEL_EXEC
> > > should trigger this, of which there are a few (kexec, tboot, hibernate,
> > > early xen pv mapping, early SEV identity mapping)
> > 
> > There are only 3 users in the entire tree afaict:
> > 
> > arch/arm64/kernel/probes/kprobes.c:     page = vmalloc_exec(PAGE_SIZE);
> > arch/x86/hyperv/hv_init.c:      hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
> > kernel/module.c:        return vmalloc_exec(size);
> > 
> > And that last one is a weak function that any arch that has STRICT_RWX
> > ought to override.
> > 
> > > We really shouldn't create mappings like this by default.  Either we
> > > need to flip PAGE_KERNEL_EXEC itself based on the needs of the above
> > > users, or add another define to overload vmalloc_exec as there is no
> > > other user of that for x86.
> > 
> > We really should get rid of the two !module users of this though; both
> > x86 and arm64 have STRICT_RWX and sufficient primitives to DTRT.
> > 
> > What is HV even trying to do with that page? AFAICT it never actually
> > writes to it, it seens to give the physica address to an MSR (which I
> > suspect then writes crud into the page for us from host context).
> > 
> > Suggesting the page really only needs to be RX.
> > 
> > On top of that, vmalloc_exec() gets us a page from the entire vmalloc
> > range, which can be outside of the 2G executable range, which seems to
> > suggest vmalloc_exec() is wrong too and all this works by accident.
> > 
> > How about something like this:
> > 
> > 
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index a54c6a401581..82a3a4a9481f 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -375,12 +375,15 @@ void __init hyperv_init(void)
> >  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> >  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> >  
> > -	hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
> > +	hv_hypercall_pg = module_alloc(PAGE_SIZE);
> >  	if (hv_hypercall_pg == NULL) {
> >  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> >  		goto remove_cpuhp_state;
> >  	}
> >  
> > +	set_memory_ro((unsigned long)hv_hypercall_pg, 1);
> > +	set_memory_x((unsigned long)hv_hypercall_pg, 1);
> 
> The changing of the permissions sucks.  I thought about adding
> a module_alloc_prot with an explicit pgprot_t argument.  On x86
> alone at least ftrace would also benefit from that.

The above is also missing a set_vm_flush_reset_perms.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:24                     ` Christoph Hellwig
@ 2020-06-16 10:31                       ` Peter Zijlstra
  2020-06-16 10:33                         ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-06-16 10:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dexuan Cui, vkuznets, Stephen Hemminger, Andy Lutomirski,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:24:12PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 16, 2020 at 12:23:50PM +0200, Christoph Hellwig wrote:

> > > +	hv_hypercall_pg = module_alloc(PAGE_SIZE);
> > >  	if (hv_hypercall_pg == NULL) {
> > >  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> > >  		goto remove_cpuhp_state;
> > >  	}
> > >  
> > > +	set_memory_ro((unsigned long)hv_hypercall_pg, 1);
> > > +	set_memory_x((unsigned long)hv_hypercall_pg, 1);
> > 
> > The changing of the permissions sucks.  I thought about adding
> > a module_alloc_prot with an explicit pgprot_t argument.  On x86
> > alone at least ftrace would also benefit from that.

How would ftrace benefit from a RX permission? We need to actually write
to the page first. This HV thing is special in that it lets the host
write.

> The above is also missing a set_vm_flush_reset_perms.

Ah, indeed. Otherwise we don't clean up properly.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:31                       ` Peter Zijlstra
@ 2020-06-16 10:33                         ` Christoph Hellwig
  2020-06-16 10:40                           ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 10:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Dexuan Cui, vkuznets, Stephen Hemminger,
	Andy Lutomirski, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:31:37PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 16, 2020 at 12:24:12PM +0200, Christoph Hellwig wrote:
> > On Tue, Jun 16, 2020 at 12:23:50PM +0200, Christoph Hellwig wrote:
> 
> > > > +	hv_hypercall_pg = module_alloc(PAGE_SIZE);
> > > >  	if (hv_hypercall_pg == NULL) {
> > > >  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> > > >  		goto remove_cpuhp_state;
> > > >  	}
> > > >  
> > > > +	set_memory_ro((unsigned long)hv_hypercall_pg, 1);
> > > > +	set_memory_x((unsigned long)hv_hypercall_pg, 1);
> > > 
> > > The changing of the permissions sucks.  I thought about adding
> > > a module_alloc_prot with an explicit pgprot_t argument.  On x86
> > > alone at least ftrace would also benefit from that.
> 
> How would ftrace benefit from a RX permission? We need to actually write
> to the page first. This HV thing is special in that it lets the host
> write.

sorry, s/ftrace/kprobes/.  See my updated branch here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/module_alloc-cleanup

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:33                         ` Christoph Hellwig
@ 2020-06-16 10:40                           ` Peter Zijlstra
  2020-06-16 10:42                             ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-06-16 10:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dexuan Cui, vkuznets, Stephen Hemminger, Andy Lutomirski,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:33:13PM +0200, Christoph Hellwig wrote:
> sorry, s/ftrace/kprobes/.  See my updated branch here:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/module_alloc-cleanup

Ah the insn slot page, yes. Didn't you just loose VM_FLUSH_RESET_PERMS
there?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:40                           ` Peter Zijlstra
@ 2020-06-16 10:42                             ` Christoph Hellwig
  2020-06-16 10:52                               ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 10:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Dexuan Cui, vkuznets, Stephen Hemminger,
	Andy Lutomirski, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:40:32PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 16, 2020 at 12:33:13PM +0200, Christoph Hellwig wrote:
> > sorry, s/ftrace/kprobes/.  See my updated branch here:
> > 
> > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/module_alloc-cleanup
> 
> Ah the insn slot page, yes. Didn't you just loose VM_FLUSH_RESET_PERMS
> there?

Yes, we did.  vmalloc_exec had it, but given that module_alloc didn't
allocate executable on x86 it didn't.  I guess we should make sure it
is set by the low-level vmalloc code if exec permissions are set to
sort this mess out.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:08                     ` Christoph Hellwig
@ 2020-06-16 10:50                       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 27+ messages in thread
From: Vitaly Kuznetsov @ 2020-06-16 10:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dexuan Cui, Christoph Hellwig, Stephen Hemminger,
	Andy Lutomirski, Peter Zijlstra, Andy Lutomirski, Michael Kelley,
	Ju-Hyoung Lee, x86, linux-hyperv, linux-kernel, KY Srinivasan

Christoph Hellwig <hch@lst.de> writes:

> On Tue, Jun 16, 2020 at 11:55:49AM +0200, Christoph Hellwig wrote:
>> Actually, what do you think of this one:
>

I see Peter has some concerns but if vmalloc_exec() used to work then
open-coding it with the right flags should do. I've also tested your
patch with Hyper-V, it works, so:

Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> 


> Plus this whole series to kill of vmalloc_exec entirely:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/vmalloc_exec-fixes
>

FWIW, the vmalloc_exec() doing W+X allocation is misleading indeed, thus
killing it makes perfect sense.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:42                             ` Christoph Hellwig
@ 2020-06-16 10:52                               ` Christoph Hellwig
  2020-06-16 11:24                                 ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 10:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Dexuan Cui, vkuznets, Stephen Hemminger,
	Andy Lutomirski, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:42:30PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 16, 2020 at 12:40:32PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 16, 2020 at 12:33:13PM +0200, Christoph Hellwig wrote:
> > > sorry, s/ftrace/kprobes/.  See my updated branch here:
> > > 
> > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/module_alloc-cleanup
> > 
> > Ah the insn slot page, yes. Didn't you just loose VM_FLUSH_RESET_PERMS
> > there?
> 
> Yes, we did.  vmalloc_exec had it, but given that module_alloc didn't
> allocate executable on x86 it didn't.  I guess we should make sure it
> is set by the low-level vmalloc code if exec permissions are set to
> sort this mess out.

I think something like this should solve the issue:

--
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index e988bac0a4a1c3..716e4de44a8e78 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -13,4 +13,6 @@ struct mod_arch_specific {
 #endif
 };
 
+void *module_alloc_prot(unsigned long size, pgprot_t prot);
+
 #endif /* _ASM_X86_MODULE_H */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4acb4..4db6e655120960 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -65,8 +65,10 @@ static unsigned long int get_module_load_offset(void)
 }
 #endif
 
-void *module_alloc(unsigned long size)
+void *module_alloc_prot(unsigned long size, pgprot_t prot)
 {
+	unsigned int flags = (pgprot_val(prot) & _PAGE_NX) ?
+			0 : VM_FLUSH_RESET_PERMS;
 	void *p;
 
 	if (PAGE_ALIGN(size) > MODULES_LEN)
@@ -75,7 +77,7 @@ void *module_alloc(unsigned long size)
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
 				    MODULES_VADDR + get_module_load_offset(),
 				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL, 0, NUMA_NO_NODE,
+				    prot, flags, NUMA_NO_NODE,
 				    __builtin_return_address(0));
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
@@ -85,6 +87,11 @@ void *module_alloc(unsigned long size)
 	return p;
 }
 
+void *module_alloc(unsigned long size)
+{
+	return module_alloc_prot(size, PAGE_KERNEL);
+}
+
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 10:52                               ` Christoph Hellwig
@ 2020-06-16 11:24                                 ` Peter Zijlstra
  2020-06-16 14:39                                   ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-06-16 11:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dexuan Cui, vkuznets, Stephen Hemminger, Andy Lutomirski,
	Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee, x86,
	linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 12:52:00PM +0200, Christoph Hellwig wrote:

> I think something like this should solve the issue:
> 
> --
> diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
> index e988bac0a4a1c3..716e4de44a8e78 100644
> --- a/arch/x86/include/asm/module.h
> +++ b/arch/x86/include/asm/module.h
> @@ -13,4 +13,6 @@ struct mod_arch_specific {
>  #endif
>  };
>  
> +void *module_alloc_prot(unsigned long size, pgprot_t prot);
> +
>  #endif /* _ASM_X86_MODULE_H */
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 34b153cbd4acb4..4db6e655120960 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -65,8 +65,10 @@ static unsigned long int get_module_load_offset(void)
>  }
>  #endif
>  
> -void *module_alloc(unsigned long size)
> +void *module_alloc_prot(unsigned long size, pgprot_t prot)
>  {
> +	unsigned int flags = (pgprot_val(prot) & _PAGE_NX) ?
> +			0 : VM_FLUSH_RESET_PERMS;
>  	void *p;
>  
>  	if (PAGE_ALIGN(size) > MODULES_LEN)
> @@ -75,7 +77,7 @@ void *module_alloc(unsigned long size)
>  	p = __vmalloc_node_range(size, MODULE_ALIGN,
>  				    MODULES_VADDR + get_module_load_offset(),
>  				    MODULES_END, GFP_KERNEL,
> -				    PAGE_KERNEL, 0, NUMA_NO_NODE,
> +				    prot, flags, NUMA_NO_NODE,
>  				    __builtin_return_address(0));
>  	if (p && (kasan_module_alloc(p, size) < 0)) {
>  		vfree(p);

Hurmm.. Yes it would. It just doesn't feel right though. Can't we
unconditionally set the flag? At worst it makes free a little bit more
expensive.

The thing is, I don't think _NX is the only prot that needs restoring.
Any prot other than the default (RW IIRC) needs restoring.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: hv_hypercall_pg page permissios
  2020-06-16 11:24                                 ` Peter Zijlstra
@ 2020-06-16 14:39                                   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-06-16 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Dexuan Cui, vkuznets, Stephen Hemminger,
	Andy Lutomirski, Andy Lutomirski, Michael Kelley, Ju-Hyoung Lee,
	x86, linux-hyperv, linux-kernel, KY Srinivasan, Tom Lendacky

On Tue, Jun 16, 2020 at 01:24:15PM +0200, Peter Zijlstra wrote:
> > +void *module_alloc_prot(unsigned long size, pgprot_t prot)
> >  {
> > +	unsigned int flags = (pgprot_val(prot) & _PAGE_NX) ?
> > +			0 : VM_FLUSH_RESET_PERMS;
> >  	void *p;
> >  
> >  	if (PAGE_ALIGN(size) > MODULES_LEN)
> > @@ -75,7 +77,7 @@ void *module_alloc(unsigned long size)
> >  	p = __vmalloc_node_range(size, MODULE_ALIGN,
> >  				    MODULES_VADDR + get_module_load_offset(),
> >  				    MODULES_END, GFP_KERNEL,
> > -				    PAGE_KERNEL, 0, NUMA_NO_NODE,
> > +				    prot, flags, NUMA_NO_NODE,
> >  				    __builtin_return_address(0));
> >  	if (p && (kasan_module_alloc(p, size) < 0)) {
> >  		vfree(p);
> 
> Hurmm.. Yes it would. It just doesn't feel right though. Can't we
> unconditionally set the flag? At worst it makes free a little bit more
> expensive.
> 
> The thing is, I don't think _NX is the only prot that needs restoring.
> Any prot other than the default (RW IIRC) needs restoring.

If that actually is the case I think we should just check for
a non-default permission in __vmalloc_node_range itself and handle it
there, which seems like a nice solution.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-06-16 14:39 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-07  6:55 hv_hypercall_pg page permissios Christoph Hellwig
2020-04-07  7:28 ` Vitaly Kuznetsov
2020-04-07  7:38   ` Christoph Hellwig
2020-04-07 21:01     ` Andy Lutomirski
2020-06-12  7:48       ` Dexuan Cui
2020-06-15  8:35         ` Vitaly Kuznetsov
2020-06-15 17:41           ` Dexuan Cui
2020-06-15 19:49             ` Dexuan Cui
2020-06-16  7:23               ` Christoph Hellwig
2020-06-16 10:18                 ` Peter Zijlstra
2020-06-16 10:23                   ` Christoph Hellwig
2020-06-16 10:24                     ` Christoph Hellwig
2020-06-16 10:31                       ` Peter Zijlstra
2020-06-16 10:33                         ` Christoph Hellwig
2020-06-16 10:40                           ` Peter Zijlstra
2020-06-16 10:42                             ` Christoph Hellwig
2020-06-16 10:52                               ` Christoph Hellwig
2020-06-16 11:24                                 ` Peter Zijlstra
2020-06-16 14:39                                   ` Christoph Hellwig
2020-06-16  9:29               ` Vitaly Kuznetsov
2020-06-16  9:33                 ` Christoph Hellwig
2020-06-16  9:55                   ` Christoph Hellwig
2020-06-16 10:08                     ` Christoph Hellwig
2020-06-16 10:50                       ` Vitaly Kuznetsov
2020-06-16 10:20                     ` Peter Zijlstra
2020-04-07 18:10   ` Dexuan Cui
2020-04-07 20:42     ` Wei Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).