From: Nadav Amit <nadav.amit@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Will Deacon <will.deacon@arm.com>, Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	"Naveen N . Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	jeyu@kernel.org, Network Development <netdev@vger.kernel.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Jann Horn <jannh@google.com>,
	Kristen Carlson Accardi <kristen@linux.intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Dock, Deneen T" <deneen.t.dock@intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages
Date: Tue, 4 Dec 2018 15:34:09 -0800
Message-ID: <1FCD958E-C875-4F26-8E59-10615D7BF22E@gmail.com>
In-Reply-To: <C658F765-F38B-44DF-9FCE-B5ECB5DD9A86@amacapital.net>

> On Dec 4, 2018, at 3:27 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
> On Dec 4, 2018, at 2:48 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>>> On Dec 4, 2018, at 11:48 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>> 
>>> On Tue, Dec 4, 2018 at 11:45 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>>>>> On Dec 4, 2018, at 10:56 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>>>> 
>>>>> On Mon, Dec 3, 2018 at 5:43 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>>>>>>> On Nov 27, 2018, at 4:07 PM, Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
>>>>>>> 
>>>>>>> Since vfree will lazily flush the TLB, but not lazily free the underlying pages,
>>>>>>> it often leaves stale TLB entries to freed pages that could get re-used. This is
>>>>>>> undesirable for cases where the memory being freed has special permissions such
>>>>>>> as executable.
>>>>>> 
>>>>>> So I am trying to finish my patch-set for preventing transient W+X mappings
>>>>>> from taking place, by handling kprobes & ftrace that I missed (thanks again for
>>>>>> pointing it out).
>>>>>> 
>>>>>> But all of a sudden, I don’t understand why we have the problem that this
>>>>>> (your) patch-set deals with at all. We already change the mappings to make
>>>>>> the memory writable before freeing the memory, so why can’t we make it
>>>>>> non-executable at the same time? Actually, why do we make the module memory,
>>>>>> including its data, executable before freeing it?
>>>>> 
>>>>> All the code you're looking at is IMO a very awkward and possibly
>>>>> incorrect way of doing what's actually necessary: putting the direct map
>>>>> the way it wants to be.
>>>>> 
>>>>> Can't we shove this entire mess into vunmap?  Have a flag (as part
>>>>> of vmalloc like in Rick's patch or as a flag passed to a vfree variant
>>>>> directly) that makes the vunmap code that frees the underlying pages
>>>>> also reset their permissions?
>>>>> 
>>>>> Right now, we muck with set_memory_rw() and set_memory_nx(), which
>>>>> both have very awkward (and inconsistent with each other!) semantics
>>>>> when called on vmalloc memory.  And they have their own flushes, which
>>>>> is inefficient.  Maybe the right solution is for vunmap to remove the
>>>>> vmap area PTEs, call into a function like set_memory_rw() that resets
>>>>> the direct maps to their default permissions *without* flushing, and
>>>>> then to do a single flush for everything.  Or, even better, to cause
>>>>> the change_page_attr code to do the flush and also to flush the vmap
>>>>> area all at once so that very small free operations can flush single
>>>>> pages instead of flushing globally.
>>>> 
>>>> Thanks for the explanation. I read it just after I realized that indeed the
>>>> whole purpose of this code is to get cpa_process_alias() to
>>>> update the corresponding direct mapping.
>>>> 
>>>> This thing (pageattr.c) indeed seems over-engineered and very unintuitive.
>>>> Right now I have a list of patch-sets that I owe, so I don’t have the time
>>>> to deal with it.
>>>> 
>>>> But, I still think that disable_ro_nx() should not call set_memory_x().
>>>> IIUC, this breaks W+X of the direct mapping that corresponds to the module
>>>> memory. Does it ever stop being W+X?? I’ll have another look.
>>> 
>>> Dunno.  I did once chase down a bug where some memory got freed while
>>> it was still read-only, and the results were hilarious and hard to
>>> debug, since the explosion happened long after the buggy code
>>> finished.
>> 
>> This piece of code causes me pain and misery.
>> 
>> So, it turns out that the direct map is *not* changed if you just change
>> the NX-bit. See change_page_attr_set_clr():
>> 
>>       /* No alias checking for _NX bit modifications */
>>       checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
>> 
>> How many levels of abstraction are broken along the way? What would happen
>> if somebody tried to change the NX-bit and some other bit in the PTE?
>> Luckily, I don’t think anyone does… at least for now.
>> 
>> So, again, I think the change I proposed makes sense. nios2 does not have
>> set_memory_x() and it will not be affected.
>> 
>> [ I can add a comment, although I don’t know if nios2 has an NX bit,
>> and I can’t find any code that defines PTEs. Actually, where is pte_present()
>> of nios2 defined? Whatever. ]
> 
> At least rename the function, then. The last thing we need is for
> disable_ro_nx to *enable* NX.

The code is so horrible right now (IMHO) that the change will not make it
much worse. But, yes, I will of course change the name. I just want to finish
this text_poke() patch-set and W+X mappings keep popping up.

Thanks (as usual) for your help.
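
For illustration, the checkalias expression quoted above can be modeled
outside the kernel. This is a minimal sketch, not kernel code: the helper
name needs_alias_check() and the MODEL_PAGE_* macros are made up for the
example, and the bit positions (RW at bit 1, NX at bit 63) are the x86-64
values assumed here. It only evaluates the quoted expression; it says nothing
about what cpa_process_alias() then does with the direct map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for _PAGE_RW and _PAGE_NX (x86-64 bit positions assumed). */
#define MODEL_PAGE_RW (UINT64_C(1) << 1)
#define MODEL_PAGE_NX (UINT64_C(1) << 63)

/*
 * Model of the quoted line:
 *   checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
 * Returns false only when the combined set/clear masks are exactly the NX bit.
 */
static bool needs_alias_check(uint64_t mask_set, uint64_t mask_clr)
{
	return (mask_set | mask_clr) != MODEL_PAGE_NX;
}

/* Walks through the cases discussed in the thread. */
static void check_examples(void)
{
	/* NX-only requests (set or clear) skip alias handling. */
	assert(!needs_alias_check(MODEL_PAGE_NX, 0));
	assert(!needs_alias_check(0, MODEL_PAGE_NX));

	/* An RW-only request does go through alias handling. */
	assert(needs_alias_check(MODEL_PAGE_RW, 0));

	/* NX combined with any other bit no longer matches, so aliases are checked. */
	assert(needs_alias_check(MODEL_PAGE_NX | MODEL_PAGE_RW, 0));
	assert(needs_alias_check(MODEL_PAGE_NX, MODEL_PAGE_RW));
}
```

Going by the expression alone, a request that changes NX together with
another bit would take the alias path, while an NX-only change would leave
the direct map untouched.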

