From: Andrey Ryabinin <aryabinin@virtuozzo.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>,
	kasan-dev@googlegroups.com, linux-mm@kvack.org, x86@kernel.org,
	glider@google.com, luto@kernel.org, linux-kernel@vger.kernel.org,
	dvyukov@google.com, christophe.leroy@c-s.fr,
	linuxppc-dev@lists.ozlabs.org, gor@linux.ibm.com
Subject: Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory
Date: Fri, 18 Oct 2019 13:43:40 +0300	[thread overview]
Message-ID: <95c87ba1-9c15-43fb-dba7-f3ecd01be8e0@virtuozzo.com> (raw)
In-Reply-To: <20191016132233.GA46264@lakrids.cambridge.arm.com>



On 10/16/19 4:22 PM, Mark Rutland wrote:
> Hi Andrey,
> 
> On Wed, Oct 16, 2019 at 03:19:50PM +0300, Andrey Ryabinin wrote:
>> On 10/14/19 4:57 PM, Daniel Axtens wrote:
>>>>> +	/*
>>>>> +	 * Ensure poisoning is visible before the shadow is made visible
>>>>> +	 * to other CPUs.
>>>>> +	 */
>>>>> +	smp_wmb();
>>>>
>>>> I don't quite understand what this barrier does and why it's needed.
>>>> And if it's really needed, there should be a pairing barrier
>>>> on the other side which I don't see.
>>>
>>> Mark might be better able to answer this, but my understanding is that
>>> we want to make sure that we never have a situation where the writes are
>>> reordered so that the PTE is installed before all the poisoning is written
>>> out. I think it follows the logic in __pte_alloc() in mm/memory.c:
>>>
>>> 	/*
>>> 	 * Ensure all pte setup (eg. pte page lock and page clearing) are
>>> 	 * visible before the pte is made visible to other CPUs by being
>>> 	 * put into page tables.
>>> 	 *
>>> 	 * The other side of the story is the pointer chasing in the page
>>> 	 * table walking code (when walking the page table without locking;
>>> 	 * ie. most of the time). Fortunately, these data accesses consist
>>> 	 * of a chain of data-dependent loads, meaning most CPUs (alpha
>>> 	 * being the notable exception) will already guarantee loads are
>>> 	 * seen in-order. See the alpha page table accessors for the
>>> 	 * smp_read_barrier_depends() barriers in page table walking code.
>>> 	 */
>>> 	smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>>>
>>> I can clarify the comment.
>>
>> I don't see how this is relevant here.
> 
> The problem isn't quite the same, but it's a similar shape. See below
> for more details.
> 
>> The barrier in __pte_alloc() is there for the following case:
>>
>> CPU 0							CPU 1
>> __pte_alloc():                                          pte_offset_kernel(pmd_t * dir, unsigned long address):
>>      pgtable_t new = pte_alloc_one(mm);                        pte_t *new = (pte_t *) pmd_page_vaddr(*dir) + ((address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1));
>>      smp_wmb();                                                smp_read_barrier_depends();
>>      pmd_populate(mm, pmd, new);
>> 							/* do something with pte, e.g. check if (pte_none(*new)) */
>>
>>
>> It's needed to ensure that if CPU1 sees pmd_populate() it also sees the initialized contents of 'new'.
>>
>> In our case the barrier would have been needed if we had the other side like this:
>>
>> if (!pte_none(*vmalloc_shadow_pte)) {
>> 	shadow_addr = (unsigned long)__va(pte_pfn(*vmalloc_shadow_pte) << PAGE_SHIFT);
>> 	smp_read_barrier_depends();
>> 	*shadow_addr; /* read the shadow, barrier ensures that if we see installed pte, we will see initialized shadow memory. */
>> }
>>
>>
>> Without such other side the barrier is pointless.
> 
> The barrier isn't pointless, but we are relying on a subtlety that is
> not captured in LKMM, as one of the observers involved is the TLB (and
> associated page table walkers) of the CPU.
> 
> Until the PTE written by CPU 0 has been observed by the TLB of CPU 1, it
> is not possible for CPU 1 to satisfy loads from the memory that PTE
> maps, as it doesn't yet know which memory that is.
> 
> Once the PTE written by CPU 0 has been observed by the TLB of CPU 1, it is
> possible for CPU 1 to satisfy those loads. At this instant, CPU 1 must
> respect the smp_wmb() before the PTE was written, and hence sees zeroes
                                                                 s/zeroes/poison values

> written before this. Note that if this were not true, we could not
> safely swap userspace memory.
> 
> There is the risk (as laid out in [1]) that CPU 1 attempts to hoist the
> loads of the shadow memory above the load of the PTE, samples a stale
> (faulting) status from the TLB, then performs the load of the PTE and
> sees a valid value. In this case (on arm64) a spurious fault could be
> taken when the access is architecturally performed.
> 
> It is possible on arm64 to use a barrier here to prevent the spurious
> fault, but this is not smp_read_barrier_depends(), as that does nothing
> for anyone but alpha. On arm64 we have a spurious fault handler to fix
> this up.
>  
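
For reference, the write side we are both talking about is roughly the sequence below. This is a sketch from my reading of the patch, so the function name, the poison constant and other details are approximate:

	static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
					      void *unused)
	{
		unsigned long page;
		pte_t pte;

		if (!pte_none(*ptep))
			return 0;

		/* Allocate and poison the new shadow page. */
		page = __get_free_page(GFP_KERNEL);
		if (!page)
			return -ENOMEM;
		memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
		pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);

		/*
		 * The barrier under discussion: order the poisoning above
		 * before the shadow pte is made visible below.
		 */
		smp_wmb();

		spin_lock(&init_mm.page_table_lock);
		if (likely(pte_none(*ptep))) {
			set_pte_at(&init_mm, addr, ptep, pte);
			page = 0;
		}
		spin_unlock(&init_mm.page_table_lock);
		if (page)
			free_page(page);
		return 0;
	}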

None of that really explains what the race looks like.
Please describe a concrete race condition diagram, starting with something like:

CPU0                   CPU1
p0 = vmalloc()         p1 = vmalloc()
...




Or let me put it this way. Let's assume that CPU0 accesses the shadow while CPU1 did the memset() and installed the pte.
CPU0 can fail to observe the memset() only if it dereferences a completely random vmalloc address,
or if it performs an out-of-bounds access which crosses a KASAN_SHADOW_SCALE*PAGE_SIZE boundary, i.e. the access to the shadow crosses a page boundary.
In both cases it will be hard to avoid crashes. An OOB access crossing the page boundary in vmalloc pretty much guarantees a crash because of the guard page,
and dereferencing a random address isn't going to last for long.
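
For reference, the scaling behind that statement is kasan_mem_to_shadow() with KASAN_SHADOW_SCALE_SHIFT == 3 for generic KASAN:

	static inline void *kasan_mem_to_shadow(const void *addr)
	{
		return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
			+ KASAN_SHADOW_OFFSET;
	}

	/*
	 * One shadow byte covers (1 << KASAN_SHADOW_SCALE_SHIFT) == 8 bytes of
	 * address space, so one shadow page covers 8 * PAGE_SIZE of vmalloc
	 * space.  The shadow access crosses a shadow page boundary only when
	 * the original access crosses an 8 * PAGE_SIZE aligned boundary.
	 */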

If CPU0 obtained the pointer via a vmalloc() call and it's doing an out-of-bounds access (within the boundaries of the page) or a use-after-free,
then the spin_[un]lock(&init_mm.page_table_lock) should allow CPU0 to see the memset() done by CPU1 without any additional barrier.
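
I.e. something like this (a sketch; I'm assuming both CPUs go through the populate path added by the patch, and the names are approximate):

	CPU1                                           CPU0
	vmalloc():                                     vmalloc() of a neighbouring area:
	  kasan_populate_vmalloc():                      kasan_populate_vmalloc():
	    memset(new_shadow, poison, PAGE_SIZE);
	    spin_lock(&init_mm.page_table_lock);
	    set_pte_at(&init_mm, addr, ptep, pte);
	    spin_unlock(&init_mm.page_table_lock);  ...    spin_lock(&init_mm.page_table_lock);
	                                                   /* shadow pte is already installed */
	                                                   spin_unlock(&init_mm.page_table_lock);
	                                                 p0 = vmalloc() returns
	                                                 /* OOB within the page / UAF on p0:
	                                                    KASAN reads the shadow; the UNLOCK on
	                                                    CPU1 followed by the LOCK on CPU0
	                                                    already orders the memset() before
	                                                    these reads. */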



