* Re: Have any influence on set_memory_** about below patch ??
From: Xishi Qiu @ 2016-01-12  1:20 UTC
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/11 21:31, Mark Rutland wrote:

> Hi,
> 
> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
>>
>> http://www.spinics.net/lists/arm-kernel/msg472090.html
>>
>> Hi, can I ask you a question? This patch says that section splitting
>> and merging will produce conflicts in the linear mapping area. Given that,
>> assume the page tables in the linear mapping area are set up with 4KB
>> pages; will set_memory_** then work without any conflict?
> 
> I'm not sure I understand the question.
> 
> I'm also not a fan of responding to off-list queries as information gets
> lost.
> 
> Please ask your question on the mailing list. I am more than happy to
> respond there.
> 
> Thanks,
> Mark.
> 

Hi Mark,

Your patch says "The presence of conflicting TLB entries may result in
a variety of behaviours detrimental to the system" and "but this
(break-before-make approach) cannot work for modifications to the swapper
page tables that cover the kernel text and data."

I don't quite understand this: why can't the direct mapping work?
Can't a TLB flush resolve it?

I find that x86 does not have this limitation, e.g. set_memory_r*.

Thanks,
Xishi Qiu


* Re: Have any influence on set_memory_** about below patch ??
From: Mark Rutland @ 2016-01-12 11:15 UTC
  To: Xishi Qiu; +Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
> On 2016/1/11 21:31, Mark Rutland wrote:
> 
> > Hi,
> > 
> > On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
> >>
> >> http://www.spinics.net/lists/arm-kernel/msg472090.html
> >>
> >> Hi, can I ask you a question? This patch says that section splitting
> >> and merging will produce conflicts in the linear mapping area. Given that,
> >> assume the page tables in the linear mapping area are set up with 4KB
> >> pages; will set_memory_** then work without any conflict?
> > 
> > I'm not sure I understand the question.
> > 
> > I'm also not a fan of responding to off-list queries as information gets
> > lost.
> > 
> > Please ask your question on the mailing list. I am more than happy to
> > respond there.
> > 
> > Thanks,
> > Mark.
> > 
> 
> Hi Mark,
> 
> Your patch says "The presence of conflicting TLB entries may result in
> a variety of behaviours detrimental to the system" and "but this
> (break-before-make approach) cannot work for modifications to the swapper
> page tables that cover the kernel text and data."
> 
> I don't quite understand this: why can't the direct mapping work?

The problem is that the TLB hardware can operate asynchronously to the
rest of the CPU. At any point in time, for any reason, it can decide to
destroy TLB entries, to allocate new ones, or to perform a walk based on
the existing contents of the TLB.

When the TLB contains conflicting entries, TLB lookups may result in TLB
conflict aborts, or may return an "amalgamation" of the conflicting
entries (e.g. you could get an erroneous output address).

The direct mapping is in active use (and hence live in TLBs). Modifying
it without break-before-make (BBM) risks the allocation of conflicting
TLB entries. Modifying it with BBM risks unmapping the portion of the
kernel performing the modification, resulting in an unrecoverable abort.
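
For example, splitting a live 2MB section into 4KB pages without BBM
would look something like the following sketch (hypothetical, simplified
arm64-style C; split_pmd_unsafe is a made-up name, and new_table is
assumed to already contain 4KB ptes reproducing the old section mapping):

	/* NOT safe for live mappings; for illustration only. */
	static void split_pmd_unsafe(unsigned long addr, pmd_t *pmdp,
				     pte_t *new_table)
	{
		/* The old 2MB section entry may still be live in TLBs. */
		__pmd_populate(pmdp, __pa(new_table), PMD_TYPE_TABLE);
		/* Until this flush completes, old 2MB and new 4KB
		 * entries can coexist for the same VA: a conflict. */
		flush_tlb_kernel_range(addr & PMD_MASK,
				       (addr & PMD_MASK) + PMD_SIZE);
	}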

> Can't a TLB flush resolve it?

Flushing the TLB doesn't help because the page table update, TLB
invalidate, and corresponding barrier(s) are separate operations. The
TLB can allocate or destroy entries at any point during the sequence.

For example, without BBM a page table update would look something like:

1)	str	<newpte>, [<*pte>]
2)	dsb	ish
3)	tlbi	vmalle1is
4)	dsb	ish
5)	isb

After step 1, the new pte value may become visible to the TLBs, and the
TLBs may allocate a new entry for it. Until step 4 completes, this entry
may remain active in the TLB, and may conflict with an existing entry.

If that entry covers the kernel text for steps 2-5, executing the
sequence may result in an unrecoverable TLB conflict abort, or some
other behaviour resulting from an amalgamated TLB, e.g. the I-cache
might fetch instructions from the wrong address such that steps 2-5
cannot be executed.

If the kernel doesn't explicitly access the address covered by that pte,
there may still be a problem. The TLB may perform an internal lookup
when performing a page table walk, and could then use an erroneous
result to continue the walk, resulting in a variety of potential issues
(e.g. reading from an MMIO peripheral register).

BBM avoids the conflict, but as that would mean kernel text and/or data
would be unmapped, you can't execute the code to finish the update.
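
Concretely, BBM for a single live pte would look something like this
sketch (hypothetical kernel-style C; bbm_update_pte is a made-up name):

	static void bbm_update_pte(unsigned long addr, pte_t *ptep,
				   pte_t new)
	{
		/* 1) break: unmap the page */
		pte_clear(&init_mm, addr, ptep);
		/* 2) make the unmapping visible to all TLBs */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
		/* 3) make: install the new entry */
		set_pte(ptep, new);
	}

Between steps 1 and 3 the address is unmapped, so if this function, its
stack, or the page tables it walks live in the affected region, step 3
is never reached.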

> I find that x86 does not have this limitation, e.g. set_memory_r*.

I don't know much about x86; it's probably worth asking the x86 guys
about that. It may be that the x86 architecture requires that a conflict
or amalgamation is never visible to software, or it could be that
contemporary implementations happen to provide that property.

Thanks,
Mark.

* Re: Have any influence on set_memory_** about below patch ??
From: Xishi Qiu @ 2016-01-13  4:10 UTC
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/12 19:15, Mark Rutland wrote:

> On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
>> On 2016/1/11 21:31, Mark Rutland wrote:
>>
>>> Hi,
>>>
>>> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
>>>>
>>>> http://www.spinics.net/lists/arm-kernel/msg472090.html
>>>>
>>>> Hi, can I ask you a question? This patch says that section splitting
>>>> and merging will produce conflicts in the linear mapping area. Given that,
>>>> assume the page tables in the linear mapping area are set up with 4KB
>>>> pages; will set_memory_** then work without any conflict?
>>>
>>> I'm not sure I understand the question.
>>>
>>> I'm also not a fan of responding to off-list queries as information gets
>>> lost.
>>>
>>> Please ask your question on the mailing list. I am more than happy to
>>> respond there.
>>>
>>> Thanks,
>>> Mark.
>>>
>>
>> Hi Mark,
>>
>> Your patch says "The presence of conflicting TLB entries may result in
>> a variety of behaviours detrimental to the system" and "but this
>> (break-before-make approach) cannot work for modifications to the swapper
>> page tables that cover the kernel text and data."
>>
>> I don't quite understand this: why can't the direct mapping work?
> 
> The problem is that the TLB hardware can operate asynchronously to the
> rest of the CPU. At any point in time, for any reason, it can decide to
> destroy TLB entries, to allocate new ones, or to perform a walk based on
> the existing contents of the TLB.
> 
> When the TLB contains conflicting entries, TLB lookups may result in TLB
> conflict aborts, or may return an "amalgamation" of the conflicting
> entries (e.g. you could get an erroneous output address).
> 
> The direct mapping is in active use (and hence live in TLBs). Modifying
> it without break-before-make (BBM) risks the allocation of conflicting
> TLB entries. Modifying it with BBM risks unmapping the portion of the
> kernel performing the modification, resulting in an unrecoverable abort.
> 
>> Can't a TLB flush resolve it?
> 
> Flushing the TLB doesn't help because the page table update, TLB
> invalidate, and corresponding barrier(s) are separate operations. The
> TLB can allocate or destroy entries at any point during the sequence.
> 
> For example, without BBM a page table update would look something like:
> 
> 1)	str	<newpte>, [<*pte>]
> 2)	dsb	ish
> 3)	tlbi	vmalle1is
> 4)	dsb	ish
> 5)	isb
> 
> After step 1, the new pte value may become visible to the TLBs, and the
> TLBs may allocate a new entry for it. Until step 4 completes, this entry
> may remain active in the TLB, and may conflict with an existing entry.
> 
> If that entry covers the kernel text for steps 2-5, executing the
> sequence may result in an unrecoverable TLB conflict abort, or some
> other behaviour resulting from an amalgamated TLB, e.g. the I-cache
> might fetch instructions from the wrong address such that steps 2-5
> cannot be executed.
> 
> If the kernel doesn't explicitly access the address covered by that pte,
> there may still be a problem. The TLB may perform an internal lookup
> when performing a page table walk, and could then use an erroneous
> result to continue the walk, resulting in a variety of potential issues
> (e.g. reading from an MMIO peripheral register).
> 
> BBM avoids the conflict, but as that would mean kernel text and/or data
> would be unmapped, you can't execute the code to finish the update.
> 
>> I find that x86 does not have this limitation, e.g. set_memory_r*.
> 
> I don't know much about x86; it's probably worth asking the x86 guys
> about that. It may be that the x86 architecture requires that a conflict
> or amalgamation is never visible to software, or it could be that
> contemporary implementations happen to provide that property.
> 
> Thanks,
> Mark.
> 

Hi Mark,

Thank you for your reply. I found this code in arch/arm64/mm/mmu.c:

...
#ifdef CONFIG_DEBUG_RODATA
void mark_rodata_ro(void)
{
	create_mapping_late(__pa(_stext), (unsigned long)_stext,
				(unsigned long)_etext - (unsigned long)_stext,
				PAGE_KERNEL_EXEC | PTE_RDONLY);

}
#endif
...

So does it also have this problem?

Thanks,
Xishi Qiu


* Re: Have any influence on set_memory_** about below patch ??
From: Xishi Qiu @ 2016-01-13  5:02 UTC
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/12 19:15, Mark Rutland wrote:

> On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
>> On 2016/1/11 21:31, Mark Rutland wrote:
>>
>>> Hi,
>>>
>>> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
>>>>
>>>> http://www.spinics.net/lists/arm-kernel/msg472090.html
>>>>
>>>> Hi, can I ask you a question? This patch says that section splitting
>>>> and merging will produce conflicts in the linear mapping area. Given that,
>>>> assume the page tables in the linear mapping area are set up with 4KB
>>>> pages; will set_memory_** then work without any conflict?
>>>
>>> I'm not sure I understand the question.
>>>
>>> I'm also not a fan of responding to off-list queries as information gets
>>> lost.
>>>
>>> Please ask your question on the mailing list. I am more than happy to
>>> respond there.
>>>
>>> Thanks,
>>> Mark.
>>>
>>
>> Hi Mark,
>>
>> Your patch says "The presence of conflicting TLB entries may result in
>> a variety of behaviours detrimental to the system" and "but this
>> (break-before-make approach) cannot work for modifications to the swapper
>> page tables that cover the kernel text and data."
>>
>> I don't quite understand this: why can't the direct mapping work?
> 
> The problem is that the TLB hardware can operate asynchronously to the
> rest of the CPU. At any point in time, for any reason, it can decide to
> destroy TLB entries, to allocate new ones, or to perform a walk based on
> the existing contents of the TLB.
> 
> When the TLB contains conflicting entries, TLB lookups may result in TLB
> conflict aborts, or may return an "amalgamation" of the conflicting
> entries (e.g. you could get an erroneous output address).
> 
> The direct mapping is in active use (and hence live in TLBs). Modifying
> it without break-before-make (BBM) risks the allocation of conflicting
> TLB entries. Modifying it with BBM risks unmapping the portion of the
> kernel performing the modification, resulting in an unrecoverable abort.
> 
>> Can't a TLB flush resolve it?
> 
> Flushing the TLB doesn't help because the page table update, TLB
> invalidate, and corresponding barrier(s) are separate operations. The
> TLB can allocate or destroy entries at any point during the sequence.
> 
> For example, without BBM a page table update would look something like:
> 
> 1)	str	<newpte>, [<*pte>]
> 2)	dsb	ish
> 3)	tlbi	vmalle1is
> 4)	dsb	ish
> 5)	isb
> 
> After step 1, the new pte value may become visible to the TLBs, and the
> TLBs may allocate a new entry for it. Until step 4 completes, this entry
> may remain active in the TLB, and may conflict with an existing entry.
> 
> If that entry covers the kernel text for steps 2-5, executing the
> sequence may result in an unrecoverable TLB conflict abort, or some
> other behaviour resulting from an amalgamated TLB, e.g. the I-cache
> might fetch instructions from the wrong address such that steps 2-5
> cannot be executed.
> 
> If the kernel doesn't explicitly access the address covered by that pte,
> there may still be a problem. The TLB may perform an internal lookup
> when performing a page table walk, and could then use an erroneous
> result to continue the walk, resulting in a variety of potential issues
> (e.g. reading from an MMIO peripheral register).
> 
> BBM avoids the conflict, but as that would mean kernel text and/or data
> would be unmapped, you can't execute the code to finish the update.
> 
>> I find that x86 does not have this limitation, e.g. set_memory_r*.
> 
> I don't know much about x86; it's probably worth asking the x86 guys
> about that. It may be that the x86 architecture requires that a conflict
> or amalgamation is never visible to software, or it could be that
> contemporary implementations happen to provide that property.
> 
> Thanks,
> Mark.
> 

Hi Mark,

If I do the sequence below, does it have the problem too?

kmalloc a size
no access
flush tlb
call set_memory_ro to change the page table flag
flush tlb
start access
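
In kernel-style C, that sequence would look something like the following
sketch (hypothetical; a single page and the generic kmalloc(),
set_memory_ro() and flush_tlb_kernel_range() interfaces are assumed):

	void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);	/* linear map */
	unsigned long addr = (unsigned long)buf;

	/* no accesses to buf yet */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	set_memory_ro(addr, 1);		/* flip the pte to read-only */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	/* read-only accesses to buf start here */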

Thanks,
Xishi Qiu 


* Re: Have any influence on set_memory_** about below patch ??
From: Xishi Qiu @ 2016-01-13  6:35 UTC
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/13 13:02, Xishi Qiu wrote:

> On 2016/1/12 19:15, Mark Rutland wrote:
> 
>> On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
>>> On 2016/1/11 21:31, Mark Rutland wrote:
>>>
>>>> Hi,
>>>>
>>>> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
>>>>>
>>>>> http://www.spinics.net/lists/arm-kernel/msg472090.html
>>>>>
>>>>> Hi, can I ask you a question? This patch says that section splitting
>>>>> and merging will produce conflicts in the linear mapping area. Given that,
>>>>> assume the page tables in the linear mapping area are set up with 4KB
>>>>> pages; will set_memory_** then work without any conflict?
>>>>
>>>> I'm not sure I understand the question.
>>>>
>>>> I'm also not a fan of responding to off-list queries as information gets
>>>> lost.
>>>>
>>>> Please ask your question on the mailing list. I am more than happy to
>>>> respond there.
>>>>
>>>> Thanks,
>>>> Mark.
>>>>
>>>
>>> Hi Mark,
>>>
>>> Your patch says "The presence of conflicting TLB entries may result in
>>> a variety of behaviours detrimental to the system" and "but this
>>> (break-before-make approach) cannot work for modifications to the swapper
>>> page tables that cover the kernel text and data."
>>>
>>> I don't quite understand this: why can't the direct mapping work?
>>
>> The problem is that the TLB hardware can operate asynchronously to the
>> rest of the CPU. At any point in time, for any reason, it can decide to
>> destroy TLB entries, to allocate new ones, or to perform a walk based on
>> the existing contents of the TLB.
>>
>> When the TLB contains conflicting entries, TLB lookups may result in TLB
>> conflict aborts, or may return an "amalgamation" of the conflicting
>> entries (e.g. you could get an erroneous output address).
>>
>> The direct mapping is in active use (and hence live in TLBs). Modifying
>> it without break-before-make (BBM) risks the allocation of conflicting
>> TLB entries. Modifying it with BBM risks unmapping the portion of the
>> kernel performing the modification, resulting in an unrecoverable abort.
>>
>>> Can't a TLB flush resolve it?
>>
>> Flushing the TLB doesn't help because the page table update, TLB
>> invalidate, and corresponding barrier(s) are separate operations. The
>> TLB can allocate or destroy entries at any point during the sequence.
>>
>> For example, without BBM a page table update would look something like:
>>
>> 1)	str	<newpte>, [<*pte>]
>> 2)	dsb	ish
>> 3)	tlbi	vmalle1is
>> 4)	dsb	ish
>> 5)	isb
>>
>> After step 1, the new pte value may become visible to the TLBs, and the
>> TLBs may allocate a new entry for it. Until step 4 completes, this entry
>> may remain active in the TLB, and may conflict with an existing entry.
>>
>> If that entry covers the kernel text for steps 2-5, executing the
>> sequence may result in an unrecoverable TLB conflict abort, or some
>> other behaviour resulting from an amalgamated TLB, e.g. the I-cache
>> might fetch instructions from the wrong address such that steps 2-5
>> cannot be executed.
>>
>> If the kernel doesn't explicitly access the address covered by that pte,
>> there may still be a problem. The TLB may perform an internal lookup
>> when performing a page table walk, and could then use an erroneous
>> result to continue the walk, resulting in a variety of potential issues
>> (e.g. reading from an MMIO peripheral register).
>>
>> BBM avoids the conflict, but as that would mean kernel text and/or data
>> would be unmapped, you can't execute the code to finish the update.
>>
>>> I find that x86 does not have this limitation, e.g. set_memory_r*.
>>
>> I don't know much about x86; it's probably worth asking the x86 guys
>> about that. It may be that the x86 architecture requires that a conflict
>> or amalgamation is never visible to software, or it could be that
>> contemporary implementations happen to provide that property.
>>
>> Thanks,
>> Mark.
>>
> 
> Hi Mark,
> 
> If I do the sequence below, does it have the problem too?
> 
> kmalloc a size

To be exact, it is alloc_page(); the size is aligned to the page size.

> no access
> flush tlb
> call set_memory_ro to change the page table flag
> flush tlb
> start access
> 
> Thanks,
> Xishi Qiu 
> 

* Re: Have any influence on set_memory_** about below patch ??
From: Xishi Qiu @ 2016-01-13 10:30 UTC
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/12 19:15, Mark Rutland wrote:

> On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
>> On 2016/1/11 21:31, Mark Rutland wrote:
>>
>>> Hi,
>>>
>>> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
>>>>
>>>> http://www.spinics.net/lists/arm-kernel/msg472090.html
>>>>
>>>> Hi, can I ask you a question? This patch says that section splitting
>>>> and merging will produce conflicts in the linear mapping area. Given that,
>>>> assume the page tables in the linear mapping area are set up with 4KB
>>>> pages; will set_memory_** then work without any conflict?
>>>
>>> I'm not sure I understand the question.
>>>
>>> I'm also not a fan of responding to off-list queries as information gets
>>> lost.
>>>
>>> Please ask your question on the mailing list. I am more than happy to
>>> respond there.
>>>
>>> Thanks,
>>> Mark.
>>>
>>
>> Hi Mark,
>>
>> Your patch says "The presence of conflicting TLB entries may result in
>> a variety of behaviours detrimental to the system" and "but this
>> (break-before-make approach) cannot work for modifications to the swapper
>> page tables that cover the kernel text and data."
>>
>> I don't quite understand this: why can't the direct mapping work?
> 
> The problem is that the TLB hardware can operate asynchronously to the
> rest of the CPU. At any point in time, for any reason, it can decide to
> destroy TLB entries, to allocate new ones, or to perform a walk based on
> the existing contents of the TLB.
> 
> When the TLB contains conflicting entries, TLB lookups may result in TLB
> conflict aborts, or may return an "amalgamation" of the conflicting
> entries (e.g. you could get an erroneous output address).
> 
> The direct mapping is in active use (and hence live in TLBs). Modifying
> it without break-before-make (BBM) risks the allocation of conflicting
> TLB entries. Modifying it with BBM risks unmapping the portion of the
> kernel performing the modification, resulting in an unrecoverable abort.
> 
>> Can't a TLB flush resolve it?
> 
> Flushing the TLB doesn't help because the page table update, TLB
> invalidate, and corresponding barrier(s) are separate operations. The
> TLB can allocate or destroy entries at any point during the sequence.
> 
> For example, without BBM a page table update would look something like:
> 
> 1)	str	<newpte>, [<*pte>]
> 2)	dsb	ish
> 3)	tlbi	vmalle1is
> 4)	dsb	ish
> 5)	isb
> 
> After step 1, the new pte value may become visible to the TLBs, and the
> TLBs may allocate a new entry for it. Until step 4 completes, this entry
> may remain active in the TLB, and may conflict with an existing entry.
> 
> If that entry covers the kernel text for steps 2-5, executing the
> sequence may result in an unrecoverable TLB conflict abort, or some
> other behaviour resulting from an amalgamated TLB, e.g. the I-cache
> might fetch instructions from the wrong address such that steps 2-5
> cannot be executed.
> 
> If the kernel doesn't explicitly access the address covered by that pte,
> there may still be a problem. The TLB may perform an internal lookup
> when performing a page table walk, and could then use an erroneous
> result to continue the walk, resulting in a variety of potential issues
> (e.g. reading from an MMIO peripheral register).
> 
> BBM avoids the conflict, but as that would mean kernel text and/or data
> would be unmapped, you can't execute the code to finish the update.
> 
>> I find that x86 does not have this limitation, e.g. set_memory_r*.
> 
> I don't know much about x86; it's probably worth asking the x86 guys
> about that. It may be that the x86 architecture requires that a conflict
> or amalgamation is never visible to software, or it could be that
> contemporary implementations happen to provide that property.
> 
> Thanks,
> Mark.
> 

Hi Mark,

If I create the swapper page tables with 4KB pages rather than large
pages, and then use set_memory_ro() to change the page table flags, does
the problem still occur?

Thanks,
Xishi Qiu
 


* Re: Have any influence on set_memory_** about below patch ??
From: Mark Rutland @ 2016-01-13 11:18 UTC
  To: Xishi Qiu; +Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On Wed, Jan 13, 2016 at 06:30:06PM +0800, Xishi Qiu wrote:
> Hi Mark,
> 
> If I create the swapper page tables with 4KB pages rather than large
> pages, and then use set_memory_ro() to change the page table flags, does
> the problem still occur?

The splitting/merging problem would not apply.

However, you're going to waste a reasonable amount of memory by not
using section mappings in the swapper, and we gain additional complexity
in the page table setup code (which is shared with other things that
want section mappings).

What exactly are you trying to achieve?

What memory do you want to mark RO, and why?

From a previous discussion [1], we figured out alternative approaches
for common cases. Do none of those work for your case?

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397320.html

* Re: Have any influence on set_memory_** about below patch ??
From: Mark Rutland @ 2016-01-13 11:22 UTC
  To: Xishi Qiu; +Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On Wed, Jan 13, 2016 at 12:10:29PM +0800, Xishi Qiu wrote:
> On 2016/1/12 19:15, Mark Rutland wrote:
> 
> > On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
> >> On 2016/1/11 21:31, Mark Rutland wrote:
> >>
> >>> Hi,
> >>>
> >>> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
> >>>>
> >>>> http://www.spinics.net/lists/arm-kernel/msg472090.html
> >>>>
> >>>> Hi, can I ask you a question? This patch says that section splitting
> >>>> and merging will produce conflicts in the linear mapping area. Given that,
> >>>> assume the page tables in the linear mapping area are set up with 4KB
> >>>> pages; will set_memory_** then work without any conflict?
> >>>
> >>> I'm not sure I understand the question.
> >>>
> >>> I'm also not a fan of responding to off-list queries as information gets
> >>> lost.
> >>>
> >>> Please ask your question on the mailing list. I am more than happy to
> >>> respond there.
> >>>
> >>> Thanks,
> >>> Mark.
> >>>
> >>
> >> Hi Mark,
> >>
> >> Your patch says "The presence of conflicting TLB entries may result in
> >> a variety of behaviours detrimental to the system" and "but this
> >> (break-before-make approach) cannot work for modifications to the swapper
> >> page tables that cover the kernel text and data."
> >> 
> >> I don't quite understand this: why can't the direct mapping work?
> > 
> > The problem is that the TLB hardware can operate asynchronously to the
> > rest of the CPU. At any point in time, for any reason, it can decide to
> > destroy TLB entries, to allocate new ones, or to perform a walk based on
> > the existing contents of the TLB.
> > 
> > When the TLB contains conflicting entries, TLB lookups may result in TLB
> > conflict aborts, or may return an "amalgamation" of the conflicting
> > entries (e.g. you could get an erroneous output address).
> > 
> > The direct mapping is in active use (and hence live in TLBs). Modifying
> > it without break-before-make (BBM) risks the allocation of conflicting
> > TLB entries. Modifying it with BBM risks unmapping the portion of the
> > kernel performing the modification, resulting in an unrecoverable abort.
> > 
> >> Can't a TLB flush resolve it?
> > 
> > Flushing the TLB doesn't help because the page table update, TLB
> > invalidate, and corresponding barrier(s) are separate operations. The
> > TLB can allocate or destroy entries at any point during the sequence.
> > 
> > For example, without BBM a page table update would look something like:
> > 
> > 1)	str	<newpte>, [<*pte>]
> > 2)	dsb	ish
> > 3)	tlbi	vmalle1is
> > 4)	dsb	ish
> > 5)	isb
> > 
> > After step 1, the new pte value may become visible to the TLBs, and the
> > TLBs may allocate a new entry for it. Until step 4 completes, this entry
> > may remain active in the TLB, and may conflict with an existing entry.
> > 
> > If that entry covers the kernel text for steps 2-5, executing the
> > sequence may result in an unrecoverable TLB conflict abort, or some
> > other behaviour resulting from an amalgamated TLB, e.g. the I-cache
> > might fetch instructions from the wrong address such that steps 2-5
> > cannot be executed.
> > 
> > If the kernel doesn't explicitly access the address covered by that pte,
> > there may still be a problem. The TLB may perform an internal lookup
> > when performing a page table walk, and could then use an erroneous
> > result to continue the walk, resulting in a variety of potential issues
> > (e.g. reading from an MMIO peripheral register).
> > 
> > BBM avoids the conflict, but as that would mean kernel text and/or data
> > would be unmapped, you can't execute the code to finish the update.
> > 
> >> I find that x86 does not have this limitation, e.g. set_memory_r*.
> > 
> > I don't know much about x86; it's probably worth asking the x86 guys
> > about that. It may be that the x86 architecture requires that a conflict
> > or amalgamation is never visible to software, or it could be that
> > contemporary implementations happen to provide that property.
> > 
> > Thanks,
> > Mark.
> > 
> 
> Hi Mark,

Hi,

> Thank you for your reply. I found this code in arch/arm64/mm/mmu.c:
> 
> ...
> #ifdef CONFIG_DEBUG_RODATA
> void mark_rodata_ro(void)
> {
> 	create_mapping_late(__pa(_stext), (unsigned long)_stext,
> 				(unsigned long)_etext - (unsigned long)_stext,
> 				PAGE_KERNEL_EXEC | PTE_RDONLY);
> 
> }
> #endif
> ...
> 
> So does it also have this problem?

Currently, yes.

I've addressed the splitting/merging problem with my pagetable rework
series [1,2]. The RO region is initially mapped at the same granularity
as it will later be modified with, so only the permission bits change
when mark_rodata_ro() is called.
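
For illustration, a permission-only update on an existing page mapping
would look something like this sketch (hypothetical; make_pte_ro is a
made-up name): the entry's size and output address never change, so no
conflicting entries of different sizes can be allocated, and a plain
invalidate suffices.

	static void make_pte_ro(unsigned long addr, pte_t *ptep)
	{
		pte_t pte = *ptep;

		pte = pte_wrprotect(pte);	/* clear write permission */
		set_pte(ptep, pte);		/* same size, same OA */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	}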

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397095.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397114.html

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-13  5:02         ` Xishi Qiu
@ 2016-01-13 11:28           ` Mark Rutland
  -1 siblings, 0 replies; 32+ messages in thread
From: Mark Rutland @ 2016-01-13 11:28 UTC (permalink / raw)
  To: Xishi Qiu; +Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On Wed, Jan 13, 2016 at 01:02:31PM +0800, Xishi Qiu wrote:
> Hi Mark,
> 
>> If I do it like this, does it have the problem too?
> 
> kmalloc a size
> no access
> flush tlb
> call set_memory_ro to change the page table flag
> flush tlb
> start access

This is broken.

The kmalloc will give you memory from the linear mapping. Even if you
allocate a page, that page could have been mapped with a section at the
PMD/PUD/PGD level.

Other data could fall within that section (e.g. a kernel stack,
perhaps).

Additional TLB flushes do not help. There's still a race against the
asynchronous TLB logic. The TLB can allocate or destroy entries at any
time. If there were no page table changes prior to the invalidate, the
TLB could re-allocate all existing entries immediately after the TLB
invalidate, leaving you in the same state as before.
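
To make the aliasing concrete, here is a hypothetical sketch (function
and variable names made up) of the 2M section window that a kmalloc'd
page shares with unrelated data in the linear map:

	#include <linux/slab.h>
	#include <linux/printk.h>

	/* Hypothetical illustration; PMD_MASK/PMD_SIZE come from the
	 * arm64 pgtable headers when the linear map uses 2M sections. */
	static void show_covering_section(void)
	{
		void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
		unsigned long va, start;

		if (!buf)
			return;

		va = (unsigned long)buf;
		start = va & PMD_MASK;		/* 2M-aligned base */
		pr_info("buf at %lx shares section [%lx, %lx)\n",
			va, start, start + PMD_SIZE);
		kfree(buf);
	}

Splitting that one section entry to change buf's permissions would
rewrite the live translation for the whole window, taking every other
object in it along for the ride.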

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-13 11:18           ` Mark Rutland
@ 2016-01-14 12:35             ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2016-01-14 12:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/13 19:18, Mark Rutland wrote:

> On Wed, Jan 13, 2016 at 06:30:06PM +0800, Xishi Qiu wrote:
>> Hi Mark,
>>
>> If I create swapper page tables by 4kb, not large page, then I use
>> set_memory_ro() to change the page table flag, does it have the problem
>> too?
> 
> The splitting/merging problem would not apply.
> 
> However, you're going to waste a reasonable amount of memory by not
> using section mappings in the swapper, and we gain additional complexity
> in the page table setup code (which is shared with other things that
> want section mappings).
> 
> What exactly are you trying to achieve?
> 

If a module allocates some pages and saves data in them, the data will
not be changed while the module is running. So we want to use set_memory_ro()
to increase security. If the data is changed anyway, we can catch the culprit.

> What memory do you want to mark RO, and why?
> 

The key data; it will not be changed at run time.

> From a previous discussion [1], we figured out alternative approaches
> for common cases. Do none of those work for your case?
> 

I have not read the patchset carefully; could you tell me the general idea
of the approaches?

Thanks,
Xishi Qiu

> Thanks,
> Mark.
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397320.html
> 
> .
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-14 12:35             ` Xishi Qiu
@ 2016-01-14 13:06               ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2016-01-14 13:06 UTC (permalink / raw)
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/14 20:35, Xishi Qiu wrote:

> On 2016/1/13 19:18, Mark Rutland wrote:
> 
>> On Wed, Jan 13, 2016 at 06:30:06PM +0800, Xishi Qiu wrote:
>>> Hi Mark,
>>>
>>> If I create swapper page tables by 4kb, not large page, then I use
>>> set_memory_ro() to change the page table flag, does it have the problem
>>> too?
>>
>> The splitting/merging problem would not apply.
>>
>> However, you're going to waste a reasonable amount of memory by not
>> using section mappings in the swapper, and we gain additional complexity
>> in the page table setup code (which is shared with other things that
>> want section mappings).
>>
>> What exactly are you trying to achieve?
>>
> 
> If a module allocates some pages and saves data in them, the data will
> not be changed while the module is running. So we want to use set_memory_ro()
> to increase security. If the data is changed anyway, we can catch the culprit.
> 
>> What memory do you want to mark RO, and why?
>>
> 
> The key data; it will not be changed at run time.
> 
>> From a previous discussion [1], we figured out alternative approaches
>> for common cases. Do none of those work for your case?
>>
> 
> I have not read the patchset carefully; could you tell me the general idea
> of the approaches?
> 

Hi Mark,

Are the two approaches like the following?
1. Use create_mapping to map the data read-only, then use fixmap to create a
temporary page table, and change the data when necessary.
2. Use vmalloc, then use set_memory_ro to change the page table protection.

Thanks,
Xishi Qiu

> Thanks,
> Xishi Qiu
> 
>> Thanks,
>> Mark.
>>
>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397320.html
>>
>> .
>>
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-14 13:06               ` Xishi Qiu
@ 2016-01-14 13:44                 ` Mark Rutland
  -1 siblings, 0 replies; 32+ messages in thread
From: Mark Rutland @ 2016-01-14 13:44 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML,
	ard.biesheuvel

On Thu, Jan 14, 2016 at 09:06:08PM +0800, Xishi Qiu wrote:
> On 2016/1/14 20:35, Xishi Qiu wrote:
> 
> > On 2016/1/13 19:18, Mark Rutland wrote:
> > 
> >> On Wed, Jan 13, 2016 at 06:30:06PM +0800, Xishi Qiu wrote:
> >>> Hi Mark,
> >>>
> >>> If I create swapper page tables by 4kb, not large page, then I use
> >>> set_memory_ro() to change the page table flag, does it have the problem
> >>> too?
> >>
> >> The splitting/merging problem would not apply.
> >>
> >> However, you're going to waste a reasonable amount of memory by not
> >> using section mappings in the swapper, and we gain additional complexity
> >> in the page table setup code (which is shared with other things that
> >> want section mappings).
> >>
> >> What exactly are you trying to achieve?
> >>
> > 
> > If a module allocates some pages and saves data in them, the data will
> > not be changed while the module is running. So we want to use set_memory_ro()
> > to increase security. If the data is changed anyway, we can catch the culprit.
> > 
> >> What memory do you want to mark RO, and why?
> >>
> > 
> > The key data; it will not be changed at run time.
> > 
> >> From a previous discussion [1], we figured out alternative approaches
> >> for common cases. Do none of those work for your case?
> >>
> > 
> > I have not read the patchset carefully; could you tell me the general idea
> > of the approaches?
> > 
> 
> Hi Mark,
> 
> Are the two approaches like the following?
> 1. Use create_mapping to map the data read-only, then use fixmap to create a
> temporary page table, and change the data when necessary.

In your code you'd have to statically place the data in .rodata somehow
(e.g. [2]). Your code would not call create_mapping. The usual init code
would take care of that.

Note that this can only work for a fixed amount of data, whereas it
sounds like you are doing dynamic allocation.
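
As a minimal sketch of the static approach (the table name is made up),
a fixed table can simply be const-qualified so the compiler emits it
into .rodata, and the normal init code then maps it read-only along
with the rest of that section:

	#include <linux/types.h>

	/* Hypothetical fixed table; const static data is emitted into
	 * .rodata, which is mapped read-only by the usual init code. */
	static const u32 module_key_table[16] = {
		0x12345678,
		/* ... */
	};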

> 2. Use vmalloc, then use set_memory_ro to change the page table protection.

Something like this should be workable, yes. See [3,4].
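
A minimal sketch of that approach (illustrative only; names are made up
and error handling is trimmed):

	#include <linux/mm.h>
	#include <linux/string.h>
	#include <linux/vmalloc.h>
	#include <asm/cacheflush.h>	/* set_memory_ro() on arm64 */

	static u8 *key_data;

	static int protect_key_data(const u8 *src, size_t len)
	{
		size_t sz = PAGE_ALIGN(len);

		/* vmalloc memory is always mapped with 4K pages, so
		 * set_memory_ro() only toggles permission bits and
		 * never has to split a section mapping. */
		key_data = vmalloc(sz);
		if (!key_data)
			return -ENOMEM;

		memcpy(key_data, src, len);
		return set_memory_ro((unsigned long)key_data,
				     sz >> PAGE_SHIFT);
	}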

> >> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397320.html

[2] https://lkml.org/lkml/2015/11/24/724
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399015.html
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399252.html

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-13  5:02         ` Xishi Qiu
@ 2016-01-26 14:05           ` zhong jiang
  -1 siblings, 0 replies; 32+ messages in thread
From: zhong jiang @ 2016-01-26 14:05 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Xishi Qiu, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/13 13:02, Xishi Qiu wrote:
> On 2016/1/12 19:15, Mark Rutland wrote:
> 
>> On Tue, Jan 12, 2016 at 09:20:54AM +0800, Xishi Qiu wrote:
>>> On 2016/1/11 21:31, Mark Rutland wrote:
>>>
>>>> Hi,
>>>>
>>>> On Mon, Jan 11, 2016 at 08:59:44PM +0800, zhong jiang wrote:
>>>>>
>>>>> http://www.spinics.net/lists/arm-kernel/msg472090.html
>>>>>
>>>>> Hi, Can I ask you a question? Say, This patch tells that the section spilting
>>>>> and merging wiil produce confilct in the liner mapping area. Based on the
>>>>> situation, Assume that set up page table in 4kb page table way in the liner
>>>>> mapping area, Does the set_memroy_** will work without any conplict??
>>>>
>>>> I'm not sure I understand the question.
>>>>
>>>> I'm also not a fan of responding to off-list queries as information gets
>>>> lost.
>>>>
>>>> Please ask your question on the mailing list. I am more than happy to
>>>> respond there.
>>>>
>>>> Thanks,
>>>> Mark.
>>>>
>>>
>>> Hi Mark,
>>>
>>> In your patch it said "The presence of conflicting TLB entries may result in
>>> a variety of behaviours detrimental to the system " and "but this(break-before-make
>>> approach) cannot work for modifications to the swapper page tables that cover the
>>> kernel text and data."
>>>
>>> I'm not quite understand this, why the direct mapping can't work?
>>
>> The problem is that the TLB hardware can operate asynchronously to the
>> rest of the CPU. At any point in time, for any reason, it can decide to
>> destroy TLB entries, to allocate new ones, or to perform a walk based on
>> the existing contents of the TLB.
>>
>> When the TLB contains conflicting entries, TLB lookups may result in TLB
>> conflict aborts, or may return an "amalgamation" of the conflicting
>> entries (e.g. you could get an erroneous output address).
>>
>> The direct mapping is in active use (and hence live in TLBs). Modifying
>> it without break-before-make (BBM) risks the allocation of conflicting
>> TLB entries. Modifying it with BBM risks unmapping the portion of the
>> kernel performing the modification, resulting in an unrecoverable abort.
>>
>>> flush tlb can't resolve it?
>>
>> Flushing the TLB doesn't help because the page table update, TLB
>> invalidate, and corresponding barrier(s) are separate operations. The
>> TLB can allocate or destroy entries at any point during the sequence.
>>
>> For example, without BBM a page table update would look something like:
>>
>> 1)	str	<newpte>, [<*pte>]
>> 2)	dsb	ish
>> 3)	tlbi	vmalle1is
>> 4)	dsb	ish
>> 5)	isb
>>
>> After step 1, the new pte value may become visible to the TLBs, and the
>> TLBs may allocate a new entry for it. Until step 4 completes, this entry
>> may remain active in the TLB, and may conflict with an existing entry.
>>
>> If that entry covers the kernel text for steps 2-5, executing the
>> sequence may result in an unrecoverable TLB conflict abort, or some
>> other behaviour resulting from an amalgamated TLB, e.g. the I-cache
>> might fetch instructions from the wrong address such that steps 2-5
>> cannot be executed.
>>
>> If the kernel doesn't explicitly access the address covered by that pte,
>> there may still be a problem. The TLB may perform an internal lookup
>> when performing a page table walk, and could then use an erroneous
>> result to continue the walk, resulting in a variety of potential issues
>> (e.g. reading from an MMIO peripheral register).
>>
>> BBM avoids the conflict, but as that would mean kernel text and/or data
>> would be unmapped, you can't execute the code to finish the update.
>>
>>> I find x86 does not have this limit. e.g. set_memory_r*.
>>
>> I don't know much about x86; it's probably worth asking the x86 guys
>> about that. It may be that the x86 architecture requires that a conflict
>> or amalgamation is never visible to software, or it could be that
>> contemporary implementations happen to provide that property.
>>
>> Thanks,
>> Mark.
>>
> 
> Hi Mark,
> 
> If I do it like this, does it have the problem too?
> 
> kmalloc a size
> no access
> flush tlb
> call set_memory_ro to change the page table flag
> flush tlb
> start access
> 
> Thanks,
> Xishi Qiu 
> 

Hi, Mark,

I am somewhat confused about the TLB conflict when splitting a 2M block into 4K pages.
Suppose core A starts to split a page table entry from 2M into 4K pages, using
__sync_icache_dcache to make sure the pte is flushed to the PoU. Another core
will then be in one of three situations:

1. It has the old pmd cached in its TLB, so it will see the old physical address.
2. It has no old pmd cached in its TLB, and it will see the new entry once
__sync_icache_dcache is over.
3. It has no old pmd cached in its TLB, and it may see the old entry before
__sync_icache_dcache is over. But once core A finishes the tlbi and dsb sy, all
the TLBs will see the new pte.

In my opinion, it seems the example below will only trigger a TLB conflict
when merging to a huge page.

For example, without BBM a page table update would look something like:
 1)	str	<newpte>, [<*pte>]
 2)	dsb	ish
 3)	tlbi	vmalle1is
 4)	dsb	ish
 5)	isb

So I have no idea how a TLB conflict could actually be triggered here.
Can you give some advice and an example?


Thanks
zhongjiang

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-26 14:05           ` zhong jiang
@ 2016-01-26 16:07             ` Mark Rutland
  -1 siblings, 0 replies; 32+ messages in thread
From: Mark Rutland @ 2016-01-26 16:07 UTC (permalink / raw)
  To: zhong jiang; +Cc: Xishi Qiu, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On Tue, Jan 26, 2016 at 10:05:43PM +0800, zhong jiang wrote:
> Hi, Mark,
> 
> I am somewhat confused about the TLB conflict when splitting a 2M block into 4K pages.
> Suppose core A starts to split a page table entry from 2M into 4K pages, using
> __sync_icache_dcache to make sure the pte is flushed to the PoU.

I don't follow why you would use __sync_icache_dcache here. This has
nothing to do with the I-cache (as the VA->PA mappings stay the same at
splitting time). 

For ARMv8 (or ARMv7 with ID_MMFR3.CohWalk > 0), the TLB walks are fully
coherent, and do not require page tables to be cleaned to the PoU in
order to be visible.

> Another core will then be in one of three situations:
> 
> 1. It has the old pmd cached in its TLB, so it will see the old physical address.

The presence of the old entry in the TLB does not guarantee that the new
entry cannot also be allocated. The TLB can allocate a new TLB entry at
any point in time for any active, valid page table entry (or combination
of entries).

For instance, perhaps when walking the page tables, the walker allocates
TLB entries for all valid page table entries in the same cache line, on
the assumption that future accesses are likely to be nearby in the VA
space. The TLB might handle duplicate (identical) entries by design, but
not conflicting ones. In that case, a (speculative) walk of a nearby
page could result in allocation of a conflicting entry.

> 2. It has no old pmd cached in its TLB, and it will see the new entry once
> __sync_icache_dcache is over.

The TLB can fetch any valid, active entry at any time.

It could fetch the old value from memory before the write was completed,
then a subsequent fetch of the new value could occur. This devolves into
the case I describe above for (1).

> 3. It has no old pmd cached in its TLB, and it may see the old entry before
> __sync_icache_dcache is over. But once core A finishes the tlbi and dsb sy, all
> the TLBs will see the new pte.

The TLB can fetch any valid, active entry at any time. It could fetch
the old entry, then the new entry, before the TLB maintenance completes.

If other asynchronous logic (e.g. speculative execution, I-cache
fetches, or page table walks) uses the results of an amalgamated
translation, the CPU may access a physical address that was not intended
to be accessed (perhaps resulting in an SError), or could allocate the
wrong data into caches or TLBs, leading to further issues.

The same problem applies as with (2), which devolves to (1).

> In my opinion, it seems the example below will only trigger a TLB conflict
> when merging to a huge page.
> 
> For example, without BBM a page table update would look something like:
>  1)	str	<newpte>, [<*pte>]
>  2)	dsb	ish
>  3)	tlbi	vmalle1is
>  4)	dsb	ish
>  5)	isb
> 
> So I have no idea how a TLB conflict could actually be triggered here.
> Can you give some advice and an example?

I have explained above how this may occur, on one possible
implementation. There are many possible problems that I have not
described above, all of which are avoided by a Break-Before-Make sequence.
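
For reference, in the style of the earlier example, one possible form
of a Break-Before-Make update of a live entry is roughly:

 1)	str	<invalid pte>, [<*pte>]	// break: unmap the region
 2)	dsb	ish
 3)	tlbi	vmalle1is		// remove any stale TLB entries
 4)	dsb	ish
 5)	str	<newpte>, [<*pte>]	// make: install the new entry
 6)	dsb	ish
 7)	isb

Between steps 1 and 5 the region translates to nothing, which is
exactly why this sequence cannot be used on mappings covering the code
or data that performs the update.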

Even in the presence of conflicting entries a CPU might not raise a TLB
conflict abort. It is also architecturally valid to match one entry, or to
match an amalgamation of the two. So you may not be able to trigger
problems resulting from a conflict on all implementations.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-13 11:28           ` Mark Rutland
@ 2016-01-27  1:18             ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2016-01-27  1:18 UTC (permalink / raw)
  To: Mark Rutland
  Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On 2016/1/13 19:28, Mark Rutland wrote:

> On Wed, Jan 13, 2016 at 01:02:31PM +0800, Xishi Qiu wrote:
>> Hi Mark,
>>
>> If I do it like this, does it have the problem too?
>>
>> kmalloc a size
>> no access
>> flush tlb
>> call set_memory_ro to change the page table flag
>> flush tlb
>> start access
> 
> This is broken.
> 
> The kmalloc will give you memory from the linear mapping. Even if you
> allocate a page, that page could have been mapped with a section at the
> PMD/PUD/PGD level.
> 
> Other data could fall within that section (e.g. a kernel stack,
> perhaps).

Hi Mark,

If nobody used that whole section before (though that is almost impossible),
a TLB flush would be safe, right?

Thanks,
Xishi Qiu

> 
> Additional TLB flushes do not help. There's still a race against the
> asynchronous TLB logic. The TLB can allocate or destroy entries at any
> time. If there were no page table changes prior to the invalidate, the
> TLB could re-allocate all existing entries immediately after the TLB
> invalidate, leaving you in the same state as before.
> 
> Thanks,
> Mark.
> 
> .
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Have any influence on set_memory_** about below patch ??
  2016-01-27  1:18             ` Xishi Qiu
@ 2016-01-27 11:25               ` Mark Rutland
  -1 siblings, 0 replies; 32+ messages in thread
From: Mark Rutland @ 2016-01-27 11:25 UTC (permalink / raw)
  To: Xishi Qiu; +Cc: zhong jiang, Laura Abbott, Hanjun Guo, linux-arm-kernel, LKML

On Wed, Jan 27, 2016 at 09:18:57AM +0800, Xishi Qiu wrote:
> On 2016/1/13 19:28, Mark Rutland wrote:
> 
> > On Wed, Jan 13, 2016 at 01:02:31PM +0800, Xishi Qiu wrote:
> >> Hi Mark,
> >>
> >> If I do it like this, does it have the problem too?
> >>
> >> kmalloc a size
> >> no access
> >> flush tlb
> >> call set_memory_ro to change the page table flag
> >> flush tlb
> >> start access
> > 
> > This is broken.
> > 
> > The kmalloc will give you memory from the linear mapping. Even if you
> > allocate a page, that page could have been mapped with a section at the
> > PMD/PUD/PGD level.
> > 
> > Other data could fall within that section (e.g. a kernel stack,
> > perhaps).
> 
> Hi Mark,
> 
> If nobody used that whole section before (though that is almost impossible),
> a TLB flush would be safe, right?

No, it is not safe.

As I mentioned before, there is a race against the hardware that you
cannot win:

> > Additional TLB flushes do not help. There's still a race against the
> > asynchronous TLB logic. The TLB can allocate or destroy entries at any
> > time. If there were no page table changes prior to the invalidate, the
> > TLB could re-allocate all existing entries immediately after the TLB
> > invalidate, leaving you in the same state as before.

It doesn't matter that no code has accessed that portion of the VA
space. You cannot guarantee that a valid entry will not be allocated
into the TLB at any time.

See the ARM ARM (ARM DDI 0487A.h), D4.6.1, About ARMv8 Translation
Lookaside Buffers (TLBs):

    Any translation table entry that does not generate a Translation
    fault, an Address size fault, or an Access flag fault and is not
    from a translation regime for an Exception level that is lower than
    the current Exception level might be allocated to an enabled TLB at
    any time.

You must either use a Break-Before-Make approach, or ensure that the
page tables are not live (i.e. not reachable by one of the TTBRs, and
not having any partial walks cached in TLBs) at the time they are
modified. In practice, both of these require an approach like [1] and
are incredibly expensive.

The only other option is to not use sections at all [2], though this
incurs other costs.

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401434.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401690.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread

Thread overview: 16 messages
     [not found] <5693A740.7070408@huawei.com>
     [not found] ` <20160111133145.GM6499@leverpostej>
2016-01-12  1:20   ` Have any influence on set_memory_** about below patch ?? Xishi Qiu
2016-01-12 11:15     ` Mark Rutland
2016-01-13  4:10       ` Xishi Qiu
2016-01-13 11:22         ` Mark Rutland
2016-01-13  5:02       ` Xishi Qiu
2016-01-13  6:35         ` Xishi Qiu
2016-01-13 11:28         ` Mark Rutland
2016-01-27  1:18           ` Xishi Qiu
2016-01-27 11:25             ` Mark Rutland
2016-01-26 14:05         ` zhong jiang
2016-01-26 16:07           ` Mark Rutland
2016-01-13 10:30       ` Xishi Qiu
2016-01-13 11:18         ` Mark Rutland
2016-01-14 12:35           ` Xishi Qiu
2016-01-14 13:06             ` Xishi Qiu
2016-01-14 13:44               ` Mark Rutland