All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
@ 2016-09-05  8:43 James Morse
  2016-09-05  8:43 ` [PATCH] arm64: Drop generic xlate_dev_mem_{k,}ptr() James Morse
  2016-09-05 10:35 ` [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory Will Deacon
  0 siblings, 2 replies; 10+ messages in thread
From: James Morse @ 2016-09-05  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

Running 'acpidump' on an arm64 system causes the program to SIGSEGV, and
yield the following trace:
[   71.623572] acpidump[2159]: unhandled alignment fault (11) at 0xffff8a4a909c, esr 0x92000021
[   71.632030] pgd = ffff8003ecf7d000
[   71.635420] [ffff8a4a909c] *pgd=00000083ec370003, *pud=00000083e7225003, *pmd=00000083ec3f1003, *pte=01600083ff1c3fc3
[   71.646042]
[   71.647524] CPU: 0 PID: 2159 Comm: acpidump Tainted: G        W I     4.8.0-rc4 #5081
[   71.655352] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
[   71.663957] task: ffff8003eac41900 task.stack: ffff8003e7210000
[   71.669871] PC is at 0xffff8a3c77d4
[   71.673353] LR is at 0x404934
[   71.676314] pc : [<0000ffff8a3c77d4>] lr : [<0000000000404934>] pstate: 00000000
[   71.683702] sp : 0000ffffd0e39af0

Acpidump digs through /sys/ to find the physical address of the ACPI tables,
then reads them from /dev/mem. The code that provides /dev/mem uses
phys_mem_access_prot() to determine the protection type it should use.

Arm64's phys_mem_access_prot() reports any memory region that isn't part
of the linear map as device memory. This breaks Acpidump as the acpi tables
are neither device memory nor part of the linear map.

Change this check to use memblock_is_memory() instead. On an EFI system
any region the efi memory map described as any of WB/WC/WT appears in
memblock.memory. On a non-EFI system, setup_machine_fdt() will cause
memory nodes in the DT to be added to memblock.memory.

/dev/mem is the only user of phys_mem_access_prot()

Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Stable?
Acpi was introduced for v4.1 but hidden behind 'expert'.
'expert' was removed for v4.7.

 arch/arm64/mm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 4989948d1feb..96a2f327fd2c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
-	if (!pfn_valid(pfn))
+	if (!memblock_is_memory(pfn << PAGE_SHIFT))
 		return pgprot_noncached(vma_prot);
 	else if (file->f_flags & O_SYNC)
 		return pgprot_writecombine(vma_prot);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] arm64: Drop generic xlate_dev_mem_{k,}ptr()
  2016-09-05  8:43 [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory James Morse
@ 2016-09-05  8:43 ` James Morse
  2016-09-05 10:35 ` [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory Will Deacon
  1 sibling, 0 replies; 10+ messages in thread
From: James Morse @ 2016-09-05  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

The code that provides /dev/mem uses xlate_dev_mem_{k,}ptr() to
avoid making a cachable mapping of a non-cachable area on ia64.
On arm64 we do this via phys_mem_access_prot() instead, but provide
dummy versions of xlate_dev_mem_{k,}ptr().

These are the same as those in asm-generic/io.h, which we include from
asm/io.h

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/io.h | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 9b6e408cfa51..ce20741b2cb5 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -184,17 +184,6 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size);
 #define iowrite32be(v,p)	({ __iowmb(); __raw_writel((__force __u32)cpu_to_be32(v), p); })
 #define iowrite64be(v,p)	({ __iowmb(); __raw_writeq((__force __u64)cpu_to_be64(v), p); })
 
-/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)	__va(p)
-
-/*
- * Convert a virtual cached pointer to an uncached pointer
- */
-#define xlate_dev_kmem_ptr(p)	p
-
 #include <asm-generic/io.h>
 
 /*
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05  8:43 [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory James Morse
  2016-09-05  8:43 ` [PATCH] arm64: Drop generic xlate_dev_mem_{k,}ptr() James Morse
@ 2016-09-05 10:35 ` Will Deacon
  2016-09-05 13:03   ` James Morse
  1 sibling, 1 reply; 10+ messages in thread
From: Will Deacon @ 2016-09-05 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 05, 2016 at 09:43:03AM +0100, James Morse wrote:
> Running 'acpidump' on an arm64 system causes the program to SIGSEGV, and
> yield the following trace:
> [   71.623572] acpidump[2159]: unhandled alignment fault (11) at 0xffff8a4a909c, esr 0x92000021
> [   71.632030] pgd = ffff8003ecf7d000
> [   71.635420] [ffff8a4a909c] *pgd=00000083ec370003, *pud=00000083e7225003, *pmd=00000083ec3f1003, *pte=01600083ff1c3fc3
> [   71.646042]
> [   71.647524] CPU: 0 PID: 2159 Comm: acpidump Tainted: G        W I     4.8.0-rc4 #5081
> [   71.655352] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
> [   71.663957] task: ffff8003eac41900 task.stack: ffff8003e7210000
> [   71.669871] PC is at 0xffff8a3c77d4
> [   71.673353] LR is at 0x404934
> [   71.676314] pc : [<0000ffff8a3c77d4>] lr : [<0000000000404934>] pstate: 00000000
> [   71.683702] sp : 0000ffffd0e39af0
> 
> Acpidump digs through /sys/ to find the physical address of the ACPI tables,
> then reads them from /dev/mem. The code that provides /dev/mem uses
> phys_mem_access_prot() to determine the protection type it should use.
> 
> Arm64's phys_mem_access_prot() reports any memory region that isn't part
> of the linear map as device memory. This breaks Acpidump as the acpi tables
> are neither device memory nor part of the linear map.
> 
> Change this check to use memblock_is_memory() instead. On an EFI system
> any region the efi memory map described as any of WB/WC/WT appears in
> memblock.memory. On a non-EFI system, setup_machine_fdt() will cause
> memory nodes in the DT to be added to memblock.memory.
> 
> /dev/mem is the only user of phys_mem_access_prot()
> 
> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Stable?
> Acpi was introduced for v4.1 but hidden behind 'expert'.
> 'expert' was removed for v4.7.
> 
>  arch/arm64/mm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4989948d1feb..96a2f327fd2c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  			      unsigned long size, pgprot_t vma_prot)
>  {
> -	if (!pfn_valid(pfn))
> +	if (!memblock_is_memory(pfn << PAGE_SHIFT))

Hmm, this looks pretty weird to me. pfn_valid currently calls
memblock_is_map_memory, so now we have this slightly paradoxical situation
of having some memory marked NOMAP but wanting to map it to userspace.

So our abstractions don't seem to align with reality. Why are the ACPI
tables getting marked as NOMAP to being with?

Will

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 10:35 ` [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory Will Deacon
@ 2016-09-05 13:03   ` James Morse
  2016-09-05 13:10     ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: James Morse @ 2016-09-05 13:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

(CC: Ard and Mark, get_maintainer.pl didn't apply common sense for me...)

On 05/09/16 11:35, Will Deacon wrote:
> On Mon, Sep 05, 2016 at 09:43:03AM +0100, James Morse wrote:
>> Running 'acpidump' on an arm64 system causes the program to SIGSEGV, and
>> yield the following trace:
>> [   71.623572] acpidump[2159]: unhandled alignment fault (11) at 0xffff8a4a909c, esr 0x92000021
>> [   71.632030] pgd = ffff8003ecf7d000
>> [   71.635420] [ffff8a4a909c] *pgd=00000083ec370003, *pud=00000083e7225003, *pmd=00000083ec3f1003, *pte=01600083ff1c3fc3
>> [   71.646042]
>> [   71.647524] CPU: 0 PID: 2159 Comm: acpidump Tainted: G        W I     4.8.0-rc4 #5081
>> [   71.655352] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
>> [   71.663957] task: ffff8003eac41900 task.stack: ffff8003e7210000
>> [   71.669871] PC is at 0xffff8a3c77d4
>> [   71.673353] LR is at 0x404934
>> [   71.676314] pc : [<0000ffff8a3c77d4>] lr : [<0000000000404934>] pstate: 00000000
>> [   71.683702] sp : 0000ffffd0e39af0
>>
>> Acpidump digs through /sys/ to find the physical address of the ACPI tables,
>> then reads them from /dev/mem. The code that provides /dev/mem uses
>> phys_mem_access_prot() to determine the protection type it should use.
>>
>> Arm64's phys_mem_access_prot() reports any memory region that isn't part
>> of the linear map as device memory. This breaks Acpidump as the acpi tables
>> are neither device memory nor part of the linear map.
>>
>> Change this check to use memblock_is_memory() instead. On an EFI system
>> any region the efi memory map described as any of WB/WC/WT appears in
>> memblock.memory. On a non-EFI system, setup_machine_fdt() will cause
>> memory nodes in the DT to be added to memblock.memory.
>>
>> /dev/mem is the only user of phys_mem_access_prot()

>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 4989948d1feb..96a2f327fd2c 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  			      unsigned long size, pgprot_t vma_prot)
>>  {
>> -	if (!pfn_valid(pfn))
>> +	if (!memblock_is_memory(pfn << PAGE_SHIFT))

> Hmm, this looks pretty weird to me. pfn_valid currently calls
> memblock_is_map_memory, so now we have this slightly paradoxical situation
> of having some memory marked NOMAP but wanting to map it to userspace.

Yup, crazy as it looks, that is what is happening.


> So our abstractions don't seem to align with reality. Why are the ACPI
> tables getting marked as NOMAP to being with?

EFI adds any region we can map as normal memory to memblock.memory, it also adds
anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
causes these regions be left out of the linear map.
(I don't know why exactly, but assume its so that the attributes and permissions
can be tweaked easily)

Nomap here means 'not in the linear map'. When we call phys_mem_access_prot(),
or acpi_os_ioremap() we are accessing these pages and want to know the
attributes to map with. For phys_mem_access_prot() 'not in the linear map' means
'device memory', which causes acpidump to crash as it makes unaligned accesses.
(We had similar problems with acpica in the kernel and acpi_os_ioremap(), which
[0] fixes).


Thanks,

James

[0] https://www.spinics.net/lists/arm-kernel/msg525369.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 13:03   ` James Morse
@ 2016-09-05 13:10     ` Ard Biesheuvel
  2016-09-05 13:24       ` Will Deacon
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2016-09-05 13:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 5 September 2016 at 14:03, James Morse <james.morse@arm.com> wrote:
> Hi Will,
>
> (CC: Ard and Mark, get_maintainer.pl didn't apply common sense for me...)
>
> On 05/09/16 11:35, Will Deacon wrote:
>> On Mon, Sep 05, 2016 at 09:43:03AM +0100, James Morse wrote:
>>> Running 'acpidump' on an arm64 system causes the program to SIGSEGV, and
>>> yield the following trace:
>>> [   71.623572] acpidump[2159]: unhandled alignment fault (11) at 0xffff8a4a909c, esr 0x92000021
>>> [   71.632030] pgd = ffff8003ecf7d000
>>> [   71.635420] [ffff8a4a909c] *pgd=00000083ec370003, *pud=00000083e7225003, *pmd=00000083ec3f1003, *pte=01600083ff1c3fc3
>>> [   71.646042]
>>> [   71.647524] CPU: 0 PID: 2159 Comm: acpidump Tainted: G        W I     4.8.0-rc4 #5081
>>> [   71.655352] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
>>> [   71.663957] task: ffff8003eac41900 task.stack: ffff8003e7210000
>>> [   71.669871] PC is at 0xffff8a3c77d4
>>> [   71.673353] LR is at 0x404934
>>> [   71.676314] pc : [<0000ffff8a3c77d4>] lr : [<0000000000404934>] pstate: 00000000
>>> [   71.683702] sp : 0000ffffd0e39af0
>>>
>>> Acpidump digs through /sys/ to find the physical address of the ACPI tables,
>>> then reads them from /dev/mem. The code that provides /dev/mem uses
>>> phys_mem_access_prot() to determine the protection type it should use.
>>>
>>> Arm64's phys_mem_access_prot() reports any memory region that isn't part
>>> of the linear map as device memory. This breaks Acpidump as the acpi tables
>>> are neither device memory nor part of the linear map.
>>>
>>> Change this check to use memblock_is_memory() instead. On an EFI system
>>> any region the efi memory map described as any of WB/WC/WT appears in
>>> memblock.memory. On a non-EFI system, setup_machine_fdt() will cause
>>> memory nodes in the DT to be added to memblock.memory.
>>>
>>> /dev/mem is the only user of phys_mem_access_prot()
>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 4989948d1feb..96a2f327fd2c 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>>                            unsigned long size, pgprot_t vma_prot)
>>>  {
>>> -    if (!pfn_valid(pfn))
>>> +    if (!memblock_is_memory(pfn << PAGE_SHIFT))
>
>> Hmm, this looks pretty weird to me. pfn_valid currently calls
>> memblock_is_map_memory, so now we have this slightly paradoxical situation
>> of having some memory marked NOMAP but wanting to map it to userspace.
>
> Yup, crazy as it looks, that is what is happening.
>
>

Indeed. Formerly, we would remove this from memblock entirely,
resulting in the same situation, i.e., that the region is omitted from
the linear mapping. However, by doing that, we also lose the
annotation that the region was memory to begin with, resulting in the
ACPI layer using ioremap() rather than ioremap_cache() or memremap()
to map it.

So the whole point of introducing MEMBLOCK_NOMAP was to allow memory
to be kept in the memblock array, but with an annotation that it
should not be covered by the linear mapping.

>> So our abstractions don't seem to align with reality. Why are the ACPI
>> tables getting marked as NOMAP to being with?
>
> EFI adds any region we can map as normal memory to memblock.memory, it also adds
> anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
> causes these regions be left out of the linear map.
> (I don't know why exactly, but assume its so that the attributes and permissions
> can be tweaked easily)
>

To prevent mismatched attributes between the linear mapping and
whatever mapping the firmware and/or ACPI are using. Also, going
through the trouble of mapping runtime services executable code with
R-X permissions and then having a writable alias is not great in terms
of security.

> Nomap here means 'not in the linear map'. When we call phys_mem_access_prot(),
> or acpi_os_ioremap() we are accessing these pages and want to know the
> attributes to map with. For phys_mem_access_prot() 'not in the linear map' means
> 'device memory', which causes acpidump to crash as it makes unaligned accesses.
> (We had similar problems with acpica in the kernel and acpi_os_ioremap(), which
> [0] fixes).
>

Indeed. We need to classify memory regions into 3 groups: { linearly
mapped memory, other memory, not memory }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 13:10     ` Ard Biesheuvel
@ 2016-09-05 13:24       ` Will Deacon
  2016-09-05 13:41         ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2016-09-05 13:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 05, 2016 at 02:10:54PM +0100, Ard Biesheuvel wrote:
> On 5 September 2016 at 14:03, James Morse <james.morse@arm.com> wrote:
> > On 05/09/16 11:35, Will Deacon wrote:
> >> On Mon, Sep 05, 2016 at 09:43:03AM +0100, James Morse wrote:
> >>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >>> index 4989948d1feb..96a2f327fd2c 100644
> >>> --- a/arch/arm64/mm/mmu.c
> >>> +++ b/arch/arm64/mm/mmu.c
> >>> @@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
> >>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >>>                            unsigned long size, pgprot_t vma_prot)
> >>>  {
> >>> -    if (!pfn_valid(pfn))
> >>> +    if (!memblock_is_memory(pfn << PAGE_SHIFT))
> >
> >> Hmm, this looks pretty weird to me. pfn_valid currently calls
> >> memblock_is_map_memory, so now we have this slightly paradoxical situation
> >> of having some memory marked NOMAP but wanting to map it to userspace.
> >
> > Yup, crazy as it looks, that is what is happening.
> >
> 
> Indeed. Formerly, we would remove this from memblock entirely,
> resulting in the same situation, i.e., that the region is omitted from
> the linear mapping. However, by doing that, we also lose the
> annotation that the region was memory to begin with, resulting in the
> ACPI layer using ioremap() rather than ioremap_cache() or memremap()
> to map it.
> 
> So the whole point of introducing MEMBLOCK_NOMAP was to allow memory
> to be kept in the memblock array, but with an annotation that it
> should not be covered by the linear mapping.
> 
> >> So our abstractions don't seem to align with reality. Why are the ACPI
> >> tables getting marked as NOMAP to being with?
> >
> > EFI adds any region we can map as normal memory to memblock.memory, it also adds
> > anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
> > causes these regions be left out of the linear map.
> > (I don't know why exactly, but assume its so that the attributes and permissions
> > can be tweaked easily)
> >
> 
> To prevent mismatched attributes between the linear mapping and
> whatever mapping the firmware and/or ACPI are using. Also, going
> through the trouble of mapping runtime services executable code with
> R-X permissions and then having a writable alias is not great in terms
> of security.

Ok, but with this patch we potentially have both of those problems
(mismatched attributes and RWX permission) with the userspace mapping :/

Is it worth trying to avoid any of this, or do we just throw our hands
in the air and give up when /dev/mem is being used?

Will

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 13:24       ` Will Deacon
@ 2016-09-05 13:41         ` Ard Biesheuvel
  2016-09-05 14:36           ` Leif Lindholm
  2016-09-05 14:42           ` Will Deacon
  0 siblings, 2 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2016-09-05 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 5 September 2016 at 14:24, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Sep 05, 2016 at 02:10:54PM +0100, Ard Biesheuvel wrote:
>> On 5 September 2016 at 14:03, James Morse <james.morse@arm.com> wrote:
>> > On 05/09/16 11:35, Will Deacon wrote:
>> >> On Mon, Sep 05, 2016 at 09:43:03AM +0100, James Morse wrote:
>> >>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> >>> index 4989948d1feb..96a2f327fd2c 100644
>> >>> --- a/arch/arm64/mm/mmu.c
>> >>> +++ b/arch/arm64/mm/mmu.c
>> >>> @@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>> >>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>> >>>                            unsigned long size, pgprot_t vma_prot)
>> >>>  {
>> >>> -    if (!pfn_valid(pfn))
>> >>> +    if (!memblock_is_memory(pfn << PAGE_SHIFT))
>> >
>> >> Hmm, this looks pretty weird to me. pfn_valid currently calls
>> >> memblock_is_map_memory, so now we have this slightly paradoxical situation
>> >> of having some memory marked NOMAP but wanting to map it to userspace.
>> >
>> > Yup, crazy as it looks, that is what is happening.
>> >
>>
>> Indeed. Formerly, we would remove this from memblock entirely,
>> resulting in the same situation, i.e., that the region is omitted from
>> the linear mapping. However, by doing that, we also lose the
>> annotation that the region was memory to begin with, resulting in the
>> ACPI layer using ioremap() rather than ioremap_cache() or memremap()
>> to map it.
>>
>> So the whole point of introducing MEMBLOCK_NOMAP was to allow memory
>> to be kept in the memblock array, but with an annotation that it
>> should not be covered by the linear mapping.
>>
>> >> So our abstractions don't seem to align with reality. Why are the ACPI
>> >> tables getting marked as NOMAP to being with?
>> >
>> > EFI adds any region we can map as normal memory to memblock.memory, it also adds
>> > anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
>> > causes these regions be left out of the linear map.
>> > (I don't know why exactly, but assume its so that the attributes and permissions
>> > can be tweaked easily)
>> >
>>
>> To prevent mismatched attributes between the linear mapping and
>> whatever mapping the firmware and/or ACPI are using. Also, going
>> through the trouble of mapping runtime services executable code with
>> R-X permissions and then having a writable alias is not great in terms
>> of security.
>
> Ok, but with this patch we potentially have both of those problems
> (mismatched attributes and RWX permission) with the userspace mapping :/
>
> Is it worth trying to avoid any of this, or do we just throw our hands
> in the air and give up when /dev/mem is being used?
>

Well, I do think /dev/mem is a ridiculous hack, but it is arguably an
improvement to not map a region as device if we know it is backed by
memory.

Leif already did a lot of work on dmidecode and related tools to get
rid of /dev/mem dependencies, especially because of the fact that it
used 'known' physical addresses to look for magic numbers identifying
SMBIOS tables etc. I am not sure if acpidump was on our radar yet, but
in the long term, we should be able to reduce the dependency on
/dev/mem for anything other than development use.

Would it help at all if we restricted these uses to read only? I am
not aware of any tools that require read/write access to memory in
this way.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 13:41         ` Ard Biesheuvel
@ 2016-09-05 14:36           ` Leif Lindholm
  2016-09-05 14:42           ` Will Deacon
  1 sibling, 0 replies; 10+ messages in thread
From: Leif Lindholm @ 2016-09-05 14:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 05, 2016 at 02:41:16PM +0100, Ard Biesheuvel wrote:
> On 5 September 2016 at 14:24, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Sep 05, 2016 at 02:10:54PM +0100, Ard Biesheuvel wrote:
> >> On 5 September 2016 at 14:03, James Morse <james.morse@arm.com> wrote:
> >> > On 05/09/16 11:35, Will Deacon wrote:
> >> >> On Mon, Sep 05, 2016 at 09:43:03AM +0100, James Morse wrote:
> >> >>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> >>> index 4989948d1feb..96a2f327fd2c 100644
> >> >>> --- a/arch/arm64/mm/mmu.c
> >> >>> +++ b/arch/arm64/mm/mmu.c
> >> >>> @@ -63,7 +63,7 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
> >> >>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> >> >>>                            unsigned long size, pgprot_t vma_prot)
> >> >>>  {
> >> >>> -    if (!pfn_valid(pfn))
> >> >>> +    if (!memblock_is_memory(pfn << PAGE_SHIFT))
> >> >
> >> >> Hmm, this looks pretty weird to me. pfn_valid currently calls
> >> >> memblock_is_map_memory, so now we have this slightly paradoxical situation
> >> >> of having some memory marked NOMAP but wanting to map it to userspace.
> >> >
> >> > Yup, crazy as it looks, that is what is happening.
> >> >
> >>
> >> Indeed. Formerly, we would remove this from memblock entirely,
> >> resulting in the same situation, i.e., that the region is omitted from
> >> the linear mapping. However, by doing that, we also lose the
> >> annotation that the region was memory to begin with, resulting in the
> >> ACPI layer using ioremap() rather than ioremap_cache() or memremap()
> >> to map it.
> >>
> >> So the whole point of introducing MEMBLOCK_NOMAP was to allow memory
> >> to be kept in the memblock array, but with an annotation that it
> >> should not be covered by the linear mapping.
> >>
> >> >> So our abstractions don't seem to align with reality. Why are the ACPI
> >> >> tables getting marked as NOMAP to being with?
> >> >
> >> > EFI adds any region we can map as normal memory to memblock.memory, it also adds
> >> > anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
> >> > causes these regions be left out of the linear map.
> >> > (I don't know why exactly, but assume its so that the attributes and permissions
> >> > can be tweaked easily)
> >> >
> >>
> >> To prevent mismatched attributes between the linear mapping and
> >> whatever mapping the firmware and/or ACPI are using. Also, going
> >> through the trouble of mapping runtime services executable code with
> >> R-X permissions and then having a writable alias is not great in terms
> >> of security.
> >
> > Ok, but with this patch we potentially have both of those problems
> > (mismatched attributes and RWX permission) with the userspace mapping :/
> >
> > Is it worth trying to avoid any of this, or do we just throw our hands
> > in the air and give up when /dev/mem is being used?
> 
> Well, I do think /dev/mem is a ridiculous hack, but it is arguably an
> improvement to not map a region as device if we know it is backed by
> memory.
> 
> Leif already did a lot of work on dmidecode and related tools to get
> rid of /dev/mem dependencies, especially because of the fact that it
> used 'known' physical addresses to look for magic numbers identifying
> SMBIOS tables etc. I am not sure if acpidump was on our radar yet, but

Yeah, Roy did acpidump last year:
https://lists.linaro.org/pipermail/linaro-uefi/2015-June/000906.html

> in the long term, we should be able to reduce the dependency on
> /dev/mem for anything other than development use.

On x86 there are certain legacy reasons why they need to keep the
interface enabled (for now) - but on ARM the only use for it is not
wanting to bother with creating a driver for poking various
controllers from userland. It has no business being enabled in a
distro kernel. It is the opposite of what an operating system is for.

> Would it help at all if we restricted these uses to read only? I am
> not aware of any tools that require read/write access to memory in
> this way.

If any such utilities remain, we need to fix them.

/
    Leif

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 13:41         ` Ard Biesheuvel
  2016-09-05 14:36           ` Leif Lindholm
@ 2016-09-05 14:42           ` Will Deacon
  2016-09-05 14:52             ` Ard Biesheuvel
  1 sibling, 1 reply; 10+ messages in thread
From: Will Deacon @ 2016-09-05 14:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 05, 2016 at 02:41:16PM +0100, Ard Biesheuvel wrote:
> On 5 September 2016 at 14:24, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Sep 05, 2016 at 02:10:54PM +0100, Ard Biesheuvel wrote:
> >> On 5 September 2016 at 14:03, James Morse <james.morse@arm.com> wrote:
> >> > On 05/09/16 11:35, Will Deacon wrote:
> >> >> So our abstractions don't seem to align with reality. Why are the ACPI
> >> >> tables getting marked as NOMAP to being with?
> >> >
> >> > EFI adds any region we can map as normal memory to memblock.memory, it also adds
> >> > anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
> >> > causes these regions be left out of the linear map.
> >> > (I don't know why exactly, but assume its so that the attributes and permissions
> >> > can be tweaked easily)
> >> >
> >>
> >> To prevent mismatched attributes between the linear mapping and
> >> whatever mapping the firmware and/or ACPI are using. Also, going
> >> through the trouble of mapping runtime services executable code with
> >> R-X permissions and then having a writable alias is not great in terms
> >> of security.
> >
> > Ok, but with this patch we potentially have both of those problems
> > (mismatched attributes and RWX permission) with the userspace mapping :/
> >
> > Is it worth trying to avoid any of this, or do we just throw our hands
> > in the air and give up when /dev/mem is being used?
> >
> 
> Well, I do think /dev/mem is a ridiculous hack, but it is arguably an
> improvement to not map a region as device if we know it is backed by
> memory.
> 
> Leif already did a lot of work on dmidecode and related tools to get
> rid of /dev/mem dependencies, especially because of the fact that it
> used 'known' physical addresses to look for magic numbers identifying
> SMBIOS tables etc. I am not sure if acpidump was on our radar yet, but
> in the long term, we should be able to reduce the dependency on
> /dev/mem for anything other than development use.

Judging by Leif's reply, acpidump has already been fixed too.

> Would it help at all if we restricted these uses to read only? I am
> not aware of any tools that require read/write access to memory in
> this way.

I'd certainly be happier if we dropped the write permission and derived
the memory type from EFI for the NOMAP regions (it looks like ia64 tries
to do something like this, whereas we currently return non-cacheable if
O_SYNC is set on the fd).

Will

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory
  2016-09-05 14:42           ` Will Deacon
@ 2016-09-05 14:52             ` Ard Biesheuvel
  0 siblings, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2016-09-05 14:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 5 September 2016 at 15:42, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Sep 05, 2016 at 02:41:16PM +0100, Ard Biesheuvel wrote:
>> On 5 September 2016 at 14:24, Will Deacon <will.deacon@arm.com> wrote:
>> > On Mon, Sep 05, 2016 at 02:10:54PM +0100, Ard Biesheuvel wrote:
>> >> On 5 September 2016 at 14:03, James Morse <james.morse@arm.com> wrote:
>> >> > On 05/09/16 11:35, Will Deacon wrote:
>> >> >> So our abstractions don't seem to align with reality. Why are the ACPI
>> >> >> tables getting marked as NOMAP to being with?
>> >> >
>> >> > EFI adds any region we can map as normal memory to memblock.memory, it also adds
>> >> > anything it considers reserved to the memblock.nomap, e.g. the ACPI tables. This
>> >> > causes these regions be left out of the linear map.
>> >> > (I don't know why exactly, but assume its so that the attributes and permissions
>> >> > can be tweaked easily)
>> >> >
>> >>
>> >> To prevent mismatched attributes between the linear mapping and
>> >> whatever mapping the firmware and/or ACPI are using. Also, going
>> >> through the trouble of mapping runtime services executable code with
>> >> R-X permissions and then having a writable alias is not great in terms
>> >> of security.
>> >
>> > Ok, but with this patch we potentially have both of those problems
>> > (mismatched attributes and RWX permission) with the userspace mapping :/
>> >
>> > Is it worth trying to avoid any of this, or do we just throw our hands
>> > in the air and give up when /dev/mem is being used?
>> >
>>
>> Well, I do think /dev/mem is a ridiculous hack, but it is arguably an
>> improvement to not map a region as device if we know it is backed by
>> memory.
>>
>> Leif already did a lot of work on dmidecode and related tools to get
>> rid of /dev/mem dependencies, especially because of the fact that it
>> used 'known' physical addresses to look for magic numbers identifying
>> SMBIOS tables etc. I am not sure if acpidump was on our radar yet, but
>> in the long term, we should be able to reduce the dependency on
>> /dev/mem for anything other than development use.
>
> Judging by Leif's reply, acpidump has already been fixed too.
>

Fortunately, yes.

>> Would it help at all if we restricted these uses to read only? I am
>> not aware of any tools that require read/write access to memory in
>> this way.




>
> I'd certainly be happier if we dropped the write permission and derived
> the memory type from EFI for the NOMAP regions (it looks like ia64 tries
> to do something like this, whereas we currently return non-cacheable if
> O_SYNC is set on the fd).
>

The UEFI memory map describes the supported mapping types. I.e, most
regions are listed as

[Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]

which means that it is legal to map it as

WB:
Normal Memory
Outer Write-back non-transient
Inner Write-back non-transient

WT:
Normal Memory
Outer Write-through non-transient
Inner Write-through non-transient

WC:
Normal Memory
Outer non-cacheable
Inner non-cacheable

UC:
Device-nGnRnE

(this is straight from the UEFI spec)

So while WB is used in the majority of cases, this is not universally
true, and the UEFI or ACPI layers may decide to use any of those for
reserved regions.

We already had a separate discussion whether it makes sense to include
the UC attribute in ACPI memory regions. The problem here is that it
is not unambiguously defined whether the attributes describe the
nature of the memory regions (i.e, the backing) or the nature of the
/contents/ of the memory regions. This is further complicated by the
fact that occupied and available regions are described in the same
way.

The bottom line is that, in general, we cannot infer from the UEFI
memory map how a region is currently mapped if it is not covered by
the linear mapping. What we /could/ do is generally disallow device
mappings for regions that can also be mapped as memory (i.e, not only
for /dev/mem) (although I think ACPI RAS stipulates Device-nGnRnE in
some cases.)

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-09-05 14:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-05  8:43 [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory James Morse
2016-09-05  8:43 ` [PATCH] arm64: Drop generic xlate_dev_mem_{k,}ptr() James Morse
2016-09-05 10:35 ` [PATCH] arm64: mm: phys_mem_access_prot() reports non-kernel memory as device memory Will Deacon
2016-09-05 13:03   ` James Morse
2016-09-05 13:10     ` Ard Biesheuvel
2016-09-05 13:24       ` Will Deacon
2016-09-05 13:41         ` Ard Biesheuvel
2016-09-05 14:36           ` Leif Lindholm
2016-09-05 14:42           ` Will Deacon
2016-09-05 14:52             ` Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.