bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
@ 2024-03-07 17:42 Miguel Ojeda
  2024-03-07 17:54 ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Miguel Ojeda @ 2024-03-07 17:42 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Linux ARM, alexei.starovoitov,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

Hi arm64/bpf/pci,

In today's next-20240307 with a defconfig LLVM=1 I am seeing [1] under
QEMU virt, i.e. from
https://lore.kernel.org/all/20240305030516.41519-2-alexei.starovoitov@gmail.com/
applied to the bpf-next tree.

Cheers,
Miguel

[1]

[    0.425177] pci-host-generic 4010000000.pcie: host bridge
/pcie@10000000 ranges:
[    0.425886] pci-host-generic 4010000000.pcie:       IO
0x003eff0000..0x003effffff -> 0x0000000000
[    0.426534] pci-host-generic 4010000000.pcie:      MEM
0x0010000000..0x003efeffff -> 0x0010000000
[    0.426764] pci-host-generic 4010000000.pcie:      MEM
0x8000000000..0xffffffffff -> 0x8000000000
[    0.427324] ------------[ cut here ]------------
[    0.427456] vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
[    0.427944] WARNING: CPU: 0 PID: 1 at mm/vmalloc.c:315
ioremap_page_range+0x25c/0x2bc
[    0.428543] Modules linked in:
[    0.429236] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
6.8.0-rc7-next-20240307 #1
[    0.429513] Hardware name: linux,dummy-virt (DT)
[    0.429751] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.429946] pc : ioremap_page_range+0x25c/0x2bc
[    0.430063] lr : ioremap_page_range+0x258/0x2bc
[    0.430178] sp : ffff80008002b670
[    0.430271] x29: ffff80008002b690 x28: 0000000000001820 x27: 0000000000001880
[    0.430556] x26: ffffde390bd7d000 x25: 0000000000000000 x24: ffffde390bce9000
[    0.430729] x23: ffff69958280b0a0 x22: 0000000000000000 x21: 000000003eff0000
[    0.430895] x20: ffff699582805090 x19: ffffffffc0800000 x18: 00000000de69481d
[    0.431060] x17: ffff80008002b1fb x16: 0000000000000108 x15: 0000000000000004
[    0.431245] x14: ffffde390bd19570 x13: 0000000000000fff x12: 0000000000000003
[    0.431421] x11: 0000000000000003 x10: ffffde390bca2008 x9 : 4038d3d4a4dd4200
[    0.431623] x8 : 4038d3d4a4dd4200 x7 : 0773076107200764 x6 : 0765076b07720761
[    0.431774] x5 : ffff699582829f00 x4 : ffff699582829fa0 x3 : 0000000000000000
[    0.431923] x2 : 0000000000000000 x1 : ffff80008002b400 x0 : 00000000ffffffea
[    0.432220] Call trace:
[    0.432462]  ioremap_page_range+0x25c/0x2bc
[    0.432703]  pci_remap_iospace+0x78/0x84
[    0.432854]  devm_pci_remap_iospace+0x54/0x98
[    0.432979]  devm_of_pci_bridge_init+0x2e0/0x48c
[    0.433114]  devm_pci_alloc_host_bridge+0xa4/0xbc
[    0.433254]  pci_host_common_probe+0x48/0x1a4
[    0.433363]  platform_probe+0xa8/0xd0
[    0.433456]  really_probe+0x130/0x2e4
[    0.433545]  __driver_probe_device+0xa0/0x128
[    0.433647]  driver_probe_device+0x3c/0x1f8
[    0.433742]  __driver_attach+0xdc/0x1a4
[    0.433834]  bus_for_each_dev+0xe8/0x140
[    0.433925]  driver_attach+0x24/0x30
[    0.434011]  bus_add_driver+0x154/0x240
[    0.434104]  driver_register+0x68/0x100
[    0.434196]  __platform_driver_register+0x24/0x30
[    0.434306]  gen_pci_driver_init+0x1c/0x28
[    0.434407]  do_one_initcall+0xbc/0x248
[    0.434533]  do_initcall_level+0x94/0xb4
[    0.434632]  do_initcalls+0x54/0x94
[    0.434721]  do_basic_setup+0x50/0x60
[    0.434810]  kernel_init_freeable+0x10c/0x178
[    0.434912]  kernel_init+0x20/0x1a0
[    0.435003]  ret_from_fork+0x10/0x20
[    0.435227] ---[ end trace 0000000000000000 ]---
[    0.435691] pci-host-generic 4010000000.pcie: error -22: failed to
map resource [io  0x0000-0xffff]
[    0.436124] pci-host-generic 4010000000.pcie: Memory resource size
exceeds max for 32 bits
[    0.436713] pci-host-generic 4010000000.pcie: ECAM at [mem
0x4010000000-0x401fffffff] for [bus 00-ff]
[    0.438068] pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
[    0.438414] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.438584] pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
[    0.438732] pci_bus 0000:00: root bus resource [mem
0x8000000000-0xffffffffff]
[    0.440126] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
conventional PCI endpoint
[    0.443526] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
conventional PCI endpoint
[    0.443891] pci 0000:00:01.0: BAR 0 [io  0x0000-0x001f]
[    0.444059] pci 0000:00:01.0: BAR 1 [mem 0x00000000-0x00000fff]
[    0.444227] pci 0000:00:01.0: BAR 4 [mem 0x00000000-0x00003fff 64bit pref]
[    0.444468] pci 0000:00:01.0: ROM [mem 0x00000000-0x0007ffff pref]
[    0.446795] pci 0000:00:01.0: ROM [mem 0x10000000-0x1007ffff pref]: assigned
[    0.447158] pci 0000:00:01.0: BAR 4 [mem 0x8000000000-0x8000003fff
64bit pref]: assigned
[    0.447465] pci 0000:00:01.0: BAR 1 [mem 0x10080000-0x10080fff]: assigned
[    0.447632] pci 0000:00:01.0: BAR 0 [io  size 0x0020]: can't assign; no space
[    0.447945] pci 0000:00:01.0: BAR 0 [io  size 0x0020]: failed to assign

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-07 17:42 vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP Miguel Ojeda
@ 2024-03-07 17:54 ` Alexei Starovoitov
  2024-03-08  3:49   ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-07 17:54 UTC (permalink / raw)
  To: Miguel Ojeda, Christoph Hellwig, Linus Torvalds, linux-mm
  Cc: Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Thu, Mar 7, 2024 at 9:42 AM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> Hi arm64/bpf/pci,
>
> In today's next-20240307 with a defconfig LLVM=1 I am seeing [1] under
> QEMU virt, i.e. from
> https://lore.kernel.org/all/20240305030516.41519-2-alexei.starovoitov@gmail.com/
> applied to the bpf-next tree.
>
> Cheers,
> Miguel
>
> [1]
>
> [    0.425177] pci-host-generic 4010000000.pcie: host bridge
> /pcie@10000000 ranges:
> [    0.425886] pci-host-generic 4010000000.pcie:       IO
> 0x003eff0000..0x003effffff -> 0x0000000000
> [    0.426534] pci-host-generic 4010000000.pcie:      MEM
> 0x0010000000..0x003efeffff -> 0x0010000000
> [    0.426764] pci-host-generic 4010000000.pcie:      MEM
> 0x8000000000..0xffffffffff -> 0x8000000000
> [    0.427324] ------------[ cut here ]------------
> [    0.427456] vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
> [    0.427944] WARNING: CPU: 0 PID: 1 at mm/vmalloc.c:315
> ioremap_page_range+0x25c/0x2bc

Great. Thanks for flagging.
Looks like this check found some misuse of ioremap_page_range.

Note that without marking the address range as VM_IOREMAP
the vread_iter() will be bulk reading over IO and might
cause hard hangs and what not.
pci drivers need to mark their range as VM_IOREMAP.
That was the reason for the warning.

I'll try to figure out which piece of code missed passing
VM_IOREMAP into vm_area.
I'm not familiar with pci, so help is greatly appreciated.

> [    0.429236] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 6.8.0-rc7-next-20240307 #1
> [    0.429513] Hardware name: linux,dummy-virt (DT)
> [    0.429751] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    0.429946] pc : ioremap_page_range+0x25c/0x2bc
> [    0.430063] lr : ioremap_page_range+0x258/0x2bc
> [    0.432220] Call trace:
> [    0.432462]  ioremap_page_range+0x25c/0x2bc
> [    0.432703]  pci_remap_iospace+0x78/0x84
> [    0.432854]  devm_pci_remap_iospace+0x54/0x98
> [    0.432979]  devm_of_pci_bridge_init+0x2e0/0x48c
> [    0.433114]  devm_pci_alloc_host_bridge+0xa4/0xbc
> [    0.433254]  pci_host_common_probe+0x48/0x1a4
> [    0.433363]  platform_probe+0xa8/0xd0
> [    0.433456]  really_probe+0x130/0x2e4
> [    0.433545]  __driver_probe_device+0xa0/0x128
> [    0.433647]  driver_probe_device+0x3c/0x1f8
> [    0.433742]  __driver_attach+0xdc/0x1a4
> [    0.433834]  bus_for_each_dev+0xe8/0x140
> [    0.433925]  driver_attach+0x24/0x30
> [    0.434011]  bus_add_driver+0x154/0x240
> [    0.434104]  driver_register+0x68/0x100
> [    0.434196]  __platform_driver_register+0x24/0x30
> [    0.434306]  gen_pci_driver_init+0x1c/0x28
> [    0.434407]  do_one_initcall+0xbc/0x248
> [    0.434533]  do_initcall_level+0x94/0xb4
> [    0.434632]  do_initcalls+0x54/0x94
> [    0.434721]  do_basic_setup+0x50/0x60
> [    0.434810]  kernel_init_freeable+0x10c/0x178
> [    0.434912]  kernel_init+0x20/0x1a0
> [    0.435003]  ret_from_fork+0x10/0x20
> [    0.435227] ---[ end trace 0000000000000000 ]---

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-07 17:54 ` Alexei Starovoitov
@ 2024-03-08  3:49   ` Alexei Starovoitov
  2024-03-08 10:48     ` Manivannan Sadhasivam
                       ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-08  3:49 UTC (permalink / raw)
  To: Miguel Ojeda, Christoph Hellwig, Linus Torvalds, linux-mm, Andrew Morton
  Cc: Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

[-- Attachment #1: Type: text/plain, Size: 1728 bytes --]

On Thu, Mar 7, 2024 at 9:54 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 7, 2024 at 9:42 AM Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com> wrote:
> >
> > Hi arm64/bpf/pci,
> >
> > In today's next-20240307 with a defconfig LLVM=1 I am seeing [1] under
> > QEMU virt, i.e. from
> > https://lore.kernel.org/all/20240305030516.41519-2-alexei.starovoitov@gmail.com/
> > applied to the bpf-next tree.
> >
> > Cheers,
> > Miguel
> >
> > [1]
> >
> > [    0.425177] pci-host-generic 4010000000.pcie: host bridge
> > /pcie@10000000 ranges:
> > [    0.425886] pci-host-generic 4010000000.pcie:       IO
> > 0x003eff0000..0x003effffff -> 0x0000000000
> > [    0.426534] pci-host-generic 4010000000.pcie:      MEM
> > 0x0010000000..0x003efeffff -> 0x0010000000
> > [    0.426764] pci-host-generic 4010000000.pcie:      MEM
> > 0x8000000000..0xffffffffff -> 0x8000000000
> > [    0.427324] ------------[ cut here ]------------
> > [    0.427456] vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
> > [    0.427944] WARNING: CPU: 0 PID: 1 at mm/vmalloc.c:315
> > ioremap_page_range+0x25c/0x2bc
>
> Great. Thanks for flagging.
> Looks like this check found some misuse of ioremap_page_range.
>
> Note that without marking the address range as VM_IOREMAP
> the vread_iter() will be bulk reading over IO and might
> cause hard hangs and what not.
> pci drivers need to mark their range as VM_IOREMAP.
> That was the reason for the warning.
>
> I'll try to figure out which piece of code missed passing
> VM_IOREMAP into vm_area.
> I'm not familiar with pci, so help is greatly appreciated.

Ok. I think I figured it out.
Please try the attached patch.

[-- Attachment #2: 0001-mm-Enforce-range-of-ioremap_page_range-when-it-s-ins.patch --]
[-- Type: application/octet-stream, Size: 1933 bytes --]

From b446534f2dc5c48d8ab07c68e79648f42238fef8 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 7 Mar 2024 19:41:45 -0800
Subject: [PATCH bpf-next] mm: Enforce range of ioremap_page_range when it's
 inside vmalloc range

PCI address range is managed independently from vmalloc range.
Enforce flags and range in ioremap_page_range() only when
the start address is within vmalloc range allocated by get_vm_area().

Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 mm/vmalloc.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e5b8c70950bc..17eb0f974e0f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -311,16 +311,19 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
 	int err;
 
 	area = find_vm_area((void *)addr);
-	if (!area || !(area->flags & VM_IOREMAP)) {
-		WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n", addr);
-		return -EINVAL;
-	}
-	if (addr != (unsigned long)area->addr ||
-	    (void *)end != area->addr + get_vm_area_size(area)) {
-		WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
-			  addr, end, (long)area->addr,
-			  (long)area->addr + get_vm_area_size(area));
-		return -ERANGE;
+	if (area) {
+		if (!(area->flags & VM_IOREMAP)) {
+			WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n",
+				  addr);
+			return -EINVAL;
+		}
+		if (addr != (unsigned long)area->addr ||
+		    (void *)end != area->addr + get_vm_area_size(area)) {
+			WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
+				  addr, end, (long)area->addr,
+				  (long)area->addr + get_vm_area_size(area));
+			return -ERANGE;
+		}
 	}
 	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
 				 ioremap_max_page_shift);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08  3:49   ` Alexei Starovoitov
@ 2024-03-08 10:48     ` Manivannan Sadhasivam
  2024-03-08 11:23     ` Miguel Ojeda
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Manivannan Sadhasivam @ 2024-03-08 10:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Miguel Ojeda, Christoph Hellwig, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Thu, Mar 07, 2024 at 07:49:16PM -0800, Alexei Starovoitov wrote:
> On Thu, Mar 7, 2024 at 9:54 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 7, 2024 at 9:42 AM Miguel Ojeda
> > <miguel.ojeda.sandonis@gmail.com> wrote:
> > >
> > > Hi arm64/bpf/pci,
> > >
> > > In today's next-20240307 with a defconfig LLVM=1 I am seeing [1] under
> > > QEMU virt, i.e. from
> > > https://lore.kernel.org/all/20240305030516.41519-2-alexei.starovoitov@gmail.com/
> > > applied to the bpf-next tree.
> > >
> > > Cheers,
> > > Miguel
> > >
> > > [1]
> > >
> > > [    0.425177] pci-host-generic 4010000000.pcie: host bridge
> > > /pcie@10000000 ranges:
> > > [    0.425886] pci-host-generic 4010000000.pcie:       IO
> > > 0x003eff0000..0x003effffff -> 0x0000000000
> > > [    0.426534] pci-host-generic 4010000000.pcie:      MEM
> > > 0x0010000000..0x003efeffff -> 0x0010000000
> > > [    0.426764] pci-host-generic 4010000000.pcie:      MEM
> > > 0x8000000000..0xffffffffff -> 0x8000000000
> > > [    0.427324] ------------[ cut here ]------------
> > > [    0.427456] vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
> > > [    0.427944] WARNING: CPU: 0 PID: 1 at mm/vmalloc.c:315
> > > ioremap_page_range+0x25c/0x2bc
> >
> > Great. Thanks for flagging.
> > Looks like this check found some misuse of ioremap_page_range.
> >
> > Note that without marking the address range as VM_IOREMAP
> > the vread_iter() will be bulk reading over IO and might
> > cause hard hangs and what not.
> > pci drivers need to mark their range as VM_IOREMAP.
> > That was the reason for the warning.
> >
> > I'll try to figure out which piece of code missed passing
> > VM_IOREMAP into vm_area.
> > I'm not familiar with pci, so help is greatly appreciated.
> 
> Ok. I think I figured it out.
> Please try the attached patch.

I tested the attached patch on Qcom SDM845 based dev board and it fixes the
issue. So,

Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

On a side note, PCI is the only driver making use of this API. If I'm making
such an API change, I would've made sure that the callers are unaffected by the
change.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08  3:49   ` Alexei Starovoitov
  2024-03-08 10:48     ` Manivannan Sadhasivam
@ 2024-03-08 11:23     ` Miguel Ojeda
  2024-03-08 15:04     ` Christoph Hellwig
  2024-03-08 16:13     ` Bjorn Helgaas
  3 siblings, 0 replies; 20+ messages in thread
From: Miguel Ojeda @ 2024-03-08 11:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christoph Hellwig, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Fri, Mar 8, 2024 at 4:49 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Ok. I think I figured it out.
> Please try the attached patch.

On top of today's next-20240308 for both arm64 and x86_64 under QEMU:

Tested-by: Miguel Ojeda <ojeda@kernel.org>

Thanks!

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08  3:49   ` Alexei Starovoitov
  2024-03-08 10:48     ` Manivannan Sadhasivam
  2024-03-08 11:23     ` Miguel Ojeda
@ 2024-03-08 15:04     ` Christoph Hellwig
  2024-03-08 16:33       ` Alexei Starovoitov
  2024-03-08 16:13     ` Bjorn Helgaas
  3 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-08 15:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Miguel Ojeda, Christoph Hellwig, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Thu, Mar 07, 2024 at 07:49:16PM -0800, Alexei Starovoitov wrote:
> Ok. I think I figured it out.
> Please try the attached patch.

I don't think this is the right thing.  The probem is that
the PCI code shouldn't really be using ioremap_page_range if it is
not an ioremap area, but instead directly call into
vmap_range_noflush (or an added back vmap_range to avoid all the
duplication) similar to the vunmap case in vunmap_range.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08  3:49   ` Alexei Starovoitov
                       ` (2 preceding siblings ...)
  2024-03-08 15:04     ` Christoph Hellwig
@ 2024-03-08 16:13     ` Bjorn Helgaas
  2024-03-08 16:37       ` Alexei Starovoitov
  3 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2024-03-08 16:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Miguel Ojeda, Christoph Hellwig, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Thu, Mar 07, 2024 at 07:49:16PM -0800, Alexei Starovoitov wrote:
> Ok. I think I figured it out.
> Please try the attached patch.

> PCI address range is managed independently from vmalloc range.

This suggests that the PCI maintainers should be aware of something,
but I don't know what this means.  Can you elaborate on what PCI
address range management this is, e.g., what functions allocate from
it?  Or how PCI should have been able to avoid this issue?

The patch is in a generic area with no obvious connection to PCI and
no obvious sign of independent management, which doesn't feel quite
right.  Maybe this is what Christoph is getting at.

> Enforce flags and range in ioremap_page_range() only when
> the start address is within vmalloc range allocated by get_vm_area().

> Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.")
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  mm/vmalloc.c | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e5b8c70950bc..17eb0f974e0f 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -311,16 +311,19 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
>  	int err;
>  
>  	area = find_vm_area((void *)addr);
> -	if (!area || !(area->flags & VM_IOREMAP)) {
> -		WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n", addr);
> -		return -EINVAL;
> -	}
> -	if (addr != (unsigned long)area->addr ||
> -	    (void *)end != area->addr + get_vm_area_size(area)) {
> -		WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
> -			  addr, end, (long)area->addr,
> -			  (long)area->addr + get_vm_area_size(area));
> -		return -ERANGE;
> +	if (area) {
> +		if (!(area->flags & VM_IOREMAP)) {
> +			WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n",
> +				  addr);
> +			return -EINVAL;
> +		}
> +		if (addr != (unsigned long)area->addr ||
> +		    (void *)end != area->addr + get_vm_area_size(area)) {
> +			WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
> +				  addr, end, (long)area->addr,
> +				  (long)area->addr + get_vm_area_size(area));
> +			return -ERANGE;
> +		}
>  	}
>  	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
>  				 ioremap_max_page_shift);
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 15:04     ` Christoph Hellwig
@ 2024-03-08 16:33       ` Alexei Starovoitov
  2024-03-08 17:02         ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-08 16:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Fri, Mar 8, 2024 at 7:04 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Mar 07, 2024 at 07:49:16PM -0800, Alexei Starovoitov wrote:
> > Ok. I think I figured it out.
> > Please try the attached patch.
>
> I don't think this is the right thing.  The probem is that
> the PCI code shouldn't really be using ioremap_page_range if it is
> not an ioremap area, but instead directly call into
> vmap_range_noflush (or an added back vmap_range to avoid all the
> duplication) similar to the vunmap case in vunmap_range.

vmap_range_noflush() is static in mm/vmalloc.c
There is vmap_pages_range_noflush() that is in mm/internal.h,
but it needs pages instead of phys_addr_t.
Newly introduced vm_area_map_pages() needs struct vm_struct *area
and struct page **pages.
In this PCI case there is no vm_struct and no pages.
ioremap_page_range() is the only api that fits. afaict.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 16:13     ` Bjorn Helgaas
@ 2024-03-08 16:37       ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-08 16:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Miguel Ojeda, Christoph Hellwig, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Fri, Mar 8, 2024 at 8:13 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Mar 07, 2024 at 07:49:16PM -0800, Alexei Starovoitov wrote:
> > Ok. I think I figured it out.
> > Please try the attached patch.
>
> > PCI address range is managed independently from vmalloc range.
>
> This suggests that the PCI maintainers should be aware of something,
> but I don't know what this means.  Can you elaborate on what PCI
> address range management this is, e.g., what functions allocate from
> it?  Or how PCI should have been able to avoid this issue?

I believe Chritoph's long term plan for ioremap_page_range()
is to be used for ranges _within_ vmalloc range only.
The vmalloc ranges are allocated by get_vm_area().
In PCI you don't use vmalloc address range.
PCI manages its own PCI_IOBASE, IO_SPACE_LIMIT address range
independently from vmalloc range and they do not overlap.
Hence this proposed patch.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 16:33       ` Alexei Starovoitov
@ 2024-03-08 17:02         ` Christoph Hellwig
  2024-03-08 17:20           ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-08 17:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christoph Hellwig, Miguel Ojeda, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Fri, Mar 08, 2024 at 08:33:18AM -0800, Alexei Starovoitov wrote:
> vmap_range_noflush() is static in mm/vmalloc.c
> There is vmap_pages_range_noflush() that is in mm/internal.h,
> but it needs pages instead of phys_addr_t.
> Newly introduced vm_area_map_pages() needs struct vm_struct *area
> and struct page **pages.
> In this PCI case there is no vm_struct and no pages.
> ioremap_page_range() is the only api that fits. afaict.

Except that we want to enforce a vm_area with the ioremap flag for
ioremap_page_range, and just adding a NULL check defeats that.

The right long term thing would be to actually create a vm_area
for the PCI_IOBASE region, but until then we just need a lower level
API.  That's why I suggest to add a vmap_range() that is basically
the ioremap_page_range before you added the checks, and make
ioremap_page_range a wrapper around that that checks the area.

If/when we get PCI_IOBASE handling converted to the proper
vmalloc/ioremap areas we can remove that again as well as
vunmap_range which is just used for PCI_IOBASE and other equivalent
ISA_IO_BASE in powerpc and somewhat unusual case in arm64 that I
need to look into a bit more.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 17:02         ` Christoph Hellwig
@ 2024-03-08 17:20           ` Alexei Starovoitov
  2024-03-08 17:24             ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-08 17:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

On Fri, Mar 8, 2024 at 9:02 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Mar 08, 2024 at 08:33:18AM -0800, Alexei Starovoitov wrote:
> > vmap_range_noflush() is static in mm/vmalloc.c
> > There is vmap_pages_range_noflush() that is in mm/internal.h,
> > but it needs pages instead of phys_addr_t.
> > Newly introduced vm_area_map_pages() needs struct vm_struct *area
> > and struct page **pages.
> > In this PCI case there is no vm_struct and no pages.
> > ioremap_page_range() is the only api that fits. afaict.
>
> Except that we want to enforce a vm_area with the ioremap flag for
> ioremap_page_range, and just adding a NULL check defeats that.
>
> The right long term thing would be to actually create a vm_area
> for the PCI_IOBASE region, but until then we just need a lower level
> API.  That's why I suggest to add a vmap_range() that is basically
> the ioremap_page_range before you added the checks, and make
> ioremap_page_range a wrapper around that that checks the area.

ok. Like the attached patch?

> If/when we get PCI_IOBASE handling converted to the proper
> vmalloc/ioremap areas we can remove that again as well as
> vunmap_range which is just used for PCI_IOBASE and other equivalent
> ISA_IO_BASE in powerpc and somewhat unusual case in arm64 that I
> need to look into a bit more.

The plan makes sense to me.

[-- Attachment #2: 0001-mm-Introduce-vmap_page_range-to-be-used-by-PCI.patch --]
[-- Type: application/octet-stream, Size: 3479 bytes --]

From a3a26c8619f6ed6d8cca58ac1b1d548948bfee05 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Fri, 8 Mar 2024 09:12:54 -0800
Subject: [PATCH bpf-next] mm: Introduce vmap_page_range() to be used by PCI

ioremap_page_range() should be used for ranges within vmalloc range only.
The vmalloc ranges are allocated by get_vm_area(). PCI has "resource"
allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence
introduce vmap_page_range() to be used exclusively by PCI.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/pci/pci.c  |  4 ++--
 include/linux/io.h |  7 +++++++
 mm/vmalloc.c       | 23 +++++++++++++++--------
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c3585229c12a..ccee56615f78 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4353,8 +4353,8 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
 	if (res->end > IO_SPACE_LIMIT)
 		return -EINVAL;
 
-	return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
-				  pgprot_device(PAGE_KERNEL));
+	return vmap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
+			       pgprot_device(PAGE_KERNEL));
 #else
 	/*
 	 * This architecture does not have memory mapped I/O space,
diff --git a/include/linux/io.h b/include/linux/io.h
index 7304f2a69960..235ba7d80a8f 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -23,12 +23,19 @@ void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
 #ifdef CONFIG_MMU
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		       phys_addr_t phys_addr, pgprot_t prot);
+int vmap_page_range(unsigned long addr, unsigned long end,
+		    phys_addr_t phys_addr, pgprot_t prot);
 #else
 static inline int ioremap_page_range(unsigned long addr, unsigned long end,
 				     phys_addr_t phys_addr, pgprot_t prot)
 {
 	return 0;
 }
+static inline int vmap_page_range(unsigned long addr, unsigned long end,
+				  phys_addr_t phys_addr, pgprot_t prot)
+{
+	return 0;
+}
 #endif
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e5b8c70950bc..1e36322d83d8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -304,11 +304,24 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+int vmap_page_range(unsigned long addr, unsigned long end,
+		    phys_addr_t phys_addr, pgprot_t prot)
+{
+	int err;
+
+	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
+				 ioremap_max_page_shift);
+	flush_cache_vmap(addr, end);
+	if (!err)
+		err = kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+					       ioremap_max_page_shift);
+	return err;
+}
+
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		phys_addr_t phys_addr, pgprot_t prot)
 {
 	struct vm_struct *area;
-	int err;
 
 	area = find_vm_area((void *)addr);
 	if (!area || !(area->flags & VM_IOREMAP)) {
@@ -322,13 +335,7 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
 			  (long)area->addr + get_vm_area_size(area));
 		return -ERANGE;
 	}
-	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
-				 ioremap_max_page_shift);
-	flush_cache_vmap(addr, end);
-	if (!err)
-		err = kmsan_ioremap_page_range(addr, end, phys_addr, prot,
-					       ioremap_max_page_shift);
-	return err;
+	return vmap_page_range(addr, end, phys_addr, prot);
 }
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 17:20           ` Alexei Starovoitov
@ 2024-03-08 17:24             ` Christoph Hellwig
  2024-03-08 17:53               ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-08 17:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christoph Hellwig, Miguel Ojeda, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Fri, Mar 08, 2024 at 09:20:24AM -0800, Alexei Starovoitov wrote:
> ok. Like the attached patch?

Looks sensibe, but I think the powerpc callers of ioremap_page_range
will need the same treatment.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 17:24             ` Christoph Hellwig
@ 2024-03-08 17:53               ` Alexei Starovoitov
  2024-03-08 22:44                 ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-08 17:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

[-- Attachment #1: Type: text/plain, Size: 695 bytes --]

On Fri, Mar 8, 2024 at 9:24 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Mar 08, 2024 at 09:20:24AM -0800, Alexei Starovoitov wrote:
> > ok. Like the attached patch?
>
> Looks sensibe, but I think the powerpc callers of ioremap_page_range
> will need the same treatment.

Good point. Only one of the callers in arch/powerpc needs adjusting.
Found few other similar arch users.
See attached patch.

ioremap_page() in arch/arm/mm/ioremap.c is an interesting case.
It is EXPORT_SYMBOL, but there are no in-tree users.
I think we shouldn't apply checks to it,
since some out-of-tree module may fail.
I have no arm boards to test, I suggest we play safe than sorry.

[-- Attachment #2: 0001-mm-Introduce-vmap_page_range-to-map-pages-in-PCI-add.patch --]
[-- Type: application/octet-stream, Size: 6504 bytes --]

From bf82806a2800b49c5627bbf057406245d0717ccd Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Fri, 8 Mar 2024 09:12:54 -0800
Subject: [PATCH bpf-next] mm: Introduce vmap_page_range() to map pages in PCI
 address space

ioremap_page_range() should be used for ranges within vmalloc range only.
The vmalloc ranges are allocated by get_vm_area(). PCI has "resource"
allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence
introduce vmap_page_range() to be used exclusively to map pages
in PCI address space.

Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/arm/mm/ioremap.c            |  8 ++++----
 arch/loongarch/kernel/setup.c    |  2 +-
 arch/mips/loongson64/init.c      |  2 +-
 arch/powerpc/kernel/isa-bridge.c |  4 ++--
 drivers/pci/pci.c                |  4 ++--
 include/linux/io.h               |  7 +++++++
 mm/vmalloc.c                     | 23 +++++++++++++++--------
 7 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 2129070065c3..794cfea9f9d4 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -110,8 +110,8 @@ void __init add_static_vm_early(struct static_vm *svm)
 int ioremap_page(unsigned long virt, unsigned long phys,
 		 const struct mem_type *mtype)
 {
-	return ioremap_page_range(virt, virt + PAGE_SIZE, phys,
-				  __pgprot(mtype->prot_pte));
+	return vmap_page_range(virt, virt + PAGE_SIZE, phys,
+			       __pgprot(mtype->prot_pte));
 }
 EXPORT_SYMBOL(ioremap_page);
 
@@ -466,8 +466,8 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
 	if (res->end > IO_SPACE_LIMIT)
 		return -EINVAL;
 
-	return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
-				  __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
+	return vmap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
+			       __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
 }
 EXPORT_SYMBOL(pci_remap_iospace);
 
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 634ef17fd38b..fd915ad69c09 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -490,7 +490,7 @@ static int __init add_legacy_isa_io(struct fwnode_handle *fwnode,
 	}
 
 	vaddr = (unsigned long)(PCI_IOBASE + range->io_start);
-	ioremap_page_range(vaddr, vaddr + size, hw_start, pgprot_device(PAGE_KERNEL));
+	vmap_page_range(vaddr, vaddr + size, hw_start, pgprot_device(PAGE_KERNEL));
 
 	return 0;
 }
diff --git a/arch/mips/loongson64/init.c b/arch/mips/loongson64/init.c
index 553142c1f14f..a35dd7311795 100644
--- a/arch/mips/loongson64/init.c
+++ b/arch/mips/loongson64/init.c
@@ -180,7 +180,7 @@ static int __init add_legacy_isa_io(struct fwnode_handle *fwnode, resource_size_
 
 	vaddr = PCI_IOBASE + range->io_start;
 
-	ioremap_page_range(vaddr, vaddr + size, hw_start, pgprot_device(PAGE_KERNEL));
+	vmap_page_range(vaddr, vaddr + size, hw_start, pgprot_device(PAGE_KERNEL));
 
 	return 0;
 }
diff --git a/arch/powerpc/kernel/isa-bridge.c b/arch/powerpc/kernel/isa-bridge.c
index 48e0eaf1ad61..5c064485197a 100644
--- a/arch/powerpc/kernel/isa-bridge.c
+++ b/arch/powerpc/kernel/isa-bridge.c
@@ -46,8 +46,8 @@ static void remap_isa_base(phys_addr_t pa, unsigned long size)
 	WARN_ON_ONCE(size & ~PAGE_MASK);
 
 	if (slab_is_available()) {
-		if (ioremap_page_range(ISA_IO_BASE, ISA_IO_BASE + size, pa,
-				pgprot_noncached(PAGE_KERNEL)))
+		if (vmap_page_range(ISA_IO_BASE, ISA_IO_BASE + size, pa,
+				    pgprot_noncached(PAGE_KERNEL)))
 			vunmap_range(ISA_IO_BASE, ISA_IO_BASE + size);
 	} else {
 		early_ioremap_range(ISA_IO_BASE, pa, size,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c3585229c12a..ccee56615f78 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4353,8 +4353,8 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
 	if (res->end > IO_SPACE_LIMIT)
 		return -EINVAL;
 
-	return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
-				  pgprot_device(PAGE_KERNEL));
+	return vmap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
+			       pgprot_device(PAGE_KERNEL));
 #else
 	/*
 	 * This architecture does not have memory mapped I/O space,
diff --git a/include/linux/io.h b/include/linux/io.h
index 7304f2a69960..235ba7d80a8f 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -23,12 +23,19 @@ void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
 #ifdef CONFIG_MMU
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		       phys_addr_t phys_addr, pgprot_t prot);
+int vmap_page_range(unsigned long addr, unsigned long end,
+		    phys_addr_t phys_addr, pgprot_t prot);
 #else
 static inline int ioremap_page_range(unsigned long addr, unsigned long end,
 				     phys_addr_t phys_addr, pgprot_t prot)
 {
 	return 0;
 }
+static inline int vmap_page_range(unsigned long addr, unsigned long end,
+				  phys_addr_t phys_addr, pgprot_t prot)
+{
+	return 0;
+}
 #endif
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e5b8c70950bc..1e36322d83d8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -304,11 +304,24 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+int vmap_page_range(unsigned long addr, unsigned long end,
+		    phys_addr_t phys_addr, pgprot_t prot)
+{
+	int err;
+
+	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
+				 ioremap_max_page_shift);
+	flush_cache_vmap(addr, end);
+	if (!err)
+		err = kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+					       ioremap_max_page_shift);
+	return err;
+}
+
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		phys_addr_t phys_addr, pgprot_t prot)
 {
 	struct vm_struct *area;
-	int err;
 
 	area = find_vm_area((void *)addr);
 	if (!area || !(area->flags & VM_IOREMAP)) {
@@ -322,13 +335,7 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
 			  (long)area->addr + get_vm_area_size(area));
 		return -ERANGE;
 	}
-	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
-				 ioremap_max_page_shift);
-	flush_cache_vmap(addr, end);
-	if (!err)
-		err = kmsan_ioremap_page_range(addr, end, phys_addr, prot,
-					       ioremap_max_page_shift);
-	return err;
+	return vmap_page_range(addr, end, phys_addr, prot);
 }
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 17:53               ` Alexei Starovoitov
@ 2024-03-08 22:44                 ` Alexei Starovoitov
  2024-03-09  1:37                   ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-08 22:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Fri, Mar 8, 2024 at 9:53 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Mar 8, 2024 at 9:24 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Fri, Mar 08, 2024 at 09:20:24AM -0800, Alexei Starovoitov wrote:
> > > ok. Like the attached patch?
> >
> > Looks sensibe, but I think the powerpc callers of ioremap_page_range
> > will need the same treatment.
>
> Good point. Only one of the callers in arch/powerpc needs adjusting.
> Found few other similar arch users.
> See attached patch.
>
> ioremap_page() in arch/arm/mm/ioremap.c is an interesting case.
> It is EXPORT_SYMBOL, but there are no in-tree users.
> I think we shouldn't apply checks to it,
> since some out-of-tree module may fail.
> I have no arm boards to test, I suggest we play safe than sorry.

I double checked on my newly setup arm64 VM that the fix works.
I believe the regression needs to be fixed today, but
looks like Chritoph is out for today.
So I can either revert the offending commit or
apply the proposed fix to bpf-next.
I'm going to do the latter soon if no one objects.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-08 22:44                 ` Alexei Starovoitov
@ 2024-03-09  1:37                   ` Alexei Starovoitov
  2024-03-09 15:29                     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-09  1:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Fri, Mar 8, 2024 at 2:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Mar 8, 2024 at 9:53 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Mar 8, 2024 at 9:24 AM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Fri, Mar 08, 2024 at 09:20:24AM -0800, Alexei Starovoitov wrote:
> > > > ok. Like the attached patch?
> > >
> > > Looks sensibe, but I think the powerpc callers of ioremap_page_range
> > > will need the same treatment.
> >
> > Good point. Only one of the callers in arch/powerpc needs adjusting.
> > Found few other similar arch users.
> > See attached patch.
> >
> > ioremap_page() in arch/arm/mm/ioremap.c is an interesting case.
> > It is EXPORT_SYMBOL, but there are no in-tree users.
> > I think we shouldn't apply checks to it,
> > since some out-of-tree module may fail.
> > I have no arm boards to test, I suggest we play safe than sorry.
>
> I double checked on my newly setup arm64 VM that the fix works.
> I believe the regression needs to be fixed today, but
> looks like Chritoph is out for today.
> So I can either revert the offending commit or
> apply the proposed fix to bpf-next.
> I'm going to do the latter soon if no one objects.

And now applied the fix to bpf-next
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d0d131e3b655fd267d14bb1bed49e3f990a1465e

I've looked through all remaining users of ioremap_page_range()
in alpha, arm, loongarch, mips, powerpc, sh, x86 and
all of them do the right thing:
*get_vm_area*(.. VM_IOREMAP, ...)
followed by ioremap_page_range().

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-09  1:37                   ` Alexei Starovoitov
@ 2024-03-09 15:29                     ` Christoph Hellwig
  2024-03-09 16:33                       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-09 15:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christoph Hellwig, Miguel Ojeda, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

Hi Alexei,

yes, the patch looks good, and yes, arm32 ioremap_page should just be
removed.  I can send  patch unless you plan to.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-09 15:29                     ` Christoph Hellwig
@ 2024-03-09 16:33                       ` Alexei Starovoitov
  2024-03-09 16:36                         ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-09 16:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Sat, Mar 9, 2024 at 7:29 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> Hi Alexei,
>
> yes, the patch looks good,

Thanks for the review.
Can I add your ack or reviewed-by ?

> and yes, arm32 ioremap_page should just be
> removed.  I can send  patch unless you plan to.

Pls go ahead.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-09 16:33                       ` Alexei Starovoitov
@ 2024-03-09 16:36                         ` Christoph Hellwig
  2024-03-09 16:38                           ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-03-09 16:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christoph Hellwig, Miguel Ojeda, Linus Torvalds, linux-mm,
	Andrew Morton, Catalin Marinas, Will Deacon, Linux ARM,
	Daniel Borkmann, bpf, Bjorn Helgaas, linux-pci

On Sat, Mar 09, 2024 at 08:33:37AM -0800, Alexei Starovoitov wrote:
> On Sat, Mar 9, 2024 at 7:29 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > Hi Alexei,
> >
> > yes, the patch looks good,
> 
> Thanks for the review.
> Can I add your ack or reviewed-by ?

I thought you had already applied it, but in case you can still
ammend it:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-09 16:36                         ` Christoph Hellwig
@ 2024-03-09 16:38                           ` Alexei Starovoitov
  2024-03-11 11:46                             ` Miguel Ojeda
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2024-03-09 16:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miguel Ojeda, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Sat, Mar 9, 2024 at 8:36 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sat, Mar 09, 2024 at 08:33:37AM -0800, Alexei Starovoitov wrote:
> > On Sat, Mar 9, 2024 at 7:29 AM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > Hi Alexei,
> > >
> > > yes, the patch looks good,
> >
> > Thanks for the review.
> > Can I add your ack or reviewed-by ?
>
> I thought you had already applied it, but in case you can still
> ammend it:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks. It's on the top, so it's easy.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP
  2024-03-09 16:38                           ` Alexei Starovoitov
@ 2024-03-11 11:46                             ` Miguel Ojeda
  0 siblings, 0 replies; 20+ messages in thread
From: Miguel Ojeda @ 2024-03-11 11:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christoph Hellwig, Linus Torvalds, linux-mm, Andrew Morton,
	Catalin Marinas, Will Deacon, Linux ARM, Daniel Borkmann, bpf,
	Bjorn Helgaas, linux-pci

On Sat, Mar 9, 2024 at 5:39 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Thanks. It's on the top, so it's easy.

A Link/Closes to the discussion/report(s) (I think Naresh also
reported it) would be nice. In any case:

Tested-by: Miguel Ojeda <ojeda@kernel.org>

Thanks!

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-03-11 11:46 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-07 17:42 vm_area at addr ffffffffc0800000 is not marked as VM_IOREMAP Miguel Ojeda
2024-03-07 17:54 ` Alexei Starovoitov
2024-03-08  3:49   ` Alexei Starovoitov
2024-03-08 10:48     ` Manivannan Sadhasivam
2024-03-08 11:23     ` Miguel Ojeda
2024-03-08 15:04     ` Christoph Hellwig
2024-03-08 16:33       ` Alexei Starovoitov
2024-03-08 17:02         ` Christoph Hellwig
2024-03-08 17:20           ` Alexei Starovoitov
2024-03-08 17:24             ` Christoph Hellwig
2024-03-08 17:53               ` Alexei Starovoitov
2024-03-08 22:44                 ` Alexei Starovoitov
2024-03-09  1:37                   ` Alexei Starovoitov
2024-03-09 15:29                     ` Christoph Hellwig
2024-03-09 16:33                       ` Alexei Starovoitov
2024-03-09 16:36                         ` Christoph Hellwig
2024-03-09 16:38                           ` Alexei Starovoitov
2024-03-11 11:46                             ` Miguel Ojeda
2024-03-08 16:13     ` Bjorn Helgaas
2024-03-08 16:37       ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).