All of lore.kernel.org
 help / color / mirror / Atom feed
* [net RFC v1 0/1] Fix NULL pointer dereference in page_pool
@ 2022-01-22  0:56 Colin Foster
  2022-01-22  0:56 ` [net RFC v1 1/1] page_pool: fix NULL dereference crash Colin Foster
  0 siblings, 1 reply; 10+ messages in thread
From: Colin Foster @ 2022-01-22  0:56 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jakub Kicinski, David S. Miller,
	Ilias Apalodimas, Jesper Dangaard Brouer

I'm not sure if there's something wrong with my config, but as soon as I
run "ip link set eth0 up" I would get a crash that would invoke a
seemingly endless memory dump.

git bisect led me to the page_pool, where there isn't the existence of
page_pool_params inside of the pool. Therefore the check of
if (pool->p.init_callback) would cause a crash.

I have some out-of-tree patches currently, so I'm not sure if my case is
valid. Specifically the MTU of cpsw_new has been updated to 1520 to
account for my setup (beaglebone with eth0 as the CPU port of a DSA).
I'm also not familiar with much of net/core.

If it is valid that page_pool might not have page_pool_params in a DSA
scenario, then hopefully this patch is sufficient. If it isn't valid and
something I'm doing is invoking a memory issue - then I've got my work
cut out for me :-)


Colin Foster (1):
  page_pool: fix NULL dereference crash

 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  0:56 [net RFC v1 0/1] Fix NULL pointer dereference in page_pool Colin Foster
@ 2022-01-22  0:56 ` Colin Foster
  2022-01-22  1:13   ` Alexei Starovoitov
  2022-01-22  3:15   ` kernel test robot
  0 siblings, 2 replies; 10+ messages in thread
From: Colin Foster @ 2022-01-22  0:56 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jakub Kicinski, David S. Miller,
	Ilias Apalodimas, Jesper Dangaard Brouer

Check for the existence of page pool params before dereferencing. This can
cause crashes in certain conditions.

Fixes: 35b2e549894b ("page_pool: Add callback to init pages when they are
allocated")

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
---
 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index bd62c01a2ec3..641f849c95e7 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -213,7 +213,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
-	if (pool->p.init_callback)
+	if (pool->p && pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  0:56 ` [net RFC v1 1/1] page_pool: fix NULL dereference crash Colin Foster
@ 2022-01-22  1:13   ` Alexei Starovoitov
  2022-01-22  2:20     ` Colin Foster
  2022-01-22  2:40     ` Colin Foster
  2022-01-22  3:15   ` kernel test robot
  1 sibling, 2 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2022-01-22  1:13 UTC (permalink / raw)
  To: Colin Foster
  Cc: LKML, Network Development, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Jakub Kicinski,
	David S. Miller, Ilias Apalodimas, Jesper Dangaard Brouer

On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
<colin.foster@in-advantage.com> wrote:
>
> Check for the existence of page pool params before dereferencing. This can
> cause crashes in certain conditions.

In what conditions?
Out of tree driver?

> Fixes: 35b2e549894b ("page_pool: Add callback to init pages when they are
> allocated")
>
> Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
> ---
>  net/core/page_pool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index bd62c01a2ec3..641f849c95e7 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -213,7 +213,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>  {
>         page->pp = pool;
>         page->pp_magic |= PP_SIGNATURE;
> -       if (pool->p.init_callback)
> +       if (pool->p && pool->p.init_callback)
>                 pool->p.init_callback(page, pool->p.init_arg);
>  }
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  1:13   ` Alexei Starovoitov
@ 2022-01-22  2:20     ` Colin Foster
  2022-01-22  2:47       ` Alexei Starovoitov
  2022-01-22  2:40     ` Colin Foster
  1 sibling, 1 reply; 10+ messages in thread
From: Colin Foster @ 2022-01-22  2:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LKML, Network Development, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Jakub Kicinski,
	David S. Miller, Ilias Apalodimas, Jesper Dangaard Brouer

On Fri, Jan 21, 2022 at 05:13:28PM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
> <colin.foster@in-advantage.com> wrote:
> >
> > Check for the existence of page pool params before dereferencing. This can
> > cause crashes in certain conditions.
> 
> In what conditions?
> Out of tree driver?

Hi Alexei,

Thanks for the quick response.

I'm actively working on a DSA driver that is currently out-of-tree, but
trying to get it into mainline. But I'm not convinced that's the
problem...

I'm using a beagelebone black with the cpsw_new driver. There are two
tweaks to that driver: the default vlan port is 10 and 11 so there's no
conflict between cpsw_new and DSA, and the ndev->max_mtu / rx_packet_max
have been increased to 1600 to allow for DSA frames larger than the
standard MTU of 1500.

My focus is on the DSA driver, but the crash happens as soon as I run
"ip link set eth0 up" which is invoking the cpsw_new driver. Therefore I
suspect that the issue is not directly related to the DSA section
(ocelot / felix, much of which uses code that is mainline)

As I suggested, I haven't dug into what is going on around the
page_pool yet. If there is something that is pre-loading memory at 1500
byte intervals and I broke that, that's entirely on me.

[ removes 1600 byte MTU patch and pool patch ]

I can confirm it still crashes when I don't modify the MTU in the
cpsw_new driver... so that doesn't seem to be it. That crash log is
below.


# ip link set eth0 up
[   18.074704] cpsw-switch 4a100000.switch: starting ndev. mode: dual_mac
[   18.174286] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver (mii_bus:phy_addr=4a101000.mdio:00, irq=POLL)
[   18.185458] 8<--- cut here ---
[   18.188554] Unable to handle kernel paging request at virtual address c3104440
[   18.195819] [c3104440] *pgd=8300041e(bad)
[   18.199885] Internal error: Oops: 8000000d [#1] SMP ARM
[   18.205148] Modules linked in:
[   18.208233] CPU: 0 PID: 168 Comm: ip Not tainted 5.16.0-05302-g8bd405e6e8a0-dirty #265
[   18.216201] Hardware name: Generic AM33XX (Flattened Device Tree)
[   18.222328] PC is at 0xc3104440
[   18.225500] LR is at __page_pool_alloc_pages_slow+0xbc/0x2e0
[   18.231222] pc : [<c3104440>]    lr : [<c0ee06c8>]    psr: a00b0013
[   18.237523] sp : c3104440  ip : 00000020  fp : c219e580
[   18.242778] r10: c1a04d54  r9 : 00000000  r8 : 00000000
[   18.248032] r7 : c36b9000  r6 : 00000000  r5 : c36b9084  r4 : 00000000
[   18.254595] r3 : c07a399c  r2 : 00000000  r1 : c325784c  r0 : dfa48bc0
[   18.261162] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   18.268343] Control: 10c5387d  Table: 836f0019  DAC: 00000051
[   18.274119] Register r0 information: non-slab/vmalloc memory
[   18.279825] Register r1 information: non-slab/vmalloc memory
[   18.285523] Register r2 information: NULL pointer
[   18.290260] Register r3 information: non-slab/vmalloc memory
[   18.295957] Register r4 information: NULL pointer
[   18.300693] Register r5 information: slab kmalloc-1k start c36b9000 pointer offset 132 size 1024
[   18.309569] Register r6 information: NULL pointer
[   18.314306] Register r7 information: slab kmalloc-1k start c36b9000 pointer offset 0 size 1024
[   18.322999] Register r8 information: NULL pointer
[   18.327736] Register r9 information: NULL pointer
[   18.332473] Register r10 information: non-slab/vmalloc memory
[   18.338257] Register r11 information: slab kmalloc-4k start c219e000 pointer offset 1408 size 4096
[   18.347301] Register r12 information: non-paged memory
[   18.352475] Process ip (pid: 168, stack limit = 0x7eb0d4ab)
[   18.358089] Stack: (0xc3104440 to 0xc3258000)
(too big a stack to show)


I can confirm that it crashes on net-next/master as well:
commit fe8152b38d3a, using the same DTB that defines the cpsw_new port
as the DSA master. Relevant DTS snippet from my in-development driver:

+&spi0 {
+       #address-cells = <1>;
+       #size-cells = <0>;
+       status = "okay";
+
+       ocelot-chip@0 {
+               compatible = "mscc,vsc7512_mfd_spi";
+               spi-max-frequency = <2500000>;
+               reg = <0>;
+
+               ethernet-switch@0 {
+                       compatible = "mscc,vsc7512-ext-switch";
+                       ports {
+                               #address-cells = <1>;
+                               #size-cells = <0>;
+
+                               port@0 {
+                                       reg = <0>;
+                                       label = "cpu";
+                                       status = "okay";
+                                       ethernet = <&mac_sw>;
+                                       phy-handle = <&sw_phy0>;
+                                       phy-mode = "internal";
+                               };


I was hoping for an "oh, if a switch is set up in DSA the page_pool gets
set up this way" type scenario. I fully understand that might not be the
case, and the issue could be in something I'm doing incorrectly - it
certainly wouldn't be the first time.

If this patch doesn't make sense I can look deeper. As mentioned, I'm
working to get this accepted upstream, so I'll have to figure it out one
way or another.

> 
> > Fixes: 35b2e549894b ("page_pool: Add callback to init pages when they are
> > allocated")
> >
> > Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
> > ---
> >  net/core/page_pool.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index bd62c01a2ec3..641f849c95e7 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -213,7 +213,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> >  {
> >         page->pp = pool;
> >         page->pp_magic |= PP_SIGNATURE;
> > -       if (pool->p.init_callback)
> > +       if (pool->p && pool->p.init_callback)
> >                 pool->p.init_callback(page, pool->p.init_arg);
> >  }
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  1:13   ` Alexei Starovoitov
  2022-01-22  2:20     ` Colin Foster
@ 2022-01-22  2:40     ` Colin Foster
  2022-01-22  8:31       ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 10+ messages in thread
From: Colin Foster @ 2022-01-22  2:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LKML, Network Development, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Jakub Kicinski,
	David S. Miller, Ilias Apalodimas, Jesper Dangaard Brouer

On Fri, Jan 21, 2022 at 05:13:28PM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
> <colin.foster@in-advantage.com> wrote:
> >
> > Check for the existence of page pool params before dereferencing. This can
> > cause crashes in certain conditions.
> 
> In what conditions?
> Out of tree driver?
> 
> > Fixes: 35b2e549894b ("page_pool: Add callback to init pages when they are
> > allocated")
> >
> > Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
> > ---
> >  net/core/page_pool.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index bd62c01a2ec3..641f849c95e7 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -213,7 +213,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> >  {
> >         page->pp = pool;
> >         page->pp_magic |= PP_SIGNATURE;
> > -       if (pool->p.init_callback)
> > +       if (pool->p && pool->p.init_callback)

And my apologies - this should be if (pool... not if (pool->p. kernelbot
will be sure to tell me of this blunder soon

> >                 pool->p.init_callback(page, pool->p.init_arg);
> >  }
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  2:20     ` Colin Foster
@ 2022-01-22  2:47       ` Alexei Starovoitov
  2022-01-24 12:12         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-01-22  2:47 UTC (permalink / raw)
  To: Colin Foster
  Cc: LKML, Network Development, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Jakub Kicinski,
	David S. Miller, Ilias Apalodimas, Jesper Dangaard Brouer

On Fri, Jan 21, 2022 at 6:20 PM Colin Foster
<colin.foster@in-advantage.com> wrote:
>
> On Fri, Jan 21, 2022 at 05:13:28PM -0800, Alexei Starovoitov wrote:
> > On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
> > <colin.foster@in-advantage.com> wrote:
> > >
> > > Check for the existence of page pool params before dereferencing. This can
> > > cause crashes in certain conditions.
> >
> > In what conditions?
> > Out of tree driver?
>
> Hi Alexei,
>
> Thanks for the quick response.
>
> I'm actively working on a DSA driver that is currently out-of-tree, but
> trying to get it into mainline. But I'm not convinced that's the
> problem...
>
> I'm using a beagelebone black with the cpsw_new driver. There are two
> tweaks to that driver: the default vlan port is 10 and 11 so there's no
> conflict between cpsw_new and DSA, and the ndev->max_mtu / rx_packet_max
> have been increased to 1600 to allow for DSA frames larger than the
> standard MTU of 1500.
>
> My focus is on the DSA driver, but the crash happens as soon as I run
> "ip link set eth0 up" which is invoking the cpsw_new driver. Therefore I
> suspect that the issue is not directly related to the DSA section
> (ocelot / felix, much of which uses code that is mainline)
>
> As I suggested, I haven't dug into what is going on around the
> page_pool yet. If there is something that is pre-loading memory at 1500
> byte intervals and I broke that, that's entirely on me.
>
> [ removes 1600 byte MTU patch and pool patch ]
>
> I can confirm it still crashes when I don't modify the MTU in the
> cpsw_new driver... so that doesn't seem to be it. That crash log is
> below.
>
>
> # ip link set eth0 up
> [   18.074704] cpsw-switch 4a100000.switch: starting ndev. mode: dual_mac
> [   18.174286] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver (mii_bus:phy_addr=4a101000.mdio:00, irq=POLL)
> [   18.185458] 8<--- cut here ---
> [   18.188554] Unable to handle kernel paging request at virtual address c3104440
> [   18.195819] [c3104440] *pgd=8300041e(bad)
> [   18.199885] Internal error: Oops: 8000000d [#1] SMP ARM
> [   18.205148] Modules linked in:
> [   18.208233] CPU: 0 PID: 168 Comm: ip Not tainted 5.16.0-05302-g8bd405e6e8a0-dirty #265
> [   18.216201] Hardware name: Generic AM33XX (Flattened Device Tree)
> [   18.222328] PC is at 0xc3104440
> [   18.225500] LR is at __page_pool_alloc_pages_slow+0xbc/0x2e0
> [   18.231222] pc : [<c3104440>]    lr : [<c0ee06c8>]    psr: a00b0013
> [   18.237523] sp : c3104440  ip : 00000020  fp : c219e580
> [   18.242778] r10: c1a04d54  r9 : 00000000  r8 : 00000000
> [   18.248032] r7 : c36b9000  r6 : 00000000  r5 : c36b9084  r4 : 00000000
> [   18.254595] r3 : c07a399c  r2 : 00000000  r1 : c325784c  r0 : dfa48bc0
> [   18.261162] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [   18.268343] Control: 10c5387d  Table: 836f0019  DAC: 00000051
> [   18.274119] Register r0 information: non-slab/vmalloc memory
> [   18.279825] Register r1 information: non-slab/vmalloc memory
> [   18.285523] Register r2 information: NULL pointer
> [   18.290260] Register r3 information: non-slab/vmalloc memory
> [   18.295957] Register r4 information: NULL pointer
> [   18.300693] Register r5 information: slab kmalloc-1k start c36b9000 pointer offset 132 size 1024
> [   18.309569] Register r6 information: NULL pointer
> [   18.314306] Register r7 information: slab kmalloc-1k start c36b9000 pointer offset 0 size 1024
> [   18.322999] Register r8 information: NULL pointer
> [   18.327736] Register r9 information: NULL pointer
> [   18.332473] Register r10 information: non-slab/vmalloc memory
> [   18.338257] Register r11 information: slab kmalloc-4k start c219e000 pointer offset 1408 size 4096
> [   18.347301] Register r12 information: non-paged memory
> [   18.352475] Process ip (pid: 168, stack limit = 0x7eb0d4ab)
> [   18.358089] Stack: (0xc3104440 to 0xc3258000)
> (too big a stack to show)
>
>
> I can confirm that it crashes on net-next/master as well:
> commit fe8152b38d3a, using the same DTB that defines the cpsw_new port
> as the DSA master. Relevant DTS snippet from my in-development driver:
>
> +&spi0 {
> +       #address-cells = <1>;
> +       #size-cells = <0>;
> +       status = "okay";
> +
> +       ocelot-chip@0 {
> +               compatible = "mscc,vsc7512_mfd_spi";
> +               spi-max-frequency = <2500000>;
> +               reg = <0>;
> +
> +               ethernet-switch@0 {
> +                       compatible = "mscc,vsc7512-ext-switch";
> +                       ports {
> +                               #address-cells = <1>;
> +                               #size-cells = <0>;
> +
> +                               port@0 {
> +                                       reg = <0>;
> +                                       label = "cpu";
> +                                       status = "okay";
> +                                       ethernet = <&mac_sw>;
> +                                       phy-handle = <&sw_phy0>;
> +                                       phy-mode = "internal";
> +                               };
>
>
> I was hoping for an "oh, if a switch is set up in DSA the page_pool gets
> set up this way" type scenario. I fully understand that might not be the
> case, and the issue could be in something I'm doing incorrectly - it
> certainly wouldn't be the first time.
>
> If this patch doesn't make sense I can look deeper. As mentioned, I'm
> working to get this accepted upstream, so I'll have to figure it out one
> way or another.

With !pool tweak the patch makes sense.

Toke, wdyt?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  0:56 ` [net RFC v1 1/1] page_pool: fix NULL dereference crash Colin Foster
  2022-01-22  1:13   ` Alexei Starovoitov
@ 2022-01-22  3:15   ` kernel test robot
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2022-01-22  3:15 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2082 bytes --]

Hi Colin,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Colin-Foster/Fix-NULL-pointer-dereference-in-page_pool/20220122-085826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 03c82e80ec283b115c56026ecdb95c901a57c51e
config: arc-randconfig-r043-20220122 (https://download.01.org/0day-ci/archive/20220122/202201221122.LOf7DZi6-lkp(a)intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6b2d9e5c2901873c8bba988d4db6cd5bfbaaeec4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Colin-Foster/Fix-NULL-pointer-dereference-in-page_pool/20220122-085826
        git checkout 6b2d9e5c2901873c8bba988d4db6cd5bfbaaeec4
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash net/core/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/core/page_pool.c: In function 'page_pool_set_pp_info':
>> net/core/page_pool.c:216:13: error: used struct type value where scalar is required
     216 |         if (pool->p && pool->p.init_callback)
         |             ^~~~


vim +216 net/core/page_pool.c

   210	
   211	static void page_pool_set_pp_info(struct page_pool *pool,
   212					  struct page *page)
   213	{
   214		page->pp = pool;
   215		page->pp_magic |= PP_SIGNATURE;
 > 216		if (pool->p && pool->p.init_callback)
   217			pool->p.init_callback(page, pool->p.init_arg);
   218	}
   219	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  2:40     ` Colin Foster
@ 2022-01-22  8:31       ` Jesper Dangaard Brouer
  2022-01-22 19:46         ` Colin Foster
  0 siblings, 1 reply; 10+ messages in thread
From: Jesper Dangaard Brouer @ 2022-01-22  8:31 UTC (permalink / raw)
  To: Colin Foster, Alexei Starovoitov
  Cc: brouer, LKML, Network Development,
	Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jakub Kicinski, David S. Miller,
	Ilias Apalodimas, Jesper Dangaard Brouer



On 22/01/2022 03.40, Colin Foster wrote:
> On Fri, Jan 21, 2022 at 05:13:28PM -0800, Alexei Starovoitov wrote:
>> On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
>> <colin.foster@in-advantage.com> wrote:
>>>
>>> Check for the existence of page pool params before dereferencing. This can
>>> cause crashes in certain conditions.
>>
>> In what conditions?
>> Out of tree driver?
>>
>>> Fixes: 35b2e549894b ("page_pool: Add callback to init pages when they are
>>> allocated")
>>>
>>> Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
>>> ---
>>>   net/core/page_pool.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>> index bd62c01a2ec3..641f849c95e7 100644
>>> --- a/net/core/page_pool.c
>>> +++ b/net/core/page_pool.c
>>> @@ -213,7 +213,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>>>   {
>>>          page->pp = pool;
>>>          page->pp_magic |= PP_SIGNATURE;
>>> -       if (pool->p.init_callback)
>>> +       if (pool->p && pool->p.init_callback)
> 
> And my apologies - this should be if (pool... not if (pool->p. kernelbot
> will be sure to tell me of this blunder soon

Can you confirm if your crash is fixed by this change?


>>>                  pool->p.init_callback(page, pool->p.init_arg);
>>>   }


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  8:31       ` Jesper Dangaard Brouer
@ 2022-01-22 19:46         ` Colin Foster
  0 siblings, 0 replies; 10+ messages in thread
From: Colin Foster @ 2022-01-22 19:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, brouer, LKML, Network Development,
	Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jakub Kicinski, David S. Miller,
	Ilias Apalodimas, Jesper Dangaard Brouer

On Sat, Jan 22, 2022 at 09:31:17AM +0100, Jesper Dangaard Brouer wrote:
> 
> 
> On 22/01/2022 03.40, Colin Foster wrote:
> > On Fri, Jan 21, 2022 at 05:13:28PM -0800, Alexei Starovoitov wrote:
> > > On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
> > > <colin.foster@in-advantage.com> wrote:
> > > > 
> > > > Check for the existence of page pool params before dereferencing. This can
> > > > cause crashes in certain conditions.
> > > 
> > > In what conditions?
> > > Out of tree driver?
> > > 
> > > > Fixes: 35b2e549894b ("page_pool: Add callback to init pages when they are
> > > > allocated")
> > > > 
> > > > Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
> > > > ---
> > > >   net/core/page_pool.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > > index bd62c01a2ec3..641f849c95e7 100644
> > > > --- a/net/core/page_pool.c
> > > > +++ b/net/core/page_pool.c
> > > > @@ -213,7 +213,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> > > >   {
> > > >          page->pp = pool;
> > > >          page->pp_magic |= PP_SIGNATURE;
> > > > -       if (pool->p.init_callback)
> > > > +       if (pool->p && pool->p.init_callback)
> > 
> > And my apologies - this should be if (pool... not if (pool->p. kernelbot
> > will be sure to tell me of this blunder soon
> 
> Can you confirm if your crash is fixed by this change?

Yes, this is confirmed. I'd obviously like to make a more comprehensive
commit message - my main question is "is this an issue for all DSA
configurations?" Seemingly that is the case, but like I said, I'm
unfamiliar with this code. I'll see if I can get a better understanding
before sending the real patch early next week.

> 
> 
> > > >                  pool->p.init_callback(page, pool->p.init_arg);
> > > >   }
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [net RFC v1 1/1] page_pool: fix NULL dereference crash
  2022-01-22  2:47       ` Alexei Starovoitov
@ 2022-01-24 12:12         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-01-24 12:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Colin Foster
  Cc: LKML, Network Development, John Fastabend, Alexei Starovoitov,
	Jakub Kicinski, David S. Miller, Ilias Apalodimas,
	Jesper Dangaard Brouer

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Jan 21, 2022 at 6:20 PM Colin Foster
> <colin.foster@in-advantage.com> wrote:
>>
>> On Fri, Jan 21, 2022 at 05:13:28PM -0800, Alexei Starovoitov wrote:
>> > On Fri, Jan 21, 2022 at 4:57 PM Colin Foster
>> > <colin.foster@in-advantage.com> wrote:
>> > >
>> > > Check for the existence of page pool params before dereferencing. This can
>> > > cause crashes in certain conditions.
>> >
>> > In what conditions?
>> > Out of tree driver?
>>
>> Hi Alexei,
>>
>> Thanks for the quick response.
>>
>> I'm actively working on a DSA driver that is currently out-of-tree, but
>> trying to get it into mainline. But I'm not convinced that's the
>> problem...
>>
>> I'm using a beagelebone black with the cpsw_new driver. There are two
>> tweaks to that driver: the default vlan port is 10 and 11 so there's no
>> conflict between cpsw_new and DSA, and the ndev->max_mtu / rx_packet_max
>> have been increased to 1600 to allow for DSA frames larger than the
>> standard MTU of 1500.
>>
>> My focus is on the DSA driver, but the crash happens as soon as I run
>> "ip link set eth0 up" which is invoking the cpsw_new driver. Therefore I
>> suspect that the issue is not directly related to the DSA section
>> (ocelot / felix, much of which uses code that is mainline)
>>
>> As I suggested, I haven't dug into what is going on around the
>> page_pool yet. If there is something that is pre-loading memory at 1500
>> byte intervals and I broke that, that's entirely on me.
>>
>> [ removes 1600 byte MTU patch and pool patch ]
>>
>> I can confirm it still crashes when I don't modify the MTU in the
>> cpsw_new driver... so that doesn't seem to be it. That crash log is
>> below.
>>
>>
>> # ip link set eth0 up
>> [   18.074704] cpsw-switch 4a100000.switch: starting ndev. mode: dual_mac
>> [   18.174286] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver (mii_bus:phy_addr=4a101000.mdio:00, irq=POLL)
>> [   18.185458] 8<--- cut here ---
>> [   18.188554] Unable to handle kernel paging request at virtual address c3104440
>> [   18.195819] [c3104440] *pgd=8300041e(bad)
>> [   18.199885] Internal error: Oops: 8000000d [#1] SMP ARM
>> [   18.205148] Modules linked in:
>> [   18.208233] CPU: 0 PID: 168 Comm: ip Not tainted 5.16.0-05302-g8bd405e6e8a0-dirty #265
>> [   18.216201] Hardware name: Generic AM33XX (Flattened Device Tree)
>> [   18.222328] PC is at 0xc3104440
>> [   18.225500] LR is at __page_pool_alloc_pages_slow+0xbc/0x2e0
>> [   18.231222] pc : [<c3104440>]    lr : [<c0ee06c8>]    psr: a00b0013
>> [   18.237523] sp : c3104440  ip : 00000020  fp : c219e580
>> [   18.242778] r10: c1a04d54  r9 : 00000000  r8 : 00000000
>> [   18.248032] r7 : c36b9000  r6 : 00000000  r5 : c36b9084  r4 : 00000000
>> [   18.254595] r3 : c07a399c  r2 : 00000000  r1 : c325784c  r0 : dfa48bc0
>> [   18.261162] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>> [   18.268343] Control: 10c5387d  Table: 836f0019  DAC: 00000051
>> [   18.274119] Register r0 information: non-slab/vmalloc memory
>> [   18.279825] Register r1 information: non-slab/vmalloc memory
>> [   18.285523] Register r2 information: NULL pointer
>> [   18.290260] Register r3 information: non-slab/vmalloc memory
>> [   18.295957] Register r4 information: NULL pointer
>> [   18.300693] Register r5 information: slab kmalloc-1k start c36b9000 pointer offset 132 size 1024
>> [   18.309569] Register r6 information: NULL pointer
>> [   18.314306] Register r7 information: slab kmalloc-1k start c36b9000 pointer offset 0 size 1024
>> [   18.322999] Register r8 information: NULL pointer
>> [   18.327736] Register r9 information: NULL pointer
>> [   18.332473] Register r10 information: non-slab/vmalloc memory
>> [   18.338257] Register r11 information: slab kmalloc-4k start c219e000 pointer offset 1408 size 4096
>> [   18.347301] Register r12 information: non-paged memory
>> [   18.352475] Process ip (pid: 168, stack limit = 0x7eb0d4ab)
>> [   18.358089] Stack: (0xc3104440 to 0xc3258000)
>> (too big a stack to show)
>>
>>
>> I can confirm that it crashes on net-next/master as well:
>> commit fe8152b38d3a, using the same DTB that defines the cpsw_new port
>> as the DSA master. Relevant DTS snippet from my in-development driver:
>>
>> +&spi0 {
>> +       #address-cells = <1>;
>> +       #size-cells = <0>;
>> +       status = "okay";
>> +
>> +       ocelot-chip@0 {
>> +               compatible = "mscc,vsc7512_mfd_spi";
>> +               spi-max-frequency = <2500000>;
>> +               reg = <0>;
>> +
>> +               ethernet-switch@0 {
>> +                       compatible = "mscc,vsc7512-ext-switch";
>> +                       ports {
>> +                               #address-cells = <1>;
>> +                               #size-cells = <0>;
>> +
>> +                               port@0 {
>> +                                       reg = <0>;
>> +                                       label = "cpu";
>> +                                       status = "okay";
>> +                                       ethernet = <&mac_sw>;
>> +                                       phy-handle = <&sw_phy0>;
>> +                                       phy-mode = "internal";
>> +                               };
>>
>>
>> I was hoping for an "oh, if a switch is set up in DSA the page_pool gets
>> set up this way" type scenario. I fully understand that might not be the
>> case, and the issue could be in something I'm doing incorrectly - it
>> certainly wouldn't be the first time.
>>
>> If this patch doesn't make sense I can look deeper. As mentioned, I'm
>> working to get this accepted upstream, so I'll have to figure it out one
>> way or another.
>
> With !pool tweak the patch makes sense.
>
> Toke, wdyt?

I don't really see how page_pool_set_pp_info() can be called with a NULL
'pool' pointer. There are only two callers:
__page_pool_alloc_page_order() and __page_pool_alloc_pages_slow(). And
both functions deref the pool pointer right before calling
page_pool_set_pp_info(), so I don't really see how this patch would
actually be a proper fix?

Looking a bit closer, actually the fault seems like it lies with the
driver; Colin, could you please check if the patch below fixes your
issue?

I'll go check all the other drivers as well and send a fix for all that
are affected...

-Toke


diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
index 3537502e5e8b..648c78ecf2a5 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.c
+++ b/drivers/net/ethernet/ti/cpsw_priv.c
@@ -1146,7 +1146,7 @@ int cpsw_fill_rx_channels(struct cpsw_priv *priv)
 static struct page_pool *cpsw_create_page_pool(struct cpsw_common *cpsw,
                                               int size)
 {
-       struct page_pool_params pp_params;
+       struct page_pool_params pp_params = {};
        struct page_pool *pool;
 
        pp_params.order = 0;


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-01-24 12:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-22  0:56 [net RFC v1 0/1] Fix NULL pointer dereference in page_pool Colin Foster
2022-01-22  0:56 ` [net RFC v1 1/1] page_pool: fix NULL dereference crash Colin Foster
2022-01-22  1:13   ` Alexei Starovoitov
2022-01-22  2:20     ` Colin Foster
2022-01-22  2:47       ` Alexei Starovoitov
2022-01-24 12:12         ` Toke Høiland-Jørgensen
2022-01-22  2:40     ` Colin Foster
2022-01-22  8:31       ` Jesper Dangaard Brouer
2022-01-22 19:46         ` Colin Foster
2022-01-22  3:15   ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.