All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Sukholitko <boris.sukholitko@broadcom.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, keescook@chromium.org,
	yzaikin@google.com
Subject: Re: [PATCH] __register_sysctl_table: do not drop subdir
Date: Thu, 28 May 2020 11:08:12 +0300	[thread overview]
Message-ID: <20200528080812.GA21974@noodle> (raw)
In-Reply-To: <20200527125805.GI11244@42.do-not-panic.com>

On Wed, May 27, 2020 at 12:58:05PM +0000, Luis Chamberlain wrote:
> Eric since you authored the code which this code claism to fix, your
> review would be appreciated.
> 
> On Wed, May 27, 2020 at 01:48:48PM +0300, Boris Sukholitko wrote:
> > Successful get_subdir returns dir with its header.nreg properly
> > adjusted. No need to drop the dir in that case.
> 
> This commit log is not that clear to me
> can you explain what happens
> without this patch, and how critical it is to fix it. How did you
> notice this issue?

Apologies for being too terse with my explanation. I'll try to expand
below.

In testing of our kernel (based on 4.19, tainted, sorry!) on our aarch64 based hardware
we've come upon the following oops (lightly edited to omit irrelevant details):

000:50:01.133 Unable to handle kernel paging request at virtual address 0000000000007a12
000:50:02.209 Process brctl (pid: 14467, stack limit = 0x00000000bcf7a578)
000:50:02.209 CPU: 1 PID: 14467 Comm: brctl Tainted: P                  4.19.122 #1
000:50:02.209 Hardware name: Broadcom-v8A (DT)
000:50:02.209 pstate: 60000005 (nZCv daif -PAN -UAO)
000:50:02.209 pc : unregister_sysctl_table+0x1c/0xa0
000:50:02.209 lr : unregister_net_sysctl_table+0xc/0x20
000:50:02.209 sp : ffffff800e5ab9e0
000:50:02.209 x29: ffffff800e5ab9e0 x28: ffffffc016439ec0 
000:50:02.209 x27: 0000000000000000 x26: ffffff8008804078 
000:50:02.209 x25: ffffff80087b4dd8 x24: ffffffc015d65000 
000:50:02.209 x23: ffffffc01f0d6010 x22: ffffffc01f0d6000 
000:50:02.209 x21: ffffffc0166c4eb0 x20: 00000000000000bd 
000:50:02.209 x19: ffffffc01f0d6030 x18: 0000000000000400 
000:50:02.256 x17: 0000000000000000 x16: 0000000000000000 
000:50:02.256 x15: 0000000000000400 x14: 0000000000000129 
000:50:02.256 x13: 0000000000000001 x12: 0000000000000030 
000:50:02.256 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f 
000:50:02.256 x9 : feff646663687161 x8 : ffffffffffffffff 
000:50:02.256 x7 : fefefefefefefefe x6 : 0000000000008080 
000:50:02.256 x5 : 00000000ffffffff x4 : ffffff8008905c38 
000:50:02.256 x3 : ffffffc01f0d602c x2 : 00000000000000bd 
000:50:02.256 x1 : ffffffc01f0d60c0 x0 : 0000000000007a12 
000:50:02.256 Call trace:
000:50:02.256  unregister_sysctl_table+0x1c/0xa0
000:50:02.256  unregister_net_sysctl_table+0xc/0x20
000:50:02.256  __devinet_sysctl_unregister.isra.0+0x2c/0x60
000:50:02.256  inetdev_event+0x198/0x510
000:50:02.256  notifier_call_chain+0x58/0xa0
000:50:02.303  raw_notifier_call_chain+0x14/0x20
000:50:02.303  call_netdevice_notifiers_info+0x34/0x80
000:50:02.303  rollback_registered_many+0x384/0x600
000:50:02.303  unregister_netdevice_queue+0x8c/0x110
000:50:02.303  br_dev_delete+0x8c/0xa0
000:50:02.303  br_del_bridge+0x44/0x70
000:50:02.303  br_ioctl_deviceless_stub+0xcc/0x310
000:50:02.303  sock_ioctl+0x194/0x3f0
000:50:02.303  compat_sock_ioctl+0x678/0xc00
000:50:02.303  __arm64_compat_sys_ioctl+0xf0/0xcb0
000:50:02.303  el0_svc_common+0x70/0x170
000:50:02.303  el0_svc_compat_handler+0x1c/0x30
000:50:02.303  el0_svc_compat+0x8/0x18
000:50:02.303 Code: a90153f3 aa0003f3 f9401000 b40000c0 (f9400001) 

The crash is in the call to count_subheaders(header->ctl_table_arg).

Although the header (being in x19 == 0xffffffc01f0d6030) looks like a
normal kernel pointer, ctl_table_arg (x0 == 0x0000000000007a12) looks
invalid.

Trying to find the issue, we've started tracing header allocation being
done by kzalloc in __register_sysctl_table and header freeing being done
in drop_sysctl_table.

Then we've noticed headers being freed which where not allocated before.
The faulty freeing was done on parent->header at the end of
drop_sysctl_table.

From this we've started to suspect some infelicity in header.nreg
refcounting, thus leading us the __register_sysctl_table fix in the
patch.

Here is more detailed explanation of the fix.

The current __register_sysctl_table logic looks like:

1. We start with some root dir, incrementing its header.nreg.

2. Then we find suitable dir using get_subdir function.

3. get_subdir decrements nreg on the parent dir and increments it on the
   dir being returned. See found label there.

4. We decrement dir's header.nreg for the symmetry with step 1.

IMHO, the bug is on step 4. If another dir is being returned by
get_subdir we decrement its nreg. I.e. the returned dir nreg stays 1
despite having children added to it.

This leads eventually to the innocent parent header being freed.

> If you don't apply this patch what issue do you see?

For some unexplained reason, the crashes are very rare and require
stressing the system while creating and destroing network interfaces.

> 
> Do we test for it? Can we?
> 

With some printk tracing the issue is easy to see while doing simple
brctl addbr / delbr to create and destroy bridge interface.

Probably there is some SLUB debug option which may allow to catch the
faulty free.

Thanks,
Boris.

>   Luis
> 
> > 
> > Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
> > Fixes: 7ec66d06362d (sysctl: Stop requiring explicit management of sysctl directories)
> > ---
> >  fs/proc/proc_sysctl.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> > index b6f5d459b087..6f237f0eb531 100644
> > --- a/fs/proc/proc_sysctl.c
> > +++ b/fs/proc/proc_sysctl.c
> > @@ -1286,8 +1286,8 @@ struct ctl_table_header *__register_sysctl_table(
> >  {
> >  	struct ctl_table_root *root = set->dir.header.root;
> >  	struct ctl_table_header *header;
> > +	struct ctl_dir *dir, *start_dir;
> >  	const char *name, *nextname;
> > -	struct ctl_dir *dir;
> >  	struct ctl_table *entry;
> >  	struct ctl_node *node;
> >  	int nr_entries = 0;
> > @@ -1307,6 +1307,7 @@ struct ctl_table_header *__register_sysctl_table(
> >  
> >  	spin_lock(&sysctl_lock);
> >  	dir = &set->dir;
> > +	start_dir = dir;
> >  	/* Reference moved down the diretory tree get_subdir */
> >  	dir->header.nreg++;
> >  	spin_unlock(&sysctl_lock);
> > @@ -1333,7 +1334,8 @@ struct ctl_table_header *__register_sysctl_table(
> >  	if (insert_header(dir, header))
> >  		goto fail_put_dir_locked;
> >  
> > -	drop_sysctl_table(&dir->header);
> > +	if (start_dir == dir)
> > +		drop_sysctl_table(&dir->header);
> >  	spin_unlock(&sysctl_lock);
> >  
> >  	return header;
> > -- 
> > 2.23.1
> > 

  reply	other threads:[~2020-05-28  8:08 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-27 10:48 [PATCH] __register_sysctl_table: do not drop subdir Boris Sukholitko
2020-05-27 12:58 ` Luis Chamberlain
2020-05-28  8:08   ` Boris Sukholitko [this message]
2020-05-28 14:04     ` Eric W. Biederman
2020-05-28 14:20       ` Luis Chamberlain
2020-05-31 11:44         ` Boris Sukholitko
2020-06-01 13:17           ` Luis Chamberlain
2020-05-31 11:39       ` Boris Sukholitko
2020-06-03  1:07 ` [__register_sysctl_table] 4092a9304d: WARNING:at_fs/proc/proc_sysctl.c:#retire_sysctl_set kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200528080812.GA21974@noodle \
    --to=boris.sukholitko@broadcom.com \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.