Avoiding taking a lock in an IRQ context is not enough to prevent deadlocks, as discovered by syzbot: === WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 5.5.0-syzkaller #0 Not tainted ----------------------------------------------------- syz-executor.0/8927 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline] ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: sidtab_sid2str_put.part.0+0x36/0x880 security/selinux/ss/sidtab.c:533 and this task is already holding: ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline] ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: nf_conntrack_lock+0x17/0x70 net/netfilter/nf_conntrack_core.c:91 which would create a new lock dependency: (&(&nf_conntrack_locks[i])->rlock){+.-.} -> (&(&s->cache_lock)->rlock){+.+.} but this new dependency connects a SOFTIRQ-irq-safe lock: (&(&nf_conntrack_locks[i])->rlock){+.-.} [...] other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&s->cache_lock)->rlock); local_irq_disable(); lock(&(&nf_conntrack_locks[i])->rlock); lock(&(&s->cache_lock)->rlock); <Interrupt> lock(&(&nf_conntrack_locks[i])->rlock); *** DEADLOCK *** [...] === Fix this by simply locking with irqsave/irqrestore and stop giving up on !in_task(). It makes the locking a bit slower, but it shouldn't make a big difference in real workloads. Under the scenario from [1] (only cache hits) it only increased the runtime overhead from the security_secid_to_secctx() function from ~2% to ~3% (it was ~5-65% before introducing the cache). [1] https://bugzilla.redhat.com/show_bug.cgi?id=1733259 Fixes: d97bd23c2d7d ("selinux: cache the SID -> context string translation") Reported-by: syzbot+61cba5033e2072d61806@syzkaller.appspotmail.com Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> --- security/selinux/ss/sidtab.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c index a308ce1e6a13..f511ffccb131 100644 --- a/security/selinux/ss/sidtab.c +++ b/security/selinux/ss/sidtab.c @@ -518,19 +518,13 @@ void sidtab_sid2str_put(struct sidtab *s, struct sidtab_entry *entry, const char *str, u32 str_len) { struct sidtab_str_cache *cache, *victim = NULL; + unsigned long flags; /* do not cache invalid contexts */ if (entry->context.len) return; - /* - * Skip the put operation when in non-task context to avoid the need - * to disable interrupts while holding s->cache_lock. - */ - if (!in_task()) - return; - - spin_lock(&s->cache_lock); + spin_lock_irqsave(&s->cache_lock, flags); cache = rcu_dereference_protected(entry->cache, lockdep_is_held(&s->cache_lock)); @@ -561,7 +555,7 @@ void sidtab_sid2str_put(struct sidtab *s, struct sidtab_entry *entry, rcu_assign_pointer(entry->cache, cache); out_unlock: - spin_unlock(&s->cache_lock); + spin_unlock_irqrestore(&s->cache_lock, flags); kfree_rcu(victim, rcu_member); } -- 2.24.1
On 2/3/20 3:50 AM, Ondrej Mosnacek wrote:
> Avoiding taking a lock in an IRQ context is not enough to prevent
> deadlocks, as discovered by syzbot:
>
> ===
> WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> 5.5.0-syzkaller #0 Not tainted
> -----------------------------------------------------
> syz-executor.0/8927 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
> ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
> ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: sidtab_sid2str_put.part.0+0x36/0x880 security/selinux/ss/sidtab.c:533
>
> and this task is already holding:
> ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
> ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: nf_conntrack_lock+0x17/0x70 net/netfilter/nf_conntrack_core.c:91
> which would create a new lock dependency:
> (&(&nf_conntrack_locks[i])->rlock){+.-.} -> (&(&s->cache_lock)->rlock){+.+.}
>
> but this new dependency connects a SOFTIRQ-irq-safe lock:
> (&(&nf_conntrack_locks[i])->rlock){+.-.}
>
> [...]
>
> other info that might help us debug this:
>
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&(&s->cache_lock)->rlock);
> local_irq_disable();
> lock(&(&nf_conntrack_locks[i])->rlock);
> lock(&(&s->cache_lock)->rlock);
> <Interrupt>
> lock(&(&nf_conntrack_locks[i])->rlock);
>
> *** DEADLOCK ***
> [...]
> ===
>
> Fix this by simply locking with irqsave/irqrestore and stop giving up on
> !in_task(). It makes the locking a bit slower, but it shouldn't make a
> big difference in real workloads. Under the scenario from [1] (only
> cache hits) it only increased the runtime overhead from the
> security_secid_to_secctx() function from ~2% to ~3% (it was ~5-65%
> before introducing the cache).
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1733259
>
> Fixes: d97bd23c2d7d ("selinux: cache the SID -> context string translation")
> Reported-by: syzbot+61cba5033e2072d61806@syzkaller.appspotmail.com
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[...]
On Mon, Feb 3, 2020 at 3:50 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> Avoiding taking a lock in an IRQ context is not enough to prevent
> deadlocks, as discovered by syzbot:
>
> ===
> WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> 5.5.0-syzkaller #0 Not tainted
> -----------------------------------------------------
> syz-executor.0/8927 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
> ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
> ffff888027c94098 (&(&s->cache_lock)->rlock){+.+.}, at: sidtab_sid2str_put.part.0+0x36/0x880 security/selinux/ss/sidtab.c:533
>
> and this task is already holding:
> ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
> ffffffff898639b0 (&(&nf_conntrack_locks[i])->rlock){+.-.}, at: nf_conntrack_lock+0x17/0x70 net/netfilter/nf_conntrack_core.c:91
> which would create a new lock dependency:
> (&(&nf_conntrack_locks[i])->rlock){+.-.} -> (&(&s->cache_lock)->rlock){+.+.}
>
> but this new dependency connects a SOFTIRQ-irq-safe lock:
> (&(&nf_conntrack_locks[i])->rlock){+.-.}
>
> [...]
>
> other info that might help us debug this:
>
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&(&s->cache_lock)->rlock);
> local_irq_disable();
> lock(&(&nf_conntrack_locks[i])->rlock);
> lock(&(&s->cache_lock)->rlock);
> <Interrupt>
> lock(&(&nf_conntrack_locks[i])->rlock);
>
> *** DEADLOCK ***
> [...]
> ===
>
> Fix this by simply locking with irqsave/irqrestore and stop giving up on
> !in_task(). It makes the locking a bit slower, but it shouldn't make a
> big difference in real workloads. Under the scenario from [1] (only
> cache hits) it only increased the runtime overhead from the
> security_secid_to_secctx() function from ~2% to ~3% (it was ~5-65%
> before introducing the cache).
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1733259
>
> Fixes: d97bd23c2d7d ("selinux: cache the SID -> context string translation")
> Reported-by: syzbot+61cba5033e2072d61806@syzkaller.appspotmail.com
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
> security/selinux/ss/sidtab.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
Merged into selinux/stable-5.6; I'll send this up to Linus next week.
--
paul moore
www.paul-moore.com