Subject: Re: Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
To: Sargun Dhillon, netdev, LKML
Cc: Tetsuo Handa, Gabriel Hartmann, Rob Gulewich, Bruce Curtis, Hannes Frederic Sowa
From: Eric Dumazet
Message-ID: <8924a0a5-9179-f6a9-91d8-1163b425ec35@gmail.com>
Date: Tue, 11 Feb 2020 21:08:53 -0800
In-Reply-To: <20200211192330.GA9862@ircssh-2.c.rugged-nimbus-611.internal>

On 2/11/20 11:23 AM, Sargun Dhillon wrote:
> We've found a workqueue stall / deadlock. We run a container-oriented
> workload that uses IPv6. Our container (namespace) churn is quite
> frequent, and containers can be terminated before their networking is
> even set up.
>
> We're running 4.19.73 in production, and from investigating the
> underlying causes, I don't think later 4.19 releases fix it.
>
> We've narrowed it down to a lockup between ipv6_addrconf and
> cleanup_net.

Sure, PID 1369493 addrconf_verify_work() is waiting for RTNL.

But PID 8? __flush_work() is being called, but from where? The stacks
seem incomplete.

> crash> bt 8
> PID: 8  TASK: ffff9a1072b50000  CPU: 24  COMMAND: "kworker/u192:0"
>  #0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
>  #1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
>  #2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
>  #3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
>  #4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
>  #5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
>  #6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
>  #7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
>  #8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
>  #9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
> #10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
> #11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
> #12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff
>
> crash> bt 1369493
> PID: 1369493  TASK: ffff9a03684d9600  CPU: 58  COMMAND: "kworker/58:1"
>  #0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
>  #1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
>  #2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
>  #3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
>  #4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
>  #5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
>  #6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
>  #7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
>  #8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff
>
> struct -x mutex.owner.counter rtnl_mutex
>   owner.counter = 0xffff9a1072b50001
>
> 0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000
>
> This points back to PID 8 / CPU 24. It is working on cleanup_net, and
> part of cleanup_net involves calling ops_exit_list, which in turn
> calls default_device_exit_batch. default_device_exit_batch takes the
> rtnl lock before calling into unregister_netdevice_many and
> rollback_registered_many. rollback_registered_many calls
> flush_all_backlogs. This will never complete because it is holding
> the rtnl lock, and PID 1369493 / CPU 58 is waiting for rtnl_lock.
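The owner arithmetic quoted above is sound: since v4.10 the kernel
packs three state flags into the low bits of mutex->owner, so masking
with ~0x07 recovers the holder's task_struct. A minimal illustration
of what the `& (~0x07)` step relies on (the helper name
decode_mutex_owner is mine, for illustration only; the flag values
match kernel/locking/mutex.c):

	struct task_struct;			/* opaque for this sketch */

	#define MUTEX_FLAG_WAITERS	0x01	/* wait list is non-empty */
	#define MUTEX_FLAG_HANDOFF	0x02	/* unlock must hand off to top waiter */
	#define MUTEX_FLAG_PICKUP	0x04	/* handed-off lock awaiting pickup */
	#define MUTEX_FLAGS		0x07

	/* Mirrors __mutex_owner(): strip the flag bits to get the task. */
	static inline struct task_struct *decode_mutex_owner(unsigned long owner)
	{
		/* 0xffff9a1072b50001 & ~0x07 == 0xffff9a1072b50000, the
		 * task_struct of PID 8; the 0x01 bit only says that
		 * rtnl_mutex has waiters queued.
		 */
		return (struct task_struct *)(owner & ~MUTEX_FLAGS);
	}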
But PID 1369493 is waiting on a mutex, and thus properly yielding the
CPU (schedule() is clearly shown in its stack). That by itself should
not prevent other threads from making progress, so flush_all_backlogs()
should eventually complete. flush_all_backlogs() does not care how many
threads are currently blocked waiting on rtnl, because they cannot grab
it while flush_all_backlogs() is running. (A condensed sketch of
flush_all_backlogs() follows after the dump below.)

> If relevant, the workqueue stalls themselves look something like:
> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
>     in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
> workqueue events_highpri: flags=0x10
>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>     pending: flush_backlog BAR(8)
> workqueue events_power_efficient: flags=0x82
>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>     in-flight: 1396446:check_lifetime
> workqueue mm_percpu_wq: flags=0x8
>   pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update
> workqueue netns: flags=0xe000a
>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>     in-flight: 8:cleanup_net
>     delayed: cleanup_net
> workqueue writeback: flags=0x4e
>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>     in-flight: 1334335:wb_workfn
> workqueue kblockd: flags=0x18
>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>     pending: blk_mq_run_work_fn
> workqueue ipv6_addrconf: flags=0x40008
>   pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
>     in-flight: 1369493:addrconf_verify_work
> workqueue ena: flags=0xe000a
>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>     in-flight: 7505:ena_fw_reset_device [ena]
>
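For reference, in 4.19-era kernels flush_all_backlogs() queues one
flush_backlog work item per online CPU on the high-priority system
workqueue and then waits for each of them. A condensed sketch (trimmed
from net/core/dev.c of that era; initialization of the per-cpu works in
net_dev_init() is omitted):

	static DEFINE_PER_CPU(struct work_struct, flush_works);

	static void flush_all_backlogs(void)
	{
		unsigned int cpu;

		get_online_cpus();

		/* Kick flush_backlog() on every online CPU... */
		for_each_online_cpu(cpu)
			queue_work_on(cpu, system_highpri_wq,
				      per_cpu_ptr(&flush_works, cpu));

		/* ...then sleep until each one has run. This is the
		 * wait_for_completion() seen in PID 8's stack, done
		 * while holding rtnl.
		 */
		for_each_online_cpu(cpu)
			flush_work(per_cpu_ptr(&flush_works, cpu));

		put_online_cpus();
	}

Note the "pending: flush_backlog BAR(8)" entry on the cpus=70
events_highpri pool in the dump above: that appears to be the per-cpu
work, plus the flush barrier inserted on behalf of PID 8, that this
loop is still waiting on.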