From: Shakeel Butt
Date: Fri, 26 Feb 2021 16:00:30 -0800
Subject: Re: possible deadlock in sk_clone_lock
To: Mike Kravetz
Cc: syzbot, Andrew Morton, LKML, Linux MM, syzkaller-bugs, Eric Dumazet, Mina Almasry, Michal Hocko

On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz wrote:
>
> Cc: Michal
>
> On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > wrote:
>
> >> other info that might help us debug this:
> >>
> >>  Possible interrupt unsafe locking scenario:
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(hugetlb_lock);
> >>                                local_irq_disable();
> >>                                lock(slock-AF_INET);
> >>                                lock(hugetlb_lock);
> >>   <Interrupt>
> >>     lock(slock-AF_INET);
> >>
> >>  *** DEADLOCK ***
> >
> > This has been reproduced on 4.19 stable kernel as well [1] and there
> > is a reproducer as well.
> >
> > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > wonder if we just need to make hugetlb_lock softirq-safe.
> >
> > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
>
> Thanks Shakeel,
>
> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> context") attempted to address this issue. It uses a work queue to
> acquire hugetlb_lock if the caller is !in_task().
>
> In another recent thread, there was the suggestion to change the
> !in_task to in_atomic.
>
> I need to do some research on the subtle differences between in_task,
> in_atomic, etc. TBH, I 'thought' !in_task would prevent the issue
> reported here. But, that obviously is not the case.

I think the freeing is happening in the process context in this report
but it is creating the lock chain from softirq-safe slock to irq-unsafe
hugetlb_lock. So, two solutions I can think of are: (1) always defer the
freeing of hugetlb pages to a work queue or (2) make hugetlb_lock
softirq-safe.
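
The lock-chain argument above can be illustrated with a toy model. The sketch below is not lockdep's algorithm and not kernel code: `ToyLockdep`, `scenario`, and both keyword arguments are hypothetical names invented for illustration, and the call paths are assumed from the report (the socket lock is also taken from softirq, and a process-context path frees a huge page while holding it). It only checks direct "outer held while taking inner" pairs, whereas real lockdep follows whole dependency chains.

```python
class ToyLockdep:
    """Toy model of the softirq-safe -> softirq-unsafe rule (illustrative only)."""

    def __init__(self):
        self.taken_in_softirq = set()  # locks ever acquired from softirq context
        self.taken_softirq_on = set()  # locks ever acquired with softirqs enabled
        self.edges = set()             # (outer, inner): inner taken while outer held

    def acquire(self, lock, held=(), in_softirq=False, softirqs_enabled=True):
        if in_softirq:
            self.taken_in_softirq.add(lock)
        elif softirqs_enabled:
            self.taken_softirq_on.add(lock)
        for outer in held:
            self.edges.add((outer, lock))

    def violations(self):
        # Flag any lock used in softirq that is held while taking a lock
        # which is elsewhere held with softirqs enabled.
        return sorted(
            (outer, inner)
            for outer, inner in self.edges
            if outer in self.taken_in_softirq and inner in self.taken_softirq_on
        )


def scenario(always_defer=False, hugetlb_lock_bh_safe=False):
    """Replay the assumed call paths and return the flagged lock pairs."""
    ld = ToyLockdep()
    # Ordinary hugetlb path in process context.
    ld.acquire("hugetlb_lock", softirqs_enabled=not hugetlb_lock_bh_safe)
    # Network path taking the socket lock from softirq context.
    ld.acquire("slock-AF_INET", in_softirq=True, softirqs_enabled=False)
    # Process context takes the socket lock with softirqs disabled ...
    ld.acquire("slock-AF_INET", softirqs_enabled=False)
    if always_defer:
        # Option (1): a worker frees the page later, holding no other lock.
        ld.acquire("hugetlb_lock", softirqs_enabled=not hugetlb_lock_bh_safe)
    else:
        # ... and frees a huge page while still holding it.
        ld.acquire("hugetlb_lock", held=("slock-AF_INET",), softirqs_enabled=False)
    return ld.violations()


if __name__ == "__main__":
    print("as reported:       ", scenario())
    print("(1) always defer:  ", scenario(always_defer=True))
    print("(2) softirq-safe:  ", scenario(hugetlb_lock_bh_safe=True))
```

In this model, option (1) removes the slock-to-hugetlb_lock edge, while option (2) removes hugetlb_lock from the set of locks ever held with softirqs enabled; either change alone leaves nothing to flag.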