Date: Tue, 19 May 2020 16:11:41 -0700
From: Stephen Hemminger
To: Thomas Gleixner
Cc: "Ahmed S. Darwish", Peter Zijlstra, Ingo Molnar, Will Deacon,
    "Paul E. McKenney", "Sebastian A. Siewior", Steven Rostedt, LKML, "David S.
    Miller", Jakub Kicinski, netdev@vger.kernel.org
Subject: Re: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount
Message-ID: <20200519161141.5fbab730@hermes.lan>
In-Reply-To: <87v9kr5zt7.fsf@nanos.tec.linutronix.de>
References: <20200519214547.352050-1-a.darwish@linutronix.de>
    <20200519214547.352050-2-a.darwish@linutronix.de>
    <20200519150159.4d91af93@hermes.lan>
    <87v9kr5zt7.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 20 May 2020 00:23:48 +0200
Thomas Gleixner wrote:

> Stephen Hemminger writes:
> > On Tue, 19 May 2020 23:45:23 +0200
> > "Ahmed S. Darwish" wrote:
> >
> >> Sequence counters write paths are critical sections that must never be
> >> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> >>
> >> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> >> netdev name retrieval.") handled a deadlock, observed with
> >> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> >> infinitely spinning: it got scheduled after the seqcount write side
> >> blocked inside its own critical section.
> >>
> >> To fix that deadlock, among other issues, the commit added a
> >> cond_resched() inside the read side section. While this will get the
> >> non-preemptible kernel eventually unstuck, the seqcount reader is fully
> >> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> >>
> >> The fix is also still broken: if the seqcount reader belongs to a
> >> real-time scheduling policy, it can spin forever and the kernel will
> >> livelock.
> >>
> >> Disabling preemption over the seqcount write side critical section will
> >> not work: inside it are a number of GFP_KERNEL allocations and mutex
> >> locking through the drivers/base/ :: device_rename() call chain.
> >>
> >> From all the above, replace the seqcount with a rwsem.
> >>
> >> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
> >> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> >> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
> >> Cc:
> >> Signed-off-by: Ahmed S. Darwish
> >> Reviewed-by: Sebastian Andrzej Siewior
> >
> > Have your performance tested this with 1000's of network devices?
>
> No. We did not. -ENOTESTCASE

Please try, it isn't that hard..

# time for ((i=0;i<1000;i++)); do ip li add dev dummy$i type dummy; done

real	0m17.002s
user	0m1.064s
sys	0m0.375s
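
For a repeatable variant of the loop above, something along these lines might do. It is only a sketch: gen_cmds, its arguments and the "renamed" prefix are made-up names, and the rename step is an assumption on my side (the loop above only times device creation, while the patch touches the rename path). The function only prints ip(8) commands, so the list can be checked before anything is run as root.

```shell
# Sketch only: gen_cmds and the "renamed" prefix are illustrative names,
# not something taken from the thread.
# Prints one ip(8) command per device; nothing is executed here.
gen_cmds() {
    action=$1            # add | rename | del
    count=$2             # number of devices
    prefix=${3:-dummy}   # name prefix of the devices to act on
    i=0
    while [ "$i" -lt "$count" ]; do
        case "$action" in
            add)    echo "ip link add dev $prefix$i type dummy" ;;
            rename) echo "ip link set dev $prefix$i name renamed$i" ;;
            del)    echo "ip link del dev $prefix$i" ;;
        esac
        i=$((i + 1))
    done
}

# Possible use from bash, as root, on a scratch machine:
#   time (gen_cmds add 1000 | sh)
#   time (gen_cmds rename 1000 | sh)
#   gen_cmds del 1000 renamed | sh
```

Piping into sh runs one ip process per device, the same as the loop above, so the timings should be roughly comparable with the numbers quoted there. Comparing a kernel with and without the patch would need the same run on both.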