From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jakub Kicinski
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, pabeni@redhat.com,
	michael.chan@broadcom.com, Jakub Kicinski
Subject: [PATCH net-next v2 0/3] eth: bnxt: handle invalid Tx completions more gracefully
Date: Wed, 19 Jul 2023 18:04:37 -0700
Message-ID: <20230720010440.1967136-1-kuba@kernel.org>
X-Mailer: git-send-email 2.41.0
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

bnxt trusts the events generated by the device which may lead to kernel
crashes. These are extremely rare but they do happen. For a while
I thought crashing may be intentional, because device reporting invalid
completions should never happen, and having a core dump could be useful
if it does. But in practice I haven't found any clues in the core dumps,
and panic_on_warn exists.

Series was tested by forcing the recovery path manually. Because of
how rare the real crashes are I can't confirm it works for the actual
device errors until it's been widely deployed.

v2:
 - factor out the reset scheduling
 - also add a check on the XDP path
v1: https://lore.kernel.org/all/20230710205611.1198878-1-kuba@kernel.org/

Jakub Kicinski (3):
  eth: bnxt: move and rename reset helpers
  eth: bnxt: take the bit to set as argument of bnxt_queue_sp_work()
  eth: bnxt: handle invalid Tx completions more gracefully

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 154 ++++++++++--------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   4 +
 3 files changed, 91 insertions(+), 70 deletions(-)

-- 
2.41.0