From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willem de Bruijn Subject: Re: [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue Date: Fri, 29 Jun 2018 14:49:01 -0400 Message-ID: References: <20180627215950.6719-1-jesus.sanchez-palencia@intel.com> <20180627215950.6719-15-jesus.sanchez-palencia@intel.com> <88846dcd-b4dc-ddde-6c4b-5a29ffde016b@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Cc: Network Development , Thomas Gleixner , jan.altenberg@linutronix.de, Vinicius Gomes , kurt.kanzenbach@linutronix.de, Henrik Austad , Richard Cochran , ilias.apalodimas@linaro.org, ivan.khoronzhuk@linaro.org, Miroslav Lichvar , Willem de Bruijn , Jamal Hadi Salim , Cong Wang , =?UTF-8?B?SmnFmcOtIFDDrXJrbw==?= , Alexander Duyck To: Jesus Sanchez-Palencia Return-path: Received: from mail-io0-f194.google.com ([209.85.223.194]:32992 "EHLO mail-io0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932781AbeF2Sti (ORCPT ); Fri, 29 Jun 2018 14:49:38 -0400 Received: by mail-io0-f194.google.com with SMTP id d185-v6so9336817ioe.0 for ; Fri, 29 Jun 2018 11:49:38 -0700 (PDT) In-Reply-To: <88846dcd-b4dc-ddde-6c4b-5a29ffde016b@intel.com> Sender: netdev-owner@vger.kernel.org List-ID: > >> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c > >> index 5514a8aa3bd5..166f4b72875b 100644 > >> --- a/net/sched/sch_etf.c > >> +++ b/net/sched/sch_etf.c > >> @@ -11,6 +11,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> #include > >> #include > >> @@ -124,6 +125,35 @@ static void reset_watchdog(struct Qdisc *sch) > >> qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next)); > >> } > >> > >> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code) > >> +{ > >> + struct sock_exterr_skb *serr; > >> + ktime_t txtime = skb->tstamp; > >> + > >> + if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK)) > >> + return; > >> + > >> + skb = skb_clone_sk(skb); > >> + if (!skb) > >> + return; > >> + > >> + sock_hold(skb->sk); > > > > Why take an extra reference? The skb holds a ref on the sk. > > > Yes, the cloned skb holds a ref on the socket, but the documentation of > skb_clone_sk() makes this explicit suggestion: > > (...) > * When passing buffers allocated with this function to sock_queue_err_skb > * it is necessary to wrap the call with sock_hold/sock_put in order to > * prevent the socket from being released prior to being enqueued on > * the sk_error_queue. > */ > > which I believe is here just so we are protected against a possible race after > skb_orphan() is called from sock_queue_err_skb(). Please let me know if I'm > misreading anything. Yes, indeed. Code only has to worry about that if there are no concurrent references on the socket. I may be mistaken, but I believe that this complicated logic exists only for cases where the timestamp may be queued after the original skb has been released. Specifically, when a tx timestamp is returned from a hardware device after transmission of the original skb. Then the cloned timestamp skb needs its own reference on the sk while it is waiting for the timestamp data (i.e., until the device completion arrives) and then we need a temporary extra ref to work around the skb_orphan in sock_queue_err_skb. Compare skb_complete_tx_timestamp with skb_tstamp_tx. The second is used in the regular datapath to clone an skb and queue it on the error queue immediately, while holding the original skb. This does not call skb_clone_sk and does not need the extra sock_hold. This should be good enough for this code path, too. As kb holds a ref on skb->sk, the socket cannot go away in the middle of report_sock_error.