From mboxrd@z Thu Jan  1 00:00:00 1970
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Subject: Re: [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue
Date: Fri, 29 Jun 2018 14:49:01 -0400
Message-ID: <CAF=yD-KqkuZsstXK2-s9_ttuCKx=wucHS9zEmkdH=BsgMY=a0Q@mail.gmail.com>
References: <20180627215950.6719-1-jesus.sanchez-palencia@intel.com>
 <20180627215950.6719-15-jesus.sanchez-palencia@intel.com> <CAF=yD-+R867otr0n9q-pHvvUhv6o3Nj42a3Ts54J7gwPgr0G9Q@mail.gmail.com>
 <88846dcd-b4dc-ddde-6c4b-5a29ffde016b@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Network Development <netdev@vger.kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        jan.altenberg@linutronix.de,
        Vinicius Gomes <vinicius.gomes@intel.com>,
        kurt.kanzenbach@linutronix.de, Henrik Austad <henrik@austad.us>,
        Richard Cochran <richardcochran@gmail.com>,
        ilias.apalodimas@linaro.org, ivan.khoronzhuk@linaro.org,
        Miroslav Lichvar <mlichvar@redhat.com>,
        Willem de Bruijn <willemb@google.com>,
        Jamal Hadi Salim <jhs@mojatatu.com>,
        Cong Wang <xiyou.wangcong@gmail.com>,
        =?UTF-8?B?SmnFmcOtIFDDrXJrbw==?= <jiri@resnulli.us>,
        Alexander Duyck <alexander.duyck@gmail.com>
To: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-io0-f194.google.com ([209.85.223.194]:32992 "EHLO
        mail-io0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932781AbeF2Sti (ORCPT
        <rfc822;netdev@vger.kernel.org>); Fri, 29 Jun 2018 14:49:38 -0400
Received: by mail-io0-f194.google.com with SMTP id d185-v6so9336817ioe.0
        for <netdev@vger.kernel.org>; Fri, 29 Jun 2018 11:49:38 -0700 (PDT)
In-Reply-To: <88846dcd-b4dc-ddde-6c4b-5a29ffde016b@intel.com>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

> >> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> >> index 5514a8aa3bd5..166f4b72875b 100644
> >> --- a/net/sched/sch_etf.c
> >> +++ b/net/sched/sch_etf.c
> >> @@ -11,6 +11,7 @@
> >>  #include <linux/kernel.h>
> >>  #include <linux/string.h>
> >>  #include <linux/errno.h>
> >> +#include <linux/errqueue.h>
> >>  #include <linux/rbtree.h>
> >>  #include <linux/skbuff.h>
> >>  #include <linux/posix-timers.h>
> >> @@ -124,6 +125,35 @@ static void reset_watchdog(struct Qdisc *sch)
> >>         qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
> >>  }
> >>
> >> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
> >> +{
> >> +       struct sock_exterr_skb *serr;
> >> +       ktime_t txtime = skb->tstamp;
> >> +
> >> +       if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK))
> >> +               return;
> >> +
> >> +       skb = skb_clone_sk(skb);
> >> +       if (!skb)
> >> +               return;
> >> +
> >> +       sock_hold(skb->sk);
> >
> > Why take an extra reference? The skb holds a ref on the sk.
>
>
> Yes, the cloned skb holds a ref on the socket, but the documentation of
> skb_clone_sk() makes this explicit suggestion:
>
> (...)
>  * When passing buffers allocated with this function to sock_queue_err_skb
>  * it is necessary to wrap the call with sock_hold/sock_put in order to
>  * prevent the socket from being released prior to being enqueued on
>  * the sk_error_queue.
>  */
>
> which I believe is here just so we are protected against a possible race after
> skb_orphan() is called from sock_queue_err_skb(). Please let me know if I'm
> misreading anything.

Yes, indeed. Code only has to worry about that if there are no
concurrent references
on the socket.

I may be mistaken, but I believe that this complicated logic exists
only for cases where
the timestamp may be queued after the original skb has been released.
Specifically,
when a tx timestamp is returned from a hardware device after transmission of the
original skb. Then the cloned timestamp skb needs its own reference on
the sk while
it is waiting for the timestamp data (i.e., until the device
completion arrives) and then
we need a temporary extra ref to work around the skb_orphan in
sock_queue_err_skb.

Compare skb_complete_tx_timestamp with skb_tstamp_tx. The second is used in
the regular datapath to clone an skb and queue it on the error queue
immediately,
while holding the original skb. This does not call skb_clone_sk and
does not need the
extra sock_hold. This should be good enough for this code path, too.
As kb holds a
ref on skb->sk, the socket cannot go away in the middle of report_sock_error.