netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Richard Sanger <rsanger@wand.net.nz>
Cc: Network Development <netdev@vger.kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Richard Cochran <richardcochran@gmail.com>
Subject: Re: [PATCH] net: packetmmap: fix only tx timestamp on request
Date: Tue, 11 May 2021 18:56:50 -0400	[thread overview]
Message-ID: <CA+FuTSfUAcjOpcopYMpfMGD7FmPkgrMYQnB+GKQYg438YvAxqQ@mail.gmail.com> (raw)
In-Reply-To: <CAN6QFNyHep+UGjM7XpA4akbtvZFNDarVcs3=zZPpYO7RMTJgHg@mail.gmail.com>

On Tue, May 11, 2021 at 6:11 PM Richard Sanger <rsanger@wand.net.nz> wrote:
>
> I've had a chance to look into this further and have found where the
> timestamp is added. Details are at the end of this message.
>
> On Thu, May 6, 2021 at 1:23 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Wed, May 5, 2021 at 7:42 PM Richard Sanger <rsanger@wand.net.nz> wrote:
> [...]
> > >
> > > I've just verified using printk() that after the call to skb_tx_timestamp(skb)
> > > in veth_xmit() skb->tstamp == 0 as expected.
> > >
> > > However, when skb_tx_timestamp() is called within the packetmmap code path
> > > skb->tstamp holds a valid time.
> >
> > Interesting. I had expected veth_xmit to trigger skb_orphan, which
> > calls the destructor.
> >
> > But this is no longer true as of commit 9c4c325252c5 ("skbuff:
> > preserve sock reference when scrubbing the skb.").
> >
> > As a result, I suppose the skb can enter the next namespace and be
> > timestamped there if receive timestamps are enabled (this is not
> > per-socket).
> >
> > One way to verify, if you can easily recompile a kernel, is to add a
> > WARN_ON_ONCE(1) to tpacket_destruct_skb to see which path led up to
> > queuing the completion notification.
> >
>
> Here's the output of putting a WARN_ON_ONCE(1) statement in
> tpacket_destruct_skb, I don't believe it is related to the problem.

It might be, if this is indeed an llc (ETH_P_802_2) packet that was
inserted with tpacket.

veth calls netif_rx, which enqueues the packet to the percpu queue for
later processing in softirq context using process_backlog.

>
> [   37.249629] RIP: 0010:tpacket_destruct_skb+0x24/0x60
> [...]
> [   37.249659] Call Trace:
> [   37.249661]  <IRQ>
> [   37.249666]  skb_release_head_state+0x44/0x90
> [   37.249680]  skb_release_all+0x13/0x30
> [   37.249684]  kfree_skb+0x2f/0xa0
> [   37.249689]  llc_rcv+0x2e/0x360 [llc]
> [   37.249698]  __netif_receive_skb_one_core+0x8f/0xa0
> [   37.249707]  __netif_receive_skb+0x18/0x60
> [   37.249710]  process_backlog+0xa9/0x160
> [   37.249714]  __napi_poll+0x31/0x140
> [   37.249717]  net_rx_action+0xde/0x210
> [   37.249722]  __do_softirq+0xe0/0x29b
> [   37.249737]  do_softirq+0x66/0x80
> [   37.249747]  </IRQ>
> [   37.249748]  __local_bh_enable_ip+0x50/0x60
> [   37.249751]  __dev_queue_xmit+0x23a/0x6e0
> [   37.249756]  dev_queue_xmit+0x10/0x20
> [   37.249759]  packet_sendmsg+0x6b8/0x1c90
> [   37.249763]  ? __drain_all_pages+0x150/0x1c0
> [   37.249772]  sock_sendmsg+0x65/0x70
> [   37.249778]  __sys_sendto+0x113/0x190
> [   37.249783]  ? handle_mm_fault+0xda/0x2b0
> [   37.249790]  ? exit_to_user_mode_prepare+0x3c/0x1e0
> [   37.249800]  ? do_user_addr_fault+0x1d3/0x640
> [   37.249805]  __x64_sys_sendto+0x29/0x30
> [   37.249809]  do_syscall_64+0x40/0xb0
> [   37.249816]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   37.249820] RIP: 0033:0x7f43950d27ea
>
> [...]
>
> > I think we need to understand exactly what goes on before we apply a
> > patch. It might just be papering over the problem otherwise.
>
> Okay, so the call path that adds the timestamp looks like this:
>
> send() syscall triggers tpacket_snd() which calls the veth_xmit() hander.
> In drivers/net/veth.c veth_xmit() calls veth_forward_skb() which then
> calls netif_rx()/netif_rx_internal() in net/core/dev.c.
> And finally, net_timestamp_check(netdev_tstamp_prequeue, skb) adds
> the timestamp, netdev_tstamp_prequeue defaults to 1.
>
> net_timestamp_check in its current form was added by 588f033075
> ("net: use jump_label for netstamp_needed ")
> In the kernel since 3.3-rc1, so it looks like this issue has been present the
> entire time. Pre-conditions are netstamp_needed_key and
> netdev_tstamp_prequeue, so if either is false, timestamping won't happen
> at this stage in the code.

Thanks. That's as I expected.

We cannot suppress the timestamp assignment by net_timestamp_check, as
it may be expected by the receive process.

I don't immediately see a way to clear/ignore the field in
tpacket_destruct_skb if the field was set from a different network
namespace than the one in which the packet socket inserted the packet.

So your suggested patch to suppress spurious tx timestamps if software
timestamping reporting is not enabled SGTM.


> Here's the call trace of where the timestamp is added
>
> [  251.619538] Call Trace:
> [  251.619550]  netif_rx+0x1b/0x60
> [  251.619556]  veth_xmit+0x19d/0x230 [veth]
> [  251.619563]  netdev_start_xmit+0x4a/0x8b
> [  251.619566]  dev_hard_start_xmit.cold+0xc8/0x1d5
> [  251.619569]  __dev_queue_xmit.cold+0xa3/0x12c
> [  251.619572]  dev_queue_xmit+0x10/0x20
> [  251.619575]  packet_sendmsg+0x6b8/0x1c90
> [  251.619580]  ? __drain_all_pages+0x150/0x1c0
> [  251.619588]  sock_sendmsg+0x65/0x70
> [  251.619594]  __sys_sendto+0x113/0x190
> [  251.619598]  ? handle_mm_fault+0xda/0x2b0
> [  251.619604]  ? exit_to_user_mode_prepare+0x3c/0x1e0
> [  251.619611]  ? do_user_addr_fault+0x1d3/0x640
> [  251.619615]  __x64_sys_sendto+0x29/0x30
> [  251.619618]  do_syscall_64+0x40/0xb0
> [  251.619623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> This appears to be reasonable, but I don't know what the expected behaviour
> is. Should this timestamp still be cleared before returning the sent skb?

      reply	other threads:[~2021-05-11 22:57 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-03 23:46 [PATCH] net: packetmmap: fix only tx timestamp on request Richard Sanger
2021-05-04  0:36 ` Willem de Bruijn
2021-05-04  1:22   ` Richard Sanger
2021-05-04 14:45     ` Willem de Bruijn
2021-05-05  4:29       ` Richard Sanger
2021-05-05  5:09         ` Richard Sanger
2021-05-06  1:23         ` Willem de Bruijn
2021-05-11 22:11           ` Richard Sanger
2021-05-11 22:56             ` Willem de Bruijn [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+FuTSfUAcjOpcopYMpfMGD7FmPkgrMYQnB+GKQYg438YvAxqQ@mail.gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    --cc=rsanger@wand.net.nz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).