netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Saeed Mahameed <saeedm@mellanox.com>,
	"netdev\@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Eran Ben Elisha <eranbe@mellanox.com>,
	Tariq Toukan <tariqt@mellanox.com>,
	"brouer\@redhat.com" <brouer@redhat.com>
Subject: Re: Kernel oops with mlx5 and dual XDP redirect programs
Date: Tue, 23 Oct 2018 12:10:30 +0200	[thread overview]
Message-ID: <8736sxgei1.fsf@toke.dk> (raw)
In-Reply-To: <15797ad1ccee84dfd47c6f45af155806b4ccc228.camel@mellanox.com>

Saeed Mahameed <saeedm@mellanox.com> writes:

> On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
>> Saeed Mahameed <saeedm@mellanox.com> writes:
>> 
>> > I think that the mlx5 driver doesn't know how to tell the other
>> > device
>> > to stop transmitting to it while it is resetting.. Maybe tariq or
>> > Jesper know more about this ?
>> > I will look at this tomorrow after noon and will try to repro...
>> 
>> Hi Saeed
>> 
>> Did you have a chance to poke at this? :)
>
> HI Toke, yes i have been planing to respond but also i wanted to dig
> more,
>
> so the root cause is very clear.
>
> 1. core 1 is doing tx_dev->ndo_xdp_xmit()
> 2. core 2 is doing tx_dev->xdp_set() //remove xdp program.

Right, it was also my guess that it was related to this interaction.
Thanks for looking into it!

> and the problem is beyond mlx5, since we don't have a way to tell a
> different core/different netdev to stop xmitting, or at least
> synchronize with it.

Hmm, ideally there should be some way for the higher level XDP API to
notice this and abort the call before it even reaches the driver on the
TX side, shouldn't there? At LPC, Jesper and I will be talking about a
proposal for decoupling the ndo_xdp_xmit() resource allocation from
loading and unloading XDP programs, which I guess could be a way to deal
with this as well.

In the meantime...

> I will be waiting for your confirmation that the fix did work.

I tested your patch, and it does indeed fix the crash. However, it also
seems to have the effect that the XDP redirect continues to function
even after removing the XDP program on the target device.

I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets being
redirected out $TX_IF. Is this intentional?

-Toke

  reply	other threads:[~2018-10-23 18:33 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-03  9:30 Kernel oops with mlx5 and dual XDP redirect programs Toke Høiland-Jørgensen
2018-10-03 23:44 ` Saeed Mahameed
2018-10-04 12:03   ` Toke Høiland-Jørgensen
2018-10-18 21:53   ` Toke Høiland-Jørgensen
2018-10-22 17:57     ` Saeed Mahameed
2018-10-23 10:10       ` Toke Høiland-Jørgensen [this message]
2018-10-23 18:01         ` Saeed Mahameed
2018-10-23 20:29           ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8736sxgei1.fsf@toke.dk \
    --to=toke@toke.dk \
    --cc=brouer@redhat.com \
    --cc=eranbe@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).