From: Jiri Pirko <jiri@resnulli.us>
To: John Fastabend <john.fastabend@gmail.com>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Thomas Graf" <tgraf@suug.ch>, "Jakub Kicinski" <kubakici@wp.pl>,
	netdev@vger.kernel.org, davem@davemloft.net, jhs@mojatatu.com,
	roopa@cumulusnetworks.com, simon.horman@netronome.com,
	ast@kernel.org, daniel@iogearbox.net, prem@barefootnetworks.com,
	hannes@stressinduktion.org, jbenc@redhat.com,
	tom@herbertland.com, mattyk@mellanox.com, idosch@mellanox.com,
	eladr@mellanox.com, yotamg@mellanox.com, nogahf@mellanox.com,
	ogerlitz@mellanox.com, linville@tuxdriver.com,
	andy@greyhouse.net, f.fainelli@gmail.com,
	dsa@cumulusnetworks.com, vivien.didelot@savoirfairelinux.com,
	andrew@lunn.ch, ivecera@redhat.com,
	"Maciej Żenczykowski" <zenczykowski@gmail.com>
Subject: Re: Let's do P4
Date: Wed, 2 Nov 2016 09:07:23 +0100
Message-ID: <20161102080723.GD1713@nanopsycho.orion>
In-Reply-To: <5818B11C.2040004@gmail.com>

Tue, Nov 01, 2016 at 04:13:32PM CET, john.fastabend@gmail.com wrote:
>[...]
>
>>>> P4 is meant to program programmable hw, not a fixed pipeline.
>>>>
>>>
>>> I'm guessing there are no upstream drivers at the moment that support
>>> this though right? The rocker universe bits though could leverage this.
>> 
>> mlxsw. But this is naturally not implemented yet, as there is no
>> infrastructure.
>
>Really? What is re-programmable?
>
>Can the parser support an arbitrary parse graph?
>Can the table topology be reconfigured?
>Can new tables be created?
>What about "new" actions being defined at configuration time?
>
>Or is this just the normal TCAM configuration of defining key widths and
>fields.

At this point TCAM configuration.
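
To make "TCAM configuration of defining key widths and fields" concrete,
here is a minimal software sketch of a ternary match table. It is only an
illustration: the names Tcam, TcamEntry and ipv4 are made up for this
example and do not correspond to mlxsw or any kernel API. The assumption
is the usual TCAM model: each entry has a value and a mask, masked-out
bits are "don't care", and the first matching entry wins.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TcamEntry:
    """One ternary entry (hypothetical layout, not a real hw format)."""
    value: int   # bits to compare against, already masked
    mask: int    # 1 = compare this bit, 0 = don't care
    action: str  # opaque action label for the sketch


class Tcam:
    """Software model of a TCAM with a fixed, configured key width."""

    def __init__(self, key_width: int) -> None:
        self.key_width = key_width
        self.entries: List[TcamEntry] = []

    def add(self, value: int, mask: int, action: str) -> None:
        # Reject values or masks that do not fit the configured key width.
        limit = 1 << self.key_width
        if not (0 <= value < limit and 0 <= mask < limit):
            raise ValueError("value or mask wider than the configured key")
        self.entries.append(TcamEntry(value & mask, mask, action))

    def lookup(self, key: int) -> Optional[str]:
        # Entries are checked in insertion order; the first hit wins.
        for entry in self.entries:
            if (key & entry.mask) == entry.value:
                return entry.action
        return None


def ipv4(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to a 32-bit integer key."""
    return int(ipaddress.IPv4Address(addr))


# Key is a single 32-bit field (destination IPv4 address).
tcam = Tcam(key_width=32)
tcam.add(ipv4("10.0.0.0"), ipv4("255.0.0.0"), "forward")  # 10.0.0.0/8
tcam.add(0, 0, "drop")                                    # catch-all
```

With this table, tcam.lookup(ipv4("10.1.2.3")) hits the first entry and
anything outside 10.0.0.0/8 falls through to the catch-all. Changing the
key width or the fields packed into the key is "configuration" in this
sense; nothing about the parser or the table topology changes.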


>
>> 
>> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> since I cannot see how one can put the whole p4 language compiler
>>>>>>> into the driver, so this last step of p4ast->hw, I presume, will be
>>>>>>> done by firmware, which will be running full compiler in an embedded cpu
>>>>>>
>>>>>> In case of mlxsw, that compiler would be in driver.
>>>>>>
>>>>>>
>>>>>>> on the switch. To me that's precisely the kernel bypass, since we won't
>>>>>>> have a clue what HW capabilities actually are and won't be able to fine
>>>>>>> grain control them.
>>>>>>> Please correct me if I'm wrong.
>>>>>>
>>>>>> You are wrong. By your definition, everything has to be figured out in
>>>>>> driver and FW does nothing. Otherwise it could do "something else" and
>>>>>> that would be a bypass? Does not make any sense to me whatsoever.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Plus the thing I cannot imagine in the model you propose is table fillup.
>>>>>>>> For ebpf, you use maps. For p4 you would have to have a separate HW-only
>>>>>>>> API. This is very similar to the original John's Flow-API. And therefore
>>>>>>>> a kernel bypass.
>>>>>>>
>>>>>>> I think John's flow api is a better way to expose mellanox switch capabilities.
>>>>>>
>>>>>> We are under the impression that p4 suits us nicely. But it is not
>>>>>> about us, it is about finding the common way to do this.
>>>>>>
>>>>>
>>>>> I'll just poke at my Flow-API question again. For fixed ASICs, what is
>>>>> the Flow-API missing? We have a few proof points that show it is both
>>>>> sufficient and usable for the handful of use cases we care about.
>>>>
>>>> Yeah, it is most probably fine. Even for flex ASICs to some point. The
>>>> question is how it compares to other alternatives, like p4.
>>>>
>>>
>>> Just to be clear the Flow-API _was_ generated from the initial P4 spec.
>>> The header files and tools used with it were autogenerated ("compiled"
>>> in a loose sense) from the P4 program. The piece I never exposed
>>> was the set_* operations to reconfigure running systems. I'm not sure
>>> how valuable this is in practice though.
>>>
>>> Also there is a P4-16 spec that will be released shortly that is more
>>> flexible and also more complex.
>> 
>> Would it be possible to easily extend the Flow-API to include the changes?
>> 
>
>P4-16 will allow externs, "functions" to execute in the control flow and
>possibly inside the parse graph. None of this was considered in the
>Flow-API. So none of this is supported.
>
>I still have the question: are you trying to push the "programming" of
>the device via 'tc' or just the runtime configuration of tables? If it
>is just runtime, Flow-API is sufficient IMO. If it's programming the
>device using the complete P4-16 spec, then no, it's not sufficient. But

Sure we need both.


>I don't believe vendors will expose the complete programmability of the
>device in the driver, this is going to look more like a fw update than
>a runtime change at least on the devices I'm aware of.

Depends on the driver. I think it is fine if the driver processes it into
some hw configuration sequence or simply pushes the program down to fw.
Both use cases are legit.


>
>> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> I also think it's not fair to call it 'bypass'. I see nothing in it
>>>>>>> that justifies such a 'swear word' ;)
>>>>>>
>>>>>> John's Flow-API was a kernel bypass. Why? It was an API specifically
>>>>>> designed to directly work with HW tables, without kernel being involved.
>>>>>
>>>>> I don't think that is a fair definition of HW bypass. The SKIP_SW flag
>>>>> does exactly that for 'tc' based offloads and it was not rejected.
>>>>
>>>> No, no, no. You still have the possibility to do the same thing in
>>>> kernel, same functionality, with the same API. That is a big difference.
>>>>
>>>>
>>>>>
>>>>> The _real_ reason that seems to have fallen out of this and other
>>>>> discussion is the Flow-API didn't provide an in-kernel translation into
>>>>> an emulated path. Note we always had a usermode translation to eBPF.
>>>>> A secondary reason appears to be overhead of adding yet another netlink
>>>>> family.
>>>>
>>>> Yeah. Maybe you remember, back then when Flow-API was being discussed,
>>>> I suggested to wrap it under TC as cls_xflows and cls_xflowsaction of
>>>> some sort and do in-kernel datapath implementation. I believe that after
>>>> that, it would be acceptable.
>>>>
>>>
>>> As I understand the thread here that is exactly the proposal here right?
>>> With a discussion around if the structures/etc are sufficient or any
>>> alternative representations exist.
>> 
>> Might be the way, yes. But I fear that with other p4 extensions this
>> might not be easy to align with. Therefore I thought about something more
>> generic, like the p4ast.
>> 
>
>Same question as above: are we _really_ talking about pushing the entire
>programmability of the device via 'tc'? If so, we need to have a vendor
>say they will support and implement this.

We need some API, and I believe that TC is perfectly suitable for that.
Why do you think it's a problem?



>
>> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> The goal of flow api was to expose HW features to user space, so that
>>>>>>> user space can program it. For something as simple as the mellanox
>>>>>>> switch asic it fits perfectly well.
>>>>>>
>>>>>> Again, this is not mlx-asic-specific. And again, that is a kernel bypass.
>>>>>>
>>>>>>
>>>>>>> Unless I misunderstand the bigger goal of this discussion and it's
>>>>>>> about programming ezchip devices.
>>>>>>
>>>>>> No. For network processors, I believe that BPF is nicely offloadable, no
>>>>>> need to do the exercise for that.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> If the goal is to model hw tcam in the linux kernel then just introduce
>>>>>>> tcam bpf map type. It will be dog slow in user space, but it will
>>>>>>> match exactly what is happening in the HW and user space can make
>>>>>>> sensible trade-offs.
>>>>>>
>>>>>> No, you got me completely wrong. This is not about the TCAM. This is
>>>>>> about differences in the 2 worlds (p4/bpf).
>>>>>> Again, for "p4-ish" devices, you have to translate BPF. And as you
>>>>>> noted, it's an instruction set. Very hard if not impossible to parse in
>>>>>> order to get back the original semantics.
>>>>>>
>>>>>
>>>>> I think in this discussion "p4-ish" devices means devices with multiple
>>>>> tables in a pipeline? Not devices that have programmable/configurable
>>>>> pipelines right? And if we get to talking about reconfigurable devices
>>>>> I believe this should be done out of band as it typically means
>>>>> reloading some ucode, etc.
>>>>
>>>> I'm talking about both. But I think we should focus on reconfigurable
>>>> ones, as we probably won't see that many fixed ones in the future.
>>>>
>>>
>>> hmm maybe but the 10/40/100Gbps devices are going to be around for some
>>> time. So we need to ensure these work well.
>> 
>> Yes, but I would like to emphasize, if we are defining a new api
>> the primary focus should be on new devices.
>> 
>> 
>
>What device though? Back to the mlxsw question about actually supporting
>this stuff.
>

