* To netlink or not to netlink, that is the question
@ 2017-01-12 13:01 Jason A. Donenfeld
  2017-01-12 17:14 ` Stephen Hemminger
  0 siblings, 1 reply; 10+ messages in thread
From: Jason A. Donenfeld @ 2017-01-12 13:01 UTC (permalink / raw)
  To: Netdev

Hey folks,

A few months ago I switched away from using netlink in wireguard,
preferring instead to use ioctl. I had come up against limitations in
rtnetlink, and ioctl presented a straightforward, hard-to-screw-up
alternative. The very simple API is documented here:
https://git.zx2c4.com/WireGuard/tree/src/uapi.h

This works well, and I'm reluctant to change it, but as I do more
complicated things, and as kernel submission time looms nearer, I'm
kept up at night by the notion that maybe I ought to give netlink
another chance. But how?

For each wireguard interface, there are three types of structures for
userspace to configure. There is one wgdevice for each interface. Each
wgdevice has a variable number (up to 2^16) of wgpeers. Each wgpeer
has a variable number (up to 2^16) of wgipmasks. I'd like an interface
to get and set all of these at once, atomically.

Presently, with the ioctl, I just have a simple get ioctl and a simple
set ioctl. The set one passes a userspace pointer, which is read
incrementally in kernel space. The get one first returns how much
userspace should allocate, and when called again writes incrementally
into the provided userspace buffer, up to a passed-in maximum number
of bytes. Very basic; I'm quite happy with it.
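
Concretely, the shape is roughly the following. This is an illustrative
sketch with made-up field names and ioctl numbers, not the literal
contents of uapi.h:

#include <sys/socket.h>
#include <sys/ioctl.h>

struct wgipmask {
        int family;                     /* AF_INET or AF_INET6 */
        unsigned char ip[16];
        unsigned char cidr;
};

struct wgpeer {
        unsigned char public_key[32];
        struct sockaddr_storage endpoint;
        unsigned short num_ipmasks;     /* followed by that many struct wgipmask */
};

struct wgdevice {
        char interface[16];             /* IFNAMSIZ */
        unsigned char private_key[32];
        unsigned short num_peers;       /* followed by that many struct wgpeer,
                                           each trailed by its wgipmasks */
};

/* hypothetical ioctl numbers, for illustration only */
#define WG_GET_DEVICE _IOWR('W', 0, struct wgdevice)
#define WG_SET_DEVICE _IOW('W', 1, struct wgdevice)

Get is called once to learn how much to allocate and again with a
buffer of that size; set hands the whole blob to the kernel in one
call.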

When I previously tried to do this with netlink, I did it by defining
changelink and fill_info in rtnl_link_ops. For changelink, I iterated
through the netlink objects, and for fill_info, I filled in the skb
with netlink objects. This was a bit more complex but basically
worked. Except netlink skbs have a maximum size and are buffered,
which means things broke entirely when trying to read or write lots of
wgpeers or lots of wgipmasks. So the meager interfaces afforded to me
by rtnl_link_ops are insufficient. Doing anything beyond this, either
by registering new rtnetlink messages or by using generic netlink,
seemed overwhelmingly complex and undesirable.
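
For reference, what I had looked roughly like the following. This is a
from-memory sketch with invented attribute names, not the actual code:

#include <net/rtnetlink.h>
#include <net/netlink.h>

enum {
        IFLA_WG_UNSPEC,
        IFLA_WG_PEERS,          /* nested list of peers */
        __IFLA_WG_MAX,
};

static int wg_changelink(struct net_device *dev, struct nlattr *tb[],
                         struct nlattr *data[])
{
        struct nlattr *peer;
        int rem;

        if (!data || !data[IFLA_WG_PEERS])
                return 0;
        nla_for_each_nested(peer, data[IFLA_WG_PEERS], rem) {
                /* parse one peer and its nested wgipmasks, apply to dev */
        }
        return 0;
}

static int wg_fill_info(struct sk_buff *skb, const struct net_device *dev)
{
        struct nlattr *peers = nla_nest_start(skb, IFLA_WG_PEERS);

        if (!peers)
                return -EMSGSIZE;
        /* one nla_put() per peer and per wgipmask; with thousands of
         * peers this overflows the single skb and fails with -EMSGSIZE */
        nla_nest_end(skb, peers);
        return 0;
}

static struct rtnl_link_ops wg_link_ops = {
        .kind           = "wireguard",
        .changelink     = wg_changelink,
        .fill_info      = wg_fill_info,
};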

So I'm wondering -- is there a good way to be doing this with netlink?
Or am I right to stay with ioctl?

Thanks,
Jason

* Re: To netlink or not to netlink, that is the question
  2017-01-12 13:01 To netlink or not to netlink, that is the question Jason A. Donenfeld
@ 2017-01-12 17:14 ` Stephen Hemminger
  2017-01-12 18:28   ` Jason A. Donenfeld
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2017-01-12 17:14 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev

On Thu, 12 Jan 2017 14:01:07 +0100
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> Hey folks,
> 
> A few months ago I switched away from using netlink in wireguard,
> preferring instead to use ioctl. I had come up against limitations in
> rtnetlink, and ioctl presented a straightforward, hard-to-screw-up
> alternative. The very simple API is documented here:
> https://git.zx2c4.com/WireGuard/tree/src/uapi.h
> 
> This works well, and I'm reluctant to change it, but as I do more
> complicated things, and as kernel submission time looms nearer, I'm
> kept up at night by the notion that maybe I ought to give netlink
> another chance. But how?
> 
> For each wireguard interface, there are three types of structures for
> userspace to configure. There is one wgdevice for each interface. Each
> wgdevice has a variable number (up to 2^16) of wgpeers. Each wgpeer
> has a variable number (up to 2^16) of wgipmasks. I'd like an interface
> to get and set all of these at once, atomically.
> 
> Presently, with the ioctl, I just have a simple get ioctl and a simple
> set ioctl. The set one passes a userspace pointer, which is read
> incrementally in kernel space. The get one first returns how much
> userspace should allocate, and when called again writes incrementally
> into the provided userspace buffer, up to a passed-in maximum number
> of bytes. Very basic; I'm quite happy with it.
> 
> When I previously tried to do this with netlink, I did it by defining
> changelink and fill_info in rtnl_link_ops. For changelink, I iterated
> through the netlink objects, and for fill_info, I filled in the skb
> with netlink objects. This was a bit more complex but basically
> worked. Except netlink skbs have a maximum size and are buffered,
> which means things broke entirely when trying to read or write lots of
> wgpeers or lots of wgipmasks. So the meager interfaces afforded to me
> by rtnl_link_ops are insufficient. Doing anything beyond this, either
> by registering new rtnetlink messages or by using generic netlink,
> seemed overwhelmingly complex and undesirable.
> 
> So I'm wondering -- is there a good way to be doing this with netlink?
> Or am I right to stay with ioctl?
> 
> Thanks,
> Jason

It is up to you, but I doubt that code with new private ioctls will be
accepted upstream. If you want full review, then post it for inclusion
upstream. If you just want to maintain it as a private fork, go ahead and
do what you want and suffer the consequences.

* Re: To netlink or not to netlink, that is the question
  2017-01-12 17:14 ` Stephen Hemminger
@ 2017-01-12 18:28   ` Jason A. Donenfeld
  2017-01-12 18:43     ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Jason A. Donenfeld @ 2017-01-12 18:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Netdev

On Thu, Jan 12, 2017 at 6:14 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> It is up to you, but I doubt that code with new private ioctls will be
> accepted upstream. If you want full review, then post it for inclusion
> upstream. If you just want to maintain it as a private fork, go ahead and
> do what you want and suffer the consequences.

Obviously I'm going for upstream inclusion and am not willing to
"suffer the consequences", hence my email in the first place. Given that you
seem most interested in netlink, might you have any constructive
suggestions on how netlink might be used with very large atomic
messages?

* Re: To netlink or not to netlink, that is the question
  2017-01-12 18:28   ` Jason A. Donenfeld
@ 2017-01-12 18:43     ` Dan Williams
  2017-01-12 19:02       ` Jason A. Donenfeld
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2017-01-12 18:43 UTC (permalink / raw)
  To: Jason A. Donenfeld, Stephen Hemminger; +Cc: Netdev

On Thu, 2017-01-12 at 19:28 +0100, Jason A. Donenfeld wrote:
> On Thu, Jan 12, 2017 at 6:14 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > It is up to you, but I doubt that code with new private ioctls
> > will be accepted upstream. If you want full review, then post it
> > for inclusion upstream. If you just want to maintain it as a
> > private fork, go ahead and do what you want and suffer the
> > consequences.
> 
> Obviously I'm going for upstream inclusion and am not willing to
> "suffer the consequences", hence my email in the first place. Given
> that you seem most interested in netlink, might you have any
> constructive suggestions on how netlink might be used with very large
> atomic messages?

Typically kernel code doesn't pass very large atomic messages back and
forth.  From the description you mailed, I would guess a better-fitting
API would have specific netlink operations to add/remove wgpeers, and a
specific operation to add/remove ipmasks for each peer.  And don't
forget netlink events to indicate when peers and ipmasks have been
added/removed, so userspace is aware of that.

Besides, perhaps multiple programs may wish to manipulate a wgdevice,
which could include adding/removing peers or ipmasks from separate
userspace processes.  That's not really possible with a huge buffer
that includes all the information, as a process would have to read the
buffer, update it, and send it back, which would then potentially
clobber whatever another process did between the read/write.

In short, if the API doesn't quite fit, then perhaps the API could be
broken up into smaller, more discrete operations that better fit the
netlink API.
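
Something along these lines, perhaps. The command and attribute names
here are invented, just to sketch the split; they would be wired into a
generic netlink family's ops table in the usual way:

enum wg_cmd {
        WG_CMD_UNSPEC,
        WG_CMD_SET_DEVICE,      /* device-wide settings: keys, port, ... */
        WG_CMD_ADD_PEER,
        WG_CMD_DEL_PEER,
        WG_CMD_ADD_IPMASK,      /* attach an ipmask to an existing peer */
        WG_CMD_DEL_IPMASK,
        WG_CMD_GET_PEERS,       /* dump of peers and their ipmasks */
        WG_CMD_NOTIFY,          /* multicast event: peer/ipmask added or removed */
        __WG_CMD_MAX,
};

enum wg_attr {
        WG_ATTR_UNSPEC,
        WG_ATTR_IFINDEX,        /* u32: which wgdevice */
        WG_ATTR_PUBLIC_KEY,     /* 32-byte binary: which peer */
        WG_ATTR_ENDPOINT,       /* binary sockaddr */
        WG_ATTR_IPMASK,         /* nested: family, address, cidr */
        __WG_ATTR_MAX,
};

Each operation then touches one peer or one ipmask at a time, and the
notify event lets any other listener see changes made by someone else.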

Just some thoughts...

Dan

* Re: To netlink or not to netlink, that is the question
  2017-01-12 18:43     ` Dan Williams
@ 2017-01-12 19:02       ` Jason A. Donenfeld
  2017-01-12 19:11         ` David Miller
  2017-01-13  7:17         ` Johannes Berg
  0 siblings, 2 replies; 10+ messages in thread
From: Jason A. Donenfeld @ 2017-01-12 19:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: Stephen Hemminger, Netdev

Hi Dan,

Thanks for your response. I'd thought about this, at least for
adding/removing wgpeers/wgipmasks and for configuring wgdevices. This
would fit into multiple smaller messages indeed.

But what about fetching the list of all existing peers and ipmasks
atomically? It seems like with multiple calls, if I'm using some kind
of pagination, things could change in the process. That's why using
one big buffer was most appealing... Any ideas about this?

Jason

* Re: To netlink or not to netlink, that is the question
  2017-01-12 19:02       ` Jason A. Donenfeld
@ 2017-01-12 19:11         ` David Miller
  2017-01-12 20:07           ` Jason A. Donenfeld
  2017-01-13  7:17         ` Johannes Berg
  1 sibling, 1 reply; 10+ messages in thread
From: David Miller @ 2017-01-12 19:11 UTC (permalink / raw)
  To: Jason; +Cc: dcbw, stephen, netdev

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Thu, 12 Jan 2017 20:02:14 +0100

> But what about fetching the list of all existing peers and ipmasks
> atomically? It seems like with multiple calls, if I'm using some kind
> of pagination, things could change in the process. That's why using
> one big buffer was most appealing... Any ideas about this?

This is a fact of life; dumps are always chopped into a suitable
number of responses as necessary.  We do this for IPv4 routes,
network interfaces, etc., and it all works out just fine.

The thing you should be asking yourself is: if something as heavily
used and fundamental as IPv4 can handle this, your scenario can
probably be handled just fine as well.
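
The usual pattern is a dump callback that remembers where it left off
in cb->args and emits as many entries per response skb as fit. A rough
sketch, with the wg_* structures and helpers invented for illustration:

#include <net/netlink.h>
#include <linux/list.h>

struct wg_peer { struct list_head peer_list; /* key, endpoint, ipmasks */ };
struct wg_device { struct list_head peer_list; /* device config */ };

/* invented helpers: find the device the dump refers to, and append one
 * peer to the skb, returning -EMSGSIZE once the skb is full */
static struct wg_device *wg_device_from_cb(struct netlink_callback *cb);
static int wg_put_peer(struct sk_buff *skb, struct netlink_callback *cb,
                       struct wg_peer *peer);

static int wg_dump_peers(struct sk_buff *skb, struct netlink_callback *cb)
{
        struct wg_device *wg = wg_device_from_cb(cb);
        int idx = 0, s_idx = cb->args[0];
        struct wg_peer *peer;

        list_for_each_entry(peer, &wg->peer_list, peer_list) {
                if (idx < s_idx)
                        goto next;
                if (wg_put_peer(skb, cb, peer) < 0)     /* skb full */
                        break;
next:
                idx++;
        }
        cb->args[0] = idx;      /* resume point for the next invocation */
        return skb->len;        /* non-zero: call me again; 0: dump is done */
}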

* Re: To netlink or not to netlink, that is the question
  2017-01-12 19:11         ` David Miller
@ 2017-01-12 20:07           ` Jason A. Donenfeld
  2017-01-12 20:27             ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Jason A. Donenfeld @ 2017-01-12 20:07 UTC (permalink / raw)
  To: David Miller; +Cc: Dan Williams, Stephen Hemminger, Netdev

On Thu, Jan 12, 2017 at 8:11 PM, David Miller <davem@davemloft.net> wrote:
> This is a fact of life; dumps are always chopped into a suitable
> number of responses as necessary.  We do this for IPv4 routes,
> network interfaces, etc., and it all works out just fine.
>
> The thing you should be asking yourself is: if something as heavily
> used and fundamental as IPv4 can handle this, your scenario can
> probably be handled just fine as well.

Okay, fair enough. I'll suck it up, then, and just use netlink in this
way. David - you concur with Stephen that ioctl is really not okay and
I should absolutely do netlink?

* Re: To netlink or not to netlink, that is the question
  2017-01-12 20:07           ` Jason A. Donenfeld
@ 2017-01-12 20:27             ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2017-01-12 20:27 UTC (permalink / raw)
  To: Jason; +Cc: dcbw, stephen, netdev

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Thu, 12 Jan 2017 21:07:43 +0100

> David - you concur with Stephen that ioctl is really not okay and I
> should absolutely do netlink?

Yes.

* Re: To netlink or not to netlink, that is the question
  2017-01-12 19:02       ` Jason A. Donenfeld
  2017-01-12 19:11         ` David Miller
@ 2017-01-13  7:17         ` Johannes Berg
  2017-01-13 13:36           ` Nicolas Dichtel
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Berg @ 2017-01-13  7:17 UTC (permalink / raw)
  To: Jason A. Donenfeld, Dan Williams; +Cc: Stephen Hemminger, Netdev

On Thu, 2017-01-12 at 20:02 +0100, Jason A. Donenfeld wrote:
> Hi Dan,
> 
> Thanks for your response. I'd thought about this, at least for
> adding/removing wgpeers/wgipmasks and for configuring wgdevices. This
> would fit into multiple smaller messages indeed.
> 
> But what about fetching the list of all existing peers and ipmasks
> atomically? It seems like with multiple calls, if I'm using some kind
> of pagination, things could change in the process. That's why using
> one big buffer was most appealing... Any ideas about this?

In addition to what others have said - netlink typically includes (and
has helpers to do so) a generation counter that's updated whenever this
list changes, and included in each message, so if userspace really
cares (often not) it can retry the dump until the system is idle
enough to get a consistent snapshot.
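
In the code that fills each dumped message, that looks roughly like
this; a sketch, where the family, command, and generation counter are
stand-ins for whatever the driver actually defines:

#include <net/genetlink.h>

static int wg_put_peer(struct sk_buff *skb, struct netlink_callback *cb,
                       struct wg_peer *peer)
{
        void *hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
                                cb->nlh->nlmsg_seq, &wg_genl_family,
                                NLM_F_MULTI, WG_CMD_GET_PEERS);

        if (!hdr)
                return -EMSGSIZE;

        /* bumped by the driver on every configuration change */
        cb->seq = peer->device->config_generation;
        genl_dump_check_consistent(cb, hdr, &wg_genl_family);

        /* nla_put() the peer's attributes and nested ipmasks here */

        genlmsg_end(skb, hdr);
        return 0;
}

If the counter changes between messages of one dump, the core sets
NLM_F_DUMP_INTR on the reply so userspace knows its snapshot is torn
and can redo the request.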

It also looks to me like your existing API isn't even compat-safe due
to u64 alignment (e.g. in wgpeer), proving once again that ioctl is a
bad idea.
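
The classic symptom, with an illustrative struct rather than the actual
one:

#include <linux/types.h>

/* On x86-32 a __u64 is 4-byte aligned, so this struct is 12 bytes with
 * last_handshake_time at offset 4; on x86-64 it is 16 bytes with the
 * field at offset 8.  A 32-bit process and a 64-bit kernel therefore
 * disagree about the layout.  __aligned_u64, explicit padding, or
 * netlink attributes (which carry their own lengths) avoid this. */
struct wgpeer_example {
        __u32 flags;
        __u64 last_handshake_time;
};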

johannes

* Re: To netlink or not to netlink, that is the question
  2017-01-13  7:17         ` Johannes Berg
@ 2017-01-13 13:36           ` Nicolas Dichtel
  0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Dichtel @ 2017-01-13 13:36 UTC (permalink / raw)
  To: Johannes Berg, Jason A. Donenfeld, Dan Williams; +Cc: Stephen Hemminger, Netdev

On 13/01/2017 at 08:17, Johannes Berg wrote:
[snip]
> In addition to what others have said - netlink typically includes (and
> has helpers to do so) a generation counter that's updated whenever this
> list changes, and included in each message, so if userspace really
> cares (often not) it can retry the dump until the system is idle
> enough to get a consistent snapshot.
Look at NLM_F_DUMP_INTR for details.
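
On the userspace side, that amounts to something like this sketch:

#include <linux/netlink.h>
#include <stdbool.h>

/* Walk one recvmsg()'d chunk of a dump and report whether the kernel
 * flagged it as interrupted by a concurrent change. */
static bool dump_was_interrupted(void *buf, int len)
{
        struct nlmsghdr *nlh;

        for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
                if (nlh->nlmsg_flags & NLM_F_DUMP_INTR)
                        return true;    /* redo the whole dump request */
                if (nlh->nlmsg_type == NLMSG_DONE)
                        break;
        }
        return false;
}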

> 
> It also looks to me like your existing API isn't even compat-safe due
> to u64 alignment (e.g. in wgpeer), proving once again that ioctl is a
> bad idea.
+1


Regards,
Nicolas

Thread overview: 10 messages
2017-01-12 13:01 To netlink or not to netlink, that is the question Jason A. Donenfeld
2017-01-12 17:14 ` Stephen Hemminger
2017-01-12 18:28   ` Jason A. Donenfeld
2017-01-12 18:43     ` Dan Williams
2017-01-12 19:02       ` Jason A. Donenfeld
2017-01-12 19:11         ` David Miller
2017-01-12 20:07           ` Jason A. Donenfeld
2017-01-12 20:27             ` David Miller
2017-01-13  7:17         ` Johannes Berg
2017-01-13 13:36           ` Nicolas Dichtel
