All of
 help / color / mirror / Atom feed
From: Florian Westphal <>
To: Mat Martineau <>
Cc: Florian Westphal <>,
Subject: Re: Revised mptcp_info sockopt and selftests
Date: Mon, 2 Aug 2021 16:51:27 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Mat Martineau <> wrote:
> > Here is a minimal usage example to illustrate the feature:
> > 
> > UAPI structures:
> > struct mptcp_info_opt_ptr {
> > 	__aligned_u64	address;			/* userspace address */
> > 	__u32		nmemb_user;			/* number of elements in address[] */
> > 	__u32		size_user;			/* size of one element in address[] */
> > 	__u32		nmemb_kernel;			/* nmemb count kernel wanted to store */
> > 	__u32		size_kernel;			/* structure size in kernel */
> > };
> > 
> Will embedding multiple userspace addresses in this way be a problem for
> security frameworks, in-kernel sockets, BPF_CGROUP_RUN_PROG_SETSOCKOPT, etc?

Yes :-/

I was not aware of BPF_CGROUP_RUN_PROG_SETSOCKOPT, thanks for pointing
this out (in this case, BPF_...GETSOCKOPT, but problem is similar).

So, this approach is dead/cannot be done.


> With offsets, the above code could look something like:
> struct my_info_opt {
> 	struct mptcp_info_opt m; /* Has to be first */
> 	struct mptcp_info minfo;
> 	struct tcp_info tinfo[16];
> 	struct sockaddr_storage locals[16];
> 	struct sockaddr_storage remotes[16];
> } my_info;
> memset(&my_info, 0, sizeof(my_info));
> my_info.num_ptrs = 4;
> my_info.m.mptcp_info.offset = offsetof(my_info.minfo);
> my_info.m.mptcp_info.size_user = sizeof(my_info.minfo);
> my_info.m.tcp_info.offset = offsetof(my_info.tinfo);
> /* size_user / nmemb_user same as above */
> /* and so on for locals and remotes */

This takes away all the flexibility. Userspace would have to
define such struct with locals[256], remotes[256], and so on, i.e.
specify the largest possible value that it might need.

Or, create some macro magic that creates X 'struct my_info_opt_X'
versions; one for each value of X (the size of the sub-arrays).

> I think it's a very flexible approach, my main suggestion is the offset vs.
> address thing above.

See above, I think its cumbersome to use, since you can no longer do
something like

struct mptcp_info_opt x = {};

/* get the required sizes */
getsockopt(... &x);

x.locals.address = calloc(sizeof(struct sockaddr_storage), x.locals.nmemb_kernel);

/* really get the data */
getsockopt(... &x);

What about splitting the functionality in distinct getsockopt

struct mptcp_info_opt {
    __u32           nmemb_user;             /* number of elements stored */
    __u32           size_user;              /* size of one element stored */
    __u32           nmemb_kernel;           /* number of elements kernel wanted to store */
    __u32           size_kernel;            /* structure size in kernel */
    __aligned_u64   data[1];		    /* actual data */

getsockopt(..., MPTCP_GETPEERNAMES, &opt, &optlen);
getsockopt(..., MPTCP_GETSOCKNAMES, &opt, &optlen);

.. but subflows can change between those two :(

That brings us back to the compound structure that kernel has to
wrap both local and remote addresses in a single struct.

The more I think about it, the less ideas i have on how to make
any of this work.

Sctp has connectx and bindx pseudo-syscalls (its really just setsockopts),
those all expect 'struct sockaddr *', with some extra code in the kernel
to guess the correct size of each by looking at ->sa_family.

So, with all of that in mind:
MPTCP_INFO -> gets the mptcp info diag data, with nothing else.  Single struct
that is exported to userspace (already is).

MPTCP_TCP_INFO -> gets array of 'struct tcp_info'.
MPTCP_SUBFLOWS -> gets array of 'struct mptcp_subflow', which has both local+remote sockaddrs.

To make both of those work without a need to guess the individual element size,
it would make sense to prepend struct mptcp_info_opt, i.e.

opt = malloc(sizeof(struct mptcp_info_opt) + sizeof(struct tcp_info) * 16));

opt.nmemb_user = 16;
opt.size_user = sizeof(struct tcp_info);

getsockopt(..., MPTCP_TCP_INFO, &opt, &optlen);

This would still allow to fetch the rquired size, resp. retry if nmemb_kernel > nmemb_user
after the call returns.

> Would it be frowned upon to put some inline helper functions or macros in
> the mptcp.h UAPI to reduce the boilerplate code to populate the
> mptcp_info_opt_ptrs? It looks like some UAPI headers have that, but not
> many.

No idea.  Personally I think the kernel should only
expose enums/defines and, if needed, structure layouts and nothing else.

That said, we could certainly add helpers (e.g. cmsg-alike or whatever)
and expose that with a libmptcp or some such.

  parent reply	other threads:[~2021-08-02 14:51 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-29 11:50 Revised mptcp_info sockopt and selftests Florian Westphal
2021-07-30 23:43 ` Mat Martineau
2021-07-31 16:14   ` Mat Martineau
2021-08-02 14:51   ` Florian Westphal [this message]
2021-08-02 22:23     ` Mat Martineau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.