netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Duncan Roe <duncan_roe@optusnet.com.au>
To: Netfilter Development <netfilter-devel@vger.kernel.org>,
	Pablo Neira Ayuso <pablo@netfilter.org>
Subject: Re: [PATCH libnetfilter_queue 0/3] pktbuff API updates
Date: Thu, 30 Apr 2020 05:10:47 +1000	[thread overview]
Message-ID: <20200429191047.GB3833@dimstar.local.net> (raw)
In-Reply-To: <20200429132840.GA3833@dimstar.local.net>

On Wed, Apr 29, 2020 at 11:28:40PM +1000, Duncan Roe wrote:
> On Wed, Apr 29, 2020 at 12:55:20AM +0200, Pablo Neira Ayuso wrote:
> > On Wed, Apr 29, 2020 at 07:14:52AM +1000, Duncan Roe wrote:
> > > On Tue, Apr 28, 2020 at 12:34:07PM +0200, Pablo Neira Ayuso wrote:
> > [...]
> > > > pktb_alloc2() still has a memcpy which is not needed by people that do
> > > > not need to mangle the packet.
> > >
> > > No it does not. Please look again. There is only a memcpy if the caller
> > > specifies extra > 0, in which case she clearly intends to mangle it (perhaps
> > > depending on its contents).
> >
> > Right, it only happens if extra is specified.
> >
> > +       if (extra) {
> > +               pkt_data = buf;
> > +               memcpy(pkt_data, data, len);
> > +               memset((uint8_t *)pkt_data + len, 0, extra);
> > +       } else {
> > +               pkt_data = data;
> > +       }
> >
> > So buf is only used if extra is specified?
>
> Yes, that's right.
> >
> > > "depending on its contents" is where the memcpy deferral comes in. pktb_alloc2()
> > > verifies that the supplied buffer is big enough (size >= len + extra). The user
> > > declared it as a stack variable that size so it will be. With the deferral
> > > enhancement, pktb_alloc2() records the buffer address and extra in the enlarged
> > > struct pktbuff (extra is needed to tell pktb_mangle how much memory to memset to
> > > 0).
> >
> > I agree that deferring the memcpy() and avoiding the malloc() is the
> > way to go, we only have to agree in the way to achieve this.
> >
> > > If pktb_mangle() finds it has to make the packet larger then its original length
> > > and the packet is still in its original location then copy it and zero extra.
> > > (i.e. pktb_mangle() doesn't just check whether it was asked to make the packet
> > > bigger: it might have previously been asked to make it smaller).
> > >
> > > Also (and this *is* tricky, update relevant pointers in the struct pktbuff).
> > > That invalidates any poiners the caller may have obtained from e.g. pktb_data()
> > > - see end of email.
> >
> > Regarding pktb_mangle() reallocation case, refetching the pointers
> > sounds fine, documenting this is sufficient.
> >
> > [...]
> > > > Revisiting, I would prefer to keep things simple. The caller should
> > > > make sure that pktb_mangle() has a buffer containing enough room. I
> > > > think it's more simple for the caller to allocate a buffer that is
> > > > large enough for any mangling.
>
> I reckon they'll just copy the code from the nfq_nlmsg_verdict_put_pkt() man /
> web page. After declaring "char pktbuf[plen + EXTRA];" one can use "sizeof
> pktbuf" as the length argument.
> > >
> > > Yes it's more complex. No problem with the buffer - the user gave that to
> > > pktb_alloc2().
> >
> > I'm just hesitating about the new pktb_alloc2() approach because it
> > has many parameters, it's just looks a bit complicated to me (this
> > function takes 8 parameters).
>
> It has the original 4 from pktb_alloc() plus 2 {buffer, size} pairs. It could
> have been just one pair, with packet data appended to metadata as in
> pktb_alloc() but I thought it would be really awkward to document how to
> dimension it.
> >
> > If you can just pre-allocate the pkt_buff head from the configuration
> > phase (before receiving packets from the kernel), then attach the
> > buffer via pktb_setup_metadata() for each packet that is received (so
> > the pkt_buff head is recycled). With this approach, pktb_head_size()
> > won't be needed either.
>
> I think we should not be usurping the data pointer of mnl_cb_run(). I can see
> people wanting to use it to pass a pointer to e.g. some kind of database that
> callbacks need to access. There's no performance gain to recycling the buffer:
> the CB doesn't need to call pktb_head_size() on every invocation, that can be
> done once by main() e.g.
>
>  static size_t sizeof_head;
>  ...
>  int main(int argc, char *argv[])
>  {
>  ...
>          sizeof_head = pktb_head_size(); /* Avoid multiple calls in CB */
>  ...
>  static int queue_cb(const struct nlmsghdr *nlh, void *data)
>  {
>          char head[sizeof_head];
>
> >
> > My understanding is that requirements are:
> >
> > * Users that do not want to mangle the packet, they use the buffer
> >   that was passed to recvmsg().
> > * Users that want to mangle the packet, they use the _mangle()
> >   function that might reallocate the data buffer (the one you would
> >   like to have). However, if this data buffer reallocation happens,
> >   then pkt_buff should annotate that this pkt_buff object needs to
> >   release this data buffer from pktb_free() otherwise.
>
> No, there is nothing to release. We told pktb_alloc2() where the buffer was,
> it's on the stack (usually).
> >
> > > Problem is that if mangler moves the packet, then any packet pointer the caller
> > > had is invalid (points to the un-mangled copy). This applies at all levels, e.g.
> > > nfq_udp_get_payload(). There is no way for the mangler functions to address
> > > this: it just has to be highlighted in the documentation.
> >
> > That's fine, this is exactly how the kernel works: if the function
> > might reallocate the data area, then you have to refetch pointers
> > after this. If you teach _mangle() to do reallocations, then
> > documenting this is fine.
> >
> > However, those reallocation need pktb_free() to release that new data
> > buffer, right?
>
> No way. There is no malloc() nor free() anywhere. The data buffer is
> (recommended to be) on the stack; for running under gdb it may be preferred to
> us a static buffer which has to be dimensioned hugely.
> >
> > > Still, I really like the deferred copy enhancement. Your thoughts?
> >
> > The deferred copy idea when mangling sounds fine, we only have to
> > agree on how to get this done.
> >
> > Thanks.
>
> Cheers ... Duncan.

Sorry, I should have explained a bit more how the system would work:

struct pkt_buff has 3 new members:

        bool    copy_done;
        uint32_t extra;
        uint8_t *copy_buf;

When extra > 0, pktb_alloc2 verifies that buflen is >= len + extra. It then
stores extra and copy_buf in pktb, ready for use by pktb_mangle() (all the other
manglers call this eventually).

So that's how pktb_mangle() doesn't need to allocate a buffer.

Cheers ... Duncan.

  parent reply	other threads:[~2020-04-29 19:10 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-26 13:23 [PATCH libnetfilter_queue 0/3] pktbuff API updates Pablo Neira Ayuso
2020-04-26 13:23 ` [PATCH libnetfilter_queue 1/3] pktbuff: add pktb_alloc_head() and pktb_build_data() Pablo Neira Ayuso
2020-04-30  5:41   ` Duncan Roe
2020-04-26 13:23 ` [PATCH libnetfilter_queue 2/3] example: nf-queue: use pkt_buff Pablo Neira Ayuso
2020-05-14  4:35   ` Duncan Roe
2020-05-14  4:35   ` [PATCH libnetfilter_queue 1/1] example: nf-queue: use pkt_buff (updated) Duncan Roe
2020-04-26 13:23 ` [PATCH libnetfilter_queue 3/3] pktbuff: add pktb_reset_network_header() and pktb_set_network_header() Pablo Neira Ayuso
2020-04-27 11:06 ` [PATCH libnetfilter_queue 0/3] pktbuff API updates Duncan Roe
2020-04-27 17:06   ` Pablo Neira Ayuso
2020-04-28  4:33     ` Duncan Roe
2020-04-28 10:34       ` Pablo Neira Ayuso
2020-04-28 21:14         ` Duncan Roe
2020-04-28 22:55           ` Pablo Neira Ayuso
2020-04-29 13:28             ` Duncan Roe
2020-04-29 19:00               ` Pablo Neira Ayuso
2020-04-29 19:54                 ` Duncan Roe
2020-04-29 21:12                   ` Pablo Neira Ayuso
2020-04-29 19:10               ` Duncan Roe [this message]
2020-04-29 19:16                 ` Pablo Neira Ayuso
2020-04-29 20:30                   ` Duncan Roe
2020-04-29 21:05                     ` Pablo Neira Ayuso
2020-04-30  6:34                       ` Duncan Roe
2020-05-02 12:50                         ` Duncan Roe
2020-05-05 12:30                         ` Pablo Neira Ayuso
2020-05-06  0:57                           ` Duncan Roe
2020-05-06  2:39                             ` Duncan Roe
2020-05-08  1:13                           ` Duncan Roe
2020-05-09  8:26                           ` Duncan Roe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200429191047.GB3833@dimstar.local.net \
    --to=duncan_roe@optusnet.com.au \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).