All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Olivier Matz" <olivier.matz@6wind.com>
Cc: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>, <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
Date: Fri, 6 Nov 2020 13:23:42 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35C61400@smartserver.smartshare.dk> (raw)
In-Reply-To: <DM6PR11MB330827B41F3E29DBE495CEBD9AED0@DM6PR11MB3308.namprd11.prod.outlook.com>

> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Friday, November 6, 2020 12:54 PM
> 
> > > > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> > > whatever
> > > > > the
> > > > > > > value
> > > > > > > > > of m->next,
> > > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is
> !=
> > > 1.
> > > > > For
> > > > > > > > > instance in this
> > > > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> > > nb_seg in
> > > > > the
> > > > > > > > > initial m1
> > > > > > > > > > > > > > > >>> segment (this is not required), after
> this
> > > code
> > > > > the
> > > > > > > > > mbuf chain
> > > > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> Freeing this mbuf chain will not
> restore
> > > > > nb_seg=1
> > > > > > > in
> > > > > > > > > the second
> > > > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf
> *m)
> > > > > > > > > > > > > > > >> {
> > > > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the
> if()
> > > > > block,
> > > > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > > > >> Thinking in more generic way, that
> change:
> > > > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example
> and
> > > now
> > > > > it
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The full example also adds a split() to
> break
> > > the
> > > > > > > mbuf
> > > > > > > > > chain
> > > > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> > > would
> > > > > be
> > > > > > > done
> > > > > > > > > for
> > > > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If so, may be the right solution is to care
> > > about
> > > > > > > nb_segs
> > > > > > > > > > > > > > > when next is set to NULL on split? Any
> place
> > > when
> > > > > next
> > > > > > > is
> > > > > > > > > set
> > > > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> > > more
> > > > > > > generic
> > > > > > > > > place.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The problem with that approach is that there
> are
> > > > > already
> > > > > > > > > several
> > > > > > > > > > > > > > existing split() or trim() implementations in
> > > > > different
> > > > > > > DPDK-
> > > > > > > > > based
> > > > > > > > > > > > > > applications. For instance, we have some in
> > > > > 6WINDGate. If
> > > > > > > we
> > > > > > > > > force
> > > > > > > > > > > > > > applications to set nb_seg to 1 when
> resetting
> > > next,
> > > > > it
> > > > > > > has
> > > > > > > > > to be
> > > > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > > > From my perspective it seems natural to reset
> > > nb_seg at
> > > > > > > same
> > > > > > > > > time
> > > > > > > > > > > > > we reset next, otherwise inconsistency will
> occur.
> > > > > > > > > > > >
> > > > > > > > > > > > While it is not explicitly stated for nb_segs, to
> me
> > > it
> > > > > was
> > > > > > > clear
> > > > > > > > > that
> > > > > > > > > > > > nb_segs is only valid in the first segment, like
> for
> > > many
> > > > > > > fields
> > > > > > > > > (port,
> > > > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > > > >
> > > > > > > > > > > > If we say that nb_segs has to be valid in any
> > > segments,
> > > > > it
> > > > > > > means
> > > > > > > > > that
> > > > > > > > > > > > chain() or split() will have to update it in all
> > > > > segments,
> > > > > > > which
> > > > > > > > > is not
> > > > > > > > > > > > efficient.
> > > > > > > > > > >
> > > > > > > > > > > Why in all?
> > > > > > > > > > > We can state that nb_segs on non-first segment
> should
> > > > > always
> > > > > > > equal
> > > > > > > > > 1.
> > > > > > > > > > > As I understand in that case, both split() and
> chain()
> > > have
> > > > > to
> > > > > > > > > update nb_segs
> > > > > > > > > > > only for head mbufs, rest ones will remain
> untouched.
> > > > > > > > > >
> > > > > > > > > > Well, anyway, I think it's strange to have a
> constraint
> > > on m-
> > > > > > > >nb_segs
> > > > > > > > > for
> > > > > > > > > > non-first segment. We don't have that kind of
> constraints
> > > for
> > > > > > > other
> > > > > > > > > fields.
> > > > > > > > >
> > > > > > > > > True, we don't. But this is one of the fields we
> consider
> > > > > critical
> > > > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > > > segments
> > > > > > > will provide any benefits.
> > > > > > >
> > > > > > > It would make this patch unneeded.
> > > > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will
> remain
> > > > > write-
> > > > > > > free.
> > > > > >
> > > > > > I see. Then I agree with Konstantin that alternative
> solutions
> > > should
> > > > > be considered.
> > > > > >
> > > > > > The benefit regarding free()'ing non-segmented mbufs - which
> is a
> > > > > very common operation - certainly outweighs the cost of
> requiring
> > > > > split()/chain() operations to set the new head mbuf's nb_segs =
> 1.
> > > > > >
> > > > > > Nonetheless, the bug needs to be fixed somehow.
> > > > > >
> > > > > > If we can't come up with a better solution that doesn't break
> the
> > > > > ABI, we are forced to accept the patch.
> > > > > >
> > > > > > Unless the techboard accepts to break the ABI in order to
> avoid
> > > the
> > > > > performance cost of this patch.
> > > > >
> > > > > Did someone notice a performance drop with this patch?
> > > > > On my side, I don't see any regression on a L3 use case.
> > > >
> > > > I am afraid that the DPDK performance regression tests are based
> on
> > > TX immediately following RX, so cache misses in TX may go by
> unnoticed
> > > because RX warmed up the cache for TX already. And similarly for RX
> > > reusing mbufs that have been warmed up by the preceding free() at
> TX.
> > > >
> > > > Please consider testing the performance difference with the mbuf
> > > being completely cold at TX, and going completely cold again before
> > > being reused for RX.
> > > >
> > > > >
> > > > > Let's sumarize: splitting a mbuf chain and freeing it causes
> > > subsequent
> > > > > mbuf
> > > > > allocation to return a mbuf which is not correctly initialized.
> > > There
> > > > > are 2
> > > > > options to fix it:
> > > > >
> > > > > 1/ change the mbuf free function (this patch)
> > > > >
> > > > >    - m->nb_segs would behave like many other field: valid in
> the
> > > first
> > > > >      segment, ignored in other segments
> > > > >    - may impact performance (suspected)
> > > > >
> > > > > 2/ change all places where a mbuf chain is split, or trimmed
> > > > >
> > > > >    - m->nb_segs would have a specific behavior: count the
> number of
> > > > >      segments in the first mbuf, should be 1 in the last
> segment,
> > > > >      ignored in other ones.
> > > > >    - no code change in mbuf library, so no performance impact
> > > > >    - need to patch all places where we do a mbuf split or trim.
> > > From
> > > > > afar,
> > > > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > > > applications
> > > > >      may have to be patched (for instance, I already found 3
> places
> > > in
> > > > >      6WIND code base without a deep search).
> > > > >
> > > > > In my opinion, 1/ is better, except we notice a significant
> > > > > performance,
> > > > > because the (implicit) behavior is unchanged.
> > > > >
> > > > > Whatever the solution, some documentation has to be added.
> > > > >
> > > > > Olivier
> > > > >
> > > >
> > > > Unfortunately, I don't think that anything but the first option
> will
> > > go into 20.11 and stable releases of older versions, so I stand by
> my
> > > acknowledgment of the patch.
> > >
> > > If we are affraid about 20.11 performance (it is legitimate, few
> days
> > > before the release), we can target 21.02. After all, everybody
> lives
> > > with this bug since 2017, so there is no urgency. If accepted and
> well
> > > tested, it can be backported in stable branches.
> >
> > +1
> >
> > Good thinking, Olivier!
> 
> Looking at the changes once again, it probably can be reworked a bit:
> 
> -	if (m->next != NULL) {
> -		m->next = NULL;
> -		m->nb_segs = 1;
> -	}
> 
> +	if (m->next != NULL)
> +		m->next = NULL;
> +	if (m->nb_segs != 1)
> +		m->nb_segs = 1;
> 
> That way we add one more condition checking, but I suppose it
> shouldn't be that perf critical.
> That way for direct,non-segmented mbuf it still should be write-free.
> Except cases as you described above: chain(), then split().
> 
> Of-course we still need to do perf testing for that approach too.
> So if your preference it to postpone it till 21.02 - that's ok for me.
> Konstantin

With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.



  reply	other threads:[~2020-11-06 12:23 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
2020-11-05  0:15 ` Ananyev, Konstantin
2020-11-05  7:46   ` Olivier Matz
2020-11-05  8:26     ` Andrew Rybchenko
2020-11-05  9:10       ` Olivier Matz
2020-11-05 11:34         ` Ananyev, Konstantin
2020-11-05 12:31           ` Olivier Matz
2020-11-05 13:14             ` Ananyev, Konstantin
2020-11-05 13:24               ` Olivier Matz
2020-11-05 13:55                 ` Ananyev, Konstantin
2020-11-05 16:30                   ` Morten Brørup
2020-11-05 23:55                     ` Ananyev, Konstantin
2020-11-06  7:52                       ` Morten Brørup
2020-11-06  8:20                         ` Olivier Matz
2020-11-06  8:50                           ` Morten Brørup
2020-11-06 10:04                             ` Olivier Matz
2020-11-06 10:07                               ` Morten Brørup
2020-11-06 11:53                                 ` Ananyev, Konstantin
2020-11-06 12:23                                   ` Morten Brørup [this message]
2020-11-08 14:16                                     ` Andrew Rybchenko
2020-11-08 14:19                                       ` Ananyev, Konstantin
2020-11-10 16:26                                         ` Olivier Matz
2020-11-05  8:33     ` Morten Brørup
2020-11-05  9:03       ` Olivier Matz
2020-11-05  9:09     ` Andrew Rybchenko
2020-11-08  7:25 ` Ali Alnubani
2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
2020-12-18 13:18   ` Morten Brørup
2020-12-18 23:33     ` Ajit Khaparde
2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
2021-01-10  9:28   ` Ali Alnubani
2021-01-11 13:14   ` Ananyev, Konstantin
2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
2021-01-15 18:39     ` Ali Alnubani
2021-01-18 17:52       ` Ali Alnubani
2021-01-19  8:32         ` Olivier Matz
2021-01-19  8:53           ` Morten Brørup
2021-01-19 12:00             ` Ferruh Yigit
2021-01-19 12:27               ` Morten Brørup
2021-01-19 14:03                 ` Ferruh Yigit
2021-01-19 14:21                   ` Morten Brørup
2021-01-21  9:15                     ` Ferruh Yigit
2021-01-19 14:04           ` Slava Ovsiienko
2021-07-24  8:47             ` Thomas Monjalon
2021-07-30 12:36               ` Olivier Matz
2021-07-30 14:35                 ` Morten Brørup
2021-07-30 14:54                   ` Thomas Monjalon
2021-07-30 15:14                     ` Olivier Matz
2021-07-30 15:23                       ` Morten Brørup
2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
2021-08-04 14:25                         ` Ajit Khaparde
2021-08-05  6:08                         ` Morten Brørup
2021-08-06 14:21                           ` Thomas Monjalon
2021-08-06 14:24                             ` Morten Brørup
2021-09-28  8:28                     ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
2021-09-28  9:00                       ` Slava Ovsiienko
2021-09-28  9:25                         ` Ananyev, Konstantin
2021-09-28  9:39                         ` Morten Brørup
2021-09-29  8:03                           ` Ali Alnubani
2021-09-29 21:39                             ` Olivier Matz
2021-09-30 13:29                               ` Ali Alnubani
2021-10-21  8:26                                 ` Thomas Monjalon
2021-01-21  9:19       ` Ferruh Yigit
2021-01-21  9:29         ` Morten Brørup
2021-01-21 16:35           ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
2021-01-23  8:57             ` Morten Brørup
2021-01-25 17:00               ` Brandon Lo
2021-01-25 18:42             ` Ferruh Yigit
2021-06-15 13:56   ` [dpdk-dev] " Morten Brørup
2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
2021-09-30 13:27     ` Ali Alnubani
2021-10-21  9:18     ` David Marchand
2022-07-28 14:06       ` CI performance test results might be misleading Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35C61400@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.