tcp: skb_shift() issues w.r.t. SACK processing

* tcp: skb_shift() issues w.r.t. SACK processing
@ 2012-03-01 21:36 Vijay Subramanian
  2012-03-02 21:57 ` Neal Cardwell
  2012-03-05 10:42 ` Ilpo Järvinen
  0 siblings, 2 replies; 3+ messages in thread
From: Vijay Subramanian @ 2012-03-01 21:36 UTC (permalink / raw)
  To: netdev; +Cc: ilpo.jarvinen, ncardwell

Hi,

I was looking at the skb_shift() function that is called by
tcp_shift_skb_data() to shift bytes from a skb to the previous skb.
This is used to collapse multiple
SACK blocks into a single skb. I noticed the following two things.

1: skb_shift() has the following snippet.
 if (!to ||
           !skb_can_coalesce(tgt, to, skb_frag_page(fragfrom),
                             fragfrom->page_offset)) {
               merge = -1;
       } else {
               merge = to - 1;

               todo -= skb_frag_size(fragfrom);
               if (todo < 0) {
                       if (skb_prepare_for_shift(skb) ||
                           skb_prepare_for_shift(tgt))
                               return 0;

                       /* All previous frag pointers might be stale! */
                       fragfrom = &skb_shinfo(skb)->frags[from];
                       fragto = &skb_shinfo(tgt)->frags[merge];

                       skb_frag_size_add(fragto, shiftlen);
                       skb_frag_size_sub(fragfrom, shiftlen);
                       fragfrom->page_offset += shiftlen;

                       goto onlymerged;
               }

               from++;
       }

The (todo < 0) part is executed when we need to copy less than
skb_frag_size(fragfrom). Essentially, after the bytes have been
shifted, the frag in the original skb will still retain some data.
However, unless I am reading this wrong, the bytes are actually never
shifted. The frag offsets and size are updated and then we jump to
label onlymerged which  updates other counters.
It looks like the data will be lost since it is in neither skb. This
may not matter in normal case since this data has been SACKed by
receiver and will be discarded after cumulative ack.
But in case of receiver SACK reneging, this data may potentially be
resent. A similar case later in the function is treated correctly with
the data actually getting copied.

2: The reason I  was looking at skb_shift() was to understand why some
SACKed skbs were not getting collapsed into the previous skbs by
tcp_shift_skb_data(). This function has the following check.
if (!skb_can_shift(skb))
               goto fallback;
which fails if skb has data in linear part.

Currently skb_shift() does not shift bytes if there is data in linear
part. From skb_shift()
BUG_ON(skb_headlen(skb));       /* Would corrupt stream */

commit f07d960df3 (tcp: avoid frag allocation for small frames) made
it possible for skb to have data in linear part. Such skbs will not be
collapsed into the previous skb. This is not a bug but this behavior
prevents some savings by not allowing collapse of SACKed skbs. Can
anyone clarify why shifting bytes from linear part would corrupt the
stream in current code? Does it make sense to remove this constraint
after
commit f07d960df3?

Thanks,
Vijay

^ permalink raw reply	[flat|nested] 3+ messages in thread