* Re: Suspected renege problem in sctp
@ 2013-01-31  2:58 Vlad Yasevich
  2013-01-31  4:30 ` Roberts, Lee A.
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Vlad Yasevich @ 2013-01-31  2:58 UTC (permalink / raw)
  To: linux-sctp

On 01/30/2013 07:51 PM, Bob Montgomery wrote:
> Vlad,
>
> If you're not the right guy to ask can you let me know who might be?

Hi Bob.  I am still the right person, but it would be even better to
address these types of questions and writeups to linux-sctp@vger.kernel.org.

>
> We've been investigating sctpspray hangs for quite some time.
>
> I think we're seeing a case where a renege operation is removing
> fragments from the ulp reasm queue that are at or below the value
> of cumulative_tsn_ack_point.  We put a BUG statement in
> sctp_tsnmap_renege so we'd crash instead of ignoring this case:
>
>          if (TSN_lt(tsn, map->base_tsn))
>                  return;


Hmm..  Looking at the reneging functions there is no TSN checking at 
all.  You are completely right.  We MUST not renege DATA that has moved
the cumulative_tsn_ack_point.

Adding something like this in sctp_ulpq_renege_list should fix this:

diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index ada1746..9bd94e6 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -970,10 +970,14 @@ static __u16 sctp_ulpq_renege_list(struct sctp_ulpq *ulpq,
         tsnmap = &ulpq->asoc->peer.tsn_map;

         while ((skb = __skb_dequeue_tail(list)) != NULL) {
-               freed += skb_headlen(skb);
                 event = sctp_skb2event(skb);
                 tsn = event->tsn;
-
+
+               /* Make sure we do not renege below CTSN */
+               if (TSN_lte(tsn, sctp_tsnmap_get_ctsn(tsnmap)))
+                       break;
+
+               freed += skb_headlen(skb);
                 sctp_ulpevent_free(event);
                 sctp_tsnmap_renege(tsnmap, tsn);
                 if (freed >= needed)
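
(For reference: TSN_lt()/TSN_lte() above are wrap-safe "serial number"
comparisons, not plain integer compares.  A tiny stand-alone illustration
of the idea, in ordinary user-space C; this is a sketch of the concept,
not the kernel's actual helpers:)

#include <stdio.h>

/* Wrap-safe TSN comparison sketch: 32-bit TSNs wrap around, so "s <= t"
 * is decided by the sign of the 32-bit difference rather than by a
 * plain numeric compare.
 */
static int tsn_lt(unsigned int s, unsigned int t)
{
        return (int)(s - t) < 0;
}

static int tsn_lte(unsigned int s, unsigned int t)
{
        return (int)(s - t) <= 0;
}

int main(void)
{
        /* CTSN from the second dump below; a fragment at the ack point
         * must not be reneged, so the new check breaks out of the loop.
         */
        unsigned int ctsn = 0x936a6d75;

        printf("tsn_lte(0x936a6d75, ctsn) = %d  (at the ack point: keep it)\n",
               tsn_lte(0x936a6d75, ctsn));
        printf("tsn_lte(0x936a6d78, ctsn) = %d  (past the gap: may renege)\n",
               tsn_lte(0x936a6d78, ctsn));

        /* The wrap case a plain "<" would get wrong: 0xfffffffe is
         * logically before 0x00000001 even though it is numerically larger.
         */
        printf("tsn_lt(0xfffffffe, 0x00000001) = %d\n",
               tsn_lt(0xfffffffeu, 0x00000001u));
        return 0;
}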

I think there also might be a bug here when reneging from the ordered 
queue and the message has been reassembled.  I need to look at that a 
bit more.

-vlad

>
> And after two days of sctpspray pounding, hit the bug.
>
> Here's a partial write-up using examples from the core file:
>
> ====================================
> Fact 1:  a renege operation will only be launched if there's a gap in
> the tsn.
>
> So a reassembly queue like this one would not be set upon:
>
> PID 1784
> sctp_association 0xffff88041b6a2000
>
>        tsn_map = 0xffff88041dd8d560,
>        base_tsn = 0x55751715,
>        cumulative_tsn_ack_point = 0x55751714,
>        max_tsn_seen = 0x55751714,
>
> reasm queue summary:
>    ssn = 0x345,   tsn = 0x5575170c,   msg_flags = 0x2,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x5575170d,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x5575170e,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x5575170f,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x55751710,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x55751711,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x55751712,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x55751713,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x345,   tsn = 0x55751714,   msg_flags = 0x0,   rmem_len = 0x69c
>
> No gap: the last tsn in the reasm queue matches cumulative_tsn_ack_point = 0x55751714,
>
>
> In our case, I believe the reasm queue looked like this when the renege launched:
>
> In sctp_association 0xffff88041b845000
>
>        base_tsn = 0x936a6d76,
>        cumulative_tsn_ack_point = 0x936a6d75,
>        max_tsn_seen = 0x936a6d79,
>
>    ssn = 0x0,   tsn = 0x936a6d6f,   msg_flags = 0x2,   rmem_len = 0x69c
>    ssn = 0x0,   tsn = 0x936a6d70,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x0,   tsn = 0x936a6d71,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x0,   tsn = 0x936a6d72,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x0,   tsn = 0x936a6d73,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x0,   tsn = 0x936a6d74,   msg_flags = 0x0,   rmem_len = 0x69c
>    ssn = 0x0,   tsn = 0x936a6d75,   msg_flags = 0x0,   rmem_len = 0x69c
>
>    ssn = 0x0,   tsn = 0x936a6d78,   msg_flags = 0x0,   rmem_len = 0x628
>
> The gap between 75 and 78 meant that a renege could be launched.
>
> It was launched for tsn = 0x936a6d76 (just arriving and apparently out
> of memory), and the "needed" amount was 0x05ac (1452 bytes).
>
> tsn = 0x936a6d78 was removed, and the amount recovered was 0x538 (1336 bytes).
> The value of "rmem_len" in the event is not what is used to calculate needed
> and freed.
>
> Since 0x538 didn't satisfy 0x5ac, it went for the next one down on the queue
> (tsn = 0x936a6d75)
> and recovered 0x5ac from it for a total recovery of 0xae4 (2788 bytes).
> So because the first post-gap fragment happened to be a LAST_FRAG and shorter than
> the rest of them, it wasn't enough to satisfy the request and we moved on
> to the one that caused the BUG.
>
> If there had been two gapped frags, or if the gapped frag had been another
> middle one that was big enough to satisfy the request, it would not have
> been caught freeing a fragment that was at the cumulative tsn ack point.
> ====================================
>
> Since the base_tsn and cumulative_tsn_ack_point are advanced in
> sctp_ulpevent_make_rcvmsg() before putting the fragments on the
> reasm queue, the renege code should not be allowed to dip below
> that point in sctp_ulpq_renege_list().   Otherwise, you're
> discarding undelivered data that you've already reported as
> "delivered" to the sender, right?
>
> Thanks,
> Bob Montgomery
>
>
>
>



* RE: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
@ 2013-01-31  4:30 ` Roberts, Lee A.
  2013-01-31 15:08 ` Vlad Yasevich
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Roberts, Lee A. @ 2013-01-31  4:30 UTC (permalink / raw)
  To: linux-sctp

Vlad,

The test code that I'm running at the moment has changes similar to the following.
I think we want to peek at the tail of the queue---and not dequeue (or unlink) the
data until we're sure we want to renege.

                                                 -- Lee Roberts


# diff -c ulpqueue.c~ ulpqueue.c
*** ulpqueue.c~ 2012-10-09 14:31:35.000000000 -0600
--- ulpqueue.c  2013-01-30 21:22:49.000000000 -0700
***************
*** 963,973 ****

        tsnmap = &ulpq->asoc->peer.tsn_map;

!       while ((skb = __skb_dequeue_tail(list)) != NULL) {
!               freed += skb_headlen(skb);
                event = sctp_skb2event(skb);
                tsn = event->tsn;

                sctp_ulpevent_free(event);
                sctp_tsnmap_renege(tsnmap, tsn);
                if (freed >= needed)
--- 963,976 ----

        tsnmap = &ulpq->asoc->peer.tsn_map;

!       while ((skb = skb_peek_tail(list)) != NULL) {
                event = sctp_skb2event(skb);
                tsn = event->tsn;
+               if (TSN_lte(tsn, sctp_tsnmap_get_ctsn(tsnmap)))
+                       break;

+               __skb_unlink(skb, list);
+               freed += skb_headlen(skb);
                sctp_ulpevent_free(event);
                sctp_tsnmap_renege(tsnmap, tsn);
                if (freed >= needed)
#


* Re: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
  2013-01-31  4:30 ` Roberts, Lee A.
@ 2013-01-31 15:08 ` Vlad Yasevich
  2013-02-04 23:47 ` Bob Montgomery
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Vlad Yasevich @ 2013-01-31 15:08 UTC (permalink / raw)
  To: linux-sctp

On 01/30/2013 11:30 PM, Roberts, Lee A. wrote:
> Vlad,
>
> The test code that I'm running at the moment has changes similar to the following.
> I think we want to peek at the tail of the queue---and not dequeue (or unlink) the
> data until we're sure we want to renege.

You are right.  If Bob can send a signed-off patch to linux-sctp and
netdev, we can get it upstream and into stable releases.

-vlad

>
>                                                  -- Lee Roberts
>
>
> # diff -c ulpqueue.c~ ulpqueue.c
> *** ulpqueue.c~ 2012-10-09 14:31:35.000000000 -0600
> --- ulpqueue.c  2013-01-30 21:22:49.000000000 -0700
> ***************
> *** 963,973 ****
>
>          tsnmap = &ulpq->asoc->peer.tsn_map;
>
> !       while ((skb = __skb_dequeue_tail(list)) != NULL) {
> !               freed += skb_headlen(skb);
>                  event = sctp_skb2event(skb);
>                  tsn = event->tsn;
>
>                  sctp_ulpevent_free(event);
>                  sctp_tsnmap_renege(tsnmap, tsn);
>                  if (freed >= needed)
> --- 963,976 ----
>
>          tsnmap = &ulpq->asoc->peer.tsn_map;
>
> !       while ((skb = skb_peek_tail(list)) != NULL) {
>                  event = sctp_skb2event(skb);
>                  tsn = event->tsn;
> +               if (TSN_lte(tsn, sctp_tsnmap_get_ctsn(tsnmap)))
> +                       break;
>
> +               __skb_unlink(skb, list);
> +               freed += skb_headlen(skb);
>                  sctp_ulpevent_free(event);
>                  sctp_tsnmap_renege(tsnmap, tsn);
>                  if (freed >= needed)
> #
>
>
>
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: Wednesday, January 30, 2013 7:59 PM
> To: Montgomery, Bob
> Cc: Roberts, Lee A.; linux-sctp@vger.kernel.org
> Subject: Re: Suspected renege problem in sctp
>
> On 01/30/2013 07:51 PM, Bob Montgomery wrote:
>> Vlad,
>>
>> If you're not the right guy to ask can you let me know who might be?
>
> Hi Bob.  I am still the right person, but it would be even better to
> address these types of questions and writeups to linux-sctp@vger.kernel.org.
>
>>
>> We've been investigating sctpspray hangs for quite some time.
>>
>> I think we're seeing a case where a renege operation is removing
>> fragments from the ulp reasm queue that are at or below the value
>> of cumulative_tsn_ack_point.  We put a BUG statement in
>> sctp_tsnmap_renege so we'd crash instead of ignoring this case:
>>
>>           if (TSN_lt(tsn, map->base_tsn))
>>                   return;
>
>
> Hmm..  Looking at the reneging functions there is no TSN checking at
> all.  You are completely right.  We MUST not renege DATA that has moved
> the cumulative_tsn_ack_point.
>
> Adding something like this in sctp_ulpq_renege_list should fix this:
>
> diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
> index ada1746..9bd94e6 100644
> --- a/net/sctp/ulpqueue.c
> +++ b/net/sctp/ulpqueue.c
> @@ -970,10 +970,14 @@ static __u16 sctp_ulpq_renege_list(struct sctp_ulpq *ulpq,
>           tsnmap = &ulpq->asoc->peer.tsn_map;
>
>           while ((skb = __skb_dequeue_tail(list)) != NULL) {
> -               freed += skb_headlen(skb);
>                   event = sctp_skb2event(skb);
>                   tsn = event->tsn;
> -
> +
> +               /* Make sure we do not renege below CTSN */
> +               if (TSN_lte(tsn, sctp_tsnmap_get_ctsn(tsnmap)))
> +                       break;
> +
> +               freed += skb_headlen(skb);
>                   sctp_ulpevent_free(event);
>                   sctp_tsnmap_renege(tsnmap, tsn);
>                   if (freed >= needed)
>
> I think there also might be a bug here when reneging from the ordered
> queue and the message has been reassembled.  I need to look at that a
> bit more.
>
> -vlad
>
>>
>> And after two days of sctpspray pounding, hit the bug.
>>
>> Here's a partial write-up using examples from the core file:
>>
>> ====================================
>> Fact 1:  a renege operation will only be launched if there's a gap in
>> the tsn.
>>
>> So a reassembly queue like this one would not be set upon:
>>
>> PID 1784
>> sctp_association 0xffff88041b6a2000
>>
>>         tsn_map = 0xffff88041dd8d560,
>>         base_tsn = 0x55751715,
>>         cumulative_tsn_ack_point = 0x55751714,
>>         max_tsn_seen = 0x55751714,
>>
>> reasm queue summary:
>>     ssn = 0x345,   tsn = 0x5575170c,   msg_flags = 0x2,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x5575170d,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x5575170e,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x5575170f,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x55751710,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x55751711,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x55751712,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x55751713,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x345,   tsn = 0x55751714,   msg_flags = 0x0,   rmem_len = 0x69c
>>
>> No gap: the last tsn in the reasm queue matches cumulative_tsn_ack_point = 0x55751714,
>>
>>
>> In our case, I believe the reasm queue looked like this when the renege launched:
>>
>> In sctp_association 0xffff88041b845000
>>
>>         base_tsn = 0x936a6d76,
>>         cumulative_tsn_ack_point = 0x936a6d75,
>>         max_tsn_seen = 0x936a6d79,
>>
>>     ssn = 0x0,   tsn = 0x936a6d6f,   msg_flags = 0x2,   rmem_len = 0x69c
>>     ssn = 0x0,   tsn = 0x936a6d70,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x0,   tsn = 0x936a6d71,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x0,   tsn = 0x936a6d72,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x0,   tsn = 0x936a6d73,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x0,   tsn = 0x936a6d74,   msg_flags = 0x0,   rmem_len = 0x69c
>>     ssn = 0x0,   tsn = 0x936a6d75,   msg_flags = 0x0,   rmem_len = 0x69c
>>
>>     ssn = 0x0,   tsn = 0x936a6d78,   msg_flags = 0x0,   rmem_len = 0x628
>>
>> The gap between 75 and 78 meant that a renege could be launched.
>>
>> It was launched for tsn = 0x936a6d76 (just arriving and apparently out
>> of memory), and the "needed" amount was 0x05ac (1452 bytes).
>>
>> tsn = 0x936a6d78 was removed, and the amount recovered was 0x538 (1336 bytes).
>> The value of "rmem_len" in the event is not what is used to calculate needed
>> and freed.
>>
>> Since 0x538 didn't satisfy 0x5ac, it went for the next one down on the queue
>> (tsn = 0x936a6d75)
>> and recovered 0x5ac from it for a total recovery of 0xae4 (2788 bytes).
>> So because the first post-gap fragment happened to be a LAST_FRAG and shorter than
>> the rest of them, it wasn't enough to satisfy the request and we moved on
>> to the one that caused the BUG.
>>
>> If there had been two gapped frags, or if the gapped frag had been another
>> middle one that was big enough to satisfy the request, it would not have
>> been caught freeing a fragment that was at the cumulative tsn ack point.
>> ====================================
>>
>> Since the base_tsn and cumulative_tsn_ack_point are advanced in
>> sctp_ulpevent_make_rcvmsg() before putting the fragments on the
>> reasm queue, the renege code should not be allowed to dip below
>> that point in sctp_ulpq_renege_list().   Otherwise, you're
>> discarding undelivered data that you've already reported as
>> "delivered" to the sender, right?
>>
>> Thanks,
>> Bob Montgomery
>>
>>
>>
>>
>



* Re: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
  2013-01-31  4:30 ` Roberts, Lee A.
  2013-01-31 15:08 ` Vlad Yasevich
@ 2013-02-04 23:47 ` Bob Montgomery
  2013-02-05 15:56 ` Vlad Yasevich
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Bob Montgomery @ 2013-02-04 23:47 UTC (permalink / raw)
  To: linux-sctp

On Thu, 2013-01-31 at 10:08 -0500, Vlad Yasevich wrote:
> On 01/30/2013 11:30 PM, Roberts, Lee A. wrote:
> > Vlad,
> >
> > The test code that I'm running at the moment has changes similar to the following.
> > I think we want to peek at the tail of the queue---and not dequeue (or unlink) the
> > data until we're sure we want to renege.
> 
> You are right.  If Bob can send a signed-off patch to linux-sctp and
> netdev, we can get it upstream and into stable releases.
> 
> -vlad

Vlad,

This is just one of many things we suspect, and doesn't explain (or fix)
the hang we're looking at.  Lee and I are working on a list of problems
around renege, tsnmap management, reassembly, and partial delivery mode.

Here's a current favorite potential issue (documented by Lee):

In sctp_ulpq_renege():

        /* If able to free enough room, accept this chunk. */
        if (chunk && (freed >= needed)) {
                __u32 tsn;
                tsn = ntohl(chunk->subh.data_hdr->tsn);
                sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
                sctp_ulpq_tail_data(ulpq, chunk, gfp);

                sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
        }

sctp_tsnmap_mark is called *before* calling sctp_ulpq_tail_data().  But
sctp_ulpq_tail_data can fail to allocate memory and return -ENOMEM.  So
potentially we've marked this tsn as present and then failed to actually
keep it, right?
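
A minimal sketch of the kind of reordering that would close that window.
This is only an illustration, not a tested fix; it assumes
sctp_ulpq_tail_data() marks the TSN itself when it succeeds and returns
-ENOMEM when it cannot allocate:

        /* Sketch only: let the result of sctp_ulpq_tail_data() decide,
         * instead of marking the TSN up front.
         */
        if (chunk && (freed >= needed)) {
                int retval;

                retval = sctp_ulpq_tail_data(ulpq, chunk, gfp);
                if (retval == -ENOMEM) {
                        /* Could not keep the chunk after all; leaving the
                         * TSN unmarked lets the peer retransmit it.
                         */
                        return;
                }

                /* On success the TSN is marked inside
                 * sctp_ulpq_tail_data(), so no sctp_tsnmap_mark() here.
                 */
                sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
        }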


Here's another potential issue:

Since an event in the lobby has a single tsn value, but it might have
been reassembled from several fragments (with sequential tsn's), the
renege_list operation only calls sctp_tsnmap_renege with the single
tsn.  So now I've discarded multiple tsn's worth of data, but only
noted one of them in the map, right??

And another:

Under normal operation, an event that fills a hole in the lobby will
result in a list of events (the new one and sequential ones that had
been waiting in the lobby) being sent to sctp_ulpq_tail_event().  Then
we do this:
         /* Check if the user wishes to receive this event.  */
        if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe))
                goto out_free;

In out_free, we do 
                sctp_queue_purge_ulpevents(skb_list);


So if the first event was a notification that we don't subscribe to,
but the remaining 100 were data, do we really throw out all the
other data with it??

These don't explain my favorite hang either, but I think I'm finally
getting close to that problem.

These things were uncovered while trying to understand this code, and the
fact that we're not testing and debugging on the current kernel is why
we're not sending in any patches yet.  

Thanks for any confirmation or insight you can provide :-)

Bob Montgomery




* Re: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
                   ` (2 preceding siblings ...)
  2013-02-04 23:47 ` Bob Montgomery
@ 2013-02-05 15:56 ` Vlad Yasevich
  2013-02-05 23:56 ` Bob Montgomery
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Vlad Yasevich @ 2013-02-05 15:56 UTC (permalink / raw)
  To: linux-sctp

On 02/04/2013 06:47 PM, Bob Montgomery wrote:
> On Thu, 2013-01-31 at 10:08 -0500, Vlad Yasevich wrote:
>> On 01/30/2013 11:30 PM, Roberts, Lee A. wrote:
>>> Vlad,
>>>
>>> The test code that I'm running at the moment has changes similar to the following.
>>> I think we want to peek at the tail of the queue---and not dequeue (or unlink) the
>>> data until we're sure we want to renege.
>>
>> You are right.  If Bob can send a signed-off patch to linux-sctp and
>> netdev, we can get it upstream and into stable releases.
>>
>> -vlad
>
> Vlad,
>
> This is just one of many things we suspect, and doesn't explain (or fix)
> the hang we're looking at.  Lee and I are working on a list of problems
> around renege, tsnmap management, reassembly, and partial delivery mode.
>
> Here's a current favorite potential issue (documented by Lee):
>
> In sctp_ulpq_renege():
>
>          /* If able to free enough room, accept this chunk. */
>          if (chunk && (freed >= needed)) {
>                  __u32 tsn;
>                  tsn = ntohl(chunk->subh.data_hdr->tsn);
>                  sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
>                  sctp_ulpq_tail_data(ulpq, chunk, gfp);
>
>                  sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
>          }
>
> sctp_tsnmap_mark is called *before* calling sctp_ulpq_tail_data().  But
> sctp_ulpq_tail_data can fail to allocate memory and return -ENOMEM.  So
> potentially we've marked this tsn as present and then failed to actually
> keep it, right?

The sctp_tsnmap_mark() here is not needed since sctp_ulpq_tail_data() 
will mark the TSN properly.

>
>
> Here's another potential issue:
>
> Since an event in the lobby has a single tsn value, but it might have
> been reassembled from several fragments (with sequential tsn's), the
> renege_list operation only calls sctp_tsnmap_renege with the single
> tsn.  So now I've discarded multiple tsn's worth of data, but only
> noted one of them in the map, right??
>

Right.  I noticed this one as well.  Not only do we fail to clean up the 
TSN map but we also do not compute the freed space correctly.  That 
could result in us discarding more data than necessary.

> And another:
>
> Under normal operation, an event that fills a hole in the lobby will
> result in a list of events (the new one and sequential ones that had
> been waiting in the lobby) being sent to sctp_ulpq_tail_event().  Then
> we do this:
>           /* Check if the user wishes to receive this event.  */
>          if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe))
>                  goto out_free;
>
> In out_free, we do
>                  sctp_queue_purge_ulpevents(skb_list);
>
>
> So if the first event was a notification that we don't subscribe to,
> but the remaining 100 were data, do we really throw out all the
> other data with it??

No and for 2 reasons.
   1. sctp_ulpevent_is_enabled only checks for notification events, not 
DATA.
   2. Notification events aren't ordered and are always singular.

So, you will either have all data in the list or a singular notification 
that you don't subscribe to.

-vlad


>
> These don't explain my favorite hang either, but I think I'm finally
> getting close to that problem.
>
> These things were uncovered while trying to understand this code, and the
> fact that we're not testing and debugging on the current kernel is why
> we're not sending in any patches yet.
>
> Thanks for any confirmation or insight you can provide :-)
>
> Bob Montgomery
>
>



* Re: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
                   ` (3 preceding siblings ...)
  2013-02-05 15:56 ` Vlad Yasevich
@ 2013-02-05 23:56 ` Bob Montgomery
  2013-02-06 15:58 ` Vlad Yasevich
  2013-02-06 23:04 ` Bob Montgomery
  6 siblings, 0 replies; 8+ messages in thread
From: Bob Montgomery @ 2013-02-05 23:56 UTC (permalink / raw)
  To: linux-sctp

On Tue, 2013-02-05 at 10:56 -0500, Vlad Yasevich wrote:
> On 02/04/2013 06:47 PM, Bob Montgomery wrote:
> 
> > And another:
> >
> > Under normal operation, an event that fills a hole in the lobby will
> > result in a list of events (the new one and sequential ones that had
> > been waiting in the lobby) being sent to sctp_ulpq_tail_event().  Then
> > we do this:
> >           /* Check if the user wishes to receive this event.  */
> >          if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe))
> >                  goto out_free;
> >
> > In out_free, we do
> >                  sctp_queue_purge_ulpevents(skb_list);
> >
> >
> > So if the first event was a notification that we don't subscribe to,
> > but the remaining 100 were data, do we really throw out all the
> > other data with it??
> 
> No and for 2 reasons.
>    1. sctp_ulpevent_is_enabled only checks for notification events, not 
> DATA.
>    2. Notification events aren't ordered and are always singular.
> 
> So, you will either have all data in the list or a singular notification 
> that you don't subscribe to.
> 
> -vlad

Thanks.  That's the kind of insight I was referring to :-)  I'll take
that worry off my list.

We have two different hangs:

I believe the first one is caused by this code under these conditions:

        /* If able to free enough room, accept this chunk. */
        if (chunk && (freed >= needed)) {
                __u32 tsn;
                tsn = ntohl(chunk->subh.data_hdr->tsn);
                sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
                sctp_ulpq_tail_data(ulpq, chunk, gfp);

                sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
        }



If the reasm queue looks like this after we renege to make room for
LAST N+4:

	FIRST	N
	MIDDLE	N+1
        MIDDLE  N+2
	MIDDLE	N+3   <-- cumulative_tsn_ack
			<-- base_tsn would be N+4

	
	FIRST	N+10

then the call to sctp_ulpq_tail_data(ulpq, chunk, gfp);
will reasm and deliver N,N+1,N+2,N+3,N+4 moving the cumulative_ack
up to N+4 and leaving N+10 as the only item in the reasm queue.

Then we call sctp_ulpq_partial_delivery(ulpq, chunk, gfp);

chunk is ignored (then why is it even there??) and we'll partial-deliver
FIRST N+10, which is scary because it's ahead of the cumulative
tsn_ack.  And then we set pd_mode = 1.

Now FIRST N+5 arrives and is put in the reasm queue by
sctp_ulpq_reasm().  But we're in pd_mode = 1 so we call
sctp_ulpq_retrieve_partial since N+5 is what we're looking
for.  But sctp_ulpq_retrieve_partial() will bail when it sees
N+5 (A FIRST_FRAG) because it is only looking for MIDDLE_FRAG
or LAST_FRAG in the first slot of the reasm queue.

And now I think we're stuck forever.
=======================

The second hang does not involve the reasm queue.  It occurs on a test
where all the events are non-fragmented.  The final state of the
ulpq lobby is this:

SSN   X
SSN   X+1
SSN   X+2
SSN   X+3
...
SSN   X+last

And the Next expected SSN value in ssnmap is X+1.
So we're waiting on X+1, but X is the first item in the queue.

I think that is caused under these conditions:

Say the lobby queue had:

ssn  10
ssn  10  (duplicate)
ssn  11
ssn  12

and we're waiting on ssn 9...

call sctp_ulpq_order with event ssn 9:
ssn_next is incremented to 10
call sctp_ulpq_retrieve_ordered()
start down the list and find 10.
ssn_next is incremented to 11.
grab ssn 10 off the queue, add to event_list and go around.
find 10 again and it's != new ssn_next(11), so break.

Now we're hung forever.
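
A tiny stand-alone simulation of that walk (plain user-space C, only to
illustrate the sequence above; the kernel loop is structured differently
but ends up in the same state):

#include <stdio.h>

/* Lobby holds SSNs 10, 10 (duplicate), 11, 12 and SSN 9 has just been
 * delivered, so the next expected SSN is 10.
 */
int main(void)
{
        unsigned short lobby[] = { 10, 10, 11, 12 };
        int head = 0, n = 4;
        unsigned short ssn_next = 10;

        /* Release events from the lobby while they match ssn_next. */
        while (head < n && lobby[head] == ssn_next) {
                printf("deliver ssn %u\n", lobby[head]);
                ssn_next++;
                head++;
        }

        /* The duplicate 10 stops the walk: we now wait for 11 although 11
         * is sitting right behind the stale duplicate -> hung forever.
         */
        printf("waiting for ssn %u, head of lobby is ssn %u\n",
               ssn_next, lobby[head]);
        return 0;
}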

We built a module with a BUG statement on putting a duplicate
into the lobby and hit it.

The duplicate event was at the end of a group of sequential events,
followed by a gap and then another group of sequential events.
Coincidentally (or not), at the time the duplicate 
was sent to the lobby, it was represented by a lone bit in a
word of the tsnmap:

...
  ssn = 0x30d,
  tsn = 0x5505020f,

  ssn = 0x30e,     <<<<<<< About to insert this one again
  tsn = 0x55050210,

Big actual gap

  ssn = 0x378,
  tsn = 0x5505027a,

  ssn = 0x379,
  tsn = 0x5505027b,
...

tsn_map = 0xffff8807aa430b80,
base_tsn = 0x550501d0,

crash-6.0.8bobm> p/x 0x210-0x1d1
$8 = 0x3f
So 63 (0x3f) + 1 = 64 bits set,
then 106 (0x6a) - 1 = 105 bits clear,
then 12 (0xc) + 1 = 13 bits set.

crash-6.0.5bobm> rd 0xffff8807aa430b80 8
ffff8807aa430b80:  fffffffffffffffe 0000000000000001   ................
ffff8807aa430b90:  007ffc0000000000 0000000000000000   ................

fffffffffffffffe   1 off, 63 on
0000000000000001   1 on , 63 off
007ffc0000000000   42 off,  13 on

The lone bit in the second word describes tsn 0x55050210, our duplicate.
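
(A quick stand-alone check of that bit accounting, reading each 64-bit
word LSB-first as in the hand count above; plain user-space C, for
verification only:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* First three words of the captured tsn_map. */
        uint64_t map[] = { 0xfffffffffffffffeULL, 0x0000000000000001ULL,
                           0x007ffc0000000000ULL };
        int prev = -1, run = 0;

        for (int w = 0; w < 3; w++) {
                for (int i = 0; i < 64; i++) {
                        int bit = (map[w] >> i) & 1;

                        if (bit != prev && prev >= 0) {
                                printf("%3d bits %s\n", run, prev ? "set" : "clear");
                                run = 0;
                        }
                        prev = bit;
                        run++;
                }
        }
        printf("%3d bits %s\n", run, prev ? "set" : "clear");
        /* Prints: 1 clear, 64 set, 105 clear, 13 set, 9 clear. */
        return 0;
}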


It might be a 1 in 64 coincidence that our duplicate was the LSB of an
otherwise empty 64-bit map word(?) or it might be some map manipulation
off by 1 error that maybe cleared the bit of an event we already had,
and let the state machine pass another copy??

We need to capture another example of this duplicate entry.

More later,
Bob Montgomery







* Re: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
                   ` (4 preceding siblings ...)
  2013-02-05 23:56 ` Bob Montgomery
@ 2013-02-06 15:58 ` Vlad Yasevich
  2013-02-06 23:04 ` Bob Montgomery
  6 siblings, 0 replies; 8+ messages in thread
From: Vlad Yasevich @ 2013-02-06 15:58 UTC (permalink / raw)
  To: linux-sctp

On 02/05/2013 06:56 PM, Bob Montgomery wrote:
> On Tue, 2013-02-05 at 10:56 -0500, Vlad Yasevich wrote:
>> On 02/04/2013 06:47 PM, Bob Montgomery wrote:
>>
>>> And another:
>>>
>>> Under normal operation, an event that fills a hole in the lobby will
>>> result in a list of events (the new one and sequential ones that had
>>> been waiting in the lobby) being sent to sctp_ulpq_tail_event().  Then
>>> we do this:
>>>            /* Check if the user wishes to receive this event.  */
>>>           if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe))
>>>                   goto out_free;
>>>
>>> In out_free, we do
>>>                   sctp_queue_purge_ulpevents(skb_list);
>>>
>>>
>>> So if the first event was a notification that we don't subscribe to,
>>> but the remaining 100 were data, do we really throw out all the
>>> other data with it??
>>
>> No and for 2 reasons.
>>     1. sctp_ulpevent_is_enabled only checks for notification events, not
>> DATA.
>>     2. Notification events aren't ordered and are always singular.
>>
>> So, you will either have all data in the list or a singular notification
>> that you don't subscribe to.
>>
>> -vlad
>
> Thanks.  That's the kind of insight I was referring to :-)  I'll take
> that worry off my list.
>
> We have two different hangs:
>
> I believe the first one is caused by this code under these conditions:
>
>          /* If able to free enough room, accept this chunk. */
>          if (chunk && (freed >= needed)) {
>                  __u32 tsn;
>                  tsn = ntohl(chunk->subh.data_hdr->tsn);
>                  sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
>                  sctp_ulpq_tail_data(ulpq, chunk, gfp);
>
>                  sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
>          }
>
>
>
> If the reasm queue looks like this after we renege to make room for
> LAST N+4:
>
> 	FIRST	N
> 	MIDDLE	N+1
>          MIDDLE  N+2
> 	MIDDLE	N+3   <-- cumulative_tsn_ack
> 			<-- base_tsn would be N+4
>
> 	
> 	FIRST	N+10
>
> then the call to sctp_ulpq_tail_data(ulpq, chunk, gfp);
> will reasm and deliver N,N+1,N+2,N+3,N+4 moving the cumulative_ack
> up to N+4 and leaving N+10 as the only item in the reasm queue.
>
> Then we call sctp_ulpq_partial_delivery(ulpq, chunk, gfp);
>
> chunk is ignored (then why is it even there??) and we'll partial-deliver
> FIRST N+10, which is scary because it's ahead of the cumulative
> tsn_ack.  And then we set pd_mode = 1.
>
> Now FIRST N+5 arrives and is put in the reasm queue by
> sctp_ulpq_reasm().  But we're in pd_mode = 1 so we call
> sctp_ulpq_retrieve_partial since N+5 is what we're looking
> for.  But sctp_ulpq_retrieve_partial() will bail when it sees
> N+5 (A FIRST_FRAG) because it is only looking for MIDDLE_FRAG
> or LAST_FRAG in the first slot of the reasm queue.
>

This is an interesting find.  Seems to me that 
sctp_ulpq_retrieve_first() should make sure that we don't start pulling
fragments off the reassembly queue with TSN > Cumulative Ack Point 
(essentially no grabbing tsns that are after the gap).

This seems to be a Day-1 bug in that code, as it blindly assumes that the 
next first fragment is the right one to start with.  The same problem 
also exists in sctp_ulpq_retrieve_partial().  There is no guarantee that 
the first fragment on the queue has a TSN that is before the gap, so it 
might be possible to get into the exact same situation with middle/last 
fragments as well.
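
Roughly, the guard could look like the sketch below.  The helper name and
placement are hypothetical and this is not the actual upstream fix, just
an illustration of the check being described:

/* Sketch of the guard described above: before sctp_ulpq_retrieve_first()
 * or sctp_ulpq_retrieve_partial() starts pulling fragments, make sure the
 * head of the reassembly queue is not already past the cumulative TSN
 * ack point, i.e. not sitting on the far side of a gap.
 */
static int sctp_ulpq_reasm_head_before_gap(struct sctp_ulpq *ulpq)
{
        struct sk_buff *skb = skb_peek(&ulpq->reasm);
        __u32 ctsn;

        if (!skb)
                return 0;

        ctsn = sctp_tsnmap_get_ctsn(&ulpq->asoc->peer.tsn_map);

        /* Only a fragment at or immediately after the ack point may start
         * a reassembly or partial delivery; anything larger is after the
         * gap and must not be delivered yet.
         */
        return TSN_lte(sctp_skb2event(skb)->tsn, ctsn + 1);
}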

> And now I think we're stuck forever.
> =======================
>
> The second hang does not involve the reasm queue.  It occurs on a test
> where all the events are non-fragmented.  The final state of the
> ulpq lobby is this:
>
> SSN   X
> SSN   X+1
> SSN   X+2
> SSN   X+3
> ...
> SSN   X+last
>
> And the Next expected SSN value in ssnmap is X+1.
> So we're waiting on X+1, but X is the first item in the queue.
>
> I think that is caused under these conditions:
>
> Say the lobby queue had:
>
> ssn  10
> ssn  10  (duplicate)
> ssn  11
> ssn  12
>
> and we're waiting on ssn 9...

Hmm..  This is very strange.  We should never accept duplicate SSNs as 
they should also have duplicate TSNs as well.

>
> call sctp_ulpq_order with event ssn 9:
> ssn_next is incremented to 10
> call sctp_ulpq_retrieve_ordered()
> start down the list and find 10.
> ssn_next is incremented to 11.
> grab ssn 10 off the queue, add to event_list and go around.
> find 10 again and it's != new ssn_next(11), so break.
>
> Now we're hung forever.
>
> We built a module with a BUG statement on putting a duplicate
> into the lobby and hit it.
>
> The duplicate event was at the end of a group of sequential events,
> followed by a gap and then another group of sequential events.
> Coincidentally (or not), at the time the duplicate
> was sent to the lobby, it was represented by a lone bit in a
> word of the tsnmap:
>
> ...
>    ssn = 0x30d,
>    tsn = 0x5505020f,
>
>    ssn = 0x30e,     <<<<<<< About to insert this one again
>    tsn = 0x55050210,
>
> Big actual gap
>
>    ssn = 0x378,
>    tsn = 0x5505027a,
>
>    ssn = 0x379,
>    tsn = 0x5505027b,
> ...
>
> tsn_map = 0xffff8807aa430b80,
> base_tsn = 0x550501d0,
>
> crash-6.0.8bobm> p/x 0x210-0x1d1
> $8 = 0x3f
> So 63 (0x3f) + 1 = 64 bits set,
> then 106 (0x6a) - 1 = 105 bits clear,
> then 12 (0xc) + 1 = 13 bits set.
>
> crash-6.0.5bobm> rd 0xffff8807aa430b80 8
> ffff8807aa430b80:  fffffffffffffffe 0000000000000001   ................
> ffff8807aa430b90:  007ffc0000000000 0000000000000000   ................
>
> fffffffffffffffe   1 off, 63 on
> 0000000000000001   1 on , 63 off
> 007ffc0000000000   42 off,  13 on

This seems like there are 2 gaps in the map.  The first gap is at CTSN+1 
(since bit 0 in the map is not set).  You appear to be filling the 
second, very large gap, which is OK.  Can you look through the ulpq to see
which other TSN had the SSN 0x30e?  It should still be in the queue.

Thanks
-vlad
>
> The lone bit in the second word describes tsn 0x55050210, our duplicate.
>
>
> It might be a 1 in 64 coincidence that our duplicate was the LSB of an
> otherwise empty 64-bit map word(?) or it might be some map manipulation
> off by 1 error that maybe cleared the bit of an event we already had,
> and let the state machine pass another copy??
>
> We need to capture another example of this duplicate entry.
>
> More later,
> Bob Montgomery
>
>
>
>
>



* Re: Suspected renege problem in sctp
  2013-01-31  2:58 Suspected renege problem in sctp Vlad Yasevich
                   ` (5 preceding siblings ...)
  2013-02-06 15:58 ` Vlad Yasevich
@ 2013-02-06 23:04 ` Bob Montgomery
  6 siblings, 0 replies; 8+ messages in thread
From: Bob Montgomery @ 2013-02-06 23:04 UTC (permalink / raw)
  To: linux-sctp

On Wed, 2013-02-06 at 10:58 -0500, Vlad Yasevich wrote:
> On 02/05/2013 06:56 PM, Bob Montgomery wrote:

> This is an interesting find.  Seems to me that 
> sctp_ulpq_retrieve_first() should make sure that we don't start pulling
> fragments off the reassembly queue with TSN > Cumulative Ack Point 
> (essentially no grabbing tsns that are after the gap).
> 
> This seems to be a Day-1 bug in that code, as it blindly assumes that the 
> next first fragment is the right one to start with.  The same problem 
> also exists in sctp_ulpq_retrieve_partial().  There is no guarantee that 
> the first fragment on the queue has a TSN that is before the gap, so it 
> might be possible to get into the exact same situation with middle/last 
> fragments as well.

Lee Roberts is testing some ideas for fixing this reasm queue hang.

> > =======================
> >
> > The second hang does not involve the reasm queue.  It occurs on a test
> > where all the events are non-fragmented.  The final state of the
> > ulpq lobby is this:
> >
> > SSN   X
> > SSN   X+1
> > SSN   X+2
> > SSN   X+3
> > ...
> > SSN   X+last
> >
> > And the Next expected SSN value in ssnmap is X+1.
> > So we're waiting on X+1, but X is the first item in the queue.
> >
> > I think that is caused under these conditions:
> >
> > Say the lobby queue had:
> >
> > ssn  10
> > ssn  10  (duplicate)
> > ssn  11
> > ssn  12
> >
> > and we're waiting on ssn 9...
> 
> Hmm..  This is very strange.  We should never accept duplicate SSNs as 
> they should also have duplicate TSNs as well.

I believe we accepted it because when the second one came in, we didn't
think it was a duplicate.  In other words, I believe this problem is
caused by a problem in tsnmap management.

> >
> > call sctp_ulpq_order with event ssn 9:
> > ssn_next is incremented to 10
> > call sctp_ulpq_retrieve_ordered()
> > start down the list and find 10.
> > ssn_next is incremented to 11.
> > grab ssn 10 off the queue, add to event_list and go around.
> > find 10 again and it's != new ssn_next(11), so break.
> >
> > Now we're hung forever.
> >
> > We built a module with a BUG statement on putting a duplicate
> > into the lobby and hit it.
> >
> > The duplicate event was at the end of a group of sequential events,
> > followed by a gap and then another group of sequential events.
> > Coincidentally (or not), at the time the duplicate
> > was sent to the lobby, it was represented by a lone bit in a
> > word of the tsnmap:
> >
> > ...
> >    ssn = 0x30d,
> >    tsn = 0x5505020f,
> >
> >    ssn = 0x30e,     <<<<<<< About to insert this one again
> >    tsn = 0x55050210,
> >
> > Big actual gap
> >
> >    ssn = 0x378,
> >    tsn = 0x5505027a,
> >
> >    ssn = 0x379,
> >    tsn = 0x5505027b,
> > ...
> >
> > tsn_map = 0xffff8807aa430b80,
> > base_tsn = 0x550501d0,
> >
> > crash-6.0.8bobm> p/x 0x210-0x1d1
> > $8 = 0x3f
> > So 63 (0x3f) + 1 = 64 bits set,
> > then 106 (0x6a) - 1 = 105 bits clear,
> > then 12 (0xc) + 1 = 13 bits set.
> >
> > crash-6.0.5bobm> rd 0xffff8807aa430b80 8
> > ffff8807aa430b80:  fffffffffffffffe 0000000000000001   ................
> > ffff8807aa430b90:  007ffc0000000000 0000000000000000   ................
> >
> > fffffffffffffffe   1 off, 63 on
> > 0000000000000001   1 on , 63 off
> > 007ffc0000000000   42 off,  13 on
> 

That first one has to be there or the map will be updated and shifted.

> This seems like there are 2 gaps in the map.  The first gap is at CTSN+1 
> (since bit 0 in the map is not set).  You appear to be filling the 
> second, very large gap, which is OK.  Can you look through the ulpq to see
> which other TSN had the SSN 0x30e?  It should still be in the queue.

There is no different TSN with SSN 0x30e.  There is a copy of the
*same* TSN with SSN 0x30e.


Here is a dump of the two event structs and of their surrounding
sk_buffs. Check the date codes in the sk_buffs and check the head and
data pointers.

I think these are distinct in memory, and I think that means that we 
accepted the second one off the wire (probably) after sending a SACK
that said we needed it to be resent (because the first one's bit 
was missing from the tsnmap).

The duplicate that we got caught trying to insert in the lobby list:

crash-6.0.5bobm> sctp_ulpevent ffff8807a6b8dee8
struct sctp_ulpevent {
  asoc = 0xffff8807a60d5000, 
  stream = 0x0, 
  ssn = 0x30e, 
  flags = 0x0, 
  ppid = 0x0, 
  tsn = 0x55050210, 
  cumtsn = 0x0, 
  msg_flags = 0x83, 
  iif = 0x8, 
  rmem_len = 0x199
}

Enclosing sk_buff (starts 0x28 below the event):

crash-6.0.5bobm> struct sk_buff ffff8807a6b8dec0
struct sk_buff {
  next = 0xffff88082fcc3800, 
  prev = 0xffff88082fcc3800, 
  tstamp = {
    tv64 = 0x12ceda15c5be1876    <<<<<
  }, 
  sk = 0xffff8807a606c100, 
  dev = 0xffff88080bcf4000, 
  cb = "\000P\r\246\a\210\377\377\000\000\016\003\000\000\000\000\000
\000\000\000\020\002\005U\000\000\000\000\203\000\000\000\b\000\000\000
\231\001\000\000\000\000\000\000\000\000\000", 
  _skb_refdst = 0xffff88082b91ec81, 
  sp = 0x0, 
  len = 0xa9, 
  data_len = 0x0, 
  mac_len = 0xe, 
  hdr_len = 0x0, 
  {
    csum = 0x0, 
    {
      csum_start = 0x0, 
      csum_offset = 0x0
    }
  }, 
  priority = 0x0, 
  local_df = 0x0, 
  cloned = 0x1, 
  ip_summed = 0x0, 
  nohdr = 0x0, 
  nfctinfo = 0x0, 
  pkt_type = 0x0, 
  fclone = 0x0, 
  ipvs_property = 0x0, 
  peeked = 0x0, 
  nf_trace = 0x0, 
  protocol = 0x8, 
  destructor = 0xffffffffa02a663f <sctp_sock_rfree>, 
  nfct = 0x0, 
  nfct_reasm = 0x0, 
  nf_bridge = 0x0, 
  skb_iif = 0x8, 
  tc_index = 0x0, 
  tc_verd = 0x0, 
  rxhash = 0x0, 
  queue_mapping = 0x0, 
  ndisc_nodetype = 0x0, 
  ooo_okay = 0x0, 
  l4_rxhash = 0x0, 
  dma_cookie = 0x0, 
  secmark = 0x0, 
  {
    mark = 0x0, 
    dropcount = 0x0, 
    avail_size = 0x0
  }, 
  vlan_tci = 0x0, 
  transport_header = 0x62, 
  network_header = 0x4e, 
  mac_header = 0x40, 
  tail = 0x127, 
  end = 0x680, 
  head = 0xffff8807a603f800 "",    <<<<<<<<<<<<<<<<
  data = 0xffff8807a603f87e "",    <<<<<<<<<<<<<<<<
  truesize = 0x900, 
  users = {
    counter = 0x1
  }
}

========================
From the lobby list:

crash-6.0.5bobm> sctp_ulpevent ffff8807a6162ae8
struct sctp_ulpevent {
  asoc = 0xffff8807a60d5000,
  stream = 0x0,
  ssn = 0x30e,
  flags = 0x0,
  ppid = 0x0,
  tsn = 0x55050210,
  cumtsn = 0x0,
  msg_flags = 0x83,
  iif = 0x8,
  rmem_len = 0x199
}

Enclosing sk_buff:


crash-6.0.8bobm> struct sk_buff ffff8807a6162ac0
struct sk_buff {
  next = 0xffff8807a60b5e80, 
  prev = 0xffff8807a6162bc0, 
  tstamp = {
    tv64 = 0x12ceda15c58eb79b    <<<<<<<<<<<<
  }, 
  sk = 0xffff8807a606c100, 
  dev = 0xffff88080bcf4000, 
  cb = "\000P\r\246\a\210\377\377\000\000\016\003\000\000\000\000\000
\000\000\000\020\002\005U\000\000\000\000\203\000\000\000\b\000\000\000
\231\001\000\000\000\000\000\000\000\000\000", 
  _skb_refdst = 0xffff88082b91ec81, 
  sp = 0x0, 
  len = 0xa9, 
  data_len = 0x0, 
  mac_len = 0xe, 
  hdr_len = 0x0, 
  {
    csum = 0x0, 
    {
      csum_start = 0x0, 
      csum_offset = 0x0
    }
  }, 
  priority = 0x0, 
  local_df = 0x0, 
  cloned = 0x1, 
  ip_summed = 0x0, 
  nohdr = 0x0, 
  nfctinfo = 0x0, 
  pkt_type = 0x0, 
  fclone = 0x0, 
  ipvs_property = 0x0, 
  peeked = 0x0, 
  nf_trace = 0x0, 
  protocol = 0x8, 
  destructor = 0xffffffffa02a663f <sctp_sock_rfree>, 
  nfct = 0x0, 
  nfct_reasm = 0x0, 
  nf_bridge = 0x0, 
  skb_iif = 0x8, 
  tc_index = 0x0, 
  tc_verd = 0x0, 
  rxhash = 0x0, 
  queue_mapping = 0x0, 
  ndisc_nodetype = 0x0, 
  ooo_okay = 0x0, 
  l4_rxhash = 0x0, 
  dma_cookie = 0x0, 
  secmark = 0x0, 
  {
    mark = 0x0, 
    dropcount = 0x0, 
    avail_size = 0x0
  }, 
  vlan_tci = 0x0, 
  transport_header = 0x62, 
  network_header = 0x4e, 
  mac_header = 0x40, 
  tail = 0x58f, 
  end = 0x680, 
  head = 0xffff8807a60cd000 "",     <<<<<<<<<<
  data = 0xffff8807a60cd4e6 "",     <<<<<<<<<<
  truesize = 0x900, 
  users = {
    counter = 0x1
  }
}


And here is a result of a search through memory for the portion
of an sk_buff that includes the time stamp, sk, dev, asoc, and 
ssn for matches to this sk and asoc with SSN 0x30n.  I sorted by 
time stamps so you can see when the bulk of the SSN 30n pieces
came in and then you can see the much later arrival of the 
duplicate SSN 30e. 

Searching for all the ssn 30X values and (by hand) sorting by timestamp:



crash-6.0.5bobm> search -k -q skip 0xffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30MMMMM

ffff8807a686b5d0:
    12ceda15c58e649a ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3000000

ffff8807a60dc290: 
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3060000
ffff8807a60dc690:
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3070000
ffff8807a60dc790:
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3050000
ffff8807a60dc890:
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3020000
ffff8807a60dca90:
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3030000
ffff8807a60dce90:
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3040000
ffff8807a686bed0:
    12ceda15c58e7ece ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3010000

ffff8807a6162ad0:
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30e0000   <<<<<<<
ffff8807a6162bd0:
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30d0000
ffff8807a66033d0:
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3080000
ffff88080d7b3390:
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30b0000
ffff88080d7b3490:
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30a0000
ffff88080d7b3590:
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 3090000
ffff88080d7b3790: 
    12ceda15c58eb79b ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30c0000

ffff8807a6b8ded0:
    12ceda15c5be1876 ffff8807a606c100 ffff88080bcf4000 ffff8807a60d5000 30e0000  <<<<<<<

Output columns:
    sk_buff.tstamp   sk_buff.sk       sk_buff.dev      sk_buff.cb: asoc [stream, ssn, flags, ppid]

This last one at timestamp 12ceda15c5be1876 is the duplicate 30e. 

Bob Montgomery 



