From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Ibanez Subject: Re: Linux ECN Handling Date: Mon, 1 Jan 2018 23:43:36 -0800 Message-ID: References: <20171019124312.GE16796@breakpoint.cc> <5A006CF6.1020608@iogearbox.net> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Cc: Eric Dumazet , Yuchung Cheng , Daniel Borkmann , Netdev , Florian Westphal , Mohammad Alizadeh , Lawrence Brakmo To: Neal Cardwell Return-path: Received: from mx0a-00000d04.pphosted.com ([148.163.149.245]:49116 "EHLO mx0a-00000d04.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750979AbeABHno (ORCPT ); Tue, 2 Jan 2018 02:43:44 -0500 Received: from pps.filterd (m0102886.ppops.net [127.0.0.1]) by mx0a-00000d04.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id w027gOtU008177 for ; Mon, 1 Jan 2018 23:43:44 -0800 Received: from mx0a-00000d03.pphosted.com (mx0a-00000d03.pphosted.com [148.163.149.244]) by mx0a-00000d04.pphosted.com with ESMTP id 2f85reg0de-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 01 Jan 2018 23:43:44 -0800 Received: from pps.filterd (m0102880.ppops.net [127.0.0.1]) by mx0a-00000d03.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id w027dlj8020795 for ; Mon, 1 Jan 2018 23:43:43 -0800 Received: from codegreen6.stanford.edu (codegreen6.stanford.edu [171.67.224.8]) by mx0a-00000d03.pphosted.com with ESMTP id 2f82wqrm3b-1 (version=TLSv1 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 01 Jan 2018 23:43:43 -0800 Received: from codegreen6.stanford.edu (localhost.localdomain [127.0.0.1]) by codegreen6.stanford.edu (Postfix) with ESMTP id 7E3633E for ; Mon, 1 Jan 2018 23:43:43 -0800 (PST) Received: from smtp.stanford.edu (smtp1.stanford.edu [171.67.219.81]) by codegreen6.stanford.edu (Postfix) with ESMTP id 665BF3E for ; Mon, 1 Jan 2018 23:43:43 -0800 (PST) Received: from mail-io0-f172.google.com (mail-io0-f172.google.com [209.85.223.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: sibanez) by smtp.stanford.edu (Postfix) with ESMTPSA id 4AFBB4EE79 for ; Mon, 1 Jan 2018 23:43:38 -0800 (PST) Received: by mail-io0-f172.google.com with SMTP id w188so3712032iod.10 for ; Mon, 01 Jan 2018 23:43:38 -0800 (PST) In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: Hi Neal, Apologies for the delay, and happy new year! To answer your question, data is only transferred in one direction (from the client to the server). The SeqNo in the pkts from the server to the client is not incremented. So I don't think that a data pkt is attempted to be sent in the tcp_data_snd_check() call clearing the ICSK_ACK_SCHED bit. Although I think it would be helpful to include your new debugging field in the tcp_sock (tp->processing_cwr) so that I can check this field in the tcp_transmit_skb() and tcp_send_ack() functions. I added the new field and tried to set it at the top of the tcp_rcv_established(), but then when I try to check the field in the tcp_send_ack() function it never appears to be set. Below I'm showing how I set the tp->processing_cwr field in the tcp_rcv_established function and how I check it in the tcp_send_ack function. Is this how you were imagining the processing_cwr field to be used? void tcp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th, unsigned int len) { struct tcp_sock *tp = tcp_sk(sk); + struct inet_sock *inet = inet_sk(sk); + + // SI: Debugging TCP ECN handeling + if (tcp_hdr(skb)->cwr) { + tp->processing_cwr = 1; + } else { + tp->processing_cwr = 0; + } tcp_mstamp_refresh(tp); if (unlikely(!sk->sk_rx_dst)) void tcp_send_ack(struct sock *sk) { struct sk_buff *buff; + // SI: Debugging TCP ECN haneling + const struct tcp_sock *tp = tcp_sk(sk); + struct inet_sock *inet = inet_sk(sk); + + // SI: Debugging TCP ECN handeling + if ((sk->sk_family == AF_INET) && tp->processing_cwr) { + printk("tcp_debug: tcp_send_ack: %pI4/%u - CWR set and rcv_nxt=%u, snd_una=%u\n", + &inet->inet_daddr, ntohs(inet->inet_sport), tp->rcv_nxt, tp->snd_una); + } /* If we have been reset, we may not send again. */ if (sk->sk_state == TCP_CLOSE) Thanks! -Steve On Wed, Dec 20, 2017 at 12:17 PM, Neal Cardwell wrote: > On Wed, Dec 20, 2017 at 2:20 PM, Steve Ibanez wrote: >> >> Hi Neal, >> >> I added in some more printk statements and it does indeed look like >> all of these calls you listed are being invoked successfully. I guess >> this isn't too surprising given what the inet_csk_schedule_ack() and >> inet_csk_ack_scheduled() functions are doing: >> >> static inline void inet_csk_schedule_ack(struct sock *sk) >> { >> inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_SCHED; >> } >> >> static inline int inet_csk_ack_scheduled(const struct sock *sk) >> { >> return inet_csk(sk)->icsk_ack.pending & ICSK_ACK_SCHED; >> } >> >> So through the code path that you listed, the inet_csk_schedule_ack() >> function sets the ICSK_ACK_SCHED bit and then the tcp_ack_snd_check() >> function just checks that the ICSK_ACK_SCHED bit is indeed set. >>inet_csk_schedule_ack >> Do you know how I can verify that setting the ICSK_ACK_SCHED bit >> actually results in an ACK being sent? > > Hmm. I don't think in this case we can verify that setting the > ICSK_ACK_SCHED bit actually results in an ACK being sent. Because > AFAICT in this case it seems like an ACK is not sent. :-) This is > based on both the tcpdumps on Dec 5 and your detective work yesterday > ("The tcp_rcv_established() function calls tcp_ack_snd_check() at the > end of step5 and then the return statement indicated below is invoked, > which prevents the __tcp_ack_snd_check() function from running.") > > So AFAICT the puzzle is: how is the icsk_ack.pending ICSK_ACK_SCHED > bit being cleared between the inet_csk_schedule_ack() call and the > tcp_ack_snd_check() call, without (apparently) an actual ACK being > sent on the wire? > > AFAICT the ICSK_ACK_SCHED bit is not supposed to be cleared unless we > get to this sequence: > > tcp_transmit_skb() > if (likely(tcb->tcp_flags & TCPHDR_ACK)) > tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); > -> inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); > icsk->icsk_ack.blocked = icsk->icsk_ack.pending = 0; > > I don't have a theory that fits all of those data points, unless this > is a bi-directional transfer (is it?) and between the > inet_csk_schedule_ack() call and the tcp_ack_snd_check() call the TCP > connection sends a data packet (in tcp_data_snd_check()) and then it > is dropped for some reason before the packet make it to the tcpdump > sniffing point. Perhaps because of a qdisc limit or something? > > I guess a possible next step would be, while processing an incoming > skb with the cwr bit set, the code could set a new debugging field in > the tcp_sock (tp->processing_cwr), and then you could check this field > in tcp_transmit_skb() and printk if (1) there is an attempted > queue_xmit() cal and (2) if the queue_xmit() fails (the err > 0 case). > > That's a long shot, but the only idea I have at this point. > > thanks, > neal