From: Dmitry Vyukov <dvyukov@google.com>
To: Xin Long <lucien.xin@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	Neal Cardwell <ncardwell@google.com>,
	Michael Tuexen <michael.tuexen@lurchi.franken.de>,
	Neil Horman <nhorman@tuxdriver.com>,
	Netdev <netdev@vger.kernel.org>,
	linux-sctp@vger.kernel.org, David Miller <davem@davemloft.net>,
	David Ahern <dsa@cumulusnetworks.com>,
	Eric Dumazet <edumazet@google.com>,
	syzkaller <syzkaller@googlegroups.com>
Subject: Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
Date: Mon, 4 Jun 2018 10:34:07 +0200	[thread overview]
Message-ID: <CACT4Y+bVvSsQm0hywC-_UqnJKhBeQryZZBjYaiWszvQGURS=vA@mail.gmail.com> (raw)
In-Reply-To: <CADvbK_fbKbH2wm6Xurr+ELVag-LvyQdL+peJd=wp7OL7_zMZTQ@mail.gmail.com>

On Tue, May 29, 2018 at 7:45 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
>> On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote:
>>> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner <
>>> marcelo.leitner@gmail.com> wrote:
>>> > - patch2 - fix rtx attack vector
>>> >    - Set the rto_min floor to HZ/20 (which fits the values
>>> >      that Michael shared in the other email)
>>>
>>> I would encourage allowing minimum RTO values down to 5ms, if the ACK
>>> policy in the receiver makes this feasible. Our experience is that in
>>> datacenter environments it can be advantageous to allow timer-based loss
>>> recoveries using timeout values as low as 5ms, e.g.:
>>
>> Thanks Neal. On Xin's tests, the heartbeat timer becomes an issue at
>> ~25ms already. Xin, can you share more details on the hw, which CPU
>> was used?

Hi,

Did we reach any decision on this? This continues to produce bug
reports on syzbot.

I am not sure whom you are asking, because Xin is you unless I am
missing something :)
But if you mean syzbot hardware, it's GCE VMs with modern Intel CPUs.
The important aspects, though, are a heavy-debug config (which you can
take from here: https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f)
and systematic bug reporting. So if it's at all flaky in your testing, it
will produce dozens of bug emails on syzbot.


> It was on a KVM guest,  "-smp 2,cores=1,threads=1,sockets=2"
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                2
> On-line CPU(s) list:   0,1
> Thread(s) per core:    1
> Core(s) per socket:    1
> Socket(s):             2
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 13
> Model name:            QEMU Virtual CPU version 1.5.3
> Stepping:              3
> CPU MHz:               2397.222
> BogoMIPS:              4794.44
> Hypervisor vendor:     KVM
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              4096K
> NUMA node0 CPU(s):     0,1
> Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good
> nopl cpuid pni cx16 hypervisor lahf_lm abm pti
>
> If we're counting on max_t to fix this CPU getting stuck, then it shouldn't
> matter much whether min rto is below the value that causes it to get stuck.
>
>>
>> Anyway, what about adding a floor to rto_max too, so that RTO can
>> actually grow into something big enough that it doesn't hog the CPU? Like:
>> rto_min floor = 5ms
>> rto_max floor = 50ms
>>
>>   Marcelo


Thread overview: 34+ messages
2018-05-25 17:41 [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs Xin Long
2018-05-25 17:41 ` Xin Long
2018-05-25 19:13 ` Neil Horman
2018-05-25 19:13   ` Neil Horman
2018-05-26 15:42   ` Michael Tuexen
2018-05-26 15:42     ` Michael Tuexen
2018-05-26 15:50     ` Dmitry Vyukov
2018-05-26 15:50       ` Dmitry Vyukov
2018-05-27  1:01       ` Neil Horman
2018-05-27  1:01         ` Neil Horman
2018-05-28 19:43         ` Marcelo Ricardo Leitner
2018-05-28 19:43           ` Marcelo Ricardo Leitner
2018-05-29 11:41           ` Neil Horman
2018-05-29 11:41             ` Neil Horman
2018-05-29 13:06             ` Michael Tuexen
2018-05-29 13:06               ` Michael Tuexen
2018-05-29 15:45               ` Marcelo Ricardo Leitner
2018-05-29 15:45                 ` Marcelo Ricardo Leitner
2018-05-29 16:03                 ` Neal Cardwell
2018-05-29 16:03                   ` Neal Cardwell
2018-05-29 17:06                   ` Marcelo Ricardo Leitner
2018-05-29 17:06                     ` Marcelo Ricardo Leitner
2018-05-29 17:45                     ` Xin Long
2018-05-29 17:45                       ` Xin Long
2018-05-29 18:02                       ` Marcelo Ricardo Leitner
2018-05-29 18:02                         ` Marcelo Ricardo Leitner
2018-06-04  8:34                       ` Dmitry Vyukov [this message]
2018-06-04  8:34                         ` Dmitry Vyukov
2018-06-04 12:15                         ` Xin Long
2018-06-04 12:15                           ` Xin Long
2018-05-27  8:58       ` Michael Tuexen
2018-05-27  8:58         ` Michael Tuexen
2018-05-28 18:56         ` Marcelo Ricardo Leitner
2018-05-28 18:56           ` Marcelo Ricardo Leitner
