Hi Paolo, On 16/10/2019 16:07, Matthieu Baerts wrote: > Hi Paolo, > > On 16/10/2019 15:54, Paolo Abeni wrote: >> On Wed, 2019-10-16 at 13:14 +0200, Matthieu Baerts wrote: >>> Hi Paolo, >>> >>> On 16/10/2019 12:02, Matthieu Baerts wrote: >>>> Hi Paolo, >>>> >>>> On 16/10/2019 10:58, Paolo Abeni wrote: >>>>> On Wed, 2019-10-16 at 10:37 +0200, Matthieu Baerts wrote: >>>>>> On 16/10/2019 10:22, Paolo Abeni wrote: >>>>>>> Yes, I'm cooking patch && rebase right now. I hopefully will send >>>>>>> soon >>>>>>> v2 of the patches with a reference to the rebase branch in the cover >>>>>>> letter. >>>>>> >>>>>> If you only adds the line mentioned by Mat, I am fine if you >>>>>> update your >>>>>> branch directly, no need to send new patches that we will not use >>>>>> directly >>>>> >>>>> The new rebase branch is: >>>>> >>>>> https://github.com/pabeni/mptcp/tree/mptcp-proposal-recvmsg_rebase_8 >>>> >>>> Great. I just used it to re-create the tree (not the export branch >>>> yet). >>>> >>>> [...] >>>> >>>>> It's based on top of net-next commit 77ffe33363c0 ("hv_sock: use >>>>> HV_HYP_PAGE_SIZE for Hyper-V communication") >>>> >>>> Thank you for this rebase! >>>> >>>>> I checked for build issue only on the modified patches. >>>> >>>> I just asked the CI to validate each commit before doing the export. >>>> I will notify you if there is an issue. With the build queue, it might >>>> take ~1 hour. >>> >>> There was a small issue. I had to move some code to other topics: >>> >>> - 46ef07c43dd5: move code from "mptcp: add MIB counter infrastructure".. >>> - 4e9c2cbc67c6: ..to "mptcp: increment MIB counters in a few places" >>> - 44b3404b5062..9fd478f3331c: result (no diff) >>> >>> Now that the export is OK, I ran the selftests and I got: >>> >>> >>> # ns4  TCP   -> ns2  (10.0.1.2:10059) MPTCP    [ OK ] >>> # ns4  MPTCP -> ns2  (10.0.2.1:10060) MPTCP    [  162.412089] >>> ------------[ >>> cut here ]------------ >>> [  162.412863] Bad mapping: ssn=2827884 map_seq=2818916 >>> map_data_len=8318 >>> [  162.412878] WARNING: CPU: 0 PID: 0 at net/mptcp/subflow.c:333 >>> warn_bad_map.isra.7.part.8+0x1b/0x20 >>> [  162.415125] Modules linked in: >>> [  162.415621] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc1+ #3 >>> [  162.416571] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >>> BIOS 1.10.2-1ubuntu1 04/01/2014 >>> [  162.417931] RIP: 0010:warn_bad_map.isra.7.part.8+0x1b/0x20 >>> [  162.418768] Code: 00 5b e9 58 8b 30 00 0f 1f 84 00 00 00 00 00 89 d0 >>> 8b 0e 8b 17 89 c6 48 c7 c7 d0 f8 ac 85 c6 05 57 76 a4 00 01 e8 85 ce 5d >>> ff <0f> 0b c3 66 90 48 8b 46 58 48 83 e0 fe f7 40 74 00 00 00 30 74 1d >>> [  162.421622] RSP: 0018:ffffb3eb80003bc8 EFLAGS: 00010286 >>> [  162.422433] RAX: 0000000000000000 RBX: ffff93a51d6d44e0 RCX: >>> 0000000000000000 >>> [  162.423524] RDX: 000000000000003a RSI: ffffffff8634c35a RDI: >>> ffffffff8634c75a >>> [  162.424619] RBP: ffff93a51d5bd9c0 R08: ffffffff8634c320 R09: >>> 000000000000003a >>> [  162.425710] R10: 0000000000000a4e R11: 0000000080000000 R12: >>> ffff93a51d609200 >>> [  162.426794] R13: ffff93a51d5bd9c0 R14: ffff93a51d6092d0 R15: >>> ffff93a51d5f8680 >>> [  162.427911] FS:  0000000000000000(0000) GS:ffff93a51f200000(0000) >>> knlGS:0000000000000000 >>> [  162.429110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [  162.429931] CR2: 00007ffe5a15a000 CR3: 000000001d73a003 CR4: >>> 00000000003606f0 >>> [  162.430983] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>> 0000000000000000 >>> [  162.432032] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >>> 0000000000000400 >>> [  162.433104] Call Trace: >>> [  162.433490]  >>> [  162.433801]  mptcp_subflow_data_available+0x3c6/0x560 >>> [  162.434551]  subflow_data_ready+0x3b/0x70 >>> [  162.435169]  tcp_data_queue+0x3b1/0xc40 >>> [  162.435776]  tcp_rcv_state_process+0x5ec/0xce0 >>> [  162.436466]  ? security_sock_rcv_skb+0x25/0x40 >>> [  162.437156]  ? tcp_v4_do_rcv+0x110/0x1c0 >>> [  162.437767]  tcp_v4_do_rcv+0x110/0x1c0 >>> [  162.438351]  tcp_v4_rcv+0x9c7/0xa80 >>> [  162.438900]  ip_protocol_deliver_rcu+0x27/0x1a0 >>> [  162.439603]  ip_local_deliver_finish+0x3f/0x50 >>> [  162.440296]  ip_local_deliver+0x74/0xe0 >>> [  162.440895]  ? ip_rcv_finish_core.isra.15+0xed/0x340 >>> [  162.441662]  ip_rcv+0xb7/0xc0 >>> [  162.442129]  __netif_receive_skb_one_core+0x79/0x90 >>> [  162.442886]  process_backlog+0x82/0x120 >>> [  162.443483]  net_rx_action+0x138/0x3d0 >>> [  162.444068]  __do_softirq+0xd8/0x2cb >>> [  162.444626]  irq_exit+0x9b/0xa0 >>> [  162.445113]  smp_apic_timer_interrupt+0x6f/0x120 >>> [  162.445820]  apic_timer_interrupt+0xf/0x20 >>> [  162.446458]  >>> [  162.446793] RIP: 0010:default_idle+0x20/0x140 >>> [  162.447473] Code: ff eb c7 e8 22 bc 5b ff 90 90 41 55 41 54 55 53 65 >>> 8b 2d 43 59 cf 7a 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d a4 9b 4f 00 fb >>> f4 <65> 8b 2d 29 59 cf 7a 0f 1f 44 00 00 5b 5d 41 5c 41 5d c3 65 8b 05 >>> [  162.450316] RSP: 0018:ffffffff85c03e98 EFLAGS: 00000246 ORIG_RAX: >>> ffffffffffffff13 >>> [  162.451468] RAX: ffffffff8531aa00 RBX: 0000000000000000 RCX: >>> 0000000000000000 >>> [  162.452476] RDX: 0000000000000001 RSI: 0000000000000083 RDI: >>> 0000000000000000 >>> [  162.453534] RBP: 0000000000000000 R08: 0000004254d4a19e R09: >>> 0000000000000004 >>> [  162.454546] R10: ffffb3eb80197d88 R11: 0000000000000000 R12: >>> 0000000000000000 >>> [  162.455642] R13: 0000000000000000 R14: ffff93a51ffd6180 R15: >>> 0000000000000000 >>> [  162.456728]  ? __sched_text_end+0x2/0x2 >>> [  162.457323]  do_idle+0x19a/0x230 >>> [  162.457832]  cpu_startup_entry+0x14/0x20 >>> [  162.458443]  start_kernel+0x4c0/0x4e2 >>> [  162.459016]  secondary_startup_64+0xa4/0xb0 >>> [  162.459669] ---[ end trace 869ccfaeaf536fdb ]--- >>> [ FAIL ] client exit code 0, server 141 >>> # \nnetns ns2 socket stat for 10060: >>> # State   Recv-Q    Send-Q        Local Address:Port         Peer >>> Address:Port >>> # \nnetns ns4 socket stat for 10060: >>> # State   Recv-Q    Send-Q        Local Address:Port         Peer >>> Address:Port >>> # ns4  MPTCP -> ns3  (10.0.2.2:10061) MPTCP    [ OK ] >>> >>> May you have a look at it because it looks linked to your >>> modification? :) >> >> I'm investigating this right now. It took a good deal of iterations to >> reproduce it and I lack some info even then, so it looks like it will >> take some time get to the bottom of it. > > Thank you for looking at that. It seems the server was a bit busy (CPU > but mainly IO I think) with other tasks when executing the test. I don't > know if it would help for you to debug this. Also it is using virtme[1], > maybe some configurations are making the bug easier to reproduce. > > [1] https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/ci > ($ ./patches/Dockerfile.virtme.sh patches/virtme.sh) It seems easier to reproduce with KASAN and PROVE_LOCKING: -e KASAN -e KASAN_OUTLINE -d TEST_KASAN -e PROVE_LOCKING -d DEBUG_LOCKDEP Same warning but not the same path: # ns3 MPTCP -> ns3 (10.0.2.2:10045) MPTCP [ 136.536881] ------------[ cut here ]------------ [ 136.537926] Bad mapping: ssn=1995768 map_seq=1971326 map_data_len=16317 [ 136.537954] WARNING: CPU: 0 PID: 885 at net/mptcp/subflow.c:333 warn_bad_map.isra.5.part.6+0x3a/0x50 [ 136.540400] Modules linked in: [ 136.540893] CPU: 0 PID: 885 Comm: mptcp_connect Not tainted 5.4.0-rc1+ #4 [ 136.541914] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 136.543176] RIP: 0010:warn_bad_map.isra.5.part.6+0x3a/0x50 [ 136.544045] Code: c6 05 c7 9f 2b 01 01 e8 74 a1 06 ff 8b 6d 00 48 89 df e8 69 a1 06 ff 8b 13 44 89 e6 48 c7 c7 40 89 ab 91 89 e9 e8 46 7e de fe <0f> 0b 5b 5d 41 5c c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 [ 136.546901] RSP: 0018:ffff88801766f888 EFLAGS: 00010282 [ 136.547750] RAX: 0000000000000000 RBX: ffff88801874de38 RCX: ffffffff8fffb8cd [ 136.548860] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff888019cfa044 [ 136.549959] RBP: 0000000000003fbd R08: 0000000000000001 R09: fffffbfff267539b [ 136.550980] R10: 0000000000000001 R11: fffffbfff267539a R12: 00000000001e73f8 [ 136.552052] R13: ffff88801874de00 R14: ffff88801874de00 R15: ffff888017fb9780 [ 136.553168] FS: 00007f24c9e9e500(0000) GS:ffff88801aa00000(0000) knlGS:0000000000000000 [ 136.554430] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 136.555342] CR2: 00007ffea64cfe6e CR3: 0000000017142004 CR4: 00000000003606f0 [ 136.556450] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 136.557531] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 136.558632] Call Trace: [ 136.559039] mptcp_subflow_data_available+0x818/0xa10 [ 136.559791] ? subflow_v4_init_req+0x490/0x490 [ 136.560459] ? tcp_read_sock+0x2db/0x490 [ 136.561046] ? dns_query+0x3c0/0x3c0 [ 136.561603] ? mark_held_locks+0x65/0x90 [ 136.562227] ? __local_bh_enable_ip+0x8f/0xf0 [ 136.562915] ? lock_sock_nested+0x39/0xa0 [ 136.563564] mptcp_recvmsg+0x43a/0x790 [ 136.564175] ? mptcp_retransmit_timer+0xe0/0xe0 [ 136.564891] ? mark_held_locks+0x1a/0x90 [ 136.565513] ? mark_lock+0x128/0x700 [ 136.566084] ? lockdep_hardirqs_on+0x17d/0x250 [ 136.566796] ? sock_has_perm+0x9b/0x170 [ 136.567413] ? selinux_secmark_relabel_packet+0x70/0x70 [ 136.568237] ? match_held_lock+0x1b/0x240 [ 136.568889] inet_recvmsg+0x2ad/0x350 [ 136.569476] ? inet_sendpage+0xb0/0xb0 [ 136.570082] ? security_socket_recvmsg+0x48/0x60 [ 136.570814] sock_read_iter+0x120/0x1b0 [ 136.571413] ? sock_recvmsg+0xb0/0xb0 [ 136.571981] ? iov_iter_init+0x8b/0xc0 [ 136.572551] new_sync_read+0x24a/0x360 [ 136.573117] ? generic_remap_file_range_prep+0x760/0x760 [ 136.573918] ? __fsnotify_parent+0x92/0x240 [ 136.574557] ? fsnotify+0x5d5/0x600 [ 136.575114] ? fsnotify_first_mark+0xe0/0xe0 [ 136.575806] ? security_file_permission+0xd1/0x170 [ 136.576562] vfs_read+0xa0/0x190 [ 136.577088] ksys_read+0x115/0x160 [ 136.577636] ? kernel_write+0xb0/0xb0 [ 136.578225] ? lockdep_hardirqs_off+0xb5/0x100 [ 136.578926] ? mark_held_locks+0x1a/0x90 [ 136.579564] do_syscall_64+0x63/0x250 [ 136.580148] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 136.580942] RIP: 0033:0x7f24c999d081 [ 136.581510] Code: fe ff ff 48 8d 3d 67 9c 0a 00 48 83 ec 08 e8 a6 4c 02 00 66 0f 1f 44 00 00 48 8d 05 81 08 2e 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 [ 136.584367] RSP: 002b:00007ffc9e1604b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 136.585534] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007f24c999d081 [ 136.586633] RDX: 0000000000002000 RSI: 00007ffc9e1604d0 RDI: 0000000000000006 [ 136.587738] RBP: 00007ffc9e1604d0 R08: 00007f24c9c781e0 R09: 00007f24c9c78240 [ 136.588804] R10: 00007ffc9e1625c4 R11: 0000000000000246 R12: 0000000000000006 [ 136.589827] R13: 00007ffc9e1604c8 R14: 0000000000002000 R15: 0000000000000000 [ 136.590892] irq event stamp: 62002 [ 136.591416] hardirqs last enabled at (62001): [] vprintk_emit+0xb5/0x2a0 [ 136.592619] hardirqs last disabled at (62002): [] trace_hardirqs_off_thunk+0x1a/0x20 [ 136.594047] softirqs last enabled at (61996): [] __do_softirq+0x49e/0x570 [ 136.595340] softirqs last disabled at (61989): [] irq_exit+0xf4/0x100 [ 136.596465] ---[ end trace 15817eda66b43ee4 ]--- write: Bad message # [ FAIL ] client exit code 141, server 0 # \nnetns ns3 socket stat for 10045: # State Recv-Q Send-Q Local Address:Port Peer Address:Port # \nnetns ns3 socket stat for 10045: # State Recv-Q Send-Q Local Address:Port Peer Address:Port # ns3 MPTCP -> ns3 (10.0.3.2:10046) MPTCP [ OK ] Cheers, Matt-- Matthieu Baerts | R&D Engineer matthieu.baerts(a)tessares.net Tessares SA | Hybrid Access Solutions www.tessares.net 1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium