* BUG: corrupted list in p9_read_work @ 2018-07-16 5:59 syzbot 2018-10-09 1:07 ` syzbot 0 siblings, 1 reply; 33+ messages in thread From: syzbot @ 2018-07-16 5:59 UTC (permalink / raw) To: davem, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer Hello, syzbot found the following crash on: HEAD commit: 37b5dca2898d Merge tag 'rtc-4.18-2' of git://git.kernel.or.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=12b3e770400000 kernel config: https://syzkaller.appspot.com/x/.config?x=25856fac4e580aa7 dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc compiler: gcc (GCC) 8.0.1 20180413 (experimental) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com list_del corruption, ffff8801931d8e28->next is LIST_POISON1 (dead000000000100) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:47! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 27 Comm: kworker/0:2 Not tainted 4.18.0-rc4+ #148 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events p9_read_work RIP: 0010:__list_del_entry_valid.cold.1+0x26/0x58 lib/list_debug.c:45 Code: 02 fe 0f 0b 4c 89 e2 48 89 de 48 c7 c7 60 68 1a 88 e8 ba 69 02 fe 0f 0b 4c 89 ea 48 89 de 48 c7 c7 00 68 1a 88 e8 a6 69 02 fe <0f> 0b 48 89 de 48 c7 c7 20 69 1a 88 e8 95 69 02 fe 0f 0b 48 89 de RSP: 0018:ffff8801d94575b0 EFLAGS: 00010286 RAX: 000000000000004e RBX: ffff8801931d8e28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81631821 RDI: 0000000000000001 RBP: ffff8801d94575c8 R08: ffff8801d9448700 R09: ffffed003b5c4fc0 R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: dead000000000200 R13: dead000000000100 R14: ffff8801931d8e00 R15: ffff8801996f7a10 FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9e4e7e0000 CR3: 00000001aeeae000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __list_del_entry include/linux/list.h:117 [inline] list_del include/linux/list.h:125 [inline] p9_read_work+0xa29/0x1060 net/9p/trans_fd.c:375 process_one_work+0xc73/0x1ba0 kernel/workqueue.c:2153 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x345/0x410 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 4e7d7b3b542c6c1e ]--- RIP: 0010:__list_del_entry_valid.cold.1+0x26/0x58 lib/list_debug.c:45 Code: 02 fe 0f 0b 4c 89 e2 48 89 de 48 c7 c7 60 68 1a 88 e8 ba 69 02 fe 0f 0b 4c 89 ea 48 89 de 48 c7 c7 00 68 1a 88 e8 a6 69 02 fe <0f> 0b 48 89 de 48 c7 c7 20 69 1a 88 e8 95 69 02 fe 0f 0b 48 89 de RSP: 0018:ffff8801d94575b0 EFLAGS: 00010286 RAX: 000000000000004e RBX: ffff8801931d8e28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81631821 RDI: 0000000000000001 RBP: ffff8801d94575c8 R08: ffff8801d9448700 R09: ffffed003b5c4fc0 R10: ffffed003b5c4fc0 R11: ffff8801dae27e07 R12: dead000000000200 R13: dead000000000100 R14: ffff8801931d8e00 R15: ffff8801996f7a10 FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9e4e7e0000 CR3: 00000001aeeae000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-07-16 5:59 BUG: corrupted list in p9_read_work syzbot @ 2018-10-09 1:07 ` syzbot 2018-10-09 2:09 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: syzbot @ 2018-10-09 1:07 UTC (permalink / raw) To: asmadeus, davem, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer syzbot has found a reproducer for the following crash on: HEAD commit: 0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000 kernel config: https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc compiler: gcc (GCC) 8.0.1 20180413 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com FS-Cache: N-cookie d=000000000a092700 n=00000000d8ee0022 FS-Cache: N-key=[10] '34323935303034313132' list_del corruption, ffff88019ae36ee8->next is LIST_POISON1 (dead000000000100) ------------[ cut here ]------------ kobject: '9p-11043': free name kernel BUG at lib/list_debug.c:47! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 2686 Comm: kworker/1:2 Not tainted 4.19.0-rc7+ #274 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 kobject: '9p-11049' (0000000096206f7a): kobject_add_internal: parent: 'bdi', set: 'devices' Workqueue: events p9_read_work RIP: 0010:__list_del_entry_valid.cold.1+0x26/0x58 lib/list_debug.c:45 Code: d7 fd 0f 0b 4c 89 e2 48 89 de 48 c7 c7 40 92 40 88 e8 7a a2 d7 fd 0f 0b 4c 89 ea 48 89 de 48 c7 c7 e0 91 40 88 e8 66 a2 d7 fd <0f> 0b 48 89 de 48 c7 c7 00 93 40 88 e8 55 a2 d7 fd 0f 0b 48 89 de RSP: 0018:ffff8801cc5975b8 EFLAGS: 00010282 kobject: '9p-11049' (0000000096206f7a): kobject_uevent_env RAX: 000000000000004e RBX: ffff88019ae36ee8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81650405 RDI: 0000000000000005 RBP: ffff8801cc5975d0 R08: ffff8801cc58a4c0 R09: ffffed003b5e4fe8 R10: ffffed003b5e4fe8 R11: ffff8801daf27f47 R12: dead000000000200 R13: dead000000000100 R14: ffff8801c8931050 R15: ffff8801c8931010 FS: 0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fef196c1000 CR3: 00000001ccff7000 CR4: 00000000001406e0 Call Trace: kobject: '9p-11049' (0000000096206f7a): fill_kobj_path: path = '/devices/virtual/bdi/9p-11049' __list_del_entry include/linux/list.h:117 [inline] list_del include/linux/list.h:125 [inline] p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379 kobject: 'loop4' (00000000513f3e2f): kobject_uevent_env FS-Cache: Duplicate cookie detected process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153 FS-Cache: O-cookie c=00000000911358e4 [p=000000006545c95d fl=222 nc=0 na=1] FS-Cache: O-cookie d=000000000a092700 n=000000007635356b FS-Cache: O-key=[10] ' 34 32 39 35 30 30 34 31 32 worker_thread+0x17f/0x1390 kernel/workqueue.c:2296 36 ' FS-Cache: N-cookie c=00000000abaeee81 [p=000000006545c95d fl=2 nc=0 na=1] FS-Cache: N-cookie d=000000000a092700 n=00000000ee16a363 FS-Cache: N-key=[10] ' 34 32 39 35 30 30 34 kthread+0x35a/0x420 kernel/kthread.c:246 31 32 36 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 ' Modules linked in: ---[ end trace 41e06641f5c3c814 ]--- kobject: '9p-11050' (000000002a096aa2): kobject_add_internal: parent: 'bdi', set: 'devices' RIP: 0010:__list_del_entry_valid.cold.1+0x26/0x58 lib/list_debug.c:45 Code: d7 fd 0f 0b 4c 89 e2 48 89 de 48 c7 c7 40 92 40 88 e8 7a a2 d7 fd 0f 0b 4c 89 ea 48 89 de 48 c7 c7 e0 91 40 88 e8 66 a2 d7 fd <0f> 0b 48 89 de 48 c7 c7 00 93 40 88 e8 55 a2 d7 fd 0f 0b 48 89 de RSP: 0018:ffff8801cc5975b8 EFLAGS: 00010282 RAX: 000000000000004e RBX: ffff88019ae36ee8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81650405 RDI: 0000000000000005 RBP: ffff8801cc5975d0 R08: ffff8801cc58a4c0 R09: ffffed003b5e4fe8 R10: ffffed003b5e4fe8 R11: ffff8801daf27f47 R12: dead000000000200 R13: dead000000000100 R14: ffff8801c8931050 R15: ffff8801c8931010 FS: 0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fef196c1000 CR3: 00000001ccff7000 CR4: 00000000001406e0 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-09 1:07 ` syzbot @ 2018-10-09 2:09 ` Dominique Martinet 2018-10-09 4:05 ` [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed Dominique Martinet ` (3 more replies) 0 siblings, 4 replies; 33+ messages in thread From: Dominique Martinet @ 2018-10-09 2:09 UTC (permalink / raw) To: syzbot Cc: davem, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer syzbot wrote on Mon, Oct 08, 2018: > syzbot has found a reproducer for the following crash on: > > HEAD commit: 0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000 > kernel config: https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d > dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000 > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com > > list_del corruption, ffff88019ae36ee8->next is LIST_POISON1 > (dead000000000100) > ------------[ cut here ]------------ > [...] > list_del include/linux/list.h:125 [inline] > p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379 Hmm this looks very much like the report from syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com which should have been fixed by Tomas in 9f476d7c540cb ("net/9p/trans_fd.c: fix race by holding the lock")... It looks like another double list_del, looking at the code again there actually are other ways this could happen around connection errors. For example, - p9_read_work receives something and lookup works... meanwhile - p9_write_work fails to write and calls p9_conn_cancel, which deletes from the req_list without waiting for other works to finish (could also happen in p9_poll_mux) - p9_read_work finishes processing the read and deletes from list again For this one the simplest fix would probably be to just not list_del/call p9_client_cb at all if m->r?req->status isn't REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print, and frankly I think that's saner so I'll send a patch shortly doing that, but I have zero confidence there aren't similar bugs around, the tcp code is so messy... Most of the syzbot reports recently have been around trans_fd which I don't think is used much in real life, and this is not really motivating (i.e. I think it would probably need a more extensive rewrite but nobody cares) :/ Dmitry, on that note, do you think syzbot could possibly test other transports somehow? rdma or virtio cannot be faked as easily as passing a fd around, but I'd be very interested in seeing these flayed a bit. (I'm also curious what logic is used to generate the syz tests, the write$P9_Rxx replies have nothing to do with what the client would expect so it probably doesn't test very far; this test in particular does not even get past the initial P9_TVERSION that the client would expect immediately after mount, so it's basically only testing logic around packet handling on error... Or if we're accepting a RREADDIR in reply to TVERSION we have bigger problems, and now I'm looking at it I think we just might never check that....... I'll look at that for the next cycle) Back to the current patch, since as I said I am not confident this is a good enough fix for the current bug, will I get notified if the bug happens again once the patch hits linux-next with the Reported-by tag ? (I don't have the setup necessary to run a syz repro as there is no C repro, and won't have much time to do that setup sorry) > FS-Cache: N-cookie d=000000000a092700 n=00000000d8ee0022 > FS-Cache: N-key=[10] '34323935303034313132' > FS-Cache: Duplicate cookie detected > FS-Cache: O-cookie c=00000000911358e4 [p=000000006545c95d fl=222 nc=0 na=1] > FS-Cache: O-cookie d=000000000a092700 n=000000007635356b > FS-Cache: O-key=[10] ' > [...] (on an unrelated topic, I got these FS-Cache warnings quite often when testing with cache enabled and have no idea what they mean. I don't normally use cache so haven't spent time looking at it, but I find these rather worrying... If someone having a clue reads this, I'd love to hear what they could mean and what we should look at) Thanks, -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed 2018-10-09 2:09 ` Dominique Martinet @ 2018-10-09 4:05 ` Dominique Martinet 2018-10-09 4:05 ` [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy Dominique Martinet 2018-10-10 14:03 ` BUG: corrupted list in p9_read_work Dmitry Vyukov ` (2 subsequent siblings) 3 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-09 4:05 UTC (permalink / raw) Cc: Dominique Martinet, v9fs-developer, netdev, linux-kernel, syzkaller-bugs, Eric Van Hensbergen, Latchesar Ionkov From: Dominique Martinet <dominique.martinet@cea.fr> p9_read_work would try to handle an errored req even if it got put to error state by another thread between the lookup (that worked) and the time it had been fully read. The request itself is safe to use because we hold a ref to it from the lookup (for m->rreq, so it was safe to read into the request data buffer until this point), but the req_list has been deleted at the same time status changed, and client_cb already has been called as well, so we should not do either. Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> --- As written in reply to the bug report I'm not sure it's what syzbot complained about exactly, but it feels like a correct thing to do. The second patch is unrelated to the syzbot report, but something that occured to me while looking at this ; I'll take both into linux-next around the start of next week after getting some proper testing done unless remarks happen. (they pass basic tests already) net/9p/trans_fd.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 12559c474dde..a0317d459cde 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -292,7 +292,6 @@ static void p9_read_work(struct work_struct *work) __poll_t n; int err; struct p9_conn *m; - int status = REQ_STATUS_ERROR; m = container_of(work, struct p9_conn, rq); @@ -375,11 +374,17 @@ static void p9_read_work(struct work_struct *work) p9_debug(P9_DEBUG_TRANS, "got new packet\n"); m->rreq->rc.size = m->rc.offset; spin_lock(&m->client->lock); - if (m->rreq->status != REQ_STATUS_ERROR) - status = REQ_STATUS_RCVD; - list_del(&m->rreq->req_list); - /* update req->status while holding client->lock */ - p9_client_cb(m->client, m->rreq, status); + if (m->rreq->status == REQ_STATUS_SENT) { + list_del(&m->rreq->req_list); + p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD); + } else { + spin_unlock(&m->client->lock); + p9_debug(P9_DEBUG_ERROR, + "Request tag %d errored out while we were reading the reply\n", + m->rc.tag); + err = -EIO; + goto error; + } spin_unlock(&m->client->lock); m->rc.sdata = NULL; m->rc.offset = 0; -- 2.19.1 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy 2018-10-09 4:05 ` [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed Dominique Martinet @ 2018-10-09 4:05 ` Dominique Martinet 2018-10-09 13:19 ` Tomas Bortoli 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-09 4:05 UTC (permalink / raw) Cc: Dominique Martinet, v9fs-developer, netdev, linux-kernel, Eric Van Hensbergen, Latchesar Ionkov, Tomas Bortoli From: Dominique Martinet <dominique.martinet@cea.fr> p9_read_work/p9_write_work might still hold references to a req after having been cancelled; make sure we put any of these to avoid potential request leak on disconnect. Fixes: 728356dedeff8 ("9p: Add refcount to p9_req_t") Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Tomas Bortoli <tomasbortoli@gmail.com> --- Noticed we could leak a ref while looking at the syzbot report, this should be safe enough after the work has been cancelled... Probably. net/9p/trans_fd.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index a0317d459cde..f868cf6fba79 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -876,7 +876,15 @@ static void p9_conn_destroy(struct p9_conn *m) p9_mux_poll_stop(m); cancel_work_sync(&m->rq); + if (m->rreq) { + p9_req_put(m->rreq); + m->rreq = NULL; + } cancel_work_sync(&m->wq); + if (m->wreq) { + p9_req_put(m->wreq); + m->wreq = NULL; + } p9_conn_cancel(m, -ECONNRESET); -- 2.19.1 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy 2018-10-09 4:05 ` [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy Dominique Martinet @ 2018-10-09 13:19 ` Tomas Bortoli 2018-10-15 10:46 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: Tomas Bortoli @ 2018-10-09 13:19 UTC (permalink / raw) To: Dominique Martinet Cc: Dominique Martinet, v9fs-developer, netdev, LKML, Eric Van Hensbergen, Latchesar Ionkov Il giorno mar 9 ott 2018 alle ore 06:06 Dominique Martinet <asmadeus@codewreck.org> ha scritto: > > From: Dominique Martinet <dominique.martinet@cea.fr> > > p9_read_work/p9_write_work might still hold references to a req after > having been cancelled; make sure we put any of these to avoid potential > request leak on disconnect. > > Fixes: 728356dedeff8 ("9p: Add refcount to p9_req_t") > Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> > Cc: Eric Van Hensbergen <ericvh@gmail.com> > Cc: Latchesar Ionkov <lucho@ionkov.net> > Cc: Tomas Bortoli <tomasbortoli@gmail.com> Reviewed-by: Tomas Bortoli <tomasbortoli@gmail.com> > --- > Noticed we could leak a ref while looking at the syzbot report, > this should be safe enough after the work has been cancelled... > Probably. > > net/9p/trans_fd.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c > index a0317d459cde..f868cf6fba79 100644 > --- a/net/9p/trans_fd.c > +++ b/net/9p/trans_fd.c > @@ -876,7 +876,15 @@ static void p9_conn_destroy(struct p9_conn *m) > > p9_mux_poll_stop(m); > cancel_work_sync(&m->rq); > + if (m->rreq) { > + p9_req_put(m->rreq); > + m->rreq = NULL; > + } > cancel_work_sync(&m->wq); > + if (m->wreq) { > + p9_req_put(m->wreq); > + m->wreq = NULL; > + } > > p9_conn_cancel(m, -ECONNRESET); > > -- > 2.19.1 > LGTM -- Tomas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy 2018-10-09 13:19 ` Tomas Bortoli @ 2018-10-15 10:46 ` Dominique Martinet 0 siblings, 0 replies; 33+ messages in thread From: Dominique Martinet @ 2018-10-15 10:46 UTC (permalink / raw) To: Tomas Bortoli Cc: Dominique Martinet, v9fs-developer, netdev, LKML, Eric Van Hensbergen, Latchesar Ionkov Tomas Bortoli wrote on Tue, Oct 09, 2018: > Il giorno mar 9 ott 2018 alle ore 06:06 Dominique Martinet > > Fixes: 728356dedeff8 ("9p: Add refcount to p9_req_t") > > Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> > > Cc: Eric Van Hensbergen <ericvh@gmail.com> > > Cc: Latchesar Ionkov <lucho@ionkov.net> > > Cc: Tomas Bortoli <tomasbortoli@gmail.com> > > Reviewed-by: Tomas Bortoli <tomasbortoli@gmail.com> Thanks Tomas. Tests seem ok, I've push both patches to my next branch, and will submit them with the rest in a couple of weeks. Quite a few of my patches haven't had reviews this cycle, if you read this and have a bit of time, please pull my branch and have a read, or each commit has a link to the lkml post to reply somewhat easily. Reminder gir url: git://github.com/martinetd/linux 9p-next -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-09 2:09 ` Dominique Martinet 2018-10-09 4:05 ` [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed Dominique Martinet @ 2018-10-10 14:03 ` Dmitry Vyukov 2018-10-10 14:40 ` Dominique Martinet 2018-10-10 14:29 ` BUG: corrupted list in p9_read_work Dmitry Vyukov 2018-10-10 14:42 ` Dmitry Vyukov 3 siblings, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-10 14:03 UTC (permalink / raw) To: Dominique Martinet, Leon Romanovsky Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Tue, Oct 9, 2018 at 4:09 AM, Dominique Martinet <asmadeus@codewreck.org> wrote: > syzbot wrote on Mon, Oct 08, 2018: >> syzbot has found a reproducer for the following crash on: >> >> HEAD commit: 0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern.. >> git tree: upstream >> console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d >> dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000 >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com >> >> list_del corruption, ffff88019ae36ee8->next is LIST_POISON1 >> (dead000000000100) >> ------------[ cut here ]------------ >> [...] >> list_del include/linux/list.h:125 [inline] >> p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379 > > Hmm this looks very much like the report from > syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com > which should have been fixed by Tomas in 9f476d7c540cb > ("net/9p/trans_fd.c: fix race by holding the lock")... > > It looks like another double list_del, looking at the code again there > actually are other ways this could happen around connection errors. > For example, > - p9_read_work receives something and lookup works... meanwhile > - p9_write_work fails to write and calls p9_conn_cancel, which deletes > from the req_list without waiting for other works to finish (could also > happen in p9_poll_mux) > - p9_read_work finishes processing the read and deletes from list again > > For this one the simplest fix would probably be to just not > list_del/call p9_client_cb at all if m->r?req->status isn't > REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print, > and frankly I think that's saner so I'll send a patch shortly doing > that, but I have zero confidence there aren't similar bugs around, the > tcp code is so messy... Most of the syzbot reports recently have been > around trans_fd which I don't think is used much in real life, and this > is not really motivating (i.e. I think it would probably need a more > extensive rewrite but nobody cares) :/ > > > Dmitry, on that note, do you think syzbot could possibly test other > transports somehow? rdma or virtio cannot be faked as easily as passing > a fd around, but I'd be very interested in seeing these flayed a bit. Hi Dominique, How can they be faked? If we could create a private rdma/virtio stub instance per test process, then we could I think easily use that instance for 9p. But is it possible? Testing on real hardware is mostly outside of our priorities at the moment. I mean syzkaller itself can be run on anything, and one could extend descriptions to use a known rdma interface and run on a real hardware. But we can't afford this at the moment. As far as I understand RDMA maintainers run syzkaller on real hardware, but I don't know if they are up to including 9p into testing. +Leon ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 14:03 ` BUG: corrupted list in p9_read_work Dmitry Vyukov @ 2018-10-10 14:40 ` Dominique Martinet 2018-10-10 14:51 ` Dmitry Vyukov 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-10 14:40 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Wed, Oct 10, 2018: > How can they be faked? > If we could create a private rdma/virtio stub instance per test > process, then we could I think easily use that instance for 9p. But is > it possible? "RDMA" itself can be faked pretty easily nowadays, there's a "rxe" driver that is soft RDMA over ethernet and can run over anything. The problem is that you can't just give the client a file like trans fd; you'd need to open an ""rdma socket"" (simplifying wording a bit), and afaik there is no standard tool for it ; or rather, the problem is that RDMA is packet based so even if there were you can't just write stuff in a fd and hope it'll work, so you need a server. If you're interested, 9p is trivial enough that I could provide you with a trivial server that works like your file (just need to reimplement something that parses header to packetize it properly; so you could write to its stdin for example) ; that'd require some setup in the VM (configure rxe and install that tool), but it would definitely be possible. What do you think ? For virtio, I'm not as familiar with the environment so I do not know if there are ways to fake it as easily unfortunately. > Testing on real hardware is mostly outside of our priorities at the > moment. I mean syzkaller itself can be run on anything, and one could > extend descriptions to use a known rdma interface and run on a real > hardware. But we can't afford this at the moment. Sure, I understand that. > As far as I understand RDMA maintainers run syzkaller on real > hardware, but I don't know if they are up to including 9p into > testing. +Leon I'd be interested in knowing what kind of tests runs there :) -- Dominique Martinet ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 14:40 ` Dominique Martinet @ 2018-10-10 14:51 ` Dmitry Vyukov 2018-10-10 15:58 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-10 14:51 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Wed, Oct 10, 2018 at 4:40 PM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dmitry Vyukov wrote on Wed, Oct 10, 2018: >> How can they be faked? >> If we could create a private rdma/virtio stub instance per test >> process, then we could I think easily use that instance for 9p. But is >> it possible? > > "RDMA" itself can be faked pretty easily nowadays, there's a "rxe" > driver that is soft RDMA over ethernet and can run over anything. > > The problem is that you can't just give the client a file like trans fd; > you'd need to open an ""rdma socket"" (simplifying wording a bit), and > afaik there is no standard tool for it ; or rather, the problem is that > RDMA is packet based so even if there were you can't just write stuff > in a fd and hope it'll work, so you need a server. > > If you're interested, 9p is trivial enough that I could provide you with > a trivial server that works like your file (just need to reimplement > something that parses header to packetize it properly; so you could > write to its stdin for example) ; that'd require some setup in the VM > (configure rxe and install that tool), but it would definitely be > possible. > What do you think ? I would like to hear more details. Opening a socket is not a problem. Why do we need a tool for this? I don't understand the problem with "packet-based" and what does it mean to have a separate server? Any why? We definitely don't want to involve a separate third-party server, that's very problematic for multiple reasons. But we can have a chunk of custom C code inside of syzkaller. What exactly setup we need? I guess it will make things simpler if you provide some kind of "hello world" C program that mounts 9p/rdma. I don't need exact messages (they will be same as with pipe transport, right?) nor actual server implementation, but just the place where to inject these packets. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 14:51 ` Dmitry Vyukov @ 2018-10-10 15:58 ` Dominique Martinet 2018-10-11 12:33 ` Dmitry Vyukov 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-10 15:58 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Wed, Oct 10, 2018: > > The problem is that you can't just give the client a file like trans fd; > > you'd need to open an ""rdma socket"" (simplifying wording a bit), and > > afaik there is no standard tool for it ; or rather, the problem is that > > RDMA is packet based so even if there were you can't just write stuff > > in a fd and hope it'll work, so you need a server. > > > > If you're interested, 9p is trivial enough that I could provide you with > > a trivial server that works like your file (just need to reimplement > > something that parses header to packetize it properly; so you could > > write to its stdin for example) ; that'd require some setup in the VM > > (configure rxe and install that tool), but it would definitely be > > possible. > > What do you think ? > > I would like to hear more details. > Opening a socket is not a problem. Why do we need a tool for this? Sorry, that's my head thinking unixy and piping things :) > I don't understand the problem with "packet-based" and what does it > mean to have a separate server? Any why? Packet-based means you can't just read/write in a fd and expect the other side to know where to cut the packets to send it to the client, but if we do it internally there's no problem. We know where to cut. > We definitely don't want to involve a separate third-party server, > that's very problematic for multiple reasons. But we can have a chunk > of custom C code inside of syzkaller. > What exactly setup we need? The setup itself isn't that bad, it's actually pretty much trivial - on a fedora VM I just had to run 'rxe_cfg start ens3' (virtio interface name) and then the infiniband tools are happy e.g. ibv_devinfo should list an interface if you have the userspace library that should have come with rxe_cfg. (specifically, my VM uses /etc/libibverbs.d/rxe.driver to point to the lib, and /usr/lib64/libibverbs/librxe-rdmav16.so the lib itself) Once tools like ibv_devinfo list the interface, it means syzkaller can use it, and very probably means the kernel can as well; that's it. > I guess it will make things simpler if you provide some kind of "hello > world" C program that mounts 9p/rdma. I don't need exact messages > (they will be same as with pipe transport, right?) nor actual server > implementation, but just the place where to inject these packets. That's still the tricky part, I'm afraid... Making a separate server would have been easy because I could have reused some of my junk for the actual connection handling (some rdma helper library I wrote ages ago[1]), but if you're going to just embed C code you'll probably want something lower level? I've never seen syzkaller use any library call but I'm not even sure I would know how to create a qp without libibverbs, would standard stuff be OK ? I think the interface improved quite a bit since I last looked at it so I'll need a bit of time to figure it out again but I'll send you a simple conection with a few messages soonish™ [1] https://github.com/cea-hpc/mooshika -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 15:58 ` Dominique Martinet @ 2018-10-11 12:33 ` Dmitry Vyukov 2018-10-11 13:10 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-11 12:33 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Wed, Oct 10, 2018 at 5:58 PM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dmitry Vyukov wrote on Wed, Oct 10, 2018: >> > The problem is that you can't just give the client a file like trans fd; >> > you'd need to open an ""rdma socket"" (simplifying wording a bit), and >> > afaik there is no standard tool for it ; or rather, the problem is that >> > RDMA is packet based so even if there were you can't just write stuff >> > in a fd and hope it'll work, so you need a server. >> > >> > If you're interested, 9p is trivial enough that I could provide you with >> > a trivial server that works like your file (just need to reimplement >> > something that parses header to packetize it properly; so you could >> > write to its stdin for example) ; that'd require some setup in the VM >> > (configure rxe and install that tool), but it would definitely be >> > possible. >> > What do you think ? >> >> I would like to hear more details. >> Opening a socket is not a problem. Why do we need a tool for this? > > Sorry, that's my head thinking unixy and piping things :) > >> I don't understand the problem with "packet-based" and what does it >> mean to have a separate server? Any why? > > Packet-based means you can't just read/write in a fd and expect the > other side to know where to cut the packets to send it to the client, > but if we do it internally there's no problem. We know where to cut. Ah, OK. This should not be a problem because the descriptions: https://github.com/google/syzkaller/blob/master/sys/linux/9p.txt#L70 know what a packet is. So we always give write a single packet. >> We definitely don't want to involve a separate third-party server, >> that's very problematic for multiple reasons. But we can have a chunk >> of custom C code inside of syzkaller. >> What exactly setup we need? > > The setup itself isn't that bad, it's actually pretty much trivial - on > a fedora VM I just had to run 'rxe_cfg start ens3' (virtio interface > name) and then the infiniband tools are happy e.g. ibv_devinfo should > list an interface if you have the userspace library that should have > come with rxe_cfg. > (specifically, my VM uses /etc/libibverbs.d/rxe.driver to point to the > lib, and /usr/lib64/libibverbs/librxe-rdmav16.so the lib itself) > > Once tools like ibv_devinfo list the interface, it means syzkaller can > use it, and very probably means the kernel can as well; that's it. > > >> I guess it will make things simpler if you provide some kind of "hello >> world" C program that mounts 9p/rdma. I don't need exact messages >> (they will be same as with pipe transport, right?) nor actual server >> implementation, but just the place where to inject these packets. > > That's still the tricky part, I'm afraid... Making a separate server > would have been easy because I could have reused some of my junk for the > actual connection handling (some rdma helper library I wrote ages > ago[1]), but if you're going to just embed C code you'll probably want > something lower level? I've never seen syzkaller use any library call > but I'm not even sure I would know how to create a qp without > libibverbs, would standard stuff be OK ? Raw syscalls preferably. What does 'rxe_cfg start ens3' do on syscall level? Some netlink? Any libraries and utilities are hell pain in linux world. Will it work in Android userspace? gVisor? Who will explain all syzkaller users where they get this for their who-knows-what distro, which is 10 years old because of corp policies, and debug how their version of the library has a slightly incompatible version? For example, after figuring out that rxe_cfg actually comes from rdma-core (which is a separate delight on linux), my debian destribution failed to install it because of some conflicts around /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about such package. And we've just started :) Syscalls tend to be simpler and more reliable. If it gives ENOSUPP, ok, that's it. If it works, great, we can use it. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-11 12:33 ` Dmitry Vyukov @ 2018-10-11 13:10 ` Dominique Martinet 2018-10-11 13:27 ` Dmitry Vyukov 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-11 13:10 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Thu, Oct 11, 2018: > > That's still the tricky part, I'm afraid... Making a separate server > > would have been easy because I could have reused some of my junk for the > > actual connection handling (some rdma helper library I wrote ages > > ago[1]), but if you're going to just embed C code you'll probably want > > something lower level? I've never seen syzkaller use any library call > > but I'm not even sure I would know how to create a qp without > > libibverbs, would standard stuff be OK ? > > Raw syscalls preferably. > What does 'rxe_cfg start ens3' do on syscall level? Some netlink? modprobe rdma_rxe (and a bunch of other rdma modules before that) then writes the interface name in /sys/module/rdma_rxe/parameters/add apparently; then checks it worked. this part could be done in C directly without too much trouble, but as long as the proper kernel configuration/modules are available > Any libraries and utilities are hell pain in linux world. Will it work > in Android userspace? gVisor? Who will explain all syzkaller users > where they get this for their who-knows-what distro, which is 10 years > old because of corp policies, and debug how their version of the > library has a slightly incompatible version? > For example, after figuring out that rxe_cfg actually comes from > rdma-core (which is a separate delight on linux), my debian > destribution failed to install it because of some conflicts around > /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about > such package. And we've just started :) The rdma ecosystem is a pain, I'll easily agree with that... > Syscalls tend to be simpler and more reliable. If it gives ENOSUPP, > ok, that's it. If it works, great, we can use it. I'll have to look into it a bit more; libibverbs abstracts a lot of stuff into per-nic userspace drivers (the files I cited in a previous mail) and basically with the mellanox cards I'm familiar with the whole user session looks like this: * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi version, what user driver to load etc) * it and the userspace driver issue "commands" over these two files' fd to setup the connection ; some commands are standard but some are specific to the interface and defined in the driver. There are many facets to a connection in RDMA: a protection domain used to register memory with the nic, a queue pair that is the actual tx/rx connection, optionally a completion channel that will be another fd to listen on for events that tell you something happened and finally some memory regions to directly communicate with the nic from userspace depending on the specific driver. * then there's the actual usage, more commands through the uverbs0 char device to register the memory you'll use, and once that's done it's entierly up to the driver - for example the mellanox lib can do everything in userspace playing with the memory regions it registered, but I'd wager the rxe driver does more calls through the uverbs0 fd... Honestly I'm not keen on reimplementing all of this; the interface itself pretty much depends on your version of the kernel (there is a common ABI defined, but as far as specific nics are concerned if your kernel module doesn't match the user library version you can get some nasty surprises), and it's far from the black or white of a good ol' ENOSUPP error. I'll look if I can figure out if there is a common subset of verbs commands that are standard and sufficient to setup a listening connection and exchange data that should be supported for all devices and would let us reimplement just that, but while I hear your point about android and ten years in the future I think it's more likely than ten years in the future the verb abi will have changed but libibverbs will just have the new version implemented and hide the change :P -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-11 13:10 ` Dominique Martinet @ 2018-10-11 13:27 ` Dmitry Vyukov 2018-10-11 13:40 ` Dmitry Vyukov 2018-10-11 14:19 ` Dominique Martinet 0 siblings, 2 replies; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-11 13:27 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dmitry Vyukov wrote on Thu, Oct 11, 2018: >> > That's still the tricky part, I'm afraid... Making a separate server >> > would have been easy because I could have reused some of my junk for the >> > actual connection handling (some rdma helper library I wrote ages >> > ago[1]), but if you're going to just embed C code you'll probably want >> > something lower level? I've never seen syzkaller use any library call >> > but I'm not even sure I would know how to create a qp without >> > libibverbs, would standard stuff be OK ? >> >> Raw syscalls preferably. >> What does 'rxe_cfg start ens3' do on syscall level? Some netlink? > > modprobe rdma_rxe (and a bunch of other rdma modules before that) then > writes the interface name in /sys/module/rdma_rxe/parameters/add > apparently; then checks it worked. > this part could be done in C directly without too much trouble, but as > long as the proper kernel configuration/modules are available Now we are talking! We generally assume that all modules are simply compiled into kernel. At least that's we have on syzbot. If somebody can't compile them in, we can suggest to add modprobe into init. So this boils down to just writing to /sys/module/rdma_rxe/parameters/add. >> Any libraries and utilities are hell pain in linux world. Will it work >> in Android userspace? gVisor? Who will explain all syzkaller users >> where they get this for their who-knows-what distro, which is 10 years >> old because of corp policies, and debug how their version of the >> library has a slightly incompatible version? >> For example, after figuring out that rxe_cfg actually comes from >> rdma-core (which is a separate delight on linux), my debian >> destribution failed to install it because of some conflicts around >> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about >> such package. And we've just started :) > > The rdma ecosystem is a pain, I'll easily agree with that... > >> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP, >> ok, that's it. If it works, great, we can use it. > > I'll have to look into it a bit more; libibverbs abstracts a lot of > stuff into per-nic userspace drivers (the files I cited in a previous > mail) and basically with the mellanox cards I'm familiar with the whole > user session looks like this: > * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and > /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi > version, what user driver to load etc) > * it and the userspace driver issue "commands" over these two files' fd > to setup the connection ; some commands are standard but some are > specific to the interface and defined in the driver. But we will use some kind of virtual/stub driver, right? We don't have real hardware. So all these commands should be fixed and known for the virtual/stub driver. > There are many facets to a connection in RDMA: a protection domain used > to register memory with the nic, a queue pair that is the actual tx/rx > connection, optionally a completion channel that will be another fd to > listen on for events that tell you something happened and finally some > memory regions to directly communicate with the nic from userspace > depending on the specific driver. > * then there's the actual usage, more commands through the uverbs0 char > device to register the memory you'll use, and once that's done it's > entierly up to the driver - for example the mellanox lib can do > everything in userspace playing with the memory regions it registered, > but I'd wager the rxe driver does more calls through the uverbs0 fd... > > Honestly I'm not keen on reimplementing all of this; the interface > itself pretty much depends on your version of the kernel (there is a > common ABI defined, but as far as specific nics are concerned if your > kernel module doesn't match the user library version you can get some > nasty surprises), and it's far from the black or white of a good ol' > ENOSUPP error. > > > I'll look if I can figure out if there is a common subset of verbs > commands that are standard and sufficient to setup a listening > connection and exchange data that should be supported for all devices > and would let us reimplement just that, but while I hear your point > about android and ten years in the future I think it's more likely than > ten years in the future the verb abi will have changed but libibverbs > will just have the new version implemented and hide the change :P But again we don't need to support all of the available hardware. For example, we are testing net stack from external side using tun. tun is a very simple, virtual abstraction of a network card. It allows us to test all of generic net stack starting from L2 without messing with any real drivers and their differences entirely. I had impression that we are talking about something similar here too. Or not? Also I am a bit missing context about rdma<->9p interface. Do we need to setup all these ring buffers to satisfy the parts that 9p needs? Is it that 9p actually reads data directly from these ring buffers? Or there is some higher-level rdma interface that 9p uses? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-11 13:27 ` Dmitry Vyukov @ 2018-10-11 13:40 ` Dmitry Vyukov 2018-10-11 14:28 ` 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) Dominique Martinet 2018-10-11 14:19 ` Dominique Martinet 1 sibling, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-11 13:40 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Thu, Oct 11, 2018 at 3:27 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet > <asmadeus@codewreck.org> wrote: >> Dmitry Vyukov wrote on Thu, Oct 11, 2018: >>> > That's still the tricky part, I'm afraid... Making a separate server >>> > would have been easy because I could have reused some of my junk for the >>> > actual connection handling (some rdma helper library I wrote ages >>> > ago[1]), but if you're going to just embed C code you'll probably want >>> > something lower level? I've never seen syzkaller use any library call >>> > but I'm not even sure I would know how to create a qp without >>> > libibverbs, would standard stuff be OK ? >>> >>> Raw syscalls preferably. >>> What does 'rxe_cfg start ens3' do on syscall level? Some netlink? >> >> modprobe rdma_rxe (and a bunch of other rdma modules before that) then >> writes the interface name in /sys/module/rdma_rxe/parameters/add >> apparently; then checks it worked. >> this part could be done in C directly without too much trouble, but as >> long as the proper kernel configuration/modules are available > > Now we are talking! > We generally assume that all modules are simply compiled into kernel. > At least that's we have on syzbot. If somebody can't compile them in, > we can suggest to add modprobe into init. > So this boils down to just writing to /sys/module/rdma_rxe/parameters/add. This fails for me: root@syzkaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add [20992.905406] rdma_rxe: interface syz1 not found bash: echo: write error: Invalid argument >>> Any libraries and utilities are hell pain in linux world. Will it work >>> in Android userspace? gVisor? Who will explain all syzkaller users >>> where they get this for their who-knows-what distro, which is 10 years >>> old because of corp policies, and debug how their version of the >>> library has a slightly incompatible version? >>> For example, after figuring out that rxe_cfg actually comes from >>> rdma-core (which is a separate delight on linux), my debian >>> destribution failed to install it because of some conflicts around >>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about >>> such package. And we've just started :) >> >> The rdma ecosystem is a pain, I'll easily agree with that... >> >>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP, >>> ok, that's it. If it works, great, we can use it. >> >> I'll have to look into it a bit more; libibverbs abstracts a lot of >> stuff into per-nic userspace drivers (the files I cited in a previous >> mail) and basically with the mellanox cards I'm familiar with the whole >> user session looks like this: >> * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and >> /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi >> version, what user driver to load etc) >> * it and the userspace driver issue "commands" over these two files' fd >> to setup the connection ; some commands are standard but some are >> specific to the interface and defined in the driver. > > But we will use some kind of virtual/stub driver, right? We don't have > real hardware. So all these commands should be fixed and known for the > virtual/stub driver. > >> There are many facets to a connection in RDMA: a protection domain used >> to register memory with the nic, a queue pair that is the actual tx/rx >> connection, optionally a completion channel that will be another fd to >> listen on for events that tell you something happened and finally some >> memory regions to directly communicate with the nic from userspace >> depending on the specific driver. >> * then there's the actual usage, more commands through the uverbs0 char >> device to register the memory you'll use, and once that's done it's >> entierly up to the driver - for example the mellanox lib can do >> everything in userspace playing with the memory regions it registered, >> but I'd wager the rxe driver does more calls through the uverbs0 fd... >> >> Honestly I'm not keen on reimplementing all of this; the interface >> itself pretty much depends on your version of the kernel (there is a >> common ABI defined, but as far as specific nics are concerned if your >> kernel module doesn't match the user library version you can get some >> nasty surprises), and it's far from the black or white of a good ol' >> ENOSUPP error. >> >> >> I'll look if I can figure out if there is a common subset of verbs >> commands that are standard and sufficient to setup a listening >> connection and exchange data that should be supported for all devices >> and would let us reimplement just that, but while I hear your point >> about android and ten years in the future I think it's more likely than >> ten years in the future the verb abi will have changed but libibverbs >> will just have the new version implemented and hide the change :P > > But again we don't need to support all of the available hardware. > For example, we are testing net stack from external side using tun. > tun is a very simple, virtual abstraction of a network card. It allows > us to test all of generic net stack starting from L2 without messing > with any real drivers and their differences entirely. I had impression > that we are talking about something similar here too. Or not? > > Also I am a bit missing context about rdma<->9p interface. Do we need > to setup all these ring buffers to satisfy the parts that 9p needs? Is > it that 9p actually reads data directly from these ring buffers? Or > there is some higher-level rdma interface that 9p uses? ^ permalink raw reply [flat|nested] 33+ messages in thread
* 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-10-11 13:40 ` Dmitry Vyukov @ 2018-10-11 14:28 ` Dominique Martinet 2018-10-12 14:42 ` Dmitry Vyukov 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-11 14:28 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Thu, Oct 11, 2018: > > Now we are talking! > > We generally assume that all modules are simply compiled into kernel. > > At least that's we have on syzbot. If somebody can't compile them in, > > we can suggest to add modprobe into init. > > So this boils down to just writing to /sys/module/rdma_rxe/parameters/add. > > This fails for me: > > root@syzkaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add > [20992.905406] rdma_rxe: interface syz1 not found > bash: echo: write error: Invalid argument Works here, I just did: [root@f2 ~]# modprobe rdma_rxe [root@f2 ~]# echo -n ens3 > /sys/module/rdma_rxe/parameters/add dmesg says: [ 35.595534] rdma_rxe: set rxe0 active [ 35.595541] rdma_rxe: added rxe0 to ens3 Actually for a dummy interface if I try the echo directly the echo works, and a verb device is created, and I just confirmed I can use it... so not sure why rxe_cfg said EINVAL earlier... [root@f2 ~]# ip link add dummy0 type dummy [root@f2 ~]# ip link set dummy0 up [root@f2 ~]# ip addr add 10.1.1.1/24 dev dummy0 [root@f2 ~]# modprobe rdma_rxe [root@f2 ~]# echo -n dummy0 > /sys/module/rdma_rxe/parameters/add (then using my test client: [root@f2 src]# ./rcat -s INFO: trans_rdma.c (879), msk_cma_event_handler: CONNECT_REQUEST INFO: trans_rdma.c (862), msk_cma_event_handler: ESTABLISHED INFO: trans_rdma.c (917), msk_cma_event_handler: DISCONNECT EVENT... [root@f2 src]# ./rcat -c 10.1.1.1 INFO: trans_rdma.c (862), msk_cma_event_handler: ESTABLISHED ^C ) I assume your syz1 interface is a tap device as you were saying earlier? Got anything in dmesg? -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-10-11 14:28 ` 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) Dominique Martinet @ 2018-10-12 14:42 ` Dmitry Vyukov 0 siblings, 0 replies; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-12 14:42 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Thu, Oct 11, 2018 at 4:28 PM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dmitry Vyukov wrote on Thu, Oct 11, 2018: >> > Now we are talking! >> > We generally assume that all modules are simply compiled into kernel. >> > At least that's we have on syzbot. If somebody can't compile them in, >> > we can suggest to add modprobe into init. >> > So this boils down to just writing to /sys/module/rdma_rxe/parameters/add. >> >> This fails for me: >> >> root@syzkaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add >> [20992.905406] rdma_rxe: interface syz1 not found >> bash: echo: write error: Invalid argument > > Works here, I just did: > > [root@f2 ~]# modprobe rdma_rxe > [root@f2 ~]# echo -n ens3 > /sys/module/rdma_rxe/parameters/add > > dmesg says: > [ 35.595534] rdma_rxe: set rxe0 active > [ 35.595541] rdma_rxe: added rxe0 to ens3 > > > Actually for a dummy interface if I try the echo directly the echo > works, and a verb device is created, and I just confirmed I can use > it... so not sure why rxe_cfg said EINVAL earlier... > > [root@f2 ~]# ip link add dummy0 type dummy > [root@f2 ~]# ip link set dummy0 up > [root@f2 ~]# ip addr add 10.1.1.1/24 dev dummy0 > [root@f2 ~]# modprobe rdma_rxe > [root@f2 ~]# echo -n dummy0 > /sys/module/rdma_rxe/parameters/add > > > (then using my test client: > [root@f2 src]# ./rcat -s > INFO: trans_rdma.c (879), msk_cma_event_handler: CONNECT_REQUEST > INFO: trans_rdma.c (862), msk_cma_event_handler: ESTABLISHED > INFO: trans_rdma.c (917), msk_cma_event_handler: DISCONNECT EVENT... > > [root@f2 src]# ./rcat -c 10.1.1.1 > INFO: trans_rdma.c (862), msk_cma_event_handler: ESTABLISHED > ^C > ) > > > I assume your syz1 interface is a tap device as you were saying earlier? > Got anything in dmesg? Umm... nope, just a random string. Don't assume any prior knowledge on my side :) ^ permalink raw reply [flat|nested] 33+ messages in thread
* 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-10-11 13:27 ` Dmitry Vyukov 2018-10-11 13:40 ` Dmitry Vyukov @ 2018-10-11 14:19 ` Dominique Martinet 2018-10-12 14:50 ` Dmitry Vyukov 1 sibling, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-11 14:19 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Thu, Oct 11, 2018: > But again we don't need to support all of the available hardware. I agree with that, I just have no idea what the "librxe-rdmav16.so" lib could be doing and described something I am slightly more familiar with (e.g. libmlx5) I talked about a common subset of the verb abi because I didn't want to look into what it's doing, but if it's not enough there's always that possibility. > For example, we are testing net stack from external side using tun. > tun is a very simple, virtual abstraction of a network card. It allows > us to test all of generic net stack starting from L2 without messing > with any real drivers and their differences entirely. I had impression > that we are talking about something similar here too. Or not? That sounds about right, rxe is a software implementation that should work on most network interfaces ; at least from what I tried it worked on a VM's virtio net down to my laptop's wifi interface so it's a good start... I'm not saying all because I just tried a dummy interface and that returned EINVAL. The only point I disagree is the 'very simple', even getting that to work will be a far cry from a socket() call... :) > Also I am a bit missing context about rdma<->9p interface. Do we need > to setup all these ring buffers to satisfy the parts that 9p needs? Is > it that 9p actually reads data directly from these ring buffers? Or > there is some higher-level rdma interface that 9p uses? It needs an "RDMA_PS_TCP" connection established, that requires everything I described unfortunately... Once that's established we need to register some memory to the driver and post some recv buffers (even if we won't read it, the client would get errors if we aren't ready to receive anything - at least it does with real hardware), and also use some registered memory to send data. Thinking back though I think that my server implementation isn't very far from the raw level in what I'm doing, I recall libibverbs fallback implementation (e.g. if the driver lib doesn't implement it otherwise) of the functions I looked at like ibv_post_send to mostly be just serializing the arguments, slapping the command from an enum in front of it and sending it to the kernel, so it might be enough to just reimplement that shim in or figure a way to generate the binary commands once and then use these values; now I'm comparing two runs of strace of my test server I definitely see a pattern. I'll give it a try but don't expect something fast, and it's probably not going to be very pretty either... To give a concrete example, here are all the read/write/fcntl calls looking just at /dev/infiniband in a hello world program that just establishes connection (server side), receive and send two messages and quits: This part apparently sets up the listening connection of the server: 1430 1539262699.126025 openat(AT_FDCWD, "/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3 1430 1539262699.126155 write(3, "\0\0\0\0\30\0\4\0@m'\1\0\0\0\0\344\327\375\271\374\177\0\0?\1\2\0\0\0\0\0", 32) = 32 1430 1539262699.126192 write(3, "\24\0\0\0\210\0\0\0\0\0\0\0000\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\6\0\0\377\377\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 144) = 144 1430 1539262699.126223 write(3, "\23\0\0\0\20\0\20\1 \326\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 1430 1539262699.126250 write(3, "\23\0\0\0\20\0\20\1 \326\375\271\374\177\0\0\0\0\0\0\2\0\0\0", 24) = 24 1430 1539262699.126274 write(3, "\1\0\0\0\20\0\4\0\324\327\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 1430 1539262699.126303 close(3) = 0 1430 1539262699.126360 openat(AT_FDCWD, "/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3 1430 1539262699.126429 write(3, "\0\0\0\0\30\0\4\0\240\217'\1\0\0\0\0t\330\375\271\374\177\0\0\6\1\2\0\0\0\0\0", 32) = 32 1430 1539262699.126472 write(3, "\24\0\0\0\210\0\0\0\0\0\0\0\34\0\0\0\n\0\4\323\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 144) = 144 1430 1539262699.126501 write(3, "\23\0\0\0\20\0\20\1p\326\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 1430 1539262699.126534 write(3, "\23\0\0\0\20\0\20\1p\326\375\271\374\177\0\0\0\0\0\0\2\0\0\0", 24) = 24 1430 1539262699.127119 write(3, "\7\0\0\0\10\0\0\0\0\0\0\0@\0\0\0", 16) = 16 1430 1539262699.127149 write(3, "\23\0\0\0\20\0\20\1`\327\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 1430 1539262699.127319 fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) 1430 1539262699.127348 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE <unfinished ...> Then the client connects (had some epoll on read on fd 3, but no read?!) 1446 1539262706.268685 write(3, "\f\0\0\0\10\0H\1\200\307\211\302G\177\0\0", 16) = 16 1446 1539262706.268718 write(3, "\23\0\0\0\20\0\20\1\240\304\211\302G\177\0\0\2\0\0\0\0\0\0\0", 24) = 24 1446 1539262706.269440 openat(AT_FDCWD, "/dev/infiniband/uverbs0", O_RDWR|O_CLOEXEC) = 5 1446 1539262706.269474 write(5, "\0\0\0\0\4\0\2\0H\302\211\302G\177\0\0", 16) = 16 1446 1539262706.269503 write(5, "\1\0\0\0\4\0,\0\220\301\211\302G\177\0\0", 16) = 16 1446 1539262706.269545 write(5, "\2\0\0\0\6\0\n\0\20\302\211\302G\177\0\0\1\0\0\0\0\0\0\0", 24) = 24 1446 1539262706.269571 write(5, "\3\0\0\0\4\0\1\0\314\303\211\302G\177\0\0", 16) = 16 1446 1539262706.269596 write(3, "\23\0\0\0\20\0\20\1\240\304\211\302G\177\0\0\2\0\0\0\2\0\0\0", 24) = 24 1446 1539262706.269618 write(3, "\23\0\0\0\20\0\270\1\200\303\211\302G\177\0\0\2\0\0\0\1\0\0\0", 24) = 24 1430 1539262706.269801 write(5, "\3\0\0\0\4\0\1\0\354\330\375\271\374\177\0\0", 16) = 16 1430 1539262706.269944 write(5, "\21\0\0\0\4\0\1\0T\330\375\271\374\177\0\0", 16) = 16 1430 1539262706.270000 write(5, "\22\0\0\0\n\0\6\0 \330\375\271\374\177\0\0`\232'\1\0\0\0\0006\0\0\0\0\0\0\0\7\0\0\0\0\0\0\0", 40) = 40 1430 1539262706.270203 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 1430 1539262706.270262 write(5, "\30\0\0\0\20\0\20\0000\327\375\271\374\177\0\0\20\233'\1\0\0\0\0\1\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0002\0\0\0\4\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\1\2\0\0", 64) = 64 1430 1539262706.270482 write(3, "\v\0\0\0\20\0\220\0p\326\375\271\374\177\0\0\2\0\0\0\1\0\0\0", 24) = 24 1430 1539262706.270546 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\16\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 120) = 120 1430 1539262706.270677 write(5, "\t\0\0\0\f\0\3\0\224\330\375\271\374\177\0\0\20p)\302G\177\0\0\0\0@\0\0\0\0\0\20p)\302G\177\0\0\1\0\0\0\1\0\0\0", 48) = 48 1430 1539262706.271973 write(5, "\t\0\0\0\f\0\3\0D\330\375\271\374\177\0\0\210\362&\1\0\0\0\0\1\0\0\0\0\0\0\0\210\362&\1\0\0\0\0\1\0\0\0\1\0\0\0", 48) = 48 1430 1539262706.272060 write(3, "\v\0\0\0\20\0\220\0000\325\375\271\374\177\0\0\2\0\0\0\1\0\0\0", 24) = 24 1430 1539262706.272110 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\16\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 120) = 120 1430 1539262706.272159 write(3, "\v\0\0\0\20\0\220\0000\325\375\271\374\177\0\0\2\0\0\0\2\0\0\0", 24) = 24 1430 1539262706.272205 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\n*\21f\0\0\0\0\0\0\0\0\1@\0\0\0\7\1\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\201\221\22\0\0\0\0\0\340\t\351\0\0\0\0\0\23\0\0\0\0\0\0\0\0\0\0\0\2\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0", 120) = 120 1430 1539262706.272439 write(3, "\v\0\0\0\20\0\220\0000\325\375\271\374\177\0\0\2\0\0\0\3\0\0\0", 24) = 24 1430 1539262706.272496 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\1.\1\0\0\0\0\0\0\0\0\0\364((\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\0\1\0\0\0\23\7\7\0\0\0\0", 120) = 120 1430 1539262706.272565 write(3, "\10\0\0\0 \1\0\0\220\f\0\274G\177\0\0\24\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\1\0\0\n\1\2\0\0\0\0\0\0\0", 296) = 296 1446 1539262706.272962 write(3, "\f\0\0\0\10\0H\1\200\307\211\302G\177\0\0", 16) = 16 1430 1539262706.274144 write(5, "\t\0\0\0\f\0\3\0D\330\375\271\374\177\0\0`\0\351\301G\177\0\0\0\0 \0\0\0\0\0`\0\351\301G\177\0\0\1\0\0\0\1\0\0\0", 48) = 48 Some data is exchanged (we don't see the data as it's in buffers whose address was given earlier): 1464 1539262714.529679 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 1464 1539262714.530059 write(5, "\34\0\0\0\10\0\1\0lT)\302G\177\0\0\3\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0", 32) = 32 1464 1539262714.530634 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 1430 1539262719.331307 write(5, "\34\0\0\0\10\0\1\0\374\327\375\271\374\177\0\0\3\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0", 32) = 32 1464 1539262719.332113 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 And disconnect: 1430 1539262721.192844 write(5, "\r\0\0\0\3\0\0\0\6\0\0\0", 12) = 12 1430 1539262721.193186 write(5, "\r\0\0\0\3\0\0\0\5\0\0\0", 12) = 12 1430 1539262721.193324 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 120) = 120 1430 1539262721.193567 write(3, "\n\0\0\0\4\0\0\0\2\0\0\0", 12) = 12 1446 1539262721.256556 write(3, "\f\0\0\0\10\0H\1\200\307\211\302G\177\0\0", 16) = 16 1430 1539262721.257618 write(3, "\1\0\0\0\20\0\4\0\204\327\375\271\374\177\0\0\2\0\0\0\0\0\0\0", 24) = 24 1430 1539262721.257769 write(5, "\4\0\0\0\3\0\0\0\0\0\0\0", 12) = 12 1430 1539262721.258369 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 1430 1539262721.258667 write(5, "\33\0\0\0\6\0\1\0T\327\375\271\374\177\0\0\3\0\0\0\0\0\0\0", 24) = 24 1430 1539262721.259223 write(5, "\24\0\0\0\6\0\2\08\327\375\271\374\177\0\0\2\0\0\0\0\0\0\0", 24) = 24 1430 1539262721.260476 write(3, "\1\0\0\0\20\0\4\0D\330\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 1430 1539262721.260726 close(3) = 0 1430 1539262721.261082 write(5, "\4\0\0\0\3\0\0\0\1\0\0\0", 12) = -1 EBUSY (Device or resource busy) 1430 1539262721.358728 write(5, "\r\0\0\0\3\0\0\0\4\0\0\0", 12) = 12 I don't see any read on these fd despite epoll being set to wait for read events on these so I'm not quite sure where ibverbs knows if the commands worked or not, but hopefully that illustrats that it's slightly more complex than just socket/bind/listen/accept/write/close! :) -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-10-11 14:19 ` Dominique Martinet @ 2018-10-12 14:50 ` Dmitry Vyukov 2018-10-12 15:08 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-12 14:50 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Thu, Oct 11, 2018 at 4:19 PM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dmitry Vyukov wrote on Thu, Oct 11, 2018: >> But again we don't need to support all of the available hardware. > > I agree with that, I just have no idea what the "librxe-rdmav16.so" lib > could be doing and described something I am slightly more familiar with > (e.g. libmlx5) > I talked about a common subset of the verb abi because I didn't want to > look into what it's doing, but if it's not enough there's always that > possibility. > > >> For example, we are testing net stack from external side using tun. >> tun is a very simple, virtual abstraction of a network card. It allows >> us to test all of generic net stack starting from L2 without messing >> with any real drivers and their differences entirely. I had impression >> that we are talking about something similar here too. Or not? > > That sounds about right, rxe is a software implementation that should > work on most network interfaces ; at least from what I tried it worked > on a VM's virtio net down to my laptop's wifi interface so it's a good > start... I'm not saying all because I just tried a dummy interface and > that returned EINVAL. > The only point I disagree is the 'very simple', even getting that to > work will be a far cry from a socket() call... :) > > >> Also I am a bit missing context about rdma<->9p interface. Do we need >> to setup all these ring buffers to satisfy the parts that 9p needs? Is >> it that 9p actually reads data directly from these ring buffers? Or >> there is some higher-level rdma interface that 9p uses? > > It needs an "RDMA_PS_TCP" connection established, that requires > everything I described unfortunately... > Once that's established we need to register some memory to the driver > and post some recv buffers (even if we won't read it, the client would > get errors if we aren't ready to receive anything - at least it does > with real hardware), and also use some registered memory to send data. > > Thinking back though I think that my server implementation isn't very > far from the raw level in what I'm doing, I recall libibverbs fallback > implementation (e.g. if the driver lib doesn't implement it otherwise) > of the functions I looked at like ibv_post_send to mostly be just > serializing the arguments, slapping the command from an enum in front of > it and sending it to the kernel, so it might be enough to just > reimplement that shim in or figure a way to generate the binary commands > once and then use these values; now I'm comparing two runs of strace of > my test server I definitely see a pattern. > > I'll give it a try but don't expect something fast, and it's probably > not going to be very pretty either... > > To give a concrete example, here are all the read/write/fcntl calls > looking just at /dev/infiniband in a hello world program that just > establishes connection (server side), receive and send two messages and > quits: > > > This part apparently sets up the listening connection of the server: > > 1430 1539262699.126025 openat(AT_FDCWD, "/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3 > 1430 1539262699.126155 write(3, "\0\0\0\0\30\0\4\0@m'\1\0\0\0\0\344\327\375\271\374\177\0\0?\1\2\0\0\0\0\0", 32) = 32 > 1430 1539262699.126192 write(3, "\24\0\0\0\210\0\0\0\0\0\0\0000\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\6\0\0\377\377\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 144) = 144 > 1430 1539262699.126223 write(3, "\23\0\0\0\20\0\20\1 \326\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262699.126250 write(3, "\23\0\0\0\20\0\20\1 \326\375\271\374\177\0\0\0\0\0\0\2\0\0\0", 24) = 24 > 1430 1539262699.126274 write(3, "\1\0\0\0\20\0\4\0\324\327\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262699.126303 close(3) = 0 > 1430 1539262699.126360 openat(AT_FDCWD, "/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3 > 1430 1539262699.126429 write(3, "\0\0\0\0\30\0\4\0\240\217'\1\0\0\0\0t\330\375\271\374\177\0\0\6\1\2\0\0\0\0\0", 32) = 32 > 1430 1539262699.126472 write(3, "\24\0\0\0\210\0\0\0\0\0\0\0\34\0\0\0\n\0\4\323\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 144) = 144 > 1430 1539262699.126501 write(3, "\23\0\0\0\20\0\20\1p\326\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262699.126534 write(3, "\23\0\0\0\20\0\20\1p\326\375\271\374\177\0\0\0\0\0\0\2\0\0\0", 24) = 24 > 1430 1539262699.127119 write(3, "\7\0\0\0\10\0\0\0\0\0\0\0@\0\0\0", 16) = 16 > 1430 1539262699.127149 write(3, "\23\0\0\0\20\0\20\1`\327\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262699.127319 fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) > 1430 1539262699.127348 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE <unfinished ...> > > Then the client connects (had some epoll on read on fd 3, but no read?!) > > 1446 1539262706.268685 write(3, "\f\0\0\0\10\0H\1\200\307\211\302G\177\0\0", 16) = 16 > 1446 1539262706.268718 write(3, "\23\0\0\0\20\0\20\1\240\304\211\302G\177\0\0\2\0\0\0\0\0\0\0", 24) = 24 > 1446 1539262706.269440 openat(AT_FDCWD, "/dev/infiniband/uverbs0", O_RDWR|O_CLOEXEC) = 5 > 1446 1539262706.269474 write(5, "\0\0\0\0\4\0\2\0H\302\211\302G\177\0\0", 16) = 16 > 1446 1539262706.269503 write(5, "\1\0\0\0\4\0,\0\220\301\211\302G\177\0\0", 16) = 16 > 1446 1539262706.269545 write(5, "\2\0\0\0\6\0\n\0\20\302\211\302G\177\0\0\1\0\0\0\0\0\0\0", 24) = 24 > 1446 1539262706.269571 write(5, "\3\0\0\0\4\0\1\0\314\303\211\302G\177\0\0", 16) = 16 > 1446 1539262706.269596 write(3, "\23\0\0\0\20\0\20\1\240\304\211\302G\177\0\0\2\0\0\0\2\0\0\0", 24) = 24 > 1446 1539262706.269618 write(3, "\23\0\0\0\20\0\270\1\200\303\211\302G\177\0\0\2\0\0\0\1\0\0\0", 24) = 24 > 1430 1539262706.269801 write(5, "\3\0\0\0\4\0\1\0\354\330\375\271\374\177\0\0", 16) = 16 > 1430 1539262706.269944 write(5, "\21\0\0\0\4\0\1\0T\330\375\271\374\177\0\0", 16) = 16 > 1430 1539262706.270000 write(5, "\22\0\0\0\n\0\6\0 \330\375\271\374\177\0\0`\232'\1\0\0\0\0006\0\0\0\0\0\0\0\7\0\0\0\0\0\0\0", 40) = 40 > 1430 1539262706.270203 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 > 1430 1539262706.270262 write(5, "\30\0\0\0\20\0\20\0000\327\375\271\374\177\0\0\20\233'\1\0\0\0\0\1\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0002\0\0\0\4\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\1\2\0\0", 64) = 64 > 1430 1539262706.270482 write(3, "\v\0\0\0\20\0\220\0p\326\375\271\374\177\0\0\2\0\0\0\1\0\0\0", 24) = 24 > 1430 1539262706.270546 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\16\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 120) = 120 > 1430 1539262706.270677 write(5, "\t\0\0\0\f\0\3\0\224\330\375\271\374\177\0\0\20p)\302G\177\0\0\0\0@\0\0\0\0\0\20p)\302G\177\0\0\1\0\0\0\1\0\0\0", 48) = 48 > 1430 1539262706.271973 write(5, "\t\0\0\0\f\0\3\0D\330\375\271\374\177\0\0\210\362&\1\0\0\0\0\1\0\0\0\0\0\0\0\210\362&\1\0\0\0\0\1\0\0\0\1\0\0\0", 48) = 48 > 1430 1539262706.272060 write(3, "\v\0\0\0\20\0\220\0000\325\375\271\374\177\0\0\2\0\0\0\1\0\0\0", 24) = 24 > 1430 1539262706.272110 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\16\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 120) = 120 > 1430 1539262706.272159 write(3, "\v\0\0\0\20\0\220\0000\325\375\271\374\177\0\0\2\0\0\0\2\0\0\0", 24) = 24 > 1430 1539262706.272205 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\n*\21f\0\0\0\0\0\0\0\0\1@\0\0\0\7\1\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\201\221\22\0\0\0\0\0\340\t\351\0\0\0\0\0\23\0\0\0\0\0\0\0\0\0\0\0\2\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0", 120) = 120 > 1430 1539262706.272439 write(3, "\v\0\0\0\20\0\220\0000\325\375\271\374\177\0\0\2\0\0\0\3\0\0\0", 24) = 24 > 1430 1539262706.272496 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\1.\1\0\0\0\0\0\0\0\0\0\364((\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\0\1\0\0\0\23\7\7\0\0\0\0", 120) = 120 > 1430 1539262706.272565 write(3, "\10\0\0\0 \1\0\0\220\f\0\274G\177\0\0\24\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\1\0\0\n\1\2\0\0\0\0\0\0\0", 296) = 296 > 1446 1539262706.272962 write(3, "\f\0\0\0\10\0H\1\200\307\211\302G\177\0\0", 16) = 16 > 1430 1539262706.274144 write(5, "\t\0\0\0\f\0\3\0D\330\375\271\374\177\0\0`\0\351\301G\177\0\0\0\0 \0\0\0\0\0`\0\351\301G\177\0\0\1\0\0\0\1\0\0\0", 48) = 48 > > > Some data is exchanged (we don't see the data as it's in buffers whose > address was given earlier): > > 1464 1539262714.529679 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 > 1464 1539262714.530059 write(5, "\34\0\0\0\10\0\1\0lT)\302G\177\0\0\3\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0", 32) = 32 > 1464 1539262714.530634 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 > 1430 1539262719.331307 write(5, "\34\0\0\0\10\0\1\0\374\327\375\271\374\177\0\0\3\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0", 32) = 32 > 1464 1539262719.332113 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 > > And disconnect: > > 1430 1539262721.192844 write(5, "\r\0\0\0\3\0\0\0\6\0\0\0", 12) = 12 > 1430 1539262721.193186 write(5, "\r\0\0\0\3\0\0\0\5\0\0\0", 12) = 12 > 1430 1539262721.193324 write(5, "\32\0\0\0\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 120) = 120 > 1430 1539262721.193567 write(3, "\n\0\0\0\4\0\0\0\2\0\0\0", 12) = 12 > 1446 1539262721.256556 write(3, "\f\0\0\0\10\0H\1\200\307\211\302G\177\0\0", 16) = 16 > 1430 1539262721.257618 write(3, "\1\0\0\0\20\0\4\0\204\327\375\271\374\177\0\0\2\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262721.257769 write(5, "\4\0\0\0\3\0\0\0\0\0\0\0", 12) = 12 > 1430 1539262721.258369 write(5, "\27\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 > 1430 1539262721.258667 write(5, "\33\0\0\0\6\0\1\0T\327\375\271\374\177\0\0\3\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262721.259223 write(5, "\24\0\0\0\6\0\2\08\327\375\271\374\177\0\0\2\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262721.260476 write(3, "\1\0\0\0\20\0\4\0D\330\375\271\374\177\0\0\0\0\0\0\0\0\0\0", 24) = 24 > 1430 1539262721.260726 close(3) = 0 > 1430 1539262721.261082 write(5, "\4\0\0\0\3\0\0\0\1\0\0\0", 12) = -1 EBUSY (Device or resource busy) > 1430 1539262721.358728 write(5, "\r\0\0\0\3\0\0\0\4\0\0\0", 12) = 12 > > > I don't see any read on these fd despite epoll being set to wait for > read events on these so I'm not quite sure where ibverbs knows if the > commands worked or not, but hopefully that illustrats that it's slightly > more complex than just socket/bind/listen/accept/write/close! :) Yes, it seems so. I guess I am still missing the big picture somewhat. If we do "echo -n FOO > /sys/module/rdma_rxe/parameters/add" and let's say FOO is a tun device. Does it mean that we will send/receive packets from the tun? If yes, that would make things simpler. And do we still need ring buffers in that case? If not and we still send/recv via in-memory ring buffers, then why do we need tun at all? Leon, maybe you know how to setup a stub rdma that we could use as 9p transport? If we do this, I guess it will also expose lots of interesting rdma code paths for testing. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-10-12 14:50 ` Dmitry Vyukov @ 2018-10-12 15:08 ` Dominique Martinet 2018-11-17 8:46 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-12 15:08 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Fri, Oct 12, 2018: > > I don't see any read on these fd despite epoll being set to wait for > > read events on these so I'm not quite sure where ibverbs knows if the > > commands worked or not, but hopefully that illustrats that it's slightly > > more complex than just socket/bind/listen/accept/write/close! :) > > Yes, it seems so. > > I guess I am still missing the big picture somewhat. > If we do "echo -n FOO > /sys/module/rdma_rxe/parameters/add" and let's > say FOO is a tun device. Does it mean that we will send/receive > packets from the tun? If yes, that would make things simpler. And do > we still need ring buffers in that case? If not and we still send/recv > via in-memory ring buffers, then why do we need tun at all? Hmm, good point; I hadn't looked at the network level how this is emulated. When I use a single VM I do not see anything with tcpdump on any interface, so I assume the kernel short-cuts the interface in this case. When communicating between two machines there obviously is traffic; it appears to be transported over udp - I see the messages I sent in plain text in the dump and there is only a handful of packets for the whole connecting and teardown so it's definitely much simpler. This might have some knob I am not aware of to force the driver to send udp in the local setup, if we can it's going to be much easier to reimplement the rxe emulation protocol with raw syscalls than what I was describing earlier... > Leon, maybe you know how to setup a stub rdma that we could use as 9p > transport? If we do this, I guess it will also expose lots of > interesting rdma code paths for testing. I'm doing this on my free time atm so I can't invest too much, would love some help if you're aware of anything :) -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-10-12 15:08 ` Dominique Martinet @ 2018-11-17 8:46 ` Dominique Martinet 2018-11-20 11:20 ` Dmitry Vyukov 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-11-17 8:46 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dominique Martinet wrote on Fri, Oct 12, 2018: > Hmm, good point; I hadn't looked at the network level how this is > emulated. > When I use a single VM I do not see anything with tcpdump on any > interface, so I assume the kernel short-cuts the interface in this case. > When communicating between two machines there obviously is traffic; it > appears to be transported over udp - I see the messages I sent in plain > text in the dump and there is only a handful of packets for the whole > connecting and teardown so it's definitely much simpler. > > This might have some knob I am not aware of to force the driver to send > udp in the local setup, if we can it's going to be much easier to > reimplement the rxe emulation protocol with raw syscalls than what I was > describing earlier... I've taken a little bit of time to look at this. Unfortunately, I haven't been able to find any knob to make a single-machine setup send UDP over an interface; would have hoped that playing with multiple interfaces might be a way around this (e.g. use the rxe driver on one interface and listen on the udp port on the other one), but I couldn't get this to work and don't see anything in the code that hints it might be configurable. It seems to realize that the remote IP is assigned locally anyway and while I cannot "bind" to an address that wasn't added to the rxe driver at the rdma_cm level, trying to connect to it never sends any packet to a ncat listening on 4791/udp on that interface... If anyone has an idea, it'd be great! Protocole-wise, once we get that to work it doesn't look very difficult to implement some basic "accept connection and send acks/replies" small server, but we need the first part to work... -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-11-17 8:46 ` Dominique Martinet @ 2018-11-20 11:20 ` Dmitry Vyukov 2018-11-20 11:28 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-11-20 11:20 UTC (permalink / raw) To: Dominique Martinet Cc: Leon Romanovsky, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Sat, Nov 17, 2018 at 9:46 AM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dominique Martinet wrote on Fri, Oct 12, 2018: >> Hmm, good point; I hadn't looked at the network level how this is >> emulated. >> When I use a single VM I do not see anything with tcpdump on any >> interface, so I assume the kernel short-cuts the interface in this case. >> When communicating between two machines there obviously is traffic; it >> appears to be transported over udp - I see the messages I sent in plain >> text in the dump and there is only a handful of packets for the whole >> connecting and teardown so it's definitely much simpler. >> >> This might have some knob I am not aware of to force the driver to send >> udp in the local setup, if we can it's going to be much easier to >> reimplement the rxe emulation protocol with raw syscalls than what I was >> describing earlier... > > I've taken a little bit of time to look at this. > Unfortunately, I haven't been able to find any knob to make a > single-machine setup send UDP over an interface; would have hoped that > playing with multiple interfaces might be a way around this (e.g. use > the rxe driver on one interface and listen on the udp port on the other > one), but I couldn't get this to work and don't see anything in the code > that hints it might be configurable. > It seems to realize that the remote IP is assigned locally anyway and > while I cannot "bind" to an address that wasn't added to the rxe driver > at the rdma_cm level, trying to connect to it never sends any packet to > a ncat listening on 4791/udp on that interface... > If anyone has an idea, it'd be great! > > Protocole-wise, once we get that to work it doesn't look very difficult > to implement some basic "accept connection and send acks/replies" small > server, but we need the first part to work... I don't understand most of the words here, so I probably won't be of help. Leon told me that there is some kind of software emulation RDMA interface (CONFIG_RDMA_RXE?). Maybe it can be used to create an emulated RDMA interface per test process and use it to communicate with 9p client? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) 2018-11-20 11:20 ` Dmitry Vyukov @ 2018-11-20 11:28 ` Dominique Martinet 0 siblings, 0 replies; 33+ messages in thread From: Dominique Martinet @ 2018-11-20 11:28 UTC (permalink / raw) To: Dmitry Vyukov Cc: Leon Romanovsky, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Tue, Nov 20, 2018: > I don't understand most of the words here, so I probably won't be of help. > > Leon told me that there is some kind of software emulation RDMA > interface (CONFIG_RDMA_RXE?). Maybe it can be used to create an > emulated RDMA interface per test process and use it to communicate > with 9p client? Yes, rxe is what you made me look into and is what I was talking about here - the problem is I can't seem to figure how to use it *as udp* within a single machine. With two VMs or a remote outside of the VM when I try to open a RDMA connection with that software emulation I get nicely understandable udp packets ; but within a single VM there is nothing visible at the udp layer, even if I explicitely configure multiple interfaces and try to use one as rxe and another without it that would act as "outside" -- the kernel doesn't send anything to the normal interface like it normally would. There might be other ways around it (I'm starting to think very ugly things like really pretend to connect to an IP outside of the VM, but capture the outgoing traffic with pcap and forge the replies), but I'd rather not have to go to this extent... -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-09 2:09 ` Dominique Martinet 2018-10-09 4:05 ` [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed Dominique Martinet 2018-10-10 14:03 ` BUG: corrupted list in p9_read_work Dmitry Vyukov @ 2018-10-10 14:29 ` Dmitry Vyukov 2018-10-10 14:48 ` Dominique Martinet 2018-10-10 14:42 ` Dmitry Vyukov 3 siblings, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-10 14:29 UTC (permalink / raw) To: Dominique Martinet Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Tue, Oct 9, 2018 at 4:09 AM, Dominique Martinet <asmadeus@codewreck.org> wrote: > syzbot wrote on Mon, Oct 08, 2018: >> syzbot has found a reproducer for the following crash on: >> >> HEAD commit: 0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern.. >> git tree: upstream >> console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d >> dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000 >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com >> >> list_del corruption, ffff88019ae36ee8->next is LIST_POISON1 >> (dead000000000100) >> ------------[ cut here ]------------ >> [...] >> list_del include/linux/list.h:125 [inline] >> p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379 > > Hmm this looks very much like the report from > syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com > which should have been fixed by Tomas in 9f476d7c540cb > ("net/9p/trans_fd.c: fix race by holding the lock")... > > It looks like another double list_del, looking at the code again there > actually are other ways this could happen around connection errors. > For example, > - p9_read_work receives something and lookup works... meanwhile > - p9_write_work fails to write and calls p9_conn_cancel, which deletes > from the req_list without waiting for other works to finish (could also > happen in p9_poll_mux) > - p9_read_work finishes processing the read and deletes from list again > > For this one the simplest fix would probably be to just not > list_del/call p9_client_cb at all if m->r?req->status isn't > REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print, > and frankly I think that's saner so I'll send a patch shortly doing > that, but I have zero confidence there aren't similar bugs around, the > tcp code is so messy... Most of the syzbot reports recently have been > around trans_fd which I don't think is used much in real life, and this > is not really motivating (i.e. I think it would probably need a more > extensive rewrite but nobody cares) :/ > > > Dmitry, on that note, do you think syzbot could possibly test other > transports somehow? rdma or virtio cannot be faked as easily as passing > a fd around, but I'd be very interested in seeing these flayed a bit. > > (I'm also curious what logic is used to generate the syz tests, the > write$P9_Rxx replies have nothing to do with what the client would > expect so it probably doesn't test very far; this test in particular > does not even get past the initial P9_TVERSION that the client would > expect immediately after mount, so it's basically only testing logic > around packet handling on error... Or if we're accepting a RREADDIR in > reply to TVERSION we have bigger problems, and now I'm looking at it I > think we just might never check that....... I'll look at that for the > next cycle) > > > Back to the current patch, since as I said I am not confident this is a > good enough fix for the current bug, will I get notified if the bug > happens again once the patch hits linux-next with the Reported-by tag ? > (I don't have the setup necessary to run a syz repro as there is no C > repro, and won't have much time to do that setup sorry) Yes, the bug will be reported again if it still happens after the patch is merged (not just into linux-next, but into all tested trees, but it does not matter much). So marking bugs as fixed tentatively is fine if that's our best guess. But note that syzbot can test fixes itself on request. It boils down to just giving it the patch and the base tree: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 14:29 ` BUG: corrupted list in p9_read_work Dmitry Vyukov @ 2018-10-10 14:48 ` Dominique Martinet 2018-10-10 14:49 ` syzbot 0 siblings, 1 reply; 33+ messages in thread From: Dominique Martinet @ 2018-10-10 14:48 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Wed, Oct 10, 2018: > > Back to the current patch, since as I said I am not confident this is a > > good enough fix for the current bug, will I get notified if the bug > > happens again once the patch hits linux-next with the Reported-by tag ? > > (I don't have the setup necessary to run a syz repro as there is no C > > repro, and won't have much time to do that setup sorry) > > Yes, the bug will be reported again if it still happens after the > patch is merged (not just into linux-next, but into all tested trees, > but it does not matter much). So marking bugs as fixed tentatively is > fine if that's our best guess. Ok, thanks for confirming... > But note that syzbot can test fixes itself on request. It boils down > to just giving it the patch and the base tree: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches .. and for clarifying that bit, let's try that! :) #syz test: git://github.com/martinetd/linux e4ca13f7d075e551dc158df6af18fb412a1dba0a -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Re: BUG: corrupted list in p9_read_work 2018-10-10 14:48 ` Dominique Martinet @ 2018-10-10 14:49 ` syzbot 2018-10-10 16:00 ` Dominique Martinet 0 siblings, 1 reply; 33+ messages in thread From: syzbot @ 2018-10-10 14:49 UTC (permalink / raw) To: Dominique Martinet Cc: asmadeus, davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer > Dmitry Vyukov wrote on Wed, Oct 10, 2018: >> > Back to the current patch, since as I said I am not confident this is a >> > good enough fix for the current bug, will I get notified if the bug >> > happens again once the patch hits linux-next with the Reported-by tag ? >> > (I don't have the setup necessary to run a syz repro as there is no C >> > repro, and won't have much time to do that setup sorry) >> Yes, the bug will be reported again if it still happens after the >> patch is merged (not just into linux-next, but into all tested trees, >> but it does not matter much). So marking bugs as fixed tentatively is >> fine if that's our best guess. > Ok, thanks for confirming... >> But note that syzbot can test fixes itself on request. It boils down >> to just giving it the patch and the base tree: >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > .. and for clarifying that bit, let's try that! :) > #syz test: git://github.com/martinetd/linux > e4ca13f7d075e551dc158df6af18fb412a1dba0a "git://github.com/martinetd/linux" does not look like a valid git repo address. > -- > Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Re: BUG: corrupted list in p9_read_work 2018-10-10 14:49 ` syzbot @ 2018-10-10 16:00 ` Dominique Martinet 2018-10-10 16:02 ` syzbot 2018-10-10 16:10 ` Dominique Martinet 0 siblings, 2 replies; 33+ messages in thread From: Dominique Martinet @ 2018-10-10 16:00 UTC (permalink / raw) To: syzbot Cc: davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer syzbot wrote on Wed, Oct 10, 2018: >>> But note that syzbot can test fixes itself on request. It boils down >>> to just giving it the patch and the base tree: >>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > >> .. and for clarifying that bit, let's try that! :) > >> #syz test: git://github.com/martinetd/linux >> e4ca13f7d075e551dc158df6af18fb412a1dba0a > > "git://github.com/martinetd/linux" does not look like a valid git > repo address. It works though, is it just picky because I didn't end it in .git? let's try again, sorry for the noise... #syz test: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 16:00 ` Dominique Martinet @ 2018-10-10 16:02 ` syzbot 2018-10-10 16:10 ` Dominique Martinet 1 sibling, 0 replies; 33+ messages in thread From: syzbot @ 2018-10-10 16:02 UTC (permalink / raw) To: asmadeus, davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer Hello, syzbot tried to test the proposed patch but build/boot failed: failed to checkout kernel repo git://github.com/martinetd/linux.git on commit e4ca13f7d075e551dc158df6af18fb412a1dba0a: failed to run ["git" "checkout" "e4ca13f7d075e551dc158df6af18fb412a1dba0a"]: exit status 128 fatal: reference is not a tree: e4ca13f7d075e551dc158df6af18fb412a1dba0a Tested on: commit: [unknown] git tree: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a compiler: gcc (GCC) 8.0.1 20180413 (experimental) ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Re: BUG: corrupted list in p9_read_work 2018-10-10 16:00 ` Dominique Martinet 2018-10-10 16:02 ` syzbot @ 2018-10-10 16:10 ` Dominique Martinet 2018-10-10 16:29 ` syzbot 2018-10-10 16:36 ` Dmitry Vyukov 1 sibling, 2 replies; 33+ messages in thread From: Dominique Martinet @ 2018-10-10 16:10 UTC (permalink / raw) To: syzbot Cc: davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer Dominique Martinet wrote on Wed, Oct 10, 2018: > It works though, is it just picky because I didn't end it in .git? let's > try again, sorry for the noise... > > #syz test: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a And I guess the commit hash needs to be in the default clone branch to work ? ('git fetch <repo> <hash>' happily fetches the commit in a new clone for me... But that feels like a github specific behaviour maybe) Oh, well; made a branch for it, last try for me. #syz test: git://github.com/martinetd/linux.git for-syzbot -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-10 16:10 ` Dominique Martinet @ 2018-10-10 16:29 ` syzbot 2018-10-10 16:36 ` Dmitry Vyukov 1 sibling, 0 replies; 33+ messages in thread From: syzbot @ 2018-10-10 16:29 UTC (permalink / raw) To: asmadeus, davem, dvyukov, ericvh, linux-kernel, lucho, netdev, rminnich, syzkaller-bugs, v9fs-developer Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com Tested on: commit: e4ca13f7d075 9p/trans_fd: abort p9_read_work if req status.. git tree: git://github.com/martinetd/linux.git for-syzbot kernel config: https://syzkaller.appspot.com/x/.config?x=fada1c387645ed03 compiler: gcc (GCC) 8.0.1 20180413 (experimental) Note: testing is done by a robot and is best-effort only. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Re: BUG: corrupted list in p9_read_work 2018-10-10 16:10 ` Dominique Martinet 2018-10-10 16:29 ` syzbot @ 2018-10-10 16:36 ` Dmitry Vyukov 2018-10-10 22:55 ` Dominique Martinet 1 sibling, 1 reply; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-10 16:36 UTC (permalink / raw) To: Dominique Martinet Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Wed, Oct 10, 2018 at 6:10 PM, Dominique Martinet <asmadeus@codewreck.org> wrote: > Dominique Martinet wrote on Wed, Oct 10, 2018: >> It works though, is it just picky because I didn't end it in .git? let's >> try again, sorry for the noise... >> >> #syz test: git://github.com/martinetd/linux.git e4ca13f7d075e551dc158df6af18fb412a1dba0a > > And I guess the commit hash needs to be in the default clone branch to > work ? > ('git fetch <repo> <hash>' happily fetches the commit in a new clone for > me... But that feels like a github specific behaviour maybe) yeeeep, this is bug: https://github.com/google/syzkaller/issues/728 Turns out git fetch of a named remote and just a tree work differently. The latter only fetches the main branch. 'git fetch <repo> <hash>' is it a thing? Is it something that requires special server configuration? I remember something similar that wasn't able to fetch a random commit hash all the time... The plan was to make a named remote and then fetch it, this should fetch everything. > Oh, well; made a branch for it, last try for me. > > #syz test: git://github.com/martinetd/linux.git for-syzbot > > -- > Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Re: BUG: corrupted list in p9_read_work 2018-10-10 16:36 ` Dmitry Vyukov @ 2018-10-10 22:55 ` Dominique Martinet 0 siblings, 0 replies; 33+ messages in thread From: Dominique Martinet @ 2018-10-10 22:55 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer Dmitry Vyukov wrote on Wed, Oct 10, 2018: > yeeeep, this is bug: > https://github.com/google/syzkaller/issues/728 Yeah, it makes sense ; I just had to stumble on it once :) > Turns out git fetch of a named remote and just a tree work > differently. The latter only fetches the main branch. > > 'git fetch <repo> <hash>' is it a thing? Is it something that requires > special server configuration? I remember something similar that wasn't > able to fetch a random commit hash all the time... With my version of git (2.19.1); this works with a local tree (git fetch /path/to/repo ref) and with a github remote, but not with a gitolite remote ; I don't think it's safe to assume it'll always work, but it can work. > The plan was to make a named remote and then fetch it, this should > fetch everything. Yeah, that's less efficient but that'd fetch all named branches at least; probably safer than what I tried :) Anyway, the fix seems to be a hit, so cool! Thanks! -- Dominique ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: BUG: corrupted list in p9_read_work 2018-10-09 2:09 ` Dominique Martinet ` (2 preceding siblings ...) 2018-10-10 14:29 ` BUG: corrupted list in p9_read_work Dmitry Vyukov @ 2018-10-10 14:42 ` Dmitry Vyukov 3 siblings, 0 replies; 33+ messages in thread From: Dmitry Vyukov @ 2018-10-10 14:42 UTC (permalink / raw) To: Dominique Martinet Cc: syzbot, David Miller, Eric Van Hensbergen, LKML, Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs, v9fs-developer On Tue, Oct 9, 2018 at 4:09 AM, Dominique Martinet <asmadeus@codewreck.org> wrote: > syzbot wrote on Mon, Oct 08, 2018: >> syzbot has found a reproducer for the following crash on: >> >> HEAD commit: 0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern.. >> git tree: upstream >> console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d >> dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000 >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com >> >> list_del corruption, ffff88019ae36ee8->next is LIST_POISON1 >> (dead000000000100) >> ------------[ cut here ]------------ >> [...] >> list_del include/linux/list.h:125 [inline] >> p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379 > > Hmm this looks very much like the report from > syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com > which should have been fixed by Tomas in 9f476d7c540cb > ("net/9p/trans_fd.c: fix race by holding the lock")... > > It looks like another double list_del, looking at the code again there > actually are other ways this could happen around connection errors. > For example, > - p9_read_work receives something and lookup works... meanwhile > - p9_write_work fails to write and calls p9_conn_cancel, which deletes > from the req_list without waiting for other works to finish (could also > happen in p9_poll_mux) > - p9_read_work finishes processing the read and deletes from list again > > For this one the simplest fix would probably be to just not > list_del/call p9_client_cb at all if m->r?req->status isn't > REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print, > and frankly I think that's saner so I'll send a patch shortly doing > that, but I have zero confidence there aren't similar bugs around, the > tcp code is so messy... Most of the syzbot reports recently have been > around trans_fd which I don't think is used much in real life, and this > is not really motivating (i.e. I think it would probably need a more > extensive rewrite but nobody cares) :/ > > > Dmitry, on that note, do you think syzbot could possibly test other > transports somehow? rdma or virtio cannot be faked as easily as passing > a fd around, but I'd be very interested in seeing these flayed a bit. > > (I'm also curious what logic is used to generate the syz tests, the > write$P9_Rxx replies have nothing to do with what the client would > expect so it probably doesn't test very far; this test in particular > does not even get past the initial P9_TVERSION that the client would > expect immediately after mount, so it's basically only testing logic > around packet handling on error... Or if we're accepting a RREADDIR in > reply to TVERSION we have bigger problems, and now I'm looking at it I > think we just might never check that....... I'll look at that for the > next cycle) Good question. It's a mix of dumb and not-so-dumb. First we have descriptions of kernel interface, here are 9p ones: https://github.com/google/syzkaller/blob/master/sys/linux/9p.txt These descriptions allows to generate primitively meaningful things (e.g. proper struct layout). They also capture some interrelations between calls. For example, you can see these "resource rfd9p" and "resource wfd9p" at the top, these as "fd subtypes", and descriptions capture what produces these resources as output and what consumes these resources as input. For example, rfd9p is produced by pipe and consumed by mount, so we know that these calls need to be called in that order. But this does not work too well for, for example, 9p message tags/types, because we don't know what exactly message type we will read out and these tags expire after reply. Second, syzkaller uses code coverage as guidance. So as soon as it learns to do proper handshake, it sees new coverage and memorizes this program as useful and tries to extend it more in future. Later it learns how to create a single file, sees new coverage, memorizes, etc. This allows it to incrementally build more and more complex programs over time. You can see current code coverage it achieved here (in cover column): https://syzkaller.appspot.com/#managers e.g. (note: 80MB file): https://storage.googleapis.com/syzkaller/cover/ci-upstream-kasan-gce-root.html As far as I see it did some non-trivial progress for 9p subsystem. I don't know if it reached everything reachable or not, though. ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2018-11-20 11:29 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-07-16 5:59 BUG: corrupted list in p9_read_work syzbot 2018-10-09 1:07 ` syzbot 2018-10-09 2:09 ` Dominique Martinet 2018-10-09 4:05 ` [PATCH 1/2] 9p/trans_fd: abort p9_read_work if req status changed Dominique Martinet 2018-10-09 4:05 ` [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy Dominique Martinet 2018-10-09 13:19 ` Tomas Bortoli 2018-10-15 10:46 ` Dominique Martinet 2018-10-10 14:03 ` BUG: corrupted list in p9_read_work Dmitry Vyukov 2018-10-10 14:40 ` Dominique Martinet 2018-10-10 14:51 ` Dmitry Vyukov 2018-10-10 15:58 ` Dominique Martinet 2018-10-11 12:33 ` Dmitry Vyukov 2018-10-11 13:10 ` Dominique Martinet 2018-10-11 13:27 ` Dmitry Vyukov 2018-10-11 13:40 ` Dmitry Vyukov 2018-10-11 14:28 ` 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work) Dominique Martinet 2018-10-12 14:42 ` Dmitry Vyukov 2018-10-11 14:19 ` Dominique Martinet 2018-10-12 14:50 ` Dmitry Vyukov 2018-10-12 15:08 ` Dominique Martinet 2018-11-17 8:46 ` Dominique Martinet 2018-11-20 11:20 ` Dmitry Vyukov 2018-11-20 11:28 ` Dominique Martinet 2018-10-10 14:29 ` BUG: corrupted list in p9_read_work Dmitry Vyukov 2018-10-10 14:48 ` Dominique Martinet 2018-10-10 14:49 ` syzbot 2018-10-10 16:00 ` Dominique Martinet 2018-10-10 16:02 ` syzbot 2018-10-10 16:10 ` Dominique Martinet 2018-10-10 16:29 ` syzbot 2018-10-10 16:36 ` Dmitry Vyukov 2018-10-10 22:55 ` Dominique Martinet 2018-10-10 14:42 ` Dmitry Vyukov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).