From: Dmitry Vyukov
Date: Tue, 16 Mar 2021 12:44:18 +0100
Subject: Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail
To: Ben Dooks
Cc: Alex Ghiti, syzbot, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Daniel Bristot de Oliveira, Benjamin Segall, dietmar.eggemann@arm.com, Juri Lelli, LKML, Mel Gorman, Ingo Molnar, Peter Zijlstra, Steven Rostedt, syzkaller-bugs, Vincent Guittot
In-Reply-To: <8ebea51d-b03c-e6de-fa1c-d47091c54e45@codethink.co.uk>
References: <000000000000b74f1b05bd316729@google.com>
 <84b0471d-42c1-175f-ae1d-a18c310c7f77@codethink.co.uk>
 <795597a1-ec87-e09e-d073-3daf10422abb@ghiti.fr>
 <12d4137e-6c14-bc41-4bbc-955ce46198d2@codethink.co.uk>
 <8ebea51d-b03c-e6de-fa1c-d47091c54e45@codethink.co.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 16, 2021 at 12:35 PM Ben Dooks wrote:
> >>>> On 12/03/2021 16:25, Alex Ghiti wrote:
> >>>>>
> >>>>>
> >>>>> On 3/12/21 at 10:12 AM, Dmitry Vyukov wrote:
> >>>>>> On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks wrote:
> >>>>>>>
> >>>>>>> On 10/03/2021 17:16, Dmitry Vyukov wrote:
> >>>>>>>> On Wed, Mar 10, 2021 at 5:46 PM syzbot wrote:
> >>>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> syzbot found the following issue on:
> >>>>>>>>>
> >>>>>>>>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
> >>>>>>>>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> >>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
> >>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> >>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> >>>>>>>>> userspace arch: riscv64
> >>>>>>>>>
> >>>>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>>>>>>>
> >>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to
> >>>>>>>>> the commit:
> >>>>>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>>>>>>
> >>>>>>>> +riscv maintainers
> >>>>>>>>
> >>>>>>>> This is riscv64-specific.
> >>>>>>>> I've seen similar crashes in put_user in other places. It looks like
> >>>>>>>> put_user crashes if the user address is not mapped/protected (?).
> >>>>>>>
> >>>>>>> I've been having a look, and this seems to be down to access of the
> >>>>>>> tsk->set_child_tid variable. I assume the fuzzing here is to pass a
> >>>>>>> bad address to clone?
> >>>>>>>
> >>>>>>> From looking at the code, the put_user() code should have set the
> >>>>>>> relevant SR_SUM bit (the value for this, which is 1<<18 is in the
> >>>>>>> s2 register in the crash report) and from looking at the compiler
> >>>>>>> output from my gcc-10, the code looks to be doing the relevant csrs
> >>>>>>> and then csrc around the put_user
> >>>>>>>
> >>>>>>> So currently I do not understand how the above could have happened
> >>>>>>> other than something re-tried the code sequence and ended up retrying
> >>>>>>> the faulting instruction without the SR_SUM bit set.
> >>>>>>
> >>>>>> I would maybe blame qemu for randomly resetting SR_SUM, but it's
> >>>>>> strange that 99% of these crashes are in schedule_tail. If it would be
> >>>>>> qemu, then they would be more evenly distributed...
> >>>>>>
> >>>>>> Another observation: looking at a dozen of crash logs, in none of
> >>>>>> these cases fuzzer was actually trying to fuzz clone with some insane
> >>>>>> arguments. So it looks like completely normal clone's (e.g. coming
> >>>>>> from pthread_create) result in this crash.
> >>>>>>
> >>>>>> I also wonder why there is ret_from_exception, is it normal? I see
> >>>>>> handle_exception disables SR_SUM:
> >>>>>
> >>>>> csrrc does the right thing: it clears the SR_SUM bit in status but saves
> >>>>> the previous value that will get correctly restored.
> >>>>>
> >>>>> ("The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
> >>>>> value of the CSR, zero-extends the value to XLEN bits, and writes it to
> >>>>> integer register rd. The initial value in integer register rs1 is treated
> >>>>> as a bit mask that specifies bit positions to be cleared in the CSR. Any
> >>>>> bit that is high in rs1 will cause the corresponding bit to be cleared in
> >>>>> the CSR, if that CSR bit is writable. Other bits in the CSR are
> >>>>> unaffected.")
> >>>>
> >>>> I think there may also be an understanding issue on what the SR_SUM
> >>>> bit does.
> >>>> I thought if it is set, M->U accesses would fault, which is
> >>>> why it gets set early on. But from reading the uaccess code it looks
> >>>> like the uaccess code sets it on entry and then clears on exit.
> >>>>
> >>>> I am very confused. Is there a master reference for rv64?
> >>>>
> >>>> https://people.eecs.berkeley.edu/~krste/papers/riscv-privileged-v1.9.pdf
> >>>> seems to state PUM is the SR_SUM bit, and that (if set) disabled
> >>>>
> >>>> Quote:
> >>>>    The PUM (Protect User Memory) bit modifies the privilege with which
> >>>> S-mode loads, stores, and instruction fetches access virtual memory.
> >>>> When PUM=0, translation and protection behave as normal. When PUM=1,
> >>>> S-mode memory accesses to pages that are accessible by U-mode (U=1 in
> >>>> Figure 4.19) will fault. PUM has no effect when executing in U-mode
> >>>>
> >>>>
> >>>>>> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
> >>>>>>
> >>>>>
> >>>>> Still no luck for the moment, can't reproduce it locally, my test is
> >>>>> maybe not that good (I created threads all day long in order to trigger
> >>>>> the put_user of schedule_tail).
> >>>>
> >>>> It may of course depend on memory and other stuff. I did try to see if
> >>>> it was possible to clone() with the child_tid address being a valid but
> >>>> not mapped page...
> >>>>
> >>>>> Given that the path you mention works most of the time, and that the
> >>>>> status register in the stack trace shows the SUM bit is not set whereas
> >>>>> it is set in put_user, I'm leaning toward some race condition (maybe an
> >>>>> interrupt that arrives at the "wrong" time) or a qemu issue as you
> >>>>> mentioned.
> >>>>
> >>>> I suppose this is possible. From what I read it should get to the
> >>>> point of being there with the SUM flag cleared, so either something
> >>>> went wrong in trying to fix the instruction up or there's some other
> >>>> error we're missing.
> >>>>
> >>>>> To eliminate qemu issues, do you have access to some HW ? Or to
> >>>>> different qemu versions ?
> >>>>
> >>>> I do have access to a Microchip Polarfire board. I just need the
> >>>> instructions on how to setup the test-code to make it work on the
> >>>> hardware.
> >>>
> >>> For full syzkaller support, it would need to know how to reboot these
> >>> boards and get access to the console.
> >>> syzkaller has a stop-gap VM backend which just uses ssh to a physical
> >>> machine and expects the kernel to reboot on its own after any crashes.
> >>>
> >>> But I actually managed to reproduce it in an even simpler setup.
> >>> Assuming you have Go 1.15 and riscv64 cross-compiler gcc installed
> >>>
> >>> $ go get -u -d github.com/google/syzkaller/...
> >>> $ cd $GOPATH/src/github.com/google/syzkaller
> >>> $ make stress executor TARGETARCH=riscv64
> >>> $ scp bin/linux_riscv64/syz-execprog bin/linux_riscv64/syz-executor your_machine:/
> >>>
> >>> Then run ./syz-stress on the machine.
> >>> On the first run it crashed it with some other bug, on the second run
> >>> I got the crash in schedule_tail.
> >>> With qemu tcg I also added -slowdown=10 flag to syz-stress to scale
> >>> all timeouts, if native execution is faster, then you don't need it.
> >>
> >> Ok, not sure what's going on. I get a lot of errors similar to:
> >>>
> >>> 2021/03/15 21:35:20 transitively unsupported: ioctl$SNAPSHOT_CREATE_IMAGE: no syscalls can create resource fd_snapshot, enable some syscalls that can create it [openat$snapshot]
> >
> > This is not an error, just a notification that some syscalls are not
> > enabled in the kernel and won't be fuzzed.
> >
> >> Followed by:
> >>
> >>> 2021/03/15 21:35:48 executed 0 programs
> >>> 2021/03/15 21:35:48 failed to create execution environment: failed to mmap shm file: invalid argument
> >>
> >> The qemu is 5.2.0 and root is Debian/unstable riscv64 (same as chroot
> >> used to build the syz tools)
> >
> > This is an error.
> > But I see it the first time ever.
> > It comes from here:
> > https://github.com/google/syzkaller/blob/fdb2bb2c23ee709880407f56307e2800ad27e9ae/pkg/osutil/osutil_unix.go#L119-L121
> > There should be pretty simple logic inside of syscall.Mmap. Perhaps
> > you are using some older Go toolchain with incomplete riscv support?
> > I think I've used 1.14 and 1.15. But there is already 1.16. You can
> > always download a toolchain here:
> > https://golang.org/dl/
>
> Hmm it would have been useful to print out what file it failed to map.

What do you want to do with the file name? It's not one of the
pre-existing files, so the name won't tell the user much. It's just a
temp file: it won't exist afterwards, and it's easy to create an
equivalent file. It was created in that function with:

        f, err = ioutil.TempFile("./", "syzkaller-shm")
        if err != nil {
                err = fmt.Errorf("failed to create temp file: %v", err)
                return
        }
        if err = f.Truncate(int64(size)); err != nil {
                err = fmt.Errorf("failed to truncate shm file: %v", err)
                f.Close()
                os.Remove(f.Name())
                return
        }
        f.Close()
        fname := f.Name()
        f, err = os.OpenFile(f.Name(), os.O_RDWR, DefaultFilePerm)
        if err != nil {
                err = fmt.Errorf("failed to open shm file: %v", err)
                os.Remove(fname)
                return
        }

> I've got go 1.15 from the debian/unstable riscv64 chroot.
> I'll have a look at this in a bit to see if it throws the same issue on
> a real system.
>
>
> --
> Ben Dooks                           http://www.codethink.co.uk/
> Senior Engineer                     Codethink - Providing Genius
>
> https://www.codethink.co.uk/privacy.html