Subject: Re: RISCV: the machanism of available_harts may cause other harts boot failure
From: Sean Anderson
To: Heinrich Schuchardt
Cc: Lukas Auer, U-Boot Mailing List, Atish Patra, Anup Patel, Bin Meng, Leo Liang, rick, Nikita Shubin, Rick Chen
Date: Mon, 5 Sep 2022 11:45:53 -0400
References: <20220905104735.5c2a260d@redslave.neermore.group> <53ef4762-eb1d-043c-69de-a621eb3806d2@gmx.de>
In-Reply-To: <53ef4762-eb1d-043c-69de-a621eb3806d2@gmx.de>
List-Id: U-Boot discussion

On 9/5/22 11:41 AM, Heinrich Schuchardt wrote:
> On 9/5/22 17:30, Sean Anderson wrote:
>> On 9/5/22 3:47 AM, Nikita Shubin wrote:
>>> Hi Rick!
>>>
>>> On Mon, 5 Sep 2022 14:22:41 +0800
>>> Rick Chen wrote:
>>>
>>>> Hi,
>>>>
>>>> When I free-run an SMP system, I once hit a failure case where some
>>>> harts didn't boot to the kernel shell successfully.
>>>> However, it can't be duplicated anymore even if I try many times.
>>>>
>>>> But when I set a break during debugging with GDB, it can trigger the
>>>> failure case each time.
>>>
>>> If a hart fails to register itself to available_harts before
>>> send_ipi_many is hit by the main hart:
>>> https://elixir.bootlin.com/u-boot/v2022.10-rc3/source/arch/riscv/lib/smp.c#L50
>>>
>>> it won't exit the secondary_hart_loop:
>>> https://elixir.bootlin.com/u-boot/v2022.10-rc3/source/arch/riscv/cpu/start.S#L433
>>> as no IPI will be sent to it.
>
> Can we call send_ipi_many() again when booting?

AFAIK we do; see arch/riscv/lib/bootm.c and arch/riscv/lib/spl.c.

> Do we need to call it before booting?

Yes. We also call it when relocating (in SPL and U-Boot proper).

>>>
>>> This might be exactly your case.
>>
>> When working on the IPI mechanism, I considered this possibility. However,
>> there's really no way to know how long to wait. On normal systems, the boot
>> hart is going to do a lot of work before calling send_ipi_many, and the
>> other harts just have to make it through ~100 instructions. So I figured we
>> would never run into this issue.
>>
>> We might not even need the mask... the only direct reason we might is for
>> OpenSBI, as spl_invoke_opensbi is the only function which uses the wait
>> parameter.
>>
>>>> I think the mechanism of available_harts does not provide a method
>>>> that guarantees the success of the SMP system.
>>>> Maybe we shall think of a better way for the SMP booting or just
>>>> remove it?
>>>
>>> I haven't experienced any unexplained problem with hart_lottery or
>>> available_harts_lock unless:
>>>
>>> 1) harts are started non-simultaneously
>>> 2) SPL/U-Boot is in some kind of TCM, OCRAM, etc., which is not cleared
>>> on reset, which leaves available_harts dirty
>>
>> XIP, of course, has this problem every time and just doesn't use the mask.
>> I remember thinking a lot about how to deal with this, but I never ended
>> up sending a patch because I didn't have an XIP system.
>>
>> --Sean
>>
>>> 3) something is wrong with atomics
>>>
>>> Also there might be something wrong with IPI send/receive.
>>>
>>>>
>>>> Thread 8 hit Breakpoint 1, harts_early_init ()
>>>>
>>>> (gdb) c
>>>> Continuing.
>>>> [Switching to Thread 7]
>>>>
>>>> Thread 7 hit Breakpoint 1, harts_early_init ()
>>>>
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 6]
>>>>
>>>> Thread 6 hit Breakpoint 1, harts_early_init ()
>>>>
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 5]
>>>>
>>>> Thread 5 hit Breakpoint 1, harts_early_init ()
>>>>
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 4]
>>>>
>>>> Thread 4 hit Breakpoint 1, harts_early_init ()
>>>>
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 3]
>>>>
>>>> Thread 3 hit Breakpoint 1, harts_early_init ()
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 2]
>>>>
>>>> Thread 2 hit Breakpoint 1, harts_early_init ()
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 1]
>>>>
>>>> Thread 1 hit Breakpoint 1, harts_early_init ()
>>>> (gdb)
>>>> Continuing.
>>>> [Switching to Thread 5]
>>>>
>>>>
>>>> Thread 5 hit Breakpoint 3, 0x0000000001200000 in ?? ()
>>>> (gdb) info threads
>>>>   Id   Target Id         Frame
>>>>   1    Thread 1 (hart 1) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>>>>   2    Thread 2 (hart 2) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>>>>   3    Thread 3 (hart 3) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>>>>   4    Thread 4 (hart 4) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>>>> * 5    Thread 5 (hart 5) 0x0000000001200000 in ?? ()
>>>>   6    Thread 6 (hart 6) 0x000000000000b650 in ?? ()
>>>>   7    Thread 7 (hart 7) 0x000000000000b650 in ?? ()
>>>>   8    Thread 8 (hart 8) 0x0000000000005fa0 in ?? ()
>>>> (gdb) c
>>>> Continuing.
>>>
>>> Do all the "offline" harts remain in the SPL/U-Boot secondary_hart_loop?
>>>
>>>>
>>>>
>>>>
>>>> [    0.175619] smp: Bringing up secondary CPUs ...
>>>> [    1.230474] CPU1: failed to come online
>>>> [    2.282349] CPU2: failed to come online
>>>> [    3.334394] CPU3: failed to come online
>>>> [    4.386783] CPU4: failed to come online
>>>> [    4.427829] smp: Brought up 1 node, 4 CPUs
>>>>
>>>>
>>>> /root # cat /proc/cpuinfo
>>>> processor       : 0
>>>> hart            : 4
>>>> isa             : rv64i2p0m2p0a2p0c2p0xv5-1p1
>>>> mmu             : sv39
>>>>
>>>> processor       : 5
>>>> hart            : 5
>>>> isa             : rv64i2p0m2p0a2p0c2p0xv5-1p1
>>>> mmu             : sv39
>>>>
>>>> processor       : 6
>>>> hart            : 6
>>>> isa             : rv64i2p0m2p0a2p0c2p0xv5-1p1
>>>> mmu             : sv39
>>>>
>>>> processor       : 7
>>>> hart            : 7
>>>> isa             : rv64i2p0m2p0a2p0c2p0xv5-1p1
>>>> mmu             : sv39
>>>>
>>>> /root #
>>>>
>>>> Thanks,
>>>> Rick
>>>
>>
>
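
For anyone following the thread, the race discussed above can be sketched as a small host-side model. This is only an illustration under stated assumptions, not the code in arch/riscv/lib/smp.c: the struct, the model_* function names, the 0-based hart numbering, and NR_HARTS are all hypothetical. It only shows that a hart which has not set its bit in the mask by the time the boot hart sends IPIs is skipped, and so would stay in its wait loop.

```c
#include <assert.h>
#include <stdint.h>

#define NR_HARTS 8 /* hypothetical hart count for this model */

/* Hypothetical stand-in for the available_harts bookkeeping. */
struct smp_model {
	uint32_t available_harts; /* bit n set: hart n has registered itself */
	uint32_t ipi_received;    /* bit n set: hart n was sent an IPI */
};

/* A secondary hart marks itself available early in its startup path. */
void model_register_hart(struct smp_model *m, unsigned int hart)
{
	assert(hart < NR_HARTS);
	m->available_harts |= UINT32_C(1) << hart;
}

/* The boot hart sends IPIs; harts that have not registered are skipped. */
void model_send_ipi_many(struct smp_model *m, unsigned int boot_hart)
{
	for (unsigned int hart = 0; hart < NR_HARTS; hart++) {
		if (hart == boot_hart)
			continue; /* the boot hart does not signal itself */
		if (!(m->available_harts & (UINT32_C(1) << hart)))
			continue; /* not registered yet: no IPI is sent */
		m->ipi_received |= UINT32_C(1) << hart;
	}
}

/* Mask of secondary harts that never got an IPI and keep waiting. */
uint32_t model_stuck_harts(const struct smp_model *m, unsigned int boot_hart)
{
	uint32_t all = (UINT32_C(1) << NR_HARTS) - 1;

	return all & ~m->ipi_received & ~(UINT32_C(1) << boot_hart);
}
```

In this model, if only harts 5 and 6 have registered when boot hart 4 calls model_send_ipi_many(), then model_stuck_harts() reports harts 0-3 and 7 as still waiting. Halting harts at a breakpoint before they register is one way to widen that window, which would fit the GDB observation above.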