From: Palmer Dabbelt
To: mark.rutland@arm.com
Cc: guoren@kernel.org, julien.grall@arm.com, Arnd Bergmann,
    linux-kernel@vger.kernel.org, linux-csky@vger.kernel.org,
    ren_guo@c-sky.com, catalin.marinas@arm.com
Subject: Re: [PATCH] arm64: asid: Optimize cache_flush for SMT
Date: Tue, 25 Jun 2019 00:25:37 -0700 (PDT)
In-Reply-To: <20190624114010.GA51882@lakrids.cambridge.arm.com>

On Mon, 24 Jun 2019 04:40:10 PDT (-0700), mark.rutland@arm.com wrote:
> I'm very confused by this patch. The title says arm64, yet the code is
> under arch/csky/, and the code in question refers to HARTs, which IIUC
> is RISC-V terminology.
>
> On Mon, Jun 24, 2019 at 12:04:29AM +0800, guoren@kernel.org wrote:
>> From: Guo Ren
>>
>> The hardware threads of one core may share the same TLB in an SMT+SMP
>> system.
>> Assume the hardware threads are numbered in sequence like this:
>>
>> | 0 1 2 3 | 4 5 6 7 | 8 9 a b | c d e f |
>>   core1     core2     core3     core4
>
> Given this is the Linux logical CPU ID rather than a physical CPU ID,
> this assumption is not valid. For example, CPUs may be renumbered across
> kexec.
>
> Even if this were a physical CPU ID, this doesn't hold on arm64 (e.g.
> due to big.LITTLE).
>
>> The current algorithm is correct for SMT+SMP, but it issues some
>> duplicate local_tlb_flush calls, because one hardware thread's
>> local_tlb_flush also flushes the other hardware threads' TLB entries
>> in the core's shared TLB.
>
> Does any architecture specification mandate that behaviour?
>
> That isn't true for arm64, I have no idea whether RISC-V mandates that,
> and as below it seems this is irrelevant on C-SKY.

There is no event defined by RISC-V that ever requires implementations
to flush the TLB of more than one hart at a time.  There is also nothing
in the normative text of the RISC-V manuals that allows for any
differentiation between multiple threads on a single core and multiple
cores (though I am about to suggest adding two, against my will :)).

>> So we can use a bitmap to reduce the local_tlb_flush calls for SMT.
>>
>> C-SKY cores don't support SMT, so the patch is of no benefit to C-SKY.
>
> As above, this patch is very confusing -- if this doesn't benefit C-SKY,
> why modify the C-SKY code?
>
> Thanks,
> Mark.
>
>>
>> Signed-off-by: Guo Ren
>> Cc: Catalin Marinas
>> Cc: Julien Grall
>> ---
>>  arch/csky/include/asm/asid.h |  4 ++++
>>  arch/csky/mm/asid.c          | 11 ++++++++++-
>>  arch/csky/mm/context.c       |  2 +-
>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/csky/include/asm/asid.h b/arch/csky/include/asm/asid.h
>> index ac08b0f..f654492 100644
>> --- a/arch/csky/include/asm/asid.h
>> +++ b/arch/csky/include/asm/asid.h
>> @@ -23,6 +23,9 @@ struct asid_info
>>  	unsigned int	ctxt_shift;
>>  	/* Callback to locally flush the context. */
>>  	void		(*flush_cpu_ctxt_cb)(void);
>> +	/* To reduce duplicate tlb_flush calls for SMT */
>> +	unsigned int	harts_per_core;
>> +	unsigned int	harts_per_core_mask;
>>  };
>>
>>  #define NUM_ASIDS(info)		(1UL << ((info)->bits))
>>
>> @@ -73,6 +76,7 @@ static inline void asid_check_context(struct asid_info *info,
>>
>>  int asid_allocator_init(struct asid_info *info,
>>  			u32 bits, unsigned int asid_per_ctxt,
>> +			unsigned int harts_per_core,
>>  			void (*flush_cpu_ctxt_cb)(void));
>>
>>  #endif
>> diff --git a/arch/csky/mm/asid.c b/arch/csky/mm/asid.c
>> index b2e9147..50a983e 100644
>> --- a/arch/csky/mm/asid.c
>> +++ b/arch/csky/mm/asid.c
>> @@ -148,8 +148,13 @@ void asid_new_context(struct asid_info *info, atomic64_t *pasid,
>>  		atomic64_set(pasid, asid);
>>  	}
>>
>> -	if (cpumask_test_and_clear_cpu(cpu, &info->flush_pending))
>> +	if (cpumask_test_cpu(cpu, &info->flush_pending)) {
>> +		unsigned int i;
>> +		unsigned int harts_base = cpu & info->harts_per_core_mask;
>>  		info->flush_cpu_ctxt_cb();
>> +		for (i = 0; i < info->harts_per_core; i++)
>> +			cpumask_clear_cpu(harts_base + i, &info->flush_pending);
>> +	}
>>
>>  	atomic64_set(&active_asid(info, cpu), asid);
>>  	cpumask_set_cpu(cpu, mm_cpumask(mm));
>> @@ -162,15 +167,19 @@ void asid_new_context(struct asid_info *info, atomic64_t *pasid,
>>   * @info: Pointer to the asid allocator structure
>>   * @bits: Number of ASIDs available
>>   * @asid_per_ctxt: Number of ASIDs to allocate per-context. ASIDs are
>> + * @harts_per_core: Number of hardware threads per core; must be 1, 2, 4, 8, 16 ...
>>   * allocated contiguously for a given context. This value should be a power of
>>   * 2.
>>   */
>>  int asid_allocator_init(struct asid_info *info,
>>  			u32 bits, unsigned int asid_per_ctxt,
>> +			unsigned int harts_per_core,
>>  			void (*flush_cpu_ctxt_cb)(void))
>>  {
>>  	info->bits = bits;
>>  	info->ctxt_shift = ilog2(asid_per_ctxt);
>> +	info->harts_per_core = harts_per_core;
>> +	info->harts_per_core_mask = ~((1 << ilog2(harts_per_core)) - 1);
>>  	info->flush_cpu_ctxt_cb = flush_cpu_ctxt_cb;
>>  	/*
>>  	 * Expect allocation after rollover to fail if we don't have at least
>> diff --git a/arch/csky/mm/context.c b/arch/csky/mm/context.c
>> index 0d95bdd..b58523b 100644
>> --- a/arch/csky/mm/context.c
>> +++ b/arch/csky/mm/context.c
>> @@ -30,7 +30,7 @@ static int asids_init(void)
>>  {
>>  	BUG_ON(((1 << CONFIG_CPU_ASID_BITS) - 1) <= num_possible_cpus());
>>
>> -	if (asid_allocator_init(&asid_info, CONFIG_CPU_ASID_BITS, 1,
>> +	if (asid_allocator_init(&asid_info, CONFIG_CPU_ASID_BITS, 1, 1,
>>  			asid_flush_cpu_ctxt))
>>  		panic("Unable to initialize ASID allocator for %lu ASIDs\n",
>>  		      NUM_ASIDS(&asid_info));
>> --
>> 2.7.4
>>
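
For anyone following the mask arithmetic in the patch: with a
power-of-two harts_per_core, ~((1 << ilog2(harts_per_core)) - 1) is just
~(harts_per_core - 1), so ANDing it with a CPU ID rounds the ID down to
the first hart of its core. Below is a minimal standalone userspace
sketch of that grouping, not kernel code; the plain unsigned long
standing in for the kernel's cpumask, and all of the names, are
illustrative only:

    #include <stdio.h>

    static unsigned long flush_pending;    /* stand-in for the cpumask */

    /* One local flush clears the pending bits of every sibling hart. */
    static void clear_core_siblings(unsigned int cpu,
                                    unsigned int harts_per_core)
    {
        /* Only valid when harts_per_core is a power of two. */
        unsigned int base = cpu & ~(harts_per_core - 1);
        unsigned int i;

        for (i = 0; i < harts_per_core; i++)
            flush_pending &= ~(1UL << (base + i));
    }

    int main(void)
    {
        flush_pending = 0xffff;       /* CPUs 0-15 all pending */
        clear_core_siblings(6, 4);    /* CPU 6 flushes; its core is 4-7 */
        printf("pending = %#lx\n", flush_pending);    /* prints 0xff0f */
        return 0;
    }

Note that this only groups harts correctly if sibling threads really do
get contiguous logical CPU IDs, which is exactly the assumption Mark
objects to above.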