From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B67AC49EA3 for ; Mon, 28 Jun 2021 12:44:01 +0000 (UTC) Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by mail.kernel.org (Postfix) with ESMTP id A156E61C6A for ; Mon, 28 Jun 2021 12:44:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A156E61C6A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id D18524111F; Mon, 28 Jun 2021 14:43:59 +0200 (CEST) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by mails.dpdk.org (Postfix) with ESMTP id E247A4069F for ; Mon, 28 Jun 2021 14:43:57 +0200 (CEST) X-IronPort-AV: E=McAfee;i="6200,9189,10028"; a="229561141" X-IronPort-AV: E=Sophos;i="5.83,306,1616482800"; d="scan'208";a="229561141" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2021 05:43:56 -0700 X-IronPort-AV: E=Sophos;i="5.83,306,1616482800"; d="scan'208";a="446551381" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.249.202]) ([10.213.249.202]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2021 05:43:53 -0700 To: "Ananyev, Konstantin" , "dev@dpdk.org" , Jerin Jacob , Ruifeng Wang , Jan Viktorin , David Christensen , Ray Kinsella , Neil Horman , "Richardson, Bruce" Cc: "Hunt, David" , "Loftus, Ciara" References: <6c48562add728df4e2f22b3a41170d624c63e233.1624629506.git.anatoly.burakov@intel.com> From: "Burakov, Anatoly" Message-ID: <6a932db8-1c6c-8400-c274-ea40918ea023@intel.com> Date: Mon, 28 Jun 2021 13:43:48 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v2 3/7] eal: add power monitor for multiple events X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 28-Jun-21 1:37 PM, Ananyev, Konstantin wrote: > >> Use RTM and WAITPKG instructions to perform a wait-for-writes similar to >> what UMWAIT does, but without the limitation of having to listen for >> just one event. This works because the optimized power state used by the >> TPAUSE instruction will cause a wake up on RTM transaction abort, so if >> we add the addresses we're interested in to the read-set, any write to >> those addresses will wake us up. >> >> Signed-off-by: Konstantin Ananyev >> Signed-off-by: Anatoly Burakov >> --- >> >> Notes: >> v2: >> - Adapt to callback mechanism >> >> doc/guides/rel_notes/release_21_08.rst | 2 + >> lib/eal/arm/rte_power_intrinsics.c | 11 +++ >> lib/eal/include/generic/rte_cpuflags.h | 2 + >> .../include/generic/rte_power_intrinsics.h | 35 ++++++++++ >> lib/eal/ppc/rte_power_intrinsics.c | 11 +++ >> lib/eal/version.map | 3 + >> lib/eal/x86/rte_cpuflags.c | 2 + >> lib/eal/x86/rte_power_intrinsics.c | 69 +++++++++++++++++++ >> 8 files changed, 135 insertions(+) >> > ... > >> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c >> index 3c5c9ce7ad..3fc6f62ef5 100644 >> --- a/lib/eal/x86/rte_power_intrinsics.c >> +++ b/lib/eal/x86/rte_power_intrinsics.c >> @@ -4,6 +4,7 @@ >> >> #include >> #include >> +#include >> #include >> >> #include "rte_power_intrinsics.h" >> @@ -28,6 +29,7 @@ __umwait_wakeup(volatile void *addr) >> } >> >> static bool wait_supported; >> +static bool wait_multi_supported; >> >> static inline uint64_t >> __get_umwait_val(const volatile void *p, const uint8_t sz) >> @@ -164,6 +166,8 @@ RTE_INIT(rte_power_intrinsics_init) { >> >> if (i.power_monitor && i.power_pause) >> wait_supported = 1; >> + if (i.power_monitor_multi) >> + wait_multi_supported = 1; >> } >> >> int >> @@ -202,6 +206,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) >> * In this case, since we've already woken up, the "wakeup" was >> * unneeded, and since T1 is still waiting on T2 releasing the lock, the >> * wakeup address is still valid so it's perfectly safe to write it. >> + * >> + * For multi-monitor case, the act of locking will in itself trigger the >> + * wakeup, so no additional writes necessary. >> */ >> rte_spinlock_lock(&s->lock); >> if (s->monitor_addr != NULL) >> @@ -210,3 +217,65 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) >> >> return 0; >> } >> + >> +int >> +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], >> + const uint32_t num, const uint64_t tsc_timestamp) >> +{ >> + const unsigned int lcore_id = rte_lcore_id(); >> + struct power_wait_status *s = &wait_status[lcore_id]; >> + uint32_t i, rc; >> + >> + /* check if supported */ >> + if (!wait_multi_supported) >> + return -ENOTSUP; >> + >> + if (pmc == NULL || num == 0) >> + return -EINVAL; >> + >> + /* we are already inside transaction region, return */ >> + if (rte_xtest() != 0) >> + return 0; >> + >> + /* start new transaction region */ >> + rc = rte_xbegin(); >> + >> + /* transaction abort, possible write to one of wait addresses */ >> + if (rc != RTE_XBEGIN_STARTED) >> + return 0; >> + >> + /* >> + * the mere act of reading the lock status here adds the lock to >> + * the read set. This means that when we trigger a wakeup from another >> + * thread, even if we don't have a defined wakeup address and thus don't >> + * actually cause any writes, the act of locking our lock will itself >> + * trigger the wakeup and abort the transaction. >> + */ >> + rte_spinlock_is_locked(&s->lock); >> + >> + /* >> + * add all addresses to wait on into transaction read-set and check if >> + * any of wakeup conditions are already met. >> + */ >> + for (i = 0; i < num; i++) { >> + const struct rte_power_monitor_cond *c = &pmc[i]; >> + >> + if (pmc->fn == NULL) > > Should be c->fn, I believe. Yep, will fix. > >> + continue; > > Actually that way, if c->fn == NULL, we'll never add our c->addr to monitored addresses. > Is that what we really want? > My thought was, that if callback is not set, we'll just go to power-save state without extra checking, no? > Something like that: > > const struct rte_power_monitor_cond *c = &pmc[i]; > const uint64_t val = __get_umwait_val(c->addr, c->size); > > if (c->fn && c->fn(val, c->opaque) != 0) > break; This is consistent with previous behavior of rte_power_monitor where if mask wasn't set we entered power save mode without any checks. If we do a break, that means the check condition has failed somewhere and we have to abort the sleep. Continue keeps the sleep. > > Same thought for rte_power_monitor(). > >> + const uint64_t val = __get_umwait_val(pmc->addr, pmc->size); > > Same thing: s/pmc->/c->/ Yep, you're right. > >> + >> + /* abort if callback indicates that we need to stop */ >> + if (c->fn(val, c->opaque) != 0) >> + break; >> + } >> + >> + /* none of the conditions were met, sleep until timeout */ >> + if (i == num) >> + rte_power_pause(tsc_timestamp); >> + >> + /* end transaction region */ >> + rte_xend(); >> + >> + return 0; >> +} >> -- >> 2.25.1 > -- Thanks, Anatoly