From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94D50C49EAF for ; Thu, 24 Jun 2021 15:56:39 +0000 (UTC) Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by mail.kernel.org (Postfix) with ESMTP id 2481B613EC for ; Thu, 24 Jun 2021 15:56:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2481B613EC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1480540040; Thu, 24 Jun 2021 17:56:38 +0200 (CEST) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 791E54003C for ; Thu, 24 Jun 2021 17:56:36 +0200 (CEST) IronPort-SDR: MWWaK6Vd4098qRmpkUcD20Vp5pFeoCmA7YYEclr5o0bmq/0Gn1eZN8w0sgQKRFsEjr/sSxU1/L 5BmRNac3VwhA== X-IronPort-AV: E=McAfee;i="6200,9189,10025"; a="204491713" X-IronPort-AV: E=Sophos;i="5.83,296,1616482800"; d="scan'208";a="204491713" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jun 2021 08:54:08 -0700 IronPort-SDR: tEld6k/QXlcgqQQexHSGZZq12ckFuV0x0NddlfR/0uUebYgrFjM9uHzdGXkUzWWfpPZyzq8RD/ +CneuOLVufkg== X-IronPort-AV: E=Sophos;i="5.83,296,1616482800"; d="scan'208";a="557376226" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.250.19]) ([10.213.250.19]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jun 2021 08:54:06 -0700 To: "Ananyev, Konstantin" , "dev@dpdk.org" , "Richardson, Bruce" Cc: "Loftus, Ciara" , "Hunt, David" References: <53ac7ee0-e4bc-099a-2ddd-0d74949eea9d@intel.com> <249f6957-0f10-926f-9f0e-5cb10fc4d2c3@intel.com> <22ddddf1-6523-8219-dcef-86205e662756@intel.com> <084a42d9-1422-f4a5-f57e-53b4667c1f44@intel.com> <882380b2-20b9-f081-9a5a-199c38094b1b@intel.com> From: "Burakov, Anatoly" Message-ID: Date: Thu, 24 Jun 2021 16:54:03 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 24-Jun-21 4:25 PM, Ananyev, Konstantin wrote: > >>>>>>>> I did a quick prototype for this, and i don't think it is going to work. >>>>>>>> >>>>>>>> Callbacks with just "current value" as argument will be pretty limited >>>>>>>> and will only really work for cases where we know what we are expecting. >>>>>>>> However, for cases like event/dlb or net/mlx5, the expected value is (or >>>>>>>> appears to be) dependent upon some internal device data, and is not >>>>>>>> constant like in case of net/ixgbe for example. >>>>>>>> >>>>>>>> This can be fixed by passing an opaque pointer, either by storing it in >>>>>>>> the monitor condition, or by passing it directly to rte_power_monitor at >>>>>>>> invocation time. >>>>>>>> >>>>>>>> The latter doesn't work well because when we call rte_power_monitor from >>>>>>>> inside the rte_power library, we lack the context necessary to get said >>>>>>>> opaque pointer. >>>>>>>> >>>>>>>> The former doesn't work either, because the only place where we can get >>>>>>>> this argument is inside get_monitor_addr, but the opaque pointer must >>>>>>>> persist after we exit that function in order to avoid use-after-free - >>>>>>>> which means that it either has to be statically allocated (which means >>>>>>>> it's not thread-safe for a non-trivial case), or dynamically allocated >>>>>>>> (which a big no-no on a hotpath). >>>>>>> >>>>>>> If I get you right, expected_value (and probably mask) can be variable ones. >>>>>>> So for callback approach to work we need to pass all this as parameters >>>>>>> to PMD comparison callback: >>>>>>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask) >>>>>>> Correct? >>>>>> >>>>>> If we have both expected value, mask, and current value, then what's the >>>>>> point of the callback? The point of the callback would be to pass just >>>>>> the current value, and let the callback decide what's the expected value >>>>>> and how to compare it. >>>>> >>>>> For me the main point of callback is to hide PMD specific comparison semantics. >>>>> Basically they provide us with some values in struct rte_power_monitor_cond, >>>>> and then it is up to them how to interpret them in their comparison function. >>>>> All we'll do for them: will read the value at address provided. >>>>> I understand that it looks like an overkill, as majority of these comparison functions >>>>> will be like: >>>>> int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask) >>>>> { >>>>> return ((real_val & mask) == expected_val); >>>>> } >>>>> Though qsort() and bsearch() work in a similar manner, and everyone seems ok with it. >>>>> >>>>>> >>>>>> So, we can either let callback handle expected values itself by having >>>>>> an opaque callback-specific argument (which means it has to persist >>>>>> between .get_monitor_addr() and rte_power_monitor() calls), >>>>> >>>>> But that's what we doing already - PMD fills rte_power_monitor_cond values >>>>> for us, we store them somewhere and then use them to decide should we go to sleep or not. >>>>> All callback does - moves actual values interpretation back to PMD: >>>>> Right now: >>>>> PMD: provide PMC values >>>>> POWER: store PMC values somewhere >>>>> read the value at address provided in PMC >>>>> interpret PMC values and newly read value and make the decision >>>>> >>>>> With callback: >>>>> PMD: provide PMC values >>>>> POWER: store PMC values somewhere >>>>> read the value at address provided in PMC >>>>> PMD: interpret PMC values and newly read value and make the decision >>>>> >>>>> Or did you mean something different here? >>>>> >>>>>> or we do the >>>>>> comparisons inside rte_power_monitor(), and store the expected/mask >>>>>> values in the monitor condition, and *don't* have any callbacks at all. >>>>>> Are you suggesting an alternative to the above two options? >>>>> >>>>> As I said in my first mail - we can just replace 'inverse' with 'op'. >>>>> That at least will make this API extendable, if someone will need >>>>> something different in future. >>>>> >>>>> Another option is >>>> >>>> Right, so the idea is store the PMD-specific data in the monitor >>>> condition, and leave it to the callback to interpret it. >>>> >>>> The obvious question then is, how many values is enough? Two? Three? >>>> Four? This option doesn't really solve the basic issue, it just kicks >>>> the can down the road. I guess three values should be enough for >>>> everyone (tm) ? :D >>>> >>>> I don't like the 'op' thing because if the goal is to be flexible, it's >>>> unnecessarily limiting *and* makes the API even more complex to use. I >>>> would rather have a number of PMD-specific values and leave it up to the >>>> callback to interpret them, because at least that way we're not limited >>>> to predefined operations on the monitor condition data. >>> >>> Just to make sure we are talking about the same, does what you propose >>> looks like that: >>> >>> struct rte_power_monitor_cond { >>> volatile void *addr; /**< Address to monitor for changes */ >>> uint8_t size; /**< Data size (in bytes) that will be used to compare >>> * expected value (`val`) with data read from the >>> * monitored memory location (`addr`). Can be 1, 2, >>> * 4, or 8. Supplying any other value will result in >>> * an error. >>> */ >>> int (*cmp)(uint64_t real_value, const uint64_t opaque[4]); >>> uint64_t opaque[4]; /*PMD specific data, used by comparison call-back below */ >>> }; >>> >>> And then in rte_power_monitor(): >>> ... >>> uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size); >>> if (pmc->cmp(cur_value, pmc->opaque) != 0) { >>> /* goto sleep */ >>> } >>> >>> ? >>> >> >> Something like that, yes. >> > > Seems reasonable to me. > Thanks > Konstantin > OK, i'll implement this in v2. Thanks for your input! -- Thanks, Anatoly