From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24537C49361 for ; Mon, 28 Jun 2021 12:41:24 +0000 (UTC) Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by mail.kernel.org (Postfix) with ESMTP id 970A861450 for ; Mon, 28 Jun 2021 12:41:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 970A861450 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0ADF741122; Mon, 28 Jun 2021 14:41:23 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by mails.dpdk.org (Postfix) with ESMTP id 1D61E41121 for ; Mon, 28 Jun 2021 14:41:18 +0200 (CEST) X-IronPort-AV: E=McAfee;i="6200,9189,10028"; a="207761719" X-IronPort-AV: E=Sophos;i="5.83,306,1616482800"; d="scan'208";a="207761719" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2021 05:41:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.83,306,1616482800"; d="scan'208";a="643319571" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.223.53]) by fmsmga005.fm.intel.com with ESMTP; 28 Jun 2021 05:41:15 -0700 From: Anatoly Burakov To: dev@dpdk.org, Timothy McDaniel , Beilei Xing , Jingjing Wu , Qiming Yang , Qi Zhang , Haiyue Wang , Matan Azrad , Shahaf Shuler , Viacheslav Ovsiienko , Bruce Richardson , Konstantin Ananyev Cc: david.hunt@intel.com, ciara.loftus@intel.com Date: Mon, 28 Jun 2021 12:41:07 +0000 Message-Id: <7a16950b25e28689c98454dac45a8a2e797ebb84.1624884053.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [dpdk-dev] [PATCH v3 1/7] power_intrinsics: use callbacks for comparison X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Previously, the semantics of power monitor were such that we were checking current value against the expected value, and if they matched, then the sleep was aborted. This is somewhat inflexible, because it only allowed us to check for a specific value. This commit replaces the comparison with a user callback mechanism, so that any PMD (or other code) using `rte_power_monitor()` can define their own comparison semantics and decision making on how to detect the need to abort the entering of power optimized state. Existing implementations are adjusted to follow the new semantics. Suggested-by: Konstantin Ananyev Signed-off-by: Anatoly Burakov --- Notes: v2: - Use callback mechanism for more flexibility - Address feedback from Konstantin doc/guides/rel_notes/release_21_08.rst | 1 + drivers/event/dlb2/dlb2.c | 16 ++++++++-- drivers/net/i40e/i40e_rxtx.c | 19 ++++++++---- drivers/net/iavf/iavf_rxtx.c | 19 ++++++++---- drivers/net/ice/ice_rxtx.c | 19 ++++++++---- drivers/net/ixgbe/ixgbe_rxtx.c | 19 ++++++++---- drivers/net/mlx5/mlx5_rx.c | 16 ++++++++-- .../include/generic/rte_power_intrinsics.h | 29 ++++++++++++++----- lib/eal/x86/rte_power_intrinsics.c | 9 ++---- 9 files changed, 106 insertions(+), 41 deletions(-) diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst index a6ecfdf3ce..c84ac280f5 100644 --- a/doc/guides/rel_notes/release_21_08.rst +++ b/doc/guides/rel_notes/release_21_08.rst @@ -84,6 +84,7 @@ API Changes Also, make sure to start the actual text at the margin. ======================================================= +* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism. ABI Changes ----------- diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c index eca183753f..14dfac257c 100644 --- a/drivers/event/dlb2/dlb2.c +++ b/drivers/event/dlb2/dlb2.c @@ -3154,6 +3154,15 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num) } } +#define CLB_MASK_IDX 0 +#define CLB_VAL_IDX 1 +static int +dlb2_monitor_callback(const uint64_t val, const uint64_t opaque[4]) +{ + /* abort if the value matches */ + return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0; +} + static inline int dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, struct dlb2_eventdev_port *ev_port, @@ -3194,8 +3203,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, expected_value = 0; pmc.addr = monitor_addr; - pmc.val = expected_value; - pmc.mask = qe_mask.raw_qe[1]; + /* store expected value and comparison mask in opaque data */ + pmc.opaque[CLB_VAL_IDX] = expected_value; + pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1]; + /* set up callback */ + pmc.fn = dlb2_monitor_callback; pmc.size = sizeof(uint64_t); rte_power_monitor(&pmc, timeout + start_ticks); diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 6c58decece..45f3fbf4ec 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -81,6 +81,17 @@ #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK) +static int +i40e_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused) +{ + const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + return (value & m) == m ? -1 : 0; +} + int i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { @@ -93,12 +104,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) /* watch for changes in status bit */ pmc->addr = &rxdp->wb.qword1.status_error_len; - /* - * we expect the DD bit to be set to 1 if this descriptor was already - * written to. - */ - pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); - pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + /* comparison callback */ + pmc->fn = i40e_monitor_callback; /* registers are 64-bit */ pmc->size = sizeof(uint64_t); diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c index 0361af0d85..6e12ecce07 100644 --- a/drivers/net/iavf/iavf_rxtx.c +++ b/drivers/net/iavf/iavf_rxtx.c @@ -57,6 +57,17 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type) rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1; } +static int +iavf_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused) +{ + const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT); + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + return (value & m) == m ? -1 : 0; +} + int iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { @@ -69,12 +80,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) /* watch for changes in status bit */ pmc->addr = &rxdp->wb.qword1.status_error_len; - /* - * we expect the DD bit to be set to 1 if this descriptor was already - * written to. - */ - pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT); - pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT); + /* comparison callback */ + pmc->fn = iavf_monitor_callback; /* registers are 64-bit */ pmc->size = sizeof(uint64_t); diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index fc9bb5a3e7..278eb4b9a1 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -27,6 +27,17 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask; uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask; uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask; +static int +ice_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused) +{ + const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + return (value & m) == m ? -1 : 0; +} + int ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { @@ -39,12 +50,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) /* watch for changes in status bit */ pmc->addr = &rxdp->wb.status_error0; - /* - * we expect the DD bit to be set to 1 if this descriptor was already - * written to. - */ - pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); - pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + /* comparison callback */ + pmc->fn = ice_monitor_callback; /* register is 16-bit */ pmc->size = sizeof(uint16_t); diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index d69f36e977..0c5045d9dc 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1369,6 +1369,17 @@ const uint32_t RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP, }; +static int +ixgbe_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused) +{ + const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + return (value & m) == m ? -1 : 0; +} + int ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { @@ -1381,12 +1392,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) /* watch for changes in status bit */ pmc->addr = &rxdp->wb.upper.status_error; - /* - * we expect the DD bit to be set to 1 if this descriptor was already - * written to. - */ - pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); - pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + /* comparison callback */ + pmc->fn = ixgbe_monitor_callback; /* the registers are 32-bit */ pmc->size = sizeof(uint32_t); diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c index 777a1d6e45..57f6ca1467 100644 --- a/drivers/net/mlx5/mlx5_rx.c +++ b/drivers/net/mlx5/mlx5_rx.c @@ -269,6 +269,17 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id) return rx_queue_count(rxq); } +#define CLB_VAL_IDX 0 +#define CLB_MSK_IDX 1 +static int +mlx_monitor_callback(const uint64_t value, const uint64_t opaque[4]) +{ + const uint64_t m = opaque[CLB_MSK_IDX]; + const uint64_t v = opaque[CLB_VAL_IDX]; + + return (value & m) == v ? -1 : 0; +} + int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { struct mlx5_rxq_data *rxq = rx_queue; @@ -282,8 +293,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) return -rte_errno; } pmc->addr = &cqe->op_own; - pmc->val = !!idx; - pmc->mask = MLX5_CQE_OWNER_MASK; + pmc->opaque[CLB_VAL_IDX] = !!idx; + pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK; + pmc->fn = mlx_monitor_callback; pmc->size = sizeof(uint8_t); return 0; } diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h index dddca3d41c..046667ade6 100644 --- a/lib/eal/include/generic/rte_power_intrinsics.h +++ b/lib/eal/include/generic/rte_power_intrinsics.h @@ -18,19 +18,34 @@ * which are architecture-dependent. */ +/** + * Callback definition for monitoring conditions. Callbacks with this signature + * will be used by `rte_power_monitor()` to check if the entering of power + * optimized state should be aborted. + * + * @param val + * The value read from memory. + * @param opaque + * Callback-specific data. + * + * @return + * 0 if entering of power optimized state should proceed + * -1 if entering of power optimized state should be aborted + */ +typedef int (*rte_power_monitor_clb_t)(const uint64_t val, + const uint64_t opaque[4]); struct rte_power_monitor_cond { volatile void *addr; /**< Address to monitor for changes */ - uint64_t val; /**< If the `mask` is non-zero, location pointed - * to by `addr` will be read and compared - * against this value. - */ - uint64_t mask; /**< 64-bit mask to extract value read from `addr` */ - uint8_t size; /**< Data size (in bytes) that will be used to compare - * expected value (`val`) with data read from the + uint8_t size; /**< Data size (in bytes) that will be read from the * monitored memory location (`addr`). Can be 1, 2, * 4, or 8. Supplying any other value will result in * an error. */ + rte_power_monitor_clb_t fn; /**< Callback to be used to check if + * entering power optimized state should + * be aborted. + */ + uint64_t opaque[4]; /**< Callback-specific data */ }; /** diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c index 39ea9fdecd..3c5c9ce7ad 100644 --- a/lib/eal/x86/rte_power_intrinsics.c +++ b/lib/eal/x86/rte_power_intrinsics.c @@ -110,14 +110,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, /* now that we've put this address into monitor, we can unlock */ rte_spinlock_unlock(&s->lock); - /* if we have a comparison mask, we might not need to sleep at all */ - if (pmc->mask) { + /* if we have a callback, we might not need to sleep at all */ + if (pmc->fn) { const uint64_t cur_value = __get_umwait_val( pmc->addr, pmc->size); - const uint64_t masked = cur_value & pmc->mask; - - /* if the masked value is already matching, abort */ - if (masked == pmc->val) + if (pmc->fn(cur_value, pmc->opaque) != 0) goto end; } -- 2.25.1