From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anup Patel Subject: Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients Date: Fri, 10 Feb 2017 08:54:16 +0530 Message-ID: References: <1486455406-11202-1-git-send-email-anup.patel@broadcom.com> <1486455406-11202-3-git-send-email-anup.patel@broadcom.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: Mark Rutland , Device Tree , Herbert Xu , Scott Branden , Vinod Koul , Ray Jui , Jassi Brar , "linux-kernel@vger.kernel.org" , linux-raid , Jon Mason , Rob Herring , BCM Kernel Feedback , linux-crypto@vger.kernel.org, Rob Rice , "dmaengine@vger.kernel.org" , "David S . Miller" , "linux-arm-kernel@lists.infradead.org" To: Dan Williams Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=m.gmane.org@lists.infradead.org List-Id: linux-crypto.vger.kernel.org On Thu, Feb 9, 2017 at 10:14 PM, Dan Williams wrote: > On Thu, Feb 9, 2017 at 1:29 AM, Anup Patel wrote: >> On Wed, Feb 8, 2017 at 9:54 PM, Dan Williams wrote: >>> On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel wrote: >>>> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams wrote: >>>>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel wrote: >>>>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams wrote: >>>>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel wrote: >>>>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a >>>>>>>> DMA device then all 256 PQ coefficients are supported. This assumption >>>>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine >>>>>>>> which supports PQ offload with limited number of PQ coefficients. >>>>>>>> >>>>>>>> This patch extends async_tx APIs to handle DMA devices with support >>>>>>>> for fewer PQ coefficients. >>>>>>>> >>>>>>>> Signed-off-by: Anup Patel >>>>>>>> Reviewed-by: Scott Branden >>>>>>> >>>>>>> I don't like this approach. Define an interface for md to query the >>>>>>> offload engine once at the beginning of time. We should not be adding >>>>>>> any new extensions to async_tx. >>>>>> >>>>>> Even if we do capability checks in Linux MD, we still need a way >>>>>> for DMAENGINE drivers to advertise number of PQ coefficients >>>>>> handled by the HW. >>>>>> >>>>>> I agree capability checks should be done once in Linux MD but I don't >>>>>> see why this has to be part of BCM-SBA-RAID driver patches. We need >>>>>> separate patchsets to address limitations of async_tx framework. >>>>> >>>>> Right, separate enabling before we pile on new hardware support to a >>>>> known broken framework. >>>> >>>> Linux Async Tx not broken framework. The issue is: >>>> 1. Its not complete enough >>>> 2. Its not optimized for very high through-put offload engines >>> >>> I'm not understanding your point. I'm nak'ing this change to add yet >>> more per-transaction capability checking to async_tx. I don't like the >>> DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to >>> DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to >>> be fixed before this new hardware support, I'm simply saying we should >>> start the process of moving offload-engine capability checking to the >>> raid code. >> >> The DMA_HAS_FEWER_PQ_COEF is not equal to >> DMA_HAS_PQ_CONTINUE. > > #define DMA_HAS_PQ_CONTINUE (1 << 15 > #define DMA_HAS_FEWER_PQ_COEF (1 << 15) You are only looking at the values of these flags. The semantics of both these flags are different and both flags are set in different members of "struct dma_device" The DMA_HAS_PQ_CONTINUE is set in "max_pq" whereas DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef". When DMA_HAS_PQ_CONTINUE is set in "max_pq", it means that PQ HW is capable of taking P & Q computed previous txn as input. If DMA_HAS_PQ_CONTINUE is not supported the async_pq() will pass P & Q computed by previous txn as sources with coef as g^0. When DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef", it means the PQ HW is not capable of handling all 256 coefs. > >> I will try to drop this patch and take care of unsupported PQ >> coefficients in BCM-SBA-RAID driver itself even if this means >> doing some computations in BCM-SBA-RAID driver itself. > > That should be nak'd as well, please do capability detection in a > routine that is common to all raid engines. Thanks for NAKing this patch. This motivated me to find clean work-around for handling unsupported PQ coefs in BCM-SBA-RAID driver. Let's assume max number of PQ coefs supported by PQ HW is m coefs. Now for any coef n > m, we can use RAID6 math to get g^n = (g^m)*(g^m)*....*(g^k) where k <= m. Using the above fact, we can create chained txn for each source of PQ request in BCM-SBA-RAID driver where each txn will only compute PQ from one source using above described RAID6 math. Also, each txn in chained txn will depend on output of previous txn because we are computing PQ from one source at a time. I will drop this patch and send updated BCM-SBA-RAID driver with above described work-around for handling unsupported PQ coefs. Regards, Anup From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751602AbdBJDZH (ORCPT ); Thu, 9 Feb 2017 22:25:07 -0500 Received: from mail-vk0-f50.google.com ([209.85.213.50]:36124 "EHLO mail-vk0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751050AbdBJDYt (ORCPT ); Thu, 9 Feb 2017 22:24:49 -0500 MIME-Version: 1.0 In-Reply-To: References: <1486455406-11202-1-git-send-email-anup.patel@broadcom.com> <1486455406-11202-3-git-send-email-anup.patel@broadcom.com> From: Anup Patel Date: Fri, 10 Feb 2017 08:54:16 +0530 Message-ID: Subject: Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients To: Dan Williams Cc: Vinod Koul , Rob Herring , Mark Rutland , Herbert Xu , "David S . Miller" , Jassi Brar , Ray Jui , Scott Branden , Jon Mason , Rob Rice , BCM Kernel Feedback , "dmaengine@vger.kernel.org" , Device Tree , "linux-arm-kernel@lists.infradead.org" , "linux-kernel@vger.kernel.org" , linux-crypto@vger.kernel.org, linux-raid Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 9, 2017 at 10:14 PM, Dan Williams wrote: > On Thu, Feb 9, 2017 at 1:29 AM, Anup Patel wrote: >> On Wed, Feb 8, 2017 at 9:54 PM, Dan Williams wrote: >>> On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel wrote: >>>> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams wrote: >>>>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel wrote: >>>>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams wrote: >>>>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel wrote: >>>>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a >>>>>>>> DMA device then all 256 PQ coefficients are supported. This assumption >>>>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine >>>>>>>> which supports PQ offload with limited number of PQ coefficients. >>>>>>>> >>>>>>>> This patch extends async_tx APIs to handle DMA devices with support >>>>>>>> for fewer PQ coefficients. >>>>>>>> >>>>>>>> Signed-off-by: Anup Patel >>>>>>>> Reviewed-by: Scott Branden >>>>>>> >>>>>>> I don't like this approach. Define an interface for md to query the >>>>>>> offload engine once at the beginning of time. We should not be adding >>>>>>> any new extensions to async_tx. >>>>>> >>>>>> Even if we do capability checks in Linux MD, we still need a way >>>>>> for DMAENGINE drivers to advertise number of PQ coefficients >>>>>> handled by the HW. >>>>>> >>>>>> I agree capability checks should be done once in Linux MD but I don't >>>>>> see why this has to be part of BCM-SBA-RAID driver patches. We need >>>>>> separate patchsets to address limitations of async_tx framework. >>>>> >>>>> Right, separate enabling before we pile on new hardware support to a >>>>> known broken framework. >>>> >>>> Linux Async Tx not broken framework. The issue is: >>>> 1. Its not complete enough >>>> 2. Its not optimized for very high through-put offload engines >>> >>> I'm not understanding your point. I'm nak'ing this change to add yet >>> more per-transaction capability checking to async_tx. I don't like the >>> DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to >>> DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to >>> be fixed before this new hardware support, I'm simply saying we should >>> start the process of moving offload-engine capability checking to the >>> raid code. >> >> The DMA_HAS_FEWER_PQ_COEF is not equal to >> DMA_HAS_PQ_CONTINUE. > > #define DMA_HAS_PQ_CONTINUE (1 << 15 > #define DMA_HAS_FEWER_PQ_COEF (1 << 15) You are only looking at the values of these flags. The semantics of both these flags are different and both flags are set in different members of "struct dma_device" The DMA_HAS_PQ_CONTINUE is set in "max_pq" whereas DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef". When DMA_HAS_PQ_CONTINUE is set in "max_pq", it means that PQ HW is capable of taking P & Q computed previous txn as input. If DMA_HAS_PQ_CONTINUE is not supported the async_pq() will pass P & Q computed by previous txn as sources with coef as g^0. When DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef", it means the PQ HW is not capable of handling all 256 coefs. > >> I will try to drop this patch and take care of unsupported PQ >> coefficients in BCM-SBA-RAID driver itself even if this means >> doing some computations in BCM-SBA-RAID driver itself. > > That should be nak'd as well, please do capability detection in a > routine that is common to all raid engines. Thanks for NAKing this patch. This motivated me to find clean work-around for handling unsupported PQ coefs in BCM-SBA-RAID driver. Let's assume max number of PQ coefs supported by PQ HW is m coefs. Now for any coef n > m, we can use RAID6 math to get g^n = (g^m)*(g^m)*....*(g^k) where k <= m. Using the above fact, we can create chained txn for each source of PQ request in BCM-SBA-RAID driver where each txn will only compute PQ from one source using above described RAID6 math. Also, each txn in chained txn will depend on output of previous txn because we are computing PQ from one source at a time. I will drop this patch and send updated BCM-SBA-RAID driver with above described work-around for handling unsupported PQ coefs. Regards, Anup From mboxrd@z Thu Jan 1 00:00:00 1970 From: anup.patel@broadcom.com (Anup Patel) Date: Fri, 10 Feb 2017 08:54:16 +0530 Subject: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients In-Reply-To: References: <1486455406-11202-1-git-send-email-anup.patel@broadcom.com> <1486455406-11202-3-git-send-email-anup.patel@broadcom.com> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Thu, Feb 9, 2017 at 10:14 PM, Dan Williams wrote: > On Thu, Feb 9, 2017 at 1:29 AM, Anup Patel wrote: >> On Wed, Feb 8, 2017 at 9:54 PM, Dan Williams wrote: >>> On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel wrote: >>>> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams wrote: >>>>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel wrote: >>>>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams wrote: >>>>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel wrote: >>>>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a >>>>>>>> DMA device then all 256 PQ coefficients are supported. This assumption >>>>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine >>>>>>>> which supports PQ offload with limited number of PQ coefficients. >>>>>>>> >>>>>>>> This patch extends async_tx APIs to handle DMA devices with support >>>>>>>> for fewer PQ coefficients. >>>>>>>> >>>>>>>> Signed-off-by: Anup Patel >>>>>>>> Reviewed-by: Scott Branden >>>>>>> >>>>>>> I don't like this approach. Define an interface for md to query the >>>>>>> offload engine once at the beginning of time. We should not be adding >>>>>>> any new extensions to async_tx. >>>>>> >>>>>> Even if we do capability checks in Linux MD, we still need a way >>>>>> for DMAENGINE drivers to advertise number of PQ coefficients >>>>>> handled by the HW. >>>>>> >>>>>> I agree capability checks should be done once in Linux MD but I don't >>>>>> see why this has to be part of BCM-SBA-RAID driver patches. We need >>>>>> separate patchsets to address limitations of async_tx framework. >>>>> >>>>> Right, separate enabling before we pile on new hardware support to a >>>>> known broken framework. >>>> >>>> Linux Async Tx not broken framework. The issue is: >>>> 1. Its not complete enough >>>> 2. Its not optimized for very high through-put offload engines >>> >>> I'm not understanding your point. I'm nak'ing this change to add yet >>> more per-transaction capability checking to async_tx. I don't like the >>> DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to >>> DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to >>> be fixed before this new hardware support, I'm simply saying we should >>> start the process of moving offload-engine capability checking to the >>> raid code. >> >> The DMA_HAS_FEWER_PQ_COEF is not equal to >> DMA_HAS_PQ_CONTINUE. > > #define DMA_HAS_PQ_CONTINUE (1 << 15 > #define DMA_HAS_FEWER_PQ_COEF (1 << 15) You are only looking@the values of these flags. The semantics of both these flags are different and both flags are set in different members of "struct dma_device" The DMA_HAS_PQ_CONTINUE is set in "max_pq" whereas DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef". When DMA_HAS_PQ_CONTINUE is set in "max_pq", it means that PQ HW is capable of taking P & Q computed previous txn as input. If DMA_HAS_PQ_CONTINUE is not supported the async_pq() will pass P & Q computed by previous txn as sources with coef as g^0. When DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef", it means the PQ HW is not capable of handling all 256 coefs. > >> I will try to drop this patch and take care of unsupported PQ >> coefficients in BCM-SBA-RAID driver itself even if this means >> doing some computations in BCM-SBA-RAID driver itself. > > That should be nak'd as well, please do capability detection in a > routine that is common to all raid engines. Thanks for NAKing this patch. This motivated me to find clean work-around for handling unsupported PQ coefs in BCM-SBA-RAID driver. Let's assume max number of PQ coefs supported by PQ HW is m coefs. Now for any coef n > m, we can use RAID6 math to get g^n = (g^m)*(g^m)*....*(g^k) where k <= m. Using the above fact, we can create chained txn for each source of PQ request in BCM-SBA-RAID driver where each txn will only compute PQ from one source using above described RAID6 math. Also, each txn in chained txn will depend on output of previous txn because we are computing PQ from one source@a time. I will drop this patch and send updated BCM-SBA-RAID driver with above described work-around for handling unsupported PQ coefs. Regards, Anup