From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67F13C433ED for ; Fri, 7 May 2021 18:24:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 33B5D61430 for ; Fri, 7 May 2021 18:24:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229724AbhEGSY6 (ORCPT ); Fri, 7 May 2021 14:24:58 -0400 Received: from fllv0015.ext.ti.com ([198.47.19.141]:57586 "EHLO fllv0015.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229687AbhEGSY5 (ORCPT ); Fri, 7 May 2021 14:24:57 -0400 Received: from lelv0266.itg.ti.com ([10.180.67.225]) by fllv0015.ext.ti.com (8.15.2/8.15.2) with ESMTP id 147INkKk088265; Fri, 7 May 2021 13:23:46 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ti.com; s=ti-com-17Q1; t=1620411826; bh=pd1p/XdApN8ywhme/JPeieeHHLI5ICCqEJK/+zrXP0Y=; h=Date:From:To:CC:Subject:References:In-Reply-To; b=my1WieGO9J9NusPJJugu1lt91/WiwX28HCAbZoMxTt4UJWRgokyBnl7XIGGbc8gWG v1U4yMak2Be0wMxa1t4i4C06aGrOz1ACb5mJlc1C0VeXmDwxCqWkPFyyG2B+ESFs/M Hpba469DbbhL8X3kJMq4qkdSuZJPYfuvtccB7ZMA= Received: from DFLE112.ent.ti.com (dfle112.ent.ti.com [10.64.6.33]) by lelv0266.itg.ti.com (8.15.2/8.15.2) with ESMTPS id 147INkvb082699 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 7 May 2021 13:23:46 -0500 Received: from DFLE101.ent.ti.com (10.64.6.22) by DFLE112.ent.ti.com (10.64.6.33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Fri, 7 May 2021 13:23:46 -0500 Received: from lelv0327.itg.ti.com (10.180.67.183) by DFLE101.ent.ti.com (10.64.6.22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2 via Frontend Transport; Fri, 7 May 2021 13:23:46 -0500 Received: from localhost (ileax41-snat.itg.ti.com [10.172.224.153]) by lelv0327.itg.ti.com (8.15.2/8.15.2) with ESMTP id 147INjYM036385; Fri, 7 May 2021 13:23:46 -0500 Date: Fri, 7 May 2021 23:53:45 +0530 From: Pratyush Yadav To: Michael Walle CC: Tudor Ambarus , Miquel Raynal , Richard Weinberger , Vignesh Raghavendra , Mark Brown , , , Subject: Re: [PATCH 5/6] mtd: spi-nor: core; avoid odd length/address reads on 8D-8D-8D mode Message-ID: <20210507182343.3rykt2ydyj74qfxn@ti.com> References: <20210506191829.8271-1-p.yadav@ti.com> <20210506191829.8271-6-p.yadav@ti.com> <3daadf43ef4743f13ebbdd000ba5ec4a@walle.cc> <20210507180424.kj7c4rfjbycjagxm@ti.com> <50b07a065455b93b78ee43ba665083ee@walle.cc> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <50b07a065455b93b78ee43ba665083ee@walle.cc> User-Agent: NeoMutt/20171215 X-EXCLAIMER-MD-CONFIG: e1e8a2fd-e40a-4ac6-ac9b-f7e9cc9ee180 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/05/21 08:14PM, Michael Walle wrote: > Am 2021-05-07 20:04, schrieb Pratyush Yadav: > > On 07/05/21 05:51PM, Michael Walle wrote: > > > Am 2021-05-06 21:18, schrieb Pratyush Yadav: > > > > On Octal DTR capable flashes like Micron Xcella reads cannot start or > > > > end at an odd address in Octal DTR mode. Extra bytes need to be read at > > > > the start or end to make sure both the start address and length remain > > > > even. > > > > > > > > To avoid allocating too much extra memory, thereby putting unnecessary > > > > memory pressure on the system, the temporary buffer containing the extra > > > > padding bytes is capped at PAGE_SIZE bytes. The rest of the 2-byte > > > > aligned part should be read directly in the main buffer. > > > > > > > > Signed-off-by: Pratyush Yadav > > > > --- > > > > > > > > drivers/mtd/spi-nor/core.c | 81 +++++++++++++++++++++++++++++++++++++- > > > > 1 file changed, 80 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c > > > > index 5cc206b8bbf3..3d66cc34af4d 100644 > > > > --- a/drivers/mtd/spi-nor/core.c > > > > +++ b/drivers/mtd/spi-nor/core.c > > > > @@ -1904,6 +1904,82 @@ static const struct flash_info > > > > *spi_nor_read_id(struct spi_nor *nor) > > > > return ERR_PTR(-ENODEV); > > > > } > > > > > > > > +/* > > > > + * On Octal DTR capable flashes like Micron Xcella reads cannot start > > > > or > > > > + * end at an odd address in Octal DTR mode. Extra bytes need to be read > > > > + * at the start or end to make sure both the start address and length > > > > + * remain even. > > > > + */ > > > > +static int spi_nor_octal_dtr_read(struct spi_nor *nor, loff_t from, > > > > size_t len, > > > > + u_char *buf) > > > > +{ > > > > + u_char *tmp_buf; > > > > + size_t tmp_len; > > > > + loff_t start, end; > > > > + int ret, bytes_read; > > > > + > > > > + if (IS_ALIGNED(from, 2) && IS_ALIGNED(len, 2)) > > > > + return spi_nor_read_data(nor, from, len, buf); > > > > + else if (IS_ALIGNED(from, 2) && len > PAGE_SIZE) > > > > + return spi_nor_read_data(nor, from, round_down(len, PAGE_SIZE), > > > > + buf); > > > > + > > > > + tmp_buf = kmalloc(PAGE_SIZE, GFP_KERNEL); > > > > + if (!tmp_buf) > > > > + return -ENOMEM; > > > > + > > > > + start = round_down(from, 2); > > > > + end = round_up(from + len, 2); > > > > + > > > > + /* > > > > + * Avoid allocating too much memory. The requested read length might > > > > be > > > > + * quite large. Allocating a buffer just as large (slightly bigger, in > > > > + * fact) would put unnecessary memory pressure on the system. > > > > + * > > > > + * For example if the read is from 3 to 1M, then this will read from 2 > > > > + * to 4098. The reads from 4098 to 1M will then not need a temporary > > > > + * buffer so they can proceed as normal. > > > > + */ > > > > + tmp_len = min_t(size_t, end - start, PAGE_SIZE); > > > > + > > > > + ret = spi_nor_read_data(nor, start, tmp_len, tmp_buf); > > > > + if (ret == 0) { > > > > + ret = -EIO; > > > > + goto out; > > > > + } > > > > + if (ret < 0) > > > > + goto out; > > > > + > > > > + /* > > > > + * More bytes are read than actually requested, but that number can't > > > > be > > > > + * reported to the calling function or it will confuse its > > > > calculations. > > > > + * Calculate how many of the _requested_ bytes were read. > > > > + */ > > > > + bytes_read = ret; > > > > + > > > > + if (from != start) > > > > + ret -= from - start; > > > > + > > > > + /* > > > > + * Only account for extra bytes at the end if they were actually read. > > > > + * For example, if the total length was truncated because of temporary > > > > + * buffer size limit then the adjustment for the extra bytes at the > > > > end > > > > + * is not needed. > > > > + */ > > > > + if (start + bytes_read == end) > > > > + ret -= end - (from + len); > > > > + > > > > + if (ret < 0) { > > > > + ret = -EIO; > > > > + goto out; > > > > + } > > > > + > > > > + memcpy(buf, tmp_buf + (from - start), ret); > > > > +out: > > > > + kfree(tmp_buf); > > > > + return ret; > > > > +} > > > > + > > > > static int spi_nor_read(struct mtd_info *mtd, loff_t from, size_t len, > > > > size_t *retlen, u_char *buf) > > > > { > > > > @@ -1921,7 +1997,10 @@ static int spi_nor_read(struct mtd_info *mtd, > > > > loff_t from, size_t len, > > > > > > > > addr = spi_nor_convert_addr(nor, addr); > > > > > > > > - ret = spi_nor_read_data(nor, addr, len, buf); > > > > + if (nor->read_proto == SNOR_PROTO_8_8_8_DTR) > > > > + ret = spi_nor_octal_dtr_read(nor, addr, len, buf); > > > > + else > > > > + ret = spi_nor_read_data(nor, addr, len, buf); > > > > if (ret == 0) { > > > > /* We shouldn't see 0-length reads */ > > > > ret = -EIO; > > > > > > Reviewed-by: Michael Walle > > > > Thanks. > > > > > > > > I wonder how much performance is lost if this would just split > > > one transfer into up to three ones: 2 byte, size - 2, 2 bytes. > > > > This case is not really possible since it would try to read PAGE_SIZE > > whenever it can. But there is a situation possible where one transfer is > > split into three. It would look something like: 4096 bytes, size - 4096 > > bytes, 2 bytes. > > Ah no, I wasn't talking about your implementation, but just having a naive > one where you don't move around up to PAGE_SIZE of data but just read > 2 bytes in the beginning (if unaligned) and 2 bytes at the end (if > unaligned) > and reading the part in between just as usual because its then aligend. > > > I am trying to find a balance between minimizing number of reads while > > keeping the size of the temporary buffer to a reasonable limit. This is > > the best I could come up with. It optimizes for smaller transfers so > > while the absolute amount of overhead remains roughly the same, the > > ratio of it relative to read size is smaller. > > Yes, with this you will have that memcpy() and one transfer for transfers > up to PAGE_SIZE; the "naive" one above would have up to three depending on > the aligment. Right. Smaller transfers lose much more performance to the overhead than the larger ones do. So I think the optimization is worth the extra code complexity. > > > You can optimize for read performance if you are willing to waste memory > > by simple allocating a size + 2 bytes long buffer. Then the read can > > proceed in one transaction. But IMO memory is much more important > > compared to read throughput. > > -michael -- Regards, Pratyush Yadav Texas Instruments Inc.