Message-ID: <83e65083890a7ac9c581c5aee0361d1b49e6abd9.camel@linux.ibm.com>
Subject: Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
From: Eric Farman
To: Halil Pasic, Keith Busch
Cc: Keith Busch, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Christian Borntraeger, axboe@kernel.dk, Kernel Team, hch@lst.de, bvanassche@acm.org, damien.lemoal@opensource.wdc.com, ebiggers@kernel.org, pankydev8@gmail.com
Date: Tue, 28 Jun 2022 11:20:44 -0400
In-Reply-To: <20220628110024.01fcf84f.pasic@linux.ibm.com>

On Tue, 2022-06-28 at 11:00 +0200, Halil Pasic wrote:
> On Mon, 27 Jun 2022 09:36:56 -0600
> Keith Busch wrote:
>
> > On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote:
> > > Apologies, it took me an extra day to get back to this, but it is
> > > indeed this pass through that's causing our boot failures. I note
> > > that the old code (in iomap_dio_bio_iter), did:
> > >
> > > 	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > 		return -EINVAL;
> > >
> > > With blkbits equal to 12, the resulting mask was 0x0fff against an
> > > align value (from iov_iter_alignment) of x200 kicks us out.
> > >
> > > The new code (in iov_iter_aligned_iovec), meanwhile, compares this:
> > >
> > > 		if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
> > > 			return false;
> > >
> > > iov_base (and the output of the old iov_iter_aligned_iovec() routine)
> > > is x200, but since addr_mask is x1ff this check provides a different
> > > response than it used to.
> > >
> > > To check this, I changed the comparator to len_mask (almost certainly
> > > not the right answer since addr_mask is then unused, but it was good
> > > for a quick test), and our PV guests are able to boot again with
> > > -next running in the host.
> >
> > This raises more questions for me. It sounds like your process used to
> > get an EINVAL error, and it wants to continue getting an EINVAL error
> > instead of letting the direct-io request proceed. Is that correct?
>
> Is my understanding as well. But I'm not familiar enough with the code
> to tell where and how that -EINVAL gets handled.
>
> BTW let me just point out that the bounce buffering via swiotlb needed
> for PV is not unlikely to mess up the alignment of things. But I'm not
> sure if that is relevant here.
>
> Regards,
> Halil
>
> > If so, could you provide more details on what issue occurs with
> > dispatching this request?

This error occurs while reading the initial boot record for a guest:
QEMU reports that it was unable to read block zero from the device. The
code that complains doesn't appear to have anything that says "oh, got
EINVAL, try it this other way", but I haven't chased down if/where
something in between is expecting that and handling it in some unique
way. I -think- I have an easier reproducer now, so I may be able to get
a better answer to this question.

> >
> > If you really need to restrict address' alignment to the storage's
> > logical block size, I think your storage driver needs to set the
> > dma_alignment queue limit to that value.

It's possible that there's a problem in the virtio stack here, but the
failing configuration is a qcow image on the host rootfs, so it isn't
using any distinct driver. The bdev request queue that ends up being
used is the one allocated by blk_alloc_queue(), so changing
dma_alignment there wouldn't work.