From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE45FC76190 for ; Mon, 22 Jul 2019 11:20:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ABCC8216C8 for ; Mon, 22 Jul 2019 11:20:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728762AbfGVLU3 (ORCPT ); Mon, 22 Jul 2019 07:20:29 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:5980 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728094AbfGVLU3 (ORCPT ); Mon, 22 Jul 2019 07:20:29 -0400 Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x6MBClf5040711 for ; Mon, 22 Jul 2019 07:20:27 -0400 Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100]) by mx0a-001b2d01.pphosted.com with ESMTP id 2twc088pep-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 22 Jul 2019 07:20:27 -0400 Received: from localhost by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 22 Jul 2019 12:20:24 +0100 Received: from b06avi18626390.portsmouth.uk.ibm.com (9.149.26.192) by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Mon, 22 Jul 2019 12:20:20 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06avi18626390.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x6MBK4Gc39649686 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 22 Jul 2019 11:20:04 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0380211C04A; Mon, 22 Jul 2019 11:20:19 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 85BAA11C058; Mon, 22 Jul 2019 11:20:18 +0000 (GMT) Received: from oc7455500831.ibm.com (unknown [9.152.224.116]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 22 Jul 2019 11:20:18 +0000 (GMT) Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device To: "Dr. David Alan Gilbert" Cc: Dan Williams , Vivek Goyal , Halil Pasic , Collin Walling , Cornelia Huck , Sebastian Ott , KVM list , Miklos Szeredi , linux-nvdimm , Linux Kernel Mailing List , Stefan Hajnoczi , linux-fsdevel , Steven Whitehouse , Heiko Carstens , David Hildenbrand References: <20190515192715.18000-1-vgoyal@redhat.com> <20190515192715.18000-19-vgoyal@redhat.com> <20190717192725.25c3d146.pasic@linux.ibm.com> <20190718131532.GA13883@redhat.com> <20190722105630.GC3035@work-vm> From: Christian Borntraeger Openpgp: preference=signencrypt Autocrypt: addr=borntraeger@de.ibm.com; prefer-encrypt=mutual; keydata= mQINBE6cPPgBEAC2VpALY0UJjGmgAmavkL/iAdqul2/F9ONz42K6NrwmT+SI9CylKHIX+fdf J34pLNJDmDVEdeb+brtpwC9JEZOLVE0nb+SR83CsAINJYKG3V1b3Kfs0hydseYKsBYqJTN2j CmUXDYq9J7uOyQQ7TNVoQejmpp5ifR4EzwIFfmYDekxRVZDJygD0wL/EzUr8Je3/j548NLyL 4Uhv6CIPf3TY3/aLVKXdxz/ntbLgMcfZsDoHgDk3lY3r1iwbWwEM2+eYRdSZaR4VD+JRD7p8 0FBadNwWnBce1fmQp3EklodGi5y7TNZ/CKdJ+jRPAAnw7SINhSd7PhJMruDAJaUlbYaIm23A +82g+IGe4z9tRGQ9TAflezVMhT5J3ccu6cpIjjvwDlbxucSmtVi5VtPAMTLmfjYp7VY2Tgr+ T92v7+V96jAfE3Zy2nq52e8RDdUo/F6faxcumdl+aLhhKLXgrozpoe2nL0Nyc2uqFjkjwXXI OBQiaqGeWtxeKJP+O8MIpjyGuHUGzvjNx5S/592TQO3phpT5IFWfMgbu4OreZ9yekDhf7Cvn /fkYsiLDz9W6Clihd/xlpm79+jlhm4E3xBPiQOPCZowmHjx57mXVAypOP2Eu+i2nyQrkapaY IdisDQfWPdNeHNOiPnPS3+GhVlPcqSJAIWnuO7Ofw1ZVOyg/jwARAQABtDRDaHJpc3RpYW4g Qm9ybnRyYWVnZXIgKElCTSkgPGJvcm50cmFlZ2VyQGRlLmlibS5jb20+iQI4BBMBAgAiBQJO nDz4AhsDBgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIXgAAKCRARe7yAtaYcfOYVD/9sqc6ZdYKD bmDIvc2/1LL0g7OgiA8pHJlYN2WHvIhUoZUIqy8Sw2EFny/nlpPVWfG290JizNS2LZ0mCeGZ 80yt0EpQNR8tLVzLSSr0GgoY0lwsKhAnx3p3AOrA8WXsPL6prLAu3yJI5D0ym4MJ6KlYVIjU ppi4NLWz7ncA2nDwiIqk8PBGxsjdc/W767zOOv7117rwhaGHgrJ2tLxoGWj0uoH3ZVhITP1z gqHXYaehPEELDV36WrSKidTarfThCWW0T3y4bH/mjvqi4ji9emp1/pOWs5/fmd4HpKW+44tD Yt4rSJRSa8lsXnZaEPaeY3nkbWPcy3vX6qafIey5d8dc8Uyaan39WslnJFNEx8cCqJrC77kI vcnl65HaW3y48DezrMDH34t3FsNrSVv5fRQ0mbEed8hbn4jguFAjPt4az1xawSp0YvhzwATJ YmZWRMa3LPx/fAxoolq9cNa0UB3D3jmikWktm+Jnp6aPeQ2Db3C0cDyxcOQY/GASYHY3KNra z8iwS7vULyq1lVhOXg1EeSm+lXQ1Ciz3ub3AhzE4c0ASqRrIHloVHBmh4favY4DEFN19Xw1p 76vBu6QjlsJGjvROW3GRKpLGogQTLslbjCdIYyp3AJq2KkoKxqdeQYm0LZXjtAwtRDbDo71C FxS7i/qfvWJv8ie7bE9A6Wsjn7kCDQROnDz4ARAAmPI1e8xB0k23TsEg8O1sBCTXkV8HSEq7 JlWz7SWyM8oFkJqYAB7E1GTXV5UZcr9iurCMKGSTrSu3ermLja4+k0w71pLxws859V+3z1jr nhB3dGzVZEUhCr3EuN0t8eHSLSMyrlPL5qJ11JelnuhToT6535cLOzeTlECc51bp5Xf6/XSx SMQaIU1nDM31R13o98oRPQnvSqOeljc25aflKnVkSfqWSrZmb4b0bcWUFFUKVPfQ5Z6JEcJg Hp7qPXHW7+tJTgmI1iM/BIkDwQ8qe3Wz8R6rfupde+T70NiId1M9w5rdo0JJsjKAPePKOSDo RX1kseJsTZH88wyJ30WuqEqH9zBxif0WtPQUTjz/YgFbmZ8OkB1i+lrBCVHPdcmvathknAxS bXL7j37VmYNyVoXez11zPYm+7LA2rvzP9WxR8bPhJvHLhKGk2kZESiNFzP/E4r4Wo24GT4eh YrDo7GBHN82V4O9JxWZtjpxBBl8bH9PvGWBmOXky7/bP6h96jFu9ZYzVgIkBP3UYW+Pb1a+b w4A83/5ImPwtBrN324bNUxPPqUWNW0ftiR5b81ms/rOcDC/k/VoN1B+IHkXrcBf742VOLID4 YP+CB9GXrwuF5KyQ5zEPCAjlOqZoq1fX/xGSsumfM7d6/OR8lvUPmqHfAzW3s9n4lZOW5Jfx bbkAEQEAAYkCHwQYAQIACQUCTpw8+AIbDAAKCRARe7yAtaYcfPzbD/9WNGVf60oXezNzSVCL hfS36l/zy4iy9H9rUZFmmmlBufWOATjiGAXnn0rr/Jh6Zy9NHuvpe3tyNYZLjB9pHT6mRZX7 Z1vDxeLgMjTv983TQ2hUSlhRSc6e6kGDJyG1WnGQaqymUllCmeC/p9q5m3IRxQrd0skfdN1V AMttRwvipmnMduy5SdNayY2YbhWLQ2wS3XHJ39a7D7SQz+gUQfXgE3pf3FlwbwZhRtVR3z5u aKjxqjybS3Ojimx4NkWjidwOaUVZTqEecBV+QCzi2oDr9+XtEs0m5YGI4v+Y/kHocNBP0myd pF3OoXvcWdTb5atk+OKcc8t4TviKy1WCNujC+yBSq3OM8gbmk6NwCwqhHQzXCibMlVF9hq5a FiJb8p4QKSVyLhM8EM3HtiFqFJSV7F+h+2W0kDyzBGyE0D8z3T+L3MOj3JJJkfCwbEbTpk4f n8zMboekuNruDw1OADRMPlhoWb+g6exBWx/YN4AY9LbE2KuaScONqph5/HvJDsUldcRN3a5V RGIN40QWFVlZvkKIEkzlzqpAyGaRLhXJPv/6tpoQaCQQoSAc5Z9kM/wEd9e2zMeojcWjUXgg oWj8A/wY4UXExGBu+UCzzP/6sQRpBiPFgmqPTytrDo/gsUGqjOudLiHQcMU+uunULYQxVghC syiRa+UVlsKmx1hsEg== Date: Mon, 22 Jul 2019 13:20:18 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <20190722105630.GC3035@work-vm> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 x-cbid: 19072211-0016-0000-0000-000002951B6D X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19072211-0017-0000-0000-000032F3051B Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-07-22_09:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1907220134 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 22.07.19 12:56, Dr. David Alan Gilbert wrote: > * Christian Borntraeger (borntraeger@de.ibm.com) wrote: >> >> >> On 18.07.19 16:30, Dan Williams wrote: >>> On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal wrote: >>>> >>>> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote: >>>>> On Wed, 15 May 2019 15:27:03 -0400 >>>>> Vivek Goyal wrote: >>>>> >>>>>> From: Stefan Hajnoczi >>>>>> >>>>>> Setup a dax device. >>>>>> >>>>>> Use the shm capability to find the cache entry and map it. >>>>>> >>>>>> The DAX window is accessed by the fs/dax.c infrastructure and must have >>>>>> struct pages (at least on x86). Use devm_memremap_pages() to map the >>>>>> DAX window PCI BAR and allocate struct page. >>>>>> >>>>> >>>>> Sorry for being this late. I don't see any more recent version so I will >>>>> comment here. >>>>> >>>>> I'm trying to figure out how is this supposed to work on s390. My concern >>>>> is, that on s390 PCI memory needs to be accessed by special >>>>> instructions. This is taken care of by the stuff defined in >>>>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses >>>>> the appropriate s390 instruction. However if the code does not use the >>>>> linux abstractions for accessing PCI memory, but assumes it can be >>>>> accessed like RAM, we have a problem. >>>>> >>>>> Looking at this patch, it seems to me, that we might end up with exactly >>>>> the case described. For example AFAICT copy_to_iter() (3) resolves to >>>>> the function in lib/iov_iter.c which does not seem to cater for s390 >>>>> oddities. >>>>> >>>>> I didn't have the time to investigate this properly, and since virtio-fs >>>>> is virtual, we may be able to get around what is otherwise a >>>>> limitation on s390. My understanding of these areas is admittedly >>>>> shallow, and since I'm not sure I'll have much more time to >>>>> invest in the near future I decided to raise concern. >>>>> >>>>> Any opinions? >>>> >>>> Hi Halil, >>>> >>>> I don't understand s390 and how PCI works there as well. Is there any >>>> other transport we can use there to map IO memory directly and access >>>> using DAX? >>>> >>>> BTW, is DAX supported for s390. >>>> >>>> I am also hoping somebody who knows better can chip in. Till that time, >>>> we could still use virtio-fs on s390 without DAX. >>> >>> s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED. >>> In practice that means that support for PTE_DEVMAP is missing which >>> means no get_user_pages() support for dax mappings. Effectively it's >>> only useful for execute-in-place as operations like fork() and ptrace >>> of dax mappings will fail. >> >> >> This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c >> and arch/s390/mm/extmem.c). >> >> For what its worth, the dcssblk looks to Linux like normal memory (just above the >> previously detected memory) that can be used like normal memory. In previous time >> we even had struct pages for this memory - this was removed long ago (when it was >> still xip) to reduce the memory footprint for large dcss blocks and small memory >> guests. >> Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory? >> >> Now some observations: >> - dcssblk is z/VM only (not KVM) >> - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on wether >> a device driver is compiled in or not seems not flexible enough in case if you >> have device driver that does have struct pages and another one that doesn't >> - I do not see a reason why we should not be able to map anything from QEMU >> into the guest real memory via an additional KVM memory slot. >> We would need to handle that in the guest somehow (and not as a PCI bar), >> register this with struct pages etc. >> - we must then look how we can create the link between the guest memory and the >> virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or >> whatever. Maybe we could also piggy-back on some memory hotplug work from David >> Hildenbrand (add cc). >> >> Regarding limitations on the platform: >> - while we do have PCI, the virtio devices are usually plugged via the ccw bus. >> That implies no PCI bars. I assume you use those PCI bars only to implicitely >> have the location of the shared memory >> Correct? > > Right. So in essence we just have to provide a vm_get_shm_region callback in the virtio-ccw guest code? How many regions do we have to support? One region per device? Or many? Even if we need more, this should be possible with a 2 new CCWs, e.g READ_SHM_BASE(id) and READ_SHM_SIZE(id) > >> - no real memory mapped I/O. Instead there are instructions that work on the mmio. >> As I understand things, this is of no concern regarding virtio-fs as you do not >> need mmio in the sense that a memory access of the guest to such an address >> triggers an exit. You just need the shared memory as a mean to have the data >> inside the guest. Any notification is done via normal virtqueue mechanisms >> Correct? > > Yep.