To: Nadav Amit
Cc: Andrew Morton, Peter Xu, LKML, Linux-MM, Andrea Arcangeli,
 Mike Rapoport, Jan Kara, stable@vger.kernel.org
References: <20211007235055.469587-1-namit@vmware.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH] mm/userfaultfd: provide unmasked address on page-fault
Date: Sat, 9 Oct 2021 09:59:35 +0200

On 09.10.21 00:02, Nadav Amit wrote:
>
>
>> On Oct 8, 2021, at 1:05 AM, David Hildenbrand wrote:
>>
>> On 08.10.21 01:50, Nadav Amit wrote:
>>> From: Nadav Amit
>>> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
>>> the faulting access back to userspace. However, that is not the case for
>>> quite some time.
>>> Even running "userfaultfd_demo" from the userfaultfd man page provides
>>> the wrong output (and contradicts the man page).
>>> Notice that "UFFD_EVENT_PAGEFAULT event" shows the masked address.
>>>
>>>     Address returned by mmap() = 0x7fc5e30b3000
>>>     fault_handler_thread():
>>>         poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
>>>         UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
>>>             (uffdio_copy.copy returned 4096)
>>>     Read address 0x7fc5e30b300f in main(): A
>>>     Read address 0x7fc5e30b340f in main(): A
>>>     Read address 0x7fc5e30b380f in main(): A
>>>     Read address 0x7fc5e30b3c0f in main(): A
>>>
>>> Add a new "real_address" field to vmf to hold the unmasked address. It
>>> is possible to keep the unmasked address in the existing address field
>>> (and mask whenever necessary) instead, but this is likely to cause
>>> backporting problems of this patch.
>>
>> Can we be sure that no existing users will rely on this behavior that has been the case since end of 2016 IIRC, one year after UFFD was upstreamed?
>
> Let me to blow off your mind: how do you be sure that the current behavior does not make applications to misbehave? It might cause performance issues as it did for me or hidden correctness issues.
>

Fair point, but now we can speculate what's more likely: Having an app
rely on >4 year old kernel behavior just after the feature was released
or having an app rely on kernel behavior that was the case for the last
4 years?

Someone once told me about the unwritten way to remove things from the
kernel. 1) Silently break it upstream 2) Wait 2 kernel releases 3)
Propose removal of the feature because it's broken and nobody complained.
<\offtopic>

You might ask "why does David even care?", here is why:

For the record, I *do* have a prototype from last year that breaks with
this new behavior as far as I can tell: using uffd in the context of
virtio-balloon in QEMU.
I just pushed the latest state to a !private github tree:

https://github.com/davidhildenbrand/qemu/tree/virtio-balloon-uffd

In that code, I made sure that I'm only dealing with 4k pages (because
that's the only thing virtio-balloon really can deal with), and during
the debugging I figured that the kernel always returns 4k aligned page
fault addresses, so I didn't care about masking. I'll reuse the
unmodified fault address for UFFDIO_ZEROPAGE()/UFFDIO_COPY()/... which
should then fail because:

"
EINVAL The start or the len field of the ufdio_range structure
       was not a multiple of the system page size; or len was
       zero; or the specified range was otherwise invalid.
"

If I'm too lazy to read all documentation, I'm quite sure that there are
other people who don't either. I don't care too much if this patch breaks
that prototype, it's just a prototype after all, but I am concerned that
we might break other users in a similar way.

>> I do wonder what the official ABI nowadays is, because man pages aren't necessarily the source of truth.
>
> Documentation/admin-guide/mm/userfaultfd.rst says: "You get the address of the access that triggered the missing page
> event”.
>
> So it is a bug.

The least I would expect in the patch description is a better
motivation ("who cares and why" -- I know you have a better motivation
than making the doc correct :) ) and a discussion on the chances of this
actually breaking other apps (see my example).

I'd sleep better if we'd glue the changed behavior to a new feature
flag, but that's just my 2 cents.

-- 
Thanks,

David / dhildenb
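
To make the alignment concern above concrete, here is a minimal sketch in C.
It is not taken from the prototype or from the patch. It assumes a
power-of-two page size; uffd_page_start() and uffd_range_is_valid() are
hypothetical helper names, and the second one only mirrors the EINVAL
condition quoted from the man page rather than calling into the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round a reported fault address down to the start of its page.
 * page_size is assumed to be a power of two, e.g. the value
 * returned by sysconf(_SC_PAGESIZE).
 */
uint64_t uffd_page_start(uint64_t fault_addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return fault_addr & ~(page_size - 1);
}

/*
 * Hypothetical userspace mirror of the documented EINVAL rule:
 * start and len must both be multiples of the page size, and
 * len must not be zero. This is not a kernel API.
 */
bool uffd_range_is_valid(uint64_t start, uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return len != 0 &&
           (start & (page_size - 1)) == 0 &&
           (len & (page_size - 1)) == 0;
}
```

A handler that used to rely on page-aligned fault addresses could pass
uffd_page_start(msg.arg.pagefault.address, page_size) as uffdio_copy.dst
(or as uffdio_zeropage.range.start) and would then behave the same
whether or not the kernel masks the reported address.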