All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Eric Dumazet <edumazet@google.com>,
	iommu <iommu@lists.linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH] dma-debug: fix debug_dma_assert_idle(), use rcu_read_lock()
Date: Fri, 14 Aug 2020 17:59:04 -0700	[thread overview]
Message-ID: <CAHk-=whw3QcceKCdYS2ktCPQ96m8Ysyek+w4ny0ygvy7z-_2rw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2008141642260.18762@eggly.anvils>

On Fri, Aug 14, 2020 at 5:26 PM Hugh Dickins <hughd@google.com> wrote:
>
> We used to rely on page count there, and on trylock_page() only; but
> there was at least one user whose app went wrong when occasionally we
> COWed the page, just because something else momentarily took a reference
> to it, or locked it.  Around 2006, bug report from 2004: I did look up
> the history a week ago, but was interrupted before taking notes.

I actually think you may be talking about the exact problem that that
debug patch from Dan was originally created for:

  0abdd7a81b7e dma-debug: introduce debug_dma_assert_idle()
  77873803363c net_dma: mark broken

and your memory sounds exactly like that net_dma case (and the timing
matches roughly too - the NET_DMA code was merged in 2006, but I think
people had been playing trial games with it before that).

IOW, net_dma was horribly broken, and just couldn't deal with COW
because it did things wrong.

The thing is, doing extra COW's really shouldn't matter in _any_
half-way correct situation. There's a few cases:

 - user space writing to it, so we COW.

   This is the "simple" case that is obvious and we've always done the
same thing. User space will get the new copy, and there's no possible
situation when that can be wrong.

 - get_user_pages() for reading.

   This is the one we actually used to get wrong, and when another
user *didn't* cow, the data that was read might not match what the
original get_uiser_pages() case expected.

    But in this case, the bug only happened when we didn't cow
aggressively enough.

 - get_user_pages() for writing

   This is another 'simple" case, because it does the COW at
get_user_pages() time and gets it's own copy (which is also installed
in the thread that does the GUP, of course, so a subsequent fork an
danother write can obviously cause *further* COW action).

But in no case should an extra COW matter. Except if somebody uses
get_user_pages() to write to the page, and the COW "hides" that write
by giving a new copy to whoever expected to see it, but that's exactly
the case that Dan's patch was supposed to notice.

And since it never triggered outside of that invalid net_dma case, I
don't think any other case really ever existed.

Yes, I can well imagine that some people loved the concept of that TCP
receive copy offload, but it really was broken, and was removed
entirely by Dan in commit 7bced397510a ("net_dma: simple removal") a
year after being marked broken (the author date makes it look like
it's just a couple of weeks after being marked broken, but the commit
date for that removal is September 2014).

So I don't think that the trylock and checking page counts is a
correctness issue.

It had better not be, because anybody that writes to a shared-cow page
 without breaking COW is simply broken.

No, I really think that the real worry about doing more aggressive
copying is that it doesn't steal back the KSM page or the swap cache
page, so it will leave those pages around, and while they should then
be really easy for the VM to reclaim, I really worry that we have a
couple of decades of VM reclaim tuning with that swap cache reuse
behavior (KSM, not so much).

And while it works fine on my machine, I currently have 40GB or RAM
free, because honestly, the stuff I do doesn't need all that much
memory, and I ridiculously overspecced my new machine RAM'wise. So
nothing I will do would show any problems.

                Linus

WARNING: multiple messages have this Message-ID (diff)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Christoph Hellwig <hch@lst.de>, Linux-MM <linux-mm@kvack.org>,
	iommu <iommu@lists.linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH] dma-debug: fix debug_dma_assert_idle(), use rcu_read_lock()
Date: Fri, 14 Aug 2020 17:59:04 -0700	[thread overview]
Message-ID: <CAHk-=whw3QcceKCdYS2ktCPQ96m8Ysyek+w4ny0ygvy7z-_2rw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2008141642260.18762@eggly.anvils>

On Fri, Aug 14, 2020 at 5:26 PM Hugh Dickins <hughd@google.com> wrote:
>
> We used to rely on page count there, and on trylock_page() only; but
> there was at least one user whose app went wrong when occasionally we
> COWed the page, just because something else momentarily took a reference
> to it, or locked it.  Around 2006, bug report from 2004: I did look up
> the history a week ago, but was interrupted before taking notes.

I actually think you may be talking about the exact problem that that
debug patch from Dan was originally created for:

  0abdd7a81b7e dma-debug: introduce debug_dma_assert_idle()
  77873803363c net_dma: mark broken

and your memory sounds exactly like that net_dma case (and the timing
matches roughly too - the NET_DMA code was merged in 2006, but I think
people had been playing trial games with it before that).

IOW, net_dma was horribly broken, and just couldn't deal with COW
because it did things wrong.

The thing is, doing extra COW's really shouldn't matter in _any_
half-way correct situation. There's a few cases:

 - user space writing to it, so we COW.

   This is the "simple" case that is obvious and we've always done the
same thing. User space will get the new copy, and there's no possible
situation when that can be wrong.

 - get_user_pages() for reading.

   This is the one we actually used to get wrong, and when another
user *didn't* cow, the data that was read might not match what the
original get_uiser_pages() case expected.

    But in this case, the bug only happened when we didn't cow
aggressively enough.

 - get_user_pages() for writing

   This is another 'simple" case, because it does the COW at
get_user_pages() time and gets it's own copy (which is also installed
in the thread that does the GUP, of course, so a subsequent fork an
danother write can obviously cause *further* COW action).

But in no case should an extra COW matter. Except if somebody uses
get_user_pages() to write to the page, and the COW "hides" that write
by giving a new copy to whoever expected to see it, but that's exactly
the case that Dan's patch was supposed to notice.

And since it never triggered outside of that invalid net_dma case, I
don't think any other case really ever existed.

Yes, I can well imagine that some people loved the concept of that TCP
receive copy offload, but it really was broken, and was removed
entirely by Dan in commit 7bced397510a ("net_dma: simple removal") a
year after being marked broken (the author date makes it look like
it's just a couple of weeks after being marked broken, but the commit
date for that removal is September 2014).

So I don't think that the trylock and checking page counts is a
correctness issue.

It had better not be, because anybody that writes to a shared-cow page
 without breaking COW is simply broken.

No, I really think that the real worry about doing more aggressive
copying is that it doesn't steal back the KSM page or the swap cache
page, so it will leave those pages around, and while they should then
be really easy for the VM to reclaim, I really worry that we have a
couple of decades of VM reclaim tuning with that swap cache reuse
behavior (KSM, not so much).

And while it works fine on my machine, I currently have 40GB or RAM
free, because honestly, the stuff I do doesn't need all that much
memory, and I ridiculously overspecced my new machine RAM'wise. So
nothing I will do would show any problems.

                Linus
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2020-08-15 22:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-13  3:17 [PATCH] dma-debug: fix debug_dma_assert_idle(), use rcu_read_lock() Hugh Dickins
2020-08-13  3:17 ` Hugh Dickins via iommu
2020-08-13  3:17 ` Hugh Dickins
2020-08-13 19:02 ` Linus Torvalds
2020-08-13 19:02   ` Linus Torvalds
2020-08-13 19:02   ` Linus Torvalds
2020-08-13 23:37   ` Dan Williams
2020-08-13 23:37     ` Dan Williams
2020-08-13 23:37     ` Dan Williams
2020-08-14  5:42   ` Christoph Hellwig
2020-08-14  5:42     ` Christoph Hellwig
2020-08-14 22:40     ` Linus Torvalds
2020-08-14 22:40       ` Linus Torvalds
2020-08-14 22:40       ` Linus Torvalds
2020-08-15  0:26       ` Hugh Dickins
2020-08-15  0:26         ` Hugh Dickins via iommu
2020-08-15  0:26         ` Hugh Dickins
2020-08-15  0:59         ` Linus Torvalds [this message]
2020-08-15  0:59           ` Linus Torvalds
2020-08-15  0:59           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whw3QcceKCdYS2ktCPQ96m8Ysyek+w4ny0ygvy7z-_2rw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=edumazet@google.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.