From: Eric Dumazet <eric.dumazet@gmail.com>
To: Muchun Song <songmuchun@bytedance.com>,
	Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	rafael@kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Shakeel Butt <shakeelb@google.com>, Will Deacon <will@kernel.org>,
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	Neil Brown <neilb@suse.de>,
	rppt@kernel.org, Sami Tolvanen <samitolvanen@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Feng Tang <feng.tang@intel.com>, Paolo Abeni <pabeni@redhat.com>,
	Willem de Bruijn <willemb@google.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Florian Westphal <fw@strlen.de>,
	gustavoars@kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
	Dexuan Cui <decui@microsoft.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	dave@stgolabs.net, Michel Lespinasse <walken@google.com>,
	Jann Horn <jannh@google.com>,
	chenqiwu@xiaomi.com, christophe.leroy@c-s.fr,
	Minchan Kim <minchan@kernel.org>, Martin KaFai Lau <kafai@fb.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Kees Cook <keescook@chromium.org>,
	LKML <linux-kernel@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [External] Re: [PATCH] mm: proc: add Sock to /proc/meminfo
Date: Mon, 12 Oct 2020 11:24:05 +0200	[thread overview]
Message-ID: <9262ea44-fc3a-0b30-54dd-526e16df85d1@gmail.com> (raw)
In-Reply-To: <CAMZfGtXVKER_GM-wwqxrUshDzcEg9FkS3x_BaMTVyeqdYPGSkw@mail.gmail.com>



On 10/12/20 10:39 AM, Muchun Song wrote:
> On Mon, Oct 12, 2020 at 3:42 PM Eric Dumazet <edumazet@google.com> wrote:
>>
>> On Mon, Oct 12, 2020 at 6:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>>
>>> On Mon, Oct 12, 2020 at 2:39 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>
>>>> On Sat, Oct 10, 2020 at 3:39 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>>>>
>>>>> The amount of memory allocated to socket buffers can become significant.
>>>>> However, we do not display the amount of memory consumed by socket
>>>>> buffers. In this case, knowing where the memory is consumed by the kernel
>>>>
>>>> We do it via `ss -m`. Is that not sufficient? And if not, why not add it
>>>> there rather than to /proc/meminfo?
>>>
>>> If the system has little free memory, we can find out where the memory
>>> went via /proc/meminfo. But if a lot of memory is consumed by socket
>>> buffers, we cannot see that while Sock is not shown in /proc/meminfo. A
>>> user who is unaware of socket buffers naturally will not run `ss -m`, so
>>> the end result is that we still don't know where the memory went. Adding
>>> Sock to /proc/meminfo also matches what memcg does (the 'sock' item in
>>> cgroup v2 memory.stat). So I think that adding it to /proc/meminfo is
>>> sufficient.
>>>
>>>>
>>>>>  static inline void __skb_frag_unref(skb_frag_t *frag)
>>>>>  {
>>>>> -       put_page(skb_frag_page(frag));
>>>>> +       struct page *page = skb_frag_page(frag);
>>>>> +
>>>>> +       if (put_page_testzero(page)) {
>>>>> +               dec_sock_node_page_state(page);
>>>>> +               __put_page(page);
>>>>> +       }
>>>>>  }
>>>>
>>>> You are mixing socket page frags with skb frags here, at least. I am not
>>>> sure this is exactly what you want, because skb page frags are clearly
>>>> used by network drivers far more often than by sockets.
>>>>
>>>> Also, which increment does this dec_sock_node_page_state() pair with?
>>>> Clearly not skb_fill_page_desc() or __skb_frag_ref().
>>>
>>> Yeah, we call inc_sock_node_page_state() in skb_page_frag_refill(). So
>>> whoever gets a page returned by skb_page_frag_refill() must put it via
>>> __skb_frag_unref()/skb_frag_unref(). We use PG_private to indicate that
>>> we need to decrement the node page state when the refcount of the page
>>> reaches zero.
>>>
>>
>> Pages can be transferred from pipe to socket, and socket to pipe (splice()
>> and zerocopy friends...)
>>
>> If you want to track TCP memory allocations, you can always look at
>> /proc/net/sockstat, without adding yet another expensive layer of memory
>> accounting.
> 
> The 'mem' item in /proc/net/sockstat does not represent real memory
> usage. It is just the total amount of charged memory.
> 
> For example, if a task sends a 10-byte message, only one page is charged
> to the memcg, but the system may actually allocate 8 pages. So 'mem' does
> not truly reflect the memory allocated on that allocation path. We can
> see the difference in the following output.
> 
> cat /proc/net/sockstat
>   sockets: used 698
>   TCP: inuse 70 orphan 0 tw 617 alloc 134 mem 13
>   UDP: inuse 90 mem 4
>   UDPLITE: inuse 0
>   RAW: inuse 1
>   FRAG: inuse 0 memory 0
> 
> cat /proc/meminfo | grep Sock
>   Sock:              13664 kB
> 
> /proc/net/sockstat only shows us 17 pages charged (13 TCP + 4 UDP), i.e.
> 17 * 4 kB = 68 kB. But with this patch applied, we can see that we truly
> allocated 13664 kB (possibly even more than this value, because of the
> per-cpu stat cache). Of course, the load in this example is not high. In
> some high-load cases, I believe the difference will be even greater.
> 

This is great, but you have not addressed my feedback.

TCP memory allocations are bounded by /proc/sys/net/ipv4/tcp_mem.

The fact that the memory is forward allocated or not is a detail.

If you think we must pre-allocate memory instead of using forward
allocations, your patch does not address that either. Adding one line per
consumer to /proc/meminfo looks wrong to me.

If you do not want up to 9.37 % of physical memory to be usable by TCP,
just change /proc/sys/net/ipv4/tcp_mem accordingly?
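
The 9.37 % figure above is not arbitrary: the kernel sizes the default
tcp_mem triplet from available memory at boot. The following sketch
paraphrases tcp_init_mem() in net/ipv4/tcp.c (the exact code varies by
kernel version, so treat this as illustrative rather than verbatim):

	/* Default /proc/sys/net/ipv4/tcp_mem values, in pages, derived
	 * from free memory at boot time.
	 */
	static void __init tcp_init_mem(void)
	{
		unsigned long limit = nr_free_buffer_pages() / 16;

		limit = max(limit, 128UL);
		sysctl_tcp_mem[0] = limit / 4 * 3;         /* low:      ~4.68 % */
		sysctl_tcp_mem[1] = limit;                 /* pressure: ~6.25 % */
		sysctl_tcp_mem[2] = sysctl_tcp_mem[0] * 2; /* high:     ~9.37 % */
	}

Since the limits are in pages, the 'mem 13' shown by sockstat above is
measured against this triplet, and lowering tcp_mem[2] caps the worst
case Eric refers to.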
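
Similarly, for the accounting pairing Muchun describes earlier in the
thread: the patch bumps a per-node counter when skb_page_frag_refill()
hands out a page and drops it when the last reference is put, using
PG_private as the "this page is accounted" marker. Below is a minimal
sketch of that pairing, assuming the patch's inc_sock_node_page_state()
and dec_sock_node_page_state() helpers; the wrapper names and the
explicit PG_private test are inferred from the discussion, not copied
from the patch:

	/* Allocation side, conceptually in skb_page_frag_refill():
	 * account the page once, when it enters the socket buffer path,
	 * and mark it so the release side knows it was accounted.
	 */
	static inline void sock_account_frag_page(struct page *page)
	{
		SetPagePrivate(page);           /* mark: counted in Sock */
		inc_sock_node_page_state(page); /* per-node Sock counter++ */
	}

	/* Release side, matching the __skb_frag_unref() hunk quoted above:
	 * only the caller that drops the final reference unaccounts and
	 * frees the page.
	 */
	static inline void sock_unref_frag_page(struct page *page)
	{
		if (put_page_testzero(page)) {  /* last reference gone? */
			if (PagePrivate(page)) {
				ClearPagePrivate(page);
				dec_sock_node_page_state(page);
			}
			__put_page(page);       /* release the page itself */
		}
	}

Eric's splice()/zerocopy objection applies precisely here: a page can
leave the socket path (into a pipe, for instance) while still carrying
the mark, so the counter stops reflecting actual socket usage.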





Thread overview: 50+ messages
2020-10-10 10:38 [PATCH] mm: proc: add Sock to /proc/meminfo Muchun Song
2020-10-10 13:06 ` kernel test robot
2020-10-10 13:27 ` kernel test robot
2020-10-10 16:36 ` Randy Dunlap
2020-10-11  4:42   ` [External] " Muchun Song
2020-10-11 13:52 ` Mike Rapoport
2020-10-11 16:00   ` [External] " Muchun Song
2020-10-11 18:39 ` Cong Wang
2020-10-12  4:22   ` [External] " Muchun Song
2020-10-12  7:42     ` Eric Dumazet
2020-10-12  8:39       ` Muchun Song
2020-10-12  9:24         ` Eric Dumazet [this message]
2020-10-12  9:53           ` Muchun Song
2020-10-12 22:12             ` Cong Wang
2020-10-13  3:52               ` Muchun Song
2020-10-13  6:55             ` Eric Dumazet
2020-10-13  8:09             ` Mike Rapoport
2020-10-13 14:43               ` Randy Dunlap
2020-10-13 15:12                 ` Mike Rapoport
2020-10-13 15:21                   ` Randy Dunlap
2020-10-14  5:34                     ` Mike Rapoport
2020-10-13 15:28               ` Muchun Song
2020-10-16 15:38               ` Vlastimil Babka
2020-10-16 20:53                 ` Minchan Kim
2020-10-19 17:23                   ` Shakeel Butt
2020-10-12 21:46     ` Cong Wang
2020-10-13  3:29       ` Muchun Song
