linux-mm.kvack.org archive mirror
From: Jesse Gross <jesse@nicira.com>
To: Jarno Rajahalme <jrajahalme@nicira.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	"dev@openvswitch.org" <dev@openvswitch.org>,
	Pravin Shelar <pshelar@nicira.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: [ovs-dev] [PATCH] ovs: do not allocate memory from offline numa node
Date: Fri, 9 Oct 2015 15:11:19 -0700
Message-ID: <CAEP_g=8TTh7pQL_DadBPdhfat+gd_XizGJqWK2wvHvo7oy6WaQ@mail.gmail.com>
In-Reply-To: <ECF39603-F56D-483A-A398-480C28C93F97@nicira.com>

On Fri, Oct 9, 2015 at 8:54 AM, Jarno Rajahalme <jrajahalme@nicira.com> wrote:
>
> On Oct 8, 2015, at 4:03 PM, Jesse Gross <jesse@nicira.com> wrote:
>
> On Wed, Oct 7, 2015 at 10:47 AM, Jarno Rajahalme <jrajahalme@nicira.com> wrote:
>
>
> On Oct 6, 2015, at 6:01 PM, Jesse Gross <jesse@nicira.com> wrote:
>
> On Mon, Oct 5, 2015 at 1:25 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
> On 10/05/2015 06:59 AM, Vlastimil Babka wrote:
>
>
> On 10/02/2015 12:18 PM, Konstantin Khlebnikov wrote:
>
>
> When openvswitch tries to allocate memory from offline NUMA node 0:
> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
> it hits VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> in linux/gfp.h [recently replaced with VM_WARN_ON(!node_online(nid))].
> This patch disables NUMA affinity in this case.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
>
>
> ...
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index f2ea83ba4763..c7f74aab34b9 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>
>     /* Initialize the default stat node. */
>     stats = kmem_cache_alloc_node(flow_stats_cache,
> -                      GFP_KERNEL | __GFP_ZERO, 0);
> +                      GFP_KERNEL | __GFP_ZERO,
> +                      node_online(0) ? 0 : NUMA_NO_NODE);
>
>
>
> Stupid question: can node 0 become offline between this check and the
> VM_WARN_ON? :) BTW, what kind of system has node 0 offline?
>
>
>
> Another question to ask: is it possible for node 0 to be online but
> memoryless?
>
> I would say you are better off just making this a call to kmem_cache_alloc().
> I don't see anything that indicates the memory has to come from node 0, so
> adding the extra overhead doesn't provide any value.
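A minimal user-space sketch of the node-selection choices discussed above, assuming a hypothetical per-node state table in place of the kernel's node_online() and memory-presence checks. It contrasts the patch's online-only test with a variant that also rejects memoryless nodes. SKETCH_NUMA_NO_NODE stands in for NUMA_NO_NODE ("no node preference"), which is roughly what a plain kmem_cache_alloc() call amounts to. All names are illustrative, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel constants; values are illustrative. */
#define SKETCH_MAX_NUMNODES 4
#define SKETCH_NUMA_NO_NODE (-1)

/* Hypothetical snapshot of node state (not the kernel's data structures). */
struct sketch_node_state {
    bool online;
    bool has_memory;
};

/* Zero-initialized: every node starts offline and without memory. */
static struct sketch_node_state sketch_nodes[SKETCH_MAX_NUMNODES];

/* As in the patch: use the preferred node only if it is online,
 * otherwise fall back to "no preference". */
static int sketch_pick_node_online(int preferred)
{
    assert(preferred >= 0 && preferred < SKETCH_MAX_NUMNODES);
    return sketch_nodes[preferred].online ? preferred : SKETCH_NUMA_NO_NODE;
}

/* Variant covering the memoryless case: the preferred node must be
 * online and also have memory. */
static int sketch_pick_node_with_memory(int preferred)
{
    assert(preferred >= 0 && preferred < SKETCH_MAX_NUMNODES);
    const struct sketch_node_state *n = &sketch_nodes[preferred];
    return (n->online && n->has_memory) ? preferred : SKETCH_NUMA_NO_NODE;
}
```

With node 0 online but memoryless, the online-only test still returns 0 while the variant falls back to SKETCH_NUMA_NO_NODE. Like the patch, both functions check state before allocating, so neither by itself answers the question of node 0 changing state between the check and the allocation.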
>
>
> I agree that this at least makes me wonder, though I actually have
> concerns in the opposite direction - I see assumptions about this
> being on node 0 in net/openvswitch/flow.c.
>
> Jarno, since you originally wrote this code, can you take a look to see
> if everything still makes sense?
>
>
> We keep the pre-allocated stats node at array index 0, which is initially
> used by all CPUs, but if CPUs from multiple NUMA nodes start updating the
> stats, we allocate additional stats nodes (up to one per NUMA node), and the
> CPUs on node 0 keep using the preallocated entry. If stats cannot be
> allocated from the CPUs' local node, then those CPUs keep using the entry at
> index 0. Currently the code in net/openvswitch/flow.c will try to allocate
> the local memory repeatedly, which may not be optimal when there is no
> memory at the local node.
>
> Allocating the memory for index 0 from a node other than node 0, as discussed
> here, just means that the CPUs on node 0 will keep using non-local memory
> for stats. In a scenario where there are CPUs on two nodes (0, 1) but only
> node 1 has memory, a shared flow entry will still end up with separate
> stats memory allocated for both nodes, but both allocations would be on node 1.
> However, there is still a high likelihood that the two allocations would
> not share a cache line, which should prevent the nodes from invalidating
> each other’s caches. Based on this, I do not see a problem with relaxing the
> memory allocation for the default stats node. If node 0 has memory, however,
> it would be better to allocate the memory from node 0.
>
>
> Thanks for going through all of that.
>
> It seems like the question being raised is whether it actually
> makes sense to try to get the initial memory on node 0, especially
> since it seems to introduce some corner cases. Is there any reason why
> the flow is more likely to hit node 0 than a randomly chosen one?
> (Assuming that this is a multinode system, otherwise it's kind of a
> moot point.) We could have a separate pointer to the default allocated
> memory, so it wouldn't conflict with memory that was intentionally
> allocated for node 0.
>
>
> It would still be preferable to know from which node the default stats node
> was allocated, and store it in the appropriate pointer in the array. We
> could then add a new “default stats node index” that would be used to locate
> the node in the array of pointers we already have. That way we would avoid
> extra allocation and processing of the default stats node.

I agree, that sounds reasonable to me. Will you make that change?

Besides eliminating corner cases, it might help performance in some
cases too by avoiding stressing memory bandwidth on node 0.
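A user-space sketch of the layout proposed above: per-node stats pointers plus a "default stats node index" recording where the preallocated entry actually landed, so no extra allocation is needed for it. All names (struct sketch_flow, sketch_flow_init(), and so on) are hypothetical and do not correspond to the kernel's structures; locking, RCU, and the on-demand allocation of per-node entries are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative node count; the kernel would use its own node limits. */
#define SKETCH_MAX_NUMNODES 4

/* Hypothetical per-node stats entry. */
struct sketch_stats {
    unsigned long packets;
    unsigned long bytes;
};

/* Hypothetical flow: one stats pointer per node plus the index of the
 * slot that holds the preallocated default entry. */
struct sketch_flow {
    int default_stats_node;
    struct sketch_stats *stats[SKETCH_MAX_NUMNODES];
};

/* Store the default entry in the slot of the node it was allocated from,
 * instead of always at index 0. Returns 0 on success, -1 on failure. */
static int sketch_flow_init(struct sketch_flow *flow, int alloc_node)
{
    assert(alloc_node >= 0 && alloc_node < SKETCH_MAX_NUMNODES);
    for (int i = 0; i < SKETCH_MAX_NUMNODES; i++)
        flow->stats[i] = NULL;
    flow->stats[alloc_node] = calloc(1, sizeof(*flow->stats[alloc_node]));
    if (flow->stats[alloc_node] == NULL)
        return -1;
    flow->default_stats_node = alloc_node;
    return 0;
}

/* Entry a CPU on cpu_node should update: its node's own entry if one
 * exists, otherwise the default entry found via the stored index. */
static struct sketch_stats *sketch_flow_stats_for(struct sketch_flow *flow,
                                                  int cpu_node)
{
    assert(cpu_node >= 0 && cpu_node < SKETCH_MAX_NUMNODES);
    struct sketch_stats *s = flow->stats[cpu_node];
    return s != NULL ? s : flow->stats[flow->default_stats_node];
}

/* Account one packet of len bytes seen by a CPU on cpu_node. */
static void sketch_flow_account(struct sketch_flow *flow, int cpu_node,
                                unsigned long len)
{
    struct sketch_stats *s = sketch_flow_stats_for(flow, cpu_node);
    s->packets++;
    s->bytes += len;
}

/* Release every per-node entry, including the default one. */
static void sketch_flow_free(struct sketch_flow *flow)
{
    for (int i = 0; i < SKETCH_MAX_NUMNODES; i++) {
        free(flow->stats[i]);
        flow->stats[i] = NULL;
    }
}
```

In the two-node case from the thread where only node 1 has memory, initializing with alloc_node = 1 lets CPUs on both nodes share the single entry in slot 1, instead of a second node-1 allocation being made on behalf of node 0.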

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


Thread overview: 13+ messages
2015-10-02 10:18 [PATCH] ovs: do not allocate memory from offline numa node Konstantin Khlebnikov
2015-10-02 22:38 ` Pravin Shelar
2015-10-05 13:44 ` David Miller
2015-10-05 13:59 ` Vlastimil Babka
2015-10-05 20:25   ` Alexander Duyck
2015-10-07  1:01     ` [ovs-dev] " Jesse Gross
2015-10-07 17:47       ` Jarno Rajahalme
2015-10-08 23:03         ` Jesse Gross
2015-10-09 15:54           ` Jarno Rajahalme
2015-10-09 22:11             ` Jesse Gross [this message]
2015-10-10  0:02               ` Jarno Rajahalme
2015-10-20 17:58                 ` Jarno Rajahalme
2015-10-21  8:55                   ` Vlastimil Babka
