From: "Derrick, Jonathan" <jonathan.derrick@intel.com>
To: "paulmck@kernel.org" <paulmck@kernel.org>
Cc: "lorenzo.pieralisi@arm.com" <lorenzo.pieralisi@arm.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"helgaas@kernel.org" <helgaas@kernel.org>
Subject: Re: [PATCH v2] PCI: vmd: Add indirection layer to vmd irq lists
Date: Wed, 6 Nov 2019 16:25:25 +0000
Message-ID: <14aa0466567ebf9bff1301c81214a449c581c998.camel@intel.com>
In-Reply-To: <20191031231126.GG20975@paulmck-ThinkPad-P72>
On Thu, 2019-10-31 at 16:11 -0700, Paul E. McKenney wrote:
> On Thu, Oct 31, 2019 at 07:08:53AM -0600, Jon Derrick wrote:
> > With CONFIG_MAXSMP and CONFIG_PROVE_LOCKING, the size of an srcu_struct can
> > grow quite large. In one compilation instance it produced a 74KiB data
> > structure. One srcu_struct is embedded in each vmd_irq_list struct, so an
> > N=64 allocation can exceed MAX_ORDER, violating reclaim rules.
> >
> > struct srcu_struct {
> > 	struct srcu_node        node[521];                   /*     0 75024 */
> > 	/* --- cacheline 1172 boundary (75008 bytes) was 16 bytes ago --- */
> > 	struct srcu_node *      level[4];                    /* 75024    32 */
> > 	struct mutex            srcu_cb_mutex;               /* 75056   128 */
> > 	/* --- cacheline 1174 boundary (75136 bytes) was 48 bytes ago --- */
> > 	spinlock_t              lock;                        /* 75184    56 */
> > 	/* --- cacheline 1175 boundary (75200 bytes) was 40 bytes ago --- */
> > 	struct mutex            srcu_gp_mutex;               /* 75240   128 */
> > 	/* --- cacheline 1177 boundary (75328 bytes) was 40 bytes ago --- */
> > 	unsigned int            srcu_idx;                    /* 75368     4 */
> >
> > 	/* XXX 4 bytes hole, try to pack */
> >
> > 	long unsigned int       srcu_gp_seq;                 /* 75376     8 */
> > 	long unsigned int       srcu_gp_seq_needed;          /* 75384     8 */
> > 	/* --- cacheline 1178 boundary (75392 bytes) --- */
> > 	long unsigned int       srcu_gp_seq_needed_exp;      /* 75392     8 */
> > 	long unsigned int       srcu_last_gp_end;            /* 75400     8 */
> > 	struct srcu_data *      sda;                         /* 75408     8 */
> > 	long unsigned int       srcu_barrier_seq;            /* 75416     8 */
> > 	struct mutex            srcu_barrier_mutex;          /* 75424   128 */
> > 	/* --- cacheline 1180 boundary (75520 bytes) was 32 bytes ago --- */
> > 	struct completion       srcu_barrier_completion;     /* 75552    80 */
> > 	/* --- cacheline 1181 boundary (75584 bytes) was 48 bytes ago --- */
> > 	atomic_t                srcu_barrier_cpu_cnt;        /* 75632     4 */
> >
> > 	/* XXX 4 bytes hole, try to pack */
> >
> > 	struct delayed_work     work;                        /* 75640   152 */
> >
> > 	/* XXX last struct has 4 bytes of padding */
> >
> > 	/* --- cacheline 1184 boundary (75776 bytes) was 16 bytes ago --- */
> > 	struct lockdep_map      dep_map;                     /* 75792    32 */
> >
> > 	/* size: 75824, cachelines: 1185, members: 17 */
> > 	/* sum members: 75816, holes: 2, sum holes: 8 */
> > 	/* paddings: 1, sum paddings: 4 */
> > 	/* last cacheline: 48 bytes */
> > };
> >
> > With N=64 VMD IRQ lists, this would allocate about 4.6 MiB in a single call,
> > which violates MAX_ORDER reclaim rules when PAGE_SIZE=4096 and
> > MAX_ORDER_NR_PAGES=1024 and triggers the following warning in mm/page_alloc.c:
> >
> > 	/*
> > 	 * There are several places where we assume that the order value is sane
> > 	 * so bail out early if the request is out of bound.
> > 	 */
> > 	if (unlikely(order >= MAX_ORDER)) {
> > 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> > 		return NULL;
> > 	}
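
For what it's worth, the arithmetic behind the 4.6 MiB figure, using the
sizes quoted above (my rough math, so take the exact order value with a
grain of salt):

	64 lists * 75824 bytes/list = 4852736 bytes  (~4.6 MiB)
	MAX_ORDER_NR_PAGES * PAGE_SIZE = 1024 * 4096 = 4194304 bytes  (4 MiB)

The request is already past the 4 MiB limit, and for an allocation this
large get_order() rounds it up to a power-of-two number of pages (2048
pages, order 11), which is >= MAX_ORDER and trips the warning.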
> >
> > This patch changes the irq list array into an array of pointers to irq
> > lists to avoid allocation failures with larger MSI-X counts.
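
Roughly, the change is from one large allocation covering every list to a
small array of pointers, with each list allocated on its own. A hand-waved
sketch of the idea (not the literal patch; the structures and
vmd_alloc_irq_lists() below are illustrative stand-ins):

#include <linux/slab.h>
#include <linux/srcu.h>

/* minimal stand-ins for the real driver structures */
struct vmd_irq_list {
	struct srcu_struct srcu;	/* ~74 KiB with MAXSMP + PROVE_LOCKING */
};

struct vmd_dev {
	struct vmd_irq_list **irqs;	/* was: struct vmd_irq_list *irqs */
};

static int vmd_alloc_irq_lists(struct vmd_dev *vmd, unsigned int count)
{
	unsigned int i;

	/* small array of pointers, trivially under MAX_ORDER */
	vmd->irqs = kcalloc(count, sizeof(*vmd->irqs), GFP_KERNEL);
	if (!vmd->irqs)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		/*
		 * Each ~74 KiB list is now its own allocation (order 5-ish)
		 * instead of one ~4.6 MiB block for all 64 lists.
		 */
		vmd->irqs[i] = kzalloc(sizeof(*vmd->irqs[i]), GFP_KERNEL);
		if (!vmd->irqs[i])
			return -ENOMEM;	/* real code would unwind the partial allocation */
	}
	return 0;
}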
> >
> > This patch also reverts commit b31822277abcd7c83d1c1c0af876da9ccdf3b7d6.
> > The index_from_irqs() helper was added to calculate the irq list index
> > from the array of irqs, in order to shrink vmd_irq_list for performance.
> >
> > Because the srcu_struct embedded in the vmd_irq_list struct varies in size
> > with the kernel configuration, the vmd_irq_list struct can no longer
> > guarantee an optimal data structure size and granularity, so that
> > optimization no longer applies.
> >
> > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > ---
> > Added Paul to make him aware of srcu_struct size with these options
>
> There was some discussion of making the srcu_struct structure's ->node[]
> array be separately allocated, which would allow this array to be
> right-sized for the system in question. However, I believe they ended up
> instead separately allocating the srcu_struct structure itself.
>
> Without doing something like that, I am kind of stuck. After all,
> at compile time, the kernel build system tells SRCU that it needs to
> be prepared to run on systems with thousands of CPUs, which requires
> substantial memory to keep track of all those CPUs, even though they
> are not present on most systems.
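
For completeness, my understanding of what "separately allocating the
srcu_struct structure itself" looks like from the driver side is roughly
the below (foo/foo_init()/foo_exit() are made-up names). It only moves the
~74 KiB out of the containing structure's allocation; it doesn't shrink
the node[] array itself:

#include <linux/slab.h>
#include <linux/srcu.h>

struct foo {
	struct srcu_struct *srcu;	/* pointer instead of an embedded srcu_struct */
};

static int foo_init(struct foo *f)
{
	/* still ~74 KiB, but now it is its own allocation */
	f->srcu = kzalloc(sizeof(*f->srcu), GFP_KERNEL);
	if (!f->srcu)
		return -ENOMEM;
	return init_srcu_struct(f->srcu);	/* runtime counterpart of DEFINE_SRCU() */
}

static void foo_exit(struct foo *f)
{
	cleanup_srcu_struct(f->srcu);
	kfree(f->srcu);
}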
>
> Thoughts?
>
> Thanx, Paul
>
Yes, I haven't seen an elegant solution other than making users aware of
the situation.

Thanks for your input.
Thread overview: 9+ messages
2019-10-31 13:08 [PATCH v2] PCI: vmd: Add indirection layer to vmd irq lists Jon Derrick
2019-10-31 23:11 ` Paul E. McKenney
2019-11-06 16:25 ` Derrick, Jonathan [this message]
2020-02-28 11:10 ` Lorenzo Pieralisi
2020-02-28 14:34 ` Paul E. McKenney
2020-02-28 16:36 ` Derrick, Jonathan
2022-01-22 0:03 ` Scott Wood
2022-01-22 0:22 ` Paul E. McKenney
2022-01-22 0:19 ` Scott Wood