From: "Derrick, Jonathan" <jonathan.derrick@intel.com>
To: "lorenzo.pieralisi@arm.com" <lorenzo.pieralisi@arm.com>,
	"paulmck@kernel.org" <paulmck@kernel.org>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"helgaas@kernel.org" <helgaas@kernel.org>
Subject: Re: [PATCH v2] PCI: vmd: Add indirection layer to vmd irq lists
Date: Fri, 28 Feb 2020 16:36:50 +0000	[thread overview]
Message-ID: <85ed447452dc6f49222711e1d513d7a41e9842c7.camel@intel.com> (raw)
In-Reply-To: <20200228143454.GI2935@paulmck-ThinkPad-P72>

On Fri, 2020-02-28 at 06:34 -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2020 at 11:10:10AM +0000, Lorenzo Pieralisi wrote:
> > On Wed, Nov 06, 2019 at 04:25:25PM +0000, Derrick, Jonathan wrote:
> > > On Thu, 2019-10-31 at 16:11 -0700, Paul E. McKenney wrote:
> > > > On Thu, Oct 31, 2019 at 07:08:53AM -0600, Jon Derrick wrote:
> > > > > With CONFIG_MAXSMP and CONFIG_PROVE_LOCKING, the size of an srcu_struct can
> > > > > grow quite large. In one compilation instance it produced a 74KiB data
> > > > > structure. These structures are embedded in the vmd_irq_list struct, and an
> > > > > N=64 allocation can exceed MAX_ORDER, violating reclaim rules.
> > > > > 
> > > > >   struct srcu_struct {
> > > > >           struct srcu_node   node[521];                    /*     0 75024 */
> > > > >           /* --- cacheline 1172 boundary (75008 bytes) was 16 bytes ago --- */
> > > > >           struct srcu_node *         level[4];             /* 75024    32 */
> > > > >           struct mutex       srcu_cb_mutex;                /* 75056   128 */
> > > > >           /* --- cacheline 1174 boundary (75136 bytes) was 48 bytes ago --- */
> > > > >           spinlock_t                 lock;                 /* 75184    56 */
> > > > >           /* --- cacheline 1175 boundary (75200 bytes) was 40 bytes ago --- */
> > > > >           struct mutex       srcu_gp_mutex;                /* 75240   128 */
> > > > >           /* --- cacheline 1177 boundary (75328 bytes) was 40 bytes ago --- */
> > > > >           unsigned int               srcu_idx;             /* 75368     4 */
> > > > > 
> > > > >           /* XXX 4 bytes hole, try to pack */
> > > > > 
> > > > >           long unsigned int          srcu_gp_seq;          /* 75376     8 */
> > > > >           long unsigned int          srcu_gp_seq_needed;   /* 75384     8 */
> > > > >           /* --- cacheline 1178 boundary (75392 bytes) --- */
> > > > >           long unsigned int          srcu_gp_seq_needed_exp; /* 75392     8 */
> > > > >           long unsigned int          srcu_last_gp_end;     /* 75400     8 */
> > > > >           struct srcu_data *         sda;                  /* 75408     8 */
> > > > >           long unsigned int          srcu_barrier_seq;     /* 75416     8 */
> > > > >           struct mutex       srcu_barrier_mutex;           /* 75424   128 */
> > > > >           /* --- cacheline 1180 boundary (75520 bytes) was 32 bytes ago --- */
> > > > >           struct completion  srcu_barrier_completion;      /* 75552    80 */
> > > > >           /* --- cacheline 1181 boundary (75584 bytes) was 48 bytes ago --- */
> > > > >           atomic_t                   srcu_barrier_cpu_cnt; /* 75632     4 */
> > > > > 
> > > > >           /* XXX 4 bytes hole, try to pack */
> > > > > 
> > > > >           struct delayed_work work;                        /* 75640   152 */
> > > > > 
> > > > >           /* XXX last struct has 4 bytes of padding */
> > > > > 
> > > > >           /* --- cacheline 1184 boundary (75776 bytes) was 16 bytes ago --- */
> > > > >           struct lockdep_map dep_map;                      /* 75792    32 */
> > > > > 
> > > > >           /* size: 75824, cachelines: 1185, members: 17 */
> > > > >           /* sum members: 75816, holes: 2, sum holes: 8 */
> > > > >           /* paddings: 1, sum paddings: 4 */
> > > > >           /* last cacheline: 48 bytes */
> > > > >   };
> > > > > 
> > > > > With N=64 VMD IRQ lists, this would allocate 4.6MiB in a single call. This
> > > > > violates MAX_ORDER reclaim rules when PAGE_SIZE=4096 and
> > > > > MAX_ORDER_NR_PAGES=1024, and triggers the following warning in mm/page_alloc.c:
> > > > > 
> > > > >   /*
> > > > >    * There are several places where we assume that the order value is sane
> > > > >    * so bail out early if the request is out of bound.
> > > > >    */
> > > > >   if (unlikely(order >= MAX_ORDER)) {
> > > > >   	WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> > > > >   	return NULL;
> > > > >   }
> > > > > 
> > > > > This patch changes the irq list array into an array of pointers to irq
> > > > > lists, to avoid allocation failures with larger MSI-X counts.
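
To make the shape of that change concrete, the allocation side ends up
roughly like the sketch below (illustrative only, not the actual diff;
field names follow drivers/pci/controller/vmd.c of that era):

  /*
   * Before: one flat allocation whose size scales with the MSI-X count,
   * so 64 entries of ~75KiB each exceed MAX_ORDER in a single call:
   *
   *   vmd->irqs = devm_kcalloc(&dev->dev, vmd->msix_count,
   *                            sizeof(*vmd->irqs), GFP_KERNEL);
   *
   * After: an array of pointers, each vmd_irq_list allocated on its own,
   * so no single allocation has to exceed MAX_ORDER.
   */
  vmd->irqs = devm_kcalloc(&dev->dev, vmd->msix_count,
                           sizeof(*vmd->irqs), GFP_KERNEL);
  if (!vmd->irqs)
          return -ENOMEM;

  for (i = 0; i < vmd->msix_count; i++) {
          vmd->irqs[i] = devm_kzalloc(&dev->dev, sizeof(*vmd->irqs[i]),
                                      GFP_KERNEL);
          if (!vmd->irqs[i])
                  return -ENOMEM;
  }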
> > > > > 
> > > > > This patch also reverts commit b31822277abcd7c83d1c1c0af876da9ccdf3b7d6.
> > > > > The index_from_irqs() helper was added to calculate the irq list index
> > > > > from the array of irqs, in order to shrink vmd_irq_list for performance.
> > > > > 
> > > > > Because the srcu_struct embedded in the vmd_irq_list struct varies in size
> > > > > depending on a number of configuration options, the vmd_irq_list struct no
> > > > > longer guarantees an optimal data structure size and granularity.
> > > > > 
> > > > > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > > > > ---
> > > > > Added Paul to make him aware of srcu_struct size with these options
> > > > 
> > > > There was some discussion of making the srcu_struct structure's ->node[]
> > > > array be separately allocated, which would allow this array to be
> > > > right-sized for the system in question.  However, I believe they ended up
> > > > instead separately allocating the srcu_struct structure itself.
> > > > 
> > > > Without doing something like that, I am kind of stuck.  After all,
> > > > at compile time, the kernel build system tells SRCU that it needs to
> > > > be prepared to run on systems with thousands of CPUs.  Which requires
> > > > substantial memory to keep track of all those CPUs.  Which are not
> > > > present on most systems.
> > > > 
> > > > Thoughts?
> > > 
> > > Yes, I haven't seen an elegant solution other than making users aware
> > > of the situation.
> > > 
> > > Thanks for your input
> > 
> > Jon, Paul,
> > 
> > I don't know if there has been any further development in this area in
> > the meantime; should we proceed with this patch?
> 
> Let me be more explicit.  Would it be helpful to you guys if there was
> a variable-sized ->node[] array that is separately allocated?  If so,
> please do tell me.  After all, I cannot read your minds  ;-)
> 
Frankly, I'm not versed enough in RCU to know the implications of this
change. How often have you come across the issue I'm facing? Is it
worth the effort versus my abstraction? Will it affect performance, or
will the sleepable component absorb it?

> An instance of such a variant would not be available via DEFINE_SRCU(),
> which at compile time would absolutely need to allocate as many elements
> as Kconfig said to allocate. 
So the implication is that it needs allocation later via
init_srcu_struct?
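
i.e., something along these lines at the driver level (a minimal sketch
using the standard SRCU API; vmdirq here is just a local pointer to one
vmd_irq_list entry, with its embedded srcu member as the example):

  /*
   * Nothing may touch the srcu_struct until init_srcu_struct() has run;
   * it allocates the per-CPU ->sda (and, with the variant proposed above,
   * a right-sized ->node[]) at run time.
   */
  err = init_srcu_struct(&vmdirq->srcu);
  if (err)
          return err;

  idx = srcu_read_lock(&vmdirq->srcu);
  /* ... walk the IRQ list under SRCU protection ... */
  srcu_read_unlock(&vmdirq->srcu, idx);

  /* On teardown, once no readers or updaters remain: */
  cleanup_srcu_struct(&vmdirq->srcu);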

> In addition, instances of srcu_struct
> taking this approach would not be usable until after init_srcu_struct()
> was invoked, which would allocate a right-sized ->node array.
So it's similar to other lock init functions. Are there existing users doing
RCU operations before init_srcu_struct() is called?
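
For comparison, a minimal sketch of the two definition styles (the names
here are made up for illustration):

  /*
   * Compile-time definition: usable immediately, but sized for the
   * Kconfig maximum number of CPUs.
   */
  DEFINE_STATIC_SRCU(example_static_srcu);

  /*
   * Run-time definition: like spin_lock_init()/mutex_init(), nothing may
   * use it before init_srcu_struct() succeeds.
   */
  static struct srcu_struct example_dynamic_srcu;

  static int __init example_init(void)
  {
          return init_srcu_struct(&example_dynamic_srcu);
  }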

> 
> Again, would this be helpful?
> 
> 							Thanx, Paul

Thread overview: 9 messages
2019-10-31 13:08 [PATCH v2] PCI: vmd: Add indirection layer to vmd irq lists Jon Derrick
2019-10-31 23:11 ` Paul E. McKenney
2019-11-06 16:25   ` Derrick, Jonathan
2020-02-28 11:10     ` Lorenzo Pieralisi
2020-02-28 14:34       ` Paul E. McKenney
2020-02-28 16:36         ` Derrick, Jonathan [this message]
2022-01-22  0:03 ` Scott Wood
2022-01-22  0:22   ` Paul E. McKenney
2022-01-22  0:19 ` Scott Wood
