From: Imran Khan <imran.f.khan@oracle.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: tj@kernel.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [RFC PATCH] kernfs: use kernfs_node specific mutex and spinlock.
Date: Mon, 3 Jan 2022 20:07:09 +1100
Message-ID: <3f340003-2339-85a9-7d3a-51a488dcc720@oracle.com>
In-Reply-To: <YcWHcZsBGFVDXYTY@kroah.com>

Hi Greg,

On 24/12/21 7:40 pm, Greg KH wrote:
> On Fri, Dec 24, 2021 at 09:52:51AM +1100, Imran Khan wrote:
>> Hi everyone,
>>
>> On 16/12/21 2:06 am, Imran Khan wrote:
>>> Right now a global mutex (kernfs_open_file_mutex) protects the list of
>>> kernfs_open_file instances corresponding to a sysfs attribute, so even
>>> if different tasks are opening or closing different sysfs files they
>>> can contend on the osq_lock of this mutex. The contention is more
>>> apparent on large systems with a few hundred CPUs, where most of the
>>> CPUs have running tasks that are opening, accessing or closing sysfs
>>> files at any point in time. Since each list of kernfs_open_file
>>> instances belongs to a kernfs_open_node instance, which in turn
>>> corresponds to one kernfs_node, move the global kernfs_open_file_mutex
>>> into kernfs_node so that it does not block access to the
>>> kernfs_open_file lists of other kernfs_nodes.
>>>
>>> Also, since kernfs_node->attr.open points to the kernfs_open_node
>>> instance corresponding to the kernfs_node, we can use a
>>> kernfs_node-specific spinlock in place of the current global spinlock,
>>> i.e. kernfs_open_node_lock. So make this spinlock local to the
>>> kernfs_node instance as well.
>>>
>>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
>>> ---
>>> I have kept this patch as an RFC, as I am not sure if I have overlooked
>>> any scenario(s) where these global locks are needed.
>>>
>>
>> Could someone please provide some feedback about this change? Also, if
>> there are any issues with this change, can I make these locks per-fs
>> instead, as has been done in [1]?
>>
>> [1] https://lore.kernel.org/lkml/YZvV0ESA+zHHqHBU@google.com/
> 
> Please test this using some tests to verify that sysfs is still working
> properly and that this actually takes less time overall.  In the
> conversations about the last time kernfs was changed, there were lots of
> discussions about proving that it actually mattered.
> 

Thanks for getting back on this.

Yes, sysfs and cgroup are working with this change.

I verified the change as follows:

1. Using the LTP sysfs tests.
2. Doing CPU hotplugging and reading CPU topology information from sysfs
   in parallel. I was getting either the correct topology information or
   a "No such file or directory" error (a sketch of this check is shown
   below).

If you could suggest some further tests, I can run those as well.

As far as the overall time taken is concerned, I did not see any
improvement with my test application (I am running 200 instances of it
on a system with 384 CPUs).
The main loop of this application is as follows (any other sysfs
hierarchy can be used as well):


/*
 * Main loop of the test application. The full program also includes
 * <stdio.h>, <string.h>, <fcntl.h>, <unistd.h> and <errno.h>, and
 * dev_name holds the name of the InfiniBand device being queried.
 */
for (int loop = 0; loop < 100; loop++)
{
  for (int port_num = 1; port_num < 2; port_num++)
  {
    for (int gid_index = 0; gid_index < 254; gid_index++)
    {
      char ret_buf[64];
      char gid_file_path[1024];
      int      ret_fd;
      ssize_t  ret_rd;

      memset(ret_buf, 0, sizeof(ret_buf));
      memset(gid_file_path, 0, sizeof(gid_file_path));

      snprintf(gid_file_path, sizeof(gid_file_path),
               "/sys/class/infiniband/%s/ports/%d/gids/%d",
               dev_name, port_num, gid_index);

      ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC);
      if (ret_fd < 0)
      {
        printf("Failed to open %s\n", gid_file_path);
        continue;
      }

      /* Read the GID */
      ret_rd = read(ret_fd, ret_buf, 40);
      if (ret_rd == -1)
      {
        /* Report the errno from read() and don't leak the fd. */
        printf("Failed to read from file %s, errno: %d\n",
               gid_file_path, errno);
        close(ret_fd);
        continue;
      }

      close(ret_fd);
    }
  }
}

The patch just moved the contention from the osq_lock (corresponding to
kernfs_open_file_mutex) to the read-write semaphore (kernfs_rwsem). I
have tried to address the kernfs_rwsem contention in v2 of this patch
set at [1].
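
For context, a conceptual sketch of what the RFC does (illustrative
only, not the actual patch; the exact placement and naming of the new
per-node fields may differ):

/*
 * The two formerly global locks become members of the per-node
 * attribute data, so opens/closes of unrelated attributes no longer
 * serialize on a single mutex or spinlock.
 */
struct kernfs_elem_attr {
        const struct kernfs_ops *ops;
        struct kernfs_open_node *open;            /* open files on this node */
        struct mutex      open_file_mutex;        /* was global kernfs_open_file_mutex */
        spinlock_t        open_node_lock;         /* was global kernfs_open_node_lock */
        loff_t            size;
        struct kernfs_node      *notify_next;
};

/* The open/release paths then lock only the node at hand, roughly: */
mutex_lock(&kn->attr.open_file_mutex);    /* instead of &kernfs_open_file_mutex */
spin_lock_irq(&kn->attr.open_node_lock);  /* instead of &kernfs_open_node_lock */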

v2 of the patch set reduces the test execution time by half (from ~36
secs to ~18 secs), and contention around the kernfs_rwsem lock is
reduced to about one third of the earlier value:

  8.61%     8.55%  showgids  [kernel.kallsyms]  [k] down_read
  7.80%     7.75%  showgids  [kernel.kallsyms]  [k] up_read

I will await feedback regarding v2 of this patchset.

[1] https://lore.kernel.org/lkml/20220103084544.1109829-1-imran.f.khan@oracle.com/

> thanks,
> 
> greg k-h

