All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
To: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>,
	linux-gpio@vger.kernel.org,
	Linus Walleij <linus.walleij@linaro.org>,
	linux-kernel@vger.kernel.org,
	Thorsten Leemhuis <regressions@leemhuis.info>
Subject: Re: INFO: REPRODUCED: memory leak in gpio device in 6.2-rc6
Date: Mon, 20 Feb 2023 14:10:00 +0100	[thread overview]
Message-ID: <4b001ce6-b35d-3ad1-b757-f5f6baca7b51@alu.unizg.hr> (raw)
In-Reply-To: <CAMRc=Mcx=Ko5H_c1YGzA5Jfu3KJqx1pfL3RZuMrV6oTObnUrhQ@mail.gmail.com>

On 2/16/23 15:16, Bartosz Golaszewski wrote:
> On Wed, Feb 15, 2023 at 3:45 PM Mirsad Goran Todorovac
> <mirsad.todorovac@alu.unizg.hr> wrote:
>>
>> On 15.2.2023. 11:53, Andy Shevchenko wrote:
>>> On Tue, Feb 14, 2023 at 07:19:16PM +0100, Mirsad Goran Todorovac wrote:
>>>> On 12. 02. 2023. 15:19, Andy Shevchenko wrote:
>>>>> On Wed, Feb 08, 2023 at 08:55:24PM +0100, Mirsad Goran Todorovac wrote:
>>>>>> On 31. 01. 2023. 10:36, Mirsad Goran Todorovac wrote:
>>>>>>> I came across this memory leak apparently in the GPIO device driver.
>>>>>>> It is still present in 6.2-rc6 release candidate kernel (just ran kselftest).
>>>>>>>
>>>>>>> This is a vanilla Torvalds tree kernel with MGLRU and KMEMLEAK (obviously)
>>>>>>> enabled.
>>>>>>>
>>>>>>> If you think this bug is significant, I can attempt the bug bisect in the
>>>>>>> environment that triggered it (Lenovo LENOVO_MT_10TX_BU_Lenovo_FM_V530S-07ICB)
>>>>>>> with BIOS M22KT49A from 11/10/2022 and AlmaLinux 8.7.
>>>>>>>
>>>>>>> Here is the /sys/kernel/debug/kmemleak output:
>>>>>>>
>>>>>>> unreferenced object 0xffff9e67ad71f160 (size 32):
>>>>>>>     comm "gpio-sim.sh", pid 208926, jiffies 4372229685 (age 2101.564s)
>>>>>>>     hex dump (first 32 bytes):
>>>>>>>       67 70 69 6f 2d 73 69 6d 2e 30 2d 6e 6f 64 65 30  gpio-sim.0-node0
>>>>>>>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>>>     backtrace:
>>>>>>>       [<0000000098bf3d1b>] slab_post_alloc_hook+0x91/0x320
>>>>>>>       [<00000000da3205c5>] __kmem_cache_alloc_node+0x1bf/0x2b0
>>>>>>>       [<00000000aa51a58a>] __kmalloc_node_track_caller+0x55/0x140
>>>>>>>       [<00000000bd682ecc>] kvasprintf+0x6b/0xd0
>>>>>>>       [<00000000a3431d55>] kasprintf+0x4e/0x70
>>>>>>>       [<00000000f52d2629>] gpio_sim_device_config_live_store+0x401/0x59d [gpio_sim]
>>>>>>>       [<00000000673fc6df>] configfs_write_iter+0xcc/0x130
>>>>>>>       [<000000001d5d0829>] vfs_write+0x2b4/0x3d0
>>>>>>>       [<00000000d2336251>] ksys_write+0x61/0xe0
>>>>>>>       [<00000000f7015bb1>] __x64_sys_write+0x1a/0x20
>>>>>>>       [<000000008ac743d2>] do_syscall_64+0x58/0x80
>>>>>>>       [<000000004d7b7d50>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>>> Sorry, but unfortunately this patch didn't fix the memleak. Please see the result:
>>>
>>> Thank you for give a try!
>>
>> No sweat. It was worth the effort.
>>
>>> Yeah, that's why I put that I'm skeptical, because while patch is correct per
>>> se it wouldn't prevent the initial leakage (it seems it happens due to other
>>> circumstances).
>>
>> I must admit that is looks like hieroglyphs to me. My learning curve had not
>> yet reached that point of debugging memory leaks in the kernel drivers ...
>>
>>>> root@marvin-IdeaPad-3-15ITL6:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/gpio# echo clear > /sys/kernel/debug/kmemleak
>>>> root@marvin-IdeaPad-3-15ITL6:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/gpio# time ./gpio-sim.sh
>>>> trap: SIGTERM: bad trap
>>>> 1. chip_name and dev_name attributes
>>>> 1.1. Chip name is communicated to user
>>>> 1.2. chip_name returns 'none' if the chip is still pending
>>>> 1.3. Device name is communicated to user
>>>> 2. Creating and configuring simulated chips
>>>> 2.1. Default number of lines is 1
>>>> 2.2. Number of lines can be specified
>>>> 2.3. Label can be set
>>>> 2.4. Label can be left empty
>>>> 2.5. Line names can be configured
>>>> 2.6. Line config can remain unused if offset is greater than number of lines
>>>> 2.7. Line configfs directory names are sanitized
>>>> 2.8. Multiple chips can be created
>>>> 2.9. Can't modify settings when chip is live
>>>> 2.10. Can't create line items when chip is live
>>>> 2.11. Probe errors are propagated to user-space
>>>> 2.12. Cannot enable a chip without any GPIO banks
>>>> 2.13. Duplicate chip labels are not allowed
>>>> 2.14. Lines can be hogged
>>>> 3. Controlling simulated chips
>>>> 3.1. Pull can be set over sysfs
>>>> 3.2. Pull can be read from sysfs
>>>> 3.3. Incorrect input in sysfs is rejected
>>>> 3.4. Can't write to value
>>>> 4. Simulated GPIO chips are functional
>>>> 4.1. Values can be read from sysfs
>>>> 4.2. Bias settings work correctly
>>>> GPIO gpio-sim test PASS
>>>>
>>>> real 0m1.120s
>>>> user 0m0.283s
>>>> sys  0m0.842s
>>>> root@marvin-IdeaPad-3-15ITL6:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/gpio# echo scan > /sys/kernel/debug/kmemleak
>>>> root@marvin-IdeaPad-3-15ITL6:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/gpio# cat /sys/kernel/debug/kmemleak
>>>> unreferenced object 0xffff91f019219660 (size 32):
>>>>     comm "gpio-sim.sh", pid 11223, jiffies 4295028142 (age 87.304s)
>>>>     hex dump (first 32 bytes):
>>>>       67 70 69 6f 2d 73 69 6d 2e 30 2d 6e 6f 64 65 30  gpio-sim.0-node0
>>>>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>     backtrace:
>>>>       [<ffffffff86fcd408>] __kmem_cache_alloc_node+0x1d8/0x330
>>>>       [<ffffffff86f4c271>] __kmalloc_node_track_caller+0x51/0xd0
>>>>       [<ffffffff872fb359>] kvasprintf+0x69/0xd0
>>>>       [<ffffffff872fb4ce>] kasprintf+0x4e/0x70
>>>>       [<ffffffffc1864553>] gpio_sim_device_config_live_store+0x483/0x6b5 [gpio_sim]
>>>>       [<ffffffff870dde9c>] configfs_write_iter+0xcc/0x130
>>>>       [<ffffffff87016869>] vfs_write+0x1f9/0x3b0
>>>>       [<ffffffff87016dab>] ksys_write+0x6b/0xf0
>>>>       [<ffffffff87016e59>] __x64_sys_write+0x19/0x20
>>>>       [<ffffffff87b91188>] do_syscall_64+0x58/0x80
>>>>       [<ffffffff87c000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>> root@marvin-IdeaPad-3-15ITL6:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/gpio# uname -rms
>>>> Linux 6.2.0-rc8-lru-km-andy-00015-gf6feea56f66d-dirty x86_64
>>>> root@marvin-IdeaPad-3-15ITL6:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/gpio#
>>>>
>>>> I can verify that I build the right patch:
>>>>
>>>> marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/linux_torvalds$ git diff
>>>> diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c
>>>> index 60514bc5454f..7f79e49b23d7 100644
>>>> --- a/drivers/gpio/gpio-sim.c
>>>> +++ b/drivers/gpio/gpio-sim.c
>>>> @@ -954,9 +954,9 @@ static void gpio_sim_device_deactivate_unlocked(struct gpio_sim_device *dev)
>>>>
>>>>           swnode = dev_fwnode(&dev->pdev->dev);
>>>>           platform_device_unregister(dev->pdev);
>>>> +       gpio_sim_remove_hogs(dev);
>>>>           gpio_sim_remove_swnode_recursive(swnode);
>>>>           dev->pdev = NULL;
>>>> -       gpio_sim_remove_hogs(dev);
>>>>    }
>>>>
>>>>    static ssize_t
>>>> marvin@marvin-IdeaPad-3-15ITL6:~/linux/kernel/linux_torvalds$
>>>>
>>>> Alternatively, I could try to bisect if you think it's prudent to try that.
>>>> But first I need a stroll after this kernel build :-)
>>>>
>>>> Do you think that knowing when the bug was introduced might help find the culprit?

> Thanks for the report Mirsad, just sent out a fix.

Hi, Bart,

It is really nothing. The reproducer shows that the leak is apparently gone.

As Mr. McKenney once said, a bunch of monkeys with keyboard could
have done it in a considerable number of trials and errors ;-)

But here I have something that could potentially leak as well. I could not devise a
reproducer due to the leak being lightly triggered only in extreme memory contention.

See it for yourself:

drivers/gpio/gpio-sim.c:
  301 static int gpio_sim_setup_sysfs(struct gpio_sim_chip *chip)
  302 {
  303         struct device_attribute *val_dev_attr, *pull_dev_attr;
  304         struct gpio_sim_attribute *val_attr, *pull_attr;
  305         unsigned int num_lines = chip->gc.ngpio;
  306         struct device *dev = chip->gc.parent;
  307         struct attribute_group *attr_group;
  308         struct attribute **attrs;
  309         int i, ret;
  310
  311         chip->attr_groups = devm_kcalloc(dev, sizeof(*chip->attr_groups),
  312                                          num_lines + 1, GFP_KERNEL);
  313         if (!chip->attr_groups)
  314                 return -ENOMEM;
  315
  316         for (i = 0; i < num_lines; i++) {
  317                 attr_group = devm_kzalloc(dev, sizeof(*attr_group), GFP_KERNEL);
  318                 attrs = devm_kcalloc(dev, GPIO_SIM_NUM_ATTRS, sizeof(*attrs),
  319                                      GFP_KERNEL);
  320                 val_attr = devm_kzalloc(dev, sizeof(*val_attr), GFP_KERNEL);
  321                 pull_attr = devm_kzalloc(dev, sizeof(*pull_attr), GFP_KERNEL);
  322                 if (!attr_group || !attrs || !val_attr || !pull_attr)
  323                         return -ENOMEM;
  324
  325                 attr_group->name = devm_kasprintf(dev, GFP_KERNEL,
  326                                                   "sim_gpio%u", i);
  327                 if (!attr_group->name)
  328                         return -ENOMEM;

Apparently, if the memory allocation only partially succeeds, in the theoretical case
that the system is close to its kernel memory exhaustion, `return -ENOMEM` would not
free the partially succeeded allocs, would it?

To explain it better, I tried a version that is not yet full doing "all or nothing"
memory allocation for the gpio-sim driver, because I am not that familiar with the
driver internals.

I give this just as an illustration to what I mean to say (one line of code speaks
like thousand words):

301 static int gpio_sim_setup_sysfs(struct gpio_sim_chip *chip)
  302 {
  303         struct device_attribute *val_dev_attr, *pull_dev_attr;
  304         struct gpio_sim_attribute *val_attr, *pull_attr;
  305         unsigned int num_lines = chip->gc.ngpio;
  306         struct device *dev = chip->gc.parent;
  307         struct attribute_group *attr_group;
  308         struct attribute **attrs;
  309         int i, ret;
  310
  311         chip->attr_groups = devm_kcalloc(dev, sizeof(*chip->attr_groups),
  312                                          num_lines + 1, GFP_KERNEL);
  313         if (!chip->attr_groups)
  314                 return -ENOMEM;
  315
  316         for (i = 0; i < num_lines; i++) {
  317                 attr_group = devm_kzalloc(dev, sizeof(*attr_group), GFP_KERNEL);
  318                 attrs = devm_kcalloc(dev, GPIO_SIM_NUM_ATTRS, sizeof(*attrs),
  319                                      GFP_KERNEL);
  320                 val_attr = devm_kzalloc(dev, sizeof(*val_attr), GFP_KERNEL);
  321                 pull_attr = devm_kzalloc(dev, sizeof(*pull_attr), GFP_KERNEL);
  322                 if (!attr_group || !attrs || !val_attr || !pull_attr) {
  323                         if (attr_group)
  324                                 devm_kfree(dev, attr_group);
  325                         if (attrs)
  326                                 devm_kfree(dev, attrs);
  327                         if (val_attr)
  328                                 devm_kfree(dev, val_attr);
  329                         if (pull_attr)
  330                                 devm_kfree(dev, pull_attr);
  331                         return -ENOMEM;
  332                 }
  333
  334                 attr_group->name = devm_kasprintf(dev, GFP_KERNEL,
  335                                                   "sim_gpio%u", i);
  336                 if (!attr_group->name) {
  337                         devm_kfree(dev, attr_group);
  338                         devm_kfree(dev, attrs);
  339                         devm_kfree(dev, val_attr);
  340                         devm_kfree(dev, pull_attr);
  341                         return -ENOMEM;
  342                 }
  343
  344                 val_attr->offset = pull_attr->offset = i;

This is, of course, far from perfect, and it seems that the driver will still leak
memory in case of partially successful allocation because of the
for (i = 0; i < num_lines; i++) { ... } loop. :-/

I hope this makes any sense to you.

Thanks,
Mirsad

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

  reply	other threads:[~2023-02-20 13:10 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31  9:36 INFO: memory leak in gpio device in 6.2-rc6 Mirsad Goran Todorovac
2023-02-08 19:55 ` INFO: REPRODUCED: " Mirsad Goran Todorovac
2023-02-12 14:19   ` Andy Shevchenko
2023-02-14 18:19     ` Mirsad Goran Todorovac
2023-02-15 10:53       ` Andy Shevchenko
2023-02-15 14:45         ` Mirsad Goran Todorovac
2023-02-16 14:16           ` Bartosz Golaszewski
2023-02-20 13:10             ` Mirsad Todorovac [this message]
2023-02-20 13:43               ` Andy Shevchenko
2023-02-21 13:52                 ` Mirsad Goran Todorovac
2023-02-21 14:20                   ` Mirsad Goran Todorovac
2023-02-21 14:32                   ` Mirsad Goran Todorovac
2023-02-21 15:39                   ` Andy Shevchenko
2023-02-22 10:53                     ` Bartosz Golaszewski
2023-02-22 21:27                       ` Mirsad Goran Todorovac
2023-02-24 15:12                     ` Mirsad Todorovac
2023-02-24 17:40                       ` Andy Shevchenko
2023-02-27 18:38                         ` Mirsad Todorovac
2023-02-27 23:13                           ` Andy Shevchenko
2023-03-08 13:11                             ` Mirsad Todorovac
2023-03-08 15:20                               ` Andy Shevchenko
2023-03-08 23:17                                 ` Mirsad Goran Todorovac
2023-02-14 20:54     ` INFO: BISECTED: " Mirsad Goran Todorovac

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b001ce6-b35d-3ad1-b757-f5f6baca7b51@alu.unizg.hr \
    --to=mirsad.todorovac@alu.unizg.hr \
    --cc=andriy.shevchenko@intel.com \
    --cc=brgl@bgdev.pl \
    --cc=linus.walleij@linaro.org \
    --cc=linux-gpio@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.