* [PATCH v2 0/3] fix stuck in accessing hwrng attributes @ 2014-09-15 16:02 Amos Kong
  2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
  ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw)
To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel

If we read from /dev/hwrng with a long-running dd process, it consumes too
much cpu time and holds the mutex lock almost continuously. When we then
read the hwrng attributes from sysfs with cat, the cat process gets stuck
waiting for the lock to be released. The problem can only be reproduced
with a non-smp guest and a slow backend.

This patchset resolves the issue by changing rng_dev_read() to always
schedule for 10 jiffies after releasing the mutex lock; the cat process
then has a chance to take the lock and execute the protected code without
getting stuck.

Thanks.

V2: update commitlog to describe PATCH 2, split second patch.

Amos Kong (3):
  virtio-rng cleanup: move some code out of mutex protection
  hw_random: fix stuck in catting hwrng attributes
  hw_random: increase schedule timeout in rng_dev_read()

 drivers/char/hw_random/core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--
1.9.3

^ permalink raw reply	[flat|nested] 20+ messages in thread
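The starvation the cover letter describes can be sketched in plain userspace pthreads (a hypothetical demo with invented names, not kernel code): one thread hammers a mutex in a tight loop, the way the dd reader re-takes rng_mutex, while a second thread plays the `cat` process. Yielding after each unlock stands in for the patchset's `schedule_timeout_interruptible()` and gives the second thread a chance to run.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int attr_reads;

/* Plays the `cat` process reading a sysfs attribute under the lock. */
static void *attr_reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100; i++) {
        pthread_mutex_lock(&lock);
        atomic_fetch_add(&attr_reads, 1);   /* stands in for the sysfs show */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Plays the dd loop in rng_dev_read(); returns how many attribute
 * reads completed by the time both loops finished. */
int run_demo(void)
{
    pthread_t t;

    atomic_store(&attr_reads, 0);
    if (pthread_create(&t, NULL, attr_reader, NULL) != 0)
        return -1;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        /* pretend to pull entropy from the device here */
        pthread_mutex_unlock(&lock);
        sched_yield();   /* stands in for schedule_timeout_interruptible(1) */
    }
    pthread_join(t, NULL);
    return atomic_load(&attr_reads);
}
```

On a single CPU, dropping the `sched_yield()` call is the analogue of the unpatched kernel path: the unlock/lock pair rarely lets the waiter in.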
* [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection 2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong @ 2014-09-15 16:02 ` Amos Kong 2014-09-15 16:13 ` Michael Büsch 2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong ` (2 subsequent siblings) 3 siblings, 1 reply; 20+ messages in thread From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw) To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel It doesn't save too much cpu time as expected, just a cleanup. Signed-off-by: Amos Kong <akong@redhat.com> --- drivers/char/hw_random/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..c591d7e 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, return -ERESTARTSYS; if (current_rng) name = current_rng->name; - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); mutex_unlock(&rng_mutex); + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); return ret; } @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, ssize_t ret = 0; struct hwrng *rng; + buf[0] = '\0'; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - buf[0] = '\0'; list_for_each_entry(rng, &rng_list, list) { strncat(buf, rng->name, PAGE_SIZE - ret - 1); ret += strlen(rng->name); strncat(buf, " ", PAGE_SIZE - ret - 1); ret++; } + mutex_unlock(&rng_mutex); strncat(buf, "\n", PAGE_SIZE - ret - 1); ret++; - mutex_unlock(&rng_mutex); return ret; } -- 1.9.3 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection 2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong @ 2014-09-15 16:13 ` Michael Büsch 2014-09-16 0:30 ` Amos Kong 0 siblings, 1 reply; 20+ messages in thread From: Michael Büsch @ 2014-09-15 16:13 UTC (permalink / raw) To: Amos Kong Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1593 bytes --] On Tue, 16 Sep 2014 00:02:27 +0800 Amos Kong <akong@redhat.com> wrote: > It doesn't save too much cpu time as expected, just a cleanup. > > Signed-off-by: Amos Kong <akong@redhat.com> > --- > drivers/char/hw_random/core.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index aa30a25..c591d7e 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, > return -ERESTARTSYS; > if (current_rng) > name = current_rng->name; > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > mutex_unlock(&rng_mutex); > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); I'm not sure this is safe. Name is just a pointer. What if the hwrng gets unregistered after unlock and just before the snprintf? > return ret; > } > @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, > ssize_t ret = 0; > struct hwrng *rng; > > + buf[0] = '\0'; > err = mutex_lock_interruptible(&rng_mutex); > if (err) > return -ERESTARTSYS; > - buf[0] = '\0'; > list_for_each_entry(rng, &rng_list, list) { > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > ret += strlen(rng->name); > strncat(buf, " ", PAGE_SIZE - ret - 1); > ret++; > } > + mutex_unlock(&rng_mutex); > strncat(buf, "\n", PAGE_SIZE - ret - 1); > ret++; > - mutex_unlock(&rng_mutex); > > return ret; > } This looks ok. 
-- Michael ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection 2014-09-15 16:13 ` Michael Büsch @ 2014-09-16 0:30 ` Amos Kong 0 siblings, 0 replies; 20+ messages in thread From: Amos Kong @ 2014-09-16 0:30 UTC (permalink / raw) To: Michael Büsch Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1867 bytes --] On Mon, Sep 15, 2014 at 06:13:20PM +0200, Michael Büsch wrote: > On Tue, 16 Sep 2014 00:02:27 +0800 > Amos Kong <akong@redhat.com> wrote: > > > It doesn't save too much cpu time as expected, just a cleanup. > > > > Signed-off-by: Amos Kong <akong@redhat.com> > > --- > > drivers/char/hw_random/core.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > index aa30a25..c591d7e 100644 > > --- a/drivers/char/hw_random/core.c > > +++ b/drivers/char/hw_random/core.c > > @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, > > return -ERESTARTSYS; > > if (current_rng) > > name = current_rng->name; > > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > mutex_unlock(&rng_mutex); > > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > I'm not sure this is safe. > Name is just a pointer. > What if the hwrng gets unregistered after unlock and just before the snprintf? Oh, it points to protected current_rng->name, I will drop this cleanup. Thanks. 
> > return ret; > > } > > @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, > > ssize_t ret = 0; > > struct hwrng *rng; > > > > + buf[0] = '\0'; > > err = mutex_lock_interruptible(&rng_mutex); > > if (err) > > return -ERESTARTSYS; > > - buf[0] = '\0'; > > list_for_each_entry(rng, &rng_list, list) { > > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > > ret += strlen(rng->name); > > strncat(buf, " ", PAGE_SIZE - ret - 1); > > ret++; > > } > > + mutex_unlock(&rng_mutex); > > strncat(buf, "\n", PAGE_SIZE - ret - 1); > > ret++; > > - mutex_unlock(&rng_mutex); > > > > return ret; > > } > > This looks ok. > > -- > Michael -- Amos. [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
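Michael's objection above — that `name` is just a pointer into lock-protected data — can be sketched in userspace (hypothetical types and names; this is the pattern, not the driver code). The safe version copies the bytes out with `snprintf()` *while still holding the lock*, so it doesn't matter if the rng is unregistered and freed immediately afterwards:

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct rng { const char *name; };          /* stand-in for struct hwrng */

static pthread_mutex_t rng_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rng *current_rng;            /* protected by rng_lock */

void install_rng(struct rng *r)
{
    pthread_mutex_lock(&rng_lock);
    current_rng = r;
    pthread_mutex_unlock(&rng_lock);
}

/* Safe: the string is copied into buf before the lock is dropped, so a
 * concurrent unregister cannot leave us dereferencing a freed name. */
int show_current(char *buf, size_t len)
{
    int ret;

    pthread_mutex_lock(&rng_lock);
    ret = snprintf(buf, len, "%s\n", current_rng ? current_rng->name : "none");
    pthread_mutex_unlock(&rng_lock);
    return ret;
}
```

Moving the `snprintf()` after the unlock — as the dropped cleanup did — would leave a window where `current_rng` (and its name) could be freed between the unlock and the format.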
* [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
  2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong
  2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong
@ 2014-09-15 16:02 ` Amos Kong
  2014-09-18  2:43 ` Rusty Russell
  2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong
  2014-09-17  9:30 ` [PATCH v2 0/3] fix stuck in accessing hwrng attributes Herbert Xu
  3 siblings, 1 reply; 20+ messages in thread
From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw)
To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel

I started a QEMU (non-smp) guest with one virtio-rng device, and read
random data from /dev/hwrng with dd:

  # dd if=/dev/hwrng of=/dev/null &

At the same time, I checked the hwrng attributes from sysfs with cat:

  # cat /sys/class/misc/hw_random/rng_*

The cat process always gets stuck with a slow backend (5 k/s); with a
quick backend (1.2 M/s), the cat process takes 1 to 2 minutes. The hang
does not occur on an smp guest.

The read syscall enters the kernel and calls rng_dev_read() in user
context. We used need_resched() to check whether other tasks need to
run, but it almost always returns false, and we re-take the mutex lock.
The attribute-accessing process always fails to take the lock, so the
cat gets stuck.

User context doesn't let other user contexts run on that CPU unless the
kernel code sleeps for some reason. This is why need_resched() always
returns false here.

This patch removes need_resched() and always schedules other tasks, so
that other tasks have a chance to take the lock and execute the
protected code.
Signed-off-by: Amos Kong <akong@redhat.com> --- drivers/char/hw_random/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..263a370 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(&rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(1); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes 2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong @ 2014-09-18 2:43 ` Rusty Russell 2014-09-18 2:48 ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell 2014-09-18 12:47 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong 0 siblings, 2 replies; 20+ messages in thread From: Rusty Russell @ 2014-09-18 2:43 UTC (permalink / raw) To: Amos Kong, virtualization Cc: kvm, herbert, m, mb, mpm, amit.shah, linux-kernel, Linus Torvalds Amos Kong <akong@redhat.com> writes: > I started a QEMU (non-smp) guest with one virtio-rng device, and read > random data from /dev/hwrng by dd: > > # dd if=/dev/hwrng of=/dev/null & > > In the same time, if I check hwrng attributes from sysfs by cat: > > # cat /sys/class/misc/hw_random/rng_* > > The cat process always gets stuck with slow backend (5 k/s), if we > use a quick backend (1.2 M/s), the cat process will cost 1 to 2 > minutes. The stuck doesn't exist for smp guest. > > Reading syscall enters kernel and call rng_dev_read(), it's user > context. We used need_resched() to check if other tasks need to > be run, but it almost always return false, and re-hold the mutex > lock. The attributes accessing process always fails to hold the > lock, so the cat gets stuck. > > User context doesn't allow other user contexts run on that CPU, > unless the kernel code sleeps for some reason. This is why the > need_reshed() always return false here. > > This patch removed need_resched() and always schedule other tasks > then other tasks can have chance to hold the lock and execute > protected code. OK, this is going to be a rant. Your explanation doesn't make sense at all. Worse, your solution breaks the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite it.". 
But worst of all, this detailed explanation might have convinced me you understood the problem better than I did, and applied your patch. I did some tests. For me, as expected, the process spends its time inside the virtio rng read function, holding the mutex and thus blocking sysfs access; it's not a failure of this code at all. Your schedule_timeout() "fix" probably just helps by letting the host refresh entropy, so we spend less time waiting in the read fn. I will post a series, which unfortunately is only lightly tested, then I'm going to have some beer to begin my holiday. That may help me forget my disappointment at seeing respected fellow developers monkey-patching random code they don't understand. Grrr.... Rusty. > Signed-off-by: Amos Kong <akong@redhat.com> > --- > drivers/char/hw_random/core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index c591d7e..263a370 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > mutex_unlock(&rng_mutex); > > - if (need_resched()) > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(1); > > if (signal_pending(current)) { > err = -ERESTARTSYS; > -- > 1.9.3 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/5] hw_random: place mutex around read functions and buffers. 2014-09-18 2:43 ` Rusty Russell @ 2014-09-18 2:48 ` Rusty Russell 2014-09-18 2:48 ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell ` (3 more replies) 2014-09-18 12:47 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong 1 sibling, 4 replies; 20+ messages in thread From: Rusty Russell @ 2014-09-18 2:48 UTC (permalink / raw) To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah, linux-kernel Cc: Rusty Russell There's currently a big lock around everything, and it means that we can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current) while the rng is reading. This is a real problem when the rng is slow, or blocked (eg. virtio_rng with qemu's default /dev/random backend) This doesn't help (it leaves the current lock untouched), just adds a lock to protect the read function and the static buffers, in preparation for transition. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> --- drivers/char/hw_random/core.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25c8d49..b1b6042ad85c 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -53,7 +53,10 @@ static struct hwrng *current_rng; static struct task_struct *hwrng_fill; static LIST_HEAD(rng_list); +/* Protects rng_list and current_rng */ static DEFINE_MUTEX(rng_mutex); +/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */ +static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; static unsigned short current_quality; @@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng) unsigned char bytes[16]; int bytes_read; + mutex_lock(&reading_mutex); bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1); + mutex_unlock(&reading_mutex); if (bytes_read > 0) 
add_device_randomness(bytes, bytes_read); } @@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, int wait) { int present; + BUG_ON(!mutex_is_locked(&reading_mutex)); if (rng->read) return rng->read(rng, (void *)buffer, size, wait); @@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, goto out_unlock; } + mutex_lock(&reading_mutex); if (!data_avail) { bytes_read = rng_get_data(current_rng, rng_buffer, rng_buffer_size(), !(filp->f_flags & O_NONBLOCK)); if (bytes_read < 0) { err = bytes_read; - goto out_unlock; + goto out_unlock_reading; } data_avail = bytes_read; } @@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (!data_avail) { if (filp->f_flags & O_NONBLOCK) { err = -EAGAIN; - goto out_unlock; + goto out_unlock_reading; } } else { len = data_avail; @@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (copy_to_user(buf + ret, rng_buffer + data_avail, len)) { err = -EFAULT; - goto out_unlock; + goto out_unlock_reading; } size -= len; @@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } mutex_unlock(&rng_mutex); + mutex_unlock(&reading_mutex); if (need_resched()) schedule_timeout_interruptible(1); @@ -208,6 +216,9 @@ out: out_unlock: mutex_unlock(&rng_mutex); goto out; +out_unlock_reading: + mutex_unlock(&reading_mutex); + goto out_unlock; } @@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused) while (!kthread_should_stop()) { if (!current_rng) break; + mutex_lock(&reading_mutex); rc = rng_get_data(current_rng, rng_fillbuf, rng_buffer_size(), 1); + mutex_unlock(&reading_mutex); if (rc <= 0) { pr_warn("hwrng: no data available\n"); msleep_interruptible(10000); continue; } + /* Outside lock, sure, but y'know: randomness. */ add_hwgenerator_randomness((void *)rng_fillbuf, rc, rc * current_quality * 8 >> 10); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
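The lock split in this patch can be sketched in userspace (hypothetical names mirroring the patch's `rng_mutex`/`reading_mutex`; not the driver code itself): one mutex guards the cheap bookkeeping the sysfs handlers need, a second guards the potentially slow read path and its buffers, so an attribute read never waits behind a blocked device read.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t rng_mutex     = PTHREAD_MUTEX_INITIALIZER; /* current rng */
static pthread_mutex_t reading_mutex = PTHREAD_MUTEX_INITIALIZER; /* buffers + read path */

static char rng_buffer[64];
static int data_avail;
static const char *current_name = "virtio_rng.0";   /* invented device name */

/* Slow path: may block for a long time in the device read, but holds
 * only reading_mutex while doing so. */
int fill_from_device(const char *fake_device_bytes, int n)
{
    pthread_mutex_lock(&reading_mutex);
    memcpy(rng_buffer, fake_device_bytes, (size_t)n);
    data_avail = n;
    pthread_mutex_unlock(&reading_mutex);
    return n;
}

/* Fast path (sysfs show): holds only rng_mutex, so it is never stuck
 * behind a blocked fill_from_device() call. */
int show_rng_current(char *buf, size_t len)
{
    int ret;

    pthread_mutex_lock(&rng_mutex);
    ret = snprintf(buf, len, "%s\n", current_name);
    pthread_mutex_unlock(&rng_mutex);
    return ret;
}
```

In the unpatched driver both paths contend for the single `rng_mutex`, which is exactly why `cat` on the sysfs attributes stalls while `dd` is reading.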
* [PATCH 2/5] hw_random: use reference counts on each struct hwrng. 2014-09-18 2:48 ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell @ 2014-09-18 2:48 ` Rusty Russell 2014-09-18 12:22 ` Amos Kong 2014-09-18 2:48 ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell ` (2 subsequent siblings) 3 siblings, 1 reply; 20+ messages in thread From: Rusty Russell @ 2014-09-18 2:48 UTC (permalink / raw) To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah, linux-kernel Cc: Rusty Russell current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. 
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> --- drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++-------------- include/linux/hw_random.h | 2 + 2 files changed, 94 insertions(+), 43 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index b1b6042ad85c..dc9092a1075d 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -42,6 +42,7 @@ #include <linux/delay.h> #include <linux/slab.h> #include <linux/random.h> +#include <linux/err.h> #include <asm/uaccess.h> @@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng) add_device_randomness(bytes, bytes_read); } +static inline void cleanup_rng(struct kref *kref) +{ + struct hwrng *rng = container_of(kref, struct hwrng, ref); + + if (rng->cleanup) + rng->cleanup(rng); +} + +static void set_current_rng(struct hwrng *rng) +{ + BUG_ON(!mutex_is_locked(&rng_mutex)); + kref_get(&rng->ref); + current_rng = rng; +} + +static void drop_current_rng(void) +{ + BUG_ON(!mutex_is_locked(&rng_mutex)); + if (!current_rng) + return; + + kref_put(¤t_rng->ref, cleanup_rng); + current_rng = NULL; +} + +/* Returns ERR_PTR(), NULL or refcounted hwrng */ +static struct hwrng *get_current_rng(void) +{ + struct hwrng *rng; + + if (mutex_lock_interruptible(&rng_mutex)) + return ERR_PTR(-ERESTARTSYS); + + rng = current_rng; + if (rng) + kref_get(&rng->ref); + + mutex_unlock(&rng_mutex); + return rng; +} + +static void put_rng(struct hwrng *rng) +{ + /* + * Hold rng_mutex here so we serialize in case they set_current_rng + * on rng again immediately. 
+ */ + mutex_lock(&rng_mutex); + if (rng) + kref_put(&rng->ref, cleanup_rng); + mutex_unlock(&rng_mutex); +} + static inline int hwrng_init(struct hwrng *rng) { if (rng->init) { @@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng) return 0; } -static inline void hwrng_cleanup(struct hwrng *rng) -{ - if (rng && rng->cleanup) - rng->cleanup(rng); -} - static int rng_dev_open(struct inode *inode, struct file *filp) { /* enforce read-only access to this chrdev */ @@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ssize_t ret = 0; int err = 0; int bytes_read, len; + struct hwrng *rng; while (size) { - if (mutex_lock_interruptible(&rng_mutex)) { - err = -ERESTARTSYS; + rng = get_current_rng(); + if (IS_ERR(rng)) { + err = PTR_ERR(rng); goto out; } - - if (!current_rng) { + if (!rng) { err = -ENODEV; - goto out_unlock; + goto out; } mutex_lock(&reading_mutex); if (!data_avail) { - bytes_read = rng_get_data(current_rng, rng_buffer, + bytes_read = rng_get_data(rng, rng_buffer, rng_buffer_size(), !(filp->f_flags & O_NONBLOCK)); if (bytes_read < 0) { @@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ret += len; } - mutex_unlock(&rng_mutex); mutex_unlock(&reading_mutex); if (need_resched()) @@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, err = -ERESTARTSYS; goto out; } + + put_rng(rng); } out: return ret ? 
: err; -out_unlock: - mutex_unlock(&rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(&reading_mutex); - goto out_unlock; + put_rng(rng); + goto out; } @@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev, err = hwrng_init(rng); if (err) break; - hwrng_cleanup(current_rng); - current_rng = rng; + drop_current_rng(); + set_current_rng(rng); err = 0; break; } @@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev, struct device_attribute *attr, char *buf) { - int err; ssize_t ret; - const char *name = "none"; + struct hwrng *rng; - err = mutex_lock_interruptible(&rng_mutex); - if (err) - return -ERESTARTSYS; - if (current_rng) - name = current_rng->name; - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); - mutex_unlock(&rng_mutex); + rng = get_current_rng(); + if (IS_ERR(rng)) + return PTR_ERR(rng); + + ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none"); + put_rng(rng); return ret; } @@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused) long rc; while (!kthread_should_stop()) { - if (!current_rng) + struct hwrng *rng; + + rng = get_current_rng(); + if (IS_ERR(rng) || !rng) break; mutex_lock(&reading_mutex); - rc = rng_get_data(current_rng, rng_fillbuf, + rc = rng_get_data(rng, rng_fillbuf, rng_buffer_size(), 1); mutex_unlock(&reading_mutex); + put_rng(rng); if (rc <= 0) { pr_warn("hwrng: no data available\n"); msleep_interruptible(10000); @@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng) err = hwrng_init(rng); if (err) goto out_unlock; - current_rng = rng; + set_current_rng(rng); } err = 0; if (!old_rng) { err = register_miscdev(); if (err) { - hwrng_cleanup(rng); - current_rng = NULL; + drop_current_rng(); goto out_unlock; } } @@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register); void hwrng_unregister(struct hwrng *rng) { - int err; - mutex_lock(&rng_mutex); list_del(&rng->list); if (current_rng == rng) { - hwrng_cleanup(rng); - if (list_empty(&rng_list)) { - current_rng = 
NULL; - } else { - current_rng = list_entry(rng_list.prev, struct hwrng, list); - err = hwrng_init(current_rng); - if (err) - current_rng = NULL; + drop_current_rng(); + if (!list_empty(&rng_list)) { + struct hwrng *tail; + + tail = list_entry(rng_list.prev, struct hwrng, list); + + if (hwrng_init(tail) == 0) + set_current_rng(tail); } } + if (list_empty(&rng_list)) { unregister_miscdev(); if (hwrng_fill) diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index 914bb08cd738..c212e71ea886 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -14,6 +14,7 @@ #include <linux/types.h> #include <linux/list.h> +#include <linux/kref.h> /** * struct hwrng - Hardware Random Number Generator driver @@ -44,6 +45,7 @@ struct hwrng { /* internal. */ struct list_head list; + struct kref ref; }; /** Register a new Hardware Random Number Generator driver. */ -- 1.9.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
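The kref pattern this patch adopts can be sketched with a plain C11 atomic (a hypothetical userspace analogue — `kref_get`/`kref_put` in the kernel are the real API): each reader bumps the count under `rng_get()`, and the final `rng_put()` is the one that runs cleanup, so unregistration can never free an rng a reader still holds.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct fake_rng {
    atomic_int ref;       /* plays struct kref */
    bool cleaned_up;
};

/* Stands in for cleanup_rng(): runs exactly once, on the last put. */
static void fake_cleanup(struct fake_rng *r)
{
    r->cleaned_up = true;
}

void rng_get(struct fake_rng *r)
{
    atomic_fetch_add(&r->ref, 1);           /* like kref_get() */
}

void rng_put(struct fake_rng *r)
{
    /* like kref_put(&r->ref, cleanup_rng): cleanup on the 1 -> 0 edge */
    if (atomic_fetch_sub(&r->ref, 1) == 1)
        fake_cleanup(r);
}
```

The ordering matters: unregistration drops *its* reference, but cleanup only fires once the in-flight reader also drops the reference it took with `rng_get()`.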
* Re: [PATCH 2/5] hw_random: use reference counts on each struct hwrng. 2014-09-18 2:48 ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell @ 2014-09-18 12:22 ` Amos Kong 0 siblings, 0 replies; 20+ messages in thread From: Amos Kong @ 2014-09-18 12:22 UTC (permalink / raw) To: Rusty Russell Cc: virtualization, kvm, herbert, m, mpm, amit.shah, linux-kernel On Thu, Sep 18, 2014 at 12:18:23PM +0930, Rusty Russell wrote: > current_rng holds one reference, and we bump it every time we want > to do a read from it. > > This means we only hold the rng_mutex to grab or drop a reference, > so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't > block on read of /dev/hwrng. > > Using a kref is overkill (we're always under the rng_mutex), but > a standard pattern. > > This also solves the problem that the hwrng_fillfn thread was > accessing current_rng without a lock, which could change (eg. to NULL) > underneath it. Hi Rusty, > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> > --- > drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++-------------- > include/linux/hw_random.h | 2 + > 2 files changed, 94 insertions(+), 43 deletions(-) ... 
> static int rng_dev_open(struct inode *inode, struct file *filp) > { > /* enforce read-only access to this chrdev */ > @@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > ssize_t ret = 0; > int err = 0; > int bytes_read, len; > + struct hwrng *rng; > > while (size) { > - if (mutex_lock_interruptible(&rng_mutex)) { > - err = -ERESTARTSYS; > + rng = get_current_rng(); > + if (IS_ERR(rng)) { > + err = PTR_ERR(rng); > goto out; > } > - > - if (!current_rng) { > + if (!rng) { > err = -ENODEV; > - goto out_unlock; > + goto out; > } > > mutex_lock(&reading_mutex); > if (!data_avail) { > - bytes_read = rng_get_data(current_rng, rng_buffer, > + bytes_read = rng_get_data(rng, rng_buffer, > rng_buffer_size(), > !(filp->f_flags & O_NONBLOCK)); > if (bytes_read < 0) { > @@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > ret += len; > } > > - mutex_unlock(&rng_mutex); > mutex_unlock(&reading_mutex); > > if (need_resched()) > @@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > err = -ERESTARTSYS; We need put_rng() in this error path. Otherwise, unhotplug will hang in the end of hwrng_unregister() | /* Just in case rng is reading right now, wait. */ | wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0); Steps to reproduce the hang: guest) # dd if=/dev/hwrng of=/dev/null cancel dd process after 10 seconds guest) # dd if=/dev/hwrng of=/dev/null & hotunplug rng device from qemu monitor result: device can't be removed (still can find in QEMU monitor) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 96fa067..4e22d70 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -258,6 +258,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (signal_pending(current)) { err = -ERESTARTSYS; + put_rng(rng); goto out; } > goto out; > } > + > + put_rng(rng); > } > out: > return ret ? 
: err; > -out_unlock: > - mutex_unlock(&rng_mutex); > - goto out; > + > out_unlock_reading: > mutex_unlock(&reading_mutex); > - goto out_unlock; > + put_rng(rng); > + goto out; > } > > > @@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev, > err = hwrng_init(rng); > if (err) > break; > - hwrng_cleanup(current_rng); > - current_rng = rng; > + drop_current_rng(); > + set_current_rng(rng); > err = 0; > break; > } > @@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev, > struct device_attribute *attr, > char *buf) > { > - int err; > ssize_t ret; > - const char *name = "none"; > + struct hwrng *rng; > > - err = mutex_lock_interruptible(&rng_mutex); > - if (err) > - return -ERESTARTSYS; > - if (current_rng) > - name = current_rng->name; > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > - mutex_unlock(&rng_mutex); > + rng = get_current_rng(); > + if (IS_ERR(rng)) > + return PTR_ERR(rng); > + > + ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none"); > + put_rng(rng); > > return ret; > } > @@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused) > long rc; > > while (!kthread_should_stop()) { > - if (!current_rng) > + struct hwrng *rng; > + > + rng = get_current_rng(); > + if (IS_ERR(rng) || !rng) > break; > mutex_lock(&reading_mutex); > - rc = rng_get_data(current_rng, rng_fillbuf, > + rc = rng_get_data(rng, rng_fillbuf, > rng_buffer_size(), 1); > mutex_unlock(&reading_mutex); > + put_rng(rng); ^^^ This put_rng() called a deadlock. I will describe in the bottom. 
>  		if (rc <= 0) {
>  			pr_warn("hwrng: no data available\n");
>  			msleep_interruptible(10000);
> @@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng)
>  		err = hwrng_init(rng);
>  		if (err)
>  			goto out_unlock;
> -		current_rng = rng;
> +		set_current_rng(rng);
>  	}
>  	err = 0;
>  	if (!old_rng) {
>  		err = register_miscdev();
>  		if (err) {
> -			hwrng_cleanup(rng);
> -			current_rng = NULL;
> +			drop_current_rng();
>  			goto out_unlock;
>  		}
>  	}
> @@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);
>
>  void hwrng_unregister(struct hwrng *rng)
>  {
> -	int err;
> -
>  	mutex_lock(&rng_mutex);
>
>  	list_del(&rng->list);
>  	if (current_rng == rng) {
> -		hwrng_cleanup(rng);
> -		if (list_empty(&rng_list)) {
> -			current_rng = NULL;
> -		} else {
> -			current_rng = list_entry(rng_list.prev, struct hwrng, list);
> -			err = hwrng_init(current_rng);
> -			if (err)
> -				current_rng = NULL;
> +		drop_current_rng();
> +		if (!list_empty(&rng_list)) {
> +			struct hwrng *tail;
> +
> +			tail = list_entry(rng_list.prev, struct hwrng, list);
> +
> +			if (hwrng_init(tail) == 0)
> +				set_current_rng(tail);
>  		}
>  	}
> +
>  	if (list_empty(&rng_list)) {
>  		unregister_miscdev();
>  		if (hwrng_fill)

Both hwrng_unregister() and put_rng() grab the lock; if hwrng_unregister()
takes it first, hwrng_fillfn() will block at put_rng() waiting for the lock.
Right now, kthread_stop() is inside the lock protection, but it tries to
wake up the fillfn thread and wait for its completion:

| wake_up_process(k);
| wait_for_completion(&kthread->exited);

The solution is moving kthread_stop() outside of the lock protection.

@@ -524,11 +525,11 @@ void hwrng_unregister(struct hwrng *rng)

 	if (list_empty(&rng_list)) {
 		unregister_miscdev();
+		mutex_unlock(&rng_mutex);
 		if (hwrng_fill)
 			kthread_stop(hwrng_fill);
-	}
-
-	mutex_unlock(&rng_mutex);
+	} else
+		mutex_unlock(&rng_mutex);

 	/* Just in case rng is reading right now, wait.
 */
	wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

================

After applying my additional two fixes, both the cat hang and the
hot-unplug issue are resolved.

| test 0:
| hotunplug rng device from qemu monitor
|
| test 1:
| guest) # dd if=/dev/hwrng of=/dev/null &
| hotunplug rng device from qemu monitor
|
| test 2:
| guest) # dd if=/dev/random of=/dev/null &
| hotunplug rng device from qemu monitor
|
| test 4:
| guest) # dd if=/dev/hwrng of=/dev/null &
| cat /sys/devices/virtual/misc/hw_random/rng_*
|
| test 5:
| guest) # dd if=/dev/hwrng of=/dev/null
| cancel dd process after 10 seconds
| guest) # dd if=/dev/hwrng of=/dev/null &
| hotunplug rng device from qemu monitor
|
| test 6:
| use a fifo as rng backend, execute test 0 ~ 5 with no input of fifo

All tests passed :-)

I know you are starting (or have already started) your holiday; I will
post a v2 with my additional patches.

Thanks, Amos

> diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
> index 914bb08cd738..c212e71ea886 100644
> --- a/include/linux/hw_random.h
> +++ b/include/linux/hw_random.h
> @@ -14,6 +14,7 @@
>
>  #include <linux/types.h>
>  #include <linux/list.h>
> +#include <linux/kref.h>
>
>  /**
>   * struct hwrng - Hardware Random Number Generator driver
> @@ -44,6 +45,7 @@ struct hwrng {
>
>  	/* internal. */
>  	struct list_head list;
> +	struct kref ref;
>  };
>
>  /** Register a new Hardware Random Number Generator driver. */
> --
> 1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread
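The deadlock Amos describes — stopping a thread that needs the very lock you are holding — has a direct userspace analogue (hypothetical names; `pthread_join` plays `kthread_stop`, which also waits for the thread to finish). The fix is the same shape as his patch: drop the lock before waiting.

```c
#include <pthread.h>

static pthread_mutex_t rng_mutex = PTHREAD_MUTEX_INITIALIZER;
static int fill_thread_exited;

/* Plays hwrng_fillfn(): its exit path needs rng_mutex (via put_rng). */
static void *fillfn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rng_mutex);
    fill_thread_exited = 1;
    pthread_mutex_unlock(&rng_mutex);
    return NULL;
}

/* Plays hwrng_unregister() with Amos's fix applied: the join (kthread_stop)
 * happens only after rng_mutex has been released, so the fill thread can
 * make progress and actually exit. */
int unregister_safely(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, fillfn, NULL) != 0)
        return -1;
    pthread_mutex_lock(&rng_mutex);
    /* ... list_del(), drop_current_rng(), unregister_miscdev() ... */
    pthread_mutex_unlock(&rng_mutex);   /* drop the lock BEFORE stopping */
    pthread_join(t, NULL);              /* plays kthread_stop(hwrng_fill) */
    return fill_thread_exited;
}
```

Swapping the unlock and the join reproduces the hang: the joiner waits for a thread that is itself waiting for the joiner's lock.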
* [PATCH 3/5] hw_random: fix unregister race. 2014-09-18 2:48 ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell 2014-09-18 2:48 ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell @ 2014-09-18 2:48 ` Rusty Russell 2014-10-21 14:15 ` Herbert Xu 2014-09-18 2:48 ` [PATCH 4/5] hw_random: don't double-check old_rng Rusty Russell 2014-09-18 2:48 ` [PATCH 5/5] hw_random: don't init list element we're about to add to list Rusty Russell 3 siblings, 1 reply; 20+ messages in thread From: Rusty Russell @ 2014-09-18 2:48 UTC (permalink / raw) To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah, linux-kernel Cc: Rusty Russell The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> --- drivers/char/hw_random/core.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index dc9092a1075d..b4a21e9521cf 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex); static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; +static DECLARE_WAIT_QUEUE_HEAD(rng_done); static unsigned short current_quality; static unsigned short default_quality; /* = 0; default to "off" */ @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref) if (rng->cleanup) rng->cleanup(rng); + wake_up_all(&rng_done); } static void set_current_rng(struct hwrng *rng) @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng) } mutex_unlock(&rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0); } EXPORT_SYMBOL_GPL(hwrng_unregister); -- 1.9.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] hw_random: fix unregister race. 2014-09-18 2:48 ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell @ 2014-10-21 14:15 ` Herbert Xu 2014-11-03 15:24 ` Amos Kong 0 siblings, 1 reply; 20+ messages in thread From: Herbert Xu @ 2014-10-21 14:15 UTC (permalink / raw) To: Rusty Russell Cc: Amos Kong, virtualization, kvm, m, mb, mpm, amit.shah, linux-kernel On Thu, Sep 18, 2014 at 12:18:24PM +0930, Rusty Russell wrote: > The previous patch added one potential problem: we can still be > reading from a hwrng when it's unregistered. Add a wait for zero > in the hwrng_unregister path. > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> > --- > drivers/char/hw_random/core.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index dc9092a1075d..b4a21e9521cf 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex); > static DEFINE_MUTEX(reading_mutex); > static int data_avail; > static u8 *rng_buffer, *rng_fillbuf; > +static DECLARE_WAIT_QUEUE_HEAD(rng_done); > static unsigned short current_quality; > static unsigned short default_quality; /* = 0; default to "off" */ > > @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref) > > if (rng->cleanup) > rng->cleanup(rng); > + wake_up_all(&rng_done); > } > > static void set_current_rng(struct hwrng *rng) > @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng) > } > > mutex_unlock(&rng_mutex); > + > + /* Just in case rng is reading right now, wait. */ > + wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0); While it's obviously better than what we have now, I don't believe this is 100% safe as the cleanup function might still be running even after the ref count hits zero. Once we return from this function the module may be unloaded so we need to ensure that nothing is running at this point. 
Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/5] hw_random: fix unregister race.
2014-10-21 14:15 ` Herbert Xu @ 2014-11-03 15:24 ` Amos Kong 0 siblings, 0 replies; 20+ messages in thread
From: Amos Kong @ 2014-11-03 15:24 UTC (permalink / raw)
To: Herbert Xu
Cc: Rusty Russell, virtualization, kvm, m, mb, mpm, amit.shah, linux-kernel

On Tue, Oct 21, 2014 at 10:15:23PM +0800, Herbert Xu wrote:
> On Thu, Sep 18, 2014 at 12:18:24PM +0930, Rusty Russell wrote:
> > The previous patch added one potential problem: we can still be
> > reading from a hwrng when it's unregistered. Add a wait for zero
> > in the hwrng_unregister path.
> >
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > ---
> >  drivers/char/hw_random/core.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index dc9092a1075d..b4a21e9521cf 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
> >  static DEFINE_MUTEX(reading_mutex);
> >  static int data_avail;
> >  static u8 *rng_buffer, *rng_fillbuf;
> > +static DECLARE_WAIT_QUEUE_HEAD(rng_done);
> >  static unsigned short current_quality;
> >  static unsigned short default_quality; /* = 0; default to "off" */
> >
> > @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
> >
> >  	if (rng->cleanup)
> >  		rng->cleanup(rng);

	rng->cleanup_done = true;

> > +	wake_up_all(&rng_done);
> >  }
> >
> >  static void set_current_rng(struct hwrng *rng)
> > @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
> >  }
> >
> >  	mutex_unlock(&rng_mutex);
> > +
> > +	/* Just in case rng is reading right now, wait. */
> > +	wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

Hi Rusty,

After initializing (kref_init()), the refcount is 1, so we need one more kref_put() after each drop_current_rng() to release the last reference; the cleanup function will then be called.
> While it's obviously better than what we have now, I don't believe
> this is 100% safe as the cleanup function might still be running
> even after the ref count hits zero. Once we return from this function
> the module may be unloaded so we need to ensure that nothing is
> running at this point.

I found that wait_event() can still pass and finish the unregister even when the cleanup function hasn't been called (wake_up_all() isn't called). So I added a flag, cleanup_done, to indicate that the rng device has been cleaned up.

+	/* Just in case rng is reading right now, wait. */
+	wait_event(rng_done, rng->cleanup_done &&
+		   atomic_read(&rng->ref.refcount) == 0);

I will post the new v4 later.

> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

--
Amos.

^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/5] hw_random: don't double-check old_rng. 2014-09-18 2:48 ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell 2014-09-18 2:48 ` [PATCH 2/5] hw_random: use reference counts on each struct hwrng Rusty Russell 2014-09-18 2:48 ` [PATCH 3/5] hw_random: fix unregister race Rusty Russell @ 2014-09-18 2:48 ` Rusty Russell 2014-09-18 2:48 ` [PATCH 5/5] hw_random: don't init list element we're about to add to list Rusty Russell 3 siblings, 0 replies; 20+ messages in thread From: Rusty Russell @ 2014-09-18 2:48 UTC (permalink / raw) To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah, linux-kernel Cc: Rusty Russell Interesting anti-pattern. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> --- drivers/char/hw_random/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index b4a21e9521cf..6a34feca6b43 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -472,14 +472,13 @@ int hwrng_register(struct hwrng *rng) } old_rng = current_rng; + err = 0; if (!old_rng) { err = hwrng_init(rng); if (err) goto out_unlock; set_current_rng(rng); - } - err = 0; - if (!old_rng) { + err = register_miscdev(); if (err) { drop_current_rng(); -- 1.9.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 5/5] hw_random: don't init list element we're about to add to list. 2014-09-18 2:48 ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell ` (2 preceding siblings ...) 2014-09-18 2:48 ` [PATCH 4/5] hw_random: don't double-check old_rng Rusty Russell @ 2014-09-18 2:48 ` Rusty Russell 3 siblings, 0 replies; 20+ messages in thread From: Rusty Russell @ 2014-09-18 2:48 UTC (permalink / raw) To: Amos Kong, virtualization, kvm, herbert, m, mb, mpm, amit.shah, linux-kernel Cc: Rusty Russell Another interesting anti-pattern. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> --- drivers/char/hw_random/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 6a34feca6b43..96fa06716e95 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng) goto out_unlock; } } - INIT_LIST_HEAD(&rng->list); list_add_tail(&rng->list, &rng_list); if (old_rng && !rng->init) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes 2014-09-18 2:43 ` Rusty Russell 2014-09-18 2:48 ` [PATCH 1/5] hw_random: place mutex around read functions and buffers Rusty Russell @ 2014-09-18 12:47 ` Amos Kong 1 sibling, 0 replies; 20+ messages in thread From: Amos Kong @ 2014-09-18 12:47 UTC (permalink / raw) To: Rusty Russell Cc: virtualization, kvm, herbert, m, mb, mpm, amit.shah, linux-kernel, Linus Torvalds On Thu, Sep 18, 2014 at 12:13:08PM +0930, Rusty Russell wrote: > Amos Kong <akong@redhat.com> writes: > > > I started a QEMU (non-smp) guest with one virtio-rng device, and read > > random data from /dev/hwrng by dd: > > > > # dd if=/dev/hwrng of=/dev/null & > > > > In the same time, if I check hwrng attributes from sysfs by cat: > > > > # cat /sys/class/misc/hw_random/rng_* > > > > The cat process always gets stuck with slow backend (5 k/s), if we > > use a quick backend (1.2 M/s), the cat process will cost 1 to 2 > > minutes. The stuck doesn't exist for smp guest. > > > > Reading syscall enters kernel and call rng_dev_read(), it's user > > context. We used need_resched() to check if other tasks need to > > be run, but it almost always return false, and re-hold the mutex > > lock. The attributes accessing process always fails to hold the > > lock, so the cat gets stuck. > > > > User context doesn't allow other user contexts run on that CPU, > > unless the kernel code sleeps for some reason. This is why the > > need_reshed() always return false here. > > > > This patch removed need_resched() and always schedule other tasks > > then other tasks can have chance to hold the lock and execute > > protected code. Hi Rusty, > OK, this is going to be a rant. > > Your explanation doesn't make sense at all. Worse, your solution breaks > the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite > it.". 
>
> But worst of all, this detailed explanation might have convinced me you
> understood the problem better than I did, and applied your patch.

I'm sorry about the misleading explanation.

> I did some tests. For me, as expected, the process spends its time
> inside the virtio rng read function, holding the mutex and thus blocking
> sysfs access; it's not a failure of this code at all.

Got it now. The cat hang bug was found while I was trying to fix the hot-unplug issue; the hot-unplug issue can't be reproduced when I try to debug with gdb or printk. So I forgot to debug the cat hang ... and instead spent time misunderstanding the scheduler code :(

> Your schedule_timeout() "fix" probably just helps by letting the host
> refresh entropy, so we spend less time waiting in the read fn.
>
> I will post a series, which unfortunately is only lightly tested, then
> I'm going to have some beer to begin my holiday. That may help me
> forget my disappointment at seeing respected fellow developers
> monkey-patching random code they don't understand.

I just posted a v2 with two additional fixes; hot-unplugging works well now :)

> Grrr....

Enjoy your holiday!

Amos

> Rusty.
>
> > Signed-off-by: Amos Kong <akong@redhat.com>
> > ---
> >  drivers/char/hw_random/core.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index c591d7e..263a370 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
> >
> >  	mutex_unlock(&rng_mutex);
> >
> > -	if (need_resched())
> > -		schedule_timeout_interruptible(1);
> > +	schedule_timeout_interruptible(1);
> >
> >  	if (signal_pending(current)) {
> >  		err = -ERESTARTSYS;
> > --
> > 1.9.3

--
Amos.

^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() 2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong 2014-09-15 16:02 ` [PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection Amos Kong 2014-09-15 16:02 ` [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes Amos Kong @ 2014-09-15 16:02 ` Amos Kong 2014-09-15 16:13 ` Michael Büsch 2014-09-17 9:30 ` [PATCH v2 0/3] fix stuck in accessing hwrng attributes Herbert Xu 3 siblings, 1 reply; 20+ messages in thread From: Amos Kong @ 2014-09-15 16:02 UTC (permalink / raw) To: virtualization; +Cc: kvm, herbert, m, mb, mpm, rusty, amit.shah, linux-kernel This patch increases the schedule timeout to 10 jiffies, it's more appropriate, then other takes can easy to hold the mutex lock. Signed-off-by: Amos Kong <akong@redhat.com> --- drivers/char/hw_random/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 263a370..b5d1b6f 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(&rng_mutex); - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(10); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() 2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong @ 2014-09-15 16:13 ` Michael Büsch 2014-09-16 0:27 ` Amos Kong 0 siblings, 1 reply; 20+ messages in thread From: Michael Büsch @ 2014-09-15 16:13 UTC (permalink / raw) To: Amos Kong Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel [-- Attachment #1: Type: text/plain, Size: 946 bytes --] On Tue, 16 Sep 2014 00:02:29 +0800 Amos Kong <akong@redhat.com> wrote: > This patch increases the schedule timeout to 10 jiffies, it's more > appropriate, then other takes can easy to hold the mutex lock. > > Signed-off-by: Amos Kong <akong@redhat.com> > --- > drivers/char/hw_random/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index 263a370..b5d1b6f 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > mutex_unlock(&rng_mutex); > > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(10); > > if (signal_pending(current)) { > err = -ERESTARTSYS; Does a schedule of 1 ms or 10 ms decrease the throughput? I think we need some benchmarks. -- Michael [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()
2014-09-15 16:13 ` Michael Büsch @ 2014-09-16 0:27 ` Amos Kong 2014-09-16 15:01 ` Michael Büsch 0 siblings, 1 reply; 20+ messages in thread
From: Amos Kong @ 2014-09-16 0:27 UTC (permalink / raw)
To: Michael Büsch
Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1930 bytes --]

On Mon, Sep 15, 2014 at 06:13:31PM +0200, Michael Büsch wrote:
> On Tue, 16 Sep 2014 00:02:29 +0800
> Amos Kong <akong@redhat.com> wrote:
>
> > This patch increases the schedule timeout to 10 jiffies, it's more
> > appropriate, then other takes can easy to hold the mutex lock.
> >
> > Signed-off-by: Amos Kong <akong@redhat.com>
> > ---
> >  drivers/char/hw_random/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index 263a370..b5d1b6f 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
> >
> >  	mutex_unlock(&rng_mutex);
> >
> > -	schedule_timeout_interruptible(1);
> > +	schedule_timeout_interruptible(10);
> >
> >  	if (signal_pending(current)) {
> >  		err = -ERESTARTSYS;
>
> Does a schedule of 1 ms or 10 ms decrease the throughput?

In my test environment, 1 jiffy always works (100%); as suggested by Amit, 10 jiffies is more appropriate.

After applying the current 3 patches, there is a throughput regression: 1.2 M/s -> 6 K/s

We can schedule only at the end of the loop (size == 0), and only for a non-smp guest, so smp guests won't be affected.
| if (!size && num_online_cpus() == 1)
| 	schedule_timeout_interruptible(timeout);

Set timeout to 1:
  non-smp guest with quick backend (1.2 M/s) -> about 49 K/s

Set timeout to 10:
  non-smp guest with quick backend (1.2 M/s) -> about 490 K/s

We might need other benchmarks to test the performance, but we can see this change clearly caused a regression.

As we discussed in the other thread, need_resched() should work in this case, so those patches might be the wrong fix.

> I think we need some benchmarks.
>
> --
> Michael

--
Amos.

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() 2014-09-16 0:27 ` Amos Kong @ 2014-09-16 15:01 ` Michael Büsch 0 siblings, 0 replies; 20+ messages in thread From: Michael Büsch @ 2014-09-16 15:01 UTC (permalink / raw) To: Amos Kong Cc: virtualization, kvm, herbert, mpm, rusty, amit.shah, linux-kernel [-- Attachment #1: Type: text/plain, Size: 283 bytes --] On Tue, 16 Sep 2014 08:27:40 +0800 Amos Kong <akong@redhat.com> wrote: > Set timeout to 10: > non-smp guest with quick backend (1.2M/s) -> about 490K/s) That sounds like an awful lot. This is a 60% loss in throughput. I don't think we can live with that. -- Michael [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/3] fix stuck in accessing hwrng attributes 2014-09-15 16:02 [PATCH v2 0/3] fix stuck in accessing hwrng attributes Amos Kong ` (2 preceding siblings ...) 2014-09-15 16:02 ` [PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read() Amos Kong @ 2014-09-17 9:30 ` Herbert Xu 3 siblings, 0 replies; 20+ messages in thread From: Herbert Xu @ 2014-09-17 9:30 UTC (permalink / raw) To: Amos Kong; +Cc: virtualization, kvm, m, mb, mpm, rusty, amit.shah, linux-kernel On Tue, Sep 16, 2014 at 12:02:26AM +0800, Amos Kong wrote: > If we read hwrng by long-running dd process, it takes too much cpu > time and almost hold the mutex lock. When we check hwrng attributes > from sysfs by cat, it gets stuck in waiting the lock releaseing. > The problem can only be reproduced with non-smp guest with slow backend. > > This patchset resolves the issue by changing rng_dev_read() to always > schedule 10 jiffies after release mutex lock, then cat process can > have chance to get the lock and execute protected code without stuck. Sorry I'm not going to accept your fix which simply papers over the problem. Please bite the bullet and convert this over to RCU. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 20+ messages in thread