[PATCH 06/10] audit: Factor out chunk replacement code
Jan Kara
2018-07-10 10:02:12 UTC
The chunk replacement code is very similar for the cases where we grow
or shrink a chunk. Factor the code out into a common helper function.

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
1 file changed, 40 insertions(+), 46 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6a01738d1ac2..f419fdfc25b4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}

+static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
+ struct node *skip)
+{
+ struct audit_tree *owner;
+ int i, j;
+
+ new->key = old->key;
+ list_replace_init(&old->trees, &new->trees);
+ list_for_each_entry(owner, &new->trees, same_root)
+ owner->root = new;
+ for (i = j = 0; j < old->count; i++, j++) {
+ if (&old->owners[j] == skip) {
+ i--;
+ continue;
+ }
+ owner = old->owners[j].owner;
+ new->owners[i].owner = owner;
+ new->owners[i].index = old->owners[j].index - j + i;
+ if (!owner) /* result of earlier fallback */
+ continue;
+ get_tree(owner);
+ list_replace_init(&old->owners[j].list, &new->owners[i].list);
+ }
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
+ list_replace_rcu(&old->hash, &new->hash);
+}
+
static void untag_chunk(struct node *p)
{
struct audit_chunk *chunk = find_chunk(p);
@@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
struct audit_chunk *new = NULL;
struct audit_tree *owner;
int size = chunk->count - 1;
- int i, j;

fsnotify_get_mark(entry);

@@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)

chunk->dead = 1;
spin_lock(&hash_lock);
- new->key = chunk->key;
- list_replace_init(&chunk->trees, &new->trees);
if (owner->root == chunk) {
list_del_init(&owner->same_root);
owner->root = NULL;
}
-
- for (i = j = 0; j <= size; i++, j++) {
- struct audit_tree *s;
- if (&chunk->owners[j] == p) {
- list_del_init(&p->list);
- i--;
- continue;
- }
- s = chunk->owners[j].owner;
- new->owners[i].owner = s;
- new->owners[i].index = chunk->owners[j].index - j + i;
- if (!s) /* result of earlier fallback */
- continue;
- get_tree(s);
- list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
- }
-
- list_for_each_entry(owner, &new->trees, same_root)
- owner->root = new;
+ list_del_init(&p->list);
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&chunk->hash, &new->hash);
+ replace_chunk(new, chunk, p);
spin_unlock(&hash_lock);
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
@@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
static int tag_chunk(struct inode *inode, struct audit_tree *tree)
{
struct fsnotify_mark *old_entry, *chunk_entry;
- struct audit_tree *owner;
struct audit_chunk *chunk, *old;
struct node *p;
int n;
@@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
fsnotify_put_mark(old_entry);
return 0;
}
- chunk->key = old->key;
- list_replace_init(&old->trees, &chunk->trees);
- for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
- struct audit_tree *s = old->owners[n].owner;
- p->owner = s;
- p->index = old->owners[n].index;
- if (!s) /* result of fallback in untag */
- continue;
- get_tree(s);
- list_replace_init(&old->owners[n].list, &p->list);
- }
+ p = &chunk->owners[chunk->count - 1];
p->index = (chunk->count - 1) | (1U<<31);
p->owner = tree;
get_tree(tree);
list_add(&p->list, &tree->chunks);
- list_for_each_entry(owner, &chunk->trees, same_root)
- owner->root = chunk;
old->dead = 1;
if (!tree->root) {
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
}
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&old->hash, &chunk->hash);
+ replace_chunk(chunk, old, NULL);
spin_unlock(&hash_lock);
fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
--
2.16.4
Jan Kara
2018-07-10 10:02:08 UTC
When an inode is tagged with a tree, tag_chunk() checks whether there is
an audit_tree_group mark attached to the inode and adds one if not.
However, nothing prevents another tag_chunk() from adding the mark
between the check and our attempt to add the fsnotify mark, which
results in an error from fsnotify_add_mark() and consequently an ENOSPC
error from tag_chunk().

Fix the problem by holding mark_mutex over the whole check-insert code
sequence.
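
A condensed sketch of the fixed check-insert sequence (taken from the
patch below, error handling omitted):

  mutex_lock(&audit_tree_group->mark_mutex);
  old_entry = fsnotify_find_mark(&inode->i_fsnotify_marks,
                                 audit_tree_group);
  if (!old_entry)
          /* nobody can add a mark behind our back now */
          return create_chunk(inode, tree);  /* releases mark_mutex */
  /* ... tag the chunk attached to the existing mark ... */
  mutex_unlock(&audit_tree_group->mark_mutex);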

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 1c82eb6674c4..de8d344d91b1 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -342,25 +342,29 @@ static void untag_chunk(struct node *p)
spin_lock(&hash_lock);
}

+/* Call with group->mark_mutex held, releases it */
static int create_chunk(struct inode *inode, struct audit_tree *tree)
{
struct fsnotify_mark *entry;
struct audit_chunk *chunk = alloc_chunk(1);
- if (!chunk)
+
+ if (!chunk) {
+ mutex_unlock(&audit_tree_group->mark_mutex);
return -ENOMEM;
+ }

entry = &chunk->mark;
- if (fsnotify_add_inode_mark(entry, inode, 0)) {
+ if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(entry);
return -ENOSPC;
}

- mutex_lock(&entry->group->mark_mutex);
spin_lock(&hash_lock);
if (tree->goner) {
spin_unlock(&hash_lock);
chunk->dead = 1;
- mutex_unlock(&entry->group->mark_mutex);
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_destroy_mark(entry, audit_tree_group);
fsnotify_put_mark(entry);
return 0;
@@ -375,7 +379,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
}
insert_hash(chunk);
spin_unlock(&hash_lock);
- mutex_unlock(&entry->group->mark_mutex);
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(entry); /* drop initial reference */
return 0;
}
@@ -389,6 +393,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
struct node *p;
int n;

+ mutex_lock(&audit_tree_group->mark_mutex);
old_entry = fsnotify_find_mark(&inode->i_fsnotify_marks,
audit_tree_group);
if (!old_entry)
@@ -401,6 +406,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
for (n = 0; n < old->count; n++) {
if (old->owners[n].owner == tree) {
spin_unlock(&hash_lock);
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(old_entry);
return 0;
}
@@ -409,20 +415,20 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)

chunk = alloc_chunk(old->count + 1);
if (!chunk) {
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(old_entry);
return -ENOMEM;
}

chunk_entry = &chunk->mark;

- mutex_lock(&old_entry->group->mark_mutex);
/*
* mark_mutex protects mark from getting detached and thus also from
* mark->connector->obj getting NULL.
*/
if (!(old_entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
/* old_entry is being shot, lets just lie */
- mutex_unlock(&old_entry->group->mark_mutex);
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(old_entry);
fsnotify_put_mark(&chunk->mark);
return -ENOENT;
@@ -430,7 +436,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)

if (fsnotify_add_mark_locked(chunk_entry, old_entry->connector->obj,
FSNOTIFY_OBJ_TYPE_INODE, 1)) {
- mutex_unlock(&old_entry->group->mark_mutex);
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(chunk_entry);
fsnotify_put_mark(old_entry);
return -ENOSPC;
@@ -440,7 +446,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
if (tree->goner) {
spin_unlock(&hash_lock);
chunk->dead = 1;
- mutex_unlock(&old_entry->group->mark_mutex);
+ mutex_unlock(&audit_tree_group->mark_mutex);

fsnotify_destroy_mark(chunk_entry, audit_tree_group);

@@ -471,7 +477,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
list_add(&tree->same_root, &chunk->trees);
}
spin_unlock(&hash_lock);
- mutex_unlock(&old_entry->group->mark_mutex);
+ mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_destroy_mark(old_entry, audit_tree_group);
fsnotify_put_mark(chunk_entry); /* drop initial reference */
fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
--
2.16.4
Paul Moore
2018-07-27 04:47:10 UTC
Post by Jan Kara
When an inode is tagged with a tree, tag_chunk() checks whether there is
an audit_tree_group mark attached to the inode and adds one if not.
However, nothing prevents another tag_chunk() from adding the mark
between the check and our attempt to add the fsnotify mark, which
results in an error from fsnotify_add_mark() and consequently an ENOSPC
error from tag_chunk().
Fix the problem by holding mark_mutex over the whole check-insert code
sequence.
---
kernel/audit_tree.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
...
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 1c82eb6674c4..de8d344d91b1 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -342,25 +342,29 @@ static void untag_chunk(struct node *p)
spin_lock(&hash_lock);
}
+/* Call with group->mark_mutex held, releases it */
Stuff like that always makes me nervous. Could we defer releasing the
mutex to the caller, after create_chunk() returns? It looks like
fsnotify_destroy_mark() allows a single level of nesting so it should
be okay, yes?

--
paul moore
www.paul-moore.com
Jan Kara
2018-09-04 10:00:08 UTC
Post by Paul Moore
Post by Jan Kara
When an inode is tagged with a tree, tag_chunk() checks whether there is
an audit_tree_group mark attached to the inode and adds one if not.
However, nothing prevents another tag_chunk() from adding the mark
between the check and our attempt to add the fsnotify mark, which
results in an error from fsnotify_add_mark() and consequently an ENOSPC
error from tag_chunk().
Fix the problem by holding mark_mutex over the whole check-insert code
sequence.
---
kernel/audit_tree.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
...
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 1c82eb6674c4..de8d344d91b1 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -342,25 +342,29 @@ static void untag_chunk(struct node *p)
spin_lock(&hash_lock);
}
+/* Call with group->mark_mutex held, releases it */
Stuff like that always makes me nervous.
Yes, I also prefer to avoid stuff like this.
Post by Paul Moore
Could we defer releasing the mutex to the caller, after create_chunk()
returns? It looks like fsnotify_destroy_mark() allows a single level of
nesting so it should be okay, yes?
This won't work. fsnotify_destroy_mark() would try to acquire the same
mutex and block indefinitely (the nesting depth is there just for
lockdep so that you can nest the mark_mutexes of two different groups).
And even if we did more work and used separate fsnotify_detach_mark()
and fsnotify_free_mark() calls instead of fsnotify_destroy_mark(), the
problem would still be there, as fsnotify_free_mark() must not be called
under mark_mutex (it can acquire it in some cases).

So as much as I don't like functions that release locks they didn't
take, I don't see how to avoid that here without creating an even bigger
mess.
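
To make the failure mode concrete, a sketch of the suggested variant
(purely hypothetical, not actual code):

  /* caller */
  mutex_lock(&audit_tree_group->mark_mutex);
  err = create_chunk(inode, tree);        /* suppose it kept the mutex */
  if (err)
          /*
           * fsnotify_destroy_mark() acquires group->mark_mutex again,
           * so we would block forever on the mutex we already hold.
           */
          fsnotify_destroy_mark(entry, audit_tree_group);
  mutex_unlock(&audit_tree_group->mark_mutex);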

Honza
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Jan Kara
2018-07-10 10:02:07 UTC
Currently, the audit_tree code uses mark->lock to protect against a
mark being detached from an inode. In most places, however, it also uses
mark->group->mark_mutex (as we need to atomically replace attached
marks), and this provides protection against mark detaching as well. So
just remove the mark->lock protection from the audit tree code and
replace it with mark->group->mark_mutex protection in all the places.
This simplifies the code and gets rid of some ugly catches, like calling
fsnotify_add_mark_locked() with mark->lock held (which avoids sleeping
only because we hold a reference to another mark attached to the same
inode).
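
Condensed, the conversion in each path is:

  /* before: per-mark spinlock, so atomic context */
  spin_lock(&entry->lock);
  ...
  spin_unlock(&entry->lock);

  /* after: the group's mutex, which already pins the mark to its object */
  mutex_lock(&entry->group->mark_mutex);
  ...
  mutex_unlock(&entry->group->mark_mutex);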

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 02feef939560..1c82eb6674c4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -193,7 +193,7 @@ static inline struct list_head *chunk_hash(unsigned long key)
return chunk_hash_heads + n % HASH_SIZE;
}

-/* hash_lock & entry->lock is held by caller */
+/* hash_lock & entry->group->mark_mutex is held by caller */
static void insert_hash(struct audit_chunk *chunk)
{
unsigned long key = chunk_to_key(chunk);
@@ -256,13 +256,11 @@ static void untag_chunk(struct node *p)
new = alloc_chunk(size);

mutex_lock(&entry->group->mark_mutex);
- spin_lock(&entry->lock);
/*
* mark_mutex protects mark from getting detached and thus also from
* mark->connector->obj getting NULL.
*/
if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
- spin_unlock(&entry->lock);
mutex_unlock(&entry->group->mark_mutex);
if (new)
fsnotify_put_mark(&new->mark);
@@ -280,7 +278,6 @@ static void untag_chunk(struct node *p)
list_del_init(&p->list);
list_del_rcu(&chunk->hash);
spin_unlock(&hash_lock);
- spin_unlock(&entry->lock);
mutex_unlock(&entry->group->mark_mutex);
fsnotify_destroy_mark(entry, audit_tree_group);
goto out;
@@ -323,7 +320,6 @@ static void untag_chunk(struct node *p)
list_for_each_entry(owner, &new->trees, same_root)
owner->root = new;
spin_unlock(&hash_lock);
- spin_unlock(&entry->lock);
mutex_unlock(&entry->group->mark_mutex);
fsnotify_destroy_mark(entry, audit_tree_group);
fsnotify_put_mark(&new->mark); /* drop initial reference */
@@ -340,7 +336,6 @@ static void untag_chunk(struct node *p)
p->owner = NULL;
put_tree(owner);
spin_unlock(&hash_lock);
- spin_unlock(&entry->lock);
mutex_unlock(&entry->group->mark_mutex);
out:
fsnotify_put_mark(entry);
@@ -360,12 +355,12 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOSPC;
}

- spin_lock(&entry->lock);
+ mutex_lock(&entry->group->mark_mutex);
spin_lock(&hash_lock);
if (tree->goner) {
spin_unlock(&hash_lock);
chunk->dead = 1;
- spin_unlock(&entry->lock);
+ mutex_unlock(&entry->group->mark_mutex);
fsnotify_destroy_mark(entry, audit_tree_group);
fsnotify_put_mark(entry);
return 0;
@@ -380,7 +375,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
}
insert_hash(chunk);
spin_unlock(&hash_lock);
- spin_unlock(&entry->lock);
+ mutex_unlock(&entry->group->mark_mutex);
fsnotify_put_mark(entry); /* drop initial reference */
return 0;
}
@@ -421,14 +416,12 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
chunk_entry = &chunk->mark;

mutex_lock(&old_entry->group->mark_mutex);
- spin_lock(&old_entry->lock);
/*
* mark_mutex protects mark from getting detached and thus also from
* mark->connector->obj getting NULL.
*/
if (!(old_entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
/* old_entry is being shot, lets just lie */
- spin_unlock(&old_entry->lock);
mutex_unlock(&old_entry->group->mark_mutex);
fsnotify_put_mark(old_entry);
fsnotify_put_mark(&chunk->mark);
@@ -437,23 +430,16 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)

if (fsnotify_add_mark_locked(chunk_entry, old_entry->connector->obj,
FSNOTIFY_OBJ_TYPE_INODE, 1)) {
- spin_unlock(&old_entry->lock);
mutex_unlock(&old_entry->group->mark_mutex);
fsnotify_put_mark(chunk_entry);
fsnotify_put_mark(old_entry);
return -ENOSPC;
}

- /* even though we hold old_entry->lock, this is safe since chunk_entry->lock could NEVER have been grabbed before */
- spin_lock(&chunk_entry->lock);
spin_lock(&hash_lock);
-
- /* we now hold old_entry->lock, chunk_entry->lock, and hash_lock */
if (tree->goner) {
spin_unlock(&hash_lock);
chunk->dead = 1;
- spin_unlock(&chunk_entry->lock);
- spin_unlock(&old_entry->lock);
mutex_unlock(&old_entry->group->mark_mutex);

fsnotify_destroy_mark(chunk_entry, audit_tree_group);
@@ -485,8 +471,6 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
list_add(&tree->same_root, &chunk->trees);
}
spin_unlock(&hash_lock);
- spin_unlock(&chunk_entry->lock);
- spin_unlock(&old_entry->lock);
mutex_unlock(&old_entry->group->mark_mutex);
fsnotify_destroy_mark(old_entry, audit_tree_group);
fsnotify_put_mark(chunk_entry); /* drop initial reference */
--
2.16.4
Paul Moore
2018-07-27 04:47:04 UTC
Post by Jan Kara
Currently, the audit_tree code uses mark->lock to protect against a
mark being detached from an inode. In most places, however, it also uses
mark->group->mark_mutex (as we need to atomically replace attached
marks), and this provides protection against mark detaching as well. So
just remove the mark->lock protection from the audit tree code and
replace it with mark->group->mark_mutex protection in all the places.
This simplifies the code and gets rid of some ugly catches, like calling
fsnotify_add_mark_locked() with mark->lock held (which avoids sleeping
only because we hold a reference to another mark attached to the same
inode).
---
kernel/audit_tree.c | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)
...
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 02feef939560..1c82eb6674c4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -360,12 +355,12 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOSPC;
}
- spin_lock(&entry->lock);
+ mutex_lock(&entry->group->mark_mutex);
I wonder if we could move the lock up above the
fsnotify_add_inode_mark() earlier in create_chunk() and use
fsnotify_add_mark_locked()?

--
paul moore
www.paul-moore.com
Jan Kara
2018-09-04 09:53:07 UTC
Hi,

Sorry for getting to this so late, but I was catching up after vacation
and your replies got buried in my inbox.
Post by Paul Moore
Post by Jan Kara
Currently, the audit_tree code uses mark->lock to protect against a
mark being detached from an inode. In most places, however, it also uses
mark->group->mark_mutex (as we need to atomically replace attached
marks), and this provides protection against mark detaching as well. So
just remove the mark->lock protection from the audit tree code and
replace it with mark->group->mark_mutex protection in all the places.
This simplifies the code and gets rid of some ugly catches, like calling
fsnotify_add_mark_locked() with mark->lock held (which avoids sleeping
only because we hold a reference to another mark attached to the same
inode).
---
kernel/audit_tree.c | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)
...
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 02feef939560..1c82eb6674c4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -360,12 +355,12 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOSPC;
}
- spin_lock(&entry->lock);
+ mutex_lock(&entry->group->mark_mutex);
I wonder if we could move the lock up above the
fsnotify_add_inode_mark() earlier in create_chunk() and use
fsnotify_add_mark_locked()?
Possibly, but I didn't want to do this in this patch as that's a
separate change. Also, this is in fact what happens in later patches.

Honza
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Jan Kara
2018-07-10 10:02:10 UTC
Currently the chunk hash key (which is in fact a pointer to the inode)
is derived as chunk->mark.connector->obj. It is tricky to make this
dereference reliable for hash table lookups only under RCU, as the mark
can get detached from the connector and the connector gets freed
independently of the running lookup. Thus there is a possible
use-after-free / NULL pointer dereference issue:

CPU1                                    CPU2
untag_chunk()
  ...
                                        audit_tree_lookup()
                                          list_for_each_entry_rcu(p, list, hash) {
  list_del_rcu(&chunk->hash);
  fsnotify_destroy_mark(entry);
  fsnotify_put_mark(entry)
                                            chunk_to_key(p)
                                              if (!chunk->mark.connector)
  ...
  hlist_del_init_rcu(&mark->obj_list);
  if (hlist_empty(&conn->list)) {
    inode = fsnotify_detach_connector_from_object(conn);
  mark->connector = NULL;
  ...
  frees connector from workqueue
                                              chunk->mark.connector->obj

This race is probably impossible to hit in practice, as the race window
on CPU1 is very narrow and CPU2 has a lot of code to execute. Still,
it's better to have this fixed. Since the inode the chunk is attached to
is constant during the chunk's lifetime, it is easy to cache the key in
the chunk itself and thus avoid these issues.
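
Condensed from the patch below: the key is cached once when the chunk is
set up, and the RCU lookup then compares only the cached value, never
dereferencing the mark's connector:

  chunk->key = inode_to_key(inode); /* (unsigned long)&inode->i_fsnotify_marks */
  ...
  list_for_each_entry_rcu(p, list, hash) {
          if (p->key == key) {
                  atomic_long_inc(&p->refs);
                  return p;
          }
  }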

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 7f9df8c66227..a210963bae32 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -24,6 +24,7 @@ struct audit_tree {

struct audit_chunk {
struct list_head hash;
+ unsigned long key;
struct fsnotify_mark mark;
struct list_head trees; /* with root here */
int dead;
@@ -172,21 +173,6 @@ static unsigned long inode_to_key(const struct inode *inode)
return (unsigned long)&inode->i_fsnotify_marks;
}

-/*
- * Function to return search key in our hash from chunk. Key 0 is special and
- * should never be present in the hash.
- */
-static unsigned long chunk_to_key(struct audit_chunk *chunk)
-{
- /*
- * We have a reference to the mark so it should be attached to a
- * connector.
- */
- if (WARN_ON_ONCE(!chunk->mark.connector))
- return 0;
- return (unsigned long)chunk->mark.connector->obj;
-}
-
static inline struct list_head *chunk_hash(unsigned long key)
{
unsigned long n = key / L1_CACHE_BYTES;
@@ -196,12 +182,12 @@ static inline struct list_head *chunk_hash(unsigned long key)
/* hash_lock & entry->group->mark_mutex is held by caller */
static void insert_hash(struct audit_chunk *chunk)
{
- unsigned long key = chunk_to_key(chunk);
struct list_head *list;

if (!(chunk->mark.flags & FSNOTIFY_MARK_FLAG_ATTACHED))
return;
- list = chunk_hash(key);
+ WARN_ON_ONCE(!chunk->key);
+ list = chunk_hash(chunk->key);
list_add_rcu(&chunk->hash, list);
}

@@ -213,7 +199,7 @@ struct audit_chunk *audit_tree_lookup(const struct inode *inode)
struct audit_chunk *p;

list_for_each_entry_rcu(p, list, hash) {
- if (chunk_to_key(p) == key) {
+ if (p->key == key) {
atomic_long_inc(&p->refs);
return p;
}
@@ -295,6 +281,7 @@ static void untag_chunk(struct node *p)

chunk->dead = 1;
spin_lock(&hash_lock);
+ new->key = chunk->key;
list_replace_init(&chunk->trees, &new->trees);
if (owner->root == chunk) {
list_del_init(&owner->same_root);
@@ -380,6 +367,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
}
+ chunk->key = inode_to_key(inode);
insert_hash(chunk);
spin_unlock(&hash_lock);
mutex_unlock(&audit_tree_group->mark_mutex);
@@ -456,6 +444,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
fsnotify_put_mark(old_entry);
return 0;
}
+ chunk->key = old->key;
list_replace_init(&old->trees, &chunk->trees);
for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
struct audit_tree *s = old->owners[n].owner;
@@ -652,7 +641,7 @@ void audit_trim_trees(void)
/* this could be NULL if the watch is dying else where... */
node->index |= 1U<<31;
if (iterate_mounts(compare_root,
- (void *)chunk_to_key(chunk),
+ (void *)(chunk->key),
root_mnt))
node->index &= ~(1U<<31);
}
--
2.16.4
Paul Moore
2018-07-27 04:47:17 UTC
Post by Jan Kara
Currently the chunk hash key (which is in fact a pointer to the inode)
is derived as chunk->mark.connector->obj. It is tricky to make this
dereference reliable for hash table lookups only under RCU, as the mark
can get detached from the connector and the connector gets freed
independently of the running lookup. Thus there is a possible
use-after-free / NULL pointer dereference issue:

CPU1                                    CPU2
untag_chunk()
  ...
                                        audit_tree_lookup()
                                          list_for_each_entry_rcu(p, list, hash) {
  list_del_rcu(&chunk->hash);
  fsnotify_destroy_mark(entry);
  fsnotify_put_mark(entry)
                                            chunk_to_key(p)
                                              if (!chunk->mark.connector)
  ...
  hlist_del_init_rcu(&mark->obj_list);
  if (hlist_empty(&conn->list)) {
    inode = fsnotify_detach_connector_from_object(conn);
  mark->connector = NULL;
  ...
  frees connector from workqueue
                                              chunk->mark.connector->obj

This race is probably impossible to hit in practice, as the race window
on CPU1 is very narrow and CPU2 has a lot of code to execute. Still,
it's better to have this fixed. Since the inode the chunk is attached to
is constant during the chunk's lifetime, it is easy to cache the key in
the chunk itself and thus avoid these issues.
---
kernel/audit_tree.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)
This looks okay to me.

--
paul moore
www.paul-moore.com
Jan Kara
2018-07-10 10:02:09 UTC
The audit tree code replaces marks attached to inodes in a non-atomic
way. Thus fsnotify_find_mark() in tag_chunk() may find a mark that
belongs to a chunk that is no longer valid and will soon be destroyed.
Tags added to such a chunk will simply be lost.

Fix the problem by making sure the old mark is marked as going away
(through fsnotify_detach_mark()) before dropping mark_mutex, and thus in
an atomic way wrt tag_chunk(). Note that this does not fix the problem
completely: if tag_chunk() finds a mark that is going away, it fails
with -ENOENT. But at least the failure is not silent, and currently
there's no way to search for another fsnotify mark attached to the
inode. We'll fix this problem in a later patch.
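
The resulting ordering, condensed from the diff below:

  mutex_lock(&entry->group->mark_mutex);
  /* ... unhash / replace the chunk under hash_lock ... */
  fsnotify_detach_mark(entry);    /* clears FSNOTIFY_MARK_FLAG_ATTACHED
                                   * atomically wrt tag_chunk() */
  mutex_unlock(&entry->group->mark_mutex);
  fsnotify_free_mark(entry);      /* may block, so called without
                                   * mark_mutex held */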

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index de8d344d91b1..7f9df8c66227 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -278,8 +278,9 @@ static void untag_chunk(struct node *p)
list_del_init(&p->list);
list_del_rcu(&chunk->hash);
spin_unlock(&hash_lock);
+ fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
- fsnotify_destroy_mark(entry, audit_tree_group);
+ fsnotify_free_mark(entry);
goto out;
}

@@ -320,8 +321,9 @@ static void untag_chunk(struct node *p)
list_for_each_entry(owner, &new->trees, same_root)
owner->root = new;
spin_unlock(&hash_lock);
+ fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
- fsnotify_destroy_mark(entry, audit_tree_group);
+ fsnotify_free_mark(entry);
fsnotify_put_mark(&new->mark); /* drop initial reference */
goto out;

@@ -364,8 +366,9 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
if (tree->goner) {
spin_unlock(&hash_lock);
chunk->dead = 1;
+ fsnotify_detach_mark(entry);
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_destroy_mark(entry, audit_tree_group);
+ fsnotify_free_mark(entry);
fsnotify_put_mark(entry);
return 0;
}
@@ -446,10 +449,9 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
if (tree->goner) {
spin_unlock(&hash_lock);
chunk->dead = 1;
+ fsnotify_detach_mark(chunk_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
-
- fsnotify_destroy_mark(chunk_entry, audit_tree_group);
-
+ fsnotify_free_mark(chunk_entry);
fsnotify_put_mark(chunk_entry);
fsnotify_put_mark(old_entry);
return 0;
@@ -477,8 +479,9 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
list_add(&tree->same_root, &chunk->trees);
}
spin_unlock(&hash_lock);
+ fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_destroy_mark(old_entry, audit_tree_group);
+ fsnotify_free_mark(old_entry);
fsnotify_put_mark(chunk_entry); /* drop initial reference */
fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
return 0;
--
2.16.4
Jan Kara
2018-07-10 10:02:13 UTC
The audit_tree_group->mark_mutex is held all the time while we create
the fsnotify mark, add it to the inode, and insert the chunk into the
hash. Hence the mark cannot get detached during this time, so the check
in insert_hash() of whether the mark is attached is pointless.

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index f419fdfc25b4..b7977cf30479 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -184,8 +184,6 @@ static void insert_hash(struct audit_chunk *chunk)
{
struct list_head *list;

- if (!(chunk->mark.flags & FSNOTIFY_MARK_FLAG_ATTACHED))
- return;
/*
* Make sure chunk is fully initialized before making it visible in the
* hash. Pairs with a data dependency barrier in READ_ONCE() in
--
2.16.4
Paul Moore
2018-07-27 04:47:33 UTC
Post by Jan Kara
The audit_tree_group->mark_mutex is held all the time while we create
the fsnotify mark, add it to the inode, and insert the chunk into the
hash. Hence the mark cannot get detached during this time, so the check
in insert_hash() of whether the mark is attached is pointless.
---
kernel/audit_tree.c | 2 --
1 file changed, 2 deletions(-)
This makes sense.

--
paul moore
www.paul-moore.com
Jan Kara
2018-07-10 10:02:14 UTC
Provide a helper function audit_mark_put_chunk() for dropping the
mark's reference to the chunk (which has to happen only after the RCU
grace period expires). Currently that happens only from a single place,
but later patches introduce more callers.
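
A sketch of the calling pattern (the additional callers appear in later
patches of this series):

  list_del_rcu(&chunk->hash);     /* unhash the chunk first ... */
  /* ... */
  audit_mark_put_chunk(chunk);    /* ... then call_rcu() delays the final
                                   * put past the grace period protecting
                                   * RCU readers of the hash table */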

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b7977cf30479..bce3b04a365d 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -132,10 +132,20 @@ static void __put_chunk(struct rcu_head *rcu)
audit_put_chunk(chunk);
}

+/*
+ * Drop reference to the chunk that was held by the mark. This is the reference
+ * that gets dropped after we've removed the chunk from the hash table and we
+ * use it to make sure chunk cannot be freed before RCU grace period expires.
+ */
+static void audit_mark_put_chunk(struct audit_chunk *chunk)
+{
+ call_rcu(&chunk->head, __put_chunk);
+}
+
static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
- call_rcu(&chunk->head, __put_chunk);
+ audit_mark_put_chunk(chunk);
}

static struct audit_chunk *alloc_chunk(int count)
--
2.16.4
Jan Kara
2018-07-10 10:02:11 UTC
Permalink
Currently, the audit tree code does not make sure that when a chunk is
inserted into the hash table, it is fully initialized. So in theory (in
practice only on DEC Alpha) a user of the RCU lookup could see an
uninitialized structure in the hash table and crash. Add appropriate
barriers between initialization of the structure and its insertion into
the hash table.
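
The publish / lookup pairing, condensed from the patch below:

  /* writer side (insert_hash() / replace_chunk()) */
  chunk->key = ...;                       /* fully initialize the chunk */
  smp_wmb();                              /* order init before publishing */
  list_add_rcu(&chunk->hash, list);

  /* reader side (audit_tree_lookup()) */
  if (READ_ONCE(p->key) == key)           /* data dependency barrier pairs
                                           * with the smp_wmb() above */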

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index a210963bae32..6a01738d1ac2 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -186,6 +186,12 @@ static void insert_hash(struct audit_chunk *chunk)

if (!(chunk->mark.flags & FSNOTIFY_MARK_FLAG_ATTACHED))
return;
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
WARN_ON_ONCE(!chunk->key);
list = chunk_hash(chunk->key);
list_add_rcu(&chunk->hash, list);
@@ -199,7 +205,11 @@ struct audit_chunk *audit_tree_lookup(const struct inode *inode)
struct audit_chunk *p;

list_for_each_entry_rcu(p, list, hash) {
- if (p->key == key) {
+ /*
+ * We use a data dependency barrier in READ_ONCE() to make sure
+ * the chunk we see is fully initialized.
+ */
+ if (READ_ONCE(p->key) == key) {
atomic_long_inc(&p->refs);
return p;
}
@@ -304,9 +314,15 @@ static void untag_chunk(struct node *p)
list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
}

- list_replace_rcu(&chunk->hash, &new->hash);
list_for_each_entry(owner, &new->trees, same_root)
owner->root = new;
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
+ list_replace_rcu(&chunk->hash, &new->hash);
spin_unlock(&hash_lock);
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
@@ -368,6 +384,10 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
list_add(&tree->same_root, &chunk->trees);
}
chunk->key = inode_to_key(inode);
+ /*
+ * Inserting into the hash table has to go last as once we do that RCU
+ * readers can see the chunk.
+ */
insert_hash(chunk);
spin_unlock(&hash_lock);
mutex_unlock(&audit_tree_group->mark_mutex);
@@ -459,7 +479,6 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
p->owner = tree;
get_tree(tree);
list_add(&p->list, &tree->chunks);
- list_replace_rcu(&old->hash, &chunk->hash);
list_for_each_entry(owner, &chunk->trees, same_root)
owner->root = chunk;
old->dead = 1;
@@ -467,6 +486,13 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
}
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
+ list_replace_rcu(&old->hash, &chunk->hash);
spin_unlock(&hash_lock);
fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
--
2.16.4
Jan Kara
2018-07-10 10:02:16 UTC
The audit tree code currently associates a new fsnotify mark with each
new chunk. As the chunk attached to an inode is replaced when a tag is
added or removed, we also need to remove the old fsnotify mark and add a
new one on each such occasion. This is cumbersome and makes the locking
rules somewhat difficult to follow.

Fix these problems by allocating the fsnotify mark independently of the
chunk and keeping it alive for as long as there is some chunk attached
to the inode. Also add documentation about the locking rules so that
things are easier to follow.
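
The association (using struct audit_tree_mark introduced earlier in the
series) then looks like this:

  struct audit_tree_mark {
          struct fsnotify_mark fsn_mark;  /* stays on the inode as long as
                                           * any chunk is attached */
          struct audit_chunk *chunk;      /* swapped under hash_lock */
  };

so growing or shrinking a chunk becomes a pointer swap via
replace_mark_chunk() instead of replacing the mark attached to the
inode.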

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 163 +++++++++++++++++++++++++++-------------------------
1 file changed, 85 insertions(+), 78 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index aec9b27a20ff..40f61de77dd0 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -27,7 +27,6 @@ struct audit_chunk {
unsigned long key;
struct fsnotify_mark *mark;
struct list_head trees; /* with root here */
- int dead;
int count;
atomic_long_t refs;
struct rcu_head head;
@@ -48,8 +47,15 @@ static LIST_HEAD(prune_list);
static struct task_struct *prune_thread;

/*
- * One struct chunk is attached to each inode of interest.
- * We replace struct chunk on tagging/untagging.
+ * One struct chunk is attached to each inode of interest through
+ * audit_tree_mark (fsnotify mark). We replace struct chunk on tagging /
+ * untagging, the mark is stable as long as there is chunk attached. The
+ * association between mark and chunk is protected by hash_lock and
+ * audit_tree_group->mark_mutex. Thus as long as we hold
+ * audit_tree_group->mark_mutex and check that the mark is alive by
+ * FSNOTIFY_MARK_FLAG_ATTACHED flag check, we are sure the mark points to
+ * the current chunk.
+ *
* Rules have pointer to struct audit_tree.
* Rules have struct list_head rlist forming a list of rules over
* the same tree.
@@ -68,8 +74,12 @@ static struct task_struct *prune_thread;
* tree is refcounted; one reference for "some rules on rules_list refer to
* it", one for each chunk with pointer to it.
*
- * chunk is refcounted by embedded fsnotify_mark + .refs (non-zero refcount
- * of watch contributes 1 to .refs).
+ * chunk is refcounted by embedded .refs. Mark associated with the chunk holds
+ * one chunk reference. This reference is dropped either when a mark is going
+ * to be freed (corresponding inode goes away) or when chunk attached to the
+ * mark gets replaced. This reference must be dropped using
+ * audit_mark_put_chunk() to make sure the reference is dropped only after RCU
+ * grace period as it protects RCU readers of the hash table.
*
* node.index allows to get from node.list to containing chunk.
* MSB of that sucker is stolen to mark taggings that we might have to
@@ -155,8 +165,6 @@ static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)

static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
- struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
- audit_mark_put_chunk(chunk);
kmem_cache_free(audit_tree_mark_cachep, entry);
}

@@ -183,13 +191,6 @@ static struct audit_chunk *alloc_chunk(int count)
if (!chunk)
return NULL;

- chunk->mark = alloc_fsnotify_mark();
- if (!chunk->mark) {
- kfree(chunk);
- return NULL;
- }
- AUDIT_M(chunk->mark)->chunk = chunk;
-
INIT_LIST_HEAD(&chunk->hash);
INIT_LIST_HEAD(&chunk->trees);
chunk->count = count;
@@ -272,6 +273,20 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}

+static void replace_mark_chunk(struct fsnotify_mark *entry,
+ struct audit_chunk *chunk)
+{
+ struct audit_chunk *old;
+
+ assert_spin_locked(&hash_lock);
+ old = AUDIT_M(entry)->chunk;
+ AUDIT_M(entry)->chunk = chunk;
+ if (chunk)
+ chunk->mark = entry;
+ if (old)
+ old->mark = NULL;
+}
+
static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
struct node *skip)
{
@@ -295,6 +310,7 @@ static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
get_tree(owner);
list_replace_init(&old->owners[j].list, &new->owners[i].list);
}
+ replace_mark_chunk(old->mark, new);
/*
* Make sure chunk is fully initialized before making it visible in the
* hash. Pairs with a data dependency barrier in READ_ONCE() in
@@ -312,6 +328,10 @@ static void untag_chunk(struct node *p)
struct audit_tree *owner;
int size = chunk->count - 1;

+ /* Racing with audit_tree_freeing_mark()? */
+ if (!entry)
+ return;
+
fsnotify_get_mark(entry);

spin_unlock(&hash_lock);
@@ -321,29 +341,31 @@ static void untag_chunk(struct node *p)

mutex_lock(&entry->group->mark_mutex);
/*
- * mark_mutex protects mark from getting detached and thus also from
- * mark->connector->obj getting NULL.
+ * mark_mutex protects mark stabilizes chunk attached to the mark so we
+ * can check whether it didn't change while we've dropped hash_lock.
*/
- if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
+ if (!(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED) ||
+ AUDIT_M(entry)->chunk != chunk) {
mutex_unlock(&entry->group->mark_mutex);
if (new)
- fsnotify_put_mark(new->mark);
+ kfree(new);
goto out;
}

owner = p->owner;

if (!size) {
- chunk->dead = 1;
spin_lock(&hash_lock);
list_del_init(&chunk->trees);
if (owner->root == chunk)
owner->root = NULL;
list_del_init(&p->list);
list_del_rcu(&chunk->hash);
+ replace_mark_chunk(entry, NULL);
spin_unlock(&hash_lock);
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
+ audit_mark_put_chunk(chunk);
fsnotify_free_mark(entry);
goto out;
}
@@ -351,13 +373,6 @@ static void untag_chunk(struct node *p)
if (!new)
goto Fallback;

- if (fsnotify_add_mark_locked(new->mark, entry->connector->obj,
- FSNOTIFY_OBJ_TYPE_INODE, 1)) {
- fsnotify_put_mark(new->mark);
- goto Fallback;
- }
-
- chunk->dead = 1;
spin_lock(&hash_lock);
if (owner->root == chunk) {
list_del_init(&owner->same_root);
@@ -370,10 +385,8 @@ static void untag_chunk(struct node *p)
*/
replace_chunk(new, chunk, p);
spin_unlock(&hash_lock);
- fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
- fsnotify_free_mark(entry);
- fsnotify_put_mark(new->mark); /* drop initial reference */
+ audit_mark_put_chunk(chunk);
goto out;

Fallback:
@@ -404,23 +417,31 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOMEM;
}

- entry = chunk->mark;
+ entry = alloc_fsnotify_mark();
+ if (!entry) {
+ mutex_unlock(&audit_tree_group->mark_mutex);
+ kfree(chunk);
+ return -ENOMEM;
+ }
+
if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(entry);
+ kfree(chunk);
return -ENOSPC;
}

spin_lock(&hash_lock);
if (tree->goner) {
spin_unlock(&hash_lock);
- chunk->dead = 1;
fsnotify_detach_mark(entry);
mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_free_mark(entry);
fsnotify_put_mark(entry);
+ kfree(chunk);
return 0;
}
+ replace_mark_chunk(entry, chunk);
chunk->owners[0].index = (1U << 31);
chunk->owners[0].owner = tree;
get_tree(tree);
@@ -437,33 +458,41 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
insert_hash(chunk);
spin_unlock(&hash_lock);
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_put_mark(entry); /* drop initial reference */
+ /*
+ * Drop our initial reference. When mark we point to is getting freed,
+ * we get notification through ->freeing_mark callback and cleanup
+ * chunk pointing to this mark.
+ */
+ fsnotify_put_mark(entry);
return 0;
}

/* the first tagged inode becomes root of tree */
static int tag_chunk(struct inode *inode, struct audit_tree *tree)
{
- struct fsnotify_mark *old_entry, *chunk_entry;
+ struct fsnotify_mark *entry;
struct audit_chunk *chunk, *old;
struct node *p;
int n;

mutex_lock(&audit_tree_group->mark_mutex);
- old_entry = fsnotify_find_mark(&inode->i_fsnotify_marks,
- audit_tree_group);
- if (!old_entry)
+ entry = fsnotify_find_mark(&inode->i_fsnotify_marks, audit_tree_group);
+ if (!entry)
return create_chunk(inode, tree);

- old = AUDIT_M(old_entry)->chunk;
-
+ /*
+ * Found mark is guaranteed to be attached and mark_mutex protects mark
+ * from getting detached and thus it makes sure there is chunk attached
+ * to the mark.
+ */
/* are we already there? */
spin_lock(&hash_lock);
+ old = AUDIT_M(entry)->chunk;
for (n = 0; n < old->count; n++) {
if (old->owners[n].owner == tree) {
spin_unlock(&hash_lock);
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_put_mark(old_entry);
+ fsnotify_put_mark(entry);
return 0;
}
}
@@ -472,41 +501,16 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
chunk = alloc_chunk(old->count + 1);
if (!chunk) {
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_put_mark(old_entry);
+ fsnotify_put_mark(entry);
return -ENOMEM;
}

- chunk_entry = chunk->mark;
-
- /*
- * mark_mutex protects mark from getting detached and thus also from
- * mark->connector->obj getting NULL.
- */
- if (!(old_entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
- /* old_entry is being shot, lets just lie */
- mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_put_mark(old_entry);
- fsnotify_put_mark(chunk->mark);
- return -ENOENT;
- }
-
- if (fsnotify_add_mark_locked(chunk_entry, old_entry->connector->obj,
- FSNOTIFY_OBJ_TYPE_INODE, 1)) {
- mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_put_mark(chunk_entry);
- fsnotify_put_mark(old_entry);
- return -ENOSPC;
- }
-
spin_lock(&hash_lock);
if (tree->goner) {
spin_unlock(&hash_lock);
- chunk->dead = 1;
- fsnotify_detach_mark(chunk_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_free_mark(chunk_entry);
- fsnotify_put_mark(chunk_entry);
- fsnotify_put_mark(old_entry);
+ fsnotify_put_mark(entry);
+ kfree(chunk);
return 0;
}
p = &chunk->owners[chunk->count - 1];
@@ -514,7 +518,6 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
p->owner = tree;
get_tree(tree);
list_add(&p->list, &tree->chunks);
- old->dead = 1;
if (!tree->root) {
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
@@ -525,11 +528,10 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
*/
replace_chunk(chunk, old, NULL);
spin_unlock(&hash_lock);
- fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
- fsnotify_free_mark(old_entry);
- fsnotify_put_mark(chunk_entry); /* drop initial reference */
- fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
+ fsnotify_put_mark(entry); /* pair to fsnotify_find mark_entry */
+ audit_mark_put_chunk(old);
+
return 0;
}

@@ -996,10 +998,6 @@ static void evict_chunk(struct audit_chunk *chunk)
int need_prune = 0;
int n;

- if (chunk->dead)
- return;
-
- chunk->dead = 1;
mutex_lock(&audit_filter_mutex);
spin_lock(&hash_lock);
while (!list_empty(&chunk->trees)) {
@@ -1038,9 +1036,18 @@ static int audit_tree_handle_event(struct fsnotify_group *group,

static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify_group *group)
{
- struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
+ struct audit_chunk *chunk;

- evict_chunk(chunk);
+ mutex_lock(&entry->group->mark_mutex);
+ spin_lock(&hash_lock);
+ chunk = AUDIT_M(entry)->chunk;
+ replace_mark_chunk(entry, NULL);
+ spin_unlock(&hash_lock);
+ mutex_unlock(&entry->group->mark_mutex);
+ if (chunk) {
+ evict_chunk(chunk);
+ audit_mark_put_chunk(chunk);
+ }

/*
* We are guaranteed to have at least one reference to the mark from
--
2.16.4
Amir Goldstein
2018-07-11 14:17:01 UTC
Post by Jan Kara
The audit tree code currently associates a new fsnotify mark with each
new chunk. As the chunk attached to an inode is replaced when a tag is
added or removed, we also need to remove the old fsnotify mark and add a
new one on each such occasion. This is cumbersome and makes the locking
rules somewhat difficult to follow.
Fix these problems by allocating the fsnotify mark independently of the
chunk and keeping it alive for as long as there is some chunk attached
to the inode. Also add documentation about the locking rules so that
things are easier to follow.
---
Well, if there are bugs here I can't find them, but they sure have a lot of
places to hide...

Cheers,
Amir.
Paul Moore
2018-07-27 04:47:42 UTC
Post by Jan Kara
The audit tree code currently associates a new fsnotify mark with each
new chunk. As the chunk attached to an inode is replaced when a tag is
added or removed, we also need to remove the old fsnotify mark and add a
new one on each such occasion. This is cumbersome and makes the locking
rules somewhat difficult to follow.
Fix these problems by allocating the fsnotify mark independently of the
chunk and keeping it alive for as long as there is some chunk attached
to the inode. Also add documentation about the locking rules so that
things are easier to follow.
---
kernel/audit_tree.c | 163 +++++++++++++++++++++++++++-------------------------
1 file changed, 85 insertions(+), 78 deletions(-)
This is a really nice improvement, thanks!
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index aec9b27a20ff..40f61de77dd0 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -272,6 +273,20 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}
+static void replace_mark_chunk(struct fsnotify_mark *entry,
+ struct audit_chunk *chunk)
+{
+ struct audit_chunk *old;
+
+ assert_spin_locked(&hash_lock);
+ old = AUDIT_M(entry)->chunk;
+ AUDIT_M(entry)->chunk = chunk;
+ if (chunk)
+ chunk->mark = entry;
+ if (old)
+ old->mark = NULL;
Is it necessary that we check to see if chunk and old are non-NULL?
It seems like we would always want to set chunk->mark to entry and set
old->mark to NULL, yes?
Post by Jan Kara
@@ -321,29 +341,31 @@ static void untag_chunk(struct node *p)
mutex_lock(&entry->group->mark_mutex);
/*
- * mark_mutex protects mark from getting detached and thus also from
- * mark->connector->obj getting NULL.
+ * mark_mutex protects mark stabilizes chunk attached to the mark so we
+ * can check whether it didn't change while we've dropped hash_lock.
I think your new text could use some revision; the "protects mark
stabilizes chunk" wording is odd.
Post by Jan Kara
*/
- if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
+ if (!(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED) ||
+ AUDIT_M(entry)->chunk != chunk) {
mutex_unlock(&entry->group->mark_mutex);
if (new)
- fsnotify_put_mark(new->mark);
+ kfree(new);
Since we are just calling kfree() now we can do away with the "if (new)" check.

--
paul moore
www.paul-moore.com
Jan Kara
2018-09-04 14:11:22 UTC
Post by Paul Moore
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index aec9b27a20ff..40f61de77dd0 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -272,6 +273,20 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}
+static void replace_mark_chunk(struct fsnotify_mark *entry,
+ struct audit_chunk *chunk)
+{
+ struct audit_chunk *old;
+
+ assert_spin_locked(&hash_lock);
+ old = AUDIT_M(entry)->chunk;
+ AUDIT_M(entry)->chunk = chunk;
+ if (chunk)
+ chunk->mark = entry;
+ if (old)
+ old->mark = NULL;
Is it necessary that we check to see if chunk and old are non-NULL?
It seems like we would always want to set chunk->mark to entry and set
old->mark to NULL, yes?
Both checks are needed - 'old' can be NULL if we use
replace_mark_chunk() to attach the first chunk to the mark. 'chunk' can
be NULL if we use replace_mark_chunk() to detach the mark from the
current chunk when destroying it.
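
I.e., the two call shapes in the series are:

  replace_mark_chunk(entry, chunk);       /* attach: old == NULL when the
                                           * mark was only just added */
  replace_mark_chunk(entry, NULL);        /* detach: just clear the old
                                           * chunk's back-pointer */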
Post by Paul Moore
Post by Jan Kara
@@ -321,29 +341,31 @@ static void untag_chunk(struct node *p)
mutex_lock(&entry->group->mark_mutex);
/*
- * mark_mutex protects mark from getting detached and thus also from
- * mark->connector->obj getting NULL.
+ * mark_mutex protects mark stabilizes chunk attached to the mark so we
+ * can check whether it didn't change while we've dropped hash_lock.
I think your new text could use some revision; the "protects mark
stabilizes chunk" wording is odd.
Yup, I'll fix that.
Post by Paul Moore
Post by Jan Kara
*/
- if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
+ if (!(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED) ||
+ AUDIT_M(entry)->chunk != chunk) {
mutex_unlock(&entry->group->mark_mutex);
if (new)
- fsnotify_put_mark(new->mark);
+ kfree(new);
Since we are just calling kfree() now we can do away with the "if (new)" check.
Right, I'll do that.

Honza
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Jan Kara
2018-07-10 10:02:17 UTC
Add a stress test that exercises audit tree watches by adding and
deleting rules while events are generated and watched filesystems are
mounted and unmounted in parallel.

Signed-off-by: Jan Kara <***@suse.cz>
---
tests/stress_tree/Makefile | 8 +++
tests/stress_tree/test | 171 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 179 insertions(+)
create mode 100644 tests/stress_tree/Makefile
create mode 100755 tests/stress_tree/test

diff --git a/tests/stress_tree/Makefile b/tests/stress_tree/Makefile
new file mode 100644
index 000000000000..7ade09aad86f
--- /dev/null
+++ b/tests/stress_tree/Makefile
@@ -0,0 +1,8 @@
+TARGETS=$(patsubst %.c,%,$(wildcard *.c))
+
+LDLIBS += -lpthread
+
+all: $(TARGETS)
+clean:
+ rm -f $(TARGETS)
+
diff --git a/tests/stress_tree/test b/tests/stress_tree/test
new file mode 100755
index 000000000000..6215bec810d1
--- /dev/null
+++ b/tests/stress_tree/test
@@ -0,0 +1,171 @@
+#!/usr/bin/perl
+
+use strict;
+
+use Test;
+BEGIN { plan tests => 1 }
+
+use File::Temp qw/ tempdir tempfile /;
+
+###
+# functions
+
+sub key_gen {
+ my @chars = ( "A" .. "Z", "a" .. "z" );
+ my $key = "testsuite-" . time . "-";
+ $key .= $chars[ rand @chars ] for 1 .. 8;
+ return $key;
+}
+
+# Run stat on random files in subtrees to generate audit events
+sub run_stat {
+ my($dir,$dirs) = @_;
+ my $path;
+
+ while (1) {
+ $path = "$dir/mnt/mnt".int(rand($dirs))."/subdir".int(rand($dirs));
+ stat($path);
+ }
+}
+
+# Generate audit rules for subtrees. Do one rule per subtree. Because a
+# watch recursively iterates child mounts and we mount $dir/leaf$i under
+# various subtrees, the inode corresponding to $dir/leaf$i gets tagged by
+# different trees.
+sub run_mark_audit {
+ my($dir,$dirs,$key) = @_;
+
+ while (1) {
+ for (my $i=0; $i < $dirs; $i++) {
+ system("auditctl -w $dir/mnt/mnt$i -p r -k $key");
+ }
+ system("auditctl -D -k $key >& /dev/null");
+ }
+}
+
+sub umount_all {
+ my($dir,$dirs,$ignore_fail) = @_;
+
+ for (my $i=0; $i < $dirs; $i++) {
+ while (system("umount $dir/leaf$i >& /dev/null") > 0 &&
+ $ignore_fail == 0) {
+ # Nothing - loop until umount succeeds
+ }
+ }
+ for (my $i=0; $i < $dirs; $i++) {
+ for (my $j=0; $j < $dirs; $j++) {
+ while (system("umount $dir/mnt/mnt$i/subdir$j >& /dev/null") > 0 &&
+ $ignore_fail == 0) {
+ # Nothing - loop until umount succeeds
+ }
+ }
+ while (system("umount $dir/mnt/mnt$i >& /dev/null") > 0 &&
+ $ignore_fail == 0) {
+ # Nothing - loop until umount succeeds
+ }
+ }
+}
+
+# Mount and unmount filesystems. We pick a random leaf mount so that
+# sometimes a leaf mount point's root inode will gather more tags from
+# different trees, and sometimes we will be quicker in unmounting all
+# instances of a leaf and thus exercise the inode eviction path.
+sub run_mount {
+ my($dir,$dirs) = @_;
+
+ while (1) {
+ # We use tmpfs here and not just bind mounts of some dir so
+ # that the root inode gets evicted once all instances are
+ # unmounted.
+ for (my $i=0; $i < $dirs; $i++) {
+ system("mount -t tmpfs none $dir/leaf$i");
+ }
+ for (my $i=0; $i < $dirs; $i++) {
+ system("mount --bind $dir/dir$i $dir/mnt/mnt$i");
+ for (my $j=0; $j < $dirs; $j++) {
+ my $leaf="$dir/leaf".int(rand($dirs));
+ system("mount --bind $leaf $dir/mnt/mnt$i/subdir$j");
+ }
+ }
+ umount_all($dir, $dirs, 0);
+ }
+}
+
+
+###
+# setup
+
+# reset audit
+system("auditctl -D >& /dev/null");
+
+# create temp directory
+my $dir = tempdir( TEMPLATE => '/tmp/audit-testsuite-XXXX', CLEANUP => 1 );
+
+# create stdout/stderr sinks
+( my $fh_out, my $stdout ) = tempfile(
+ TEMPLATE => '/tmp/audit-testsuite-out-XXXX',
+ UNLINK => 1
+);
+( my $fh_err, my $stderr ) = tempfile(
+ TEMPLATE => '/tmp/audit-testsuite-err-XXXX',
+ UNLINK => 1
+);
+
+###
+# tests
+
+my $dirs = 4;
+
+# setup directory hierarchy
+for (my $i=0; $i < $dirs; $i++) {
+ mkdir $dir."/dir".$i;
+ for (my $j=0; $j < $dirs; $j++) {
+ mkdir $dir."/dir".$i."/subdir".$j;
+ }
+}
+mkdir "$dir/mnt";
+for (my $i=0; $i < $dirs; $i++) {
+ mkdir "$dir/mnt/mnt$i";
+ mkdir "$dir/leaf$i";
+}
+
+my $stat_pid = fork();
+
+if ($stat_pid == 0) {
+ run_stat($dir, $dirs);
+ # Never reached
+ exit;
+}
+
+my $mount_pid = fork();
+
+if ($mount_pid == 0) {
+ run_mount($dir, $dirs);
+ # Never reached
+ exit;
+}
+
+my $key = key_gen();
+
+my $audit_pid = fork();
+
+if ($audit_pid == 0) {
+ run_mark_audit($dir, $dirs, $key);
+ # Never reached
+ exit;
+}
+
+# Sleep for a minute to let stress test run...
+sleep(60);
+ok(1);
+
+###
+# cleanup
+
+kill('KILL', $stat_pid, $mount_pid, $audit_pid);
+# Wait for children to terminate
+waitpid($stat_pid, 0);
+waitpid($mount_pid, 0);
+waitpid($audit_pid, 0);
+system("auditctl -D >& /dev/null");
+umount_all($dir, $dirs, 1);
--
2.16.4
Jan Kara
2018-07-10 10:02:15 UTC
Permalink
Allocate the fsnotify mark independently instead of embedding it inside
the chunk. This will allow us to just replace the chunk attached to the
mark when growing or shrinking a chunk, instead of replacing the mark
attached to the inode, which is a more complex operation.

Signed-off-by: Jan Kara <***@suse.cz>
---
kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index bce3b04a365d..aec9b27a20ff 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -25,7 +25,7 @@ struct audit_tree {
struct audit_chunk {
struct list_head hash;
unsigned long key;
- struct fsnotify_mark mark;
+ struct fsnotify_mark *mark;
struct list_head trees; /* with root here */
int dead;
int count;
@@ -38,6 +38,11 @@ struct audit_chunk {
} owners[];
};

+struct audit_tree_mark {
+ struct fsnotify_mark fsn_mark;
+ struct audit_chunk *chunk;
+};
+
static LIST_HEAD(tree_list);
static LIST_HEAD(prune_list);
static struct task_struct *prune_thread;
@@ -73,6 +78,7 @@ static struct task_struct *prune_thread;
*/

static struct fsnotify_group *audit_tree_group;
+static struct kmem_cache *audit_tree_mark_cachep __read_mostly;

static struct audit_tree *alloc_tree(const char *s)
{
@@ -142,10 +148,28 @@ static void audit_mark_put_chunk(struct audit_chunk *chunk)
call_rcu(&chunk->head, __put_chunk);
}

+static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
+{
+ return container_of(entry, struct audit_tree_mark, fsn_mark);
+}
+
static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
audit_mark_put_chunk(chunk);
+ kmem_cache_free(audit_tree_mark_cachep, entry);
+}
+
+static struct fsnotify_mark *alloc_fsnotify_mark(void)
+{
+ struct audit_tree_mark *mark;
+
+ mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
+ if (!mark)
+ return NULL;
+ fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
+ mark->fsn_mark.mask = FS_IN_IGNORED;
+ return &mark->fsn_mark;
}

static struct audit_chunk *alloc_chunk(int count)
@@ -159,6 +183,13 @@ static struct audit_chunk *alloc_chunk(int count)
if (!chunk)
return NULL;

+ chunk->mark = alloc_fsnotify_mark();
+ if (!chunk->mark) {
+ kfree(chunk);
+ return NULL;
+ }
+ AUDIT_M(chunk->mark)->chunk = chunk;
+
INIT_LIST_HEAD(&chunk->hash);
INIT_LIST_HEAD(&chunk->trees);
chunk->count = count;
@@ -167,8 +198,6 @@ static struct audit_chunk *alloc_chunk(int count)
INIT_LIST_HEAD(&chunk->owners[i].list);
chunk->owners[i].index = i;
}
- fsnotify_init_mark(&chunk->mark, audit_tree_group);
- chunk->mark.mask = FS_IN_IGNORED;
return chunk;
}

@@ -278,7 +307,7 @@ static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
static void untag_chunk(struct node *p)
{
struct audit_chunk *chunk = find_chunk(p);
- struct fsnotify_mark *entry = &chunk->mark;
+ struct fsnotify_mark *entry = chunk->mark;
struct audit_chunk *new = NULL;
struct audit_tree *owner;
int size = chunk->count - 1;
@@ -298,7 +327,7 @@ static void untag_chunk(struct node *p)
if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
mutex_unlock(&entry->group->mark_mutex);
if (new)
- fsnotify_put_mark(&new->mark);
+ fsnotify_put_mark(new->mark);
goto out;
}

@@ -322,9 +351,9 @@ static void untag_chunk(struct node *p)
if (!new)
goto Fallback;

- if (fsnotify_add_mark_locked(&new->mark, entry->connector->obj,
+ if (fsnotify_add_mark_locked(new->mark, entry->connector->obj,
FSNOTIFY_OBJ_TYPE_INODE, 1)) {
- fsnotify_put_mark(&new->mark);
+ fsnotify_put_mark(new->mark);
goto Fallback;
}

@@ -344,7 +373,7 @@ static void untag_chunk(struct node *p)
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
fsnotify_free_mark(entry);
- fsnotify_put_mark(&new->mark); /* drop initial reference */
+ fsnotify_put_mark(new->mark); /* drop initial reference */
goto out;

Fallback:
@@ -375,7 +404,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOMEM;
}

- entry = &chunk->mark;
+ entry = chunk->mark;
if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(entry);
@@ -426,7 +455,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
if (!old_entry)
return create_chunk(inode, tree);

- old = container_of(old_entry, struct audit_chunk, mark);
+ old = AUDIT_M(old_entry)->chunk;

/* are we already there? */
spin_lock(&hash_lock);
@@ -447,7 +476,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOMEM;
}

- chunk_entry = &chunk->mark;
+ chunk_entry = chunk->mark;

/*
* mark_mutex protects mark from getting detached and thus also from
@@ -457,7 +486,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
/* old_entry is being shot, lets just lie */
mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(old_entry);
- fsnotify_put_mark(&chunk->mark);
+ fsnotify_put_mark(chunk->mark);
return -ENOENT;
}

@@ -1009,7 +1038,7 @@ static int audit_tree_handle_event(struct fsnotify_group *group,

static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify_group *group)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;

evict_chunk(chunk);

@@ -1030,6 +1059,8 @@ static int __init audit_tree_init(void)
{
int i;

+ audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
+
audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
if (IS_ERR(audit_tree_group))
audit_panic("cannot initialize fsnotify group for rectree watches");
--
2.16.4
Amir Goldstein
2018-07-11 08:57:37 UTC
Permalink
Post by Jan Kara
Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.
Ack.

Thanks for separating this patch.
Amir.
Post by Jan Kara
---
kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 45 insertions(+), 14 deletions(-)
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index bce3b04a365d..aec9b27a20ff 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -25,7 +25,7 @@ struct audit_tree {
struct audit_chunk {
struct list_head hash;
unsigned long key;
- struct fsnotify_mark mark;
+ struct fsnotify_mark *mark;
struct list_head trees; /* with root here */
int dead;
int count;
@@ -38,6 +38,11 @@ struct audit_chunk {
} owners[];
};
+struct audit_tree_mark {
+ struct fsnotify_mark fsn_mark;
+ struct audit_chunk *chunk;
+};
+
static LIST_HEAD(tree_list);
static LIST_HEAD(prune_list);
static struct task_struct *prune_thread;
@@ -73,6 +78,7 @@ static struct task_struct *prune_thread;
*/
static struct fsnotify_group *audit_tree_group;
+static struct kmem_cache *audit_tree_mark_cachep __read_mostly;
static struct audit_tree *alloc_tree(const char *s)
{
@@ -142,10 +148,28 @@ static void audit_mark_put_chunk(struct audit_chunk *chunk)
call_rcu(&chunk->head, __put_chunk);
}
+static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
+{
+ return container_of(entry, struct audit_tree_mark, fsn_mark);
+}
+
static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
audit_mark_put_chunk(chunk);
+ kmem_cache_free(audit_tree_mark_cachep, entry);
+}
+
+static struct fsnotify_mark *alloc_fsnotify_mark(void)
+{
+ struct audit_tree_mark *mark;
+
+ mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
+ if (!mark)
+ return NULL;
+ fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
+ mark->fsn_mark.mask = FS_IN_IGNORED;
+ return &mark->fsn_mark;
}
static struct audit_chunk *alloc_chunk(int count)
@@ -159,6 +183,13 @@ static struct audit_chunk *alloc_chunk(int count)
if (!chunk)
return NULL;
+ chunk->mark = alloc_fsnotify_mark();
+ if (!chunk->mark) {
+ kfree(chunk);
+ return NULL;
+ }
+ AUDIT_M(chunk->mark)->chunk = chunk;
+
INIT_LIST_HEAD(&chunk->hash);
INIT_LIST_HEAD(&chunk->trees);
chunk->count = count;
@@ -167,8 +198,6 @@ static struct audit_chunk *alloc_chunk(int count)
INIT_LIST_HEAD(&chunk->owners[i].list);
chunk->owners[i].index = i;
}
- fsnotify_init_mark(&chunk->mark, audit_tree_group);
- chunk->mark.mask = FS_IN_IGNORED;
return chunk;
}
@@ -278,7 +307,7 @@ static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
static void untag_chunk(struct node *p)
{
struct audit_chunk *chunk = find_chunk(p);
- struct fsnotify_mark *entry = &chunk->mark;
+ struct fsnotify_mark *entry = chunk->mark;
struct audit_chunk *new = NULL;
struct audit_tree *owner;
int size = chunk->count - 1;
@@ -298,7 +327,7 @@ static void untag_chunk(struct node *p)
if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
mutex_unlock(&entry->group->mark_mutex);
if (new)
- fsnotify_put_mark(&new->mark);
+ fsnotify_put_mark(new->mark);
goto out;
}
@@ -322,9 +351,9 @@ static void untag_chunk(struct node *p)
if (!new)
goto Fallback;
- if (fsnotify_add_mark_locked(&new->mark, entry->connector->obj,
+ if (fsnotify_add_mark_locked(new->mark, entry->connector->obj,
FSNOTIFY_OBJ_TYPE_INODE, 1)) {
- fsnotify_put_mark(&new->mark);
+ fsnotify_put_mark(new->mark);
goto Fallback;
}
@@ -344,7 +373,7 @@ static void untag_chunk(struct node *p)
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
fsnotify_free_mark(entry);
- fsnotify_put_mark(&new->mark); /* drop initial reference */
+ fsnotify_put_mark(new->mark); /* drop initial reference */
goto out;
@@ -375,7 +404,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOMEM;
}
- entry = &chunk->mark;
+ entry = chunk->mark;
if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(entry);
@@ -426,7 +455,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
if (!old_entry)
return create_chunk(inode, tree);
- old = container_of(old_entry, struct audit_chunk, mark);
+ old = AUDIT_M(old_entry)->chunk;
/* are we already there? */
spin_lock(&hash_lock);
@@ -447,7 +476,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
return -ENOMEM;
}
- chunk_entry = &chunk->mark;
+ chunk_entry = chunk->mark;
/*
* mark_mutex protects mark from getting detached and thus also from
@@ -457,7 +486,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
/* old_entry is being shot, lets just lie */
mutex_unlock(&audit_tree_group->mark_mutex);
fsnotify_put_mark(old_entry);
- fsnotify_put_mark(&chunk->mark);
+ fsnotify_put_mark(chunk->mark);
return -ENOENT;
}
@@ -1009,7 +1038,7 @@ static int audit_tree_handle_event(struct fsnotify_group *group,
static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify_group *group)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
evict_chunk(chunk);
@@ -1030,6 +1059,8 @@ static int __init audit_tree_init(void)
{
int i;
+ audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
+
audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
if (IS_ERR(audit_tree_group))
audit_panic("cannot initialize fsnotify group for rectree watches");
--
2.16.4
Amir Goldstein
2018-07-11 10:48:58 UTC
Permalink
Post by Jan Kara
Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.
Ack.
Found a minor nit.

[...]
Post by Jan Kara
+static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
+{
+ return container_of(entry, struct audit_tree_mark, fsn_mark);
+}
+
static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
audit_mark_put_chunk(chunk);
+ kmem_cache_free(audit_tree_mark_cachep, entry);
Technically, we should be freeing AUDIT_M(entry),
although it is the same address with the current struct layout.
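I.e., to be robust against future layout changes:

	kmem_cache_free(audit_tree_mark_cachep, AUDIT_M(entry));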

Thanks,
Amir.
Jan Kara
2018-07-16 15:13:52 UTC
Permalink
Post by Amir Goldstein
Post by Jan Kara
Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.
Ack.
Found a minor nit.
[...]
Post by Jan Kara
+static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
+{
+ return container_of(entry, struct audit_tree_mark, fsn_mark);
+}
+
static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
audit_mark_put_chunk(chunk);
+ kmem_cache_free(audit_tree_mark_cachep, entry);
Technically, we should be freeing AUDIT_M(entry),
although it is the same address with the current struct layout.
Right, thanks for spotting this. Fixed in my tree.

Honza
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Paul Moore
2018-07-27 04:47:37 UTC
Permalink
Post by Jan Kara
Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.
---
kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 45 insertions(+), 14 deletions(-)
...
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index bce3b04a365d..aec9b27a20ff 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -38,6 +38,11 @@ struct audit_chunk {
} owners[];
};
+struct audit_tree_mark {
+ struct fsnotify_mark fsn_mark;
+ struct audit_chunk *chunk;
+};
It's probably okay to just call it "mark", considering we call
fsnotify_mark fields "mark" elsewhere. If we are going to change it
in one spot, we should probably change it in other places as well for
the sake of readability.
Post by Jan Kara
call_rcu(&chunk->head, __put_chunk);
}
+static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
+{
+ return container_of(entry, struct audit_tree_mark, fsn_mark);
+}
When I see a symbol in all caps I think "macro definition", but this
isn't a macro definition. I would suggest a different name, dropping
the caps, or converting it into a macro.

Also, unless I missed a call, it seems like after patch 10 all callers
of AUDIT_M end up getting the chunk field; maybe it is better if
AUDIT_M() ends up returning the audit_chunk pointer instead of the
audit_tree_mark pointer (and of course a name change if this is a
reasonable change)?
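For illustration only, an untested sketch of that direction (the
mark_chunk() name is made up here just to show the shape):

	static inline struct audit_chunk *mark_chunk(struct fsnotify_mark *mark)
	{
		return container_of(mark, struct audit_tree_mark, fsn_mark)->chunk;
	}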
Post by Jan Kara
static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
{
- struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+ struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
audit_mark_put_chunk(chunk);
+ kmem_cache_free(audit_tree_mark_cachep, entry);
+}
I think you've said you've already fixed the above (thanks for the
review, Amir!).
Post by Jan Kara
+static struct fsnotify_mark *alloc_fsnotify_mark(void)
+{
+ struct audit_tree_mark *mark;
+
+ mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
+ if (!mark)
+ return NULL;
+ fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
+ mark->fsn_mark.mask = FS_IN_IGNORED;
+ return &mark->fsn_mark;
}
Can't we just call it alloc_mark()? Or did you create the function
earlier in the patchset and I'm missing it now?

[SIDE NOTE: I have some rather big disagreements with the current
naming in this file, but keeping things consistent seems to be a Good
Thing (once again, this is an existing problem not specific to your
patches).]

--
paul moore
www.paul-moore.com
Jan Kara
2018-09-04 14:07:13 UTC
Permalink
Post by Paul Moore
Post by Jan Kara
Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.
---
kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 45 insertions(+), 14 deletions(-)
...
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index bce3b04a365d..aec9b27a20ff 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -38,6 +38,11 @@ struct audit_chunk {
} owners[];
};
+struct audit_tree_mark {
+ struct fsnotify_mark fsn_mark;
+ struct audit_chunk *chunk;
+};
It's probably okay to just call it "mark", considering we call
fsnotify_mark fields "mark" elsewhere. If we are going to change it
in one spot, we should probably change it in other places as well for
the sake of readability.
The current convention is that 'fsn_mark' (or sometimes 'entry') is a struct
fsnotify_mark while plain 'mark' is a struct audit_tree_mark (well, except
for audit_chunk AFAICS). So just replacing fsn_mark with mark is IMO going
to cause more confusion. But if you prefer a different naming convention,
this is the right moment to bring some consistency into the whole thing.
So how do you prefer to differentiate between fsnotify_mark and
audit_tree_mark?
After searching the code, and given your observation that audit_tree_mark is
rarely used directly, I guess I'll just do the fsn_mark -> mark and
entry -> mark renaming and invent some name for the few places where we use
audit_tree_mark directly.
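I.e. something along these lines (untested; assumes fsn_mark is renamed
to mark, naming not final):

	static inline struct audit_tree_mark *audit_mark(struct fsnotify_mark *mark)
	{
		return container_of(mark, struct audit_tree_mark, mark);
	}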

Honza
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Amir Goldstein
2018-07-11 07:58:24 UTC
Permalink
Post by Jan Kara
Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.
Ack. Nice re-factoring.

Thanks,
Amir.
Post by Jan Kara
---
kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
1 file changed, 40 insertions(+), 46 deletions(-)
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6a01738d1ac2..f419fdfc25b4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}
+static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
+ struct node *skip)
+{
+ struct audit_tree *owner;
+ int i, j;
+
+ new->key = old->key;
+ list_replace_init(&old->trees, &new->trees);
+ list_for_each_entry(owner, &new->trees, same_root)
+ owner->root = new;
+ for (i = j = 0; j < old->count; i++, j++) {
+ if (&old->owners[j] == skip) {
+ i--;
+ continue;
+ }
+ owner = old->owners[j].owner;
+ new->owners[i].owner = owner;
+ new->owners[i].index = old->owners[j].index - j + i;
+ if (!owner) /* result of earlier fallback */
+ continue;
+ get_tree(owner);
+ list_replace_init(&old->owners[j].list, &new->owners[i].list);
+ }
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
+ list_replace_rcu(&old->hash, &new->hash);
+}
+
static void untag_chunk(struct node *p)
{
struct audit_chunk *chunk = find_chunk(p);
@@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
struct audit_chunk *new = NULL;
struct audit_tree *owner;
int size = chunk->count - 1;
- int i, j;
fsnotify_get_mark(entry);
@@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
chunk->dead = 1;
spin_lock(&hash_lock);
- new->key = chunk->key;
- list_replace_init(&chunk->trees, &new->trees);
if (owner->root == chunk) {
list_del_init(&owner->same_root);
owner->root = NULL;
}
-
- for (i = j = 0; j <= size; i++, j++) {
- struct audit_tree *s;
- if (&chunk->owners[j] == p) {
- list_del_init(&p->list);
- i--;
- continue;
- }
- s = chunk->owners[j].owner;
- new->owners[i].owner = s;
- new->owners[i].index = chunk->owners[j].index - j + i;
- if (!s) /* result of earlier fallback */
- continue;
- get_tree(s);
- list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
- }
-
- list_for_each_entry(owner, &new->trees, same_root)
- owner->root = new;
+ list_del_init(&p->list);
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&chunk->hash, &new->hash);
+ replace_chunk(new, chunk, p);
spin_unlock(&hash_lock);
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
@@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
static int tag_chunk(struct inode *inode, struct audit_tree *tree)
{
struct fsnotify_mark *old_entry, *chunk_entry;
- struct audit_tree *owner;
struct audit_chunk *chunk, *old;
struct node *p;
int n;
@@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
fsnotify_put_mark(old_entry);
return 0;
}
- chunk->key = old->key;
- list_replace_init(&old->trees, &chunk->trees);
- for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
- struct audit_tree *s = old->owners[n].owner;
- p->owner = s;
- p->index = old->owners[n].index;
- if (!s) /* result of fallback in untag */
- continue;
- get_tree(s);
- list_replace_init(&old->owners[n].list, &p->list);
- }
+ p = &chunk->owners[chunk->count - 1];
p->index = (chunk->count - 1) | (1U<<31);
p->owner = tree;
get_tree(tree);
list_add(&p->list, &tree->chunks);
- list_for_each_entry(owner, &chunk->trees, same_root)
- owner->root = chunk;
old->dead = 1;
if (!tree->root) {
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
}
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&old->hash, &chunk->hash);
+ replace_chunk(chunk, old, NULL);
spin_unlock(&hash_lock);
fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
--
2.16.4
Jan Kara
2018-07-11 08:26:38 UTC
Permalink
Post by Amir Goldstein
Post by Jan Kara
Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.
Ack. Nice re-factoring.
Thanks. BTW, I didn't add your Acked-by tags because it wasn't obvious to
me whether you meant them as an official tag-worthy statement. So do you
want me to add the tags, or do you prefer to stay an "anonymous reviewer" ;)?

Honza
Post by Amir Goldstein
Thanks,
Amir.
Post by Jan Kara
---
kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
1 file changed, 40 insertions(+), 46 deletions(-)
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6a01738d1ac2..f419fdfc25b4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}
+static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
+ struct node *skip)
+{
+ struct audit_tree *owner;
+ int i, j;
+
+ new->key = old->key;
+ list_replace_init(&old->trees, &new->trees);
+ list_for_each_entry(owner, &new->trees, same_root)
+ owner->root = new;
+ for (i = j = 0; j < old->count; i++, j++) {
+ if (&old->owners[j] == skip) {
+ i--;
+ continue;
+ }
+ owner = old->owners[j].owner;
+ new->owners[i].owner = owner;
+ new->owners[i].index = old->owners[j].index - j + i;
+ if (!owner) /* result of earlier fallback */
+ continue;
+ get_tree(owner);
+ list_replace_init(&old->owners[j].list, &new->owners[i].list);
+ }
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
+ list_replace_rcu(&old->hash, &new->hash);
+}
+
static void untag_chunk(struct node *p)
{
struct audit_chunk *chunk = find_chunk(p);
@@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
struct audit_chunk *new = NULL;
struct audit_tree *owner;
int size = chunk->count - 1;
- int i, j;
fsnotify_get_mark(entry);
@@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
chunk->dead = 1;
spin_lock(&hash_lock);
- new->key = chunk->key;
- list_replace_init(&chunk->trees, &new->trees);
if (owner->root == chunk) {
list_del_init(&owner->same_root);
owner->root = NULL;
}
-
- for (i = j = 0; j <= size; i++, j++) {
- struct audit_tree *s;
- if (&chunk->owners[j] == p) {
- list_del_init(&p->list);
- i--;
- continue;
- }
- s = chunk->owners[j].owner;
- new->owners[i].owner = s;
- new->owners[i].index = chunk->owners[j].index - j + i;
- if (!s) /* result of earlier fallback */
- continue;
- get_tree(s);
- list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
- }
-
- list_for_each_entry(owner, &new->trees, same_root)
- owner->root = new;
+ list_del_init(&p->list);
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&chunk->hash, &new->hash);
+ replace_chunk(new, chunk, p);
spin_unlock(&hash_lock);
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
@@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
static int tag_chunk(struct inode *inode, struct audit_tree *tree)
{
struct fsnotify_mark *old_entry, *chunk_entry;
- struct audit_tree *owner;
struct audit_chunk *chunk, *old;
struct node *p;
int n;
@@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
fsnotify_put_mark(old_entry);
return 0;
}
- chunk->key = old->key;
- list_replace_init(&old->trees, &chunk->trees);
- for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
- struct audit_tree *s = old->owners[n].owner;
- p->owner = s;
- p->index = old->owners[n].index;
- if (!s) /* result of fallback in untag */
- continue;
- get_tree(s);
- list_replace_init(&old->owners[n].list, &p->list);
- }
+ p = &chunk->owners[chunk->count - 1];
p->index = (chunk->count - 1) | (1U<<31);
p->owner = tree;
get_tree(tree);
list_add(&p->list, &tree->chunks);
- list_for_each_entry(owner, &chunk->trees, same_root)
- owner->root = chunk;
old->dead = 1;
if (!tree->root) {
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
}
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&old->hash, &chunk->hash);
+ replace_chunk(chunk, old, NULL);
spin_unlock(&hash_lock);
fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
--
2.16.4
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Amir Goldstein
2018-07-11 09:01:35 UTC
Permalink
Post by Jan Kara
Post by Amir Goldstein
Post by Jan Kara
Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.
Ack. Nice re-factoring.
Thanks. BTW, I didn't add your Acked-by tags because it wasn't obvious to
me whether you meant them as an official tag-worthy statement. So do you
want me to add the tags, or do you prefer to stay an "anonymous reviewer" ;)?
I don't feel confident enough to give Reviewed-by/Acked-by on this series;
just letting you know that I eyeballed your patches, FWIW ;-)

I'm down to the last one, where it is agonizing to make sure nothing is
missed in the diff. At this point [10/10] I'd rather review the final code.

Can you push the work to a branch?

Thanks,
Amir.
Jan Kara
2018-07-11 09:23:58 UTC
Permalink
Post by Amir Goldstein
Post by Jan Kara
Post by Amir Goldstein
Post by Jan Kara
Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.
Ack. Nice re-factoring.
Thanks. BTW, I didn't add your Acked-by tags because it wasn't obvious to
me whether you meant them as an official tag-worthy statement. So do you
want me to add the tags, or do you prefer to stay an "anonymous reviewer" ;)?
I don't feel confident enough to give Reviewed-by/Acked-by on this series;
just letting you know that I eyeballed your patches, FWIW ;-)
I'm down to the last one, where it is agonizing to make sure nothing is
missed in the diff. At this point [10/10] I'd rather review the final code.
Yes, changing lifetime rules is almost always painful, especially in the
code you don't understand too well. It took me a day to understand audit
well enough to be confident to make that change...
Post by Amir Goldstein
Can you push the work to a branch?
Sure, audit_cleanup branch pushed out to
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git

Thanks for your review!

Honza
--
Jan Kara <***@suse.com>
SUSE Labs, CR
Paul Moore
2018-07-27 04:47:27 UTC
Permalink
Post by Jan Kara
Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.
---
kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
1 file changed, 40 insertions(+), 46 deletions(-)
Ooof. That code sucks. Not your code, Jan, rather the code you are
replacing. Thanks for doing this.

I believe this looks right, although see my previous comment/question
about RCU/barriers/etc. that affects some of this code (inherited from
a previous patch in the patchset).
Post by Jan Kara
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6a01738d1ac2..f419fdfc25b4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
return container_of(p, struct audit_chunk, owners[0]);
}
+static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
+ struct node *skip)
+{
+ struct audit_tree *owner;
+ int i, j;
+
+ new->key = old->key;
+ list_replace_init(&old->trees, &new->trees);
+ list_for_each_entry(owner, &new->trees, same_root)
+ owner->root = new;
+ for (i = j = 0; j < old->count; i++, j++) {
+ if (&old->owners[j] == skip) {
+ i--;
+ continue;
+ }
+ owner = old->owners[j].owner;
+ new->owners[i].owner = owner;
+ new->owners[i].index = old->owners[j].index - j + i;
+ if (!owner) /* result of earlier fallback */
+ continue;
+ get_tree(owner);
+ list_replace_init(&old->owners[j].list, &new->owners[i].list);
+ }
+ /*
+ * Make sure chunk is fully initialized before making it visible in the
+ * hash. Pairs with a data dependency barrier in READ_ONCE() in
+ * audit_tree_lookup().
+ */
+ smp_wmb();
+ list_replace_rcu(&old->hash, &new->hash);
+}
+
static void untag_chunk(struct node *p)
{
struct audit_chunk *chunk = find_chunk(p);
@@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
struct audit_chunk *new = NULL;
struct audit_tree *owner;
int size = chunk->count - 1;
- int i, j;
fsnotify_get_mark(entry);
@@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
chunk->dead = 1;
spin_lock(&hash_lock);
- new->key = chunk->key;
- list_replace_init(&chunk->trees, &new->trees);
if (owner->root == chunk) {
list_del_init(&owner->same_root);
owner->root = NULL;
}
-
- for (i = j = 0; j <= size; i++, j++) {
- struct audit_tree *s;
- if (&chunk->owners[j] == p) {
- list_del_init(&p->list);
- i--;
- continue;
- }
- s = chunk->owners[j].owner;
- new->owners[i].owner = s;
- new->owners[i].index = chunk->owners[j].index - j + i;
- if (!s) /* result of earlier fallback */
- continue;
- get_tree(s);
- list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
- }
-
- list_for_each_entry(owner, &new->trees, same_root)
- owner->root = new;
+ list_del_init(&p->list);
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&chunk->hash, &new->hash);
+ replace_chunk(new, chunk, p);
spin_unlock(&hash_lock);
fsnotify_detach_mark(entry);
mutex_unlock(&entry->group->mark_mutex);
@@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
static int tag_chunk(struct inode *inode, struct audit_tree *tree)
{
struct fsnotify_mark *old_entry, *chunk_entry;
- struct audit_tree *owner;
struct audit_chunk *chunk, *old;
struct node *p;
int n;
@@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
fsnotify_put_mark(old_entry);
return 0;
}
- chunk->key = old->key;
- list_replace_init(&old->trees, &chunk->trees);
- for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
- struct audit_tree *s = old->owners[n].owner;
- p->owner = s;
- p->index = old->owners[n].index;
- if (!s) /* result of fallback in untag */
- continue;
- get_tree(s);
- list_replace_init(&old->owners[n].list, &p->list);
- }
+ p = &chunk->owners[chunk->count - 1];
p->index = (chunk->count - 1) | (1U<<31);
p->owner = tree;
get_tree(tree);
list_add(&p->list, &tree->chunks);
- list_for_each_entry(owner, &chunk->trees, same_root)
- owner->root = chunk;
old->dead = 1;
if (!tree->root) {
tree->root = chunk;
list_add(&tree->same_root, &chunk->trees);
}
/*
- * Make sure chunk is fully initialized before making it visible in the
- * hash. Pairs with a data dependency barrier in READ_ONCE() in
- * audit_tree_lookup().
+ * This has to go last when updating chunk as once replace_chunk() is
+ * called, new RCU readers can see the new chunk.
*/
- smp_wmb();
- list_replace_rcu(&old->hash, &chunk->hash);
+ replace_chunk(chunk, old, NULL);
spin_unlock(&hash_lock);
fsnotify_detach_mark(old_entry);
mutex_unlock(&audit_tree_group->mark_mutex);
--
2.16.4
--
paul moore
www.paul-moore.com