[PATCH 00/12] Add kdbus implementation
Greg Kroah-Hartman
2014-10-29 22:03:38 UTC
kdbus is a kernel-level IPC implementation that aims to resemble the
protocol layer of the existing userspace D-Bus daemon while enabling
some features that couldn't be implemented before in userspace.

The documentation added by the first patch in this series is meant to
explain all protocol and API details comprehensively, but here's a terse
list of the kdbus key features:

* Implemented as a char driver, which creates device nodes on demand
when the corresponding domains, buses and endpoints are created.

* Message transfer over shared memory areas in each peer's task to
avoid unnecessary extra data copies during message exchanges.

* Optional passing of file descriptors and sealed memfds along with
messages.

* No demarshalling of any message content from inside the kernel;
the driver stays entirely agnostic to the transported payload.

* Support for multiple domains, completely separated from each other,
allowing multiple virtualized instances to be used at the same time.

* Support for peer-to-peer unicast and multicast messages.

* Attachment of trustable metadata to each message on demand, such as
the sending peer's timestamp, creds, auxgroups, comm, exe, cmdline,
cgroup path, capabilities, security label, audit information, etc,
each taken at the time the sender issued the ioctl to send the
message. Which of those are actually recorded and attached is
controlled by the receiving peer.

* Bloom filters as a measure to pre-filter broadcast messages and to
mitigate unnecessary task wakeups. On the kernel side, however, this
is just a cheap &-operation; hash functions are left to be
implemented by userspace.

* Optional message dequeuing by priority, allowing multiple types of
payloads of different priorities to be transported over the same
connection.

* Global, domain-wide guaranteed message ordering.

* Eavesdropping on buses for debugging.

* Addressing of remote peers by their numerical unique ID, or by a
well-known name.

* Built-in name registry for atomic name ownership lookups, claims,
releases and take-overs from one peer to another.

* Simple policy database to restrict peers from seeing or talking to
each other, and to control name ownership.

* Custom bus endpoints in addition to the default ones. Those allow
uploading extra policy rules, and can act as a protocol-filtering
bus firewall.

* Kernel-generated notifications on connected and disconnected peers,
claimed and released well-known-names, and exceeded reply timeouts.
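
As a rough illustration of the Bloom filter item above, the kernel-side
check can be pictured as a word-by-word AND: a message may match a
subscription only if every bit set in the receiver's mask is also set in
the filter the sender attached. The sketch below is userspace C written
for this note, not code from the patch set; the function name and the
word layout are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of kdbus): returns true if every bit
 * set in the receiver's mask is also set in the sender's filter, i.e.
 * the message cannot be ruled out and the receiver should be woken.
 */
static bool bloom_may_match(const uint64_t *filter, const uint64_t *mask,
                            size_t n_words)
{
	size_t i;

	assert(filter && mask);

	for (i = 0; i < n_words; i++)
		if ((filter[i] & mask[i]) != mask[i])
			return false;

	return true;
}
```

How the bits are derived from message properties (the hash functions) is
left to userspace, as stated above; a false positive only costs an extra
wakeup, while a miss here reliably filters the message out.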

This is the first submission of kdbus to the kernel community. It was
developed in its own repository for well over a year, and has been
tested on x86-64, i686 and ARM architectures in various use cases. The
driver is totally non-intrusive and doesn't touch a single line of
existing kernel code.

kdbus has been worked on collaboratively by many people contributing
code and suggestions during its development. Below is a list of all
involved individuals, in alphabetical order.

Alban Crequy, Arnd Bergmann, Christian S., Daniel Kowalski,
Daniel Mack, David Herrmann, Djalal Harouni, Govindarajulu
Varadarajan, Greg Kroah-Hartman, Harald Hoyer, Hristo Venev,
Ingo van Lil, Jacek Janczyk, Jason A. Donenfeld, John de
la Garza, Kay Sievers, Lennart Poettering, Lukasz Skalski,
Maciej Wereski, Marc-Antoine Perennou, Marcel Holtmann,
Michal Eljasiewicz, Michele Curti, Przemyslaw Kedzierski,
Radoslaw Pajak, Ryan Lortie, Simon McVittie, Simon Peeters,
Stefan Beller, Ted Feng, Tejun Heo, Tero Roponen, Thomas
Andersen, Torstein Husebø, Vasiliy Balyasnyy.

Some statistics: the driver itself has a little more than 11k lines,
with ~25% of the lines being comments. Our test suite weighs in at
another 6k lines, and the API documentation file currently has >1800
lines. The loaded kernel module has ~70kB of text size.

Patches #3 to #10 carry the driver implementation in digestible bites,
but only #11 adds the Makefile to actually compile them. That division
can of course be changed, and the patches be squashed and reordered
later.

The rest should be pretty much self-explanatory - the individual commit
logs and Documentation/kdbus.txt contain detailed information on the
driver's inner life.

While we consider the kernel API/ABI mostly stable at this point, we're
still in the process of fixing up some loose ends in userspace, such as
compatibility layers and the D-Bus spec, but that shouldn't affect the
kernel side much anymore.

As for maintainership, Daniel Mack, David Herrmann, Djalal Harouni and
I would be taking care of it in the future.

I'll also be keeping this in a git tree, the kdbus branch of
char-misc.git at:
https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/

thanks,

greg k-h

Daniel Mack (12):
kdbus: add documentation
kdbus: add header file
kdbus: add driver skeleton, ioctl entry points and utility functions
kdbus: add connection pool implementation
kdbus: add connection, queue handling and message validation code
kdbus: add code to gather metadata
kdbus: add code for notifications and matches
kdbus: add code for buses, domains and endpoints
kdbus: add name registry implementation
kdbus: add policy database implementation
kdbus: add Makefile, Kconfig and MAINTAINERS entry
kdbus: add selftests

Documentation/ioctl/ioctl-number.txt | 1 +
Documentation/kdbus.txt | 1815 ++++++++++++++++++++++
MAINTAINERS | 12 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/kdbus/Kconfig | 11 +
drivers/misc/kdbus/Makefile | 19 +
drivers/misc/kdbus/bus.c | 450 ++++++
drivers/misc/kdbus/bus.h | 107 ++
drivers/misc/kdbus/connection.c | 1751 +++++++++++++++++++++
drivers/misc/kdbus/connection.h | 177 +++
drivers/misc/kdbus/domain.c | 477 ++++++
drivers/misc/kdbus/domain.h | 105 ++
drivers/misc/kdbus/endpoint.c | 567 +++++++
drivers/misc/kdbus/endpoint.h | 94 ++
drivers/misc/kdbus/handle.c | 1221 +++++++++++++++
drivers/misc/kdbus/handle.h | 46 +
drivers/misc/kdbus/item.c | 256 +++
drivers/misc/kdbus/item.h | 40 +
drivers/misc/kdbus/limits.h | 77 +
drivers/misc/kdbus/main.c | 70 +
drivers/misc/kdbus/match.c | 521 +++++++
drivers/misc/kdbus/match.h | 30 +
drivers/misc/kdbus/message.c | 420 +++++
drivers/misc/kdbus/message.h | 72 +
drivers/misc/kdbus/metadata.c | 626 ++++++++
drivers/misc/kdbus/metadata.h | 51 +
drivers/misc/kdbus/names.c | 920 +++++++++++
drivers/misc/kdbus/names.h | 81 +
drivers/misc/kdbus/notify.c | 235 +++
drivers/misc/kdbus/notify.h | 28 +
drivers/misc/kdbus/policy.c | 617 ++++++++
drivers/misc/kdbus/policy.h | 60 +
drivers/misc/kdbus/pool.c | 728 +++++++++
drivers/misc/kdbus/pool.h | 43 +
drivers/misc/kdbus/queue.c | 602 +++++++
drivers/misc/kdbus/queue.h | 82 +
drivers/misc/kdbus/util.c | 108 ++
drivers/misc/kdbus/util.h | 94 ++
include/uapi/linux/kdbus.h | 918 +++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kdbus/.gitignore | 11 +
tools/testing/selftests/kdbus/Makefile | 46 +
tools/testing/selftests/kdbus/kdbus-enum.c | 90 ++
tools/testing/selftests/kdbus/kdbus-enum.h | 14 +
tools/testing/selftests/kdbus/kdbus-test.c | 474 ++++++
tools/testing/selftests/kdbus/kdbus-test.h | 79 +
tools/testing/selftests/kdbus/kdbus-util.c | 1173 ++++++++++++++
tools/testing/selftests/kdbus/kdbus-util.h | 139 ++
tools/testing/selftests/kdbus/test-activator.c | 317 ++++
tools/testing/selftests/kdbus/test-benchmark.c | 417 +++++
tools/testing/selftests/kdbus/test-bus.c | 117 ++
tools/testing/selftests/kdbus/test-chat.c | 123 ++
tools/testing/selftests/kdbus/test-connection.c | 258 +++
tools/testing/selftests/kdbus/test-daemon.c | 66 +
tools/testing/selftests/kdbus/test-domain.c | 65 +
tools/testing/selftests/kdbus/test-endpoint.c | 221 +++
tools/testing/selftests/kdbus/test-fd.c | 473 ++++++
tools/testing/selftests/kdbus/test-free.c | 34 +
tools/testing/selftests/kdbus/test-match.c | 385 +++++
tools/testing/selftests/kdbus/test-message.c | 126 ++
tools/testing/selftests/kdbus/test-metadata-ns.c | 236 +++
tools/testing/selftests/kdbus/test-monitor.c | 156 ++
tools/testing/selftests/kdbus/test-names.c | 184 +++
tools/testing/selftests/kdbus/test-policy-ns.c | 578 +++++++
tools/testing/selftests/kdbus/test-policy-priv.c | 1168 ++++++++++++++
tools/testing/selftests/kdbus/test-policy.c | 81 +
tools/testing/selftests/kdbus/test-race.c | 313 ++++
tools/testing/selftests/kdbus/test-sync.c | 241 +++
tools/testing/selftests/kdbus/test-timeout.c | 97 ++
70 files changed, 21217 insertions(+)
create mode 100644 Documentation/kdbus.txt
create mode 100644 drivers/misc/kdbus/Kconfig
create mode 100644 drivers/misc/kdbus/Makefile
create mode 100644 drivers/misc/kdbus/bus.c
create mode 100644 drivers/misc/kdbus/bus.h
create mode 100644 drivers/misc/kdbus/connection.c
create mode 100644 drivers/misc/kdbus/connection.h
create mode 100644 drivers/misc/kdbus/domain.c
create mode 100644 drivers/misc/kdbus/domain.h
create mode 100644 drivers/misc/kdbus/endpoint.c
create mode 100644 drivers/misc/kdbus/endpoint.h
create mode 100644 drivers/misc/kdbus/handle.c
create mode 100644 drivers/misc/kdbus/handle.h
create mode 100644 drivers/misc/kdbus/item.c
create mode 100644 drivers/misc/kdbus/item.h
create mode 100644 drivers/misc/kdbus/limits.h
create mode 100644 drivers/misc/kdbus/main.c
create mode 100644 drivers/misc/kdbus/match.c
create mode 100644 drivers/misc/kdbus/match.h
create mode 100644 drivers/misc/kdbus/message.c
create mode 100644 drivers/misc/kdbus/message.h
create mode 100644 drivers/misc/kdbus/metadata.c
create mode 100644 drivers/misc/kdbus/metadata.h
create mode 100644 drivers/misc/kdbus/names.c
create mode 100644 drivers/misc/kdbus/names.h
create mode 100644 drivers/misc/kdbus/notify.c
create mode 100644 drivers/misc/kdbus/notify.h
create mode 100644 drivers/misc/kdbus/policy.c
create mode 100644 drivers/misc/kdbus/policy.h
create mode 100644 drivers/misc/kdbus/pool.c
create mode 100644 drivers/misc/kdbus/pool.h
create mode 100644 drivers/misc/kdbus/queue.c
create mode 100644 drivers/misc/kdbus/queue.h
create mode 100644 drivers/misc/kdbus/util.c
create mode 100644 drivers/misc/kdbus/util.h
create mode 100644 include/uapi/linux/kdbus.h
create mode 100644 tools/testing/selftests/kdbus/.gitignore
create mode 100644 tools/testing/selftests/kdbus/Makefile
create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
create mode 100644 tools/testing/selftests/kdbus/test-activator.c
create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
create mode 100644 tools/testing/selftests/kdbus/test-bus.c
create mode 100644 tools/testing/selftests/kdbus/test-chat.c
create mode 100644 tools/testing/selftests/kdbus/test-connection.c
create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
create mode 100644 tools/testing/selftests/kdbus/test-domain.c
create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
create mode 100644 tools/testing/selftests/kdbus/test-fd.c
create mode 100644 tools/testing/selftests/kdbus/test-free.c
create mode 100644 tools/testing/selftests/kdbus/test-match.c
create mode 100644 tools/testing/selftests/kdbus/test-message.c
create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
create mode 100644 tools/testing/selftests/kdbus/test-names.c
create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
create mode 100644 tools/testing/selftests/kdbus/test-policy.c
create mode 100644 tools/testing/selftests/kdbus/test-race.c
create mode 100644 tools/testing/selftests/kdbus/test-sync.c
create mode 100644 tools/testing/selftests/kdbus/test-timeout.c
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Greg Kroah-Hartman
2014-10-29 22:03:42 UTC
From: Daniel Mack <***@zonque.org>

A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.

See kdbus.txt for more details on which metadata can currently be
attached to messages.
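
A small sketch of the selection logic may help here. kdbus_meta_append()
in the diff below collects only the items that are requested and not yet
attached, using `which & ~meta->attached`. The userspace C below mirrors
just that masking step; the flag values, the struct and the function name
are made up for illustration and are not the kdbus UAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kdbus constants. */
#define ATTACH_TIMESTAMP (UINT64_C(1) << 0)
#define ATTACH_CREDS     (UINT64_C(1) << 1)
#define ATTACH_COMM      (UINT64_C(1) << 2)

struct meta_sketch {
	uint64_t attached;      /* bits for items already collected */
	unsigned int collected; /* number of collection steps that ran */
};

/* Collect what is wanted but missing; returns the bits handled now. */
static uint64_t meta_collect(struct meta_sketch *m, uint64_t which)
{
	uint64_t mask;

	assert(m);

	mask = which & ~m->attached;

	if (mask & ATTACH_TIMESTAMP) {
		m->collected++; /* stands in for copying the timestamp */
		m->attached |= ATTACH_TIMESTAMP;
	}
	if (mask & ATTACH_CREDS) {
		m->collected++; /* stands in for copying the creds */
		m->attached |= ATTACH_CREDS;
	}
	if (mask & ATTACH_COMM) {
		m->collected++; /* stands in for copying the comm string */
		m->attached |= ATTACH_COMM;
	}

	return mask;
}
```

Calling this twice with overlapping flag sets collects each item once,
so one metadata object can serve several receivers with different
attach_flags without gathering the same data again.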

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/metadata.c | 626 ++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/metadata.h | 51 ++++
2 files changed, 677 insertions(+)
create mode 100644 drivers/misc/kdbus/metadata.c
create mode 100644 drivers/misc/kdbus/metadata.h

diff --git a/drivers/misc/kdbus/metadata.c b/drivers/misc/kdbus/metadata.c
new file mode 100644
index 000000000000..8323e6d7a071
--- /dev/null
+++ b/drivers/misc/kdbus/metadata.c
@@ -0,0 +1,626 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/pid_namespace.h>
+#include <linux/sched.h>
+#include <linux/security.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/user_namespace.h>
+#include <linux/version.h>
+
+#include "connection.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+
+/**
+ * kdbus_meta_new() - create new metadata object
+ * @meta: New metadata object
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_meta_new(struct kdbus_meta **meta)
+{
+ struct kdbus_meta *m;
+
+ BUG_ON(*meta);
+
+ m = kzalloc(sizeof(*m), GFP_KERNEL);
+ if (!m)
+ return -ENOMEM;
+
+ /*
+ * Remember the PID and user namespaces our credentials belong to;
+ * we need to prevent leaking authorization and security-relevant
+ * data across different namespaces.
+ */
+ m->pid_namespace = get_pid_ns(task_active_pid_ns(current));
+ m->user_namespace = get_user_ns(current_user_ns());
+
+ *meta = m;
+ return 0;
+}
+
+/**
+ * kdbus_meta_dup() - Duplicate a meta object
+ *
+ * @orig: The meta object to duplicate
+ * @copy: Return pointer for the duplicated object
+ *
+ * Return: 0 on success, -ENOMEM on memory allocation failures.
+ */
+int kdbus_meta_dup(const struct kdbus_meta *orig,
+ struct kdbus_meta **copy)
+{
+ struct kdbus_meta *m;
+
+ BUG_ON(!orig || !copy);
+
+ m = kmalloc(sizeof(*m), GFP_KERNEL);
+ if (!m)
+ return -ENOMEM;
+
+ m->data = kmemdup(orig->data, orig->allocated_size, GFP_KERNEL);
+ if (!m->data) {
+ kfree(m);
+ return -ENOMEM;
+ }
+
+ m->pid_namespace = get_pid_ns(orig->pid_namespace);
+ m->user_namespace = get_user_ns(orig->user_namespace);
+
+ m->attached = orig->attached;
+ m->allocated_size = orig->allocated_size;
+ m->size = orig->size;
+
+ *copy = m;
+ return 0;
+}
+
+/**
+ * kdbus_meta_ns_eq() - check whether the namespaces of two metadata objects
+ * are equal.
+ * @meta_a: Metadata A
+ * @meta_b: Metadata B
+ *
+ * Return: true if the two objects have the same namespaces, false otherwise.
+ */
+bool kdbus_meta_ns_eq(const struct kdbus_meta *meta_a,
+ const struct kdbus_meta *meta_b)
+{
+ return (meta_a->pid_namespace == meta_b->pid_namespace &&
+ meta_a->user_namespace == meta_b->user_namespace);
+}
+
+/**
+ * kdbus_meta_free() - release metadata
+ * @meta: Metadata object
+ */
+void kdbus_meta_free(struct kdbus_meta *meta)
+{
+ if (!meta)
+ return;
+
+ put_pid_ns(meta->pid_namespace);
+ put_user_ns(meta->user_namespace);
+
+ kfree(meta->data);
+ kfree(meta);
+}
+
+static struct kdbus_item *
+kdbus_meta_append_item(struct kdbus_meta *meta, u64 type, size_t payload_size)
+{
+ size_t extra_size = KDBUS_ITEM_SIZE(payload_size);
+ struct kdbus_item *item;
+ size_t size;
+
+ /* get new metadata buffer, pre-allocate at least 512 bytes */
+ if (!meta->data) {
+ size = roundup_pow_of_two(256 + extra_size);
+ meta->data = kzalloc(size, GFP_KERNEL);
+ if (!meta->data)
+ return ERR_PTR(-ENOMEM);
+
+ meta->allocated_size = size;
+ }
+
+ /* double the pre-allocated buffer size if needed */
+ size = meta->size + extra_size;
+ if (size > meta->allocated_size) {
+ size_t size_diff;
+ struct kdbus_item *data;
+
+ size = roundup_pow_of_two(size);
+ size_diff = size - meta->allocated_size;
+ data = kmalloc(size, GFP_KERNEL);
+ if (!data)
+ return ERR_PTR(-ENOMEM);
+
+ memcpy(data, meta->data, meta->size);
+ memset((u8 *)data + meta->allocated_size, 0, size_diff);
+
+ kfree(meta->data);
+ meta->data = data;
+ meta->allocated_size = size;
+ }
+
+ /* insert new record */
+ item = (struct kdbus_item *)((u8 *)meta->data + meta->size);
+ item->type = type;
+ item->size = KDBUS_ITEM_HEADER_SIZE + payload_size;
+
+ meta->size += extra_size;
+
+ return item;
+}
+
+/**
+ * kdbus_meta_append_data() - append given raw data to metadata object
+ * @meta: Metadata object
+ * @type: KDBUS_ITEM_* type
+ * @data: pointer to data to copy from. If it is NULL
+ * then just make space in the metadata buffer.
+ * @len: number of bytes to copy
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_meta_append_data(struct kdbus_meta *meta, u64 type,
+ const void *data, size_t len)
+{
+ struct kdbus_item *item;
+
+ if (len == 0)
+ return 0;
+
+ item = kdbus_meta_append_item(meta, type, len);
+ if (IS_ERR(item))
+ return PTR_ERR(item);
+
+ if (data)
+ memcpy(item->data, data, len);
+
+ return 0;
+}
+
+static int kdbus_meta_append_str(struct kdbus_meta *meta, u64 type,
+ const char *str)
+{
+ return kdbus_meta_append_data(meta, type, str, strlen(str) + 1);
+}
+
+static int kdbus_meta_append_timestamp(struct kdbus_meta *meta,
+ u64 seq)
+{
+ struct kdbus_item *item;
+ struct timespec ts;
+
+ item = kdbus_meta_append_item(meta, KDBUS_ITEM_TIMESTAMP,
+ sizeof(struct kdbus_timestamp));
+ if (IS_ERR(item))
+ return PTR_ERR(item);
+
+ if (seq > 0)
+ item->timestamp.seqnum = seq;
+
+ ktime_get_ts(&ts);
+ item->timestamp.monotonic_ns = timespec_to_ns(&ts);
+
+ ktime_get_real_ts(&ts);
+ item->timestamp.realtime_ns = timespec_to_ns(&ts);
+
+ return 0;
+}
+
+static int kdbus_meta_append_cred(struct kdbus_meta *meta)
+{
+ struct kdbus_creds creds = {
+ .uid = from_kuid_munged(current_user_ns(), current_uid()),
+ .gid = from_kgid_munged(current_user_ns(), current_gid()),
+ .pid = task_pid_vnr(current),
+ .tid = task_tgid_vnr(current),
+ .starttime = current->start_time,
+ };
+
+ return kdbus_meta_append_data(meta, KDBUS_ITEM_CREDS,
+ &creds, sizeof(creds));
+}
+
+static int kdbus_meta_append_auxgroups(struct kdbus_meta *meta)
+{
+ struct group_info *info;
+ struct kdbus_item *item;
+ int i, ret = 0;
+ u64 *gid;
+
+ info = get_current_groups();
+ item = kdbus_meta_append_item(meta, KDBUS_ITEM_AUXGROUPS,
+ info->ngroups * sizeof(*gid));
+ if (IS_ERR(item)) {
+ ret = PTR_ERR(item);
+ goto exit_put_groups;
+ }
+
+ gid = (u64 *) item->data;
+
+ for (i = 0; i < info->ngroups; i++)
+ gid[i] = from_kgid_munged(current_user_ns(), GROUP_AT(info, i));
+
+exit_put_groups:
+ put_group_info(info);
+
+ return ret;
+}
+
+static int kdbus_meta_append_src_names(struct kdbus_meta *meta,
+ struct kdbus_conn *conn)
+{
+ struct kdbus_name_entry *e;
+ int ret = 0;
+
+ if (!conn)
+ return 0;
+
+ mutex_lock(&conn->lock);
+ list_for_each_entry(e, &conn->names_list, conn_entry) {
+ struct kdbus_item *item;
+ size_t len;
+
+ len = strlen(e->name) + 1;
+ item = kdbus_meta_append_item(meta, KDBUS_ITEM_NAME,
+ sizeof(struct kdbus_name) + len);
+ if (IS_ERR(item)) {
+ ret = PTR_ERR(item);
+ break;
+ }
+
+ item->name.flags = e->flags;
+ memcpy(item->name.name, e->name, len);
+ }
+ mutex_unlock(&conn->lock);
+
+ return ret;
+}
+
+static int kdbus_meta_append_exe(struct kdbus_meta *meta)
+{
+ struct mm_struct *mm = get_task_mm(current);
+ struct path *exe_path = NULL;
+ char *pathname;
+ int ret = 0;
+ size_t len;
+ char *tmp;
+
+ if (!mm)
+ return -EFAULT;
+
+ down_read(&mm->mmap_sem);
+ if (mm->exe_file) {
+ path_get(&mm->exe_file->f_path);
+ exe_path = &mm->exe_file->f_path;
+ }
+ up_read(&mm->mmap_sem);
+
+ if (!exe_path)
+ goto exit_mmput;
+
+ tmp = (char *)__get_free_page(GFP_TEMPORARY | __GFP_ZERO);
+ if (!tmp) {
+ ret = -ENOMEM;
+ goto exit_path_put;
+ }
+
+ pathname = d_path(exe_path, tmp, PAGE_SIZE);
+ if (IS_ERR(pathname)) {
+ ret = PTR_ERR(pathname);
+ goto exit_free_page;
+ }
+
+ len = tmp + PAGE_SIZE - pathname;
+ ret = kdbus_meta_append_data(meta, KDBUS_ITEM_EXE, pathname, len);
+
+exit_free_page:
+ free_page((unsigned long) tmp);
+
+exit_path_put:
+ path_put(exe_path);
+
+exit_mmput:
+ mmput(mm);
+
+ return ret;
+}
+
+static int kdbus_meta_append_cmdline(struct kdbus_meta *meta)
+{
+ struct mm_struct *mm;
+ int ret = 0;
+ size_t len;
+ char *tmp;
+
+ tmp = (char *)__get_free_page(GFP_TEMPORARY | __GFP_ZERO);
+ if (!tmp)
+ return -ENOMEM;
+
+ mm = get_task_mm(current);
+ if (!mm) {
+ ret = -EFAULT;
+ goto exit_free_page;
+ }
+
+ if (!mm->arg_end)
+ goto exit_mmput;
+
+ len = mm->arg_end - mm->arg_start;
+ if (len > PAGE_SIZE)
+ len = PAGE_SIZE;
+
+ ret = copy_from_user(tmp, (const char __user *)mm->arg_start, len);
+ if (ret < 0)
+ goto exit_mmput;
+
+ ret = kdbus_meta_append_data(meta, KDBUS_ITEM_CMDLINE, tmp, len);
+
+exit_mmput:
+ mmput(mm);
+
+exit_free_page:
+ free_page((unsigned long) tmp);
+
+ return ret;
+}
+
+static int kdbus_meta_append_caps(struct kdbus_meta *meta)
+{
+ struct caps {
+ u32 last_cap;
+ struct {
+ u32 caps[_KERNEL_CAPABILITY_U32S];
+ } set[4];
+ } caps;
+ unsigned int i;
+ const struct cred *cred = current_cred();
+
+ caps.last_cap = CAP_LAST_CAP;
+
+ for (i = 0; i < _KERNEL_CAPABILITY_U32S; i++) {
+ caps.set[0].caps[i] = cred->cap_inheritable.cap[i];
+ caps.set[1].caps[i] = cred->cap_permitted.cap[i];
+ caps.set[2].caps[i] = cred->cap_effective.cap[i];
+ caps.set[3].caps[i] = cred->cap_bset.cap[i];
+ }
+
+ /* clear unused bits */
+ for (i = 0; i < 4; i++)
+ caps.set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
+ CAP_TO_MASK(CAP_LAST_CAP + 1) - 1;
+
+ return kdbus_meta_append_data(meta, KDBUS_ITEM_CAPS,
+ &caps, sizeof(caps));
+}
+
+#ifdef CONFIG_CGROUPS
+static int kdbus_meta_append_cgroup(struct kdbus_meta *meta)
+{
+ char *buf, *path;
+ int ret;
+
+ buf = (char *)__get_free_page(GFP_TEMPORARY | __GFP_ZERO);
+ if (!buf)
+ return -ENOMEM;
+
+ path = task_cgroup_path(current, buf, PAGE_SIZE);
+
+ if (path)
+ ret = kdbus_meta_append_str(meta, KDBUS_ITEM_CGROUP, path);
+ else
+ ret = -ENAMETOOLONG;
+
+ free_page((unsigned long) buf);
+
+ return ret;
+}
+#endif
+
+#ifdef CONFIG_AUDITSYSCALL
+static int kdbus_meta_append_audit(struct kdbus_meta *meta)
+{
+ struct kdbus_audit audit;
+
+ audit.loginuid = from_kuid(current_user_ns(),
+ audit_get_loginuid(current));
+ audit.sessionid = audit_get_sessionid(current);
+
+ return kdbus_meta_append_data(meta, KDBUS_ITEM_AUDIT,
+ &audit, sizeof(audit));
+}
+#endif
+
+#ifdef CONFIG_SECURITY
+static int kdbus_meta_append_seclabel(struct kdbus_meta *meta)
+{
+ u32 len, sid;
+ char *label;
+ int ret;
+
+ security_task_getsecid(current, &sid);
+ ret = security_secid_to_secctx(sid, &label, &len);
+ if (ret == -EOPNOTSUPP)
+ return 0;
+ if (ret < 0)
+ return ret;
+
+ if (label && len > 0)
+ ret = kdbus_meta_append_data(meta, KDBUS_ITEM_SECLABEL,
+ label, len);
+ security_release_secctx(label, len);
+
+ return ret;
+}
+#endif
+
+/**
+ * kdbus_meta_append() - collect metadata from current process
+ * @meta: Metadata object
+ * @conn: Current connection to read names from
+ * @seq: Message sequence number
+ * @which: KDBUS_ATTACH_* flags, which type of data to attach
+ *
+ * Collect the data specified in flags and allocate or extend
+ * the buffer in the metadata object.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_meta_append(struct kdbus_meta *meta,
+ struct kdbus_conn *conn,
+ u64 seq, u64 which)
+{
+ int ret;
+ u64 mask;
+
+ /* which metadata is wanted but not yet attached? */
+ mask = which & ~meta->attached;
+ if (mask == 0)
+ return 0;
+
+ if (mask & KDBUS_ATTACH_TIMESTAMP) {
+ ret = kdbus_meta_append_timestamp(meta, seq);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_TIMESTAMP;
+ }
+
+ if (mask & KDBUS_ATTACH_CREDS) {
+ ret = kdbus_meta_append_cred(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_CREDS;
+ }
+
+ if (mask & KDBUS_ATTACH_AUXGROUPS) {
+ ret = kdbus_meta_append_auxgroups(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_AUXGROUPS;
+ }
+
+ if (mask & KDBUS_ATTACH_NAMES && conn) {
+ ret = kdbus_meta_append_src_names(meta, conn);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_NAMES;
+ }
+
+ if (mask & KDBUS_ATTACH_TID_COMM) {
+ char comm[TASK_COMM_LEN];
+
+ get_task_comm(comm, current->group_leader);
+ ret = kdbus_meta_append_str(meta, KDBUS_ITEM_TID_COMM, comm);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_TID_COMM;
+ }
+
+ if (mask & KDBUS_ATTACH_PID_COMM) {
+ char comm[TASK_COMM_LEN];
+
+ get_task_comm(comm, current);
+ ret = kdbus_meta_append_str(meta, KDBUS_ITEM_PID_COMM, comm);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_PID_COMM;
+ }
+
+ if (mask & KDBUS_ATTACH_EXE) {
+ ret = kdbus_meta_append_exe(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_EXE;
+ }
+
+ if (mask & KDBUS_ATTACH_CMDLINE) {
+ ret = kdbus_meta_append_cmdline(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_CMDLINE;
+ }
+
+ /* we always return 4 elements, the element size is 1/4 */
+ if (mask & KDBUS_ATTACH_CAPS) {
+ ret = kdbus_meta_append_caps(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_CAPS;
+ }
+
+#ifdef CONFIG_CGROUPS
+ /* attach the path of the one group hierarchy specified for the bus */
+ if (mask & KDBUS_ATTACH_CGROUP) {
+ ret = kdbus_meta_append_cgroup(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_CGROUP;
+ }
+#endif
+
+#ifdef CONFIG_AUDITSYSCALL
+ if (mask & KDBUS_ATTACH_AUDIT) {
+ ret = kdbus_meta_append_audit(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_AUDIT;
+ }
+#endif
+
+#ifdef CONFIG_SECURITY
+ if (mask & KDBUS_ATTACH_SECLABEL) {
+ ret = kdbus_meta_append_seclabel(meta);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_SECLABEL;
+ }
+#endif
+
+ if ((mask & KDBUS_ATTACH_CONN_NAME) && conn && conn->name) {
+ ret = kdbus_meta_append_str(meta, KDBUS_ITEM_CONN_NAME,
+ conn->name);
+ if (ret < 0)
+ return ret;
+
+ meta->attached |= KDBUS_ATTACH_CONN_NAME;
+ }
+
+ return 0;
+}
diff --git a/drivers/misc/kdbus/metadata.h b/drivers/misc/kdbus/metadata.h
new file mode 100644
index 000000000000..a2728f57a06f
--- /dev/null
+++ b/drivers/misc/kdbus/metadata.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_METADATA_H
+#define __KDBUS_METADATA_H
+
+/**
+ * struct kdbus_meta - metadata buffer
+ * @attached: Flags for already attached data
+ * @domain: Domain the metadata belongs to
+ * @data: Allocated buffer
+ * @size: Number of bytes used
+ * @allocated_size: Size of buffer
+ *
+ * Used to collect and store connection metadata in a pre-compiled
+ * buffer containing struct kdbus_item.
+ */
+struct kdbus_meta {
+ u64 attached;
+ struct pid_namespace *pid_namespace;
+ struct user_namespace *user_namespace;
+ struct kdbus_item *data;
+ size_t size;
+ size_t allocated_size;
+};
+
+struct kdbus_conn;
+
+int kdbus_meta_new(struct kdbus_meta **meta);
+int kdbus_meta_dup(const struct kdbus_meta *orig,
+ struct kdbus_meta **copy);
+int kdbus_meta_append_data(struct kdbus_meta *meta, u64 type,
+ const void *buf, size_t len);
+int kdbus_meta_append(struct kdbus_meta *meta,
+ struct kdbus_conn *conn,
+ u64 seq,
+ u64 which);
+void kdbus_meta_free(struct kdbus_meta *meta);
+bool kdbus_meta_ns_eq(const struct kdbus_meta *meta_a,
+ const struct kdbus_meta *meta_b);
+#endif
--
2.1.2

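For readers who want to experiment with the buffer handling outside the
kernel, the following userspace sketch imitates what
kdbus_meta_append_item() in the patch above does: each record is a
header followed by a payload, padded so the next record stays aligned,
and the backing buffer grows to the next power of two when it runs out.
The 16-byte header (two u64 fields), the 8-byte alignment and the fixed
512-byte minimum are assumptions made for this sketch, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ITEM_HEADER_SIZE ((size_t)16)   /* assumed: u64 size + u64 type */
#define ALIGN8(v) (((v) + (size_t)7) & ~(size_t)7)

struct item_buf {
	uint8_t *data;    /* zero-initialized backing buffer */
	size_t size;      /* bytes used, always a multiple of 8 */
	size_t allocated; /* capacity of data */
};

static size_t next_pow2(size_t v)
{
	size_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Append one record; payload may be NULL to only reserve space. */
static int item_buf_append(struct item_buf *b, uint64_t type,
                           const void *payload, size_t len)
{
	size_t item_size = ITEM_HEADER_SIZE + len;
	size_t extra = ALIGN8(item_size);
	size_t need;
	uint64_t hdr[2];

	assert(b);
	need = b->size + extra;

	if (need > b->allocated) {
		size_t cap = next_pow2(need < 512 ? 512 : need);
		uint8_t *p = calloc(1, cap); /* keeps padding zeroed */

		if (!p)
			return -1;
		if (b->data)
			memcpy(p, b->data, b->size);
		free(b->data);
		b->data = p;
		b->allocated = cap;
	}

	hdr[0] = item_size; /* unpadded size, like item->size in the patch */
	hdr[1] = type;
	memcpy(b->data + b->size, hdr, sizeof(hdr));
	if (payload)
		memcpy(b->data + b->size + ITEM_HEADER_SIZE, payload, len);

	b->size += extra;
	return 0;
}
```

The record header stores the unpadded size while the write offset
advances by the padded size, so a reader walking the buffer has to apply
the same alignment to step from one record to the next.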
Andy Lutomirski
2014-10-29 22:33:45 UTC
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.
See kdbus.txt for more details on which metadata can currently be
attached to messages.
---
drivers/misc/kdbus/metadata.c | 626 ++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/metadata.h | 51 ++++
2 files changed, 677 insertions(+)
create mode 100644 drivers/misc/kdbus/metadata.c
create mode 100644 drivers/misc/kdbus/metadata.h
diff --git a/drivers/misc/kdbus/metadata.c b/drivers/misc/kdbus/metadata.c
new file mode 100644
index 000000000000..8323e6d7a071
--- /dev/null
+++ b/drivers/misc/kdbus/metadata.c
@@ -0,0 +1,626 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/pid_namespace.h>
+#include <linux/sched.h>
+#include <linux/security.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/user_namespace.h>
+#include <linux/version.h>
+
+#include "connection.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+
+/**
+ * kdbus_meta_new() - create new metadata object
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_meta_new(struct kdbus_meta **meta)
+{
+ struct kdbus_meta *m;
+
+ BUG_ON(*meta);
+
+ m = kzalloc(sizeof(*m), GFP_KERNEL);
+ if (!m)
+ return -ENOMEM;
+
+ /*
+ * Remember the PID and user namespaces our credentials belong to;
+ * we need to prevent leaking authorization and security-relevant
+ * data across different namespaces.
+ */
+ m->pid_namespace = get_pid_ns(task_active_pid_ns(current));
+ m->user_namespace = get_user_ns(current_user_ns());
+
This is unusual, and it could be very expensive (it will serialize
essentially everyone on an exclusive cacheline). What attack is it
protecting against?
Post by Greg Kroah-Hartman
+static int kdbus_meta_append_cred(struct kdbus_meta *meta)
+{
+ struct kdbus_creds creds = {
+ .uid = from_kuid_munged(current_user_ns(), current_uid()),
+ .gid = from_kgid_munged(current_user_ns(), current_gid()),
+ .pid = task_pid_vnr(current),
+ .tid = task_tgid_vnr(current),
+ .starttime = current->start_time,
+ };
+
+ return kdbus_meta_append_data(meta, KDBUS_ITEM_CREDS,
+ &creds, sizeof(creds));
+}
This seems wrong to me. Shouldn't this store kuid_t, etc. directly?
Also, why pid, tid, and starttime?
Post by Greg Kroah-Hartman
+
+ for (i = 0; i < info->ngroups; i++)
+ gid[i] = from_kgid_munged(current_user_ns(), GROUP_AT(info, i));
Ditto.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andy Lutomirski
2014-10-30 00:14:02 UTC
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.
Also, in general, the comments seem to talk about capturing metadata
at the time that a connection is opened, but the actual code seems to
capture metadata all over the place. I think it needs to be very
clear, both in the code and the interface, when metadata is captured.

And the ns_eq stuff is too far buried (and not even contained in this
patch!) to be easily verified as being correct, whatever correct means
in that context.

--Andy
Daniel Mack
2014-10-30 08:45:31 UTC
Post by Andy Lutomirski
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.
Also, in general, the comments seem to talk about capturing metadata
at the time that a connection is opened, but the actual code seems to
capture metadata all over the place. I think it needs to be very
clear, both in the code and the interface, when metadata is captured.
Ok, so we should make that cleaner in the comments then.

To clarify, we currently take metadata at the following occasions:


1. At open() time, so we can tell peers (through KDBUS_CMD_CONN_INFO)
about the credentials a connection had when it was created with
KDBUS_CMD_HELLO.

2. When a new bus is created through KDBUS_CMD_BUS_MAKE, so peers can
later query the credentials of the owner of the bus they're connected to.

3. When we dispatch a KDBUS_CMD_MSG_SEND ioctl(), because we want to
attach the metadata that was authoritative when the message was sent.
IOW: We want metadata that actually matches the message payload.

4. We create faked metadata to pass around in messages in case the
connection was created 'on behalf' of another task. This case we need to
cover so we can implement a daemon in userspace that translates between
existing D-Bus clients and kdbus. In such cases, we want the receiving
peers to see the creds of the proxied task, rather than the proxy, so we
pass the small amount of reliable credential information that we can get
with SO_PEERCRED into the KDBUS_CMD_HELLO ioctl. In the kernel, we
create a metadata object out of that, so we can reuse it when a message
is sent. This case, however, is considered an exception and is limited
to privileged clients.

In all such cases, we share some implementation in metadata.c, and we
operate on the same kdbus_meta object, even though the origin of the
data varies in the individual cases. I agree that this should be better
documented, so I've put that on my TODO list.
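
For readers unfamiliar with the SO_PEERCRED mechanism mentioned in case 4, here is a minimal userspace sketch. It is not kdbus code; it assumes Linux with glibc, and the helper names peer_creds() and demo_peer_creds() are made up for illustration. It shows the three fields (pid, uid, gid) that the kernel records for the peer of an AF_UNIX socket, which is the "small amount" of information a proxy could forward.

```c
#define _GNU_SOURCE             /* struct ucred is a GNU extension in glibc */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fetch the credentials the kernel recorded for the peer of a connected
 * AF_UNIX socket. Returns 0 on success, -1 on failure (errno is set).
 * The helper name is hypothetical; getsockopt() and SO_PEERCRED are real.
 */
int peer_creds(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len) < 0)
        return -1;
    /* A different length would mean an unexpected struct layout. */
    assert(len == sizeof(*out));
    return 0;
}

/* Create a socket pair and read back the credentials recorded for one end. */
int demo_peer_creds(struct ucred *out)
{
    int sv[2], ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    ret = peer_creds(sv[0], out);
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

For a socketpair(), the recorded credentials are those of the process that created the pair, so the fields match the caller's own pid, effective uid and effective gid. They are fixed when the socket is set up and do not follow later credential changes.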
Post by Andy Lutomirski
And the ns_eq stuff is too far buried (and not even contained in this
patch!) to be easily verified as being correct, whatever correct means
in that context.
I see that. As I explained earlier in my reply to Eric, we've chosen to
submit the patch set this way to keep the reply threading clean, so it
was some sort of a trade-off. Still, I think the best way to review it
is to pull in Greg's patches and look at the actual files.


Thanks,
Daniel

Andy Lutomirski
2014-10-30 14:08:27 UTC
Post by Daniel Mack
1. At open() time, so we can tell peers (through KDBUS_CMD_CONN_INFO)
about the credentials a connection had when it was created with
KDBUS_CMD_HELLO.
Then the API that tells peers about this information needs to make
this very clear.
Post by Daniel Mack
2. When a new bus is created through KDBUS_CMD_BUS_MAKE, so peers can
later query the credentials of the owner of the bus they're connected to.
Ditto. Although, why on earth should a bus have credentials? This
sounds like a misdesign. It seems to me that this type of policy
belongs all the way in userspace. If you want a bus, you ask the
owner of the entire domain to make you a bus. Or you make it yourself
and hand off references in some authenticated way.
Post by Daniel Mack
3. When we dispatch a KDBUS_CMD_MSG_SEND ioctl(), because we want to
attach the metadata that was authoritative when the message was sent.
IOW: We want metadata that actually matches the message payload.
What does that "metadata that actually matches the message payload"
mean? If I create an endpoint and delegate some processing to a less
privileged child, other things on the bus MUST NOT be able to detect
that delegation in any sensible design. Otherwise everything will
appear to work in testing because other processes never checked the
problematic credential, but then it will randomly fail because someone
decided to do something daft and validate my unprivileged child's
argv[0], which is, of course, not what they expected.

I suspect that, if you make credential sending opt-in, you will
quickly discover that the current model for which credentials matter
makes no sense.
Post by Daniel Mack
Post by Andy Lutomirski
And the ns_eq stuff is too far buried (and not even contained in this
patch!) to be easily verified as being correct, whatever correct means
in that context.
I see that. As I explained earlier in my reply to Eric, we've chosen to
submit the patch set this way to keep the reply threading clean, so it
was some sort of a trade-off. Still, I think the best way to review it
is to pull in Greg's patches and look at the actual files.
This wasn't a comment about the threading. The call, in the patches
in the git tree, is buried and very difficult to follow.

--Andy
Daniel Mack
2014-10-30 15:54:36 UTC
Post by Andy Lutomirski
Post by Daniel Mack
1. At open() time, so we can tell peers (through KDBUS_CMD_CONN_INFO)
about the credentials a connection had when it was created with
KDBUS_CMD_HELLO.
Then the API that tells peers about this information needs to make
this very clear.
Yes, that's an API that tells you something about a connection, and with
that KDBUS_CMD_CONN_INFO call, you'll always get the information about
the creator of the connection. IOW: by calling that particular ioctl,
you'll always get the same kind of information. I guess it's valid to
just define the API that way?
Post by Andy Lutomirski
Post by Daniel Mack
2. When a new bus is created through KDBUS_CMD_BUS_MAKE, so peers can
later query the credentials of the owner of the bus they're connected to.
Ditto. Although, why on earth should a bus have credentials? This
sounds like a misdesign. It seems to me that this type of policy
belongs all the way in userspace. If you want a bus, you ask the
owner of the entire domain to make you a bus. Or you make it yourself
and hand off references in some authenticated way.
Yes, that's the way it works. However, the idea is that a bus stays
alive as long as the file descriptor that was used to create that
bus remains open, and it is immediately shut down when the fd is closed.
We merely allow users that are connected to a bus to query the
credentials of the creator of the bus they're connected to. So it's not
the bus which has credentials, but its original creator, at the time of
creation.
Post by Andy Lutomirski
Post by Daniel Mack
3. When we dispatch a KDBUS_CMD_MSG_SEND ioctl(), because we want to
attach the metadata that was authoritative when the message was sent.
IOW: We want metadata that actually matches the message payload.
What does that "metadata that actually matches the message payload"
mean?
A bus client posted something at some point in time. We want the metadata
from the time the message was posted.
Post by Andy Lutomirski
If I create an endpoint and delegate some processing to a less
privileged child, other things on the bus MUST NOT be able to detect
that delegation in any sensible design. Otherwise everything will
appear to work in testing because other processes never checked the
problematic credential, but then it will randomly fail because someone
decided to do something daft and validate my unprivileged child's
argv[0], which is, of course, not what they expected.
Not sure whether I got your point, but if a privileged service
takes action on behalf of unprivileged clients, it may well depend on
certain credential information to be present along with the message, and
refuse to take action otherwise.

For example, if a privileged service can reboot the system, it checks
whether the asking peer has CAP_SYS_BOOT set. If it checks for uid==0
instead, and it works in your tests because you happen to test as root,
that's just a bug in the service, right? But I might have missed your point.
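
The difference between the two checks can be sketched in a few lines of userspace C. This is an illustration only: struct peer_meta and peer_may_reboot() are hypothetical names and not the kdbus item layout; the only real identifier is CAP_SYS_BOOT from <linux/capability.h>.

```c
#include <linux/capability.h>   /* CAP_SYS_BOOT */
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of the credentials a service received
 * along with a message. This is not the kdbus wire format.
 */
struct peer_meta {
    uint32_t uid;
    uint64_t effective_caps;    /* bit N set: capability N is effective */
};

/* Authorize a reboot request on the capability bit, not on uid == 0. */
bool peer_may_reboot(const struct peer_meta *m)
{
    return (m->effective_caps >> CAP_SYS_BOOT) & 1;
}
```

With this check, a root peer that has dropped CAP_SYS_BOOT is refused, and a non-root peer that holds the capability is accepted. A uid == 0 test would get both cases wrong.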
Post by Andy Lutomirski
I suspect that, if you make credential sending opt-in, you will
quickly discover that the current model for which credentials matter
makes no sense.
For the example above, opting in to creds items on the sender side
might actually work (the asking service would be refused in its
request to reboot the machine). However, for any sort of logging or
system services, for example, allowing the sender to select which creds
it wants to reveal is supporting a hide-and-seek game, and that's
something that won't work.

Thanks for sharing your thoughts on this - I appreciate this discussion
stays on such technical grounds :)

Daniel


Andy Lutomirski
2014-10-30 21:02:20 UTC
Post by Daniel Mack
Post by Andy Lutomirski
Post by Daniel Mack
2. When a new bus is created through KDBUS_CMD_BUS_MAKE, so peers can
later query the credentials of the owner of the bus they're connected to.
Ditto. Although, why on earth should a bus have credentials? This
sounds like a misdesign. It seems to me that this type of policy
belongs all the way in userspace. If you want a bus, you ask the
owner of the entire domain to make you a bus. Or you make it yourself
and hand off references in some authenticated way.
Yes, that's the way it works. However, the idea is that a bus stays
alive as long as the file descriptor that was used to create that
bus remains open, and it is immediately shut down when the fd is closed.
We merely allow users that are connected to a bus to query the
credentials of the creator of the bus they're connected to.
Why do you allow this? What purpose does it serve? Is the idea that
each application will own one bus? If so, what goes wrong if you only
capture the specific credentials that the creator of a given bus asks
to have captured?

[snip]
Post by Daniel Mack
Post by Andy Lutomirski
If I create an endpoint and delegate some processing to a less
privileged child, other things on the bus MUST NOT be able to detect
that delegation in any sensible design. Otherwise everything will
appear to work in testing because other processes never checked the
problematic credential, but then it will randomly fail because someone
decided to do something daft and validate my unprivileged child's
argv[0], which is, of course, not what they expected.
Not sure whether I got your point, but if a privileged service
takes action on behalf of unprivileged clients, it may well depend on
certain credential information to be present along with the message, and
refuse to take action otherwise.
For example, if a privileged service can reboot the system, it checks
whether the asking peer has CAP_SYS_BOOT set. If it checks for uid==0
instead, and it works in your tests because you happen to test as root,
that's just a bug in the service, right? But I might have missed your point.
The issue is the following: if I have the privilege needed to talk to
journald, I may want to enhance security by opening a connection to
journald (and capture that privilege) and then drop privilege. I
should still be able to talk to journald.

Alternatively, if the privilege needed to reboot is CAP_SYS_BOOT, then
clients should send that capability bit. Capturing extra information
to try to give daemons the flexibility to change their authorization
conditions later on just moves the problem if you need to change
policy down the line.
Post by Daniel Mack
Post by Andy Lutomirski
I suspect that, if you make credential sending opt-in, you will
quickly discover that the current model for which credentials matter
makes no sense.
For the example above, opting in to creds items on the sender side
might actually work (the asking service would be refused in its
request to reboot the machine). However, for any sort of logging or
system services, for example, allowing the sender to select which creds
it wants to reveal is supporting a hide-and-seek game, and that's
something that won't work.
What's the actual problem for logging? I very much understand why a
logging service never wants to log an incorrect credential (and legacy
syslog has serious problems here because it doesn't even try to
capture credentials), but what's wrong with having a log that shows the
uid for legit log messages and that reliably says "declined to state"
for messages that decline to state?
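
As a rough sketch of what such a log could look like, assuming a hypothetical record type in which the uid is present only when the sender chose to attach it (struct log_record and format_log_line() are invented names, not part of journald or kdbus):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log record: the uid is present only if the sender sent it. */
struct log_record {
    bool has_uid;
    uint32_t uid;
    const char *text;
};

/* Format one log line into buf; returns snprintf()'s result. */
int format_log_line(char *buf, size_t size, const struct log_record *r)
{
    if (r->has_uid)
        return snprintf(buf, size, "uid=%u: %s", (unsigned)r->uid, r->text);
    /* Never guess: an absent credential is logged as explicitly absent. */
    return snprintf(buf, size, "uid=<declined to state>: %s", r->text);
}
```

The point of the sketch is that an absent credential stays distinguishable from a present one, so the log never shows an incorrect value.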

(Also, I presume that cmdline is for logging. Keep in mind that the
cmdline is yanked from user memory and can be freely spoofed.)

A major benefit of opt-in credential passing is that it makes it very
difficult to convince another process to exercise its credential on
your behalf by accident.
Post by Daniel Mack
Thanks for sharing your thoughts on this - I appreciate this discussion
stays on such technical grounds :)
My pleasure. I have no desire to see this process devolve into random
flames (and I'm glad it hasn't). That being said, I'll still continue
to object to credential and namespace issues that I think are
problematic. :)

--Andy
Daniel Mack
2014-11-01 11:05:40 UTC
Hi Andy,
Post by Andy Lutomirski
Post by Daniel Mack
We merely allow users that are connected to a bus to query the
credentials of the creator of the bus they're connected to.
Why do you allow this? What purpose does it serve? Is the idea that
each application will own one bus? If so, what goes wrong if you only
capture the specific credentials that the creator of a given bus asks
to have captured?
There are different kinds of buses. There is the system bus and a number
of user buses. It's really useful to be able to identify the user who
owns one of these user buses, for the sake of access control. More
specifically, we have a compatibility service called "bus-proxy" that
speaks the old D-Bus socket protocol on one side and translates all
messages to kdbus messages onto the other. For that, it needs to enforce
the old D-Bus access control semantics, which is described in XML, and
has quite elaborate checks. In order to enforce it, it's relevant to be
able to compare peer credentials with bus owner credentials, because
there's usually a rule that the bus owner UID is allowed more than other
peers.

Sure, there are other ways to figure out the identity of the bus, but
it's really nice to have similar semantics for identifying the bus
owner. kdbus internally has that piece of information anyway, so we
decided to export it, optionally. However, that's really a minor detail
after all.
Post by Andy Lutomirski
Post by Daniel Mack
For example, if a privileged service can reboot the system, it checks
whether the asking peer has CAP_SYS_BOOT set. If it checks for uid==0
instead, and it works in your tests because you happen to test as root,
that's just a bug in the service, right? But I might have missed your point.
The issue is the following: if I have the privilege needed to talk to
journald, I may want to enhance security by opening a connection to
journald (and capture that privilege) and then drop privilege. I
should still be able to talk to journald.
Hmm, this is not how D-Bus works, and kdbus stays close to the design
principles of D-Bus. There's no concept of 'opening connections to
services'. You just connect to a bus, and then on that bus, you send
individual messages and method calls to other services.

The designs of Binder and D-Bus are fundamentally different in that
regard. On D-Bus, the focus is really about method call transaction (a
method call message and a corresponding reply message), and there's no
way to continuously reference a peer via the concept of a 'connection'.

This is why we have this functionality of passing over the caller creds
every time a method call is made. The focus is really on the individual
method call transaction, each one is individually routed, dispatched and
checked for permission. Hence, it should carry individual credential
information from the time the call is issued.

So, back to your example: you cannot 'open a connection to journald'.
You can only connect to a bus and send messages to journald.
Post by Andy Lutomirski
Alternatively, if the privilege needed to reboot is CAP_SYS_BOOT, then
clients should send that capability bit. Capturing extra information
to try to give daemons the flexibility to change their authorization
conditions later on just moves the problem if you need to change
policy down the line.
This would be similar to changing the Linux kernel so that each system
call gets a set of capabilities passed in explicitly. RPC calls are very
much comparable to syscalls, though they don't transition into kernel
space, but simply into another process.

We've been augmenting what syscalls check on for access control ever
since. Initially, the check was UID based, then became capability based,
and nowadays we have a concept of MACs, that are actually different on
every system, because some systems use SELinux, some SMACK or some other
MAC. This is why this metadata should be implicit and controlled by the
receiver, not by the sender, because the implementation of the policy
might change eventually, be extended with more sophisticated access
control etc.

But it's not only about access control, there's also auditing. In order
to generate useful audit logs, a system service that is offering
privileged operation to certain clients needs to know the audit
credentials (sessionid and loginuid). Hence, this information needs to
be implicitly appended, controlled by that service. Because if it is not
implicitly appended, then it will more often be missing than expected
(simply because in real-life very few people actually use auditing), and
the system service would not be able to log about it.

So, this is not a metadata leak by accident but metadata that system
services need to know about in order to work properly. Individual
services will require slightly different components of these
credentials, but if you combine things, they need to know pretty much
all of the details we currently offer as implied metadata.

The system bus is about unprivileged apps asking system services for
system operations, in which case the system services must have a way to
know who wants them to do what. The purpose of a system bus is _not_ to
allow unprivileged peers to talk to each other, that's actually even
forbidden in the default policy. That's what user buses are for, and on
those, pretty much every client will have the same privileges anyway,
hence there's no information leak there either.
Post by Andy Lutomirski
What's the actual problem for logging? I very much understand why a
logging service never wants to log an incorrect credential (and legacy
syslog has serious problems here because it doesn't even try to
capture credentials), but what's wrong with having a log that shows the
uid for legit log messages and that reliably says "declined to state"
for messages that decline to state?
A system's administrator should be able to gather all sorts of
information about things that happened on a system. Trying to hide
associated metadata is not how things are done anyway - we show it in
numerous places, in /proc, in SCM_CREDENTIALS, by listing /tmp or /home etc.

The separation to limit what is passed around is, in our concept, rather
on the level of connecting to separate buses, PID namespaces, kdbus
domains etc, than to suppress information.
Post by Andy Lutomirski
(Also, I presume that cmdline is for logging. Keep in mind that the
cmdline is yanked from user memory and can be freely spoofed.)
Sure. But if a task did that, we want to know about it, and log it
accordingly. Journald provides such features already today, and it's a
great deal in detecting runtime inconsistencies between a task's real
nature and what is displayed in 'ps'.



Thanks,
Daniel
Andy Lutomirski
2014-11-01 16:20:35 UTC
Post by Djalal Harouni
Hi Andy,
Post by Andy Lutomirski
Post by Daniel Mack
We merely allow users that are connected to a bus to query the
credentials of the creator of the bus they're connected to.
Why do you allow this? What purpose does it serve? Is the idea that
each application will own one bus? If so, what goes wrong if you only
capture the specific credentials that the creator of a given bus asks
to have captured?
There are different kinds of buses. There is the system bus and a number
of user buses. It's really useful to be able to identify the user who
owns one of these user buses, for the sake of access control. More
specifically, we have a compatibility service called "bus-proxy" that
speaks the old D-Bus socket protocol on one side and translates all
messages to kdbus messages onto the other. For that, it needs to enforce
the old D-Bus access control semantics, which is described in XML, and
has quite elaborate checks. In order to enforce it, it's relevant to be
able to compare peer credentials with bus owner credentials, because
there's usually a rule that the bus owner UID is allowed more than other
peers.
Sure, there are other ways to figure out the identity of the bus, but
it's really nice to have similar semantics for identifying the bus
owner. kdbus internally has that piece of information anyway, so we
decided to export it, optionally. However, that's really a minor detail
after all.
I'm sceptical about the kernel offering APIs just because it can. I'm
not fundamentally opposed to objects (e.g. busses) having ownership
information, but I think it needs to be well-justified.

Keep in mind that the kernel *has* a concept of ownership. It's uid,
gid, and security label. Having the creator's full set of caps and
even *command line* as part of the ownership information is really
weird.
Post by Djalal Harouni
Post by Andy Lutomirski
Post by Daniel Mack
For example, if a privileged service can reboot the system, it checks
whether the asking peer has CAP_SYS_BOOT set. If it checks for uid==0
instead, and it works in your tests because you happen to test as root,
that's just a bug in the service, right? But I might have missed your point.
The issue is the following: if I have the privilege needed to talk to
journald, I may want to enhance security by opening a connection to
journald (and capture that privilege) and then drop privilege. I
should still be able to talk to journald.
Hmm, this is not how D-Bus works, and kdbus stays close to the design
principles of D-Bus. There's no concept of 'opening connections to
services'. You just connect to a bus, and then on that bus, you send
individual messages and method calls to other services.
The designs of Binder and D-Bus are fundamentally different in that
regard. On D-Bus, the focus is really about method call transaction (a
method call message and a corresponding reply message), and there's no
way to continuously reference a peer via the concept of a 'connection'.
This is why we have this functionality of passing over the caller creds
every time a method call is made. The focus is really on the individual
method call transaction, each one is individually routed, dispatched and
checked for permission. Hence, it should carry individual credential
information from the time the call is issued.
So, back to your example: you cannot 'open a connection to journald'.
You can only connect to a bus and send messages to journald.
But you can open a kdbus fd. IMO there should be some actual thought
as to what should happen if the credentials change after the fd is
opened. The default answer in UNIX is that credentials are only
checked at open time. Violating that should have a good reason.
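
The open-time convention can be seen with plain files. The sketch below is only an analogy and not kdbus code: it revokes a file's permission bits instead of changing the caller's credentials, and the function name and scratch path are invented for illustration. The descriptor obtained before the change keeps working because read() does not repeat the permission check.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open a scratch file, revoke every permission bit on it, then read
 * through the descriptor that was opened earlier. Returns the number of
 * bytes read, or -1 on any failure. Name and path are illustrative.
 */
ssize_t read_after_revoke(char *buf, size_t size)
{
    char path[] = "/tmp/open-time-check-XXXXXX";
    static const char payload[] = "hello";
    ssize_t n = -1;
    int wfd, rfd;

    wfd = mkstemp(path);            /* created with mode 0600 */
    if (wfd < 0)
        return -1;
    if (write(wfd, payload, strlen(payload)) != (ssize_t)strlen(payload))
        goto out;

    rfd = open(path, O_RDONLY);     /* the permission check happens here */
    if (rfd < 0)
        goto out;

    /* After this, a fresh open() by an unprivileged user fails with EACCES. */
    if (chmod(path, 0) == 0)
        n = read(rfd, buf, size);   /* the existing descriptor is unaffected */

    close(rfd);
out:
    close(wfd);
    unlink(path);
    return n;
}
```

Per-message credential capture, as kdbus does for KDBUS_CMD_MSG_SEND, departs from this model: the check effectively moves from open() to every send.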
Post by Djalal Harouni
Post by Andy Lutomirski
Alternatively, if the privilege needed to reboot is CAP_SYS_BOOT, then
clients should send that capability bit. Capturing extra information
to try to give daemons the flexibility to change their authorization
conditions later on just moves the problem if you need to change
policy down the line.
This would be similar to changing the Linux kernel so that each system
call gets a set of capabilities passed in explicitly. RPC calls are very
much comparable to syscalls, though they don't transition into kernel
space, but simply into another process.
That would be a fantastic idea, and, in fact, it would have been
vastly better if syscalls had always worked that way. It's how
anything on a network *has* to work, it's how object capability
systems work (and object capability systems are much less prone to
security bugs caused by overcomplicated implicit checks), and it's how
Capsicum (which builds on POSIX!) works.
Post by Djalal Harouni
We've been augmenting what syscalls check on for access control ever
since. Initially, the check was UID based, then became capability based,
and nowadays we have a concept of MACs, that are actually different on
every system, because some systems use SELinux, some SMACK or some other
MAC. This is why this metadata should be implicit and controlled by the
receiver, not by the sender, because the implementation of the policy
might change eventually, be extended with more sophisticated access
control etc.
I agree that this is convenient, but it's a hack, and we've only
gotten away with it so far because only something that's very much in
the trusted computing base can see this information at all. And I
strongly suspect that, if I were inclined to try to break SELinux, I
could find any number of rather fundamental holes based on the fact
that it adds checks to capabilities in places where they didn't exist
originally.
Post by Djalal Harouni
But it's not only about access control, there's also auditing. In order
to generate useful audit logs, a system service that is offering
privileged operation to certain clients needs to know the audit
credentials (sessionid and loginuid). Hence, this information needs to
be implicitly appended, controlled by that service. Because if it is not
implicitly appended, then it will more often be missing than expected
(simply because in real-life very few people actually use auditing), and
the system service would not be able to log about it.
So, this is not a metadata leak by accident but metadata that system
services need to know about in order to work properly. Individual
services will require slightly different components of these
credentials, but if you combine things, they need to know pretty much
all of the details we currently offer as implied metadata.
Sorry, but this is bogus. The audit system is a bit of an unstable
mess, and, to all appearances, its main design goal seems to be
regulatory compliance instead of actual security. It seems to be
mostly harmless, but only because all of the data gathered by auditd
that should never have existed in the first place can only be
collected by an extremely privileged global daemon and is shoved in
logs that no one outside the TCB can read.

If you want a configurable-out off-by-default bolt-on system that
allows auditd (and nothing else!) to sniff kdbus busses and log
whatever random crap it wants to log (and keep it the hell out of the
part of the journal that unprivileged users can see), then do so.
Post by Djalal Harouni
The system bus is about unprivileged apps asking system services for
system operations, in which case the system services must have a way to
know who wants them to do what. The purpose of a system bus is _not_ to
allow unprivileged peers to talk to each other, that's actually even
forbidden in the default policy. That's what user buses are for, and on
those, pretty much every client will have the same privileges anyway,
hence there's no information leak there either.
This is nonsense. If you don't need real access control on user
busses, then *don't add any*.

Except that you *do* need real access control, because of containers,
seccomp, the Chromium sandbox, the upcoming Firefox sandbox, all of
the other little sandboxes that should exist but don't yet, remoting,
etc. And I guarantee that your list of implicitly transmitted
credentials will fail to handle this case, and this case is the only
one that really matters. At the end of the day, you're going to need
either crypto, something like object capabilities, or an access
broker. And none of those benefit at all from the current proposed
model. (And none of them check anything like implicit credentials at
the time of an RPC call, either.)

Interestingly, it sounds to me like traditional D-Bus will work
considerably better than kdbus in this scenario.

As a concrete example, Wayland takes sensitive operations like
screenshots very seriously. Which of the kdbus metadata items would
be appropriate to use for access control for something like that?
Post by Djalal Harouni
Post by Andy Lutomirski
What's the actual problem for logging? I very much understand why a
logging service never wants to log an incorrect credential (and legacy
syslog has serious problems here because it doesn't even try to
capture credentials), but what's wrong with having a log that shows the
uid for legit log messages and that reliably says "declined to state"
for messages that decline to state?
A system's administrator should be able to gather all sorts of
information about things that happened on a system. Trying to hide
associated metadata is not how things are done anyway - we show it in
numerous places, in /proc, in SCM_CREDENTIALS, by listing /tmp or /home etc.
See above. The sysadmin != every single kdbus user.
Post by Djalal Harouni
Post by Andy Lutomirski
(Also, I presume that cmdline is for logging. Keep in mind that the
cmdline is yanked from user memory and can be freely spoofed.)
Sure. But if a task did that, we want to know about it, and log it
accordingly. Journald provides such features already today, and it
helps a great deal in detecting runtime inconsistencies between a task's
real nature and what is displayed in 'ps'.
Sorry, but this is a bit off the deep end. The kdbus logging
mechanism shouldn't be in the business of trying to do this, because
it's out of scope, and it will fail.

You can't justify logging fundamentally unverifiable things like the
command line by saying that you want to know if someone tries to play
(impossible-to-reliably-detect) games to obscure their command line.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Simon McVittie
2014-11-03 12:00:24 UTC
Post by Andy Lutomirski
You can't justify logging fundamentally unverifiable things like the
command line by saying that you want to know if someone tries to play
(impossible-to-reliably-detect) games to obscure their command line.
I think kdbus might be mixing up two orthogonal things here.

It has an easy, kernel-checked, race-free way to determine
kernel-mediated credential-like information that cannot be faked or
interfered with (uid, primary gid, other gids?, security label,
capabilities) because these are usable for security decisions, but if
they are *not* received in a kernel-checked, race-free way, then they
are useless.

One concrete example of using non-ucred credential-like information is
that traditional D-Bus can only restrict sysadmin tasks to uid 0 (or a
root-equivalent uid in group sudo/admin/whatever), whereas when systemd
and systemd-logind are run on kdbus, many of their D-Bus methods require
specific capabilities(7): KillUser requires CAP_KILL, PowerOff requires
CAP_SYS_BOOT, and so on. If capabilities(7) are a good thing, then
that's surely a good thing too. (On the other hand, if you think
capabilities(7) are a waste of time, then so is this.)
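
To make the per-method capability check above concrete, here is a minimal
userspace sketch. It is not systemd's or kdbus's code: the rule table, the
check_method() helper and the idea that the sender's effective capabilities
arrive as a single 64-bit mask are all assumptions made for illustration.
Only the two method/capability pairs come from the text above.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Values match CAP_KILL and CAP_SYS_BOOT in <linux/capability.h>. */
enum { MODEL_CAP_KILL = 5, MODEL_CAP_SYS_BOOT = 22 };

/* Hypothetical rule: which capability a D-Bus method requires. */
struct method_rule {
	const char *method;
	int required_cap;
};

static const struct method_rule rules[] = {
	{ "KillUser", MODEL_CAP_KILL },
	{ "PowerOff", MODEL_CAP_SYS_BOOT },
};

/*
 * sender_caps models the sender's effective capability set as a bitmask,
 * as it might arrive in kernel-attached metadata (the representation is
 * an assumption of this sketch).
 */
static int check_method(const char *method, uint64_t sender_caps)
{
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (strcmp(rules[i].method, method) != 0)
			continue;
		if (sender_caps & (UINT64_C(1) << rules[i].required_cap))
			return 0;
		return -EPERM;
	}
	return -ENOENT;	/* unknown method */
}
```

The check is only meaningful if the mask was captured by the kernel at send
time; a mask supplied by the sender itself would be worthless here.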

It also uses the same mechanism as an easy, race-free, but *not*
kernel-checked way to determine bits and pieces that are valuable for
debugging (dbus-monitor etc.), but unsuitable for security decisions,
such as cmdline.

In traditional D-Bus, you can get the uid and pid of a remote process,
but in a debug log you would probably actually prefer to log the cmdline
in addition; yes a malicious user could fake the cmdline, but when
debugging a system problem, information that is known to be forgeable
seems better than no information at all. After all, ps(1) shows the
forgeable cmdline, not just the executable. You can get that by
rummaging in /proc/$pid, but there is a race: if the remote process
exits too soon (a "fire and forget" method call) then you'll never know
who it was. kdbus solves that race, but does not make cmdline unforgeable.
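
The /proc race described above can be seen in a few lines of userspace C.
This is a sketch, with read_cmdline() as a hypothetical helper name: by the
time the receiver opens /proc/<pid>/cmdline the sender may already have
exited, and the lookup simply fails.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Read /proc/<pid>/cmdline into buf (arguments are NUL-separated).
 * Returns the number of bytes read, or -1 if the entry cannot be opened,
 * for example because the process has already exited. That failure is
 * the race window: nothing ties the lookup to the moment of sending.
 */
static long read_cmdline(pid_t pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;
	size_t n;

	if (len == 0)
		return -1;

	snprintf(path, sizeof(path), "/proc/%ld/cmdline", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* sender is gone, or /proc is unavailable */

	n = fread(buf, 1, len - 1, f);
	fclose(f);
	buf[n] = '\0';
	return (long)n;
}
```

Even when the read succeeds, the pid may have been recycled in the
meantime, so the result can describe a different process altogether.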

If client libraries wishing to attach their cmdline (or other debug
info) to messages for debugging were required to add it as an
out-of-band KDBUS_ITEM, or as a D-Bus message header inside the payload,
then that would be duplicating work in client libraries that could have
been done centrally, but would still solve the race.

S

Andy Lutomirski
2014-11-03 17:05:45 UTC
On Mon, Nov 3, 2014 at 4:00 AM, Simon McVittie
Post by Simon McVittie
Post by Andy Lutomirski
You can't justify logging fundamentally unverifiable things like the
command line by saying that you want to know if someone tries to play
(impossible-to-reliably-detect) games to obscure their command line.
I think kdbus might be mixing up two orthogonal things here.
It has an easy, kernel-checked, race-free way to determine
kernel-mediated credential-like information that cannot be faked or
interfered with (uid, primary gid, other gids?, security label,
capabilities) because these are usable for security decisions, but if
they are *not* received in a kernel-checked, race-free way, then they
are useless.
One concrete example of using non-ucred credential-like information is
that traditional D-Bus can only restrict sysadmin tasks to uid 0 (or a
root-equivalent uid in group sudo/admin/whatever), whereas when systemd
and systemd-logind are run on kdbus, many of their D-Bus methods require
specific capabilities(7): KillUser requires CAP_KILL, PowerOff requires
CAP_SYS_BOOT, and so on. If capabilities(7) are a good thing, then
that's surely a good thing too. (On the other hand, if you think
capabilities(7) are a waste of time, then so is this.)
I think that capabilities(7) is largely a disaster. That aside, I
don't think that all of these capabilities should refer to the
*kernel* privileges. For example, CAP_SYS_BOOT should be, and remain,
the ability to reboot the system *yourself*. For example, if someone
wants to implement Windows 7 (or Vista? Or 2003? I forget.) style
reboot auditing, then userspace specifically does not want programs
that can ask for a reboot to hold CAP_SYS_BOOT.
Post by Simon McVittie
It also uses the same mechanism as an easy, race-free, but *not*
kernel-checked way to determine bits and pieces that are valuable for
debugging (dbus-monitor etc.), but unsuitable for security decisions,
such as cmdline.
In traditional D-Bus, you can get the uid and pid of a remote process,
but in a debug log you would probably actually prefer to log the cmdline
in addition; yes a malicious user could fake the cmdline, but when
debugging a system problem, information that is known to be forgeable
seems better than no information at all. After all, ps(1) shows the
forgeable cmdline, not just the executable. You can get that by
rummaging in /proc/$pid, but there is a race: if the remote process
exits too soon (a "fire and forget" method call) then you'll never know
who it was. kdbus solves that race, but does not make cmdline unforgeable.
I would love to fix this race. That opens the door to a lot more
debugging (maps, status, etc) while not pretending to offer a kernel
check that doesn't, and can't, exist.

Materializing some sort of pid fd on the receiving end would be one
solution. Another would be to give the receiver some token that can
be used to check for pid recycling.

Can we give each task a pid-namespace-local 64-bit (or even longer)
number that is mostly guaranteed not to be reused for the lifetime of
the namespace?

For example, give each task a per-pidns tid_unique, and give each
pidns a next_tid_unique. Creating a task assigns it
next_tid_unique++. (On overflow, clone fails.) For CRIU's benefit,
if you have CAP_SYS_ADMIN over the pidns's userns and you are not in
the pidns, then you can change tid_unique and next_tid_unique. Doing
that carelessly will introduce races, but that's fine -- it should
only be done on restore when everything's frozen.
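
A rough userspace model of the tid_unique idea, to show how small the
mechanism is. The struct names, the helper names and the -EAGAIN error
code are assumptions of this sketch, not proposed kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-pid-namespace state; field name follows the text. */
struct pidns_model {
	uint64_t next_tid_unique;
};

struct task_model {
	uint64_t tid_unique;
};

/*
 * Hand the next unique ID to a new task. Mirrors "on overflow, clone
 * fails": once the counter is exhausted no further IDs are given out.
 */
static int pidns_assign_unique(struct pidns_model *ns, struct task_model *t)
{
	if (ns->next_tid_unique == UINT64_MAX)
		return -EAGAIN;	/* error code chosen for illustration */

	t->tid_unique = ns->next_tid_unique++;
	return 0;
}

/* A receiver that remembered tid_unique can detect pid recycling. */
static int same_task(const struct task_model *t, uint64_t remembered)
{
	return t->tid_unique == remembered;
}
```

A receiver would store the (pid, tid_unique) pair from the message and
compare tid_unique again before trusting anything read from /proc/<pid>.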
Post by Simon McVittie
If client libraries wishing to attach their cmdline (or other debug
info) to messages for debugging were required to add it as an
out-of-band KDBUS_ITEM, or as a D-Bus message header inside the payload,
then that would be duplicating work in client libraries that could have
been done centrally, but would still solve the race.
I much prefer that approach. If a kernel feature is being added
just to avoid duplication of a debugging aid in user code, then let's
leave it to user code.

--Andy
Daniel Mack
2014-10-30 08:10:36 UTC
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
+/**
+ * kdbus_meta_new() - create new metadata object
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_meta_new(struct kdbus_meta **meta)
+{
+ struct kdbus_meta *m;
+
+ BUG_ON(*meta);
+
+ m = kzalloc(sizeof(*m), GFP_KERNEL);
+ if (!m)
+ return -ENOMEM;
+
+ /*
+ * Remember the PID and user namespaces our credentials belong to;
+ * we need to prevent leaking authorization and security-relevant
+ * data across different namespaces.
+ */
+ m->pid_namespace = get_pid_ns(task_active_pid_ns(current));
+ m->user_namespace = get_user_ns(current_user_ns());
+
This is unusual, and it could be very expensive (it will serialize
essentially everyone on an exclusive cacheline). What attack is it
protecting against?
As mentioned before, we currently prevent metadata from crossing over
user and pid namespace boundaries. In order to detect such situations,
we need to pin the namespaces of the task creating such a metadata
object, so we can compare them later, even when the original task is not
alive anymore. But I'm open to cheaper solutions for this, as I'm
admittedly not an expert in these APIs.
Post by Andy Lutomirski
Post by Greg Kroah-Hartman
+static int kdbus_meta_append_cred(struct kdbus_meta *meta)
+{
+ struct kdbus_creds creds = {
+ .uid = from_kuid_munged(current_user_ns(), current_uid()),
+ .gid = from_kgid_munged(current_user_ns(), current_gid()),
+ .pid = task_pid_vnr(current),
+ .tid = task_tgid_vnr(current),
+ .starttime = current->start_time,
+ };
+
+ return kdbus_meta_append_data(meta, KDBUS_ITEM_CREDS,
+ &creds, sizeof(creds));
+}
This seems wrong to me. Shouldn't this store kuid_t, etc. directly?
The metadata item's memory that is appended here is directly copied into
the final message in the receiver's pool later, so the information has
to be authoritative and translated at this point. This is currently not
a problem as in cases where we cross namespaces, the metadata will not
be added to the final message anyway.

But you're right, if we support translation between namespaces later, we
need to store the kuid_t here, and patch in the translated version
later, when the message is installed by the receiving peer (which is
when we know which namespace to translate the kuid_t for).
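
As a sketch of what "translate at install time" would mean, the following
userspace model stores a global uid and maps it into the receiver's user
namespace only when the message is delivered. The uid_extent layout follows
the three columns of /proc/<pid>/uid_map; translate_uid() and the MODEL_
names are hypothetical, and falling back to the overflow uid for unmapped
ids is modelled on what from_kuid_munged() does.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_OVERFLOW_UID 65534u	/* default overflow uid on Linux */

/* One extent of a user-namespace uid map, as in /proc/<pid>/uid_map. */
struct uid_extent {
	uint32_t inside_first;	/* first uid as seen inside the namespace */
	uint32_t global_first;	/* first corresponding global uid */
	uint32_t count;		/* number of uids covered by this extent */
};

/*
 * Translate a stored global uid into the receiver's namespace. Uids that
 * have no mapping there are reported as the overflow uid.
 */
static uint32_t translate_uid(const struct uid_extent *map, size_t n,
			      uint32_t global_uid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t first = map[i].global_first;

		if (global_uid >= first && global_uid - first < map[i].count)
			return map[i].inside_first + (global_uid - first);
	}
	return MODEL_OVERFLOW_UID;
}
```

The point of deferring the translation is that the sender's side never
needs to know which namespace the eventual receiver lives in.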
Post by Andy Lutomirski
Also, why pid, tid, and starttime?
Because pid is also part of struct ucred, and starttime seemed to fit in
here as well. After all, an item has some overhead with its header, so
we tried to group information that will most probably be needed
together. Any strong reason not to store it here?


Thanks,
Daniel

Greg Kroah-Hartman
2014-10-29 22:03:51 UTC
From: Daniel Mack <***@zonque.org>

This patch adds the policy database implementation.

A policy database restricts the ability of connections to own,
see and talk to well-known names. It can be associated with a bus
(through a policy holder connection) or a custom endpoint.

By default, buses have an empty policy database that is augmented on
demand when a policy holder connection is instantiated.

Policies are set through KDBUS_CMD_HELLO (when creating a policy
holder connection), KDBUS_CMD_CONN_UPDATE (when updating a policy
holder connection), KDBUS_CMD_EP_MAKE (creating a custom endpoint)
or KDBUS_CMD_EP_UPDATE (updating a custom endpoint). In all cases,
the name and policy access information is stored in items of type
KDBUS_ITEM_NAME and KDBUS_ITEM_POLICY_ACCESS.
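
For readers who want the matching rules in isolation, here is a simplified
userspace model of the lookup and access check. It is a sketch, not the
code from this patch: it uses a flat array instead of a hash table, one
user rule per name instead of an access list, and the enum ordering
(SEE < TALK < OWN, so that a higher grant implies the lower ones) is an
assumption. All type and function names are hypothetical.

```c
#include <errno.h>
#include <string.h>

enum access { ACCESS_SEE, ACCESS_TALK, ACCESS_OWN };

struct rule {
	const char *name;	/* stored without the trailing ".*" */
	int wildcard;		/* non-zero if the rule was given as "name.*" */
	unsigned int uid;	/* the user this rule grants access to */
	enum access access;	/* highest access level granted */
};

static const struct rule *lookup(const struct rule *r, size_t n,
				 const char *name)
{
	const char *dot;
	size_t i, prefix;

	/* Exact, non-wildcard entries take precedence. */
	for (i = 0; i < n; i++)
		if (!r[i].wildcard && strcmp(r[i].name, name) == 0)
			return &r[i];

	/* Otherwise strip the last label and look for "<prefix>.*". */
	dot = strrchr(name, '.');
	if (!dot)
		return NULL;
	prefix = (size_t)(dot - name);

	for (i = 0; i < n; i++)
		if (r[i].wildcard && strlen(r[i].name) == prefix &&
		    strncmp(r[i].name, name, prefix) == 0)
			return &r[i];

	return NULL;
}

static int check(const struct rule *r, size_t n, const char *name,
		 unsigned int uid, enum access wanted)
{
	const struct rule *e = lookup(r, n, name);

	if (e && e->uid == uid && e->access >= wanted)
		return 0;
	return -EPERM;
}
```

Note that in this model a wildcard covers exactly one additional label:
a rule for "org.example.*" matches "org.example.Other" but not
"org.example.a.b".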

See Documentation/kdbus.txt for more details.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/policy.c | 617 ++++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/policy.h | 60 +++++
2 files changed, 677 insertions(+)
create mode 100644 drivers/misc/kdbus/policy.c
create mode 100644 drivers/misc/kdbus/policy.h

diff --git a/drivers/misc/kdbus/policy.c b/drivers/misc/kdbus/policy.c
new file mode 100644
index 000000000000..66bce57eb5e6
--- /dev/null
+++ b/drivers/misc/kdbus/policy.c
@@ -0,0 +1,617 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "item.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_POLICY_HASH_SIZE 64
+
+/**
+ * struct kdbus_policy_db_cache_entry - a cached entry
+ * @conn_a: Connection A
+ * @conn_b: Connection B
+ * @owner: Owner of policy-entry that produced this cache-entry
+ * @hentry: The hash table entry for the database's entries_hash
+ */
+struct kdbus_policy_db_cache_entry {
+ struct kdbus_conn *conn_a;
+ struct kdbus_conn *conn_b;
+ const void *owner;
+ struct hlist_node hentry;
+};
+
+/**
+ * struct kdbus_policy_db_entry_access - a database entry access item
+ * @type: One of KDBUS_POLICY_ACCESS_* types
+ * @access: Access to grant. One of KDBUS_POLICY_*
+ * @uid: For KDBUS_POLICY_ACCESS_USER, the global uid
+ * @gid: For KDBUS_POLICY_ACCESS_GROUP, the global gid
+ * @list: List entry item for the entry's list
+ *
+ * This is the internal version of struct kdbus_policy_access.
+ */
+struct kdbus_policy_db_entry_access {
+ u8 type; /* USER, GROUP, WORLD */
+ u8 access; /* OWN, TALK, SEE */
+ union {
+ kuid_t uid; /* global uid */
+ kgid_t gid; /* global gid */
+ };
+ struct list_head list;
+};
+
+/**
+ * struct kdbus_policy_db_entry - a policy database entry
+ * @name: The name to match the policy entry against
+ * @hentry: The hash entry for the database's entries_hash
+ * @access_list: List head for keeping track of the entry's
+ * access items.
+ * @owner: The owner of this entry. Can be a kdbus_conn or
+ * a kdbus_ep object.
+ * @wildcard: The name is a wildcard, i.e. it was given ending in '.*'
+ */
+struct kdbus_policy_db_entry {
+ char *name;
+ struct hlist_node hentry;
+ struct list_head access_list;
+ const void *owner;
+ bool wildcard:1;
+};
+
+static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
+{
+ struct kdbus_policy_db_entry_access *a, *tmp;
+
+ list_for_each_entry_safe(a, tmp, &e->access_list, list) {
+ list_del(&a->list);
+ kfree(a);
+ }
+
+ kfree(e->name);
+ kfree(e);
+}
+
+static const struct kdbus_policy_db_entry *
+kdbus_policy_lookup(struct kdbus_policy_db *db,
+ const char *name, u32 hash, bool wildcard)
+{
+ struct kdbus_policy_db_entry *e, *found = NULL;
+
+ hash_for_each_possible(db->entries_hash, e, hentry, hash)
+ if (strcmp(e->name, name) == 0 && !e->wildcard)
+ return e;
+
+ if (wildcard) {
+ char *tmp;
+ char *dot;
+
+ tmp = kstrdup(name, GFP_KERNEL);
+ if (!tmp)
+ return NULL;
+
+ dot = strrchr(tmp, '.');
+ if (!dot)
+ goto exit_free;
+
+ *dot = '\0';
+ hash = kdbus_str_hash(tmp);
+
+ hash_for_each_possible(db->entries_hash, e, hentry, hash)
+ if (strcmp(e->name, tmp) == 0 && e->wildcard) {
+ found = e;
+ /* never "break;" in hash_for_each() */
+ goto exit_free;
+ }
+
+exit_free:
+ kfree(tmp);
+ }
+
+ return found;
+}
+
+/**
+ * kdbus_policy_db_clear() - release all memory from a policy db
+ * @db: The policy database
+ */
+void kdbus_policy_db_clear(struct kdbus_policy_db *db)
+{
+ struct kdbus_policy_db_cache_entry *ce;
+ struct kdbus_policy_db_entry *e;
+ struct hlist_node *tmp;
+ unsigned int i;
+
+ BUG_ON(!db);
+
+ /* purge entries */
+ down_write(&db->entries_rwlock);
+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
+ hash_del(&e->hentry);
+ kdbus_policy_entry_free(e);
+ }
+ up_write(&db->entries_rwlock);
+
+ /* purge cache */
+ mutex_lock(&db->cache_lock);
+ hash_for_each_safe(db->talk_access_hash, i, tmp, ce, hentry) {
+ hash_del(&ce->hentry);
+ kfree(ce);
+ }
+ mutex_unlock(&db->cache_lock);
+}
+
+/**
+ * kdbus_policy_db_init() - initialize a new policy database
+ * @db: The location of the database
+ *
+ * This initializes a new policy-db. The underlying memory must have been
+ * cleared to zero by the caller.
+ */
+void kdbus_policy_db_init(struct kdbus_policy_db *db)
+{
+ hash_init(db->entries_hash);
+ hash_init(db->talk_access_hash);
+ init_rwsem(&db->entries_rwlock);
+ mutex_init(&db->cache_lock);
+}
+
+static int kdbus_policy_check_access(const struct kdbus_policy_db_entry *e,
+ const struct cred *cred,
+ unsigned int access)
+{
+ struct kdbus_policy_db_entry_access *a;
+ struct group_info *group_info;
+ int i;
+
+ if (!e)
+ return -EPERM;
+
+ group_info = cred->group_info;
+
+ list_for_each_entry(a, &e->access_list, list) {
+ if (a->access >= access) {
+ switch (a->type) {
+ case KDBUS_POLICY_ACCESS_USER:
+ if (uid_eq(cred->uid, a->uid))
+ return 0;
+ break;
+ case KDBUS_POLICY_ACCESS_GROUP:
+ if (gid_eq(cred->gid, a->gid))
+ return 0;
+
+ for (i = 0; i < group_info->ngroups; i++) {
+ kgid_t gid = GROUP_AT(group_info, i);
+
+ if (gid_eq(gid, a->gid))
+ return 0;
+ }
+
+ break;
+ case KDBUS_POLICY_ACCESS_WORLD:
+ return 0;
+ }
+ }
+ }
+
+ return -EPERM;
+}
+
+/**
+ * kdbus_policy_check_own_access() - check whether a connection is allowed
+ * to own a name
+ * @db: The policy database
+ * @conn: The connection to check
+ * @name: The name to check
+ *
+ * Return: 0 if the connection is allowed to own the name, -EPERM otherwise
+ */
+int kdbus_policy_check_own_access(struct kdbus_policy_db *db,
+ const struct kdbus_conn *conn,
+ const char *name)
+{
+ const struct kdbus_policy_db_entry *e;
+ int ret;
+
+ down_read(&db->entries_rwlock);
+ e = kdbus_policy_lookup(db, name, kdbus_str_hash(name), true);
+ ret = kdbus_policy_check_access(e, conn->cred, KDBUS_POLICY_OWN);
+ up_read(&db->entries_rwlock);
+
+ return ret;
+}
+
+/**
+ * kdbus_policy_check_talk_access() - check if one connection is allowed
+ * to send a message to another connection
+ * @db: The policy database
+ * @conn_src: The source connection
+ * @conn_dst: The destination connection
+ *
+ * Return: 0 if access is granted, -EPERM if not, negative errno on failure
+ */
+int kdbus_policy_check_talk_access(struct kdbus_policy_db *db,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ struct kdbus_policy_db_cache_entry *ce;
+ struct kdbus_name_entry *name_entry;
+ unsigned int hash = 0;
+ const void *owner;
+ int ret;
+
+ /*
+ * If there was a positive match for these two connections before,
+ * there's an entry in the hash table for them.
+ */
+ hash ^= hash_ptr(conn_src, KDBUS_POLICY_HASH_SIZE);
+ hash ^= hash_ptr(conn_dst, KDBUS_POLICY_HASH_SIZE);
+
+ mutex_lock(&db->cache_lock);
+ hash_for_each_possible(db->talk_access_hash, ce, hentry, hash)
+ if (ce->conn_a == conn_src && ce->conn_b == conn_dst) {
+ mutex_unlock(&db->cache_lock);
+ return 0;
+ }
+ mutex_unlock(&db->cache_lock);
+
+ /*
+ * Otherwise, walk the names owned by the destination and store a
+ * hash-table entry if send access is granted.
+ */
+
+ down_read(&db->entries_rwlock);
+
+ ret = -EPERM;
+ mutex_lock(&conn_dst->lock);
+ list_for_each_entry(name_entry, &conn_dst->names_list, conn_entry) {
+ u32 hash = kdbus_str_hash(name_entry->name);
+ const struct kdbus_policy_db_entry *e;
+
+ e = kdbus_policy_lookup(db, name_entry->name, hash, true);
+ if (kdbus_policy_check_access(e, conn_src->cred,
+ KDBUS_POLICY_TALK) == 0) {
+ owner = e->owner;
+ ret = 0;
+ break;
+ }
+ }
+ mutex_unlock(&conn_dst->lock);
+
+ if (ret >= 0) {
+ ret = -ENOMEM;
+ ce = kmalloc(sizeof(*ce), GFP_KERNEL);
+ if (ce) {
+ ce->conn_a = conn_src;
+ ce->conn_b = conn_dst;
+ ce->owner = owner;
+ INIT_HLIST_NODE(&ce->hentry);
+
+ mutex_lock(&db->cache_lock);
+ hash_add(db->talk_access_hash, &ce->hentry, hash);
+ mutex_unlock(&db->cache_lock);
+
+ ret = 0;
+ }
+ }
+
+ up_read(&db->entries_rwlock);
+
+ return ret;
+}
+
+/**
+ * kdbus_policy_check_see_access_unlocked() - Check whether a connection is
+ * allowed to see a given name
+ * @db: The policy database
+ * @conn: The connection performing the lookup
+ * @name: The name
+ *
+ * Return: 0 if permission to see the name is granted, -EPERM otherwise
+ */
+int kdbus_policy_check_see_access_unlocked(struct kdbus_policy_db *db,
+ struct kdbus_conn *conn,
+ const char *name)
+{
+ const struct kdbus_policy_db_entry *e;
+
+ e = kdbus_policy_lookup(db, name, kdbus_str_hash(name), true);
+ return kdbus_policy_check_access(e, conn->cred, KDBUS_POLICY_SEE);
+}
+
+static void __kdbus_policy_remove_owner_cache(struct kdbus_policy_db *db,
+ const void *owner)
+{
+ struct kdbus_policy_db_cache_entry *ce;
+ struct hlist_node *tmp;
+ int i;
+
+ mutex_lock(&db->cache_lock);
+ hash_for_each_safe(db->talk_access_hash, i, tmp, ce, hentry)
+ if (ce->owner == owner) {
+ hash_del(&ce->hentry);
+ kfree(ce);
+ }
+ mutex_unlock(&db->cache_lock);
+}
+
+static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+ const void *owner)
+{
+ struct kdbus_policy_db_entry *e;
+ struct hlist_node *tmp;
+ int i;
+
+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+ if (e->owner == owner) {
+ hash_del(&e->hentry);
+ kdbus_policy_entry_free(e);
+ }
+}
+
+/**
+ * kdbus_policy_remove_owner() - remove all entries related to an owner
+ * @db: The policy database
+ * @owner: The owner (connection or endpoint) whose entries to remove
+ */
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+ const void *owner)
+{
+ down_write(&db->entries_rwlock);
+ __kdbus_policy_remove_owner(db, owner);
+ __kdbus_policy_remove_owner_cache(db, owner);
+ up_write(&db->entries_rwlock);
+}
+
+/**
+ * kdbus_policy_purge_cache_for_conn() - remove all cached entries related to
+ * a connection
+ * @db: The policy database
+ * @conn: The connection which items to remove
+ */
+void kdbus_policy_purge_cache(struct kdbus_policy_db *db,
+ const struct kdbus_conn *conn)
+{
+ struct kdbus_policy_db_cache_entry *ce;
+ struct hlist_node *tmp;
+ int i;
+
+ mutex_lock(&db->cache_lock);
+ hash_for_each_safe(db->talk_access_hash, i, tmp, ce, hentry)
+ if (ce->conn_a == conn || ce->conn_b == conn) {
+ hash_del(&ce->hentry);
+ kfree(ce);
+ }
+ mutex_unlock(&db->cache_lock);
+}
+
+/*
+ * Convert user provided policy access to internal kdbus policy
+ * access
+ */
+static int
+kdbus_policy_make_access(const struct kdbus_policy_access *uaccess,
+ struct kdbus_policy_db_entry_access **entry)
+{
+ int ret;
+ struct kdbus_policy_db_entry_access *a;
+
+ a = kzalloc(sizeof(*a), GFP_KERNEL);
+ if (!a)
+ return -ENOMEM;
+
+ ret = -EINVAL;
+ switch (uaccess->type) {
+ case KDBUS_POLICY_ACCESS_USER:
+ a->uid = make_kuid(current_user_ns(), uaccess->id);
+ if (!uid_valid(a->uid))
+ goto err;
+
+ break;
+ case KDBUS_POLICY_ACCESS_GROUP:
+ a->gid = make_kgid(current_user_ns(), uaccess->id);
+ if (!gid_valid(a->gid))
+ goto err;
+
+ break;
+ }
+
+ a->type = uaccess->type;
+ a->access = uaccess->access;
+
+ *entry = a;
+
+ return 0;
+
+err:
+ kfree(a);
+ return ret;
+}
+
+/**
+ * kdbus_policy_set() - set a connection's policy rules
+ * @db: The policy database
+ * @items: A list of kdbus_item elements that contain both
+ * names and access rules to set.
+ * @items_size: The total size of the items.
+ * @max_policies: The maximum number of policy entries to allow.
+ * Pass 0 for no limit.
+ * @allow_wildcards: Boolean value whether wildcard entries (names
+ * ending in '.*') should be allowed.
+ * @owner: The owner of the new policy items.
+ *
+ * This function sets a new set of policies for a given owner. The names and
+ * access rules are gathered by walking the list of items passed in as
+ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
+ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
+ * pattern than denoted in @max_policies, -E2BIG is returned.
+ *
+ * In order to allow atomic replacement of rules, the function first removes
+ * all entries that have been created for the given owner previously.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_policy_set(struct kdbus_policy_db *db,
+ const struct kdbus_item *items,
+ size_t items_size,
+ size_t max_policies,
+ bool allow_wildcards,
+ const void *owner)
+{
+ struct kdbus_policy_db_entry_access *a;
+ struct kdbus_policy_db_entry *e, *p;
+ const struct kdbus_item *item;
+ struct hlist_node *tmp;
+ HLIST_HEAD(entries);
+ HLIST_HEAD(restore);
+ size_t count = 0;
+ int i, ret = 0;
+ u32 hash;
+
+ if (items_size > KDBUS_POLICY_MAX_SIZE)
+ return -E2BIG;
+
+ /* Walk the list of items and look for new policies */
+ e = NULL;
+ KDBUS_ITEMS_FOREACH(item, items, items_size) {
+ switch (item->type) {
+ case KDBUS_ITEM_NAME: {
+ size_t len;
+
+ if (max_policies && ++count > max_policies) {
+ ret = -E2BIG;
+ goto exit;
+ }
+
+ if (!kdbus_name_is_valid(item->str, true)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ e = kzalloc(sizeof(*e), GFP_KERNEL);
+ if (!e) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ INIT_LIST_HEAD(&e->access_list);
+ e->owner = owner;
+ hlist_add_head(&e->hentry, &entries);
+
+ e->name = kstrdup(item->str, GFP_KERNEL);
+ if (!e->name) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ /*
+ * If a supplied name ends with '.*', cut off that
+ * part, only store anything before it, and mark the
+ * entry as wildcard.
+ */
+ len = strlen(e->name);
+ if (len > 2 &&
+ e->name[len - 2] == '.' &&
+ e->name[len - 1] == '*') {
+ if (!allow_wildcards) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ e->name[len - 2] = '\0';
+ e->wildcard = true;
+ }
+
+ break;
+ }
+
+ case KDBUS_ITEM_POLICY_ACCESS:
+ if (!e) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ ret = kdbus_policy_make_access(&item->policy_access,
+ &a);
+ if (ret < 0)
+ goto exit;
+
+ list_add_tail(&a->list, &e->access_list);
+ break;
+ }
+ }
+
+ down_write(&db->entries_rwlock);
+
+ /* remember previous entries to restore in case of failure */
+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
+ if (e->owner == owner) {
+ hash_del(&e->hentry);
+ hlist_add_head(&e->hentry, &restore);
+ }
+
+ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+ /* prevent duplicates */
+ hash = kdbus_str_hash(e->name);
+ hash_for_each_possible(db->entries_hash, p, hentry, hash)
+ if (strcmp(e->name, p->name) == 0 &&
+ e->wildcard == p->wildcard) {
+ ret = -EEXIST;
+ goto restore;
+ }
+
+ hlist_del(&e->hentry);
+ hash_add(db->entries_hash, &e->hentry, hash);
+ }
+
+ /* purge all cache-entries produced by previous rules */
+ __kdbus_policy_remove_owner_cache(db, owner);
+
+restore:
+ /* if we failed, flush all entries we added so far, but keep cache */
+ if (ret < 0)
+ __kdbus_policy_remove_owner(db, owner);
+
+ /* if we failed, restore entries, otherwise release them */
+ hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
+ hlist_del(&e->hentry);
+ if (ret < 0) {
+ hash = kdbus_str_hash(e->name);
+ hash_add(db->entries_hash, &e->hentry, hash);
+ } else {
+ kdbus_policy_entry_free(e);
+ }
+ }
+
+ up_write(&db->entries_rwlock);
+
+exit:
+ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
+ hlist_del(&e->hentry);
+ kdbus_policy_entry_free(e);
+ }
+
+ return ret;
+}
diff --git a/drivers/misc/kdbus/policy.h b/drivers/misc/kdbus/policy.h
new file mode 100644
index 000000000000..f4f6f044b4c1
--- /dev/null
+++ b/drivers/misc/kdbus/policy.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POLICY_H
+#define __KDBUS_POLICY_H
+
+#include <linux/hashtable.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+
+struct kdbus_conn;
+struct kdbus_item;
+
+/**
+ * struct kdbus_policy_db - policy database
+ * @entries_hash: Hashtable of entries
+ * @talk_access_hash: Hashtable of send access elements
+ * @entries_rwlock: Read/write semaphore protecting the database's entries
+ * @cache_lock: Mutex to protect the database's cache
+ */
+struct kdbus_policy_db {
+ DECLARE_HASHTABLE(entries_hash, 6);
+ DECLARE_HASHTABLE(talk_access_hash, 6);
+ struct rw_semaphore entries_rwlock;
+ struct mutex cache_lock;
+};
+
+void kdbus_policy_db_init(struct kdbus_policy_db *db);
+void kdbus_policy_db_clear(struct kdbus_policy_db *db);
+
+int kdbus_policy_check_see_access_unlocked(struct kdbus_policy_db *db,
+ struct kdbus_conn *conn,
+ const char *name);
+int kdbus_policy_check_talk_access(struct kdbus_policy_db *db,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst);
+int kdbus_policy_check_own_access(struct kdbus_policy_db *db,
+ const struct kdbus_conn *conn,
+ const char *name);
+void kdbus_policy_purge_cache(struct kdbus_policy_db *db,
+ const struct kdbus_conn *conn);
+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
+ const void *owner);
+int kdbus_policy_set(struct kdbus_policy_db *db,
+ const struct kdbus_item *items,
+ size_t items_size,
+ size_t max_policies,
+ bool allow_wildcards,
+ const void *owner);
+#endif
--
2.1.2

Greg Kroah-Hartman
2014-10-29 22:04:07 UTC
From: Daniel Mack <***@zonque.org>

This patch adds the name registry implementation.

Each bus instantiates a name registry to resolve well-known names
into unique connection IDs for message delivery. The registry will
be queried when a message is sent with kdbus_msg.dst_id set to
KDBUS_DST_ID_NAME, or when a registry dump is requested.

It's important to have this registry implemented in the kernel to
implement lookups and take-overs in a race-free way.
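
To illustrate why lookups and take-overs need a single atomic
check-and-set, here is a small userspace model using C11 atomics. It is
only an analogy: the helper names, the single-slot layout and the error
codes are assumptions of this sketch, and it does not reflect how the
registry in this patch does its locking.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* One well-known name; owner is a unique connection ID, 0 means unowned. */
struct name_slot {
	_Atomic uint64_t owner;
};

/* Claim an unowned name. Of several racing callers, exactly one wins. */
static int name_acquire(struct name_slot *s, uint64_t conn_id)
{
	uint64_t expected = 0;

	if (atomic_compare_exchange_strong(&s->owner, &expected, conn_id))
		return 0;
	return -EEXIST;
}

/* Take the name over from a specific owner; fails if the owner changed. */
static int name_takeover(struct name_slot *s, uint64_t from, uint64_t to)
{
	if (atomic_compare_exchange_strong(&s->owner, &from, to))
		return 0;
	return -EAGAIN;
}

/* Resolve the name to its current owner's unique ID (0 if unowned). */
static uint64_t name_resolve(struct name_slot *s)
{
	return atomic_load(&s->owner);
}
```

If the check ("is the name free?") and the update ("I own it now") were two
separate steps, two connections could both pass the check and both believe
they own the name.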

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/names.c | 920 +++++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/names.h | 81 ++++
2 files changed, 1001 insertions(+)
create mode 100644 drivers/misc/kdbus/names.c
create mode 100644 drivers/misc/kdbus/names.h

diff --git a/drivers/misc/kdbus/names.c b/drivers/misc/kdbus/names.c
new file mode 100644
index 000000000000..5f8853cbc919
--- /dev/null
+++ b/drivers/misc/kdbus/names.c
@@ -0,0 +1,920 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/hash.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "item.h"
+#include "names.h"
+#include "notify.h"
+#include "policy.h"
+
+/**
+ * struct kdbus_name_queue_item - a queue item for a name
+ * @conn: The associated connection
+ * @entry: Name entry queuing up for
+ * @entry_entry: List element for the list in @entry
+ * @conn_entry: List element for the list in @conn
+ * @flags: The queuing flags
+ */
+struct kdbus_name_queue_item {
+ struct kdbus_conn *conn;
+ struct kdbus_name_entry *entry;
+ struct list_head entry_entry;
+ struct list_head conn_entry;
+ u64 flags;
+};
+
+static void kdbus_name_entry_free(struct kdbus_name_entry *e)
+{
+ hash_del(&e->hentry);
+ kfree(e->name);
+ kfree(e);
+}
+
+/**
+ * kdbus_name_registry_free() - free a name registry
+ * @reg: The name registry
+ *
+ * Cleanup the name registry's internal structures.
+ */
+void kdbus_name_registry_free(struct kdbus_name_registry *reg)
+{
+ struct kdbus_name_entry *e;
+ struct hlist_node *tmp;
+ unsigned int i;
+
+ hash_for_each_safe(reg->entries_hash, i, tmp, e, hentry)
+ kdbus_name_entry_free(e);
+
+ kfree(reg);
+}
+
+/**
+ * kdbus_name_registry_new() - create a new name registry
+ * @reg: The returned name registry
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_name_registry_new(struct kdbus_name_registry **reg)
+{
+ struct kdbus_name_registry *r;
+
+ r = kzalloc(sizeof(*r), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ hash_init(r->entries_hash);
+ init_rwsem(&r->rwlock);
+
+ *reg = r;
+ return 0;
+}
+
+static struct kdbus_name_entry *
+kdbus_name_lookup(struct kdbus_name_registry *reg, u32 hash, const char *name)
+{
+ struct kdbus_name_entry *e;
+
+ hash_for_each_possible(reg->entries_hash, e, hentry, hash)
+ if (strcmp(e->name, name) == 0)
+ return e;
+
+ return NULL;
+}
+
+static void kdbus_name_queue_item_free(struct kdbus_name_queue_item *q)
+{
+ list_del(&q->entry_entry);
+ list_del(&q->conn_entry);
+ kfree(q);
+}
+
+/*
+ * The caller needs to hold its own reference, so the connection does not go
+ * away while the entry's reference is dropped under lock.
+ */
+static void kdbus_name_entry_remove_owner(struct kdbus_name_entry *e)
+{
+ BUG_ON(!e->conn);
+ BUG_ON(!mutex_is_locked(&e->conn->lock));
+
+ atomic_dec(&e->conn->name_count);
+ list_del(&e->conn_entry);
+ e->conn = kdbus_conn_unref(e->conn);
+}
+
+static void kdbus_name_entry_set_owner(struct kdbus_name_entry *e,
+ struct kdbus_conn *conn)
+{
+ BUG_ON(e->conn);
+ BUG_ON(!mutex_is_locked(&conn->lock));
+
+ e->conn = kdbus_conn_ref(conn);
+ list_add_tail(&e->conn_entry, &e->conn->names_list);
+ atomic_inc(&conn->name_count);
+}
+
+static int kdbus_name_replace_owner(struct kdbus_name_entry *e,
+ struct kdbus_conn *conn, u64 flags)
+{
+ struct kdbus_conn *conn_old = kdbus_conn_ref(e->conn);
+ int ret;
+
+ BUG_ON(conn == conn_old);
+ BUG_ON(!conn_old);
+
+ /* take lock of both connections in a defined order */
+ if (conn < conn_old) {
+ mutex_lock(&conn->lock);
+ mutex_lock_nested(&conn_old->lock, 1);
+ } else {
+ mutex_lock(&conn_old->lock);
+ mutex_lock_nested(&conn->lock, 1);
+ }
+
+ if (!kdbus_conn_active(conn)) {
+ ret = -ECONNRESET;
+ goto exit_unlock;
+ }
+
+ ret = kdbus_notify_name_change(conn->bus, KDBUS_ITEM_NAME_CHANGE,
+ e->conn->id, conn->id,
+ e->flags, flags, e->name);
+ if (ret < 0)
+ goto exit_unlock;
+
+ /* hand over name ownership */
+ kdbus_name_entry_remove_owner(e);
+ kdbus_name_entry_set_owner(e, conn);
+ e->flags = flags;
+
+exit_unlock:
+ mutex_unlock(&conn_old->lock);
+ mutex_unlock(&conn->lock);
+
+ kdbus_conn_unref(conn_old);
+ return ret;
+}
+
+static int kdbus_name_entry_release(struct kdbus_name_entry *e,
+ struct kdbus_bus *bus)
+{
+ struct kdbus_conn *conn;
+
+ /* give it to first active waiter in the queue */
+ while (!list_empty(&e->queue_list)) {
+ struct kdbus_name_queue_item *q;
+ int ret;
+
+ q = list_first_entry(&e->queue_list,
+ struct kdbus_name_queue_item,
+ entry_entry);
+
+ ret = kdbus_name_replace_owner(e, q->conn, q->flags);
+ if (ret < 0)
+ continue;
+
+ kdbus_name_queue_item_free(q);
+ return 0;
+ }
+
+ /* hand it back to an active activator connection */
+ if (e->activator && e->activator != e->conn) {
+ u64 flags = KDBUS_NAME_ACTIVATOR;
+ int ret;
+
+ /*
+ * Move messages still queued in the old connection
+ * and addressed to that name to the new connection.
+ * This allows a race and loss-free name and message
+ * takeover and exit-on-idle services.
+ */
+ ret = kdbus_conn_move_messages(e->activator, e->conn,
+ e->name_id);
+ if (ret < 0)
+ goto exit_release;
+
+ return kdbus_name_replace_owner(e, e->activator, flags);
+ }
+
+exit_release:
+ /* release the name */
+ kdbus_notify_name_change(e->conn->bus, KDBUS_ITEM_NAME_REMOVE,
+ e->conn->id, 0,
+ e->flags, 0, e->name);
+
+ conn = kdbus_conn_ref(e->conn);
+ mutex_lock(&conn->lock);
+ kdbus_name_entry_remove_owner(e);
+ mutex_unlock(&conn->lock);
+ kdbus_conn_unref(conn);
+
+ kdbus_conn_unref(e->activator);
+ kdbus_name_entry_free(e);
+
+ return 0;
+}
+
+static int kdbus_name_release(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ const char *name)
+{
+ struct kdbus_name_queue_item *q_tmp, *q;
+ struct kdbus_name_entry *e = NULL;
+ u32 hash;
+ int ret = 0;
+
+ hash = kdbus_str_hash(name);
+
+ /* lock order: domain -> bus -> ep -> names -> connection */
+ mutex_lock(&conn->bus->lock);
+ down_write(&reg->rwlock);
+
+ e = kdbus_name_lookup(reg, hash, name);
+ if (!e) {
+ ret = -ESRCH;
+ goto exit_unlock;
+ }
+
+ /* Is the connection already the real owner of the name? */
+ if (e->conn == conn) {
+ ret = kdbus_name_entry_release(e, conn->bus);
+ } else {
+ /*
+ * Otherwise, walk the list of queued entries and search
+ * for items belonging to the connection.
+ */
+
+ /* In case the name belongs to somebody else */
+ ret = -EADDRINUSE;
+
+ list_for_each_entry_safe(q, q_tmp,
+ &e->queue_list,
+ entry_entry) {
+ if (q->conn != conn)
+ continue;
+
+ kdbus_name_queue_item_free(q);
+ ret = 0;
+ break;
+ }
+ }
+
+ /*
+ * Now that the connection has lost a name, purge all cached policy
+ * entries, so upon the next message, TALK access will be checked
+ * against the names the connection actually owns.
+ */
+ if (ret == 0)
+ kdbus_conn_purge_policy_cache(conn);
+
+exit_unlock:
+ up_write(&reg->rwlock);
+ mutex_unlock(&conn->bus->lock);
+
+ return ret;
+}
+
+/**
+ * kdbus_name_remove_by_conn() - remove all name entries of a given connection
+ * @reg: The name registry
+ * @conn: The connection whose entries to remove
+ *
+ * This function removes all name entries held by a given connection.
+ */
+void kdbus_name_remove_by_conn(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn)
+{
+ struct kdbus_name_queue_item *q_tmp, *q;
+ struct kdbus_conn *activator = NULL;
+ struct kdbus_name_entry *e_tmp, *e;
+ LIST_HEAD(names_queue_list);
+ LIST_HEAD(names_list);
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ mutex_lock(&conn->bus->lock);
+ down_write(&reg->rwlock);
+
+ mutex_lock(&conn->lock);
+ list_splice_init(&conn->names_list, &names_list);
+ list_splice_init(&conn->names_queue_list, &names_queue_list);
+ mutex_unlock(&conn->lock);
+
+ if (kdbus_conn_is_activator(conn)) {
+ activator = conn->activator_of->activator;
+ conn->activator_of->activator = NULL;
+ }
+ list_for_each_entry_safe(q, q_tmp, &names_queue_list, conn_entry)
+ kdbus_name_queue_item_free(q);
+ list_for_each_entry_safe(e, e_tmp, &names_list, conn_entry)
+ kdbus_name_entry_release(e, conn->bus);
+
+ up_write(&reg->rwlock);
+ mutex_unlock(&conn->bus->lock);
+
+ kdbus_conn_unref(activator);
+ kdbus_notify_flush(conn->bus);
+}
+
+/**
+ * kdbus_name_lock() - look up a name in a name registry and lock it
+ * @reg: The name registry
+ * @name: The name to look up
+ *
+ * Search for a name in a given name registry and return it with the
+ * registry-lock held. If the object is not found, the lock is not acquired and
+ * NULL is returned. The caller is responsible for unlocking the name via
+ * kdbus_name_unlock() again. Note that kdbus_name_unlock() can be safely called
+ * with NULL as name. In this case, it's a no-op as nothing was locked.
+ *
+ * The *_lock() + *_unlock() logic is only required for callers that need to
+ * protect their code against concurrent activator/implementor name changes.
+ * Multiple readers can lock names concurrently. However, you may not change
+ * name-ownership while holding a name-lock.
+ *
+ * Return: NULL if name is unknown, otherwise return a pointer to the name
+ * entry with the name-lock held (reader lock only).
+ */
+struct kdbus_name_entry *kdbus_name_lock(struct kdbus_name_registry *reg,
+ const char *name)
+{
+ struct kdbus_name_entry *e = NULL;
+ u32 hash = kdbus_str_hash(name);
+
+ down_read(&reg->rwlock);
+ e = kdbus_name_lookup(reg, hash, name);
+ if (e)
+ return e;
+ up_read(&reg->rwlock);
+
+ return NULL;
+}
+
+/**
+ * kdbus_name_unlock() - unlock one name in a name registry
+ * @reg: The name registry
+ * @entry: The locked name entry or NULL
+ *
+ * This is the unlock-counterpart of kdbus_name_lock(). It unlocks a name that
+ * was previously successfully locked. You can safely pass NULL as entry and
+ * this will become a no-op. Therefore, it's safe to always call this on the
+ * return-value of kdbus_name_lock().
+ *
+ * Return: This always returns NULL.
+ */
+struct kdbus_name_entry *kdbus_name_unlock(struct kdbus_name_registry *reg,
+ struct kdbus_name_entry *entry)
+{
+ if (entry) {
+ BUG_ON(!rwsem_is_locked(&reg->rwlock));
+ up_read(&reg->rwlock);
+ }
+
+ return NULL;
+}
+
+static int kdbus_name_queue_conn(struct kdbus_conn *conn, u64 flags,
+ struct kdbus_name_entry *e)
+{
+ struct kdbus_name_queue_item *q;
+
+ q = kzalloc(sizeof(*q), GFP_KERNEL);
+ if (!q)
+ return -ENOMEM;
+
+ q->conn = conn;
+ q->flags = flags;
+ q->entry = e;
+
+ list_add_tail(&q->entry_entry, &e->queue_list);
+ list_add_tail(&q->conn_entry, &conn->names_queue_list);
+
+ return 0;
+}
+
+/**
+ * kdbus_name_is_valid() - check if a name is valid
+ * @p: The name to check
+ * @allow_wildcard: Whether or not to allow a wildcard name
+ *
+ * A name is valid if all of the following criteria are met:
+ *
+ * - The name has two or more elements separated by a period ('.') character.
+ * - All elements must contain at least one character.
+ * - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
+ * and must not begin with a digit.
+ * - The name must not exceed KDBUS_NAME_MAX_LEN.
+ * - If @allow_wildcard is true, the name may end in '.*'
+ */
+bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
+{
+ bool dot, found_dot = false;
+ const char *q;
+
+ for (dot = true, q = p; *q; q++) {
+ if (*q == '.') {
+ if (dot)
+ return false;
+
+ found_dot = true;
+ dot = true;
+ } else {
+ bool good;
+
+ good = isalpha(*q) || (!dot && isdigit(*q)) ||
+ *q == '_' || *q == '-' ||
+ (allow_wildcard && dot &&
+ *q == '*' && *(q + 1) == '\0');
+
+ if (!good)
+ return false;
+
+ dot = false;
+ }
+ }
+
+ if (q - p > KDBUS_NAME_MAX_LEN)
+ return false;
+
+ if (dot)
+ return false;
+
+ if (!found_dot)
+ return false;
+
+ return true;
+}
+
+/**
+ * kdbus_name_acquire() - acquire a name
+ * @reg: The name registry
+ * @conn: The connection to pin this entry to
+ * @name: The name to acquire
+ * @flags: Acquisition flags (KDBUS_NAME_*)
+ * @entry: Return pointer for the entry (may be NULL)
+ *
+ * Callers must ensure that @conn is either a privileged bus user or has
+ * sufficient privileges in the policy-db to own the well-known name @name.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_name_acquire(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ const char *name, u64 *flags,
+ struct kdbus_name_entry **entry)
+{
+ struct kdbus_name_entry *e = NULL;
+ u32 hash;
+ int ret = 0;
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ mutex_lock(&conn->bus->lock);
+ down_write(&reg->rwlock);
+
+ hash = kdbus_str_hash(name);
+ e = kdbus_name_lookup(reg, hash, name);
+ if (e) {
+ /* connection already owns that name */
+ if (e->conn == conn) {
+ ret = -EALREADY;
+ goto exit_unlock;
+ }
+
+ if (kdbus_conn_is_activator(conn)) {
+ /* An activator can only own a single name */
+ if (conn->activator_of) {
+ if (conn->activator_of == e)
+ ret = -EALREADY;
+ else
+ ret = -EINVAL;
+ } else if (!e->activator && !conn->activator_of) {
+ /*
+ * Activator registers for name that is
+ * already owned
+ */
+ e->activator = kdbus_conn_ref(conn);
+ conn->activator_of = e;
+ }
+
+ goto exit_unlock;
+ }
+
+ /* take over the name of an activator connection */
+ if (e->flags & KDBUS_NAME_ACTIVATOR) {
+ /*
+ * Take over the messages queued in the activator
+ * connection, the activator itself never reads them.
+ */
+ ret = kdbus_conn_move_messages(conn, e->activator, 0);
+ if (ret < 0)
+ goto exit_unlock;
+
+ ret = kdbus_name_replace_owner(e, conn, *flags);
+ goto exit_unlock;
+ }
+
+ /* take over the name if both parties agree */
+ if ((*flags & KDBUS_NAME_REPLACE_EXISTING) &&
+ (e->flags & KDBUS_NAME_ALLOW_REPLACEMENT)) {
+ /*
+ * Move name back to the queue, in case we take it away
+ * from a connection which asked for queuing.
+ */
+ if (e->flags & KDBUS_NAME_QUEUE) {
+ ret = kdbus_name_queue_conn(e->conn,
+ e->flags, e);
+ if (ret < 0)
+ goto exit_unlock;
+ }
+
+ ret = kdbus_name_replace_owner(e, conn, *flags);
+ goto exit_unlock;
+ }
+
+ /* add it to the queue waiting for the name */
+ if (*flags & KDBUS_NAME_QUEUE) {
+ ret = kdbus_name_queue_conn(conn, *flags, e);
+
+ /* tell the caller that we queued it */
+ *flags |= KDBUS_NAME_IN_QUEUE;
+
+ goto exit_unlock;
+ }
+
+ /* the name is busy, return a failure */
+ ret = -EEXIST;
+ goto exit_unlock;
+ } else {
+ /* An activator can only own a single name */
+ if (kdbus_conn_is_activator(conn) &&
+ conn->activator_of) {
+ ret = -EINVAL;
+ goto exit_unlock;
+ }
+ }
+
+ /* new name entry */
+ e = kzalloc(sizeof(*e), GFP_KERNEL);
+ if (!e) {
+ ret = -ENOMEM;
+ goto exit_unlock;
+ }
+
+ e->name = kstrdup(name, GFP_KERNEL);
+ if (!e->name) {
+ kfree(e);
+ ret = -ENOMEM;
+ goto exit_unlock;
+ }
+
+ if (kdbus_conn_is_activator(conn)) {
+ e->activator = kdbus_conn_ref(conn);
+ conn->activator_of = e;
+ }
+
+ e->flags = *flags;
+ INIT_LIST_HEAD(&e->queue_list);
+ e->name_id = ++reg->name_seq_last;
+
+ mutex_lock(&conn->lock);
+ if (!kdbus_conn_active(conn)) {
+ mutex_unlock(&conn->lock);
+ kfree(e);
+ ret = -ECONNRESET;
+ goto exit_unlock;
+ }
+ hash_add(reg->entries_hash, &e->hentry, hash);
+ kdbus_name_entry_set_owner(e, conn);
+ mutex_unlock(&conn->lock);
+
+ kdbus_notify_name_change(e->conn->bus, KDBUS_ITEM_NAME_ADD,
+ 0, e->conn->id,
+ 0, e->flags, e->name);
+
+ if (entry)
+ *entry = e;
+
+exit_unlock:
+ up_write(&reg->rwlock);
+ mutex_unlock(&conn->bus->lock);
+ kdbus_notify_flush(conn->bus);
+
+ return ret;
+}
+
+/**
+ * kdbus_cmd_name_acquire() - acquire a name from an ioctl command buffer
+ * @reg: The name registry
+ * @conn: The connection to pin this entry to
+ * @cmd: The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_name_acquire(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ struct kdbus_cmd_name *cmd)
+{
+ struct kdbus_name_entry *e = NULL;
+ const char *name;
+ int ret;
+
+ ret = kdbus_items_get_str(cmd->items, KDBUS_ITEMS_SIZE(cmd, items),
+ KDBUS_ITEM_NAME, &name);
+ if (ret < 0)
+ return -EINVAL;
+
+ if (!kdbus_name_is_valid(name, false))
+ return -EINVAL;
+
+ /*
+ * Do atomic_inc_return here to reserve our slot, then decrement
+ * it before returning.
+ */
+ ret = -E2BIG;
+ if (atomic_inc_return(&conn->name_count) > KDBUS_CONN_MAX_NAMES)
+ goto out_dec;
+
+ ret = kdbus_ep_policy_check_own_access(conn->ep, conn, name);
+ if (ret < 0)
+ goto out_dec;
+
+ ret = kdbus_name_acquire(reg, conn, name, &cmd->flags, &e);
+ kdbus_notify_flush(conn->bus);
+
+out_dec:
+ /* Decrement the previously allocated slot */
+ atomic_dec(&conn->name_count);
+ return ret;
+}
+
+/**
+ * kdbus_cmd_name_release() - release a name entry from an ioctl command buffer
+ * @reg: The name registry
+ * @conn: The connection that holds the name
+ * @cmd: The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_name_release(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ const struct kdbus_cmd_name *cmd)
+{
+ int ret;
+ const char *name;
+
+ ret = kdbus_items_get_str(cmd->items, KDBUS_ITEMS_SIZE(cmd, items),
+ KDBUS_ITEM_NAME, &name);
+ if (ret < 0)
+ return -EINVAL;
+
+ if (!kdbus_name_is_valid(name, false))
+ return -EINVAL;
+
+ ret = kdbus_ep_policy_check_see_access(conn->ep, conn, name);
+ if (ret < 0)
+ return ret;
+
+ ret = kdbus_name_release(reg, conn, name);
+
+ kdbus_notify_flush(conn->bus);
+ return ret;
+}
+
+static int kdbus_name_list_write(struct kdbus_conn *conn,
+ struct kdbus_conn *c,
+ struct kdbus_pool_slice *slice,
+ size_t *pos,
+ struct kdbus_name_entry *e,
+ bool write)
+{
+ const size_t len = sizeof(struct kdbus_name_info);
+ size_t p = *pos;
+ size_t nlen = 0;
+
+ if (e) {
+ nlen = strlen(e->name) + 1;
+
+ if (kdbus_ep_policy_check_see_access_unlocked(conn->ep, conn,
+ e->name) < 0)
+ return 0;
+ }
+
+ if (write) {
+ int ret;
+ struct kdbus_name_info info = {
+ .size = len,
+ .owner_id = c->id,
+ .flags = e ? e->flags : 0,
+ .conn_flags = c->flags,
+ };
+
+ if (nlen)
+ info.size += KDBUS_ITEM_SIZE(nlen);
+
+ /* write record */
+ ret = kdbus_pool_slice_copy(slice, p, &info, len);
+ if (ret < 0)
+ return ret;
+ p += len;
+
+ /* append name */
+ if (e) {
+ struct kdbus_item_header {
+ __u64 size;
+ __u64 type;
+ } h;
+
+ h.size = KDBUS_ITEM_HEADER_SIZE + nlen;
+ h.type = KDBUS_ITEM_NAME;
+
+ ret = kdbus_pool_slice_copy(slice, p, &h, sizeof(h));
+ if (ret < 0)
+ return ret;
+
+ p += sizeof(h);
+
+ ret = kdbus_pool_slice_copy(slice, p, e->name, nlen);
+ if (ret < 0)
+ return ret;
+
+ p += KDBUS_ALIGN8(nlen);
+ }
+ } else {
+ p += len;
+ if (nlen)
+ p += KDBUS_ITEM_SIZE(nlen);
+ }
+
+ *pos = p;
+ return 0;
+}
+
+static int kdbus_name_list_all(struct kdbus_conn *conn, u64 flags,
+ struct kdbus_pool_slice *slice,
+ size_t *pos, bool write)
+{
+ struct kdbus_conn *c;
+ size_t p = *pos;
+ int ret, i;
+
+ hash_for_each(conn->bus->conn_hash, i, c, hentry) {
+ bool added = false;
+
+ /* skip activators */
+ if (!(flags & KDBUS_NAME_LIST_ACTIVATORS) &&
+ kdbus_conn_is_activator(c))
+ continue;
+
+ /* all names the connection owns */
+ if (flags & (KDBUS_NAME_LIST_NAMES |
+ KDBUS_NAME_LIST_ACTIVATORS)) {
+ struct kdbus_name_entry *e;
+
+ mutex_lock(&c->lock);
+ list_for_each_entry(e, &c->names_list, conn_entry) {
+ struct kdbus_conn *a = e->activator;
+
+ if ((flags & KDBUS_NAME_LIST_ACTIVATORS) &&
+ a && a != c) {
+ ret = kdbus_name_list_write(conn, a,
+ slice, &p, e, write);
+ if (ret < 0) {
+ mutex_unlock(&c->lock);
+ return ret;
+ }
+
+ added = true;
+ }
+
+ if (flags & KDBUS_NAME_LIST_NAMES ||
+ kdbus_conn_is_activator(c)) {
+ ret = kdbus_name_list_write(conn, c,
+ slice, &p, e, write);
+ if (ret < 0) {
+ mutex_unlock(&c->lock);
+ return ret;
+ }
+
+ added = true;
+ }
+ }
+ mutex_unlock(&c->lock);
+ }
+
+ /* queue of names the connection is currently waiting for */
+ if (flags & KDBUS_NAME_LIST_QUEUED) {
+ struct kdbus_name_queue_item *q;
+
+ mutex_lock(&c->lock);
+ list_for_each_entry(q, &c->names_queue_list,
+ conn_entry) {
+ ret = kdbus_name_list_write(conn, c,
+ slice, &p, q->entry, write);
+ if (ret < 0) {
+ mutex_unlock(&c->lock);
+ return ret;
+ }
+
+ added = true;
+ }
+ mutex_unlock(&c->lock);
+ }
+
+ /* nothing added so far, just add the unique ID */
+ if (!added && flags & KDBUS_NAME_LIST_UNIQUE) {
+ ret = kdbus_name_list_write(conn, c,
+ slice, &p, NULL, write);
+ if (ret < 0)
+ return ret;
+ }
+ }
+
+ *pos = p;
+ return 0;
+}
+
+/**
+ * kdbus_cmd_name_list() - list names of a connection
+ * @reg: The name registry
+ * @conn: The connection holding the name entries
+ * @cmd: The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_name_list(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ struct kdbus_cmd_name_list *cmd)
+{
+ struct kdbus_policy_db *policy_db;
+ struct kdbus_name_list list = {};
+ struct kdbus_pool_slice *slice;
+ size_t pos;
+ int ret;
+
+ policy_db = &conn->ep->policy_db;
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ down_read(&reg->rwlock);
+ down_read(&conn->bus->conn_rwlock);
+ down_read(&policy_db->entries_rwlock);
+
+ /* size of header + records */
+ pos = sizeof(struct kdbus_name_list);
+ ret = kdbus_name_list_all(conn, cmd->flags, NULL, &pos, false);
+ if (ret < 0)
+ goto exit_unlock;
+
+ ret = kdbus_pool_slice_alloc(conn->pool, &slice, pos);
+ if (ret < 0)
+ goto exit_unlock;
+
+ /* copy the header, specifying the overall size */
+ list.size = pos;
+ ret = kdbus_pool_slice_copy(slice, 0, &list, sizeof(list));
+ if (ret < 0)
+ goto exit_pool_free;
+
+ /* copy the records */
+ pos = sizeof(struct kdbus_name_list);
+ ret = kdbus_name_list_all(conn, cmd->flags, slice, &pos, true);
+ if (ret < 0)
+ goto exit_pool_free;
+
+ cmd->offset = kdbus_pool_slice_offset(slice);
+ kdbus_pool_slice_flush(slice);
+ kdbus_pool_slice_make_public(slice);
+
+exit_pool_free:
+ if (ret < 0)
+ kdbus_pool_slice_free(slice);
+exit_unlock:
+ up_read(&policy_db->entries_rwlock);
+ up_read(&conn->bus->conn_rwlock);
+ up_read(&reg->rwlock);
+ return ret;
+}
diff --git a/drivers/misc/kdbus/names.h b/drivers/misc/kdbus/names.h
new file mode 100644
index 000000000000..594d1bd54b2e
--- /dev/null
+++ b/drivers/misc/kdbus/names.h
@@ -0,0 +1,81 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NAMES_H
+#define __KDBUS_NAMES_H
+
+#include <linux/hashtable.h>
+#include <linux/rwsem.h>
+
+/**
+ * struct kdbus_name_registry - names registered for a bus
+ * @entries_hash: Map of entries
+ * @rwlock: Registry data lock
+ * @name_seq_last: Last used sequence number to assign to a name entry
+ */
+struct kdbus_name_registry {
+ DECLARE_HASHTABLE(entries_hash, 8);
+ struct rw_semaphore rwlock;
+ u64 name_seq_last;
+};
+
+/**
+ * struct kdbus_name_entry - well-known name entry
+ * @name: The well-known name
+ * @name_id: Sequence number of name entry to be able to uniquely
+ * identify a name over its registration lifetime
+ * @flags: KDBUS_NAME_* flags
+ * @queue_list: List of queued waiters for the well-known name
+ * @conn_entry: Entry in connection
+ * @hentry: Entry in registry map
+ * @conn: Connection owning the name
+ * @activator: Connection of the activator queuing incoming messages
+ */
+struct kdbus_name_entry {
+ char *name;
+ u64 name_id;
+ u64 flags;
+ struct list_head queue_list;
+ struct list_head conn_entry;
+ struct hlist_node hentry;
+ struct kdbus_conn *conn;
+ struct kdbus_conn *activator;
+};
+
+int kdbus_name_registry_new(struct kdbus_name_registry **reg);
+void kdbus_name_registry_free(struct kdbus_name_registry *reg);
+
+int kdbus_name_acquire(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ const char *name, u64 *flags,
+ struct kdbus_name_entry **entry);
+int kdbus_cmd_name_acquire(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ struct kdbus_cmd_name *cmd);
+int kdbus_cmd_name_release(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ const struct kdbus_cmd_name *cmd);
+int kdbus_cmd_name_list(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn,
+ struct kdbus_cmd_name_list *cmd);
+
+struct kdbus_name_entry *kdbus_name_lock(struct kdbus_name_registry *reg,
+ const char *name);
+struct kdbus_name_entry *kdbus_name_unlock(struct kdbus_name_registry *reg,
+ struct kdbus_name_entry *entry);
+
+void kdbus_name_remove_by_conn(struct kdbus_name_registry *reg,
+ struct kdbus_conn *conn);
+
+bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
+#endif
--
2.1.2
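
The naming rules documented for kdbus_name_is_valid() in the patch above
can be tried out in userspace with a small port of the same checks. This
is only a sketch under stated assumptions: sketch_name_is_valid() and
SKETCH_NAME_MAX_LEN are hypothetical names, the length limit of 255 is
assumed rather than taken from this patch, and the userspace isalpha()
and isdigit() match the kernel's ASCII-only behaviour only in the "C"
locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for this sketch; KDBUS_NAME_MAX_LEN is defined elsewhere. */
#define SKETCH_NAME_MAX_LEN 255

/* Userspace port of the validity rules; hypothetical helper, not kernel API. */
static bool sketch_name_is_valid(const char *p, bool allow_wildcard)
{
	bool at_element_start = true;	/* true at the start and after each '.' */
	bool found_dot = false;
	const char *q;

	assert(p != NULL);

	for (q = p; *q; q++) {
		unsigned char c = (unsigned char)*q;
		bool good;

		if (c == '.') {
			/* empty element: leading dot or two dots in a row */
			if (at_element_start)
				return false;
			found_dot = true;
			at_element_start = true;
			continue;
		}

		good = isalpha(c) ||
		       (!at_element_start && isdigit(c)) ||
		       c == '_' || c == '-' ||
		       /* a lone '*' is accepted only as the final element */
		       (allow_wildcard && at_element_start &&
			c == '*' && q[1] == '\0');
		if (!good)
			return false;

		at_element_start = false;
	}

	if ((size_t)(q - p) > SKETCH_NAME_MAX_LEN)
		return false;

	/* reject a trailing dot and names with fewer than two elements */
	return !at_element_start && found_dot;
}
```

Under these rules "org.example.Service" is accepted, while "org",
"org..example" and "org.1example" are rejected. "org.example.*" is
accepted only when allow_wildcard is true.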

Greg Kroah-Hartman
2014-10-29 22:04:27 UTC
From: Daniel Mack <***@zonque.org>

Add the logic to handle the following entities:

Domain:
A domain is a named object containing a number of buses. A
system container that contains its own init system and
users usually also runs in its own kdbus domain. The
/dev/kdbus/domain/<container-name>/ directory shows up inside
the domain as /dev/kdbus/. Every domain offers its own "control"
device node to create new buses or new sub-domains. Domains have
no connection to each other and can neither see nor talk to each
other. See section 5 for more details.

Bus:
A bus is a named object inside a domain. Clients exchange messages
over a bus. Multiple buses themselves have no connection to each
other; messages can only be exchanged on the same bus. The default
entry point to a bus, to which clients establish their connection, is
the "bus" device node /dev/kdbus/<bus name>/bus. Common operating
system setups create one "system bus" per system, and one "user
bus" for every logged-in user. Applications or services may create
their own private named buses. See section 5 for more details.

Endpoint:
An endpoint provides the device node to talk to a bus. Opening an
endpoint creates a new connection to the bus to which the endpoint
belongs. Every bus has a default endpoint called "bus". A bus can
optionally offer additional endpoints with custom names to provide
restricted access to the same bus. Custom endpoints carry
additional policy which can be used to give sandboxed processes
only locked-down, limited, filtered access to the same bus.

See Documentation/kdbus.txt for more details.
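
To make the naming scheme above concrete, here is a small userspace
sketch that composes the device node path of an endpoint as seen from
inside a domain. It is an illustration, not part of the patch:
sketch_ep_path() is a hypothetical helper, and the assumption that a
custom endpoint appears next to the default one as
/dev/kdbus/<bus name>/<endpoint name> is inferred from the description
of the default "bus" node.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path of an endpoint device node into @buf.
 * A NULL or empty @ep_name selects the default endpoint, "bus".
 * Returns 0 on success, -1 on an empty bus name or if @buf is too small.
 */
static int sketch_ep_path(char *buf, size_t len,
			  const char *bus_name, const char *ep_name)
{
	int n;

	assert(buf != NULL);

	if (!bus_name || !*bus_name)
		return -1;
	if (!ep_name || !*ep_name)
		ep_name = "bus";	/* every bus has this default endpoint */

	n = snprintf(buf, len, "/dev/kdbus/%s/%s", bus_name, ep_name);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* encoding error or truncated output */

	return 0;
}
```

For a bus named "mybus" (a made-up name), the default endpoint resolves
to /dev/kdbus/mybus/bus, and a custom endpoint called "sandbox" would,
under the assumption above, resolve to /dev/kdbus/mybus/sandbox.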

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/bus.c | 450 +++++++++++++++++++++++++++++++++
drivers/misc/kdbus/bus.h | 107 ++++++++
drivers/misc/kdbus/domain.c | 477 +++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/domain.h | 105 ++++++++
drivers/misc/kdbus/endpoint.c | 567 ++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/endpoint.h | 94 +++++++
6 files changed, 1800 insertions(+)
create mode 100644 drivers/misc/kdbus/bus.c
create mode 100644 drivers/misc/kdbus/bus.h
create mode 100644 drivers/misc/kdbus/domain.c
create mode 100644 drivers/misc/kdbus/domain.h
create mode 100644 drivers/misc/kdbus/endpoint.c
create mode 100644 drivers/misc/kdbus/endpoint.h

diff --git a/drivers/misc/kdbus/bus.c b/drivers/misc/kdbus/bus.c
new file mode 100644
index 000000000000..6dcaf22f5d59
--- /dev/null
+++ b/drivers/misc/kdbus/bus.c
@@ -0,0 +1,450 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "notify.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "item.h"
+#include "metadata.h"
+#include "names.h"
+#include "policy.h"
+
+/**
+ * kdbus_bus_cred_is_privileged() - check whether the given credentials in
+ * combination with the capabilities of the
+ * current thread are privileged on the bus
+ * @bus: The bus to check
+ * @cred: The credentials to match
+ *
+ * Return: true if the credentials are privileged, otherwise false.
+ */
+bool kdbus_bus_cred_is_privileged(const struct kdbus_bus *bus,
+ const struct cred *cred)
+{
+ /* Capabilities are *ALWAYS* tested against the current thread, they're
+ * never remembered from conn-credentials. */
+ if (ns_capable(&init_user_ns, CAP_IPC_OWNER))
+ return true;
+
+ return uid_eq(bus->uid_owner, cred->fsuid);
+}
+
+/**
+ * kdbus_bus_uid_is_privileged() - check whether the current user is a
+ * privileged bus user
+ * @bus: The bus to check
+ *
+ * Return: true if the current user has CAP_IPC_OWNER capabilities, or
+ * if it has the same UID as the user that created the bus. Otherwise,
+ * false is returned.
+ */
+bool kdbus_bus_uid_is_privileged(const struct kdbus_bus *bus)
+{
+ return kdbus_bus_cred_is_privileged(bus, current_cred());
+}
+
+/**
+ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
+ * @bus: The bus to reference
+ *
+ * Every user of a bus, except for its creator, must add a reference to the
+ * kdbus_bus using this function.
+ *
+ * Return: the bus itself
+ */
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
+{
+ kref_get(&bus->kref);
+ return bus;
+}
+
+static void __kdbus_bus_free(struct kref *kref)
+{
+ struct kdbus_bus *bus = container_of(kref, struct kdbus_bus, kref);
+
+ BUG_ON(!bus->disconnected);
+ BUG_ON(!list_empty(&bus->ep_list));
+ BUG_ON(!list_empty(&bus->monitors_list));
+ BUG_ON(!hash_empty(bus->conn_hash));
+
+ kdbus_notify_free(bus);
+ atomic_dec(&bus->user->buses);
+ kdbus_domain_user_unref(bus->user);
+ kdbus_name_registry_free(bus->name_registry);
+ kdbus_domain_unref(bus->domain);
+ kdbus_policy_db_clear(&bus->policy_db);
+ kdbus_meta_free(bus->meta);
+ kfree(bus->name);
+ kfree(bus);
+}
+
+/**
+ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
+ * @bus: The bus to unref
+ *
+ * Release a reference. If the reference count drops to 0, the bus will be
+ * freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
+{
+ if (!bus)
+ return NULL;
+
+ kref_put(&bus->kref, __kdbus_bus_free);
+ return NULL;
+}
+
+/**
+ * kdbus_bus_find_conn_by_id() - find a connection with a given id
+ * @bus: The bus to look for the connection
+ * @id: The 64-bit connection id
+ *
+ * Looks up a connection with a given id. The returned connection
+ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
+ * the connection can't be found.
+ */
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
+{
+ struct kdbus_conn *conn, *found = NULL;
+
+ down_read(&bus->conn_rwlock);
+ hash_for_each_possible(bus->conn_hash, conn, hentry, id)
+ if (conn->id == id) {
+ found = kdbus_conn_ref(conn);
+ break;
+ }
+ up_read(&bus->conn_rwlock);
+
+ return found;
+}
+
+/**
+ * kdbus_bus_disconnect() - disconnect a bus
+ * @bus: The kdbus reference
+ *
+ * The passed bus will be disconnected and the associated endpoint will be
+ * unref'ed.
+ */
+void kdbus_bus_disconnect(struct kdbus_bus *bus)
+{
+ mutex_lock(&bus->lock);
+ if (bus->disconnected) {
+ mutex_unlock(&bus->lock);
+ return;
+ }
+ bus->disconnected = true;
+ mutex_unlock(&bus->lock);
+
+ /* disconnect from domain */
+ mutex_lock(&bus->domain->lock);
+ list_del(&bus->domain_entry);
+ mutex_unlock(&bus->domain->lock);
+
+ /* disconnect all endpoints attached to this bus */
+ for (;;) {
+ struct kdbus_ep *ep;
+
+ mutex_lock(&bus->lock);
+ ep = list_first_entry_or_null(&bus->ep_list,
+ struct kdbus_ep,
+ bus_entry);
+ if (!ep) {
+ mutex_unlock(&bus->lock);
+ break;
+ }
+
+ /* take reference, release lock, disconnect without lock */
+ kdbus_ep_ref(ep);
+ mutex_unlock(&bus->lock);
+
+ kdbus_ep_disconnect(ep);
+ kdbus_ep_unref(ep);
+ }
+
+ /* drop reference for our "bus" endpoint after we disconnected */
+ bus->ep = kdbus_ep_unref(bus->ep);
+}
+
+static struct kdbus_bus *kdbus_bus_find(struct kdbus_domain *domain,
+ const char *name)
+{
+ struct kdbus_bus *bus = NULL;
+ struct kdbus_bus *b;
+
+ mutex_lock(&domain->lock);
+ list_for_each_entry(b, &domain->bus_list, domain_entry) {
+ if (strcmp(b->name, name))
+ continue;
+
+ bus = kdbus_bus_ref(b);
+ break;
+ }
+ mutex_unlock(&domain->lock);
+
+ return bus;
+}
+
+/**
+ * kdbus_cmd_bus_creator_info() - get information on a bus creator
+ * @conn: The querying connection
+ * @cmd_info: The command buffer, as passed in from the ioctl
+ *
+ * Gather information on the creator of the bus @conn is connected to.
+ *
+ * Return: 0 on success, error otherwise.
+ */
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn,
+ struct kdbus_cmd_info *cmd_info)
+{
+ struct kdbus_bus *bus = conn->bus;
+ struct kdbus_pool_slice *slice;
+ struct kdbus_info info = {};
+ int ret;
+
+ info.size = sizeof(info) + bus->meta->size;
+ info.id = bus->id;
+ info.flags = bus->bus_flags;
+
+ if (!kdbus_meta_ns_eq(conn->meta, bus->meta))
+ return -EPERM;
+
+ ret = kdbus_pool_slice_alloc(conn->pool, &slice, info.size);
+ if (ret < 0)
+ return ret;
+
+ ret = kdbus_pool_slice_copy(slice, 0, &info, sizeof(info));
+ if (ret < 0)
+ goto exit_free_slice;
+
+ ret = kdbus_pool_slice_copy(slice, sizeof(info), bus->meta->data,
+ bus->meta->size);
+ if (ret < 0)
+ goto exit_free_slice;
+
+ /* write back the offset */
+ cmd_info->offset = kdbus_pool_slice_offset(slice);
+ kdbus_pool_slice_flush(slice);
+ kdbus_pool_slice_make_public(slice);
+
+ return 0;
+
+exit_free_slice:
+ kdbus_pool_slice_free(slice);
+ return ret;
+}
+
+/**
+ * kdbus_bus_new() - create a new bus
+ * @domain: The domain to work on
+ * @make: Pointer to a struct kdbus_cmd_make containing the
+ * details for the bus creation
+ * @name: Name of the bus
+ * @bloom: Bloom parameters for this bus
+ * @mode: The access mode for the device node
+ * @uid: The uid of the device node
+ * @gid: The gid of the device node
+ * @bus: Pointer to a reference where the new bus is stored
+ *
+ * This function will allocate a new kdbus_bus and link it to the given
+ * domain.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_bus_new(struct kdbus_domain *domain,
+ const struct kdbus_cmd_make *make,
+ const char *name,
+ const struct kdbus_bloom_parameter *bloom,
+ umode_t mode, kuid_t uid, kgid_t gid,
+ struct kdbus_bus **bus)
+{
+ struct kdbus_bus *b;
+ char prefix[16];
+ int ret;
+
+ BUG_ON(*bus);
+
+ /* enforce "$UID-" prefix */
+ snprintf(prefix, sizeof(prefix), "%u-",
+ from_kuid(current_user_ns(), uid));
+ if (strncmp(name, prefix, strlen(prefix)) != 0)
+ return -EINVAL;
+
+ b = kdbus_bus_find(domain, name);
+ if (b) {
+ kdbus_bus_unref(b);
+ return -EEXIST;
+ }
+
+ b = kzalloc(sizeof(*b), GFP_KERNEL);
+ if (!b)
+ return -ENOMEM;
+
+ kref_init(&b->kref);
+ b->uid_owner = uid;
+ b->bus_flags = make->flags;
+ b->bloom = *bloom;
+ mutex_init(&b->lock);
+ init_rwsem(&b->conn_rwlock);
+ hash_init(b->conn_hash);
+ INIT_LIST_HEAD(&b->ep_list);
+ INIT_LIST_HEAD(&b->monitors_list);
+ INIT_LIST_HEAD(&b->notify_list);
+ spin_lock_init(&b->notify_lock);
+ mutex_init(&b->notify_flush_lock);
+ atomic64_set(&b->conn_seq_last, 0);
+ b->domain = kdbus_domain_ref(domain);
+ kdbus_policy_db_init(&b->policy_db);
+
+ /* generate unique bus id */
+ generate_random_uuid(b->id128);
+
+ /* cache the metadata/credentials of the creator */
+ ret = kdbus_meta_new(&b->meta);
+ if (ret < 0)
+ return ret;
+
+ ret = kdbus_meta_append(b->meta, NULL, 0,
+ KDBUS_ATTACH_CREDS |
+ KDBUS_ATTACH_TID_COMM |
+ KDBUS_ATTACH_PID_COMM |
+ KDBUS_ATTACH_EXE |
+ KDBUS_ATTACH_CMDLINE |
+ KDBUS_ATTACH_CGROUP |
+ KDBUS_ATTACH_CAPS |
+ KDBUS_ATTACH_SECLABEL |
+ KDBUS_ATTACH_AUDIT);
+ if (ret < 0)
+ goto exit_free;
+
+ b->name = kstrdup(name, GFP_KERNEL);
+ if (!b->name) {
+ ret = -ENOMEM;
+ goto exit_free;
+ }
+
+ ret = kdbus_name_registry_new(&b->name_registry);
+ if (ret < 0)
+ goto exit_free_name;
+
+ ret = kdbus_ep_new(b, "bus", mode, uid, gid, false, &b->ep);
+ if (ret < 0)
+ goto exit_free_reg;
+
+ /* link into domain */
+ mutex_lock(&domain->lock);
+ if (domain->disconnected) {
+ ret = -ESHUTDOWN;
+ goto exit_unref_user_unlock;
+ }
+
+ /* account the bus against the user */
+ ret = kdbus_domain_get_user_unlocked(domain, uid, &b->user);
+ if (ret < 0)
+ goto exit_unref_user_unlock;
+
+ if (!capable(CAP_IPC_OWNER) &&
+ atomic_inc_return(&b->user->buses) > KDBUS_USER_MAX_BUSES) {
+ atomic_dec(&b->user->buses);
+ ret = -EMFILE;
+ goto exit_unref_user_unlock;
+ }
+
+ b->id = ++domain->bus_seq_last;
+ list_add_tail(&b->domain_entry, &domain->bus_list);
+ mutex_unlock(&domain->lock);
+
+ *bus = b;
+ return 0;
+
+exit_unref_user_unlock:
+ mutex_unlock(&domain->lock);
+ kdbus_domain_user_unref(b->user);
+ kdbus_ep_disconnect(b->ep);
+ kdbus_ep_unref(b->ep);
+exit_free_reg:
+ kdbus_name_registry_free(b->name_registry);
+exit_free_name:
+ kfree(b->name);
+exit_free:
+ kdbus_meta_free(b->meta);
+ kdbus_policy_db_clear(&b->policy_db);
+ kdbus_domain_unref(b->domain);
+ kfree(b);
+ return ret;
+}
+
+/**
+ * kdbus_bus_make_user() - create a kdbus_cmd_make from user-supplied data
+ * @make: Reference to the location where to store the result
+ * @name: Shortcut to the requested name
+ * @bloom: Bloom parameters for this bus
+ *
+ * This function is part of the connection ioctl() interface and will parse
+ * the user-supplied data.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_bus_make_user(const struct kdbus_cmd_make *make,
+ char **name, struct kdbus_bloom_parameter *bloom)
+{
+ const struct kdbus_item *item;
+ const char *n = NULL;
+ const struct kdbus_bloom_parameter *bl = NULL;
+
+ KDBUS_ITEMS_FOREACH(item, make->items, KDBUS_ITEMS_SIZE(make, items)) {
+ switch (item->type) {
+ case KDBUS_ITEM_MAKE_NAME:
+ if (n)
+ return -EEXIST;
+
+ n = item->str;
+ break;
+
+ case KDBUS_ITEM_BLOOM_PARAMETER:
+ if (bl)
+ return -EEXIST;
+
+ bl = &item->bloom_parameter;
+ break;
+ }
+ }
+
+ if (!n || !bl)
+ return -EBADMSG;
+
+ if (bl->size < 8 || bl->size > KDBUS_BUS_BLOOM_MAX_SIZE)
+ return -EINVAL;
+ if (!KDBUS_IS_ALIGNED8(bl->size))
+ return -EINVAL;
+ if (bl->n_hash < 1)
+ return -EINVAL;
+
+ *name = (char *)n;
+ *bloom = *bl;
+ return 0;
+}
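The "$UID-" prefix rule enforced in kdbus_bus_new() can be tried out in userspace. What follows is a minimal sketch and not part of the patch: has_uid_prefix() is a hypothetical helper name, and a plain unsigned int stands in for the kuid_t / from_kuid() conversion done in the kernel. It compares the full length of the prefix, so a name must start with the entire "<uid>-" string and not merely with its first character.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not from the patch): returns true if
 * @name starts with "<uid>-", mirroring the prefix rule that
 * kdbus_bus_new() applies to bus names.
 */
static bool has_uid_prefix(const char *name, unsigned int uid)
{
	char prefix[16];

	/* up to 10 digits, '-' and the trailing NUL fit into 16 bytes */
	snprintf(prefix, sizeof(prefix), "%u-", uid);

	/* compare the whole prefix, not just its first byte */
	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

Under these assumptions, has_uid_prefix("1000-system", 1000) is accepted, while "1001-system" is rejected for uid 1000 even though both names begin with the same digit.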
diff --git a/drivers/misc/kdbus/bus.h b/drivers/misc/kdbus/bus.h
new file mode 100644
index 000000000000..fd9d8431b886
--- /dev/null
+++ b/drivers/misc/kdbus/bus.h
@@ -0,0 +1,107 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_BUS_H
+#define __KDBUS_BUS_H
+
+#include <linux/hashtable.h>
+#include <linux/spinlock.h>
+#include <linux/kref.h>
+#include <linux/rwsem.h>
+
+#include "policy.h"
+#include "util.h"
+
+/**
+ * struct kdbus_bus - bus in a domain
+ * @kref: Reference count
+ * @disconnected: Invalidated data
+ * @uid_owner: The uid of the owner of the bus
+ * @domain: Domain of this bus
+ * @name: The bus name
+ * @id: ID of this bus in the domain
+ * @lock: Bus data lock
+ * @ep: Default "bus" endpoint
+ * @ep_seq_last: Last used endpoint id sequence number
+ * @conn_seq_last: Last used connection id sequence number
+ * @ep_list: Endpoints on this bus
+ * @bus_flags: Simple pass-through flags from userspace to userspace
+ * @name_registry: Name registry of this bus
+ * @domain_entry: Entry in domain
+ * @bloom: Bloom parameters
+ * @id128: Unique random 128 bit ID of this bus
+ * @user: Owner of the bus
+ * @policy_db: Policy database for this bus
+ * @notify_list: List of pending kernel-generated messages
+ * @notify_lock: Notification list lock
+ * @notify_flush_lock: Notification flushing lock
+ * @conn_rwlock: Read/Write lock for all lists of child connections
+ * @conn_hash: Map of connection IDs
+ * @monitors_list: Connections that monitor this bus
+ * @meta: Meta information about the bus creator
+ *
+ * A bus provides a "bus" endpoint / device node.
+ *
+ * A bus is created by opening the control node and issuing the
+ * KDBUS_CMD_BUS_MAKE ioctl. Closing this file immediately destroys
+ * the bus.
+ */
+struct kdbus_bus {
+ struct kref kref;
+ bool disconnected;
+ kuid_t uid_owner;
+ struct kdbus_domain *domain;
+ const char *name;
+ u64 id;
+ struct mutex lock;
+ struct kdbus_ep *ep;
+ u64 ep_seq_last;
+ atomic64_t conn_seq_last;
+ struct list_head ep_list;
+ u64 bus_flags;
+ struct kdbus_name_registry *name_registry;
+ struct list_head domain_entry;
+ struct kdbus_bloom_parameter bloom;
+ u8 id128[16];
+ struct kdbus_domain_user *user;
+ struct kdbus_policy_db policy_db;
+ struct list_head notify_list;
+ spinlock_t notify_lock;
+ struct mutex notify_flush_lock;
+
+ struct rw_semaphore conn_rwlock;
+ DECLARE_HASHTABLE(conn_hash, 8);
+ struct list_head monitors_list;
+
+ struct kdbus_meta *meta;
+};
+
+int kdbus_bus_make_user(const struct kdbus_cmd_make *make,
+ char **name, struct kdbus_bloom_parameter *bloom);
+int kdbus_bus_new(struct kdbus_domain *domain,
+ const struct kdbus_cmd_make *make,
+ const char *name,
+ const struct kdbus_bloom_parameter *bloom,
+ umode_t mode, kuid_t uid, kgid_t gid,
+ struct kdbus_bus **bus);
+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn,
+ struct kdbus_cmd_info *cmd_info);
+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
+void kdbus_bus_disconnect(struct kdbus_bus *bus);
+
+bool kdbus_bus_cred_is_privileged(const struct kdbus_bus *bus,
+ const struct cred *cred);
+bool kdbus_bus_uid_is_privileged(const struct kdbus_bus *bus);
+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
+#endif
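The bloom parameter checks that kdbus_bus_make_user() applies before a bus is created can be illustrated outside the kernel. This is a sketch under stated assumptions: struct example_bloom_parameter and example_bloom_check() are hypothetical stand-ins, and EXAMPLE_BLOOM_MAX_SIZE is a placeholder value chosen for illustration, since the real KDBUS_BUS_BLOOM_MAX_SIZE is defined elsewhere in the driver and is not shown in this part of the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder limit for illustration only; not the driver's real value. */
#define EXAMPLE_BLOOM_MAX_SIZE 4096

/* Stand-in for struct kdbus_bloom_parameter (only the two checked fields). */
struct example_bloom_parameter {
	uint64_t size;		/* size of the bloom filter in bytes */
	uint64_t n_hash;	/* number of hash functions used */
};

/*
 * Apply the same three checks as kdbus_bus_make_user():
 * size within [8, max], size a multiple of 8, at least one hash function.
 */
static int example_bloom_check(const struct example_bloom_parameter *bl)
{
	if (bl->size < 8 || bl->size > EXAMPLE_BLOOM_MAX_SIZE)
		return -EINVAL;
	if (bl->size % 8 != 0)
		return -EINVAL;
	if (bl->n_hash < 1)
		return -EINVAL;

	return 0;
}
```

With these stand-ins, a 64-byte filter with one hash function passes, whereas a 12-byte filter fails the alignment check and a filter with n_hash == 0 is rejected.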
diff --git a/drivers/misc/kdbus/domain.c b/drivers/misc/kdbus/domain.c
new file mode 100644
index 000000000000..eb2ce720f686
--- /dev/null
+++ b/drivers/misc/kdbus/domain.c
@@ -0,0 +1,477 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "domain.h"
+#include "handle.h"
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+/* previous domain id sequence number */
+static atomic64_t kdbus_domain_seq_last;
+
+/* kdbus sysfs subsystem */
+struct bus_type kdbus_subsys = {
+ .name = KBUILD_MODNAME,
+};
+
+/* control nodes are world accessible */
+static char *kdbus_devnode_control(struct device *dev, umode_t *mode,
+ kuid_t *uid, kgid_t *gid)
+{
+ struct kdbus_domain *domain = container_of(dev, struct kdbus_domain,
+ dev);
+
+ if (mode)
+ *mode = domain->mode;
+
+ return NULL;
+}
+
+static void kdbus_dev_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static struct device_type kdbus_devtype_control = {
+ .name = "control",
+ .release = kdbus_dev_release,
+ .devnode = kdbus_devnode_control,
+};
+
+/**
+ * kdbus_domain_ref() - take a domain reference
+ * @domain: Domain
+ *
+ * Return: the domain itself
+ */
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
+{
+ get_device(&domain->dev);
+ return domain;
+}
+
+/**
+ * kdbus_domain_disconnect() - invalidate a domain
+ * @domain: Domain
+ */
+void kdbus_domain_disconnect(struct kdbus_domain *domain)
+{
+ mutex_lock(&domain->lock);
+ if (domain->disconnected) {
+ mutex_unlock(&domain->lock);
+ return;
+ }
+ domain->disconnected = true;
+ mutex_unlock(&domain->lock);
+
+ /* disconnect from parent domain */
+ if (domain->parent) {
+ mutex_lock(&domain->parent->lock);
+ list_del(&domain->domain_entry);
+ mutex_unlock(&domain->parent->lock);
+ }
+
+ if (device_is_registered(&domain->dev))
+ device_del(&domain->dev);
+
+ kdbus_minor_set(domain->dev.devt, KDBUS_MINOR_CONTROL, NULL);
+
+ /* disconnect all sub-domains */
+ for (;;) {
+ struct kdbus_domain *dom;
+
+ mutex_lock(&domain->lock);
+ dom = list_first_entry_or_null(&domain->domain_list,
+ struct kdbus_domain,
+ domain_entry);
+ if (!dom) {
+ mutex_unlock(&domain->lock);
+ break;
+ }
+
+ /* take reference, release lock, disconnect without lock */
+ kdbus_domain_ref(dom);
+ mutex_unlock(&domain->lock);
+
+ kdbus_domain_disconnect(dom);
+ kdbus_domain_unref(dom);
+ }
+
+ /* disconnect all buses in this domain */
+ for (;;) {
+ struct kdbus_bus *bus;
+
+ mutex_lock(&domain->lock);
+ bus = list_first_entry_or_null(&domain->bus_list,
+ struct kdbus_bus,
+ domain_entry);
+ if (!bus) {
+ mutex_unlock(&domain->lock);
+ break;
+ }
+
+ /* take reference, release lock, disconnect without lock */
+ kdbus_bus_ref(bus);
+ mutex_unlock(&domain->lock);
+
+ kdbus_bus_disconnect(bus);
+ kdbus_bus_unref(bus);
+ }
+}
+
+static void __kdbus_domain_free(struct device *dev)
+{
+ struct kdbus_domain *domain = container_of(dev, struct kdbus_domain,
+ dev);
+
+ BUG_ON(!domain->disconnected);
+ BUG_ON(!list_empty(&domain->domain_list));
+ BUG_ON(!list_empty(&domain->bus_list));
+ BUG_ON(!hash_empty(domain->user_hash));
+
+ kdbus_minor_free(domain->dev.devt);
+ kdbus_domain_unref(domain->parent);
+ idr_destroy(&domain->user_idr);
+ kfree(domain->name);
+ kfree(domain->devpath);
+ kfree(domain);
+}
+
+/**
+ * kdbus_domain_unref() - drop a domain reference
+ * @domain: Domain
+ *
+ * When the last reference is dropped, the domain internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
+{
+ if (domain)
+ put_device(&domain->dev);
+ return NULL;
+}
+
+static struct kdbus_domain *kdbus_domain_find(struct kdbus_domain *parent,
+ const char *name)
+{
+ struct kdbus_domain *n;
+
+ list_for_each_entry(n, &parent->domain_list, domain_entry)
+ if (!strcmp(n->name, name))
+ return n;
+
+ return NULL;
+}
+
+/**
+ * kdbus_domain_new() - create a new domain
+ * @parent: Parent domain, NULL for initial one
+ * @name: Name of the domain, NULL for the initial one
+ * @mode: The access mode for the "control" device node
+ * @domain: The returned domain
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_domain_new(struct kdbus_domain *parent, const char *name,
+ umode_t mode, struct kdbus_domain **domain)
+{
+ struct kdbus_domain *d;
+ int ret;
+
+ BUG_ON(*domain);
+
+ if ((parent && !name) || (!parent && name))
+ return -EINVAL;
+
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d)
+ return -ENOMEM;
+
+ d->disconnected = true;
+ INIT_LIST_HEAD(&d->bus_list);
+ INIT_LIST_HEAD(&d->domain_list);
+ d->mode = mode;
+ mutex_init(&d->lock);
+ atomic64_set(&d->msg_seq_last, 0);
+ idr_init(&d->user_idr);
+
+ device_initialize(&d->dev);
+ d->dev.bus = &kdbus_subsys;
+ d->dev.type = &kdbus_devtype_control;
+ d->dev.release = __kdbus_domain_free;
+
+ /* compose name and path of base directory in /dev */
+ if (parent) {
+ d->devpath = kasprintf(GFP_KERNEL, "%s/domain/%s",
+ parent->devpath, name);
+ if (!d->devpath) {
+ ret = -ENOMEM;
+ goto exit_put;
+ }
+
+ d->name = kstrdup(name, GFP_KERNEL);
+ if (!d->name) {
+ ret = -ENOMEM;
+ goto exit_put;
+ }
+ } else {
+ /* initial domain */
+ d->devpath = kstrdup(KBUILD_MODNAME, GFP_KERNEL);
+ if (!d->devpath) {
+ ret = -ENOMEM;
+ goto exit_put;
+ }
+ }
+
+ ret = dev_set_name(&d->dev, "%s/control", d->devpath);
+ if (ret < 0)
+ goto exit_put;
+
+ ret = kdbus_minor_alloc(KDBUS_MINOR_CONTROL, NULL, &d->dev.devt);
+ if (ret < 0)
+ goto exit_put;
+
+ if (parent) {
+ /* lock order: parent domain -> domain */
+ mutex_lock(&parent->lock);
+
+ if (parent->disconnected) {
+ mutex_unlock(&parent->lock);
+ ret = -ESHUTDOWN;
+ goto exit_put;
+ }
+
+ if (kdbus_domain_find(parent, name)) {
+ mutex_unlock(&parent->lock);
+ ret = -EEXIST;
+ goto exit_put;
+ }
+
+ d->parent = kdbus_domain_ref(parent);
+ list_add_tail(&d->domain_entry, &parent->domain_list);
+ }
+
+ d->id = atomic64_inc_return(&kdbus_domain_seq_last);
+
+ /*
+ * We have to mark the domain as enabled _before_ running device_add().
+ * Otherwise, there's a race between UEVENT_ADD (generated by
+ * device_add()) and us enabling the minor.
+ * However, this means user-space can open the minor before we called
+ * device_add(). This is fine, as we never require the device to be
+ * registered, anyway.
+ */
+
+ d->disconnected = false;
+ kdbus_minor_set_control(d->dev.devt, d);
+
+ ret = device_add(&d->dev);
+
+ if (parent)
+ mutex_unlock(&parent->lock);
+
+ if (ret < 0) {
+ kdbus_domain_disconnect(d);
+ kdbus_domain_unref(d);
+ return ret;
+ }
+
+ *domain = d;
+ return 0;
+
+exit_put:
+ put_device(&d->dev);
+ return ret;
+}
+
+/**
+ * kdbus_domain_user_assign_id() - allocate ID and assign it to the
+ * domain user
+ * @domain: The domain of the user
+ * @user: The kdbus_domain_user object of the user
+ *
+ * Returns 0 if ID in [0, INT_MAX] is successfully assigned to the
+ * domain user. Negative errno on failure.
+ *
+ * The user index is used in arrays for accounting user quota in
+ * receiver queues.
+ *
+ * Caller must have the domain lock held and must ensure that the
+ * domain was not disconnected.
+ */
+static int kdbus_domain_user_assign_id(struct kdbus_domain *domain,
+ struct kdbus_domain_user *user)
+{
+ int ret;
+
+ /*
+ * Allocate the smallest possible index for this user; used
+ * in arrays for accounting user quota in receiver queues.
+ */
+ ret = idr_alloc(&domain->user_idr, user, 0, 0, GFP_KERNEL);
+ if (ret < 0)
+ return ret;
+
+ user->idr = ret;
+
+ return 0;
+}
+
+/**
+ * kdbus_domain_get_user_unlocked() - get a kdbus_domain_user object
+ * @domain: The domain of the user
+ * @uid: The uid of the user; INVALID_UID for an
+ * anonymous user like a custom endpoint
+ * @user: Pointer to a reference where the accounted
+ * domain user will be stored.
+ *
+ * Return: 0 on success, negative errno on failure.
+ *
+ * If there is a uid matching, then use the already accounted
+ * kdbus_domain_user, increment its reference counter and
+ * return it in the @user argument. Otherwise allocate a new one,
+ * link it into the domain and return it.
+ */
+int kdbus_domain_get_user_unlocked(struct kdbus_domain *domain,
+ kuid_t uid,
+ struct kdbus_domain_user **user)
+{
+ int ret;
+ struct kdbus_domain_user *tmp_user;
+ struct kdbus_domain_user *u = NULL;
+
+ BUG_ON(!mutex_is_locked(&domain->lock));
+
+ /* find uid and reference it */
+ if (uid_valid(uid)) {
+ hash_for_each_possible(domain->user_hash, tmp_user,
+ hentry, __kuid_val(uid)) {
+ if (!uid_eq(tmp_user->uid, uid))
+ continue;
+
+ u = kdbus_domain_user_ref(tmp_user);
+ goto out;
+ }
+ }
+
+ ret = -ENOMEM;
+ u = kzalloc(sizeof(*u), GFP_KERNEL);
+ if (!u)
+ return ret;
+
+ kref_init(&u->kref);
+ u->domain = kdbus_domain_ref(domain);
+ u->uid = uid;
+ atomic_set(&u->buses, 0);
+ atomic_set(&u->connections, 0);
+
+ /* Assign user ID and link into domain */
+ ret = kdbus_domain_user_assign_id(domain, u);
+ if (ret < 0)
+ goto exit_free;
+
+ /* UID hash map */
+ hash_add(domain->user_hash, &u->hentry, __kuid_val(u->uid));
+
+out:
+ *user = u;
+ return 0;
+
+exit_free:
+ kdbus_domain_unref(u->domain);
+ kfree(u);
+ return ret;
+}
+
+/**
+ * kdbus_domain_get_user() - get a kdbus_domain_user object
+ * @domain: The domain of the user
+ * @uid: The uid of the user; INVALID_UID for an
+ * anonymous user like a custom endpoint
+ * @user: Pointer to a reference where the accounted
+ * domain user will be stored.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_domain_get_user(struct kdbus_domain *domain,
+ kuid_t uid,
+ struct kdbus_domain_user **user)
+{
+ int ret = -ESHUTDOWN;
+
+ mutex_lock(&domain->lock);
+ if (!domain->disconnected)
+ ret = kdbus_domain_get_user_unlocked(domain, uid, user);
+ mutex_unlock(&domain->lock);
+
+ return ret;
+}
+
+static void __kdbus_domain_user_free(struct kref *kref)
+{
+ struct kdbus_domain_user *user =
+ container_of(kref, struct kdbus_domain_user, kref);
+
+ BUG_ON(atomic_read(&user->buses) > 0);
+ BUG_ON(atomic_read(&user->connections) > 0);
+
+ mutex_lock(&user->domain->lock);
+ idr_remove(&user->domain->user_idr, user->idr);
+ hash_del(&user->hentry);
+ mutex_unlock(&user->domain->lock);
+
+ kdbus_domain_unref(user->domain);
+ kfree(user);
+}
+
+/**
+ * kdbus_domain_user_ref() - take a domain user reference
+ * @u: User
+ *
+ * Return: the domain user itself
+ */
+struct kdbus_domain_user *kdbus_domain_user_ref(struct kdbus_domain_user *u)
+{
+ kref_get(&u->kref);
+ return u;
+}
+
+/**
+ * kdbus_domain_user_unref() - drop a domain user reference
+ * @u: User
+ *
+ * When the last reference is dropped, the domain internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u)
+{
+ if (u)
+ kref_put(&u->kref, __kdbus_domain_user_free);
+ return NULL;
+}
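The ref/unref helpers in this series share one convention: *_ref() returns the object itself and *_unref() accepts NULL and always returns NULL, which allows statements such as "bus->ep = kdbus_ep_unref(bus->ep);". The following userspace sketch shows that convention in isolation. It is an illustration only: struct example_obj and its helpers are hypothetical names, and a plain int counter replaces the kernel's kref, so it is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical refcounted object; a plain int replaces struct kref. */
struct example_obj {
	int refs;
	int *freed;	/* set to 1 when the object is released */
};

/* Allocate an object holding one initial reference, or NULL on failure. */
static struct example_obj *example_obj_new(int *freed)
{
	struct example_obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;

	o->refs = 1;
	o->freed = freed;
	return o;
}

/* Take a reference and return the object, like kdbus_bus_ref(). */
static struct example_obj *example_obj_ref(struct example_obj *o)
{
	o->refs++;
	return o;
}

/* Drop a reference; NULL is accepted and NULL is always returned. */
static struct example_obj *example_obj_unref(struct example_obj *o)
{
	if (o && --o->refs == 0) {
		*o->freed = 1;
		free(o);
	}
	return NULL;
}
```

Assigning the return value back to the pointer, as in "p = example_obj_unref(p);", clears the stale pointer in the same statement that drops the reference.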
diff --git a/drivers/misc/kdbus/domain.h b/drivers/misc/kdbus/domain.h
new file mode 100644
index 000000000000..f51cdb56e83a
--- /dev/null
+++ b/drivers/misc/kdbus/domain.h
@@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DOMAIN_H
+#define __KDBUS_DOMAIN_H
+
+#include <linux/device.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+
+/**
+ * struct kdbus_domain - domain for buses
+ * @dev: Underlying device
+ * @disconnected: Invalidated data
+ * @name: Name of the domain
+ * @devpath: /dev base directory path
+ * @parent: Parent domain
+ * @id: Global id of this domain
+ * @mode: Device node access mode
+ * @lock: Domain data lock
+ * @bus_seq_last: Last used bus id sequence number
+ * @msg_seq_last: Last used message id sequence number
+ * @domain_list: List of child domains
+ * @domain_entry: Entry in parent domain
+ * @bus_list: Buses in this domain
+ * @user_hash: Accounting of user resources
+ * @user_idr: Map of all users; smallest possible index
+ *
+ * A domain provides a "control" device node. Every domain has its
+ * own major number for its endpoint device nodes.
+ *
+ * The initial domain is created at initialization time, is unnamed and
+ * stays around forever.
+ *
+ * A domain is created by opening the "control" device node of the
+ * parent domain and issuing the KDBUS_CMD_DOMAIN_MAKE ioctl. Closing this
+ * file immediately destroys the entire domain.
+ */
+struct kdbus_domain {
+ struct device dev;
+ bool disconnected;
+ const char *name;
+ const char *devpath;
+ struct kdbus_domain *parent;
+ u64 id;
+ umode_t mode;
+ struct mutex lock;
+ u64 bus_seq_last;
+ atomic64_t msg_seq_last;
+ struct list_head domain_list;
+ struct list_head domain_entry;
+ struct list_head bus_list;
+ DECLARE_HASHTABLE(user_hash, 6);
+ struct idr user_idr;
+};
+
+/**
+ * struct kdbus_domain_user - resource accounting for users
+ * @kref: Reference counter
+ * @domain: Domain of the user
+ * @hentry: Entry in domain user map
+ * @idr: Smallest possible index number of all users
+ * @uid: UID of the user
+ * @buses: Number of buses the user has created
+ * @connections: Number of connections the user has created
+ */
+struct kdbus_domain_user {
+ struct kref kref;
+ struct kdbus_domain *domain;
+ struct hlist_node hentry;
+ unsigned int idr;
+ kuid_t uid;
+ atomic_t buses;
+ atomic_t connections;
+};
+
+extern struct bus_type kdbus_subsys;
+
+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
+void kdbus_domain_disconnect(struct kdbus_domain *domain);
+int kdbus_domain_new(struct kdbus_domain *parent, const char *name,
+ umode_t mode, struct kdbus_domain **domain);
+
+int kdbus_domain_get_user_unlocked(struct kdbus_domain *domain,
+ kuid_t uid,
+ struct kdbus_domain_user **user);
+
+int kdbus_domain_get_user(struct kdbus_domain *domain,
+ kuid_t uid,
+ struct kdbus_domain_user **user);
+
+struct kdbus_domain_user *kdbus_domain_user_ref(struct kdbus_domain_user *u);
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u);
+#endif
diff --git a/drivers/misc/kdbus/endpoint.c b/drivers/misc/kdbus/endpoint.c
new file mode 100644
index 000000000000..830436067c0c
--- /dev/null
+++ b/drivers/misc/kdbus/endpoint.c
@@ -0,0 +1,567 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "message.h"
+#include "policy.h"
+
+/* endpoints are by default owned by the bus owner */
+static char *kdbus_devnode_ep(struct device *dev, umode_t *mode,
+ kuid_t *uid, kgid_t *gid)
+{
+ struct kdbus_ep *ep = container_of(dev, struct kdbus_ep, dev);
+
+ if (mode)
+ *mode = ep->mode;
+ if (uid)
+ *uid = ep->uid;
+ if (gid)
+ *gid = ep->gid;
+
+ return NULL;
+}
+
+static void kdbus_dev_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static struct device_type kdbus_devtype_ep = {
+ .name = "endpoint",
+ .release = kdbus_dev_release,
+ .devnode = kdbus_devnode_ep,
+};
+
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
+{
+ get_device(&ep->dev);
+ return ep;
+}
+
+/**
+ * kdbus_ep_disconnect() - disconnect an endpoint
+ * @ep: Endpoint
+ */
+void kdbus_ep_disconnect(struct kdbus_ep *ep)
+{
+ mutex_lock(&ep->lock);
+ if (ep->disconnected) {
+ mutex_unlock(&ep->lock);
+ return;
+ }
+ ep->disconnected = true;
+ mutex_unlock(&ep->lock);
+
+ /* disconnect from bus */
+ mutex_lock(&ep->bus->lock);
+ list_del(&ep->bus_entry);
+ mutex_unlock(&ep->bus->lock);
+
+ if (device_is_registered(&ep->dev))
+ device_del(&ep->dev);
+
+ kdbus_minor_set(ep->dev.devt, KDBUS_MINOR_EP, NULL);
+
+ /* disconnect all connections to this endpoint */
+ for (;;) {
+ struct kdbus_conn *conn;
+
+ mutex_lock(&ep->lock);
+ conn = list_first_entry_or_null(&ep->conn_list,
+ struct kdbus_conn,
+ ep_entry);
+ if (!conn) {
+ mutex_unlock(&ep->lock);
+ break;
+ }
+
+ /* take reference, release lock, disconnect without lock */
+ kdbus_conn_ref(conn);
+ mutex_unlock(&ep->lock);
+
+ kdbus_conn_disconnect(conn, false);
+ kdbus_conn_unref(conn);
+ }
+}
+
+static void __kdbus_ep_free(struct device *dev)
+{
+ struct kdbus_ep *ep = container_of(dev, struct kdbus_ep, dev);
+
+ BUG_ON(!ep->disconnected);
+ BUG_ON(!list_empty(&ep->conn_list));
+
+ kdbus_policy_db_clear(&ep->policy_db);
+ kdbus_minor_free(ep->dev.devt);
+ kdbus_bus_unref(ep->bus);
+ kdbus_domain_user_unref(ep->user);
+ kfree(ep->name);
+ kfree(ep);
+}
+
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
+{
+ if (ep)
+ put_device(&ep->dev);
+ return NULL;
+}
+
+static struct kdbus_ep *kdbus_ep_find(struct kdbus_bus *bus, const char *name)
+{
+ struct kdbus_ep *e;
+
+ list_for_each_entry(e, &bus->ep_list, bus_entry)
+ if (!strcmp(e->name, name))
+ return e;
+
+ return NULL;
+}
+
+/**
+ * kdbus_ep_new() - create a new endpoint
+ * @bus: The bus this endpoint will be created for
+ * @name: The name of the endpoint
+ * @mode: The access mode for the device node
+ * @uid: The uid of the device node
+ * @gid: The gid of the device node
+ * @policy: Whether or not the endpoint should have a policy db
+ * @ep: Pointer to a reference where the new endpoint is stored
+ *
+ * This function will create a new endpoint with the given
+ * name and properties for a given bus.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+ umode_t mode, kuid_t uid, kgid_t gid,
+ bool policy, struct kdbus_ep **ep)
+{
+ struct kdbus_ep *e;
+ int ret;
+
+ e = kzalloc(sizeof(*e), GFP_KERNEL);
+ if (!e)
+ return -ENOMEM;
+
+ e->disconnected = true;
+ mutex_init(&e->lock);
+ INIT_LIST_HEAD(&e->conn_list);
+ kdbus_policy_db_init(&e->policy_db);
+ e->uid = uid;
+ e->gid = gid;
+ e->mode = mode;
+ e->has_policy = policy;
+
+ device_initialize(&e->dev);
+ e->dev.bus = &kdbus_subsys;
+ e->dev.type = &kdbus_devtype_ep;
+ e->dev.release = __kdbus_ep_free;
+
+ e->name = kstrdup(name, GFP_KERNEL);
+ if (!e->name) {
+ ret = -ENOMEM;
+ goto exit_put;
+ }
+
+ ret = dev_set_name(&e->dev, "%s/%s/%s",
+ bus->domain->devpath, bus->name, name);
+ if (ret < 0)
+ goto exit_put;
+
+ ret = kdbus_minor_alloc(KDBUS_MINOR_EP, NULL, &e->dev.devt);
+ if (ret < 0)
+ goto exit_put;
+
+ mutex_lock(&bus->lock);
+
+ if (bus->disconnected) {
+ mutex_unlock(&bus->lock);
+ ret = -ESHUTDOWN;
+ goto exit_put;
+ }
+
+ if (kdbus_ep_find(bus, name)) {
+ mutex_unlock(&bus->lock);
+ ret = -EEXIST;
+ goto exit_put;
+ }
+
+ e->bus = kdbus_bus_ref(bus);
+ list_add_tail(&e->bus_entry, &bus->ep_list);
+
+ e->id = ++bus->ep_seq_last;
+
+ /*
+ * Same as with domains, we have to mark it enabled _before_ running
+ * device_add() to avoid messing with state after UEVENT_ADD was sent.
+ */
+
+ e->disconnected = false;
+ kdbus_minor_set_ep(e->dev.devt, e);
+
+ ret = device_add(&e->dev);
+
+ mutex_unlock(&bus->lock);
+
+ if (ret < 0) {
+ kdbus_ep_disconnect(e);
+ kdbus_ep_unref(e);
+ return ret;
+ }
+
+ if (ep)
+ *ep = e;
+ return 0;
+
+exit_put:
+ put_device(&e->dev);
+ return ret;
+}
+
+/**
+ * kdbus_ep_policy_set() - set policy for an endpoint
+ * @ep: The endpoint
+ * @items: The kdbus items containing policy information
+ * @items_size: The total length of the items
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_ep_policy_set(struct kdbus_ep *ep,
+ const struct kdbus_item *items,
+ size_t items_size)
+{
+ return kdbus_policy_set(&ep->policy_db, items, items_size, 0, true, ep);
+}
+
+/**
+ * kdbus_ep_policy_check_see_access_unlocked() - verify a connection can see
+ * the passed name
+ * @ep: Endpoint to operate on
+ * @conn: Connection that lists names
+ * @name: Name that is tried to be listed
+ *
+ * This verifies that @conn is allowed to see the well-known name @name via the
+ * endpoint @ep.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_see_access_unlocked(struct kdbus_ep *ep,
+ struct kdbus_conn *conn,
+ const char *name)
+{
+ int ret;
+
+ /*
+ * Check policy, if the endpoint of the connection has a db.
+ * Note that policy DBs instantiated along with connections
+ * don't have SEE rules, so it's sufficient to check the
+ * endpoint's database.
+ *
+ * The lock for the policy db is held across all calls of
+ * kdbus_name_list_all(), so the entries in both writing
+ * and non-writing runs of kdbus_name_list_write() are the
+ * same.
+ */
+
+ if (!ep->has_policy)
+ return 0;
+
+ ret = kdbus_policy_check_see_access_unlocked(&ep->policy_db,
+ conn, name);
+
+ /* don't leak hints whether a name exists on a custom endpoint. */
+ if (ret == -EPERM)
+ return -ENOENT;
+
+ return ret;
+}
+
+/**
+ * kdbus_ep_policy_check_see_access() - verify a connection can see
+ * the passed name
+ * @ep: Endpoint to operate on
+ * @conn: Connection that lists names
+ * @name: Name that is tried to be listed
+ *
+ * This verifies that @conn is allowed to see the well-known name @name via the
+ * endpoint @ep.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_see_access(struct kdbus_ep *ep,
+ struct kdbus_conn *conn,
+ const char *name)
+{
+ int ret;
+
+ down_read(&ep->policy_db.entries_rwlock);
+ mutex_lock(&conn->lock);
+
+ ret = kdbus_ep_policy_check_see_access_unlocked(ep, conn, name);
+
+ mutex_unlock(&conn->lock);
+ up_read(&ep->policy_db.entries_rwlock);
+
+ return ret;
+}
+
+/**
+ * kdbus_ep_policy_check_notification() - verify a connection is allowed to see
+ * the name in a notification
+ * @ep: Endpoint to operate on
+ * @conn: Connection connected to the endpoint
+ * @kmsg: The message carrying the notification
+ *
+ * This function verifies that @conn is allowed to see the well-known name
+ * inside a name-change notification contained in @kmsg via the endpoint @ep.
+ * If @kmsg is not a notification for name changes, this function does nothing
+ * but return 0.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_notification(struct kdbus_ep *ep,
+ struct kdbus_conn *conn,
+ const struct kdbus_kmsg *kmsg)
+{
+ int ret = 0;
+
+ if (kmsg->msg.src_id != KDBUS_SRC_ID_KERNEL || !ep->has_policy)
+ return 0;
+
+ switch (kmsg->notify_type) {
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_REMOVE:
+ case KDBUS_ITEM_NAME_CHANGE:
+ ret = kdbus_ep_policy_check_see_access(ep, conn,
+ kmsg->notify_name);
+ break;
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+/**
+ * kdbus_ep_policy_check_src_names() - check whether a connection's endpoint
+ * is allowed to see any of another
+ * connection's currently owned names
+ * @ep: Endpoint to operate on
+ * @conn_src: Connection that owns the names
+ * @conn_dst: Destination connection to check credentials against
+ *
+ * This function checks whether @ep is allowed to see any of the names
+ * currently owned by @conn_src.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_src_names(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ struct kdbus_name_entry *e;
+ int ret = -ENOENT;
+
+ if (!ep->has_policy)
+ return 0;
+
+ down_read(&ep->policy_db.entries_rwlock);
+ mutex_lock(&conn_src->lock);
+
+ list_for_each_entry(e, &conn_src->names_list, conn_entry) {
+ ret = kdbus_ep_policy_check_see_access_unlocked(ep, conn_dst,
+ e->name);
+ if (ret == 0)
+ break;
+ }
+
+ mutex_unlock(&conn_src->lock);
+ up_read(&ep->policy_db.entries_rwlock);
+
+ return ret;
+}
+
+static int
+kdbus_custom_ep_check_talk_access(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ int ret;
+
+ if (!ep->has_policy)
+ return 0;
+
+ /* Custom endpoints have stricter policies */
+ ret = kdbus_policy_check_talk_access(&ep->policy_db,
+ conn_src, conn_dst);
+
+ /*
+ * Don't leak hints whether a name exists on a custom
+ * endpoint.
+ */
+ if (ret == -EPERM)
+ ret = -ENOENT;
+
+ return ret;
+}
+
+static bool
+kdbus_ep_has_default_talk_access(struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ if (kdbus_bus_cred_is_privileged(conn_src->bus, conn_src->cred))
+ return true;
+
+ if (uid_eq(conn_src->cred->fsuid, conn_dst->cred->uid))
+ return true;
+
+ return false;
+}
+
+/**
+ * kdbus_ep_policy_check_talk_access() - verify a connection can talk to the
+ * passed connection
+ * @ep: Endpoint to operate on
+ * @conn_src: Connection that tries to talk
+ * @conn_dst: Connection that is talked to
+ *
+ * This verifies that @conn_src is allowed to talk to @conn_dst via the
+ * endpoint @ep.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_talk_access(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ int ret;
+
+ /* First check the custom endpoint with its policies */
+ ret = kdbus_custom_ep_check_talk_access(ep, conn_src, conn_dst);
+ if (ret < 0)
+ return ret;
+
+ /* Then check if it satisfies the implicit policies */
+ if (kdbus_ep_has_default_talk_access(conn_src, conn_dst))
+ return 0;
+
+ /* Fallback to the default endpoint policy */
+ ret = kdbus_policy_check_talk_access(&ep->bus->policy_db,
+ conn_src, conn_dst);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * kdbus_ep_policy_check_broadcast() - verify a connection can send
+ * broadcast messages to the
+ * passed connection
+ * @ep: Endpoint to operate on
+ * @conn_src: Connection that tries to talk
+ * @conn_dst: Connection that is talked to
+ *
+ * This verifies that @conn_src is allowed to send broadcast messages
+ * to @conn_dst via the endpoint @ep.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_broadcast(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ int ret;
+
+ /* First check the custom endpoint with its policies */
+ ret = kdbus_custom_ep_check_talk_access(ep, conn_src, conn_dst);
+ if (ret < 0)
+ return ret;
+
+ /* Then check if it satisfies the implicit policies */
+ if (kdbus_ep_has_default_talk_access(conn_src, conn_dst))
+ return 0;
+
+ /*
+ * If conn_src owns names on the bus, and the conn_dst does
+ * not own any name, then allow conn_src to signal to
+ * conn_dst. Otherwise fallback and perform the bus policy
+ * check on conn_dst.
+ *
+ * This way we allow services to signal on the bus, and we
+ * block broadcasts directed to services that own names and
+ * do not want to receive these messages unless there is a
+ * policy entry to permit it. By this we try to follow the
+ * same logic used for unicast messages.
+ */
+ if (atomic_read(&conn_src->name_count) > 0 &&
+ atomic_read(&conn_dst->name_count) == 0)
+ return 0;
+
+ /* Fallback to the default endpoint policy */
+ ret = kdbus_policy_check_talk_access(&ep->bus->policy_db,
+ conn_src, conn_dst);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * kdbus_ep_policy_check_own_access() - verify a connection can own the passed
+ * name
+ * @ep: Endpoint to operate on
+ * @conn: Connection that acquires a name
+ * @name: Name that is about to be acquired
+ *
+ * This verifies that @conn is allowed to acquire the well-known name @name via
+ * the endpoint @ep.
+ *
+ * Return: 0 if allowed, negative error code if not.
+ */
+int kdbus_ep_policy_check_own_access(struct kdbus_ep *ep,
+ const struct kdbus_conn *conn,
+ const char *name)
+{
+ int ret;
+
+ if (ep->has_policy) {
+ ret = kdbus_policy_check_own_access(&ep->policy_db, conn, name);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (kdbus_bus_cred_is_privileged(conn->bus, conn->cred))
+ return 0;
+
+ ret = kdbus_policy_check_own_access(&ep->bus->policy_db, conn, name);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
diff --git a/drivers/misc/kdbus/endpoint.h b/drivers/misc/kdbus/endpoint.h
new file mode 100644
index 000000000000..19cb2d30d093
--- /dev/null
+++ b/drivers/misc/kdbus/endpoint.h
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ENDPOINT_H
+#define __KDBUS_ENDPOINT_H
+
+#include <linux/device.h>
+#include "limits.h"
+#include "names.h"
+#include "policy.h"
+#include "util.h"
+
+/*
+ * struct kdbus_ep - endpoint to access a bus
+ * @dev: Device
+ * @bus: Bus behind this endpoint
+ * @name: Name of the endpoint
+ * @id: ID of this endpoint on the bus
+ * @mode: File mode of this endpoint device node
+ * @uid: UID owning this endpoint
+ * @gid: GID owning this endpoint
+ * @conn_list: Connections of this endpoint
+ * @bus_entry: bus' endpoints
+ * @lock: Endpoint data lock
+ * @user: Custom endpoints account against an anonymous user
+ * @policy_db: Uploaded policy
+ * @disconnected: Invalidated data
+ * @has_policy: The policy-db is valid and should be used
+ *
+ * An endpoint offers access to a bus; the default device node name is "bus".
+ * Additional custom endpoints to the same bus can be created and they can
+ * carry their own policies/filters.
+ */
+struct kdbus_ep {
+ struct device dev;
+ struct kdbus_bus *bus;
+ const char *name;
+ u64 id;
+ umode_t mode;
+ kuid_t uid;
+ kgid_t gid;
+ struct list_head conn_list;
+ struct list_head bus_entry;
+ struct mutex lock;
+ struct kdbus_domain_user *user;
+ struct kdbus_policy_db policy_db;
+
+ bool disconnected : 1;
+ bool has_policy : 1;
+};
+
+int kdbus_ep_new(struct kdbus_bus *bus, const char *name,
+ umode_t mode, kuid_t uid, kgid_t gid,
+ bool policy, struct kdbus_ep **ep);
+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
+void kdbus_ep_disconnect(struct kdbus_ep *ep);
+int kdbus_ep_policy_set(struct kdbus_ep *ep,
+ const struct kdbus_item *items,
+ size_t items_size);
+
+int kdbus_ep_policy_check_see_access_unlocked(struct kdbus_ep *ep,
+ struct kdbus_conn *conn,
+ const char *name);
+int kdbus_ep_policy_check_see_access(struct kdbus_ep *ep,
+ struct kdbus_conn *conn,
+ const char *name);
+int kdbus_ep_policy_check_notification(struct kdbus_ep *ep,
+ struct kdbus_conn *conn,
+ const struct kdbus_kmsg *kmsg);
+int kdbus_ep_policy_check_src_names(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst);
+int kdbus_ep_policy_check_talk_access(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst);
+int kdbus_ep_policy_check_broadcast(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst);
+int kdbus_ep_policy_check_own_access(struct kdbus_ep *ep,
+ const struct kdbus_conn *conn,
+ const char *name);
+
+#endif
--
2.1.2
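
The order of checks in kdbus_ep_policy_check_talk_access() in the patch above can be modeled in a few lines of standalone C. This is a simplified sketch, not the driver code: `struct model_conn`, `struct model_ep` and the precomputed `*_policy_result` fields are assumptions that stand in for the real credential and policy database lookups.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kdbus structures. */
struct model_conn {
	unsigned int uid;	/* uid recorded for the connection */
	bool privileged;	/* stand-in for kdbus_bus_cred_is_privileged() */
};

struct model_ep {
	bool has_policy;	/* custom endpoint carrying its own policy db */
	int ep_policy_result;	/* endpoint db verdict: 0 or a negative errno */
	int bus_policy_result;	/* bus db verdict: 0 or a negative errno */
};

static int model_check_talk_access(const struct model_ep *ep,
				   const struct model_conn *src,
				   const struct model_conn *dst)
{
	assert(ep != NULL && src != NULL && dst != NULL);

	/* 1) A custom endpoint policy can only restrict; -EPERM is
	 *    reported as -ENOENT so that no hint about names leaks. */
	if (ep->has_policy && ep->ep_policy_result < 0)
		return ep->ep_policy_result == -EPERM ?
			-ENOENT : ep->ep_policy_result;

	/* 2) Implicit policy: privileged senders and same-uid peers. */
	if (src->privileged || src->uid == dst->uid)
		return 0;

	/* 3) Otherwise the bus-wide policy db decides. */
	return ep->bus_policy_result;
}
```

In this model a denying custom endpoint wins even when both peers share a uid, which mirrors the "custom endpoints have stricter policies" comment in the patch.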

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric W. Biederman
2014-10-30 04:00:48 UTC
Permalink
Greg Kroah-Hartman <***@linuxfoundation.org> writes:

The way capabilities are checked in this patch makes me very nervous.

We are not checking permissions at open time. Every other location
that calls capable() on file-like objects has been shown to be susceptible
to file descriptor passing attacks.
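
A minimal userspace model of the fd-passing concern, not kernel code: `struct task`, `struct handle` and the two `allowed_*` helpers are hypothetical names. They only illustrate the difference between checking the privilege of whoever issues the call and checking the privilege that was captured when the handle was opened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a task has one privilege bit. */
struct task {
	bool has_cap;
};

/* A handle remembers whether its opener was privileged. */
struct handle {
	bool opener_had_cap;
};

static struct handle model_open(const struct task *opener)
{
	struct handle h;

	assert(opener != NULL);
	h.opener_had_cap = opener->has_cap;
	return h;
}

/* Call-time check: the current caller decides, as capable() does. */
static bool allowed_call_time(const struct handle *h, const struct task *caller)
{
	assert(h != NULL && caller != NULL);
	return caller->has_cap;
}

/* Open-time check: only the privilege captured at open counts. */
static bool allowed_open_time(const struct handle *h, const struct task *caller)
{
	assert(h != NULL && caller != NULL);
	return h->opener_had_cap;
}
```

If an unprivileged task opens the handle and passes it to a privileged helper, `allowed_call_time()` grants the operation while `allowed_open_time()` refuses it. That is the confused-deputy situation an fd-passing attack relies on.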
Post by Greg Kroah-Hartman
See Documentation/kdbus.txt for more details.
---
diff --git a/drivers/misc/kdbus/bus.c b/drivers/misc/kdbus/bus.c
new file mode 100644
index 000000000000..6dcaf22f5d59
--- /dev/null
+++ b/drivers/misc/kdbus/bus.c
@@ -0,0 +1,450 @@
+/**
+ * kdbus_bus_cred_is_privileged() - check whether the given credentials in
+ * combination with the capabilities of the
+ * current thread are privileged on the bus
+ *
+ * Return: true if the credentials are privileged, otherwise false.
+ */
+bool kdbus_bus_cred_is_privileged(const struct kdbus_bus *bus,
+ const struct cred *cred)
+{
+ /* Capabilities are *ALWAYS* tested against the current thread, they're
+ * never remembered from conn-credentials. */
+ if (ns_capable(&init_user_ns, CAP_IPC_OWNER))
+ return true;
+
+ return uid_eq(bus->uid_owner, cred->fsuid);
+}
+
+/**
+ * kdbus_bus_uid_is_privileged() - check whether the current user is a
+ * privileged bus user
+ *
+ * Return: true if the current user has CAP_IPC_OWNER capabilities, or
+ * if it has the same UID as the user that created the bus. Otherwise,
+ * false is returned.
+ */
+bool kdbus_bus_uid_is_privileged(const struct kdbus_bus *bus)
+{
+ return kdbus_bus_cred_is_privileged(bus, current_cred());
+}
+/**
+ * kdbus_bus_new() - create a new bus
+ * details for the bus creation
+ *
+ * This function will allocate a new kdbus_bus and link it to the given
+ * domain.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_bus_new(struct kdbus_domain *domain,
+ const struct kdbus_cmd_make *make,
+ const char *name,
+ const struct kdbus_bloom_parameter *bloom,
+ umode_t mode, kuid_t uid, kgid_t gid,
+ struct kdbus_bus **bus)
+{
[snip]
Post by Greg Kroah-Hartman
+
+ if (!capable(CAP_IPC_OWNER) &&
+ atomic_inc_return(&b->user->buses) > KDBUS_USER_MAX_BUSES) {
+ atomic_dec(&b->user->buses);
+ ret = -EMFILE;
+ goto exit_unref_user_unlock;
+ }
+
Djalal Harouni
2014-10-30 09:59:08 UTC
Permalink
Post by Eric W. Biederman
The way capabilities are checked in this patch makes me very nervous.
We are not checking permissions at open time. Every other location
that calls capable() on file-like objects has been shown to be susceptible
to file descriptor passing attacks.
Yes, I do understand the concern, and it is valid for some cases, but we
can't apply it to the ioctl API, can we? Please see below:

Most (perhaps not all) of the current ioctls do not check for fd passing
attacks. If a privileged process does arbitrary ioctls on untrusted fds, we
are already owned... the dumb privileged process is the one to blame, right?


Example:
1) fs/ext4/ioctl.c:ext4_ioctl()
they have:
inode_owner_or_capable() + capable() checks

for all the restricted ioctl()

2) fs/xfs/xfs_ioctl.c:xfs_file_ioctl()
they have:
capable() checks

3) fs/btrfs/ioctl.c:btrfs_ioctl()
they have capable() + inode_owner_or_capable()

.. long list

These are sensible APIs and they do not care at all about fd passing,
so I don't think we should care either. Or perhaps I'm missing
something?


The capable() check is done as it is, and for inode_owner_or_capable() you
will notice that we followed the same logic and used it in our
kdbus_bus_uid_is_privileged() to stay safe and follow what other APIs are
doing.

Thank you for the comments!
--
Djalal Harouni
http://opendz.org
Eric W. Biederman
2014-10-30 12:16:10 UTC
Permalink
Post by Djalal Harouni
Post by Eric W. Biederman
The way capabilities are checked in this patch makes me very nervous.
We are not checking permissions at open time. Every other location
that calls capable() on file-like objects has been shown to be susceptible
to file descriptor passing attacks.
Yes, I do understand the concern, this is valid for some cases! but we
All (perhaps not all) the current ioctl do not check for fd passing
attacks! if a privileged do arbitrary ioctl on untrusted fds we are
already owned... the dumb privileged process is the one to blame, right?
1) fs/ext4/ioctl.c:ext4_ioctl()
inode_owner_or_capable() + capable() checks
for all the restricted ioctl()
2) fs/xfs/xfs_ioctl.c:xfs_file_ioctl()
capable() checks
3) fs/btrfs/ioctl.c:btrfs_ioctl()
they have capable() + inode_owner_or_capable()
... long list
These are sensible API and they do not care at all about fd passing,
so I don't think we should care either ?! or perhaps I'm missing
something ?
- It is an easy mistake to make.
- We have not performed extensive audits of the capable calls at this
time to verify that fd passing is safe.
- Unless it is egregious we are likely to grandfather the existing usage
in to avoid breaking userspace.

None of that is an excuse for kdbus to get it wrong once it has been
pointed out in review.
Post by Djalal Harouni
The capable() is done as it is, and for the inode_owner_or_capable() you
will notice that we followed the same logic and did use it in our
kdbus_bus_uid_is_privileged() to stay safe and follow what other API are
doing.
What others are doing makes it very hard to safely allow those
ioctls in a tightly sandboxed application, as it is unpredictable
what the sandboxed ioctl can do with the file descriptor.

Further, an application that calls setresuid at different times during
its execution will behave differently, which makes ioctls that do
not have consistent behavior after open time inappropriate for use in
userspace libraries.

Eric
Djalal Harouni
2014-10-30 14:57:27 UTC
Permalink
Post by Eric W. Biederman
Post by Djalal Harouni
Post by Eric W. Biederman
The way capabilities are checked in this patch makes me very nervous.
We are not checking permissions at open time. Every other location
that calls capable() on file-like objects has been shown to be susceptible
to file descriptor passing attacks.
Yes, I do understand the concern, this is valid for some cases! but we
All (perhaps not all) the current ioctl do not check for fd passing
attacks! if a privileged do arbitrary ioctl on untrusted fds we are
already owned... the dumb privileged process is the one to blame, right?
1) fs/ext4/ioctl.c:ext4_ioctl()
inode_owner_or_capable() + capable() checks
for all the restricted ioctl()
2) fs/xfs/xfs_ioctl.c:xfs_file_ioctl()
capable() checks
3) fs/btrfs/ioctl.c:btrfs_ioctl()
they have capable() + inode_owner_or_capable()
... long list
These are sensible API and they do not care at all about fd passing,
so I don't think we should care either ?! or perhaps I'm missing
something ?
- It is an easy mistake to make.
- We have not performed extensive audits of the capable calls at this
time to verify that fd passing is safe.
- Unless it is egregious we are likely to grandfather the existing usage
in to avoid breaking userspace.
None of that is an excuse for kdbus to get it wrong once it has been
pointed out in review.
Of course! But our goal here is not to produce some sort of new
capability checks or new security mechanisms in this field. We want to
follow what other APIs are doing and be consistent, so everyone who reads
the code can understand it: it is the standard API, the standard scheme
used in every crucial part of the kernel. If there is really some sort
of proven bug affecting these ioctl() APIs, say in ext4, btrfs or other
devices, then we need to follow and update, we have to!
Post by Eric W. Biederman
Post by Djalal Harouni
The capable() is done as it is, and for the inode_owner_or_capable() you
will notice that we followed the same logic and did use it in our
kdbus_bus_uid_is_privileged() to stay safe and follow what other API are
doing.
What others are doing makes it very hard to safely allow those
ioctls in a tightly sandboxed application, as it is unpredictable
what the sandboxed ioctl can do with the file descriptor.
Further, an application that calls setresuid at different times during
its execution will behave differently, which makes ioctls that do
not have consistent behavior after open time inappropriate for use in
userspace libraries.
We are consistent in our checks. You say that the application will
behave differently when it calls setresuid(); sure! If it changes its
creds and then regains them, of course it will behave differently. The
checks are there to make sure that setresuid() and the like work correctly
when the application changes its creds and calls in.

Thanks!
--
Djalal Harouni
http://opendz.org
Andy Lutomirski
2014-10-30 14:58:34 UTC
Permalink
Post by Djalal Harouni
Post by Eric W. Biederman
What others are doing makes it very hard to safely allow those
ioctls in a tightly sandboxed application, as it is unpredictable
what the sandboxed ioctl can do with the file descriptor.
Further, an application that calls setresuid at different times during
its execution will behave differently, which makes ioctls that do
not have consistent behavior after open time inappropriate for use in
userspace libraries.
We are consistent in our checks, you say that the application will
behave differently when it calls setresuid() sure! If it changes its
creds then regain of course it will behave differently! and the checks
are here to make sure that setresuid() and alike work correctly when the
application changes its creds and calls-in.
Except that it isn't consistent.

If I open a postgresql socket that wants me to be root and then I drop
privileges, I can keep talking to postgresql. This is a good thing,
because it means that I can keep talking to postgresql but I lose my
privilege to do other things.

The new kdbus model breaks this. If I start as root and drop
privileges to UID_PRIVSEP, then my attempts to communicate over
already-open connections shouldn't consider UID_PRIVSEP. In fact, they
shouldn't tell the other endpoints that UID_PRIVSEP exists at all
unless I've explicitly asked the kernel for this behavior.

I suggest reading up on the object capability model. Linux isn't one,
but large deviations (like kdbus') from an object capability model are
rarely a good thing.

--Andy
Djalal Harouni
2014-10-30 18:08:26 UTC
Permalink
Hi Andy,
Post by Andy Lutomirski
Post by Djalal Harouni
Post by Eric W. Biederman
What others are doing makes it very hard to safely allow those
ioctls in a tightly sandboxed application, as it is unpredictable
what the sandboxed ioctl can do with the file descriptor.
Further, an application that calls setresuid at different times during
its execution will behave differently, which makes ioctls that do
not have consistent behavior after open time inappropriate for use in
userspace libraries.
We are consistent in our checks, you say that the application will
behave differently when it calls setresuid() sure! If it changes its
creds then regain of course it will behave differently! and the checks
are here to make sure that setresuid() and alike work correctly when the
application changes its creds and calls-in.
Except that it isn't consistent.
If I open a postgresql socket that wants me to be root and then I drop
privileges, I can keep talking to postgresql. This is a good thing,
because it means that I can keep talking to postgresql but I lose my
privilege to do other things.
Yes, that's nice :-)

But here you are not talking about those capable() checks in ioctl();
you are referring to the send (talk) logic, which is a different
thing. But hey, we do not break that use case, we support it.
Post by Andy Lutomirski
The new kdbus model breaks this. If I start as root and drop
privileges to UID_PRIVSEP, then my attempts to communicate over
already-open connections shouldn't consider UID_PRIVSEP. In fact, they
shouldn't tell the other endpoints that UID_PRIVSEP exists at all
unless I've explicitly asked the kernel for this behavior.
Yes, but kdbus tries to follow D-Bus which is primarily an RPC system,
not just a stream of bytes.

So we really want to be able to perform real-time checks and authorise
method calls on the bus, and not just connections. I mean, yes, we do our
kdbus talk access checks on send (Talk) requests using the creds of the
connection at creation time, but on the other hand we also need to
deal with D-Bus method requests, which is the primary use case here.

So, this is similar to AF_UNIX sockets. For them there's SCM_CREDENTIALS
and SO_PEERCRED. The former uses the credentials at the time when
messages are sent, the latter uses the credentials at the time
when the connection was initially established. So to not break
things, we provide similar APIs for services:

1) To get the creds of the connection (when the connection was created).

2) To get the creds of the sender of the message at send time. This
is especially relevant for authorizing specific D-Bus method calls by
checking the creds of the caller, not of the one who created the kdbus
connection.
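
For comparison, the two AF_UNIX mechanisms mentioned above can be exercised on Linux with a short sketch. The helper names `peer_cred_at_connect()`, `send_byte()` and `recv_cred_at_send()` are made up for illustration; the socket options and control message types are the standard ones from unix(7). The sketch assumes the receiver has enabled SO_PASSCRED before the byte is sent, in which case the kernel attaches the sender's credentials to the message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* SO_PEERCRED: credentials fixed when the connection was set up. */
static int peer_cred_at_connect(int fd, struct ucred *out)
{
	socklen_t len = sizeof(*out);

	assert(out != NULL);
	return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len);
}

/* Send a single byte with no explicit ancillary data. */
static int send_byte(int fd)
{
	char c = 'x';

	return write(fd, &c, 1) == 1 ? 0 : -1;
}

/* SCM_CREDENTIALS: credentials attached to this particular message. */
static int recv_cred_at_send(int fd, struct ucred *out)
{
	char c;
	struct iovec iov = { .iov_base = &c, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	assert(out != NULL);
	if (recvmsg(fd, &msg, 0) != 1)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_CREDENTIALS) {
			memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```

A caller would create a socketpair(AF_UNIX, SOCK_STREAM, ...), set SO_PASSCRED on the receiving end, and then compare the result of `peer_cred_at_connect()` with what `recv_cred_at_send()` returns for a later message. The two differ only if the sender changed its credentials in between.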

Hmm, a comparison for this model can be made with the kernel's own
syscalls: the same way syscalls are authorized or refused based on the
caller's creds at the time of the syscall... you can't hide that
information. We follow the same semantics here, and use it to allow
inter-process method calls. Having an fd passed is just a detail; the
focus should be put on the method calls and how to perform the correct
access checks against real-time creds.

Thank you Andy for your comments!
--
Djalal Harouni
http://opendz.org
Simon McVittie
2014-10-30 18:46:59 UTC
Permalink
Post by Djalal Harouni
So, this is similar to AF_UNIX sockets. For them there's SCM_CREDENTIALS
and SO_PEERCRED. The former uses the credentials at the time when
messages are sent, the latter uses the credentials at the time
when the connection was initially established.
Please note that dbus-daemon, the reference implementation of D-Bus,
does not actually ever use SCM_CREDENTIALS on its AF_UNIX sockets. We
prefer to use Linux's SO_PEERCRED, or the platform's closest available
equivalent if there is one. dbus-daemon has methods (RPC calls) to get a
specified peer's uid, pid or LSM data (e.g. SELinux context), but those
methods return the value that was true when the connection was opened or
shortly afterwards, not the value that is true right now. I believe the
plan is that kdbus has ioctls that are equivalent to those RPC calls,
but without needing to wait for asynchronous socket events to get an answer.

The reason I say "or shortly afterwards" is that for the benefit of
platforms where the "best" credentials transfer mechanism behaves like
Linux SCM_CREDENTIALS, such as FreeBSD's SCM_CREDS, the beginning of a
D-Bus protocol stream is that the client sends '\0' to dbus-daemon,
accompanied by SCM_CREDS or whatever if the platform needs it. On Linux
we just send a plain '\0' with no out-of-band data at that point.
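
For contrast, the per-message mechanism can be sketched on Linux as follows. This is only an illustration of SCM_CREDENTIALS, which, as stated above, dbus-daemon does not use on Linux. The helper names and the cred_ctl union are hypothetical; the receiver has to enable SO_PASSCRED before the message arrives.

```c
#define _GNU_SOURCE		/* for struct ucred and SCM_CREDENTIALS */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one struct ucred. */
union cred_ctl {
	char buf[CMSG_SPACE(sizeof(struct ucred))];
	struct cmsghdr align;
};

/* Send one NUL byte on fd with the caller's current creds attached. */
static int send_nul_with_creds(int fd)
{
	char byte = '\0';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct ucred uc = { .pid = getpid(), .uid = geteuid(), .gid = getegid() };
	union cred_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_CREDENTIALS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(uc));
	memcpy(CMSG_DATA(cmsg), &uc, sizeof(uc));

	return sendmsg(fd, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one byte plus the creds that were validated at send time. */
static int recv_with_creds(int fd, struct ucred *out)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union cred_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(fd, &msg, 0) != 1)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_CREDENTIALS) {
			memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
			return 0;
		}
	}
	return -1;
}

/* Demo: returns the sender pid as seen by the receiver, or -1 on failure. */
static long demo_sender_pid(void)
{
	int sv[2], on = 1;
	struct ucred uc;
	long ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
	    send_nul_with_creds(sv[0]) == 0 &&
	    recv_with_creds(sv[1], &uc) == 0)
		ret = (long)uc.pid;

	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Unlike SO_PEERCRED, the credentials here describe the sender at the moment of sendmsg(), so a process that changed its uid after connecting would be reported with its new uid.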

The only out-of-band data we send with individual D-Bus RPC messages
later in the connection's lifetime is for fd-passing (SCM_RIGHTS).

It would be a perfectly reasonable feature request to have individual
D-Bus messages that contain proof that, *at the time of sending*, the
sender possessed a given uid/pid/gid/capability/whatever, but we do not
currently have that feature. It would be reasonable for kdbus to have
that feature even though traditional D-Bus doesn't, and it's entirely
possible that it is a feature that would be of benefit for e.g. systemd,
but it is not required for feature parity with traditional D-Bus over
AF_UNIX; it should be included in kdbus, or not, on its own merits.

S

Djalal Harouni
2014-11-05 19:59:42 UTC
Post by Simon McVittie
Post by Djalal Harouni
So, this is similar to AF_UNIX sockets. For them there's SCM_CREDENTIALS
and SO_PEERCRED. The former uses the credentials at the time messages
are sent, the latter uses the credentials at the time the connection
was initially established.
Please note that dbus-daemon, the reference implementation of D-Bus,
does not actually ever use SCM_CREDENTIALS on its AF_UNIX sockets. We
prefer to use Linux's SO_PEERCRED, or the platform's closest available
equivalent if there is one. dbus-daemon has methods (RPC calls) to get a
specified peer's uid, pid or LSM data (e.g. SELinux context), but those
methods return the value that was true when the connection was opened or
shortly afterwards, not the value that is true right now. I believe the
plan is that kdbus has ioctls that are equivalent to those RPC calls,
but without needing to wait for asynchronous socket events to get an answer.
Correct, we are compatible with SO_PEERCRED and every peer can request
that using the KDBUS_CMD_CONN_INFO ioctl(), with no need for asynchronous
operations.

Thank you Simon for your feedback!
--
Djalal Harouni
http://opendz.org
Andy Lutomirski
2014-10-30 20:38:18 UTC
Post by Djalal Harouni
Hi Andy,
Post by Andy Lutomirski
Post by Djalal Harouni
Post by Eric W. Biederman
What others are doing makes it very hard to safely allow those
ioctls in a tightly sandboxed application, as it is unpredictable
what the sandboxed ioctl can do with the file descriptor.
Further, an application that calls setresuid at different times during
its execution will behave differently. Which makes ioctls that do
not have consistent behavior after open time inappropriate for use in
userspace libraries.
We are consistent in our checks, you say that the application will
behave differently when it calls setresuid() sure! If it changes its
creds then regain of course it will behave differently! and the checks
are here to make sure that setresuid() and the like work correctly when the
application changes its creds and calls-in.
Except that it isn't consistent.
If I open a postgresql socket that wants me to be root and then I drop
privileges, I can keep talking to postgresql. This is a good thing,
because it means that I can keep talking to postgresql but I lose my
privilege to do other things.
Yes, that's nice :-)
But here you are not following about those capable() checks in ioctl(),
here you are referring to the send (talking) logic! which is another
thing. But hey we do not break that use case, we support it.
I don't understand. If postgres starts checking the credentials of
the sender of a query (behind the sender's back, because the current
kdbus code does it implicitly), then this *doesn't work*. Postgres
will see that the sender of the query has the wrong credentials, and
it will reject.
Post by Djalal Harouni
Post by Andy Lutomirski
The new kdbus model breaks this. If I start as root and drop
privileges to UID_PRIVSEP, then my attempts to communicate over
already-open connections shouldn't consider UID_PRIVSEP. In fact, they
shouldn't tell the other endpoints that UID_PRIVSEP exists at all
unless I've explicitly asked the kernel for this behavior.
Yes, but kdbus tries to follow D-Bus which is primarily an RPC system,
not just a stream of bytes.
So we really want to be able to perform real time checks and authorise
method calls on the bus, and not just connections. I mean yes we do our
kdbus talk access checks on send (Talk) requests using creds of the
connection at creation time, but on the other hand we also need and have
to deal with D-Bus method requests, which is the primary use case here.
I'm sympathetic to this use case (RPC authorization). I do think that
you can achieve it by making a new connection at the time at which
authorization is needed, since kdbus is supposed to be lightweight,
but that could be an annoying requirement.

*However*, if an RPC client is making an RPC call that needs
authorization, it should know that it needs authorization, and it
should know what authorization it needs, and it should send that
authorization explicitly.

If you need lots of data for logging, then have the process sending
the log message send that data to the logging daemon. If the logging
daemon gets less data than it wants, then it can indicate that in the
logs or return an error.

[snip]
Post by Djalal Harouni
2) To get the creds of the sender of the message during send time. This
is especially relevant for authorizing specific D-Bus method calls, by
checking the creds of the caller, not the one who created the kdbus
connection.
Please humor me here: can you describe, concretely, a case where
authorization of the principal issuing a method call is more correct
than authorization of the principal who connected to the object being
acted on?

I suspect that such examples are actually quite difficult to find.

--Andy
Andy Lutomirski
2014-10-30 22:01:27 UTC
On Thu, Oct 30, 2014 at 11:08 AM, Djalal Harouni
Post by Djalal Harouni
Hi Andy,
2) To get the creds of the sender of the message during send time. This
is especially relevant for authorizing specific D-Bus method calls, by
checking the creds of the caller, not the one who created the kdbus
connection.
Please humor me here: can you describe, concretely, a case where
authorization of the principal issuing a method call is more correct
than authorization of the principal who connected to the object being
acted on?
I suspect that such examples are actually quite difficult to find.
--Andy
The simple answer is that this is a misaimed question - you don't connect to
the object being acted on.
You connect to the _same bus_ as other clients have connected to. You then
act on objects they have made available on the bus.
You might have connected to a restricted endpoint, which provides a narrowed
view of the bus, but that's neither the same thing nor mandatory.
OK, but this doesn't answer the question. It is not an example of a
case where checking credentials at the time of connection to the bus
is actually worse from a security standpoint than checking for
credentials at the time of the send.

--Andy
Al Viro
2014-10-30 23:38:16 UTC
Post by Greg Kroah-Hartman
+static void __kdbus_domain_user_free(struct kref *kref)
+{
+ struct kdbus_domain_user *user =
+ container_of(kref, struct kdbus_domain_user, kref);
+
+ BUG_ON(atomic_read(&user->buses) > 0);
+ BUG_ON(atomic_read(&user->connections) > 0);
+
+ mutex_lock(&user->domain->lock);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Post by Greg Kroah-Hartman
+ idr_remove(&user->domain->user_idr, user->idr);
+ hash_del(&user->hentry);
^^^^^^^^^^^^^^^^^^^^^^^^
Post by Greg Kroah-Hartman
+ mutex_unlock(&user->domain->lock);
+
+ kdbus_domain_unref(user->domain);
+ kfree(user);
+}
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u)
+{
+ if (u)
+ kref_put(&u->kref, __kdbus_domain_user_free);
+ return NULL;
+}
If you remove an object from some search structures, taking the lock in
destructor is Too Fucking Late(tm). Somebody might have already found
that puppy and decided to pick it (all under that lock) just as we'd
got to that point in destructor and blocked there. Oops...

Normally I'd say "just use kref_put_mutex()", but this case is even worse.
Look:

	refcount is 1
A: kref_put_mutex()
	see that it's potential 1->0 crossing, need to take mutex
	mutex_lock()
B: kref_get()
	refcount is 2
A: got the sodding mutex
	atomic_dec_and_test
	refcount is 1 now
	OK, it's not 1->0, after all, just drop the mutex and bugger off
B: kref_put_mutex()
	see that it's potential 1->0 crossing, need to take mutex
	mutex_lock() blocks
A: mutex_unlock() lets B go
B: ... got it
	atomic_dec_and_test
	refcount is 0
	call the destructor now, which ends with
		kdbus_domain_unref(user->domain);
	... which just happens to be the last reference to ->domain
	... and frees it, along with ->domain->mutex

But what's to guarantee that A will be past the last point where mutex_unlock()
is looking at the mutex? Sure, it's hard to hit, but AFAICS it's not
impossible, especially if the following happens (assuming mutex-dec.h-style
mutices):

B: mutex_lock()
	atomic_dec_return -> -1
	__mutex_lock_slowpath()
A: mutex_unlock()
	atomic_inc_return -> 0
	get preempted
B: note that A has already incremented it to 0 and bugger off - we'd got it

and there we go, with A getting the timeslice back and deciding to call
__mutex_unlock_slowpath() when B has already freed the damn thing.

Basically, kref_put_mutex() is only usable when destructor callback cannot end
up freeing the mutex.
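
For the simple case where that condition holds, the "take the lock before the final decrement" idea can be sketched in userspace C with pthreads and C11 atomics. This is an analogue of what kref_put_mutex() is meant to provide, not kernel code; struct registry, struct obj and all function names are hypothetical, and the sketch assumes the registry (and therefore its lock) outlives every object, which is exactly what does not hold for kdbus_domain_user above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct obj;

/* Hypothetical registry; it must outlive every object it indexes. */
struct registry {
	pthread_mutex_t lock;
	struct obj *head;
};

struct obj {
	atomic_int refs;
	int key;
	struct obj *next;
	struct registry *reg;	/* plain pointer, not a counted reference */
};

/* Create an object with one reference and publish it in the registry. */
static struct obj *obj_new(struct registry *reg, int key)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refs, 1);
	o->key = key;
	o->reg = reg;
	pthread_mutex_lock(&reg->lock);
	o->next = reg->head;
	reg->head = o;
	pthread_mutex_unlock(&reg->lock);
	return o;
}

/* Lookup takes its reference under the lock, so it never sees refs == 0. */
static struct obj *obj_find(struct registry *reg, int key)
{
	struct obj *o;

	pthread_mutex_lock(&reg->lock);
	for (o = reg->head; o; o = o->next)
		if (o->key == key) {
			atomic_fetch_add(&o->refs, 1);
			break;
		}
	pthread_mutex_unlock(&reg->lock);
	return o;
}

/* Drop a reference; returns true if this call freed the object. */
static bool obj_put(struct obj *o)
{
	struct registry *reg = o->reg;
	struct obj **pp;
	int old = atomic_load(&o->refs);

	/* Fast path: anything but a possible 1->0 crossing needs no lock. */
	while (old > 1)
		if (atomic_compare_exchange_weak(&o->refs, &old, old - 1))
			return false;

	/* Slow path: take the lookup lock *before* the final decrement. */
	pthread_mutex_lock(&reg->lock);
	if (atomic_fetch_sub(&o->refs, 1) != 1) {
		/* a lookup took a new reference in the meantime */
		pthread_mutex_unlock(&reg->lock);
		return false;
	}
	for (pp = &reg->head; *pp; pp = &(*pp)->next)
		if (*pp == o) {
			*pp = o->next;
			break;
		}
	pthread_mutex_unlock(&reg->lock);
	free(o);
	return true;
}

/* Demo: returns 0 if the object is freed exactly on the last put. */
static int demo_put_under_lock(void)
{
	struct registry reg = { PTHREAD_MUTEX_INITIALIZER, NULL };
	struct obj *o = obj_new(&reg, 7);		/* refs == 1 */

	if (!o || obj_find(&reg, 7) != o)		/* refs == 2 */
		return -1;
	if (obj_put(o))					/* 2 -> 1: keep */
		return -2;
	if (!obj_put(o))				/* 1 -> 0: unlink, free */
		return -3;
	return obj_find(&reg, 7) ? -4 : 0;		/* no longer findable */
}
```

Because the object is unlinked in the same critical section in which its count reaches zero, a lookup can never return an object that is already on its way to being freed. The sketch deliberately keeps the lock outside anything the destructor might release.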

kref_get_unless_zero() might be a usable approach, but IMO the whole thing is
simply outside of kref applicability. Using it for something that needs to
deal with removal from search structures from the destructor callback is
already stretching the things; this one is far worse. kref isn't a universal
tool for expressing lifetime cycles. It works for really simple cases and
might eliminate some amount of boilerplate code. It's been greatly oversold
and overused, though...
Linus Torvalds
2014-10-31 18:00:13 UTC
Post by Al Viro
If you remove an object from some search structures, taking the lock in
destructor is Too Fucking Late(tm). Somebody might have already found
that puppy and decided to pick it (all under that lock) just as we'd
got to that point in destructor and blocked there. Oops...
Ugh, yes. This is a much too common anti-pattern.
Post by Al Viro
Normally I'd say "just use kref_put_mutex()", but this case is even worse.
Yeah the whole "release the structure the lock is in" is another one.

Both of these patterns have happened so many times that I'd love to
have some kind of automated tool to see them, but I suspect it is
*much* too complex to be easily checked for. The lock object debugging
we have only triggers for the case where the freeing actually happens
with the lock still held, which is too late and too hard-to-hit to be
a very useful check.

Linus
Al Viro
2014-10-31 19:56:18 UTC
Post by Linus Torvalds
Post by Al Viro
If you remove an object from some search structures, taking the lock in
destructor is Too Fucking Late(tm). Somebody might have already found
that puppy and decided to pick it (all under that lock) just as we'd
got to that point in destructor and blocked there. Oops...
Ugh, yes. This is a much too common anti-pattern.
*nod*

kref is badly oversold; it's fine for the case when there's no non-counting
references to the object, but in this kind of situation things get
subtle. And the main attraction of kref is the promise that it's easy to
use and avoids all the subtle issues. It's not specific to kref, of course -
the same breakage can and does occur when refcounting is open-coded; for
example, we used to have that kind of bug in dput(). What makes it really
unpleasant is that easy-to-use-don't-worry-it-takes-care-of-everything
assumption...

FWIW, here's another lovely instance (drivers/infiniband/hw/ipath/ipath_mmap.c):
static void ipath_vma_close(struct vm_area_struct *vma)
{
	struct ipath_mmap_info *ip = vma->vm_private_data;

	kref_put(&ip->ref, ipath_release_mmap_info);
}

void ipath_release_mmap_info(struct kref *ref)
{
	struct ipath_mmap_info *ip =
		container_of(ref, struct ipath_mmap_info, ref);
	struct ipath_ibdev *dev = to_idev(ip->context->device);

	spin_lock_irq(&dev->pending_lock);
	list_del(&ip->pending_mmaps);
	spin_unlock_irq(&dev->pending_lock);

	vfree(ip->obj);
	kfree(ip);
}

static void ipath_vma_open(struct vm_area_struct *vma)
{
	struct ipath_mmap_info *ip = vma->vm_private_data;

	kref_get(&ip->ref);
}

int ipath_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
{
	..
	spin_lock_irq(&dev->pending_lock);
	list_for_each_entry_safe(ip, pp, &dev->pending_mmaps,
				 pending_mmaps) {
		/* Only the creator is allowed to mmap the object */
		if (context != ip->context || (__u64) offset != ip->offset)
			continue;
		/* Don't allow a mmap larger than the object. */
		if (size > ip->size)
			break;

		list_del_init(&ip->pending_mmaps);
		spin_unlock_irq(&dev->pending_lock);

		ret = remap_vmalloc_range(vma, ip->obj, 0);
		if (ret)
			goto done;
		vma->vm_ops = &ipath_vm_ops;
		vma->vm_private_data = ip;
		ipath_vma_open(vma);
		goto done;
Post by Linus Torvalds
Post by Al Viro
Normally I'd say "just use kref_put_mutex()", but this case is even worse.
Yeah the whole "release the structure the lock is in" is another one.
Both of these patterns have happened so many times that I'd love to
have some kind of automated tool to see them, but I suspect it is
*much* too complex to be easily checked for.
Especially since here the lock is *not* in the object being fed to
destructor - it's in the object ours holds a reference to, with destructor
dropping that reference and _sometimes_ it ends up being the last one.

As for the first pattern... Frankly, git grep -n -w kref_put, followed
by quick look through the destructors for something like list_del catches
a _lot_ of that ;-/ Not every match is a bug of that sort, but the
fraction of false positives is not too high...
David Herrmann
2014-11-04 09:11:24 UTC
Hi Al
Post by Al Viro
Post by Greg Kroah-Hartman
+static void __kdbus_domain_user_free(struct kref *kref)
+{
+ struct kdbus_domain_user *user =
+ container_of(kref, struct kdbus_domain_user, kref);
+
+ BUG_ON(atomic_read(&user->buses) > 0);
+ BUG_ON(atomic_read(&user->connections) > 0);
+
+ mutex_lock(&user->domain->lock);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Post by Greg Kroah-Hartman
+ idr_remove(&user->domain->user_idr, user->idr);
+ hash_del(&user->hentry);
^^^^^^^^^^^^^^^^^^^^^^^^
Post by Greg Kroah-Hartman
+ mutex_unlock(&user->domain->lock);
+
+ kdbus_domain_unref(user->domain);
+ kfree(user);
+}
+struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u)
+{
+ if (u)
+ kref_put(&u->kref, __kdbus_domain_user_free);
+ return NULL;
+}
If you remove an object from some search structures, taking the lock in
destructor is Too Fucking Late(tm). Somebody might have already found
that puppy and decided to pick it (all under that lock) just as we'd
got to that point in destructor and blocked there. Oops...
Nice catch! I fixed it up via kref_get_unless_zero(). This has the
side-effect that there might be multiple domain_user objects for the
same user, but all but one will have ref==0. They don't carry any
valuable data in those cases, so we're fine. We will just end up using
the next one, or creating a new one.
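
The lookup side of that fix can be sketched in userspace C as follows. This is an illustration of the get-unless-zero idea with C11 atomics, not the actual kdbus change; struct user, ref_get_unless_zero() and user_lookup() are hypothetical names, and in the kernel the scan would presumably still run under the domain lock so that dying entries are not freed while being inspected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user accounting object. */
struct user {
	atomic_int refs;
	unsigned int uid;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_get_unless_zero(atomic_int *refs)
{
	int old = atomic_load(refs);

	while (old != 0)
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	return false;
}

/*
 * Return a referenced live entry for uid, or NULL. Entries whose count is
 * already zero are being destroyed and are skipped; the caller is then
 * expected to create a fresh entry.
 */
static struct user *user_lookup(struct user *table, size_t n, unsigned int uid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].uid == uid && ref_get_unless_zero(&table[i].refs))
			return &table[i];
	return NULL;
}

/* Demo: two entries for the same uid, the first one already dying. */
static int demo_lookup_index(void)
{
	struct user table[2];
	struct user *found;

	atomic_init(&table[0].refs, 0);		/* destructor in progress */
	table[0].uid = 1000;
	atomic_init(&table[1].refs, 1);		/* live entry */
	table[1].uid = 1000;

	found = user_lookup(table, 2, 1000);
	if (!found || atomic_load(&found->refs) != 2)
		return -1;
	return (int)(found - table);
}
```

The compare-and-exchange loop is what distinguishes this from a plain increment: a count that has reached zero is never raised again, so an object whose destructor is already running cannot be handed out.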

Thanks a lot!
David
Al Viro
2014-10-31 01:39:30 UTC
Post by Greg Kroah-Hartman
See Documentation/kdbus.txt for more details.
.. which has nothing whatsoever on object lifetime rules. Could you
folks please document that somewhere? What pins what, what state
transitions are possible, etc.

BTW, the calling conventions for your foo_new() are annoying - instead of
"return -E... or 0, storing the reference to new object in var parameter
passed as the last argument", could you please just return ERR_PTR(-E...)
on error, a pointer to new object on success and to hell with those
struct foo **foo in the argument lists?
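
A userspace sketch of the requested convention, for readers who have not seen it: the three helpers below imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() from include/linux/err.h, while struct foo and foo_new() are placeholders taken from the wording above, not kdbus functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitation of the kernel helpers in include/linux/err.h. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error codes live in the last MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct foo {
	int value;
};

/*
 * Instead of "int foo_new(int value, struct foo **out)", return the new
 * object directly and encode failures in the pointer itself.
 */
static struct foo *foo_new(int value)
{
	struct foo *f;

	if (value < 0)
		return ERR_PTR(-EINVAL);

	f = malloc(sizeof(*f));
	if (!f)
		return ERR_PTR(-ENOMEM);

	f->value = value;
	return f;
}
```

A caller then writes "f = foo_new(v); if (IS_ERR(f)) return PTR_ERR(f);" and never has to pass the address of a local pointer around.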
Daniel Mack
2014-10-31 09:55:41 UTC
Hi,
Post by Greg Kroah-Hartman
See Documentation/kdbus.txt for more details.
... which has nothing whatsoever on object lifetime rules.
True. That document only describes the external API exposed by the
driver towards userspace.
Could you
folks please document that somewhere? What pins what, what state
transitions are possible, etc.
Hmm, I'll see whether I can write something up.
BTW, the calling conventions for your foo_new() are annoying - instead of
"return -E... or 0, storing the reference to new object in var parameter
passed as the last argument", could you please just return ERR_PTR(-E...)
on error, a pointer to new object on success and to hell with those
struct foo **foo in the argument lists?
No problem at all. We'll change that around.


Thanks for your feedback, much appreciated!

Daniel

Greg Kroah-Hartman
2014-10-29 22:05:19 UTC
From: Daniel Mack <***@zonque.org>

Add the basic driver structure.

handle.c is the main ioctl command dispatcher that calls into other parts
of the driver.

main.c contains the code that creates the initial domain at startup, and
util.c has utility functions such as item iterators that are shared with
other files.

limits.h describes limits on things like maximum data structure sizes,
number of messages per user and suchlike. Some of the numbers currently
picked are rough ideas of what might be sufficient and are probably
rather conservative.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
Documentation/ioctl/ioctl-number.txt | 1 +
drivers/misc/kdbus/handle.c | 1221 ++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/handle.h | 46 ++
drivers/misc/kdbus/limits.h | 77 +++
drivers/misc/kdbus/main.c | 70 ++
drivers/misc/kdbus/util.c | 108 +++
drivers/misc/kdbus/util.h | 94 +++
7 files changed, 1617 insertions(+)
create mode 100644 drivers/misc/kdbus/handle.c
create mode 100644 drivers/misc/kdbus/handle.h
create mode 100644 drivers/misc/kdbus/limits.h
create mode 100644 drivers/misc/kdbus/main.c
create mode 100644 drivers/misc/kdbus/util.c
create mode 100644 drivers/misc/kdbus/util.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 8136e1fd30fd..54e091ebb862 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -292,6 +292,7 @@ Code Seq#(hex) Include File Comments
0x92 00-0F drivers/usb/mon/mon_bin.c
0x93 60-7F linux/auto_fs.h
0x94 all fs/btrfs/ioctl.h
+0x95 all uapi/linux/kdbus.h kdbus IPC driver
0x97 00-7F fs/ceph/ioctl.h Ceph file system
0x99 00-0F 537-Addinboard driver
<mailto:***@buks.ipn.de>
diff --git a/drivers/misc/kdbus/handle.c b/drivers/misc/kdbus/handle.c
new file mode 100644
index 000000000000..14810577e269
--- /dev/null
+++ b/drivers/misc/kdbus/handle.c
@@ -0,0 +1,1221 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "domain.h"
+#include "policy.h"
+
+/**
+ * enum kdbus_handle_type - type a handle can be of
+ * @_KDBUS_HANDLE_NULL: Uninitialized/invalid
+ * @KDBUS_HANDLE_CONTROL: New file descriptor of a control node
+ * @KDBUS_HANDLE_CONTROL_DOMAIN_OWNER: File descriptor to hold a domain
+ * @KDBUS_HANDLE_CONTROL_BUS_OWNER: File descriptor to hold a bus
+ * @KDBUS_HANDLE_EP: New file descriptor of a bus node
+ * @KDBUS_HANDLE_ENDPOINT_CONNECTED: A bus connection after HELLO
+ * @KDBUS_HANDLE_ENDPOINT_OWNER: File descriptor to hold an endpoint
+ */
+enum kdbus_handle_type {
+ _KDBUS_HANDLE_NULL,
+ KDBUS_HANDLE_CONTROL,
+ KDBUS_HANDLE_CONTROL_DOMAIN_OWNER,
+ KDBUS_HANDLE_CONTROL_BUS_OWNER,
+ KDBUS_HANDLE_EP,
+ KDBUS_HANDLE_ENDPOINT_CONNECTED,
+ KDBUS_HANDLE_ENDPOINT_OWNER,
+};
+
+/**
+ * struct kdbus_handle - a handle to the kdbus system
+ * @type: Type of this handle (KDBUS_HANDLE_*)
+ * @domain: Domain for this handle
+ * @meta: Cached connection creator's metadata/credentials
+ * @ep: The endpoint for this handle, in case @type is
+ * KDBUS_HANDLE_EP, KDBUS_HANDLE_ENDPOINT_OWNER or
+ * KDBUS_HANDLE_ENDPOINT_CONNECTED
+ * @ptr: Generic pointer used as alias for other members
+ * in the same union by kdbus_handle_transform()
+ * @domain_owner: The domain this handle owns, in case @type
+ * is KDBUS_HANDLE_CONTROL_DOMAIN_OWNER
+ * @bus_owner: The bus this handle owns, in case @type
+ * is KDBUS_HANDLE_CONTROL_BUS_OWNER
+ * @ep_owner: The endpoint this handle owns, in case @type
+ * is KDBUS_HANDLE_ENDPOINT_OWNER
+ * @conn: The connection this handle owns, in case @type
+ * is KDBUS_HANDLE_EP, after HELLO it is
+ * KDBUS_HANDLE_ENDPOINT_CONNECTED
+ */
+struct kdbus_handle {
+ enum kdbus_handle_type type;
+ struct kdbus_domain *domain;
+ struct kdbus_meta *meta;
+ struct kdbus_ep *ep;
+ union {
+ void *ptr;
+ struct kdbus_domain *domain_owner;
+ struct kdbus_bus *bus_owner;
+ struct kdbus_ep *ep_owner;
+ struct kdbus_conn *conn;
+ };
+};
+
+/* kdbus major */
+static unsigned int kdbus_major;
+
+/* map of minors to objects */
+static DEFINE_IDR(kdbus_minor_idr);
+
+/* kdbus minor lock */
+static DEFINE_SPINLOCK(kdbus_minor_lock);
+
+int kdbus_minor_init(void)
+{
+ int ret;
+
+ ret = __register_chrdev(0, 0, 0xfffff, KBUILD_MODNAME,
+ &kdbus_handle_ops);
+ if (ret < 0)
+ return ret;
+
+ kdbus_major = ret;
+ return 0;
+}
+
+void kdbus_minor_exit(void)
+{
+ __unregister_chrdev(kdbus_major, 0, 0xfffff, KBUILD_MODNAME);
+ idr_destroy(&kdbus_minor_idr);
+}
+
+static void *kdbus_minor_pack(enum kdbus_minor_type type, void *ptr)
+{
+ unsigned long p = (unsigned long)ptr;
+
+ BUILD_BUG_ON(KDBUS_MINOR_CNT > 4);
+
+ if (WARN_ON(p & 0x3UL || type >= KDBUS_MINOR_CNT))
+ return NULL;
+
+ return (void *)(p | (unsigned long)type);
+}
+
+static enum kdbus_minor_type kdbus_minor_unpack(void **ptr)
+{
+ unsigned long p = (unsigned long)*ptr;
+
+ *ptr = (void *)(p & ~0x3UL);
+ return p & 0x3UL;
+}
+
+static void kdbus_minor_ref(enum kdbus_minor_type type, void *ptr)
+{
+ if (ptr) {
+ switch (type) {
+ case KDBUS_MINOR_CONTROL:
+ kdbus_domain_ref(ptr);
+ break;
+ case KDBUS_MINOR_EP:
+ kdbus_ep_ref(ptr);
+ break;
+ default:
+ break;
+ }
+ }
+}
+
+static void kdbus_minor_unref(enum kdbus_minor_type type, void *ptr)
+{
+ if (ptr) {
+ switch (type) {
+ case KDBUS_MINOR_CONTROL:
+ kdbus_domain_unref(ptr);
+ break;
+ case KDBUS_MINOR_EP:
+ kdbus_ep_unref(ptr);
+ break;
+ default:
+ break;
+ }
+ }
+}
+
+/**
+ * kdbus_minor_alloc() - allocate a minor for a new kdbus device node
+ * @type: The type of device to allocate
+ * @ptr: The opaque pointer of the new device to store
+ * @out: Pointer to a dev_t for storing the result.
+ *
+ * Returns: 0 on success, in which case @out is set to the newly allocated
+ * device node.
+ */
+int kdbus_minor_alloc(enum kdbus_minor_type type, void *ptr, dev_t *out)
+{
+ int ret;
+
+ ptr = kdbus_minor_pack(type, ptr);
+
+ idr_preload(GFP_KERNEL);
+ spin_lock(&kdbus_minor_lock);
+ ret = idr_alloc(&kdbus_minor_idr, ptr, 0, 0, GFP_NOWAIT);
+ spin_unlock(&kdbus_minor_lock);
+ idr_preload_end();
+
+ if (ret < 0)
+ return ret;
+
+ *out = MKDEV(kdbus_major, ret);
+ return 0;
+}
+
+/**
+ * kdbus_minor_free() - free a minor of a kdbus device node
+ * @devt: The device node to remove
+ */
+void kdbus_minor_free(dev_t devt)
+{
+ unsigned int minor = MINOR(devt);
+
+ if (!devt)
+ return;
+
+ spin_lock(&kdbus_minor_lock);
+ idr_remove(&kdbus_minor_idr, minor);
+ spin_unlock(&kdbus_minor_lock);
+}
+
+/**
+ * kdbus_minor_set() - set an existing minor type of a kdbus device node
+ * @devt: The device node to update
+ * @type: New type to set
+ * @ptr: Associated pointer when node was initially registered
+ */
+void kdbus_minor_set(dev_t devt, enum kdbus_minor_type type, void *ptr)
+{
+ unsigned int minor = MINOR(devt);
+
+ ptr = kdbus_minor_pack(type, ptr);
+
+ spin_lock(&kdbus_minor_lock);
+ ptr = idr_replace(&kdbus_minor_idr, ptr, minor);
+ spin_unlock(&kdbus_minor_lock);
+}
+
+static int kdbus_minor_lookup(dev_t devt, void **out)
+{
+ unsigned int minor = MINOR(devt);
+ enum kdbus_minor_type type;
+ void *ptr;
+
+ spin_lock(&kdbus_minor_lock);
+ ptr = idr_find(&kdbus_minor_idr, minor);
+ type = kdbus_minor_unpack(&ptr);
+ kdbus_minor_ref(type, ptr);
+ spin_unlock(&kdbus_minor_lock);
+
+ if (!ptr)
+ return -ESHUTDOWN;
+
+ *out = ptr;
+ return type;
+}
+
+static int kdbus_handle_open(struct inode *inode, struct file *file)
+{
+ enum kdbus_minor_type minor_type;
+ struct kdbus_handle *handle;
+ void *minor_ptr;
+ int ret;
+
+ ret = kdbus_minor_lookup(inode->i_rdev, &minor_ptr);
+ if (ret < 0)
+ return ret;
+
+ minor_type = ret;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle) {
+ kdbus_minor_unref(minor_type, minor_ptr);
+ return -ENOMEM;
+ }
+
+ file->private_data = handle;
+
+ switch (minor_type) {
+ case KDBUS_MINOR_CONTROL:
+ handle->type = KDBUS_HANDLE_CONTROL;
+ handle->domain = minor_ptr;
+
+ break;
+
+ case KDBUS_MINOR_EP:
+ handle->type = KDBUS_HANDLE_EP;
+ handle->ep = minor_ptr;
+ handle->domain = kdbus_domain_ref(handle->ep->bus->domain);
+
+ /* cache the metadata/credentials of the creator */
+ ret = kdbus_meta_new(&handle->meta);
+ if (ret < 0)
+ goto exit_free;
+
+ ret = kdbus_meta_append(handle->meta, NULL, 0,
+ KDBUS_ATTACH_CREDS |
+ KDBUS_ATTACH_TID_COMM |
+ KDBUS_ATTACH_PID_COMM |
+ KDBUS_ATTACH_EXE |
+ KDBUS_ATTACH_CMDLINE |
+ KDBUS_ATTACH_CGROUP |
+ KDBUS_ATTACH_CAPS |
+ KDBUS_ATTACH_SECLABEL |
+ KDBUS_ATTACH_AUDIT);
+ if (ret < 0)
+ goto exit_free;
+
+ break;
+
+ default:
+ kdbus_minor_unref(minor_type, minor_ptr);
+ ret = -EINVAL;
+ goto exit_free;
+ }
+
+ return 0;
+
+exit_free:
+ kdbus_meta_free(handle->meta);
+ kdbus_ep_unref(handle->ep);
+ kdbus_domain_unref(handle->domain);
+ kfree(handle);
+ return ret;
+}
+
+static int kdbus_handle_release(struct inode *inode, struct file *file)
+{
+ struct kdbus_handle *handle = file->private_data;
+
+ switch (handle->type) {
+ case KDBUS_HANDLE_CONTROL_DOMAIN_OWNER:
+ kdbus_domain_disconnect(handle->domain_owner);
+ kdbus_domain_unref(handle->domain_owner);
+ break;
+
+ case KDBUS_HANDLE_CONTROL_BUS_OWNER:
+ kdbus_bus_disconnect(handle->bus_owner);
+ kdbus_bus_unref(handle->bus_owner);
+ break;
+
+ case KDBUS_HANDLE_ENDPOINT_OWNER:
+ kdbus_ep_disconnect(handle->ep_owner);
+ kdbus_ep_unref(handle->ep_owner);
+ break;
+
+ case KDBUS_HANDLE_ENDPOINT_CONNECTED:
+ kdbus_conn_disconnect(handle->conn, false);
+ kdbus_conn_unref(handle->conn);
+ break;
+
+ default:
+ break;
+ }
+
+ kdbus_meta_free(handle->meta);
+ kdbus_domain_unref(handle->domain);
+ kdbus_ep_unref(handle->ep);
+ kfree(handle);
+
+ return 0;
+}
+
+static int kdbus_copy_from_user(void *dest,
+ void __user *user_ptr,
+ size_t size)
+{
+ if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
+ return -EFAULT;
+
+ if (copy_from_user(dest, user_ptr, size))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kdbus_memdup_user(void __user *user_ptr,
+ void **out,
+ size_t size_min,
+ size_t size_max)
+{
+ void *ptr = NULL;
+ u64 size;
+ int ret;
+
+ ret = kdbus_copy_from_user(&size, user_ptr, sizeof(size));
+ if (ret < 0)
+ return ret;
+
+ if (size < size_min)
+ return -EINVAL;
+
+ if (size > size_max)
+ return -EMSGSIZE;
+
+ ptr = memdup_user(user_ptr, size);
+ if (IS_ERR(ptr))
+ return PTR_ERR(ptr);
+
+ *out = ptr;
+ return 0;
+}
+
+static int kdbus_handle_transform(struct kdbus_handle *handle,
+ enum kdbus_handle_type old_type,
+ enum kdbus_handle_type new_type,
+ void *ctx_ptr)
+{
+ int ret = -EBADFD;
+
+ /*
+ * This transforms a handle from one state into another. Only a single
+ * transformation is allowed per handle, and it must be one of:
+ * CONTROL -> CONTROL_DOMAIN_OWNER
+ * -> CONTROL_BUS_OWNER
+ * EP -> EP_CONNECTED
+ * -> EP_OWNER
+ *
+ * State transformations are protected by the domain-lock. If another
+ * transformation runs in parallel, we will fail and the caller has to
+ * revert any previous steps.
+ *
+ * We also update any context before we write the new type. Reads can
+ * now be sure that iff a specific non-entry type is set, the context
+ * is accessible, too (given appropriate read-barriers).
+ */
+
+ mutex_lock(&handle->domain->lock);
+ if (handle->type == old_type) {
+ handle->ptr = ctx_ptr;
+ /* make sure handle->XYZ is accessible before the type is set */
+ smp_wmb();
+ handle->type = new_type;
+ ret = 0;
+ }
+ mutex_unlock(&handle->domain->lock);
+
+ return ret;
+}
+
+/* kdbus control device commands */
+static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_bus *bus = NULL;
+ struct kdbus_cmd_make *make;
+ struct kdbus_domain *domain = NULL;
+ umode_t mode = 0600;
+ void *p = NULL;
+ int ret;
+
+ switch (cmd) {
+ case KDBUS_CMD_BUS_MAKE: {
+ kgid_t gid = KGIDT_INIT(0);
+ struct kdbus_bloom_parameter bloom;
+ char *name;
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*make),
+ KDBUS_MAKE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ make = p;
+
+ ret = kdbus_negotiate_flags(make, buf, typeof(*make),
+ KDBUS_MAKE_ACCESS_GROUP |
+ KDBUS_MAKE_ACCESS_WORLD);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(make->items,
+ KDBUS_ITEMS_SIZE(make, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_bus_make_user(make, &name, &bloom);
+ if (ret < 0)
+ break;
+
+ if (make->flags & KDBUS_MAKE_ACCESS_WORLD) {
+ mode = 0666;
+ } else if (make->flags & KDBUS_MAKE_ACCESS_GROUP) {
+ mode = 0660;
+ gid = current_fsgid();
+ }
+
+ ret = kdbus_bus_new(handle->domain, make, name, &bloom,
+ mode, current_fsuid(), gid, &bus);
+ if (ret < 0)
+ break;
+
+ /* turn the control fd into a new bus owner device */
+ ret = kdbus_handle_transform(handle, KDBUS_HANDLE_CONTROL,
+ KDBUS_HANDLE_CONTROL_BUS_OWNER,
+ bus);
+ if (ret < 0) {
+ kdbus_bus_disconnect(bus);
+ kdbus_bus_unref(bus);
+ break;
+ }
+
+ break;
+ }
+
+ case KDBUS_CMD_DOMAIN_MAKE: {
+ const char *name;
+
+ if (!capable(CAP_IPC_OWNER)) {
+ ret = -EPERM;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*make),
+ KDBUS_MAKE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ make = p;
+
+ ret = kdbus_negotiate_flags(make, buf, typeof(*make),
+ KDBUS_MAKE_ACCESS_GROUP |
+ KDBUS_MAKE_ACCESS_WORLD);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(make->items,
+ KDBUS_ITEMS_SIZE(make, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_get_str(make->items,
+ KDBUS_ITEMS_SIZE(make, items),
+ KDBUS_ITEM_MAKE_NAME, &name);
+ if (ret < 0)
+ break;
+
+ if (make->flags & KDBUS_MAKE_ACCESS_WORLD)
+ mode = 0666;
+
+ ret = kdbus_domain_new(handle->domain, name, mode, &domain);
+ if (ret < 0)
+ break;
+
+ /* turn the control fd into a new domain owner device */
+ ret = kdbus_handle_transform(handle, KDBUS_HANDLE_CONTROL,
+ KDBUS_HANDLE_CONTROL_DOMAIN_OWNER,
+ domain);
+ if (ret < 0) {
+ kdbus_domain_disconnect(domain);
+ kdbus_domain_unref(domain);
+ break;
+ }
+
+ break;
+ }
+
+ default:
+ ret = -ENOTTY;
+ break;
+ }
+
+ kfree(p);
+
+ return ret;
+}
+
+/* kdbus endpoint make commands */
+static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ void *p = NULL;
+ long ret = 0;
+
+ switch (cmd) {
+ case KDBUS_CMD_ENDPOINT_MAKE: {
+ struct kdbus_cmd_make *make;
+ umode_t mode = 0;
+ kgid_t gid = KGIDT_INIT(0);
+ const char *name;
+ struct kdbus_ep *ep;
+
+ /* creating custom endpoints is a privileged operation */
+ if (!kdbus_bus_uid_is_privileged(handle->ep->bus)) {
+ ret = -EPERM;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*make),
+ KDBUS_MAKE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ make = p;
+
+ ret = kdbus_negotiate_flags(make, buf, typeof(*make),
+ KDBUS_MAKE_ACCESS_GROUP |
+ KDBUS_MAKE_ACCESS_WORLD);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(make->items,
+ KDBUS_ITEMS_SIZE(make, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_get_str(make->items,
+ KDBUS_ITEMS_SIZE(make, items),
+ KDBUS_ITEM_MAKE_NAME, &name);
+ if (ret < 0)
+ break;
+
+ if (make->flags & KDBUS_MAKE_ACCESS_WORLD) {
+ mode = 0666;
+ } else if (make->flags & KDBUS_MAKE_ACCESS_GROUP) {
+ mode = 0660;
+ gid = current_fsgid();
+ }
+
+ /* custom endpoints always have a policy db */
+ ret = kdbus_ep_new(handle->ep->bus, name, mode,
+ current_fsuid(), gid, true, &ep);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_ep_policy_set(ep, make->items,
+ KDBUS_ITEMS_SIZE(make, items));
+ if (ret < 0) {
+ kdbus_ep_disconnect(ep);
+ kdbus_ep_unref(ep);
+ break;
+ }
+
+ /*
+ * Get an anonymous user to account messages against; custom
+ * endpoint users do not share the budget with the ordinary
+ * users created for a UID.
+ */
+ ret = kdbus_domain_get_user(handle->ep->bus->domain,
+ INVALID_UID, &ep->user);
+ if (ret < 0) {
+ kdbus_ep_disconnect(ep);
+ kdbus_ep_unref(ep);
+ break;
+ }
+
+ /* turn the ep fd into a new endpoint owner device */
+ ret = kdbus_handle_transform(handle, KDBUS_HANDLE_EP,
+ KDBUS_HANDLE_ENDPOINT_OWNER, ep);
+ if (ret < 0) {
+ kdbus_ep_disconnect(ep);
+ kdbus_ep_unref(ep);
+ break;
+ }
+
+ break;
+ }
+
+ case KDBUS_CMD_HELLO: {
+ struct kdbus_cmd_hello *hello;
+ struct kdbus_conn *conn = NULL;
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*hello),
+ KDBUS_HELLO_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ hello = p;
+
+ ret = kdbus_negotiate_flags(hello, buf, typeof(*hello),
+ KDBUS_HELLO_ACCEPT_FD |
+ KDBUS_HELLO_ACTIVATOR |
+ KDBUS_HELLO_POLICY_HOLDER |
+ KDBUS_HELLO_MONITOR);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(hello->items,
+ KDBUS_ITEMS_SIZE(hello, items));
+ if (ret < 0)
+ break;
+
+ if (hello->pool_size == 0 ||
+ !IS_ALIGNED(hello->pool_size, PAGE_SIZE)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ ret = kdbus_conn_new(handle->ep, hello, handle->meta, &conn);
+ if (ret < 0)
+ break;
+
+ /* turn the ep fd into a new connection */
+ ret = kdbus_handle_transform(handle, KDBUS_HANDLE_EP,
+ KDBUS_HANDLE_ENDPOINT_CONNECTED,
+ conn);
+ if (ret < 0) {
+ kdbus_conn_disconnect(conn, false);
+ kdbus_conn_unref(conn);
+ break;
+ }
+
+ if (copy_to_user(buf, hello, sizeof(*hello)))
+ ret = -EFAULT;
+
+ break;
+ }
+
+ default:
+ ret = -ENOTTY;
+ break;
+ }
+
+ kfree(p);
+
+ return ret;
+}
+
+/* kdbus endpoint commands for connected peers */
+static long kdbus_handle_ioctl_ep_connected(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_conn *conn = handle->conn;
+ void *p = NULL;
+ long ret = 0;
+
+ /*
+ * BYEBYE is special; we must not acquire a connection when
+ * calling into kdbus_conn_disconnect() or we will deadlock,
+ * because kdbus_conn_disconnect() will wait for all acquired
+ * references to be dropped.
+ */
+ if (cmd == KDBUS_CMD_BYEBYE) {
+ if (!kdbus_conn_is_connected(conn))
+ return -EOPNOTSUPP;
+
+ return kdbus_conn_disconnect(conn, true);
+ }
+
+ ret = kdbus_conn_acquire(conn);
+ if (ret < 0)
+ return ret;
+
+ switch (cmd) {
+ case KDBUS_CMD_NAME_ACQUIRE: {
+ /* acquire a well-known name */
+ struct kdbus_cmd_name *cmd_name;
+
+ if (!kdbus_conn_is_connected(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*cmd_name),
+ sizeof(*cmd_name) +
+ KDBUS_ITEM_HEADER_SIZE +
+ KDBUS_NAME_MAX_LEN + 1);
+ if (ret < 0)
+ break;
+
+ cmd_name = p;
+
+ ret = kdbus_negotiate_flags(cmd_name, buf, typeof(*cmd_name),
+ KDBUS_NAME_REPLACE_EXISTING |
+ KDBUS_NAME_ALLOW_REPLACEMENT |
+ KDBUS_NAME_QUEUE);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_name->items,
+ KDBUS_ITEMS_SIZE(cmd_name, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_cmd_name_acquire(conn->bus->name_registry, conn, p);
+ if (ret < 0)
+ break;
+
+ /* return flags to the caller */
+ if (copy_to_user(buf, p, cmd_name->size))
+ ret = -EFAULT;
+
+ break;
+ }
+
+ case KDBUS_CMD_NAME_RELEASE: {
+ /* release a well-known name */
+ struct kdbus_cmd_name *cmd_name;
+
+ if (!kdbus_conn_is_connected(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*cmd_name),
+ sizeof(*cmd_name) +
+ KDBUS_ITEM_HEADER_SIZE +
+ KDBUS_NAME_MAX_LEN + 1);
+ if (ret < 0)
+ break;
+
+ cmd_name = p;
+
+ ret = kdbus_negotiate_flags(cmd_name, buf, typeof(*cmd_name),
+ 0);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_name->items,
+ KDBUS_ITEMS_SIZE(cmd_name, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_cmd_name_release(conn->bus->name_registry, conn, p);
+ break;
+ }
+
+ case KDBUS_CMD_NAME_LIST: {
+ struct kdbus_cmd_name_list cmd_list;
+
+ /* query current IDs and names */
+ if (kdbus_copy_from_user(&cmd_list, buf, sizeof(cmd_list))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ ret = kdbus_negotiate_flags(&cmd_list, buf, typeof(cmd_list),
+ KDBUS_NAME_LIST_UNIQUE |
+ KDBUS_NAME_LIST_NAMES |
+ KDBUS_NAME_LIST_ACTIVATORS |
+ KDBUS_NAME_LIST_QUEUED);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_cmd_name_list(conn->bus->name_registry,
+ conn, &cmd_list);
+ if (ret < 0)
+ break;
+
+ /* return allocated data */
+ if (kdbus_offset_set_user(&cmd_list.offset, buf,
+ struct kdbus_cmd_name_list))
+ ret = -EFAULT;
+
+ break;
+ }
+
+ case KDBUS_CMD_CONN_INFO:
+ case KDBUS_CMD_BUS_CREATOR_INFO: {
+ struct kdbus_cmd_info *cmd_info;
+
+ /* return the properties of a connection */
+ ret = kdbus_memdup_user(buf, &p, sizeof(*cmd_info),
+ sizeof(*cmd_info) +
+ KDBUS_NAME_MAX_LEN + 1);
+ if (ret < 0)
+ break;
+
+ cmd_info = p;
+
+ ret = kdbus_negotiate_flags(cmd_info, buf, typeof(*cmd_info),
+ _KDBUS_ATTACH_ALL);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_info->items,
+ KDBUS_ITEMS_SIZE(cmd_info, items));
+ if (ret < 0)
+ break;
+
+ if (cmd == KDBUS_CMD_CONN_INFO)
+ ret = kdbus_cmd_info(conn, cmd_info);
+ else
+ ret = kdbus_cmd_bus_creator_info(conn, cmd_info);
+
+ if (ret < 0)
+ break;
+
+ if (kdbus_offset_set_user(&cmd_info->offset, buf,
+ struct kdbus_cmd_info))
+ ret = -EFAULT;
+
+ break;
+ }
+
+ case KDBUS_CMD_CONN_UPDATE: {
+ /* update the properties of a connection */
+ struct kdbus_cmd_update *cmd_update;
+
+ if (!kdbus_conn_is_connected(conn) &&
+ !kdbus_conn_is_policy_holder(conn) &&
+ !kdbus_conn_is_monitor(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*cmd_update),
+ KDBUS_UPDATE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ cmd_update = p;
+
+ ret = kdbus_negotiate_flags(cmd_update, buf,
+ typeof(*cmd_update), 0);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_update->items,
+ KDBUS_ITEMS_SIZE(cmd_update, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_cmd_conn_update(conn, p);
+ break;
+ }
+
+ case KDBUS_CMD_MATCH_ADD: {
+ /* subscribe to/filter for broadcast messages */
+ struct kdbus_cmd_match *cmd_match;
+
+ if (!kdbus_conn_is_connected(conn) &&
+ !kdbus_conn_is_monitor(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*cmd_match),
+ KDBUS_MATCH_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ cmd_match = p;
+
+ ret = kdbus_negotiate_flags(cmd_match, buf, typeof(*cmd_match),
+ KDBUS_MATCH_REPLACE);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_match->items,
+ KDBUS_ITEMS_SIZE(cmd_match, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_match_db_add(conn, cmd_match);
+ break;
+ }
+
+ case KDBUS_CMD_MATCH_REMOVE: {
+ /* unsubscribe from broadcast messages */
+ struct kdbus_cmd_match *cmd_match;
+
+ if (!kdbus_conn_is_connected(conn) &&
+ !kdbus_conn_is_monitor(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p,
+ sizeof(*cmd_match),
+ sizeof(*cmd_match));
+ if (ret < 0)
+ break;
+
+ cmd_match = p;
+
+ ret = kdbus_negotiate_flags(cmd_match, buf, typeof(*cmd_match),
+ 0);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_match->items,
+ KDBUS_ITEMS_SIZE(cmd_match, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_match_db_remove(conn, p);
+ break;
+ }
+
+ case KDBUS_CMD_MSG_SEND: {
+ /* submit a message which will be queued in the receiver */
+ struct kdbus_kmsg *kmsg = NULL;
+
+ if (!kdbus_conn_is_connected(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_kmsg_new_from_user(conn, buf, &kmsg);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_conn_kmsg_send(conn->ep, conn, kmsg);
+ if (ret < 0) {
+ kdbus_kmsg_free(kmsg);
+ break;
+ }
+
+ /* store the offset of the reply back to userspace */
+ if (kmsg->msg.flags & KDBUS_MSG_FLAGS_SYNC_REPLY) {
+ struct kdbus_msg __user *msg = buf;
+
+ if (copy_to_user(&msg->offset_reply,
+ &kmsg->msg.offset_reply,
+ sizeof(msg->offset_reply)))
+ ret = -EFAULT;
+ }
+
+ kdbus_kmsg_free(kmsg);
+ break;
+ }
+
+ case KDBUS_CMD_MSG_RECV: {
+ struct kdbus_cmd_recv cmd_recv;
+
+ if (!kdbus_conn_is_connected(conn) &&
+ !kdbus_conn_is_monitor(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ ret = kdbus_copy_from_user(&cmd_recv, buf, sizeof(cmd_recv));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_negotiate_flags(&cmd_recv, buf, typeof(cmd_recv),
+ KDBUS_RECV_PEEK | KDBUS_RECV_DROP |
+ KDBUS_RECV_USE_PRIORITY);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_cmd_msg_recv(conn, &cmd_recv);
+ if (ret < 0)
+ break;
+
+ /* return the address of the next message in the pool */
+ if (kdbus_offset_set_user(&cmd_recv.offset, buf,
+ struct kdbus_cmd_recv))
+ ret = -EFAULT;
+
+ break;
+ }
+
+ case KDBUS_CMD_MSG_CANCEL: {
+ struct kdbus_cmd_cancel cmd_cancel;
+
+ if (!kdbus_conn_is_connected(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ /* cancel sync message send requests by cookie */
+ ret = kdbus_copy_from_user(&cmd_cancel, buf,
+ sizeof(cmd_cancel));
+ if (ret < 0)
+ break;
+
+ if (cmd_cancel.flags != 0)
+ ret = -EOPNOTSUPP; /* must not return: conn is still acquired */
+ else
+ ret = kdbus_cmd_msg_cancel(conn, cmd_cancel.cookie);
+ break;
+ }
+
+ case KDBUS_CMD_FREE: {
+ struct kdbus_cmd_free cmd_free;
+
+ if (!kdbus_conn_is_connected(conn) &&
+ !kdbus_conn_is_monitor(conn)) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
+
+ /* free the memory used in the receiver's pool */
+ ret = kdbus_copy_from_user(&cmd_free, buf, sizeof(cmd_free));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_negotiate_flags(&cmd_free, buf, typeof(cmd_free),
+ 0);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_pool_release_offset(conn->pool, cmd_free.offset);
+ break;
+ }
+
+ default:
+ ret = -ENOTTY;
+ break;
+ }
+
+ kdbus_conn_release(conn);
+ kfree(p);
+ return ret;
+}
+
+/* kdbus endpoint commands for endpoint owners */
+static long kdbus_handle_ioctl_ep_owner(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_ep *ep = handle->ep_owner;
+ void *p = NULL;
+ long ret = 0;
+
+ switch (cmd) {
+ case KDBUS_CMD_ENDPOINT_UPDATE: {
+ struct kdbus_cmd_update *cmd_update;
+
+ /* update the properties of a custom endpoint */
+ ret = kdbus_memdup_user(buf, &p, sizeof(*cmd_update),
+ KDBUS_UPDATE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ cmd_update = p;
+
+ ret = kdbus_negotiate_flags(cmd_update, buf,
+ typeof(*cmd_update), 0);
+ if (ret < 0)
+ break;
+
+ ret = kdbus_items_validate(cmd_update->items,
+ KDBUS_ITEMS_SIZE(cmd_update, items));
+ if (ret < 0)
+ break;
+
+ ret = kdbus_ep_policy_set(ep, cmd_update->items,
+ KDBUS_ITEMS_SIZE(cmd_update, items));
+ break;
+ }
+
+ default:
+ ret = -ENOTTY;
+ break;
+ }
+
+ kfree(p);
+ return ret;
+}
+
+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kdbus_handle *handle = file->private_data;
+ void __user *argp = (void __user *)arg;
+ enum kdbus_handle_type type = handle->type;
+
+ /* make sure all handle fields are set if handle->type is */
+ smp_rmb();
+
+ switch (type) {
+ case KDBUS_HANDLE_CONTROL:
+ return kdbus_handle_ioctl_control(file, cmd, argp);
+
+ case KDBUS_HANDLE_EP:
+ return kdbus_handle_ioctl_ep(file, cmd, argp);
+
+ case KDBUS_HANDLE_ENDPOINT_CONNECTED:
+ return kdbus_handle_ioctl_ep_connected(file, cmd, argp);
+
+ case KDBUS_HANDLE_ENDPOINT_OWNER:
+ return kdbus_handle_ioctl_ep_owner(file, cmd, argp);
+
+ default:
+ return -EBADFD;
+ }
+}
+
+static unsigned int kdbus_handle_poll(struct file *file,
+ struct poll_table_struct *wait)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_conn *conn;
+ unsigned int mask = POLLOUT | POLLWRNORM;
+
+ /* Only a connected endpoint can read/write data */
+ if (handle->type != KDBUS_HANDLE_ENDPOINT_CONNECTED)
+ return POLLERR | POLLHUP;
+
+ /* make sure handle->conn is set if handle->type is */
+ smp_rmb();
+ conn = handle->conn;
+
+ poll_wait(file, &conn->wait, wait);
+
+ mutex_lock(&conn->lock);
+ if (!kdbus_conn_active(conn))
+ mask = POLLERR | POLLHUP;
+ else if (!list_empty(&conn->queue.msg_list))
+ mask |= POLLIN | POLLRDNORM;
+ mutex_unlock(&conn->lock);
+
+ return mask;
+}
+
+static int kdbus_handle_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct kdbus_handle *handle = file->private_data;
+
+ if (handle->type != KDBUS_HANDLE_ENDPOINT_CONNECTED)
+ return -EPERM;
+
+ /* make sure handle->conn is set if handle->type is */
+ smp_rmb();
+
+ return kdbus_pool_mmap(handle->conn->pool, vma);
+}
+
+const struct file_operations kdbus_handle_ops = {
+ .owner = THIS_MODULE,
+ .open = kdbus_handle_open,
+ .release = kdbus_handle_release,
+ .poll = kdbus_handle_poll,
+ .llseek = noop_llseek,
+ .unlocked_ioctl = kdbus_handle_ioctl,
+ .mmap = kdbus_handle_mmap,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = kdbus_handle_ioctl,
+#endif
+};
diff --git a/drivers/misc/kdbus/handle.h b/drivers/misc/kdbus/handle.h
new file mode 100644
index 000000000000..0e8e9a50aeb1
--- /dev/null
+++ b/drivers/misc/kdbus/handle.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_HANDLE_H
+#define __KDBUS_HANDLE_H
+
+struct kdbus_domain;
+struct kdbus_ep;
+
+extern const struct file_operations kdbus_handle_ops;
+
+enum kdbus_minor_type {
+ KDBUS_MINOR_CONTROL,
+ KDBUS_MINOR_EP,
+ KDBUS_MINOR_CNT
+};
+
+int kdbus_minor_init(void);
+void kdbus_minor_exit(void);
+int kdbus_minor_alloc(enum kdbus_minor_type type, void *ptr, dev_t *out);
+void kdbus_minor_free(dev_t devt);
+void kdbus_minor_set(dev_t devt, enum kdbus_minor_type type, void *ptr);
+
+/* type-safe kdbus_minor_set() */
+static inline void kdbus_minor_set_control(dev_t devt, struct kdbus_domain *d)
+{
+ kdbus_minor_set(devt, KDBUS_MINOR_CONTROL, d);
+}
+
+/* type-safe kdbus_minor_set() */
+static inline void kdbus_minor_set_ep(dev_t devt, struct kdbus_ep *e)
+{
+ kdbus_minor_set(devt, KDBUS_MINOR_EP, e);
+}
+
+#endif
diff --git a/drivers/misc/kdbus/limits.h b/drivers/misc/kdbus/limits.h
new file mode 100644
index 000000000000..29cf30fcce07
--- /dev/null
+++ b/drivers/misc/kdbus/limits.h
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_DEFAULTS_H
+#define __KDBUS_DEFAULTS_H
+
+/* maximum size of message header and items */
+#define KDBUS_MSG_MAX_SIZE SZ_8K
+
+/* maximum number of message items */
+#define KDBUS_MSG_MAX_ITEMS 128
+
+/*
+ * Maximum number of passed file descriptors
+ * Number taken from AF_UNIX upper limits
+ */
+#define KDBUS_MSG_MAX_FDS 253
+
+/* maximum message payload size */
+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE SZ_2M
+
+/* maximum size of bloom bit field in bytes */
+#define KDBUS_BUS_BLOOM_MAX_SIZE SZ_4K
+
+/* maximum length of well-known bus name */
+#define KDBUS_NAME_MAX_LEN 255
+
+/* maximum length of bus, domain, ep name */
+#define KDBUS_SYSNAME_MAX_LEN 63
+
+/* maximum size of make data */
+#define KDBUS_MAKE_MAX_SIZE SZ_32K
+
+/* maximum size of hello data */
+#define KDBUS_HELLO_MAX_SIZE SZ_32K
+
+/* maximum size for update commands */
+#define KDBUS_UPDATE_MAX_SIZE SZ_32K
+
+/* maximum number of matches per connection */
+#define KDBUS_MATCH_MAX 256
+
+/* maximum size of match data */
+#define KDBUS_MATCH_MAX_SIZE SZ_32K
+
+/* maximum size of policy data */
+#define KDBUS_POLICY_MAX_SIZE SZ_32K
+
+/* maximum number of queued messages in a connection */
+#define KDBUS_CONN_MAX_MSGS 256
+
+/* maximum number of queued messages from the same individual user */
+#define KDBUS_CONN_MAX_MSGS_PER_USER 16
+
+/* maximum number of well-known names per connection */
+#define KDBUS_CONN_MAX_NAMES 64
+
+/* maximum number of queued requests waiting for a reply */
+#define KDBUS_CONN_MAX_REQUESTS_PENDING 128
+
+/* maximum number of connections per user in one domain */
+#define KDBUS_USER_MAX_CONN 256
+
+/* maximum number of buses per user in one domain */
+#define KDBUS_USER_MAX_BUSES 16
+
+#endif
diff --git a/drivers/misc/kdbus/main.c b/drivers/misc/kdbus/main.c
new file mode 100644
index 000000000000..caa4aabc1d8d
--- /dev/null
+++ b/drivers/misc/kdbus/main.c
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include "util.h"
+#include "domain.h"
+#include "handle.h"
+
+/* kdbus initial domain */
+static struct kdbus_domain *kdbus_domain_init;
+
+static int __init kdbus_init(void)
+{
+ int ret;
+
+ ret = subsys_virtual_register(&kdbus_subsys, NULL);
+ if (ret < 0)
+ return ret;
+
+ ret = kdbus_minor_init();
+ if (ret < 0)
+ goto exit_subsys;
+
+ /*
+ * Create the initial domain; it is world-accessible and
+ * provides the /dev/kdbus/control device node.
+ */
+ ret = kdbus_domain_new(NULL, NULL, 0666, &kdbus_domain_init);
+ if (ret < 0) {
+ pr_err("failed to initialize, error=%i\n", ret);
+ goto exit_minor;
+ }
+
+ pr_info("initialized\n");
+ return 0;
+
+exit_minor:
+ kdbus_minor_exit();
+exit_subsys:
+ bus_unregister(&kdbus_subsys);
+ return ret;
+}
+
+static void __exit kdbus_exit(void)
+{
+ kdbus_domain_disconnect(kdbus_domain_init);
+ kdbus_domain_unref(kdbus_domain_init);
+ kdbus_minor_exit();
+ bus_unregister(&kdbus_subsys);
+}
+
+module_init(kdbus_init);
+module_exit(kdbus_exit);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
diff --git a/drivers/misc/kdbus/util.c b/drivers/misc/kdbus/util.c
new file mode 100644
index 000000000000..8241e15c6ef5
--- /dev/null
+++ b/drivers/misc/kdbus/util.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/file.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+
+#include "limits.h"
+#include "util.h"
+
+/**
+ * kdbus_sysname_is_valid() - validate names showing up in /proc, /sys and /dev
+ * @name: Name of domain, bus, endpoint
+ *
+ * Return: 0 if the given name is valid, otherwise negative errno
+ */
+int kdbus_sysname_is_valid(const char *name)
+{
+ unsigned int i;
+ size_t len;
+
+ len = strlen(name);
+ if (len == 0)
+ return -EINVAL;
+
+ for (i = 0; i < len; i++) {
+ if (isalpha(name[i]))
+ continue;
+ if (isdigit(name[i]))
+ continue;
+ if (name[i] == '_')
+ continue;
+ if (i > 0 && i + 1 < len && strchr("-.", name[i]))
+ continue;
+
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * kdbus_check_and_write_flags() - check flags provided by user, and write the
+ * valid mask back
+ * @flags: The flags mask provided by userspace
+ * @buf: The buffer provided by userspace
+ * @offset_out: Offset of the kernel_flags field inside the user-provided struct
+ * @valid: Mask of valid bits
+ *
+ * This function checks whether the flags provided by userspace are within
+ * the set of bits allowed by the kernel. The mask of valid bits is written
+ * back to @buf with the KDBUS_FLAG_KERNEL bit set.
+ *
+ * Return: 0 on success, -EFAULT if copy_to_user() failed, or -EINVAL if
+ * userspace submitted invalid bits in its mask.
+ */
+int kdbus_check_and_write_flags(u64 flags, void __user *buf,
+ off_t offset_out, u64 valid)
+{
+ u64 val = valid | KDBUS_FLAG_KERNEL;
+
+ /*
+ * KDBUS_FLAG_KERNEL is reserved and will never be considered
+ * valid by any user of this function.
+ */
+ WARN_ON_ONCE(valid & KDBUS_FLAG_KERNEL);
+
+ if (copy_to_user(((u8 __user *) buf) + offset_out, &val, sizeof(val)))
+ return -EFAULT;
+
+ if (flags & ~valid)
+ return -EINVAL;
+
+ return 0;
+}
+
+/**
+ * kdbus_fput_files() - fput() an array of struct files
+ * @files: The array of files to put, may be NULL
+ * @count: The number of elements in @files
+ *
+ * Call fput() on all non-NULL elements in @files, and set the entries to
+ * NULL afterwards.
+ */
+void kdbus_fput_files(struct file **files, unsigned int count)
+{
+ int i;
+
+ if (!files)
+ return;
+
+ for (i = count - 1; i >= 0; i--)
+ if (files[i]) {
+ fput(files[i]);
+ files[i] = NULL;
+ }
+}
diff --git a/drivers/misc/kdbus/util.h b/drivers/misc/kdbus/util.h
new file mode 100644
index 000000000000..d84b820d2132
--- /dev/null
+++ b/drivers/misc/kdbus/util.h
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_UTIL_H
+#define __KDBUS_UTIL_H
+
+#include <linux/dcache.h>
+#include <linux/ioctl.h>
+
+#include "kdbus.h"
+
+/* all exported addresses are 64 bit */
+#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
+
+/* all exported sizes are 64 bit and data aligned to 64 bit */
+#define KDBUS_ALIGN8(s) ALIGN((s), 8)
+#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
+
+/**
+ * kdbus_size_get_user - read the size variable from user memory
+ * @_s: Size variable
+ * @_b: Buffer to read from
+ * @_t: Structure, "size" is a member of
+ *
+ * Return: the result of copy_from_user()
+ */
+#define kdbus_size_get_user(_s, _b, _t) \
+({ \
+ u64 __user *_sz = \
+ (void __user *)((u8 __user *)(_b) + offsetof(_t, size));\
+ copy_from_user(_s, _sz, sizeof(__u64)); \
+})
+
+/**
+ * kdbus_offset_set_user - write the offset variable to user memory
+ * @_s: Offset variable
+ * @_b: Buffer to write to
+ * @_t: Structure, "offset" is a member of
+ *
+ * Return: the result of copy_to_user()
+ */
+#define kdbus_offset_set_user(_s, _b, _t) \
+({ \
+ u64 __user *_sz = \
+ (void __user *)((u8 __user *)(_b) + offsetof(_t, offset)); \
+ copy_to_user(_sz, _s, sizeof(__u64)); \
+})
+
+/**
+ * kdbus_str_hash - calculate a hash
+ * @str: String
+ *
+ * Return: hash value
+ */
+static inline unsigned int kdbus_str_hash(const char *str)
+{
+ return full_name_hash(str, strlen(str));
+}
+
+/**
+ * kdbus_str_valid - verify a string
+ * @str: String to verify
+ * @size: Size of buffer of string (including 0-byte)
+ *
+ * This verifies that the string at position @str with size @size is properly
+ * zero-terminated and does not contain a 0-byte anywhere but at the end.
+ *
+ * Return: true if string is valid, false if not.
+ */
+static inline bool kdbus_str_valid(const char *str, size_t size)
+{
+ return size > 0 && memchr(str, '\0', size) == str + size - 1;
+}
+
+int kdbus_sysname_is_valid(const char *name);
+void kdbus_fput_files(struct file **files, unsigned int count);
+int kdbus_check_and_write_flags(u64 flags, void __user *buf,
+ off_t offset_out, u64 valid);
+
+#define kdbus_negotiate_flags(_s, _b, _t, _v) \
+ kdbus_check_and_write_flags((_s)->flags, _b, \
+ offsetof(_t, kernel_flags), _v) \
+
+#endif
--
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric W. Biederman
2014-10-30 03:52:29 UTC
Post by Greg Kroah-Hartman
Add the basic driver structure.
handle.c is the main ioctl command dispatcher that calls into other parts
of the driver.
main.c contains the code that creates the initial domain at startup, and
util.c has utility functions such as item iterators that are shared with
other files.
limits.h describes limits on things like maximum data structure sizes,
number of messages per user and suchlike. Some of the numbers currently
picked are rough ideas of what might be sufficient and are probably
rather conservative.
+/* kdbus control device commands */
+static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ case KDBUS_CMD_DOMAIN_MAKE: {
+ const char *name;
+
+ if (!capable(CAP_IPC_OWNER)) {
+ ret = -EPERM;
+ break;
+ }
I don't know if this is exploitable (given that this happens in an
ioctl) but capable checks outside of open usually are.

Eric
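
The concern can be illustrated with a small userspace model. capable()
evaluates the credentials of the task that issues the ioctl, which need not
be the task that opened the device node. The sketch below uses toy types
(toy_cred, toy_file and both check functions are hypothetical, not kernel
APIs) to contrast the two checks. In the kernel, the opener's credentials
are available as file->f_cred and can be tested with file_ns_capable().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel objects; none of these are real kernel APIs. */
struct toy_cred {
	bool cap_ipc_owner;
};

struct toy_file {
	struct toy_cred opener;	/* credentials captured at open() time */
};

/* Pattern used in the patch: check whoever happens to issue the ioctl. */
bool may_make_domain_caller(const struct toy_file *file,
			    const struct toy_cred *caller)
{
	(void)file;
	return caller->cap_ipc_owner;
}

/* Alternative: check the credentials recorded when the fd was opened. */
bool may_make_domain_opener(const struct toy_file *file,
			    const struct toy_cred *caller)
{
	(void)caller;
	return file->opener.cap_ipc_owner;
}

/*
 * Scenario: an unprivileged task opens the control node and hands the fd
 * to a privileged helper, which is then tricked into issuing the ioctl.
 */
void demo_passed_fd(void)
{
	struct toy_file file = { .opener = { .cap_ipc_owner = false } };
	struct toy_cred helper = { .cap_ipc_owner = true };

	/* caller-based check: the request is allowed */
	assert(may_make_domain_caller(&file, &helper));
	/* opener-based check: the request is refused */
	assert(!may_make_domain_opener(&file, &helper));
}
```

Whether such a scenario is reachable through an ioctl is exactly the open
question in the mail above; the model only shows why the two checks can
give different answers.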


Thomas Gleixner
2014-10-30 23:45:54 UTC
Post by Greg Kroah-Hartman
+/* kdbus major */
+static unsigned int kdbus_major;
+
+/* map of minors to objects */
+static DEFINE_IDR(kdbus_minor_idr);
+
+/* kdbus minor lock */
+static DEFINE_SPINLOCK(kdbus_minor_lock);
+
+int kdbus_minor_init(void)
+{
+ int ret;
+
+ ret = __register_chrdev(0, 0, 0xfffff, KBUILD_MODNAME,
0xfffff? Random number pulled out of thin air or is there some
sensible explanation for this choice?
Post by Greg Kroah-Hartman
+ &kdbus_handle_ops);
+ if (ret < 0)
+ return ret;
+
+ kdbus_major = ret;
So minor_init actually assigns the major number ...
Post by Greg Kroah-Hartman
+ return 0;
+}
+
+void kdbus_minor_exit(void)
+{
+ __unregister_chrdev(kdbus_major, 0, 0xfffff, KBUILD_MODNAME);
So we have the magic 0xfffff constant at two places to make sure that
it is updated in sync.
Post by Greg Kroah-Hartman
+ idr_destroy(&kdbus_minor_idr);
+}
+
+static void *kdbus_minor_pack(enum kdbus_minor_type type, void *ptr)
+{
+ unsigned long p = (unsigned long)ptr;
+
+ BUILD_BUG_ON(KDBUS_MINOR_CNT > 4);
We certainly want a build bug in some random function with another
magic number for a completely undocumented enum, which neither
explains what its enum constants mean nor has a comment that it
is limited to 0-3 for a very good reason.

And of course any non-overloaded pointer defaults to
KDBUS_MINOR_CONTROL which is always valid...
Post by Greg Kroah-Hartman
+
+ if (WARN_ON(p & 0x3UL || type >= KDBUS_MINOR_CNT))
0x03UL is a very descriptive constant ....
Post by Greg Kroah-Hartman
+ return NULL;
+
+ return (void *)(p | (unsigned long)type);
+}
+
+static enum kdbus_minor_type kdbus_minor_unpack(void **ptr)
+{
+ unsigned long p = (unsigned long)*ptr;
+
+ *ptr = (void *)(p & ~0x3UL);
+ return p & 0x3UL;
I'm really excited about the intuitive naming conventions
here. minor_init() initializes kdbus_major and this pack/unpack stuff
converts a pointer to carry a type and vice versa. Of course that
stuff lacks any comment in order to accelerate reviews, right?

I really had to look more than twice to figure out that this function
serves two purposes:

- Remove the type overload from the pointer

- Return the type retrieved from the pointer.

Aside of that: What on earth has minor to do with this?

For heavens sake. Minor is referring to a minor number in the context
of character devices, right?

And a number does not map at all to a randomly overloaded pointer
AFAICT.
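
For reference, the technique under discussion is pointer tagging: a small type value is stored in the low bits of an aligned pointer. Below is a minimal userspace sketch of how the same idea could be written with named constants, the size limit checked next to the enum, and separate helpers for reading the type and stripping the tag. All names here (node_type, NODE_TYPE_MASK, node_ptr_tag and friends) are hypothetical and do not appear in the posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type enum; the values must fit into the tag bits below. */
enum node_type {
	NODE_TYPE_CONTROL = 0,
	NODE_TYPE_ENDPOINT = 1,
	NODE_TYPE_CNT
};

/* A pointer aligned to at least 4 bytes always has its two low bits clear. */
#define NODE_TYPE_BITS	2
#define NODE_TYPE_MASK	((uintptr_t)((1u << NODE_TYPE_BITS) - 1))

/* The limit is enforced where the enum and the mask are defined. */
static_assert(NODE_TYPE_CNT <= (1u << NODE_TYPE_BITS),
	      "node_type must fit into the pointer tag bits");

/* Store @type in the low bits of @ptr. Returns NULL on a misaligned
 * pointer or an out-of-range type. */
static void *node_ptr_tag(enum node_type type, void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;

	if ((p & NODE_TYPE_MASK) || type >= NODE_TYPE_CNT)
		return NULL;

	return (void *)(p | (uintptr_t)type);
}

/* Read the type back out of a tagged pointer. */
static enum node_type node_ptr_type(const void *tagged)
{
	return (enum node_type)((uintptr_t)tagged & NODE_TYPE_MASK);
}

/* Strip the tag and recover the original pointer. */
static void *node_ptr_untag(void *tagged)
{
	return (void *)((uintptr_t)tagged & ~NODE_TYPE_MASK);
}
```

Splitting the read of the type from the untagging means no helper both modifies its argument and returns a value, which is the double purpose the review found hard to follow.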
Post by Greg Kroah-Hartman
+/**
+ * kdbus_minor_set() - set an existing minor type of a kdbus device node
Groan. The name choices are just ass backwards to be honest. And the
explanation of the function is even worse:

"set an existing minor type of a kdbus device node"

What the heck is an 'existing minor type' ?

Ah, you mean if the type does not exist (i.e. it is >= KDBUS_MINOR_CNT)
then the idr entry will be replaced by a NULL pointer silently.

Aside of that if ptr has one of the lower two bits set, and the call
tries to change its type then we get a WARN_ON in dmesg and happily
assign a NULL pointer to the idr entry for that minor number.

Makes a lot of sense. We'll see the fallout later...
So @devt is removed? The documentation of idr_replace() tells a
different story.
Why is this a void pointer? Are we dealing with arbitrary node types
here?
Post by Greg Kroah-Hartman
+ */
+void kdbus_minor_set(dev_t devt, enum kdbus_minor_type type, void *ptr)
+{
+ unsigned int minor = MINOR(devt);
+
+ ptr = kdbus_minor_pack(type, ptr);
+
+ spin_lock(&kdbus_minor_lock);
Why is this a spinlock and not a mutex?
Post by Greg Kroah-Hartman
+ ptr = idr_replace(&kdbus_minor_idr, ptr, minor);
What's the value of this pointless assignment? Pacify gcc?
Post by Greg Kroah-Hartman
+static int kdbus_handle_release(struct inode *inode, struct file *file)
+{
+ struct kdbus_handle *handle = file->private_data;
+
+ switch (handle->type) {
+ kdbus_domain_disconnect(handle->domain_owner);
+ kdbus_domain_unref(handle->domain_owner);
+ break;
+
+ kdbus_bus_disconnect(handle->bus_owner);
+ kdbus_bus_unref(handle->bus_owner);
+ break;
+
+ kdbus_ep_disconnect(handle->ep_owner);
+ kdbus_ep_unref(handle->ep_owner);
+ break;
+
+ kdbus_conn_disconnect(handle->conn, false);
+ kdbus_conn_unref(handle->conn);
+ break;
+
+ break;
Silent acceptance of type being unknown?
Post by Greg Kroah-Hartman
+static int kdbus_copy_from_user(void *dest,
+ void __user *user_ptr,
+ size_t size)
+{
+ if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
+ return -EFAULT;
Completely undocumented requirement and there is no reason WHY we need
KDBUS_IS_ALIGNED8 to figure that out ....
Post by Greg Kroah-Hartman
+static int kdbus_handle_transform(struct kdbus_handle *handle,
+ enum kdbus_handle_type old_type,
+ enum kdbus_handle_type new_type,
+ void *ctx_ptr)
+{
+ int ret = -EBADFD;
+
+ /*
+ * This transforms a handle from one state into another. Only a single
+ * CONTROL -> CONTROL_DOMAIN_OWNER
+ * -> CONTROL_BUS_OWNER
+ * EP -> EP_CONNECTED
+ * -> EP_OWNER
And that's magically enforced by what? new_type is not sanity checked
here at all and there is no requirement for the call site to do so.
Post by Greg Kroah-Hartman
+/* kdbus control device commands */
+static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_bus *bus = NULL;
+ struct kdbus_cmd_make *make;
+ struct kdbus_domain *domain = NULL;
+ umode_t mode = 0600;
+ void *p = NULL;
+ int ret;
+
+ switch (cmd) {
+ case KDBUS_CMD_BUS_MAKE: {
+ kgid_t gid = KGIDT_INIT(0);
+ struct kdbus_bloom_parameter bloom;
+ char *name;
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*make),
+ KDBUS_MAKE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ make = p;
Another great coding convention stolen from some randomly misdesigned
user space app?

ret = kdbus_memdup_user(buf, &p, sizeof(*make)....

What's wrong with

struct kdbus_cmd_make *make = NULL;

ret = kdbus_memdup_user(buf, &make, sizeof(*make) ... ????

Surely void pointers are a great guarantee to make things better,
right?
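
One C detail matters for the suggestion above: a `struct kdbus_cmd_make **` does not convert implicitly to `void **`, so passing `&make` to a helper that takes `void **` would need a cast or a changed signature. A common way to avoid the intermediate `void *p` is to have the helper return the buffer, since `void *` converts implicitly to any object pointer type. The following is a userspace sketch of that shape only; `dup_bounded`, `struct cmd_make` and the use of EINVAL are assumptions made for illustration, not the kdbus API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command header, standing in for a real ioctl payload. */
struct cmd_make {
	unsigned long long size;
	unsigned long long flags;
};

/*
 * Duplicate @size bytes from @src if min <= size <= max.
 * Returns the copy, or NULL with a negative errno stored in *err.
 * (malloc/memcpy stand in for the kernel's copy from user space.)
 */
static void *dup_bounded(const void *src, size_t size,
			 size_t min, size_t max, int *err)
{
	void *copy;

	if (size < min || size > max) {
		*err = -EINVAL;	/* error code chosen for illustration */
		return NULL;
	}

	copy = malloc(size);
	if (!copy) {
		*err = -ENOMEM;
		return NULL;
	}

	memcpy(copy, src, size);
	*err = 0;
	return copy;
}
```

A caller can then write `struct cmd_make *make = dup_bounded(buf, len, sizeof(*make), max, &err);` and never hold an untyped pointer.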
Post by Greg Kroah-Hartman
+ case KDBUS_CMD_DOMAIN_MAKE: {
+ const char *name;
+
+ if (!capable(CAP_IPC_OWNER)) {
Why is this not in the open() call? Because you have an opaque device
at the time of open()?

Offloading security relevant decisions to an ioctl is an interesting
design choice in theory. In fact it is just wrong.
Post by Greg Kroah-Hartman
+ ret = -EPERM;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*make),
+ KDBUS_MAKE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ make = p;
See above.
Post by Greg Kroah-Hartman
+/* kdbus endpoint make commands */
+static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ void *p = NULL;
+ long ret = 0;
+
+ switch (cmd) {
+ case KDBUS_CMD_ENDPOINT_MAKE: {
+ struct kdbus_cmd_make *make;
+ umode_t mode = 0;
+ kgid_t gid = KGIDT_INIT(0);
+ const char *name;
+ struct kdbus_ep *ep;
+
+ /* creating custom endpoints is a privileged operation */
+ if (!kdbus_bus_uid_is_privileged(handle->ep->bus)) {
See above.
Post by Greg Kroah-Hartman
+ ret = -EPERM;
+ break;
+ }
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*make),
+ KDBUS_MAKE_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ make = p;
See above.
Post by Greg Kroah-Hartman
+ case KDBUS_CMD_HELLO: {
+ struct kdbus_cmd_hello *hello;
+ struct kdbus_conn *conn = NULL;
+
+ ret = kdbus_memdup_user(buf, &p, sizeof(*hello),
+ KDBUS_HELLO_MAX_SIZE);
+ if (ret < 0)
+ break;
+
+ hello = p;
Ditto.
Post by Greg Kroah-Hartman
+/* kdbus endpoint commands for connected peers */
+static long kdbus_handle_ioctl_ep_connected(struct file *file, unsigned int cmd,
+ void __user *buf)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_conn *conn = handle->conn;
+ void *p = NULL;
+ long ret = 0;
+
+ /*
+ * BYEBYE is special; we must not acquire a connection when
+ * calling into kdbus_conn_disconnect() or we will deadlock,
+ * because kdbus_conn_disconnect() will wait for all acquired
+ * references to be dropped.
+ */
+ if (cmd == KDBUS_CMD_BYEBYE) {
+ if (!kdbus_conn_is_connected(conn))
+ return -EOPNOTSUPP;
If the connection is down already then a BYEBYE is just moot. I don't
see a good reason WHY the return code is -EOPNOTSUPP.

Is this just to provide bug compatibility with the existing user space
code? If so, pick some proper return code which reflects the state and
deal with it in user space. If not, then you should think hard why you
did not find anything which is more appropriate in the wide choice of
error codes.
Post by Greg Kroah-Hartman
+ return kdbus_conn_disconnect(conn, true);
+ }
+
+ ret = kdbus_conn_acquire(conn);
+ if (ret < 0)
+ return ret;
+
+ switch (cmd) {
+ case KDBUS_CMD_NAME_ACQUIRE: {
+ /* acquire a well-known name */
+ struct kdbus_cmd_name *cmd_name;
+
+ if (!kdbus_conn_is_connected(conn)) {
+ ret = -EOPNOTSUPP;
See above.
Post by Greg Kroah-Hartman
+ break;
+ }
Aside of that what makes sure that the connection is not going away
after you checked it above?

Magic serialization or what? I can't see any of it.

If it's just an optimization then it wants to have a proper comment
and not just a random chosen return code which matches the expectation
of some equally undocumented user space app.
Post by Greg Kroah-Hartman
+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kdbus_handle *handle = file->private_data;
+ void __user *argp = (void __user *)arg;
+ enum kdbus_handle_type type = handle->type;
+
+ /* make sure all handle fields are set if handle->type is */
+ smp_rmb();
Sure. You really need this kind of serialization because of your design
choice of allowing opaque handles in the first place.

I'm really interested why you need this rmb() at all. Just because you
have several threads in user space which might race with the type
assignment when they call the ioctl?

We have a strict requirement to document memory barriers. The
following comment definitely does not fulfil this requirement as it
just documents that someone observed a race of unknown provenance and
got it 'fixed' with a 'smp_rmb()'
Post by Greg Kroah-Hartman
+ /* make sure all handle fields are set if handle->type is */
That's really hilarious. The user space side knows exactly upfront
which type of 'handle' it wants to open. Making it an opaque handle in
the first place and let the kernel deal with the actual type
assignment is beyond silly. Especially if that involves undocumented
memory barriers.
Post by Greg Kroah-Hartman
+ switch (type) {
+ return kdbus_handle_ioctl_control(file, cmd, argp);
+
+ return kdbus_handle_ioctl_ep(file, cmd, argp);
+
+ return kdbus_handle_ioctl_ep_connected(file, cmd, argp);
+
+ return kdbus_handle_ioctl_ep_owner(file, cmd, argp);
+
+ return -EBADFD;
+ }
+}
+
+static unsigned int kdbus_handle_poll(struct file *file,
+ struct poll_table_struct *wait)
+{
+ struct kdbus_handle *handle = file->private_data;
+ struct kdbus_conn *conn;
+ unsigned int mask = POLLOUT | POLLWRNORM;
+
+ /* Only a connected endpoint can read/write data */
+ if (handle->type != KDBUS_HANDLE_ENDPOINT_CONNECTED)
+ return POLLERR | POLLHUP;
+
+ /* make sure handle->conn is set if handle->type is */
+ smp_rmb();
Surely we need to plaster that all over the place just because we
avoid opening dedicated devices in the first place. And do not tell me
that the open call does not know what type it is going to be.

Copying badly designed userspace code to the kernel without rethinking
the design and 'fixing' the shortcomings by copying the same 'fixup'
over and over is definitely a guarantee for interesting CVEs in the
future.

Thanks,

tglx
Jiri Kosina
2014-10-31 00:23:55 UTC
Post by Thomas Gleixner
Post by Greg Kroah-Hartman
+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kdbus_handle *handle = file->private_data;
+ void __user *argp = (void __user *)arg;
+ enum kdbus_handle_type type = handle->type;
+
+ /* make sure all handle fields are set if handle->type is */
+ smp_rmb();
Sure. You really need this kind of serialization because of your design
choice of allowing opaque handles in the first place.
I'm really interested why you need this rmb() at all. Just because you
have several threads in user space which might race with the type
assignment when they call the ioctl?
We have a strict requirement to document memory barriers. The
following comment definitely does not fulfil this requirement as it
just documents that someone observed a race of unknown provenance and
got it 'fixed' with a 'smp_rmb()'
Post by Greg Kroah-Hartman
+ /* make sure all handle fields are set if handle->type is */
That's really hilarious. The user space side knows exactly upfront
which type of 'handle' it wants to open. Making it an opaque handle in
the first place and let the kernel deal with the actual type
assignment is beyond silly. Especially if that involves undocumented
memory barriers.
I have been staring at exactly this for rather a long time today.

Apparently this barrier pairs with smp_wmb() in kdbus_handle_transform()
and tries to make sure that whenever handle->type is seen as updated,
handle->ptr is as well.

But it's still difficult for me to understand all the memory ordering
rules and consequences of this strict ordering (my current understanding
is that the barrier is not needed, but I will have to think about it a
little bit more), so a nice and explanatory comment precisely describing
the race this is protecting against would be very welcome.
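
The pairing described above is the usual publish/consume pattern: the writer fills in the payload and then sets the type, and the reader checks the type before touching the payload. Here is a minimal userspace analogue using C11 atomics, where a release store and an acquire load play the roles of smp_wmb() and smp_rmb(). It is a sketch of the pattern only, under the assumption that this is what the patch intends; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum handle_type {
	HANDLE_UNSET = 0,
	HANDLE_CONNECTED = 1
};

struct handle {
	void *ptr;		/* payload, written before the type */
	_Atomic int type;	/* published last */
};

/* Writer side: set the payload, then publish the type. */
static void handle_publish(struct handle *h, int type, void *ptr)
{
	h->ptr = ptr;
	/* Release: the store to ptr is ordered before the store to type. */
	atomic_store_explicit(&h->type, type, memory_order_release);
}

/* Reader side: return the payload only if @expected has been published. */
static void *handle_get(struct handle *h, int expected)
{
	/* Acquire: pairs with the release store in handle_publish(). */
	if (atomic_load_explicit(&h->type, memory_order_acquire) != expected)
		return NULL;

	return h->ptr;
}
```

A comment of the kind requested would state exactly this: which store the barrier orders, which load it pairs with, and what a reader could observe without it.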

Thanks,
--
Jiri Kosina
SUSE Labs

Thomas Gleixner
2014-10-31 00:42:16 UTC
Post by Jiri Kosina
Post by Thomas Gleixner
Post by Greg Kroah-Hartman
+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct kdbus_handle *handle = file->private_data;
+ void __user *argp = (void __user *)arg;
+ enum kdbus_handle_type type = handle->type;
+
+ /* make sure all handle fields are set if handle->type is */
+ smp_rmb();
Sure. You really need this kind of serialization because of your design
choice of allowing opaque handles in the first place.
I'm really interested why you need this rmb() at all. Just because you
have several threads in user space which might race with the type
assignment when they call the ioctl?
We have a strict requirement to document memory barriers. The
following comment definitely does not fulfil this requirement as it
just documents that someone observed a race of unknown provenance and
got it 'fixed' with a 'smp_rmb()'
Post by Greg Kroah-Hartman
+ /* make sure all handle fields are set if handle->type is */
That's really hilarious. The user space side knows exactly upfront
which type of 'handle' it wants to open. Making it an opaque handle in
the first place and let the kernel deal with the actual type
assignment is beyond silly. Especially if that involves undocumented
memory barriers.
I have been staring at exactly this for rather a long time today.
Apparently this barrier pairs with smp_wmb() in kdbus_handle_transform()
and tries to make sure that whenever handle->type is seen as updated,
handle->ptr is as well.
Right. But it does not make any sense at all.

The underlying problem is the design of the whole character device
interface as an opaque type. Just look at the absurd workarounds in
the userspace implementation of this.

Now we copy it to kernel space 1:1 and find other absurd workarounds
for it instead of designing it new.

Welcome to the world of bug compatibility...

Thanks,

tglx
Greg Kroah-Hartman
2014-10-29 22:05:49 UTC
From: Daniel Mack <***@zonque.org>

This patch hooks up the build system to actually compile the files
added by previous patches. It also adds an entry to MAINTAINERS to
direct people to Greg KH, David Herrmann, Djalal Harouni and me for
questions and patches.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
MAINTAINERS | 12 ++++++++++++
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/kdbus/Kconfig | 11 +++++++++++
drivers/misc/kdbus/Makefile | 19 +++++++++++++++++++
5 files changed, 44 insertions(+)
create mode 100644 drivers/misc/kdbus/Kconfig
create mode 100644 drivers/misc/kdbus/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index dab92a78d1d5..15d2677d6d5b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5255,6 +5255,18 @@ S: Maintained
F: Documentation/kbuild/kconfig-language.txt
F: scripts/kconfig/

+KDBUS
+M: Greg Kroah-Hartman <***@linuxfoundation.org>
+M: Daniel Mack <***@zonque.org>
+M: David Herrmann <***@googlemail.com>
+M: Djalal Harouni <***@opendz.org>
+L: linux-***@vger.kernel.org
+S: Maintained
+F: drivers/misc/kdbus/*
+F: Documentation/kdbus.txt
+F: include/uapi/linux/kdbus.h
+F: tools/testing/selftests/kdbus/
+
KDUMP
M: Vivek Goyal <***@redhat.com>
M: Haren Myneni <***@us.ibm.com>
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index bbeb4516facf..fa84e5f4b30a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -528,4 +528,5 @@ source "drivers/misc/mic/Kconfig"
source "drivers/misc/genwqe/Kconfig"
source "drivers/misc/echo/Kconfig"
source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/kdbus/Kconfig"
endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 7d5c4cd118c4..ce6fa2fe7513 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,3 +56,4 @@ obj-$(CONFIG_GENWQE) += genwqe/
obj-$(CONFIG_ECHO) += echo/
obj-$(CONFIG_VEXPRESS_SYSCFG) += vexpress-syscfg.o
obj-$(CONFIG_CXL_BASE) += cxl/
+obj-$(CONFIG_KDBUS) += kdbus/
diff --git a/drivers/misc/kdbus/Kconfig b/drivers/misc/kdbus/Kconfig
new file mode 100644
index 000000000000..204defa7e237
--- /dev/null
+++ b/drivers/misc/kdbus/Kconfig
@@ -0,0 +1,11 @@
+config KDBUS
+ tristate "kdbus interprocess communication"
+ depends on TMPFS
+ help
+ D-Bus is a system for low-latency, low-overhead, easy to use
+ interprocess communication (IPC).
+
+ See Documentation/kdbus.txt
+
+ To compile this driver as a module, choose M here: the
+ module will be called kdbus.
diff --git a/drivers/misc/kdbus/Makefile b/drivers/misc/kdbus/Makefile
new file mode 100644
index 000000000000..2867b776b821
--- /dev/null
+++ b/drivers/misc/kdbus/Makefile
@@ -0,0 +1,19 @@
+kdbus-y := \
+ bus.o \
+ connection.o \
+ endpoint.o \
+ handle.o \
+ item.o \
+ main.o \
+ match.o \
+ message.o \
+ metadata.o \
+ names.o \
+ notify.o \
+ domain.o \
+ policy.o \
+ pool.o \
+ queue.o \
+ util.o
+
+obj-$(CONFIG_KDBUS) += kdbus.o
--
2.1.2

Greg Kroah-Hartman
2014-10-29 22:06:14 UTC
From: Daniel Mack <***@zonque.org>

This patch adds code for matches and notifications.

Notifications are broadcast messages generated by the kernel, which
notify subscribers when connections are created or destroyed, when
well-known-names have been claimed, released or changed ownership,
or when reply messages have timed out.

Matches are used to tell the kernel driver which broadcast messages
a connection is interested in. Matches can either be specific on one
of the kernel-generated notification types, or carry a bloom filter
mask to match against a message from userspace. The latter is a way
to pre-filter messages from other connections in order to mitigate
unnecessary wakeups.
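
The bloom pre-filter check itself is small. As a standalone illustration of the per-word test that kdbus_match_bloom() in this patch performs, a message's filter matches a mask only if every bit set in the mask is also set in the filter. The function name and the plain array arguments below are simplifications for this sketch; generation selection is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every bit set in @mask is also set in @filter,
 * checked word by word over @n_words 64-bit words.
 */
static bool bloom_mask_matches(const uint64_t *filter,
			       const uint64_t *mask,
			       size_t n_words)
{
	size_t i;

	for (i = 0; i < n_words; i++)
		if ((filter[i] & mask[i]) != mask[i])
			return false;

	return true;
}
```

As with any bloom filter, a positive result only means the message possibly carries the subscribed properties, so the receiver still has to inspect the message. A negative result is definitive, which is what avoids the wakeup.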

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/match.c | 521 ++++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/match.h | 30 +++
drivers/misc/kdbus/notify.c | 235 ++++++++++++++++++++
drivers/misc/kdbus/notify.h | 28 +++
4 files changed, 814 insertions(+)
create mode 100644 drivers/misc/kdbus/match.c
create mode 100644 drivers/misc/kdbus/match.h
create mode 100644 drivers/misc/kdbus/notify.c
create mode 100644 drivers/misc/kdbus/notify.h

diff --git a/drivers/misc/kdbus/match.c b/drivers/misc/kdbus/match.c
new file mode 100644
index 000000000000..86458a642d07
--- /dev/null
+++ b/drivers/misc/kdbus/match.c
@@ -0,0 +1,521 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+
+/**
+ * struct kdbus_match_db - message filters
+ * @entries_list: List of matches
+ * @entries_lock: Match data lock
+ * @entries: Number of entries in database
+ */
+struct kdbus_match_db {
+ struct list_head entries_list;
+ struct mutex entries_lock;
+ unsigned int entries;
+};
+
+/**
+ * struct kdbus_match_entry - a match database entry
+ * @cookie: User-supplied cookie to lookup the entry
+ * @list_entry: The list entry element for the db list
+ * @rules_list: The list head for tracking rules of this entry
+ */
+struct kdbus_match_entry {
+ u64 cookie;
+ struct list_head list_entry;
+ struct list_head rules_list;
+};
+
+/**
+ * struct kdbus_bloom_mask - mask to match against filter
+ * @generations: Number of generations carried
+ * @data: Array of bloom bit fields
+ */
+struct kdbus_bloom_mask {
+ u64 generations;
+ u64 *data;
+};
+
+/**
+ * struct kdbus_match_rule - a rule appended to a match entry
+ * @type: An item type to match against
+ * @bloom_mask: Bloom mask to match a message's filter against, used
+ * with KDBUS_ITEM_BLOOM_MASK
+ * @name: Name to match against, used with KDBUS_ITEM_NAME,
+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
+ * @old_id: ID to match against, used with
+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
+ * KDBUS_ITEM_ID_REMOVE
+ * @new_id: ID to match against, used with
+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
+ * KDBUS_ITEM_ID_ADD
+ * @src_id: ID to match against, used with KDBUS_ITEM_ID
+ * @rules_entry: Entry in the entry's rules list
+ */
+struct kdbus_match_rule {
+ u64 type;
+ union {
+ struct kdbus_bloom_mask bloom_mask;
+ struct {
+ char *name;
+ u64 old_id;
+ u64 new_id;
+ };
+ u64 src_id;
+ };
+ struct list_head rules_entry;
+};
+
+static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
+{
+ switch (rule->type) {
+ case KDBUS_ITEM_BLOOM_MASK:
+ kfree(rule->bloom_mask.data);
+ break;
+
+ case KDBUS_ITEM_NAME:
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_REMOVE:
+ case KDBUS_ITEM_NAME_CHANGE:
+ kfree(rule->name);
+ break;
+
+ case KDBUS_ITEM_ID:
+ case KDBUS_ITEM_ID_ADD:
+ case KDBUS_ITEM_ID_REMOVE:
+ break;
+
+ default:
+ BUG();
+ }
+
+ list_del(&rule->rules_entry);
+ kfree(rule);
+}
+
+static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
+{
+ struct kdbus_match_rule *r, *tmp;
+
+ list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
+ kdbus_match_rule_free(r);
+
+ list_del(&entry->list_entry);
+ kfree(entry);
+}
+
+/**
+ * kdbus_match_db_free() - free match db resources
+ * @db: The match database
+ */
+void kdbus_match_db_free(struct kdbus_match_db *db)
+{
+ struct kdbus_match_entry *entry, *tmp;
+
+ mutex_lock(&db->entries_lock);
+ list_for_each_entry_safe(entry, tmp, &db->entries_list, list_entry)
+ kdbus_match_entry_free(entry);
+ mutex_unlock(&db->entries_lock);
+
+ kfree(db);
+}
+
+/**
+ * kdbus_match_db_new() - create a new match database
+ * @db: Pointer location for the returned database
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_match_db_new(struct kdbus_match_db **db)
+{
+ struct kdbus_match_db *d;
+
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d)
+ return -ENOMEM;
+
+ mutex_init(&d->entries_lock);
+ INIT_LIST_HEAD(&d->entries_list);
+
+ *db = d;
+ return 0;
+}
+
+static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
+ const struct kdbus_bloom_mask *mask,
+ const struct kdbus_conn *conn)
+{
+ size_t n = conn->bus->bloom.size / sizeof(u64);
+ const u64 *m;
+ size_t i;
+
+ /*
+ * The message's filter carries a generation identifier, the
+ * match's mask possibly carries an array of multiple generations
+ * of the mask. Select the mask with the closest match of the
+ * filter's generation.
+ */
+ m = mask->data + (min(filter->generation, mask->generations - 1) * n);
+
+ /*
+ * The message's filter contains the messages properties,
+ * the match's mask contains the properties to look for in the
+ * message. Check the mask bit field against the filter bit field,
+ * if the message possibly carries the properties the connection
+ * has subscribed to.
+ */
+ for (i = 0; i < n; i++)
+ if ((filter->data[i] & m[i]) != m[i])
+ return false;
+
+ return true;
+}
+
+static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_match_rule *r;
+
+ /*
+ * Walk all the rules and bail out immediately
+ * if any of them is unsatisfied.
+ */
+
+ list_for_each_entry(r, &entry->rules_list, rules_entry) {
+ if (conn_src == NULL) {
+ /* kernel notifications */
+
+ if (kmsg->notify_type != r->type)
+ return false;
+
+ switch (r->type) {
+ case KDBUS_ITEM_ID_ADD:
+ if (r->new_id != KDBUS_MATCH_ID_ANY &&
+ r->new_id != kmsg->notify_new_id)
+ return false;
+
+ break;
+
+ case KDBUS_ITEM_ID_REMOVE:
+ if (r->old_id != KDBUS_MATCH_ID_ANY &&
+ r->old_id != kmsg->notify_old_id)
+ return false;
+
+ break;
+
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_CHANGE:
+ case KDBUS_ITEM_NAME_REMOVE:
+ if ((r->old_id != KDBUS_MATCH_ID_ANY &&
+ r->old_id != kmsg->notify_old_id) ||
+ (r->new_id != KDBUS_MATCH_ID_ANY &&
+ r->new_id != kmsg->notify_new_id) ||
+ (r->name && kmsg->notify_name &&
+ strcmp(r->name, kmsg->notify_name) != 0))
+ return false;
+
+ break;
+
+ default:
+ return false;
+ }
+ } else {
+ /* messages from userspace */
+
+ switch (r->type) {
+ case KDBUS_ITEM_BLOOM_MASK:
+ if (!kdbus_match_bloom(kmsg->bloom_filter,
+ &r->bloom_mask,
+ conn_src))
+ return false;
+ break;
+
+ case KDBUS_ITEM_ID:
+ if (r->src_id != conn_src->id &&
+ r->src_id != KDBUS_MATCH_ID_ANY)
+ return false;
+
+ break;
+
+ case KDBUS_ITEM_NAME:
+ if (!kdbus_conn_has_name(conn_src, r->name))
+ return false;
+
+ break;
+
+ default:
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
+/**
+ * kdbus_match_db_match_kmsg() - match a kmsg object against the database entries
+ * @db: The match database
+ * @conn_src: The connection object originating the message
+ * @kmsg: The kmsg to perform the match on
+ *
+ * This function will walk through all the database entries previously uploaded
+ * with kdbus_match_db_add(). As soon as any of them has an all-satisfied rule
+ * set, this function will return true.
+ *
+ * Return: true if there was a matching database entry, false otherwise.
+ */
+bool kdbus_match_db_match_kmsg(struct kdbus_match_db *db,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_match_entry *entry;
+ bool matched = false;
+
+ mutex_lock(&db->entries_lock);
+ list_for_each_entry(entry, &db->entries_list, list_entry) {
+ matched = kdbus_match_rules(entry, conn_src, kmsg);
+ if (matched)
+ break;
+ }
+ mutex_unlock(&db->entries_lock);
+
+ return matched;
+}
+
+static int __kdbus_match_db_remove_unlocked(struct kdbus_match_db *db,
+ uint64_t cookie)
+{
+ struct kdbus_match_entry *entry, *tmp;
+ bool found = false;
+
+ list_for_each_entry_safe(entry, tmp, &db->entries_list, list_entry)
+ if (entry->cookie == cookie) {
+ kdbus_match_entry_free(entry);
+ --db->entries;
+ found = true;
+ }
+
+ return found ? 0 : -ENOENT;
+}
+
+/**
+ * kdbus_match_db_add() - add an entry to the match database
+ * @conn: The connection that was used in the ioctl call
+ * @cmd: The command as provided by the ioctl call
+ *
+ * This function is used in the context of the KDBUS_CMD_MATCH_ADD ioctl
+ * interface.
+ *
+ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively,
+ * adds one new database entry with n rules attached to it. Each rule is
+ * described with a kdbus_item, and an entry is considered matching if all
+ * its rules are satisfied.
+ *
+ * The items attached to a kdbus_cmd_match struct have the following mapping:
+ *
+ * KDBUS_ITEM_BLOOM_MASK: A bloom mask
+ * KDBUS_ITEM_NAME: A connection's source name
+ * KDBUS_ITEM_ID: A connection ID
+ * KDBUS_ITEM_NAME_ADD:
+ * KDBUS_ITEM_NAME_REMOVE:
+ * KDBUS_ITEM_NAME_CHANGE: Well-known name changes, carry
+ * kdbus_notify_name_change
+ * KDBUS_ITEM_ID_ADD:
+ * KDBUS_ITEM_ID_REMOVE: Connection ID changes, carry
+ * kdbus_notify_id_change
+ *
+ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
+ * are looked at when adding an entry. The flags are unused.
+ *
+ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME and KDBUS_ITEM_ID
+ * are used to match messages from userspace, while the others apply to
+ * kernel-generated notifications.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_match_db_add(struct kdbus_conn *conn,
+ struct kdbus_cmd_match *cmd)
+{
+ struct kdbus_match_entry *entry = NULL;
+ struct kdbus_match_db *db = conn->match_db;
+ struct kdbus_item *item;
+ LIST_HEAD(list);
+ int ret = 0;
+
+ lockdep_assert_held(conn);
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ ret = -ENOMEM;
+ goto exit_free;
+ }
+
+ entry->cookie = cmd->cookie;
+ INIT_LIST_HEAD(&entry->list_entry);
+ INIT_LIST_HEAD(&entry->rules_list);
+
+ KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+ struct kdbus_match_rule *rule;
+ size_t size = item->size - offsetof(struct kdbus_item, data);
+
+ rule = kzalloc(sizeof(*rule), GFP_KERNEL);
+ if (!rule) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ switch (item->type) {
+ case KDBUS_ITEM_BLOOM_MASK: {
+ u64 generations;
+ u64 remainder;
+
+ generations = div64_u64_rem(size, conn->bus->bloom.size,
+ &remainder);
+ if (size < conn->bus->bloom.size ||
+ remainder > 0) {
+ ret = -EDOM;
+ break;
+ }
+
+ rule->bloom_mask.data = kmemdup(item->data,
+ size, GFP_KERNEL);
+ if (!rule->bloom_mask.data) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ /* we get an array of n generations of bloom masks */
+ rule->bloom_mask.generations = generations;
+
+ break;
+ }
+ case KDBUS_ITEM_NAME:
+ ret = kdbus_item_validate_name(item);
+ if (ret < 0)
+ break;
+
+ rule->name = kstrdup(item->str, GFP_KERNEL);
+ if (!rule->name)
+ ret = -ENOMEM;
+
+ break;
+
+ case KDBUS_ITEM_ID:
+ rule->src_id = item->id;
+ break;
+
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_REMOVE:
+ case KDBUS_ITEM_NAME_CHANGE: {
+ rule->old_id = item->name_change.old_id.id;
+ rule->new_id = item->name_change.new_id.id;
+
+ if (size > sizeof(struct kdbus_notify_name_change)) {
+ rule->name = kstrdup(item->name_change.name,
+ GFP_KERNEL);
+ if (!rule->name)
+ ret = -ENOMEM;
+ }
+
+ break;
+ }
+
+ case KDBUS_ITEM_ID_ADD:
+ case KDBUS_ITEM_ID_REMOVE:
+ if (item->type == KDBUS_ITEM_ID_ADD)
+ rule->new_id = item->id_change.id;
+ else
+ rule->old_id = item->id_change.id;
+
+ break;
+
+ default:
+ kfree(rule);
+ continue;
+ }
+
+ if (ret < 0) {
+ kfree(rule);
+ break;
+ }
+
+ rule->type = item->type;
+
+ list_add_tail(&rule->rules_entry, &entry->rules_list);
+ }
+
+ mutex_lock(&db->entries_lock);
+
+ /* Remove any entry that has the same cookie as the current one. */
+ if (cmd->flags & KDBUS_MATCH_REPLACE)
+ __kdbus_match_db_remove_unlocked(db, entry->cookie);
+
+ /*
+ * If the above removal caught any entry, there will be room for the
+ * new one.
+ */
+ if (++db->entries > KDBUS_MATCH_MAX) {
+ --db->entries;
+ ret = -EMFILE;
+ }
+ if (ret == 0)
+ list_add_tail(&entry->list_entry, &db->entries_list);
+ else
+ kdbus_match_entry_free(entry);
+ mutex_unlock(&db->entries_lock);
+
+exit_free:
+ return ret;
+}
+
+/**
+ * kdbus_match_db_remove() - remove an entry from the match database
+ * @conn: The connection that was used in the ioctl call
+ * @cmd: Pointer to the match data structure
+ *
+ * This function is used in the context of the KDBUS_CMD_MATCH_REMOVE
+ * ioctl interface.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_match_db_remove(struct kdbus_conn *conn,
+ struct kdbus_cmd_match *cmd)
+{
+ struct kdbus_match_db *db = conn->match_db;
+ int ret;
+
+ lockdep_assert_held(conn);
+
+ mutex_lock(&db->entries_lock);
+ ret = __kdbus_match_db_remove_unlocked(db, cmd->cookie);
+ mutex_unlock(&db->entries_lock);
+
+ return ret;
+}
diff --git a/drivers/misc/kdbus/match.h b/drivers/misc/kdbus/match.h
new file mode 100644
index 000000000000..72888080a8d0
--- /dev/null
+++ b/drivers/misc/kdbus/match.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_MATCH_H
+#define __KDBUS_MATCH_H
+
+struct kdbus_conn;
+struct kdbus_kmsg;
+struct kdbus_match_db;
+
+int kdbus_match_db_new(struct kdbus_match_db **db);
+void kdbus_match_db_free(struct kdbus_match_db *db);
+int kdbus_match_db_add(struct kdbus_conn *conn,
+ struct kdbus_cmd_match *cmd);
+int kdbus_match_db_remove(struct kdbus_conn *conn,
+ struct kdbus_cmd_match *cmd);
+bool kdbus_match_db_match_kmsg(struct kdbus_match_db *db,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg);
+#endif
diff --git a/drivers/misc/kdbus/notify.c b/drivers/misc/kdbus/notify.c
new file mode 100644
index 000000000000..c68add64cbf0
--- /dev/null
+++ b/drivers/misc/kdbus/notify.c
@@ -0,0 +1,235 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "item.h"
+#include "message.h"
+#include "notify.h"
+
+static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
+ u64 cookie, u64 msg_type)
+{
+ struct kdbus_kmsg *kmsg = NULL;
+ int ret;
+
+ BUG_ON(id == 0);
+
+ ret = kdbus_kmsg_new(0, &kmsg);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * a kernel-generated notification can only contain one
+ * struct kdbus_item, so make a shortcut here for
+ * faster lookup in the match db.
+ */
+ kmsg->notify_type = msg_type;
+ kmsg->msg.dst_id = id;
+ kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+ kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+ kmsg->msg.cookie_reply = cookie;
+ kmsg->msg.items[0].type = msg_type;
+
+ spin_lock(&bus->notify_lock);
+ list_add_tail(&kmsg->queue_entry, &bus->notify_list);
+ spin_unlock(&bus->notify_lock);
+ return ret;
+}
+
+/**
+ * kdbus_notify_reply_timeout() - queue a timeout reply
+ * @bus: Bus which queues the messages
+ * @id: The destination's connection ID
+ * @cookie: The cookie to set in the reply.
+ *
+ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
+{
+ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
+}
+
+/**
+ * kdbus_notify_reply_dead() - queue a 'dead' reply
+ * @bus: Bus which queues the messages
+ * @id: The destination's connection ID
+ * @cookie: The cookie to set in the reply.
+ *
+ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
+{
+ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
+}
+
+/**
+ * kdbus_notify_name_change() - queue a notification about a name owner change
+ * @bus: Bus which queues the messages
+ * @type: The type of the notification; KDBUS_ITEM_NAME_ADD,
+ * KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
+ * @old_id: The id of the connection that used to own the name
+ * @new_id: The id of the new owner connection
+ * @old_flags: The flags to pass in the KDBUS_ITEM flags field for
+ * the old owner
+ * @new_flags: The flags to pass in the KDBUS_ITEM flags field for
+ * the new owner
+ * @name: The name that was removed or assigned to a new owner
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
+ u64 old_id, u64 new_id,
+ u64 old_flags, u64 new_flags,
+ const char *name)
+{
+ struct kdbus_kmsg *kmsg = NULL;
+ size_t name_len, extra_size;
+ int ret;
+
+ name_len = strlen(name) + 1;
+ extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
+ ret = kdbus_kmsg_new(extra_size, &kmsg);
+ if (ret < 0)
+ return ret;
+
+ kmsg->msg.dst_id = KDBUS_DST_ID_BROADCAST;
+ kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+ kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+ kmsg->notify_type = type;
+ kmsg->notify_old_id = old_id;
+ kmsg->notify_new_id = new_id;
+ kmsg->msg.items[0].type = type;
+ kmsg->msg.items[0].name_change.old_id.id = old_id;
+ kmsg->msg.items[0].name_change.old_id.flags = old_flags;
+ kmsg->msg.items[0].name_change.new_id.id = new_id;
+ kmsg->msg.items[0].name_change.new_id.flags = new_flags;
+ memcpy(kmsg->msg.items[0].name_change.name, name, name_len);
+ kmsg->notify_name = kmsg->msg.items[0].name_change.name;
+
+ spin_lock(&bus->notify_lock);
+ list_add_tail(&kmsg->queue_entry, &bus->notify_list);
+ spin_unlock(&bus->notify_lock);
+ return ret;
+}
+
+/**
+ * kdbus_notify_id_change() - queue a notification about a unique ID change
+ * @bus: Bus which queues the messages
+ * @type: The type of the notification; KDBUS_ITEM_ID_ADD or
+ * KDBUS_ITEM_ID_REMOVE
+ * @id: The id of the connection that was added or removed
+ * @flags: The flags to pass in the KDBUS_ITEM flags field
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
+{
+ struct kdbus_kmsg *kmsg = NULL;
+ int ret;
+
+ ret = kdbus_kmsg_new(sizeof(struct kdbus_notify_id_change), &kmsg);
+ if (ret < 0)
+ return ret;
+
+ kmsg->msg.dst_id = KDBUS_DST_ID_BROADCAST;
+ kmsg->msg.src_id = KDBUS_SRC_ID_KERNEL;
+ kmsg->msg.payload_type = KDBUS_PAYLOAD_KERNEL;
+ kmsg->notify_type = type;
+
+ switch (type) {
+ case KDBUS_ITEM_ID_ADD:
+ kmsg->notify_new_id = id;
+ break;
+
+ case KDBUS_ITEM_ID_REMOVE:
+ kmsg->notify_old_id = id;
+ break;
+
+ default:
+ BUG();
+ }
+
+ kmsg->msg.items[0].type = type;
+ kmsg->msg.items[0].id_change.id = id;
+ kmsg->msg.items[0].id_change.flags = flags;
+
+ spin_lock(&bus->notify_lock);
+ list_add_tail(&kmsg->queue_entry, &bus->notify_list);
+ spin_unlock(&bus->notify_lock);
+ return ret;
+}
+
+/**
+ * kdbus_notify_flush() - send a list of collected messages
+ * @bus: Bus which queues the messages
+ *
+ * The list is empty after sending the messages.
+ */
+void kdbus_notify_flush(struct kdbus_bus *bus)
+{
+ LIST_HEAD(notify_list);
+ struct kdbus_kmsg *kmsg, *tmp;
+ struct kdbus_ep *ep = NULL;
+
+ /* bus->ep is only valid as long as the bus is alive */
+ mutex_lock(&bus->lock);
+ if (!bus->disconnected)
+ ep = kdbus_ep_ref(bus->ep);
+ mutex_unlock(&bus->lock);
+
+ mutex_lock(&bus->notify_flush_lock);
+
+ spin_lock(&bus->notify_lock);
+ list_splice_init(&bus->notify_list, &notify_list);
+ spin_unlock(&bus->notify_lock);
+
+ list_for_each_entry_safe(kmsg, tmp, &notify_list, queue_entry) {
+ if (ep)
+ kdbus_conn_kmsg_send(ep, NULL, kmsg);
+ list_del(&kmsg->queue_entry);
+ kdbus_kmsg_free(kmsg);
+ }
+
+ mutex_unlock(&bus->notify_flush_lock);
+
+ kdbus_ep_unref(ep);
+}
+
+/**
+ * kdbus_notify_free() - free a list of collected messages
+ * @bus: Bus which queues the messages
+ */
+void kdbus_notify_free(struct kdbus_bus *bus)
+{
+ struct kdbus_kmsg *kmsg, *tmp;
+
+ list_for_each_entry_safe(kmsg, tmp, &bus->notify_list, queue_entry) {
+ list_del(&kmsg->queue_entry);
+ kdbus_kmsg_free(kmsg);
+ }
+}
diff --git a/drivers/misc/kdbus/notify.h b/drivers/misc/kdbus/notify.h
new file mode 100644
index 000000000000..f6ebd56e2dca
--- /dev/null
+++ b/drivers/misc/kdbus/notify.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_NOTIFY_H
+#define __KDBUS_NOTIFY_H
+
+struct kdbus_bus;
+
+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
+ u64 old_id, u64 new_id,
+ u64 old_flags, u64 new_flags,
+ const char *name);
+void kdbus_notify_flush(struct kdbus_bus *bus);
+void kdbus_notify_free(struct kdbus_bus *bus);
+#endif
--
2.1.2

Greg Kroah-Hartman
2014-10-29 22:06:41 UTC
From: Daniel Mack <***@zonque.org>

This patch adds code to create and destroy connections, to validate
incoming messages and to maintain the queue of messages that are
associated with a connection.

Note that connection and queue have a 1:1 relation; the code is only
split into two parts for cleaner separation and better readability.
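
As a rough illustration of that 1:1 relation, here is a minimal
user-space sketch. It assumes the queue is embedded directly in the
connection object, which is how the code below appears to use it
(&conn->queue, conn->queue.msg_count). All sketch_* names are
hypothetical stand-ins and not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection message queue. */
struct sketch_queue {
	unsigned int msg_count;	/* number of messages currently queued */
};

/* Hypothetical stand-in for a connection. */
struct sketch_conn {
	unsigned long long id;		/* unique connection ID */
	struct sketch_queue queue;	/* embedded: exactly one queue per connection */
};

/* Recover the owning connection from a pointer to its embedded queue. */
static struct sketch_conn *sketch_conn_from_queue(struct sketch_queue *q)
{
	assert(q != NULL);
	return (struct sketch_conn *)((char *)q -
				      offsetof(struct sketch_conn, queue));
}

/* Account for one more queued message; returns the new count. */
static unsigned int sketch_queue_add(struct sketch_queue *q)
{
	return ++q->msg_count;
}
```

Because the queue lives inside the connection, neither object can
outlive the other, and no separate reference counting is needed
between the two.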

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/connection.c | 1751 +++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/connection.h | 177 ++++
drivers/misc/kdbus/item.c | 256 ++++++
drivers/misc/kdbus/item.h | 40 +
drivers/misc/kdbus/message.c | 420 ++++++++++
drivers/misc/kdbus/message.h | 72 ++
drivers/misc/kdbus/queue.c | 602 ++++++++++++++
drivers/misc/kdbus/queue.h | 82 ++
drivers/misc/kdbus/util.h | 2 +-
9 files changed, 3401 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/kdbus/connection.c
create mode 100644 drivers/misc/kdbus/connection.h
create mode 100644 drivers/misc/kdbus/item.c
create mode 100644 drivers/misc/kdbus/item.h
create mode 100644 drivers/misc/kdbus/message.c
create mode 100644 drivers/misc/kdbus/message.h
create mode 100644 drivers/misc/kdbus/queue.c
create mode 100644 drivers/misc/kdbus/queue.h

diff --git a/drivers/misc/kdbus/connection.c b/drivers/misc/kdbus/connection.c
new file mode 100644
index 000000000000..5b1f3ed51611
--- /dev/null
+++ b/drivers/misc/kdbus/connection.c
@@ -0,0 +1,1751 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "endpoint.h"
+#include "match.h"
+#include "message.h"
+#include "metadata.h"
+#include "names.h"
+#include "domain.h"
+#include "item.h"
+#include "notify.h"
+#include "policy.h"
+#include "util.h"
+#include "queue.h"
+
+struct kdbus_conn_reply;
+
+#define KDBUS_CONN_ACTIVE_BIAS (INT_MIN + 1)
+
+/**
+ * struct kdbus_conn_reply - an entry of kdbus_conn's list of replies
+ * @kref: Ref-count of this object
+ * @entry: The entry of the connection's reply_list
+ * @reply_dst: The connection the reply will be sent to (method origin)
+ * @queue_entry: The queue entry item that is prepared by the replying
+ * connection
+ * @deadline_ns: The deadline of the reply, in nanoseconds
+ * @cookie: The cookie of the requesting message
+ * @name_id: ID of the well-known name the original msg was sent to
+ * @sync: The reply block is waiting for synchronous I/O
+ * @waiting: The condition to synchronously wait for
+ * @interrupted: The sync reply was left in an interrupted state
+ * @err: The error code for the synchronous reply
+ */
+struct kdbus_conn_reply {
+ struct kref kref;
+ struct list_head entry;
+ struct kdbus_conn *reply_dst;
+ struct kdbus_queue_entry *queue_entry;
+ u64 deadline_ns;
+ u64 cookie;
+ u64 name_id;
+ bool sync:1;
+ bool waiting:1;
+ bool interrupted:1;
+ int err;
+};
+
+static int kdbus_conn_reply_new(struct kdbus_conn_reply **reply_wait,
+ struct kdbus_conn *reply_dst,
+ const struct kdbus_msg *msg,
+ struct kdbus_name_entry *name_entry)
+{
+ bool sync = msg->flags & KDBUS_MSG_FLAGS_SYNC_REPLY;
+ struct kdbus_conn_reply *r;
+ int ret = 0;
+
+ if (atomic_inc_return(&reply_dst->reply_count) >
+ KDBUS_CONN_MAX_REQUESTS_PENDING) {
+ ret = -EMLINK;
+ goto exit_dec_reply_count;
+ }
+
+ r = kzalloc(sizeof(*r), GFP_KERNEL);
+ if (!r) {
+ ret = -ENOMEM;
+ goto exit_dec_reply_count;
+ }
+
+ kref_init(&r->kref);
+ r->reply_dst = kdbus_conn_ref(reply_dst);
+ r->cookie = msg->cookie;
+ r->name_id = name_entry ? name_entry->name_id : 0;
+ r->deadline_ns = msg->timeout_ns;
+
+ if (sync) {
+ r->sync = true;
+ r->waiting = true;
+ }
+
+ *reply_wait = r;
+
+exit_dec_reply_count:
+ if (ret < 0)
+ atomic_dec(&reply_dst->reply_count);
+
+ return ret;
+}
+
+static void __kdbus_conn_reply_free(struct kref *kref)
+{
+ struct kdbus_conn_reply *reply =
+ container_of(kref, struct kdbus_conn_reply, kref);
+
+ atomic_dec(&reply->reply_dst->reply_count);
+ kdbus_conn_unref(reply->reply_dst);
+ kfree(reply);
+}
+
+static struct kdbus_conn_reply*
+kdbus_conn_reply_ref(struct kdbus_conn_reply *r)
+{
+ if (r)
+ kref_get(&r->kref);
+ return r;
+}
+
+static struct kdbus_conn_reply*
+kdbus_conn_reply_unref(struct kdbus_conn_reply *r)
+{
+ if (r)
+ kref_put(&r->kref, __kdbus_conn_reply_free);
+ return NULL;
+}
+
+static void kdbus_conn_reply_sync(struct kdbus_conn_reply *reply, int err)
+{
+ BUG_ON(!reply->sync);
+
+ list_del_init(&reply->entry);
+ reply->waiting = false;
+ reply->err = err;
+ wake_up_interruptible(&reply->reply_dst->wait);
+}
+
+/*
+ * Check for maximum number of messages per individual user. This
+ * should prevent a single user from being able to fill the receiver's
+ * queue.
+ */
+static int kdbus_conn_queue_user_quota(struct kdbus_conn *conn,
+ const struct kdbus_conn *conn_src,
+ struct kdbus_queue_entry *entry)
+{
+ unsigned int user;
+
+ if (!conn_src)
+ return 0;
+
+ if (ns_capable(&init_user_ns, CAP_IPC_OWNER))
+ return 0;
+
+ /*
+ * Only after the queue grows above the maximum number of messages
+ * per individual user, we start to count all further messages
+ * from the sending users.
+ */
+ if (conn->queue.msg_count < KDBUS_CONN_MAX_MSGS_PER_USER)
+ return 0;
+
+ user = conn_src->user->idr;
+
+ /* extend array to store the user message counters */
+ if (user >= conn->msg_users_max) {
+ unsigned int *users;
+ unsigned int i;
+
+ i = 8 + KDBUS_ALIGN8(user);
+ users = kcalloc(i, sizeof(unsigned int), GFP_KERNEL);
+ if (!users)
+ return -ENOMEM;
+
+ memcpy(users, conn->msg_users,
+ sizeof(unsigned int) * conn->msg_users_max);
+ kfree(conn->msg_users);
+ conn->msg_users = users;
+ conn->msg_users_max = i;
+ }
+
+ if (conn->msg_users[user] > KDBUS_CONN_MAX_MSGS_PER_USER)
+ return -ENOBUFS;
+
+ conn->msg_users[user]++;
+ entry->user = user;
+ return 0;
+}
+
+static void kdbus_conn_work(struct work_struct *work)
+{
+ struct kdbus_conn *conn;
+ struct kdbus_conn_reply *reply, *reply_tmp;
+ u64 deadline = ~0ULL;
+ struct timespec64 ts;
+ u64 now;
+
+ conn = container_of(work, struct kdbus_conn, work.work);
+ ktime_get_ts64(&ts);
+ now = timespec64_to_ns(&ts);
+
+ mutex_lock(&conn->lock);
+ if (!kdbus_conn_active(conn)) {
+ mutex_unlock(&conn->lock);
+ return;
+ }
+
+ list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
+ /*
+ * If the reply block is waiting for synchronous I/O,
+ * the timeout is handled by wait_event_*_timeout(),
+ * so we don't have to care for it here.
+ */
+ if (reply->sync && !reply->interrupted)
+ continue;
+
+ if (reply->deadline_ns > now) {
+ /* remember next timeout */
+ if (deadline > reply->deadline_ns)
+ deadline = reply->deadline_ns;
+
+ continue;
+ }
+
+ /*
+ * A zero deadline means the connection died, was
+ * cleaned up already and the notification was sent.
+ * Don't send notifications for reply trackers that were
+ * left in an interrupted syscall state.
+ */
+ if (reply->deadline_ns != 0 && !reply->interrupted)
+ kdbus_notify_reply_timeout(conn->bus,
+ reply->reply_dst->id,
+ reply->cookie);
+
+ list_del_init(&reply->entry);
+ kdbus_conn_reply_unref(reply);
+ }
+
+ /* rearm delayed work with next timeout */
+ if (deadline != ~0ULL)
+ schedule_delayed_work(&conn->work,
+ nsecs_to_jiffies(deadline - now));
+
+ mutex_unlock(&conn->lock);
+
+ kdbus_notify_flush(conn->bus);
+}
+
+/**
+ * kdbus_cmd_msg_recv() - receive a message from the queue
+ * @conn: Connection to work on
+ * @recv: The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_cmd_msg_recv(struct kdbus_conn *conn,
+ struct kdbus_cmd_recv *recv)
+{
+ struct kdbus_queue_entry *entry = NULL;
+ int ret;
+
+ if (recv->offset > 0)
+ return -EINVAL;
+
+ mutex_lock(&conn->lock);
+ ret = kdbus_queue_entry_peek(&conn->queue, recv->priority,
+ recv->flags & KDBUS_RECV_USE_PRIORITY,
+ &entry);
+ if (ret < 0)
+ goto exit_unlock;
+
+ BUG_ON(!entry);
+
+ /* just drop the message */
+ if (recv->flags & KDBUS_RECV_DROP) {
+ bool reply_found = false;
+
+ if (entry->reply) {
+ struct kdbus_conn_reply *r;
+
+ /*
+ * Walk the list of pending replies and see if the
+ * one attached to this entry item is still there.
+ * It might have been removed by an incoming reply,
+ * and we currently don't track reply entries in that
+ * direction in order to prevent potentially dangling
+ * pointers.
+ */
+ list_for_each_entry(r, &conn->reply_list, entry) {
+ if (r == entry->reply) {
+ reply_found = true;
+ break;
+ }
+ }
+ }
+
+ if (reply_found) {
+ if (entry->reply->sync) {
+ kdbus_conn_reply_sync(entry->reply, -EPIPE);
+ } else {
+ list_del_init(&entry->reply->entry);
+ kdbus_conn_reply_unref(entry->reply);
+ kdbus_notify_reply_dead(conn->bus,
+ entry->src_id,
+ entry->cookie);
+ }
+ }
+
+ kdbus_queue_entry_remove(conn, entry);
+ kdbus_pool_slice_free(entry->slice);
+ mutex_unlock(&conn->lock);
+
+ kdbus_queue_entry_free(entry);
+
+ goto exit;
+ }
+
+ /* Give the offset back to the caller. */
+ recv->offset = kdbus_pool_slice_offset(entry->slice);
+
+ /*
+ * Just return the location of the next message. Do not install
+ * file descriptors or anything else. This is usually used to
+ * determine the sender of the next queued message.
+ *
+ * File descriptor numbers referenced in the message items
+ * are undefined; they are only valid with a full receive,
+ * not with peek.
+ */
+ if (recv->flags & KDBUS_RECV_PEEK) {
+ kdbus_pool_slice_flush(entry->slice);
+ goto exit_unlock;
+ }
+
+ ret = kdbus_queue_entry_install(entry);
+ kdbus_pool_slice_make_public(entry->slice);
+ kdbus_queue_entry_remove(conn, entry);
+ kdbus_queue_entry_free(entry);
+
+exit_unlock:
+ mutex_unlock(&conn->lock);
+exit:
+ kdbus_notify_flush(conn->bus);
+ return ret;
+}
+
+static int kdbus_conn_find_reply(struct kdbus_conn *conn_replying,
+ struct kdbus_conn *conn_reply_dst,
+ uint64_t cookie,
+ struct kdbus_conn_reply **reply)
+{
+ struct kdbus_conn_reply *r;
+ int ret = -ENOENT;
+
+ if (atomic_read(&conn_reply_dst->reply_count) == 0)
+ return -ENOENT;
+
+ list_for_each_entry(r, &conn_replying->reply_list, entry) {
+ if (r->reply_dst == conn_reply_dst &&
+ r->cookie == cookie) {
+ *reply = r;
+ ret = 0;
+ break;
+ }
+ }
+
+ return ret;
+}
+
+/**
+ * kdbus_cmd_msg_cancel() - cancel all pending sync requests
+ * with the given cookie
+ * @conn: The connection
+ * @cookie: The cookie
+ *
+ * Return: 0 on success, or -ENOENT if no pending request with that
+ * cookie was found.
+ */
+int kdbus_cmd_msg_cancel(struct kdbus_conn *conn,
+ u64 cookie)
+{
+ struct kdbus_conn_reply *reply;
+ struct kdbus_conn *c;
+ bool found = false;
+ int ret, i;
+
+ if (atomic_read(&conn->reply_count) == 0)
+ return -ENOENT;
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ down_read(&conn->bus->conn_rwlock);
+ hash_for_each(conn->bus->conn_hash, i, c, hentry) {
+ if (c == conn)
+ continue;
+
+ mutex_lock(&c->lock);
+ ret = kdbus_conn_find_reply(c, conn, cookie, &reply);
+ if (ret == 0) {
+ kdbus_conn_reply_sync(reply, -ECANCELED);
+ found = true;
+ }
+ mutex_unlock(&c->lock);
+ }
+ up_read(&conn->bus->conn_rwlock);
+
+ return found ? 0 : -ENOENT;
+}
+
+static int kdbus_conn_check_access(struct kdbus_ep *ep,
+ const struct kdbus_msg *msg,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst,
+ struct kdbus_conn_reply **reply_wake)
+{
+ bool allowed = false;
+ int ret;
+
+ /*
+ * Walk the conn_src's list of expected replies. If there's any
+ * matching entry, allow the message to be sent, and remove it.
+ */
+ if (reply_wake && msg->cookie_reply > 0) {
+ struct kdbus_conn_reply *r;
+
+ mutex_lock(&conn_src->lock);
+ ret = kdbus_conn_find_reply(conn_src, conn_dst,
+ msg->cookie_reply, &r);
+ if (ret == 0) {
+ list_del_init(&r->entry);
+ if (r->sync)
+ *reply_wake = kdbus_conn_reply_ref(r);
+ else
+ kdbus_conn_reply_unref(r);
+
+ allowed = true;
+ }
+ mutex_unlock(&conn_src->lock);
+ }
+
+ if (allowed)
+ return 0;
+
+ /* ... otherwise, ask the policy DBs for permission */
+ ret = kdbus_ep_policy_check_talk_access(ep, conn_src, conn_dst);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+/* enqueue a message into the receiver's pool */
+static int kdbus_conn_entry_insert(struct kdbus_conn *conn,
+ struct kdbus_conn *conn_src,
+ const struct kdbus_kmsg *kmsg,
+ struct kdbus_conn_reply *reply)
+{
+ struct kdbus_queue_entry *entry;
+ int ret;
+
+ mutex_lock(&conn->lock);
+
+ /* limit the maximum number of queued messages */
+ if (!ns_capable(&init_user_ns, CAP_IPC_OWNER) &&
+ conn->queue.msg_count > KDBUS_CONN_MAX_MSGS) {
+ ret = -ENOBUFS;
+ goto exit_unlock;
+ }
+
+ if (!kdbus_conn_active(conn)) {
+ ret = -ECONNRESET;
+ goto exit_unlock;
+ }
+
+ /* The connection does not accept file descriptors */
+ if (!(conn->flags & KDBUS_HELLO_ACCEPT_FD) && kmsg->fds_count > 0) {
+ ret = -ECOMM;
+ goto exit_unlock;
+ }
+
+ ret = kdbus_queue_entry_alloc(conn, kmsg, &entry);
+ if (ret < 0)
+ goto exit_unlock;
+
+ /* limit the number of queued messages from the same individual user */
+ ret = kdbus_conn_queue_user_quota(conn, conn_src, entry);
+ if (ret < 0)
+ goto exit_queue_free;
+
+ /*
+ * Remember the reply associated with this queue entry, so we can
+ * move the reply entry's connection when a connection moves from an
+ * activator to an implementor.
+ */
+ entry->reply = reply;
+
+ if (reply) {
+ list_add(&reply->entry, &conn->reply_list);
+ if (!reply->sync)
+ schedule_delayed_work(&conn->work, 0);
+ }
+
+ /* link the message into the receiver's queue */
+ kdbus_queue_entry_add(&conn->queue, entry);
+ mutex_unlock(&conn->lock);
+
+ /* wake up poll() */
+ wake_up_interruptible(&conn->wait);
+ return 0;
+
+exit_queue_free:
+ kdbus_queue_entry_free(entry);
+exit_unlock:
+ mutex_unlock(&conn->lock);
+ return ret;
+}
+
+static int kdbus_kmsg_attach_metadata(struct kdbus_kmsg *kmsg,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst)
+{
+ u64 attach_flags;
+
+ /*
+ * Append metadata items according to the destination connection's
+ * attach flags. If the source connection has faked credentials, the
+ * metadata object associated with the kmsg has been pre-filled with
+ * conn_src->owner_meta, and we only attach the connection's name and
+ * currently owned names on top of that.
+ */
+ attach_flags = atomic64_read(&conn_dst->attach_flags);
+
+ if (conn_src->owner_meta)
+ attach_flags &= KDBUS_ATTACH_NAMES | KDBUS_ATTACH_CONN_NAME;
+
+ return kdbus_meta_append(kmsg->meta, conn_src, kmsg->seq, attach_flags);
+}
+
+static void kdbus_conn_broadcast(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ const struct kdbus_msg *msg = &kmsg->msg;
+ struct kdbus_bus *bus = ep->bus;
+ struct kdbus_conn *conn_dst;
+ unsigned int i;
+ int ret = 0;
+
+ down_read(&bus->conn_rwlock);
+
+ hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
+ if (conn_dst->id == msg->src_id)
+ continue;
+
+ /*
+ * Activator or policy holder connections will
+ * not receive any broadcast messages, only
+ * ordinary and monitor ones.
+ */
+ if (!kdbus_conn_is_connected(conn_dst) &&
+ !kdbus_conn_is_monitor(conn_dst))
+ continue;
+
+ if (!kdbus_match_db_match_kmsg(conn_dst->match_db, conn_src,
+ kmsg))
+ continue;
+
+ ret = kdbus_ep_policy_check_notification(conn_dst->ep,
+ conn_dst, kmsg);
+ if (ret < 0)
+ continue;
+
+ /*
+ * The first receiver which requests additional
+ * metadata causes the message to carry it; all
+ * receivers after that will see all of the added
+ * data, even when they did not ask for it.
+ */
+ if (conn_src) {
+ /* Check if conn_src is allowed to signal */
+ ret = kdbus_ep_policy_check_broadcast(conn_dst->ep,
+ conn_src,
+ conn_dst);
+ if (ret < 0)
+ continue;
+
+ ret = kdbus_ep_policy_check_src_names(conn_dst->ep,
+ conn_src,
+ conn_dst);
+ if (ret < 0)
+ continue;
+
+ ret = kdbus_kmsg_attach_metadata(kmsg, conn_src,
+ conn_dst);
+ if (ret < 0)
+ goto exit_unlock;
+ }
+
+ kdbus_conn_entry_insert(conn_dst, conn_src, kmsg, NULL);
+ }
+
+exit_unlock:
+ up_read(&bus->conn_rwlock);
+}
+
+static void kdbus_conn_eavesdrop(struct kdbus_ep *ep, struct kdbus_conn *conn,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_conn *c;
+ int ret;
+
+ /*
+ * Monitor connections get all messages; ignore possible errors
+ * when sending messages to monitor connections.
+ */
+
+ down_read(&ep->bus->conn_rwlock);
+ list_for_each_entry(c, &ep->bus->monitors_list, monitor_entry) {
+ /*
+ * The first monitor which requests additional
+ * metadata causes the message to carry it; all
+ * monitors after that will see all of the added
+ * data, even when they did not ask for it.
+ */
+ if (conn) {
+ ret = kdbus_kmsg_attach_metadata(kmsg, conn, c);
+ if (ret < 0)
+ break;
+ }
+
+ kdbus_conn_entry_insert(c, NULL, kmsg, NULL);
+ }
+ up_read(&ep->bus->conn_rwlock);
+}
+
+static int kdbus_conn_wait_reply(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_conn *conn_dst,
+ struct kdbus_msg *msg,
+ struct kdbus_conn_reply *reply_wait,
+ u64 timeout_ns)
+{
+ struct kdbus_queue_entry *entry;
+ int r, ret;
+
+ /*
+ * Block until the reply arrives. reply_wait is left untouched
+ * by the timeout scans that might be conducted for other,
+ * asynchronous replies of conn_src.
+ */
+ r = wait_event_interruptible_timeout(reply_wait->reply_dst->wait,
+ !reply_wait->waiting || !kdbus_conn_active(conn_src),
+ nsecs_to_jiffies(timeout_ns));
+ if (r < 0) {
+ /*
+ * Interrupted system call. Pass the return value down the
+ * chain. Mark the reply as interrupted, so the cleanup work
+ * can remove it once it expires, but do not unlink it from
+ * the list or drop its reference here. Once the syscall
+ * restarts, we'll pick it up and wait on it again.
+ */
+ mutex_lock(&conn_dst->lock);
+ reply_wait->interrupted = true;
+ schedule_delayed_work(&conn_dst->work, 0);
+ mutex_unlock(&conn_dst->lock);
+
+ return r;
+ }
+
+ if (r == 0)
+ ret = -ETIMEDOUT;
+ else if (!kdbus_conn_active(conn_src))
+ ret = -ECONNRESET;
+ else
+ ret = reply_wait->err;
+
+ mutex_lock(&conn_dst->lock);
+ list_del_init(&reply_wait->entry);
+ mutex_unlock(&conn_dst->lock);
+
+ mutex_lock(&conn_src->lock);
+ reply_wait->waiting = false;
+ entry = reply_wait->queue_entry;
+ if (entry) {
+ if (ret == 0)
+ ret = kdbus_queue_entry_install(entry);
+
+ msg->offset_reply = kdbus_pool_slice_offset(entry->slice);
+ kdbus_pool_slice_make_public(entry->slice);
+ kdbus_queue_entry_free(entry);
+ }
+ mutex_unlock(&conn_src->lock);
+
+ kdbus_conn_reply_unref(reply_wait);
+
+ return ret;
+}
+
+/**
+ * kdbus_conn_kmsg_send() - send a message
+ * @ep: Endpoint to send from
+ * @conn_src: Connection, kernel-generated messages do not have one
+ * @kmsg: Message to send
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_conn_kmsg_send(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg)
+{
+ struct kdbus_conn_reply *reply_wait = NULL;
+ struct kdbus_conn_reply *reply_wake = NULL;
+ struct kdbus_name_entry *name_entry = NULL;
+ struct kdbus_msg *msg = &kmsg->msg;
+ struct kdbus_conn *conn_dst = NULL;
+ struct kdbus_bus *bus = ep->bus;
+ bool sync = msg->flags & KDBUS_MSG_FLAGS_SYNC_REPLY;
+ int ret = 0;
+
+ /* assign domain-global message sequence number */
+ BUG_ON(kmsg->seq > 0);
+ kmsg->seq = atomic64_inc_return(&bus->domain->msg_seq_last);
+
+ /* non-kernel senders append credentials/metadata */
+ if (conn_src) {
+ /*
+ * If a connection has installed faked credentials when it was
+ * created, make sure only those are sent out as attachments
+ * of messages, and nothing that is gathered or retrieved from
+ * 'current' at the time of sending.
+ *
+ * Hence, in such cases, duplicate the connection's owner_meta,
+ * and take care not to augment it by attaching any new items.
+ */
+ if (conn_src->owner_meta)
+ ret = kdbus_meta_dup(conn_src->owner_meta, &kmsg->meta);
+ else
+ ret = kdbus_meta_new(&kmsg->meta);
+
+ if (ret < 0)
+ return ret;
+ }
+
+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
+ kdbus_conn_broadcast(ep, conn_src, kmsg);
+ return 0;
+ }
+
+ if (kmsg->dst_name) {
+ name_entry = kdbus_name_lock(bus->name_registry,
+ kmsg->dst_name);
+ if (!name_entry)
+ return -ESRCH;
+
+ /*
+ * If both a name and a connection ID are given as destination
+ * of a message, check that the currently owning connection of
+ * the name matches the specified ID.
+ * This way, we allow userspace to send the message to a
+ * specific connection by ID only if the connection currently
+ * owns the given name.
+ */
+ if (msg->dst_id != KDBUS_DST_ID_NAME &&
+ msg->dst_id != name_entry->conn->id) {
+ ret = -EREMCHG;
+ goto exit_name_unlock;
+ }
+
+ if (!name_entry->conn && name_entry->activator)
+ conn_dst = kdbus_conn_ref(name_entry->activator);
+ else
+ conn_dst = kdbus_conn_ref(name_entry->conn);
+
+ if ((msg->flags & KDBUS_MSG_FLAGS_NO_AUTO_START) &&
+ kdbus_conn_is_activator(conn_dst)) {
+ ret = -EADDRNOTAVAIL;
+ goto exit_unref;
+ }
+ } else {
+ /* unicast message to unique name */
+ conn_dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
+ if (!conn_dst)
+ return -ENXIO;
+
+ /*
+ * Special-purpose connections are not allowed to be addressed
+ * via their unique IDs.
+ */
+ if (!kdbus_conn_is_connected(conn_dst)) {
+ ret = -ENXIO;
+ goto exit_unref;
+ }
+ }
+
+ /*
+ * Record the sequence number of the registered name;
+ * it will be passed on to the queue, in case messages
+ * addressed to a name need to be moved from or to
+ * activator connections of the same name.
+ */
+ if (name_entry)
+ kmsg->dst_name_id = name_entry->name_id;
+
+ if (conn_src) {
+ /*
+ * If we got here due to an interrupted system call, our reply
+ * wait object is still queued on conn_dst, with the former
+ * cookie. Look it up, and in case it exists, go dormant right
+ * away again, and don't queue the message again.
+ */
+ if (sync) {
+ mutex_lock(&conn_dst->lock);
+ ret = kdbus_conn_find_reply(conn_dst, conn_src,
+ kmsg->msg.cookie,
+ &reply_wait);
+ if (ret == 0) {
+ if (reply_wait->interrupted)
+ reply_wait->interrupted = false;
+ else
+ reply_wait = NULL;
+ }
+ mutex_unlock(&conn_dst->lock);
+
+ if (reply_wait)
+ goto wait_sync;
+ }
+
+ ret = kdbus_kmsg_attach_metadata(kmsg, conn_src, conn_dst);
+ if (ret < 0)
+ goto exit_unref;
+
+ if (msg->flags & KDBUS_MSG_FLAGS_EXPECT_REPLY) {
+ ret = kdbus_conn_check_access(ep, msg, conn_src,
+ conn_dst, NULL);
+ if (ret < 0)
+ goto exit_unref;
+
+ ret = kdbus_conn_reply_new(&reply_wait, conn_src, msg,
+ name_entry);
+ if (ret < 0)
+ goto exit_unref;
+ } else {
+ ret = kdbus_conn_check_access(ep, msg, conn_src,
+ conn_dst, &reply_wake);
+ if (ret < 0)
+ goto exit_unref;
+ }
+ }
+
+ if (reply_wake) {
+ /*
+ * If we're synchronously responding to a message, allocate a
+ * queue item and attach it to the reply tracking object.
+ * The connection's queue will never get to see it.
+ */
+ mutex_lock(&conn_dst->lock);
+ if (reply_wake->waiting && kdbus_conn_active(conn_dst))
+ ret = kdbus_queue_entry_alloc(conn_dst, kmsg,
+ &reply_wake->queue_entry);
+ else
+ ret = -ECONNRESET;
+
+ kdbus_conn_reply_sync(reply_wake, ret);
+ kdbus_conn_reply_unref(reply_wake);
+ mutex_unlock(&conn_dst->lock);
+
+ if (ret < 0)
+ goto exit_unref;
+ } else {
+ /*
+ * Otherwise, put it in the queue and wait for the connection
+ * to dequeue and receive the message.
+ */
+ ret = kdbus_conn_entry_insert(conn_dst, conn_src,
+ kmsg, reply_wait);
+ if (ret < 0) {
+ if (reply_wait)
+ kdbus_conn_reply_unref(reply_wait);
+ goto exit_unref;
+ }
+ }
+
+ /* forward to monitors */
+ kdbus_conn_eavesdrop(ep, conn_src, kmsg);
+
+wait_sync:
+ /* no reason to keep names locked for replies */
+ name_entry = kdbus_name_unlock(bus->name_registry, name_entry);
+
+ if (sync) {
+ struct timespec64 ts;
+ u64 now, timeout;
+
+ BUG_ON(!reply_wait);
+
+ ktime_get_ts64(&ts);
+ now = timespec64_to_ns(&ts);
+
+ if (unlikely(msg->timeout_ns <= now))
+ timeout = 0;
+ else
+ timeout = msg->timeout_ns - now;
+
+ ret = kdbus_conn_wait_reply(ep, conn_src, conn_dst, msg,
+ reply_wait, timeout);
+ }
+
+exit_unref:
+ kdbus_conn_unref(conn_dst);
+exit_name_unlock:
+ kdbus_name_unlock(bus->name_registry, name_entry);
+
+ return ret;
+}
+
+/**
+ * kdbus_conn_disconnect() - disconnect a connection
+ * @conn: The connection to disconnect
+ * @ensure_queue_empty: Flag to indicate if the call should fail in
+ * case the connection's message list is not
+ * empty
+ *
+ * If @ensure_queue_empty is true, and the connection has pending messages,
+ * -EBUSY is returned.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
+{
+ struct kdbus_conn_reply *reply, *reply_tmp;
+ struct kdbus_queue_entry *entry, *tmp;
+ LIST_HEAD(reply_list);
+
+ mutex_lock(&conn->lock);
+ if (!kdbus_conn_active(conn)) {
+ mutex_unlock(&conn->lock);
+ return -EALREADY;
+ }
+
+ if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
+ mutex_unlock(&conn->lock);
+ return -EBUSY;
+ }
+
+ atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
+ mutex_unlock(&conn->lock);
+
+ wake_up_interruptible(&conn->wait);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
+ if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
+ lock_contended(&conn->dep_map, _RET_IP_);
+#endif
+
+ wait_event(conn->wait,
+ atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ lock_acquired(&conn->dep_map, _RET_IP_);
+ rwsem_release(&conn->dep_map, 1, _RET_IP_);
+#endif
+
+ cancel_delayed_work_sync(&conn->work);
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ mutex_lock(&conn->ep->lock);
+ down_write(&conn->bus->conn_rwlock);
+
+ /* remove from bus and endpoint */
+ hash_del(&conn->hentry);
+ list_del(&conn->monitor_entry);
+ list_del(&conn->ep_entry);
+
+ up_write(&conn->bus->conn_rwlock);
+ mutex_unlock(&conn->ep->lock);
+
+ /*
+ * Remove all names associated with this connection; this possibly
+ * moves queued messages back to the activator connection.
+ */
+ kdbus_name_remove_by_conn(conn->bus->name_registry, conn);
+
+ /* if we die while other connections wait for our reply, notify them */
+ mutex_lock(&conn->lock);
+ list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
+ if (entry->reply)
+ kdbus_notify_reply_dead(conn->bus, entry->src_id,
+ entry->cookie);
+
+ kdbus_queue_entry_remove(conn, entry);
+ kdbus_pool_slice_free(entry->slice);
+ kdbus_queue_entry_free(entry);
+ }
+ list_splice_init(&conn->reply_list, &reply_list);
+ mutex_unlock(&conn->lock);
+
+ list_for_each_entry_safe(reply, reply_tmp, &reply_list, entry) {
+ if (reply->sync) {
+ kdbus_conn_reply_sync(reply, -EPIPE);
+ continue;
+ }
+
+ /* send a 'connection dead' notification */
+ kdbus_notify_reply_dead(conn->bus, reply->reply_dst->id,
+ reply->cookie);
+
+ list_del(&reply->entry);
+ kdbus_conn_reply_unref(reply);
+ }
+
+ kdbus_notify_id_change(conn->bus, KDBUS_ITEM_ID_REMOVE,
+ conn->id, conn->flags);
+
+ kdbus_notify_flush(conn->bus);
+
+ return 0;
+}
+
+/**
+ * kdbus_conn_active() - connection is not disconnected
+ * @conn: Connection to check
+ *
+ * Return true if the connection has not been disconnected yet. Note that a
+ * connection might be disconnected asynchronously, unless you hold the
+ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
+ * suppress connection shutdown for a short period.
+ *
+ * Return: true if the connection is still active
+ */
+bool kdbus_conn_active(const struct kdbus_conn *conn)
+{
+ return atomic_read(&conn->active) >= 0;
+}
+
+/**
+ * kdbus_conn_purge_policy_cache() - flush all cached policy entries that
+ * refer to a connection
+ * @conn: Connection to check
+ */
+void kdbus_conn_purge_policy_cache(struct kdbus_conn *conn)
+{
+ kdbus_policy_purge_cache(&conn->ep->policy_db, conn);
+ kdbus_policy_purge_cache(&conn->bus->policy_db, conn);
+}
+
+static void __kdbus_conn_free(struct kref *kref)
+{
+ struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
+
+ BUG_ON(kdbus_conn_active(conn));
+ BUG_ON(delayed_work_pending(&conn->work));
+ BUG_ON(!list_empty(&conn->queue.msg_list));
+ BUG_ON(!list_empty(&conn->names_list));
+ BUG_ON(!list_empty(&conn->names_queue_list));
+ BUG_ON(!list_empty(&conn->reply_list));
+
+ atomic_dec(&conn->user->connections);
+ kdbus_domain_user_unref(conn->user);
+
+ kdbus_conn_purge_policy_cache(conn);
+ kdbus_policy_remove_owner(&conn->bus->policy_db, conn);
+
+ kdbus_meta_free(conn->owner_meta);
+ kdbus_match_db_free(conn->match_db);
+ kdbus_pool_free(conn->pool);
+ kdbus_ep_unref(conn->ep);
+ kdbus_bus_unref(conn->bus);
+ put_cred(conn->cred);
+ kfree(conn->name);
+ kfree(conn);
+}
+
+/**
+ * kdbus_conn_ref() - take a connection reference
+ * @conn: Connection
+ *
+ * Return: the connection itself
+ */
+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
+{
+ kref_get(&conn->kref);
+ return conn;
+}
+
+/**
+ * kdbus_conn_unref() - drop a connection reference
+ * @conn: Connection (may be NULL)
+ *
+ * When the last reference is dropped, the connection's internal structure
+ * is freed.
+ *
+ * Return: NULL
+ */
+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
+{
+ if (!conn)
+ return NULL;
+
+ kref_put(&conn->kref, __kdbus_conn_free);
+ return NULL;
+}
+
+/**
+ * kdbus_conn_acquire() - acquire an active connection reference
+ * @conn: Connection
+ *
+ * Users can close a connection via KDBUS_BYEBYE (or by destroying the
+ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
+ * user-visible action on this connection and signal ECONNRESET instead.
+ * To avoid testing for connection availability every time you take the
+ * connection-lock, you can acquire a connection for short periods.
+ *
+ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
+ * connection. You must also hold a regular reference at any time! As long as
+ * you hold the active-ref, the connection will not be shut down. However, if
+ * the connection was shut down, you can never acquire an active-ref again.
+ *
+ * kdbus_conn_disconnect() disables the connection and then waits for all active
+ * references to be dropped. It will also wake up any pending operation.
+ * However, you must not sleep for an indefinite period while holding an
+ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
+ * to sleep for an indefinite period, either release the reference and try to
+ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
+ * your wait-queue.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_conn_acquire(struct kdbus_conn *conn)
+{
+ if (!atomic_inc_unless_negative(&conn->active))
+ return -ECONNRESET;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
+#endif
+
+ return 0;
+}
+
+/**
+ * kdbus_conn_release() - release an active connection reference
+ * @conn: Connection
+ *
+ * This releases an active reference that has been acquired via
+ * kdbus_conn_acquire(). If the connection was already disabled and this is the
+ * last active-ref that is dropped, the disconnect-waiter will be woken up and
+ * properly close the connection.
+ */
+void kdbus_conn_release(struct kdbus_conn *conn)
+{
+ int v;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ rwsem_release(&conn->dep_map, 1, _RET_IP_);
+#endif
+
+ v = atomic_dec_return(&conn->active);
+ if (v != KDBUS_CONN_ACTIVE_BIAS)
+ return;
+
+ wake_up_all(&conn->wait);
+}
+
+/**
+ * kdbus_conn_move_messages() - move messages from one connection to another
+ * @conn_dst: Connection to copy to
+ * @conn_src: Connection to copy from
+ * @name_id: Filter for the sequence number of the registered
+ * name, 0 means no filtering.
+ *
+ * Move all messages from one connection to another. This is used when
+ * an implementor connection is taking over/giving back a well-known name
+ * from/to an activator connection.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
+ struct kdbus_conn *conn_src,
+ u64 name_id)
+{
+ struct kdbus_queue_entry *q, *q_tmp;
+ struct kdbus_conn_reply *r, *r_tmp;
+ LIST_HEAD(reply_list);
+ LIST_HEAD(msg_list);
+ int ret = 0;
+
+ BUG_ON(!mutex_is_locked(&conn_dst->bus->lock));
+ BUG_ON(conn_src == conn_dst);
+
+ /* remove all messages from the source */
+ mutex_lock(&conn_src->lock);
+ list_for_each_entry_safe(r, r_tmp, &conn_src->reply_list, entry) {
+ /* filter messages for a specific name */
+ if (name_id > 0 && r->name_id != name_id)
+ continue;
+
+ list_move_tail(&r->entry, &reply_list);
+ }
+ list_for_each_entry_safe(q, q_tmp, &conn_src->queue.msg_list, entry) {
+ /* filter messages for a specific name */
+ if (name_id > 0 && q->dst_name_id != name_id)
+ continue;
+
+ kdbus_queue_entry_remove(conn_src, q);
+ list_add_tail(&q->entry, &msg_list);
+ }
+ mutex_unlock(&conn_src->lock);
+
+ /* insert messages into destination */
+ mutex_lock(&conn_dst->lock);
+ if (!kdbus_conn_active(conn_dst)) {
+ struct kdbus_conn_reply *r, *r_tmp;
+
+ /* our destination connection died, just drop all messages */
+ mutex_unlock(&conn_dst->lock);
+ list_for_each_entry_safe(q, q_tmp, &msg_list, entry)
+ kdbus_queue_entry_free(q);
+ list_for_each_entry_safe(r, r_tmp, &reply_list, entry)
+ kdbus_conn_reply_unref(r);
+ return -ECONNRESET;
+ }
+
+ list_for_each_entry_safe(q, q_tmp, &msg_list, entry) {
+ ret = kdbus_pool_move_slice(conn_dst->pool, conn_src->pool,
+ &q->slice);
+ if (ret < 0)
+ kdbus_queue_entry_free(q);
+ else
+ kdbus_queue_entry_add(&conn_dst->queue, q);
+ }
+ list_splice(&reply_list, &conn_dst->reply_list);
+ mutex_unlock(&conn_dst->lock);
+
+ /* wake up poll() */
+ wake_up_interruptible(&conn_dst->wait);
+
+ return ret;
+}
+
+/**
+ * kdbus_cmd_info() - retrieve info about a connection
+ * @conn: Connection
+ * @cmd_info: The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_info(struct kdbus_conn *conn,
+ struct kdbus_cmd_info *cmd_info)
+{
+ struct kdbus_name_entry *entry = NULL;
+ struct kdbus_conn *owner_conn = NULL;
+ struct kdbus_info info = {};
+ struct kdbus_meta *meta = NULL;
+ struct kdbus_pool_slice *slice;
+ size_t pos;
+ int ret = 0;
+ u64 flags;
+
+ if (cmd_info->id == 0) {
+ const char *name;
+
+ ret = kdbus_items_get_str(cmd_info->items,
+ KDBUS_ITEMS_SIZE(cmd_info, items),
+ KDBUS_ITEM_NAME, &name);
+ if (ret < 0)
+ return -EINVAL;
+
+ if (!kdbus_name_is_valid(name, false))
+ return -EINVAL;
+
+ /* check if 'conn' is allowed to see 'name' */
+ ret = kdbus_ep_policy_check_see_access(conn->ep, conn, name);
+ if (ret < 0)
+ return ret;
+
+ entry = kdbus_name_lock(conn->bus->name_registry, name);
+ if (!entry)
+ return -ESRCH;
+ else if (entry->conn)
+ owner_conn = kdbus_conn_ref(entry->conn);
+ } else {
+ owner_conn = kdbus_bus_find_conn_by_id(conn->bus, cmd_info->id);
+ if (!owner_conn) {
+ ret = -ENXIO;
+ goto exit;
+ }
+
+ /* check if 'conn' is allowed to see any of owner_conn's names */
+ ret = kdbus_ep_policy_check_src_names(conn->ep, owner_conn,
+ conn);
+ if (ret < 0)
+ goto exit;
+ }
+
+ info.size = sizeof(info);
+ info.id = owner_conn->id;
+ info.flags = owner_conn->flags;
+
+ /* do not leak domain-specific credentials */
+ if (kdbus_meta_ns_eq(conn->meta, owner_conn->meta))
+ info.size += owner_conn->meta->size;
+
+ /*
+ * Unlike the rest of the values which are cached at connection
+ * creation time, some values need to be appended here because
+ * at creation time a connection does not have names and other
+ * properties.
+ */
+ flags = cmd_info->flags & (KDBUS_ATTACH_NAMES | KDBUS_ATTACH_CONN_NAME);
+ if (flags) {
+ ret = kdbus_meta_new(&meta);
+ if (ret < 0)
+ goto exit;
+
+ ret = kdbus_meta_append(meta, owner_conn, 0, flags);
+ if (ret < 0)
+ goto exit;
+
+ info.size += meta->size;
+ }
+
+ ret = kdbus_pool_slice_alloc(conn->pool, &slice, info.size);
+ if (ret < 0)
+ goto exit;
+
+ ret = kdbus_pool_slice_copy(slice, 0, &info, sizeof(info));
+ if (ret < 0)
+ goto exit_free;
+ pos = sizeof(info);
+
+ if (kdbus_meta_ns_eq(conn->meta, owner_conn->meta)) {
+ ret = kdbus_pool_slice_copy(slice, pos, owner_conn->meta->data,
+ owner_conn->meta->size);
+ if (ret < 0)
+ goto exit_free;
+
+ pos += owner_conn->meta->size;
+ }
+
+ if (meta) {
+ ret = kdbus_pool_slice_copy(slice, pos, meta->data, meta->size);
+ if (ret < 0)
+ goto exit_free;
+ }
+
+ /* write back the offset */
+ cmd_info->offset = kdbus_pool_slice_offset(slice);
+ kdbus_pool_slice_flush(slice);
+ kdbus_pool_slice_make_public(slice);
+
+exit_free:
+ if (ret < 0)
+ kdbus_pool_slice_free(slice);
+
+exit:
+ kdbus_meta_free(meta);
+ kdbus_conn_unref(owner_conn);
+ kdbus_name_unlock(conn->bus->name_registry, entry);
+
+ return ret;
+}
+
+/**
+ * kdbus_cmd_conn_update() - update the attach-flags of a connection or
+ * the policy entries of a policy holding one
+ * @conn: Connection
+ * @cmd: The command as passed in by the ioctl
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_cmd_conn_update(struct kdbus_conn *conn,
+ const struct kdbus_cmd_update *cmd)
+{
+ const struct kdbus_item *item;
+ bool policy_provided = false;
+ bool flags_provided = false;
+ u64 attach_flags;
+ int ret;
+
+ KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
+ switch (item->type) {
+ case KDBUS_ITEM_ATTACH_FLAGS:
+ /*
+ * Only ordinary or monitor connections
+ * may update their attach-flags.
+ */
+ if (!kdbus_conn_is_connected(conn) &&
+ !kdbus_conn_is_monitor(conn))
+ return -EOPNOTSUPP;
+
+ flags_provided = true;
+ attach_flags = item->data64[0];
+ break;
+
+ case KDBUS_ITEM_NAME:
+ case KDBUS_ITEM_POLICY_ACCESS:
+ /*
+ * Only policy holders may update their policy entries.
+ */
+ if (!kdbus_conn_is_policy_holder(conn))
+ return -EOPNOTSUPP;
+
+ policy_provided = true;
+ break;
+ }
+ }
+
+ if (policy_provided) {
+ ret = kdbus_policy_set(&conn->bus->policy_db, cmd->items,
+ KDBUS_ITEMS_SIZE(cmd, items),
+ 1, true, conn);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (flags_provided)
+ atomic64_set(&conn->attach_flags, attach_flags);
+
+ return 0;
+}
+
+/**
+ * kdbus_conn_new() - create a new connection
+ * @ep: The endpoint the connection is connected to
+ * @hello: The kdbus_cmd_hello as passed in by the user
+ * @meta: The metadata gathered at open() time of the handle
+ * @c: Returned connection
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+int kdbus_conn_new(struct kdbus_ep *ep,
+ struct kdbus_cmd_hello *hello,
+ struct kdbus_meta *meta,
+ struct kdbus_conn **c)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ static struct lock_class_key __key;
+#endif
+ const struct kdbus_creds *creds = NULL;
+ const struct kdbus_item *item;
+ const char *conn_name = NULL;
+ const char *seclabel = NULL;
+ const char *name = NULL;
+ struct kdbus_conn *conn;
+ struct kdbus_bus *bus = ep->bus;
+ size_t seclabel_len = 0;
+ bool is_policy_holder;
+ bool is_activator;
+ bool is_monitor;
+ int ret;
+
+ BUG_ON(*c);
+
+ is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
+ is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
+ is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
+
+ /* can't be activator or policy holder and monitor at the same time */
+ if (is_monitor && (is_activator || is_policy_holder))
+ return -EINVAL;
+
+ /* can't be policy holder and activator at the same time */
+ if (is_activator && is_policy_holder)
+ return -EINVAL;
+
+ /* only privileged callers can create special-purpose connections */
+ if (!kdbus_bus_uid_is_privileged(bus) &&
+ (is_activator || is_policy_holder || is_monitor))
+ return -EPERM;
+
+ KDBUS_ITEMS_FOREACH(item, hello->items,
+ KDBUS_ITEMS_SIZE(hello, items)) {
+ switch (item->type) {
+ case KDBUS_ITEM_NAME:
+ if (!is_activator && !is_policy_holder)
+ return -EINVAL;
+
+ if (name)
+ return -EINVAL;
+
+ if (!kdbus_name_is_valid(item->str, true))
+ return -EINVAL;
+
+ name = item->str;
+ break;
+
+ case KDBUS_ITEM_CREDS:
+ /* privileged processes can impersonate somebody else */
+ if (!kdbus_bus_uid_is_privileged(bus))
+ return -EPERM;
+
+ if (item->size != KDBUS_ITEM_SIZE(sizeof(*creds)))
+ return -EINVAL;
+
+ creds = &item->creds;
+ break;
+
+ case KDBUS_ITEM_SECLABEL:
+ /* privileged processes can impersonate somebody else */
+ if (!kdbus_bus_uid_is_privileged(bus))
+ return -EPERM;
+
+ seclabel = item->str;
+ seclabel_len = item->size - KDBUS_ITEM_HEADER_SIZE;
+ break;
+
+ case KDBUS_ITEM_CONN_NAME:
+ /* human-readable connection name (debugging) */
+ if (conn_name)
+ return -EINVAL;
+
+ conn_name = item->str;
+ break;
+ }
+ }
+
+ if ((is_activator || is_policy_holder) && !name)
+ return -EINVAL;
+
+ conn = kzalloc(sizeof(*conn), GFP_KERNEL);
+ if (!conn)
+ return -ENOMEM;
+
+ if (is_activator || is_policy_holder) {
+ /*
+ * Policy holders may install one name, and are
+ * allowed to use wildcards.
+ */
+ ret = kdbus_policy_set(&bus->policy_db, hello->items,
+ KDBUS_ITEMS_SIZE(hello, items),
+ 1, is_policy_holder, conn);
+ if (ret < 0)
+ goto exit_free_conn;
+ }
+
+ if (conn_name) {
+ conn->name = kstrdup(conn_name, GFP_KERNEL);
+ if (!conn->name) {
+ ret = -ENOMEM;
+ goto exit_free_conn;
+ }
+ }
+
+ kref_init(&conn->kref);
+ atomic_set(&conn->active, 0);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
+#endif
+ mutex_init(&conn->lock);
+ INIT_LIST_HEAD(&conn->names_list);
+ INIT_LIST_HEAD(&conn->names_queue_list);
+ INIT_LIST_HEAD(&conn->reply_list);
+ atomic_set(&conn->name_count, 0);
+ atomic_set(&conn->reply_count, 0);
+ INIT_DELAYED_WORK(&conn->work, kdbus_conn_work);
+ conn->cred = get_current_cred();
+ init_waitqueue_head(&conn->wait);
+ kdbus_queue_init(&conn->queue);
+
+ /* init entry, so we can unconditionally remove it */
+ INIT_LIST_HEAD(&conn->monitor_entry);
+
+ ret = kdbus_pool_new(conn->name, &conn->pool, hello->pool_size);
+ if (ret < 0)
+ goto exit_unref_cred;
+
+ ret = kdbus_match_db_new(&conn->match_db);
+ if (ret < 0)
+ goto exit_free_pool;
+
+ conn->bus = kdbus_bus_ref(ep->bus);
+ conn->ep = kdbus_ep_ref(ep);
+
+ /* get new id for this connection */
+ conn->id = atomic64_inc_return(&bus->conn_seq_last);
+
+ /* return properties of this connection to the caller */
+ hello->bus_flags = bus->bus_flags;
+ hello->bloom = bus->bloom;
+ hello->id = conn->id;
+
+ BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
+ memcpy(hello->id128, bus->id128, sizeof(hello->id128));
+
+ conn->flags = hello->flags;
+ atomic64_set(&conn->attach_flags, hello->attach_flags);
+
+ if (is_activator) {
+ u64 flags = KDBUS_NAME_ACTIVATOR;
+
+ ret = kdbus_name_acquire(bus->name_registry, conn,
+ name, &flags, NULL);
+ if (ret < 0)
+ goto exit_unref_ep;
+ }
+
+ if (is_monitor) {
+ down_write(&bus->conn_rwlock);
+ list_add_tail(&conn->monitor_entry, &bus->monitors_list);
+ up_write(&bus->conn_rwlock);
+ }
+
+ /* privileged processes can impersonate somebody else */
+ if (creds || seclabel) {
+ ret = kdbus_meta_new(&conn->owner_meta);
+ if (ret < 0)
+ goto exit_release_names;
+
+ if (creds) {
+ ret = kdbus_meta_append_data(conn->owner_meta,
+ KDBUS_ITEM_CREDS,
+ creds, sizeof(*creds));
+ if (ret < 0)
+ goto exit_free_meta;
+ }
+
+ if (seclabel) {
+ ret = kdbus_meta_append_data(conn->owner_meta,
+ KDBUS_ITEM_SECLABEL,
+ seclabel, seclabel_len);
+ if (ret < 0)
+ goto exit_free_meta;
+ }
+
+ /* use the information provided with the HELLO call */
+ conn->meta = conn->owner_meta;
+ } else {
+ /* use the connection's metadata gathered at open() */
+ conn->meta = meta;
+ }
+
+ /*
+ * Account the connection against the current user (UID), or for
+ * custom endpoints use the anonymous user assigned to the endpoint.
+ */
+ if (ep->user) {
+ conn->user = kdbus_domain_user_ref(ep->user);
+ } else {
+ ret = kdbus_domain_get_user(ep->bus->domain,
+ current_fsuid(),
+ &conn->user);
+ if (ret < 0)
+ goto exit_free_meta;
+ }
+
+ /* lock order: domain -> bus -> ep -> names -> conn */
+ mutex_lock(&bus->lock);
+ mutex_lock(&ep->lock);
+ down_write(&bus->conn_rwlock);
+
+ if (bus->disconnected || ep->disconnected) {
+ ret = -ESHUTDOWN;
+ goto exit_unref_user_unlock;
+ }
+
+ if (!kdbus_bus_uid_is_privileged(bus) &&
+ atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
+ atomic_dec(&conn->user->connections);
+ ret = -EMFILE;
+ goto exit_unref_user_unlock;
+ }
+
+ /* link into bus and endpoint */
+ list_add_tail(&conn->ep_entry, &ep->conn_list);
+ hash_add(bus->conn_hash, &conn->hentry, conn->id);
+
+ up_write(&bus->conn_rwlock);
+ mutex_unlock(&ep->lock);
+ mutex_unlock(&bus->lock);
+
+ /* notify subscribers about the new active connection */
+ ret = kdbus_notify_id_change(conn->bus, KDBUS_ITEM_ID_ADD,
+ conn->id, conn->flags);
+ if (ret < 0) {
+ atomic_dec(&conn->user->connections);
+ goto exit_domain_user_unref;
+ }
+
+ kdbus_notify_flush(conn->bus);
+
+ *c = conn;
+ return 0;
+
+exit_unref_user_unlock:
+ up_write(&bus->conn_rwlock);
+ mutex_unlock(&ep->lock);
+ mutex_unlock(&bus->lock);
+exit_domain_user_unref:
+ kdbus_domain_user_unref(conn->user);
+exit_free_meta:
+ kdbus_meta_free(conn->owner_meta);
+exit_release_names:
+ kdbus_name_remove_by_conn(bus->name_registry, conn);
+exit_unref_ep:
+ kdbus_ep_unref(conn->ep);
+ kdbus_bus_unref(conn->bus);
+ kdbus_match_db_free(conn->match_db);
+exit_free_pool:
+ kdbus_pool_free(conn->pool);
+exit_unref_cred:
+ put_cred(conn->cred);
+exit_free_conn:
+ kfree(conn->name);
+ kfree(conn);
+
+ return ret;
+}
+
+/**
+ * kdbus_conn_has_name() - check if a connection owns a name
+ * @conn: Connection
+ * @name: Well-known name to check for
+ *
+ * Return: true if the name is currently owned by the connection
+ */
+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
+{
+ struct kdbus_name_entry *e;
+ bool match = false;
+
+ mutex_lock(&conn->lock);
+ list_for_each_entry(e, &conn->names_list, conn_entry) {
+ if (strcmp(e->name, name) == 0) {
+ match = true;
+ break;
+ }
+ }
+ mutex_unlock(&conn->lock);
+
+ return match;
+}
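
The active-reference scheme described in the kdbus_conn_acquire() and
kdbus_conn_release() comments above can be hard to follow from the diff
alone, so here is a minimal user-space sketch of the counter logic using
C11 atomics. It is not the driver code: the struct and the conn_*() helper
names are made up for illustration, the value of ACTIVE_BIAS is an
assumption (the patch only shows the name KDBUS_CONN_ACTIVE_BIAS), and the
wait-queue handling is reduced to a boolean return value.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value; the quoted hunk only uses the name KDBUS_CONN_ACTIVE_BIAS. */
#define ACTIVE_BIAS (INT_MIN + 1)

/* Hypothetical stand-in for the 'active' member of struct kdbus_conn. */
struct conn {
	atomic_int active;
};

/* Same test as kdbus_conn_active(): a negative counter means "disabled". */
static bool conn_is_active(struct conn *c)
{
	return atomic_load(&c->active) >= 0;
}

/* Like atomic_inc_unless_negative(): take an active ref unless disabled. */
static bool conn_acquire(struct conn *c)
{
	int v = atomic_load(&c->active);

	while (v >= 0) {
		/* on failure, v is reloaded with the current value */
		if (atomic_compare_exchange_weak(&c->active, &v, v + 1))
			return true;
	}
	return false;
}

/*
 * Drop an active ref. Returns true if this was the last active ref of a
 * disabled connection, which is the point where the driver wakes up the
 * waiter in kdbus_conn_disconnect().
 */
static bool conn_release(struct conn *c)
{
	return atomic_fetch_sub(&c->active, 1) - 1 == ACTIVE_BIAS;
}

/* Disable the connection by pushing the counter into negative range. */
static void conn_disable(struct conn *c)
{
	assert(conn_is_active(c)); /* the driver returns -EALREADY instead */
	atomic_fetch_add(&c->active, ACTIVE_BIAS);
}
```

With two active references held, conn_disable() makes further
conn_acquire() calls fail right away, and only the second conn_release()
reports that the counter has dropped back to the bias. That is the
condition the wait_event() in kdbus_conn_disconnect() is waiting for.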
diff --git a/drivers/misc/kdbus/connection.h b/drivers/misc/kdbus/connection.h
new file mode 100644
index 000000000000..01a5bd8feda7
--- /dev/null
+++ b/drivers/misc/kdbus/connection.h
@@ -0,0 +1,177 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ * Copyright (C) 2014 Djalal Harouni
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_CONNECTION_H
+#define __KDBUS_CONNECTION_H
+
+#include <linux/atomic.h>
+#include <linux/lockdep.h>
+#include "limits.h"
+#include "metadata.h"
+#include "pool.h"
+#include "queue.h"
+#include "util.h"
+
+#define KDBUS_HELLO_SPECIAL_CONN (KDBUS_HELLO_ACTIVATOR | \
+ KDBUS_HELLO_POLICY_HOLDER | \
+ KDBUS_HELLO_MONITOR)
+
+/**
+ * struct kdbus_conn - connection to a bus
+ * @kref: Reference count
+ * @active: Active references to the connection
+ * @id: Connection ID
+ * @flags: KDBUS_HELLO_* flags
+ * @attach_flags: KDBUS_ATTACH_* flags
+ * @name: Human-readable connection name, used for debugging
+ * @bus: The bus this connection belongs to
+ * @ep: The endpoint this connection belongs to
+ * @lock: Connection data lock
+ * @msg_users: Array to account the number of queued messages per
+ * individual user
+ * @msg_users_max: Size of the users array
+ * @hentry: Entry in ID <-> connection map
+ * @ep_entry: Entry in endpoint
+ * @monitor_entry: Entry in monitor, if the connection is a monitor
+ * @names_list: List of well-known names
+ * @names_queue_list: Well-known names this connection waits for
+ * @reply_list: List of connections this connection expects
+ * a reply from.
+ * @work: Delayed work to handle timeouts
+ * @activator_of: Well-known name entry this connection acts as an
+ * activator for
+ * @match_db: Subscription filter to broadcast messages
+ * @meta: Active connection creator's metadata/credentials,
+ * either from the handle or from HELLO
+ * @owner_meta: The connection's metadata/credentials supplied by
+ * HELLO
+ * @pool: The user's buffer to receive messages
+ * @user: Owner of the connection
+ * @cred: The credentials of the connection at creation time
+ * @name_count: Number of owned well-known names
+ * @reply_count: Number of requests this connection has issued, and
+ * waits for replies from the peer
+ * @wait: Wake up this endpoint
+ * @queue: The message queue associated with this connection
+ */
+struct kdbus_conn {
+ struct kref kref;
+ atomic_t active;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+ u64 id;
+ u64 flags;
+ atomic64_t attach_flags;
+ const char *name;
+ struct kdbus_bus *bus;
+ struct kdbus_ep *ep;
+ struct mutex lock;
+ unsigned int *msg_users;
+ unsigned int msg_users_max;
+ struct hlist_node hentry;
+ struct list_head ep_entry;
+ struct list_head monitor_entry;
+ struct list_head names_list;
+ struct list_head names_queue_list;
+ struct list_head reply_list;
+ struct delayed_work work;
+ struct kdbus_name_entry *activator_of;
+ struct kdbus_match_db *match_db;
+ struct kdbus_meta *meta;
+ struct kdbus_meta *owner_meta;
+ struct kdbus_pool *pool;
+ struct kdbus_domain_user *user;
+ const struct cred *cred;
+ atomic_t name_count;
+ atomic_t reply_count;
+ wait_queue_head_t wait;
+ struct kdbus_queue queue;
+};
+
+struct kdbus_kmsg;
+struct kdbus_name_registry;
+
+int kdbus_conn_new(struct kdbus_ep *ep,
+ struct kdbus_cmd_hello *hello,
+ struct kdbus_meta *meta,
+ struct kdbus_conn **conn);
+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
+int kdbus_conn_acquire(struct kdbus_conn *conn);
+void kdbus_conn_release(struct kdbus_conn *conn);
+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
+bool kdbus_conn_active(const struct kdbus_conn *conn);
+void kdbus_conn_purge_policy_cache(struct kdbus_conn *conn);
+
+int kdbus_cmd_msg_recv(struct kdbus_conn *conn,
+ struct kdbus_cmd_recv *recv);
+int kdbus_cmd_msg_cancel(struct kdbus_conn *conn,
+ u64 cookie);
+int kdbus_cmd_info(struct kdbus_conn *conn,
+ struct kdbus_cmd_info *cmd_info);
+int kdbus_cmd_conn_update(struct kdbus_conn *conn,
+ const struct kdbus_cmd_update *cmd_update);
+int kdbus_conn_kmsg_send(struct kdbus_ep *ep,
+ struct kdbus_conn *conn_src,
+ struct kdbus_kmsg *kmsg);
+int kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
+ struct kdbus_conn *conn_src,
+ u64 name_id);
+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
+
+/**
+ * kdbus_conn_is_connected() - Check if connection is ordinary
+ * @conn: The connection to check
+ *
+ * Return: Non-zero if the connection is an ordinary connection
+ */
+static inline int kdbus_conn_is_connected(const struct kdbus_conn *conn)
+{
+ return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
+}
+
+/**
+ * kdbus_conn_is_activator() - Check if connection is an activator
+ * @conn: The connection to check
+ *
+ * Return: Non-zero if the connection is an activator
+ */
+static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
+{
+ return conn->flags & KDBUS_HELLO_ACTIVATOR;
+}
+
+/**
+ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
+ * @conn: The connection to check
+ *
+ * Return: Non-zero if the connection is a policy holder
+ */
+static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
+{
+ return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
+}
+
+/**
+ * kdbus_conn_is_monitor() - Check if connection is a monitor
+ * @conn: The connection to check
+ *
+ * Return: Non-zero if the connection is a monitor
+ */
+static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
+{
+ return conn->flags & KDBUS_HELLO_MONITOR;
+}
+#endif
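
The flag predicates at the end of connection.h, together with the checks at
the top of kdbus_conn_new(), amount to "a connection has at most one
special-purpose role". A small stand-alone sketch of that rule follows. The
bit positions are placeholders chosen for illustration (the real
KDBUS_HELLO_* values come from the kdbus UAPI header, which is not part of
this hunk), and both helper functions are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, assumed for this sketch only. */
#define HELLO_ACTIVATOR     (UINT64_C(1) << 1)
#define HELLO_POLICY_HOLDER (UINT64_C(1) << 2)
#define HELLO_MONITOR       (UINT64_C(1) << 3)
#define HELLO_SPECIAL_CONN  (HELLO_ACTIVATOR | HELLO_POLICY_HOLDER | \
			     HELLO_MONITOR)

/* A predicate that returns int from a 64-bit mask needs the bits to fit. */
static_assert(HELLO_SPECIAL_CONN <= INT_MAX,
	      "special-purpose flags must fit into an int");

/* Ordinary connections carry none of the special-purpose flags. */
static bool conn_is_ordinary(uint64_t flags)
{
	return !(flags & HELLO_SPECIAL_CONN);
}

/*
 * kdbus_conn_new() rejects monitor+activator, monitor+policy holder and
 * activator+policy holder, so zero or exactly one role bit may be set.
 */
static bool hello_roles_valid(uint64_t flags)
{
	uint64_t special = flags & HELLO_SPECIAL_CONN;

	/* clearing the lowest set bit leaves 0 iff at most one bit is set */
	return (special & (special - 1)) == 0;
}
```

One design remark: the inline helpers in the patch return int from a u64
mask. As far as I can tell that is fine while the flags live in the low
bits, but returning bool (or using !!) would stay correct if a flag ever
moved above bit 31.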
diff --git a/drivers/misc/kdbus/item.c b/drivers/misc/kdbus/item.c
new file mode 100644
index 000000000000..abcd1ada5567
--- /dev/null
+++ b/drivers/misc/kdbus/item.c
@@ -0,0 +1,256 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/ctype.h>
+#include <linux/string.h>
+
+#include "item.h"
+#include "limits.h"
+#include "util.h"
+
+#define KDBUS_ITEM_VALID(_i, _is, _s) \
+ ((_i)->size > KDBUS_ITEM_HEADER_SIZE && \
+ (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) && \
+ (u8 *)(_i) >= (u8 *)(_is))
+
+#define KDBUS_ITEMS_END(_i, _is, _s) \
+ ((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
+
+/**
+ * kdbus_item_validate_name() - validate an item containing a name
+ * @item: Item to validate
+ *
+ * Return: zero on success or a negative error code on failure
+ */
+int kdbus_item_validate_name(const struct kdbus_item *item)
+{
+ if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
+ return -EINVAL;
+
+ if (item->size > KDBUS_ITEM_HEADER_SIZE +
+ KDBUS_SYSNAME_MAX_LEN + 1)
+ return -ENAMETOOLONG;
+
+ if (!kdbus_str_valid(item->str, KDBUS_ITEM_PAYLOAD_SIZE(item)))
+ return -EINVAL;
+
+ return kdbus_sysname_is_valid(item->str);
+}
+
+static int kdbus_item_validate(const struct kdbus_item *item)
+{
+ size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
+ size_t l;
+ int ret;
+
+ if (item->size < KDBUS_ITEM_HEADER_SIZE)
+ return -EINVAL;
+
+ switch (item->type) {
+ case KDBUS_ITEM_PAYLOAD_VEC:
+ if (payload_size != sizeof(struct kdbus_vec))
+ return -EINVAL;
+ if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_PAYLOAD_OFF:
+ if (payload_size != sizeof(struct kdbus_vec))
+ return -EINVAL;
+ if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_PAYLOAD_MEMFD:
+ if (payload_size != sizeof(struct kdbus_memfd))
+ return -EINVAL;
+ if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
+ return -EINVAL;
+ if (item->memfd.fd < 0)
+ return -EBADF;
+ break;
+
+ case KDBUS_ITEM_FDS:
+ if (payload_size % sizeof(int) != 0)
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_BLOOM_PARAMETER:
+ if (payload_size != sizeof(struct kdbus_bloom_parameter))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_BLOOM_FILTER:
+ /* followed by the bloom-mask, depends on the bloom-size */
+ if (payload_size < sizeof(struct kdbus_bloom_filter))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_BLOOM_MASK:
+ /* size depends on bloom-size of bus */
+ break;
+
+ case KDBUS_ITEM_CONN_NAME:
+ case KDBUS_ITEM_MAKE_NAME:
+ ret = kdbus_item_validate_name(item);
+ if (ret < 0)
+ return ret;
+ break;
+
+ case KDBUS_ITEM_ATTACH_FLAGS:
+ case KDBUS_ITEM_ID:
+ if (payload_size != sizeof(u64))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_TIMESTAMP:
+ if (payload_size != sizeof(struct kdbus_timestamp))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_CREDS:
+ if (payload_size != sizeof(struct kdbus_creds))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_AUXGROUPS:
+ if (payload_size % sizeof(u64) != 0)
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_NAME:
+ case KDBUS_ITEM_DST_NAME:
+ case KDBUS_ITEM_PID_COMM:
+ case KDBUS_ITEM_TID_COMM:
+ case KDBUS_ITEM_EXE:
+ case KDBUS_ITEM_CMDLINE:
+ case KDBUS_ITEM_CGROUP:
+ case KDBUS_ITEM_SECLABEL:
+ if (!kdbus_str_valid(item->str, payload_size))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_CAPS:
+ /* TODO */
+ break;
+
+ case KDBUS_ITEM_AUDIT:
+ if (payload_size != sizeof(struct kdbus_audit))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_POLICY_ACCESS:
+ if (payload_size != sizeof(struct kdbus_policy_access))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_NAME_ADD:
+ case KDBUS_ITEM_NAME_REMOVE:
+ case KDBUS_ITEM_NAME_CHANGE:
+ if (payload_size < sizeof(struct kdbus_notify_name_change))
+ return -EINVAL;
+ l = payload_size - offsetof(struct kdbus_notify_name_change,
+ name);
+ if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_ID_ADD:
+ case KDBUS_ITEM_ID_REMOVE:
+ if (payload_size != sizeof(struct kdbus_notify_id_change))
+ return -EINVAL;
+ break;
+
+ case KDBUS_ITEM_REPLY_TIMEOUT:
+ case KDBUS_ITEM_REPLY_DEAD:
+ if (payload_size != 0)
+ return -EINVAL;
+ break;
+
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+/**
+ * kdbus_items_validate() - validate items passed by user-space
+ * @items: items to validate
+ * @items_size: total size of all items, in bytes
+ *
+ * This verifies that the passed items pointer is consistent and valid.
+ * Furthermore, each item is checked for:
+ * - valid "size" value
+ * - payload is of expected type
+ * - payload is fully included in the item
+ * - string payloads are zero-terminated
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
+{
+ const struct kdbus_item *item;
+ int ret;
+
+ KDBUS_ITEMS_FOREACH(item, items, items_size) {
+ if (!KDBUS_ITEM_VALID(item, items, items_size))
+ return -EINVAL;
+
+ ret = kdbus_item_validate(item);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (!KDBUS_ITEMS_END(item, items, items_size))
+ return -EINVAL;
+
+ return 0;
+}
+
+/**
+ * kdbus_items_get_str() - get string from a list of items
+ * @items: The items to walk
+ * @items_size: The size of all items
+ * @item_type: The item type to look for
+ * @str_ret: A pointer to store the found name
+ *
+ * This function walks a list of items and searches for items of type
+ * @item_type. If it finds exactly one such item, @str_ret will be set to
+ * the .str member of the item.
+ *
+ * Return: 0 if the item was found exactly once, -EEXIST if the item was
+ * found more than once, and -EBADMSG if there was no item of the given type.
+ */
+int kdbus_items_get_str(const struct kdbus_item *items, size_t items_size,
+ unsigned int item_type, const char **str_ret)
+{
+ const struct kdbus_item *item;
+ const char *n = NULL;
+
+ KDBUS_ITEMS_FOREACH(item, items, items_size) {
+ if (item->type == item_type) {
+ if (n)
+ return -EEXIST;
+
+ n = item->str;
+ continue;
+ }
+ }
+
+ if (!n)
+ return -EBADMSG;
+
+ *str_ret = n;
+ return 0;
+}
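The return-code contract of kdbus_items_get_str() (0 for exactly one match, -EEXIST for duplicates, -EBADMSG for no match) can be tried out in userspace with a minimal sketch. This is not the kernel code: struct demo_entry, demo_get_str() and demo_lookup() are hypothetical names, and a plain array stands in for the 8-byte aligned item stream.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a string-carrying item: a type and its string. */
struct demo_entry {
	unsigned int type;
	const char *str;
};

/*
 * Return 0 and set *str_ret if exactly one entry of the given type exists,
 * -EEXIST if there is more than one, -EBADMSG if there is none.
 */
static int demo_get_str(const struct demo_entry *entries, size_t n,
			unsigned int type, const char **str_ret)
{
	const char *found = NULL;
	size_t i;

	assert(str_ret != NULL);

	for (i = 0; i < n; i++) {
		if (entries[i].type != type)
			continue;

		/* a second match makes the lookup ambiguous */
		if (found)
			return -EEXIST;

		found = entries[i].str;
	}

	if (!found)
		return -EBADMSG;

	*str_ret = found;
	return 0;
}

/* Look up one type in a fixed sample list: type 1 once, type 2 twice. */
static int demo_lookup(unsigned int type)
{
	static const struct demo_entry entries[] = {
		{ 1, "com.example.a" },
		{ 2, "first" },
		{ 2, "second" },
	};
	const char *str = NULL;

	return demo_get_str(entries, sizeof(entries) / sizeof(entries[0]),
			    type, &str);
}
```

Note that the output pointer is only written on success, so a caller can keep a default value in place when the lookup fails.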
diff --git a/drivers/misc/kdbus/item.h b/drivers/misc/kdbus/item.h
new file mode 100644
index 000000000000..63ff4f1c9208
--- /dev/null
+++ b/drivers/misc/kdbus/item.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_ITEM_H
+#define __KDBUS_ITEM_H
+
+#include <linux/kernel.h>
+#include <uapi/linux/kdbus.h>
+
+#include "util.h"
+
+/* generic access and iterators over a stream of items */
+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
+#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
+#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
+#define KDBUS_ITEM_NEXT(_i) (typeof(_i))(((u8 *)(_i)) + KDBUS_ALIGN8((_i)->size))
+#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*_h), _is))
+
+#define KDBUS_ITEMS_FOREACH(_i, _is, _s) \
+ for (_i = _is; \
+ ((u8 *)(_i) < (u8 *)(_is) + (_s)) && \
+ ((u8 *)(_i) >= (u8 *)(_is)); \
+ _i = KDBUS_ITEM_NEXT(_i))
+
+int kdbus_item_validate_name(const struct kdbus_item *item);
+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
+int kdbus_items_get_str(const struct kdbus_item *items, size_t items_size,
+ unsigned int item_type, const char **str_ret);
+
+#endif
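The way the iterator macros and the validity checks cooperate may be easier to follow in a small userspace sketch. It assumes an item header of two 64-bit fields (size, then type) followed by the payload, with each item padded to 8 bytes. All demo_* names and DEMO_ALIGN8 are hypothetical and only mirror the roles of KDBUS_ITEMS_FOREACH, KDBUS_ITEM_VALID and KDBUS_ITEMS_END. The sketch walks by offset and copies headers out with memcpy() instead of casting pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical item header: total item size (without padding) and type. */
struct demo_item_hdr {
	uint64_t size;
	uint64_t type;
};

#define DEMO_ALIGN8(v) (((size_t)(v) + 7) & ~(size_t)7)

/* Append one item at offset off; returns the offset of the next item. */
static size_t demo_item_append(uint8_t *buf, size_t off, uint64_t type,
			       const void *payload, size_t len)
{
	struct demo_item_hdr hdr = {
		.size = sizeof(struct demo_item_hdr) + len,
		.type = type,
	};

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), payload, len);
	return off + DEMO_ALIGN8(sizeof(hdr) + len);
}

/* Walk the stream; return the item count, or -1 if it is malformed. */
static int demo_items_walk(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL);

	while (off < size) {
		struct demo_item_hdr hdr;

		/* the header itself must fit into the remaining bytes */
		if (size - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* non-empty payload, and the item must end inside the buffer */
		if (hdr.size <= sizeof(hdr) || hdr.size > size - off)
			return -1;

		count++;
		off += DEMO_ALIGN8(hdr.size);
	}

	/* the walk must stop exactly at the 8-byte aligned end */
	return off == DEMO_ALIGN8(size) ? count : -1;
}

/* Build a two-item stream, cut `cut` bytes off its end, and walk it. */
static int demo_example(size_t cut)
{
	uint8_t buf[64] = { 0 };
	size_t end;

	end = demo_item_append(buf, 0, 1, "ab", 3);	/* 19 bytes, padded to 24 */
	end = demo_item_append(buf, end, 2, "xyz", 4);	/* 20 bytes, padded to 24 */
	return demo_items_walk(buf, end - cut);
}
```

The final comparison against the aligned end is what catches a stream whose last item claims less space than the container says it holds.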
diff --git a/drivers/misc/kdbus/message.c b/drivers/misc/kdbus/message.c
new file mode 100644
index 000000000000..8550d62b030c
--- /dev/null
+++ b/drivers/misc/kdbus/message.c
@@ -0,0 +1,420 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/capability.h>
+#include <linux/cgroup.h>
+#include <linux/cred.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <net/sock.h>
+
+#include "bus.h"
+#include "connection.h"
+#include "domain.h"
+#include "endpoint.h"
+#include "handle.h"
+#include "item.h"
+#include "match.h"
+#include "message.h"
+#include "names.h"
+#include "policy.h"
+
+#define KDBUS_KMSG_HEADER_SIZE offsetof(struct kdbus_kmsg, msg)
+
+/**
+ * kdbus_kmsg_free() - free allocated message
+ * @kmsg: Message
+ */
+void kdbus_kmsg_free(struct kdbus_kmsg *kmsg)
+{
+ kdbus_fput_files(kmsg->memfds, kmsg->memfds_count);
+ kdbus_fput_files(kmsg->fds, kmsg->fds_count);
+ kdbus_meta_free(kmsg->meta);
+ kfree(kmsg->memfds);
+ kfree(kmsg->fds);
+ kfree(kmsg);
+}
+
+/**
+ * kdbus_kmsg_new() - allocate message
+ * @extra_size: additional size to reserve for data
+ * @kmsg: Returned Message
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_kmsg_new(size_t extra_size, struct kdbus_kmsg **kmsg)
+{
+ struct kdbus_kmsg *m;
+ size_t size;
+
+ BUG_ON(*kmsg);
+
+ size = sizeof(struct kdbus_kmsg) + KDBUS_ITEM_SIZE(extra_size);
+ m = kzalloc(size, GFP_KERNEL);
+ if (!m)
+ return -ENOMEM;
+
+ m->msg.size = size - KDBUS_KMSG_HEADER_SIZE;
+ m->msg.items[0].size = KDBUS_ITEM_SIZE(extra_size);
+
+ *kmsg = m;
+ return 0;
+}
+
+static int kdbus_handle_check_file(struct file *file)
+{
+ struct inode *inode = file_inode(file);
+ struct socket *sock;
+
+ /*
+ * Don't allow file descriptors in the transport that themselves allow
+ * file descriptor queueing. This will eventually be allowed once both
+ * unix domain sockets and kdbus share a generic garbage collector.
+ */
+
+ if (file->f_op == &kdbus_handle_ops)
+ return -EOPNOTSUPP;
+
+ if (!S_ISSOCK(inode->i_mode))
+ return 0;
+
+ if (file->f_mode & FMODE_PATH)
+ return 0;
+
+ sock = SOCKET_I(inode);
+ if (sock->sk && sock->ops && sock->ops->family == PF_UNIX)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+/*
+ * kdbus_msg_scan_items() - validate incoming data and prepare parsing
+ * @conn: Connection
+ * @kmsg: Message
+ *
+ * Return: 0 on success, negative errno on failure.
+ *
+ * On errors, the caller should drop any taken reference with
+ * kdbus_kmsg_free()
+ */
+static int kdbus_msg_scan_items(struct kdbus_conn *conn,
+ struct kdbus_kmsg *kmsg)
+{
+ const struct kdbus_msg *msg = &kmsg->msg;
+ const struct kdbus_item *item;
+ unsigned int items_count = 0;
+ size_t vecs_size = 0;
+ bool has_bloom = false;
+ bool has_name = false;
+ bool has_fds = false;
+ struct file *f;
+
+ KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items))
+ if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD)
+ kmsg->memfds_count++;
+
+ if (kmsg->memfds_count > 0) {
+ kmsg->memfds = kcalloc(kmsg->memfds_count,
+ sizeof(struct file *), GFP_KERNEL);
+ if (!kmsg->memfds)
+ return -ENOMEM;
+
+ /* reset counter so we can reuse it */
+ kmsg->memfds_count = 0;
+ }
+
+ KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
+ size_t payload_size;
+
+ if (++items_count > KDBUS_MSG_MAX_ITEMS)
+ return -E2BIG;
+
+ payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
+
+ switch (item->type) {
+ case KDBUS_ITEM_PAYLOAD_VEC:
+ if (vecs_size + item->vec.size <= vecs_size)
+ return -EMSGSIZE;
+
+ vecs_size += item->vec.size;
+ if (vecs_size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
+ return -EMSGSIZE;
+
+ /* \0-bytes records store only the alignment bytes */
+ if (KDBUS_PTR(item->vec.address))
+ kmsg->vecs_size += item->vec.size;
+ else
+ kmsg->vecs_size += item->vec.size % 8;
+ kmsg->vecs_count++;
+ break;
+
+ case KDBUS_ITEM_PAYLOAD_MEMFD: {
+ int seals, mask;
+ int fd = item->memfd.fd;
+
+ /* Verify the fd and increment the usage count */
+ if (fd < 0)
+ return -EBADF;
+
+ f = fget(fd);
+ if (!f)
+ return -EBADF;
+
+ kmsg->memfds[kmsg->memfds_count] = f;
+ kmsg->memfds_count++;
+
+ /*
+ * We only accept a sealed memfd file whose content
+ * cannot be altered by the sender or anybody else
+ * while it is shared or in-flight. Other files need
+ * to be passed with KDBUS_ITEM_FDS.
+ */
+ seals = shmem_get_seals(f);
+ if (seals < 0)
+ return -EMEDIUMTYPE;
+
+ mask = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE;
+ if ((seals & mask) != mask)
+ return -ETXTBSY;
+
+ /*
+ * The specified size in the item cannot be larger
+ * than the backing file.
+ */
+ if (item->memfd.size > i_size_read(file_inode(f)))
+ return -EBADF;
+
+ break;
+ }
+
+ case KDBUS_ITEM_FDS: {
+ unsigned int n, i;
+
+ /* do not allow multiple fd arrays */
+ if (has_fds)
+ return -EEXIST;
+ has_fds = true;
+
+ /* do not allow broadcasting of file descriptors */
+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST)
+ return -ENOTUNIQ;
+
+ n = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
+ if (n > KDBUS_MSG_MAX_FDS)
+ return -EMFILE;
+
+ kmsg->fds = kcalloc(n, sizeof(*kmsg->fds), GFP_KERNEL);
+ if (!kmsg->fds)
+ return -ENOMEM;
+
+ for (i = 0; i < n; i++) {
+ int ret;
+ int fd = item->fds[i];
+
+ /*
+ * Verify the fd and increment the usage count.
+ * Use fget_raw() to allow passing O_PATH fds.
+ */
+ if (fd < 0)
+ return -EBADF;
+
+ f = fget_raw(fd);
+ if (!f)
+ return -EBADF;
+
+ kmsg->fds[i] = f;
+ kmsg->fds_count++;
+
+ ret = kdbus_handle_check_file(f);
+ if (ret < 0)
+ return ret;
+ }
+
+ break;
+ }
+
+ case KDBUS_ITEM_BLOOM_FILTER: {
+ u64 bloom_size;
+
+ /* do not allow multiple bloom filters */
+ if (has_bloom)
+ return -EEXIST;
+ has_bloom = true;
+
+ /* bloom filters are only for broadcast messages */
+ if (msg->dst_id != KDBUS_DST_ID_BROADCAST)
+ return -EBADMSG;
+
+ bloom_size = payload_size -
+ offsetof(struct kdbus_bloom_filter, data);
+
+ /*
+ * Allow only bloom filter sizes that are a multiple of 64 bits.
+ */
+ if (!KDBUS_IS_ALIGNED8(bloom_size))
+ return -EFAULT;
+
+ /* do not allow mismatching bloom filter sizes */
+ if (bloom_size != conn->bus->bloom.size)
+ return -EDOM;
+
+ kmsg->bloom_filter = &item->bloom_filter;
+ break;
+ }
+
+ case KDBUS_ITEM_DST_NAME:
+ /* do not allow multiple names */
+ if (has_name)
+ return -EEXIST;
+ has_name = true;
+
+ if (!kdbus_name_is_valid(item->str, false))
+ return -EINVAL;
+
+ kmsg->dst_name = item->str;
+ break;
+ }
+ }
+
+ /* name is needed if no ID is given */
+ if (msg->dst_id == KDBUS_DST_ID_NAME && !has_name)
+ return -EDESTADDRREQ;
+
+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
+ /* broadcasts can't take names */
+ if (has_name)
+ return -EBADMSG;
+
+ /* broadcast messages require a bloom filter */
+ if (!has_bloom)
+ return -EBADMSG;
+
+ /* timeouts are not allowed for broadcasts */
+ if (msg->timeout_ns > 0)
+ return -ENOTUNIQ;
+ }
+
+ /* bloom filters are for undirected messages only */
+ if (has_name && has_bloom)
+ return -EBADMSG;
+
+ return 0;
+}
+
+/**
+ * kdbus_kmsg_new_from_user() - copy message from user memory
+ * @conn: Connection
+ * @msg: User-provided message
+ * @kmsg: Copy of message
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_kmsg_new_from_user(struct kdbus_conn *conn,
+ struct kdbus_msg __user *msg,
+ struct kdbus_kmsg **kmsg)
+{
+ struct kdbus_kmsg *m;
+ u64 size, alloc_size;
+ int ret;
+
+ BUG_ON(*kmsg);
+
+ if (!KDBUS_IS_ALIGNED8((unsigned long)msg))
+ return -EFAULT;
+
+ if (kdbus_size_get_user(&size, msg, struct kdbus_msg))
+ return -EFAULT;
+
+ if (size < sizeof(struct kdbus_msg) || size > KDBUS_MSG_MAX_SIZE)
+ return -EMSGSIZE;
+
+ alloc_size = size + KDBUS_KMSG_HEADER_SIZE;
+
+ m = kmalloc(alloc_size, GFP_KERNEL);
+ if (!m)
+ return -ENOMEM;
+ memset(m, 0, KDBUS_KMSG_HEADER_SIZE);
+
+ if (copy_from_user(&m->msg, msg, size)) {
+ ret = -EFAULT;
+ goto exit_free;
+ }
+
+ ret = kdbus_items_validate(m->msg.items,
+ KDBUS_ITEMS_SIZE(&m->msg, items));
+ if (ret < 0)
+ goto exit_free;
+
+ /* do not accept kernel-generated messages */
+ if (m->msg.payload_type == KDBUS_PAYLOAD_KERNEL) {
+ ret = -EINVAL;
+ goto exit_free;
+ }
+
+ ret = kdbus_negotiate_flags(&m->msg, msg, struct kdbus_msg,
+ KDBUS_MSG_FLAGS_EXPECT_REPLY |
+ KDBUS_MSG_FLAGS_SYNC_REPLY |
+ KDBUS_MSG_FLAGS_NO_AUTO_START);
+ if (ret < 0)
+ goto exit_free;
+
+ if (m->msg.flags & KDBUS_MSG_FLAGS_EXPECT_REPLY) {
+ /* requests for replies need a timeout */
+ if (m->msg.timeout_ns == 0) {
+ ret = -EINVAL;
+ goto exit_free;
+ }
+
+ /* replies may not be expected for broadcasts */
+ if (m->msg.dst_id == KDBUS_DST_ID_BROADCAST) {
+ ret = -ENOTUNIQ;
+ goto exit_free;
+ }
+ } else {
+ /*
+ * KDBUS_MSG_FLAGS_SYNC_REPLY is only valid together with
+ * KDBUS_MSG_FLAGS_EXPECT_REPLY
+ */
+ if (m->msg.flags & KDBUS_MSG_FLAGS_SYNC_REPLY) {
+ ret = -EINVAL;
+ goto exit_free;
+ }
+ }
+
+ ret = kdbus_msg_scan_items(conn, m);
+ if (ret < 0)
+ goto exit_free;
+
+ /* patch-in the source of this message */
+ if (m->msg.src_id > 0 && m->msg.src_id != conn->id) {
+ ret = -EINVAL;
+ goto exit_free;
+ }
+ m->msg.src_id = conn->id;
+
+ *kmsg = m;
+ return 0;
+
+exit_free:
+ kdbus_kmsg_free(m);
+ return ret;
+}
diff --git a/drivers/misc/kdbus/message.h b/drivers/misc/kdbus/message.h
new file mode 100644
index 000000000000..2c8573423d4f
--- /dev/null
+++ b/drivers/misc/kdbus/message.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_MESSAGE_H
+#define __KDBUS_MESSAGE_H
+
+#include "util.h"
+#include "metadata.h"
+
+/**
+ * struct kdbus_kmsg - internal message handling data
+ * @seq: Domain-global message sequence number
+ * @notify_type: Short-cut for faster lookup
+ * @notify_old_id: Short-cut for faster lookup
+ * @notify_new_id: Short-cut for faster lookup
+ * @notify_name: Short-cut for faster lookup
+ * @dst_name: Short-cut to msg for faster lookup
+ * @dst_name_id: Short-cut to msg for faster lookup
+ * @bloom_filter: Bloom filter to match message properties
+ * @bloom_generation: Generation of bloom element set
+ * @fds: Array of file descriptors to pass
+ * @fds_count: Number of file descriptors to pass
+ * @meta: Appended SCM-like metadata of the sending process
+ * @vecs_size: Size of PAYLOAD data
+ * @vecs_count: Number of PAYLOAD vectors
+ * @memfds_count: Number of memfds to pass
+ * @queue_entry: List of kernel-generated notifications
+ * @msg: Message from or to userspace
+ */
+struct kdbus_kmsg {
+ u64 seq;
+ u64 notify_type;
+ u64 notify_old_id;
+ u64 notify_new_id;
+ const char *notify_name;
+
+ const char *dst_name;
+ u64 dst_name_id;
+ const struct kdbus_bloom_filter *bloom_filter;
+ u64 bloom_generation;
+ struct file **fds;
+ unsigned int fds_count;
+ struct kdbus_meta *meta;
+ size_t vecs_size;
+ unsigned int vecs_count;
+ struct file **memfds;
+ unsigned int memfds_count;
+ struct list_head queue_entry;
+
+ /* variable size, must be the last member */
+ struct kdbus_msg msg;
+};
+
+struct kdbus_ep;
+struct kdbus_conn;
+
+int kdbus_kmsg_new(size_t extra_size, struct kdbus_kmsg **kmsg);
+int kdbus_kmsg_new_from_user(struct kdbus_conn *conn,
+ struct kdbus_msg __user *msg,
+ struct kdbus_kmsg **kmsg);
+void kdbus_kmsg_free(struct kdbus_kmsg *kmsg);
+#endif
diff --git a/drivers/misc/kdbus/queue.c b/drivers/misc/kdbus/queue.c
new file mode 100644
index 000000000000..6693852f7ba8
--- /dev/null
+++ b/drivers/misc/kdbus/queue.c
@@ -0,0 +1,602 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/audit.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/math64.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+
+#include "connection.h"
+#include "item.h"
+#include "message.h"
+#include "metadata.h"
+#include "util.h"
+#include "queue.h"
+
+static int kdbus_queue_entry_fds_install(struct kdbus_queue_entry *entry)
+{
+ unsigned int i;
+ int ret, *fds;
+ size_t count;
+
+ /* get array of file descriptors */
+ count = entry->fds_count + entry->memfds_count;
+ if (!count)
+ return 0;
+
+ fds = kcalloc(count, sizeof(int), GFP_KERNEL);
+ if (!fds)
+ return -ENOMEM;
+
+ /* allocate new file descriptors in the receiver's process */
+ for (i = 0; i < count; i++) {
+ fds[i] = get_unused_fd_flags(O_CLOEXEC);
+ if (fds[i] < 0) {
+ ret = fds[i];
+ goto exit_remove_unused;
+ }
+ }
+
+ if (entry->fds_count) {
+ /* copy the array into the message item */
+ ret = kdbus_pool_slice_copy(entry->slice, entry->fds, fds,
+ entry->fds_count * sizeof(int));
+ if (ret < 0)
+ goto exit_remove_unused;
+
+ /* install files in the receiver's process */
+ for (i = 0; i < entry->fds_count; i++)
+ fd_install(fds[i], get_file(entry->fds_fp[i]));
+ }
+
+ if (entry->memfds_count) {
+ off_t o = entry->fds_count;
+
+ /*
+ * Update the file descriptor number in the items.
+ * We remembered the locations of the values in the buffer.
+ */
+ for (i = 0; i < entry->memfds_count; i++) {
+ ret = kdbus_pool_slice_copy(entry->slice,
+ entry->memfds[i],
+ &fds[o + i], sizeof(int));
+ if (ret < 0)
+ goto exit_rewind_fds;
+ }
+
+ /* install files in the receiver's process */
+ for (i = 0; i < entry->memfds_count; i++)
+ fd_install(fds[o + i], get_file(entry->memfds_fp[i]));
+ }
+
+ kfree(fds);
+ return 0;
+
+exit_rewind_fds:
+ for (i = 0; i < entry->fds_count; i++)
+ sys_close(fds[i]);
+
+exit_remove_unused:
+ for (i = 0; i < count; i++) {
+ if (fds[i] < 0)
+ break;
+
+ put_unused_fd(fds[i]);
+ }
+
+ kfree(fds);
+ return ret;
+}
+
+/**
+ * kdbus_queue_entry_install() - install message components into the
+ * receiver's process
+ * @entry: The queue entry to install
+ *
+ * This function will install file descriptors into 'current'.
+ * Also, if the associated message has metadata attached whose final values
+ * couldn't be determined before (such as details that are related to
+ * namespaces), the correct information is patched in at this point.
+ *
+ * Return: 0 on success.
+ */
+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry)
+{
+ int *memfds = NULL;
+ int *fds = NULL;
+ int ret = 0;
+
+ ret = kdbus_queue_entry_fds_install(entry);
+ if (ret < 0)
+ return ret;
+
+ kfree(fds);
+ kfree(memfds);
+ kdbus_pool_slice_flush(entry->slice);
+ return 0;
+}
+
+static int kdbus_queue_entry_payload_add(struct kdbus_queue_entry *entry,
+ const struct kdbus_kmsg *kmsg,
+ size_t items, size_t vec_data)
+{
+ const struct kdbus_item *item;
+ int ret;
+
+ if (kmsg->memfds_count > 0) {
+ entry->memfds = kcalloc(kmsg->memfds_count,
+ sizeof(off_t), GFP_KERNEL);
+ if (!entry->memfds)
+ return -ENOMEM;
+
+ entry->memfds_fp = kcalloc(kmsg->memfds_count,
+ sizeof(struct file *), GFP_KERNEL);
+ if (!entry->memfds_fp)
+ return -ENOMEM;
+ }
+
+ KDBUS_ITEMS_FOREACH(item, kmsg->msg.items,
+ KDBUS_ITEMS_SIZE(&kmsg->msg, items)) {
+ switch (item->type) {
+ case KDBUS_ITEM_PAYLOAD_VEC: {
+ char tmp[KDBUS_ITEM_HEADER_SIZE +
+ sizeof(struct kdbus_vec)];
+ struct kdbus_item *it = (struct kdbus_item *)tmp;
+
+ /* add item */
+ it->type = KDBUS_ITEM_PAYLOAD_OFF;
+ it->size = sizeof(tmp);
+
+ /* a NULL address specifies a \0-bytes record */
+ if (KDBUS_PTR(item->vec.address))
+ it->vec.offset = vec_data;
+ else
+ it->vec.offset = ~0ULL;
+ it->vec.size = item->vec.size;
+ ret = kdbus_pool_slice_copy(entry->slice, items,
+ it, it->size);
+ if (ret < 0)
+ return ret;
+ items += KDBUS_ALIGN8(it->size);
+
+ /* \0-bytes record */
+ if (!KDBUS_PTR(item->vec.address)) {
+ size_t l = item->vec.size % 8;
+ const char *n = "\0\0\0\0\0\0\0";
+
+ if (l == 0)
+ break;
+
+ /*
+ * Preserve the alignment for the next payload
+ * record in the output buffer; write as many
+ * null-bytes to the buffer as the \0-bytes record
+ * would have shifted the alignment by.
+ */
+ ret = kdbus_pool_slice_copy(entry->slice,
+ vec_data, n, l);
+ if (ret < 0)
+ return ret;
+
+ vec_data += l;
+ break;
+ }
+
+ /* copy kdbus_vec data from sender to receiver */
+ ret = kdbus_pool_slice_copy_user(entry->slice, vec_data,
+ KDBUS_PTR(item->vec.address), item->vec.size);
+ if (ret < 0)
+ return ret;
+
+ vec_data += item->vec.size;
+ break;
+ }
+
+ case KDBUS_ITEM_PAYLOAD_MEMFD: {
+ char tmp[KDBUS_ITEM_HEADER_SIZE +
+ sizeof(struct kdbus_memfd)];
+ struct kdbus_item *it = (struct kdbus_item *)tmp;
+
+ /* add item */
+ it->type = KDBUS_ITEM_PAYLOAD_MEMFD;
+ it->size = sizeof(tmp);
+ it->memfd.size = item->memfd.size;
+ it->memfd.fd = -1;
+ ret = kdbus_pool_slice_copy(entry->slice, items,
+ it, it->size);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * Remember the file and the location of the fd number
+ * which will be updated at RECV time.
+ */
+ entry->memfds[entry->memfds_count] =
+ items + offsetof(struct kdbus_item, memfd.fd);
+ entry->memfds_fp[entry->memfds_count] =
+ get_file(kmsg->memfds[entry->memfds_count]);
+ entry->memfds_count++;
+
+ items += KDBUS_ALIGN8(it->size);
+ break;
+ }
+
+ default:
+ break;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * kdbus_queue_entry_add() - Add a queue entry to a queue
+ * @queue: The queue to attach the item to
+ * @entry: The entry to attach
+ *
+ * Adds a previously allocated queue item to a queue, and maintains the
+ * priority r/b tree.
+ */
+/* add queue entry to connection, maintain priority queue */
+void kdbus_queue_entry_add(struct kdbus_queue *queue,
+ struct kdbus_queue_entry *entry)
+{
+ struct rb_node **n, *pn = NULL;
+ bool highest = true;
+
+ /* sort into priority entry tree */
+ n = &queue->msg_prio_queue.rb_node;
+ while (*n) {
+ struct kdbus_queue_entry *e;
+
+ pn = *n;
+ e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
+
+ /* existing node for this priority, add to its list */
+ if (likely(entry->priority == e->priority)) {
+ list_add_tail(&entry->prio_entry, &e->prio_entry);
+ goto prio_done;
+ }
+
+ if (entry->priority < e->priority) {
+ n = &pn->rb_left;
+ } else {
+ n = &pn->rb_right;
+ highest = false;
+ }
+ }
+
+ /* cache highest-priority entry */
+ if (highest)
+ queue->msg_prio_highest = &entry->prio_node;
+
+ /* new node for this priority */
+ rb_link_node(&entry->prio_node, pn, n);
+ rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
+ INIT_LIST_HEAD(&entry->prio_entry);
+
+prio_done:
+ /* add to unsorted fifo list */
+ list_add_tail(&entry->entry, &queue->msg_list);
+ queue->msg_count++;
+}
+
+/**
+ * kdbus_queue_entry_peek() - Retrieves an entry from a queue
+ *
+ * @queue: The queue
+ * @priority: The minimum priority of the entry to peek
+ * @use_priority: Boolean flag whether or not to peek by priority
+ * @entry: Pointer to return the peeked entry
+ *
+ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
+ * The entry is neither freed nor removed from the queue's lists.
+ *
+ * Return: 0 on success, -ENOMSG if there is no entry with the requested
+ * priority, or -EAGAIN if there are no entries at all.
+ */
+int kdbus_queue_entry_peek(struct kdbus_queue *queue,
+ s64 priority, bool use_priority,
+ struct kdbus_queue_entry **entry)
+{
+ struct kdbus_queue_entry *e;
+
+ if (queue->msg_count == 0)
+ return -EAGAIN;
+
+ if (use_priority) {
+ /* get next entry with highest priority */
+ e = rb_entry(queue->msg_prio_highest,
+ struct kdbus_queue_entry, prio_node);
+
+ /* no entry with the requested priority */
+ if (e->priority > priority)
+ return -ENOMSG;
+ } else {
+ /* ignore the priority, return the next entry in the list */
+ e = list_first_entry(&queue->msg_list,
+ struct kdbus_queue_entry, entry);
+ }
+
+ *entry = e;
+
+ return 0;
+}
+
+/**
+ * kdbus_queue_entry_remove() - Remove an entry from a queue
+ * @conn: The connection containing the queue
+ * @entry: The entry to remove
+ *
+ * Remove an entry from both the queue's list and the priority r/b tree.
+ */
+void kdbus_queue_entry_remove(struct kdbus_conn *conn,
+ struct kdbus_queue_entry *entry)
+{
+ struct kdbus_queue *queue = &conn->queue;
+
+ list_del(&entry->entry);
+ queue->msg_count--;
+
+ /* user quota */
+ if (entry->user >= 0) {
+ BUG_ON(conn->msg_users[entry->user] == 0);
+ conn->msg_users[entry->user]--;
+ entry->user = -1;
+ }
+
+ /* the queue is empty, remove the user quota accounting */
+ if (queue->msg_count == 0 && conn->msg_users_max > 0) {
+ kfree(conn->msg_users);
+ conn->msg_users = NULL;
+ conn->msg_users_max = 0;
+ }
+
+ if (list_empty(&entry->prio_entry)) {
+ /*
+ * Single entry for this priority, update cached
+ * highest-priority entry, remove the tree node.
+ */
+ if (queue->msg_prio_highest == &entry->prio_node)
+ queue->msg_prio_highest = rb_next(&entry->prio_node);
+
+ rb_erase(&entry->prio_node, &queue->msg_prio_queue);
+ } else {
+ struct kdbus_queue_entry *q;
+
+ /*
+ * Multiple entries for this priority entry, get next one in
+ * the list. Update cached highest-priority entry, store the
+ * new one as the tree node.
+ */
+ q = list_first_entry(&entry->prio_entry,
+ struct kdbus_queue_entry, prio_entry);
+ list_del(&entry->prio_entry);
+
+ if (queue->msg_prio_highest == &entry->prio_node)
+ queue->msg_prio_highest = &q->prio_node;
+
+ rb_replace_node(&entry->prio_node, &q->prio_node,
+ &queue->msg_prio_queue);
+ }
+}
+
+/**
+ * kdbus_queue_entry_alloc() - allocate a queue entry
+ * @conn: The connection that holds the queue
+ * @kmsg: The kmsg object the queue entry should track
+ * @e: Pointer to return the allocated entry
+ *
+ * Allocates a queue entry based on a given kmsg and allocates space for
+ * the message payload and the requested metadata in the connection's pool.
+ * The entry is not actually added to the queue's lists at this point.
+ */
+int kdbus_queue_entry_alloc(struct kdbus_conn *conn,
+ const struct kdbus_kmsg *kmsg,
+ struct kdbus_queue_entry **e)
+{
+ struct kdbus_queue_entry *entry;
+ struct kdbus_item *it;
+ u64 msg_size;
+ size_t size;
+ size_t dst_name_len = 0;
+ size_t payloads = 0;
+ size_t fds = 0;
+ size_t meta_off = 0;
+ size_t vec_data;
+ size_t want, have;
+ int ret = 0;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->user = -1;
+
+ /* copy message properties we need for the entry management */
+ entry->src_id = kmsg->msg.src_id;
+ entry->cookie = kmsg->msg.cookie;
+
+ /* space for the header */
+ if (kmsg->msg.src_id == KDBUS_SRC_ID_KERNEL)
+ size = kmsg->msg.size;
+ else
+ size = offsetof(struct kdbus_msg, items);
+ msg_size = size;
+
+ /* let the receiver know where the message was addressed to */
+ if (kmsg->dst_name) {
+ dst_name_len = strlen(kmsg->dst_name) + 1;
+ msg_size += KDBUS_ITEM_SIZE(dst_name_len);
+ entry->dst_name_id = kmsg->dst_name_id;
+ }
+
+ /* space for PAYLOAD items */
+ if ((kmsg->vecs_count + kmsg->memfds_count) > 0) {
+ payloads = msg_size;
+ msg_size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec)) *
+ kmsg->vecs_count;
+ msg_size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd)) *
+ kmsg->memfds_count;
+ }
+
+ /* space for FDS item */
+ if (kmsg->fds_count > 0) {
+ entry->fds_fp = kcalloc(kmsg->fds_count, sizeof(struct file *),
+ GFP_KERNEL);
+ if (!entry->fds_fp) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ fds = msg_size;
+ msg_size += KDBUS_ITEM_SIZE(kmsg->fds_count * sizeof(int));
+ }
+
+ /* space for metadata/credential items */
+ if (kmsg->meta && kmsg->meta->size > 0 &&
+ kdbus_meta_ns_eq(kmsg->meta, conn->meta)) {
+ meta_off = msg_size;
+ msg_size += kmsg->meta->size;
+ }
+
+ /* data starts after the message */
+ vec_data = KDBUS_ALIGN8(msg_size);
+
+ /* do not give out more than half of the remaining space */
+ want = vec_data + kmsg->vecs_size;
+ have = kdbus_pool_remain(conn->pool);
+ if (want < have && want > have / 2) {
+ ret = -EXFULL;
+ goto exit;
+ }
+
+ /* allocate the needed space in the pool of the receiver */
+ ret = kdbus_pool_slice_alloc(conn->pool, &entry->slice, want);
+ if (ret < 0)
+ goto exit;
+
+ /* copy the message header */
+ ret = kdbus_pool_slice_copy(entry->slice, 0, &kmsg->msg, size);
+ if (ret < 0)
+ goto exit_pool_free;
+
+ /* update the size */
+ ret = kdbus_pool_slice_copy(entry->slice, 0, &msg_size,
+ sizeof(kmsg->msg.size));
+ if (ret < 0)
+ goto exit_pool_free;
+
+ if (dst_name_len > 0) {
+ char tmp[KDBUS_ITEM_HEADER_SIZE + dst_name_len];
+
+ it = (struct kdbus_item *)tmp;
+ it->size = KDBUS_ITEM_HEADER_SIZE + dst_name_len;
+ it->type = KDBUS_ITEM_DST_NAME;
+ memcpy(it->str, kmsg->dst_name, dst_name_len);
+
+ ret = kdbus_pool_slice_copy(entry->slice, size, it, it->size);
+ if (ret < 0)
+ goto exit_pool_free;
+ }
+
+ /* add PAYLOAD items */
+ if (payloads > 0) {
+ ret = kdbus_queue_entry_payload_add(entry, kmsg,
+ payloads, vec_data);
+ if (ret < 0)
+ goto exit_pool_free;
+ }
+
+ /* add a FDS item; the array content will be updated at RECV time */
+ if (kmsg->fds_count > 0) {
+ char tmp[KDBUS_ITEM_HEADER_SIZE];
+ unsigned int i;
+
+ it = (struct kdbus_item *)tmp;
+ it->type = KDBUS_ITEM_FDS;
+ it->size = KDBUS_ITEM_HEADER_SIZE +
+ (kmsg->fds_count * sizeof(int));
+ ret = kdbus_pool_slice_copy(entry->slice, fds,
+ it, KDBUS_ITEM_HEADER_SIZE);
+ if (ret < 0)
+ goto exit_pool_free;
+
+ for (i = 0; i < kmsg->fds_count; i++) {
+ entry->fds_fp[i] = get_file(kmsg->fds[i]);
+ if (!entry->fds_fp[i]) {
+ ret = -EBADF;
+ goto exit_pool_free;
+ }
+ }
+
+ /* remember the array to update at RECV */
+ entry->fds = fds + offsetof(struct kdbus_item, fds);
+ entry->fds_count = kmsg->fds_count;
+ }
+
+ /* append message metadata/credential items */
+ if (meta_off > 0) {
+ ret = kdbus_pool_slice_copy(entry->slice, meta_off,
+ kmsg->meta->data,
+ kmsg->meta->size);
+ if (ret < 0)
+ goto exit_pool_free;
+ }
+
+ entry->priority = kmsg->msg.priority;
+ *e = entry;
+ return 0;
+
+exit_pool_free:
+ kdbus_pool_slice_free(entry->slice);
+exit:
+ kdbus_queue_entry_free(entry);
+ return ret;
+}
+
+/**
+ * kdbus_queue_entry_free() - free resources of an entry
+ * @entry: The entry to free
+ *
+ * Removes resources allocated by a queue entry, along with the entry itself.
+ * Note that the entry's slice is not freed at this point.
+ */
+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
+{
+ kdbus_fput_files(entry->memfds_fp, entry->memfds_count);
+ kdbus_fput_files(entry->fds_fp, entry->fds_count);
+ kfree(entry->memfds_fp);
+ kfree(entry->fds_fp);
+ kfree(entry);
+}
+
+/**
+ * kdbus_queue_init() - initialize data structure related to a queue
+ * @queue: The queue to initialize
+ */
+void kdbus_queue_init(struct kdbus_queue *queue)
+{
+ INIT_LIST_HEAD(&queue->msg_list);
+ queue->msg_prio_queue = RB_ROOT;
+}
diff --git a/drivers/misc/kdbus/queue.h b/drivers/misc/kdbus/queue.h
new file mode 100644
index 000000000000..26ff199a40f7
--- /dev/null
+++ b/drivers/misc/kdbus/queue.h
@@ -0,0 +1,82 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_QUEUE_H
+#define __KDBUS_QUEUE_H
+
+struct kdbus_queue {
+ size_t msg_count;
+ struct list_head msg_list;
+ struct rb_root msg_prio_queue;
+ struct rb_node *msg_prio_highest;
+};
+
+/**
+ * struct kdbus_queue_entry - messages waiting to be read
+ * @entry: Entry in the connection's list
+ * @prio_node: Entry in the priority queue tree
+ * @prio_entry: Queue tree node entry in the list of one priority
+ * @priority: Queueing priority of the message
+ * @slice: Allocated slice in the receiver's pool
+ * @memfds: Array of offsets where to update the installed
+ * fd number
+ * @memfds_fp: Array of memfd files queued up for this message
+ * @memfds_count: Number of memfds
+ * @fds: Offset to array where to update the installed fd number
+ * @fds_fp: Array of passed files queued up for this message
+ * @fds_count: Number of files
+ * @src_id: The ID of the sender
+ * @cookie: Message cookie, used for replies
+ * @dst_name_id: The sequence number of the name this message is
+ * addressed to, 0 for messages sent to an ID
+ * @reply: The reply block if a reply to this message is expected.
+ * @user: Index in per-user message counter, -1 for unused
+ */
+struct kdbus_queue_entry {
+ struct list_head entry;
+ struct rb_node prio_node;
+ struct list_head prio_entry;
+ s64 priority;
+ struct kdbus_pool_slice *slice;
+ size_t *memfds;
+ struct file **memfds_fp;
+ unsigned int memfds_count;
+ size_t fds;
+ struct file **fds_fp;
+ unsigned int fds_count;
+ u64 src_id;
+ u64 cookie;
+ u64 dst_name_id;
+ struct kdbus_conn_reply *reply;
+ int user;
+};
+
+struct kdbus_kmsg;
+
+void kdbus_queue_init(struct kdbus_queue *queue);
+
+int kdbus_queue_entry_alloc(struct kdbus_conn *conn,
+ const struct kdbus_kmsg *kmsg,
+ struct kdbus_queue_entry **e);
+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
+
+void kdbus_queue_entry_add(struct kdbus_queue *queue,
+ struct kdbus_queue_entry *entry);
+void kdbus_queue_entry_remove(struct kdbus_conn *conn,
+ struct kdbus_queue_entry *entry);
+int kdbus_queue_entry_peek(struct kdbus_queue *queue,
+ s64 priority, bool use_priority,
+ struct kdbus_queue_entry **entry);
+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry);
+
+#endif /* __KDBUS_QUEUE_H */
diff --git a/drivers/misc/kdbus/util.h b/drivers/misc/kdbus/util.h
index d84b820d2132..bb180579de18 100644
--- a/drivers/misc/kdbus/util.h
+++ b/drivers/misc/kdbus/util.h
@@ -17,7 +17,7 @@
#include <linux/dcache.h>
#include <linux/ioctl.h>

-#include "kdbus.h"
+#include <uapi/linux/kdbus.h>

/* all exported addresses are 64 bit */
#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
--
2.1.2
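
The slice layout that kdbus_queue_entry_alloc() computes above, and its "not more than half of the remaining space" check, can be sketched in plain userspace C. This is only a sketch under stated assumptions, not the driver code: ITEM_HEADER_SIZE, ALIGN8(), ITEM_SIZE(), struct layout_in, struct layout_out, entry_layout() and over_half() are hypothetical stand-ins for KDBUS_ITEM_HEADER_SIZE, KDBUS_ALIGN8(), KDBUS_ITEM_SIZE() and the kmsg fields, and it assumes an item is a 16-byte header plus payload, padded to 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins for the kdbus macros (not taken from the patch). */
#define ITEM_HEADER_SIZE 16u                     /* two __u64: size, type */
#define ALIGN8(v)        (((v) + 7u) & ~(size_t)7u)
#define ITEM_SIZE(s)     ALIGN8((s) + ITEM_HEADER_SIZE)

struct layout_in {
	size_t header_size;   /* size of the copied message header */
	size_t dst_name_len;  /* including NUL, 0 if not sent to a name */
	size_t vecs_count;
	size_t memfds_count;
	size_t fds_count;
	size_t meta_size;     /* 0 if no metadata is attached */
	size_t vecs_size;     /* bytes of vec payload data */
};

struct layout_out {
	size_t payloads;  /* offset of the first PAYLOAD item, 0 if none */
	size_t fds;       /* offset of the FDS item, 0 if none */
	size_t meta_off;  /* offset of the metadata items, 0 if none */
	size_t vec_data;  /* offset where vec data starts */
	size_t want;      /* total bytes requested from the pool */
};

/* Mirrors the order of the size computations in the function above. */
static struct layout_out entry_layout(const struct layout_in *in)
{
	struct layout_out out = {0};
	size_t msg_size;

	assert(in != NULL);
	msg_size = in->header_size;

	if (in->dst_name_len > 0)
		msg_size += ITEM_SIZE(in->dst_name_len);

	if (in->vecs_count + in->memfds_count > 0) {
		out.payloads = msg_size;
		/* struct kdbus_vec and struct kdbus_memfd are 16 bytes each */
		msg_size += ITEM_SIZE(16) * in->vecs_count;
		msg_size += ITEM_SIZE(16) * in->memfds_count;
	}

	if (in->fds_count > 0) {
		out.fds = msg_size;
		msg_size += ITEM_SIZE(in->fds_count * sizeof(int));
	}

	if (in->meta_size > 0) {
		out.meta_off = msg_size;
		msg_size += in->meta_size;
	}

	out.vec_data = ALIGN8(msg_size);
	out.want = out.vec_data + in->vecs_size;
	return out;
}

/* Same condition as the -EXFULL check: fits, but takes more than half. */
static int over_half(size_t want, size_t have)
{
	return want < have && want > have / 2;
}
```

With a 72-byte header, one vec, two fds and 100 bytes of vec data, this places the vec data at offset 128 and asks the pool for 228 bytes. Note that a request that does not fit at all is not caught by this check; in the patch it is left to kdbus_pool_slice_alloc() to fail.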

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andy Lutomirski
2014-10-30 03:56:28 UTC
On Wed, Oct 29, 2014 at 8:47 PM, Eric W. Biederman
Post by Greg Kroah-Hartman
This patch adds code to create and destroy connections, to validate
incoming messages and to maintain the queue of messages that are
associated with a connection.
Note that connection and queue have a 1:1 relation, the code is only
split in two parts for cleaner separation and better readability.
You are not performing capability checks at open time.
As such this API is susceptible to a host of file descriptor passing attacks.
To be fair, write(2) doesn't work on these fds, so the usual attacks
don't work. But who knows what absurd things kdbus clients will do
with fd passing?

--Andy
Post by Greg Kroah-Hartman
---
+/*
+ * Check for maximum number of messages per individual user. This
+ * should prevent a single user from being able to fill the receiver's
+ * queue.
+ */
+static int kdbus_conn_queue_user_quota(struct kdbus_conn *conn,
+ const struct kdbus_conn *conn_src,
+ struct kdbus_queue_entry *entry)
+{
+ unsigned int user;
+
+ if (!conn_src)
+ return 0;
+
+ if (ns_capable(&init_user_ns, CAP_IPC_OWNER))
+ return 0;
--
Andy Lutomirski
AMA Capital Management, LLC
Djalal Harouni
2014-10-30 09:07:08 UTC
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 8:47 PM, Eric W. Biederman
Post by Greg Kroah-Hartman
This patch adds code to create and destroy connections, to validate
incoming messages and to maintain the queue of messages that are
associated with a connection.
Note that connection and queue have a 1:1 relation, the code is only
split in two parts for cleaner separation and better readability.
You are not performing capability checks at open time.
As such this API is susceptible to a host of file descriptor passing attacks.
To be fair, write(2) doesn't work on these fds, so the usual attacks
don't work. But who knows what absurd things kdbus clients will do
with fd passing?
Yes, we use ioctl(), so we are safe here. If there is a suid process
that performs arbitrary ioctl() calls on untrusted passed fds,
then we are already in trouble given all the ioctl()s that are already
available (not only kdbus; we blame the client). So yes, the
usual write()/read() attacks do not work here.

But we do perform the creds check against the creds captured at connection
creation time. Opening the fd does not give you a connection;
you still need a KDBUS_CMD_HELLO ioctl() on the fd. At that time
we store the creds, and we perform all the TALK, SEE and OWN checks against
those creds (uid/gid). It is like a second connect() call: unless you
perform the KDBUS_CMD_HELLO you are not connected. After turning
your fd into a connection, a service can restrict access through its
(TALK, OWN and SEE) policies, so not all connected peers can TALK (send
messages) to a service.
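
The check described above can be sketched roughly as follows. This is a simplified userspace illustration, not the driver's policy code: struct conn_creds, policy_allows() and the ACCESS_*/POLICY_* names are hypothetical, and only the ordering SEE < TALK < OWN and the user/group/world entry types are taken from the kdbus.h header in this series. The real driver has more cases (privileged connections, for instance) that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror enum kdbus_policy_access_type and enum kdbus_policy_type. */
enum { ACCESS_USER = 1, ACCESS_GROUP, ACCESS_WORLD };
enum { POLICY_SEE = 0, POLICY_TALK, POLICY_OWN };

/* Same shape as struct kdbus_policy_access. */
struct policy_access {
	uint64_t type;    /* USER, GROUP, WORLD */
	uint64_t access;  /* SEE, TALK, OWN */
	uint64_t id;      /* uid, gid, 0 */
};

/* Hypothetical snapshot of the creds stored at KDBUS_CMD_HELLO time. */
struct conn_creds {
	uint64_t uid;
	uint64_t gid;
};

/*
 * Return 1 if any entry matching the stored creds grants at least the
 * wanted level. OWN implies TALK and TALK implies SEE, so a plain
 * ">=" comparison on the level is enough.
 */
static int policy_allows(const struct policy_access *rules, size_t n,
			 const struct conn_creds *creds, uint64_t wanted)
{
	assert(rules != NULL && creds != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct policy_access *r = &rules[i];
		int match = 0;

		switch (r->type) {
		case ACCESS_USER:
			match = (r->id == creds->uid);
			break;
		case ACCESS_GROUP:
			match = (r->id == creds->gid);
			break;
		case ACCESS_WORLD:
			match = 1;
			break;
		default:
			break;
		}

		if (match && r->access >= wanted)
			return 1;
	}
	return 0;
}
```

The point of the sketch is that the decision depends only on the creds snapshot taken at HELLO time, not on whoever happens to hold the fd later.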
--
Djalal Harouni
http://opendz.org
Greg Kroah-Hartman
2014-10-29 22:07:01 UTC
From: Daniel Mack <***@zonque.org>

This patch adds the header file which describes the low-level
transport protocol used by various ioctls. The header file is located
in include/uapi/linux/ as it is shared between kernel and userspace,
and it only contains data structure definitions, enums and #defines
for constants.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
include/uapi/linux/kdbus.h | 918 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 918 insertions(+)
create mode 100644 include/uapi/linux/kdbus.h
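
Most structures in this header carry a chain of struct kdbus_item records, each starting with a 64-bit size and a 64-bit type. A rough userspace sketch of building and walking such a chain follows. It assumes that each item starts on an 8-byte boundary, so the next item is found at the current offset plus the size rounded up to 8. The helpers item_append() and item_find() and the ITEM_* constants are hypothetical names for illustration; the two type values follow the order of enum kdbus_item_type in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ITEM_HEADER_SIZE 16u                 /* __u64 size + __u64 type */
#define ALIGN8(v)        (((v) + 7u) & ~(size_t)7u)

/* Values derived from the enum order: PAYLOAD_VEC is 1, FDS 4, DST_NAME 8. */
#define ITEM_FDS      4u
#define ITEM_DST_NAME 8u

/*
 * Append one item at offset off and return the offset of the next item.
 * The caller provides a zeroed buffer that is large enough.
 */
static size_t item_append(uint8_t *buf, size_t off, uint64_t type,
			  const void *data, size_t len)
{
	size_t size = ITEM_HEADER_SIZE + len;
	uint64_t size64 = size;

	assert(buf != NULL && data != NULL);
	memcpy(buf + off, &size64, sizeof(size64));
	memcpy(buf + off + 8, &type, sizeof(type));
	memcpy(buf + off + ITEM_HEADER_SIZE, data, len);
	return off + ALIGN8(size);
}

/*
 * Find the first item of the given type in a chain of total bytes.
 * Returns a pointer to its payload and stores the payload length,
 * or returns NULL if no such item exists or the chain is malformed.
 */
static const uint8_t *item_find(const uint8_t *buf, size_t total,
				uint64_t type, size_t *len)
{
	size_t off = 0;

	while (total - off >= ITEM_HEADER_SIZE) {
		uint64_t size, t;

		memcpy(&size, buf + off, sizeof(size));
		memcpy(&t, buf + off + 8, sizeof(t));

		/* an item must hold its header and fit in the buffer */
		if (size < ITEM_HEADER_SIZE || size > total - off)
			return NULL;

		if (t == type) {
			*len = (size_t)size - ITEM_HEADER_SIZE;
			return buf + off + ITEM_HEADER_SIZE;
		}

		off += ALIGN8((size_t)size);
		if (off > total)
			return NULL;
	}
	return NULL;
}
```

A receiver would walk the items of a message in its pool in the same way, bounded by the size field of the enclosing struct. The size check before every step matters because the chain is only as trustworthy as its producer.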

diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
new file mode 100644
index 000000000000..2ebf405d7dfa
--- /dev/null
+++ b/include/uapi/linux/kdbus.h
@@ -0,0 +1,918 @@
+/*
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef _KDBUS_UAPI_H_
+#define _KDBUS_UAPI_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define KDBUS_IOCTL_MAGIC 0x95
+#define KDBUS_SRC_ID_KERNEL (0)
+#define KDBUS_DST_ID_NAME (0)
+#define KDBUS_MATCH_ID_ANY (~0ULL)
+#define KDBUS_DST_ID_BROADCAST (~0ULL)
+#define KDBUS_FLAG_KERNEL (1ULL << 63)
+
+/**
+ * struct kdbus_notify_id_change - name registry change message
+ * @id: New or former owner of the name
+ * @flags: flags field from KDBUS_HELLO_*
+ *
+ * Sent from kernel to userspace when the owner or activator of
+ * a well-known name changes.
+ *
+ * Attached to:
+ * KDBUS_ITEM_ID_ADD
+ * KDBUS_ITEM_ID_REMOVE
+ */
+struct kdbus_notify_id_change {
+ __u64 id;
+ __u64 flags;
+};
+
+/**
+ * struct kdbus_notify_name_change - name registry change message
+ * @old_id: ID and flags of former owner of a name
+ * @new_id: ID and flags of new owner of a name
+ * @name: Well-known name
+ *
+ * Sent from kernel to userspace when the owner or activator of
+ * a well-known name changes.
+ *
+ * Attached to:
+ * KDBUS_ITEM_NAME_ADD
+ * KDBUS_ITEM_NAME_REMOVE
+ * KDBUS_ITEM_NAME_CHANGE
+ */
+struct kdbus_notify_name_change {
+ struct kdbus_notify_id_change old_id;
+ struct kdbus_notify_id_change new_id;
+ char name[0];
+};
+
+/**
+ * struct kdbus_creds - process credentials
+ * @uid: User ID
+ * @gid: Group ID
+ * @pid: Process ID
+ * @tid: Thread ID
+ * @starttime: Starttime of the process
+ *
+ * The starttime of the process PID. This is useful to detect PID overruns
+ * from the client side. i.e. if you use the PID to look something up in
+ * /proc/$PID/ you can afterwards check the starttime field of it, to ensure
+ * you didn't run into a PID overrun.
+ *
+ * Attached to:
+ * KDBUS_ITEM_CREDS
+ */
+struct kdbus_creds {
+ __u64 uid;
+ __u64 gid;
+ __u64 pid;
+ __u64 tid;
+ __u64 starttime;
+};
+
+/**
+ * struct kdbus_caps - process capabilities
+ * @last_cap: Highest currently known capability bit
+ * @caps: Variable number of 32-bit capabilities flags
+ *
+ * Contains a variable number of 32-bit capabilities flags.
+ *
+ * Attached to:
+ * KDBUS_ITEM_CAPS
+ */
+struct kdbus_caps {
+ __u32 last_cap;
+ __u32 caps[0];
+};
+
+/**
+ * struct kdbus_audit - audit information
+ * @sessionid: The audit session ID
+ * @loginuid: The audit login uid
+ *
+ * Attached to:
+ * KDBUS_ITEM_AUDIT
+ */
+struct kdbus_audit {
+ __u64 sessionid;
+ __u64 loginuid;
+};
+
+/**
+ * struct kdbus_timestamp
+ * @seqnum: Global per-domain message sequence number
+ * @monotonic_ns: Monotonic timestamp, in nanoseconds
+ * @realtime_ns: Realtime timestamp, in nanoseconds
+ *
+ * Attached to:
+ * KDBUS_ITEM_TIMESTAMP
+ */
+struct kdbus_timestamp {
+ __u64 seqnum;
+ __u64 monotonic_ns;
+ __u64 realtime_ns;
+};
+
+/**
+ * struct kdbus_vec - I/O vector for kdbus payload items
+ * @size: The size of the vector
+ * @address: Memory address of data buffer
+ * @offset: Offset in the in-message payload memory,
+ * relative to the message head
+ *
+ * Attached to:
+ * KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
+ */
+struct kdbus_vec {
+ __u64 size;
+ union {
+ __u64 address;
+ __u64 offset;
+ };
+};
+
+/**
+ * struct kdbus_bloom_parameter - bus-wide bloom parameters
+ * @size: Size of the bit field in bytes (m / 8)
+ * @n_hash: Number of hash functions used (k)
+ */
+struct kdbus_bloom_parameter {
+ __u64 size;
+ __u64 n_hash;
+};
+
+/**
+ * struct kdbus_bloom_filter - bloom filter containing n elements
+ * @generation: Generation of the element set in the filter
+ * @data: Bit field, multiple of 8 bytes
+ */
+struct kdbus_bloom_filter {
+ __u64 generation;
+ __u64 data[0];
+};
+
+/**
+ * struct kdbus_memfd - a kdbus memfd
+ * @size: The memfd's size
+ * @fd: The file descriptor number
+ * @__pad: Padding to ensure proper alignment and size
+ *
+ * Attached to:
+ * KDBUS_ITEM_PAYLOAD_MEMFD
+ */
+struct kdbus_memfd {
+ __u64 size;
+ int fd;
+ __u32 __pad;
+};
+
+/**
+ * struct kdbus_name - a registered well-known name with its flags
+ * @flags: Flags from KDBUS_NAME_*
+ * @name: Well-known name
+ *
+ * Attached to:
+ * KDBUS_ITEM_NAME
+ */
+struct kdbus_name {
+ __u64 flags;
+ char name[0];
+};
+
+/**
+ * struct kdbus_policy_access - policy access item
+ * @type: One of KDBUS_POLICY_ACCESS_* types
+ * @access: Access to grant
+ * @id: For KDBUS_POLICY_ACCESS_USER, the uid
+ * For KDBUS_POLICY_ACCESS_GROUP, the gid
+ */
+struct kdbus_policy_access {
+ __u64 type; /* USER, GROUP, WORLD */
+ __u64 access; /* OWN, TALK, SEE */
+ __u64 id; /* uid, gid, 0 */
+};
+
+/**
+ * enum kdbus_item_type - item types to chain data in a list
+ * @_KDBUS_ITEM_NULL: Uninitialized/invalid
+ * @_KDBUS_ITEM_USER_BASE: Start of user items
+ * @KDBUS_ITEM_PAYLOAD_VEC: Vector to data
+ * @KDBUS_ITEM_PAYLOAD_OFF: Data at returned offset to message head
+ * @KDBUS_ITEM_PAYLOAD_MEMFD: Data as sealed memfd
+ * @KDBUS_ITEM_FDS: Attached file descriptors
+ * @KDBUS_ITEM_BLOOM_PARAMETER: Bus-wide bloom parameters, used with
+ * KDBUS_CMD_BUS_MAKE, carries a
+ * struct kdbus_bloom_parameter
+ * @KDBUS_ITEM_BLOOM_FILTER: Bloom filter carried with a message, used to
+ * match against a bloom mask of a connection,
+ * carries a struct kdbus_bloom_filter
+ * @KDBUS_ITEM_BLOOM_MASK: Bloom mask used to match against a message's
+ * bloom filter
+ * @KDBUS_ITEM_DST_NAME: Destination's well-known name
+ * @KDBUS_ITEM_MAKE_NAME: Name of domain, bus, endpoint
+ * @KDBUS_ITEM_ATTACH_FLAGS: Attach-flags, used for updating which metadata
+ * a connection subscribes to
+ * @_KDBUS_ITEM_ATTACH_BASE: Start of metadata attach items
+ * @KDBUS_ITEM_NAME: Well-known name with flags
+ * @KDBUS_ITEM_ID: Connection ID
+ * @KDBUS_ITEM_TIMESTAMP: Timestamp
+ * @KDBUS_ITEM_CREDS: Process credential
+ * @KDBUS_ITEM_AUXGROUPS: Auxiliary process groups
+ * @KDBUS_ITEM_PID_COMM: Process ID "comm" identifier
+ * @KDBUS_ITEM_TID_COMM: Thread ID "comm" identifier
+ * @KDBUS_ITEM_EXE: The path of the executable
+ * @KDBUS_ITEM_CMDLINE: The process command line
+ * @KDBUS_ITEM_CGROUP: The cgroup membership
+ * @KDBUS_ITEM_CAPS: The process capabilities
+ * @KDBUS_ITEM_SECLABEL: The security label
+ * @KDBUS_ITEM_AUDIT: The audit IDs
+ * @KDBUS_ITEM_CONN_NAME: The connection's human-readable name (debugging)
+ * @_KDBUS_ITEM_POLICY_BASE: Start of policy items
+ * @KDBUS_ITEM_POLICY_ACCESS: Policy access block
+ * @_KDBUS_ITEM_KERNEL_BASE: Start of kernel-generated message items
+ * @KDBUS_ITEM_NAME_ADD: Notify in struct kdbus_notify_name_change
+ * @KDBUS_ITEM_NAME_REMOVE: Notify in struct kdbus_notify_name_change
+ * @KDBUS_ITEM_NAME_CHANGE: Notify in struct kdbus_notify_name_change
+ * @KDBUS_ITEM_ID_ADD: Notify in struct kdbus_notify_id_change
+ * @KDBUS_ITEM_ID_REMOVE: Notify in struct kdbus_notify_id_change
+ * @KDBUS_ITEM_REPLY_TIMEOUT: Timeout has been reached
+ * @KDBUS_ITEM_REPLY_DEAD: Destination died
+ */
+enum kdbus_item_type {
+ _KDBUS_ITEM_NULL,
+ _KDBUS_ITEM_USER_BASE,
+ KDBUS_ITEM_PAYLOAD_VEC = _KDBUS_ITEM_USER_BASE,
+ KDBUS_ITEM_PAYLOAD_OFF,
+ KDBUS_ITEM_PAYLOAD_MEMFD,
+ KDBUS_ITEM_FDS,
+ KDBUS_ITEM_BLOOM_PARAMETER,
+ KDBUS_ITEM_BLOOM_FILTER,
+ KDBUS_ITEM_BLOOM_MASK,
+ KDBUS_ITEM_DST_NAME,
+ KDBUS_ITEM_MAKE_NAME,
+ KDBUS_ITEM_ATTACH_FLAGS,
+
+ _KDBUS_ITEM_ATTACH_BASE = 0x1000,
+ KDBUS_ITEM_NAME = _KDBUS_ITEM_ATTACH_BASE,
+ KDBUS_ITEM_ID,
+ KDBUS_ITEM_TIMESTAMP,
+ KDBUS_ITEM_CREDS,
+ KDBUS_ITEM_AUXGROUPS,
+ KDBUS_ITEM_PID_COMM,
+ KDBUS_ITEM_TID_COMM,
+ KDBUS_ITEM_EXE,
+ KDBUS_ITEM_CMDLINE,
+ KDBUS_ITEM_CGROUP,
+ KDBUS_ITEM_CAPS,
+ KDBUS_ITEM_SECLABEL,
+ KDBUS_ITEM_AUDIT,
+ KDBUS_ITEM_CONN_NAME,
+
+ _KDBUS_ITEM_POLICY_BASE = 0x2000,
+ KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
+
+ _KDBUS_ITEM_KERNEL_BASE = 0x8000,
+ KDBUS_ITEM_NAME_ADD = _KDBUS_ITEM_KERNEL_BASE,
+ KDBUS_ITEM_NAME_REMOVE,
+ KDBUS_ITEM_NAME_CHANGE,
+ KDBUS_ITEM_ID_ADD,
+ KDBUS_ITEM_ID_REMOVE,
+ KDBUS_ITEM_REPLY_TIMEOUT,
+ KDBUS_ITEM_REPLY_DEAD,
+};
+
+/**
+ * struct kdbus_item - chain of data blocks
+ * @size: Overall data record size
+ * @type: Kdbus_item type of data
+ * @data: Generic bytes
+ * @data32: Generic 32 bit array
+ * @data64: Generic 64 bit array
+ * @str: Generic string
+ * @id: Connection ID
+ * @vec: KDBUS_ITEM_PAYLOAD_VEC
+ * @creds: KDBUS_ITEM_CREDS
+ * @audit: KDBUS_ITEM_AUDIT
+ * @timestamp: KDBUS_ITEM_TIMESTAMP
+ * @name: KDBUS_ITEM_NAME
+ * @bloom_parameter: KDBUS_ITEM_BLOOM_PARAMETER
+ * @bloom_filter: KDBUS_ITEM_BLOOM_FILTER
+ * @memfd: KDBUS_ITEM_PAYLOAD_MEMFD
+ * @name_change: KDBUS_ITEM_NAME_ADD
+ * KDBUS_ITEM_NAME_REMOVE
+ * KDBUS_ITEM_NAME_CHANGE
+ * @id_change: KDBUS_ITEM_ID_ADD
+ * KDBUS_ITEM_ID_REMOVE
+ * @policy_access: KDBUS_ITEM_POLICY_ACCESS
+ */
+struct kdbus_item {
+ __u64 size;
+ __u64 type;
+ union {
+ __u8 data[0];
+ __u32 data32[0];
+ __u64 data64[0];
+ char str[0];
+
+ __u64 id;
+ struct kdbus_vec vec;
+ struct kdbus_creds creds;
+ struct kdbus_audit audit;
+ struct kdbus_caps caps;
+ struct kdbus_timestamp timestamp;
+ struct kdbus_name name;
+ struct kdbus_bloom_parameter bloom_parameter;
+ struct kdbus_bloom_filter bloom_filter;
+ struct kdbus_memfd memfd;
+ int fds[0];
+ struct kdbus_notify_name_change name_change;
+ struct kdbus_notify_id_change id_change;
+ struct kdbus_policy_access policy_access;
+ };
+};
+
+/**
+ * enum kdbus_msg_flags - type of message
+ * @KDBUS_MSG_FLAGS_EXPECT_REPLY: Expect a reply message, used for
+ * method calls. The userspace-supplied
+ * cookie identifies the message and the
+ * respective reply carries the cookie
+ * in cookie_reply
+ * @KDBUS_MSG_FLAGS_SYNC_REPLY: Wait for destination connection to
+ * reply to this message. The
+ * KDBUS_CMD_MSG_SEND ioctl() will block
+ * until the reply is received, and
+ * offset_reply in struct kdbus_msg will
+ * yield the offset in the sender's pool
+ * where the reply can be found.
+ * This flag is only valid if
+ * @KDBUS_MSG_FLAGS_EXPECT_REPLY is set as
+ * well.
+ * @KDBUS_MSG_FLAGS_NO_AUTO_START: Do not start a service, if the addressed
+ * name is not currently active
+ */
+enum kdbus_msg_flags {
+ KDBUS_MSG_FLAGS_EXPECT_REPLY = 1ULL << 0,
+ KDBUS_MSG_FLAGS_SYNC_REPLY = 1ULL << 1,
+ KDBUS_MSG_FLAGS_NO_AUTO_START = 1ULL << 2,
+};
+
+/**
+ * enum kdbus_payload_type - type of payload carried by message
+ * @KDBUS_PAYLOAD_KERNEL: Kernel-generated simple message
+ * @KDBUS_PAYLOAD_DBUS: D-Bus marshalling "DBusDBus"
+ */
+enum kdbus_payload_type {
+ KDBUS_PAYLOAD_KERNEL,
+ KDBUS_PAYLOAD_DBUS = 0x4442757344427573ULL,
+};
+
+/**
+ * struct kdbus_msg - the representation of a kdbus message
+ * @size: Total size of the message
+ * @flags: Message flags (KDBUS_MSG_FLAGS_*), userspace → kernel
+ * @kernel_flags: Supported message flags, kernel → userspace
+ * @priority: Message queue priority value
+ * @dst_id: 64-bit ID of the destination connection
+ * @src_id: 64-bit ID of the source connection
+ * @payload_type: Payload type (KDBUS_PAYLOAD_*)
+ * @cookie: Userspace-supplied cookie, for the connection
+ * to identify its messages
+ * @timeout_ns: The time to wait for a message reply from the peer.
+ * If there is no reply, a kernel-generated message
+ * with an attached KDBUS_ITEM_REPLY_TIMEOUT item
+ * is sent to @src_id. The timeout is expected in
+ * nanoseconds and as absolute CLOCK_MONOTONIC value.
+ * @cookie_reply: A reply to the requesting message with the same
+ * cookie. The requesting connection can match its
+ * request and the reply with this value
+ * @offset_reply: If KDBUS_MSG_FLAGS_SYNC_REPLY, this field will
+ * contain the offset in the sender's pool where the
+ * reply is stored.
+ * @items: A list of kdbus_items containing the message payload
+ */
+struct kdbus_msg {
+ __u64 size;
+ __u64 flags;
+ __u64 kernel_flags;
+ __s64 priority;
+ __u64 dst_id;
+ __u64 src_id;
+ __u64 payload_type;
+ __u64 cookie;
+ union {
+ __u64 timeout_ns;
+ __u64 cookie_reply;
+ __u64 offset_reply;
+ };
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_recv_flags - flags for de-queuing messages
+ * @KDBUS_RECV_PEEK: Return the next queued message without
+ * actually de-queuing it, and without installing
+ * any file descriptors or other resources. It is
+ * usually used to determine the activating
+ * connection of a bus name.
+ * @KDBUS_RECV_DROP: Drop and free the next queued message and all
+ * its resources without actually receiving it.
+ * @KDBUS_RECV_USE_PRIORITY: Only de-queue messages with the specified or
+ * higher priority (lowest values); if not set,
+ * the priority value is ignored.
+ */
+enum kdbus_recv_flags {
+ KDBUS_RECV_PEEK = 1ULL << 0,
+ KDBUS_RECV_DROP = 1ULL << 1,
+ KDBUS_RECV_USE_PRIORITY = 1ULL << 2,
+};
+
+/**
+ * struct kdbus_cmd_recv - struct to de-queue a buffered message
+ * @flags: KDBUS_RECV_* flags, userspace → kernel
+ * @kernel_flags: Supported KDBUS_RECV_* flags, kernel → userspace
+ * @priority: Minimum priority of the messages to de-queue. Lowest
+ * values have the highest priority.
+ * @offset: Returned offset in the pool where the message is
+ * stored. The user must use KDBUS_CMD_FREE to free
+ * the allocated memory.
+ *
+ * This struct is used with the KDBUS_CMD_MSG_RECV ioctl.
+ */
+struct kdbus_cmd_recv {
+ __u64 flags;
+ __u64 kernel_flags;
+ __s64 priority;
+ __u64 offset;
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_cmd_cancel - struct to cancel a synchronously pending message
+ * @cookie: The cookie of the pending message
+ * @flags: Flags for the cancel command. Currently unused.
+ *
+ * This struct is used with the KDBUS_CMD_CANCEL ioctl.
+ */
+struct kdbus_cmd_cancel {
+ __u64 cookie;
+ __u64 flags;
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
+ * @offset: The offset of the memory slice, as returned by other
+ * ioctls
+ * @flags: Flags for the free command, userspace → kernel
+ * @kernel_flags: Supported flags of the free command, kernel → userspace
+ *
+ * This struct is used with the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_cmd_free {
+ __u64 offset;
+ __u64 flags;
+ __u64 kernel_flags;
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_policy_access_type - permissions of a policy record
+ * @_KDBUS_POLICY_ACCESS_NULL: Uninitialized/invalid
+ * @KDBUS_POLICY_ACCESS_USER: Grant access to a uid
+ * @KDBUS_POLICY_ACCESS_GROUP: Grant access to gid
+ * @KDBUS_POLICY_ACCESS_WORLD: World-accessible
+ */
+enum kdbus_policy_access_type {
+ _KDBUS_POLICY_ACCESS_NULL,
+ KDBUS_POLICY_ACCESS_USER,
+ KDBUS_POLICY_ACCESS_GROUP,
+ KDBUS_POLICY_ACCESS_WORLD,
+};
+
+/**
+ * enum kdbus_policy_type - mode flags
+ * @KDBUS_POLICY_OWN: Allow to own a well-known name
+ * Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
+ * @KDBUS_POLICY_TALK: Allow communication to a well-known name
+ * Implies KDBUS_POLICY_SEE
+ * @KDBUS_POLICY_SEE: Allow to see a well-known name
+ */
+enum kdbus_policy_type {
+ KDBUS_POLICY_SEE = 0,
+ KDBUS_POLICY_TALK,
+ KDBUS_POLICY_OWN,
+};
+
+/**
+ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
+ * @KDBUS_HELLO_ACCEPT_FD: The connection allows the reception of
+ * any passed file descriptors
+ * @KDBUS_HELLO_ACTIVATOR: Special-purpose connection which registers
+ * a well-known name for a process to be started
+ * when traffic arrives
+ * @KDBUS_HELLO_POLICY_HOLDER: Special-purpose connection which registers
+ * policy entries for a name. The provided name
+ * is not activated and not registered with the
+ * name database, it only allows unprivileged
+ * connections to acquire a name, talk or discover
+ * a service
+ * @KDBUS_HELLO_MONITOR: Special-purpose connection to monitor
+ * bus traffic
+ */
+enum kdbus_hello_flags {
+ KDBUS_HELLO_ACCEPT_FD = 1ULL << 0,
+ KDBUS_HELLO_ACTIVATOR = 1ULL << 1,
+ KDBUS_HELLO_POLICY_HOLDER = 1ULL << 2,
+ KDBUS_HELLO_MONITOR = 1ULL << 3,
+};
+
+/**
+ * enum kdbus_attach_flags - flags for metadata attachments
+ * @KDBUS_ATTACH_TIMESTAMP: Timestamp
+ * @KDBUS_ATTACH_CREDS: Credentials
+ * @KDBUS_ATTACH_AUXGROUPS: Auxiliary groups
+ * @KDBUS_ATTACH_NAMES: Well-known names
+ * @KDBUS_ATTACH_TID_COMM: The "comm" process identifier of the TID
+ * @KDBUS_ATTACH_PID_COMM: The "comm" process identifier of the PID
+ * @KDBUS_ATTACH_EXE: The path of the executable
+ * @KDBUS_ATTACH_CMDLINE: The process command line
+ * @KDBUS_ATTACH_CGROUP: The cgroup membership
+ * @KDBUS_ATTACH_CAPS: The process capabilities
+ * @KDBUS_ATTACH_SECLABEL: The security label
+ * @KDBUS_ATTACH_AUDIT: The audit IDs
+ * @KDBUS_ATTACH_CONN_NAME: The human-readable connection name
+ * @_KDBUS_ATTACH_ALL: All of the above
+ */
+enum kdbus_attach_flags {
+ KDBUS_ATTACH_TIMESTAMP = 1ULL << 0,
+ KDBUS_ATTACH_CREDS = 1ULL << 1,
+ KDBUS_ATTACH_AUXGROUPS = 1ULL << 2,
+ KDBUS_ATTACH_NAMES = 1ULL << 3,
+ KDBUS_ATTACH_TID_COMM = 1ULL << 4,
+ KDBUS_ATTACH_PID_COMM = 1ULL << 5,
+ KDBUS_ATTACH_EXE = 1ULL << 6,
+ KDBUS_ATTACH_CMDLINE = 1ULL << 7,
+ KDBUS_ATTACH_CGROUP = 1ULL << 8,
+ KDBUS_ATTACH_CAPS = 1ULL << 9,
+ KDBUS_ATTACH_SECLABEL = 1ULL << 10,
+ KDBUS_ATTACH_AUDIT = 1ULL << 11,
+ KDBUS_ATTACH_CONN_NAME = 1ULL << 12,
+ _KDBUS_ATTACH_ALL = (1ULL << 13) - 1,
+};
+
+/**
+ * struct kdbus_cmd_hello - struct to say hello to kdbus
+ * @size: The total size of the structure
+ * @flags: Connection flags (KDBUS_HELLO_*), userspace → kernel
+ * @kernel_flags: Supported connection flags, kernel → userspace
+ * @attach_flags: Mask of metadata to attach to each message sent
+ * (KDBUS_ATTACH_*)
+ * @bus_flags: The flags field copied verbatim from the original
+ * KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
+ * to do negotiation of features of the payload that is
+ * transferred (kernel → userspace)
+ * @id: The ID of this connection (kernel → userspace)
+ * @pool_size: Size of the connection's buffer where the received
+ * messages are placed
+ * @bloom: The bloom properties of the bus, specified
+ * by the bus creator (kernel → userspace)
+ * @id128: Unique 128-bit ID of the bus (kernel → userspace)
+ * @items: A list of items
+ *
+ * This struct is used with the KDBUS_CMD_HELLO ioctl.
+ */
+struct kdbus_cmd_hello {
+ __u64 size;
+ __u64 flags;
+ __u64 kernel_flags;
+ __u64 attach_flags;
+ __u64 bus_flags;
+ __u64 id;
+ __u64 pool_size;
+ struct kdbus_bloom_parameter bloom;
+ __u8 id128[16];
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,ENDPOINT,DOMAIN}_MAKE
+ * @KDBUS_MAKE_ACCESS_GROUP: Make the device node group-accessible
+ * @KDBUS_MAKE_ACCESS_WORLD: Make the device node world-accessible
+ */
+enum kdbus_make_flags {
+ KDBUS_MAKE_ACCESS_GROUP = 1ULL << 0,
+ KDBUS_MAKE_ACCESS_WORLD = 1ULL << 1,
+};
+
+/**
+ * struct kdbus_cmd_make - struct to make a bus, an endpoint or a domain
+ * @size: The total size of the struct
+ * @flags: Properties for the bus/ep/domain to create,
+ * userspace → kernel
+ * @kernel_flags: Supported flags for the used command, kernel → userspace
+ * @items: Items describing details
+ *
+ * This structure is used with the KDBUS_CMD_BUS_MAKE, KDBUS_CMD_ENDPOINT_MAKE
+ * and KDBUS_CMD_DOMAIN_MAKE ioctls.
+ */
+struct kdbus_cmd_make {
+ __u64 size;
+ __u64 flags;
+ __u64 kernel_flags;
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_name_flags - properties of a well-known name
+ * @KDBUS_NAME_REPLACE_EXISTING: Try to replace name of other connections
+ * @KDBUS_NAME_ALLOW_REPLACEMENT: Allow the replacement of the name
+ * @KDBUS_NAME_QUEUE: Name should be queued if busy
+ * @KDBUS_NAME_IN_QUEUE: Name is queued
+ * @KDBUS_NAME_ACTIVATOR: Name is owned by an activator connection
+ */
+enum kdbus_name_flags {
+ KDBUS_NAME_REPLACE_EXISTING = 1ULL << 0,
+ KDBUS_NAME_ALLOW_REPLACEMENT = 1ULL << 1,
+ KDBUS_NAME_QUEUE = 1ULL << 2,
+ KDBUS_NAME_IN_QUEUE = 1ULL << 3,
+ KDBUS_NAME_ACTIVATOR = 1ULL << 4,
+};
+
+/**
+ * struct kdbus_cmd_name - struct to describe a well-known name
+ * @size: The total size of the struct
+ * @flags: Flags for a name entry (KDBUS_NAME_*),
+ * userspace → kernel, kernel → userspace
+ * @kernel_flags: Supported flags for a name entry, kernel → userspace
+ * @items: Item list, containing the well-known name as
+ * KDBUS_ITEM_NAME
+ *
+ * This structure is used with the KDBUS_CMD_NAME_ACQUIRE ioctl.
+ */
+struct kdbus_cmd_name {
+ __u64 size;
+ __u64 flags;
+ __u64 kernel_flags;
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_name_info - struct to describe a well-known name
+ * @size: The total size of the struct
+ * @flags: Flags for a name entry (KDBUS_NAME_*),
+ * @conn_flags: The flags of the owning connection (KDBUS_HELLO_*)
+ * @owner_id: The current owner of the name
+ * @items: Item list, containing the well-known name as
+ * KDBUS_ITEM_NAME
+ *
+ * This structure is used as return struct for the KDBUS_CMD_NAME_LIST ioctl.
+ */
+struct kdbus_name_info {
+ __u64 size;
+ __u64 flags;
+ __u64 conn_flags;
+ __u64 owner_id;
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_name_list_flags - what to include into the returned list
+ * @KDBUS_NAME_LIST_UNIQUE: All active connections
+ * @KDBUS_NAME_LIST_NAMES: All known well-known names
+ * @KDBUS_NAME_LIST_ACTIVATORS: All activator connections
+ * @KDBUS_NAME_LIST_QUEUED: All queued-up names
+ */
+enum kdbus_name_list_flags {
+ KDBUS_NAME_LIST_UNIQUE = 1ULL << 0,
+ KDBUS_NAME_LIST_NAMES = 1ULL << 1,
+ KDBUS_NAME_LIST_ACTIVATORS = 1ULL << 2,
+ KDBUS_NAME_LIST_QUEUED = 1ULL << 3,
+};
+
+/**
+ * struct kdbus_cmd_name_list - request a list of name entries
+ * @flags: Flags for the query (KDBUS_NAME_LIST_*),
+ * userspace → kernel
+ * @kernel_flags: Supported flags for queries, kernel → userspace
+ * @offset: The returned offset in the caller's pool buffer.
+ * The user must use KDBUS_CMD_FREE to free the
+ * allocated memory.
+ *
+ * This structure is used with the KDBUS_CMD_NAME_LIST ioctl.
+ */
+struct kdbus_cmd_name_list {
+ __u64 flags;
+ __u64 kernel_flags;
+ __u64 offset;
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_name_list - information returned by KDBUS_CMD_NAME_LIST
+ * @size: The total size of the structure
+ * @names: A list of names
+ *
+ * Note that the user is responsible for freeing the allocated memory with
+ * the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_name_list {
+ __u64 size;
+ struct kdbus_name_info names[0];
+};
+
+/**
+ * struct kdbus_cmd_info - struct used for KDBUS_CMD_CONN_INFO ioctl
+ * @size: The total size of the struct
+ * @flags: KDBUS_ATTACH_* flags, userspace → kernel
+ * @kernel_flags: Supported KDBUS_ATTACH_* flags, kernel → userspace
+ * @id: The 64-bit ID of the connection. If set to zero, passing
+ * @name is required. kdbus will look up the name to
+ * determine the ID in this case.
+ * @offset: Returned offset in the caller's pool buffer where the
+ * kdbus_info struct result is stored. The user must
+ * use KDBUS_CMD_FREE to free the allocated memory.
+ * @items: The optional item list, containing the
+ * well-known name to look up as a KDBUS_ITEM_NAME.
+ * Only needed in case @id is zero.
+ *
+ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
+ * tell the user the offset in the connection pool buffer at which to find the
+ * result in a struct kdbus_info.
+ */
+struct kdbus_cmd_info {
+ __u64 size;
+ __u64 flags;
+ __u64 kernel_flags;
+ __u64 id;
+ __u64 offset;
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * struct kdbus_info - information returned by KDBUS_CMD_*_INFO
+ * @size: The total size of the struct
+ * @id: The connection's or bus' 64-bit ID
+ * @flags: The connection's or bus' flags
+ * @items: A list of struct kdbus_item
+ *
+ * Note that the user is responsible for freeing the allocated memory with
+ * the KDBUS_CMD_FREE ioctl.
+ */
+struct kdbus_info {
+ __u64 size;
+ __u64 id;
+ __u64 flags;
+ struct kdbus_item items[0];
+};
+
+/**
+ * struct kdbus_cmd_update - update flags of a connection
+ * @size: The total size of the struct
+ * @flags: Flags for the update command, userspace → kernel
+ * @kernel_flags: Supported flags for this command, kernel → userspace
+ * @items: A list of struct kdbus_item
+ *
+ * This struct is used with the KDBUS_CMD_CONN_UPDATE ioctl.
+ */
+struct kdbus_cmd_update {
+ __u64 size;
+ __u64 flags;
+ __u64 kernel_flags;
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
+ * @KDBUS_MATCH_REPLACE: If entries with the supplied cookie already
+ * exist, remove them before installing the new
+ * matches.
+ */
+enum kdbus_cmd_match_flags {
+ KDBUS_MATCH_REPLACE = 1ULL << 0,
+};
+
+/**
+ * struct kdbus_cmd_match - struct to add or remove matches
+ * @size: The total size of the struct
+ * @cookie: Userspace supplied cookie. When removing, the cookie
+ * identifies the match to remove
+ * @flags: Flags for match command (KDBUS_MATCH_*),
+ * userspace → kernel
+ * @kernel_flags: Supported flags of the used command, kernel → userspace
+ * @items: A list of items for additional information
+ *
+ * This structure is used with the KDBUS_CMD_MATCH_ADD and
+ * KDBUS_CMD_MATCH_REMOVE ioctl.
+ */
+struct kdbus_cmd_match {
+ __u64 size;
+ __u64 cookie;
+ __u64 flags;
+ __u64 kernel_flags;
+ struct kdbus_item items[0];
+} __attribute__((aligned(8)));
+
+/**
+ * enum kdbus_ioctl_type - Ioctl API
+ * @KDBUS_CMD_BUS_MAKE: After opening the "control" device node, this
+ * command creates a new bus with the specified
+ * name. The bus is immediately shut down and
+ * cleaned up when the opened "control" device node
+ * is closed.
+ * @KDBUS_CMD_DOMAIN_MAKE: Similar to KDBUS_CMD_BUS_MAKE, but it creates a
+ * new kdbus domain.
+ * @KDBUS_CMD_ENDPOINT_MAKE: Creates a new named special endpoint to talk to
+ * the bus. Such endpoints usually carry a more
+ * restrictive policy and grant restricted access
+ * to specific applications.
+ * @KDBUS_CMD_HELLO: By opening the bus device node a connection is
+ * created. After a HELLO the opened connection
+ * becomes an active peer on the bus.
+ * @KDBUS_CMD_BYEBYE: Disconnect a connection. If there are no
+ * messages queued up in the connection's pool,
+ * the call succeeds, and the handle is rendered
+ * unusable. Otherwise, -EBUSY is returned without
+ * any further side-effects.
+ * @KDBUS_CMD_MSG_SEND: Send a message and pass data from userspace to
+ * the kernel.
+ * @KDBUS_CMD_MSG_RECV: Receive a message from the kernel which is
+ * placed in the receiver's pool.
+ * @KDBUS_CMD_MSG_CANCEL: Cancel a pending request of a message that
+ * blocks while waiting for a reply. The parameter
+ * denotes the cookie of the message in flight.
+ * @KDBUS_CMD_FREE: Release the allocated memory in the receiver's
+ * pool.
+ * @KDBUS_CMD_NAME_ACQUIRE: Request a well-known bus name to associate with
+ * the connection. Well-known names are used to
+ * address a peer on the bus.
+ * @KDBUS_CMD_NAME_RELEASE: Release a well-known name the connection
+ * currently owns.
+ * @KDBUS_CMD_NAME_LIST: Retrieve the list of all currently registered
+ * well-known and unique names.
+ * @KDBUS_CMD_CONN_INFO: Retrieve credentials and properties of the
+ * initial creator of the connection. The data was
+ * stored at registration time and does not
+ * necessarily represent the connected process or
+ * the actual state of the process.
+ * @KDBUS_CMD_CONN_UPDATE: Update the properties of a connection. Used to
+ * update the metadata subscription mask and
+ * policy.
+ * @KDBUS_CMD_BUS_CREATOR_INFO: Retrieve information of the creator of the bus
+ * a connection is attached to.
+ * @KDBUS_CMD_ENDPOINT_UPDATE: Update the properties of a custom endpoint. Used
+ * to update the policy.
+ * @KDBUS_CMD_MATCH_ADD: Install a match that selects which broadcast
+ * messages are delivered to the connection.
+ * @KDBUS_CMD_MATCH_REMOVE: Remove a current match for broadcast messages.
+ */
+enum kdbus_ioctl_type {
+ KDBUS_CMD_BUS_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x00,
+ struct kdbus_cmd_make),
+ KDBUS_CMD_DOMAIN_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x10,
+ struct kdbus_cmd_make),
+ KDBUS_CMD_ENDPOINT_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x20,
+ struct kdbus_cmd_make),
+
+ KDBUS_CMD_HELLO = _IOWR(KDBUS_IOCTL_MAGIC, 0x30,
+ struct kdbus_cmd_hello),
+ KDBUS_CMD_BYEBYE = _IO(KDBUS_IOCTL_MAGIC, 0x31),
+
+ KDBUS_CMD_MSG_SEND = _IOWR(KDBUS_IOCTL_MAGIC, 0x40,
+ struct kdbus_msg),
+ KDBUS_CMD_MSG_RECV = _IOWR(KDBUS_IOCTL_MAGIC, 0x41,
+ struct kdbus_cmd_recv),
+ KDBUS_CMD_MSG_CANCEL = _IOW(KDBUS_IOCTL_MAGIC, 0x42,
+ struct kdbus_cmd_cancel),
+ KDBUS_CMD_FREE = _IOW(KDBUS_IOCTL_MAGIC, 0x43,
+ struct kdbus_cmd_free),
+
+ KDBUS_CMD_NAME_ACQUIRE = _IOWR(KDBUS_IOCTL_MAGIC, 0x50,
+ struct kdbus_cmd_name),
+ KDBUS_CMD_NAME_RELEASE = _IOW(KDBUS_IOCTL_MAGIC, 0x51,
+ struct kdbus_cmd_name),
+ KDBUS_CMD_NAME_LIST = _IOWR(KDBUS_IOCTL_MAGIC, 0x52,
+ struct kdbus_cmd_name_list),
+
+ KDBUS_CMD_CONN_INFO = _IOWR(KDBUS_IOCTL_MAGIC, 0x60,
+ struct kdbus_cmd_info),
+ KDBUS_CMD_CONN_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x61,
+ struct kdbus_cmd_update),
+ KDBUS_CMD_BUS_CREATOR_INFO = _IOWR(KDBUS_IOCTL_MAGIC, 0x62,
+ struct kdbus_cmd_info),
+
+ KDBUS_CMD_ENDPOINT_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x71,
+ struct kdbus_cmd_update),
+
+ KDBUS_CMD_MATCH_ADD = _IOW(KDBUS_IOCTL_MAGIC, 0x80,
+ struct kdbus_cmd_match),
+ KDBUS_CMD_MATCH_REMOVE = _IOW(KDBUS_IOCTL_MAGIC, 0x81,
+ struct kdbus_cmd_match),
+};
+
+#endif /* _KDBUS_UAPI_H_ */
--
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Arnd Bergmann
2014-10-30 08:20:37 UTC
Post by Greg Kroah-Hartman
+enum kdbus_ioctl_type {
+ KDBUS_CMD_BUS_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x00,
+ struct kdbus_cmd_make),
+ KDBUS_CMD_DOMAIN_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x10,
+ struct kdbus_cmd_make),
+ KDBUS_CMD_ENDPOINT_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x20,
+ struct kdbus_cmd_make),
+
+ KDBUS_CMD_HELLO = _IOWR(KDBUS_IOCTL_MAGIC, 0x30,
+ struct kdbus_cmd_hello),
+ KDBUS_CMD_BYEBYE = _IO(KDBUS_IOCTL_MAGIC, 0x31),
+
+ KDBUS_CMD_MSG_SEND = _IOWR(KDBUS_IOCTL_MAGIC, 0x40,
+ struct kdbus_msg),
+ KDBUS_CMD_MSG_RECV = _IOWR(KDBUS_IOCTL_MAGIC, 0x41,
+ struct kdbus_cmd_recv),
+ KDBUS_CMD_MSG_CANCEL = _IOW(KDBUS_IOCTL_MAGIC, 0x42,
+ struct kdbus_cmd_cancel),
+ KDBUS_CMD_FREE = _IOW(KDBUS_IOCTL_MAGIC, 0x43,
+ struct kdbus_cmd_free),
I think in general, using enum is great, but for ioctl command numbers,
we probably want to have defines so the user space implementation can
use #ifdef to see if the kernel version that it is being built for
knows a particular command.

You could do that using

#define KDBUS_CMD_BUS_MAKE KDBUS_CMD_BUS_MAKE

while keeping the enum, or do it like everybody else using

#define KDBUS_CMD_BUS_MAKE _IOW(KDBUS_IOCTL_MAGIC, 0x00, struct kdbus_cmd_make)

which might in fact help some tools that try to do automated parsing
of header files to find ioctl commands.
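
To illustrate the difference, here is a minimal sketch that is not part of
the patch. The DEMO_* names and values are hypothetical and do not use the
real _IOW() encoding. It shows that the preprocessor cannot see a plain enum
constant, while the self-referential define keeps the enum and still makes
the name testable with #ifdef:

```c
#include <assert.h>

/* Hypothetical command numbers, for illustration only (not the kdbus values). */
enum demo_ioctl_type {
	DEMO_CMD_OLD = 0x10,
	DEMO_CMD_NEW = 0x11,
};

/* Self-referential define: the enum stays, but #ifdef can now see the name. */
#define DEMO_CMD_NEW DEMO_CMD_NEW

/* Returns 1 if the preprocessor knows DEMO_CMD_OLD, 0 otherwise. */
static int demo_old_cmd_visible(void)
{
#ifdef DEMO_CMD_OLD
	return 1;
#else
	return 0;
#endif
}

/* Returns 1 if the preprocessor knows DEMO_CMD_NEW, 0 otherwise. */
static int demo_new_cmd_visible(void)
{
#ifdef DEMO_CMD_NEW
	return 1;
#else
	return 0;
#endif
}
```

Both names remain usable as ordinary constants in C code; only the one with
a matching define can be probed at build time.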

Arnd
Tom Gundersen
2014-10-30 11:03:09 UTC
Post by Arnd Bergmann
I think in general, using enum is great, but for ioctl command numbers,
we probably want to have defines so the user space implementation can
use #ifdef to see if the kernel version that it is being built for
knows a particular command.
Does that make sense for the first version? I agree that we should use
#define to allow #ifdef for when we add more ioctls in the future,
but these ioctls will always exist...

The nice thing about enums is of course that it helps with debugging
as gdb can show the string representation rather than the number,
because in contrast to #defines, an enum is something the compiler
knows about.

Cheers,

Tom
Arnd Bergmann
2014-10-30 11:27:19 UTC
Post by Tom Gundersen
Post by Arnd Bergmann
I think in general, using enum is great, but for ioctl command numbers,
we probably want to have defines so the user space implementation can
use #ifdef to see if the kernel version that it is being built for
knows a particular command.
Does that make sense for the first version? I agree that we should use
#define to allow #ifdef for when we add more ioctls in the future,
but these ioctls will always exist...
It's mainly for consistency really.
Post by Tom Gundersen
The nice thing about enums is of course that it helps with debugging
as gdb can show the string representation rather than the number,
because in contrast to #defines, an enum is something the compiler
knows about.
This doesn't get passed as an enum in user space though, and when debugging
the kernel it only helps within one function.

Arnd
Daniel Mack
2014-10-30 11:53:10 UTC
Post by Arnd Bergmann
Post by Tom Gundersen
The nice thing about enums is of course that it helps with debugging
as gdb can show the string representation rather than the number,
because in contrast to #defines, an enum is something the compiler
knows about.
This doesn't get passed as an enum in user space though, and when debugging
the kernel it only helps within one function.
Hmm, this is the header exported to userspace, so having enums in would
make our lives easier, right?

Hence, for now, I'd propose we keep it the way it is, and add new ioctls
with defines once they are implemented. Are you okay with this? I'll add
a comment to the file to give a heads-up.


Thanks,
Daniel

Arnd Bergmann
2014-10-30 12:03:58 UTC
Post by Daniel Mack
Post by Arnd Bergmann
Post by Tom Gundersen
The nice thing about enums is of course that it helps with debugging
as gdb can show the string representation rather than the number,
because in contrast to #defines, an enum is something the compiler
knows about.
This doesn't get passed as an enum in user space though, and when debugging
the kernel it only helps within one function.
Hmm, this is the header exported to userspace, so having enums in would
make our lives easier, right?
My point was that you never use the enum by type and the only place in
user space where it's referenced would be something like

ret = ioctl(fd, KDBUS_CMD_BUS_MAKE, &make);

In the debugger, you will see the source line here. If you trace into the
glibc ioctl function, you no longer know the type because that just
has an 'int'.
Post by Daniel Mack
Hence, for now, I'd propose we keep it the way it is, and add new ioctls
with defines once they are implemented. Are you okay with this? I'll add
a comment to the file to give a heads-up.
It's certainly not a show-stopper, but I have yet to see a good reason
why it would help anyone.

Arnd
Daniel Mack
2014-10-31 10:03:44 UTC
Post by Arnd Bergmann
Post by Daniel Mack
Hmm, this is the header exported to userspace, so having enums in would
make our lives easier, right?
My point was that you never use the enum by type and the only place in
user space where it's referenced would be something like
ret = ioctl(fd, KDBUS_CMD_BUS_MAKE, &make);
In the debugger, you will see the source line here. If you trace into the
glibc ioctl function, you no longer know the type because that just
has an 'int'.
Alright - I changed that to #defines now.


Thanks,
Daniel

Greg Kroah-Hartman
2014-10-29 22:07:44 UTC
From: Daniel Mack <***@zonque.org>

A pool for data received from the kernel is installed for every
connection of the bus, and it is used to copy data from the kernel to
userspace clients, for messages and other information.

It is accessed when one of the following ioctls is issued:

* KDBUS_CMD_MSG_RECV, to receive a message
* KDBUS_CMD_NAME_LIST, to dump the name registry
* KDBUS_CMD_CONN_INFO, to retrieve information on a connection

The offsets returned by any of the aforementioned ioctls are relative
to the start of the pool. Internally, the pool is organized in
slices that are dynamically allocated on demand. The overall size of
the pool is chosen by the connection when it connects to the bus with
KDBUS_CMD_HELLO.
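
The slice bookkeeping can be sketched in plain userspace C as follows. This
is a simplified illustration under stated assumptions, not the code from
this patch: it keeps the slices in a small offset-sorted array instead of
the list and rb-trees used by pool.c, and all demo_* names as well as
DEMO_MAX_SLICES are hypothetical. It shows best-fit allocation with 8-byte
alignment, splitting off the remainder, and merging with free neighbours on
release:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SLICES 16	/* hypothetical limit for this sketch */

struct demo_slice {
	size_t off;
	size_t size;
	bool free;
};

struct demo_pool {
	struct demo_slice s[DEMO_MAX_SLICES];	/* sorted by offset */
	size_t n;
};

/* Start with one free slice spanning the entire pool. */
static void demo_pool_init(struct demo_pool *p, size_t size)
{
	p->n = 1;
	p->s[0] = (struct demo_slice){ .off = 0, .size = size, .free = true };
}

/* Best fit: returns the offset of the new slice, or (size_t)-1 on failure. */
static size_t demo_pool_alloc(struct demo_pool *p, size_t size)
{
	size_t want = (size + 7) & ~(size_t)7;	/* round up to 8 bytes */
	size_t i, best = p->n;

	for (i = 0; i < p->n; i++)
		if (p->s[i].free && p->s[i].size >= want &&
		    (best == p->n || p->s[i].size < p->s[best].size))
			best = i;
	if (best == p->n)
		return (size_t)-1;

	if (p->s[best].size > want) {
		if (p->n == DEMO_MAX_SLICES)
			return (size_t)-1;
		/* split off the remainder into its own free slice */
		memmove(&p->s[best + 2], &p->s[best + 1],
			(p->n - best - 1) * sizeof(p->s[0]));
		p->s[best + 1] = (struct demo_slice){
			.off = p->s[best].off + want,
			.size = p->s[best].size - want,
			.free = true,
		};
		p->s[best].size = want;
		p->n++;
	}
	p->s[best].free = false;
	return p->s[best].off;
}

/* Drop array entry @i, keeping the array sorted. */
static void demo_pool_remove(struct demo_pool *p, size_t i)
{
	memmove(&p->s[i], &p->s[i + 1], (p->n - i - 1) * sizeof(p->s[0]));
	p->n--;
}

/* Free the busy slice at @off and merge it with free neighbours. */
static int demo_pool_free(struct demo_pool *p, size_t off)
{
	size_t i;

	for (i = 0; i < p->n; i++)
		if (!p->s[i].free && p->s[i].off == off)
			break;
	if (i == p->n)
		return -1;

	p->s[i].free = true;
	if (i + 1 < p->n && p->s[i + 1].free) {
		p->s[i].size += p->s[i + 1].size;
		demo_pool_remove(p, i + 1);
	}
	if (i > 0 && p->s[i - 1].free) {
		p->s[i - 1].size += p->s[i].size;
		demo_pool_remove(p, i);
	}
	return 0;
}
```

The linear scans keep the sketch short; the patch itself uses a size-ordered
tree for free slices and an offset-ordered tree for busy ones, so both
lookups are logarithmic.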

In order to make the slice available for subsequent calls,
KDBUS_CMD_FREE has to be called on the offset.

To access the memory, the caller is expected to mmap() it to its task.
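
On the receiving side, turning a returned offset into a pointer could look
roughly like the sketch below. It is only an illustration: struct
demo_record and demo_pool_at() are hypothetical names, the record layout is
reduced to a leading 64-bit size field, and the base pointer stands in for
the address obtained from mmap() on the connection. The 8-byte alignment
check assumes offsets are multiples of eight, which matches the 8-byte
rounding of slice sizes in pool.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: a 64-bit total size followed by payload. */
struct demo_record {
	uint64_t size;
	char payload[];
};

/*
 * Translate @offset into a pointer inside the mapped pool.
 * Returns NULL if the offset is misaligned or the record would
 * not fit into the pool.
 */
static const struct demo_record *demo_pool_at(const void *pool,
					      size_t pool_size,
					      uint64_t offset)
{
	const struct demo_record *r;

	if (pool_size < sizeof(*r))
		return NULL;
	if (offset % 8 || offset > pool_size - sizeof(*r))
		return NULL;

	r = (const struct demo_record *)((const uint8_t *)pool + offset);
	if (r->size < sizeof(*r) || r->size > pool_size - offset)
		return NULL;

	return r;
}
```

After the record has been processed, the offset would be handed back with
KDBUS_CMD_FREE so that the slice can be reused.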

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
drivers/misc/kdbus/pool.c | 728 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/misc/kdbus/pool.h | 43 +++
2 files changed, 771 insertions(+)
create mode 100644 drivers/misc/kdbus/pool.c
create mode 100644 drivers/misc/kdbus/pool.h

diff --git a/drivers/misc/kdbus/pool.c b/drivers/misc/kdbus/pool.c
new file mode 100644
index 000000000000..ef181d7c043b
--- /dev/null
+++ b/drivers/misc/kdbus/pool.c
@@ -0,0 +1,728 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#include <linux/aio.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/shmem_fs.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "pool.h"
+#include "util.h"
+
+/**
+ * struct kdbus_pool - the receiver's buffer
+ * @f: The backing shmem file
+ * @size: The size of the file
+ * @busy: The currently used size
+ * @lock: Pool data lock
+ * @slices: All slices sorted by address
+ * @slices_busy: Tree of allocated slices
+ * @slices_free: Tree of free slices
+ *
+ * The receiver's buffer, managed as a pool of allocated and free
+ * slices containing the queued messages.
+ *
+ * Messages sent with KDBUS_CMD_MSG_SEND are copied directly by the
+ * sending process into the receiver's pool.
+ *
+ * Messages received with KDBUS_CMD_MSG_RECV just return the offset
+ * to the data placed in the pool.
+ *
+ * The internally allocated memory needs to be returned by the receiver
+ * with KDBUS_CMD_FREE.
+ */
+struct kdbus_pool {
+ struct file *f;
+ size_t size;
+ size_t busy;
+ struct mutex lock;
+
+ struct list_head slices;
+ struct rb_root slices_busy;
+ struct rb_root slices_free;
+};
+
+/**
+ * struct kdbus_pool_slice - allocated element in kdbus_pool
+ * @pool: Pool this slice belongs to
+ * @off: Offset of slice in the shmem file
+ * @size: Size of slice
+ * @entry: Entry in "all slices" list
+ * @rb_node: Entry in free or busy tree
+ * @free: Unused slice
+ * @public: Slice was exposed to userspace and may be freed
+ * with KDBUS_CMD_FREE.
+ *
+ * The pool has one or more slices, always spanning the entire size of the
+ * pool.
+ *
+ * Every slice is an element in a list sorted by the buffer address, to
+ * provide access to the next neighbor slice.
+ *
+ * Every slice is member in either the busy or the free tree. The free
+ * tree is organized by slice size, the busy tree organized by buffer
+ * offset.
+ */
+struct kdbus_pool_slice {
+ struct kdbus_pool *pool;
+ size_t off;
+ size_t size;
+
+ struct list_head entry;
+ struct rb_node rb_node;
+ bool free;
+ bool public;
+};
+
+static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
+ size_t off, size_t size)
+{
+ struct kdbus_pool_slice *slice;
+
+ slice = kzalloc(sizeof(*slice), GFP_KERNEL);
+ if (!slice)
+ return NULL;
+
+ slice->pool = pool;
+ slice->off = off;
+ slice->size = size;
+ slice->free = true;
+ slice->public = false;
+ return slice;
+}
+
+/* insert a slice into the free tree */
+static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
+ struct kdbus_pool_slice *slice)
+{
+ struct rb_node **n;
+ struct rb_node *pn = NULL;
+
+ n = &pool->slices_free.rb_node;
+ while (*n) {
+ struct kdbus_pool_slice *pslice;
+
+ pn = *n;
+ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
+ if (slice->size < pslice->size)
+ n = &pn->rb_left;
+ else
+ n = &pn->rb_right;
+ }
+
+ rb_link_node(&slice->rb_node, pn, n);
+ rb_insert_color(&slice->rb_node, &pool->slices_free);
+}
+
+/* insert a slice into the busy tree */
+static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
+ struct kdbus_pool_slice *slice)
+{
+ struct rb_node **n;
+ struct rb_node *pn = NULL;
+
+ n = &pool->slices_busy.rb_node;
+ while (*n) {
+ struct kdbus_pool_slice *pslice;
+
+ pn = *n;
+ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
+ if (slice->off < pslice->off)
+ n = &pn->rb_left;
+ else if (slice->off > pslice->off)
+ n = &pn->rb_right;
+ }
+
+ rb_link_node(&slice->rb_node, pn, n);
+ rb_insert_color(&slice->rb_node, &pool->slices_busy);
+}
+
+static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
+ size_t off)
+{
+ struct rb_node *n;
+
+ n = pool->slices_busy.rb_node;
+ while (n) {
+ struct kdbus_pool_slice *s;
+
+ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
+ if (off < s->off)
+ n = n->rb_left;
+ else if (off > s->off)
+ n = n->rb_right;
+ else
+ return s;
+ }
+
+ return NULL;
+}
+
+/**
+ * kdbus_pool_slice_alloc() - allocate memory from a pool
+ * @pool: The receiver's pool
+ * @slice: Slice allocated from the pool
+ * @size: The number of bytes to allocate
+ *
+ * The returned slice is used for kdbus_pool_slice_free() to
+ * free the allocated memory.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_pool_slice_alloc(struct kdbus_pool *pool,
+ struct kdbus_pool_slice **slice, size_t size)
+{
+ size_t slice_size = KDBUS_ALIGN8(size);
+ struct rb_node *n, *found = NULL;
+ struct kdbus_pool_slice *s;
+ int ret = 0;
+
+ /* search for a free slice with the closest matching size */
+ mutex_lock(&pool->lock);
+ n = pool->slices_free.rb_node;
+ while (n) {
+ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
+ if (slice_size < s->size) {
+ found = n;
+ n = n->rb_left;
+ } else if (slice_size > s->size) {
+ n = n->rb_right;
+ } else {
+ found = n;
+ break;
+ }
+ }
+
+ /* no slice with the minimum size found in the pool */
+ if (!found) {
+ ret = -ENOBUFS;
+ goto exit_unlock;
+ }
+
+ /* no exact match, use the closest one */
+ if (!n)
+ s = rb_entry(found, struct kdbus_pool_slice, rb_node);
+
+ /* move slice from free to the busy tree */
+ rb_erase(found, &pool->slices_free);
+ kdbus_pool_add_busy_slice(pool, s);
+
+ /* we got a slice larger than what we asked for? */
+ if (s->size > slice_size) {
+ struct kdbus_pool_slice *s_new;
+
+ /* split-off the remainder of the size to its own slice */
+ s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
+ s->size - slice_size);
+ if (!s_new) {
+ ret = -ENOMEM;
+ goto exit_unlock;
+ }
+
+ list_add(&s_new->entry, &s->entry);
+ kdbus_pool_add_free_slice(pool, s_new);
+
+ /* adjust our size now that we split-off another slice */
+ s->size = slice_size;
+ }
+
+ s->free = false;
+ s->public = false;
+ pool->busy += s->size;
+ mutex_unlock(&pool->lock);
+
+ *slice = s;
+ return 0;
+
+exit_unlock:
+ mutex_unlock(&pool->lock);
+ return ret;
+}
+
+static void __kdbus_pool_slice_free(struct kdbus_pool_slice *slice)
+{
+ struct kdbus_pool *pool = slice->pool;
+
+ BUG_ON(slice->free);
+
+ rb_erase(&slice->rb_node, &pool->slices_busy);
+ pool->busy -= slice->size;
+
+ /* merge with the next free slice */
+ if (!list_is_last(&slice->entry, &pool->slices)) {
+ struct kdbus_pool_slice *s;
+
+ s = list_entry(slice->entry.next,
+ struct kdbus_pool_slice, entry);
+ if (s->free) {
+ rb_erase(&s->rb_node, &pool->slices_free);
+ list_del(&s->entry);
+ slice->size += s->size;
+ kfree(s);
+ }
+ }
+
+ /* merge with previous free slice */
+ if (pool->slices.next != &slice->entry) {
+ struct kdbus_pool_slice *s;
+
+ s = list_entry(slice->entry.prev, struct kdbus_pool_slice,
+ entry);
+ if (s->free) {
+ rb_erase(&s->rb_node, &pool->slices_free);
+ list_del(&slice->entry);
+ s->size += slice->size;
+ kfree(slice);
+ slice = s;
+ }
+ }
+
+ slice->free = true;
+ kdbus_pool_add_free_slice(pool, slice);
+}
+
+/**
+ * kdbus_pool_slice_free() - give allocated memory back to the pool
+ * @slice: Slice allocated from the pool
+ *
+ * The slice was returned by a call to kdbus_pool_slice_alloc(); its
+ * memory is returned to the pool.
+ */
+void kdbus_pool_slice_free(struct kdbus_pool_slice *slice)
+{
+ struct kdbus_pool *pool = slice->pool;
+
+ mutex_lock(&pool->lock);
+ __kdbus_pool_slice_free(slice);
+ mutex_unlock(&pool->lock);
+}
+
+/**
+ * kdbus_pool_release_offset() - release a public offset
+ * @pool: pool to operate on
+ * @off: offset to release
+ *
+ * This should be called whenever user-space frees a slice given to them. It
+ * verifies the slice is available and public, and then drops it. It ensures
+ * correct locking and barriers against queues.
+ *
+ * Return: 0 on success, -ENXIO if the offset is invalid, -EINVAL if the offset
+ * is valid but not public.
+ */
+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
+{
+ struct kdbus_pool_slice *slice;
+ int ret = 0;
+
+ mutex_lock(&pool->lock);
+ slice = kdbus_pool_find_slice(pool, off);
+ if (slice) {
+ if (slice->public)
+ __kdbus_pool_slice_free(slice);
+ else
+ ret = -EINVAL;
+ } else {
+ ret = -ENXIO;
+ }
+ mutex_unlock(&pool->lock);
+
+ return ret;
+}
+
+/**
+ * kdbus_pool_slice_offset() - return the slice's offset inside the pool
+ * @slice: The slice
+ *
+ * Return: the offset in bytes.
+ */
+size_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
+{
+ return slice->off;
+}
+
+/**
+ * kdbus_pool_slice_make_public() - set a slice's public flag to true
+ * @slice: The slice
+ */
+void kdbus_pool_slice_make_public(struct kdbus_pool_slice *slice)
+{
+ slice->public = true;
+}
+
+/**
+ * kdbus_pool_new() - create a new pool
+ * @name: Name of the (deleted) file which shows up in
+ * /proc, used for debugging
+ * @pool: Newly allocated pool
+ * @size: Maximum size of the pool
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_pool_new(const char *name, struct kdbus_pool **pool, size_t size)
+{
+ struct kdbus_pool_slice *s;
+ struct kdbus_pool *p;
+ struct file *f;
+ char *n = NULL;
+ int ret;
+
+ BUG_ON(*pool);
+
+ p = kzalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ return -ENOMEM;
+
+ if (name) {
+ n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
+ if (!n) {
+ ret = -ENOMEM;
+ goto exit_free;
+ }
+ }
+
+ f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, VM_NORESERVE);
+ kfree(n);
+
+ if (IS_ERR(f)) {
+ ret = PTR_ERR(f);
+ goto exit_free;
+ }
+
+ ret = get_write_access(file_inode(f));
+ if (ret < 0)
+ goto exit_put_shmem;
+
+ /* allocate first slice spanning the entire pool */
+ s = kdbus_pool_slice_new(p, 0, size);
+ if (!s) {
+ ret = -ENOMEM;
+ goto exit_put_write;
+ }
+
+ p->f = f;
+ p->size = size;
+ p->busy = 0;
+ p->slices_free = RB_ROOT;
+ p->slices_busy = RB_ROOT;
+ mutex_init(&p->lock);
+
+ INIT_LIST_HEAD(&p->slices);
+ list_add(&s->entry, &p->slices);
+
+ kdbus_pool_add_free_slice(p, s);
+ *pool = p;
+ return 0;
+
+exit_put_write:
+ put_write_access(file_inode(f));
+exit_put_shmem:
+ fput(f);
+exit_free:
+ kfree(p);
+ return ret;
+}
+
+/**
+ * kdbus_pool_free() - destroy pool
+ * @pool: The receiver's pool
+ */
+void kdbus_pool_free(struct kdbus_pool *pool)
+{
+ struct kdbus_pool_slice *s, *tmp;
+
+ if (!pool)
+ return;
+
+ list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
+ list_del(&s->entry);
+ kfree(s);
+ }
+
+ put_write_access(file_inode(pool->f));
+ fput(pool->f);
+ kfree(pool);
+}
+
+/**
+ * kdbus_pool_remain() - the number of free bytes in the pool
+ * @pool: The receiver's pool
+ *
+ * Return: the number of unallocated bytes in the pool
+ */
+size_t kdbus_pool_remain(struct kdbus_pool *pool)
+{
+ size_t size;
+
+ mutex_lock(&pool->lock);
+ size = pool->size - pool->busy;
+ mutex_unlock(&pool->lock);
+
+ return size;
+}
+
+/* copy data from a file to a page in the receiver's pool */
+static int kdbus_pool_copy_file(struct page *p, size_t start,
+ struct file *f, size_t off, size_t count)
+{
+ loff_t o = off;
+ char *kaddr;
+ ssize_t n;
+
+ kaddr = kmap(p);
+ n = f->f_op->read(f, (char __force __user *)kaddr + start, count, &o);
+ kunmap(p);
+ if (n < 0)
+ return n;
+ if (n != count)
+ return -EFAULT;
+
+ return 0;
+}
+
+/* copy data to a page in the receiver's pool */
+static int kdbus_pool_copy_data(struct page *p, size_t start,
+ const void __user *from, size_t count)
+{
+ unsigned long remain;
+ char *kaddr;
+
+ if (fault_in_pages_readable(from, count) < 0)
+ return -EFAULT;
+
+ kaddr = kmap_atomic(p);
+ pagefault_disable();
+ remain = __copy_from_user_inatomic(kaddr + start, from, count);
+ pagefault_enable();
+ kunmap_atomic(kaddr);
+ if (remain > 0)
+ return -EFAULT;
+
+ cond_resched();
+ return 0;
+}
+
+/* copy data to the receiver's pool */
+static ssize_t kdbus_pool_copy(const struct kdbus_pool_slice *slice, size_t off,
+ const void __user *data, struct file *f_src,
+ size_t off_src, size_t len)
+{
+ struct file *f_dst = slice->pool->f;
+ struct address_space *mapping = f_dst->f_mapping;
+ const struct address_space_operations *aops = mapping->a_ops;
+ unsigned long fpos = slice->off + off;
+ unsigned long rem = len;
+ size_t pos = 0;
+ int ret = 0;
+
+ BUG_ON(off + len > slice->size);
+ BUG_ON(slice->free);
+
+ while (rem > 0) {
+ struct page *p;
+ unsigned long o;
+ unsigned long n;
+ void *fsdata;
+ int status;
+
+ o = fpos & (PAGE_CACHE_SIZE - 1);
+ n = min_t(unsigned long, PAGE_CACHE_SIZE - o, rem);
+
+ status = aops->write_begin(f_dst, mapping, fpos, n, 0, &p,
+ &fsdata);
+ if (status) {
+ ret = -EFAULT;
+ break;
+ }
+
+ if (data)
+ ret = kdbus_pool_copy_data(p, o, data + pos, n);
+ else
+ ret = kdbus_pool_copy_file(p, o, f_src,
+ off_src + pos, n);
+ mark_page_accessed(p);
+
+ status = aops->write_end(f_dst, mapping, fpos, n, n, p, fsdata);
+
+ if (ret < 0)
+ break;
+ if (status != n) {
+ ret = -EFAULT;
+ break;
+ }
+
+ pos += n;
+ fpos += n;
+ rem -= n;
+ }
+
+ return ret;
+}
+
+/**
+ * kdbus_pool_slice_copy_user() - copy user memory to a slice
+ * @slice: The slice to write to
+ * @off: Offset in the slice to write to
+ * @data: User memory to copy from
+ * @len: Number of bytes to copy
+ *
+ * The slice was returned by a call to kdbus_pool_slice_alloc().
+ * The user memory at @data will be copied to offset @off in the allocated
+ * slice in the pool.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+ssize_t
+kdbus_pool_slice_copy_user(const struct kdbus_pool_slice *slice, size_t off,
+ const void __user *data, size_t len)
+{
+ return kdbus_pool_copy(slice, off, data, NULL, 0, len);
+}
+
+/**
+ * kdbus_pool_slice_copy() - copy kernel memory to a slice
+ * @slice: The slice to write to
+ * @off: Offset in the slice to write to
+ * @data: Kernel memory to copy from
+ * @len: Number of bytes to copy
+ *
+ * The slice was returned by a call to kdbus_pool_slice_alloc().
+ * The kernel memory at @data will be copied to offset @off in the allocated
+ * slice in the pool.
+ *
+ * Return: the number of bytes copied, negative errno on failure.
+ */
+ssize_t kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice, size_t off,
+ const void *data, size_t len)
+{
+ mm_segment_t old_fs;
+ ssize_t ret;
+
+ old_fs = get_fs();
+ set_fs(get_ds());
+ ret = kdbus_pool_copy(slice, off,
+ (const void __user *)data, NULL, 0, len);
+ set_fs(old_fs);
+
+ return ret;
+}
+
+/**
+ * kdbus_pool_move_slice() - move memory from one pool into another one
+ * @dst_pool: The receiver's pool to copy to
+ * @src_pool: The receiver's pool to copy from
+ * @slice: Reference to the slice to copy from the source;
+ * updated with the newly allocated slice in the
+ * destination
+ *
+ * Move memory from one pool to another. Memory will be allocated in the
+ * destination pool, the data copied over, and the slice then free()d in the
+ * source pool.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int kdbus_pool_move_slice(struct kdbus_pool *dst_pool,
+ struct kdbus_pool *src_pool,
+ struct kdbus_pool_slice **slice)
+{
+ mm_segment_t old_fs;
+ struct kdbus_pool_slice *slice_new;
+ int ret;
+
+ ret = kdbus_pool_slice_alloc(dst_pool, &slice_new, (*slice)->size);
+ if (ret < 0)
+ return ret;
+
+ old_fs = get_fs();
+ set_fs(get_ds());
+ ret = kdbus_pool_copy(slice_new, 0, NULL,
+ src_pool->f, (*slice)->off, (*slice)->size);
+ set_fs(old_fs);
+ if (ret < 0)
+ goto exit_free;
+
+ kdbus_pool_slice_free(*slice);
+
+ *slice = slice_new;
+ return 0;
+
+exit_free:
+ kdbus_pool_slice_free(slice_new);
+ return ret;
+}
+
+/**
+ * kdbus_pool_slice_flush() - flush dcache memory area of a slice
+ * @slice: The allocated slice to flush
+ *
+ * Dcache flushes are delayed to happen only right before the receiver
+ * gets the new buffer area announced. The mapped buffer is always
+ * read-only for the receiver, and only the area of the announced message
+ * needs to be flushed.
+ */
+void kdbus_pool_slice_flush(const struct kdbus_pool_slice *slice)
+{
+#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
+ struct address_space *mapping = slice->pool->f->f_mapping;
+ pgoff_t first = slice->off >> PAGE_CACHE_SHIFT;
+ pgoff_t last = (slice->off + slice->size +
+ PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
+ pgoff_t i;
+
+ for (i = first; i < last; i++) {
+ struct page *page;
+
+ page = find_get_page(mapping, i);
+ if (!page)
+ continue;
+
+ flush_dcache_page(page);
+ put_page(page);
+ }
+#endif
+}
+
+/**
+ * kdbus_pool_mmap() - map the pool into the process
+ * @pool: The receiver's pool
+ * @vma: passed by mmap() syscall
+ *
+ * Return: the result of the mmap() call, negative errno on failure.
+ */
+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
+{
+ /* deny write access to the pool */
+ if (vma->vm_flags & VM_WRITE)
+ return -EPERM;
+ vma->vm_flags &= ~VM_MAYWRITE;
+
+ /* do not allow mapping more than the size of the file */
+ if ((vma->vm_end - vma->vm_start) > pool->size)
+ return -EFAULT;
+
+ /* replace the connection file with our shmem file */
+ if (vma->vm_file)
+ fput(vma->vm_file);
+ vma->vm_file = get_file(pool->f);
+
+ return pool->f->f_op->mmap(pool->f, vma);
+}
diff --git a/drivers/misc/kdbus/pool.h b/drivers/misc/kdbus/pool.h
new file mode 100644
index 000000000000..745161ba4463
--- /dev/null
+++ b/drivers/misc/kdbus/pool.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2013-2014 Kay Sievers
+ * Copyright (C) 2013-2014 Greg Kroah-Hartman <***@linuxfoundation.org>
+ * Copyright (C) 2013-2014 Daniel Mack <***@zonque.org>
+ * Copyright (C) 2013-2014 David Herrmann <***@gmail.com>
+ * Copyright (C) 2013-2014 Linux Foundation
+ *
+ * kdbus is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU Lesser General Public License as published by the
+ * Free Software Foundation; either version 2.1 of the License, or (at
+ * your option) any later version.
+ */
+
+#ifndef __KDBUS_POOL_H
+#define __KDBUS_POOL_H
+
+struct kdbus_pool;
+struct kdbus_pool_slice;
+
+int kdbus_pool_new(const char *name, struct kdbus_pool **pool, size_t size);
+void kdbus_pool_free(struct kdbus_pool *pool);
+size_t kdbus_pool_remain(struct kdbus_pool *pool);
+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
+int kdbus_pool_move_slice(struct kdbus_pool *dst_pool,
+ struct kdbus_pool *src_pool,
+ struct kdbus_pool_slice **slice);
+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
+
+int kdbus_pool_slice_alloc(struct kdbus_pool *pool,
+ struct kdbus_pool_slice **slice, size_t size);
+void kdbus_pool_slice_free(struct kdbus_pool_slice *slice);
+struct kdbus_pool_slice *kdbus_pool_slice_find(struct kdbus_pool *pool,
+ size_t off);
+size_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
+ssize_t kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice, size_t off,
+ const void *data, size_t len);
+ssize_t
+kdbus_pool_slice_copy_user(const struct kdbus_pool_slice *slice, size_t off,
+ const void __user *data, size_t len);
+void kdbus_pool_slice_flush(const struct kdbus_pool_slice *slice);
+
+void kdbus_pool_slice_make_public(struct kdbus_pool_slice *slice);
+#endif
--
2.1.2

Greg Kroah-Hartman
2014-10-29 22:08:09 UTC
From: Daniel Mack <***@zonque.org>

kdbus is a system for low-latency, low-overhead, easy to use
interprocess communication (IPC).

The interface to all functions in this driver is implemented through ioctls
on /dev nodes. This patch adds detailed documentation about the kernel
level API design.

Signed-off-by: Daniel Mack <***@zonque.org>
Signed-off-by: Greg Kroah-Hartman <***@linuxfoundation.org>
---
Documentation/kdbus.txt | 1815 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1815 insertions(+)
create mode 100644 Documentation/kdbus.txt

diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt
new file mode 100644
index 000000000000..ac1a18908976
--- /dev/null
+++ b/Documentation/kdbus.txt
@@ -0,0 +1,1815 @@
+D-Bus is a system for powerful, easy to use interprocess communication (IPC).
+
+The focus of this document is an overview of the low-level, native kernel D-Bus
+transport called kdbus. Kdbus in the kernel acts similarly to a device driver;
+all communication between processes takes place over special character device
+nodes in /dev/kdbus/.
+
+For the general D-Bus protocol specification, the payload format, the
+marshaling, and the communication semantics, please refer to:
+ http://dbus.freedesktop.org/doc/dbus-specification.html
+
+For a kdbus specific userspace library implementation please refer to:
+ http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
+
+Articles about D-Bus and kdbus:
+ http://lwn.net/Articles/580194/
+
+
+1. Terminology
+===============================================================================
+
+ Domain:
+ A domain is a named object containing a number of buses. A system
+ container that contains its own init system and users usually also
+ runs in its own kdbus domain. The /dev/kdbus/domain/<container-name>/
+ directory shows up inside the domain as /dev/kdbus/. Every domain offers
+ its own "control" device node to create new buses or new sub-domains.
+ Domains have no connection to each other and can neither see nor talk to
+ each other. See section 5 for more details.
+
+ Bus:
+ A bus is a named object inside a domain. Clients exchange messages
+ over a bus. Buses have no connection to each other; messages can only be
+ exchanged on the same bus. The default entry point to a bus, to which
+ clients establish their connection, is the "bus" device node
+ /dev/kdbus/<bus name>/bus.
+ Common operating system setups create one "system bus" per system, and one
+ "user bus" for every logged-in user. Applications or services may create
+ their own private named buses. See section 5 for more details.
+
+ Endpoint:
+ An endpoint provides the device node to talk to a bus. Opening an
+ endpoint creates a new connection to the bus to which the endpoint belongs.
+ Every bus has a default endpoint called "bus".
+ A bus can optionally offer additional endpoints with custom names to
+ provide restricted access to the same bus. Custom endpoints carry
+ additional policy which can be used to give sandboxed processes only
+ locked-down, limited, filtered access to the same bus.
+ See section 5 for more details.
+
+ Connection:
+ A connection to a bus is created by opening an endpoint device node of
+ a bus and becoming an active client with the HELLO exchange. Every
+ connected client connection has a unique identifier on the bus and can
+ address messages to every other connection on the same bus by using
+ the peer's connection id as the destination.
+ See section 6 for more details.
+
+ Pool:
+ Each connection allocates a piece of shmem-backed memory that is used
+ to receive messages and answers to ioctl commands from the kernel. It is
+ never used to send anything to the kernel. In order to access that memory,
+ userspace must mmap() it into its task.
+ See section 12 for more details.
+
+ Well-known Name:
+ A connection can, in addition to its implicit unique connection id, request
+ the ownership of a textual well-known name. Well-known names are noted in
+ reverse-domain notation, such as com.example.service1. Connections offering
+ a service on a bus are usually reached by their well-known name. The analogy
+ of connection id and well-known name is an IP address and a DNS name
+ associated with that address.
+
+ Message:
+ Connections can exchange messages with other connections by addressing
+ the peers with their connection id or well-known name. A message consists
+ of a message header with kernel-specific information on how to route the
+ message, and the message payload, which is a logical byte stream of
+ arbitrary size. Messages can carry additional file descriptors to be passed
+ from one connection to another. Every connection can specify which set of
+ metadata the kernel should attach to the message when it is delivered
+ to the receiving connection. Metadata contains information like: system
+ timestamps, uid, gid, tid, proc-starttime, well-known-names, process comm,
+ process exe, process argv, cgroup, capabilities, seclabel, audit session,
+ loginuid and the connection's human-readable name.
+ See sections 7 and 13 for more details.
+
+ Item:
+ The API of kdbus implements a notion of items, submitted through and
+ returned by most ioctls, and stored inside data structures in the
+ connection's pool. See section 4 for more details.
+
+ Broadcast and Match:
+ Broadcast messages are potentially sent to all connections of a bus. By
+ default, connections do not actually receive any of the sent broadcast
+ messages; a broadcast message is delivered to a connection only after that
+ connection has installed a match for specific message properties.
+ See section 10 for more details.
+
+ Policy:
+ A policy is a set of rules that define which connections can see, talk to,
+ or register a well-known name on the bus. A policy is attached to buses and
+ custom endpoints, and modified by policy holder connections or owners of
+ custom endpoints. See section 11 for more details.
+
+ Access rules that determine who can see a name on the bus are only checked
+ on custom endpoints. Policies may be defined with names that end with '.*'.
+ When matching a well-known name against such a wildcard entry, the last
+ part of the name is ignored and the remainder is checked against the
+ wildcard name without the trailing '.*'. See section 11 for more details.
+
+ Privileged bus users:
+ A user connecting to the bus is considered privileged if it is either the
+ creator of the bus, or if it has the CAP_IPC_OWNER capability flag set.
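
The wildcard rule described under "Policy" above can be sketched in userspace C as follows. This is only an illustration of the rule as worded in this document, not the driver's code: the helper name policy_name_matches() is hypothetical, and the handling of entries without a trailing '.*' (plain string equality) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the well-known name 'name' is
 * covered by the policy entry 'entry'. An entry ending in ".*" covers a
 * name if the name, with its last dot-separated part removed, equals the
 * entry without the trailing ".*".
 */
static bool policy_name_matches(const char *entry, const char *name)
{
	size_t elen = strlen(entry);
	const char *dot;

	/* Assumption: entries without a wildcard must match exactly. */
	if (elen < 2 || strcmp(entry + elen - 2, ".*") != 0)
		return strcmp(entry, name) == 0;

	/* Ignore the last part of the name ... */
	dot = strrchr(name, '.');
	if (!dot)
		return false;

	/* ... and compare the remainder with the entry minus ".*". */
	return (size_t)(dot - name) == elen - 2 &&
	       strncmp(entry, name, elen - 2) == 0;
}
```

Under this reading, "com.example.*" covers "com.example.service1" but neither "com.example" itself nor "com.example.a.b".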
+
+
+2. Device Node Layout
+===============================================================================
+
+The kdbus interface is exposed through device nodes in /dev.
+
+ /sys/bus/kdbus
+ `-- devices
+     |-- kdbus!0-system!bus -> ../../../devices/virtual/kdbus/kdbus!0-system!bus
+     |-- kdbus!2702-user!bus -> ../../../devices/virtual/kdbus/kdbus!2702-user!bus
+     |-- kdbus!2702-user!ep.app -> ../../../devices/virtual/kdbus/kdbus!2702-user!ep.app
+     `-- kdbus!control -> ../../../devices/kdbus!control
+
+ /dev/kdbus
+ |-- control
+ |-- 0-system
+ |   |-- bus
+ |   `-- ep.apache
+ |-- 1000-user
+ |   `-- bus
+ |-- 2702-user
+ |   |-- bus
+ |   `-- ep.app
+ `-- domain
+     |-- fedoracontainer
+     |   |-- control
+     |   |-- 0-system
+     |   |   `-- bus
+     |   `-- 1000-user
+     |       `-- bus
+     `-- mydebiancontainer
+         |-- control
+         `-- 0-system
+             `-- bus
+
+Note:
+ The device node subdirectory layout is arranged so that a future version of
+ kdbus could be implemented as a file system with a separate instance mounted
+ for each domain. For any future changes, this always needs to be kept
+ in mind. Also the dependency on udev's userspace hookups or sysfs attribute
+ use should be limited to the absolute minimum for the same reason.
+
+
+3. Data Structures and flags
+===============================================================================
+
+3.1 Data structures and interconnections
+----------------------------------------
+
+ +-------------------------------------------------------------------------+
+ | Domain (Init Domain)                                                    |
+ | /dev/kdbus/control                                                      |
+ | +---------------------------------------------------------------------+ |
+ | | Bus (System Bus)                                                    | |
+ | | /dev/kdbus/0-system/                                                | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | | Endpoint                      | | Endpoint                      | | |
+ | | | /dev/kdbus/0-system/bus       | | /dev/kdbus/0-system/ep.app    | | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | | | Connection   | | Connection   | | Connection   | | Connection   | | |
+ | | | :1.22        | | :1.25        | | :1.55        | | :1.81        | | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | +---------------------------------------------------------------------+ |
+ |                                                                         |
+ | +---------------------------------------------------------------------+ |
+ | | Bus (User Bus for UID 2702)                                         | |
+ | | /dev/kdbus/2702-user/                                               | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | | Endpoint                      | | Endpoint                      | | |
+ | | | /dev/kdbus/2702-user/bus      | | /dev/kdbus/2702-user/ep.app   | | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | | | Connection   | | Connection   | | Connection   | | Connection   | | |
+ | | | :1.22        | | :1.25        | | :1.55        | | :1.81        | | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | +---------------------------------------------------------------------+ |
+ |                                                                         |
+ | +---------------------------------------------------------------------+ |
+ | | Domain (Container; inside it, fedoracontainer/ becomes /dev/kdbus/) | |
+ | | /dev/kdbus/domain/fedoracontainer/control                           | |
+ | | +-----------------------------------------------------------------+ | |
+ | | | Bus (System Bus of "fedoracontainer")                           | | |
+ | | | /dev/kdbus/domain/fedoracontainer/0-system/                     | | |
+ | | | +-----------------------------+                                 | | |
+ | | | | Endpoint                    |                                 | | |
+ | | | | /dev/.../0-system/bus       |                                 | | |
+ | | | +-----------------------------+                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | | | Connection  | | Connection  |                                 | | |
+ | | | | :1.22       | | :1.25       |                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | +-----------------------------------------------------------------+ | |
+ | |                                                                     | |
+ | | +-----------------------------------------------------------------+ | |
+ | | | Bus (User Bus for UID 2702 of "fedoracontainer")                | | |
+ | | | /dev/kdbus/domain/fedoracontainer/2702-user/                    | | |
+ | | | +-----------------------------+                                 | | |
+ | | | | Endpoint                    |                                 | | |
+ | | | | /dev/.../2702-user/bus      |                                 | | |
+ | | | +-----------------------------+                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | | | Connection  | | Connection  |                                 | | |
+ | | | | :1.22       | | :1.25       |                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | +-----------------------------------------------------------------+ | |
+ | +---------------------------------------------------------------------+ |
+ +-------------------------------------------------------------------------+
+
+The above description uses the D-Bus notation of unique connection names that
+adds a ":1." prefix to the connection's unique ID. kdbus itself doesn't
+use that notation, neither internally nor externally. However, libraries and
+other userspace code that aims for compatibility with D-Bus might.
+
+3.2 Flags
+---------
+
+All ioctls used in the communication with the driver contain two 64-bit fields,
+'flags' and 'kernel_flags'. In 'flags', the behavior of the command can be
+tweaked, whereas in 'kernel_flags', the kernel driver writes back the mask of
+supported bits upon each call, and sets the KDBUS_FLAGS_KERNEL bit. This is a
+way to probe possible kernel features and make code forward and backward
+compatible.
+
+All bits that are not recognized by the kernel in 'flags' are rejected, and the
+ioctl fails with -EINVAL.
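
The probing scheme described above can be sketched as follows. Nothing here talks to the real driver: struct example_cmd, example_ioctl() and example_call_compat() are hypothetical stand-ins, the bit position chosen for the "kernel" marker bit is an assumption, and so is the premise that 'kernel_flags' is written back even when the call fails with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit position of the marker bit, for illustration only. */
#define EXAMPLE_FLAGS_KERNEL (1ULL << 63)

/* Hypothetical command header carrying the two flag fields. */
struct example_cmd {
	uint64_t flags;		/* set by userspace */
	uint64_t kernel_flags;	/* written back by the "kernel" */
};

/* Stand-in for the driver; 'supported' is the set of bits it knows. */
static int example_ioctl(struct example_cmd *cmd, uint64_t supported)
{
	/* The mask of supported bits is written back on each call. */
	cmd->kernel_flags = supported | EXAMPLE_FLAGS_KERNEL;

	/* Unknown bits in 'flags' are rejected. */
	if (cmd->flags & ~supported)
		return -EINVAL;
	return 0;
}

/* Userspace side: on -EINVAL, drop the unsupported bits and retry once. */
static int example_call_compat(struct example_cmd *cmd, uint64_t supported)
{
	int ret = example_ioctl(cmd, supported);

	if (ret == -EINVAL) {
		cmd->flags &= cmd->kernel_flags & ~EXAMPLE_FLAGS_KERNEL;
		ret = example_ioctl(cmd, supported);
	}
	return ret;
}
```

Whether silently dropping a flag is acceptable depends on the caller; code that needs a feature would instead inspect 'kernel_flags' and fail early.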
+
+
+4. Items
+===============================================================================
+
+To flexibly augment transport structures used by kdbus, data blobs of type
+struct kdbus_item are used. An item has a fixed-sized header that only stores
+the type of the item and the overall size. The total size is variable: in
+some cases it is defined by the item type, in other cases items can be of
+arbitrary length (for instance, a string).
+
+In the external kernel API, items are used for many ioctls to transport
+optional information from userspace to kernelspace. They are also used for
+information stored in a connection's pool, such as messages, name lists or
+requested connection information.
+
+In all such occasions where items are used as part of the kdbus kernel API,
+they are embedded in structs that have an overall size of their own, so there
+can be many of them.
+
+The kernel expects all items to be aligned to 8-byte boundaries.
+
+A simple iterator in userspace would iterate over the items until the items
+have reached the embedding structure's overall size. An example implementation
+of such an iterator can be found in tools/testing/selftests/kdbus/kdbus-util.h.
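
A minimal sketch of such an iterator is shown below. It is not the code from kdbus-util.h: struct example_item is a simplified stand-in for struct kdbus_item, the names are hypothetical, and it assumes that an item's 'size' covers header plus payload without padding, so that the next item starts at the size rounded up to 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kdbus_item: fixed header plus payload. */
struct example_item {
	uint64_t size;	/* header + payload, padding not included (assumed) */
	uint64_t type;
	uint8_t data[];
};

#define EXAMPLE_ALIGN8(v) (((v) + 7ULL) & ~7ULL)

/*
 * Walk the items embedded in a buffer of 'total' bytes and count them.
 * 'buf' must be 8-byte aligned. Returns -1 if an item is smaller than
 * its header or extends past the end of the embedding structure.
 */
static int example_count_items(const void *buf, uint64_t total)
{
	const uint8_t *p = buf;
	uint64_t off = 0;
	int n = 0;

	while (off < total) {
		const struct example_item *it;

		if (total - off < sizeof(*it))
			return -1;

		it = (const struct example_item *)(p + off);
		if (it->size < sizeof(*it) || it->size > total - off)
			return -1;

		n++;
		/* Items are aligned to 8-byte boundaries. */
		off += EXAMPLE_ALIGN8(it->size);
	}

	return n;
}
```

The bounds checks matter on the receiving side as well as in the kernel: a parser that trusts 'size' blindly can be walked off the end of the buffer.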
+
+
+5. Creation of new domains, buses and endpoints
+===============================================================================
+
+The initial kdbus domain is unconditionally created by the kernel module. A
+domain contains a "control" device node which allows a new bus or domain to be
+created. New domains do not have any buses created by default.
+
+
+5.1 Domains and buses
+---------------------
+
+Opening the control device node returns a file descriptor that accepts the
+ioctls KDBUS_CMD_BUS_MAKE and KDBUS_CMD_DOMAIN_MAKE, which specify the name of
+the new bus or domain to create. The control file descriptor needs to be kept
+open for the entire lifetime of the created bus or domain; closing it will
+immediately clean up the entire bus or domain and all its associated
+resources and connections. Every control file descriptor can only be used once
+to create a new bus or domain; from that point, it is not used for any
+further communication until the final close().
+
+Each bus will generate a random, 128-bit UUID upon creation. It will be
+returned to the creators of connections through kdbus_cmd_hello.id128 and can
+be used by userspace to uniquely identify buses, even across different machines
+or containers. The UUID will have its variant bits set to 'DCE', and denote
+version 4 (random).
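
Userspace can sanity-check these two properties of a returned UUID with a few bit tests. The sketch below follows the field layout of RFC 4122 and assumes that id128 holds the UUID in its usual byte order; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check the RFC 4122 version and variant fields of a 16-byte UUID. */
static bool example_uuid_is_random_dce(const uint8_t id128[16])
{
	/* Version: high nibble of byte 6; 4 means randomly generated. */
	bool v4 = (id128[6] >> 4) == 4;
	/* Variant: the two top bits of byte 8 are "10" for DCE. */
	bool dce = (id128[8] & 0xc0) == 0x80;

	return v4 && dce;
}
```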
+
+When a new domain is created, its structure in /dev/kdbus/domain/<name>/ is a
+replication of what's initially created in /dev/kdbus. In fact, internally,
+a dummy default domain is set up when the driver is loaded. This allows
+userspace to bind-mount domain subtrees of /dev/kdbus into a container's
+filesystem view, and hence achieve complete isolation from the host's domain
+and those of other containers.
+
+
+5.2 Endpoints
+-------------
+
+Endpoints are entry points to a bus. By default, each bus has a default
+endpoint called 'bus'. The bus owner has the ability to create custom
+endpoints with specific names, permissions, and policy databases (see below).
+
+To create a custom endpoint, use the KDBUS_CMD_ENDPOINT_MAKE ioctl with struct
+kdbus_cmd_make. Custom endpoints always have a policy db that, by default,
+does not allow anything. Everything that users of this new endpoint should be
+able to do has to be explicitly specified through KDBUS_ITEM_NAME and
+KDBUS_ITEM_POLICY_ACCESS items.
+
+5.3 Creating domains, buses and endpoints
+-----------------------------------------
+
+KDBUS_CMD_BUS_MAKE, KDBUS_CMD_DOMAIN_MAKE and KDBUS_CMD_ENDPOINT_MAKE take a
+struct kdbus_cmd_make argument.
+
+struct kdbus_cmd_make {
+ __u64 size;
+ The overall size of the struct, including its items.
+
+ __u64 flags;
+ The flags for creation.
+
+ KDBUS_MAKE_ACCESS_GROUP
+ Make the device node group-accessible
+
+ KDBUS_MAKE_ACCESS_WORLD
+ Make the device node world-accessible
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ A list of items, only used for creating custom endpoints. Ignored for
+ buses and domains.
+};
+
+
+6. Connections
+===============================================================================
+
+
+6.1 Connection IDs and well-known connection names
+--------------------------------------------------
+
+Connections are identified by their connection id, internally implemented as a
+uint64_t counter. The IDs of every newly created bus start at 1, and every new
+connection will increment the counter by 1. The ids are not reused.
+
+In higher level tools, the user visible representation of a connection is
+defined by the D-Bus protocol specification as ":1.<id>".
+
+Messages with a specific uint64_t destination id are directly delivered to
+the connection with the corresponding id. Messages with the special destination
+id KDBUS_DST_ID_BROADCAST are broadcast messages and are potentially delivered
+to all known connections on the bus; clients interested in broadcast messages
+need to subscribe to the specific messages they are interested in, though,
+before any broadcast message reaches them.
+
+Messages synthesized and sent directly by the kernel will carry the special
+source id KDBUS_SRC_ID_KERNEL (0).
+
+In addition to the unique uint64_t connection id, established connections can
+request the ownership of well-known names, under which they can be found and
+addressed by other bus clients. A well-known name is associated with one and
+only one connection at a time. See section 8 on name acquisition and the
+name registry, and the validity of names.
+
+Messages can specify the special destination id 0 and carry a well-known name
+in the message data. Such a message is delivered to the destination connection
+which owns that well-known name.
+
+ +-------------------------------------------------------------------------+
+ | +---------------+     +---------------------------+                     |
+ | | Connection    |     | Message                   | -----------------+  |
+ | | :1.22         | --> | src: 22                   |                  |  |
+ | |               |     | dst: 25                   |                  |  |
+ | |               |     |                           |                  |  |
+ | |               |     |                           |                  |  |
+ | |               |     +---------------------------+                  |  |
+ | |               |                                                    |  |
+ | |               | <--------------------------------------+           |  |
+ | +---------------+                                        |           |  |
+ |                                                          |           |  |
+ | +---------------+     +---------------------------+      |           |  |
+ | | Connection    |     | Message                   | -----+           |  |
+ | | :1.25         | --> | src: 25                   |                  |  |
+ | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
+ | |               |     |  (KDBUS_DST_ID_BROADCAST) |              |   |  |
+ | |               |     |                           | ---------+   |   |  |
+ | |               |     +---------------------------+          |   |   |  |
+ | |               |                                            |   |   |  |
+ | |               | <--------------------------------------------------+  |
+ | +---------------+                                            |   |      |
+ |                                                              |   |      |
+ | +---------------+     +---------------------------+          |   |      |
+ | | Connection    |     | Message                   | --+      |   |      |
+ | | :1.55         | --> | src: 55                   |   |      |   |      |
+ | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
+ | |               |     |                           |   |      |   |      |
+ | |               |     |                           |   |      |   |      |
+ | |               |     +---------------------------+   |      |   |      |
+ | |               |                                     |      |   |      |
+ | |               | <------------------------------------------+   |      |
+ | +---------------+                                     |          |      |
+ |                                                       |          |      |
+ | +---------------+                                     |          |      |
+ | | Connection    |                                     |          |      |
+ | | :1.81         |                                     |          |      |
+ | | org.foo.bar   |                                     |          |      |
+ | |               |                                     |          |      |
+ | |               |                                     |          |      |
+ | |               | <-----------------------------------+          |      |
+ | |               |                                                |      |
+ | |               | <----------------------------------------------+      |
+ | +---------------+                                                       |
+ +-------------------------------------------------------------------------+
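
The three addressing modes shown in the diagram can be summarized in a few lines of C. This is a sketch, not code from the driver: the EXAMPLE_* names are hypothetical, and only the two special values (0 for lookup by well-known name, all bits set for broadcast) are taken from the text above. The second helper formats the D-Bus style unique name ":1.<id>" used by higher level tools.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_DST_ID_NAME		0ULL	/* destination given by name */
#define EXAMPLE_DST_ID_BROADCAST	(~0ULL)	/* 0xffffffffffffffff */

enum example_dst {
	EXAMPLE_DST_NAME,	/* deliver to the owner of a well-known name */
	EXAMPLE_DST_BROADCAST,	/* deliver to all connections with a match */
	EXAMPLE_DST_UNICAST,	/* deliver to the connection with this id */
};

/* Classify a message by its numerical destination id. */
static enum example_dst example_classify_dst(uint64_t dst_id)
{
	if (dst_id == EXAMPLE_DST_ID_NAME)
		return EXAMPLE_DST_NAME;
	if (dst_id == EXAMPLE_DST_ID_BROADCAST)
		return EXAMPLE_DST_BROADCAST;
	return EXAMPLE_DST_UNICAST;
}

/* Format the user visible representation ":1.<id>" of a connection id. */
static int example_unique_name(char *buf, size_t len, uint64_t id)
{
	return snprintf(buf, len, ":1.%" PRIu64, id);
}
```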
+
+
+6.2 Creating connections
+------------------------
+
+A connection to a bus is created by opening an endpoint device node of
+a bus and becoming an active client with the KDBUS_CMD_HELLO ioctl. Every
+connected client connection has a unique identifier on the bus and can
+address messages to every other connection on the same bus by using
+the peer's connection id as the destination.
+
+The KDBUS_CMD_HELLO ioctl takes the following struct as argument.
+
+struct kdbus_cmd_hello {
+ __u64 size;
+ The overall size of the struct, including all attached items.
+
+ __u64 conn_flags;
+ Flags to apply to this connection:
+
+ KDBUS_HELLO_ACCEPT_FD
+ When this flag is set, the connection can be sent file descriptors
+ as message payload. If it's not set, any attempt to do so will
+ result in -ECOMM on the sender's side.
+
+ KDBUS_HELLO_ACTIVATOR
+ Make this connection an activator (see below). With this bit set,
+ an item of type KDBUS_ITEM_NAME has to be attached which describes
+ the well-known name this connection should be an activator for.
+
+ KDBUS_HELLO_POLICY_HOLDER
+ Make this connection a policy holder (see below). With this bit set,
+ an item of type KDBUS_ITEM_NAME has to be attached which describes
+ the well-known name this connection should hold a policy for.
+
+ KDBUS_HELLO_MONITOR
+ Make this connection an eavesdropping connection that receives all
+ unicast messages sent on the bus. To also receive broadcast messages,
+ the connection has to upload appropriate matches as well.
+ This flag is only valid for privileged bus connections.
+
+ __u64 attach_flags;
+ Request the attachment of metadata for each message received by this
+ connection. The metadata actually attached may go beyond the list of
+ requested items. See section 13 for more details.
+
+ __u64 bus_flags;
+ Upon successful completion of the ioctl, this member will contain the
+ flags of the bus it connected to.
+
+ __u64 id;
+ Upon successful completion of the ioctl, this member will contain the
+ id of the new connection.
+
+ __u64 pool_size;
+ The size of the communication pool, in bytes. The pool can be accessed
+ by calling mmap() on the file descriptor that was used to issue the
+ KDBUS_CMD_HELLO ioctl.
+
+ struct kdbus_bloom_parameter bloom;
+ Bloom filter parameter (see below).
+
+ __u8 id128[16];
+ Upon successful completion of the ioctl, this member will contain the
+ 128 bit wide UUID of the connected bus.
+
+ struct kdbus_item items[0];
+ Variable list of items to add optional additional information. The
+ following items are currently expected/valid:
+
+ KDBUS_ITEM_CONN_NAME
+ Contains a string that describes this connection's name, so it can be
+ identified later.
+
+ KDBUS_ITEM_NAME
+ KDBUS_ITEM_POLICY_ACCESS
+ For activators and policy holders only, combinations of these two
+ items describe policy access entries (see section about policy db).
+
+ KDBUS_ITEM_CREDS
+ KDBUS_ITEM_SECLABEL
+ Privileged bus users may submit these types in order to create
+ connections with faked credentials. The only real use case for this
+ is a proxy service which acts on behalf of some other tasks. For a
+ connection that runs in that mode, the message's metadata items will
+ be limited to what's specified here. See section 13 for more
+ information.
+
+ Items of other types are silently ignored.
+};
+
+
+6.3 Activator and policy holder connection
+------------------------------------------
+
+An activator connection is a placeholder for a well-known name. Messages sent
+to such a connection can be used by userspace to start an implementor
+connection, which will then get all the messages from the activator copied
+over. An activator connection cannot be used to send any message.
+
+A policy holder connection only installs a policy for one or more names.
+These policy entries are kept active as long as the connection is alive, and
+are removed once it terminates. Such a policy connection type can be used to
+deploy restrictions for names that are not yet active on the bus. A policy
+holder connection cannot be used to send any message.
+
+The creation of activator, policy holder or monitor connections is an operation
+restricted to privileged users on the bus (see section "Terminology").
+
+
+6.4 Retrieving information on a connection
+------------------------------------------
+
+The KDBUS_CMD_CONN_INFO ioctl can be used to retrieve credentials and
+properties of the initial creator of a connection. This ioctl uses the
+following struct:
+
+struct kdbus_cmd_info {
+ __u64 size;
+ The overall size of the struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+ Specify which items should be attached to the answer.
+ The following flags can be used:
+
+ KDBUS_ATTACH_NAMES
+ Add an item to the answer containing all the names the connection
+ currently owns.
+
+ KDBUS_ATTACH_CONN_NAME
+ Add an item to the answer containing the connection's name.
+
+ After the ioctl returns, this field will contain the current metadata
+ attach flags of the connection.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ __u64 id;
+ The connection's numerical ID to retrieve information for. If set to a
+ non-zero value, a well-known name passed in 'items' is ignored.
+
+ __u64 offset;
+ When the ioctl returns, this value will yield the offset of the connection
+ information inside the caller's pool.
+
+ struct kdbus_item items[0];
+ The optional item list, containing the well-known name to look up as
+ a KDBUS_ITEM_NAME. Only required if the 'id' field is set to 0.
+ All other items are currently ignored.
+};
+
+After the ioctl returns, the following struct will be stored in the caller's
+pool at 'offset'.
+
+struct kdbus_info {
+ __u64 size;
+ The overall size of the struct, including all its items.
+
+ __u64 id;
+ The connection's unique ID.
+
+ __u64 flags;
+ The connection's flags as specified when it was created.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Depending on the 'flags' field in struct kdbus_cmd_info, items of
+ types KDBUS_ITEM_NAME and KDBUS_ITEM_CONN_NAME follow here.
+};
+
+Once the caller is finished with parsing the return buffer, it needs to call
+KDBUS_CMD_FREE for the offset.
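
Since the kernel only reports an offset, the caller has to turn it into a pointer inside its own read-only mapping of the pool. A defensive way to do that is sketched below; the helper name is hypothetical, and 'pool' and 'pool_size' are assumed to be the address returned by mmap() and the size requested in the HELLO call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to 'len' bytes at 'offset' inside the mapped pool, or
 * NULL if that range does not lie completely within the pool.
 */
static const void *example_pool_at(const void *pool, uint64_t pool_size,
				   uint64_t offset, uint64_t len)
{
	/* Written this way to avoid overflow in offset + len. */
	if (offset > pool_size || len > pool_size - offset)
		return NULL;

	return (const uint8_t *)pool + offset;
}
```

A caller would first request sizeof(struct kdbus_info) bytes at the returned offset, read the 'size' member, and then check the full size the same way before walking the items.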
+
+
+6.5 Getting information about a connection's bus creator
+--------------------------------------------------------
+
+The KDBUS_CMD_BUS_CREATOR_INFO ioctl takes the same struct as
+KDBUS_CMD_CONN_INFO but is used to retrieve information about the creator of
+the bus the connection is attached to. The metadata returned by this call is
+collected during the creation of the bus and is never altered afterwards, so
+it provides pristine information on the task that created the bus, at the
+moment when it did so.
+
+In response to this call, a slice in the connection's pool is allocated and
+filled with an object of type struct kdbus_info, pointed to by the ioctl's
+'offset' field.
+
+struct kdbus_info {
+ __u64 size;
+ The overall size of the struct, including all its items.
+
+ __u64 id;
+ The bus' ID.
+
+ __u64 flags;
+ The bus' flags as specified when it was created.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Metadata information is stored in items here.
+};
+
+Once the caller is finished with parsing the return buffer, it needs to call
+KDBUS_CMD_FREE for the offset.
+
+
+6.6 Updating connection details
+-------------------------------
+
+Some of a connection's details can be updated with the KDBUS_CMD_CONN_UPDATE
+ioctl, using the file descriptor that was used to create the connection.
+The update command uses the following struct.
+
+struct kdbus_cmd_update {
+ __u64 size;
+ The overall size of the struct, including all its items.
+
+ struct kdbus_item items[0];
+ Items to describe the connection details to be updated. The following item
+ types are supported:
+
+ KDBUS_ITEM_ATTACH_FLAGS
+ Supply a new set of items to be attached to each message.
+
+ KDBUS_ITEM_NAME
+ KDBUS_ITEM_POLICY_ACCESS
+ Policy holder connections may supply a new set of policy information
+ with these items. For other connection types, -EOPNOTSUPP is returned.
+};
+
+
+6.7 Termination
+---------------
+
+A connection can be terminated by simply closing the file descriptor that was
+used to start the connection. All pending incoming messages will be discarded,
+and the memory in the pool will be freed.
+
+An alternative way of closing down a connection is calling the
+KDBUS_CMD_BYEBYE ioctl on it, which will only succeed if the message queue
+of the connection is empty at the time of closing; otherwise, -EBUSY is
+returned.
+
+When this ioctl returns successfully, the connection has been terminated and
+won't accept any new messages from remote peers. This way, a connection can
+be terminated race-free, without losing any messages.
+
+
+7. Messages
+===============================================================================
+
+Messages consist of a fixed-size header followed directly by a list of
+variable-sized data 'items'. The overall message size is specified in the
+header of the message. The chain of data items can contain well-defined
+message metadata fields, raw data, references to data, or file descriptors.
+
+
+7.1 Sending messages
+--------------------
+
+Messages are passed to the kernel with the KDBUS_CMD_MSG_SEND ioctl. Depending
+on the destination address of the message, the kernel delivers the message
+to the specific destination connection or to all connections on the same bus.
+Sending messages across buses is not possible. Messages are always queued in
+the memory pool of the destination connection (see below).
+
+The KDBUS_CMD_MSG_SEND ioctl uses struct kdbus_msg to describe the message to
+be sent.
+
+struct kdbus_msg {
+ __u64 size;
+ The overall size of the struct, including the attached items.
+
+ __u64 flags;
+ Flags for message delivery:
+
+ KDBUS_MSG_FLAGS_EXPECT_REPLY
+ Expect a reply from the remote peer to this message. With this bit set,
+ the timeout_ns field must be set to a non-zero number of nanoseconds in
+ which the receiving peer is expected to reply. If such a reply is not
+ received in time, the sender will be notified with a timeout message
+ (see below). The value must be an absolute value, in nanoseconds and
+ based on CLOCK_MONOTONIC.
+
+ For a message to be accepted as reply, it must be a direct message to
+ the original sender (not a broadcast), and its kdbus_msg.cookie_reply
+ must match the previous message's kdbus_msg.cookie.
+
+ Expected replies also temporarily open the policy of the sending
+ connection, so the other peer is allowed to respond within the given
+ time window.
+
+ KDBUS_MSG_FLAGS_SYNC_REPLY
+ By default, all calls to kdbus are considered asynchronous,
+ non-blocking. However, as there are many use cases that need to wait
+ for a remote peer to answer a method call, there's a way to send a
+ message and wait for a reply in a synchronous fashion. This is what
+ the KDBUS_MSG_FLAGS_SYNC_REPLY flag controls. The KDBUS_CMD_MSG_SEND ioctl
+ will block until the reply has arrived, the timeout limit is reached,
+ the remote connection is shut down, or the call is interrupted by
+ a signal before any reply arrives; see signal(7).
+
+ The offset of the reply message in the sender's pool is stored in
+ 'offset_reply' when the ioctl has returned without error. Hence,
+ there is no need for another KDBUS_CMD_MSG_RECV ioctl or anything else
+ to receive the reply.
+
+ KDBUS_MSG_FLAGS_NO_AUTO_START
+ By default, when a message is sent to an activator connection, the
+ activator is notified and will start an implementor. This flag inhibits
+ that behavior. With this bit set, and the remote being an activator,
+ -EADDRNOTAVAIL is returned from the ioctl.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call of
+ KDBUS_CMD_MSG_SEND.
+
+ __s64 priority;
+ The priority of this message. Receiving messages (see below) may
+ optionally be constrained to messages of a minimal priority. This
+ allows for use cases where timing critical data is interleaved with
+ control data on the same connection. If unused, the priority should be
+ set to zero.
+
+ __u64 dst_id;
+ The numeric ID of the destination connection, or KDBUS_DST_ID_BROADCAST
+ (~0ULL) to address every peer on the bus, or KDBUS_DST_ID_NAME (0) to look
+ it up dynamically from the bus' name registry. In the latter case, an item
+ of type KDBUS_ITEM_DST_NAME is mandatory.
+
+ __u64 src_id;
+ Upon return of the ioctl, this member will contain the sending
+ connection's numerical ID. Should be 0 at send time.
+
+ __u64 payload_type;
+ Type of the payload in the actual data records. Currently, only
+ KDBUS_PAYLOAD_DBUS is accepted as input value of this field. When
+ receiving messages that are generated by the kernel (notifications),
+ this field will yield KDBUS_PAYLOAD_KERNEL.
+
+ __u64 cookie;
+ Cookie of this message, for later recognition. Also, when replying
+ to a message (see above), the cookie_reply field must match this value.
+
+ __u64 timeout_ns;
+ If the message sent requires a reply from the remote peer (see above),
+ this field contains the timeout in absolute nanoseconds based on
+ CLOCK_MONOTONIC.
+
+ __u64 cookie_reply;
+ If the message sent is a reply to another message, this field must
+ match the cookie of the formerly received message.
+
+ __u64 offset_reply;
+ If the message successfully got a synchronous reply (see above), this
+ field will yield the offset of the reply message in the sender's pool.
+ This is what KDBUS_CMD_MSG_RECV usually does for asynchronous messages.
+
+ struct kdbus_item items[0];
+ A dynamically sized list of items to contain additional information.
+ The following items are expected/valid:
+
+ KDBUS_ITEM_PAYLOAD_VEC
+ KDBUS_ITEM_PAYLOAD_MEMFD
+ KDBUS_ITEM_FDS
+ Actual data records containing the payload. See section "Passing of
+ Payload Data".
+
+ KDBUS_ITEM_BLOOM_FILTER
+ Bloom filter for matches (see below).
+
+ KDBUS_ITEM_DST_NAME
+ Well-known name to send this message to. Required if dst_id is set
+ to KDBUS_DST_ID_NAME. If a connection holding the given name can't
+ be found, -ESRCH is returned.
+ For messages to a unique name (ID), this item is optional. If present,
+ the kernel will make sure the name owner matches the given unique name.
+ This allows userspace to tie the message sending to the condition that a
+ name is currently owned by a certain unique name.
+};
+
+The message will be augmented by the requested metadata items when queued into
+the receiver's pool. See also section 13.1 ("Metadata and namespaces").
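+
+The timeout_ns field described above takes an absolute CLOCK_MONOTONIC value
+rather than a relative duration. A minimal sketch of how a caller might derive
+such a value is shown below. The helper name deadline_from_now_ns() is made up
+for illustration and is not part of the kdbus API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not part of kdbus): turn a relative timeout into an
 * absolute CLOCK_MONOTONIC timestamp in nanoseconds, which is the form the
 * text above describes for kdbus_msg.timeout_ns.
 */
static uint64_t deadline_from_now_ns(uint64_t relative_ns)
{
	struct timespec now;
	int rc = clock_gettime(CLOCK_MONOTONIC, &now);

	/* CLOCK_MONOTONIC is expected to be available on Linux. */
	assert(rc == 0);
	(void)rc;

	return (uint64_t)now.tv_sec * 1000000000ULL +
	       (uint64_t)now.tv_nsec + relative_ns;
}
```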
+
+
+7.2 Message layout
+------------------
+
+The layout of a message is shown below.
+
+ +-------------------------------------------------------------------------+
+ | Message |
+ | +---------------------------------------------------------------------+ |
+ | | Header | |
+ | | size: overall message size, including the data records | |
+ | | destination: connection id of the receiver | |
+ | | source: connection id of the sender (set by kernel) | |
+ | | payload_type: "DBusDBus" textual identifier stored as uint64_t | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | Data Record | |
+ | | size: overall record size (without padding) | |
+ | | type: type of data | |
+ | | data: reference to data (address or file descriptor) | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | padding bytes to the next 8 byte alignment | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | Data Record | |
+ | | size: overall record size (without padding) | |
+ | | ... | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | padding bytes to the next 8 byte alignment | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | Data Record | |
+ | | size: overall record size | |
+ | | ... | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | padding bytes to the next 8 byte alignment | |
+ | +---------------------------------------------------------------------+ |
+ +-------------------------------------------------------------------------+
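+
+The padding rule in the layout above implies that a reader steps from one
+data record to the next by rounding each record's size up to a multiple of 8.
+The following sketch walks such a buffer. struct record_hdr is a simplified
+stand-in invented for this example, not the kernel's struct kdbus_item, and
+count_records() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a data record header (assumption, see text). */
struct record_hdr {
	uint64_t size;	/* overall record size, without trailing padding */
	uint64_t type;	/* type of data */
};

/* Round up to the next multiple of 8. */
static uint64_t align8(uint64_t v)
{
	return (v + 7) & ~(uint64_t)7;
}

/* Count the records in 'len' bytes at 'buf'; returns -1 if malformed. */
static int count_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);

	while (off < len) {
		struct record_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* A record must hold its header and fit into the buffer. */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;

		n++;
		/* Skip the record plus its padding to the next 8 bytes. */
		off += (size_t)align8(hdr.size);
	}

	return n;
}
```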
+
+
+7.3 Passing of Payload Data
+---------------------------
+
+When connecting to the bus, receivers request a memory pool of a given size,
+large enough to carry all backlog of data enqueued for the connection. The
+pool is internally backed by a shared memory file which can be mmap()ed by
+the receiver.
+
+KDBUS_ITEM_PAYLOAD_VEC:
+ Messages are directly copied by the sending process into the receiver's pool.
+ That way, two peers can exchange data with effectively a single copy from
+ one process to another; the kernel will not buffer the data anywhere else.
+
+KDBUS_ITEM_PAYLOAD_MEMFD:
+ Messages can reference memfd files which contain the data.
+ memfd files are tmpfs-backed files that allow sealing of the content of the
+ file, which prevents all writable access to the file content.
+ Only sealed memfd files are accepted as payload data, which enforces
+ reliable passing of data; the receiver can assume that neither the sender nor
+ anyone else can alter the content after the message is sent.
+
+Apart from the sender filling in the content of the memfd files, the data will
+be passed as zero-copy from one process to another, read-only, shared between
+the peers.
+
+
+7.4 Receiving messages
+----------------------
+
+Messages are received by the client with the KDBUS_CMD_MSG_RECV ioctl. The
+endpoint device node of the bus supports poll() to wake up the receiving
+process when new messages are queued up to be received.
+
+With the KDBUS_CMD_MSG_RECV ioctl, a struct kdbus_cmd_recv is used.
+
+struct kdbus_cmd_recv {
+ __u64 flags;
+ Flags to control the receive command.
+
+ KDBUS_RECV_PEEK
+ Just return the location of the next message. Do not install file
+ descriptors or anything else. This is usually used to determine the
+ sender of the next queued message.
+
+ KDBUS_RECV_DROP
+ Drop the next message without doing anything else with it, and free the
+ pool slice. This is a short-cut for KDBUS_RECV_PEEK and KDBUS_CMD_FREE.
+
+ KDBUS_RECV_USE_PRIORITY
+ Use the priority field (see below).
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ __s64 priority;
+ With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
+ the queue with at least the given priority. If no such message is waiting
+ in the queue, -ENOMSG is returned.
+
+ __u64 offset;
+ Upon return of the ioctl, this field contains the offset in the
+ receiver's memory pool.
+};
+
+Unless KDBUS_RECV_DROP was passed, and given that the ioctl succeeded, the
+offset field contains the location of the new message inside the receiver's
+pool. The message is stored as struct kdbus_msg at this offset, and can be
+interpreted with the semantics described above.
+
+Also, if the connection allowed for file descriptors to be passed
+(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
+installed into the receiving process after the KDBUS_CMD_MSG_RECV ioctl
+returns. The receiving task is obliged to close all of them appropriately.
+
+The caller is obliged to call KDBUS_CMD_FREE with the returned offset when
+the memory is no longer needed.
+
+
+7.5 Canceling messages synchronously waiting for replies
+--------------------------------------------------------
+
+When a connection sends a message with KDBUS_MSG_FLAGS_SYNC_REPLY and
+blocks while waiting for the reply, the KDBUS_CMD_MSG_CANCEL ioctl can be
+used on the same file descriptor to cancel the message, based on its cookie.
+If there are multiple messages with the same cookie that are all synchronously
+waiting for a reply, all of them will be canceled. Obviously, this is only
+possible in multi-threaded applications.
+
+
+8. Name registry
+===============================================================================
+
+Each bus instantiates a name registry to resolve well-known names into unique
+connection IDs for message delivery. The registry will be queried when a
+message is sent with kdbus_msg.dst_id set to KDBUS_DST_ID_NAME, or when a
+registry dump is requested.
+
+All of the below is subject to policy rules for SEE and OWN permissions.
+
+
+8.1 Name validity
+-----------------
+
+A name has to comply with the following rules to be considered valid:
+
+ - The name has two or more elements separated by a period ('.') character
+ - All elements must contain at least one character
+ - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_"
+ and must not begin with a digit
+ - The name must contain at least one '.' (period) character
+ (and thus at least two elements)
+ - The name must not begin with a '.' (period) character
+ - The name must not exceed KDBUS_NAME_MAX_LEN (255)
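+
+The rules above can be sketched as a small userspace check. This is an
+illustration only and not the kernel's validation code. name_is_valid() and
+NAME_MAX_LEN are names chosen for this example, and the sketch assumes that
+the length limit counts the characters of the name without its 0-byte
+terminator and that a trailing '.' counts as an empty element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 255 /* value given for KDBUS_NAME_MAX_LEN above */

/* Letters and '_' are always allowed; digits only after the first char. */
static bool is_name_char(char c, bool first)
{
	if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
		return true;
	return !first && c >= '0' && c <= '9';
}

static bool name_is_valid(const char *name)
{
	size_t len, i, elements = 0;
	bool at_start = true; /* the next character begins a new element */

	assert(name != NULL);

	len = strlen(name);
	if (len == 0 || len > NAME_MAX_LEN)
		return false;

	for (i = 0; i < len; i++) {
		if (name[i] == '.') {
			if (at_start) /* leading '.' or empty element */
				return false;
			at_start = true;
			continue;
		}
		if (!is_name_char(name[i], at_start))
			return false;
		if (at_start)
			elements++;
		at_start = false;
	}

	/* Reject a trailing '.' and names with fewer than two elements. */
	return !at_start && elements >= 2;
}
```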
+
+
+8.2 Acquiring a name
+--------------------
+
+To acquire a name, a client uses the KDBUS_CMD_NAME_ACQUIRE ioctl with the
+following data structure.
+
+struct kdbus_cmd_name {
+ __u64 size;
+ The overall size of this struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+ Flags to control details in the name acquisition.
+
+ KDBUS_NAME_REPLACE_EXISTING
+ Acquiring a name that is already present usually fails, unless this flag
+ is set in the call, and KDBUS_NAME_ALLOW_REPLACEMENT (see below) was
+ set when the current owner of the name acquired it, or if the current
+ owner is an activator connection (see below).
+
+ KDBUS_NAME_ALLOW_REPLACEMENT
+ Allow other connections to take over this name. When this happens, the
+ former owner of the name will be notified of the name loss.
+
+ KDBUS_NAME_QUEUE (acquire)
+ A name that is already acquired by a connection, and which wasn't
+ requested with the KDBUS_NAME_ALLOW_REPLACEMENT flag set, cannot be
+ acquired again. However, a connection can put itself in a queue of
+ connections waiting for the name to be released. Once that happens, the
+ first connection in that queue becomes the new owner and is notified
+ accordingly.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Items to submit the name. Currently, only one item of type KDBUS_ITEM_NAME
+ is expected and allowed, and the contained string must be a valid bus name.
+};
+
+
+8.3 Releasing a name
+--------------------
+
+A connection may release a name explicitly with the KDBUS_CMD_NAME_RELEASE
+ioctl. If the connection was an implementor of an activatable name, its
+pending messages are moved back to the activator. If there are any connections
+queued up as waiters for the name, the oldest one of them will become the new
+owner. The same happens implicitly for all names once a connection terminates.
+
+The KDBUS_CMD_NAME_RELEASE ioctl uses the same data structure as the
+acquisition call, but with slightly different field usage.
+
+struct kdbus_cmd_name {
+ __u64 size;
+ The overall size of this struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+
+ struct kdbus_item items[0];
+ Items to submit the name. Currently, only one item of type KDBUS_ITEM_NAME
+ is expected and allowed, and the contained string must be a valid bus name.
+};
+
+
+8.4 Dumping the name registry
+-----------------------------
+
+A connection may request a complete or filtered dump of currently active bus
+names with the KDBUS_CMD_NAME_LIST ioctl, which takes a struct
+kdbus_cmd_name_list as argument.
+
+struct kdbus_cmd_name_list {
+ __u64 flags;
+ Any combination of flags to specify which names should be dumped.
+
+ KDBUS_NAME_LIST_UNIQUE
+ List the unique (numeric) IDs of the connection, whether it owns a name
+ or not.
+
+ KDBUS_NAME_LIST_NAMES
+ List well-known names stored in the database which are actively owned by
+ a real connection (not an activator).
+
+ KDBUS_NAME_LIST_ACTIVATORS
+ List names that are owned by an activator.
+
+ KDBUS_NAME_LIST_QUEUED
+ List connections that are not yet owning a name but are waiting for it
+ to become available.
+
+ __u64 offset;
+ When the ioctl returns successfully, the offset to the name registry dump
+ inside the connection's pool will be stored in this field.
+};
+
+The returned list of names is stored in a struct kdbus_name_list that in turn
+contains a dynamic number of struct kdbus_name_info that carry the actual
+information. The fields inside struct kdbus_name_info are described next.
+
+struct kdbus_name_info {
+ __u64 size;
+ The overall size of this struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+ The current flags for this name. Can be any combination of
+
+ KDBUS_NAME_ALLOW_REPLACEMENT
+
+ KDBUS_NAME_IN_QUEUE (list)
+ When retrieving a list of currently acquired names in the registry, this
+ flag indicates whether the connection actually owns the name or is
+ currently waiting for it to become available.
+
+ KDBUS_NAME_ACTIVATOR (list)
+ An activator connection owns a name as a placeholder for an implementor,
+ which is started on demand as soon as the first message arrives. There's
+ some more information on this topic below. In contrast to
+ KDBUS_NAME_REPLACE_EXISTING, when a name is taken over from an activator
+ connection, all the messages that have been queued in the activator
+ connection will be moved over to the new owner. The activator connection
+ will still be tracked for the name and will take control again if the
+ implementor connection terminates.
+ This flag cannot be used when acquiring a name, but is implicitly set
+ through KDBUS_CMD_HELLO with KDBUS_HELLO_ACTIVATOR set in
+ kdbus_cmd_hello.conn_flags.
+
+ __u64 owner_id;
+ The owning connection's unique ID.
+
+ __u64 conn_flags;
+ The flags of the owning connection.
+
+ struct kdbus_item items[0];
+ Items containing the actual name. Currently, only one item of type
+ KDBUS_ITEM_NAME will be attached.
+};
+
+The returned buffer must be freed with the KDBUS_CMD_FREE ioctl when the user
+is finished with it.
+
+
+9. Notifications
+===============================================================================
+
+The kernel will notify its users of the following events.
+
+ * When connection A is terminated while connection B is waiting for a reply
+ from it, connection B is notified with a message with an item of type
+ KDBUS_ITEM_REPLY_DEAD.
+
+ * When connection A does not receive a reply from connection B within the
+ specified timeout window, connection A will receive a message with an item
+ of type KDBUS_ITEM_REPLY_TIMEOUT.
+
+ * When a connection is created on or removed from a bus, messages with an
+ item of type KDBUS_ITEM_ID_ADD or KDBUS_ITEM_ID_REMOVE, respectively, are
+ sent to all bus members that match these messages through their match
+ database.
+
+ * When a connection owns or loses a name, or a name is moved from one
+ connection to another, messages with an item of type KDBUS_ITEM_NAME_ADD,
+ KDBUS_ITEM_NAME_REMOVE or KDBUS_ITEM_NAME_CHANGE are sent to all bus
+ members that match these messages through their match database.
+
+A kernel notification is a regular kdbus message with the following details.
+
+ * kdbus_msg.src_id == KDBUS_SRC_ID_KERNEL
+ * kdbus_msg.dst_id == KDBUS_DST_ID_BROADCAST
+ * kdbus_msg.payload_type == KDBUS_PAYLOAD_KERNEL
+ * Has exactly one of the aforementioned items attached
+
+
+10. Message Matching, Bloom filters
+===============================================================================
+
+10.1 Matches for broadcast messages from other connections
+----------------------------------------------------------
+
+A message addressed at the connection ID KDBUS_DST_ID_BROADCAST (~0ULL) is a
+broadcast message, delivered to all connected peers which installed a rule to
+match certain properties of the message. Without any rules installed in the
+connection, no broadcast message or kernel-side notifications will be delivered
+to the connection. Broadcast messages are subject to policy rules and TALK
+access checks.
+
+See section 11 for details on policies, and section 11.5 for more
+details on implicit policies.
+
+Matches for messages from other connections (not kernel notifications) are
+implemented as bloom filters. The sender adds certain properties of the message
+as elements to a bloom filter bit field, and sends that along with the
+broadcast message.
+
+The connection adds the message properties it is interested in as elements to a
+bloom mask bit field, and uploads the mask to the match rules of the
+connection.
+
+The kernel will match the broadcast message's bloom filter against the
+connection's bloom mask (simply by &-ing it), and decide whether the message
+should be delivered to the connection.
+
+The kernel has no notion of any specific properties of the message; all it
+sees are the bit fields of the bloom filter and mask to match against. The
+use of bloom filters allows simple and efficient matching, without exposing
+any message properties or internals to the kernel side. Clients need to deal
+with the fact that they might receive broadcasts which they did not subscribe
+to, as the bloom filter might allow false-positives to pass the filter.
+
+To allow the future extension of the set of elements in the bloom filter, the
+filter specifies a "generation" number. A later generation must always contain
+all elements of the set of the previous generation, but can add new elements
+to the set. The match rules mask can carry an array with all previous
+generations of masks individually stored. When the filter and mask are matched
+by the kernel, the mask with the closest matching "generation" is selected
+as the index into the mask array.
+
+
+10.2 Matches for kernel notifications
+-------------------------------------
+
+To receive kernel generated notifications (see section 9), a connection must
+install special match rules that are different from the bloom filter matches
+described in the section above. They can be filtered by a sender connection's
+ID, by one of the names the sender connection owns at the time of sending the
+message, or by type of the notification (id/name add/remove/change).
+
+
+10.3 Adding a match
+-------------------
+
+To add a match, the KDBUS_CMD_MATCH_ADD ioctl is used, which takes the struct
+described below.
+
+Note that each of the items attached to this command will internally create
+one match 'rule', and the collection of them, which is submitted as one block
+via the ioctl, is called a 'match'. To allow a message to pass, all rules of a
+match have to be satisfied. Hence, adding more items to the command will only
+narrow the possibility of a match to effectively let the message pass, and will
+make it less likely that the connection's user space process is woken up.
+
+Multiple matches can be installed per connection. As long as one of them has a
+set of rules which allows the message to pass, this one will be decisive.
+
+struct kdbus_cmd_match {
+ __u64 size;
+ The overall size of the struct, including its items.
+
+ __u64 cookie;
+ A cookie which identifies the match, so it can be referred to at removal
+ time.
+
+ __u64 flags;
+ Flags to control the behavior of the ioctl.
+
+ KDBUS_MATCH_REPLACE:
+ Remove all entries with the given cookie before installing the new one.
+ This allows for race-free replacement of matches.
+
+ struct kdbus_item items[0];
+ Items to define the actual rules of the matches. The following item types
+ are expected. Each item will cause one new match rule to be created.
+
+ KDBUS_ITEM_BLOOM_MASK
+ An item that carries the bloom filter mask to match against in its
+ data field. The payload size must match the bloom filter size that
+ was specified when the bus was created.
+ See section 10.4 for more information.
+
+ KDBUS_ITEM_NAME
+ Specify a name that a sending connection must own at the time of sending
+ a broadcast message in order to match this rule.
+
+ KDBUS_ITEM_ID
+ Specify a sender connection's ID that will match this rule.
+
+ KDBUS_ITEM_NAME_ADD
+ KDBUS_ITEM_NAME_REMOVE
+ KDBUS_ITEM_NAME_CHANGE
+ These items request delivery of broadcast messages that describe a name
+ acquisition, loss, or change. The details are stored in the item's
+ kdbus_notify_name_change member. All information specified must be
+ matched in order to make the message pass. Use KDBUS_MATCH_ID_ANY to
+ match against any unique connection ID.
+
+ KDBUS_ITEM_ID_ADD
+ KDBUS_ITEM_ID_REMOVE
+ These items request delivery of broadcast messages that are generated
+ when a connection is created or terminated. struct kdbus_notify_id_change
+ is used to store the actual match information. This item can be used to
+ monitor one particular connection ID, or, when the id field is set to
+ KDBUS_MATCH_ID_ANY, all of them.
+
+ Other item types are ignored.
+};
+
+
+10.4 Bloom filters
+------------------
+
+Bloom filters allow checking whether a given word is present in a dictionary.
+This allows a connection to set up a mask for the information it is interested
+in, and to be delivered only broadcast messages that have a matching filter.
+
+For general information on bloom filters, see
+
+ https://en.wikipedia.org/wiki/Bloom_filter
+
+The size of the bloom filter is defined per bus when it is created, in
+kdbus_bloom_parameter.size. All bloom filters attached to broadcast messages
+on the bus must match this size, and all bloom filter matches uploaded by
+connections must also match the size, or a multiple thereof (see below).
+
+The calculation of the mask has to be done on the userspace side. The kernel
+just checks the bitmasks to decide whether or not to let the message pass. All
+bits in the mask must match the filter in a bit-wise AND logic, but the
+mask may have more bits set than the filter. Consequently, false positive
+matches are expected to happen, and userspace must deal with that fact.
+
+Masks are entities that are always passed to the kernel as part of a match
+(with an item of type KDBUS_ITEM_BLOOM_MASK), and filters can be attached to
+broadcast messages (with an item of type KDBUS_ITEM_BLOOM_FILTER).
+
+For a broadcast to match, all set bits in the filter have to be set in the
+installed match mask as well. For example, consider a bus has a bloom size
+of 8 bytes, and the following mask/filter combinations:
+
+ filter 0x0101010101010101
+ mask 0x0101010101010101
+ -> matches
+
+ filter 0x0303030303030303
+ mask 0x0101010101010101
+ -> doesn't match
+
+ filter 0x0101010101010101
+ mask 0x0303030303030303
+ -> matches
+
+Hence, in order to catch all messages, a mask filled with 0xff bytes can be
+installed as a wildcard match rule.
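+
+The three examples above can be reproduced with a few lines of C. The sketch
+below follows the rule exactly as it is worded in this section (every bit set
+in the filter must also be set in the mask); it is not taken from the kernel
+sources, and bloom_filter_passes() is a name made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration of the rule as stated in this section: the message passes
 * if all bits set in 'filter' are also set in 'mask'. Both arrays hold
 * 'n' 64-bit words, i.e. a bloom size of n * 8 bytes.
 */
static bool bloom_filter_passes(const uint64_t *filter,
				const uint64_t *mask, size_t n)
{
	size_t i;

	assert(filter != NULL && mask != NULL);

	for (i = 0; i < n; i++)
		if ((filter[i] & mask[i]) != filter[i])
			return false;

	return true;
}
```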
+
+Uploaded matches may contain multiple masks, each of which has the size of the
+bloom size defined by the bus. Each block of a mask is called a 'generation',
+starting at index 0.
+
+At match time, when a broadcast message is about to be delivered, a bloom
+mask generation is passed, which denotes which of the bloom masks the filter
+should be matched against. This allows userspace to provide backward compatible
+masks at upload time, while older clients can still match against older
+versions of filters.
+
+
+10.5 Removing a match
+---------------------
+
+Matches can be removed through the KDBUS_CMD_MATCH_REMOVE ioctl, which again
+takes struct kdbus_cmd_match as argument, but its fields are used slightly
+differently.
+
+struct kdbus_cmd_match {
+ __u64 size;
+ The overall size of the struct. As it has no items in this use case, the
+ value should yield 32 (the four __u64 fields listed here).
+
+ __u64 cookie;
+ The cookie of the match, as it was passed when the match was added.
+ All matches that have this cookie will be removed.
+
+ __u64 flags;
+ Unused for this use case.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Unused for this use case.
+};
+
+
+11. Policy
+===============================================================================
+
+A policy database restricts the possibilities of connections to own, see and
+talk to well-known names. It can be associated with a bus (through a policy
+holder connection) or a custom endpoint.
+
+See section 8.1 for more details on the validity of well-known names.
+
+Default endpoints of buses always have a policy database. The default
+policy is to deny all operations except for operations that are covered by
+implicit policies. Custom endpoints always have a policy, and by default,
+a policy database is empty. Therefore, unless policy rules are added, all
+operations will also be denied by default.
+
+See section 11.5 for more details on implicit policies.
+
+A set of policy rules is described by a name and multiple access rules, defined
+by the following struct.
+
+struct kdbus_policy_access {
+ __u64 type; /* USER, GROUP, WORLD */
+ One of the following.
+
+ KDBUS_POLICY_ACCESS_USER
+ Grant access to a user with the uid stored in the 'id' field.
+
+ KDBUS_POLICY_ACCESS_GROUP
+ Grant access to users in the group with the gid stored in the 'id' field.
+
+ KDBUS_POLICY_ACCESS_WORLD
+ Grant access to everyone. The 'id' field is ignored.
+
+ __u64 access; /* OWN, TALK, SEE */
+ The access to grant.
+
+ KDBUS_POLICY_SEE
+ Allow the name to be seen.
+
+ KDBUS_POLICY_TALK
+ Allow the name to be talked to.
+
+ KDBUS_POLICY_OWN
+ Allow the name to be owned.
+
+ __u64 id;
+ For KDBUS_POLICY_ACCESS_USER, stores the uid.
+ For KDBUS_POLICY_ACCESS_GROUP, stores the gid.
+};
+
+Policies are set through KDBUS_CMD_HELLO (when creating a policy holder
+connection), KDBUS_CMD_CONN_UPDATE (when updating a policy holder connection),
+KDBUS_CMD_ENDPOINT_MAKE (creating a custom endpoint) or
+KDBUS_CMD_ENDPOINT_UPDATE (updating a custom endpoint). In all cases, the name
+and policy access information is stored in items of type KDBUS_ITEM_NAME and
+KDBUS_ITEM_POLICY_ACCESS. For this transport, the following rules apply.
+
+ * An item of type KDBUS_ITEM_NAME must be followed by at least one
+ KDBUS_ITEM_POLICY_ACCESS item
+ * An item of type KDBUS_ITEM_NAME can be followed by an arbitrary number of
+ KDBUS_ITEM_POLICY_ACCESS items
+ * An arbitrary number of groups of names and access levels can be passed
+
+uids and gids are internally always stored in the kernel's view of global ids,
+and are translated back and forth on the ioctl level accordingly.
+
+
+11.2 Wildcard names
+-------------------
+
+Policy holder connections may upload names that contain the wildcard suffix
+(".*"). That way, a policy can be uploaded that is effective for every
+well-known name that extends the provided name by exactly one more level.
+
+For example, if an item of a set of uploaded policy rules contains the name
+"foo.bar.*", both "foo.bar.baz" and "foo.bar.bazbaz" are valid, but
+"foo.bar.baz.baz" is not.
+
+This allows connections to take control over multiple names that the policy
+holder doesn't need to know about when uploading the policy.
+
+Such wildcard entries are not allowed for custom endpoints.
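
A minimal userspace sketch of the matching rule described above, assuming
names are plain NUL-terminated strings. The function name is made up for
illustration and this is not the kernel's implementation.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Match a well-known name against a policy name. A policy name ending in
 * ".*" matches names that extend it by exactly one more level; any other
 * policy name must match exactly.
 */
bool sketch_wildcard_match(const char *pattern, const char *name)
{
    size_t plen = strlen(pattern);
    const char *last_dot;
    size_t prefix_len;

    /* Not a wildcard entry: fall back to an exact comparison. */
    if (plen < 3 || strcmp(pattern + plen - 2, ".*") != 0)
        return strcmp(pattern, name) == 0;

    /* Strip the last component of the name. */
    last_dot = strrchr(name, '.');
    if (!last_dot || last_dot[1] == '\0')
        return false;

    /* What remains must equal the pattern without its trailing ".*". */
    prefix_len = (size_t)(last_dot - name);
    return prefix_len == plen - 2 && strncmp(pattern, name, prefix_len) == 0;
}
```

With the pattern "foo.bar.*", this accepts "foo.bar.baz" and
"foo.bar.bazbaz", and rejects both "foo.bar.baz.baz" and "foo.bar".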
+
+
+11.3 Policy example
+-------------------
+
+For example, a set of policy rules may look like this:
+
+ KDBUS_ITEM_NAME: str='org.foo.bar'
+ KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=1000
+ KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, id=1001
+ KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
+ KDBUS_ITEM_NAME: str='org.blah.baz'
+ KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=0
+ KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
+
+That means that 'org.foo.bar' may only be owned by uid 1000, but every user on
+the bus is allowed to see the name. However, only uid 1001 may actually send
+a message to the connection and receive a reply from it.
+
+The second rule allows 'org.blah.baz' to be owned by uid 0 only, but every user
+may talk to it.
+
+
+11.4 TALK access and multiple well-known names per connection
+-------------------------------------------------------------
+
+Note that TALK access is checked against all names of a connection.
+For example, if a connection owns both 'org.foo.bar' and 'org.blah.baz', and
+the policy database allows 'org.blah.baz' to be talked to by WORLD, then this
+permission is also granted to 'org.foo.bar'. That might sound illogical, but
+after all, we allow messages to be directed to either the unique ID or a well-known
+name, and policy is applied to the connection, not the name. In other words,
+the effective TALK policy for a connection is the most permissive of all names
+the connection owns.
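
The "most permissive of all names" rule can be illustrated with the example
database from section 11.3. The types and names below are hypothetical, and
the sketch compares access levels for equality only; whether OWN access also
implies TALK is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_type { SKETCH_USER, SKETCH_GROUP, SKETCH_WORLD };
enum sketch_access { SKETCH_SEE, SKETCH_TALK, SKETCH_OWN };

struct sketch_rule {
    const char *name;
    enum sketch_type type;
    enum sketch_access access;
    uint64_t id;              /* uid or gid, ignored for WORLD */
};

/* The example rules from section 11.3. */
static const struct sketch_rule sketch_db[] = {
    { "org.foo.bar",  SKETCH_USER,  SKETCH_OWN,  1000 },
    { "org.foo.bar",  SKETCH_USER,  SKETCH_TALK, 1001 },
    { "org.foo.bar",  SKETCH_WORLD, SKETCH_SEE,  0 },
    { "org.blah.baz", SKETCH_USER,  SKETCH_OWN,  0 },
    { "org.blah.baz", SKETCH_WORLD, SKETCH_TALK, 0 },
};

static bool sketch_rule_applies(const struct sketch_rule *r,
                                uint64_t uid, uint64_t gid)
{
    switch (r->type) {
    case SKETCH_USER:
        return r->id == uid;
    case SKETCH_GROUP:
        return r->id == gid;
    case SKETCH_WORLD:
        return true;
    }
    return false;
}

/* True if ANY name owned by the destination grants TALK to the sender. */
bool sketch_may_talk(const char *const *owned, size_t n_owned,
                     uint64_t uid, uint64_t gid)
{
    size_t n_rules = sizeof(sketch_db) / sizeof(sketch_db[0]);

    for (size_t i = 0; i < n_owned; i++) {
        for (size_t j = 0; j < n_rules; j++) {
            const struct sketch_rule *r = &sketch_db[j];

            if (strcmp(r->name, owned[i]) == 0 &&
                r->access == SKETCH_TALK &&
                sketch_rule_applies(r, uid, gid))
                return true;
        }
    }
    return false;
}
```

A sender with uid 4242 is refused while the destination owns only
'org.foo.bar', but is accepted as soon as the same connection also owns
'org.blah.baz'.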
+
+If a policy database exists for a bus (because a policy holder created one on
+demand) or for a custom endpoint (which always has one), each one is consulted
+during name registry listing, name owning or message delivery. If either check
+fails, the operation fails with -EPERM.
+
+For best practices, connections that own names with a restricted TALK
+access should not install matches. This avoids cases where the sent
+message may pass the bloom filter due to false-positives and may also
+satisfy the policy rules.
+
+11.5 Implicit policies
+----------------------
+
+Depending on the type of the endpoint, a set of implicit rules might be
+enforced. On default endpoints, the following set is enforced:
+
+ * Privileged connections always override any installed policy. Those
+ connections could easily install their own policies, so there is no
+ reason to enforce installed policies.
+ * Connections can always talk to connections of the same user. This
+ includes broadcast messages.
+ * Connections that own names might send broadcast messages to other
+ connections that belong to a different user, but only if that
+ destination connection does not own any name.
+
+Custom endpoints have stricter policies. The following rules apply:
+
+ * Policy rules are always enforced, even if the connection is a privileged
+ connection.
+ * Policy rules are always enforced for TALK access, even if both ends are
+ running under the same user. This includes broadcast messages.
+ * To restrict the set of names that can be seen, endpoint policies can
+ install "SEE" policies.
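
As a rough sketch, the implicit TALK rules listed above could be combined
into one decision function. The struct, its fields and the function name are
assumptions made for this example; 'policy_allows' stands for the result of
the regular policy database lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the check needs to know about a peer. */
struct sketch_peer {
    uint64_t uid;
    bool privileged;  /* bus creator or CAP_IPC_OWNER, see Terminology */
    bool owns_name;   /* owns at least one well-known name */
};

bool sketch_implicit_allows(bool custom_endpoint,
                            const struct sketch_peer *src,
                            const struct sketch_peer *dst,
                            bool broadcast, bool policy_allows)
{
    /* Custom endpoints: installed policy is always enforced. */
    if (custom_endpoint)
        return policy_allows;

    /* Default endpoint: privileged connections override policy. */
    if (src->privileged)
        return true;

    /* Same user may always talk, including broadcasts. */
    if (src->uid == dst->uid)
        return true;

    /* Name owners may broadcast to foreign connections without names. */
    if (broadcast && src->owns_name && !dst->owns_name)
        return true;

    return policy_allows;
}
```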
+
+
+12. Pool
+===============================================================================
+
+A pool for data received from the kernel is installed for every connection of
+the bus, and is sized according to kdbus_cmd_hello.pool_size. It is accessed
+when one of the following ioctls is issued:
+
+ * KDBUS_CMD_MSG_RECV, to receive a message
+ * KDBUS_CMD_NAME_LIST, to dump the name registry
+ * KDBUS_CMD_CONN_INFO, to retrieve information on a connection
+
+Internally, the pool is organized in slices, stored in an rb-tree. The offsets
+returned by either one of the aforementioned ioctls describe offsets inside the
+pool. In order to make the slice available for subsequent calls, KDBUS_CMD_FREE
+has to be called on the offset.
+
+To access the memory, the caller is expected to mmap() it to its task, like
+this:
+
+ /*
+ * POOL_SIZE has to be a multiple of PAGE_SIZE, and it must match the
+ * value that was previously passed in the .pool_size field of struct
+ * kdbus_cmd_hello.
+ */
+
+ buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_PRIVATE, conn_fd, 0);
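
Two small helpers that might accompany the mapping above: one checks a pool
size before it is passed to KDBUS_CMD_HELLO, the other turns an offset
returned by one of the ioctls into a pointer inside the mapping. Both
function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* A pool size must be non-zero and a multiple of the page size. */
bool sketch_pool_size_valid(uint64_t pool_size)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return false;
    return pool_size != 0 && pool_size % (uint64_t)page == 0;
}

/*
 * Translate a pool offset into a pointer inside the mapped pool.
 * Returns NULL if the offset lies outside the mapping.
 */
const void *sketch_pool_ptr(const void *pool, uint64_t pool_size,
                            uint64_t offset)
{
    if (offset >= pool_size)
        return NULL;
    return (const uint8_t *)pool + offset;
}
```

Once the data at the returned pointer has been consumed, the slice still has
to be released with KDBUS_CMD_FREE as described above.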
+
+
+13. Metadata
+===============================================================================
+
+When a message is delivered to a receiver connection, it is augmented by
+metadata items in accordance with the destination's current attach flags. The
+information stored in those metadata items refers to the sender task at the
+time of sending the message, so even if any detail of the sender task has
+already changed upon message reception (or if the sender task does not exist
+anymore), the information is still preserved and won't be modified until the
+message is freed.
+
+Note that there are two exceptions to the above rules:
+
+ a) Kernel generated messages don't have a source connection, so they won't be
+ augmented.
+
+ b) If a connection was created with faked credentials (see section 6.2),
+ the only attached metadata items are the ones provided by the connection
+ itself. The destination's attach_flags won't be looked at in such cases.
+
+Also, there are two things to be considered by userspace programs regarding
+those metadata items:
+
+ a) Userspace must cope with the fact that it might get more metadata than
+ it requested. That happens, for example, when a broadcast message is
+ sent and receivers have different attach flags. Items that haven't been
+ requested should hence be silently ignored.
+
+ b) Userspace might not always get all the metadata items that it
+ requested. That is because some of those items are only added if a
+ corresponding kernel feature has been enabled. Also, the two exceptions
+ described above will lead to fewer items being attached than
+ requested.
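
A receiver that follows both points might process metadata roughly like
this. The bit values are placeholders, not the real KDBUS_ATTACH_* constants,
and the received items are reduced to one attach bit each for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit assignments, for illustration only. */
#define SKETCH_ATTACH_TIMESTAMP (1ULL << 0)
#define SKETCH_ATTACH_CREDS     (1ULL << 1)
#define SKETCH_ATTACH_SECLABEL  (1ULL << 2)

/*
 * 'received' holds one attach bit per metadata item found in a message.
 * Returns the subset of 'requested' that actually arrived. Items that
 * were not requested are skipped; requested items that are missing
 * simply leave their bit clear in the result.
 */
uint64_t sketch_collect_metadata(uint64_t requested,
                                 const uint64_t *received, size_t n)
{
    uint64_t seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(received[i] & requested))
            continue;   /* not requested: silently ignore */
        seen |= received[i] & requested;
    }
    return seen;
}
```

The caller can then compute 'requested & ~seen' to learn which items are
absent and fall back to a sensible default for each.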
+
+
+13.1 Known item types
+---------------------
+
+The following attach flags are currently supported.
+
+ KDBUS_ATTACH_TIMESTAMP
+ Attaches an item of type KDBUS_ITEM_TIMESTAMP which contains both the
+ monotonic and the realtime timestamp, taken when the message was
+ processed on the kernel side.
+
+ KDBUS_ATTACH_CREDS
+ Attaches an item of type KDBUS_ITEM_CREDS, containing credentials as
+ described in kdbus_creds: the uid, gid, pid, tid and starttime of the task.
+
+ KDBUS_ATTACH_AUXGROUPS
+ Attaches an item of type KDBUS_ITEM_AUXGROUPS, containing a dynamic
+ number of auxiliary groups the sending task was a member of.
+
+ KDBUS_ATTACH_NAMES
+ Attaches items of type KDBUS_ITEM_NAME, one for each name the sending
+ connection currently owns. The name is stored in kdbus_item.str for each
+ of them.
+
+ KDBUS_ATTACH_COMM
+ Attaches items of type KDBUS_ITEM_PID_COMM and KDBUS_ITEM_TID_COMM,
+ both transporting the sending task's 'comm', for both the pid and the tid.
+ The strings are stored in kdbus_item.str.
+
+ KDBUS_ATTACH_EXE
+ Attaches an item of type KDBUS_ITEM_EXE, containing the path to the
+ executable of the sending task, stored in kdbus_item.str.
+
+ KDBUS_ATTACH_CMDLINE
+ Attaches an item of type KDBUS_ITEM_CMDLINE, containing the command line
+ arguments of the sending task, as an array of strings, stored in
+ kdbus_item.str.
+
+ KDBUS_ATTACH_CGROUP
+ Attaches an item of type KDBUS_ITEM_CGROUP with the task's cgroup path.
+
+ KDBUS_ATTACH_CAPS
+ Attaches an item of type KDBUS_ITEM_CAPS, carrying sets of capabilities
+ that should be accessed via kdbus_item.caps.caps. Also, userspace should
+ be written in a way that it takes kdbus_item.caps.last_cap into account,
+ and derive the number of sets and rows from the item size and the reported
+ number of valid capability bits.
+
+ KDBUS_ATTACH_SECLABEL
+ Attaches an item of type KDBUS_ITEM_SECLABEL, which contains the SELinux
+ security label of the sending task. Access via kdbus_item->str.
+
+ KDBUS_ATTACH_AUDIT
+ Attaches an item of type KDBUS_ITEM_AUDIT, which contains the audit label
+ of the sending task. Access via kdbus_item->str.
+
+ KDBUS_ATTACH_CONN_NAME
+ Attaches an item of type KDBUS_ITEM_CONN_NAME that contains the
+ sending connection's current name in kdbus_item.str.
+
+
+13.2 Metadata and namespaces
+----------------------------
+Note that if the user or PID namespaces of a connection at the time of sending
+differ from those that were active when the connection was created
+(KDBUS_CMD_HELLO), data structures such as messages will not have any metadata
+attached to prevent leaking security-relevant information.
+
+
+14. Error codes
+===============================================================================
+
+Below is a list of error codes that might be returned by the individual
+ioctl commands. The list focuses on the return values from kdbus code itself,
+and might not cover those of all kernel internal functions.
+
+For all ioctls:
+
+ -ENOMEM The kernel memory is exhausted
+ -ENOTTY Illegal ioctl command issued for the file descriptor
+ -ENOSYS The requested functionality is not available
+
+For all ioctls that carry a struct as payload:
+
+ -EFAULT The supplied data pointer was not 64-bit aligned, or was
+ inaccessible from the kernel side.
+ -EINVAL The size inside the supplied struct was smaller than expected
+ -EMSGSIZE The size inside the supplied struct was bigger than expected
+ -ENAMETOOLONG A supplied name is larger than the allowed maximum size
+
+For KDBUS_CMD_BUS_MAKE:
+
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid or
+ the supplied name does not start with the current uid and a '-'
+ -EEXIST A bus of that name already exists
+ -ESHUTDOWN The domain for the bus is already shut down
+ -EMFILE The maximum number of buses for the current user is exhausted
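
Since -EINVAL is returned when the bus name does not start with the current
uid and a '-', a caller might build the name as in the sketch below. The
helper name and the suffix parameter are assumptions for illustration.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Write "<uid>-<suffix>" into buf, e.g. "1000-user".
 * Returns 0 on success, -1 if the result did not fit.
 */
int sketch_bus_name(char *buf, size_t len, uid_t uid, const char *suffix)
{
    int n = snprintf(buf, len, "%lu-%s", (unsigned long)uid, suffix);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In a real program the uid argument would normally come from getuid().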
+
+For KDBUS_CMD_DOMAIN_MAKE:
+
+ -EPERM The calling user does not have CAP_IPC_OWNER set, or
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid, or
+ no name supplied for top-level domain
+ -EEXIST A domain of that name already exists
+
+For KDBUS_CMD_ENDPOINT_MAKE:
+
+ -EPERM The calling user is not privileged (see Terminology)
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid
+ -EEXIST An endpoint of that name already exists
+
+For KDBUS_CMD_HELLO:
+
+ -EFAULT The supplied pool size was 0 or not a multiple of the page size
+ -EINVAL The flags supplied in the kdbus_cmd_hello struct are invalid, or
+ an illegal combination of KDBUS_HELLO_MONITOR,
+ KDBUS_HELLO_ACTIVATOR and KDBUS_HELLO_POLICY_HOLDER was passed
+ in the flags, or an invalid set of items was supplied
+ -EPERM A KDBUS_ITEM_CREDS item was supplied, but the current user is
+ not privileged
+ -ESHUTDOWN The bus has already been shut down
+ -EMFILE The maximum number of connections on the bus has been reached
+
+For KDBUS_CMD_BYEBYE:
+
+ -EALREADY The connection has already been shut down
+ -EBUSY There are still messages queued up in the connection's pool
+
+For KDBUS_CMD_MSG_SEND:
+
+ -EOPNOTSUPP The connection is unconnected, or a fd was passed that is
+ either a kdbus handle itself or a unix domain socket. Both are
+ currently unsupported.
+ -EINVAL The submitted payload type is KDBUS_PAYLOAD_KERNEL,
+ KDBUS_MSG_FLAGS_EXPECT_REPLY was set without a timeout value,
+ KDBUS_MSG_FLAGS_SYNC_REPLY was set without
+ KDBUS_MSG_FLAGS_EXPECT_REPLY, an invalid item was supplied,
+ src_id was != 0 and different from the current connection's ID,
+ a supplied memfd had a size of 0, a string was not properly
+ nul-terminated
+ -ENOTUNIQ KDBUS_MSG_FLAGS_EXPECT_REPLY was set, but the dst_id is set
+ to KDBUS_DST_ID_BROADCAST
+ -E2BIG Too many items
+ -EMSGSIZE A payload vector was too big, and the current user is
+ unprivileged.
+ -ENOTUNIQ A fd or memfd payload was passed in a broadcast message, or
+ a timeout was given for a broadcast message
+ -EEXIST Multiple KDBUS_ITEM_FDS or KDBUS_ITEM_BLOOM_FILTER,
+ KDBUS_ITEM_DST_NAME were supplied
+ -EBADF A memfd item contained an illegal fd
+ -EMEDIUMTYPE A file descriptor that is not a kdbus memfd was passed as
+ KDBUS_MSG_PAYLOAD_MEMFD and was refused.
+ -EMFILE Too many file descriptors inside a KDBUS_ITEM_FDS
+ -EBADMSG An item had an illegal size, both a dst_id and a
+ KDBUS_ITEM_DST_NAME were given, or both a name and a bloom
+ filter were given
+ -ETXTBSY A kdbus memfd file cannot be sealed or the seal removed,
+ because it is shared with other processes or still mmap()ed
+ -ECOMM A peer does not accept the file descriptors addressed to it
+ -EFAULT The supplied bloom filter size was not 64-bit aligned
+ -EDOM The supplied bloom filter size did not match the bloom filter
+ size of the bus
+ -EDESTADDRREQ dst_id was set to KDBUS_DST_ID_NAME, but no KDBUS_ITEM_DST_NAME
+ was attached
+ -ESRCH The name to look up was not found in the name registry
+ -EADDRNOTAVAIL KDBUS_MSG_FLAGS_NO_AUTO_START was given but the destination
+ connection is an activator.
+ -ENXIO The passed numeric destination connection ID couldn't be found,
+ or is not connected
+ -ECONNRESET The destination connection is no longer active
+ -ETIMEDOUT Timeout while synchronously waiting for a reply
+ -EINTR System call interrupted while synchronously waiting for a reply
+ -EPIPE When sending a message, a synchronous reply from the receiving
+ connection was expected but the connection died before
+ answering
+ -ECANCELED A synchronous message sending was cancelled
+ -ENOBUFS Too many pending messages on the receiver side
+ -EREMCHG Both a well-known name and a unique name (ID) were given, but
+ the name is not currently owned by that connection.
+
+For KDBUS_CMD_MSG_RECV:
+
+ -EINVAL Invalid flags or offset
+ -EAGAIN No message found in the queue
+ -ENOMSG No message of the requested priority found
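
The three receive errors call for different reactions, which a caller might
encode as below. This assumes the usual ioctl(2) convention that the call
returns -1 and the positive code is found in errno; the enum and function
names are invented for the example.

```c
#include <errno.h>

enum sketch_recv_action {
    SKETCH_RECV_TRY_LATER,      /* queue is empty right now */
    SKETCH_RECV_RELAX_PRIORITY, /* messages exist, but not at this priority */
    SKETCH_RECV_FIX_CALLER,     /* arguments were wrong, retrying won't help */
    SKETCH_RECV_FAIL,           /* anything else */
};

/* Map an errno value from KDBUS_CMD_MSG_RECV to a suggested reaction. */
enum sketch_recv_action sketch_classify_recv_errno(int err)
{
    switch (err) {
    case EAGAIN:
        return SKETCH_RECV_TRY_LATER;
    case ENOMSG:
        return SKETCH_RECV_RELAX_PRIORITY;
    case EINVAL:
        return SKETCH_RECV_FIX_CALLER;
    default:
        return SKETCH_RECV_FAIL;
    }
}
```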
+
+For KDBUS_CMD_MSG_CANCEL:
+
+ -EINVAL Invalid flags
+ -ENOENT Pending message with the supplied cookie not found
+
+For KDBUS_CMD_FREE:
+
+ -ENXIO No pool slice found at given offset
+ -EINVAL Invalid flags were provided, or the offset is valid but the user is
+ not allowed to free the slice. The latter happens, for example, if
+ the offset was retrieved with KDBUS_RECV_PEEK.
+
+For KDBUS_CMD_NAME_ACQUIRE:
+
+ -EINVAL Illegal command flags, illegal name provided, or an activator
+ tried to acquire a second name
+ -EPERM Policy prohibited name ownership
+ -EALREADY Connection already owns that name
+ -EEXIST The name already exists and can not be taken over
+ -ECONNRESET The connection was reset during the call
+
+For KDBUS_CMD_NAME_RELEASE:
+
+ -EINVAL Invalid command flags, or invalid name provided
+ -ESRCH Name is not found in the registry
+ -EADDRINUSE Name is owned by a different connection and can't be released
+
+For KDBUS_CMD_NAME_LIST:
+
+ -EINVAL Invalid flags
+ -ENOBUFS No available memory in the connection's pool.
+
+For KDBUS_CMD_CONN_INFO:
+
+ -EINVAL Invalid flags, or neither an ID nor a name was provided,
+ or the name is invalid.
+ -ESRCH Connection lookup by name failed
+ -ENXIO No connection with the provided numeric connection ID found
+
+For KDBUS_CMD_CONN_UPDATE:
+
+ -EINVAL Illegal flags or items
+ -EOPNOTSUPP Operation not supported by connection.
+ -E2BIG Too many policy items attached
+ -EINVAL Wildcards submitted in policy entries, or illegal sequence
+ of policy items
+
+For KDBUS_CMD_ENDPOINT_UPDATE:
+
+ -E2BIG Too many policy items attached
+ -EINVAL Invalid flags, or wildcards submitted in policy entries,
+ or illegal sequence of policy items
+
+For KDBUS_CMD_MATCH_ADD:
+
+ -EINVAL Illegal flags or items
+ -EDOM Illegal bloom filter size
+ -EMFILE Too many matches for this connection
+
+For KDBUS_CMD_MATCH_REMOVE:
+
+ -EINVAL Illegal flags
+ -ENOENT A match entry with the given cookie could not be found.
--
2.1.2

Peter Meerwald
2014-10-30 12:30:37 UTC
Post by Greg Kroah-Hartman
kdbus is a system for low-latency, low-overhead, easy to use
interprocess communication (IPC).
The interface to all functions in this driver is implemented through ioctls
on /dev nodes. This patch adds detailed documentation about the kernel
level API design.
just some typos below
Post by Greg Kroah-Hartman
---
Documentation/kdbus.txt | 1815 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1815 insertions(+)
create mode 100644 Documentation/kdbus.txt
diff --git a/Documentation/kdbus.txt b/Documentation/kdbus.txt
new file mode 100644
index 000000000000..ac1a18908976
--- /dev/null
+++ b/Documentation/kdbus.txt
@@ -0,0 +1,1815 @@
+D-Bus is a system for powerful, easy to use interprocess communication (IPC).
+
+The focus of this document is an overview of the low-level, native kernel D-Bus
+transport called kdbus. Kdbus in the kernel acts similar to a device driver,
+all communication between processes take place over special character device
takes
Post by Greg Kroah-Hartman
+nodes in /dev/kdbus/.
+
+For the general D-Bus protocol specification, the payload format, the
+ http://dbus.freedesktop.org/doc/dbus-specification.html
+
+ http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
+
+ http://lwn.net/Articles/580194/
+
+
+1. Terminology
+===============================================================================
+
+ A domain is a named object containing a number of buses. A system
+ container that contains its own init system and users usually also
+ runs in its own kdbus domain. The /dev/kdbus/domain/<container-name>/
+ directory shows up inside the domain as /dev/kdbus/. Every domain offers
+ its own "control" device node to create new buses or new sub-domains.
+ Domains have no connection to each other and cannot see nor talk to
+ each other. See section 5 for more details.
+
+ A bus is a named object inside a domain. Clients exchange messages
+ over a bus. Multiple buses themselves have no connection to each other;
+ messages can only be exchanged on the same bus. The default entry point to
+ a bus, where clients establish the connection to, is the "bus" device node
+ /dev/kdbus/<bus name>/bus.
+ Common operating system setups create one "system bus" per system, and one
+ "user bus" for every logged-in user. Applications or services may create
+ their own private named buses. See section 5 for more details.
+
+ An endpoint provides the device node to talk to a bus. Opening an
+ endpoint creates a new connection to the bus to which the endpoint belongs.
+ Every bus has a default endpoint called "bus".
+ A bus can optionally offer additional endpoints with custom names to
+ provide a restricted access to the same bus. Custom endpoints carry
+ additional policy which can be used to give sandboxed processes only
+ a locked-down, limited, filtered access to the same bus.
+ See section 5 for more details.
+
+ A connection to a bus is created by opening an endpoint device node of
+ a bus and becoming an active client with the HELLO exchange. Every
+ connected client connection has a unique identifier on the bus and can
+ address messages to every other connection on the same bus by using
+ the peer's connection id as the destination.
+ See section 6 for more details.
+
+ Each connection allocates a piece of shmem-backed memory that is used
+ to receive messages and answers to ioctl command from the kernel. It is
+ never used to send anything to the kernel. In order to access that memory,
+ userspace must mmap() it into its task.
+ See section 12 for more details.
+
+ A connection can, in addition to its implicit unique connection id, request
+ the ownership of a textual well-known name. Well-known names are noted in
+ reverse-domain notation, such as com.example.service1. Connections offering
+ a service on a bus are usually reached by its well-known name. The analogy
+ of connection id and well-known name is an IP address and a DNS name
+ associated with that address.
+
+ Connections can exchange messages with other connections by addressing
+ the peers with their connection id or well-known name. A message consists
+ of a message header with kernel-specific information on how to route the
+ message, and the message payload, which is a logical byte stream of
+ arbitrary size. Messages can carry additional file descriptors to be passed
+ from one connection to another. Every connection can specify which set of
+ metadata the kernel should attach to the message when it is delivered
+ to the receiving connection. Metadata contains information like: system
+ timestamps, uid, gid, tid, proc-starttime, well-known-names, process comm,
+ process exe, process argv, cgroup, capabilities, seclabel, audit session,
+ loginuid and the connection's human-readable name.
+ See section 7 and 13 for more details.
+
+ The API of kdbus implements a notion of items, submitted through and
+ returned by most ioctls, and stored inside data structures in the
+ connection's pool. See section 4 for more details.
+
+ Broadcast messages are potentially sent to all connections of a bus. By
+ default, the connections will not actually receive any of the sent
+ broadcast messages; only after installing a match for specific message
+ properties, a broadcast message passes this filter.
+ See section 10 for more details.
+
+ A policy is a set of rules that define which connections can see, talk to,
+ or register a well-known name on the bus. A policy is attached to buses and
+ custom endpoints, and modified by policy holder connections or owners of
+ custom endpoints. See section 11 for more details.
+
+ Access rules to allow who can see a name on the bus are only checked on
+ custom endpoints. Policies may be defined with names that end with '.*'.
+ When matching a well-known name against such a wildcard entry, the last
+ part of the name is ignored and checked against the wildcard name without
+ the trailing '.*'. See section 11 for more details.
+
+ A user connecting to the bus is considered privileged if it is either the
+ creator of the bus, or if it has the CAP_IPC_OWNER capability flag set.
+
+
+2. Device Node Layout
+===============================================================================
+
+The kdbus interface is exposed through device nodes in /dev.
+
+ /sys/bus/kdbus
+ `-- devices
+     |-- kdbus!0-system!bus -> ../../../devices/virtual/kdbus/kdbus!0-system!bus
+     |-- kdbus!2702-user!bus -> ../../../devices/virtual/kdbus/kdbus!2702-user!bus
+     |-- kdbus!2702-user!ep.app -> ../../../devices/virtual/kdbus/kdbus!2702-user!ep.app
+     `-- kdbus!control -> ../../../devices/kdbus!control
+
+ /dev/kdbus
+ |-- control
+ |-- 0-system
+ |   |-- bus
+ |   `-- ep.apache
+ |-- 1000-user
+ |   `-- bus
+ |-- 2702-user
+ |   |-- bus
+ |   `-- ep.app
+ `-- domain
+     |-- fedoracontainer
+     |   |-- control
+     |   |-- 0-system
+     |   |   `-- bus
+     |   `-- 1000-user
+     |       `-- bus
+     `-- mydebiancontainer
+         |-- control
+         `-- 0-system
+             `-- bus
+
+ The device node subdirectory layout is arranged that a future version of
+ kdbus could be implemented as a file system with a separate instance mounted
+ for each domain. For any future changes, this always needs to be kept
+ in mind. Also the dependency on udev's userspace hookups or sysfs attribute
+ use should be limited to the absolute minimum for the same reason.
+
+
+3. Data Structures and flags
+===============================================================================
+
+3.1 Data structures and interconnections
+----------------------------------------
+
+ +-------------------------------------------------------------------------+
+ | Domain (Init Domain)                                                    |
+ | /dev/kdbus/control                                                      |
+ | +---------------------------------------------------------------------+ |
+ | | Bus (System Bus)                                                    | |
+ | | /dev/kdbus/0-system/                                                | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | | Endpoint                      | | Endpoint                      | | |
+ | | | /dev/kdbus/0-system/bus       | | /dev/kdbus/0-system/ep.app    | | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | | | Connection   | | Connection   | | Connection   | | Connection   | | |
+ | | | :1.22        | | :1.25        | | :1.55        | | :1.81        | | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | +---------------------------------------------------------------------+ |
+ |                                                                         |
+ | +---------------------------------------------------------------------+ |
+ | | Bus (User Bus for UID 2702)                                         | |
+ | | /dev/kdbus/2702-user/                                               | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | | Endpoint                      | | Endpoint                      | | |
+ | | | /dev/kdbus/2702-user/bus      | | /dev/kdbus/2702-user/ep.app   | | |
+ | | +-------------------------------+ +-------------------------------+ | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | | | Connection   | | Connection   | | Connection   | | Connection   | | |
+ | | | :1.22        | | :1.25        | | :1.55        | | :1.81        | | |
+ | | +--------------+ +--------------+ +--------------+ +--------------+ | |
+ | +---------------------------------------------------------------------+ |
+ |                                                                         |
+ | +---------------------------------------------------------------------+ |
+ | | Domain (Container; inside it, fedoracontainer/ becomes /dev/kdbus/) | |
+ | | /dev/kdbus/domain/fedoracontainer/control                           | |
+ | | +-----------------------------------------------------------------+ | |
+ | | | Bus (System Bus of "fedoracontainer")                           | | |
+ | | | /dev/kdbus/domain/fedoracontainer/0-system/                     | | |
+ | | | +-----------------------------+                                 | | |
+ | | | | Endpoint                    |                                 | | |
+ | | | | /dev/.../0-system/bus       |                                 | | |
+ | | | +-----------------------------+                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | | | Connection  | | Connection  |                                 | | |
+ | | | | :1.22       | | :1.25       |                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | +-----------------------------------------------------------------+ | |
+ | |                                                                     | |
+ | | +-----------------------------------------------------------------+ | |
+ | | | Bus (User Bus for UID 2702 of "fedoracontainer")                | | |
+ | | | /dev/kdbus/domain/fedoracontainer/2702-user/                    | | |
+ | | | +-----------------------------+                                 | | |
+ | | | | Endpoint                    |                                 | | |
+ | | | | /dev/.../2702-user/bus      |                                 | | |
+ | | | +-----------------------------+                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | | | Connection  | | Connection  |                                 | | |
+ | | | | :1.22       | | :1.25       |                                 | | |
+ | | | +-------------+ +-------------+                                 | | |
+ | | +-----------------------------------------------------------------+ | |
+ | +---------------------------------------------------------------------+ |
+ +-------------------------------------------------------------------------+
+
+The above description uses the D-Bus notation of unique connection names that
+adds a ":1." prefix to the connection's unique ID. kdbus itself doesn't
+use that notation, neither internally nor externally. However, libraries and
+other userspace code that aims for compatibility to D-Bus might.
+
+3.2 Flags
+---------
+
+All ioctls used in the communication with the driver contain two 64-bit fields,
+'flags' and 'kernel_flags'. In 'flags', the behavior of the command can be
+tweaked, whereas in 'kernel_flags', the kernel driver writes back the mask of
+supported bits upon each call, and sets the KDBUS_FLAGS_KERNEL bit. This is a
+way to probe possible kernel features and make code forward and backward
+compatible.
+
+All bits that are not recognized by the kernel in 'flags' are rejected, and the
+ioctl fails with -EINVAL.
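
A caller could use the written-back mask to probe for a feature before
relying on it, along these lines. The bit position chosen for
SKETCH_FLAGS_KERNEL is an assumption made for the sketch; the real value of
KDBUS_FLAGS_KERNEL must be taken from kdbus.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define SKETCH_FLAGS_KERNEL (1ULL << 63)

/*
 * Returns true if 'kernel_flags' was filled in by the kernel and
 * advertises every bit set in 'flag'.
 */
bool sketch_flag_supported(uint64_t kernel_flags, uint64_t flag)
{
    if (!(kernel_flags & SKETCH_FLAGS_KERNEL))
        return false;   /* field was not written back by the kernel */
    return (kernel_flags & flag) == flag;
}
```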
+
+
+4. Items
+===============================================================================
+
+To flexibly augment transport structures used by kdbus, data blobs of type
+struct kdbus_item are used. An item has a fixed-sized header that only stores
+the type of the item and the overall size. The total size is variable and is
+in some cases defined by the item type, in other cases, they can be of
+arbitrary length (for instance, a string).
+
+In the external kernel API, items are used for many ioctls to transport
+optional information from userspace to kernelspace. They are also used for
+information stored in a connection's pool, such as messages, name lists or
+requested connection information.
+
+In all such occasions where items are used as part of the kdbus kernel API,
+they are embedded in structs that have an overall size of their own, so there
+can be many of them.
+
+The kernel expects all items to be aligned to 8-byte boundaries.
+
+A simple iterator in userspace would iterate over the items until the items
+have reached the embedding structure's overall size. An example implementation
+of such an iterator can be found in tools/testing/selftests/kdbus/kdbus-util.h.
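
Independently of the selftest helper, the iteration can be sketched as
follows. The header layout (a 64-bit size followed by a 64-bit type) and all
names are assumptions for this example; the authoritative definition is
struct kdbus_item in kdbus.h. The buffer is expected to be 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed item header: overall size, then type, then payload. */
struct sketch_item {
    uint64_t size;  /* header plus payload, without trailing padding */
    uint64_t type;
    uint8_t data[];
};

#define SKETCH_ALIGN8(v) (((v) + 7ULL) & ~7ULL)

/*
 * Count the items of a given type in an item area of 'buf_size' bytes.
 * Returns -1 if an item header is truncated or reports a bogus size.
 */
long sketch_count_items(const void *buf, size_t buf_size, uint64_t type)
{
    const uint8_t *p = buf;
    size_t off = 0;
    long count = 0;

    while (off < buf_size) {
        const struct sketch_item *item;

        if (buf_size - off < sizeof(struct sketch_item))
            return -1;
        item = (const struct sketch_item *)(p + off);
        if (item->size < sizeof(struct sketch_item) ||
            item->size > buf_size - off)
            return -1;
        if (item->type == type)
            count++;
        /* The next item starts at the next 8-byte boundary. */
        off += SKETCH_ALIGN8(item->size);
    }
    return count;
}
```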
+
+
+5. Creation of new domains, buses and endpoints
+===============================================================================
+
+The initial kdbus domain is unconditionally created by the kernel module. A
+domain contains a "control" device node which allows to create a new bus or
+domain. New domains do not have any buses created by default.
+
+
+5.1 Domains and buses
+---------------------
+
+Opening the control device node returns a file descriptor that accepts the
+ioctls KDBUS_CMD_BUS_MAKE and KDBUS_CMD_DOMAIN_MAKE which specify the name of
+the new bus or domain to create. The control file descriptor needs to be kept
+open for the entire life-time of the created bus or domain; closing it will
+immediately clean up the entire bus or domain and all its associated
+resources and connections. Every control file descriptor can only be used once
+to create a new bus or domain; from that point, it is not used for any
+further communication until the final close().
+
+Each bus will generate a random, 128-bit UUID upon creation. It will be
+returned to the creators of connections through kdbus_cmd_hello.id128 and can
+be used by userspace to uniquely identify buses, even across different machines
+or containers. The UUID will have its its variant bits set to 'DCE', and denote
its its
Post by Greg Kroah-Hartman
+version 4 (random).
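
Userspace can sanity-check those bits as sketched below. The sketch assumes
the 16 bytes of id128 are laid out in the byte order of the textual UUID
form, as in RFC 4122; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the UUID carries the DCE variant and version 4 (random). */
bool sketch_uuid_is_random_dce(const uint8_t id[16])
{
    bool version4 = (id[6] & 0xf0) == 0x40;     /* high nibble of byte 6 */
    bool variant_dce = (id[8] & 0xc0) == 0x80;  /* top two bits of byte 8 */

    return version4 && variant_dce;
}
```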
+
+When a new domain is created, its structure in /dev/kdbus/<name>/ is a
+replication of what's initially created in /dev/kdbus. In fact, internally,
+a dummy default domain is set up when the driver is loaded. This allows
+userspace to bind-mount domain subtrees of /dev/kdbus into a container's
+filesystem view, and hence achieve complete isolation from the host's domain
+and those of other containers.
+
+
+5.2 Endpoints
+-------------
+
+Endpoints are entry points to a bus. By default, each bus has a default
+endpoint called 'bus'. The bus owner has the ability to create custom
+endpoints with specific names, permissions, and policy databases (see below).
+
+To create a custom endpoint, use the KDBUS_CMD_ENDPOINT_MAKE ioctl with struct
+kdbus_cmd_make. Custom endpoints always have a policy db that, by default,
db -> database
Post by Greg Kroah-Hartman
+does not allow anything. Everything that users of this new endpoint should be
+able to do has to be explicitly specified through KDBUS_ITEM_NAME and
+KDBUS_ITEM_POLICY_ACCESS items.
+
+5.3 Creating domains, buses and endpoints
+-----------------------------------------
+
+KDBUS_CMD_BUS_MAKE, KDBUS_CMD_DOMAIN_MAKE and KDBUS_CMD_ENDPOINT_MAKE take a
+struct kdbus_cmd_make argument.
+
+struct kdbus_cmd_make {
+ __u64 size;
+ The overall size of the struct, including its items.
+
+ __u64 flags;
+ The flags for creation.
+
+ KDBUS_MAKE_ACCESS_GROUP
+ Make the device node group-accessible
+
+ KDBUS_MAKE_ACCESS_WORLD
+ Make the device node world-accessible
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ A list of items, only used for creating custom endpoints. Ignored for
+ buses and domains.
+};
+
+
+6. Connections
+===============================================================================
+
+
+6.1 Connection IDs and well-known connection names
+--------------------------------------------------
+
+Connections are identified by their connection id, internally implemented as a
+uint64_t counter. The IDs of every newly created bus start at 1, and every new
+connection will increment the counter by 1. The ids are not reused.
+
+In higher level tools, the user visible representation of a connection is
+defined by the D-Bus protocol specification as ":1.<id>".
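For readers unfamiliar with the ":1.<id>" form, a small sketch of the mapping from the numeric id to the user-visible name might look like this. The function name is hypothetical; only the ":1.<id>" format comes from the text above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a connection id as the D-Bus unique name ":1.<id>".
 * Returns the snprintf() result; the name is complete if the return
 * value is non-negative and smaller than len. */
int conn_id_to_unique_name(uint64_t id, char *buf, size_t len)
{
	return snprintf(buf, len, ":1.%" PRIu64, id);
}
```

A 23-byte buffer is enough for any 64-bit id (3 bytes of prefix, up to 20 digits, and no terminator counted in the return value, so 24 bytes including the terminator).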
+
+Messages with a specific uint64_t destination id are directly delivered to
+the connection with the corresponding id. Messages with the special destination
+id KDBUS_DST_ID_BROADCAST are broadcast messages and are potentially delivered
+to all known connections on the bus; clients interested in broadcast messages
+need to subscribe to the specific messages they are interested though, before
comma before though
Post by Greg Kroah-Hartman
+any broadcast message reaches them.
+
+Messages synthesized and sent directly by the kernel will carry the special
+source id KDBUS_SRC_ID_KERNEL (0).
+
+In addition to the unique uint64_t connection id, established connections can
+request the ownership of well-known names, under which they can be found and
+addressed by other bus clients. A well-known name is associated with one and
+only one connection at a time. See section 8 on name acquisition and the
+name registry, and the validity of names.
+
+Messages can specify the special destination id 0 and carry a well-known name
+in the message data. Such a message is delivered to the destination connection
+which owns that well-known name.
+
+ +-------------------------------------------------------------------------+
+ | +---------------+ +---------------------------+ |
+ | | Connection | | Message | -----------------+ |
+ | | :1.22 | --> | src: 22 | | |
+ | | | | dst: 25 | | |
+ | | | | | | |
+ | | | | | | |
+ | | | +---------------------------+ | |
+ | | | | |
+ | | | <--------------------------------------+ | |
+ | +---------------+ | | |
+ | | | |
+ | +---------------+ +---------------------------+ | | |
+ | | Connection | | Message | -----+ | |
+ | | :1.25 | --> | src: 25 | | |
+ | | | | dst: 0xffffffffffffffff | -------------+ | |
+ | | | | (KDBUS_DST_ID_BROADCAST) | | | |
+ | | | | | ---------+ | | |
+ | | | +---------------------------+ | | | |
+ | | | | | | |
+ | | | <--------------------------------------------------+ |
+ | +---------------+ | | |
+ | | | |
+ | +---------------+ +---------------------------+ | | |
+ | | Connection | | Message | --+ | | |
+ | | :1.55 | --> | src: 55 | | | | |
+ | | | | dst: 0 / org.foo.bar | | | | |
+ | | | | | | | | |
+ | | | | | | | | |
+ | | | +---------------------------+ | | | |
+ | | | | | | |
+ | | | <------------------------------------------+ | |
+ | +---------------+ | | |
+ | | | |
+ | +---------------+ | | |
+ | | Connection | | | |
+ | | :1.81 | | | |
+ | | org.foo.bar | | | |
+ | | | | | |
+ | | | | | |
+ | | | <-----------------------------------+ | |
+ | | | | |
+ | | | <----------------------------------------------+ |
+ | +---------------+ |
+ +-------------------------------------------------------------------------+
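The three addressing modes in the diagram can be told apart from dst_id alone. The sketch below assumes the two special values quoted in the documentation (0 for name lookup, ~0ULL for broadcast); the enum and function are illustrative names only.

```c
#include <stdint.h>

/* Special destination ids, with the values given in the quoted text. */
#define KDBUS_DST_ID_NAME      (0ULL)
#define KDBUS_DST_ID_BROADCAST (~0ULL)

/* Hypothetical classification of a message destination. */
enum dst_kind { DST_BY_NAME, DST_BROADCAST, DST_UNICAST };

enum dst_kind classify_dst(uint64_t dst_id)
{
	if (dst_id == KDBUS_DST_ID_NAME)
		return DST_BY_NAME;	/* well-known name carried in an item */
	if (dst_id == KDBUS_DST_ID_BROADCAST)
		return DST_BROADCAST;	/* subject to the receivers' matches */
	return DST_UNICAST;		/* direct delivery to that id */
}
```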
+
+
+6.2 Creating connections
+------------------------
+
+A connection to a bus is created by opening an endpoint device node of
+a bus and becoming an active client with the KDBUS_CMD_HELLO ioctl. Every
+connected client connection has a unique identifier on the bus and can
+address messages to every other connection on the same bus by using
+the peer's connection id as the destination.
+
+The KDBUS_CMD_HELLO ioctl takes the following struct as argument.
+
+struct kdbus_cmd_hello {
+ __u64 size;
+ The overall size of the struct, including all attached items.
+
+ __u64 conn_flags;
+
+ KDBUS_HELLO_ACCEPT_FD
+ When this flag is set, the connection can be sent file descriptors
+ as message payload. If it's not set, any attempt of doing so will
+ result in -ECOMM on the sender's side.
+
+ KDBUS_HELLO_ACTIVATOR
+ Make this connection an activator (see below). With this bit set,
+ an item of type KDBUS_ITEM_NAME has to be attached which describes
+ the well-known name this connection should be an activator for.
+
+ KDBUS_HELLO_POLICY_HOLDER
+ Make this connection a policy holder (see below). With this bit set,
+ an item of type KDBUS_ITEM_NAME has to be attached which describes
+ the well-known name this connection should hold a policy for.
+
+ KDBUS_HELLO_MONITOR
+ Make this connection an eavesdropping connection that receives all
+ unicast messages sent on the bus. To also receive broadcast messages,
+ the connection has to upload appropriate matches as well.
+ This flag is only valid for privileged bus connections.
+
+ __u64 attach_flags;
+ Request the attachment of metadata for each message received by this
+ connection. The metadata actually attached may augment the list
+ of requested items. See section 13 for more details.
+
+ __u64 bus_flags;
+ Upon successful completion of the ioctl, this member will contain the
+ flags of the bus it connected to.
+
+ __u64 id;
+ Upon successful completion of the ioctl, this member will contain the
+ id of the new connection.
+
+ __u64 pool_size;
+ The size of the communication pool, in bytes. The pool can be accessed
+ by calling mmap() on the file descriptor that was used to issue the
+ KDBUS_CMD_HELLO ioctl.
+
+ struct kdbus_bloom_parameter bloom;
+ Bloom filter parameter (see below).
+
+ __u8 id128[16];
+ Upon successful completion of the ioctl, this member will contain the
+ 128 bit wide UUID of the connected bus.
+
+ struct kdbus_item items[0];
+ Variable list of items to add optional additional information. The
+ following items are recognized.
+
+ KDBUS_ITEM_CONN_NAME
+ Contains a string that describes this connection's name, so it can be
+ identified later.
+
+ KDBUS_ITEM_NAME
+ KDBUS_ITEM_POLICY_ACCESS
+ For activators and policy holders only, combinations of these two
+ items describe policy access entries (see section about policy db).
the section is titled 'Policy', not policy db
Post by Greg Kroah-Hartman
+
+ KDBUS_ITEM_CREDS
+ KDBUS_ITEM_SECLABEL
+ Privileged bus users may submit these types in order to create
+ connections with faked credentials. The only real use case for this
+ is a proxy service which acts on behalf of some other tasks. For a
+ connection that runs in that mode, the message's metadata items will
+ be limited to what's specified here. See section 13 for more
+ information.
+
+ Items of other types are silently ignored.
+};
+
+
+6.3 Activator and policy holder connection
+------------------------------------------
+
+An activator connection is a placeholder for a well-known name. Messages sent
+to such a connection can be used by userspace to start an implementor
+connection, which will then get all the messages from the activator copied
+over. An activator connection cannot be used to send any message.
+
+A policy holder connection only installs a policy for one or more names.
+These policy entries are kept active as long as the connection is alive, and
+are removed once it terminates. Such a policy connection type can be used to
+deploy restrictions for names that are not yet active on the bus. A policy
+holder connection cannot be used to send any message.
+
+The creation of activator, policy holder or monitor connections is an operation
+restricted to privileged users on the bus (see section "Terminology").
+
+
+6.4 Retrieving information on a connection
+------------------------------------------
+
+The KDBUS_CMD_CONN_INFO ioctl can be used to retrieve credentials and
+properties of the initial creator of a connection. This ioctl uses the
+following struct.
+
+struct kdbus_cmd_info {
+ __u64 size;
+ The overall size of the struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+ Specify which items should be attached to the answer.
+
+ KDBUS_ATTACH_NAMES
+ Add an item to the answer containing all the names the connection
+ currently owns.
+
+ KDBUS_ATTACH_CONN_NAME
+ Add an item to the answer containing the connection's name.
+
+ After the ioctl returns, this field will contain the current metadata
+ attach flags of the connection.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ __u64 id;
+ The connection's numerical ID to retrieve information for. If set to
+ non-zero value, the 'name' field is ignored.
+
+ __u64 offset;
+ When the ioctl returns, this value will yield the offset of the connection
+ information inside the caller's pool.
+
+ struct kdbus_item items[0];
+ The optional item list, containing the well-known name to look up as
+ a KDBUS_ITEM_NAME. Only required if the 'id' field is set to 0.
+ All other items are currently ignored.
+};
+
+After the ioctl returns, the following struct will be stored in the caller's
extra space after struct
Post by Greg Kroah-Hartman
+pool at 'offset'.
+
+struct kdbus_info {
+ __u64 size;
+ The overall size of the struct, including all its items.
+
+ __u64 id;
+ The connection's unique ID.
+
+ __u64 flags;
+ The connection's flags as specified when it was created.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Depending on the 'flags' field in struct kdbus_cmd_info, items of
+ types KDBUS_ITEM_NAME and KDBUS_ITEM_CONN_NAME are followed here.
+};
+
+Once the caller is finished with parsing the return buffer, it needs to call
+KDBUS_CMD_FREE for the offset.
+
+
+6.5 Getting information about a connection's bus creator
+--------------------------------------------------------
+
+The KDBUS_CMD_BUS_CREATOR_INFO ioctl takes the same struct as
+KDBUS_CMD_CONN_INFO but is used to retrieve information about the creator of
+the bus the connection is attached to. The metadata returned by this call is
+collected during the creation of the bus and is never altered afterwards, so
+it provides pristine information on the task that created the bus, at the
+moment when it did so.
+
+In response to this call, a slice in the connection's pool is allocated and
+filled with an object of type struct kdbus_info, pointed to by the ioctl's
+'offset' field.
+
+struct kdbus_info {
+ __u64 size;
+ The overall size of the struct, including all its items.
+
+ __u64 id;
+ The bus' ID
+
+ __u64 flags;
+ The bus' flags as specified when it was created.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Metadata information is stored in items here.
+};
+
+Once the caller is finished with parsing the return buffer, it needs to call
+KDBUS_CMD_FREE for the offset.
+
+
+6.6 Updating connection details
+-------------------------------
+
+Some of a connection's details can be updated with the KDBUS_CMD_CONN_UPDATE
+ioctl, using the file descriptor that was used to create the connection.
+The update command uses the following struct.
+
+struct kdbus_cmd_update {
+ __u64 size;
+ The overall size of the struct, including all its items.
+
+ struct kdbus_item items[0];
+ Items to describe the connection details to be updated. The following item
+ types are supported.
+
+ KDBUS_ITEM_ATTACH_FLAGS
+ Supply a new set of items to be attached to each message.
+
+ KDBUS_ITEM_NAME
+ KDBUS_ITEM_POLICY_ACCESS
+ Policy holder connections may supply a new set of policy information
+ with these items. For other connection types, -EOPNOTSUPP is returned.
+};
+
+
+6.7 Termination
+---------------
+
+A connection can be terminated by simply closing the file descriptor that was
+used to start the connection. All pending incoming messages will be discarded,
+and the memory in the pool will be freed.
+
+An alternative way of way of closing down a connection is calling the
way of way
Post by Greg Kroah-Hartman
+KDBUS_CMD_BYEBYE ioctl on it, which will only succeed if the message queue
+of the connection is empty at the time of closing, otherwise, -EBUSY is
+returned.
+
+When this ioctl returns successfully, the connection has been terminated and
+won't accept any new messages from remote peers. This way, a connection can
+be terminated race-free, without losing any messages.
+
+
+7. Messages
+===============================================================================
+
+Messages consist of a fixed-size header followed directly by a list of
+variable-sized data 'items'. The overall message size is specified in the
+header of the message. The chain of data items can contain well-defined
+message metadata fields, raw data, references to data, or file descriptors.
+
+
+7.1 Sending messages
+--------------------
+
+Messages are passed to the kernel with the KDBUS_CMD_MSG_SEND ioctl. Depending
+on the the destination address of the message, the kernel delivers the message
the the
Post by Greg Kroah-Hartman
+to the specific destination connection or to all connections on the same bus.
+Sending messages across buses is not possible. Messages are always queued in
+the memory pool of the destination connection (see below).
+
+The KDBUS_CMD_MSG_SEND ioctl uses struct kdbus_msg to describe the message to
+be sent.
+
+struct kdbus_msg {
+ __u64 size;
+ The over all size of the struct, including the attached items.
overall
Post by Greg Kroah-Hartman
+
+ __u64 flags;
+
+ KDBUS_MSG_FLAGS_EXPECT_REPLY
+ Expect a reply from the remote peer to this message. With this bit set,
+ the timeout_ns field must be set to a non-zero number of nanoseconds in
+ which the receiving peer is expected to reply. If such a reply is not
+ received in time, the sender will be notified with a timeout message
+ (see below). The value must be an absolute value, in nanoseconds and
+ based on CLOCK_MONOTONIC.
+
+ For a message to be accepted as reply, it must be a direct message to
+ the original sender (not a broadcast), and its kdbus_msg.cookie_reply
+ must match the previous message's kdbus_msg.cookie.
+
+ Expected replies also temporarily open the policy of the sending
+ connection, so the other peer is allowed to respond within the given
+ time window.
+
+ KDBUS_MSG_FLAGS_SYNC_REPLY
+ By default, all calls to kdbus are considered asynchronous,
+ non-blocking. However, as there are many use cases that need to wait
+ for a remote peer to answer a method call, there's a way to send a
+ message and wait for a reply in a synchronous fashion. This is what
+ the KDBUS_MSG_FLAGS_SYNC_REPLY controls. The KDBUS_CMD_MSG_SEND ioctl
+ will block until the reply has arrived, the timeout limit is reached,
+ in case the remote connection was shut down, or if interrupted by
+ a signal before any reply; see signal(7).
+
+ The offset of the reply message in the sender's pool is stored in
+ 'offset_reply' when the ioctl has returned without error. Hence,
+ there is no need for another KDBUS_CMD_MSG_RECV ioctl or anything else
+ to receive the reply.
+
+ KDBUS_MSG_FLAGS_NO_AUTO_START
+ By default, when a message is sent to an activator connection, the
+ activator is notified and will start an implementor. This flag inhibits
+ that behavior. With this bit set, and the remote being an activator,
+ -EADDRNOTAVAIL is returned from the ioctl.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call of
+ KDBUS_CMD_MSG_SEND.
+
+ __s64 priority;
+ The priority of this message. Receiving messages (see below) may
+ optionally be constrained to messages of a minimal priority. This
+ allows for use cases where timing critical data is interleaved with
+ control data on the same connection. If unused, the priority should be
+ set to zero.
+
+ __u64 dst_id;
+ The numeric ID of the destination connection, or KDBUS_DST_ID_BROADCAST
+ (~0ULL) to address every peer on the bus, or KDBUS_DST_ID_NAME (0) to look
+ it up dynamically from the bus' name registry. In the latter case, an item
+ of type KDBUS_ITEM_DST_NAME is mandatory.
+
+ __u64 src_id;
+ Upon return of the ioctl, this member will contain the sending
+ connection's numerical ID. Should be 0 at send time.
+
+ __u64 payload_type;
+ Type of the payload in the actual data records. Currently, only
+ KDBUS_PAYLOAD_DBUS is accepted as input value of this field. When
+ receiving messages that are generated by the kernel (notifications),
+ this field will yield KDBUS_PAYLOAD_KERNEL.
+
+ __u64 cookie;
+ Cookie of this message, for later recognition. Also, when replying
+ to a message (see above), the cookie_reply field must match this value.
+
+ __u64 timeout_ns;
+ If the message sent requires a reply from the remote peer (see above),
+ this field contains the timeout in absolute nanoseconds based on
+ CLOCK_MONOTONIC.
+
+ __u64 cookie_reply;
+ If the message sent is a reply to another message, this field must
+ match the cookie of the formerly received message.
+
+ __u64 offset_reply;
+ If the message successfully got a synchronous reply (see above), this
+ field will yield the offset of the reply message in the sender's pool.
+ This is what KDBUS_CMD_MSG_RECV usually does for asynchronous messages.
+
+ struct kdbus_item items[0];
+ A dynamically sized list of items to contain additional information.
+
+ KDBUS_ITEM_PAYLOAD_VEC
+ KDBUS_ITEM_PAYLOAD_MEMFD
+ KDBUS_ITEM_FDS
+ Actual data records containing the payload. See section "Passing of
+ Payload Data".
+
+ KDBUS_ITEM_BLOOM_FILTER
+ Bloom filter for matches (see below).
+
+ KDBUS_ITEM_DST_NAME
+ Well-known name to send this message to. Required if dst_id is set
+ to KDBUS_DST_ID_NAME. If a connection holding the given name can't
+ be found, -ESRCH is returned.
+ For messages to a unique name (ID), this item is optional. If present,
+ the kernel will make sure the name owner matches the given unique name.
+ This allows userspace to tie the message sending to the condition that a
+ name is currently owned by a certain unique name.
+};
+
+The message will be augmented by the requested metadata items when queued into
+the receiver's pool. See also section 13.1 ("Metadata and namespaces").
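Since timeout_ns is documented as an absolute CLOCK_MONOTONIC value rather than a duration, callers have to add their relative timeout to the current time themselves. A minimal sketch of that arithmetic, with hypothetical helper names and the current time passed in as a parameter so that the computation stays independent of any particular clock call:

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert a timespec (assumed non-negative) to nanoseconds. */
uint64_t timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* Absolute deadline = now + relative timeout, saturating on overflow. */
uint64_t abs_deadline_ns(uint64_t now_ns, uint64_t rel_ns)
{
	if (rel_ns > UINT64_MAX - now_ns)
		return UINT64_MAX;
	return now_ns + rel_ns;
}
```

In practice now_ns would presumably come from clock_gettime(CLOCK_MONOTONIC, &ts) followed by timespec_to_ns(&ts), and the result would be stored in kdbus_msg.timeout_ns together with KDBUS_MSG_FLAGS_EXPECT_REPLY.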
+
+
+7.2 Message layout
+------------------
+
+The layout of a message is shown below.
+
+ +-------------------------------------------------------------------------+
+ | Message |
+ | +---------------------------------------------------------------------+ |
+ | | Header | |
+ | | size: overall message size, including the data records | |
+ | | destination: connection id of the receiver | |
+ | | source: connection id of the sender (set by kernel) | |
+ | | payload_type: "DBusDBus" textual identifier stored as uint64_t | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | Data Record | |
+ | | size: overall record size (without padding) | |
+ | | type: type of data | |
+ | | data: reference to data (address or file descriptor) | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | padding bytes to the next 8 byte alignment | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | Data Record | |
+ | | size: overall record size (without padding) | |
+ | | ... | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | padding bytes to the next 8 byte alignment | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | Data Record | |
+ | | size: overall record size | |
+ | | ... | |
+ | +---------------------------------------------------------------------+ |
+ | +---------------------------------------------------------------------+ |
+ | | padding bytes to the next 8 byte alignment | |
+ | +---------------------------------------------------------------------+ |
+ +-------------------------------------------------------------------------+
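To make the padding rule in the diagram concrete, here is a rough sketch of walking a chain of records where each record's size excludes the padding and the next record starts at the following 8-byte boundary. The record header struct is a simplified stand-in for struct kdbus_item (just size and type), and the function name is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the next multiple of 8. */
#define ALIGN8(v) (((uint64_t)(v) + 7) & ~(uint64_t)7)

/* Hypothetical record header: size (without padding) and type. */
struct record_hdr {
	uint64_t size;
	uint64_t type;
};

/* Count the records in buf[0..len); returns -1 if a record is malformed. */
int count_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		struct record_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		off += ALIGN8(hdr.size);	/* skip payload plus padding */
		n++;
	}
	return n;
}
```

A record of size 17 therefore occupies 24 bytes in the chain, and the following record begins at offset 24.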
+
+
+7.3 Passing of Payload Data
+---------------------------
+
+When connecting to the bus, receivers request a memory pool of a given size,
+large enough to carry all backlog of data enqueued for the connection. The
+pool is internally backed by a shared memory file which can be mmap()ed by
+the receiver.
+
+ Messages are directly copied by the sending process into the receiver's pool.
+ That way, two peers can exchange data with effectively a single copy from
+ one process to another; the kernel will not buffer the data anywhere else.
+
+ Messages can reference memfd files which contain the data.
+ memfd files are tmpfs-backed files that allow sealing of the content of the
+ file, which prevents all writable access to the file content.
+ Only sealed memfd files are accepted as payload data, which enforces
+ reliable passing of data; the receiver can assume that neither the sender nor
+ anyone else can alter the content after the message is sent.
+
+Apart from the sender filling in the content of memfd files, the data will
+be passed as zero-copy from one process to another, read-only, shared between
+the peers.
+
+
+7.4 Receiving messages
+----------------------
+
+Messages are received by the client with the KDBUS_CMD_MSG_RECV ioctl. The
+endpoint device node of the bus supports poll() to wake up the receiving
+process when new messages are queued up to be received.
+
+With the KDBUS_CMD_MSG_RECV ioctl, a struct kdbus_cmd_recv is used.
+
+struct kdbus_cmd_recv {
+ __u64 flags;
+ Flags to control the receive command.
+
+ KDBUS_RECV_PEEK
+ Just return the location of the next message. Do not install file
+ descriptors or anything else. This is usually used to determine the
+ sender of the next queued message.
+
+ KDBUS_RECV_DROP
+ Drop the next message without doing anything else with it, and free the
+ pool slice. This is a shortcut for KDBUS_RECV_PEEK and KDBUS_CMD_FREE.
+
+ KDBUS_RECV_USE_PRIORITY
+ Use the priority field (see below).
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ __s64 priority;
+ With KDBUS_RECV_USE_PRIORITY set in flags, receive the next message in
+ the queue with at least the given priority. If no such message is waiting
+ in the queue, -ENOMSG is returned.
+
+ __u64 offset;
+ Upon return of the ioctl, this field contains the offset in the
+ receiver's memory pool.
+};
+
+Unless KDBUS_RECV_DROP was passed, and given that the ioctl succeeded, the
+offset field contains the location of the new message inside the receiver's
+pool. The message is stored as struct kdbus_msg at this offset, and can be
+interpreted with the semantics described above.
+
+Also, if the connection allowed file descriptors to be passed
+(KDBUS_HELLO_ACCEPT_FD), and if the message contained any, they will be
+installed into the receiving process after the KDBUS_CMD_MSG_RECV ioctl
+returns. The receiving task is obliged to close all of them appropriately.
+
+The caller is obliged to call KDBUS_CMD_FREE with the returned offset when
+the memory is no longer needed.
+
+
+7.5 Canceling messages synchronously waiting for replies
+--------------------------------------------------------
+
+When a connection sends a message with KDBUS_MSG_FLAGS_SYNC_REPLY and
+blocks while waiting for the reply, the KDBUS_CMD_MSG_CANCEL ioctl can be
+used on the same file descriptor to cancel the message, based on its cookie.
+If there are multiple messages with the same cookie that are all synchronously
+waiting for a reply, all of them will be canceled. Obviously, this is only
+possible in multi-threaded applications.
+
+
+8. Name registry
+===============================================================================
+
+Each bus instantiates a name registry to resolve well-known names into unique
+connection IDs for message delivery. The registry will be queried when a
+message is sent with kdbus_msg.dst_id set to KDBUS_DST_ID_NAME, or when a
+registry dump is requested.
+
+All of the below is subject to policy rules for SEE and OWN permissions.
+
+
+8.1 Name validity
+-----------------
+
+
+ - The name has two or more elements separated by a period ('.') character
+ - All elements must contain at least one character
+ - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_"
+ and must not begin with a digit
+ - The name must contain at least one '.' (period) character
+ (and thus at least two elements)
+ - The name must not begin with a '.' (period) character
+ - The name must not exceed KDBUS_NAME_MAX_LEN (255)
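The rules listed above translate fairly directly into a validator. The following is only a sketch of my reading of the list: it treats KDBUS_NAME_MAX_LEN as a limit on the string length without the terminator, and it also rejects a trailing '.' because that would produce an empty last element. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KDBUS_NAME_MAX_LEN 255

/* Letters and '_' are always allowed; digits only after the first char. */
static bool is_name_char(char c, bool first)
{
	if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
		return true;
	return !first && c >= '0' && c <= '9';
}

bool name_is_valid(const char *name)
{
	size_t len = strlen(name);
	bool elem_start = true;	/* next char begins a new element */
	bool seen_dot = false;

	if (len == 0 || len > KDBUS_NAME_MAX_LEN)
		return false;

	for (size_t i = 0; i < len; i++) {
		if (name[i] == '.') {
			if (elem_start)	/* leading '.' or empty element */
				return false;
			seen_dot = true;
			elem_start = true;
			continue;
		}
		if (!is_name_char(name[i], elem_start))
			return false;
		elem_start = false;
	}
	/* Reject a trailing '.', and require at least two elements. */
	return seen_dot && !elem_start;
}
```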
+
+
+8.2 Acquiring a name
+--------------------
+
+To acquire a name, a client uses the KDBUS_CMD_NAME_ACQUIRE ioctl with the
+following data structure.
+
+struct kdbus_cmd_name {
+ __u64 size;
+ The overall size of this struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+ Flags to control details in the name acquisition.
+
+ KDBUS_NAME_REPLACE_EXISTING
+ Acquiring a name that is already present usually fails, unless this flag
+ is set in the call, and KDBUS_NAME_ALLOW_REPLACEMENT (see below) was
+ set when the current owner of the name acquired it, or if the current
+ owner is an activator connection (see below).
+
+ KDBUS_NAME_ALLOW_REPLACEMENT
+ Allow other connections to take over this name. When this happens, the
+ former owner of the name will be notified of the name loss.
+
+ KDBUS_NAME_QUEUE (acquire)
+ A name that is already acquired by a connection, and which wasn't
+ requested with the KDBUS_NAME_ALLOW_REPLACEMENT flag set cannot be
+ acquired again. However, a connection can put itself in a queue of
+ connections waiting for the name to be released. Once that happens, the
+ first connection in that queue becomes the new owner and is notified
+ accordingly.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Items to submit the name. Currently, one one item of type KDBUS_ITEM_NAME
one one
Post by Greg Kroah-Hartman
+ is expected and allowed, and the contained string must be a valid bus name.
+};
+
+
+8.3 Releasing a name
+--------------------
+
+A connection may release a name explicitly with the KDBUS_CMD_NAME_RELEASE
+ioctl. If the connection was an implementor of an activatable name, its
+pending messages are moved back to the activator. If there are any connections
+queued up as waiters for the name, the oldest one of them will become the new
+owner. The same happens implicitly for all names once a connection terminates.
+
+The KDBUS_CMD_NAME_RELEASE ioctl uses the same data structure as the
+acquisition call, but with slightly different field usage.
+
+struct kdbus_cmd_name {
+ __u64 size;
+ The overall size of this struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+
+ struct kdbus_item items[0];
+ Items to submit the name. Currently, one one item of type KDBUS_ITEM_NAME
one one
Post by Greg Kroah-Hartman
+ is expected and allowed, and the contained string must be a valid bus name.
+};
+
+
+8.4 Dumping the name registry
+-----------------------------
+
+A connection may request a complete or filtered dump of currently active bus
+names with the KDBUS_CMD_NAME_LIST ioctl, which takes a struct
+kdbus_cmd_name_list as argument.
+
+struct kdbus_cmd_name_list {
+ __u64 flags;
+ Any combination of flags to specify which names should be dumped.
+
+ KDBUS_NAME_LIST_UNIQUE
+ List the unique (numeric) IDs of the connection, whether it owns a name
+ or not.
+
+ KDBUS_NAME_LIST_NAMES
+ List well-known names stored in the database which are actively owned by
+ a real connection (not an activator).
+
+ KDBUS_NAME_LIST_ACTIVATORS
+ List names that are owned by an activator.
+
+ KDBUS_NAME_LIST_QUEUED
+ List connections that are not yet owning a name but are waiting for it
+ to become available.
+
+ __u64 offset;
+ When the ioctl returns successfully, the offset to the name registry dump
+ inside the connection's pool will be stored in this field.
+};
+
+The returned list of names is stored in a struct kdbus_name_list that in turn
+contains a dynamic number of struct kdbus_name_info that carry the actual
+information. The fields inside struct kdbus_name_info are described next.
+
+struct kdbus_name_info {
+ __u64 size;
+ The overall size of this struct, including the name with its 0-byte string
+ terminator.
+
+ __u64 flags;
+ The current flags for this name. Can be any combination of
+
+ KDBUS_NAME_ALLOW_REPLACEMENT
+
+ KDBUS_NAME_IN_QUEUE (list)
+ When retrieving a list of currently acquired names in the registry, this
+ flag indicates whether the connection actually owns the name or is
+ currently waiting for it to become available.
+
+ KDBUS_NAME_ACTIVATOR (list)
+ An activator connection owns a name as a placeholder for an implementor,
+ which is started on demand as soon as the first message arrives. There's
+ some more information on this topic below. In contrast to
+ KDBUS_NAME_REPLACE_EXISTING, when a name is taken over from an activator
+ connection, all the messages that have been queued in the activator
+ connection will be moved over to the new owner. The activator connection
+ will still be tracked for the name and will take control again if the
+ implementor connection terminates.
+ This flag can not be used when acquiring a name, but is implicitly set
+ through KDBUS_CMD_HELLO with KDBUS_HELLO_ACTIVATOR set in
+ kdbus_cmd_hello.conn_flags.
+
+ __u64 owner_id;
+ The owning connection's unique ID.
+
+ __u64 conn_flags;
+ The flags of the owning connection.
+
+ struct kdbus_item items[0];
+ Items containing the actual name. Currently, one one item of type
one one
Post by Greg Kroah-Hartman
+ KDBUS_ITEM_NAME will be attached.
+};
+
+The returned buffer must be freed with the KDBUS_CMD_FREE ioctl when the user
+is finished with it.
+
+
+9. Notifications
+===============================================================================
+
+The kernel will notify its users of the following events.
+
+ * When connection A is terminated while connection B is waiting for a reply
+ from it, connection B is notified with a message with an item of type
+ KDBUS_ITEM_REPLY_DEAD.
+
+ * When connection A does not receive a reply from connection B within the
+ specified timeout window, connection A will receive a message with an item
+ of type KDBUS_ITEM_REPLY_TIMEOUT.
+
+ * When a connection is created on or removed from a bus, messages with an
+ item of type KDBUS_ITEM_ID_ADD or KDBUS_ITEM_ID_REMOVE, respectively, are
+ sent to all bus members that match these messages through their match
+ database.
+
+ * When a connection owns or loses a name, or a name is moved from one
+ connection to another, messages with an item of type KDBUS_ITEM_NAME_ADD,
+ KDBUS_ITEM_NAME_REMOVE or KDBUS_ITEM_NAME_CHANGE are sent to all bus
+ members that match these messages through their match database.
+
+A kernel notification is a regular kdbus message with the following details.
+
+ * kdbus_msg.src_id == KDBUS_SRC_ID_KERNEL
+ * kdbus_msg.dst_id == KDBUS_DST_ID_BROADCAST
+ * kdbus_msg.payload_type == KDBUS_PAYLOAD_KERNEL
+ * Has exactly one of the aforementioned items attached
+
+
+10. Message Matching, Bloom filters
+===============================================================================
+
+10.1 Matches for broadcast messages from other connections
+----------------------------------------------------------
+
+A message addressed to the connection ID KDBUS_DST_ID_BROADCAST (~0ULL) is a
+broadcast message, delivered to all connected peers which installed a rule to
+match certain properties of the message. Without any rules installed in the
+connection, no broadcast messages or kernel-side notifications will be delivered
+to the connection. Broadcast messages are subject to policy rules and TALK
+access checks.
+
+See section 11 for details on policies, and section 11.5 for more
+details on implicit policies.
+
+Matches for messages from other connections (not kernel notifications) are
+implemented as bloom filters. The sender adds certain properties of the message
+as elements to a bloom filter bit field, and sends that along with the
+broadcast message.
+
+The connection adds the message properties it is interested in as elements to a
+bloom mask bit field, and uploads the mask to the match rules of the
+connection.
+
+The kernel will match the broadcast message's bloom filter against the
+connection's bloom mask (simply by &-ing it), and decide whether the message
+should be delivered to the connection.
+
+The kernel has no notion of any specific properties of the message; all it
+sees are the bit fields of the bloom filter and mask to match against. The
+use of bloom filters allows simple and efficient matching, without exposing
+any message properties or internals to the kernel side. Clients need to deal
+with the fact that they might receive broadcasts which they did not subscribe
+to, as the bloom filter might allow false-positives to pass the filter.
+
+To allow the future extension of the set of elements in the bloom filter, the
+filter specifies a "generation" number. A later generation must always contain
+all elements of the set of the previous generation, but can add new elements
+to the set. The match rules mask can carry an array with all previous
+generations of masks individually stored. When the filter and mask are matched
+by the kernel, the mask with the closest matching "generation" is selected
+as the index into the mask array.
+
+
+10.2 Matches for kernel notifications
+------------------------------------
+
+To receive kernel generated notifications (see section 9), a connection must
+install special match rules that are different from the bloom filter matches
+described in the section above. They can be filtered by a sender connection's
+ID, by one of the names the sender connection owns at the time of sending the
+message, or by type of the notification (id/name add/remove/change).
+
+10.3 Adding a match
+-------------------
+
+To add a match, the KDBUS_CMD_MATCH_ADD ioctl is used, which takes a struct
+of the type described below.
+
+Note that each of the items attached to this command will internally create
+one match 'rule', and the collection of them, which is submitted as one block
+via the ioctl, is called a 'match'. To allow a message to pass, all rules of a
+match have to be satisfied. Hence, adding more items to the command will only
+narrow the set of messages that pass the match, and will make it less likely
+that the connection's userspace process is woken up.
+
+Multiple matches can be installed per connection. As long as one of them has a
+set of rules which allows the message to pass, the message is delivered.
+
+struct kdbus_cmd_match {
+ __u64 size;
+ The overall size of the struct, including its items.
+
+ __u64 cookie;
+ A cookie which identifies the match, so it can be referred to at removal
+ time.
+
+ __u64 flags;
+ Flags to control the behavior of the ioctl.
+
+ Remove all entries with the given cookie before installing the new one.
+ This allows for race-free replacement of matches.
+
+ struct kdbus_item items[0];
+ Items to define the actual rules of the matches. The following item types
+ are expected. Each item will cause one new match rule to be created.
+
+ KDBUS_ITEM_BLOOM_MASK
+ An item that carries the bloom filter mask to match against in its
+ data field. The payload size must match the bloom filter size that
+ was specified when the bus was created.
+ See section 10.4 for more information.
+
+ KDBUS_ITEM_NAME
+ Specify a name that a sending connection must own at the time of sending
+ a broadcast message in order to match this rule.
+
+ KDBUS_ITEM_ID
+ Specify a sender connection's ID that will match this rule.
+
+ KDBUS_ITEM_NAME_ADD
+ KDBUS_ITEM_NAME_REMOVE
+ KDBUS_ITEM_NAME_CHANGE
+ These items request delivery of broadcast messages that describe a name
+ acquisition, loss, or change. The details are stored in the item's
+ kdbus_notify_name_change member. All information specified must be
+ matched in order to make the message pass. Use KDBUS_MATCH_ID_ANY to
+ match against any unique connection ID.
+
+ KDBUS_ITEM_ID_ADD
+ KDBUS_ITEM_ID_REMOVE
+ These items request delivery of broadcast messages that are generated
+ when a connection is created or terminated. struct kdbus_notify_id_change
+ is used to store the actual match information. This item can be used to
+ monitor one particular connection ID, or, when the id field is set to
+ KDBUS_MATCH_ID_ANY, all of them.
+
+ Other item types are ignored.
+};
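
The rule/match semantics described in 10.3 (every rule inside one match must be satisfied, and one passing match is enough) can be sketched in C as below. This is only an illustration, not the kernel's code: the ex_* types are hypothetical stand-ins, only ID and NAME rules are modelled, and the sender is assumed to own a single name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of a broadcast message's sender. */
struct ex_msg {
	uint64_t sender_id;
	const char *sender_name;   /* simplification: one owned name */
};

enum ex_rule_type { EX_RULE_ID, EX_RULE_NAME };

/* One rule, as created from one item of the match command. */
struct ex_rule {
	enum ex_rule_type type;
	uint64_t id;               /* used by EX_RULE_ID */
	const char *name;          /* used by EX_RULE_NAME */
};

/* One match: the rules submitted together in a single ioctl. */
struct ex_match {
	const struct ex_rule *rules;
	size_t n_rules;
};

static bool ex_rule_satisfied(const struct ex_rule *r, const struct ex_msg *msg)
{
	switch (r->type) {
	case EX_RULE_ID:
		return msg->sender_id == r->id;
	case EX_RULE_NAME:
		return strcmp(msg->sender_name, r->name) == 0;
	}
	return false;
}

/* A match passes only if all of its rules are satisfied. */
static bool ex_match_passes(const struct ex_match *m, const struct ex_msg *msg)
{
	for (size_t i = 0; i < m->n_rules; i++)
		if (!ex_rule_satisfied(&m->rules[i], msg))
			return false;
	return true;
}

/* The message is delivered if at least one installed match passes. */
static bool ex_conn_accepts(const struct ex_match *matches, size_t n_matches,
			    const struct ex_msg *msg)
{
	assert(msg != NULL);
	for (size_t i = 0; i < n_matches; i++)
		if (ex_match_passes(&matches[i], msg))
			return true;
	return false;
}
```

With no matches installed, ex_conn_accepts() returns false, which mirrors the statement in 10.1 that a connection without rules receives no broadcasts.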
+
+
+10.4 Bloom filters
+------------------
+
+Bloom filters allow checking whether a given word is present in a dictionary.
+This allows connections to set up a mask for information they are interested in,
+and to be delivered broadcast messages that have a matching filter.
+
+For general information on bloom filters, see
+
+ https://en.wikipedia.org/wiki/Bloom_filter
+
+The size of the bloom filter is defined per bus when it is created, in
+kdbus_bloom_parameter.size. All bloom filters attached to broadcast messages
+on the bus must match this size, and all bloom filter matches uploaded by
+connections must also match the size, or a multiple thereof (see below).
+
+The calculation of the mask has to be done on the userspace side. The kernel
+just checks the bitmasks to decide whether or not to let the message pass. All
+bits set in the filter must also be set in the mask, which is checked with
+bit-wise AND logic, but the mask may have more bits set than the filter.
+Consequently, false positive matches are expected to happen, and userspace must
+deal with that fact.
+
+Masks are entities that are always passed to the kernel as part of a match
+(with an item of type KDBUS_ITEM_BLOOM_MASK), and filters can be attached to
+broadcast messages (with an item of type KDBUS_ITEM_BLOOM_FILTER).
+
+For a broadcast to match, all set bits in the filter have to be set in the
+installed match mask as well. For example, consider a bus with a bloom size of
+8 bytes and the following mask/filter combinations:
+
+ filter 0x0101010101010101
+ mask 0x0101010101010101
+ -> matches
+
+ filter 0x0303030303030303
+ mask 0x0101010101010101
+ -> doesn't match
+
+ filter 0x0101010101010101
+ mask 0x0303030303030303
+ -> matches
+
+Hence, in order to catch all messages, a mask filled with 0xff bytes can be
+installed as a wildcard match rule.
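
The comparison described above can be written down as a small C sketch. It follows the rule as stated in this section (every bit set in the filter must also be set in the mask); the function name and the choice of 64-bit words are assumptions made for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if every bit set in the filter is also set in the mask,
 * i.e. (filter & mask) == filter for each 64-bit word.
 * n_words is assumed to be the bus's bloom size divided by 8.
 */
static bool ex_bloom_match(const uint64_t *filter, const uint64_t *mask,
			   size_t n_words)
{
	assert(filter != NULL && mask != NULL);
	for (size_t i = 0; i < n_words; i++)
		if ((filter[i] & mask[i]) != filter[i])
			return false;
	return true;
}
```

Applied to the three filter/mask pairs listed above, the function returns true, false and true respectively, and a mask of all 0xff bytes accepts any filter.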
+
+Uploaded matches may contain multiple masks, each of which has the bloom size
+defined by the bus. Each block of a mask is called a 'generation',
+starting at index 0.
+
+At match time, when a broadcast message is about to be delivered, a bloom
+mask generation is passed, which denotes which of the bloom masks the filter
+should be matched against. This allows userspace to provide backward compatible
+masks at upload time, while older clients can still match against older
+versions of filters.
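
One possible reading of the generation lookup, as a C sketch. The layout (generation blocks stored back to back, generation 0 first) follows the text; how "closest matching" is resolved when the requested generation was not uploaded is an assumption here (fall back to the newest uploaded block), and ex_select_mask is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * masks holds n_generations blocks of n_words 64-bit words each.
 * Returns the block a filter of the given generation would be compared
 * against, or NULL if no mask was uploaded at all.
 * Assumption: an unknown (too new) generation falls back to the newest
 * block that was uploaded.
 */
static const uint64_t *ex_select_mask(const uint64_t *masks,
				      size_t n_generations, size_t n_words,
				      uint64_t generation)
{
	assert(n_words > 0);
	if (masks == NULL || n_generations == 0)
		return NULL;
	if (generation >= n_generations)
		generation = n_generations - 1;
	return masks + (size_t)generation * n_words;
}
```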
+
+
+10.5 Removing a match
+--------------------
+
+Matches can be removed through the KDBUS_CMD_MATCH_REMOVE ioctl, which again
+takes struct kdbus_cmd_match as argument, but its fields are used slightly
+differently.
+
+struct kdbus_cmd_match {
+ __u64 size;
+ The overall size of the struct. As it has no items in this use case, the
+ value should yield 16.
+
+ __u64 cookie;
+ The cookie of the match, as it was passed when the match was added.
+ All matches that have this cookie will be removed.
+
+ __u64 flags;
+ Unused for this use case.
+
+ __u64 kernel_flags;
+ Valid flags for this command, returned by the kernel upon each call.
+
+ struct kdbus_item items[0];
+ Unused for this use case.
+};
+
+
+11. Policy
+===============================================================================
+
+A policy database restricts the possibilities of connections to own, see and
+talk to well-known names. It can be associated with a bus (through a policy
+holder connection) or a custom endpoint.
+
+See section 8.1 for more details on the validity of well-known names.
+
+Default endpoints of buses always have a policy database. The default
+policy is to deny all operations except for operations that are covered by
+implicit policies. Custom endpoints always have a policy, and by default,
+a policy database is empty. Therefore, unless policy rules are added, all
+operations will also be denied by default.
+
+See section 11.5 for more details on implicit policies.
+
+A set of policy rules is described by a name and multiple access rules, defined
+by the following struct.
+
+struct kdbus_policy_access {
+ __u64 type; /* USER, GROUP, WORLD */
+ One of the following.
+
+ KDBUS_POLICY_ACCESS_USER
+ Grant access to a user with the uid stored in the 'id' field.
+
+ KDBUS_POLICY_ACCESS_GROUP
+ Grant access to a user with the gid stored in the 'id' field.
+
+ KDBUS_POLICY_ACCESS_WORLD
+ Grant access to everyone. The 'id' field is ignored.
+
+ __u64 access; /* OWN, TALK, SEE */
+ The access to grant.
+
+ KDBUS_POLICY_SEE
+ Allow the name to be seen.
+
+ KDBUS_POLICY_TALK
+ Allow the name to be talked to.
+
+ KDBUS_POLICY_OWN
+ Allow the name to be owned.
+
+ __u64 id;
+ For KDBUS_POLICY_ACCESS_USER, stores the uid.
+ For KDBUS_POLICY_ACCESS_GROUP, stores the gid.
+};
+
+Policies are set through KDBUS_CMD_HELLO (when creating a policy holder
+connection), KDBUS_CMD_CONN_UPDATE (when updating a policy holder connection),
+KDBUS_CMD_ENDPOINT_MAKE (creating a custom endpoint) or
+KDBUS_CMD_ENDPOINT_UPDATE (updating a custom endpoint). In all cases, the name
+and policy access information is stored in items of type KDBUS_ITEM_NAME and
+KDBUS_ITEM_POLICY_ACCESS. For this transport, the following rules apply.
+
+ * An item of type KDBUS_ITEM_NAME must be followed by at least one
+ KDBUS_ITEM_POLICY_ACCESS item
+ * An item of type KDBUS_ITEM_NAME can be followed by an arbitrary number of
+ KDBUS_ITEM_POLICY_ACCESS items
+ * An arbitrary number of groups of names and access levels can be passed
+
+uids and gids are internally always stored in the kernel's view of global ids,
+and are translated back and forth on the ioctl level accordingly.
+
+
+11.2 Wildcard names
+-------------------
+
+Policy holder connections may upload names that contain the wildcard suffix
+(".*"). That way, a policy can be uploaded that is effective for every
+well-known name that extends the provided name by exactly one more level.
+
+For example, if an item in a set of uploaded policy rules contains the name
+"foo.bar.*", both "foo.bar.baz" and "foo.bar.bazbaz" are valid, but
+"foo.bar.baz.baz" is not.
+
+This allows connections to take control over multiple names that the policy
+holder doesn't need to know about when uploading the policy.
+
+Such wildcard entries are not allowed for custom endpoints.
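
The "exactly one more level" rule from 11.2 could be checked in userspace roughly as follows. This is a sketch based only on the examples given in the text; ex_policy_name_match is a hypothetical helper, and a policy name without the ".*" suffix is assumed to require an exact match.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool ex_policy_name_match(const char *pattern, const char *name)
{
	assert(pattern != NULL && name != NULL);
	size_t plen = strlen(pattern);

	/* No wildcard suffix: require an exact match. */
	if (plen < 2 || strcmp(pattern + plen - 2, ".*") != 0)
		return strcmp(pattern, name) == 0;

	/* Compare everything up to and including the final '.'. */
	size_t prefix = plen - 1;
	if (strncmp(pattern, name, prefix) != 0)
		return false;

	/* The remainder must be one non-empty label without further dots. */
	const char *rest = name + prefix;
	return *rest != '\0' && strchr(rest, '.') == NULL;
}
```

With the pattern "foo.bar.*", this accepts "foo.bar.baz" and "foo.bar.bazbaz" and rejects "foo.bar.baz.baz", as in the example above.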
+
+
+11.3 Policy example
+-------------------
+
+
+ KDBUS_ITEM_NAME: str='org.foo.bar'
+ KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=1000
+ KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, id=1001
+ KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
+ KDBUS_ITEM_NAME: str='org.blah.baz'
+ KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, id=0
+ KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
+
+That means that 'org.foo.bar' may only be owned by uid 1000, but every user on
+the bus is allowed to see the name. However, only uid 1001 may actually send
+a message to the connection and receive a reply from it.
+
+The second rule allows 'org.blah.baz' to be owned by uid 0 only, but every user
+may talk to it.
+
+
+11.4 TALK access and multiple well-known names per connection
+-------------------------------------------------------------
+
+Note that TALK access is checked against all names of a connection.
+For example, if a connection owns both 'org.foo.bar' and 'org.blah.baz', and
+the policy database allows 'org.blah.baz' to be talked to by WORLD, then this
+permission is also granted to 'org.foo.bar'. That might sound illogical, but
+after all, we allow messages to be directed to either the unique ID or a
+well-known name, and policy is applied to the connection, not the name. In
+other words, the effective TALK policy for a connection is the most permissive
+of all names the connection owns.
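
The "most permissive of all names" rule amounts to a logical OR over the names a destination owns. A minimal C sketch, assuming the per-name policy lookup has already been done and its result stored in a hypothetical ex_owned_name record (implicit policies from 11.5 are not modelled):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-name result: whether the policy entry for this name
 * grants TALK access to the sender in question.
 */
struct ex_owned_name {
	const char *name;
	bool sender_may_talk;
};

/* TALK is granted if any name owned by the destination grants it. */
static bool ex_effective_talk(const struct ex_owned_name *names, size_t n)
{
	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (names[i].sender_may_talk)
			return true;
	return false;
}
```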
+
+If a policy database exists for a bus (because a policy holder created one on
+demand) or for a custom endpoint (which always has one), each one is consulted
+during name registry listing, name owning or message delivery. If either check
+fails, the operation fails with -EPERM.
+
+For best practices, connections that own names with a restricted TALK
+access should not install matches. This avoids cases where the sent
+message may pass the bloom filter due to false-positives and may also
+satisfy the policy rules.
+
+11.5 Implicit policies
+----------------------
+
+Depending on the type of the endpoint, a set of implicit rules might be
+enforced. On default endpoints, the following set is enforced:
+
+ * Privileged connections always override any installed policy. Those
+ connections could easily install their own policies, so there is no
+ reason to enforce installed policies.
+ * Connections can always talk to connections of the same user. This
+ includes broadcast messages.
+ * Connections that own names might send broadcast messages to other
+ connections that belong to a different user, but only if that
+ destination connection does not own any name.
+
+On custom endpoints, the following set is enforced:
+
+ * Policy rules are always enforced, even if the connection is a privileged
+ connection.
+ * Policy rules are always enforced for TALK access, even if both ends are
+ running under the same user. This includes broadcast messages.
+ * To restrict the set of names that can be seen, endpoint policies can
+ install "SEE" policies.
+
+
+12. Pool
+===============================================================================
+
+A pool for data received from the kernel is installed for every connection of
+the bus, and is sized according to kdbus_cmd_hello.pool_size. It is accessed
+when one of the following ioctls is issued:
+
+ * KDBUS_CMD_MSG_RECV, to receive a message
+ * KDBUS_CMD_NAME_LIST, to dump the name registry
+ * KDBUS_CMD_CONN_INFO, to retrieve information on a connection
+
+Internally, the pool is organized in slices, stored in an rb-tree. The offsets
+returned by either one of the aforementioned ioctls describe offsets inside the
+pool. In order to make the slice available for subsequent calls, KDBUS_CMD_FREE
+has to be called on the offset.
+
+To access the memory, the caller is expected to mmap() it into its task, like
+this:
+
+ /*
+ * POOL_SIZE has to be a multiple of PAGE_SIZE, and it must match the
+ * value that was previously passed in the .pool_size field of struct
+ * kdbus_cmd_hello.
+ */
+
+ buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_PRIVATE, conn_fd, 0);
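
Two small checks that userspace might want around the pool, sketched in C. The size rule (non-zero, multiple of the page size) comes from the comment above and the error list in section 14; the helper names are hypothetical, and the second function only guards against offsets that lie outside the mapped range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if pool_size is non-zero and a multiple of page_size. */
static bool ex_pool_size_valid(uint64_t pool_size, uint64_t page_size)
{
	return pool_size != 0 && page_size != 0 && pool_size % page_size == 0;
}

/*
 * Translates an offset returned by one of the ioctls into a pointer
 * inside the mapped pool, or NULL if the offset is out of range.
 */
static const void *ex_pool_at(const void *buf, uint64_t pool_size,
			      uint64_t offset)
{
	assert(buf != NULL);
	if (offset >= pool_size)
		return NULL;
	return (const uint8_t *)buf + offset;
}
```

In a real program page_size would typically come from sysconf(_SC_PAGESIZE), and the slice would be released with KDBUS_CMD_FREE once it has been consumed.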
+
+
+13. Metadata
+===============================================================================
+
+When a message is delivered to a receiver connection, it is augmented by
+metadata items in accordance with the destination's current attach flags. The
+information stored in those metadata items refers to the sender task at the
+time of sending the message, so even if any detail of the sender task has
+already changed upon message reception (or if the sender task does not exist
+anymore), the information is still preserved and won't be modified until the
+message is freed.
+
+There are, however, two exceptions to that rule:
+
+ a) Kernel generated messages don't have a source connection, so they won't be
+ augmented.
+
+ b) If a connection was created with faked credentials (see section 6.2),
+ the only attached metadata items are the ones provided by the connection
+ itself. The destination's attach_flags won't be looked at in such cases.
+
+Also, there are two things to be considered by userspace programs regarding
+metadata:
+
+ a) Userspace must cope with the fact that it might get more metadata than
+ it requested. That happens, for example, when a broadcast message is
+ sent and receivers have different attach flags. Items that haven't been
+ requested should hence be silently ignored.
+
+ b) Userspace might not always get all metadata items that it
+ requested. That is because some of those items are only added if a
+ corresponding kernel feature has been enabled. Also, the two exceptions
+ described above will lead to fewer items being attached than
+ requested.
+
+
+13.1 Known item types
+---------------------
+
+The following attach flags are currently supported.
+
+ KDBUS_ATTACH_TIMESTAMP
+ Attaches an item of type KDBUS_ITEM_TIMESTAMP which contains both the
+ monotonic and the realtime timestamp, taken when the message was
+ processed on the kernel side.
+
+ KDBUS_ATTACH_CREDS
+ Attaches an item of type KDBUS_ITEM_CREDS, containing credentials as
+ described in kdbus_creds: the uid, gid, pid, tid and starttime of the task.
+
+ KDBUS_ATTACH_AUXGROUPS
+ Attaches an item of type KDBUS_ITEM_AUXGROUPS, containing a dynamic
+ number of auxiliary groups the sending task was a member of.
+
+ KDBUS_ATTACH_NAMES
+ Attaches items of type KDBUS_ITEM_NAME, one for each name the sending
+ connection currently owns. The name is stored in kdbus_item.str for each
+ of them.
+
+ KDBUS_ATTACH_COMM
+ Attaches items of type KDBUS_ITEM_PID_COMM and KDBUS_ITEM_TID_COMM,
+ both transporting the sending task's 'comm', for both the pid and the tid.
+ The strings are stored in kdbus_item.str.
+
+ KDBUS_ATTACH_EXE
+ Attaches an item of type KDBUS_ITEM_EXE, containing the path to the
+ executable of the sending task, stored in kdbus_item.str.
+
+ KDBUS_ATTACH_CMDLINE
+ Attaches an item of type KDBUS_ITEM_CMDLINE, containing the command line
+ arguments of the sending task, as an array of strings, stored in
+ kdbus_item.str.
+
+ KDBUS_ATTACH_CGROUP
+ Attaches an item of type KDBUS_ITEM_CGROUP with the task's cgroup path.
+
+ KDBUS_ATTACH_CAPS
+ Attaches an item of type KDBUS_ITEM_CAPS, carrying sets of capabilities
+ that should be accessed via kdbus_item.caps.caps. Also, userspace should
+ be written in a way that it takes kdbus_item.caps.last_cap into account,
+ and derive the number of sets and rows from the item size and the reported
+ number of valid capability bits.
+
+ KDBUS_ATTACH_SECLABEL
+ Attaches an item of type KDBUS_ITEM_SECLABEL, which contains the SELinux
+ security label of the sending task. Access via kdbus_item.str.
+
+ KDBUS_ATTACH_AUDIT
+ Attaches an item of type KDBUS_ITEM_AUDIT, which contains the audit label
+ of the sending task. Access via kdbus_item.str.
+
+ KDBUS_ATTACH_CONN_NAME
+ Attaches an item of type KDBUS_ITEM_CONN_NAME that contains the
+ sending connection's current name in kdbus_item.str.
+
+
+13.2 Metadata and namespaces
+----------------------------
+Note that if the user or PID namespaces of a connection at the time of sending
+differ from those that were active when the connection was created
+(KDBUS_CMD_HELLO), data structures such as messages will not have any metadata
+attached to prevent leaking security-relevant information.
+
+
+14. Error codes
+===============================================================================
+
+Below is a list of error codes that might be returned by the individual
+ioctl commands. The list focuses on the return values from kdbus code itself,
+and might not cover those of all kernel internal functions.
+
+
+ -ENOMEM The kernel memory is exhausted
+ -ENOTTY Illegal ioctl command issued for the file descriptor
+ -ENOSYS The requested functionality is not available
+
+
+ -EFAULT The supplied data pointer was not 64-bit aligned, or was
+ inaccessible from the kernel side.
+ -EINVAL The size inside the supplied struct was smaller than expected
+ -EMSGSIZE The size inside the supplied struct was bigger than expected
+ -ENAMETOOLONG A supplied name is larger than the allowed maximum size
+
+
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid or
+ the supplied name does not start with the current uid and a '-'
+ -EEXIST A bus of that name already exists
+ -ESHUTDOWN The domain for the bus is already shut down
+ -EMFILE The maximum number of buses for the current user is exhausted
+
+
+ -EPERM The calling user does not have CAP_IPC_OWNER set, or
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid, or
+ no name supplied for top-level domain
+ -EEXIST A domain of that name already exists
+
+
+ -EPERM The calling user is not privileged (see Terminology)
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid
+ -EEXIST An endpoint of that name already exists
+
+
+ -EFAULT The supplied pool size was 0 or not a multiple of the page size
+ -EINVAL The flags supplied in the kdbus_cmd_make struct are invalid, or
+ an illegal combination of KDBUS_HELLO_MONITOR,
+ KDBUS_HELLO_ACTIVATOR and KDBUS_HELLO_POLICY_HOLDER was passed
+ in the flags, or an invalid set of items was supplied
+ -EPERM A KDBUS_ITEM_CREDS item was supplied, but the current user is
+ not privileged
+ -ESHUTDOWN The bus has already been shut down
+ -EMFILE The maximum number of connections on the bus has been reached
+
+
+ -EALREADY The connection has already been shut down
+ -EBUSY There are still messages queued up in the connection's pool
+
+
+ -EOPNOTSUPP The connection is unconnected, or a fd was passed that is
+ either a kdbus handle itself or a unix domain socket. Both are
+ currently unsupported.
+ -EINVAL The submitted payload type is KDBUS_PAYLOAD_KERNEL,
+ KDBUS_MSG_FLAGS_EXPECT_REPLY was set without a timeout value,
+ KDBUS_MSG_FLAGS_SYNC_REPLY was set without
+ KDBUS_MSG_FLAGS_EXPECT_REPLY, an invalid item was supplied,
+ src_id was != 0 and different from the current connection's ID,
+ a supplied memfd had a size of 0, a string was not properly
+ nul-terminated
+ -ENOTUNIQ KDBUS_MSG_FLAGS_EXPECT_REPLY was set, but the dst_id is set
+ to KDBUS_DST_ID_BROADCAST
+ -E2BIG Too many items
+ -EMSGSIZE A payload vector was too big, and the current user is
+ unprivileged.
+ -ENOTUNIQ A fd or memfd payload was passed in a broadcast message, or
+ a timeout was given for a broadcast message
+ -EEXIST Multiple KDBUS_ITEM_FDS, KDBUS_ITEM_BLOOM_FILTER or
+ KDBUS_ITEM_DST_NAME items were supplied
+ -EBADF A memfd item contained an illegal fd
+ -EMEDIUMTYPE A file descriptor which is not a kdbus memfd was
+ refused when sent as KDBUS_MSG_PAYLOAD_MEMFD.
+ -EMFILE Too many file descriptors inside a KDBUS_ITEM_FDS
+ -EBADMSG An item had illegal size, both a dst_id and a
+ KDBUS_ITEM_DST_NAME were given, or both a name and a bloom
+ filter were given
+ -ETXTBSY A kdbus memfd file cannot be sealed or the seal removed,
+ because it is shared with other processes or still mmap()ed
+ -ECOMM A peer does not accept the file descriptors addressed to it
+ -EFAULT The supplied bloom filter size was not 64-bit aligned
+ -EDOM The supplied bloom filter size did not match the bloom filter
+ size of the bus
+ -EDESTADDRREQ dst_id was set to KDBUS_DST_ID_NAME, but no KDBUS_ITEM_DST_NAME
+ was attached
+ -ESRCH The name to look up was not found in the name registry
+ -EADDRNOTAVAIL KDBUS_MSG_FLAGS_NO_AUTO_START was given but the destination
+ connection is an activator.
+ -ENXIO The passed numeric destination connection ID couldn't be found,
+ or is not connected
+ -ECONNRESET The destination connection is no longer active
+ -ETIMEDOUT Timeout while synchronously waiting for a reply
+ -EINTR System call interrupted while synchronously waiting for a reply
+ -EPIPE When sending a message, a synchronous reply from the receiving
+ connection was expected but the connection died before
+ answering
+ -ECANCELED A synchronous message sending was cancelled
+ -ENOBUFS Too many pending messages on the receiver side
+ -EREMCHG Both a well-known name and a unique name (ID) were given, but
+ the name is not currently owned by that connection.
+
+
+ -EINVAL Invalid flags or offset
+ -EAGAIN No message found in the queue
+ -ENOMSG No message of the requested priority found
+
+
+ -EINVAL Invalid flags
+ -ENOENT Pending message with the supplied cookie not found
+
+
+ -ENXIO No pool slice found at given offset
+ -EINVAL Invalid flags provided, the offset is valid, but the user is
+ not allowed to free the slice. This happens, for example, if
+ the offset was retrieved with KDBUS_RECV_PEEK.
+
+
+ -EINVAL Illegal command flags, illegal name provided, or an activator
+ tried to acquire a second name
+ -EPERM Policy prohibited name ownership
+ -EALREADY Connection already owns that name
+ -EEXIST The name already exists and can not be taken over
+ -ECONNRESET The connection was reset during the call
+
+
+ -EINVAL Invalid command flags, or invalid name provided
+ -ESRCH Name is not found in the registry
+ -EADDRINUSE Name is owned by a different connection and can't be released
+
+
+ -EINVAL Invalid flags
+ -ENOBUFS No available memory in the connection's pool.
+
+
+ -EINVAL Invalid flags, or neither an ID nor a name was provided,
+ or the name is invalid.
+ -ESRCH Connection lookup by name failed
+ -ENXIO No connection with the provided numeric connection ID found
+
+
+ -EINVAL Illegal flags or items
+ -EOPNOTSUPP Operation not supported by connection.
+ -E2BIG Too many policy items attached
+ -EINVAL Wildcards submitted in policy entries, or illegal sequence
+ of policy items
+
+
+ -E2BIG Too many policy items attached
+ -EINVAL Invalid flags, or wildcards submitted in policy entries,
+ or illegal sequence of policy items
+
+
+ -EINVAL Illegal flags or items
+ -EDOM Illegal bloom filter size
+ -EMFILE Too many matches for this connection
+
+
+ -EINVAL Illegal flags
+ -ENOENT A match entry with the given cookie could not be found.
--
Peter Meerwald
+43-664-2444418 (mobile)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Greg Kroah-Hartman
2014-11-02 01:30:48 UTC
Post by Peter Meerwald
Post by Greg Kroah-Hartman
kdbus is a system for low-latency, low-overhead, easy to use
interprocess communication (IPC).
The interface to all functions in this driver is implemented through ioctls
on /dev nodes. This patch adds detailed documentation about the kernel
level API design.
just some typos below
<snip>

Many thanks for the fixes, I've made them all to the file now, it will
show up in the next version we send out.

greg k-h
Andy Lutomirski
2014-10-29 22:16:24 UTC
(reply 1/2 -- I'm replying twice to keep the threading sane)

On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
* Support for multiple domains, completely separated from each other,
allowing multiple virtualized instances to be used at the same time.
Given that there is no such thing as a device namespace, how does this work?

The docs seem a bit confusing to me as to whether there's a hierarchy
of domains. Do domains have a concept of a parent?

What's "container-name"?

Given that domains have random IDs, how can they be checkpointed and restored?

--Andy
Greg Kroah-Hartman
2014-10-29 22:28:45 UTC
Post by Andy Lutomirski
(reply 1/2 -- I'm replying twice to keep the threading sane)
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
* Support for multiple domains, completely separated from each other,
allowing multiple virtualized instances to be used at the same time.
Given that there is no such thing as a device namespace, how does this work?
See the document for the details.
Post by Andy Lutomirski
The docs seem a bit confusing to me as to whether there's a hierarchy
of domains. Do domains have a concept of a parent?
Yes.
Post by Andy Lutomirski
What's "container-name"?
Is that used in the documentation?
Post by Andy Lutomirski
Given that domains have random IDs, how can they be checkpointed and restored?
Good question, I don't know about checkpoint/restore, but I think that
has been done. Daniel would know more than I do about that.

thanks,

greg k-h
Andy Lutomirski
2014-10-29 22:34:52 UTC
On Wed, Oct 29, 2014 at 3:27 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
(reply 1/2 -- I'm replying twice to keep the threading sane)
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
* Support for multiple domains, completely separated from each other,
allowing multiple virtualized instances to be used at the same time.
Given that there is no such thing as a device namespace, how does this work?
See the document for the details.
Post by Andy Lutomirski
The docs seem a bit confusing to me as to whether there's a hierarchy
of domains. Do domains have a concept of a parent?
Yes.
Why? Aren't they completely isolated? Confused.
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
What's "container-name"?
Is that used in the documentation?
/dev/kdbus/domain/<container-name>/+ directory shows up inside the
domain as /dev/kdbus/.

I guess that's the thing that the creator requests.

--Andy
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
Given that domains have random IDs, how can they be checkpointed and restored?
Good question, I don't know about checkpoint/restore, but I think that
has been done. Daniel would know more than I do about that.
thanks,
greg k-h
--
Andy Lutomirski
AMA Capital Management, LLC
Andy Lutomirski
2014-10-30 02:28:22 UTC
On Wed, Oct 29, 2014 at 3:27 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
(reply 1/2 -- I'm replying twice to keep the threading sane)
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
* Support for multiple domains, completely separated from each other,
allowing multiple virtualized instances to be used at the same time.
Given that there is no such thing as a device namespace, how does this work?
See the document for the details.
They seem insufficient to me, so I tried to dig in to the code. My
understanding is:

The parent container has /dev mounted. It sends an IOCTL (which
requires global capabilities). In response, kdbus creates a whole
bunch of devices that get put (by udev or devtmpfs, I presume) in a
subdirectory. Then the parent container mounts that subdirectory in
the new container.

This is IMO rather problematic.

First, it enforces the existence of a kdbus domain hierarchy where
none should be needed.

Second, it's incompatible with nested user namespaces. The middle
namespace can't issue the ioctl.

Third, it requires a devtmpfs submount in the child container. This
scares me, especially since there are no device namespaces. Also, the
child container appears to be dependent on the host udev to arbitrate
everything, which seems totally wrong to me. (Also, now we're exposed
to attacks where the child container creates busses or endpoints or
whatever with malicious names to try to trick the host into screwing
up.)

ISTM this should be solved either with device namespaces (which is
well known to be a giant can of worms) or by abandoning the concept of
kdbus using device nodes entirely.

What if kdbus were kdbusfs? If you want to use it in a container, you
mount a brand-new kdbusfs there. No weird domain hierarchy, no global
privilege, no need to name containers, obvious migration semantics, no
dependence on udev/devtmpfs at all, etc.

Eric, any thoughts here?

--Andy
Eric W. Biederman
2014-10-30 04:22:14 UTC
Permalink
The userspace API breaks userspace in an unfixable way.

Nacked-by: "Eric W. Biederman" <***@xmission.com>

Problem the first.
- Using global names for containers makes it impossible to create
unprivileged containers.

This is a back to the drawing board problem, and makes device
nodes fundamentally unsuited to what you are doing.

There is no way that I can see to make it safe for an unprivileged
user to create arbitrary named busses. Especially in the presence
of allowing unprivileged checkpoint/restart.

This is particularly bad as kdbus explicitly allows unprivileged
creation of new kdbus instances.

This problem is a userspace regression.

Problem the second.
- The security checks in the code are not based on who opens the
file descriptors but instead based on who is using the file
descriptors at any given moment.

That pattern has been shown to be exploitable.

I expect the policy database makes this poor choice of permission
checks even worse. Pass a more privileged user a kdbus file
descriptor and all of a sudden things that were not possible on
that file descriptor become possible.

Problem the third.
- You are using device numbers for things created by unprivileged
users. That breaks checkpoint/restart. Aka CRIU.

We can not migrate a container to a new machine and preserve the
device numbers.

We can not migrate a container to a new machine and have any hope
of preserving the container paths under /dev/kdbus/...

Both of which look like fundamental show stoppers for
checkpoint/restart.
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:27 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
(reply 1/2 -- I'm replying twice to keep the threading sane)
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
* Support for multiple domains, completely separated from each other,
allowing multiple virtualized instances to be used at the same time.
Given that there is no such thing as a device namespace, how does this work?
See the document for the details.
They seem insufficient to me, so I tried to dig in to the code. My
understanding is:
The parent container has /dev mounted. It sends an IOCTL (which
requires global capabilities). In response, kdbus creates a whole
bunch of devices that get put (by udev or devtmpfs, I presume) in a
subdirectory. Then the parent container mounts that subdirectory in
the new container.
This is IMO rather problematic.
First, it enforces the existence of a kdbus domain hierarchy where
none should be needed.
Second, it's incompatible with nested user namespaces. The middle
namespace can't issue the ioctl.
Third, it requires a devtmpfs submount in the child container. This
scares me, especially since there are no device namespaces. Also, the
child container appears to be dependent on the host udev to arbitrate
everything, which seems totally wrong to me. (Also, now we're exposed
to attacks where the child container creates busses or endpoints or
whatever with malicious names to try to trick the host into screwing
up.)
ISTM this should be solved either with device namespaces (which is
well known to be a giant can of worms) or by abandoning the concept of
kdbus using device nodes entirely.
What if kdbus were kdbusfs? If you want to use it in a container, you
mount a brand-new kdbusfs there. No weird domain hierarchy, no global
privilege, no need to name containers, obvious migration semantics, no
dependence on udev/devtmpfs at all, etc.
Eric, any thoughts here?
I think a kdbusfs modeled on devpts with newinstance at
mount time would solve the naming problems.

That would break one of the current kdbus use cases that allows an
unprivileged user to create a bus.

Eric

p.s. Please excuse my brevity, I am in the middle of packing up my
possessions (including my main machine), as I move this week.


Tom Gundersen
2014-10-30 10:16:09 UTC
Permalink
Hi Eric,

On Thu, Oct 30, 2014 at 5:20 AM, Eric W. Biederman
Post by Eric W. Biederman
The userspace API breaks userspace in an unfixable way.
Problem the first.
- Using global names for containers makes it impossible to create
unprivileged containers.
I don't follow.

Just so we are on the same page:
- creating a domain per container is only a convention, and has to
be done manually. I.e., the worst case scenario is that you are able
to create some container which cannot get a corresponding kdbus
domain.
- domain names are only unique per parent-domain, and domains are
fully recursive. We explicitly tested recursive domains by running
kdbus-enabled containers within kdbus-enabled containers, a number of
iterations deep.

Could you explain the problem you see in more detail? This might just
be a documentation issue, after all.
Post by Eric W. Biederman
This is a back to the drawing board problem, and makes device
nodes fundamentally unsuited to what you are doing.
There is no way that I can see to make it safe for an unprivileged
user to create arbitrary named busses. Especially in the presence
of allowing unprivileged checkpoint/restart.
Note that unprivileged users cannot create arbitrary named busses, the
names must have the format $PID-<arbitrary name>. Do you see a problem
with this?
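The naming rule described above can be sketched as a simple prefix check. This is only an illustration of the stated format, not the kdbus implementation: the function name `bus_name_has_prefix` and the `id` parameter are hypothetical, and the sketch deliberately takes the number as an argument because the message does not say in which namespace it is interpreted.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `name` starts with "<id>-" and has at
 * least one character after the dash, 0 otherwise.
 */
int bus_name_has_prefix(long id, const char *name)
{
	char prefix[32];
	int len = snprintf(prefix, sizeof(prefix), "%ld-", id);

	/* Refuse if the prefix could not be formatted or was truncated. */
	if (len < 0 || (size_t)len >= sizeof(prefix))
		return 0;

	/* The name must begin with the prefix and continue past it. */
	return strncmp(name, prefix, (size_t)len) == 0 && name[len] != '\0';
}
```

With `id` 1000, a name such as "1000-user" passes, while "100-user", "10000-x" and the bare prefix "1000-" are rejected.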
Post by Eric W. Biederman
This is particularly bad as kdbus explicitly allows unprivileged
creation of new kdbus instances.
What do you mean by kdbus instance? A new domain? This is not allowed
by unprivileged processes. Or do you mean a new bus, in which case see
above.
Post by Eric W. Biederman
This problem is a userspace regression.
This is all new functionality, how does it affect current code?
Post by Eric W. Biederman
Problem the second.
- The security checks in the code are not based on who opens the
file descriptors but instead based on who is using the file
descriptors at any given moment.
That pattern has been shown to be exploitable.
I expect the policy database makes this poor choice of permission
checks even worse. Pass a more privileged user a kdbus file
descriptor and all of a sudden things that were not possible on
that file descriptor become possible.
Djalal already commented on this point in another thread. But just to
recap: Please note that we do not do read()/write() at all, only
ioctl's, so the most common exploits do not apply. Moreover, we are
following the same API pattern as used by other similar APIs in the
kernel. With that in mind, could you give some more specific
information about what kind of exploits you imagine?
Post by Eric W. Biederman
Problem the third.
- You are using device numbers for things created by unprivileged
users. That breaks checkpoint/restart. Aka CRIU.
We can not migrate a container to a new machine and preserve the
device numbers.
I must admit to not being too familiar with checkpoint/restart. What
precisely is the problem with unprivileged users?
Post by Eric W. Biederman
We can not migrate a container to a new machine and have any hope
of preserving the container paths under /dev/kdbus/...
You may not be able to preserve the full path, no, but the container
should not know/care about the parent paths anyway. Note that the
containers only see their own domain subtree mounted to /dev/kdbus,
they see nothing from the parent. Hence when you migrate containers
you can change the naming of the parent freely, but the processes
inside the containers won't see that, they'll have stable paths. I'm
not seeing the problem here, care to elaborate?
Post by Eric W. Biederman
I think a kdbusfs modeled on devpts with newinstance at
mount time would solve the naming problems.
Effectively, what we have in place in the current patch set delivers
similar semantics, however without introducing a new file system. You
just create a new domain and get a new subdir in /dev/kdbus/ for it,
and then inside the container you mount that subdir of /dev/kdbus onto
/dev/kdbus itself.
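A rough sketch of that last step, assuming the per-domain directory lives at /dev/kdbus/domain/<name> as quoted earlier in the thread. The helper names `domain_dir` and `bind_domain` are made up for illustration; `mount(2)` with `MS_BIND` is the ordinary Linux bind mount call, and it needs the appropriate privilege in the mount namespace where it runs.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build the host-side path of a domain directory.
 * The "/dev/kdbus/domain/<name>" layout is assumed from the thread.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
int domain_dir(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "/dev/kdbus/domain/%s", name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: bind-mount the domain directory onto /dev/kdbus so
 * that processes below that mount point only see their own domain.
 * Returns 0 on success, -1 on error (errno is set by mount(2)).
 */
int bind_domain(const char *name)
{
	char src[256];

	if (domain_dir(src, sizeof(src), name) < 0)
		return -1;

	return mount(src, "/dev/kdbus", NULL, MS_BIND, NULL);
}
```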

Do I understand you correctly that what you want is unnamed/anonymous
domains? Considering that domain creation is anyway privileged, why is
this necessary?
Post by Eric W. Biederman
That would break one of the current kdbus use cases that allows an
unprivileged user to create a bus.
That is a fundamental usecase, so I don't think it makes much sense to
do anything that precludes that.

Cheers,

Tom
Eric W. Biederman
2014-10-30 12:03:48 UTC
Permalink
Post by Tom Gundersen
Hi Eric,
On Thu, Oct 30, 2014 at 5:20 AM, Eric W. Biederman
Post by Eric W. Biederman
The userspace API breaks userspace in an unfixable way.
Problem the first.
- Using global names for containers makes it impossible to create
unprivileged containers.
I don't follow.
- creating a domain per container is only a convention, and has to
be done manually. I.e., the worst case scenario is that you are able
to create some container which cannot get a corresponding kdbus
domain.
Which is the classic definition of failure to restore a checkpoint. You
can't get the name you needed.
Post by Tom Gundersen
- domain names are only unique per parent-domain, and domains are
fully recursive. We explicitly tested recursive domains by running
kdbus-enabled containers within kdbus-enabled containers, a number of
iterations deep.
Could you explain the problem you see in more detail? This might just
be a documentation issue, after all.
Partly there is just a ridiculous amount of complexity in having
hierarchical names when there is fundamentally no hierarchy.

The problem I see is that creating a kdbus requires someone to grant you
privilege to do it. You have to ask permission from the system
administrator. For unprivileged containers you don't have to ask
permission to create one, you just need the appropriate support in your
kernel.

Given the fact you smash all of the names together in a hierarchy I
can't see how you can avoid requiring privilege for part of the
hierarchy creation.
Post by Tom Gundersen
Post by Eric W. Biederman
This is a back to the drawing board problem, and makes device
nodes fundamentally unsuited to what you are doing.
There is no way that I can see to make it safe for an unprivileged
user to create arbitrary named busses. Especially in the presence
of allowing unprivileged checkpoint/restart.
Note that unprivileged users cannot create arbitrary named busses, the
names must have the format $PID-<arbitrary name>. Do you see a problem
with this?
Yes. What pid namespace is that in?

How do I restore a checkpoint?
Post by Tom Gundersen
Post by Eric W. Biederman
This is particularly bad as kdbus explicitly allows unprivileged
creation of new kdbus instances.
What do you mean by kdbus instance? A new domain? This is not allowed
by unprivileged processes. Or do you mean a new bus, in which case see
above.
Oh great, two concepts: domains and busses. The bottom line is that if
I can't create both unprivileged, it is a regression in the
functionality of unprivileged containers.
Post by Tom Gundersen
Post by Eric W. Biederman
This problem is a userspace regression.
This is all new functionality, how does it affect current code?
If you simply change the existing dbus users to use kdbus you get a
regression in containers. Furthermore you get a regression in what
kinds of userspace a container can contain.
Post by Tom Gundersen
Post by Eric W. Biederman
Problem the second.
- The security checks in the code are not based on who opens the
file descriptors but instead based on who is using the file
descriptors at any given moment.
That pattern has been shown to be exploitable.
I expect the policy database makes this poor choice of permission
checks even worse. Pass a more privileged user a kdbus file
descriptor and all of a sudden things that were not possible on
that file descriptor become possible.
Djalal already commented on this point in another thread. But just to
recap: Please note that we do not do read()/write() at all, only
ioctl's, so the most common exploits do not apply. Moreover, we are
following the same API pattern as used by other similar APIs in the
kernel.
A pattern that has led to an exploitable kernel, because it breaks the
principle of least surprise.
Post by Tom Gundersen
With that in mind, could you give some more specific
information about what kind of exploits you imagine?
I don't know if it is exploitable or simply a maintenance disaster. But
the behavior of file descriptors changing based on who is performing
operations on it is wrong. It breaks the common unix expectations.

It means I can not pass a file descriptor into a strongly sandboxed
application and be able to predict what can be done with the file
descriptor in the sand box.

I suspect what you really want are system calls, as system calls both
have less overhead and make it easier to understand what is going on.
Especially for something as commonly used as kdbus is aiming to be,
ioctls seem like code obfuscation.

The easiest problem to trigger that I can imagine is an application that
calls setresuid will have unpredictable behavior if the change to its
effective uid happens between one call and the next of your ioctl.
Which can create subtle and difficult to find bugs.
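The difference between the two checking styles can be shown with a toy model. Everything here is hypothetical (the `struct handle` type, the function names, and the uid-equality policy); it is not kdbus code and only illustrates why a check made at use time gives answers that depend on who is calling.

```c
/* Toy model of a file handle that remembers who opened it. */
struct handle {
	unsigned int opener_uid;	/* captured once, at open time */
};

/*
 * Open-time style: the answer depends only on the credentials stored in
 * the handle, so it is the same no matter who later uses the handle.
 */
int allowed_open_time(const struct handle *h, unsigned int owner_uid)
{
	return h->opener_uid == owner_uid;
}

/*
 * Use-time style: the answer depends on the effective uid of whoever is
 * calling right now, so it changes when the handle is passed to another
 * process or when the caller changes its euid between two calls.
 */
int allowed_use_time(unsigned int caller_euid, unsigned int owner_uid)
{
	return caller_euid == owner_uid;
}
```

For a handle opened by uid 1000 on a resource owned by uid 0, the open-time check always refuses. The use-time check refuses while the caller's euid is 1000 and allows the same operation on the same handle once the caller's euid is 0.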

There are also all kinds of issues with respect to namespaces: if you
care about the namespace you are referring to, it has to be pinned at
open time.
Post by Tom Gundersen
Post by Eric W. Biederman
Problem the third.
- You are using device numbers for things created by unprivileged
users. That breaks checkpoint/restart. Aka CRIU.
We can not migrate a container to a new machine and preserve the
device numbers.
I must admit to not being too familiar with checkpoint/restart. What
precisely is the problem with unprivileged users?
Post by Eric W. Biederman
We can not migrate a container to a new machine and have any hope
of preserving the container paths under /dev/kdbus/...
You may not be able to preserve the full path, no, but the container
should not know/care about the parent paths anyway. Note that the
containers only see their own domain subtree mounted to /dev/kdbus,
they see nothing from the parent. Hence when you migrate containers
you can change the naming of the parent freely, but the processes
inside the containers won't see that, they'll have stable paths. I'm
not seeing the problem here, care to elaborate?
Domain creation.
Random path conflicts for no reason except we have two machines.
Post by Tom Gundersen
Post by Eric W. Biederman
I think a kdbusfs modeled on devpts with newinstance at
mount time would solve the naming problems.
Effectively, what we have in place in the current patch set delivers
similar semantics, however without introducing a new file system. You
just create a new domain and get a new subdir in /dev/kdbus/ for it,
and then inside the container you mount that subdir of /dev/kdbus onto
/dev/kdbus itself.
Do I understand you correctly that what you want is unnamed/anonymous
domains? Considering that domain creation is anyway privileged, why is
this necessary?
When an unprivileged user needs a new domain? If domains are unnamed
it is possible that their creation need not require privilege.

Anything that requires stopping and asking the system administrator
for something that I can do today with an unprivileged container
winds up being a regression, a design bug, and a showstopper.

Unless there is a massive miscommunication you have those kinds of
issues with the kdbus design.


I would love to hear different but it sounds like domains are a weird
partial solution for the fact you have crammed everything into a
hierarchy for no good reason.
Post by Tom Gundersen
Post by Eric W. Biederman
That would break one of the current kdbus use cases that allows an
unprivileged user to create a bus.
That is a fundamental usecase, so I don't think it makes much sense to
do anything that precludes that.
Eric

Andy Lutomirski
2014-10-30 13:49:11 UTC
Permalink
Post by Tom Gundersen
Do I understand you correctly that what you want is unnamed/anonymous
domains? Considering that domain creation is anyway privileged, why is
this necessary?
As an executive summary, this is the *problem*, not a mitigation.
Domain creation *should not require privilege*. You should be able to
do it in a user namespace in which you have appropriate capabilities
without needing systemd's (or whatever other daemon's) help from
outside.

Once you fix that (which may not have broken whatever you tested with
but will absolutely break anyone who tries to use this in LXC, Docker,
Sandstorm, etc. without awful hacks) then you will have all of the
problems that you've currently mitigated.

--Andy
Greg KH
2014-10-29 22:16:37 UTC
Permalink
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
{sigh}

I'll blame it on the jet-lag for the lack of [XX/12] markings on the
patches. I'll give it a day for review before resending if people
really want to know the ordering. It doesn't matter except for the
final patch that adds the code to the build file.

sorry about that,

greg k-h
Eric W. Biederman
2014-10-30 04:06:02 UTC
Permalink
Post by Greg KH
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
{sigh}
I'll blame it on the jet-lag for the lack of [XX/12] markings on the
patches. I'll give it a day for review before resending if people
really want to know the ordering. It doesn't matter except for the
final patch that adds the code to the build file.
sorry about that,
For what it is worth these patches are also poorly split up. Every
patch I looked at in detail had functions that were being introduced
that did not have callers.

That poor split up of the patches makes it difficult to see how
the functionality that is being introduced is being used.

Eric
Daniel Mack
2014-10-30 07:12:14 UTC
Permalink
Post by Eric W. Biederman
For what it is worth these patches are also poorly split up. Every
patch I looked at in detail had functions that were being introduced
that did not have callers.
Yes, we wanted to keep the reply threading cleaner and the individual
patches short. With a patch set that avoids introducing functions
without callers, each patch would have grown substantially. But I know
that's unusual to do it that way.
Post by Eric W. Biederman
That poor split up of the patches makes it difficult to see how
the functionality that is being introduced is being used.
Ok, I see. For now, I think it's probably easiest to pull the patches
from here, and then look at the resulting files directly:


https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/log/?h=kdbus

Other than that, please give us some time to respond to your longer
reply. Thanks for taking the time to write this up!


Daniel

Andy Lutomirski
2014-10-29 22:19:51 UTC
Permalink
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
* Attachment of trustable metadata to each message on demand, such as
the sending peer's timestamp, creds, auxgroups, comm, exe, cmdline,
cgroup path, capabilities, security label, audit information, etc,
each taken at the time the sender issued the ioctl to send the
message. Which of those are actually recorded and attached is
controlled by the receiving peer.
I think that each piece of trustable metadata needs to be explicitly
opted-in to by the sender at the time of capture. Otherwise you're
asking for lots of information leaks and privilege escalations. This
is especially important given that some of the items in the current
list could be rather sensitive.

NB: UNIX sockets get this wrong, too, but that doesn't mean that kdbus
gets to blindly follow SCM_CREDENTIALS's lead. Also, there is no
excuse here about legacy code that won't opt in when needed.

--Andy
Greg Kroah-Hartman
2014-10-29 22:26:46 UTC
Permalink
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
* Attachment of trustable metadata to each message on demand, such as
the sending peer's timestamp, creds, auxgroups, comm, exe, cmdline,
cgroup path, capabilities, security label, audit information, etc,
each taken at the time the sender issued the ioctl to send the
message. Which of those are actually recorded and attached is
controlled by the receiving peer.
I think that each piece of trustable metadata needs to be explicitly
opted-in to by the sender at the time of capture. Otherwise you're
asking for lots of information leaks and privilege escalations. This
is especially important given that some of the items in the current
list could be rather sensitive.
You do have to opt-in for this information at time of capture, so I
don't understand the issue here. This is the same type of thing that
dbus does today, and I don't see the information leaks happening there,
do you?

thanks,

greg k-h
Andy Lutomirski
2014-10-29 22:28:55 UTC
Permalink
On Wed, Oct 29, 2014 at 3:25 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
* Attachment of trustable metadata to each message on demand, such as
the sending peer's timestamp, creds, auxgroups, comm, exe, cmdline,
cgroup path, capabilities, security label, audit information, etc,
each taken at the time the sender issued the ioctl to send the
message. Which of those are actually recorded and attached is
controlled by the receiving peer.
I think that each piece of trustable metadata needs to be explicitly
opted-in to by the sender at the time of capture. Otherwise you're
asking for lots of information leaks and privilege escalations. This
is especially important given that some of the items in the current
list could be rather sensitive.
You do have to opt-in for this information at time of capture, so I
don't understand the issue here. This is the same type of thing that
dbus does today, and I don't see the information leaks happening there,
do you?
The docs suggest that the *receiver* opts in.

I don't think that current dbus has severe information leaks because
the total scope for information transparently sent to dbus is rather
small (struct ucred only, presumably).

--Andy
Andy Lutomirski
2014-10-29 22:37:17 UTC
Permalink
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:25 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:00 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
* Attachment of trustable metadata to each message on demand, such as
the sending peer's timestamp, creds, auxgroups, comm, exe, cmdline,
cgroup path, capabilities, security label, audit information, etc,
each taken at the time the sender issued the ioctl to send the
message. Which of those are actually recorded and attached is
controlled by the receiving peer.
I think that each piece of trustable metadata needs to be explicitly
opted-in to by the sender at the time of capture. Otherwise you're
asking for lots of information leaks and privilege escalations. This
is especially important given that some of the items in the current
list could be rather sensitive.
You do have to opt-in for this information at time of capture, so I
don't understand the issue here. This is the same type of thing that
dbus does today, and I don't see the information leaks happening there,
do you?
The docs suggest that the *receiver* opts in.
So does the code:

+ /*
+  * The first receiver which requests additional
+  * metadata causes the message to carry it; all
+  * receivers after that will see all of the added
+  * data, even when they did not ask for it.
+  */
+ if (conn_src) {
+         /* Check if conn_src is allowed to signal */
+         ret = kdbus_ep_policy_check_broadcast(conn_dst->ep,
+                                               conn_src,
+                                               conn_dst);
+         if (ret < 0)
+                 continue;
+
+         ret = kdbus_ep_policy_check_src_names(conn_dst->ep,
+                                               conn_src,
+                                               conn_dst);
+         if (ret < 0)
+                 continue;
+
+         ret = kdbus_kmsg_attach_metadata(kmsg, conn_src,
+                                          conn_dst);
+         if (ret < 0)
+                 goto exit_unlock;
+ }
+

I'd like this if the sender chose the metadata flags. In fact, I'd
want to make that feature available on regular UNIX sockets, too
(search the archives for SCM_IDENTITY).
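The behaviour described in the quoted comment can be modelled in a few lines. This is a toy sketch only, assuming that each receiver's request is simply OR-ed into one mask carried by the message; the flag names and `broadcast_v1` are hypothetical and do not come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata bits, for illustration only. */
#define ATTACH_CREDS   (UINT64_C(1) << 0)
#define ATTACH_CMDLINE (UINT64_C(1) << 1)
#define ATTACH_CAPS    (UINT64_C(1) << 2)

/*
 * Walk the receivers of one broadcast in delivery order. Each receiver's
 * request is OR-ed into the message-wide mask, and seen[i] records what
 * receiver i ends up being shown. The sender has no say in the result.
 */
void broadcast_v1(const uint64_t *requested, uint64_t *seen, size_t n)
{
	uint64_t attached = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		attached |= requested[i];
		seen[i] = attached;
	}
}
```

In this model a receiver that asked for nothing still sees every item requested by the receivers handled before it.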

--Andy
Daniel Mack
2014-10-30 07:44:21 UTC
Permalink
Post by Andy Lutomirski
On Wed, Oct 29, 2014 at 3:25 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
You do have to opt-in for this information at time of capture, so
I don't understand the issue here. This is the same type of thing
that dbus does today, and I don't see the information leaks
happening there, do you?
The docs suggest that the *receiver* opts in.
Yes, that's true.
Post by Andy Lutomirski
I don't think that current dbus has severe information leaks because
the total scope for information transparently sent to dbus is rather
small (struct ucred only, presumably).
Which piece of credential information are you concerned about,
particularly? I might miss something, but AFAICS, all of that
information can be queried by a remote peer anyway, through /proc for
instance. The reason why we (optionally) attach them to messages is that
we want to let the other side know which information was authoritative,
precisely at the time the message was sent. Current implementation can't
do that in a race-free way.

Also note that we currently drop all such metadata whenever a message
crosses a PID or user namespace boundary. This is because we currently
don't know yet which information we would want to transport in such
cases, and what the translation in both directions would look like, from
a semantic perspective. Hence, we decided to leave that for later.

I'll go through your other replies during the day. Thanks for your input
on that RFC, everyone.


Daniel
Daniel Mack
2014-11-05 14:35:03 UTC
Permalink
Post by Andy Lutomirski
I think that each piece of trustable metadata needs to be explicitly
opted-in to by the sender at the time of capture. Otherwise you're
asking for lots of information leaks and privilege escalations. This
is especially important given that some of the items in the current
list could be rather sensitive.
Alright, the above seems to pretty much sum up that end of our
discussion. To address this, we've now added the following functionality
for v2:

* The attach_flags in kdbus_cmd_hello was split into two parts,
attach_flags_send and attach_flags_recv, so each peer may choose what
exactly it wants to transmit or receive.

* Metadata will only be attached to the final message in the
receiver's pool if both the sender's attach_flags_send and the
receiver's attach_flags_recv bit are set.

* Consequently, the existing KDBUS_ITEM_ATTACH_FLAGS item type is
split into KDBUS_ITEM_ATTACH_FLAGS_SEND and
KDBUS_ITEM_ATTACH_FLAGS_RECV, so that both connection details can be
separately updated through KDBUS_CMD_CONN_UPDATE.

* To allow for use cases that require certain metadata to be attached
on each message, we've added a negotiation mechanism to the HELLO
ioctl: An optional metadata mask can be passed during the creation
of buses, so bus owners may require certain bits in
attach_flags_send to be set. That way, the creator of the bus will
specify which metadata is required to fulfill the requirements of
the specification of the role of the bus.
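The v2 rules listed above amount to two mask operations. The sketch below uses the field names attach_flags_send and attach_flags_recv from the list; the struct, the function names, and the exact way a bus requirement is evaluated are assumptions made for illustration, not the actual kdbus code.

```c
#include <stdint.h>

/* Hypothetical model of the two per-connection masks described for v2. */
struct conn_flags {
	uint64_t attach_flags_send;	/* what this peer agrees to transmit */
	uint64_t attach_flags_recv;	/* what this peer asks to receive */
};

/*
 * A metadata item ends up in the receiver's pool only if the sender's
 * send bit and the receiver's recv bit are both set.
 */
uint64_t effective_metadata(const struct conn_flags *sender,
			    const struct conn_flags *receiver)
{
	return sender->attach_flags_send & receiver->attach_flags_recv;
}

/*
 * Assumed negotiation check: a connection satisfies the bus if every bit
 * the bus creator requires is present in the connection's send mask.
 */
int hello_satisfies_bus(uint64_t bus_required, uint64_t attach_flags_send)
{
	return (attach_flags_send & bus_required) == bus_required;
}
```

Unlike a receiver-only opt-in, a sender that leaves a bit out of attach_flags_send never has that item attached, whatever the receiver requests.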


Thanks again for your input!

Daniel
Jiri Kosina
2014-10-29 23:00:31 UTC
Permalink
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?

It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).

Thanks,
--
Jiri Kosina
SUSE Labs

Greg Kroah-Hartman
2014-10-29 23:12:30 UTC
Permalink
Post by Jiri Kosina
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?
Lennart has given whole talks about this in the past, here's a recent
talk going into the details:

Post by Jiri Kosina
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
We have dbus in userspace today, but that requires extra copies of data,
and isn't easy, or even possible, to do some of the application-specific
bus logic that kdbus provides. See the talk above for details, there
are slides around somewhere with just text that we can add to the cover
letter if that will help out in future spins of this patch series.

thanks,

greg k-h
Greg Kroah-Hartman
2014-10-29 23:13:47 UTC
Permalink
Post by Greg Kroah-Hartman
Post by Jiri Kosina
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?
Lennart has given whole talks about this in the past, here's a recent
http://youtu.be/HPbQzm_iz_k
Post by Jiri Kosina
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
We have dbus in userspace today, but that requires extra copies of data,
and isn't easy, or even possible, to do some of the application-specific
bus logic that kdbus provides. See the talk above for details, there
are slides around somewhere with just text that we can add to the cover
letter if that will help out in future spins of this patch series.
Here's an article describing it as well:
https://lwn.net/Articles/580194/
Jiri Kosina
2014-10-29 23:24:17 UTC
Permalink
Post by Greg Kroah-Hartman
Post by Jiri Kosina
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?
Lennart has given whole talks about this in the past, here's a recent
http://youtu.be/HPbQzm_iz_k
I think it's a reasonable expectation that kernel patch submissions should
be reasonably self-contained though. We've always been very strict about
pushing everybody to provide extensive cover letters, changelogs and
explanations, so this shouldn't really be an exception, I think.
Post by Greg Kroah-Hartman
Post by Jiri Kosina
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
We have dbus in userspace today, but that requires extra copies of data,
But we can do zero-copy between processes for quite some time already, so
what exactly is the issue here?
Post by Greg Kroah-Hartman
and isn't easy, or even possible, to do some of the application-specific
bus logic that kdbus provides.
I unfortunately have absolutely no idea what I should imagine here.
Post by Greg Kroah-Hartman
See the talk above for details, there are slides around somewhere with
just text that we can add to the cover letter if that will help out in
future spins of this patch series.
I think that would be very helpful. Thanks.
--
Jiri Kosina
SUSE Labs

Jiri Kosina
2014-10-29 23:26:40 UTC
Permalink
Post by Jiri Kosina
Post by Greg Kroah-Hartman
Post by Jiri Kosina
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
We have dbus in userspace today, but that requires extra copies of data,
But we can do zero-copy between processes for quite some time already, so
what exactly is the issue here?
Post by Greg Kroah-Hartman
and isn't easy, or even possible, to do some of the application-specific
bus logic that kdbus provides.
I unfortunately have absolutely no idea what I should imagine here.
Also, I think I have heard that binder is going out of staging now, right?

I admittedly have very limited understanding of both binder and kdbus, but
I guess that is the case for many folks. My understanding is that they are
providing very similar functionality, so an explanation of why we need *both* in
the kernel would be very interesting as well.
--
Jiri Kosina
SUSE Labs

Greg Kroah-Hartman
2014-10-29 23:35:53 UTC
Permalink
Post by Jiri Kosina
Post by Jiri Kosina
Post by Greg Kroah-Hartman
Post by Jiri Kosina
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
We have dbus in userspace today, but that requires extra copies of data,
But we can do zero-copy between processes for quite some time already, so
what exactly is the issue here?
Post by Greg Kroah-Hartman
and isn't easy, or even possible, to do some of the application-specific
bus logic that kdbus provides.
I unfortunately have absolutely no idea what I should imagine here.
Also, I think I have heard that binder is going out of staging now, right?
Yes, but that needs documentation, which I'm working on at the moment :)
Post by Jiri Kosina
I admittedly have very limited understanding of both binder and kdbus, but
I guess that is the case for many folks. My understanding is that they are
providing very similar functionality, so an explanation of why we need *both* in
the kernel would be very interesting as well.
They do very different things, see this writeup I did a while ago about
the differences between them:
http://kroah.com/log/blog/2014/01/15/kdbus-details/

thanks,

greg k-h
Greg Kroah-Hartman
2014-10-29 23:41:18 UTC
Permalink
Post by Jiri Kosina
Post by Greg Kroah-Hartman
Post by Jiri Kosina
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?
Lennart has given whole talks about this in the past, here's a recent
http://youtu.be/HPbQzm_iz_k
I think it's a reasonable expectation that kernel patch submissions should
be reasonably self-contained though. We've always been very strict about
pushing everybody to provide extensive cover letters, changelogs and
explanations, so this shouldn't really be an exception, I think.
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
fewer context switches, proper credential passing, timestamping, available
at early-boot, LSM hooks for security models to tie into, race-free
interfaces, container/namespace support, etc.) should be added to the
docs as well.
Post by Jiri Kosina
Post by Greg Kroah-Hartman
Post by Jiri Kosina
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
We have dbus in userspace today, but that requires extra copies of data,
But we can do zero-copy between processes for quite some time already, so
what exactly is the issue here?
See the above list for more details.

We'll work on this for the next round of patches, thanks.

greg k-h
Andy Lutomirski
2014-10-29 23:56:29 UTC
Permalink
On Wed, Oct 29, 2014 at 4:40 PM, Greg Kroah-Hartman
Post by Greg Kroah-Hartman
Post by Jiri Kosina
Post by Greg Kroah-Hartman
Post by Jiri Kosina
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?
Lennart has given whole talks about this in the past, here's a recent
http://youtu.be/HPbQzm_iz_k
I think it's a reasonable expectation that kernel patch submissions should
be reasonably self-contained though. We've always been very strict about
pushing everybody to provide extensive cover letters, changelogs and
explanations, so this shouldn't really be an exception, I think.
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
fewer context switches, proper credential passing, timestamping, available
at early-boot, LSM hooks for security models to tie into, race-free
interfaces, container/namespace support, etc.) should be added to the
docs as well.
It's worth noting that:

- Proper credential passing could be added to UNIX sockets, and we
may want to do that anyway. Also, the current kdbus semantics seem to
be "spew lots of credentials and other miscellaneous
potentially-sensitive and sometimes spoofable information all over the
place", which isn't obviously an improvement. (This is fixable, but
it will almost certainly not be compatible with current systemd kdbus
code if fixed.)

- The current kdbus patches seem to be worse than UNIX sockets from a
namespace perspective, but maybe I'm misunderstanding how it's
supposed to work. UNIX sockets work quite nicely in containers.

- There's an obvious interface to add timestamping to UNIX sockets
(it could work exactly the way it does for UDP / PTP).

- I'm unconvinced by this performance argument without numbers. The
kdbus credential code, at least, looks to be quite heavy on allocation
and atomics. This isn't to say that the current userspace D-Bus
daemon doesn't also serialize everything, but it could be made
multithreaded.

- Race-free? What are the races that are inherent to UNIX sockets?

--Andy
Tom Gundersen
2014-10-30 11:52:28 UTC
Permalink
Post by Andy Lutomirski
- Proper credential passing could be added to UNIX sockets, and we
may want to do that anyway. Also, the current kdbus semantics seem to
be "spew lots of credentials and other miscellaneous
potentially-sensitive and sometimes spoofable information all over the
place", which isn't obviously an improvement. (This is fixable, but
it will almost certainly not be compatible with current systemd kdbus
code if fixed.)
Care to elaborate on what you think is spoofable, and what needs to be fixed?

Anyway, the idea is that by simply connecting to the bus and sending a
message to some service, you implicitly agree to passing some metadata
along to the service (and to a lesser extent to the bus). It's not
that this information is leaked, or that the peer could actively
access any of the sender's private memory. Also note that this kind of
metadata information is also available via /proc/$PID, and via
SCM_CREDENTIALS/SO_PEERCRED and the socket seclabel APIs. What the
kdbus API allows users to do is to get a lot more of this information
in a race-free way. For example, if you want to get the audit identity
bits, you can now get this attached securely by the kernel, at the
time the message is sent, rather than having to first get the peer's
$PID from SCM_CREDENTIALS and then read the audit identity bits racily
from /proc/$PID/loginuid and /proc/$PID/sessionid.
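
To make the race concrete, here is a rough sketch of the two-step lookup over a UNIX socket. It uses SO_PEERCRED on a socketpair instead of an SCM_CREDENTIALS control message to keep it short (an assumption of this example, not what any D-Bus implementation does), and the helper names are invented. The point is only that the /proc read is a second, separate lookup by PID:

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid
UCRED = struct.Struct("iII")


def peer_credentials(sock):
    """Step 1: ask the kernel for the peer's (pid, uid, gid) via SO_PEERCRED."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)


def racy_audit_identity(pid):
    """Step 2: look up audit bits by PID. Between step 1 and this read the
    peer may have exited and the PID may have been reused, so the result can
    describe a different process. Missing files are reported as None."""
    result = {}
    for name in ("loginuid", "sessionid"):
        try:
            with open("/proc/%d/%s" % (pid, name)) as f:
                result[name] = f.read().strip()
        except OSError:
            result[name] = None
    return result


a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(a)
identity = racy_audit_identity(pid)  # nothing ties this read to step 1
a.close()
b.close()
```

With metadata attached by the kernel at send time there is no step 2, which is the difference being argued here.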
Post by Andy Lutomirski
- The current kdbus patches seem to be worse than UNIX sockets from a
namespace perspective, but maybe I'm misunderstanding how it's
supposed to work. UNIX sockets work quite nicely in containers.
kdbus is recursively stackable for containers. You can run
kdbus-enabled containers within kdbus-enabled containers within
kdbus-enabled containers, with the full functionality available for
each container, and each container isolated from each other.

When credential information is passed between processes of different
(PID) namespaces most of the attached metadata is suppressed. This
isn't too different from how SCM_CREDENTIALS works, which will zero
out the bits it cannot translate as well.
Post by Andy Lutomirski
- There's an obvious interface to add timestamping to UNIX sockets
(it could work exactly the way it does for UDP / PTP).
Timestamping on AF_UNIX/SOCK_DGRAM already exists, but that's not
enough for the use-cases we want to support.
Post by Andy Lutomirski
- I'm unconvinced by this performance argument without numbers. The
kdbus credential code, at least, looks to be quite heavy on allocation
and atomics. This isn't to say that the current userspace D-Bus
daemon doesn't also serialize everything, but it could be made
multithreaded.
There are some major benefits regarding performance:

* fewer userspace context switches. For a full-duplex method call it's
down from five to two: instead of sender -> dbus daemon -> service ->
dbus daemon -> sender it's just sender -> service -> sender.
* fewer message copies in userspace. For a full-duplex method call
it's down from eight to two: instead of copying the method call data
into a socket, out of a socket, into a socket, out of a socket, and
the same for the method reply, we just copy one message directly to
the receiver, and the reply back.
* generally fewer syscalls involved. A synchronous method call is now
doable in a single ioctl on the sender side.
* memfds can be used for transport purposes of larger payload. This
way, we can cover substantial payload sizes instead of just small
control messages, with no extra copies. kdbus, in its transport layer,
makes sure only sealed memfds are passed in as payload, so the sender
cannot modify the contents while the receiver is already parsing it.
Post by Andy Lutomirski
- Race-free? What are the races that are inherent to UNIX sockets?
Does the above explain what we have in mind?

Note that the aim is not necessarily that kdbus should be better than
UNIX sockets in every way, nor that it should be favoured in all
cases. What we are trying to address is a common case in environments
where peers don't necessarily trust each other.

Cheers,

Tom
Simon McVittie
2014-10-30 12:29:09 UTC
Permalink
Post by Tom Gundersen
For example, if you want to get the audit identity
bits, you can now get this attached securely by the kernel, at the
time the message is sent, rather than having to first get the peer's
$PID from SCM_CREDENTIALS and then read the audit identity bits racily
from /proc/$PID/loginuid and /proc/$PID/sessionid
.. which dbus-daemon (traditional D-Bus) deliberately doesn't offer as
a feature, because we are not aware of any way to do that over Unix
sockets without a race condition; and if we can't have it securely, we
don't want to have it at all.
<https://bugs.freedesktop.org/show_bug.cgi?id=83499>
It would be great if kdbus can fix that omission.

Capabilities are in the same boat, and as a result, systemd can't
currently have D-Bus methods that can only be called with CAP_WHATEVER.
Post by Tom Gundersen
* fewer userspace context switches
[...]
Post by Tom Gundersen
* fewer message copies in userspace
Readers are probably already aware of this, but note that D-Bus is
designed to be usable between mutually distrusting processes, which is
why we use Unix sockets and a lot of copies, rather than mmap or something.

S

Andy Lutomirski
2014-10-30 14:00:28 UTC
Permalink
Post by Tom Gundersen
Post by Andy Lutomirski
- Proper credential passing could be added to UNIX sockets, and we
may want to do that anyway. Also, the current kdbus semantics seem to
be "spew lots of credentials and other miscellaneous
potentially-sensitive and sometimes spoofable information all over the
place", which isn't obviously an improvement. (This is fixable, but
it will almost certainly not be compatible with current systemd kdbus
code if fixed.)
Care to elaborate on what you think is spoofable, and what needs to be fixed?
cmd and comm are trivially replaceable by any sender.
Post by Tom Gundersen
Anyway, the idea is that by simply connecting to the bus and sending a
message to some service, you implicitly agree to passing some metadata
along to the service (and to a lesser extent to the bus). It's not
that this information is leaked, or that the peer could actively
access any of the sender's private memory.
To me, this smells like bad design. By using kdbus, I implicitly
agree to send everyone my command line?!? If I'm in a cgroup that
policy decrees should be privileged, then I should invoke that
privilege by specifically asking, *at the time of capture*, to send
that cgroup. Otherwise it becomes unclear what things convey
privilege when, and that will lead immediately to incomprehensible
security models, and that will lead to exploits.

<snark>Sorry, but "implicitly agree" sounds a lot like using my
esteemed cellphone carrier. When I use it, some argue that I
implicitly agree to have my identity prepended to all outgoing HTTP
requests. This is *not* a good thing.</snark>
Post by Tom Gundersen
Also note that this kind of
metadata information is also available via /proc/$PID, and via
SCM_CREDENTIALS/SO_PEERCRED and the socket seclabel APIs.
Not if you have a sensible LSM policy or if you use hidepid. And,
once you've fixed the namespacing issues, not if the sender and
receiver are in different PID namespaces or if they don't have /proc
mounted at all.
Post by Tom Gundersen
When credential information is passed between processes of different
(PID) namespaces most of the attached metadata is suppressed.
This is a bug. It prevents users from usefully sandboxing themselves
in a kdbus world. If you create and enter a user namespace, then your
outside identity (which should be unchanged) is suppressed. (Note
that anything that captures credentials other than at open time is
also an issue for sandboxes in the other direction: it may interfere
with selective privilege dropping.)
Post by Tom Gundersen
This
isn't too different from how SCM_CREDENTIALS works, which will zero
out the bits it cannot translate as well.
SCM_CREDENTIALS translates the translatable parts.
Post by Tom Gundersen
* fewer userspace context switches. For a full-duplex method call it's
down from five to two: instead of sender -> dbus daemon -> service ->
dbus daemon -> sender it's just sender -> service -> sender.
* fewer message copies in userspace. For a full-duplex method call
it's down from eight to two: instead of copying the method call data
into a socket, out of a socket, into a socket, out of a socket, and
the same for the method reply, we just copy one message directly to
the receiver, and the reply back.
* generally fewer syscalls involved. A synchronous method call is now
doable in a single ioctl on the sender side.
* memfds can be used for transport purposes of larger payload. This
way, we can cover substantial payload sizes instead of just small
control messages, with no extra copies. kdbus, in its transport layer,
makes sure only sealed memfds are passed in as payload, so the sender
cannot modify the contents while the receiver is already parsing it.
There should be a number measured in, say, nanoseconds in here
somewhere. The actual extent of the speedup is unmeasurable here.
Also, it's worth reading at least one of Linus' many rants about
zero-copy. It's not an automatic win.

--Andy
Alex Elsayed
2014-10-30 20:28:38 UTC
Permalink
Andy Lutomirski wrote:

<snip>
Post by Andy Lutomirski
There should be a number measured in, say, nanoseconds in here
somewhere. The actual extent of the speedup is unmeasurable here.
Also, it's worth reading at least one of Linus' many rants about
zero-copy. It's not an automatic win.
It's well-understood that it's not an automatic win; significant testing on
multiple architectures indicated that 512K is a surprisingly universal
crossover point. The userspace code, therefore, switches from copying
(normal kdbus parameters) to zero-copy (memfds) right around there.
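
As a sketch of that switch (the constant and function names are invented, and whether the boundary itself goes to the memfd side is a guess, since "right around there" doesn't say):

```python
# The "512K" crossover mentioned above, taken as 512 KiB.
MEMFD_THRESHOLD = 512 * 1024


def choose_transport(payload_size):
    """Small payloads are copied; large ones go through a sealed memfd."""
    return "memfd" if payload_size >= MEMFD_THRESHOLD else "copy"


small = choose_transport(4096)
large = choose_transport(4 * 1024 * 1024)
```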

Karol Lewandowski
2014-10-30 09:55:15 UTC
Permalink
Post by Greg Kroah-Hartman
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
fewer context switches, proper credential passing, timestamping, available
at early-boot, LSM hooks for security models to tie into
While you're at it... I have worked on proof-of-concept LSM patches for
kdbus some time ago, see [1][2]. Currently, these are completely out of date.

[1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
[2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40

May I ask if you guys have your own plan for LSM or maybe it would be
worth resurrecting [1]?

Cheers,
--
Karol Lewandowski, Samsung R&D Institute Poland


Karol Lewandowski
2014-10-30 10:45:43 UTC
Permalink
[ Sorry for breaking thread and resend - gmane rejected my original message
due to too long list of recipients... ]
Post by Greg Kroah-Hartman
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
fewer context switches, proper credential passing, timestamping, available
at early-boot, LSM hooks for security models to tie into
While you're at it... I did some work on proof-of-concept LSM patches for
kdbus some time ago, see [1][2]. Currently, these are completely out of date.

[1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
[2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40

May I ask if you guys have your own plan for LSM or maybe it would be
worth resurrecting [1]?

Cheers,
--
Karol Lewandowski, Samsung R&D Institute Poland



Greg Kroah-Hartman
2014-10-30 14:48:27 UTC
Permalink
Post by Karol Lewandowski
[ Sorry for breaking thread and resend - gmane rejected my original message
due to too long list of recipients... ]
Post by Greg Kroah-Hartman
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
fewer context switches, proper credential passing, timestamping, available
at early-boot, LSM hooks for security models to tie into
While you're at it... I did some work on proof-of-concept LSM patches for
kdbus some time ago, see [1][2]. Currently, these are completely out of date.
[1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
[2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
May I ask if you guys have your own plan for LSM or maybe it would be
worth resurrecting [1]?
The core calls are already mediated by LSM today, right? We don't want
anyone to be parsing the data stream through an LSM, that idea got
rejected a long time ago as something that is really not a good idea.

Other than that, I don't know exactly what your patches do, or why they
are needed, care to go into details?

thanks,

greg k-h
Karol Lewandowski
2014-10-30 19:57:02 UTC
Permalink
Post by Greg Kroah-Hartman
Post by Karol Lewandowski
[ Sorry for breaking thread and resend - gmane rejected my original message
due to too long list of recipients... ]
Post by Greg Kroah-Hartman
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
fewer context switches, proper credential passing, timestamping, available
at early-boot, LSM hooks for security models to tie into
While you're at it... I did some work on proof-of-concept LSM patches for
kdbus some time ago, see [1][2]. Currently, these are completely out of date.
[1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
[2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
May I ask if you guys have your own plan for LSM or maybe it would be
worth resurrecting [1]?
The core calls are already mediated by LSM today, right? We don't want
anyone to be parsing the data stream through an LSM, that idea got
rejected a long time ago as something that is really not a good idea.
Parsing data is out of the question, of course, but this is not what we were
proposing.
Post by Greg Kroah-Hartman
Other than that, I don't know exactly what your patches do, or why they
are needed, care to go into details?
The patches in question were supposed to add a few hooks for kdbus-specific
operations that don't seem to have compatible semantics with the hooks
currently available in LSM.

kdbus' bus introduces quite a few new concepts that we wanted to be able
to limit based on MAC label/context, e.g.

- check flags at HELLO stage (say disallow fd passing),

- restrict ability to acquire name to certain subjects (for system bus),

- disallow creation of new buses,

- limit scope of broadcasts,

- etc.

Please take a look at the hook list - I think most of the names are self-explanatory:

https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874

kdbus modifications were pretty light - with the most visible change being
the addition of an opaque security pointer to kdbus_bus and similar structs.

Thanks!
--
Karol Lewandowski, Samsung R&D Institute Poland
Greg Kroah-Hartman
2014-10-30 20:25:58 UTC
Permalink
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
Post by Karol Lewandowski
[ Sorry for breaking thread and resend - gmane rejected my original message
due to too long list of recipients... ]
Post by Greg Kroah-Hartman
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
less context switches, proper credential passing, timestamping, availble
at early-boot, LSM hooks for security models to tie into
While you're at it... I did some work on proof-of-concept LSM patches for
kdbus some time ago, see [1][2]. Currently, these are completely of date.
[1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
[2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
May I ask if you guys have your own plan for LSM or maybe it would be
worth to resurrect [1]?
The core calls are already mediated by LSM today, right? We don't want
anyone to be parsing the data stream through an LSM, that idea got
rejected a long time ago as something that is really not a good idea.
Parsing data is out of question, of course, but this is not what we were
proposing.
Glad to hear it :)
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
Other than that, I don't know exactly what your patches do, or why they
are needed, care to go into details?
Patches in question were supposed to add few hooks for kdbus-specific
operations that doesn't seem to have compatible semantics with hooks
currently available in LSM.
kdbus' bus introduces quite a few new concepts that we wanted to be able
to limit based on MAC label/context, eg.
- check flags at HELO stage (say disallow fd passing),
- restrict ability to acquire name to certain subjects (for system bus),
- disallow creation of new buses,
- limit scope of broadcasts,
- etc.
Nice list.
Post by Karol Lewandowski
https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874
kdbus modifications were pretty light - with most visible change being
addition of opaque security pointer to kdbus_bus and similar structs.
That looks very reasonable, care to make it up into a patch I can add to
the end of this series so it's easy to review and possibly submit as
part of it?

thanks,

greg k-h
Karol Lewandowski
2014-10-31 11:16:49 UTC
Post by Greg Kroah-Hartman
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
Post by Karol Lewandowski
[ Sorry for breaking thread and resend - gmane rejected my original message
due to too long list of recipients... ]
Post by Greg Kroah-Hartman
There is a 1815 line documentation file in this series, so we aren't
trying to not provide this type of information here at all. But yes,
more background, about why this can't be done in userspace (zero copy,
less context switches, proper credential passing, timestamping, availble
at early-boot, LSM hooks for security models to tie into
While you're at it... I did some work on proof-of-concept LSM patches for
kdbus some time ago, see [1][2]. Currently, these are completely of date.
[1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
[2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
May I ask if you guys have your own plan for LSM or maybe it would be
worth to resurrect [1]?
The core calls are already mediated by LSM today, right? We don't want
anyone to be parsing the data stream through an LSM, that idea got
rejected a long time ago as something that is really not a good idea.
Parsing data is out of question, of course, but this is not what we were
proposing.
Glad to hear it :)
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
Other than that, I don't know exactly what your patches do, or why they
are needed, care to go into details?
Patches in question were supposed to add few hooks for kdbus-specific
operations that doesn't seem to have compatible semantics with hooks
currently available in LSM.
kdbus' bus introduces quite a few new concepts that we wanted to be able
to limit based on MAC label/context, eg.
- check flags at HELO stage (say disallow fd passing),
- restrict ability to acquire name to certain subjects (for system bus),
- disallow creation of new buses,
- limit scope of broadcasts,
- etc.
Nice list.
Post by Karol Lewandowski
https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874
kdbus modifications were pretty light - with most visible change being
addition of opaque security pointer to kdbus_bus and similar structs.
That looks very reasonable, care to make it up into a patch I can add to
the end of this series so it's easy to review and possibly submit as
part of it?
I'll do my best to prepare something suitable for review, but I'm
not sure it can/should be part of the next patch set.

As Paul wrote, the discussion about hooks never really reached a
satisfactory conclusion; it just faded away. kdbus' own policy engine
has been rewritten since I last touched it, so I'm not sure which parts
are still applicable.

(Unfortunately, I'll be traveling from Monday and am likely to be offline
for a week or two...)

Thanks
--
Karol Lewandowski, Samsung R&D Institute Poland
One Thousand Gnomes
2014-10-30 23:14:15 UTC
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
The core calls are already mediated by LSM today, right? We don't want
anyone to be parsing the data stream through an LSM, that idea got
rejected a long time ago as something that is really not a good idea.
Parsing data is out of question, of course, but this is not what we were
proposing.
Why is it out of the question? If it's a socket you can just put a BPF
filter on it, so why can't kdbus support similar basic functionality?

Alan
Karol Lewandowski
2014-10-31 10:59:39 UTC
Post by One Thousand Gnomes
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
The core calls are already mediated by LSM today, right? We don't want
anyone to be parsing the data stream through an LSM, that idea got
rejected a long time ago as something that is really not a good idea.
Parsing data is out of question, of course, but this is not what we were
proposing.
Why is it out of the question. If it's a socket you can just a BPF filter
on it, so why can't kdbus support similar basic functionality ?
Fair point. I think none of us had simply considered this until now.

Thanks
--
Karol Lewandowski, Samsung R&D Institute Poland
Paul Moore
2014-10-30 23:48:20 UTC
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
Other than that, I don't know exactly what your patches do, or why they
are needed, care to go into details?
Patches in question were supposed to add few hooks for kdbus-specific
operations that doesn't seem to have compatible semantics with hooks
currently available in LSM.
kdbus' bus introduces quite a few new concepts that we wanted to be able
to limit based on MAC label/context, eg.
- check flags at HELO stage (say disallow fd passing),
- restrict ability to acquire name to certain subjects (for system bus),
- disallow creation of new buses,
- limit scope of broadcasts,
- etc.
Please take a look at hook list - I think most of names are self-explanatory:
https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874
kdbus modifications were pretty light - with most visible change being
addition of opaque security pointer to kdbus_bus and similar structs.
[NOTE: we really should add the LSM list to this discussion and future
patchset postings.]

Also, to be completely honest, I don't think we ever really arrived at any
final conclusion about those LSM/kdbus hooks either. At least I don't think I
ever really satisfied myself that what we had was the "right" solution.

We both got busy and kinda drifted away from this effort. Karol, did you do
any further work on the hooks?
--
paul moore
security and virtualization @ redhat

Karol Lewandowski
2014-10-31 14:22:46 UTC
Post by Paul Moore
Post by Karol Lewandowski
Post by Greg Kroah-Hartman
Other than that, I don't know exactly what your patches do, or why they
are needed, care to go into details?
Patches in question were supposed to add few hooks for kdbus-specific
operations that doesn't seem to have compatible semantics with hooks
currently available in LSM.
kdbus' bus introduces quite a few new concepts that we wanted to be able
to limit based on MAC label/context, eg.
- check flags at HELO stage (say disallow fd passing),
- restrict ability to acquire name to certain subjects (for system bus),
- disallow creation of new buses,
- limit scope of broadcasts,
- etc.
Please take a look at hook list - I think most of names are self-explanatory:
https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874
kdbus modifications were pretty light - with most visible change being
addition of opaque security pointer to kdbus_bus and similar structs.
[NOTE: we really should add the LSM list to this discussion and future
patchset postings.]
Also, to be completely honest, I don't think we ever really arrived at any
final conclusion about those LSM/kdbus hooks either. At least I don't think I
ever really satisfied myself that what we had was the "right" solution.
Agreed, the "hooks" are far from complete. I think the patches
were, and still are, a starting point for discussion, not "a solution"
in themselves.

The timing wasn't good either: since our last discussion (Apr/May 2014)
the kdbus policy engine has been completely rewritten and a few core
concepts have changed too.
Post by Paul Moore
We both got busy and kinda drifted away from this effort. Karol, did you do
any further work on the hooks?
I didn't. I was waiting for the pace of change in kdbus to slow
down a bit and, honestly, wasn't expecting a submission in the next
few months...

I'll do my best to post an RFC patchset today or tomorrow.

Thanks
--
Karol Lewandowski, Samsung R&D Institute Poland
Karol Lewandowski
2014-11-09 00:08:17 UTC
This is set of EXPERIMENTAL patches adding lsm support to kdbus.
(Rebased on top of v3.17.)
- (1) kdbus: extend structures with security pointer for lsm
Trivial. Applicable as-is.
- (2) security: export security_file_receive for modules
(3) kdbus: check if lsm permits installing received fds
fd_install doesn't seem to consult LSM, these patches
ensure that receiving process has the right to sent fds.
Compile-tested only.
- (4) security: introduce lsm hooks for kdbus
(5) kdbus: make use of new lsm hooks
Set of proof-of-concept hooks discussed previously with Paul Moore.
kdbus integration patch (5) for review, but unlikely for integration
at this stage.
Likewise, compile-tested only.
kdbus: extend structures with security pointer for lsm
security: export security_file_receive for modules
kdbus: check if lsm permits installing received fds
security: introduce lsm hooks for kdbus
kdbus: make use of new lsm hooks
These looks reasonable to me, thanks for sending them. They will need
to be refreshed again after this next round of changes, but it shouldn't
be that hard to do so.
Sure thing.

For completeness - there are accompanying Smack and SELinux patches that
could go together with the above patches, i.e.:

https://github.com/lmctl/linux/commit/103c26fd27d1ec8c32d85dd3d85681f936ac66fb

http://git.infradead.org/users/pcmoore/selinux/commitdiff/eef4844f91fef6092b6bfac941ebe7f18375be9d

I've got some free time on my hands now, so I'll try to revisit these too.

Cheers,
Karol Lewandowski
Greg Kroah-Hartman
2014-11-02 01:23:01 UTC
Post by Jiri Kosina
Post by Greg Kroah-Hartman
kdbus is a kernel-level IPC implementation that aims for resemblance to
the the protocol layer with the existing userspace D-Bus daemon while
enabling some features that couldn't be implemented before in userspace.
I'd be interested in the features that can't be implemented in userspace
(and therefore would justify existence of kdbus in the kernel). Could you
please point me to such list / documentation?
It seems to me that most of the highlight features from the cover letter
can be "easily" (for certain definition of that word, of course)
implemented in userspace (vmsplice(), sending fd through unix socket, user
namespaces, UUID management, etc).
Sorry for the long delay in getting back to this, I'm battling a bad
case of jet-lag at the moment...

Here are some reasons why I feel it is better to have kdbus in the kernel
rather than trying to implement the same thing in a userspace daemon:

- performance: fewer process context switches, fewer copies, fewer
syscalls, larger memory chunks via memfd. This is really important
for a whole class of userspace programs that are ported from other
operating systems that are run on tiny ARM systems that rely on
hundreds of thousands of messages passed at boot time, and at
"critical" times in their user interaction loops.
- security: the peers which communicate do not have to trust each other,
as the only trustworthy component in the game is the kernel which
adds metadata and ensures that all data passed as payload is either
copied or sealed, so that the receiver can parse the data without
having to protect against changing memory while parsing buffers. Also,
all the data transfer is controlled by the kernel, so that LSMs can
track and control what is going on, without involving userspace.
Because of the LSM issue, security people are much happier with this
model than the current scheme of having to hook into dbus to mediate
things.
- more metadata can be attached to messages than in userspace
- semantics for apps with heavy data payloads (media apps, for instance)
with optional priority message dequeuing, and global message ordering.
Some "crazy" people are playing with using kdbus for audio data in the
system. I'm not saying that this is the best model for this, but
until now, there wasn't any other way to do this without having to
create custom "busses", one for each application library.
- being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions. For example, with kdbus, there is a
way a client can disconnect from a bus, but do so only if no further
messages are present in its queue, which is crucial for implementing
race-free "exit-on-idle" services
- eavesdropping on the kernel level, so privileged users can hook into
the message stream without hacking support for that into their
userspace processes
- a number of smaller benefits: for example kdbus learned a way to peek
full messages without dequeuing them, which is really useful for
logging metadata when handling bus-activation requests.
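
As a small illustration of the "copied or sealed" point in the security item above, here is a sketch of what sealing a memfd looks like from userspace. It uses only the generic memfd_create() and fcntl() seal calls, not the kdbus ioctls; the helper names make_sealed_payload() and payload_is_sealed() are invented for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for memfd_create() and the F_SEAL_* constants */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ALL_SEALS (F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL)

/*
 * Sender side: put `len` bytes into a fresh memfd and seal it so that
 * neither size nor content can change afterwards.
 * Returns the fd, or -1 on error.
 */
int make_sealed_payload(const void *data, size_t len)
{
	int fd;

	assert(data != NULL || len == 0);
	fd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len ||
	    fcntl(fd, F_ADD_SEALS, ALL_SEALS) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Receiver side: nonzero only if every seal we rely on is set. */
int payload_is_sealed(int fd)
{
	int seals = fcntl(fd, F_GET_SEALS);

	return seals >= 0 && (seals & ALL_SEALS) == ALL_SEALS;
}
```

Once payload_is_sealed() returns true, the receiver can map and parse the buffer in place without worrying that the sender modifies it mid-parse.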

Of course, some of the bits above could be implemented in userspace
alone, for example with more sophisticated memory management APIs, but
this is usually done by losing out on the other details. For example,
for many of the memory management APIs, it's hard to not require the
communicating peers to fully trust each other. And we _really_ don't
want peers to have to trust each other.

Another benefit of having this in the kernel, rather than as a userspace
daemon, is that you can now easily use the bus from the initrd, or up to
the very end when the system shuts down. On current userspace D-Bus,
this is not really possible, as this requires passing the bus instance
around between initrd and the "real" system. Such a transition of all
fds also requires keeping full state of what has already been read from
the connection fds. kdbus makes this much simpler, as we can change the
ownership of the bus, just by passing one fd over from one part to the
other.

Regarding binder: binder and kdbus follow very different design
concepts. Binder implies the use of thread-pools to dispatch incoming
method calls. This is a very efficient scheme, and completely natural
in programming languages like Java. On most Linux programs, however,
there's a much stronger focus on central poll() loops that dispatch all
sources a program cares about. kdbus is much more usable in such
environments, as it doesn't enforce a threading model, and it is happy
with serialized dispatching. In fact, this major difference had an
effect on much of the design decisions: binder does not guarantee global
message ordering due to the parallel dispatching in the thread-pools,
but kdbus does. Moreover, there's also a difference in the way messages
are handled. In kdbus, every message is basically taken and dispatched as
one blob, while in binder, continuous connections to other peers are
created, which are then used to send messages on. Hence, the models are
quite different, and they serve different needs. I believe that the
D-Bus/kdbus model is more compatible and friendly with how Linux
programs are usually implemented. I went into the kdbus vs. binder
comparison in more detail in a blog post that I linked to earlier in
this thread.
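
To illustrate the "central poll() loop" style mentioned above, here is a minimal sketch of one loop iteration that waits on several fds and dispatches them one after another in a single thread. Nothing here is kdbus-specific: struct source, dispatch_once() and drain_and_count() are hypothetical names, and plain pipes would do as event sources.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_SOURCES 16

typedef void (*source_cb)(int fd, void *userdata);

/* One event source the program cares about. */
struct source {
	int fd;
	source_cb cb;
	void *userdata;
};

/*
 * One iteration of the main loop: wait up to timeout_ms for any source
 * to become readable, then call the ready callbacks serially.
 * Returns the number of callbacks run, 0 on timeout, -1 on error.
 */
int dispatch_once(struct source *src, int n, int timeout_ms)
{
	struct pollfd pfd[MAX_SOURCES];
	int i, ready, dispatched = 0;

	assert(n >= 0 && n <= MAX_SOURCES);
	for (i = 0; i < n; i++) {
		pfd[i].fd = src[i].fd;
		pfd[i].events = POLLIN;
		pfd[i].revents = 0;
	}

	ready = poll(pfd, (nfds_t)n, timeout_ms);
	if (ready <= 0)
		return ready;

	for (i = 0; i < n; i++) {
		if (pfd[i].revents & POLLIN) {
			src[i].cb(src[i].fd, src[i].userdata);
			dispatched++;
		}
	}
	return dispatched;
}

/* Sample callback: consume pending data and count the wakeup. */
void drain_and_count(int fd, void *userdata)
{
	char buf[256];

	if (read(fd, buf, sizeof(buf)) > 0)
		++*(int *)userdata;
}
```

A bus connection fd would simply be one more entry in the source array, next to timers, sockets and whatever else the program watches.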

Hopefully this helps explain why I feel kdbus should be in the kernel
and not a userspace daemon. I'll put this information in the cover
letter for the next round of patches that are sent out.

thanks,

greg k-h
One Thousand Gnomes
2014-11-03 14:39:14 UTC
On Sat, 1 Nov 2014 18:21:30 -0700
Post by Greg Kroah-Hartman
Here's some reasons why I feel it is better to have kdbus in the kernel
No - these are reasons to have *something* in the kernel. I think it
would be far more constructive to treat the current kdbus as a proof of
concept/prototype or even a draft requirements specification.
Post by Greg Kroah-Hartman
as the only trustworthy compoenent in the game is the kernel which
adds metadata and ensures that all data passed as payload is either
copied or sealed, so that the receiver can parse the data without
When the kernel adds metadata without being told to do so by one end of
the link you create a new set of security and privacy leaks. Far better
that the sender must choose what metadata is added and the receiver can
decide to bin stuff that's not acceptable. The job of the kernel is
really more like that of an auditor in a business transaction - to make
sure that the data they agree to pass is truthful.

(i.e. it's the sender who must say "attach my user info", the receiver who
must say "no info, no play", and the kernel that must provide the info so
it can't be faked.)
Post by Greg Kroah-Hartman
- semantics for apps with heavy data payloads (media apps, for instance)
with optinal priority message dequeuing, and global message ordering.
Sounds like System 5 IPC ;-)
Post by Greg Kroah-Hartman
Regarding binder: binder and kdbus follow very different design
concepts.
We know binder is broken, but the Android guys are stuck in a special
kind of hell with it for some years to come. We need to make sure kdbus
doesn't end up the same way.

Alan
Arnd Bergmann
2014-10-30 08:32:26 UTC
This patch adds a quite extensive test suite for kdbus that checks
the most important code pathes in the driver. The idea is to extend
the test suite over time.
Also, this code can serve as an example implementation to show how to
use the kernel API from userspace.
Ah, new kernel code that comes with selftests, I'm impressed!

Arnd
Arnd Bergmann
2014-10-30 08:34:29 UTC
Post by Greg Kroah-Hartman
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/kdbus/Kconfig | 11 +
drivers/misc/kdbus/Makefile | 19 +
drivers/misc/kdbus/bus.c | 450 ++++++
drivers/misc/kdbus/bus.h | 107 ++
drivers/misc/kdbus/connection.c | 1751 +++++++++++++++++++++
drivers/misc/kdbus/connection.h | 177 +++
drivers/misc/kdbus/domain.c | 477 ++++++
One very high-level comment:

Since this is going to be a very commonly used IPC mechanism, I don't
like the idea of stuffing it into drivers/misc.

How about putting it into drivers/kdbus or ipc/kdbus instead?

Arnd
Greg Kroah-Hartman
2014-10-30 16:19:12 UTC
Post by Arnd Bergmann
Post by Greg Kroah-Hartman
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/kdbus/Kconfig | 11 +
drivers/misc/kdbus/Makefile | 19 +
drivers/misc/kdbus/bus.c | 450 ++++++
drivers/misc/kdbus/bus.h | 107 ++
drivers/misc/kdbus/connection.c | 1751 +++++++++++++++++++++
drivers/misc/kdbus/connection.h | 177 +++
drivers/misc/kdbus/domain.c | 477 ++++++
Since this is going to be a very commonly used IPC mechanism, I don't
like the idea of stuffing it into drivers/misc.
How about putting it into drivers/kdbus or ipc/kdbus instead?
ipc/kdbus seems good to me. I didn't want to "pollute" drivers/ with
any new subdirectories, it seems to grow fast enough as it is...

greg k-h
Michael Ellerman
2014-11-14 03:47:06 UTC
This patch adds a quite extensive test suite for kdbus that checks
the most important code pathes in the driver. The idea is to extend
the test suite over time.
Also, this code can serve as an example implementation to show how to
use the kernel API from userspace.
Great to see selftests included.

I needed this to get them building:

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index b70237e8bc37..b1438a02e49f 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -210,6 +210,7 @@ header-y += ixjuser.h
 header-y += jffs2.h
 header-y += joystick.h
 header-y += kd.h
+header-y += kdbus.h
 header-y += kdev_t.h
 header-y += kernel-page-flags.h
 header-y += kernel.h
diff --git a/tools/testing/selftests/kdbus/Makefile b/tools/testing/selftests/kdbus/Makefile
index 0f6a745202af..96766c12a6e3 100644
--- a/tools/testing/selftests/kdbus/Makefile
+++ b/tools/testing/selftests/kdbus/Makefile
@@ -2,7 +2,7 @@ CFLAGS += -I../../../../usr/include/
 CFLAGS += -I../../../../include/uapi/
 CFLAGS += -std=gnu99
 CFLAGS += -DKBUILD_MODNAME=\"kdbus\" -D_GNU_SOURCE
-LDFLAGS = -pthread -lcap
+LDLIBS = -pthread -lcap

 OBJS= \
 	kdbus-enum.o \
@@ -37,7 +37,7 @@ all: kdbus-test
 	gcc $(CFLAGS) -c $< -o $@

 kdbus-test: $(OBJS)
-	gcc $(CFLAGS) $(LDFLAGS) $^ -o $@
+	gcc $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) -o $@

 run_tests:
 	./kdbus-test


And with that it's all happy on ppc64le:

Testing bus make functions (bus-make) .................................. OK
Testing the HELLO command (hello) ...................................... OK
Testing the BYEBYE command (byebye) .................................... OK
Testing a chat pattern (chat) .......................................... OK
Testing a simple dameon (daemon) ....................................... OK
Testing file descriptor passing (fd-passing) ........................... OK
Testing custom endpoint (endpoint) ..................................... OK
Testing monitor functionality (monitor) ................................ OK
Testing basic name registry functions (name-basics) .................... OK
Testing name registry conflict details (name-conflict) ................. OK
Testing queuing of names (name-queue) .................................. OK
Testing basic message handling (message-basic) ......................... OK
Testing handling of messages with priority (message-prio) .............. OK
Testing timeout (timeout) .............................................. OK
Testing synchronous replies vs. BYEBYE (sync-byebye) ................... OK
Testing synchronous replies (sync-reply) ............................... OK
Testing freeing of memory (message-free) ............................... OK
Testing retrieving connection information (connection-info) ............ OK
Testing updating connection information (connection-update) ............ OK
Testing verifying pools are never writable (writable-pool) ............. OK
Testing policy (policy) ................................................ OK
Testing unprivileged bus access (policy-priv) .......................... OK
Testing policy in user namespaces (policy-ns) .......................... OK
Testing metadata in user namespaces (metadata-ns) ...................... OK
Testing adding of matches by id (match-id-add) ......................... OK
Testing removing of matches by id (match-id-remove) .................... OK
Testing adding of matches by name (match-name-add) ..................... OK
Testing removing of matches by name (match-name-remove) ................ OK
Testing matching for name changes (match-name-change) .................. OK
Testing matching with bloom filters (match-bloom) ...................... OK
Testing activator connections (activator) .............................. OK
Testing creating a domain (domain-make) ................................ OK
Testing benchmark (benchmark) .......................................... OK
Testing race multiple byebyes (race-byebye) ............................ OK
Testing race byebye vs match removal (race-byebye-match) ............... OK

SUMMARY: 35 tests passed, 0 skipped, 0 failed


cheers


Daniel Mack
2014-11-14 08:56:48 UTC
Post by Michael Ellerman
This patch adds a quite extensive test suite for kdbus that checks
the most important code pathes in the driver. The idea is to extend
the test suite over time.
Also, this code can serve as an example implementation to show how to
use the kernel API from userspace.
Great to see selftests included.
Thanks a lot for testing! I've added your hunks to the patch set now.


Daniel


