[PATCH V5 05/15] x86, pci: Cleanup platform specific MCFG data by using ECAM hot_added flag.
Tomasz Nowicki
2016-02-16 14:00:02 UTC
x86 keeps a lot of arch-specific data to maintain MCFG regions, but
there is no need for it. First, information such as start_bus and end_bus
can be obtained from the acpi_pci_root structure. Second, the equivalent of
the mcfg_added flag (hot_added) is already integrated into the MCFG library,
so it is enough to call pci_mmconfig_insert and pci_mmconfig_delete, which
handle hot-plugged MCFG regions internally.

This patch implements the above improvements; as a result we get
a much smaller pci_root_info structure.
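
For readers who want to see why the caller no longer needs its own
mcfg_added bookkeeping, here is a minimal user-space sketch. It is not the
kernel implementation: struct region, region_add() and the model_* functions
are hypothetical stand-ins, the exact-match lookup is a simplification, and
the rule that delete only removes hot-added regions is an assumption drawn
from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 16

/* Hypothetical, simplified stand-in for struct pci_mmcfg_region. */
struct region {
	bool used;
	bool hot_added;		/* true only for regions added after boot */
	unsigned int seg, start, end;
};

static struct region regions[MAX_REGIONS];

/* Exact-match lookup; a simplification chosen for this sketch. */
static struct region *region_lookup(unsigned int seg, unsigned int start,
				    unsigned int end)
{
	for (size_t i = 0; i < MAX_REGIONS; i++) {
		struct region *r = &regions[i];

		if (r->used && r->seg == seg &&
		    r->start == start && r->end == end)
			return r;
	}
	return NULL;
}

static int region_add(unsigned int seg, unsigned int start, unsigned int end,
		      bool hot_added)
{
	assert(start <= end);

	if (region_lookup(seg, start, end))
		return -EEXIST;

	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (!regions[i].used) {
			regions[i] = (struct region){ true, hot_added,
						      seg, start, end };
			return 0;
		}
	}
	return -ENOSPC;
}

/* Models the insert path: every region added here is flagged hot_added. */
static int model_mmconfig_insert(unsigned int seg, unsigned int start,
				 unsigned int end)
{
	return region_add(seg, start, end, true);
}

/* Models the delete path: boot-time regions are left alone (assumption). */
static int model_mmconfig_delete(unsigned int seg, unsigned int start,
				 unsigned int end)
{
	struct region *r = region_lookup(seg, start, end);

	if (!r || !r->hot_added)
		return -ENOENT;

	r->used = false;
	return 0;
}
```

With the flag kept inside the region itself, a teardown path can call the
delete function unconditionally: a region that was present at boot is simply
not removed.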

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Tested-by: Suravee Suthikulpanit <***@amd.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
arch/x86/pci/acpi.c | 30 ++++++++----------------------
1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index cec68e7..081dc70 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -11,11 +11,6 @@
struct pci_root_info {
struct acpi_pci_root_info common;
struct pci_sysdata sd;
-#ifdef CONFIG_PCI_MMCONFIG
- bool mcfg_added;
- u8 start_bus;
- u8 end_bus;
-#endif
};

static bool pci_use_crs = true;
@@ -179,16 +174,13 @@ static int check_segment(u16 seg, struct device *dev, char *estr)

static int setup_mcfg_map(struct acpi_pci_root_info *ci)
{
- int result, seg;
- struct pci_root_info *info;
+ int result, seg, start, end;
struct acpi_pci_root *root = ci->root;
struct device *dev = &ci->bridge->dev;

- info = container_of(ci, struct pci_root_info, common);
- info->start_bus = (u8)root->secondary.start;
- info->end_bus = (u8)root->secondary.end;
- info->mcfg_added = false;
- seg = info->sd.domain;
+ seg = root->segment;
+ start = root->secondary.start;
+ end = root->secondary.end;

/* return success if MMCFG is not in use */
if (raw_pci_ext_ops && raw_pci_ext_ops != &pci_mmcfg)
@@ -197,13 +189,11 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)
if (!(pci_probe & PCI_PROBE_MMCONF))
return check_segment(seg, dev, "MMCONFIG is disabled,");

- result = pci_mmconfig_insert(dev, seg, info->start_bus, info->end_bus,
- root->mcfg_addr);
+ result = pci_mmconfig_insert(dev, seg, start, end, root->mcfg_addr);
if (result == 0) {
/* enable MMCFG if it hasn't been enabled yet */
if (raw_pci_ext_ops == NULL)
raw_pci_ext_ops = &pci_mmcfg;
- info->mcfg_added = true;
} else if (result != -EEXIST)
return check_segment(seg, dev,
"fail to add MMCONFIG information,");
@@ -213,14 +203,10 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)

static void teardown_mcfg_map(struct acpi_pci_root_info *ci)
{
- struct pci_root_info *info;
+ struct acpi_pci_root *root = ci->root;

- info = container_of(ci, struct pci_root_info, common);
- if (info->mcfg_added) {
- pci_mmconfig_delete(info->sd.domain,
- info->start_bus, info->end_bus);
- info->mcfg_added = false;
- }
+ pci_mmconfig_delete(root->segment, root->secondary.start,
+ root->secondary.end);
}
#else
static int setup_mcfg_map(struct acpi_pci_root_info *ci)
--
1.9.1
Tomasz Nowicki
2016-02-16 14:00:02 UTC
This patch implements a generic PCI host controller driver for the
ACPI world, similar to what the pci-host-generic.c driver does for the DT world.

All such drivers we have seen so far were implemented within the
arch/ directory, since they carried some arch assumptions (x86 and ia64).
However, they all do similar things, so it makes sense to find
the common code and abstract it into a generic driver.

This driver aims to initialize a PCI host controller without architecture
assumptions. It uses the MCFG library to manage PCI config space regions properly.
It also parses the _CRS content to find the host bridge's resources (i.e. MEM/IO).
As mentioned in the Kconfig help section, the ACPI_PCI_HOST_GENERIC choice should be
made on a per-architecture basis.

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Signed-off-by: Hanjun Guo <***@linaro.org>
Signed-off-by: Suravee Suthikulpanit <***@amd.com>
Signed-off-by: Lorenzo Pieralisi <***@arm.com>
TO: Bjorn Helgaas <***@kernel.org>
TO: Rafael J. Wysocki <***@kernel.org>
Tested-by: Suravee Suthikulpanit <***@amd.com>
Tested-by: Jeremy Linton <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
drivers/acpi/Kconfig | 7 +++
drivers/acpi/pci_root.c | 128 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci-acpi.h | 10 ++--
3 files changed, 141 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 183ffa3..1c7f57bd 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -346,6 +346,13 @@ config ACPI_PCI_SLOT
i.e., segment/bus/device/function tuples, with physical slots in
the system. If you are unsure, say N.

+config ACPI_PCI_HOST_GENERIC
+ bool
+ help
+ Select this config option from the architecture Kconfig,
+ if it is preferred to enable ACPI PCI host controller driver which
+ has no arch-specific assumptions.
+
config X86_PM_TIMER
bool "Power Management Timer Support" if EXPERT
depends on X86
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 3b284dc..02fd690 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -532,6 +532,134 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm)
}
}

+#ifdef CONFIG_ACPI_PCI_HOST_GENERIC
+static int pci_acpi_setup_mcfg_map(struct acpi_pci_root_info *ci)
+{
+ struct acpi_pci_root *root = ci->root;
+ int ret;
+
+ ret = pci_mmconfig_insert(&ci->bridge->dev, root->segment,
+ root->secondary.start, root->secondary.end,
+ root->mcfg_addr);
+ if (ret == -EEXIST)
+ ret = 0;
+
+ return ret;
+}
+
+static void pci_acpi_teardown_mcfg_map(struct acpi_pci_root_info *ci)
+{
+ struct acpi_pci_root *root = ci->root;
+
+ pci_mmconfig_delete(root->segment, root->secondary.start,
+ root->secondary.end);
+ kfree(ci);
+}
+
+static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
+{
+ struct list_head *list = &ci->resources;
+ struct acpi_device *device = ci->bridge;
+ struct resource_entry *entry, *tmp;
+ unsigned long flags;
+ int ret;
+
+ flags = IORESOURCE_IO | IORESOURCE_MEM;
+ ret = acpi_dev_get_resources(device, list,
+ acpi_dev_filter_resource_type_cb,
+ (void *)flags);
+ if (ret < 0) {
+ dev_warn(&device->dev,
+ "failed to parse _CRS method, error code %d\n", ret);
+ return ret;
+ } else if (ret == 0)
+ dev_dbg(&device->dev,
+ "no IO and memory resources present in _CRS\n");
+
+ resource_list_for_each_entry_safe(entry, tmp, &ci->resources) {
+ struct resource *res = entry->res;
+
+ if (entry->res->flags & IORESOURCE_DISABLED)
+ resource_list_destroy_entry(entry);
+ else
+ res->name = ci->name;
+
+ if (res->flags & IORESOURCE_IO) {
+ resource_size_t cpu_addr = res->start;
+ resource_size_t pci_addr = cpu_addr - entry->offset;
+ resource_size_t length = resource_size(res);
+ unsigned long port;
+
+ if (pci_register_io_range(cpu_addr, length)) {
+ resource_list_destroy_entry(entry);
+ continue;
+ }
+
+ port = pci_address_to_pio(cpu_addr);
+ if (port == (unsigned long)-1) {
+ resource_list_destroy_entry(entry);
+ continue;
+ }
+
+ res->start = port;
+ res->end = port + length - 1;
+ entry->offset = port - pci_addr;
+
+ if (pci_remap_iospace(res, cpu_addr) < 0)
+ resource_list_destroy_entry(entry);
+ }
+ }
+ return ret;
+}
+
+static struct acpi_pci_root_ops acpi_pci_root_ops = {
+ .init_info = pci_acpi_setup_mcfg_map,
+ .release_info = pci_acpi_teardown_mcfg_map,
+ .prepare_resources = pci_acpi_root_prepare_resources,
+};
+
+/* Root bridge scanning */
+struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
+{
+ int node = acpi_get_node(root->device->handle);
+ int domain = root->segment;
+ int busnum = root->secondary.start;
+ struct acpi_pci_root_info *info;
+ struct pci_bus *bus, *child;
+
+ if (domain && !pci_domains_supported) {
+ pr_warn("PCI %04x:%02x: multiple domains not supported.\n",
+ domain, busnum);
+ return NULL;
+ }
+
+ info = kzalloc_node(sizeof(*info), GFP_KERNEL, node);
+ if (!info) {
+ dev_err(&root->device->dev,
+ "pci_bus %04x:%02x: ignored (out of memory)\n",
+ domain, busnum);
+ return NULL;
+ }
+
+ acpi_pci_root_ops.pci_ops = pci_mcfg_get_ops(root);
+ bus = acpi_pci_root_create(root, &acpi_pci_root_ops, info, root);
+ if (!bus)
+ return NULL;
+
+ pci_bus_claim_resources(bus);
+ pci_assign_unassigned_bus_resources(bus);
+
+ /*
+ * After the PCI-E bus has been walked and all devices discovered,
+ * configure any settings of the fabric that might be necessary.
+ */
+ list_for_each_entry(child, &bus->children, node)
+ pcie_bus_configure_settings(child);
+
+ return bus;
+}
+#endif /* CONFIG_ACPI_PCI_HOST_GENERIC */
+
static int acpi_pci_root_add(struct acpi_device *device,
const struct acpi_device_id *not_used)
{
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 3dc6a8c..93feb04 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -123,10 +123,6 @@ struct pci_mmcfg_region {
bool hot_added;
};

-extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
- phys_addr_t addr);
-extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
-
extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
int end, u64 addr);
@@ -142,10 +138,16 @@ extern struct list_head pci_mmcfg_list;
#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)

#ifdef CONFIG_PCI_MMCONFIG
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
extern struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root);
extern void __iomem *pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn,
int offset);
#else
+static inline int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start,
+ u8 end, phys_addr_t addr) { return 0; }
+static inline int pci_mmconfig_delete(u16 seg, u8 start, u8 end) { return 0; }
static inline struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
{ return NULL; }
static inline void __iomem *pci_mcfg_dev_base(struct pci_bus *bus,
--
1.9.1
Tomasz Nowicki
2016-02-16 14:00:02 UTC
Now that we have a valid PCI host bridge device reference, we can
introduce code that finds its bus domain number using the
ACPI _SEG method.

Note that the _SEG method is optional; its absence means
that all PCI buses belong to domain 0.

While at it, for the sake of code clarity, we put the ACPI and DT domain
assignment methods into corresponding helpers.
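
The _SEG fallback described above can be modelled in user space as follows.
This is only a sketch: enum fw_status, struct fw_bridge and
fw_evaluate_seg() are hypothetical stand-ins for the ACPI status codes,
the bridge's ACPI device and acpi_evaluate_integer(); they are not kernel
or ACPICA APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes standing in for AE_OK, AE_NOT_FOUND, others. */
enum fw_status { FW_OK, FW_NOT_FOUND, FW_ERROR };

/* Hypothetical firmware view of a host bridge. */
struct fw_bridge {
	bool has_seg;		/* does the namespace define _SEG? */
	bool eval_fails;	/* simulate a broken method */
	unsigned long long seg;	/* value _SEG would return */
};

static enum fw_status fw_evaluate_seg(const struct fw_bridge *b,
				      unsigned long long *out)
{
	if (!b->has_seg)
		return FW_NOT_FOUND;
	if (b->eval_fails)
		return FW_ERROR;

	*out = b->seg;
	return FW_OK;
}

/*
 * Domain defaults to 0: a missing _SEG is not an error, and any other
 * failure is reported but still falls back to domain 0.
 */
static int model_bus_domain_nr(const struct fw_bridge *b)
{
	unsigned long long segment = 0;
	enum fw_status status;

	assert(b != NULL);

	status = fw_evaluate_seg(b, &segment);
	if (status != FW_OK && status != FW_NOT_FOUND)
		fprintf(stderr, "can't evaluate _SEG\n");

	return (int)segment;
}
```

The point of initializing segment to 0 before the evaluation is that both
the "method absent" and the "method failed" paths end up in domain 0 without
a separate branch.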

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Reviewed-by: Liviu Dudau <***@arm.com>
Tested-by: Suravee Suthikulpanit <***@amd.com>
Tested-by: Jeremy Linton <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
drivers/acpi/pci_root.c | 18 ++++++++++++++++++
drivers/pci/pci.c | 11 +++++++++--
include/linux/pci-acpi.h | 2 ++
3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index c2bd6dd..3b284dc 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -419,6 +419,24 @@ out:
}
EXPORT_SYMBOL(acpi_pci_osc_control_set);

+int acpi_pci_bus_domain_nr(struct device *parent)
+{
+ struct acpi_device *acpi_dev = to_acpi_device(parent);
+ unsigned long long segment = 0;
+ acpi_status status;
+
+ /*
+ * If _SEG method does not exist, following ACPI spec (6.5.6)
+ * all PCI buses belong to domain 0.
+ */
+ status = acpi_evaluate_integer(acpi_dev->handle, METHOD_NAME__SEG, NULL,
+ &segment);
+ if (ACPI_FAILURE(status) && status != AE_NOT_FOUND)
+ dev_err(&acpi_dev->dev, "can't evaluate _SEG\n");
+
+ return segment;
+}
+
static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm)
{
u32 support, control, requested;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 602eb42..d6c768e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -19,6 +19,7 @@
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/log2.h>
+#include <linux/pci-acpi.h>
#include <linux/pci-aspm.h>
#include <linux/pm_wakeup.h>
#include <linux/interrupt.h>
@@ -4769,7 +4770,7 @@ int pci_get_new_domain_nr(void)
}

#ifdef CONFIG_PCI_DOMAINS_GENERIC
-void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
+static int of_pci_bus_domain_nr(struct device *parent)
{
static int use_dt_domains = -1;
int domain = of_get_pci_domain_nr(parent->of_node);
@@ -4811,7 +4812,13 @@ void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
domain = -1;
}

- bus->domain_nr = domain;
+ return domain;
+}
+
+void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
+{
+ bus->domain_nr = acpi_disabled ? of_pci_bus_domain_nr(parent) :
+ acpi_pci_bus_domain_nr(parent);
}
#endif
#endif
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 94d8f38..b4f87ba9 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -22,6 +22,7 @@ static inline acpi_status pci_acpi_remove_pm_notifier(struct acpi_device *dev)
{
return acpi_remove_pm_notifier(dev);
}
+extern int acpi_pci_bus_domain_nr(struct device *parent);
extern phys_addr_t acpi_pci_root_get_mcfg_addr(acpi_handle handle);

static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev *pdev)
@@ -143,6 +144,7 @@ extern struct list_head pci_mmcfg_list;
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
+static inline int acpi_pci_bus_domain_nr(struct device *parent) { return -1; }
#endif /* CONFIG_ACPI */

#ifdef CONFIG_ACPI_APEI
--
1.9.1
Jayachandran Chandrashekaran Nair
2016-02-17 13:50:01 UTC
Tomasz, Lorenzo,
Post by Tomasz Nowicki
As we now have valid PCI host bridge device reference we can
introduce code that is going to find its bus domain number using
ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means
that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain
assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html

JC.
Tomasz Nowicki
2016-02-17 14:10:03 UTC
Post by Jayachandran Chandrashekaran Nair
Tomasz, Lorenzo,
Post by Tomasz Nowicki
As we now have valid PCI host bridge device reference we can
introduce code that is going to find its bus domain number using
ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means
that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain
assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on a NULL parent device to decide on the boot method is a really
ugly approach. This may hit us again once we want to obtain other
firmware-specific info, e.g. the NUMA node. IMO we need to fix it this way.

Tomasz
Jayachandran Chandrashekaran Nair
2016-02-17 14:30:02 UTC
Post by Jayachandran Chandrashekaran Nair
Tomasz, Lorenzo,
Post by Tomasz Nowicki
As we now have valid PCI host bridge device reference we can
introduce code that is going to find its bus domain number using
ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means
that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain
assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly
way. This may hit us again once we want to obtain another firmware specific
info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there; in the current code the parent is NULL
in the ACPI case, and the check is needed to avoid a crash (unless that
has changed).

The main part was the macro acpi_pci_get_segment() and the use
of acpi_pci_root_info from sysdata to do this.

JC.
Tomasz Nowicki
2016-02-17 15:10:02 UTC
Post by Jayachandran Chandrashekaran Nair
Post by Jayachandran Chandrashekaran Nair
Tomasz, Lorenzo,
Post by Tomasz Nowicki
As we now have valid PCI host bridge device reference we can
introduce code that is going to find its bus domain number using
ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means
that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain
assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly
way. This may hit us again once we want to obtain another firmware specific
info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL
in case of ACPI, and the check is needed not to crash (unless that
has changed).
This series passes down a valid parent; see [PATCH V5 06/15].
Post by Jayachandran Chandrashekaran Nair
The main part was the macro acpi_pci_get_segment() and the use
of acpi_pci_root_info from sysdata to do this.
Since we can obtain the related firmware-specific data from a valid parent
device (without defining further accessors), I do not see the point of
using sysdata. Let me know your opinion.

Tomasz
Jayachandran Chandrashekaran Nair
2016-02-17 15:30:02 UTC
Post by Tomasz Nowicki
Post by Jayachandran Chandrashekaran Nair
Post by Jayachandran Chandrashekaran Nair
Tomasz, Lorenzo,
Post by Tomasz Nowicki
As we now have valid PCI host bridge device reference we can
introduce code that is going to find its bus domain number using
ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means
that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain
assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly
way. This may hit us again once we want to obtain another firmware specific
info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL
in case of ACPI, and the check is needed not to crash (unless that
has changed).
This series passes down valid parent, see [PATCH V5 06/15].
Post by Jayachandran Chandrashekaran Nair
The main part was the macro acpi_pci_get_segment() and the use
of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device
(without defining another accessors), I do not see the point to use sysdata.
Let me know your opinion.
In the patch, you use the parent info and call the _SEG method again.
The segment information is available in ->root->segment of
acpi_pci_root_info if you set up the sysdata as in my patch.

JC.
Tomasz Nowicki
2016-02-17 15:40:02 UTC
Post by Jayachandran Chandrashekaran Nair
Post by Tomasz Nowicki
Post by Jayachandran Chandrashekaran Nair
Post by Jayachandran Chandrashekaran Nair
Tomasz, Lorenzo,
Post by Tomasz Nowicki
As we now have valid PCI host bridge device reference we can
introduce code that is going to find its bus domain number using
ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means
that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain
assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly
way. This may hit us again once we want to obtain another firmware specific
info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL
in case of ACPI, and the check is needed not to crash (unless that
has changed).
This series passes down valid parent, see [PATCH V5 06/15].
Post by Jayachandran Chandrashekaran Nair
The main part was the macro acpi_pci_get_segment() and the use
of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device
(without defining another accessors), I do not see the point to use sysdata.
Let me know your opinion.
In the patch, you use the parent info and call _SEG method again.
The segment information is available in the ->root->segment of
acpi_pci_root_info if you setup the sysdata like in my patch
I know it is in sysdata->root->segment, but the way it is passed down is
wrong. sysdata is a pointer to unknown content (void *), so we need to
validate it before we can use it. If we merge this patch we can remove
the first _SEG call.

Tomasz
Lorenzo Pieralisi
2016-02-17 17:50:01 UTC
Guys,

On Wed, Feb 17, 2016 at 04:35:30PM +0100, Tomasz Nowicki wrote:

[...]
Post by Tomasz Nowicki
Post by Jayachandran Chandrashekaran Nair
Post by Tomasz Nowicki
Post by Jayachandran Chandrashekaran Nair
Post by Jayachandran Chandrashekaran Nair
In my patchset, I had a slightly different and I think better approach for
this without calling the _SEG method again. Please see
http://www.spinics.net/lists/arm-kernel/msg478167.html
at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly
way. This may hit us again once we want to obtain another firmware specific
info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL
in case of ACPI, and the check is needed not to crash (unless that
has changed).
This series passes down valid parent, see [PATCH V5 06/15].
Post by Jayachandran Chandrashekaran Nair
The main part was the macro acpi_pci_get_segment() and the use
of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device
(without defining another accessors), I do not see the point to use sysdata.
Let me know your opinion.
In the patch, you use the parent info and call _SEG method again.
The segment information is available in the ->root->segment of
acpi_pci_root_info if you setup the sysdata like in my patch
I know it is in sysdata->root->segment, but the way it is passed
down is wrong. sysdata is the pointer to unknown content (void *) so
we need to validate it before we can use it. If we merge this patch
we can remove first _SEG call.
I personally do not think there is such a significant difference; both
solutions have pros and cons. It is worth keeping in mind, though,
that reading _SEG again to set the bus domain number works only if
the value we stash in acpi_pci_root.segment is not overridden; if it
is (see x86, agreed that's to fix a FW bug) we have a disconnect.
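
To make the disconnect concrete, here is a small user-space sketch. All
names in it (struct fw_bridge, struct model_root, root_apply_quirk() and
the two getters) are hypothetical and only model the two sources of the
domain number discussed in this thread; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware view: the value a fresh _SEG evaluation returns. */
struct fw_bridge {
	int seg;
};

/* Hypothetical stand-in for acpi_pci_root with its stashed segment. */
struct model_root {
	const struct fw_bridge *fw;
	int segment;		/* cached at init, may be overridden later */
};

static void root_init(struct model_root *root, const struct fw_bridge *fw)
{
	assert(root != NULL && fw != NULL);
	root->fw = fw;
	root->segment = fw->seg;
}

/* Models an arch quirk that overrides the stashed value. */
static void root_apply_quirk(struct model_root *root, int forced_segment)
{
	root->segment = forced_segment;
}

/* Option A: take the domain from the stashed value. */
static int domain_from_root(const struct model_root *root)
{
	return root->segment;
}

/* Option B: evaluate the firmware method again. */
static int domain_from_firmware(const struct model_root *root)
{
	return root->fw->seg;
}
```

As long as nothing overrides the stashed value, both getters agree; once a
quirk rewrites it, only the stashed value reflects the override.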

On the other hand Tomasz's code allows removing some IA64 code in the
process (code that sets the bridge companion, so part of the patch
should be kept regardless).

So, there are two things to do:

- Assign the bridge companion in PCI core code
- Decide where to get the domain number from (acpi_pci_root.segment vs
calling _SEG again). At present they are equivalent so I do not see
any compelling reason to change this patch.

Side note: there is already a function (pci_domain_nr()) that you
can implement in ACPI PCI host generic (by deselecting
PCI_DOMAINS_GENERIC if ACPI), so there is no need for acpi_pci_get_segment()
in case we have to override the _SEG value in the future. At present
there is no need; comments appreciated.

Lorenzo
Tomasz Nowicki
2016-02-16 14:00:02 UTC
No functional changes in this patch.

The PCI I/O space mapping code does not depend on OF, so it can be
moved to PCI core code. This way we will be able to use it elsewhere,
e.g. in ACPI PCI code.
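
For orientation, the address translation that the moved code performs can
be modelled in user space roughly as below. This is a sketch under
assumptions, not the kernel code: it uses a fixed array instead of a locked
list, skips the IO_SPACE_LIMIT handling, and the model_* names, MAX_RANGES,
PIO_INVALID and ADDR_INVALID are made up for illustration. The containment
check here compares against each recorded range's own size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES	8
#define PIO_INVALID	((unsigned long)-1)
#define ADDR_INVALID	UINT64_MAX

struct io_range {
	uint64_t start;		/* CPU physical address of the I/O window */
	uint64_t size;
};

static struct io_range io_ranges[MAX_RANGES];
static size_t nr_io_ranges;

/* Record a window; returns 0 on success or if it is already covered. */
static int model_register_io_range(uint64_t addr, uint64_t size)
{
	assert(size > 0);

	for (size_t i = 0; i < nr_io_ranges; i++) {
		const struct io_range *r = &io_ranges[i];

		if (addr >= r->start && addr + size <= r->start + r->size)
			return 0;
	}

	if (nr_io_ranges == MAX_RANGES)
		return -1;

	io_ranges[nr_io_ranges++] = (struct io_range){ addr, size };
	return 0;
}

/* CPU address -> port number: windows are packed back to back. */
static unsigned long model_address_to_pio(uint64_t address)
{
	uint64_t offset = 0;

	for (size_t i = 0; i < nr_io_ranges; i++) {
		const struct io_range *r = &io_ranges[i];

		if (address >= r->start && address < r->start + r->size)
			return (unsigned long)(address - r->start + offset);
		offset += r->size;
	}
	return PIO_INVALID;
}

/* Port number -> CPU address: the inverse walk over the same table. */
static uint64_t model_pio_to_address(unsigned long pio)
{
	uint64_t offset = 0;

	for (size_t i = 0; i < nr_io_ranges; i++) {
		const struct io_range *r = &io_ranges[i];

		if (pio >= offset && pio < offset + r->size)
			return r->start + (pio - offset);
		offset += r->size;
	}
	return ADDR_INVALID;
}
```

Because ports are assigned by accumulating the sizes of earlier windows, the
port number of an address depends on registration order, which is why both
directions walk the same table in the same order.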

Suggested-by: Lorenzo Pieralisi <***@arm.com>
Signed-off-by: Tomasz Nowicki <***@semihalf.com>
CC: Arnd Bergmann <***@arndb.de>
CC: Liviu Dudau <***@arm.com>
CC: Lorenzo Pieralisi <***@arm.com>
---
drivers/of/address.c | 116 +--------------------------------------------
drivers/pci/pci.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/of_address.h | 9 ----
include/linux/pci.h | 5 ++
4 files changed, 121 insertions(+), 124 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 91a469d..0a553c0 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -4,6 +4,7 @@
#include <linux/ioport.h>
#include <linux/module.h>
#include <linux/of_address.h>
+#include <linux/pci.h>
#include <linux/pci_regs.h>
#include <linux/sizes.h>
#include <linux/slab.h>
@@ -673,121 +674,6 @@ const __be32 *of_get_address(struct device_node *dev, int index, u64 *size,
}
EXPORT_SYMBOL(of_get_address);

-#ifdef PCI_IOBASE
-struct io_range {
- struct list_head list;
- phys_addr_t start;
- resource_size_t size;
-};
-
-static LIST_HEAD(io_range_list);
-static DEFINE_SPINLOCK(io_range_lock);
-#endif
-
-/*
- * Record the PCI IO range (expressed as CPU physical address + size).
- * Return a negative value if an error has occured, zero otherwise
- */
-int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size)
-{
- int err = 0;
-
-#ifdef PCI_IOBASE
- struct io_range *range;
- resource_size_t allocated_size = 0;
-
- /* check if the range hasn't been previously recorded */
- spin_lock(&io_range_lock);
- list_for_each_entry(range, &io_range_list, list) {
- if (addr >= range->start && addr + size <= range->start + size) {
- /* range already registered, bail out */
- goto end_register;
- }
- allocated_size += range->size;
- }
-
- /* range not registed yet, check for available space */
- if (allocated_size + size - 1 > IO_SPACE_LIMIT) {
- /* if it's too big check if 64K space can be reserved */
- if (allocated_size + SZ_64K - 1 > IO_SPACE_LIMIT) {
- err = -E2BIG;
- goto end_register;
- }
-
- size = SZ_64K;
- pr_warn("Requested IO range too big, new size set to 64K\n");
- }
-
- /* add the range to the list */
- range = kzalloc(sizeof(*range), GFP_ATOMIC);
- if (!range) {
- err = -ENOMEM;
- goto end_register;
- }
-
- range->start = addr;
- range->size = size;
-
- list_add_tail(&range->list, &io_range_list);
-
-end_register:
- spin_unlock(&io_range_lock);
-#endif
-
- return err;
-}
-
-phys_addr_t pci_pio_to_address(unsigned long pio)
-{
- phys_addr_t address = (phys_addr_t)OF_BAD_ADDR;
-
-#ifdef PCI_IOBASE
- struct io_range *range;
- resource_size_t allocated_size = 0;
-
- if (pio > IO_SPACE_LIMIT)
- return address;
-
- spin_lock(&io_range_lock);
- list_for_each_entry(range, &io_range_list, list) {
- if (pio >= allocated_size && pio < allocated_size + range->size) {
- address = range->start + pio - allocated_size;
- break;
- }
- allocated_size += range->size;
- }
- spin_unlock(&io_range_lock);
-#endif
-
- return address;
-}
-
-unsigned long __weak pci_address_to_pio(phys_addr_t address)
-{
-#ifdef PCI_IOBASE
- struct io_range *res;
- resource_size_t offset = 0;
- unsigned long addr = -1;
-
- spin_lock(&io_range_lock);
- list_for_each_entry(res, &io_range_list, list) {
- if (address >= res->start && address < res->start + res->size) {
- addr = address - res->start + offset;
- break;
- }
- offset += res->size;
- }
- spin_unlock(&io_range_lock);
-
- return addr;
-#else
- if (address > IO_SPACE_LIMIT)
- return (unsigned long)-1;
-
- return (unsigned long) address;
-#endif
-}
-
static int __of_address_to_resource(struct device_node *dev,
const __be32 *addrp, u64 size, unsigned int flags,
const char *name, struct resource *r)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d6c768e..3a516c0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3023,6 +3023,121 @@ int pci_request_regions_exclusive(struct pci_dev *pdev, const char *res_name)
}
EXPORT_SYMBOL(pci_request_regions_exclusive);

+#ifdef PCI_IOBASE
+struct io_range {
+ struct list_head list;
+ phys_addr_t start;
+ resource_size_t size;
+};
+
+static LIST_HEAD(io_range_list);
+static DEFINE_SPINLOCK(io_range_lock);
+#endif
+
+/*
+ * Record the PCI IO range (expressed as CPU physical address + size).
+ * Return a negative value if an error has occured, zero otherwise
+ */
+int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size)
+{
+ int err = 0;
+
+#ifdef PCI_IOBASE
+ struct io_range *range;
+ resource_size_t allocated_size = 0;
+
+ /* check if the range hasn't been previously recorded */
+ spin_lock(&io_range_lock);
+ list_for_each_entry(range, &io_range_list, list) {
+ if (addr >= range->start && addr + size <= range->start + size) {
+ /* range already registered, bail out */
+ goto end_register;
+ }
+ allocated_size += range->size;
+ }
+
+ /* range not registed yet, check for available space */
+ if (allocated_size + size - 1 > IO_SPACE_LIMIT) {
+ /* if it's too big check if 64K space can be reserved */
+ if (allocated_size + SZ_64K - 1 > IO_SPACE_LIMIT) {
+ err = -E2BIG;
+ goto end_register;
+ }
+
+ size = SZ_64K;
+ pr_warn("Requested IO range too big, new size set to 64K\n");
+ }
+
+ /* add the range to the list */
+ range = kzalloc(sizeof(*range), GFP_ATOMIC);
+ if (!range) {
+ err = -ENOMEM;
+ goto end_register;
+ }
+
+ range->start = addr;
+ range->size = size;
+
+ list_add_tail(&range->list, &io_range_list);
+
+end_register:
+ spin_unlock(&io_range_lock);
+#endif
+
+ return err;
+}
+
+phys_addr_t pci_pio_to_address(unsigned long pio)
+{
+ phys_addr_t address = (phys_addr_t)OF_BAD_ADDR;
+
+#ifdef PCI_IOBASE
+ struct io_range *range;
+ resource_size_t allocated_size = 0;
+
+ if (pio > IO_SPACE_LIMIT)
+ return address;
+
+ spin_lock(&io_range_lock);
+ list_for_each_entry(range, &io_range_list, list) {
+ if (pio >= allocated_size && pio < allocated_size + range->size) {
+ address = range->start + pio - allocated_size;
+ break;
+ }
+ allocated_size += range->size;
+ }
+ spin_unlock(&io_range_lock);
+#endif
+
+ return address;
+}
+
+unsigned long __weak pci_address_to_pio(phys_addr_t address)
+{
+#ifdef PCI_IOBASE
+ struct io_range *res;
+ resource_size_t offset = 0;
+ unsigned long addr = -1;
+
+ spin_lock(&io_range_lock);
+ list_for_each_entry(res, &io_range_list, list) {
+ if (address >= res->start && address < res->start + res->size) {
+ addr = address - res->start + offset;
+ break;
+ }
+ offset += res->size;
+ }
+ spin_unlock(&io_range_lock);
+
+ return addr;
+#else
+ if (address > IO_SPACE_LIMIT)
+ return (unsigned long)-1;
+
+ return (unsigned long) address;
+#endif
+}
+
/**
* pci_remap_iospace - Remap the memory mapped I/O space
* @res: Resource describing the I/O space
diff --git a/include/linux/of_address.h b/include/linux/of_address.h
index 01c0a55..3786473 100644
--- a/include/linux/of_address.h
+++ b/include/linux/of_address.h
@@ -47,10 +47,6 @@ void __iomem *of_io_request_and_map(struct device_node *device,
extern const __be32 *of_get_address(struct device_node *dev, int index,
u64 *size, unsigned int *flags);

-extern int pci_register_io_range(phys_addr_t addr, resource_size_t size);
-extern unsigned long pci_address_to_pio(phys_addr_t addr);
-extern phys_addr_t pci_pio_to_address(unsigned long pio);
-
extern int of_pci_range_parser_init(struct of_pci_range_parser *parser,
struct device_node *node);
extern struct of_pci_range *of_pci_range_parser_one(
@@ -86,11 +82,6 @@ static inline const __be32 *of_get_address(struct device_node *dev, int index,
return NULL;
}

-static inline phys_addr_t pci_pio_to_address(unsigned long pio)
-{
- return 0;
-}
-
static inline int of_pci_range_parser_init(struct of_pci_range_parser *parser,
struct device_node *node)
{
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 27df4a6..dac677c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1168,6 +1168,9 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
void *alignf_data);


+int pci_register_io_range(phys_addr_t addr, resource_size_t size);
+unsigned long pci_address_to_pio(phys_addr_t addr);
+phys_addr_t pci_pio_to_address(unsigned long pio);
int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);

static inline pci_bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar)
@@ -1488,6 +1491,8 @@ static inline int pci_request_regions(struct pci_dev *dev, const char *res_name)
{ return -EIO; }
static inline void pci_release_regions(struct pci_dev *dev) { }

+static inline unsigned long pci_address_to_pio(phys_addr_t addr) { return -1; }
+
static inline void pci_block_cfg_access(struct pci_dev *dev) { }
static inline int pci_block_cfg_access_in_atomic(struct pci_dev *dev)
{ return 0; }
--
1.9.1
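
For readers following the pio <-> address helpers moved by the patch above, here is a minimal user-space sketch of the same bookkeeping. It assumes a plain array instead of the kernel list and drops the locking; io_range_model, model_address_to_pio and model_pio_to_address are hypothetical names used only for illustration. The idea it mirrors is that each registered window takes the next free slice of logical port space, so a port number is the offset inside the window plus the sizes of all windows registered before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct io_range. */
struct io_range_model {
	uint64_t start;	/* CPU physical base of the window */
	uint64_t size;	/* window length in bytes */
};

#define PIO_INVALID  ((unsigned long)-1)
#define ADDR_INVALID ((uint64_t)-1)

/* Windows are packed back to back in logical port space, in array order. */
static unsigned long model_address_to_pio(const struct io_range_model *r,
					  size_t n, uint64_t address)
{
	uint64_t offset = 0;
	size_t i;

	assert(r != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (address >= r[i].start && address < r[i].start + r[i].size)
			return (unsigned long)(address - r[i].start + offset);
		offset += r[i].size;
	}
	return PIO_INVALID;
}

/* Inverse lookup: find the window whose slice of port space holds 'pio'. */
static uint64_t model_pio_to_address(const struct io_range_model *r,
				     size_t n, unsigned long pio)
{
	uint64_t allocated = 0;
	size_t i;

	assert(r != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pio >= allocated && pio < allocated + r[i].size)
			return r[i].start + pio - allocated;
		allocated += r[i].size;
	}
	return ADDR_INVALID;
}
```

With two 64K windows, an address 4 bytes into the second window maps to port 0x10004, and the reverse lookup returns the original address.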
Tomasz Nowicki
2016-02-16 14:00:02 UTC
There are two ways to get ECAM (aka MCFG) regions using ACPI:
first from the static MCFG table and second from the _CBA method. We cannot
remove static regions; however, regions coming from _CBA should be removed
when the bridge device is removed.

In light of the above, we need a flag to mark hot-added ECAM entries, and
users must call pci_mmconfig_insert when adding regions from the _CBA method
and pci_mmconfig_delete when removing hot-added regions.
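
As an illustration only, the rule that static regions stay and hot-added regions may be deleted can be modelled in user space roughly as follows. This is a sketch under stated assumptions: region_model, region_add and region_delete are hypothetical names, a fixed array replaces the RCU-protected list, and -1 stands in for whatever error code the real function returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 8

/* Hypothetical, simplified stand-in for struct pci_mmcfg_region. */
struct region_model {
	int segment;
	int start_bus;
	int end_bus;
	bool hot_added;	/* true only for regions inserted at run time (_CBA) */
	bool in_use;
};

static struct region_model regions[MAX_REGIONS];

/* Record a region; returns its slot index or -1 when the table is full. */
static int region_add(int seg, int start, int end, bool hot_added)
{
	size_t i;

	assert(start <= end);
	for (i = 0; i < MAX_REGIONS; i++) {
		if (!regions[i].in_use) {
			regions[i] = (struct region_model){
				.segment = seg, .start_bus = start,
				.end_bus = end, .hot_added = hot_added,
				.in_use = true,
			};
			return (int)i;
		}
	}
	return -1;
}

/* Delete only hot-added matches; static (MCFG table) entries are kept. */
static int region_delete(int seg, int start, int end)
{
	size_t i;

	for (i = 0; i < MAX_REGIONS; i++) {
		struct region_model *r = &regions[i];

		if (r->in_use && r->segment == seg && r->start_bus == start &&
		    r->end_bus == end && r->hot_added) {
			r->in_use = false;
			return 0;
		}
	}
	return -1;	/* placeholder error value for "no deletable region" */
}
```

A delete request for a region that came from the static table fails in this model, while the same request for a hot-added region succeeds exactly once.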

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Tested-by: Suravee Suthikulpanit <***@amd.com>
Tested-by: Jeremy Linton <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
drivers/acpi/pci_mcfg.c | 4 +++-
include/linux/pci-acpi.h | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0467b00..3282f2a 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -74,6 +74,7 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
new->segment = segment;
new->start_bus = start;
new->end_bus = end;
+ new->hot_added = false;

res = &new->res;
res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
@@ -205,6 +206,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
}
rc = pci_mmconfig_map_resource(dev, cfg);
if (!rc) {
+ cfg->hot_added = true;
list_add_sorted(cfg);
dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
&cfg->res, (unsigned long)addr);
@@ -228,7 +230,7 @@ int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
mutex_lock(&pci_mmcfg_lock);
list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
if (cfg->segment == seg && cfg->start_bus == start &&
- cfg->end_bus == end) {
+ cfg->end_bus == end && cfg->hot_added) {
list_del_rcu(&cfg->list);
synchronize_rcu();
pci_mmconfig_unmap_resource(cfg);
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index e9450ef..94d8f38 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -119,6 +119,7 @@ struct pci_mmcfg_region {
u8 start_bus;
u8 end_bus;
char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+ bool hot_added;
};

extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
--
1.9.1
Lorenzo Pieralisi
2016-02-18 12:40:02 UTC
Post by Tomasz Nowicki
There are two ways to get ECAM (aka MCFG) regions using ACPI:
first from the static MCFG table and second from the _CBA method. We cannot
remove static regions; however, regions coming from _CBA should be removed
when the bridge device is removed.
In light of the above, we need a flag to mark hot-added ECAM entries, and
users must call pci_mmconfig_insert when adding regions from the _CBA method
and pci_mmconfig_delete when removing hot-added regions.
"According to the PCI firmware specification, ACPI provides two standard
mechanisms to retrieve ECAM memory mapped configuration regions (aka MCFG).
For non-hot-removable bridges, ECAM bridge configurations are retrieved from
the static MCFG table and have to be considered non-hot-removable for the
current boot; hot-removable PCI host bridges configurations are retrieved
through bridges _CBA methods.

When ECAM regions are added through _CBA methods, they can be marked
as hot-added so that, upon hot-removal of the respective PCI host bridge,
they can be unmapped and deleted, as they are no longer needed.

This patch adds a flag to MCFG regions allowing to mark them as hot-added,
so that upon corresponding PCI bridge hot-removal they can be deleted since
no longer needed."
Post by Tomasz Nowicki
---
drivers/acpi/pci_mcfg.c | 4 +++-
include/linux/pci-acpi.h | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
It would be great if x86 people can have a look, we no longer
associate a MCFG region to a bridge structure, the end result
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0467b00..3282f2a 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -74,6 +74,7 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
new->segment = segment;
new->start_bus = start;
new->end_bus = end;
+ new->hot_added = false;
res = &new->res;
res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
@@ -205,6 +206,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
}
rc = pci_mmconfig_map_resource(dev, cfg);
if (!rc) {
+ cfg->hot_added = true;
list_add_sorted(cfg);
dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
&cfg->res, (unsigned long)addr);
@@ -228,7 +230,7 @@ int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
mutex_lock(&pci_mmcfg_lock);
list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
if (cfg->segment == seg && cfg->start_bus == start &&
- cfg->end_bus == end) {
+ cfg->end_bus == end && cfg->hot_added) {
list_del_rcu(&cfg->list);
synchronize_rcu();
pci_mmconfig_unmap_resource(cfg);
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index e9450ef..94d8f38 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -119,6 +119,7 @@ struct pci_mmcfg_region {
u8 start_bus;
u8 end_bus;
char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+ bool hot_added;
};
extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
--
1.9.1
Tomasz Nowicki
2016-02-16 14:00:02 UTC
Currently we have two platforms (x86 & ia64) capable of PCI ACPI host
bridge initialization. They both use arch-specific sysdata to pass down
parent device reference and both rely on NULL parent in pci_create_root_bus()
to validate sysdata content.

This looks hacky and prevents us from getting firmware-specific info
for the PCI host controller, based on its acpi_device structure, in the
generic pci_create_root_bus() function. We overcome that blocker by
passing down the parent device via the pci_create_root_bus() parameter
(as the ACPI device type). Core code then uses ACPI_COMPANION_SET, which
is relevant for the ACPI boot method only but is safe to run in all
cases: DT, ACPI and DT&ACPI.

Since the PCI core code now sets the ACPI companion device for us, the
x86- and ia64-specific ACPI companion setup is dead code. We can get rid
of it, including the related companion reference in the PCI sysdata
structure. Also, the PCI_CONTROLLER macro can no longer return a valid
companion device, so we convert its users to ACPI_COMPANION.
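
To make the data flow concrete, here is a small user-space model of the idea: the host bridge inherits its firmware companion from the parent device handed to the root bus creation call. All names (acpi_device_model, device_model, bridge_init) are hypothetical stand-ins; the kernel structures and ACPI_COMPANION_SET itself are richer than this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; the kernel types are much richer. */
struct acpi_device_model {
	const char *name;
};

struct device_model {
	struct device_model *parent;
	struct acpi_device_model *companion;	/* NULL when there is no ACPI companion */
};

/* Mirrors the idea of the probe.c hunk: inherit the parent's companion. */
static void bridge_init(struct device_model *bridge, struct device_model *parent)
{
	assert(bridge != NULL);
	bridge->parent = parent;
	bridge->companion = NULL;
	if (parent)
		bridge->companion = parent->companion;
}
```

In this model a parent without a companion (the DT case) or a NULL parent simply leaves the bridge companion unset, which is why running the same code path on every boot method is harmless.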

Suggested-by: Lorenzo Pieralisi <***@arm.com>
Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Reviewed-by: Lorenzo Pieralisi <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
arch/ia64/hp/common/sba_iommu.c | 2 +-
arch/ia64/include/asm/pci.h | 1 -
arch/ia64/pci/pci.c | 16 ----------------
arch/ia64/sn/kernel/io_acpi_init.c | 4 ++--
arch/x86/include/asm/pci.h | 3 ---
arch/x86/pci/acpi.c | 17 -----------------
drivers/acpi/pci_root.c | 8 +++++++-
drivers/pci/probe.c | 2 ++
8 files changed, 12 insertions(+), 41 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index a6d6190..78e4444 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1981,7 +1981,7 @@ sba_connect_bus(struct pci_bus *bus)
if (PCI_CONTROLLER(bus)->iommu)
return;

- handle = acpi_device_handle(PCI_CONTROLLER(bus)->companion);
+ handle = acpi_device_handle(ACPI_COMPANION(bus->bridge));
if (!handle)
return;

diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h
index 07039d1..5050748 100644
--- a/arch/ia64/include/asm/pci.h
+++ b/arch/ia64/include/asm/pci.h
@@ -65,7 +65,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
#define pci_legacy_write platform_pci_legacy_write

struct pci_controller {
- struct acpi_device *companion;
void *iommu;
int segment;
int node; /* nearest node with memory or NUMA_NO_NODE for global allocation */
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 8f6ac2f..978d6af 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -301,28 +301,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
}

info->controller.segment = root->segment;
- info->controller.companion = device;
info->controller.node = acpi_get_node(device->handle);
INIT_LIST_HEAD(&info->io_resources);
return acpi_pci_root_create(root, &pci_acpi_root_ops,
&info->common, &info->controller);
}

-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
-{
- /*
- * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL
- * here, pci_create_root_bus() has been called by someone else and
- * sysdata is likely to be different from what we expect. Let it go in
- * that case.
- */
- if (!bridge->dev.parent) {
- struct pci_controller *controller = bridge->bus->sysdata;
- ACPI_COMPANION_SET(&bridge->dev, controller->companion);
- }
- return 0;
-}
-
void pcibios_fixup_device_resources(struct pci_dev *dev)
{
int idx;
diff --git a/arch/ia64/sn/kernel/io_acpi_init.c b/arch/ia64/sn/kernel/io_acpi_init.c
index 0640739..bcfddc2 100644
--- a/arch/ia64/sn/kernel/io_acpi_init.c
+++ b/arch/ia64/sn/kernel/io_acpi_init.c
@@ -132,7 +132,7 @@ sn_get_bussoft_ptr(struct pci_bus *bus)
struct acpi_resource_vendor_typed *vendor;


- handle = acpi_device_handle(PCI_CONTROLLER(bus)->companion);
+ handle = acpi_device_handle(ACPI_COMPANION(bus->bridge));
status = acpi_get_vendor_resource(handle, METHOD_NAME__CRS,
&sn_uuid, &buffer);
if (ACPI_FAILURE(status)) {
@@ -360,7 +360,7 @@ sn_acpi_get_pcidev_info(struct pci_dev *dev, struct pcidev_info **pcidev_info,
acpi_status status;
struct acpi_buffer name_buffer = { ACPI_ALLOCATE_BUFFER, NULL };

- rootbus_handle = acpi_device_handle(PCI_CONTROLLER(dev)->companion);
+ rootbus_handle = acpi_device_handle(ACPI_COMPANION(dev->bus->bridge));
status = acpi_evaluate_integer(rootbus_handle, METHOD_NAME__SEG, NULL,
&segment);
if (ACPI_SUCCESS(status)) {
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 4625943..a98c022 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -14,9 +14,6 @@
struct pci_sysdata {
int domain; /* PCI domain */
int node; /* NUMA node */
-#ifdef CONFIG_ACPI
- struct acpi_device *companion; /* ACPI companion device */
-#endif
#ifdef CONFIG_X86_64
void *iommu; /* IOMMU private data */
#endif
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 081dc70..c67932e 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -334,7 +334,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
struct pci_sysdata sd = {
.domain = domain,
.node = node,
- .companion = root->device
};

memcpy(bus->sysdata, &sd, sizeof(sd));
@@ -349,7 +348,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
else {
info->sd.domain = domain;
info->sd.node = node;
- info->sd.companion = root->device;
bus = acpi_pci_root_create(root, &acpi_pci_root_ops,
&info->common, &info->sd);
}
@@ -367,21 +365,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
return bus;
}

-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
-{
- /*
- * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL
- * here, pci_create_root_bus() has been called by someone else and
- * sysdata is likely to be different from what we expect. Let it go in
- * that case.
- */
- if (!bridge->dev.parent) {
- struct pci_sysdata *sd = bridge->bus->sysdata;
- ACPI_COMPANION_SET(&bridge->dev, sd->companion);
- }
- return 0;
-}
-
int __init pci_acpi_init(void)
{
struct pci_dev *dev = NULL;
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index ae3fe4e..c2bd6dd 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -846,7 +846,13 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,

pci_acpi_root_add_resources(info);
pci_add_resource(&info->resources, &root->secondary);
- bus = pci_create_root_bus(NULL, busnum, ops->pci_ops,
+
+ /*
+ * pci_create_root_bus() needs to detect the parent device type,
+ * so initialize its companion data accordingly.
+ */
+ ACPI_COMPANION_SET(&device->dev, device);
+ bus = pci_create_root_bus(&device->dev, busnum, ops->pci_ops,
sysdata, &info->resources);
if (!bus)
goto out_release_info;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..88a4734 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2100,6 +2100,8 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
bridge->dev.parent = parent;
bridge->dev.release = pci_release_host_bridge_dev;
dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
+ if (parent)
+ ACPI_COMPANION_SET(&bridge->dev, ACPI_COMPANION(parent));
error = pcibios_root_bridge_prepare(bridge);
if (error) {
kfree(bridge);
--
1.9.1
Tomasz Nowicki
2016-02-16 14:00:03 UTC
We can now enable the MCFG library. Currently, there is no ARM64 use case for
RAW PCI config accessors, so let's use empty ones for now.
At the same time, we can clean up the old implementation of the RAW accessors
in arch/arm64/kernel/pci.c.

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Tested-by: Suravee Suthikulpanit <***@amd.com>
Tested-by: Jeremy Linton <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
arch/arm64/Kconfig | 4 ++++
arch/arm64/kernel/pci.c | 15 ---------------
2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..552e996 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -238,6 +238,10 @@ source "drivers/pci/Kconfig"
source "drivers/pci/pcie/Kconfig"
source "drivers/pci/hotplug/Kconfig"

+config PCI_MMCONFIG
+ def_bool y
+ depends on ACPI
+
endmenu

menu "Kernel Features"
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index b3d098b..023b983 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -61,21 +61,6 @@ int pcibios_add_device(struct pci_dev *dev)
return 0;
}

-/*
- * raw_pci_read/write - Platform-specific PCI config space access.
- */
-int raw_pci_read(unsigned int domain, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 *val)
-{
- return -ENXIO;
-}
-
-int raw_pci_write(unsigned int domain, unsigned int bus,
- unsigned int devfn, int reg, int len, u32 val)
-{
- return -ENXIO;
-}
-
#ifdef CONFIG_ACPI
/* Root bridge scanning */
struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
--
1.9.1
Tomasz Nowicki
2016-02-16 14:00:03 UTC
We use generic accessors from access.c by default. However, we already
know of platforms that need special handling when accessing PCI config
space. These platforms will need a different accessor set matched against
a <platform ID, domain, bus> tuple. Therefore we are going to add (in future)
DECLARE_ACPI_MCFG_FIXUP, which will register platform-specific custom
accessors. For now, we let pci_mcfg_get_ops take the acpi_pci_root structure
as an argument and leave some space for the quirk matching algorithm.
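
For reference, the offset arithmetic behind the map_bus helper in this patch is the standard ECAM layout: 1 MiB of config space per bus and 4 KiB per function. The sketch below only reproduces that arithmetic in user space; model_devfn and ecam_offset are hypothetical names, and the region lookup and the virtual base address are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the kernel's PCI_DEVFN(): 5-bit slot, 3-bit function. */
static unsigned int model_devfn(unsigned int slot, unsigned int func)
{
	assert(slot < 32 && func < 8);
	return (slot << 3) | func;
}

/*
 * Byte offset of a config register inside an ECAM window:
 * 1 MiB per bus (bus << 20), 4 KiB per function (devfn << 12).
 */
static uint32_t ecam_offset(unsigned int bus, unsigned int devfn,
			    unsigned int reg)
{
	assert(bus < 256 && devfn < 256 && reg < 4096);
	return ((uint32_t)bus << 20) | ((uint32_t)devfn << 12) | reg;
}
```

For example, register 0x10 of device 02.3 on bus 1 sits at offset 0x113010 from the start of the window.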

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
drivers/acpi/pci_mcfg.c | 30 ++++++++++++++++++++++++++++++
include/linux/pci-acpi.h | 12 ++++++++++++
2 files changed, 42 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 3282f2a..0062257 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,36 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
return PCIBIOS_DEVICE_NOT_FOUND;
}

+void __iomem *
+pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
+{
+ struct pci_mmcfg_region *cfg;
+
+ cfg = pci_mmconfig_lookup(pci_domain_nr(bus), bus->number);
+ if (cfg && cfg->virt)
+ return cfg->virt +
+ (PCI_MMCFG_BUS_OFFSET(bus->number) | (devfn << 12)) +
+ offset;
+ return NULL;
+}
+
+/* Default generic PCI config accessors */
+static struct pci_ops default_pci_mcfg_ops = {
+ .map_bus = pci_mcfg_dev_base,
+ .read = pci_generic_config_read,
+ .write = pci_generic_config_write,
+};
+
+struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{
+ /*
+ * TODO: Match against platform specific quirks and return
+ * corresponding PCI config space accessor set.
+ */
+
+ return &default_pci_mcfg_ops;
+}
+
static void list_add_sorted(struct pci_mmcfg_region *new)
{
struct pci_mmcfg_region *cfg;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index b4f87ba9..3dc6a8c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -141,6 +141,18 @@ extern struct list_head pci_mmcfg_list;
#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)

+#ifdef CONFIG_PCI_MMCONFIG
+extern struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root);
+extern void __iomem *pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn,
+ int offset);
+#else
+static inline struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{ return NULL; }
+static inline void __iomem *pci_mcfg_dev_base(struct pci_bus *bus,
+ unsigned int devfn, int offset)
+{ return NULL; }
+#endif
+
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
--
1.9.1
Lorenzo Pieralisi
2016-02-17 18:40:01 UTC
Post by Tomasz Nowicki
We use generic accessors from access.c by default. However, we already
know of platforms that need special handling when accessing PCI config
space. These platforms will need a different accessor set matched against
a <platform ID, domain, bus> tuple. Therefore we are going to add (in future)
DECLARE_ACPI_MCFG_FIXUP, which will register platform-specific custom
accessors. For now, we let pci_mcfg_get_ops take the acpi_pci_root structure
as an argument and leave some space for the quirk matching algorithm.
You should not describe the future (because you do not know if/when
that will be implemented), you should describe what the patch does
in its current form.

"This patch implements MCFG based PCI bus operations through MCFG
map function and generic PCI accessors".
Post by Tomasz Nowicki
---
drivers/acpi/pci_mcfg.c | 30 ++++++++++++++++++++++++++++++
include/linux/pci-acpi.h | 12 ++++++++++++
2 files changed, 42 insertions(+)
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 3282f2a..0062257 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,36 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
return PCIBIOS_DEVICE_NOT_FOUND;
}
+void __iomem *
+pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
+{
+ struct pci_mmcfg_region *cfg;
+
+ cfg = pci_mmconfig_lookup(pci_domain_nr(bus), bus->number);
+ if (cfg && cfg->virt)
+ return cfg->virt +
+ (PCI_MMCFG_BUS_OFFSET(bus->number) | (devfn << 12)) +
+ offset;
+ return NULL;
+}
+
+/* Default generic PCI config accessors */
+static struct pci_ops default_pci_mcfg_ops = {
+ .map_bus = pci_mcfg_dev_base,
+ .read = pci_generic_config_read,
+ .write = pci_generic_config_write,
+};
Nit: s/default_pci_mcfg_ops/pci_mcfg_ops
Post by Tomasz Nowicki
+
+struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{
+ /*
+ * TODO: Match against platform specific quirks and return
+ * corresponding PCI config space accessor set.
+ */
Remove this comment, see above.
Post by Tomasz Nowicki
+
+ return &default_pci_mcfg_ops;
See above.
Post by Tomasz Nowicki
+}
+
static void list_add_sorted(struct pci_mmcfg_region *new)
{
struct pci_mmcfg_region *cfg;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index b4f87ba9..3dc6a8c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -141,6 +141,18 @@ extern struct list_head pci_mmcfg_list;
#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+#ifdef CONFIG_PCI_MMCONFIG
+extern struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root);
+extern void __iomem *pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn,
+ int offset);
+#else
+static inline struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{ return NULL; }
+static inline void __iomem *pci_mcfg_dev_base(struct pci_bus *bus,
+ unsigned int devfn, int offset)
+{ return NULL; }
+#endif
+
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I think it can even be squashed, anyway:

Reviewed-by: Lorenzo Pieralisi <***@arm.com>
Tomasz Nowicki
2016-02-16 14:00:03 UTC
Some platforms may not be fully compliant with the generic set of PCI config
accessors. For these cases we implement a way to override the accessor
set prior to PCI bus enumeration. The algorithm traverses the available quirk
list, matches against the <platform ID (DMI), domain, bus number> tuple and
returns the corresponding accessors. All quirks can be defined using
DECLARE_ACPI_MCFG_FIXUP() and kept self-contained. Example:

static const struct dmi_system_id foo_dmi[] = {
{
.ident = "<Platform ident string>",
.callback = <handler>,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
},
},
{ }
};

static struct pci_ops foo_ecam_pci_ops = {
.map_bus = pci_mcfg_dev_base,
.read = foo_ecam_config_read,
.write = foo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(foo_dmi, NULL, &foo_ecam_pci_ops, <domain_nr>, <bus_nr>);

More custom (non-DMI) matching can be done via an extra match callback.
Note that it is possible to assign quirk-related private data to
root->sysdata, which is then available to the read/write accessors. Example:

static int boo_match(struct pci_mcfg_fixup *fixup, struct acpi_pci_root *root)
{
return [condition] ? 1 : 0;
}

int boo_ecam_config_read(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 *val)
{
struct acpi_pci_root *root = bus->sysdata;
struct boo_priv_data *boo_data = root->sysdata;

[..]
}

static struct pci_ops boo_ecam_pci_ops = {
.map_bus = pci_mcfg_dev_base,
.read = boo_ecam_config_read,
.write = boo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(NULL, boo_match, &boo_ecam_pci_ops, <domain_nr>, <bus_nr>);
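
The lookup order described above (topology first, then the optional match hook, with wildcards for domain and bus) can be tried out in user space with a sketch like the one below. It is a simplified model under stated assumptions: the DMI check is left out, a plain array replaces the linker section, an enum stands in for the struct pci_ops pointers, and quirk_model, only_even_bus and pick_ops are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DOMAIN_ANY -1
#define BUS_ANY    -1

/* Stand-in for the accessor sets; the real table stores struct pci_ops *. */
enum ops_id { OPS_DEFAULT, OPS_FOO, OPS_BOO };

/* Hypothetical, simplified stand-in for struct pci_mcfg_fixup. */
struct quirk_model {
	int (*match)(int domain, int bus);	/* NULL means "always matches" */
	enum ops_id ops;
	int domain;
	int bus;
};

/* Example custom hook: accept only even root bus numbers. */
static int only_even_bus(int domain, int bus)
{
	(void)domain;
	return bus % 2 == 0;
}

static const struct quirk_model quirks[] = {
	{ NULL,          OPS_FOO, 1,          0 },
	{ only_even_bus, OPS_BOO, DOMAIN_ANY, BUS_ANY },
};

/* Return the first matching quirk's ops, or the default set otherwise. */
static enum ops_id pick_ops(int domain, int bus)
{
	size_t i;

	assert(bus >= 0 && bus < 256);
	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct quirk_model *q = &quirks[i];

		if ((q->domain == domain || q->domain == DOMAIN_ANY) &&
		    (q->bus == bus || q->bus == BUS_ANY) &&
		    (q->match ? q->match(domain, bus) : 1))
			return q->ops;
	}
	return OPS_DEFAULT;
}
```

Because the first match wins, more specific entries should come before wildcard entries; in the kernel the order would depend on how the section is laid out at link time, which this sketch does not model.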

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
drivers/acpi/pci_mcfg.c | 32 ++++++++++++++++++++++++++++++--
include/acpi/acpi_bus.h | 1 +
include/asm-generic/vmlinux.lds.h | 7 +++++++
include/linux/pci-acpi.h | 18 ++++++++++++++++++
4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0062257..b343547 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,29 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
return PCIBIOS_DEVICE_NOT_FOUND;
}

+extern struct pci_mcfg_fixup __start_acpi_mcfg_fixups[];
+extern struct pci_mcfg_fixup __end_acpi_mcfg_fixups[];
+
+static struct pci_ops *pci_mcfg_check_quirks(struct acpi_pci_root *root)
+{
+ struct pci_mcfg_fixup *f;
+ int bus_num = root->secondary.start;
+ int domain = root->segment;
+
+ /*
+ * First match against PCI topology <domain:bus> then use DMI or
+ * custom match handler.
+ */
+ for (f = __start_acpi_mcfg_fixups; f < __end_acpi_mcfg_fixups; f++) {
+ if ((f->domain == domain || f->domain == PCI_MCFG_DOMAIN_ANY) &&
+ (f->bus_num == bus_num || f->bus_num == PCI_MCFG_BUS_ANY) &&
+ (f->system ? dmi_check_system(f->system) : 1 &&
+ f->match ? f->match(f, root) : 1))
+ return f->ops;
+ }
+ return NULL;
+}
+
void __iomem *
pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
{
@@ -63,10 +86,15 @@ static struct pci_ops default_pci_mcfg_ops = {

struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
{
+ struct pci_ops *pci_mcfg_ops_quirk;
+
/*
- * TODO: Match against platform specific quirks and return
- * corresponding PCI config space accessor set.
+ * Match against platform specific quirks and return corresponding
+ * PCI config space accessor set.
*/
+ pci_mcfg_ops_quirk = pci_mcfg_check_quirks(root);
+ if (pci_mcfg_ops_quirk)
+ return pci_mcfg_ops_quirk;

return &default_pci_mcfg_ops;
}
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 14362a8..0fc6f13 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -556,6 +556,7 @@ struct acpi_pci_root {
struct pci_bus *bus;
u16 segment;
struct resource secondary; /* downstream bus range */
+ void *sysdata;

u32 osc_support_set; /* _OSC state of support bits */
u32 osc_control_set; /* _OSC state of control bits */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2..c93fc97 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -298,6 +298,13 @@
VMLINUX_SYMBOL(__end_pci_fixups_suspend_late) = .; \
} \
\
+ /* ACPI MCFG quirks */ \
+ .acpi_fixup : AT(ADDR(.acpi_fixup) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__start_acpi_mcfg_fixups) = .; \
+ *(.acpi_fixup_mcfg) \
+ VMLINUX_SYMBOL(__end_acpi_mcfg_fixups) = .; \
+ } \
+ \
/* Built-in firmware blobs */ \
.builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start_builtin_fw) = .; \
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 93feb04..9e1bedd 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -123,6 +123,24 @@ struct pci_mmcfg_region {
bool hot_added;
};

+struct pci_mcfg_fixup {
+ const struct dmi_system_id *system;
+ int (*match)(struct pci_mcfg_fixup *, struct acpi_pci_root *);
+ struct pci_ops *ops;
+ int domain;
+ int bus_num;
+};
+
+#define PCI_MCFG_DOMAIN_ANY -1
+#define PCI_MCFG_BUS_ANY -1
+
+/* Designate a routine to fix up buggy MCFG */
+#define DECLARE_ACPI_MCFG_FIXUP(system, match, ops, dom, bus) \
+ static const struct pci_mcfg_fixup __mcfg_fixup_##system##dom##bus\
+ __used __attribute__((__section__(".acpi_fixup_mcfg"), \
+ aligned((sizeof(void *))))) = \
+ { system, match, ops, dom, bus };
+
extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
int end, u64 addr);
--
1.9.1
Mark Salter
2016-03-18 16:00:01 UTC
Post by Tomasz Nowicki
Some platforms may not be fully compliant with the generic set of PCI config
accessors. For these cases we implement a way to override the accessor
set prior to PCI bus enumeration. The algorithm traverses the available quirk
list, matches against the <platform ID (DMI), domain, bus number> tuple and
returns the corresponding accessors. All quirks can be defined using
DECLARE_ACPI_MCFG_FIXUP() and kept self-contained. Example:
static const struct dmi_system_id foo_dmi[] = {
        {
                .ident = "<Platform ident string>",
                .callback = <handler>,
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
                        DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
                        DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
                },
        },
        { }
};
static struct pci_ops foo_ecam_pci_ops = {
        .map_bus = pci_mcfg_dev_base,
        .read = foo_ecam_config_read,
        .write = foo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(foo_dmi, NULL, &foo_ecam_pci_ops, <domain_nr>, <bus_nr>);
More custom (non-DMI) matching can be done via an extra match callback.
Note that it is possible to assign quirk-related private data to
root->sysdata, which is then available to the read/write accessors. Example:
static int boo_match(struct pci_mcfg_fixup *fixup, struct acpi_pci_root *root)
{
        return [condition] ? 1 : 0;
}
int boo_ecam_config_read(struct pci_bus *bus, unsigned int devfn,
                          int where, int size, u32 *val)
{
        struct acpi_pci_root *root = bus->sysdata;
        struct boo_priv_data *boo_data = root->sysdata;
        [..]
}
static struct pci_ops boo_ecam_pci_ops = {
.map_bus = pci_mcfg_dev_base,
.read = boo_ecam_config_read,
.write = boo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(NULL, boo_match, &boo_ecam_pci_ops, <domain_nr>, <bus_nr>);
---
 drivers/acpi/pci_mcfg.c           | 32 ++++++++++++++++++++++++++++++--
 include/acpi/acpi_bus.h           |  1 +
 include/asm-generic/vmlinux.lds.h |  7 +++++++
 include/linux/pci-acpi.h          | 18 ++++++++++++++++++
 4 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0062257..b343547 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,29 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
  return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
+extern struct pci_mcfg_fixup __start_acpi_mcfg_fixups[];
+extern struct pci_mcfg_fixup __end_acpi_mcfg_fixups[];
+
+static struct pci_ops *pci_mcfg_check_quirks(struct acpi_pci_root *root)
+{
+ struct pci_mcfg_fixup *f;
+ int bus_num = root->secondary.start;
+ int domain = root->segment;
+
+ /*
+  * First match against PCI topology <domain:bus> then use DMI or
+  * custom match handler.
+  */
+ for (f = __start_acpi_mcfg_fixups; f < __end_acpi_mcfg_fixups; f++) {
+ if ((f->domain == domain || f->domain == PCI_MCFG_DOMAIN_ANY) &&
+     (f->bus_num == bus_num || f->bus_num == PCI_MCFG_BUS_ANY) &&
+     (f->system ? dmi_check_system(f->system) : 1 &&
+      f->match ? f->match(f, root) : 1))
The parens are not quite right here ^^^^
If dmi_check_system() returns true, f->match won't get called.

This should be:
    (f->system ? dmi_check_system(f->system) : 1) &&
    (f->match ? f->match(f, root) : 1))
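
A small userspace sketch of why the grouping matters (not kernel code; struct fixup, dmi_ok(), reject() and match_calls are stand-ins made up for illustration). Since ?: has lower precedence than &&, the condition as posted parses as system ? dmi : (...), so the match handler is skipped whenever system is set. Giving each conditional its own parentheses evaluates both checks:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_mcfg_fixup; only two fields matter. */
struct fixup {
	const void *system;	/* non-NULL: a DMI table is present */
	int (*match)(void);	/* optional custom match handler */
};

int match_calls;		/* counts how often the handler ran */

static int dmi_ok(const void *system) { (void)system; return 1; }
static int reject(void) { match_calls++; return 0; }

/* As posted: parses as  system ? dmi_ok() : ((1 && match) ? match() : 1) */
int matches_as_posted(const struct fixup *f)
{
	return (f->system ? dmi_ok(f->system) : 1 &&
		f->match ? f->match() : 1);
}

/* Each conditional in its own parentheses, so both checks are evaluated. */
int matches_grouped(const struct fixup *f)
{
	assert(f != NULL);
	return (f->system ? dmi_ok(f->system) : 1) &&
	       (f->match ? f->match() : 1);
}
```

With an entry that has both a DMI table and a rejecting handler, matches_as_posted() reports a match without ever calling the handler, while matches_grouped() calls it and rejects the entry.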
Post by Tomasz Nowicki
+ return f->ops;
+ }
+ return NULL;
+}
+
 void __iomem *
 pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
 {
@@ -63,10 +86,15 @@ static struct pci_ops default_pci_mcfg_ops = {
 
 struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
 {
+ struct pci_ops *pci_mcfg_ops_quirk;
+
  /*
-  * TODO: Match against platform specific quirks and return
-  * corresponding PCI config space accessor set.
+  * Match against platform specific quirks and return corresponding
+  * PCI config space accessor set.
   */
+ pci_mcfg_ops_quirk = pci_mcfg_check_quirks(root);
+ if (pci_mcfg_ops_quirk)
+ return pci_mcfg_ops_quirk;
 
  return &default_pci_mcfg_ops;
 }
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 14362a8..0fc6f13 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -556,6 +556,7 @@ struct acpi_pci_root {
  struct pci_bus *bus;
  u16 segment;
  struct resource secondary; /* downstream bus range */
+ void *sysdata;
 
  u32 osc_support_set; /* _OSC state of support bits */
  u32 osc_control_set; /* _OSC state of control bits */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2..c93fc97 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -298,6 +298,13 @@
  VMLINUX_SYMBOL(__end_pci_fixups_suspend_late) = .; \
  } \
  \
+ /* ACPI MCFG quirks */ \
+ .acpi_fixup        : AT(ADDR(.acpi_fixup) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__start_acpi_mcfg_fixups) = .; \
+ *(.acpi_fixup_mcfg) \
+ VMLINUX_SYMBOL(__end_acpi_mcfg_fixups) = .; \
+ } \
+ \
  /* Built-in firmware blobs */ \
  .builtin_fw        : AT(ADDR(.builtin_fw) - LOAD_OFFSET) { \
  VMLINUX_SYMBOL(__start_builtin_fw) = .; \
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 93feb04..9e1bedd 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -123,6 +123,24 @@ struct pci_mmcfg_region {
  bool hot_added;
 };
 
+struct pci_mcfg_fixup {
+ const struct dmi_system_id *system;
+ int (*match)(struct pci_mcfg_fixup *, struct acpi_pci_root *);
+ struct pci_ops *ops;
+ int domain;
+ int bus_num;
+};
+
+#define PCI_MCFG_DOMAIN_ANY -1
+#define PCI_MCFG_BUS_ANY -1
+
+/* Designate a routine to fix up buggy MCFG */
+#define DECLARE_ACPI_MCFG_FIXUP(system, match, ops, dom, bus) \
+ static const struct pci_mcfg_fixup __mcfg_fixup_##system##dom##bus\
+  __used __attribute__((__section__(".acpi_fixup_mcfg"), \
+ aligned((sizeof(void *))))) = \
+ { system, match, ops, dom, bus };
+
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
 extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
  int end, u64 addr);
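
For illustration, the lookup order implemented by pci_mcfg_check_quirks() can be modelled in plain C roughly as below. This is a simplified sketch under assumptions, not the kernel code: struct quirk, find_quirk() and the MCFG_*_ANY names are hypothetical, DMI matching is left out, and ops is reduced to a string.

```c
#include <assert.h>
#include <stddef.h>

#define MCFG_DOMAIN_ANY	(-1)
#define MCFG_BUS_ANY	(-1)

/* Simplified, hypothetical model of one quirk entry. */
struct quirk {
	int domain;
	int bus_num;
	int (*match)(int domain, int bus_num);	/* optional extra check */
	const char *ops;			/* stands in for struct pci_ops * */
};

/* Walk the table in order and return the first matching entry, or NULL. */
const struct quirk *find_quirk(const struct quirk *tbl, size_t n,
			       int domain, int bus_num)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct quirk *q = &tbl[i];

		if ((q->domain == domain || q->domain == MCFG_DOMAIN_ANY) &&
		    (q->bus_num == bus_num || q->bus_num == MCFG_BUS_ANY) &&
		    (q->match ? q->match(domain, bus_num) : 1))
			return q;
	}
	return NULL;	/* caller falls back to the default accessors */
}
```

A NULL result corresponds to pci_mcfg_get_ops() returning &default_pci_mcfg_ops. Because the first hit wins, the order of entries in the table decides between overlapping quirks.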
Tomasz Nowicki
2016-03-22 10:30:02 UTC
Post by Mark Salter
Post by Tomasz Nowicki
Some platforms may not be fully compliant with generic set of PCI config
accessors. For these cases we implement the way to overwrite accessors
set prior to PCI buses enumeration. Algorithm traverses available quirk
list, matches against <platform ID (DMI), domain, bus number> tuple and
DECLARE_ACPI_MCFG_FIXUP() and kept self contained. Example,
static const struct dmi_system_id foo_dmi[] = {
{
.ident = "<Platform ident string>",
.callback = <handler>,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
},
},
{ }
};
static struct pci_ops foo_ecam_pci_ops = {
.map_bus = pci_mcfg_dev_base,
.read = foo_ecam_config_read,
.write = foo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(foo_dmi, NULL, &foo_ecam_pci_ops, <domain_nr>, <bus_nr>);
More custom (non-DMI) matching can be done via an extra call.
Note that there is possibility to assign quirk related private data to
static int boo_match(struct pci_mcfg_fixup *fixup, struct acpi_pci_root *root)
{
return [condition] ? 1 : 0;
}
int boo_ecam_config_read(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 *val)
{
struct acpi_pci_root *root = bus->sysdata;
struct boo_priv_data *boo_data = root->sysdata;
[..]
}
static struct pci_ops boo_ecam_pci_ops = {
.map_bus = pci_mcfg_dev_base,
.read = boo_ecam_config_read,
.write = boo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(NULL, boo_match, &boo_ecam_pci_ops, <domain_nr>, <bus_nr>);
---
drivers/acpi/pci_mcfg.c | 32 ++++++++++++++++++++++++++++++--
include/acpi/acpi_bus.h | 1 +
include/asm-generic/vmlinux.lds.h | 7 +++++++
include/linux/pci-acpi.h | 18 ++++++++++++++++++
4 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0062257..b343547 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,29 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
return PCIBIOS_DEVICE_NOT_FOUND;
}
+extern struct pci_mcfg_fixup __start_acpi_mcfg_fixups[];
+extern struct pci_mcfg_fixup __end_acpi_mcfg_fixups[];
+
+static struct pci_ops *pci_mcfg_check_quirks(struct acpi_pci_root *root)
+{
+ struct pci_mcfg_fixup *f;
+ int bus_num = root->secondary.start;
+ int domain = root->segment;
+
+ /*
+ * First match against PCI topology <domain:bus> then use DMI or
+ * custom match handler.
+ */
+ for (f = __start_acpi_mcfg_fixups; f < __end_acpi_mcfg_fixups; f++) {
+ if ((f->domain == domain || f->domain == PCI_MCFG_DOMAIN_ANY) &&
+ (f->bus_num == bus_num || f->bus_num == PCI_MCFG_BUS_ANY) &&
+ (f->system ? dmi_check_system(f->system) : 1 &&
+ f->match ? f->match(f, root) : 1))
The parens are not quite right here ^^^^
If dmi_check_system() returns true, f->match won't get called.
This should be:
(f->system ? dmi_check_system(f->system) : 1) &&
(f->match ? f->match(f, root) : 1))
Well spotted. Thanks!

Tomasz
Tomasz Nowicki
2016-02-16 14:00:03 UTC
Let's keep the raw ACPI PCI config space accessors empty by default,
since we are not sure whether they are necessary across all archs.
Once we sort this out, we can provide a generic version or let
architectures override them, as x86 does now.

Suggested-by: Lorenzo Pieralisi <***@arm.com>
Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Tested-by: Suravee Suthikulpanit <***@amd.com>
Tested-by: Jeremy Linton <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
drivers/acpi/pci_mcfg.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index ea84365..0467b00 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -21,6 +21,26 @@
static DEFINE_MUTEX(pci_mmcfg_lock);
LIST_HEAD(pci_mmcfg_list);

+/*
+ * raw_pci_read/write - raw ACPI PCI config space accessors.
+ *
+ * By defauly (__weak) these accessors are empty and should be overwritten
+ * by architectures which support operations on ACPI PCI_Config regions,
+ * see osl.c file.
+ */
+
+int __weak raw_pci_read(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 *val)
+{
+ return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+int __weak raw_pci_write(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 val)
+{
+ return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
static void list_add_sorted(struct pci_mmcfg_region *new)
{
struct pci_mmcfg_region *cfg;
--
1.9.1
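
As a rough userspace illustration of the __weak mechanism used here (assuming GCC or Clang; the read_vendor_id() helper is made up for the example), the default stub below is picked up by the linker only if no strong raw_pci_read() exists elsewhere:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIBIOS_DEVICE_NOT_FOUND 0x86	/* same value as in linux/pci.h */

/*
 * Weak default: used only when no other object file provides a strong
 * definition of raw_pci_read(). It touches nothing and reports failure.
 */
__attribute__((weak))
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
		 int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	assert(val != NULL);
	return PCIBIOS_DEVICE_NOT_FOUND;
}

/* Hypothetical caller: sees the stub's error code and leaves *id alone. */
int read_vendor_id(uint32_t *id)
{
	return raw_pci_read(0, 0, 0, 0x00, 2, id);
}
```

An architecture that does need PCI_Config access would simply define a non-weak raw_pci_read() with the same signature, and that definition would replace the stub at link time.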
Lorenzo Pieralisi
2016-02-17 12:40:01 UTC
Post by Tomasz Nowicki
Let's keep the raw ACPI PCI config space accessors empty by default,
since we are not sure whether they are necessary across all archs.
Once we sort this out, we can provide a generic version or let
architectures override them, as x86 does now.
"ACPICA code requires raw PCI bus accessors in order to give AML
access to PCI_Config regions in platforms where they are actually
used. The raw PCI bus accessors implementation is arch-dependent,
therefore this patch adds a weak generic implementation (for now
empty but can be generalized if common functionality is found among
arches) allowing arches where PCI_Config regions are currently required
to override it (e.g. x86) as needed and providing at the same time
default stubs for arches that do not require them".

?
Post by Tomasz Nowicki
---
drivers/acpi/pci_mcfg.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index ea84365..0467b00 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -21,6 +21,26 @@
static DEFINE_MUTEX(pci_mmcfg_lock);
LIST_HEAD(pci_mmcfg_list);
+/*
+ * raw_pci_read/write - raw ACPI PCI config space accessors.
+ *
+ * By defauly (__weak) these accessors are empty and should be overwritten
s/defauly/default
Post by Tomasz Nowicki
+ * by architectures which support operations on ACPI PCI_Config regions,
+ * see osl.c file.
Add the path or remove the file reference.
Post by Tomasz Nowicki
+ */
+
+int __weak raw_pci_read(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 *val)
+{
+ return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+int __weak raw_pci_write(unsigned int domain, unsigned int bus,
+ unsigned int devfn, int reg, int len, u32 val)
+{
+ return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
static void list_add_sorted(struct pci_mmcfg_region *new)
{
struct pci_mmcfg_region *cfg;
Note: this patch is not strictly required, but it is nice because
it removes the raw/dumb/empty accessors from ARM64 code (where they do not
belong), so:

Reviewed-by: Lorenzo Pieralisi <***@arm.com>
Tomasz Nowicki
2016-02-16 14:00:03 UTC
x86 and ia64 are the only arches that implement the pcibios_{add|remove}_bus
hooks, and they implement them in the same way. Moreover, ARM64 is going to
do the same. So acpi_pci_{add|remove}_bus seems generic enough to be the
default for the pcibios_{add|remove}_bus hooks. It is also always safe to
call acpi_pci_{add|remove}_bus, as they have empty stubs for the !ACPI case
and return early if ACPI has been switched off at run time.

As a result, we can remove the x86 and ia64 pcibios_{add|remove}_bus
implementations.

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Reviewed-by: Lorenzo Pieralisi <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
arch/ia64/pci/pci.c | 10 ----------
arch/x86/pci/common.c | 10 ----------
drivers/pci/probe.c | 3 +++
3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 978d6af..be4c9ef 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -358,16 +358,6 @@ void pcibios_fixup_bus(struct pci_bus *b)
platform_pci_fixup_bus(b);
}

-void pcibios_add_bus(struct pci_bus *bus)
-{
- acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
- acpi_pci_remove_bus(bus);
-}
-
void pcibios_set_master (struct pci_dev *dev)
{
/* No special bus mastering setup handling */
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 2879efc..5aa25f1 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -171,16 +171,6 @@ void pcibios_fixup_bus(struct pci_bus *b)
pcibios_fixup_device_resources(dev);
}

-void pcibios_add_bus(struct pci_bus *bus)
-{
- acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
- acpi_pci_remove_bus(bus);
-}
-
/*
* Only use DMI information to set this if nothing was passed
* on the kernel command line (which was parsed earlier).
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 88a4734..9859b12 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/cpumask.h>
+#include <linux/pci-acpi.h>
#include <linux/pci-aspm.h>
#include <linux/aer.h>
#include <linux/acpi.h>
@@ -2060,10 +2061,12 @@ int __weak pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)

void __weak pcibios_add_bus(struct pci_bus *bus)
{
+ acpi_pci_add_bus(bus);
}

void __weak pcibios_remove_bus(struct pci_bus *bus)
{
+ acpi_pci_remove_bus(bus);
}

struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
--
1.9.1
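
The pattern this patch relies on (a weak default hook that forwards to a helper which is safe to call unconditionally) can be sketched in userspace as follows. All names here (struct bus, acpi_off, acpi_buses_seen, acpi_add_bus(), bios_add_bus()) are hypothetical stand-ins, and GCC or Clang is assumed for the weak attribute:

```c
#include <assert.h>
#include <stddef.h>

struct bus { int number; };	/* hypothetical stand-in for struct pci_bus */

int acpi_off;			/* models ACPI being switched off at run time */
int acpi_buses_seen;		/* how many buses the ACPI layer handled */

/* Models acpi_pci_add_bus(): returns early when ACPI is disabled. */
void acpi_add_bus(struct bus *b)
{
	assert(b != NULL);
	if (acpi_off)
		return;
	acpi_buses_seen++;
}

/* Weak default hook: an architecture may still supply its own version. */
__attribute__((weak)) void bios_add_bus(struct bus *b)
{
	acpi_add_bus(b);
}
```

Because the helper bails out on its own when ACPI is off, the generic hook does not need any arch-specific guard around the call.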
Tomasz Nowicki
2016-02-16 14:00:03 UTC
From: Jayachandran C <***@broadcom.com>

Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c so that the
API and code can be shared with ARM64 later. The corresponding
declarations move from asm/pci_x86.h to linux/pci-acpi.h.

As part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map an
MCFG entry, pci_mmconfig_unmap_resource() to do the corresponding
unmap, and pci_mmconfig_enabled() to check whether the arch setup of
MCFG entries was successful. We also provide weak implementations
of these, which will be used on ARM64. On x86, we retain the
old logic by providing a platform-specific implementation.

This patch purely rearranges code; it should not have any
impact on the logic of MCFG parsing or list handling.

Signed-off-by: Jayachandran C <***@broadcom.com>
[Xen parts:]
Acked-by: David Vrabel <***@citrix.com>
---
arch/x86/include/asm/pci_x86.h | 24 +---
arch/x86/pci/mmconfig-shared.c | 269 +++++------------------------------
arch/x86/pci/mmconfig_32.c | 1 +
arch/x86/pci/mmconfig_64.c | 1 +
arch/x86/pci/numachip.c | 1 +
drivers/acpi/Makefile | 1 +
drivers/acpi/pci_mcfg.c | 312 +++++++++++++++++++++++++++++++++++++++++
drivers/xen/pci.c | 5 +-
include/linux/pci-acpi.h | 33 +++++
9 files changed, 386 insertions(+), 261 deletions(-)
create mode 100644 drivers/acpi/pci_mcfg.c
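
To make the address arithmetic in pci_mmconfig_alloc() concrete, here is a hedged standalone sketch. struct mmcfg_window and mmcfg_make_window() are hypothetical names, and the uint64_t cast in the macro is an addition for userspace use; the formulas and the name format follow the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Each bus owns 1 MiB of ECAM space: 32 devices x 8 functions x 4 KiB. */
#define PCI_MMCFG_BUS_OFFSET(bus)	((uint64_t)(bus) << 20)
/* "PCI MMCONFIG %04x [bus %02x-%02x]" plus the terminating NUL */
#define PCI_MMCFG_RESOURCE_NAME_LEN	(22 + 4 + 2 + 2)

struct mmcfg_window {
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* last byte of the region (inclusive) */
	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
};

/* Compute the MMIO window covering buses [start_bus, end_bus]. */
struct mmcfg_window mmcfg_make_window(uint64_t base, uint16_t segment,
				      unsigned int start_bus,
				      unsigned int end_bus)
{
	struct mmcfg_window w;

	assert(start_bus <= end_bus && end_bus <= 0xff);
	w.start = base + PCI_MMCFG_BUS_OFFSET(start_bus);
	w.end = base + PCI_MMCFG_BUS_OFFSET(end_bus + 1) - 1;
	snprintf(w.name, sizeof(w.name), "PCI MMCONFIG %04x [bus %02x-%02x]",
		 (unsigned int)segment, start_bus, end_bus);
	return w;
}
```

For a base of 0xe0000000 and buses 00-ff this gives a 256 MiB window ending at 0xefffffff, and the formatted name fills the 30-byte buffer exactly (29 characters plus the NUL).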

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 46873fb..7824626 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -122,33 +122,11 @@ extern int pci_legacy_init(void);
extern void pcibios_fixup_irqs(void);

/* pci-mmconfig.c */
-
-/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
-#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
-
-struct pci_mmcfg_region {
- struct list_head list;
- struct resource res;
- u64 address;
- char __iomem *virt;
- u16 segment;
- u8 start_bus;
- u8 end_bus;
- char name[PCI_MMCFG_RESOURCE_NAME_LEN];
-};
-
+struct pci_mmcfg_region;
extern int __init pci_mmcfg_arch_init(void);
extern void __init pci_mmcfg_arch_free(void);
extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg);
extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg);
-extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
- phys_addr_t addr);
-extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
-extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
-
-extern struct list_head pci_mmcfg_list;
-
-#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)

/*
* On AMD Fam10h CPUs, all PCI MMIO configuration space accesses must use
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index dd30b7e..626710b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,13 +12,12 @@

#include <linux/pci.h>
#include <linux/init.h>
-#include <linux/sfi_acpi.h>
#include <linux/bitmap.h>
-#include <linux/dmi.h>
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/rculist.h>
#include <asm/e820.h>
+#include <linux/pci-acpi.h>
#include <asm/pci_x86.h>
#include <asm/acpi.h>

@@ -27,9 +26,6 @@
/* Indicate if the mmcfg resources have been placed into the resource table. */
static bool pci_mmcfg_running_state;
static bool pci_mmcfg_arch_init_failed;
-static DEFINE_MUTEX(pci_mmcfg_lock);
-
-LIST_HEAD(pci_mmcfg_list);

static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg)
{
@@ -48,83 +44,6 @@ static void __init free_all_mmcfg(void)
pci_mmconfig_remove(cfg);
}

-static void list_add_sorted(struct pci_mmcfg_region *new)
-{
- struct pci_mmcfg_region *cfg;
-
- /* keep list sorted by segment and starting bus number */
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
- if (cfg->segment > new->segment ||
- (cfg->segment == new->segment &&
- cfg->start_bus >= new->start_bus)) {
- list_add_tail_rcu(&new->list, &cfg->list);
- return;
- }
- }
- list_add_tail_rcu(&new->list, &pci_mmcfg_list);
-}
-
-static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
- int end, u64 addr)
-{
- struct pci_mmcfg_region *new;
- struct resource *res;
-
- if (addr == 0)
- return NULL;
-
- new = kzalloc(sizeof(*new), GFP_KERNEL);
- if (!new)
- return NULL;
-
- new->address = addr;
- new->segment = segment;
- new->start_bus = start;
- new->end_bus = end;
-
- res = &new->res;
- res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
- res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
- res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
- "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
- res->name = new->name;
-
- return new;
-}
-
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
- int end, u64 addr)
-{
- struct pci_mmcfg_region *new;
-
- new = pci_mmconfig_alloc(segment, start, end, addr);
- if (new) {
- mutex_lock(&pci_mmcfg_lock);
- list_add_sorted(new);
- mutex_unlock(&pci_mmcfg_lock);
-
- pr_info(PREFIX
- "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
- "(base %#lx)\n",
- segment, start, end, &new->res, (unsigned long)addr);
- }
-
- return new;
-}
-
-struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
-{
- struct pci_mmcfg_region *cfg;
-
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
- if (cfg->segment == segment &&
- cfg->start_bus <= bus && bus <= cfg->end_bus)
- return cfg;
-
- return NULL;
-}
-
static const char *__init pci_mmcfg_e7520(void)
{
u32 win;
@@ -543,73 +462,6 @@ static void __init pci_mmcfg_reject_broken(int early)
}
}

-static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
- struct acpi_mcfg_allocation *cfg)
-{
- int year;
-
- if (cfg->address < 0xFFFFFFFF)
- return 0;
-
- if (!strncmp(mcfg->header.oem_id, "SGI", 3))
- return 0;
-
- if (mcfg->header.revision >= 1) {
- if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
- year >= 2010)
- return 0;
- }
-
- pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
- "is above 4GB, ignored\n", cfg->pci_segment,
- cfg->start_bus_number, cfg->end_bus_number, cfg->address);
- return -EINVAL;
-}
-
-static int __init pci_parse_mcfg(struct acpi_table_header *header)
-{
- struct acpi_table_mcfg *mcfg;
- struct acpi_mcfg_allocation *cfg_table, *cfg;
- unsigned long i;
- int entries;
-
- if (!header)
- return -EINVAL;
-
- mcfg = (struct acpi_table_mcfg *)header;
-
- /* how many config structures do we have */
- free_all_mmcfg();
- entries = 0;
- i = header->length - sizeof(struct acpi_table_mcfg);
- while (i >= sizeof(struct acpi_mcfg_allocation)) {
- entries++;
- i -= sizeof(struct acpi_mcfg_allocation);
- }
- if (entries == 0) {
- pr_err(PREFIX "MMCONFIG has no entries\n");
- return -ENODEV;
- }
-
- cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
- for (i = 0; i < entries; i++) {
- cfg = &cfg_table[i];
- if (acpi_mcfg_check_entry(mcfg, cfg)) {
- free_all_mmcfg();
- return -ENODEV;
- }
-
- if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
- cfg->end_bus_number, cfg->address) == NULL) {
- pr_warn(PREFIX "no memory for MCFG entries\n");
- free_all_mmcfg();
- return -ENOMEM;
- }
- }
-
- return 0;
-}
-
#ifdef CONFIG_ACPI_APEI
extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
void *data), void *data);
@@ -662,13 +514,20 @@ static void __init __pci_mmcfg_init(int early)

static int __initdata known_bridge;

+static void __init pci_mmcfg_list_setup(void)
+{
+ free_all_mmcfg();
+ if (pci_mmconfig_parse_table())
+ free_all_mmcfg();
+}
+
void __init pci_mmcfg_early_init(void)
{
if (pci_probe & PCI_PROBE_MMCONF) {
if (pci_mmcfg_check_hostbridge())
known_bridge = 1;
else
- acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+ pci_mmcfg_list_setup();
__pci_mmcfg_init(1);

set_apei_filter();
@@ -686,7 +545,7 @@ void __init pci_mmcfg_late_init(void)

/* MMCONFIG hasn't been enabled yet, try again */
if (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) {
- acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+ pci_mmcfg_list_setup();
__pci_mmcfg_init(0);
}
}
@@ -720,99 +579,41 @@ static int __init pci_mmcfg_late_insert_resources(void)
*/
late_initcall(pci_mmcfg_late_insert_resources);

-/* Add MMCFG information for host bridges */
-int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
- phys_addr_t addr)
+int pci_mmconfig_map_resource(struct device *dev, struct pci_mmcfg_region *cfg)
{
- int rc;
- struct resource *tmp = NULL;
- struct pci_mmcfg_region *cfg;
+ struct resource *tmp;

- if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
- return -ENODEV;
-
- if (start > end)
- return -EINVAL;
-
- mutex_lock(&pci_mmcfg_lock);
- cfg = pci_mmconfig_lookup(seg, start);
- if (cfg) {
- if (cfg->end_bus < end)
- dev_info(dev, FW_INFO
- "MMCONFIG for "
- "domain %04x [bus %02x-%02x] "
- "only partially covers this bridge\n",
- cfg->segment, cfg->start_bus, cfg->end_bus);
- mutex_unlock(&pci_mmcfg_lock);
- return -EEXIST;
- }
-
- if (!addr) {
- mutex_unlock(&pci_mmcfg_lock);
- return -EINVAL;
- }
-
- rc = -EBUSY;
- cfg = pci_mmconfig_alloc(seg, start, end, addr);
- if (cfg == NULL) {
- dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
- rc = -ENOMEM;
- } else if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
+ if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
dev_warn(dev, FW_BUG "MMCONFIG %pR isn't reserved\n",
&cfg->res);
- } else {
- /* Insert resource if it's not in boot stage */
- if (pci_mmcfg_running_state)
- tmp = insert_resource_conflict(&iomem_resource,
- &cfg->res);
-
+ return -EBUSY;
+ }
+ /* Insert resource if it's not in boot stage */
+ if (pci_mmcfg_running_state) {
+ tmp = insert_resource_conflict(&iomem_resource,
+ &cfg->res);
if (tmp) {
- dev_warn(dev,
- "MMCONFIG %pR conflicts with "
- "%s %pR\n",
- &cfg->res, tmp->name, tmp);
- } else if (pci_mmcfg_arch_map(cfg)) {
- dev_warn(dev, "fail to map MMCONFIG %pR.\n",
- &cfg->res);
- } else {
- list_add_sorted(cfg);
- dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
- &cfg->res, (unsigned long)addr);
- cfg = NULL;
- rc = 0;
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &cfg->res, tmp->name, tmp);
+ return -EBUSY;
}
}
-
- if (cfg) {
- if (cfg->res.parent)
- release_resource(&cfg->res);
- kfree(cfg);
+ if (pci_mmcfg_arch_map(cfg)) {
+ dev_warn(dev, "fail to map MMCONFIG %pR.\n", &cfg->res);
+ return -EBUSY;
}
-
- mutex_unlock(&pci_mmcfg_lock);
-
- return rc;
+ return 0;
}

-/* Delete MMCFG information for host bridges */
-int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
+void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *cfg)
{
- struct pci_mmcfg_region *cfg;
-
- mutex_lock(&pci_mmcfg_lock);
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
- if (cfg->segment == seg && cfg->start_bus == start &&
- cfg->end_bus == end) {
- list_del_rcu(&cfg->list);
- synchronize_rcu();
- pci_mmcfg_arch_unmap(cfg);
- if (cfg->res.parent)
- release_resource(&cfg->res);
- mutex_unlock(&pci_mmcfg_lock);
- kfree(cfg);
- return 0;
- }
- mutex_unlock(&pci_mmcfg_lock);
+ pci_mmcfg_arch_unmap(cfg);
+ if (cfg->res.parent)
+ release_resource(&cfg->res);
+ cfg->res.parent = NULL;
+}

- return -ENOENT;
+int pci_mmconfig_enabled(void)
+{
+ return (pci_probe & PCI_PROBE_MMCONF) && !pci_mmcfg_arch_init_failed;
}
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 43984bc..38a37f8 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -12,6 +12,7 @@
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/rcupdate.h>
+#include <linux/pci-acpi.h>
#include <asm/e820.h>
#include <asm/pci_x86.h>

diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index bea5249..29253ec 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -10,6 +10,7 @@
#include <linux/acpi.h>
#include <linux/bitmap.h>
#include <linux/rcupdate.h>
+#include <linux/pci-acpi.h>
#include <asm/e820.h>
#include <asm/pci_x86.h>

diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
index 2e565e6..c181eeb 100644
--- a/arch/x86/pci/numachip.c
+++ b/arch/x86/pci/numachip.c
@@ -14,6 +14,7 @@
*/

#include <linux/pci.h>
+#include <linux/pci-acpi.h>
#include <asm/pci_x86.h>

static u8 limit __read_mostly;
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 7ea903d..e5e4393 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -40,6 +40,7 @@ acpi-$(CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC) += processor_pdc.o
acpi-y += ec.o
acpi-$(CONFIG_ACPI_DOCK) += dock.o
acpi-y += pci_root.o pci_link.o pci_irq.o
+acpi-$(CONFIG_PCI_MMCONFIG) += pci_mcfg.o
acpi-y += acpi_lpss.o acpi_apd.o
acpi-y += acpi_platform.o
acpi-y += acpi_pnp.o
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
@@ -0,0 +1,312 @@
+/*
+ * pci_mcfg.c
+ *
+ * Common code to maintain the MCFG areas and mappings
+ *
+ * This has been extracted from arch/x86/pci/mmconfig-shared.c
+ * and moved here so that other architectures can use this code.
+ */
+
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/dmi.h>
+#include <linux/pci-acpi.h>
+#include <linux/sfi_acpi.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/rculist.h>
+
+#define PREFIX "ACPI: "
+
+static DEFINE_MUTEX(pci_mmcfg_lock);
+LIST_HEAD(pci_mmcfg_list);
+
+static void list_add_sorted(struct pci_mmcfg_region *new)
+{
+ struct pci_mmcfg_region *cfg;
+
+ /* keep list sorted by segment and starting bus number */
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+ if (cfg->segment > new->segment ||
+ (cfg->segment == new->segment &&
+ cfg->start_bus >= new->start_bus)) {
+ list_add_tail_rcu(&new->list, &cfg->list);
+ return;
+ }
+ }
+ list_add_tail_rcu(&new->list, &pci_mmcfg_list);
+}
+
+static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
+ int end, u64 addr)
+{
+ struct pci_mmcfg_region *new;
+ struct resource *res;
+
+ if (addr == 0)
+ return NULL;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->address = addr;
+ new->segment = segment;
+ new->start_bus = start;
+ new->end_bus = end;
+
+ res = &new->res;
+ res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
+ res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
+ "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
+ res->name = new->name;
+
+ return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr)
+{
+ struct pci_mmcfg_region *new;
+
+ new = pci_mmconfig_alloc(segment, start, end, addr);
+ if (new) {
+ mutex_lock(&pci_mmcfg_lock);
+ list_add_sorted(new);
+ mutex_unlock(&pci_mmcfg_lock);
+
+ pr_info(PREFIX
+ "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
+ "(base %#lx)\n",
+ segment, start, end, &new->res, (unsigned long)addr);
+ }
+
+ return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
+{
+ struct pci_mmcfg_region *cfg;
+
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+ if (cfg->segment == segment &&
+ cfg->start_bus <= bus && bus <= cfg->end_bus)
+ return cfg;
+
+ return NULL;
+}
+
+/*
+ * Map a pci_mmcfg_region, can be overridden by arch
+ */
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
+
+ vaddr = ioremap(mcfg->res.start, resource_size(&mcfg->res));
+ if (!vaddr) {
+ release_resource(&mcfg->res);
+ return -ENOMEM;
+ }
+
+ mcfg->virt = vaddr;
+ return 0;
+}
+
+/*
+ * Unmap a pci_mmcfg_region, can be overridden by arch
+ */
+void __weak pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg)
+{
+ if (mcfg->virt) {
+ iounmap(mcfg->virt);
+ mcfg->virt = NULL;
+ }
+ if (mcfg->res.parent) {
+ release_resource(&mcfg->res);
+ mcfg->res.parent = NULL;
+ }
+}
+
+/*
+ * check if the mmconfig is enabled and configured
+ */
+int __weak pci_mmconfig_enabled(void)
+{
+ return 1;
+}
+
+/* Add MMCFG information for host bridges */
+int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr)
+{
+ struct pci_mmcfg_region *cfg;
+ int rc;
+
+ if (!pci_mmconfig_enabled())
+ return -ENODEV;
+ if (start > end)
+ return -EINVAL;
+
+ mutex_lock(&pci_mmcfg_lock);
+ cfg = pci_mmconfig_lookup(seg, start);
+ if (cfg) {
+ if (cfg->end_bus < end)
+ dev_info(dev, FW_INFO
+ "MMCONFIG for "
+ "domain %04x [bus %02x-%02x] "
+ "only partially covers this bridge\n",
+ cfg->segment, cfg->start_bus, cfg->end_bus);
+ rc = -EEXIST;
+ goto err;
+ }
+
+ if (!addr) {
+ rc = -EINVAL;
+ goto err;
+ }
+
+ cfg = pci_mmconfig_alloc(seg, start, end, addr);
+ if (cfg == NULL) {
+ dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
+ rc = -ENOMEM;
+ goto err;
+ }
+ rc = pci_mmconfig_map_resource(dev, cfg);
+ if (!rc) {
+ list_add_sorted(cfg);
+ dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
+ &cfg->res, (unsigned long)addr);
+ goto err; /* rc == 0: still drop pci_mmcfg_lock before returning */
+ } else {
+ if (cfg->res.parent)
+ release_resource(&cfg->res);
+ kfree(cfg);
+ }
+
+err:
+ mutex_unlock(&pci_mmcfg_lock);
+ return rc;
+}
+
+/* Delete MMCFG information for host bridges */
+int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
+{
+ struct pci_mmcfg_region *cfg;
+
+ mutex_lock(&pci_mmcfg_lock);
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+ if (cfg->segment == seg && cfg->start_bus == start &&
+ cfg->end_bus == end) {
+ list_del_rcu(&cfg->list);
+ synchronize_rcu();
+ pci_mmconfig_unmap_resource(cfg);
+ mutex_unlock(&pci_mmcfg_lock);
+ kfree(cfg);
+ return 0;
+ }
+ mutex_unlock(&pci_mmcfg_lock);
+
+ return -ENOENT;
+}
+
+static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
+ struct acpi_mcfg_allocation *cfg)
+{
+ int year;
+
+ if (!config_enabled(CONFIG_X86))
+ return 0;
+
+ if (cfg->address < 0xFFFFFFFF)
+ return 0;
+
+ if (!strncmp(mcfg->header.oem_id, "SGI", 3))
+ return 0;
+
+ if (mcfg->header.revision >= 1) {
+ if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
+ year >= 2010)
+ return 0;
+ }
+
+ pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
+ "is above 4GB, ignored\n", cfg->pci_segment,
+ cfg->start_bus_number, cfg->end_bus_number, cfg->address);
+ return -EINVAL;
+}
+
+static int __init pci_parse_mcfg(struct acpi_table_header *header)
+{
+ struct acpi_table_mcfg *mcfg;
+ struct acpi_mcfg_allocation *cfg_table, *cfg;
+ unsigned long i;
+ int entries;
+
+ if (!header)
+ return -EINVAL;
+
+ mcfg = (struct acpi_table_mcfg *)header;
+
+ /* how many config structures do we have */
+ entries = 0;
+ i = header->length - sizeof(struct acpi_table_mcfg);
+ while (i >= sizeof(struct acpi_mcfg_allocation)) {
+ entries++;
+ i -= sizeof(struct acpi_mcfg_allocation);
+ }
+ if (entries == 0) {
+ pr_err(PREFIX "MMCONFIG has no entries\n");
+ return -ENODEV;
+ }
+
+ cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
+ for (i = 0; i < entries; i++) {
+ cfg = &cfg_table[i];
+ if (acpi_mcfg_check_entry(mcfg, cfg))
+ return -ENODEV;
+
+ if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
+ cfg->end_bus_number, cfg->address) == NULL) {
+ pr_warn(PREFIX "no memory for MCFG entries\n");
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
+
+void __weak __init pci_mmcfg_late_init(void)
+{
+ int err, n = 0;
+ struct pci_mmcfg_region *cfg;
+
+ err = pci_mmconfig_parse_table();
+ if (err) {
+ pr_err(PREFIX " Failed to parse MCFG (%d)\n", err);
+ return;
+ }
+
+ list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+ pci_mmconfig_map_resource(NULL, cfg);
+ n++;
+ }
+
+ pr_info(PREFIX " MCFG table loaded %d entries\n", n);
+}
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 7494dbe..97aa9d3 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -27,9 +27,6 @@
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
#include "../pci/pci.h"
-#ifdef CONFIG_PCI_MMCONFIG
-#include <asm/pci_x86.h>
-#endif

static bool __read_mostly pci_seg_supported = true;

@@ -221,7 +218,7 @@ static int __init xen_mcfg_late(void)
if (!xen_initial_domain())
return 0;

- if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+ if (!pci_mmconfig_enabled())
return 0;

if (list_empty(&pci_mmcfg_list))
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09

+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
--
1.9.1
Lorenzo Pieralisi
2016-02-17 11:00:02 UTC
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h.
As part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map an
MCFG entry, pci_mmconfig_unmap_resource() to do the corresponding
unmap, and pci_mmconfig_enabled() to see if the arch setup of
MCFG entries was successful. We also provide weak implementations
of these, which will be used by ARM64. On x86, we retain the
old logic by providing a platform-specific implementation.
This patch purely rearranges code; it should not have any
impact on the logic of MCFG parsing or list handling.
[Xen parts:]
---
arch/x86/include/asm/pci_x86.h | 24 +---
arch/x86/pci/mmconfig-shared.c | 269 +++++------------------------------
arch/x86/pci/mmconfig_32.c | 1 +
arch/x86/pci/mmconfig_64.c | 1 +
arch/x86/pci/numachip.c | 1 +
drivers/acpi/Makefile | 1 +
drivers/acpi/pci_mcfg.c | 312 +++++++++++++++++++++++++++++++++++++++++
drivers/xen/pci.c | 5 +-
include/linux/pci-acpi.h | 33 +++++
9 files changed, 386 insertions(+), 261 deletions(-)
create mode 100644 drivers/acpi/pci_mcfg.c
This patch makes perfect sense to me and moves MCFG handling to common
code seamlessly. It is basically a code move, with weak functions to
cater for the X86-specific legacy bits that are otherwise pretty complex
to untangle. So, apart from a few nits below:

Reviewed-by: Lorenzo Pieralisi <***@arm.com>
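
For readers less familiar with the weak-function pattern referred to above,
here is a minimal user-space sketch. It assumes a GCC- or Clang-style
toolchain; the demo_ names are hypothetical and only mirror the shape of
pci_mmconfig_enabled() and the early checks in pci_mmconfig_insert(). It is
not the kernel code.

```c
#include <errno.h>

/*
 * Generic default. The weak attribute (what the kernel spells __weak) lets
 * another object file provide a strong definition with the same name, which
 * the linker then prefers over this one.
 */
__attribute__((weak)) int demo_mmconfig_enabled(void)
{
	return 1;
}

/*
 * Generic caller. It does not know or care whether the default above or an
 * architecture override ends up linked in.
 */
int demo_mmconfig_insert(int start, int end)
{
	if (!demo_mmconfig_enabled())
		return -ENODEV;
	if (start > end)
		return -EINVAL;
	return 0;
}
```

With only this file linked, the weak default is used and a valid bus range is
accepted. An architecture that needs extra conditions would supply its own
non-weak demo_mmconfig_enabled() in a separate file.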

[...]
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
@@ -0,0 +1,312 @@
+/*
+ * pci_mcfg.c
+ *
+ * Common code to maintain the MCFG areas and mappings
+ *
+ * This has been extracted from arch/x86/pci/mmconfig-shared.c
+ * and moved here so that other architectures can use this code.
+ */
+
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/dmi.h>
+#include <linux/pci-acpi.h>
+#include <linux/sfi_acpi.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/rculist.h>
Nit: while at it, order them alphabetically.
Post by Tomasz Nowicki
+
+#define PREFIX "ACPI: "
+
+static DEFINE_MUTEX(pci_mmcfg_lock);
+LIST_HEAD(pci_mmcfg_list);
+
+static void list_add_sorted(struct pci_mmcfg_region *new)
+{
+ struct pci_mmcfg_region *cfg;
+
+ /* keep list sorted by segment and starting bus number */
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+ if (cfg->segment > new->segment ||
+ (cfg->segment == new->segment &&
+ cfg->start_bus >= new->start_bus)) {
+ list_add_tail_rcu(&new->list, &cfg->list);
+ return;
+ }
+ }
+ list_add_tail_rcu(&new->list, &pci_mmcfg_list);
+}
+
+static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
+ int end, u64 addr)
+{
+ struct pci_mmcfg_region *new;
+ struct resource *res;
+
+ if (addr == 0)
+ return NULL;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->address = addr;
+ new->segment = segment;
+ new->start_bus = start;
+ new->end_bus = end;
+
+ res = &new->res;
+ res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
+ res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
+ "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
+ res->name = new->name;
+
+ return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr)
+{
+ struct pci_mmcfg_region *new;
+
+ new = pci_mmconfig_alloc(segment, start, end, addr);
+ if (new) {
+ mutex_lock(&pci_mmcfg_lock);
+ list_add_sorted(new);
+ mutex_unlock(&pci_mmcfg_lock);
+
+ pr_info(PREFIX
+ "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
+ "(base %#lx)\n",
+ segment, start, end, &new->res, (unsigned long)addr);
+ }
+
+ return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
+{
+ struct pci_mmcfg_region *cfg;
+
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+ if (cfg->segment == segment &&
+ cfg->start_bus <= bus && bus <= cfg->end_bus)
+ return cfg;
+
+ return NULL;
+}
+
+/*
+ * Map a pci_mmcfg_region, can be overrriden by arch
s/overrriden/overridden/

[...]
Post by Tomasz Nowicki
+static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
+ struct acpi_mcfg_allocation *cfg)
+{
+ int year;
+
+ if (!config_enabled(CONFIG_X86))
+ return 0;
This check in generic code may ruffle someone's feathers. I even think we
could run this function safely on ARM64, but to prevent surprises we'd
better keep the X86 check. Alternatives like adding a weak function just
for a quirk do not make much sense to me.
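
To make the decision order of the quirk easier to follow, here is a
user-space restatement of it. This is only a sketch: the function name and
parameters are hypothetical, is_x86 stands in for
config_enabled(CONFIG_X86), and bios_year <= 0 is assumed to mean that the
DMI BIOS date could not be read.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when the MCFG entry would be ignored by the quirk. */
static bool demo_mcfg_entry_rejected(bool is_x86, uint64_t address,
				     const char *oem_id,
				     unsigned int revision, int bios_year)
{
	if (!is_x86)
		return false;	/* the quirk is skipped on other architectures */
	if (address < 0xFFFFFFFFULL)
		return false;	/* regions below 4GB are always accepted */
	if (!strncmp(oem_id, "SGI", 3))
		return false;	/* SGI firmware is exempt */
	if (revision >= 1 && bios_year >= 2010)
		return false;	/* MCFG revision >= 1 with a BIOS from 2010 on */
	return true;		/* above 4GB on older x86 firmware */
}
```

So on a non-x86 build an entry above 4GB is always accepted, while on x86 it
is accepted only for SGI firmware or for a recent enough BIOS.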

Lorenzo
Post by Tomasz Nowicki
+
+ if (cfg->address < 0xFFFFFFFF)
+ return 0;
+
+ if (!strncmp(mcfg->header.oem_id, "SGI", 3))
+ return 0;
+
+ if (mcfg->header.revision >= 1) {
+ if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
+ year >= 2010)
+ return 0;
+ }
+
+ pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
+ "is above 4GB, ignored\n", cfg->pci_segment,
+ cfg->start_bus_number, cfg->end_bus_number, cfg->address);
+ return -EINVAL;
+}
+
+static int __init pci_parse_mcfg(struct acpi_table_header *header)
+{
+ struct acpi_table_mcfg *mcfg;
+ struct acpi_mcfg_allocation *cfg_table, *cfg;
+ unsigned long i;
+ int entries;
+
+ if (!header)
+ return -EINVAL;
+
+ mcfg = (struct acpi_table_mcfg *)header;
+
+ /* how many config structures do we have */
+ entries = 0;
+ i = header->length - sizeof(struct acpi_table_mcfg);
+ while (i >= sizeof(struct acpi_mcfg_allocation)) {
+ entries++;
+ i -= sizeof(struct acpi_mcfg_allocation);
+ }
+ if (entries == 0) {
+ pr_err(PREFIX "MMCONFIG has no entries\n");
+ return -ENODEV;
+ }
+
+ cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
+ for (i = 0; i < entries; i++) {
+ cfg = &cfg_table[i];
+ if (acpi_mcfg_check_entry(mcfg, cfg))
+ return -ENODEV;
+
+ if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
+ cfg->end_bus_number, cfg->address) == NULL) {
+ pr_warn(PREFIX "no memory for MCFG entries\n");
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
+
+void __weak __init pci_mmcfg_late_init(void)
+{
+ int err, n = 0;
+ struct pci_mmcfg_region *cfg;
+
+ err = pci_mmconfig_parse_table();
+ if (err) {
+ pr_err(PREFIX " Failed to parse MCFG (%d)\n", err);
+ return;
+ }
+
+ list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+ pci_mmconfig_map_resource(NULL, cfg);
+ n++;
+ }
+
+ pr_info(PREFIX " MCFG table loaded %d entries\n", n);
+}
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 7494dbe..97aa9d3 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -27,9 +27,6 @@
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
#include "../pci/pci.h"
-#ifdef CONFIG_PCI_MMCONFIG
-#include <asm/pci_x86.h>
-#endif
static bool __read_mostly pci_seg_supported = true;
@@ -221,7 +218,7 @@ static int __init xen_mcfg_late(void)
if (!xen_initial_domain())
return 0;
- if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+ if (!pci_mmconfig_enabled())
return 0;
if (list_empty(&pci_mmcfg_list))
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
--
1.9.1
liudongdong (C)
2016-02-18 12:40:03 UTC
Hi Tomasz
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h.
As part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map an
MCFG entry, pci_mmconfig_unmap_resource() to do the corresponding
unmap, and pci_mmconfig_enabled() to see if the arch setup of
MCFG entries was successful. We also provide weak implementations
of these, which will be used by ARM64. On x86, we retain the
old logic by providing a platform-specific implementation.
This patch purely rearranges code; it should not have any
impact on the logic of MCFG parsing or list handling.
[Xen parts:]
---
arch/x86/include/asm/pci_x86.h | 24 +---
arch/x86/pci/mmconfig-shared.c | 269 +++++------------------------------
arch/x86/pci/mmconfig_32.c | 1 +
arch/x86/pci/mmconfig_64.c | 1 +
arch/x86/pci/numachip.c | 1 +
drivers/acpi/Makefile | 1 +
drivers/acpi/pci_mcfg.c | 312 +++++++++++++++++++++++++++++++++++++++++
drivers/xen/pci.c | 5 +-
include/linux/pci-acpi.h | 33 +++++
9 files changed, 386 insertions(+), 261 deletions(-)
create mode 100644 drivers/acpi/pci_mcfg.c
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 46873fb..7824626 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -122,33 +122,11 @@ extern int pci_legacy_init(void);
extern void pcibios_fixup_irqs(void);
/* pci-mmconfig.c */
-
-/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
-#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
-
-struct pci_mmcfg_region {
- struct list_head list;
- struct resource res;
- u64 address;
- char __iomem *virt;
- u16 segment;
- u8 start_bus;
- u8 end_bus;
- char name[PCI_MMCFG_RESOURCE_NAME_LEN];
-};
-
+struct pci_mmcfg_region;
extern int __init pci_mmcfg_arch_init(void);
extern void __init pci_mmcfg_arch_free(void);
extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg);
extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg);
-extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
- phys_addr_t addr);
-extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
-extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
-
-extern struct list_head pci_mmcfg_list;
-
-#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
/*
* On AMD Fam10h CPUs, all PCI MMIO configuration space accesses must use
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index dd30b7e..626710b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -12,13 +12,12 @@
#include <linux/pci.h>
#include <linux/init.h>
-#include <linux/sfi_acpi.h>
#include <linux/bitmap.h>
-#include <linux/dmi.h>
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/rculist.h>
#include <asm/e820.h>
+#include <linux/pci-acpi.h>
#include <asm/pci_x86.h>
#include <asm/acpi.h>
@@ -27,9 +26,6 @@
/* Indicate if the mmcfg resources have been placed into the resource table. */
static bool pci_mmcfg_running_state;
static bool pci_mmcfg_arch_init_failed;
-static DEFINE_MUTEX(pci_mmcfg_lock);
-
-LIST_HEAD(pci_mmcfg_list);
static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg)
{
@@ -48,83 +44,6 @@ static void __init free_all_mmcfg(void)
pci_mmconfig_remove(cfg);
}
-static void list_add_sorted(struct pci_mmcfg_region *new)
-{
- struct pci_mmcfg_region *cfg;
-
- /* keep list sorted by segment and starting bus number */
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
- if (cfg->segment > new->segment ||
- (cfg->segment == new->segment &&
- cfg->start_bus >= new->start_bus)) {
- list_add_tail_rcu(&new->list, &cfg->list);
- return;
- }
- }
- list_add_tail_rcu(&new->list, &pci_mmcfg_list);
-}
-
-static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
- int end, u64 addr)
-{
- struct pci_mmcfg_region *new;
- struct resource *res;
-
- if (addr == 0)
- return NULL;
-
- new = kzalloc(sizeof(*new), GFP_KERNEL);
- if (!new)
- return NULL;
-
- new->address = addr;
- new->segment = segment;
- new->start_bus = start;
- new->end_bus = end;
-
- res = &new->res;
- res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
- res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
- res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
- "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
- res->name = new->name;
-
- return new;
-}
-
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
- int end, u64 addr)
-{
- struct pci_mmcfg_region *new;
-
- new = pci_mmconfig_alloc(segment, start, end, addr);
- if (new) {
- mutex_lock(&pci_mmcfg_lock);
- list_add_sorted(new);
- mutex_unlock(&pci_mmcfg_lock);
-
- pr_info(PREFIX
- "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
- "(base %#lx)\n",
- segment, start, end, &new->res, (unsigned long)addr);
- }
-
- return new;
-}
-
-struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
-{
- struct pci_mmcfg_region *cfg;
-
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
- if (cfg->segment == segment &&
- cfg->start_bus <= bus && bus <= cfg->end_bus)
- return cfg;
-
- return NULL;
-}
-
static const char *__init pci_mmcfg_e7520(void)
{
u32 win;
@@ -543,73 +462,6 @@ static void __init pci_mmcfg_reject_broken(int early)
}
}
-static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
- struct acpi_mcfg_allocation *cfg)
-{
- int year;
-
- if (cfg->address < 0xFFFFFFFF)
- return 0;
-
- if (!strncmp(mcfg->header.oem_id, "SGI", 3))
- return 0;
-
- if (mcfg->header.revision >= 1) {
- if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
- year >= 2010)
- return 0;
- }
-
- pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
- "is above 4GB, ignored\n", cfg->pci_segment,
- cfg->start_bus_number, cfg->end_bus_number, cfg->address);
- return -EINVAL;
-}
-
-static int __init pci_parse_mcfg(struct acpi_table_header *header)
-{
- struct acpi_table_mcfg *mcfg;
- struct acpi_mcfg_allocation *cfg_table, *cfg;
- unsigned long i;
- int entries;
-
- if (!header)
- return -EINVAL;
-
- mcfg = (struct acpi_table_mcfg *)header;
-
- /* how many config structures do we have */
- free_all_mmcfg();
- entries = 0;
- i = header->length - sizeof(struct acpi_table_mcfg);
- while (i >= sizeof(struct acpi_mcfg_allocation)) {
- entries++;
- i -= sizeof(struct acpi_mcfg_allocation);
- }
- if (entries == 0) {
- pr_err(PREFIX "MMCONFIG has no entries\n");
- return -ENODEV;
- }
-
- cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
- for (i = 0; i < entries; i++) {
- cfg = &cfg_table[i];
- if (acpi_mcfg_check_entry(mcfg, cfg)) {
- free_all_mmcfg();
- return -ENODEV;
- }
-
- if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
- cfg->end_bus_number, cfg->address) == NULL) {
- pr_warn(PREFIX "no memory for MCFG entries\n");
- free_all_mmcfg();
- return -ENOMEM;
- }
- }
-
- return 0;
-}
-
#ifdef CONFIG_ACPI_APEI
extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
void *data), void *data);
@@ -662,13 +514,20 @@ static void __init __pci_mmcfg_init(int early)
static int __initdata known_bridge;
+static void __init pci_mmcfg_list_setup(void)
+{
+ free_all_mmcfg();
+ if (pci_mmconfig_parse_table())
+ free_all_mmcfg();
+}
+
void __init pci_mmcfg_early_init(void)
{
if (pci_probe & PCI_PROBE_MMCONF) {
if (pci_mmcfg_check_hostbridge())
known_bridge = 1;
else
- acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+ pci_mmcfg_list_setup();
__pci_mmcfg_init(1);
set_apei_filter();
@@ -686,7 +545,7 @@ void __init pci_mmcfg_late_init(void)
/* MMCONFIG hasn't been enabled yet, try again */
if (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) {
- acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+ pci_mmcfg_list_setup();
__pci_mmcfg_init(0);
}
}
@@ -720,99 +579,41 @@ static int __init pci_mmcfg_late_insert_resources(void)
*/
late_initcall(pci_mmcfg_late_insert_resources);
-/* Add MMCFG information for host bridges */
-int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
- phys_addr_t addr)
+int pci_mmconfig_map_resource(struct device *dev, struct pci_mmcfg_region *cfg)
{
- int rc;
- struct resource *tmp = NULL;
- struct pci_mmcfg_region *cfg;
+ struct resource *tmp;
- if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
- return -ENODEV;
-
- if (start > end)
- return -EINVAL;
-
- mutex_lock(&pci_mmcfg_lock);
- cfg = pci_mmconfig_lookup(seg, start);
- if (cfg) {
- if (cfg->end_bus < end)
- dev_info(dev, FW_INFO
- "MMCONFIG for "
- "domain %04x [bus %02x-%02x] "
- "only partially covers this bridge\n",
- cfg->segment, cfg->start_bus, cfg->end_bus);
- mutex_unlock(&pci_mmcfg_lock);
- return -EEXIST;
- }
-
- if (!addr) {
- mutex_unlock(&pci_mmcfg_lock);
- return -EINVAL;
- }
-
- rc = -EBUSY;
- cfg = pci_mmconfig_alloc(seg, start, end, addr);
- if (cfg == NULL) {
- dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
- rc = -ENOMEM;
- } else if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
+ if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
dev_warn(dev, FW_BUG "MMCONFIG %pR isn't reserved\n",
&cfg->res);
- } else {
- /* Insert resource if it's not in boot stage */
- if (pci_mmcfg_running_state)
- tmp = insert_resource_conflict(&iomem_resource,
- &cfg->res);
-
+ return -EBUSY;
+ }
+ /* Insert resource if it's not in boot stage */
+ if (pci_mmcfg_running_state) {
+ tmp = insert_resource_conflict(&iomem_resource,
+ &cfg->res);
if (tmp) {
- dev_warn(dev,
- "MMCONFIG %pR conflicts with "
- "%s %pR\n",
- &cfg->res, tmp->name, tmp);
- } else if (pci_mmcfg_arch_map(cfg)) {
- dev_warn(dev, "fail to map MMCONFIG %pR.\n",
- &cfg->res);
- } else {
- list_add_sorted(cfg);
- dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
- &cfg->res, (unsigned long)addr);
- cfg = NULL;
- rc = 0;
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &cfg->res, tmp->name, tmp);
+ return -EBUSY;
}
}
-
- if (cfg) {
- if (cfg->res.parent)
- release_resource(&cfg->res);
- kfree(cfg);
+ if (pci_mmcfg_arch_map(cfg)) {
+ dev_warn(dev, "fail to map MMCONFIG %pR.\n", &cfg->res);
+ return -EBUSY;
}
-
- mutex_unlock(&pci_mmcfg_lock);
-
- return rc;
+ return 0;
}
-/* Delete MMCFG information for host bridges */
-int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
+void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *cfg)
{
- struct pci_mmcfg_region *cfg;
-
- mutex_lock(&pci_mmcfg_lock);
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
- if (cfg->segment == seg && cfg->start_bus == start &&
- cfg->end_bus == end) {
- list_del_rcu(&cfg->list);
- synchronize_rcu();
- pci_mmcfg_arch_unmap(cfg);
- if (cfg->res.parent)
- release_resource(&cfg->res);
- mutex_unlock(&pci_mmcfg_lock);
- kfree(cfg);
- return 0;
- }
- mutex_unlock(&pci_mmcfg_lock);
+ pci_mmcfg_arch_unmap(cfg);
+ if (cfg->res.parent)
+ release_resource(&cfg->res);
+ cfg->res.parent = NULL;
+}
- return -ENOENT;
+int pci_mmconfig_enabled(void)
+{
+ return (pci_probe & PCI_PROBE_MMCONF) && !pci_mmcfg_arch_init_failed;
}
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 43984bc..38a37f8 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -12,6 +12,7 @@
#include <linux/pci.h>
#include <linux/init.h>
#include <linux/rcupdate.h>
+#include <linux/pci-acpi.h>
#include <asm/e820.h>
#include <asm/pci_x86.h>
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index bea5249..29253ec 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -10,6 +10,7 @@
#include <linux/acpi.h>
#include <linux/bitmap.h>
#include <linux/rcupdate.h>
+#include <linux/pci-acpi.h>
#include <asm/e820.h>
#include <asm/pci_x86.h>
diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
index 2e565e6..c181eeb 100644
--- a/arch/x86/pci/numachip.c
+++ b/arch/x86/pci/numachip.c
@@ -14,6 +14,7 @@
*/
#include <linux/pci.h>
+#include <linux/pci-acpi.h>
#include <asm/pci_x86.h>
static u8 limit __read_mostly;
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 7ea903d..e5e4393 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -40,6 +40,7 @@ acpi-$(CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC) += processor_pdc.o
acpi-y += ec.o
acpi-$(CONFIG_ACPI_DOCK) += dock.o
acpi-y += pci_root.o pci_link.o pci_irq.o
+acpi-$(CONFIG_PCI_MMCONFIG) += pci_mcfg.o
acpi-y += acpi_lpss.o acpi_apd.o
acpi-y += acpi_platform.o
acpi-y += acpi_pnp.o
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
@@ -0,0 +1,312 @@
+/*
+ * pci_mcfg.c
+ *
+ * Common code to maintain the MCFG areas and mappings
+ *
+ * This has been extracted from arch/x86/pci/mmconfig-shared.c
+ * and moved here so that other architectures can use this code.
+ */
+
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/dmi.h>
+#include <linux/pci-acpi.h>
+#include <linux/sfi_acpi.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/rculist.h>
+
+#define PREFIX "ACPI: "
+
+static DEFINE_MUTEX(pci_mmcfg_lock);
+LIST_HEAD(pci_mmcfg_list);
+
+static void list_add_sorted(struct pci_mmcfg_region *new)
+{
+ struct pci_mmcfg_region *cfg;
+
+ /* keep list sorted by segment and starting bus number */
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+ if (cfg->segment > new->segment ||
+ (cfg->segment == new->segment &&
+ cfg->start_bus >= new->start_bus)) {
+ list_add_tail_rcu(&new->list, &cfg->list);
+ return;
+ }
+ }
+ list_add_tail_rcu(&new->list, &pci_mmcfg_list);
+}
+
+static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
+ int end, u64 addr)
+{
+ struct pci_mmcfg_region *new;
+ struct resource *res;
+
+ if (addr == 0)
+ return NULL;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->address = addr;
+ new->segment = segment;
+ new->start_bus = start;
+ new->end_bus = end;
+
+ res = &new->res;
+ res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
+ res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
+ "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
+ res->name = new->name;
+
+ return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr)
+{
+ struct pci_mmcfg_region *new;
+
+ new = pci_mmconfig_alloc(segment, start, end, addr);
+ if (new) {
+ mutex_lock(&pci_mmcfg_lock);
+ list_add_sorted(new);
+ mutex_unlock(&pci_mmcfg_lock);
+
+ pr_info(PREFIX
+ "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
+ "(base %#lx)\n",
+ segment, start, end, &new->res, (unsigned long)addr);
+ }
+
+ return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
+{
+ struct pci_mmcfg_region *cfg;
+
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+ if (cfg->segment == segment &&
+ cfg->start_bus <= bus && bus <= cfg->end_bus)
+ return cfg;
+
+ return NULL;
+}
+
+/*
+ * Map a pci_mmcfg_region, can be overrriden by arch
+ */
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
+
+ vaddr = ioremap(mcfg->res.start, resource_size(&mcfg->res));
+ if (!vaddr) {
+ release_resource(&mcfg->res);
+ return -ENOMEM;
+ }
+
+ mcfg->virt = vaddr;
This should be changed to
mcfg->virt = vaddr - PCI_MMCFG_BUS_OFFSET(mcfg->start_bus);

Otherwise, when the PCIe host's "start_bus" is not 0, configuration accesses
will go to the wrong address.

See the v3 or v4 patchset, which had "addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus);":
static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg)
{
void __iomem *addr;
u64 start, size;
int num_buses;

start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
num_buses = cfg->end_bus - cfg->start_bus + 1;
size = PCI_MMCFG_BUS_OFFSET(num_buses);
addr = ioremap_nocache(start, size);
if (addr)
addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
return addr;
}
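
A small worked example of the arithmetic, as a sketch only. It assumes the
config accessors index the mapping with the absolute bus number via
PCI_MMCFG_OFFSET(bus, devfn); the demo_ names and the plain-integer "virt"
are hypothetical stand-ins for the kernel structures.

```c
#include <stdint.h>

/* Same shifts as the patch's macros: 1 MiB per bus, 4 KiB per devfn. */
#define DEMO_BUS_OFFSET(bus)	((uintptr_t)(bus) << 20)
#define DEMO_OFFSET(bus, devfn)	\
	(((uintptr_t)(bus) << 20) | ((uintptr_t)(devfn) << 12))

/* Stand-in for struct pci_mmcfg_region, reduced to what the example needs. */
struct demo_region {
	uintptr_t virt;		/* value kept in ->virt, as an integer here */
	uint8_t start_bus;
	uint8_t end_bus;
};

/* Store the mapping biased by start_bus, as mcfg_ioremap() did in v3/v4. */
static void demo_set_virt(struct demo_region *r, uintptr_t mapped)
{
	r->virt = mapped - DEMO_BUS_OFFSET(r->start_bus);
}

/* Accessor-style address computation using the absolute bus number. */
static uintptr_t demo_cfg_addr(const struct demo_region *r, unsigned int bus,
			       unsigned int devfn, unsigned int reg)
{
	return r->virt + DEMO_OFFSET(bus, devfn) + reg;
}
```

With start_bus = 0x40 and the biased base, bus 0x40 devfn 0 resolves to the
first byte of the mapping. If ->virt held the unbiased ioremap() result
instead, the same access would land 0x40 << 20 bytes past the start of the
mapping.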

Dongdong
Thanks
Post by Tomasz Nowicki
+ return 0;
+}
+
+/*
+ * Unmap a pci_mmcfg_region, can be overrriden by arch
+ */
+void __weak pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg)
+{
+ if (mcfg->virt) {
+ iounmap(mcfg->virt);
+ mcfg->virt = NULL;
+ }
+ if (mcfg->res.parent) {
+ release_resource(&mcfg->res);
+ mcfg->res.parent = NULL;
+ }
+}
+
+/*
+ * check if the mmconfig is enabled and configured
+ */
+int __weak pci_mmconfig_enabled(void)
+{
+ return 1;
+}
+
+/* Add MMCFG information for host bridges */
+int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr)
+{
+ struct pci_mmcfg_region *cfg;
+ int rc;
+
+ if (!pci_mmconfig_enabled())
+ return -ENODEV;
+ if (start > end)
+ return -EINVAL;
+
+ mutex_lock(&pci_mmcfg_lock);
+ cfg = pci_mmconfig_lookup(seg, start);
+ if (cfg) {
+ if (cfg->end_bus < end)
+ dev_info(dev, FW_INFO
+ "MMCONFIG for "
+ "domain %04x [bus %02x-%02x] "
+ "only partially covers this bridge\n",
+ cfg->segment, cfg->start_bus, cfg->end_bus);
+ rc = -EEXIST;
+ goto err;
+ }
+
+ if (!addr) {
+ rc = -EINVAL;
+ goto err;
+ }
+
+ cfg = pci_mmconfig_alloc(seg, start, end, addr);
+ if (cfg == NULL) {
+ dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
+ rc = -ENOMEM;
+ goto err;
+ }
+ rc = pci_mmconfig_map_resource(dev, cfg);
+ if (!rc) {
+ list_add_sorted(cfg);
+ dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
+ &cfg->res, (unsigned long)addr);
+ return 0;
+ } else {
+ if (cfg->res.parent)
+ release_resource(&cfg->res);
+ kfree(cfg);
+ }
+
+ mutex_unlock(&pci_mmcfg_lock);
+ return rc;
+}
+
+/* Delete MMCFG information for host bridges */
+int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
+{
+ struct pci_mmcfg_region *cfg;
+
+ mutex_lock(&pci_mmcfg_lock);
+ list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+ if (cfg->segment == seg && cfg->start_bus == start &&
+ cfg->end_bus == end) {
+ list_del_rcu(&cfg->list);
+ synchronize_rcu();
+ pci_mmconfig_unmap_resource(cfg);
+ mutex_unlock(&pci_mmcfg_lock);
+ kfree(cfg);
+ return 0;
+ }
+ mutex_unlock(&pci_mmcfg_lock);
+
+ return -ENOENT;
+}
+
+static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
+ struct acpi_mcfg_allocation *cfg)
+{
+ int year;
+
+ if (!config_enabled(CONFIG_X86))
+ return 0;
+
+ if (cfg->address < 0xFFFFFFFF)
+ return 0;
+
+ if (!strncmp(mcfg->header.oem_id, "SGI", 3))
+ return 0;
+
+ if (mcfg->header.revision >= 1) {
+ if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
+ year >= 2010)
+ return 0;
+ }
+
+ pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
+ "is above 4GB, ignored\n", cfg->pci_segment,
+ cfg->start_bus_number, cfg->end_bus_number, cfg->address);
+ return -EINVAL;
+}
+
+static int __init pci_parse_mcfg(struct acpi_table_header *header)
+{
+ struct acpi_table_mcfg *mcfg;
+ struct acpi_mcfg_allocation *cfg_table, *cfg;
+ unsigned long i;
+ int entries;
+
+ if (!header)
+ return -EINVAL;
+
+ mcfg = (struct acpi_table_mcfg *)header;
+
+ /* how many config structures do we have */
+ entries = 0;
+ i = header->length - sizeof(struct acpi_table_mcfg);
+ while (i >= sizeof(struct acpi_mcfg_allocation)) {
+ entries++;
+ i -= sizeof(struct acpi_mcfg_allocation);
+ }
+ if (entries == 0) {
+ pr_err(PREFIX "MMCONFIG has no entries\n");
+ return -ENODEV;
+ }
+
+ cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
+ for (i = 0; i < entries; i++) {
+ cfg = &cfg_table[i];
+ if (acpi_mcfg_check_entry(mcfg, cfg))
+ return -ENODEV;
+
+ if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
+ cfg->end_bus_number, cfg->address) == NULL) {
+ pr_warn(PREFIX "no memory for MCFG entries\n");
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
+
+void __weak __init pci_mmcfg_late_init(void)
+{
+ int err, n = 0;
+ struct pci_mmcfg_region *cfg;
+
+ err = pci_mmconfig_parse_table();
+ if (err) {
+ pr_err(PREFIX " Failed to parse MCFG (%d)\n", err);
+ return;
+ }
+
+ list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+ pci_mmconfig_map_resource(NULL, cfg);
+ n++;
+ }
+
+ pr_info(PREFIX " MCFG table loaded %d entries\n", n);
+}
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 7494dbe..97aa9d3 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -27,9 +27,6 @@
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
#include "../pci/pci.h"
-#ifdef CONFIG_PCI_MMCONFIG
-#include <asm/pci_x86.h>
-#endif
static bool __read_mostly pci_seg_supported = true;
@@ -221,7 +218,7 @@ static int __init xen_mcfg_late(void)
if (!xen_initial_domain())
return 0;
- if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+ if (!pci_mmconfig_enabled())
return 0;
if (list_empty(&pci_mmcfg_list))
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
Lorenzo Pieralisi
2016-02-18 13:20:03 UTC
Permalink
On Thu, Feb 18, 2016 at 08:25:35PM +0800, liudongdong (C) wrote:

[...]
Post by liudongdong (C)
Post by Tomasz Nowicki
+/*
+ * Map a pci_mmcfg_region, can be overridden by arch
+ */
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
+
+ vaddr = ioremap(mcfg->res.start, resource_size(&mcfg->res));
^^
while at it, stray white space
Post by liudongdong (C)
Post by Tomasz Nowicki
+ if (!vaddr) {
+ release_resource(&mcfg->res);
+ return -ENOMEM;
+ }
+
+ mcfg->virt = vaddr;
This should be changed to
mcfg->virt = vaddr - PCI_MMCFG_BUS_OFFSET(mcfg->start_bus);
otherwise, when the PCIe host "start_bus" is not 0, configuration accesses will be wrong.
Well spotted, thanks.
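
To make the start_bus point above concrete, here is a minimal userspace
sketch (not kernel code) of the ECAM offset arithmetic. The helper names
ecam_offset() and ecam_offset_in_window() are hypothetical; only the
shifts by 20 and 12 are taken from the PCI_MMCFG_BUS_OFFSET and
PCI_MMCFG_OFFSET macros in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define ECAM_BUS_SHIFT   20	/* 1 MiB of config space per bus */
#define ECAM_DEVFN_SHIFT 12	/* 4 KiB of config space per function */

/* Offset of a function's config space, counted from bus 0. */
static inline size_t ecam_offset(unsigned int bus, unsigned int devfn)
{
	return ((size_t)bus << ECAM_BUS_SHIFT) |
	       ((size_t)devfn << ECAM_DEVFN_SHIFT);
}

/*
 * Offset into a mapping whose first byte belongs to start_bus.
 * This is the bias that the suggested fix folds into mcfg->virt.
 */
static inline size_t ecam_offset_in_window(unsigned int start_bus,
					   unsigned int bus,
					   unsigned int devfn)
{
	return ecam_offset(bus, devfn) -
	       ((size_t)start_bus << ECAM_BUS_SHIFT);
}
```

With start_bus = 0x40, bus 0x40 devfn 0 lands at byte 0 of the mapping,
while the unbiased offset would be 0x4000000, that is, 64 MiB too far
into it. Subtracting the bus offset once from the stored virtual address
has the same effect as subtracting it on every access.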

Lorenzo
Bjorn Helgaas
2016-03-03 23:00:01 UTC
Permalink
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.

My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.

My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.

Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.

There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).

We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
Bjorn
Jayachandran Chandrashekaran Nair
2016-03-04 08:40:03 UTC
Permalink
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the PCI root and not maintained in a separate list that needs locking).
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.

In the x86 case it is not feasible to stop using pci_mmcfg_list.
The only user outside x86 is Xen, which can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similarly to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information
available there is very limited (i.e., segment, start_bus, end_bus
and address). My approach in the above patch is to save this info
into an array at boot time and avoid multiple lookups.
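
As a rough illustration of the "save it into an array at boot" idea, the
following is a userspace sketch, not the code from the patch linked
above. The struct mcfg_entry layout and the mcfg_lookup() name are
assumptions made for this example; the fields simply mirror the four
values an MCFG allocation provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time copy of one MCFG allocation entry. */
struct mcfg_entry {
	uint64_t addr;		/* ECAM base address */
	uint16_t segment;	/* PCI segment group number */
	uint8_t  bus_start;	/* first bus decoded by this region */
	uint8_t  bus_end;	/* last bus decoded by this region */
};

/* Return the entry covering (segment, bus), or NULL if none does. */
static const struct mcfg_entry *
mcfg_lookup(const struct mcfg_entry *tbl, size_t n,
	    uint16_t segment, uint8_t bus)
{
	for (size_t i = 0; i < n; i++) {
		const struct mcfg_entry *e = &tbl[i];

		if (e->segment == segment &&
		    bus >= e->bus_start && bus <= e->bus_end)
			return e;
	}
	return NULL;
}
```

A linear scan is likely sufficient here, since the lookup would run once
per host bridge rather than once per config access.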
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with
pci-host-generic.c
if that is what you are looking for. The code will end up looking much
simpler.

Thanks,
JC.
Bjorn Helgaas
2016-03-05 04:20:01 UTC
Permalink
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the pci root and not maintained in a separate list that needs locking),
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.
I agree that ECAM info should be attached to the PCI host controller.
That should simplify locking and hot-add and hot-removal of host
controllers.

I think pci_mmcfg_list is an implementation detail that may not need
to be generic. I certainly don't think it needs to be part of the
interface.
Post by Jayachandran Chandrashekaran Nair
In the x86 case it is not feasible to stop using pci_mmcfg_list.
The only user outside x86 is Xen, which can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similarly to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information
available there is very limited (i.e., segment, start_bus, end_bus
and address). My approach in the above patch is to save this info
into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think
there's any performance issue here. But we do use acpi_table_parse(),
which is __init, and *that* is a reason why we might need to parse the
entire MCFG at boot-time. But this is the least of our worries in any
case.
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with
pci-host-generic.c
if that is what you are looking for. The code will end up looking much
simpler.
I think we should ignore x86 mmconfig for now. It is absurdly
complicated and I'm not sure it's fixable. I *do* want to keep
drivers/acpi/pci_root.c for all ACPI host bridges, including x86,
ia64, and arm64.

So I think we should write generic MCFG and ECAM support from scratch
for arm64. Something like this:

- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be
called from acpi_init() to copy MCFG info to something we can
access after __init. This would not reserve resources, but
probably does have to ioremap() the regions to support
raw_pci_read().

- Implement raw_pci_read(), which is "special" because ACPI needs it
for PCI config access from AML. It's supposed to be "always
accessible" and we don't have a struct pci_bus *, so this probably
has to use the MCFG copy and the ioremap done above. Maybe it
should go in the same file. This is completely independent of
the PCI core and PCI data structures.

- Implement arm64 pci_acpi_scan_root() that calls
acpi_pci_root_create() with an .init_info() function that calls
acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails,
looks up the bus range in the MCFG copy from above. It should
call request_mem_region(). For a region from _CBA, it should call
ioremap(). For regions from MCFG it can probably use the ioremap
done by acpi_mcfg_init().

I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr()
before calling pci_acpi_scan_root(), but I think that's wrong
because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA
and MCFG should be handled in the same place.

I know calling request_mem_region() here will probably be an
ordering problem because the PNP0C02 driver hasn't reserved
resources yet. But the host bridge driver is using the region and
it should reserve it.

- If we store the ECAM mapped base address in the sysdata or struct
pci_host_bridge, the normal config accessors can use
pci_generic_config_read() with a new generic .map_bus() function.

On x86, the normal config access path is:

pci_read(struct pci_bus *, ...)
raw_pci_read(seg, bus#, ...)
raw_pci_ext_ops->read(seg, bus#, ...)
pci_mmcfg_read(seg, bus#, ...)
pci_dev_base
pci_mmconfig_lookup(seg, bus#)

I think this is somewhat backwards because we start with a pci_bus
pointer, so we *could* have a nice simple bus-specific accessor,
but we throw that pointer away, so pci_mmcfg_read() has to start
over and look up the ECAM offset from scratch, which makes it all
unnecessarily complicated.
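
A bus-specific accessor of the kind described above could look roughly
like the following. This is a userspace model under stated assumptions,
not a kernel implementation: struct ecam_window and ecam_map_bus() are
hypothetical names, and the real .map_bus() callback takes a
struct pci_bus * rather than an explicit window and bus number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-host-bridge ECAM window, e.g. kept in sysdata. */
struct ecam_window {
	uint8_t *base;		/* mapping of the first bus in the window */
	uint8_t  bus_start;
	uint8_t  bus_end;
};

/*
 * Return the address of register 'where' in the config space of
 * (bus, devfn), or NULL if the request falls outside the window.
 * No list walk or locking is needed: the window comes straight
 * from the bridge that owns the bus.
 */
static void *ecam_map_bus(const struct ecam_window *w, unsigned int bus,
			  unsigned int devfn, unsigned int where)
{
	if (bus < w->bus_start || bus > w->bus_end ||
	    devfn > 0xff || where > 0xfff)
		return NULL;

	return w->base + ((size_t)(bus - w->bus_start) << 20)
		       + ((size_t)devfn << 12) + where;
}
```

Because the window is reached from the bridge data, the per-access
pci_mmconfig_lookup(seg, bus#) step in the x86 path shown above has no
counterpart in this sketch.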

Bjorn
Tomasz Nowicki
2016-03-09 09:20:02 UTC
Permalink
Hi Bjorn,

Thanks for your pointers! See my comments inline. Also, can you please
have a look at my previous patch set v4 and check how many of your
comments are already addressed there? We may want to go back to it then.

https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
Post by Bjorn Helgaas
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the pci root and not maintained in a separate list that needs locking),
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.
I agree that ECAM info should be attached to the PCI host controller.
That should simplify locking and hot-add and hot-removal of host
controllers.
I think pci_mmcfg_list is an implementation detail that may not need
to be generic. I certainly don't think it needs to be part of the
interface.
Post by Jayachandran Chandrashekaran Nair
In the x86 case it is not feasible to stop using pci_mmcfg_list.
The only user outside x86 is Xen, which can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similarly to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information
available there is very limited (i.e., segment, start_bus, end_bus
and address). My approach in the above patch is to save this info
into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think
there's any performance issue here. But we do use acpi_table_parse(),
which is __init, and *that* is a reason why we might need to parse the
entire MCFG at boot-time. But this is the least of our worries in any
case.
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
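As a rough sketch of why the offset helpers are not ACPI-specific: the ECAM layout used by the PCI_MMCFG_OFFSET() macro quoted above is plain bit arithmetic. The helper names ecam_devfn() and ecam_offset(), and the added register argument, are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Same shifts as the PCI_MMCFG_OFFSET() macro quoted above. */
#define ECAM_BUS_SHIFT   20
#define ECAM_DEVFN_SHIFT 12

/* devfn packs a 5-bit device number and a 3-bit function number. */
static inline uint8_t ecam_devfn(uint8_t dev, uint8_t fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Byte offset of a config register: 1 MiB per bus, 4 KiB per function. */
static inline uint32_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	return ((uint32_t)bus << ECAM_BUS_SHIFT) |
	       ((uint32_t)devfn << ECAM_DEVFN_SHIFT) |
	       (uint32_t)(reg & 0xfff);
}
```

For example, bus 1, device 3, function 1, register 0x10 gives devfn 0x19 and offset 0x119010.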
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with
pci-host-generic.c
if that is what you are looking for. The code will end up looking much
simpler.
I think we should ignore x86 mmconfig for now. It is absurdly
complicated and I'm not sure it's fixable. I *do* want to keep
drivers/acpi/pci_root.c for all ACPI host bridges, including x86,
ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be
called from acpi_init() to copy MCFG info to something we can
access after __init. This would not reserve resources, but
probably does have to ioremap() the regions to support
raw_pci_read().
As I said, the ECAM- and ACPI-specific code was isolated in the previous
patch. There, I tried to leave the x86 complications in arch/x86/ and
extract the generic functionality into drivers/pci/ecam.c as a library.
Post by Bjorn Helgaas
- Implement raw_pci_read(), which is "special" because ACPI needs it
for PCI config access from AML. It's supposed to be "always
accessible" and we don't have a struct pci_bus *, so this probably
has to use the MCFG copy and the ioremap done above. Maybe it
should go in the same file. This is completely independent of
the PCI core and PCI data structures.
We looked for a reason that would justify raw PCI config accessors in
the ARM64 world. Unfortunately, nobody was able to show a real use case
for ARM64. Do you see a reason we need them? Our conclusion was to leave
them empty for ARM64, which in turn makes the code simpler. I was not an
ASWG member while that was under discussion, so I will ask Lorenzo to
elaborate on this.
Post by Bjorn Helgaas
- Implement arm64 pci_acpi_scan_root() that calls
acpi_pci_root_create() with an .init_info() function that calls
acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails,
looks up the bus range in the MCFG copy from above. It should
call request_mem_region(). For a region from _CBA, it should call
ioremap(). For regions from MCFG it can probably use the ioremap
done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
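The ordering proposed above (try _CBA first, then fall back to the MCFG copy, and only ioremap() in the _CBA case) might be modelled as follows. This is a userspace sketch under assumptions: ecam_choose(), struct ecam_choice and struct mcfg_copy_entry are hypothetical, and a _CBA value of 0 is used here to mean "no _CBA available".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time copy of one MCFG entry. */
struct mcfg_copy_entry {
	uint64_t addr;
	uint16_t segment;
	uint8_t bus_start;
	uint8_t bus_end;
};

enum ecam_source { ECAM_NONE, ECAM_FROM_CBA, ECAM_FROM_MCFG };

struct ecam_choice {
	enum ecam_source source;
	uint64_t addr;
	bool needs_ioremap; /* true only for _CBA; MCFG assumed mapped at init */
};

/*
 * cba_addr models the value read from _CBA, with 0 meaning "no _CBA".
 * An MCFG entry is accepted only if it covers the bridge's whole bus range.
 */
static struct ecam_choice ecam_choose(uint64_t cba_addr, uint16_t seg,
				      uint8_t bus_start, uint8_t bus_end,
				      const struct mcfg_copy_entry *tbl, size_t n)
{
	struct ecam_choice c = { ECAM_NONE, 0, false };

	if (cba_addr) {
		c = (struct ecam_choice){ ECAM_FROM_CBA, cba_addr, true };
		return c;
	}
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].segment == seg && tbl[i].bus_start <= bus_start &&
		    tbl[i].bus_end >= bus_end) {
			c = (struct ecam_choice){ ECAM_FROM_MCFG, tbl[i].addr, false };
			return c;
		}
	}
	return c;
}
```

In both successful cases the caller would still be expected to reserve the region, as suggested above; that step is left out of the model.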
Post by Bjorn Helgaas
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr()
before calling pci_acpi_scan_root(), but I think that's wrong
because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA
and MCFG should be handled in the same place.
I know calling request_mem_region() here will probably be an
ordering problem because the PNP0C02 driver hasn't reserved
resources yet. But the host bridge driver is using the region and
it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct
pci_host_bridge, the normal config accessors can use
pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and in fact we
already do, but associating the ECAM region with sysdata will remove the
ECAM region lookup step (see patch 09/15 of this series).
Post by Bjorn Helgaas
pci_read(struct pci_bus *, ...)
raw_pci_read(seg, bus#, ...)
raw_pci_ext_ops->read(seg, bus#, ...)
pci_mmcfg_read(seg, bus#, ...)
pci_dev_base
pci_mmconfig_lookup(seg, bus#)
I think this is somewhat backwards because we start with a pci_bus
pointer, so we *could* have a nice simple bus-specific accessor,
but we throw that pointer away, so pci_mmcfg_read() has to start
over and look up the ECAM offset from scratch, which makes it all
unnecessarily complicated.
As you pointed out, raw_pci_{read|write} make things complicated, so IMO
we should either say they are absolutely necessary (and then think about
how to simplify them) or just use a simple bus-specific accessor
(patch 02/15), e.g. for ARM64.
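To make the "simple bus-specific accessor" concrete: once the mapped ECAM window hangs off per-bridge data, a .map_bus()-style helper needs no global lookup at all. The sketch below is a userspace model; struct ecam_bridge, ecam_map_bus() and ecam_read32() are hypothetical names, and the window pointer is assumed to start at the bridge's first bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-bridge data, standing in for sysdata. */
struct ecam_bridge {
	uint8_t *base;     /* mapped ECAM window, starting at bus_start */
	uint8_t bus_start;
	uint8_t bus_end;
};

/* Return a pointer to the register, or NULL if it is outside the window. */
static void *ecam_map_bus(const struct ecam_bridge *br, uint8_t bus,
			  uint8_t devfn, uint16_t where)
{
	if (bus < br->bus_start || bus > br->bus_end || where >= 4096)
		return NULL;
	return br->base + ((size_t)(bus - br->bus_start) << 20) +
	       ((size_t)devfn << 12) + where;
}

/* Aligned 32-bit read; reports all-ones on failure, like an absent device. */
static int ecam_read32(const struct ecam_bridge *br, uint8_t bus,
		       uint8_t devfn, uint16_t where, uint32_t *val)
{
	void *addr = ecam_map_bus(br, bus, devfn, (uint16_t)(where & ~3u));

	if (!addr) {
		*val = 0xffffffff;
		return -1;
	}
	memcpy(val, addr, sizeof(*val));
	return 0;
}
```

The only state the accessor touches is the bridge it was handed, which is the simplification being argued for; a real implementation would use MMIO accessors rather than memcpy().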

Any comments appreciated.

Thanks,
Tomasz
Tomasz Nowicki
2016-03-09 09:20:02 UTC
Permalink
Post by Tomasz Nowicki
Post by Bjorn Helgaas
So I think we should write generic MCFG and ECAM support from scratch
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be
called from acpi_init() to copy MCFG info to something we can
access after __init. This would not reserve resources, but
probably does have to ioremap() the regions to support
raw_pci_read().
As said, ECAM and ACPI specific code was isolated in previous patch.
I meant to say, in my previous patch set (V4), sorry.

Tomasz
Jayachandran Chandrashekaran Nair
2016-03-09 10:20:02 UTC
Permalink
Hi Tomasz,
Post by Lorenzo Pieralisi
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have
a look at my previous patch set v4 and check how many of your comments are
already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
Post by Bjorn Helgaas
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the pci root and not maintained in a separate list that needs locking),
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.
I agree that ECAM info should be attached to the PCI host controller.
That should simplify locking and hot-add and hot-removal of host
controllers.
I think pci_mmcfg_list is an implementation detail that may not need
to be generic. I certainly don't think it needs to be part of the
interface.
Post by Jayachandran Chandrashekaran Nair
In the x86 case it is not feasible to remove the use of pci_mmcfg_list.
The only use of it outside is in Xen, which can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similar to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
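For readers unfamiliar with what the quoted hunk checks: it asks whether the MCFG window overlaps something already present in the resource tree and reports the conflicting entry. A minimal userspace model of that overlap test is sketched below; struct mem_range and find_conflict() are hypothetical and do not reproduce the kernel's resource tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range with inclusive bounds. */
struct mem_range {
	uint64_t start;
	uint64_t end;
	const char *name;
};

/* Return the first reserved range that overlaps the candidate, or NULL. */
static const struct mem_range *find_conflict(const struct mem_range *reserved,
					     size_t n,
					     const struct mem_range *candidate)
{
	for (size_t i = 0; i < n; i++) {
		if (candidate->start <= reserved[i].end &&
		    reserved[i].start <= candidate->end)
			return &reserved[i];
	}
	return NULL;
}
```

The point made above is about who should own this reservation, not about the overlap test itself.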
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
As you pointed out, raw_pci_{read|write} make things complicated, so IMO we
should either say they are absolutely necessary (and then think about how to
simplify them) or just use a simple bus-specific accessor (patch 02/15), e.g.
for ARM64.
Both raw_pci_read/write and a host controller with pci_generic_read/write can
be done without much trouble, please see the patch I had at:
https://patchwork.ozlabs.org/patch/575526/

I have been looking at Bjorn's suggestions and trying to see if I can update
my MCFG patch taking care of them. I will post an updated patchset soon,
unless you want to take this up.

JC.
Tomasz Nowicki
2016-03-09 11:00:02 UTC
Permalink
Hi Jayachandran,
Post by Bjorn Helgaas
Hi Tomasz,
Post by Lorenzo Pieralisi
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have
a look at my previous patch set v4 and check how many of your comments are
already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
As you pointed out, raw_pci_{read|write} make things complicated, so IMO we
should either say they are absolutely necessary (and then think about how to
simplify them) or just use a simple bus-specific accessor (patch 02/15), e.g.
for ARM64.
Both raw_pci_read/write and a host controller with pci_generic_read/write can
be done without much trouble, please see the patch I had at:
https://patchwork.ozlabs.org/patch/575526/
Yes, it is doable; I implemented it the same way in one of my initial
patch series, about a year ago. I am questioning the presence of raw
accessors on a per-arch basis. If they are really needed for all archs,
then we definitely should implement them. If ARM64 does not care about
them, there is no point in complicating things. In particular, I mean all
the kinds of PCI config space quirks we will need to handle right after
this patch gets merged, see:
[PATCH V5 13/15] pci, acpi: Match PCI config space accessors against
platform specific quirks.

and

https://lkml.org/lkml/2016/2/9/627
https://lkml.org/lkml/2016/2/8/967

Given these quirks, raw accessors are not so easy.
Post by Bjorn Helgaas
I have been looking at Bjorn's suggestions and trying to see if I can update
my MCFG patch taking care of them. I will post an updated patch set soon,
unless you want to take this up.
Yes, I want to post the next version and keep this patch set together, if
you and Bjorn are okay with that. I feel that my previous patch set is
close to what Bjorn suggested, modulo the way we keep MCFG regions. Let's
discuss it here, then I will post it as the next version. I am looking
forward to hearing Bjorn's comments on my previous patch set.

Tomasz
Jayachandran Chandrashekaran Nair
2016-03-10 13:10:01 UTC
Permalink
Hi Tomasz,
Post by Tomasz Nowicki
Hi Jayachandran,
Post by Bjorn Helgaas
Hi Tomasz,
Post by Lorenzo Pieralisi
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have
a look at my previous patch set v4 and check how many of your comments are
already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
Post by Bjorn Helgaas
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the pci root and not maintained in a separate list that needs locking),
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.
I agree that ECAM info should be attached to the PCI host controller.
That should simplify locking and hot-add and hot-removal of host
controllers.
I think pci_mmcfg_list is an implementation detail that may not need
to be generic. I certainly don't think it needs to be part of the
interface.
Post by Jayachandran Chandrashekaran Nair
In the x86 case it is not feasible to stop using pci_mmcfg_list.
The only use of it outside is in Xen, which can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similarly to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information
available there is very limited (i.e, segment, start_bus, end_bus
and address). My approach in the above patch is to save this info
into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think
there's any performance issue here. But we do use acpi_table_parse(),
which is __init, and *that* is a reason why we might need to parse the
entire MCFG at boot-time. But this is the least of our worries in any
case.
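To make the "copy MCFG once at boot, look it up once per host bridge" idea concrete, here is a minimal userspace sketch. It is not the kernel implementation: struct mcfg_entry, mcfg_copy[] and mcfg_lookup() are hypothetical names, and the table contents are made-up values standing in for a parsed MCFG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of one MCFG allocation entry, kept around after boot. */
struct mcfg_entry {
	uint64_t address;	/* ECAM base address reported by firmware */
	uint16_t segment;	/* PCI segment (domain) number */
	uint8_t  start_bus;	/* first bus decoded by this region */
	uint8_t  end_bus;	/* last bus decoded by this region */
};

/* Illustrative values only; a real table would be filled from ACPI. */
static const struct mcfg_entry mcfg_copy[] = {
	{ 0xe0000000ULL, 0, 0x00, 0x7f },
	{ 0xf0000000ULL, 1, 0x00, 0xff },
};

/* Return the entry covering (segment, bus), or NULL if none does. */
static const struct mcfg_entry *mcfg_lookup(uint16_t segment, uint8_t bus)
{
	size_t i;

	for (i = 0; i < sizeof(mcfg_copy) / sizeof(mcfg_copy[0]); i++) {
		const struct mcfg_entry *e = &mcfg_copy[i];

		assert(e->start_bus <= e->end_bus);	/* sanity check on the copy */
		if (e->segment == segment &&
		    bus >= e->start_bus && bus <= e->end_bus)
			return e;
	}
	return NULL;
}
```

A host bridge setup path would call mcfg_lookup() once with its segment and starting bus; with only a handful of entries, a linear scan is cheap, which matches the point above that performance is not the concern here.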
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
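For readers unfamiliar with the two offset macros in the hunk above, the following userspace sketch shows the arithmetic they encode: 1 MiB of config space per bus and 4 KiB per device/function. The helpers ecam_devfn() and ecam_offset() are hypothetical names added for illustration, and the offset is assumed to be relative to the ECAM base for bus 0.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the quoted macros, widened to 64 bits for userspace. */
#define PCI_MMCFG_BUS_OFFSET(bus)	((uint64_t)(bus) << 20)
#define PCI_MMCFG_OFFSET(bus, devfn)	((uint64_t)(bus) << 20 | (uint64_t)(devfn) << 12)

/* Hypothetical helper: pack device (0-31) and function (0-7) into a devfn. */
static unsigned int ecam_devfn(unsigned int dev, unsigned int fn)
{
	assert(dev < 32 && fn < 8);
	return (dev << 3) | fn;
}

/* Hypothetical helper: byte offset of config register 'reg' from the ECAM base. */
static uint64_t ecam_offset(unsigned int bus, unsigned int devfn, unsigned int reg)
{
	assert(bus < 256 && devfn < 256 && reg < 4096);
	return PCI_MMCFG_OFFSET(bus, devfn) | reg;
}
```

For example, bus 1, device 2, function 0, register 0x10 gives 0x100000 + 0x10000 + 0x10 = 0x110010. Nothing in this computation depends on ACPI, which is the reason given above for moving such helpers out of the ACPI header.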
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with
pci-host-generic.c
if that is what you are looking for. The code will end up looking much
simpler.
I think we should ignore x86 mmconfig for now. It is absurdly
complicated and I'm not sure it's fixable. I *do* want to keep
drivers/acpi/pci_root.c for all ACPI host bridges, including x86,
ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be
called from acpi_init() to copy MCFG info to something we can
access after __init. This would not reserve resources, but
probably does have to ioremap() the regions to support
raw_pci_read().
As mentioned, the ECAM- and ACPI-specific code was isolated in the previous
patch. There, I tried to leave the x86 complications in arch/x86/ and extract
the generic functionality to drivers/pci/ecam.c as a library.
Post by Bjorn Helgaas
- Implement raw_pci_read(), which is "special" because ACPI needs it
for PCI config access from AML. It's supposed to be "always
accessible" and we don't have a struct pci_bus *, so this probably
has to use the MCFG copy and the ioremap done above. Maybe it
should go in the same file. This is completely independent of
the PCI core and PCI data structures.
We were looking for a use case that would justify raw PCI config accessors
on ARM64. Unfortunately, nobody was able to show a real one. Do you see a
reason we need them? Our conclusion was to leave them empty for ARM64, which
in turn makes the code simpler. I was not an ASWG member while that was under
discussion, so I will ask Lorenzo to elaborate on this.
Post by Bjorn Helgaas
- Implement arm64 pci_acpi_scan_root() that calls
acpi_pci_root_create() with an .init_info() function that calls
acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails,
looks up the bus range in the MCFG copy from above. It should
call request_mem_region(). For a region from _CBA, it should call
ioremap(). For regions from MCFG it can probably use the ioremap
done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
Post by Bjorn Helgaas
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr()
before calling pci_acpi_scan_root(), but I think that's wrong
because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA
and MCFG should be handled in the same place.
I know calling request_mem_region() here will probably be an
ordering problem because the PNP0C02 driver hasn't reserved
resources yet. But the host bridge driver is using the region and
it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct
pci_host_bridge, the normal config accessors can use
pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and actually we
already do, but associating the ECAM region with sysdata will remove the ECAM
region lookup step (see patch 09/15 of this series).
Post by Bjorn Helgaas
pci_read(struct pci_bus *, ...)
raw_pci_read(seg, bus#, ...)
raw_pci_ext_ops->read(seg, bus#, ...)
pci_mmcfg_read(seg, bus#, ...)
pci_dev_base
pci_mmconfig_lookup(seg, bus#)
I think this is somewhat backwards because we start with a pci_bus
pointer, so we *could* have a nice simple bus-specific accessor,
but we throw that pointer away, so pci_mmcfg_read() has to start
over and look up the ECAM offset from scratch, which makes it all
unnecessarily complicated.
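The simpler alternative described here, keeping the mapped ECAM base with the host bridge so that no per-access lookup is needed, can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct ecam_window and ecam_map_bus() are hypothetical names, and the mapping is assumed to begin at the first bus of the window, as in pci-host-generic.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-host-bridge ECAM state, standing in for data kept in sysdata. */
struct ecam_window {
	uint8_t *base;		/* mapped address of start_bus's config space */
	uint8_t  start_bus;
	uint8_t  end_bus;
};

/*
 * Sketch of a .map_bus()-style helper: return a pointer to the requested
 * config register, or NULL if the request falls outside the window.
 */
static uint8_t *ecam_map_bus(const struct ecam_window *w, unsigned int bus,
			     unsigned int devfn, unsigned int where)
{
	assert(w != NULL && w->base != NULL);

	if (bus < w->start_bus || bus > w->end_bus ||
	    devfn > 0xff || where > 0xfff)
		return NULL;

	return w->base + ((size_t)(bus - w->start_bus) << 20) +
	       ((size_t)devfn << 12) + where;
}
```

Because the window travels with the bridge, an accessor that already holds the bus pointer reaches the register with one bounds check and some shifts, instead of searching a global list by segment and bus number.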
As you pointed out, raw_pci_{read|write} make things complicated, so IMO we
should either say they are absolutely necessary (and then think about how to
simplify them) or just use a simple bus-specific accessor (patch 02/15), e.g.
for ARM64.
Both raw_pci_read/write and a host controller with pci_generic_read/write can
https://patchwork.ozlabs.org/patch/575526/
Yes, it is doable; I implemented it the same way in one of my initial
patch series, about a year ago. I'm questioning the presence of raw accessors
on a per-arch basis. If they are really needed for all archs, then we definitely
should implement them. If ARM64 does not care about them, there is no point in
complicating things. Especially, I mean all kinds of PCI config space quirks we will
[PATCH V5 13/15] pci, acpi: Match PCI config space accessors against
platfrom specific quirks.
and
https://lkml.org/lkml/2016/2/9/627
https://lkml.org/lkml/2016/2/8/967
Given these quirks, raw accessors are not so easy.
The whole quirk handling infrastructure seems to be overkill to me.
I will leave it to maintainers to comment further.
Post by Tomasz Nowicki
Post by Bjorn Helgaas
I have been looking at Bjorn's suggestions and trying to see if I can update
my MCFG patch taking care of them. I will post an updated patch set soon,
unless you want to take this up.
Yes, I want to post the next version and keep this patch set together, if you
and Bjorn are okay with that. I feel that my previous patch set is close to what
Bjorn suggested, modulo the way we keep MCFG regions. Let's discuss it here,
then I will post it as the next version. I am looking forward to hearing Bjorn's
comments on my previous patch set.
I have been looking thru the code, and I have a reasonable implementation
which updates this one patch. This pulls in common code from pci-host-generic.c
as well. I will post it by next week and you can decide whether to use it
to update your patchset.

Thanks,
JC.
Jayachandran C
2016-03-17 20:40:03 UTC
Hi Bjorn,

Here is a new patchset for the ACPI PCI controller driver based on the
earlier discussion[1].

The first two patches in the patchset implement pci/ecam.c for generic
config space access and use it in pci-host-generic.c and related files.

The third patch implements the ACPI PCI host driver using the same ecam
access functions. The fourth patch adds the implementation of raw
operations.

I have not used the pci_mmcfg_list or the region definitions from x86,
but have used a much simpler approach here.

This should apply cleanly on top of the current pci next tree, and
can be reviewed as a patchset. To use it on ARM64, we need to pull
in about 7 more patches from Tomasz's patchset that fix various
issues (like stub code in arm64 pci.c, ACPI companion setup,
domain number assignment, IO resources fixup etc.).

If you are okay with this approach, I will work with Tomasz and
post the full patchset.

This has been tested on qemu with OVMF for the ACPI part and with
device tree for pci-host-generic code.

Thanks,
JC.

[1] https://lkml.org/lkml/2016/3/3/921

Jayachandran C (4):
PCI: Provide generic ECAM mapping functions
PCI: generic,thunder: Use generic config functions
ACPI: PCI: Add generic PCI host controller
ACPI: PCI: Add raw_pci_read/write operations

drivers/acpi/Kconfig | 9 +
drivers/acpi/Makefile | 1 +
drivers/acpi/pci_gen_host.c | 334 ++++++++++++++++++++++++++++++++++++
drivers/pci/Kconfig | 3 +
drivers/pci/Makefile | 2 +
drivers/pci/ecam.c | 127 ++++++++++++++
drivers/pci/host/Kconfig | 1 +
drivers/pci/host/pci-host-common.c | 68 ++++----
drivers/pci/host/pci-host-common.h | 25 +--
drivers/pci/host/pci-host-generic.c | 51 +-----
drivers/pci/host/pci-thunder-ecam.c | 33 +---
drivers/pci/host/pci-thunder-pem.c | 41 ++---
include/linux/pci.h | 10 ++
13 files changed, 560 insertions(+), 145 deletions(-)
create mode 100644 drivers/acpi/pci_gen_host.c
create mode 100644 drivers/pci/ecam.c
--
1.9.1
Jayachandran C
2016-03-18 17:50:03 UTC
Post by Lorenzo Pieralisi
Hi Bjorn,
Here is a new patchset for the ACPI PCI controller driver based on the
earlier discussion[1].
The first two patches in the patchset implement pci/ecam.c for generic
config space access and use it in pci-host-generic.c and related files.
The third patch implements the ACPI PCI host driver using the same ecam
access functions. The fourth patch adds the implementation of raw
operations.
I have not used the pci_mmcfg_list or the region definitions from x86,
but have used a much simpler approach here.
This should apply cleanly on top of the current pci next tree, and
can be reviewed as a patchset. To use it on ARM64, we need to pull
in about 7 more patches from Tomasz's patchset that fix various
issues (like stub code in arm64 pci.c, ACPI companion setup,
domain number assignment, IO resources fixup etc.).
If you are okay with this approach, I will work with Tomasz and
post the full patchset.
This has been tested on qemu with OVMF for the ACPI part and with
device tree for pci-host-generic code.
The full patchset is available at https://github.com/jchandra-brcm/linux.git on
branch arm64-acpi-pci, if anyone wants to try it.

Comments, suggestions and testing would be welcome.

Thanks,
JC.
Gabriele Paoloni
2016-03-23 10:30:02 UTC
Hi Jayachandran
-----Original Message-----
Sent: 18 March 2016 17:48
Cc: Jayachandran C; Arnd Bergmann; Will Deacon; Catalin Marinas; Hanjun
Subject: Re: [RFC PATCH 0/4] ACPI based PCI host driver with generic
ECAM
Post by Lorenzo Pieralisi
Hi Bjorn,
Here is a new patchset for the ACPI PCI controller driver based on the
earlier discussion[1].
The first two patches in the patchset implement pci/ecam.c for generic
config space access and use it in pci-host-generic.c and related files.
The third patch implements the ACPI PCI host driver using the same ecam
access functions. The fourth patch adds the implementation of raw
operations.
I have not used the pci_mmcfg_list or the region definitions from x86,
but have used a much simpler approach here.
This should apply cleanly on top of the current pci next tree, and
can be reviewed as a patchset. To use it on ARM64, we need to pull
in about 7 more patches from Tomasz's patchset that fix various
issues (like stub code in arm64 pci.c, ACPI companion setup,
domain number assignment, IO resources fixup etc.).
If you are okay with this approach, I will work with Tomasz and
post the full patchset.
This has been tested on qemu with OVMF for the ACPI part and with
device tree for pci-host-generic code.
The full patchset is available at https://github.com/jchandra-brcm/linux.git on
branch arm64-acpi-pci, if anyone wants to try it.
I had a look at your patchset and also in your git repo at the other
patches that you ported over from Tomasz; it seems that we are now missing
a quirk mechanism to enable controllers that are not fully ECAM-compliant.

This was provided before by Tomasz in:
https://lkml.org/lkml/2016/2/16/410

I think we should put something like that back in...

Thanks

Gab
Sinan Kaya
2016-03-28 13:50:01 UTC
Hi,
Post by Gabriele Paoloni
I had a look at your patchset and also in your git repo at the other
patches that you ported over from Tomasz; it seems that now we miss
a quirk mechanism to enable controller that are not fully ECAM.
https://lkml.org/lkml/2016/2/16/410
I think we should put something like that back in...
Thanks
Gab
I was requested to test your patchset. I'll need this mechanism before
I can start.

Sinan
--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Jayachandran C
2016-03-28 18:10:02 UTC
Post by Sinan Kaya
Hi,
Post by Gabriele Paoloni
I had a look at your patchset and also in your git repo at the other
patches that you ported over from Tomasz; it seems that now we miss
a quirk mechanism to enable controller that are not fully ECAM.
https://lkml.org/lkml/2016/2/16/410
I think we should put something like that back in...
As Tomasz mentioned in his mail, his approach does not work
for raw operations. I have added raw operations in my patchset,
so we have to come up with a new approach or decide that raw
operations can be dropped.

I am waiting for the overall acceptance of the patch set before
going further along this path.
Post by Sinan Kaya
Post by Gabriele Paoloni
Thanks
Gab
I was requested to test your patchset. I'll need this mechanism before
I can start.
Please see above, we will need to look at the quirks again.
Post by Sinan Kaya
Sinan
JC.
Tomasz Nowicki
2016-04-05 14:20:02 UTC
Post by Lorenzo Pieralisi
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please
have a look at my previous patch set v4 and check how many of your
comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
Post by Bjorn Helgaas
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the pci root and not maintained in a separate list that needs locking),
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.
I agree that ECAM info should be attached to the PCI host controller.
That should simplify locking and hot-add and hot-removal of host
controllers.
I think pci_mmcfg_list is an implementation detail that may not need
to be generic. I certainly don't think it needs to be part of the
interface.
Post by Jayachandran Chandrashekaran Nair
In x86 case it is not feasible to remove using the pci_mmcfg_list.
The only use of it outside is in xen that can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similarly to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information
available there is very limited (i.e, segment, start_bus, end_bus
and address). My approach in the above patch is to save this info
into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think
there's any performance issue here. But we do use acpi_table_parse(),
which is __init, and *that* is a reason why we might need to parse the
entire MCFG at boot-time. But this is the least of our worries in any
case.
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with
pci-host-generic.c
if that is what you are looking for. The code will end up looking much
simpler.
I think we should ignore x86 mmconfig for now. It is absurdly
complicated and I'm not sure it's fixable. I *do* want to keep
drivers/acpi/pci_root.c for all ACPI host bridges, including x86,
ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be
called from acpi_init() to copy MCFG info to something we can
access after __init. This would not reserve resources, but
probably does have to ioremap() the regions to support
raw_pci_read().
As mentioned, the ECAM- and ACPI-specific code was isolated in the previous
patch. There, I tried to leave the x86 complications in arch/x86/ and extract
the generic functionality to drivers/pci/ecam.c as a library.
Post by Bjorn Helgaas
- Implement raw_pci_read(), which is "special" because ACPI needs it
for PCI config access from AML. It's supposed to be "always
accessible" and we don't have a struct pci_bus *, so this probably
has to use the MCFG copy and the ioremap done above. Maybe it
should go in the same file. This is completely independent of
the PCI core and PCI data structures.
We were looking for a use case that would justify raw PCI config
accessors on ARM64. Unfortunately, nobody was able to show a real one.
Do you see a reason we need them? Our conclusion was to leave them empty
for ARM64, which in turn makes the code simpler. I was not an ASWG member
while that was under discussion, so I will ask Lorenzo to elaborate on
this.
Post by Bjorn Helgaas
- Implement arm64 pci_acpi_scan_root() that calls
acpi_pci_root_create() with an .init_info() function that calls
acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails,
looks up the bus range in the MCFG copy from above. It should
call request_mem_region(). For a region from _CBA, it should call
ioremap(). For regions from MCFG it can probably use the ioremap
done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
Post by Bjorn Helgaas
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr()
before calling pci_acpi_scan_root(), but I think that's wrong
because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA
and MCFG should be handled in the same place.
I know calling request_mem_region() here will probably be an
ordering problem because the PNP0C02 driver hasn't reserved
resources yet. But the host bridge driver is using the region and
it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct
pci_host_bridge, the normal config accessors can use
pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and actually we
already do, but associating the ECAM region with sysdata will remove the
ECAM region lookup step (see patch 09/15 of this series).
Post by Bjorn Helgaas
pci_read(struct pci_bus *, ...)
raw_pci_read(seg, bus#, ...)
raw_pci_ext_ops->read(seg, bus#, ...)
pci_mmcfg_read(seg, bus#, ...)
pci_dev_base
pci_mmconfig_lookup(seg, bus#)
I think this is somewhat backwards because we start with a pci_bus
pointer, so we *could* have a nice simple bus-specific accessor,
but we throw that pointer away, so pci_mmcfg_read() has to start
over and look up the ECAM offset from scratch, which makes it all
unnecessarily complicated.
As you pointed out, raw_pci_{read|write} make things complicated, so IMO
we should either say they are absolutely necessary (and then think about
how to simplify them) or just use a simple bus-specific accessor (patch
02/15), e.g. for ARM64.
Any comments appreciated.
Hi Bjorn,

A kind reminder: I would like to move on with this patch set. Could you
please comment on it so that we can decide which way to go?

Regards,
Tomasz
Bjorn Helgaas
2016-04-05 16:50:03 UTC
Hi Tomasz,
Post by Tomasz Nowicki
Post by Lorenzo Pieralisi
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please
have a look at my previous patch set v4 and check how many of your
comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
Post by Bjorn Helgaas
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list
for all architectures or not. My initial plan was not to do this because
of the mess (basically the ECAM region info should be attached to
the pci root and not maintained in a separate list that needs locking),
The patch I posted initially https://patchwork.ozlabs.org/patch/553464/
had a much simpler way of handling the MCFG table without using
the list.
I agree that ECAM info should be attached to the PCI host controller.
That should simplify locking and hot-add and hot-removal of host
controllers.
I think pci_mmcfg_list is an implementation detail that may not need
to be generic. I certainly don't think it needs to be part of the
interface.
Post by Jayachandran Chandrashekaran Nair
In x86 case it is not feasible to remove using the pci_mmcfg_list.
The only use of it outside is in xen that can be fixed up.
Post by Bjorn Helgaas
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in
pci-host-generic.c. I'm not sure whether or how to include that, but
it's a very good example of how simple this *should* be: probe the
host bridge, discover the ECAM region, request the region, ioremap it,
done.
I had a similar approach in my initial patchset, please see the patch
above. The resource for ECAM is mapped similarly to the way
pci-host-generic.c handled it. An additional step I could do was to
move the common code (ioremap and mapbus) into a common
file and share the code with pci-host-generic.c
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg)
+{
+ struct resource *tmp;
+ void __iomem *vaddr;
+
+ tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+ if (tmp) {
+ dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+ &mcfg->res, tmp->name, tmp);
+ return -EBUSY;
+ }
I think this is a mistake in the x86 MCFG support that we should not
carry over to a generic implementation. We should not use the MCFG
table for resource reservation because MCFG is not defined by the ACPI
spec and an OS need not include support for it. The platform must
indicate in some other, more generic way, that ECAM space is reserved.
This probably means ECAM space should be declared in a PNP0C02 _CRS
method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a
pcibios hook or something here; I just don't think it should be
generic.
Post by Tomasz Nowicki
+int __init pci_mmconfig_parse_table(void)
+{
+ return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once
here. I think we should look for the information we need when we are
claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might
not be a great fit for the way ACPI table management works, but I
think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information
available there is very limited (i.e., segment, start_bus, end_bus
and address). My approach in the above patch is to save this info
into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think
there's any performance issue here. But we do use acpi_table_parse(),
which is __init, and *that* is a reason why we might need to parse the
entire MCFG at boot-time. But this is the least of our worries in any
case.
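A minimal user-space sketch of the "copy MCFG once, look it up per host bridge" idea discussed above. The struct and function names (mcfg_entry, mcfg_lookup) are hypothetical and are not taken from any of the posted patches; the fields only mirror the four pieces of information mentioned in the thread (segment, start bus, end bus, address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time copy of one MCFG entry (illustrative names). */
struct mcfg_entry {
	uint64_t addr;      /* ECAM base address reported by the table */
	uint16_t segment;   /* PCI segment (domain) number */
	uint8_t  bus_start; /* first bus number decoded by this region */
	uint8_t  bus_end;   /* last bus number decoded by this region */
};

/*
 * Find the entry that covers the whole [bus_start, bus_end] range of a
 * host bridge in the given segment. Returns NULL when nothing matches.
 */
static const struct mcfg_entry *
mcfg_lookup(const struct mcfg_entry *tbl, size_t n,
	    uint16_t segment, uint8_t bus_start, uint8_t bus_end)
{
	size_t i;

	assert(bus_start <= bus_end);

	for (i = 0; i < n; i++) {
		const struct mcfg_entry *e = &tbl[i];

		if (e->segment == segment &&
		    e->bus_start <= bus_start && bus_end <= e->bus_end)
			return e;
	}
	return NULL;
}
```

The lookup is a linear scan because it would run once per host bridge, which matches the point above that performance is not the concern here.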
Post by Jayachandran Chandrashekaran Nair
Post by Bjorn Helgaas
Post by Tomasz Nowicki
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
#define RESET_DELAY_DSM 0x08
#define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+ struct list_head list;
+ struct resource res;
+ u64 address;
+ char __iomem *virt;
+ u16 segment;
+ u8 start_bus;
+ u8 end_bus;
+ char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+ phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+ int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+ struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is
ACPI-specific. I'd like to see the PCI ECAM-related interfaces
(hopefully not these exact ones, but a more rational set) put in
something like include/linux/pci-ecam.h.
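For readers unfamiliar with the ECAM layout, here is a small stand-alone sketch of the arithmetic behind PCI_MMCFG_OFFSET() in the quoted header. The helper names (ecam_devfn, ecam_offset) are made up for illustration; only the shift values come from the macros above.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as PCI_MMCFG_OFFSET(): 1 MiB per bus, 4 KiB per function. */
#define ECAM_BUS_SHIFT   20
#define ECAM_DEVFN_SHIFT 12

/* devfn packs the 5-bit device number and the 3-bit function number. */
static uint8_t ecam_devfn(unsigned int dev, unsigned int fn)
{
	assert(dev < 32 && fn < 8);
	return (uint8_t)((dev << 3) | fn);
}

/* Byte offset of a config register inside the ECAM window of one segment. */
static uint32_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	assert(reg < 4096); /* 4 KiB of config space per function */
	return ((uint32_t)bus << ECAM_BUS_SHIFT) |
	       ((uint32_t)devfn << ECAM_DEVFN_SHIFT) | reg;
}
```

For example, register 0x10 of device 3, function 1 on bus 1 lands at offset 0x119010 from the start of the window.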
Post by Tomasz Nowicki
#else /* CONFIG_ACPI */
static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with
pci-host-generic.c
if that is what you are looking for. The code will end up looking much
simpler.
I think we should ignore x86 mmconfig for now. It is absurdly
complicated and I'm not sure it's fixable. I *do* want to keep
drivers/acpi/pci_root.c for all ACPI host bridges, including x86,
ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be
called from acpi_init() to copy MCFG info to something we can
access after __init. This would not reserve resources, but
probably does have to ioremap() the regions to support
raw_pci_read().
As said, the ECAM and ACPI specific code was isolated in the previous patch set.
There, I tried to leave the x86 complications in arch/x86/ and extract the
generic functionality to drivers/pci/ecam.c as a library.
Post by Bjorn Helgaas
- Implement raw_pci_read(), which is "special" because ACPI needs it
for PCI config access from AML. It's supposed to be "always
accessible" and we don't have a struct pci_bus *, so this probably
has to use the MCFG copy and the ioremap done above. Maybe it
should go in the same file. This is completely independent of
the PCI core and PCI data structures.
We were looking for a use case that would justify having RAW PCI config
accessors in the ARM64 world. Unfortunately, nobody was able to show a
real use case for ARM64. Do you see a reason why we need them? Our
conclusion was to leave them empty for ARM64, which in turn makes the code
simpler. I was not an ASWG member while that was under discussion, so I will
ask Lorenzo to elaborate more on this.
Post by Bjorn Helgaas
- Implement arm64 pci_acpi_scan_root() that calls
acpi_pci_root_create() with an .init_info() function that calls
acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails,
looks up the bus range in the MCFG copy from above. It should
call request_mem_region(). For a region from _CBA, it should call
ioremap(). For regions from MCFG it can probably use the ioremap
done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
Post by Bjorn Helgaas
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr()
before calling pci_acpi_scan_root(), but I think that's wrong
because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA
and MCFG should be handled in the same place.
I know calling request_mem_region() here will probably be an
ordering problem because the PNP0C02 driver hasn't reserved
resources yet. But the host bridge driver is using the region and
it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct
pci_host_bridge, the normal config accessors can use
pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and actually we do
now, but associating the ECAM region with sysdata will remove the ECAM region
lookup step (see patch 09/15 of this series).
Post by Bjorn Helgaas
pci_read(struct pci_bus *, ...)
raw_pci_read(seg, bus#, ...)
raw_pci_ext_ops->read(seg, bus#, ...)
pci_mmcfg_read(seg, bus#, ...)
pci_dev_base
pci_mmconfig_lookup(seg, bus#)
I think this is somewhat backwards because we start with a pci_bus
pointer, so we *could* have a nice simple bus-specific accessor,
but we throw that pointer away, so pci_mmcfg_read() has to start
over and look up the ECAM offset from scratch, which makes it all
unnecessarily complicated.
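To illustrate the bus-specific accessor idea, here is a user-space sketch in which the per-bridge data already holds the mapped ECAM window, so no segment/bus lookup is needed on each access. Everything here is hypothetical (struct ecam_host, ecam_map_bus, ecam_read32); a plain byte buffer stands in for the ioremap()ed region, and the window is assumed to start at the first bus of the bridge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-host-bridge data; in the kernel this would hang off sysdata. */
struct ecam_host {
	uint8_t *win;      /* stands in for the ioremap()ed ECAM window */
	uint8_t bus_start; /* first bus decoded by the window */
	uint8_t bus_end;   /* last bus decoded by the window */
};

/* Return the address of a config register, or NULL if out of range. */
static uint8_t *ecam_map_bus(const struct ecam_host *h, uint8_t bus,
			     uint8_t devfn, uint16_t where)
{
	if (bus < h->bus_start || bus > h->bus_end || where >= 4096)
		return NULL;

	/* The window is assumed to start at bus_start, hence the subtraction. */
	return h->win + ((size_t)(bus - h->bus_start) << 20) +
	       ((size_t)devfn << 12) + where;
}

/* Aligned 32-bit config read through the mapping; 0 on success, -1 on error. */
static int ecam_read32(const struct ecam_host *h, uint8_t bus, uint8_t devfn,
		       uint16_t where, uint32_t *val)
{
	const uint8_t *p = ecam_map_bus(h, bus, devfn, where);

	if (!p || (where & 3)) {
		*val = 0xffffffffu; /* all ones, as for a missing device */
		return -1;
	}
	memcpy(val, p, sizeof(*val)); /* host-endian copy, stands in for readl() */
	return 0;
}
```

The point of the sketch is only the shape of the call: the caller starts from per-bridge data and goes straight to an address, instead of discarding it and searching a global list by segment and bus number.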
As you pointed out, raw_pci_{read|write} make things complicated, so IMO
we should either say they are absolutely necessary (and then think how
to simplify them) or just use a simple bus-specific accessor (patch 02/15),
e.g. for ARM64.
Any comments appreciated.
Kind reminder. I would like to move on with this patch set. Can
you please comment on it so that we can decide which way to go?
Can you repost your current proposal with a version number higher than
any previous ones? It's OK if the content is the same as v4; I just
think it's confusing if we resurrect v4 and have to follow discussion
from v3 to v4 to v5 and back to v4. The archives would be a bit of a
muddle.

Bjorn
Tomasz Nowicki
2016-04-05 18:10:01 UTC
Hi Bjorn,
Post by Bjorn Helgaas
Hi Tomasz,
[...]
Post by Bjorn Helgaas
Post by Tomasz Nowicki
Post by Tomasz Nowicki
As you pointed out, raw_pci_{read|write} make things complicated, so IMO
we should either say they are absolutely necessary (and then think how
to simplify them) or just use a simple bus-specific accessor (patch 02/15),
e.g. for ARM64.
Any comments appreciated.
Kind reminder. I would like to move on with this patch set. Can
you please comment on it so that we can decide which way to go?
Can you repost your current proposal with a version number higher than
any previous ones? It's OK if the content is the same as v4; I just
think it's confusing if we resurrect v4 and have to follow discussion
from v3 to v4 to v5 and back to v4. The archives would be a bit of a
muddle.
Sure I will repost ASAP.

Thanks!
Tomasz
Jayachandran C
2016-04-05 19:00:02 UTC
Hi Bjorn,
Post by Bjorn Helgaas
Hi Tomasz,
Post by Tomasz Nowicki
Post by Lorenzo Pieralisi
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please
have a look at my previous patch set v4 and check how many of your
comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
[...]
Post by Tomasz Nowicki
Post by Lorenzo Pieralisi
As you pointed out, raw_pci_{read|write} make things complicated, so IMO
we should either say they are absolutely necessary (and then think how
to simplify them) or just use a simple bus-specific accessor (patch 02/15),
e.g. for ARM64.
Any comments appreciated.
Kind reminder. I would like to move on with this patch set. Can
you please comment on it so that we can decide which way to go?
Can you repost your current proposal with a version number higher than
any previous ones? It's OK if the content is the same as v4; I just
think it's confusing if we resurrect v4 and have to follow discussion
from v3 to v4 to v5 and back to v4. The archives would be a bit of a
muddle.
I had posted a patchset based on your suggestions in this thread
https://lkml.org/lkml/2016/3/17/621

Would appreciate any comments on that. Like I said in the earlier
mail, if this is a reasonable approach, I can combine this with
Tomasz patchset to provide the full patchset for ACPI support.

Thanks,
JC.

Tomasz Nowicki
2016-03-04 09:40:01 UTC
Hi Bjorn,
Post by Bjorn Helgaas
Hi Tomasz, Jayachandran, et al,
Post by Tomasz Nowicki
Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c. This is
to share the API and code with ARM64 later. The corresponding
declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be
implemented by the arch code: pci_mmconfig_map_resource() to map a
mcfg entry, pci_mmconfig_unmap_resource to do the corresponding
unmap and pci_mmconfig_enabled to see if the arch setup of
mcfg entries was successful. We also provide weak implementations
of these, which will be used from ARM64. On x86, we retain the
old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any
impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64.
I need to ponder this some more, so these are just some initial
thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated
disaster, and (b) we're trying a little too hard to make that mess
generic. I think we might be better served if we came up with some
cleaner, more generic code that we can use for ARM64 today, and
migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be
"arch-independent", we will be perpetuating some interfaces and
designs that shouldn't be allowed to escape arch/x86.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really
ACPI-specific, and could potentially be used for non-ACPI bridges that
support ECAM. I'd like to see that sort of code moved to a new file
like drivers/pci/ecam.c.
Actually I split it as you suggested in the previous patch set. Please
have a look at:
https://lkml.org/lkml/2016/2/4/646

Especially patches [0-6] which handle MMCONFIG refactoring.

Thanks,
Tomasz
Tomasz Nowicki
2016-02-16 14:00:05 UTC
From: Lorenzo Pieralisi <***@arm.com>

PCI core code contains a set of functions, eg:

pci_assign_unassigned_bus_resources()

that allow assigning the PCI resources for a given bus after
enumeration.

On systems where the PCI BARs are immutable (ie they must not and can
not be assigned), PCI resources must be claimed in order to be
validated and inserted in the PCI resources tree, but there is no generic
PCI kernel function for that purpose and the resource claiming is
implemented in an arch specific fashion, which has resulted in arch
implementations that contain duplicated code.

This patch, based on the ia64 resource claiming arch implementation,
implements a set of functions in core PCI code that provides a PCI core
interface for resources claiming for a given PCI bus hierarchy, paving
the way for further resource claiming consolidation across architectures.

Signed-off-by: Lorenzo Pieralisi <***@arm.com>
Cc: Arnd Bergmann <***@arndb.de>
Cc: Bjorn Helgaas <***@google.com>
Cc: Yinghai Lu <***@kernel.org>
---
drivers/pci/setup-bus.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 1 +
2 files changed, 64 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7796d0a..c959398 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1424,6 +1424,69 @@ void pci_bus_assign_resources(const struct pci_bus *bus)
}
EXPORT_SYMBOL(pci_bus_assign_resources);

+static void pci_claim_device_resources(struct pci_dev *dev)
+{
+ int i;
+
+ for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+ struct resource *r = &dev->resource[i];
+
+ if (!r->flags || r->parent)
+ continue;
+
+ pci_claim_resource(dev, i);
+ }
+}
+
+static void pci_claim_bridge_resources(struct pci_dev *dev)
+{
+ int i;
+
+ for (i = PCI_BRIDGE_RESOURCES; i < PCI_NUM_RESOURCES; i++) {
+ struct resource *r = &dev->resource[i];
+
+ if (!r->flags || r->parent)
+ continue;
+
+ pci_claim_bridge_resource(dev, i);
+ }
+}
+
+static void pci_bus_allocate_dev_resources(struct pci_bus *b)
+{
+ struct pci_dev *dev;
+ struct pci_bus *child;
+
+ list_for_each_entry(dev, &b->devices, bus_list) {
+ pci_claim_device_resources(dev);
+
+ child = dev->subordinate;
+ if (child)
+ pci_bus_allocate_dev_resources(child);
+ }
+}
+
+static void pci_bus_allocate_resources(struct pci_bus *b)
+{
+ struct pci_bus *child;
+
+ /* Depth-First Search on bus tree */
+ if (b->self) {
+ pci_read_bridge_bases(b);
+ pci_claim_bridge_resources(b->self);
+ }
+
+ list_for_each_entry(child, &b->children, node)
+ pci_bus_allocate_resources(child);
+}
+
+void pci_bus_claim_resources(struct pci_bus *b)
+{
+ pci_bus_allocate_resources(b);
+ pci_bus_allocate_dev_resources(b);
+}
+EXPORT_SYMBOL(pci_bus_claim_resources);
+
static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
struct list_head *add_head,
struct list_head *fail_head)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index dac677c..6faf994 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1119,6 +1119,7 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
/* Helper functions for low-level code (drivers/pci/setup-[bus,res].c) */
resource_size_t pcibios_retrieve_fw_addr(struct pci_dev *dev, int idx);
void pci_bus_assign_resources(const struct pci_bus *bus);
+void pci_bus_claim_resources(struct pci_bus *bus);
void pci_bus_size_bridges(struct pci_bus *bus);
int pci_claim_resource(struct pci_dev *, int);
int pci_claim_bridge_resource(struct pci_dev *bridge, int i);
--
1.9.1
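The ordering in pci_bus_claim_resources() above (bridge windows first, device BARs second) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a claim succeeds when the parent range is already claimed and fully contains the child, and sibling overlap is not checked at all. The names (toy_res, toy_claim, toy_claim_all) are invented for the example and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy resource: an address range that can be claimed under a parent range. */
struct toy_res {
	uint64_t start, end;    /* inclusive range */
	struct toy_res *parent; /* NULL for the host bridge (root) window */
	bool claimed;
};

/* A claim succeeds only if the parent is already claimed and contains us. */
static bool toy_claim(struct toy_res *r)
{
	if (r->parent &&
	    (!r->parent->claimed ||
	     r->start < r->parent->start || r->end > r->parent->end))
		return false;
	r->claimed = true;
	return true;
}

/* Claim every resource in array order; returns how many claims succeeded. */
static size_t toy_claim_all(struct toy_res **res, size_t n)
{
	size_t i, ok = 0;

	for (i = 0; i < n; i++)
		ok += toy_claim(res[i]);
	return ok;
}
```

In this model, claiming a device BAR before its upstream bridge window fails because there is no claimed parent to insert it under, which is the reason the bridge pass has to run before the device pass.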
Tomasz Nowicki
2016-02-16 14:00:05 UTC
It is perfectly fine to use ACPI_PCI_HOST_GENERIC for ARM64,
so let's get rid of the empty PCI init stub and the related ACPI header, and
go with the full-blown PCI host controller driver.

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
TO: Catalin Marinas <***@arm.com>
TO: Lorenzo Pieralisi <***@arm.com>
TO: Will Deacon <***@arm.com>
TO: Arnd Bergmann <***@arndb.de>
CC: Liviu Dudau <***@arm.com>
Tested-by: Duc Dang <***@apm.com>
Tested-by: Dongdong Liu <***@huawei.com>
Tested-by: Hanjun Guo <***@linaro.org>
Tested-by: Graeme Gregory <***@linaro.org>
Tested-by: Sinan Kaya <***@codeaurora.org>
---
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/pci.c | 9 ---------
2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 552e996..09c49ea 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
def_bool y
select ACPI_CCA_REQUIRED if ACPI
select ACPI_GENERIC_GSI if ACPI
+ select ACPI_PCI_HOST_GENERIC if ACPI
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 6e77e1b..1de0168 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -65,12 +65,3 @@ int pcibios_alloc_irq(struct pci_dev *dev)

return 0;
}
-
-#ifdef CONFIG_ACPI
-/* Root bridge scanning */
-struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
-{
- /* TODO: Should be revisited when implementing PCI on ACPI */
- return NULL;
-}
-#endif
--
1.9.1
Tomasz Nowicki
2016-02-16 14:00:05 UTC
This is the last step before enabling generic ACPI PCI host controller
for ARM64. We need to take care of legacy IRQ mapping for non-MSI(X)
PCI devices. pcibios_alloc_irq() evaluation is not sensitive to
ACPI device enumeration order, so it is the best place to assign
device's IRQs for the ACPI boot method. Also, it does not hurt DT to be
initialized from the same place.

NOTE: *This is going to be a temporary solution*. There is ongoing work
which aims at cleaning legacy IRQ allocation out of arch specific code.
We can consider this patch a necessary evil which will be removed
once the cleanup series lands in mainline in the near future.

Signed-off-by: Tomasz Nowicki <***@semihalf.com>
Suggested-by: Lorenzo Pieralisi <***@arm.com>
---
arch/arm64/kernel/pci.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 023b983..6e77e1b 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -52,11 +52,16 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
}

/*
- * Try to assign the IRQ number from DT when adding a new device
+ * Try to assign the IRQ number when probing a new device
*/
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_alloc_irq(struct pci_dev *dev)
{
- dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+ if (acpi_disabled)
+ dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+#ifdef CONFIG_ACPI
+ else
+ return acpi_pci_irq_enable(dev);
+#endif

return 0;
}
--
1.9.1
Lorenzo Pieralisi
2016-02-17 18:20:02 UTC
[+ Duc, this needs testing on DT PCI hosts that do not call pci_fixup_irqs()]

On Tue, Feb 16, 2016 at 02:53:44PM +0100, Tomasz Nowicki wrote:

Subject is wrong, leftover from previous posting (ie you do not allocate
at device enable anymore).
Post by Tomasz Nowicki
This is the last step before enabling generic ACPI PCI host controller
for ARM64. We need to take care of legacy IRQ mapping for non-MSI(X)
You do not check MSIs anymore.
Post by Tomasz Nowicki
PCI devices. pcibios_alloc_irq() evaluation is not sensitive to
ACPI device enumeration order, so it is the best place to assign
device's IRQs for the ACPI boot method. Also, it does not hurt DT to be
initialized from the same place.
NOTE: *This is going to be a temporary solution*. There is ongoing work
which aims at cleaning legacy IRQ allocation out of arch specific code.
We can consider this patch a necessary evil which will be removed
once the cleanup series lands in mainline in the near future.
"To enable PCI legacy IRQs on platforms booting with ACPI, arch code
should include ACPI specific callbacks that parse and set-up the
device IRQ number, equivalent to the DT boot path. Owing to the current
ACPI core scan handlers implementation, ACPI PCI legacy IRQs bindings
cannot be parsed at device add time, since that would trigger ACPI scan
handlers ordering issues depending on how the ACPI tables are defined.

To solve this problem and consolidate FW PCI legacy IRQs parsing in
one single pcibios callback (pending final removal), this patch moves
DT PCI IRQ parsing to the pcibios_alloc_irq() callback (called by
PCI core code at device probe time) and adds ACPI PCI legacy IRQs
parsing to the same callback too, so that FW PCI legacy IRQs parsing
is confined in one single arch callback that can be easily removed
when code parsing PCI legacy IRQs is consolidated and moved to core
PCI code".

?
Post by Tomasz Nowicki
---
arch/arm64/kernel/pci.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 023b983..6e77e1b 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -52,11 +52,16 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
}
/*
- * Try to assign the IRQ number from DT when adding a new device
+ * Try to assign the IRQ number when probing a new device
*/
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_alloc_irq(struct pci_dev *dev)
{
- dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+ if (acpi_disabled)
+ dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+#ifdef CONFIG_ACPI
+ else
+ return acpi_pci_irq_enable(dev);
+#endif
return 0;
}
It is good this code is now in one single function, it will be removed
more quickly :D

So pending APM X-gene DT testing:

Reviewed-by: Lorenzo Pieralisi <***@arm.com>
Lorenzo Pieralisi
2016-02-18 13:00:03 UTC
Hi Bjorn, Rafael,
From the functionality point of view this series might be split into the
1. Make MMCONFIG code arch-agnostic which allows all architectures to collect
PCI config regions and use them when necessary.
2. Move non-arch specific bits to the core code.
3. Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
4. Enable above driver on ARM64
I think that apart from some pending review comments that will force
some minor patches update, the overall structure of this patchset is in
a reasonable shape, I would be grateful if you could have a look
from PCI and ACPI perspectives to see if there is some serious
rework needed and/or you want us to do things differently.

In particular, the MCFG rework (along with some PCI core changes
ie PCI ACPI bridge companion) affects x86 so we definitely need
some feedback on that code, otherwise we are stuck and can't
enable ACPI PCI support for ARM64.

Thank you very much.

Cheers,
Lorenzo
https://patchwork.ozlabs.org/patch/576450/
which can be found in pci-acpi-v5 branch.
This has been tested on Cavium ThunderX server, JunoR2, HP RX2660 IA64, x86,
Hip05, X-Gene and QEMU-aarch64. Any help in reviewing and testing is much appreciated.
v4 -> v5
- dropped MCFG refactoring group patches 1-6 from series v4 and integrated Jayachandran's patch
https://patchwork.ozlabs.org/patch/575525/
- rewrite PCI legacy IRQs allocation
- squashed two patches 11 and 12 from series v4, fixed bisection issue
- changelog improvements
- rebased to 4.5-rc3
v3 -> v4
- dropped Jiang's fix http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04318.html
- added Lorenzo's fix patch 19/24
- ACPI PCI bus domain number assigning cleanup
- changed resource management, we now claim and reassign resources
- improvements for applying quirks
- dropped Matthew's http://www.spinics.net/lists/linux-pci/msg45950.html dependency
- rebased to 4.5-rc1
v2 -> v3
- fix legacy IRQ assigning and IO ports registration
- remove reference to arch specific companion device for ia64
- move ACPI PCI host controller driver to pci_root.c
- drop generic domain assignment for x86 and ia64 as I am not
able to run all necessary test variants
- drop patch which cleaned legacy IRQ assignment since it belongs to
https://patchwork.ozlabs.org/patch/557504/
- extend MCFG quirk code
- rebased to 4.4
v1 -> v2
- moved non-arch specific piece of code to drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
v1 - https://lkml.org/lkml/2015/10/27/504
v2 - https://lkml.org/lkml/2015/12/16/246
v3 - http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04308.html
v4 - https://lkml.org/lkml/2016/2/4/646
ACPI: MCFG: Move mmcfg_list management to drivers/acpi
drivers: pci: add generic code to claim bus resources
acpi, pci, mcfg: Provide default RAW ACPI PCI config space accessors.
arm64, acpi: Use MCFG library and empty PCI config space accessors
from pci_mcfg.c file.
pci, acpi, ecam: Add flag to indicate whether ECAM region was hot
added or not.
x86, pci: Cleanup platform specific MCFG data by using ECAM hot_added
flag.
pci, acpi, x86, ia64: Move ACPI host bridge device companion
assignment to core code.
pci, acpi: Provide generic way to assign bus domain number.
x86, ia64: Include acpi_pci_{add|remove}_bus to the default
pcibios_{add|remove}_bus implementation.
acpi, mcfg: Add default PCI config accessors implementation and
initial support for related quirks.
pci, of: Move the PCI I/O space management to PCI core code.
pci, acpi: Support for ACPI based generic PCI host controller
initialization
pci, acpi: Match PCI config space accessors against platfrom specific
quirks.
arm64, pci, acpi: Assign legacy IRQs once device is enable.
arm64, pci, acpi: Start using ACPI based PCI host bridge driver for
ARM64.
arch/arm64/Kconfig | 5 +
arch/arm64/kernel/pci.c | 35 +---
arch/ia64/hp/common/sba_iommu.c | 2 +-
arch/ia64/include/asm/pci.h | 1 -
arch/ia64/pci/pci.c | 26 ---
arch/ia64/sn/kernel/io_acpi_init.c | 4 +-
arch/x86/include/asm/pci.h | 3 -
arch/x86/include/asm/pci_x86.h | 24 +--
arch/x86/pci/acpi.c | 47 +----
arch/x86/pci/common.c | 10 -
arch/x86/pci/mmconfig-shared.c | 269 ++++---------------------
arch/x86/pci/mmconfig_32.c | 1 +
arch/x86/pci/mmconfig_64.c | 1 +
arch/x86/pci/numachip.c | 1 +
drivers/acpi/Kconfig | 7 +
drivers/acpi/Makefile | 1 +
drivers/acpi/pci_mcfg.c | 392 +++++++++++++++++++++++++++++++++++++
drivers/acpi/pci_root.c | 154 ++++++++++++++-
drivers/of/address.c | 116 +----------
drivers/pci/pci.c | 126 +++++++++++-
drivers/pci/probe.c | 5 +
drivers/pci/setup-bus.c | 63 ++++++
drivers/xen/pci.c | 5 +-
include/acpi/acpi_bus.h | 1 +
include/asm-generic/vmlinux.lds.h | 7 +
include/linux/of_address.h | 9 -
include/linux/pci-acpi.h | 68 +++++++
include/linux/pci.h | 6 +
28 files changed, 892 insertions(+), 497 deletions(-)
create mode 100644 drivers/acpi/pci_mcfg.c
--
1.9.1
Sinan Kaya
2016-02-29 19:10:02 UTC
From the functionality point of view this series might be split into the
1. Make MMCONFIG code arch-agnostic which allows all architectures to collect
PCI config regions and use them when necessary.
2. Move non-arch specific bits to the core code.
3. Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
4. Enable above driver on ARM64
https://patchwork.ozlabs.org/patch/576450/
which can be found in pci-acpi-v5 branch.
This has been tested on Cavium ThunderX server, JunoR2, HP RX2660 IA64, x86,
Hip05, X-Gene and QEMU-aarch64. Any help in reviewing and testing is much appreciated.
v4 -> v5
- dropped MCFG refactoring group patches 1-6 from series v4 and integrated Jayachandran's patch
https://patchwork.ozlabs.org/patch/575525/
- rewrite PCI legacy IRQs allocation
- squashed two patches 11 and 12 from series v4, fixed bisection issue
- changelog improvements
- rebased to 4.5-rc3
v3 -> v4
- dropped Jiang's fix http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04318.html
- added Lorenzo's fix patch 19/24
- ACPI PCI bus domain number assigning cleanup
- changed resource management, we now claim and reassign resources
- improvements for applying quirks
- dropped Matthew's http://www.spinics.net/lists/linux-pci/msg45950.html dependency
- rebased to 4.5-rc1
Having tested v4 and v5, I'm seeing some resource assignment problems and address conflicts.
And problems booting QEMU.

Anybody else seeing the same?
--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Lorenzo Pieralisi
2016-03-03 11:30:03 UTC
[+ Yinghai]
Post by Sinan Kaya
From the functionality point of view this series might be split into the
1. Make MMCONFIG code arch-agnostic which allows all architectures to collect
PCI config regions and use them when necessary.
2. Move non-arch specific bits to the core code.
3. Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
4. Enable above driver on ARM64
https://patchwork.ozlabs.org/patch/576450/
which can be found in pci-acpi-v5 branch.
This has been tested on Cavium ThunderX server, JunoR2, HP RX2660 IA64, x86,
Hip05, X-Gene and QEMU-aarch64. Any help in reviewing and testing is much appreciated.
v4 -> v5
- dropped MCFG refactoring group patches 1-6 from series v4 and integrated Jayachandran's patch
https://patchwork.ozlabs.org/patch/575525/
- rewrite PCI legacy IRQs allocation
- squashed two patches 11 and 12 from series v4, fixed bisection issue
- changelog improvements
- rebased to 4.5-rc3
v3 -> v4
- dropped Jiang's fix http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04318.html
- added Lorenzo's fix patch 19/24
- ACPI PCI bus domain number assigning cleanup
- changed resource management, we now claim and reassign resources
- improvements for applying quirks
- dropped Matthew's http://www.spinics.net/lists/linux-pci/msg45950.html dependency
- rebased to 4.5-rc1
Having tested v4 and v5, I'm seeing some resource assignment problems
and address conflicts. And problems booting QEMU.
I asked Tomasz to add resource claiming code in v4 to make sure that,
if FW has left resources in a reasonable set-up, we reuse it as-is.

Now, I was and I am aware this could trigger resource allocation
issues (in particular in relation to bridge aperture sizing),
which can nonetheless be solved by forcing the kernel to reallocate
resources (pci=realloc; that's exactly what it's there for: release
the bridge apertures, resize the buses downstream and reassign
the respective hierarchy).

I am not entirely aware of how consistently pci=realloc was used on
x86, what I am aware of is the panoply of pci=* command line parameters
defined for x86 and I would certainly avoid that.

The decision on whether we claim resources before reassigning them
is either dictated by the boot method (ie ACPI->claim resources by
default) or we should control it via a FW option or a command
line option. The PCI standard (PCI FW revision 3.1, sec 3.5 "Device State
at Firmware/Operating System Handoff"), IIUC, does not strictly mandate
that FW configure the whole PCI hierarchy (and to be 100% compliant
we should check the device IO/MEM enable bits before claiming, as x86 does
- see pcibios_allocate_dev_resources() in arch/x86/pci/i386.c).
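As a rough illustration of the enable-bit check mentioned above, here is a stand-alone sketch loosely modeled on the x86 logic. The function name bar_decoding_enabled is hypothetical; the constant values are the standard PCI command register bits and Linux resource type flags, redefined locally so the snippet builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI command register bits and Linux resource type flags. */
#define PCI_COMMAND_IO     0x1
#define PCI_COMMAND_MEMORY 0x2
#define IORESOURCE_IO      0x00000100
#define IORESOURCE_MEM     0x00000200

/*
 * Decide whether a BAR should be claimed as firmware left it: only when
 * the BAR has a type and decoding of that type is enabled in the
 * command register.
 */
static bool bar_decoding_enabled(uint16_t command, unsigned long flags)
{
	if (flags & IORESOURCE_IO)
		return (command & PCI_COMMAND_IO) != 0;
	if (flags & IORESOURCE_MEM)
		return (command & PCI_COMMAND_MEMORY) != 0;
	return false; /* untyped (unused) BAR: nothing to claim */
}
```

A claiming pass could skip BARs for which this returns false and leave them to the normal assignment code. That is a simplification of what arch/x86 does, not a proposal for the exact implementation.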

x86 and IA64 claim PCI resources on boot and live with that (well, minus
the gazillions x86 pci= parameters that change the PCI resources assignment
one way or another), comments very welcome in particular on the pci=realloc
option and its usage.

What's certain is, if we do not claim resources by default we will *never*
be able to do it, it will certainly trigger regressions.

Lorenzo
Sinan Kaya
2016-03-03 14:30:03 UTC
Permalink
Post by Lorenzo Pieralisi
x86 and IA64 claim PCI resources on boot and live with that (well, minus
the gazillions x86 pci= parameters that change the PCI resources assignment
one way or another), comments very welcome in particular on the pci=realloc
option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.

The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to claim bus resources
is working just fine and is ready to go upstream in my opinion. It passed my internal
testing with different types of endpoints.

The inclusion of this patch is now requiring everybody to add pci=realloc argument
otherwise the resources assigned by the UEFI BIOS are not working.

I think there is still some work to be done in this patch and is too early to be included
into the series. It is blocking progress of the series which is sitting on review over 1
year already.

[ 0.752916] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x80360800000-0x8037fffffff 64bit pref] (contains BAR2 for 63 VFs)
[ 0.771799] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[ 0.777054] pci 0000:00:00.0: root [mem 0x80100100000-0x8013fffffff window] res [mem 0x8013ff00000-0x8013fffffff] nr 14
[ 0.787846] pci 0000:00:00.0: pci_claim_bridge_resource:714 1: i:14
[ 0.794135] pci 0000:00:00.0: root [mem 0x80300000000-0x8037fffffff window] res [mem 0x80360000000-0x8037fffffff 64bit pref] nr 15
[ 0.805881] pci 0000:00:00.0: pci_claim_bridge_resource:714 1: i:15
[ 0.812155] pci 0000:01:00.0: root [mem 0x8013ff00000-0x8013fffffff] res [mem 0x8013ff00000-0x8013fffffff 64bit] nr 0
[ 0.822773] pci 0000:01:00.0: root [mem 0x80360000000-0x8037fffffff 64bit pref] res [mem 0x80360000000-0x803607fffff 64bit pref] nr 2
[ 0.834778] pci 0000:01:00.0: root [mem 0x80360000000-0x8037fffffff 64bit pref] res [mem 0x8037ff00000-0x8037fffffff pref] nr 6
[ 0.846265] pci 0000:01:00.0: can't claim BAR 9 [mem 0x80360800000-0x8037fffffff 64bit pref]: address conflict with 0000:01:00.0 [mem 0x8037ff00000-0x8037fffffff pref]
[ 0.861237] pci 0000:01:00.0: BAR 9: no space for [mem size 0x1f800000 64bit pref]
[ 0.868811] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x1f800000 64bit pref]
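
The conflict reported in the log above can be reproduced with plain interval arithmetic. The following is a small sketch, not kernel code: struct range and both helpers are hypothetical names, and the addresses are copied from the log lines for BAR 9, resource nr 6 and resource nr 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed in the boot log: [start-end]. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Size in bytes of an inclusive range. */
uint64_t range_size(struct range r)
{
    assert(r.end >= r.start);
    return r.end - r.start + 1;
}

/* Two inclusive ranges conflict when each starts at or before the other's end. */
bool ranges_overlap(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Values copied from the log above. */
static const struct range bar9 = { 0x80360800000ULL, 0x8037fffffffULL }; /* VF BAR2 space */
static const struct range res6 = { 0x8037ff00000ULL, 0x8037fffffffULL }; /* already claimed, nr 6 */
static const struct range res2 = { 0x80360000000ULL, 0x803607fffffULL }; /* already claimed, nr 2 */
```

With these values, bar9 overlaps res6 (the last 1 MB of the window) but not res2, and range_size(bar9) is 0x1f800000, which matches the "mem size 0x1f800000" in the two "BAR 9" failure lines.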


I keep saying this but the type of CPU is not important when it comes to PCIe. Both PCIe and
ACPI are governed by specs. If it is working for x86 and IA64, it needs to work for ARM64 as
well.

Even ARM64 has the luxury to omit the old BIOS behaviors. Most ARM64 systems use tianocore based
UEFI BIOS.

This is pointing to an implementation problem in arm64 adaptation. Need to figure out
what is different.
--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Lorenzo Pieralisi
2016-03-04 11:00:02 UTC
Permalink
Post by Sinan Kaya
Post by Lorenzo Pieralisi
x86 and IA64 claim PCI resources on boot and live with that (well, minus
the gazillions x86 pci= parameters that change the PCI resources assignment
one way or another), comments very welcome in particular on the pci=realloc
option and its usage.
I have been working with Linux PCIe over 3 years. I never used
pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to
claim bus resources is working just fine and is ready to go upstream
in my opinion. It passed my internal testing with different types of
endpoints.
The inclusion of this patch is now requiring everybody to add
pci=realloc argument otherwise the resources assigned by the UEFI BIOS
are not working.
I think there is still some work to be done in this patch and is too
early to be included into the series. It is blocking progress of the
series which is sitting on review over 1 year already.
First off, I think that's specious, patch 11 is not blocking anything,
if you and Tomasz want to drop it go ahead and take responsibility
of the consequences.

I am not saying patch 11 is perfect, it is there to review, if you
spot bugs point them out.

If you are interested and willing to make an effort to understand why I
asked Tomasz to integrate it, a bit of background here:

http://permalink.gmane.org/gmane.linux.kernel.pci/44830

If we want to drop patch 11, we are going to discard whatever FW
set-up at FW/OS hand-off and reassign everything. Want to do it ?
Go ahead.

I wrote it in my previous email, probably it was not clear, so, here we
go again.

If we want to at least consider the FW PCI configuration at FW/OS
handoff, we should read the PCI bridge apertures and claim them, when
that fails reassign the corresponding PCI bus hierarchy (which means
releasing the bridge resources and downstream devices and reassign
them), that's what pci=realloc does.
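
The "claim first, reassign what fails" policy can be sketched as a toy two-pass model. This is a simplified userspace illustration under stated assumptions, not patch 11 and not the pci=realloc implementation: all names are hypothetical, there is a single fixed window, and the window is never resized (real reallocation also releases and resizes bridge apertures).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct res {
    uint64_t start, end;   /* inclusive; firmware-programmed values on input */
    bool claimed;
};

static bool overlaps(const struct res *a, const struct res *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Claim r if it lies inside the window and clashes with nothing already claimed. */
static bool try_claim(const struct res *win, struct res *all, size_t n, struct res *r)
{
    if (r->start < win->start || r->end > win->end)
        return false;
    for (size_t i = 0; i < n; i++)
        if (all[i].claimed && &all[i] != r && overlaps(&all[i], r))
            return false;
    r->claimed = true;
    return true;
}

/* First fit: slide a size-aligned candidate through the window. */
static bool reassign(const struct res *win, struct res *all, size_t n, struct res *r)
{
    uint64_t size = r->end - r->start + 1;
    uint64_t base = (win->start + size - 1) / size * size;

    for (; base + size - 1 <= win->end; base += size) {
        struct res cand = { base, base + size - 1, false };
        if (try_claim(win, all, n, &cand)) {
            *r = cand;      /* cand.claimed was set by try_claim() */
            return true;
        }
    }
    return false;
}

/* Pass 1 keeps the firmware set-up where possible, pass 2 moves the rest. */
size_t survey(const struct res *win, struct res *all, size_t n)
{
    size_t failed = 0;

    for (size_t i = 0; i < n; i++)
        try_claim(win, all, n, &all[i]);
    for (size_t i = 0; i < n; i++)
        if (!all[i].claimed && !reassign(win, all, n, &all[i]))
            failed++;
    return failed;
}
```

For example, with a window of [0x1000-0x1fff] and two firmware ranges [0x1000-0x17ff] and [0x1400-0x17ff], the first is kept as-is and the second, which conflicts with it, is moved to [0x1800-0x1bff]. The point of the ordering is that a resource is only moved when claiming its firmware address fails.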

I think that it is a command line option since it has to be a choice,
ie overriding FW set-up should be an option, not a default.

Patch 11 does what x86 does in arch code arch/x86/pci/i386.c,

pcibios_resource_survey()

and that works for them (of course, minus quirks that do exist).

I could integrate the code implementing pci=realloc in patch 11 so
that we realloc by default all resources claimed that failed (which
means that bridges are resized accordingly and you won't be forced
to use pci=realloc on command line).
Post by Sinan Kaya
[ 0.752916] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x80360800000-0x8037fffffff 64bit pref] (contains BAR2 for 63 VFs)
[ 0.771799] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[ 0.777054] pci 0000:00:00.0: root [mem 0x80100100000-0x8013fffffff window] res [mem 0x8013ff00000-0x8013fffffff] nr 14
[ 0.787846] pci 0000:00:00.0: pci_claim_bridge_resource:714 1: i:14
[ 0.794135] pci 0000:00:00.0: root [mem 0x80300000000-0x8037fffffff window] res [mem 0x80360000000-0x8037fffffff 64bit pref] nr 15
[ 0.805881] pci 0000:00:00.0: pci_claim_bridge_resource:714 1: i:15
[ 0.812155] pci 0000:01:00.0: root [mem 0x8013ff00000-0x8013fffffff] res [mem 0x8013ff00000-0x8013fffffff 64bit] nr 0
[ 0.822773] pci 0000:01:00.0: root [mem 0x80360000000-0x8037fffffff 64bit pref] res [mem 0x80360000000-0x803607fffff 64bit pref] nr 2
[ 0.834778] pci 0000:01:00.0: root [mem 0x80360000000-0x8037fffffff 64bit pref] res [mem 0x8037ff00000-0x8037fffffff pref] nr 6
[ 0.846265] pci 0000:01:00.0: can't claim BAR 9 [mem 0x80360800000-0x8037fffffff 64bit pref]: address conflict with 0000:01:00.0 [mem 0x8037ff00000-0x8037fffffff pref]
[ 0.861237] pci 0000:01:00.0: BAR 9: no space for [mem size 0x1f800000 64bit pref]
[ 0.868811] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x1f800000 64bit pref]
I keep saying this but the type of CPU is not important when it comes
to PCIe. Both PCIe and ACPI are governed by specs. If it is working
for x86 and i64; it needs to work for ARM64 as well.
That's theory. In practice there is massive legacy there and PCI resource
assignment is carried out in an arch specific way (otherwise there would
be no pci claiming/assignment code in arch/* right ?) and the resource
claiming/assignment strictly depends on FW set-up, like it or lump it,
that's the way it *currently* is.

I wrote in my previous email, the status of PCI resources at OS/FW
handoff is not strictly mandated by the PCI standard AFAIK (it is
covered by 3.5 "Device state at Firmware/Operating System Handoff" in
the PCI FW spec revision 3.1), so what I suggest above is the only option
we have (or you just discard FW configuration altogether, that's what
happens if all PCI resources are reassigned, it is a choice to be made
and it is neither correct nor wrong, I wish it would).
Post by Sinan Kaya
Even ARM64 has the luxury to omit the old BIOS behaviors. Most ARM64
systems use tianocore based UEFI BIOS.
This is pointing to an implementation problem in arm64 adaptation.
Need to figure out what is different.
Look no further, firmware is different. How do we want to proceed ?

Lorenzo
Tomasz Nowicki
2016-03-04 12:10:02 UTC
Permalink
Post by Lorenzo Pieralisi
Post by Sinan Kaya
Post by Lorenzo Pieralisi
x86 and IA64 claim PCI resources on boot and live with that (well, minus
the gazillions x86 pci= parameters that change the PCI resources assignment
one way or another), comments very welcome in particular on the pci=realloc
option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to
claim bus resources is working just fine and is ready to go upstream
in my opinion. It passed my internal testing with different types of
endpoints.
The inclusion of this patch is now requiring everybody to add
pci=realloc argument otherwise the resources assigned by the UEFI BIOS
are not working.
I think there is still some work to be done in this patch and is too
early to be included into the series. It is blocking progress of the
series which is sitting on review over 1 year already.
First off, I think that's specious, patch 11 is not blocking anything,
if you and Tomasz want to drop it go ahead and take responsibility
of the consequences.
I am not saying patch 11 is perfect, it is there to review, if you
spot bugs point them out.
If you are interested and willing to make an effort to understand why I
asked Tomasz to integrate it, a bit of background here:
http://permalink.gmane.org/gmane.linux.kernel.pci/44830
If we want to drop patch 11, we are going to discard whatever FW
set-up at FW/OS hand-off and reassign everything. Want to do it ?
Go ahead.
I wrote it in my previous email, probably it was not clear, so, here we
go again.
If we want to at least consider the FW PCI configuration at FW/OS
handoff, we should read the PCI bridge apertures and claim them, when
that fails reassign the corresponding PCI bus hierarchy (which means
releasing the bridge resources and downstream devices and reassign
them), that's what pci=realloc does.
I think that it is a command line option since it has to be a choice,
ie overriding FW set-up should be an option, not a default.
Patch 11 does what x86 does in arch code arch/x86/pci/i386.c,
pcibios_resource_survey()
and that works for them (of course, minus quirks that do exist).
I could integrate the code implementing pci=realloc in patch 11 so
that we realloc by default all resources claimed that failed (which
means that bridges are resized accordingly and you won't be forced
to use pci=realloc on command line).
I agree with Lorenzo. Just because v3 works it does not mean we want to
go this way. Also, I think we should realloc all resources claimed that
failed, w/o need to use pci=realloc on command line.

Tomasz
Sinan Kaya
2016-03-04 15:00:02 UTC
Permalink
Posting on top of Tomasz's email...
Post by Lorenzo Pieralisi
Post by Sinan Kaya
Post by Lorenzo Pieralisi
x86 and IA64 claim PCI resources on boot and live with that (well, minus
the gazillions x86 pci= parameters that change the PCI resources assignment
one way or another), comments very welcome in particular on the pci=realloc
option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to
claim bus resources is working just fine and is ready to go upstream
in my opinion. It passed my internal testing with different types of
endpoints.
The inclusion of this patch is now requiring everybody to add
pci=realloc argument otherwise the resources assigned by the UEFI BIOS
are not working.
I think there is still some work to be done in this patch and is too
early to be included into the series. It is blocking progress of the
series which is sitting on review over 1 year already.
First off, I think that's specious, patch 11 is not blocking anything,
if you and Tomasz want to drop it go ahead and take responsibility
of the consequences.
I am not saying patch 11 is perfect, it is there to review, if you
spot bugs point them out.
If you are interested and willing to make an effort to understand why I
asked Tomasz to integrate it, a bit of background here:
http://permalink.gmane.org/gmane.linux.kernel.pci/44830
If we want to drop patch 11, we are going to discard whatever FW
set-up at FW/OS hand-off and reassign everything. Want to do it ?
Go ahead.
Yes, we should ideally reuse the BAR addresses assigned by FW. And, it is not
working as I said. It happens to work on Intel architectures. I'm saying that
this patch needs some more work not that it is right or wrong.

The question is how long it takes to figure this out. It could have been dealt
with separately. If you can come up with a solution in the near future,
I'm ready to test.

There was some big push on v3 to get it tested by multiple vendors. I was under
the impression that we are trying to get some version accepted.

Since I'm the only one complaining right now, I guess nobody else is testing
v4 and v5.
Post by Lorenzo Pieralisi
I wrote it in my previous email, probably it was not clear, so, here we
go again.
If we want to at least consider the FW PCI configuration at FW/OS
handoff, we should read the PCI bridge apertures and claim them, when
that fails reassign the corresponding PCI bus hierarchy (which means
releasing the bridge resources and downstream devices and reassign
them), that's what pci=realloc does.
I think that it is a command line option since it has to be a choice,
ie overriding FW set-up should be an option, not a default.
Patch 11 does what x86 does in arch code arch/x86/pci/i386.c,
pcibios_resource_survey()
and that works for them (of course, minus quirks that do exist).
I could integrate the code implementing pci=realloc in patch 11 so
that we realloc by default all resources claimed that failed (which
means that bridges are resized accordingly and you won't be forced
to use pci=realloc on command line).
I agree with Lorenzo. Just because v3 works it does not mean we want to go this way. Also, I think we should realloc all resources claimed that failed, w/o need to use pci=realloc on command line.
Let's give this a try. I have seen the kernel messages with and without realloc option
too. I don't want to see any kind of error messages if it is actually working.

I don't want to get a support request that PCIe is broken even though it is not just
because of some error message in boot log that eventually got corrected by the
architecture.
Tomasz
--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project