Discussion:
[RFC v2 0/8] Support for Tegra 2D hardware
Terje Bergstrom
2012-11-26 13:19:06 UTC
This set of patches adds support for Tegra20 and Tegra30 host1x and
2D. It is based on Thierry Reding's tegra/next branch. We still have
unresolved issues, so I don't expect these patches to be merged yet;
they are meant to get the code out for public review.

The first version was sent only to linux-***@vger.kernel.org. The
second version has the following changes:
* nvhost split into smaller pieces
* "graphics host" and "grhost" renamed to host1x
* patch to tegradrm exit is removed
* nvhost mem_mgr usage dropped
* public power management API prefix changed to host1x
* fixed some other code style issues

nvhost is the driver that controls host1x hardware. It supports
host1x command channels, synchronization, run-time power management
and memory management. It is split into a logical driver under
drivers/video/tegra/host and a physical driver under
drivers/video/tegra/host/host1x. The physical driver is compiled
against the hardware headers of the particular host1x version.
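
To make the split concrete: the logical driver reaches the hardware
only through per-chip function tables that the physical driver installs
at init time. Condensed from chip_support.h and host1x01.c later in
this series (comments added here):

struct nvhost_chip_support {
	const char *soc_name;
	struct nvhost_syncpt_ops syncpt;	/* sync point accessors */
	struct nvhost_intr_ops intr;		/* sync point interrupt ops */
};

/* accessors used by the logical driver */
#define syncpt_op() (nvhost_get_chip_ops()->syncpt)
#define intr_op() (nvhost_get_chip_ops()->intr)

/* nvhost_init_host1x01_support() installs the host1x01 implementations:
 *	op->syncpt = host1x_syncpt_ops;
 *	op->intr = host1x_intr_ops;
 */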

The hardware units are described (briefly) in the Tegra2 TRM.

The patch set removes the responsibility for host1x from tegradrm. At
the same time, it moves all DRM-related infrastructure from
drivers/gpu/drm/tegra/host1x.c to other files.

The patch set adds a 2D driver to tegradrm, which uses nvhost to
communicate with host1x for access to sync points and channels. We
expect to use the same infrastructure for other host1x clients, so
we have kept nvhost and tegradrm separate.
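
To illustrate the interface, here is a minimal sketch of how a client
could wait for a sync point through the API exported in
include/linux/nvhost.h. host1x_syncpt_wait() and host1x_syncpt_read()
are from this series; the helper name, the sync point id and the
assumption that the timeout is in milliseconds are mine:

#include <linux/nvhost.h>
#include <linux/platform_device.h>

/* Hypothetical client helper, for illustration only. */
static int example_wait_for_fence(struct platform_device *pdev,
				  u32 syncpt_id, u32 fence)
{
	u32 value;
	int err;

	/* block until the sync point reaches the fence value or time out */
	err = host1x_syncpt_wait(syncpt_id, fence, 2000, &value);
	if (err < 0)
		dev_err(&pdev->dev, "syncpt wait failed: %d (now at %u)\n",
			err, host1x_syncpt_read(syncpt_id));

	return err;
}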

The patch set also adds a user space API to tegradrm for accessing
host1x and 2D. We are also preparing patches to libdrm, but they are
not yet in a condition to be sent out.

TODO:
* tegradrm IOMMU support has been disabled
* Prime support is still tegradrm specific - to be generalized when
we find a way for the DMA mapping API to handle double mapping
* Find a home for the data in auxdata if it proves troublesome
* nvhost is still called nvhost - there's a request to rename it to
host1x to reflect the hardware name
* Write an article about host1x and how we use it
* Send out 2D user space code
* Rebase on linux-next

Arto Merilainen (2):
gpu: drm: tegra: Remove redundant host1x
gpu: drm: tegra: Prime support

Terje Bergstrom (6):
video: tegra: Add nvhost driver
video: tegra: Add syncpoint wait and interrupts
video: tegra: host: Add channel and client support
video: tegra: Add debug support
ARM: tegra: Add auxiliary data for nvhost
drm: tegra: Add gr2d device

arch/arm/mach-tegra/board-dt-tegra20.c | 38 +-
arch/arm/mach-tegra/board-dt-tegra30.c | 38 +-
arch/arm/mach-tegra/tegra20_clocks_data.c | 8 +-
arch/arm/mach-tegra/tegra30_clocks_data.c | 2 +
drivers/gpu/drm/tegra/Kconfig | 8 +-
drivers/gpu/drm/tegra/Makefile | 4 +-
drivers/gpu/drm/tegra/dc.c | 22 +-
drivers/gpu/drm/tegra/dmabuf.c | 150 ++++++
drivers/gpu/drm/tegra/drm.c | 445 ++++++++++++++++--
drivers/gpu/drm/tegra/drm.h | 89 ++--
drivers/gpu/drm/tegra/dsi.c | 24 +-
drivers/gpu/drm/tegra/fb.c | 26 +-
drivers/gpu/drm/tegra/gr2d.c | 224 +++++++++
drivers/gpu/drm/tegra/hdmi.c | 24 +-
drivers/gpu/drm/tegra/host1x.c | 343 --------------
drivers/gpu/drm/tegra/tvo.c | 33 +-
drivers/video/Kconfig | 2 +
drivers/video/Makefile | 2 +
drivers/video/tegra/host/Kconfig | 5 +
drivers/video/tegra/host/Makefile | 18 +
drivers/video/tegra/host/bus_client.c | 97 ++++
drivers/video/tegra/host/chip_support.c | 48 ++
drivers/video/tegra/host/chip_support.h | 149 ++++++
drivers/video/tegra/host/debug.c | 252 ++++++++++
drivers/video/tegra/host/debug.h | 50 ++
drivers/video/tegra/host/dev.c | 170 +++++++
drivers/video/tegra/host/dev.h | 33 ++
drivers/video/tegra/host/dmabuf.c | 151 ++++++
drivers/video/tegra/host/dmabuf.h | 45 ++
drivers/video/tegra/host/host1x/Makefile | 7 +
drivers/video/tegra/host/host1x/host1x.c | 257 +++++++++++
drivers/video/tegra/host/host1x/host1x.h | 86 ++++
drivers/video/tegra/host/host1x/host1x01.c | 70 +++
drivers/video/tegra/host/host1x/host1x01.h | 29 ++
.../video/tegra/host/host1x/host1x01_hardware.h | 157 +++++++
drivers/video/tegra/host/host1x/host1x_cdma.c | 486 ++++++++++++++++++++
drivers/video/tegra/host/host1x/host1x_cdma.h | 39 ++
drivers/video/tegra/host/host1x/host1x_channel.c | 150 ++++++
drivers/video/tegra/host/host1x/host1x_debug.c | 405 ++++++++++++++++
drivers/video/tegra/host/host1x/host1x_intr.c | 263 +++++++++++
drivers/video/tegra/host/host1x/host1x_syncpt.c | 168 +++++++
.../video/tegra/host/host1x/hw_host1x01_channel.h | 182 ++++++++
drivers/video/tegra/host/host1x/hw_host1x01_sync.h | 398 ++++++++++++++++
.../video/tegra/host/host1x/hw_host1x01_uclass.h | 474 +++++++++++++++++++
drivers/video/tegra/host/nvhost_acm.c | 481 +++++++++++++++++++
drivers/video/tegra/host/nvhost_acm.h | 45 ++
drivers/video/tegra/host/nvhost_cdma.c | 430 +++++++++++++++++
drivers/video/tegra/host/nvhost_cdma.h | 109 +++++
drivers/video/tegra/host/nvhost_channel.c | 126 +++++
drivers/video/tegra/host/nvhost_channel.h | 65 +++
drivers/video/tegra/host/nvhost_intr.c | 384 ++++++++++++++++
drivers/video/tegra/host/nvhost_intr.h | 110 +++++
drivers/video/tegra/host/nvhost_job.c | 390 ++++++++++++++++
drivers/video/tegra/host/nvhost_memmgr.c | 160 +++++++
drivers/video/tegra/host/nvhost_memmgr.h | 65 +++
drivers/video/tegra/host/nvhost_syncpt.c | 452 ++++++++++++++++++
drivers/video/tegra/host/nvhost_syncpt.h | 148 ++++++
include/drm/tegra_drm.h | 129 ++++++
include/linux/nvhost.h | 294 ++++++++++++
59 files changed, 8549 insertions(+), 510 deletions(-)
create mode 100644 drivers/gpu/drm/tegra/dmabuf.c
create mode 100644 drivers/gpu/drm/tegra/gr2d.c
delete mode 100644 drivers/gpu/drm/tegra/host1x.c
create mode 100644 drivers/video/tegra/host/Kconfig
create mode 100644 drivers/video/tegra/host/Makefile
create mode 100644 drivers/video/tegra/host/bus_client.c
create mode 100644 drivers/video/tegra/host/chip_support.c
create mode 100644 drivers/video/tegra/host/chip_support.h
create mode 100644 drivers/video/tegra/host/debug.c
create mode 100644 drivers/video/tegra/host/debug.h
create mode 100644 drivers/video/tegra/host/dev.c
create mode 100644 drivers/video/tegra/host/dev.h
create mode 100644 drivers/video/tegra/host/dmabuf.c
create mode 100644 drivers/video/tegra/host/dmabuf.h
create mode 100644 drivers/video/tegra/host/host1x/Makefile
create mode 100644 drivers/video/tegra/host/host1x/host1x.c
create mode 100644 drivers/video/tegra/host/host1x/host1x.h
create mode 100644 drivers/video/tegra/host/host1x/host1x01.c
create mode 100644 drivers/video/tegra/host/host1x/host1x01.h
create mode 100644 drivers/video/tegra/host/host1x/host1x01_hardware.h
create mode 100644 drivers/video/tegra/host/host1x/host1x_cdma.c
create mode 100644 drivers/video/tegra/host/host1x/host1x_cdma.h
create mode 100644 drivers/video/tegra/host/host1x/host1x_channel.c
create mode 100644 drivers/video/tegra/host/host1x/host1x_debug.c
create mode 100644 drivers/video/tegra/host/host1x/host1x_intr.c
create mode 100644 drivers/video/tegra/host/host1x/host1x_syncpt.c
create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_channel.h
create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_sync.h
create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_uclass.h
create mode 100644 drivers/video/tegra/host/nvhost_acm.c
create mode 100644 drivers/video/tegra/host/nvhost_acm.h
create mode 100644 drivers/video/tegra/host/nvhost_cdma.c
create mode 100644 drivers/video/tegra/host/nvhost_cdma.h
create mode 100644 drivers/video/tegra/host/nvhost_channel.c
create mode 100644 drivers/video/tegra/host/nvhost_channel.h
create mode 100644 drivers/video/tegra/host/nvhost_intr.c
create mode 100644 drivers/video/tegra/host/nvhost_intr.h
create mode 100644 drivers/video/tegra/host/nvhost_job.c
create mode 100644 drivers/video/tegra/host/nvhost_memmgr.c
create mode 100644 drivers/video/tegra/host/nvhost_memmgr.h
create mode 100644 drivers/video/tegra/host/nvhost_syncpt.c
create mode 100644 drivers/video/tegra/host/nvhost_syncpt.h
create mode 100644 include/drm/tegra_drm.h
create mode 100644 include/linux/nvhost.h

--
1.7.9.5
Terje Bergstrom
2012-11-26 13:19:12 UTC
From: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>

This patch removes the redundant host1x driver from tegradrm and
makes necessary bindings to the separate host driver.

This modification introduces a regression: Because there is no
general framework for attaching separate devices into the
same address space, this patch removes the ability to use IOMMU
in tegradrm.

Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>
Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
---
drivers/gpu/drm/tegra/Kconfig | 8 +-
drivers/gpu/drm/tegra/Makefile | 2 +-
drivers/gpu/drm/tegra/dc.c | 22 +--
drivers/gpu/drm/tegra/drm.c | 207 +++++++++++++++++++-----
drivers/gpu/drm/tegra/drm.h | 55 ++-----
drivers/gpu/drm/tegra/dsi.c | 24 ++-
drivers/gpu/drm/tegra/fb.c | 26 ++-
drivers/gpu/drm/tegra/hdmi.c | 24 ++-
drivers/gpu/drm/tegra/host1x.c | 343 ----------------------------------------
drivers/gpu/drm/tegra/tvo.c | 33 ++--
10 files changed, 246 insertions(+), 498 deletions(-)
delete mode 100644 drivers/gpu/drm/tegra/host1x.c

diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
index affd741..4a0290e 100644
--- a/drivers/gpu/drm/tegra/Kconfig
+++ b/drivers/gpu/drm/tegra/Kconfig
@@ -1,6 +1,6 @@
config DRM_TEGRA
tristate "NVIDIA Tegra DRM"
- depends on DRM && OF && ARCH_TEGRA
+ depends on DRM && OF && ARCH_TEGRA && TEGRA_HOST1X
select DRM_KMS_HELPER
select DRM_GEM_CMA_HELPER
select DRM_KMS_CMA_HELPER
@@ -20,10 +20,4 @@ config DRM_TEGRA_DEBUG
help
Say yes here to enable debugging support.

-config DRM_TEGRA_IOMMU
- bool "NVIDIA Tegra DRM IOMMU support"
- help
- Say yes here to enable the use of the IOMMU to allocate and
- map memory buffers.
-
endif
diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index e6e96af..57a334d 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -1,7 +1,7 @@
ccflags-y := -Iinclude/drm
ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG

-tegra-drm-y := drm.o fb.o dc.o host1x.o
+tegra-drm-y := drm.o fb.o dc.o
tegra-drm-y += output.o rgb.o hdmi.o tvo.o dsi.o
tegra-drm-y += plane.o

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 3a16e93..1779008 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -12,6 +12,7 @@
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
+#include <linux/nvhost.h>

#include <mach/clk.h>

@@ -673,10 +674,10 @@ static int tegra_dc_debugfs_exit(struct tegra_dc *dc)
return 0;
}

-static int tegra_dc_drm_init(struct host1x_client *client,
+static int tegra_dc_drm_init(struct tegra_drm_client *client,
struct drm_device *drm)
{
- struct tegra_dc *dc = host1x_client_to_dc(client);
+ struct tegra_dc *dc = tegra_drm_client_to_dc(client);
int err;

dc->pipe = drm->mode_config.num_crtc;
@@ -712,9 +713,9 @@ static int tegra_dc_drm_init(struct host1x_client *client,
return 0;
}

-static int tegra_dc_drm_exit(struct host1x_client *client)
+static int tegra_dc_drm_exit(struct tegra_drm_client *client)
{
- struct tegra_dc *dc = host1x_client_to_dc(client);
+ struct tegra_dc *dc = tegra_drm_client_to_dc(client);
int err;

devm_free_irq(dc->dev, dc->irq, dc);
@@ -734,14 +735,13 @@ static int tegra_dc_drm_exit(struct host1x_client *client)
return 0;
}

-static const struct host1x_client_ops dc_client_ops = {
+static const struct tegra_drm_client_ops dc_client_ops = {
.drm_init = tegra_dc_drm_init,
.drm_exit = tegra_dc_drm_exit,
};

static int tegra_dc_probe(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct resource *regs;
struct tegra_dc *dc;
int err;
@@ -791,13 +791,14 @@ static int tegra_dc_probe(struct platform_device *pdev)
return err;
}

- err = host1x_register_client(host1x, &dc->client);
+ err = tegra_drm_register_client(&dc->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
err);
return err;
}

+ host1x_busy(pdev);
platform_set_drvdata(pdev, dc);

return 0;
@@ -805,13 +806,12 @@ static int tegra_dc_probe(struct platform_device *pdev)

static int tegra_dc_remove(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_dc *dc = platform_get_drvdata(pdev);
int err;

- err = host1x_unregister_client(host1x, &dc->client);
+ err = tegra_drm_unregister_client(&dc->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to unregister tegra_drm client: %d\n",
err);
return err;
}
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 4a306c2..cba2d1d 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -24,44 +24,137 @@
#define DRIVER_MINOR 0
#define DRIVER_PATCHLEVEL 0

-#ifdef CONFIG_DRM_TEGRA_IOMMU
-#define TEGRA_DRM_IOMMU_BASE_ADDR 0x20000000
-#define TEGRA_DRM_IOMMU_SIZE 0x10000000
-#endif
+static LIST_HEAD(tegra_drm_subdrv_list);
+static LIST_HEAD(tegra_drm_subdrv_required);

-static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
+struct tegra_drm_client_entry {
+ struct device_node *np;
+ struct list_head list;
+};
+
+static int tegra_drm_add_client(struct device_node *np)
+{
+ struct tegra_drm_client_entry *client;
+
+ client = kzalloc(sizeof(*client), GFP_KERNEL);
+ if (!client)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&client->list);
+ client->np = of_node_get(np);
+
+ list_add_tail(&client->list, &tegra_drm_subdrv_required);
+
+ return 0;
+}
+
+static int tegra_drm_parse_dt(void)
{
- struct device *dev = drm->dev;
- struct host1x *host1x;
+ static const char * const compat[] = {
+ "nvidia,tegra20-dc",
+ "nvidia,tegra20-hdmi",
+ "nvidia,tegra20-tvo",
+ "nvidia,tegra20-dsi",
+ "nvidia,tegra30-dc",
+ "nvidia,tegra30-hdmi",
+ "nvidia,tegra30-tvo",
+ "nvidia,tegra30-dsi"
+ };
+ unsigned int i;
int err;
+ struct device *dev;

- host1x = dev_get_drvdata(dev);
- drm->dev_private = host1x;
- host1x->drm = drm;
+ /* host1x is parent of all devices */
+ dev = bus_find_device_by_name(&platform_bus_type, NULL, "host1x");
+ if (!dev)
+ return -ENODEV;

- drm_mode_config_init(drm);
+ /* find devices that are available and add them into the 'required'
+ * list */
+ for (i = 0; i < ARRAY_SIZE(compat); i++) {
+ struct device_node *np;

- err = host1x_drm_init(host1x, drm);
- if (err < 0)
- return err;
+ for_each_child_of_node(dev->of_node, np) {
+ if (of_device_is_compatible(np, compat[i]) &&
+ of_device_is_available(np)) {
+ err = tegra_drm_add_client(np);
+ if (err < 0)
+ return err;
+ }
+ }
+ }

-#ifdef CONFIG_DRM_TEGRA_IOMMU
- host1x->dim = arm_iommu_create_mapping(&platform_bus_type,
- TEGRA_DRM_IOMMU_BASE_ADDR,
- TEGRA_DRM_IOMMU_SIZE, 0);
- if (IS_ERR_OR_NULL(host1x->dim)) {
- dev_err(dev, "%s: Create iommu mapping failed: %ld\n", __func__,
- PTR_ERR(host1x->dim));
- return PTR_ERR(host1x->dim);
+ return 0;
+}
+
+int tegra_drm_register_client(struct tegra_drm_client *client)
+{
+ struct tegra_drm_client_entry *drm, *tmp;
+ int err;
+
+ list_add_tail(&client->list, &tegra_drm_subdrv_list);
+
+ /* remove this device from 'required' list */
+ list_for_each_entry_safe(drm, tmp, &tegra_drm_subdrv_required, list)
+ if (drm->np == client->dev->of_node)
+ list_del(&drm->list);
+
+ /* if all required devices are found, register drm device */
+ if (list_empty(&tegra_drm_subdrv_required)) {
+ struct platform_device *pdev = to_platform_device(client->dev);
+
+ err = drm_platform_init(&tegra_drm_driver, pdev);
+ if (err < 0) {
+ dev_err(client->dev, "drm_platform_init(): %d\n", err);
+ return err;
+ }
}

- err = arm_iommu_attach_device(drm->dev, host1x->dim);
- if (err < 0) {
- dev_err(dev, "%s: Attach iommu device failed: %d\n", __func__,
- err);
- return err;
+ return 0;
+}
+
+int tegra_drm_unregister_client(struct tegra_drm_client *client)
+{
+ list_for_each_entry(client, &tegra_drm_subdrv_list, list) {
+
+ struct platform_device *pdev = to_platform_device(client->dev);
+
+ if (client->ops && client->ops->drm_exit) {
+ int err = client->ops->drm_exit(client);
+ if (err < 0) {
+ dev_err(client->dev,
+ "DRM cleanup failed for %s: %d\n",
+ dev_name(client->dev), err);
+ return err;
+ }
+ }
+
+ /* if this is the last device, unregister the drm driver */
+ if (client->list.next == &tegra_drm_subdrv_list)
+ drm_platform_exit(&tegra_drm_driver, pdev);
+
+ list_del_init(&client->list);
+ }
+
+ return 0;
+}
+
+static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
+{
+ struct tegra_drm_client *client;
+ int err;
+
+ drm_mode_config_init(drm);
+
+ list_for_each_entry(client, &tegra_drm_subdrv_list, list) {
+ if (client->ops && client->ops->drm_init) {
+ int err = client->ops->drm_init(client, drm);
+ if (err < 0) {
+ dev_dbg(drm->dev, "drm_init() failed for %s: %d\n",
+ dev_name(client->dev), err);
+ }
+ }
}
-#endif

err = tegra_drm_fb_init(drm);
if (err < 0)
@@ -74,18 +167,9 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)

static int tegra_drm_unload(struct drm_device *drm)
{
-#ifdef CONFIG_DRM_TEGRA_IOMMU
- struct host1x *host1x = dev_get_drvdata(drm->dev);
-#endif
-
drm_kms_helper_poll_fini(drm);
tegra_drm_fb_exit(drm);

-#ifdef CONFIG_DRM_TEGRA_IOMMU
- if (host1x->dim)
- arm_iommu_release_mapping(host1x->dim);
-#endif
-
drm_mode_config_cleanup(drm);

return 0;
@@ -98,10 +182,55 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)

static void tegra_drm_lastclose(struct drm_device *drm)
{
- struct host1x *host1x = drm->dev_private;
+ tegra_drm_fb_restore(drm);
+}

- drm_fbdev_cma_restore_mode(host1x->fbdev);
+static int __init tegra_drm_init(void)
+{
+ int err;
+
+ tegra_drm_parse_dt();
+
+ err = platform_driver_register(&tegra_dc_driver);
+ if (err < 0)
+ return err;
+
+ err = platform_driver_register(&tegra_hdmi_driver);
+ if (err < 0)
+ goto unregister_dc;
+
+ err = platform_driver_register(&tegra_tvo_driver);
+ if (err < 0)
+ goto unregister_hdmi;
+
+ err = platform_driver_register(&tegra_dsi_driver);
+ if (err < 0)
+ goto unregister_tvo;
+
+ return 0;
+
+unregister_tvo:
+ platform_driver_unregister(&tegra_tvo_driver);
+unregister_hdmi:
+ platform_driver_unregister(&tegra_hdmi_driver);
+unregister_dc:
+ platform_driver_unregister(&tegra_dc_driver);
+ return err;
}
+module_init(tegra_drm_init);
+
+static void __exit tegra_drm_exit(void)
+{
+ platform_driver_unregister(&tegra_dsi_driver);
+ platform_driver_unregister(&tegra_tvo_driver);
+ platform_driver_unregister(&tegra_hdmi_driver);
+ platform_driver_unregister(&tegra_dc_driver);
+}
+module_exit(tegra_drm_exit);
+
+MODULE_AUTHOR("Thierry Reding <thierry.reding-***@public.gmane.org>");
+MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
+MODULE_LICENSE("GPL");

static struct drm_ioctl_desc tegra_drm_ioctls[] = {
};
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index c7079ff..b2f9f10 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -31,59 +31,31 @@ static inline struct tegra_framebuffer *to_tegra_fb(struct drm_framebuffer *fb)
return container_of(fb, struct tegra_framebuffer, base);
}

-struct host1x {
- struct drm_device *drm;
- struct device *dev;
- void __iomem *regs;
- struct clk *clk;
- int syncpt;
- int irq;
-
- struct mutex drm_clients_lock;
- struct list_head drm_clients;
- struct list_head drm_active;
-
- struct mutex clients_lock;
- struct list_head clients;
+struct tegra_drm_client;

- struct drm_fbdev_cma *fbdev;
- struct tegra_framebuffer fb;
-
-#ifdef CONFIG_DRM_TEGRA_IOMMU
- struct dma_iommu_mapping *dim;
-#endif
-};
-
-struct host1x_client;
-
-struct host1x_client_ops {
- int (*drm_init)(struct host1x_client *client, struct drm_device *drm);
- int (*drm_exit)(struct host1x_client *client);
+struct tegra_drm_client_ops {
+ int (*drm_init)(struct tegra_drm_client *client,
+ struct drm_device *drm);
+ int (*drm_exit)(struct tegra_drm_client *client);
};

-struct host1x_client {
- struct host1x *host1x;
+struct tegra_drm_client {
struct device *dev;

- const struct host1x_client_ops *ops;
+ const struct tegra_drm_client_ops *ops;

struct list_head list;
-};

-extern int host1x_drm_init(struct host1x *host1x, struct drm_device *drm);
-extern int host1x_drm_exit(struct host1x *host1x);
+};

-extern int host1x_register_client(struct host1x *host1x,
- struct host1x_client *client);
-extern int host1x_unregister_client(struct host1x *host1x,
- struct host1x_client *client);
+extern int tegra_drm_register_client(struct tegra_drm_client *client);
+extern int tegra_drm_unregister_client(struct tegra_drm_client *client);

struct tegra_output;

struct tegra_dc {
- struct host1x_client client;
+ struct tegra_drm_client client;

- struct host1x *host1x;
struct device *dev;

struct drm_crtc base;
@@ -103,7 +75,8 @@ struct tegra_dc {
struct dentry *debugfs;
};

-static inline struct tegra_dc *host1x_client_to_dc(struct host1x_client *client)
+static inline struct tegra_dc *tegra_drm_client_to_dc(
+ struct tegra_drm_client *client)
{
return container_of(client, struct tegra_dc, client);
}
@@ -246,8 +219,8 @@ extern struct vm_operations_struct tegra_gem_vm_ops;
/* from fb.c */
extern int tegra_drm_fb_init(struct drm_device *drm);
extern void tegra_drm_fb_exit(struct drm_device *drm);
+extern void tegra_drm_fb_restore(struct drm_device *drm);

-extern struct platform_driver tegra_host1x_driver;
extern struct platform_driver tegra_hdmi_driver;
extern struct platform_driver tegra_tvo_driver;
extern struct platform_driver tegra_dsi_driver;
diff --git a/drivers/gpu/drm/tegra/dsi.c b/drivers/gpu/drm/tegra/dsi.c
index 156b3753..4f4c709 100644
--- a/drivers/gpu/drm/tegra/dsi.c
+++ b/drivers/gpu/drm/tegra/dsi.c
@@ -15,7 +15,7 @@
#include "drm.h"

struct tegra_dsi {
- struct host1x_client client;
+ struct tegra_drm_client client;
struct tegra_output output;

void __iomem *regs;
@@ -23,7 +23,7 @@ struct tegra_dsi {
};

static inline struct tegra_dsi *
-host1x_client_to_dsi(struct host1x_client *client)
+tegra_drm_client_to_dsi(struct tegra_drm_client *client)
{
return container_of(client, struct tegra_dsi, client);
}
@@ -59,10 +59,10 @@ static const struct tegra_output_ops dsi_ops = {
.disable = tegra_output_dsi_disable,
};

-static int tegra_dsi_drm_init(struct host1x_client *client,
+static int tegra_dsi_drm_init(struct tegra_drm_client *client,
struct drm_device *drm)
{
- struct tegra_dsi *dsi = host1x_client_to_dsi(client);
+ struct tegra_dsi *dsi = tegra_drm_client_to_dsi(client);
int err;

dsi->output.type = TEGRA_OUTPUT_DSI;
@@ -78,9 +78,9 @@ static int tegra_dsi_drm_init(struct host1x_client *client,
return 0;
}

-static int tegra_dsi_drm_exit(struct host1x_client *client)
+static int tegra_dsi_drm_exit(struct tegra_drm_client *client)
{
- struct tegra_dsi *dsi = host1x_client_to_dsi(client);
+ struct tegra_dsi *dsi = tegra_drm_client_to_dsi(client);
int err;

err = tegra_output_exit(&dsi->output);
@@ -92,14 +92,13 @@ static int tegra_dsi_drm_exit(struct host1x_client *client)
return 0;
}

-static const struct host1x_client_ops dsi_client_ops = {
+static const struct tegra_drm_client_ops dsi_client_ops = {
.drm_init = tegra_dsi_drm_init,
.drm_exit = tegra_dsi_drm_exit,
};

static int tegra_dsi_probe(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_dsi *dsi;
struct resource *regs;
int err;
@@ -126,9 +125,9 @@ static int tegra_dsi_probe(struct platform_device *pdev)
INIT_LIST_HEAD(&dsi->client.list);
dsi->client.dev = &pdev->dev;

- err = host1x_register_client(host1x, &dsi->client);
+ err = tegra_drm_register_client(&dsi->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
err);
return err;
}
@@ -140,13 +139,12 @@ static int tegra_dsi_probe(struct platform_device *pdev)

static int tegra_dsi_remove(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_dsi *dsi = platform_get_drvdata(pdev);
int err;

- err = host1x_unregister_client(host1x, &dsi->client);
+ err = tegra_drm_unregister_client(&dsi->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
err);
return err;
}
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index 97993c6..d6f44fa 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -9,11 +9,11 @@

#include "drm.h"

+static struct drm_fbdev_cma *tegra_fbdev;
+
static void tegra_drm_fb_output_poll_changed(struct drm_device *drm)
{
- struct host1x *host1x = drm->dev_private;
-
- drm_fbdev_cma_hotplug_event(host1x->fbdev);
+ drm_fbdev_cma_hotplug_event(tegra_fbdev);
}

static const struct drm_mode_config_funcs tegra_drm_mode_funcs = {
@@ -23,9 +23,6 @@ static const struct drm_mode_config_funcs tegra_drm_mode_funcs = {

int tegra_drm_fb_init(struct drm_device *drm)
{
- struct host1x *host1x = drm->dev_private;
- struct drm_fbdev_cma *fbdev;
-
drm->mode_config.min_width = 0;
drm->mode_config.min_height = 0;

@@ -34,23 +31,24 @@ int tegra_drm_fb_init(struct drm_device *drm)

drm->mode_config.funcs = &tegra_drm_mode_funcs;

- fbdev = drm_fbdev_cma_init(drm, 32, drm->mode_config.num_crtc,
+ tegra_fbdev = drm_fbdev_cma_init(drm, 32, drm->mode_config.num_crtc,
drm->mode_config.num_connector);
- if (IS_ERR(fbdev))
- return PTR_ERR(fbdev);
+ if (IS_ERR(tegra_fbdev))
+ return PTR_ERR(tegra_fbdev);

#ifndef CONFIG_FRAMEBUFFER_CONSOLE
- drm_fbdev_cma_restore_mode(fbdev);
+ drm_fbdev_cma_restore_mode(tegra_fbdev);
#endif

- host1x->fbdev = fbdev;
-
return 0;
}

void tegra_drm_fb_exit(struct drm_device *drm)
{
- struct host1x *host1x = drm->dev_private;
+ drm_fbdev_cma_fini(tegra_fbdev);
+}

- drm_fbdev_cma_fini(host1x->fbdev);
+void tegra_drm_fb_restore(struct drm_device *drm)
+{
+ drm_fbdev_cma_restore_mode(tegra_fbdev);
}
diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c
index 58f55dc..b2b8e58 100644
--- a/drivers/gpu/drm/tegra/hdmi.c
+++ b/drivers/gpu/drm/tegra/hdmi.c
@@ -22,7 +22,7 @@
#include "dc.h"

struct tegra_hdmi {
- struct host1x_client client;
+ struct tegra_drm_client client;
struct tegra_output output;
struct device *dev;

@@ -46,7 +46,7 @@ struct tegra_hdmi {
};

static inline struct tegra_hdmi *
-host1x_client_to_hdmi(struct host1x_client *client)
+tegra_drm_client_to_hdmi(struct tegra_drm_client *client)
{
return container_of(client, struct tegra_hdmi, client);
}
@@ -1152,10 +1152,10 @@ static int tegra_hdmi_debugfs_exit(struct tegra_hdmi *hdmi)
return 0;
}

-static int tegra_hdmi_drm_init(struct host1x_client *client,
+static int tegra_hdmi_drm_init(struct tegra_drm_client *client,
struct drm_device *drm)
{
- struct tegra_hdmi *hdmi = host1x_client_to_hdmi(client);
+ struct tegra_hdmi *hdmi = tegra_drm_client_to_hdmi(client);
int err;

hdmi->output.type = TEGRA_OUTPUT_HDMI;
@@ -1177,9 +1177,9 @@ static int tegra_hdmi_drm_init(struct host1x_client *client,
return 0;
}

-static int tegra_hdmi_drm_exit(struct host1x_client *client)
+static int tegra_hdmi_drm_exit(struct tegra_drm_client *client)
{
- struct tegra_hdmi *hdmi = host1x_client_to_hdmi(client);
+ struct tegra_hdmi *hdmi = tegra_drm_client_to_hdmi(client);
int err;

if (IS_ENABLED(CONFIG_DEBUG_FS)) {
@@ -1204,14 +1204,13 @@ static int tegra_hdmi_drm_exit(struct host1x_client *client)
return 0;
}

-static const struct host1x_client_ops hdmi_client_ops = {
+static const struct tegra_drm_client_ops hdmi_client_ops = {
.drm_init = tegra_hdmi_drm_init,
.drm_exit = tegra_hdmi_drm_exit,
};

static int tegra_hdmi_probe(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_hdmi *hdmi;
struct resource *regs;
int err;
@@ -1286,9 +1285,9 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
INIT_LIST_HEAD(&hdmi->client.list);
hdmi->client.dev = &pdev->dev;

- err = host1x_register_client(host1x, &hdmi->client);
+ err = tegra_drm_register_client(&hdmi->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
err);
return err;
}
@@ -1300,13 +1299,12 @@ static int tegra_hdmi_probe(struct platform_device *pdev)

static int tegra_hdmi_remove(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_hdmi *hdmi = platform_get_drvdata(pdev);
int err;

- err = host1x_unregister_client(host1x, &hdmi->client);
+ err = tegra_drm_unregister_client(&hdmi->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
err);
return err;
}
diff --git a/drivers/gpu/drm/tegra/host1x.c b/drivers/gpu/drm/tegra/host1x.c
deleted file mode 100644
index f9d3a84..0000000
--- a/drivers/gpu/drm/tegra/host1x.c
+++ /dev/null
@@ -1,343 +0,0 @@
-/*
- * Copyright (C) 2012 Avionic Design GmbH
- * Copyright (C) 2012 NVIDIA CORPORATION. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/clk.h>
-#include <linux/err.h>
-#include <linux/module.h>
-#include <linux/of.h>
-#include <linux/platform_device.h>
-
-#include "drm.h"
-
-struct host1x_drm_client {
- struct host1x_client *client;
- struct device_node *np;
- struct list_head list;
-};
-
-static int host1x_add_drm_client(struct host1x *host1x, struct device_node *np)
-{
- struct host1x_drm_client *client;
-
- client = kzalloc(sizeof(*client), GFP_KERNEL);
- if (!client)
- return -ENOMEM;
-
- INIT_LIST_HEAD(&client->list);
- client->np = of_node_get(np);
-
- list_add_tail(&client->list, &host1x->drm_clients);
-
- return 0;
-}
-
-static int host1x_activate_drm_client(struct host1x *host1x,
- struct host1x_drm_client *drm,
- struct host1x_client *client)
-{
- mutex_lock(&host1x->drm_clients_lock);
- list_del_init(&drm->list);
- list_add_tail(&drm->list, &host1x->drm_active);
- drm->client = client;
- mutex_unlock(&host1x->drm_clients_lock);
-
- return 0;
-}
-
-static int host1x_remove_drm_client(struct host1x *host1x,
- struct host1x_drm_client *client)
-{
- mutex_lock(&host1x->drm_clients_lock);
- list_del_init(&client->list);
- mutex_unlock(&host1x->drm_clients_lock);
-
- of_node_put(client->np);
- kfree(client);
-
- return 0;
-}
-
-static int host1x_parse_dt(struct host1x *host1x)
-{
- static const char * const compat[] = {
- "nvidia,tegra20-dc",
- "nvidia,tegra20-hdmi",
- "nvidia,tegra20-tvo",
- "nvidia,tegra20-dsi",
- "nvidia,tegra30-dc",
- "nvidia,tegra30-hdmi",
- "nvidia,tegra30-tvo",
- "nvidia,tegra30-dsi"
- };
- unsigned int i;
- int err;
-
- for (i = 0; i < ARRAY_SIZE(compat); i++) {
- struct device_node *np;
-
- for_each_child_of_node(host1x->dev->of_node, np) {
- if (of_device_is_compatible(np, compat[i]) &&
- of_device_is_available(np)) {
- err = host1x_add_drm_client(host1x, np);
- if (err < 0)
- return err;
- }
- }
- }
-
- return 0;
-}
-
-static int tegra_host1x_probe(struct platform_device *pdev)
-{
- struct host1x *host1x;
- struct resource *regs;
- int err;
-
- host1x = devm_kzalloc(&pdev->dev, sizeof(*host1x), GFP_KERNEL);
- if (!host1x)
- return -ENOMEM;
-
- mutex_init(&host1x->drm_clients_lock);
- INIT_LIST_HEAD(&host1x->drm_clients);
- INIT_LIST_HEAD(&host1x->drm_active);
- mutex_init(&host1x->clients_lock);
- INIT_LIST_HEAD(&host1x->clients);
- host1x->dev = &pdev->dev;
-
- err = host1x_parse_dt(host1x);
- if (err < 0) {
- dev_err(&pdev->dev, "failed to parse DT: %d\n", err);
- return err;
- }
-
- host1x->clk = devm_clk_get(&pdev->dev, NULL);
- if (IS_ERR(host1x->clk))
- return PTR_ERR(host1x->clk);
-
- err = clk_prepare_enable(host1x->clk);
- if (err < 0)
- return err;
-
- regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (!regs) {
- err = -ENXIO;
- goto err;
- }
-
- err = platform_get_irq(pdev, 0);
- if (err < 0)
- goto err;
-
- host1x->syncpt = err;
-
- err = platform_get_irq(pdev, 1);
- if (err < 0)
- goto err;
-
- host1x->irq = err;
-
- host1x->regs = devm_request_and_ioremap(&pdev->dev, regs);
- if (!host1x->regs) {
- err = -EADDRNOTAVAIL;
- goto err;
- }
-
- platform_set_drvdata(pdev, host1x);
-
- return 0;
-
-err:
- clk_disable_unprepare(host1x->clk);
- return err;
-}
-
-static int tegra_host1x_remove(struct platform_device *pdev)
-{
- struct host1x *host1x = platform_get_drvdata(pdev);
-
- clk_disable_unprepare(host1x->clk);
-
- return 0;
-}
-
-int host1x_drm_init(struct host1x *host1x, struct drm_device *drm)
-{
- struct host1x_client *client;
-
- mutex_lock(&host1x->clients_lock);
-
- list_for_each_entry(client, &host1x->clients, list) {
- if (client->ops && client->ops->drm_init) {
- int err = client->ops->drm_init(client, drm);
- if (err < 0) {
- dev_err(host1x->dev,
- "DRM setup failed for %s: %d\n",
- dev_name(client->dev), err);
- return err;
- }
- }
- }
-
- mutex_unlock(&host1x->clients_lock);
-
- return 0;
-}
-
-int host1x_drm_exit(struct host1x *host1x)
-{
- struct platform_device *pdev = to_platform_device(host1x->dev);
- struct host1x_client *client;
-
- if (!host1x->drm)
- return 0;
-
- mutex_lock(&host1x->clients_lock);
-
- list_for_each_entry_reverse(client, &host1x->clients, list) {
- if (client->ops && client->ops->drm_exit) {
- int err = client->ops->drm_exit(client);
- if (err < 0) {
- dev_err(host1x->dev,
- "DRM cleanup failed for %s: %d\n",
- dev_name(client->dev), err);
- return err;
- }
- }
- }
-
- mutex_unlock(&host1x->clients_lock);
-
- drm_platform_exit(&tegra_drm_driver, pdev);
- host1x->drm = NULL;
-
- return 0;
-}
-
-int host1x_register_client(struct host1x *host1x, struct host1x_client *client)
-{
- struct host1x_drm_client *drm, *tmp;
- int err;
-
- mutex_lock(&host1x->clients_lock);
- list_add_tail(&client->list, &host1x->clients);
- mutex_unlock(&host1x->clients_lock);
-
- list_for_each_entry_safe(drm, tmp, &host1x->drm_clients, list)
- if (drm->np == client->dev->of_node)
- host1x_activate_drm_client(host1x, drm, client);
-
- if (list_empty(&host1x->drm_clients)) {
- struct platform_device *pdev = to_platform_device(host1x->dev);
-
- err = drm_platform_init(&tegra_drm_driver, pdev);
- if (err < 0) {
- dev_err(host1x->dev, "drm_platform_init(): %d\n", err);
- return err;
- }
- }
-
- return 0;
-}
-
-int host1x_unregister_client(struct host1x *host1x,
- struct host1x_client *client)
-{
- struct host1x_drm_client *drm, *tmp;
- int err;
-
- list_for_each_entry_safe(drm, tmp, &host1x->drm_active, list) {
- if (drm->client == client) {
- err = host1x_drm_exit(host1x);
- if (err < 0) {
- dev_err(host1x->dev, "host1x_drm_exit(): %d\n",
- err);
- return err;
- }
-
- host1x_remove_drm_client(host1x, drm);
- break;
- }
- }
-
- mutex_lock(&host1x->clients_lock);
- list_del_init(&client->list);
- mutex_unlock(&host1x->clients_lock);
-
- return 0;
-}
-
-static struct of_device_id tegra_host1x_of_match[] = {
- { .compatible = "nvidia,tegra20-host1x", },
- { .compatible = "nvidia,tegra30-host1x", },
- { },
-};
-MODULE_DEVICE_TABLE(of, tegra_host1x_of_match);
-
-struct platform_driver tegra_host1x_driver = {
- .driver = {
- .name = "tegra-host1x",
- .owner = THIS_MODULE,
- .of_match_table = tegra_host1x_of_match,
- },
- .probe = tegra_host1x_probe,
- .remove = tegra_host1x_remove,
-};
-
-static int __init tegra_host1x_init(void)
-{
- int err;
-
- err = platform_driver_register(&tegra_host1x_driver);
- if (err < 0)
- return err;
-
- err = platform_driver_register(&tegra_dc_driver);
- if (err < 0)
- goto unregister_host1x;
-
- err = platform_driver_register(&tegra_hdmi_driver);
- if (err < 0)
- goto unregister_dc;
-
- err = platform_driver_register(&tegra_tvo_driver);
- if (err < 0)
- goto unregister_hdmi;
-
- err = platform_driver_register(&tegra_dsi_driver);
- if (err < 0)
- goto unregister_tvo;
-
- return 0;
-
-unregister_tvo:
- platform_driver_unregister(&tegra_tvo_driver);
-unregister_hdmi:
- platform_driver_unregister(&tegra_hdmi_driver);
-unregister_dc:
- platform_driver_unregister(&tegra_dc_driver);
-unregister_host1x:
- platform_driver_unregister(&tegra_host1x_driver);
- return err;
-}
-module_init(tegra_host1x_init);
-
-static void __exit tegra_host1x_exit(void)
-{
- platform_driver_unregister(&tegra_dsi_driver);
- platform_driver_unregister(&tegra_tvo_driver);
- platform_driver_unregister(&tegra_hdmi_driver);
- platform_driver_unregister(&tegra_dc_driver);
- platform_driver_unregister(&tegra_host1x_driver);
-}
-module_exit(tegra_host1x_exit);
-
-MODULE_AUTHOR("Thierry Reding <thierry.reding-***@public.gmane.org>");
-MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
-MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/tegra/tvo.c b/drivers/gpu/drm/tegra/tvo.c
index a67bd28..01ac356 100644
--- a/drivers/gpu/drm/tegra/tvo.c
+++ b/drivers/gpu/drm/tegra/tvo.c
@@ -15,7 +15,7 @@
#include "drm.h"

struct tegra_tvo {
- struct host1x_client client;
+ struct tegra_drm_client client;
struct tegra_output output;

void __iomem *regs;
@@ -24,7 +24,7 @@ struct tegra_tvo {
};

static inline struct tegra_tvo *
-host1x_client_to_tvo(struct host1x_client *client)
+tegra_drm_client_to_tvo(struct tegra_drm_client *client)
{
return container_of(client, struct tegra_tvo, client);
}
@@ -60,10 +60,10 @@ static const struct tegra_output_ops tvo_ops = {
.disable = tegra_output_tvo_disable,
};

-static int tegra_tvo_drm_init(struct host1x_client *client,
+static int tegra_tvo_drm_init(struct tegra_drm_client *client,
struct drm_device *drm)
{
- struct tegra_tvo *tvo = host1x_client_to_tvo(client);
+ struct tegra_tvo *tvo = tegra_drm_client_to_tvo(client);
int err;

tvo->output.type = TEGRA_OUTPUT_TVO;
@@ -79,9 +79,9 @@ static int tegra_tvo_drm_init(struct host1x_client *client,
return 0;
}

-static int tegra_tvo_drm_exit(struct host1x_client *client)
+static int tegra_tvo_drm_exit(struct tegra_drm_client *client)
{
- struct tegra_tvo *tvo = host1x_client_to_tvo(client);
+ struct tegra_tvo *tvo = tegra_drm_client_to_tvo(client);
int err;

err = tegra_output_exit(&tvo->output);
@@ -93,14 +93,13 @@ static int tegra_tvo_drm_exit(struct host1x_client *client)
return err;
}

-static const struct host1x_client_ops tvo_client_ops = {
+static const struct tegra_drm_client_ops tvo_client_ops = {
.drm_init = tegra_tvo_drm_init,
.drm_exit = tegra_tvo_drm_exit,
};

static int tegra_tvo_probe(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_tvo *tvo;
struct resource *regs;
int err;
@@ -120,22 +119,25 @@ static int tegra_tvo_probe(struct platform_device *pdev)
return -ENXIO;

err = platform_get_irq(pdev, 0);
- if (err < 0)
+ if (err < 0) {
+ dev_err(&pdev->dev, "failed to get tvo irq\n");
return err;
-
+ }
tvo->irq = err;

tvo->regs = devm_request_and_ioremap(&pdev->dev, regs);
- if (!tvo->regs)
+ if (!tvo->regs) {
+ dev_err(&pdev->dev, "failed to request tvo regs\n");
return -EADDRNOTAVAIL;
+ }

tvo->client.ops = &tvo_client_ops;
INIT_LIST_HEAD(&tvo->client.list);
tvo->client.dev = &pdev->dev;

- err = host1x_register_client(host1x, &tvo->client);
+ err = tegra_drm_register_client(&tvo->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to register host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to register tegra drm client: %d\n",
err);
return err;
}
@@ -147,13 +149,12 @@ static int tegra_tvo_probe(struct platform_device *pdev)

static int tegra_tvo_remove(struct platform_device *pdev)
{
- struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
struct tegra_tvo *tvo = platform_get_drvdata(pdev);
int err;

- err = host1x_unregister_client(host1x, &tvo->client);
+ err = tegra_drm_unregister_client(&tvo->client);
if (err < 0) {
- dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+ dev_err(&pdev->dev, "failed to unregister tegra drm client: %d\n",
err);
return err;
}
--
1.7.9.5
Thierry Reding
2012-12-05 08:33:35 UTC
On Mon, Nov 26, 2012 at 03:19:12PM +0200, Terje Bergstrom wrote:
> From: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>
>
> This patch removes the redundant host1x driver from tegradrm and
> makes necessary bindings to the separate host driver.
>
> This modification introduces a regression: Because there is no
> general framework for attaching separate devices into the
> same address space, this patch removes the ability to use IOMMU
> in tegradrm.
>
> Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>
> Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>

I've been thinking about this some more and came to the conclusion that
since we will already have a tight coupling between host1x and tegra-drm
we may just as well keep the client registration code in host1x. The way
I imagine this to work would be to export a public API from tegra-drm,
say tegra_drm_init() and tegra_drm_exit(), that could be called in place
of drm_platform_init() and drm_platform_exit() in the current code.

tegra_drm_init() could then be passed the host1x platform device to bind
to. The only thing that would need to be done is move the fields in the
host1x structure specific to DRM into a separate structure. host1x would
have to export host1x_drm_init/exit() which the DRM can invoke to have
all DRM clients register to the DRM subsystem.
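
To make that concrete, a rough sketch of the entry points I have in
mind (hypothetical signatures only):

/* exported by tegra-drm, called in place of drm_platform_init() and
 * drm_platform_exit(); takes the host1x platform device to bind to */
int tegra_drm_init(struct platform_device *host1x_pdev);
int tegra_drm_exit(struct platform_device *host1x_pdev);

/* exported by host1x, invoked by the DRM driver from its load/unload
 * paths to run the drm_init()/drm_exit() callbacks of all registered
 * clients */
int host1x_drm_init(struct host1x *host1x, struct drm_device *drm);
int host1x_drm_exit(struct host1x *host1x);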

From a hierarchical point of view this makes sense, with host1x being
the parent of all DRM subdevices. It allows us to reuse the current code
from tegra-drm that has been tested and works properly even for module
unload/reload. We also get to keep the proper encapsulation and the
switch to the separate host1x driver will require a much smaller patch.

Does anybody see a disadvantage in this approach?

Thierry
Terje Bergström
2012-12-05 10:10:50 UTC
On 05.12.2012 10:33, Thierry Reding wrote:
> I've been thinking about this some more and came to the conclusion that
> since we will already have a tight coupling between host1x and tegra-drm
> we may just as well keep the client registration code in host1x. The way
> I imagine this to work would be to export a public API from tegra-drm,
> say tegra_drm_init() and tegra_drm_exit(), that could be called in place
> of drm_platform_init() and drm_platform_exit() in the current code.
>
> tegra_drm_init() could then be passed the host1x platform device to bind
> to. The only thing that would need to be done is move the fields in the
> host1x structure specific to DRM into a separate structure. host1x would
> have to export host1x_drm_init/exit() which the DRM can invoke to have
> all DRM clients register to the DRM subsystem.
>
> From a hierarchical point of view this makes sense, with host1x being
> the parent of all DRM subdevices. It allows us to reuse the current code
> from tegra-drm that has been tested and works properly even for module
> unload/reload. We also get to keep the proper encapsulation and the
> switch to the separate host1x driver will require a much smaller patch.
>
> Does anybody see a disadvantage in this approach?

I'm a bit confused about the scope. You mention host1x several times,
but I'm not sure if you mean the file drivers/gpu/drm/tegra/host1x.c or
the host1x driver. So I might be babbling when I answer this. Could you
please clarify that?

host1x hardware access must be encapsulated in host1x driver
(drivers/gpu/host1x if that's the location we prefer). As for the
management of the list of DRM clients, we proposed the move to drm.c,
because host1x hardware would anyway be controlled by a different
driver. Having a file called host1x.c in tegradrm didn't sound logical, as
it's not really controlling host1x, and its probe wouldn't be called.

If your proposal is that we'd move the management of the list of host1x
devices from tegradrm to the host1x driver, we'd have a tight circular
dependency between the two drivers, and that's almost always a bad idea. So
far all ideas have revolved around tegradrm calling host1x, and host1x
calling a bit of DRM (for CMA, which would be fixed in a later version), but
not host1x calling tegradrm.

The host1x driver itself has little use for the list of clients.
Basically we only need a list of channels, and the platform devices
associated with those channels, to be able to dump host1x channel state.

Mind you, I believe the nvhost driver, as part of our BSP, has had many
more hours of runtime than tegradrm. :-)

Terje
Thierry Reding
2012-12-05 11:13:32 UTC
On Wed, Dec 05, 2012 at 12:10:50PM +0200, Terje Bergström wrote:
> On 05.12.2012 10:33, Thierry Reding wrote:
> > I've been thinking about this some more and came to the conclusion that
> > since we will already have a tight coupling between host1x and tegra-drm
> > we may just as well keep the client registration code in host1x. The way
> > I imagine this to work would be to export a public API from tegra-drm,
> > say tegra_drm_init() and tegra_drm_exit(), that could be called in place
> > of drm_platform_init() and drm_platform_exit() in the current code.
> >
> > tegra_drm_init() could then be passed the host1x platform device to bind
> > to. The only thing that would need to be done is move the fields in the
> > host1x structure specific to DRM into a separate structure. host1x would
> > have to export host1x_drm_init/exit() which the DRM can invoke to have
> > all DRM clients register to the DRM subsystem.
> >
> > From a hierarchical point of view this makes sense, with host1x being
> > the parent of all DRM subdevices. It allows us to reuse the current code
> > from tegra-drm that has been tested and works properly even for module
> > unload/reload. We also get to keep the proper encapsulation and the
> > switch to the separate host1x driver will require a much smaller patch.
> >
> > Does anybody see a disadvantage in this approach?
>
> I'm a bit confused about the scope. You mention host1x several times,
> but I'm not sure if you mean the file drivers/gpu/drm/tegra/host1x.c or
> the host1x driver. So I might be babbling when I answer this. Could you
> please clarify that?

What I propose is to move the client registration code that is currently
in drivers/gpu/drm/tegra/host1x.c to the host1x driver, which may or may
not end up in drivers/gpu/host1x.

> host1x hardware access must be encapsulated in host1x driver
> (drivers/gpu/host1x if that's the location we prefer). As for the
> management of the list of DRM clients, we proposed the move to drm.c,
> because host1x hardware would anyway be controlled by a different
> driver. Having file called host1x.c in tegradrm didn't sound logical, as
> its not really controlling host1x, and its probe wouldn't be called.

Oh well, at the time nobody from NVIDIA was involved so I wrote that
code in preparation for proper host1x support that I thought I would
have to add myself at some point. I'm more than glad that I don't have
to do this all by myself. However the patch proposed in this series
breaks a number of requirements such as proper encapsulation, which I
already mentioned in more detail in another mail.

> If your proposal is that we'd move the management of the list of host1x
> devices from tegradrm to host1x driver, we'd have a tight circular
> dependency between two drivers and that's almost always a bad idea. So
> far all ideas have revolved around tegradrm calling host1x, and host1x
> calling a bit of DRM (for CMA, would be fixed in later version) but not
> host1x calling tegradrm.

The problem that this solves is that the DRM driver needs to be bound to
a specific platform device. None of the DRM subdevices are suitable
because they are only part of the whole DRM device. I think that host1x
is the only device that fits here.

Note that this is only an administrative problem. It shouldn't interfere
with the way host1x works. The goal is that the DRM device is registered
at the proper hierarchical location.

The circular dependency is indeed a problem, though. Quite frankly I
have no idea how to solve this. However the approach taken in the
current patch will break several other requirements as I already
explained.

Thierry
Terje Bergström
2012-12-05 11:47:38 UTC
On 05.12.2012 13:13, Thierry Reding wrote:
> What I propose is to move the client registration code that is currently
> in drivers/gpu/drm/tegra/host1x.c to the host1x driver, which may or may
> not end up in drivers/gpu/host1x.

Ok.

>
>> host1x hardware access must be encapsulated in host1x driver
>> (drivers/gpu/host1x if that's the location we prefer). As for the
>> management of the list of DRM clients, we proposed the move to drm.c,
>> because host1x hardware would anyway be controlled by a different
>> driver. Having file called host1x.c in tegradrm didn't sound logical, as
>> its not really controlling host1x, and its probe wouldn't be called.
>
> Oh well, at the time nobody from NVIDIA was involved so I wrote that
> code in preparation for proper host1x support that I thought I would
> have to add myself at some point. I'm more than glad that I don't have
> to do this all by myself. However the patch proposed in this series
> breaks a number of requirements such as proper encapsulation, which I
> already mentioned in more detail in another mail.

Hmm, I'm not sure I remember what you are referring to by proper
encapsulation. Is it the fact that we bind DRM to a sub-client?

> The problem that this solves is that the DRM driver needs to be bound to
> a specific platform device. None of the DRM subdevices are suitable
> because they are only part of the whole DRM device. I think that host1x
> is the only device that fits here.
>
> Note that this is only an administrative problem. It shouldn't interfere
> with the way host1x works. The goal is that the DRM device is registered
> at the proper hierarchical location.
>
> The circular dependency is indeed a problem, though. Quite frankly I
> have no idea how to solve this. However the approach taken in the
> current patch will break several other requirements as I already
> explained.

The problem with calling drm_platform_init() with the host1x device as
the parameter is that drm_get_platform_dev() will take control of drvdata.
We'd need to put the host1x-specific struct host1x pointer somewhere
else, and I'm not sure where that place could be.

You're right in that binding to a sub-device is not a nice approach. The DRM
framework just needs a "struct device" to bind to. exynos seems to solve
this by introducing a virtual device and binding to that. I'm not sure if
this is the best way, but it's worth considering? A rough sketch of the
idea is below.
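
Something along these lines, purely as an illustration (the device
name and where this would live are made up):

/* register a virtual platform device for the DRM core to bind to,
 * instead of binding to host1x or one of its sub-devices */
static struct platform_device *tegra_drm_device;

static int __init tegra_drm_virtual_init(void)
{
	tegra_drm_device = platform_device_register_simple("tegra-drm",
							   -1, NULL, 0);
	if (IS_ERR(tegra_drm_device))
		return PTR_ERR(tegra_drm_device);

	return drm_platform_init(&tegra_drm_driver, tegra_drm_device);
}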

Terje
Terje Bergstrom
2012-11-26 13:19:08 UTC
Add support for sync point interrupts and sync point wait. Sync point
wait uses interrupts to unblock waiters.
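
In rough terms the wait path works like this (a generic sketch of the
pattern, not the code in this patch; all names here are illustrative):

#include <linux/completion.h>
#include <linux/interrupt.h>
#include <linux/jiffies.h>

/* the waiter arms a completion and programs the sync point threshold;
 * the threshold interrupt completes it */
struct example_waiter {
	struct completion done;
};

static irqreturn_t example_thresh_isr(int irq, void *data)
{
	struct example_waiter *waiter = data;

	complete(&waiter->done);	/* unblock the waiting thread */
	return IRQ_HANDLED;
}

static int example_wait(struct example_waiter *waiter, unsigned long ms)
{
	init_completion(&waiter->done);
	/* ... program the threshold and enable the sync point irq ... */
	if (!wait_for_completion_timeout(&waiter->done,
					 msecs_to_jiffies(ms)))
		return -EAGAIN;
	return 0;
}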

Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
---
drivers/video/tegra/host/Makefile | 1 +
drivers/video/tegra/host/chip_support.h | 17 ++
drivers/video/tegra/host/dev.c | 7 +
drivers/video/tegra/host/host1x/host1x.c | 33 +++
drivers/video/tegra/host/host1x/host1x.h | 2 +
drivers/video/tegra/host/host1x/host1x01.c | 2 +
drivers/video/tegra/host/host1x/host1x_intr.c | 263 ++++++++++++++++++
drivers/video/tegra/host/nvhost_intr.c | 363 +++++++++++++++++++++++++
drivers/video/tegra/host/nvhost_intr.h | 102 +++++++
drivers/video/tegra/host/nvhost_syncpt.c | 111 ++++++++
drivers/video/tegra/host/nvhost_syncpt.h | 10 +
include/linux/nvhost.h | 2 +
12 files changed, 913 insertions(+)
create mode 100644 drivers/video/tegra/host/host1x/host1x_intr.c
create mode 100644 drivers/video/tegra/host/nvhost_intr.c
create mode 100644 drivers/video/tegra/host/nvhost_intr.h

diff --git a/drivers/video/tegra/host/Makefile b/drivers/video/tegra/host/Makefile
index 3edab4a..24acccc 100644
--- a/drivers/video/tegra/host/Makefile
+++ b/drivers/video/tegra/host/Makefile
@@ -3,6 +3,7 @@ ccflags-y = -Idrivers/video/tegra/host
nvhost-objs = \
nvhost_acm.o \
nvhost_syncpt.o \
+ nvhost_intr.o \
dev.o \
chip_support.o

diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
index acfa2f1..5c8f49f 100644
--- a/drivers/video/tegra/host/chip_support.h
+++ b/drivers/video/tegra/host/chip_support.h
@@ -25,6 +25,7 @@
struct output;

struct nvhost_master;
+struct nvhost_intr;
struct nvhost_syncpt;
struct platform_device;

@@ -38,14 +39,30 @@ struct nvhost_syncpt_ops {
const char * (*name)(struct nvhost_syncpt *, u32 id);
};

+struct nvhost_intr_ops {
+ void (*init_host_sync)(struct nvhost_intr *);
+ void (*set_host_clocks_per_usec)(
+ struct nvhost_intr *, u32 clocks);
+ void (*set_syncpt_threshold)(
+ struct nvhost_intr *, u32 id, u32 thresh);
+ void (*enable_syncpt_intr)(struct nvhost_intr *, u32 id);
+ void (*disable_syncpt_intr)(struct nvhost_intr *, u32 id);
+ void (*disable_all_syncpt_intrs)(struct nvhost_intr *);
+ int (*request_host_general_irq)(struct nvhost_intr *);
+ void (*free_host_general_irq)(struct nvhost_intr *);
+ int (*free_syncpt_irq)(struct nvhost_intr *);
+};
+
struct nvhost_chip_support {
const char *soc_name;
struct nvhost_syncpt_ops syncpt;
+ struct nvhost_intr_ops intr;
};

struct nvhost_chip_support *nvhost_get_chip_ops(void);

#define syncpt_op() (nvhost_get_chip_ops()->syncpt)
+#define intr_op() (nvhost_get_chip_ops()->intr)

int nvhost_init_chip_support(struct nvhost_master *host);

diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
index 98c9c9f..025a820 100644
--- a/drivers/video/tegra/host/dev.c
+++ b/drivers/video/tegra/host/dev.c
@@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
}
EXPORT_SYMBOL(host1x_syncpt_read);

+int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
+{
+ struct nvhost_syncpt *sp = &nvhost->syncpt;
+ return nvhost_syncpt_wait_timeout(sp, id, thresh, timeout, value);
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
+
bool host1x_powered(struct platform_device *dev)
{
bool ret = 0;
diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
index 77ff00b..766931b 100644
--- a/drivers/video/tegra/host/host1x/host1x.c
+++ b/drivers/video/tegra/host/host1x/host1x.c
@@ -52,8 +52,24 @@ static int power_off_host(struct platform_device *dev)
return 0;
}

+static void clock_on_host(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ struct nvhost_master *host = nvhost_get_private_data(dev);
+ nvhost_intr_start(&host->intr, clk_get_rate(pdata->clk[0]));
+}
+
+static int clock_off_host(struct platform_device *dev)
+{
+ struct nvhost_master *host = nvhost_get_private_data(dev);
+ nvhost_intr_stop(&host->intr);
+ return 0;
+}
+
static void nvhost_free_resources(struct nvhost_master *host)
{
+ kfree(host->intr.syncpt);
+ host->intr.syncpt = 0;
}

static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
@@ -64,6 +80,16 @@ static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
if (err)
return err;

+ host->intr.syncpt = devm_kzalloc(&host->dev->dev,
+ sizeof(struct nvhost_intr_syncpt) *
+ nvhost_syncpt_nb_pts(&host->syncpt),
+ GFP_KERNEL);
+
+ if (!host->intr.syncpt) {
+ /* frees happen in the support removal phase */
+ return -ENOMEM;
+ }
+
return 0;
}

@@ -99,6 +125,8 @@ static int __devinit nvhost_probe(struct platform_device *dev)

pdata->finalize_poweron = power_on_host;
pdata->prepare_poweroff = power_off_host;
+ pdata->prepare_clockoff = clock_off_host;
+ pdata->finalize_clockon = clock_on_host;

pdata->pdev = dev;

@@ -125,6 +153,10 @@ static int __devinit nvhost_probe(struct platform_device *dev)
if (err)
goto fail;

+ err = nvhost_intr_init(&host->intr, intr1->start, intr0->start);
+ if (err)
+ goto fail;
+
err = nvhost_module_init(dev);
if (err)
goto fail;
@@ -148,6 +180,7 @@ fail:
static int __exit nvhost_remove(struct platform_device *dev)
{
struct nvhost_master *host = nvhost_get_private_data(dev);
+ nvhost_intr_deinit(&host->intr);
nvhost_syncpt_deinit(&host->syncpt);
nvhost_module_deinit(dev);
nvhost_free_resources(host);
diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
index 76748ac..af9bfef 100644
--- a/drivers/video/tegra/host/host1x/host1x.h
+++ b/drivers/video/tegra/host/host1x/host1x.h
@@ -25,6 +25,7 @@
#include <linux/nvhost.h>

#include "nvhost_syncpt.h"
+#include "nvhost_intr.h"

#define TRACE_MAX_LENGTH 128U
#define IFACE_NAME "nvhost"
@@ -33,6 +34,7 @@ struct nvhost_master {
void __iomem *aperture;
void __iomem *sync_aperture;
struct nvhost_syncpt syncpt;
+ struct nvhost_intr intr;
struct platform_device *dev;
struct host1x_device_info info;
};
diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
index d53302d..5bf0e6e 100644
--- a/drivers/video/tegra/host/host1x/host1x01.c
+++ b/drivers/video/tegra/host/host1x/host1x01.c
@@ -26,12 +26,14 @@
#include "chip_support.h"

#include "host1x/host1x_syncpt.c"
+#include "host1x/host1x_intr.c"

int nvhost_init_host1x01_support(struct nvhost_master *host,
struct nvhost_chip_support *op)
{
host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
op->syncpt = host1x_syncpt_ops;
+ op->intr = host1x_intr_ops;

return 0;
}
diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c b/drivers/video/tegra/host/host1x/host1x_intr.c
new file mode 100644
index 0000000..94f08cb
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x_intr.c
@@ -0,0 +1,263 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x_intr.c
+ *
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <asm/mach/irq.h>
+
+#include "nvhost_intr.h"
+#include "host1x/host1x.h"
+
+/* Spacing between sync registers */
+#define REGISTER_STRIDE 4
+
+static void host1x_intr_syncpt_thresh_isr(struct nvhost_intr_syncpt *syncpt);
+
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
+{
+ struct nvhost_intr_syncpt *sp =
+ container_of(work, struct nvhost_intr_syncpt, work);
+ nvhost_syncpt_thresh_fn(sp->irq, sp);
+}
+
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
+{
+ struct nvhost_master *dev = dev_id;
+ void __iomem *sync_regs = dev->sync_aperture;
+ struct nvhost_intr *intr = &dev->intr;
+ unsigned long reg;
+ int i, id;
+
+ for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
+ reg = readl(sync_regs +
+ host1x_sync_syncpt_thresh_cpu0_int_status_r() +
+ i * REGISTER_STRIDE);
+ for_each_set_bit(id, &reg, BITS_PER_LONG) {
+ struct nvhost_intr_syncpt *sp =
+ intr->syncpt + (i * BITS_PER_LONG + id);
+ host1x_intr_syncpt_thresh_isr(sp);
+ queue_work(intr->wq, &sp->work);
+ }
+ }
+
+ return IRQ_HANDLED;
+}
+
+static void host1x_intr_init_host_sync(struct nvhost_intr *intr)
+{
+ struct nvhost_master *dev = intr_to_dev(intr);
+ void __iomem *sync_regs = dev->sync_aperture;
+ int i, err, irq;
+
+ writel(0xffffffffUL,
+ sync_regs + host1x_sync_syncpt_thresh_int_disable_r());
+ writel(0xffffffffUL,
+ sync_regs + host1x_sync_syncpt_thresh_cpu0_int_status_r());
+
+ for (i = 0; i < dev->info.nb_pts; i++)
+ INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
+
+ irq = platform_get_irq(dev->dev, 0);
+ WARN_ON(IS_ERR_VALUE(irq));
+ err = devm_request_irq(&dev->dev->dev, irq,
+ syncpt_thresh_cascade_isr,
+ IRQF_SHARED, "host_syncpt", dev);
+ WARN_ON(IS_ERR_VALUE(err));
+
+ /* disable the ip_busy_timeout. this prevents write drops, etc.
+ * there's no real way to recover from a hung client anyway.
+ */
+ writel(0, sync_regs + host1x_sync_ip_busy_timeout_r());
+
+ /* increase the auto-ack timeout to the maximum value. 2d will hang
+ * otherwise on Tegra2.
+ */
+ writel(0xff, sync_regs + host1x_sync_ctxsw_timeout_cfg_r());
+}
+
+static void host1x_intr_set_host_clocks_per_usec(struct nvhost_intr *intr,
+ u32 cpm)
+{
+ struct nvhost_master *dev = intr_to_dev(intr);
+ void __iomem *sync_regs = dev->sync_aperture;
+ /* write microsecond clock register */
+ writel(cpm, sync_regs + host1x_sync_usec_clk_r());
+}
+
+static void host1x_intr_set_syncpt_threshold(struct nvhost_intr *intr,
+ u32 id, u32 thresh)
+{
+ struct nvhost_master *dev = intr_to_dev(intr);
+ void __iomem *sync_regs = dev->sync_aperture;
+ writel(thresh, sync_regs +
+ (host1x_sync_syncpt_int_thresh_0_r() + id * REGISTER_STRIDE));
+}
+
+static void host1x_intr_enable_syncpt_intr(struct nvhost_intr *intr, u32 id)
+{
+ struct nvhost_master *dev = intr_to_dev(intr);
+ void __iomem *sync_regs = dev->sync_aperture;
+
+ writel(BIT_MASK(id), sync_regs +
+ host1x_sync_syncpt_thresh_int_enable_cpu0_r() +
+ BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_syncpt_intr(struct nvhost_intr *intr, u32 id)
+{
+ struct nvhost_master *dev = intr_to_dev(intr);
+ void __iomem *sync_regs = dev->sync_aperture;
+
+ writel(BIT_MASK(id), sync_regs +
+ host1x_sync_syncpt_thresh_int_disable_r() +
+ BIT_WORD(id) * REGISTER_STRIDE);
+
+ writel(BIT_MASK(id), sync_regs +
+ host1x_sync_syncpt_thresh_cpu0_int_status_r() +
+ BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_all_syncpt_intrs(struct nvhost_intr *intr)
+{
+ struct nvhost_master *dev = intr_to_dev(intr);
+ void __iomem *sync_regs = dev->sync_aperture;
+ u32 reg;
+
+ for (reg = 0; reg <= BIT_WORD(dev->info.nb_pts) * REGISTER_STRIDE;
+ reg += REGISTER_STRIDE) {
+ writel(0xffffffffu, sync_regs +
+ host1x_sync_syncpt_thresh_int_disable_r() +
+ reg);
+
+ writel(0xffffffffu, sync_regs +
+ host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
+ }
+}
+
+/**
+ * Sync point threshold interrupt service function
+ * Handles sync point threshold triggers, in interrupt context
+ */
+static void host1x_intr_syncpt_thresh_isr(struct nvhost_intr_syncpt *syncpt)
+{
+ unsigned int id = syncpt->id;
+ struct nvhost_intr *intr = intr_syncpt_to_intr(syncpt);
+
+ void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
+
+ u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
+
+ writel(BIT_MASK(id), sync_regs +
+ host1x_sync_syncpt_thresh_int_disable_r() + reg);
+ writel(BIT_MASK(id), sync_regs +
+ host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
+}
+
+/**
+ * Host general interrupt service function
+ * Handles read / write failures
+ */
+static irqreturn_t host1x_intr_host1x_isr(int irq, void *dev_id)
+{
+ struct nvhost_intr *intr = dev_id;
+ void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
+ u32 stat;
+ u32 ext_stat;
+ u32 addr;
+
+ stat = readl(sync_regs + host1x_sync_hintstatus_r());
+ ext_stat = readl(sync_regs + host1x_sync_hintstatus_ext_r());
+
+ if (host1x_sync_hintstatus_ext_ip_read_int_v(ext_stat)) {
+ addr = readl(sync_regs + host1x_sync_ip_read_timeout_addr_r());
+ pr_err("Host read timeout at address %x\n", addr);
+ }
+
+ if (host1x_sync_hintstatus_ext_ip_write_int_v(ext_stat)) {
+ addr = readl(sync_regs + host1x_sync_ip_write_timeout_addr_r());
+ pr_err("Host write timeout at address %x\n", addr);
+ }
+
+ writel(ext_stat, sync_regs + host1x_sync_hintstatus_ext_r());
+ writel(stat, sync_regs + host1x_sync_hintstatus_r());
+
+ return IRQ_HANDLED;
+}
+static int host1x_intr_request_host_general_irq(struct nvhost_intr *intr)
+{
+ void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
+ int err;
+
+ /* master disable for general (not syncpt) host interrupts */
+ writel(0, sync_regs + host1x_sync_intmask_r());
+
+ /* clear status & extstatus */
+ writel(0xfffffffful, sync_regs + host1x_sync_hintstatus_ext_r());
+ writel(0xfffffffful, sync_regs + host1x_sync_hintstatus_r());
+
+ err = request_irq(intr->host_general_irq, host1x_intr_host1x_isr, 0,
+ "host_status", intr);
+ if (err)
+ return err;
+
+ /* enable extra interrupt sources IP_READ_INT and IP_WRITE_INT */
+ writel(BIT(30) | BIT(31), sync_regs + host1x_sync_hintmask_ext_r());
+
+ /* enable extra interrupt sources */
+ writel(BIT(12) | BIT(31), sync_regs + host1x_sync_hintmask_r());
+
+ /* enable host module interrupt to CPU0 */
+ writel(BIT(0), sync_regs + host1x_sync_intc0mask_r());
+
+ /* master enable for general (not syncpt) host interrupts */
+ writel(BIT(0), sync_regs + host1x_sync_intmask_r());
+
+ return err;
+}
+
+static void host1x_intr_free_host_general_irq(struct nvhost_intr *intr)
+{
+ void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
+
+ /* master disable for general (not syncpt) host interrupts */
+ writel(0, sync_regs + host1x_sync_intmask_r());
+
+ free_irq(intr->host_general_irq, intr);
+}
+
+static int host1x_free_syncpt_irq(struct nvhost_intr *intr)
+{
+ flush_workqueue(intr->wq);
+ return 0;
+}
+
+static const struct nvhost_intr_ops host1x_intr_ops = {
+ .init_host_sync = host1x_intr_init_host_sync,
+ .set_host_clocks_per_usec = host1x_intr_set_host_clocks_per_usec,
+ .set_syncpt_threshold = host1x_intr_set_syncpt_threshold,
+ .enable_syncpt_intr = host1x_intr_enable_syncpt_intr,
+ .disable_syncpt_intr = host1x_intr_disable_syncpt_intr,
+ .disable_all_syncpt_intrs = host1x_intr_disable_all_syncpt_intrs,
+ .request_host_general_irq = host1x_intr_request_host_general_irq,
+ .free_host_general_irq = host1x_intr_free_host_general_irq,
+ .free_syncpt_irq = host1x_free_syncpt_irq,
+};
diff --git a/drivers/video/tegra/host/nvhost_intr.c b/drivers/video/tegra/host/nvhost_intr.c
new file mode 100644
index 0000000..35dd7bb
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_intr.c
@@ -0,0 +1,363 @@
+/*
+ * drivers/video/tegra/host/nvhost_intr.c
+ *
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "nvhost_intr.h"
+#include "nvhost_acm.h"
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include "chip_support.h"
+#include "host1x/host1x.h"
+
+/*** Wait list management ***/
+
+struct nvhost_waitlist {
+ struct list_head list;
+ struct kref refcount;
+ u32 thresh;
+ enum nvhost_intr_action action;
+ atomic_t state;
+ void *data;
+ int count;
+};
+
+enum waitlist_state {
+ WLS_PENDING,
+ WLS_REMOVED,
+ WLS_CANCELLED,
+ WLS_HANDLED
+};
+
+static void waiter_release(struct kref *kref)
+{
+ kfree(container_of(kref, struct nvhost_waitlist, refcount));
+}
+
+/**
+ * add a waiter to a waiter queue, sorted by threshold
+ * returns true if it was added at the head of the queue
+ */
+static bool add_waiter_to_queue(struct nvhost_waitlist *waiter,
+ struct list_head *queue)
+{
+ struct nvhost_waitlist *pos;
+ u32 thresh = waiter->thresh;
+
+ list_for_each_entry_reverse(pos, queue, list)
+ if ((s32)(pos->thresh - thresh) <= 0) {
+ list_add(&waiter->list, &pos->list);
+ return false;
+ }
+
+ list_add(&waiter->list, queue);
+ return true;
+}
+
+/**
+ * run through a waiter queue for a single sync point ID
+ * and gather all completed waiters into lists by actions
+ */
+static void remove_completed_waiters(struct list_head *head, u32 sync,
+ struct list_head completed[NVHOST_INTR_ACTION_COUNT])
+{
+ struct list_head *dest;
+ struct nvhost_waitlist *waiter, *next;
+
+ list_for_each_entry_safe(waiter, next, head, list) {
+ if ((s32)(waiter->thresh - sync) > 0)
+ break;
+
+ dest = completed + waiter->action;
+
+ /* PENDING->REMOVED or CANCELLED->HANDLED */
+ if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
+ list_del(&waiter->list);
+ kref_put(&waiter->refcount, waiter_release);
+ } else {
+ list_move_tail(&waiter->list, dest);
+ }
+ }
+}
+
+void reset_threshold_interrupt(struct nvhost_intr *intr,
+ struct list_head *head,
+ unsigned int id)
+{
+ u32 thresh = list_first_entry(head,
+ struct nvhost_waitlist, list)->thresh;
+
+ intr_op().set_syncpt_threshold(intr, id, thresh);
+ intr_op().enable_syncpt_intr(intr, id);
+}
+
+
+static void action_wakeup(struct nvhost_waitlist *waiter)
+{
+ wait_queue_head_t *wq = waiter->data;
+
+ wake_up(wq);
+}
+
+static void action_wakeup_interruptible(struct nvhost_waitlist *waiter)
+{
+ wait_queue_head_t *wq = waiter->data;
+
+ wake_up_interruptible(wq);
+}
+
+typedef void (*action_handler)(struct nvhost_waitlist *waiter);
+
+static action_handler action_handlers[NVHOST_INTR_ACTION_COUNT] = {
+ action_wakeup,
+ action_wakeup_interruptible,
+};
+
+static void run_handlers(struct list_head completed[NVHOST_INTR_ACTION_COUNT])
+{
+ struct list_head *head = completed;
+ int i;
+
+ for (i = 0; i < NVHOST_INTR_ACTION_COUNT; ++i, ++head) {
+ action_handler handler = action_handlers[i];
+ struct nvhost_waitlist *waiter, *next;
+
+ list_for_each_entry_safe(waiter, next, head, list) {
+ list_del(&waiter->list);
+ handler(waiter);
+ WARN_ON(atomic_xchg(&waiter->state, WLS_HANDLED)
+ != WLS_REMOVED);
+ kref_put(&waiter->refcount, waiter_release);
+ }
+ }
+}
+
+/**
+ * Remove & handle all waiters that have completed for the given syncpt
+ */
+static int process_wait_list(struct nvhost_intr *intr,
+ struct nvhost_intr_syncpt *syncpt,
+ u32 threshold)
+{
+ struct list_head completed[NVHOST_INTR_ACTION_COUNT];
+ unsigned int i;
+ int empty;
+
+ for (i = 0; i < NVHOST_INTR_ACTION_COUNT; ++i)
+ INIT_LIST_HEAD(completed + i);
+
+ spin_lock(&syncpt->lock);
+
+ remove_completed_waiters(&syncpt->wait_head, threshold, completed);
+
+ empty = list_empty(&syncpt->wait_head);
+ if (empty)
+ intr_op().disable_syncpt_intr(intr, syncpt->id);
+ else
+ reset_threshold_interrupt(intr, &syncpt->wait_head,
+ syncpt->id);
+
+ spin_unlock(&syncpt->lock);
+
+ run_handlers(completed);
+
+ return empty;
+}
+
+/*** host syncpt interrupt service functions ***/
+/**
+ * Sync point threshold interrupt service thread function
+ * Handles sync point threshold triggers, in thread context
+ */
+irqreturn_t nvhost_syncpt_thresh_fn(int irq, void *dev_id)
+{
+ struct nvhost_intr_syncpt *syncpt = dev_id;
+ unsigned int id = syncpt->id;
+ struct nvhost_intr *intr = intr_syncpt_to_intr(syncpt);
+ struct nvhost_master *dev = intr_to_dev(intr);
+
+ (void)process_wait_list(intr, syncpt,
+ nvhost_syncpt_update_min(&dev->syncpt, id));
+
+ return IRQ_HANDLED;
+}
+
+/*** host general interrupt service functions ***/
+
+
+/*** Main API ***/
+
+int nvhost_intr_add_action(struct nvhost_intr *intr, u32 id, u32 thresh,
+ enum nvhost_intr_action action, void *data,
+ void *_waiter,
+ void **ref)
+{
+ struct nvhost_waitlist *waiter = _waiter;
+ struct nvhost_intr_syncpt *syncpt;
+ int queue_was_empty;
+
+ if (waiter == NULL) {
+ pr_warn("%s: NULL waiter\n", __func__);
+ return -EINVAL;
+ }
+
+ /* initialize a new waiter */
+ INIT_LIST_HEAD(&waiter->list);
+ kref_init(&waiter->refcount);
+ if (ref)
+ kref_get(&waiter->refcount);
+ waiter->thresh = thresh;
+ waiter->action = action;
+ atomic_set(&waiter->state, WLS_PENDING);
+ waiter->data = data;
+ waiter->count = 1;
+
+ syncpt = intr->syncpt + id;
+
+ spin_lock(&syncpt->lock);
+
+ queue_was_empty = list_empty(&syncpt->wait_head);
+
+ if (add_waiter_to_queue(waiter, &syncpt->wait_head)) {
+ /* added at head of list - new threshold value */
+ intr_op().set_syncpt_threshold(intr, id, thresh);
+
+ /* added as first waiter - enable interrupt */
+ if (queue_was_empty)
+ intr_op().enable_syncpt_intr(intr, id);
+ }
+
+ spin_unlock(&syncpt->lock);
+
+ if (ref)
+ *ref = waiter;
+ return 0;
+}
+
+void *nvhost_intr_alloc_waiter()
+{
+ return kzalloc(sizeof(struct nvhost_waitlist),
+ GFP_KERNEL|__GFP_REPEAT);
+}
+
+void nvhost_intr_put_ref(struct nvhost_intr *intr, u32 id, void *ref)
+{
+ struct nvhost_waitlist *waiter = ref;
+ struct nvhost_intr_syncpt *syncpt;
+ struct nvhost_master *host = intr_to_dev(intr);
+
+ while (atomic_cmpxchg(&waiter->state,
+ WLS_PENDING, WLS_CANCELLED) == WLS_REMOVED)
+ schedule();
+
+ syncpt = intr->syncpt + id;
+ (void)process_wait_list(intr, syncpt,
+ nvhost_syncpt_update_min(&host->syncpt, id));
+
+ kref_put(&waiter->refcount, waiter_release);
+}
+
+
+/*** Init & shutdown ***/
+
+int nvhost_intr_init(struct nvhost_intr *intr, u32 irq_gen, u32 irq_sync)
+{
+ unsigned int id;
+ struct nvhost_intr_syncpt *syncpt;
+ struct nvhost_master *host = intr_to_dev(intr);
+ u32 nb_pts = nvhost_syncpt_nb_pts(&host->syncpt);
+
+ mutex_init(&intr->mutex);
+ intr->host_syncpt_irq_base = irq_sync;
+ intr->wq = create_workqueue("host_syncpt");
+ intr_op().init_host_sync(intr);
+ intr->host_general_irq = irq_gen;
+ intr_op().request_host_general_irq(intr);
+
+ for (id = 0, syncpt = intr->syncpt;
+ id < nb_pts;
+ ++id, ++syncpt) {
+ syncpt->intr = &host->intr;
+ syncpt->id = id;
+ syncpt->irq = irq_sync + id;
+ spin_lock_init(&syncpt->lock);
+ INIT_LIST_HEAD(&syncpt->wait_head);
+ snprintf(syncpt->thresh_irq_name,
+ sizeof(syncpt->thresh_irq_name),
+ "host_sp_%02d", id);
+ }
+
+ return 0;
+}
+
+void nvhost_intr_deinit(struct nvhost_intr *intr)
+{
+ nvhost_intr_stop(intr);
+ destroy_workqueue(intr->wq);
+}
+
+void nvhost_intr_start(struct nvhost_intr *intr, u32 hz)
+{
+ mutex_lock(&intr->mutex);
+
+ intr_op().init_host_sync(intr);
+ intr_op().set_host_clocks_per_usec(intr,
+ (hz + 1000000 - 1)/1000000);
+
+ intr_op().request_host_general_irq(intr);
+
+ mutex_unlock(&intr->mutex);
+}
+
+void nvhost_intr_stop(struct nvhost_intr *intr)
+{
+ unsigned int id;
+ struct nvhost_intr_syncpt *syncpt;
+ u32 nb_pts = nvhost_syncpt_nb_pts(&intr_to_dev(intr)->syncpt);
+
+ mutex_lock(&intr->mutex);
+
+ intr_op().disable_all_syncpt_intrs(intr);
+
+ for (id = 0, syncpt = intr->syncpt;
+ id < nb_pts;
+ ++id, ++syncpt) {
+ struct nvhost_waitlist *waiter, *next;
+ list_for_each_entry_safe(waiter, next,
+ &syncpt->wait_head, list) {
+ if (atomic_cmpxchg(&waiter->state,
+ WLS_CANCELLED, WLS_HANDLED)
+ == WLS_CANCELLED) {
+ list_del(&waiter->list);
+ kref_put(&waiter->refcount, waiter_release);
+ }
+ }
+
+ if (!list_empty(&syncpt->wait_head)) { /* output diagnostics */
+ pr_warn("%s cannot stop syncpt intr id=%d\n",
+ __func__, id);
+ return;
+ }
+ }
+
+ intr_op().free_host_general_irq(intr);
+ intr_op().free_syncpt_irq(intr);
+
+ mutex_unlock(&intr->mutex);
+}
diff --git a/drivers/video/tegra/host/nvhost_intr.h b/drivers/video/tegra/host/nvhost_intr.h
new file mode 100644
index 0000000..31b0a38
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_intr.h
@@ -0,0 +1,102 @@
+/*
+ * drivers/video/tegra/host/nvhost_intr.h
+ *
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_INTR_H
+#define __NVHOST_INTR_H
+
+#include <linux/kthread.h>
+#include <linux/semaphore.h>
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+
+enum nvhost_intr_action {
+ /**
+ * Wake up a task.
+ * 'data' points to a wait_queue_head_t
+ */
+ NVHOST_INTR_ACTION_WAKEUP,
+
+ /**
+ * Wake up an interruptible task.
+ * 'data' points to a wait_queue_head_t
+ */
+ NVHOST_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
+
+ NVHOST_INTR_ACTION_COUNT
+};
+
+struct nvhost_intr;
+
+struct nvhost_intr_syncpt {
+ struct nvhost_intr *intr;
+ u8 id;
+ u16 irq;
+ spinlock_t lock;
+ struct list_head wait_head;
+ char thresh_irq_name[12];
+ struct work_struct work;
+};
+
+struct nvhost_intr {
+ struct nvhost_intr_syncpt *syncpt;
+ struct mutex mutex;
+ int host_general_irq;
+ int host_syncpt_irq_base;
+ struct workqueue_struct *wq;
+};
+#define intr_to_dev(x) container_of(x, struct nvhost_master, intr)
+#define intr_syncpt_to_intr(is) (is->intr)
+
+/**
+ * Schedule an action to be taken when a sync point reaches the given threshold.
+ *
+ * @id the sync point
+ * @thresh the threshold
+ * @action the action to take
+ * @data a pointer to extra data depending on action, see above
+ * @waiter waiter allocated with nvhost_intr_alloc_waiter - assumes ownership
+ * @ref must be passed if cancellation is possible, else NULL
+ *
+ * This is a non-blocking api.
+ */
+int nvhost_intr_add_action(struct nvhost_intr *intr, u32 id, u32 thresh,
+ enum nvhost_intr_action action, void *data,
+ void *waiter,
+ void **ref);
+
+/**
+ * Allocate a waiter.
+ */
+void *nvhost_intr_alloc_waiter(void);
+
+/**
+ * Unreference an action submitted to nvhost_intr_add_action().
+ * You must call this if you passed non-NULL as ref.
+ * @ref the ref returned from nvhost_intr_add_action()
+ */
+void nvhost_intr_put_ref(struct nvhost_intr *intr, u32 id, void *ref);
+
+int nvhost_intr_init(struct nvhost_intr *intr, u32 irq_gen, u32 irq_sync);
+void nvhost_intr_deinit(struct nvhost_intr *intr);
+void nvhost_intr_start(struct nvhost_intr *intr, u32 hz);
+void nvhost_intr_stop(struct nvhost_intr *intr);
+
+irqreturn_t nvhost_syncpt_thresh_fn(int irq, void *dev_id);
+#endif
diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c
index d7c8230..6ef0ba4 100644
--- a/drivers/video/tegra/host/nvhost_syncpt.c
+++ b/drivers/video/tegra/host/nvhost_syncpt.c
@@ -123,6 +123,117 @@ void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id)
}

/**
+ * Update the sync point from hardware, and return true if the syncpoint has
+ * expired, false if we may need to wait
+ */
+static bool syncpt_update_min_is_expired(
+ struct nvhost_syncpt *sp,
+ u32 id,
+ u32 thresh)
+{
+ syncpt_op().update_min(sp, id);
+ return nvhost_syncpt_is_expired(sp, id, thresh);
+}
+
+/**
+ * Main entrypoint for syncpoint value waits.
+ */
+int nvhost_syncpt_wait_timeout(struct nvhost_syncpt *sp, u32 id,
+ u32 thresh, u32 timeout, u32 *value)
+{
+ DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ void *ref;
+ void *waiter;
+ int err = 0, check_count = 0, low_timeout = 0;
+ u32 val;
+
+ if (value)
+ *value = 0;
+
+ /* first check cache */
+ if (nvhost_syncpt_is_expired(sp, id, thresh)) {
+ if (value)
+ *value = nvhost_syncpt_read_min(sp, id);
+ return 0;
+ }
+
+ /* keep host alive */
+ nvhost_module_busy(syncpt_to_dev(sp)->dev);
+
+ /* try to read from register */
+ val = syncpt_op().update_min(sp, id);
+ if (nvhost_syncpt_is_expired(sp, id, thresh)) {
+ if (value)
+ *value = val;
+ goto done;
+ }
+
+ if (!timeout) {
+ err = -EAGAIN;
+ goto done;
+ }
+
+ /* schedule a wakeup when the syncpoint value is reached */
+ waiter = nvhost_intr_alloc_waiter();
+ if (!waiter) {
+ err = -ENOMEM;
+ goto done;
+ }
+
+ err = nvhost_intr_add_action(&(syncpt_to_dev(sp)->intr), id, thresh,
+ NVHOST_INTR_ACTION_WAKEUP_INTERRUPTIBLE, &wq,
+ waiter,
+ &ref);
+ if (err)
+ goto done;
+
+ err = -EAGAIN;
+ /* Caller-specified timeout may be impractically low */
+ if (timeout < SYNCPT_CHECK_PERIOD)
+ low_timeout = timeout;
+
+ /* wait for the syncpoint, or timeout, or signal */
+ while (timeout) {
+ u32 check = min_t(u32, SYNCPT_CHECK_PERIOD, timeout);
+ int remain = wait_event_interruptible_timeout(wq,
+ syncpt_update_min_is_expired(sp, id, thresh),
+ check);
+ if (remain > 0 || nvhost_syncpt_is_expired(sp, id, thresh)) {
+ if (value)
+ *value = nvhost_syncpt_read_min(sp, id);
+ err = 0;
+ break;
+ }
+ if (remain < 0) {
+ err = remain;
+ break;
+ }
+ if (timeout != NVHOST_NO_TIMEOUT)
+ timeout -= check;
+ if (timeout && check_count <= MAX_STUCK_CHECK_COUNT) {
+ dev_warn(&syncpt_to_dev(sp)->dev->dev,
+ "%s: syncpoint id %d (%s) stuck waiting %d, timeout=%d\n",
+ current->comm, id, syncpt_op().name(sp, id),
+ thresh, timeout);
+ syncpt_op().debug(sp);
+ if (check_count == MAX_STUCK_CHECK_COUNT) {
+ if (low_timeout) {
+ dev_warn(&syncpt_to_dev(sp)->dev->dev,
+ "is timeout %d too low?\n",
+ low_timeout);
+ }
+ }
+ check_count++;
+ }
+ }
+ nvhost_intr_put_ref(&(syncpt_to_dev(sp)->intr), id, ref);
+
+done:
+ nvhost_module_idle(syncpt_to_dev(sp)->dev);
+ return err;
+}
+
+/**
* Returns true if syncpoint is expired, false if we may need to wait
*/
bool nvhost_syncpt_is_expired(
diff --git a/drivers/video/tegra/host/nvhost_syncpt.h b/drivers/video/tegra/host/nvhost_syncpt.h
index b883442..dbd3890 100644
--- a/drivers/video/tegra/host/nvhost_syncpt.h
+++ b/drivers/video/tegra/host/nvhost_syncpt.h
@@ -126,6 +126,16 @@ u32 nvhost_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id);

void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id);

+int nvhost_syncpt_wait_timeout(struct nvhost_syncpt *sp, u32 id, u32 thresh,
+ u32 timeout, u32 *value);
+
+static inline int nvhost_syncpt_wait(struct nvhost_syncpt *sp,
+ u32 id, u32 thresh)
+{
+ return nvhost_syncpt_wait_timeout(sp, id, thresh,
+ MAX_SCHEDULE_TIMEOUT, NULL);
+}
+
void nvhost_syncpt_debug(struct nvhost_syncpt *sp);

static inline int nvhost_syncpt_is_valid(struct nvhost_syncpt *sp, u32 id)
diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
index 20ba2a5..745f31c 100644
--- a/include/linux/nvhost.h
+++ b/include/linux/nvhost.h
@@ -35,6 +35,7 @@ struct nvhost_device_power_attr;
#define NVHOST_DEFAULT_CLOCKGATE_DELAY .clockgate_delay = 25
#define NVHOST_NAME_SIZE 24
#define NVSYNCPT_INVALID (-1)
+#define NVHOST_NO_TIMEOUT (-1)

enum nvhost_power_sysfs_attributes {
NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY = 0,
@@ -139,5 +140,6 @@ void host1x_idle(struct platform_device *dev);
u32 host1x_syncpt_incr_max(u32 id, u32 incrs);
void host1x_syncpt_incr(u32 id);
u32 host1x_syncpt_read(u32 id);
+int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value);

#endif
--
1.7.9.5
Sivaram Nair
2012-11-27 11:02:59 UTC
Permalink
On Mon, Nov 26, 2012 at 02:19:08PM +0100, Terje Bergstrom wrote:
> +void nvhost_intr_stop(struct nvhost_intr *intr)
> +{
> + unsigned int id;
> + struct nvhost_intr_syncpt *syncpt;
> + u32 nb_pts = nvhost_syncpt_nb_pts(&intr_to_dev(intr)->syncpt);
> +
> + mutex_lock(&intr->mutex);
> +
> + intr_op().disable_all_syncpt_intrs(intr);
> +
> + for (id = 0, syncpt = intr->syncpt;
> + id < nb_pts;
> + ++id, ++syncpt) {
> + struct nvhost_waitlist *waiter, *next;
> + list_for_each_entry_safe(waiter, next,
> + &syncpt->wait_head, list) {
> + if (atomic_cmpxchg(&waiter->state,
> + WLS_CANCELLED, WLS_HANDLED)
> + == WLS_CANCELLED) {
> + list_del(&waiter->list);
> + kref_put(&waiter->refcount, waiter_release);
> + }
> + }
> +
> + if (!list_empty(&syncpt->wait_head)) { /* output diagnostics */
> + pr_warn("%s cannot stop syncpt intr id=%d\n",
> + __func__, id);
> + return;

mutex_unlock() missing before return.
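
Something along these lines would keep the locking balanced on that
early-return path (untested sketch of just that branch):

	if (!list_empty(&syncpt->wait_head)) { /* output diagnostics */
		pr_warn("%s cannot stop syncpt intr id=%d\n",
			__func__, id);
		mutex_unlock(&intr->mutex);
		return;
	}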

-Sivaram
Thierry Reding
2012-11-29 08:44:00 UTC
Permalink
On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
[...]
> +struct nvhost_intr_ops {
> + void (*init_host_sync)(struct nvhost_intr *);
> + void (*set_host_clocks_per_usec)(
> + struct nvhost_intr *, u32 clocks);
> + void (*set_syncpt_threshold)(
> + struct nvhost_intr *, u32 id, u32 thresh);
> + void (*enable_syncpt_intr)(struct nvhost_intr *, u32 id);
> + void (*disable_syncpt_intr)(struct nvhost_intr *, u32 id);
> + void (*disable_all_syncpt_intrs)(struct nvhost_intr *);
> + int (*request_host_general_irq)(struct nvhost_intr *);
> + void (*free_host_general_irq)(struct nvhost_intr *);
> + int (*free_syncpt_irq)(struct nvhost_intr *);
> +};
> +
> struct nvhost_chip_support {
> const char *soc_name;
> struct nvhost_syncpt_ops syncpt;
> + struct nvhost_intr_ops intr;
> };
>
> struct nvhost_chip_support *nvhost_get_chip_ops(void);
>
> #define syncpt_op() (nvhost_get_chip_ops()->syncpt)
> +#define intr_op() (nvhost_get_chip_ops()->intr)
>
> int nvhost_init_chip_support(struct nvhost_master *host);
>

The same comments apply as for patch 1. Reducing the number of
indirections here makes things a lot easier.

> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
> index 98c9c9f..025a820 100644
> --- a/drivers/video/tegra/host/dev.c
> +++ b/drivers/video/tegra/host/dev.c
> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
> }
> EXPORT_SYMBOL(host1x_syncpt_read);
>
> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)

The choice of data types is odd here. id refers to a syncpt so a better
choice would have been unsigned int because the size of the variable
doesn't actually matter. But as I already said in my reply to patch 1,
these are resources and should therefore better be abstracted through an
opaque pointer anyway.

timeout is usually signed long, so this function should reflect that. As
for the value this is probably fine as it will effectively be set from a
register value. Though you also cache them in software using atomics.
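
To make that concrete, the API could end up looking roughly like this
(hypothetical signatures, just to illustrate the opaque-pointer idea):

	struct host1x_syncpt;

	struct host1x_syncpt *host1x_syncpt_get(struct platform_device *pdev,
						unsigned int id);
	int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
			       long timeout, u32 *value);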

> +static void clock_on_host(struct platform_device *dev)
> +{
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> + struct nvhost_master *host = nvhost_get_private_data(dev);
> + nvhost_intr_start(&host->intr, clk_get_rate(pdata->clk[0]));
> +}
> +
> +static int clock_off_host(struct platform_device *dev)
> +{
> + struct nvhost_master *host = nvhost_get_private_data(dev);
> + nvhost_intr_stop(&host->intr);
> + return 0;
> +}

This is a good example of why these indirections are wasteful. You
constantly need to look up the host pointer just to call another
function on it. With some modifications to the structure layouts it
should be possible to make this a lot more straightforward.

> diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
> index 76748ac..af9bfef 100644
> --- a/drivers/video/tegra/host/host1x/host1x.h
> +++ b/drivers/video/tegra/host/host1x/host1x.h
> @@ -25,6 +25,7 @@
> #include <linux/nvhost.h>
>
> #include "nvhost_syncpt.h"
> +#include "nvhost_intr.h"
>
> #define TRACE_MAX_LENGTH 128U
> #define IFACE_NAME "nvhost"
> @@ -33,6 +34,7 @@ struct nvhost_master {
> void __iomem *aperture;
> void __iomem *sync_aperture;
> struct nvhost_syncpt syncpt;
> + struct nvhost_intr intr;
> struct platform_device *dev;
> struct host1x_device_info info;
> };
> diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
> index d53302d..5bf0e6e 100644
> --- a/drivers/video/tegra/host/host1x/host1x01.c
> +++ b/drivers/video/tegra/host/host1x/host1x01.c
> @@ -26,12 +26,14 @@
> #include "chip_support.h"
>
> #include "host1x/host1x_syncpt.c"
> +#include "host1x/host1x_intr.c"
>
> int nvhost_init_host1x01_support(struct nvhost_master *host,
> struct nvhost_chip_support *op)
> {
> host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
> op->syncpt = host1x_syncpt_ops;
> + op->intr = host1x_intr_ops;
>
> return 0;
> }

Also you need to touch a lot of files just to add this new feature. This
makes maintenance needlessly difficult.

> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c b/drivers/video/tegra/host/host1x/host1x_intr.c
[...]
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/io.h>
> +#include <asm/mach/irq.h>
> +
> +#include "nvhost_intr.h"
> +#include "host1x/host1x.h"
> +
> +/* Spacing between sync registers */
> +#define REGISTER_STRIDE 4

Erm... no. The usual way you should be doing this is either make the
register definitions account for the stride or use accessors that apply
the stride. You should be doing the latter anyway to make accesses. For
example:

static inline void host1x_syncpt_writel(struct host1x *host1x,
unsigned long value,
unsigned long offset)
{
writel(value, host1x->regs + SYNCPT_BASE + offset);
}

static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
unsigned long offset)
{
return readl(host1x->regs + SYNCPT_BASE + offset);
}

Alternatively, if you want to pass the register index instead of the
offset, you can just multiply the offset in that function:

writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));

The same can also be done with the non-syncpt registers.
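
For instance, the threshold write could then become something like this
(illustrative only; host1x_sync_writel() is a hypothetical helper and it
assumes the register definitions are converted to register indices rather
than byte offsets):

	/* helper applies the sync base and the stride internally */
	host1x_sync_writel(host1x, thresh,
			   host1x_sync_syncpt_int_thresh_0_r() + id);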

> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> +{
> + struct nvhost_master *dev = dev_id;
> + void __iomem *sync_regs = dev->sync_aperture;
> + struct nvhost_intr *intr = &dev->intr;
> + unsigned long reg;
> + int i, id;
> +
> + for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
> + reg = readl(sync_regs +
> + host1x_sync_syncpt_thresh_cpu0_int_status_r() +
> + i * REGISTER_STRIDE);
> + for_each_set_bit(id, &reg, BITS_PER_LONG) {
> + struct nvhost_intr_syncpt *sp =
> + intr->syncpt + (i * BITS_PER_LONG + id);
> + host1x_intr_syncpt_thresh_isr(sp);
> + queue_work(intr->wq, &sp->work);
> + }
> + }
> +
> + return IRQ_HANDLED;
> +}

Maybe it would be better to call the syncpt handlers in interrupt
context and let them schedule work if they want to. I'm thinking about
the display controllers which may want to use syncpoints for VBLANK
support.

> +static void host1x_intr_init_host_sync(struct nvhost_intr *intr)
> +{
> + struct nvhost_master *dev = intr_to_dev(intr);
> + void __iomem *sync_regs = dev->sync_aperture;
> + int i, err, irq;
> +
> + writel(0xffffffffUL,
> + sync_regs + host1x_sync_syncpt_thresh_int_disable_r());
> + writel(0xffffffffUL,
> + sync_regs + host1x_sync_syncpt_thresh_cpu0_int_status_r());
> +
> + for (i = 0; i < dev->info.nb_pts; i++)
> + INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
> +
> + irq = platform_get_irq(dev->dev, 0);
> + WARN_ON(IS_ERR_VALUE(irq));
> + err = devm_request_irq(&dev->dev->dev, irq,
> + syncpt_thresh_cascade_isr,
> + IRQF_SHARED, "host_syncpt", dev);
> + WARN_ON(IS_ERR_VALUE(err));

You should be handling this properly and propagate these errors to the
corresponding .probe().
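
For example (rough sketch, assuming init_host_sync() is changed to return
an error code that the caller hands up to .probe()):

	irq = platform_get_irq(dev->dev, 0);
	if (irq < 0)
		return irq;

	err = devm_request_irq(&dev->dev->dev, irq,
			       syncpt_thresh_cascade_isr,
			       IRQF_SHARED, "host_syncpt", dev);
	if (err)
		return err;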

> +/**
> + * Sync point threshold interrupt service function
> + * Handles sync point threshold triggers, in interrupt context
> + */
> +static void host1x_intr_syncpt_thresh_isr(struct nvhost_intr_syncpt *syncpt)
> +{
> + unsigned int id = syncpt->id;
> + struct nvhost_intr *intr = intr_syncpt_to_intr(syncpt);
> +
> + void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
> +
> + u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
> +
> + writel(BIT_MASK(id), sync_regs +
> + host1x_sync_syncpt_thresh_int_disable_r() + reg);
> + writel(BIT_MASK(id), sync_regs +
> + host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
> +}

So this disables all interrupts and is called from the syncpt interrupt
handler. Where are the interrupts reenabled? Do host1x clients have to
do that manually?

> +static int host1x_intr_request_host_general_irq(struct nvhost_intr *intr)
> +{
> + void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
> + int err;
> +
> + /* master disable for general (not syncpt) host interrupts */
> + writel(0, sync_regs + host1x_sync_intmask_r());
> +
> + /* clear status & extstatus */
> + writel(0xfffffffful, sync_regs + host1x_sync_hintstatus_ext_r());
> + writel(0xfffffffful, sync_regs + host1x_sync_hintstatus_r());
> +
> + err = request_irq(intr->host_general_irq, host1x_intr_host1x_isr, 0,
> + "host_status", intr);
> + if (err)
> + return err;
> +
> + /* enable extra interrupt sources IP_READ_INT and IP_WRITE_INT */
> + writel(BIT(30) | BIT(31), sync_regs + host1x_sync_hintmask_ext_r());
> +
> + /* enable extra interrupt sources */
> + writel(BIT(12) | BIT(31), sync_regs + host1x_sync_hintmask_r());
> +
> + /* enable host module interrupt to CPU0 */
> + writel(BIT(0), sync_regs + host1x_sync_intc0mask_r());
> +
> + /* master enable for general (not syncpt) host interrupts */
> + writel(BIT(0), sync_regs + host1x_sync_intmask_r());
> +
> + return err;
> +}

You should add defines for these bits, which will likely make the
comments redundant.
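
For example (the names below are made up, purely to illustrate):

	#define HOST1X_SYNC_HINTMASK_EXT_IP_READ_INT	BIT(30)
	#define HOST1X_SYNC_HINTMASK_EXT_IP_WRITE_INT	BIT(31)

	writel(HOST1X_SYNC_HINTMASK_EXT_IP_READ_INT |
	       HOST1X_SYNC_HINTMASK_EXT_IP_WRITE_INT,
	       sync_regs + host1x_sync_hintmask_ext_r());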

> diff --git a/drivers/video/tegra/host/nvhost_intr.c b/drivers/video/tegra/host/nvhost_intr.c
[...]
> +void reset_threshold_interrupt(struct nvhost_intr *intr,
> + struct list_head *head,
> + unsigned int id)
> +{
> + u32 thresh = list_first_entry(head,
> + struct nvhost_waitlist, list)->thresh;
> +
> + intr_op().set_syncpt_threshold(intr, id, thresh);
> + intr_op().enable_syncpt_intr(intr, id);
> +}

Okay, so this is where syncpoint interrupts are reenabled. The rest of
this whole wait queue code looks overly complex. I'll need to go through
that separately and more thoroughly.

> +void *nvhost_intr_alloc_waiter()
> +{
> + return kzalloc(sizeof(struct nvhost_waitlist),
> + GFP_KERNEL|__GFP_REPEAT);
> +}

I don't think we need __GFP_REPEAT here.

> +/*** Init & shutdown ***/
> +
> +int nvhost_intr_init(struct nvhost_intr *intr, u32 irq_gen, u32 irq_sync)

Again, using u32 for interrupt numbers is unusual.

> +{
> + unsigned int id;
> + struct nvhost_intr_syncpt *syncpt;
> + struct nvhost_master *host = intr_to_dev(intr);
> + u32 nb_pts = nvhost_syncpt_nb_pts(&host->syncpt);
> +
> + mutex_init(&intr->mutex);
> + intr->host_syncpt_irq_base = irq_sync;
> + intr->wq = create_workqueue("host_syncpt");

What if create_workqueue() fails?

> + intr_op().init_host_sync(intr);
> + intr->host_general_irq = irq_gen;
> + intr_op().request_host_general_irq(intr);
> +
> + for (id = 0, syncpt = intr->syncpt;
> + id < nb_pts;
> + ++id, ++syncpt) {

This fits perfectly well on a single line, no need to wrap it. Also you
could instead of incrementing syncpt, move it into the loop and assign
it based on id.

for (id = 0; id < nb_pts; id++) {
struct nvhost_intr_syncpt *syncpt = &intr->syncpt[id];
...
}

> +void nvhost_intr_start(struct nvhost_intr *intr, u32 hz)
> +{
> + mutex_lock(&intr->mutex);
> +
> + intr_op().init_host_sync(intr);
> + intr_op().set_host_clocks_per_usec(intr,
> + (hz + 1000000 - 1)/1000000);

DIV_ROUND_UP(hz)?
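
That is:

	intr_op().set_host_clocks_per_usec(intr, DIV_ROUND_UP(hz, 1000000));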

> diff --git a/drivers/video/tegra/host/nvhost_syncpt.h b/drivers/video/tegra/host/nvhost_syncpt.h
[...]
> index b883442..dbd3890 100644
> --- a/drivers/video/tegra/host/nvhost_syncpt.h
> +++ b/drivers/video/tegra/host/nvhost_syncpt.h
> @@ -126,6 +126,16 @@ u32 nvhost_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id);
>
> void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id);
>
> +int nvhost_syncpt_wait_timeout(struct nvhost_syncpt *sp, u32 id, u32 thresh,
> + u32 timeout, u32 *value);
> +
> +static inline int nvhost_syncpt_wait(struct nvhost_syncpt *sp,
> + u32 id, u32 thresh)
> +{
> + return nvhost_syncpt_wait_timeout(sp, id, thresh,
> + MAX_SCHEDULE_TIMEOUT, NULL);
> +}
> +
> void nvhost_syncpt_debug(struct nvhost_syncpt *sp);
>
> static inline int nvhost_syncpt_is_valid(struct nvhost_syncpt *sp, u32 id)
> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
> index 20ba2a5..745f31c 100644
> --- a/include/linux/nvhost.h
> +++ b/include/linux/nvhost.h
> @@ -35,6 +35,7 @@ struct nvhost_device_power_attr;
> #define NVHOST_DEFAULT_CLOCKGATE_DELAY .clockgate_delay = 25
> #define NVHOST_NAME_SIZE 24
> #define NVSYNCPT_INVALID (-1)
> +#define NVHOST_NO_TIMEOUT (-1)

Couldn't you reuse MAX_SCHEDULE_TIMEOUT instead? You already use it as
the value for the timeout parameter in nvhost_syncpt_wait().

Thierry
Terje Bergström
2012-11-29 10:39:23 UTC
Permalink
On 29.11.2012 10:44, Thierry Reding wrote:
>> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
>> index 98c9c9f..025a820 100644
>> --- a/drivers/video/tegra/host/dev.c
>> +++ b/drivers/video/tegra/host/dev.c
>> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
>> }
>> EXPORT_SYMBOL(host1x_syncpt_read);
>>
>> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
>
> The choice of data types is odd here. id refers to a syncpt so a better
> choice would have been unsigned int because the size of the variable
> doesn't actually matter. But as I already said in my reply to patch 1,
> these are resources and should therefore better be abstracted through an
> opaque pointer anyway.
>
> timeout is usually signed long, so this function should reflect that. As
> for the value this is probably fine as it will effectively be set from a
> register value. Though you also cache them in software using atomics.

32-bits is an architectural limit for the sync point id, so that's why I
used it here. But you're right - it doesn't really matter and could be
changed to unsigned long.

thresh and *value reflects that sync point value is 32-bit, and I'd keep
that as is.

Timeout should be unsigned long, yes.

>> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c b/drivers/video/tegra/host/host1x/host1x_intr.c
> [...]
>> +#include <linux/interrupt.h>
>> +#include <linux/irq.h>
>> +#include <linux/io.h>
>> +#include <asm/mach/irq.h>
>> +
>> +#include "nvhost_intr.h"
>> +#include "host1x/host1x.h"
>> +
>> +/* Spacing between sync registers */
>> +#define REGISTER_STRIDE 4
>
> Erm... no. The usual way you should be doing this is either make the
> register definitions account for the stride or use accessors that apply
> the stride. You should be doing the latter anyway to make accesses. For
> example:
>
> static inline void host1x_syncpt_writel(struct host1x *host1x,
> unsigned long value,
> unsigned long offset)
> {
> writel(value, host1x->regs + SYNCPT_BASE + offset);
> }
>
> static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> unsigned long offset)
> {
> return readl(host1x->regs + SYNCPT_BASE + offset);
> }
>
> Alternatively, if you want to pass the register index instead of the
> offset, you can use just multiply the offset in that function:
>
> writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
>
> The same can also be done with the non-syncpt registers.

The register number has a stride of 4 when doing writes, and 1 when
adding to command streams. This is why I've kept the register
definitions as is.

I could add helper functions. Just as a side note, the sync register
space has other definitions than just the syncpt registers, so the
naming should be changed a bit.

>> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
>> +{
>> + struct nvhost_master *dev = dev_id;
>> + void __iomem *sync_regs = dev->sync_aperture;
>> + struct nvhost_intr *intr = &dev->intr;
>> + unsigned long reg;
>> + int i, id;
>> +
>> + for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
>> + reg = readl(sync_regs +
>> + host1x_sync_syncpt_thresh_cpu0_int_status_r() +
>> + i * REGISTER_STRIDE);
>> + for_each_set_bit(id, &reg, BITS_PER_LONG) {
>> + struct nvhost_intr_syncpt *sp =
>> + intr->syncpt + (i * BITS_PER_LONG + id);
>> + host1x_intr_syncpt_thresh_isr(sp);
>> + queue_work(intr->wq, &sp->work);
>> + }
>> + }
>> +
>> + return IRQ_HANDLED;
>> +}
>
> Maybe it would be better to call the syncpt handlers in interrupt
> context and let them schedule work if they want to. I'm thinking about
> the display controllers which may want to use syncpoints for VBLANK
> support.

Display controller can use the APIs to read, increment and wait for sync
point.

We could do more in isr, but then again, we've noticed that the current
design already gives pretty good latency, so we haven't seen the need to
move code from thread to isr.

>> +static void host1x_intr_init_host_sync(struct nvhost_intr *intr)
>> +{
>> + struct nvhost_master *dev = intr_to_dev(intr);
>> + void __iomem *sync_regs = dev->sync_aperture;
>> + int i, err, irq;
>> +
>> + writel(0xffffffffUL,
>> + sync_regs + host1x_sync_syncpt_thresh_int_disable_r());
>> + writel(0xffffffffUL,
>> + sync_regs + host1x_sync_syncpt_thresh_cpu0_int_status_r());
>> +
>> + for (i = 0; i < dev->info.nb_pts; i++)
>> + INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
>> +
>> + irq = platform_get_irq(dev->dev, 0);
>> + WARN_ON(IS_ERR_VALUE(irq));
>> + err = devm_request_irq(&dev->dev->dev, irq,
>> + syncpt_thresh_cascade_isr,
>> + IRQF_SHARED, "host_syncpt", dev);
>> + WARN_ON(IS_ERR_VALUE(err));
>
> You should be handling this properly and propagate these errors to the
> corresponding .probe().

Yes, will do. And the strange part is that nvhost_intr actually already
contains the irq number, so there's no need to retrieve it from
platform_device.
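
So, once the error handling is propagated, host1x_intr_init_host_sync()
could just do something like (sketch):

	err = devm_request_irq(&dev->dev->dev, intr->host_syncpt_irq_base,
			       syncpt_thresh_cascade_isr,
			       IRQF_SHARED, "host_syncpt", dev);
	if (err)
		return err;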

>> +/**
>> + * Sync point threshold interrupt service function
>> + * Handles sync point threshold triggers, in interrupt context
>> + */
>> +static void host1x_intr_syncpt_thresh_isr(struct nvhost_intr_syncpt *syncpt)
>> +{
>> + unsigned int id = syncpt->id;
>> + struct nvhost_intr *intr = intr_syncpt_to_intr(syncpt);
>> +
>> + void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
>> +
>> + u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
>> +
>> + writel(BIT_MASK(id), sync_regs +
>> + host1x_sync_syncpt_thresh_int_disable_r() + reg);
>> + writel(BIT_MASK(id), sync_regs +
>> + host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
>> +}
>
> So this disables all interrupts and is called from the syncpt interrupt
> handler. Where are the interrupts reenabled? Do host1x clients have to
> do that manually?

The thread re-enables once it's done. It checks the next value we're
interested in, and programs that to host1x syncpt threshold.


>> + /* enable extra interrupt sources IP_READ_INT and IP_WRITE_INT */
>> + writel(BIT(30) | BIT(31), sync_regs + host1x_sync_hintmask_ext_r());
>> +
>> + /* enable extra interrupt sources */
>> + writel(BIT(12) | BIT(31), sync_regs + host1x_sync_hintmask_r());
>> +
>> + /* enable host module interrupt to CPU0 */
>> + writel(BIT(0), sync_regs + host1x_sync_intc0mask_r());
>> +
>> + /* master enable for general (not syncpt) host interrupts */
>> + writel(BIT(0), sync_regs + host1x_sync_intmask_r());
>> +
>> + return err;
>> +}
>
> You should add defines for these bits, which will likely make the
> comments redundant.

I'm actually thinking that I might just remove the generic interrupts.
We have no use for them in the upstream kernel.

> Okay, so this is where syncpoint interrupts are reenabled. The rest of
> this whole wait queue code looks overly complex. I'll need to go through
> that separately and more thoroughly.

Thanks.

>> +void *nvhost_intr_alloc_waiter()
>> +{
>> + return kzalloc(sizeof(struct nvhost_waitlist),
>> + GFP_KERNEL|__GFP_REPEAT);
>> +}
>
> I don't think we need __GFP_REPEAT here.

This used to be called from code where a failed allocation was fatal, but
that's no longer the case, so __GFP_REPEAT isn't needed anymore.
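
So the allocation can simply become:

	return kzalloc(sizeof(struct nvhost_waitlist), GFP_KERNEL);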

>> +/*** Init & shutdown ***/
>> +
>> +int nvhost_intr_init(struct nvhost_intr *intr, u32 irq_gen, u32 irq_sync)
>
> Again, using u32 for interrupt numbers is unusual.

Ok.

>> + mutex_init(&intr->mutex);
>> + intr->host_syncpt_irq_base = irq_sync;
>> + intr->wq = create_workqueue("host_syncpt");
>
> What if create_workqueue() fails?

Hmm, we panic? Not good.
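
A simple check in nvhost_intr_init() should be enough (sketch):

	intr->wq = create_workqueue("host_syncpt");
	if (!intr->wq)
		return -ENOMEM;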

>
>> + intr_op().init_host_sync(intr);
>> + intr->host_general_irq = irq_gen;
>> + intr_op().request_host_general_irq(intr);
>> +
>> + for (id = 0, syncpt = intr->syncpt;
>> + id < nb_pts;
>> + ++id, ++syncpt) {
>
> This fits perfectly well on a single line, no need to wrap it. Also you
> could instead of incrementing syncpt, move it into the loop and assign
> it based on id.
>
> for (id = 0; id < nb_pts; id++) {
> struct nvhost_intr_syncpt *syncpt = &intr->syncpt[id];
> ...
> }

Looks better, yes.

>> +void nvhost_intr_start(struct nvhost_intr *intr, u32 hz)
>> +{
>> + mutex_lock(&intr->mutex);
>> +
>> + intr_op().init_host_sync(intr);
>> + intr_op().set_host_clocks_per_usec(intr,
>> + (hz + 1000000 - 1)/1000000);
>
> DIV_ROUND_UP(hz)?

Yes, that's what we should use. I didn't know we had that.

>> +#define NVHOST_NO_TIMEOUT (-1)
>
> Couldn't you reuse MAX_SCHEDULE_TIMEOUT instead? You already use it as
> the value for the timeout parameter in nvhost_syncpt_wait().

I guess NVHOST_NO_TIMEOUT is a bad idea anyway, and I could just remove
it. The caller can just pass LONG_MAX if it wants to wait for a _really_
long time, but having no timeout is not good.

Terje
Terje Bergström
2012-11-30 07:41:46 UTC
Permalink
Just replying to part of your mail.

On 30.11.2012 09:22, Thierry Reding wrote:
> Actually for the display controller we want just a notification when the
> VBLANK happens. I'm not sure if we want to do that with syncpoints at
> all since it works quite well using regular interrupts.

VBLANK isn't actually a very good example of dc's use of sync points.
That can easily be done with regular interrupts, as you mention.

More important is the case where double buffering is enabled. When you draw
something to a surface and flip it to the display, you want the DC to notify
you when the flip has been done so that rendering can continue to the back
buffer.

So, what you can do is return a fence from the DC when initiating a flip,
and place that fence into the 2D stream as a host wait so that 2D will
patiently wait for the buffer to become free before it renders.
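
In rough pseudo-code the flow would be something like this (none of these
calls exist, they're just here to illustrate the idea):

	/* DC: kick off the flip and get back a fence (syncpt id + value) */
	fence = dc_flip(dc, new_front);				/* hypothetical */

	/* 2D: make the channel wait for that fence before rendering again */
	stream_push_host_wait(stream, fence.id, fence.value);	/* hypothetical */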

> What I'm proposing is to leave it up to each host1x client how they want
> to handle this. For display controllers it may be enough to have their
> callback run in interrupt context but other clients may need to do more
> work so they can queue it themselves.

DC doesn't need to worry about host1x interrupts at all. It's all
internal to the host1x driver, so we're now just talking about the
internal implementation of host1x.

We have two scenarios for the syncpt interrupts. One is that a job got
finished and we need to clean up the queue and free up resources. This
must be done in threads. The other is releasing a thread that is blocked by
a syncpt wait.

It's simpler if both of these are handled with the same infrastructure,
and we've shown that latency is very good even if we handle all events
in a thread.

> I know that this looks like it might be more work, but if it turns out
> that many drivers need to do the exact same thing, that functionality
> can be factored out into a helper. But it may just as well turn out that
> the requirements for each module are slightly different that forcing a
> workqueue on them could result in ugly workarounds because it doesn't
> quite work for them.

This is all internal to the driver, so there's no need for other drivers to
access this part.

> If we move responsibility of managing the workqueue out of host1x as I
> proposed above, maybe a lot of this code can be removed. Maybe you can
> explain a bit what they are used for exactly in your write-up.

It's going to be a big bad boy. :-)

Terje
Thierry Reding
2012-11-30 07:22:00 UTC
Permalink
On Thu, Nov 29, 2012 at 12:39:23PM +0200, Terje Bergström wrote:
> On 29.11.2012 10:44, Thierry Reding wrote:
> >> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
> >> index 98c9c9f..025a820 100644
> >> --- a/drivers/video/tegra/host/dev.c
> >> +++ b/drivers/video/tegra/host/dev.c
> >> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
> >> }
> >> EXPORT_SYMBOL(host1x_syncpt_read);
> >>
> >> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
> >
> > The choice of data types is odd here. id refers to a syncpt so a better
> > choice would have been unsigned int because the size of the variable
> > doesn't actually matter. But as I already said in my reply to patch 1,
> > these are resources and should therefore better be abstracted through an
> > opaque pointer anyway.
> >
> > timeout is usually signed long, so this function should reflect that. As
> > for the value this is probably fine as it will effectively be set from a
> > register value. Though you also cache them in software using atomics.
>
> 32-bits is an architectural limit for the sync point id, so that's why I
> used it here.

But given that there are only 32 syncpoints they look rather costly, so
I don't expect more than a few hundred to ever be used in hardware,
right?

> But you're right - it doesn't really matter and could be changed to
> unsigned long.

I'd still opt for unsigned int. For no other reason than that it is how
other types of resources are enumerated.

> thresh and *value reflects that sync point value is 32-bit, and I'd keep
> that as is.

Yes, that makes sense.

> Timeout should be unsigned long, yes.

It should actually be signed long to match the type used for timeouts in
the various wait_*() functions.

> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +#include <linux/interrupt.h>
> >> +#include <linux/irq.h>
> >> +#include <linux/io.h>
> >> +#include <asm/mach/irq.h>
> >> +
> >> +#include "nvhost_intr.h"
> >> +#include "host1x/host1x.h"
> >> +
> >> +/* Spacing between sync registers */
> >> +#define REGISTER_STRIDE 4
> >
> > Erm... no. The usual way you should be doing this is either make the
> > register definitions account for the stride or use accessors that apply
> > the stride. You should be doing the latter anyway to make accesses. For
> > example:
> >
> > static inline void host1x_syncpt_writel(struct host1x *host1x,
> > unsigned long value,
> > unsigned long offset)
> > {
> > writel(value, host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> > unsigned long offset)
> > {
> > return readl(host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > Alternatively, if you want to pass the register index instead of the
> > offset, you can use just multiply the offset in that function:
> >
> > writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> >
> > The same can also be done with the non-syncpt registers.
>
> The register number has a stride of 4 when doing writes, and 1 when
> adding to command streams. This is why I've kept the register
> definitions as is.

Yes, that's why it makes sense to use such helpers. It allows you to
reuse the register definitions for both direct and indirect access but
doesn't require you to repeat the stride multiplication every time.

> I could add helper functions. Just as a side note, the sync register
> space has other definitions than just the syncpt registers, so the
> naming should be changed a bit.

The TRM refers to them as SYNC registers, so SYNC_BASE should be fine.

> >> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> >> +{
> >> + struct nvhost_master *dev = dev_id;
> >> + void __iomem *sync_regs = dev->sync_aperture;
> >> + struct nvhost_intr *intr = &dev->intr;
> >> + unsigned long reg;
> >> + int i, id;
> >> +
> >> + for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
> >> + reg = readl(sync_regs +
> >> + host1x_sync_syncpt_thresh_cpu0_int_status_r() +
> >> + i * REGISTER_STRIDE);
> >> + for_each_set_bit(id, &reg, BITS_PER_LONG) {
> >> + struct nvhost_intr_syncpt *sp =
> >> + intr->syncpt + (i * BITS_PER_LONG + id);
> >> + host1x_intr_syncpt_thresh_isr(sp);
> >> + queue_work(intr->wq, &sp->work);
> >> + }
> >> + }
> >> +
> >> + return IRQ_HANDLED;
> >> +}
> >
> > Maybe it would be better to call the syncpt handlers in interrupt
> > context and let them schedule work if they want to. I'm thinking about
> > the display controllers which may want to use syncpoints for VBLANK
> > support.
>
> Display controller can use the APIs to read, increment and wait for sync
> point.

Actually for the display controller we want just a notification when the
VBLANK happens. I'm not sure if we want to do that with syncpoints at
all since it works quite well using regular interrupts.

> We could do more in isr, but then again, we've noticed that the current
> design already gives pretty good latency, so we haven't seen the need to
> move code from thread to isr.

What I'm proposing is to leave it up to each host1x client how they want
to handle this. For display controllers it may be enough to have their
callback run in interrupt context but other clients may need to do more
work so they can queue it themselves.

I know that this looks like it might be more work, but if it turns out
that many drivers need to do the exact same thing, that functionality
can be factored out into a helper. But it may just as well turn out that
the requirements for each module are different enough that forcing a
workqueue on them could result in ugly workarounds because it doesn't
quite work for them.

> >> +/**
> >> + * Sync point threshold interrupt service function
> >> + * Handles sync point threshold triggers, in interrupt context
> >> + */
> >> +static void host1x_intr_syncpt_thresh_isr(struct nvhost_intr_syncpt *syncpt)
> >> +{
> >> + unsigned int id = syncpt->id;
> >> + struct nvhost_intr *intr = intr_syncpt_to_intr(syncpt);
> >> +
> >> + void __iomem *sync_regs = intr_to_dev(intr)->sync_aperture;
> >> +
> >> + u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
> >> +
> >> + writel(BIT_MASK(id), sync_regs +
> >> + host1x_sync_syncpt_thresh_int_disable_r() + reg);
> >> + writel(BIT_MASK(id), sync_regs +
> >> + host1x_sync_syncpt_thresh_cpu0_int_status_r() + reg);
> >> +}
> >
> > So this disables all interrupts and is called from the syncpt interrupt
> > handler. Where are the interrupts reenabled? Do host1x clients have to
> > do that manually?
>
> The thread re-enables once it's done. It checks the next value we're
> interested in, and programs that to host1x syncpt threshold.

Okay, that does make sense now. I think I'm indeed beginning to
understand how the hardware works...
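
So in the interrupt thread the tail end is roughly (pseudo-code, the helper
names here are made up, not the ones from the patch):

        if (more_waiters_pending) {
                /* program the next value we care about, then re-enable */
                set_syncpt_threshold(sync_regs, id, next_thresh);
                enable_syncpt_intr(sync_regs, id);
        }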

> > Okay, so this is where syncpoint interrupts are reenabled. The rest of
> > this whole wait queue code looks overly complex. I'll need to go through
> > that separately and more thoroughly.
>
> Thanks.

If we move the responsibility of managing the workqueue out of host1x as
I proposed above, maybe a lot of this code can be removed. Maybe you can
also explain a bit what the wait queues are used for exactly in your
write-up.

Thierry
Stephen Warren
2012-11-29 18:41:50 UTC
On 11/29/2012 01:44 AM, Thierry Reding wrote:
> On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:

>> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
>> b/drivers/video/tegra/host/host1x/host1x_intr.c
> [...]
>> +/* Spacing between sync registers */
>> +#define REGISTER_STRIDE 4
>
> Erm... no. The usual way you should be doing this is either make
> the register definitions account for the stride or use accessors
> that apply the stride. You should be doing the latter anyway to
> make accesses. For example:
>
> static inline void host1x_syncpt_writel(struct host1x *host1x,
>                                         unsigned long value,
>                                         unsigned long offset)
> {
>         writel(value, host1x->regs + SYNCPT_BASE + offset);
> }
>
> static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
>                                                 unsigned long offset)
> {
>         return readl(host1x->regs + SYNCPT_BASE + offset);
> }
>
> Alternatively, if you want to pass the register index instead of the
> offset, you can just multiply the offset in that function:
>
> writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
>
> The same can also be done with the non-syncpt registers.

It seems like reasonable documentation to replace "<< 2" with "*
REGISTER_STRIDE" here.
Thierry Reding
2012-11-30 07:23:57 UTC
On Thu, Nov 29, 2012 at 11:41:50AM -0700, Stephen Warren wrote:
> On 11/29/2012 01:44 AM, Thierry Reding wrote:
> > On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
>
> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
> >> b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +/* Spacing between sync registers */
> >> +#define REGISTER_STRIDE 4
> >
> > Erm... no. The usual way you should be doing this is either make
> > the register definitions account for the stride or use accessors
> > that apply the stride. You should be doing the latter anyway to
> > make accesses. For example:
> >
> > static inline void host1x_syncpt_writel(struct host1x *host1x,
> >                                         unsigned long value,
> >                                         unsigned long offset)
> > {
> >         writel(value, host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> >                                                 unsigned long offset)
> > {
> >         return readl(host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > Alternatively, if you want to pass the register index instead of the
> > offset, you can just multiply the offset in that function:
> >
> > writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> >
> > The same can also be done with the non-syncpt registers.
>
> It seems like reasonable documentation to replace "<< 2" with "*
> REGISTER_STRIDE" here.

Given that it is a very common pattern, << 2 seems enough documentation
to me, but sure, if you prefer to be extra explicit that's fine with me.

Thierry
Terje Bergstrom
2012-11-26 13:19:13 UTC
From: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>

This patch introduces support for exporting allocated memory as
dmabuf objects. Exported buffers are used for delivering data to
the nvhost driver.

Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>
Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
---
drivers/gpu/drm/tegra/Makefile | 2 +-
drivers/gpu/drm/tegra/dmabuf.c | 150 ++++++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/tegra/drm.c | 9 ++-
drivers/gpu/drm/tegra/drm.h | 6 ++
4 files changed, 165 insertions(+), 2 deletions(-)
create mode 100644 drivers/gpu/drm/tegra/dmabuf.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 57a334d..53ea383 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -3,6 +3,6 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG

tegra-drm-y := drm.o fb.o dc.o
tegra-drm-y += output.o rgb.o hdmi.o tvo.o dsi.o
-tegra-drm-y += plane.o
+tegra-drm-y += plane.o dmabuf.o

obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/dmabuf.c b/drivers/gpu/drm/tegra/dmabuf.c
new file mode 100644
index 0000000..e81db12
--- /dev/null
+++ b/drivers/gpu/drm/tegra/dmabuf.c
@@ -0,0 +1,150 @@
+/*
+ * drivers/gpu/drm/tegra/dmabuf.c
+ *
+ * dmabuf exporter for cma allocations
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm.h>
+#include <drm/drm_gem_cma_helper.h>
+
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/scatterlist.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/dma-buf.h>
+
+#include <asm/page.h>
+#include <asm/dma-iommu.h>
+
+static struct sg_table *tegra_dmabuf_map(struct dma_buf_attachment *attach,
+ enum dma_data_direction dir)
+{
+ struct drm_gem_cma_object *obj = attach->dmabuf->priv;
+ struct page **pages;
+ int npages = obj->base.size / PAGE_SIZE;
+ struct sg_table *sgt;
+ struct scatterlist *sg;
+ int i, page_num = 0;
+
+ /* create a list of used pages */
+ pages = kzalloc(sizeof(*pages) * npages, GFP_KERNEL);
+ if (!pages)
+ goto err_alloc_pages;
+ for (i = 0; i < npages; i++)
+ pages[i] = virt_to_page(obj->vaddr + i * PAGE_SIZE);
+
+ /* generate sgt using the page list */
+ sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+ if (!sgt)
+ goto err_alloc_sgt;
+ if (sg_alloc_table_from_pages(sgt, pages, npages, 0, obj->base.size,
+ GFP_KERNEL))
+ goto err_generate_sgt;
+ for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+ sg->dma_address = page_to_phys(pages[page_num]);
+ page_num += sg->length >> PAGE_SHIFT;
+ }
+
+ /* only the sgt is interesting */
+ kfree(pages);
+
+ return sgt;
+
+err_generate_sgt:
+ kfree(sgt);
+err_alloc_sgt:
+ kfree(pages);
+err_alloc_pages:
+ return NULL;
+}
+
+static void tegra_dmabuf_unmap(struct dma_buf_attachment *attach,
+ struct sg_table *sgt,
+ enum dma_data_direction dir)
+{
+ sg_free_table(sgt);
+ kfree(sgt);
+}
+
+static void tegra_dmabuf_release(struct dma_buf *dmabuf)
+{
+ struct drm_gem_cma_object *obj = dmabuf->priv;
+
+ if (obj->base.export_dma_buf == dmabuf) {
+ obj->base.export_dma_buf = NULL;
+ drm_gem_object_unreference_unlocked(&obj->base);
+ }
+}
+
+static void *tegra_dmabuf_kmap_atomic(struct dma_buf *dmabuf,
+ unsigned long page_num)
+{
+ struct drm_gem_cma_object *obj = dmabuf->priv;
+ return obj->vaddr + PAGE_SIZE * page_num;
+}
+
+static void *tegra_dmabuf_kmap(struct dma_buf *dmabuf, unsigned long page_num)
+{
+ return tegra_dmabuf_kmap_atomic(dmabuf, page_num);
+}
+
+static int tegra_dmabuf_mmap(struct dma_buf *dmabuf,
+ struct vm_area_struct *vma)
+{
+ struct drm_gem_cma_object *obj = dmabuf->priv;
+ DEFINE_DMA_ATTRS(attrs);
+
+ vma->vm_private_data = obj;
+
+ return dma_mmap_attrs(obj->base.dev->dev->parent, vma, obj->vaddr,
+ obj->paddr, obj->base.size, &attrs);
+}
+
+static void *tegra_dmabuf_vmap(struct dma_buf *dmabuf)
+{
+ struct drm_gem_cma_object *obj = dmabuf->priv;
+ return obj->vaddr;
+}
+
+static struct dma_buf_ops tegra_dmabuf_ops = {
+ .map_dma_buf = tegra_dmabuf_map,
+ .unmap_dma_buf = tegra_dmabuf_unmap,
+ .release = tegra_dmabuf_release,
+ .kmap_atomic = tegra_dmabuf_kmap_atomic,
+ .kmap = tegra_dmabuf_kmap,
+ .mmap = tegra_dmabuf_mmap,
+ .vmap = tegra_dmabuf_vmap,
+};
+
+struct dma_buf *tegra_dmabuf_export(struct drm_device *drm_dev,
+ struct drm_gem_object *obj, int flags)
+{
+ return dma_buf_export(obj, &tegra_dmabuf_ops, obj->size, O_RDWR);
+}
+
+struct drm_gem_object *tegra_dmabuf_import(struct drm_device *drm_dev,
+ struct dma_buf *dmabuf)
+{
+ return NULL;
+}
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index cba2d1d..f78a31b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -251,7 +251,8 @@ static const struct file_operations tegra_drm_fops = {
};

struct drm_driver tegra_drm_driver = {
- .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM,
+ .driver_features = DRIVER_BUS_PLATFORM | DRIVER_MODESET | DRIVER_GEM |
+ DRIVER_PRIME,
.load = tegra_drm_load,
.unload = tegra_drm_unload,
.open = tegra_drm_open,
@@ -267,6 +268,12 @@ struct drm_driver tegra_drm_driver = {
.num_ioctls = ARRAY_SIZE(tegra_drm_ioctls),
.fops = &tegra_drm_fops,

+ .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
+ .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+
+ .gem_prime_export = tegra_dmabuf_export,
+ .gem_prime_import = tegra_dmabuf_import,
+
.name = DRIVER_NAME,
.desc = DRIVER_DESC,
.date = DRIVER_DATE,
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index b2f9f10..1267a38 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -227,4 +227,10 @@ extern struct platform_driver tegra_dsi_driver;
extern struct platform_driver tegra_dc_driver;
extern struct drm_driver tegra_drm_driver;

+/* from dmabuf.c */
+struct dma_buf *tegra_dmabuf_export(struct drm_device *drm_dev,
+ struct drm_gem_object *obj, int flags);
+struct drm_gem_object *tegra_dmabuf_import(struct drm_device *drm_dev,
+ struct dma_buf *dmabuf);
+
#endif /* TEGRA_DRM_H */
--
1.7.9.5
Terje Bergstrom
2012-11-26 13:19:09 UTC
Add support for host1x client modules, and for host1x channels to submit
work to the clients. The work is submitted in dmabuf buffers, so add
support for dmabuf memory management, too.

Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
---
drivers/video/tegra/host/Makefile | 8 +-
drivers/video/tegra/host/bus_client.c | 94 ++++
drivers/video/tegra/host/chip_support.c | 2 +-
drivers/video/tegra/host/chip_support.h | 64 +++
drivers/video/tegra/host/dev.c | 67 +++
drivers/video/tegra/host/dev.h | 33 ++
drivers/video/tegra/host/dmabuf.c | 151 ++++++
drivers/video/tegra/host/dmabuf.h | 45 ++
drivers/video/tegra/host/host1x/host1x.c | 17 +
drivers/video/tegra/host/host1x/host1x.h | 6 +
drivers/video/tegra/host/host1x/host1x01.c | 29 ++
.../video/tegra/host/host1x/host1x01_hardware.h | 121 +++++
drivers/video/tegra/host/host1x/host1x_cdma.c | 483 ++++++++++++++++++++
drivers/video/tegra/host/host1x/host1x_cdma.h | 39 ++
drivers/video/tegra/host/host1x/host1x_channel.c | 150 ++++++
drivers/video/tegra/host/host1x/host1x_syncpt.c | 11 +
.../video/tegra/host/host1x/hw_host1x01_channel.h | 182 ++++++++
.../video/tegra/host/host1x/hw_host1x01_uclass.h | 474 +++++++++++++++++++
drivers/video/tegra/host/nvhost_cdma.c | 429 +++++++++++++++++
drivers/video/tegra/host/nvhost_cdma.h | 109 +++++
drivers/video/tegra/host/nvhost_channel.c | 126 +++++
drivers/video/tegra/host/nvhost_channel.h | 65 +++
drivers/video/tegra/host/nvhost_intr.c | 23 +-
drivers/video/tegra/host/nvhost_intr.h | 8 +
drivers/video/tegra/host/nvhost_job.c | 390 ++++++++++++++++
drivers/video/tegra/host/nvhost_memmgr.c | 160 +++++++
drivers/video/tegra/host/nvhost_memmgr.h | 65 +++
drivers/video/tegra/host/nvhost_syncpt.c | 6 +
drivers/video/tegra/host/nvhost_syncpt.h | 2 +
include/linux/nvhost.h | 149 ++++++
30 files changed, 3505 insertions(+), 3 deletions(-)
create mode 100644 drivers/video/tegra/host/bus_client.c
create mode 100644 drivers/video/tegra/host/dev.h
create mode 100644 drivers/video/tegra/host/dmabuf.c
create mode 100644 drivers/video/tegra/host/dmabuf.h
create mode 100644 drivers/video/tegra/host/host1x/host1x_cdma.c
create mode 100644 drivers/video/tegra/host/host1x/host1x_cdma.h
create mode 100644 drivers/video/tegra/host/host1x/host1x_channel.c
create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_channel.h
create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_uclass.h
create mode 100644 drivers/video/tegra/host/nvhost_cdma.c
create mode 100644 drivers/video/tegra/host/nvhost_cdma.h
create mode 100644 drivers/video/tegra/host/nvhost_channel.c
create mode 100644 drivers/video/tegra/host/nvhost_channel.h
create mode 100644 drivers/video/tegra/host/nvhost_job.c
create mode 100644 drivers/video/tegra/host/nvhost_memmgr.c
create mode 100644 drivers/video/tegra/host/nvhost_memmgr.h

diff --git a/drivers/video/tegra/host/Makefile b/drivers/video/tegra/host/Makefile
index 24acccc..128ad03 100644
--- a/drivers/video/tegra/host/Makefile
+++ b/drivers/video/tegra/host/Makefile
@@ -3,9 +3,15 @@ ccflags-y = -Idrivers/video/tegra/host
nvhost-objs = \
nvhost_acm.o \
nvhost_syncpt.o \
+ nvhost_cdma.o \
nvhost_intr.o \
+ nvhost_channel.o \
+ nvhost_job.o \
dev.o \
- chip_support.o
+ bus_client.o \
+ chip_support.o \
+ nvhost_memmgr.o \
+ dmabuf.o

obj-$(CONFIG_TEGRA_HOST1X) += host1x/
obj-$(CONFIG_TEGRA_HOST1X) += nvhost.o
diff --git a/drivers/video/tegra/host/bus_client.c b/drivers/video/tegra/host/bus_client.c
new file mode 100644
index 0000000..3986185
--- /dev/null
+++ b/drivers/video/tegra/host/bus_client.c
@@ -0,0 +1,94 @@
+/*
+ * drivers/video/tegra/host/bus_client.c
+ *
+ * Tegra host1x Client Module
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/spinlock.h>
+#include <linux/fs.h>
+#include <linux/cdev.h>
+#include <linux/uaccess.h>
+#include <linux/file.h>
+#include <linux/clk.h>
+#include <linux/hrtimer.h>
+#include <linux/export.h>
+
+#include <linux/io.h>
+#include <linux/string.h>
+
+#include <linux/nvhost.h>
+
+#include "dev.h"
+#include "nvhost_memmgr.h"
+#include "chip_support.h"
+#include "nvhost_acm.h"
+
+#include "nvhost_channel.h"
+
+int nvhost_client_device_init(struct platform_device *dev)
+{
+ int err;
+ struct nvhost_master *nvhost_master = nvhost_get_host(dev);
+ struct nvhost_channel *ch;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ ch = nvhost_alloc_channel(dev);
+ if (ch == NULL)
+ return -ENODEV;
+
+ /* store the pointer to this device for channel */
+ ch->dev = dev;
+
+ err = nvhost_channel_init(ch, nvhost_master, pdata->index);
+ if (err)
+ goto fail;
+
+ err = nvhost_module_init(dev);
+ if (err)
+ goto fail;
+
+ err = nvhost_device_list_add(dev);
+ if (err)
+ goto fail;
+
+ dev_info(&dev->dev, "initialized\n");
+
+ return 0;
+
+fail:
+ /* Add clean-up */
+ nvhost_free_channel(ch);
+ return err;
+}
+EXPORT_SYMBOL(nvhost_client_device_init);
+
+int nvhost_client_device_suspend(struct platform_device *dev)
+{
+ int ret = 0;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ ret = nvhost_channel_suspend(pdata->channel);
+ dev_info(&dev->dev, "suspend status: %d\n", ret);
+ if (ret)
+ return ret;
+
+ return ret;
+}
+EXPORT_SYMBOL(nvhost_client_device_suspend);
diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
index 5a44147..8765c83 100644
--- a/drivers/video/tegra/host/chip_support.c
+++ b/drivers/video/tegra/host/chip_support.c
@@ -25,7 +25,7 @@
#include "chip_support.h"
#include "host1x/host1x01.h"

-struct nvhost_chip_support *nvhost_chip_ops;
+static struct nvhost_chip_support *nvhost_chip_ops;

struct nvhost_chip_support *nvhost_get_chip_ops(void)
{
diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
index 5c8f49f..ff141ed 100644
--- a/drivers/video/tegra/host/chip_support.h
+++ b/drivers/video/tegra/host/chip_support.h
@@ -27,14 +27,63 @@ struct output;
struct nvhost_master;
struct nvhost_intr;
struct nvhost_syncpt;
+struct nvhost_channel;
+struct nvhost_cdma;
+struct nvhost_job;
+struct push_buffer;
+struct dentry;
+struct nvhost_job;
+struct nvhost_job_unpin_data;
+struct nvhost_intr_syncpt;
+struct mem_handle;
struct platform_device;

+struct nvhost_channel_ops {
+ const char *soc_name;
+ int (*init)(struct nvhost_channel *,
+ struct nvhost_master *,
+ int chid);
+ int (*submit)(struct nvhost_job *job);
+};
+
+struct nvhost_cdma_ops {
+ void (*start)(struct nvhost_cdma *);
+ void (*stop)(struct nvhost_cdma *);
+ void (*kick)(struct nvhost_cdma *);
+ int (*timeout_init)(struct nvhost_cdma *,
+ u32 syncpt_id);
+ void (*timeout_destroy)(struct nvhost_cdma *);
+ void (*timeout_teardown_begin)(struct nvhost_cdma *);
+ void (*timeout_teardown_end)(struct nvhost_cdma *,
+ u32 getptr);
+ void (*timeout_cpu_incr)(struct nvhost_cdma *,
+ u32 getptr,
+ u32 syncpt_incrs,
+ u32 syncval,
+ u32 nr_slots);
+};
+
+struct nvhost_pushbuffer_ops {
+ void (*reset)(struct push_buffer *);
+ int (*init)(struct push_buffer *);
+ void (*destroy)(struct push_buffer *);
+ void (*push_to)(struct push_buffer *,
+ struct mem_handle *,
+ u32 op1, u32 op2);
+ void (*pop_from)(struct push_buffer *,
+ unsigned int slots);
+ u32 (*space)(struct push_buffer *);
+ u32 (*putptr)(struct push_buffer *);
+};
+
struct nvhost_syncpt_ops {
void (*reset)(struct nvhost_syncpt *, u32 id);
void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
u32 (*update_min)(struct nvhost_syncpt *, u32 id);
void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
+ int (*patch_wait)(struct nvhost_syncpt *sp,
+ void *patch_addr);
void (*debug)(struct nvhost_syncpt *);
const char * (*name)(struct nvhost_syncpt *, u32 id);
};
@@ -53,16 +102,31 @@ struct nvhost_intr_ops {
int (*free_syncpt_irq)(struct nvhost_intr *);
};

+struct nvhost_dev_ops {
+ struct nvhost_channel *(*alloc_nvhost_channel)(
+ struct platform_device *dev);
+ void (*free_nvhost_channel)(struct nvhost_channel *ch);
+};
+
struct nvhost_chip_support {
const char *soc_name;
+ struct nvhost_channel_ops channel;
+ struct nvhost_cdma_ops cdma;
+ struct nvhost_pushbuffer_ops push_buffer;
struct nvhost_syncpt_ops syncpt;
struct nvhost_intr_ops intr;
+ struct nvhost_dev_ops nvhost_dev;
};

struct nvhost_chip_support *nvhost_get_chip_ops(void);

+#define host_device_op() (nvhost_get_chip_ops()->nvhost_dev)
+#define channel_cdma_op() (nvhost_get_chip_ops()->cdma)
+#define channel_op() (nvhost_get_chip_ops()->channel)
#define syncpt_op() (nvhost_get_chip_ops()->syncpt)
#define intr_op() (nvhost_get_chip_ops()->intr)
+#define cdma_op() (nvhost_get_chip_ops()->cdma)
+#define cdma_pb_op() (nvhost_get_chip_ops()->push_buffer)

int nvhost_init_chip_support(struct nvhost_master *host);

diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
index 025a820..9dff8d8 100644
--- a/drivers/video/tegra/host/dev.c
+++ b/drivers/video/tegra/host/dev.c
@@ -19,6 +19,9 @@
*/

#include <linux/module.h>
+#include <linux/nvhost.h>
+#include <linux/list.h>
+#include <linux/slab.h>
#include "host1x/host1x.h"
#include "nvhost_acm.h"

@@ -96,6 +99,70 @@ void host1x_idle(struct platform_device *dev)
}
EXPORT_SYMBOL(host1x_idle);

+/* host1x device list in debug-fs dump of host1x and client device
+ * as well as channel state */
+struct nvhost_device_list {
+ struct list_head list;
+ struct platform_device *pdev;
+};
+
+/* HEAD for the host1x device list */
+static struct nvhost_device_list ndev_head;
+
+/* Constructor for the host1x device list */
+void nvhost_device_list_init(void)
+{
+ INIT_LIST_HEAD(&ndev_head.list);
+}
+
+/* Adds a device to tail of host1x device list */
+int nvhost_device_list_add(struct platform_device *pdev)
+{
+ struct nvhost_device_list *list;
+
+ list = kzalloc(sizeof(struct nvhost_device_list), GFP_KERNEL);
+ if (!list)
+ return -ENOMEM;
+
+ list->pdev = pdev;
+ list_add_tail(&list->list, &ndev_head.list);
+
+ return 0;
+}
+
+/* Iterator function for host1x device list
+ * It takes a fptr as an argument and calls that function for each
+ * device in the list */
+void nvhost_device_list_for_all(void *data,
+ int (*fptr)(struct platform_device *pdev, void *fdata))
+{
+ struct nvhost_device_list *nlist;
+ int ret;
+
+ list_for_each_entry(nlist, &ndev_head.list, list) {
+ if (nlist && nlist->pdev && fptr) {
+ ret = fptr(nlist->pdev, data);
+ if (ret) {
+ pr_info("%s: iterator error\n", __func__);
+ break;
+ }
+ }
+ }
+}
+
+/* Removes a device from the host1x device list */
+void nvhost_device_list_remove(struct platform_device *pdev)
+{
+ struct nvhost_device_list *nlist;
+ list_for_each_entry(nlist, &ndev_head.list, list) {
+ if (nlist && nlist->pdev == pdev) {
+ list_del(&nlist->list);
+ kfree(nlist);
+ return;
+ }
+ }
+}
+
MODULE_AUTHOR("Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>");
MODULE_DESCRIPTION("Host1x driver for Tegra products");
MODULE_VERSION("1.0");
diff --git a/drivers/video/tegra/host/dev.h b/drivers/video/tegra/host/dev.h
new file mode 100644
index 0000000..12dfda5
--- /dev/null
+++ b/drivers/video/tegra/host/dev.h
@@ -0,0 +1,33 @@
+/*
+ * drivers/video/tegra/host/dev.h
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef NVHOST_DEV_H
+#define NVHOST_DEV_H
+
+#include "host1x/host1x.h"
+
+struct platform_device;
+
+void nvhost_device_list_init(void);
+int nvhost_device_list_add(struct platform_device *pdev);
+void nvhost_device_list_for_all(void *data,
+ int (*fptr)(struct platform_device *pdev, void *fdata));
+struct platform_device *nvhost_device_list_match_by_id(u32 id);
+void nvhost_device_list_remove(struct platform_device *pdev);
+
+#endif
diff --git a/drivers/video/tegra/host/dmabuf.c b/drivers/video/tegra/host/dmabuf.c
new file mode 100644
index 0000000..8af79b7
--- /dev/null
+++ b/drivers/video/tegra/host/dmabuf.c
@@ -0,0 +1,151 @@
+/*
+ * drivers/video/tegra/host/dmabuf.c
+ *
+ * Tegra host1x DMA-BUF support
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/nvhost.h>
+#include "chip_support.h"
+#include "nvhost_memmgr.h"
+
+static inline struct dma_buf_attachment *to_dmabuf_att(struct mem_handle *h)
+{
+ return (struct dma_buf_attachment *)(((u32)h) & ~0x3);
+}
+
+static inline struct dma_buf *to_dmabuf(struct mem_handle *h)
+{
+ return to_dmabuf_att(h)->dmabuf;
+}
+
+static inline int to_dmabuf_fd(u32 id)
+{
+ return nvhost_memmgr_id(id) >> 2;
+}
+struct mem_handle *nvhost_dmabuf_alloc(size_t size, size_t align, int flags)
+{
+ /* TODO: Add allocation via DMA Mapping API */
+ return NULL;
+}
+
+void nvhost_dmabuf_put(struct mem_handle *handle)
+{
+ struct dma_buf_attachment *attach = to_dmabuf_att(handle);
+ struct dma_buf *dmabuf = attach->dmabuf;
+ dma_buf_detach(dmabuf, attach);
+ dma_buf_put(dmabuf);
+}
+
+struct sg_table *nvhost_dmabuf_pin(struct mem_handle *handle)
+{
+ return dma_buf_map_attachment(to_dmabuf_att(handle),
+ DMA_BIDIRECTIONAL);
+}
+
+void nvhost_dmabuf_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+ dma_buf_unmap_attachment(to_dmabuf_att(handle), sgt, DMA_BIDIRECTIONAL);
+}
+
+
+void *nvhost_dmabuf_mmap(struct mem_handle *handle)
+{
+ return dma_buf_vmap(to_dmabuf(handle));
+}
+
+void nvhost_dmabuf_munmap(struct mem_handle *handle, void *addr)
+{
+ dma_buf_vunmap(to_dmabuf(handle), addr);
+}
+
+void *nvhost_dmabuf_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+ return dma_buf_kmap(to_dmabuf(handle), pagenum);
+}
+
+void nvhost_dmabuf_kunmap(struct mem_handle *handle, unsigned int pagenum,
+ void *addr)
+{
+ dma_buf_kunmap(to_dmabuf(handle), pagenum, addr);
+}
+
+struct mem_handle *nvhost_dmabuf_get(u32 id, struct platform_device *dev)
+{
+ struct mem_handle *h;
+ struct dma_buf *buf;
+
+ buf = dma_buf_get(to_dmabuf_fd(id));
+ if (IS_ERR_OR_NULL(buf))
+ return (struct mem_handle *)buf;
+
+ h = (struct mem_handle *)dma_buf_attach(buf, &dev->dev);
+ if (IS_ERR_OR_NULL(h))
+ dma_buf_put(buf);
+
+ return (struct mem_handle *) ((u32)h | mem_mgr_type_dmabuf);
+}
+
+int nvhost_dmabuf_pin_array_ids(struct platform_device *dev,
+ long unsigned *ids,
+ long unsigned id_type_mask,
+ long unsigned id_type,
+ u32 count,
+ struct nvhost_job_unpin_data *unpin_data,
+ dma_addr_t *phys_addr)
+{
+ int i;
+ int pin_count = 0;
+ int err;
+
+ for (i = 0; i < count; i++) {
+ struct mem_handle *handle;
+ struct sg_table *sgt;
+
+ if ((ids[i] & id_type_mask) != id_type)
+ continue;
+
+ handle = nvhost_dmabuf_get(ids[i], dev);
+
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto fail;
+ }
+
+ sgt = nvhost_dmabuf_pin(handle);
+ if (IS_ERR_OR_NULL(sgt)) {
+ nvhost_dmabuf_put(handle);
+ err = PTR_ERR(sgt);
+ goto fail;
+ }
+
+ phys_addr[i] = sg_dma_address(sgt->sgl);
+
+ unpin_data[pin_count].h = handle;
+ unpin_data[pin_count].mem = sgt;
+ pin_count++;
+ }
+ return pin_count;
+fail:
+ while (pin_count) {
+ pin_count--;
+ nvhost_dmabuf_unpin(unpin_data[pin_count].h,
+ unpin_data[pin_count].mem);
+ nvhost_dmabuf_put(unpin_data[pin_count].h);
+ }
+ return err;
+}
diff --git a/drivers/video/tegra/host/dmabuf.h b/drivers/video/tegra/host/dmabuf.h
new file mode 100644
index 0000000..d31827b
--- /dev/null
+++ b/drivers/video/tegra/host/dmabuf.h
@@ -0,0 +1,45 @@
+/*
+ * drivers/video/tegra/host/dmabuf.h
+ *
+ * Tegra host1x dmabuf memory manager
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_DMABUF_H
+#define __NVHOST_DMABUF_H
+
+#include "nvhost_memmgr.h"
+
+struct platform_device;
+
+struct mem_handle *nvhost_dmabuf_alloc(size_t size, size_t align, int flags);
+void nvhost_dmabuf_put(struct mem_handle *handle);
+struct sg_table *nvhost_dmabuf_pin(struct mem_handle *handle);
+void nvhost_dmabuf_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *nvhost_dmabuf_mmap(struct mem_handle *handle);
+void nvhost_dmabuf_munmap(struct mem_handle *handle, void *addr);
+void *nvhost_dmabuf_kmap(struct mem_handle *handle, unsigned int pagenum);
+void nvhost_dmabuf_kunmap(struct mem_handle *handle, unsigned int pagenum,
+ void *addr);
+int nvhost_dmabuf_get(u32 id, struct platform_device *dev);
+int nvhost_dmabuf_pin_array_ids(struct platform_device *dev,
+ long unsigned *ids,
+ long unsigned id_type_mask,
+ long unsigned id_type,
+ u32 count,
+ struct nvhost_job_unpin_data *unpin_data,
+ dma_addr_t *phys_addr);
+#endif
diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
index 766931b..8033b2d 100644
--- a/drivers/video/tegra/host/host1x/host1x.c
+++ b/drivers/video/tegra/host/host1x/host1x.c
@@ -29,8 +29,10 @@
#include <linux/of.h>
#include <linux/nvhost.h>

+#include "dev.h"
#include "host1x/host1x.h"
#include "nvhost_acm.h"
+#include "nvhost_channel.h"
#include "chip_support.h"

#define DRIVER_NAME "tegra-host1x"
@@ -66,6 +68,16 @@ static int clock_off_host(struct platform_device *dev)
return 0;
}

+struct nvhost_channel *nvhost_alloc_channel(struct platform_device *dev)
+{
+ return host_device_op().alloc_nvhost_channel(dev);
+}
+
+void nvhost_free_channel(struct nvhost_channel *ch)
+{
+ host_device_op().free_nvhost_channel(ch);
+}
+
static void nvhost_free_resources(struct nvhost_master *host)
{
kfree(host->intr.syncpt);
@@ -167,6 +179,11 @@ static int __devinit nvhost_probe(struct platform_device *dev)
for (i = 0; i < pdata->num_clks; i++)
clk_disable_unprepare(pdata->clk[i]);

+ nvhost_device_list_init();
+ err = nvhost_device_list_add(dev);
+ if (err)
+ goto fail;
+
dev_info(&dev->dev, "initialized\n");

return 0;
diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
index af9bfef..28ec916 100644
--- a/drivers/video/tegra/host/host1x/host1x.h
+++ b/drivers/video/tegra/host/host1x/host1x.h
@@ -30,17 +30,23 @@
#define TRACE_MAX_LENGTH 128U
#define IFACE_NAME "nvhost"

+struct nvhost_channel;
+
struct nvhost_master {
void __iomem *aperture;
void __iomem *sync_aperture;
struct nvhost_syncpt syncpt;
struct nvhost_intr intr;
struct platform_device *dev;
+ atomic_t clientid;
struct host1x_device_info info;
};

extern struct nvhost_master *nvhost;

+struct nvhost_channel *nvhost_alloc_channel(struct platform_device *dev);
+void nvhost_free_channel(struct nvhost_channel *ch);
+
static inline void *nvhost_get_private_data(struct platform_device *_dev)
{
struct nvhost_device_data *pdata =
diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
index 5bf0e6e..cd97339 100644
--- a/drivers/video/tegra/host/host1x/host1x01.c
+++ b/drivers/video/tegra/host/host1x/host1x01.c
@@ -18,22 +18,51 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

+#include <linux/init.h>
+#include <linux/clk.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
#include <linux/nvhost.h>

#include "host1x/host1x01.h"
#include "host1x/host1x.h"
+#include "nvhost_channel.h"
#include "host1x/host1x01_hardware.h"
#include "chip_support.h"

+static int t20_num_alloc_channels;
+
+static void t20_free_nvhost_channel(struct nvhost_channel *ch)
+{
+ nvhost_free_channel_internal(ch, &t20_num_alloc_channels);
+}
+
+static
+struct nvhost_channel *t20_alloc_nvhost_channel(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ return nvhost_alloc_channel_internal(pdata->index,
+ nvhost_get_host(dev)->info.nb_channels,
+ &t20_num_alloc_channels);
+}
+
+#include "host1x/host1x_channel.c"
+#include "host1x/host1x_cdma.c"
#include "host1x/host1x_syncpt.c"
#include "host1x/host1x_intr.c"

int nvhost_init_host1x01_support(struct nvhost_master *host,
struct nvhost_chip_support *op)
{
+ op->channel = host1x_channel_ops;
+ op->cdma = host1x_cdma_ops;
+ op->push_buffer = host1x_pushbuffer_ops;
host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
op->syncpt = host1x_syncpt_ops;
op->intr = host1x_intr_ops;

+ op->nvhost_dev.alloc_nvhost_channel = t20_alloc_nvhost_channel;
+ op->nvhost_dev.free_nvhost_channel = t20_free_nvhost_channel;
+
return 0;
}
diff --git a/drivers/video/tegra/host/host1x/host1x01_hardware.h b/drivers/video/tegra/host/host1x/host1x01_hardware.h
index 0da7e06..0065b24 100644
--- a/drivers/video/tegra/host/host1x/host1x01_hardware.h
+++ b/drivers/video/tegra/host/host1x/host1x01_hardware.h
@@ -23,7 +23,9 @@

#include <linux/types.h>
#include <linux/bitops.h>
+#include "hw_host1x01_channel.h"
#include "hw_host1x01_sync.h"
+#include "hw_host1x01_uclass.h"

/* channel registers */
#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
@@ -33,4 +35,123 @@
#define HOST1X_CHANNEL_SYNC_REG_BASE 0x3000
#define NV_HOST1X_NB_MLOCKS 16

+static inline u32 nvhost_class_host_wait_syncpt(
+ unsigned indx, unsigned threshold)
+{
+ return host1x_uclass_wait_syncpt_indx_f(indx)
+ | host1x_uclass_wait_syncpt_thresh_f(threshold);
+}
+
+static inline u32 nvhost_class_host_load_syncpt_base(
+ unsigned indx, unsigned threshold)
+{
+ return host1x_uclass_load_syncpt_base_base_indx_f(indx)
+ | host1x_uclass_load_syncpt_base_value_f(threshold);
+}
+
+static inline u32 nvhost_class_host_wait_syncpt_base(
+ unsigned indx, unsigned base_indx, unsigned offset)
+{
+ return host1x_uclass_wait_syncpt_base_indx_f(indx)
+ | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
+ | host1x_uclass_wait_syncpt_base_offset_f(offset);
+}
+
+static inline u32 nvhost_class_host_incr_syncpt_base(
+ unsigned base_indx, unsigned offset)
+{
+ return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
+ | host1x_uclass_incr_syncpt_base_offset_f(offset);
+}
+
+static inline u32 nvhost_class_host_incr_syncpt(
+ unsigned cond, unsigned indx)
+{
+ return host1x_uclass_incr_syncpt_cond_f(cond)
+ | host1x_uclass_incr_syncpt_indx_f(indx);
+}
+
+static inline u32 nvhost_class_host_indoff_reg_write(
+ unsigned mod_id, unsigned offset, bool auto_inc)
+{
+ u32 v = host1x_uclass_indoff_indbe_f(0xf)
+ | host1x_uclass_indoff_indmodid_f(mod_id)
+ | host1x_uclass_indoff_indroffset_f(offset);
+ if (auto_inc)
+ v |= host1x_uclass_indoff_autoinc_f(1);
+ return v;
+}
+
+static inline u32 nvhost_class_host_indoff_reg_read(
+ unsigned mod_id, unsigned offset, bool auto_inc)
+{
+ u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
+ | host1x_uclass_indoff_indroffset_f(offset)
+ | host1x_uclass_indoff_rwn_read_v();
+ if (auto_inc)
+ v |= host1x_uclass_indoff_autoinc_f(1);
+ return v;
+}
+
+
+/* cdma opcodes */
+static inline u32 nvhost_opcode_setclass(
+ unsigned class_id, unsigned offset, unsigned mask)
+{
+ return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
+}
+
+static inline u32 nvhost_opcode_incr(unsigned offset, unsigned count)
+{
+ return (1 << 28) | (offset << 16) | count;
+}
+
+static inline u32 nvhost_opcode_nonincr(unsigned offset, unsigned count)
+{
+ return (2 << 28) | (offset << 16) | count;
+}
+
+static inline u32 nvhost_opcode_mask(unsigned offset, unsigned mask)
+{
+ return (3 << 28) | (offset << 16) | mask;
+}
+
+static inline u32 nvhost_opcode_imm(unsigned offset, unsigned value)
+{
+ return (4 << 28) | (offset << 16) | value;
+}
+
+static inline u32 nvhost_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
+{
+ return nvhost_opcode_imm(host1x_uclass_incr_syncpt_r(),
+ nvhost_class_host_incr_syncpt(cond, indx));
+}
+
+static inline u32 nvhost_opcode_restart(unsigned address)
+{
+ return (5 << 28) | (address >> 4);
+}
+
+static inline u32 nvhost_opcode_gather(unsigned count)
+{
+ return (6 << 28) | count;
+}
+
+static inline u32 nvhost_opcode_gather_nonincr(unsigned offset, unsigned count)
+{
+ return (6 << 28) | (offset << 16) | BIT(15) | count;
+}
+
+static inline u32 nvhost_opcode_gather_incr(unsigned offset, unsigned count)
+{
+ return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
+}
+
+#define NVHOST_OPCODE_NOOP nvhost_opcode_nonincr(0, 0)
+
+static inline u32 nvhost_mask2(unsigned x, unsigned y)
+{
+ return 1 | (1 << (y - x));
+}
+
#endif
diff --git a/drivers/video/tegra/host/host1x/host1x_cdma.c b/drivers/video/tegra/host/host1x/host1x_cdma.c
new file mode 100644
index 0000000..07f0758
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x_cdma.c
@@ -0,0 +1,483 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x_cdma.c
+ *
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-mapping.h>
+#include "nvhost_acm.h"
+#include "nvhost_cdma.h"
+#include "nvhost_channel.h"
+#include "dev.h"
+#include "chip_support.h"
+#include "nvhost_memmgr.h"
+
+#include "host1x_cdma.h"
+
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
+{
+ return host1x_channel_dmactrl_dmastop_f(stop)
+ | host1x_channel_dmactrl_dmagetrst_f(get_rst)
+ | host1x_channel_dmactrl_dmainitget_f(init_get);
+}
+
+static void cdma_timeout_handler(struct work_struct *work);
+
+/*
+ * push_buffer
+ *
+ * The push buffer is a circular array of words to be fetched by command DMA.
+ * Note that it works slightly differently to the sync queue; fence == cur
+ * means that the push buffer is full, not empty.
+ */
+
+
+/**
+ * Reset to empty push buffer
+ */
+static void push_buffer_reset(struct push_buffer *pb)
+{
+ pb->fence = PUSH_BUFFER_SIZE - 8;
+ pb->cur = 0;
+}
+
+/**
+ * Init push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb);
+static int push_buffer_init(struct push_buffer *pb)
+{
+ struct nvhost_cdma *cdma = pb_to_cdma(pb);
+ struct nvhost_master *master = cdma_to_dev(cdma);
+ pb->mapped = NULL;
+ pb->phys = 0;
+ pb->handle = NULL;
+
+ cdma_pb_op().reset(pb);
+
+ /* allocate and map pushbuffer memory */
+ pb->mapped = dma_alloc_writecombine(&master->dev->dev,
+ PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
+ if (IS_ERR_OR_NULL(pb->mapped)) {
+ pb->mapped = NULL;
+ goto fail;
+ }
+
+ /* memory for storing mem client and handles for each opcode pair */
+ pb->handle = kzalloc(NVHOST_GATHER_QUEUE_SIZE *
+ sizeof(struct mem_handle *),
+ GFP_KERNEL);
+ if (!pb->handle)
+ goto fail;
+
+ /* put the restart at the end of pushbuffer memory */
+ *(pb->mapped + (PUSH_BUFFER_SIZE >> 2)) =
+ nvhost_opcode_restart(pb->phys);
+
+ return 0;
+
+fail:
+ push_buffer_destroy(pb);
+ return -ENOMEM;
+}
+
+/**
+ * Clean up push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb)
+{
+ struct nvhost_cdma *cdma = pb_to_cdma(pb);
+ struct nvhost_master *master = cdma_to_dev(cdma);
+
+ if (pb->phys != 0)
+ dma_free_writecombine(&master->dev->dev,
+ PUSH_BUFFER_SIZE + 4,
+ pb->mapped, pb->phys);
+
+ kfree(pb->handle);
+
+ pb->mapped = NULL;
+ pb->phys = 0;
+ pb->handle = 0;
+}
+
+/**
+ * Push two words to the push buffer
+ * Caller must ensure push buffer is not full
+ */
+static void push_buffer_push_to(struct push_buffer *pb,
+ struct mem_handle *handle,
+ u32 op1, u32 op2)
+{
+ u32 cur = pb->cur;
+ u32 *p = (u32 *)((u32)pb->mapped + cur);
+ u32 cur_mem = (cur/8) & (NVHOST_GATHER_QUEUE_SIZE - 1);
+ WARN_ON(cur == pb->fence);
+ *(p++) = op1;
+ *(p++) = op2;
+ pb->handle[cur_mem] = handle;
+ pb->cur = (cur + 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/**
+ * Pop a number of two word slots from the push buffer
+ * Caller must ensure push buffer is not empty
+ */
+static void push_buffer_pop_from(struct push_buffer *pb,
+ unsigned int slots)
+{
+ /* Clear the mem references for old items from pb */
+ unsigned int i;
+ u32 fence_mem = pb->fence/8;
+ for (i = 0; i < slots; i++) {
+ int cur_fence_mem = (fence_mem+i)
+ & (NVHOST_GATHER_QUEUE_SIZE - 1);
+ pb->handle[cur_fence_mem] = NULL;
+ }
+ /* Advance the next write position */
+ pb->fence = (pb->fence + slots * 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/**
+ * Return the number of two word slots free in the push buffer
+ */
+static u32 push_buffer_space(struct push_buffer *pb)
+{
+ return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
+
+static u32 push_buffer_putptr(struct push_buffer *pb)
+{
+ return pb->phys + pb->cur;
+}
+
+/*
+ * The syncpt incr buffer is filled with methods to increment syncpts, which
+ * is later GATHER-ed into the mainline PB. It's used when a timed out context
+ * is interleaved with other work, so needs to inline the syncpt increments
+ * to maintain the count (but otherwise does no work).
+ */
+
+/**
+ * Init timeout resources
+ */
+static int cdma_timeout_init(struct nvhost_cdma *cdma,
+ u32 syncpt_id)
+{
+ if (syncpt_id == NVSYNCPT_INVALID)
+ return -EINVAL;
+
+ INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
+ cdma->timeout.initialized = true;
+
+ return 0;
+}
+
+/**
+ * Clean up timeout resources
+ */
+static void cdma_timeout_destroy(struct nvhost_cdma *cdma)
+{
+ if (cdma->timeout.initialized)
+ cancel_delayed_work(&cdma->timeout.wq);
+ cdma->timeout.initialized = false;
+}
+
+/**
+ * Increment timedout buffer's syncpt via CPU.
+ */
+static void cdma_timeout_cpu_incr(struct nvhost_cdma *cdma, u32 getptr,
+ u32 syncpt_incrs, u32 syncval, u32 nr_slots)
+{
+ struct nvhost_master *dev = cdma_to_dev(cdma);
+ struct push_buffer *pb = &cdma->push_buffer;
+ u32 i, getidx;
+
+ for (i = 0; i < syncpt_incrs; i++)
+ nvhost_syncpt_cpu_incr(&dev->syncpt, cdma->timeout.syncpt_id);
+
+ /* after CPU incr, ensure shadow is up to date */
+ nvhost_syncpt_update_min(&dev->syncpt, cdma->timeout.syncpt_id);
+
+ /* NOP all the PB slots */
+ getidx = getptr - pb->phys;
+ while (nr_slots--) {
+ u32 *p = (u32 *)((u32)pb->mapped + getidx);
+ *(p++) = NVHOST_OPCODE_NOOP;
+ *(p++) = NVHOST_OPCODE_NOOP;
+ dev_dbg(&dev->dev->dev, "%s: NOP at 0x%x\n",
+ __func__, pb->phys + getidx);
+ getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
+ }
+ wmb();
+}
+
+/**
+ * Start channel DMA
+ */
+static void cdma_start(struct nvhost_cdma *cdma)
+{
+ void __iomem *chan_regs = cdma_to_channel(cdma)->aperture;
+
+ if (cdma->running)
+ return;
+
+ cdma->last_put = cdma_pb_op().putptr(&cdma->push_buffer);
+
+ writel(host1x_channel_dmactrl(true, false, false),
+ chan_regs + host1x_channel_dmactrl_r());
+
+ /* set base, put, end pointer (all of memory) */
+ writel(0, chan_regs + host1x_channel_dmastart_r());
+ writel(cdma->last_put, chan_regs + host1x_channel_dmaput_r());
+ writel(0xFFFFFFFF, chan_regs + host1x_channel_dmaend_r());
+
+ /* reset GET */
+ writel(host1x_channel_dmactrl(true, true, true),
+ chan_regs + host1x_channel_dmactrl_r());
+
+ /* start the command DMA */
+ writel(host1x_channel_dmactrl(false, false, false),
+ chan_regs + host1x_channel_dmactrl_r());
+
+ cdma->running = true;
+}
+
+/**
+ * Similar to cdma_start(), but rather than starting from an idle
+ * state (where DMA GET is set to DMA PUT), on a timeout we restore
+ * DMA GET from an explicit value (so DMA may again be pending).
+ */
+static void cdma_timeout_restart(struct nvhost_cdma *cdma, u32 getptr)
+{
+ struct nvhost_master *dev = cdma_to_dev(cdma);
+ void __iomem *chan_regs = cdma_to_channel(cdma)->aperture;
+
+ if (cdma->running)
+ return;
+
+ cdma->last_put = cdma_pb_op().putptr(&cdma->push_buffer);
+
+ writel(host1x_channel_dmactrl(true, false, false),
+ chan_regs + host1x_channel_dmactrl_r());
+
+ /* set base, end pointer (all of memory) */
+ writel(0, chan_regs + host1x_channel_dmastart_r());
+ writel(0xFFFFFFFF, chan_regs + host1x_channel_dmaend_r());
+
+ /* set GET, by loading the value in PUT (then reset GET) */
+ writel(getptr, chan_regs + host1x_channel_dmaput_r());
+ writel(host1x_channel_dmactrl(true, true, true),
+ chan_regs + host1x_channel_dmactrl_r());
+
+ dev_dbg(&dev->dev->dev,
+ "%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+ __func__,
+ readl(chan_regs + host1x_channel_dmaget_r()),
+ readl(chan_regs + host1x_channel_dmaput_r()),
+ cdma->last_put);
+
+ /* deassert GET reset and set PUT */
+ writel(host1x_channel_dmactrl(true, false, false),
+ chan_regs + host1x_channel_dmactrl_r());
+ writel(cdma->last_put, chan_regs + host1x_channel_dmaput_r());
+
+ /* start the command DMA */
+ writel(host1x_channel_dmactrl(false, false, false),
+ chan_regs + host1x_channel_dmactrl_r());
+
+ cdma->running = true;
+}
+
+/**
+ * Kick channel DMA into action by writing its PUT offset (if it has changed)
+ */
+static void cdma_kick(struct nvhost_cdma *cdma)
+{
+ u32 put;
+
+ put = cdma_pb_op().putptr(&cdma->push_buffer);
+
+ if (put != cdma->last_put) {
+ void __iomem *chan_regs = cdma_to_channel(cdma)->aperture;
+ writel(put, chan_regs + host1x_channel_dmaput_r());
+ cdma->last_put = put;
+ }
+}
+
+static void cdma_stop(struct nvhost_cdma *cdma)
+{
+ void __iomem *chan_regs = cdma_to_channel(cdma)->aperture;
+
+ mutex_lock(&cdma->lock);
+ if (cdma->running) {
+ nvhost_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
+ writel(host1x_channel_dmactrl(true, false, false),
+ chan_regs + host1x_channel_dmactrl_r());
+ cdma->running = false;
+ }
+ mutex_unlock(&cdma->lock);
+}
+
+/**
+ * Stops both channel's command processor and CDMA immediately.
+ * Also, tears down the channel and resets corresponding module.
+ */
+static void cdma_timeout_teardown_begin(struct nvhost_cdma *cdma)
+{
+ struct nvhost_master *dev = cdma_to_dev(cdma);
+ struct nvhost_channel *ch = cdma_to_channel(cdma);
+ u32 cmdproc_stop;
+
+ if (cdma->torndown && !cdma->running) {
+ dev_warn(&dev->dev->dev, "Already torn down\n");
+ return;
+ }
+
+ dev_dbg(&dev->dev->dev,
+ "begin channel teardown (channel id %d)\n", ch->chid);
+
+ cmdproc_stop = readl(dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+ cmdproc_stop |= BIT(ch->chid);
+ writel(cmdproc_stop, dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+
+ dev_dbg(&dev->dev->dev,
+ "%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+ __func__,
+ readl(ch->aperture + host1x_channel_dmaget_r()),
+ readl(ch->aperture + host1x_channel_dmaput_r()),
+ cdma->last_put);
+
+ writel(host1x_channel_dmactrl(true, false, false),
+ ch->aperture + host1x_channel_dmactrl_r());
+
+ writel(BIT(ch->chid), dev->sync_aperture + host1x_sync_ch_teardown_r());
+
+ cdma->running = false;
+ cdma->torndown = true;
+}
+
+static void cdma_timeout_teardown_end(struct nvhost_cdma *cdma, u32 getptr)
+{
+ struct nvhost_master *dev = cdma_to_dev(cdma);
+ struct nvhost_channel *ch = cdma_to_channel(cdma);
+ u32 cmdproc_stop;
+
+ dev_dbg(&dev->dev->dev,
+ "end channel teardown (id %d, DMAGET restart = 0x%x)\n",
+ ch->chid, getptr);
+
+ cmdproc_stop = readl(dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+ cmdproc_stop &= ~(BIT(ch->chid));
+ writel(cmdproc_stop, dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+
+ cdma->torndown = false;
+ cdma_timeout_restart(cdma, getptr);
+}
+
+/**
+ * If this timeout fires, it indicates the current sync_queue entry has
+ * exceeded its TTL and the userctx should be timed out and remaining
+ * submits already issued cleaned up (future submits return an error).
+ */
+static void cdma_timeout_handler(struct work_struct *work)
+{
+ struct nvhost_cdma *cdma;
+ struct nvhost_master *dev;
+ struct nvhost_syncpt *sp;
+ struct nvhost_channel *ch;
+
+ u32 syncpt_val;
+
+ u32 prev_cmdproc, cmdproc_stop;
+
+ cdma = container_of(to_delayed_work(work), struct nvhost_cdma,
+ timeout.wq);
+ dev = cdma_to_dev(cdma);
+ sp = &dev->syncpt;
+ ch = cdma_to_channel(cdma);
+
+ mutex_lock(&cdma->lock);
+
+ if (!cdma->timeout.clientid) {
+ dev_dbg(&dev->dev->dev,
+ "cdma_timeout: expired, but has no clientid\n");
+ mutex_unlock(&cdma->lock);
+ return;
+ }
+
+ /* stop processing to get a clean snapshot */
+ prev_cmdproc = readl(dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+ cmdproc_stop = prev_cmdproc | BIT(ch->chid);
+ writel(cmdproc_stop, dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+
+ dev_dbg(&dev->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
+ prev_cmdproc, cmdproc_stop);
+
+ syncpt_val = nvhost_syncpt_update_min(&dev->syncpt,
+ cdma->timeout.syncpt_id);
+
+ /* has buffer actually completed? */
+ if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
+ dev_dbg(&dev->dev->dev,
+ "cdma_timeout: expired, but buffer had completed\n");
+ /* restore */
+ cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
+ writel(cmdproc_stop,
+ dev->sync_aperture + host1x_sync_cmdproc_stop_r());
+ mutex_unlock(&cdma->lock);
+ return;
+ }
+
+ dev_warn(&dev->dev->dev,
+ "%s: timeout: %d (%s), HW thresh %d, done %d\n",
+ __func__,
+ cdma->timeout.syncpt_id,
+ syncpt_op().name(sp, cdma->timeout.syncpt_id),
+ syncpt_val, cdma->timeout.syncpt_val);
+
+ /* stop HW, resetting channel/module */
+ cdma_op().timeout_teardown_begin(cdma);
+
+ nvhost_cdma_update_sync_queue(cdma, sp, ch->dev);
+ mutex_unlock(&cdma->lock);
+}
+
+static const struct nvhost_cdma_ops host1x_cdma_ops = {
+ .start = cdma_start,
+ .stop = cdma_stop,
+ .kick = cdma_kick,
+
+ .timeout_init = cdma_timeout_init,
+ .timeout_destroy = cdma_timeout_destroy,
+ .timeout_teardown_begin = cdma_timeout_teardown_begin,
+ .timeout_teardown_end = cdma_timeout_teardown_end,
+ .timeout_cpu_incr = cdma_timeout_cpu_incr,
+};
+
+static const struct nvhost_pushbuffer_ops host1x_pushbuffer_ops = {
+ .reset = push_buffer_reset,
+ .init = push_buffer_init,
+ .destroy = push_buffer_destroy,
+ .push_to = push_buffer_push_to,
+ .pop_from = push_buffer_pop_from,
+ .space = push_buffer_space,
+ .putptr = push_buffer_putptr,
+};
+
diff --git a/drivers/video/tegra/host/host1x/host1x_cdma.h b/drivers/video/tegra/host/host1x/host1x_cdma.h
new file mode 100644
index 0000000..dc0d0b0
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x_cdma.h
@@ -0,0 +1,39 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x_cdma.h
+ *
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_HOST1X_HOST1X_CDMA_H
+#define __NVHOST_HOST1X_HOST1X_CDMA_H
+
+/* Size of the sync queue. If it is too small, we won't be able to queue up
+ * many command buffers. If it is too large, we waste memory. */
+#define NVHOST_SYNC_QUEUE_SIZE 512
+
+/* Number of gathers we allow to be queued up per channel. Must be a
+ * power of two. Currently sized such that pushbuffer is 4KB (512*8B). */
+#define NVHOST_GATHER_QUEUE_SIZE 512
+
+/* 8 bytes per slot. (This number does not include the final RESTART.) */
+#define PUSH_BUFFER_SIZE (NVHOST_GATHER_QUEUE_SIZE * 8)
+
+/* 4K page containing GATHERed methods to increment channel syncpts
+ * and replaces the original timed out contexts GATHER slots */
+#define SYNCPT_INCR_BUFFER_SIZE_WORDS (4096 / sizeof(u32))
+
+#endif
diff --git a/drivers/video/tegra/host/host1x/host1x_channel.c b/drivers/video/tegra/host/host1x/host1x_channel.c
new file mode 100644
index 0000000..78df954
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x_channel.c
@@ -0,0 +1,150 @@
+/*
+ * drivers/video/tegra/host/host1x/channel_host1x.c
+ *
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/nvhost.h>
+#include "nvhost_channel.h"
+#include "dev.h"
+#include "nvhost_acm.h"
+#include <linux/slab.h>
+#include "nvhost_intr.h"
+
+static void submit_gathers(struct nvhost_job *job)
+{
+ /* push user gathers */
+ int i;
+ for (i = 0 ; i < job->num_gathers; i++) {
+ struct nvhost_job_gather *g = &job->gathers[i];
+ u32 op1 = nvhost_opcode_gather(g->words);
+ u32 op2 = g->mem_base + g->offset;
+ nvhost_cdma_push_gather(&job->ch->cdma,
+ job->gathers[i].ref,
+ job->gathers[i].offset,
+ op1, op2);
+ }
+}
+
+static int host1x_channel_submit(struct nvhost_job *job)
+{
+ struct nvhost_channel *ch = job->ch;
+ struct nvhost_syncpt *sp = &nvhost_get_host(job->ch->dev)->syncpt;
+ u32 user_syncpt_incrs = job->syncpt_incrs;
+ u32 prev_max = 0;
+ u32 syncval;
+ int err;
+ void *completed_waiter = NULL;
+ struct nvhost_device_data *pdata = platform_get_drvdata(ch->dev);
+
+ /* Turn on the client module and host1x */
+ nvhost_module_busy(ch->dev);
+
+ /* before error checks, return current max */
+ prev_max = job->syncpt_end =
+ nvhost_syncpt_read_max(sp, job->syncpt_id);
+
+ /* get submit lock */
+ err = mutex_lock_interruptible(&ch->submitlock);
+ if (err) {
+ nvhost_module_idle(ch->dev);
+ goto error;
+ }
+
+ completed_waiter = nvhost_intr_alloc_waiter();
+ if (!completed_waiter) {
+ nvhost_module_idle(ch->dev);
+ mutex_unlock(&ch->submitlock);
+ err = -ENOMEM;
+ goto error;
+ }
+
+ /* begin a CDMA submit */
+ err = nvhost_cdma_begin(&ch->cdma, job);
+ if (err) {
+ mutex_unlock(&ch->submitlock);
+ nvhost_module_idle(ch->dev);
+ goto error;
+ }
+
+ if (pdata->serialize) {
+ /* Force serialization by inserting a host wait for the
+ * previous job to finish before this one can commence. */
+ nvhost_cdma_push(&ch->cdma,
+ nvhost_opcode_setclass(NV_HOST1X_CLASS_ID,
+ host1x_uclass_wait_syncpt_r(),
+ 1),
+ nvhost_class_host_wait_syncpt(job->syncpt_id,
+ nvhost_syncpt_read_max(sp,
+ job->syncpt_id)));
+ }
+
+ syncval = nvhost_syncpt_incr_max(sp,
+ job->syncpt_id, user_syncpt_incrs);
+
+ job->syncpt_end = syncval;
+
+ /* add a setclass for modules that require it */
+ if (pdata->class)
+ nvhost_cdma_push(&ch->cdma,
+ nvhost_opcode_setclass(pdata->class, 0, 0),
+ NVHOST_OPCODE_NOOP);
+
+ submit_gathers(job);
+
+ /* end CDMA submit & stash pinned hMems into sync queue */
+ nvhost_cdma_end(&ch->cdma, job);
+
+ /* schedule a submit complete interrupt */
+ err = nvhost_intr_add_action(&nvhost_get_host(ch->dev)->intr,
+ job->syncpt_id, syncval,
+ NVHOST_INTR_ACTION_SUBMIT_COMPLETE, ch,
+ completed_waiter,
+ NULL);
+ completed_waiter = NULL;
+ WARN(err, "Failed to set submit complete interrupt");
+
+ mutex_unlock(&ch->submitlock);
+
+ return 0;
+
+error:
+ kfree(completed_waiter);
+ return err;
+}
+
+static inline void __iomem *host1x_channel_aperture(void __iomem *p, int ndx)
+{
+ p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
+ return p;
+}
+
+static int host1x_channel_init(struct nvhost_channel *ch,
+ struct nvhost_master *dev, int index)
+{
+ ch->chid = index;
+ mutex_init(&ch->reflock);
+ mutex_init(&ch->submitlock);
+
+ ch->aperture = host1x_channel_aperture(dev->aperture, index);
+ return 0;
+}
+
+static const struct nvhost_channel_ops host1x_channel_ops = {
+ .init = host1x_channel_init,
+ .submit = host1x_channel_submit,
+};
diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c
index 57cc1b1..e47bd71 100644
--- a/drivers/video/tegra/host/host1x/host1x_syncpt.c
+++ b/drivers/video/tegra/host/host1x/host1x_syncpt.c
@@ -107,6 +107,16 @@ static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
wmb();
}

+/* remove a wait pointed to by patch_addr */
+static int host1x_syncpt_patch_wait(struct nvhost_syncpt *sp,
+ void *patch_addr)
+{
+ u32 override = nvhost_class_host_wait_syncpt(
+ NVSYNCPT_GRAPHICS_HOST, 0);
+ __raw_writel(override, patch_addr);
+ return 0;
+}
+
static const char *host1x_syncpt_name(struct nvhost_syncpt *sp, u32 id)
{
struct host1x_device_info *info = &syncpt_to_dev(sp)->info;
@@ -151,6 +161,7 @@ static const struct nvhost_syncpt_ops host1x_syncpt_ops = {
.read_wait_base = host1x_syncpt_read_wait_base,
.update_min = host1x_syncpt_update_min,
.cpu_incr = host1x_syncpt_cpu_incr,
+ .patch_wait = host1x_syncpt_patch_wait,
.debug = host1x_syncpt_debug,
.name = host1x_syncpt_name,
};
diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_channel.h b/drivers/video/tegra/host/host1x/hw_host1x01_channel.h
new file mode 100644
index 0000000..ca2f9a0
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/hw_host1x01_channel.h
@@ -0,0 +1,182 @@
+/*
+ * drivers/video/tegra/host/host1x/hw_host1x01_channel.h
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+ * Function naming determines intended use:
+ *
+ * <x>_r(void) : Returns the offset for register <x>.
+ *
+ * <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+ *
+ * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+ *
+ * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+ * and masked to place it at field <y> of register <x>. This value
+ * can be |'d with others to produce a full register value for
+ * register <x>.
+ *
+ * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This
+ * value can be ~'d and then &'d to clear the value of field <y> for
+ * register <x>.
+ *
+ * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+ * to place it at field <y> of register <x>. This value can be |'d
+ * with others to produce a full register value for <x>.
+ *
+ * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+ * <x> value 'r' after being shifted to place its LSB at bit 0.
+ * This value is suitable for direct comparison with other unshifted
+ * values appropriate for use in field <y> of register <x>.
+ *
+ * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+ * field <y> of register <x>. This value is suitable for direct
+ * comparison with unshifted values appropriate for use in field <y>
+ * of register <x>.
+ */
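+
+ /*
+ * For example (an illustrative sketch only, not part of the generated
+ * description), a DMACTRL value that stops command DMA and resets DMAGET
+ * could be composed from the accessors below and written through a channel
+ * aperture mapping:
+ *
+ *   u32 dmactrl = host1x_channel_dmactrl_dmastop_f(1) |
+ *                 host1x_channel_dmactrl_dmagetrst_f(1);
+ *   writel(dmactrl, ch->aperture + host1x_channel_dmactrl_r());
+ */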
+
+#ifndef __hw_host1x_channel_host1x_h__
+#define __hw_host1x_channel_host1x_h__
+/* This file is autogenerated. Do not edit. */
+
+static inline u32 host1x_channel_fifostat_r(void)
+{
+ return 0x0;
+}
+static inline u32 host1x_channel_fifostat_cfempty_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_channel_fifostat_cfempty_f(u32 v)
+{
+ return (v & 0x1) << 10;
+}
+static inline u32 host1x_channel_fifostat_cfempty_m(void)
+{
+ return 0x1 << 10;
+}
+static inline u32 host1x_channel_fifostat_cfempty_v(u32 r)
+{
+ return (r >> 10) & 0x1;
+}
+static inline u32 host1x_channel_fifostat_cfempty_notempty_v(void)
+{
+ return 0;
+}
+static inline u32 host1x_channel_fifostat_cfempty_empty_v(void)
+{
+ return 1;
+}
+static inline u32 host1x_channel_fifostat_outfentries_s(void)
+{
+ return 5;
+}
+static inline u32 host1x_channel_fifostat_outfentries_f(u32 v)
+{
+ return (v & 0x1f) << 24;
+}
+static inline u32 host1x_channel_fifostat_outfentries_m(void)
+{
+ return 0x1f << 24;
+}
+static inline u32 host1x_channel_fifostat_outfentries_v(u32 r)
+{
+ return (r >> 24) & 0x1f;
+}
+static inline u32 host1x_channel_inddata_r(void)
+{
+ return 0xc;
+}
+static inline u32 host1x_channel_dmastart_r(void)
+{
+ return 0x14;
+}
+static inline u32 host1x_channel_dmaput_r(void)
+{
+ return 0x18;
+}
+static inline u32 host1x_channel_dmaget_r(void)
+{
+ return 0x1c;
+}
+static inline u32 host1x_channel_dmaend_r(void)
+{
+ return 0x20;
+}
+static inline u32 host1x_channel_dmactrl_r(void)
+{
+ return 0x24;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
+{
+ return (v & 0x1) << 0;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_m(void)
+{
+ return 0x1 << 0;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_v(u32 r)
+{
+ return (r >> 0) & 0x1;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_run_v(void)
+{
+ return 0;
+}
+static inline u32 host1x_channel_dmactrl_dmastop_stop_v(void)
+{
+ return 1;
+}
+static inline u32 host1x_channel_dmactrl_dmagetrst_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
+{
+ return (v & 0x1) << 1;
+}
+static inline u32 host1x_channel_dmactrl_dmagetrst_m(void)
+{
+ return 0x1 << 1;
+}
+static inline u32 host1x_channel_dmactrl_dmagetrst_v(u32 r)
+{
+ return (r >> 1) & 0x1;
+}
+static inline u32 host1x_channel_dmactrl_dmainitget_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_channel_dmactrl_dmainitget_f(u32 v)
+{
+ return (v & 0x1) << 2;
+}
+static inline u32 host1x_channel_dmactrl_dmainitget_m(void)
+{
+ return 0x1 << 2;
+}
+static inline u32 host1x_channel_dmactrl_dmainitget_v(u32 r)
+{
+ return (r >> 2) & 0x1;
+}
+
+#endif /* __hw_host1x_channel_host1x_h__ */
diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_uclass.h b/drivers/video/tegra/host/host1x/hw_host1x01_uclass.h
new file mode 100644
index 0000000..ed6e4b7
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/hw_host1x01_uclass.h
@@ -0,0 +1,474 @@
+/*
+ * drivers/video/tegra/host/host1x/hw_host1x01_uclass.h
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+ * Function naming determines intended use:
+ *
+ * <x>_r(void) : Returns the offset for register <x>.
+ *
+ * <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+ *
+ * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+ *
+ * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+ * and masked to place it at field <y> of register <x>. This value
+ * can be |'d with others to produce a full register value for
+ * register <x>.
+ *
+ * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This
+ * value can be ~'d and then &'d to clear the value of field <y> for
+ * register <x>.
+ *
+ * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+ * to place it at field <y> of register <x>. This value can be |'d
+ * with others to produce a full register value for <x>.
+ *
+ * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+ * <x> value 'r' after being shifted to place its LSB at bit 0.
+ * This value is suitable for direct comparison with other unshifted
+ * values appropriate for use in field <y> of register <x>.
+ *
+ * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+ * field <y> of register <x>. This value is suitable for direct
+ * comparison with unshifted values appropriate for use in field <y>
+ * of register <x>.
+ */
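+
+ /*
+ * For example (an illustrative sketch only), the data word of a
+ * WAIT_SYNCPT host method could be composed as:
+ *
+ *   u32 data = host1x_uclass_wait_syncpt_indx_f(id) |
+ *              host1x_uclass_wait_syncpt_thresh_f(thresh);
+ */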
+
+#ifndef __hw_host1x_uclass_host1x_h__
+#define __hw_host1x_uclass_host1x_h__
+/* This file is autogenerated. Do not edit. */
+
+static inline u32 host1x_uclass_incr_syncpt_r(void)
+{
+ return 0x0;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_f(u32 v)
+{
+ return (v & 0xff) << 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_m(void)
+{
+ return 0xff << 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_v(u32 r)
+{
+ return (r >> 8) & 0xff;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_immediate_v(void)
+{
+ return 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_op_done_v(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_rd_done_v(void)
+{
+ return 2;
+}
+static inline u32 host1x_uclass_incr_syncpt_cond_reg_wr_safe_v(void)
+{
+ return 3;
+}
+static inline u32 host1x_uclass_incr_syncpt_indx_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_indx_f(u32 v)
+{
+ return (v & 0xff) << 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_indx_m(void)
+{
+ return 0xff << 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_indx_v(u32 r)
+{
+ return (r >> 0) & 0xff;
+}
+static inline u32 host1x_uclass_wait_syncpt_r(void)
+{
+ return 0x8;
+}
+static inline u32 host1x_uclass_wait_syncpt_indx_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_wait_syncpt_indx_f(u32 v)
+{
+ return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_indx_m(void)
+{
+ return 0xff << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_indx_v(u32 r)
+{
+ return (r >> 24) & 0xff;
+}
+static inline u32 host1x_uclass_wait_syncpt_thresh_s(void)
+{
+ return 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_thresh_f(u32 v)
+{
+ return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_thresh_m(void)
+{
+ return 0xffffff << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_thresh_v(u32 r)
+{
+ return (r >> 0) & 0xffffff;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_r(void)
+{
+ return 0x9;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_indx_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_indx_f(u32 v)
+{
+ return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_indx_m(void)
+{
+ return 0xff << 24;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_indx_v(u32 r)
+{
+ return (r >> 24) & 0xff;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_f(u32 v)
+{
+ return (v & 0xff) << 16;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_m(void)
+{
+ return 0xff << 16;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_v(u32 r)
+{
+ return (r >> 16) & 0xff;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_offset_s(void)
+{
+ return 16;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_offset_f(u32 v)
+{
+ return (v & 0xffff) << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_offset_m(void)
+{
+ return 0xffff << 0;
+}
+static inline u32 host1x_uclass_wait_syncpt_base_offset_v(u32 r)
+{
+ return (r >> 0) & 0xffff;
+}
+static inline u32 host1x_uclass_load_syncpt_base_r(void)
+{
+ return 0xb;
+}
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_f(u32 v)
+{
+ return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_m(void)
+{
+ return 0xff << 24;
+}
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_v(u32 r)
+{
+ return (r >> 24) & 0xff;
+}
+static inline u32 host1x_uclass_load_syncpt_base_value_s(void)
+{
+ return 24;
+}
+static inline u32 host1x_uclass_load_syncpt_base_value_f(u32 v)
+{
+ return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_load_syncpt_base_value_m(void)
+{
+ return 0xffffff << 0;
+}
+static inline u32 host1x_uclass_load_syncpt_base_value_v(u32 r)
+{
+ return (r >> 0) & 0xffffff;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_r(void)
+{
+ return 0xc;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_f(u32 v)
+{
+ return (v & 0xff) << 24;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_m(void)
+{
+ return 0xff << 24;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_v(u32 r)
+{
+ return (r >> 24) & 0xff;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_offset_s(void)
+{
+ return 24;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_offset_f(u32 v)
+{
+ return (v & 0xffffff) << 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_offset_m(void)
+{
+ return 0xffffff << 0;
+}
+static inline u32 host1x_uclass_incr_syncpt_base_offset_v(u32 r)
+{
+ return (r >> 0) & 0xffffff;
+}
+static inline u32 host1x_uclass_indoff_r(void)
+{
+ return 0x2d;
+}
+static inline u32 host1x_uclass_indoff_indbe_s(void)
+{
+ return 4;
+}
+static inline u32 host1x_uclass_indoff_indbe_f(u32 v)
+{
+ return (v & 0xf) << 28;
+}
+static inline u32 host1x_uclass_indoff_indbe_m(void)
+{
+ return 0xf << 28;
+}
+static inline u32 host1x_uclass_indoff_indbe_v(u32 r)
+{
+ return (r >> 28) & 0xf;
+}
+static inline u32 host1x_uclass_indoff_autoinc_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_indoff_autoinc_f(u32 v)
+{
+ return (v & 0x1) << 27;
+}
+static inline u32 host1x_uclass_indoff_autoinc_m(void)
+{
+ return 0x1 << 27;
+}
+static inline u32 host1x_uclass_indoff_autoinc_v(u32 r)
+{
+ return (r >> 27) & 0x1;
+}
+static inline u32 host1x_uclass_indoff_spool_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_indoff_spool_f(u32 v)
+{
+ return (v & 0x1) << 26;
+}
+static inline u32 host1x_uclass_indoff_spool_m(void)
+{
+ return 0x1 << 26;
+}
+static inline u32 host1x_uclass_indoff_spool_v(u32 r)
+{
+ return (r >> 26) & 0x1;
+}
+static inline u32 host1x_uclass_indoff_indoffset_s(void)
+{
+ return 24;
+}
+static inline u32 host1x_uclass_indoff_indoffset_f(u32 v)
+{
+ return (v & 0xffffff) << 2;
+}
+static inline u32 host1x_uclass_indoff_indoffset_m(void)
+{
+ return 0xffffff << 2;
+}
+static inline u32 host1x_uclass_indoff_indoffset_v(u32 r)
+{
+ return (r >> 2) & 0xffffff;
+}
+static inline u32 host1x_uclass_indoff_indmodid_s(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_indoff_indmodid_f(u32 v)
+{
+ return (v & 0xff) << 18;
+}
+static inline u32 host1x_uclass_indoff_indmodid_m(void)
+{
+ return 0xff << 18;
+}
+static inline u32 host1x_uclass_indoff_indmodid_v(u32 r)
+{
+ return (r >> 18) & 0xff;
+}
+static inline u32 host1x_uclass_indoff_indmodid_host1x_v(void)
+{
+ return 0;
+}
+static inline u32 host1x_uclass_indoff_indmodid_mpe_v(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_indoff_indmodid_vi_v(void)
+{
+ return 2;
+}
+static inline u32 host1x_uclass_indoff_indmodid_epp_v(void)
+{
+ return 3;
+}
+static inline u32 host1x_uclass_indoff_indmodid_isp_v(void)
+{
+ return 4;
+}
+static inline u32 host1x_uclass_indoff_indmodid_gr2d_v(void)
+{
+ return 5;
+}
+static inline u32 host1x_uclass_indoff_indmodid_gr3d_v(void)
+{
+ return 6;
+}
+static inline u32 host1x_uclass_indoff_indmodid_display_v(void)
+{
+ return 8;
+}
+static inline u32 host1x_uclass_indoff_indmodid_tvo_v(void)
+{
+ return 11;
+}
+static inline u32 host1x_uclass_indoff_indmodid_displayb_v(void)
+{
+ return 9;
+}
+static inline u32 host1x_uclass_indoff_indmodid_dsi_v(void)
+{
+ return 12;
+}
+static inline u32 host1x_uclass_indoff_indmodid_hdmi_v(void)
+{
+ return 10;
+}
+static inline u32 host1x_uclass_indoff_indmodid_dsib_v(void)
+{
+ return 16;
+}
+static inline u32 host1x_uclass_indoff_indroffset_s(void)
+{
+ return 16;
+}
+static inline u32 host1x_uclass_indoff_indroffset_f(u32 v)
+{
+ return (v & 0xffff) << 2;
+}
+static inline u32 host1x_uclass_indoff_indroffset_m(void)
+{
+ return 0xffff << 2;
+}
+static inline u32 host1x_uclass_indoff_indroffset_v(u32 r)
+{
+ return (r >> 2) & 0xffff;
+}
+static inline u32 host1x_uclass_indoff_acctype_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_indoff_acctype_f(u32 v)
+{
+ return (v & 0x1) << 1;
+}
+static inline u32 host1x_uclass_indoff_acctype_m(void)
+{
+ return 0x1 << 1;
+}
+static inline u32 host1x_uclass_indoff_acctype_v(u32 r)
+{
+ return (r >> 1) & 0x1;
+}
+static inline u32 host1x_uclass_indoff_acctype_reg_v(void)
+{
+ return 0;
+}
+static inline u32 host1x_uclass_indoff_acctype_fb_v(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_indoff_rwn_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_indoff_rwn_f(u32 v)
+{
+ return (v & 0x1) << 0;
+}
+static inline u32 host1x_uclass_indoff_rwn_m(void)
+{
+ return 0x1 << 0;
+}
+static inline u32 host1x_uclass_indoff_rwn_v(u32 r)
+{
+ return (r >> 0) & 0x1;
+}
+static inline u32 host1x_uclass_indoff_rwn_write_v(void)
+{
+ return 0;
+}
+static inline u32 host1x_uclass_indoff_rwn_read_v(void)
+{
+ return 1;
+}
+static inline u32 host1x_uclass_inddata_r(void)
+{
+ return 0x2e;
+}
+
+#endif /* __hw_host1x_uclass_host1x_h__ */
diff --git a/drivers/video/tegra/host/nvhost_cdma.c b/drivers/video/tegra/host/nvhost_cdma.c
new file mode 100644
index 0000000..e581836
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_cdma.c
@@ -0,0 +1,429 @@
+/*
+ * drivers/video/tegra/host/nvhost_cdma.c
+ *
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "nvhost_cdma.h"
+#include "nvhost_channel.h"
+#include "dev.h"
+#include "nvhost_memmgr.h"
+#include "chip_support.h"
+#include <asm/cacheflush.h>
+
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/interrupt.h>
+
+/*
+ * TODO:
+ * stats
+ * - for figuring out what to optimize further
+ * resizable push buffer
+ * - some channels hardly need any, some channels (3d) could use more
+ */
+
+/**
+ * Add an entry to the sync queue.
+ */
+static void add_to_sync_queue(struct nvhost_cdma *cdma,
+ struct nvhost_job *job,
+ u32 nr_slots,
+ u32 first_get)
+{
+ if (job->syncpt_id == NVSYNCPT_INVALID) {
+ dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
+ __func__);
+ return;
+ }
+
+ job->first_get = first_get;
+ job->num_slots = nr_slots;
+ nvhost_job_get(job);
+ list_add_tail(&job->list, &cdma->sync_queue);
+}
+
+/**
+ * Return the status of the cdma's sync queue or push buffer for the given event
+ * - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
+ * - pb space: returns the number of free slots in the channel's push buffer
+ * Must be called with the cdma lock held.
+ */
+static unsigned int cdma_status_locked(struct nvhost_cdma *cdma,
+ enum cdma_event event)
+{
+ switch (event) {
+ case CDMA_EVENT_SYNC_QUEUE_EMPTY:
+ return list_empty(&cdma->sync_queue) ? 1 : 0;
+ case CDMA_EVENT_PUSH_BUFFER_SPACE: {
+ struct push_buffer *pb = &cdma->push_buffer;
+ return cdma_pb_op().space(pb);
+ }
+ default:
+ return 0;
+ }
+}
+
+/**
+ * Sleep (if necessary) until the requested event happens
+ * - CDMA_EVENT_SYNC_QUEUE_EMPTY : sync queue is completely empty.
+ * - Returns 1
+ * - CDMA_EVENT_PUSH_BUFFER_SPACE : there is space in the push buffer
+ * - Returns the amount of space (> 0)
+ * Must be called with the cdma lock held.
+ */
+unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
+ enum cdma_event event)
+{
+ for (;;) {
+ unsigned int space = cdma_status_locked(cdma, event);
+ if (space)
+ return space;
+
+ /* If somebody has managed to already start waiting, yield */
+ if (cdma->event != CDMA_EVENT_NONE) {
+ mutex_unlock(&cdma->lock);
+ schedule();
+ mutex_lock(&cdma->lock);
+ continue;
+ }
+ cdma->event = event;
+
+ mutex_unlock(&cdma->lock);
+ down(&cdma->sem);
+ mutex_lock(&cdma->lock);
+ }
+ return 0;
+}
+
+/**
+ * Start the timeout timer for a buffer submission that has not completed yet.
+ * Must be called with the cdma lock held.
+ */
+static void cdma_start_timer_locked(struct nvhost_cdma *cdma,
+ struct nvhost_job *job)
+{
+ if (cdma->timeout.clientid) {
+ /* timer already started */
+ return;
+ }
+
+ cdma->timeout.clientid = job->clientid;
+ cdma->timeout.syncpt_id = job->syncpt_id;
+ cdma->timeout.syncpt_val = job->syncpt_end;
+ cdma->timeout.start_ktime = ktime_get();
+
+ schedule_delayed_work(&cdma->timeout.wq,
+ msecs_to_jiffies(job->timeout));
+}
+
+/**
+ * Stop the timeout timer when a buffer submission completes.
+ * Must be called with the cdma lock held.
+ */
+static void stop_cdma_timer_locked(struct nvhost_cdma *cdma)
+{
+ cancel_delayed_work(&cdma->timeout.wq);
+ cdma->timeout.clientid = 0;
+}
+
+/**
+ * For all sync queue entries that have already finished according to the
+ * current sync point registers:
+ * - unpin & unref their mems
+ * - pop their push buffer slots
+ * - remove them from the sync queue
+ * This is normally called from the host code's worker thread, but can be
+ * called manually if necessary.
+ * Must be called with the cdma lock held.
+ */
+static void update_cdma_locked(struct nvhost_cdma *cdma)
+{
+ bool signal = false;
+ struct nvhost_master *dev = cdma_to_dev(cdma);
+ struct nvhost_syncpt *sp = &dev->syncpt;
+ struct nvhost_job *job, *n;
+
+ /* If CDMA is stopped, queue is cleared and we can return */
+ if (!cdma->running)
+ return;
+
+ /*
+ * Walk the sync queue, reading the sync point registers as necessary,
+ * to consume as many sync queue entries as possible without blocking
+ */
+ list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
+ /* Check whether this syncpt has completed, and bail if not */
+ if (!nvhost_syncpt_is_expired(sp,
+ job->syncpt_id, job->syncpt_end)) {
+ /* Start timer on next pending syncpt */
+ if (job->timeout)
+ cdma_start_timer_locked(cdma, job);
+ break;
+ }
+
+ /* Cancel timeout, when a buffer completes */
+ if (cdma->timeout.clientid)
+ stop_cdma_timer_locked(cdma);
+
+ /* Unpin the memory */
+ nvhost_job_unpin(job);
+
+ /* Pop push buffer slots */
+ if (job->num_slots) {
+ struct push_buffer *pb = &cdma->push_buffer;
+ cdma_pb_op().pop_from(pb, job->num_slots);
+ if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
+ signal = true;
+ }
+
+ list_del(&job->list);
+ nvhost_job_put(job);
+ }
+
+ if (list_empty(&cdma->sync_queue) &&
+ cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
+ signal = true;
+
+ /* Wake up CdmaWait() if the requested event happened */
+ if (signal) {
+ cdma->event = CDMA_EVENT_NONE;
+ up(&cdma->sem);
+ }
+}
+
+void nvhost_cdma_update_sync_queue(struct nvhost_cdma *cdma,
+ struct nvhost_syncpt *syncpt, struct platform_device *dev)
+{
+ u32 get_restart;
+ u32 syncpt_incrs;
+ struct nvhost_job *job = NULL;
+ u32 syncpt_val;
+
+ syncpt_val = nvhost_syncpt_update_min(syncpt, cdma->timeout.syncpt_id);
+
+ dev_dbg(&dev->dev,
+ "%s: starting cleanup (thresh %d)\n",
+ __func__, syncpt_val);
+
+ /*
+ * Move the sync_queue read pointer to the first entry that hasn't
+ * completed based on the current HW syncpt value. It's likely there
+ * won't be any (i.e. we're still at the head), but this covers the case
+ * where a syncpt incr happens just prior to or during the teardown.
+ */
+
+ dev_dbg(&dev->dev,
+ "%s: skip completed buffers still in sync_queue\n",
+ __func__);
+
+ list_for_each_entry(job, &cdma->sync_queue, list) {
+ if (syncpt_val < job->syncpt_end)
+ break;
+
+ nvhost_job_dump(&dev->dev, job);
+ }
+
+ /*
+ * Walk the sync_queue, first incrementing with the CPU syncpts that
+ * are partially executed (the first buffer) or fully skipped while
+ * still in the current context (slots are also NOP-ed).
+ *
+ * At the point contexts are interleaved, syncpt increments must be
+ * done inline with the pushbuffer from a GATHER buffer to maintain
+ * the order (slots are modified to be a GATHER of syncpt incrs).
+ *
+ * Note: save in get_restart the location where the timed out buffer
+ * started in the PB, so we can start the refetch from there (with the
+ * modified NOP-ed PB slots). This makes the buffer appear to have completed
+ * properly, so its resources can be freed.
+ */
+
+ dev_dbg(&dev->dev,
+ "%s: perform CPU incr on pending same ctx buffers\n",
+ __func__);
+
+ get_restart = cdma->last_put;
+ if (!list_empty(&cdma->sync_queue))
+ get_restart = job->first_get;
+
+ /* do CPU increments as long as this context continues */
+ list_for_each_entry_from(job, &cdma->sync_queue, list) {
+ /* different context, gets us out of this loop */
+ if (job->clientid != cdma->timeout.clientid)
+ break;
+
+ /* won't need a timeout when replayed */
+ job->timeout = 0;
+
+ syncpt_incrs = job->syncpt_end - syncpt_val;
+ dev_dbg(&dev->dev,
+ "%s: CPU incr (%d)\n", __func__, syncpt_incrs);
+
+ nvhost_job_dump(&dev->dev, job);
+
+ /* safe to use CPU to incr syncpts */
+ cdma_op().timeout_cpu_incr(cdma,
+ job->first_get,
+ syncpt_incrs,
+ job->syncpt_end,
+ job->num_slots);
+
+ syncpt_val += syncpt_incrs;
+ }
+
+ list_for_each_entry_from(job, &cdma->sync_queue, list)
+ if (job->clientid == cdma->timeout.clientid)
+ job->timeout = 500;
+
+ dev_dbg(&dev->dev,
+ "%s: finished sync_queue modification\n", __func__);
+
+ /* roll back DMAGET and start up channel again */
+ cdma_op().timeout_teardown_end(cdma, get_restart);
+}
+
+/**
+ * Create a cdma
+ */
+int nvhost_cdma_init(struct nvhost_cdma *cdma)
+{
+ int err;
+ struct push_buffer *pb = &cdma->push_buffer;
+ mutex_init(&cdma->lock);
+ sema_init(&cdma->sem, 0);
+
+ INIT_LIST_HEAD(&cdma->sync_queue);
+
+ cdma->event = CDMA_EVENT_NONE;
+ cdma->running = false;
+ cdma->torndown = false;
+
+ err = cdma_pb_op().init(pb);
+ if (err)
+ return err;
+ return 0;
+}
+
+/**
+ * Destroy a cdma
+ */
+void nvhost_cdma_deinit(struct nvhost_cdma *cdma)
+{
+ struct push_buffer *pb = &cdma->push_buffer;
+
+ if (cdma->running) {
+ pr_warn("%s: CDMA still running\n",
+ __func__);
+ } else {
+ cdma_pb_op().destroy(pb);
+ cdma_op().timeout_destroy(cdma);
+ }
+}
+
+/**
+ * Begin a cdma submit
+ */
+int nvhost_cdma_begin(struct nvhost_cdma *cdma, struct nvhost_job *job)
+{
+ mutex_lock(&cdma->lock);
+
+ if (job->timeout) {
+ /* init state on first submit with timeout value */
+ if (!cdma->timeout.initialized) {
+ int err;
+ err = cdma_op().timeout_init(cdma,
+ job->syncpt_id);
+ if (err) {
+ mutex_unlock(&cdma->lock);
+ return err;
+ }
+ }
+ }
+ if (!cdma->running)
+ cdma_op().start(cdma);
+
+ cdma->slots_free = 0;
+ cdma->slots_used = 0;
+ cdma->first_get = cdma_pb_op().putptr(&cdma->push_buffer);
+ return 0;
+}
+
+/**
+ * Push two words into a push buffer slot
+ * Blocks as necessary if the push buffer is full.
+ */
+void nvhost_cdma_push(struct nvhost_cdma *cdma, u32 op1, u32 op2)
+{
+ nvhost_cdma_push_gather(cdma, NULL, 0, op1, op2);
+}
+
+/**
+ * Push two words into a push buffer slot, recording the gather memory handle.
+ * Blocks as necessary if the push buffer is full.
+ */
+void nvhost_cdma_push_gather(struct nvhost_cdma *cdma,
+ struct mem_handle *handle,
+ u32 offset, u32 op1, u32 op2)
+{
+ u32 slots_free = cdma->slots_free;
+ struct push_buffer *pb = &cdma->push_buffer;
+
+ if (slots_free == 0) {
+ cdma_op().kick(cdma);
+ slots_free = nvhost_cdma_wait_locked(cdma,
+ CDMA_EVENT_PUSH_BUFFER_SPACE);
+ }
+ cdma->slots_free = slots_free - 1;
+ cdma->slots_used++;
+ cdma_pb_op().push_to(pb, handle, op1, op2);
+}
+
+/**
+ * End a cdma submit
+ * Kick off DMA, add the job to the sync queue, and record the number of push
+ * buffer slots to be freed. The handles for a submit must all be pinned at the
+ * same time, but they can be unpinned in smaller chunks.
+ */
+void nvhost_cdma_end(struct nvhost_cdma *cdma,
+ struct nvhost_job *job)
+{
+ bool was_idle = list_empty(&cdma->sync_queue);
+
+ cdma_op().kick(cdma);
+
+ add_to_sync_queue(cdma,
+ job,
+ cdma->slots_used,
+ cdma->first_get);
+
+ /* start timer on idle -> active transitions */
+ if (job->timeout && was_idle)
+ cdma_start_timer_locked(cdma, job);
+
+ mutex_unlock(&cdma->lock);
+}
+
+/**
+ * Update cdma state according to current sync point values
+ */
+void nvhost_cdma_update(struct nvhost_cdma *cdma)
+{
+ mutex_lock(&cdma->lock);
+ update_cdma_locked(cdma);
+ mutex_unlock(&cdma->lock);
+}
diff --git a/drivers/video/tegra/host/nvhost_cdma.h b/drivers/video/tegra/host/nvhost_cdma.h
new file mode 100644
index 0000000..ab40bf1
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_cdma.h
@@ -0,0 +1,109 @@
+/*
+ * drivers/video/tegra/host/nvhost_cdma.h
+ *
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CDMA_H
+#define __NVHOST_CDMA_H
+
+#include <linux/sched.h>
+#include <linux/semaphore.h>
+
+#include <linux/nvhost.h>
+#include <linux/list.h>
+
+struct nvhost_syncpt;
+struct nvhost_userctx_timeout;
+struct nvhost_job;
+struct mem_handle;
+
+/*
+ * cdma
+ *
+ * This is in charge of a host command DMA channel.
+ * Sends ops to a push buffer, and takes responsibility for unpinning
+ * (and possibly freeing) memory after those ops have completed.
+ * Producer:
+ * begin
+ * push - send ops to the push buffer
+ * end - start command DMA and enqueue handles to be unpinned
+ * Consumer:
+ * update - call to update sync queue and push buffer, unpin memory
+ */
+
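+/*
+ * A typical producer sequence, as a rough sketch (error handling omitted;
+ * see host1x_channel_submit() for the full flow):
+ *
+ *   err = nvhost_cdma_begin(&ch->cdma, job);
+ *   nvhost_cdma_push(&ch->cdma, op1, op2);
+ *   nvhost_cdma_end(&ch->cdma, job);
+ *
+ * The consumer side runs from the submit complete handler:
+ *
+ *   nvhost_cdma_update(&ch->cdma);
+ */
+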
+struct push_buffer {
+ u32 *mapped; /* mapped pushbuffer memory */
+ dma_addr_t phys; /* physical address of pushbuffer */
+ u32 fence; /* index we've written */
+ u32 cur; /* index to write to */
+ struct mem_handle **handle; /* handle for each opcode pair */
+};
+
+struct buffer_timeout {
+ struct delayed_work wq; /* work queue */
+ bool initialized; /* timer one-time setup flag */
+ u32 syncpt_id; /* buffer completion syncpt id */
+ u32 syncpt_val; /* syncpt value when completed */
+ ktime_t start_ktime; /* starting time */
+ /* context timeout information */
+ int clientid;
+};
+
+enum cdma_event {
+ CDMA_EVENT_NONE, /* not waiting for any event */
+ CDMA_EVENT_SYNC_QUEUE_EMPTY, /* wait for empty sync queue */
+ CDMA_EVENT_PUSH_BUFFER_SPACE /* wait for space in push buffer */
+};
+
+struct nvhost_cdma {
+ struct mutex lock; /* controls access to shared state */
+ struct semaphore sem; /* signalled when event occurs */
+ enum cdma_event event; /* event that sem is waiting for */
+ unsigned int slots_used; /* pb slots used in current submit */
+ unsigned int slots_free; /* pb slots free in current submit */
+ unsigned int first_get; /* DMAGET value, where submit begins */
+ unsigned int last_put; /* last value written to DMAPUT */
+ struct push_buffer push_buffer; /* channel's push buffer */
+ struct list_head sync_queue; /* job queue */
+ struct buffer_timeout timeout; /* channel's timeout state/wq */
+ bool running;
+ bool torndown;
+};
+
+#define cdma_to_channel(cdma) container_of(cdma, struct nvhost_channel, cdma)
+#define cdma_to_dev(cdma) nvhost_get_host(cdma_to_channel(cdma)->dev)
+#define cdma_to_memmgr(cdma) ((cdma_to_dev(cdma))->memmgr)
+#define pb_to_cdma(pb) container_of(pb, struct nvhost_cdma, push_buffer)
+
+int nvhost_cdma_init(struct nvhost_cdma *cdma);
+void nvhost_cdma_deinit(struct nvhost_cdma *cdma);
+void nvhost_cdma_stop(struct nvhost_cdma *cdma);
+int nvhost_cdma_begin(struct nvhost_cdma *cdma, struct nvhost_job *job);
+void nvhost_cdma_push(struct nvhost_cdma *cdma, u32 op1, u32 op2);
+void nvhost_cdma_push_gather(struct nvhost_cdma *cdma,
+ struct mem_handle *handle, u32 offset, u32 op1, u32 op2);
+void nvhost_cdma_end(struct nvhost_cdma *cdma,
+ struct nvhost_job *job);
+void nvhost_cdma_update(struct nvhost_cdma *cdma);
+void nvhost_cdma_peek(struct nvhost_cdma *cdma,
+ u32 dmaget, int slot, u32 *out);
+unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
+ enum cdma_event event);
+void nvhost_cdma_update_sync_queue(struct nvhost_cdma *cdma,
+ struct nvhost_syncpt *syncpt, struct platform_device *dev);
+#endif
diff --git a/drivers/video/tegra/host/nvhost_channel.c b/drivers/video/tegra/host/nvhost_channel.c
new file mode 100644
index 0000000..a134f33
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_channel.c
@@ -0,0 +1,126 @@
+/*
+ * drivers/video/tegra/host/nvhost_channel.c
+ *
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "nvhost_channel.h"
+#include "dev.h"
+#include "nvhost_acm.h"
+#include "chip_support.h"
+
+#include <linux/slab.h>
+#include <linux/module.h>
+
+#define NVHOST_CHANNEL_LOW_PRIO_MAX_WAIT 50
+
+int nvhost_channel_init(struct nvhost_channel *ch,
+ struct nvhost_master *dev, int index)
+{
+ int err;
+ struct nvhost_device_data *pdata = platform_get_drvdata(ch->dev);
+
+ /* Link platform_device to nvhost_channel */
+ err = channel_op().init(ch, dev, index);
+ if (err < 0) {
+ dev_err(&dev->dev->dev, "failed to init channel %d\n",
+ index);
+ return err;
+ }
+ pdata->channel = ch;
+
+ return 0;
+}
+
+int nvhost_channel_submit(struct nvhost_job *job)
+{
+ return channel_op().submit(job);
+}
+EXPORT_SYMBOL(nvhost_channel_submit);
+
+struct nvhost_channel *nvhost_getchannel(struct nvhost_channel *ch)
+{
+ int err = 0;
+ struct nvhost_device_data *pdata = platform_get_drvdata(ch->dev);
+
+ mutex_lock(&ch->reflock);
+ if (ch->refcount == 0) {
+ if (pdata->init)
+ pdata->init(ch->dev);
+ err = nvhost_cdma_init(&ch->cdma);
+ }
+ if (!err)
+ ch->refcount++;
+
+ mutex_unlock(&ch->reflock);
+
+ return err ? NULL : ch;
+}
+EXPORT_SYMBOL(nvhost_getchannel);
+
+void nvhost_putchannel(struct nvhost_channel *ch)
+{
+ mutex_lock(&ch->reflock);
+ if (ch->refcount == 1) {
+ channel_cdma_op().stop(&ch->cdma);
+ nvhost_cdma_deinit(&ch->cdma);
+ nvhost_module_suspend(ch->dev);
+ }
+ ch->refcount--;
+ mutex_unlock(&ch->reflock);
+}
+EXPORT_SYMBOL(nvhost_putchannel);
+
+int nvhost_channel_suspend(struct nvhost_channel *ch)
+{
+ int ret = 0;
+
+ mutex_lock(&ch->reflock);
+
+ if (ch->refcount) {
+ ret = nvhost_module_suspend(ch->dev);
+ if (!ret)
+ channel_cdma_op().stop(&ch->cdma);
+ }
+ mutex_unlock(&ch->reflock);
+
+ return ret;
+}
+
+struct nvhost_channel *nvhost_alloc_channel_internal(int chindex,
+ int max_channels, int *current_channel_count)
+{
+ struct nvhost_channel *ch = NULL;
+
+ if (chindex > max_channels ||
+ (*current_channel_count + 1) > max_channels)
+ return NULL;
+
+ ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+ if (ch == NULL)
+ return NULL;
+
+ (*current_channel_count)++;
+ return ch;
+}
+
+void nvhost_free_channel_internal(struct nvhost_channel *ch,
+ int *current_channel_count)
+{
+ kfree(ch);
+ (*current_channel_count)--;
+}
diff --git a/drivers/video/tegra/host/nvhost_channel.h b/drivers/video/tegra/host/nvhost_channel.h
new file mode 100644
index 0000000..fff94b1
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_channel.h
@@ -0,0 +1,65 @@
+/*
+ * drivers/video/tegra/host/nvhost_channel.h
+ *
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_CHANNEL_H
+#define __NVHOST_CHANNEL_H
+
+#include <linux/cdev.h>
+#include <linux/io.h>
+#include "nvhost_cdma.h"
+
+#define NVHOST_MAX_WAIT_CHECKS 256
+#define NVHOST_MAX_GATHERS 512
+#define NVHOST_MAX_HANDLES 1280
+#define NVHOST_MAX_POWERGATE_IDS 2
+
+struct nvhost_master;
+struct platform_device;
+struct nvhost_channel;
+
+struct nvhost_channel {
+ int refcount;
+ int chid;
+ u32 syncpt_id;
+ struct mutex reflock;
+ struct mutex submitlock;
+ void __iomem *aperture;
+ struct device *node;
+ struct platform_device *dev;
+ struct cdev cdev;
+ struct nvhost_cdma cdma;
+};
+
+int nvhost_channel_init(struct nvhost_channel *ch,
+ struct nvhost_master *dev, int index);
+
+struct nvhost_channel *nvhost_getchannel(struct nvhost_channel *ch);
+void nvhost_putchannel(struct nvhost_channel *ch);
+int nvhost_channel_suspend(struct nvhost_channel *ch);
+
+struct nvhost_channel *nvhost_alloc_channel_internal(int chindex,
+ int max_channels, int *current_channel_count);
+
+void nvhost_free_channel_internal(struct nvhost_channel *ch,
+ int *current_channel_count);
+
+int nvhost_channel_save_context(struct nvhost_channel *ch);
+
+#endif
diff --git a/drivers/video/tegra/host/nvhost_intr.c b/drivers/video/tegra/host/nvhost_intr.c
index 35dd7bb..0b451c8 100644
--- a/drivers/video/tegra/host/nvhost_intr.c
+++ b/drivers/video/tegra/host/nvhost_intr.c
@@ -23,6 +23,7 @@
#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/irq.h>
+#include "nvhost_channel.h"
#include "chip_support.h"
#include "host1x/host1x.h"

@@ -78,7 +79,7 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
struct list_head completed[NVHOST_INTR_ACTION_COUNT])
{
struct list_head *dest;
- struct nvhost_waitlist *waiter, *next;
+ struct nvhost_waitlist *waiter, *next, *prev;

list_for_each_entry_safe(waiter, next, head, list) {
if ((s32)(waiter->thresh - sync) > 0)
@@ -86,6 +87,17 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,

dest = completed + waiter->action;

+ /* consolidate submit cleanups */
+ if (waiter->action == NVHOST_INTR_ACTION_SUBMIT_COMPLETE
+ && !list_empty(dest)) {
+ prev = list_entry(dest->prev,
+ struct nvhost_waitlist, list);
+ if (prev->data == waiter->data) {
+ prev->count++;
+ dest = NULL;
+ }
+ }
+
/* PENDING->REMOVED or CANCELLED->HANDLED */
if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
list_del(&waiter->list);
@@ -107,6 +119,14 @@ void reset_threshold_interrupt(struct nvhost_intr *intr,
intr_op().enable_syncpt_intr(intr, id);
}

+static void action_submit_complete(struct nvhost_waitlist *waiter)
+{
+ struct nvhost_channel *channel = waiter->data;
+ int nr_completed = waiter->count;
+
+ nvhost_cdma_update(&channel->cdma);
+ nvhost_module_idle_mult(channel->dev, nr_completed);
+}

static void action_wakeup(struct nvhost_waitlist *waiter)
{
@@ -125,6 +145,7 @@ static void action_wakeup_interruptible(struct nvhost_waitlist *waiter)
typedef void (*action_handler)(struct nvhost_waitlist *waiter);

static action_handler action_handlers[NVHOST_INTR_ACTION_COUNT] = {
+ action_submit_complete,
action_wakeup,
action_wakeup_interruptible,
};
diff --git a/drivers/video/tegra/host/nvhost_intr.h b/drivers/video/tegra/host/nvhost_intr.h
index 31b0a38..601ea64 100644
--- a/drivers/video/tegra/host/nvhost_intr.h
+++ b/drivers/video/tegra/host/nvhost_intr.h
@@ -26,8 +26,16 @@
#include <linux/interrupt.h>
#include <linux/workqueue.h>

+struct nvhost_channel;
+
enum nvhost_intr_action {
/**
+ * Perform cleanup after a submit has completed.
+ * 'data' points to a channel
+ */
+ NVHOST_INTR_ACTION_SUBMIT_COMPLETE = 0,
+
+ /**
* Wake up a task.
* 'data' points to a wait_queue_head_t
*/
diff --git a/drivers/video/tegra/host/nvhost_job.c b/drivers/video/tegra/host/nvhost_job.c
new file mode 100644
index 0000000..aaa51b5
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_job.c
@@ -0,0 +1,390 @@
+/*
+ * drivers/video/tegra/host/nvhost_job.c
+ *
+ * Tegra host1x Job
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/kref.h>
+#include <linux/err.h>
+#include <linux/vmalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/nvhost.h>
+#include "nvhost_channel.h"
+#include "nvhost_syncpt.h"
+#include "dev.h"
+#include "nvhost_memmgr.h"
+#include "chip_support.h"
+
+/* Magic to use to fill freed handle slots */
+#define BAD_MAGIC 0xdeadbeef
+
+static size_t job_size(int num_cmdbufs, int num_relocs, int num_waitchks)
+{
+ u32 num_unpins = num_cmdbufs + num_relocs;
+ s64 total;
+
+ if (num_relocs < 0 || num_waitchks < 0 || num_cmdbufs < 0)
+ return 0;
+
+ total = sizeof(struct nvhost_job)
+ + num_relocs * sizeof(struct nvhost_reloc)
+ + num_unpins * sizeof(struct nvhost_job_unpin_data)
+ + num_waitchks * sizeof(struct nvhost_waitchk)
+ + num_cmdbufs * sizeof(struct nvhost_job_gather);
+
+ if (total > ULONG_MAX)
+ return 0;
+ return (size_t)total;
+}
+
+
+static void init_fields(struct nvhost_job *job,
+ u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
+{
+ u32 num_unpins = num_cmdbufs + num_relocs;
+ void *mem = job;
+
+ /* Job state is already zeroed by the vzalloc() in nvhost_job_alloc() */
+
+ /*
+ * Redistribute memory to the structs.
+ * Overflows and negative conditions have
+ * already been checked in job_alloc().
+ */
+ mem += sizeof(struct nvhost_job);
+ job->relocarray = num_relocs ? mem : NULL;
+ mem += num_relocs * sizeof(struct nvhost_reloc);
+ job->unpins = num_unpins ? mem : NULL;
+ mem += num_unpins * sizeof(struct nvhost_job_unpin_data);
+ job->waitchk = num_waitchks ? mem : NULL;
+ mem += num_waitchks * sizeof(struct nvhost_waitchk);
+ job->gathers = num_cmdbufs ? mem : NULL;
+ mem += num_cmdbufs * sizeof(struct nvhost_job_gather);
+ job->addr_phys = (num_cmdbufs || num_relocs) ? mem : NULL;
+
+ job->reloc_addr_phys = job->addr_phys;
+ job->gather_addr_phys = &job->addr_phys[num_relocs];
+}
+
+struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
+ int num_cmdbufs, int num_relocs, int num_waitchks)
+{
+ struct nvhost_job *job = NULL;
+ size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);
+
+ if (!size)
+ return NULL;
+ job = vzalloc(size);
+ if (!job)
+ return NULL;
+
+ kref_init(&job->ref);
+ job->ch = ch;
+
+ init_fields(job, num_cmdbufs, num_relocs, num_waitchks);
+
+ return job;
+}
+EXPORT_SYMBOL(nvhost_job_alloc);
+
+void nvhost_job_get(struct nvhost_job *job)
+{
+ kref_get(&job->ref);
+}
+EXPORT_SYMBOL(nvhost_job_get);
+
+static void job_free(struct kref *ref)
+{
+ struct nvhost_job *job = container_of(ref, struct nvhost_job, ref);
+
+ vfree(job);
+}
+
+void nvhost_job_put(struct nvhost_job *job)
+{
+ kref_put(&job->ref, job_free);
+}
+EXPORT_SYMBOL(nvhost_job_put);
+
+void nvhost_job_add_gather(struct nvhost_job *job,
+ u32 mem_id, u32 words, u32 offset)
+{
+ struct nvhost_job_gather *cur_gather =
+ &job->gathers[job->num_gathers];
+
+ cur_gather->words = words;
+ cur_gather->mem_id = mem_id;
+ cur_gather->offset = offset;
+ job->num_gathers += 1;
+}
+EXPORT_SYMBOL(nvhost_job_add_gather);
+
+/*
+ * Check driver supplied waitchk structs for syncpt thresholds
+ * that have already been satisfied and NULL the comparison (to
+ * avoid a wrap condition in the HW).
+ */
+static int do_waitchks(struct nvhost_job *job, struct nvhost_syncpt *sp,
+ u32 patch_mem, struct mem_handle *h)
+{
+ int i;
+
+ /* compare syncpt vs wait threshold */
+ for (i = 0; i < job->num_waitchk; i++) {
+ struct nvhost_waitchk *wait = &job->waitchk[i];
+
+ /* validate syncpt id */
+ if (wait->syncpt_id > nvhost_syncpt_nb_pts(sp))
+ continue;
+
+ /* skip all other gathers */
+ if (patch_mem != wait->mem)
+ continue;
+
+ if (nvhost_syncpt_is_expired(sp,
+ wait->syncpt_id, wait->thresh)) {
+ void *patch_addr = NULL;
+
+ /*
+ * NULL an already satisfied WAIT_SYNCPT host method,
+ * by patching its args in the command stream. The
+ * method data is changed to reference a reserved
+ * (never given out or incr) NVSYNCPT_GRAPHICS_HOST
+ * syncpt with a matching threshold value of 0, so
+ * is guaranteed to be popped by the host HW.
+ */
+ dev_dbg(&syncpt_to_dev(sp)->dev->dev,
+ "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
+ wait->syncpt_id,
+ syncpt_op().name(sp, wait->syncpt_id),
+ wait->thresh,
+ nvhost_syncpt_read_min(sp, wait->syncpt_id));
+
+ /* patch the wait */
+ patch_addr = nvhost_memmgr_kmap(h,
+ wait->offset >> PAGE_SHIFT);
+ if (patch_addr) {
+ nvhost_syncpt_patch_wait(sp,
+ (patch_addr +
+ (wait->offset & ~PAGE_MASK)));
+ nvhost_memmgr_kunmap(h,
+ wait->offset >> PAGE_SHIFT,
+ patch_addr);
+ } else {
+ pr_err("Couldn't map cmdbuf for wait check\n");
+ }
+ }
+
+ wait->mem = 0;
+ }
+ return 0;
+}
+
+
+static int pin_job_mem(struct nvhost_job *job)
+{
+ int i;
+ int count = 0;
+ int result;
+ unsigned long *ids =
+ kmalloc(sizeof(*ids) *
+ (job->num_relocs + job->num_gathers),
+ GFP_KERNEL);
+ if (!ids)
+ return -ENOMEM;
+
+ for (i = 0; i < job->num_relocs; i++) {
+ struct nvhost_reloc *reloc = &job->relocarray[i];
+ ids[count] = reloc->target;
+ count++;
+ }
+
+ for (i = 0; i < job->num_gathers; i++) {
+ struct nvhost_job_gather *g = &job->gathers[i];
+ ids[count] = g->mem_id;
+ count++;
+ }
+
+ /* validate array and pin unique ids, get refs for unpinning */
+ result = nvhost_memmgr_pin_array_ids(job->ch->dev,
+ ids, job->addr_phys,
+ count,
+ job->unpins);
+ kfree(ids);
+
+ if (result > 0)
+ job->num_unpins = result;
+
+ return result;
+}
+
+static int do_relocs(struct nvhost_job *job,
+ u32 cmdbuf_mem, struct mem_handle *h)
+{
+ int i = 0;
+ int last_page = -1;
+ void *cmdbuf_page_addr = NULL;
+
+ /* pin & patch the relocs for one gather */
+ while (i < job->num_relocs) {
+ struct nvhost_reloc *reloc = &job->relocarray[i];
+
+ /* skip all other gathers */
+ if (cmdbuf_mem != reloc->cmdbuf_mem) {
+ i++;
+ continue;
+ }
+
+ if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
+ if (cmdbuf_page_addr)
+ nvhost_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
+
+ cmdbuf_page_addr = nvhost_memmgr_kmap(h,
+ reloc->cmdbuf_offset >> PAGE_SHIFT);
+ last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
+
+ if (unlikely(!cmdbuf_page_addr)) {
+ pr_err("Couldn't map cmdbuf for relocation\n");
+ return -ENOMEM;
+ }
+ }
+
+ __raw_writel(
+ (job->reloc_addr_phys[i] +
+ reloc->target_offset) >> reloc->shift,
+ (cmdbuf_page_addr +
+ (reloc->cmdbuf_offset & ~PAGE_MASK)));
+
+ /* remove completed reloc from the job */
+ if (i != job->num_relocs - 1) {
+ struct nvhost_reloc *reloc_last =
+ &job->relocarray[job->num_relocs - 1];
+ reloc->cmdbuf_mem = reloc_last->cmdbuf_mem;
+ reloc->cmdbuf_offset = reloc_last->cmdbuf_offset;
+ reloc->target = reloc_last->target;
+ reloc->target_offset = reloc_last->target_offset;
+ reloc->shift = reloc_last->shift;
+ job->reloc_addr_phys[i] =
+ job->reloc_addr_phys[job->num_relocs - 1];
+ job->num_relocs--;
+ } else {
+ break;
+ }
+ }
+
+ if (cmdbuf_page_addr)
+ nvhost_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
+
+ return 0;
+}
+
+
+int nvhost_job_pin(struct nvhost_job *job, struct platform_device *pdev)
+{
+ int err = 0, i = 0, j = 0;
+ struct nvhost_syncpt *sp = &nvhost_get_host(pdev)->syncpt;
+ unsigned long waitchk_mask[nvhost_syncpt_nb_pts(sp) / BITS_PER_LONG];
+
+ memset(&waitchk_mask[0], 0, sizeof(waitchk_mask));
+ for (i = 0; i < job->num_waitchk; i++) {
+ u32 syncpt_id = job->waitchk[i].syncpt_id;
+ if (syncpt_id < nvhost_syncpt_nb_pts(sp))
+ waitchk_mask[BIT_WORD(syncpt_id)] |=
+ BIT_MASK(syncpt_id);
+ }
+
+ /* get current syncpt values for waitchk */
+ for_each_set_bit(i, &waitchk_mask[0], nvhost_syncpt_nb_pts(sp))
+ nvhost_syncpt_update_min(sp, i);
+
+ /* pin memory */
+ err = pin_job_mem(job);
+ if (err <= 0)
+ goto fail;
+
+ /* patch gathers */
+ for (i = 0; i < job->num_gathers; i++) {
+ struct nvhost_job_gather *g = &job->gathers[i];
+
+ /* process each gather mem only once */
+ if (!g->ref) {
+ g->ref = nvhost_memmgr_get(g->mem_id, job->ch->dev);
+ if (IS_ERR(g->ref)) {
+ err = PTR_ERR(g->ref);
+ g->ref = NULL;
+ break;
+ }
+
+ g->mem_base = job->gather_addr_phys[i];
+
+ for (j = 0; j < job->num_gathers; j++) {
+ struct nvhost_job_gather *tmp =
+ &job->gathers[j];
+ if (!tmp->ref && tmp->mem_id == g->mem_id) {
+ tmp->ref = g->ref;
+ tmp->mem_base = g->mem_base;
+ }
+ }
+ err = do_relocs(job, g->mem_id, g->ref);
+ if (!err)
+ err = do_waitchks(job, sp,
+ g->mem_id, g->ref);
+ nvhost_memmgr_put(g->ref);
+ if (err)
+ break;
+ }
+ }
+fail:
+ wmb();
+
+ return err;
+}
+EXPORT_SYMBOL(nvhost_job_pin);
+
+void nvhost_job_unpin(struct nvhost_job *job)
+{
+ int i;
+
+ for (i = 0; i < job->num_unpins; i++) {
+ struct nvhost_job_unpin_data *unpin = &job->unpins[i];
+ nvhost_memmgr_unpin(unpin->h, unpin->mem);
+ nvhost_memmgr_put(unpin->h);
+ }
+ job->num_unpins = 0;
+}
+EXPORT_SYMBOL(nvhost_job_unpin);
+
+/**
+ * Debug routine used to dump job entries
+ */
+void nvhost_job_dump(struct device *dev, struct nvhost_job *job)
+{
+ dev_dbg(dev, " SYNCPT_ID %d\n",
+ job->syncpt_id);
+ dev_dbg(dev, " SYNCPT_VAL %d\n",
+ job->syncpt_end);
+ dev_dbg(dev, " FIRST_GET 0x%x\n",
+ job->first_get);
+ dev_dbg(dev, " TIMEOUT %d\n",
+ job->timeout);
+ dev_dbg(dev, " NUM_SLOTS %d\n",
+ job->num_slots);
+ dev_dbg(dev, " NUM_HANDLES %d\n",
+ job->num_unpins);
+}
diff --git a/drivers/video/tegra/host/nvhost_memmgr.c b/drivers/video/tegra/host/nvhost_memmgr.c
new file mode 100644
index 0000000..bdceb74
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_memmgr.c
@@ -0,0 +1,160 @@
+/*
+ * drivers/video/tegra/host/nvhost_memmgr.c
+ *
+ * Tegra host1x Memory Management Abstraction
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+
+#include "nvhost_memmgr.h"
+#include "dmabuf.h"
+#include "chip_support.h"
+
+struct mem_handle *nvhost_memmgr_alloc(size_t size, size_t align, int flags)
+{
+ struct mem_handle *h = NULL;
+ h = nvhost_dmabuf_alloc(size, align, flags);
+
+ return h;
+}
+
+struct mem_handle *nvhost_memmgr_get(u32 id, struct platform_device *dev)
+{
+ struct mem_handle *h = NULL;
+
+ switch (nvhost_memmgr_type(id)) {
+ case mem_mgr_type_dmabuf:
+ h = (struct mem_handle *) nvhost_dmabuf_get(id, dev);
+ break;
+ default:
+ break;
+ }
+
+ return h;
+}
+
+void nvhost_memmgr_put(struct mem_handle *handle)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ nvhost_dmabuf_put(handle);
+ break;
+ default:
+ break;
+ }
+}
+
+struct sg_table *nvhost_memmgr_pin(struct mem_handle *handle)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ return nvhost_dmabuf_pin(handle);
+ default:
+ return NULL;
+ }
+}
+
+void nvhost_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ nvhost_dmabuf_unpin(handle, sgt);
+ break;
+ default:
+ break;
+ }
+}
+
+void *nvhost_memmgr_mmap(struct mem_handle *handle)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ return nvhost_dmabuf_mmap(handle);
+ default:
+ return NULL;
+ }
+}
+
+void nvhost_memmgr_munmap(struct mem_handle *handle, void *addr)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ nvhost_dmabuf_munmap(handle, addr);
+ break;
+ default:
+ break;
+ }
+}
+
+void *nvhost_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ return nvhost_dmabuf_kmap(handle, pagenum);
+ default:
+ return NULL;
+ }
+}
+
+void nvhost_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+ void *addr)
+{
+ switch (nvhost_memmgr_type((u32)handle)) {
+ case mem_mgr_type_dmabuf:
+ nvhost_dmabuf_kunmap(handle, pagenum, addr);
+ break;
+ default:
+ break;
+ }
+}
+
+int nvhost_memmgr_pin_array_ids(struct platform_device *dev,
+ long unsigned *ids,
+ dma_addr_t *phys_addr,
+ u32 count,
+ struct nvhost_job_unpin_data *unpin_data)
+{
+ int pin_count = 0;
+
+ int dmabuf_count = 0;
+ dmabuf_count = nvhost_dmabuf_pin_array_ids(dev,
+ ids, MEMMGR_TYPE_MASK,
+ mem_mgr_type_dmabuf,
+ count, &unpin_data[pin_count],
+ phys_addr);
+
+ if (dmabuf_count < 0) {
+ /* clean up previous handles */
+ while (pin_count) {
+ pin_count--;
+ /* unpin, put */
+ nvhost_memmgr_unpin(unpin_data[pin_count].h,
+ unpin_data[pin_count].mem);
+ nvhost_memmgr_put(unpin_data[pin_count].h);
+ }
+ return dmabuf_count;
+ }
+ pin_count += dmabuf_count;
+ return pin_count;
+}
diff --git a/drivers/video/tegra/host/nvhost_memmgr.h b/drivers/video/tegra/host/nvhost_memmgr.h
new file mode 100644
index 0000000..77b755d
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_memmgr.h
@@ -0,0 +1,65 @@
+/*
+ * drivers/video/tegra/host/nvhost_memmgr.h
+ *
+ * Tegra host1x Memory Management Abstraction header
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _NVHOST_MEM_MGR_H_
+#define _NVHOST_MEM_MGR_H_
+
+struct nvhost_chip_support;
+struct mem_handle;
+struct platform_device;
+
+struct nvhost_job_unpin_data {
+ struct mem_handle *h;
+ struct sg_table *mem;
+};
+
+enum mem_mgr_flag {
+ mem_mgr_flag_uncacheable = 0,
+ mem_mgr_flag_write_combine = 1,
+};
+
+enum mem_mgr_type {
+ mem_mgr_type_dmabuf = 1,
+};
+
+#define MEMMGR_TYPE_MASK 0x3
+#define MEMMGR_ID_MASK ~0x3
+
+struct mem_handle *nvhost_memmgr_alloc(size_t size, size_t align,
+ int flags);
+struct mem_handle *nvhost_memmgr_get(u32 id, struct platform_device *dev);
+void nvhost_memmgr_put(struct mem_handle *handle);
+struct sg_table *nvhost_memmgr_pin(struct mem_handle *handle);
+void nvhost_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *nvhost_memmgr_mmap(struct mem_handle *handle);
+void nvhost_memmgr_munmap(struct mem_handle *handle, void *addr);
+void *nvhost_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum);
+void nvhost_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+ void *addr);
+static inline int nvhost_memmgr_type(u32 id) { return id & MEMMGR_TYPE_MASK; }
+static inline int nvhost_memmgr_id(u32 id) { return id & MEMMGR_ID_MASK; }
+
+int nvhost_memmgr_pin_array_ids(struct platform_device *dev,
+ long unsigned *ids,
+ dma_addr_t *phys_addr,
+ u32 count,
+ struct nvhost_job_unpin_data *unpin_data);
+
+#endif
diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c
index 6ef0ba4..f61b924 100644
--- a/drivers/video/tegra/host/nvhost_syncpt.c
+++ b/drivers/video/tegra/host/nvhost_syncpt.c
@@ -299,6 +299,12 @@ void nvhost_syncpt_debug(struct nvhost_syncpt *sp)
{
syncpt_op().debug(sp);
}
+/* remove a wait pointed to by patch_addr */
+int nvhost_syncpt_patch_wait(struct nvhost_syncpt *sp, void *patch_addr)
+{
+ return syncpt_op().patch_wait(sp, patch_addr);
+}
+
/* Displays the current value of the sync point via sysfs */
static ssize_t syncpt_min_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
diff --git a/drivers/video/tegra/host/nvhost_syncpt.h b/drivers/video/tegra/host/nvhost_syncpt.h
index dbd3890..93ec123 100644
--- a/drivers/video/tegra/host/nvhost_syncpt.h
+++ b/drivers/video/tegra/host/nvhost_syncpt.h
@@ -136,6 +136,8 @@ static inline int nvhost_syncpt_wait(struct nvhost_syncpt *sp,
MAX_SCHEDULE_TIMEOUT, NULL);
}

+int nvhost_syncpt_patch_wait(struct nvhost_syncpt *sp, void *patch_addr);
+
void nvhost_syncpt_debug(struct nvhost_syncpt *sp);

static inline int nvhost_syncpt_is_valid(struct nvhost_syncpt *sp, u32 id)
diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
index 745f31c..96405bf 100644
--- a/include/linux/nvhost.h
+++ b/include/linux/nvhost.h
@@ -27,7 +27,9 @@
#include <linux/types.h>
#include <linux/platform_device.h>

+struct nvhost_job;
struct nvhost_device_power_attr;
+struct nvhost_job_unpin_data;

#define NVHOST_MODULE_MAX_CLOCKS 3
#define NVHOST_MODULE_MAX_POWERGATE_IDS 2
@@ -37,6 +39,19 @@ struct nvhost_device_power_attr;
#define NVSYNCPT_INVALID (-1)
#define NVHOST_NO_TIMEOUT (-1)

+#define NVSYNCPT_2D_0 (18)
+#define NVSYNCPT_2D_1 (19)
+#define NVSYNCPT_VBLANK0 (26)
+#define NVSYNCPT_VBLANK1 (27)
+
+/* sync points that are wholly managed by the client */
+#define NVSYNCPTS_CLIENT_MANAGED (\
+ BIT(NVSYNCPT_VBLANK0) | \
+ BIT(NVSYNCPT_VBLANK1) | \
+ BIT(NVSYNCPT_2D_1))
+
+#define NVWAITBASE_2D_0 (1)
+#define NVWAITBASE_2D_1 (2)
enum nvhost_power_sysfs_attributes {
NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY = 0,
NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY,
@@ -142,4 +157,138 @@ void host1x_syncpt_incr(u32 id);
u32 host1x_syncpt_read(u32 id);
int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value);

+/* Register device */
+int nvhost_client_device_init(struct platform_device *dev);
+int nvhost_client_device_suspend(struct platform_device *dev);
+struct nvhost_channel *nvhost_getchannel(struct nvhost_channel *ch);
+void nvhost_putchannel(struct nvhost_channel *ch);
+int nvhost_channel_submit(struct nvhost_job *job);
+
+enum host1x_class {
+ NV_HOST1X_CLASS_ID = 0x1,
+ NV_GRAPHICS_2D_CLASS_ID = 0x51,
+};
+
+struct nvhost_job_gather {
+ u32 words;
+ struct sg_table *mem_sgt;
+ dma_addr_t mem_base;
+ u32 mem_id;
+ int offset;
+ struct mem_handle *ref;
+};
+
+struct nvhost_cmdbuf {
+ __u32 mem;
+ __u32 offset;
+ __u32 words;
+};
+
+struct nvhost_reloc {
+ __u32 cmdbuf_mem;
+ __u32 cmdbuf_offset;
+ __u32 target;
+ __u32 target_offset;
+ __u32 shift;
+};
+
+struct nvhost_waitchk {
+ __u32 mem;
+ __u32 offset;
+ __u32 syncpt_id;
+ __u32 thresh;
+};
+
+/*
+ * Each submit is tracked as a nvhost_job.
+ */
+struct nvhost_job {
+ /* When refcount goes to zero, job can be freed */
+ struct kref ref;
+
+ /* List entry */
+ struct list_head list;
+
+ /* Channel where job is submitted to */
+ struct nvhost_channel *ch;
+
+ int clientid;
+
+ /* Gathers and their memory */
+ struct nvhost_job_gather *gathers;
+ int num_gathers;
+
+ /* Wait checks to be processed at submit time */
+ struct nvhost_waitchk *waitchk;
+ int num_waitchk;
+ u32 waitchk_mask;
+
+ /* Array of handles to be pinned & unpinned */
+ struct nvhost_reloc *relocarray;
+ int num_relocs;
+ struct nvhost_job_unpin_data *unpins;
+ int num_unpins;
+
+ dma_addr_t *addr_phys;
+ dma_addr_t *gather_addr_phys;
+ dma_addr_t *reloc_addr_phys;
+
+ /* Sync point id, number of increments and end related to the submit */
+ u32 syncpt_id;
+ u32 syncpt_incrs;
+ u32 syncpt_end;
+
+ /* Maximum time to wait for this job */
+ int timeout;
+
+ /* Null kickoff prevents submit from being sent to hardware */
+ bool null_kickoff;
+
+ /* Index and number of slots used in the push buffer */
+ int first_get;
+ int num_slots;
+};
+/*
+ * Allocate memory for a job. Just enough memory will be allocated to
+ * accomodate the submit.
+ */
+struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
+ int num_cmdbufs, int num_relocs, int num_waitchks);
+
+/*
+ * Add a gather to a job.
+ */
+void nvhost_job_add_gather(struct nvhost_job *job,
+ u32 mem_id, u32 words, u32 offset);
+
+/*
+ * Increment reference going to nvhost_job.
+ */
+void nvhost_job_get(struct nvhost_job *job);
+
+/*
+ * Decrement reference job, free if goes to zero.
+ */
+void nvhost_job_put(struct nvhost_job *job);
+
+/*
+ * Pin memory related to job. This handles relocation of addresses to the
+ * host1x address space. Handles both the gather memory and any other memory
+ * referred to from the gather buffers.
+ *
+ * Handles also patching out host waits that would wait for an expired sync
+ * point value.
+ */
+int nvhost_job_pin(struct nvhost_job *job, struct platform_device *pdev);
+
+/*
+ * Unpin memory related to job.
+ */
+void nvhost_job_unpin(struct nvhost_job *job);
+
+/*
+ * Dump contents of job to debug output.
+ */
+void nvhost_job_dump(struct device *dev, struct nvhost_job *job);
+
#endif
--
1.7.9.5
Mark Zhang
2012-11-29 10:01:03 UTC
Permalink
On 11/26/2012 09:19 PM, Terje Bergström <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org> wrote:
> Add support for host1x client modules, and host1x channels to submit
> work to the clients. The work is submitted in dmabuf buffers, so add
> support for dmabuf memory management, too.
[...]
> diff --git a/drivers/video/tegra/host/bus_client.c b/drivers/video/tegra/host/bus_client.c
[...]
> +int nvhost_client_device_init(struct platform_device *dev)
> +{
> + int err;
> + struct nvhost_master *nvhost_master = nvhost_get_host(dev);
> + struct nvhost_channel *ch;
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> + ch = nvhost_alloc_channel(dev);
> + if (ch == NULL)
> + return -ENODEV;
> +
> + /* store the pointer to this device for channel */
> + ch->dev = dev;
> +
> + err = nvhost_channel_init(ch, nvhost_master, pdata->index);
> + if (err)
> + goto fail;
> +
> + err = nvhost_module_init(dev);
> + if (err)
> + goto fail;
> +
> + err = nvhost_device_list_add(dev);
> + if (err)
> + goto fail;
> +
> + dev_info(&dev->dev, "initialized\n");
> +
> + return 0;
> +
> +fail:
> + /* Add clean-up */

Yes, add "nvhost_module_deinit" here?

> + nvhost_free_channel(ch);
> + return err;
> +}
> +EXPORT_SYMBOL(nvhost_client_device_init);
> +
> +int nvhost_client_device_suspend(struct platform_device *dev)
> +{
> + int ret = 0;
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> + ret = nvhost_channel_suspend(pdata->channel);
> + dev_info(&dev->dev, "suspend status: %d\n", ret);
> + if (ret)
> + return ret;
> +
> + return ret;

Minor issue: just "return ret" is OK. That "if" doesn't make sense.

> +}
> +EXPORT_SYMBOL(nvhost_client_device_suspend);
> diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
> index 5a44147..8765c83 100644
> --- a/drivers/video/tegra/host/chip_support.c
> +++ b/drivers/video/tegra/host/chip_support.c
> @@ -25,7 +25,7 @@
> #include "chip_support.h"
> #include "host1x/host1x01.h"
>
> -struct nvhost_chip_support *nvhost_chip_ops;
> +static struct nvhost_chip_support *nvhost_chip_ops;
>

All right, already fixed here. Sorry, so just ignore what I said about
this in my reply to your patch 1.

[...]
> +
> +struct mem_handle *nvhost_dmabuf_get(u32 id, struct platform_device *dev)
> +{
> + struct mem_handle *h;
> + struct dma_buf *buf;
> +
> + buf = dma_buf_get(to_dmabuf_fd(id));
> + if (IS_ERR_OR_NULL(buf))
> + return (struct mem_handle *)buf;
> +
> + h = (struct mem_handle *)dma_buf_attach(buf, &dev->dev);
> + if (IS_ERR_OR_NULL(h))
> + dma_buf_put(buf);

Return an error here.

> +
> + return (struct mem_handle *) ((u32)h | mem_mgr_type_dmabuf);
> +}
> +
[...]
> int nvhost_init_host1x01_support(struct nvhost_master *host,
> struct nvhost_chip_support *op)
> {
> + op->channel = host1x_channel_ops;
> + op->cdma = host1x_cdma_ops;
> + op->push_buffer = host1x_pushbuffer_ops;
> host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
> op->syncpt = host1x_syncpt_ops;
> op->intr = host1x_intr_ops;
>
> + op->nvhost_dev.alloc_nvhost_channel = t20_alloc_nvhost_channel;
> + op->nvhost_dev.free_nvhost_channel = t20_free_nvhost_channel;
> +

I recall in previous version, there is t30-related alloc_nvhost_channel
& free_nvhost_channel. Why remove them?

> return 0;
> }
[...]
> +static int push_buffer_init(struct push_buffer *pb)
> +{
> + struct nvhost_cdma *cdma = pb_to_cdma(pb);
> + struct nvhost_master *master = cdma_to_dev(cdma);
> + pb->mapped = NULL;
> + pb->phys = 0;
> + pb->handle = NULL;
> +
> + cdma_pb_op().reset(pb);
> +
> + /* allocate and map pushbuffer memory */
> + pb->mapped = dma_alloc_writecombine(&master->dev->dev,
> + PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
> + if (IS_ERR_OR_NULL(pb->mapped)) {
> + pb->mapped = NULL;
> + goto fail;

Return directly here. "goto fail" makes "push_buffer_destroy" get called.

> + }
> +
> + /* memory for storing mem client and handles for each opcode pair */
> + pb->handle = kzalloc(NVHOST_GATHER_QUEUE_SIZE *
> + sizeof(struct mem_handle *),
> + GFP_KERNEL);
> + if (!pb->handle)
> + goto fail;
> +
> + /* put the restart at the end of pushbuffer memory */

Just for curious, why "pb->mapped + 1K" is the end of a 4K pushbuffer?

> + *(pb->mapped + (PUSH_BUFFER_SIZE >> 2)) =
> + nvhost_opcode_restart(pb->phys);
> +
> + return 0;
> +
> +fail:
> + push_buffer_destroy(pb);
> + return -ENOMEM;
> +}
> +
[...]
> +
> +/**
> + * Sleep (if necessary) until the requested event happens
> + * - CDMA_EVENT_SYNC_QUEUE_EMPTY : sync queue is completely empty.
> + * - Returns 1
> + * - CDMA_EVENT_PUSH_BUFFER_SPACE : there is space in the push buffer
> + * - Return the amount of space (> 0)
> + * Must be called with the cdma lock held.
> + */
> +unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
> + enum cdma_event event)
> +{
> + for (;;) {
> + unsigned int space = cdma_status_locked(cdma, event);
> + if (space)
> + return space;
> +
> + /* If somebody has managed to already start waiting, yield */
> + if (cdma->event != CDMA_EVENT_NONE) {
> + mutex_unlock(&cdma->lock);
> + schedule();
> + mutex_lock(&cdma->lock);
> + continue;
> + }
> + cdma->event = event;
> +
> + mutex_unlock(&cdma->lock);
> + down(&cdma->sem);
> + mutex_lock(&cdma->lock);

I'm newbie of nvhost but I feel here is very tricky, about the lock and
unlock of this mutex: cdma->lock. Does it require this mutex is locked
before calling this function? And do we need to unlock it before the
code: "return space;" above? IMHO, this is not a good design and can we
find out a better solution?

> + }
> + return 0;
> +}
[...]

> +/*
> + * Dump contents of job to debug output.
> + */
> +void nvhost_job_dump(struct device *dev, struct nvhost_job *job);
> +
> #endif
>
Terje Bergström
2012-11-29 10:46:54 UTC
Permalink
On 29.11.2012 12:01, Mark Zhang wrote:
>> +fail:
>> + /* Add clean-up */
>
> Yes, add "nvhost_module_deinit" here?

Sounds good.
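
Something like this, I suppose (sketch only; nvhost_module_deinit() here is
assumed to be the counterpart of nvhost_module_init(), and the split error
labels are just for illustration):

	err = nvhost_module_init(dev);
	if (err)
		goto fail_free_channel;

	err = nvhost_device_list_add(dev);
	if (err)
		goto fail_module_deinit;

	dev_info(&dev->dev, "initialized\n");
	return 0;

fail_module_deinit:
	nvhost_module_deinit(dev);
fail_free_channel:
	nvhost_free_channel(ch);
	return err;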

>> +int nvhost_client_device_suspend(struct platform_device *dev)
>> +{
>> + int ret = 0;
>> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
>> +
>> + ret = nvhost_channel_suspend(pdata->channel);
>> + dev_info(&dev->dev, "suspend status: %d\n", ret);
>> + if (ret)
>> + return ret;
>> +
>> + return ret;
>
> Minor issue: just "return ret" is OK. That "if" doesn't make sense.

Yes, must be some snafu when doing changes in code.

>> -struct nvhost_chip_support *nvhost_chip_ops;
>> +static struct nvhost_chip_support *nvhost_chip_ops;
>>
>
> All right, already fixed here. Sorry, so just ignore what I said about
> this in my reply to your patch 1.

I was wondering about this, because I thought I did make it static. But
it looks like I added that to the wrong commit. Anyway, this needs
rethinking.

>> +struct mem_handle *nvhost_dmabuf_get(u32 id, struct platform_device *dev)
>> +{
>> + struct mem_handle *h;
>> + struct dma_buf *buf;
>> +
>> + buf = dma_buf_get(to_dmabuf_fd(id));
>> + if (IS_ERR_OR_NULL(buf))
>> + return (struct mem_handle *)buf;
>> +
>> + h = (struct mem_handle *)dma_buf_attach(buf, &dev->dev);
>> + if (IS_ERR_OR_NULL(h))
>> + dma_buf_put(buf);
>
> Return an error here.

Will do.
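
Probably something like this (just a sketch, keeping the existing casts and
helpers):

struct mem_handle *nvhost_dmabuf_get(u32 id, struct platform_device *dev)
{
	struct mem_handle *h;
	struct dma_buf *buf;

	buf = dma_buf_get(to_dmabuf_fd(id));
	if (IS_ERR_OR_NULL(buf))
		return (struct mem_handle *)buf;

	h = (struct mem_handle *)dma_buf_attach(buf, &dev->dev);
	if (IS_ERR_OR_NULL(h)) {
		/* drop the dma_buf reference and propagate the error */
		dma_buf_put(buf);
		return h ? h : ERR_PTR(-EINVAL);
	}

	return (struct mem_handle *)((u32)h | mem_mgr_type_dmabuf);
}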

>> + op->nvhost_dev.alloc_nvhost_channel = t20_alloc_nvhost_channel;
>> + op->nvhost_dev.free_nvhost_channel = t20_free_nvhost_channel;
>> +
>
> I recall in previous version, there is t30-related alloc_nvhost_channel
> & free_nvhost_channel. Why remove them?

I could actually refactor these all into one alloc channel. We already
store the number of channels in a data type, so a generic channel
allocator would be better than having a chip specific one.
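
Roughly along these lines (illustrative only; allocated_channels and
nb_channels are placeholders for wherever we end up storing the per-SoC
channel count):

static struct nvhost_channel *nvhost_alloc_channel_generic(
					struct nvhost_master *host)
{
	struct nvhost_channel *ch;

	if (host->allocated_channels >= host->info.nb_channels)
		return NULL;

	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
	if (ch)
		host->allocated_channels++;

	return ch;
}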

>> +static int push_buffer_init(struct push_buffer *pb)
>> +{
>> + struct nvhost_cdma *cdma = pb_to_cdma(pb);
>> + struct nvhost_master *master = cdma_to_dev(cdma);
>> + pb->mapped = NULL;
>> + pb->phys = 0;
>> + pb->handle = NULL;
>> +
>> + cdma_pb_op().reset(pb);
>> +
>> + /* allocate and map pushbuffer memory */
>> + pb->mapped = dma_alloc_writecombine(&master->dev->dev,
>> + PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
>> + if (IS_ERR_OR_NULL(pb->mapped)) {
>> + pb->mapped = NULL;
>> + goto fail;
>
> Return directly here. "goto fail" makes "push_buffer_destroy" get called.

Will do.

>
>> + }
>> +
>> + /* memory for storing mem client and handles for each opcode pair */
>> + pb->handle = kzalloc(NVHOST_GATHER_QUEUE_SIZE *
>> + sizeof(struct mem_handle *),
>> + GFP_KERNEL);
>> + if (!pb->handle)
>> + goto fail;
>> +
>> + /* put the restart at the end of pushbuffer memory */
>
> Just for curious, why "pb->mapped + 1K" is the end of a 4K pushbuffer?

pb->mapped is u32 *, so compiler will take care of multiplying by
sizeof(u32).
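
A stand-alone illustration of the same arithmetic, for reference (plain
user-space C; the 4096 matches the 4K push buffer discussed here):

#include <stdint.h>
#include <stdio.h>

#define PUSH_BUFFER_SIZE 4096

int main(void)
{
	/* one extra word at the end for the restart opcode */
	uint32_t buf[(PUSH_BUFFER_SIZE >> 2) + 1];
	uint32_t *restart = buf + (PUSH_BUFFER_SIZE >> 2);

	/* pointer arithmetic on uint32_t * is scaled by sizeof(uint32_t),
	 * so this prints 4096 */
	printf("byte offset = %zu\n", (size_t)((char *)restart - (char *)buf));
	return 0;
}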

>> +unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
>> + enum cdma_event event)
>> +{
>> + for (;;) {
>> + unsigned int space = cdma_status_locked(cdma, event);
>> + if (space)
>> + return space;
>> +
>> + /* If somebody has managed to already start waiting, yield */
>> + if (cdma->event != CDMA_EVENT_NONE) {
>> + mutex_unlock(&cdma->lock);
>> + schedule();
>> + mutex_lock(&cdma->lock);
>> + continue;
>> + }
>> + cdma->event = event;
>> +
>> + mutex_unlock(&cdma->lock);
>> + down(&cdma->sem);
>> + mutex_lock(&cdma->lock);
>
> I'm newbie of nvhost but I feel here is very tricky, about the lock and
> unlock of this mutex: cdma->lock. Does it require this mutex is locked
> before calling this function? And do we need to unlock it before the
> code: "return space;" above? IMHO, this is not a good design and can we
> find out a better solution?

Yeah, it's not perfect and good solutions are welcome.
cdma_status_locked() must be called with a mutex. But, what we generally
wait for is for space in push buffer. The cleanup code cannot run if we
keep cdma->lock, so I release it.

The two ways to loop are because there was a race between two processes
waiting for space. One of them set cdma->event to indicate what it's
waiting for and can go to sleep, but the other has to keep spinning.

Terje
Mark Zhang
2012-11-30 06:13:36 UTC
Permalink
On 11/29/2012 06:46 PM, Terje Bergström wrote:
> On 29.11.2012 12:01, Mark Zhang wrote:
>>
>> Just for curious, why "pb->mapped + 1K" is the end of a 4K pushbuffer?
>
> pb->mapped is u32 *, so compiler will take care of multiplying by
> sizeof(u32).
>

Ah, yes. Sorry, I must have been insane at that time. :)

>>> +unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
>>> + enum cdma_event event)
>>> +{
>>> + for (;;) {
>>> + unsigned int space = cdma_status_locked(cdma, event);
>>> + if (space)
>>> + return space;
>>> +
>>> + /* If somebody has managed to already start waiting, yield */
>>> + if (cdma->event != CDMA_EVENT_NONE) {
>>> + mutex_unlock(&cdma->lock);
>>> + schedule();
>>> + mutex_lock(&cdma->lock);
>>> + continue;
>>> + }
>>> + cdma->event = event;
>>> +
>>> + mutex_unlock(&cdma->lock);
>>> + down(&cdma->sem);
>>> + mutex_lock(&cdma->lock);
>>
>> I'm newbie of nvhost but I feel here is very tricky, about the lock and
>> unlock of this mutex: cdma->lock. Does it require this mutex is locked
>> before calling this function? And do we need to unlock it before the
>> code: "return space;" above? IMHO, this is not a good design and can we
>> find out a better solution?
>
> Yeah, it's not perfect and good solutions are welcome.
> cdma_status_locked() must be called with a mutex. But, what we generally
> wait for is for space in push buffer. The cleanup code cannot run if we
> keep cdma->lock, so I release it.
>
> The two ways to loop are because there was a race between two processes
> waiting for space. One of them set cdma->event to indicate what it's
> waiting for and can go to sleep, but the other has to keep spinning.
>

Alright. I just feel these mutex operations are complicated and
error-prone, but I only have the big picture of nvhost so far and still
don't know much about the details. So I'll let you know if I find a
better solution.

> Terje
>
Thierry Reding
2012-11-29 10:04:05 UTC
Permalink
On Mon, Nov 26, 2012 at 03:19:09PM +0200, Terje Bergstrom wrote:

I've skipped a lot of code here that I need more time to review.

[...]
> diff --git a/drivers/video/tegra/host/nvhost_intr.c b/drivers/video/tegra/host/nvhost_intr.c
[...]
> +static void action_submit_complete(struct nvhost_waitlist *waiter)
> +{
> + struct nvhost_channel *channel = waiter->data;
> + int nr_completed = waiter->count;
> +
> + nvhost_cdma_update(&channel->cdma);
> + nvhost_module_idle_mult(channel->dev, nr_completed);
> +}
>
> static void action_wakeup(struct nvhost_waitlist *waiter)
> {
> @@ -125,6 +145,7 @@ static void action_wakeup_interruptible(struct nvhost_waitlist *waiter)
> typedef void (*action_handler)(struct nvhost_waitlist *waiter);
>
> static action_handler action_handlers[NVHOST_INTR_ACTION_COUNT] = {
> + action_submit_complete,
> action_wakeup,
> action_wakeup_interruptible,
> };
[...]
> diff --git a/drivers/video/tegra/host/nvhost_intr.h b/drivers/video/tegra/host/nvhost_intr.h
[...]
> enum nvhost_intr_action {
> /**
> + * Perform cleanup after a submit has completed.
> + * 'data' points to a channel
> + */
> + NVHOST_INTR_ACTION_SUBMIT_COMPLETE = 0,
> +
> + /**
> * Wake up a task.
> * 'data' points to a wait_queue_head_t
> */

Looking some more at how this is used, I'm starting to think that it
might be easier to export the various handlers and allow them to be
passed to the nvhost_intr_add_action() explicitly.

> diff --git a/drivers/video/tegra/host/nvhost_job.c b/drivers/video/tegra/host/nvhost_job.c
[...]
> +/* Magic to use to fill freed handle slots */
> +#define BAD_MAGIC 0xdeadbeef

This isn't currently used.

> +static size_t job_size(u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
> +{
> + u32 num_unpins = num_cmdbufs + num_relocs;
> + s64 total;
> +
> + if (num_relocs < 0 || num_waitchks < 0 || num_cmdbufs < 0)
> + return 0;
> +
> + total = sizeof(struct nvhost_job)
> + + num_relocs * sizeof(struct nvhost_reloc)
> + + num_unpins * sizeof(struct nvhost_job_unpin_data)
> + + num_waitchks * sizeof(struct nvhost_waitchk)
> + + num_cmdbufs * sizeof(struct nvhost_job_gather);
> +
> + if (total > ULONG_MAX)
> + return 0;
> + return (size_t)total;
> +}
> +
> +
> +static void init_fields(struct nvhost_job *job,
> + u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
> +{
> + u32 num_unpins = num_cmdbufs + num_relocs;
> + void *mem = job;
> +
> + /* First init state to zero */
> +
> + /*
> + * Redistribute memory to the structs.
> + * Overflows and negative conditions have
> + * already been checked in job_alloc().
> + */
> + mem += sizeof(struct nvhost_job);
> + job->relocarray = num_relocs ? mem : NULL;
> + mem += num_relocs * sizeof(struct nvhost_reloc);
> + job->unpins = num_unpins ? mem : NULL;
> + mem += num_unpins * sizeof(struct nvhost_job_unpin_data);
> + job->waitchk = num_waitchks ? mem : NULL;
> + mem += num_waitchks * sizeof(struct nvhost_waitchk);
> + job->gathers = num_cmdbufs ? mem : NULL;
> + mem += num_cmdbufs * sizeof(struct nvhost_job_gather);
> + job->addr_phys = (num_cmdbufs || num_relocs) ? mem : NULL;
> +
> + job->reloc_addr_phys = job->addr_phys;
> + job->gather_addr_phys = &job->addr_phys[num_relocs];
> +}

I wouldn't bother splitting out the above two functions.

> +
> +struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
> + int num_cmdbufs, int num_relocs, int num_waitchks)
> +{
> + struct nvhost_job *job = NULL;
> + size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);
> +
> + if (!size)
> + return NULL;
> + job = vzalloc(size);

Why vzalloc()?

> +void nvhost_job_add_gather(struct nvhost_job *job,
> + u32 mem_id, u32 words, u32 offset)
> +{
> + struct nvhost_job_gather *cur_gather =
> + &job->gathers[job->num_gathers];
> +
> + cur_gather->words = words;
> + cur_gather->mem_id = mem_id;
> + cur_gather->offset = offset;
> + job->num_gathers += 1;

job->num_gathers++

> +static int pin_job_mem(struct nvhost_job *job)
> +{
> + int i;
> + int count = 0;
> + int result;
> + long unsigned *ids =
> + kmalloc(sizeof(u32 *) *
> + (job->num_relocs + job->num_gathers),
> + GFP_KERNEL);

Maybe this should be allocated along with the nvhost_job and the other
fields to avoid having to allocate, and potentially fail, here?

> +static int do_relocs(struct nvhost_job *job,
> + u32 cmdbuf_mem, struct mem_handle *h)
> +{
> + int i = 0;
> + int last_page = -1;
> + void *cmdbuf_page_addr = NULL;
> +
> + /* pin & patch the relocs for one gather */
> + while (i < job->num_relocs) {
> + struct nvhost_reloc *reloc = &job->relocarray[i];
> +
> + /* skip all other gathers */
> + if (cmdbuf_mem != reloc->cmdbuf_mem) {
> + i++;
> + continue;
> + }
> +
> + if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
> + if (cmdbuf_page_addr)
> + nvhost_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
> +
> + cmdbuf_page_addr = nvhost_memmgr_kmap(h,
> + reloc->cmdbuf_offset >> PAGE_SHIFT);
> + last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
> +
> + if (unlikely(!cmdbuf_page_addr)) {
> + pr_err("Couldn't map cmdbuf for relocation\n");
> + return -ENOMEM;
> + }
> + }
> +
> + __raw_writel(
> + (job->reloc_addr_phys[i] +
> + reloc->target_offset) >> reloc->shift,
> + (cmdbuf_page_addr +
> + (reloc->cmdbuf_offset & ~PAGE_MASK)));

You're not writing to I/O memory, so you shouldn't be using
__raw_writel() here.

> +int nvhost_job_pin(struct nvhost_job *job, struct platform_device *pdev)
> +{
> + int err = 0, i = 0, j = 0;
> + struct nvhost_syncpt *sp = &nvhost_get_host(pdev)->syncpt;
> + unsigned long waitchk_mask[nvhost_syncpt_nb_pts(sp) / BITS_PER_LONG];

You should use a bitmap instead. See DECLARE_BITMAP in linux/types.h...

> +
> + memset(&waitchk_mask[0], 0, sizeof(waitchk_mask));

and run bitmap_zero() here...

> + for (i = 0; i < job->num_waitchk; i++) {
> + u32 syncpt_id = job->waitchk[i].syncpt_id;
> + if (syncpt_id < nvhost_syncpt_nb_pts(sp))
> + waitchk_mask[BIT_WORD(syncpt_id)] |=
> + BIT_MASK(syncpt_id);

and use _set_bit here.
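
Roughly (with NVHOST_MAX_SYNCPTS being an illustrative constant standing in
for whatever compile-time upper bound gets picked, since DECLARE_BITMAP
needs one):

	DECLARE_BITMAP(waitchk_mask, NVHOST_MAX_SYNCPTS);

	bitmap_zero(waitchk_mask, NVHOST_MAX_SYNCPTS);

	for (i = 0; i < job->num_waitchk; i++) {
		u32 syncpt_id = job->waitchk[i].syncpt_id;

		if (syncpt_id < nvhost_syncpt_nb_pts(sp))
			set_bit(syncpt_id, waitchk_mask);
	}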

> + }
> +
> + /* get current syncpt values for waitchk */
> + for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
> + nvhost_syncpt_update_min(sp, i);

Or since you only use the mask here, why not move the
nvhost_syncpt_update_min() into the above loop?

> + /* patch gathers */
> + for (i = 0; i < job->num_gathers; i++) {
> + struct nvhost_job_gather *g = &job->gathers[i];
> +
> + /* process each gather mem only once */
> + if (!g->ref) {
> + g->ref = nvhost_memmgr_get(g->mem_id, job->ch->dev);
> + if (IS_ERR(g->ref)) {
> + err = PTR_ERR(g->ref);
> + g->ref = NULL;
> + break;
> + }
> +
> + g->mem_base = job->gather_addr_phys[i];
> +
> + for (j = 0; j < job->num_gathers; j++) {
> + struct nvhost_job_gather *tmp =
> + &job->gathers[j];
> + if (!tmp->ref && tmp->mem_id == g->mem_id) {
> + tmp->ref = g->ref;
> + tmp->mem_base = g->mem_base;
> + }
> + }
> + err = do_relocs(job, g->mem_id, g->ref);
> + if (!err)
> + err = do_waitchks(job, sp,
> + g->mem_id, g->ref);
> + nvhost_memmgr_put(g->ref);
> + if (err)
> + break;
> + }
> + }
> +fail:
> + wmb();

What do you need this write barrier for?

> diff --git a/drivers/video/tegra/host/nvhost_memmgr.c b/drivers/video/tegra/host/nvhost_memmgr.c
> new file mode 100644
> index 0000000..bdceb74
> --- /dev/null
> +++ b/drivers/video/tegra/host/nvhost_memmgr.c
> @@ -0,0 +1,160 @@
> +/*
> + * drivers/video/tegra/host/nvhost_memmgr.c
> + *
> + * Tegra host1x Memory Management Abstraction
> + *
> + * Copyright (c) 2012, NVIDIA Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/err.h>
> +
> +#include "nvhost_memmgr.h"
> +#include "dmabuf.h"
> +#include "chip_support.h"
> +
> +struct mem_handle *nvhost_memmgr_alloc(size_t size, size_t align, int flags)
> +{
> + struct mem_handle *h = NULL;
> + h = nvhost_dmabuf_alloc(size, align, flags);
> +
> + return h;
> +}
> +
> +struct mem_handle *nvhost_memmgr_get(u32 id, struct platform_device *dev)
> +{
> + struct mem_handle *h = NULL;
> +
> + switch (nvhost_memmgr_type(id)) {
> + case mem_mgr_type_dmabuf:
> + h = (struct mem_handle *) nvhost_dmabuf_get(id, dev);
> + break;
> + default:
> + break;
> + }
> +
> + return h;
> +}

Heh, this would actually be a case where I'd argue in favour of an ops
structure. But Lucas already mentioned that we may want to revise how
the memory manager works and ideally we'd only have a single type of
buffers anyway so this would largely become obsolete anyway.

> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
[...]
> +#define NVSYNCPT_2D_0 (18)
> +#define NVSYNCPT_2D_1 (19)
> +#define NVSYNCPT_VBLANK0 (26)
> +#define NVSYNCPT_VBLANK1 (27)
>
> +/* sync points that are wholly managed by the client */
> +#define NVSYNCPTS_CLIENT_MANAGED (\
> + BIT(NVSYNCPT_VBLANK0) | \
> + BIT(NVSYNCPT_VBLANK1) | \
> + BIT(NVSYNCPT_2D_1))
> +
> +#define NVWAITBASE_2D_0 (1)
> +#define NVWAITBASE_2D_1 (2)

I think we already agreed that we want to do dynamic allocation of
syncpoints so these definitions can go away.

> enum nvhost_power_sysfs_attributes {
> NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY = 0,
> NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY,
> @@ -142,4 +157,138 @@ void host1x_syncpt_incr(u32 id);
> u32 host1x_syncpt_read(u32 id);
> int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value);
>
> +/* Register device */
> +int nvhost_client_device_init(struct platform_device *dev);

Again, I think this should be made easier on the clients. Ideally
there'd be a single call to register a client with host1x which would
already initialize the appropriate fields. There can be other, separate
functions for resource allocations such as syncpoints and channels,
though.

> +int nvhost_client_device_suspend(struct platform_device *dev);

Again, I think this should be handled via runtime PM.

> +struct nvhost_channel *nvhost_getchannel(struct nvhost_channel *ch);
> +void nvhost_putchannel(struct nvhost_channel *ch);

These are oddly named. Better names would be nvhost_channel_get() or
nvhost_channel_put().

> +int nvhost_channel_submit(struct nvhost_job *job);
> +
> +enum host1x_class {
> + NV_HOST1X_CLASS_ID = 0x1,
> + NV_GRAPHICS_2D_CLASS_ID = 0x51,
> +};

Maybe this enumeration should be made more consistent, somewhat along
the lines of:

enum host1x_class {
HOST1X_CLASS_HOST1X = 0x01,
HOST1X_CLASS_2D = 0x51,
};

Again, I'll need more time to go over the rest of the code but the good
news is that I'm starting to form a better picture of how things work.

Thierry
Terje Bergström
2012-11-29 11:00:40 UTC
Permalink
On 29.11.2012 12:04, Thierry Reding wrote:
> * PGP Signed by an unknown key
>
> On Mon, Nov 26, 2012 at 03:19:09PM +0200, Terje Bergstrom wrote:
>
> I've skipped a lot of code here that I need more time to review.

Thanks already for the very good comments! It's great getting comments
on the code from fresh eyes.

> Looking some more at how this is used, I'm starting to think that it
> might be easier to export the various handlers and allow them to be
> passed to the nvhost_intr_add_action() explicitly.

Oh, so you mean like "nvhost_intr_add_action(intr, id, threshold,
nvhost_intr_action_submit_complete, channel, waiter, priv), and
nvhost_intr_action_submit_complete is the function pointer?

There's one case to take care of: we merge the waits for the jobs into
one waiter to save us from having too many irq calls. Perhaps that could
be handled by a flag, or something like that.

>> +/* Magic to use to fill freed handle slots */
>> +#define BAD_MAGIC 0xdeadbeef
>
> This isn't currently used.

Will remove.

>
>> +static size_t job_size(u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
>> +{
>> + u32 num_unpins = num_cmdbufs + num_relocs;
>> + s64 total;
>> +
>> + if (num_relocs < 0 || num_waitchks < 0 || num_cmdbufs < 0)
>> + return 0;
>> +
>> + total = sizeof(struct nvhost_job)
>> + + num_relocs * sizeof(struct nvhost_reloc)
>> + + num_unpins * sizeof(struct nvhost_job_unpin_data)
>> + + num_waitchks * sizeof(struct nvhost_waitchk)
>> + + num_cmdbufs * sizeof(struct nvhost_job_gather);
>> +
>> + if (total > ULONG_MAX)
>> + return 0;
>> + return (size_t)total;
>> +}
>> +
>> +
>> +static void init_fields(struct nvhost_job *job,
>> + u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
>> +{
>> + u32 num_unpins = num_cmdbufs + num_relocs;
>> + void *mem = job;
>> +
>> + /* First init state to zero */
>> +
>> + /*
>> + * Redistribute memory to the structs.
>> + * Overflows and negative conditions have
>> + * already been checked in job_alloc().
>> + */
>> + mem += sizeof(struct nvhost_job);
>> + job->relocarray = num_relocs ? mem : NULL;
>> + mem += num_relocs * sizeof(struct nvhost_reloc);
>> + job->unpins = num_unpins ? mem : NULL;
>> + mem += num_unpins * sizeof(struct nvhost_job_unpin_data);
>> + job->waitchk = num_waitchks ? mem : NULL;
>> + mem += num_waitchks * sizeof(struct nvhost_waitchk);
>> + job->gathers = num_cmdbufs ? mem : NULL;
>> + mem += num_cmdbufs * sizeof(struct nvhost_job_gather);
>> + job->addr_phys = (num_cmdbufs || num_relocs) ? mem : NULL;
>> +
>> + job->reloc_addr_phys = job->addr_phys;
>> + job->gather_addr_phys = &job->addr_phys[num_relocs];
>> +}
>
> I wouldn't bother splitting out the above two functions.

You're right, I'll merge them back in. There was a historical reason for
the split, but not anymore.

>
>> +
>> +struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
>> + int num_cmdbufs, int num_relocs, int num_waitchks)
>> +{
>> + struct nvhost_job *job = NULL;
>> + size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);
>> +
>> + if (!size)
>> + return NULL;
>> + job = vzalloc(size);
>
> Why vzalloc()?

I guess it's basically moot, but we tried that when we had some memory
fragmentation issues and it was left even though we did find out it's
not needed.

>> +void nvhost_job_add_gather(struct nvhost_job *job,
>> + u32 mem_id, u32 words, u32 offset)
>> +{
>> + struct nvhost_job_gather *cur_gather =
>> + &job->gathers[job->num_gathers];
>> +
>> + cur_gather->words = words;
>> + cur_gather->mem_id = mem_id;
>> + cur_gather->offset = offset;
>> + job->num_gathers += 1;
>
> job->num_gathers++

OK.

>> +static int pin_job_mem(struct nvhost_job *job)
>> +{
>> + int i;
>> + int count = 0;
>> + int result;
>> + long unsigned *ids =
>> + kmalloc(sizeof(u32 *) *
>> + (job->num_relocs + job->num_gathers),
>> + GFP_KERNEL);
>
> Maybe this should be allocated along with the nvhost_job and the other
> fields to avoid having to allocate, and potentially fail, here?

Yes, you're right.
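
For example, the ids could become just another slice of the single job
allocation, following the existing job_size()/init_fields() pattern
(pin_ids is a made-up field name here):

	/* in job_size(): one extra slot per gather and per reloc */
	total += (num_cmdbufs + num_relocs) * sizeof(unsigned long);

	/* in init_fields(): carve the ids out of the same allocation */
	job->pin_ids = (num_cmdbufs || num_relocs) ? mem : NULL;
	mem += (num_cmdbufs + num_relocs) * sizeof(unsigned long);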

>> + __raw_writel(
>> + (job->reloc_addr_phys[i] +
>> + reloc->target_offset) >> reloc->shift,
>> + (cmdbuf_page_addr +
>> + (reloc->cmdbuf_offset & ~PAGE_MASK)));
>
> You're not writing to I/O memory, so you shouldn't be using
> __raw_writel() here.

Will change.
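
I.e. just a plain store, since this is kmapped system memory (sketch):

	__u32 *target = cmdbuf_page_addr +
			(reloc->cmdbuf_offset & ~PAGE_MASK);

	*target = (job->reloc_addr_phys[i] + reloc->target_offset) >>
		  reloc->shift;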

>
>> +int nvhost_job_pin(struct nvhost_job *job, struct platform_device *pdev)
>> +{
>> + int err = 0, i = 0, j = 0;
>> + struct nvhost_syncpt *sp = &nvhost_get_host(pdev)->syncpt;
>> + unsigned long waitchk_mask[nvhost_syncpt_nb_pts(sp) / BITS_PER_LONG];
>
> You should use a bitmap instead. See DECLARE_BITMAP in linux/types.h...

Oh, nice. Will do.

>
>> +
>> + memset(&waitchk_mask[0], 0, sizeof(waitchk_mask));
>
> and run bitmap_zero() here...
>
>> + for (i = 0; i < job->num_waitchk; i++) {
>> + u32 syncpt_id = job->waitchk[i].syncpt_id;
>> + if (syncpt_id < nvhost_syncpt_nb_pts(sp))
>> + waitchk_mask[BIT_WORD(syncpt_id)] |=
>> + BIT_MASK(syncpt_id);
>
> and use _set_bit here.

Will do.

>
>> + }
>> +
>> + /* get current syncpt values for waitchk */
>> + for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
>> + nvhost_syncpt_update_min(sp, i);
>
> Or since you only use the mask here, why not move the
> nvhost_syncpt_update_min() into the above loop?

I want to call nvhost_syncpt_update_min() only once per syncpt register.
If the job has 100 sync point increments for 2D sync point, I'd read the
value from hardware 100 times, which is expensive.

>> + /* patch gathers */
>> + for (i = 0; i < job->num_gathers; i++) {
>> + struct nvhost_job_gather *g = &job->gathers[i];
>> +
>> + /* process each gather mem only once */
>> + if (!g->ref) {
>> + g->ref = nvhost_memmgr_get(g->mem_id, job->ch->dev);
>> + if (IS_ERR(g->ref)) {
>> + err = PTR_ERR(g->ref);
>> + g->ref = NULL;
>> + break;
>> + }
>> +
>> + g->mem_base = job->gather_addr_phys[i];
>> +
>> + for (j = 0; j < job->num_gathers; j++) {
>> + struct nvhost_job_gather *tmp =
>> + &job->gathers[j];
>> + if (!tmp->ref && tmp->mem_id == g->mem_id) {
>> + tmp->ref = g->ref;
>> + tmp->mem_base = g->mem_base;
>> + }
>> + }
>> + err = do_relocs(job, g->mem_id, g->ref);
>> + if (!err)
>> + err = do_waitchks(job, sp,
>> + g->mem_id, g->ref);
>> + nvhost_memmgr_put(g->ref);
>> + if (err)
>> + break;
>> + }
>> + }
>> +fail:
>> + wmb();
>
> What do you need this write barrier for?

A good question. It looks like we've sprinkled barriers around the code
with no proper reason.

>
>> diff --git a/drivers/video/tegra/host/nvhost_memmgr.c b/drivers/video/tegra/host/nvhost_memmgr.c
>> new file mode 100644
>> index 0000000..bdceb74
>> --- /dev/null
>> +++ b/drivers/video/tegra/host/nvhost_memmgr.c
>> @@ -0,0 +1,160 @@
>> +/*
>> + * drivers/video/tegra/host/nvhost_memmgr.c
>> + *
>> + * Tegra host1x Memory Management Abstraction
>> + *
>> + * Copyright (c) 2012, NVIDIA Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/err.h>
>> +
>> +#include "nvhost_memmgr.h"
>> +#include "dmabuf.h"
>> +#include "chip_support.h"
>> +
>> +struct mem_handle *nvhost_memmgr_alloc(size_t size, size_t align, int flags)
>> +{
>> + struct mem_handle *h = NULL;
>> + h = nvhost_dmabuf_alloc(size, align, flags);
>> +
>> + return h;
>> +}
>> +
>> +struct mem_handle *nvhost_memmgr_get(u32 id, struct platform_device *dev)
>> +{
>> + struct mem_handle *h = NULL;
>> +
>> + switch (nvhost_memmgr_type(id)) {
>> + case mem_mgr_type_dmabuf:
>> + h = (struct mem_handle *) nvhost_dmabuf_get(id, dev);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return h;
>> +}
>
> Heh, this would actually be a case where I'd argue in favour of an ops
> structure. But Lucas already mentioned that we may want to revise how
> the memory manager works and ideally we'd only have a single type of
> buffers anyway so this would largely become obsolete anyway.

Hmm, damn, and I've actually spent a significant amount of time
bypassing the ops here. The downstream memory manager does use ops
indirection. :-) I started removing it when I noticed that the ops
should really be per handle, and not per SoC as mem_ops() was.

Arto and I discussed this, and he proposed that we'd just import dma_buf
fds into CMA GEM objects. That'd allow us to only support CMA GEM
objects inside nvhost. He has already started by implementing CMA GEM
buffer support in nvhost as a new buffer type.

I'm not sure if importing is better than just supporting both CMA GEM
and dma_buf handles, but I guess we'll find out by trying it.

> I think we already agreed that we want to do dynamic allocation of
> syncpoints so these definitions can go away.

Yep.

>
>> enum nvhost_power_sysfs_attributes {
>> NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY = 0,
>> NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY,
>> @@ -142,4 +157,138 @@ void host1x_syncpt_incr(u32 id);
>> u32 host1x_syncpt_read(u32 id);
>> int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value);
>>
>> +/* Register device */
>> +int nvhost_client_device_init(struct platform_device *dev);
>
> Again, I think this should be made easier on the clients. Ideally
> there'd be a single call to register a client with host1x which would
> already initialize the appropriate fields. There can be other, separate
> functions for resource allocations such as syncpoints and channels,
> though.

I'll try to cook up something to make this clearer.

>> +int nvhost_client_device_suspend(struct platform_device *dev);
> Again, I think this should be handled via runtime PM.

Referring to my previous comment.

>
>> +struct nvhost_channel *nvhost_getchannel(struct nvhost_channel *ch);
>> +void nvhost_putchannel(struct nvhost_channel *ch);
>
> These are oddly named. Better names would be nvhost_channel_get() or
> nvhost_channel_put().

Sounds good to me.

>> +int nvhost_channel_submit(struct nvhost_job *job);
>> +
>> +enum host1x_class {
>> + NV_HOST1X_CLASS_ID = 0x1,
>> + NV_GRAPHICS_2D_CLASS_ID = 0x51,
>> +};
>
> Maybe this enumeration should be made more consistent, somewhat along
> the lines of:
>
> enum host1x_class {
> HOST1X_CLASS_HOST1X = 0x01,
> HOST1X_CLASS_2D = 0x51,
> };

Sure.

> Again, I'll need more time to go over the rest of the code but the good
> news is that I'm starting to form a better picture of how things work.

Thanks. I've collected a massive amount of feedback already. v3 will
take quite a while to appear after we've finished all the reviews of v2.

Terje
Thierry Reding
2012-11-30 07:46:58 UTC
Permalink
On Thu, Nov 29, 2012 at 01:00:40PM +0200, Terje Bergström wrote:
> On 29.11.2012 12:04, Thierry Reding wrote:
> > Looking some more at how this is used, I'm starting to think that it
> > might be easier to export the various handlers and allow them to be
> > passed to the nvhost_intr_add_action() explicitly.
>
> Oh, so you mean like "nvhost_intr_add_action(intr, id, threshold,
> nvhost_intr_action_submit_complete, channel, waiter, priv), and
> nvhost_intr_action_submit_complete is the function pointer?
>
> There's one case to take care of: we merge the waits for the jobs into
> one waiter to save us from having too many irq calls. Perhaps that could
> be handled by a flag, or something like that.

Yes, something like ACTION_MERGE or something should work fine.
Alternatively you could handle it by providing two public functions, one
which adds to the list of jobs that can be merged, the other that adds
to the list that cannot be merged.
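
Just to illustrate the shape I have in mind (all names and the flags
argument are placeholders, not a proposal for the final API):

struct nvhost_intr;
struct nvhost_waitlist;

#define NVHOST_INTR_MERGE_WAITS	(1 << 0)	/* coalesce waits on one syncpoint */

int nvhost_intr_add_action(struct nvhost_intr *intr, u32 id, u32 thresh,
			   void (*handler)(struct nvhost_waitlist *waiter),
			   void *data, void *waiter, unsigned int flags,
			   void **ref);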

> >> +struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
> >> + int num_cmdbufs, int num_relocs, int num_waitchks)
> >> +{
> >> + struct nvhost_job *job = NULL;
> >> + size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);
> >> +
> >> + if (!size)
> >> + return NULL;
> >> + job = vzalloc(size);
> >
> > Why vzalloc()?
>
> I guess it's basically moot, but we tried that when we had some memory
> fragmentation issues and it was left even though we did find out it's
> not needed.

I think kzalloc() would be a better choice here. Also, while at it you
may want to make the num_* parameters unsigned.
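
So the entry point might end up looking something like this (sketch only,
keeping the job_size() check):

struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
				    unsigned int num_cmdbufs,
				    unsigned int num_relocs,
				    unsigned int num_waitchks)
{
	struct nvhost_job *job;
	size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);

	if (!size)
		return NULL;

	job = kzalloc(size, GFP_KERNEL);
	if (!job)
		return NULL;

	/* ... rest of the initialization stays the same ... */

	return job;
}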

> >> + }
> >> +
> >> + /* get current syncpt values for waitchk */
> >> + for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
> >> + nvhost_syncpt_update_min(sp, i);
> >
> > Or since you only use the mask here, why not move the
> > nvhost_syncpt_update_min() into the above loop?
>
> I want to call nvhost_syncpt_update_min() only once per syncpt register.
> If the job has 100 sync point increments for 2D sync point, I'd read the
> value from hardware 100 times, which is expensive.

Right, hadn't thought about the fact that you can have multiple waits
for a single syncpoint in the job.

Looking at the code again, I see that you use sizeof(waitchk_mask) as
the third parameter to the for_each_set_bit() macro. However the size
parameter is to be specified in bits, not bytes.

Also the name nvhost_syncpt_update_min() has had me confused. So say it
is used to update the value that you've cached in software from the real
value in the register. However I interpret update_min() as "update the
minimum value in the register". Maybe something like *_load_min() would
be clearer.

> Thanks. I've collected a massive amount of feedback already. v3 will
> take quite a while to appear after we've finished all the reviews of v2.

Yes, that should keep you busy for quite a while. =) But I also think
we've made good progress so far.

Thierry
Terje Bergstrom
2012-11-26 13:19:14 UTC
Permalink
Add client driver for 2D device.

Signed-off-by: Arto Merilainen <***@nvidia.com>
Signed-off-by: Terje Bergstrom <***@nvidia.com>
---
drivers/gpu/drm/tegra/Makefile | 2 +-
drivers/gpu/drm/tegra/drm.c | 231 +++++++++++++++++++++++++++++++++++++++-
drivers/gpu/drm/tegra/drm.h | 42 ++++++--
drivers/gpu/drm/tegra/gr2d.c | 224 ++++++++++++++++++++++++++++++++++++++
include/drm/tegra_drm.h | 129 ++++++++++++++++++++++
5 files changed, 615 insertions(+), 13 deletions(-)
create mode 100644 drivers/gpu/drm/tegra/gr2d.c
create mode 100644 include/drm/tegra_drm.h

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 53ea383..5e85042 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -1,7 +1,7 @@
ccflags-y := -Iinclude/drm
ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG

-tegra-drm-y := drm.o fb.o dc.o
+tegra-drm-y := drm.o fb.o dc.o gr2d.o
tegra-drm-y += output.o rgb.o hdmi.o tvo.o dsi.o
tegra-drm-y += plane.o dmabuf.o

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index f78a31b..c35e547 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/of_address.h>
#include <linux/of_platform.h>
+#include <linux/nvhost.h>

#include <mach/clk.h>
#include <linux/dma-mapping.h>
@@ -55,10 +56,12 @@ static int tegra_drm_parse_dt(void)
"nvidia,tegra20-hdmi",
"nvidia,tegra20-tvo",
"nvidia,tegra20-dsi",
+ "nvidia,tegra20-gr2d",
"nvidia,tegra30-dc",
"nvidia,tegra30-hdmi",
"nvidia,tegra30-tvo",
- "nvidia,tegra30-dsi"
+ "nvidia,tegra30-dsi",
+ "nvidia,tegra30-gr2d"
};
unsigned int i;
int err;
@@ -177,7 +180,17 @@ static int tegra_drm_unload(struct drm_device *drm)

static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
{
- return 0;
+ struct tegra_drm_fpriv *fpriv;
+ int err = 0;
+
+ fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
+ if (!fpriv)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&fpriv->contexts);
+ filp->driver_priv = fpriv;
+
+ return err;
}

static void tegra_drm_lastclose(struct drm_device *drm)
@@ -207,8 +220,13 @@ static int __init tegra_drm_init(void)
if (err < 0)
goto unregister_tvo;

+ err = platform_driver_register(&tegra_gr2d_driver);
+ if (err < 0)
+ goto unregister_dsi;
return 0;

+unregister_dsi:
+ platform_driver_unregister(&tegra_dsi_driver);
unregister_tvo:
platform_driver_unregister(&tegra_tvo_driver);
unregister_hdmi:
@@ -221,6 +239,7 @@ module_init(tegra_drm_init);

static void __exit tegra_drm_exit(void)
{
+ platform_driver_unregister(&tegra_gr2d_driver);
platform_driver_unregister(&tegra_dsi_driver);
platform_driver_unregister(&tegra_tvo_driver);
platform_driver_unregister(&tegra_hdmi_driver);
@@ -232,7 +251,215 @@ MODULE_AUTHOR("Thierry Reding <***@avionic-design.de>");
MODULE_DESCRIPTION("NVIDIA Tegra DRM driver");
MODULE_LICENSE("GPL");

+static int
+tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_syncpt_read_args *args = data;
+
+ dev_dbg(drm->dev, "> %s(drm=%p, id=%d)\n", __func__, drm, args->id);
+ args->value = host1x_syncpt_read(args->id);
+ dev_dbg(drm->dev, "< %s(value=%d)\n", __func__, args->value);
+ return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_incr(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_syncpt_incr_args *args = data;
+ dev_dbg(drm->dev, "> %s(drm=%p, id=%d)\n", __func__, drm, args->id);
+ host1x_syncpt_incr(args->id);
+ dev_dbg(drm->dev, "< %s()\n", __func__);
+ return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_wait(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_syncpt_wait_args *args = data;
+ int err;
+
+ dev_dbg(drm->dev, "> %s(drm=%p, id=%d, thresh=%d)\n", __func__, drm,
+ args->id, args->thresh);
+ err = host1x_syncpt_wait(args->id, args->thresh,
+ args->timeout, &args->value);
+ dev_dbg(drm->dev, "< %s() = %d\n", __func__, err);
+
+ return err;
+}
+
+static int
+tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_open_channel_args *args = data;
+ struct tegra_drm_client *client;
+ struct tegra_drm_context *context;
+ struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+ int err = 0;
+
+ dev_dbg(drm->dev, "> %s(fpriv=%p, class=%x)\n", __func__,
+ fpriv, args->class);
+
+ context = kzalloc(sizeof(*context), GFP_KERNEL);
+ if (!context) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ list_for_each_entry(client, &tegra_drm_subdrv_list, list) {
+ if (client->class == args->class) {
+ dev_dbg(drm->dev, "opening client %x\n", args->class);
+ context->client = client;
+ err = client->ops->open_channel(client, context);
+ if (err)
+ goto out;
+
+ dev_dbg(drm->dev, "context %p\n", context);
+ list_add(&context->list, &fpriv->contexts);
+ args->context = context;
+ goto out;
+ }
+ }
+ err = -ENODEV;
+
+out:
+ if (err)
+ kfree(context);
+
+ dev_dbg(drm->dev, "< %s() = %d\n", __func__, err);
+ return err;
+}
+
+static int
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_open_channel_args *args = data;
+ struct tegra_drm_context *context;
+ struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+ int err = 0;
+
+ dev_dbg(drm->dev, "> %s(fpriv=%p)\n", __func__, fpriv);
+ list_for_each_entry(context, &fpriv->contexts, list) {
+ if (context == args->context) {
+ context->client->ops->close_channel(context);
+ list_del(&context->list);
+ kfree(context);
+ goto out;
+ }
+ }
+ err = -EINVAL;
+
+out:
+ dev_dbg(drm->dev, "< %s() = %d\n", __func__, err);
+ return err;
+}
+
+static int
+tegra_drm_ioctl_get_syncpoints(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_get_channel_param_args *args = data;
+ struct tegra_drm_context *context;
+ struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+ int err = 0;
+
+ list_for_each_entry(context, &fpriv->contexts, list) {
+ if (context == args->context) {
+ args->value =
+ context->client->ops->get_syncpoints(context);
+ goto out;
+ }
+ }
+ err = -ENODEV;
+
+out:
+ return err;
+}
+
+static int
+tegra_drm_ioctl_get_modmutexes(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_get_channel_param_args *args = data;
+ struct tegra_drm_context *context;
+ struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+ int err = 0;
+
+ list_for_each_entry(context, &fpriv->contexts, list) {
+ if (context == args->context) {
+ args->value =
+ context->client->ops->get_modmutexes(context);
+ goto out;
+ }
+ }
+ err = -ENODEV;
+
+out:
+ return err;
+}
+
+static int
+tegra_drm_ioctl_submit(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_drm_submit_args *args = data;
+ struct tegra_drm_context *context;
+ struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
+ int err = 0;
+
+ list_for_each_entry(context, &fpriv->contexts, list) {
+ if (context == args->context) {
+ err = context->client->ops->submit(context, args);
+ goto out;
+ }
+ }
+ err = -ENODEV;
+
+out:
+ return err;
+
+}
+
+static int
+tegra_drm_create_ioctl(struct drm_device *drm, void *data,
+ struct drm_file *file_priv)
+{
+ struct tegra_gem_create *args = data;
+ struct drm_gem_cma_object *cma_obj;
+ int ret;
+
+ cma_obj = drm_gem_cma_create(drm, args->size);
+ if (IS_ERR(cma_obj))
+ goto err_cma_create;
+
+ ret = drm_gem_handle_create(file_priv, &cma_obj->base, &args->handle);
+ if (ret)
+ goto err_handle_create;
+
+ drm_gem_object_unreference(&cma_obj->base);
+
+ return 0;
+
+err_handle_create:
+ drm_gem_cma_free_object(&cma_obj->base);
+err_cma_create:
+ return -ENOMEM;
+}
+
static struct drm_ioctl_desc tegra_drm_ioctls[] = {
+ DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ, tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR, tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT, tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL, tegra_drm_ioctl_open_channel, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_CLOSE_CHANNEL, tegra_drm_ioctl_close_channel, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_SYNCPOINTS, tegra_drm_ioctl_get_syncpoints, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_MODMUTEXES, tegra_drm_ioctl_get_modmutexes, DRM_UNLOCKED),
+ DRM_IOCTL_DEF_DRV(TEGRA_DRM_SUBMIT, tegra_drm_ioctl_submit, DRM_UNLOCKED),
};

static const struct file_operations tegra_drm_fops = {
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 1267a38..db197f6 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -20,6 +20,7 @@
#include <drm/drm_gem_cma_helper.h>
#include <drm/drm_fb_cma_helper.h>
#include <drm/drm_fixed.h>
+#include <drm/tegra_drm.h>

struct tegra_framebuffer {
struct drm_framebuffer base;
@@ -33,17 +34,44 @@ static inline struct tegra_framebuffer *to_tegra_fb(struct drm_framebuffer *fb)

struct tegra_drm_client;

+struct tegra_drm_context {
+ struct tegra_drm_client *client;
+ struct nvhost_channel *channel;
+ struct list_head list;
+};
+
struct tegra_drm_client_ops {
- int (*drm_init)(struct tegra_drm_client *client,
- struct drm_device *drm);
- int (*drm_exit)(struct tegra_drm_client *client);
+ int (*drm_init)(struct tegra_drm_client *, struct drm_device *);
+ int (*drm_exit)(struct tegra_drm_client *);
+ int (*open_channel)(struct tegra_drm_client *,
+ struct tegra_drm_context *);
+ void (*close_channel)(struct tegra_drm_context *);
+ u32 (*get_syncpoints)(struct tegra_drm_context *);
+ u32 (*get_waitbases)(struct tegra_drm_context *);
+ u32 (*get_modmutexes)(struct tegra_drm_context *);
+ int (*submit)(struct tegra_drm_context *,
+ struct tegra_drm_submit_args *);
+};
+
+
+struct tegra_drm_fpriv {
+ struct list_head contexts;
};

+static inline struct tegra_drm_fpriv *
+tegra_drm_fpriv(struct drm_file *file_priv)
+{
+ return file_priv ? file_priv->driver_priv : NULL;
+}
+
struct tegra_drm_client {
struct device *dev;

const struct tegra_drm_client_ops *ops;

+ u32 class;
+ struct nvhost_channel *channel;
+
struct list_head list;

};
@@ -116,13 +144,6 @@ struct tegra_output_ops {
enum drm_mode_status *status);
};

-enum tegra_output_type {
- TEGRA_OUTPUT_RGB,
- TEGRA_OUTPUT_HDMI,
- TEGRA_OUTPUT_TVO,
- TEGRA_OUTPUT_DSI,
-};
-
struct tegra_output {
struct device_node *of_node;
struct device *dev;
@@ -225,6 +246,7 @@ extern struct platform_driver tegra_hdmi_driver;
extern struct platform_driver tegra_tvo_driver;
extern struct platform_driver tegra_dsi_driver;
extern struct platform_driver tegra_dc_driver;
+extern struct platform_driver tegra_gr2d_driver;
extern struct drm_driver tegra_drm_driver;

/* from dmabuf.c */
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
new file mode 100644
index 0000000..51605af
--- /dev/null
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -0,0 +1,224 @@
+/*
+ * drivers/video/tegra/host/gr2d/gr2d.c
+ *
+ * Tegra Graphics 2D
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/of.h>
+#include <drm/tegra_drm.h>
+#include <linux/nvhost.h>
+#include "drm.h"
+
+static struct tegra_drm_client gr2d_client;
+
+static int gr2d_client_init(struct tegra_drm_client *client,
+ struct drm_device *drm)
+{
+ return 0;
+}
+
+static int gr2d_client_exit(struct tegra_drm_client *client)
+{
+ return 0;
+}
+
+static int gr2d_open_channel(struct tegra_drm_client *client,
+ struct tegra_drm_context *context)
+{
+ struct nvhost_device_data *pdata = dev_get_drvdata(client->dev);
+ context->channel = nvhost_getchannel(pdata->channel);
+
+ if (!context->channel)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void gr2d_close_channel(struct tegra_drm_context *context)
+{
+ nvhost_putchannel(context->channel);
+}
+
+static u32 gr2d_get_syncpoints(struct tegra_drm_context *context)
+{
+ struct nvhost_device_data *pdata =
+ dev_get_drvdata(context->client->dev);
+ return pdata->syncpts;
+}
+
+static u32 gr2d_get_modmutexes(struct tegra_drm_context *context)
+{
+ struct nvhost_device_data *pdata =
+ dev_get_drvdata(context->client->dev);
+ return pdata->modulemutexes;
+}
+
+static int gr2d_submit(struct tegra_drm_context *context,
+ struct tegra_drm_submit_args *args)
+{
+ struct nvhost_job *job;
+ int num_cmdbufs = args->num_cmdbufs;
+ int num_relocs = args->num_relocs;
+ int num_waitchks = args->num_waitchks;
+ struct tegra_drm_cmdbuf __user *cmdbufs = args->cmdbufs;
+ struct tegra_drm_reloc __user *relocs = args->relocs;
+ struct tegra_drm_waitchk __user *waitchks = args->waitchks;
+ struct tegra_drm_syncpt_incr syncpt_incr;
+ int err;
+
+ dev_dbg(context->client->dev, "> %s(context=%p, cmdbufs=%d, relocs=%d, waitchks=%d)\n",
+ __func__, context,
+ num_cmdbufs, num_relocs, num_waitchks);
+
+ /* We don't yet support other than one syncpt_incr struct per submit */
+ if (args->num_syncpt_incrs != 1)
+ return -EINVAL;
+
+ job = nvhost_job_alloc(context->channel,
+ args->num_cmdbufs,
+ args->num_relocs,
+ args->num_waitchks);
+ if (!job)
+ return -ENOMEM;
+
+ job->num_relocs = args->num_relocs;
+ job->num_waitchk = args->num_waitchks;
+ job->clientid = (u32)args->context;
+
+ while (num_cmdbufs) {
+ struct tegra_drm_cmdbuf cmdbuf;
+ err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf));
+ if (err)
+ goto fail;
+ dev_dbg(context->client->dev, "cmdbuf: mem=%08x, words=%d, offset=%d\n",
+ cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
+ nvhost_job_add_gather(job,
+ cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
+ num_cmdbufs--;
+ cmdbufs++;
+ }
+
+ err = copy_from_user(job->relocarray,
+ relocs, sizeof(*relocs) * num_relocs);
+ if (err)
+ goto fail;
+
+ err = copy_from_user(job->waitchk,
+ waitchks, sizeof(*waitchks) * num_waitchks);
+ if (err)
+ goto fail;
+
+ err = nvhost_job_pin(job, to_platform_device(context->client->dev));
+ if (err)
+ goto fail;
+
+ err = copy_from_user(&syncpt_incr,
+ args->syncpt_incrs, sizeof(syncpt_incr));
+ if (err)
+ goto fail;
+
+ job->syncpt_id = syncpt_incr.syncpt_id;
+ job->syncpt_incrs = syncpt_incr.syncpt_incrs;
+ job->timeout = 10000;
+ if (args->timeout && args->timeout < 10000)
+ job->timeout = args->timeout;
+
+ err = nvhost_channel_submit(job);
+ if (err)
+ goto fail_submit;
+
+ args->fence = job->syncpt_end;
+
+ nvhost_job_put(job);
+ dev_dbg(context->client->dev, "< %s(context=%p)\n", __func__, context);
+ return 0;
+
+fail_submit:
+ nvhost_job_unpin(job);
+fail:
+ nvhost_job_put(job);
+ dev_dbg(context->client->dev,
+ "< %s(context=%p) = %d\n", __func__, context, err);
+ return err;
+}
+
+static struct tegra_drm_client_ops gr2d_client_ops = {
+ .drm_init = gr2d_client_init,
+ .drm_exit = gr2d_client_exit,
+ .open_channel = gr2d_open_channel,
+ .close_channel = gr2d_close_channel,
+ .get_syncpoints = gr2d_get_syncpoints,
+ .get_modmutexes = gr2d_get_modmutexes,
+ .submit = gr2d_submit,
+};
+
+static int __devinit gr2d_probe(struct platform_device *dev)
+{
+ int err;
+ struct nvhost_device_data *pdata =
+ (struct nvhost_device_data *)dev->dev.platform_data;
+ pdata->pdev = dev;
+ platform_set_drvdata(dev, pdata);
+ err = nvhost_client_device_init(dev);
+ if (err)
+ return err;
+
+ gr2d_client.ops = &gr2d_client_ops;
+ gr2d_client.dev = &dev->dev;
+ gr2d_client.class = NV_GRAPHICS_2D_CLASS_ID;
+ return tegra_drm_register_client(&gr2d_client);
+}
+
+static int __exit gr2d_remove(struct platform_device *dev)
+{
+ /* Add clean-up */
+ return 0;
+}
+
+#ifdef CONFIG_PM
+static int gr2d_suspend(struct platform_device *dev, pm_message_t state)
+{
+ return nvhost_client_device_suspend(dev);
+}
+
+static int gr2d_resume(struct platform_device *dev)
+{
+ dev_info(&dev->dev, "resuming\n");
+ return 0;
+}
+#endif
+
+static struct of_device_id gr2d_match[] __devinitdata = {
+ { .compatible = "nvidia,tegra20-gr2d", },
+ { .compatible = "nvidia,tegra30-gr2d", },
+ { },
+};
+
+struct platform_driver tegra_gr2d_driver = {
+ .probe = gr2d_probe,
+ .remove = __exit_p(gr2d_remove),
+#ifdef CONFIG_PM
+ .suspend = gr2d_suspend,
+ .resume = gr2d_resume,
+#endif
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "tegra-gr2d",
+ .of_match_table = of_match_ptr(gr2d_match),
+ }
+};
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
new file mode 100644
index 0000000..bfc54d8
--- /dev/null
+++ b/include/drm/tegra_drm.h
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2012, Avionic Design GmbH
+ * Copyright (c) 2012, NVIDIA CORPORATION. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEGRA_DRM_H_
+#define _TEGRA_DRM_H_
+
+enum tegra_output_type {
+ TEGRA_OUTPUT_RGB,
+ TEGRA_OUTPUT_HDMI,
+ TEGRA_OUTPUT_TVO,
+ TEGRA_OUTPUT_DSI,
+};
+
+struct tegra_gem_create {
+ uint64_t size;
+ unsigned int flags;
+ unsigned int handle;
+};
+
+#define DRM_TEGRA_GEM_CREATE 0x00
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + \
+ DRM_TEGRA_GEM_CREATE, struct tegra_gem_create)
+
+struct tegra_drm_syncpt_read_args {
+ __u32 id;
+ __u32 value;
+};
+
+struct tegra_drm_syncpt_incr_args {
+ __u32 id;
+};
+
+struct tegra_drm_syncpt_wait_args {
+ __u32 id;
+ __u32 thresh;
+ __s32 timeout;
+ __u32 value;
+};
+
+#define DRM_TEGRA_NO_TIMEOUT (-1)
+
+struct tegra_drm_open_channel_args {
+ __u32 class;
+ void *context;
+};
+
+struct tegra_drm_get_channel_param_args {
+ void *context;
+ __u32 value;
+};
+
+struct tegra_drm_syncpt_incr {
+ __u32 syncpt_id;
+ __u32 syncpt_incrs;
+};
+
+struct tegra_drm_cmdbuf {
+ __u32 mem;
+ __u32 offset;
+ __u32 words;
+};
+
+struct tegra_drm_reloc {
+ __u32 cmdbuf_mem;
+ __u32 cmdbuf_offset;
+ __u32 target;
+ __u32 target_offset;
+ __u32 shift;
+};
+
+struct tegra_drm_waitchk {
+ __u32 mem;
+ __u32 offset;
+ __u32 syncpt_id;
+ __u32 thresh;
+};
+
+struct tegra_drm_submit_args {
+ void *context;
+ __u32 num_syncpt_incrs;
+ __u32 num_cmdbufs;
+ __u32 num_relocs;
+ __u32 submit_version;
+ __u32 num_waitchks;
+ __u32 waitchk_mask;
+ __u32 timeout;
+ struct tegra_drm_syncpt_incrs *syncpt_incrs;
+ struct tegra_drm_cmdbuf *cmdbufs;
+ struct tegra_drm_reloc *relocs;
+ struct tegra_drm_waitchk *waitchks;
+
+ __u32 pad[5]; /* future expansion */
+ __u32 fence; /* Return value */
+};
+
+#define DRM_TEGRA_DRM_SYNCPT_READ 0x01
+#define DRM_TEGRA_DRM_SYNCPT_INCR 0x02
+#define DRM_TEGRA_DRM_SYNCPT_WAIT 0x03
+#define DRM_TEGRA_DRM_OPEN_CHANNEL 0x04
+#define DRM_TEGRA_DRM_CLOSE_CHANNEL 0x05
+#define DRM_TEGRA_DRM_GET_SYNCPOINTS 0x06
+#define DRM_TEGRA_DRM_GET_MODMUTEXES 0x07
+#define DRM_TEGRA_DRM_SUBMIT 0x08
+
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_READ, struct tegra_drm_syncpt_read_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_INCR, struct tegra_drm_syncpt_incr_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_WAIT, struct tegra_drm_syncpt_wait_args)
+#define DRM_IOCTL_TEGRA_DRM_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_OPEN_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_CLOSE_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_GET_SYNCPOINTS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_SYNCPOINTS, struct tegra_drm_get_channel_param_args)
+#define DRM_IOCTL_TEGRA_DRM_GET_MODMUTEXES DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_MODMUTEXES, struct tegra_drm_get_channel_param_args)
+#define DRM_IOCTL_TEGRA_DRM_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SUBMIT, struct tegra_drm_submit_args)
+
+#endif
--
1.7.9.5
Rob Clark
2012-11-26 21:59:37 UTC
Permalink
On Mon, Nov 26, 2012 at 7:19 AM, Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org> wrote:
>
> +struct tegra_drm_submit_args {
> + void *context;

Just a quick comment..

You shouldn't really use ptr here, but instead use a 64bit type so
that you don't run into issues later for armv8/64bit. Same comment
applies in a few other places too.

I'll try and spend a bit more time going through this in more detail
in the coming days

BR,
-R

> + __u32 num_syncpt_incrs;
> + __u32 num_cmdbufs;
> + __u32 num_relocs;
> + __u32 submit_version;
> + __u32 num_waitchks;
> + __u32 waitchk_mask;
> + __u32 timeout;
> + struct tegra_drm_syncpt_incrs *syncpt_incrs;
> + struct tegra_drm_cmdbuf *cmdbufs;
> + struct tegra_drm_reloc *relocs;
> + struct tegra_drm_waitchk *waitchks;
> +
> + __u32 pad[5]; /* future expansion */
> + __u32 fence; /* Return value */
> +};
> +
Dave Airlie
2012-11-26 22:15:06 UTC
Permalink
> static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
> {
> - return 0;
> + struct tegra_drm_fpriv *fpriv;
> + int err = 0;
> +
> + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
> + if (!fpriv)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&fpriv->contexts);
> + filp->driver_priv = fpriv;
> +

who frees this?
> +struct tegra_drm_syncpt_incr_args {
> + __u32 id;
> +};

add 32-bits of padding here

> +
> +struct tegra_drm_syncpt_wait_args {
> + __u32 id;
> + __u32 thresh;
> + __s32 timeout;
> + __u32 value;
> +};
> +
> +#define DRM_TEGRA_NO_TIMEOUT (-1)
> +
> +struct tegra_drm_open_channel_args {
> + __u32 class;
> + void *context;

no pointers use u64, align them to 64-bits, so 32-bits of padding,

> +};
> +
> +struct tegra_drm_get_channel_param_args {
> + void *context;
> + __u32 value;

Same padding + uint64_t for void *

> +};
> +
> +struct tegra_drm_syncpt_incr {
> + __u32 syncpt_id;
> + __u32 syncpt_incrs;
> +};
> +
> +struct tegra_drm_cmdbuf {
> + __u32 mem;
> + __u32 offset;
> + __u32 words;
> +};

add padding
> +
> +struct tegra_drm_reloc {
> + __u32 cmdbuf_mem;
> + __u32 cmdbuf_offset;
> + __u32 target;
> + __u32 target_offset;
> + __u32 shift;
> +};

add padding

> +
> +struct tegra_drm_waitchk {
> + __u32 mem;
> + __u32 offset;
> + __u32 syncpt_id;
> + __u32 thresh;
> +};
> +
> +struct tegra_drm_submit_args {
> + void *context;
> + __u32 num_syncpt_incrs;
> + __u32 num_cmdbufs;
> + __u32 num_relocs;
> + __u32 submit_version;
> + __u32 num_waitchks;
> + __u32 waitchk_mask;
> + __u32 timeout;
> + struct tegra_drm_syncpt_incrs *syncpt_incrs;
> + struct tegra_drm_cmdbuf *cmdbufs;
> + struct tegra_drm_reloc *relocs;
> + struct tegra_drm_waitchk *waitchks;
> +
> + __u32 pad[5]; /* future expansion */
> + __u32 fence; /* Return value */
> +};

lose all the pointers for 64-bit aligned uint64_t.

Probably should align all of these on __u64 and __u32 usage if possible.

i'll look at the rest of the patches, but I need to know what commands
can be submitted via this interface and what are the security
implications of it.

Dave.
Terje Bergström
2012-11-27 06:52:38 UTC
Permalink
On 27.11.2012 00:15, Dave Airlie wrote:
>> static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>> {
>> - return 0;
>> + struct tegra_drm_fpriv *fpriv;
>> + int err = 0;
>> +
>> + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>> + if (!fpriv)
>> + return -ENOMEM;
>> +
>> + INIT_LIST_HEAD(&fpriv->contexts);
>> + filp->driver_priv = fpriv;
>> +
>
> who frees this?

Probably nobody. Will fix.

>> +struct tegra_drm_syncpt_incr_args {
>> + __u32 id;
>> +};
>
> add 32-bits of padding here
>
>> +
>> +struct tegra_drm_syncpt_wait_args {
>> + __u32 id;
>> + __u32 thresh;
>> + __s32 timeout;
>> + __u32 value;
>> +};
>> +
>> +#define DRM_TEGRA_NO_TIMEOUT (-1)
>> +
>> +struct tegra_drm_open_channel_args {
>> + __u32 class;
>> + void *context;
>
> no pointers use u64, align them to 64-bits, so 32-bits of padding,

I'll add the paddings and change pointers to u64's to all of the structs
in this file.
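
For illustration only, a sketch of what two of the reworked structures could
look like after that change (not the final layout, just the idea of swapping
the pointer for a __u64 and adding explicit padding so size and alignment
match on 32-bit and 64-bit userspace):

struct tegra_drm_open_channel_args {
	__u32 class;
	__u32 pad;		/* keeps the following __u64 8-byte aligned */
	__u64 context;		/* was: void *context */
};

struct tegra_drm_syncpt_incr_args {
	__u32 id;
	__u32 pad;		/* explicit padding, reserved for future use */
};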

> i'll look at the rest of the patches, but I need to know what commands
> can be submitted via this interface and what are the security
> implications of it.

All of the commands are memory operations (copy, clear, rotate, etc)
involving either one or two memory regions that are defined via dmabuf
fd's and offsets. The commands can also contain plain device virtual
addresses and 2D would be happy to oblige as long as the memory is
mapped to it.

There are a few ways to help the situation. None of them are perfect.

On Tegra30 we could allocate an address space per process. This would
mean that max 3 processes would be able to use the 2D unit at one time,
assuming that other devices are fine using the one remaining address
space. On Tegra20 this is not an option.

Second would be using device permissions - only allow selected processes
to access 2D.

Third would be having a firewall in 2D driver checking the stream and
ensuring all registers that accept addresses are written by values
derived from dmabufs. I haven't tried implementing this, but it'd
involve a lookup table in kernel and CPU reading through the command
stream. Offsets and sizes would also need to be validated. There would
be a performance hit.
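
To make the idea concrete, here is a minimal sketch of the per-reloc check;
every name and register offset below is made up for illustration and is not
from this patch set or the TRM, and the opcode walk that finds the register
writes is hardware specific and omitted:

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/types.h>

/* Hypothetical table of gr2d register offsets that take addresses. */
static const u32 gr2d_addr_regs[] = { 0x2b, 0x31 };

static bool gr2d_reg_takes_address(u32 reg)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); i++)
		if (gr2d_addr_regs[i] == reg)
			return true;
	return false;
}

/*
 * Validate one reloc: it must patch a register that takes an address, and
 * the requested offset must stay inside the referenced dma-buf.
 */
static int gr2d_check_reloc(u32 reg, u64 target_offset, u64 target_size)
{
	if (!gr2d_reg_takes_address(reg))
		return -EINVAL;
	if (target_offset >= target_size)
		return -EINVAL;
	return 0;
}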

Fourth would be to move the creation of streams to kernel space. That'd
mean moving the whole 2D driver and host1x command stream handling to
kernel space and quite a lot of time spent in kernel. I'm not too keen
on this for obvious reasons.

Other ideas are obviously welcome.

Thanks!

Terje
Dave Airlie
2012-11-27 07:33:59 UTC
Permalink
>
> Third would be having a firewall in 2D driver checking the stream and
> ensuring all registers that accept addresses are written by values
> derived from dmabufs. I haven't tried implementing this, but it'd
> involve a lookup table in kernel and CPU reading through the command
> stream. Offsets and sizes would also need to be validated. There would
> be a performance hit.

This is the standard mechanism, and what exynos does as well.

The per process VM method is also used as an extension to this on some hw.

Dave.
Terje Bergström
2012-11-27 08:16:33 UTC
Permalink
On 27.11.2012 09:33, Dave Airlie wrote:
>> Third would be having a firewall in 2D driver checking the stream and
>> ensuring all registers that accept addresses are written by values
>> derived from dmabufs. I haven't tried implementing this, but it'd
>> involve a lookup table in kernel and CPU reading through the command
>> stream. Offsets and sizes would also need to be validated. There would
>> be a performance hit.
>
> This is the standard mechanism, and what exynos does as well.
>
> The per process VM method is also used as an extension to this on some hw.

Hi,

Thanks for the pointer, I looked at exynos code. It indeed checks the
registers written to, but it doesn't prevent overrun by checking sizes
of buffers and compare against requests.

Based on my experience with Tegra graphics stack, going through command
streams in kernel is bad for performance. For 2D operations this is
probably ok as the command streams are pretty simple. Anything more
complex is going to cause severe degradation of performance, but it's
also outside the scope of this patch set.

If this is the way to go, I'll put the firewall behind a Kconfig flag so
that system integrator can decide if his system needs it.

Terje
Dave Airlie
2012-11-27 08:32:38 UTC
Permalink
On Tue, Nov 27, 2012 at 8:16 AM, Terje Bergström <***@nvidia.com> wrote:
> On 27.11.2012 09:33, Dave Airlie wrote:
>>> Third would be having a firewall in 2D driver checking the stream and
>>> ensuring all registers that accept addresses are written by values
>>> derived from dmabufs. I haven't tried implementing this, but it'd
>>> involve a lookup table in kernel and CPU reading through the command
>>> stream. Offsets and sizes would also need to be validated. There would
>>> be a performance hit.
>>
>> This is the standard mechanism, and what exynos does as well.
>>
>> The per process VM method is also used as an extension to this on some hw.
>
> Hi,
>
> Thanks for the pointer, I looked at exynos code. It indeed checks the
> registers written to, but it doesn't prevent overrun by checking sizes
> of buffers and compare against requests.

They probably need to add that, its not as important as the base
addresses, unless it takes negative strides, generally base addresses
means you can target current->uid quite easily!

> Based on my experience with Tegra graphics stack, going through command
> streams in kernel is bad for performance. For 2D operations this is
> probably ok as the command streams are pretty simple. Anything more
> complex is going to cause severe degradation of performance, but it's
> also outside the scope of this patch set.
>
> If this is the way to go, I'll put the firewall behind a Kconfig flag so
> that system integrator can decide if his system needs it.

We don't generally make security like this optional :-)

If you do that you should restrict the drm device to root users only,
and never let a user with a browser anywhere near it.

Like I know what you guys get away with in closed source world, but
here we don't write root holes into our driver deliberately.

Dave.
Terje Bergström
2012-11-27 08:45:28 UTC
Permalink
On 27.11.2012 10:32, Dave Airlie wrote:
On Tue, Nov 27, 2012 at 8:16 AM, Terje Bergström <***@nvidia.com> wrote:
>> Thanks for the pointer, I looked at exynos code. It indeed checks the
>> registers written to, but it doesn't prevent overrun by checking sizes
>> of buffers and compare against requests.
> They probably need to add that, its not as important as the base
> addresses, unless it takes negative strides, generally base addresses
> means you can target current->uid quite easily!

Ok. We'll implement the firewall, unless we come up with even a better
choice.

>> If this is the way to go, I'll put the firewall behind a Kconfig flag so
>> that system integrator can decide if his system needs it.
> We don't generally make security like this optional :-)
>
> If you do that you should restrict the drm device to root users only,
> and never let a user with a browser anywhere near it.

My thinking was that the system integrator can decide how much he trusts
the binaries (incl. browser plugins) in the system. If he trusts the
binaries, the firewall can be turned off.

> Like I know what you guys get away with in closed source world, but
> here we don't write root holes into our driver deliberately.

Duly noted. :-)

Terje
Lucas Stach
2012-11-27 10:22:56 UTC
Permalink
On Tuesday, 27 Nov 2012 at 10:45 +0200, Terje Bergström wrote:
> On 27.11.2012 10:32, Dave Airlie wrote:
> > On Tue, Nov 27, 2012 at 8:16 AM, Terje Bergström <***@nvidia.com> wrote:
> >> Thanks for the pointer, I looked at exynos code. It indeed checks the
> >> registers written to, but it doesn't prevent overrun by checking sizes
> >> of buffers and compare against requests.
> > They probably need to add that, its not as important as the base
> > addresses, unless it takes negative strides, generally base addresses
> > means you can target current->uid quite easily!
>
> Ok. We'll implement the firewall, unless we come up with even a better
> choice.
>
> >> If this is the way to go, I'll put the firewall behind a Kconfig flag so
> >> that system integrator can decide if his system needs it.
> > We don't generally make security like this optional :-)
> >
> > If you do that you should restrict the drm device to root users only,
> > and never let a user with a browser anywhere near it.
>
Personally I would never trust any binary, but that's just my personal
opinion.

But I'm in favour of having the command stream checking optional, simply
backed by the fact that we are likely to use the same 2D driver
infrastructure for Tegra 2 and 3. On Tegra 3 we can most likely go
without in-depth command stream checking as the graphics core there sits
behind the IOMMU, which can provide an appropriate level of security.

Regards,
Lucas
Thierry Reding
2012-11-27 10:37:39 UTC
Permalink
On Tue, Nov 27, 2012 at 11:22:56AM +0100, Lucas Stach wrote:
> On Tuesday, 27 Nov 2012 at 10:45 +0200, Terje Bergström wrote:
> > On 27.11.2012 10:32, Dave Airlie wrote:
> > > On Tue, Nov 27, 2012 at 8:16 AM, Terje Bergström <***@nvidia.com> wrote:
> > >> Thanks for the pointer, I looked at exynos code. It indeed checks the
> > >> registers written to, but it doesn't prevent overrun by checking sizes
> > >> of buffers and compare against requests.
> > > They probably need to add that, its not as important as the base
> > > addresses, unless it takes negative strides, generally base addresses
> > > means you can target current->uid quite easily!
> >
> > Ok. We'll implement the firewall, unless we come up with even a better
> > choice.
> >
> > >> If this is the way to go, I'll put the firewall behind a Kconfig flag so
> > >> that system integrator can decide if his system needs it.
> > > We don't generally make security like this optional :-)
> > >
> > > If you do that you should restrict the drm device to root users only,
> > > and never let a user with a browser anywhere near it.
> >
> Personally I would never trust any binary, but that's just my personal
> opinion.
>
> But I'm in favour of having the command stream checking optional, simply
> backed by the fact that we are likely to use the same 2D driver
> infrastructure for Tegra 2 and 3. On Tegra 3 we can most likely go
> without in-depth command stream checking as the graphics core there sits
> behind the IOMMU, which can provide an appropriate level of security.

But in that case it should be made mandatory at first until proper IOMMU
support is enabled on Tegra30. Then it can be checked at driver probe
time whether or not to enable the extra checks. That way we don't need a
special Kconfig option and we still get all the security that we need,
right?

Thierry
Terje Bergström
2012-11-27 11:31:15 UTC
Permalink
On 27.11.2012 12:37, Thierry Reding wrote:
> But in that case it should be made mandatory at first until proper IOMMU
> support is enabled on Tegra30. Then it can be checked at driver probe
> time whether or not to enable the extra checks. That way we don't need a
> special Kconfig option and we still get all the security that we need,
> right?

I guess it depends on the level of security.

If we want to only protect kernel and user space memory, this would be
sufficient and no firewall is needed if IOMMU is turned on.

If we want to protect 2D buffers from each other, this is not sufficient.

Terje
Lucas Stach
2012-11-27 11:47:50 UTC
Permalink
On Tuesday, 27 Nov 2012 at 13:31 +0200, Terje Bergström wrote:
> On 27.11.2012 12:37, Thierry Reding wrote:
> > But in that case it should be made mandatory at first until proper IOMMU
> > support is enabled on Tegra30. Then it can be checked at driver probe
> > time whether or not to enable the extra checks. That way we don't need a
> > special Kconfig option and we still get all the security that we need,
> > right?
>
> I guess it depends on the level of security.
>
> If we want to only protect kernel and user space memory, this would be
> sufficient and no firewall is needed if IOMMU is turned on.
>
> If we want to protect 2D buffers from each other, this is not sufficient.
>
I guess we could change IOMMU address spaces for the graphics units
depending on the active channel. This would still be a bit of a
performance hit, because of the necessary TLB flushing and so on, but
should be much better than checking the whole command stream. This way
you at least get security on a process level, as no process is able to
corrupt another process's graphics resources.

This is the same level of security as provided by the nouveau driver.
But to do so all memory management has to be done in kernel and from the
current submissions of the 2D infrastructure I fear that the current
architecture does too much of that in userspace, but I'll hold back with
any judgement until we actually get to see the userspace parts.
any judgement until we actually get to see the userspace parts.

Also to implement this strategy you have to take ownership of the
graphics address space on a much lower level than the DMA API. This
might take some work together with the IOMMU guys.

Regards,
Lucas
Terje Bergström
2012-11-27 12:59:09 UTC
Permalink
On 27.11.2012 13:47, Lucas Stach wrote:
> I guess we could change IOMMU address spaces for the graphics units
> depending on the active channel. This would still be a bit of a
> performance hit, because of the necessary TLB flushing and so on, but
> should be much better than checking the whole command stream. This way
> you at least get security on a process level, as no process is able to
> corrupt another process's graphics resources.

One physical channel is shared with all users of the 2D unit. Each job
is just added to the queue, and host1x will happily cross from one job
to the next without intervention from CPU. This is done to keep CPU
overhead down to improve power and performance.

This also means that we cannot change the IOMMU settings between jobs
from different processes, unless we pause the channel after every job.

This is still an interesting thought - can we postpone binding of a
buffer to address space until submit time, and give each process its own
address space? We would have a limit of "submits from three processes
going on at once" instead of "three processes can open 2D channel at
once". That's a limitation we could live with.

Naturally, Tegra2 is still left in the cold.

> This is the same level of security as provided by the nouveau driver.
> But to do so all memory management has to be done in kernel and from the
> current submissions of the 2D infrastructure I fear that the current
> architecture does too much of that in userspace, but I'll hold back with
> any judgement until we actually get to see the userspace parts.

User space allocates buffer, exports as dmabuf fd, and passes the fd in
submits to kernel, and frees the buffer. No other memory management is
done in user space.

> Also to implement this strategy you have to take ownership of the
> graphics address space on a much lower level than the DMA API. This
> might take some work together with the IOMMU guys.

I'll go through this with Hiroshi, who knows that area.

Terje
Dave Airlie
2012-11-27 23:00:54 UTC
Permalink
On Tue, Nov 27, 2012 at 9:31 PM, Terje Bergström <***@nvidia.com> wrote:
> On 27.11.2012 12:37, Thierry Reding wrote:
>> But in that case it should be made mandatory at first until proper IOMMU
>> support is enabled on Tegra30. Then it can be checked at driver probe
>> time whether or not to enable the extra checks. That way we don't need a
>> special Kconfig option and we still get all the security that we need,
>> right?
>
> I guess it depends on the level of security.
>
> If we want to only protect kernel and user space memory, this would be
> sufficient and no firewall is needed if IOMMU is turned on.
>
> If we want to protect 2D buffers from each other, this is not sufficient.

We generally aim for the first, to stop the gpu from reading/writing
any memory it hasn't been granted access to,
the second is nice to have though, but really requires a GPU with VM
to implement properly.

Dave.
Terje Bergström
2012-11-28 13:17:51 UTC
Permalink
On 28.11.2012 01:00, Dave Airlie wrote:
> We generally aim for the first, to stop the gpu from reading/writing
> any memory it hasn't been granted access to,
> the second is nice to have though, but really requires a GPU with VM
> to implement properly.

I wonder if we should aim at root only access on Tegra20, and force
IOMMU on Tegra30 and fix the remaining issues we have with IOMMU. The
firewall turns out to be more complicated than I wished.

Biggest problem is that we aim at zero-copy for everything possible,
including command streams. Kernel gets a handle to a command stream, but
the command stream is allocated by the user space process. So the user
space can tamper with the stream once it's been written to the host1x 2D
channel.

Copying with firewall is one option, but that would again kill the
performance. One option would be user space unmapping the command buffer
when it's sent to kernel, and kernel checking that it's unmapped before
it agrees to send the stream to hardware.

On Tegra30 with IOMMU turned on things are ok without any checks,
because all access would go via MMU, which makes kernel memory inaccessible.

Of course, better ideas are welcome.

Terje
Lucas Stach
2012-11-28 13:33:22 UTC
Permalink
On Wednesday, 28 Nov 2012 at 15:17 +0200, Terje Bergström wrote:
> On 28.11.2012 01:00, Dave Airlie wrote:
> > We generally aim for the first, to stop the gpu from reading/writing
> > any memory it hasn't been granted access to,
> > the second is nice to have though, but really requires a GPU with VM
> > to implement properly.
>
> I wonder if we should aim at root only access on Tegra20, and force
> IOMMU on Tegra30 and fix the remaining issues we have with IOMMU. The
> firewall turns out to be more complicated than I wished.
>
> Biggest problem is that we aim at zero-copy for everything possible,
> including command streams. Kernel gets a handle to a command stream, but
> the command stream is allocated by the user space process. So the user
> space can tamper with the stream once it's been written to the host1x 2D
> channel.
>
So this is obviously wrong. Userspace has to allocate a pushbuffer from
the kernel just as every other buffer, then map it into its own address
space to push in commands. At submit time of the pushbuf kernel has to
make sure that userspace is not able to access the memory any more, i.e.
kernel shoots down the vma or pagetable of the vma. To keep overhead low
and not do any blocking you can just keep some pushbufs around for one
channel and switch over the pagetable entries to the next free buffer,
just make sure that userspace is never able to tamper with a buffer as
long as the gpu isn't done with it.

Regards,
Lucas
Terje Bergström
2012-11-28 13:57:25 UTC
Permalink
On 28.11.2012 15:33, Lucas Stach wrote:
> So this is obviously wrong. Userspace has to allocate a pushbuffer from
> the kernel just as every other buffer, then map it into it's own address
> space to push in commands. At submit time of the pushbuf kernel has to
> make sure that userspace is not able to access the memory any more, i.e.
> kernel shoots down the vma or pagetable of the vma. To keep overhead low
> and not do any blocking you can just keep some pushbufs around for one
> channel and switch over the pagetable entries to the next free buffer,
> just make sure that userspace is never able to tamper with a buffer as
> long as the gpu isn't done with it.

That's really not something dma-buf APIs are equipped to handle. We need
something to ensure user space doesn't have the buffer mapped (either
return an error if it has, or zap the mapping), something to ensure user space
cannot mmap the buffer, and something to revert this all once we're done.

We could add these as special ops to tegradrm dmabuf code for now, and
assume that command streams are always allocated by tegradrm. Now we
allow any dmabuf to be used as buffers for command streams.

And, with IOMMU I don't think we would need any of this. I guess we need
to press the gas pedal on figuring out how to enable that for tegradrm
on Tegra30.

We already allocate multiple buffers to be able to fill in the next
buffer once we've sent one to kernel, so that part is ok. We reuse only
once we know that the operations contained are done.

Terje
Lucas Stach
2012-11-28 14:06:05 UTC
Permalink
On Wednesday, 28 Nov 2012 at 15:57 +0200, Terje Bergström wrote:
> On 28.11.2012 15:33, Lucas Stach wrote:
> > So this is obviously wrong. Userspace has to allocate a pushbuffer from
> > the kernel just as every other buffer, then map it into its own address
> > space to push in commands. At submit time of the pushbuf kernel has to
> > make sure that userspace is not able to access the memory any more, i.e.
> > kernel shoots down the vma or pagetable of the vma. To keep overhead low
> > and not do any blocking you can just keep some pushbufs around for one
> > channel and switch over the pagetable entries to the next free buffer,
> > just make sure that userspace is never able to tamper with a buffer as
> > long as the gpu isn't done with it.
>
> That's really not something dma-buf APIs are equipped to handle. We need
> something to ensure user space doesn't have the buffer mapped (either
> return an error if it has, or zap the mapping), something to ensure user space
> cannot mmap the buffer, and something to revert this all once we're done.
>
> We could add these as special ops to tegradrm dmabuf code for now, and
> assume that command streams are always allocated by tegradrm. Now we
> allow any dmabuf to be used as buffers for command streams.
>
Why do we even need/use dma-buf for this use case? This is all one DRM
device, even if we separate host1x and gr2d as implementation modules.

So standard way of doing this is:
1. create gem object for pushbuffer
2. create fake mmap offset for gem obj
3. map pushbuf using the fake offset on the drm device
4. at submit time zap the mapping
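
As a very rough sketch of that flow, assuming the drm_device's dev_mapping
address_space of this kernel generation; queue_to_host1x() is a placeholder,
not a real tegradrm/nvhost function:

#include <drm/drmP.h>
#include <linux/mm.h>

struct pushbuf {
	struct drm_gem_object *gem;	/* kernel-owned backing object */
	u64 mmap_offset;		/* fake offset handed to userspace */
};

static int queue_to_host1x(struct pushbuf *pb);	/* hypothetical hand-off */

static int pushbuf_submit(struct drm_device *drm, struct pushbuf *pb)
{
	/*
	 * Revoke the userspace mapping so the CPU can no longer touch the
	 * words the hardware is about to read; a fresh backing buffer (or a
	 * copy) is handed out for the next submit.
	 */
	unmap_mapping_range(drm->dev_mapping, pb->mmap_offset,
			    pb->gem->size, 1);

	return queue_to_host1x(pb);
}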

You need this logic anyway, as normally we don't rely on userspace to
sync gpu and cpu, but use the kernel to handle the concurrency issues.

Regards,
Lucas
Terje Bergström
2012-11-28 14:45:15 UTC
Permalink
On 28.11.2012 16:06, Lucas Stach wrote:
> Why do even need/use dma-buf for this use case? This is all one DRM
> device, even if we separate host1x and gr2d as implementation modules.

I didn't want to implement dependency to drm gem objects in nvhost, but
we have thought about doing that. dma-buf brings quite a lot of
overhead, so implementing support for gem buffers would make the
sequence a bit leaner.

nvhost already has infra to support multiple memory managers.

> So standard way of doing this is:
> 1. create gem object for pushbuffer
> 2. create fake mmap offset for gem obj
> 3. map pushbuf using the fake offset on the drm device
> 4. at submit time zap the mapping
>
> You need this logic anyway, as normally we don't rely on userspace to
> sync gpu and cpu, but use the kernel to handle the concurrency issues.

Taking a step back - 2D streams are actually very short, in the order of
<100 bytes. Just copying them to kernel space would actually be faster
than doing MMU operations.

I think for Tegra20 and non-IOMMU case, we just need to copy the command
stream to kernel buffer. In Tegra30 IOMMU case reference to user space
buffers are fine, as tampering the streams doesn't have any ill effects.

Terje
Lucas Stach
2012-11-28 15:13:29 UTC
Permalink
On Wednesday, 28 Nov 2012 at 16:45 +0200, Terje Bergström wrote:
> On 28.11.2012 16:06, Lucas Stach wrote:
> > Why do we even need/use dma-buf for this use case? This is all one DRM
> > device, even if we separate host1x and gr2d as implementation modules.
>
> I didn't want to implement dependency to drm gem objects in nvhost, but
> we have thought about doing that. dma-buf brings quite a lot of
> overhead, so implementing support for gem buffers would make the
> sequence a bit leaner.
>
> nvhost already has infra to support multiple memory managers.
>
To be honest I still don't grok all of this, but nonetheless I try my
best.

Anyway, shouldn't nvhost be something like an allocator used by host1x
clients? With the added ability to do relocs/binding of buffers into
client address spaces, refcounting buffers and import/export dma-bufs?
In this case nvhost objects would just be used to back DRM GEM objects.
If using GEM objects in the DRM driver introduces any cross dependencies
with nvhost, you should take a step back and ask yourself if the current
design is the right way to go.

> > So standard way of doing this is:
> > 1. create gem object for pushbuffer
> > 2. create fake mmap offset for gem obj
> > 3. map pushbuf using the fake offset on the drm device
> > 4. at submit time zap the mapping
> >
> > You need this logic anyway, as normally we don't rely on userspace to
> > sync gpu and cpu, but use the kernel to handle the concurrency issues.
>
> Taking a step back - 2D streams are actually very short, in the order of
> <100 bytes. Just copying them to kernel space would actually be faster
> than doing MMU operations.
>
Is this always the case because of the limited abilities of the gr2d
engine, or is it just your current driver flushing the stream very
often?

> I think for Tegra20 and non-IOMMU case, we just need to copy the command
> stream to kernel buffer. In Tegra30 IOMMU case reference to user space
> buffers are fine, as tampering the streams doesn't have any ill effects.
>
In which way is it a good design choice to let the CPU happily alter
_any_ buffer the GPU is busy processing without getting the concurrency
right?

Please keep in mind that the interfaces you are now trying to introduce
have to be supported for virtually unlimited time. You might not be able
to scrub your mistakes later on without going through a lot of hassles.

To avoid a lot of those mistakes it might be a good idea to look at how
other drivers use the DRM infrastructure and only part from those proven
schemes where really necessary/worthwhile.

Regards,
Lucas
Terje Bergström
2012-11-28 16:23:12 UTC
Permalink
On 28.11.2012 17:13, Lucas Stach wrote:
> To be honest I still don't grok all of this, but nonetheless I try my
> best.

Sorry. I promised in another thread a write-up explaining the design. I
still owe you guys that.

> Anyway, shouldn't nvhost be something like an allocator used by host1x
> clients? With the added ability to do relocs/binding of buffers into
> client address spaces, refcounting buffers and import/export dma-bufs?
> In this case nvhost objects would just be used to back DRM GEM objects.
> If using GEM objects in the DRM driver introduces any cross dependencies
> with nvhost, you should take a step back and ask yourself if the current
> design is the right way to go.

tegradrm has the GEM allocator, and tegradrm contains the 2D kernel
interface. tegradrm contains a dma-buf exporter for the tegradrm GEM
objects.

nvhost accepts jobs from tegradrm's 2D driver. nvhost increments
refcounts and maps the command stream and target memories to devices,
maps the command streams to kernel memory, replaces the placeholders in
the command streams with device virtual addresses, and unmaps the buffers
from kernel memory. nvhost uses dma buf APIs for all of the
memory operations, and relies on dmabuf for refcounting. After all this
the command streams are pushed to host1x push buffer as GATHER (kind of
a "gosub") opcodes, which reference the command streams.

Once the job is done, nvhost decrements refcounts and updates pushbuffer
pointers.
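
As a condensed illustration of the pin-and-patch step just described (the
fake_* types and the pin_target() helper are simplified stand-ins invented
here, not the actual nvhost code from this series):

#include <linux/dma-buf.h>
#include <linux/errno.h>
#include <linux/types.h>

struct fake_reloc {
	u32 cmdbuf_offset;		/* byte offset of the placeholder word */
	u32 target_offset;		/* offset within the target buffer */
	struct dma_buf *target;		/* buffer the placeholder should point at */
};

struct fake_job {
	u32 *cmdbuf_vaddr;		/* command stream mapped into the kernel */
	struct fake_reloc *relocs;
	unsigned int num_relocs;
};

static dma_addr_t pin_target(struct dma_buf *buf);	/* hypothetical helper */

static int job_pin_and_patch(struct fake_job *job)
{
	unsigned int i;

	for (i = 0; i < job->num_relocs; i++) {
		struct fake_reloc *r = &job->relocs[i];
		dma_addr_t iova;

		/* take a reference and map the target for the device */
		iova = pin_target(r->target);
		if (!iova)
			return -EINVAL;

		/* replace the placeholder with the device virtual address */
		job->cmdbuf_vaddr[r->cmdbuf_offset / 4] = iova + r->target_offset;
	}

	return 0;
}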

The design is done so that nvhost won't be DRM specific. I want to
enable creating V4L2 etc interfaces that talk to other host1x clients.
V4L2 (yeah, I know nothing of V4L2) could pass frames via nvhost to EPP
for pixel format conversion or 2D for rotation and write result to frame
buffer.

Do you think there's some fundamental problem with this design?

>> Taking a step back - 2D streams are actually very short, in the order of
>> <100 bytes. Just copying them to kernel space would actually be faster
>> than doing MMU operations.
>>
> Is this always the case because of the limited abilities of the gr2d
> engine, or is it just your current driver flushing the stream very
> often?

It's because of limited abilities of the hardware. It just doesn't take
that many operations to invoke 2D.

The libdrm user space we've created flushes probably a bit too often
now, but even in downstream the streams are not much longer. It takes
still at least a week to get the user space code out for you to look at.

> In which way is it a good design choice to let the CPU happily alter
> _any_ buffer the GPU is busy processing without getting the concurrency
> right?

Concurrency is handled with sync points. User space will know when a
command stream is processed and can be reused by comparing the current
sync point value, and the fence that 2D driver returned to user space.
User space can have a pool of buffers and can recycle when it knows it
can do so. But, this is not enforced by kernel.

The difference with your proposal and what I posted is the level of
control user space has over its command stream management. But as said,
2D streams are so short that my guess is that there's not too much
penalty copying it to kernel managed host1x push buffer directly instead
of inserting a GATHER reference.

> Please keep in mind that the interfaces you are now trying to introduce
> have to be supported for virtually unlimited time. You might not be able
> to scrub your mistakes later on without going through a lot of hassles.
>
> To avoid a lot of those mistakes it might be a good idea to look at how
> other drivers use the DRM infrastructure and only part from those proven
> schemes where really necessary/worthwhile.

Yep, as the owner of this driver downstream, I'm also leveraging my
experience with the graphics stack in our downstream software stack that
is accessible via f.ex. L4T.

This is exactly the discussion we should be having, and I'm learning all
the time, so let's continue tossing around ideas until we're both happy
with the result.

Terje
Lucas Stach
2012-11-28 18:46:48 UTC
Permalink
On Wednesday, 28 Nov 2012 at 18:23 +0200, Terje Bergström wrote:
> On 28.11.2012 17:13, Lucas Stach wrote:
> > To be honest I still don't grok all of this, but nonetheless I try my
> > best.
>
> Sorry. I promised in another thread a write-up explaining the design. I
> still owe you guys that.
>
That would be really nice to have. I'm also particularly interested in
how you plan to do synchronization of command streams to different
engines working together, if that's not too much to ask for now. Like
userspace uploading a texture in a buffer, 2D engine doing mipmap
generation, 3D engine using mipmapped texture.

> > Anyway, shouldn't nvhost be something like an allocator used by host1x
> > clients? With the added ability to do relocs/binding of buffers into
> > client address spaces, refcounting buffers and import/export dma-bufs?
> > In this case nvhost objects would just be used to back DRM GEM objects.
> > If using GEM objects in the DRM driver introduces any cross dependencies
> > with nvhost, you should take a step back and ask yourself if the current
> > design is the right way to go.
>
> tegradrm has the GEM allocator, and tegradrm contains the 2D kernel
> interface. tegradrm contains a dma-buf exporter for the tegradrm GEM
> objects.
>
> nvhost accepts jobs from tegradrm's 2D driver. nvhost increments
> refcounts and maps the command stream and target memories to devices,
> maps the command streams to kernel memory, replaces the placeholders in
> the command streams with device virtual addresses, and unmaps the buffers
> from kernel memory. nvhost uses dma buf APIs for all of the
> memory operations, and relies on dmabuf for refcounting. After all this
> the command streams are pushed to host1x push buffer as GATHER (kind of
> a "gosub") opcodes, which reference the command streams.
>
> Once the job is done, nvhost decrements refcounts and updates pushbuffer
> pointers.
>
> The design is done so that nvhost won't be DRM specific. I want to
> enable creating V4L2 etc interfaces that talk to other host1x clients.
> V4L2 (yeah, I know nothing of V4L2) could pass frames via nvhost to EPP
> for pixel format conversion or 2D for rotation and write result to frame
> buffer.
>
> Do you think there's some fundamental problem with this design?
>
Ah yes I see. So if we consider nvhost to be the central entity in
charge of controlling all host1x clients and tegradrm as the interface
that happens to bundle display, 2d and 3d engine functionality into its
interface we should probably aim for two things:
1. Move everything needed by all engines down into nvhost (I especially
see the allocator falling under this point, I'll explain why this would
be beneficial a bit later)

2. Move the exposed DRM interface more in line with other DRM drivers.
Please take a look at how for example the GEM_EXECBUF ioctl works on
other drivers to get a feeling of what I'm talking about. Everything
using the display, 2D and maybe later on the 3D engine should only deal
with GEM handles. I really don't like the idea of having a single
userspace application, which uses engines with similar and known
requirements (DDX) dealing with dma-buf handles or other similar high
overhead stuff to do the most basic tasks.

If we move down the allocator into nvhost we can use buffers allocated
from this to back GEM or V4L2 buffers transparently. The ioctl to
allocate a GEM buffer shouldn't do much more than wrapping the nvhost
buffer.
This may also solve your problem with having multiple mappings of the
same buffer into the very same address space, as nvhost is the single
instance that manages all host1x client address spaces. If the buffer is
originating from there you can easily check if it's already mapped. For
Tegra 3 to do things in an efficient way we likely have to move away
from dealing with the DMA API to dealing with the IOMMU API, this gets a
_lot_ easier if you have a single point where you manage memory
allocation and address space.

dma-buf should only be used where userspace is dealing with devices that
get controlled by different interfaces, like pointing a display plane to
some camera buffer. And even then with a single allocator for the host1x
clients a dma-buf import is nothing more than realizing that the fd is
one of the fds you exported yourself, so you can go and look it up and
then depending on the device you are on just pointing the engine at the
memory location or fixing up the iommu mapping.

> >> Taking a step back - 2D streams are actually very short, in the order of
> >> <100 bytes. Just copying them to kernel space would actually be faster
> >> than doing MMU operations.
> >>
> > Is this always the case because of the limited abilities of the gr2d
> > engine, or is it just your current driver flushing the stream very
> > often?
>
> It's because of limited abilities of the hardware. It just doesn't take
> that many operations to invoke 2D.
>
> The libdrm user space we've created flushes probably a bit too often
> now, but even in downstream the streams are not much longer. It takes
> still at least a week to get the user space code out for you to look at.
>
That's no problem, as I may not be able to do in-depth reviews of any
code until next week.

> > In which way is it a good design choice to let the CPU happily alter
> > _any_ buffer the GPU is busy processing without getting the concurrency
> > right?
>
> Concurrency is handled with sync points. User space will know when a
> command stream is processed and can be reused by comparing the current
> sync point value, and the fence that 2D driver returned to user space.
> User space can have a pool of buffers and can recycle when it knows it
> can do so. But, this is not enforced by kernel.
>
This is the point where we differ: You have to deal with syncpts in
kernel anyway, otherwise you don't know when it's safe to destroy a
buffer. And no, userspace should not have the ability to destroy a
buffer itself, userspace should always just be able to free its
reference to the buffer. Remember: never trust the userspace. And if you
are dealing with syncpts in kernel anyway, you can just go ahead and
enforce some sane concurrency rules. There may be some corner cases
related to userspace suballocating a kernel buffer, which might need
some more thought still, but that's not a valid excuse to not do any
concurrency validation in kernel.

> The difference with your proposal and what I posted is the level of
> control user space has over its command stream management. But as said,
> 2D streams are so short that my guess is that there's not too much
> penalty copying it to kernel managed host1x push buffer directly instead
> of inserting a GATHER reference.
>
This is an implementation detail. Whether you shoot down the old pushbuf
mapping and insert a new one pointing to free backing memory (which may
be the way to go for 3D) or do an immediate copy of the channel pushbuf
contents to the host1x pushbuf (which may be beneficial for very small
pushes) is all the same. Both methods implicitly guarantee that the
memory mapped by userspace always points to a location the CPU can write
to without interfering with the GPU.

> > Please keep in mind that the interfaces you are now trying to introduce
> > have to be supported for virtually unlimited time. You might not be able
> > to scrub your mistakes later on without going through a lot of hassles.
> >
> > To avoid a lot of those mistakes it might be a good idea to look at how
> > other drivers use the DRM infrastructure and only part from those proven
> > schemes where really necessary/worthwhile.
>
> Yep, as the owner of this driver downstream, I'm also leveraging my
> experience with the graphics stack in our downstream software stack that
> is accessible via f.ex. L4T.
>
> This is exactly the discussion we should be having, and I'm learning all
> the time, so let's continue tossing around ideas until we're both happy
> with the result.
>
I really enjoyed the discussion so far and hope we can get to the point
where we have a nice design/interface, working together.

Regards,
Lucas
Terje Bergström
2012-11-29 08:17:44 UTC
Permalink
On 28.11.2012 20:46, Lucas Stach wrote:
> On Wednesday, 28 Nov 2012 at 18:23 +0200, Terje Bergström wrote:
>> Sorry. I promised in another thread a write-up explaining the design. I
>> still owe you guys that.
> That would be really nice to have. I'm also particularly interested in
> how you plan to do synchronization of command streams to different
> engines working together, if that's not too much to ask for now. Like
> userspace uploading a texture in a buffer, 2D engine doing mipmap
> generation, 3D engine using mipmapped texture.

I can briefly explain (and then copy-paste to a coherent text once I get
to it) how inter-engine synchronization is done. It's not specifically
for 2D or 3D, but generic to any host1x client.

Sync point register is a counter that can only increase. It starts from
0 and is incremented by a host1x client or CPU. host1x can freeze a
channel until a sync point value is reached, and it can trigger an
interrupt upon reaching a threshold. On Tegra2 and Tegra3 we have 32
sync points.

host1x clients all implement a method for incrementing a sync point
based on a condition, and on all of them (well, not entirely true) the
register is number 0. The most used condition is op_done, telling the
client to increment sync point once the previous operations are done.

In kernel, we keep track of the active range of sync point values, i.e.
ones we expect to be reached with the active jobs. Active range's
minimum is the current value read from hw and shadowed in memory. At job
submit time, kernel increments the maximum by the number of sync point
increments the stream announces it will perform. After performing the
increment, we have a number, which the sync point is supposed to reach at
the end of submit. That number is the fence and it is recorded in kernel and
returned to user space.

So, when user space flushes, it receives a fence. It can insert the
fence into another command stream as parameter to a host1x channel wait.
This makes that channel freeze until an operation in another channel is
finished. That's how different host1x clients can synchronize without
using CPU.

Kernel's sync point wait essentially puts the process into sleep until
host1x sends an interrupt and we determine the value that a process is
waiting for, has been reached.

On top of this, we guard against wrapping issues by nulling out any sync
point waits (CPU or inside stream) that are waiting for values outside
the active range, and we have a timeout for jobs so that we can kick out
misbehaving command streams.
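
In rough C, the bookkeeping described above boils down to something like
this; the names are invented for illustration and the wrap handling is
reduced to a signed comparison:

#include <linux/atomic.h>
#include <linux/types.h>

struct syncpt {
	atomic_t max;		/* highest value any queued job will reach */
	atomic_t shadow;	/* last value read back from hardware */
};

/* At submit: reserve the announced increments and return the fence. */
static u32 syncpt_incr_max(struct syncpt *sp, u32 incrs)
{
	return atomic_add_return(incrs, &sp->max);
}

/* A fence has expired once the shadowed hardware value has passed it. */
static bool syncpt_is_expired(struct syncpt *sp, u32 fence)
{
	return (s32)((u32)atomic_read(&sp->shadow) - fence) >= 0;
}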

> Ah yes I see. So if we consider nvhost to be the central entity in
> charge of controlling all host1x clients and tegradrm as the interface
> that happens to bundle display, 2d and 3d engine functionality into its
> interface we should probably aim for two things:
> 1. Move everything needed by all engines down into nvhost (I especially
> see the allocator falling under this point, I'll explain why this would
> be beneficial a bit later)

Ok. This is almost the current design, except for the allocator.

> 2. Move the exposed DRM interface more in line with other DRM drivers.
> Please take a look at how for example the GEM_EXECBUF ioctl works on
> other drivers to get a feeling of what I'm talking about. Everything
> using the display, 2D and maybe later on the 3D engine should only deal
> with GEM handles. I really don't like the idea of having a single
> userspace application, which uses engines with similar and known
> requirements (DDX) dealing with dma-buf handles or other similar high
> overhead stuff to do the most basic tasks.
> If we move down the allocator into nvhost we can use buffers allocated
> from this to back GEM or V4L2 buffers transparently. The ioctl to
> allocate a GEM buffer shouldn't do much more than wrapping the nvhost
> buffer.

Ok, this is actually what we do downstream. We use dma-buf handles only
for purposes where they're really needed (in fact, none yet), and use
our downstream allocator handles for the rest. I did this, because
benchmarks were showing that memory management overhead shot through
the roof if I tried doing everything via dma-buf.

We can move support for allocating GEM handles to nvhost, and GEM
handles can be treated just as another memory handle type in nvhost.
tegradrm would then call nvhost for allocation.


> This may also solve your problem with having multiple mappings of the
> same buffer into the very same address space, as nvhost is the single
> instance that manages all host1x client address spaces. If the buffer is
> originating from there you can easily check if it's already mapped. For
> Tegra 3 to do things in an efficient way we likely have to move away
> from dealing with the DMA API to dealing with the IOMMU API, this gets a
> _lot_ easier if you have a single point where you manage memory
> allocation and address space.

Yep, this would definitely simplify our IOMMU problem. But, I thought
the canonical way of dealing with device memory is DMA API, and you're
saying that we should just bypass it and call IOMMU directly?

>> Concurrency is handled with sync points. User space will know when a
>> command stream is processed and can be reused by comparing the current
>> sync point value, and the fence that 2D driver returned to user space.
>> User space can have a pool of buffers and can recycle when it knows it
>> can do so. But, this is not enforced by kernel.
>>
> This is the point where we differ: You have to deal with syncpts in
> kernel anyway, otherwise you don't know when it's safe to destroy a
> buffer. And no, userspace should not have the ability to destroy a
> buffer itself, userspace should always just be able to free its
> reference to the buffer. Remember: never trust the userspace. And if you
> are dealing with syncpts in kernel anyway, you can just go ahead and
> enforce some sane concurrency rules. There may be some corner cases
> related to userspace suballocating a kernel buffer, which might need
> some more thought still, but that's not a valid excuse to not do any
> concurrency validation in kernel.

nvhost is already dealing with sync points, and protecting memory from
being freed if it's used. We use refcounting to do that. When a job is
sent to hw, we take a reference to all memory (command stream & surfaces).
When the job is done (fence reached), nvhost unreferences them. User space
can free the memory it has allocated, but kernel would hold on to it
until it's safe to actually release the memory.

>> The difference with your proposal and what I posted is the level of
>> control user space has over its command stream management. But as said,
>> 2D streams are so short that my guess is that there's not too much
>> penalty copying it to kernel managed host1x push buffer directly instead
>> of inserting a GATHER reference.
>>
> This is an implementation detail. Whether you shoot down the old pushbuf
> mapping and insert a new one pointing to free backing memory (which may
> be the way to go for 3D) or do an immediate copy of the channel pushbuf
> contents to the host1x pushbuf (which may be beneficial for very small
> pushes) is all the same. Both methods implicitly guarantee that the
> memory mapped by userspace always points to a location the CPU can write
> to without interfering with the GPU.

Ok. Based on this, I propose the way to go for cases without IOMMU
support and all Tegra20 cases (as Tegra20's GART can't provide memory
protection) is to copy the stream to host1x push buffer. In Tegra30 with
IOMMU support we can just reference the buffer. This way we don't have
to do expensive MMU operations.

> I really enjoyed the discussion so far and hope we can get to the point
> where we have a nice design/interface, working together.

Thanks. I don't have a strict deadline for upstreaming, so I'm fine with
continuing the discussion until we're settled, and then making the
needed changes.

My goal is to end up with something that we can take advantage of
upstream with tegradrm and 2D, but also downstream with the rest of the
downstream stack. That way there's no technical barrier to moving our
downstream code over to the upstream base.

Terje
Lucas Stach
2012-11-29 09:09:13 UTC
Permalink
On Thursday, 29.11.2012, 10:17 +0200, Terje Bergström wrote:
> On 28.11.2012 20:46, Lucas Stach wrote:
> > On Wednesday, 28.11.2012, 18:23 +0200, Terje Bergström wrote:
> >> Sorry. I promised in another thread a write-up explaining the design. I
> >> still owe you guys that.
> > That would be really nice to have. I'm also particularly interested in
> > how you plan to do synchronization of command streams to different
> > engines working together, if that's not too much to ask for now. Like
> > userspace uploading a texture in a buffer, 2D engine doing mipmap
> > generation, 3D engine using mipmapped texture.
>
> I can briefly explain (and then copy-paste to a coherent text once I get
> to it) how inter-engine synchronization is done. It's not specifically
> for 2D or 3D, but generic to any host1x client.
[...]
Thanks for that.
[...]

> > 2. Move the exposed DRM interface more in line with other DRM drivers.
> > Please take a look at how for example the GEM_EXECBUF ioctl works on
> > other drivers to get a feeling of what I'm talking about. Everything
> > using the display, 2D and maybe later on the 3D engine should only deal
> > with GEM handles. I really don't like the idea of having a single
> > userspace application, which uses engines with similar and known
> > requirements (DDX) dealing with dma-buf handles or other similar high
> > overhead stuff to do the most basic tasks.
> > If we move down the allocator into nvhost we can use buffers allocated
> > from this to back GEM or V4L2 buffers transparently. The ioctl to
> > allocate a GEM buffer shouldn't do much more than wrapping the nvhost
> > buffer.
>
> Ok, this is actually what we do downstream. We use dma-buf handles only
> for purposes where they're really needed (in fact, none yet), and use
> our downstream allocator handles for the rest. I did this, because
> benchmarks were showing that memory management overhead shoot through
> the roof if I tried doing everything via dma-buf.
>
> We can move support for allocating GEM handles to nvhost, and GEM
> handles can be treated just as another memory handle type in nvhost.
> tegradrm would then call nvhost for allocation.
>
We should aim for a clean split here. GEM handles are something which is
really specific to how DRM works and as such should be constructed by
tegradrm. nvhost should really just manage allocations/virtual address
space and provide something that is able to back all the GEM handle
operations.

nvhost has really no reason at all to even know about GEM handles. If
you back a GEM object by a nvhost object you can just peel out the
nvhost handles from the GEM wrapper in the tegradrm submit ioctl handler
and queue the job to nvhost using its native handles.

This way you would also be able to construct different handles (like GEM
obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
not sure how useful this would be, but it seems like a reasonable design
to me being able to do so.
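
Just to illustrate the split I have in mind (all of the names below are
hypothetical, not from the posted patches): the GEM object is only a thin
wrapper and the submit ioctl unwraps it before talking to nvhost.

struct tegra_gem_object {
	struct drm_gem_object base;
	struct nvhost_bo *bo;		/* backing nvhost allocation */
};

static struct nvhost_bo *handle_to_nvhost_bo(struct drm_device *drm,
					     struct drm_file *file,
					     u32 handle)
{
	struct drm_gem_object *gem;

	gem = drm_gem_object_lookup(drm, file, handle);
	if (!gem)
		return NULL;

	/* the caller now owns a GEM reference from the lookup; it has to
	 * drop it once the job holds its own reference on the nvhost bo */
	return container_of(gem, struct tegra_gem_object, base)->bo;
}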

> > This may also solve your problem with having multiple mappings of the
> > same buffer into the very same address space, as nvhost is the single
> > instance that manages all host1x client address spaces. If the buffer is
> > originating from there you can easily check if it's already mapped. For
> > Tegra 3 to do things in an efficient way we likely have to move away
> > from dealing with the DMA API to dealing with the IOMMU API, this gets a
> > _lot_ easier_ if you have a single point where you manage memory
> > allocation and address space.
>
> Yep, this would definitely simplify our IOMMU problem. But, I thought
> the canonical way of dealing with device memory is DMA API, and you're
> saying that we should just bypass it and call IOMMU directly?
>
This is true for all standard devices. But we should not consider this
as something set in stone and then build some crufty design around
it. If we can manage to make our design a lot cleaner by managing DMA
memory and the corresponding IOMMU address spaces for the host1x devices
ourselves, I think this is the way to go. All other graphics drivers in
the Linux kernel have to deal with their GTT in some way, we just happen
to do so by using a shared system IOMMU and not something that is
exclusive to the graphics devices.

This is more work on the side of nvhost, but IMHO the benefits make it
look worthwhile.

What we should avoid is something that completely escapes the standard
ways of dealing with memory used in the Linux kernel, like using
carveout areas, but I think this is already consensus among us all.
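
For reference, a minimal sketch of what managing the address space with
the IOMMU API directly could look like, assuming all host1x clients share
one domain and that an allocator of our own picks the iova (error paths
trimmed, helper names invented):

#include <linux/iommu.h>
#include <linux/platform_device.h>

static int host1x_map_page(struct device *client, struct page *page,
			   unsigned long iova)
{
	struct iommu_domain *domain;
	int err;

	domain = iommu_domain_alloc(&platform_bus_type);
	if (!domain)
		return -ENOMEM;

	err = iommu_attach_device(domain, client);
	if (err) {
		iommu_domain_free(domain);
		return err;
	}

	/* place the buffer at an iova chosen by our own allocator */
	return iommu_map(domain, iova, page_to_phys(page), PAGE_SIZE,
			 IOMMU_READ | IOMMU_WRITE);
}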

[...]
> > This is an implementation detail. Whether you shoot down the old pushbuf
> > mapping and insert a new one pointing to free backing memory (which may
> > be the way to go for 3D) or do an immediate copy of the channel pushbuf
> > contents to the host1x pushbuf (which may be beneficial for very small
> > pushs) is all the same. Both methods implicitly guarantee that the
> > memory mapped by userspace always points to a location the CPU can write
> > to without interfering with the GPU.
>
> Ok. Based on this, I propose the way to go for cases without IOMMU
> support and all Tegra20 cases (as Tegra20's GART can't provide memory
> protection) is to copy the stream to host1x push buffer. In Tegra30 with
> IOMMU support we can just reference the buffer. This way we don't have
> to do expensive MMU operations.
>
Sounds like a plan.

Regards,
Lucas
Thierry Reding
2012-11-29 12:14:30 UTC
Permalink
On Thu, Nov 29, 2012 at 10:09:13AM +0100, Lucas Stach wrote:
> On Thursday, 29.11.2012, 10:17 +0200, Terje Bergström wrote:
> > On 28.11.2012 20:46, Lucas Stach wrote:
> > > On Wednesday, 28.11.2012, 18:23 +0200, Terje Bergström wrote:
> > >> Sorry. I promised in another thread a write-up explaining the design. I
> > >> still owe you guys that.
> > > That would be really nice to have. I'm also particularly interested in
> > > how you plan to do synchronization of command streams to different
> > > engines working together, if that's not too much to ask for now. Like
> > > userspace uploading a texture in a buffer, 2D engine doing mipmap
> > > generation, 3D engine using mipmapped texture.
> >
> > I can briefly explain (and then copy-paste to a coherent text once I get
> > to it) how inter-engine synchronization is done. It's not specifically
> > for 2D or 3D, but generic to any host1x client.
> [...]
> Thanks for that.
> [...]
>
> > > 2. Move the exposed DRM interface more in line with other DRM drivers.
> > > Please take a look at how for example the GEM_EXECBUF ioctl works on
> > > other drivers to get a feeling of what I'm talking about. Everything
> > > using the display, 2D and maybe later on the 3D engine should only deal
> > > with GEM handles. I really don't like the idea of having a single
> > > userspace application, which uses engines with similar and known
> > > requirements (DDX) dealing with dma-buf handles or other similar high
> > > overhead stuff to do the most basic tasks.
> > > If we move down the allocator into nvhost we can use buffers allocated
> > > from this to back GEM or V4L2 buffers transparently. The ioctl to
> > > allocate a GEM buffer shouldn't do much more than wrapping the nvhost
> > > buffer.
> >
> > Ok, this is actually what we do downstream. We use dma-buf handles only
> > for purposes where they're really needed (in fact, none yet), and use
> > our downstream allocator handles for the rest. I did this, because
> > benchmarks were showing that memory management overhead shoot through
> > the roof if I tried doing everything via dma-buf.
> >
> > We can move support for allocating GEM handles to nvhost, and GEM
> > handles can be treated just as another memory handle type in nvhost.
> > tegradrm would then call nvhost for allocation.
> >
> We should aim for a clean split here. GEM handles are something which is
> really specific to how DRM works and as such should be constructed by
> tegradrm. nvhost should really just manage allocations/virtual address
> space and provide something that is able to back all the GEM handle
> operations.
>
> nvhost has really no reason at all to even know about GEM handles. If
> you back a GEM object by a nvhost object you can just peel out the
> nvhost handles from the GEM wrapper in the tegradrm submit ioctl handler
> and queue the job to nvhost using it's native handles.

That certainly sounds sensible to me. We would obviously no longer be
able to reuse the CMA GEM helpers, but if it makes things easier to
handle in general that's definitely something we can live with.

If I understand this correctly it would also allow us to do the buffer
management within host1x and therefore allow the differences between
Tegra20 (CMA) and Tegra30 (IOMMU) allocations to be handled in one
central place. That would indeed make things a lot easier in the host1x
client drivers.

> This way you would also be able to construct different handles (like GEM
> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
> not sure how useful this would be, but it seems like a reasonable design
> to me being able to do so.

Wouldn't that be useful for sharing buffers between DRM and V4L2 using
dma-buf? I'm not very familiar with how exactly importing and exporting
work with dma-buf, so maybe I need to read up some more.

Thierry
Terje Bergström
2012-11-30 07:44:36 UTC
Permalink
On 29.11.2012 14:14, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 10:09:13AM +0100, Lucas Stach wrote:
>> This way you would also be able to construct different handles (like GEM
>> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
>> not sure how useful this would be, but it seems like a reasonable design
>> to me being able to do so.
>
> Wouldn't that be useful for sharing buffers between DRM and V4L2 using
> dma-buf? I'm not very familiar with how exactly importing and exporting
> work with dma-buf, so maybe I need to read up some more.

I would still preserve the dma-buf support, for exactly this purpose.

Terje
Lucas Stach
2012-11-30 07:53:05 UTC
Permalink
On Friday, 30.11.2012, 09:44 +0200, Terje Bergström wrote:
> On 29.11.2012 14:14, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 10:09:13AM +0100, Lucas Stach wrote:
> >> This way you would also be able to construct different handles (like GEM
> >> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
> >> not sure how useful this would be, but it seems like a reasonable design
> >> to me being able to do so.
> >
> > Wouldn't that be useful for sharing buffers between DRM and V4L2 using
> > dma-buf? I'm not very familiar with how exactly importing and exporting
> > work with dma-buf, so maybe I need to read up some more.
>
> I would still preserve the dma-buf support, for exactly this purpose.
>
dma-buf is useful and should be preserved, as some userspace like
gstreamer might rely on us being able to import/export dma-buf handles
at some time. At the very latest we'll need it if someone wants to run a
UDL device to scan out a buffer rendered to by the internal GPU.

What I'm saying is just that with a common allocator we could cut down a
lot on the usage of dma-buf where it's not really necessary. Also you might
be able to do some optimisations based on the fact that a dma-buf handle
exported for some V4L2 buffer, which gets imported into DRM to construct
a GEM object, is the very same nvhost object in the end.

Regards,
Lucas
Terje Bergström
2012-11-29 13:36:30 UTC
Permalink
On 29.11.2012 11:09, Lucas Stach wrote:
> We should aim for a clean split here. GEM handles are something which is
> really specific to how DRM works and as such should be constructed by
> tegradrm. nvhost should really just manage allocations/virtual address
> space and provide something that is able to back all the GEM handle
> operations.
>
> nvhost has really no reason at all to even know about GEM handles. If
> you back a GEM object by a nvhost object you can just peel out the
> nvhost handles from the GEM wrapper in the tegradrm submit ioctl handler
> and queue the job to nvhost using it's native handles.
>
> This way you would also be able to construct different handles (like GEM
> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
> not sure how useful this would be, but it seems like a reasonable design
> to me being able to do so.

Ok, I must say that I was totally surprised by this and almost fell off
my seat on the bus while commuting home and reading this mail. On the
technical side, what you wrote makes perfect sense and we'll go through
this idea very carefully, so don't get me wrong.

What surprised me was that we had always assumed that nvmap, the
allocator we use in the downstream kernel, would never be something that
would be accepted upstream, so we haven't done any work on cleaning it
up and refactoring it for upstreaming, or on cutting the ties between
nvhost and nvmap. We assumed that we needed to provide something that fits
into tegradrm and interacts with dma_buf and GEM, so we've written
something small that fulfills this need.

Now what you're suggesting is akin to getting a subset of nvmap into the
picture. In the downstream kernel it already takes care of all the memory
management problems we've discussed wrt IOMMU (duplicate management,
different memory architectures, etc.). But it has a lot more than what
we need for now, so we'd need to decide whether to import parts of
nvmap as the nvhost allocator, or to use the allocator in the patch set I
sent earlier as a basis.

>> Yep, this would definitely simplify our IOMMU problem. But, I thought
>> the canonical way of dealing with device memory is DMA API, and you're
>> saying that we should just bypass it and call IOMMU directly?
>>
> This is true for all standard devices. But we should not consider this
> as something set in stone and then build some crufty design around
> it. If we can manage to make our design a lot cleaner by managing DMA
> memory and the corresponding IOMMU address spaces for the host1x devices
> ourselves, I think this is the way to go. All other graphics drivers in
> the Linux kernel have to deal with their GTT in some way, we just happen
> to do so by using a shared system IOMMU and not something that is
> exclusive to the graphics devices.
>
> This is more work on the side of nvhost, but IMHO the benefits make it
> look worthwhile.
> What we should avoid is something that completely escapes the standard
> ways of dealing with memory used in the Linux kernel, like using
> carveout areas, but I think this is already consensus among us all.

Makes perfect sense. I'll need to hash out a proposal on how to go about
this.

Terje
Stephen Warren
2012-11-28 16:24:21 UTC
Permalink
On 11/28/2012 07:45 AM, Terje Bergström wrote:
> On 28.11.2012 16:06, Lucas Stach wrote:
>> Why do we even need/use dma-buf for this use case? This is all one DRM
>> device, even if we separate host1x and gr2d as implementation modules.
>
> I didn't want to implement a dependency on drm gem objects in nvhost, but
> we have thought about doing that. dma-buf brings quite a lot of
> overhead, so implementing support for gem buffers would make the
> sequence a bit leaner.
>
> nvhost already has infra to support multiple memory managers.
>
>> So the standard way of doing this is:
>> 1. create gem object for pushbuffer
>> 2. create fake mmap offset for gem obj
>> 3. map pushbuf using the fake offset on the drm device
>> 4. at submit time zap the mapping
>>
>> You need this logic anyway, as normally we don't rely on userspace to
>> sync gpu and cpu, but use the kernel to handle the concurrency issues.
>
> Taking a step back - 2D streams are actually very short, in the order of
> <100 bytes. Just copying them to kernel space would actually be faster
> than doing MMU operations.

I'm not sure it's a good idea to have one buffer submission mechanism
for the 2D class and another for the 3D/... class, nor to bet that 2D
streams will always be short.
Thomas Hellstrom
2012-11-28 20:53:53 UTC
Permalink
On 11/28/2012 02:33 PM, Lucas Stach wrote:
> On Wednesday, 28.11.2012, 15:17 +0200, Terje Bergström wrote:
>> On 28.11.2012 01:00, Dave Airlie wrote:
>>> We generally aim for the first, to stop the gpu from reading/writing
>>> any memory it hasn't been granted access to,
>>> the second is nice to have though, but really requires a GPU with VM
>>> to implement properly.
>> I wonder if we should aim at root only access on Tegra20, and force
>> IOMMU on Tegra30 and fix the remaining issues we have with IOMMU. The
>> firewall turns out to be more complicated than I wished.
>>
>> Biggest problem is that we aim at zero-copy for everything possible,
>> including command streams. Kernel gets a handle to a command stream, but
>> the command stream is allocated by the user space process. So the user
>> space can tamper with the stream once it's been written to the host1x 2D
>> channel.
>>
> So this is obviously wrong. Userspace has to allocate a pushbuffer from
> the kernel just as every other buffer, then map it into its own address
> space to push in commands. At submit time of the pushbuf the kernel has to
> make sure that userspace is not able to access the memory any more, i.e.
> the kernel shoots down the vma or pagetable of the vma.

To me this sounds very expensive. Zapping the page table requires a CPU
TLB flush on all cores that have touched the buffer, not to mention the
kernel calls required to set up the page table once the buffer is reused.

If this usage scheme is then combined with a command verifier or
"firewall" that reads from a *write-combined* pushbuffer, performance
will be bad. Really bad.

In such situations I think one should consider copy-from-user while
validating, and let user-space set up the command buffer in malloc'ed
memory.
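
To sketch what I mean (invented helper names, only get_user() is a real
kernel interface): the kernel pulls the stream out of user memory word by
word, validates it, and writes it straight into the host1x push buffer,
so there is no userspace mapping to zap at all.

static int copy_and_validate(struct host1x_pushbuf *pb,
			     const u32 __user *src, unsigned int words)
{
	u32 word;
	unsigned int i;

	for (i = 0; i < words; i++) {
		if (get_user(word, src + i))
			return -EFAULT;
		if (!cmd_is_allowed(word))	/* "firewall" check */
			return -EINVAL;
		host1x_pushbuf_push(pb, word);
	}

	return 0;
}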

/Thomas
Mark Zhang
2012-12-03 09:30:00 UTC
Permalink
Hi Dave:

I'm new to kernel development. Could you tell me, or point me to some
material to read on, why we need to align the size of IOCTL structures
to 64 bits? I can understand it if we're working with a 64-bit kernel, but
why do we need to do this in a 32-bit ARM kernel? Besides, why should the
pointers in an IOCTL structure be declared as u64?

Mark
On 11/27/2012 06:15 AM, Dave Airlie wrote:
>> static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>> {
>> - return 0;
>> + struct tegra_drm_fpriv *fpriv;
>> + int err = 0;
>> +
>> + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>> + if (!fpriv)
>> + return -ENOMEM;
>> +
>> + INIT_LIST_HEAD(&fpriv->contexts);
>> + filp->driver_priv = fpriv;
>> +
>
> who frees this?
>> +struct tegra_drm_syncpt_incr_args {
>> + __u32 id;
>> +};
>
> add 32-bits of padding here
>
>> +
>> +struct tegra_drm_syncpt_wait_args {
>> + __u32 id;
>> + __u32 thresh;
>> + __s32 timeout;
>> + __u32 value;
>> +};
>> +
>> +#define DRM_TEGRA_NO_TIMEOUT (-1)
>> +
>> +struct tegra_drm_open_channel_args {
>> + __u32 class;
>> + void *context;
>
> no pointers use u64, align them to 64-bits, so 32-bits of padding,
>
>> +};
>> +
>> +struct tegra_drm_get_channel_param_args {
>> + void *context;
>> + __u32 value;
>
> Same padding + uint64_t for void *
>
>> +};
>> +
>> +struct tegra_drm_syncpt_incr {
>> + __u32 syncpt_id;
>> + __u32 syncpt_incrs;
>> +};
>> +
>> +struct tegra_drm_cmdbuf {
>> + __u32 mem;
>> + __u32 offset;
>> + __u32 words;
>> +};
>
> add padding
>> +
>> +struct tegra_drm_reloc {
>> + __u32 cmdbuf_mem;
>> + __u32 cmdbuf_offset;
>> + __u32 target;
>> + __u32 target_offset;
>> + __u32 shift;
>> +};
>
> add padding
>
>> +
>> +struct tegra_drm_waitchk {
>> + __u32 mem;
>> + __u32 offset;
>> + __u32 syncpt_id;
>> + __u32 thresh;
>> +};
>> +
>> +struct tegra_drm_submit_args {
>> + void *context;
>> + __u32 num_syncpt_incrs;
>> + __u32 num_cmdbufs;
>> + __u32 num_relocs;
>> + __u32 submit_version;
>> + __u32 num_waitchks;
>> + __u32 waitchk_mask;
>> + __u32 timeout;
>> + struct tegra_drm_syncpt_incrs *syncpt_incrs;
>> + struct tegra_drm_cmdbuf *cmdbufs;
>> + struct tegra_drm_reloc *relocs;
>> + struct tegra_drm_waitchk *waitchks;
>> +
>> + __u32 pad[5]; /* future expansion */
>> + __u32 fence; /* Return value */
>> +};
>
> lose all the pointers for 64-bit aligned uint64_t.
>
> Probably should align all of these on __u64 and __u32 usage if possible.
>
> i'll look at the rest of the patches, but I need to know what commands
> can be submitted via this interface and what are the security
> implications of it.
>
> Dave.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
> the body of a message to majordomo-***@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Daniel Vetter
2012-12-03 09:40:10 UTC
Permalink
On Mon, Dec 3, 2012 at 10:30 AM, Mark Zhang <***@gmail.com> wrote:
> I'm new in kernel development. Could you tell me or give me some
> materials to read that why we need to align the size of IOCTL structures
> to 64bit? I can understand if we're working in a 64bit kernel but why we
> need to do this if we're in a 32bit arm kernel? Besides, why the
> pointers in IOCTL structure should be declared as u64?

Because in a few years/months you'll have arm64, but still the same
driver with the same ioctls ... and if the ioctls are not _exactly_
the same you get to write compat ioctl code which copies the old 32bit
struct into the 64bit struct the kernel understands. Hence your ioctl
must be laid out exactly the same for both 32bit and 64bit, which
happens if you naturally align/pad everything to 64bits and only use
fixed-sized integers and no pointers.
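
As an illustration only (this is not the proposed Tegra ABI), such a
struct could look like:

struct example_submit_args {
	__u64 cmdbufs;		/* userspace pointer carried as a u64 */
	__u32 num_cmdbufs;
	__u32 pad;		/* explicit padding keeps the layout and
				   sizeof() identical on 32-bit and 64-bit */
};

The kernel side then casts the value back with something like
(void __user *)(unsigned long)args->cmdbufs, so no compat wrapper is
needed.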
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Mark Zhang
2012-12-04 01:49:41 UTC
Permalink
On 12/03/2012 05:40 PM, Daniel Vetter wrote:
> On Mon, Dec 3, 2012 at 10:30 AM, Mark Zhang <nvmarkzhang-***@public.gmane.org> wrote:
>> I'm new in kernel development. Could you tell me or give me some
>> materials to read that why we need to align the size of IOCTL structures
>> to 64bit? I can understand if we're working in a 64bit kernel but why we
>> need to do this if we're in a 32bit arm kernel? Besides, why the
>> pointers in IOCTL structure should be declared as u64?
>
> Because in a few years/months you'll have arm64, but still the same
> driver with the same ioctls ... and if the ioctls are not _exactly_
> the same you get to write compat ioctl code which copies the old 32bit
> struct into the 64bit struct the kernel understands. Hence your ioctl
> must be laid out exactly the same for both 32bit and 64bit, which
> happens if you naturally align/pad everything to 64bits and only use
> fixed-sized integers and no pointers.

Ah, I see. Thanks. Yes, a u64 still works for holding a 32-bit pointer.

Mark
> -Daniel
>
Mark Zhang
2012-11-29 07:37:04 UTC
Permalink
On 11/26/2012 09:19 PM, Terje Bergström <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org> wrote:
> Add client driver for 2D device.
>
> Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>
> Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
>
> ---
>
[...]
> +
> +static int
> +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
> + struct drm_file *file_priv)
> +{
> + struct tegra_drm_open_channel_args *args = data;
> + struct tegra_drm_client *client;
> + struct tegra_drm_context *context;
> + struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
> + int err = 0;
> +
> + dev_dbg(drm->dev, "> %s(fpriv=%p, class=%x)\n", __func__,
> + fpriv, args->class);
> +
> + context = kzalloc(sizeof(*context), GFP_KERNEL);
> + if (!context) {
> + err = -ENOMEM;
> + goto out;

Change to "return -ENOMEM". Otherwise the NULL "context" will be kfree.

> + }
> +
> + list_for_each_entry(client, &tegra_drm_subdrv_list, list) {
> + if (client->class == args->class) {
> + dev_dbg(drm->dev, "opening client %x\n", args->class);
> + context->client = client;
> + err = client->ops->open_channel(client, context);
> + if (err)
> + goto out;
> +
> + dev_dbg(drm->dev, "context %p\n", context);
> + list_add(&context->list, &fpriv->contexts);
> + args->context = context;
> + goto out;
> + }
> + }
> + err = -ENODEV;
> +
> +out:
> + if (err)
> + kfree(context);
> +
> + dev_dbg(drm->dev, "< %s() =3D %d\n", __func__, err);
> + return err;
> +}
> +
> +static int
> +tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
> + struct drm_file *file_priv)
> +{
> + struct tegra_drm_open_channel_args *args = data;
> + struct tegra_drm_context *context;
> + struct tegra_drm_fpriv *fpriv = tegra_drm_fpriv(file_priv);
> + int err = 0;
> +
> + dev_dbg(drm->dev, "> %s(fpriv=%p)\n", __func__, fpriv);
> + list_for_each_entry(context, &fpriv->contexts, list) {

Consider "list_for_each_entry_safe" cause you remove list members durin=
g
the loop.

> + if (context == args->context) {
> + context->client->ops->close_channel(context);
> + list_del(&context->list);
> + kfree(context);
> + goto out;
> + }
> + }
> + err = -EINVAL;
> +
> +out:
> + dev_dbg(drm->dev, "< %s() =3D %d\n", __func__, err);
> + return err;
> +}
> +
[...]
> +
> +static int gr2d_submit(struct tegra_drm_context *context,
> + struct tegra_drm_submit_args *args)

I'm still in the middle of code reading of job submit, so I'll get you
back if I find something.

[...]
> +
> +static struct of_device_id gr2d_match[] __devinitdata = {
> + { .compatible = "nvidia,tegra20-gr2d", },
> + { .compatible = "nvidia,tegra30-gr2d", },

Just as swarren mentioned, you'd better place "nvidia,tegra30-gr2d" in
front of "nvidia,tegra20-gr2d"...

[...]
> +
> +#define DRM_TEGRA_GEM_CREATE 0x00
> +
> +#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + \
> + DRM_TEGRA_GEM_CREATE, struct tegra_gem_create)
> +

Just a very minor suggestion: could you put them at the end of this
file? I mean, keep them together with the other IOCTLs like SYNCPT_READ,
SYNCPT_WAIT...

[...]
> +
> +#define DRM_TEGRA_DRM_SYNCPT_READ 0x01
> +#define DRM_TEGRA_DRM_SYNCPT_INCR 0x02
> +#define DRM_TEGRA_DRM_SYNCPT_WAIT 0x03
> +#define DRM_TEGRA_DRM_OPEN_CHANNEL 0x04
> +#define DRM_TEGRA_DRM_CLOSE_CHANNEL 0x05
> +#define DRM_TEGRA_DRM_GET_SYNCPOINTS 0x06
> +#define DRM_TEGRA_DRM_GET_MODMUTEXES 0x07
> +#define DRM_TEGRA_DRM_SUBMIT 0x08
> +
> +#define DRM_IOCTL_TEGRA_DRM_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_READ, struct tegra_drm_syncpt_read_args)
> +#define DRM_IOCTL_TEGRA_DRM_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_INCR, struct tegra_drm_syncpt_incr_args)
> +#define DRM_IOCTL_TEGRA_DRM_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_WAIT, struct tegra_drm_syncpt_wait_args)
> +#define DRM_IOCTL_TEGRA_DRM_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_OPEN_CHANNEL, struct tegra_drm_open_channel_args)
> +#define DRM_IOCTL_TEGRA_DRM_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_CLOSE_CHANNEL, struct tegra_drm_open_channel_args)
> +#define DRM_IOCTL_TEGRA_DRM_GET_SYNCPOINTS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_SYNCPOINTS, struct tegra_drm_get_channel_param_args)
> +#define DRM_IOCTL_TEGRA_DRM_GET_MODMUTEXES DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_MODMUTEXES, struct tegra_drm_get_channel_param_args)
> +#define DRM_IOCTL_TEGRA_DRM_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SUBMIT, struct tegra_drm_submit_args)
> +
> +#endif
>
Terje Bergstrom
2012-11-26 13:19:07 UTC
Permalink
Add nvhost, the driver for host1x. This patch adds support for reading and
incrementing sync points and dynamic power management.

Signed-off-by: Terje Bergstrom <***@nvidia.com>
---
drivers/video/Kconfig | 2 +
drivers/video/Makefile | 2 +
drivers/video/tegra/host/Kconfig | 5 +
drivers/video/tegra/host/Makefile | 10 +
drivers/video/tegra/host/chip_support.c | 48 ++
drivers/video/tegra/host/chip_support.h | 52 +++
drivers/video/tegra/host/dev.c | 96 ++++
drivers/video/tegra/host/host1x/Makefile | 7 +
drivers/video/tegra/host/host1x/host1x.c | 204 +++++++++
drivers/video/tegra/host/host1x/host1x.h | 78 ++++
drivers/video/tegra/host/host1x/host1x01.c | 37 ++
drivers/video/tegra/host/host1x/host1x01.h | 29 ++
.../video/tegra/host/host1x/host1x01_hardware.h | 36 ++
drivers/video/tegra/host/host1x/host1x_syncpt.c | 156 +++++++
drivers/video/tegra/host/host1x/hw_host1x01_sync.h | 398 ++++++++++++++++
drivers/video/tegra/host/nvhost_acm.c | 481 ++++++++++++++++++++
drivers/video/tegra/host/nvhost_acm.h | 45 ++
drivers/video/tegra/host/nvhost_syncpt.c | 333 ++++++++++++++
drivers/video/tegra/host/nvhost_syncpt.h | 136 ++++++
include/linux/nvhost.h | 143 ++++++
20 files changed, 2298 insertions(+)
create mode 100644 drivers/video/tegra/host/Kconfig
create mode 100644 drivers/video/tegra/host/Makefile
create mode 100644 drivers/video/tegra/host/chip_support.c
create mode 100644 drivers/video/tegra/host/chip_support.h
create mode 100644 drivers/video/tegra/host/dev.c
create mode 100644 drivers/video/tegra/host/host1x/Makefile
create mode 100644 drivers/video/tegra/host/host1x/host1x.c
create mode 100644 drivers/video/tegra/host/host1x/host1x.h
create mode 100644 drivers/video/tegra/host/host1x/host1x01.c
create mode 100644 drivers/video/tegra/host/host1x/host1x01.h
create mode 100644 drivers/video/tegra/host/host1x/host1x01_hardware.h
create mode 100644 drivers/video/tegra/host/host1x/host1x_syncpt.c
create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_sync.h
create mode 100644 drivers/video/tegra/host/nvhost_acm.c
create mode 100644 drivers/video/tegra/host/nvhost_acm.h
create mode 100644 drivers/video/tegra/host/nvhost_syncpt.c
create mode 100644 drivers/video/tegra/host/nvhost_syncpt.h
create mode 100644 include/linux/nvhost.h

diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index fb9a14e..94c861b 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -2463,4 +2463,6 @@ config FB_SH_MOBILE_MERAM
Up to 4 memory channels can be configured, allowing 4 RGB or
2 YCbCr framebuffers to be configured.

+source "drivers/video/tegra/host/Kconfig"
+
endmenu
diff --git a/drivers/video/Makefile b/drivers/video/Makefile
index b936b00..61a4287 100644
--- a/drivers/video/Makefile
+++ b/drivers/video/Makefile
@@ -17,6 +17,8 @@ obj-y += backlight/

obj-$(CONFIG_EXYNOS_VIDEO) += exynos/

+obj-$(CONFIG_TEGRA_HOST1X) += tegra/host/
+
obj-$(CONFIG_FB_CFB_FILLRECT) += cfbfillrect.o
obj-$(CONFIG_FB_CFB_COPYAREA) += cfbcopyarea.o
obj-$(CONFIG_FB_CFB_IMAGEBLIT) += cfbimgblt.o
diff --git a/drivers/video/tegra/host/Kconfig b/drivers/video/tegra/host/Kconfig
new file mode 100644
index 0000000..ebe9bbc
--- /dev/null
+++ b/drivers/video/tegra/host/Kconfig
@@ -0,0 +1,5 @@
+config TEGRA_HOST1X
+ tristate "Tegra host1x driver"
+ help
+ Driver for the Tegra host1x hardware.
+
diff --git a/drivers/video/tegra/host/Makefile b/drivers/video/tegra/host/Makefile
new file mode 100644
index 0000000..3edab4a
--- /dev/null
+++ b/drivers/video/tegra/host/Makefile
@@ -0,0 +1,10 @@
+ccflags-y = -Idrivers/video/tegra/host
+
+nvhost-objs = \
+ nvhost_acm.o \
+ nvhost_syncpt.o \
+ dev.o \
+ chip_support.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += host1x/
+obj-$(CONFIG_TEGRA_HOST1X) += nvhost.o
diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
new file mode 100644
index 0000000..5a44147
--- /dev/null
+++ b/drivers/video/tegra/host/chip_support.c
@@ -0,0 +1,48 @@
+/*
+ * drivers/video/tegra/host/chip_support.c
+ *
+ * Tegra host1x chip support module
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+
+#include "chip_support.h"
+#include "host1x/host1x01.h"
+
+struct nvhost_chip_support *nvhost_chip_ops;
+
+struct nvhost_chip_support *nvhost_get_chip_ops(void)
+{
+ return nvhost_chip_ops;
+}
+
+int nvhost_init_chip_support(struct nvhost_master *host)
+{
+ if (nvhost_chip_ops == NULL) {
+ nvhost_chip_ops = kzalloc(sizeof(*nvhost_chip_ops), GFP_KERNEL);
+ if (nvhost_chip_ops == NULL) {
+ pr_err("%s: Cannot allocate nvhost_chip_support\n",
+ __func__);
+ return -ENOMEM;
+ }
+ }
+
+ nvhost_init_host1x01_support(host, nvhost_chip_ops);
+ return 0;
+}
diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
new file mode 100644
index 0000000..acfa2f1
--- /dev/null
+++ b/drivers/video/tegra/host/chip_support.h
@@ -0,0 +1,52 @@
+/*
+ * drivers/video/tegra/host/chip_support.h
+ *
+ * Tegra host1x chip Support
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _NVHOST_CHIP_SUPPORT_H_
+#define _NVHOST_CHIP_SUPPORT_H_
+
+#include <linux/types.h>
+
+struct output;
+
+struct nvhost_master;
+struct nvhost_syncpt;
+struct platform_device;
+
+struct nvhost_syncpt_ops {
+ void (*reset)(struct nvhost_syncpt *, u32 id);
+ void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
+ void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
+ u32 (*update_min)(struct nvhost_syncpt *, u32 id);
+ void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
+ void (*debug)(struct nvhost_syncpt *);
+ const char * (*name)(struct nvhost_syncpt *, u32 id);
+};
+
+struct nvhost_chip_support {
+ const char *soc_name;
+ struct nvhost_syncpt_ops syncpt;
+};
+
+struct nvhost_chip_support *nvhost_get_chip_ops(void);
+
+#define syncpt_op() (nvhost_get_chip_ops()->syncpt)
+
+int nvhost_init_chip_support(struct nvhost_master *host);
+
+#endif /* _NVHOST_CHIP_SUPPORT_H_ */
diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
new file mode 100644
index 0000000..98c9c9f
--- /dev/null
+++ b/drivers/video/tegra/host/dev.c
@@ -0,0 +1,96 @@
+/*
+ * drivers/video/tegra/host/dev.c
+ *
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include "host1x/host1x.h"
+#include "nvhost_acm.h"
+
+u32 host1x_syncpt_incr_max(u32 id, u32 incrs)
+{
+ struct nvhost_syncpt *sp = &nvhost->syncpt;
+ return nvhost_syncpt_incr_max(sp, id, incrs);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr_max);
+
+void host1x_syncpt_incr(u32 id)
+{
+ struct nvhost_syncpt *sp = &nvhost->syncpt;
+ nvhost_syncpt_incr(sp, id);
+}
+EXPORT_SYMBOL(host1x_syncpt_incr);
+
+u32 host1x_syncpt_read(u32 id)
+{
+ struct nvhost_syncpt *sp = &nvhost->syncpt;
+ return nvhost_syncpt_read(sp, id);
+}
+EXPORT_SYMBOL(host1x_syncpt_read);
+
+bool host1x_powered(struct platform_device *dev)
+{
+ bool ret = 0;
+
+ /* get the parent */
+ if (dev->dev.parent) {
+ struct platform_device *pdev;
+ pdev = to_platform_device(dev->dev.parent);
+
+ ret = nvhost_module_powered(pdev);
+ } else {
+ dev_warn(&dev->dev, "Cannot return power state, no parent\n");
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(host1x_powered);
+
+void host1x_busy(struct platform_device *dev)
+{
+ /* get the parent */
+ if (dev->dev.parent) {
+ struct platform_device *pdev;
+ pdev = to_platform_device(dev->dev.parent);
+
+ nvhost_module_busy(pdev);
+ } else {
+ dev_warn(&dev->dev, "Cannot turn on, no parent\n");
+ }
+}
+EXPORT_SYMBOL(host1x_busy);
+
+void host1x_idle(struct platform_device *dev)
+{
+ /* get the parent */
+ if (dev->dev.parent) {
+ struct platform_device *pdev;
+ pdev = to_platform_device(dev->dev.parent);
+
+ nvhost_module_idle(pdev);
+ } else {
+ dev_warn(&dev->dev, "Cannot idle, no parent\n");
+ }
+}
+EXPORT_SYMBOL(host1x_idle);
+
+MODULE_AUTHOR("Terje Bergstrom <***@nvidia.com>");
+MODULE_DESCRIPTION("Host1x driver for Tegra products");
+MODULE_VERSION("1.0");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform-nvhost");
diff --git a/drivers/video/tegra/host/host1x/Makefile b/drivers/video/tegra/host/host1x/Makefile
new file mode 100644
index 0000000..330d507
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/Makefile
@@ -0,0 +1,7 @@
+ccflags-y = -Idrivers/video/tegra/host
+
+nvhost-host1x-objs = \
+ host1x.o \
+ host1x01.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += nvhost-host1x.o
diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
new file mode 100644
index 0000000..77ff00b
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x.c
@@ -0,0 +1,204 @@
+/*
+ * drivers/video/tegra/host/host1x.c
+ *
+ * Tegra host1x Driver Entrypoint
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/cdev.h>
+#include <linux/uaccess.h>
+#include <linux/file.h>
+#include <linux/clk.h>
+#include <linux/hrtimer.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/nvhost.h>
+
+#include "host1x/host1x.h"
+#include "nvhost_acm.h"
+#include "chip_support.h"
+
+#define DRIVER_NAME "tegra-host1x"
+
+struct nvhost_master *nvhost;
+
+static void power_on_host(struct platform_device *dev)
+{
+ struct nvhost_master *host = nvhost_get_private_data(dev);
+
+ nvhost_syncpt_reset(&host->syncpt);
+}
+
+static int power_off_host(struct platform_device *dev)
+{
+ struct nvhost_master *host = nvhost_get_private_data(dev);
+
+ nvhost_syncpt_save(&host->syncpt);
+ return 0;
+}
+
+static void nvhost_free_resources(struct nvhost_master *host)
+{
+}
+
+static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
+{
+ int err;
+
+ err = nvhost_init_chip_support(host);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int __devinit nvhost_probe(struct platform_device *dev)
+{
+ struct nvhost_master *host;
+ struct resource *regs, *intr0, *intr1;
+ int i, err;
+ struct nvhost_device_data *pdata =
+ (struct nvhost_device_data *)dev->dev.platform_data;
+
+ regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
+ intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
+ intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
+
+ if (!regs || !intr0 || !intr1) {
+ dev_err(&dev->dev, "missing required platform resources\n");
+ return -ENXIO;
+ }
+
+ host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL);
+ if (!host)
+ return -ENOMEM;
+
+ nvhost = host;
+
+ host->dev = dev;
+
+ /* Copy host1x parameters. The private_data gets replaced
+ * by nvhost_master later */
+ memcpy(&host->info, pdata->private_data,
+ sizeof(struct host1x_device_info));
+
+ pdata->finalize_poweron = power_on_host;
+ pdata->prepare_poweroff = power_off_host;
+
+ pdata->pdev = dev;
+
+ /* set common host1x device data */
+ platform_set_drvdata(dev, pdata);
+
+ /* set private host1x device data */
+ nvhost_set_private_data(dev, host);
+
+ host->aperture = devm_request_and_ioremap(&dev->dev, regs);
+ if (!host->aperture) {
+ dev_err(&dev->dev, "failed to remap host registers\n");
+ err = -ENXIO;
+ goto fail;
+ }
+
+ err = nvhost_alloc_resources(host);
+ if (err) {
+ dev_err(&dev->dev, "failed to init chip support\n");
+ goto fail;
+ }
+
+ err = nvhost_syncpt_init(dev, &host->syncpt);
+ if (err)
+ goto fail;
+
+ err = nvhost_module_init(dev);
+ if (err)
+ goto fail;
+
+ for (i = 0; i < pdata->num_clks; i++)
+ clk_prepare_enable(pdata->clk[i]);
+ nvhost_syncpt_reset(&host->syncpt);
+ for (i = 0; i < pdata->num_clks; i++)
+ clk_disable_unprepare(pdata->clk[i]);
+
+ dev_info(&dev->dev, "initialized\n");
+
+ return 0;
+
+fail:
+ nvhost_free_resources(host);
+ return err;
+}
+
+static int __exit nvhost_remove(struct platform_device *dev)
+{
+ struct nvhost_master *host = nvhost_get_private_data(dev);
+ nvhost_syncpt_deinit(&host->syncpt);
+ nvhost_module_deinit(dev);
+ nvhost_free_resources(host);
+ return 0;
+}
+
+static int nvhost_suspend(struct platform_device *dev, pm_message_t state)
+{
+ struct nvhost_master *host = nvhost_get_private_data(dev);
+ int ret = 0;
+
+ ret = nvhost_module_suspend(host->dev);
+ dev_info(&dev->dev, "suspend status: %d\n", ret);
+
+ return ret;
+}
+
+static int nvhost_resume(struct platform_device *dev)
+{
+ dev_info(&dev->dev, "resuming\n");
+ return 0;
+}
+
+static struct of_device_id host1x_match[] __devinitdata = {
+ { .compatible = "nvidia,tegra20-host1x", },
+ { .compatible = "nvidia,tegra30-host1x", },
+ { },
+};
+
+static struct platform_driver platform_driver = {
+ .probe = nvhost_probe,
+ .remove = __exit_p(nvhost_remove),
+ .suspend = nvhost_suspend,
+ .resume = nvhost_resume,
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = DRIVER_NAME,
+ .of_match_table = of_match_ptr(host1x_match),
+ },
+};
+
+static int __init nvhost_mod_init(void)
+{
+ return platform_driver_register(&platform_driver);
+}
+
+static void __exit nvhost_mod_exit(void)
+{
+ platform_driver_unregister(&platform_driver);
+}
+
+module_init(nvhost_mod_init);
+module_exit(nvhost_mod_exit);
+
diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
new file mode 100644
index 0000000..76748ac
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x.h
@@ -0,0 +1,78 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x.h
+ *
+ * Tegra host1x Driver Entrypoint
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_HOST1X_H
+#define __NVHOST_HOST1X_H
+
+#include <linux/cdev.h>
+#include <linux/nvhost.h>
+
+#include "nvhost_syncpt.h"
+
+#define TRACE_MAX_LENGTH 128U
+#define IFACE_NAME "nvhost"
+
+struct nvhost_master {
+ void __iomem *aperture;
+ void __iomem *sync_aperture;
+ struct nvhost_syncpt syncpt;
+ struct platform_device *dev;
+ struct host1x_device_info info;
+};
+
+extern struct nvhost_master *nvhost;
+
+static inline void *nvhost_get_private_data(struct platform_device *_dev)
+{
+ struct nvhost_device_data *pdata =
+ (struct nvhost_device_data *)platform_get_drvdata(_dev);
+ WARN_ON(!pdata);
+ return (pdata && pdata->private_data) ? pdata->private_data : NULL;
+}
+
+static inline void nvhost_set_private_data(struct platform_device *_dev,
+ void *priv_data)
+{
+ struct nvhost_device_data *pdata =
+ (struct nvhost_device_data *)platform_get_drvdata(_dev);
+ WARN_ON(!pdata);
+ if (pdata)
+ pdata->private_data = priv_data;
+}
+
+static inline
+struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
+{
+ struct platform_device *pdev;
+
+ if (_dev->dev.parent) {
+ pdev = to_platform_device(_dev->dev.parent);
+ return nvhost_get_private_data(pdev);
+ } else
+ return nvhost_get_private_data(_dev);
+}
+
+static inline
+struct platform_device *nvhost_get_parent(struct platform_device *_dev)
+{
+ return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
+}
+
+#endif
diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
new file mode 100644
index 0000000..d53302d
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x01.c
@@ -0,0 +1,37 @@
+/*
+ * drivers/video/tegra/host/host1x01.c
+ *
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/nvhost.h>
+
+#include "host1x/host1x01.h"
+#include "host1x/host1x.h"
+#include "host1x/host1x01_hardware.h"
+#include "chip_support.h"
+
+#include "host1x/host1x_syncpt.c"
+
+int nvhost_init_host1x01_support(struct nvhost_master *host,
+ struct nvhost_chip_support *op)
+{
+ host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
+ op->syncpt = host1x_syncpt_ops;
+
+ return 0;
+}
diff --git a/drivers/video/tegra/host/host1x/host1x01.h b/drivers/video/tegra/host/host1x/host1x01.h
new file mode 100644
index 0000000..91624d66
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x01.h
@@ -0,0 +1,29 @@
+/*
+ * drivers/video/tegra/host/host1x01.h
+ *
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef NVHOST_HOST1X01_H
+#define NVHOST_HOST1X01_H
+
+struct nvhost_master;
+struct nvhost_chip_support;
+
+int nvhost_init_host1x01_support(struct nvhost_master *,
+ struct nvhost_chip_support *);
+
+#endif /* NVHOST_HOST1X01_H_ */
diff --git a/drivers/video/tegra/host/host1x/host1x01_hardware.h b/drivers/video/tegra/host/host1x/host1x01_hardware.h
new file mode 100644
index 0000000..0da7e06
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x01_hardware.h
@@ -0,0 +1,36 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x01_hardware.h
+ *
+ * Tegra host1x Register Offsets for Tegra20 and Tegra30
+ *
+ * Copyright (c) 2010-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_HOST1X01_HARDWARE_H
+#define __NVHOST_HOST1X01_HARDWARE_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include "hw_host1x01_sync.h"
+
+/* channel registers */
+#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
+#define NV_HOST1X_SYNC_MLOCK_NUM 16
+
+/* sync registers */
+#define HOST1X_CHANNEL_SYNC_REG_BASE 0x3000
+#define NV_HOST1X_NB_MLOCKS 16
+
+#endif
diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c
new file mode 100644
index 0000000..57cc1b1
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x_syncpt.c
@@ -0,0 +1,156 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x_syncpt.c
+ *
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/io.h>
+#include "nvhost_syncpt.h"
+#include "nvhost_acm.h"
+#include "host1x.h"
+#include "chip_support.h"
+
+/**
+ * Write the current syncpoint value back to hw.
+ */
+static void host1x_syncpt_reset(struct nvhost_syncpt *sp, u32 id)
+{
+ struct nvhost_master *dev = syncpt_to_dev(sp);
+ int min = nvhost_syncpt_read_min(sp, id);
+ writel(min, dev->sync_aperture + (host1x_sync_syncpt_0_r() + id * 4));
+}
+
+/**
+ * Write the current waitbase value back to hw.
+ */
+static void host1x_syncpt_reset_wait_base(struct nvhost_syncpt *sp, u32 id)
+{
+ struct nvhost_master *dev = syncpt_to_dev(sp);
+ writel(sp->base_val[id],
+ dev->sync_aperture + (host1x_sync_syncpt_base_0_r() + id * 4));
+}
+
+/**
+ * Read waitbase value from hw.
+ */
+static void host1x_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id)
+{
+ struct nvhost_master *dev = syncpt_to_dev(sp);
+ sp->base_val[id] = readl(dev->sync_aperture +
+ (host1x_sync_syncpt_base_0_r() + id * 4));
+}
+
+/**
+ * Updates the last value read from hardware.
+ * (was nvhost_syncpt_update_min)
+ */
+static u32 host1x_syncpt_update_min(struct nvhost_syncpt *sp, u32 id)
+{
+ struct nvhost_master *dev = syncpt_to_dev(sp);
+ void __iomem *sync_regs = dev->sync_aperture;
+ u32 old, live;
+
+ do {
+ old = nvhost_syncpt_read_min(sp, id);
+ live = readl(sync_regs + (host1x_sync_syncpt_0_r() + id * 4));
+ } while ((u32)atomic_cmpxchg(&sp->min_val[id], old, live) != old);
+
+ if (!nvhost_syncpt_check_max(sp, id, live))
+ dev_err(&syncpt_to_dev(sp)->dev->dev,
+ "%s failed: id=%u, min=%d, max=%d\n",
+ __func__,
+ id,
+ nvhost_syncpt_read_min(sp, id),
+ nvhost_syncpt_read_max(sp, id));
+
+ return live;
+}
+
+/**
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
+{
+ struct nvhost_master *dev = syncpt_to_dev(sp);
+ u32 reg_offset = id / 32;
+
+ if (!nvhost_module_powered(dev->dev)) {
+ dev_err(&syncpt_to_dev(sp)->dev->dev,
+ "Trying to access host1x when it's off");
+ return;
+ }
+
+ if (!nvhost_syncpt_client_managed(sp, id)
+ && nvhost_syncpt_min_eq_max(sp, id)) {
+ dev_err(&syncpt_to_dev(sp)->dev->dev,
+ "Trying to increment syncpoint id %d beyond max\n",
+ id);
+ return;
+ }
+ writel(BIT_MASK(id), dev->sync_aperture +
+ host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);
+ wmb();
+}
+
+static const char *host1x_syncpt_name(struct nvhost_syncpt *sp, u32 id)
+{
+ struct host1x_device_info *info = &syncpt_to_dev(sp)->info;
+ const char *name = NULL;
+
+ if (id < info->nb_pts)
+ name = info->syncpt_names[id];
+
+ return name ? name : "";
+}
+
+static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
+{
+ u32 i;
+ for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
+ u32 max = nvhost_syncpt_read_max(sp, i);
+ u32 min = nvhost_syncpt_update_min(sp, i);
+ if (!max && !min)
+ continue;
+ dev_info(&syncpt_to_dev(sp)->dev->dev,
+ "id %d (%s) min %d max %d\n",
+ i, syncpt_op().name(sp, i),
+ min, max);
+
+ }
+
+ for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
+ u32 base_val;
+ host1x_syncpt_read_wait_base(sp, i);
+ base_val = sp->base_val[i];
+ if (base_val)
+ dev_info(&syncpt_to_dev(sp)->dev->dev,
+ "waitbase id %d val %d\n",
+ i, base_val);
+
+ }
+}
+
+static const struct nvhost_syncpt_ops host1x_syncpt_ops = {
+ .reset = host1x_syncpt_reset,
+ .reset_wait_base = host1x_syncpt_reset_wait_base,
+ .read_wait_base = host1x_syncpt_read_wait_base,
+ .update_min = host1x_syncpt_update_min,
+ .cpu_incr = host1x_syncpt_cpu_incr,
+ .debug = host1x_syncpt_debug,
+ .name = host1x_syncpt_name,
+};
diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h
new file mode 100644
index 0000000..67f0cbf
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h
@@ -0,0 +1,398 @@
+/*
+ * drivers/video/tegra/host/host1x/hw_host1x01_sync.h
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+ * Function naming determines intended use:
+ *
+ * <x>_r(void) : Returns the offset for register <x>.
+ *
+ * <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+ *
+ * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+ *
+ * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+ * and masked to place it at field <y> of register <x>. This value
+ * can be |'d with others to produce a full register value for
+ * register <x>.
+ *
+ * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This
+ * value can be ~'d and then &'d to clear the value of field <y> for
+ * register <x>.
+ *
+ * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+ * to place it at field <y> of register <x>. This value can be |'d
+ * with others to produce a full register value for <x>.
+ *
+ * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+ * <x> value 'r' after being shifted to place its LSB at bit 0.
+ * This value is suitable for direct comparison with other unshifted
+ * values appropriate for use in field <y> of register <x>.
+ *
+ * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+ * field <y> of register <x>. This value is suitable for direct
+ * comparison with unshifted values appropriate for use in field <y>
+ * of register <x>.
+ */
+
+#ifndef __hw_host1x_sync_host1x_h__
+#define __hw_host1x_sync_host1x_h__
+/* This file is autogenerated. Do not edit. */
+
+static inline u32 host1x_sync_intmask_r(void)
+{
+ return 0x4;
+}
+static inline u32 host1x_sync_intc0mask_r(void)
+{
+ return 0x8;
+}
+static inline u32 host1x_sync_hintstatus_r(void)
+{
+ return 0x20;
+}
+static inline u32 host1x_sync_hintmask_r(void)
+{
+ return 0x24;
+}
+static inline u32 host1x_sync_hintstatus_ext_r(void)
+{
+ return 0x28;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_read_int_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_read_int_f(u32 v)
+{
+ return (v & 0x1) << 30;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_read_int_m(void)
+{
+ return 0x1 << 30;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_read_int_v(u32 r)
+{
+ return (r >> 30) & 0x1;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_write_int_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_write_int_f(u32 v)
+{
+ return (v & 0x1) << 31;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_write_int_m(void)
+{
+ return 0x1 << 31;
+}
+static inline u32 host1x_sync_hintstatus_ext_ip_write_int_v(u32 r)
+{
+ return (r >> 31) & 0x1;
+}
+static inline u32 host1x_sync_hintmask_ext_r(void)
+{
+ return 0x2c;
+}
+static inline u32 host1x_sync_syncpt_thresh_cpu0_int_status_r(void)
+{
+ return 0x40;
+}
+static inline u32 host1x_sync_syncpt_thresh_cpu1_int_status_r(void)
+{
+ return 0x48;
+}
+static inline u32 host1x_sync_syncpt_thresh_int_disable_r(void)
+{
+ return 0x60;
+}
+static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
+{
+ return 0x68;
+}
+static inline u32 host1x_sync_cf0_setup_r(void)
+{
+ return 0x80;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_base_s(void)
+{
+ return 9;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_base_f(u32 v)
+{
+ return (v & 0x1ff) << 0;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_base_m(void)
+{
+ return 0x1ff << 0;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_base_v(u32 r)
+{
+ return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_limit_s(void)
+{
+ return 9;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_limit_f(u32 v)
+{
+ return (v & 0x1ff) << 16;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_limit_m(void)
+{
+ return 0x1ff << 16;
+}
+static inline u32 host1x_sync_cf0_setup_cf0_limit_v(u32 r)
+{
+ return (r >> 16) & 0x1ff;
+}
+static inline u32 host1x_sync_cmdproc_stop_r(void)
+{
+ return 0xac;
+}
+static inline u32 host1x_sync_ch_teardown_r(void)
+{
+ return 0xb0;
+}
+static inline u32 host1x_sync_usec_clk_r(void)
+{
+ return 0x1a4;
+}
+static inline u32 host1x_sync_ctxsw_timeout_cfg_r(void)
+{
+ return 0x1a8;
+}
+static inline u32 host1x_sync_ip_busy_timeout_r(void)
+{
+ return 0x1bc;
+}
+static inline u32 host1x_sync_ip_read_timeout_addr_r(void)
+{
+ return 0x1c0;
+}
+static inline u32 host1x_sync_ip_write_timeout_addr_r(void)
+{
+ return 0x1c4;
+}
+static inline u32 host1x_sync_mlock_0_r(void)
+{
+ return 0x2c0;
+}
+static inline u32 host1x_sync_mlock_owner_0_r(void)
+{
+ return 0x340;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_s(void)
+{
+ return 4;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(u32 v)
+{
+ return (v & 0xf) << 8;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_m(void)
+{
+ return 0xf << 8;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_v(u32 r)
+{
+ return (r >> 8) & 0xf;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_f(u32 v)
+{
+ return (v & 0x1) << 1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_m(void)
+{
+ return 0x1 << 1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(u32 r)
+{
+ return (r >> 1) & 0x1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_f(u32 v)
+{
+ return (v & 0x1) << 0;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_m(void)
+{
+ return 0x1 << 0;
+}
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(u32 r)
+{
+ return (r >> 0) & 0x1;
+}
+static inline u32 host1x_sync_syncpt_0_r(void)
+{
+ return 0x400;
+}
+static inline u32 host1x_sync_syncpt_int_thresh_0_r(void)
+{
+ return 0x500;
+}
+static inline u32 host1x_sync_syncpt_base_0_r(void)
+{
+ return 0x600;
+}
+static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
+{
+ return 0x700;
+}
+static inline u32 host1x_sync_cbread0_r(void)
+{
+ return 0x720;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_r(void)
+{
+ return 0x74c;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_s(void)
+{
+ return 9;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
+{
+ return (v & 0x1ff) << 0;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_m(void)
+{
+ return 0x1ff << 0;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_v(u32 r)
+{
+ return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_s(void)
+{
+ return 3;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_f(u32 v)
+{
+ return (v & 0x7) << 16;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_m(void)
+{
+ return 0x7 << 16;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_v(u32 r)
+{
+ return (r >> 16) & 0x7;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_s(void)
+{
+ return 1;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
+{
+ return (v & 0x1) << 31;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_m(void)
+{
+ return 0x1 << 31;
+}
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_v(u32 r)
+{
+ return (r >> 31) & 0x1;
+}
+static inline u32 host1x_sync_cfpeek_read_r(void)
+{
+ return 0x750;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_r(void)
+{
+ return 0x754;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_s(void)
+{
+ return 9;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_f(u32 v)
+{
+ return (v & 0x1ff) << 0;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_m(void)
+{
+ return 0x1ff << 0;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(u32 r)
+{
+ return (r >> 0) & 0x1ff;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_s(void)
+{
+ return 9;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_f(u32 v)
+{
+ return (v & 0x1ff) << 16;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_m(void)
+{
+ return 0x1ff << 16;
+}
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(u32 r)
+{
+ return (r >> 16) & 0x1ff;
+}
+static inline u32 host1x_sync_cbstat_0_r(void)
+{
+ return 0x758;
+}
+static inline u32 host1x_sync_cbstat_0_cboffset0_s(void)
+{
+ return 16;
+}
+static inline u32 host1x_sync_cbstat_0_cboffset0_f(u32 v)
+{
+ return (v & 0xffff) << 0;
+}
+static inline u32 host1x_sync_cbstat_0_cboffset0_m(void)
+{
+ return 0xffff << 0;
+}
+static inline u32 host1x_sync_cbstat_0_cboffset0_v(u32 r)
+{
+ return (r >> 0) & 0xffff;
+}
+static inline u32 host1x_sync_cbstat_0_cbclass0_s(void)
+{
+ return 10;
+}
+static inline u32 host1x_sync_cbstat_0_cbclass0_f(u32 v)
+{
+ return (v & 0x3ff) << 16;
+}
+static inline u32 host1x_sync_cbstat_0_cbclass0_m(void)
+{
+ return 0x3ff << 16;
+}
+static inline u32 host1x_sync_cbstat_0_cbclass0_v(u32 r)
+{
+ return (r >> 16) & 0x3ff;
+}
+
+#endif /* __hw_host1x_sync_host1x_h__ */
diff --git a/drivers/video/tegra/host/nvhost_acm.c b/drivers/video/tegra/host/nvhost_acm.c
new file mode 100644
index 0000000..15cf395
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_acm.c
@@ -0,0 +1,481 @@
+/*
+ * drivers/video/tegra/host/nvhost_acm.c
+ *
+ * Tegra host1x Automatic Clock Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
+#include <linux/string.h>
+#include <linux/sched.h>
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/delay.h>
+#include <linux/platform_device.h>
+
+#include <mach/powergate.h>
+#include <mach/clk.h>
+
+#include "nvhost_acm.h"
+
+#define ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT (2 * HZ)
+#define POWERGATE_DELAY 10
+#define MAX_DEVID_LENGTH 16
+
+static void do_powergate_locked(int id)
+{
+ if (id != -1 && tegra_powergate_is_powered(id))
+ tegra_powergate_power_off(id);
+}
+
+static void do_unpowergate_locked(int id)
+{
+ if (id != -1)
+ tegra_powergate_power_on(id);
+}
+
+static void to_state_clockgated_locked(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ if (pdata->powerstate == NVHOST_POWER_STATE_RUNNING) {
+ int i, err;
+ if (pdata->prepare_clockoff) {
+ err = pdata->prepare_clockoff(dev);
+ if (err) {
+ dev_err(&dev->dev, "error clock gating");
+ return;
+ }
+ }
+
+ for (i = 0; i < pdata->num_clks; i++)
+ clk_disable_unprepare(pdata->clk[i]);
+ if (dev->dev.parent)
+ nvhost_module_idle(to_platform_device(dev->dev.parent));
+ } else if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED
+ && pdata->can_powergate) {
+ do_unpowergate_locked(pdata->powergate_ids[0]);
+ do_unpowergate_locked(pdata->powergate_ids[1]);
+ }
+ pdata->powerstate = NVHOST_POWER_STATE_CLOCKGATED;
+}
+
+static void to_state_running_locked(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ int prev_state = pdata->powerstate;
+
+ if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED)
+ to_state_clockgated_locked(dev);
+
+ if (pdata->powerstate == NVHOST_POWER_STATE_CLOCKGATED) {
+ int i;
+
+ if (dev->dev.parent)
+ nvhost_module_busy(to_platform_device(dev->dev.parent));
+
+ for (i = 0; i < pdata->num_clks; i++) {
+ int err = clk_prepare_enable(pdata->clk[i]);
+ if (err) {
+ dev_err(&dev->dev, "Cannot turn on clock %s",
+ pdata->clocks[i].name);
+ return;
+ }
+ }
+
+ if (pdata->finalize_clockon)
+ pdata->finalize_clockon(dev);
+
+ /* Invoke callback after power un-gating. This is used for
+ * restoring context. */
+ if (prev_state == NVHOST_POWER_STATE_POWERGATED
+ && pdata->finalize_poweron)
+ pdata->finalize_poweron(dev);
+ }
+ pdata->powerstate = NVHOST_POWER_STATE_RUNNING;
+}
+
+/* This gets called from powerstate_down_handler() and from module suspend.
+ * Module suspend is done for all modules, runtime power gating only
+ * for modules with can_powergate set.
+ */
+static int to_state_powergated_locked(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ int err = 0;
+
+ if (pdata->prepare_poweroff &&
+ pdata->powerstate != NVHOST_POWER_STATE_POWERGATED) {
+ /* Clock needs to be on in prepare_poweroff */
+ to_state_running_locked(dev);
+ err = pdata->prepare_poweroff(dev);
+ if (err)
+ return err;
+ }
+
+ if (pdata->powerstate == NVHOST_POWER_STATE_RUNNING)
+ to_state_clockgated_locked(dev);
+
+ if (pdata->can_powergate) {
+ do_powergate_locked(pdata->powergate_ids[0]);
+ do_powergate_locked(pdata->powergate_ids[1]);
+ }
+
+ pdata->powerstate = NVHOST_POWER_STATE_POWERGATED;
+ return 0;
+}
+
+static void schedule_powergating_locked(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ if (pdata->can_powergate)
+ schedule_delayed_work(&pdata->powerstate_down,
+ msecs_to_jiffies(pdata->powergate_delay));
+}
+
+static void schedule_clockgating_locked(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ schedule_delayed_work(&pdata->powerstate_down,
+ msecs_to_jiffies(pdata->clockgate_delay));
+}
+
+void nvhost_module_busy(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ if (pdata->busy)
+ pdata->busy(dev);
+
+ mutex_lock(&pdata->lock);
+ cancel_delayed_work(&pdata->powerstate_down);
+
+ pdata->refcount++;
+ if (pdata->refcount > 0 && !nvhost_module_powered(dev))
+ to_state_running_locked(dev);
+ mutex_unlock(&pdata->lock);
+}
+
+static void powerstate_down_handler(struct work_struct *work)
+{
+ struct platform_device *dev;
+ struct nvhost_device_data *pdata;
+
+ pdata = container_of(to_delayed_work(work),
+ struct nvhost_device_data,
+ powerstate_down);
+
+ dev = pdata->pdev;
+
+ mutex_lock(&pdata->lock);
+ if (pdata->refcount == 0) {
+ switch (pdata->powerstate) {
+ case NVHOST_POWER_STATE_RUNNING:
+ to_state_clockgated_locked(dev);
+ schedule_powergating_locked(dev);
+ break;
+ case NVHOST_POWER_STATE_CLOCKGATED:
+ if (to_state_powergated_locked(dev))
+ schedule_powergating_locked(dev);
+ break;
+ default:
+ break;
+ }
+ }
+ mutex_unlock(&pdata->lock);
+}
+
+void nvhost_module_idle_mult(struct platform_device *dev, int refs)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ bool kick = false;
+
+ mutex_lock(&pdata->lock);
+ pdata->refcount -= refs;
+ if (pdata->refcount == 0) {
+ if (nvhost_module_powered(dev))
+ schedule_clockgating_locked(dev);
+ kick = true;
+ }
+ mutex_unlock(&pdata->lock);
+
+ if (kick) {
+ wake_up(&pdata->idle_wq);
+
+ if (pdata->idle)
+ pdata->idle(dev);
+ }
+}
+
+static ssize_t refcount_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int ret;
+ struct nvhost_device_power_attr *power_attribute =
+ container_of(attr, struct nvhost_device_power_attr,
+ power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT]);
+ struct platform_device *dev = power_attribute->ndev;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ mutex_lock(&pdata->lock);
+ ret = sprintf(buf, "%d\n", pdata->refcount);
+ mutex_unlock(&pdata->lock);
+
+ return ret;
+}
+
+static ssize_t powergate_delay_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ int powergate_delay = 0, ret = 0;
+ struct nvhost_device_power_attr *power_attribute =
+ container_of(attr, struct nvhost_device_power_attr,
+ power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]);
+ struct platform_device *dev = power_attribute->ndev;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ if (!pdata->can_powergate) {
+ dev_info(&dev->dev, "does not support power-gating\n");
+ return count;
+ }
+
+ mutex_lock(&pdata->lock);
+ ret = sscanf(buf, "%d", &powergate_delay);
+ if (ret == 1 && powergate_delay >= 0)
+ pdata->powergate_delay = powergate_delay;
+ else
+ dev_err(&dev->dev, "Invalid powergate delay\n");
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+
+static ssize_t powergate_delay_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int ret;
+ struct nvhost_device_power_attr *power_attribute =
+ container_of(attr, struct nvhost_device_power_attr,
+ power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]);
+ struct platform_device *dev = power_attribute->ndev;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ mutex_lock(&pdata->lock);
+ ret = sprintf(buf, "%d\n", pdata->powergate_delay);
+ mutex_unlock(&pdata->lock);
+
+ return ret;
+}
+
+static ssize_t clockgate_delay_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ int clockgate_delay = 0, ret = 0;
+ struct nvhost_device_power_attr *power_attribute =
+ container_of(attr, struct nvhost_device_power_attr,
+ power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]);
+ struct platform_device *dev = power_attribute->ndev;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ mutex_lock(&pdata->lock);
+ ret = sscanf(buf, "%d", &clockgate_delay);
+ if (ret == 1 && clockgate_delay >= 0)
+ pdata->clockgate_delay = clockgate_delay;
+ else
+ dev_err(&dev->dev, "Invalid clockgate delay\n");
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+
+static ssize_t clockgate_delay_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int ret;
+ struct nvhost_device_power_attr *power_attribute =
+ container_of(attr, struct nvhost_device_power_attr,
+ power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]);
+ struct platform_device *dev = power_attribute->ndev;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ mutex_lock(&pdata->lock);
+ ret = sprintf(buf, "%d\n", pdata->clockgate_delay);
+ mutex_unlock(&pdata->lock);
+
+ return ret;
+}
+
+int nvhost_module_init(struct platform_device *dev)
+{
+ int i = 0, err = 0;
+ struct kobj_attribute *attr = NULL;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ /* initialize clocks to known state */
+ while (i < NVHOST_MODULE_MAX_CLOCKS && pdata->clocks[i].name) {
+ long rate = pdata->clocks[i].default_rate;
+ struct clk *c;
+
+ c = devm_clk_get(&dev->dev, pdata->clocks[i].name);
+ if (IS_ERR_OR_NULL(c)) {
+ dev_err(&dev->dev, "Cannot get clock %s\n",
+ pdata->clocks[i].name);
+ return -ENODEV;
+ }
+
+ rate = clk_round_rate(c, rate);
+ clk_prepare_enable(c);
+ clk_set_rate(c, rate);
+ clk_disable_unprepare(c);
+ pdata->clk[i] = c;
+ i++;
+ }
+ pdata->num_clks = i;
+
+ mutex_init(&pdata->lock);
+ init_waitqueue_head(&pdata->idle_wq);
+ INIT_DELAYED_WORK(&pdata->powerstate_down, powerstate_down_handler);
+
+ /* power gate units that we can power gate */
+ if (pdata->can_powergate) {
+ do_powergate_locked(pdata->powergate_ids[0]);
+ do_powergate_locked(pdata->powergate_ids[1]);
+ pdata->powerstate = NVHOST_POWER_STATE_POWERGATED;
+ } else {
+ do_unpowergate_locked(pdata->powergate_ids[0]);
+ do_unpowergate_locked(pdata->powergate_ids[1]);
+ pdata->powerstate = NVHOST_POWER_STATE_CLOCKGATED;
+ }
+
+ /* Init the power sysfs attributes for this device */
+ pdata->power_attrib = devm_kzalloc(&dev->dev,
+ sizeof(struct nvhost_device_power_attr),
+ GFP_KERNEL);
+ if (!pdata->power_attrib) {
+ dev_err(&dev->dev, "Unable to allocate sysfs attributes\n");
+ return -ENOMEM;
+ }
+ pdata->power_attrib->ndev = dev;
+
+ pdata->power_kobj = kobject_create_and_add("acm", &dev->dev.kobj);
+ if (!pdata->power_kobj) {
+ dev_err(&dev->dev, "Could not add dir 'power'\n");
+ err = -EIO;
+ goto fail_attrib_alloc;
+ }
+
+ attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY];
+ attr->attr.name = "clockgate_delay";
+ attr->attr.mode = S_IWUSR | S_IRUGO;
+ attr->show = clockgate_delay_show;
+ attr->store = clockgate_delay_store;
+ if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
+ dev_err(&dev->dev, "Could not create sysfs attribute clockgate_delay\n");
+ err = -EIO;
+ goto fail_clockdelay;
+ }
+
+ attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY];
+ attr->attr.name = "powergate_delay";
+ attr->attr.mode = S_IWUSR | S_IRUGO;
+ attr->show = powergate_delay_show;
+ attr->store = powergate_delay_store;
+ if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
+ dev_err(&dev->dev, "Could not create sysfs attribute powergate_delay\n");
+ err = -EIO;
+ goto fail_powergatedelay;
+ }
+
+ attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT];
+ attr->attr.name = "refcount";
+ attr->attr.mode = S_IRUGO;
+ attr->show = refcount_show;
+ if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
+ dev_err(&dev->dev, "Could not create sysfs attribute refcount\n");
+ err = -EIO;
+ goto fail_refcount;
+ }
+
+ return 0;
+
+fail_refcount:
+ attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY];
+ sysfs_remove_file(pdata->power_kobj, &attr->attr);
+
+fail_powergatedelay:
+ attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY];
+ sysfs_remove_file(pdata->power_kobj, &attr->attr);
+
+fail_clockdelay:
+ kobject_put(pdata->power_kobj);
+
+fail_attrib_alloc:
+ kfree(pdata->power_attrib);
+
+ return err;
+}
+
+static int is_module_idle(struct platform_device *dev)
+{
+ int count;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ mutex_lock(&pdata->lock);
+ count = pdata->refcount;
+ mutex_unlock(&pdata->lock);
+
+ return (count == 0);
+}
+
+int nvhost_module_suspend(struct platform_device *dev)
+{
+ int ret;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ ret = wait_event_timeout(pdata->idle_wq, is_module_idle(dev),
+ ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT);
+ if (ret == 0) {
+ dev_info(&dev->dev, "%s prevented suspend\n",
+ dev_name(&dev->dev));
+ return -EBUSY;
+ }
+
+ mutex_lock(&pdata->lock);
+ cancel_delayed_work(&pdata->powerstate_down);
+ to_state_powergated_locked(dev);
+ mutex_unlock(&pdata->lock);
+
+ if (pdata->suspend_ndev)
+ pdata->suspend_ndev(dev);
+
+ return 0;
+}
+
+void nvhost_module_deinit(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ kobject_put(pdata->power_kobj);
+
+ if (pdata->deinit)
+ pdata->deinit(dev);
+
+ nvhost_module_suspend(dev);
+ pdata->powerstate = NVHOST_POWER_STATE_DEINIT;
+}
diff --git a/drivers/video/tegra/host/nvhost_acm.h b/drivers/video/tegra/host/nvhost_acm.h
new file mode 100644
index 0000000..0892a57
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_acm.h
@@ -0,0 +1,45 @@
+/*
+ * drivers/video/tegra/host/nvhost_acm.h
+ *
+ * Tegra host1x Automatic Clock Management
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_ACM_H
+#define __NVHOST_ACM_H
+
+#include <linux/nvhost.h>
+
+/* Sets clocks and powergating state for a module */
+int nvhost_module_init(struct platform_device *ndev);
+void nvhost_module_deinit(struct platform_device *dev);
+int nvhost_module_suspend(struct platform_device *dev);
+
+void nvhost_module_busy(struct platform_device *dev);
+void nvhost_module_idle_mult(struct platform_device *dev, int refs);
+
+static inline bool nvhost_module_powered(struct platform_device *dev)
+{
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+ return pdata->powerstate == NVHOST_POWER_STATE_RUNNING;
+}
+
+static inline void nvhost_module_idle(struct platform_device *dev)
+{
+ nvhost_module_idle_mult(dev, 1);
+}
+
+#endif
diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c
new file mode 100644
index 0000000..d7c8230
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_syncpt.c
@@ -0,0 +1,333 @@
+/*
+ * drivers/video/tegra/host/nvhost_syncpt.c
+ *
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
+#include "nvhost_syncpt.h"
+#include "nvhost_acm.h"
+#include "host1x/host1x.h"
+#include "chip_support.h"
+
+#define MAX_SYNCPT_LENGTH 5
+
+/* Name of sysfs node for min and max value */
+static const char *min_name = "min";
+static const char *max_name = "max";
+
+/**
+ * Resets syncpoint and waitbase values to sw shadows
+ */
+void nvhost_syncpt_reset(struct nvhost_syncpt *sp)
+{
+ u32 i;
+
+ for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++)
+ syncpt_op().reset(sp, i);
+ for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++)
+ syncpt_op().reset_wait_base(sp, i);
+ wmb();
+}
+
+/**
+ * Updates sw shadow state for client managed registers
+ */
+void nvhost_syncpt_save(struct nvhost_syncpt *sp)
+{
+ u32 i;
+
+ for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
+ if (nvhost_syncpt_client_managed(sp, i))
+ syncpt_op().update_min(sp, i);
+ else
+ WARN_ON(!nvhost_syncpt_min_eq_max(sp, i));
+ }
+
+ for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++)
+ syncpt_op().read_wait_base(sp, i);
+}
+
+/**
+ * Updates the last value read from hardware.
+ */
+u32 nvhost_syncpt_update_min(struct nvhost_syncpt *sp, u32 id)
+{
+ u32 val;
+
+ val = syncpt_op().update_min(sp, id);
+
+ return val;
+}
+
+/**
+ * Get the current syncpoint value
+ */
+u32 nvhost_syncpt_read(struct nvhost_syncpt *sp, u32 id)
+{
+ u32 val;
+ nvhost_module_busy(syncpt_to_dev(sp)->dev);
+ val = syncpt_op().update_min(sp, id);
+ nvhost_module_idle(syncpt_to_dev(sp)->dev);
+ return val;
+}
+
+/**
+ * Get the current syncpoint base
+ */
+u32 nvhost_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id)
+{
+ u32 val;
+ nvhost_module_busy(syncpt_to_dev(sp)->dev);
+ syncpt_op().read_wait_base(sp, id);
+ val = sp->base_val[id];
+ nvhost_module_idle(syncpt_to_dev(sp)->dev);
+ return val;
+}
+
+/**
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+void nvhost_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
+{
+ syncpt_op().cpu_incr(sp, id);
+}
+
+/**
+ * Increment syncpoint value from cpu, updating cache
+ */
+void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id)
+{
+ if (nvhost_syncpt_client_managed(sp, id))
+ nvhost_syncpt_incr_max(sp, id, 1);
+ nvhost_module_busy(syncpt_to_dev(sp)->dev);
+ nvhost_syncpt_cpu_incr(sp, id);
+ nvhost_module_idle(syncpt_to_dev(sp)->dev);
+}
+
+/**
+ * Returns true if syncpoint is expired, false if we may need to wait
+ */
+bool nvhost_syncpt_is_expired(
+ struct nvhost_syncpt *sp,
+ u32 id,
+ u32 thresh)
+{
+ u32 current_val;
+ u32 future_val;
+ smp_rmb();
+ current_val = (u32)atomic_read(&sp->min_val[id]);
+ future_val = (u32)atomic_read(&sp->max_val[id]);
+
+ /* Note the use of unsigned arithmetic here (mod 1<<32).
+ *
+ * c = current_val = min_val = the current value of the syncpoint.
+ * t = thresh = the value we are checking
+ * f = future_val = max_val = the value c will reach when all
+ * outstanding increments have completed.
+ *
+ * Note that c always chases f until it reaches f.
+ *
+ * Dtf = (f - t)
+ * Dtc = (c - t)
+ *
+ * Consider all cases:
+ *
+ * A) .....c..t..f..... Dtf < Dtc need to wait
+ * B) .....c.....f..t.. Dtf > Dtc expired
+ * C) ..t..c.....f..... Dtf > Dtc expired (Dct very large)
+ *
+ * Any case where f==c: always expired (for any t). Dtf == Dcf
+ * Any case where t==c: always expired (for any f). Dtf >= Dtc (because Dtc==0)
+ * Any case where t==f!=c: always wait. Dtf < Dtc (because Dtf==0,
+ * Dtc!=0)
+ *
+ * Other cases:
+ *
+ * A) .....t..f..c..... Dtf < Dtc need to wait
+ * A) .....f..c..t..... Dtf < Dtc need to wait
+ * A) .....f..t..c..... Dtf > Dtc expired
+ *
+ * So:
+ * Dtf >= Dtc implies EXPIRED (return true)
+ * Dtf < Dtc implies WAIT (return false)
+ *
+ * Note: If t is expired then we *cannot* wait on it. We would wait
+ * forever (hang the system).
+ *
+ * Note: do NOT get clever and remove the -thresh from both sides. It
+ * is NOT the same.
+ *
+ * If future value is zero, we have a client managed sync point. In that
+ * case we do a direct comparison.
+ */
+ if (!nvhost_syncpt_client_managed(sp, id))
+ return future_val - thresh >= current_val - thresh;
+ else
+ return (s32)(current_val - thresh) >= 0;
+}
+
+void nvhost_syncpt_debug(struct nvhost_syncpt *sp)
+{
+ syncpt_op().debug(sp);
+}
+/* Displays the current value of the sync point via sysfs */
+static ssize_t syncpt_min_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct nvhost_syncpt_attr *syncpt_attr =
+ container_of(attr, struct nvhost_syncpt_attr, attr);
+
+ return snprintf(buf, PAGE_SIZE, "%u",
+ nvhost_syncpt_read(&syncpt_attr->host->syncpt,
+ syncpt_attr->id));
+}
+
+static ssize_t syncpt_max_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct nvhost_syncpt_attr *syncpt_attr =
+ container_of(attr, struct nvhost_syncpt_attr, attr);
+
+ return snprintf(buf, PAGE_SIZE, "%u",
+ nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
+ syncpt_attr->id));
+}
+
+int nvhost_syncpt_init(struct platform_device *dev,
+ struct nvhost_syncpt *sp)
+{
+ int i;
+ struct nvhost_master *host = syncpt_to_dev(sp);
+ int err = 0;
+
+ /* Allocate structs for min, max and base values */
+ sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
+ GFP_KERNEL);
+ sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
+ GFP_KERNEL);
+ sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp),
+ GFP_KERNEL);
+ sp->lock_counts =
+ kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp),
+ GFP_KERNEL);
+
+ if (!(sp->min_val && sp->max_val && sp->base_val && sp->lock_counts)) {
+ /* frees happen in the deinit */
+ err = -ENOMEM;
+ goto fail;
+ }
+
+ sp->kobj = kobject_create_and_add("syncpt", &dev->dev.kobj);
+ if (!sp->kobj) {
+ err = -EIO;
+ goto fail;
+ }
+
+ /* Allocate two attributes for each sync point: min and max */
+ sp->syncpt_attrs = kzalloc(sizeof(*sp->syncpt_attrs)
+ * nvhost_syncpt_nb_pts(sp) * 2, GFP_KERNEL);
+ if (!sp->syncpt_attrs) {
+ err = -ENOMEM;
+ goto fail;
+ }
+
+ /* Fill in the attributes */
+ for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
+ char name[MAX_SYNCPT_LENGTH];
+ struct kobject *kobj;
+ struct nvhost_syncpt_attr *min = &sp->syncpt_attrs[i*2];
+ struct nvhost_syncpt_attr *max = &sp->syncpt_attrs[i*2+1];
+
+ /* Create one directory per sync point */
+ snprintf(name, sizeof(name), "%d", i);
+ kobj = kobject_create_and_add(name, sp->kobj);
+ if (!kobj) {
+ err = -EIO;
+ goto fail;
+ }
+
+ min->id = i;
+ min->host = host;
+ min->attr.attr.name = min_name;
+ min->attr.attr.mode = S_IRUGO;
+ min->attr.show = syncpt_min_show;
+ if (sysfs_create_file(kobj, &min->attr.attr)) {
+ err = -EIO;
+ goto fail;
+ }
+
+ max->id = i;
+ max->host = host;
+ max->attr.attr.name = max_name;
+ max->attr.attr.mode = S_IRUGO;
+ max->attr.show = syncpt_max_show;
+ if (sysfs_create_file(kobj, &max->attr.attr)) {
+ err = -EIO;
+ goto fail;
+ }
+ }
+
+ return err;
+
+fail:
+ nvhost_syncpt_deinit(sp);
+ return err;
+}
+
+void nvhost_syncpt_deinit(struct nvhost_syncpt *sp)
+{
+ kobject_put(sp->kobj);
+
+ kfree(sp->min_val);
+ sp->min_val = NULL;
+
+ kfree(sp->max_val);
+ sp->max_val = NULL;
+
+ kfree(sp->base_val);
+ sp->base_val = NULL;
+
+ kfree(sp->lock_counts);
+ sp->lock_counts = NULL;
+
+ kfree(sp->syncpt_attrs);
+ sp->syncpt_attrs = NULL;
+}
+
+int nvhost_syncpt_client_managed(struct nvhost_syncpt *sp, u32 id)
+{
+ return BIT(id) & syncpt_to_dev(sp)->info.client_managed;
+}
+
+int nvhost_syncpt_nb_pts(struct nvhost_syncpt *sp)
+{
+ return syncpt_to_dev(sp)->info.nb_pts;
+}
+
+int nvhost_syncpt_nb_bases(struct nvhost_syncpt *sp)
+{
+ return syncpt_to_dev(sp)->info.nb_bases;
+}
+
+int nvhost_syncpt_nb_mlocks(struct nvhost_syncpt *sp)
+{
+ return syncpt_to_dev(sp)->info.nb_mlocks;
+}
diff --git a/drivers/video/tegra/host/nvhost_syncpt.h b/drivers/video/tegra/host/nvhost_syncpt.h
new file mode 100644
index 0000000..b883442
--- /dev/null
+++ b/drivers/video/tegra/host/nvhost_syncpt.h
@@ -0,0 +1,136 @@
+/*
+ * drivers/video/tegra/host/nvhost_syncpt.h
+ *
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __NVHOST_SYNCPT_H
+#define __NVHOST_SYNCPT_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/nvhost.h>
+#include <linux/atomic.h>
+
+/* host managed and invalid syncpt id */
+#define NVSYNCPT_GRAPHICS_HOST (0)
+
+/* Attribute struct for sysfs min and max attributes */
+struct nvhost_syncpt_attr {
+ struct kobj_attribute attr;
+ struct nvhost_master *host;
+ int id;
+};
+
+struct nvhost_syncpt {
+ struct kobject *kobj;
+ atomic_t *min_val;
+ atomic_t *max_val;
+ u32 *base_val;
+ atomic_t *lock_counts;
+ const char **syncpt_names;
+ struct nvhost_syncpt_attr *syncpt_attrs;
+};
+
+int nvhost_syncpt_init(struct platform_device *, struct nvhost_syncpt *);
+void nvhost_syncpt_deinit(struct nvhost_syncpt *);
+
+#define syncpt_to_dev(sp) container_of(sp, struct nvhost_master, syncpt)
+#define SYNCPT_CHECK_PERIOD (2 * HZ)
+#define MAX_STUCK_CHECK_COUNT 15
+
+/**
+ * Updates the value sent to hardware.
+ */
+static inline u32 nvhost_syncpt_incr_max(struct nvhost_syncpt *sp,
+ u32 id, u32 incrs)
+{
+ return (u32)atomic_add_return(incrs, &sp->max_val[id]);
+}
+
+/**
+ * Sets the value sent to hardware.
+ */
+static inline u32 nvhost_syncpt_set_max(struct nvhost_syncpt *sp,
+ u32 id, u32 val)
+{
+ atomic_set(&sp->max_val[id], val);
+ smp_wmb();
+ return val;
+}
+
+static inline u32 nvhost_syncpt_read_max(struct nvhost_syncpt *sp, u32 id)
+{
+ smp_rmb();
+ return (u32)atomic_read(&sp->max_val[id]);
+}
+
+static inline u32 nvhost_syncpt_read_min(struct nvhost_syncpt *sp, u32 id)
+{
+ smp_rmb();
+ return (u32)atomic_read(&sp->min_val[id]);
+}
+
+int nvhost_syncpt_client_managed(struct nvhost_syncpt *sp, u32 id);
+int nvhost_syncpt_nb_pts(struct nvhost_syncpt *sp);
+int nvhost_syncpt_nb_bases(struct nvhost_syncpt *sp);
+int nvhost_syncpt_nb_mlocks(struct nvhost_syncpt *sp);
+
+static inline bool nvhost_syncpt_check_max(struct nvhost_syncpt *sp,
+ u32 id, u32 real)
+{
+ u32 max;
+ if (nvhost_syncpt_client_managed(sp, id))
+ return true;
+ max = nvhost_syncpt_read_max(sp, id);
+ return (s32)(max - real) >= 0;
+}
+
+/**
+ * Returns true if syncpoint min == max
+ */
+static inline bool nvhost_syncpt_min_eq_max(struct nvhost_syncpt *sp, u32 id)
+{
+ int min, max;
+ smp_rmb();
+ min = atomic_read(&sp->min_val[id]);
+ max = atomic_read(&sp->max_val[id]);
+ return (min == max);
+}
+
+void nvhost_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id);
+
+u32 nvhost_syncpt_update_min(struct nvhost_syncpt *sp, u32 id);
+bool nvhost_syncpt_is_expired(struct nvhost_syncpt *sp, u32 id, u32 thresh);
+
+void nvhost_syncpt_save(struct nvhost_syncpt *sp);
+
+void nvhost_syncpt_reset(struct nvhost_syncpt *sp);
+
+u32 nvhost_syncpt_read(struct nvhost_syncpt *sp, u32 id);
+u32 nvhost_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id);
+
+void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id);
+
+void nvhost_syncpt_debug(struct nvhost_syncpt *sp);
+
+static inline int nvhost_syncpt_is_valid(struct nvhost_syncpt *sp, u32 id)
+{
+ return id != NVSYNCPT_INVALID && id < nvhost_syncpt_nb_pts(sp);
+}
+
+#endif
diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
new file mode 100644
index 0000000..20ba2a5
--- /dev/null
+++ b/include/linux/nvhost.h
@@ -0,0 +1,143 @@
+/*
+ * include/linux/nvhost.h
+ *
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2009-2012, NVIDIA Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef __LINUX_NVHOST_H
+#define __LINUX_NVHOST_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+#include <linux/platform_device.h>
+
+struct nvhost_device_power_attr;
+
+#define NVHOST_MODULE_MAX_CLOCKS 3
+#define NVHOST_MODULE_MAX_POWERGATE_IDS 2
+#define NVHOST_MODULE_NO_POWERGATE_IDS .powergate_ids = {-1, -1}
+#define NVHOST_DEFAULT_CLOCKGATE_DELAY .clockgate_delay = 25
+#define NVHOST_NAME_SIZE 24
+#define NVSYNCPT_INVALID (-1)
+
+enum nvhost_power_sysfs_attributes {
+ NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY = 0,
+ NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY,
+ NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT,
+ NVHOST_POWER_SYSFS_ATTRIB_MAX
+};
+
+struct nvhost_clock {
+ char *name;
+ unsigned long default_rate;
+ int reset;
+};
+
+enum nvhost_device_powerstate_t {
+ NVHOST_POWER_STATE_DEINIT,
+ NVHOST_POWER_STATE_RUNNING,
+ NVHOST_POWER_STATE_CLOCKGATED,
+ NVHOST_POWER_STATE_POWERGATED
+};
+
+struct host1x_device_info {
+ int nb_channels; /* host1x: num channels supported */
+ int nb_pts; /* host1x: num syncpoints supported */
+ int nb_bases; /* host1x: num wait bases supported */
+ u32 client_managed; /* host1x: client managed syncpts */
+ int nb_mlocks; /* host1x: number of mlocks */
+ const char **syncpt_names; /* names of sync points */
+};
+
+struct nvhost_device_data {
+ int version; /* ip version number of device */
+ int id; /* Separates clients of same hw */
+ int index; /* Hardware channel number */
+ void __iomem *aperture; /* Iomem mapped to kernel */
+
+ u32 syncpts; /* Bitfield of sync points used */
+ u32 modulemutexes; /* Bit field of module mutexes */
+
+ u32 class; /* Device class */
+ bool serialize; /* Serialize submits in the channel */
+
+ int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
+ bool can_powergate; /* True if module can be power gated */
+ int clockgate_delay;/* Delay before clock gated */
+ int powergate_delay;/* Delay before power gated */
+ struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
+
+ struct delayed_work powerstate_down;/* Power state management */
+ int num_clks; /* Number of clocks opened for dev */
+ struct clk *clk[NVHOST_MODULE_MAX_CLOCKS];
+ struct mutex lock; /* Power management lock */
+ int powerstate; /* Current power state */
+ int refcount; /* Number of tasks active */
+ wait_queue_head_t idle_wq; /* Work queue for idle */
+
+ struct nvhost_channel *channel; /* Channel assigned for the module */
+ struct kobject *power_kobj; /* kobj to hold power sysfs entries */
+ struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */
+ struct dentry *debugfs; /* debugfs directory */
+
+ void *private_data; /* private platform data */
+ struct platform_device *pdev; /* owner platform_device */
+
+ /* Finalize power on. Can be used for context restore. */
+ void (*finalize_poweron)(struct platform_device *dev);
+
+ /* Device is busy. */
+ void (*busy)(struct platform_device *);
+
+ /* Device is idle. */
+ void (*idle)(struct platform_device *);
+
+ /* Device is going to be suspended */
+ void (*suspend_ndev)(struct platform_device *);
+
+ /* Device is initialized */
+ void (*init)(struct platform_device *dev);
+
+ /* Device is de-initialized. */
+ void (*deinit)(struct platform_device *dev);
+
+ /* Preparing for power off. Used for context save. */
+ int (*prepare_poweroff)(struct platform_device *dev);
+
+ /* Clock gating callbacks */
+ int (*prepare_clockoff)(struct platform_device *dev);
+ void (*finalize_clockon)(struct platform_device *dev);
+};
+
+struct nvhost_device_power_attr {
+ struct platform_device *ndev;
+ struct kobj_attribute power_attr[NVHOST_POWER_SYSFS_ATTRIB_MAX];
+};
+
+/* public host1x power management APIs */
+bool host1x_powered(struct platform_device *dev);
+void host1x_busy(struct platform_device *dev);
+void host1x_idle(struct platform_device *dev);
+
+/* public host1x sync-point management APIs */
+u32 host1x_syncpt_incr_max(u32 id, u32 incrs);
+void host1x_syncpt_incr(u32 id);
+u32 host1x_syncpt_read(u32 id);
+
+#endif
--
1.7.9.5
Sivaram Nair
2012-11-27 10:52:30 UTC
Permalink
On Mon, Nov 26, 2012 at 02:19:07PM +0100, Terje Bergstrom wrote:
> +
> +struct nvhost_chip_support *nvhost_chip_ops;

should be static?

> +static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
> +{
> + int err;
> +
> + err = nvhost_init_chip_support(host);
> + if (err)
> + return err;
> +
> + return 0;

nit: why not just 'return err'? (the 'if (err)' check is unnecessary)
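
i.e. something like this (keeping the existing function name, just collapsing
the body):

    static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
    {
        return nvhost_init_chip_support(host);
    }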

> +
> + nvhost = host;

I think this should be delayed until the init is complete as this
variable is not cleared if there is a failure during init. Also I feel
that the name nvhost is a bit short for an exported variable.

> +static void to_state_running_locked(struct platform_device *dev)
> +{
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> + int prev_state = pdata->powerstate;
> +
> + if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED)
> + to_state_clockgated_locked(dev);
> +
> + if (pdata->powerstate == NVHOST_POWER_STATE_CLOCKGATED) {
> + int i;
> +
> + if (dev->dev.parent)
> + nvhost_module_busy(to_platform_device(dev->dev.parent));
> +
> + for (i = 0; i < pdata->num_clks; i++) {
> + int err = clk_prepare_enable(pdata->clk[i]);
> + if (err) {
> + dev_err(&dev->dev, "Cannot turn on clock %s",
> + pdata->clocks[i].name);
> + return;

In case of an error, returning here leaves some clocks turned on.
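
Something like this (untested sketch on top of the code above) would unwind
the clocks that were already enabled before bailing out:

    for (i = 0; i < pdata->num_clks; i++) {
        int err = clk_prepare_enable(pdata->clk[i]);
        if (err) {
            dev_err(&dev->dev, "Cannot turn on clock %s",
                pdata->clocks[i].name);
            /* disable the clocks enabled so far before returning */
            while (--i >= 0)
                clk_disable_unprepare(pdata->clk[i]);
            return;
        }
    }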

Sivaram
Thierry Reding
2012-11-28 21:23:01 UTC
Permalink
On Mon, Nov 26, 2012 at 03:19:07PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index fb9a14e..94c861b 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -2463,4 +2463,6 @@ config FB_SH_MOBILE_MERAM
> Up to 4 memory channels can be configured, allowing 4 RGB or
> 2 YCbCr framebuffers to be configured.
>
> +source "drivers/video/tegra/host/Kconfig"
> +

This could be problematic. Since drivers/video and drivers/gpu/drm are
separate trees, this would entail a continuous burden of keeping both
trees synchronized. While I realize that eventually it might be better
to put the host1x driver in a separate place to accomodate for its use
by other subsystems, I'm not sure moving it here right away is the best
approach.

I'm not sure drivers/video is the best location either. Perhaps
drivers/bus would be better? Or maybe we need a new subdirectory for
this kind of device.

> diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
> new file mode 100644
> index 0000000..5a44147
> --- /dev/null
> +++ b/drivers/video/tegra/host/chip_support.c
> @@ -0,0 +1,48 @@
> +/*
> + * drivers/video/tegra/host/chip_support.c

I think the general practice nowadays is to no longer use filenames in comments.

[...]
> +struct nvhost_chip_support *nvhost_chip_ops;
> +
> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
> +{
> + return nvhost_chip_ops;
> +}

This seems like it should be more tightly coupled to the host1x device.
And it shouldn't be a global variable.

> +
> +int nvhost_init_chip_support(struct nvhost_master *host)
> +{
> + if (nvhost_chip_ops == NULL) {
> + nvhost_chip_ops = kzalloc(sizeof(*nvhost_chip_ops), GFP_KERNEL);
> + if (nvhost_chip_ops == NULL) {
> + pr_err("%s: Cannot allocate nvhost_chip_support\n",
> + __func__);
> + return -ENOMEM;
> + }
> + }
> +
> + nvhost_init_host1x01_support(host, nvhost_chip_ops);
> + return 0;
> +}

We also don't need this. This should really be done by the central
host1x device's initialization.

> diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
[...]
> +struct output;

What's this? It doesn't seem to be used anywhere.

> +struct nvhost_master;

Why do you suffix this with _master? The whole point of host1x is to be
the "master" so you can just as well call it nvhost, right? Ideally
you'd call it host1x, but I'm repeating myself. =)

> +struct nvhost_syncpt;
> +struct platform_device;
> +
> +struct nvhost_syncpt_ops {
> + void (*reset)(struct nvhost_syncpt *, u32 id);
> + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
> + void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
> + u32 (*update_min)(struct nvhost_syncpt *, u32 id);
> + void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
> + void (*debug)(struct nvhost_syncpt *);
> + const char * (*name)(struct nvhost_syncpt *, u32 id);
> +};

Why are these even defined as an ops structure? Tegra20 and Tegra30 seem to
be compatible when it comes to handling syncpoints. I thought they would
even be compatible in all other aspects as well, so why even have this?

> +
> +struct nvhost_chip_support {
> + const char *soc_name;
> + struct nvhost_syncpt_ops syncpt;
> +};
> +
> +struct nvhost_chip_support *nvhost_get_chip_ops(void);
> +
> +#define syncpt_op() (nvhost_get_chip_ops()->syncpt)

You really shouldn't be doing this, but rather use explicit accesses for
these structures. If your design doesn't scatter these definitions
across several files then it isn't difficult to obtain the correct
pointers and you don't need these "shortcuts".
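
E.g. something like this (rough sketch, assuming the ops end up hanging off
the host1x/nvhost_master structure instead of a global; "syncpt_ops" is a
made-up member name):

    struct nvhost_master *host = syncpt_to_dev(sp);

    host->syncpt_ops->update_min(sp, id);

instead of syncpt_op().update_min(sp, id).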

> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
[...]
> +u32 host1x_syncpt_incr_max(u32 id, u32 incrs)
> +{
> + struct nvhost_syncpt *sp = &nvhost->syncpt;
> + return nvhost_syncpt_incr_max(sp, id, incrs);
> +}
> +EXPORT_SYMBOL(host1x_syncpt_incr_max);

This API looks odd. Should syncpoints not be considered as regular
resources, much like interrupts? In that case it would be easier to
abstract them away behind an opaque type. It looks like you already use
the struct nvhost_syncpt to refer to the set of syncpoints associated
with a host1x device.

How about you use nvhost/host1x_syncpt to refer to individual syncpoints
instead. You could export an array of those from your host1x device and
implement a basic resource allocation mechanism on top, similar to how
other resources are handled in the kernel.

So a host1x client device could call host1x_request_syncpt() to allocate
a syncpoint from its host1x parent dynamically, along with passing a
name and a syncpoint handler to it.
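
Roughly something like this is what I have in mind (only a sketch of the
interface, all of the names are made up):

    struct host1x_syncpt;

    struct host1x_syncpt *host1x_request_syncpt(struct platform_device *client,
                                                const char *name,
                                                void (*handler)(struct host1x_syncpt *sp,
                                                                u32 value, void *data),
                                                void *data);
    void host1x_free_syncpt(struct host1x_syncpt *sp);

    u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
    void host1x_syncpt_incr(struct host1x_syncpt *sp);
    u32 host1x_syncpt_read(struct host1x_syncpt *sp);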

> +
> +void host1x_syncpt_incr(u32 id)
> +{
> + struct nvhost_syncpt *sp = &nvhost->syncpt;
> + nvhost_syncpt_incr(sp, id);
> +}
> +EXPORT_SYMBOL(host1x_syncpt_incr);

Similarly, instead of passing an integer here, host1x clients would pass
a pointer to the requested syncpoint instead.

> +bool host1x_powered(struct platform_device *dev)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_powered);
> +
> +void host1x_busy(struct platform_device *dev)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_busy);
> +
> +void host1x_idle(struct platform_device *dev)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_idle);

These look like a reimplementation of the runtime power-management
framework.
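
With runtime PM, host1x clients would do something like this instead (sketch,
assuming the host1x driver enables runtime PM and implements the
runtime_suspend/runtime_resume callbacks):

    pm_runtime_enable(&pdev->dev);          /* once, in probe() */

    pm_runtime_get_sync(&pdev->dev);        /* instead of host1x_busy() */
    /* access hardware here */
    pm_runtime_put(&pdev->dev);             /* instead of host1x_idle() */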

> diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
[...]
> +struct nvhost_master *nvhost;

Bad habit. I know that this is a popular shortcut. However this also
leads to very bad designs because you're allowed to reuse this pointer
from wherever you like.

When I wrote the tegra-drm code I explicitly made sure to not use any
such global variable. In the end it forces you to clean up the driver
design.

As a bonus you automatically get support for any number of host1x
devices on the same SoC. Now you will probably tell me that this is
never going to happen. People also used to think that computers would
never use more than a single CPU...

> +static void power_on_host(struct platform_device *dev)
> +{
> + struct nvhost_master *host = nvhost_get_private_data(dev);
> +
> + nvhost_syncpt_reset(&host->syncpt);
> +}
> +
> +static int power_off_host(struct platform_device *dev)
> +{
> + struct nvhost_master *host = nvhost_get_private_data(dev);
> +
> + nvhost_syncpt_save(&host->syncpt);
> + return 0;
> +}

These seem like possible candidates for runtime PM.

> +
> +static void nvhost_free_resources(struct nvhost_master *host)
> +{
> +}

This should be removed since it's empty.

> +
> +static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
> +{
> + int err;
> +
> + err = nvhost_init_chip_support(host);
> + if (err)
> + return err;
> +
> + return 0;
> +}

Again, this chip support concept is not useful, so this function can go
away as well. Also nvhost_init_chip_support() doesn't allocate any
resources so it shouldn't be called from this function in the first
place.

> +
> +static int __devinit nvhost_probe(struct platform_device *dev)
> +{
> + struct nvhost_master *host;
> + struct resource *regs, *intr0, *intr1;
> + int i, err;
> + struct nvhost_device_data *pdata =
> + (struct nvhost_device_data *)dev->dev.platform_data;

Platform data should not be used. Tegra is DT only.

> + regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
> + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
> + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
> +
> + if (!regs || !intr0 || !intr1) {

I prefer to have these checked explicitly, one by one, for
readability and potentially more useful diagnostics.

Also you should be using platform_get_irq() for interrupts. Furthermore
the host1x DT node (and the TRM) name the interrupts "syncpt" and
"general", so maybe those would be more useful variable names than
"intr0" and "intr1".

But since you don't use them anyway they shouldn't be part of this
patch.
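
For reference, when the interrupts are actually needed this is roughly what it
should look like (sketch, variable names taken from the DT interrupt names):

    int syncpt_irq = platform_get_irq(dev, 0);
    int general_irq = platform_get_irq(dev, 1);

    if (syncpt_irq < 0 || general_irq < 0)
        return -ENODEV;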

> + host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL);
> + if (!host)
> + return -ENOMEM;
> +
> + nvhost = host;
> +
> + host->dev = dev;
> +
> + /* Copy host1x parameters. The private_data gets replaced
> + * by nvhost_master later */

Multiline comments should be in this format:

/*
* foo
*/

> + memcpy(&host->info, pdata->private_data,
> + sizeof(struct host1x_device_info));

I don't think passing data in this way should be necessary, as
discussed in the subthread on the Tegra AUXDATA.

> +
> + pdata->finalize_poweron = power_on_host;
> + pdata->prepare_poweroff = power_off_host;
> +
> + pdata->pdev = dev;
> +
> + /* set common host1x device data */
> + platform_set_drvdata(dev, pdata);
> +
> + /* set private host1x device data */
> + nvhost_set_private_data(dev, host);
> +
> + host->aperture = devm_request_and_ioremap(&dev->dev, regs);
> + if (!host->aperture) {

aperture is confusing as it is typically used for GTT-type memory
regions, so it may be mistaken for the GART found on Tegra 2. Why not
call it "regs" instead?

> + dev_err(&dev->dev, "failed to remap host registers\n");

This is unnecessary. devm_request_and_ioremap() already prints an error
message on failure.

> + for (i = 0; i < pdata->num_clks; i++)
> + clk_prepare_enable(pdata->clk[i]);
> + nvhost_syncpt_reset(&host->syncpt);
> + for (i = 0; i < pdata->num_clks; i++)
> + clk_disable_unprepare(pdata->clk[i]);

Stephen already hinted at this when discussing the AUXDATA. You should
explicitly request the clocks.

> +static int __exit nvhost_remove(struct platform_device *dev)

This should really be __devexit to allow the driver to be built as a
module. However, __dev* are deprecated and in the process of being
removed so you can just drop __exit as well.

> +static struct of_device_id host1x_match[] __devinitdata = {

__devinitdata can be dropped.

> + { .compatible = "nvidia,tegra20-host1x", },
> + { .compatible = "nvidia,tegra30-host1x", },
> + { },
> +};
> +
> +static struct platform_driver platform_driver = {
> + .probe = nvhost_probe,
> + .remove = __exit_p(nvhost_remove),

__exit_p also.

> + .suspend = nvhost_suspend,
> + .resume = nvhost_resume,
> + .driver = {
> + .owner = THIS_MODULE,
> + .name = DRIVER_NAME,
> + .of_match_table = of_match_ptr(host1x_match),

No need for of_match_ptr().

> +static int __init nvhost_mod_init(void)
> +{
> + return platform_driver_register(&platform_driver);
> +}
> +
> +static void __exit nvhost_mod_exit(void)
> +{
> + platform_driver_unregister(&platform_driver);
> +}
> +
> +module_init(nvhost_mod_init);
> +module_exit(nvhost_mod_exit);

Use module_platform_driver().
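
i.e. the two functions above and the module_init()/module_exit() lines
collapse into a single line:

    module_platform_driver(platform_driver);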

> diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
[...]
> +#define TRACE_MAX_LENGTH 128U
> +#define IFACE_NAME "nvhost"

None of these seem to be used.

> +static inline void *nvhost_get_private_data(struct platform_device *_dev)
> +{
> + struct nvhost_device_data *pdata =
> + (struct nvhost_device_data *)platform_get_drvdata(_dev);
> + WARN_ON(!pdata);
> + return (pdata && pdata->private_data) ? pdata->private_data : NULL;
> +}
> +
> +static inline void nvhost_set_private_data(struct platform_device *_dev,
> + void *priv_data)
> +{
> + struct nvhost_device_data *pdata =
> + (struct nvhost_device_data *)platform_get_drvdata(_dev);
> + WARN_ON(!pdata);
> + if (pdata)
> + pdata->private_data = priv_data;
> +}

You should need none of these. Instead put all the data you need into
your struct host1x and associate that with the platform device using
platform_set_drvdata().
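
Something like this (sketch; "struct host1x" stands for whatever the
driver-private structure ends up being called):

    struct host1x *host;

    host = devm_kzalloc(&pdev->dev, sizeof(*host), GFP_KERNEL);
    if (!host)
        return -ENOMEM;

    platform_set_drvdata(pdev, host);

and later, e.g. in remove() or the interrupt handler:

    struct host1x *host = platform_get_drvdata(pdev);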

> +static inline
> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
> +{
> + struct platform_device *pdev;
> +
> + if (_dev->dev.parent) {
> + pdev = to_platform_device(_dev->dev.parent);
> + return nvhost_get_private_data(pdev);
> + } else
> + return nvhost_get_private_data(_dev);
> +}
> +
> +static inline
> +struct platform_device *nvhost_get_parent(struct platform_device *_dev)
> +{
> + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
> +}

These don't seem to be used.

> diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
> new file mode 100644
> index 0000000..d53302d
> --- /dev/null
> +++ b/drivers/video/tegra/host/host1x/host1x01.c
> @@ -0,0 +1,37 @@
> +/*
> + * drivers/video/tegra/host/host1x01.c
> + *
> + * Host1x init for T20 and T30 Architecture Chips
> + *
> + * Copyright (c) 2011-2012, NVIDIA Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/nvhost.h>
> +
> +#include "host1x/host1x01.h"
> +#include "host1x/host1x.h"
> +#include "host1x/host1x01_hardware.h"
> +#include "chip_support.h"
> +
> +#include "host1x/host1x_syncpt.c"
> +
> +int nvhost_init_host1x01_support(struct nvhost_master *host,
> + struct nvhost_chip_support *op)
> +{
> + host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;

Usually you don't keep separate variables for subregions. This can
equally well be done with just adding a corresponding offset.

Then again, I already said that this whole chip support concept is
unnecessary and can be dropped.

> diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c
[...]
> +/**
> + * Write the current syncpoint value back to hw.
> + */
> +static void host1x_syncpt_reset(struct nvhost_syncpt *sp, u32 id)
> +{
> + struct nvhost_master *dev = syncpt_to_dev(sp);
> + int min = nvhost_syncpt_read_min(sp, id);
> + writel(min, dev->sync_aperture + (host1x_sync_syncpt_0_r() + id * 4));
> +}

Again, better to represent individual syncpoints with opaque pointers
and dereference them here. Obviously this file will need access to the
structure definition.

> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
> +{
> + u32 i;
> + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
> + u32 max = nvhost_syncpt_read_max(sp, i);
> + u32 min = nvhost_syncpt_update_min(sp, i);
> + if (!max && !min)
> + continue;
> + dev_info(&syncpt_to_dev(sp)->dev->dev,
> + "id %d (%s) min %d max %d\n",
> + i, syncpt_op().name(sp, i),
> + min, max);
> +
> + }
> +
> + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
> + u32 base_val;
> + host1x_syncpt_read_wait_base(sp, i);
> + base_val = sp->base_val[i];
> + if (base_val)
> + dev_info(&syncpt_to_dev(sp)->dev->dev,
> + "waitbase id %d val %d\n",
> + i, base_val);
> +
> + }
> +}

This should probably be integrated with debugfs.

> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h

Autogenerated files are generally not acceptable. And I already
mentioned before that you should be using #define instead of static
inline functions for register and bit definitions.

> diff --git a/drivers/video/tegra/host/nvhost_acm.c b/drivers/video/tegra/host/nvhost_acm.c
[...]

This whole file largely looks like a reimplementation of runtime PM. You
should investigate if you can't reuse the existing infrastructure.

> + /* Init the power sysfs attributes for this device */
> + pdata->power_attrib = devm_kzalloc(&dev->dev,
> + sizeof(struct nvhost_device_power_attr),
> + GFP_KERNEL);
> + if (!pdata->power_attrib) {
> + dev_err(&dev->dev, "Unable to allocate sysfs attributes\n");
> + return -ENOMEM;
> + }
> + pdata->power_attrib->ndev = dev;
> +
> + pdata->power_kobj = kobject_create_and_add("acm", &dev->dev.kobj);
> + if (!pdata->power_kobj) {
> + dev_err(&dev->dev, "Could not add dir 'power'\n");
> + err = -EIO;
> + goto fail_attrib_alloc;
> + }
> +
> + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY];
> + attr->attr.name = "clockgate_delay";
> + attr->attr.mode = S_IWUSR | S_IRUGO;
> + attr->show = clockgate_delay_show;
> + attr->store = clockgate_delay_store;
> + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
> + dev_err(&dev->dev, "Could not create sysfs attribute clockgate_delay\n");
> + err = -EIO;
> + goto fail_clockdelay;
> + }
> +
> + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY];
> + attr->attr.name = "powergate_delay";
> + attr->attr.mode = S_IWUSR | S_IRUGO;
> + attr->show = powergate_delay_show;
> + attr->store = powergate_delay_store;
> + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
> + dev_err(&dev->dev, "Could not create sysfs attribute powergate_delay\n");
> + err = -EIO;
> + goto fail_powergatedelay;
> + }
> +
> + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT];
> + attr->attr.name = "refcount";
> + attr->attr.mode = S_IRUGO;
> + attr->show = refcount_show;
> + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
> + dev_err(&dev->dev, "Could not create sysfs attribute refcount\n");
> + err = -EIO;
> + goto fail_refcount;
> + }

This is a very funky way of creating sysfs attributes. What you probably
want here are device attributes. See Documentation/filesystems/sysfs.txt
and include/linux/sysfs.h.
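
Roughly like this, for example (a sketch only, reusing the pdata fields from
your patch):

#include <linux/device.h>

static ssize_t clockgate_delay_show(struct device *dev,
                                    struct device_attribute *attr, char *buf)
{
        struct nvhost_device_data *pdata = dev_get_drvdata(dev);

        return sprintf(buf, "%d\n", pdata->clockgate_delay);
}

static ssize_t clockgate_delay_store(struct device *dev,
                                     struct device_attribute *attr,
                                     const char *buf, size_t count)
{
        struct nvhost_device_data *pdata = dev_get_drvdata(dev);
        int delay;

        if (kstrtoint(buf, 10, &delay))
                return -EINVAL;

        pdata->clockgate_delay = delay;
        return count;
}

static DEVICE_ATTR(clockgate_delay, S_IRUGO | S_IWUSR,
                   clockgate_delay_show, clockgate_delay_store);

A single device_create_file(&dev->dev, &dev_attr_clockgate_delay) in probe
then replaces all of the manual kobject and attribute setup.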

But if you can replace this by runtime PM, you'll get similar
functionality for free anyway.

> diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c
[...]
> +/**
> + * Returns true if syncpoint is expired, false if we may need to wait
> + */
> +bool nvhost_syncpt_is_expired(
> + struct nvhost_syncpt *sp,
> + u32 id,
> + u32 thresh)
> +{
> + u32 current_val;
> + u32 future_val;
> + smp_rmb();

What do you need a read memory barrier for?

> +/* Displays the current value of the sync point via sysfs */
> +static ssize_t syncpt_min_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + struct nvhost_syncpt_attr *syncpt_attr =
> + container_of(attr, struct nvhost_syncpt_attr, attr);
> +
> + return snprintf(buf, PAGE_SIZE, "%u",
> + nvhost_syncpt_read(&syncpt_attr->host->syncpt,
> + syncpt_attr->id));
> +}
> +
> +static ssize_t syncpt_max_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + struct nvhost_syncpt_attr *syncpt_attr =
> + container_of(attr, struct nvhost_syncpt_attr, attr);
> +
> + return snprintf(buf, PAGE_SIZE, "%u",
> + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
> + syncpt_attr->id));
> +}

This looks like it belongs in debugfs.

> +int nvhost_syncpt_init(struct platform_device *dev,
> + struct nvhost_syncpt *sp)
> +{
> + int i;
> + struct nvhost_master *host = syncpt_to_dev(sp);
> + int err = 0;
> +
> + /* Allocate structs for min, max and base values */
> + sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> + GFP_KERNEL);
> + sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> + GFP_KERNEL);
> + sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp),
> + GFP_KERNEL);
> + sp->lock_counts =
> + kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp),
> + GFP_KERNEL);

Again, I really think that syncpoints should be objects with
corresponding attributes instead of keeping them in these arrays.
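
I'm thinking of something along these lines (just a sketch, field names
invented):

/* one object per syncpoint instead of four parallel arrays */
struct host1x_syncpt {
        unsigned int id;
        const char *name;
        bool client_managed;
        atomic_t min_val;
        atomic_t max_val;
        u32 base_val;
};

The host1x driver would allocate one array of these at probe time and hand
out pointers to individual entries.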

> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
[...]
> +struct host1x_device_info {
> + int nb_channels; /* host1x: num channels supported */
> + int nb_pts; /* host1x: num syncpoints supported */
> + int nb_bases; /* host1x: num syncpoints supported */
> + u32 client_managed; /* host1x: client managed syncpts */
> + int nb_mlocks; /* host1x: number of mlocks */
> + const char **syncpt_names; /* names of sync points */
> +};
> +
> +struct nvhost_device_data {
> + int version; /* ip version number of device */
> + int id; /* Separates clients of same hw */
> + int index; /* Hardware channel number */
> + void __iomem *aperture; /* Iomem mapped to kernel */
> +
> + u32 syncpts; /* Bitfield of sync points used */
> + u32 modulemutexes; /* Bit field of module mutexes */
> +
> + u32 class; /* Device class */
> + bool serialize; /* Serialize submits in the channel */
> +
> + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
> + bool can_powergate; /* True if module can be power gated */
> + int clockgate_delay;/* Delay before clock gated */
> + int powergate_delay;/* Delay before power gated */
> + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
> +
> + struct delayed_work powerstate_down;/* Power state management */
> + int num_clks; /* Number of clocks opened for dev */
> + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS];
> + struct mutex lock; /* Power management lock */
> + int powerstate; /* Current power state */
> + int refcount; /* Number of tasks active */
> + wait_queue_head_t idle_wq; /* Work queue for idle */
> +
> + struct nvhost_channel *channel; /* Channel assigned for the module */
> + struct kobject *power_kobj; /* kobj to hold power sysfs entries */
> + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */
> + struct dentry *debugfs; /* debugfs directory */
> +
> + void *private_data; /* private platform data */
> + struct platform_device *pdev; /* owner platform_device */
> +
> + /* Finalize power on. Can be used for context restore. */
> + void (*finalize_poweron)(struct platform_device *dev);
> +
> + /* Device is busy. */
> + void (*busy)(struct platform_device *);
> +
> + /* Device is idle. */
> + void (*idle)(struct platform_device *);
> +
> + /* Device is going to be suspended */
> + void (*suspend_ndev)(struct platform_device *);
> +
> + /* Device is initialized */
> + void (*init)(struct platform_device *dev);
> +
> + /* Device is de-initialized. */
> + void (*deinit)(struct platform_device *dev);
> +
> + /* Preparing for power off. Used for context save. */
> + int (*prepare_poweroff)(struct platform_device *dev);
> +
> + /* Clock gating callbacks */
> + int (*prepare_clockoff)(struct platform_device *dev);
> + void (*finalize_clockon)(struct platform_device *dev);
> +};

A lot of this can be removed if you use existing infrastructure and
simplify the design a bit. Most of it can probably move into the main
struct host1x to avoid needless indirections that make the code hard to
follow and maintain.

Thierry
Terje Bergström
2012-11-29 10:21:04 UTC
Permalink
On 28.11.2012 23:23, Thierry Reding wrote:
> This could be problematic. Since drivers/video and drivers/gpu/drm are
> separate trees, this would entail a continuous burden on keeping both
> trees synchronized. While I realize that eventually it might be better
> to put the host1x driver in a separate place to accomodate for its use
> by other subsystems, I'm not sure moving it here right away is the best
> approach.

I understand your point, but I hope also that we'd end up with something
that can be used as basis for the downstream kernel to migrate to
upstream stack.

The key point here is to make the API between nvhost and tegradrm as
small and robust to changes as possible.

> I'm not sure drivers/video is the best location either. Perhaps
> drivers/bus would be better? Or maybe we need a new subdirectory for
> this kind of device.

This is a question I don't have an answer to. I'm perfectly ok moving it
wherever the public opinion leads it to.

> I think the general trend nowadays is to no longer use filenames in comments.

Ok, I hadn't noticed that. I'll remove them. It's redundant information
anyway.

>> +struct nvhost_chip_support *nvhost_chip_ops;
>> +
>> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
>> +{
>> + return nvhost_chip_ops;
>> +}
>
> This seems like it should be more tightly coupled to the host1x device.
> And it shouldn't be a global variable.

Yeah, I will figure out a better way to handle the chip ops. I'm not too
happy with it. Give me a bit of time to come up with a good solution.

>> +struct output;
>
> What's this? It doesn't seem to be used anywhere.

It's just a mistake. The struct is used in debug code, but not referred
to in this file so the forward declaration is not needed.

>> +struct nvhost_master;
>
> Why do you suffix this with _master? The whole point of host1x is to be
> the "master" so you can just as well call it nvhost, right? Ideally
> you'd call it host1x, but I'm repeating myself. =)

Yes, the name is just a historical relic and I'm blind to these things as I've
been staring at the code for so long. I think "host1x" would be a good
name for the struct.

>> +struct nvhost_syncpt_ops {
>> + void (*reset)(struct nvhost_syncpt *, u32 id);
>> + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
>> + void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
>> + u32 (*update_min)(struct nvhost_syncpt *, u32 id);
>> + void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
>> + void (*debug)(struct nvhost_syncpt *);
>> + const char * (*name)(struct nvhost_syncpt *, u32 id);
>> +};
>
> Why are these even defined as ops structure? Tegra20 and Tegra30 seem to
> be compatible when it comes to handling syncpoints. I thought they would
> even be compatible in all other aspects as well, so why even have this?

Tegra20 and Tegra30 are compatible, but future chips are not. I was
hoping we would be ready in upstream kernel for future chips.

>> +#define syncpt_op() (nvhost_get_chip_ops()->syncpt)
>
> You really shouldn't be doing this, but rather use explicit accesses for
> these structures. If your design doesn't scatter these definitions
> across several files then it isn't difficult to obtain the correct
> pointers and you don't need these "shortcuts".

Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt
or nvhost_intr structs?

> This API looks odd. Should syncpoints not be considered as regular
> resources, much like interrupts? In that case it would be easier to
> abstract them away behind an opaque type. It looks like you already use
> the struct nvhost_syncpt to refer to the set of syncpoints associated
> with a host1x device.
>
> How about you use nvhost/host1x_syncpt to refer to individual syncpoints
> instead. You could export an array of those from your host1x device and
> implement a basic resource allocation mechanism on top, similar to how
> other resources are handled in the kernel.
>
> So a host1x client device could call host1x_request_syncpt() to allocate
> a syncpoint from it's host1x parent dynamically along with passing a
> name and a syncpoint handler to it.

That might work. I'll think about that - thanks.
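
Something like this, perhaps (just a quick sketch, none of these names or
signatures are final):

struct host1x_syncpt;   /* opaque to clients */

struct host1x_syncpt *host1x_request_syncpt(struct platform_device *client,
                                            const char *name);
void host1x_free_syncpt(struct host1x_syncpt *sp);

u32 host1x_syncpt_read(struct host1x_syncpt *sp);
u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 count);
int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
                       long timeout, u32 *value);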

>> +bool host1x_powered(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_powered);
>> +
>> +void host1x_busy(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_busy);
>> +
>> +void host1x_idle(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_idle);
>
> These look like a reimplementation of the runtime power-management
> framework.

Yes, we at some point tried to move to use runtime PM. The first attempt
was thwarted by runtime PM and system suspend conflicting with each
other. I believe this is pretty much fixed in later versions of kernel.

Also, the problem was that runtime PM doesn't support multiple power
states. In downstream kernel, we support clock gating and power gating.
When we moved to runtime PM and implemented power gating on top of that,
we ended up with more code than just using the current ACM code.

I have a developer starting to look into how we could take runtime PM
again into use with proper power gating support. It'll take a while to
get that right. It might be best to rip the dynamic power management out
from this patch set, and introduce it later when we have a proper
runtime PM solution.

I'll skip the other comments about ACM because of this.

>> diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
> [...]
>> +struct nvhost_master *nvhost;
> Bad habit. I know that this is a popular shortcut. However this also
> leads to very bad designs because you're allowed to reuse this pointer
> from wherever you like.
>
> When I wrote the tegra-drm code I explicitly made sure to not use any
> such global variable. In the end it forces you to clean up the driver
> design.
>
> As a bonus you automatically get support for any number of host1x
> devices on the same SoC. Now you will probably tell me that this is
> never going to happen. People also used to think that computers would
> never use more than a single CPU...

I think this might get cleaned up at the same time with cleaning up the
auxdata/chip_ops design. We used to have this struct set as driver
private data, but as we started using that for another purpose, we moved
this variable out.

>> +
>> +static void nvhost_free_resources(struct nvhost_master *host)
>> +{
>> +}
>
> This should be removed since it's empty.

True. I wonder how that happened - there was content in it until recently, but
I guess I deleted the code without noticing that the function needs to
go, too.

>> + regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
>> + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
>> + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
>> +
>> + if (!regs || !intr0 || !intr1) {
>
> I prefer to have these checked for explicitly, one by one for
> readability and potentially more useful diagnostics.

Can do.

> Also you should be using platform_get_irq() for interrupts. Furthermore
> the host1x DT node (and the TRM) name the interrupts "syncpt" and
> "general", so maybe those would be more useful variable names than
> "intr0" and "intr1".
>
> But since you don't use them anyway they shouldn't be part of this
> patch.

True. I might just as well delete the general interrupt altogether, as
we don't use it for any real purpose.

>> + /* Copy host1x parameters. The private_data gets replaced
>> + * by nvhost_master later */
>
> Multiline comments should be in this format:
>
> /*
> * foo
> */

Ok.

>> + host->aperture = devm_request_and_ioremap(&dev->dev, regs);
>> + if (!host->aperture) {
>
> aperture is confusing as it is typically used for GTT-type memory
> regions, so it may be mistaken for the GART found on Tegra 2. Why not
> call it "regs" instead?

Can do.

>
>> + dev_err(&dev->dev, "failed to remap host registers\n");
>
> This is unnecessary. devm_request_and_ioremap() already prints an error
> message on failure.

I'll remove that, thanks.

>
>> + for (i = 0; i < pdata->num_clks; i++)
>> + clk_prepare_enable(pdata->clk[i]);
>> + nvhost_syncpt_reset(&host->syncpt);
>> + for (i = 0; i < pdata->num_clks; i++)
>> + clk_disable_unprepare(pdata->clk[i]);
>
> Stephen already hinted at this when discussing the AUXDATA. You should
> explicitly request the clocks.

I'm not too happy about that idea. The clock management code is generic
for all modules, and that's why it's driven by a data structure. Now
Stephen and you ask me to unroll the loops and make copies of the code
to facilitate different modules and different SoCs.

>> +static int __exit nvhost_remove(struct platform_device *dev)
>
> This should really be __devexit to allow the driver to be built as a
> module. However, __dev* are deprecated and in the process of being
> removed so you can just drop __exit as well.
>> +static struct of_device_id host1x_match[] __devinitdata = {
>
> __devinitdata can be dropped.
>> + { .compatible = "nvidia,tegra20-host1x", },
>> + { .compatible = "nvidia,tegra30-host1x", },
>> + { },
>> +};
>> +
>> +static struct platform_driver platform_driver = {
>> + .probe = nvhost_probe,
>> + .remove = __exit_p(nvhost_remove),
>
> __exit_p also.

Ok.

>> + .of_match_table = of_match_ptr(host1x_match),
>
> No need for of_match_ptr().

Will remove.

>> +module_init(nvhost_mod_init);
>> +module_exit(nvhost_mod_exit);
>
> Use module_platform_driver().

Ok.
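
I.e. roughly this, if I read the macro right (the driver name below is just
illustrative):

static struct platform_driver tegra_host1x_driver = {
        .probe = nvhost_probe,
        .remove = nvhost_remove,
        .driver = {
                .owner = THIS_MODULE,
                .name = "tegra-host1x",
                .of_match_table = host1x_match,
        },
};

module_platform_driver(tegra_host1x_driver);

The nvhost_mod_init()/nvhost_mod_exit() wrappers then go away entirely.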

>
>> diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
> [...]
>> +#define TRACE_MAX_LENGTH 128U
>> +#define IFACE_NAME "nvhost"
>
> None of these seem to be used.

Will remove.

>> +static inline void *nvhost_get_private_data(struct platform_device *_dev)
>> +{
>> + struct nvhost_device_data *pdata =
>> + (struct nvhost_device_data *)platform_get_drvdata(_dev);
>> + WARN_ON(!pdata);
>> + return (pdata && pdata->private_data) ? pdata->private_data : NULL;
>> +}
>> +
>> +static inline void nvhost_set_private_data(struct platform_device *_dev,
>> + void *priv_data)
>> +{
>> + struct nvhost_device_data *pdata =
>> + (struct nvhost_device_data *)platform_get_drvdata(_dev);
>> + WARN_ON(!pdata);
>> + if (pdata)
>> + pdata->private_data = priv_data;
>> +}
>
> You should need none of these. Instead put all the data you need into
> you struct host1x and associate that with the platform device using
> platform_set_drvdata().

I need to actually find a way to do this for both host1x itself, and the
2D module. But as said, I'll try to remove the auxdata and come up with
something better.

>> +static inline
>> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
>> +{
>> + struct platform_device *pdev;
>> +
>> + if (_dev->dev.parent) {
>> + pdev = to_platform_device(_dev->dev.parent);
>> + return nvhost_get_private_data(pdev);
>> + } else
>> + return nvhost_get_private_data(_dev);
>> +}
>> +
>> +static inline
>> +struct platform_device *nvhost_get_parent(struct platform_device *_dev)
>> +{
>> + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
>> +}
>
> These don't seem to be used.

nvhost_get_host() is used in a subsequent patch, but getting parent
doesn't seem to be.

> Usually you don't keep separate variables for subregions. This can
> equally well be done with just adding a corresponding offset.

Hmm, ok, I could do that, but it just sounds logical to have only one
piece of code that finds the sync reg base. I don't really understand
why it needs to be duplicated for every access.

>> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
>> +{
>> + u32 i;
>> + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
>> + u32 max = nvhost_syncpt_read_max(sp, i);
>> + u32 min = nvhost_syncpt_update_min(sp, i);
>> + if (!max && !min)
>> + continue;
>> + dev_info(&syncpt_to_dev(sp)->dev->dev,
>> + "id %d (%s) min %d max %d\n",
>> + i, syncpt_op().name(sp, i),
>> + min, max);
>> +
>> + }
>> +
>> + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
>> + u32 base_val;
>> + host1x_syncpt_read_wait_base(sp, i);
>> + base_val = sp->base_val[i];
>> + if (base_val)
>> + dev_info(&syncpt_to_dev(sp)->dev->dev,
>> + "waitbase id %d val %d\n",
>> + i, base_val);
>> +
>> + }
>> +}
>
> This should probably be integrated with debugfs.

I could move this to debug.c, but it's a debugging aid for when a command
stream is misbehaving and it spews this to UART when sync point wait is
timing out. So not debugfs stuff.

>> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h
>
> Autogenerated files are generally not acceptable. And I already
> mentioned before that you should be using #define instead of static
> inline functions for register and bit definitions.

What's the root cause for autogenerated files not being acceptable? I'm
autogenerating them from definitions I get from hardware, which makes
the results reliable.

I like static inline because I get the benefit of compiler type
checking, and gcov shows me which register definitions have been used in
different tests.

#defines are always messy and I pretty much hate them. But if the
general request is to use #define's, even though I don't agree, I can
accommodate. It's simple to write a sed script to do the conversion.

> This is a very funky way of creating sysfs attributes. What you probably
> want here are device attributes. See Documentation/filesystems/sysfs.txt
> and include/linux/sysfs.h.

Thanks for the pointers, looks like device attributes could be used.

>> +bool nvhost_syncpt_is_expired(
>> + struct nvhost_syncpt *sp,
>> + u32 id,
>> + u32 thresh)
>> +{
>> + u32 current_val;
>> + u32 future_val;
>> + smp_rmb();
>
> What do you need a read memory barrier for?

I'll test without that. I can't see any valid reason, and I have a
couple of other similar problems.

>> +/* Displays the current value of the sync point via sysfs */
>> +static ssize_t syncpt_min_show(struct kobject *kobj,
>> + struct kobj_attribute *attr, char *buf)
>> +{
>> + struct nvhost_syncpt_attr *syncpt_attr =
>> + container_of(attr, struct nvhost_syncpt_attr, attr);
>> +
>> + return snprintf(buf, PAGE_SIZE, "%u",
>> + nvhost_syncpt_read(&syncpt_attr->host->syncpt,
>> + syncpt_attr->id));
>> +}
>> +
>> +static ssize_t syncpt_max_show(struct kobject *kobj,
>> + struct kobj_attribute *attr, char *buf)
>> +{
>> + struct nvhost_syncpt_attr *syncpt_attr =
>> + container_of(attr, struct nvhost_syncpt_attr, attr);
>> +
>> + return snprintf(buf, PAGE_SIZE, "%u",
>> + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
>> + syncpt_attr->id));
>> +}
>
> This looks like it belongs in debugfs.

This is actually the only interface for reading the max value from user space,
which can be useful for doing some comparisons that take wrapping into
account. But we could just add IOCTLs and remove the sysfs entries.

>> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
> [...]
>> +struct host1x_device_info {
>> + int nb_channels; /* host1x: num channels supported */
>> + int nb_pts; /* host1x: num syncpoints supported */
>> + int nb_bases; /* host1x: num syncpoints supported */
>> + u32 client_managed; /* host1x: client managed syncpts */
>> + int nb_mlocks; /* host1x: number of mlocks */
>> + const char **syncpt_names; /* names of sync points */
>> +};
>> +
>> +struct nvhost_device_data {
>> + int version; /* ip version number of device */
>> + int id; /* Separates clients of same hw */
>> + int index; /* Hardware channel number */
>> + void __iomem *aperture; /* Iomem mapped to kernel */
>> +
>> + u32 syncpts; /* Bitfield of sync points used */
>> + u32 modulemutexes; /* Bit field of module mutexes */
>> +
>> + u32 class; /* Device class */
>> + bool serialize; /* Serialize submits in the channel */
>> +
>> + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
>> + bool can_powergate; /* True if module can be power gated */
>> + int clockgate_delay;/* Delay before clock gated */
>> + int powergate_delay;/* Delay before power gated */
>> + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
>> +
>> + struct delayed_work powerstate_down;/* Power state management */
>> + int num_clks; /* Number of clocks opened for dev */
>> + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS];
>> + struct mutex lock; /* Power management lock */
>> + int powerstate; /* Current power state */
>> + int refcount; /* Number of tasks active */
>> + wait_queue_head_t idle_wq; /* Work queue for idle */
>> +
>> + struct nvhost_channel *channel; /* Channel assigned for the module */
>> + struct kobject *power_kobj; /* kobj to hold power sysfs entries */
>> + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */
>> + struct dentry *debugfs; /* debugfs directory */
>> +
>> + void *private_data; /* private platform data */
>> + struct platform_device *pdev; /* owner platform_device */
>> +
>> + /* Finalize power on. Can be used for context restore. */
>> + void (*finalize_poweron)(struct platform_device *dev);
>> +
>> + /* Device is busy. */
>> + void (*busy)(struct platform_device *);
>> +
>> + /* Device is idle. */
>> + void (*idle)(struct platform_device *);
>> +
>> + /* Device is going to be suspended */
>> + void (*suspend_ndev)(struct platform_device *);
>> +
>> + /* Device is initialized */
>> + void (*init)(struct platform_device *dev);
>> +
>> + /* Device is de-initialized. */
>> + void (*deinit)(struct platform_device *dev);
>> +
>> + /* Preparing for power off. Used for context save. */
>> + int (*prepare_poweroff)(struct platform_device *dev);
>> +
>> + /* Clock gating callbacks */
>> + int (*prepare_clockoff)(struct platform_device *dev);
>> + void (*finalize_clockon)(struct platform_device *dev);
>> +};
>
> A lot of this can be removed if you use existing infrastructure and
> simplify the design a bit. Most of it can probably move into the main
> struct host1x to avoid needless indirections that make the code hard to
> follow and maintain.

Actually, this struct is generic for host1x and host1x clients, so its fields
cannot all be moved into struct host1x. I also realize that I'm not using them in
the patch sets I sent for 2D.

Terje
Thierry Reding
2012-11-29 11:47:04 UTC
Permalink
On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> On 28.11.2012 23:23, Thierry Reding wrote:
> > This could be problematic. Since drivers/video and drivers/gpu/drm are
> > separate trees, this would entail a continuous burden on keeping both
> > trees synchronized. While I realize that eventually it might be better
> > to put the host1x driver in a separate place to accomodate for its use
> > by other subsystems, I'm not sure moving it here right away is the best
> > approach.
>
> I understand your point, but I hope also that we'd end up with something
> that can be used as basis for the downstream kernel to migrate to
> upstream stack.
>
> The key point here is to make the API between nvhost and tegradrm as
> small and robust to changes as possible.

I agree. But I also fear that there will be changes eventually and
having both go in via different trees requires those trees to be merged
in a specific order to avoid breakage should the API change. This will
be particularly ugly in linux-next.

That's why I explicitly proposed to take this into drivers/gpu/drm/tegra
for the time being, until we can be reasonably sure that the API is
fixed. Then I'm fine with moving it wherever seems the best fit. Even
then there might be the occasional dependency, but they should get fewer
and fewer as the code matures.

> >> +struct nvhost_syncpt_ops {
> >> + void (*reset)(struct nvhost_syncpt *, u32 id);
> >> + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
> >> + void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
> >> + u32 (*update_min)(struct nvhost_syncpt *, u32 id);
> >> + void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
> >> + void (*debug)(struct nvhost_syncpt *);
> >> + const char * (*name)(struct nvhost_syncpt *, u32 id);
> >> +};
> >
> > Why are these even defined as ops structure? Tegra20 and Tegra30 seem to
> > be compatible when it comes to handling syncpoints. I thought they would
> > even be compatible in all other aspects as well, so why even have this?
>
> Tegra20 and Tegra30 are compatible, but future chips are not. I was
> hoping we would be ready in upstream kernel for future chips.

I think we should ignore that problem for now. Generally planning for
any possible combination of incompatibilities leads to overgeneralized
designs that require precisely these kinds of indirections.

Once some documentation for Tegra 40 materializes we can start thinking
about how to encapsulate the incompatible code.

> >> +#define syncpt_op() (nvhost_get_chip_ops()->syncpt)
> >
> > You really shouldn't be doing this, but rather use explicit accesses for
> > these structures. If your design doesn't scatter these definitions
> > across several files then it isn't difficult to obtain the correct
> > pointers and you don't need these "shortcuts".
>
> Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt
> or nvhost_intr structs?

Not quite. What I'm saying is that unless there is a reason to
encapsulate the functions into an ops structure (for instance because of
incompatibilities across chip generations) they shouldn't be pointers in
a struct at all.

For that matter I don't think you need the nvhost_syncpt and nvhost_intr
structures either.

> >> +bool host1x_powered(struct platform_device *dev)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_powered);
> >> +
> >> +void host1x_busy(struct platform_device *dev)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_busy);
> >> +
> >> +void host1x_idle(struct platform_device *dev)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_idle);
> >
> > These look like a reimplementation of the runtime power-management
> > framework.
>
> Yes, we at some point tried to move to use runtime PM. The first attempt
> was thwarted by runtime PM and system suspend conflicting with each
> other. I believe this is pretty much fixed in later versions of kernel.
>
> Also, the problem was that runtime PM doesn't support multiple power
> states. In downstream kernel, we support clock gating and power gating.
> When we moved to runtime PM and implemented power gating on top of that,
> we ended up with more code than just using the current ACM code.
>
> I have a developer starting to look into how we could take runtime PM
> again into use with proper power gating support. It'll take a while to
> get that right. It might be best to rip the dynamic power management out
> from this patch set, and introduce it later when we have a proper
> runtime PM solution.

Okay, sounds like a plan. Even if it turns out that the current runtime
PM implementation doesn't provide every functionality that you need, we
should try to get these changes into the existing frameworks instead of
copying large chunks of code.
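
For the clock gating part I'd expect something roughly like this to be
enough (a sketch; the gr2d struct and callbacks are invented for
illustration):

#include <linux/clk.h>
#include <linux/pm_runtime.h>

struct gr2d {
        struct clk *clk;
};

static int gr2d_runtime_suspend(struct device *dev)
{
        struct gr2d *gr2d = dev_get_drvdata(dev);

        /* roughly what the current clock gating path does */
        clk_disable_unprepare(gr2d->clk);
        return 0;
}

static int gr2d_runtime_resume(struct device *dev)
{
        struct gr2d *gr2d = dev_get_drvdata(dev);

        return clk_prepare_enable(gr2d->clk);
}

static const struct dev_pm_ops gr2d_pm_ops = {
        SET_RUNTIME_PM_OPS(gr2d_runtime_suspend, gr2d_runtime_resume, NULL)
};

Submission paths would then call pm_runtime_get_sync() and
pm_runtime_put_autosuspend() instead of host1x_busy()/host1x_idle(), and
pm_runtime_set_autosuspend_delay() gives you something very close to the
clockgate_delay behaviour.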

> >> +static void nvhost_free_resources(struct nvhost_master *host)
> >> +{
> >> +}
> >
> > This should be removed since it's empty.
>
> True. I wonder how that happened - there was content since recently, but
> I guess I deleted the code without noticing that the function needs to
> go, too.

I noticed that it was filled with content in one of the subsequent
patches. Depending on how this gets merged eventually you could postpone
adding the function until the later patch. But perhaps once the code has
been properly reviewed we can just squash the patches again. We'll see.

> > Also you should be using platform_get_irq() for interrupts. Furthermore
> > the host1x DT node (and the TRM) name the interrupts "syncpt" and
> > "general", so maybe those would be more useful variable names than
> > "intr0" and "intr1".
> >
> > But since you don't use them anyway they shouldn't be part of this
> > patch.
>
> True. I might also as well delete the general interrupt altogether, as
> we don't use it for any real purpose.

I think it might still be useful for diagnostics. It seems to be used
when writes time out. That could still be helpful information when
debugging problems.

> >> + for (i = 0; i < pdata->num_clks; i++)
> >> + clk_prepare_enable(pdata->clk[i]);
> >> + nvhost_syncpt_reset(&host->syncpt);
> >> + for (i = 0; i < pdata->num_clks; i++)
> >> + clk_disable_unprepare(pdata->clk[i]);
> >
> > Stephen already hinted at this when discussing the AUXDATA. You should
> > explicitly request the clocks.
>
> I'm not too happy about that idea. The clock management code is generic
> for all modules, and that's why it's driven by a data structure. Now
> Stephen and you ask me to unroll the loops and make copies of the code
> to facilitate different modules and different SoCs.

Making this generic for all modules may not be what we want as it
doesn't allow devices to handle things themselves if necessary. Clock
management is just part of the boiler plate that every driver is
supposed to cope with. Also the number of clocks is usually not higher
than 2 or 3, so the pain is manageable. =)

Furthermore doing this in loops may not work for all modules. Some may
require additional delays between enabling the clocks, others may be
able to selectively disable one clock but not the other(s).
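
For a client like gr2d with a single clock this really boils down to a few
lines in .probe() (a sketch, assuming a driver-private struct gr2d):

        gr2d->clk = devm_clk_get(&pdev->dev, NULL);
        if (IS_ERR(gr2d->clk)) {
                dev_err(&pdev->dev, "cannot get clock\n");
                return PTR_ERR(gr2d->clk);
        }

        err = clk_prepare_enable(gr2d->clk);
        if (err < 0) {
                dev_err(&pdev->dev, "cannot enable clock\n");
                return err;
        }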

> >> +static inline void *nvhost_get_private_data(struct platform_device *_dev)
> >> +{
> >> + struct nvhost_device_data *pdata =
> >> + (struct nvhost_device_data *)platform_get_drvdata(_dev);
> >> + WARN_ON(!pdata);
> >> + return (pdata && pdata->private_data) ? pdata->private_data : NULL;
> >> +}
> >> +
> >> +static inline void nvhost_set_private_data(struct platform_device *_dev,
> >> + void *priv_data)
> >> +{
> >> + struct nvhost_device_data *pdata =
> >> + (struct nvhost_device_data *)platform_get_drvdata(_dev);
> >> + WARN_ON(!pdata);
> >> + if (pdata)
> >> + pdata->private_data = priv_data;
> >> +}
> >
> > You should need none of these. Instead put all the data you need into
> > you struct host1x and associate that with the platform device using
> > platform_set_drvdata().
>
> I need to actually find a way to do this for both host1x itself, and the
> 2D module. But as said, I'll try to remove the auxdata and come up with
> something better.

The existing host1x and DRM code could serve as an example since I
explicitly wrote them to behave properly.

> >> +static inline
> >> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
> >> +{
> >> + struct platform_device *pdev;
> >> +
> >> + if (_dev->dev.parent) {
> >> + pdev = to_platform_device(_dev->dev.parent);
> >> + return nvhost_get_private_data(pdev);
> >> + } else
> >> + return nvhost_get_private_data(_dev);
> >> +}
> >> +
> >> +static inline
> >> +struct platform_device *nvhost_get_parent(struct platform_device *_dev)
> >> +{
> >> + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
> >> +}
> >
> > These don't seem to be used.
>
> nvhost_get_host() is used in a subsequent patch, but getting parent
> doesn't seem to be.

Again, if you look at the existing tegra-drm code, the client modules
already use something a bit more explicit to obtain a reference to the
host1x:

struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);

The good thing about it is that it very clearly says where the host1x
pointer should be coming from. Explicitness is good.

> > Usually you don't keep separate variables for subregions. This can
> > equally well be done with just adding a corresponding offset.
>
> Hmm, ok, I could do that, but it just sounds logical to have only one
> piece of code that finds the sync reg base. I don't really understand
> why it needs to be duplicate for every access.

You wouldn't actually be duplicating it. Rather you'd just add another
offset. But I commented on this more explicitly in a reply to one of the
other patches.
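
I.e. something like this (a sketch; whether you wrap it in a helper like the
one below or spell out the offset at each call site is up to you):

static inline void host1x_sync_writel(struct nvhost_master *host,
                                      u32 value, u32 offset)
{
        writel(value, host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE + offset);
}

static void host1x_syncpt_reset(struct nvhost_syncpt *sp, u32 id)
{
        struct nvhost_master *dev = syncpt_to_dev(sp);
        u32 min = nvhost_syncpt_read_min(sp, id);

        host1x_sync_writel(dev, min, host1x_sync_syncpt_0_r() + id * 4);
}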

> >> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
> >> +{
> >> + u32 i;
> >> + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
> >> + u32 max = nvhost_syncpt_read_max(sp, i);
> >> + u32 min = nvhost_syncpt_update_min(sp, i);
> >> + if (!max && !min)
> >> + continue;
> >> + dev_info(&syncpt_to_dev(sp)->dev->dev,
> >> + "id %d (%s) min %d max %d\n",
> >> + i, syncpt_op().name(sp, i),
> >> + min, max);
> >> +
> >> + }
> >> +
> >> + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
> >> + u32 base_val;
> >> + host1x_syncpt_read_wait_base(sp, i);
> >> + base_val = sp->base_val[i];
> >> + if (base_val)
> >> + dev_info(&syncpt_to_dev(sp)->dev->dev,
> >> + "waitbase id %d val %d\n",
> >> + i, base_val);
> >> +
> >> + }
> >> +}
> >
> > This should probably be integrated with debugfs.
>
> > I could move this to debug.c, but it's a debugging aid for when a command
> stream is misbehaving and it spews this to UART when sync point wait is
> timing out. So not debugfs stuff.

Okay, in that case it should stay in. Perhaps convert dev_info() to
dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
guards would also be useful. Maybe not.

> >> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h
> >
> > Autogenerated files are generally not acceptable. And I already
> > mentioned before that you should be using #define instead of static
> > inline functions for register and bit definitions.
>
> What's the root cause for autogenerated files not being acceptable? I'm
> autogenerating them from definitions I get from hardware, which makes
> the results reliable.

The problem is not with autogenerated files in general. The means by
which they are generated are less important. However, autogenerated
files often contain a lot of unneeded definitions and contain things
such as "autogenerated - do not edit" lines.

So generally if you generate the content using some scripts to make sure
it corresponds to what engineering gave you, that's okay as long as you
make sure it has the correct form and doesn't contain any cruft.

> I like static inline because I get the benefit of compiler type
> checking, and gcov shows me which register definitions have been used in
> different tests.

Type checking shouldn't be necessary for simple defines. And I wasn't
aware that you could get the Linux kernel to write out data to be fed to
gcov.

> #defines are always messy and I pretty much hate them. But if the
> general request is to use #define's, even though I don't agree, I can
> accommodate. It's simple to write a sed script to do the conversion.

There are a lot of opportunities to abuse #defines but they are harmless
for register definitions. The Linux kernel is full of them and I haven't
yet seen any code that uses static inline functions for this purpose.

What you need to consider as well is that many people that work with the
Linux kernel expect code to be in a certain style. Register accesses of
the form

writel(value, base + OFFSET);

are very common and expected to look a certain way, so if you write code
that doesn't comply with these guidelines you make it extra hard for
people to read the code. And that'll cost extra time, which people don't
usually have in excess.
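
For reference, the expected form is something like this (the offsets below
are made up, purely to show the shape, and sync_regs stands for whatever
base pointer you end up using):

#define HOST1X_SYNC_SYNCPT(id)          (0x400 + (id) * 4)
#define HOST1X_SYNC_SYNCPT_BASE(id)     (0x600 + (id) * 4)

        writel(min, sync_regs + HOST1X_SYNC_SYNCPT(id));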

> >> +/* Displays the current value of the sync point via sysfs */
> >> +static ssize_t syncpt_min_show(struct kobject *kobj,
> >> + struct kobj_attribute *attr, char *buf)
> >> +{
> >> + struct nvhost_syncpt_attr *syncpt_attr =
> >> + container_of(attr, struct nvhost_syncpt_attr, attr);
> >> +
> >> + return snprintf(buf, PAGE_SIZE, "%u",
> >> + nvhost_syncpt_read(&syncpt_attr->host->syncpt,
> >> + syncpt_attr->id));
> >> +}
> >> +
> >> +static ssize_t syncpt_max_show(struct kobject *kobj,
> >> + struct kobj_attribute *attr, char *buf)
> >> +{
> >> + struct nvhost_syncpt_attr *syncpt_attr =
> >> + container_of(attr, struct nvhost_syncpt_attr, attr);
> >> +
> >> + return snprintf(buf, PAGE_SIZE, "%u",
> >> + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
> >> + syncpt_attr->id));
> >> +}
> >
> > This looks like it belongs in debugfs.
>
> This is actually the only interface to read the max value to user space,
> which can be useful for doing some comparisons that take wrapping into
> account. But we could just add IOCTLs and remove the sysfs entries.

Maybe you can explain the usefulness of this some more. Why would it be
easier to look at them in sysfs than in debugfs? You could be providing
a simple list of syncpoints along with min/max, name, requested status,
etc. in debugfs and it should be as easy to parse for both humans and
machines as sysfs. I don't think IOCTLs would be any gain as they tend
to have higher ABI stability requirements than debugfs (which doesn't
have very strong requirements) or sysfs (which is often considered as a
public ABI as well and therefore needs to be stable).
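
A debugfs file for this is only a few lines (a sketch, reusing the accessors
from your patch):

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/seq_file.h>

static int host1x_syncpt_show(struct seq_file *s, void *data)
{
        struct nvhost_syncpt *sp = s->private;
        unsigned int i;

        for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++)
                seq_printf(s, "%u (%s): min %u max %u\n", i,
                           syncpt_op().name(sp, i),
                           nvhost_syncpt_read_min(sp, i),
                           nvhost_syncpt_read_max(sp, i));

        return 0;
}

static int host1x_syncpt_open(struct inode *inode, struct file *file)
{
        return single_open(file, host1x_syncpt_show, inode->i_private);
}

static const struct file_operations host1x_syncpt_fops = {
        .open = host1x_syncpt_open,
        .read = seq_read,
        .llseek = seq_lseek,
        .release = single_release,
};

/* registered once at probe time: */
debugfs_create_file("syncpoints", S_IRUGO, parent, sp, &host1x_syncpt_fops);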

> >> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
> > [...]
> >> +struct host1x_device_info {
> >> + int nb_channels; /* host1x: num channels supported */
> >> + int nb_pts; /* host1x: num syncpoints supported */
> >> + int nb_bases; /* host1x: num syncpoints supported */
> >> + u32 client_managed; /* host1x: client managed syncpts */
> >> + int nb_mlocks; /* host1x: number of mlocks */
> >> + const char **syncpt_names; /* names of sync points */
> >> +};
> >> +
> >> +struct nvhost_device_data {
> >> + int version; /* ip version number of device */
> >> + int id; /* Separates clients of same hw */
> >> + int index; /* Hardware channel number */
> >> + void __iomem *aperture; /* Iomem mapped to kernel */
> >> +
> >> + u32 syncpts; /* Bitfield of sync points used */
> >> + u32 modulemutexes; /* Bit field of module mutexes */
> >> +
> >> + u32 class; /* Device class */
> >> + bool serialize; /* Serialize submits in the channel */
> >> +
> >> + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
> >> + bool can_powergate; /* True if module can be power gated */
> >> + int clockgate_delay;/* Delay before clock gated */
> >> + int powergate_delay;/* Delay before power gated */
> >> + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
> >> +
> >> + struct delayed_work powerstate_down;/* Power state management */
> >> + int num_clks; /* Number of clocks opened for dev */
> >> + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS];
> >> + struct mutex lock; /* Power management lock */
> >> + int powerstate; /* Current power state */
> >> + int refcount; /* Number of tasks active */
> >> + wait_queue_head_t idle_wq; /* Work queue for idle */
> >> +
> >> + struct nvhost_channel *channel; /* Channel assigned for the module */
> >> + struct kobject *power_kobj; /* kobj to hold power sysfs entries */
> >> + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */
> >> + struct dentry *debugfs; /* debugfs directory */
> >> +
> >> + void *private_data; /* private platform data */
> >> + struct platform_device *pdev; /* owner platform_device */
> >> +
> >> + /* Finalize power on. Can be used for context restore. */
> >> + void (*finalize_poweron)(struct platform_device *dev);
> >> +
> >> + /* Device is busy. */
> >> + void (*busy)(struct platform_device *);
> >> +
> >> + /* Device is idle. */
> >> + void (*idle)(struct platform_device *);
> >> +
> >> + /* Device is going to be suspended */
> >> + void (*suspend_ndev)(struct platform_device *);
> >> +
> >> + /* Device is initialized */
> >> + void (*init)(struct platform_device *dev);
> >> +
> >> + /* Device is de-initialized. */
> >> + void (*deinit)(struct platform_device *dev);
> >> +
> >> + /* Preparing for power off. Used for context save. */
> >> + int (*prepare_poweroff)(struct platform_device *dev);
> >> +
> >> + /* Clock gating callbacks */
> >> + int (*prepare_clockoff)(struct platform_device *dev);
> >> + void (*finalize_clockon)(struct platform_device *dev);
> >> +};
> >
> > A lot of this can be removed if you use existing infrastructure and
> > simplify the design a bit. Most of it can probably move into the main
> > struct host1x to avoid needless indirections that make the code hard to
> > follow and maintain.
>
> Actually, this struct is generic for host1x and host1x clients, so they
> cannot be moved to host1x. I do also realize that I'm not using them in
> the patch sets I sent for 2D.

I've said this before, and I think that this tries to be overly generic.
Display controllers for instance work quite well without an attached
nvhost_channel.

Thierry
Stephen Warren
2012-11-29 18:38:11 UTC
Permalink
On 11/29/2012 04:47 AM, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
>> On 28.11.2012 23:23, Thierry Reding wrote:
>>> This could be problematic. Since drivers/video and
>>> drivers/gpu/drm are separate trees, this would entail a
>>> continuous burden on keeping both trees synchronized. While I
>>> realize that eventually it might be better to put the host1x
>>> driver in a separate place to accomodate for its use by other
>>> subsystems, I'm not sure moving it here right away is the best
>>> approach.
>>
>> I understand your point, but I hope also that we'd end up with
>> something that can be used as basis for the downstream kernel to
>> migrate to upstream stack.
>>
>> The key point here is to make the API between nvhost and tegradrm
>> as small and robust to changes as possible.
>
> I agree. But I also fear that there will be changes eventually and
> having both go in via different tree requires those trees to be
> merged in a specific order to avoid breakage should the API change.
> This will be particularly ugly in linux-next.
>
> That's why I explicitly proposed to take this into
> drivers/gpu/drm/tegra for the time being, until we can be
> reasonably sure that the API is fixed. Then I'm fine with moving it
> wherever seems the best fit. Even then there might be the
> occasional dependency, but they should get fewer and fewer as the
> code matures.

It is acceptable for one maintainer to ack patches, and another
maintainer to merge a series that touches both "their own" code and
code owned by another tree. This should of course only be needed when
inter-module APIs change; changes to code within a module shouldn't
require this.
Thierry Reding
2012-11-30 06:52:34 UTC
Permalink
On Thu, Nov 29, 2012 at 11:38:11AM -0700, Stephen Warren wrote:
> On 11/29/2012 04:47 AM, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> >> On 28.11.2012 23:23, Thierry Reding wrote:
> >>> This could be problematic. Since drivers/video and
> >>> drivers/gpu/drm are separate trees, this would entail a
> >>> continuous burden on keeping both trees synchronized. While I
> >>> realize that eventually it might be better to put the host1x
> >>> driver in a separate place to accomodate for its use by other
> >>> subsystems, I'm not sure moving it here right away is the best
> >>> approach.
> >>
> >> I understand your point, but I hope also that we'd end up with
> >> something that can be used as basis for the downstream kernel to
> >> migrate to upstream stack.
> >>
> >> The key point here is to make the API between nvhost and tegradrm
> >> as small and robust to changes as possible.
> >
> > I agree. But I also fear that there will be changes eventually and
> > having both go in via different tree requires those trees to be
> > merged in a specific order to avoid breakage should the API change.
> > This will be particularly ugly in linux-next.
> >
> > That's why I explicitly proposed to take this into
> > drivers/gpu/drm/tegra for the time being, until we can be
> > reasonably sure that the API is fixed. Then I'm fine with moving it
> > wherever seems the best fit. Even then there might be the
> > occasional dependency, but they should get fewer and fewer as the
> > code matures.
>
> It is acceptable for one maintainer to ack patches, and another
> maintainer to merge a series that touches both "their own" code and
> code owned by another tree. This should of course only be needed when
> inter-module APIs change; changes to code within a module shouldn't
> require this.

Yes, that's true. But it still makes things more complicated since each
of the maintainers will have to do extra work to test the changes.
Anyway we'll see how this plays out. The ideal case would of course be
to get the API right from the start. =)

Thierry
Lucas Stach
2012-11-30 08:50:29 UTC
Permalink
On Thursday, 29.11.2012, 11:38 -0700, Stephen Warren wrote:
> On 11/29/2012 04:47 AM, Thierry Reding wrote:
> > I agree. But I also fear that there will be changes eventually and
> > having both go in via different tree requires those trees to be
> > merged in a specific order to avoid breakage should the API change.
> > This will be particularly ugly in linux-next.
> >
> > That's why I explicitly proposed to take this into
> > drivers/gpu/drm/tegra for the time being, until we can be
> > reasonably sure that the API is fixed. Then I'm fine with moving it
> > wherever seems the best fit. Even then there might be the
> > occasional dependency, but they should get fewer and fewer as the
> > code matures.
>
> It is acceptable for one maintainer to ack patches, and another
> maintainer to merge a series that touches both "their own" code and
> code owned by another tree. This should of course only be needed when
> inter-module APIs change; changes to code within a module shouldn't
> require this.
>

I'm with Thierry here. I think there is a fair chance that we won't get
the API right from the start, even when trying to come up with something
that sounds sane to everyone. It's also not desirable to delay gr2d
going into mainline until we are all completely satisfied with the API.

I also fail to see how host1x module being in the DRM directory hinders
any downstream development. So I'm in favour of keeping host1x alongside
the other DRM components to lower the burden of API changes, and of moving it
out into some more generic directory once we feel confident that the
API is reasonably stable.

Regards,
Lucas
Terje Bergström
2012-12-01 11:44:41 UTC
Permalink
On 30.11.2012 10:50, Lucas Stach wrote:
> I'm with Thierry here. I think there is a fair chance that we won't get
> the API right from the start, even when trying to come up with something
> that sounds sane to everyone. It's also not desirable to delay gr2d
> going into mainline until we are all completely satisfied with the API.
>
> I also fail to see how host1x module being in the DRM directory hinders
> any downstream development. So I'm in favour of keeping host1x besides
> the other DRM components to lower the burden for API changes and move it
> out into some more generic directory, once we feel confident that the
> API is reasonable stable.

host1x module being in DRM directory hinders using nvhost from anywhere
outside DRM in both upstream and downstream. I also don't like first
putting the driver in one place, and then moving it with a huge commit
to another place. We'd just postpone exactly the problems that were
indicated earlier: we'd need to synchronize two trees to remove code in
one and add in another at the same time so that there wouldn't be
conflicting host1x drivers. I'd rather just add it in its final place once,
and be done with it.

But if it's a make-it-or-break-it for upstreaming, I can move it to be a
subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd
modify the MAINTAINERS file so that the tegradrm entry excludes the host1x
sub-directory, and I'd add another one which included only the host1x
sub-directory? The host1x part would be Supported, whereas rest of
tegradrm is Maintained.

Best regards,
Terje
Thierry Reding
2012-12-01 15:10:20 UTC
Permalink
On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
> On 30.11.2012 10:50, Lucas Stach wrote:
> > I'm with Thierry here. I think there is a fair chance that we won't get
> > the API right from the start, even when trying to come up with something
> > that sounds sane to everyone. It's also not desirable to delay gr2d
> > going into mainline until we are all completely satisfied with the API.
> >
> > I also fail to see how host1x module being in the DRM directory hinders
> > any downstream development. So I'm in favour of keeping host1x besides
> > the other DRM components to lower the burden for API changes and move it
> > out into some more generic directory, once we feel confident that the
> > API is reasonable stable.
>
> host1x module being in DRM directory hinders using nvhost from anywhere
> outside DRM in both upstream and downstream.

That's not true. Nothing keeps the rest of the kernel from using an API
exported by the tegra-drm driver.

> I also don't like first putting the driver in one place, and then
> moving it with a huge commit to another place.

Hehe, you're doing exactly that in this patch series. =)

> We'd just postpone exactly the problems that were indicated earlier:
> we'd need to synchronize two trees to remove code in one and add in
> another at the same time so that there wouldn't be conflicting host1x
> drivers. I'd rather just add it in final place once, and be done with
> it.

Yes, there would be a certain amount of synchronization needed, but as
Stephen correctly pointed out we could do that move through one tree
with the Acked-by of the other maintainer. The point is that we need to
do this once instead of every time the API changes.

> But if it's a make-it-or-brake-it for upstreaming, I can move it to be a
> subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd
> modify the MAINTAINER's file so that the tegradrm entry excludes host1x
> sub-directory, and I'd add another one which included only the host1x
> sub-directory? The host1x part would be Supported, whereas rest of
> tegradrm is Maintained.

An entry for drivers/gpu/drm/tegra/host1x would override an entry for
drivers/gpu/drm/tegra so no need to exclude it. That said, there's no
way to exclude a subdirectory in MAINTAINERS that I know of.

My main point for keeping host1x within tegra-drm for now was that it
could possibly help speed up the inclusion of the host1x code. Seeing
that there's still a substantial amount of work to be done and a need
for discussion I'm not sure if rushing this is the best way. In that
case there may be justification for putting it in a separate location
from the start.

Thierry
Terje Bergström
2012-12-01 16:55:04 UTC
Permalink
On 01.12.2012 17:10, Thierry Reding wrote:
> On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
>> host1x module being in DRM directory hinders using nvhost from anywhere
>> outside DRM in both upstream and downstream.
>
> That's not true. Nothing keeps the rest of the kernel from using an API
> exported by the tegra-drm driver.

Right, it's just a directory. I was actually thinking that it'd be weird
if a V4L2 driver would use something from inside drivers/gpu/drm/tegra
(V4L use DRM? Oh nooooo!).

Shoot the idea down if it's crazy, but please think about it first. :-)

I started thinking about this and we are constrained by the Linux kernel
subsystems that have a completely different architecture than the hardware.
This leads to awkward designs such as the DRM design, as it conflicts with
the way hardware works.

Placing host1x driver in one place, DRM driver in another and XYZ driver
in yet another is not ideal either. We're exposing a public API which
needs to be strictly maintained, because we maintain drivers in
different trees, but then again, the list of users is very static and
well-defined, so public API is an overshoot.

How about if we look at this from the hardware architecture point of
view? You mentioned that perhaps drivers/bus/host1x would be the best
place for host1x driver.

What if we put also all host1x client modules under that same directory?
drivers/bus/host1x/drm would be for DRM interface, and all other host1x
client module drivers could be placed similarly. This way we could keep
the host1x API private to host1x and the client module drivers, and it's
easy to understand how host1x is used by just following the directory
structure.

Naturally, we could also think if we want to have sub-components per
host1x client (dc, 2d, etc) and a drm sub-component that implements the
DRM interface, and a V4L2 sub-component that implements V4L2 interface
(when/if I can convince people that camera should go upstream).

>> I also don't like first putting the driver in one place, and then
>> moving it with a huge commit to another place.
>
> Hehe, you're doing exactly that in this patch series. =)

True, I guess it's just a matter of determining what's the best time.

> Yes, there would be a certain amount of synchronization needed, but as
> Stephen correctly pointed out we could do that move through one tree
> with the Acked-by of the other maintainer. The point is that we need =
to
> do this once instead of everytime the API changes.

Yep, inter-tree synchronization is possible, so not a show stopper.

> An entry for drivers/gpu/drm/tegra/host1x would override an entry for
> drivers/gpu/drm/tegra, so there's no need to exclude it. That said, there's no
> way to exclude a subdirectory in MAINTAINERS that I know of.

I saw the X: tag in the MAINTAINERS file, so that could be used. There's
documentation for it, and also some examples, like:

IBM Power Virtual SCSI/FC Device Drivers
M: Robert Jennings <rcj-***@public.gmane.org>
L: linux-scsi-***@public.gmane.org
S: Supported
F: drivers/scsi/ibmvscsi/
X: drivers/scsi/ibmvscsi/ibmvstgt.c
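
Purely to illustrate the mechanism (these are not proposed entries, just
a sketch of the split discussed above):

TEGRA HOST1X DRIVER
M: ...
S: Supported
F: drivers/gpu/drm/tegra/host1x/

DRM DRIVERS FOR NVIDIA TEGRA
M: ...
S: Maintained
F: drivers/gpu/drm/tegra/
X: drivers/gpu/drm/tegra/host1x/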

> My main point for keeping host1x within tegra-drm for now was that it
> could possibly help speed up the inclusion of the host1x code. Seeing
> that there's still a substantial amount of work to be done and a need
> for discussion I'm not sure if rushing this is the best way. In that
> case there may be justification for putting it in a separate location
> from the start.

I'm not in a hurry, so let's try to figure the best design first.
The biggest architectural unsolved problem is the memory management and the
relationship between tegradrm and the host1x driver. What Lucas proposed
about memory management makes sense, but it'll take a while to implement it.

The rest of the unsolved questions are more about differences in
opinion, and solvable.

Terje
Lucas Stach
2012-12-01 17:34:51 UTC
Permalink
On Saturday, 01.12.2012, at 18:55 +0200, Terje Bergström wrote:
> On 01.12.2012 17:10, Thierry Reding wrote:
> > On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
> >> host1x module being in DRM directory hinders using nvhost from anywhere
> >> outside DRM in both upstream and downstream.
> >
> > That's not true. Nothing keeps the rest of the kernel from using an API
> > exported by the tegra-drm driver.
>
> Right, it's just a directory. I was actually thinking that it'd be weird
> if a V4L2 driver would use something from inside drivers/gpu/drm/tegra
> (V4L use DRM? Oh nooooo!).
>
Yes it _is_ weird to have V4L using something which resides inside DRM,
but see below.

> Shoot the idea down if it's crazy, but please think about it first. :-)
>
> I started thinking about this and we are constrained by the Linux kernel
> subsystems, which have a completely different architecture than the hardware.
> This leads to awkward design, as the DRM design conflicts with the way the
> hardware works.
>
> Placing host1x driver in one place, DRM driver in another and XYZ driver
> in yet another is not ideal either. We're exposing a public API which
> needs to be strictly maintained, because we maintain drivers in
> different trees, but then again, the list of users is very static and
> well-defined, so public API is an overshoot.

> How about if we look at this from the hardware architecture point of
> view? You mentioned that perhaps drivers/bus/host1x would be the best
> place for host1x driver.
>
> What if we put also all host1x client modules under that same directory?
> drivers/bus/host1x/drm would be for DRM interface, and all other host1x
> client module drivers could be placed similarly. This way we could keep
> the host1x API private to host1x and the client module drivers, and it's
> easy to understand how host1x is used by just following the directory
> structure.
>
This would certainly make life easier, but personally I don't think it's
the right thing to do. The separation of the Linux kernel into different
subsystems was done for a reason and just because the specific hardware
at hand happens to work a bit differently is no valid reason to break
with the standard rules of the kernel.

So I think there is no way around handling the different drivers that
use host1x in different trees. For the time being there is _only_
tegra-drm using host1x in the upstream kernel. We have to make sure to
come up with some API which is reasonably stable, so we don't run into
big problems later. That's why I'm really in favour of keeping host1x and
tegra-drm side by side in the current upstream, to make sure we can
change the API without jumping through too many hoops.

Your downstream V4L would have to use host1x from the DRM directory, but
really: is your downstream such a nice, clean codebase that you are not
able to cope with the slight ugliness of this solution?

> Naturally, we could also think if we want to have sub-components per
> host1x client (dc, 2d, etc) and a drm sub-component that implements the
> DRM interface, and a V4L2 sub-component that implements V4L2 interface
> (when/if I can convince people that camera should go upstream).
>
To me this sound as if V4L upstream support is still a fair time away.
IMHO the right time to move out host1x is exactly the point when a
second user starts appearing upstream. This will give us some time to
fiddle with the API until we have to commit to it as being stable.

> >> I also don't like first putting the driver in one place, and then
> >> moving it with a huge commit to another place.
> >
> > Hehe, you're doing exactly that in this patch series. =)
>
> True, I guess it's just a matter of determining what's the best time.
>
See above.

[...]
> I'm not in a hurry, so let's try to figure the best design first.
> The biggest architectural unsolved problem is the memory management and the
> relationship between tegradrm and the host1x driver. What Lucas proposed
> about memory management makes sense, but it'll take a while to implement it.

Please make sure to remove any unnecessary cruft from host1x in the
process and don't try to make too big of a step at once. We only need
one type of memory within host1x: native host1x objects, no need to plan
for support of anything else. Also taking over ownership of the IOMMU
address space might take some more work in the IOMMU API. We can leave
this out completely for a start. Both Tegra 2 and 3 should be able to
work with CMA backed objects just fine.

Regards,
Lucas
Terje Bergström
2012-12-01 19:29:54 UTC
Permalink
On 01.12.2012 19:34, Lucas Stach wrote:
> This would certainly make life easier, but personally I don't think it's
> the right thing to do. The separation of the Linux kernel into different
> subsystems was done for a reason and just because the specific hardware
> at hand happens to work a bit differently is no valid reason to break
> with the standard rules of the kernel.
>
> So I think there is no way around handling the different drivers that
> use host1x in different trees. For the time being there is _only_
> tegra-drm using host1x in the upstream kernel. We have to make sure to
> come up with some API which is reasonably stable, so we don't run into
> big problems later. That's why I'm really in favour of keeping host1x and
> tegra-drm side by side in the current upstream, to make sure we can
> change the API without jumping through too many hoops.
>
> Your downstream V4L would have to use host1x from the DRM directory, but
> really: is your downstream such a nice, clean codebase that you are not
> able to cope with the slight ugliness of this solution?

Ok, can do. I'll move the code base to drivers/gpu/drm/tegra/host1x. For
downstream, the host1x driver implements all user space APIs (no drm, no
v4l, etc) so the directory is of no consequence. If we merged the host1x
driver completely into tegra-drm, that'd be a problem, but as long as I can
keep a separation, that's fine.

> Please make sure to remove any unnecessary cruft from host1x in the
> process and don't try to make too big of a step at once. We only need
> one type of memory within host1x: native host1x objects, no need to plan
> for support of anything else. Also taking over ownership of the IOMMU
> address space might take some more work in the IOMMU API. We can leave
> this out completely for a start. Both Tegra 2 and 3 should be able to
> work with CMA backed objects just fine.

Ok, that simplifies the process. I'll just implement a firewall and copy
the stream over to kernel space unconditionally.

Terje
Dave Airlie
2012-12-01 21:42:06 UTC
Permalink
Guys, I think you guys might be overthinking things here.

I know you have some sort of upstream/downstream split, but really in
the upstream kernel, we don't care about that, so don't make it our
problem.

There is no need for any sort of stable API between host1x and the sub
drivers, we change APIs in the kernel the whole time; it isn't a
problem.

If you need to change the API, submit a single patch changing it
across all the drivers in the tree, collecting Acks or not as needed.
We do this the whole time, I've never had or seen a problem with it.

We don't do separate subsystems APIs set in stone bullshit, and all
subsystem maintainers are used to dealing with these sort of issues.
You get an ack from one maintainer and the other one sticks it in his
tree with a note to Linus.

You can put the code where you want, maybe just under drivers/gpu
instead of drivers/video or drivers/gpu/drm, just make sure you have a
path for it into the kernel.

And I have a non-upstream precedent for v4l sitting on drm, some
radeon GPUs have capture tuners, and the only way to implement that
would be to stick a v4l driver in the radeon drm driver. Not a
problem, just never finished writing the code.

Dave.
Thierry Reding
2012-12-01 22:39:13 UTC
Permalink
On Sun, Dec 02, 2012 at 07:42:06AM +1000, Dave Airlie wrote:
> Guys, I think you guys might be overthinking things here.
>
> I know you have some sort of upstream/downstream split, but really in
> the upstream kernel, we don't care about that, so don't make it our
> problem.
>
> There is no need for any sort of stable API between host1x and the sub
> drivers, we change APIs in the kernel the whole time; it isn't a
> problem.

Point taken. I was primarily concerned about needless churn during early
development. But given the latest discussions it has become clear that
there's no need to rush things and therefore we should be able to
resolve any potential issues that could result in churn before the first
patches are merged.

> You can put the code where you want, maybe just under drivers/gpu
> instead of drivers/video or drivers/gpu/drm, just make sure you have a
> path for it into the kernel.

drivers/gpu/host1x sounds like a good location to me. Does that still go
in via your tree?

Thierry
Terje Bergström
2012-12-02 11:24:13 UTC
Permalink
On 01.12.2012 23:42, Dave Airlie wrote:
> Guys, I think you guys might be overthinking things here.
>
> I know you have some sort of upstream/downstream split, but really in
> the upstream kernel, we don't care about that, so don't make it our
> problem.

I am not trying to make anything your problem. Most of the issues we
have already worked out with a good solution that all active
participants have agreed with. We have only a couple of disagreements
with Thierry.

My goal is to get good open source co-operation going and to prevent
a code fork while still maintaining good design. That way everybody
wins. The way to do that is to base our BSP on the upstream kernel.

I'm not trying to throw code over the fence here and flee. This is a
genuine attempt to work together. I want to prevent the "we" (kernel
community excluding NVIDIA) versus "you" (NVIDIA) divide that a split
code base would cause in the long run. I'd like to just talk about "we",
including NVIDIA.

> There is no need for any sort of stable API between host1x and the sub
> drivers, we change APIs in the kernel the whole time; it isn't a
> problem.
>
> If you need to change the API, submit a single patch changing it
> across all the drivers in the tree, collecting Acks or not as needed.
> We do this the whole time, I've never had or seen a problem with it.
>
> We don't do separate subsystems APIs set in stone bullshit, and all
> subsystem maintainers are used to dealing with these sort of issues.
> You get an ack from one maintainer and the other one sticks it in his
> tree with a note to Linus.
>
> You can put the code where you want, maybe just under drivers/gpu
> instead of drivers/video or drivers/gpu/drm, just make sure you have a
> path for it into the kernel.

This follows my thinking exactly, as the location of the host1x driver has
no practical consequence to me.

Thierry proposed drivers/gpu/host1x. I'd like to see a couple of
comments on that proposal, and if it sticks, follow that.

Thierry, did you mean that host1x driver would be in drivers/gpu/host1x,
and tegradrm in drivers/gpu/drm/tegra, or would we put both in same
directory?

> And I have a non-upstream precedent for v4l sitting on drm, some
> radeon GPUs have capture tuners, and the only way to implement that
> would be to stick a v4l driver in the radeon drm driver. Not a
> problem, just never finished writing the code.

Yes, I just mentioned that as awkward, but I have no problem with any path.

Terje
Thierry Reding
2012-12-02 20:55:27 UTC
Permalink
On Sun, Dec 02, 2012 at 01:24:13PM +0200, Terje Bergström wrote:
> On 01.12.2012 23:42, Dave Airlie wrote:
> > Guys, I think you guys might be overthinking things here.
> >
> > I know you have some sort of upstream/downstream split, but really in
> > the upstream kernel, we don't care about that, so don't make it our
> > problem.
>
> I am not trying to make anything your problem. Most of the issues we
> have already worked out with a good solution that all active
> participants have agreed with. We have only a couple of disagreements
> with Thierry.
>
> My goal is to get good open source co-operation going and to prevent
> a code fork while still maintaining good design. That way everybody
> wins. The way to do that is to base our BSP on the upstream kernel.

Yes, that's exactly what you should be doing.

> I'm not trying to throw code over the fence here and flee. This is a
> genuine attempt to work together. I want to prevent the "we" (kernel
> community excluding NVIDIA) versus "you" (NVIDIA) divide that a split
> code base would cause in the long run. I'd like to just talk about "we",
> including NVIDIA.

FWIW I'm convinced that you're genuinely trying to make this work and
nobody welcomes this more than me. However it is only natural if you
dump such a large body of code on the community that people will
disagree with some of the design decisions.

So when I comment on the design or patches in general, it is not my
intention to exclude you or NVIDIA in any way. All I'm trying to do is
spot problematic or unclear parts that will make working with the code
any more difficult than it has to be.

> > There is no need for any sort of stable API between host1x and the sub
> > drivers, we change APIs in the kernel the whole time; it isn't a
> > problem.
> >
> > If you need to change the API, submit a single patch changing it
> > across all the drivers in the tree, collecting Acks or not as needed.
> > We do this the whole time, I've never had or seen a problem with it.
> >
> > We don't do separate subsystems APIs set in stone bullshit, and all
> > subsystem maintainers are used to dealing with these sort of issues.
> > You get an ack from one maintainer and the other one sticks it in his
> > tree with a note to Linus.
> >
> > You can put the code where you want, maybe just under drivers/gpu
> > instead of drivers/video or drivers/gpu/drm, just make sure you have a
> > path for it into the kernel.
>
> Follows exactly my thinking, as the location of host1x driver has no
> practical consequence to me.
>
> Thierry proposed drivers/gpu/host1x. I'd like to see a couple of
> comments on that proposal, and if it sticks, follow that.
>
> Thierry, did you mean that host1x driver would be in drivers/gpu/host1x,
> and tegradrm in drivers/gpu/drm/tegra, or would we put both in same
> directory?

Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can
also live with the host1x driver staying in drivers/video, but I don't
think it's the proper location and drivers/gpu/host1x seems like a much
better fit.

Thierry
Terje Bergström
2012-12-03 06:26:44 UTC
Permalink
On 02.12.2012 22:55, Thierry Reding wrote:
> FWIW I'm convinced that you're genuinely trying to make this work and
> nobody welcomes this more than me. However it is only natural if you
> dump such a large body of code on the community that people will
> disagree with some of the design decisions.
>
> So when I comment on the design or patches in general, it is not my
> intention to exclude you or NVIDIA in any way. All I'm trying to do is
> spot problematic or unclear parts that will make working with the code
> any more difficult than it has to be.

Thanks, I know it's a large dump and you've made great comments about
the code, and hit most of the sore spots I've had with the driver, too.
I appreciate your effort - this process is making the driver better.

It's good to hear that our goals are aligned. So now that we got that
out of the system, let's get back to business. :-)

> Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can
> also live with the host1x driver staying in drivers/video, but I don't
> think it's the proper location and drivers/gpu/host1x seems like a much
> better fit.

That sounds like a plan to me.

Terje
Terje Bergström
2012-11-30 08:56:39 UTC
Permalink
On 29.11.2012 13:47, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
>> Tegra20 and Tegra30 are compatible, but future chips are not. I was
>> hoping we would be ready in upstream kernel for future chips.
>
> I think we should ignore that problem for now. Generally planning for
> any possible combination of incompatibilities leads to overgeneralized
> designs that require precisely these kinds of indirections.
>
> Once some documentation for Tegra 40 materializes we can start thinking
> about how to encapsulate the incompatible code.

I think here our perspectives differ a lot. That is natural considering
the company I work for and company you work for, so let's try to sync
the perspective.

In my reality, whatever is in market is old news and I barely work on
them anymore. Upstreaming activity is the exception. 90% of my time is
spent dealing with future chips which I know cannot be handled without
this split to logical and physical driver parts.

For you, Tegra2 and Tegra3 are the reality.

If the nvhost we move upstream is a bit incompatible, that's fine - like
ripping out features or adding new stuff, such as a new memory type.
All of this I can support with a good diff tool to get all the patches
flowing between upstream and downstream.

If we do fundamental changes that prevent bringing the code back to
downstream, like removing this abstraction, the whole process of
upstream and downstream converging hits a brick wall. We wouldn't have
proper continuing co-operation, but just pushing code out and being done
with it.

> I noticed that it was filled with content in one of the subsequent
> patches. Depending on how this gets merged eventually you could postpone
> adding the function until the later patch. But perhaps once the code has
> been properly reviewed we can just squash the patches again. We'll see.

Ok, thanks.

>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
>
> I think it might still be useful for diagnostics. It seems to be used
> when writes time out. That could still be helpful information when
> debugging problems.

It's actually a stale comment. The client units are not signaling
anything useful with the interrupt. There's use for it in downstream,
but that's irrelevant here.

> Making this generic for all modules may not be what we want as it
> doesn't allow devices to handle things themselves if necessary. Clock
> management is just part of the boiler plate that every driver is
> supposed to cope with. Also the number of clocks is usually not higher
> than 2 or 3, so the pain is manageable. =)
>
> Furthermore doing this in loops may not work for all modules. Some may
> require additional delays between enabling the clocks, others may be
> able to selectively disable one clock but not the other(s).

Yes, but I'll just rip the power management code out, so we can postpone
this until we have validated and verified the runtime PM mechanism
downstream.

>> I could move this to debug.c, but it's debugging aid when a command
>> stream is misbehaving and it spews this to UART when sync point wait is
>> timing out. So not debugfs stuff.
>
> Okay, in that case it should stay in. Perhaps convert dev_info() to
> dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
> guards would also be useful. Maybe not.

I could do that for upstream. In downstream it cannot depend on DEBUG
flag, as these spews are an important part of how we debug problems with
customer devices and the DEBUG flag is never on in customer builds.
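
For upstream, roughly something like this (just a sketch - the config
symbol is the one you suggested, and the printed values are placeholders):

#ifdef CONFIG_TEGRA_HOST1X_DEBUG
	dev_dbg(&pdev->dev, "syncpt %u wait timed out (min %u, max %u)\n",
		id, min, max);
#endif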

> The problem is not with autogenerated files in general. The means by
> which they are generated are less important. However, autogenerated
> files often contain a lot of unneeded definitions and contain things
> such as "autogenerated - do not edit" lines.
>
> So generally if you generate the content using some scripts to make sure
> it corresponds to what engineering gave you, that's okay as long as you
> make sure it has the correct form and doesn't contain any cruft.

I can remove the boilerplate, that's not a problem. In general, we have
tried to be very selective about what we generate, so that it matches
what we're using.

>> I like static inline because I get the benefit of compiler type
>> checking, and gcov shows me which register definitions have been used in
>> different tests.
>
> Type checking shouldn't be necessary for simple defines. And I wasn't
> aware that you could get the Linux kernel to write out data to be fed to
> gcov.
>
>> #defines are always messy and I pretty much hate them. But if the
>> general request is to use #define's, even though I don't agree, I can
>> accommodate. It's simple to write a sed script to do the conversion.
>
> There are a lot of opportunities to abuse #defines but they are harmless
> for register definitions. The Linux kernel is full of them and I haven't
> yet seen any code that uses static inline functions for this purpose.

My problem is just that I know that the code generated is the same. What
we're talking about is whether we should let the preprocessor or the
compiler take care of this.

My take is that using the preprocessor is not wise - it's the last resort if
there's no other proper way of doing things. The preprocessor requires all
sorts of extra parentheses to protect against its deficiencies, and it
is merely a tool to do search-and-replace. Even multi-line definitions need
special treatment.

> What you need to consider as well is that many people that work with the
> Linux kernel expect code to be in a certain style. Register accesses of
> the form
>
> writel(value, base + OFFSET);
>
> are very common and expected to look a certain way, so if you write code
> that doesn't comply with these guidelines you make it extra hard for
> people to read the code. And that'll cost extra time, which people don't
> usually have in excess.

But this has nothing to do with static inline vs. #define anymore, right?

> Maybe you can explain the usefulness of this some more. Why would it be
> easier to look at them in sysfs than in debugfs? You could be providing
> a simple list of syncpoints along with min/max, name, requested status,
> etc. in debugfs and it should be as easy to parse for both humans and
> machines as sysfs. I don't think IOCTLs would be any gain as they tend
> to have higher ABI stability requirements than debugfs (which doesn't
> have very strong requirements) or sysfs (which is often considered as a
> public ABI as well and therefore needs to be stable).

debugfs is just a debugging tool, and user space cannot rely on it. Only
developers can rely on existence of debugfs, as they have the means to
enable it.

sysfs is a place for actual APIs as you mention, and user space can rely
on them as proper APIs. That's what the values were exported for.

> I've said this before, and I think that this tries to be overly generic.
> Display controllers for instance work quite well without an attached
> nvhost_channel.

Yes, these structures aren't meant to be used by anything else than
units that are controlled by the host1x driver. DC, for example,
wouldn't have this.

Terje
Thierry Reding
2012-11-30 10:38:50 UTC
Permalink
On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote:
> On 29.11.2012 13:47, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> >> Tegra20 and Tegra30 are compatible, but future chips are not. I was
> >> hoping we would be ready in upstream kernel for future chips.
> >
> > I think we should ignore that problem for now. Generally planning for
> > any possible combination of incompatibilities leads to overgeneralized
> > designs that require precisely these kinds of indirections.
> >
> > Once some documentation for Tegra 40 materializes we can start thinking
> > about how to encapsulate the incompatible code.
>
> I think here our perspectives differ a lot. That is natural considering
> the company I work for and company you work for, so let's try to sync
> the perspective.
>
> In my reality, whatever is in market is old news and I barely work on
> them anymore. Upstreaming activity is the exception. 90% of my time is
> spent dealing with future chips which I know cannot be handled without
> this split to logical and physical driver parts.
>
> For you, Tegra2 and Tegra3 are the reality.

To be fair, Tegra2 and Tegra3 are the reality for *everyone* *outside*
NVIDIA.

It's great that you spend most of your time working with future chips,
but unless you submit the code for inclusion or review nobody upstream
needs to be concerned about the implications. Most people don't have
time to waste so we naturally try to keep the maintenance burden to a
minimum.

The above implies that when you submit code it shouldn't contain pieces
that prepare for possible future extensions which may or may not be
submitted (the exception being if such changes are part of a series
where subsequent patches actually use them). The outcome is that the
amount of cruft in the mainline kernel is kept to a minimum. And that's
a very good thing.

> If the nvhost we move upstream is a bit incompatible, that's fine - like
> ripping out features or adding new stuff, such as a new memory type.
> All of this I can support with a good diff tool to get all the patches
> flowing between upstream and downstream.
>
> If we do fundamental changes that prevent bringing the code back to
> downstream, like removing this abstraction, the whole process of
> upstream and downstream converging hits a brick wall. We wouldn't have
> proper continuing co-operation, but just pushing code out and being done
> with it.

Generally upstream doesn't concern itself with downstream. However we
still willingly accept code that is submitted for upstream inclusion
independent of where it comes from. The only requirements are that the
code conforms to the established standards and has gone through an
appropriate amount of review. Downstream maintenance is up to you. If
you need to maintain code that doesn't meet the above requirements or
that you don't want to submit or haven't got around to yet that's your
problem.

If you're serious about wanting to derive your downstream kernel from a
mainline kernel, then the only realistic way for you to reduce your
amount of work is to push your code upstream. And typically the earlier
you do so, the better.

> >> I could move this to debug.c, but it's debugging aid when a command
> >> stream is misbehaving and it spews this to UART when sync point wait is
> >> timing out. So not debugfs stuff.
> >
> > Okay, in that case it should stay in. Perhaps convert dev_info() to
> > dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
> > guards would also be useful. Maybe not.
>
> I could do that for upstream. In downstream it cannot depend on DEBUG
> flag, as these spews are an important part of how we debug problems with
> customer devices and the DEBUG flag is never on in customer builds.

So I've just looked through these patches once more and I can't find
where this functionality is actually used. The host1x_syncpt_debug()
function is assigned to the nvhost_syncpt_ops.debug member, which in
turn is only used by nvhost_syncpt_debug(). The latter, however is
never used (not even by the debug infrastructure introduced in patch
4).

> >> I like static inline because I get the benefit of compiler type
> >> checking, and gcov shows me which register definitions have been used in
> >> different tests.
> >
> > Type checking shouldn't be necessary for simple defines. And I wasn't
> > aware that you could get the Linux kernel to write out data to be fed to
> > gcov.
> >
> >> #defines are always messy and I pretty much hate them. But if the
> >> general request is to use #define's, even though I don't agree, I can
> >> accommodate. It's simple to write a sed script to do the conversion.
> >
> > There are a lot of opportunities to abuse #defines but they are harmless
> > for register definitions. The Linux kernel is full of them and I haven't
> > yet seen any code that uses static inline functions for this purpose.
>
> My problem is just that I know that the code generated is the same. What
> we're talking about is whether we should let the preprocessor or the
> compiler take care of this.
>
> My take is that using the preprocessor is not wise - it's the last resort if
> there's no other proper way of doing things. The preprocessor requires all
> sorts of extra parentheses to protect against its deficiencies, and it
> is merely a tool to do search-and-replace. Even multi-line definitions need
> special treatment.

Okay, so what you're saying here is that a huge number of people haven't
been wise in using the preprocessor for register definitions all these
years. That's a pretty bold statement. Now I obviously haven't looked at
every single line in the kernel, but I have never come across this usage
for static inline functions used for this. So, to be honest, I don't
think this is really up for discussion. Of course if you come up with an
example where this is done in a similar way I could be persuaded
otherwise.

> > What you need to consider as well is that many people that work with the
> > Linux kernel expect code to be in a certain style. Register accesses of
> > the form
> >
> > writel(value, base + OFFSET);
> >
> > are very common and expected to look a certain way, so if you write code
> > that doesn't comply with these guidelines you make it extra hard for
> > people to read the code. And that'll cost extra time, which people don't
> > usually have in excess.
>
> But this has nothing to do with static inline vs. #define anymore, right?

Of course it has. With the way you've chosen to define registers the
code will look like this:

writel(value, base + offset_r())

Maybe it's just me, but when I read code like that I need additional
time to parse it as opposed to the canonical form.

> > Maybe you can explain the usefulness of this some more. Why would it be
> > easier to look at them in sysfs than in debugfs? You could be providing
> > a simple list of syncpoints along with min/max, name, requested status,
> > etc. in debugfs and it should be as easy to parse for both humans and
> > machines as sysfs. I don't think IOCTLs would be any gain as they tend
> > to have higher ABI stability requirements than debugfs (which doesn't
> > have very strong requirements) or sysfs (which is often considered as a
> > public ABI as well and therefore needs to be stable).
>
> debugfs is just a debugging tool, and user space cannot rely on it. Only
> developers can rely on existence of debugfs, as they have the means to
> enable it.
>
> sysfs is a place for actual APIs as you mention, and user space can rely
> on them as proper APIs. That's what the values were exported for.

But I don't see how that's relevant here. Let me quote what you said
originally:

> This is actually the only interface to read the max value to user space,
> which can be useful for doing some comparisons that take wrapping into
> account. But we could just add IOCTLs and remove the sysfs entries.

To me that sounded like it was only used for debugging purposes. If you
actually need to access this from a userspace driver then, as opposed to
what I said earlier, this should be handled by some IOCTL.

Thierry
Terje Bergström
2012-12-01 11:31:02 UTC
Permalink
On 30.11.2012 12:38, Thierry Reding wrote:
> The above implies that when you submit code it shouldn't contain pieces
> that prepare for possible future extensions which may or may not be
> submitted (the exception being if such changes are part of a series
> where subsequent patches actually use them). The outcome is that the
> amount of cruft in the mainline kernel is kept to a minimum. And that's
> a very good thing.

We're now actually talking about a separation of the logical and physical
driver. I can't see why that's a bad thing, especially considering that
it's standard practice in well written drivers. Let's try to find a
technically clean solution instead of debating politics. The latter should
never be part of Linux kernel reviews.

>> I could do that for upstream. In downstream it cannot depend on DEBUG
>> flag, as these spews are an important part of how we debug problems with
>> customer devices and the DEBUG flag is never on in customer builds.
>
> So I've just looked through these patches once more and I can't find
> where this functionality is actually used. The host1x_syncpt_debug()
> function is assigned to the nvhost_syncpt_ops.debug member, which in
> turn is only used by nvhost_syncpt_debug(). The latter, however is
> never used (not even by the debug infrastructure introduced in patch
> 4).

I have accidentally used the syncpt_op().debug() version directly. I'll
fix that.

> Okay, so what you're saying here is that a huge number of people haven't
> been wise in using the preprocessor for register definitions all these
> years. That's a pretty bold statement. Now I obviously haven't looked at
> every single line in the kernel, but I have never come across this usage
> for static inline functions used for this. So, to be honest, I don't
> think this is really up for discussion. Of course if you come up with an
> example where this is done in a similar way I could be persuaded
> otherwise.

We must've talked about a bit different things. For pure register defs,
I can accommodate changing to #defines. We'd lose the code coverage
analysis, though, but if the parentheses are a make-or-break question to
upstreaming, I can change.

I was thinking of definitions like this:

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
{
	return (v & 0x1ff) << 0;
}

versus

#define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) (((v) & 0x1ff) << 0)

Both of these produce the same machine code and have the same usage, but the
former has type checking and code coverage analysis and is (in my eyes)
clearer. In both of these cases the usage is like this:

writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
       | host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
       | host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
       m->sync_aperture + host1x_sync_cfpeek_ctrl_r());

> But I don't see how that's relevant here. Let me quote what you said
> originally:
>
>> This is actually the only interface to read the max value to user space,
>> which can be useful for doing some comparisons that take wrapping into
>> account. But we could just add IOCTLs and remove the sysfs entries.
>
> To me that sounded like it was only used for debugging purposes. If you
> actually need to access this from a userspace driver then, as opposed to
> what I said earlier, this should be handled by some IOCTL.

There's a use for production code to know both the max and min, but I
think we can just scope that use out from this patch set.

User space can use these two for checking if one of their fences has
already passed by comparing if the current value is between min and
fence, taking wrapping into account. In these cases user space can f.ex.
leave a host1x wait out from a command stream.
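
As a sketch (plain C, names made up), the wrap-aware check user space
would do is something like this:

#include <stdbool.h>
#include <stdint.h>

/* 'min' and 'max' are the values read from the kernel, 'fence' is the
 * threshold the command stream would otherwise wait for. Values in the
 * window (min, max] are still in the future; everything else has already
 * been reached. Unsigned arithmetic takes care of wrapping. */
static bool syncpt_fence_passed(uint32_t fence, uint32_t min, uint32_t max)
{
	return (uint32_t)(max - fence) >= (uint32_t)(max - min);
}

If this returns true, the wait can be dropped from the command stream.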

Terje
Daniel Vetter
2012-12-01 13:42:18 UTC
Permalink
On Sat, Dec 1, 2012 at 12:31 PM, Terje Bergström <***@nvidia.com> wrote:
> We must've talked about a bit different things. For pure register defs,
> I can accommodate changing to #defines. We'd lose the code coverage
> analysis, though, but if the parentheses are a make-or-break question to
> upstreaming, I can change.

Out of sheer curiosity: What are you using the coverage data of these
register definitions for? When I looked into coverage analysis the
resulting data seemed rather useless to me, since the important thing
is how well we cover the entire dynamic state space of the hw+sw (e.g.
crap left behind by the bios ...) and coverage seemed to be a poor
proxy for that. Hence why I wonder what you're doing with this data...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Terje Bergström
2012-12-01 16:22:00 UTC
Permalink
On 01.12.2012 15:42, Daniel Vetter wrote:
> Out of sheer curiosity: What are you using the coverage data of these
> register definitions for? When I looked into coverage analysis the
> resulting data seemed rather useless to me, since the important thing
> is how well we cover the entire dynamic state space of the hw+sw (e.g.
> crap left behind by the bios ...) and coverage seemed to be a poor
> proxy for that. Hence why I wonder what you're doing with this data

Yes, it's a poor proxy. But still, I use it to determine how big
portions of hardware address space and fields I'm touching when running
host1x tests. It's interesting data for planning tests, but not much more.

Best regards,
Terje
Thierry Reding
2012-12-01 14:58:14 UTC
Permalink
On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote:
> On 30.11.2012 12:38, Thierry Reding wrote:
> > The above implies that when you submit code it shouldn't contain pieces
> > that prepare for possible future extensions which may or may not be
> > submitted (the exception being if such changes are part of a series
> > where subsequent patches actually use them). The outcome is that the
> > amount of cruft in the mainline kernel is kept to a minimum. And that's
> > a very good thing.
>
> We're now actually talking about a separation of the logical and physical
> driver. I can't see why that's a bad thing, especially considering that
> it's standard practice in well written drivers. Let's try to find a
> technically clean solution instead of debating politics. The latter should
> never be part of Linux kernel reviews.

I don't know where you see politics in what I said. All I'm saying is
that we shouldn't be making things needlessly complex. In my experience
the technically cleanest solution is usually the one with the least
complexity.

> > Okay, so what you're saying here is that a huge number of people haven't
> > been wise in using the preprocessor for register definitions all these
> > years. That's a pretty bold statement. Now I obviously haven't looked at
> > every single line in the kernel, but I have never come across this usage
> > for static inline functions used for this. So, to be honest, I don't
> > think this is really up for discussion. Of course if you come up with an
> > example where this is done in a similar way I could be persuaded
> > otherwise.
>
> We must've talked about a bit different things. For pure register defs,
> I can accommodate changing to #defines. We'd lose the code coverage
> analysis, though, but if the parentheses are a make-or-break question to
> upstreaming, I can change.
>
> I was thinking of definitions like this:
>
> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
> {
> return (v & 0x1ff) << 0;
> }
>
> versus
>
> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) (((v) & 0x1ff) << 0)
>
> Both of these produce the same machine code and have the same usage, but the
> former has type checking and code coverage analysis and is
> (in my eyes) clearer. In both of these cases the usage is like this:
>
> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
> | host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
> | host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
> m->sync_aperture + host1x_sync_cfpeek_ctrl_r());

Again there's no precedent for doing this with static inline functions.
You can do the same with macros. Type checking isn't an issue in these
cases since we're talking about bitfields for which no proper type
exists.

Two other things about the examples above: the definitions should be all
caps and it would be nice if they could be made a bit shorter.
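
Roughly along these lines, say (the addr field layout is the one from your
example; the other fields and the register offset are only placeholders to
show the style):

#define HOST1X_SYNC_CFPEEK_CTRL			0x74c	/* offset: placeholder */
#define HOST1X_SYNC_CFPEEK_CTRL_ENA		(1 << 31)	/* placeholder */
#define HOST1X_SYNC_CFPEEK_CTRL_CHANNR(v)	(((v) & 0xf) << 16)	/* placeholder */
#define HOST1X_SYNC_CFPEEK_CTRL_ADDR(v)		(((v) & 0x1ff) << 0)

writel(HOST1X_SYNC_CFPEEK_CTRL_ENA |
       HOST1X_SYNC_CFPEEK_CTRL_CHANNR(chid) |
       HOST1X_SYNC_CFPEEK_CTRL_ADDR(rd_ptr),
       m->sync_aperture + HOST1X_SYNC_CFPEEK_CTRL);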

> > But I don't see how that's relevant here. Let me quote what you said
> > originally:
> >
> >> This is actually the only interface to read the max value to user space,
> >> which can be useful for doing some comparisons that take wrapping into
> >> account. But we could just add IOCTLs and remove the sysfs entries.
> >
> > To me that sounded like it was only used for debugging purposes. If you
> > actually need to access this from a userspace driver then, as opposed to
> > what I said earlier, this should be handled by some IOCTL.
>
> There's a use for production code to know both the max and min, but I
> think we can just scope that use out from this patch sest.
>
> User space can use these two for checking if one of their fences has
> already passed by comparing if the current value is between min and
> fence, taking wrapping into account. In these cases user space can f.ex.
> leave a host1x wait out from a command stream.

But you already have extra code in the kernel to patch out expired sync-
points. Is it really worth the added effort to burden userspace with
this? If so I still think some kind of generic IOCTL to retrieve
information about a syncpoint would be better than a sysfs interface.
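
Something as simple as this would do (sketch only; the names and fields
are made up, it would live in the tegradrm UAPI header):

struct drm_tegra_syncpt_info {
	__u32 id;	/* in: syncpoint to query */
	__u32 min;	/* out: current hardware value */
	__u32 max;	/* out: highest queued value */
};

exposed through a single DRM ioctl on the tegradrm device, instead of one
sysfs file per value.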

Thierry
Terje Bergström
2012-12-01 17:13:49 UTC
Permalink
On 01.12.2012 16:58, Thierry Reding wrote:
> I don't know where you see politics in what I said. All I'm saying is
> that we shouldn't be making things needlessly complex. In my experience
> the technically cleanest solution is usually the one with the least
> complexity.

Let me come up with a proposal and let's then see where to go next.

> But you already have extra code in the kernel to patch out expired sync-
> points. Is it really worth the added effort to burden userspace with
> this? If so I still think some kind of generic IOCTL to retrieve
> information about a syncpoint would be better than a sysfs interface.

That's exactly why I mentioned that it's not useful to upstream. There
are some cases where user space might want to check if a fence has
passed without waiting for it, but that's marginal and could be handled
even with waits with zero timeout.

Terje
Stephen Warren
2012-12-03 19:23:32 UTC
Permalink
On 12/01/2012 07:58 AM, Thierry Reding wrote:
> On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote:
...
>> I was thinking of definitions like this:
>>
>> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) {
>> return (v & 0x1ff) << 0; }
>>
>> versus
>>
>> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) (((v) & 0x1ff) << 0)
>>
>> Both of these produce the same machine code and have the same usage,
>> but the former has type checking and code coverage analysis and
>> is (in my eyes) clearer. In both of these cases the
>> usage is like this:
>>
>> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) |
>> host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) |
>> host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture +
>> host1x_sync_cfpeek_ctrl_r());
>
> Again there's no precedent for doing this with static inline
> functions. You can do the same with macros. Type checking isn't an
> issue in these cases since we're talking about bitfields for which
> no proper type exists.

I suspect the inline functions could encode signed-vs-unsigned fields,
perhaps catch u8 variables when they should have been u32, etc.?
Thierry Reding
2012-12-04 21:31:53 UTC
Permalink
On Mon, Dec 03, 2012 at 12:23:32PM -0700, Stephen Warren wrote:
> On 12/01/2012 07:58 AM, Thierry Reding wrote:
> > On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote:
> ...
> >> I was thinking of definitions like this:
> >>
> >> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) {
> >> return (v & 0x1ff) << 0; }
> >>
> >> versus
> >>
> >> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) (((v) & 0x1ff) << 0)
> >>
> >> Both of these produce the same machine code and have the same usage,
> >> but the former has type checking and code coverage analysis and
> >> is (in my eyes) clearer. In both of these cases the
> >> usage is like this:
> >>
> >> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) |
> >> host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) |
> >> host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture +
> >> host1x_sync_cfpeek_ctrl_r());
> >
> > Again there's no precedent for doing this with static inline
> > functions. You can do the same with macros. Type checking isn't an
> > issue in these cases since we're talking about bitfields for which
> > no proper type exists.
>
> I suspect the inline functions could encode signed-vs-unsigned fields,
> perhaps catch u8 variables when they should have been u32, etc.?

I don't see how this would be relevant here. These definitions are only
used in the driver internally and not a public API, therefore none of
those checks should really be needed. If somebody writes code for this
driver and uses the register definitions, they better know what they're
doing. Or at least wrong usage should be filtered out through review.

In my opinion the consistency with how other drivers are written far
outweighs the benefits provided by inline functions. That said, I'm out
of arguments and I don't have a final say anyway, so if it is decided
to stick with inline functions I can find a way to live with them.

Thierry
Stephen Warren
2012-12-03 19:20:30 UTC
Permalink
On 11/30/2012 03:38 AM, Thierry Reding wrote:
> On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote:
>> On 29.11.2012 13:47, Thierry Reding wrote:
>>> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström
>>> wrote:
>>>> Tegra20 and Tegra30 are compatible, but future chips are not.
>>>> I was hoping we would be ready in upstream kernel for future
>>>> chips.
>>>
>>> I think we should ignore that problem for now. Generally
>>> planning for any possible combination of incompatibilities
>>> leads to overgeneralized designs that require precisely these
>>> kinds of indirections.
>>>
>>> Once some documentation for Tegra 40 materializes we can start
>>> thinking about how to encapsulate the incompatible code.
>>
>> I think here our perspectives differ a lot. That is natural
>> considering the company I work for and company you work for, so
>> let's try to sync the perspective.
>>
>> In my reality, whatever is in market is old news and I barely
>> work on them anymore. Upstreaming activity is the exception. 90%
>> of my time is spent dealing with future chips which I know cannot
>> be handled without this split to logical and physical driver
>> parts.
>>
>> For you, Tegra2 and Tegra3 are the reality.
>
> To be fair, Tegra2 and Tegra3 are the reality for *everyone*
> *outside* NVIDIA.
>
> It's great that you spend most of your time working with future
> chips, but unless you submit the code for inclusion or review
> nobody upstream needs to be concerned about the implications. Most
> people don't have time to waste so we naturally try to keep the
> maintenance burden to a minimum.
>
> The above implies that when you submit code it shouldn't contain
> pieces that prepare for possible future extensions which may or may
> not be submitted (the exception being if such changes are part of a
> series where subsequent patches actually use them). The outcome is
> that the amount of cruft in the mainline kernel is kept to a
> minimum. And that's a very good thing.

I think there's room for letting Terje's complete knowledge of future
chips guide the design of the current code that's sent upstream.
Certainly we shouldn't add a ton of unnecessary abstraction layers
right now that aren't needed for Tegra20/30, but if there's some
decision that doesn't affect the bloat, opaqueness, ... of the current
code but one choice is better for future development without serious
negatives for the current code, it's pretty reasonable to make that
decision rather than the other.

(That all said, I haven't really followed the details of this
particular point, so I can't say how my comment applies to any
decisions being made right now - just that we shouldn't blanket reject
future knowledge when making decisions)

After all, making the right decision now will reduce the number/size
of patches later, and hence reduce code churn and reviewer load.
Thierry Reding
2012-12-03 21:03:29 UTC
Permalink
On Mon, Dec 03, 2012 at 12:20:30PM -0700, Stephen Warren wrote:
> On 11/30/2012 03:38 AM, Thierry Reding wrote:
> > On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote:
> >> On 29.11.2012 13:47, Thierry Reding wrote:
> >>> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström
> >>> wrote:
> >>>> Tegra20 and Tegra30 are compatible, but future chips are not.
> >>>> I was hoping we would be ready in upstream kernel for future
> >>>> chips.
> >>>
> >>> I think we should ignore that problem for now. Generally
> >>> planning for any possible combination of incompatibilities
> >>> leads to overgeneralized designs that require precisely these
> >>> kinds of indirections.
> >>>
> >>> Once some documentation for Tegra 40 materializes we can start
> >>> thinking about how to encapsulate the incompatible code.
> >>
> >> I think here our perspectives differ a lot. That is natural
> >> considering the company I work for and company you work for, so
> >> let's try to sync the perspective.
> >>
> >> In my reality, whatever is in market is old news and I barely
> >> work on them anymore. Upstreaming activity is the exception. 90%
> >> of my time is spent dealing with future chips which I know cannot
> >> be handled without this split to logical and physical driver
> >> parts.
> >>
> >> For you, Tegra2 and Tegra3 are the reality.
> >
> > To be fair, Tegra2 and Tegra3 are the reality for *everyone*
> > *outside* NVIDIA.
> >
> > It's great that you spend most of your time working with future
> > chips, but unless you submit the code for inclusion or review
> > nobody upstream needs to be concerned about the implications. Most
> > people don't have time to waste so we naturally try to keep the
> > maintenance burden to a minimum.
> >
> > The above implies that when you submit code it shouldn't contain
> > pieces that prepare for possible future extensions which may or may
> > not be submitted (the exception being if such changes are part of a
> > series where subsequent patches actually use them). The outcome is
> > that the amount of cruft in the mainline kernel is kept to a
> > minimum. And that's a very good thing.
>
> I think there's room for letting Terje's complete knowledge of future
> chips guide the design of the current code that's sent upstream.
> Certainly we shouldn't add a ton of unnecessary abstraction layers
> right now that aren't needed for Tegra20/30, but if there's some
> decision that doesn't affect the bloat, opaqueness, ... of the current
> code but one choice is better for future development without serious
> negatives for the current code, it's pretty reasonable to make that
> decision rather than the other.

The original point was that the current design stashes every function of
host1x into an ops structure and you have to go through those ops to get
at the functionality. I can understand the need to add an ops structure
to cope with incompatibilities between versions, but as you say there
should be a reason for them being introduced. If such reasons exist,
then I think they at least warrant a comment somewhere.

Furthermore this is usually best handled by wrapping the ops accesses in
a public API, so that the ops structure can be hidden within the driver.
For example, submitting a job to a channel should have a public API such
as:

int host1x_channel_submit(struct host1x_channel *channel,
			  struct host1x_job *job)
{
	...
}

An initial implementation would just add the code into this function. If
it turns out some future version requires special incantations to submit
a job, only then should we introduce an ops structure, with only the one
function:

struct host1x_channel_ops {
	int (*submit)(struct host1x_channel *channel,
		      struct host1x_job *job);
};

But since only the public API above has been used, access to the special
implementation can be hidden from the user. So the public function could
be modified in this way:

int host1x_channel_submit(struct host1x_channel *channel,
			  struct host1x_job *job)
{
	if (channel->ops && channel->ops->submit)
		return channel->ops->submit(channel, job);

	...
}

And then you have two choices: either you keep the code for previous
generations after the if block or you provide a separate ops structure
for older generations as well and handle them via the same code path.

One other thing that such a design can help with is refactoring common
code or parameterizing code. Maybe newer generations are not compatible
but can easily be made to work with existing code by introducing a
variable such as register stride or something.
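
As a rough sketch (all the names and fields below are made up purely for
illustration):

struct host1x_info {
	unsigned int sync_offset;	/* start of the sync register block */
	unsigned int nb_channels;
};

/* assumes struct host1x has 'regs' (iomem base) and 'info' pointers */
static inline void host1x_sync_writel(struct host1x *host, u32 value,
				      unsigned int offset)
{
	writel(value, host->regs + host->info->sync_offset + offset);
}

with a per-SoC host1x_info picked at probe time, which keeps the common
code path identical across generations.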

What's really difficult to follow is if an ops structure is accessed via
some global macro. It also breaks encapsulation because you have a
global ops structure. That may even work fine for now, but it will break
once you have more than a single host1x in a system. I know this will
never happen, but all of a sudden it happens anyway and the code breaks.
Doing this right isn't very hard and it will lead to a better design and
is less likely to break at some point.

Thierry
Mark Zhang
2012-12-04 02:08:41 UTC
Permalink
On 12/04/2012 05:03 AM, Thierry Reding wrote:
[...]
>> I think there's room for letting Terje's complete knowledge of future
>> chips guide the design of the current code that's sent upstream.
>> Certainly we shouldn't add a ton of unnecessary abstraction layers
>> right now that aren't needed for Tegra20/30, but if there's some
>> decision that doesn't affect the bloat, opaqueness, ... of the current
>> code but one choice is better for future development without serious
>> negatives for the current code, it's pretty reasonable to make that
>> decision rather than the other.
>
> The original point was that the current design stashes every function of
> host1x into an ops structure and you have to go through those ops to get
> at the functionality. I can understand the need to add an ops structure
> to cope with incompatibilities between versions, but as you say there
> should be a reason for them being introduced. If such reasons exist,
> then I think they at least warrant a comment somewhere.
>
> Furthermore this is usually best handled by wrapping the ops accesses in
> a public API, so that the ops structure can be hidden within the driver.
> For example, submitting a job to a channel should have a public API such
> as:
>
> int host1x_channel_submit(struct host1x_channel *channel,
> struct host1x_job *job)
> {
> ...
> }
>
> An initial implementation would just add the code into this function. If
> it turns out some future version requires special incantations to submit
> a job, only then should we introduce an ops structure, with only the one
> function:
>
> struct host1x_channel_ops {
> int (*submit)(struct host1x_channel *channel,
> struct host1x_job *job);
> };
>
> But since only the public API above has been used, access to the special
> implementation can be hidden from the user. So the public function could
> be modified in this way:
>
> int host1x_channel_submit(struct host1x_channel *channel,
> struct host1x_job *job)
> {
> if (channel->ops && channel->ops->submit)
> return channel->ops->submit(channel, job);
>
> ...
> }
>

I guess we did it in exactly this way at the beginning. Then we
realized that we needed to define callbacks to handle the different
logic on different Tegra generations. That's why the code has lots of
function ops right now.

If so, this would mean Terje has to modify the code back towards the
original version, which is not interesting work.

Just my personal guess, no offence.

Mark
> And then you have two choices: either you keep the code for previous
> generations after the if block or you provide a separate ops structure
> for older generations as well and handle them via the same code path.
>
> One other thing that such a design can help with is refactoring common
> code or parameterizing code. Maybe newer generations are not compatible
> but can easily be made to work with existing code by introducing a
> variable such as register stride or something.
>
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.
>
> Thierry
Mark Zhang
2012-12-04 02:11:54 UTC
Permalink
On 12/04/2012 05:03 AM, Thierry Reding wrote:
[...]
>
> One other thing that such a design can help with is refactoring common
> code or parameterizing code. Maybe newer generations are not compatible
> but can easily be made to work with existing code by introducing a
> variable such as register stride or something.
>
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.
>

Sorry I forget to reply this in last mail...

Agreed. Even in userspace programs we should avoid global variables as
much as possible. So we need to think about this and try to reduce the
number of global variables.

Mark
> Thierry
Terje Bergström
2012-12-04 06:17:23 UTC
Permalink
On 03.12.2012 23:03, Thierry Reding wrote:
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.

I agree that the chip ops access goes through too much indirection and
macro magic (which I already declared I hate), so we're going to design
something simpler.

Terje
Stephen Warren
2012-11-29 18:34:14 UTC
Permalink
On 11/29/2012 03:21 AM, Terje Bergström wrote:
> On 28.11.2012 23:23, Thierry Reding wrote:
...
>>> + regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
>>> + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
>>> + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
>>> +
>>> + if (!regs || !intr0 || !intr1) {
>>
>> I prefer to have these checked for explicitly, one by one for
>> readability and potentially more useful diagnostics.
>
> Can do.
>
>> Also you should be using platform_get_irq() for interrupts. Furthermore
>> the host1x DT node (and the TRM) name the interrupts "syncpt" and
>> "general", so maybe those would be more useful variable names than
>> "intr0" and "intr1".
>>
>> But since you don't use them anyway they shouldn't be part of this
>> patch.
>
> True. I might also as well delete the general interrupt altogether, as
> we don't use it for any real purpose.

Do make sure the interrupts still are part of the DT binding though, so
that the binding fully describes the HW, and the interrupt is available
to retrieve if we ever do use it in the future.

>>> + for (i = 0; i < pdata->num_clks; i++)
>>> + clk_prepare_enable(pdata->clk[i]);
>>> + nvhost_syncpt_reset(&host->syncpt);
>>> + for (i = 0; i < pdata->num_clks; i++)
>>> + clk_disable_unprepare(pdata->clk[i]);
>>
>> Stephen already hinted at this when discussing the AUXDATA. You should
>> explicitly request the clocks.
>
> I'm not too happy about that idea. The clock management code is generic
> for all modules, and that's why it's driven by a data structure. Now
> Stephen and you ask me to unroll the loops and make copies of the code
> to facilitate different modules and different SoCs.

You can still create tables of clocks inside the driver and loop over
them. So, loop unrolling isn't related to my comments at least. It's
just that clk_get() shouldn't take its parameters from platform data.

But if these are clocks for (arbitrary) child modules (that may or may
not exist dynamically), why aren't the drivers for the child modules
managing them?
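
Something along these lines inside the driver is what I have in mind (a
sketch only; the clock names are just assumptions):

static const char * const gr2d_clock_names[] = { "gr2d", "epp" };

static int gr2d_get_clocks(struct platform_device *pdev,
                           struct clk *clocks[])
{
        unsigned int i;

        for (i = 0; i < ARRAY_SIZE(gr2d_clock_names); i++) {
                /* fixed, driver-owned names instead of platform data */
                clocks[i] = devm_clk_get(&pdev->dev, gr2d_clock_names[i]);
                if (IS_ERR(clocks[i]))
                        return PTR_ERR(clocks[i]);
        }

        return 0;
}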
Terje Bergström
2012-11-30 06:54:32 UTC
Permalink
On 29.11.2012 20:34, Stephen Warren wrote:
> On 11/29/2012 03:21 AM, Terje Bergström wrote:
>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
>
> Do make sure the interrupts still are part of the DT binding though, so
> that the binding fully describes the HW, and the interrupt is available
> to retrieve if we ever do use it in the future.

Sure, I will just not use the generic irq in DT, but it won't require
any changes in DT bindings.

> You can still create tables of clocks inside the driver and loop over
> them. So, loop unrolling isn't related to my comments at least. It's
> just that clk_get() shouldn't take its parameters from platform data.
>
> But if these are clocks for (arbitrary) child modules (that may or may
> not exist dynamically), why aren't the drivers for the child modules
> managing them?

There are actually two things here that I mixed, and because of that I
probably confused everybody else.

Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
the clocks. That's why list of clocks and their frequency policies have
been part of the device description in nvhost. ACM is being replaced
with runtime PM in downstream kernel, but it still requires rigorous
testing and analysis of power profile before we can move to it.

Then, the second thing is that nvhost_probe() has had its own loop to go
through the clocks of host1x module. It's copy-paste of what ACM did,
which is just bad design. That's easily replaceable with static code, as
nvhost_probe() is just for host1x. I'll do that, and as I rip out the
generic power management code, I'll also make 2D and host1x drivers
enable the clocks at probe with static code.
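
For host1x that could be roughly along these lines (just a sketch; the
clock lookup name and the clk field in nvhost_master are not final):

static int host1x_get_clock(struct platform_device *pdev,
                            struct nvhost_master *host)
{
        host->clk = devm_clk_get(&pdev->dev, "host1x");
        if (IS_ERR(host->clk))
                return PTR_ERR(host->clk);

        /* keep it simple: enable at probe, disable at remove */
        return clk_prepare_enable(host->clk);
}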

So I think we have a solution that resonates with all proposals.

Best regards,
Terje
Thierry Reding
2012-11-30 06:53:58 UTC
Permalink
On Fri, Nov 30, 2012 at 08:54:32AM +0200, Terje Bergström wrote:
> On 29.11.2012 20:34, Stephen Warren wrote:
> > On 11/29/2012 03:21 AM, Terje Bergström wrote:
> >> True. I might also as well delete the general interrupt altogether, as
> >> we don't use it for any real purpose.
> >
> > Do make sure the interrupts still are part of the DT binding though, so
> > that the binding fully describes the HW, and the interrupt is available
> > to retrieve if we ever do use it in the future.
>
> Sure, I will just not use the generic irq in DT, but it won't require
> any changes in DT bindings.
>
> > You can still create tables of clocks inside the driver and loop over
> > them. So, loop unrolling isn't related to my comments at least. It's
> > just that clk_get() shouldn't take its parameters from platform data.
> >
> > But if these are clocks for (arbitrary) child modules (that may or may
> > not exist dynamically), why aren't the drivers for the child modules
> > managing them?
>
> There are actually two things here that I mixed, and because of that I
> probably confused everybody else.
>
> Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
> the clocks. That's why list of clocks and their frequency policies have
> been part of the device description in nvhost. ACM is being replaced
> with runtime PM in downstream kernel, but it still requires rigorous
> testing and analysis of power profile before we can move to it.
>
> Then, the second thing is that nvhost_probe() has had its own loop to go
> through the clocks of host1x module. It's copy-paste of what ACM did,
> which is just bad design. That's easily replaceable with static code, as
> nvhost_probe() is just for host1x. I'll do that, and as I rip out the
> generic power management code, I'll also make 2D and host1x drivers
> enable the clocks at probe with static code.
>
> So I think we have a solution that resonates with all proposals.

Yes, that sounds good to me.

Thierry
Mark Zhang
2012-11-29 09:10:46 UTC
Permalink
On 11/26/2012 09:19 PM, Terje Bergström <***@nvidia.com> wrote:
> Add nvhost, the driver for host1x. This patch adds support for reading and
> incrementing sync points and dynamic power management.
>
> Signed-off-by: Terje Bergstrom <***@nvidia.com>
>
> ---
> drivers/video/Kconfig | 2 +
> drivers/video/Makefile | 2 +
> drivers/video/tegra/host/Kconfig | 5 +
> drivers/video/tegra/host/Makefile | 10 +
> drivers/video/tegra/host/chip_support.c | 48 ++
> drivers/video/tegra/host/chip_support.h | 52 +++
> drivers/video/tegra/host/dev.c | 96 ++++
> drivers/video/tegra/host/host1x/Makefile | 7 +
> drivers/video/tegra/host/host1x/host1x.c | 204 +++++++++
> drivers/video/tegra/host/host1x/host1x.h | 78 ++++
> drivers/video/tegra/host/host1x/host1x01.c | 37 ++
> drivers/video/tegra/host/host1x/host1x01.h | 29 ++
> .../video/tegra/host/host1x/host1x01_hardware.h | 36 ++
> drivers/video/tegra/host/host1x/host1x_syncpt.c | 156 +++++++
> drivers/video/tegra/host/host1x/hw_host1x01_sync.h | 398 ++++++++++++++++
> drivers/video/tegra/host/nvhost_acm.c | 481 ++++++++++++++++++++
> drivers/video/tegra/host/nvhost_acm.h | 45 ++
> drivers/video/tegra/host/nvhost_syncpt.c | 333 ++++++++++++++
> drivers/video/tegra/host/nvhost_syncpt.h | 136 ++++++
> include/linux/nvhost.h | 143 ++++++
> 20 files changed, 2298 insertions(+)
[...]
> diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
> +#include "chip_support.h"
> +#include "host1x/host1x01.h"
> +
> +struct nvhost_chip_support *nvhost_chip_ops;
> +
> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
> +{
> + return nvhost_chip_ops;
> +}

If you want to hide "nvhost_chip_ops" from other source files, declare
it as "static". As it stands it is not static, which means other files
can still reach it via "extern", yet we also define a function to get
it, so this looks redundant.
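
In other words, something along these lines (a sketch of the suggestion,
not the actual patch):

/* file-local; other files must go through the accessor */
static struct nvhost_chip_support *nvhost_chip_ops;

struct nvhost_chip_support *nvhost_get_chip_ops(void)
{
        return nvhost_chip_ops;
}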

[...]
> diff --git a/drivers/video/tegra/host/host1x/Makefile b/drivers/video/tegra/host/host1x/Makefile
> new file mode 100644
> index 0000000..330d507
> --- /dev/null
> +++ b/drivers/video/tegra/host/host1x/Makefile
> @@ -0,0 +1,7 @@
> +ccflags-y = -Idrivers/video/tegra/host
> +
> +nvhost-host1x-objs = \
> + host1x.o \
> + host1x01.o

Can we rename "host1x01.c"? I really don't like this kind of
variable/file naming - I can't tell the purpose of the file from its
name...

[...]
> +
> +static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
> +{
> + int err;
> +
> + err = nvhost_init_chip_support(host);
> + if (err)
> + return err;
> +
> + return 0;
> +}

Just "return nvhost_init_chip_support(host);" would be enough. And in
that case, do we still need this function at all?

[...]
> +
> +static int __devinit nvhost_probe(struct platform_device *dev)
> +
[...]
> + dev_info(&dev->dev, "initialized\n");
> +
> + return 0;
> +
> +fail:

More cleanup code is needed here. Actually, "nvhost_free_resources" frees
host->intr.syncpt, which does not need to be freed manually.
It seems we at least need to add "nvhost_syncpt_deinit" here.

[...]
> +
> +static struct of_device_id host1x_match[] __devinitdata = {
> + { .compatible = "nvidia,tegra20-host1x", },
> + { .compatible = "nvidia,tegra30-host1x", },

Again, place tegra30-host1x before tegra20-host1x.

[...]
> +
> +/**
> + * Write a cpu syncpoint increment to the hardware, without touching
> + * the cache. Caller is responsible for host being powered.
> + */
> +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
> +{
> + struct nvhost_master *dev = syncpt_to_dev(sp);
> + u32 reg_offset = id / 32;
> +
> + if (!nvhost_module_powered(dev->dev)) {
> + dev_err(&syncpt_to_dev(sp)->dev->dev,
> + "Trying to access host1x when it's off");
> + return;
> + }
> +
> + if (!nvhost_syncpt_client_managed(sp, id)
> + && nvhost_syncpt_min_eq_max(sp, id)) {
> + dev_err(&syncpt_to_dev(sp)->dev->dev,
> + "Trying to increment syncpoint id %d beyond max\n",
> + id);
> + return;
> + }
> + writel(BIT_MASK(id), dev->sync_aperture +
> + host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);

I have a basic question: judging by the name and the context of this
function, it seems to increment the syncpt value specified by the "id"
parameter. How does this "writel" perform the increment? I don't know
much about host1x/syncpt register operations, so could you explain a
little, or do I have a completely wrong understanding?

[...]
> +
> +static ssize_t powergate_delay_store(struct kobject *kobj,
> + struct kobj_attribute *attr, const char *buf, size_t count)
> +{
> + int powergate_delay = 0, ret = 0;
> + struct nvhost_device_power_attr *power_attribute =
> + container_of(attr, struct nvhost_device_power_attr,
> + power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]);
> + struct platform_device *dev = power_attribute->ndev;
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> + if (!pdata->can_powergate) {
> + dev_info(&dev->dev, "does not support power-gating\n");
> + return count;
> + }
> +
> + mutex_lock(&pdata->lock);
> + ret = sscanf(buf, "%d", &powergate_delay);
> + if (ret == 1 && powergate_delay >= 0)
> + pdata->powergate_delay = powergate_delay;
> + else
> + dev_err(&dev->dev, "Invalid powergate delay\n");
> + mutex_unlock(&pdata->lock);
> +
> + return count;

Why do we return the parameter unchanged? Returning "count" here doesn't
seem to make sense.

[...]
> +
> +int nvhost_module_init(struct platform_device *dev)
> +{
> + int i = 0, err = 0;
> + struct kobj_attribute *attr = NULL;
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> + /* initialize clocks to known state */
> + while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) {
> + long rate = pdata->clocks[i].default_rate;
> + struct clk *c;
> +
> + c = devm_clk_get(&dev->dev, pdata->clocks[i].name);
> + if (IS_ERR_OR_NULL(c)) {
> + dev_err(&dev->dev, "Cannot get clock %s\n",
> + pdata->clocks[i].name);
> + return -ENODEV;
> + }
> +
> + rate = clk_round_rate(c, rate);
> + clk_prepare_enable(c);
> + clk_set_rate(c, rate);
> + clk_disable_unprepare(c);
> + pdata->clk[i] = c;
> + i++;
> + }
> + pdata->num_clks = i;
> +
> + mutex_init(&pdata->lock);
> + init_waitqueue_head(&pdata->idle_wq);
> + INIT_DELAYED_WORK(&pdata->powerstate_down, powerstate_down_handler);
> +
> + /* power gate units that we can power gate */
> + if (pdata->can_powergate) {
> + do_powergate_locked(pdata->powergate_ids[0]);
> + do_powergate_locked(pdata->powergate_ids[1]);

It seems we never set these two powergate_ids. Does this mean the power
management feature is not enabled in this version?

[...]
> +
> +int nvhost_module_suspend(struct platform_device *dev)
> +{
> + int ret;
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> + ret = wait_event_timeout(pdata->idle_wq, is_module_idle(dev),
> + ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT);
> + if (ret == 0) {
> + dev_info(&dev->dev, "%s prevented suspend\n",
> + dev_name(&dev->dev));
> + return -EBUSY;
> + }
> +

I'm not sure whether there is a race condition here. We wait until this
module is idle (refcount == 0), then try to power-gate it. But the
delayed work handler "powerstate_down_handler" might have already
power-gated it. So we either need to "cancel_delayed_work(&pdata->powerstate_down)"
before waiting for the module to go idle, or add some protection code in
"to_state_powergated_locked".

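A sketch of the first option (cancelling before the wait; the rest of
the function would stay as in the patch):

int nvhost_module_suspend(struct platform_device *dev)
{
        struct nvhost_device_data *pdata = platform_get_drvdata(dev);
        int ret;

        /* stop the pending power-down work before waiting for idle */
        cancel_delayed_work_sync(&pdata->powerstate_down);

        ret = wait_event_timeout(pdata->idle_wq, is_module_idle(dev),
                                 ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT);
        if (ret == 0)
                return -EBUSY;

        /* ... power-gate and call suspend_ndev() as in the original ... */
        return 0;
}
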
> + mutex_lock(&pdata->lock);
> + cancel_delayed_work(&pdata->powerstate_down);
> + to_state_powergated_locked(dev);
> + mutex_unlock(&pdata->lock);
> +
> + if (pdata->suspend_ndev)
> + pdata->suspend_ndev(dev);
> +
> + return 0;
> +}
> +
[...]
> +
> +int nvhost_syncpt_init(struct platform_device *dev,
> + struct nvhost_syncpt *sp)
> +{
> + int i;
> + struct nvhost_master *host = syncpt_to_dev(sp);
> + int err = 0;
> +
> + /* Allocate structs for min, max and base values */
> + sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> + GFP_KERNEL);
> + sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> + GFP_KERNEL);
> + sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp),
> + GFP_KERNEL);
> + sp->lock_counts =
> + kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp),
> + GFP_KERNEL);
> +
> + if (!(sp->min_val && sp->max_val && sp->base_val && sp->lock_counts)) {
> + /* frees happen in the deinit */
> + err = -ENOMEM;
> + goto fail;
> + }
> +
> + sp->kobj = kobject_create_and_add("syncpt", &dev->dev.kobj);
> + if (!sp->kobj) {
> + err = -EIO;
> + goto fail;
> + }
> +
> + /* Allocate two attributes for each sync point: min and max */
> + sp->syncpt_attrs = kzalloc(sizeof(*sp->syncpt_attrs)
> + * nvhost_syncpt_nb_pts(sp) * 2, GFP_KERNEL);
> + if (!sp->syncpt_attrs) {
> + err = -ENOMEM;
> + goto fail;
> + }
> +
> + /* Fill in the attributes */
> + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
> + char name[MAX_SYNCPT_LENGTH];
> + struct kobject *kobj;
> + struct nvhost_syncpt_attr *min = &sp->syncpt_attrs[i*2];
> + struct nvhost_syncpt_attr *max = &sp->syncpt_attrs[i*2+1];
> +
> + /* Create one directory per sync point */
> + snprintf(name, sizeof(name), "%d", i);
> + kobj = kobject_create_and_add(name, sp->kobj);

Where do we "kobject_put" this kobj?

[...]
> + if (!kobj) {
> + err = -EIO;
> + goto fail;
> + }
> +
> + min->id = i;
> + min->host = host;
> + min->attr.attr.name = min_name;
> + min->attr.attr.mode = S_IRUGO;
> + min->attr.show = syncpt_min_show;
> + if (sysfs_create_file(kobj, &min->attr.attr)) {
> + err = -EIO;
> + goto fail;
> + }
> +
> + max->id = i;
> + max->host = host;
> + max->attr.attr.name = max_name;
> + max->attr.attr.mode = S_IRUGO;
> + max->attr.show = syncpt_max_show;
> + if (sysfs_create_file(kobj, &max->attr.attr)) {
> + err = -EIO;
> + goto fail;
> + }
> + }
> +
> + return err;
> +
> +fail:
> + nvhost_syncpt_deinit(sp);
> + return err;
> +}
> +
[...]
> +/* public host1x sync-point management APIs */
> +u32 host1x_syncpt_incr_max(u32 id, u32 incrs);
> +void host1x_syncpt_incr(u32 id);
> +u32 host1x_syncpt_read(u32 id);
> +
> +#endif
>
Terje Bergstrom
2012-11-26 13:19:11 UTC
Permalink
Add SoC specific auxiliary data to host1x and gr2d. nvhost uses
this data.

Signed-off-by: Terje Bergstrom <***@nvidia.com>
Signed-off-by: Arto Merilainen <***@nvidia.com>
---
arch/arm/mach-tegra/board-dt-tegra20.c | 38 ++++++++++++++++++++++++++++-
arch/arm/mach-tegra/board-dt-tegra30.c | 38 ++++++++++++++++++++++++++++-
arch/arm/mach-tegra/tegra20_clocks_data.c | 8 +++---
arch/arm/mach-tegra/tegra30_clocks_data.c | 2 ++
4 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c b/arch/arm/mach-tegra/board-dt-tegra20.c
index 1d30eac..c695392 100644
--- a/arch/arm/mach-tegra/board-dt-tegra20.c
+++ b/arch/arm/mach-tegra/board-dt-tegra20.c
@@ -33,6 +33,7 @@
#include <linux/i2c.h>
#include <linux/i2c-tegra.h>
#include <linux/usb/tegra_usb_phy.h>
+#include <linux/nvhost.h>

#include <asm/hardware/gic.h>
#include <asm/mach-types.h>
@@ -45,6 +46,38 @@
#include "common.h"
#include "iomap.h"

+static const char *host1x_syncpt_names[32] = {
+ [0] = "gfx_host",
+ [NVSYNCPT_2D_0] = "2d_0",
+ [NVSYNCPT_2D_1] = "2d_1",
+ [NVSYNCPT_VBLANK0] = "vblank0",
+ [NVSYNCPT_VBLANK1] = "vblank1",
+};
+
+static struct host1x_device_info host1x_info = {
+ .nb_channels = 8,
+ .nb_pts = 32,
+ .nb_mlocks = 16,
+ .nb_bases = 8,
+ .syncpt_names = host1x_syncpt_names,
+ .client_managed = NVSYNCPTS_CLIENT_MANAGED,
+};
+
+static struct nvhost_device_data tegra_host1x_info = {
+ .clocks = { {"host1x", UINT_MAX} },
+ NVHOST_MODULE_NO_POWERGATE_IDS,
+ .private_data = &host1x_info,
+};
+
+static struct nvhost_device_data tegra_gr2d_info = {
+ .index = 2,
+ .syncpts = BIT(NVSYNCPT_2D_0) | BIT(NVSYNCPT_2D_1),
+ .clocks = { {"gr2d", UINT_MAX, true}, {"epp", UINT_MAX, true} },
+ NVHOST_MODULE_NO_POWERGATE_IDS,
+ .clockgate_delay = 0,
+ .serialize = true,
+};
+
struct tegra_ehci_platform_data tegra_ehci1_pdata = {
.operating_mode = TEGRA_USB_OTG,
.power_down_on_bus_suspend = 1,
@@ -94,13 +127,16 @@ struct of_dev_auxdata tegra20_auxdata_lookup[] __initdata = {
OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000D600, "spi_tegra.1", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000D800, "spi_tegra.2", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000DA00, "spi_tegra.3", NULL),
- OF_DEV_AUXDATA("nvidia,tegra20-host1x", 0x50000000, "host1x", NULL),
+ OF_DEV_AUXDATA("nvidia,tegra20-host1x", 0x50000000, "host1x",
+ &tegra_host1x_info),
OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54200000, "tegradc.0", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54240000, "tegradc.1", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-hdmi", 0x54280000, "hdmi", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-dsi", 0x54300000, "dsi", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-tvo", 0x542c0000, "tvo", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-nand", 0x70008000, "tegra-nand", NULL),
+ OF_DEV_AUXDATA("nvidia,tegra20-gr2d", 0x54140000, "tegra-gr2d",
+ &tegra_gr2d_info),
{}
};

diff --git a/arch/arm/mach-tegra/board-dt-tegra30.c b/arch/arm/mach-tegra/board-dt-tegra30.c
index 6497d12..1afa68b 100644
--- a/arch/arm/mach-tegra/board-dt-tegra30.c
+++ b/arch/arm/mach-tegra/board-dt-tegra30.c
@@ -29,6 +29,7 @@
#include <linux/of_fdt.h>
#include <linux/of_irq.h>
#include <linux/of_platform.h>
+#include <linux/nvhost.h>

#include <asm/mach/arch.h>
#include <asm/hardware/gic.h>
@@ -38,6 +39,38 @@
#include "common.h"
#include "iomap.h"

+static const char *host1x_syncpt_names[32] = {
+ [0] = "gfx_host",
+ [NVSYNCPT_2D_0] = "2d_0",
+ [NVSYNCPT_2D_1] = "2d_1",
+ [NVSYNCPT_VBLANK0] = "vblank0",
+ [NVSYNCPT_VBLANK1] = "vblank1",
+};
+
+static struct host1x_device_info host1x_info = {
+ .nb_channels = 8,
+ .nb_pts = 32,
+ .nb_mlocks = 16,
+ .nb_bases = 8,
+ .syncpt_names = host1x_syncpt_names,
+ .client_managed = NVSYNCPTS_CLIENT_MANAGED,
+};
+
+static struct nvhost_device_data tegra_host1x_info = {
+ .clocks = { {"host1x", UINT_MAX} },
+ NVHOST_MODULE_NO_POWERGATE_IDS,
+ .private_data = &host1x_info,
+};
+
+static struct nvhost_device_data tegra_gr2d_info = {
+ .index = 2,
+ .syncpts = BIT(NVSYNCPT_2D_0) | BIT(NVSYNCPT_2D_1),
+ .clocks = { {"gr2d", UINT_MAX, true}, {"epp", UINT_MAX, true} },
+ NVHOST_MODULE_NO_POWERGATE_IDS,
+ .clockgate_delay = 0,
+ .serialize = true,
+};
+
struct of_dev_auxdata tegra30_auxdata_lookup[] __initdata = {
OF_DEV_AUXDATA("nvidia,tegra20-sdhci", 0x78000000, "sdhci-tegra.0", NULL),
OF_DEV_AUXDATA("nvidia,tegra20-sdhci", 0x78000200, "sdhci-tegra.1", NULL),
@@ -57,12 +90,15 @@ struct of_dev_auxdata tegra30_auxdata_lookup[] __initdata = {
OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DA00, "spi_tegra.3", NULL),
OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DC00, "spi_tegra.4", NULL),
OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DE00, "spi_tegra.5", NULL),
- OF_DEV_AUXDATA("nvidia,tegra30-host1x", 0x50000000, "host1x", NULL),
+ OF_DEV_AUXDATA("nvidia,tegra30-host1x", 0x50000000, "host1x",
+ &tegra_host1x_info),
OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54200000, "tegradc.0", NULL),
OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54240000, "tegradc.1", NULL),
OF_DEV_AUXDATA("nvidia,tegra30-hdmi", 0x54280000, "hdmi", NULL),
OF_DEV_AUXDATA("nvidia,tegra30-dsi", 0x54300000, "dsi", NULL),
OF_DEV_AUXDATA("nvidia,tegra30-tvo", 0x542c0000, "tvo", NULL),
+ OF_DEV_AUXDATA("nvidia,tegra30-gr2d", 0x54140000, "gr2d",
+ &tegra_gr2d_info),
{}
};

diff --git a/arch/arm/mach-tegra/tegra20_clocks_data.c b/arch/arm/mach-tegra/tegra20_clocks_data.c
index 7f049ac..3314e50 100644
--- a/arch/arm/mach-tegra/tegra20_clocks_data.c
+++ b/arch/arm/mach-tegra/tegra20_clocks_data.c
@@ -1041,10 +1041,10 @@ static struct clk_duplicate tegra_clk_duplicates[] = {
CLK_DUPLICATE("usbd", "utmip-pad", NULL),
CLK_DUPLICATE("usbd", "tegra-ehci.0", NULL),
CLK_DUPLICATE("usbd", "tegra-otg", NULL),
- CLK_DUPLICATE("2d", "tegra_grhost", "gr2d"),
- CLK_DUPLICATE("3d", "tegra_grhost", "gr3d"),
- CLK_DUPLICATE("epp", "tegra_grhost", "epp"),
- CLK_DUPLICATE("mpe", "tegra_grhost", "mpe"),
+ CLK_DUPLICATE("2d", NULL, "gr2d"),
+ CLK_DUPLICATE("3d", NULL, "gr3d"),
+ CLK_DUPLICATE("epp", NULL, "epp"),
+ CLK_DUPLICATE("mpe", NULL, "mpe"),
CLK_DUPLICATE("cop", "tegra-avp", "cop"),
CLK_DUPLICATE("vde", "tegra-aes", "vde"),
CLK_DUPLICATE("cclk", NULL, "cpu"),
diff --git a/arch/arm/mach-tegra/tegra30_clocks_data.c b/arch/arm/mach-tegra/tegra30_clocks_data.c
index 6942c7a..f30bd54 100644
--- a/arch/arm/mach-tegra/tegra30_clocks_data.c
+++ b/arch/arm/mach-tegra/tegra30_clocks_data.c
@@ -1338,6 +1338,8 @@ struct clk_duplicate tegra_clk_duplicates[] = {
CLK_DUPLICATE("pll_p", "tegradc.0", "parent"),
CLK_DUPLICATE("pll_p", "tegradc.1", "parent"),
CLK_DUPLICATE("pll_d2_out0", "hdmi", "parent"),
+ CLK_DUPLICATE("2d", NULL, "gr2d"),
+ CLK_DUPLICATE("epp", NULL, "epp"),
};

struct clk *tegra_ptr_clks[] = {
--
1.7.9.5
Stephen Warren
2012-11-26 23:39:56 UTC
Permalink
On 11/26/2012 06:19 AM, Terje Bergstrom wrote:
> Add SoC specific auxiliary data to host1x and gr2d. nvhost uses
> this data.
>
> Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
> Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+***@public.gmane.org>

Arto's S-o-b really should be first and yours last since you're (Terje)
the one who touched the patches last.

> diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c b/arch/arm/mach-tegra/board-dt-tegra20.c

I think none of the changes the board-dt-tegra*.c should be made.

AUXDATA is a temporary measure to keep things working during the
transition to device tree. We want to remove entries from the AUXDATA
tables rather than add them. The only thing that's stopping us from
doing so right now is the lack of DT-based clock lookups, which hence
require devices to have a specific name.

> +static struct nvhost_device_data tegra_host1x_info = {
> + .clocks = { {"host1x", UINT_MAX} },

> +static struct nvhost_device_data tegra_gr2d_info = {
> + .clocks = { {"gr2d", UINT_MAX, true}, {"epp", UINT_MAX, true} },

Clock names shouldn't be passed in platform data; instead, clk_get()
should be passed the device object and device-relative (i.e. not global)
clock name. I expect if the driver is fixed to make this change, the
changes to tegra*_clocks_data.c won't be needed either.
Terje Bergström
2012-11-27 06:33:12 UTC
Permalink
On 27.11.2012 01:39, Stephen Warren wrote:
> Clock names shouldn't be passed in platform data; instead, clk_get()
> should be passed the device object and device-relative (i.e. not global)
> clock name. I expect if the driver is fixed to make this change, the
> changes to tegra*_clocks_data.c won't be needed either.

Isn't this code doing exactly that - getting a device relative clock,
nvhost_module_init() in nvhost_acm.c:

(...)
/* initialize clocks to known state */
while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) {
long rate = pdata->clocks[i].default_rate;
struct clk *c;

c = devm_clk_get(&dev->dev, pdata->clocks[i].name);
if (IS_ERR_OR_NULL(c)) {
dev_err(&dev->dev, "Cannot get clock %s\n",
pdata->clocks[i].name);
return -ENODEV;
}

rate = clk_round_rate(c, rate);
clk_prepare_enable(c);
clk_set_rate(c, rate);
clk_disable_unprepare(c);
pdata->clk[i] = c;
i++;
}
(...)

Without the clock changes, the clocks in board files are now assigned to
devid "tegra_grhost". I guess the correct way to do this would be to
assign them to "tegra-gr2d" (2d, epp) and "host1x" - except if we also
want to drop "tegra-" from the device name.

Terje
Stephen Warren
2012-11-27 17:17:42 UTC
Permalink
On 11/26/2012 11:33 PM, Terje Bergström wrote:
> On 27.11.2012 01:39, Stephen Warren wrote:
>> Clock names shouldn't be passed in platform data; instead, clk_get()
>> should be passed the device object and device-relative (i.e. not global)
>> clock name. I expect if the driver is fixed to make this change, the
>> changes to tegra*_clocks_data.c won't be needed either.
>
> Isn't this code doing exactly that - getting a device relative clock,
> nvhost_module_init() in nvhost_acm.c:
>
> (...)
> /* initialize clocks to known state */
> while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) {
> long rate = pdata->clocks[i].default_rate;
> struct clk *c;
>
> c = devm_clk_get(&dev->dev, pdata->clocks[i].name);

The line above is getting the (device-relative) clock name from platform
data, rather than using some fixed name as it should be.
Terje Bergstrom
2012-11-26 13:19:10 UTC
Permalink
Add support for host1x debugging. Adds debugfs entries, and dumps
channel state to UART in case of stuck submit.

Signed-off-by: Terje Bergstrom <tbergstrom-DDmLM1+adcrQT0dZR+***@public.gmane.org>
---
drivers/video/tegra/host/Makefile | 1 +
drivers/video/tegra/host/bus_client.c | 3 +
drivers/video/tegra/host/chip_support.h | 16 +
drivers/video/tegra/host/debug.c | 252 ++++++++++++++
drivers/video/tegra/host/debug.h | 50 +++
drivers/video/tegra/host/host1x/host1x.c | 3 +
drivers/video/tegra/host/host1x/host1x01.c | 2 +
drivers/video/tegra/host/host1x/host1x_cdma.c | 3 +
drivers/video/tegra/host/host1x/host1x_debug.c | 405 +++++++++++++++++++++++
drivers/video/tegra/host/host1x/host1x_syncpt.c | 1 +
drivers/video/tegra/host/nvhost_cdma.c | 1 +
drivers/video/tegra/host/nvhost_syncpt.c | 2 +
12 files changed, 739 insertions(+)
create mode 100644 drivers/video/tegra/host/debug.c
create mode 100644 drivers/video/tegra/host/debug.h
create mode 100644 drivers/video/tegra/host/host1x/host1x_debug.c

diff --git a/drivers/video/tegra/host/Makefile b/drivers/video/tegra/host/Makefile
index 128ad03..9553b3a 100644
--- a/drivers/video/tegra/host/Makefile
+++ b/drivers/video/tegra/host/Makefile
@@ -8,6 +8,7 @@ nvhost-objs = \
nvhost_channel.o \
nvhost_job.o \
dev.o \
+ debug.o \
bus_client.o \
chip_support.o \
nvhost_memmgr.o \
diff --git a/drivers/video/tegra/host/bus_client.c b/drivers/video/tegra/host/bus_client.c
index 3986185..1b02836 100644
--- a/drivers/video/tegra/host/bus_client.c
+++ b/drivers/video/tegra/host/bus_client.c
@@ -35,6 +35,7 @@

#include <linux/nvhost.h>

+#include "debug.h"
#include "dev.h"
#include "nvhost_memmgr.h"
#include "chip_support.h"
@@ -68,6 +69,8 @@ int nvhost_client_device_init(struct platform_device *dev)
if (err)
goto fail;

+ nvhost_device_debug_init(dev);
+
dev_info(&dev->dev, "initialized\n");

return 0;
diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
index ff141ed..efc8c10 100644
--- a/drivers/video/tegra/host/chip_support.h
+++ b/drivers/video/tegra/host/chip_support.h
@@ -76,6 +76,21 @@ struct nvhost_pushbuffer_ops {
u32 (*putptr)(struct push_buffer *);
};

+struct nvhost_debug_ops {
+ void (*debug_init)(struct dentry *de);
+ void (*show_channel_cdma)(struct nvhost_master *,
+ struct nvhost_channel *,
+ struct output *,
+ int chid);
+ void (*show_channel_fifo)(struct nvhost_master *,
+ struct nvhost_channel *,
+ struct output *,
+ int chid);
+ void (*show_mlocks)(struct nvhost_master *m,
+ struct output *o);
+
+};
+
struct nvhost_syncpt_ops {
void (*reset)(struct nvhost_syncpt *, u32 id);
void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
@@ -113,6 +128,7 @@ struct nvhost_chip_support {
struct nvhost_channel_ops channel;
struct nvhost_cdma_ops cdma;
struct nvhost_pushbuffer_ops push_buffer;
+ struct nvhost_debug_ops debug;
struct nvhost_syncpt_ops syncpt;
struct nvhost_intr_ops intr;
struct nvhost_dev_ops nvhost_dev;
diff --git a/drivers/video/tegra/host/debug.c b/drivers/video/tegra/host/debug.c
new file mode 100644
index 0000000..496c5a1
--- /dev/null
+++ b/drivers/video/tegra/host/debug.c
@@ -0,0 +1,252 @@
+/*
+ * drivers/video/tegra/host/debug.c
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers-***@public.gmane.org>
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "nvhost_acm.h"
+#include "nvhost_channel.h"
+#include "chip_support.h"
+
+pid_t nvhost_debug_null_kickoff_pid;
+
+pid_t nvhost_debug_force_timeout_pid;
+u32 nvhost_debug_force_timeout_val;
+u32 nvhost_debug_force_timeout_channel;
+
+void nvhost_debug_output(struct output *o, const char *fmt, ...)
+{
+ va_list args;
+ int len;
+
+ va_start(args, fmt);
+ len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+ va_end(args);
+ o->fn(o->ctx, o->buf, len);
+}
+
+static int show_channels(struct platform_device *pdev, void *data)
+{
+ struct nvhost_channel *ch;
+ struct output *o = data;
+ struct nvhost_master *m;
+ struct nvhost_device_data *pdata;
+
+ if (pdev == NULL)
+ return 0;
+
+ pdata = platform_get_drvdata(pdev);
+ m = nvhost_get_host(pdev);
+ ch = pdata->channel;
+ if (ch) {
+ mutex_lock(&ch->reflock);
+ if (ch->refcount) {
+ mutex_lock(&ch->cdma.lock);
+ nvhost_get_chip_ops()->debug.show_channel_fifo(
+ m, ch, o, pdata->index);
+ nvhost_get_chip_ops()->debug.show_channel_cdma(
+ m, ch, o, pdata->index);
+ mutex_unlock(&ch->cdma.lock);
+ }
+ mutex_unlock(&ch->reflock);
+ }
+
+ return 0;
+}
+
+static void show_syncpts(struct nvhost_master *m, struct output *o)
+{
+ int i;
+ nvhost_debug_output(o, "---- syncpts ----\n");
+ for (i = 0; i < nvhost_syncpt_nb_pts(&m->syncpt); i++) {
+ u32 max = nvhost_syncpt_read_max(&m->syncpt, i);
+ u32 min = nvhost_syncpt_update_min(&m->syncpt, i);
+ if (!min && !max)
+ continue;
+ nvhost_debug_output(o, "id %d (%s) min %d max %d\n",
+ i, nvhost_get_chip_ops()->syncpt.name(&m->syncpt, i),
+ min, max);
+ }
+
+ for (i = 0; i < nvhost_syncpt_nb_bases(&m->syncpt); i++) {
+ u32 base_val;
+ base_val = nvhost_syncpt_read_wait_base(&m->syncpt, i);
+ if (base_val)
+ nvhost_debug_output(o, "waitbase id %d val %d\n",
+ i, base_val);
+ }
+
+ nvhost_debug_output(o, "\n");
+}
+
+static void show_all(struct nvhost_master *m, struct output *o)
+{
+ nvhost_module_busy(m->dev);
+
+ nvhost_get_chip_ops()->debug.show_mlocks(m, o);
+ show_syncpts(m, o);
+ nvhost_debug_output(o, "---- channels ----\n");
+ nvhost_device_list_for_all(o, show_channels);
+
+ nvhost_module_idle(m->dev);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static int show_channels_no_fifo(struct platform_device *pdev, void *data)
+{
+ struct nvhost_channel *ch;
+ struct output *o = data;
+ struct nvhost_master *m;
+ struct nvhost_device_data *pdata;
+
+ if (pdev == NULL)
+ return 0;
+
+ pdata = platform_get_drvdata(pdev);
+ m = nvhost_get_host(pdev);
+ ch = pdata->channel;
+ if (ch) {
+ mutex_lock(&ch->reflock);
+ if (ch->refcount) {
+ mutex_lock(&ch->cdma.lock);
+ nvhost_get_chip_ops()->debug.show_channel_cdma(m,
+ ch, o, pdata->index);
+ mutex_unlock(&ch->cdma.lock);
+ }
+ mutex_unlock(&ch->reflock);
+ }
+
+ return 0;
+}
+
+static void show_all_no_fifo(struct nvhost_master *m, struct output *o)
+{
+ nvhost_module_busy(m->dev);
+
+ nvhost_get_chip_ops()->debug.show_mlocks(m, o);
+ show_syncpts(m, o);
+ nvhost_debug_output(o, "---- channels ----\n");
+ nvhost_device_list_for_all(o, show_channels_no_fifo);
+
+ nvhost_module_idle(m->dev);
+}
+
+static int nvhost_debug_show_all(struct seq_file *s, void *unused)
+{
+ struct output o = {
+ .fn = write_to_seqfile,
+ .ctx = s
+ };
+ show_all(s->private, &o);
+ return 0;
+}
+
+static int nvhost_debug_show(struct seq_file *s, void *unused)
+{
+ struct output o = {
+ .fn = write_to_seqfile,
+ .ctx = s
+ };
+ show_all_no_fifo(s->private, &o);
+ return 0;
+}
+
+static int nvhost_debug_open_all(struct inode *inode, struct file *file)
+{
+ return single_open(file, nvhost_debug_show_all, inode->i_private);
+}
+
+static const struct file_operations nvhost_debug_all_fops = {
+ .open = nvhost_debug_open_all,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int nvhost_debug_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, nvhost_debug_show, inode->i_private);
+}
+
+static const struct file_operations nvhost_debug_fops = {
+ .open = nvhost_debug_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+void nvhost_device_debug_init(struct platform_device *dev)
+{
+ struct dentry *de = NULL;
+ struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+
+ de = debugfs_create_dir(dev->name, de);
+
+ pdata->debugfs = de;
+}
+
+void nvhost_debug_init(struct nvhost_master *master)
+{
+ struct nvhost_device_data *pdata;
+ struct dentry *de = debugfs_create_dir("tegra_host", NULL);
+
+ if (!de)
+ return;
+
+ pdata = platform_get_drvdata(master->dev);
+
+ /* Store the created entry */
+ pdata->debugfs = de;
+
+ debugfs_create_file("status", S_IRUGO, de,
+ master, &nvhost_debug_fops);
+ debugfs_create_file("status_all", S_IRUGO, de,
+ master, &nvhost_debug_all_fops);
+
+ debugfs_create_u32("null_kickoff_pid", S_IRUGO|S_IWUSR, de,
+ &nvhost_debug_null_kickoff_pid);
+
+ if (nvhost_get_chip_ops()->debug.debug_init)
+ nvhost_get_chip_ops()->debug.debug_init(de);
+
+ debugfs_create_u32("force_timeout_pid", S_IRUGO|S_IWUSR, de,
+ &nvhost_debug_force_timeout_pid);
+ debugfs_create_u32("force_timeout_val", S_IRUGO|S_IWUSR, de,
+ &nvhost_debug_force_timeout_val);
+ debugfs_create_u32("force_timeout_channel", S_IRUGO|S_IWUSR, de,
+ &nvhost_debug_force_timeout_channel);
+}
+#else
+void nvhost_debug_init(struct nvhost_master *master)
+{
+}
+#endif
+
+void nvhost_debug_dump(struct nvhost_master *master)
+{
+ struct output o = {
+ .fn = write_to_printk
+ };
+ show_all(master, &o);
+}
diff --git a/drivers/video/tegra/host/debug.h b/drivers/video/tegra/host/debug.h
new file mode 100644
index 0000000..c484a46
--- /dev/null
+++ b/drivers/video/tegra/host/debug.h
@@ -0,0 +1,50 @@
+/*
+ * drivers/video/tegra/host/debug.h
+ *
+ * Tegra host1x Debug
+ *
+ * Copyright (c) 2011-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __NVHOST_DEBUG_H
+#define __NVHOST_DEBUG_H
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+struct nvhost_master;
+
+struct output {
+ void (*fn)(void *ctx, const char *str, size_t len);
+ void *ctx;
+ char buf[256];
+};
+
+static inline void write_to_seqfile(void *ctx, const char *str, size_t len)
+{
+ seq_write((struct seq_file *)ctx, str, len);
+}
+
+static inline void write_to_printk(void *ctx, const char *str, size_t len)
+{
+ pr_info("%s", str);
+}
+
+void nvhost_debug_output(struct output *o, const char *fmt, ...);
+
+void nvhost_debug_init(struct nvhost_master *master);
+void nvhost_device_debug_init(struct platform_device *dev);
+void nvhost_debug_dump(struct nvhost_master *master);
+
+#endif /*__NVHOST_DEBUG_H */
diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
index 8033b2d..fb2a0d2 100644
--- a/drivers/video/tegra/host/host1x/host1x.c
+++ b/drivers/video/tegra/host/host1x/host1x.c
@@ -31,6 +31,7 @@

#include "dev.h"
#include "host1x/host1x.h"
+#include "debug.h"
#include "nvhost_acm.h"
#include "nvhost_channel.h"
#include "chip_support.h"
@@ -184,6 +185,8 @@ static int __devinit nvhost_probe(struct platform_device *dev)
if (err)
goto fail;

+ nvhost_debug_init(host);
+
dev_info(&dev->dev, "initialized\n");

return 0;
diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
index cd97339..2c69200 100644
--- a/drivers/video/tegra/host/host1x/host1x01.c
+++ b/drivers/video/tegra/host/host1x/host1x01.c
@@ -48,6 +48,7 @@ struct nvhost_channel *t20_alloc_nvhost_channel(struct platform_device *dev)

#include "host1x/host1x_channel.c"
#include "host1x/host1x_cdma.c"
+#include "host1x/host1x_debug.c"
#include "host1x/host1x_syncpt.c"
#include "host1x/host1x_intr.c"

@@ -57,6 +58,7 @@ int nvhost_init_host1x01_support(struct nvhost_master *host,
op->channel = host1x_channel_ops;
op->cdma = host1x_cdma_ops;
op->push_buffer = host1x_pushbuffer_ops;
+ op->debug = host1x_debug_ops;
host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
op->syncpt = host1x_syncpt_ops;
op->intr = host1x_intr_ops;
diff --git a/drivers/video/tegra/host/host1x/host1x_cdma.c b/drivers/video/tegra/host/host1x/host1x_cdma.c
index 07f0758..bbc021e 100644
--- a/drivers/video/tegra/host/host1x/host1x_cdma.c
+++ b/drivers/video/tegra/host/host1x/host1x_cdma.c
@@ -25,6 +25,7 @@
#include "nvhost_cdma.h"
#include "nvhost_channel.h"
#include "dev.h"
+#include "debug.h"
#include "chip_support.h"
#include "nvhost_memmgr.h"

@@ -413,6 +414,8 @@ static void cdma_timeout_handler(struct work_struct *work)
sp = &dev->syncpt;
ch = cdma_to_channel(cdma);

+ nvhost_debug_dump(cdma_to_dev(cdma));
+
mutex_lock(&cdma->lock);

if (!cdma->timeout.clientid) {
diff --git a/drivers/video/tegra/host/host1x/host1x_debug.c b/drivers/video/tegra/host/host1x/host1x_debug.c
new file mode 100644
index 0000000..27f696cd
--- /dev/null
+++ b/drivers/video/tegra/host/host1x/host1x_debug.c
@@ -0,0 +1,405 @@
+/*
+ * drivers/video/tegra/host/host1x/host1x_debug.c
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers-***@public.gmane.org>
+ *
+ * Copyright (C) 2011 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "nvhost_cdma.h"
+#include "nvhost_channel.h"
+#include "chip_support.h"
+#include "nvhost_memmgr.h"
+
+#define NVHOST_DEBUG_MAX_PAGE_OFFSET 102400
+
+enum {
+ NVHOST_DBG_STATE_CMD = 0,
+ NVHOST_DBG_STATE_DATA = 1,
+ NVHOST_DBG_STATE_GATHER = 2
+};
+
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+ unsigned mask;
+ unsigned subop;
+
+ switch (val >> 28) {
+ case 0x0:
+ mask = val & 0x3f;
+ if (mask) {
+ nvhost_debug_output(o,
+ "SETCL(class=%03x, offset=%03x, mask=%02x, [",
+ val >> 6 & 0x3ff, val >> 16 & 0xfff, mask);
+ *count = hweight8(mask);
+ return NVHOST_DBG_STATE_DATA;
+ } else {
+ nvhost_debug_output(o, "SETCL(class=%03x)\n",
+ val >> 6 & 0x3ff);
+ return NVHOST_DBG_STATE_CMD;
+ }
+
+ case 0x1:
+ nvhost_debug_output(o, "INCR(offset=%03x, [",
+ val >> 16 & 0xfff);
+ *count = val & 0xffff;
+ return NVHOST_DBG_STATE_DATA;
+
+ case 0x2:
+ nvhost_debug_output(o, "NONINCR(offset=%03x, [",
+ val >> 16 & 0xfff);
+ *count = val & 0xffff;
+ return NVHOST_DBG_STATE_DATA;
+
+ case 0x3:
+ mask = val & 0xffff;
+ nvhost_debug_output(o, "MASK(offset=%03x, mask=%03x, [",
+ val >> 16 & 0xfff, mask);
+ *count = hweight16(mask);
+ return NVHOST_DBG_STATE_DATA;
+
+ case 0x4:
+ nvhost_debug_output(o, "IMM(offset=%03x, data=%03x)\n",
+ val >> 16 & 0xfff, val & 0xffff);
+ return NVHOST_DBG_STATE_CMD;
+
+ case 0x5:
+ nvhost_debug_output(o, "RESTART(offset=%08x)\n", val << 4);
+ return NVHOST_DBG_STATE_CMD;
+
+ case 0x6:
+ nvhost_debug_output(o,
+ "GATHER(offset=%03x, insert=%d, type=%d, count=%04x, addr=[",
+ val >> 16 & 0xfff, val >> 15 & 0x1, val >> 14 & 0x1,
+ val & 0x3fff);
+ *count = val & 0x3fff; /* TODO: insert */
+ return NVHOST_DBG_STATE_GATHER;
+
+ case 0xe:
+ subop = val >> 24 & 0xf;
+ if (subop == 0)
+ nvhost_debug_output(o, "ACQUIRE_MLOCK(index=%d)\n",
+ val & 0xff);
+ else if (subop == 1)
+ nvhost_debug_output(o, "RELEASE_MLOCK(index=%d)\n",
+ val & 0xff);
+ else
+ nvhost_debug_output(o, "EXTEND_UNKNOWN(%08x)\n", val);
+ return NVHOST_DBG_STATE_CMD;
+
+ default:
+ return NVHOST_DBG_STATE_CMD;
+ }
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+ phys_addr_t phys_addr, u32 words, struct nvhost_cdma *cdma);
+
+static void show_channel_word(struct output *o, int *state, int *count,
+ u32 addr, u32 val, struct nvhost_cdma *cdma)
+{
+ static int start_count, dont_print;
+
+ switch (*state) {
+ case NVHOST_DBG_STATE_CMD:
+ if (addr)
+ nvhost_debug_output(o, "%08x: %08x:", addr, val);
+ else
+ nvhost_debug_output(o, "%08x:", val);
+
+ *state = show_channel_command(o, addr, val, count);
+ dont_print = 0;
+ start_count = *count;
+ if (*state == NVHOST_DBG_STATE_DATA && *count == 0) {
+ *state = NVHOST_DBG_STATE_CMD;
+ nvhost_debug_output(o, "])\n");
+ }
+ break;
+
+ case NVHOST_DBG_STATE_DATA:
+ (*count)--;
+ if (start_count - *count < 64)
+ nvhost_debug_output(o, "%08x%s",
+ val, *count > 0 ? ", " : "])\n");
+ else if (!dont_print && (*count > 0)) {
+ nvhost_debug_output(o, "[truncated; %d more words]\n",
+ *count);
+ dont_print = 1;
+ }
+ if (*count == 0)
+ *state = NVHOST_DBG_STATE_CMD;
+ break;
+
+ case NVHOST_DBG_STATE_GATHER:
+ *state = NVHOST_DBG_STATE_CMD;
+ nvhost_debug_output(o, "%08x]):\n", val);
+ if (cdma) {
+ show_channel_gather(o, addr, val,
+ *count, cdma);
+ }
+ break;
+ }
+}
+
+static void do_show_channel_gather(struct output *o,
+ phys_addr_t phys_addr,
+ u32 words, struct nvhost_cdma *cdma,
+ phys_addr_t pin_addr, u32 *map_addr)
+{
+ /* Map dmaget cursor to corresponding mem handle */
+ u32 offset;
+ int state, count, i;
+
+ offset = phys_addr - pin_addr;
+ /*
+ * Sometimes we're given different hardware address to the same
+ * page - in these cases the offset will get an invalid number and
+ * we just have to bail out.
+ */
+ if (offset > NVHOST_DEBUG_MAX_PAGE_OFFSET) {
+ nvhost_debug_output(o, "[address mismatch]\n");
+ } else {
+ /* GATHER buffer starts always with commands */
+ state = NVHOST_DBG_STATE_CMD;
+ for (i = 0; i < words; i++)
+ show_channel_word(o, &state, &count,
+ phys_addr + i * 4,
+ *(map_addr + offset/4 + i),
+ cdma);
+ }
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+ phys_addr_t phys_addr,
+ u32 words, struct nvhost_cdma *cdma)
+{
+ /* Map dmaget cursor to corresponding mem handle */
+ struct push_buffer *pb = &cdma->push_buffer;
+ u32 cur = addr - pb->phys;
+ struct mem_handle *mem = pb->handle[cur/8];
+ u32 *map_addr, offset;
+ struct sg_table *sgt;
+
+ if (!mem) {
+ nvhost_debug_output(o, "[already deallocated]\n");
+ return;
+ }
+
+ map_addr = nvhost_memmgr_mmap(mem);
+ if (!map_addr) {
+ nvhost_debug_output(o, "[could not mmap]\n");
+ return;
+ }
+
+ /* Get base address from mem */
+ sgt = nvhost_memmgr_pin(mem);
+ if (IS_ERR(sgt)) {
+ nvhost_debug_output(o, "[couldn't pin]\n");
+ nvhost_memmgr_munmap(mem, map_addr);
+ return;
+ }
+
+ offset = phys_addr - sg_dma_address(sgt->sgl);
+ do_show_channel_gather(o, phys_addr, words, cdma,
+ sg_dma_address(sgt->sgl), map_addr);
+ nvhost_memmgr_unpin(mem, sgt);
+ nvhost_memmgr_munmap(mem, map_addr);
+}
+
+static void show_channel_gathers(struct output *o, struct nvhost_cdma *cdma)
+{
+ struct nvhost_job *job;
+
+ list_for_each_entry(job, &cdma->sync_queue, list) {
+ int i;
+ nvhost_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+ " first_get=%08x, timeout=%d"
+ " num_slots=%d, num_handles=%d\n",
+ job,
+ job->syncpt_id,
+ job->syncpt_end,
+ job->first_get,
+ job->timeout,
+ job->num_slots,
+ job->num_unpins);
+
+ for (i = 0; i < job->num_gathers; i++) {
+ struct nvhost_job_gather *g = &job->gathers[i];
+ u32 *mapped = nvhost_memmgr_mmap(g->ref);
+ if (!mapped) {
+ nvhost_debug_output(o, "[could not mmap]\n");
+ continue;
+ }
+
+ nvhost_debug_output(o,
+ " GATHER at %08x+%04x, %d words\n",
+ g->mem_base, g->offset, g->words);
+
+ do_show_channel_gather(o, g->mem_base + g->offset,
+ g->words, cdma, g->mem_base, mapped);
+ nvhost_memmgr_munmap(g->ref, mapped);
+ }
+ }
+}
+
+static void host1x_debug_show_channel_cdma(struct nvhost_master *m,
+ struct nvhost_channel *ch, struct output *o, int chid)
+{
+ struct nvhost_channel *channel = ch;
+ struct nvhost_cdma *cdma = &channel->cdma;
+ u32 dmaput, dmaget, dmactrl;
+ u32 cbstat, cbread;
+ u32 val, base, baseval;
+ struct nvhost_device_data *pdata = platform_get_drvdata(channel->dev);
+
+ dmaput = readl(channel->aperture + host1x_channel_dmaput_r());
+ dmaget = readl(channel->aperture + host1x_channel_dmaget_r());
+ dmactrl = readl(channel->aperture + host1x_channel_dmactrl_r());
+ cbread = readl(m->sync_aperture + host1x_sync_cbread0_r() + 4 * chid);
+ cbstat = readl(m->sync_aperture + host1x_sync_cbstat_0_r() + 4 * chid);
+
+ nvhost_debug_output(o, "%d-%s (%d): ", chid,
+ channel->dev->name,
+ pdata->refcount);
+
+ if (host1x_channel_dmactrl_dmastop_v(dmactrl)
+ || !channel->cdma.push_buffer.mapped) {
+ nvhost_debug_output(o, "inactive\n\n");
+ return;
+ }
+
+ switch (cbstat) {
+ case 0x00010008:
+ nvhost_debug_output(o, "waiting on syncpt %d val %d\n",
+ cbread >> 24, cbread & 0xffffff);
+ break;
+
+ case 0x00010009:
+ base = (cbread >> 16) & 0xff;
+ baseval = readl(m->sync_aperture +
+ host1x_sync_syncpt_base_0_r() + 4 * base);
+ val = cbread & 0xffff;
+ nvhost_debug_output(o, "waiting on syncpt %d val %d "
+ "(base %d = %d; offset = %d)\n",
+ cbread >> 24, baseval + val,
+ base, baseval, val);
+ break;
+
+ default:
+ nvhost_debug_output(o,
+ "active class %02x, offset %04x, val %08x\n",
+ host1x_sync_cbstat_0_cbclass0_v(cbstat),
+ host1x_sync_cbstat_0_cboffset0_v(cbstat),
+ cbread);
+ break;
+ }
+
+ nvhost_debug_output(o, "DMAPUT %08x, DMAGET %08x, DMACTL %08x\n",
+ dmaput, dmaget, dmactrl);
+ nvhost_debug_output(o, "CBREAD %08x, CBSTAT %08x\n", cbread, cbstat);
+
+ show_channel_gathers(o, cdma);
+ nvhost_debug_output(o, "\n");
+}
+
+static void host1x_debug_show_channel_fifo(struct nvhost_master *m,
+ struct nvhost_channel *ch, struct output *o, int chid)
+{
+ u32 val, rd_ptr, wr_ptr, start, end;
+ struct nvhost_channel *channel = ch;
+ int state, count;
+
+ nvhost_debug_output(o, "%d: fifo:\n", chid);
+
+ val = readl(channel->aperture + host1x_channel_fifostat_r());
+ nvhost_debug_output(o, "FIFOSTAT %08x\n", val);
+ if (host1x_channel_fifostat_cfempty_v(val)) {
+ nvhost_debug_output(o, "[empty]\n");
+ return;
+ }
+
+ writel(0x0, m->sync_aperture + host1x_sync_cfpeek_ctrl_r());
+ writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
+ | host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid),
+ m->sync_aperture + host1x_sync_cfpeek_ctrl_r());
+
+ val = readl(m->sync_aperture + host1x_sync_cfpeek_ptrs_r());
+ rd_ptr = host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(val);
+ wr_ptr = host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(val);
+
+ val = readl(m->sync_aperture + host1x_sync_cf0_setup_r() + 4 * chid);
+ start = host1x_sync_cf0_setup_cf0_base_v(val);
+ end = host1x_sync_cf0_setup_cf0_limit_v(val);
+
+ state = NVHOST_DBG_STATE_CMD;
+
+ do {
+ writel(0x0, m->sync_aperture + host1x_sync_cfpeek_ctrl_r());
+ writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
+ | host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
+ | host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
+ m->sync_aperture + host1x_sync_cfpeek_ctrl_r());
+ val = readl(m->sync_aperture + host1x_sync_cfpeek_read_r());
+
+ show_channel_word(o, &state, &count, 0, val, NULL);
+
+ if (rd_ptr == end)
+ rd_ptr = start;
+ else
+ rd_ptr++;
+ } while (rd_ptr != wr_ptr);
+
+ if (state == NVHOST_DBG_STATE_DATA)
+ nvhost_debug_output(o, ", ...])\n");
+ nvhost_debug_output(o, "\n");
+
+ writel(0x0, m->sync_aperture + host1x_sync_cfpeek_ctrl_r());
+}
+
+static void host1x_debug_show_mlocks(struct nvhost_master *m, struct output *o)
+{
+ u32 __iomem *mlo_regs = m->sync_aperture +
+ host1x_sync_mlock_owner_0_r();
+ int i;
+
+ nvhost_debug_output(o, "---- mlocks ----\n");
+ for (i = 0; i < NV_HOST1X_NB_MLOCKS; i++) {
+ u32 owner = readl(mlo_regs + i);
+ if (host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(owner))
+ nvhost_debug_output(o, "%d: locked by channel %d\n",
+ i,
+ host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(
+ owner));
+ else if (host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(owner))
+ nvhost_debug_output(o, "%d: locked by cpu\n", i);
+ else
+ nvhost_debug_output(o, "%d: unlocked\n", i);
+ }
+ nvhost_debug_output(o, "\n");
+}
+
+static const struct nvhost_debug_ops host1x_debug_ops = {
+ .show_channel_cdma = host1x_debug_show_channel_cdma,
+ .show_channel_fifo = host1x_debug_show_channel_fifo,
+ .show_mlocks = host1x_debug_show_mlocks,
+};
diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c
index e47bd71..eadb8cf 100644
--- a/drivers/video/tegra/host/host1x/host1x_syncpt.c
+++ b/drivers/video/tegra/host/host1x/host1x_syncpt.c
@@ -100,6 +100,7 @@ static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
dev_err(&syncpt_to_dev(sp)->dev->dev,
"Trying to increment syncpoint id %d beyond max\n",
id);
+ nvhost_debug_dump(syncpt_to_dev(sp));
return;
}
writel(BIT_MASK(id), dev->sync_aperture +
diff --git a/drivers/video/tegra/host/nvhost_cdma.c b/drivers/video/tegra/host/nvhost_cdma.c
index e581836..50b1e7d 100644
--- a/drivers/video/tegra/host/nvhost_cdma.c
+++ b/drivers/video/tegra/host/nvhost_cdma.c
@@ -21,6 +21,7 @@
#include "nvhost_cdma.h"
#include "nvhost_channel.h"
#include "dev.h"
+#include "debug.h"
#include "nvhost_memmgr.h"
#include "chip_support.h"
#include <asm/cacheflush.h>
diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c
index f61b924..fc1c19c 100644
--- a/drivers/video/tegra/host/nvhost_syncpt.c
+++ b/drivers/video/tegra/host/nvhost_syncpt.c
@@ -24,6 +24,7 @@
#include "nvhost_syncpt.h"
#include "nvhost_acm.h"
#include "host1x/host1x.h"
+#include "debug.h"
#include "chip_support.h"

#define MAX_SYNCPT_LENGTH 5
@@ -222,6 +223,7 @@ int nvhost_syncpt_wait_timeout(struct nvhost_syncpt *sp, u32 id,
"is timeout %d too low?\n",
low_timeout);
}
+ nvhost_debug_dump(syncpt_to_dev(sp));
}
check_count++;
}
--
1.7.9.5
Thierry Reding
2012-12-01 14:45:12 UTC
Permalink
On Mon, Nov 26, 2012 at 03:19:06PM +0200, Terje Bergstrom wrote:
[...]
> The patch set also adds user space API to tegradrm for accessing
> host1d and 2D. We are preparing also patches to libdrm, but they are
> not yet in condition that they could be sent out.

I did some prototyping on how a libdrm API could look like a few weeks
back. I should clean the patches up some and push them to a public
repository or to the mailing lists for review.

There isn't actually much more than a bit of framework along with two
IOCTLs that allow creating and looking up a Tegra-specific GEM. The
related kernel patches aren't available anywhere since I didn't deem
them ready yet. At that time I wasn't even sure if we'd need special
allocations other than what the dumb BO infrastructure provides. They
implement some parts of what you've implemented in this series as well,
with some slight differences.
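
To give an idea of the shape, a thin wrapper around such an IOCTL could
look like the following (everything here - the IOCTL number, struct
layout and flags - is a hypothetical sketch, not the actual kernel
interface, which isn't published yet):

#include <errno.h>
#include <string.h>
#include <xf86drm.h>

struct drm_tegra_gem_create {
        __u64 size;
        __u32 flags;
        __u32 handle;           /* filled in by the kernel */
};

#define DRM_TEGRA_GEM_CREATE    0x00
#define DRM_IOCTL_TEGRA_GEM_CREATE \
        DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, \
                 struct drm_tegra_gem_create)

int tegra_bo_create(int fd, __u64 size, __u32 flags, __u32 *handle)
{
        struct drm_tegra_gem_create args;

        memset(&args, 0, sizeof(args));
        args.size = size;
        args.flags = flags;

        if (drmIoctl(fd, DRM_IOCTL_TEGRA_GEM_CREATE, &args) < 0)
                return -errno;

        *handle = args.handle;
        return 0;
}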

Currently these still use the CMA-backed GEM objects but it should be
easy to switch to something backed by the host1x infrastructure once
that's in good shape.

While I can't find the quote right now, I seem to remember that you said
at some point that you were planning on adding some 2D acceleration bits
to libdrm. I don't think that's the right place. That code should rather
go into the DDX. libdrm should instead provide a thin layer on top of
the DRM IOCTLs to manage buffers and submit command streams. I hope I
can finish the cleanup of my libdrm patches over the weekend and push
them out so this may become clearer. Maybe I can even get the
corresponding kernel patches pushed out.
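
To make the "thin layer" point concrete, a submit wrapper could be as
small as the sketch below, wrapped in the same style as the create IOCTL
above. Again, the command index, structure layout and field names are
assumptions made up for this example rather than the interface proposed
in either series.

/*
 * Hypothetical sketch: the command index and structure layout are
 * assumptions for illustration only.
 */
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>

#define DRM_TEGRA_SUBMIT 0x01            /* assumed driver-private index */

struct drm_tegra_submit {                /* assumed layout */
        uint32_t cmdbuf_handle;  /* in: GEM handle with the command words */
        uint32_t num_words;      /* in: valid words in the buffer */
        uint32_t syncpt_id;      /* in: syncpoint the job increments */
        uint32_t syncpt_incrs;   /* in: increments the job performs */
        uint32_t fence;          /* out: syncpoint value to wait for */
};

/* Hand a prepared command buffer to the kernel and return the fence value. */
static int tegra_submit(int fd, uint32_t cmdbuf, uint32_t words,
                        uint32_t syncpt, uint32_t incrs, uint32_t *fence)
{
        struct drm_tegra_submit args;
        int err;

        memset(&args, 0, sizeof(args));
        args.cmdbuf_handle = cmdbuf;
        args.num_words = words;
        args.syncpt_id = syncpt;
        args.syncpt_incrs = incrs;

        err = drmCommandWriteRead(fd, DRM_TEGRA_SUBMIT, &args, sizeof(args));
        if (err < 0)
                return err;

        *fence = args.fence;
        return 0;
}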

Thierry
Terje Bergström
2012-12-01 17:08:12 UTC
Permalink
On 01.12.2012 16:45, Thierry Reding wrote:
> I did some prototyping on what a libdrm API could look like a few weeks
> back. I should clean the patches up some and push them to a public
> repository or to the mailing lists for review.

Ok. Sorry about the delay - I recently learned I need separate
permission for user space contribution, so I'm pushing to get that
permission.

> There isn't actually much more than a bit of framework along with two
> IOCTLs that allow creating and looking up a Tegra-specific GEM. The
> related kernel patches aren't available anywhere since I didn't deem
> them ready yet. At that time I wasn't even sure if we'd need special
> allocations other than what the dumb BO infrastructure provides. They
> implement some parts of what you've implemented in this series as well,
> with some slight differences.

Ok, the BO infra is still in flux as we're working out the best place
for it and how to split the work.

> Currently these still use the CMA-backed GEM objects but it should be
> easy to switch to something backed by the host1x infrastructure once
> that's in good shape.

Sounds good.

> While I can't find the quote right now, I seem to remember that you said
> at some point that you were planning on adding some 2D acceleration bits
> to libdrm. I don't think that's the right place. That code should rather
> go into the DDX. libdrm should instead provide a thin layer on top of
> the DRM IOCTLs to manage buffers and submit command streams. I hope I
> can finish the cleanup of my libdrm patches over the weekend and push
> them out so this may become clearer. Maybe I can even get the
> corresponding kernel patches pushed out.

Yep, that's exactly what I actually posed as a question in one of the
earlier mails. We also agree that 2D bits should not stay in libdrm.
That's why we've kept the 2D bits design-wise separate from the host1x
stream generation.

We don't yet have any other place to put the 2D functions, so we'll
probably post them as part of a patch series to libdrm. We'll just add a
disclaimer that the 2D code won't remain in libdrm; we want to get the
code out for review as a code example. We can put the 2D code either
into a separate library or into the DDX, whichever is preferred.

The host1x command stream generation would still remain in libdrm. That
seems to be the pattern with other hardware.
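
As a rough illustration of what that generation boils down to, the sketch
below assembles a few host1x command words by hand. The opcode layout
follows the host1x command format from the TRM (opcode in bits 31:28); the
class IDs, register offsets, helper names and the surrounding buffer
handling are assumptions made for this example, not code from either
patch set.

/*
 * Illustrative only: encodings follow the host1x command format as
 * documented in the TRM; class IDs and register offsets here are
 * assumptions for the sake of the example.
 */
#include <stdint.h>

static inline uint32_t host1x_opcode_setclass(unsigned class_id,
                                              unsigned offset, unsigned mask)
{
        return (0x0u << 28) | (offset << 16) | (class_id << 6) | mask;
}

static inline uint32_t host1x_opcode_nonincr(unsigned offset, unsigned count)
{
        return (0x2u << 28) | (offset << 16) | count;
}

#define HOST1X_CLASS_HOST1X     0x01    /* host1x's own class */
#define HOST1X_CLASS_GR2D       0x51    /* 2D engine class (assumed here) */

/*
 * Build a tiny command stream: switch to the 2D class, write one register,
 * then switch back to the host1x class and request a syncpoint increment
 * when the operation completes.  "pb" is a plain array for the example; in
 * libdrm it would be the buffer handed to the submit IOCTL.
 */
static unsigned build_example_stream(uint32_t *pb, unsigned reg,
                                     uint32_t value, unsigned syncpt)
{
        unsigned i = 0;

        pb[i++] = host1x_opcode_setclass(HOST1X_CLASS_GR2D, 0, 0);
        pb[i++] = host1x_opcode_nonincr(reg, 1);
        pb[i++] = value;

        /* INCR_SYNCPT at offset 0: condition OP_DONE (1) in bits 15:8,
         * syncpoint id in bits 7:0 -- as read from the TRM, so treat as
         * illustrative. */
        pb[i++] = host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0);
        pb[i++] = host1x_opcode_nonincr(0x0, 1);
        pb[i++] = (1u << 8) | syncpt;

        return i;       /* number of words written */
}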

Best regards,
Terje
Thierry Reding
2012-12-01 19:29:17 UTC
Permalink
On Sat, Dec 01, 2012 at 07:08:12PM +0200, Terje Bergström wrote:
> On 01.12.2012 16:45, Thierry Reding wrote:
> > I did some prototyping on what a libdrm API could look like a few weeks
> > back. I should clean the patches up some and push them to a public
> > repository or to the mailing lists for review.
>
> Ok. Sorry about the delay - I recently learned I need separate
> permission for user space contribution, so I'm pushing to get that
> permission.

Oh dear. Doesn't sound like fun. =)

> > There isn't actually much more than a bit of framework along with two
> > IOCTLs that allow creating and looking up a Tegra-specific GEM. The
> > related kernel patches aren't available anywhere since I didn't deem
> > them ready yet. At that time I wasn't even sure if we'd need special
> > allocations other than what the dumb BO infrastructure provides. They
> > implement some parts of what you've implemented in this series as well,
> > with some slight differences.
>
> Ok, the BO infra is still in flux as we're working out the best place
> for it and how to split the work.

Yes, I've put the prototype behind an --enable-tegra-experimental-api
switch; that kind of switch has been used in the past for helpers that
weren't finalized yet.

> > While I can't find the quote right now, I seem to remember that you said
> > at some point that you were planning on adding some 2D acceleration bits
> > to libdrm. I don't think that's the right place. That code should rather
> > go into the DDX. libdrm should instead provide a thin layer on top of
> > the DRM IOCTLs to manage buffers and submit command streams. I hope I
> > can finish the cleanup of my libdrm patches over the weekend and push
> > them out so this may become clearer. Maybe I can even get the
> > corresponding kernel patches pushed out.
>
> Yep, that's exactly what I actually posed as a question in one of the
> earlier mails. We also agree that 2D bits should not stay in libdrm.
> That's why we've kept the 2D bits design-wise separate from the host1x
> stream generation.
>
> We don't yet have any other place to put the 2D functions, so we'll
> probably post them as part of a patch series to libdrm. We'll just add a
> disclaimer that the 2D code won't remain in libdrm; we want to get the
> code out for review as a code example. We can put the 2D code either
> into a separate library or into the DDX, whichever is preferred.

FWIW, I've done some work on an initial DDX, which is basically a fork
of xf86-video-modesetting, rebranded and with some cleanup such as
ripping out the PCI support. I wanted to do some testing before pushing
it out, and I think I can get that done on Monday.

Posting the code early is exactly the right thing to do. We still have
to figure out quite a number of things and we can always move code
between the various components of the whole stack.

> The host1x command stream generation would still remain in libdrm. That
> seems to be the pattern with other hardware.

Yes, I fully agree.

Thierry