[dpdk-dev] [PATCH 0/8] support reset of VF link

Discussion:

Wenzhuo Lu

2016-06-06 05:40:45 UTC

If the PF link goes down and then up, the VF link does not
recover accordingly.
This patch set adds support for VF link reset, so that when the VF
receives the messages about a physical link down/up, the APP can
reset the VF link and let it recover.

PS: This patch set is split from a previous patch set, *automatic
link recovery on ixgbe/igb VF*, and it is based on the patch set
*support mailbox interruption on ixgbe/igb VF*.

Wenzhuo Lu (8):
lib/librte_ether: support device reset
lib/librte_ether: defind RX/TX lock mode
ixgbe: RX/TX with lock on VF
ixgbe: implement device reset on VF
igb: RX/TX with lock on VF
igb: implement device reset on VF
i40e:RX/TX with lock on VF
i40e: implement device reset on VF

doc/guides/rel_notes/release_16_07.rst | 14 ++++
drivers/net/e1000/e1000_ethdev.h | 126 ++++++++++++++++++++++++++++
drivers/net/e1000/igb_ethdev.c | 118 +++++++++++++++++++++++++-
drivers/net/e1000/igb_rxtx.c | 148 +++++++++------------------------
drivers/net/i40e/i40e_ethdev.c | 4 +-
drivers/net/i40e/i40e_ethdev.h | 5 ++
drivers/net/i40e/i40e_ethdev_vf.c | 145 +++++++++++++++++++++++++++++++-
drivers/net/i40e/i40e_rxtx.c | 45 ++++++----
drivers/net/i40e/i40e_rxtx.h | 34 ++++++++
drivers/net/ixgbe/ixgbe_ethdev.c | 120 +++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 32 ++++++-
drivers/net/ixgbe/ixgbe_rxtx.c | 116 +++++++++++++++++++++++---
drivers/net/ixgbe/ixgbe_rxtx.h | 13 +++
drivers/net/ixgbe/ixgbe_rxtx_vec.c | 6 ++
lib/librte_ether/rte_ethdev.c | 17 ++++
lib/librte_ether/rte_ethdev.h | 76 +++++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 ++
17 files changed, 879 insertions(+), 147 deletions(-)

--
1.9.3

Wenzhuo Lu

2016-06-06 05:40:46 UTC

Add an API to reset the device.
It is intended for the VF device in the kernel PF + DPDK VF scenario.
When the PF port goes down and up, the APP should call this API to
reset the VF port. Most likely, the APP should call it from its
management thread and guarantee thread safety.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
lib/librte_ether/rte_ethdev.c | 17 +++++++++++++++++
lib/librte_ether/rte_ethdev.h | 14 ++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++++++
3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..e43dca9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
-ENOTSUP);
return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
}
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev;
+ int diag;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+ diag = (*dev->dev_ops->dev_reset)(dev);
+
+ return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..74e895f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
uint8_t en);
/**< @internal enable/disable the l2 tunnel offload functions */

+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
#ifdef RTE_NIC_BYPASS

enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
/** Enable/disable l2 tunnel offload functions */
eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+ /** Reset device. */
+ eth_dev_reset_t dev_reset;
};

/**
@@ -4253,6 +4258,15 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
uint32_t mask,
uint8_t en);

+/**
+ * Reset an Ethernet device.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;

} DPDK_2.2;
+
+DPDK_16.07 {
+ global:
+
+ rte_eth_dev_reset;
+
+} DPDK_16.04;

--
1.9.3
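
The dispatch flow of rte_eth_dev_reset() in this patch is: validate the port, check that the PMD filled in dev_ops->dev_reset, then call it. The sketch below reproduces that flow outside DPDK so it can be compiled on its own. It rests on several assumptions. All demo_* names are hypothetical. The fixed DEMO_MAX_PORTS table and the "attached" flag stand in for rte_eth_devices and RTE_ETH_VALID_PORTID_OR_ERR_RET. The explicit NULL checks stand in for RTE_FUNC_PTR_OR_ERR_RET. The code illustrates the pattern only and is not the real ethdev implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4   /* hypothetical stand-in for RTE_MAX_ETHPORTS */

struct demo_eth_dev;

/* Same shape as eth_dev_reset_t in the patch. */
typedef int (*demo_dev_reset_t)(struct demo_eth_dev *dev);

/* Only the one op this sketch needs. */
struct demo_dev_ops {
    demo_dev_reset_t dev_reset;
};

struct demo_eth_dev {
    const struct demo_dev_ops *dev_ops;
    int attached;      /* hypothetical "port is valid" flag */
    int reset_count;   /* lets the example observe that the op ran */
};

static struct demo_eth_dev demo_eth_devices[DEMO_MAX_PORTS];

/* Mirrors rte_eth_dev_reset(): validate the port, check the op, call it. */
static int demo_eth_dev_reset(uint8_t port_id)
{
    struct demo_eth_dev *dev;

    if (port_id >= DEMO_MAX_PORTS || !demo_eth_devices[port_id].attached)
        return -EINVAL;

    dev = &demo_eth_devices[port_id];

    /* A PMD that does not implement the op gets -ENOTSUP, not a crash. */
    if (dev->dev_ops == NULL || dev->dev_ops->dev_reset == NULL)
        return -ENOTSUP;

    return dev->dev_ops->dev_reset(dev);
}

/* A driver that implements the op (think: a VF PMD). */
static int demo_vf_dev_reset(struct demo_eth_dev *dev)
{
    dev->reset_count++;
    return 0;
}

static const struct demo_dev_ops demo_vf_ops = { .dev_reset = demo_vf_dev_reset };

/* A driver that leaves the op unset. */
static const struct demo_dev_ops demo_pf_ops = { .dev_reset = NULL };
```

With this pattern, drivers opt in one at a time. In the real patch series, ixgbe, igb and i40e fill in dev_reset for their VF ops, and every other PMD keeps returning -ENOTSUP.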

Wenzhuo Lu

2016-06-06 05:40:47 UTC

Define a lock mode for the RX/TX queues. When resetting
the device, we want the resetting thread to take the lock
of each RX/TX queue to make sure RX/TX is stopped.

Use the RTE_NEXT_ABI macro for this ABI change because it has
too much impact: 7 APIs and 1 global variable are affected.

Signed-off-by: Wenzhuo Lu <***@intel.com>
Signed-off-by: Zhe Tao <***@intel.com>
---
lib/librte_ether/rte_ethdev.h | 62 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 74e895f..4efb5e9 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -354,7 +354,12 @@ struct rte_eth_rxmode {
jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
enable_scatter : 1, /**< Enable scatter packets rx handler */
+#ifndef RTE_NEXT_ABI
enable_lro : 1; /**< Enable LRO */
+#else
+ enable_lro : 1, /**< Enable LRO */
+ lock_mode : 1; /**< Using lock path */
+#endif
};

/**
@@ -634,11 +639,68 @@ struct rte_eth_txmode {
/**< If set, reject sending out tagged pkts */
hw_vlan_reject_untagged : 1,
/**< If set, reject sending out untagged pkts */
+#ifndef RTE_NEXT_ABI
hw_vlan_insert_pvid : 1;
/**< If set, enable port based VLAN insertion */
+#else
+ hw_vlan_insert_pvid : 1,
+ /**< If set, enable port based VLAN insertion */
+ lock_mode : 1;
+ /**< If set, using lock path */
+#endif
};

/**
+ * The macros for the RX/TX lock mode functions
+ */
+#ifdef RTE_NEXT_ABI
+#define RX_LOCK_FUNCTION(dev, func) \
+ (dev->data->dev_conf.rxmode.lock_mode ? \
+ func ## _lock : func)
+
+#define TX_LOCK_FUNCTION(dev, func) \
+ (dev->data->dev_conf.txmode.lock_mode ? \
+ func ## _lock : func)
+#else
+#define RX_LOCK_FUNCTION(dev, func) func
+
+#define TX_LOCK_FUNCTION(dev, func) func
+#endif
+
+/* Add the lock RX/TX function for VF reset */
+#define GENERATE_RX_LOCK(func, nic) \
+uint16_t func ## _lock(void *rx_queue, \
+ struct rte_mbuf **rx_pkts, \
+ uint16_t nb_pkts) \
+{ \
+ struct nic ## _rx_queue *rxq = rx_queue; \
+ uint16_t nb_rx = 0; \
+ \
+ if (rte_spinlock_trylock(&rxq->rx_lock)) { \
+ nb_rx = func(rx_queue, rx_pkts, nb_pkts); \
+ rte_spinlock_unlock(&rxq->rx_lock); \
+ } \
+ \
+ return nb_rx; \
+}
+
+#define GENERATE_TX_LOCK(func, nic) \
+uint16_t func ## _lock(void *tx_queue, \
+ struct rte_mbuf **tx_pkts, \
+ uint16_t nb_pkts) \
+{ \
+ struct nic ## _tx_queue *txq = tx_queue; \
+ uint16_t nb_tx = 0; \
+ \
+ if (rte_spinlock_trylock(&txq->tx_lock)) { \
+ nb_tx = func(tx_queue, tx_pkts, nb_pkts); \
+ rte_spinlock_unlock(&txq->tx_lock); \
+ } \
+ \
+ return nb_tx; \
+}
+
+/**
* A structure used to configure an RX ring of an Ethernet port.
*/
struct rte_eth_rxconf {

--
1.9.3
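
The GENERATE_RX_LOCK and RX_LOCK_FUNCTION macros in this patch combine two pieces. A trylock wrapper is generated around each burst function, and the wrapper or the plain function is selected according to lock_mode. The sketch below exercises both pieces outside DPDK. It assumes several substitutions:
- rte_spinlock_t is replaced by a hypothetical demo_spinlock_t built on a C11 atomic_flag.
- struct rte_mbuf is replaced by an opaque demo_mbuf.
- The PMD receive routine is replaced by demo_recv_pkts, which only drains a counter.
- The lock_mode bit is passed directly instead of being read from dev->data->dev_conf.rxmode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_spinlock_t, built on a C11 atomic_flag. */
typedef struct { atomic_flag flag; } demo_spinlock_t;
#define DEMO_SPINLOCK_INITIALIZER { ATOMIC_FLAG_INIT }

static bool demo_spinlock_trylock(demo_spinlock_t *sl)
{
    /* test_and_set returns the previous value: false means we now own it. */
    return !atomic_flag_test_and_set_explicit(&sl->flag, memory_order_acquire);
}

static void demo_spinlock_unlock(demo_spinlock_t *sl)
{
    atomic_flag_clear_explicit(&sl->flag, memory_order_release);
}

struct demo_mbuf;  /* opaque, hypothetical stand-in for struct rte_mbuf */

/* Hypothetical RX queue: the lock plus a counter standing in for the ring. */
struct demo_rx_queue {
    demo_spinlock_t rx_lock;
    uint16_t pending;
};

/* Same shape as GENERATE_RX_LOCK: wrap func in a trylock and report
 * 0 packets if another thread (e.g. the reset path) holds the lock. */
#define DEMO_GENERATE_RX_LOCK(func, nic)                        \
static uint16_t func ## _lock(void *rx_queue,                   \
                              struct demo_mbuf **rx_pkts,       \
                              uint16_t nb_pkts)                 \
{                                                               \
    struct nic ## _rx_queue *rxq = rx_queue;                    \
    uint16_t nb_rx = 0;                                         \
    if (demo_spinlock_trylock(&rxq->rx_lock)) {                 \
        nb_rx = func(rx_queue, rx_pkts, nb_pkts);               \
        demo_spinlock_unlock(&rxq->rx_lock);                    \
    }                                                           \
    return nb_rx;                                               \
}

/* Unlocked burst function standing in for a PMD receive routine. */
static uint16_t demo_recv_pkts(void *rx_queue, struct demo_mbuf **rx_pkts,
                               uint16_t nb_pkts)
{
    struct demo_rx_queue *rxq = rx_queue;
    uint16_t n = rxq->pending < nb_pkts ? rxq->pending : nb_pkts;

    (void)rx_pkts;  /* no real mbufs in this sketch */
    rxq->pending -= n;
    return n;
}

DEMO_GENERATE_RX_LOCK(demo_recv_pkts, demo)  /* defines demo_recv_pkts_lock() */

/* Same idea as RX_LOCK_FUNCTION: pick the locked variant when lock_mode is set. */
#define DEMO_RX_LOCK_FUNCTION(lock_mode, func) \
    ((lock_mode) ? func ## _lock : func)
```

Because the wrapper uses trylock, a data-plane core never blocks while the reset thread holds a queue lock; it simply sees an empty poll (0 packets). The cost is an atomic read-modify-write on every burst, even when no reset is pending, which is part of the overhead questioned later in this thread.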

Stephen Hemminger

2016-06-08 02:15:53 UTC

On Mon, 6 Jun 2016 13:40:47 +0800, Wenzhuo Lu wrote:
> Define a lock mode for the RX/TX queues. When resetting
> the device, we want the resetting thread to take the lock
> of each RX/TX queue to make sure RX/TX is stopped.
> Use the RTE_NEXT_ABI macro for this ABI change because it has
> too much impact: 7 APIs and 1 global variable are affected.

Why does this patch set make a different assumption than the rest of the DPDK?

The rest of the DPDK operates on the principle that the application
is smart enough to stop the device before making changes. There is no
equivalent to the Linux kernel RTNL mutex. The API assumes application
threads are well behaved and will not try and sabotage each other.

If you restrict the reset operation to only being available when RX/TX is stopped,
then no lock is needed.

The fact that it requires lots more locking inside each device driver implies
to me this is not the correct way to architect this.

Lu, Wenzhuo

2016-06-08 07:34:43 UTC

Hi Stephen,

> -----Original Message-----
> Sent: Wednesday, June 8, 2016 10:16 AM
> To: Lu, Wenzhuo
> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
> On Mon, 6 Jun 2016 13:40:47 +0800
>
> Why does this patch set make a different assumption than the rest of the DPDK?
> The rest of the DPDK operates on the principle that the application is smart
> enough to stop the device before making changes. There is no equivalent to the
> Linux kernel RTNL mutex. The API assumes application threads are well behaved
> and will not try and sabotage each other.
> If you restrict the reset operation to only being available when RX/TX is stopped,
> then no lock is needed.
> The fact that it requires lots more locking inside each device driver implies to me
> this is not the correct way to architect this.

It's a good question. This patch set doesn't follow the regular assumption of DPDK.
But it's a requirement we've got from some customers. The users want the driver to do as much as it can; ideally, the link state change is transparent to the users.
The patch set tries to provide another choice for users who don't want to stop their RX/TX to handle the reset event.

And as discussed in the other thread, most probably we will move the lock from the PMD layer to the rte layer. That will avoid changing every device driver.

Olivier Matz

2016-06-09 07:50:57 UTC

Hi,

Lu, Wenzhuo wrote:
> Hi Stephen,
>
>> -----Original Message-----
>> Sent: Wednesday, June 8, 2016 10:16 AM
>> To: Lu, Wenzhuo
>> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
>> On Mon, 6 Jun 2016 13:40:47 +0800
>>
>> Why does this patch set make a different assumption than the rest of the DPDK?
>> The rest of the DPDK operates on the principle that the application is smart
>> enough to stop the device before making changes. There is no equivalent to the
>> Linux kernel RTNL mutex. The API assumes application threads are well behaved
>> and will not try and sabotage each other.
>> If you restrict the reset operation to only being available when RX/TX is stopped,
>> then no lock is needed.
>> The fact that it requires lots more locking inside each device driver implies to me
>> this is not the correct way to architect this.

+1

I'm not sure adding locks is the proper way to do it.
It is the application's responsibility to ensure that:
- control functions are not called concurrently on the same port
- rx/tx functions are not called when the device is stopped/reset/...

However, I do think the usage paradigms of the ethdev API should be
better documented in rte_ethdev.h (e.g. which functions can be called
concurrently). This would be a first step.

If we really want a helper API for this in DPDK, the _next_ step
could be to add one to the ethdev API. Maybe
something like (the function names could be better):

- to be called on one control thread:

rte_eth_stop_rxtx(port)
rte_eth_start_rxtx(port)

rte_eth_get_rxtx_state(port)
-> return "running" if at least one core is inside the rx/tx code
-> return "stopped" if all cores are outside the rx/tx code

- to be called on dataplane cores:

/* same as rte_eth_rx_burst(), but first checks whether rx/tx is
 * allowed, else does nothing */
rte_eth_rx_burst_interruptible()
rte_eth_tx_burst_interruptible()

The code of the control thread could be:

rte_eth_stop_rxtx(port);
/* wait until all dataplane cores have finished their processing */
while (rte_eth_get_rxtx_state(port) != stopped)
;
rte_eth_some_control_operation(port);
rte_eth_start_rxtx(port);

I think this could be done without any lock, just with the proper
memory barriers and a per-core status.

But this API may impose a paradigm on the application, and I'm not
sure the DPDK should do that.

Regards,
Olivier
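
The lock-free scheme proposed above uses a per-port "allowed" flag, a per-core "inside rx/tx" status, and memory barriers. It could be sketched with C11 atomics roughly as follows. This is a minimal illustration under stated assumptions, not an existing DPDK API:
- All demo_* names, the fixed DEMO_MAX_CORES array and the single-port control block are hypothetical.
- Each data-plane core is assumed to pass its own index, below DEMO_MAX_CORES.
- A plain function pointer stands in for the PMD burst routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_CORES 8   /* hypothetical stand-in for RTE_MAX_LCORE */

enum demo_rxtx_state { DEMO_RXTX_STOPPED, DEMO_RXTX_RUNNING };

/* Hypothetical per-port control block; zero-initialised = rx/tx not allowed. */
struct demo_port_ctl {
    atomic_bool rxtx_allowed;            /* written only by the control thread */
    atomic_int in_rxtx[DEMO_MAX_CORES];  /* 1 while that core is inside rx/tx */
};

/* Control thread side. */
static void demo_stop_rxtx(struct demo_port_ctl *p)
{
    atomic_store(&p->rxtx_allowed, false);   /* seq_cst store */
}

static void demo_start_rxtx(struct demo_port_ctl *p)
{
    atomic_store(&p->rxtx_allowed, true);
}

/* "running" if at least one core is inside rx/tx, else "stopped". */
static enum demo_rxtx_state demo_get_rxtx_state(struct demo_port_ctl *p)
{
    for (unsigned int c = 0; c < DEMO_MAX_CORES; c++)
        if (atomic_load(&p->in_rxtx[c]))     /* seq_cst load */
            return DEMO_RXTX_RUNNING;
    return DEMO_RXTX_STOPPED;
}

/* Data-plane side. */
typedef uint16_t (*demo_burst_fn)(void *queue, uint16_t nb_pkts);

static uint16_t demo_rx_burst_interruptible(struct demo_port_ctl *p,
                                            unsigned int core,
                                            demo_burst_fn burst,
                                            void *queue, uint16_t nb_pkts)
{
    uint16_t n = 0;

    /* Publish "inside" before reading the flag. With seq_cst on both
     * sides, either the control thread sees this 1, or this core sees
     * rxtx_allowed == false; it cannot miss both. */
    atomic_store(&p->in_rxtx[core], 1);
    if (atomic_load(&p->rxtx_allowed))
        n = burst(queue, nb_pkts);
    /* Release: queue accesses above are visible before "outside" is. */
    atomic_store_explicit(&p->in_rxtx[core], 0, memory_order_release);
    return n;
}

/* Trivial burst used by the example: pretends nb_pkts packets arrived. */
static uint16_t demo_fake_burst(void *queue, uint16_t nb_pkts)
{
    (void)queue;
    return nb_pkts;
}
```

The control thread would call demo_stop_rxtx() and spin until demo_get_rxtx_state() returns DEMO_RXTX_STOPPED. It can then run the control operation and call demo_start_rxtx(). No lock is taken on the data path. The price is a sequentially consistent store per burst, which typically costs a full memory barrier.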

Lu, Wenzhuo

2016-06-12 05:25:41 UTC

Hi Olivier,

> -----Original Message-----
> Sent: Thursday, June 9, 2016 3:51 PM
> To: Lu, Wenzhuo; Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
> Hi,
>
> Lu, Wenzhuo wrote:
>> Hi Stephen,
>>
>>> -----Original Message-----
>>> Sent: Wednesday, June 8, 2016 10:16 AM
>>> To: Lu, Wenzhuo
>>> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
>>> On Mon, 6 Jun 2016 13:40:47 +0800
>>>
>>> other.