Discussion:
[dpdk-dev] [PATCH 0/8] support reset of VF link
Wenzhuo Lu
2016-06-06 05:40:45 UTC
If the PF link goes down and then up, the VF link does not
recover accordingly.
This patch set adds support for VF link reset. When the VF
receives the physical link down/up messages, the APP can reset the
VF link and let it recover.

PS: This patch set is split from a previous patch set, *automatic
link recovery on ixgbe/igb VF*, and it is based on the patch set
*support mailbox interruption on ixgbe/igb VF*.

Wenzhuo Lu (8):
lib/librte_ether: support device reset
lib/librte_ether: defind RX/TX lock mode
ixgbe: RX/TX with lock on VF
ixgbe: implement device reset on VF
igb: RX/TX with lock on VF
igb: implement device reset on VF
i40e:RX/TX with lock on VF
i40e: implement device reset on VF

doc/guides/rel_notes/release_16_07.rst | 14 ++++
drivers/net/e1000/e1000_ethdev.h | 126 ++++++++++++++++++++++++++++
drivers/net/e1000/igb_ethdev.c | 118 +++++++++++++++++++++++++-
drivers/net/e1000/igb_rxtx.c | 148 +++++++++------------------------
drivers/net/i40e/i40e_ethdev.c | 4 +-
drivers/net/i40e/i40e_ethdev.h | 5 ++
drivers/net/i40e/i40e_ethdev_vf.c | 145 +++++++++++++++++++++++++++++++-
drivers/net/i40e/i40e_rxtx.c | 45 ++++++----
drivers/net/i40e/i40e_rxtx.h | 34 ++++++++
drivers/net/ixgbe/ixgbe_ethdev.c | 120 +++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 32 ++++++-
drivers/net/ixgbe/ixgbe_rxtx.c | 116 +++++++++++++++++++++++---
drivers/net/ixgbe/ixgbe_rxtx.h | 13 +++
drivers/net/ixgbe/ixgbe_rxtx_vec.c | 6 ++
lib/librte_ether/rte_ethdev.c | 17 ++++
lib/librte_ether/rte_ethdev.h | 76 +++++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 ++
17 files changed, 879 insertions(+), 147 deletions(-)
--
1.9.3
Wenzhuo Lu
2016-06-06 05:40:46 UTC
Add an API to reset the device.
It is intended for a VF device in the kernel PF + DPDK VF scenario.
When the PF port goes down and up, the APP should call this API to
reset the VF port. Most likely, the APP should call it from its
management thread and guarantee thread safety.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
lib/librte_ether/rte_ethdev.c | 17 +++++++++++++++++
lib/librte_ether/rte_ethdev.h | 14 ++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++++++
3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..e43dca9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
-ENOTSUP);
return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
}
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev;
+ int diag;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+ diag = (*dev->dev_ops->dev_reset)(dev);
+
+ return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..74e895f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
uint8_t en);
/**< @internal enable/disable the l2 tunnel offload functions */

+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
#ifdef RTE_NIC_BYPASS

enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
/** Enable/disable l2 tunnel offload functions */
eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+ /** Reset device. */
+ eth_dev_reset_t dev_reset;
};

/**
@@ -4253,6 +4258,15 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
uint32_t mask,
uint8_t en);

+/**
+ * Reset an Ethernet device.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;

} DPDK_2.2;
+
+DPDK_16.07 {
+ global:
+
+ rte_eth_dev_reset;
+
+} DPDK_16.04;
--
1.9.3
Wenzhuo Lu
2016-06-06 05:40:47 UTC
Define a lock mode for the RX/TX queues. When resetting
the device, we want the resetting thread to take the lock
of each RX/TX queue to make sure RX/TX is stopped.

Use the next-ABI macro for this ABI change because its impact
is large: 7 APIs and 1 global variable are affected.

Signed-off-by: Wenzhuo Lu <***@intel.com>
Signed-off-by: Zhe Tao <***@intel.com>
---
lib/librte_ether/rte_ethdev.h | 62 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 74e895f..4efb5e9 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -354,7 +354,12 @@ struct rte_eth_rxmode {
jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
enable_scatter : 1, /**< Enable scatter packets rx handler */
+#ifndef RTE_NEXT_ABI
enable_lro : 1; /**< Enable LRO */
+#else
+ enable_lro : 1, /**< Enable LRO */
+ lock_mode : 1; /**< Using lock path */
+#endif
};

/**
@@ -634,11 +639,68 @@ struct rte_eth_txmode {
/**< If set, reject sending out tagged pkts */
hw_vlan_reject_untagged : 1,
/**< If set, reject sending out untagged pkts */
+#ifndef RTE_NEXT_ABI
hw_vlan_insert_pvid : 1;
/**< If set, enable port based VLAN insertion */
+#else
+ hw_vlan_insert_pvid : 1,
+ /**< If set, enable port based VLAN insertion */
+ lock_mode : 1;
+ /**< If set, using lock path */
+#endif
};

/**
+ * The macros for the RX/TX lock mode functions
+ */
+#ifdef RTE_NEXT_ABI
+#define RX_LOCK_FUNCTION(dev, func) \
+ (dev->data->dev_conf.rxmode.lock_mode ? \
+ func ## _lock : func)
+
+#define TX_LOCK_FUNCTION(dev, func) \
+ (dev->data->dev_conf.txmode.lock_mode ? \
+ func ## _lock : func)
+#else
+#define RX_LOCK_FUNCTION(dev, func) func
+
+#define TX_LOCK_FUNCTION(dev, func) func
+#endif
+
+/* Add the lock RX/TX function for VF reset */
+#define GENERATE_RX_LOCK(func, nic) \
+uint16_t func ## _lock(void *rx_queue, \
+ struct rte_mbuf **rx_pkts, \
+ uint16_t nb_pkts) \
+{ \
+ struct nic ## _rx_queue *rxq = rx_queue; \
+ uint16_t nb_rx = 0; \
+ \
+ if (rte_spinlock_trylock(&rxq->rx_lock)) { \
+ nb_rx = func(rx_queue, rx_pkts, nb_pkts); \
+ rte_spinlock_unlock(&rxq->rx_lock); \
+ } \
+ \
+ return nb_rx; \
+}
+
+#define GENERATE_TX_LOCK(func, nic) \
+uint16_t func ## _lock(void *tx_queue, \
+ struct rte_mbuf **tx_pkts, \
+ uint16_t nb_pkts) \
+{ \
+ struct nic ## _tx_queue *txq = tx_queue; \
+ uint16_t nb_tx = 0; \
+ \
+ if (rte_spinlock_trylock(&txq->tx_lock)) { \
+ nb_tx = func(tx_queue, tx_pkts, nb_pkts); \
+ rte_spinlock_unlock(&txq->tx_lock); \
+ } \
+ \
+ return nb_tx; \
+}
+
+/**
* A structure used to configure an RX ring of an Ethernet port.
*/
struct rte_eth_rxconf {
--
1.9.3
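
For readers outside the DPDK tree, the locked RX path that GENERATE_RX_LOCK expands to can be illustrated with a minimal standalone sketch. Everything here is an assumption made for illustration: `demo_rx_queue`, `demo_recv_pkts`, and `demo_recv_pkts_lock` are hypothetical names, and a C11 `atomic_flag` stands in for `rte_spinlock_t`. The point it shows is that the data path uses a trylock, so a burst returns 0 packets instead of blocking while another thread holds the queue lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a PMD RX queue; only the lock matters here. */
struct demo_rx_queue {
	atomic_flag rx_lock; /* plays the role of rte_spinlock_t rx_lock */
	uint16_t pending;    /* packets waiting in the fake ring */
};

static void
demo_rx_queue_init(struct demo_rx_queue *rxq, uint16_t pending)
{
	atomic_flag_clear(&rxq->rx_lock); /* start unlocked */
	rxq->pending = pending;
}

/* Unlocked burst function: "receives" up to nb_pkts packets. */
static uint16_t
demo_recv_pkts(void *rx_queue, uint16_t nb_pkts)
{
	struct demo_rx_queue *rxq = rx_queue;
	uint16_t n = rxq->pending < nb_pkts ? rxq->pending : nb_pkts;

	rxq->pending -= n;
	return n;
}

/* Locked variant, same shape as the function the macro generates. */
static uint16_t
demo_recv_pkts_lock(void *rx_queue, uint16_t nb_pkts)
{
	struct demo_rx_queue *rxq = rx_queue;
	uint16_t nb_rx = 0;

	/* test_and_set returns the previous value: false means we own the lock. */
	if (!atomic_flag_test_and_set_explicit(&rxq->rx_lock,
					       memory_order_acquire)) {
		nb_rx = demo_recv_pkts(rx_queue, nb_pkts);
		atomic_flag_clear_explicit(&rxq->rx_lock,
					   memory_order_release);
	}

	return nb_rx;
}
```

While a resetting thread holds `rx_lock`, every call to `demo_recv_pkts_lock()` returns 0, which the caller sees as an empty burst rather than an error.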
Stephen Hemminger
2016-06-08 02:15:53 UTC
On Mon, 6 Jun 2016 13:40:47 +0800
Post by Wenzhuo Lu
Define lock mode for RX/TX queue. Because when resetting
the device we want the resetting thread to get the lock
of the RX/TX queue to make sure the RX/TX is stopped.
Using next ABI macro for this ABI change as it has too
much impact. 7 APIs and 1 global variable are impacted.
Why does this patch set make a different assumption than the rest of the DPDK?

The rest of the DPDK operates on the principle that the application
is smart enough to stop the device before making changes. There is no
equivalent to the Linux kernel RTNL mutex. The API assumes application
threads are well behaved and will not try and sabotage each other.

If you restrict the reset operation to only being available when RX/TX is stopped,
then no lock is needed.

The fact that it requires a lot more locking inside each device driver implies
to me that this is not the correct way to architect this.
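
The lock-free alternative described above, where reset is only available while RX/TX is stopped, could look roughly like the following sketch. It is not DPDK code: `demo_port`, `demo_port_start`, `demo_port_stop`, and `demo_port_reset` are hypothetical names, and returning `-EBUSY` for a started port is an assumption, since the thread does not name an error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state owned by a single control thread. */
struct demo_port {
	bool started;        /* set by start, cleared by stop */
	unsigned int resets; /* counts completed resets */
};

static void
demo_port_start(struct demo_port *port)
{
	port->started = true;
}

static void
demo_port_stop(struct demo_port *port)
{
	/* The application stops polling RX/TX before calling this. */
	port->started = false;
}

/*
 * Reset is refused while the port is started, so the data path never
 * runs concurrently with it and no per-queue lock is required.
 */
static int
demo_port_reset(struct demo_port *port)
{
	if (port->started)
		return -EBUSY; /* assumed error code: caller must stop first */

	port->resets++; /* placeholder for the real hardware reset */
	return 0;
}
```

The cost of this design is on the application side: it has to quiesce its own dataplane threads before calling stop and reset.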
Lu, Wenzhuo
2016-06-08 07:34:43 UTC
Hi Stephen,
-----Original Message-----
Sent: Wednesday, June 8, 2016 10:16 AM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
On Mon, 6 Jun 2016 13:40:47 +0800
Define lock mode for RX/TX queue. Because when resetting the device we
want the resetting thread to get the lock of the RX/TX queue to make
sure the RX/TX is stopped.
Using next ABI macro for this ABI change as it has too much impact. 7
APIs and 1 global variable are impacted.
Why does this patch set make a different assumption the rest of the DPDK?
The rest of the DPDK operates on the principle that the application is smart
enough to stop the device before making changes. There is no equivalent to the
Linux kernel RTNL mutex. The API assumes application threads are well behaved
and will not try and sabotage each other.
If you restrict the reset operation to only being available when RX/TX is stopped,
then no lock is needed.
The fact that it requires lots more locking inside each device driver implies to me
this is not correct way to architect this.
It's a good question. This patch set doesn't follow the regular assumption of DPDK.
But it's a requirement we've got from some customers. The users want the driver to do as much as it can. Ideally, the link state change is transparent to the users.
The patch set tries to provide another choice for users who don't want to stop their rx/tx to handle the reset event.

And as discussed in the other thread, most probably we will move the lock from the PMD layer to the rte layer. That will avoid changing every device driver.
Olivier Matz
2016-06-09 07:50:57 UTC
Hi,
Post by Lu, Wenzhuo
Hi Stephen,
-----Original Message-----
Sent: Wednesday, June 8, 2016 10:16 AM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
On Mon, 6 Jun 2016 13:40:47 +0800
Define lock mode for RX/TX queue. Because when resetting the device we
want the resetting thread to get the lock of the RX/TX queue to make
sure the RX/TX is stopped.
Using next ABI macro for this ABI change as it has too much impact. 7
APIs and 1 global variable are impacted.
Why does this patch set make a different assumption the rest of the DPDK?
The rest of the DPDK operates on the principle that the application is smart
enough to stop the device before making changes. There is no equivalent to the
Linux kernel RTNL mutex. The API assumes application threads are well behaved
and will not try and sabotage each other.
If you restrict the reset operation to only being available when RX/TX is stopped,
then no lock is needed.
The fact that it requires lots more locking inside each device driver implies to me
this is not correct way to architect this.
+1

I'm not sure adding locks is the proper way to do this.
It is the application's responsibility to ensure that:
- control functions are not called concurrently on the same port
- rx/tx functions are not called when the device is stopped/reset/...

However, I do think the usage paradigms of the ethdev api should be
better documented in rte_ethdev.h (ex: which functions can be called
concurrently). This would be a first step.

If we really want a helper API to do that in DPDK, the _next_ step
could be to add them in the ethdev api to achieve this. Maybe
something like (the function names could be better):

- to be called on one control thread:

rte_eth_stop_rxtx(port)
rte_eth_start_rxtx(port)

rte_eth_get_rxtx_state(port)
-> return "running" if at least one core is inside the rx/tx code
-> return "stopped" if all cores are outside the rx/tx code

- to be called on dataplane cores:

/* same as rte_eth_rx_burst(), but checks if rx/tx is allowed
* first, else do nothing */
rte_eth_rx_burst_interruptible()
rte_eth_tx_burst_interruptible()


The code of control thread could be:

rte_eth_stop_rxtx(port);
/* wait until all dataplane cores have finished their processing */
while (rte_eth_get_rxtx_state(port) != stopped)
;
rte_eth_some_control_operation(port);
rte_eth_start_rxtx(port);


I think this could be done without any lock, just with the proper
memory barriers and a per-core status.

But this API may impose a paradigm on the application, and I'm not
sure the DPDK should do that.

Regards,
Olivier
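
The "memory barriers and a per-core status" idea above can be sketched with C11 atomics. This is only one possible reading of the proposal, not an existing ethdev API: all `demo_*` names and `DEMO_MAX_CORES` are hypothetical, the real burst call is replaced by a placeholder, and the sketch relies on the default sequentially consistent ordering of `atomic_store`/`atomic_load` rather than hand-placed barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_CORES 4 /* hypothetical number of dataplane cores */

struct demo_port {
	atomic_bool rxtx_allowed;            /* written by the control thread */
	atomic_bool in_rxtx[DEMO_MAX_CORES]; /* one status flag per core */
};

static void
demo_port_init(struct demo_port *p)
{
	atomic_init(&p->rxtx_allowed, true);
	for (unsigned int i = 0; i < DEMO_MAX_CORES; i++)
		atomic_init(&p->in_rxtx[i], false);
}

/* Control thread: forbid or allow new entries into the rx/tx code. */
static void
demo_stop_rxtx(struct demo_port *p)
{
	atomic_store(&p->rxtx_allowed, false);
}

static void
demo_start_rxtx(struct demo_port *p)
{
	atomic_store(&p->rxtx_allowed, true);
}

/* Control thread: true while at least one core is inside the rx/tx code. */
static bool
demo_rxtx_running(struct demo_port *p)
{
	for (unsigned int i = 0; i < DEMO_MAX_CORES; i++)
		if (atomic_load(&p->in_rxtx[i]))
			return true;
	return false;
}

/* Dataplane core (core < DEMO_MAX_CORES): burst only if rx/tx is allowed. */
static uint16_t
demo_rx_burst_interruptible(struct demo_port *p, unsigned int core,
			    uint16_t nb_pkts)
{
	uint16_t nb_rx = 0;

	/* Announce entry first, then check permission. */
	atomic_store(&p->in_rxtx[core], true);
	if (atomic_load(&p->rxtx_allowed))
		nb_rx = nb_pkts; /* placeholder for the real burst call */
	atomic_store(&p->in_rxtx[core], false);

	return nb_rx;
}
```

Because the dataplane core publishes its status before reading `rxtx_allowed`, and the control thread clears `rxtx_allowed` before polling the statuses, a core that still saw "allowed" is always visible as running to the control thread. The control loop from the message then becomes `demo_stop_rxtx(p); while (demo_rxtx_running(p)) ;` followed by the control operation and `demo_start_rxtx(p)`.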
Lu, Wenzhuo
2016-06-12 05:25:41 UTC
Hi Olivier,
-----Original Message-----
Sent: Thursday, June 9, 2016 3:51 PM
To: Lu, Wenzhuo; Stephen Hemminger
Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
Hi,
Post by Lu, Wenzhuo
Hi Stephen,
-----Original Message-----
Sent: Wednesday, June 8, 2016 10:16 AM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
On Mon, 6 Jun 2016 13:40:47 +0800
Define lock mode for RX/TX queue. Because when resetting the device
we want the resetting thread to get the lock of the RX/TX queue to
make sure the RX/TX is stopped.
Using next ABI macro for this ABI change as it has too much impact.
7 APIs and 1 global variable are impacted.
Why does this patch set make a different assumption the rest of the DPDK?
The rest of the DPDK operates on the principle that the application
is smart enough to stop the device before making changes. There is no
equivalent to the Linux kernel RTNL mutex. The API assumes
application threads are well behaved and will not try and sabotage each
other.
Post by Lu, Wenzhuo
If you restrict the reset operation to only being available when
RX/TX is stopped, then no lock is needed.
The fact that it requires lots more locking inside each device driver
implies to me this is not correct way to architect this.
+1
I'm not sure adding locks is the proper way to do.
- control functions are not called concurrently on the same port
- rx/tx functions are not called when the device is stopped/reset/...
However, I do think the usage paradigms of the ethdev api should be better
documented in rte_ethdev.h (ex: which functions can be called concurrently).
This would be a first step.
If we really want a helper API to do that in DPDK, the _next_ step could be to
add them in the ethdev api to achieve this. Maybe something like (the function
rte_eth_stop_rxtx(port)
rte_eth_start_rxtx(port)
rte_eth_get_rxtx_state(port)
-> return "running" if at least one core is inside the rx/tx code
-> return "stopped" if all cores are outside the rx/tx code
/* same than rte_eth_rx_burst(), but checks if rx/tx is allowed
* first, else do nothing */
rte_eth_rx_burst_interruptible()
rte_eth_tx_burst_interruptible()
rte_eth_stop_rxtx(port);
/* wait that all dataplane cores finished their processing */
while (rte_eth_get_rxtx_state(port) != stopped)
;
rte_eth_some_control_operation(port);
rte_eth_start_rxtx(port);
I think this could be done without any lock, just with the proper memory barriers
and a per-core status.
But this API may impose a paradigm to the application, and I'm not sure the
DPDK should do that.
I don't quite catch your point. It seems your solution still needs the APP to change its code. I think it's more complex than just letting the APP stop the rx/tx and reset the port. The purpose of this patch set is to let the APP do as little as possible. It's not a good choice if we make it more complex.
And it seems hard to stop and start rx/tx in the rte layer. Normally the APP should do that. In my opinion, we have to introduce a lock in rte to achieve that.
Regards,
Olivier
Stephen Hemminger
2016-06-10 18:12:27 UTC
On Wed, 8 Jun 2016 07:34:43 +0000
Post by Lu, Wenzhuo
The fact that it requires lots more locking inside each device driver implies to me
this is not correct way to architect this.
It's a good question. This patch set doesn't follow the regular assumption of DPDK.
But it's a requirement we've got from some customers. The users want the driver does as much as it can. The best is the link state change is transparent to the users.
The patch set tries to provide another choice if the users don't want to stop their rx/tx to handle the reset event.
Then bring those users to the development world (on the users mailing list) and let's
start the discussion there. Requirements creeping in through the backdoor also worry me.
Lu, Wenzhuo
2016-06-12 05:27:54 UTC
Hi Stephen,
-----Original Message-----
Sent: Saturday, June 11, 2016 2:12 AM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
On Wed, 8 Jun 2016 07:34:43 +0000
Post by Lu, Wenzhuo
Post by Stephen Hemminger
The fact that it requires lots more locking inside each device
driver implies to me this is not correct way to architect this.
It's a good question. This patch set doesn't follow the regular assumption of
DPDK.
Post by Lu, Wenzhuo
But it's a requirement we've got from some customers. The users want the
driver does as much as it can. The best is the link state change is transparent to
the users.
Post by Lu, Wenzhuo
The patch set tries to provide another choice if the users don't want to stop
their rx/tx to handle the reset event.
Then bring those uses to the development world (on users mailing list) and lets
start the discussion there. The requirements creeping in through the backdoor
also worries me.
Got it. Then how about we only provide a reset API and let the APP stop/start the rx/tx and call the API to reset the port? Thanks.
Wenzhuo Lu
2016-06-06 05:40:48 UTC
Add RX/TX paths with lock for VF. They are used when
VF link reset is needed.
With the lock on RX/TX, RX/TX can be
stopped, which gives us a chance to reset the VF link.

Please be aware that there is a performance drop if the lock
path is chosen.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
drivers/net/ixgbe/ixgbe_ethdev.c | 12 +++++--
drivers/net/ixgbe/ixgbe_ethdev.h | 20 +++++++++++
drivers/net/ixgbe/ixgbe_rxtx.c | 74 ++++++++++++++++++++++++++++++++------
drivers/net/ixgbe/ixgbe_rxtx.h | 13 +++++++
drivers/net/ixgbe/ixgbe_rxtx_vec.c | 6 ++++
5 files changed, 112 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 05f4f29..fd2682f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1325,8 +1325,8 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
PMD_INIT_FUNC_TRACE();

eth_dev->dev_ops = &ixgbevf_eth_dev_ops;
- eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
- eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+ eth_dev->rx_pkt_burst = RX_LOCK_FUNCTION(eth_dev, ixgbe_recv_pkts);
+ eth_dev->tx_pkt_burst = TX_LOCK_FUNCTION(eth_dev, ixgbe_xmit_pkts);

/* for secondary processes, we don't initialise any further as primary
* has already done this work. Only check we don't need a different
@@ -3012,7 +3012,15 @@ ixgbe_dev_supported_ptypes_get(struct rte_eth_dev *dev)
if (dev->rx_pkt_burst == ixgbe_recv_pkts ||
dev->rx_pkt_burst == ixgbe_recv_pkts_lro_single_alloc ||
dev->rx_pkt_burst == ixgbe_recv_pkts_lro_bulk_alloc ||
+#ifndef RTE_NEXT_ABI
dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc)
+#else
+ dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc ||
+ dev->rx_pkt_burst == ixgbe_recv_pkts_lock ||
+ dev->rx_pkt_burst == ixgbe_recv_pkts_lro_single_alloc_lock ||
+ dev->rx_pkt_burst == ixgbe_recv_pkts_lro_bulk_alloc_lock ||
+ dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc_lock)
+#endif
return ptypes;
return NULL;
}
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..701107b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -390,12 +390,32 @@ uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);

+uint16_t ixgbe_recv_pkts_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_bulk_alloc_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_lro_single_alloc_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_lro_bulk_alloc_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+
uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t ixgbe_xmit_pkts_lock(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+uint16_t ixgbe_xmit_pkts_simple_lock(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+
int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
struct rte_eth_rss_conf *rss_conf);

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9c6eaf2..a45d115 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -353,6 +353,8 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
return nb_tx;
}

+GENERATE_TX_LOCK(ixgbe_xmit_pkts_simple, ixgbe)
+
static inline void
ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
@@ -904,6 +906,8 @@ end_of_tx:
return nb_tx;
}

+GENERATE_TX_LOCK(ixgbe_xmit_pkts, ixgbe)
+
/*********************************************************************
*
* RX functions
@@ -1524,6 +1528,8 @@ ixgbe_recv_pkts_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
return nb_rx;
}

+GENERATE_RX_LOCK(ixgbe_recv_pkts_bulk_alloc, ixgbe)
+
uint16_t
ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
@@ -1712,6 +1718,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return nb_rx;
}

+GENERATE_RX_LOCK(ixgbe_recv_pkts, ixgbe)
+
/**
* Detect an RSC descriptor.
*/
@@ -2071,6 +2079,8 @@ ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
}

+GENERATE_RX_LOCK(ixgbe_recv_pkts_lro_single_alloc, ixgbe)
+
uint16_t
ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
@@ -2078,6 +2088,8 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
}

+GENERATE_RX_LOCK(ixgbe_recv_pkts_lro_bulk_alloc, ixgbe)
+
/*********************************************************************
*
* Queue management functions
@@ -2186,10 +2198,12 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
(rte_eal_process_type() != RTE_PROC_PRIMARY ||
ixgbe_txq_vec_setup(txq) == 0)) {
PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
- dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
+ dev->tx_pkt_burst =
+ TX_LOCK_FUNCTION(dev, ixgbe_xmit_pkts_vec);
} else
#endif
- dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
+ dev->tx_pkt_burst =
+ TX_LOCK_FUNCTION(dev, ixgbe_xmit_pkts_simple);
} else {
PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
PMD_INIT_LOG(DEBUG,
@@ -2200,7 +2214,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
" - tx_rs_thresh = %lu " "[RTE_PMD_IXGBE_TX_MAX_BURST=%lu]",
(unsigned long)txq->tx_rs_thresh,
(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
- dev->tx_pkt_burst = ixgbe_xmit_pkts;
+ dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, ixgbe_xmit_pkts);
}
}

@@ -2347,6 +2361,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->txq_flags = tx_conf->txq_flags;
txq->ops = &def_txq_ops;
txq->tx_deferred_start = tx_conf->tx_deferred_start;
+ rte_spinlock_init(&txq->tx_lock);

/*
* Modification to set VFTDT for virtual function if vf is detected
@@ -2625,6 +2640,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
0 : ETHER_CRC_LEN);
rxq->drop_en = rx_conf->rx_drop_en;
rxq->rx_deferred_start = rx_conf->rx_deferred_start;
+ rte_spinlock_init(&rxq->rx_lock);

/*
* The packet type in RX descriptor is different for different NICs.
@@ -4172,11 +4188,15 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
if (adapter->rx_bulk_alloc_allowed) {
PMD_INIT_LOG(DEBUG, "LRO is requested. Using a bulk "
"allocation version");
- dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ ixgbe_recv_pkts_lro_bulk_alloc);
} else {
PMD_INIT_LOG(DEBUG, "LRO is requested. Using a single "
"allocation version");
- dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ ixgbe_recv_pkts_lro_single_alloc);
}
} else if (dev->data->scattered_rx) {
/*
@@ -4188,12 +4208,16 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
"callback (port=%d).",
dev->data->port_id);

- dev->rx_pkt_burst = ixgbe_recv_scattered_pkts_vec;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ ixgbe_recv_scattered_pkts_vec);
} else if (adapter->rx_bulk_alloc_allowed) {
PMD_INIT_LOG(DEBUG, "Using a Scattered with bulk "
"allocation callback (port=%d).",
dev->data->port_id);
- dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ ixgbe_recv_pkts_lro_bulk_alloc);
} else {
PMD_INIT_LOG(DEBUG, "Using Regualr (non-vector, "
"single allocation) "
@@ -4201,7 +4225,9 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
"(port=%d).",
dev->data->port_id);

- dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ ixgbe_recv_pkts_lro_single_alloc);
}
/*
* Below we set "simple" callbacks according to port/queues parameters.
@@ -4217,28 +4243,36 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
RTE_IXGBE_DESCS_PER_LOOP,
dev->data->port_id);

- dev->rx_pkt_burst = ixgbe_recv_pkts_vec;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, ixgbe_recv_pkts_vec);
} else if (adapter->rx_bulk_alloc_allowed) {
PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are "
"satisfied. Rx Burst Bulk Alloc function "
"will be used on port=%d.",
dev->data->port_id);

- dev->rx_pkt_burst = ixgbe_recv_pkts_bulk_alloc;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ ixgbe_recv_pkts_bulk_alloc);
} else {
PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are not "
"satisfied, or Scattered Rx is requested "
"(port=%d).",
dev->data->port_id);

- dev->rx_pkt_burst = ixgbe_recv_pkts;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, ixgbe_recv_pkts);
}

/* Propagate information about RX function choice through all queues. */

rx_using_sse =
(dev->rx_pkt_burst == ixgbe_recv_scattered_pkts_vec ||
+#ifndef RTE_NEXT_ABI
dev->rx_pkt_burst == ixgbe_recv_pkts_vec);
+#else
+ dev->rx_pkt_burst == ixgbe_recv_pkts_vec ||
+ dev->rx_pkt_burst == ixgbe_recv_scattered_pkts_vec_lock ||
+ dev->rx_pkt_burst == ixgbe_recv_pkts_vec_lock);
+#endif

for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
@@ -5225,6 +5259,15 @@ ixgbe_recv_pkts_vec(
}

uint16_t __attribute__((weak))
+ixgbe_recv_pkts_vec_lock(
+ void __rte_unused *rx_queue,
+ struct rte_mbuf __rte_unused **rx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
+
+uint16_t __attribute__((weak))
ixgbe_recv_scattered_pkts_vec(
void __rte_unused *rx_queue,
struct rte_mbuf __rte_unused **rx_pkts,
@@ -5233,6 +5276,15 @@ ixgbe_recv_scattered_pkts_vec(
return 0;
}

+uint16_t __attribute__((weak))
+ixgbe_recv_scattered_pkts_vec_lock(
+ void __rte_unused *rx_queue,
+ struct rte_mbuf __rte_unused **rx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
+
int __attribute__((weak))
ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
{
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 3691a19..5f0ca1f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -34,6 +34,8 @@
#ifndef _IXGBE_RXTX_H_
#define _IXGBE_RXTX_H_

+#include <rte_spinlock.h>
+
/*
* Rings setup and release.
*
@@ -126,6 +128,7 @@ struct ixgbe_rx_queue {
struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
uint64_t mbuf_initializer; /**< value to init mbufs */
+ rte_spinlock_t rx_lock; /**< Lock for packet receiption. */
uint16_t nb_rx_desc; /**< number of RX descriptors. */
uint16_t rx_tail; /**< current value of RDT register. */
uint16_t nb_rx_hold; /**< number of held free RX desc. */
@@ -212,6 +215,7 @@ struct ixgbe_tx_queue {
struct ixgbe_tx_entry_v *sw_ring_v; /**< address of SW ring for vector PMD */
};
volatile uint32_t *tdt_reg_addr; /**< Address of TDT register. */
+ rte_spinlock_t tx_lock; /**< Lock for packet transmission. */
uint16_t nb_tx_desc; /**< number of TX descriptors. */
uint16_t tx_tail; /**< current value of TDT reg. */
/**< Start freeing TX buffers if there are less free descriptors than
@@ -301,6 +305,12 @@ uint16_t ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
uint16_t ixgbe_recv_scattered_pkts_vec(void *rx_queue,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_vec_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t ixgbe_recv_scattered_pkts_vec_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
int ixgbe_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq);
void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);
@@ -309,6 +319,9 @@ void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);

uint16_t ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t ixgbe_xmit_pkts_vec_lock(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
int ixgbe_txq_vec_setup(struct ixgbe_tx_queue *txq);

#endif /* RTE_IXGBE_INC_VECTOR */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index e97ea82..32ecbd2 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -420,6 +420,8 @@ ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
}

+GENERATE_RX_LOCK(ixgbe_recv_pkts_vec, ixgbe)
+
static inline uint16_t
reassemble_packets(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_bufs,
uint16_t nb_bufs, uint8_t *split_flags)
@@ -526,6 +528,8 @@ ixgbe_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
&split_flags[i]);
}

+GENERATE_RX_LOCK(ixgbe_recv_scattered_pkts_vec, ixgbe)
+
static inline void
vtx1(volatile union ixgbe_adv_tx_desc *txdp,
struct rte_mbuf *pkt, uint64_t flags)
@@ -680,6 +684,8 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
return nb_pkts;
}

+GENERATE_TX_LOCK(ixgbe_xmit_pkts_vec, ixgbe)
+
static void __attribute__((cold))
ixgbe_tx_queue_release_mbufs_vec(struct ixgbe_tx_queue *txq)
{
--
1.9.3
Wenzhuo Lu
2016-06-06 05:40:49 UTC
Implement the device reset function.
1. Add the fake RX/TX functions.
2. The reset function stops RX/TX by replacing
the RX/TX functions with the fake ones and taking the
locks to make sure the regular RX/TX has finished.
3. After RX/TX has stopped, reset the VF port, then
release the locks and restore the RX/TX functions.
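
The three steps above can be sketched for a single RX queue in standalone C. This is an illustration under stated assumptions, not the driver code: `demo_dev`, `demo_recv_pkts`, `demo_recv_pkts_fake`, and `demo_dev_reset` are hypothetical, a C11 `atomic_flag` replaces `rte_spinlock_t`, the hardware reset is a counter, and `burst_during_reset` is a probe added only to make the effect observable. The function pointer is swapped with plain stores; a multi-threaded version has to consider how that store becomes visible to dataplane cores.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*demo_burst_t)(void *queue, uint16_t nb_pkts);

struct demo_dev {
	demo_burst_t rx_pkt_burst;   /* current data-path entry point */
	demo_burst_t rx_backup;      /* saved while reset is in progress */
	atomic_flag rx_lock;         /* stands in for the queue spinlock */
	unsigned int resets;         /* counts completed resets */
	uint16_t burst_during_reset; /* demo probe, see demo_dev_reset() */
};

/* Step 1: a fake RX function that never receives anything. */
static uint16_t
demo_recv_pkts_fake(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	(void)nb_pkts;
	return 0;
}

/* Regular RX function: pretends every requested packet arrived. */
static uint16_t
demo_recv_pkts(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	return nb_pkts;
}

static void
demo_dev_init(struct demo_dev *dev)
{
	dev->rx_pkt_burst = demo_recv_pkts;
	dev->rx_backup = NULL;
	atomic_flag_clear(&dev->rx_lock);
	dev->resets = 0;
	dev->burst_during_reset = 0;
}

static void
demo_dev_reset(struct demo_dev *dev)
{
	/* Step 2a: divert new callers to the fake function. */
	dev->rx_backup = dev->rx_pkt_burst;
	dev->rx_pkt_burst = demo_recv_pkts_fake;

	/* Step 2b: spin until any in-flight locked burst has released it. */
	while (atomic_flag_test_and_set_explicit(&dev->rx_lock,
						 memory_order_acquire))
		;

	/* Step 3: "reset" the port; a burst issued now hits the fake path. */
	dev->burst_during_reset = dev->rx_pkt_burst(NULL, 32);
	dev->resets++;

	/* Release the lock and restore the regular RX function. */
	atomic_flag_clear_explicit(&dev->rx_lock, memory_order_release);
	dev->rx_pkt_burst = dev->rx_backup;
}
```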

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 9 +++
drivers/net/ixgbe/ixgbe_ethdev.c | 108 ++++++++++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 12 +++-
drivers/net/ixgbe/ixgbe_rxtx.c | 42 ++++++++++++-
4 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a761e3c..d36c4b1 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,6 +53,15 @@ New Features
VF. To handle this link up/down event, add the mailbox interruption
support to receive the message.

+* **Added device reset support for ixgbe VF.**
+
+ Added the device reset API. APP can call this API to reset the VF port
+ when it's not working.
+ Based on the mailbox interruption support, when the VF receives a control
+ message from the PF, the PF link state has changed. The VF uses the reset
+ callback in the message handler to notify the APP, which then needs to
+ call the device reset API to reset the VF port.
+

Resolved Issues
---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index fd2682f..1e3520b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -381,6 +381,8 @@ static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *udp_tunnel);

+static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+
/*
* Define VF Stats MACRO for Non "cleared on read" register
*/
@@ -586,6 +588,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.reta_query = ixgbe_dev_rss_reta_query,
.rss_hash_update = ixgbe_dev_rss_hash_update,
.rss_hash_conf_get = ixgbe_dev_rss_hash_conf_get,
+ .dev_reset = ixgbevf_dev_reset,
};

/* store statistics names and its offset in stats structure */
@@ -4060,7 +4063,8 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
ETH_VLAN_EXTEND_MASK;
ixgbevf_vlan_offload_set(dev, mask);

- ixgbevf_dev_rxtx_start(dev);
+ if (ixgbevf_dev_rxtx_start(dev))
+ return -1;

/* check and configure queue intr-vector mapping */
if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7193,6 +7197,108 @@ static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
}

static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev)
+{
+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ struct ixgbe_adapter *adapter =
+ (struct ixgbe_adapter *)dev->data->dev_private;
+ int diag = 0;
+ uint32_t vteiam;
+ uint16_t i;
+ struct ixgbe_rx_queue *rxq;
+ struct ixgbe_tx_queue *txq;
+
+ /* Nothing needs to be done if the device is not started. */
+ if (!dev->data->dev_started)
+ return 0;
+
+ PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+ /**
+ * Stop RX/TX using fake functions and locks.
+ * The fake functions make it easier to take the RX/TX locks.
+ */
+ adapter->rx_backup = dev->rx_pkt_burst;
+ adapter->tx_backup = dev->tx_pkt_burst;
+ dev->rx_pkt_burst = ixgbevf_recv_pkts_fake;
+ dev->tx_pkt_burst = ixgbevf_xmit_pkts_fake;
+
+ if (dev->data->rx_queues)
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_lock(&rxq->rx_lock);
+ }
+
+ if (dev->data->tx_queues)
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_lock(&txq->tx_lock);
+ }
+
+ /* Perform VF reset. */
+ do {
+ dev->data->dev_started = 0;
+ ixgbevf_dev_stop(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = ixgbe_dev_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+ "Failed to update link.");
+ }
+ rte_delay_ms(1000);
+
+ diag = ixgbevf_dev_start(dev);
+ /* If the device fails to start, stop/start it again. */
+ if (diag) {
+ PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+ "Failed to start device.");
+ continue;
+ }
+ dev->data->dev_started = 1;
+ ixgbevf_dev_stats_reset(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = ixgbe_dev_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+ "Failed to update link.");
+ diag = 0;
+ }
+
+ /**
+ * When the PF link is down, there is a chance
+ * that the VF cannot operate its registers. Check
+ * whether the registers were written
+ * successfully. If not, repeat stop/start until
+ * the PF link is up, in other words, until the
+ * registers can be written.
+ */
+ vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+ /* Reference ixgbevf_intr_enable when checking */
+ } while (diag || vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+ /**
+ * Release the locks for queues.
+ * Restore the RX/TX functions.
+ */
+ if (dev->data->rx_queues)
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_unlock(&rxq->rx_lock);
+ }
+
+ if (dev->data->tx_queues)
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_unlock(&txq->tx_lock);
+ }
+
+ dev->rx_pkt_burst = adapter->rx_backup;
+ dev->tx_pkt_burst = adapter->tx_backup;
+
+ return 0;
+}
+
+static int
ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
{
uint32_t eicr;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 701107b..d50fad4 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -289,6 +289,8 @@ struct ixgbe_adapter {
struct rte_timecounter systime_tc;
struct rte_timecounter rx_tstamp_tc;
struct rte_timecounter tx_tstamp_tc;
+ eth_rx_burst_t rx_backup;
+ eth_tx_burst_t tx_backup;
};

#define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
@@ -377,7 +379,7 @@ int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);

-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);

uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
@@ -409,6 +411,14 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t ixgbevf_recv_pkts_fake(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+
+uint16_t ixgbevf_xmit_pkts_fake(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+
uint16_t ixgbe_xmit_pkts_lock(void *tx_queue,
struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index a45d115..b4e7659 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5181,7 +5181,7 @@ ixgbevf_dev_tx_init(struct rte_eth_dev *dev)
/*
* [VF] Start Transmit and Receive Units.
*/
-void __attribute__((cold))
+int __attribute__((cold))
ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
{
struct ixgbe_hw *hw;
@@ -5218,7 +5218,15 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
if (!poll_ms)
+#ifndef RTE_NEXT_ABI
PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+#else
+ {
+ PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+ if (dev->data->dev_conf.txmode.lock_mode)
+ return -1;
+ }
+#endif
}
for (i = 0; i < dev->data->nb_rx_queues; i++) {

@@ -5235,11 +5243,21 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
if (!poll_ms)
+#ifndef RTE_NEXT_ABI
+ PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+#else
+ {
PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+ if (dev->data->dev_conf.rxmode.lock_mode)
+ return -1;
+ }
+#endif
rte_wmb();
IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);

}
+
+ return 0;
}

/* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
@@ -5290,3 +5308,25 @@ ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
{
return -1;
}
+
+/**
+ * A fake function to stop reception.
+ */
+uint16_t
+ixgbevf_recv_pkts_fake(void __rte_unused *rx_queue,
+ struct rte_mbuf __rte_unused **rx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
+
+/**
+ * A fake function to stop transmission.
+ */
+uint16_t
+ixgbevf_xmit_pkts_fake(void __rte_unused *tx_queue,
+ struct rte_mbuf __rte_unused **tx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
--
1.9.3
Wenzhuo Lu
2016-06-06 05:40:50 UTC
Add RX/TX paths with locks for the VF. They are used when
link reset on the VF is needed.
With a lock on each RX/TX path, RX/TX can be stopped,
which gives us a chance to reset the VF link.

Please be aware that there is a performance drop if the lock
path is chosen.
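
For readers without the earlier patch in the series at hand, a locked RX path could look roughly like the sketch below. GENERATE_RX_LOCK itself is defined in another patch and is not shown here, so the try-lock behaviour, the DEMO_GENERATE_RX_LOCK macro and the demo_* types are assumptions made for illustration, with a C11 atomic_flag standing in for rte_spinlock_t.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical queue; "pending" only exists to give the demo some packets. */
struct demo_rx_queue {
	atomic_flag rx_lock;
	uint16_t pending;
};

/* Regular, lock-free receive path. */
static uint16_t demo_recv_pkts(void *rx_queue, void **rx_pkts, uint16_t nb_pkts)
{
	struct demo_rx_queue *rxq = rx_queue;
	uint16_t n = rxq->pending < nb_pkts ? rxq->pending : nb_pkts;

	(void)rx_pkts;
	rxq->pending = (uint16_t)(rxq->pending - n);
	return n;
}

/*
 * Generate func##_lock: try to take the queue lock, run the regular path
 * if the lock was free, and otherwise report zero packets without waiting.
 */
#define DEMO_GENERATE_RX_LOCK(func) \
static uint16_t func##_lock(void *rx_queue, void **rx_pkts, uint16_t nb_pkts) \
{ \
	struct demo_rx_queue *rxq = rx_queue; \
	uint16_t nb_rx = 0; \
	/* test_and_set returns the old value: false means we own the lock */ \
	if (!atomic_flag_test_and_set_explicit(&rxq->rx_lock, \
					       memory_order_acquire)) { \
		nb_rx = func(rx_queue, rx_pkts, nb_pkts); \
		atomic_flag_clear_explicit(&rxq->rx_lock, \
					   memory_order_release); \
	} \
	return nb_rx; \
}

DEMO_GENERATE_RX_LOCK(demo_recv_pkts)
```

The extra atomic operation on every burst is the kind of cost behind the performance drop mentioned above, and a try-lock keeps the data path from blocking while a reset holds the lock.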

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
drivers/net/e1000/e1000_ethdev.h | 10 ++++++++++
drivers/net/e1000/igb_ethdev.c | 14 +++++++++++---
drivers/net/e1000/igb_rxtx.c | 26 +++++++++++++++++++++-----
3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index e8bf8da..6a42994 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -319,6 +319,16 @@ uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
uint16_t eth_igb_recv_scattered_pkts(void *rxq,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);

+uint16_t eth_igb_xmit_pkts_lock(void *txq,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+uint16_t eth_igb_recv_pkts_lock(void *rxq,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t eth_igb_recv_scattered_pkts_lock(void *rxq,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+
int eth_igb_rss_hash_update(struct rte_eth_dev *dev,
struct rte_eth_rss_conf *rss_conf);

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..8aad741 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -909,15 +909,17 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
PMD_INIT_FUNC_TRACE();

eth_dev->dev_ops = &igbvf_eth_dev_ops;
- eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
- eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+ eth_dev->rx_pkt_burst = RX_LOCK_FUNCTION(eth_dev, eth_igb_recv_pkts);
+ eth_dev->tx_pkt_burst = TX_LOCK_FUNCTION(eth_dev, eth_igb_xmit_pkts);

/* for secondary processes, we don't initialise any further as primary
* has already done this work. Only check we don't need a different
* RX function */
if (rte_eal_process_type() != RTE_PROC_PRIMARY){
if (eth_dev->data->scattered_rx)
- eth_dev->rx_pkt_burst = &eth_igb_recv_scattered_pkts;
+ eth_dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(eth_dev,
+ eth_igb_recv_scattered_pkts);
return 0;
}

@@ -1999,7 +2001,13 @@ eth_igb_supported_ptypes_get(struct rte_eth_dev *dev)
};

if (dev->rx_pkt_burst == eth_igb_recv_pkts ||
+#ifndef RTE_NEXT_ABI
dev->rx_pkt_burst == eth_igb_recv_scattered_pkts)
+#else
+ dev->rx_pkt_burst == eth_igb_recv_scattered_pkts ||
+ dev->rx_pkt_burst == eth_igb_recv_pkts_lock ||
+ dev->rx_pkt_burst == eth_igb_recv_scattered_pkts_lock)
+#endif
return ptypes;
return NULL;
}
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 18aeead..7e97330 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -67,6 +67,7 @@
#include <rte_tcp.h>
#include <rte_sctp.h>
#include <rte_string_fns.h>
+#include <rte_spinlock.h>

#include "e1000_logs.h"
#include "base/e1000_api.h"
@@ -107,6 +108,7 @@ struct igb_rx_queue {
struct igb_rx_entry *sw_ring; /**< address of RX software ring. */
struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
+ rte_spinlock_t rx_lock; /**< Lock for packet reception. */
uint16_t nb_rx_desc; /**< number of RX descriptors. */
uint16_t rx_tail; /**< current value of RDT register. */
uint16_t nb_rx_hold; /**< number of held free RX desc. */
@@ -174,6 +176,7 @@ struct igb_tx_queue {
volatile union e1000_adv_tx_desc *tx_ring; /**< TX ring address */
uint64_t tx_ring_phys_addr; /**< TX ring DMA address. */
struct igb_tx_entry *sw_ring; /**< virtual address of SW ring. */
+ rte_spinlock_t tx_lock; /**< Lock for packet transmission. */
volatile uint32_t *tdt_reg_addr; /**< Address of TDT register. */
uint32_t txd_type; /**< Device-specific TXD type */
uint16_t nb_tx_desc; /**< number of TX descriptors. */
@@ -615,6 +618,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
return nb_tx;
}

+GENERATE_TX_LOCK(eth_igb_xmit_pkts, igb)
+
/*********************************************************************
*
* RX functions
@@ -931,6 +936,8 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return nb_rx;
}

+GENERATE_RX_LOCK(eth_igb_recv_pkts, igb)
+
uint16_t
eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
@@ -1186,6 +1193,8 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return nb_rx;
}

+GENERATE_RX_LOCK(eth_igb_recv_scattered_pkts, igb)
+
/*
* Maximum number of Ring Descriptors.
*
@@ -1344,6 +1353,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
txq->reg_idx = (uint16_t)((RTE_ETH_DEV_SRIOV(dev).active == 0) ?
queue_idx : RTE_ETH_DEV_SRIOV(dev).def_pool_q_idx + queue_idx);
txq->port_id = dev->data->port_id;
+ rte_spinlock_init(&txq->tx_lock);

txq->tdt_reg_addr = E1000_PCI_REG_ADDR(hw, E1000_TDT(txq->reg_idx));
txq->tx_ring_phys_addr = rte_mem_phy2mch(tz->memseg_id, tz->phys_addr);
@@ -1361,7 +1371,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
txq->sw_ring, txq->tx_ring, txq->tx_ring_phys_addr);

igb_reset_tx_queue(txq, dev);
- dev->tx_pkt_burst = eth_igb_xmit_pkts;
+ dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, eth_igb_xmit_pkts);
dev->data->tx_queues[queue_idx] = txq;

return 0;
@@ -1467,6 +1477,7 @@ eth_igb_rx_queue_setup(struct rte_eth_dev *dev,
rxq->port_id = dev->data->port_id;
rxq->crc_len = (uint8_t) ((dev->data->dev_conf.rxmode.hw_strip_crc) ? 0 :
ETHER_CRC_LEN);
+ rte_spinlock_init(&rxq->rx_lock);

/*
* Allocate RX ring hardware descriptors. A memzone large enough to
@@ -2323,7 +2334,7 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)

/* Configure and enable each RX queue. */
rctl_bsize = 0;
- dev->rx_pkt_burst = eth_igb_recv_pkts;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, eth_igb_recv_pkts);
for (i = 0; i < dev->data->nb_rx_queues; i++) {
uint64_t bus_addr;
uint32_t rxdctl;
@@ -2370,7 +2381,9 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
if (!dev->data->scattered_rx)
PMD_INIT_LOG(DEBUG,
"forcing scatter mode");
- dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ eth_igb_recv_scattered_pkts);
dev->data->scattered_rx = 1;
}
} else {
@@ -2381,7 +2394,9 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
rctl_bsize = buf_size;
if (!dev->data->scattered_rx)
PMD_INIT_LOG(DEBUG, "forcing scatter mode");
- dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev,
+ eth_igb_recv_scattered_pkts);
dev->data->scattered_rx = 1;
}

@@ -2414,7 +2429,8 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
if (dev->data->dev_conf.rxmode.enable_scatter) {
if (!dev->data->scattered_rx)
PMD_INIT_LOG(DEBUG, "forcing scatter mode");
- dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+ dev->rx_pkt_burst =
+ RX_LOCK_FUNCTION(dev, eth_igb_recv_scattered_pkts);
dev->data->scattered_rx = 1;
}
--
1.9.3
Wenzhuo Lu
2016-06-06 05:40:51 UTC
Implement the device reset function.
1. Add fake RX/TX functions.
2. The reset function stops RX/TX by replacing the RX/TX
functions with the fake ones and taking the queue locks,
which ensures that any regular RX/TX call has finished.
3. Once RX/TX has stopped, reset the VF port, then
release the locks and restore the RX/TX functions.

BTW: The definitions of some structures are moved from the .c
file to the .h file.
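
In the diff that follows, the port reset in step 3 is a stop/start loop that repeats until a register readback shows that the VF can write its registers again. A minimal model of that check is sketched here; the demo_* names and the register model are hypothetical, and the max_tries bound is an addition for illustration (the patch itself loops without a bound and sleeps between attempts).

```c
#include <stdint.h>

#define DEMO_IRQ_MASK 0x1u	/* stands in for the mailbox interrupt bit */

/* Hypothetical VF model: writes are silently lost while the PF is down. */
struct demo_vf {
	uint32_t irq_mask_reg;	/* stands in for the EIAM register */
	int pf_down_polls;	/* number of start attempts the PF stays down */
};

static void demo_stop(struct demo_vf *vf)
{
	vf->irq_mask_reg = 0;
}

static void demo_start(struct demo_vf *vf)
{
	if (vf->pf_down_polls > 0) {
		vf->pf_down_polls--;	/* register write has no effect */
		return;
	}
	vf->irq_mask_reg = DEMO_IRQ_MASK;
}

/* Repeat stop/start until the readback shows the mask was written. */
static int demo_reset(struct demo_vf *vf, int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		demo_stop(vf);
		demo_start(vf);
		if (vf->irq_mask_reg & DEMO_IRQ_MASK)
			return 0;	/* registers are writable again */
	}
	return -1;			/* PF still down after max_tries */
}
```

A bounded loop lets the caller release the queue locks and report the failure instead of spinning for as long as the PF link stays down.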

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 2 +-
drivers/net/e1000/e1000_ethdev.h | 116 ++++++++++++++++++++++++++++++
drivers/net/e1000/igb_ethdev.c | 104 +++++++++++++++++++++++++++
drivers/net/e1000/igb_rxtx.c | 128 ++++++---------------------------
4 files changed, 243 insertions(+), 107 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d36c4b1..a4c0cc3 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,7 +53,7 @@ New Features
VF. To handle this link up/down event, add the mailbox interruption
support to receive the message.

-* **Added device reset support for ixgbe VF.**
+* **Added device reset support for ixgbe/igb VF.**

Added the device reset API. APP can call this API to reset the VF port
when it's not working.
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6a42994..4ae03ce 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -34,6 +34,7 @@
#ifndef _E1000_ETHDEV_H_
#define _E1000_ETHDEV_H_
#include <rte_time.h>
+#include <rte_spinlock.h>

/* need update link, bit flag */
#define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
@@ -261,6 +262,113 @@ struct e1000_adapter {
struct rte_timecounter systime_tc;
struct rte_timecounter rx_tstamp_tc;
struct rte_timecounter tx_tstamp_tc;
+ eth_rx_burst_t rx_backup;
+ eth_tx_burst_t tx_backup;
+};
+
+/**
+ * Structure associated with each descriptor of the RX ring of a RX queue.
+ */
+struct igb_rx_entry {
+ struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
+};
+
+/**
+ * Structure associated with each descriptor of the TX ring of a TX queue.
+ */
+struct igb_tx_entry {
+ struct rte_mbuf *mbuf; /**< mbuf associated with TX desc, if any. */
+ uint16_t next_id; /**< Index of next descriptor in ring. */
+ uint16_t last_id; /**< Index of last scattered descriptor. */
+};
+
+/**
+ * Hardware context number
+ */
+enum igb_advctx_num {
+ IGB_CTX_0 = 0, /**< CTX0 */
+ IGB_CTX_1 = 1, /**< CTX1 */
+ IGB_CTX_NUM = 2, /**< CTX_NUM */
+};
+
+/** Offload features */
+union igb_tx_offload {
+ uint64_t data;
+ struct {
+ uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+ uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+ uint64_t vlan_tci:16; /**< VLAN Tag Control Identifier(CPU order). */
+ uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+ uint64_t tso_segsz:16; /**< TCP TSO segment size. */
+
+ /* uint64_t unused:8; */
+ };
+};
+
+/**
+ * Strucutre to check if new context need be built
+ */
+struct igb_advctx_info {
+ uint64_t flags; /**< ol_flags related to context build. */
+ /** tx offload: vlan, tso, l2-l3-l4 lengths. */
+ union igb_tx_offload tx_offload;
+ /** compare mask for tx offload. */
+ union igb_tx_offload tx_offload_mask;
+};
+
+/**
+ * Structure associated with each RX queue.
+ */
+struct igb_rx_queue {
+ struct rte_mempool *mb_pool; /**< mbuf pool to populate RX ring. */
+ volatile union e1000_adv_rx_desc *rx_ring; /**< RX ring virtual address. */
+ uint64_t rx_ring_phys_addr; /**< RX ring DMA address. */
+ volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
+ volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
+ struct igb_rx_entry *sw_ring; /**< address of RX software ring. */
+ struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
+ struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
+ rte_spinlock_t rx_lock; /**< Lock for packet reception. */
+ uint16_t nb_rx_desc; /**< number of RX descriptors. */
+ uint16_t rx_tail; /**< current value of RDT register. */
+ uint16_t nb_rx_hold; /**< number of held free RX desc. */
+ uint16_t rx_free_thresh; /**< max free RX desc to hold. */
+ uint16_t queue_id; /**< RX queue index. */
+ uint16_t reg_idx; /**< RX queue register index. */
+ uint8_t port_id; /**< Device port identifier. */
+ uint8_t pthresh; /**< Prefetch threshold register. */
+ uint8_t hthresh; /**< Host threshold register. */
+ uint8_t wthresh; /**< Write-back threshold register. */
+ uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. */
+ uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */
+};
+
+/**
+ * Structure associated with each TX queue.
+ */
+struct igb_tx_queue {
+ volatile union e1000_adv_tx_desc *tx_ring; /**< TX ring address */
+ uint64_t tx_ring_phys_addr; /**< TX ring DMA address. */
+ struct igb_tx_entry *sw_ring; /**< virtual address of SW ring. */
+ volatile uint32_t *tdt_reg_addr; /**< Address of TDT register. */
+ rte_spinlock_t tx_lock; /**< Lock for packet transmission. */
+ uint32_t txd_type; /**< Device-specific TXD type */
+ uint16_t nb_tx_desc; /**< number of TX descriptors. */
+ uint16_t tx_tail; /**< Current value of TDT register. */
+ uint16_t tx_head;
+ /**< Index of first used TX descriptor. */
+ uint16_t queue_id; /**< TX queue index. */
+ uint16_t reg_idx; /**< TX queue register index. */
+ uint8_t port_id; /**< Device port identifier. */
+ uint8_t pthresh; /**< Prefetch threshold register. */
+ uint8_t hthresh; /**< Host threshold register. */
+ uint8_t wthresh; /**< Write-back threshold register. */
+ uint32_t ctx_curr;
+ /**< Current used hardware descriptor. */
+ uint32_t ctx_start;
+ /**< Start context position for transmit queue. */
+ struct igb_advctx_info ctx_cache[IGB_CTX_NUM];
+ /**< Hardware context history.*/
};

#define E1000_DEV_PRIVATE(adapter) \
@@ -316,6 +424,14 @@ uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+uint16_t eth_igbvf_xmit_pkts_fake(void *txq,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+
+uint16_t eth_igbvf_recv_pkts_fake(void *rxq,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+
uint16_t eth_igb_recv_scattered_pkts(void *rxq,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 8aad741..4b78a25 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -268,6 +268,7 @@ static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
void *param);
static void igbvf_mbx_process(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev);

/*
* Define VF Stats MACRO for Non "cleared on read" register
@@ -409,6 +410,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
.mac_addr_set = igbvf_default_mac_addr_set,
.get_reg_length = igbvf_get_reg_length,
.get_reg = igbvf_get_regs,
+ .dev_reset = igbvf_dev_reset,
};

/* store statistics names and its offset in stats structure */
@@ -2663,6 +2665,108 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
}

static int
+igbvf_dev_reset(struct rte_eth_dev *dev)
+{
+ struct e1000_adapter *adapter =
+ (struct e1000_adapter *)dev->data->dev_private;
+ struct e1000_hw *hw =
+ E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ int diag = 0;
+ uint32_t eiam;
+ uint16_t i;
+ struct igb_rx_queue *rxq;
+ struct igb_tx_queue *txq;
+ /* Reference igbvf_intr_enable */
+ uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+ /* Nothing needs to be done if the device is not started. */
+ if (!dev->data->dev_started)
+ return 0;
+
+ PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+ /**
+ * Stop RX/TX using fake functions and locks.
+ * The fake functions make it easier to take the RX/TX locks.
+ */
+ adapter->rx_backup = dev->rx_pkt_burst;
+ adapter->tx_backup = dev->tx_pkt_burst;
+ dev->rx_pkt_burst = eth_igbvf_recv_pkts_fake;
+ dev->tx_pkt_burst = eth_igbvf_xmit_pkts_fake;
+
+ if (dev->data->rx_queues)
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_lock(&rxq->rx_lock);
+ }
+
+ if (dev->data->tx_queues)
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_lock(&txq->tx_lock);
+ }
+
+ /* Perform VF reset. */
+ do {
+ dev->data->dev_started = 0;
+ igbvf_dev_stop(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = eth_igb_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Igb VF reset: "
+ "Failed to update link.");
+ }
+ rte_delay_ms(1000);
+
+ diag = igbvf_dev_start(dev);
+ if (diag) {
+ PMD_INIT_LOG(ERR, "Igb VF reset: "
+ "Failed to start device.");
+ return diag;
+ }
+ dev->data->dev_started = 1;
+ eth_igbvf_stats_reset(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = eth_igb_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Igb VF reset: "
+ "Failed to update link.");
+ }
+
+ /**
+ * When the PF link is down, there is a chance
+ * that the VF cannot operate its registers. Check
+ * whether the registers were written
+ * successfully. If not, repeat stop/start until
+ * the PF link is up, in other words, until the
+ * registers can be written.
+ */
+ eiam = E1000_READ_REG(hw, E1000_EIAM);
+ } while (!(eiam & eiam_mbx));
+
+ /**
+ * Release the locks for queues.
+ * Restore the RX/TX functions.
+ */
+ if (dev->data->rx_queues)
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_unlock(&rxq->rx_lock);
+ }
+
+ if (dev->data->tx_queues)
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_unlock(&txq->tx_lock);
+ }
+
+ dev->rx_pkt_burst = adapter->rx_backup;
+ dev->tx_pkt_burst = adapter->tx_backup;
+
+ return 0;
+}
+
+static int
eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
{
struct e1000_interrupt *intr =
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 7e97330..5af7173 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -67,7 +67,6 @@
#include <rte_tcp.h>
#include <rte_sctp.h>
#include <rte_string_fns.h>
-#include <rte_spinlock.h>

#include "e1000_logs.h"
#include "base/e1000_api.h"
@@ -80,72 +79,6 @@
PKT_TX_L4_MASK | \
PKT_TX_TCP_SEG)

-/**
- * Structure associated with each descriptor of the RX ring of a RX queue.
- */
-struct igb_rx_entry {
- struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
-};
-
-/**
- * Structure associated with each descriptor of the TX ring of a TX queue.
- */
-struct igb_tx_entry {
- struct rte_mbuf *mbuf; /**< mbuf associated with TX desc, if any. */
- uint16_t next_id; /**< Index of next descriptor in ring. */
- uint16_t last_id; /**< Index of last scattered descriptor. */
-};
-
-/**
- * Structure associated with each RX queue.
- */
-struct igb_rx_queue {
- struct rte_mempool *mb_pool; /**< mbuf pool to populate RX ring. */
- volatile union e1000_adv_rx_desc *rx_ring; /**< RX ring virtual address. */
- uint64_t rx_ring_phys_addr; /**< RX ring DMA address. */
- volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
- volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
- struct igb_rx_entry *sw_ring; /**< address of RX software ring. */
- struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
- struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
- rte_spinlock_t rx_lock; /**< Lock for packet reception. */
- uint16_t nb_rx_desc; /**< number of RX descriptors. */
- uint16_t rx_tail; /**< current value of RDT register. */
- uint16_t nb_rx_hold; /**< number of held free RX desc. */
- uint16_t rx_free_thresh; /**< max free RX desc to hold. */
- uint16_t queue_id; /**< RX queue index. */
- uint16_t reg_idx; /**< RX queue register index. */
- uint8_t port_id; /**< Device port identifier. */
- uint8_t pthresh; /**< Prefetch threshold register. */
- uint8_t hthresh; /**< Host threshold register. */
- uint8_t wthresh; /**< Write-back threshold register. */
- uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. */
- uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */
-};
-
-/**
- * Hardware context number
- */
-enum igb_advctx_num {
- IGB_CTX_0 = 0, /**< CTX0 */
- IGB_CTX_1 = 1, /**< CTX1 */
- IGB_CTX_NUM = 2, /**< CTX_NUM */
-};
-
-/** Offload features */
-union igb_tx_offload {
- uint64_t data;
- struct {
- uint64_t l3_len:9; /**< L3 (IP) Header Length. */
- uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
- uint64_t vlan_tci:16; /**< VLAN Tag Control Identifier(CPU order). */
- uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
- uint64_t tso_segsz:16; /**< TCP TSO segment size. */
-
- /* uint64_t unused:8; */
- };
-};
-
/*
* Compare mask for igb_tx_offload.data,
* should be in sync with igb_tx_offload layout.
@@ -158,45 +91,6 @@ union igb_tx_offload {
#define TX_TSO_CMP_MASK \
(TX_MACIP_LEN_CMP_MASK | TX_TCP_LEN_CMP_MASK | TX_TSO_MSS_CMP_MASK)

-/**
- * Strucutre to check if new context need be built
- */
-struct igb_advctx_info {
- uint64_t flags; /**< ol_flags related to context build. */
- /** tx offload: vlan, tso, l2-l3-l4 lengths. */
- union igb_tx_offload tx_offload;
- /** compare mask for tx offload. */
- union igb_tx_offload tx_offload_mask;
-};
-
-/**
- * Structure associated with each TX queue.
- */
-struct igb_tx_queue {
- volatile union e1000_adv_tx_desc *tx_ring; /**< TX ring address */
- uint64_t tx_ring_phys_addr; /**< TX ring DMA address. */
- struct igb_tx_entry *sw_ring; /**< virtual address of SW ring. */
- rte_spinlock_t tx_lock; /**< Lock for packet transmission. */
- volatile uint32_t *tdt_reg_addr; /**< Address of TDT register. */
- uint32_t txd_type; /**< Device-specific TXD type */
- uint16_t nb_tx_desc; /**< number of TX descriptors. */
- uint16_t tx_tail; /**< Current value of TDT register. */
- uint16_t tx_head;
- /**< Index of first used TX descriptor. */
- uint16_t queue_id; /**< TX queue index. */
- uint16_t reg_idx; /**< TX queue register index. */
- uint8_t port_id; /**< Device port identifier. */
- uint8_t pthresh; /**< Prefetch threshold register. */
- uint8_t hthresh; /**< Host threshold register. */
- uint8_t wthresh; /**< Write-back threshold register. */
- uint32_t ctx_curr;
- /**< Current used hardware descriptor. */
- uint32_t ctx_start;
- /**< Start context position for transmit queue. */
- struct igb_advctx_info ctx_cache[IGB_CTX_NUM];
- /**< Hardware context history.*/
-};
-
#if 1
#define RTE_PMD_USE_PREFETCH
#endif
@@ -2530,3 +2424,25 @@ igb_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
qinfo->conf.tx_thresh.hthresh = txq->hthresh;
qinfo->conf.tx_thresh.wthresh = txq->wthresh;
}
+
+/**
+ * A fake function to stop transmission.
+ */
+uint16_t
+eth_igbvf_xmit_pkts_fake(void __rte_unused *tx_queue,
+ struct rte_mbuf __rte_unused **tx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
+
+/**
+ * A fake function to stop reception.
+ */
+uint16_t
+eth_igbvf_recv_pkts_fake(void __rte_unused *rx_queue,
+ struct rte_mbuf __rte_unused **rx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
--
1.9.3
Wenzhuo Lu
2016-06-06 05:40:52 UTC
Add RX/TX paths with locks for the VF. They are used when
link reset on the VF is needed.
With a lock on each RX/TX path, RX/TX can be stopped,
which gives us a chance to reset the VF link.

Please be aware that there is a performance drop if the lock
path is chosen.

Signed-off-by: Zhe Tao <***@intel.com>
---
drivers/net/i40e/i40e_ethdev.c | 4 ++--
drivers/net/i40e/i40e_ethdev.h | 4 ++++
drivers/net/i40e/i40e_ethdev_vf.c | 4 ++--
drivers/net/i40e/i40e_rxtx.c | 45 +++++++++++++++++++++++++--------------
drivers/net/i40e/i40e_rxtx.h | 30 ++++++++++++++++++++++++++
5 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 24777d5..1380330 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -764,8 +764,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();

dev->dev_ops = &i40e_eth_dev_ops;
- dev->rx_pkt_burst = i40e_recv_pkts;
- dev->tx_pkt_burst = i40e_xmit_pkts;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts);
+ dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts);

/* for secondary processes, we don't initialise any further as primary
* has already done this work. Only check we don't need a different
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cfd2399..672d920 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -540,6 +540,10 @@ struct i40e_adapter {
struct rte_timecounter systime_tc;
struct rte_timecounter rx_tstamp_tc;
struct rte_timecounter tx_tstamp_tc;
+
+ /* For VF reset backup */
+ eth_rx_burst_t rx_backup;
+ eth_tx_burst_t tx_backup;
};

int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 90682ac..46d8a7c 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1451,8 +1451,8 @@ i40evf_dev_init(struct rte_eth_dev *eth_dev)

/* assign ops func pointer */
eth_dev->dev_ops = &i40evf_eth_dev_ops;
- eth_dev->rx_pkt_burst = &i40e_recv_pkts;
- eth_dev->tx_pkt_burst = &i40e_xmit_pkts;
+ eth_dev->rx_pkt_burst = RX_LOCK_FUNCTION(eth_dev, i40e_recv_pkts);
+ eth_dev->tx_pkt_burst = TX_LOCK_FUNCTION(eth_dev, i40e_xmit_pkts);

/*
* For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..0a6dcfb 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -79,10 +79,6 @@
PKT_TX_TCP_SEG | \
PKT_TX_OUTER_IP_CKSUM)

-static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
- struct rte_mbuf **tx_pkts,
- uint16_t nb_pkts);
-
static inline void
i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
{
@@ -1144,7 +1140,7 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
return 0;
}

-static uint16_t
+uint16_t
i40e_recv_pkts_bulk_alloc(void *rx_queue,
struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
@@ -1169,7 +1165,7 @@ i40e_recv_pkts_bulk_alloc(void *rx_queue,
return nb_rx;
}
#else
-static uint16_t
+uint16_t
i40e_recv_pkts_bulk_alloc(void __rte_unused *rx_queue,
struct rte_mbuf __rte_unused **rx_pkts,
uint16_t __rte_unused nb_pkts)
@@ -1892,7 +1888,7 @@ tx_xmit_pkts(struct i40e_tx_queue *txq,
return nb_pkts;
}

-static uint16_t
+uint16_t
i40e_xmit_pkts_simple(void *tx_queue,
struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
@@ -2121,10 +2117,13 @@ i40e_dev_supported_ptypes_get(struct rte_eth_dev *dev)
};

if (dev->rx_pkt_burst == i40e_recv_pkts ||
+ dev->rx_pkt_burst == i40e_recv_pkts_lock ||
#ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
dev->rx_pkt_burst == i40e_recv_pkts_bulk_alloc ||
+ dev->rx_pkt_burst == i40e_recv_pkts_bulk_alloc_lock ||
#endif
- dev->rx_pkt_burst == i40e_recv_scattered_pkts)
+ dev->rx_pkt_burst == i40e_recv_scattered_pkts ||
+ dev->rx_pkt_burst == i40e_recv_scattered_pkts_lock)
return ptypes;
return NULL;
}
@@ -2648,6 +2647,7 @@ i40e_reset_rx_queue(struct i40e_rx_queue *rxq)

rxq->rxrearm_start = 0;
rxq->rxrearm_nb = 0;
+ rte_spinlock_init(&rxq->rx_lock);
}

void
@@ -2704,6 +2704,7 @@ i40e_reset_tx_queue(struct i40e_tx_queue *txq)

txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
+ rte_spinlock_init(&txq->tx_lock);
}

/* Init the TX queue in hardware */
@@ -3155,12 +3156,12 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
"callback (port=%d).",
dev->data->port_id);

- dev->rx_pkt_burst = i40e_recv_scattered_pkts_vec;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_scattered_pkts_vec);
} else {
PMD_INIT_LOG(DEBUG, "Using a Scattered with bulk "
"allocation callback (port=%d).",
dev->data->port_id);
- dev->rx_pkt_burst = i40e_recv_scattered_pkts;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_scattered_pkts);
}
/* If parameters allow we are going to choose between the following
* callbacks:
@@ -3174,27 +3175,29 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
RTE_I40E_DESCS_PER_LOOP,
dev->data->port_id);

- dev->rx_pkt_burst = i40e_recv_pkts_vec;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts_vec);
} else if (ad->rx_bulk_alloc_allowed) {
PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are "
"satisfied. Rx Burst Bulk Alloc function "
"will be used on port=%d.",
dev->data->port_id);

- dev->rx_pkt_burst = i40e_recv_pkts_bulk_alloc;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts_bulk_alloc);
} else {
PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are not "
"satisfied, or Scattered Rx is requested "
"(port=%d).",
dev->data->port_id);

- dev->rx_pkt_burst = i40e_recv_pkts;
+ dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts);
}

/* Propagate information about RX function choice through all queues. */
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
rx_using_sse =
(dev->rx_pkt_burst == i40e_recv_scattered_pkts_vec ||
+ dev->rx_pkt_burst == i40e_recv_scattered_pkts_vec_lock ||
+ dev->rx_pkt_burst == i40e_recv_pkts_vec_lock ||
dev->rx_pkt_burst == i40e_recv_pkts_vec);

for (i = 0; i < dev->data->nb_rx_queues; i++) {
@@ -3250,14 +3253,14 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
if (ad->tx_simple_allowed) {
if (ad->tx_vec_allowed) {
PMD_INIT_LOG(DEBUG, "Vector tx finally be used.");
- dev->tx_pkt_burst = i40e_xmit_pkts_vec;
+ dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts_vec);
} else {
PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
- dev->tx_pkt_burst = i40e_xmit_pkts_simple;
+ dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts_simple);
}
} else {
PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
- dev->tx_pkt_burst = i40e_xmit_pkts;
+ dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts);
}
}

@@ -3311,3 +3314,13 @@ i40e_xmit_pkts_vec(void __rte_unused *tx_queue,
{
return 0;
}
+
+GENERATE_RX_LOCK(i40e_recv_pkts, i40e)
+GENERATE_RX_LOCK(i40e_recv_pkts_vec, i40e)
+GENERATE_RX_LOCK(i40e_recv_pkts_bulk_alloc, i40e)
+GENERATE_RX_LOCK(i40e_recv_scattered_pkts, i40e)
+GENERATE_RX_LOCK(i40e_recv_scattered_pkts_vec, i40e)
+
+GENERATE_TX_LOCK(i40e_xmit_pkts, i40e)
+GENERATE_TX_LOCK(i40e_xmit_pkts_vec, i40e)
+GENERATE_TX_LOCK(i40e_xmit_pkts_simple, i40e)
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..a1c13b8 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -140,6 +140,7 @@ struct i40e_rx_queue {
bool rx_deferred_start; /**< don't start this queue in dev start */
uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
uint8_t dcb_tc; /**< Traffic class of rx queue */
+ rte_spinlock_t rx_lock; /**< lock for rx path */
};

struct i40e_tx_entry {
@@ -181,6 +182,7 @@ struct i40e_tx_queue {
bool q_set; /**< indicate if tx queue has been configured */
bool tx_deferred_start; /**< don't start this queue in dev start */
uint8_t dcb_tc; /**< Traffic class of tx queue */
+ rte_spinlock_t tx_lock; /**< lock for tx path */
};

/** Offload features */
@@ -223,6 +225,27 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
uint16_t i40e_xmit_pkts(void *tx_queue,
struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_lock(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_simple(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_simple_lock(void *tx_queue,
+ struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_recv_scattered_pkts_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_bulk_alloc(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_bulk_alloc_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
int i40e_tx_queue_init(struct i40e_tx_queue *txq);
int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
void i40e_free_tx_resources(struct i40e_tx_queue *txq);
@@ -244,12 +267,19 @@ uint16_t i40e_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t i40e_recv_scattered_pkts_vec(void *rx_queue,
struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_vec_lock(void *rx_queue, struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
+uint16_t i40e_recv_scattered_pkts_vec_lock(void *rx_queue,
+ struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts);
int i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
int i40e_rxq_vec_setup(struct i40e_rx_queue *rxq);
int i40e_txq_vec_setup(struct i40e_tx_queue *txq);
void i40e_rx_queue_release_mbufs_vec(struct i40e_rx_queue *rxq);
uint16_t i40e_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_vec_lock(void *tx_queue, struct rte_mbuf **tx_pkts,
+ uint16_t nb_pkts);
void i40e_set_rx_function(struct rte_eth_dev *dev);
void i40e_set_tx_function_flag(struct rte_eth_dev *dev,
struct i40e_tx_queue *txq);
--
1.9.3
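The `RX_LOCK_FUNCTION`/`GENERATE_RX_LOCK` macros used in the patch above are not defined in this message; they presumably come from the "defind RX/TX lock mode" ethdev patch of the series. As a rough, self-contained sketch of the idea only, a `_lock` variant can be generated by wrapping the plain burst function in a try-lock. The names `demo_queue`, `demo_recv_pkts` and `DEMO_GENERATE_LOCK`, and the use of a C11 `atomic_flag` instead of `rte_spinlock_t`, are assumptions made for illustration and are not the DPDK definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver RX queue; not the i40e structure. */
struct demo_queue {
	atomic_flag lock;   /* plays the role of rx_lock */
	uint16_t pending;   /* packets waiting in the simulated ring */
};

/* Plain burst function: hands out up to nb_pkts pending packets. */
static uint16_t
demo_recv_pkts(void *rx_queue, uint16_t nb_pkts)
{
	struct demo_queue *q = rx_queue;
	uint16_t n = q->pending < nb_pkts ? q->pending : nb_pkts;

	q->pending -= n;
	return n;
}

/*
 * Generate func##_lock: call func only if the queue lock can be taken
 * without waiting; otherwise report an empty burst.
 */
#define DEMO_GENERATE_LOCK(func)                          \
static uint16_t                                           \
func##_lock(void *rx_queue, uint16_t nb_pkts)             \
{                                                         \
	struct demo_queue *q = rx_queue;                  \
	uint16_t nb = 0;                                  \
                                                          \
	if (!atomic_flag_test_and_set(&q->lock)) {        \
		nb = func(rx_queue, nb_pkts);             \
		atomic_flag_clear(&q->lock);              \
	}                                                 \
	return nb;                                        \
}

DEMO_GENERATE_LOCK(demo_recv_pkts)
```

With a try-lock the data path never blocks: while another thread holds the queue lock, the wrapper simply returns 0 packets, which the caller sees as an empty burst.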
Wenzhuo Lu
2016-06-06 05:40:53 UTC
Implement the device reset function.
1. Add the fake RX/TX functions.
2. The reset function stops RX/TX by replacing the RX/TX
functions with the fake ones and taking the locks, which
makes sure the regular RX/TX has finished.
3. After RX/TX has stopped, reset the VF port, and then
release the locks.
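The three steps can be sketched in isolation as follows. This is a simplified model, not the driver code: `demo_dev`, `demo_queue` and the `demo_*` functions are hypothetical, a C11 `atomic_flag` stands in for `rte_spinlock_t`, and the saved burst pointer is restored directly, whereas the patch leaves it to the re-init path to choose the burst function again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint16_t (*demo_burst_t)(void *queue, uint16_t nb_pkts);

struct demo_queue {
	atomic_flag lock;      /* held by a burst while it runs */
	unsigned int resets;   /* counts how often the queue was reset */
};

struct demo_dev {
	demo_burst_t rx_burst;
	struct demo_queue *queues;
	uint16_t nb_queues;
};

/* Fake burst function: touches nothing and reports no packets. */
static uint16_t
demo_recv_detach(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	(void)nb_pkts;
	return 0;
}

/* Placeholder for the regular burst function. */
static uint16_t
demo_recv_real(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	return nb_pkts;
}

static void
demo_dev_reset(struct demo_dev *dev)
{
	demo_burst_t saved = dev->rx_burst;
	uint16_t i;

	/* Step 2a: new bursts now land in the fake function. */
	dev->rx_burst = demo_recv_detach;

	/* Step 2b: wait for any burst still in flight to drop its lock. */
	for (i = 0; i < dev->nb_queues; i++)
		while (atomic_flag_test_and_set(&dev->queues[i].lock))
			;

	/* Step 3: the data path is quiet, so the queues can be reset. */
	for (i = 0; i < dev->nb_queues; i++)
		dev->queues[i].resets++;

	dev->rx_burst = saved;
	for (i = 0; i < dev->nb_queues; i++)
		atomic_flag_clear(&dev->queues[i].lock);
}
```

A real multi-threaded version would also have to make the pointer swap visible to the polling threads; the sketch leaves that out.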

Signed-off-by: Zhe Tao <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 5 ++
drivers/net/i40e/i40e_ethdev.h | 7 +-
drivers/net/i40e/i40e_ethdev_vf.c | 141 +++++++++++++++++++++++++++++++++
drivers/net/i40e/i40e_rxtx.h | 4 +
4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a4c0cc3..f43b867 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -62,6 +62,11 @@ New Features
callback in the message handler to notice the APP. APP need call the device
reset API to reset the VF port.

+* **Added VF reset support for i40e VF driver.**
+
+ Added a new implementation to allow i40e VF driver to
+ reset the functionality and state of itself.
+

Resolved Issues
---------------
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 672d920..dcd6e0f 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -541,9 +541,8 @@ struct i40e_adapter {
struct rte_timecounter rx_tstamp_tc;
struct rte_timecounter tx_tstamp_tc;

- /* For VF reset backup */
- eth_rx_burst_t rx_backup;
- eth_tx_burst_t tx_backup;
+ /* For VF reset */
+ uint8_t reset_number;
};

int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
@@ -597,6 +596,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_txq_info *qinfo);

+void i40evf_emulate_vf_reset(uint8_t port_id);
+
/* I40E_DEV_PRIVATE_TO */
#define I40E_DEV_PRIVATE_TO_PF(adapter) \
(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 46d8a7c..9fc121b 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,6 +157,12 @@ i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
uint8_t *msg,
uint16_t msglen);
+static int i40evf_dev_uninit(struct rte_eth_dev *eth_dev);
+static int i40evf_dev_init(struct rte_eth_dev *eth_dev);
+static void i40evf_dev_close(struct rte_eth_dev *dev);
+static int i40evf_dev_start(struct rte_eth_dev *dev);
+static int i40evf_dev_configure(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);

/* Default hash key buffer for RSS */
static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -223,6 +229,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
.reta_query = i40evf_dev_rss_reta_query,
.rss_hash_update = i40evf_dev_rss_hash_update,
.rss_hash_conf_get = i40evf_dev_rss_hash_conf_get,
+ .dev_reset = i40evf_handle_vf_reset
};

/*
@@ -1309,6 +1316,140 @@ i40evf_uninit_vf(struct rte_eth_dev *dev)
}

static void
+i40e_vf_queue_reset(struct rte_eth_dev *dev)
+{
+ uint16_t i;
+
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ struct i40e_rx_queue *rxq = dev->data->rx_queues[i];
+
+ if (rxq->q_set) {
+ i40e_dev_rx_queue_setup(dev,
+ rxq->queue_id,
+ rxq->nb_rx_desc,
+ rxq->socket_id,
+ &rxq->rxconf,
+ rxq->mp);
+ }
+
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_trylock(&rxq->rx_lock);
+ }
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ struct i40e_tx_queue *txq = dev->data->tx_queues[i];
+
+ if (txq->q_set) {
+ i40e_dev_tx_queue_setup(dev,
+ txq->queue_id,
+ txq->nb_tx_desc,
+ txq->socket_id,
+ &txq->txconf);
+ }
+
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_trylock(&txq->tx_lock);
+ }
+}
+
+static void
+i40e_vf_reset_dev(struct rte_eth_dev *dev)
+{
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+ i40evf_dev_close(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev close complete");
+ i40evf_dev_uninit(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev detached");
+ memset(dev->data->dev_private, 0,
+ (uint64_t)&adapter->reset_number - (uint64_t)adapter);
+
+ i40evf_dev_configure(dev);
+ i40evf_dev_init(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev attached");
+ i40e_vf_queue_reset(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf queue reset");
+ i40evf_dev_start(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev restart");
+}
+
+static uint16_t
+i40evf_recv_pkts_detach(void __rte_unused *rx_queue,
+ struct rte_mbuf __rte_unused **rx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
+
+static uint16_t
+i40evf_xmit_pkts_detach(void __rte_unused *tx_queue,
+ struct rte_mbuf __rte_unused **tx_pkts,
+ uint16_t __rte_unused nb_pkts)
+{
+ return 0;
+}
+
+static int
+i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+{
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+ uint16_t i = 0;
+ struct i40e_rx_queue *rxq;
+ struct i40e_tx_queue *txq;
+
+ if (!dev->data->dev_started)
+ return 0;
+
+ adapter->reset_number = 1;
+
+ /**
+ * Stop RX/TX by fake functions and locks.
+ * Fake functions are used to make RX/TX lock easier.
+ */
+ dev->rx_pkt_burst = i40evf_recv_pkts_detach;
+ dev->tx_pkt_burst = i40evf_xmit_pkts_detach;
+
+ if (dev->data->rx_queues)
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_lock(&rxq->rx_lock);
+ }
+
+ if (dev->data->tx_queues)
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_lock(&txq->tx_lock);
+ }
+
+ i40e_vf_reset_dev(dev);
+
+ adapter->reset_number = 0;
+
+ if (dev->data->rx_queues)
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ rxq = dev->data->rx_queues[i];
+ rte_spinlock_unlock(&rxq->rx_lock);
+ }
+
+ if (dev->data->tx_queues)
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ txq = dev->data->tx_queues[i];
+ rte_spinlock_unlock(&txq->tx_lock);
+ }
+
+ return 0;
+}
+
+void
+i40evf_emulate_vf_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+ i40evf_handle_vf_reset(dev);
+}
+
+static void
i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
uint8_t *msg,
__rte_unused uint16_t msglen)
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index a1c13b8..7ee33dc 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -141,6 +141,8 @@ struct i40e_rx_queue {
uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
uint8_t dcb_tc; /**< Traffic class of rx queue */
rte_spinlock_t rx_lock; /**< lock for rx path */
+ uint8_t socket_id;
+ struct rte_eth_rxconf rxconf;
};

struct i40e_tx_entry {
@@ -183,6 +185,8 @@ struct i40e_tx_queue {
bool tx_deferred_start; /**< don't start this queue in dev start */
uint8_t dcb_tc; /**< Traffic class of tx queue */
rte_spinlock_t tx_lock; /**< lock for tx path */
+ uint8_t socket_id;
+ struct rte_eth_txconf txconf;
};

/** Offload features */
--
1.9.3
Wenzhuo Lu
2016-06-15 03:03:31 UTC
Add an API to reset the device.
It is intended for a VF device in the kernel PF + DPDK VF scenario.
When the PF port goes down and comes back up, the APP should call
this API to reset the VF port. Most likely, the APP should call it
from its management thread and guarantee thread safety. This means
the APP should stop rx/tx and the device, then reset the device,
then recover the device and rx/tx.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
lib/librte_ether/rte_ethdev.c | 17 +++++++++++++++++
lib/librte_ether/rte_ethdev.h | 14 ++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++++++
3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..e43dca9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
-ENOTSUP);
return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
}
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev;
+ int diag;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+ diag = (*dev->dev_ops->dev_reset)(dev);
+
+ return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..74e895f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
uint8_t en);
/**< @internal enable/disable the l2 tunnel offload functions */

+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
#ifdef RTE_NIC_BYPASS

enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
/** Enable/disable l2 tunnel offload functions */
eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+ /** Reset device. */
+ eth_dev_reset_t dev_reset;
};

/**
@@ -4253,6 +4258,15 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
uint32_t mask,
uint8_t en);

+/**
+ * Reset an Ethernet device.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;

} DPDK_2.2;
+
+DPDK_16.07 {
+ global:
+
+ rte_eth_dev_reset;
+
+} DPDK_16.04;
--
1.9.3
Bruce Richardson
2016-06-16 15:31:04 UTC
Post by Wenzhuo Lu
Add an API to reset the device.
It is intended for a VF device in the kernel PF + DPDK VF scenario.
When the PF port goes down and comes back up, the APP should call
this API to reset the VF port. Most likely, the APP should call it
from its management thread and guarantee thread safety. This means
the APP should stop rx/tx and the device, then reset the device,
then recover the device and rx/tx.
Since this is adding a new ethdev feature, I think you should also add a new
row to the NIC feature overview matrix so we can record the PMDs which support
it.

/Bruce
Thomas Monjalon
2016-06-16 15:36:44 UTC
Post by Wenzhuo Lu
+/**
+ * Reset an Ethernet device.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
Please explain in the doxygen comment what a reset means.
We must understand why and when an application should call it.
And it must be clear to a PMD developer how to implement it.
What is the return value?
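For illustration, a doxygen comment along the requested lines might read as in the fragment below. The wording is only a sketch assembled from the commit message and from the return paths of `rte_eth_dev_reset()` in the patch; `demo_eth_dev_reset()`, `demo_eth_dev` and `DEMO_MAX_PORTS` are hypothetical stand-ins so that the fragment compiles on its own without DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4

struct demo_eth_dev;
typedef int (*demo_dev_reset_t)(struct demo_eth_dev *dev);

/* Hypothetical, trimmed-down device entry. */
struct demo_eth_dev {
	int attached;                 /* non-zero if the port is valid */
	demo_dev_reset_t dev_reset;   /* NULL if the driver has no reset */
};

static struct demo_eth_dev demo_devices[DEMO_MAX_PORTS];

/**
 * Reset an Ethernet device.
 *
 * Intended for a VF port in the kernel PF + DPDK VF scenario, after
 * the PF port went down and came back up. The application is expected
 * to stop rx/tx on the port before calling this function and to resume
 * rx/tx afterwards.
 *
 * @param port_id
 *   The port identifier of the Ethernet device.
 * @return
 *   - The value returned by the driver reset callback (0 on success).
 *   - (-EINVAL) if port_id is invalid.
 *   - (-ENOTSUP) if the driver does not implement a reset.
 */
static int
demo_eth_dev_reset(uint8_t port_id)
{
	struct demo_eth_dev *dev;

	if (port_id >= DEMO_MAX_PORTS || !demo_devices[port_id].attached)
		return -EINVAL;

	dev = &demo_devices[port_id];
	if (dev->dev_reset == NULL)
		return -ENOTSUP;

	return dev->dev_reset(dev);
}

/* Trivial driver callback used to exercise the success path. */
static int
demo_reset_ok(struct demo_eth_dev *dev)
{
	(void)dev;
	return 0;
}
```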
Wenzhuo Lu
2016-06-15 03:03:32 UTC
Implement the device reset function.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 9 +++++
drivers/net/ixgbe/ixgbe_ethdev.c | 64 +++++++++++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 12 +++++--
4 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a761e3c..d36c4b1 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,6 +53,15 @@ New Features
VF. To handle this link up/down event, add the mailbox interruption
support to receive the message.

+* **Added device reset support for ixgbe VF.**
+
+ Added the device reset API. APP can call this API to reset the VF port
+ when it's not working.
+ Based on the mailbox interruption support, when VF receives the control
+ message from PF, it means the PF link state changes, VF uses the reset
+ callback in the message handler to notice the APP. APP need call the device
+ reset API to reset the VF port.
+

Resolved Issues
---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 05f4f29..4e62cbb 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -381,6 +381,8 @@ static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *udp_tunnel);

+static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+
/*
* Define VF Stats MACRO for Non "cleared on read" register
*/
@@ -586,6 +588,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.reta_query = ixgbe_dev_rss_reta_query,
.rss_hash_update = ixgbe_dev_rss_hash_update,
.rss_hash_conf_get = ixgbe_dev_rss_hash_conf_get,
+ .dev_reset = ixgbevf_dev_reset,
};

/* store statistics names and its offset in stats structure */
@@ -4052,7 +4055,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
ETH_VLAN_EXTEND_MASK;
ixgbevf_vlan_offload_set(dev, mask);

- ixgbevf_dev_rxtx_start(dev);
+ err = ixgbevf_dev_rxtx_start(dev);
+ if (err)
+ return err;

/* check and configure queue intr-vector mapping */
if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7185,6 +7190,63 @@ static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
}

static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev)
+{
+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ int diag = 0;
+ uint32_t vteiam;
+
+ /* Nothing needs to be done if the device is not started. */
+ if (!dev->data->dev_started)
+ return 0;
+
+ PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+ /* Perform VF reset. */
+ do {
+ dev->data->dev_started = 0;
+ ixgbevf_dev_stop(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = ixgbe_dev_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+ "Failed to update link.");
+ }
+ rte_delay_ms(1000);
+
+ diag = ixgbevf_dev_start(dev);
+ /*If fail to start the device, need to stop/start it again. */
+ if (diag) {
+ PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+ "Failed to start device.");
+ continue;
+ }
+ dev->data->dev_started = 1;
+ ixgbevf_dev_stats_reset(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = ixgbe_dev_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+ "Failed to update link.");
+ diag = 0;
+ }
+
+ /**
+ * When the PF link is down, there is a chance
+ * that the VF cannot operate its registers. So
+ * check whether the registers were written
+ * successfully. If not, repeat stop/start until
+ * the PF link is up, in other words, until the
+ * registers can be written.
+ */
+ vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+ /* Reference ixgbevf_intr_enable when checking */
+ } while (diag || vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+ return 0;
+}
+
+static int
ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
{
uint32_t eicr;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..bc68b43 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -377,7 +377,7 @@ int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);

-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);

uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9c6eaf2..aa26c12 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5147,7 +5147,7 @@ ixgbevf_dev_tx_init(struct rte_eth_dev *dev)
/*
* [VF] Start Transmit and Receive Units.
*/
-void __attribute__((cold))
+int __attribute__((cold))
ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
{
struct ixgbe_hw *hw;
@@ -5183,8 +5183,10 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
rte_delay_ms(1);
txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
- if (!poll_ms)
+ if (!poll_ms) {
PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+ return -1;
+ }
}
for (i = 0; i < dev->data->nb_rx_queues; i++) {

@@ -5200,12 +5202,16 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
rte_delay_ms(1);
rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
- if (!poll_ms)
+ if (!poll_ms) {
PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+ return -1;
+ }
rte_wmb();
IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);

}
+
+ return 0;
}

/* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
--
1.9.3
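The ixgbe VF reset in the patch above repeats stop/start until a register read-back shows that the write took effect. A minimal sketch of that loop against a simulated register is given below. Everything named `demo_*` is hypothetical, the one-second delay is omitted, and the retry bound `max_tries` is an addition for the sketch: the loop in the patch has no such bound.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IRQ_ENABLE_MASK 0x7u

/* Simulated hardware: register writes are lost while the PF is down. */
struct demo_hw {
	uint32_t irq_mask_reg;
	int pf_down_cycles;   /* start attempts that will still be ignored */
};

static void
demo_stop(struct demo_hw *hw)
{
	hw->irq_mask_reg = 0;
}

static int
demo_start(struct demo_hw *hw)
{
	if (hw->pf_down_cycles > 0) {
		/* PF still down: start reports success, the write is lost. */
		hw->pf_down_cycles--;
		return 0;
	}
	hw->irq_mask_reg = DEMO_IRQ_ENABLE_MASK;
	return 0;
}

/*
 * Repeat stop/start until the register reads back the expected mask.
 * Returns the number of attempts used, or -1 once max_tries is hit.
 */
static int
demo_reset(struct demo_hw *hw, int max_tries)
{
	int tries = 0;
	int diag;

	do {
		if (tries++ == max_tries)
			return -1;
		demo_stop(hw);
		diag = demo_start(hw);
	} while (diag || hw->irq_mask_reg != DEMO_IRQ_ENABLE_MASK);

	return tries;
}
```

Bounding the loop is one possible design choice: it lets the caller decide what to do if the PF never comes back, at the cost of having to handle the failure.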
Wenzhuo Lu
2016-06-15 03:03:33 UTC
Implement the device reset function.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 2 +-
drivers/net/e1000/igb_ethdev.c | 59 ++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d36c4b1..a4c0cc3 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,7 +53,7 @@ New Features
VF. To handle this link up/down event, add the mailbox interruption
support to receive the message.

-* **Added device reset support for ixgbe VF.**
+* **Added device reset support for ixgbe/igb VF.**

Added the device reset API. APP can call this API to reset the VF port
when it's not working.
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..f1ac4b5 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -268,6 +268,7 @@ static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
void *param);
static void igbvf_mbx_process(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev);

/*
* Define VF Stats MACRO for Non "cleared on read" register
@@ -409,6 +410,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
.mac_addr_set = igbvf_default_mac_addr_set,
.get_reg_length = igbvf_get_reg_length,
.get_reg = igbvf_get_regs,
+ .dev_reset = igbvf_dev_reset,
};

/* store statistics names and its offset in stats structure */
@@ -2655,6 +2657,63 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
}

static int
+igbvf_dev_reset(struct rte_eth_dev *dev)
+{
+ struct e1000_hw *hw =
+ E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ int diag = 0;
+ uint32_t eiam;
+ /* Reference igbvf_intr_enable */
+ uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+ /* Nothing needs to be done if the device is not started. */
+ if (!dev->data->dev_started)
+ return 0;
+
+ PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+ /* Perform VF reset. */
+ do {
+ dev->data->dev_started = 0;
+ igbvf_dev_stop(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = eth_igb_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Igb VF reset: "
+ "Failed to update link.");
+ }
+ rte_delay_ms(1000);
+
+ diag = igbvf_dev_start(dev);
+ if (diag) {
+ PMD_INIT_LOG(ERR, "Igb VF reset: "
+ "Failed to start device.");
+ return diag;
+ }
+ dev->data->dev_started = 1;
+ eth_igbvf_stats_reset(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = eth_igb_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Igb VF reset: "
+ "Failed to update link.");
+ }
+
+ /**
+ * When the PF link is down, there is a chance
+ * that the VF cannot operate its registers. So
+ * check whether the registers were written
+ * successfully. If not, repeat stop/start until
+ * the PF link is up, in other words, until the
+ * registers can be written.
+ */
+ eiam = E1000_READ_REG(hw, E1000_EIAM);
+ } while (!(eiam & eiam_mbx));
+
+ return 0;
+}
+
+static int
eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
{
struct e1000_interrupt *intr =
--
1.9.3
Wenzhuo Lu
2016-06-15 03:03:34 UTC
Implement the device reset function.
This reset function detaches the device, re-attaches it,
reconfigures it, and sets up the Rx/Tx queues again.
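Setting up the queues again is only possible if each queue remembers the parameters it was created with, which is why the patch adds `socket_id` and a copy of the queue configuration to the queue structures. A simplified model of that save-and-replay idea is sketched below; the `demo_*` types and functions are hypothetical and omit the allocation and release a real queue setup performs.

```c
#include <assert.h>
#include <stdint.h>

struct demo_rxconf {
	uint16_t rx_free_thresh;
};

/* Hypothetical queue that records its own setup parameters. */
struct demo_rx_queue {
	int q_set;                   /* queue has been configured */
	uint16_t nb_desc;
	uint8_t socket_id;
	struct demo_rxconf rxconf;   /* stored by value, not by pointer */
	unsigned int setup_count;    /* how often setup ran on this queue */
};

static int
demo_rx_queue_setup(struct demo_rx_queue *q, uint16_t nb_desc,
		    uint8_t socket_id, const struct demo_rxconf *conf)
{
	/* Copy first: during a replay, conf points into the queue itself. */
	struct demo_rxconf copy = *conf;

	q->nb_desc = nb_desc;
	q->socket_id = socket_id;
	q->rxconf = copy;
	q->q_set = 1;
	q->setup_count++;
	return 0;
}

/* Replay the saved parameters for every queue that was configured. */
static void
demo_queue_reset(struct demo_rx_queue *queues, uint16_t nb_queues)
{
	uint16_t i;

	for (i = 0; i < nb_queues; i++) {
		struct demo_rx_queue *q = &queues[i];

		if (q->q_set)
			demo_rx_queue_setup(q, q->nb_desc, q->socket_id,
					    &q->rxconf);
	}
}
```

The local copy in the setup function mirrors the `conf = *rx_conf` line in the patch, presumably needed there because the old queue, which holds the saved configuration, is released while the new one is being set up.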

Signed-off-by: Zhe Tao <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 4 ++
drivers/net/i40e/i40e_ethdev.h | 4 ++
drivers/net/i40e/i40e_ethdev_vf.c | 83 ++++++++++++++++++++++++++++++++++
drivers/net/i40e/i40e_rxtx.c | 10 ++++
drivers/net/i40e/i40e_rxtx.h | 4 ++
5 files changed, 105 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a4c0cc3..6661b07 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -62,6 +62,10 @@ New Features
callback in the message handler to notice the APP. APP need call the device
reset API to reset the VF port.

+* **Added VF reset support for i40e VF driver.**
+
+ Added a new implementation to allow i40e VF driver to
+ reset the functionality and state of itself.

Resolved Issues
---------------
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cfd2399..4e0df3b 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -540,6 +540,8 @@ struct i40e_adapter {
struct rte_timecounter systime_tc;
struct rte_timecounter rx_tstamp_tc;
struct rte_timecounter tx_tstamp_tc;
+ /* For VF reset */
+ uint8_t reset_number;
};

int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
@@ -593,6 +595,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_txq_info *qinfo);

+void i40evf_emulate_vf_reset(uint8_t port_id);
+
/* I40E_DEV_PRIVATE_TO */
#define I40E_DEV_PRIVATE_TO_PF(adapter) \
(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 90682ac..2f65a29 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,6 +157,12 @@ i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
uint8_t *msg,
uint16_t msglen);
+static int i40evf_dev_uninit(struct rte_eth_dev *eth_dev);
+static int i40evf_dev_init(struct rte_eth_dev *eth_dev);
+static void i40evf_dev_close(struct rte_eth_dev *dev);
+static int i40evf_dev_start(struct rte_eth_dev *dev);
+static int i40evf_dev_configure(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);

/* Default hash key buffer for RSS */
static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -223,6 +229,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
.reta_query = i40evf_dev_rss_reta_query,
.rss_hash_update = i40evf_dev_rss_hash_update,
.rss_hash_conf_get = i40evf_dev_rss_hash_conf_get,
+ .dev_reset = i40evf_handle_vf_reset
};

/*
@@ -1309,6 +1316,82 @@ i40evf_uninit_vf(struct rte_eth_dev *dev)
}

static void
+i40e_vf_queue_reset(struct rte_eth_dev *dev)
+{
+ uint16_t i;
+
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ struct i40e_rx_queue *rxq = dev->data->rx_queues[i];
+
+ if (rxq->q_set) {
+ i40e_dev_rx_queue_setup(dev,
+ rxq->queue_id,
+ rxq->nb_rx_desc,
+ rxq->socket_id,
+ &rxq->rxconf,
+ rxq->mp);
+ }
+ }
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ struct i40e_tx_queue *txq = dev->data->tx_queues[i];
+
+ if (txq->q_set) {
+ i40e_dev_tx_queue_setup(dev,
+ txq->queue_id,
+ txq->nb_tx_desc,
+ txq->socket_id,
+ &txq->txconf);
+ }
+ }
+}
+
+static void
+i40e_vf_reset_dev(struct rte_eth_dev *dev)
+{
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+ i40evf_dev_close(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev close complete");
+ i40evf_dev_uninit(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev detached");
+ memset(dev->data->dev_private, 0,
+ (uint64_t)&adapter->reset_number - (uint64_t)adapter);
+
+ i40evf_dev_configure(dev);
+ i40evf_dev_init(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev attached");
+ i40e_vf_queue_reset(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf queue reset");
+ i40evf_dev_start(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev restart");
+}
+
+static int
+i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+{
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+ if (!dev->data->dev_started)
+ return 0;
+
+ adapter->reset_number = 1;
+ i40e_vf_reset_dev(dev);
+ adapter->reset_number = 0;
+
+ return 0;
+}
+
+void
+i40evf_emulate_vf_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+ i40evf_handle_vf_reset(dev);
+}
+
+static void
i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
uint8_t *msg,
__rte_unused uint16_t msglen)
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..8dbc64c 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2148,6 +2148,7 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
uint16_t len, i;
uint16_t base, bsf, tc_mapping;
int use_def_burst_func = 1;
+ struct rte_eth_rxconf conf = *rx_conf;

if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
struct i40e_vf *vf =
@@ -2186,6 +2187,8 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
return -ENOMEM;
}
rxq->mp = mp;
+ rxq->socket_id = socket_id;
+ rxq->rxconf = conf;
rxq->nb_rx_desc = nb_desc;
rxq->rx_free_thresh = rx_conf->rx_free_thresh;
rxq->queue_id = queue_idx;
@@ -2365,6 +2368,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
uint32_t ring_size;
uint16_t tx_rs_thresh, tx_free_thresh;
uint16_t i, base, bsf, tc_mapping;
+ struct rte_eth_txconf conf = *tx_conf;

if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
struct i40e_vf *vf =
@@ -2488,6 +2492,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
}

txq->nb_tx_desc = nb_desc;
+ txq->socket_id = socket_id;
+ txq->txconf = conf;
txq->tx_rs_thresh = tx_rs_thresh;
txq->tx_free_thresh = tx_free_thresh;
txq->pthresh = tx_conf->tx_thresh.pthresh;
@@ -2950,8 +2956,12 @@ void
i40e_dev_free_queues(struct rte_eth_dev *dev)
{
uint16_t i;
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);

PMD_INIT_FUNC_TRACE();
+ if (adapter->reset_number)
+ return;

for (i = 0; i < dev->data->nb_rx_queues; i++) {
i40e_dev_rx_queue_release(dev->data->rx_queues[i]);
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..9e1b05a 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -140,6 +140,8 @@ struct i40e_rx_queue {
bool rx_deferred_start; /**< don't start this queue in dev start */
uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
uint8_t dcb_tc; /**< Traffic class of rx queue */
+ uint8_t socket_id;
+ struct rte_eth_rxconf rxconf;
};

struct i40e_tx_entry {
@@ -181,6 +183,8 @@ struct i40e_tx_queue {
bool q_set; /**< indicate if tx queue has been configured */
bool tx_deferred_start; /**< don't start this queue in dev start */
uint8_t dcb_tc; /**< Traffic class of tx queue */
+ uint8_t socket_id;
+ struct rte_eth_txconf txconf;
};

/** Offload features */
--
1.9.3
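
Note on the private-data wipe in i40e_vf_reset_dev() above: the memset clears everything in the adapter structure that precedes reset_number, so the flag survives the wipe. A minimal sketch of the same idea using offsetof follows. The structure and function names (demo_adapter, demo_adapter_wipe) are hypothetical stand-ins, not the driver's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's private adapter structure. */
struct demo_adapter {
	uint32_t hw_state;     /* cleared on reset */
	uint32_t vf_state;     /* cleared on reset */
	uint8_t  reset_number; /* must survive the wipe */
};

/*
 * Zero every byte that precedes reset_number and leave reset_number
 * (and anything declared after it) untouched.
 */
static void
demo_adapter_wipe(struct demo_adapter *ad)
{
	assert(ad != NULL);
	memset(ad, 0, offsetof(struct demo_adapter, reset_number));
}
```

This only works while the preserved field stays at the end of the structure; any field declared after it is silently preserved as well.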
Wenzhuo Lu
2016-06-20 06:24:26 UTC
Permalink
If the PF link goes down and comes back up, the VF link does not
recover accordingly.
This patch set adds support for VF link reset. When the VF
receives the physical link down/up messages, the APP can reset the
VF link and let it recover.

PS: This patch set is split from a previous patch set,
*automatic link recovery on ixgbe/igb VF*, and it is based on the
patch set *support mailbox interruption on ixgbe/igb VF*.

Wenzhuo Lu (3):
lib/librte_ether: support device reset
ixgbe: implement device reset on VF
igb: implement device reset on VF

Zhe Tao (1):
i40e: implement device reset on VF

v1:
- Added the implementation for the VF reset functionality.
v2:
- Changed the i40e related operations during VF reset.
v3:
- Resent the patches because of the mail sent issue.
v4:
- Removed some VF reset emulation code.
v5:
- Removed all the code related with lock.
v6:
- Updated the NIC feature overview matrix.
- Added more explanation in the doxygen comment of reset API.

doc/guides/nics/overview.rst | 1 +
doc/guides/rel_notes/release_16_07.rst | 13 ++++++
drivers/net/e1000/igb_ethdev.c | 59 ++++++++++++++++++++++++
drivers/net/i40e/i40e_ethdev.h | 4 ++
drivers/net/i40e/i40e_ethdev_vf.c | 83 ++++++++++++++++++++++++++++++++++
drivers/net/i40e/i40e_rxtx.c | 10 ++++
drivers/net/i40e/i40e_rxtx.h | 4 ++
drivers/net/ixgbe/ixgbe_ethdev.c | 64 +++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 12 +++--
lib/librte_ether/rte_ethdev.c | 17 +++++++
lib/librte_ether/rte_ethdev.h | 24 ++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++
13 files changed, 295 insertions(+), 5 deletions(-)
--
1.9.3
Wenzhuo Lu
2016-06-20 06:24:27 UTC
Permalink
Add an API to reset the device.
It is intended for a VF device in this scenario: kernel PF + DPDK VF.
When the PF port goes down and then up, the APP should call this API to
reset the VF port. Most likely, the APP should call it from its
management thread and guarantee thread safety itself. That means the
APP should stop rx/tx and the device, then reset the
device, then recover the device and rx/tx.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/nics/overview.rst | 1 +
lib/librte_ether/rte_ethdev.c | 17 +++++++++++++++++
lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++++++
4 files changed, 49 insertions(+)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 0bd8fae..c8a4985 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -89,6 +89,7 @@ Most of these differences are summarized below.
Speed capabilities
Link status Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Link status event Y Y Y Y Y Y Y Y Y Y Y Y Y
+ Link reset Y Y Y Y Y
Queue status event Y
Rx interrupt Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Queue start/stop Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..6c0449b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
-ENOTSUP);
return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
}
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev;
+ int diag;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+ diag = (*dev->dev_ops->dev_reset)(dev);
+
+ return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..5b3ba12 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
uint8_t en);
/**< @internal enable/disable the l2 tunnel offload functions */

+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
#ifdef RTE_NIC_BYPASS

enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
/** Enable/disable l2 tunnel offload functions */
eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+ /** Reset device. */
+ eth_dev_reset_t dev_reset;
};

/**
@@ -4253,6 +4258,25 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
uint32_t mask,
uint8_t en);

+/**
+ * Reset an ethernet device when it's not working. One scenario is, after PF
+ * port is down and up, the related VF port should be reset.
+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.
+ * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
+ * APP can drop the packets and release the buffer instead of sending them.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ *
+ * @return
+ * - (0) if successful.
+ * - (-ENODEV) if port identifier is invalid.
+ * - (-ENOTSUP) if hardware doesn't support this function.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;

} DPDK_2.2;
+
+DPDK_16.07 {
+ global:
+
+ rte_eth_dev_reset;
+
+} DPDK_16.04;
--
1.9.3
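
For readers skimming the diff, the dispatch in rte_eth_dev_reset() amounts to: validate the port, check whether the driver filled in dev_reset, then call it. Below is a self-contained sketch of that pattern. All demo_* names are hypothetical mocks, not the ethdev types; only the error codes mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_dev;
typedef int (*demo_dev_reset_t)(struct demo_dev *dev);

/* Mock ops table: a driver leaves dev_reset NULL if it has no reset. */
struct demo_dev_ops {
	demo_dev_reset_t dev_reset;
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	int reset_count;
};

#define DEMO_MAX_PORTS 4
static struct demo_dev demo_devices[DEMO_MAX_PORTS];

/* Same shape as the patch: -ENODEV, then -ENOTSUP, then the callback. */
static int
demo_dev_reset(unsigned int port_id)
{
	struct demo_dev *dev;

	if (port_id >= DEMO_MAX_PORTS)
		return -ENODEV;

	dev = &demo_devices[port_id];
	if (dev->ops == NULL || dev->ops->dev_reset == NULL)
		return -ENOTSUP;

	return dev->ops->dev_reset(dev);
}

/* A VF-like driver implements the callback; a PF-like one does not. */
static int
demo_vf_reset(struct demo_dev *dev)
{
	dev->reset_count++;
	return 0;
}

static const struct demo_dev_ops demo_vf_ops = { .dev_reset = demo_vf_reset };
static const struct demo_dev_ops demo_pf_ops = { .dev_reset = NULL };
```

With this shape a driver that has no reset support needs no code at all; the caller just sees -ENOTSUP.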
Jerin Jacob
2016-06-20 09:14:11 UTC
Permalink
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to
reset VF port. Most likely, APP should call it in its
management thread and guarantee the thread safe. It means
APP should stop the rx/tx and the device, then reset the
device, then recover the device and rx/tx.
The above is _a_ use case for device reset, but it may not be _the_ use
case. IMO, we need to first state the expected behavior of this API and add a
use case later.

Another use case would be a PCIe VF with function level reset for SR-IOV
migration.
Are we on the same page?
Post by Wenzhuo Lu
---
doc/guides/nics/overview.rst | 1 +
lib/librte_ether/rte_ethdev.c | 17 +++++++++++++++++
lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++++++
4 files changed, 49 insertions(+)
diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 0bd8fae..c8a4985 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -89,6 +89,7 @@ Most of these differences are summarized below.
Speed capabilities
Link status Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Link status event Y Y Y Y Y Y Y Y Y Y Y Y Y
+ Link reset Y Y Y Y Y
"Device reset" would be more appropriate, right?
Post by Wenzhuo Lu
Queue status event Y
Rx interrupt Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Queue start/stop Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..6c0449b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
-ENOTSUP);
return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
}
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev;
+ int diag;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+ diag = (*dev->dev_ops->dev_reset)(dev);
+
+ return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..5b3ba12 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
uint8_t en);
+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+
#ifdef RTE_NIC_BYPASS
enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
/** Enable/disable l2 tunnel offload functions */
eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+ /** Reset device. */
+ eth_dev_reset_t dev_reset;
};
/**
@@ -4253,6 +4258,25 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
uint32_t mask,
uint8_t en);
+/**
+ * Reset an ethernet device when it's not working. One scenario is, after PF
+ * port is down and up, the related VF port should be reset.
+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.
+ * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
+ * APP can drop the packets and release the buffer instead of sending them.
Same as first comment.
Post by Wenzhuo Lu
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ *
+ * @return
+ * - (0) if successful.
+ * - (-ENODEV) if port identifier is invalid.
+ * - (-ENOTSUP) if hardware doesn't support this function.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;
} DPDK_2.2;
+
+DPDK_16.07 {
+
+ rte_eth_dev_reset;
+
+} DPDK_16.04;
--
1.9.3
Stephen Hemminger
2016-06-20 16:17:14 UTC
Permalink
On Mon, 20 Jun 2016 14:44:11 +0530
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to
reset VF port. Most likely, APP should call it in its
management thread and guarantee the thread safe. It means
APP should stop the rx/tx and the device, then reset the
device, then recover the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be _the_ use
case. IMO, We need to first say expected behavior of this API and add a use-case
later.
Other use-case would be, PCIe VF with functional level reset for SRIOV
migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by the
device driver in the start routine. Since any use case which needs
this is going to do a stop/reset/start sequence, why not just have
the VF device driver do this in the start routine?

Adding yet another API and state transition, if not necessary, increases
the complexity and the required test cases for all devices.
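
To make the start-routine alternative concrete, here is a rough sketch of a driver whose start path resets the hardware only when a reset is pending. The demo_vf structure, its reset_pending flag, and all function names are hypothetical; this is an illustration of the idea, not code from any PMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VF state kept by the driver. */
struct demo_vf {
	bool started;
	bool reset_pending; /* set when a PF link down/up message is seen */
	int hw_resets;      /* counts how often the hardware was reset */
};

static void
demo_vf_hw_reset(struct demo_vf *vf)
{
	vf->hw_resets++;
	vf->reset_pending = false;
}

static int
demo_vf_stop(struct demo_vf *vf)
{
	assert(vf != NULL);
	vf->started = false;
	return 0;
}

/* Start routine: performs the reset implicitly if one is pending. */
static int
demo_vf_start(struct demo_vf *vf)
{
	assert(vf != NULL);
	if (vf->started)
		return 0;
	if (vf->reset_pending)
		demo_vf_hw_reset(vf);
	vf->started = true;
	return 0;
}
```

With this shape the application keeps using only stop and start; the cost is that the reset becomes a hidden side effect of start.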
Jerin Jacob
2016-06-21 03:51:25 UTC
Permalink
Post by Stephen Hemminger
On Mon, 20 Jun 2016 14:44:11 +0530
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to
reset VF port. Most likely, APP should call it in its
management thread and guarantee the thread safe. It means
APP should stop the rx/tx and the device, then reset the
device, then recover the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be _the_ use
case. IMO, We need to first say expected behavior of this API and add a use-case
later.
Other use-case would be, PCIe VF with functional level reset for SRIOV
migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by the
device driver in the start routine. Since any use case which needs
this is going to do a stop/reset/start sequence, why not just have
the VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary increases
the complexity and required test cases for all devices.
I agree with Stephen here. I think if the application needs to call start
after the device reset, then we could add this logic to start itself
rather than exposing yet another API.
Lu, Wenzhuo
2016-06-21 06:14:29 UTC
Permalink
Hi Jerin, Stephen,
-----Original Message-----
Sent: Tuesday, June 21, 2016 11:51 AM
To: Stephen Hemminger
Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Stephen Hemminger
On Mon, 20 Jun 2016 14:44:11 +0530
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset VF
port. Most likely, APP should call it in its management thread and
guarantee the thread safe. It means APP should stop the rx/tx and
the device, then reset the device, then recover the device and
rx/tx.
Following is _a_ use-case for Device reset. But may be not be _the_
use case. IMO, We need to first say expected behavior of this API
and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by the
device driver in the start routine. Since any use case which needs
this is going to do a stop/reset/start sequence, why not just have the
VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start after the
device reset then we could add this logic in start itself rather exposing a yet
another API
Do you mean changing device_start to include all these actions: stop device -> stop queue -> re-setup queue -> start queue -> start device?
Jerin Jacob
2016-06-21 07:37:11 UTC
Permalink
Post by Lu, Wenzhuo
Hi Jerin, Stephen,
-----Original Message-----
Sent: Tuesday, June 21, 2016 11:51 AM
To: Stephen Hemminger
Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Stephen Hemminger
On Mon, 20 Jun 2016 14:44:11 +0530
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset VF
port. Most likely, APP should call it in its management thread and
guarantee the thread safe. It means APP should stop the rx/tx and
the device, then reset the device, then recover the device and
rx/tx.
Following is _a_ use-case for Device reset. But may be not be _the_
use case. IMO, We need to first say expected behavior of this API
and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by the
device driver in the start routine. Since any use case which needs
this is going to do a stop/reset/start sequence, why not just have the
VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start after the
device reset then we could add this logic in start itself rather exposing a yet
another API
Do you mean changing the device_start to include all these actions, stop device -> stop queue -> re-setup queue -> start queue -> start device ?
What was the expected API call sequence when you introduced this API?

The point was to have an implicit device reset in the API call
sequence (wherever it makes sense for a specific PMD).

Jerin
Lu, Wenzhuo
2016-06-21 08:24:36 UTC
Permalink
Hi Jerin,
-----Original Message-----
Sent: Tuesday, June 21, 2016 3:37 PM
To: Lu, Wenzhuo
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Hi Jerin, Stephen,
-----Original Message-----
Sent: Tuesday, June 21, 2016 11:51 AM
To: Stephen Hemminger
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
On Mon, 20 Jun 2016 14:44:11 +0530 Jerin Jacob
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset
VF port. Most likely, APP should call it in its management
thread and guarantee the thread safe. It means APP should stop
the rx/tx and the device, then reset the device, then recover
the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be
_the_ use case. IMO, We need to first say expected behavior of
this API and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by
the device driver in the start routine. Since any use case which
needs this is going to do a stop/reset/start sequence, why not
just have the VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start
after the device reset then we could add this logic in start itself
rather exposing a yet another API
Do you mean changing the device_start to include all these actions, stop
device -> stop queue -> re-setup queue -> start queue -> start device ?
What was the expected API call sequence when you were introduced this API?
Point was to have implicit device reset in the API call sequence(Wherever make
sense for specific PMD)
I think the API call sequence depends on the implementation of the APP. Suppose this reset API did not exist: the APP could use the following call sequence to handle the PF link down/up event: rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup -> rte_eth_dev_start.
Actually, our purpose is to let the APP use this reset API instead of that call sequence. As you can see, the reset API is not strictly necessary; the benefit is that it saves code in the APP.
Jerin
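
As an illustration of the application-side sequence named above (close, RX queue setup, TX queue setup, start), here is a sketch of a helper that runs those steps in order and stops at the first failure. The demo_* functions are hypothetical stand-ins for the ethdev calls and only record which step ran; real code would pass descriptor counts, socket IDs, queue configuration and mempools.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace of the steps taken, so the ordering is observable. */
struct demo_trace {
	char steps[16];
	size_t len;
};

static void
demo_trace_add(struct demo_trace *t, char step)
{
	assert(t->len + 1 < sizeof(t->steps));
	t->steps[t->len++] = step;
	t->steps[t->len] = '\0';
}

/* Stand-ins for close, rx_queue_setup, tx_queue_setup and start. */
static void demo_close(struct demo_trace *t) { demo_trace_add(t, 'C'); }

static int
demo_rx_queue_setup(struct demo_trace *t, unsigned int q)
{
	(void)q;
	demo_trace_add(t, 'R');
	return 0;
}

static int
demo_tx_queue_setup(struct demo_trace *t, unsigned int q)
{
	(void)q;
	demo_trace_add(t, 'T');
	return 0;
}

static int demo_start(struct demo_trace *t) { demo_trace_add(t, 'S'); return 0; }

/* Run the sequence from the thread; return the first error, if any. */
static int
demo_recover_port(struct demo_trace *t, unsigned int nb_rxq, unsigned int nb_txq)
{
	unsigned int q;
	int ret;

	demo_close(t);
	for (q = 0; q < nb_rxq; q++) {
		ret = demo_rx_queue_setup(t, q);
		if (ret != 0)
			return ret;
	}
	for (q = 0; q < nb_txq; q++) {
		ret = demo_tx_queue_setup(t, q);
		if (ret != 0)
			return ret;
	}
	return demo_start(t);
}
```

Every application would have to carry a helper like this and remember its original queue parameters, which is the code the reset API is meant to save.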
Jerin Jacob
2016-06-21 08:55:32 UTC
Permalink
Post by Lu, Wenzhuo
Hi Jerin,
Hi Wenzhuo,
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset
VF port. Most likely, APP should call it in its management
thread and guarantee the thread safe. It means APP should stop
the rx/tx and the device, then reset the device, then recover
the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be
_the_ use case. IMO, We need to first say expected behavior of
this API and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by
the device driver in the start routine. Since any use case which
needs this is going to do a stop/reset/start sequence, why not
just have the VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start
after the device reset then we could add this logic in start itself
rather exposing a yet another API
Do you mean changing the device_start to include all these actions, stop
device -> stop queue -> re-setup queue -> start queue -> start device ?
What was the expected API call sequence when you were introduced this API?
Point was to have implicit device reset in the API call sequence(Wherever make
sense for specific PMD)
I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup -> rte_eth_dev_start.
Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit is to save the code for APP.
Then I am a bit confused by the original commit log description.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
I was under the impression that it is a low-level reset API for this device.
Isn't it?

The other issue is the generalized outlook of the API. Certain PMDs will not
have a PF link down/up event: the link down/up may be tied only to the VF,
with the PF used only for configuration.

How about fixing it more transparently in the PMD itself? Since the
PMD knows about the PF link up/down event, is it possible to
recover the VF on that event if it is only a matter of resetting it?

Jerin
Ananyev, Konstantin
2016-06-21 09:26:12 UTC
Permalink
Hi Jerin,
-----Original Message-----
Sent: Tuesday, June 21, 2016 9:56 AM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Hi Jerin,
Hi Wenzhuo,
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset
VF port. Most likely, APP should call it in its management
thread and guarantee the thread safe. It means APP should stop
the rx/tx and the device, then reset the device, then recover
the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be
_the_ use case. IMO, We need to first say expected behavior of
this API and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by
the device driver in the start routine. Since any use case which
needs this is going to do a stop/reset/start sequence, why not
just have the VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start
after the device reset then we could add this logic in start itself
rather exposing a yet another API
Do you mean changing the device_start to include all these actions, stop
device -> stop queue -> re-setup queue -> start queue -> start device ?
What was the expected API call sequence when you were introduced this API?
Point was to have implicit device reset in the API call sequence(Wherever make
sense for specific PMD)
I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API
call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
rte_eth_dev_start.
Post by Lu, Wenzhuo
Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit
is to save the code for APP.
Then I am bit confused with original commit log description.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
I was under impression that it a low level reset API for this device? Is
n't it?
The other issue is generalized outlook of the API, Certain PMD will not
have PF link down/up event? Link down/up and only connected to VF and PF
only for configuration.
How about fixing it more transparently in PMD driver itself as
PMD driver knows the PF link up/down event, Is it possible to
recover the VF on that event if its only matter of resetting it?
I think we already went through that discussion on the list.
Unfortunately, with the current DPDK design it is hardly possible.
To achieve that we would need to introduce some sort of synchronisation
between the IO and control APIs (locking or similar).
Actually, I am not sure why having a special reset function would be a problem.
Yes, it would exist only for VFs; for PFs it could be left unimplemented.
Still, it definitely seems more convenient from the user's point of view:
they would know that to handle a VF reset event, they just need to call that
particular function, not re-implement their own.

Konstantin
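
To show what synchronisation between IO and control can mean when the application does it itself, here is a minimal pause/acknowledge handshake using C11 atomics. It is only a sketch under stated assumptions: one datapath thread and one management thread per gate, with pause and resume strictly alternating. The demo_gate type and its functions are hypothetical and not part of DPDK.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_gate {
	atomic_uint request_seq; /* odd while a pause is requested */
	atomic_uint ack_seq;     /* last odd request the datapath acknowledged */
};

/* Datapath: call once per loop iteration; true means rx/tx may run. */
static bool
demo_gate_io_allowed(struct demo_gate *g)
{
	unsigned int seq;

	seq = atomic_load_explicit(&g->request_seq, memory_order_acquire);
	if (seq & 1u) {
		/* Acknowledge this exact request, then skip IO. */
		atomic_store_explicit(&g->ack_seq, seq, memory_order_release);
		return false;
	}
	return true;
}

/* Management: request a pause and get a ticket to wait on (even -> odd). */
static unsigned int
demo_gate_request_pause(struct demo_gate *g)
{
	assert(g != NULL);
	return atomic_fetch_add_explicit(&g->request_seq, 1u,
					 memory_order_acq_rel) + 1u;
}

/* Management: true once the datapath has acknowledged this ticket. */
static bool
demo_gate_is_quiesced(struct demo_gate *g, unsigned int ticket)
{
	return atomic_load_explicit(&g->ack_seq,
				    memory_order_acquire) == ticket;
}

/* Management: let the datapath run again (odd -> even). */
static void
demo_gate_resume(struct demo_gate *g)
{
	atomic_fetch_add_explicit(&g->request_seq, 1u, memory_order_release);
}
```

The management thread would request a pause, poll demo_gate_is_quiesced() with its ticket, reset the port, and then resume. The ticket keeps a stale acknowledgement from an earlier pause from being mistaken for the current one.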
Jerin
Jerin Jacob
2016-06-21 10:57:52 UTC
Permalink
On Tue, Jun 21, 2016 at 09:26:12AM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,
Post by Lu, Wenzhuo
Hi Jerin,
-----Original Message-----
Sent: Tuesday, June 21, 2016 9:56 AM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Hi Jerin,
Hi Wenzhuo,
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset
VF port. Most likely, APP should call it in its management
thread and guarantee the thread safe. It means APP should stop
the rx/tx and the device, then reset the device, then recover
the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be
_the_ use case. IMO, We need to first say expected behavior of
this API and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by
the device driver in the start routine. Since any use case which
needs this is going to do a stop/reset/start sequence, why not
just have the VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start
after the device reset then we could add this logic in start itself
rather exposing a yet another API
Do you mean changing the device_start to include all these actions, stop
device -> stop queue -> re-setup queue -> start queue -> start device ?
What was the expected API call sequence when you were introduced this API?
Point was to have implicit device reset in the API call sequence(Wherever make
sense for specific PMD)
I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API
call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
rte_eth_dev_start.
Post by Lu, Wenzhuo
Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit
is to save the code for APP.
Then I am bit confused with original commit log description.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
I was under impression that it a low level reset API for this device? Is
n't it?
The other issue is generalized outlook of the API, Certain PMD will not
have PF link down/up event? Link down/up and only connected to VF and PF
only for configuration.
How about fixing it more transparently in PMD driver itself as
PMD driver knows the PF link up/down event, Is it possible to
recover the VF on that event if its only matter of resetting it?
I think we already went through that discussion on the list.
Unfortunately with current dpdk design it is hardly possible.
To achieve that we need to introduce some sort of synchronisation
between IO and control APIs (locking or so).
Actually I am not sure why having a special reset function will be a problem.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
Just to understand: if the application still needs to do the stop, then what
value addition does the reset API bring to the table?
Post by Lu, Wenzhuo
Yes, it would exist only for VFs, for PF it could be left unimplemented.
Though it definitely seems more convenient from user point of view,
they would know: to handle VF reset event, they just need to call that
particular function, not to re-implement their own.
What if the driver returns "not implemented"? Then the application will have to do a
generic rte_eth_dev_stop/rte_eth_dev_start. That way, from the application's
perspective, we are NOT solving any problem.

Jerin
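
The fallback concern can be sketched as follows: the application tries the reset first and, if the driver reports it as unsupported, falls back to a plain stop/start. The demo_port type and the demo_* functions are hypothetical mocks used only to make the control flow visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical port: reset is NULL when the driver has no reset support. */
struct demo_port {
	int (*reset)(struct demo_port *p);
	int resets, stops, starts; /* counters to observe which path ran */
};

static int
demo_try_reset(struct demo_port *p)
{
	if (p->reset == NULL)
		return -ENOTSUP;
	return p->reset(p);
}

static void demo_port_stop(struct demo_port *p) { p->stops++; }
static int demo_port_start(struct demo_port *p) { p->starts++; return 0; }

/* Prefer the dedicated reset; use stop/start only when it is unsupported. */
static int
demo_recover(struct demo_port *p)
{
	int ret;

	assert(p != NULL);
	ret = demo_try_reset(p);
	if (ret != -ENOTSUP)
		return ret;

	demo_port_stop(p);
	return demo_port_start(p);
}

/* Reset callback for a port whose driver supports it. */
static int
demo_vf_reset_cb(struct demo_port *p)
{
	p->resets++;
	return 0;
}
```

An application written this way carries both paths, so the reset API shortens the common case but does not remove the generic one.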
Ananyev, Konstantin
2016-06-21 13:10:40 UTC
Permalink
Post by Jerin Jacob
Hi Konstantin,
Post by Lu, Wenzhuo
Hi Jerin,
-----Original Message-----
Sent: Tuesday, June 21, 2016 9:56 AM
To: Lu, Wenzhuo
Zhang,
Post by Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Hi Jerin,
Hi Wenzhuo,
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Post by Jerin Jacob
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset
VF port. Most likely, APP should call it in its management
thread and guarantee the thread safe. It means APP should stop
the rx/tx and the device, then reset the device, then recover
the device and rx/tx.
Following is _a_ use-case for Device reset. But may be not be
_the_ use case. IMO, We need to first say expected behavior of
this API and add a use-case later.
Other use-case would be, PCIe VF with functional level reset for
SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is normally handled by
the device driver in the start routine. Since any use case which
needs this is going to do a stop/reset/start sequence, why not
just have the VF device driver do this in the start routine?.
Adding yet another API and state transistion if not necessary
increases the complexity and required test cases for all devices.
I agree with Stephen here.I think if application needs to call start
after the device reset then we could add this logic in start itself
rather exposing a yet another API
Do you mean changing the device_start to include all these actions, stop
device -> stop queue -> re-setup queue -> start queue -> start device ?
What was the expected API call sequence when you were introduced this API?
Point was to have implicit device reset in the API call sequence(Wherever make
sense for specific PMD)
I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API
call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup -> rte_eth_dev_start.
Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit
is to save the code for APP.
Then I am bit confused with original commit log description.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
I was under the impression that it is a low-level reset API for this device,
isn't it?
The other issue is generalized outlook of the API, Certain PMD will not
have PF link down/up event? Link down/up and only connected to VF and PF
only for configuration.
How about fixing it more transparently in PMD driver itself as
PMD driver knows the PF link up/down event, Is it possible to
recover the VF on that event if its only matter of resetting it?
I think we already went through that discussion on the list.
Unfortunately with current dpdk design it is hardly possible.
To achieve that we need to introduce some sort of synchronisation
between IO and control APIs (locking or so).
Actually I am not sure why having a special reset function will be a problem.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
Just to understand: if the application still needs to do the stop, then what
added value does the reset API bring to the table?
If the application calls dev_reset(), it doesn't need to call dev_stop() before it;
dev_reset() will take care of that.
But it needs to make sure that no other thread will try to modify that device's state
(either dev_stop/start, or eth_rx_burst/eth_tx_burst) while the reset op is in progress.
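
One way an application could meet that requirement is a small handshake between
its IO thread and its management thread. The following is only a sketch in C11
under stated assumptions: a single IO thread per port, and hypothetical names
throughout (struct port_ctl, io_may_poll, mgmt_request_pause,
mgmt_reset_if_paused, fake_dev_reset). None of them are DPDK functions, and
fake_dev_reset merely stands in for the reset call under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum io_state { IO_RUNNING, IO_PAUSE_REQUESTED, IO_PAUSED };

/* Hypothetical application-side control block for one port. */
struct port_ctl {
    atomic_int state;  /* one of enum io_state */
    int resets_done;   /* touched only by the management thread */
};

static void port_ctl_init(struct port_ctl *c)
{
    assert(c != NULL);
    atomic_init(&c->state, IO_RUNNING);
    c->resets_done = 0;
}

/* Stand-in for the driver reset; a real app would call the ethdev API here. */
static void fake_dev_reset(struct port_ctl *c)
{
    c->resets_done++;
}

/* IO thread: call once per loop iteration, before any rx/tx burst. */
static bool io_may_poll(struct port_ctl *c)
{
    int expected = IO_PAUSE_REQUESTED;

    /* Acknowledge a pending request. Only the IO thread sets IO_PAUSED,
     * so that state means no burst call is in flight. */
    atomic_compare_exchange_strong(&c->state, &expected, IO_PAUSED);
    return atomic_load(&c->state) == IO_RUNNING;
}

/* Management thread: ask the IO thread to park at its next iteration. */
static void mgmt_request_pause(struct port_ctl *c)
{
    int expected = IO_RUNNING;

    atomic_compare_exchange_strong(&c->state, &expected, IO_PAUSE_REQUESTED);
}

/* Management thread: reset only once the IO thread has acknowledged. */
static bool mgmt_reset_if_paused(struct port_ctl *c)
{
    if (atomic_load(&c->state) != IO_PAUSED)
        return false; /* not parked yet; retry later */

    fake_dev_reset(c);
    atomic_store(&c->state, IO_RUNNING);
    return true;
}
```

The management thread would call mgmt_request_pause() and then retry
mgmt_reset_if_paused() until it returns true. The cost is one atomic
compare-and-swap per IO loop iteration, paid by the application rather than by
the rx/tx burst functions themselves.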
Post by Jerin Jacob
Post by Lu, Wenzhuo
Yes, it would exist only for VFs, for PF it could be left unimplemented.
Though it definitely seems more convenient from user point of view,
they would know: to handle VF reset event, they just need to call that
particular function, not to re-implement their own.
What if driver returns "not implemented" then application will have do
generic rte_eth_dev_stop/rte_eth_dev_start.
That way in application perspective we are NOT solving any problem.
True, but as I said, for a PF the application would simply never receive such an event.
I suppose it is possible to implement one for the PF too; I just don't see
much point, as probably no one will ever use it.

Konstantin
Jerin Jacob
2016-06-21 13:30:42 UTC
Permalink
Post by Ananyev, Konstantin
If application calls dev_reset() it doesn't need to call dev_stop() before it.
dev_reset() will take care of it.
But it needs to make sure that no other thread will try to modify that device state
(either dev_stop/start, or eth_rx_burst/eth_tx_burst) while the reset op is in place.
OK. This description differs from the commit log and the API doxygen comment. Please fix it.
How about a different name for this API? "Device reset" is too generic.
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Yes, it would exist only for VFs, for PF it could be left unimplemented.
Though it definitely seems more convenient from user point of view,
they would know: to handle VF reset event, they just need to call that
particular function, not to re-implement their own.
What if driver returns "not implemented" then application will have do
generic rte_eth_dev_stop/rte_eth_dev_start.
That way in application perspective we are NOT solving any problem.
True, but as I said for PF application would just never receive such event.
What is this event? Is it the VF link up/down event?

No, I was referring to the VF itself: other VF PMD drivers in drivers/net
where this callback is not implemented.

Jerin
Post by Ananyev, Konstantin
I suppose it is possible to implement one for PF too, I just don't see
much point - as probably no-one will ever use it.
Konstantin
Ananyev, Konstantin
2016-06-21 14:03:15 UTC
Permalink
Post by Ananyev, Konstantin
True, but as I said for PF application would just never receive such event.
What is this event ? Is it VF Link up/down event?
No I was referring to VF itself, Other VF PMD drivers in drivers/net
where this callback is not implemented.
Hmm, the only suggestion I have here:
could maintainers/developers of non-Intel PMDs implement it for their VFs?
That is, of course, in case they need to handle a similar event.
If not, I suppose there is no harm in leaving it unimplemented.
Konstantin
Jerin Jacob
2016-06-21 14:29:25 UTC
Permalink
Post by Ananyev, Konstantin
True, but as I said for PF application would just never receive such event.
What is this event ? Is it VF Link up/down event?
No I was referring to VF itself, Other VF PMD drivers in drivers/net
where this callback is not implemented.
Hmm, the only suggestion I have here -
Maintainers/developers of non-Intel PMD will implement it for their VFs?
That's fine. But we have to know what to implement here from the PMD perspective.
That's the reason I am asking about the API expectation and application usage :-)
Post by Ananyev, Konstantin
In case of course they do need to handle similar event.
Which event is this, and how does the application get notified of it?
Post by Ananyev, Konstantin
if not I suppose there is no harm to left it unimplemented.
OK. If it is for the VF/PF link down-up event then I will make it a 'nop'.

Jerin
Post by Ananyev, Konstantin
Konstantin
Lu, Wenzhuo
2016-06-22 01:35:37 UTC
Permalink
Hi Jerin,
On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call
this API to reset VF port. Most likely, APP
should call it in its management thread and
guarantee the thread safe. It means APP should
stop the rx/tx and the device, then reset the device, then
recover the device and rx/tx.
Post by Lu, Wenzhuo
OK. This description looks different than commit log and API doxygen
comment. Please fix it.
Post by Lu, Wenzhuo
How about a different name for this API. Device reset is too generic?
Any suggestion? I use this name because I believe what this API does is reset the device.
Post by Ananyev, Konstantin
Hmm, the only suggestion I have here - Maintainers/developers of
non-Intel PMD will implement it for their VFs?
That's fine. But, We have to know what to implement here in PMD perspective?
That's reason being asking about the API expectation and application usage :-)
In case of course they do need to handle similar event.
Which is this event and How application get notify it.
When the PF link goes down/up, the PF uses the mailbox to send a message to the VF. The event here means the VF receives that message from the PF, so the VF knows the physical link state has changed. You see, it's only for the VF; the PF will not receive this kind of message.
We use the callback mechanism to notify the APP: the APP should register a callback function, and when the VF driver receives the message it calls that callback, so the APP knows about it.
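
The notification path described above can be sketched roughly as follows. This
is an illustrative C model only: the types and functions (enum vf_event,
struct vf_port, vf_register_cb, vf_handle_mailbox_msg, app_on_vf_event) are
hypothetical and do not reproduce the ethdev callback API or any driver's
mailbox handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event type: the VF was told that the PF link changed. */
enum vf_event { VF_EVENT_PF_LINK_CHANGED };

typedef void (*vf_event_cb)(int port_id, enum vf_event ev, void *arg);

/* Hypothetical VF port state kept by the driver. */
struct vf_port {
    int port_id;
    vf_event_cb cb; /* registered by the application, may be NULL */
    void *cb_arg;
};

/* APP side: register a callback for VF events. */
static void vf_register_cb(struct vf_port *vf, vf_event_cb cb, void *arg)
{
    assert(vf != NULL);
    vf->cb = cb;
    vf->cb_arg = arg;
}

/* Driver side: invoked when a mailbox message from the PF has arrived. */
static void vf_handle_mailbox_msg(struct vf_port *vf, enum vf_event ev)
{
    if (vf->cb != NULL)
        vf->cb(vf->port_id, ev, vf->cb_arg);
}

/* Example APP callback: it only records that a reset is needed. The reset
 * itself would be issued later from the management thread. */
static void app_on_vf_event(int port_id, enum vf_event ev, void *arg)
{
    int *reset_needed = arg;

    (void)port_id;
    if (ev == VF_EVENT_PF_LINK_CHANGED)
        *reset_needed = 1;
}
```

Keeping the callback down to setting a flag is a design choice in this sketch:
it avoids doing the reset from the driver's notification context and leaves the
thread-safety handling to the application's management thread.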
if not I suppose there is no harm to left it unimplemented.
OK. If it is for VF/PF link down-up event then I will make it as 'nop'.
As explained above, the event is not the VF/PF link going down-up. It is the VF being notified that the PF link went down-up.

And in my opinion, although we currently only implement the reset API for the VF, I believe nothing prevents us from implementing this API for the PF if we find a scenario where we need to reset the PF link. The reset API is a reset API: it can be used for the event described above, but it is not bound to this event.
Jerin
Konstantin
Jerin Jacob
2016-06-22 02:37:47 UTC
Permalink
Post by Lu, Wenzhuo
Hi Jerin,
-----Original Message-----
Sent: Tuesday, June 21, 2016 10:29 PM
To: Ananyev, Konstantin
Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Hi Wenzhuo,
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call
this API to reset VF port. Most likely, APP
should call it in its management thread and
guarantee the thread safe. It means APP should
stop the rx/tx and the device, then reset the device, then
recover the device and rx/tx.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Following is _a_ use-case for Device reset. But
may be not be _the_ use case. IMO, We need to
first say expected behavior of this API and add a use-case
later.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Other use-case would be, PCIe VF with functional
level reset for SRIOV migration.
Are we on same page?
In my experience with Linux devices, this is
normally handled by the device driver in the start
routine. Since any use case which needs this is
going to do a stop/reset/start sequence, why not just have
the VF device driver do this in the start routine?.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Adding yet another API and state transistion if
not necessary increases the complexity and required test
cases for all devices.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
I agree with Stephen here.I think if application
needs to call start after the device reset then we
could add this logic in start itself rather exposing
a yet another API
Do you mean changing the device_start to include all
these actions, stop
device -> stop queue -> re-setup queue -> start queue -> start
device ?
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
What was the expected API call sequence when you were
introduced this API?
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Point was to have implicit device reset in the API call
sequence(Wherever make sense for specific PMD)
I think the API call sequence depends on the
implementation of the APP. Let's say if there's not this
reset API, APP can use
this
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
API
Post by Ananyev, Konstantin
Post by Jerin Jacob
call sequence to handle the PF link down/up event,
rte_eth_dev_close -> rte_eth_rx_queue_setup ->
rte_eth_tx_queue_setup -
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
rte_eth_dev_start.
Post by Lu, Wenzhuo
Actually our purpose is to use this reset API instead of
the API call sequence. You can see the reset API is not
necessary. The
benefit
Post by Ananyev, Konstantin
Post by Jerin Jacob
is to save the code for APP.
Then I am a bit confused by the original commit log description.
|
|It means APP should stop the rx/tx and the device, then
|reset the device, then recover the device and rx/tx.
|
I was under the impression that it is a low-level reset API for
this device. Isn't it?
The other issue is the generalized outlook of the API. Certain
PMDs will not have a PF link down/up event? There, link down/up is
connected only to the VF, and the PF is only for configuration.
How about fixing it more transparently in the PMD driver itself,
since the PMD driver knows about the PF link up/down event? Is it
possible to recover the VF on that event if it's only a matter of
resetting it?
I think we already went through that discussion on the list.
Unfortunately with current dpdk design it is hardly possible.
To achieve that we need to introduce some sort of
synchronisation between IO and control APIs (locking or so).
Actually I am not sure why having a special reset function will be a problem.
|
|It means APP should stop the rx/tx and the device, then reset
|the device, then recover the device and rx/tx.
|
Just to understand: if the application still needs to do the stop,
then what value addition does the reset API bring to the table?
If the application calls dev_reset(), it doesn't need to call dev_stop() before it.
dev_reset() will take care of it.
But it needs to make sure that no other thread will try to modify
that device state (either dev_stop/start, or eth_rx_burst/eth_tx_burst)
while the reset op is in place.
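The constraint described here (no rx/tx burst or stop/start from another thread while the reset runs) can be illustrated with a small stand-alone sketch. It is not the code from this patch set and uses no DPDK calls: `port_guard`, `io_poll_once` and the `control_reset*` helpers are hypothetical names, and the counters merely stand in for the real burst and reset calls. The I/O side uses a try-lock so it never blocks; the control side waits until any in-flight burst has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port guard shared by the I/O and control threads. */
struct port_guard {
    atomic_flag busy;  /* set while either side owns the port */
    unsigned bursts;   /* stands in for rx/tx burst calls */
    unsigned resets;   /* stands in for the device reset call */
};

static void port_guard_init(struct port_guard *g)
{
    assert(g != NULL);
    atomic_flag_clear(&g->busy);
    g->bursts = 0;
    g->resets = 0;
}

/* One I/O loop iteration: skip the burst if the control path owns the port. */
static bool io_poll_once(struct port_guard *g)
{
    if (atomic_flag_test_and_set_explicit(&g->busy, memory_order_acquire))
        return false;              /* reset in progress, touch nothing */
    g->bursts++;                   /* placeholder for the rx/tx burst */
    atomic_flag_clear_explicit(&g->busy, memory_order_release);
    return true;
}

/* Control path: wait for any in-flight burst, then own the port. */
static void control_reset_begin(struct port_guard *g)
{
    while (atomic_flag_test_and_set_explicit(&g->busy, memory_order_acquire))
        ;                          /* spin until the I/O side releases */
}

static void control_reset_end(struct port_guard *g)
{
    atomic_flag_clear_explicit(&g->busy, memory_order_release);
}

/* Full reset as the management thread would run it. */
static void control_reset(struct port_guard *g)
{
    control_reset_begin(g);
    g->resets++;                   /* placeholder for the device reset */
    control_reset_end(g);
}
```

With this shape the datapath pays one atomic operation per burst, which is the kind of cost the thread is weighing against exposing a separate reset API.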
OK. This description looks different from the commit log and the API doxygen
comment. Please fix it.
How about a different name for this API? Device reset is too generic.
Any suggestion? I use this name because I believe what this API does is reset the device.
Yes, it would exist only for VFs; for PF it could be left unimplemented.
Though it definitely seems more convenient from the user's point of
view: they would know that to handle a VF reset event, they just
need to call that particular function, not re-implement their own.
What if the driver returns "not implemented"? Then the application will
have to do the generic rte_eth_dev_stop/rte_eth_dev_start.
That way, from the application's perspective, we are NOT solving any problem.
True, but as I said, for PF the application would just never receive such an event.
What is this event? Is it the VF link up/down event?
No, I was referring to the VF itself: other VF PMD drivers in drivers/net
where this callback is not implemented.
Hmm, the only suggestion I have here: maintainers/developers of
non-Intel PMDs will implement it for their VFs?
That's fine. But we have to know what to implement here from the PMD perspective.
That's the reason for asking about the API expectation and application usage :-)
In case, of course, they do need to handle a similar event.
Which event is this, and how does the application get notified of it?
When the PF link goes down/up, the PF will use the mailbox to send a message to the VF. The event here means the VF receives that message from the PF, so the VF knows the physical link state changed. You see, it's only for VF. The PF will not receive such a message.
And we use the callback mechanism to notify the APP. The APP should register a callback function. When the VF driver receives the message, it will call the callback function, so the APP knows about it.
How about standardizing a name for that event, like
RTE_ETH_EVENT_INTR_DOWNSTREAM_LSC or
RTE_ETH_EVENT_INTR_PF_LSC or similar (like RTE_ETH_EVENT_INTR_RESET),
and a counterpart API in the VF to handle the specific event, with an API name
similar to the selected event name rather than eth_dev_reset (reset sounds more
like a HW reset; from the PCIe device perspective, FLR etc.)?

OR

How about handling it in a more generic way, where a generic alert message is
sent by the PF to the VF, like RTE_ETH_EVENT_INTR_PF_ALERT or similar,
with only one handler function on the VF side, so that in future
we can keep adding new functionality without introducing a new counterpart API in the VF?

Jerin
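To make the second option concrete, here is a minimal stand-alone sketch of "one generic alert, one handler". Everything in it is hypothetical: the `pf_alert` codes, the `vf_action` values and `vf_handle_pf_alert` are illustration names only and are not part of DPDK or of this patch set. New alert kinds would be added as new `case` labels rather than as new VF-side APIs.

```c
/* Hypothetical alert codes carried by a single generic PF-to-VF event. */
enum pf_alert {
    PF_ALERT_LINK_CHANGE,    /* PF physical link went down/up */
    PF_ALERT_RESET_REQUEST,  /* PF asks the VF to reset itself */
    PF_ALERT_MAX
};

/* Hypothetical actions the VF side may decide on. */
enum vf_action {
    VF_ACTION_NONE,
    VF_ACTION_RESET_PORT
};

/* Single VF-side handler: maps every alert kind to an action. */
static enum vf_action vf_handle_pf_alert(enum pf_alert alert)
{
    switch (alert) {
    case PF_ALERT_LINK_CHANGE:
    case PF_ALERT_RESET_REQUEST:
        return VF_ACTION_RESET_PORT;  /* both recover via a port reset */
    default:
        return VF_ACTION_NONE;        /* unknown alerts are ignored */
    }
}
```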
If not, I suppose there is no harm in leaving it unimplemented.
OK. If it is for the VF/PF link down-up event, then I will make it a 'nop'.
As explained above, the event is not VF/PF link down-up. Actually, it is that the VF is notified that the PF link went down/up.
And in my opinion, although we now only implement the reset API for VF, I believe nothing prevents us from implementing this API for PF if we find a scenario where we need to reset the PF link. The reset API is a reset API; it can be used for the event described above, but it's not bound to this event.
Jerin
Konstantin
Lu, Wenzhuo
2016-06-22 03:32:16 UTC
Hi Jerin,
Lost here. I think these RTE_ETH_EVENTs are used to connect the APP callback functions with the events.
Actually I want the APP to register a callback function, reset_event_callback, for the reset event. Like this,
/* register reset interrupt callback */
rte_eth_dev_callback_register(portid,
        RTE_ETH_EVENT_INTR_RESET, reset_event_callback, NULL);
And when the VF driver finds PF link down/up, it should use _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run the callback provided by the APP, meaning reset_event_callback here.
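The register-then-dispatch flow described above can be mimicked without DPDK in a few lines. The sketch below is a stand-in, not the ethdev implementation: `demo_callback_register`, `demo_callback_process` and the `demo_event` values are hypothetical, and only `reset_event_callback` borrows its name from the message. The callback just records that a port needs a reset, on the assumption that the actual reset is done later from the management thread.

```c
#include <stddef.h>

/* Hypothetical event ids standing in for the RTE_ETH_EVENT_* values. */
enum demo_event { DEMO_EVENT_INTR_LSC, DEMO_EVENT_INTR_RESET, DEMO_EVENT_MAX };

typedef void (*demo_cb_fn)(unsigned port_id, enum demo_event ev, void *arg);

struct demo_cb {
    demo_cb_fn fn;
    void *arg;
};

/* One callback slot per event type (single port table for brevity). */
static struct demo_cb demo_cbs[DEMO_EVENT_MAX];

/* Application side: attach a callback to an event. Returns 0 on success. */
static int demo_callback_register(enum demo_event ev, demo_cb_fn fn, void *arg)
{
    if (ev >= DEMO_EVENT_MAX || fn == NULL)
        return -1;
    demo_cbs[ev].fn = fn;
    demo_cbs[ev].arg = arg;
    return 0;
}

/* Driver side: called when the VF sees the PF mailbox message.
 * Returns 1 if a callback ran, 0 if none was registered. */
static int demo_callback_process(unsigned port_id, enum demo_event ev)
{
    if (ev >= DEMO_EVENT_MAX || demo_cbs[ev].fn == NULL)
        return 0;
    demo_cbs[ev].fn(port_id, ev, demo_cbs[ev].arg);
    return 1;
}

/* Application callback: only mark the port; the reset happens elsewhere. */
static void reset_event_callback(unsigned port_id, enum demo_event ev, void *arg)
{
    unsigned *pending_mask = arg;

    (void)ev;
    if (port_id < 32)
        *pending_mask |= 1u << port_id;
}
```

Keeping the callback this small leaves the stop/reset/restart work in the management thread, where the application can guarantee that no rx/tx is running.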
Jerin Jacob
2016-06-22 04:14:33 UTC
Me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify the PF
reset. I guess it is not for the PF link change, or it is for a generic VF reset request
initiated by the PF for everything.

file: lib/librte_ether/rte_ethdev.h
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the application needs to call rte_ethdev_reset() on the RTE_ETH_EVENT_INTR_RESET
event, then please mention it in the commit log or API description.
Lu, Wenzhuo
2016-06-22 05:05:14 UTC
-----Original Message-----
Sent: Wednesday, June 22, 2016 12:15 PM
To: Lu, Wenzhuo
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Hi Jerin,
-----Original Message-----
Sent: Wednesday, June 22, 2016 10:38 AM
To: Lu, Wenzhuo
Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Hi Jerin,
-----Original Message-----
Sent: Tuesday, June 21, 2016 10:29 PM
To: Ananyev, Konstantin
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
device reset
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Hi Wenzhuo,
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
On Mon, Jun 20, 2016 at 02:24:27PM
+0800, Wenzhuo Lu
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK
VF.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Post by Wenzhuo Lu
When the PF port down->up, APP should
call this API to reset VF port. Most
likely, APP should call it in its
management thread and guarantee the
thread safe. It means APP should stop
the rx/tx and the device, then reset
the device, then
recover the device and rx/tx.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Following is _a_ use-case for Device reset.
But may be not be _the_ use case. IMO,
We need to first say expected behavior
of this API and add a use-case
later.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Other use-case would be, PCIe VF with
functional level reset for SRIOV migration.
Are we on same page?
In my experience with Linux devices, this
is normally handled by the device driver
in the start routine. Since any use case
which needs this is going to do a
stop/reset/start sequence, why not just
have
the VF device driver do this in the start routine?.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
Post by Stephen Hemminger
Adding yet another API and state
transistion if not necessary increases the
complexity and required test
cases for all devices.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Jerin Jacob
I agree with Stephen here.I think if
application needs to call start after the
device reset then we could add this logic in
start itself rather exposing a yet another
API
Do you mean changing the device_start to
include all these actions, stop
device -> stop queue -> re-setup queue -> start
queue -> start
device ?
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
What was the expected API call sequence when you
were
introduced this API?
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Point was to have implicit device reset in the
API call sequence(Wherever make sense for
specific PMD)
I think the API call sequence depends on the
implementation of the APP. Let's say if there's
not this reset API, APP can use
this
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
API
Post by Ananyev, Konstantin
Post by Jerin Jacob
call sequence to handle the PF link down/up event,
rte_eth_dev_close -> rte_eth_rx_queue_setup ->
rte_eth_tx_queue_setup -
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
rte_eth_dev_start.
Post by Lu, Wenzhuo
Actually our purpose is to use this reset API
instead of the API call sequence. You can see the
reset API is not necessary. The
benefit
Post by Ananyev, Konstantin
Post by Jerin Jacob
is to save the code for APP.
Then I am bit confused with original commit log description.
|
|It means APP should stop the rx/tx and the device,
|then reset the device, then recover the device and rx/tx.
|
I was under impression that it a low level reset API
for this device? Is n't it?
The other issue is generalized outlook of the API,
Certain PMD will not have PF link down/up event?
Link down/up and only connected to VF and PF only for
configuration.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Jerin Jacob
How about fixing it more transparently in PMD driver
itself as PMD driver knows the PF link up/down
event, Is it possible to recover the VF on that
event if its only matter of resetting
it?
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
I think we already went through that discussion on the list.
Unfortunately with current dpdk design it is hardly possible.
To achieve that we need to introduce some sort of
synchronisation between IO and control APIs (locking or so).
Actually I am not sure why having a special reset
function will be a
problem.
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
|
|It means APP should stop the rx/tx and the device, then
|reset the device, then recover the device and rx/tx.
|
Just to understand, If application still need to do the
stop then what value addtion reset API brings on the table?
If application calls dev_reset() it doesn't need to call
dev_stop() before
it.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
dev_reset() will take care of it.
But it needs to make sure that no other thread will try to
modify that device state (either dev_stop/start, or
eth_rx_busrst/eth_tx_burst)
while the reset op is in place.
Post by Lu, Wenzhuo
OK. This description looks different than commit log and API
doxygen
comment. Please fix it.
Post by Lu, Wenzhuo
How about a different name for this API. Device reset is too generic?
Any suggestion? I use this name because I believe what this API do is to reset
the device.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Yes, it would exist only for VFs, for PF it could be
left
unimplemented.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Though it definitely seems more convenient from user
point of view, they would know: to handle VF reset
event, they just need to call that particular
function, not to re-implement their
own.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
What if driver returns "not implemented" then
application will have do generic
rte_eth_dev_stop/rte_eth_dev_start.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
Post by Lu, Wenzhuo
That way in application perspective we are NOT solving any
problem.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Ananyev, Konstantin
True, but as I said for PF application would just never receive such
event.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
What is this event? Is it the VF link up/down event?
No, I was referring to the VF itself: other VF PMD drivers in
drivers/net where this callback is not implemented.
Hmm, the only suggestion I have here: maintainers/developers
of non-Intel PMDs will implement it for their VFs?
That's fine. But we have to know what to implement here from the PMD
perspective.
Post by Lu, Wenzhuo
That's the reason I am asking about the API expectation and
application usage :-)
In case, of course, they do need to handle a similar event.
Which event is this, and how does the application get notified of it?
When the PF link goes down/up, the PF uses the mailbox to send a message to
the VF. The event here means the VF has received that message from the PF, so the VF
knows the physical link state has changed. You see, it's only for the VF.
The PF will not receive this kind of message.
Post by Lu, Wenzhuo
And we use the callback mechanism to notify the APP. The APP should register a
callback function. When the VF driver receives the message, it calls
the callback function, and that is how the APP learns of it.
How about standardizing a name for that event, like
RTE_ETH_EVENT_INTR_DOWNSTREAM_LSC or
RTE_ETH_EVENT_INTR_PF_LSC or
similar (like RTE_ETH_EVENT_INTR_RESET), and a counterpart API in the VF to
handle the specific event, with an API name similar to the selected event
name rather than eth_dev_reset (reset sounds more like a HW reset; from a PCIe
device perspective, FLR etc.)
OR
How about handling it in a more generic way, where a generic alert message is
sent by the PF to the VF, like RTE_ETH_EVENT_INTR_PF_ALERT or similar,
with only one handler function on the VF side, so that in future we
can keep adding new functionality without introducing a new counterpart
API in the VF?
Jerin
Lost here. I think these RTE_ETH_EVENTs are used to connect the APP call
back functions with the events.
Post by Lu, Wenzhuo
Actually I want the APP to register a callback function reset_event_callback for
the reset event. Like this,
Post by Lu, Wenzhuo
/* register reset interrupt callback */
rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
NULL);
And when the VF driver finds PF link down/up, it should use
_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run into
the callback which is provided by the APP, meaning reset_event_callback here.
Me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify the PF
reset. I guess it is not for the PF link change, or it is for a generic VF reset request
initiated by the PF for everything.
I think this event is for device reset, not only for the PF but also for the VF. I think we can use this event whenever the driver wants the APP to reset the device. A VF reset caused by PF link down/up is one of those cases.
file: lib/librte_ether/rte_ethdev.h
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the application needs to call rte_eth_dev_reset() on the RTE_ETH_EVENT_INTR_RESET
event, then please mention it in the commit log or API description.
Good suggestion. I'll try to find a good place to add more explanation.
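The flow discussed in this exchange (the VF driver runs an application callback for the reset event, and the application later resets the port from its management thread) can be sketched as follows. This is a stand-alone illustration, not DPDK code: `demo_callback_register`, `demo_callback_process` and `demo_dev_reset` are hypothetical stand-ins for rte_eth_dev_callback_register(), _rte_eth_dev_callback_process() and the proposed rte_eth_dev_reset(), and their simplified signatures are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4

/* Hypothetical event identifiers, loosely mirroring RTE_ETH_EVENT_*. */
enum demo_event { DEMO_EVENT_INTR_LSC, DEMO_EVENT_INTR_RESET, DEMO_EVENT_MAX };

typedef void (*demo_cb_fn)(uint8_t port_id, enum demo_event ev, void *arg);

struct demo_cb {
	demo_cb_fn fn;
	void *arg;
};

static struct demo_cb demo_cbs[DEMO_MAX_PORTS][DEMO_EVENT_MAX];
static int demo_reset_count[DEMO_MAX_PORTS];
static atomic_bool demo_reset_pending[DEMO_MAX_PORTS];

/* Stand-in for the callback registration call made by the application. */
static int
demo_callback_register(uint8_t port_id, enum demo_event ev,
		       demo_cb_fn fn, void *arg)
{
	if (port_id >= DEMO_MAX_PORTS || ev >= DEMO_EVENT_MAX || fn == NULL)
		return -1;
	demo_cbs[port_id][ev].fn = fn;
	demo_cbs[port_id][ev].arg = arg;
	return 0;
}

/* Stand-in for what the VF driver does when the PF mailbox message arrives. */
static void
demo_callback_process(uint8_t port_id, enum demo_event ev)
{
	if (port_id < DEMO_MAX_PORTS && ev < DEMO_EVENT_MAX &&
	    demo_cbs[port_id][ev].fn != NULL)
		demo_cbs[port_id][ev].fn(port_id, ev, demo_cbs[port_id][ev].arg);
}

/* Stand-in for the reset API; here it only counts invocations. */
static int
demo_dev_reset(uint8_t port_id)
{
	if (port_id >= DEMO_MAX_PORTS)
		return -1;
	demo_reset_count[port_id]++;
	return 0;
}

/* Application callback: only records that a reset is wanted. */
static void
reset_event_callback(uint8_t port_id, enum demo_event ev, void *arg)
{
	(void)arg;
	if (ev == DEMO_EVENT_INTR_RESET && port_id < DEMO_MAX_PORTS)
		atomic_store(&demo_reset_pending[port_id], true);
}

/* One pass of the management thread; returns how many ports were reset. */
static int
demo_mgmt_poll(void)
{
	int n = 0;

	for (uint8_t p = 0; p < DEMO_MAX_PORTS; p++) {
		if (atomic_exchange(&demo_reset_pending[p], false) &&
		    demo_dev_reset(p) == 0)
			n++;
	}
	return n;
}
```

Deferring the reset to the management thread keeps the callback short and leaves the application in charge of when the port is actually reset.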
Jerin Jacob
2016-06-22 06:10:02 UTC
Post by Lu, Wenzhuo
-----Original Message-----
Sent: Wednesday, June 22, 2016 12:15 PM
To: Lu, Wenzhuo
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Lost here. I think these RTE_ETH_EVENTs are used to connect the APP call
back functions with the events.
Post by Lu, Wenzhuo
Actually I want the APP to register a callback function reset_event_callback for
the reset event. Like this,
Post by Lu, Wenzhuo
/* register reset interrupt callback */
rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
NULL);
And when the VF driver finds PF link down/up, it should use
_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run into
the callback which is provided by the APP, meaning reset_event_callback here.
Me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify the PF
reset. I guess it is not for the PF link change, or it is for a generic VF reset request
initiated by the PF for everything.
I think this event is for device reset, not only for the PF but also for the VF. I think we can use this event whenever the driver wants the APP to reset the device. A VF reset caused by PF link down/up is one of those cases.
Then please correct the description of RTE_ETH_EVENT_INTR_RESET
in lib/librte_ether/rte_ethdev.h:
"/**< reset interrupt event, sent to VF on PF reset */"
Post by Lu, Wenzhuo
file: lib/librte_ether/rte_ethdev.h
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the application needs to call rte_eth_dev_reset() on the RTE_ETH_EVENT_INTR_RESET
event, then please mention it in the commit log or API description.
Good suggestion. I'll try to find a good place to add more explanation.
I guess the reset API can then be renamed to rte_ethdev_process_reset_intr() or
similar to reflect the use case (API called by the application on a reset event from the PF).

For PMDs where the PF does not generate RTE_ETH_EVENT_INTR_RESET to the VF,
the VF's reset PMD callback shall be a 'nop'.

Jerin
Lu, Wenzhuo
2016-06-22 06:42:43 UTC
Hi Jerin,
-----Original Message-----
Sent: Wednesday, June 22, 2016 2:10 PM
To: Lu, Wenzhuo
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
-----Original Message-----
Sent: Wednesday, June 22, 2016 12:15 PM
To: Lu, Wenzhuo
Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Lost here. I think these RTE_ETH_EVENTs are used to connect the APP call
back functions with the events.
Post by Lu, Wenzhuo
Actually I want the APP to register a callback function
reset_event_callback for
the reset event. Like this,
Post by Lu, Wenzhuo
/* register reset interrupt callback */
rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
NULL);
And when the VF driver finds PF link down/up, it should use
_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run
into the callback which is provided by the APP, meaning reset_event_callback here.
Me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify
the PF reset. I guess it is not for the PF link change, or it is for a
generic VF reset request initiated by the PF for everything.
I think this event is for device reset, not only for the PF but also for the VF. I think
we can use this event whenever the driver wants the APP to reset the device. A VF reset
caused by PF link down/up is one of those cases.
Then please correct the description of RTE_ETH_EVENT_INTR_RESET in
lib/librte_ether/rte_ethdev.h:
"/**< reset interrupt event, sent to VF on PF reset */"
Post by Lu, Wenzhuo
file: lib/librte_ether/rte_ethdev.h
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the application needs to call rte_eth_dev_reset() on the
RTE_ETH_EVENT_INTR_RESET event, then please mention it in the commit log or
API description.
Post by Lu, Wenzhuo
Good suggestion. I'll try to find a good place to add more
explanation.
I guess the reset API can then be renamed to rte_ethdev_process_reset_intr() or
similar to reflect the use case (API called by the application on a reset event from the PF).
For PMDs where the PF does not generate RTE_ETH_EVENT_INTR_RESET to the VF,
the VF's reset PMD callback shall be a 'nop'.
Jerin
But I don't think it's appropriate to bind the RTE_ETH_EVENTs to the APIs. This patch set provides a reset API to reset the device. That doesn't mean the reset API can only be used when the APP hits the RTE_ETH_EVENT_INTR_RESET event. I can add some comments suggesting that the user call the reset API at that time. But I think the APP can call the reset API any time it thinks it's necessary. So I don't like the name *process_reset_intr*; it hints that this API is only for the INTR_RESET event.
Jerin Jacob
2016-06-22 07:59:32 UTC
Post by Lu, Wenzhuo
Hi Jerin,
-----Original Message-----
Sent: Wednesday, June 22, 2016 2:10 PM
To: Lu, Wenzhuo
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
-----Original Message-----
Sent: Wednesday, June 22, 2016 12:15 PM
To: Lu, Wenzhuo
Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Lost here. I think these RTE_ETH_EVENTs are used to connect the APP call
back functions with the events.
Post by Lu, Wenzhuo
Actually I want the APP to register a callback function
reset_event_callback for
the reset event. Like this,
Post by Lu, Wenzhuo
/* register reset interrupt callback */
rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
NULL);
And when the VF driver finds PF link down/up, it should use
_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run
into the callback which is provided by the APP, meaning reset_event_callback here.
Me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify
the PF reset. I guess it is not for the PF link change, or it is for a
generic VF reset request initiated by the PF for everything.
I think this event is for device reset, not only for the PF but also for the VF. I think
we can use this event whenever the driver wants the APP to reset the device. A VF reset
caused by PF link down/up is one of those cases.
Then please correct the description of RTE_ETH_EVENT_INTR_RESET in
lib/librte_ether/rte_ethdev.h:
"/**< reset interrupt event, sent to VF on PF reset */"
Post by Lu, Wenzhuo
file: lib/librte_ether/rte_ethdev.h
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the application needs to call rte_eth_dev_reset() on the
RTE_ETH_EVENT_INTR_RESET event, then please mention it in the commit log or
API description.
Post by Lu, Wenzhuo
Good suggestion. I'll try to find a good place to add more
explanation.
I guess the reset API can then be renamed to rte_ethdev_process_reset_intr() or
similar to reflect the use case (API called by the application on a reset event from the PF).
For PMDs where the PF does not generate RTE_ETH_EVENT_INTR_RESET to the VF,
the VF's reset PMD callback shall be a 'nop'.
Jerin
But I don't think it's appropriate to bind the RTE_ETH_EVENTs to the APIs. This patch set provides a reset API to reset the device. That doesn't mean the reset API can only be used when the APP hits the RTE_ETH_EVENT_INTR_RESET event. I can add some comments suggesting that the user call the reset API at that time. But I think the APP can call the reset API any time it thinks it's necessary. So I don't like the name *process_reset_intr*; it hints that this API is only for the INTR_RESET event.
That's where the scope of the API and the PMD implementation is not getting clear.
Can you tell me any other use case where we need to call this API from the application?
The name rte_eth_dev_reset() is too generic. If you are going with that
generic name, then you may need to add a lot of detail to the API description.

Thomas,
As a librte_ether maintainer any comments on this?
Thomas Monjalon
2016-06-22 08:17:14 UTC
Post by Jerin Jacob
Thomas,
As a librte_ether maintainer any comments on this?
+1 for adding details and make sure naming is good.
I don't really need to comment here because I have already done this
comment earlier:
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.
Lu, Wenzhuo
2016-06-22 08:25:45 UTC
Hi Thomas,
-----Original Message-----
Sent: Wednesday, June 22, 2016 4:17 PM
To: Jerin Jacob
Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Jerin Jacob
Thomas,
As a librte_ether maintainer any comments on this?
+1 for adding details and make sure naming is good.
I don't really need to comment here because I have already done this comment
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.
I've added some details in this patch set. If it's not enough, please let me know.
And I think this discussion is about what the API name should be. Actually, I think all the existing names describe what the API does, not when and where it should be used, e.g. dev_start/stop.
But anyway I'm open to changing the name. Is process_reset_intr the name you prefer? Thanks.
Thomas Monjalon
2016-06-22 09:18:21 UTC
Post by Lu, Wenzhuo
Post by Thomas Monjalon
Post by Jerin Jacob
Thomas,
As a librte_ether maintainer any comments on this?
+1 for adding details and make sure naming is good.
I don't really need to comment here because I have already done this comment
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.
I've added some details in this patch set. If it's not enough, please let me know.
And I think this discussion is about what the API name should be. Actually, I think all the existing names describe what the API does, not when and where it should be used, e.g. dev_start/stop.
You're right, I overlooked it:

+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.

Jerin, which detail do you think is needed?

Wenzhuo, why is this function needed?
All these actions are already possible independently.
When looking at the ixgbe implementation, I see:
ixgbevf_dev_stats_reset() which is not documented in the API
rte_delay_ms(1000);
do {} while
These look like hacks.
If you really need some workarounds to handle some tricky situations,
maybe the API is not detailed enough.
Post by Lu, Wenzhuo
But anyway I'm open to changing the name. Is process_reset_intr the name you prefer? Thanks.
Not sure.
If you really intend to add a generic reset, maybe rte_eth_dev_reset()
is a good name. We just need more justification.
After reading the doc, the user can understand it is just a wrapper of
existing functions. But it appears in the code that it does more and can
help in some situations.
Jerin Jacob
2016-06-22 11:06:43 UTC
Post by Wenzhuo Lu
Post by Lu, Wenzhuo
Post by Thomas Monjalon
Post by Jerin Jacob
Thomas,
As a librte_ether maintainer any comments on this?
+1 for adding details and make sure naming is good.
I don't really need to comment here because I have already done this comment
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.
I've added some details in this patch set. If it's not enough, please let me know.
And I think this discussion is about what the API name should be. Actually, I think all the existing names describe what the API does, not when and where it should be used, e.g. dev_start/stop.
+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.
Jerin, which detail do you think is needed?
When to use what? In what scenarios does the application need to use
generic stop/start vs this new API?

How about calling it rte_eth_dev_restart()?

If the existing stop and then start is the same as the new API from a functional perspective,
how about having a generic implementation of rte_eth_dev_restart() for when PMD
specific restart handlers are NOT found?

That way the application needs to call only rte_eth_dev_restart() for port
restart. It can internally decide between an optimized stop/start and a generic
restart.

Jerin
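The fallback suggested here (use a PMD specific restart handler when one exists, otherwise fall back to a generic stop followed by start) could look roughly like the sketch below. It is self-contained and does not use DPDK types: `struct demo_dev`, `struct demo_dev_ops` and `demo_dev_restart` are hypothetical names invented for illustration, and the real eth_dev_ops layout is not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_dev;

/* Hypothetical, reduced ops table; dev_restart is optional. */
struct demo_dev_ops {
	int  (*dev_start)(struct demo_dev *dev);
	void (*dev_stop)(struct demo_dev *dev);
	int  (*dev_restart)(struct demo_dev *dev);
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	int started;
	int stop_calls;
	int start_calls;
	int fast_restarts;
};

/*
 * Restart a port: prefer the driver's own handler, otherwise
 * fall back to the generic stop + start sequence.
 */
static int
demo_dev_restart(struct demo_dev *dev)
{
	if (dev == NULL || dev->ops == NULL)
		return -ENODEV;
	if (dev->ops->dev_restart != NULL)
		return dev->ops->dev_restart(dev);
	if (dev->ops->dev_stop == NULL || dev->ops->dev_start == NULL)
		return -ENOTSUP;
	dev->ops->dev_stop(dev);
	return dev->ops->dev_start(dev);
}

/* Toy driver callbacks that only record what was called. */
static int
demo_start(struct demo_dev *dev)
{
	dev->started = 1;
	dev->start_calls++;
	return 0;
}

static void
demo_stop(struct demo_dev *dev)
{
	dev->started = 0;
	dev->stop_calls++;
}

static int
demo_fast_restart(struct demo_dev *dev)
{
	dev->started = 1;
	dev->fast_restarts++;
	return 0;
}

/* A driver without a restart handler, and one with it. */
static const struct demo_dev_ops demo_generic_ops = {
	.dev_start = demo_start,
	.dev_stop = demo_stop,
};

static const struct demo_dev_ops demo_fast_ops = {
	.dev_start = demo_start,
	.dev_stop = demo_stop,
	.dev_restart = demo_fast_restart,
};
```

With this shape the application calls one function for every port, and drivers that have nothing special to do need not implement anything.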
Post by Wenzhuo Lu
Wenzhuo, why is this function needed?
All these actions are already possible independently.
ixgbevf_dev_stats_reset() which is not documented in the API
rte_delay_ms(1000);
do {} while
These look like hacks.
If you really need some workarounds to handle some tricky situations,
maybe the API is not detailed enough.
Post by Lu, Wenzhuo
But anyway I'm open to changing the name. Is process_reset_intr the name you prefer? Thanks.
Not sure.
If you really intend to add a generic reset, maybe rte_eth_dev_reset()
is a good name. We just need more justification.
After reading the doc, the user can understand it is just a wrapper of
existing functions. But it appears in the code that it does more and can
help in some situations.
Lu, Wenzhuo
2016-06-23 00:45:56 UTC
Hi Jerin,
-----Original Message-----
Sent: Wednesday, June 22, 2016 7:07 PM
To: Thomas Monjalon
Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Wenzhuo Lu
Post by Lu, Wenzhuo
Post by Thomas Monjalon
Post by Jerin Jacob
Thomas,
As a librte_ether maintainer any comments on this?
+1 for adding details and make sure naming is good.
I don't really need to comment here because I have already done this comment
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.
I've added some details in this patch set. If it's not enough, please let me know.
And I think this discussion is about what the API name should be. Actually,
I think all the existing names describe what the API does, not when and
where it should be used, e.g. dev_start/stop.
Post by Wenzhuo Lu
+ * The API will stop the port, clear the rx/tx queues, re-setup the
+ rx/tx
+ * queues, restart the port.
Jerin, which detail do you think is needed?
When to use what? In what scenarios does the application need to use generic stop/start
vs this new API?
I'll add more explanation. Actually I've written an example. But after discussion we agreed it's not a good idea to add a totally new example just for one function. I'm now thinking about how to fold this example into testpmd.
How about calling it rte_eth_dev_restart()?
Sounds good :)
If the existing stop and then start is the same as the new API from a functional perspective, how
about having a generic implementation of rte_eth_dev_restart() for when PMD specific
restart handlers are NOT found?
Good suggestion, thanks.
That way the application needs to call only rte_eth_dev_restart() for port restart. It
can internally decide between an optimized stop/start and a generic restart.
Jerin
Post by Wenzhuo Lu
Wenzhuo, why is this function needed?
All these actions are already possible independently.
ixgbevf_dev_stats_reset() which is not documented in the API
rte_delay_ms(1000);
do {} while
These look like hacks.
If you really need some workarounds to handle some tricky situations,
maybe the API is not detailed enough.
Post by Lu, Wenzhuo
But anyway I'm open to changing the name. Is process_reset_intr the name
you prefer? Thanks.
Post by Wenzhuo Lu
Not sure.
If you really intend to add a generic reset, maybe rte_eth_dev_reset()
is a good name. We just need more justification.
After reading the doc, the user can understand it is just a wrapper of
existing functions. But it appears in the code that it does more and
can help in some situations.
Lu, Wenzhuo
2016-06-23 00:39:30 UTC
Hi Thomas,
-----Original Message-----
Sent: Wednesday, June 22, 2016 5:18 PM
To: Lu, Wenzhuo; Jerin Jacob
Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Lu, Wenzhuo
Post by Thomas Monjalon
Post by Jerin Jacob
Thomas,
As a librte_ether maintainer any comments on this?
+1 for adding details and make sure naming is good.
I don't really need to comment here because I have already done this comment
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.
I've added some details in this patch set. If it's not enough, please let me know.
And I think this discussion is about what the API name should be. Actually, I
think all the existing names describe what the API does, not when and
where it should be used, e.g. dev_start/stop.
+ * The API will stop the port, clear the rx/tx queues, re-setup the
+ rx/tx
+ * queues, restart the port.
Jerin, which detail do you think is needed?
Wenzhuo, why is this function needed?
As you say below, and as discussed before, it's a wrapper of the existing functions. The benefit is that it spares users the complex implementation when they want to stop and restart the device.
All these actions are already possible independently.
ixgbevf_dev_stats_reset() which is not documented in the API
rte_delay_ms(1000);
do {} while
These look like hacks.
If you really need some workarounds to handle some tricky situations, maybe
the API is not detailed enough.
Yes, you're right. There is still something left out. I'll add more detail.
Post by Lu, Wenzhuo
But anyway I'm open to changing the name. Is process_reset_intr the name
you prefer? Thanks.
Not sure.
If you really intend to add a generic reset, maybe rte_eth_dev_reset() is a good
name. We just need more justification.
After reading the doc, the user can understand it is just a wrapper of existing
functions. But it appears in the code that it does more and can help in some
situations.
I'll add more info. Thanks.
Lu, Wenzhuo
2016-06-21 00:51:30 UTC
Hi Jerin,
-----Original Message-----
Sent: Monday, June 20, 2016 5:14 PM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
Post by Wenzhuo Lu
Add an API to reset the device.
It's for VF device in this scenario, kernel PF + DPDK VF.
When the PF port down->up, APP should call this API to reset VF port.
Most likely, APP should call it in its management thread and guarantee
the thread safe. It means APP should stop the rx/tx and the device,
then reset the device, then recover the device and rx/tx.
What follows is _a_ use case for device reset, but may not be _the_ use case.
IMO, we need to first state the expected behavior of this API and add a use case later.
Thanks for the suggestion, I'll reword it.
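The ordering in the commit message quoted above (stop rx/tx, reset the device, then resume rx/tx, all driven from a management thread) can be sketched with a pause flag and an in-flight counter. This is only one possible way to meet the "no other thread touches the device during reset" requirement, not something the patch prescribes; `struct demo_port`, `demo_worker_try_burst` and `demo_mgmt_reset` are hypothetical names, and the actual rx/tx burst and reset calls are replaced by comments and a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_port {
	atomic_bool paused;   /* set by the management thread around a reset */
	atomic_int  in_burst; /* number of workers currently inside rx/tx */
	int resets;           /* counts simulated resets */
};

/*
 * Data-path worker: announce entry first, then check the pause flag.
 * Returns true if the burst ran, false if it was skipped.
 */
static bool
demo_worker_try_burst(struct demo_port *p)
{
	atomic_fetch_add(&p->in_burst, 1);
	if (atomic_load(&p->paused)) {
		atomic_fetch_sub(&p->in_burst, 1);
		return false; /* skip; the APP may drop and free packets here */
	}
	/* ... the rx/tx burst calls would go here ... */
	atomic_fetch_sub(&p->in_burst, 1);
	return true;
}

/* Management thread: pause workers, wait for them to drain, then reset. */
static int
demo_mgmt_reset(struct demo_port *p)
{
	atomic_store(&p->paused, true);
	while (atomic_load(&p->in_burst) != 0)
		; /* wait for in-flight bursts to finish */
	p->resets++; /* stand-in for the device reset call */
	atomic_store(&p->paused, false);
	return 0;
}
```

Because the worker increments the counter before reading the flag, the management thread cannot observe a zero counter while a burst that saw "not paused" is still running.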
Another use case would be a PCIe VF with function level reset for SR-IOV migration.
Are we on the same page?
I'm not sure :) Does this SR-IOV migration mean the migration of a logical domain that has a VF assigned to it?
Post by Wenzhuo Lu
---
doc/guides/nics/overview.rst | 1 +
lib/librte_ether/rte_ethdev.c | 17 +++++++++++++++++
lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++++++
4 files changed, 49 insertions(+)
diff --git a/doc/guides/nics/overview.rst
b/doc/guides/nics/overview.rst index 0bd8fae..c8a4985 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -89,6 +89,7 @@ Most of these differences are summarized below.
Speed capabilities
Link status Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Link status event Y Y Y Y Y Y Y Y Y Y Y Y Y
+ Link reset Y Y Y Y Y
More appropriate would be "Device reset" ? Right?
Yes, sounds better :)
Post by Wenzhuo Lu
Queue status event Y
Rx interrupt Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
Queue start/stop Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
diff --git a/lib/librte_ether/rte_ethdev.c
b/lib/librte_ether/rte_ethdev.c index e148028..6c0449b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t
port_id,
Post by Wenzhuo Lu
-ENOTSUP);
return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask,
en); }
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev;
+ int diag;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+
+ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+ diag = (*dev->dev_ops->dev_reset)(dev);
+
+ return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h
b/lib/librte_ether/rte_ethdev.h index 2757510..5b3ba12 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
uint8_t en);
+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev); /**<
+
#ifdef RTE_NIC_BYPASS
enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
/** Enable/disable l2 tunnel offload functions */
eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+ /** Reset device. */
+ eth_dev_reset_t dev_reset;
};
/**
@@ -4253,6 +4258,25 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t
port_id,
Post by Wenzhuo Lu
uint32_t mask,
uint8_t en);
+/**
+ * Reset an ethernet device when it's not working. One scenario is,
+after PF
+ * port is down and up, the related VF port should be reset.
+ * The API will stop the port, clear the rx/tx queues, re-setup the
+rx/tx
+ * queues, restart the port.
+ * Before calling this API, APP should stop the rx/tx. When tx is
+being stopped,
+ * APP can drop the packets and release the buffer instead of sending them.
Same as first comment.
I'll reword it.
Post by Wenzhuo Lu
+ *
+ * The port identifier of the Ethernet device.
+ *
+ * - (0) if successful.
+ * - (-ENODEV) if port identifier is invalid.
+ * - (-ENOTSUP) if hardware doesn't support this function.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_ether/rte_ether_version.map
b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;
} DPDK_2.2;
+
+DPDK_16.07 {
+
+ rte_eth_dev_reset;
+
+} DPDK_16.04;
--
1.9.3
Wenzhuo Lu
2016-06-20 06:24:28 UTC
Implement the device reset function.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 9 +++++
drivers/net/ixgbe/ixgbe_ethdev.c | 64 +++++++++++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 12 +++++--
4 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a761e3c..d36c4b1 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,6 +53,15 @@ New Features
VF. To handle this link up/down event, add the mailbox interruption
support to receive the message.

+* **Added device reset support for ixgbe VF.**
+
+ Added the device reset API. APP can call this API to reset the VF port
+ when it's not working.
+ Based on the mailbox interruption support, when VF receives the control
+ message from PF, it means the PF link state changes, VF uses the reset
+ callback in the message handler to notice the APP. APP need call the device
+ reset API to reset the VF port.
+

Resolved Issues
---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 05f4f29..4e62cbb 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -381,6 +381,8 @@ static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *udp_tunnel);

+static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+
/*
* Define VF Stats MACRO for Non "cleared on read" register
*/
@@ -586,6 +588,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.reta_query = ixgbe_dev_rss_reta_query,
.rss_hash_update = ixgbe_dev_rss_hash_update,
.rss_hash_conf_get = ixgbe_dev_rss_hash_conf_get,
+ .dev_reset = ixgbevf_dev_reset,
};

/* store statistics names and its offset in stats structure */
@@ -4052,7 +4055,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
ETH_VLAN_EXTEND_MASK;
ixgbevf_vlan_offload_set(dev, mask);

- ixgbevf_dev_rxtx_start(dev);
+ err = ixgbevf_dev_rxtx_start(dev);
+ if (err)
+ return err;

/* check and configure queue intr-vector mapping */
if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7185,6 +7190,63 @@ static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
}

static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev)
+{
+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ int diag = 0;
+ uint32_t vteiam;
+
+ /* Nothing needs to be done if the device is not started. */
+ if (!dev->data->dev_started)
+ return 0;
+
+ PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+ /* Perform VF reset. */
+ do {
+ dev->data->dev_started = 0;
+ ixgbevf_dev_stop(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = ixgbe_dev_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+ "Failed to update link.");
+ }
+ rte_delay_ms(1000);
+
+ diag = ixgbevf_dev_start(dev);
+ /*If fail to start the device, need to stop/start it again. */
+ if (diag) {
+ PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+ "Failed to start device.");
+ continue;
+ }
+ dev->data->dev_started = 1;
+ ixgbevf_dev_stats_reset(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = ixgbe_dev_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+ "Failed to update link.");
+ diag = 0;
+ }
+
+ /**
+ * When the PF link is down, there has chance
+ * that VF cannot operate its registers. Will
+ * check if the registers is written
+ * successfully. If not, repeat stop/start until
+ * the PF link is up, in other words, until the
+ * registers can be written.
+ */
+ vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+ /* Reference ixgbevf_intr_enable when checking */
+ } while (diag || vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+ return 0;
+}
+
+static int
ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
{
uint32_t eicr;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..bc68b43 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -377,7 +377,7 @@ int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);

-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);

uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9c6eaf2..aa26c12 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5147,7 +5147,7 @@ ixgbevf_dev_tx_init(struct rte_eth_dev *dev)
/*
* [VF] Start Transmit and Receive Units.
*/
-void __attribute__((cold))
+int __attribute__((cold))
ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
{
struct ixgbe_hw *hw;
@@ -5183,8 +5183,10 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
rte_delay_ms(1);
txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
- if (!poll_ms)
+ if (!poll_ms) {
PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+ return -1;
+ }
}
for (i = 0; i < dev->data->nb_rx_queues; i++) {

@@ -5200,12 +5202,16 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
rte_delay_ms(1);
rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
- if (!poll_ms)
+ if (!poll_ms) {
PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+ return -1;
+ }
rte_wmb();
IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);

}
+
+ return 0;
}

/* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
--
1.9.3
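The patch above polls the queue enable bits with a countdown in ixgbevf_dev_rxtx_start(), while the do/while loop in ixgbevf_dev_reset() has no upper bound. A bounded register poll that reports a timeout to the caller, which is one way the reset loop could be limited as well, might be sketched as below. The helper `demo_poll_reg` and the fake register are hypothetical and exist only for this illustration; they are not part of the ixgbe driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Fake register that starts returning its value after some reads. */
struct demo_fake_reg {
	uint32_t value;
	unsigned int ready_after;
	unsigned int reads;
};

static uint32_t
demo_fake_read(void *ctx)
{
	struct demo_fake_reg *r = ctx;

	r->reads++;
	return (r->reads > r->ready_after) ? r->value : 0;
}

/*
 * Poll until all bits in mask are set, at most max_polls times.
 * Returns 0 on success and -ETIMEDOUT if the bits never appear.
 */
static int
demo_poll_reg(uint32_t (*read_reg)(void *), void *ctx,
	      uint32_t mask, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if ((read_reg(ctx) & mask) == mask)
			return 0;
		/* a real driver would delay here, e.g. about 1 ms per poll */
	}
	return -ETIMEDOUT;
}
```

Returning an error instead of looping forever lets the caller decide whether to retry later, for example on the next link event.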
Wenzhuo Lu
2016-06-20 06:24:30 UTC
Implement the device reset function.
This reset function will detach device then
attach device, reconfigure dev, re-setup the Rx/Tx queues.

Signed-off-by: Zhe Tao <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 4 ++
drivers/net/i40e/i40e_ethdev.h | 4 ++
drivers/net/i40e/i40e_ethdev_vf.c | 83 ++++++++++++++++++++++++++++++++++
drivers/net/i40e/i40e_rxtx.c | 10 ++++
drivers/net/i40e/i40e_rxtx.h | 4 ++
5 files changed, 105 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a4c0cc3..6661b07 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -62,6 +62,10 @@ New Features
callback in the message handler to notice the APP. APP need call the device
reset API to reset the VF port.

+* **Added VF reset support for i40e VF driver.**
+
+ Added a new implementation to allow i40e VF driver to
+ reset the functionality and state of itself.

Resolved Issues
---------------
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cfd2399..4e0df3b 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -540,6 +540,8 @@ struct i40e_adapter {
struct rte_timecounter systime_tc;
struct rte_timecounter rx_tstamp_tc;
struct rte_timecounter tx_tstamp_tc;
+ /* For VF reset */
+ uint8_t reset_number;
};

int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
@@ -593,6 +595,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_txq_info *qinfo);

+void i40evf_emulate_vf_reset(uint8_t port_id);
+
/* I40E_DEV_PRIVATE_TO */
#define I40E_DEV_PRIVATE_TO_PF(adapter) \
(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 90682ac..2f65a29 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,6 +157,12 @@ i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
uint8_t *msg,
uint16_t msglen);
+static int i40evf_dev_uninit(struct rte_eth_dev *eth_dev);
+static int i40evf_dev_init(struct rte_eth_dev *eth_dev);
+static void i40evf_dev_close(struct rte_eth_dev *dev);
+static int i40evf_dev_start(struct rte_eth_dev *dev);
+static int i40evf_dev_configure(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);

/* Default hash key buffer for RSS */
static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -223,6 +229,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
.reta_query = i40evf_dev_rss_reta_query,
.rss_hash_update = i40evf_dev_rss_hash_update,
.rss_hash_conf_get = i40evf_dev_rss_hash_conf_get,
+ .dev_reset = i40evf_handle_vf_reset
};

/*
@@ -1309,6 +1316,82 @@ i40evf_uninit_vf(struct rte_eth_dev *dev)
}

static void
+i40e_vf_queue_reset(struct rte_eth_dev *dev)
+{
+ uint16_t i;
+
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ struct i40e_rx_queue *rxq = dev->data->rx_queues[i];
+
+ if (rxq->q_set) {
+ i40e_dev_rx_queue_setup(dev,
+ rxq->queue_id,
+ rxq->nb_rx_desc,
+ rxq->socket_id,
+ &rxq->rxconf,
+ rxq->mp);
+ }
+ }
+ for (i = 0; i < dev->data->nb_tx_queues; i++) {
+ struct i40e_tx_queue *txq = dev->data->tx_queues[i];
+
+ if (txq->q_set) {
+ i40e_dev_tx_queue_setup(dev,
+ txq->queue_id,
+ txq->nb_tx_desc,
+ txq->socket_id,
+ &txq->txconf);
+ }
+ }
+}
+
+static void
+i40e_vf_reset_dev(struct rte_eth_dev *dev)
+{
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+ i40evf_dev_close(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev close complete");
+ i40evf_dev_uninit(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev detached");
+ memset(dev->data->dev_private, 0,
+ (uint64_t)&adapter->reset_number - (uint64_t)adapter);
+
+ i40evf_dev_configure(dev);
+ i40evf_dev_init(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev attached");
+ i40e_vf_queue_reset(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf queue reset");
+ i40evf_dev_start(dev);
+ PMD_DRV_LOG(DEBUG, "i40evf dev restart");
+}
+
+static int
+i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+{
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+ if (!dev->data->dev_started)
+ return 0;
+
+ adapter->reset_number = 1;
+ i40e_vf_reset_dev(dev);
+ adapter->reset_number = 0;
+
+ return 0;
+}
+
+void
+i40evf_emulate_vf_reset(uint8_t port_id)
+{
+ struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+ i40evf_handle_vf_reset(dev);
+}
+
+static void
i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
uint8_t *msg,
__rte_unused uint16_t msglen)
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..8dbc64c 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2148,6 +2148,7 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
uint16_t len, i;
uint16_t base, bsf, tc_mapping;
int use_def_burst_func = 1;
+ struct rte_eth_rxconf conf = *rx_conf;

if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
struct i40e_vf *vf =
@@ -2186,6 +2187,8 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
return -ENOMEM;
}
rxq->mp = mp;
+ rxq->socket_id = socket_id;
+ rxq->rxconf = conf;
rxq->nb_rx_desc = nb_desc;
rxq->rx_free_thresh = rx_conf->rx_free_thresh;
rxq->queue_id = queue_idx;
@@ -2365,6 +2368,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
uint32_t ring_size;
uint16_t tx_rs_thresh, tx_free_thresh;
uint16_t i, base, bsf, tc_mapping;
+ struct rte_eth_txconf conf = *tx_conf;

if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
struct i40e_vf *vf =
@@ -2488,6 +2492,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
}

txq->nb_tx_desc = nb_desc;
+ txq->socket_id = socket_id;
+ txq->txconf = conf;
txq->tx_rs_thresh = tx_rs_thresh;
txq->tx_free_thresh = tx_free_thresh;
txq->pthresh = tx_conf->tx_thresh.pthresh;
@@ -2950,8 +2956,12 @@ void
i40e_dev_free_queues(struct rte_eth_dev *dev)
{
uint16_t i;
+ struct i40e_adapter *adapter =
+ I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);

PMD_INIT_FUNC_TRACE();
+ if (adapter->reset_number)
+ return;

for (i = 0; i < dev->data->nb_rx_queues; i++) {
i40e_dev_rx_queue_release(dev->data->rx_queues[i]);
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..9e1b05a 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -140,6 +140,8 @@ struct i40e_rx_queue {
bool rx_deferred_start; /**< don't start this queue in dev start */
uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
uint8_t dcb_tc; /**< Traffic class of rx queue */
+ uint8_t socket_id;
+ struct rte_eth_rxconf rxconf;
};

struct i40e_tx_entry {
@@ -181,6 +183,8 @@ struct i40e_tx_queue {
bool q_set; /**< indicate if tx queue has been configured */
bool tx_deferred_start; /**< don't start this queue in dev start */
uint8_t dcb_tc; /**< Traffic class of tx queue */
+ uint8_t socket_id;
+ struct rte_eth_txconf txconf;
};

/** Offload features */
--
1.9.3
Wenzhuo Lu
2016-06-20 06:24:29 UTC
Permalink
Implement the device reset function.

Signed-off-by: Wenzhuo Lu <***@intel.com>
---
doc/guides/rel_notes/release_16_07.rst | 2 +-
drivers/net/e1000/igb_ethdev.c | 59 ++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d36c4b1..a4c0cc3 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,7 +53,7 @@ New Features
VF. To handle this link up/down event, add the mailbox interruption
support to receive the message.

-* **Added device reset support for ixgbe VF.**
+* **Added device reset support for ixgbe/igb VF.**

Added the device reset API. APP can call this API to reset the VF port
when it's not working.
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..f1ac4b5 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -268,6 +268,7 @@ static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
void *param);
static void igbvf_mbx_process(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev);

/*
* Define VF Stats MACRO for Non "cleared on read" register
@@ -409,6 +410,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
.mac_addr_set = igbvf_default_mac_addr_set,
.get_reg_length = igbvf_get_reg_length,
.get_reg = igbvf_get_regs,
+ .dev_reset = igbvf_dev_reset,
};

/* store statistics names and its offset in stats structure */
@@ -2655,6 +2657,63 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
}

static int
+igbvf_dev_reset(struct rte_eth_dev *dev)
+{
+ struct e1000_hw *hw =
+ E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ int diag = 0;
+ uint32_t eiam;
+ /* Reference igbvf_intr_enable */
+ uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+ /* Nothing needs to be done if the device is not started. */
+ if (!dev->data->dev_started)
+ return 0;
+
+ PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+ /* Perform VF reset. */
+ do {
+ dev->data->dev_started = 0;
+ igbvf_dev_stop(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = eth_igb_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Igb VF reset: "
+ "Failed to update link.");
+ }
+ rte_delay_ms(1000);
+
+ diag = igbvf_dev_start(dev);
+ if (diag) {
+ PMD_INIT_LOG(ERR, "Igb VF reset: "
+ "Failed to start device.");
+ return diag;
+ }
+ dev->data->dev_started = 1;
+ eth_igbvf_stats_reset(dev);
+ if (dev->data->dev_conf.intr_conf.lsc == 0)
+ diag = eth_igb_link_update(dev, 0);
+ if (diag) {
+ PMD_INIT_LOG(INFO, "Igb VF reset: "
+ "Failed to update link.");
+ }
+
+ /**
+ * When the PF link is down, there is a chance
+ * that the VF cannot operate its registers.
+ * Check whether the register was written
+ * successfully. If not, repeat stop/start until
+ * the PF link is up, in other words, until the
+ * registers can be written.
+ */
+ eiam = E1000_READ_REG(hw, E1000_EIAM);
+ } while (!(eiam & eiam_mbx));
+
+ return 0;
+}
+
+static int
eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
{
struct e1000_interrupt *intr =
--
1.9.3
Luca Boccassi
2016-07-04 15:48:08 UTC
Permalink
Post by Wenzhuo Lu
If the PF link goes down and comes back up, the VF link will not recover accordingly.
This patch set adds support for VF link reset. When the VF
receives the message of a physical link down/up, the APP can reset the
VF link and let it recover.
PS: This patch set is split from a previous patch set,
*automatic link recovery on ixgbe/igb VF*, and it's based on the
patch set *support mailbox interruption on ixgbe/igb VF*.
lib/librte_ether: support device reset
ixgbe: implement device reset on VF
igb: implement device reset on VF
i40e: implement device reset on VF
- Added the implementation for the VF reset functionality.
- Changed the i40e related operations during VF reset.
- Resent the patches because of the mail sent issue.
- Removed some VF reset emulation code.
- Removed all the code related with lock.
- Updated the NIC feature overview matrix.
- Added more explanation in the doxygen comment of reset API.
doc/guides/nics/overview.rst | 1 +
doc/guides/rel_notes/release_16_07.rst | 13 ++++++
drivers/net/e1000/igb_ethdev.c | 59 ++++++++++++++++++++++++
drivers/net/i40e/i40e_ethdev.h | 4 ++
drivers/net/i40e/i40e_ethdev_vf.c | 83 ++++++++++++++++++++++++++++++++++
drivers/net/i40e/i40e_rxtx.c | 10 ++++
drivers/net/i40e/i40e_rxtx.h | 4 ++
drivers/net/ixgbe/ixgbe_ethdev.c | 64 +++++++++++++++++++++++++-
drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 12 +++--
lib/librte_ether/rte_ethdev.c | 17 +++++++
lib/librte_ether/rte_ethdev.h | 24 ++++++++++
lib/librte_ether/rte_ether_version.map | 7 +++
13 files changed, 295 insertions(+), 5 deletions(-)
Hello Wenzhuo,

I'm testing this patchset, but I am sporadically running into an issue
where the VF reset fails after the PF flaps.

I have a VM running on a KVM box with an X540-AT2, passing 2 VFs in.

I am calling rte_eth_dev_reset in response to a
RTE_ETH_EVENT_INTR_RESET callback, and the following errors appear in
the log:

PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.

Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
The application has ~500 2MB hugepages, and there's 2GB of free memory
available on top of that.

Have you seen this before? Any pointer or suggestion for debugging?
Thanks!
Lu, Wenzhuo
2016-07-05 00:52:23 UTC
Permalink
Hi Luca,
I think the problem is that the mbufs occupied by the packets are not released. This memory has to be released by the APP, so my patches don't cover it. An example is really needed to show how to use the reset API; I plan to modify testpmd.
You may have noticed that this feature is postponed to 16.11. Would you like to wait for the new version that will include an example?
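
To illustrate the point about unreleased mbufs, here is a minimal self-contained model of a fixed-size buffer pool. It is not DPDK code: `toy_pool`, `toy_alloc`, `toy_free`, `refill_rx_ring`, `simulate_reset` and the sizes are hypothetical names and values chosen for illustration. It only shows why refilling the RX ring during a reset can run out of buffers partway through when the application still holds some of them.

```c
#include <assert.h>

#define POOL_SIZE 8 /* hypothetical pool capacity */
#define RING_SIZE 6 /* hypothetical RX ring depth */

/* Toy fixed-size pool: it only tracks how many buffers are free. */
struct toy_pool {
    unsigned int free_count;
};

static void toy_pool_init(struct toy_pool *p)
{
    p->free_count = POOL_SIZE;
}

/* Take one buffer; returns 0 on success, -1 if the pool is empty. */
static int toy_alloc(struct toy_pool *p)
{
    if (p->free_count == 0)
        return -1;
    p->free_count--;
    return 0;
}

/* Give n buffers back to the pool. */
static void toy_free(struct toy_pool *p, unsigned int n)
{
    assert(p->free_count + n <= POOL_SIZE);
    p->free_count += n;
}

/* Refill the RX ring; returns how many descriptors got a buffer. */
static unsigned int refill_rx_ring(struct toy_pool *p)
{
    unsigned int i;

    for (i = 0; i < RING_SIZE; i++) {
        if (toy_alloc(p) != 0)
            break; /* pool exhausted before the ring is full */
    }
    return i;
}

/*
 * Model a reset while the application holds `held` buffers.
 * If release_first is non-zero, the application frees them
 * before the ring is refilled.
 */
static unsigned int simulate_reset(unsigned int held, int release_first)
{
    struct toy_pool pool;
    unsigned int taken = 0;

    toy_pool_init(&pool);
    while (taken < held && toy_alloc(&pool) == 0)
        taken++; /* buffers stuck in application queues */
    if (release_first)
        toy_free(&pool, taken);
    return refill_rx_ring(&pool);
}
```

With these toy sizes, `simulate_reset(5, 0)` fills only 3 of the 6 ring slots, while `simulate_reset(5, 1)` fills all 6, which is the behaviour the application-side release is meant to restore.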
Luca Boccassi
2016-07-05 09:52:31 UTC
Permalink
Hi,

Unfortunately we need the VF reset working sooner than that, so one way
or the other I'll need to sort it out. Given I've got a use case where
this is happening, if it can be helpful for you I'm more than happy to
help as a guinea pig. If you could please give some guidance/guidelines
with regards to which API to use to sort the mbuf problem, I can try it
out and give back some feedback.

Thanks!

--
Kind regards,
Luca Boccassi
Lu, Wenzhuo
2016-07-06 00:45:18 UTC
Permalink
Hi Luca,
I made a stupid mistake and deleted all my code, so I have to take some time to rewrite it :(
Attached is the example I used to test the reset API. It's modified from the l2fwd example, so you can compare it with l2fwd to see what needs to be added.
Hopefully it can help :)
Luca Boccassi
2016-07-06 16:26:37 UTC
Permalink
Thanks! That made me understand a couple of things more, and I've got
past the problem.

Unfortunately now there's a bigger issue - rte_eth_dev_reset is a
blocking call. The _RESET event callback is fired when the PF goes down,
but when I call rte_eth_dev_reset it will block until the PF goes back
up. There is no way, as far as I can see, to know if the PF is back up
before calling rte_eth_dev_reset.

This is a problem because, as far as I understand, I have to call all
the rte_eth_dev_ APIs from the same thread, in my case the master
thread, and I can't have that block potentially indefinitely.

Would it be possible to have 2 events instead of 1, one when the PF goes
down and one when it goes up? This way an application would be able to
soft-stop the port (drain queues, etc) when the PF is down, and then
call the reset API when it goes back up.

Thanks!
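
The blocking behaviour described here is consistent with the igb VF reset patch earlier in this thread, which repeats stop/start in an unbounded `do { ... } while` loop until a register write sticks. One possible way to avoid blocking indefinitely is to bound the number of attempts and let the caller retry later. The sketch below is not part of the patch set: the hardware is simulated, and `fake_hw`, `try_restart` and `reset_with_retry_limit` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated hardware: register writes only stick once the PF is back up. */
struct fake_hw {
    unsigned int attempts_until_pf_up; /* restarts needed before the PF is up */
    unsigned int attempts_made;        /* restarts performed so far */
};

/* One stop/start cycle; returns true if the check register read back set. */
static bool try_restart(struct fake_hw *hw)
{
    hw->attempts_made++;
    return hw->attempts_made >= hw->attempts_until_pf_up;
}

/* Returns 0 on success, -1 if the PF did not come back within max_attempts. */
static int reset_with_retry_limit(struct fake_hw *hw, unsigned int max_attempts)
{
    unsigned int i;

    assert(max_attempts > 0);
    for (i = 0; i < max_attempts; i++) {
        if (try_restart(hw))
            return 0;
        /* A real driver would delay here; the igb patch uses rte_delay_ms(1000). */
    }
    return -1; /* give control back so the caller can retry later */
}
```

A caller that gets -1 can go back to its main loop and try again on a later iteration, so the management thread is never stuck for longer than `max_attempts` cycles.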
Lu, Wenzhuo
2016-07-07 01:09:53 UTC
Permalink
Hi Luca,
Sorry, we cannot have 2 events now. There are 2 problems with having 2 events.
1. Normally we use the kernel driver for the PF. Currently the kernel driver has only one kind of message for both link down and link up, so we cannot tell whether the link went down or came up.
2. When the PF is down, the VF does not work unless we reset it, and it cannot receive any message from the PF. So we cannot know when the PF comes back up. This means we normally have to reset the VF twice, once when the PF goes down and once when it comes up. (Of course we could wait a while after receiving the message from the PF until the PF is up, but we cannot tell how long is appropriate. So this *wait a while* approach may work for a flash.)
Luca Boccassi
2016-07-07 10:20:51 UTC
Permalink
Thanks for the clarification, I understand.

The problem with a blocking call is that we basically need to spawn one
thread per rte_eth_dev_reset call, since there is no way of knowing if a
PF is down for good or just flapping, and we can't have a single thread
managing all the interfaces being blocked forever (e.g. PF 1 and 2 go
down, the thread blocks on the PF 1 reset call, which never returns, and
meanwhile PF 2 goes back up but its reset call is never made).

A colleague of mine, Eric Kinzie, suggested adding a blocking boolean
parameter to the rte_eth_dev_reset API. If set to false, the call will
not block; it makes a single attempt and returns an error (EAGAIN?). Would
this be an acceptable proposal?
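
To illustrate the scenario above, here is a minimal sketch of how one management thread could service several ports if the reset call made a single attempt and returned -EAGAIN. Everything in it is an assumption for illustration: try_reset_port() is a stub standing in for the proposed non-blocking reset, and pf_is_up[] only simulates PF state, which a real application cannot read.

```c
#include <errno.h>

#define NUM_PORTS 2

/* Simulated PF state, for illustration only; not visible to a real VF application. */
static int pf_is_up[NUM_PORTS];

/* Hypothetical stub modelled on the proposed non-blocking reset:
 * one attempt, -EAGAIN while the PF is still down. */
static int try_reset_port(unsigned port_id)
{
    if (port_id >= NUM_PORTS)
        return -ENODEV;
    return pf_is_up[port_id] ? 0 : -EAGAIN;
}

/* One pass of the management thread over all ports that received a reset event.
 * pending[i] != 0 means port i still needs a successful reset.
 * Returns the number of ports that are still pending after this pass. */
static int service_pending_resets(int pending[NUM_PORTS])
{
    int still_pending = 0;

    for (unsigned port = 0; port < NUM_PORTS; port++) {
        if (!pending[port])
            continue;

        int ret = try_reset_port(port);
        if (ret == 0) {
            pending[port] = 0;      /* port recovered */
        } else if (ret == -EAGAIN) {
            still_pending++;        /* PF still down, try again on the next pass */
        } else {
            pending[port] = 0;      /* hard error: stop retrying this port */
        }
    }
    return still_pending;
}
```

With this shape, a PF that stays down only costs one failed attempt per pass, so a second port whose PF has come back is still reset on the same pass.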

--
Kind regards,
Luca Bo
Lu, Wenzhuo
2016-07-07 13:12:26 UTC
Permalink
It's a good suggestion.
And I think that if the parameter is set to false and the link is not up after one attempt, it will be the APP's responsibility to set up a timer or something similar to keep trying to bring up the link.
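
As a rough sketch of the application-side timer mentioned above, the following keeps a per-port retry state and only calls the reset function once the retry interval has elapsed. All names here (struct reset_retry, reset_retry_arm, reset_retry_poll, fake_reset) are hypothetical and not part of DPDK; the reset function is passed in as a callback so that the sketch does not assume any particular reset API signature, and the caller supplies the current time in milliseconds.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-port retry state kept by the application. */
struct reset_retry {
    uint64_t next_try_ms;   /* earliest time of the next attempt */
    uint64_t interval_ms;   /* delay between attempts */
    unsigned attempts;      /* number of attempts made so far */
    int active;             /* non-zero while a reset is outstanding */
};

/* Callback that performs one reset attempt: 0 on success, -EAGAIN to retry later. */
typedef int (*reset_fn)(void *ctx);

/* Arm the retry state when the reset event arrives; the first attempt is immediate. */
static void reset_retry_arm(struct reset_retry *r, uint64_t now_ms, uint64_t interval_ms)
{
    r->next_try_ms = now_ms;
    r->interval_ms = interval_ms;
    r->attempts = 0;
    r->active = 1;
}

/* Called periodically from the management loop.
 * Returns 1 when nothing is left to do (success or hard error), 0 while still waiting. */
static int reset_retry_poll(struct reset_retry *r, uint64_t now_ms, reset_fn fn, void *ctx)
{
    if (!r->active)
        return 1;
    if (now_ms < r->next_try_ms)
        return 0;                               /* too early, do not call fn */

    r->attempts++;
    int ret = fn(ctx);
    if (ret == -EAGAIN) {
        r->next_try_ms = now_ms + r->interval_ms;   /* schedule the next attempt */
        return 0;
    }
    r->active = 0;                              /* success or non-retryable error */
    return 1;
}

/* Illustrative stand-in for a reset attempt: fails *ctx times, then succeeds. */
static int fake_reset(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -EAGAIN;
    }
    return 0;
}
```

A fixed interval is used here for simplicity; an application could just as well grow the interval or give up after some number of attempts, since the thread leaves that policy to the APP.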
Luca Boccassi
2016-07-07 16:19:43 UTC
Permalink
That seems reasonable. I've thrown together a quick diff and played with
it on top of your patches and DPDK 2.2; it seems to work as intended. I'm
attaching it for reference. Feel free to pick it up, adapt it or ignore
it :-)

Also, I've noticed that ixgbe is the only one that actually blocks.
e1000 already returns immediately if dev_start fails (perhaps it
should be changed to be consistent?), and i40e does weird things that
I'm not sure about, but I couldn't spot a loop in there :-)

Also, I've used int instead of bool because
drivers/net/e1000/base/e1000_osdep.h redefines bool and true/false, so
compilation fails when including stdbool.h and using bool in
rte_ethdev.h.

--
Kind regards,
Luca Boccassi

--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -260,7 +260,7 @@ static void eth_igb_configure_msix_intr(
static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
void *param);
static void igbvf_mbx_process(struct rte_eth_dev *dev);
-static int igbvf_dev_reset(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev, int blocking);

/*
* Define VF Stats MACRO for Non "cleared on read" register
@@ -2598,7 +2598,7 @@ void igbvf_mbx_process(struct rte_eth_de
}

static int
-igbvf_dev_reset(struct rte_eth_dev *dev)
+igbvf_dev_reset(struct rte_eth_dev *dev, __rte_unused int blocking)
{
struct e1000_hw *hw =
E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -2626,12 +2626,12 @@ igbvf_dev_reset(struct rte_eth_dev *dev)
rte_delay_ms(1000);

diag = igbvf_dev_start(dev);
+ dev->data->dev_started = 1;
if (diag) {
PMD_INIT_LOG(ERR, "Igb VF reset: "
"Failed to start device.");
- return diag;
+ return -EAGAIN;
}
- dev->data->dev_started = 1;
eth_igbvf_stats_reset(dev);
if (dev->data->dev_conf.intr_conf.lsc == 0)
diag = eth_igb_link_update(dev, 0);
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,7 +157,7 @@ static int i40evf_dev_init(struct rte_et
static void i40evf_dev_close(struct rte_eth_dev *dev);
static int i40evf_dev_start(struct rte_eth_dev *dev);
static int i40evf_dev_configure(struct rte_eth_dev *dev);
-static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev, int blocking);

/* Default hash key buffer for RSS */
static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -1498,7 +1498,7 @@ i40e_vf_reset_dev(struct rte_eth_dev *de
}

static int
-i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+i40evf_handle_vf_reset(struct rte_eth_dev *dev, __rte_unused int blocking)
{
struct i40e_adapter *adapter =
I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
@@ -1518,7 +1518,7 @@ i40evf_emulate_vf_reset(uint8_t port_id)
{
struct rte_eth_dev *dev = &rte_eth_devices[port_id];

- i40evf_handle_vf_reset(dev);
+ i40evf_handle_vf_reset(dev, 0);
}

static int
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -379,7 +379,7 @@ static void ixgbevf_dev_interrupt_handle
(r) = (h)->bitmap[idx] >> bit & 1;\
}while(0)

-static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev, int blocking);

/*
* The set of PCI devices this driver supports
@@ -6227,7 +6227,7 @@ static void ixgbevf_mbx_process(struct r
}

static int
-ixgbevf_dev_reset(struct rte_eth_dev *dev)
+ixgbevf_dev_reset(struct rte_eth_dev *dev, int blocking)
{
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
int diag = 0;
@@ -6256,7 +6256,12 @@ ixgbevf_dev_reset(struct rte_eth_dev *de
if (diag) {
PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
"Failed to start device.");
- continue;
+ if (blocking)
+ continue;
+ else {
+ dev->data->dev_started = 1;
+ return -EAGAIN;
+ }
}
dev->data->dev_started = 1;
ixgbevf_dev_stats_reset(dev);
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3370,7 +3370,7 @@ rte_eth_copy_pci_info(struct rte_eth_dev
}

int
-rte_eth_dev_reset(uint8_t port_id)
+rte_eth_dev_reset(uint8_t port_id, int blocking)
{
struct rte_eth_dev *dev;
int diag;
@@ -3381,7 +3381,7 @@ rte_eth_dev_reset(uint8_t port_id)

RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);

- diag = (*dev->dev_ops->dev_reset)(dev);
+ diag = (*dev->dev_ops->dev_reset)(dev, blocking);

return diag;
}
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1262,7 +1262,7 @@ typedef int (*eth_set_eeprom_t)(struct r
struct rte_dev_eeprom_info *info);
/**< @internal Program eeprom data */

-typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev, int blocking);
/**< @internal Function used to reset a configured Ethernet device. */

#ifdef RTE_NIC_BYPASS
@@ -3927,17 +3927,21 @@ rte_eth_dma_zone_reserve(const struct rt
* queues, restart the port.
* Before calling this API, APP should stop the rx/tx. When tx is being stopped,
* APP can drop the packets and release the buffer instead of sending them.
+ * This call will block until the PF is up again, unless blocking is false.
*
* @param port_id
* The port identifier of the Ethernet device.
+ * @param blocking
+ * Whether or not to block if the PF is not yet UP.
*
* @return
* - (0) if successful.
* - (-ENODEV) if port identifier is invalid.
* - (-ENOTSUP) if hardware doesn't support this function.
+ * - (-EAGAIN) if PF is not up and blocking was false.
*/
int
-rte_eth_dev_reset(uint8_t port_id);
+rte_eth_dev_reset(uint8_t port_id, int blocking);

#ifdef
Lu, Wenzhuo
2016-07-08 00:14:13 UTC
Permalink
Post by Lu, Wenzhuo
Post by Luca Boccassi
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Luca Boccassi
Hello Wenzhuo,
I'm testing this patchset, but I am sporadically running
into an issue where the VFs reset fails after the PF flaps.
I have a VM running on a KVM box with a X540-AT2, passing 2 VFs
in.
Post by Lu, Wenzhuo
Post by Luca Boccassi
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Luca Boccassi
I am using calling rte_eth_dev_reset in response to a
RTE_ETH_EVENT_INTR_RESET callback, and the following
errors appear in the
PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed
queue_id=0
PMD: ixgbevf_dev_start(): Unable to initialize RX hardware
(-12)
PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
Jumping in with GDB, it seems that the rte_rxmbuf_alloc
call in ixgbe_alloc_rx_queue_mbufs returns NULL at
iteration 64 out of
2048.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Luca Boccassi
The application has ~500 2MB hugepages, and there's 2GB
of free memory available on top of that.
Have you seen this before? Any pointer or suggestion for
debugging?
Post by Lu, Wenzhuo
Post by Luca Boccassi
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Luca Boccassi
Thanks!
--
Kind regards,
Luca Boccassi
I think the problem is the mbuf occupied by the packets is
not released. This
memory has to be released by the APP, so my patches haven’t
covered
this.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Actually an example is needed to show how to use the reset API.
I plan to modify the testpmd.
Post by Lu, Wenzhuo
You may notice this feature is postponed to 16.11. Would
you like to wait for
the new version that will include an example?
Hi,
Unfortunately we need the VF reset working sooner than that,
so one way or the other I'll need to sort it out. Given I've
got a use case where this is happening, if it can be helpful
for you I'm more than happy to help as a guinea pig. If you
could please give some guidance/guidelines with regards to
which API to use to sort the mbuf
problem, I can try it out and give back some feedback.
Post by Lu, Wenzhuo
Post by Lu, Wenzhuo
Thanks!
I made a stupid mistake and deleted all my code. So, I have to
take some time to rewrite it :( Attached the example I used to
test the reset API. It's
modified from the l2fwd example. So you can compare it with
l2fwd to see what need to be added.
Post by Lu, Wenzhuo
Hopefully it can help :)
Thanks! That made me understand a couple of things more, and
I've got past the problem.
Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking
call.
Post by Lu, Wenzhuo
the _RESET event callback is fired when the PF goes down, but
when I call rte_eth_dev_reset it will block until the PF goes back up.
There is no way, as far as I can see, to know if the PF is back up before
calling rte_eth_dev_reset.
Post by Lu, Wenzhuo
This is a problem because, as far as I understand, I have to
call all the rte_eth_dev_ APIs from the same thread, in my case
the master thread, and I can't have that block potentially indefinitely.
Would it be possible to have 2 events instead of 1, one when the
PF goes down and one when it goes up? This way an application
would be able to soft-stop the port (drain queues, etc) when the
PF is down, and then call the reset API when it goes back up.
Thanks!
Sorry we cannot have 2 events now. There're 2 problems to have 2 events.
1, Normally we use kernel driver for PF. Now the kernel driver only have one
kind of message for link down and up. So we cannot tell if it's down or up.
Post by Lu, Wenzhuo
2, When the PF is down, if we don't reset the VF, VF is not working.
It cannot receive any message from PF. So we cannot know that when
PF is up. It means normally we have to reset VF twice when PF down
and up. (Surely we can wait a while when we receive the message
from PF until PF is up. But we cannot tell how long the time is appropriate.
So this *wait a while* may work for flash.)
Thanks for the clarification, I understand.
The problem with a blocking call is that we basically need to spawn
one thread per rte_eth_dev_reset call, since there is no way of
knowing if a PF is down for good or just flapping, and we can't have
a single thread managing all the interfaces being blocked forever
(EG: PF 1 and 2 go down, thread blocks on PF 1 reset call but it
never returns, meanwhile PF 2 goes back up but call is never made).
A colleague of mine, Eric Kinzie, suggested to add a blocking
boolean parameter to rte_eth_dev_reset API. If set to false, then
the call will not block and just does one try and return an error (EAGAIN ?).
Would this be an acceptable proposition?
Post by Lu, Wenzhuo
It's a good suggestion.
And I think if the parameter is set to false and the link is not up after trying
once, it will be the APP's responsibility to set up a timer or something like that to
keep trying to bring up the link.
That seems reasonable. I've thrown together a quick diff and played with it on
top of your patches and DPDK 2.2; it seems to work as intended. I'm attaching it
for reference. Feel free to pick it up, adapt it or ignore it :-)
Also, I've noticed that ixgbe is the only one that actually blocks:
e1000 already returns immediately if dev_start fails (perhaps it should be
changed to be consistent?) and i40e does weird things that I'm not sure about,
but I couldn't spot a loop in there :-)
Also, I've used int instead of bool because
drivers/net/e1000/base/e1000_osdep.h redefines bool and true/false, so
compilation fails when including stdbool.h and using bool in rte_ethdev.h
--
Kind regards,
Luca Boccassi
Glad to know it's working now. Thanks for your patch. Surely I'll try to include it in the next version :)
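For readers following the non-blocking proposal, here is a minimal sketch of how a single management thread might retry resets on several ports without ever blocking. It is only an illustration under stated assumptions: `struct port_state`, `reset_fn`, `poll_pending_resets` and the `sim_*` helpers are hypothetical names, and the simulated reset stands in for a non-blocking call such as the proposed rte_eth_dev_reset with the blocking parameter set to false.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 4

/* Hypothetical per-port bookkeeping kept by the application. */
struct port_state {
    bool reset_pending;   /* set when the RESET event callback fires */
    unsigned attempts;    /* non-blocking attempts made so far */
};

/*
 * Hypothetical signature for one non-blocking reset attempt: returns 0
 * on success or a negative errno (e.g. -EAGAIN) if the PF is still down.
 */
typedef int (*reset_fn)(unsigned port);

/* Simulation only: stands in for the real, driver-backed reset call. */
static bool sim_pf_up[MAX_PORTS];

static int
sim_try_reset(unsigned port)
{
    return sim_pf_up[port] ? 0 : -EAGAIN;
}

/*
 * One pass of the management thread (e.g. driven by a timer): try each
 * pending port exactly once and never block. Returns the number of
 * ports that still need a reset.
 */
static unsigned
poll_pending_resets(struct port_state *ports, size_t n, reset_fn try_reset)
{
    unsigned still_pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ports[i].reset_pending)
            continue;
        ports[i].attempts++;
        if (try_reset((unsigned)i) == 0)
            ports[i].reset_pending = false;
        else
            still_pending++;
    }
    return still_pending;
}
```

With this shape, a PF that stays down only costs one failed attempt per pass, so another port whose PF has come back is still serviced on the same pass.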
Luca Boccassi
2016-07-08 17:15:27 UTC
Permalink
Post by Luca Boccassi
Hello Wenzhuo,
I'm testing this patchset, but I am sporadically running
into an issue where the VFs reset fails after the PF flaps.
I have a VM running on a KVM box with an X540-AT2, passing 2 VFs in.
I am calling rte_eth_dev_reset in response to a
RTE_ETH_EVENT_INTR_RESET callback, and the following
errors appear in the log:
PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
Jumping in with GDB, it seems that the rte_rxmbuf_alloc
call in ixgbe_alloc_rx_queue_mbufs returns NULL at
iteration 64 out of 2048.
The application has ~500 2MB hugepages, and there's 2GB
of free memory available on top of that.
Have you seen this before? Any pointer or suggestion for
debugging?
Thanks!
--
Kind regards,
Luca Boccassi
I think the problem is that the mbufs occupied by the packets are
not released. This memory has to be released by the APP, so my
patches haven't covered this.
Actually an example is needed to show how to use the reset API.
I plan to modify testpmd.
You may notice this feature is postponed to 16.11. Would
you like to wait for the new version that will include an example?
Hi,
Unfortunately we need the VF reset working sooner than that,
so one way or the other I'll need to sort it out. Given I've
got a use case where this is happening, if it can be helpful
for you I'm more than happy to help as a guinea pig. If you
could please give some guidance/guidelines with regards to
which API to use to sort the mbuf problem, I can try it out
and give back some feedback.
Thanks!
I made a stupid mistake and deleted all my code, so I have to
take some time to rewrite it :( Attached is the example I used to
test the reset API. It's modified from the l2fwd example, so you
can compare it with l2fwd to see what needs to be added.
Hopefully it can help :)
Thanks! That made me understand a couple of things more, and
I've got past the problem.
Unfortunately now there's a bigger issue - rte_eth_dev_reset is
a blocking call.
The _RESET event callback is fired when the PF goes down, but
when I call rte_eth_dev_reset it will block until the PF goes back up.
There is no way, as far as I can see, to know if the PF is back
up before calling rte_eth_dev_reset.
This is a problem because, as far as I understand, I have to
call all the rte_eth_dev_ APIs from the same thread, in my case
the master thread, and I can't have that block potentially indefinitely.
Would it be possible to have 2 events instead of 1, one when the
PF goes down and one when it goes up? This way an application
would be able to soft-stop the port (drain queues, etc) when the
PF is down, and then call the reset API when it goes back up.
Thanks!
Glad to know it's working now. Thanks for your patch. Surely I'll try to include it in the next version :)
Great, thanks!

Unfortunately I found one issue: if PF is down, and then the VF on the
guest is down as well (ip link down) and then goes back up before the
PF, then calling rte_eth_dev_reset will return 0 (success), even though
the PF is still down and it should fail. This is with ixgbe. Any idea
what could be the problem?

--
Kind regards,
Luca
Lu, Wenzhuo
2016-07-11 01:32:20 UTC
Permalink
Unfortunately I found one issue: if PF is down, and then the VF on the guest is
down as well (ip link down) and then goes back up before the PF, then calling
rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
should fail. This is with ixgbe. Any idea what could be the problem?
I've found something interesting. I believe it's a HW difference between igb and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully but an igb VF cannot. The symptom is that the registers of the ixgbe VF can be accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then, when the PF link comes up, we receive the message again and reset the VF link again.
But on igb, when the PF link is down, we cannot reset the VF link successfully, so when the PF link comes up we do not receive the message, and there is no trigger for us to reset the VF link again. That's why on igb we have to try again and again until the reset succeeds, that is, until the PF link is up.
So a return value of 0 from rte_eth_dev_reset means the reset succeeded; it does not mean that rx/tx is ready. Rx/tx still depends on the PF link being up.
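The distinction above (reset succeeded versus rx/tx ready) can be made explicit in application code. The following is only a sketch: `enum vf_state` and `vf_state_after_reset` are hypothetical names, and the `link_up` flag is assumed to come from a separate link query, for example the link_status field filled in by rte_eth_link_get_nowait().

```c
#include <stdbool.h>

/* Hypothetical application-level view of a VF port after a reset attempt. */
enum vf_state {
    VF_NEEDS_RESET,   /* the reset call itself failed */
    VF_RESET_NO_LINK, /* reset returned 0, but the link is still down */
    VF_READY          /* reset returned 0 and the link is up */
};

/*
 * Combine the reset return code with a separately queried link state.
 * A return code of 0 alone is not treated as "traffic can flow".
 */
static enum vf_state
vf_state_after_reset(int reset_rc, bool link_up)
{
    if (reset_rc != 0)
        return VF_NEEDS_RESET;
    return link_up ? VF_READY : VF_RESET_NO_LINK;
}
```

An application would only resume rx/tx on the port in the VF_READY case and keep polling the link in the VF_RESET_NO_LINK case.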
Luca Boccassi
2016-07-11 12:02:20 UTC
Permalink
Post by Lu, Wenzhuo
Unfortunately I found one issue: if PF is down, and then the VF on the guest is
down as well (ip link down) and then goes back up before the PF, then calling
rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
should fail. This is with ixgbe. Any idea what could be the problem?
I've found something interesting. I believe it's a HW difference between igb and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully but an igb VF cannot. The symptom is that the registers of the ixgbe VF can be accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then, when the PF link comes up, we receive the message again and reset the VF link again.
What message do you refer to here? I am seeing the RESET callback only
when the PF goes down, not when it goes up.

At the moment, with ixgbe, this happens:

PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
VF link has no-carrier, and traffic does NOT go through

The problem is that there is just no way of being notified that PF is
up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
need to run it again.
Post by Lu, Wenzhuo
But on igb, when the PF link is down, we cannot reset the VF link successfully, so when the PF link comes up we do not receive the message, and there is no trigger for us to reset the VF link again. That's why on igb we have to try again and again until the reset succeeds, that is, until the PF link is up.
So a return value of 0 from rte_eth_dev_reset means the reset succeeded; it does not mean that rx/tx is ready. Rx/tx still depends on the PF link being up.
Luca Boccassi
2016-07-11 15:43:17 UTC
Permalink
Post by Luca Boccassi
Post by Lu, Wenzhuo
Unfortunately I found one issue: if PF is down, and then the VF on the guest is
down as well (ip link down) and then goes back up before the PF, then calling
rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
should fail. This is with ixgbe. Any idea what could be the problem?
I've found something interesting. I believe it's a HW difference between igb and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully but an igb VF cannot. The symptom is that the registers of the ixgbe VF can be accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then, when the PF link comes up, we receive the message again and reset the VF link again.
What message do you refer to here? I am seeing the RESET callback only
when the PF goes down, not when it goes up.
PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
VF link has no-carrier, and traffic does NOT go through
The problem is that there is just no way of being notified that PF is
up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
need to run it again.
I was now able to solve this use case, by having the rte_eth_dev_reset
implementations return -EAGAIN if the dev is not up. This way I know, in
the application, that I have to try again. What do you think?

IMHO it makes sense, as the reset does not actually succeed, and the
caller should try again. The diff is very trivial, and attached for
reference.

--
Kind regards,
Luca Boccassi


Make rte_eth_dev_reset return EAGAIN if VF down

If the VF is down the reset will not happen, so the driver should return
EAGAIN to signal to the application that it needs to call
rte_eth_dev_reset again.

Signed-off-by: Luca Boccassi <***@brocade.com
---
drivers/net/e1000/igb_ethdev.c | 2 +-
drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -6235,7 +6235,7 @@ ixgbevf_dev_reset(struct rte_eth_dev *de

/* Nothing needs to be done if the device is not started. */
if (!dev->data->dev_started)
- return 0;
+ return -EAGAIN;

PMD_DRV_LOG(DEBUG, "Link up/down event detected.");

--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1504,7 +1504,7 @@ i40evf_handle_vf_reset(struct rte_eth_de
I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);

if (!dev->data->dev_started)
- return 0;
+ return -EAGAIN;

adapter->reset_number = 1;
i40e_vf_reset_dev(dev);
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2609,7 +2609,7 @@ igbvf_dev_reset(struct rte_eth_dev *dev,

/* Nothing needs to be done if the device is not started. */
if (!dev->data->dev_started)
- return 0;
+ return -EAGAIN;

PMD_DRV_LOG(DEBUG, "Link up/down event det
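On the application side, the -EAGAIN convention from this diff could be consumed roughly as follows. This is a sketch under assumptions, not part of the patch: `enum reset_action`, `next_reset_action` and the attempt limit are hypothetical, and the policy of giving up on any other error is just one possible choice.

```c
#include <errno.h>

/* Hypothetical decision the application takes after one reset attempt. */
enum reset_action {
    RESET_DONE,        /* reset returned 0 */
    RESET_RETRY_LATER, /* -EAGAIN: re-arm a timer and call reset again */
    RESET_GIVE_UP      /* other error, or too many attempts */
};

/*
 * Map the return code of a reset attempt to the next step.
 * 'attempts' counts the calls made so far, including this one.
 */
static enum reset_action
next_reset_action(int rc, unsigned attempts, unsigned max_attempts)
{
    if (rc == 0)
        return RESET_DONE;
    if (rc == -EAGAIN && attempts < max_attempts)
        return RESET_RETRY_LATER;
    return RESET_GIVE_UP;
}
```

The attempt limit is optional; an application that must wait for the PF indefinitely could pass a very large max_attempts and rely on the timer interval alone.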
Lu, Wenzhuo
2016-07-12 01:19:39 UTC
Permalink
-----Original Message-----
Sent: Monday, July 11, 2016 11:43 PM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
Post by Luca Boccassi
Unfortunately I found one issue: if PF is down, and then the VF on
the guest is down as well (ip link down) and then goes back up
before the PF, then calling rte_eth_dev_reset will return 0
(success), even though the PF is still down and it should fail.
This is with ixgbe. Any idea what could be the problem?
Post by Lu, Wenzhuo
I've found something interesting. I believe it's a HW difference between igb
and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully but
an igb VF cannot. The symptom is that the registers of the ixgbe VF can be
accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then,
when the PF link comes up, we receive the message again and reset the VF link again.
Post by Luca Boccassi
What message do you refer to here? I am seeing the RESET callback only
when the PF goes down, not when it goes up.
PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up
-> VF link has no-carrier, and traffic does NOT go through
The problem is that there is just no way of being notified that PF is
up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
need to run it again.
I was now able to solve this use case, by having the rte_eth_dev_reset
implementations return -EAGAIN if the dev is not up. This way I know, in the
application, that I have to try again. What do you think?
IMHO it makes sense, as the reset does not actually succeed, and the caller
should try again. The diff is very trivial, and attached for reference.
Yes, I think the change is reasonable. Sorry, I didn't realize you were talking about the code you had changed. Maybe we were not on the same page in the earlier discussion :)
Luca Boccassi
2016-08-26 12:58:19 UTC
Permalink
Post by Luca Boccassi
Post by Luca Boccassi
Post by Lu, Wenzhuo
Unfortunately I found one issue: if PF is down, and then the VF on the guest is
down as well (ip link down) and then goes back up before the PF, then calling
rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
should fail. This is with ixgbe. Any idea what could be the problem?
I've found something interesting. I believe it's a HW difference between igb and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully but an igb VF cannot. The symptom is that the registers of the ixgbe VF can be accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then, when the PF link comes up, we receive the message again and reset the VF link again.
What message do you refer to here? I am seeing the RESET callback only
when the PF goes down, not when it goes up.
PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
VF link has no-carrier, and traffic does NOT go through
The problem is that there is just no way of being notified that PF is
up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
need to run it again.
I was now able to solve this use case, by having the rte_eth_dev_reset
implementations return -EAGAIN if the dev is not up. This way I know, in
the application, that I have to try again. What do you think?
IMHO it makes sense, as the reset does not actually succeed, and the
caller should try again. The diff is very trivial, and attached for
reference.
Hi,

Is there any update on resubmitting this patchset for 16.11? Thanks!
Lu, Wenzhuo
2016-08-29 01:04:09 UTC
Permalink
Hi Luca,
-----Original Message-----
Sent: Friday, August 26, 2016 8:58 PM
To: Lu, Wenzhuo
Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
Post by Luca Boccassi
Unfortunately I found one issue: if PF is down, and then the VF
on the guest is down as well (ip link down) and then goes back
up before the PF, then calling rte_eth_dev_reset will return 0
(success), even though the PF is still down and it should fail.
This is with ixgbe. Any idea what could be the problem?
Post by Lu, Wenzhuo
I've found something interesting. I believe it's a HW difference between
igb and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully
but an igb VF cannot. The symptom is that the registers of the ixgbe VF can be
accessed when the PF link is down, but those of the igb VF cannot.
It means that on ixgbe, when the PF link goes down, we reset the VF link. Then,
when the PF link comes up, we receive the message again and reset the VF link again.
Post by Luca Boccassi
What message do you refer to here? I am seeing the RESET callback
only when the PF goes down, not when it goes up.
PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up
-> VF link has no-carrier, and traffic does NOT go through
The problem is that there is just no way of being notified that PF
is up, and if rte_eth_dev_reset succeeds I have no way of knowing
that I need to run it again.
I was now able to solve this use case, by having the rte_eth_dev_reset
implementations return -EAGAIN if the dev is not up. This way I know,
in the application, that I have to try again. What do you think?
IMHO it makes sense, as the reset does not actually succeed, and the
caller should try again. The diff is very trivial, and attached for
reference.
Hi,
Is there any update on resubmitting this patchset for 16.11? Thanks!
Sorry, we're short of hands, so this feature is planned to
