Discussion:
Intel Wireless 7260 hardware timed out randomly
wzyboy
2013-11-03 09:11:39 UTC
Permalink
Hi, there.

I am encountering some bugs with iwlwifi.ko in Linux 3.11.6.

My laptop is a ThinkPad X240s with Network controller Intel Corporation
Wireless 7260 (rev 6b). I have Arch Linux with kernel 3.11.6 installed
on it. The wireless works out-of-box, thanks for your work!

However, after using wireless for some time (it could be an hour of
normal web browsing, or several hours of heavy downloading), the
wireless stops working. The symptom is that every connection gets
lost, and the hardware "times out" whenever I try to take the
interface down or up with `ip link set wlan0 down' or `ip link set
wlan0 up'. When this bug occurs, even reloading the kernel modules
(with `modprobe') does not bring the wireless back. All I can do is
reboot the laptop.

After encountering this bug several times, I found something useful
in `dmesg', which I've attached. Please take a look at the log and
help find and squash this bug.

If I could be helpful by providing more debugging information please let
me know.


Sincere regards.

--
wzyboy
Grumbach, Emmanuel
2013-11-03 09:23:51 UTC
Permalink
wzyboy
2013-11-04 09:21:59 UTC
Permalink
Here is a text/plain version of my mail, since the mail gateway at
kernel.org does not accept HTML mail.

--
wzyboy


---------- Forwarded message ----------
From: wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>
Date: 2013/11/4
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly
To: "Grumbach, Emmanuel" <emmanuel.grumbach-***@public.gmane.org>
Cc: "ilw-VuQAYsv1563Yd54FQh9/***@public.gmane.org" <ilw-VuQAYsv1563Yd54FQh9/***@public.gmane.org>,
"linux-wireless-***@public.gmane.org" <linux-wireless-***@public.gmane.org>


Hi,

thanks for your quick reply.

After receiving your email 23 hours ago, I set that option in modprobe
and rebooted my laptop. It worked fine until just now, when the bug
occurred again while I was just viewing a man page. I had to reboot
my laptop to bring the network back, in order to reply to this
email :-)
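
One way to confirm that the option really took effect after the reboot
is to read the parameter back from sysfs. The sketch below is only an
illustration and makes two assumptions: that the iwlmvm module exports
power_scheme as a readable parameter under
/sys/module/iwlmvm/parameters/, and that a helper named
read_module_param (a made-up name) is acceptable for the job.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return a loaded module's parameter value as a string, or None.

    None means the module is not loaded or the parameter is not
    exported through sysfs (an assumption worth checking on your kernel).
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    value = read_module_param("iwlmvm", "power_scheme")
    if value is None:
        print("iwlmvm not loaded, or power_scheme not readable")
    else:
        print(f"iwlmvm power_scheme = {value}")
```

If the value printed is not the one written in the .conf file under
/etc/modprobe.d/, the option was probably not picked up when the module
was loaded.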

I've attached the latest systemd journal; it looks a little different
from yesterday's dmesg log... Could you take a look at it? I also found
that every time I boot my laptop, the kernel prints a message like
this:

Nov 04 17:07:36 xenien kernel: iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control

Does it have something to do with the bug?

Although this bug only occurs about once a day, it's
unexpected and annoying... ;-(

Sincere regards.

--
wzyboy


2013/11/3 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>

>
> How easily can you reproduce the bug?
> Can you please try to see if this work around can help?
> Add
> options iwlmvm power_scheme=1
>
> in a .conf file under /etc/modprobe.d/
>
> Thanks.
>
Sedat Dilek
2013-11-04 09:42:16 UTC
Permalink
On Mon, Nov 4, 2013 at 10:21 AM, wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org> wrote:
> Here is a text/plain version of mail since the mail gateway at
> kernel.org does not accept any html mail.
>

Hi,

Here are two hints for "correct asking" on Linux-related mailing lists:

Documentation/email-clients.txt [1] says:

"The default setting of not composing in HTML is appropriate; do not
enable it."

Documentation/development-process/2.Process [2] says:

"- Avoid top-posting (the practice of putting your answer above the quoted
text you are responding to). It makes your response harder to read and
makes a poor impression."

Check out the Documentation/ directory for more help/hints.

Regards,
- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/email-clients.txt#n89
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/development-process/2.Process#n426

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
wzyboy
2013-11-04 09:53:56 UTC
Permalink
2013/11/4 Sedat Dilek <sedat.dilek-***@public.gmane.org>:
> Hi,
>
> Here two hints for "correct asking" on Linux-related mailing-lists:
>
> Documentation/email-clients.txt [1] says:
>
> "The default setting of not composing in HTML is appropriate; do not
> enable it."
>
> Documentation/development-process/2.Process [2] says:
>
> "- Avoid top-posting (the practice of putting your answer above the quoted
> text you are responding to). It makes your response harder to read and
> makes a poor impression."
>
> Checkout Documentation/ directory for more help/hints.
>
> Regards,
> - Sedat -


Hi, Sedat.

I am so sorry. I was in a hurry and replied from the Gmail web
interface, so I forgot to set "Text-only" mode and violated the
mailing list etiquette.

I'll keep these hints in mind.

--
wzyboy
Grumbach, Emmanuel
2013-11-04 09:42:53 UTC
Permalink
wzyboy
2013-11-05 04:34:15 UTC
Permalink
2013/11/4 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> Not sure - you can try 3.12 in which you shouldn't see this or remove
> pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
> PCIE_LINK_STATE_CLKPM);
> from drivers/net/wireless/iwlwifi/pcie/trans.c
>
> Did you see an improvement without power save feature? Or it was just the same?


Since Arch Linux's official repository has no linux-3.12 yet, I
compiled Linux 3.12 on my own.

Unfortunately, this bug occurred twice while I was wget-ing
linux-3.12.0.tar.xz, so I had to reboot twice to finish the
download... (So disabling the power save feature makes no difference, at least.)

Nevertheless, I'm now using Linux 3.12.0 and I've attached the dmesg
log, in which the complaint about ASPM no longer appears.

I'll follow up if this bug occurs in 3.12.

Thanks Emmanuel!

--
wzyboy
Grumbach, Emmanuel
2013-11-05 11:04:52 UTC
Permalink
wzyboy
2013-11-05 11:42:12 UTC
Permalink
Sorry, I hit "Reply" instead of "Reply-all"... (a bad habit
picked up from Google Groups)

--
wzyboy



---------- Forwarded message ----------
From: wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>
Date: 2013/11/5
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly
To: "Grumbach, Emmanuel" <emmanuel.grumbach-***@public.gmane.org>


2013/11/5 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> Are you sure it is the same phenomenon?
> Same WARNING with same value in register?


Yes, I am sure it's the same phenomenon. Here is a comparison of the
"WARNING" lines in the kernel logs from the last few days:

// I bought this laptop on Nov 01 and installed Arch Linux (over wired
network) on the same day, so there are no logs for Nov 01. On Nov 02 I
started to use the wireless network, and here came the WARNINGs:

Nov 02 13:01:53 xenien kernel: WARNING: CPU: 2 PID: 790 at drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047 iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 02 13:01:55 xenien kernel: WARNING: CPU: 2 PID: 790 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()
Nov 02 13:02:11 xenien kernel: WARNING: CPU: 1 PID: 4942 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 02 13:02:11 xenien kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
Nov 02 13:09:17 xenien kernel: WARNING: CPU: 1 PID: 5776 at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
Nov 02 13:09:17 xenien kernel: Watchdog detected hard LOCKUP on cpu 1

Nov 03 14:13:21 xenien kernel: WARNING: CPU: 3 PID: 454 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 03 14:13:21 xenien kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
Nov 03 14:13:25 xenien kernel: WARNING: CPU: 0 PID: 454 at drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047 iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 03 14:13:27 xenien kernel: WARNING: CPU: 0 PID: 454 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()
Nov 03 14:20:26 xenien kernel: WARNING: CPU: 1 PID: 6955 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 03 16:30:45 xenien kernel: WARNING: CPU: 3 PID: 12926 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 03 16:33:12 xenien kernel: WARNING: CPU: 0 PID: 469 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()

Nov 04 17:00:27 xenien kernel: WARNING: CPU: 3 PID: 10978 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 04 17:00:27 xenien kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
Nov 04 17:00:31 xenien kernel: WARNING: CPU: 3 PID: 10978 at drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047 iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 04 17:00:33 xenien kernel: WARNING: CPU: 0 PID: 10978 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()
Nov 04 22:13:08 xenien kernel: WARNING: CPU: 1 PID: 21506 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 04 22:13:12 xenien kernel: WARNING: CPU: 2 PID: 21506 at drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047 iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 04 22:13:14 xenien kernel: WARNING: CPU: 2 PID: 21506 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()
Nov 04 22:26:13 xenien kernel: WARNING: CPU: 3 PID: 1623 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 04 22:26:17 xenien kernel: WARNING: CPU: 3 PID: 1623 at drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047 iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 04 22:26:19 xenien kernel: WARNING: CPU: 3 PID: 1623 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()
Nov 04 22:31:41 xenien kernel: WARNING: CPU: 3 PID: 872 at drivers/net/wireless/iwlwifi/pcie/trans.c:883 iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 04 22:31:45 xenien kernel: WARNING: CPU: 3 PID: 872 at drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047 iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 04 22:31:47 xenien kernel: WARNING: CPU: 3 PID: 872 at net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380 [mac80211]()

// On the night of Nov 04 I compiled Linux 3.12, and there have been no
more WARNINGs on Nov 05 so far.


These timestamps are in UTC+8. I added "options iwlmvm power_scheme=1"
at about 2013-11-03 17:30 UTC+8...
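
To compare days at a glance, the WARNING lines can be tallied by source
location. The following is a rough sketch and not part of the original
report: count_warning_sites and SAMPLE are hypothetical names, and
SAMPLE simply repeats the Nov 03 entries listed above. Mail clients
often wrap long log lines, so the sketch flattens whitespace before
matching.

```python
import re
from collections import Counter

# Captures "<file>:<line>" and the function name from a WARNING header.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+) (\S+?)\+0x")

# The Nov 03 entries, wrapped the way a mail client might wrap them.
SAMPLE = """\
Nov 03 14:13:21 xenien kernel: WARNING: CPU: 3 PID: 454 at
drivers/net/wireless/iwlwifi/pcie/trans.c:883
iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 03 14:13:21 xenien kernel: Timeout waiting for hardware access
(CSR_GP_CNTRL 0xffffffff)
Nov 03 14:13:25 xenien kernel: WARNING: CPU: 0 PID: 454 at
drivers/net/wireless/iwlwifi/mvm/mac80211.c:1047
iwl_mvm_mac_sta_state+0x230/0x290 [iwlmvm]()
Nov 03 14:13:27 xenien kernel: WARNING: CPU: 0 PID: 454 at
net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380
[mac80211]()
Nov 03 14:20:26 xenien kernel: WARNING: CPU: 1 PID: 6955 at
drivers/net/wireless/iwlwifi/pcie/trans.c:883
iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 03 16:30:45 xenien kernel: WARNING: CPU: 3 PID: 12926 at
drivers/net/wireless/iwlwifi/pcie/trans.c:883
iwl_trans_pcie_grab_nic_access+0x1d8/0x1f0 [iwlwifi]()
Nov 03 16:33:12 xenien kernel: WARNING: CPU: 0 PID: 469 at
net/mac80211/sta_info.c:839 __sta_info_destroy+0x33c/0x380
[mac80211]()
"""


def count_warning_sites(log_text):
    """Count WARNING entries per source location, ignoring line wrapping."""
    flat = " ".join(log_text.split())  # undo mail-client wrapping
    return Counter(location for location, _func in WARNING_RE.findall(flat))


if __name__ == "__main__":
    for location, hits in count_warning_sites(SAMPLE).most_common():
        print(f"{hits:3d}  {location}")
```

For the Nov 03 entries the most frequent site is pcie/trans.c:883, the
same location that is paired with the "Timeout waiting for hardware
access" message.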

I've been using Linux 3.12 for one day now, and I haven't hit any more
iwlwifi.ko bugs... Maybe this is fixed in Linux 3.12?


--
wzyboy
wzyboy
2013-11-06 04:47:18 UTC
Permalink
2013/11/5 wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>:
> I've been using Linux 3.12 for one day now, no more bugs about
> iwlwifi.ko are encountered... Maybe this is fixed in Linux 3.12?

Hi, I'm back.

This terrible bug occurred again in Linux 3.12. This time I was
prepared for it and wrote down the complete sequence of how it appeared
and how I reacted, as attached.

IMHO the errors are quite similar to those in 3.11...


Sincere regards.

--
wzyboy
Grumbach, Emmanuel
2013-11-06 06:18:44 UTC
Permalink
wzyboy
2013-11-06 06:34:50 UTC
Permalink
2013/11/6 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
>>
>> Hi, I'm back.
>>
>> This terriable bug occurs again in Linux 3.12. This time I've been prepared for
>> it and wrote down the complete process of its apperance and my reacting, as
>> attached.
>
> Thanks - was that with power save disabled?

Yes, Linux 3.12 with "options iwlmvm power_scheme=1". (I did not touch
that .conf file after the kernel upgrade)

>
>>
>> IMHO they are quite similar with those errors in 3.11...
>
> Indeed. The only difference is that you don’t have PCI complain about not being able to disable L1.

I see. But I still cannot figure out what the "trigger" of this bug
is. Today (Nov 06 UTC+8) the bug has occurred twice so far (as of
14:30), at 12:31 and 13:30. The first time I was about to do a system
upgrade, and the second time I was using rsync to upload photos from
my Android phone to my laptop.

Sometimes I was not even using the network (the traffic was near zero)
when the bug occurred. So this bug seems to occur regardless of
network traffic?

Could you think of a possible "trigger" for this bug, so I can try to
avoid it (I hate rebooting) before the final fix is released? For
example, if there is something wrong with one of the "modules linked
in", I could blacklist that module...


Sincere regards.

--
wzyboy
Grumbach, Emmanuel
2013-11-06 06:37:18 UTC
Permalink
wzyboy
2013-11-06 06:47:31 UTC
Permalink
2013/11/6 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> I don't know - I am trying to check with our HW guys here.
> Can you please run lspci -xxx before and after it happens?
> BTW - how do you recover? Reloading the module is enough?

I've attached the output of `lspci -xxx'. The next time this bug
occurs I will run it again and send the output to you.

This bug has occurred more than 15 times since I bought this laptop on
Nov 01, and I've tried different methods each time to bring the
network back without rebooting. However, whether I use "ip link set
wlan0 down", "ip link set wlan0 up" or reload the kernel module, it
just does not work -- the only way is to reboot the laptop
... (That's why I said "I hate rebooting")

Could you suggest any other possible methods that may have a chance to
recover without rebooting? I could try it the next time this bug
occurs.

--
wzyboy
wzyboy
2013-11-06 07:07:12 UTC
Permalink
2013/11/6 wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>:
> I've attached the output of `lspci -xxx'. The next time this bug
> occurs I will run it again and send the output to you.


How dramatic. Several minutes ago I clicked on the "Senden"
(German for "Send") button to send that email. After the mail was
sent, I opened a new tab in the browser to google something
and found that the Internet connection was lost. -- The bug occurred again!

So I ran `lspci -xxx' and saved the output along with the kernel
logs. I'm shocked that the hex strings are all "00".

I rebooted my laptop and tried to send you the logs, and the bug occurred
again... -- it seems more and more frequent -- and I had to reboot ...

Here are the logs ... finally.

--
wzyboy
Grumbach, Emmanuel
2013-11-06 07:10:28 UTC
Permalink
wzyboy
2013-11-06 07:12:40 UTC
Permalink
2013/11/6 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> Wait - you mean that after the bug occurred before you rebooted, lspci -xxx show all 00?
> I can see 0xff here.
> Anyway - this is very bad... checking with HW guys...


Sorry, that's my typo. They are all 0xff... (I don't know what they
mean, but it looks bad...)

Thanks for your effort! I'm waiting for good news from you and HW guys. :-)

--
wzyboy
Emmanuel Grumbach
2013-11-06 17:50:30 UTC
Permalink
Hi,

adding PCI folks.
Here is the story:

* Wzyboy has a Lenovo laptop with _OSC control *not* granted
* L1 Active is enabled
* kernel: 3.12.0
* NIC is PCIe (Gen2 but not sure...)
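
The ASPM and link-speed points in the list above can be cross-checked
against raw config space. The sketch below is an illustration only: the
bytes at 0x40-0x5f are copied from the `lspci -vxxxx' dump wzyboy posted
to this thread on 2013-11-07 (taken while the card was still working),
the register offsets 0x10 (Link Control) and 0x12 (Link Status) inside
the PCI Express capability follow the PCIe specification, and the
function names are made up for this example.

```python
# Config-space bytes 0x40-0x5f, copied from the healthy-state dump in this thread.
ROWS = """\
40: 10 00 02 00 c0 8e 00 10 10 0c 19 00 11 ec 06 00
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
"""

PCIE_CAP_BASE = 0x40  # from "Capabilities: [40] Express Endpoint" in the same dump
PCIE_CAP_ID = 0x10    # capability ID of the PCI Express capability
LNKCTL = 0x10         # Link Control register offset inside the capability
LNKSTA = 0x12         # Link Status register offset inside the capability
ASPM_STATES = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s and L1"}


def parse_rows(text):
    """Turn 'offset: b0 b1 ...' rows (lspci -x style) into {offset: byte}."""
    cfg = {}
    for line in text.splitlines():
        offset, _, data = line.partition(":")
        if not data.strip():
            continue  # skip blank or non-hexdump lines
        base = int(offset, 16)
        for i, byte in enumerate(data.split()):
            cfg[base + i] = int(byte, 16)
    return cfg


def read16(cfg, offset):
    """Read a little-endian 16-bit register."""
    return cfg[offset] | (cfg[offset + 1] << 8)


def decode_link(cfg, cap_base=PCIE_CAP_BASE):
    """Decode ASPM control, link speed code and link width."""
    if cfg[cap_base] != PCIE_CAP_ID:
        raise ValueError("no PCI Express capability at this offset")
    lnkctl = read16(cfg, cap_base + LNKCTL)
    lnksta = read16(cfg, cap_base + LNKSTA)
    return {
        "aspm": ASPM_STATES[lnkctl & 0x3],  # Link Control bits 1:0
        "speed_code": lnksta & 0xF,         # Link Status bits 3:0
        "width": (lnksta >> 4) & 0x3F,      # Link Status bits 9:4
    }


if __name__ == "__main__":
    print(decode_link(parse_rows(ROWS)))
```

For these bytes the sketch reports ASPM L1 enabled on a x1 link with
speed code 1, which should correspond to 2.5 GT/s.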

At some random point, the driver loses access to the NIC: all readl
operations return 0xff.
Even lspci returns 0xff:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

here is the output of lspci *before* the issue hits:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00

Do you have any idea what we can do to understand what is going wrong here?

Thanks
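
For reference, a config-space read that returns all ones generally
means the device did not answer the request at all, so the two dumps
above can be told apart mechanically. This is a minimal sketch using
only the rows quoted above; parse_rows, looks_absent and header_ids are
hypothetical helper names, and the parser assumes the rows are
contiguous and start at offset 0.

```python
BEFORE = """\
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00
"""

AFTER = """\
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
"""


def parse_rows(text):
    """Collect the bytes of contiguous 'offset: b0 b1 ...' rows, in order."""
    data = bytearray()
    for line in text.splitlines():
        _offset, _, row = line.partition(":")
        data.extend(int(byte, 16) for byte in row.split())
    return bytes(data)


def looks_absent(cfg):
    """True if every byte reads 0xff, i.e. nothing answered the read."""
    return len(cfg) > 0 and all(byte == 0xFF for byte in cfg)


def header_ids(cfg):
    """Return (vendor ID, device ID, revision) from the standard header."""
    vendor = cfg[0x00] | (cfg[0x01] << 8)
    device = cfg[0x02] | (cfg[0x03] << 8)
    revision = cfg[0x08]
    return vendor, device, revision


if __name__ == "__main__":
    for label, dump in (("before", BEFORE), ("after", AFTER)):
        cfg = parse_rows(dump)
        vendor, device, revision = header_ids(cfg)
        print(f"{label}: {vendor:04x}:{device:04x} rev {revision:02x} "
              f"absent={looks_absent(cfg)}")
```

The revision byte sits at offset 0x08, which is why lspci prints
"(rev ff)" once the device stops answering.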

Bjorn Helgaas
2013-11-06 18:32:56 UTC
Permalink
On Wed, Nov 6, 2013 at 10:50 AM, Emmanuel Grumbach <egrumbach-***@public.gmane.org> wrote:
> Hi,
>
> adding PCI folks.
> Here is the story:
>
> * Wzyboy has a Lenovo laptop with _OSC control *not* granted
> * L1 Active is enabled
> * kernel: 3.12.0
> * Nic is PCIe (Gen2 but not sure...)
>
> At some random point, the driver loses access to the NIC: all readl
> operation return 0xff.
> Even lspci returns 0xff:
>
> 03:00.0 Network controller: Intel Corporation Wireless 7260 (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> here is the output of lspci *before* the issue hits:
>
> 03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
> 00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
> 10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00
>
> have you any idea of what we can do to understand what it going wrong here?

Do you have any more details? Maybe open a bugzilla.kernel.org report
and attach:

- complete dmesg log
- lspci -vvxxx output for entire system before issue occurs
- lspci -vvxxx output for entire system after issue occurs

Bjorn

wzyboy
2013-11-07 04:49:15 UTC
Permalink
2013/11/7 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> Do you have any more details? Maybe open a bugzilla.kernel.org report
> and attach:
>
> - complete dmesg log
> - lspci -vvxxx output for entire system before issue occurs
> - lspci -vvxxx output for entire system after issue occurs


Hi, I have filed a bug on bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=64541

Actually, I have posted some more logs before; you can find them in
the mailing list archive:
http://thread.gmane.org/gmane.linux.kernel.wireless.general/115259

Here is the output of lspci -vxxxx from just now. I'll run this command
again the next time the bug occurs:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
Subsystem: Intel Corporation Wireless-N 7260
Flags: bus master, fast devsel, latency 0, IRQ 62
Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [40] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 5c-51-4f-ff-ff-0d-82-ac
Capabilities: [14c] Latency Tolerance Reporting
Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00
40: 10 00 02 00 c0 8e 00 10 10 0c 19 00 11 ec 06 00
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 12 08 08 00 05 04 00 00 00 00 00 00
70: 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 23 c8 00 00 00 0d
d0: 05 40 81 00 0c f0 e0 fe 00 00 00 00 42 41 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 01 00 01 14 00 00 00 00 00 00 00 00 31 20 46 00
110: 00 20 00 00 00 20 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 03 00 c1 14 ac 82 0d ff ff 4f 51 5c 18 00 41 15
150: 03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
610: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
630: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
720: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
740: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
850: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
890: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
920: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
aa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ad0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--
wzyboy
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
wzyboy
2013-11-07 14:24:54 UTC
Permalink
2013/11/7 wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>:
> Here is the output of lspci -vxxxx just now. I'll run this command
> again when the bug occurs next time:


Hi, I'm back. The bug occurred two more times; after the first one
I forgot to run that command.

In the attachment is the output of lspci -vxxxx after the bug occurred
and before I rebooted my laptop.

There are too many terrible 0xff bytes there...


--
wzyboy
wzyboy
2013-11-08 04:41:28 UTC
Permalink
Hi,

it seems that I may have found one possible "trigger" of this bug: heavy downloading.

Today I was trying to download a big file with wget, and the bug occurred
six times before I could not bear it any more (every time I had to
reboot to recover my network!), so I bought a network cable downstairs
in the supermarket and switched to wired network instead.

It seems that the bug is very likely to occur after downloading at full
speed (20 Mbps fiber, ~2.3 MiB/s) for several minutes.

Several days ago, when I tried to download "linux-3.12.tar.xz" from
kernel.org with wget, the bug also occurred twice during the whole
download process.

Could this be a hardware issue (flawed hardware?) or just a driver
issue? I bought this laptop 8 days ago and wiped the pre-installed
Windows the moment I got it, so I have no idea how this wireless card
performs under Windows.

--
wzyboy
wzyboy
2013-11-08 04:46:34 UTC
Permalink
2013/11/8 wzyboy <***@wzyboy.org>:
> so I have no idea how this wireless card
> performs under Windows.


And I cannot test this under Windows any more, since my laptop
is now a BIOS + GPT + LUKS setup, so it would be a big project if I
wanted to install Windows again -- Windows 7/8/8.1 does not allow a BIOS +
GPT setup, not to mention the headache of resizing LUKS containers...

--
wzyboy
Bjorn Helgaas
2013-11-08 17:20:27 UTC
Permalink
On Wed, Nov 6, 2013 at 9:49 PM, wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org> wrote:
> 2013/11/7 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
>> Do you have any more details? Maybe open a bugzilla.kernel.org report
>> and attach:
>>
>> - complete dmesg log
>> - lspci -vvxxx output for entire system before issue occurs
>> - lspci -vvxxx output for entire system after issue occurs
>
>
> Hi, I have filed a bug on bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=64541
>
> Actually I have posted some more logs before, you could find them on
> the mailing lists archive:
> http://thread.gmane.org/gmane.linux.kernel.wireless.general/115259
>
> Here is the output of lspci -vxxxx just now. I'll run this command
> again when the bug occurs next time:

Thanks. But can you please attach the output of "lspci -vvxxx" (not
"-vxxxx") for the entire system before the problem occurs? All the
info is in the "-xxxx" output, but it's really painful to decode it
all by hand. Using "-vv" will decode the PCIe Capability structures
where the ASPM configuration is. And the entire system is
interesting, because ASPM requires configuration on upstream bridges
as well as on the device itself.
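
A minimal sketch of how the ASPM state could be pulled out of that
"-vv" output for every device at once, so the NIC and its upstream
bridge can be compared side by side. The function name aspm_per_device
is made up for illustration, and the sketch assumes the usual lspci
layout: device headers start in column 0 and the "LnkCtl:" line
carries the ASPM state before the first semicolon.

```bash
#!/usr/bin/env bash
# Print each PCI device header followed by the ASPM state from its LnkCtl line.
# Reads `lspci -vv` output on stdin. aspm_per_device is a hypothetical helper.
aspm_per_device() {
    awk '
        /^[0-9a-f]/ { dev = $0 }                 # device header starts in column 0
        /^[ \t]+LnkCtl:/ {
            sub(/^[ \t]+LnkCtl:[ \t]*/, "")      # drop the field label
            sub(/;.*/, "")                       # keep only the ASPM part
            print dev " => " $0
        }
    '
}

# Example (as root, so the capability list is readable):
#   lspci -vv | aspm_per_device
```

Devices without a PCIe capability print nothing, which keeps the list
short enough to read by eye.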

My only guess is that there's something wrong with the ASPM
configuration and the device just stops responding to config accesses
(and probably MMIO accesses, too, based on the errors in your dmesg
log). Or maybe the device got powered off somehow.

Bjorn
Bjorn Helgaas
2013-11-08 17:38:07 UTC
Permalink
On Fri, Nov 8, 2013 at 10:20 AM, Bjorn Helgaas <***@google.com> wrote:
> My only guess is that there's something wrong with the ASPM
> configuration and the device just stops responding to config accesses
> (and probably MMIO accesses, too, based on the errors in your dmesg
> log). Or maybe the device got powered off somehow.

If you've figured out a way to reproduce this more reliably, it might
be interesting to do "echo on >
/sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
makes a difference. That should prevent us from using runtime power
management for the iwlwifi device.

Bjorn
wzyboy
2013-11-10 10:19:58 UTC
Permalink
2013/11/9 Bjorn Helgaas <***@google.com>:
> it might
> be interesting to do "echo on >
> /sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
> makes a difference.


Hi, should I run this command after the bug? I just ran it after the
bug occurred, but there was no output in dmesg, and "ip link set wlan0
up" still returns the same error ("RTNETLINK answers: Connection timed
out").

--
wzyboy
Emmanuel Grumbach
2013-11-10 11:32:27 UTC
Permalink
On Sun, Nov 10, 2013 at 12:19 PM, wzyboy <***@wzyboy.org> wrote:
> 2013/11/9 Bjorn Helgaas <***@google.com>:
>> it might
>> be interesting to do "echo on >
>> /sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
>> makes a difference.
>
>
> Hi, should I run this command after the bug? I just ran it after the
> bug occurred, but there was no output in dmesg, and "ip link set wlan0
> up" still returns the same error ("RTNETLINK answers: Connection timed
> out").
>
> --

HW people seem to want to know what happens if you disable L1 substates.
Can you enter your BIOS and check whether it has such an option?
wzyboy
2013-11-10 11:38:05 UTC
Permalink
2013/11/10 Emmanuel Grumbach <***@gmail.com>:
> HW people seem to want to know what happens if you disable L1 substates.
> Can you enter your BIOS and check whether it has such an option?


Could you be more specific about what the option name looks like? Or I could
take photos of each tab in the BIOS and attach the photos here.

--
wzyboy
Grumbach, Emmanuel
2013-11-10 11:41:07 UTC
Permalink
>
> 2013/11/10 Emmanuel Grumbach <***@gmail.com>:
> > HW people seem to want to know what happens if you disable L1 substates.
> > Can you enter your BIOS and check whether it has such an option?
>
>
> Could you be more specific about what the option name looks like? Or I could take
> photos of each tab in the BIOS and attach the photos here.
>

It is really called L1 PM Substate.
I don't really know the BIOS of ThinkPad... But we can try...
wzyboy
2013-11-10 12:13:22 UTC
Permalink
2013/11/10 Grumbach, Emmanuel <***@intel.com>:
> It is really called L1 PM Substate.
> I don't really know the BIOS of ThinkPad... But we can try...


I have booted into BIOS and wrote down almost all the options, as attached.

By the way, I always attach my laptop to an AC adaptor.


--
wzyboy
Grumbach, Emmanuel
2013-11-10 12:17:33 UTC
Permalink
>
> 2013/11/10 Grumbach, Emmanuel <***@intel.com>:
> > It is really called L1 PM Substate.
> > I don't really know the BIOS of ThinkPad... But we can try...
>
>
> I have booted into BIOS and wrote down almost all the options, as attached.
>
> By the way, I always attach my laptop to an AC adaptor.
>

Cool.... they mask all the interesting options... oh well...
Emmanuel Grumbach
2013-11-11 09:43:55 UTC
Permalink
>>
>> 2013/11/10 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
>> > It is really called L1 PM Substate.
>> > I don't really know the BIOS of ThinkPad... But we can try...
>>
>>
>> I have booted into BIOS and wrote down almost all the options, as attached.
>>
>> By the way, I always attach my laptop to an AC adaptor.
>>
>
> Cool.... they mask all the interesting options... oh well...

Can you please try this?

diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
index ebe351d..f8fbe08 100644
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -131,7 +131,7 @@ static void iwl_pcie_apm_config(struct iwl_trans *trans)
 	 * power savings, even without L1.
 	 */
 	pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_LNKCTL, &lctl);
-	if (lctl & PCI_EXP_LNKCTL_ASPM_L1) {
+	if (0) {
 		/* L1-ASPM enabled; disable(!) L0S */
 		iwl_set_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
 		dev_info(trans->dev, "L1 Enabled; Disabling L0S\n");
wzyboy
2013-11-11 11:40:00 UTC
Permalink
2013/11/11 Emmanuel Grumbach <egrumbach-***@public.gmane.org>:
> Can you please try this?


Hi, thanks for your patch. I re-compiled my kernel with the patch and
benchmarked it, but sadly the bug still exists...

After booting into the new kernel, I tried to download a 4.0 GB file
from the Internet with wget:

HTTP-Anforderung gesendet, warte auf Antwort... 200 OK
Länge: 4268605440 (4,0G) [application/octet-stream]
In »»./en_windows_server_2012_r2_x64_dvd_2707946.iso«« speichern.

63% [=====> ] 2.713.761.088 --.-K/s ETA 19m 59s ^C

The download speed averaged 2.3 MB/s until the bug suddenly occurred...

The dmesg log and lspci -vvxxx output are attached. They look no different from before...

--
wzyboy
Bjorn Helgaas
2013-11-11 21:55:53 UTC
Permalink
On Sun, Nov 10, 2013 at 06:19:58PM +0800, wzyboy wrote:
> 2013/11/9 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> > it might
> > be interesting to do "echo on >
> > /sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
> > makes a difference.
>
> Hi, should I run this command after the bug? I just ran it after the
> bug occurred, but there was no output in dmesg, and "ip link set wlan0
> up" still returns the same error ("RTNETLINK answers: Connection timed
> out").

Run the command before the bug occurs. The idea is to disable run-time
power management. If the problem is that we're turning off power to
the device, disabling power management might make a difference.
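
A minimal sketch of that step, assuming the NIC sits at 0000:03:00.0 as
in the command quoted above. The function name set_runtime_pm_on and
the SYSFS_ROOT override are made up for illustration (the override only
exists so the function can be dry-run against a scratch directory).
The power/control file accepts "auto" (runtime PM allowed) and "on"
(device kept powered).

```bash
#!/usr/bin/env bash
# Sketch: keep a PCI device powered by setting its runtime-PM policy to "on".
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
set_runtime_pm_on() {
    local bdf="$1"
    local ctl="${SYSFS_ROOT:-/sys}/bus/pci/devices/${bdf}/power/control"

    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (wrong address, or not root?)" >&2
        return 1
    fi

    echo "before: $(cat "$ctl")"
    echo on > "$ctl"                 # "on" = never runtime-suspend this device
    echo "after:  $(cat "$ctl")"
}

# Example (as root, right after boot and before starting the download test):
#   set_runtime_pm_on 0000:03:00.0
```

The setting does not survive a reboot, so it has to be applied again
on every boot before trying to reproduce the failure.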

Bjorn
wzyboy
2013-11-09 02:46:21 UTC
Permalink
2013/11/9 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> Thanks. But can you please attach the output of "lspci -vvxxx" (not
> "-vxxxx") for the entire system before the problem occurs?


Sorry I used the wrong command...

I've attached the output of -vvxxx below.

There are three files:

* lspci.vvxxx.normal.txt: When the interface is "state DOWN" in "ip link".
* lspci.vvxxx.normal2.txt: When the interface is "state UP" in "ip
link" after I ran "ip link set wlan0 up".
* lspci.vvxxx.normal3.txt: When the interface is connected to the
Wi-Fi of my dormitory and got an address (but without default
gateway, I'm using wired network now).

--
wzyboy
Emmanuel Grumbach
2013-11-10 07:03:16 UTC
Permalink
>
>
> Sorry I used the wrong command...
>
> I've attached the output of -vvxxx below.
>
> There are three files:
>
> * lspci.vvxxx.normal.txt: When the interface is "state DOWN" in "ip link".
> * lspci.vvxxx.normal2.txt: When the interface is "state UP" in "ip
> link" after I ran "ip link set wlan0 up".
> * lspci.vvxxx.normal3.txt: When the interface is connected to the
> Wi-Fi of my dormitory and got an address (but without default
> gateway, I'm using wired network now).
>

one more thing?
Are you using KVM with pass-through? Or is it a native installation?
wzyboy
2013-11-10 07:08:32 UTC
Permalink
2013/11/10 Emmanuel Grumbach <***@gmail.com>:
> one more thing?
> Are you using KVM with pass-through? Or is it a native installation?


No, I'm using a native installation. I just wiped the pre-installed
Windows OS and booted from Arch Linux installation disk, set up LUKS
(without LVM) and installed the system.

However, as you can see in dmesg "modules linked in", I have
VirtualBox installed and "vboxdrv.ko" loaded.

--
wzyboy
Bjorn Helgaas
2013-11-11 22:44:39 UTC
Permalink
On Sat, Nov 09, 2013 at 10:46:21AM +0800, wzyboy wrote:
> 2013/11/9 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> > Thanks. But can you please attach the output of "lspci -vvxxx" (not
> > "-vxxxx") for the entire system before the problem occurs?
>
>
> Sorry I used the wrong command...
>
> I've attached the output of -vvxxx below.
>
> There are three files:
>
> * lspci.vvxxx.normal.txt: When the interface is "state DOWN" in "ip link".
> * lspci.vvxxx.normal2.txt: When the interface is "state UP" in "ip
> link" after I ran "ip link set wlan0 up".
> * lspci.vvxxx.normal3.txt: When the interface is connected to the
> Wi-Fi of my dormitory and got an address (but without default
> gateway, I'm using wired network now).

The only interesting difference is this (between "normal" and "normal3"):

--- lspci.vvxxx.normal.txt 2013-11-11 14:42:14.000000000 -0700
+++ lspci.vvxxx.normal3.txt 2013-11-11 14:42:14.000000000 -0700

00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3 (rev e4) (prog-if 00 [Normal decode])
- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
+ LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train+ SlotClk+ DLActive+ BWMgmt+ ABWMgmt-

In "normal3", the Link Training bit is set. I'm not a hardware person,
but my guess is that this might be normal. The spec says Link Training
indicates that the "LTSSM is in the Configuration or Recovery state,"
and Figure 5-1 shows that the transition from L1 to L0 goes through
the Recovery state. So we might just be seeing the device returning
from L1 to L0. Maybe Emmanuel can confirm this with the hardware guys.

Comparing "lspci.vvxxx.normal.txt" with "lspci.vvxxx.patched.bug.txt",
I see these changes in the 00:1c.1 Downstream Port (the bridge that
leads to the 7260 NIC):

--- before 2013-11-11 15:24:04.755738964 -0700
+++ after 2013-11-11 15:24:11.875722068 -0700
00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3 (rev e4) (prog-if 00 [Normal decode])
- DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
+ DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
+ LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
- Changed: MRL- PresDet- LinkState+
+ SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
+ Changed: MRL- PresDet+ LinkState+
- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-

So when the bug occurs,

- Correctable Error Detected is set
- Data Link Layer Link Active is cleared
- Presence Detect State is cleared
- LTR Mechanism Enable is cleared (spec says this bit must be
reset to the default value when a Downstream Port goes to
DL_Down)
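
The comparison above was done by eye; a rough sketch of how it could
be mechanised follows. The helper name flag_changes is made up, and it
assumes both lines are the same lspci field with the same tokens in the
same order, so only the trailing +/- of each flag can differ.

```bash
#!/usr/bin/env bash
# Print every lspci status flag whose +/- state differs between two lines.
# flag_changes is a hypothetical helper; commas are stripped before comparing.
flag_changes() {
    local -a before after
    local i re='[+-]$'
    read -r -a before <<< "${1//,/}"
    read -r -a after  <<< "${2//,/}"
    for i in "${!before[@]}"; do
        # Only tokens ending in + or - are flags; skip "Speed", "x1", etc.
        if [[ ${before[i]} =~ $re && ${before[i]} != "${after[i]:-}" ]]; then
            echo "${before[i]} -> ${after[i]:-missing}"
        fi
    done
}

# The two DevSta lines from the 00:1c.1 comparison above:
flag_changes \
    "DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-" \
    "DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-"
# prints: CorrErr- -> CorrErr+
```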

This all seems consistent with the device being powered off. Maybe
the 7260 is on a daughterboard with a bad connection to the system
board? Any chance you can open up the box and make sure the
connection is tight?

It's possible there's some ASPM issue, but I would think Presence
Detect would still work even if the 7260 had a problem with ASPM.
Here's another experiment to try to rule out ASPM. Run these
commands as root after the driver is loaded but before the bug occurs:

setpci -s03:00.0 0x50.W=0x140
setpci -s00:1c.1 0x50.W=0x040
lspci -vv

This should disable ASPM completely on that link, and the lspci output
will help verify that.
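
To see why those two values turn ASPM off, here is a small sketch that
decodes the ASPM Control field of a Link Control value. It assumes, as
the setpci commands above imply, that offset 0x50 is the Link Control
register on both devices. ASPM Control is bits 1:0 of that register;
the bits that remain set in the written values are 0x040 (Common Clock
Configuration) and 0x100 (Enable Clock Power Management). The function
name aspm_state is made up for illustration.

```bash
#!/usr/bin/env bash
# Decode the ASPM Control field (bits 1:0) of a PCIe Link Control value.
# aspm_state is a hypothetical helper name.
aspm_state() {
    local lnkctl=$(( $1 ))
    case $(( lnkctl & 0x3 )) in
        0) echo "disabled" ;;
        1) echo "L0s" ;;
        2) echo "L1" ;;
        3) echo "L0s+L1" ;;
    esac
}

aspm_state 0x140   # value written to the 7260 at 03:00.0       -> disabled
aspm_state 0x040   # value written to the root port at 00:1c.1  -> disabled
```

Reading the register back with "setpci -s03:00.0 0x50.W" and passing
the result (prefixed with 0x) to the function gives a quick check that
the write took effect.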

Bjorn
wzyboy
2013-11-12 05:42:35 UTC
Permalink
Hi, I've got some good news. Here is what I did today:

boot up laptop -> do the sysfs trick -> start downloading a big file
to benchmark it -> several minutes later the bug occurs -> reboot my
laptop to recover -> do the setpci trick -> start downloading a big
file to benchmark it -> more than 5 GiB downloaded (at ~ 2.3 MiB/s)
and everything works fine!

Here is the output of lspci -vv after running the two "setpci" commands.

There is also a screenshot of ThinkPad X240s HMM, showing how the
wireless card is connected to the motherboard. In this figure #10 is
the Wireless LAN card. It is connected to the motherboard with Intel's
NGFF connector.

I will continue downloading big files to benchmark it.

--
wzyboy
Emmanuel Grumbach
2013-11-12 07:02:51 UTC
Permalink
On Tue, Nov 12, 2013 at 7:42 AM, wzyboy <***@wzyboy.org> wrote:
> Hi, I've got some good news. Here is what I did today:
>
> boot up laptop -> do the sysfs trick -> start downloading a big file
> to benchmark it -> several minutes later the bug occurs -> reboot my
> laptop to recover -> do the setpci trick -> start downloading a big
> file to benchmark it -> more than 5 GiB downloaded (at ~ 2.3 MiB/s)
> and everything works fine!

encouraging. Thanks.
I just wonder... the patch I sent was supposed to tell the HW not to
use L1. So I would have hoped it would have helped in the same way?
After all, L1 is a handshake between the device and the bridge, so
that if the device doesn't initiate / refuses to go into L1, I'd
expect it to have the same effect as disabling L1 in the ASPM register
PCIe config space?
Obviously I am wrong though.

>
> Here is the output of lspci -vv after running the two "setpci" commands.
>
> There is also a screenshot of ThinkPad X240s HMM, showing how the
> wireless card is connected to the motherboard. In this figure #10 is
> the Wireless LAN card. It is connected to the motherboard with Intel's
> NGFF connector.
>
> I will continue downloading big files to benchmark it.
>

Thanks

> --
> wzyboy
Emmanuel Grumbach
2013-11-12 09:36:40 UTC
Permalink
On Tue, Nov 12, 2013 at 9:02 AM, Emmanuel Grumbach <egrumbach-***@public.gmane.org> wrote:
> On Tue, Nov 12, 2013 at 7:42 AM, wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org> wrote:
>> Hi, I've got some good news. Here is what I did today:
>>
>> boot up laptop -> do the sysfs trick -> start downloading a big file
>> to benchmark it -> several minutes later the bug occurs -> reboot my
>> laptop to recover -> do the setpci trick -> start downloading a big
>> file to benchmark it -> more than 5 GiB downloaded (at ~ 2.3 MiB/s)
>> and everything works fine!
>
> encouraging. Thanks.
> I just wonder... the patch I sent was supposed to tell the HW not to
> use L1. So I would have hoped it would have helped in the same way?
> After all, L1 is a handshake between the device and the bridge, so
> that if the device doesn't initiate / refuses to go into L1, I'd
> expect it to have the same effect as disabling L1 in the ASPM register
> PCIe config space?
> Obviously I am wrong though.
>

Can you please check whether a BIOS update is available? (It seems that all
the BIOS update tools run on Windows... but there may be a bootable CD
:))

You can also check this out:
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=TOOL-ASU
This might help to remove support for L1 substates.
I guess it'd be nice to ask Lenovo too about how to find these options
in BIOS. From our experience, there are a lot of features and options
in BIOS that are accessible only after you enter a "secret" (I meant
obscure) sequence of keys.

>>
>> Here is the output of lspci -vv after running the two "setpci" commands.
>>
>> There is also a screenshot of ThinkPad X240s HMM, showing how the
>> wireless card is connected to the motherboard. In this figure #10 is
>> the Wireless LAN card. It is connected to the motherboard with Intel's
>> NGFF connector.
>>
>> I will continue downloading big files to benchmark it.
>>
>
> Thanks
>
>> --
>> wzyboy
wzyboy
2013-11-12 12:10:56 UTC
Permalink
2013/11/12 wzyboy <***@wzyboy.org>:
> I will continue downloading big files to benchmark it.

Hi guys, good news!

Six hours ago I started a simple loop script to repeatedly download big
files (saving them to /dev/null) and went to class. Six hours
later, after school, I found that the wireless still works!

So I believe that the two "setpci" commands really work! Thanks Bjorn
and Emmanuel!
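
The actual script was not posted; a loop of that kind might look
roughly like the sketch below. The function name stress_download, the
FETCH override and the example URL are all placeholders and are not
taken from the thread.

```bash
#!/usr/bin/env bash
# Repeatedly fetch a large file and discard it, stopping at the first failure.
# FETCH is a hypothetical override; by default wget writes to /dev/null.
stress_download() {
    local url="$1" rounds="${2:-100}" i
    for (( i = 1; i <= rounds; i++ )); do
        if ! ${FETCH:-wget -q -O /dev/null} "$url"; then
            echo "round $i failed at $(date)" >&2
            return 1
        fi
        echo "round $i ok"
    done
}

# Example (placeholder URL):
#   stress_download http://example.com/big-file.iso 50
```

Stopping at the first failure leaves a timestamp that can be matched
against the dmesg log afterwards.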

--
wzyboy
Grumbach, Emmanuel
2013-11-12 12:16:06 UTC
Permalink
>
> 2013/11/12 wzyboy <***@wzyboy.org>:
> > I will continue downloading big files to benchmark it.
>
> Hi guys, good news!
>
> Six hours ago I ran a simple loop script to repeatedly download big files (and
> saving to /dev/null) and went to have lessons. Six hours later it's after school.
> I found that the wireless still works!
>
> So I believe that the two "setpci" commands really work! Thanks Bjorn and
> Emmanuel!
>

Well... I haven't done much, but the setpci isn't really a solution - it is more of a workaround.
Bjorn is basically disabling the L1 PCIe feature, which saves power. While you might not care, I do :)
The HW folks here would still want to know if you can disable the L1 substates feature (not that I know what it is - but I can guess).
If you can try to:
* upgrade your BIOS (if needed)
* check the advanced options I sent to you to see if you can unlock the advanced menu in your BIOS

it'd help me to understand the issue.
In any case, I am happy that you have a way to reliably use your NIC now.
wzyboy
2013-11-12 12:25:40 UTC
Permalink
2013/11/12 Grumbach, Emmanuel <***@intel.com>:
> Well... I haven't done much, but the setpci isn't really a solution - it is more of a workaround.
> Bjorn is basically disabling the L1 PCIe feature, which saves power. While you might not care, I do :)
> The HW folks here would still want to know if you can disable the L1 substates feature (not that I know what it is - but I can guess).
> If you can try to:
> * upgrade your BIOS (if needed)
> * check the advanced options I sent to you to see if you can unlock the advanced menu in your BIOS
>
> it'd help me to understand the issue.
> In any case, I am happy that you have a way to reliably use your NIC now.


Oh, actually I upgraded my BIOS to newest version the day I got this
laptop (12 days ago).

ThinkPad really changed a lot after being acquired by Lenovo...

As for ASU, I've downloaded it, but I do not know how to show hidden
BIOS options with it since I have little knowledge of hardware...

Here is what I got when running ./asu64 dump:

***@xenien:~/Desktop/asu$ sudo ./asu64 dump
IBM Advanced Settings Utility version 9.41.81K
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2013 All Rights Reserved
0 1 2 3 4 5 6 7 8 9 A B C D E F
00: 00>00*00*00*00*00*67*8b*45*06*67*f6*45*ec*01*75
10: 03*b8*00*f0*8e*d8*67*8e*45*04*67*8b*7d*02*33*c0
20: 26*89*45*04*a1*61*09*26*89*45*02*a0*63*09*26*88
30: 45*01*26*c6*05*01*b8*00*00*c3*b8*82*00*c3*b8*82
40: 00*c3*b8*82*00*c3*b8*82*00*c3*b8*00*10*26*89*45
50: 0d*f8*c3*9c*66*60*e4*60*eb*00*eb*00*66*61*9d*c3
60: 1e*b8*40*00*8e*d8*f6*06*10*04*04*74*03*1f*f8*c3
70: 1f*f9*c3*f8*c3*25*00*00*41*d0*00*00*08*00*00*03
80: 00*22*04*00*47*01*20*00*20*00*00*02*47*01*a0*00
90: a0*00*00*02*79*00*79*00*79*00*45*00*00*41*d0*02
a0: 00*08*01*00*03*00*2a*10*00*47*01*00*00*00*00*00
b0: 10*47*01*81*00*81*00*00*03*47*01*87*00*87*00*00
c0: 01*47*01*89*00*89*00*00*03*47*01*8f*00*8f*00*00
d0: 03*47*01*c0*00*c0*00*00*20*79*00*79*00*79*00*1d
e0: 00*00*41*d0*01*00*08*02*01*03*00*22*01*00*47*01
f0: 40*00*40*00*00*04*79*00*79*00*79*00*1d*00*00*41


Could you help by telling me which command I should run to enable those
hidden options in BIOS?
--
wzyboy
Grumbach, Emmanuel
2013-11-12 12:45:11 UTC
Permalink
wzyboy
2013-11-12 12:59:21 UTC
Permalink
2013/11/12 Grumbach, Emmanuel <***@intel.com>:
> If you say so :) - I don't know :)
>
I checked Lenovo's page again:
http://support.lenovo.com/en_US/downloads/detail.page?DocID=DS035950

It says the latest BIOS version is 2.12, and this is the version I am using...
>
> I have no clue unfortunately - maybe contact Lenovo?

I sent an email to Lenovo support and am waiting for a reply. I hope the
customer service is as good as in the IBM era...

--
wzyboy
Bjorn Helgaas
2013-11-12 18:14:31 UTC
Permalink
On Tue, Nov 12, 2013 at 5:16 AM, Grumbach, Emmanuel
<emmanuel.grumbach-***@public.gmane.org> wrote:
>>
>> 2013/11/12 wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>:
>> > I will continue downloading big files to benchmark it.
>>
>> Hi guys, good news!
>>
>> Six hours ago I ran a simple loop script to repeatedly download big files (and
>> saving to /dev/null) and went to have lessons. Six hours later it's after school.
>> I found that the wireless still works!
>>
>> So I believe that the two "setpci" commands really work! Thanks Bjorn and
>> Emmanuel!
>
> Well... I haven't done much, but the setpci isn't really a solution - it is more of a workaround.
> Bjorn is basically disabling the L1 PCIe feature, which saves power. While you might not care, I do :)
> The HW folks here would still want to know if you can disable the L1 substates feature (not that I know what it is - but I can guess).
> If you can try to:
> * upgrade your BIOS (if needed)
> * check the advanced options I sent to you to see if you can unlock the advanced menu in your BIOS
>
> it'd help me to understand the issue.
> In any case, I am happy that you have a way to reliably use your NIC now.

The setpci experiment was only for debugging. Obviously it's not a
real fix, and it doesn't help any other users of this ThinkPad X240s.
But it does seem clear that the problem is related to ASPM.

And it looks like the same thing we investigated here:
https://bugzilla.kernel.org/show_bug.cgi?id=57331, which is even on
the same device.

From your dmesg logs:

ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
acpi PNP0A08:00: ACPI _OSC control for PCIe not granted, disabling ASPM

The messages are misleading. Linux does not actually disable ASPM as
you did with setpci. All Linux does is leave the current ASPM
configuration untouched, because we believe that the BIOS is managing
it. The BIOS must have enabled ASPM on this device (you could verify
this by booting with "pci=earlydump"), and BIOS also says the OS must
not enable ASPM control (via the ACPI FADT table and the PCI host
bridge _OSC method).
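
As a rough illustration of the firmware side of this: the FADT "IA-PC Boot Architecture Flags" field carries a bit that tells the OS not to enable ASPM control. The sketch below assumes that bit is bit 4 (mask 0x0010, which Linux names ACPI_FADT_NO_ASPM); the helper name and the example flag values are hypothetical and were not read from the X240s. It covers only the FADT half of the check, not the _OSC negotiation.

```python
# Bit 4 of the FADT IA-PC Boot Architecture Flags ("PCIe ASPM Controls").
# When set, the firmware asks the OS not to enable ASPM control itself.
ACPI_FADT_NO_ASPM = 1 << 4


def os_may_manage_aspm(boot_arch_flags: int) -> bool:
    """Return True if the FADT flags leave ASPM control to the OS.

    Hypothetical helper for illustration; not a kernel function.
    """
    return not (boot_arch_flags & ACPI_FADT_NO_ASPM)


# Illustrative values only.
print(os_may_manage_aspm(0x0003))  # ASPM bit clear
print(os_may_manage_aspm(0x0013))  # ASPM bit set
```

When the bit is set, the OS leaves whatever ASPM configuration the BIOS programmed in place, which is the situation described above.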

It would really help if you still had Windows on this system, and we
could look and see whether it disables ASPM for this device (if
anybody does have Windows, I would probably use AIDA64 to dump the PCI
config space). I did experiments for bug 57331 that suggested that
Windows leaves ASPM alone just like Linux does, but my experiments
were on qemu, not on real hardware, and I didn't have an Intel wifi
device.

If the Windows driver works fine even with ASPM enabled, that would
suggest that the problem is something in the Linux iwlwifi driver. If
Windows does actually disable ASPM on this device, then we would have
to figure out if there's a way we can safely make Linux also disable
ASPM.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Grumbach, Emmanuel
2013-11-12 18:25:50 UTC
Permalink
>
> On Tue, Nov 12, 2013 at 5:16 AM, Grumbach, Emmanuel
> <***@intel.com> wrote:
> >>
> >> 2013/11/12 wzyboy <***@wzyboy.org>:
> >> > I will continue downloading big files to benchmark it.
> >>
> >> Hi guys, good news!
> >>
> >> Six hours ago I ran a simple loop script to repeatedly download big
> >> files (saving them to /dev/null) and went to class. Six hours later,
> >> after school, I found that the wireless still works!
> >>
> >> So I believe that the two "setpci" commands really work! Thanks Bjorn
> >> and Emmanuel!
> >
> > Well... I haven't done much, but the setpci isn't really a solution - it is
> > more of a workaround.
> > Bjorn is basically disabling the L1 PCIe feature, which saves power. While
> > you might not care, I do :) The HW folks here would still want to know if
> > you can disable the L1 substates feature (not that I know what it is - but
> > I can guess).
> > If you can try to:
> > * upgrade your BIOS (if needed)
> > * check the advanced options I sent to you to see if you can unlock
> > the advanced menu in your BIOS
> >
> > it'd help me to understand the issue.
> > In any case, I am happy that you have a way to reliably use your NIC now.
>
> The setpci experiment was only for debugging. Obviously it's not a real fix,
> and it doesn't help any other users of this ThinkPad X240s.
> But it does seem clear that the problem is related to ASPM.
>
> And it looks like the same thing we investigated here:
> https://bugzilla.kernel.org/show_bug.cgi?id=57331, which is even on the
> same device.
>

Not the same device but the same driver.

> From your dmesg logs:
>
> ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> acpi PNP0A08:00: ACPI _OSC control for PCIe not granted, disabling ASPM
>

Right - I remember the discussion we had on that.
On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...
This code is new in 3.12, and is not in 3.11. The first log that the user here sent is on 3.11, hence you still see the error message from PCI subsystem.
Now (3.12) the code reads:

	if (!cfg->base_params->pcie_l1_allowed) {
		/*
		 * W/A - seems to solve weird behavior. We need to remove this
		 * if we don't want to stay in L1 all the time. This wastes a
		 * lot of power.
		 */
		pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S |
				       PCIE_LINK_STATE_L1 |
				       PCIE_LINK_STATE_CLKPM);
	}

and the if is *not* taken for the 7260, which is the device we are talking about.

> The messages are misleading. Linux does not actually disable ASPM as you
> did with setpci. All Linux does is leave the current ASPM configuration
> untouched, because we believe that the BIOS is managing it. The BIOS must
> have enabled ASPM on this device (you could verify this by booting with
> "pci=earlydump"), and BIOS also says the OS must not enable ASPM control
> (via the ACPI FADT table and the PCI host bridge _OSC method).
>
> It would really help if you still had Windows on this system, and we could look
> and see whether it disables ASPM for this device (if anybody does have
> Windows, I would probably use AIDA64 to dump the PCI config space). I did
> experiments for bug 57331 that suggested that Windows leaves ASPM alone
> just like Linux does, but my experiments were on qemu, not on real
> hardware, and I didn't have an Intel wifi device.
>
> If the Windows driver works fine even with ASPM enabled, that would
> suggest that the problem is something in the Linux iwlwifi driver. If Windows
> does actually disable ASPM on this device, then we would have to figure out
> if there's a way we can safely make Linux also disable ASPM.
>
Bjorn Helgaas
2013-11-12 19:14:24 UTC
Permalink
On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
<emmanuel.grumbach-***@public.gmane.org> wrote:

> Right - I remember the discussion we had on that.
> On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...

If ASPM is supposed to work as far as the hardware is concerned, I
guess you're saying this must be an iwlwifi driver issue. Right?

If you think it's a PCI core problem, we have to figure out what the
core needs to do differently. If somebody can point to a difference
in the ASPM configuration between Windows and Linux, that would be a
good start.

Bjorn
Emmanuel Grumbach
2013-11-12 19:37:08 UTC
Permalink
On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
> <emmanuel.grumbach-***@public.gmane.org> wrote:
>
>> Right - I remember the discussion we had on that.
>> On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...
>
> If ASPM is supposed to work as far as the hardware is concerned, I
> guess you're saying this must be an iwlwifi driver issue. Right?

ASPM is supposed to work as far as the hardware is concerned.
We might very well have an issue in iwlwifi - and I am checking this
internally with our System guys.
It can be a PCI core problem too, and it could also be a platform / BIOS
/ Lenovo issue.
Of course, I have no clue which of these is the culprit here.
Our System folks seemed to say that this new device uses L1 substates,
which can be enabled on the Haswell platform that the user owns.
Now - L1 substates is a new feature and might introduce issues
(apparently) - and this is why they (System folks) wanted to try
without L1 substates. But disabling L1 substates doesn't seem trivial
with Lenovo's production BIOS. So I am pretty stuck here.
Another possibility is to run a PCI analyser on the machine, but that
requires having the machine in the lab...

> If you think it's a PCI core problem, we have to figure out what the
> core needs to do differently. If somebody can point to a difference
> in the ASPM configuration between Windows and Linux, that would be a
> good start.
>
> Bjorn
>
Bjorn Helgaas
2013-11-12 22:09:14 UTC
Permalink
On Tue, Nov 12, 2013 at 12:37 PM, Emmanuel Grumbach <egrumbach-***@public.gmane.org> wrote:
> On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
>> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
>> <emmanuel.grumbach-***@public.gmane.org> wrote:
>>
>>> Right - I remember the discussion we had on that.
>>> On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...
>>
>> If ASPM is supposed to work as far as the hardware is concerned, I
>> guess you're saying this must be an iwlwifi driver issue. Right?
>
> ASPM is supposed to work as far as the hardware is concerned.
> We might very well have an issue in iwlwifi - and I am checking this
> internally with our System guys.
> It can be a PCI core problem too, and it could also be a platform / BIOS
> / Lenovo issue.
> Of course, I have no clue which of these is the culprit here.
> Our System folks seemed to say that this new device uses L1 substates
> which can be enabled in Haswell platform which the user owns.
> Now - L1 substates is a new feature and might introduce issues
> (apparently) - and this is why they (System folks) wanted to try
> without L1 substates. But disabling L1 substates doesn't seem trivial
> with the production BIOS of Lenovo. So I am pretty stuck here.

For debugging purposes, we could configure L1 substates with setpci,
as we did for ASPM. The Linux kernel knows nothing about L1
substates, so the PCI core isn't doing anything with them. It's
possible the driver itself could muck with L1 substate configuration,
but that would be discouraged, and I don't see anything in iwlwifi
that is doing that.

The lspci output in
https://bugzilla.kernel.org/attachment.cgi?id=114061 shows an L1 PM
Substates extended capability (capability ID 0x1e) for the Root Port
leading to the 7260 device, but not for the 7260 device itself:

00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root
Port 3 (rev e4) (prog-if 00 [Normal decode])
Capabilities: [200 v1] #1e

Per sec 5.5.4 of the ECN for L1 PM Substates (15 Aug 2012), I think L1
substates must be configured on both ends of the link, and if the 7260
device doesn't have that capability, I don't see how it could be
enabled.

The lspci version wzyboy has doesn't decode the L1 PM Substates
capability, but there is a newer version at
git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git that should
decode it. Also, "lspci -vvxxx" didn't hexdump this capability, which
should be at offset 0x200. Using "lspci -xxxx" (four "x"s) should
dump it, and we can decode it manually.

wzyboy, can you run these commands before the bug occurs and before
using the "setpci" workaround:

lspci -vvxxxx -s00:1c.1
lspci -vvxxxx -s03:00.0

Bjorn
wzyboy
2013-11-13 05:39:07 UTC
Permalink
2013/11/13 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> wzyboy, can you run these commands before the bug occurs and before
> using the "setpci" workaround:
>
> lspci -vvxxxx -s00:1c.1
> lspci -vvxxxx -s03:00.0

After this morning's lessons I booted my laptop with the pci=earlydump
kernel parameter; here are the outputs of lspci (without setpci and
before the bug hit) and dmesg.

--
wzyboy
wzyboy
2013-11-13 06:46:04 UTC
Permalink
2013/11/13 wzyboy <***@wzyboy.org>:
> After today's morning lessons I booted up my laptop with pci=earlydump
> kernel parameter and here are the output of lspci (without setpci and
> before bug hit) and dmesg.


Hi, I have a question: the "setpci" workaround now lets me use my
NIC without having to reboot my laptop from time to time. However,
that was under Linux 3.12 with Grumbach's patch. I'm wondering whether
the "setpci" workaround still works with the official Linux 3.12 kernel?

I'll try the official Linux 3.12 later.

--
wzyboy
wzyboy
2013-11-13 07:25:08 UTC
Permalink
2013/11/13 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> It'll work on any kernel, including 3.11 you had before
>
> Sent from my phone.

Thanks, Grumbach. I'm now on Linux 3.12 without your patch. The
"setpci" workaround still works!

--
wzyboy
Bjorn Helgaas
2013-11-13 17:42:37 UTC
Permalink
[+cc Bjørn]

On Tue, Nov 12, 2013 at 10:39 PM, wzyboy <***@wzyboy.org> wrote:
> 2013/11/13 Bjorn Helgaas <***@google.com>:
>> wzyboy, can you run these commands before the bug occurs and before
>> using the "setpci" workaround:
>>
>> lspci -vvxxxx -s00:1c.1
>> lspci -vvxxxx -s03:00.0
>
> After today's morning lessons I booted up my laptop with pci=earlydump
> kernel parameter and here are the output of lspci (without setpci and
> before bug hit) and dmesg.

Thanks. Are you 100% sure the lspci output is before the setpci
workaround? The dmesg earlydump shows this (the ASPM control bits are
in the 16-bit Link Control register, which is at 0x50 for both
devices):

00:1c.1 Root Port config
50: 42 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
03:00.0 Intel 7260 config:
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00

So at boot-time, ASPM was enabled. But the lspci shows:

00:1c.1 Root Port config
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
LnkCtl: ASPM Disabled
50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
03:00.0 Intel 7260 config
Capabilities: [40] Express (v2) Endpoint, MSI 00
LnkCtl: ASPM Disabled
50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00

And now ASPM is disabled. I'm pretty sure the kernel did not disable
ASPM, so I assume it was disabled by setpci.
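
The comparison above can be reproduced mechanically. This is a minimal sketch, assuming the hexdump lines have the "OFFSET: xx xx ..." shape shown in this message and that the ASPM Control field is bits 1:0 of the 16-bit little-endian Link Control register at 0x50; the function names are made up for illustration.

```python
ASPM_CONTROL_MASK = 0x0003  # Link Control bits 1:0
ASPM_STATES = {
    0: "Disabled",
    1: "L0s Enabled",
    2: "L1 Enabled",
    3: "L0s L1 Enabled",
}


def parse_hexdump_line(line: str):
    """Split an 'OFFSET: xx xx ...' line into (offset, bytes)."""
    offset_text, _, byte_text = line.partition(":")
    return int(offset_text, 16), bytes.fromhex(byte_text.strip())


def aspm_state(line: str, link_control_offset: int = 0x50) -> str:
    """Decode the ASPM Control field from a hexdump row containing Link Control."""
    base, data = parse_hexdump_line(line)
    start = link_control_offset - base
    link_control = int.from_bytes(data[start:start + 2], "little")
    return ASPM_STATES[link_control & ASPM_CONTROL_MASK]


# Rows copied from the earlydump (boot time) and from the later lspci output.
print(aspm_state("50: 42 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00"))
print(aspm_state("50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00"))
```

The only byte that differs between the two rows is the first one (0x42 versus 0x40), and that difference is exactly the ASPM L1 bit.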

I manually decoded the L1 PM Substates registers for both the Root
Port and the 7260 device (details appended below), and everything
appears enabled there (though I think that since ASPM is disabled, L1
PM substates is ignored).

My conclusion is that the BIOS enabled both ASPM and L1 PM substates.
Obviously the BIOS will do the same when booting Windows, and I assume
the device works fine with the Windows driver. Based on our previous
experience with Windows, I don't think it will change the ASPM
configuration because the ACPI FADT table and the PCI _OSC method do
not grant control of ASPM to the OS. Therefore, I think the problem
is in the Linux iwlwifi driver.

I don't think there's anything more I can do here because there's no
evidence that the PCI core is doing anything wrong. But if it turns
out that we *should* be doing something differently, let me know.

Bjorn


00:1c.1 Root Port config
  Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
    LnkCtl: ASPM Disabled
  50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
  Capabilities: [200 v1] #1e
  200: 1e 00 01 00 1f 28 28 00 1f 28 a0 40 f0 00 00 00
  210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Header 0x0001001e
      ID 0x001e (L1 Substates)
      Version 1
    Capabilities 0x0028281f
    Control 1 0x40a0281f
      0x40a0281f Control 1
      0x00000001 PCI-PM L1.2 Enable
      0x00000002 PCI-PM L1.1 Enable
      0x00000004 ASPM L1.2 Enable
      0x00000008 ASPM L1.1 Enable
      0x00000010 RsvdP
      0x00002800 Common_Mode_Restore_Time
      0x00a00000 LTR_L1.2_THRESHOLD_Value
      0x40000000 LTR_L1.2_THRESHOLD_Scale
    Control 2 0x000000f0
      0x000000f0 Control 2
      0x000000f0 T_POWER_ON Value

03:00.0 Intel 7260 config
  Capabilities: [40] Express (v2) Endpoint, MSI 00
    LnkCtl: ASPM Disabled
  50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
  Capabilities: [154 v1] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
  150:             0b 00 01 00 fe ca 41 01 1f 1e f0 00
  160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
    Header 0x0001000b
      ID 0x000b (Vendor-specific)
      Version 1
    Vendor-specific header 0x0141cafe
      VSEC ID 0xcafe
      VSEC Rev 1
      Length 0x14
    Capabilities 0x00f01e1f
    Control 1 0x40a0000f
      0x40a0000f Control 1
      0x00000001 PCI-PM L1.2 Enable
      0x00000002 PCI-PM L1.1 Enable
      0x00000004 ASPM L1.2 Enable
      0x00000008 ASPM L1.1 Enable
    Control 2 0x000000f0
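
The manual decode above can be double-checked with a few lines of Python. This is a sketch only: the bit positions are taken from the breakdown in this message (enable bits 3:0, Common_Mode_Restore_Time in bits 15:8, LTR_L1.2_THRESHOLD_Value in bits 25:16, LTR_L1.2_THRESHOLD_Scale in bits 31:29), and the function and key names are hypothetical.

```python
L1SS_CTL1_ENABLE_BITS = {
    0x00000001: "PCI-PM L1.2",
    0x00000002: "PCI-PM L1.1",
    0x00000004: "ASPM L1.2",
    0x00000008: "ASPM L1.1",
}


def decode_l1ss_control1(value: int) -> dict:
    """Break an L1 PM Substates Control 1 dword into its fields."""
    enabled = [name for mask, name in L1SS_CTL1_ENABLE_BITS.items() if value & mask]
    return {
        "enabled": enabled,
        "common_mode_restore_time": (value >> 8) & 0xFF,
        "ltr_l1_2_threshold_value": (value >> 16) & 0x3FF,
        "ltr_l1_2_threshold_scale": (value >> 29) & 0x7,
    }


# Control 1 values from the two dumps above.
for name, value in (("root port", 0x40A0281F), ("7260", 0x40A0000F)):
    print(name, decode_l1ss_control1(value))
```

Both values have all four enable bits set, which matches the statement that everything appears enabled on both ends of the link.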
Grumbach, Emmanuel
2013-11-13 20:30:57 UTC
Permalink
>
> [+cc Bjørn]
>
> On Tue, Nov 12, 2013 at 10:39 PM, wzyboy <***@wzyboy.org> wrote:
> > 2013/11/13 Bjorn Helgaas <***@google.com>:
> >> wzyboy, can you run these commands before the bug occurs and before
> >> using the "setpci" workaround:
> >>
> >> lspci -vvxxxx -s00:1c.1
> >> lspci -vvxxxx -s03:00.0
> >
> > After today's morning lessons I booted up my laptop with pci=earlydump
> > kernel parameter and here are the output of lspci (without setpci and
> > before bug hit) and dmesg.
>
> Thanks. Are you 100% sure the lspci output is before the setpci workaround?
> The dmesg earlydump shows this (the ASPM control bits are in the 16-bit Link
> Control register, which is at 0x50 for both devices):
>
> 00:1c.1 Root Port config
> 50: 42 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
> 03:00.0 Intel 7260 config:
> 50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
>
> So at boot-time, ASPM was enabled. But the lspci shows:
>
> 00:1c.1 Root Port config
> Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
> 03:00.0 Intel 7260 config
> Capabilities: [40] Express (v2) Endpoint, MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
>
> And now ASPM is disabled. I'm pretty sure the kernel did not disable ASPM,
> so I assume it was disabled by setpci.
>
> I manually decoded the L1 PM Substates registers for both the Root Port and
> the 7260 device (details appended below), and everything appears enabled
> there (though I think that since ASPM is disabled, L1 PM substates is
> ignored).
>
> My conclusion is that the BIOS enabled both ASPM and L1 PM substates.
> Obviously the BIOS will do the same when booting Windows, and I assume
> the device works fine with the Windows driver. Based on our previous
> experience with Windows, I don't think it will change the ASPM configuration
> because the ACPI FADT table and the PCI _OSC method do not grant control
> of ASPM to the OS. Therefore, I think the problem is in the Linux iwlwifi
> driver.
>
> I don't think there's anything more I can do here because there's no
> evidence that the PCI core is doing anything wrong. But if it turns out that we
> *should* be doing something differently, let me know.
>

Right - no evidence of anything. Thanks a lot for all your help; I have learnt a lot from this thread.
I guess I'll try to disable L1 PM substates with a setpci command and see what happens.



> Bjorn
>
>
> 00:1c.1 Root Port config
> Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
> Capabilities: [200 v1] #1e
> 200: 1e 00 01 00 1f 28 28 00 1f 28 a0 40 f0 00 00 00
> 210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Header 0x0001001e
> ID 0x001e (L1 Substates)
> Version 1
> Capabilities 0x0028281f
> Control 1 0x40a0281f
> 0x40a0281f Control 1
> 0x00000001 PCI-PM L1.2 Enable
> 0x00000002 PCI-PM L1.1 Enable
> 0x00000004 ASPM L1.2 Enable
> 0x00000008 ASPM L1.1 Enable
> 0x00000010 RsvdP
> 0x00002800 Common_Mode_Restore_Time
> 0x00a00000 LTR_L1.2_THRESHOLD_Value
> 0x40000000 LTR_L1.2_THRESHOLD_Scale
> Control 2 0x000000f0
> 0x000000f0 Control 2
> 0x000000f0 T_POWER_ON Value
>
> 03:00.0 Intel 7260 config
> Capabilities: [40] Express (v2) Endpoint, MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
> Capabilities: [154 v1] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
> 150: 0b 00 01 00 fe ca 41 01 1f 1e f0 00
> 160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
> Header 0x0001000b
> ID 0x000b (Vendor-specific)
> Version 1
> Vendor-specific header 0x0141cafe
> VSEC ID 0xcafe
> VSEC Rev 1
> Length 0x14
> Capabilities 0x00f01e1f
> Control 1 0x40a0000f
> 0x40a0000f Control 1
> 0x00000001 PCI-PM L1.2 Enable
> 0x00000002 PCI-PM L1.1 Enable
> 0x00000004 ASPM L1.2 Enable
> 0x00000008 ASPM L1.1 Enable
> Control 2 0x000000f0
wzyboy
2013-11-14 04:23:46 UTC
Permalink
2013/11/14 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> Thanks. Are you 100% sure the lspci output is before the setpci
> workaround?


To make sure, I did it again:

boot up laptop -> connect to dormitory's WiFi -> run lspci -> run
setpci -> run dmesg.

Here are the outputs.
--
wzyboy
Grumbach, Emmanuel
2013-11-14 06:20:48 UTC
Permalink
>
> 2013/11/14 Bjorn Helgaas <***@google.com>:
> > Thanks. Are you 100% sure the lspci output is before the setpci
> > workaround?
>
>
> To ensure that, I did it again:
>
> boot up laptop -> connect to dormitory's WiFi -> run lspci -> run setpci -> run
> dmesg.
>
> Here are the outputs.

Can you please try the following:
* Boot without any changes
* setpci -s03:00.0 0x160.B=0x00
* setpci -s00:1c.1 0x204.B=0x10
lspci -vv

and tell me if WiFi works then.
(this replaces the previous setpci commands)
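
To see what these two byte writes change, the arithmetic can be sketched in Python. The "before" values are the dwords at 0x160 (7260, Control 1 of the vendor-specific capability) and 0x204 (Root Port, L1 PM Substates Capabilities) from Bjorn's register dump earlier in the thread. The sketch assumes the hardware accepts the write as given; write_byte is a hypothetical helper that only models a one-byte (".B") write, and how the hardware interprets the new Root Port value is not stated in the thread.

```python
def write_byte(dword: int, byte_index: int, new_byte: int) -> int:
    """Return dword with one byte replaced (byte_index 0 is the lowest address)."""
    shift = 8 * byte_index
    cleared = dword & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((new_byte & 0xFF) << shift)


L1SS_ENABLE_MASK = 0x0000000F  # the four L1.1 / L1.2 enable bits

# setpci -s03:00.0 0x160.B=0x00
device_ctl1 = write_byte(0x40A0000F, 0, 0x00)
print(hex(device_ctl1), "enable bits:", hex(device_ctl1 & L1SS_ENABLE_MASK))

# setpci -s00:1c.1 0x204.B=0x10
root_port_0x204 = write_byte(0x0028281F, 0, 0x10)
print(hex(root_port_0x204))
```

On the 7260 side the write clears all four substate enable bits while leaving the threshold fields in the upper bytes untouched.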
wzyboy
2013-11-14 07:01:57 UTC
Permalink
2013/11/14 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> Can you please try the following:
> * Boot without any changes
> * setpci -s03:00.0 0x160.B=0x00
> * setpci -s00:1c.1 0x204.B=0x10
> lspci -vv
>
> and tell me if WiFi works then.
> (this replaces the previous setpci commands)
>
>
> Thank you


boot up (no connection) -> run new setpci commands -> lspci and dmesg
-> connect to dormitory's WiFi -> download 3.5 GiB data -> works fine!

--
wzyboy
Grumbach, Emmanuel
2013-11-14 07:04:45 UTC
Permalink
> 2013/11/14 Grumbach, Emmanuel <***@intel.com>:
> > Can you please try the following:
> > * Boot without any changes
> > * setpci -s03:00.0 0x160.B=0x00
> > * setpci -s00:1c.1 0x204.B=0x10
> > lspci -vv
> >
> > and tell me if WiFi works then.
> > (this replaces the previous setpci commands)
> >
> >
> > Thank you
>
>
> boot up (no connection) -> run new setpci commands -> lspci and dmesg
> -> connect to dormitory's WiFi -> download 3.5 GiB data -> works fine!
>

Awesome - you have L1 enabled:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSync
wzyboy
2013-11-14 07:09:36 UTC
Permalink
2013/11/14 Grumbach, Emmanuel <***@intel.com>:
> Awesome - you have L1 enabled:


I do not fully understand it, but it seems like good news :-)

Does that mean you found a real fix instead of a workaround? Congratulations!

--
wzyboy
Grumbach, Emmanuel
2013-11-14 08:39:57 UTC
Permalink
> > Awesome - you have L1 enabled:
>
>
> Though do not understand but it seems like a good news :-)
>

You are saving power (and I know that at least, L1 works).
L1 PM substates doesn't work (but that's a brand new feature)

> Does that mean you find a real fix instead of a workaround? Congratulations!

No. The real fix should come from the driver (unlikely from what I hear from system people here) or from disabling it in the BIOS.
So I guess you'd need to ask Lenovo how to disable L1 PM Substate in BIOS.
Bjorn Helgaas
2013-11-14 17:53:58 UTC
Permalink
On Thu, Nov 14, 2013 at 1:39 AM, Grumbach, Emmanuel
<emmanuel.grumbach-***@public.gmane.org> wrote:
>> > Awesome - you have L1 enabled:
>>
>> Though do not understand but it seems like a good news :-)
>
> You are saving power (and I know that at least, L1 works).
> L1 PM substates doesn't work (but that's a brand new feature)
>
>> Does that mean you find a real fix instead of a workaround? Congratulations!
>
> No. The real fix should come from the driver (unlikely from what I hear from system people here) or disable in BIOS.
> So I guess you'd need to ask Lenovo how to disable L1 PM Substate in BIOS.

Why would it be unlikely to fix the driver? Do people think the
problem is not actually in the driver?

Asking Lenovo how to disable L1 PM substates is really a non-answer.
Only the extremely technical and extremely patient user (hi wzyboy :))
will even bother to investigate why wifi works fine with Windows but
not with Linux. The only thing Lenovo *could* do is to release a new
BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
would never do that because then I would have to tell customers
"disable this for Linux, enable this for Windows," and I'd have to
deal with support calls about devices using more power than they
should, battery life being shorter, etc. Plus you'd have to ask every
Linux user to upgrade their BIOS. That's all just a terrible user
experience.

Bjorn
wzyboy
2013-11-15 03:06:58 UTC
Permalink
2013/11/15 Bjorn Helgaas <***@google.com>:
> Why would it be unlikely to fix the driver? Do people think the
> problem is not actually in the driver?
>
> Asking Lenovo how to disable L1 PM substates is really a non-answer.
> Only the extremely technical and extremely patient user (hi wzyboy :))
> will even bother to investigate why wifi works fine with Windows but
> not with Linux. The only thing Lenovo *could* do is to release a new
> BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
> would never do that because then I would have to tell customers
> "disable this for Linux, enable this for Windows," and I'd have to
> deal with support calls about devices using more power than they
> should, battery life being shorter, etc. Plus you'd have to ask every
> Linux user to upgrade their BIOS. That's all just a terrible user
> experience.


I am a little confused. There are two sets of "setpci" commands, both
of which let me use my NIC reliably. But you two say they are
just workarounds, not real fixes.

I know the "side effect" of the first two "setpci" commands is consuming
more power. (Actually, in my experience of running on battery, I did
not notice ...)

But Grumbach said that the second two "setpci" commands leave "L1" enabled.
Does that mean they save power? So what's the "side effect" of the second
two "setpci" commands?

IMHO, if this lets users use their NIC reliably, maybe Grumbach could
write these commands into the iwlwifi driver and run them when a 7260 is
detected...

BTW, no replies from Lenovo yet.

--
wzyboy
wzyboy
2013-11-15 03:09:36 UTC
Permalink
2013/11/15 wzyboy <wzyboy-a4GgARivpFQdnm+***@public.gmane.org>:
> IMHO, if this could user use their NIC reliably, maybe Grumbach may
> write these commands to iwlwifi driver and run them when 7260 is
> detected...


Or maybe you could add an option that enables this "workaround" if the
user wants it. A user could simply write an /etc/modprobe.d/iwlwifi.conf
that enables this "workaround", to use their NIC without having to
reboot from time to time...

--
wzyboy
Emmanuel Grumbach
2013-11-15 08:49:44 UTC
Permalink
On 11/15/2013 05:06 AM, wzyboy wrote:
> 2013/11/15 Bjorn Helgaas <***@google.com>:
>> Why would it be unlikely to fix the driver? Do people think the
>> problem is not actually in the driver?
>>
>> Asking Lenovo how to disable L1 PM substates is really a non-answer.
>> Only the extremely technical and extremely patient user (hi wzyboy :))
>> will even bother to investigate why wifi works fine with Windows but
>> not with Linux. The only thing Lenovo *could* do is to release a new
>> BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
>> would never do that because then I would have to tell customers
>> "disable this for Linux, enable this for Windows," and I'd have to
>> deal with support calls about devices using more power than they
>> should, battery life being shorter, etc. Plus you'd have to ask every
>> Linux user to upgrade their BIOS. That's all just a terrible user
>> experience.
>
>
> I am a little confused. There are two sets of "setpci" commands, both
> of which can make me use my NIC reliably. But you two say they are
> just workarounds, not real fixes.
>

Right - because they force a mode that the BIOS doesn't allow. The BIOS
doesn't allow the OS (the driver) to decide in what mode to work - so we
cannot achieve the same effect as the setpci command from the OS / driver
level. setpci just accesses the HW directly without asking anyone's
permission.

> I know the "side effect" of first two "setpci" commands is consuming
> more power. (Actually by my experience of running on battery, I did
> not notice ...)
>
> But Grumbach said after the second two "setpci" commands enables "L1".
> Does it mean it saves power? So what's the "side effect" of second two
> "setpci" commands?
>

They are both the same in terms of side-effects. The first set of setpci
commands will disable L1 altogether - meaning you don't save any power.
The second set of setpci commands doesn't disable L1, but disables a more
subtle power state (actually several), which are defined as L1 PM substates.
In these substates, you save less power than in L1 (I think) but you are
more likely to be able to reach them. After all, it is always the same
story - the deeper you sleep, the longer it takes to wake up. And if it
takes longer to wake up, it also means that in several cases you won't
choose to go to sleep. So the way PCI folks help to save power even in
cases where you cannot go to a deep sleep is to define states in the
middle in which you save less power, but in which you are more likely to
be. Again - time spent in each state and power saved in each state trade
off.
Now:
L1 - deep sleep
L1 PM substate - something in the middle.

First setpci command - disable both features.
Second setpci command - disable only the second feature.

Regarding side effects... I don't think this is really "dangerous". But
this is not a fix, in the sense that I wouldn't want to deploy millions of
machines like that. The risk you have here is probably bad timing: the
setpci commands might run exactly when the link is in a state that setpci
disables. That would be bad. How bad? Probably it would just require a
reboot - or, worst case, G3 (take the battery off).

> IMHO, if this could user use their NIC reliably, maybe Grumbach may
> write these commands to iwlwifi driver and run them when 7260 is
> detected...

I can't, as explained above.

>
> BTW, no replies from Lenovo, yet.
>


> Or maybe you could add an option, which enables this "workaround" if
> user wants. A user could simply write a /etc/modprobe.d/iwlwifi.conf
> and enable this "workaround", to use their NICs without having to
> reboot from time to time...

same.
wzyboy
2013-11-15 09:04:26 UTC
Permalink
Thanks a lot for the explanation, Emmanuel!

Now I finally know why this is a "catch-22" situation: disabling those
features from the OS/driver cannot be as neat as disabling them directly
in the BIOS. And there is a chance that disabling them at a bad time
may cause G3...

--
wzyboy


2013/11/15 Emmanuel Grumbach <***@gmail.com>:
>
>
> On 11/15/2013 05:06 AM, wzyboy wrote:
>> 2013/11/15 Bjorn Helgaas <***@google.com>:
>>> Why would it be unlikely to fix the driver? Do people think the
>>> problem is not actually in the driver?
>>>
>>> Asking Lenovo how to disable L1 PM substates is really a non-answer.
>>> Only the extremely technical and extremely patient user (hi wzyboy :))
>>> will even bother to investigate why wifi works fine with Windows but
>>> not with Linux. The only thing Lenovo *could* do is to release a new
>>> BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
>>> would never do that because then I would have to tell customers
>>> "disable this for Linux, enable this for Windows," and I'd have to
>>> deal with support calls about devices using more power than they
>>> should, battery life being shorter, etc. Plus you'd have to ask every
>>> Linux user to upgrade their BIOS. That's all just a terrible user
>>> experience.
>>
>>
>> I am a little confused. There are two sets of "setpci" commands, both
>> of which can make me use my NIC reliably. But you two say they are
>> just workarounds, not real fixes.
>>
>
> Right - because they force a mode that the BIOS doesn't allow. The BIOS
> doesn't allow the OS (the driver) to decide in what mode to work - so we
> cannot reach the same effect as the setpci command from the OS / driver
> level. setpci just directly accesses the HW without asking the
> permissions of anyone.
>
>> I know the "side effect" of first two "setpci" commands is consuming
>> more power. (Actually by my experience of running on battery, I did
>> not notice ...)
>>
>> But Grumbach said after the second two "setpci" commands enables "L1".
>> Does it mean it saves power? So what's the "side effect" of second two
>> "setpci" commands?
>>
>
> They are both the same in terms of side-effects. The first set of setpci
> commands will disable L1 altogether - meaning you don't save any power.
> The second set of setpci doesn't disable L1, but disables a more subtle
> power state (actually several), which are defined as L1 PM substates. In
> these substates, you save less power than in L1 (I think) but you are
> more likely to be able to reach them. After all, it is always the same
> story - the deeper you sleep, the longer it takes to wake up. And if it
> takes longer to wake up, it also means that in several cases you won't
> choose to go to sleep. So the way PCI folks help to save power even in
> cases where you cannot go to a deep sleep is to define states in the
> middle in which you save less power, but in which you are more likely to
> be. Again - time spent in each state and power saved in each state trade
> off.
> Now:
> L1 - deep sleep
> L1 PM substate - something in the middle.
>
> First setpci command - disable both features.
> Second setpci command - disable only the second feature.
>
> Regarding side effects... I don't think this is really "dangerous". But
> this is not a fix in the way that I wouldn't like to deploy millions of
> machines like that. The risk you have here is probably to have a bad
> timing and have the setpci commands run exactly when the link is in a
> state that setpci disables. That would be bad. How bad? Probably would
> just require a reboot - or worst case G3 (take the battery off).
>
>> IMHO, if this could user use their NIC reliably, maybe Grumbach may
>> write these commands to iwlwifi driver and run them when 7260 is
>> detected...
>
> I can't, as explained above.
>
>>
>> BTW, no replies from Lenovo, yet.
>>
>
>
>> Or maybe you could add an option, which enables this "workaround" if
>> user wants. A user could simply write a /etc/modprobe.d/iwlwifi.conf
>> and enable this "workaround", to use their NICs without having to
>> reboot from time to time...
>
> same.
>
Emmanuel Grumbach
2013-12-25 08:27:58 UTC
Permalink
On Fri, Nov 15, 2013 at 11:04 AM, wzyboy <***@wzyboy.org> wrote:
> Thanks a lot for the explanation, Emmanuel!
>
> Now I finally know why this is a "catch-22" situation: disabling those
> features from the OS/driver cannot be as clean as disabling them
> directly in the BIOS. And there is a chance that disabling them at a
> bad time may require a G3...
>
> --

Back to you.
Can you please try without the setpci commands and apply this patch instead:


diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c b/drivers/net/wireless/iwlwifi/pcie/tx.c
index 079a511..e8a52f3 100644
--- a/drivers/net/wireless/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/iwlwifi/pcie/tx.c
@@ -707,6 +707,8 @@ void iwl_pcie_tx_start(struct iwl_trans *trans, u32 scd_base_addr)
 	iwl_write_direct32(trans, FH_TX_CHICKEN_BITS_REG,
 			   reg_val | FH_TX_CHICKEN_BITS_SCD_AUTO_RETRY_EN);

+	iwl_set_bits_prph(trans, 0xa04068, 0x8);
+
 	/* Enable L1-Active */
 	iwl_clear_bits_prph(trans, APMG_PCIDEV_STT_REG,
 			    APMG_PCIDEV_STT_VAL_L1_ACT_DIS);


thanks
Grumbach, Emmanuel
2013-12-25 10:38:20 UTC
Permalink
>
>
> Glad to see you again :-)
>
> I've compiled Linux 3.12.5 with your patch, removed "setpci" trick and
> rebooted.
>
> During the boot of new kernel, I can see additional (error) messages among
> "systemd-fsck" lines but I was not fast enough to take photos for them
> before they disappeared (flushed away) by tty login interface.
>
> After logging in, I find that netcfg did not connect to dormitory's Wi-Fi as
> before. I run "lspci -vvxxx" and find that the interface is filled with "ff". I've
> attached the output of "lspci -vvxxx" and "dmesg".
>

So it didn't work - ok. I am not surprised, but I still wanted to know.
This patch is supposed to fix some timing issue in the wake up from L1.
Thanks for testing.

>
> (And here is something "fun": I reverted my kernel to Arch's official
> 3.12.5 and rebooted, and the interface is totally missing! I mean, it
> disappeared from the output of "ip link". I cannot even see it in "lspci -
> vvxxx", not even "ff". The strange effect vanished after one more reboot
> and a cold boot.)

Great -
wzyboy
2013-12-28 09:54:55 UTC
Permalink
Hi,

yesterday a friend of mine told me that one can now install Windows
8/8.1 on a USB device natively. So I tried this so-called "Windows To
Go" technology.

Now I have successfully deployed a Windows 8.1 installation on my
external USB HDD and booted my laptop up with it. That is to say, I
can now observe how Intel 7260 acts under Windows.

--
wzyboy

Could you tell me what I should do to gather debugging information,
such as the L1 mode, in Windows? I will send it to you then. Maybe
this could help to figure out what the bug in iwlwifi is.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
wzyboy
2013-12-28 09:57:39 UTC
Permalink
(Sorry for sending this mail multiple times...)

Hi,

yesterday a friend of mine told me that one can now install Windows
8/8.1 on a USB device natively. So I tried this so-called "Windows To
Go" technology.

Now I have successfully deployed a Windows 8.1 installation on my
external USB HDD and booted my laptop up with it. That is to say, I
can now observe how Intel 7260 acts under Windows.

Could you tell me what I should do to gather debugging information,
such as the L1 mode, in Windows? I will send it to you then. Maybe
this could help to figure out what the bug in iwlwifi is.

--
wzyboy
Grumbach, Emmanuel
2013-12-29 08:14:50 UTC
Permalink
wzyboy
2013-12-29 09:23:20 UTC
Permalink
2013/12/29 Grumbach, Emmanuel <***@intel.com>:
> http://rweverything.com/


Here is the output of "PCI" and "PCI Index" for the Intel Wireless device.


--
wzyboy
Emmanuel Grumbach
2013-12-29 11:45:19 UTC
Permalink
>
>
> Here are the output of "PCI" and "PCI Index" of Intel Wireless.
>
>

looks like all the power features are enabled... including the ones I
told you to disable.
I am lost now...
wzyboy
2013-12-29 13:06:24 UTC
Permalink
2013/12/29 Emmanuel Grumbach <egrumbach-***@public.gmane.org>:
> looks like all the power features are enabled... including the ones I
> told you to disable.
> I am lost now...


Oh... That sounds bad. But I thought both the Windows and Linux drivers
for this NIC are written by Intel?

--
wzyboy
Bjorn Helgaas
2014-01-02 21:34:17 UTC
Permalink
On Sun, Dec 29, 2013 at 2:23 AM, wzyboy <***@wzyboy.org> wrote:
> 2013/12/29 Grumbach, Emmanuel <***@intel.com>:
>> http://rweverything.com/
>
>
> Here are the output of "PCI" and "PCI Index" of Intel Wireless.

ASPM must be configured on both ends of the link, so for completeness,
can you also collect the "PCI" output for the bridge leading to the
7260 device? Based on the Linux lspci output, this should be
0000:00:1c.1.

And I assume the device works well with the Windows driver?

Bjorn
wzyboy
2014-01-04 14:41:24 UTC
Permalink
2014/1/3 Bjorn Helgaas <***@google.com>:
> ASPM must be configured on both ends of the link, so for completeness,
> can you also collect the "PCI" output for the bridge leading to the
> 7260 device? Based on the Linux lspci output, this should be
> 0000:00:1c.1.
>
> And I assume the device works well with the Windows driver?


Here are the "PCI" and "PCI Index" data for 0000:00:1c.1.

And yes, the NIC works well in Windows.

--
wzyboy
Grumbach, Emmanuel
2014-01-13 06:01:46 UTC
Permalink
wzyboy
2014-01-13 06:15:54 UTC
Permalink
2014/1/13 Grumbach, Emmanuel <***@intel.com>:
> Small update from the bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=64541).
> The bug is solved... There seems to be a hardware bug with the L1 OFF exit timer. To work around it, we must *not* rely on the internal clock and must keep using the external clock. The internal clock isn't reliable enough and can lead to loss of synchronization between the bridge and the device upon L1 OFF transition.
> This issue had been seen in simulation and not on real hardware... until now... The Windows driver has a workaround for this hardware bug, which is why the issue wasn't seen on Windows. I am porting the workaround to the Linux driver.
>
> Thank you wzyboy for your patience...
>
> The end.


So this is a hardware bug instead of a driver bug? Oh...

I'm glad this is finally solved; it has been open since 2013-11-03.
Thanks to all for providing me with workarounds, without which I could
only use the wired network. :-)

Cheers.

--
wzyboy
wzyboy
2014-01-13 07:56:24 UTC
Permalink
Hi,

I am sorry, but here is some bad news.

During the previous debugging process, I had a modconf file,
/etc/modprobe.d/iwlwifi.conf, containing "options iwlmvm
power_scheme=1". I removed it just now (Emmanuel says I can remove it
now) and encountered (maybe) new bugs. Here is what I did just now:

0. Current kernel: patched with
https://bugzilla.kernel.org/attachment.cgi?id=121671&action=diff ;
setpci trick: none ; NIC status: works nice after ~16 hours heavy
usage.
1. Delete that modconf file, reboot.
2. Network connection becomes painfully laggy and lossy.
3. Re-create that modconf file, reboot.
4. Network connection works fine.
5. Comment out that line, reboot.
6. Network connection becomes painfully laggy and lossy.
7. Uncomment that line, reboot.
8. Network connection works fine.

What I mean by "painfully laggy and lossy" is that, whatever I "ping"
(Google, 8.8.8.8, local DNS server...), the RTT is much higher than
normal, and the packet loss rate is above 90% (100% for some
addresses). At the same time, other network devices on the same LAN
work fine.

I've attached dmesg and lspci output at step 6 and 8.


--
wzyboy
Grumbach, Emmanuel
2014-01-13 08:26:09 UTC
Permalink
> Hi,
>
> I am sorry but here is bad news.
>
> During previous debugging process, I have a modconf file
> /etc/modprobe.d/iwlwifi.conf, containing "options iwlmvm
> power_scheme=1". I removed it just now (Emmanuel says I can remove it
> now) and encountered (maybe) new bugs. Here is what I did just now:
>
> 0. Current kernel: patched with
> https://bugzilla.kernel.org/attachment.cgi?id=121671&action=diff ;
> setpci trick: none ; NIC status: works nice after ~16 hours heavy
> usage.
> 1. Delete that modconf file, reboot.
> 2. Network connection becomes painfully laggy and lossy.
> 3. Re-create that modconf file, reboot.
> 4. Network connection works fine.
> 5. Comment out that line, reboot.
> 6. Network connection becomes painfully laggy and lossy.
> 7. Uncomment that line, reboot.
> 8. Network connection works fine.
>
> What I mean "painfully laggy and lossy" is that, to whomever I "ping"
> (Google, 8.8.8.8, local DNS server...), the RTT is rather high than
> normal, and packet loss rate is above 90% (some addresses 100% loss).
> While at the same time, other network device in the same LAN works
> fine.
>
> I've attached dmesg and lspci output at step 6 and 8.
>

Are you sure about step 1 and 5?
It seems completely weird that an existing file with a line commented out would have any impact.
Can you please send the output of:
cat /sys/module/iwlmvm/parameters/power_scheme
in both cases.

Also - what code base are you using?
Since this is surely not related to PCI, please remove them in your reply.
(I keep them here to have them see my mail :))
wzyboy
2014-01-13 08:50:56 UTC
Permalink
2014/1/13 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> Are you sure about step 1 and 5?
> It seems completely weird that an existing file with a line commented out have any impact.
> Can you please send the output of:
> cat /sys/module/iwlmvm/parameters/power_scheme
> in both cases.
>

Hi, the point is:

* With "options iwlmvm power_scheme=1" -> everything's fine
* Without "options iwlmvm power_scheme=1" -> network is bad

Absence of the modconf file and a modconf file with only one comment
line have the *same* effect - network is bad.

cat /sys/module/iwlmvm/parameters/power_scheme now returns 1, and the
network is good.
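
To illustrate why a missing file and a file with only a commented-out line behave the same, here is a minimal Python sketch of how "options" lines in a modprobe.d file could be read. It is a simplification under stated assumptions, not how modprobe itself is implemented: the helper name active_module_options is hypothetical, line continuations with a trailing backslash are not handled, and only blank lines and lines starting with '#' are skipped.

```python
def active_module_options(conf_text, module):
    """Return {param: value} from uncommented 'options <module> ...' lines.

    Hypothetical helper for illustration only; it is not part of modprobe.
    """
    params = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines contribute nothing
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            for item in fields[2:]:
                key, _, value = item.partition("=")
                params[key] = value
    return params


# The three situations from the test sequence in this thread.
with_option = "options iwlmvm power_scheme=1\n"
commented_out = "# options iwlmvm power_scheme=1\n"
missing_file = ""  # no file at all is modelled as empty text

print(active_module_options(with_option, "iwlmvm"))    # {'power_scheme': '1'}
print(active_module_options(commented_out, "iwlmvm"))  # {}
print(active_module_options(missing_file, "iwlmvm"))   # {}
```

In the last two cases no parameter is passed to the module, so the driver falls back to its built-in default for power_scheme. The value actually in effect can be confirmed with the "cat /sys/module/iwlmvm/parameters/power_scheme" command quoted above.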

> Also - what code base are you using?

What is "code base"...?

> Since this is surely not related to PCI, please remove them in your reply.
> (I keep them here to have them see my mail :))

Done. :-)



English is not my native language so I checked if I misuse the phrase
"comment out": http://english.stackexchange.com/questions/33483/when-i-say-comment-out-does-it-mean-to-uncomment-something-or-comment-it

Seems not...

--
wzyboy
Grumbach, Emmanuel
2014-01-13 08:59:37 UTC
Permalink
wzyboy
2014-01-13 09:05:53 UTC
Permalink
2014/1/13 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> What kernel version :)
> If you use 3.13, you can try a newer firmware.
>

I applied your patch against Arch Linux's stock kernel 3.12.7, using
this PKGBUILD: https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/linux&id=5e2f3b10dc4be40da8f1fe355bc871d7936ec2d8

***@xenien:~$ uname -a
Linux xenien.wzyboy.im 3.12.7-1-ARCH #1 SMP PREEMPT Sun Jan 12
20:38:55 CST 2014 x86_64 GNU/Linux
***@xenien:~$ cat /proc/version
Linux version 3.12.7-1-ARCH (wzyboy-***@public.gmane.org) (gcc version
4.8.2 20131219 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Jan 12 20:38:55
CST 2014



--
wzyboy
Grumbach, Emmanuel
2014-01-13 10:51:43 UTC
Permalink
wzyboy
2014-01-13 10:56:14 UTC
Permalink
2014/1/13 Grumbach, Emmanuel <emmanuel.grumbach-***@public.gmane.org>:
> Are you using Bluetooth at the same time?

No, I have had Bluetooth disabled in the BIOS since Nov 2013. I have never used it.

> Also - it might be worth to try wireless-testing and the latest firmware.
> While you seem to be the first to report issues about power save on this firmware, we know that a lot of issues have been fixed in the latest firmware (-8.ucode) which is supported in 3.13 only. But if you change kernel, go for wireless-testing.
> Thanks.

Okay, I'll put this down and follow up when Linux 3.13 goes stable.

Thanks.

--
wzyboy
Bjorn Helgaas
2014-01-13 17:29:30 UTC
Permalink
[-cc linux-pci]

On Mon, Jan 13, 2014 at 1:26 AM, Grumbach, Emmanuel
<emmanuel.grumbach-***@public.gmane.org> wrote:
>> Hi,
>>
>> I am sorry but here is bad news.
>>
>> During previous debugging process, I have a modconf file
>> /etc/modprobe.d/iwlwifi.conf, containing "options iwlmvm
>> power_scheme=1". I removed it just now (Emmanuel says I can remove it
>> now) and encountered (maybe) new bugs. Here is what I did just now:
>>
>> 0. Current kernel: patched with
>> https://bugzilla.kernel.org/attachment.cgi?id=121671&action=diff ;
>> setpci trick: none ; NIC status: works nice after ~16 hours heavy
>> usage.
>> 1. Delete that modconf file, reboot.
>> 2. Network connection becomes painfully laggy and lossy.
>> 3. Re-create that modconf file, reboot.
>> 4. Network connection works fine.
>> 5. Comment out that line, reboot.
>> 6. Network connection becomes painfully laggy and lossy.
>> 7. Uncomment that line, reboot.
>> 8. Network connection works fine.
>>
>> What I mean "painfully laggy and lossy" is that, to whomever I "ping"
>> (Google, 8.8.8.8, local DNS server...), the RTT is rather high than
>> normal, and packet loss rate is above 90% (some addresses 100% loss).
>> While at the same time, other network device in the same LAN works
>> fine.
>>
>> I've attached dmesg and lspci output at step 6 and 8.
>>
>
> Are you sure about step 1 and 5?
> It seems completely weird that an existing file with a line commented out have any impact.

It doesn't seem strange to me; maybe we're interpreting wzyboy's data
differently. The way I read it, if /etc/modprobe.d/iwlwifi.conf
contains "options iwlmvm power_scheme=1", everything works fine (cases
0, 4, 8). If iwlwifi.conf does not exist or contains only a
commented-out line, he sees problems (cases 2, 6).

> Can you please send the output of:
> cat /sys/module/iwlmvm/parameters/power_scheme
> in both cases.
>
> Also - what code base are you using?
> Since this is surely not related to PCI, please remove them in your reply.
> (I keep them here to have them see my mail :))

Agreed, this doesn't sound PCI-related, so I removed linux-pci. Feel
free to keep me or add me back if you do see anything PCI-related.

Bjorn
Sascha Weaver
2014-01-14 03:56:11 UTC
Permalink
2014/1/14 Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+***@public.gmane.org>:
> It doesn't seem strange to me; maybe we're interpreting wzyboy's data
> differently. The way I read it, if /etc/modprobe.d/iwlwifi.conf
> contains "options iwlmvm power_scheme=1", everything works fine (cases
> 0, 4, 8). If iwlwifi.conf does not exist or contains only a
> commented-out line, he sees problems (cases 2, 6).
>

Yes, Bjorn got my point :-)

With "options iwlmvm power_scheme=1" -> network is good
Without "options iwlmvm power_scheme=1" -> network is bad

Currently I am using Linux 3.12.7 with Emmanuel's patch, and without
any setpci tricks.

Emmanuel says a new firmware is available in 3.13 so I will follow up
when 3.13 is released in my distro's repo.



--
Sascha Weaver
wzyboy
2013-12-25 10:34:40 UTC
Permalink
2013/12/25 Emmanuel Grumbach <egrumbach-***@public.gmane.org>:
> Back to you.
> Can you please try not to do the setpci and add this:


Glad to see you again :-)

I've compiled Linux 3.12.5 with your patch, removed "setpci" trick and rebooted.

During the boot of the new kernel, I could see additional (error)
messages among the "systemd-fsck" lines, but I was not fast enough to
take photos of them before they were flushed away by the tty login
interface.

After logging in, I found that netcfg did not connect to the dormitory's
Wi-Fi as before. I ran "lspci -vvxxx" and found that the interface is
filled with "ff". I've attached the output of "lspci -vvxxx" and
"dmesg".




(And here is something "fun": I reverted my kernel to Arch's official
3.12.5 and rebooted, and the interface is totally missing! I mean, it
disappeared from the output of "ip link". I cannot even see it in
"lspci -vvxxx", not even "ff". The strange effect vanished after one
more reboot and a cold boot.)

--
wzyboy
Grumbach, Emmanuel
2013-11-13 08:45:48 UTC
Permalink
>
> On Tue, Nov 12, 2013 at 12:37 PM, Emmanuel Grumbach
> <***@gmail.com> wrote:
> > On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
> >> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
> >> <***@intel.com> wrote:
> >>
> >>> Right - I remember the discussion we had on that.
> >>> On this device (7260 that has an issue with ASPM), we don't call
> pci_disable_link_state, because we know it is supposed to work...
> >>
> >> If ASPM is supposed to work as far as the hardware is concerned, I
> >> guess you're saying this must be an iwlwifi driver issue. Right?
> >
> > ASPM is supposed to work as far as the hardware is concerned.
> > We might very well have an issue in iwlwifi - and I am checking this
> > internally with our System guys.
> > It can be a PCI core problem too, and it could also be a platform /
> > BIOS / Lenovo issue.
> > Of course, I have no clue which of these is the culprit here.
> > Our System folks seemed to say that this new device uses L1 substates
> > which can be enabled in Haswell platform which the user owns.
> > Now - L1 substates is a new feature and might introduce issues
> > (apparently) - and this is why they (System folks) wanted the try
> > without L1 substates. But disabling L1 substates doesn't seem trivial
> > with the production BIOS of Lenovo. So I am pretty stuck here.
>
> For debugging purposes, we could configure L1 substates with setpci, as we
> did for ASPM. The Linux kernel knows nothing about L1 substates, so the PCI
> core isn't doing anything with them. It's possible the driver itself could muck
> with L1 substate configuration, but that would be discouraged, and I don't
> see anything in iwlwifi that is doing that.
>
> The lspci output in
> https://bugzilla.kernel.org/attachment.cgi?id=114061 shows an L1 PM
> Substates extended capability (capability ID 0x1e) for the Root Port leading to
> the 7260 device, but not for the 7260 device itself:
>
> 00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3
> (rev e4) (prog-if 00 [Normal decode])
> Capabilities: [200 v1] #1e
>
> Per sec 5.5.4 of the ECN for L1 PM Substates (15 Aug 2012), I think L1
> substates must be configured on both ends of the link, and if the 7260 device
> doesn't have that capability, I don't see how it could be enabled.

Makes sense.

>
> The lspci version wzyboy has doesn't decode the L1 PM Substates capability,
> but there is a newer version at
> git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git that should decode it.
> Also, "lspci -vvxxx" didn't hexdump this capability, which should be at offset
> 0x200. Using "lspci -xxxx" (four "x"s) should dump it, and we can decode it
> manually.
>

You can find this in http://permalink.gmane.org/gmane.linux.kernel.wireless.general/115378.

Somehow my System team says that it should be at offset 0x160?
Is it possible that there is a "walk algorithm" with pointers just like for the ASPM register?
I'll try to check the PCI spec when I find the time for that.

In any case, here are the relevant offsets:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
[...]
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
[...]
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

> wzyboy, can you run these commands before the bug occurs and before
> using the "setpci" workaround:
>
> lspci -vvxxxx -s00:1c.1
> lspci -vvxxxx -s03:00.0
Grumbach, Emmanuel
2013-11-13 09:47:30 UTC
Permalink
> >
> > On Tue, Nov 12, 2013 at 12:37 PM, Emmanuel Grumbach
> > <***@gmail.com> wrote:
> > > On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
> > >> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
> > >> <***@intel.com> wrote:
> > >>
> > >>> Right - I remember the discussion we had on that.
> > >>> On this device (7260 that has an issue with ASPM), we don't call
> > pci_disable_link_state, because we know it is supposed to work...
> > >>
> > >> If ASPM is supposed to work as far as the hardware is concerned, I
> > >> guess you're saying this must be an iwlwifi driver issue. Right?
> > >
> > > ASPM is supposed to work as far as the hardware is concerned.
> > > We might very well have an issue in iwlwifi - and I am checking this
> > > internally with our System guys.
> > > It can be a PCI core problem too, and it could also be a platform /
> > > BIOS / Lenovo issue.
> > > Of course, I have no clue which of these is the culprit here.
> > > Our System folks seemed to say that this new device uses L1
> > > substates which can be enabled in Haswell platform which the user owns.
> > > Now - L1 substates is a new feature and might introduce issues
> > > (apparently) - and this is why they (System folks) wanted the try
> > > without L1 substates. But disabling L1 substates doesn't seem
> > > trivial with the production BIOS of Lenovo. So I am pretty stuck here.
> >
> > For debugging purposes, we could configure L1 substates with setpci,
> > as we did for ASPM. The Linux kernel knows nothing about L1
> > substates, so the PCI core isn't doing anything with them. It's
> > possible the driver itself could muck with L1 substate configuration,
> > but that would be discouraged, and I don't see anything in iwlwifi that is
> doing that.
> >
> > The lspci output in
> > https://bugzilla.kernel.org/attachment.cgi?id=114061 shows an L1 PM
> > Substates extended capability (capability ID 0x1e) for the Root Port
> > leading to the 7260 device, but not for the 7260 device itself:
> >
> > 00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root
> > Port 3 (rev e4) (prog-if 00 [Normal decode])
> > Capabilities: [200 v1] #1e
> >
> > Per sec 5.5.4 of the ECN for L1 PM Substates (15 Aug 2012), I think L1
> > substates must be configured on both ends of the link, and if the 7260
> > device doesn't have that capability, I don't see how it could be enabled.
>
> Makes sense.
>
> >
> > The lspci version wzyboy has doesn't decode the L1 PM Substates
> > capability, but there is a newer version at
> > git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git that should decode it.
> > Also, "lspci -vvxxx" didn't hexdump this capability, which should be
> > at offset 0x200. Using "lspci -xxxx" (four "x"s) should dump it, and
> > we can decode it manually.
> >
>
> You can find this in
> http://permalink.gmane.org/gmane.linux.kernel.wireless.general/115378.
>
> Somehow my System team says that it should be at offset 0x160?
> Is it possible that there is a "walk algorithm" with pointers just like for the
> ASPM register?
> I'll try to check the PCI spec when I'll find the time for that.

So I read a bit of the lspci code, and it looks like there are plenty of pointers inside the config space. Fun :)
So basically, since:
#define PCI_EXT_CAP_ID_L1PM 0x1e
This means that I need to find a 0x1e in the output of wzyboy's lspci. I found only one: at offset 0x15d.
Does that mean that my System team was right when they asked for offset 0x160, which is 3 bytes afterwards (and matches more or less the code of lspci)?
If so,
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00 seems to say that it is enabled?

OTOH, 0x15d is 0x1e and not 0x001e as required by PCI-SIG ECN? Me scratching my head.
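
For readers following the "walk algorithm" question: PCIe extended capabilities form a linked list that starts at offset 0x100. Each entry begins with a 32-bit little-endian header at a dword-aligned offset, holding the capability ID in bits 15:0, the version in bits 19:16 and the next-capability offset in bits 31:20. A lone 0x1e byte at an odd offset such as 0x15d therefore cannot be a capability ID. The Python sketch below walks such a list; the config space it uses is made up for illustration (it is not wzyboy's dump), and the helper names are hypothetical.

```python
import struct

EXT_CAP_START = 0x100  # extended capabilities begin here in PCIe config space


def walk_ext_caps(cfg):
    """Yield (offset, cap_id, version) for each extended capability in cfg."""
    offset = EXT_CAP_START
    seen = set()
    while offset and offset not in seen and offset + 4 <= len(cfg):
        seen.add(offset)  # guard against a looping next pointer
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            break  # empty list, or the device is not responding
        yield offset, header & 0xFFFF, (header >> 16) & 0xF
        offset = (header >> 20) & 0xFFC  # next pointer is dword aligned


def ext_cap_header(cap_id, version, next_offset):
    """Build a header dword (hypothetical helper for the synthetic data)."""
    return struct.pack("<I", cap_id | (version << 16) | (next_offset << 20))


# Hypothetical 4 KiB config space with three chained capabilities.
cfg = bytearray(4096)
cfg[0x100:0x104] = ext_cap_header(0x0001, 1, 0x140)  # Advanced Error Reporting
cfg[0x140:0x144] = ext_cap_header(0x0003, 1, 0x154)  # Device Serial Number
cfg[0x154:0x158] = ext_cap_header(0x000B, 1, 0x000)  # Vendor-Specific, end of list

for offset, cap_id, version in walk_ext_caps(bytes(cfg)):
    print(f"{offset:#05x}: id={cap_id:#06x} v{version}")
```

Only the 16-bit ID field of a header reached by this walk counts when looking for a given capability; matching byte values elsewhere in the hexdump are coincidental.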
Grumbach, Emmanuel
2013-11-13 12:13:31 UTC
Permalink
> > > On Tue, Nov 12, 2013 at 12:37 PM, Emmanuel Grumbach
> > > <***@gmail.com> wrote:
> > > > On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
> > > >> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
> > > >> <***@intel.com> wrote:
> > > >>
> > > >>> Right - I remember the discussion we had on that.
> > > >>> On this device (7260 that has an issue with ASPM), we don't call
> > > pci_disable_link_state, because we know it is supposed to work...
> > > >>
> > > >> If ASPM is supposed to work as far as the hardware is concerned,
> > > >> I guess you're saying this must be an iwlwifi driver issue. Right?
> > > >
> > > > ASPM is supposed to work as far as the hardware is concerned.
> > > > We might very well have an issue in iwlwifi - and I am checking
> > > > this internally with our System guys.
> > > > It can be a PCI core problem too, and it could also be a platform
> > > > / BIOS / Lenovo issue.
> > > > Of course, I have no clue which of these is the culprit here.
> > > > Our System folks seemed to say that this new device uses L1
> > > > substates which can be enabled in Haswell platform which the user
> owns.
> > > > Now - L1 substates is a new feature and might introduce issues
> > > > (apparently) - and this is why they (System folks) wanted the try
> > > > without L1 substates. But disabling L1 substates doesn't seem
> > > > trivial with the production BIOS of Lenovo. So I am pretty stuck here.
> > >
> > > For debugging purposes, we could configure L1 substates with setpci,
> > > as we did for ASPM. The Linux kernel knows nothing about L1
> > > substates, so the PCI core isn't doing anything with them. It's
> > > possible the driver itself could muck with L1 substate
> > > configuration, but that would be discouraged, and I don't see
> > > anything in iwlwifi that is
> > doing that.
> > >
> > > The lspci output in
> > > https://bugzilla.kernel.org/attachment.cgi?id=114061 shows an L1 PM
> > > Substates extended capability (capability ID 0x1e) for the Root Port
> > > leading to the 7260 device, but not for the 7260 device itself:
> > >
> > > 00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express
> > > Root Port 3 (rev e4) (prog-if 00 [Normal decode])
> > > Capabilities: [200 v1] #1e
> > >
> > > Per sec 5.5.4 of the ECN for L1 PM Substates (15 Aug 2012), I think
> > > L1 substates must be configured on both ends of the link, and if the
> > > 7260 device doesn't have that capability, I don't see how it could be
> enabled.
> >
> > Makes sense.
> >
> > >
> > > The lspci version wzyboy has doesn't decode the L1 PM Substates
> > > capability, but there is a newer version at
> > > git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git that should decode
> it.
> > > Also, "lspci -vvxxx" didn't hexdump this capability, which should be
> > > at offset 0x200. Using "lspci -xxxx" (four "x"s) should dump it,
> > > and we can decode it manually.
> > >
> >
> > You can find this in
> > http://permalink.gmane.org/gmane.linux.kernel.wireless.general/115378.
> >
> > Somehow my System team says that it should be at offset 0x160?
> > Is it possible that there is a "walk algorithm" with pointers just
> > like for the ASPM register?
> > I'll try to check the PCI spec when I'll find the time for that.
>
> So I read a bit the lspci code, and it looks that there are plenty of pointers
> inside the config space. Fun :) So basically, since:
> #define PCI_EXT_CAP_ID_L1PM 0x1e
> This means that I need to find an 0x1e in the output of wzyboy's lspci. I found
> only one: at offset 0x15d.
> Should that mean that my System team was right when they asked for offset
> 0x160 which is 3 bytes afterwards (and matches more the less the code of
> lspci)?
> If so,
> 160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00 seems to say that it is
> enabled?
>
> OTOH, 0x15d is 0x1e and not 0x001e as required by PCI-SIG ECN? Me scraping
> my head.

Ok - so now I have the complete picture.
This device was designed before PCI-SIG gave an ID to L1 PM Substates, so Intel had to expose L1 PM Substates as a vendor-defined capability whose ID is 0xCAFE. The layout is the same as the one now defined by PCI-SIG (page 21 in http://www.pcisig.com/specifications/pciexpress/specifications/ECN_L1_PM_Substates_with_CLKREQ_31_May_2013_Rev10a.pdf).

So:
150: 03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00

We can see that L1 PM Substate *is* enabled:
004h = 41 01 1f 1e
008h = 0f 00 f0 00
00Ch = a0 40 f0 00

I may have messed up things here...
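
As a cross-check of the bytes above, here is a small Python sketch that reads the two hexdump lines as little-endian 32-bit dwords. The header decoding follows the standard extended-capability and vendor-specific header formats. Which dword is which L1 PM Substates register is an ASSUMPTION made for illustration: the three dwords after the vendor-specific header are treated as Capabilities, Control 1 and Control 2, in the order PCI-SIG uses. Intel's actual vendor-specific layout is not confirmed anywhere in this thread.

```python
import struct

DUMP = """\
150: 03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
"""


def parse_hexdump(text):
    """Map config-space offset -> byte value from 'lspci -xxxx' style lines."""
    data = {}
    for line in text.splitlines():
        addr, _, rest = line.partition(":")
        base = int(addr, 16)
        for i, tok in enumerate(rest.split()):
            data[base + i] = int(tok, 16)
    return data


def dword(data, offset):
    """Read a little-endian 32-bit value starting at offset."""
    raw = bytes(data[offset + i] for i in range(4))
    return struct.unpack("<I", raw)[0]


cfg = parse_hexdump(DUMP)
ext_hdr = dword(cfg, 0x154)   # extended capability header
vsec_hdr = dword(cfg, 0x158)  # vendor-specific header
print(f"ext cap id = {ext_hdr & 0xFFFF:#06x}")  # 0x000b = vendor-specific
print(f"vsec id = {vsec_hdr & 0xFFFF:#06x}, length = {vsec_hdr >> 20} bytes")

# ASSUMPTION: register order after the vendor-specific header mirrors PCI-SIG.
caps, ctl1, ctl2 = (dword(cfg, off) for off in (0x15C, 0x160, 0x164))
print(f"caps = {caps:#010x}, ctl1 = {ctl1:#010x}, ctl2 = {ctl2:#010x}")
print(f"substate enable bits (ctl1 & 0xf) = {ctl1 & 0xF:#x}")
```

Read this way, the vendor-specific ID is 0xcafe and the structure is 20 bytes long (0x154 to 0x167). In the PCI-SIG layout, bits 3:0 of Control 1 are the four L1 substate enable bits, so the low nibble 0xf in the dword at 0x160 would agree with the conclusion that L1 PM Substates are enabled, provided the assumed register order holds.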

According to System / HW, it is unsafe to disable L1 PM Substates using setpci, even if we disable them on both sides (device and bridge). This kind of setting should be done by the BIOS only.
So we have 2 options here (assuming that we can't disable that in BIOS):
* either we try to disable L1 PM Substates even though my colleagues think it is not safe
* or we just disable L1 altogether
wzyboy
2013-11-13 12:18:24 UTC
2013/11/13 Grumbach, Emmanuel <***@intel.com>:
> According to System / HW, it is unsafe to disable L1 PM Substate using setpci, even if we disable it from both sides (device and bridge). This kind of settings should be done by BIOS only.
> So we have 2 options here (assuming that we can't disable that in BIOS):
> * either we try to disable L1 PM Substate even my colleagues think it is not safe
> * either we just disable L1 altogether


Unsafe means battery blows up like a Samsung? :-)

My laptop is now connected to classroom's WiFi (with setpci
workaround) and running on battery, it seems the power consumption is
acceptable ... at least not too much more than before.

--
wzyboy
Grumbach, Emmanuel
2013-11-13 12:21:10 UTC
>
> 2013/11/13 Grumbach, Emmanuel <***@intel.com>:
> > According to System / HW, it is unsafe to disable L1 PM Substate using
> setpci, even if we disable it from both sides (device and bridge). This kind of
> settings should be done by BIOS only.
> > So we have 2 options here (assuming that we can't disable that in BIOS):
> > * either we try to disable L1 PM Substate even my colleagues think it
> > is not safe
> > * either we just disable L1 altogether
>
>
> Unsafe means battery blows up like a Samsung? :-)
>

You already blew your warranty the minute you installed Linux, didn't you? :)

> My laptop is now connected to classroom's WiFi (with setpci
> workaround) and running on battery, it seems the power consumption is
> acceptable ... at least not too much more than before.

Cool - we work for nothing - encouraging :)
wzyboy
2013-11-13 12:33:39 UTC
2013/11/13 Grumbach, Emmanuel <***@intel.com>:
>> Unsafe means battery blows up like a Samsung? :-)
>>
>
> You already blew your warranty the minute your installed Linux, didn't you? :)
>

Indeed. :-)

>> My laptop is now connected to classroom's WiFi (with setpci
>> workaround) and running on battery, it seems the power consumption is
>> acceptable ... at least not too much more than before.
>
> Cool - we work for nothing - encouraging :)

To be frank, I really know little about hardware (i.e. I had never
heard of "L1", "ASPM" and other HW terms before you talked about them...),
but I'll keep an eye on it and see whether the "setpci" workaround (this
disables L1 / ASPM?) has any side effects. (I hope not)

--
wzyboy
Bjørn Mork
2013-11-13 13:48:12 UTC
"Grumbach, Emmanuel" <***@intel.com> writes:

> Ok - so I have now the complete picture.

> This device was designed before PCI-SIG gave an ID to L1 PM Substates,
> so Intel had to use the L1 PM Substate as a Vendor Define whose ID is
> 0xCAFE. Layout is the same as defined now in PCI-SIG (page 21 in
> http://www.pcisig.com/specifications/pciexpress/specifications/ECN_L1_PM_Substates_with_CLKREQ_31_May_2013_Rev10a.pdf).
>
> So:
> 150: 03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00
> 160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
>
> We can see that L1 PM Substate *is* enabled:
> 004h = 41 01 1f 1e
> 008h = 0f 00 f0 00
> 00Ch = a0 40 f0 00

Wow! That's extremely useful.

But I do have some problems placing the individual bits right in these
LE registers. And so have you, I believe... You are 2 bytes off.
"41 01" is part of the vendor-specific header (12 bits length == 0x014
and 4 bits version == 0x1).
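
The header split can be checked with a few shifts and masks. This is only a minimal Python sketch, assuming the eight bytes at 0x154..0x15b are a generic extended capability header followed by the vendor-specific header, both little-endian. The variable names are made up for the illustration.

```python
import struct

# Bytes 0x154..0x15b from the dump: extended capability header, then the
# vendor-specific header.
raw = bytes.fromhex("0b 00 01 00 fe ca 41 01")
ext_hdr, vsec_hdr = struct.unpack("<II", raw)

# Extended capability header: 16 bits ID, 4 bits version, 12 bits next pointer.
cap_id = ext_hdr & 0xFFFF
cap_ver = (ext_hdr >> 16) & 0xF
next_ptr = ext_hdr >> 20

# Vendor-specific header: 16 bits ID, 4 bits revision, 12 bits length.
vsec_id = vsec_hdr & 0xFFFF
vsec_rev = (vsec_hdr >> 16) & 0xF
vsec_len = vsec_hdr >> 20

print(f"cap id={cap_id:#06x} v{cap_ver} next={next_ptr:#05x}")
print(f"vsec id={vsec_id:04x} rev={vsec_rev} len={vsec_len:03x}")
```

Read this way, "fe ca" is the ID and "41 01" carries the revision nibble and the 12-bit length, so the L1 PM Substates registers only start with the following dword.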

So I made a simple patch for lspci based on your info. Seems to work
for me:


03:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 63)
Subsystem: Intel Corporation Dual Band Wireless-AC 7260 [8086:4070]
Physical Slot: 1
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 45
Region 0: Memory at f0500000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0100c Data: 4152
Capabilities: [40] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <32us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via WAKE#
DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number 0c-8b-fd-ff-ff-08-09-71
Capabilities: [14c v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [154 v1] Vendor Specific Information: ID=cafe Rev=1 Len=014 L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
Kernel driver in use: iwlwifi
00: 86 80 b1 08 06 05 10 00 63 00 80 02 10 00 00 00
10: 04 00 50 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 70 40
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
40: 10 00 02 00 c0 8e 00 10 10 0c 11 00 11 ec 06 00
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 12 08 08 00 05 00 00 00 00 00 00 00
70: 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 23 c8 00 00 00 0d
d0: 05 40 81 00 0c 10 e0 fe 00 00 00 00 52 41 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 01 00 01 14 00 00 00 00 00 00 00 00 31 20 46 00
110: 40 20 00 00 00 20 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 03 00 c1 14 71 09 08 ff ff fd 8b 0c 18 00 41 15
150: 00 00 00 00 0b 00 01 00 fe ca 41 01 1f 1e f0 00
160: 00 00 00 00 28 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[snipped the remaining 00 bytes]
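
For anyone who wants to follow the pointers by hand, here is a minimal Python sketch of the walk over the extended capability list, fed with the rows of the dump above that hold capability data. It is not the actual lspci patch. The field positions of the L1SubCap register are assumptions taken from the ECN, and the helper names (dword, walk_ext_caps) are made up for the illustration.

```python
import struct

# Rows of the dump above that contain extended capability data; every other
# row is left as zeros.
ROWS = {
    0x100: "01 00 01 14 00 00 00 00 00 00 00 00 31 20 46 00",
    0x140: "03 00 c1 14 71 09 08 ff ff fd 8b 0c 18 00 41 15",
    0x150: "00 00 00 00 0b 00 01 00 fe ca 41 01 1f 1e f0 00",
    0x160: "00 00 00 00 28 00 00 00 00 00 00 00 00 00 00 00",
}

config = bytearray(0x1000)
for base, text in ROWS.items():
    config[base:base + 16] = bytes.fromhex(text)


def dword(offset):
    """Little-endian 32-bit read from the config space image."""
    return struct.unpack_from("<I", config, offset)[0]


def walk_ext_caps():
    """Yield (offset, cap_id, version), following next pointers from 0x100."""
    offset, seen = 0x100, set()
    while offset >= 0x100 and offset not in seen:
        seen.add(offset)  # guard against a looping list
        header = dword(offset)
        if header in (0, 0xFFFFFFFF):
            break
        yield offset, header & 0xFFFF, (header >> 16) & 0xF
        offset = (header >> 20) & 0xFFC  # next pointer, dword aligned


caps = list(walk_ext_caps())
for offset, cap_id, version in caps:
    print(f"[{offset:03x} v{version}] id={cap_id:#06x}")

# Find the vendor-specific capability (ID 0x000b) whose vendor header ID is 0xcafe.
vsec = next(off for off, cid, _ in caps
            if cid == 0x000B and (dword(off + 4) & 0xFFFF) == 0xCAFE)

# Assumed layout: the ECN registers follow the two header dwords.
l1sub_cap = dword(vsec + 8)
restore_us = (l1sub_cap >> 8) & 0xFF
scale_us = (2, 10, 100, None)[(l1sub_cap >> 16) & 0x3]  # last encoding is reserved
power_on_value = (l1sub_cap >> 19) & 0x1F
power_on_us = power_on_value * scale_us if scale_us else None
print(f"restore={restore_us}us power_on={power_on_us}us")
```

The walk visits 0x100, 0x140, 0x14c and 0x154, which matches the four extended capabilities in the decoded output above, and the two timing fields come out as 30us and 60us.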


> I may have messed up things here...
>
> According to System / HW, it is unsafe to disable L1 PM Substate using setpci, even if we disable it from both sides (device and bridge). This kind of settings should be done by BIOS only.
> So we have 2 options here (assuming that we can't disable that in BIOS):
> * either we try to disable L1 PM Substate even my colleagues think it is not safe
> * either we just disable L1 altogether

How about forcing ASPM even if the BIOS says it's unsupported? Has that
been tested?


Bjørn
Grumbach, Emmanuel
2013-11-13 14:16:00 UTC
>
> "Grumbach, Emmanuel" <***@intel.com> writes:
>
> > Ok - so I have now the complete picture.
>
> > This device was designed before PCI-SIG gave an ID to L1 PM Substates,
> > so Intel had to use the L1 PM Substate as a Vendor Define whose ID is
> > 0xCAFE. Layout is the same as defined now in PCI-SIG (page 21 in
> > http://www.pcisig.com/specifications/pciexpress/specifications/ECN_L1_PM_Substates_with_CLKREQ_31_May_2013_Rev10a.pdf).
> >
> > So:
> > 150: 03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00
> > 160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
> >
> > We can see that L1 PM Substate *is* enabled:
> > 004h = 41 01 1f 1e
> > 008h = 0f 00 f0 00
> > 00Ch = a0 40 f0 00
>
> Wow! That's extremely useful.
>
> But I do have some problems placing the individual bits right in these LE
> registers. And so have you, I believe... You are 2 bytes off.
> "41 01" is part of the vendor-specific header (12 bits length == 0x014 and 4
> bits version == 0x1).
>

Yeah - it seemed awkward to me that the header isn't 4-byte aligned, but some bits seemed ... strange then. So I opted for that (wrong) parsing.
My poor brain always has issues parsing this kind of thing and taking endianness into account etc. :)

> So I made a simple patch for lspci based on your info. Seems to work for me:
>
>
> 03:00.0 Network controller [0280]: Intel Corporation Wireless 7260
> [8086:08b1] (rev 63)
> Subsystem: Intel Corporation Dual Band Wireless-AC 7260
> [8086:4070]
> Physical Slot: 1
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 45
> Region 0: Memory at f0500000 (64-bit, non-prefetchable) [size=8K]
> Capabilities: [c8] Power Management version 3
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-
> ,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee0100c Data: 4152
> Capabilities: [40] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency
> L0s <512ns, L1 unlimited
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
> FLReset-
> MaxPayload 128 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
> AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s
> L1, Exit Latency L0s <4us, L1 <32us
> ClockPM+ Surprise- LLActRep- BwNot-
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+,
> OBFF Via WAKE#
> DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-,
> LTR-, OBFF Disabled
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
> SpeedDis-
> Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -3.5dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
> Capabilities: [100 v1] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout-
> NonFatalErr+
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap- CGenEn-
> ChkCap- ChkEn-
> Capabilities: [140 v1] Device Serial Number 0c-8b-fd-ff-ff-08-09-71
> Capabilities: [14c v1] Latency Tolerance Reporting
> Max snoop latency: 0ns
> Max no snoop latency: 0ns
> Capabilities: [154 v1] Vendor Specific Information: ID=cafe Rev=1
> Len=014 L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=30us
> PortTPowerOnTime=60us
> Kernel driver in use: iwlwifi
> 00: 86 80 b1 08 06 05 10 00 63 00 80 02 10 00 00 00
> 10: 04 00 50 f0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 70 40
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
> 40: 10 00 02 00 c0 8e 00 10 10 0c 11 00 11 ec 06 00
> 50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 12 08 08 00 05 00 00 00 00 00 00 00
> 70: 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 01 d0 23 c8 00 00 00 0d
> d0: 05 40 81 00 0c 10 e0 fe 00 00 00 00 52 41 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 100: 01 00 01 14 00 00 00 00 00 00 00 00 31 20 46 00
> 110: 40 20 00 00 00 20 00 00 00 00 00 00 00 00 00 00
> 120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 140: 03 00 c1 14 71 09 08 ff ff fd 8b 0c 18 00 41 15
> 150: 00 00 00 00 0b 00 01 00 fe ca 41 01 1f 1e f0 00
> 160: 00 00 00 00 28 00 00 00 00 00 00 00 00 00 00 00
> 170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [snipped the remaining 00 bytes]
>
>
> > I may have messed up things here...
> >
> > According to System / HW, it is unsafe to disable L1 PM Substate using
> setpci, even if we disable it from both sides (device and bridge). This kind of
> settings should be done by BIOS only.
> > So we have 2 options here (assuming that we can't disable that in BIOS):
> > * either we try to disable L1 PM Substate even my colleagues think it
> > is not safe
> > * either we just disable L1 altogether
>
> How about forcing ASPM even if the BIOS says it's unsupported? Has that
> been tested?
>

Well... No... At least not by me. People usually care more about connectivity than sa