Discussion:
[Cerowrt-devel] DOCSIS 3+ recommendation?
Matt Taggart
2015-03-16 20:35:32 UTC
Permalink
Hi cerowrt-devel,

My cable internet provider (Comcast) has been pestering me (monthly email
and robocalls) to upgrade my cable modem to something newer. But I _like_
my current one (no wifi, battery backup) and it's been very stable and can
handle the data rates I am paying for. But they are starting to roll out
faster service plans and I guess it would be good to have that option (and
eventually they will probably boost the speed of the plan I'm paying for).
So...

Any recommendations for cable modems that are known to be solid and less
bufferbloated?

I (like probably everyone on this list) will have a router doing SQM/etc
connected to the device, so that reduces the damage large buffers in it can
do, but it would still be good to have something that's designed well and to
reward a vendor that's paying attention.

My personal ideal is a simple device, cable in and gigabit Ethernet out, that does
not do wifi, USB, NAT, etc. (that's what cerowrt on the router/AP is
for). Are there DOCSIS 3.1 devices available yet? Or if those aren't
available/affordable, maybe an inexpensive but good 3.0?

Thanks,
--
Matt Taggart
***@lackof.org
V***@vt.edu
2015-03-17 23:32:22 UTC
Permalink
Post by Matt Taggart
Any recommendations for cable modems that are known to be solid and less
bufferbloated?
I've been using the Motorola Surfboard SB6141 on Comcast with good results.
Anybody got a good suggestion on how to test a cablemodem for bufferbloat,
or what you can do about it anyhow (given that firmware is usually pushed
from the ISP side)?
David P. Reed
2015-03-18 04:34:30 UTC
Permalink
It is not the cable modem itself that is bufferbloated. It is the head end working with the cable modem. Docsis 3 has mechanisms to avoid queue buildup but they are turned on by the head end.

I don't know for sure but I believe that the modem itself cannot measure or control the queueing in the system to minimize latency.

You can use codel or whatever if you bound your traffic upward and stifle traffic downward. But that doesn't deal with the queueing in the link away from your home.
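
For concreteness, a rough sketch of that bound-and-shape approach on a Linux router (roughly what the sqm-scripts / cerowrt setup does): rate-limit egress below the modem's upstream rate, redirect ingress through an IFB device and rate-limit it below the downstream rate, and put fq_codel on both so the queue that builds is the one under your control rather than the modem's. The interface name and rates below are placeholder assumptions, not anyone's actual plan, and it needs root, tc, and the ifb module:

#!/usr/bin/env python3
"""Rough sketch only: shape egress below the upstream rate, shape ingress
(via an IFB device) below the downstream rate, with fq_codel on both.
Assumes Linux with tc, the ifb module, and root; WAN_IF and the rates are
illustrative placeholders, not a recommendation for any particular plan."""
import subprocess

WAN_IF = "eth0"      # assumed WAN-facing interface
IFB = "ifb0"
UP_KBIT = 9000       # ~90% of an assumed 10 Mbit/s upstream
DOWN_KBIT = 90000    # ~90% of an assumed 100 Mbit/s downstream

def run(*cmd):
    # Run one command, failing loudly if it fails.
    subprocess.run(cmd, check=True)

def shape():
    # Egress: HTB rate limiter with fq_codel as the leaf qdisc.
    run("tc", "qdisc", "replace", "dev", WAN_IF, "root", "handle", "1:",
        "htb", "default", "10")
    run("tc", "class", "add", "dev", WAN_IF, "parent", "1:", "classid", "1:10",
        "htb", "rate", f"{UP_KBIT}kbit", "ceil", f"{UP_KBIT}kbit")
    run("tc", "qdisc", "add", "dev", WAN_IF, "parent", "1:10", "fq_codel")

    # Ingress: redirect incoming packets to ifb0 and shape them there too,
    # so downstream TCP backs off before the modem/CMTS buffers fill.
    run("modprobe", "ifb", "numifbs=1")
    run("ip", "link", "set", IFB, "up")
    run("tc", "qdisc", "add", "dev", WAN_IF, "handle", "ffff:", "ingress")
    run("tc", "filter", "add", "dev", WAN_IF, "parent", "ffff:",
        "protocol", "all", "prio", "10", "u32", "match", "u32", "0", "0",
        "action", "mirred", "egress", "redirect", "dev", IFB)
    run("tc", "qdisc", "replace", "dev", IFB, "root", "handle", "1:",
        "htb", "default", "10")
    run("tc", "class", "add", "dev", IFB, "parent", "1:", "classid", "1:10",
        "htb", "rate", f"{DOWN_KBIT}kbit", "ceil", f"{DOWN_KBIT}kbit")
    run("tc", "qdisc", "add", "dev", IFB, "parent", "1:10", "fq_codel")

if __name__ == "__main__":
    shape()

Shaving 10-15% off the contracted rates is the usual rule of thumb, so the shaper rather than the modem or CMTS stays the bottleneck.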
Post by V***@vt.edu
Anybody got a good suggestion on how to test a cablemodem for bufferbloat,
or what you can do about it anyhow (given that firmware is usually pushed
from the ISP side)?
Jonathan Morton
2015-03-18 06:26:38 UTC
Permalink
DOCSIS 3.1 mandates support for AQM (at minimum the PIE algorithm) in both
CPE and head end. If you can get hold of a D3.1 modem, you'll at least be
ready for the corresponding upgrade by your ISP.
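
(For anyone who wants a feel for what that buys before D3.1 gear exists: recent Linux kernels ship the same PIE algorithm as the sch_pie qdisc, so you can attach it to a router interface and experiment. A minimal sketch, with the interface name and parameter values as placeholder assumptions, not anything a D3.1 modem actually exposes:

#!/usr/bin/env python3
"""Sketch: attach the PIE AQM (what DOCSIS 3.1 mandates on the CPE upstream)
to a local Linux interface for experimentation. Assumes a kernel with
sch_pie, tc, and root; "eth0" and the parameter values are placeholders."""
import subprocess

def enable_pie(dev="eth0", target_ms=15, limit_pkts=1000):
    # target: the queueing delay PIE tries to hold; limit: hard cap in packets.
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", dev, "root",
         "pie", "target", f"{target_ms}ms", "limit", str(limit_pkts), "ecn"],
        check=True)

if __name__ == "__main__":
    enable_pie()
)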

Unfortunately I don't know which cable modems support which DOCSIS
versions, but it should be straightforward to look that up for any given
model.

- Jonathan Morton
JF Tremblay
2015-03-18 19:38:41 UTC
Permalink
DOCSIS 3.1 mandates support for AQM (at minimum the PIE algorithm) in both CPE and head end. If you can get hold of a D3.1 modem […]
That last part might involve robbing the house of a Comcast employee... ;)

http://www.lightreading.com/cable/docsis/comcast-puts-docsis-31-live-in-the-field/d/d-id/714494

No D3.1 hardware is certified at this point; the chipsets are just barely out and still experimental. Customers probably won’t see D3.1 hardware before 2016.

Btw, in my experience, modems and CMTSes have no AQM configured at all. And the buffers are large, in both directions. The more recent the model, the more buffer it usually has (hey, more speed requires more buffer, right?). I’ve seen multiple Mb in some models; I can’t remember the exact amount, but it might have been 2-4 Mb for my current TM702. So the worst case is actually to have a very recent modem on a lower-tier speed (like 10 Mbps).

JF
Jonathan Morton
2015-03-18 19:50:12 UTC
Permalink
Right, so until 3.1 modems actually become available, it's probably best to
stick with a modem that already supports your subscribed speed, and manage
the bloat separately with shaping and AQM.

- Jonathan Morton
d***@reed.com
2015-03-19 13:53:54 UTC
Permalink
How many years has it been since Comcast said they were going to fix bufferbloat in their network within a year?

And LTE operators haven't even started.

That's a sign that the two dominant sectors of "Internet Access" business are refusing to support quality Internet service. (the old saying about monopoly AT&T: "we don't care. we don't have to." applies to these sectors).

Have fun avoiding bufferbloat in places where there is no "home router" you can put fq_codel into.

It's almost as if the cable companies don't want OTT video or simultaneous FTP and interactive gaming to work. Of course not. They'd never do that.
JF Tremblay
2015-03-19 14:11:39 UTC
Permalink
Post by d***@reed.com
How many years has it been since Comcast said they were going to fix bufferbloat in their network within a year?
Any quote on that?
Post by d***@reed.com
THat's a sign that the two dominant sectors of "Internet Access" business are refusing to support quality Internet service.
I’m not sure this is a fair statement. Comcast is a major player (if not “the” player) in CableLabs, and they made it clear that for DOCSIS 3.1, AQM was one of the important targets. This might not have happened without all the noise around bloat that Jim and Dave made for years. (Now peering and transit disputes are another ball game.)

While cable operators started pretty much with a blank slate in the early days of DOCSIS, they now have to deal with legacy and a huge tail of old devices. So in this respect, yes, they are now a bit like the DSL incumbents: introduction of new technologies happens over a 3-4 year timeframe at least.
Post by d***@reed.com
It's almost as if the cable companies don't want OTT video or simultaneous FTP and interactive gaming to work. Of course not. They'd never do that.
You might be surprised at how much they care for gamers, these are often their most vocal users. And those who will call to get things fixed. Support calls and truck rolls are expensive and touch the bottom line, where it hurts…

JF
(a former cable operator)
d***@reed.com
2015-03-19 15:38:19 UTC
Permalink
I'll look up the quote, when I get home from California, in my email archives. It may have been private email from Richard Woundy (an engineering SVP at Comcast who is the person who drove the CableLabs effort forward, working with Jim Gettys - doing the in-house experiments...). To be clear, I am not blaming Comcast's engineers or technologists for the most part. I *am* blaming the failure of the Comcast leadership to invest in deploying the solution their own guys developed. I was skeptical at the time (and I think I can find that email to Rich Woundy, too, as well as a note to Jim Gettys expressing the same skepticism when he was celebrating the CableLabs experiments and their "best practices" regarding AQM).

It's worth remembering that CableLabs, while owned jointly by all cable operators, does not actually tell the operators what to do in any way. So recommendations are routinely ignored in favor of profitable operations. I'm sure you know that. It's certainly common knowledge among those who work at CableLabs (I had a number of conversations with Richard Green when he ran the place on this very subject).

So like any discussion where we anthropomorphize companies, it's probably not useful to "pin blame".

I wasn't trying to pin blame anywhere in particular - just observing that Cable companies still haven't deployed the actual AQM options they already have.

Instead the cable operators seem obsessed with creating a semi-proprietary "game lane" that involves trying to use diffserv, even though they don't (and can't) have end-to-end agreement on the meaning of the DCP used. They will therefore try to use that as a basis for requiring gaming companies to peer directly with the cable distribution network, where the DCP will work (as long as you buy only "special" gear) to give the gaming companies a "fast lane" that they have to pay for - to bypass the bloat that the operators haven't eliminated by upgrading their deployments.

Why will the game providers not be able to just use the standard Internet access service, without peering to every cable company directly? Well, because when it comes to spending money on hardware upgrades, there's more money in it when someone else pays for the upgrade.

That's just business logic, when you own a monopoly on Internet access. You want to maximize the profits from your monopoly, because competition can't exist. [Fixing bufferbloat doesn't increase profits for a monopoly. In fact it discourages people from buying more expensive service, so it probably decreases profits.]

It's counterintuitive, I suppose, to focus on the business ecology distortions caused by franchise monopolies in a technical group. But engineering is not just technical - it's about economics in a very fundamental way. Network engineering in particular.

If you want better networks, eliminate the monopolies who have no interest in making them better for users.
Jim Gettys
2015-03-19 15:40:42 UTC
Permalink
On Thu, Mar 19, 2015 at 10:11 AM, JF Tremblay
Post by d***@reed.com
How many years has it been since Comcast said they were going to fix
bufferbloat in their network within a year?
They had hoped to be able to use a feature in DOCSIS to at least set the
buffering to the "correct" size for the provisioned bandwidth. While not
fixing bufferbloat, it would have made a big difference (getting latency
down to the 100ms range; that would have taken my original 1.2 seconds of
bloat down to 100ms).

When they went and tested that feature, the actual implementations weren't
there and were so buggy they couldn't turn it on.
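
(The arithmetic behind the "correct" size is just rate times tolerable delay. A back-of-envelope sketch, using an assumed 5 Mbit/s upstream purely to make the numbers concrete:

#!/usr/bin/env python3
"""Back-of-envelope buffer sizing: a FIFO drained at line rate adds
bytes * 8 / rate seconds of delay when full. The 5 Mbit/s upstream below is
an assumed example rate, chosen only for illustration."""

def buffer_bytes(rate_bps, delay_s):
    # Buffer size corresponding to `delay_s` of queueing delay at `rate_bps`.
    return rate_bps * delay_s / 8

def delay_ms(rate_bps, buf_bytes):
    return buf_bytes * 8 / rate_bps * 1000

UP = 5_000_000  # assumed 5 Mbit/s upstream
print(f"1.2 s of bloat at 5 Mbit/s  ~ {buffer_bytes(UP, 1.2)/1000:.0f} kB of buffer")
print(f"100 ms budget at 5 Mbit/s   ~ {buffer_bytes(UP, 0.1)/1000:.0f} kB of buffer")
print(f"a 256 kB buffer at 5 Mbit/s ~ {delay_ms(UP, 256_000):.0f} ms when full")
)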

Moral 1: anything not tested by being used on an ongoing basis, doesn't
work.

Moral 2: Companies like Comcast do not (currently) control their own
destiny, since they outsourced too much of the technology to others.
Post by JF Tremblay
While cable operators started pretty much with a blank slate in the early
days of Docsis, they now have to deal with legacy and a huge tail of old
devices. So in this respect, yes they are now a bit like the DSL
incumbents, introduction of new technologies is over a 3-4 years timeframe
at least.
Yup.
Post by JF Tremblay
You might be surprised at how much they care for gamers, these are often
their most vocal users. And those who will call to get things fixed.
Support calls and truck rolls are expensive and touch the bottom line,
where it hurts

Yup.

And I agree with Dave Taht: Comcast has had a lot more technical clue than
most other ISPs we've interacted with.


And these industries are captive to the practices of the companies that
make the gear, and as I've said in public at the Berkman Center, this has
really bad and dangerous consequences for the Internet. I'll post a new
version of that talk, maybe later today.

Now, I've yet to detect any clue in cellular ISPs.... And there, dpr's
complaints I believe are correct.
- Jim
Michael Richardson
2015-03-19 17:04:09 UTC
Permalink
Post by Jim Gettys
Moral 1: anything not tested by being used on an ongoing basis,
doesn't work.
Moral 2: Companies like Comcast do not (currently) control their own
destiny, since they outsourced too much of the technology to others.
Moral 2 might be something that the C* suite types might actually get.
I don't know how to get that message there, though.
--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] ***@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
Jonathan Morton
2015-03-19 17:14:58 UTC
Permalink
Post by Michael Richardson
Moral 2 might be something that the C* suite types might actually get.
I don't know how to get that message there, though.
Be careful what you wish for: if the cable companies controlled the hardware more tightly, how much less experimentation would we be able to do? The general hackability of your average CPE router is a benefit to our research efforts, even if the default configuration they come with is still utterly terrible.

- Jonathan Morton
Dave Taht
2015-03-19 17:11:11 UTC
Permalink
Post by d***@reed.com
How many years has it been since Comcast said they were going to fix bufferbloat in their network within a year?
It is unfair to lump every individual in an organization together. All
orgs have people trying to do the right thing(s), and sometimes,
eventually, they win. All that is required for evil to triumph is for
good people to do nothing, and docsis 3.1 is entering trials. Some
competition still exists there for both modems (8? providers?) and
CMTSes (3). My hope is that if we can continue to poke at it,
eventually a better modem and cmts setup will emerge, from someone.

http://www-personal.umich.edu/~jlawler/aue/sig.html

Or one of the CMTS vendors will ship something that works better,
although the ARRIS study had many flaws (LRED was lousy, their SFQ
enhancement was quite interesting):

preso: http://snapon.lab.bufferbloat.net/~d/trimfat/Cloonan_Presentation.pdf
paper: http://snapon.lab.bufferbloat.net/~d/trimfat/Cloonan_Paper.pdf

I am of the cynical view that it does help to have knowledgeable
people such as yourself rattling the cages, and certainly I was
pleased with the results of my recent explosion at Virgin - 2000+ hits
on the web site! 150 +1s! So I do plan to start blogging again
(everyone tired of my long emails? wait til you see the blog!)
Post by d***@reed.com
And LTE operators haven't even started.
And we haven't worked our magic on them, nor conducted sufficient
research on how they could get it more right. That said, there has
been progress in that area as well, and certainly quite a few papers
demonstrating their problems.
Post by d***@reed.com
THat's a sign that the two dominant sectors of "Internet Access" business are refusing to support quality Internet service. (the old saying about monopoly AT&T: "we don't care. we don't have to." applies to these sectors).
Have fun avoiding bufferbloat in places where there is no "home router" you can put fq_codel into.
Given the game theory here, this is why my own largest bet has been on
trying to resuscitate the home router and small business firewall
markets.

Covering bets are on at least some ISPs (maybe not in the US) getting
it right, on regulation, etc.

Forces I am actively working against include the plans Juniper and
Cisco are pimping for moving ISP CPE into the cloud.
Post by d***@reed.com
It's almost as if the cable companies don't want OTT video or simultaneous FTP and interactive gaming to work. Of course not. They'd never do that.
I do understand there are strong forces against us, especially in the USA.

I ended up writing a MUCH longer blog entry for this, I do hope I get
around to getting that site up.
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Livingood, Jason
2015-03-19 19:58:20 UTC
Permalink
Post by Dave Taht
Post by d***@reed.com
How many years has it been since Comcast said they were going to fix
bufferbloat in their network within a year?
I'm not sure anyone ever said it'd take a year. If someone did (even if it
was me) then it was in the days when the problem appeared less complicated
than it is and I apologize for that. Let's face it - the problem is
complex and the software that has to be fixed is everywhere. As I said
about IPv6: if it were easy, it'd be done by now. ;-)
Post by Dave Taht
Post by d***@reed.com
It's almost as if the cable companies don't want OTT video or
simultaneous FTP and interactive gaming to work. Of course not. They'd
never do that.
Sorry, but that seems a bit unfair. It flies in the face of what we have
done and are doing. We've underwritten some of Dave's work, we got
CableLabs to underwrite AQM work, and I personally pushed like heck to get
AQM built into the default D3.1 spec (had CTO-level awareness & support,
and was due to Greg White's work at CableLabs). We are starting to field
test D3.1 gear now, by the way. We made some bad bets too, such as trying
to underwrite an OpenWRT-related program with ISC, but not every tactic
will always be a winner.

As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS
network of any scale in the world solved it? If so, I have something to
use to learn from and apply here at Comcast - and I'd **love** an
introduction to someone who has so I can get this info.

But usually there are rational explanations for why something is still not
done. One of them is that the at-scale operational issues are more
complicated than some people realize. And there is always a case of
prioritization - meaning things like running out of IPv4 addresses and not
having service trump more subtle things like buffer bloat (and the effort
to get vendors to support v6 has been tremendous).
Post by Dave Taht
I do understand there are strong forces against us, especially in the USA.
I'm not sure there are any forces against this issue. It's more a question
of awareness - it is not apparent it is more urgent than other work in
everyone's backlog. For example, the number of ISP customers even aware of
buffer bloat is probably 0.001%; if customers aren't asking for it, the
product managers have a tough time arguing to prioritize buffer bloat work
over new feature X or Y.

One suggestion I have made to increase awareness is that there be a nice,
web-based, consumer-friendly latency under load / bloat test that you
could get people to run as they do speed tests today. (If someone thinks
they can actually deliver this, I will try to fund it - ping me off-list.)
I also think a better job can be done explaining buffer bloat - it's hard
to make an 'elevator pitch' about it.

It reminds me a bit of IPv6 several years ago. Rather than saying in
essence 'you operators are dummies' for not already fixing this, maybe
assume the engineers all 'get it' and want to do it. Because we really do
get it and want to do something about it. Then ask those operators what
they need to convince their leadership and their suppliers and product
managers and whomever else that it needs to be resourced more effectively
(see above for example).

We're at least part of the way there in DOCSIS networks. It is in D3.1 by
default, and we're starting trials now. And probably within 18-24 months
we won't buy any DOCSIS CPE that is not 3.1.

The question for me is how and when to address it in DOCSIS 3.0.

- Jason
d***@reed.com
2015-03-19 20:29:21 UTC
Permalink
I do think engineers operating networks get it, and that Comcast's engineers really get it, as I clarified in my followup note.

The issue is indeed prioritization of investment, engineering resources and management attention. The teams at Comcast in the engineering side have been the leaders in "bufferbloat minimizing" work, and I think they should get more recognition for that.

I disagree a little bit about not having a test that shows the issue, and the value the test would have in demonstrating the issue to users. Netalyzr has been doing an amazing job on this since before the bufferbloat term was invented. Every time I've talked about this issue I've suggested running Netalyzr, so I have a personal set of comments from people all over the world who run Netalyzr on their home networks, on hotel networks, etc.

When I have brought up these measurements from Netalyzr (which are not aimed at showing the problem as users experience it) I observe an interesting reaction from many industry insiders: the results are not "sexy enough for stupid users" and also "no one will care".

I think the reaction characterizes the problem correctly - but the second part is the most serious objection. People don't need a measurement tool, they need to know that this is why their home network sucks sometimes.
Greg White
2015-03-19 23:18:10 UTC
Permalink
Netalyzr is great for network geeks, hardly consumer-friendly, and even so
the "network buffer measurements" part is buried in 150 other statistics.
Why couldn't Ookla* add a simultaneous "ping" test to their throughput
test? When was the last time someone leaned on them?


*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.

-Greg
MUSCARIELLO Luca IMT/OLN
2015-03-20 08:18:35 UTC
Permalink
I agree. Having that ping included in Ookla would help a lot more.

Luca
Post by Greg White
Netalyzr is great for network geeks, hardly consumer-friendly, and even so
the "network buffer measurements" part is buried in 150 other statistics.
Why couldn't Ookla* add a simultaneous "ping" test to their throughput
test? When was the last time someone leaned on them?
*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.
-Greg
David P. Reed
2015-03-20 13:31:29 UTC
Permalink
The mystery in most users' minds is that a ping at a time when there is no load does not tell them anything at all about why the network connection will suck when their kid is uploading to YouTube.

So giving them ping time is meaningless.
I think most network engineers think ping time is a useful measure of a badly bufferbloated system. It is not.

The only measure is ping time under maximum load of raw packets.

And that requires a way to test maximum load rtt.

There is no problem with that ... other than that to understand why and how that is relevant you have to understand Internet congestion control.

Having had to testify before CRTC about this, I learned that most access providers (the Canadian ones) claim that such measurements are never made as a measure of quality, and that you can calculate expected latency by using Little's lemma from average throughput. And that dropped packets are the right measure of quality of service.

Ookla ping time is useless in a context where even the "experts" wearing ties from the top grossing Internet firms are so confused. And maybe deliberately misleading on purpose... they had to be forced to provide any data they had about congestion in their networks by a ruling during the proceeding and then responded that they had no data - they never measured queueing delay and disputed that it mattered. The proper measure of congestion was throughput.

I kid you not.

So Ookla ping time is useless against such public ignorance.



That's completely wrong for
Post by MUSCARIELLO Luca IMT/OLN
I agree. Having that ping included in Ookla would help a lot more
Luca
Sebastian Moeller
2015-03-20 13:46:29 UTC
Permalink
Hi David,
Post by David P. Reed
The mystery in most users' minds is that a ping at a time when there is no load does not tell them anything at all about why the network connection will suck when their kid is uploading to YouTube.
But it does, by giving a baseline to compare the ping time under load against ;)
Post by David P. Reed
So giving them ping time is meaningless.
I think most network engineers think ping time is a useful measure of a badly bufferbloated system. It is not.
The only measure is ping time under maximum load of raw packets.
Why raw packets? But yes, I agree; I think “ping” in this discussion is shorthand for “latency measurement under load”, which is a bit unwieldy to write. The typical speed tests are almost there, as they already create (half of) the maximum load needed for the additional measurements, and they already measure unloaded latency: they all report a “ping” number back, but that is the best-case RTT, i.e. the baseline against which to compare the latency-under-load number (obviously both numbers should be measured in exactly the same way). Measuring latency under simultaneous saturation of both up- and downlink would be even better, but measuring it during simplex saturation should already give meaningful numbers.
I think it would be great if speedtest sites could agree to measure and report such a number, so that end customers had data to base their ISP selection on (at least those fortunate few that actually have ISP choice…).
Post by David P. Reed
And that requires a way to test maximum load rtt.
There is no problem with that ... other than that to understand why and how that is relevant you have to understand Internet congestion control.
Having had to testify before CRTC about this, I learned that most access providers (the Canadian ones) claim that such measurements are never made as a measure of quality, and that you can calculate expected latency by using Little's lemma from average throughput. And that dropped packets are the right measure of quality of service.
Ookla ping time is useless in a context where even the "experts" wearing ties from the top grossing Internet firms are so confused. And maybe deliberately misleading on purpose... they had to be forced to provide any data they had about congestion in their networks by a ruling during the proceeding and then responded that they had no data - they never measured queueing delay and disputed that it mattered. The proper measure of congestion was throughput.
I kid you not.
So Ookla ping time is useless against such public ignorance.
But, if people make their choice of (higher / more expensive) service tiers dependent on their behavior at “capacity”, as approximated by a speedtest latency-under-(full-)load test, that would make it much easier for ISPs to actually respond to it; even marketing can realize that this can be monetized ;)

Best Regards
Sebastian


[...]
MUSCARIELLO Luca IMT/OLN
2015-03-20 14:05:36 UTC
Permalink
I don't know.
From my personal experience, I feel like the "experts" wearing ties
watch the speed meter and the needle moving across the red bar.

We just need to be sure about the colors: when the latency goes into the
crazy region
the needle has to cross a RED bar! GREEN is good, RED is bad (exceptions
apply in case of daltonism).

Maybe I'm oversimplifying... but not that much...

If your solution is to educate people with ties on Internet congestion
control I feel bad...

Luca
Post by David P. Reed
The mystery in most users' minds is that ping at a time when there is
no load does tell them anything at all about why the network
connection will such when their kid is uploading to youtube.
So giving them ping time is meaningless.
I think most network engineers think ping time is a useful measure of
a badly bufferbloated system. It is not.
The only measure is ping time under maximum load of raw packets.
And that requires a way to test maximum load rtt.
There is no problem with that ... other than that to understand why
and how that is relevant you have to understand Internet congestion
control.
Having had to testify before CRTC about this, I learned that most
access providers (the Canadian ones) claim that such measurements are
never made as a measure of quality, and that you can calculate
expected latency by using Little's lemma from average throughput. And
that dropped packets are the right measure of quality of service.
Ookla ping time is useless in a context where even the "experts"
wearing ties from the top grossing Internet firms are so confused. And
maybe deliberately misleading on purpose... they had to be forced to
provide any data they had about congestion in their networks by a
ruling during the proceeding and then responded that they had no data
- they never measured queueing delay and disputed that it mattered.
The proper measure of congestion was throughput.
I kid you not.
So Ookla ping time is useless against such public ignorance.
That's completely wrong for
On Mar 20, 2015, MUSCARIELLO Luca IMT/OLN
I agree. Having that ping included in Ookla would help a lot more
Luca
Netalyzr is great for network geeks, hardly consumer-friendly, and even so
the "network buffer measurements" part is buried in 150 other statistics.
Why couldn't Ookla* add a simultaneous "ping" test to their throughput
test? When was the last time someone leaned on them?
*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.
-Greg
I do think engineers operating networks get it, and that Comcast's engineers really get it, as I clarified in my followup note.
The issue is indeed prioritization of investment, engineering resources and management attention. The teams at Comcast on the engineering side have been the leaders in "bufferbloat minimizing" work, and I think they should get more recognition for that.
I disagree a little bit about not having a test that shows the issue, and the value the test would have in demonstrating the issue to users. Netalyzer has been doing an amazing job on this since before the bufferbloat term was invented. Every time I've talked about this issue I've suggested running Netalyzer, so I have a personal set of comments from people all over the world who run Netalyzer on their home networks, on hotel networks, etc.
When I have brought up these measurements from Netalyzr (which are not aimed at showing the problem as users experience it) I observe an interesting reaction from many industry insiders: the results are not "sexy enough for stupid users" and also "no one will care".
I think the reaction characterizes the problem correctly - but the second part is the most serious objection. People don't need a measurement tool, they need to know that this is why their home network sucks sometimes.
On Thursday, March 19, 2015 3:58pm, "Livingood, Jason"
How many years has it been since Comcast said they were going to fix bufferbloat in their network within a year?
I'm not sure anyone ever said it'd take a year. If someone did (even if it was me) then it was in the days when the problem appeared less complicated than it is, and I apologize for that. Let's face it - the problem is complex and the software that has to be fixed is everywhere. As I said about IPv6: if it were easy, it'd be done by now. ;-)
It's almost as if the cable companies don't want OTT video or simultaneous FTP and interactive gaming to work. Of course not. They'd never do that.
Sorry, but that seems a bit unfair. It flies in the face of what we have done and are doing. We've underwritten some of Dave's work, we got CableLabs to underwrite AQM work, and I personally pushed like heck to get AQM built into the default D3.1 spec (had CTO-level awareness & support, and was due to Greg White's work at CableLabs). We are starting to field test D3.1 gear now, by the way. We made some bad bets too, such as trying to underwrite an OpenWRT-related program with ISC, but not every tactic will always be a winner.
As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS network of any scale in the world solved it? If so, I have something to use to learn from and apply here at Comcast - and I'd **love** an introduction to someone who has, so I can get this info.
But usually there are rational explanations for why something is still not done. One of them is that the at-scale operational issues are more complicated than some people realize. And there is always a case of prioritization - meaning things like running out of IPv4 addresses and not having service trump more subtle things like buffer bloat (and the effort to get vendors to support v6 has been tremendous).
I do understand there are strong forces against us, especially in the USA.
I'm not sure there are any forces against this issue. It's more a question of awareness - it is not apparent it is more urgent than other work in everyone's backlog. For example, the number of ISP customers even aware of buffer bloat is probably 0.001%; if customers aren't asking for it, the product managers have a tough time arguing to prioritize buffer bloat work over new feature X or Y.
One suggestion I have made to increase awareness is that there be a nice, web-based, consumer-friendly latency under load / bloat test that you could get people to run as they do speed tests today. (If someone thinks they can actually deliver this, I will try to fund it - ping me off-list.)
I also think a better job can be done explaining buffer bloat - it's hard to make an 'elevator pitch' about it.
It reminds me a bit of IPv6 several years ago. Rather than saying in essence 'you operators are dummies' for not already fixing this, maybe assume the engineers all 'get it' and want to do it. Because we really do get it and want to do something about it. Then ask those operators what they need to convince their leadership and their suppliers and product managers and whomever else that it needs to be resourced more effectively (see above for example).
We're at least part of the way there in DOCSIS networks. It is in D3.1 by default, and we're starting trials now. And probably within 18-24 months we won't buy any DOCSIS CPE that is not 3.1.
The question for me is how and when to address it in DOCSIS 3.0.
- Jason
Sebastian Moeller
2015-03-20 10:07:19 UTC
Permalink
Hi All,

I guess I have nothing to say that most of you don’t know already, but...
Post by Greg White
Netalyzr is great for network geeks, hardly consumer-friendly, and even so
the "network buffer measurements" part is buried in 150 other statistics.
	The bigger issue with Netalyzr is that it is a worst-case probe: an unrelenting UDP "flood" that does not measure the "responsiveness/latency" of unrelated flows concurrently. In all fairness, it does not even test the true worst case, as it floods up- and downlink sequentially, and it seems to use the same port for all packets. This kind of traffic is well suited to measuring the worst-case buffering that misbehaving ((D)DOS) flows can cause, not necessarily the amount of effective buffering that well-behaved flows encounter.
	And the help text for the "network buffer measurements" section of the results report seems actually misleading, in that the DOS-like traffic used is assumed to be representative of normal traffic (it also does not allow for AQMs that manage normal, responsive traffic better).
	It would be so sweet if they could concurrently measure the ICMP RTT (or another type of timestamped TCP or UDP flow) to, say, a well-connected CDN, to give a first approximation of the effect of link saturation on other competing flows, and then report the change in that number caused by link saturation as the actual indicator of effective buffering...
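	To make that proposed metric concrete, here is a minimal sketch (not anything Netalyzr or Ookla actually do) of measuring the RTT change caused by link saturation. It assumes a Unix-like host with ping and curl on the PATH; the ping target and the download URL below are placeholders to be replaced with a nearby CDN node and a sufficiently large file.

#!/usr/bin/env python3
"""Rough latency-under-load probe: compare idle RTT with RTT while the
downlink is saturated, and report the difference (the 'bloat')."""
import re
import statistics
import subprocess

PING_TARGET = "8.8.8.8"                    # placeholder: any well-connected host
LOAD_URL = "http://example.com/large.bin"  # placeholder: big enough to saturate the link

def sample_rtts(count=10):
    """Run ping and return the list of RTTs in milliseconds."""
    out = subprocess.run(["ping", "-c", str(count), PING_TARGET],
                         capture_output=True, text=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+)", out)]

# 1. Idle baseline.
idle = sample_rtts()

# 2. Saturate the downlink in the background, then ping again while it runs.
load = subprocess.Popen(["curl", "-s", "-o", "/dev/null", LOAD_URL])
loaded = sample_rtts()
load.terminate()

print(f"idle   RTT: median {statistics.median(idle):6.1f} ms, max {max(idle):6.1f} ms")
print(f"loaded RTT: median {statistics.median(loaded):6.1f} ms, max {max(loaded):6.1f} ms")
print(f"latency increase under load: {statistics.median(loaded) - statistics.median(idle):6.1f} ms")

	Running the upload direction as well (and ideally both at once) would give the full picture, but even this one number is the kind of thing a consumer-facing test could show.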
Post by Greg White
Why couldn't Ookla* add a simultaneous "ping" test to their throughput
test? When was the last time someone leaned on them?
*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.
I think you are right; instead of trying to get better tools out we might have a better chance of getting small modifications into existing tools.

Best Regards
Sebastian
Rich Brown
2015-03-20 13:50:13 UTC
Permalink
Post by Greg White
Netalyzr is great for network geeks, hardly consumer-friendly, and even so
the "network buffer measurements" part is buried in 150 other statistics.
Why couldn't Ookla* add a simultaneous "ping" test to their throughput
test? When was the last time someone leaned on them?
*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.
-Greg
Back in July, I contacted the support groups at Ookla, speedof.me, and testmy.net, and all three responded, "Hmmm... We'll refer that to our techies for review," and I never heard back.

It seems to be hard to attract attention when there's only one voice crying in the wilderness. It might be worth sending a note to:

- Speedtest.net <***@speedtest.net> or open a ticket at: https://www.ookla.com/support
- SpeedOfMe <***@speedof.me>
- TestMyNet <***@testmy.net>

I append my (somewhat edited) note from July for your email drafting pleasure.

Rich

--- Sample Letter ---

Subject: Add latency measurements (min/max)

I have been using NAME-OF-SERVICE for quite a while to measure my network's performance. I had a couple thoughts that could make it more useful to me and others who want to test their network.

Your page currently displays a single "latency" value of the ping time before the data transfers begin. It would be really helpful to report real-time min/max latency measurements made *during the uploads and downloads*.

Why is latency interesting? Because when it's not well controlled, it completely destroys people's internet for voice, gaming, other time-sensitive traffic, and even everyday web browsing. As you may know, many routers (home and otherwise) buffer more data than can be sent, and this can dramatically affect latency for everyone using that router.

I'm asking you to consider implementing the web-equivalent of the "Quick Test for Bufferbloat" that's on the Bufferbloat site. (I'm a member of the Bufferbloat team.) http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat

Please get back to me if you have questions.

Many thanks!

YOUR NAME
YOUR SIG

--- end of sample ---
Pedro Tumusok
2015-03-29 17:36:11 UTC
Permalink
Dslreports got a new speedtester up, anybody know Justin or some of the
other people over there?

http://www.dslreports.com/speedtest

Maybe somebody on here could even lend a hand in getting them to implement
features like ping under load etc.


Pedro
--
Best regards / Mvh
Jan Pedro Tumusok
Jonathan Morton
2015-03-30 07:06:59 UTC
Permalink
Dslreports got a new speedtester up, anybody know Justin or some of the other people over there?
http://www.dslreports.com/speedtest
Maybe somebody on here could even lend a hand in getting them to implement features like ping under load etc.
I gave that test a quick try. It measured my download speed well enough, but the upload…

Let’s just say it effectively measured the speed to my local webcache, not to the server itself.

- Jonathan Morton

Livingood, Jason
2015-03-20 13:57:01 UTC
Permalink
Post by Greg White
*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.
Ookla has recently been acquired by Ziff-Davis
(http://finance.yahoo.com/news/ziff-davis-acquires-ookla-120100454.html).
I am not sure how that may influence their potential involvement. I have
suggested they add this test previously. I also suggested it be added to
the FCC's SamKnows / Measuring Broadband America platform and that the
FCC potentially does a one-off special report on the results.

- Jason
David P. Reed
2015-03-20 14:08:27 UTC
Permalink
SamKnows is carefully constructed politically to claim that everyone has great service and no problems are detected. They were constructed by opponents of government supervision - the corporate FCC lobby.

Don't believe they have any incentive to measure customer-relevant measures.

M-Lab is better by far. But control by Google automatically discredits its data. As well as the claims by operators that measurements by independent parties violate their trade secrets. Winning that battle requires a group that can measure while supporting a very expensive defense against lawsuits by operators making such claims of trade secrecy.

Criticizing M-Lab is just fodder for the operators' lobby in DC.
MUSCARIELLO Luca IMT/OLN
2015-03-20 14:14:48 UTC
Permalink
FYI, we have this in France.

http://www.arcep.fr/index.php?id=8571&tx_gsactualite_pi1[uid]=1701&tx_gsactualite_pi1[annee]=&tx_gsactualite_pi1[theme]=&tx_gsactualite_pi1[motscle]=&tx_gsactualite_pi1[backID]=26&cHash=f558832b5af1b8e505a77860f9d555f5&L=1

ARCEP is the equivalent of FCC in France.

User QoS on the fixed access is measured by third parties.
The tests they run can of course be improved, but the concept is right.
The data is then published periodically.

Luca
Post by David P. Reed
M-Lab is better by far. But control by Google automatically discredits
it's data. As well as the claims by operators that measurements by
independent parties violate their trade secrets. Winning that battle
requires a group that can measure while supporting a very expensive
defense against lawsuits by operators making such claim of trade secrecy.
Matt Mathis
2015-03-20 14:48:36 UTC
Permalink
Section 7.2 of
https://tools.ietf.org/html/draft-ietf-ippm-model-based-metrics-04 includes
a bufferbloat test. It is however somewhat underspecified.

Thanks,
--MM--
The best way to predict the future is to create it. - Alan Kay

Privacy matters! We know from recent events that people are using our
services to speak in defiance of unjust governments. We treat privacy and
security as matters of life and death, because for some users, they are.

V***@vt.edu
2015-03-20 18:04:40 UTC
Permalink
Post by David P. Reed
M-Lab is better by far. But control by Google automatically discredits its data.
Criticizing M-LAB is just fodder fir the operators' lobby in DC.
I'm trying to get those two statements to play nice together, but keep having
to beat down the cognitive dissonance with a stick.

And why, exactly, are they automatically discredited? Unless you live in one
of the very few places that has Google Fiber, you're someplace where Google
has a vested interest in improving the eyeball-to-content connection, because
they want to get content to you.
Jim Gettys
2015-03-20 13:48:28 UTC
Permalink
On Thu, Mar 19, 2015 at 3:58 PM, Livingood, Jason <
Post by Dave Taht
Post by d***@reed.com
How many years has it been since Comcast said they were going to fix
bufferbloat in their network within a year?
I'm not sure anyone ever said it'd take a year. If someone did (even if it
was me) then it was in the days when the problem appeared less complicated
than it is and I apologize for that. Let's face it - the problem is
complex and the software that has to be fixed is everywhere. As I said
about IPv6: if it were easy, it'd be done by now. ;-)
I think this was the hope that the buffer size control feature in DOCSIS
could at least be used to cut bufferbloat down to the "traditional" 100ms
level, as I remember the sequence of events. But reality intervened: buggy
implementations by too many vendors, is what I remember hearing from Rich
Woundy.
Post by Dave Taht
Post by d***@reed.com
It's almost as if the cable companies don't want OTT video or
simultaneous FTP and interactive gaming to work. Of course not. They'd
never do that.
Sorry, but that seems a bit unfair. It flies in the face of what we have
done and are doing. We've underwritten some of Dave's work, we got
CableLabs to underwrite AQM work, and I personally pushed like heck to get
AQM built into the default D3.1 spec (had CTO-level awareness & support,
and was due to Greg White's work at CableLabs). We are starting to field
test D3.1 gear now, by the way. We made some bad bets too, such as trying
to underwrite an OpenWRT-related program with ISC, but not every tactic
will always be a winner.
As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS
network of any scale in the world solved it? If so, I have something to
use to learn from and apply here at Comcast - and I'd **love** an
introduction to someone who has so I can get this info.
But usually there are rational explanations for why something is still not
done. One of them is that the at-scale operational issues are more
complicated than some people realize. And there is always a case of
prioritization - meaning things like running out of IPv4 addresses and not
having service trump more subtle things like buffer bloat (and the effort
to get vendors to support v6 has been tremendous).
Post by Dave Taht
I do understand there are strong forces against us, especially in the USA.
I'm not sure there are any forces against this issue. It's more a question
of awareness - it is not apparent it is more urgent than other work in
everyone's backlog. For example, the number of ISP customers even aware of
buffer bloat is probably 0.001%; if customers aren't asking for it, the
product managers have a tough time arguing to prioritize buffer bloat work
over new feature X or Y.
I agree with Jason on this one. We have to take bufferbloat mainstream to
generate "market pull". I've been reluctant in the past before we had
solutions in hand: very early in this quest, Dave Clark noted:
"Yelling fire without having the exits marked" could be counterproductive.
I think we have the exits marked now. Time to yell "Fire".

Even when you get to engineers in the organizations who build the
equipment, it's hard. First you have to explain that "more is not better",
and "some packet loss is good for you".

Day-to-day market pressures for other features mean that:
1) many/most of the engineers don't see that as what they need to do in the next quarter/year.
2) their management doesn't see that working on it should take any of their time. It won't help them sell the next set of gear.

***So we have to generate demand from the market.***

Now, I can see a couple ways to do this:

1) help expose the problem, preferably in a dead simple way that everyone
sees. If we can get Ookla to add a simple test to their test system, this
would be a good start. If not, other test sites are needed. Nice as
Netalyzer is, it a) tops out around 20Mbps, and b) buries the buffering
results among 50 other numbers.
2) Markets such as gaming are large, and very latency sensitive. Even
better, lots of geeks hang out there. So investing in educating that
submarket may help pull things through the system overall.
3) Competitive pressures can be very helpful: but this requires at least
one significant player in each product category to "get it". So these are
currently slow falling dominoes.
One suggestion I have made to increase awareness is that there be a nice,
web-based, consumer-friendly latency under load / bloat test that you
could get people to run as they do speed tests today. (If someone thinks
they can actually deliver this, I will try to fund it - ping me off-list.)
I also think a better job can be done explaining buffer bloat - it's hard
to make an 'elevator pitch' about it.
Yeah, the elevator pitch is hard, since a number of things around
bufferbloat are counter intuitive. I know, I've tried, and not really
succeeded. The best kinds of metaphors have been traffic related
("building parking lots at all the bottlenecks"), and explanations like
"packet loss is how the Internet enforces speed limits"
http://www.circleid.com/posts/20150228_packet_loss_how_the_internet_enforces_speed_limits/
.
​
It reminds me a bit of IPv6 several years ago. Rather than saying in
essence 'you operators are dummies' for not already fixing this, maybe
assume the engineers all 'get it' and want to do it.
Many/most practicing engineers are still unaware of it, or, if they have
heard the word bufferbloat, still don't "get it" that they see
bufferbloat's effects all the time.
​
Because we really do
get it and want to do something about it. Then ask those operators what
they need to convince their leadership and their suppliers and product
managers and whomever else that it needs to be resourced more effectively
(see above for example).
We're at least part of the way there in DOCSIS networks. It is in D3.1 by
default, and we're starting trials now. And probably within 18-24 months
we won't buy any DOCSIS CPE that is not 3.1.
The question for me is how and when to address it in DOCSIS 3.0.
We should talk at IETF.
- Jason
Livingood, Jason
2015-03-20 14:11:12 UTC
Permalink
On 3/20/15, 9:48 AM, "Jim Gettys" <***@freedesktop.org<mailto:***@freedesktop.org>> wrote:
I think this was the hope that the buffer size control feature in DOCSIS could at least be used to cut bufferbloat down to the "traditional" 100ms level, as I remember the sequence of events. But reality intervened: buggy implementations by too many vendors, is what I remember hearing from Rich Woundy.

Indeed!

If I can re-prioritize some work (and fight some internal battles) to do a buffer bloat trial this year (next few months) - would folks here be willing to give input on the design / parameters? It would not be perfect but would be along the lines of ‘what’s the best we can do regarding buffer bloat with the equipment/software/systems/network we have now’.

​
Even when you get to engineers in the organizations who build the equipment, it's hard. First you have to explain that "more is not better", and "some packet loss is good for you".

That’s right, Jim. The “some packet loss is good” part is, from what I have seen, the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible, not to mention that you should never fill a link to capacity (meaning either that there should never be a bottleneck link anywhere on the Internet or that congestion should never occur anywhere).

***So we have to generate demand from the market.***

+1

1) help expose the problem, preferably in a dead simple way that everyone sees. If we can get Ookla to add a simple test to their test system, this would be a good start. If not, other test sites are needed. Nice as Netalyzer is, it a) tops out around 20Mbps, and b) buries the buffering results among 50 other numbers.

+1

2) Markets such as gaming are large, and very latency sensitive. Even better, lots of geeks hang out there. So investing in educating that submarket may help pull things through the system overall.

Consumer segments like gamers are very important. I suggest getting them coordinated in some manner. Create a campaign like #GamersAgainstBufferbloat / GamersAgainstBufferbloat.org or something.

We should talk at IETF.

Wish I were there! I will be in Amsterdam at the RIPE Atlas hack-a-thon. Some cool work happening on that measurement platform!

Jason
Michael Welzl
2015-03-20 14:54:07 UTC
Permalink
Folks,
That’s right, Jim. The “some packet loss is good” part is from what I have seen the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible (..)
I understand the "wrong mindset" thing and the idea of AQM doing something better. Still, I'd like people to understand that packet loss often also comes with delay - for having to retransmit. This delay is not visible in the queue, but it's visible in the end system. It also comes with head-of-line blocking delay on the receiver side: at least with TCP, whatever has been received after a dropped packet needs to wait in the OS for the hole to be filled before it can be handed over to the application.

Here we're not talking a few ms more or less in the queue, we're talking an RTT, when enough DupACKs are produced to make the sender clock out the missing packet again. Else, we're talking an RTO, which can be much, much more than an RTT, and which is what TLP tries to fix (but TLP's timer is also 2 RTTs - so this is all about delay at RTT-and-higher magnitudes).

Again, significant delay can come from dropped packets - you just don't see it when all you measure is the queue. ECN can help.

Cheers,
Michael
Jim Gettys
2015-03-20 15:31:27 UTC
Permalink
Post by Michael Welzl
Folks,
Post by Livingood, Jason
That’s right, Jim. The “some packet loss is good” part is from what I
have seen the hardest thing for people to understand. People have been
trained to believe that any packet loss is terrible (..)
I understand the "wrong mindset" thing and the idea of AQM doing something
better. Still, I'd like people to understand that packet loss often also
comes with delay - for having to retransmit. This delay is not visible in
the queue, but it's visible in the end system. It also comes with
head-of-line blocking delay on the receiver side: at least with TCP,
whatever has been received after a dropped packet needs to wait in the OS
for the hole to be filled before it can be handed over to the application.
Here we're not talking a few ms more or less in the queue, we're talking
an RTT, when enough DupACKs are produced to make the sender clock out the
missing packet again. Else, we're talking an RTO, which can be much, much
more than an RTT, and which is what TLP tries to fix (but TLP's timer is
also 2 RTTs - so this is all about delay at RTT-and-higher magnitudes).
Again, significant delay can come from dropped packets - you just don't
see it when all you measure is the queue. ECN can help.
And without AQM, the RTTs are often many times the actual speed-of-light
RTTs, sometimes measured in seconds. And you eventually get the losses
anyway, as the bloated queues overflow.

So without AQM, you are often/usually in much, much, much worse shape;
better to suffer the loss and do the retransmit than wait forever.
- Jim
Post by Michael Welzl
Cheers,
Michael
Michael Welzl
2015-03-20 15:39:11 UTC
Permalink
Sent from my iPhone
Post by Michael Welzl
Folks,
That’s right, Jim. The “some packet loss is good” part is from what I have seen the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible (..)
I understand the "wrong mindset" thing and the idea of AQM doing something better. Still, I'd like people to understand that packet loss often also comes with delay - for having to retransmit. This delay is not visible in the queue, but it's visible in the end system. It also comes with head-of-line blocking delay on the receiver side: at least with TCP, whatever has been received after a dropped packet needs to wait in the OS for the hole to be filled before it can be handed over to the application.
Here we're not talking a few ms more or less in the queue, we're talking an RTT, when enough DupACKs are produced to make the sender clock out the missing packet again. Else, we're talking an RTO, which can be much, much more than an RTT, and which is what TLP tries to fix (but TLP's timer is also 2 RTTs - so this is all about delay at RTT-and-higher magnitudes).
Again, significant delay can come from dropped packets - you just don't see it when all you measure is the queue. ECN can help.
And without AQM, the RTTs are often many times the actual speed-of-light RTTs, sometimes measured in seconds. And you eventually get the losses anyway, as the bloated queues overflow.
Not necessarily with ECN. And where in a burst the loss occurs also matters.
So without AQM, you are often/usually in much, much, much worse shape; better to suffer the loss and do the retransmit than wait forever.
Sure!!
- Jim
Post by Michael Welzl
Cheers,
Michael
Jonathan Morton
2015-03-20 16:31:53 UTC
Permalink
Post by Michael Welzl
I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.

With ECN, of course, you don’t even have that caveat.

- Jonathan Morton
Michael Welzl
2015-03-20 20:59:37 UTC
Permalink
Post by Jonathan Morton
Post by Michael Welzl
I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :

***
To get a sense of just how long the RTOs are in relation to
connection RTTs, following is the distribution of RTO/RTT values on
Google Web servers. [percentile, RTO/RTT]: [50th percentile, 4.3];
[75th percentile, 11.3]; [90th percentile, 28.9]; [95th percentile,
53.9]; [99th percentile, 214].
***

That would be for the unfortunate case where you drop a packet at the end of a burst and you don't have TLP or anything, and only an RTO helps...

Cheers,
Michael
David P. Reed
2015-03-20 23:47:01 UTC
Permalink
I think this is because there are a lot of packets in flight from end to end, meaning that the window is wide open and has way overshot the mark. This can happen if the receiving end keeps opening its window and has not encountered a lost frame. That is: the dropped or marked packets are not happening early enough.

Evaluating an RTO measure from an out of whack system that is not sending congestion signals is not a good source of data, unless you show the internal state of the endpoints that was going on at the same time.

Do the control theory.
Michael Welzl
2015-03-21 00:08:00 UTC
Permalink
Post by David P. Reed
I think this is because there are a lot of packets in flight from end to end meaning that the window is wide open and has way overshot the mark. This can happen if the receiving end keeps opening it's window and has not encountered a lost frame. That is: the dropped or marked packets are not happening early eniugh.
... or they're so early that there are not enough RTT samples for a meaningful RTT measure.
Post by David P. Reed
Evaluating an RTO measure from an out of whack system that is not sending congestion signals is not a good source of data, unless you show the internal state of the endpoints that was going on at the same time.
Do the control theory.
Well - the RTO calculation can easily go out of whack when there is some variation, due to the + 4*RTTVAR bit. I don't need control theory to show that; a simple Excel sheet with a few realistic example numbers is enough. There's not much deep logic behind the 4*RTTVAR AFAIK - probably 4 worked OK in tests that Van did back then. That's okay though, as fine-tuning would mean making more assumptions about the path, which is unknown in TCP - it's just a conservative calculation, and the RTO being way too large often just doesn't matter much (thanks to DupACKs). Anyway, sometimes it can - and then a dropped packet can be pretty bad.
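To put a few of those "realistic example numbers" into runnable form, here is a small sketch of the standard RTO computation from RFC 6298 (SRTT/RTTVAR smoothing and the +4*RTTVAR term). The RTT samples below are invented purely for illustration, and the 1-second floor is the RFC's recommendation rather than what every stack actually uses.

"""RFC 6298 RTO computation: SRTT and RTTVAR smoothing, RTO = SRTT + 4*RTTVAR."""

ALPHA, BETA = 1 / 8, 1 / 4   # smoothing gains from RFC 6298
MIN_RTO = 1.0                # RFC 6298 recommends a 1 s floor; real stacks often use less

def rto_trace(samples):
    srtt = rttvar = None
    for r in samples:
        if srtt is None:                      # first measurement
            srtt, rttvar = r, r / 2
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
        rto = max(srtt + 4 * rttvar, MIN_RTO)
        print(f"sample {r*1000:6.1f} ms -> SRTT {srtt*1000:6.1f} ms, "
              f"RTTVAR {rttvar*1000:6.1f} ms, RTO {rto*1000:7.1f} ms")

# A steady 50 ms path versus the same path with a little jitter:
rto_trace([0.050] * 6)
rto_trace([0.050, 0.080, 0.045, 0.120, 0.050, 0.090])

Even modest jitter keeps the RTO several times the RTT, which is exactly why a tail drop that has to wait for the RTO timer hurts so much more than one repaired by fast retransmit.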

Cheers
Michael
David Lang
2015-03-21 00:03:16 UTC
Permalink
Post by Michael Welzl
I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
You are viewing this as a question of whether to drop a packet or not.

The problem is that isn't the actual question.

The question is to drop a packet early and have the sender slow down, or wait
until the sender has filled the buffer to the point that all traffic (including
acks) is experiencing multi-second latency and then drop a bunch of packets.

In theory ECN would allow for feedback to the sender to have it slow down
without any packet being dropped, but in the real world it doesn't work that
well.

1. If you mark packets as congested if they have ECN and drop them if they
don't, programmers will mark everything ECN (and not slow transmission) because
doing so gives them an advantage over applications that don't mark their packets
with ECN

marking packets with ECN gives an advantage to them in mixed environments

2. If you mark packets as congested at a lower level than where you drop them,
no programmer is going to enable ECN because flows with ECN will be prioritized
below flows without ECN

If everyone uses ECN you don't have a problem, but if only some
users/applications do, there's no way to make it equal, so one or the other is
going to have an advantage, programmers will game the system to do whatever
gives them the advantage

David Lang
Steinar H. Gunderson
2015-03-21 00:13:07 UTC
Permalink
Post by David Lang
1. If you mark packets as congested if they have ECN and drop them
if they don't, programmers will mark everything ECN (and not slow
transmission) because doing so gives them an advantage over
applications that don't mark their packets with ECN
I'm not sure if this is actually true. Somehow TCP stacks appear to be tricky
enough to mess with that the people who are capable of gaming congestion
control algorithms are also wise enough not to do so. Granted, we are seeing
some mild IW escalation, but you could very well make a TCP that's
dramatically unfair to everything else and deploy that on your CDN, and
somehow we're not seeing that.

(OK, concession #2, “download accelerators” are doing really bad things with
multiple connections to gain TCP unfairness, but that's on the client side
only, not the server side.)

Based on this, I'm not convinced that people would bulk-mark their packets as
ECN-capable just to get ahead in the queues. It _is_ hard to know when to
drop and when to ECN-mark, though; maybe you could imagine the benefits of
ECN (for the flow itself) to be big enough that you don't actually need to
lower the drop probability (just make the ECN probability a bit higher),
but this is pure unfounded speculation on my behalf.

/* Steinar */
--
Homepage: http://www.sesse.net/
David Lang
2015-03-21 00:25:08 UTC
Permalink
Post by Steinar H. Gunderson
Post by David Lang
1. If you mark packets as congested if they have ECN and drop them
if they don't, programmers will mark everything ECN (and not slow
transmission) because doing so gives them an advantage over
applications that don't mark their packets with ECN
I'm not sure if this is actually true. Somehow TCP stacks appear to be tricky
enough to mess with that the people who are capable of gaming congestion
control algorithms are also wise enough not to do so. Granted, we are seeing
some mild IW escalation, but you could very well make a TCP that's
dramatically unfair to everything else and deploy that on your CDN, and
somehow we're not seeing that.
It doesn't take deep mucking with the TCP stack. A simple iptables rule to OR a
bit on as it's leaving the box would make the router think that the system has
ECN enabled (or do it on your local gateway if you think it gives you higher
priority over the wider network)

If you start talking about ECN and UDP things are even simpler, there's no need
to go through the OS stack at all, craft your own packets and send the raw
packets
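
As an illustration of how little effort such spoofing takes (a sketch only, using the scapy packet-crafting library; the destination address, port and payload are placeholders), setting the ECT(0) codepoint on a hand-crafted UDP packet is essentially a one-liner:

# Illustration only: setting the ECN field on hand-crafted packets is trivial.
# Requires scapy and root privileges; the destination below is a placeholder.
from scapy.all import IP, UDP, Raw, send

ECT0 = 0x02   # low two bits of the TOS byte: 10 = ECT(0), 01 = ECT(1), 11 = CE

pkt = IP(dst="192.0.2.1", tos=ECT0) / UDP(dport=9) / Raw(b"ecn-capable-looking traffic")
send(pkt)     # the packet claims ECN capability regardless of whether the sender will honour marks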
Post by Steinar H. Gunderson
(OK, concession #2, “download accelerators” are doing really bad things with
multiple connections to gain TCP unfairness, but that's on the client side
only, not the server side.)
Based on this, I'm not convinced that people would bulk-mark their packets as
ECN-capable just to get ahead in the queues.
Given the money they will spend and the cargo-cult steps that gamers will do in
the hope of gaining even a slight advantage, I can easily see this happening
Post by Steinar H. Gunderson
It _is_ hard to know when to
drop and when to ECN-mark, though; maybe you could imagine the benefits of
ECN (for the flow itself) to be big enough that you don't actually need to
lower the drop probability (just make the ECN probability a bit higher),
but this is pure unfounded speculation on my behalf.
As I said, there are two possibilities

1. if you mark packets sooner than you would drop them, advantage non-ECN

2. if you mark packets and don't drop them until higher levels, advantage ECN,
and big advantage to fake ECN

David Lang
Jonathan Morton
2015-03-21 00:34:23 UTC
Permalink
Post by David Lang
As I said, there are two possibilities
1. if you mark packets sooner than you would drop them, advantage non-ECN
2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN
3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking ECN doesn’t matter to other traffic - it just turns the faker’s allocation of queue into a dumb, non-AQM one. No problem.

- Jonathan Morton
David Lang
2015-03-21 00:38:26 UTC
Permalink
Post by David Lang
As I said, there are two possibilities
1. if you mark packets sooner than you would drop them, advantage non-ECN
2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN
3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking ECN doesn’t matter to other traffic - it just turns the faker’s allocation of queue into a dumb, non-AQM one. No problem.
so if every flow is isolated so that what it generates has no effect on any
other traffic, what value does ECN provide?

and how do you decide what the fair allocation of bandwidth is between all the
threads?

David Lang
Jonathan Morton
2015-03-21 00:43:58 UTC
Permalink
Post by Jonathan Morton
Post by David Lang
As I said, there are two possibilities
1. if you mark packets sooner than you would drop them, advantage non-ECN
2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN
3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking ECN doesn’t matter to other traffic - it just turns the faker’s allocation of queue into a dumb, non-AQM one. No problem.
so if every flow is isolated so that what it generates has no effect on any other traffic, what value does ECN provide?
A *genuine* ECN flow benefits from reduced packet loss and smoother progress, because the AQM can signal congestion to it without dropping.
and how do you decide what the fair allocation of bandwidth is between all the threads?
Using DRR. This is what fq_codel does already, as it happens. As does cake.
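For readers who have not met DRR, here is a toy sketch of deficit round robin over per-flow queues, which is the scheduling being referred to (fq_codel layers CoDel and a new/old-flow distinction on top of this; the quantum and the example flows below are invented for illustration):

from collections import deque

QUANTUM = 1514   # bytes of credit added per round (roughly one full-size packet)

def drr_dequeue(flows, deficits, rounds=3):
    """Toy deficit round robin: each backlogged flow gets QUANTUM bytes of credit
    per round and may send packets as long as its deficit covers their size."""
    for _ in range(rounds):
        for name, queue in flows.items():
            if not queue:
                deficits[name] = 0          # idle flows don't accumulate credit
                continue
            deficits[name] += QUANTUM
            while queue and queue[0] <= deficits[name]:
                size = queue.popleft()
                deficits[name] -= size
                print(f"sent {size:5d} B from {name}")

# A bulk flow with big packets shares the link with a small interactive flow.
flows = {"bulk": deque([1514] * 8), "game": deque([120] * 8)}
drr_dequeue(flows, {name: 0 for name in flows})

The bulk flow gets one full-size packet per round while the small flow's packets all go out immediately, which is why faking ECN inside one's own queue cannot starve anybody else.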

In other words, the last half-dozen posts have been an argument about a solved problem.

- Jonathan Morton
Michael Welzl
2015-03-22 04:15:48 UTC
Permalink
Post by Steinar H. Gunderson
Post by David Lang
1. If you mark packets as congested if they have ECN and drop them
if they don't, programmers will mark everything ECN (and not slow
transmission) because doing so gives them an advantage over
applications that don't mark their packets with ECN
I'm not sure if this is actually true. Somehow TCP stacks appear to be tricky
enough to mess with that the people who are capable of gaming congestion
control algorithms are also wise enough not to do so. Granted, we are seeing
some mild IW escalation, but you could very well make a TCP that's
dramatically unfair to everything else and deploy that on your CDN, and
somehow we're not seeing that.
It doesn't take deep mucking with the TCP stack. A simple iptables rule to OR a bit on as it's leaving the box would make the router think that the system has ECN enabled (or do it on your local gateway if you think it gives you higher priority over the wider network)
If you start talking about ECN and UDP things are even simpler, there's no need to go through the OS stack at all, craft your own packets and send the raw packets
Post by Steinar H. Gunderson
(OK, concession #2, “download accelerators” are doing really bad things with
multiple connections to gain TCP unfairness, but that's on the client side
only, not the server side.)
Based on this, I'm not convinced that people would bulk-mark their packets as
ECN-capable just to get ahead in the queues.
Given the money they will spend and the cargo-cult steps that gamers will do in the hope of gaining even a slight advantage, I can easily see this happening
Post by Steinar H. Gunderson
It _is_ hard to know when to
drop and when to ECN-mark, though; maybe you could imagine the benefits of
ECN (for the flow itself) to be big enough that you don't actually need to
lower the drop probability (just make the ECN probability a bit higher),
but this is pure unfounded speculation on my behalf.
As I said, there are two possibilities
1. if you mark packets sooner than you would drop them, advantage non-ECN
Agreed, with a risk of starvation of ECN flows as we've seen - this is not easy to get right and shouldn't be "just done somehow".
2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN
Same level as you would normally drop is what the RFC recommends. Result: advantage ECN mostly because of the end-to-end effects I was explaining earlier, not because of the immediate queuing behavior (as figure 14 in https://www.duo.uio.no/handle/10852/37381 shows). "Big advantage to fake ECN" is the part I don't buy; I explained in more detail in the AQM list.
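(Concretely, "mark where you would otherwise drop" amounts to something like the sketch below; the packet fields are illustrative, and the decision to shed at all would come from the AQM proper - CoDel, PIE, RED - not from this fragment. The only asymmetry left is the end-to-end one: the ECN flow keeps its packet and avoids a retransmit.)

    from dataclasses import dataclass

    @dataclass
    class Pkt:
        ect: bool          # sender negotiated ECN: ECT(0) or ECT(1) set
        ce: bool = False   # congestion experienced mark

    def on_congestion(pkt: Pkt, aqm_wants_to_shed: bool) -> str:
        """One threshold for everyone: no earlier marking, no later dropping."""
        if not aqm_wants_to_shed:
            return "forward"
        if pkt.ect:
            pkt.ce = True      # signal congestion without losing the packet
            return "forward"
        return "drop"          # non-ECN traffic gets the classic signal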

Cheers,
Michael
Michael Welzl
2015-03-21 00:15:35 UTC
Permalink
Post by David Lang
Post by Jonathan Morton
Post by Michael Welzl
I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
You are viewing this as a question to drop a packet or not drop a packet.
The problem is that isn't the actual question.
The question is to drop a packet early and have the sender slow down, or wait until the sender has filled the buffer to the point that all traffic (including acks) is experiencing multi-second latency and then drop a bunch of packets.
In theory ECN would allow for feedback to the sender to have it slow down without any packet being dropped, but in the real world it doesn't work that well.
I think it's about time we finally turn it on in the real world.
Post by David Lang
1. If you mark packets as congested if they have ECN and drop them if they don't, programmers will mark everything ECN (and not slow transmission) because doing so gives them an advantage over applications that don't mark their packets with ECN
I heard this before but don't buy this as being a significant problem (and haven't seen evidence thereof either). Getting more queue space and occasionally getting a packet through that others don't isn't that much of an advantage - it comes at the cost of latency for your own application too unless you react to congestion.
Post by David Lang
marking packets with ECN gives an advantage to them in mixed environments
2. If you mark packets as congested at a lower level than where you drop them, no programmer is going to enable ECN because flows with ECN will be prioritized below flows without ECN
Well.... longer story. Let me just say that marking where you would otherwise drop would be fine as a starting point. You don't HAVE to mark lower than you'd drop.
Post by David Lang
If everyone use ECN you don't have a problem, but if only some users/applications do, there's no way to make it equal, so one or the other is going to have an advantage, programmers will game the system to do whatever gives them the advantage
I don't buy this at all. Game to gain what advantage? Anyway I can be more aggressive than everyone else if I want to, by backing off less, or not backing off at all, with or without ECN. Setting ECN-capable lets me do this while also getting a few more packets through without dropping - but packets get dropped at the hard queue limit anyway. So what's the big deal? What is the major gain that can be gained over others?

Cheers,
Michael
David Lang
2015-03-21 00:29:00 UTC
Permalink
Post by Michael Welzl
Post by David Lang
Post by Michael Welzl
I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
You are viewing this as a question to drop a packet or not drop a packet.
The problem is that isn't the actual question.
The question is to drop a packet early and have the sender slow down, or wait
until the sender has filled the buffer to the point that all traffic
(including acks) is experiencing multi-second latency and then drop a bunch
of packets.
In theory ECN would allow for feedback to the sender to have it slow down
without any packet being dropped, but in the real world it doesn't work that
well.
I think it's about time we finally turn it on in the real world.
Post by David Lang
1. If you mark packets as congested if they have ECN and drop them if they
don't, programmers will mark everything ECN (and not slow transmission)
because doing so gives them an advantage over applications that don't mark
their packets with ECN
I heard this before but don't buy this as being a significant problem (and
haven't seen evidence thereof either). Getting more queue space and
occasionally getting a packet through that others don't isn't that much of an
advantage - it comes at the cost of latency for your own application too
unless you react to congestion.
but the router will still be working to reduce traffic, so more non-ECN flows will get packets dropped to reduce the load.
Post by Michael Welzl
Post by David Lang
marking packets with ECN gives an advantage to them in mixed environments
2. If you mark packets as congested at a lower level than where you drop
them, no programmer is going to enable ECN because flows with ECN will be
prioritized below flows without ECN
Well.... longer story. Let me just say that marking where you would otherwise
drop would be fine as a starting point. You don't HAVE to mark lower than
you'd drop.
Post by David Lang
If everyone use ECN you don't have a problem, but if only some
users/applications do, there's no way to make it equal, so one or the other
is going to have an advantage, programmers will game the system to do
whatever gives them the advantage
I don't buy this at all. Game to gain what advantage? Anyway I can be more
aggressive than everyone else if I want to, by backing off less, or not
backing off at all, with or without ECN. Setting ECN-capable lets me do this
with also getting a few more packets through without dropping - but packets
get dropped at the hard queue limit anyway. So what's the big deal? What is
the major gain that can be gained over others?
for gamers, even a small gain can be major. Don't forget that there's also the
perceived advantage "If I do this, everyone else's packets will be dropped and
mine will get through, WIN!!!"

David Lang
Michael Welzl
2015-03-22 04:10:25 UTC
Permalink
Post by Michael Welzl
Post by David Lang
Post by Jonathan Morton
Post by Michael Welzl
I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
You are viewing this as a question to drop a packet or not drop a packet.
The problem is that isn't the actual question.
The question is to drop a packet early and have the sender slow down, or wait until the sender has filled the buffer to the point that all traffic (including acks) is experiencing multi-second latency and then drop a bunch of packets.
In theory ECN would allow for feedback to the sender to have it slow down without any packet being dropped, but in the real world it doesn't work that well.
I think it's about time we finally turn it on in the real world.
Post by David Lang
1. If you mark packets as congested if they have ECN and drop them if they don't, programmers will mark everything ECN (and not slow transmission) because doing so gives them an advantage over applications that don't mark their packets with ECN
I heard this before but don't buy this as being a significant problem (and haven't seen evidence thereof either). Getting more queue space and occasionally getting a packet through that others don't isn't that much of an advantage - it comes at the cost of latency for your own application too unless you react to congestion.
but the router will still be working to reduce traffic, so more non-ECN flows will get packets dropped to reduce the load.
Post by Michael Welzl
Post by David Lang
marking packets with ECN gives an advantage to them in mixed environments
2. If you mark packets as congested at a lower level than where you drop them, no programmer is going to enable ECN because flows with ECN will be prioritized below flows without ECN
Well.... longer story. Let me just say that marking where you would otherwise drop would be fine as a starting point. You don't HAVE to mark lower than you'd drop.
Post by David Lang
If everyone use ECN you don't have a problem, but if only some users/applications do, there's no way to make it equal, so one or the other is going to have an advantage, programmers will game the system to do whatever gives them the advantage
I don't buy this at all. Game to gain what advantage? Anyway I can be more aggressive than everyone else if I want to, by backing off less, or not backing off at all, with or without ECN. Setting ECN-capable lets me do this with also getting a few more packets through without dropping - but packets get dropped at the hard queue limit anyway. So what's the big deal? What is the major gain that can be gained over others?
for gamers, even a small gain can be major. Don't forget that there's also the perceived advantage "If I do this, everyone else's packets will be dropped and mine will get through, WIN!!!"
I just addressed this with a message to the AQM list (should soon be in the archives: http://www.ietf.org/mail-archive/web/aqm/current/maillist.html ). In short, I don't see any clear indications for this "benefit". And clearly game developers also want low delay - blowing up the queue creates more delay... and without clear knowledge about how many flows are actively filling up the queue in parallel, there is a risk of creating extra delay with this for no actual benefit whatsoever.

Cheers,
Michael
Jonathan Morton
2015-03-20 18:14:03 UTC
Permalink
Post by Livingood, Jason
Even when you get to engineers in the organizations who build the equipment, it's hard. First you have to explain that "more is not better", and "some packet loss is good for you".
That’s right, Jim. The “some packet loss is good” part is, from what I have seen, the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible, not to mention that you should never fill a link to capacity (meaning either that there should never be a bottleneck link anywhere on the Internet and/or that congestion should never occur anywhere).
That’s a rather interesting combination of viewpoints to have - and very revealing, too, of a fundamental disconnect between their mental theory of how the Internet works and how the Internet is actually used.

So here are some talking points that might be useful in an elevator pitch. The wording will need to be adjusted to circumstances.

In short, they’re thinking only about the *core* Internet. There, not being the bottleneck is a reasonably good idea, and packet loss is a reasonable metric of performance. Buffers are used to absorb momentary bursts exceeding the normal rate, and since the link is supposed to never be congested, it doesn’t matter for latency how big those buffers are. Adding capacity to satisfy that assumption is relatively easy, too - just plug in another 10G Ethernet module for peering, or another optical transceiver on a spare light-frequency for transit. Or so I hear.

But nobody sees the core Internet except a few technician types in shadowy datacentres. At least 99.999% of Internet users have to deal with the last mile on a daily basis - and it’s usually the last mile that is the bottleneck, unless someone *really* screwed up on a peering arrangement. The key technologies in the last mile are the head-end, the CPE modem, and the CPE router; the last two might be in the same physical box as each other. Those three are where we’re focusing our attention.

There, the basic assumption that the link should never be loaded to capacity is utter bunk. The only common benchmarks of Internet performance that most people have access to (and which CPE vendors perform) are to do precisely that, and see just how big they can make the resulting bandwidth number. And as soon as anyone starts a big TCP/IP-based upload or download, such as a software update or a video, the TCP stack in any modern OS will do its level best to load the link to capacity - and beyond. This is more than a simple buffer - of *any* size - can deal with.

As an aside, it’s occasionally difficult to convince last-mile ISPs that packet loss (of several percent, due to line quality, not congestion) *is* a problem. But in that case, it’s probably because it would cost money (and thus profit margin) to send someone out to fix the underlying physical cause. It really is a different world.

Once upon a time, the receive window of TCP was limited to 64KB, and the momentary bursts that could be expected from a single flow were limited accordingly. Those days are long gone. Given the chance, a modern TCP stack will increase the receive and congestion window to multi-megabyte proportions. Even on a premium, 100Mbps cable or FTTC downlink (which most consumers can’t afford and often can’t even obtain), that corresponds to roughly a whole second of buffering; an order of magnitude above the usual rule of thumb for buffer sizing. On slower links, the proportions are even more outrageous. Something to think about next time you’re negotiating microseconds with a high-frequency trading outfit.
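(The arithmetic behind that "whole second", for anyone who wants to check it. The 12 MB window is an assumption of mine, but well within what window autotuning will grant on an unconstrained host; the 100 ms RTT is the usual rule-of-thumb figure.)

    link_bps = 100e6              # 100 Mbit/s downlink
    window   = 12 * 2**20         # assume a 12 MB receive/congestion window
    rtt      = 0.1                # 100 ms path

    bdp        = link_bps / 8 * rtt      # ~1.25 MB actually needed to fill the pipe
    standing_q = window - bdp            # everything beyond the BDP just sits in a buffer
    print(standing_q / (link_bps / 8))   # ~0.9 s of induced queueing delay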

I count myself among the camp of “packet loss is bad”. However, I have the sense to realise that if more packets are persistently coming into a box than can be sent out the other side, some of those packets *will* be lost, sooner or later. What AQM does is to signal (either through early loss or ECN marking) to the TCP endpoints that the link capacity has been reached, and it can stop pushing now - please - thank you. This allows the buffer to do its designed job of absorbing momentary bursts.

Given that last-mile links are often congested, it becomes important to distinguish between latency-sensitive and throughput-sensitive traffic flows. VoIP and online gaming are the most obvious examples of latency-sensitive traffic, but Web browsing is *also* more latency-sensitive than throughput-sensitive, for typical modern Web pages. Video streaming, software updates and uploading photos are good examples of throughput-sensitive applications; latency doesn’t matter much to them, since all they want to do is use the full link capacity.

The trouble is that often, in the same household, there are several different people using the same last-mile link, and they will tend to get home and spend their leisure time on the Internet at roughly the same time as each other. The son fires up his console to frag some noobs, and Mother calls her sister over VoIP; so far so good. But then Father decides on which movie to watch later that evening and starts downloading it, and the daughter starts uploading photos from her school field trip to goodness knows where. So there are now two latency-sensitive and two throughput-sensitive applications using this single link simultaneously, and the throughput-sensitive ones have immediately loaded the link to capacity in both directions (one each).

So what happens then? You tell me - you know your hardware the best. Or haven’t you measured its behaviour under those conditions? Oh, for shame!

Okay, I’ll tell you what happens with 99.9% of head-end and CPE hardware out there today: Mother can’t hear her sister properly any more, nor vice versa. And not just because the son has just stormed out of his bedroom yelling about lag and how he would have pwned that lamer if only that crucial shot had actually gone where he knows he aimed it. But as far as Father and the daughter are concerned, the Internet is still working just fine - look, the progress bars are ticking along nicely! - until, that is, Father wants to read the evening news, but the news site’s front page takes half a minute to load, and half the images are missing when it does.

And Father knows that calling the ISP in the morning (when their call centre is open) won’t help. They’ll run tests and find absolutely nothing wrong, and not-so-subtly imply that he (or more likely his wife) is an idiotic time-waster. Of course, a weekday morning isn't when everyone’s using it, so nothing *is* wrong. The link is uncongested at the time of testing, latency is as low as it should be, and there’s no line-quality packet loss. The problem has mysteriously disappeared - only to reappear in the evening. It’s not even weather related, and the ISP insists that they have adequate backhaul and peering capacity.

So why? Because the throughput-sensitive applications fill not only the link capacity but the buffers in front of it (on both sides). Since it takes time for a packet at the back of each queue to reach the link, this induces latency - typically *hundreds* of milliseconds of it, and sometimes even much more than that; *minutes* in extreme cases. But both a VoIP call and a typical online game require latencies *below one hundred* milliseconds for optimum performance. That’s why Mother and the son had their respective evening activities ruined, and Father’s experience with the news site is representative of a particularly bad case.
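(The induced delay is just queue depth divided by drain rate. The buffer sizes below are assumptions of mine, but in the range that has been measured in deployed gear; the last case is deliberately extreme.)

    def induced_delay(buffer_bytes, link_bps):
        """Worst-case time a packet waits behind a full buffer."""
        return buffer_bytes / (link_bps / 8)

    # A 256 KB modem buffer on a 2 Mbit/s uplink:
    print(induced_delay(256 * 1024, 2e6))    # ~1.0 s
    # A 1 MB buffer on a 10 Mbit/s downlink:
    print(induced_delay(1 * 2**20, 10e6))    # ~0.84 s
    # An 8 MB buffer after the line rate drops to 512 kbit/s:
    print(induced_delay(8 * 2**20, 512e3))   # ~131 s - into "minutes" territory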

The better AQM systems now available (eg. fq_codel) can separate latency-sensitive traffic from throughput-sensitive traffic and give them both the service they need. This will give your customers a far better experience in the reasonably common situation I just outlined - but only if you put it in your hardware product and make sure that it actually works. Otherwise, you’ll start losing customers to the first competitor who does.

- Jonathan Morton
Sebastian Moeller
2015-03-18 08:06:12 UTC
Permalink
Hi David,
Post by David P. Reed
It is not the cable modem itself that is bufferbloated. It is the head
end working with the cable modem. Docsis 3 has mechanisms to avoid
queue buildup but they are turned on by the head end.
I seem to recall that even on egress, Charter cable showed multiple hundreds of milliseconds of bufferbloat, so I would argue that even with DOCSIS 3.0 the modems might be over-buffered or under-AQM'd. The head end certainly is another problem...
Post by David P. Reed
I don't know for sure but I believe that the modem itself cannot
measure or control the queueing in the system to minimize latency.
I believe this is supposed to happen with DOCSIS 3.1, where PIE is mandatory in the modems. Whether the ISPs will activate it or not, I do not know.
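(For the curious, the heart of PIE is a small control law: a drop/mark probability that is nudged up or down according to how far the estimated queueing delay sits from a target and how fast it is moving. A rough sketch of that idea follows - the constants are ballpark figures, not the exact values that RFC 8033 or the DOCSIS-PIE annex specify, and the real algorithm adds burst allowance, auto-tuning of the gains, and more.)

    import random

    TARGET   = 0.015   # 15 ms target queueing delay (DOCSIS-PIE uses 10 ms)
    T_UPDATE = 0.015   # how often the probability is recomputed
    ALPHA, BETA = 0.125, 1.25   # ballpark gains; the RFC auto-scales these

    class PieLite:
        def __init__(self):
            self.p = 0.0            # current drop/mark probability
            self.qdelay_old = 0.0

        def update(self, qdelay):
            """Run every T_UPDATE seconds with the current queueing-delay
            estimate (queue bytes divided by the measured drain rate)."""
            self.p += ALPHA * (qdelay - TARGET) + BETA * (qdelay - self.qdelay_old)
            self.p = min(max(self.p, 0.0), 1.0)
            self.qdelay_old = qdelay

        def on_enqueue(self, ect_capable):
            """Decide the fate of an arriving packet."""
            if random.random() < self.p:
                return "mark" if ect_capable else "drop"
            return "enqueue"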

Best Regards
sebastian
Post by David P. Reed
You can use codel or whatever if you bound you traffic upward and
stifle traffic downward. But that doesn't deal with the queueing in the
link away from your home.