Broadband Platform Traffic Management - UPDATE

Other

Posted on: Thursday 7 December 2006, 16:17

This is an update to the previously reported problems related to our broadband network management system. A copy of the last announcement can be found here:
http://usertools.plus.net/status/archive/1165333427.htm

We are continuing to investigate and work to resolve speed and performance issues across certain traffic types, particularly during peak hours of the day. Some customers are reporting slow speeds affecting all traffic types, including web browsing.

These issues are caused by the bandwidth available on our platform being shared out disproportionately across customers. Usage has increased slightly on average, but across a small group of Premier and Business customers usage has increased massively since upgrades to up-to-8Mb speeds.

Whilst the changes made over the weekend to distribute priority web traffic were met with some reports of success, there are still many complaints of speed and performance issues affecting web browsing and other priority web-based Internet traffic. A knock-on effect of this is that lower priority applications are being affected, even for light users.

Diagnosing the sources of these issues has been extremely complex, for a number of reasons, and our investigations continue. Firstly, the experience does not seem to be consistent within each product tier. Whilst there are other factors that can affect speed and performance (such as exchange contention), there are cases where customers with identical account configurations are seeing large differences in performance.

Secondly, our data in some areas was inconsistent with what some users were reporting in our discussion forums. Changes we made over recent days did not have the effect we anticipated, and our graphing in places seemed to be at odds with what our customers were reporting.

Over recent days we have been contacting individual customers to help diagnose and resolve specific issues. Yesterday afternoon we contacted a number of users reporting substandard email performance. Initial diagnosis of these customers' problems revealed no discrepancies in the way their accounts or our systems were set up. As part of the investigation we moved the customers from their current profile to a new profile and then immediately reverted them to the original configuration. This instantly resolved the traffic issues they were reporting, even though the configuration was the same as when originally checked. What that means is that even within identical profiles, customers were being treated differently by our platform.

This was obviously a cause for serious concern, so we took the decision to reload all the Ellacoya switches, which was completed by 6am this morning. Analysis since we did this has returned a more consistent and accurate set of results. Our graphing is substantially different, and we are now reporting increased usage in the silver queues and a decrease in gold traffic on the network. This suggests that some traffic was not being prioritised in the correct queues. Further investigation showed that changes to our Ellacoya database were not being loaded correctly into the switches. This is in part due to the increased number of changes made over the last two weeks in an effort to resolve the issues customers were reporting. After speaking to senior engineers at Ellacoya, a problem has been identified that both we and Ellacoya believe to be the cause of the issues we reported to them. We have a ticket open with Ellacoya and are awaiting further details regarding this matter.

This issue explains why the changes we were making were not benefiting customers immediately. It also explains why changes made that initially seemed to work became progressively worse. This in part has led to a lack of visibility in the customer forums as we have not been able to fully explain the issues customers were reporting.

Before the switches were reloaded, all Usenet traffic on port 80 was being incorrectly placed in the gold queue. This was having a significant impact on all gold traffic and subsequently on silver and bronze traffic. This Usenet traffic is now in the silver queue, where it belongs.

Our graphing and network-side monitoring are also now reporting accurately. This and the information from Ellacoya should help us further our investigations and assist in improving performance for customers. We will shortly be adding more graphing to the portal so that customers can see traffic across different gateways.

We are continuing to take action where a relatively small number of customers have a disproportionate amount of usage which is impacting other customers. In extreme cases we are sending MAC keys, but in general we are updating our automated and manual network implementation so that their usage is balanced fairly against that of other customers.

We believe that the recently identified problem has impacted PAYG usage significantly in the past few weeks. Work to protect the peak time performance of customers who pay for their bandwidth per gigabyte will commence immediately and will be completed next week. Light and average usage Premier and Business customers experiencing speed issues caused by our platform will see progressive improvements through the weekend and next week.

Kind Regards,

Bob Pullen
Customer Support
