System information
cat /tmp/loginprompt
DD-WRT v24-sp2 std (c) 2010 NewMedia-NET GmbH
Release: 08/07/10 (SVN revision: 14896)
nvram get DD_BOARD
Dlink-DIR600 rev b
The syslog shows no messages for the period in which the hang occurs. In this example, taken minutes after a hang, the last message is 13 h old:
date; cat /tmp/var/log/messages|tail -1
Fri Jan 4 13:01:38 UTC 2013
Jan 3 23:59:48 DD-WRT user.debug syslog: ttraff: data for 3-1-2013 commited to nvram
The system has no other logs, and to my knowledge no further logging can be enabled. Please correct this statement if it's incorrect, because additional logging could be important for tracing this error.
[events/0] sometimes has a high load directly after the hang (it is otherwise always idle), apparently from an event queue that grew because of the rogue process:
4 2 root RW< 0 0.0 98.5 [events/0]
Looks like an out-of-memory condition. Limit the max conntrack to a sane value for the available memory. And yes, from the high CPU load on dnsmasq it's possible that it's a DNS request flood. Since DNS is blocked from the WAN side by default, this flood must be caused by your inner network. If the connection tracking table is full, you won't get any new connections established, including ping. This will make the router unresponsive, which is also some sort of feature, since it just protects itself.
Error
The hang is caused by CPU overload from a rogue process that is allowed enough priority to leave only approximately 10-5 of the cycles to other processes.
The process is seemingly a kernel process, because of the large cycle allocation and because the top output shows high sirq directly before the hang, which might be related to the hang.
The hang lasts around 20 min; examples of hang durations are 12, 17, 27 and 33 min.
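The sirq figure in top is the share of CPU time spent in software interrupts, which is where much of the kernel's network receive processing runs. As a minimal sketch, assuming the Linux 2.6 /proc/stat layout (cpu user nice system idle iowait irq softirq ...), that share can be sampled independently of top. The function names are hypothetical, and the result only approximates top's column because counters after softirq are ignored.

```shell
#!/bin/sh
# Hypothetical helpers: measure the softirq ("sirq") share of CPU time.
# Assumes the 2.6 /proc/stat "cpu" line: cpu user nice system idle iowait irq softirq ...
# Only the first seven counters are summed; later ones (steal, guest) are ignored.

# sirq_percent "CPU_LINE_1" "CPU_LINE_2" -> integer percentage between the samples
sirq_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 8; i++) total += $i   # user .. softirq
            if (NR == 1) { t1 = total; s1 = $8 } else { t2 = total; s2 = $8 }
        }
        END {
            if (t2 - t1 <= 0) { print 0; exit }
            printf "%d\n", (s2 - s1) * 100 / (t2 - t1)
        }'
}

# sample_sirq [SECONDS] -> softirq share over the interval, read from /proc/stat
sample_sirq() {
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-1}"
    second=$(grep '^cpu ' /proc/stat)
    sirq_percent "$first" "$second"
}
```

Running something like `sample_sirq 5` in a loop from an established ssh session would show whether the share climbs well above the roughly 30% seen in normal operation before a hang starts.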
During the hang the router only services the LAN (including ping and its dropbear, telnetd, httpd and dnsmasq), and the WAN is unreachable. When it becomes unreachable, the ping to the router instantly goes from ~1 ms to ~45 ms.
The hang begins and ends abruptly; there is no noticeable increase in latency, reduction in throughput or other problem directly before or directly after the hang.
There's no clear indication of its cause before or after the hang.
It's not possible to correlate any client activity with the error (use of a particular network application or a particular use of it, e.g. a utorrent.exe or qbittorrent.exe traffic increase, a skype.exe connection or a teamviewer.exe connection).
Request
A dir600b-revb-ddwrt-webflash.bin built without NO_LOG, so that a trace of the error might be written to /tmp/var/log/messages. Alternatively, the exact commands to build it from a default Debian installation.
Reports of similar occurrences, since searching for "hang" in the Google index of dd-wrt.com/phpBB2 doesn't give any meaningful suggestion.
Settings
These are the settings that I'm aware of having changed from the default.
I don't have the original settings, however. How do I retrieve the nvram show output of the default settings so that I can confirm this list? The hang also occurred when ip_conntrack_max was at its default of 4096.
Please request the status of other settings that you believe can be relevant, and discuss why they might be relevant.
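Because the hang also occurred with the default ip_conntrack_max, it may help to record how full the connection-tracking table is, not only its count. A minimal sketch follows. It assumes ip_conntrack_max is exposed next to the ip_conntrack_count file named in this report, with /proc/sys/net/ipv4/ip_conntrack_max as a fallback on kernels that place it there; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: report how full the conntrack table is.
# ASSUMPTION: ip_conntrack_max sits next to ip_conntrack_count in CT_DIR,
# or at /proc/sys/net/ipv4/ip_conntrack_max on some kernels.
CT_DIR=/proc/sys/net/ipv4/netfilter

# conntrack_fill COUNT MAX -> integer percentage of the table in use
conntrack_fill() {
    if [ "$2" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( $1 * 100 / $2 ))
}

# conntrack_report -> e.g. "3900/4096 entries (95%)"
conntrack_report() {
    count=$(cat "$CT_DIR/ip_conntrack_count" 2>/dev/null)
    max=$(cat "$CT_DIR/ip_conntrack_max" 2>/dev/null || cat /proc/sys/net/ipv4/ip_conntrack_max 2>/dev/null)
    if [ -z "$count" ] || [ -z "$max" ]; then
        echo "conntrack counters not found" >&2
        return 1
    fi
    echo "$count/$max entries ($(conntrack_fill "$count" "$max")%)"
}
```

A value close to 100% just before a hang would support the reply's suggestion that a full table stops new connections, including ping, from being established.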
Error tracing
Custom log
The custom log used in the attempt to identify the problem is:
Please suggest additional commands that you believe are useful for this log.
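As one possible addition (a hypothetical script, not the custom log referred to here), a snapshot logger can append a timestamp, the load average, the conntrack count, the raw cpu line from /proc/stat and the head of a top listing at a fixed interval. Gaps between timestamps then show when the logger itself stopped getting CPU time. It assumes top accepts -b and -n 1 (busybox builds differ), and because it writes to RAM-backed /tmp it trims the file after each snapshot; the file name and limits are illustrative.

```shell
#!/bin/sh
# Hypothetical snapshot logger. LOG and MAX_LINES are illustrative defaults;
# /tmp lives in RAM, so the file is trimmed after every snapshot.
LOG=${LOG:-/tmp/hanglog.txt}
MAX_LINES=${MAX_LINES:-2000}

snapshot() {
    {
        echo "=== $(date)"
        echo "loadavg: $(cat /proc/loadavg 2>/dev/null)"
        echo "conntrack: $(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count 2>/dev/null)"
        grep '^cpu ' /proc/stat 2>/dev/null
        # Assumes this top supports batch mode (-b) and a single iteration (-n 1).
        top -b -n 1 2>/dev/null | head -n 12
    } >> "$LOG"
    # Keep only the newest MAX_LINES lines.
    lines=$(wc -l < "$LOG")
    if [ $lines -gt "$MAX_LINES" ]; then
        tail -n "$MAX_LINES" "$LOG" > "$LOG.tmp" && mv "$LOG.tmp" "$LOG"
    fi
}

# Loop only when started as: sh hanglog.sh --run [INTERVAL_SECONDS]
if [ "${1:-}" = "--run" ]; then
    while :; do
        snapshot
        sleep "${2:-10}"
    done
fi
```

Started in the background (for example `sh hanglog.sh --run 10 &`), the log can be read after a hang to compare the last snapshots before it with the first ones after it.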
Identified patterns are:
High sirq load for two minutes before the hang, compared to around 30% during normal operation.
Other discussion
Directly after the hang the historical load average is high (from the custom log), compared to normal operation with unchanged client circumstances.
The fact that the top output freezes in the middle of being printed illustrates how abruptly the rogue process begins to consume all cycles. These are two examples of hung top output (the only meaningful pattern seems to be the high sirq load).
dnsmasq
dnsmasq is not involved, because this command was run during the hang (through an established ssh connection, taking around five minutes to visibly return) without affecting it.
Connection flood?
Is it possible to see the rate of new connections to the system?
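One way to approximate the rate of new connections, as a minimal sketch, is to send every packet in conntrack state NEW through an empty user-defined chain. An empty chain just returns, so the verdict is unchanged, but the jump rules' packet counters grow. This assumes the router's iptables has the state match; the chain name NEWCONN and the helper names are hypothetical, and retransmitted first packets are counted again, so the figure is approximate.

```shell
#!/bin/sh
# Hypothetical helpers for counting new connections with iptables counters.
# NEWCONN is an empty chain: jumping to it and returning leaves the verdict
# unchanged, while the jump rules' packet counters grow by one per NEW packet.

setup_newconn_counter() {
    iptables -N NEWCONN 2>/dev/null   # ignore "chain already exists"
    iptables -I INPUT -m state --state NEW -j NEWCONN
    iptables -I FORWARD -m state --state NEW -j NEWCONN
}

# newconn_packets: read "iptables -nvxL" output on stdin and sum the packet
# counters (column 1) of rules whose target (column 3) is NEWCONN.
newconn_packets() {
    awk '$3 == "NEWCONN" { sum += $1 } END { print sum + 0 }'
}

# per_second PREV CUR SECONDS -> integer rate
per_second() {
    echo $(( ($2 - $1) / $3 ))
}

# sample_newconn_rate SECONDS -> new connections per second (INPUT + FORWARD)
sample_newconn_rate() {
    before=$({ iptables -nvxL INPUT; iptables -nvxL FORWARD; } | newconn_packets)
    sleep "$1"
    after=$({ iptables -nvxL INPUT; iptables -nvxL FORWARD; } | newconn_packets)
    per_second "$before" "$after" "$1"
}
```

After `setup_newconn_counter`, calling `sample_newconn_rate 10` periodically gives a per-second figure that can be compared with the conntrack count; the two rules can be removed again with the matching `iptables -D` commands.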
The router has a high connection count (/proc/sys/net/ipv4/netfilter/ip_conntrack_count) during normal operation (from BitTorrent traffic). Sometimes the connection count is higher after the hang, and sometimes not.
So it's not clear that an increase in the number of connections (or in traffic) is correlated with the hang. An increase in connections could also be a result of the hang rather than its cause (because WAN connections time out or are queued during the hang).
A high connection count by itself doesn't use many resources, and during normal operation the load is often 0.00, so it would be beneficial to see the rate of new connections.
DNS lookup flood?
Is it possible to see the DNS lookup rate to dnsmasq?
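If dnsmasq runs with its log-queries option (assumed here to be added through DD-WRT's additional dnsmasq options), each query is written to syslog, and those lines can be counted per minute and per LAN client. A minimal sketch follows. The line format in the comment is the usual dnsmasq one but should be checked against the router's own log, and since every query is then logged to RAM-backed /tmp, the option is best enabled only while investigating. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: count dnsmasq queries from syslog lines.
# ASSUMPTION: dnsmasq runs with "log-queries", producing lines such as
#   Jan  4 13:01:38 DD-WRT daemon.info dnsmasq[123]: query[A] example.com from 192.168.1.10

# queries_per_minute: syslog lines on stdin -> "Mon D HH:MM count", in log order
queries_per_minute() {
    awk '/dnsmasq.*query\[/ {
            minute = $1 " " $2 " " substr($3, 1, 5)
            if (!(minute in n)) order[++k] = minute
            n[minute]++
        }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# queries_per_client: syslog lines on stdin -> "client count", busiest first
queries_per_client() {
    awk '/dnsmasq.*query\[/ { n[$NF]++ }
         END { for (c in n) print c, n[c] }' | sort -nr -k 2
}
```

For example, `queries_per_client < /tmp/var/log/messages` would point at the LAN host responsible if the reply's DNS-flood explanation is right.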