cassandra-stress: counter_write workload got stuck with small population seq #54

Open
amoskong opened this issue Nov 23, 2017 · 1 comment
@amoskong
Contributor

from: scylladb/scylladb#2790 (comment)
I will retest with the latest Scylla.

Installation details
Scylla version (or git commit hash): 1.7.4-0.20170726.ff643e3, 2.0.rc4-0.20170903.6e6de34
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS7

Description
cassandra-stress gets stuck once the total ops count reaches the size of the population range given by -pop seq (1..10000 here).
Even with a short duration (such as 10s), the process does not exit.
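
Because the process does not exit on its own, a hard deadline around the command may help when reproducing this from a script. The following is a minimal sketch, assuming a POSIX host; `run_with_deadline` and the deadline values are hypothetical and not part of cassandra-stress. It starts the command in its own process group so that both the /bin/sh wrapper and the java child are killed on timeout.

```python
import os
import signal
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd; return its exit code, or None if it was still running at the deadline.

    Hypothetical helper for reproducing the hang, not part of cassandra-stress.
    """
    # start_new_session=True puts the child in a new process group whose id
    # equals the child's pid, so the whole group can be signalled at once.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # Kill the shell wrapper and any children it spawned (e.g. java).
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return None


# Example call with the arguments from this report (deadline is an assumption):
# run_with_deadline(
#     ["cassandra-stress", "counter_write", "no-warmup", "cl=QUORUM",
#      "duration=10s", "-mode", "cql3", "native", "-rate", "threads=10",
#      "-pop", "seq=1..10000"],
#     deadline_s=60,
# )
```

A return value of None for a duration=10s run would indicate the hang described here.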

Prepare
Create keyspace2 (tables: counter1, standard1)

CREATE KEYSPACE IF NOT EXISTS keyspace2
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE IF NOT EXISTS keyspace2.counter1 (
key blob PRIMARY KEY,
"C0" counter,
"C1" counter,
"C2" counter,
"C3" counter,
"C4" counter
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL","rows_per_partition":"ALL"}'
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
Result
No further output on the command line, and an exception is raised after some minutes.

$ cassandra-stress counter_write no-warmup cl=QUORUM duration=5m -schema 'replication(factor=1) compaction(strategy=DateTieredCompactionStrategy)' keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000 -node 10.240.0.4
Connected to cluster: longevity-50gb-4d-amosread-db-cluster-76310bd4, max pending requests per connection 128, max connections per host 8
Datatacenter: datacenter1; Host: /10.240.0.30; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.19; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.4; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.27; Rack: rack1
Created keyspaces. Sleeping 1s for propagation.
Sleeping 2s...
Running COUNTER_WRITE with 10 threads 5 minutes
Failed to connect over JMX; not collecting these stats
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 840, 840, 840, 840, 10.4, 3.6, 35.5, 87.1, 136.1, 136.1, 1.0, 0.00000, 0, 0, 0, 0, 0, 0
total, 1641, 767, 767, 767, 13.0, 9.7, 28.7, 209.9, 263.6, 263.6, 2.0, 0.04577, 0, 0, 0, 0, 0, 0
total, 3026, 1380, 1380, 1380, 7.2, 2.4, 22.0, 33.0, 44.0, 84.0, 3.0, 0.12653, 0, 0, 0, 0, 0, 0
total, 4213, 1133, 1133, 1133, 8.5, 9.3, 22.0, 32.6, 44.0, 65.1, 4.1, 0.09444, 0, 0, 0, 0, 0, 0
total, 5630, 1365, 1365, 1365, 7.5, 2.7, 22.0, 34.9, 76.0, 83.4, 5.1, 0.08485, 0, 0, 0, 0, 0, 0
total, 6626, 977, 977, 977, 10.2, 9.0, 26.5, 86.4, 122.9, 122.9, 6.2, 0.07556, 0, 0, 0, 0, 0, 0
total, 7659, 1020, 1020, 1020, 9.9, 10.2, 22.9, 45.9, 96.4, 102.6, 7.2, 0.06565, 0, 0, 0, 0, 0, 0
total, 9205, 1520, 1520, 1520, 6.6, 1.5, 21.9, 32.3, 46.1, 46.2, 8.2, 0.06942, 0, 0, 0, 0, 0, 0

java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck. Check GC/Heap size
at org.apache.cassandra.stress.util.Timing.snap(Timing.java:98)
at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:37)
at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
at java.lang.Thread.run(Thread.java:748)
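
To check how close a run was to exhausting the key range when it stalled, the last "total" row can be compared with the size of the -pop seq range. This is a rough sketch: the helper names are hypothetical, it assumes the seq range is inclusive at both ends, and the sample rows are copied from the output above.

```python
def last_total_ops(stress_output: str) -> int:
    """Return 'total ops' from the last 'total,' row, or 0 if there is none."""
    last = 0
    for line in stress_output.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Data rows start with "total"; the header row starts with "type".
        if len(fields) > 1 and fields[0] == "total" and fields[1].isdigit():
            last = int(fields[1])
    return last


def seq_population_size(pop_spec: str) -> int:
    """Size of a 'seq=LOW..HIGH' range, assuming both ends are inclusive."""
    low, high = pop_spec.split("=", 1)[1].split("..")
    return int(high) - int(low) + 1


SAMPLE = """\
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 7659, 1020, 1020, 1020, 9.9, 10.2, 22.9, 45.9, 96.4, 102.6, 7.2, 0.06565, 0, 0, 0, 0, 0, 0
total, 9205, 1520, 1520, 1520, 6.6, 1.5, 21.9, 32.3, 46.1, 46.2, 8.2, 0.06942, 0, 0, 0, 0, 0, 0
"""

done = last_total_ops(SAMPLE)
left = seq_population_size("seq=1..10000") - done
print(f"last total ops: {done}, keys left in range: {left}")
# prints: last total ops: 9205, keys left in range: 795
```

With roughly 1000 to 1500 op/s in the preceding intervals, the remaining 795 keys would be consumed within the next reporting interval, which is consistent with the output stopping at this point.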

@amoskong
Contributor Author

Reproduced with recent master.

scylla-jmx-666.development-20171121.f4ef4a5.el7.centos.noarch
scylla-conf-666.development-0.20171121.c1b97d1.el7.centos.x86_64
scylla-tools-core-666.development-20171121.c4ba9fc.el7.centos.noarch
scylla-server-666.development-0.20171121.c1b97d1.el7.centos.x86_64
scylla-tools-666.development-20171121.c4ba9fc.el7.centos.noarch
scylla-666.development-0.20171121.c1b97d1.el7.centos.x86_64
scylla-kernel-conf-666.development-0.20171121.c1b97d1.el7.centos.x86_64
$ cassandra-stress counter_write no-warmup cl=QUORUM duration=10s -schema 'replication(factor=1) compaction(strategy=DateTieredCompactionStrategy)' keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000
WARN  02:25:56 You listed localhost/0:0:0:0:0:0:0:1:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
Connected to cluster: Test Cluster, max pending requests per connection 128, max connections per host 8
Datatacenter: datacenter1; Host: localhost/127.0.0.1; Rack: rack1
Created keyspaces. Sleeping 1s for propagation.
Sleeping 2s...
Running COUNTER_WRITE with 10 threads 10 seconds
Failed to connect over JMX; not collecting these stats
type,      total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
total,          1415,    1415,    1415,    1415,     6.8,     6.4,    13.0,    18.7,    29.8,    31.5,    1.0,  0.00000,      0,      0,       0,       0,       0,       0
total,          3542,    2010,    2010,    2010,     4.9,     4.5,     9.3,    12.1,    24.8,    25.9,    2.1,  0.11598,      0,      0,       0,       0,       0,       0
total,          6759,    3136,    3136,    3136,     3.1,     2.9,     6.6,     8.4,    10.8,    12.5,    3.1,  0.18488,      0,      0,       0,       0,       0,       0
<waited for some minutes...>
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck. Check GC/Heap size
        at org.apache.cassandra.stress.util.Timing.snap(Timing.java:98)
        at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
        at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:37)
        at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
        at java.lang.Thread.run(Thread.java:748)
<stuck, does not return...>

Status of the cassandra-stress (c-s) processes:

amos      6433  0.0  0.0 113128  1568 pts/4    S+   02:25   0:00 /bin/sh /usr/bin/cassandra-stress counter_write no-warmup cl=QUORUM duration=10s -schema replication(factor=1) compaction(strategy=DateTieredCompactionStrategy) keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000
amos      6443  182  6.9 4554116 535888 pts/4  Sl+  02:25   7:22 /bin/java -server -ea -cp /tmp/tmp.5YXcq3OEDz:/usr/share/scylla/cassandra/lib/airline-0.6.jar:/usr/share/scylla/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/scylla/cassandra/lib/asm-5.0.4.jar:/usr/share/scylla/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/scylla/cassandra/lib/commons-cli-1.1.jar:/usr/share/scylla/cassandra/lib/commons-codec-1.2.jar:/usr/share/scylla/cassandra/lib/commons-lang3-3.1.jar:/usr/share/scylla/cassandra/lib/commons-math3-3.2.jar:/usr/share/scylla/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/scylla/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/scylla/cassandra/lib/disruptor-3.0.1.jar:/usr/share/scylla/cassandra/lib/ecj-4.4.2.jar:/usr/share/scylla/cassandra/lib/guava-18.0.jar:/usr/share/scylla/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/scylla/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/scylla/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/scylla/cassandra/lib/jamm-0.3.0.jar:/usr/share/scylla/cassandra/lib/javax.inject.jar:/usr/share/scylla/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/scylla/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/scylla/cassandra/lib/jna-4.0.0.jar:/usr/share/scylla/cassandra/lib/joda-time-2.4.jar:/usr/share/scylla/cassandra/lib/json-simple-1.1.jar:/usr/share/scylla/cassandra/lib/libthrift-0.9.2.jar:/usr/share/scylla/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/scylla/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/scylla/cassandra/lib/logback-core-1.1.3.jar:/usr/share/scylla/cassandra/lib/lz4-1.3.0.jar:/usr/share/scylla/cassandra/lib/metrics-core-3.1.0.jar:/usr/share/scylla/cassandra/lib/metrics-jvm-3.1.0.jar:/usr/share/scylla/cassandra/lib/metrics-logback-3.1.0.jar:/usr/share/scylla/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/scylla/cassandra/lib/ohc-core-0.4.3.jar:/usr/share/scylla/cassandra/lib/ohc-core-j8-0.4.3.jar:/usr/share/scylla/cassandra/lib/reporter-config3-3.0.0.jar
:/usr/share/scylla/cassandra/lib/reporter-config-base-3.0.0.jar:/usr/share/scylla/cassandra/lib/sigar-1.6.4.jar:/usr/share/scylla/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/scylla/cassandra/lib/snakeyaml-1.11.jar:/usr/share/scylla/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/scylla/cassandra/lib/ST4-4.0.8.jar:/usr/share/scylla/cassandra/lib/stream-2.5.2.jar:/usr/share/scylla/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/scylla/cassandra/apache-cassandra-3.0.8-SNAPSHOT.jar:/usr/share/scylla/cassandra/apache-cassandra.jar:/usr/share/scylla/cassandra/apache-cassandra-thrift-3.0.8-SNAPSHOT.jar:/usr/share/scylla/cassandra/scylla-tools-3.0.8-SNAPSHOT.jar:/usr/share/scylla/cassandra/stress.jar: -Dcassandra.storagedir= -Dlogback.configurationFile=logback-tools.xml org.apache.cassandra.stress.Stress counter_write no-warmup cl=QUORUM duration=10s -schema replication(factor=1) compaction(strategy=DateTieredCompactionStrategy) keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000
$ strace -p 6433
strace: Process 6433 attached
wait4(-1, 

$ strace -p 6443
strace: Process 6443 attached
futex(0x7f8e8089f9d0, FUTEX_WAIT, 6444, NULL

@amoskong amoskong changed the title cassandra-stress: counter_write workload got stuck with small population seq #2790 cassandra-stress: counter_write workload got stuck with small population seq Jul 10, 2019