PLEASE READ: MAXIMUM TTL EXPIRATION DATE NOTICE (CASSANDRA-14092 & CASSANDRA-14227)
-----------------------------------------------------------------------------------
(General upgrading instructions are available in the next section)
The maximum expiration timestamp that can be represented by the storage engine has been
raised to 2106-02-07T06:28:13+00:00 (2038-01-19T03:14:06+00:00 if in compatibility mode
with Cassandra <5.0, sstable version <=oa), which means that inserts with TTL that expire after
this date are not currently supported. By default, INSERTS with TTL exceeding the
maximum supported date are rejected, but it's possible to choose a different
expiration overflow policy. See CASSANDRA-14092.txt for more details.
There is a new yaml property storage_compatibility_mode that determines the
Cassandra major version we want to stay compatible with. Its default is CASSANDRA_4, which means that
the node sstables, commitlog, hints and messaging version will stay compatible with Cassandra 4.x,
2038 will still be the limit, and it will be possible to roll back to the previous version. To upgrade:
- Do a rolling upgrade to 5.0 where 2038 will still be the limit. At this point, the node won't write
anything incompatible with Cassandra 4.x, and you would still be able to roll back to that version.
- Do a rolling restart setting storage_compatibility_mode=UPGRADING. Once all nodes
are in storage version 5, 2106 will become the new limit.
- Do a rolling restart setting storage_compatibility_mode=NONE. Now mixed
2038 and 2106 nodes are no longer possible.
Note that the yaml property must be set at all times for all executables and tools. It
will be removed in future versions when 2038 nodes are no longer possible.
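The relationship between storage_compatibility_mode and the effective limit can be sketched in Python as below. This is an illustration only, not Cassandra code: the function name and the `all_nodes_on_storage_v5` flag are hypothetical, and the epoch-second constants are inferred from the two dates quoted in this notice.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds since the Unix epoch, inferred from the dates quoted in this notice.
LIMIT_2038 = 2**31 - 2  # just below the signed 32-bit maximum
LIMIT_2106 = 2**32 - 3  # just below the unsigned 32-bit maximum


def max_expiration(mode: str, all_nodes_on_storage_v5: bool = False) -> datetime:
    """Return the latest expiration date for a storage_compatibility_mode value."""
    if mode == "CASSANDRA_4":
        seconds = LIMIT_2038
    elif mode == "UPGRADING":
        # 2106 only applies once every node is on storage version 5.
        seconds = LIMIT_2106 if all_nodes_on_storage_v5 else LIMIT_2038
    elif mode == "NONE":
        seconds = LIMIT_2106
    else:
        raise ValueError(f"unknown mode: {mode}")
    return EPOCH + timedelta(seconds=seconds)


print(max_expiration("CASSANDRA_4").isoformat())  # 2038-01-19T03:14:06+00:00
print(max_expiration("NONE").isoformat())         # 2106-02-07T06:28:13+00:00
```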
Prior to 3.0.16 (3.0.X) and 3.11.2 (3.11.x) there was no protection against INSERTS
with TTL expiring after the maximum supported date, causing the expiration time
field to overflow and the records to expire immediately. Clusters in the 2.X and
lower series are not subject to this when assertions are enabled. Backed-up SSTables
can potentially be recovered; recovery instructions can be found in the
CASSANDRA-14092.txt file.
If you use or plan to use very large TTLs (10 to 20 years), read CASSANDRA-14092.txt
for more information.
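To see why an unprotected expiration field makes records expire immediately, the following Python sketch emulates the arithmetic. It assumes the expiration time was held in a signed 32-bit seconds field (an assumption consistent with the 2038 limit above); the function names are hypothetical and the rejection policy is a simplified stand-in for the default behaviour described in this notice.

```python
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit field would."""
    return (value + 2**31) % 2**32 - 2**31


def expiration_without_protection(now_s: int, ttl_s: int) -> int:
    # No range check: the sum is simply truncated to 32 bits.
    return to_int32(now_s + ttl_s)


def expiration_with_rejection(now_s: int, ttl_s: int, limit_s: int = INT32_MAX - 1) -> int:
    # Default limit inferred from the 2038 date quoted above.
    if now_s + ttl_s > limit_s:
        raise ValueError("TTL exceeds the maximum supported expiration date")
    return now_s + ttl_s


now = 1_520_000_000         # an instant in March 2018
ttl = 20 * 365 * 24 * 3600  # a 20-year TTL, in seconds
wrapped = expiration_without_protection(now, ttl)
print(wrapped < now)        # True: the record already looks expired
```

With rejection in place, the same write raises an error instead of silently producing an expiration time in the past.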
PLEASE READ: CVE-2021-44521 SCRIPTED UDF SYSTEM ACCESS (CASSANDRA-17352)
------------------------------------------------------------------------
If you have enabled scripted UDFs and run without UDF threads in cassandra.yaml:
enable_user_defined_functions_threads: false
an attacker could access java.lang.System methods and execute arbitrary code on
the machine. Disabling UDF threads is still considered insecure and not recommended.
To continue running without UDF threads you will need to set:
allow_insecure_udfs: true
and if you need access to java.lang.System for existing UDFs, set:
allow_extra_insecure_udfs: true
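A quick way to audit a node for the settings named above is sketched below in Python. It is a hypothetical helper, not part of Cassandra: it takes the cassandra.yaml contents as an already-parsed dict, and it assumes that UDF threads are enabled and the two allow_* flags are false when a key is absent.

```python
def udf_thread_warnings(settings: dict) -> list:
    """Flag the UDF thread settings discussed in this notice."""
    warnings = []
    # Assumption: UDF threads are enabled when the key is absent.
    threads_enabled = settings.get("enable_user_defined_functions_threads", True)
    if not threads_enabled:
        if settings.get("allow_insecure_udfs", False):
            warnings.append("running without UDF threads is considered insecure")
        else:
            warnings.append("UDF threads are disabled but allow_insecure_udfs is not true")
    if settings.get("allow_extra_insecure_udfs", False):
        warnings.append("UDFs may access java.lang.System (allow_extra_insecure_udfs)")
    return warnings


print(udf_thread_warnings({"enable_user_defined_functions_threads": False}))
```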
GENERAL UPGRADING ADVICE FOR ANY VERSION
========================================
Snapshotting is fast (especially if you have JNA installed) and takes
effectively zero disk space until you start compacting the live data
files again. Thus, best practice is to ALWAYS snapshot before any
upgrade, just in case you need to roll back to the previous version.
(Cassandra version X + 1 will always be able to read data files created
by version X, but the inverse is not necessarily the case.)
When upgrading major versions of Cassandra, you will be unable to
restore snapshots created with the previous major version using the
'sstableloader' tool. You can upgrade the file format of your snapshots
using the provided 'sstableupgrade' tool.
5.0
===
New features
------------
- A new configuration file, `cassandra_latest.yaml`, is provided for users that would like to evaluate and
experiment with the latest recommended features and changes in Cassandra, which provide improved functionality and
performance. This file is intended to be used in a development environment and is only recommended for production
use after careful evaluation. The file is located in the conf directory and is not selected by default.
To use it, one may specify the file using the `-Dcassandra.config` option, e.g. by running
`cassandra -Dcassandra.config=file://$CASSANDRA_HOME/conf/cassandra_latest.yaml`.
- Added a new authorizer, CIDR authorizer, to restrict user access based on CIDR groups.
- Pluggable crypto providers were made possible via `crypto_provider` section in cassandra.yaml. The default provider is
Amazon Corretto Crypto Provider and it is installed automatically when the node starts. Only x86_64 and aarch64 architectures are supported now.
Please consult the Upgrading section for more details when upgrading from older Cassandra versions.
- Added a new secondary index implementation, Storage-Attached Indexes (SAI). Overview documentation and a basic
tutorial can be found at src/java/org/apache/cassandra/index/sai/README.md.
- *Experimental* support for Java 17 has been added. JVM options that differ between or are
specific for Java 17 have been added into jvm17.options.
IMPORTANT: Running C* on Java 17 is *experimental*; do it at your own risk.
- Added a new "unified" compaction strategy that supports the use cases of the legacy compaction strategies, with
low space overhead, high parallelism and flexible configuration. Implemented by the UnifiedCompactionStrategy
class. Further details and documentation can be found in
src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
- New `VectorType` (cql `vector<element_type, dimension>`) which adds new fixed-length element arrays. See CASSANDRA-18504
- Added new vector similarity functions `similarity_cosine`, `similarity_euclidean` and `similarity_dot_product`.
- Removed UDT type migration logic for 3.6+ clusters upgrading to 4.0. If migration has been disabled, it must be
enabled before upgrading to 5.0 if the cluster used UDTs. See CASSANDRA-18504
- Extended max expiration time from 2038-01-19T03:14:06+00:00 to 2106-02-07T06:28:13+00:00
- Added new Mathematical CQL functions: abs, exp, log, log10 and round.
- Added a trie-based memtable implementation, which improves memory use, garbage collection efficiency and lookup
performance. The new memtable is implemented by the TrieMemtable class and can be selected using the memtable
API, see src/java/org/apache/cassandra/db/memtable/Memtable_API.md.
- Added a new trie-indexed SSTable format with better lookup efficiency and size. The new format removes the index
summary component and does not require key caching. Additionally, it is able to efficiently search in partitions
spanning thousands or millions of rows.
The format is applied by setting "bti" as the selected sstable format in cassandra.yaml's sstables option.
- Added a new configuration cdc_on_repair_enabled to toggle whether CDC mutations are replayed through the
write path on streaming, e.g. repair. When enabled, CDC data streamed to the destination node will be written into
commit log first. When disabled, the streamed CDC data is written into SSTables just the same as normal streaming.
If this is set to false, streaming will be considerably faster however it's possible that, in extreme situations
(losing > quorum # nodes in a replica set), you may have data in your SSTables that never makes it to the CDC log.
The default is true/enabled. The configuration can be altered via JMX.
- Added support for reading the write times and TTLs of the elements of collections and UDTs, regardless of being
frozen or not. The CQL functions writetime, maxwritetime and ttl can now be applied to entire collections/UDTs,
single collection/UDT elements and slices of collection/UDT elements.
- Added a new CQL function, maxwritetime. It shows the largest unix timestamp that the data was written, similar to
its sibling CQL function, writetime.
- New Guardrails added:
- Whether ALTER TABLE commands are allowed to mutate columns
- Whether SimpleStrategy is allowed on keyspace creation or alteration
- Maximum replication factor
- Whether DROP KEYSPACE commands are allowed.
- Column value size
- Partition size
- Partition tombstones
- Vector dimensions
- Whether it is possible to execute secondary index queries without restricting on partition key
- Warning and failure thresholds for maximum referenced SAI indexes on a replica when executing a SELECT query
- It is possible to list ephemeral snapshots with the nodetool listsnapshots command when the "-e" flag is specified.
- Added a new flag to `nodetool profileload` and JMX endpoint to set up recurring profile load generation on specified
intervals (see CASSANDRA-17821)
- Added a new property, gossiper.loose_empty_enabled, to allow for a looser definition of "empty" when
considering the heartbeat state of another node in Gossip. This should only be used by knowledgeable
operators in the following scenarios:
Currently "empty" w/regards to heartbeat state in Gossip is very specific to a single edge case (i.e. in
isEmptyWithoutStatus() our usage of hbState() + applicationState), however there are other failure cases which
block host replacements and require intrusive workarounds and human intervention to recover from when you
have something in hbState() you don't expect. See CASSANDRA-17842 for further details.
- Added new CQL table property 'allow_auto_snapshot' which is by default true. When set to false and 'auto_snapshot: true'
in cassandra.yaml, there will be no snapshot taken when a table is truncated or dropped. When auto_snapshot in
cassandra.yaml is set to false, the newly added table property does not have any effect.
- Changed default on resumable bootstrap to be disabled. Resumable bootstrap has edge cases with potential correctness
violations or data loss scenarios if nodes go down during bootstrap, tombstones are written, and operations race with
repair. As streaming is considerably faster in the 4.0+ era (as well as with zero copy streaming), the risks of
having these edge cases during a failed and resumed bootstrap are no longer deemed acceptable.
To re-enable this feature, use the -Dcassandra.reset_bootstrap_progress=false environment flag.
- Added --older-than and --older-than-timestamp options to nodetool clearsnapshot command. It is possible to
clear snapshots older than some period, for example "--older-than 5h" to remove
snapshots older than 5 hours, and to clear all snapshots older than some timestamp, for example
--older-than-timestamp 2022-12-03T10:15:30Z.
- Cassandra logs can be viewed in the virtual table system_views.system_logs.
Please uncomment the respective appender in logback.xml file to make logs flow into this table. This feature is turned off by default.
- Added new CQL table property 'incremental_backups' which is by default true. When 'incremental_backups' property in cassandra.yaml
is set to true and table property is set to false, incremental backups for that specific table will not be done.
When 'incremental_backups' in cassandra.yaml is set to false, the newly added table property does not have any effect.
Both properties have to be set to true (cassandra.yaml and table property) in order to make incremental backups.
- Added new CQL native scalar functions for collections. The new functions are mostly analogous to the existing
aggregation functions, but they operate on the elements of collection columns. The new functions are `map_keys`,
`map_values`, `collection_count`, `collection_min`, `collection_max`, `collection_sum` and `collection_avg`.
- Added compaction_properties column to system.compaction_history table and nodetool compactionhistory command
- SimpleSeedProvider can resolve multiple IP addresses per DNS record. SimpleSeedProvider reacts to
the parameter called `resolve_multiple_ip_addresses_per_dns_record`, whose value is meant to be boolean and is
set to false by default. When set to true, SimpleSeedProvider will resolve all IP addresses per DNS record,
based on the configured name service on the system.
- Added new native CQL functions for data masking, allowing to replace or obscure sensitive data. The functions are:
- `mask_null` replaces the column value by null.
- `mask_default` replaces the data by a fixed default value of the same type.
- `mask_replace` replaces the data by a custom value.
- `mask_inner` replaces every character but the first and last ones by a fixed character.
- `mask_outer` replaces the first and last characters by a fixed character.
- `mask_hash` replaces the data by its hash, according to the specified algorithm.
- On virtual tables, it is not strictly necessary to specify `ALLOW FILTERING` for select statements which would
normally require it, except `system_views.system_logs`.
- More accurate skipping of sstables in read path due to better handling of min/max clustering and lower bound;
SSTable format has been bumped to 'oa' because there are new fields in stats metadata.
- Added MaxSSTableSize and MaxSSTableDuration metrics to TableMetrics. The former returns the size of the biggest
SSTable of a table or 0 when there is not any SSTable. The latter returns the maximum duration, computed as
`maxTimestamp - minTimestamp`, effectively non-zero for SSTables produced by TimeWindowCompactionStrategy.
- Added local read/write ratio to tablestats.
- Added system_views.max_sstable_size and system_views.max_sstable_duration tables.
- Added virtual table system_views.snapshots to see all snapshots from CQL shell.
- Added support for attaching CQL dynamic data masking functions to table columns on the schema. These masking
functions can be attached to or detached from columns with CREATE/ALTER TABLE statements. The functions obscure
the masked data during queries, but they don't change the stored data.
- Added new UNMASK permission. It allows to see the clear data of columns with an attached mask. Superusers have it
by default, whereas regular users don't have it by default.
- Added new SELECT_MASKED permission. It allows to run SELECT queries selecting the clear values of masked columns.
Superusers have it by default, whereas regular users don't have it by default.
- Added support for using UDFs as masking functions attached to table columns on the schema.
- Added `sstablepartitions` offline tool to find large partitions in sstables.
- `cassandra-stress` has a new option called '-jmx' which enables a user to pass username and password to JMX (CASSANDRA-18544)
- It is possible to read all credentials for `cassandra-stress` from a file via option `-credentials-file` (CASSANDRA-18544)
- nodetool info displays bootstrap state a node is in as well as if it was decommissioned or if it failed to decommission (CASSANDRA-18555)
- Added snitch for Microsoft Azure of name AzureSnitch (CASSANDRA-18646)
- legacy command line options from cassandra-stress were removed
- `-mode` option in cassandra-stress has `native` and `cql3` as defaults and they do not need to be specified
- Allow to write the commitlog using direct I/O. Direct I/O is a new feature that minimizes cache effects and
memory-mapping overhead by using user-space buffers. This helps in transferring data from/to disk at high speed.
Java enabled support for the direct I/O feature from version 10 onwards - see JDK-8164900 for reference. (CASSANDRA-18464)
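The behaviour of the `mask_inner` and `mask_outer` data masking functions listed in this section can be illustrated with a small Python emulation. This is a sketch of the described semantics, not Cassandra's implementation: the parameter names, their defaults and the `*` padding character are assumptions made for the example.

```python
def mask_inner(value: str, begin: int = 1, end: int = 1, padding: str = "*") -> str:
    """Keep `begin` leading and `end` trailing characters, pad everything between."""
    n = len(value)
    keep_head = min(begin, n)
    keep_tail = min(end, n - keep_head)
    middle = n - keep_head - keep_tail
    return value[:keep_head] + padding * middle + value[n - keep_tail:]


def mask_outer(value: str, begin: int = 1, end: int = 1, padding: str = "*") -> str:
    """Pad `begin` leading and `end` trailing characters, keep everything between."""
    n = len(value)
    head = min(begin, n)
    tail = min(end, n - head)
    return padding * head + value[head:n - tail] + padding * tail


print(mask_inner("Alice"))  # A***e
print(mask_outer("Alice"))  # *lic*
```

In both cases only the value returned to the caller is obscured; the input string stands in for stored data, which is left unchanged.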
Upgrading
---------
- Default disk_access_mode value changed from "auto" to "mmap_index_only". Override this setting with "disk_access_mode: auto" on
cassandra.yaml to keep the previous default. See CASSANDRA-19021 for details.
- The Python versions recommended for running cqlsh have been bumped from 3.6+ to 3.8-3.11. Python 3.6-3.7 are now
deprecated, as they have reached end-of-life, and support will be removed in a future major release.
- Java 8 has been removed. Lowest supported version is Java 11.
- Ephemeral marker files for snapshots done by repairs are not created anymore;
there is a dedicated flag in the snapshot manifest instead. When a node is upgraded to this version and started, any
such ephemeral snapshots found on disk are deleted (same behaviour as before), and new ephemeral snapshots
no longer create ephemeral marker files because the flag in the snapshot manifest is used instead.
- There were new table properties introduced called 'allow_auto_snapshot' and 'incremental_backups' (see section 'New features'). Hence, upgraded
node will be on a new schema version. Please do a rolling upgrade of nodes of a cluster to converge to one schema version.
- All previous versions of 4.x contained a mistake on the implementation of the old CQL native protocol v3. That
mistake produced issues when paging over tables with compact storage and a single clustering column during rolling
upgrades involving 3.x and 4.x nodes. As a consequence of the fix, the issue can now appear during rolling upgrades from
4.1.0 or 4.0.0-4.0.7. If that is your case, please use protocol v4 or higher in your driver. See CASSANDRA-17507
for further details.
- Added API for alternative sstable implementations. For details, see src/java/org/apache/cassandra/io/sstable/SSTable_API.md
- DateTieredCompactionStrategy was removed. Please change the compaction strategy for the tables using this strategy
to TimeWindowCompactionStrategy before upgrading to this version.
- The deprecated functions `dateOf` and `unixTimestampOf` have been removed. They were deprecated and replaced by
`toTimestamp` and `toUnixTimestamp` in Cassandra 2.2.
- Hadoop integration is no longer available (CASSANDRA-18323). If you want to process Cassandra data by big data frameworks,
please upgrade your infrastructure to use Cassandra Spark connector.
- Keystore/truststore password configurations are nullable now and the code defaults of those passwords to 'cassandra' are
removed. Any deployments that depend upon the code default to this password value without explicitly specifying
it in cassandra.yaml will fail on upgrade. Please specify your keystore_password and truststore_password elements in cassandra.yaml with appropriate
values to prevent this failure.
- Please be aware that if you use Ec2Snitch or Ec2MultiRegionSnitch, by default it will
communicate with AWS IMDS of version 2. This change is transparent; nothing needs
to be done upon upgrade. Furthermore, IMDS of version 2 can be configured to be required in AWS EC2 console.
Consult cassandra-rackdc.properties for more details. (CASSANDRA-16555)
- JMX MBean `org.apache.cassandra.metrics:type=BufferPool` without scope has been removed.
Use instead `org.apache.cassandra.metrics:type=BufferPool,scope=chunk-cache`. (CASSANDRA-17668)
- Upon upgrade, when cassandra.yaml does not contain a `crypto_provider` configuration section, crypto providers from the JRE installation will be used
and DefaultCryptoProvider, which installs Amazon Corretto Crypto Provider, will not be run.
You need to explicitly add this section to the old yaml if it does not contain it yet to enable Amazon Corretto Crypto Provider for such node.
New deployments have `crypto_provider` uncommented with DefaultCryptoProvider hence Corretto provider will be installed automatically for corresponding architecture.
- `commitlog_sync_batch_window_in_ms` configuration property in cassandra.yaml was removed. Please ensure your configuration is not using this property.
- The pluggable metrics reporter called metrics-reporter-config is removed. The way that metrics can be exported is
fully covered by the dropwizard metrics library itself, using e.g. CsvReporter. See CASSANDRA-18743 for more details.
- Deprecated CQL compression parameters for table, `sstable_compression` and `chunk_length_kb`, were removed. Please use `class` and `chunk_length_in_kb` instead.
Deprecation
-----------
- Deprecated code in Cassandra 1.x and 2.x was removed. See CASSANDRA-18959 for more details.
- In the JMX MBean `org.apache.cassandra.db:type=RepairService` (CASSANDRA-17668):
- deprecate the getter/setter methods `getRepairSessionSpaceInMebibytes` and `setRepairSessionSpaceInMebibytes`
in favor of `getRepairSessionSpaceInMiB` and `setRepairSessionSpaceInMiB` respectively
- In the JMX MBean `org.apache.cassandra.db:type=StorageService` (CASSANDRA-17668):
- deprecate the getter/setter methods `getRepairSessionMaxTreeDepth` and `setRepairSessionMaxTreeDepth`
in favor of `getRepairSessionMaximumTreeDepth` and `setRepairSessionMaximumTreeDepth`
- deprecate the setter method `setColumnIndexSize` in favor of `setColumnIndexSizeInKiB`
- deprecate the getter/setter methods `getColumnIndexCacheSize` and `setColumnIndexCacheSize` in favor of
`getColumnIndexCacheSizeInKiB` and `setColumnIndexCacheSizeInKiB` respectively
- deprecate the getter/setter methods `getBatchSizeWarnThreshold` and `setBatchSizeWarnThreshold` in favor of
`getBatchSizeWarnThresholdInKiB` and `setBatchSizeWarnThresholdInKiB` respectively
- All native CQL functions names that don't use the snake case names are deprecated in favour of equivalent names
using snake casing. Thus, `totimestamp` is deprecated in favour of `to_timestamp`, `intasblob` in favour
of `int_as_blob`, `castAsInt` in favour of `cast_as_int`, etc.
- The config property `compaction_large_partition_warning_threshold` has been deprecated in favour of the new
guardrail for partition size. That guardrail is based on the properties `partition_size_warn_threshold` and
`partition_size_fail_threshold`. The warn threshold has a very similar behaviour to the old config property.
The old property is still supported for backward compatibility, but now it is disabled by default.
- The config property `compaction_tombstone_warning_threshold` has been deprecated in favour of the new guardrail
for partition tombstones. That guardrail is based on the properties `partition_tombstones_warn_threshold` and
`partition_tombstones_fail_threshold`. The warn threshold has a very similar behaviour to the old config property.
The old property is still supported for backward compatibility, but now it is disabled by default.
- CloudstackSnitch is marked as deprecated and it is not actively maintained anymore. It is scheduled to be removed
in the next major version of Cassandra.
- Usage of dual native ports (native_transport_port and native_transport_port_ssl) is deprecated and will be removed
in a future release. A single native port can be used for both encrypted and unencrypted traffic; see CASSANDRA-10559.
Cluster hosts running with dual native ports were not correctly identified in the system.peers tables and server-sent EVENTs,
causing clients that encrypt traffic to fail to maintain correct connection pools. For more information, see CASSANDRA-19392.
4.1
===
New features
------------
- Added API for alternative memtable implementations. For details, see
src/java/org/apache/cassandra/db/memtable/Memtable_API.md
- Added a new guardrails framework allowing to define soft/hard limits for different user actions, such as limiting
the number of tables, columns per table or the size of collections. These guardrails are only applied to regular
user queries, and superusers and internal queries are excluded. Reaching the soft limit raises a client warning,
whereas reaching the hard limit aborts the query. In both cases a log message and a diagnostic event are emitted.
Additionally, some guardrails are not linked to specific user queries due to technical limitations, such as
detecting the size of large collections during compaction or periodically monitoring the disk usage. These
guardrails would only emit the proper logs and diagnostic events when triggered, without aborting any processes.
Guardrails config is defined through cassandra.yaml properties, and they can be dynamically updated through the
JMX MBean `org.apache.cassandra.db:type=Guardrails`. There are guardrails for:
- Number of user keyspaces.
- Number of user tables.
- Number of columns per table.
- Number of secondary indexes per table.
- Number of materialized views per table.
- Number of fields per user-defined type.
- Number of items in a collection.
- Number of partition keys selected by an IN restriction.
- Number of partition keys selected by the cartesian product of multiple IN restrictions.
- Allowed table properties.
- Allowed read consistency levels.
- Allowed write consistency levels.
- Collections size.
- Query page size.
- Minimum replication factor.
- Data disk usage, defined either as a percentage or as an absolute size.
- Whether user-defined timestamps are allowed.
- Whether GROUP BY queries are allowed.
- Whether the creation of secondary indexes is allowed.
- Whether the creation of uncompressed tables is allowed.
- Whether querying with ALLOW FILTERING is allowed.
- Whether DROP or TRUNCATE TABLE commands are allowed.
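The soft/hard limit behaviour described for the guardrails framework can be sketched in Python as follows. This is a minimal illustration, not the Cassandra implementation: the class and exception names are hypothetical, a non-positive threshold is assumed to mean "disabled", and a limit is assumed to trigger when the value exceeds it.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("guardrails-sketch")


class GuardrailViolation(Exception):
    """Raised when the hard limit is exceeded and the query must abort."""


@dataclass
class MaxThreshold:
    name: str
    warn: int  # soft limit; non-positive disables it (assumption)
    fail: int  # hard limit; non-positive disables it (assumption)

    def check(self, value: int, is_superuser: bool = False) -> Optional[str]:
        """Return a client warning, or None; raise if the hard limit is exceeded."""
        if is_superuser:
            return None  # superusers and internal queries are excluded
        if self.fail > 0 and value > self.fail:
            log.error("%s: %d exceeds fail threshold %d", self.name, value, self.fail)
            raise GuardrailViolation(f"{self.name}: {value} > {self.fail}")
        if self.warn > 0 and value > self.warn:
            message = f"{self.name}: {value} exceeds warn threshold {self.warn}"
            log.warning(message)
            return message
        return None


columns = MaxThreshold("columns_per_table", warn=50, fail=100)  # hypothetical values
print(columns.check(60))  # a warning string; 150 would raise GuardrailViolation
```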
- Add support for the use of pure monotonic functions on the last attribute of the GROUP BY clause.
- Add floor functions that can be used to group by time range.
- Support for native transport rate limiting via native_transport_rate_limiting_enabled and
native_transport_max_requests_per_second in cassandra.yaml.
- Support for pre hashing passwords on CQL DCL commands
- Expose all client options via system_views.clients and nodetool clientstats --client-options.
- Add new nodetool compactionstats --vtable option to match the sstable_tasks vtable.
- Support for String concatenation has been added through the + operator.
- New configuration max_hints_size_per_host to limit the size of local hints files per host in mebibytes. Setting it to a
non-positive value disables the limit, which is the default behavior. Setting it to a positive value ensures
the total size of the hints files per host does not exceed the limit.
- Added ability to configure auth caches through corresponding `nodetool` commands.
- CDC data flushing can now be configured to be non-blocking with the configuration cdc_block_writes. When set to true,
any writes to CDC-enabled tables are blocked once the limit for CDC data on disk is reached, which is the
existing and default behavior. When set to false, writes to CDC-enabled tables are accepted and
the oldest CDC data on disk is deleted to stay within the size constraint.
- Top partitions based on partition size or tombstone count are now tracked per table. These partitions are stored
in a new system.top_partitions table and exposed via JMX and nodetool tablestats. The partitions are tracked
during full or validation repairs but not incremental ones since those don't include all sstables and the partition
size/tombstone count would not be correct.
- New native functions to convert unix time values into C* native types: toDate(bigint), toTimestamp(bigint),
mintimeuuid(bigint) and maxtimeuuid(bigint)
- Support for multiple permission in a single GRANT/REVOKE/LIST statement has been added. It allows to
grant/revoke/list multiple permissions using a single statement by providing a list of comma-separated
permissions.
- A new ALL TABLES IN KEYSPACE resource has been added. It allows to grant permissions for all tables and user types
in a keyspace while preventing the user to use those permissions on the keyspace itself.
- Added support for type casting in the WHERE clause components and in the values of INSERT and UPDATE statements.
- A new implementation of Paxos (named v2) has been included that improves the safety and performance of LWT operations.
Importantly, v2 guarantees linearizability across safe range movements, so users are encouraged to enable v2.
v2 also halves the number of WAN messages required to be exchanged if used in conjunction with the new Paxos Repair
mechanism (see below) and with some minor modifications to applications using LWTs.
The new implementation may be enabled at any time by setting paxos_variant: v2, and disabled by setting to v1,
and this alone will reduce the number of WAN round-trips by between one and two for reads, and one for writes.
- A new Paxos Repair mechanism has been introduced as part of Repair, that permits further reducing the number of WAN
round-trips for write LWTs. This process may be manually executed for v1 and is run automatically alongside normal
repairs for v2. Once users are running regular repairs that include paxos repairs they are encouraged to set
paxos_state_purging: repaired. Once this has been set across the cluster, users are encouraged to set their
applications to supply a Commit consistency level of ANY with their LWT write operations, saving one additional WAN
round-trip. See upgrade notes below.
- Warn/fail thresholds added to read queries notifying clients when these thresholds trigger (by
emitting a client warning or failing the query). This feature is disabled by default, scheduled
to be enabled in 4.2; it is controlled with the configuration read_thresholds_enabled,
setting to true will enable this feature. Each check has its own warn/fail thresholds, currently
tombstones (tombstone_warn_threshold, and tombstone_failure_threshold), coordinator result set
materialized size (coordinator_read_size_warn_threshold and coordinator_read_size_fail_threshold),
local read materialized heap size
(local_read_size_warn_threshold and local_read_size_fail_threshold),
and RowIndexEntry estimated memory size (row_index_read_size_warn_threshold and
row_index_read_size_fail_threshold) are supported; more checks will be added over time.
- Prior to this version, the hint system stored a window of hints as defined by the
configuration property max_hint_window_in_ms, but this window was not persistent across restarts.
For example, if a node was restarted, it was still eligible for a hint to be sent to it because it
had been down for less than max_hint_window_in_ms. Hence, if that node kept restarting without hint delivery completing,
hints would be sent to that node indefinitely, which would occupy more and more disk space.
This behaviour was changed in CASSANDRA-14309. From now on, by default, if a node has not been down longer than
max_hint_window_in_ms, there is an additional check to see whether there is a hint to be delivered which is older
than max_hint_window_in_ms. If there is, a new hint is not persisted. If there is not, it is.
The previous behaviour can be restored by setting the property hint_window_persistent_enabled to
false. This property is set to true by default.
- Added a new feature to allow denylisting (i.e. blocking read, write, or range read configurable) access to partition
keys in configured keyspaces and tables. See doc/operating/denylisting_partitions.rst for details on using this new
feature. Also see CASSANDRA-12106.
- Information about pending hints is now available through `nodetool listpendinghints` and `pending_hints` virtual
table.
- Added ability to invalidate auth caches through corresponding `nodetool` commands and virtual tables.
- DCL statements in audit logs will now obscure only the password if they don't fail to parse.
- Starting from 4.1 sstables support UUID based generation identifiers. They are globally unique and thus they let
the node create sstables without any prior knowledge of the existing sstables in the data directory.
The feature is disabled by default in cassandra.yaml because once enabled, there is no easy way to downgrade.
When the node is restarted with UUID based generation identifiers enabled, each newly created sstable will have
a UUID based generation identifier and such files are not readable by previous Cassandra versions. In the future
those new identifiers will become enabled by default.
- Resetting schema behavior has changed in 4.1 so that: 1) resetting schema is prohibited when there is no live node
where the schema could be fetched from, and 2) truncating local schema keyspace is postponed to the moment when
the node receives schema from some other node.
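The persistent hint window check described in the CASSANDRA-14309 entry above can be sketched as follows. This is only an illustration of the decision as worded in that entry, not the server code: the function name, its parameters and the millisecond timestamps are hypothetical, and only max_hint_window_in_ms and hint_window_persistent_enabled are real configuration properties.

```python
from typing import Optional

HOUR_MS = 3_600_000


def should_persist_hint(now_ms: int,
                        down_since_ms: int,
                        oldest_pending_hint_ms: Optional[int],
                        max_hint_window_ms: int,
                        hint_window_persistent_enabled: bool = True) -> bool:
    """Hypothetical helper: decide whether a new hint for a node is stored."""
    # Existing rule: no hints once the node has been down longer than the window.
    if now_ms - down_since_ms > max_hint_window_ms:
        return False
    # With the property set to false, the downtime check is the only check.
    if not hint_window_persistent_enabled:
        return True
    # Additional check: an undelivered hint older than the window blocks new hints.
    if (oldest_pending_hint_ms is not None
            and now_ms - oldest_pending_hint_ms > max_hint_window_ms):
        return False
    return True


# A node that restarted one minute ago but still has a 4 hour old hint pending,
# with a 3 hour hint window.
now = 10 * HOUR_MS
window = 3 * HOUR_MS
print(should_persist_hint(now, now - 60_000, now - 4 * HOUR_MS, window))
print(should_persist_hint(now, now - 60_000, now - 4 * HOUR_MS, window,
                          hint_window_persistent_enabled=False))
```

The first call models the new default and rejects the hint; the second models hint_window_persistent_enabled: false, where the restart alone makes the node eligible again.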
Upgrading
---------
- `cache_load_timeout_seconds` being negative for disabled is equivalent to `cache_load_timeout` = 0 for disabled.
- `sstable_preemptive_open_interval_in_mb` being negative for disabled is equivalent to `sstable_preemptive_open_interval`
being null again. In the JMX MBean `org.apache.cassandra.db:type=StorageService`, the setter method
`setSSTablePreemptiveOpenIntervalInMB` still takes `intervalInMB` negative numbers for disabled.
- `enable_uuid_sstable_identifiers` parameter from 4.1 alpha1 was renamed to `uuid_sstable_identifiers_enabled`.
- `index_summary_resize_interval_in_minutes = -1` is equivalent to index_summary_resize_interval being set to `null` or
disabled. In the JMX MBean `org.apache.cassandra.db:type=IndexSummaryManager`, the setter method `setResizeIntervalInMinutes` still takes
`resizeIntervalInMinutes = -1` for disabled.
- min_tracked_partition_size_bytes parameter from 4.1 alpha1 was renamed to min_tracked_partition_size.
- Parameters of type data storage, duration and data rate cannot be set to Long.MAX_VALUE (former parameters of long type)
and Integer.MAX_VALUE (former parameters of int type). Those numbers are used during conversion between units to prevent
an overflow from happening. (CASSANDRA-17571)
- We added new JMX methods `setStreamThroughputMbitPerSec`, `getStreamThroughputMbitPerSec`, `setInterDCStreamThroughputMbitPerSec`,
`getInterDCStreamThroughputMbitPerSec` to the JMX MBean `org.apache.cassandra.db:type=StorageService`. They replace the now
deprecated methods `setStreamThroughputMbPerSec`, `getStreamThroughputMbPerSec`, `setInterDCStreamThroughputMbPerSec`, and
`getInterDCStreamThroughputMbPerSec`, which will be removed in a future major release.
- The config property `repair_session_space_in_mb` was wrongly advertised in previous versions as being set in
megabytes, while it is interpreted internally in mebibytes. To reduce the confusion we added two new JMX methods
`setRepairSessionSpaceInMebibytes(int sizeInMebibytes)` and `getRepairSessionSpaceInMebibytes`. They replace the now
deprecated methods `setRepairSessionSpaceInMegabytes(int sizeInMegabytes)` and `getRepairSessionSpaceInMegabytes`, which
will be removed in a future major release.
- There is a new cassandra.yaml version 2. Unit suffixes should be provided for all rate (B/s|KiB/s|MiB/s),
memory (B|KiB|MiB|GiB) and duration (d|h|m|s|ms|us|µs|ns)
parameters. List of changed parameters and details to consider during configuration setup can be
found at https://cassandra.apache.org/doc/latest/cassandra/new/configuration.html. (CASSANDRA-15234)
Backward compatibility with the old cassandra.yaml file will be in place until at least the next major version.
By default we refuse starting Cassandra with a config containing both old and new config keys for the same parameter. Start
Cassandra with -Dcassandra.allow_new_old_config_keys=true to override. For historical reasons duplicate config keys
in cassandra.yaml are allowed by default, start Cassandra with -Dcassandra.allow_duplicate_config_keys=false to disallow this.
- Many cassandra.yaml parameters' names have been changed. Full list and details to consider during configuration setup
when installing/upgrading Cassandra can be found at https://cassandra.apache.org/doc/latest/cassandra/new/configuration.html (CASSANDRA-15234)
- Negative values cannot be used for parameters of type data rate, duration and data storage with both old and new cassandra.yaml version.
Only exception is if you use old cassandra.yaml, pre-CASSANDRA-15234 - then -1 or other negative values which were advertised as an option
to disable config parameters in the old cassandra.yaml are still used. Those are probably converted to null value with the new cassandra.yaml,
as written in the new cassandra.yaml version and docs.
- Before you upgrade, if you are using `cassandra.auth_bcrypt_gensalt_log2_rounds` property,
confirm it is set to value lower than 31 otherwise Cassandra will fail to start. See CASSANDRA-9384
for further details. You also need to regenerate passwords for users for whom the password
was created while the above property was set to more than 30, otherwise they will not be able to log in.
- JNA library was updated from 5.6.0 to 5.9.0. In version 5.7.0, Darwin support for M1 devices
was fixed but the prebuilt native library for Darwin x86 (32bit Java on Mac OS) was removed.
- The config properties for setting the streaming throughput `stream_throughput_outbound_megabits_per_sec` and
`inter_dc_stream_throughput_outbound_megabits_per_sec` were incorrectly interpreted as mebibits. This has
been fixed by CASSANDRA-17243, so the values for these properties will now indicate a throughput ~4.6% lower than
what was actually applied in previous versions. This also affects the setters and getters for these properties in
the JMX MBean `org.apache.cassandra.db:type=StorageService` and the nodetool commands `set/getstreamthroughput`
and `set/getinterdcstreamthroughput`.
- Steps for upgrading Paxos
- Set paxos_variant: v2 across the cluster. This may be set via JMX, but should also be written
persistently to any yaml.
- Ensure paxos repairs are running regularly, either as part of normal incremental repair workflows or on their
own separate schedule. These operations are cheap and better to run frequently (e.g. once per hour)
- Set paxos_state_purging: repaired across the cluster. This may be set via JMX, but should also be written
persistently to any yaml. NOTE: once this has been set, you must not restore paxos_state_purging: legacy. If
this setting must be disabled you must instead set paxos_state_purging: gc_grace. This may be necessary if
paxos repairs must be disabled for some reason on an extended basis, but in this case your applications must
restore default commit consistency to ensure correctness.
- Applications may now safely be updated to use ANY commit consistency level (or LOCAL_QUORUM, as preferred).
Uncontended writes should now take 2 round-trips, and uncontended reads should typically take one round-trip.
- A required [f|force] flag has been added to both "nodetool verify" and the standalone "sstableverify" tools.
These tools have some subtleties and should not be used unless the operator is familiar with what they do
and do not do, as well as the edge cases associated with their use.
NOTE: ANY SCRIPTS THAT RELY ON sstableverify OR nodetool verify WILL STOP WORKING UNTIL MODIFIED.
Please see CASSANDRA-17017 for details: https://issues.apache.org/jira/browse/CASSANDRA-17017
- `MutationExceededMaxSizeException` thrown when a mutation exceeds `max_mutation_size` inherits
from `InvalidRequestException` instead of `RuntimeException`. See CASSANDRA-17456 for details.
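The ~4.6% figure quoted for CASSANDRA-17243 above follows from the ratio between a megabit (10^6 bits) and a mebibit (2^20 bits). A small worked example, assuming those standard unit definitions; the helper names are made up for illustration and are not part of Cassandra:

```python
MEGABIT = 1_000_000   # bits, SI definition
MEBIBIT = 1_048_576   # bits, 2**20


def relative_decrease() -> float:
    """Fraction by which the applied rate drops for an unchanged setting."""
    return 1 - MEGABIT / MEBIBIT


def megabits_to_keep_old_rate(old_setting: float) -> float:
    """Setting (in true megabits/s) matching what was previously applied."""
    return old_setting * MEBIBIT / MEGABIT


print(f"{relative_decrease():.4%}")
print(megabits_to_keep_old_rate(200))
```

Under these assumptions, an operator who wants the effective rate unchanged after the upgrade would raise a setting of 200 to roughly 209.7.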
Deprecation
-----------
- In the command line options for `org.apache.cassandra.tools.LoaderOptions`: deprecate the `-t`, `--throttle`,
`-idct`, and `--inter-dc-throttle` options for setting the throttle and inter-datacenter throttle options in
Mbps. Instead, users are instructed to use the `--throttle-mib`, and `--inter-dc-throttle-mib` for setting the
throttling options in MiB/s. Additionally, in the loader options builder
`org.apache.cassandra.tools.LoaderOptions$Builder`: deprecate the `throttle(int)`, `interDcThrottle(int)`,
`entireSSTableThrottle(int)`, and the `entireSSTableInterDcThrottle(int)` methods.
- In the JMX MBean `org.apache.cassandra.db:type=StorageService`: deprecate getter method `getStreamThroughputMbitPerSec`
in favor of getter method `getStreamThroughputMbitPerSecAsDouble`; deprecate getter method `getStreamThroughputMbPerSec`
in favor of getter methods `getStreamThroughputMebibytesPerSec` and `getStreamThroughputMebibytesPerSecAsDouble`;
deprecate getter method `getInterDCStreamThroughputMbitPerSec` in favor of getter method `getInterDCStreamThroughputMbitPerSecAsDouble`;
deprecate getter method `getInterDCStreamThroughputMbPerSec` in favor of getter methods `getInterDCStreamThroughputMebibytesPerSecAsDouble`;
deprecate getter method `getCompactionThroughputMbPerSec` in favor of getter methods `getCompactionThroughtputMibPerSecAsDouble`
and `getCompactionThroughtputBytesPerSec`; deprecate setter methods `setStreamThroughputMbPerSec` and `setStreamThroughputMbitPerSec`
in favor of `setStreamThroughputMebibytesPerSec`; deprecate setter methods `setInterDCStreamThroughputMbitPerSec` and
`setInterDCStreamThroughputMbPerSec` in favor of `setInterDCStreamThroughputMebibytesPerSec`. The deprecated JMX methods
may return a rounded value so if precision is important, you want to use the new getters. While those deprecated JMX getters
will return a rounded number, the nodetool commands `getstreamthroughput` and `getinterdcstreamthroughput`
will throw Runtime Exceptions advising to use the new -d flag in case an integer cannot be returned. See CASSANDRA-17725 for further details.
- Deprecate public method `setRate(final double throughputMbPerSec)` in `CompactionManager` in favor of
`setRateInBytes(final double throughputBytesPerSec)`
- `withBufferSizeInMB(int size)` in `StressCQLSSTableWriter.Builder` class is deprecated in favor of `withBufferSizeInMiB(int size)`.
No change of functionality in the new one, only a name change for clarity with regard to units and to follow naming
standardization.
- `withBufferSizeInMB(int size)` in `CQLSSTableWriter.Builder` class is deprecated in favor of `withBufferSizeInMiB(int size)`.
No change of functionality in the new one, only a name change for clarity with regard to units and to follow naming
standardization.
- The properties `keyspace_count_warn_threshold` and `table_count_warn_threshold` in cassandra.yaml have been
deprecated in favour of the new `keyspaces_warn_threshold` and `tables_warn_threshold` properties and will be removed
in a subsequent major version. This also affects the setters and getters for those properties in the JMX MBean
`org.apache.cassandra.db:type=StorageService`, which are equally deprecated in favour of the analogous methods
in the JMX MBean `org.apache.cassandra.db:type=Guardrails`. See CASSANDRA-17195 for further details.
- The functionality behind the property `windows_timer_interval` was removed as part of CASSANDRA-16956. The
property is still present but it is deprecated and it is just a place-holder to prevent breaking upgrades. This
property is expected to be fully removed in the next major release of Cassandra.
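When moving from the deprecated Mbps-based throttle options to the MiB/s-based ones listed above, the numeric value has to be converted, since the units differ both in bits versus bytes and in decimal versus binary prefixes. A minimal sketch, assuming Mbps means 10^6 bits per second and MiB/s means 2^20 bytes per second; the function name is hypothetical:

```python
BITS_PER_BYTE = 8
MEGABIT = 1_000_000        # bits
MEBIBYTE = 1_048_576       # bytes, 2**20


def mbps_to_mib_per_sec(mbps: float) -> float:
    """Convert a rate in megabits per second to mebibytes per second."""
    bytes_per_sec = mbps * MEGABIT / BITS_PER_BYTE
    return bytes_per_sec / MEBIBYTE


print(mbps_to_mib_per_sec(200))
```

The result is generally not a whole number, which is consistent with the note above that the deprecated integer getters may return a rounded value.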
4.0
===
New features
------------
- Full support for Java 11, it is not experimental anymore.
- The data of the system keyspaces using a local strategy (with the exception of the system.batches,
system.paxos, system.compaction_history, system.prepared_statements and system.repair tables)
is now stored by default in the first data directory, instead of being distributed among all
the data directories. This approach will allow the server to tolerate the failure of the other disks.
To ensure that a disk failure will not bring a node down, it is possible to use the system_data_file_directory
yaml property to store the local system keyspaces data on a directory that provides redundancy.
On node startup the local system keyspaces data will be automatically migrated if needed to the
correct location.
- Nodes will now bootstrap all intra-cluster connections at startup by default and wait
10 seconds for all but one node in the local data center to be connected and marked
UP in gossip. This prevents nodes from coordinating requests and failing because they
aren't able to connect to the cluster fast enough. block_for_peers_timeout_in_secs in
cassandra.yaml can be used to configure how long to wait (or whether to wait at all)
and block_for_peers_in_remote_dcs can be used to also block on all but one node in
each remote DC as well. See CASSANDRA-14297 and CASSANDRA-13993 for more information.
- *Experimental* support for Transient Replication and Cheap Quorums introduced by CASSANDRA-14404
The intended audience for this functionality is expert users of Cassandra who are prepared
to validate every aspect of the database for their application and deployment practices. Future
releases of Cassandra will make this feature suitable for a wider audience.
- *Experimental* support for Java 11 has been added. JVM options that differ between or are
specific for Java 8 and 11 have been moved from jvm.options into jvm8.options and jvm11.options.
IMPORTANT: Running C* on Java 11 is *experimental*; do so at your own risk.
- LCS now respects the max_threshold parameter when compacting - this was hard coded to 32
before, but now it is possible to do bigger compactions when compacting from L0 to L1.
This also applies to STCS-compactions in L0 - if there are more than 32 sstables in L0
we will compact at most max_threshold sstables in an L0 STCS compaction. See CASSANDRA-14388
for more information.
- There is now an option to automatically upgrade sstables after Cassandra upgrade, enable
either in `cassandra.yaml:automatic_sstable_upgrade` or via JMX during runtime. See
CASSANDRA-14197.
- `nodetool refresh` has been deprecated in favour of `nodetool import` - see CASSANDRA-6719
for details
- An experimental option to compare all merkle trees together has been added - for example, in
a 3 node cluster with 2 replicas identical and 1 out-of-date, with this option enabled, the
out-of-date replica will only stream a single copy from up-to-date replica. Enable it by adding
"-os" to nodetool repair. See CASSANDRA-3200.
- The currentTimestamp, currentDate, currentTime and currentTimeUUID functions have been added.
See CASSANDRA-13132
- Support for arithmetic operations between `timestamp`/`date` and `duration` has been added.
See CASSANDRA-11936
- Support for arithmetic operations on number has been added. See CASSANDRA-11935
- Preview expected streaming required for a repair (nodetool repair --preview), and validate the
consistency of repaired data between nodes (nodetool repair --validate). See CASSANDRA-13257
- Support for selecting Map values and Set elements has been added for SELECT queries. See CASSANDRA-7396
- Change-Data-Capture has been modified to make CommitLogSegments available
immediately upon creation via hard-linking the files. This means that incomplete
segments will be available in cdc_raw rather than fully flushed. See documentation
and CASSANDRA-12148 for more detail.
- The initial build of materialized views can be parallelized. The number of concurrent builder
threads is specified by the property `cassandra.yaml:concurrent_materialized_view_builders`.
This property can be modified at runtime through both JMX and the new `setconcurrentviewbuilders`
and `getconcurrentviewbuilders` nodetool commands. See CASSANDRA-12245 for more details.
- There is now a binary full query log based on Chronicle Queue that can be controlled using
nodetool enablefullquerylog, disablefullquerylog, and resetfullquerylog. The log
contains all queries invoked, approximate time they were invoked, any parameters necessary
to bind wildcard values, and all query options. A human readable version of the log can be
dumped or tailed using the new bin/fqltool utility. The full query log is designed to be safe
to use in production and limits utilization of heap memory and disk space with limits
you can specify when enabling the log.
See nodetool and fqltool help text for more information.
- SSTableDump now supports the -l option to output each partition as its own JSON object.
See CASSANDRA-13848 for more detail
- Metric for coordinator writes per table has been added. See CASSANDRA-14232
- Nodetool cfstats now has options to sort by various metrics as well as limit results.
- Operators can restrict login user activity to one or more datacenters. See `network_authorizer`
in cassandra.yaml, and the docs for create and alter role statements. CASSANDRA-13985
- Roles altered from login=true to login=false will prevent existing connections from executing any
statements after the cache has been refreshed. CASSANDRA-13985
- Support for audit logging of database activity. If enabled, logs every incoming
CQL command request, Authentication (successful as well as unsuccessful login) to a node.
- Faster streaming of entire SSTables using ZeroCopy APIs. If enabled, Cassandra will stream
entire SSTables, significantly speeding up transfers. Any streaming related operations will see
corresponding improvement. See CASSANDRA-14556.
- NetworkTopologyStrategy now supports auto-expanding the replication_factor
option into all available datacenters at CREATE or ALTER time. For example,
specifying replication_factor: 3 translates to three replicas in every
datacenter. This auto-expansion will _only add_ datacenters for safety.
See CASSANDRA-14303 for more details.
- Added Python 3 support so cqlsh and cqlshlib are now compatible with Python 2.7 and Python 3.6.
Added --python option to cqlsh so users can specify the path to their chosen Python interpreter.
See CASSANDRA-10190 for details.
- Support for server side DESCRIBE statements has been added. See CASSANDRA-14825
- It is now possible to rate limit snapshot creation/clearing. See CASSANDRA-13019
- Authentication reads and writes have been changed from a mix of ONE, LOCAL_ONE, and QUORUM
to LOCAL_QUORUM on reads and EACH_QUORUM on writes. This is configurable via cassandra.yaml with
auth_read_consistency_level and auth_write_consistency_level respectively. See CASSANDRA-12988.
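The replication_factor auto-expansion for NetworkTopologyStrategy described above can be pictured with the sketch below. It is not the server implementation: the function and its arguments are hypothetical, and it encodes one assumed reading of "only add", namely that datacenters already configured on the keyspace keep their current value and explicit per-datacenter settings take precedence.

```python
from typing import Dict, List, Optional


def expand_replication(options: Dict[str, int],
                       known_datacenters: List[str],
                       previous: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Hypothetical helper: expand replication_factor into per-DC settings."""
    explicit = dict(options)
    rf = explicit.pop("replication_factor", None)
    if rf is None:
        # Nothing to expand: the explicit per-datacenter map is used as given.
        return explicit
    # ALTER case (assumption): start from the datacenters already configured.
    result = dict(previous or {})
    # Explicit per-datacenter settings in the statement win.
    result.update(explicit)
    # Auto-expansion only adds datacenters that are still missing.
    for dc in known_datacenters:
        result.setdefault(dc, rf)
    return result


# CREATE with replication_factor: 3 in a two datacenter cluster.
print(expand_replication({"replication_factor": 3}, ["dc1", "dc2"]))
# ALTER after dc3 joined: dc3 is added, dc1 and dc2 are left untouched.
print(expand_replication({"replication_factor": 3}, ["dc1", "dc2", "dc3"],
                         previous={"dc1": 2, "dc2": 2}))
```

See CASSANDRA-14303 for the actual semantics before relying on this model.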
Upgrading
---------
- If you were on 4.0.1 - 4.0.5 and if you haven't set the compaction_throughput_mb_per_sec in your 4.0 cassandra.yaml
file but you relied on the internal default value, then compaction_throughput_mb_per_sec was equal to an old default
value of 16MiB/s in Cassandra 4.0. After CASSANDRA-17790 this is changed to 64MiB/s to match the default value in
cassandra.yaml. If you prefer the old one of 16MiB/s, you need to set it explicitly in your cassandra.yaml file.
- otc_coalescing_strategy, otc_coalescing_window_us, otc_coalescing_enough_coalesced_messages,
otc_backlog_expiration_interval_ms are deprecated and will be removed at the earliest with the next major release.
otc_coalescing_strategy is disabled since 3.11.
- As part of the Internode Messaging improvement work in CASSANDRA-15066, internode_send_buff_size_in_bytes and
internode_recv_buff_size_in_bytes were renamed to internode_socket_send_buffer_size_in_bytes and
internode_socket_receive_buffer_size_in_bytes. To support upgrades pre-4.0, we add backward compatibility and
currently both old and new names should work. Cassandra 4.0.0 and Cassandra 4.0.1 work ONLY with the new names
(They weren't updated in cassandra.yaml though).
- DESCRIBE|DESC was moved to server side in Cassandra 4.0. As a consequence DESCRIBE|DESC will not work in cqlsh 6.0.0
when connected to earlier major Cassandra versions where DESCRIBE does not exist server side.
- cqlsh shell startup script now prefers 'python3' before 'python' when identifying a runtime.
- As part of the Internode Messaging improvement work in CASSANDRA-15066, matching response verbs for every request
verb were introduced and verbs were renamed. DroppedMessageMetrics pre-4.0 are now available with _REQ suffix. As
part of CASSANDRA-16083, we added DroppedMessageMetrics backward compatibility layer which exposes the metrics with
their old names too. Only the value for verbs READ and RANGE_SLICE will differ from the same metrics in 3.11 as it
no longer includes the dropped responses, only the requests. After being deprecated in 3.11 PAGED_RANGE was
fully removed in 4.0. ConditionNotMet metric has been moved under scope CASClientWriteRequestMetrtic but as part of
CASSANDRA-16083, backward compatibility layer was added so it can be still exposed under the old 3.11 scope.
- Native protocol v5 is promoted from beta in this release. The wire format has changed
significantly and users should take care to ensure client drivers are upgraded to a version
with support for the final v5 format, if currently connecting over v5-beta. (CASSANDRA-15299, CASSANDRA-14973)
- Cassandra removed support for the OldNetworkTopologyStrategy. Before upgrading you will need to change the
replication strategy for the keyspaces using this strategy to the NetworkTopologyStrategy. (CASSANDRA-13990)
- SSTables for tables using a frozen UDT written by C* 3.0 appear as corrupted.
Background: The serialization-header in the -Statistics.db sstable component contains the type information
of the table columns. C* 3.0 writes incorrect type information for frozen UDTs by omitting the
"frozen" information. Non-frozen UDTs were introduced by CASSANDRA-7423 in C* 3.6. Since then, the missing
"frozen" information leads to deserialization issues that result in CorruptSSTableExceptions, potentially other
exceptions as well.
As a mitigation, the sstable serialization-headers are rewritten to contain the missing "frozen" information for
UDTs once, when an upgrade from C* 3.0 is detected. This migration does not touch snapshots or backups.
The sstablescrub tool now performs a check of the sstable serialization-header against the schema. A mismatch of
the types in the serialization-header and the schema will cause sstablescrub to error out and stop by default.
See the new `-e` option. `-e off` disables the new validation code. `-e fix` or `-e fix-only`, e.g.
`sstablescrub -e fix keyspace table`, will validate the serialization-header, rewrite the non-frozen UDTs
in the serialization-header to frozen UDTs, if that matches the schema, and continue with scrub.
See `sstablescrub -h`.
(CASSANDRA-15035)
- CASSANDRA-13241 lowered the default chunk_length_in_kb for compressed tables from
64kb to 16kb. For highly compressible data this can have a noticeable impact
on space utilization. You may want to consider manually specifying this value.
- Additional columns have been added to system_distributed.repair_history,
system_traces.sessions and system_traces.events. As a result select queries
against these tables - including queries against tracing tables performed
automatically by the drivers and cqlsh - will fail and generate an error in the log
during upgrade when the cluster is mixed version. On 3.x side this will also lead
to broken internode connections and lost messages.
Cassandra versions 3.0.20 and 3.11.6 pre-add these columns (see CASSANDRA-15385),
so please make sure to upgrade to those versions or higher before upgrading to
4.0 for query tracing to not cause any issues during the upgrade to 4.0.
- Timestamp ties between values resolve differently: if either value has a TTL,
this value always wins. This is to provide consistent reconciliation before
and after the value expires into a tombstone.
- Support for legacy auth tables in the system_auth keyspace (users,
permissions, credentials) and the migration code has been removed. Migration
of these legacy auth tables must have been completed before the upgrade to
4.0 and the legacy tables must have been removed. See the 'Upgrading' section
for version 2.2 for migration instructions.
- Cassandra 4.0 removed support for the deprecated Thrift interface. Amongst
other things, this implies the removal of all yaml options related to thrift
('start_rpc', rpc_port, ...).
- Cassandra 4.0 removed support for any pre-3.0 format. This means you
cannot upgrade from a 2.x version to 4.0 directly, you have to upgrade to
a 3.0.x/3.x version first (and run upgradesstables). In particular, this
means Cassandra 4.0 cannot load or read pre-3.0 sstables in any way: you
will need to upgrade those sstables in 3.0.x/3.x first.
- Upgrades from 3.0.x or 3.x are supported since 3.0.13 or 3.11.0, previous
versions will cause issues during rolling upgrades (CASSANDRA-13274).
- Cassandra will no longer allow invalid keyspace replication options, such
as invalid datacenter names for NetworkTopologyStrategy. Operators MUST
add new nodes to a datacenter before they can set ALTER or CREATE
keyspace replication policies using that datacenter. Existing keyspaces
will continue to operate, but CREATE and ALTER will validate that all
datacenters specified exist in the cluster.
- Cassandra 4.0 fixes a problem with incremental repair which caused repaired
data to be inconsistent between nodes. The fix changes the behavior of both
full and incremental repairs. For full repairs, data is no longer marked
repaired. For incremental repairs, anticompaction is run at the beginning
of the repair, instead of at the end. If incremental repair was being used
prior to upgrading, a full repair should be run after upgrading to resolve
any inconsistencies.
- Config option index_interval has been removed (it was deprecated since 2.0)
- Deprecated repair JMX APIs are removed.
- The version of snappy-java has been upgraded to 1.1.2.6
- The minimum value for internode message timeouts is 10ms. Previously, any
positive value was allowed. See cassandra.yaml entries like
read_request_timeout_in_ms for more details.
- Cassandra 4.0 allows a single port to be used for both secure and insecure
connections between cassandra nodes (CASSANDRA-10404). See the yaml for
specific property changes, and see the security doc for full details.
- Due to the parallelization of the initial build of materialized views,
the per token range view building status is stored in the new table
`system.view_builds_in_progress`. The old table `system.views_builds_in_progress`
is no longer used and can be removed. See CASSANDRA-12245 for more details.
- Config option commitlog_sync_batch_window_in_ms has been deprecated as its
documentation has been incorrect and the setting itself near useless.
Batch mode remains a valid commit log mode, however.
- There is a new commit log mode, group, which is similar to batch mode
but blocks for up to a configurable number of milliseconds between disk flushes.
- nodetool clearsnapshot now requires the --all flag to remove all snapshots.
Previous behavior would delete all snapshots by default.
- Nodes are now identified by a combination of IP and storage port.
Existing JMX APIs, nodetool, and system tables continue to work
and accept/return just an IP, but there is a new
version of each that works with the full unambiguous identifier.
You should prefer these over the deprecated ambiguous versions that only
work with an IP. This was done to support multiple instances per IP.
Additionally we are moving to only using a single port for encrypted and
unencrypted traffic and if you want multiple instances per IP you must
first switch encrypted traffic to the storage port and not a separate
encrypted port. If you want to use multiple instances per IP
with SSL you will need to use StartTLS on storage_port and set
outgoing_encrypted_port_source to gossip so that outbound connections
know what port to connect to for each instance. Before changing
storage port or native port at nodes you must first upgrade the entire cluster
and clients to 4.0 so they can handle the port not being consistent across
the cluster.
- Names of AWS regions/availability zones have been cleaned up to more correctly
match the Amazon names. There is now a new option in conf/cassandra-rackdc.properties
that lets users enable the correct names for new clusters, or use the legacy
names for existing clusters. See conf/cassandra-rackdc.properties for details.
- Background repair has been removed. dclocal_read_repair_chance and
read_repair_chance table options have been removed and are now rejected.
See CASSANDRA-13910 for details.
- Internode TCP connections that do not ack segments for 30s will now
be automatically detected and closed via the Linux TCP_USER_TIMEOUT
socket option. This should be exceedingly rare, but AWS networks (and
other stateful firewalls) apparently suffer from this issue. You can
tune the timeouts on TCP connection and segment ack via the
`cassandra.yaml:internode_tcp_connect_timeout_in_ms` and
`cassandra.yaml:internode_tcp_user_timeout_in_ms` options respectively.
See CASSANDRA-14358 for details.
- repair_session_space_in_mb setting has been added to cassandra.yaml to allow operators to reduce
merkle tree size if repair is creating too much heap pressure. The repair_session_max_tree_depth
setting added in 3.0.19 and 3.11.5 is deprecated in favor of this setting. See CASSANDRA-14096
- The flags 'enable_materialized_views' and 'enable_sasi_indexes' in cassandra.yaml
have been set to false by default. Operators should modify them to allow the
creation of new views and SASI indexes; existing ones will continue working.
See CASSANDRA-14866 for details.
- CASSANDRA-15216 - The flag 'cross_node_timeout' has been set as true by default.
This change is done under the assumption that users have setup NTP on
their clusters or otherwise synchronize their clocks, and that clocks are
mostly in sync, since this is a requirement for general correctness of
last write wins.
- CASSANDRA-15257 removed the joda time dependency. Any time formats
passed will now need to conform to java.time.format.DateTimeFormatter.
Most notably, days and months must be two digits, and years exceeding
four digits need to be prefixed with a plus or minus sign.
- cqlsh now returns a non-zero code in case of errors. This is a backward incompatible change so it may
break existing scripts that rely on the current behavior. See CASSANDRA-15623 for more details.
- Updated the default compaction_throughput_mb_per_sec to 64. The original
default (16) was meant for spinning disk volumes. See CASSANDRA-14902 for details.
- Custom compaction strategies must now handle getting sstables added/removed notifications for
sstables already added/removed - see CASSANDRA-14103 for details.
- Support for JNA with glibc 2.6 and earlier has been removed. CentOS 5, Debian 4, and Ubuntu 7.10 operating systems
must first be upgraded. See CASSANDRA-16212 for more details.
- In cassandra.yaml, when using vnodes num_tokens must be defined if initial_token is defined.
If it is not defined, or not equal to the number of tokens defined in initial_token,
the node will not start. See CASSANDRA-14477 for details.
- CASSANDRA-13701 To give a better out of the box experience, the default 'num_tokens'
value has been changed from 256 to 16 for reasons described in
https://cassandra.apache.org/doc/latest/getting-started/production.html#tokens
'allocate_tokens_for_local_replication_factor' is also uncommented and set to 3.
Please note when upgrading that if the 'num_tokens' value is different than what you have
configured, the upgraded node will refuse to start. Also note that if a new node joining
the cluster has a different value for 'num_tokens' than the rest of the datacenter,
the new node will be responsible for a different amount of data than the rest of the datacenter.
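The num_tokens/initial_token rule above can be illustrated with a minimal Python sketch. It is not
Cassandra's startup code: the function name check_token_config is hypothetical, and the sketch assumes
that initial_token is supplied as a comma-separated list of tokens.

```python
from typing import Optional


def check_token_config(num_tokens: Optional[int], initial_token: Optional[str]) -> None:
    """Illustrative check only (hypothetical helper, not a Cassandra API).

    Raises ValueError when initial_token is set but num_tokens is missing
    or does not match the number of tokens listed in initial_token.
    """
    if initial_token is None:
        return  # nothing to cross-check when no tokens are pinned

    # Assumption: tokens are separated by commas; surrounding spaces are ignored.
    tokens = [t.strip() for t in initial_token.split(",") if t.strip()]

    if num_tokens is None:
        raise ValueError("initial_token is defined but num_tokens is not")
    if num_tokens != len(tokens):
        raise ValueError(
            f"num_tokens ({num_tokens}) does not match the "
            f"{len(tokens)} token(s) listed in initial_token"
        )


# A consistent configuration passes silently.
check_token_config(2, "-9223372036854775808, 0")
```

Running a check like this against a node's settings before a rolling upgrade may help catch a mismatch
before the node refuses to start.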
Deprecation
-----------
- JavaScript user-defined functions have been deprecated. They are planned for removal
in the next major release. (CASSANDRA-17280)
- The JMX MBean org.apache.cassandra.metrics:type=Streaming,name=ActiveOutboundStreams has been
deprecated and will be removed in a subsequent major version. This metric has not been updated for
several versions.
- The JMX MBean org.apache.cassandra.db:type=BlacklistedDirectories has been
deprecated in favor of org.apache.cassandra.db:type=DisallowedDirectories
and will be removed in a subsequent major version.
- cqlsh support for Python 2.7 is deprecated, and cqlsh will warn when running with Python 2.7.
ALTER ... DROP COMPACT STORAGE
------------------------------
- Following a discussion regarding concerns about the safety of the 'ALTER ... DROP COMPACT STORAGE' statement,
the C* development community does not recommend its use in production and considers it experimental
(see https://www.mail-archive.com/dev@cassandra.apache.org/msg16789.html).
- An 'enable_drop_compact_storage' flag has been added to cassandra.yaml to allow operators to prevent its use.
Materialized Views
-------------------
- Following a discussion regarding concerns about the design and safety of Materialized Views, the C* development
community no longer recommends them for production use, and considers them experimental. Warning messages will
now be logged when they are created. (See https://www.mail-archive.com/dev@cassandra.apache.org/msg11511.html)
- An 'enable_materialized_views' flag has been added to cassandra.yaml to allow operators to prevent creation of
views.
- CREATE MATERIALIZED VIEW syntax has become stricter. Partition key columns are no longer implicitly considered
to be NOT NULL, and no base primary key columns get automatically included in view definition. You have to
specify them explicitly now.
Windows Support Removed
-----------------------
- Due to the lack of maintenance and testing, Windows support is removed from this version onward. Developers
who use Windows 10 can still run Apache Cassandra locally using WSL2 (Windows Subsystem for Linux version 2),
Docker for Windows, or a virtualization platform such as Hyper-V or VirtualBox.
3.11.10
=======
Upgrading
---------
- This release fixes a correctness issue with SERIAL reads, and LWT writes that do not apply.
Unfortunately, this fix has an impact on read performance at the SERIAL or
LOCAL_SERIAL consistency levels. For heavy users of such SERIAL reads, the performance
impact may be noticeable and may also result in an increase in timeouts. For that
reason, an opt-in system property has been added to disable the fix:
-Dcassandra.unsafe.disable-serial-reads-linearizability=true
Use this flag at your own risk as it reverts SERIAL reads to the incorrect behavior of
previous versions. See CASSANDRA-12126 for details.
- SASI's `max_compaction_flush_memory_in_mb` setting was previously interpreted in bytes. From 3.11.8
it is correctly interpreted in megabytes, but prior to 3.11.10 previous configurations of this setting will
lead to nodes running out of memory (OOM) during compaction. From 3.11.10 previous configurations will be detected as incorrect,
logged, and the setting reverted to the default value of 1GB. It is up to the user to correct the setting
after an upgrade, via dropping and recreating the index. See CASSANDRA-16071 for details.
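To see why an old configuration becomes dangerous once the unit changes, here is a small arithmetic sketch
in Python. The legacy value is a hypothetical example (1 GiB written out in bytes), and the helper
effective_bytes is illustrative only, not part of Cassandra.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def effective_bytes(configured_value: int, unit_is_megabytes: bool) -> int:
    """Return the flush-memory budget in bytes under either interpretation."""
    return configured_value * MIB if unit_is_megabytes else configured_value


# Hypothetical legacy setting: the operator meant 1 GiB and wrote it in bytes.
legacy_value = 1024 * MIB

old_budget = effective_bytes(legacy_value, unit_is_megabytes=False)  # read as bytes
new_budget = effective_bytes(legacy_value, unit_is_megabytes=True)   # read as megabytes

# The same number now requests MIB times more memory than before.
print(f"read as bytes:     {old_budget} bytes")
print(f"read as megabytes: {new_budget} bytes ({new_budget // old_budget}x larger)")
```

Under this assumption the unchanged setting asks for roughly a million times more memory than intended,
which is consistent with the out-of-memory behavior described above.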
3.11.6
======
Upgrading
---------
- Sstables for tables using a frozen UDT written by C* 3.0 appear as corrupted.
Background: The serialization-header in the -Statistics.db sstable component contains the type information
of the table columns. C* 3.0 writes incorrect type information for frozen UDTs by omitting the
"frozen" information. Non-frozen UDTs were introduced by CASSANDRA-7423 in C* 3.6. Since then, the missing
"frozen" information leads to deserialization issues that result in CorruptSSTableExceptions, and potentially
other exceptions as well.
As a mitigation, the sstable serialization-headers are rewritten to contain the missing "frozen" information for
UDTs once, when an upgrade from C* 3.0 is detected. This migration does not touch snapshots or backups.
The sstablescrub tool now performs a check of the sstable serialization-header against the schema. A mismatch of
the types in the serialization-header and the schema will cause sstablescrub to error out and stop by default.
See the new `-e` option. `-e off` disables the new validation code. `-e fix` or `-e fix-only`, e.g.
`sstablescrub -e fix keyspace table`, will validate the serialization-header, rewrite the non-frozen UDTs
in the serialization-header to frozen UDTs, if that matches the schema, and continue with scrub.
See `sstablescrub -h`.
(CASSANDRA-15035)
- repair_session_max_tree_depth setting has been added to cassandra.yaml to allow operators to reduce
merkle tree size if repair is creating too much heap pressure. See CASSANDRA-14096 for details.
3.11.5
======
Experimental features
---------------------
- An 'enable_sasi_indexes' flag, true by default, has been added to cassandra.yaml to allow operators to prevent
the creation of new SASI indexes, which are considered experimental and are not recommended for production use.
(See https://www.mail-archive.com/dev@cassandra.apache.org/msg13582.html)
- The flags 'enable_sasi_indexes' and 'enable_materialized_views' have been grouped under an experimental features
section in cassandra.yaml.
3.11.4
======
Upgrading
---------
- The order of static columns in SELECT * has been fixed to match that of 2.0 and 2.1 - they are now sorted
alphabetically again, by their name, just like regular columns are. If you use prepared statements and
SELECT * queries, and have both simple and collection static columns in those tables, and are upgrading from an
earlier 3.0 version, then you might be affected by this change. Please see CASSANDRA-14638 for details.
3.11.3
======
Upgrading
---------
- Materialized view users upgrading from 3.0.15 (3.0.X series) or 3.11.1 (3.11.X series) and later that have performed range movements (join, decommission, move, etc),
should run repair on the base tables, and subsequently on the views to ensure data affected by CASSANDRA-14251 is correctly propagated to all replicas.
- Changes to bloom_filter_fp_chance will no longer take effect on existing sstables when the node is restarted. Only
compactions/upgradesstables regenerate bloom filters and Summaries sstable components. See CASSANDRA-11163.
3.11.2
======
Upgrading
---------
- See MAXIMUM TTL EXPIRATION DATE NOTICE above.
- Cassandra now relies on the JVM options to properly shut down on OutOfMemoryError. By default it will
rely on the OnOutOfMemoryError option as the ExitOnOutOfMemoryError and CrashOnOutOfMemoryError options
are not supported by the older 1.7 and 1.8 JVMs. A warning will be logged at startup if none of those JVM
options are used. See CASSANDRA-13006 for more details.
- Cassandra no longer logs a heap histogram on OutOfMemoryError by default. To enable that behavior
set the 'cassandra.printHeapHistogramOnOutOfMemoryError' system property to 'true'. See CASSANDRA-13006
for more details.
3.11.1
======
Upgrading
---------
- Creating a Materialized View with filtering on a non-primary-key base column
(added in CASSANDRA-10368) is disabled, because the liveness of a view row
depends on multiple filtered base non-key columns and the base non-key
column used in the view primary key. These semantics cannot be supported without
a storage format change, see CASSANDRA-13826. For append-only use cases, you
may still use this feature with a startup flag: "-Dcassandra.mv.allow_filtering_nonkey_columns_unsafe=true"
Compact Storage (only when upgrading from 3.X or any version lower than 3.0.15)
---------------
- Starting with version 4.0, Thrift is no longer supported.
COMPACT STORAGE will no longer be supported after 'ALTER ... DROP COMPACT STORAGE'
is taken out of experimental mode. This will be done in a future major release, post 5.0 (CASSANDRA-19324).
'ALTER ... DROP COMPACT STORAGE' statement makes Compact Tables CQL-compatible,
exposing internal structure of Thrift/Compact Tables. You can find more details
on exposed internal structure under:
http://cassandra.apache.org/doc/latest/cql/appendices.html#appendix-c-dropping-compact-storage
For uninterrupted cluster upgrades, drivers now support the 'NO_COMPACT' startup option.
Supplying this flag will have the same effect as 'DROP COMPACT STORAGE', but only for the
current connection.
In order to upgrade, clients supporting a non-compact schema view can be rolled out
gradually. When all the clients are updated 'ALTER ... DROP COMPACT STORAGE' can be
executed. After dropping compact storage, the 'NO_COMPACT' option will have no effect.
Materialized Views
-------------------
Materialized Views (only when upgrading from any version lower than 3.0.15 (3.0 series) or 3.11.1 (3.X series))
---------------------------------------------------------------------------------------
- Cassandra will no longer allow dropping columns on tables with Materialized Views.
- A change was made in the way the Materialized View timestamp is computed, which
may cause an old deletion of a base column which is a view primary key (PK) column
to not be reflected in the view when repairing the base table post-upgrade. This
condition is only possible when a column deletion of an MV primary key (PK) column
not present in the base table PK (via UPDATE base SET view_pk_col = null or DELETE
view_pk_col FROM base) is missed before the upgrade and received by repair after the upgrade.
If such column deletions are done on a view PK column which is not a base PK, it's advisable
to run repair on the base table on all nodes prior to the upgrade. Alternatively it's possible
to fix potential inconsistencies by running repair on the views after upgrade or by dropping and
re-creating the views. See CASSANDRA-11500 for more details.
- Removal of columns not selected in the Materialized View (via UPDATE base SET unselected_column
= null or DELETE unselected_column FROM base) may not be properly reflected in the view in some
situations so we advise against doing deletions on base columns not selected in views
until this is fixed on CASSANDRA-13826.
3.11.0
======
Upgrading
---------
- Creating a Materialized View with filtering on a non-primary-key base column
(added in CASSANDRA-10368) is disabled, because the liveness of a view row
depends on multiple filtered base non-key columns and the base non-key
column used in the view primary key. These semantics cannot be supported without
a storage format change, see CASSANDRA-13826. For append-only use cases, you
may still use this feature with a startup flag: "-Dcassandra.mv.allow_filtering_nonkey_columns_unsafe=true"
- The NativeAccessMBean isAvailable method will only return true if the
native library has been successfully linked. Previously it was returning
true if JNA could be found but was not taking into account link failures.
- Primary ranges in the system.size_estimates table are now based on the keyspace
replication settings and adjacent ranges are no longer merged (CASSANDRA-9639).
- In 2.1, the default for otc_coalescing_strategy was 'DISABLED'.
In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown
to be a performance regression. The default for 3.11.0 and newer has
been reverted to 'DISABLED'. Users upgrading from Cassandra 2.2 or 3.0 should
be aware that the default has changed.
- The StorageHook interface has been modified to allow retrieving read information from
SSTableReader (CASSANDRA-13120).
3.10
====
New features
------------
- New `DurationType` (cql duration). See CASSANDRA-11873
- Runtime modification of concurrent_compactors is now available via nodetool
- Support for the assignment operators +=/-= has been added for update queries.
- An Index implementation may now provide a task which runs prior to joining
the ring. See CASSANDRA-12039
- Filtering on partition key columns is now also supported for queries without
secondary indexes.
- A slow query log has been added: slow queries will be logged at DEBUG level.
For more details refer to CASSANDRA-12403 and slow_query_log_timeout_in_ms
in cassandra.yaml.
- Support for GROUP BY queries has been added.
- A new compaction-stress tool has been added to test the throughput of compaction
for any cassandra-stress user schema. See compaction-stress help for how to use it.
- Compaction can now take into account overlapping tables that don't take part
in the compaction to look for deleted or overwritten data in the compacted tables.
When such data is found, it can be safely discarded, which in turn should enable
the removal of tombstones over that data.
The behavior can be engaged in two ways:
- as a "nodetool garbagecollect -g CELL/ROW" operation, which applies
single-table compaction on all sstables to discard deleted data in one step.
- as a "provide_overlapping_tombstones:CELL/ROW/NONE" compaction strategy flag,
which uses overlapping tables as a source of deletions/overwrites during all