The purpose of this guide is to detail changes made by successive versions of the Java driver.
Frozen annotations in the mapper are no longer checked at runtime (see [JAVA-843|https://datastax-oss.atlassian.net/browse/JAVA-843] for more explanations), so they are purely informational at this stage. However, it is a good idea to keep using these annotations and to make sure they match the schema, in anticipation of the schema generation features that will be added in a future version.
2.1.8 is binary-compatible with 2.1.7 but introduces a small change in the driver's behavior:
- The list of contact points provided at startup is now shuffled before trying
to open the control connection, so that multiple clients with the same contact
points don't all pick the same control host. As a result, you can't assume that
the driver will try contact points in a deterministic order. In particular, if
you use the DCAwareRoundRobinPolicy without specifying a primary datacenter
name, make sure that you only provide local hosts as contact points.
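
To make the consequence concrete, here is a stand-alone sketch that models the shuffle with plain JDK classes. It is not the driver's code: the class name ContactPointShuffleSketch and the host names are hypothetical. It only illustrates that the set of contact points is preserved while the order in which they are tried depends on the random source.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model of the behavior described above; not part of the driver.
final class ContactPointShuffleSketch {

    // Returns a shuffled copy, leaving the caller's list untouched.
    static List<InetSocketAddress> shuffled(List<InetSocketAddress> contactPoints, Random random) {
        List<InetSocketAddress> copy = new ArrayList<>(contactPoints);
        Collections.shuffle(copy, random);
        return copy;
    }

    // Made-up addresses, all in the same (local) datacenter.
    static List<InetSocketAddress> exampleContactPoints() {
        return Arrays.asList(
                InetSocketAddress.createUnresolved("dc1-host1.example.com", 9042),
                InetSocketAddress.createUnresolved("dc1-host2.example.com", 9042),
                InetSocketAddress.createUnresolved("dc1-host3.example.com", 9042));
    }
}
```

Since any element of the list may end up first, every contact point should be acceptable as the control host, which is why the list should only contain local hosts when no datacenter name is given.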
This version brings a few changes in the driver's behavior; none of them break binary compatibility.
- The DefaultRetryPolicy's behaviour has changed in the case of an Unavailable exception received from a request. The driver now retries on a different node at most once; otherwise, an exception is thrown. This change makes sense in the case where the node tried initially for the request happens to be isolated from the rest of the cluster (e.g. because of a network partition) but can still answer to the client normally. In this case, trying another node has a chance of success. The previous behaviour was to always throw an exception.
- The following properties in PoolingOptions were renamed:
  - MaxSimultaneousRequestsPerConnectionThreshold to NewConnectionThreshold
  - MaxSimultaneousRequestsPerHostThreshold to MaxRequestsPerConnection

  The old getters/setters were deprecated, but they delegate to the new ones. Also, note that the connection pool for protocol v3 can now be configured to use multiple connections. See this page for more information.
- MappingManager(Session) will now force the initialization of the Session if needed. This is a change from 2.1.6, where if you gave it an uninitialized session (created with Cluster#newSession() instead of Cluster#connect()), it would only get initialized on the first request. If this is a problem for you, MappingManager(Session, ProtocolVersion) preserves the previous behavior (see the API docs for more details).
- A BuiltStatement is now considered non-idempotent whenever a fcall() or raw() is used to build a value to be inserted in the database. If you know that the CQL functions or expressions are safe, use setIdempotent(true) on the statement.
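
The new decision rule for Unavailable errors can be summarised by the stand-alone sketch below. It is a model written for this guide, not the DefaultRetryPolicy source: the class UnavailableRetrySketch, its Decision enum and the retriesSoFar parameter are hypothetical names.

```java
// Hypothetical model of the "retry on another node at most once" rule.
final class UnavailableRetrySketch {

    enum Decision { TRY_NEXT_HOST, RETHROW }

    // retriesSoFar is the number of retries already performed for this request.
    static Decision onUnavailable(int retriesSoFar) {
        // First Unavailable error: the coordinator may simply be isolated,
        // so another node gets one chance. Any later error is rethrown.
        return retriesSoFar == 0 ? Decision.TRY_NEXT_HOST : Decision.RETHROW;
    }
}
```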
See 2.0.10.
2.1.2 brings important internal changes with native protocol v3 support, but the impact on the public API has been kept as low as possible.
- The native protocol version is now modelled as an enum: ProtocolVersion. Most public methods that take it as an argument have a backward-compatible version that takes an int (the exception being RegularStatement, described below). For new code, prefer the enum version.
- RegularStatement.getValues now takes the protocol version as a ProtocolVersion instead of an int. This is transparent for callers since there is a backward-compatible alternative, but if you happened to extend the class you'll need to update your implementation.
- BatchStatement.setSerialConsistencyLevel now returns BatchStatement instead of Statement. Again, this only matters if you extended this class (if so, it might be a good idea to also have a covariant return in your child class).
- The constructor of UnsupportedFeatureException now takes a ProtocolVersion as a parameter. This should impact few users, as there's hardly any reason to build instances of that class from client code.
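
The pattern of keeping an int overload that delegates to an enum-based method can be sketched as follows. This is a stand-alone illustration: ProtocolVersionSketch and VersionedApiSketch are hypothetical types invented for this example, not the driver's classes.

```java
// Hypothetical stand-in for an enum that models the protocol version.
enum ProtocolVersionSketch {
    V1(1), V2(2), V3(3);

    private final int asInt;

    ProtocolVersionSketch(int asInt) {
        this.asInt = asInt;
    }

    int toInt() {
        return asInt;
    }

    static ProtocolVersionSketch fromInt(int i) {
        for (ProtocolVersionSketch v : values()) {
            if (v.asInt == i) {
                return v;
            }
        }
        throw new IllegalArgumentException("Unsupported protocol version: " + i);
    }
}

// Hypothetical API exposing both flavours of the same method.
final class VersionedApiSketch {

    // Preferred form for new code.
    static String describe(ProtocolVersionSketch version) {
        return "native protocol v" + version.toInt();
    }

    // Backward-compatible form: converts, then delegates to the enum version.
    static String describe(int version) {
        return describe(ProtocolVersionSketch.fromInt(version));
    }
}
```

Existing callers that pass an int keep compiling, while an invalid value is rejected at the conversion step instead of travelling further into the code.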
These features are only active when the native protocol v3 is in use.
- The driver now uses a single connection per host (as opposed to a pool in 2.1.1). Most options in PoolingOptions are ignored, except for a new one called maxSimultaneousRequestsPerHostThreshold. See the class's Javadocs for detailed explanations.
- You can now provide a default timestamp with each query (but it will be ignored if the CQL query string already contains a USING TIMESTAMP clause). This can be done on a per-statement basis with Statement.setDefaultTimestamp, or automatically with a TimestampGenerator specified with Cluster.Builder.withTimestampGenerator (two implementations are provided: ThreadLocalMonotonicTimestampGenerator and AtomicMonotonicTimestampGenerator). If you specify both, the statement's timestamp takes precedence over the generator. By default, the driver has the same behavior as 2.1.1 (no generator, timestamps are assigned by Cassandra unless USING TIMESTAMP was specified).
- BatchStatement.setSerialConsistencyLevel no longer throws an exception; it now honors the serial consistency level for the batch.
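
The idea of a monotonic timestamp generator, and the precedence of a per-statement timestamp over it, can be illustrated with plain JDK classes. The sketch below is an assumption-laden model, not the driver's implementation: the class MonotonicTimestampSketch and its methods are hypothetical, and it assumes timestamps are expressed in microseconds.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a generator that never returns the same value twice.
final class MonotonicTimestampSketch {

    private final AtomicLong last = new AtomicLong(0);

    // Returns a timestamp in microseconds that is strictly greater than
    // any value previously returned by this instance.
    long next() {
        while (true) {
            long previous = last.get();
            long nowMicros = System.currentTimeMillis() * 1000;
            long candidate = Math.max(nowMicros, previous + 1);
            if (last.compareAndSet(previous, candidate)) {
                return candidate;
            }
        }
    }

    // An explicit per-statement timestamp wins; otherwise ask the generator.
    long resolve(Long statementTimestamp) {
        return statementTimestamp != null ? statementTimestamp : next();
    }
}
```

The compare-and-set loop keeps the sequence strictly increasing even when several calls land in the same millisecond or come from different threads.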
- The ResultSet interface has a new wasApplied() method. This will only affect clients that provide their own implementation of this interface.
- The getCaching method of TableMetadata#Options now returns a Map to account for changes in Cassandra 2.1. Also, the getIndexInterval method now returns an Integer instead of an int, which will be null when connected to Cassandra 2.1 nodes.
- BoundStatement variables that have not been set explicitly will no longer default to null. Instead, all variables must be bound explicitly, otherwise the execution of the statement will fail (this also applies to statements inside of a BatchStatement). For variables that map to a primitive Java type, a new setToNull method has been added. We made this change because the driver might soon distinguish between unset and null variables, so we don't want clients relying on the "leave unset to set to null" behavior.
The changes listed in this section should normally not impact end users of the driver, but rather third-party frameworks and tools.
- The serialize and deserialize methods in DataType now take an additional parameter: the protocol version. As explained in the javadoc, if unsure, the proper value to use for this parameter is the protocol version in use by the driver, i.e. the value returned by cluster.getConfiguration().getProtocolOptions().getProtocolVersion().
- The parse method in DataType now returns a Java object, not a ByteBuffer. The previous behavior can be obtained by calling the serialize method on the returned object.
- The getValues method of RegularStatement now takes the protocol version as a parameter. As above, the proper value if unsure is almost surely the protocol version in use (cluster.getConfiguration().getProtocolOptions().getProtocolVersion()).
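
One way to see why serialization needs the protocol version is that the binary layout of some values can differ between versions. The stand-alone sketch below assumes, for illustration only, that a list is encoded with 2-byte length prefixes before v3 and 4-byte prefixes from v3 on; check the native protocol specification for the authoritative layout. The class VersionedListCodecSketch is hypothetical and is not the driver's DataType code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical codec whose output depends on the protocol version.
final class VersionedListCodecSketch {

    // Encodes a list of strings as [count][len][bytes][len][bytes]...
    // Assumption: count and lengths use 2 bytes before v3, 4 bytes from v3 on.
    static ByteBuffer serialize(List<String> values, int protocolVersion) {
        boolean wide = protocolVersion >= 3;
        int prefixSize = wide ? 4 : 2;

        byte[][] encoded = new byte[values.size()][];
        int totalSize = prefixSize;
        for (int i = 0; i < values.size(); i++) {
            encoded[i] = values.get(i).getBytes(StandardCharsets.UTF_8);
            totalSize += prefixSize + encoded[i].length;
        }

        ByteBuffer out = ByteBuffer.allocate(totalSize);
        putLength(out, values.size(), wide);
        for (byte[] element : encoded) {
            putLength(out, element.length, wide);
            out.put(element);
        }
        out.flip(); // make the buffer ready for reading
        return out;
    }

    private static void putLength(ByteBuffer out, int length, boolean wide) {
        if (wide) {
            out.putInt(length);
        } else {
            out.putShort((short) length);
        }
    }
}
```

Under that assumption the same Java list produces buffers of different sizes depending on the version, so a buffer produced for one version cannot be safely handed to a connection that speaks another.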
2.0.11 preserves binary compatibility with previous versions. There are a few changes in the driver's behavior:
- The DefaultRetryPolicy's behaviour has changed in the case of an Unavailable exception received from a request. The driver now retries on a different node at most once; otherwise, an exception is thrown. This change makes sense in the case where the node tried initially for the request happens to be isolated from the rest of the cluster (e.g. because of a network partition) but can still answer to the client normally. In this case, trying another node has a chance of success. The previous behaviour was to always throw an exception.
- A BuiltStatement is now considered non-idempotent whenever a fcall() or raw() is used to build a value to be inserted in the database. If you know that the CQL functions or expressions are safe, use setIdempotent(true) on the statement.
- The list of contact points provided at startup is now shuffled before trying to open the control connection, so that multiple clients with the same contact points don't all pick the same control host. As a result, you can't assume that the driver will try contact points in a deterministic order. In particular, if you use the DCAwareRoundRobinPolicy without specifying a primary datacenter name, make sure that you only provide local hosts as contact points.
We try to avoid breaking changes within a branch (2.0.x to 2.0.y), but 2.0.10 saw a lot of new features and internal improvements. There is one breaking change:
LatencyTracker#update now has a different signature and takes two new parameters: the statement that has been executed (never null), and the exception thrown while executing the query (or null, if the query executed successfully). Existing implementations of this interface, once upgraded to the new method signature, should continue to work as before.
The following might also be of interest:
- SocketOptions#getTcpNoDelay() is now TRUE by default (it was previously undefined). This reflects the new behavior of Netty (which was upgraded from version 3.9.0 to 4.0.27): TCP_NODELAY is now turned on by default, instead of depending on the OS default like in previous versions.
- Netty is not shaded anymore in the default Maven artifact. However, we publish a shaded artifact under a different classifier.
- The internal initialization sequence of the Cluster object has been slightly changed: some fields that were previously initialized in the constructor are now set when the init() method is called. In particular, Cluster#getMetrics() will return null until the cluster is initialized.
We used the opportunity of a major version bump to incorporate your feedback and improve the API, to fix a number of inconsistencies and remove cruft. Unfortunately this means there are some breaking changes, but the new API should be both simpler and more complete.
The following describes the changes for 2.0 that are breaking changes of the 1.0 API. For ease of use, we distinguish two categories of API changes: the "main" ones and the "other" ones.
The "main" API changes are the ones that are either likely to affect most upgraded apps or are incompatible changes that, even if minor, will not be detected at compile time. Upgraders are highly encouraged to check this list of "main" changes while upgrading their application to 2.0 (even though most applications are likely to be affected by only a handful of changes).
The "other" list is, well, other changes: those that are likely to affect a minor number of applications and will be detected by compile time errors anyway. It is ok to skip those initially and only come back to them if you have trouble compiling your application after an upgrade.
- The Query class has been renamed to Statement (it was confusing to some that the BoundStatement was not a Statement). To allow this, the old Statement class has been renamed to RegularStatement.
- The Cluster and Session shutdown API has changed. There is now a closeAsync method that is asynchronous and returns a Future on the completion of the shutdown process. There is also a close shortcut that does the same but blocks. Also, close now waits for ongoing queries to complete by default (but you can force the closing of all connections if you want to).
- NoHostAvailableException#getErrors now returns the full exception objects for each node instead of just a message. In other words, it returns a Map<InetAddress, Throwable> instead of a Map<InetAddress, String>.
- Statement#getConsistencyLevel (previously Query#getConsistencyLevel, see first point) will now return null by default (instead of CL.ONE), with the meaning of "use the default consistency level". The default consistency level can now be configured through the new QueryOptions object in the cluster Configuration.
- The Metrics class now uses the Codahale metrics library version 3 (version 2 was used previously). This new major version of the library has many API changes compared to its version 2 (see the release notes for details), which can thus impact consumers of the Metrics class. Furthermore, the default JmxReporter now includes a name specific to the cluster instance (to avoid conflicts when multiple Cluster instances are created in the same JVM). As a result, tools that were polling JMX info will have to be updated accordingly.
- The QueryBuilder#in method now has the following special case: using QueryBuilder.in(QueryBuilder.bindMarker()) will generate the string IN ?, not IN (?) as was the case in 1.0. The reasoning is that the former syntax, made valid by CASSANDRA-4210, is a lot more useful than IN (?), as the latter can more simply use an equality. Note that if you really want to output IN (?) with the query builder, you can use QueryBuilder.in(QueryBuilder.raw("?")).
- When binding values by name in BoundStatement (i.e. using the setX(String, X) methods), if more than one variable has the same name, then all values corresponding to that variable name are set instead of just the first occurrence.
- The QueryBuilder#raw method does not automatically add quotes anymore, but rather outputs its result without any change (as the raw name implies). This means for instance that eq("x", raw(foo)) will output x = foo, not x = 'foo' (you don't need the raw method to output the latter string).
- The QueryBuilder will now sometimes use the new ability to send values as bytes instead of serializing everything to strings. In general the QueryBuilder will do the right thing, but if you were calling the getQueryString() method on a Statement created with a QueryBuilder (for other reasons than to prepare a query) then the returned string may contain bind markers in place of some of the values provided (and in that case, getValues() will contain the values corresponding to those markers). If need be, it is possible to force the old behavior by using the new setForceNoValues() method.
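
The change to binding by name is easy to overlook because it is not detected at compile time. The following stand-alone sketch models the 2.0 semantics with plain JDK classes. NamedBindingSketch and the variable names are hypothetical and do not reproduce the BoundStatement implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a statement whose variables may share a name.
final class NamedBindingSketch {

    private final List<String> variableNames;
    private final Object[] values;

    NamedBindingSketch(String... variableNames) {
        this.variableNames = Arrays.asList(variableNames);
        this.values = new Object[variableNames.length];
    }

    // Sets every variable called `name` (2.0 behavior), not only the first
    // occurrence (1.0 behavior). Returns how many variables were set.
    int set(String name, Object value) {
        int bound = 0;
        for (int i = 0; i < variableNames.size(); i++) {
            if (variableNames.get(i).equals(name)) {
                values[i] = value;
                bound++;
            }
        }
        if (bound == 0) {
            throw new IllegalArgumentException(name + " is not a variable of this statement");
        }
        return bound;
    }

    Object get(int index) {
        return values[index];
    }
}
```

With variables named id, ts and id, a single set("id", 42) fills positions 0 and 2 and leaves position 1 untouched.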
- Creating a Cluster instance (through Cluster#buildFrom or the Cluster.Builder#build method) does not create any connection right away anymore (and thus cannot throw a NoHostAvailableException or an AuthenticationException). Instead, the initial contact points are checked the first time Cluster#connect is called. If for some reason you want to emulate the previous behavior, you can use the new method Cluster#init: Cluster.builder().build() in 1.0 is equivalent to Cluster.builder().build().init() in 2.0.
- Methods from Metadata, KeyspaceMetadata and TableMetadata now use case-insensitive identifiers by default (for keyspace, table and column names passed as parameters). You can double-quote an identifier if you want it to be case sensitive (as you would do in CQL), and there is a Metadata.quote helper method for that.
- The TableMetadata#getClusteringKey method has been renamed to TableMetadata#getClusteringColumns to match the "official" vocabulary.
- The UnavailableException#getConsistency method has been renamed to UnavailableException#getConsistencyLevel for consistency with the method of QueryTimeoutException.
- Subclasses of RegularStatement (ex-Statement class, see above) must now implement two additional methods: RegularStatement#getKeyspace and RegularStatement#getValues. If you had extended this class, you will have to implement those new methods, but both can return null if they are not useful in your case.
- Implementations of the Cluster.Initializer interface must now provide 2 new methods: Cluster.Initializer#getInitialListeners (which can return an empty collection) and Cluster.Initializer#getClusterName (which can return null).
- The Metadata#getReplicas method now takes 2 arguments. On top of the partition key, you must now provide the keyspace too. The previous behavior was buggy: it's impossible to properly return the full list of replicas for a partition key without knowing the keyspace, since replication may depend on the keyspace.
- The LoadBalancingPolicy#newQueryPlan() method now takes the currently logged keyspace as 2nd argument. This information is necessary to do proper token aware balancing (see preceding point).
- The ResultSetFuture#set and ResultSetFuture#setException methods have been removed (from the public API at least). They were never meant to be exposed publicly: a ResultSetFuture is always set by the driver itself and should not be set manually.
- The Host.HealthMonitor class, deprecated since 1.0.2, has been removed. You will now need to use Host#isUp and Cluster#register if you were using that class.
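
The case-sensitivity rule for metadata lookups can be modelled in a few lines. The sketch below is a stand-alone approximation under stated assumptions: IdentifierSketch, its quote method and its normalize method are hypothetical names, and the real Metadata methods may handle more cases (for instance escaped quotes inside an identifier).

```java
import java.util.Locale;

// Hypothetical model of case-insensitive lookup with opt-in quoting.
final class IdentifierSketch {

    // Wraps an identifier in double quotes to make it case sensitive.
    static String quote(String id) {
        return '"' + id + '"';
    }

    // Quoted identifiers keep their case (quotes removed);
    // unquoted identifiers are folded to lower case.
    static String normalize(String id) {
        boolean quoted = id.length() >= 2
                && id.charAt(0) == '"'
                && id.charAt(id.length() - 1) == '"';
        if (quoted) {
            return id.substring(1, id.length() - 1);
        }
        return id.toLowerCase(Locale.ROOT);
    }
}
```

In this model, looking up MyTable resolves to mytable, whereas looking up the quoted form resolves to MyTable exactly, mirroring how identifiers behave in CQL.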
This section details the biggest additions to 2.0 API wise. It is not an exhaustive list of new features in 2.0.
- The new BatchStatement class allows grouping any type of insert statements (BoundStatement or RegularStatement) for execution as a batch. For instance, you can do something like:

      List<String> values = ...;
      PreparedStatement ps = session.prepare("INSERT INTO myTable(value) VALUES (?)");
      BatchStatement bs = new BatchStatement();
      for (String value : values)
          bs.add(ps.bind(value));
      session.execute(bs);

- SimpleStatement can now take a list of values in addition to the query. This allows doing the equivalent of a prepare+execute but with only one round-trip to the server and without keeping the prepared statement after the execution. This is typically useful if a given query should be executed only once (i.e. you don't want to prepare it) but you also don't want to serialize all values into strings. Shortcut Session#execute() and Session#executeAsync() methods are also provided so that you can do:

      String imgName = ...;
      ByteBuffer imgBytes = ...;
      session.execute("INSERT INTO images(name, bytes) VALUES (?, ?)", imgName, imgBytes);

- SELECT queries are now "paged" under the hood. In other words, if a query yields a very large result, only the beginning of the ResultSet will be fetched initially, the rest being fetched "on-demand". In practice, this means that:

      for (Row r : session.execute("SELECT * FROM mytable"))
          ... process r ...

  should not time out or OOM the server anymore even if "mytable" contains a lot of data. In general paging should be transparent for the application (as in the example above), but the implementation provides a number of knobs to fine-tune the behavior of that paging:
  - the size of each "page" can be set per-query (Statement#setFetchSize())
  - the ResultSet object provides 2 methods to check the state of paging (ResultSet#getAvailableWithoutFetching and ResultSet#isFullyFetched) as well as a means to force the pre-fetching of the next page (ResultSet#fetchMoreResults).
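
To show how on-demand paging can stay transparent to a for-each loop, here is a stand-alone model built only on JDK classes. It is a sketch, not the driver's ResultSet: PagedResultSketch, its pageFetcher parameter and the rangeSource helper are hypothetical, and the method names merely echo the ones listed above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical model of a result set that fetches pages lazily.
final class PagedResultSketch<T> implements Iterable<T> {

    // Returns page number n; an empty list signals that no rows are left.
    private final IntFunction<List<T>> pageFetcher;
    private final Deque<T> buffered = new ArrayDeque<>();
    private int nextPage = 0;
    private boolean fullyFetched = false;

    PagedResultSketch(IntFunction<List<T>> pageFetcher) {
        this.pageFetcher = pageFetcher;
    }

    int availableWithoutFetching() {
        return buffered.size();
    }

    boolean isFullyFetched() {
        return fullyFetched;
    }

    // Forces the retrieval of the next page, if any.
    void fetchMoreResults() {
        if (fullyFetched) {
            return;
        }
        List<T> page = pageFetcher.apply(nextPage++);
        if (page.isEmpty()) {
            fullyFetched = true;
        } else {
            buffered.addAll(page);
        }
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                // Fetch only when the local buffer is exhausted.
                while (buffered.isEmpty() && !fullyFetched) {
                    fetchMoreResults();
                }
                return !buffered.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return buffered.poll();
            }
        };
    }

    // Test data source: rows 0..total-1 served in pages of fetchSize rows.
    static IntFunction<List<Integer>> rangeSource(int total, int fetchSize) {
        return page -> {
            List<Integer> rows = new ArrayList<>();
            int start = page * fetchSize;
            for (int i = start; i < Math.min(total, start + fetchSize); i++) {
                rows.add(i);
            }
            return rows;
        };
    }
}
```

The caller iterates as if all rows were in memory, while at most one page beyond what has been consumed is held in the buffer at any time.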