Cassandra schema management for multi-environment development.
A gem to manage Cassandra database schemas in Rails. It offers migrations and environment-specific databases out of the box for Rails users.
This enables you to use Cassandra in an organized way, combined with your ActiveRecord relational database.
- Cassandra 1.2 or higher with the native_transport_protocol turned on (Instructions to install cassandra locally)
- Ruby 1.9
- Rails > 3.2
gem install cassandra_migrations
The native transport protocol (sometimes called the binary protocol or the CQL protocol) is not on by default in all versions of Cassandra. If it is not, you can enable it by editing the CASSANDRA_DIR/conf/cassandra.yaml file on all nodes in your cluster and setting start_native_transport to true. You need to restart the nodes for this to take effect.
In your rails root directory run:
prepare_for_cassandra .
This creates config/cassandra.yml. Open the newly created file and configure the database (keyspace) name for each of the environments, just as you would for your regular database. The defaults for the other options should be enough for now.
development:
  hosts: ['127.0.0.1']
  port: 9042
  keyspace: 'my_keyspace_name'
  replication:
    class: 'SimpleStrategy'
    replication_factor: 1
SUPPORTED CONFIGURATION OPTIONS: For a list of supported options see the docs for Cassandra module, connect method in the DataStax Ruby Driver
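To sanity-check the file before running any rake task, it can be loaded with Ruby's standard YAML library. This is a minimal sketch that assumes the layout shown above; `keyspace_for` is a hypothetical helper written for illustration and is not part of the gem.

```ruby
require 'yaml'

# Inline copy of the example config so the sketch is self-contained.
# In an application you would read config/cassandra.yml instead.
# (The squiggly heredoc needs Ruby 2.3 or newer.)
CONFIG_YAML = <<~YAML
  development:
    hosts: ['127.0.0.1']
    port: 9042
    keyspace: 'my_keyspace_name'
    replication:
      class: 'SimpleStrategy'
      replication_factor: 1
YAML

# Hypothetical helper: returns the keyspace configured for an environment,
# raising a readable error when the environment is missing.
def keyspace_for(config, env)
  settings = config.fetch(env) { raise ArgumentError, "no Cassandra config for #{env}" }
  settings.fetch('keyspace')
end

config = YAML.safe_load(CONFIG_YAML)
puts keyspace_for(config, 'development')
```

A missing environment raises an ArgumentError here, which tends to be easier to diagnose than a failure deep inside a rake task.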
There is a collection of rake tasks to help you manage the Cassandra database (rake cassandra:create, rake cassandra:migrate, rake cassandra:drop, etc.). For now this one does the trick:
rake cassandra:reset
rails generate cassandra_migration create_posts
In your migration file, make it create a table and drop it on its way back:
class CreatePosts < CassandraMigrations::Migration
  def up
    create_table :posts do |p|
      p.integer :id, :primary_key => true
      p.timestamp :created_at
      p.string :title
      p.text :text
    end
  end

  def self.down
    drop_table :posts
  end
end
And now run:
rake cassandra:migrate
To create a table with a compound primary key, just specify the primary keys on table creation, e.g.:
class CreatePosts < CassandraMigrations::Migration
  def up
    create_table :posts, :primary_keys => [:id, :created_at] do |p|
      p.integer :id
      p.timestamp :created_at
      p.string :title
      p.text :text
    end
  end

  def self.down
    drop_table :posts
  end
end
To create a table with a compound partition key, specify the partition keys on table creation, e.g.:
class CreatePosts < CassandraMigrations::Migration
  def up
    create_table :posts, :partition_keys => [:id, :created_month], :primary_keys => [:created_at] do |p|
      p.integer :id
      p.string :created_month
      p.timestamp :created_at
      p.string :title
      p.text :text
    end
  end

  def self.down
    drop_table :posts
  end
end
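In CQL terms, a compound partition key is written as an inner parenthesised group inside PRIMARY KEY, followed by the clustering columns. The sketch below shows the clause that the key options above correspond to. `primary_key_clause` is a hypothetical helper written for illustration, not the gem's implementation.

```ruby
# Hypothetical helper: builds a CQL PRIMARY KEY clause from key lists.
def primary_key_clause(partition_keys, clustering_keys = [])
  partition = partition_keys.join(', ')
  # A compound partition key needs its own pair of parentheses.
  partition = "(#{partition})" if partition_keys.size > 1
  columns = [partition] + clustering_keys.map(&:to_s)
  "PRIMARY KEY (#{columns.join(', ')})"
end

puts primary_key_clause([:id, :created_month], [:created_at])
# => PRIMARY KEY ((id, created_month), created_at)

puts primary_key_clause([:id], [:created_at])
# => PRIMARY KEY (id, created_at)
```

The second call shows the simpler compound primary key case, where the first column alone is the partition key.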
To create a table with a secondary index, add it much as you would a regular Rails index, e.g.:
class CreatePosts < CassandraMigrations::Migration
  def up
    create_table :posts, :primary_keys => [:id, :created_at] do |p|
      p.integer :id
      p.timestamp :created_at
      p.string :title
      p.text :text
    end
    create_index :posts, :title, :name => 'by_title'
  end

  def self.down
    drop_index 'by_title'
    drop_table :posts
  end
end
The create_table method allows you to pass a hash of options for:
- Clustering Order (clustering_order): A string such as 'a_decimal DESC'
- Compact Storage (compact_storage): Boolean, true or false
- Wait before GC (gc_grace_seconds): Default: 864000 [10 days]
- Others: See CQL Table Properties
Cassandra Migration will attempt to pass through the properties to the CREATE TABLE command.
Examples:
class WithClusteringOrderMigration < CassandraMigrations::Migration
  def up
    create_table :collection_lists, options: {
      clustering_order: 'a_decimal DESC',
      compact_storage: true,
      gc_grace_seconds: 43200
    } do |t|
      t.uuid :id, :primary_key => true
      t.decimal :a_decimal
    end
  end
end
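For orientation, the options in the example above correspond to a CQL WITH clause on CREATE TABLE. The following is a rough sketch of that mapping. `with_clause` is a hypothetical helper rather than the gem's code, and it only handles numeric and boolean property values (string properties would need quoting).

```ruby
# Hypothetical helper: renders a create_table options hash as a CQL WITH clause.
def with_clause(options)
  parts = options.map do |name, value|
    case name
    when :clustering_order then "CLUSTERING ORDER BY (#{value})"
    when :compact_storage then ('COMPACT STORAGE' if value)
    else "#{name} = #{value}" # passed through as a plain table property
    end
  end
  parts = parts.compact # drops compact_storage when it is false
  parts.empty? ? '' : "WITH #{parts.join(' AND ')}"
end

puts with_clause(
  clustering_order: 'a_decimal DESC',
  compact_storage: true,
  gc_grace_seconds: 43200
)
# => WITH CLUSTERING ORDER BY (a_decimal DESC) AND COMPACT STORAGE AND gc_grace_seconds = 43200
```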
The using_keyspace method in a migration lets you execute that migration in the context of a specific keyspace:
class WithAlternateKeyspaceMigration < CassandraMigrations::Migration
  def up
    using_keyspace('alternative') do
      create_table :collection_lists, options: {compact_storage: true} do |t|
        t.uuid :id, :primary_key => true
        t.decimal :a_decimal
      end
    end
  end
end
The overall workflow for a multiple-keyspace environment:
- Define all of your keyspace/environment combinations as separate environments in cassandra.yml. You probably want to keep your main or default keyspace as plain 'development' or 'production', especially if you're using the queries stuff (so as to confuse Rails as little as possible).
- Make sure to run rake cassandra:create for all of them.
- If you use using_keyspace in all your migrations for keyspaces defined in environments other than the standard Rails ones, you won't have to run them for each 'special' environment.
Side Note: If you're going to be using multiple keyspaces in one application (especially with cql-rb), you probably want to fully qualify your table names in your queries rather than calling USE <keyspace> all over the place, especially since cql-rb encourages you to have only one client object per application.
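As a small illustration of fully qualified table names, the sketch below builds a query string that names its keyspace explicitly. `qualified_table` is a hypothetical helper, and the keyspace and table names are taken from the using_keyspace example.

```ruby
# Hypothetical helper: prefixes a table name with its keyspace.
def qualified_table(keyspace, table)
  "#{keyspace}.#{table}"
end

cql = "SELECT * FROM #{qualified_table('alternative', :collection_lists)}"
puts cql
# => SELECT * FROM alternative.collection_lists
```

A string built this way can be run as raw CQL, so the query does not depend on whichever keyspace the client is currently using.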
There are some other helpers, such as add_column, too. Take a look inside!
Support for C* collections is provided via the list, set and map column types.
class CollectionsListMigration < CassandraMigrations::Migration
  def up
    create_table :collection_lists do |t|
      t.uuid :id, :primary_key => true
      t.list :my_list, :type => :string
      t.set :my_set, :type => :float
      t.map :my_map, :key_type => :uuid, :value_type => :float
    end
  end
end
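The column definitions above correspond to CQL collection types such as list<varchar> or map<uuid, float>. Here is a sketch of that correspondence. Both the `CQL_TYPES` table and the `collection_type` helper are assumptions made for illustration, covering only the three element types used in the example.

```ruby
# Assumed mapping from the migration's type names to CQL type names
# (only the types used in the example above).
CQL_TYPES = { string: 'varchar', float: 'float', uuid: 'uuid' }.freeze

# Hypothetical helper: renders the CQL type of a collection column.
def collection_type(kind, options)
  case kind
  when :list, :set
    "#{kind}<#{CQL_TYPES.fetch(options[:type])}>"
  when :map
    "map<#{CQL_TYPES.fetch(options[:key_type])}, #{CQL_TYPES.fetch(options[:value_type])}>"
  else
    raise ArgumentError, "unknown collection kind: #{kind}"
  end
end

puts collection_type(:list, type: :string)                       # => list<varchar>
puts collection_type(:set, type: :float)                         # => set<float>
puts collection_type(:map, key_type: :uuid, value_type: :float)  # => map<uuid, float>
```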
There are two ways to use the Cassandra interface provided by this gem: the query helpers shown first, and raw CQL through execute, shown further down.
# selects all posts
CassandraMigrations::Cassandra.select(:posts)

# more complex select query
CassandraMigrations::Cassandra.select(:posts,
  :projection => 'title, created_at',
  :selection => 'id > 1234',
  :order_by => 'created_at DESC',
  :limit => 10
)

# selects single row by uuid
CassandraMigrations::Cassandra.select(:posts,
  :projection => 'title, created_at',
  :selection => 'id = 6bc939c2-838e-11e3-9706-4f2824f98172',
  :allow_filtering => true # needed for potentially expensive queries
)
# secondary options
If you are using gem version 0.2.3+, you can also select based on the secondary options listed [here](http://datastax.github.io/ruby-driver/api/session/#execute_async-instance_method).
For instance, for the complex select query above you might want the results paginated at 50 results per page, with a timeout of 200 seconds:
CassandraMigrations::Cassandra.select(:posts,
  :projection => 'title, created_at',
  :selection => 'id > 1234',
  :order_by => 'created_at DESC',
  :limit => 10,
  :page_size => 50,
  :timeout => 200
)
All options listed on the page linked above are supported, though you can also pass in any secondary options using a "secondary_options" hash, as shown below:
CassandraMigrations::Cassandra.select(:posts,
  :projection => 'title, created_at',
  :selection => 'id > 1234',
  :order_by => 'created_at DESC',
  :limit => 10,
  :secondary_options => {
    :page_size => 50,
    :timeout => 200
  }
)
# adding a new post
CassandraMigrations::Cassandra.write!(:posts, {
  :id => 9999,
  :created_at => Time.current,
  :title => 'My new post',
  :text => 'lorem ipsum dolor sit amet.'
})

# adding a new post with TTL
CassandraMigrations::Cassandra.write!(:posts,
  {
    :id => 9999,
    :created_at => Time.current,
    :title => 'My new post',
    :text => 'lorem ipsum dolor sit amet.'
  },
  :ttl => 3600
)

# updating a post
CassandraMigrations::Cassandra.update!(:posts, 'id = 9999',
  :title => 'Updated title'
)

# updating a post with TTL
CassandraMigrations::Cassandra.update!(:posts, 'id = 9999',
  { :title => 'Updated title' },
  :ttl => 3600
)

# deleting a post
CassandraMigrations::Cassandra.delete!(:posts, 'id = 1234')

# deleting a post title
CassandraMigrations::Cassandra.delete!(:posts, 'id = 1234',
  :projection => 'title'
)

# deleting all posts
CassandraMigrations::Cassandra.truncate!(:posts)
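In CQL, a TTL is expressed with a USING TTL clause, given in seconds. The sketch below shows roughly what an insert with a :ttl option looks like as CQL. `insert_cql` is a hypothetical helper that takes already formatted value literals; it is not how the gem builds its statements.

```ruby
# Hypothetical helper: builds an INSERT statement and appends a TTL clause
# when a TTL (in seconds) is given. Values must already be CQL literals.
def insert_cql(table, columns, value_literals, ttl = nil)
  cql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{value_literals.join(', ')})"
  cql += " USING TTL #{Integer(ttl)}" if ttl
  cql
end

puts insert_cql(:posts, [:id, :title], ['9999', "'My new post'"], 3600)
# => INSERT INTO posts (id, title) VALUES (9999, 'My new post') USING TTL 3600
```

With a TTL of 3600, Cassandra expires the written columns one hour after the write.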
Given a migration that generates a set type column as shown next:
class CreatePeople < CassandraMigrations::Migration
  def up
    create_table :people, :primary_keys => :id do |t|
      t.uuid :id
      t.string :ssn
      ...
      t.set :emails, :type => :string
    end
  end
  ...
end
You can add new emails to the existing collection:
CassandraMigrations::Cassandra.update!(:people, "ssn = '867530900'",
{emails: ['[email protected]', '[email protected]']},
{operations: {emails: :+}})
You can remove emails from the collection:
CassandraMigrations::Cassandra.update!(:people, "ssn = '867530900'",
{emails: ['[email protected]']},
{operations: {emails: :-}})
Or, completely replace the existing values in the collection:
CassandraMigrations::Cassandra.update!(:people, "ssn = '867530900'",
{emails: ['[email protected]', '[email protected]']})
The same operations (addition :+ and subtraction :-) are supported by all collection types.
Read more about C* collections at http://cassandra.apache.org/doc/cql3/CQL.html#collections
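In CQL, adding to or removing from a set is written as an assignment that references the column itself, for example emails = emails + {...}. The sketch below builds such fragments. `set_literal` and `set_assignment` are hypothetical helpers, and the address is a placeholder.

```ruby
# Hypothetical helper: renders a CQL set literal of strings,
# doubling single quotes to escape them.
def set_literal(values)
  quoted = values.map { |v| "'#{v.gsub("'", "''")}'" }
  "{#{quoted.join(', ')}}"
end

# Hypothetical helper: builds the SET fragment of an UPDATE for a set column.
# operation is :+ (add), :- (remove) or nil (replace the whole collection).
def set_assignment(column, values, operation = nil)
  literal = set_literal(values)
  operation ? "#{column} = #{column} #{operation} #{literal}" : "#{column} = #{literal}"
end

puts set_assignment(:emails, ['a@example.com'], :+)
# => emails = emails + {'a@example.com'}

puts set_assignment(:emails, ['a@example.com'], :-)
# => emails = emails - {'a@example.com'}

puts set_assignment(:emails, ['a@example.com'])
# => emails = {'a@example.com'}
```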
CassandraMigrations::Cassandra.execute('SELECT * FROM posts')
Select queries will return an enumerable object over which you can iterate. All other query types return nil.
CassandraMigrations::Cassandra.select(:posts).each do |post_attributes|
  puts post_attributes
end
# => {'id' => 9999, 'created_at' => 2013-05-20 18:43:23 -0300, 'title' => 'My new post', 'text' => 'lorem ipsum dolor sit amet.'}
If you want some info about the table metadata, just call metadata on a query result:
CassandraMigrations::Cassandra.select(:posts).metadata
# => {'id' => :integer, 'created_at' => :timestamp, 'title' => :varchar, 'text' => :varchar}
Please refer to the wiki: Using uuid data type
This gem comes with built-in compatibility with Passenger and its smart spawning functionality, so if you're using Passenger all you have to do is deploy and be happy!
To add Cassandra database creation and migration steps to your Capistrano recipe, just add the following line to your deploy.rb:
require 'cassandra_migrations/capistrano'
This gem is built upon the official Ruby Driver for Apache Cassandra by DataStax, which supersedes the cql-rb gem (thank you Theo for doing an awesome job).