Cassandra and the Cloud go together like <insert hilarious comparison here>. So do multi-tenancy and the Cloud.
As well as saving costs by scaling horizontally and elastically in response to load over the course of a day, it's an attractive idea to go further by sharing hardware between customers (multi-tenancy).
I’m not against multi-tenancy per se… I think it’s a great idea. But I am against doing it with Cassandra and here’s why…
You basically have 3 real options for multi-tenanting Cassandra, each with pros and cons:
By Keyspace
Pros:
- Tenants may have different replication factors.
- Tenants may have different schemas (enabling them to potentially run different applications).
- Tenants don’t share SSTables and data is effectively isolated by Cassandra.
Cons:
- Your scalability is limited: every table carries its own memory overhead (including its own memtable), so holding all the extra keyspaces and their tables in memory does not play well with Cassandra.
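To make the by-keyspace layout concrete, here is a minimal Python sketch that generates the CQL a provisioning script might issue for each new tenant. The `tenant_` prefix, the `orders` table and its columns, and the datacenter name `dc1` are hypothetical; the point is only that each tenant gets its own keyspace, its own replication factor and its own copy of every table.

```python
import re

# Keep tenant names to simple lowercase identifiers so they are safe to splice into CQL
# and the keyspace name stays short (Cassandra limits keyspace names to 48 characters).
_TENANT_NAME = re.compile(r"[a-z][a-z0-9_]{0,39}")


def keyspace_ddl(tenant: str, replication_factor: int, datacenter: str = "dc1") -> list[str]:
    """Return CQL statements creating an isolated keyspace (plus its tables) for one tenant.

    'dc1' and the orders table are hypothetical placeholders.
    """
    if not _TENANT_NAME.fullmatch(tenant):
        raise ValueError(f"unsafe tenant name: {tenant!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    keyspace = f"tenant_{tenant}"
    return [
        # Each tenant can choose its own replication factor...
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': {replication_factor}}};",
        # ...and gets its own copy of every table, hence its own memtables and SSTables.
        f"CREATE TABLE IF NOT EXISTS {keyspace}.orders ("
        "order_id uuid PRIMARY KEY, customer text, total decimal);",
    ]


if __name__ == "__main__":
    for statement in keyspace_ddl("acme", 3) + keyspace_ddl("globex", 2):
        print(statement)
```

Every new tenant adds another full set of tables to the cluster's schema, which is exactly where the memory cost comes from.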
By Table
Pros:
- Tenants may have different schemas (enabling them to potentially run different applications).
- Tenants don’t share SSTables and data is effectively isolated by Cassandra.
Cons:
- Your scalability is limited: every table carries its own memory overhead (including its own memtable), so holding all the extra tables in memory does not play well with Cassandra.
- More complex application logic is needed to deal with different tables per tenant.
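A rough sketch, in Python, of the extra application logic the by-table approach needs. The `shared_app` keyspace, the `orders_<tenant>` naming convention and the `TenantTableRouter` class are all hypothetical. One detail worth knowing: CQL bind markers can only stand in for values, not for keyspace or table names, so each tenant's table name has to be validated and baked into the statement text, giving one statement per tenant.

```python
import re

# Table names are limited to 48 characters; "orders_" plus at most 41 characters fits.
_TENANT_NAME = re.compile(r"[a-z][a-z0-9_]{0,40}")


class TenantTableRouter:
    """Hypothetical helper for the by-table layout: one orders table per tenant."""

    def __init__(self, keyspace: str = "shared_app") -> None:
        self.keyspace = keyspace
        self._statements: dict[str, str] = {}

    def table_for(self, tenant: str) -> str:
        # Table names cannot be bound as parameters, so validate before splicing into CQL.
        if not _TENANT_NAME.fullmatch(tenant):
            raise ValueError(f"unsafe tenant name: {tenant!r}")
        return f"{self.keyspace}.orders_{tenant}"

    def select_order(self, tenant: str) -> str:
        # One statement per tenant table; a real client would prepare each once and reuse it.
        if tenant not in self._statements:
            self._statements[tenant] = (
                f"SELECT order_id, customer, total FROM {self.table_for(tenant)} "
                "WHERE order_id = ?"
            )
        return self._statements[tenant]


if __name__ == "__main__":
    router = TenantTableRouter()
    print(router.select_order("acme"))
    print(router.select_order("globex"))
```

Every new tenant means a new table to create, a new statement to prepare and one more name for the application to route correctly.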
By Row
Pros:
- A scalable approach, as the per-table memory overhead is the same as for a single tenant.
Cons:
- Tenants must have compatible schemas.
- Tenants share both SSTables and Memtables, so isolation has to be enforced in the application. Active tenants can also reduce the effectiveness of the shared caches for other tenants and cause compaction issues that impact every tenant.
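For comparison, a minimal Python sketch of the by-row layout. The `shared_app.orders` table, its columns and the `TenantContext` helper are hypothetical. `tenant_id` is part of the partition key, and the only query the helper exposes always binds the tenant, which is the kind of application-enforced isolation described above.

```python
import uuid
from dataclasses import dataclass

# Hypothetical shared table: every tenant's rows live in the same table,
# so they share memtables, SSTables and caches.
SHARED_TABLE_DDL = (
    "CREATE TABLE IF NOT EXISTS shared_app.orders ("
    "tenant_id text, order_id uuid, customer text, total decimal, "
    "PRIMARY KEY ((tenant_id, order_id)));"
)

# The only read path offered to the application always restricts on tenant_id.
SELECT_ORDER = (
    "SELECT order_id, customer, total FROM shared_app.orders "
    "WHERE tenant_id = ? AND order_id = ?"
)


@dataclass(frozen=True)
class TenantContext:
    """Carries the caller's tenant so queries cannot be issued without it."""

    tenant_id: str

    def __post_init__(self) -> None:
        if not self.tenant_id:
            raise ValueError("tenant_id is required")

    def select_order(self, order_id: uuid.UUID) -> tuple[str, tuple[str, uuid.UUID]]:
        # Isolation is enforced here, in the application, not by Cassandra.
        return SELECT_ORDER, (self.tenant_id, order_id)


if __name__ == "__main__":
    ctx = TenantContext("acme")
    statement, values = ctx.select_order(uuid.uuid4())
    print(statement, values)
```

Putting `tenant_id` and `order_id` together in a composite partition key spreads each tenant's rows across the ring; using `tenant_id` alone as the partition key would put a whole tenant in a single partition, which grows very large for busy tenants.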
Using multiple logical datacenters has also been suggested as a way to isolate the workloads of different customers. But that feature is designed for different workloads (e.g. BAU (business-as-usual) and Analytics) on the same data. If you want to isolate by tenant (data), you'll lose the benefits of multi-tenancy, because you'll no longer be sharing the hardware.
You cannot win.
If you really want to cut the cost of unused hardware, I'd consider increasing the granularity of your instances (using smaller ones) so that less capacity is wasted.
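A back-of-the-envelope illustration, in Python, of why finer-grained instances waste less. Capacity is in arbitrary units (think vCPUs), the numbers are made up, and the `min_nodes` floor (for example three nodes for a replication factor of 3) is a simplifying assumption for illustration.

```python
def provisioning_waste(required: int, instance_size: int, min_nodes: int = 1) -> tuple[int, float]:
    """Return (instances to provision, fraction of provisioned capacity left idle).

    required and instance_size are in the same arbitrary capacity unit (e.g. vCPUs).
    min_nodes is an illustrative floor, e.g. 3 nodes for a replication factor of 3.
    """
    if required <= 0 or instance_size <= 0:
        raise ValueError("required and instance_size must be positive")
    # Integer ceiling division: the smallest count whose total capacity covers the requirement.
    count = max((required + instance_size - 1) // instance_size, min_nodes)
    provisioned = count * instance_size
    return count, (provisioned - required) / provisioned


if __name__ == "__main__":
    for size in (8, 4, 2):
        count, waste = provisioning_waste(10, size)
        print(f"{size}-unit instances: {count} needed, {waste:.0%} idle")
    count, waste = provisioning_waste(10, 8, min_nodes=3)
    print(f"8-unit instances with a 3-node floor: {count} needed, {waste:.0%} idle")
```

With a requirement of 10 units, 8-unit instances leave 37.5% of what you pay for idle (two instances, 16 units), 4-unit instances leave about 17% idle, and 2-unit instances leave none. Adding a three-node floor to the 8-unit case pushes the idle share to about 58%.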
What would you suggest as an alternative to using Cassandra in this situation?
Here is a presentation that uses a "by Partition" scheme for a multi tenant Cassandra database:
https://academy.datastax.com/resources/blackrock-multi-tenancy-in-cassandra-at-blackrock
I've been meaning to update this with the news that I have since deployed some multi-tenanted Cassandra clusters using the "by keyspace" approach (data isolation is very important to our tenants). However, I've only done this for tenants whose demands are so low that being the sole tenant of a cluster with a replication factor of 3 would seem unjustified. These shared, by-keyspace environments also have a strictly limited occupancy, dictated by our particular schema and our heap size.