String ordering in Cassandra CQL

When I query a text primary key in Cassandra CQL, string comparison works the opposite way from what I expect, i.e.:

cqlsh:test> select * from sl;

 name                     | data
--------------------------+------
 000000020000000000000003 | null
 000000010000000000000005 | null
 000000010000000000000003 | null
 000000010000000000000002 | null
 000000010000000000000001 | null

cqlsh:test> select name from sl where token(name) < token('000000010000000000000005');
name
--------------------------
 000000020000000000000003

(1 rows)

cqlsh:test> select name from sl where token(name) > token('000000010000000000000005');
 name
--------------------------
 000000010000000000000003
 000000010000000000000002
 000000010000000000000001

(3 rows)

By comparison, this is what I get from string comparison in Python (and, I believe, in most other languages):

>>>'000000020000000000000003' < '000000010000000000000005'
False

If I query without the token() function, I get the following error:

cqlsh:test> select name from sl where name < '000000010000000000000005';
Bad Request: Only EQ and IN relation are supported on the partition key (unless you use the token() function)

The table is described as follows:

CREATE TABLE sl (
  name text,
  data blob,
  PRIMARY KEY (name)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

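Is there something in the docs or elsewhere that I missed explaining why such a strange string comparison order was chosen? Or do the string comparison operators simply not do what I expect (i.e. they return some unrelated order, such as the order in which the rows were written to the database)? I am using the Murmur3Partitioner, in case it matters.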

2 Answers

3
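In Cassandra, rows are ordered by the hash of their key. With the Random and Murmur3 partitioners, the hash values are effectively random with respect to the key, so the order is A) not meaningful and B) designed to be distributed evenly around the ring.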

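So querying for tokens less than token('000000010000000000000005') does not compare against the string value "000000010000000000000005". It compares the hashed token values. Judging by the results you are seeing, the token of the string "000000020000000000000003" is less than the token of "000000010000000000000005".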

For more information, see this DataStax document: Paging Through Unordered Partitioner Results.
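To make the difference between the two orderings concrete, here is a minimal Python sketch. It assumes a hypothetical `stand_in_token` function that derives an integer from MD5; this is only a stand-in for a hashing partitioner and does not reproduce the token values that Murmur3Partitioner actually computes. It only illustrates that sorting keys by a hash is unrelated to sorting them as strings.

```python
import hashlib


def stand_in_token(key: str) -> int:
    """Hypothetical stand-in for a partitioner token.

    Derives a signed integer from the MD5 digest of the key. This is NOT
    the value Cassandra's Murmur3Partitioner would produce.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big", signed=True)


names = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
    "000000010000000000000005",
    "000000020000000000000003",
]

# Ordering by the string value itself (what the question expected).
lexical_order = sorted(names)

# Ordering by the hash of the key (analogous to what token() compares).
token_order = sorted(names, key=stand_in_token)

print("lexical order:")
for name in lexical_order:
    print(" ", name)

print("stand-in token order:")
for name in token_order:
    print(" ", name, stand_in_token(name))
```

The hash is deterministic, so the same key always maps to the same token, but nearby strings can land anywhere in the token range.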

Assuming you want to be able to query your data by the value of "name", you could build a table like this:
```
CREATE TABLE sl (
  type text,
  name text,
  data blob,
  PRIMARY KEY (type, name)
);
```
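I created type as the partition key, more for the sake of example than anything else. Either way, with name as the clustering key (which determines the on-disk sort order), this query will work: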
```
select * from sl where type='sometype' AND name < '000000010000000000000005';
```
It's just an example, but I hope it helps point you in the right direction.
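As a rough mental model of why the clustering-key query works, the following Python sketch keeps the names inside each partition in sorted order and answers a "less than" query with a binary search. The `PartitionedTable` class and its methods are hypothetical and purely illustrative; this is not how Cassandra is implemented, and it assumes plain ASCII names, for which Python's string ordering matches byte-wise ordering.

```python
from bisect import bisect_left
from collections import defaultdict


class PartitionedTable:
    """Toy model: partition key -> clustering keys kept in ascending order."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, type_, name):
        rows = self._partitions[type_]
        idx = bisect_left(rows, name)
        # Keep the list sorted and skip duplicates (primary keys are unique).
        if idx == len(rows) or rows[idx] != name:
            rows.insert(idx, name)

    def names_less_than(self, type_, upper):
        """Analogue of: WHERE type = ? AND name < ?"""
        rows = self._partitions.get(type_, [])
        return rows[:bisect_left(rows, upper)]


table = PartitionedTable()
for name in [
    "000000020000000000000003",
    "000000010000000000000005",
    "000000010000000000000003",
    "000000010000000000000002",
    "000000010000000000000001",
]:
    table.insert("sometype", name)

result = table.names_less_than("sometype", "000000010000000000000005")
print(result)
```

Because the rows within one partition are sorted by name, the range comparison is applied to the string values themselves rather than to hashed tokens.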
Answered 2024-05-03T18:16:27+08:00
3
Here are some links to documentation about the token function and related paging. Apologies for the broad range of topics; I don't know exactly which ones might help:
- http://www.datastax.com/documentation/cql/3.1/cql/cql_using/paging_c.html Paging Through Unordered Partitioner Results implies that using Murmur3Partitioner does matter.
- http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results This section says that paging with RandomPartitioner will not give you meaningful results. In this respect, RandomPartitioner behaves the same as Murmur3Partitioner; the documentation should mention both.
- http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 See automatic paging.
- http://datastax.github.io/python-driver/query_paging.html
- http://www.datastax.com/drivers/java/2.0/index.html See ResultSet.
Answered 2024-05-03T18:16:27+08:00
