Cassandra CQL3: selecting row keys from a table with a composite primary key

I'm using Cassandra 1.2.7 and the official Java driver, which uses CQL3.

Suppose a table is created with

CREATE TABLE foo ( 
    row int, 
    column int, 
    txt text, 
    PRIMARY KEY (row, column)
);

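Then I'd like to perform the equivalent of SELECT DISTINCT row FROM foo

As far as I understand, it should be possible to execute this query efficiently within Cassandra's data model (given the way composite primary keys are implemented), since it would just query the 'raw' table.

I searched the CQL documentation but didn't find any option to do this.

My backup plan is to create a separate table, something like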

CREATE TABLE foo_rows (
    row int,
    PRIMARY KEY (row)
);

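But this brings the hassle of keeping the two in sync: every write to foo also needs a write to foo_rows (which is a performance penalty as well).

So is there a way to query the distinct row (partition) keys?

3 Answers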

7
According to the documentation, as of CQL version 3.1.1 Cassandra understands the DISTINCT modifier, which lists partition keys. So you can now write
```
SELECT DISTINCT row FROM foo
```
Answered 2024-04-29T09:24:04+08:00
4
Let me first show you a bad way to do it. If you insert these rows:
```
insert into foo (row,column,txt) values (1,1,'First Insert');
insert into foo (row,column,txt) values (1,2,'Second Insert');
insert into foo (row,column,txt) values (2,1,'First Insert');
insert into foo (row,column,txt) values (2,2,'Second Insert');
```
then running
```
select row from foo;
```
gives you the following:
```
row
-----
   1
   1
   2
   2
```
Not distinct, since it shows every combination of row and column. To get each row value only once, you can add a column value to the query:
```
select row from foo where column = 1;
```
But you'll get this warning, reported as a Bad Request error:
```
Bad Request: Cannot execute this query as it might involve data filtering and thus may  have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
```
OK. Then use this:
```
select row from foo where column = 1 ALLOW FILTERING;

 row
-----
   1
   2
```
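Great. Just what I wanted. But let's not ignore that warning. If you only have a small number of rows, say 10,000, this will work without a big performance hit. What if I have 1 billion? Depending on the number of nodes and the replication factor, your performance will take a serious hit. First, the query has to scan every possible row in the table (read: a full table scan) and then filter the result set for unique values. In some cases this query will simply time out. Note also that it only returns row keys that happen to have an entry with column = 1. Given all that, it's probably not what you're looking for.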
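
To make the cost concrete, here is a toy model of the filtering query in Python. It is only a sketch: an in-memory list stands in for the table, `filter_by_column` is a made-up helper, and the fifth row is a hypothetical addition that is not part of the inserts above. It shows that every cell gets examined regardless of how many match, and that a row key with no `column = 1` entry is silently missed.

```python
# Toy model of "select row from foo where column = 1 ALLOW FILTERING".
# An in-memory list stands in for the table; this is not how Cassandra stores data.
table = [
    (1, 1, "First Insert"),
    (1, 2, "Second Insert"),
    (2, 1, "First Insert"),
    (2, 2, "Second Insert"),
    (3, 2, "Only column 2"),  # hypothetical extra row, not in the original example
]


def filter_by_column(rows, wanted_column):
    """Return (matching row keys, number of cells examined)."""
    examined = 0
    keys = []
    for row_key, column, _txt in rows:
        examined += 1  # every cell is visited, whether it matches or not
        if column == wanted_column:
            keys.append(row_key)
    return keys, examined


keys, examined = filter_by_column(table, 1)
print(keys)      # row keys that have a column = 1 entry
print(examined)  # equals len(table): the whole table was scanned
```

The work grows with the total number of cells in the table, not with the number of distinct row keys.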

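You mentioned you're worried about the performance hit of inserting into multiple tables. Multi-table inserts are a perfectly valid data modeling technique. Cassandra can handle an enormous volume of writes. As for the pain of keeping the tables in sync, I don't know your exact application, but I can give some general tips.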
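
As a rough illustration of the multi-table write, here is a Python sketch that builds a single CQL batch covering both `foo` and the `foo_rows` table from the question. `dual_write_batch` is a hypothetical helper, and it assumes your driver accepts `?` bind markers inside a prepared batch; how you actually send it depends on the driver you use.

```python
def dual_write_batch(row, column, txt):
    """Build one CQL batch plus its bind values for a write to foo and foo_rows.

    Hypothetical helper: it only assembles text and parameters and does not
    talk to a cluster.
    """
    statement = (
        "BEGIN BATCH\n"
        "  INSERT INTO foo (row, column, txt) VALUES (?, ?, ?);\n"
        "  INSERT INTO foo_rows (row) VALUES (?);\n"
        "APPLY BATCH;"
    )
    # The row key is bound twice: once for each table.
    return statement, (row, column, txt, row)


statement, params = dual_write_batch(1, 2, "Second Insert")
print(statement)
print(params)
```

Since an INSERT in Cassandra is an upsert, writing the same key to `foo_rows` again on every write to `foo` simply overwrites the existing entry, so there is no need to check whether the key is already there.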

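If you need to do a distinct scan, you need to think about a partitioning column. This is what we'd call an index or lookup table. The important thing to consider in any Cassandra data model is the queries your application will run. If I were using IP addresses as the row, I might create something like this to scan all the IP addresses I have, in order.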
```
CREATE TABLE ip_addresses (
 first_quad int,
 last_quads ascii,
 PRIMARY KEY (first_quad, last_quads)
);
```
Now, to insert some rows in my 192.x.x.x address space:
```
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000000001');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000000002');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000001001');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000001255');
```
To get the distinct rows in the 192 space, I do this:
```
SELECT * FROM ip_addresses WHERE first_quad = 192;

 first_quad | last_quads
------------+------------
        192 |  000000001
        192 |  000000002
        192 |  000001001
        192 |  000001255
```
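To get every address, you would just iterate over every possible row key from 0 to 255. In my example, I'd expect the application to ask for a specific range to keep performance high. Your application may have different needs, but hopefully you can see the pattern here.
Answered 2024-04-29T09:24:04+08:00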
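
The iteration over row keys could be sketched like this in Python. `scan_all_addresses` and `fake_execute` are hypothetical names, and `execute` is assumed to be any callable taking a query string and bind values and returning rows; substitute whatever your driver session provides.

```python
def scan_all_addresses(execute):
    """Yield (first_quad, last_quads) rows by querying each partition in turn.

    `execute` is a hypothetical callable (query, params) -> iterable of rows,
    standing in for a real driver session.
    """
    query = "SELECT first_quad, last_quads FROM ip_addresses WHERE first_quad = ?"
    for first_quad in range(256):  # 0..255 inclusive, one query per partition
        for row in execute(query, (first_quad,)):
            yield row


# Fake backend for demonstration: only partition 192 holds data.
fake_data = {192: ["000000001", "000000002", "000001001", "000001255"]}


def fake_execute(query, params):
    quad = params[0]
    return [(quad, last) for last in fake_data.get(quad, [])]


addresses = list(scan_all_addresses(fake_execute))
print(len(addresses))
```

Each query touches a single partition, so the cost per request stays bounded, at the price of 256 round trips for a full scan.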
0
@edofic

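The partition row key is used as a unique index to distinguish the different rows in the storage engine, so by nature row keys are always distinct. You don't need to put DISTINCT in the SELECT clause.

Example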
```
INSERT INTO foo(row,column,txt) VALUES (1,1,'1-1');
INSERT INTO foo(row,column,txt) VALUES (2,1,'2-1');
INSERT INTO foo(row,column,txt) VALUES (1,2,'1-2');
```
Then
```
SELECT row FROM foo
```
will return 2 values: 1 and 2

Here is how Cassandra persists it:

| Row key | column1 / value | column2 / value |
|---|---|---|
| 1 | 1 / '1' | 2 / '2' |
| 2 | 1 / '1' | |
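
The layout above can be modeled, very loosely, as a map from row key to a map of columns. The Python sketch below is only that toy model: the names `storage` and `insert` are made up, and a real cluster orders partitions by token rather than by insertion order. In the model the partition keys are unique by construction, whereas a per-cell listing repeats a key once per column, as in the `select row from foo` output shown in the earlier answer.

```python
# Toy model: partition (row) key -> {column key -> value}.
storage = {}


def insert(row, column, txt):
    """Upsert one cell into the partition identified by `row`."""
    storage.setdefault(row, {})[column] = txt


insert(1, 1, "1-1")
insert(2, 1, "2-1")
insert(1, 2, "1-2")

# Keys of the outer map: one entry per partition.
partition_keys = list(storage)

# One entry per (row, column) cell, the way a flattened result set would list them.
per_cell_rows = [row for row, cols in storage.items() for _ in sorted(cols)]

print(partition_keys)  # [1, 2]
print(per_cell_rows)   # [1, 1, 2]
```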
Answered 2024-04-29T09:24:04+08:00
