Using the Cassandra 'com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex' index in CQL queries

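com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex is a custom Cassandra index type that Datastax introduced for its Solr integration. My main question is how to use it in CQL queries. I have already tried some CQL queries with filters on indexed columns, but they always end with an RPC timeout.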

My use case:

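I have a table where queries usually involve filters on multiple columns. Since Cassandra's native secondary indexes can only be defined on one column at a time (i.e. one index = one column) and only one index can be used by any given CQL query, I figured that I can't fulfill my application's read requirements using CQL. This is why I resorted to Solr for ALL read operations: Solr can filter on multiple columns at once. This works fine for most cases, BUT I have two queries that turned out to be too heavy for Solr.

Now I want to try Spark because I've read about its impressive analytics capabilities. However, I have hit a blocker: Spark relies on the CQL "WHERE" clause to filter the data that will be loaded from Cassandra into Spark. Since my CQL queries don't seem to work, I don't know how to load my data into Spark. I know that filtering on the Cassandra server side is not mandatory when loading data from Cassandra into Spark, but in my case it is required because the table is too large (about 4 billion records at RF = 2, distributed across 6 nodes). I tried to define a native Cassandra index on one of the columns I intend to filter on, but Cassandra threw an error saying that an index already exists for that column (i.e. the Cql3SolrSecondaryIndex index).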

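As I see it now, DSE is forcing me to choose between Solr and Spark. If I include a column in a Solr core, a Cql3SolrSecondaryIndex index is defined on that column and I can no longer define a native Cassandra index on it. Without a native Cassandra index, CQL queries cannot filter on that column. Without server-side CQL filtering, Spark will choke trying to load all 4 billion rows and will probably trigger an OOM.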

Is my impression correct? Is there a workaround?

1 Answer

0

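You can use the Solr index from CQL by issuing a CQL solr_query. This is not recommended for production use (stick with the HTTP API for that), but in your case it may be the best option.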

The syntax is as follows:

SELECT ...
FROM ...
WHERE solr_query = 'search expression'
[LIMIT ...]

Your search expression should follow Lucene syntax.
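As a minimal sketch of how a multi-column filter could be folded into one such statement, the helper below builds the Lucene expression and wraps it in the CQL shown above. The function names, the keyspace `my_ks`, the table `events`, and the columns `country` and `status` are all hypothetical and only illustrate the shape of the query. The helper only produces a string; running it against a cluster is left to whichever driver you use.

```python
def lucene_term(field, value):
    """Render one field:"value" clause, quoting the value as a Lucene phrase."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_solr_cql(keyspace, table, filters, limit=None):
    """Build a CQL statement that filters through solr_query.

    `filters` maps column names to the values to match. The clauses are
    joined with AND, so several columns are filtered in a single query.
    All names passed in here are assumptions made for illustration.
    """
    expression = " AND ".join(lucene_term(f, v) for f, v in filters.items())
    # Single quotes inside a CQL string literal are escaped by doubling them.
    cql_literal = expression.replace("'", "''")
    statement = (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE solr_query = '{cql_literal}'"
    )
    if limit is not None:
        statement += f" LIMIT {int(limit)}"
    return statement


if __name__ == "__main__":
    # Hypothetical keyspace, table, and columns.
    print(build_solr_cql("my_ks", "events",
                         {"country": "PH", "status": "active"}, limit=100))
```

Joining the clauses with AND keeps all of the column filters inside the single solr_query string, which is what lets one CQL statement filter on several Solr-indexed columns at once.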

Here is the link to the Datastax documentation: http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/srch/srchCql.html

Answered on 2024-04-20T08:40:07+08:00
