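I am struggling to tune Spark and Cassandra. I have 10 million rows in Cassandra, and I am running operations such as reads from Spark/beeline using the spark-cassandra-connector, but they take 15-20 minutes. I have 4 Cassandra nodes and 3 Spark nodes. Here are my Cassandra and Spark configurations.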

Cassandra:

listen_address: 192.168.xx.xx
rpc_address: 192.168.xx.xx
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: true
start_rpc: true
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
batch_size_warn_threshold_in_kb: 100
batch_size_fail_threshold_in_kb: 1000
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
request_timeout_in_ms: 300000
range_request_timeout_in_ms: 360000

Spark:

spark.master spark://master:7077
spark.cassandra.connection.host 192.168.xx.xx,192.168.xx.xx,192.168.xx.xx,192.168.xx.xx
spark.cassandra.connection.port 9042
spark.cassandra.auth.username cassandra
spark.cassandra.auth.password cassandra
spark.driver.memory 5g
spark.executor.memory 6g
spark.cassandra.input.consistency.level QUORUM
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.cassandra.input.split.size_in_mb 128
spark.cassandra.input.fetch.size_in_rows 10000
spark.sql.qubole.split.computation true
spark.sql.inmemorycolumnarstorage.compressed true