I'm trying to connect my RStudio Server to my DSE Analytics cluster.

Setup:

  • CentOS 7

  • openjdk-1.8

  • RStudio Server v1.0.136 (latest sparklyr via > devtools::install_github("rstudio/sparklyr"))

  • DSE 5.0 (Spark 1.6.2)

  • 5 DSE Analytics nodes in one DC of the cluster (shared with another DC for OLTP)

  • RStudio Server running standalone from DSE Analytics (a VM)

Since, unlike in the sparklyr tutorial, I'm bringing my own (DSE's) Spark, SPARK_HOME is not set, and neither is JAVA_HOME. So:

> Sys.setenv(JAVA_HOME = '/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64')  
> Sys.setenv(SPARK_HOME = '/usr/share/dse/spark/')
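Before connecting, it may be worth sanity-checking from the R session that sparklyr will find what it needs under that SPARK_HOME. A small sketch (the glob pattern assumes a stock Spark 1.6 layout, which DSE may well not follow):

```r
# Check that the submit script sparklyr will call actually exists
file.exists(file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit"))

# Stock Spark 1.6 ships an assembly jar under lib/; the bundled spark-submit
# refuses to run without it. An empty result here would be consistent with
# the "Failed to find Spark assembly" error in the output log.
Sys.glob(file.path(Sys.getenv("SPARK_HOME"), "lib", "spark-assembly*.jar"))
```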

My config.yml (based on an example found here):

spark.cassandra.connection.host: <IP of one node>
spark.cassandra.auth.username: cassandra
spark.cassandra.auth.password: <PW>

sparklyr.defaultPackages:
- com.databricks:spark-csv_2.11:1.3.0
- com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1
- com.datastax.cassandra:cassandra-driver-core:3.0.2

My session info:

> devtools::session_info()
Session info --------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, linux-gnu           
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Mexico_City         
 date     2017-02-02                  

Packages ----------------------------------------
 package    * version    date       source                           
 assertthat   0.1        2013-12-06 CRAN (R 3.3.2)                   
 backports    1.0.5      2017-01-18 CRAN (R 3.3.2)                   
 base64enc    0.1-3      2015-07-28 CRAN (R 3.3.2)                   
 config       0.2        2016-08-02 CRAN (R 3.3.2)                   
 curl         2.3        2016-11-24 CRAN (R 3.3.2)                   
 DBI          0.5-1      2016-09-10 CRAN (R 3.3.2)                   
 devtools     1.12.0     2016-12-05 CRAN (R 3.3.2)                   
 digest       0.6.12     2017-01-27 CRAN (R 3.3.2)                   
 dplyr        0.5.0      2016-06-24 CRAN (R 3.3.2)                   
 git2r        0.18.0     2017-01-01 CRAN (R 3.3.2)                   
 htmltools    0.3.5      2016-03-21 cran (@0.3.5)                    
 httpuv       1.3.3      2015-08-04 cran (@1.3.3)                    
 httr         1.2.1      2016-07-03 CRAN (R 3.3.2)                   
 jsonlite     1.2        2016-12-31 CRAN (R 3.3.2)                   
 magrittr     1.5        2014-11-22 CRAN (R 3.3.2)                   
 memoise      1.0.0      2016-01-29 CRAN (R 3.3.2)                   
 mime         0.5        2016-07-07 CRAN (R 3.3.2)                   
 packrat      0.4.8-1    2016-09-07 CRAN (R 3.3.2)                   
 R6           2.2.0      2016-10-05 CRAN (R 3.3.2)                   
 Rcpp         0.12.9     2017-01-14 CRAN (R 3.3.2)                   
 rprojroot    1.2        2017-01-16 CRAN (R 3.3.2)                   
 rstudioapi   0.6        2016-06-27 CRAN (R 3.3.2)                   
 shiny        1.0.0      2017-01-12 cran (@1.0.0)                    
 sparklyr   * 0.5.3-9000 2017-02-02 Github (rstudio/sparklyr@bd4aee0)
 tibble       1.2        2016-08-26 CRAN (R 3.3.2)                   
 withr        1.0.2      2016-06-20 CRAN (R 3.3.2)                   
 xtable       1.8-2      2016-02-05 cran (@1.8-2)                    
 yaml         2.1.14     2016-11-12 CRAN (R 3.3.2)

Now, when I try to create the Spark context, this is what I get:

> sc <- spark_connect(master = "spark://<IP of one node>", config = spark_config(file = "config.yml"), version = "1.6.2")  
Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (646): Gateway in port (8880) did not respond.
    Path: /usr/share/dse/spark/bin/spark-submit
    Parameters: --class, sparklyr.Backend, --jars, '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/spark-csv_2.11-1.3.0.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/commons-csv-1.1.jar','/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/univocity-parsers-1.5.1.jar', '/home/emiliano/rprojects/sparklyr_test/packrat/lib/x86_64-redhat-linux-gnu/3.3.2/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 646


---- Output Log ----
Failed to find Spark assembly in /usr/share/dse/spark/lib.
You need to build Spark before running this program.

---- Error Log ----

From this output, my guess is that sparklyr doesn't recognize DSE Analytics' Spark. As I understand it, DSE's Spark is deeply integrated with Cassandra and its connector, and it even has its own dse spark-submit. I'm probably passing the wrong configuration to sparklyr; I'm just at a loss as to what to pass it. Any help is welcome. Thanks.
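If DSE's layout really is the problem, one thing I considered trying is pointing sparklyr at DSE's own submit wrapper instead of the bundled bin/spark-submit. I believe sparklyr reads a sparklyr.spark-submit entry from spark_config() for this, but I haven't confirmed that against my version, so treat this as a sketch:

```r
# Assumption: sparklyr honours a "sparklyr.spark-submit" config entry to
# override the submit binary (worth checking the sparklyr sources for the
# installed version before relying on it).
conf <- spark_config(file = "config.yml")
conf[["sparklyr.spark-submit"]] <- "dse spark-submit"  # DSE's own wrapper

sc <- spark_connect(master = "spark://<IP of one node>",
                    config = conf,
                    version = "1.6.2")
```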

EDIT: Apparently I get the same error with > sc <- spark_connect(master = "local")