
Azure Data Lake Store as an EXTERNAL TABLE in Databricks


How do I create an EXTERNAL TABLE in Azure Databricks that reads from Azure Data Lake Store? I am having trouble telling from the documentation whether this is even possible. I have a set of CSV files in a specific folder in Azure Data Lake Store, and I would like to run a CREATE EXTERNAL TABLE in Azure Databricks that points to those CSV files.

2 Answers

  • 0

    You can mount Azure Data Lake Store (ADLS) into Azure Databricks DBFS (requires runtime 4.0 or higher):

        # Get Azure Data Lake Store credentials from the secret store
        clientid = dbutils.preview.secret.get(scope = "adls", key = "clientid")
        credential = dbutils.preview.secret.get(scope = "adls", key = "credential")
        refreshurl = dbutils.preview.secret.get(scope = "adls", key = "refreshurl")
        accounturl = dbutils.preview.secret.get(scope = "adls", key = "accounturl")

        # Mount the ADLS
        configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
                   "dfs.adls.oauth2.client.id": clientid,
                   "dfs.adls.oauth2.credential": credential,
                   "dfs.adls.oauth2.refresh.url": refreshurl}

        dbutils.fs.mount(
            source = accounturl,
            mount_point = "/mnt/adls",
            extra_configs = configs)
    
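    If the mount succeeds, a quick way to verify it is to list the mounted folder. This is a small illustrative check, not part of the original answer; the productscsv directory is the example folder referenced below:

        # List the mounted ADLS folder to confirm the CSV files are visible
        display(dbutils.fs.ls("/mnt/adls/productscsv/"))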

    Table creation works the same way as with DBFS. Just reference the mountpoint together with the directory in ADLS, e.g.:

        %sql
        CREATE TABLE product
        USING CSV
        OPTIONS (header "true", inferSchema "true")
        LOCATION "/mnt/adls/productscsv/"
    

    The LOCATION clause automatically implies EXTERNAL. See also the Azure Databricks Documentation.
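
    To confirm that the table really was created as external, you can inspect its metadata. This is a minimal check, not part of the original answer; the Type row of the detailed output should report EXTERNAL:

        %sql
        -- The Type row in the detailed table information should show EXTERNAL
        DESCRIBE TABLE EXTENDED product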

  • 2

    You should take a look at this link: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html

    Accessing Azure Data Lake Store using the Spark API: to read from your Data Lake Store account, you can configure Spark to use service credentials with the following snippet in your notebook:

        spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
        spark.conf.set("dfs.adls.oauth2.client.id", "{YOUR SERVICE CLIENT ID}")
        spark.conf.set("dfs.adls.oauth2.credential", "{YOUR SERVICE CREDENTIALS}")
        spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/{YOUR DIRECTORY ID}/oauth2/token")
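
    With those configuration values set, the CSV folder can be read directly through its adl:// URI. A hedged sketch; the account name and path below are placeholders for illustration, not values from the question:

        # Read the CSV files straight from ADLS (no mount required); adjust account name and path
        df = spark.read.csv("adl://{YOUR ACCOUNT NAME}.azuredatalakestore.net/productscsv/",
                            header=True, inferSchema=True)
        display(df)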

    It doesn't mention using an external table, though.
