
How to fetch data from Azure DocumentDB faster


I am trying to implement this sample:

https://github.com/Azure/azure-documentdb-python/blob/master/samples/DatabaseManagement/Program.py

to fetch data from Azure DocumentDB and do some visualization. However, I would like to use a query on the line marked #error here.

def read_database(client, id):
    print('3. Read a database by id')

    try:

        db = next((data for data in client.ReadDatabases() if data['id'] == database_id))
        coll = next((coll for coll in client.ReadCollections(db['_self']) if coll['id'] == database_collection))
        return list(itertools.islice(client.ReadDocuments(coll['_self']), 0, 100, 1))

    except errors.DocumentDBError as e:
        if e.status_code == 404:
            print('A Database with id \'{0}\' does not exist'.format(id))
        else:
            raise errors.HTTPFailure(e.status_code)

When I want to fetch more than 10k items, the retrieval is really slow. How can I improve it?

Thanks!

1 Answer

  • 0

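    You cannot query documents directly through the database entity.

    The ReadDocuments() method used in your code takes a collection link and query (feed) options as its parameters.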

    def ReadDocuments(self, collection_link, feed_options=None):
        """Reads all documents in a collection.
    
        :Parameters:
            - `collection_link`: str, the link to the document collection.
            - `feed_options`: dict
    
        :Returns:
            query_iterable.QueryIterable
    
        """
        if feed_options is None:
            feed_options = {}
    
        return self.QueryDocuments(collection_link, None, feed_options)
    

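    So you can modify your code as follows:

    import pydocumentdb.document_client as document_client
    import pydocumentdb.errors as errors

    # Initialize the Python DocumentDB client.
    # `config` is assumed to be a dict holding your account endpoint and master key.
    client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})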
    
    db = "db"
    coll = "coll"
    
    try:
        database_link = 'dbs/' + db
        database = client.ReadDatabase(database_link)
    
        collection_link = 'dbs/' + db + "/colls/" + coll
        collection = client.ReadCollection(collection_link)
    
        # options = {}
        # options['enableCrossPartitionQuery'] = True
        # options['partitionKey'] = 'jay'
        docs = client.ReadDocuments(collection_link)
        print(list(docs))
    
    except errors.DocumentDBError as e:
        if e.status_code == 404:
            # Either the database or the collection link could not be resolved.
            print('Database \'{0}\' or collection \'{1}\' does not exist'.format(db, coll))
        else:
            raise errors.HTTPFailure(e.status_code)
    

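    If you want to query a partitioned collection, add the snippet that is commented out in the code above and pass the options to ReadDocuments().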

    options = {}
    options['enableCrossPartitionQuery'] = True
    options['partitionKey'] = 'jay'
    docs = client.ReadDocuments(collection_link, options)
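
For reads of more than 10k documents, the same feed options dictionary can also carry a page size, which may cut down the number of round trips. This is only a minimal sketch: it assumes the `maxItemCount` feed option and the `fetch_next_block()` method of the returned iterable behave this way in your pydocumentdb version. `read_in_pages`, `page_size` and `limit` are hypothetical names, and `client` is any object exposing `ReadDocuments` as quoted above.

```python
def read_in_pages(client, collection_link, page_size=1000, limit=None):
    """Yield documents page by page instead of building one big list.

    Assumptions (check against your pydocumentdb version):
    - the 'maxItemCount' feed option requests up to `page_size`
      documents per round trip;
    - fetch_next_block() returns the next page as a list and an
      empty list once the feed is exhausted.
    """
    options = {'maxItemCount': page_size}
    iterable = client.ReadDocuments(collection_link, options)

    yielded = 0
    while limit is None or yielded < limit:
        block = iterable.fetch_next_block()
        if not block:
            break
        for doc in block:
            yield doc
            yielded += 1
            if limit is not None and yielded >= limit:
                return
```

Because this is a generator, the visualization code can start consuming documents while later pages are still being fetched, for example `for doc in read_in_pages(client, collection_link, limit=10000): ...`.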
    

    Your question appears to be mainly about Azure Cosmos DB query performance.

    You can consider the following points to improve query performance.

    Partitioning

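    You can define a partition key and query with a filter clause on a single partition key value, so that the query has lower latency and consumes fewer RUs.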
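
A single-partition query could look roughly like the sketch below. It assumes the collection is partitioned on a top-level field and that `client.QueryDocuments` accepts a parameterized query dict plus a `partitionKey` feed option. `query_one_partition`, `pk_field` and `pk_value` are hypothetical names introduced for illustration.

```python
def query_one_partition(client, collection_link, pk_field, pk_value):
    """Query only the documents that live under one partition key value.

    `pk_field` is assumed to be the trusted name of the top-level field
    the collection is partitioned on (it is interpolated into the query
    text, so never take it from user input). The value is passed as a
    query parameter.
    """
    query = {
        'query': 'SELECT * FROM c WHERE c.{0} = @pk'.format(pk_field),
        'parameters': [{'name': '@pk', 'value': pk_value}],
    }
    # Scoping the request to one partition avoids a cross-partition fan-out.
    options = {'partitionKey': pk_value}
    return list(client.QueryDocuments(collection_link, query, options))
```

With the values from the snippet above, the call would be `query_one_partition(client, collection_link, 'name', 'jay')`, where `'name'` stands in for whatever field your collection is actually partitioned on.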

    Throughput

    You can provision a higher throughput so that Azure Cosmos DB can serve more requests per unit of time. Of course, this also means a higher cost.
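
Throughput can also be changed from code. The sketch below assumes the collection's offer can be found with `client.QueryOffers` by its `_self` link and that the offer stores its value under `content.offerThroughput`, which may differ for older offer versions. `set_throughput` is a hypothetical helper name.

```python
def set_throughput(client, collection_self_link, new_throughput):
    """Replace the offer of a collection with a new RU/s value.

    `collection_self_link` is the '_self' property of the collection,
    e.g. client.ReadCollection(collection_link)['_self'].
    """
    query = "SELECT * FROM c WHERE c.resource = '{0}'".format(collection_self_link)
    offers = list(client.QueryOffers(query))
    if not offers:
        raise ValueError('No offer found for {0}'.format(collection_self_link))

    offer = offers[0]
    # Assumption: the offer exposes its RU/s setting under content.offerThroughput.
    offer['content']['offerThroughput'] = new_throughput
    return client.ReplaceOffer(offer['_self'], offer)
```

Raising throughput only for the duration of a large export and lowering it afterwards is one way to keep the extra cost bounded.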

    Indexing Policy

    Using index paths can improve performance and lower latency.

    For details, I recommend the official performance documentation.

    Hope it helps.
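
As an illustration only, a collection definition with explicit index paths might be built as in the sketch below. It assumes the `includedPaths` / `excludedPaths` layout of the indexing policy and that the resulting dict would be passed to `client.CreateCollection(database_link, definition)`. `collection_definition` and the example paths are hypothetical, so check the exact policy format against the documentation for your API version.

```python
def collection_definition(collection_id, indexed_paths):
    """Build a collection definition that indexes only the given paths.

    Assumption: every path not listed in `indexed_paths` is excluded
    through the '/*' wildcard, so the fields used in query filters
    must be included explicitly.
    """
    return {
        'id': collection_id,
        'indexingPolicy': {
            'indexingMode': 'consistent',
            'includedPaths': [{'path': path} for path in indexed_paths],
            'excludedPaths': [{'path': '/*'}],
        },
    }
```

For example, `collection_definition('coll', ['/name/?', '/timestamp/?'])` would keep the two filtered fields indexed while skipping the rest of each document.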
