
Cassandra: how to move an entire table to another keyspace

My Cassandra version info:

[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]

I am trying to move some huge tables (several million rows) to another keyspace. Is there a better solution than "COPY TO csv, COPY FROM csv"?

1 Answer


    OK, I managed to get this working on a single-node cluster running 2.2.8.

    I experimented by moving the holidays table from my presentation keyspace to my stackoverflow keyspace.

    Here are the steps I took:

    Create the table inside the new keyspace.

    This step is important, because every table has a UUID that uniquely identifies it, stored in the cf_id column of the system.schema_columnfamilies table. That ID is appended to the name of the directory that holds the table's data. By copying/pasting the schema from one keyspace to the other, you ensure the same column names are used, but a new unique identifier gets generated.

    Note: in 3.x, the identifier is stored in the system_schema.tables table.
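
    If you want to see the IDs for yourself, you can query the schema table directly from cqlsh (a sketch; substitute your own keyspace and table names):

```
aploetz@cqlsh> SELECT cf_id FROM system.schema_columnfamilies
           ... WHERE keyspace_name = 'stackoverflow'
           ... AND columnfamily_name = 'holidays';
```

    Running this against both keyspaces shows you the two different UUIDs, which is how you match each table to its on-disk directory.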

    aploetz@cqlsh:stackoverflow> CREATE TABLE holidays (
     type text,
     eventtime timestamp,
     beginend text,
     name text,
     PRIMARY KEY (type, eventtime, beginend)
    ) WITH CLUSTERING ORDER BY (eventtime DESC, beginend DESC);
    
    aploetz@cqlsh:stackoverflow> SELECT * FROM stackoverflow.holidays ;
    
     type | eventtime | beginend | name
    ------+-----------+----------+------
    
    (0 rows)
    

    Stop your node(s) properly (DISABLEGOSSIP, DRAIN, kill/stop, etc...).
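
    On a packaged install, that shutdown sequence might look like the following (the nodetool subcommands are standard; the exact stop command depends on how Cassandra was installed, so `service cassandra stop` here is an assumption):

```
$ nodetool disablegossip
$ nodetool drain
$ sudo service cassandra stop
```

    Draining flushes memtables to SSTables on disk, which matters here because you want all of the table's data in the files you are about to copy.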

    Now, find the location of the old and new table on disk, and copy/move the files from the old location into the new one:

    $ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
    total 12
    drwxrwxr-x  3 aploetz aploetz 4096 Nov  8 14:25 .
    drwxrwxr-x 17 aploetz aploetz 4096 Nov  8 14:25 ..
    drwxrwxr-x  2 aploetz aploetz 4096 Nov  8 14:25 backups
    $ cp /var/lib/cassandra/data22/presentation/holidays-74bcfde0139011e6a67c2575e6398503/la* /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
    
    $ ls -al /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f/
    drwxrwxr-x  3 aploetz aploetz 4096 Nov  8 14:26 .
    drwxrwxr-x 17 aploetz aploetz 4096 Nov  8 14:25 ..
    drwxrwxr-x  2 aploetz aploetz 4096 Nov  8 14:25 backups
    -rw-rw-r--  1 aploetz aploetz   43 Nov  8 14:26 la-1-big-CompressionInfo.db
    -rw-rw-r--  1 aploetz aploetz  628 Nov  8 14:26 la-1-big-Data.db
    -rw-rw-r--  1 aploetz aploetz    9 Nov  8 14:26 la-1-big-Digest.adler32
    -rw-rw-r--  1 aploetz aploetz   16 Nov  8 14:26 la-1-big-Filter.db
    -rw-rw-r--  1 aploetz aploetz   57 Nov  8 14:26 la-1-big-Index.db
    -rw-rw-r--  1 aploetz aploetz 4468 Nov  8 14:26 la-1-big-Statistics.db
    -rw-rw-r--  1 aploetz aploetz   94 Nov  8 14:26 la-1-big-Summary.db
    -rw-rw-r--  1 aploetz aploetz   94 Nov  8 14:26 la-1-big-TOC.txt
    -rw-rw-r--  1 aploetz aploetz   43 Nov  8 14:26 la-2-big-CompressionInfo.db
    -rw-rw-r--  1 aploetz aploetz  164 Nov  8 14:26 la-2-big-Data.db
    -rw-rw-r--  1 aploetz aploetz   10 Nov  8 14:26 la-2-big-Digest.adler32
    -rw-rw-r--  1 aploetz aploetz   16 Nov  8 14:26 la-2-big-Filter.db
    -rw-rw-r--  1 aploetz aploetz   26 Nov  8 14:26 la-2-big-Index.db
    -rw-rw-r--  1 aploetz aploetz 4460 Nov  8 14:26 la-2-big-Statistics.db
    -rw-rw-r--  1 aploetz aploetz  108 Nov  8 14:26 la-2-big-Summary.db
    -rw-rw-r--  1 aploetz aploetz   94 Nov  8 14:26 la-2-big-TOC.txt
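
    The copy step above can also be scripted, which helps when repeating it on several nodes. A minimal sketch; the `copy_sstables` function name is illustrative and not part of the original answer:

```shell
#!/bin/sh
# Copy every SSTable component file for one table from its old
# keyspace directory into its new keyspace directory.
# Usage: copy_sstables /path/to/old/table-<uuid> /path/to/new/table-<uuid>
copy_sstables() {
    old_dir="$1"
    new_dir="$2"
    # Both directories must already exist (the new one is created
    # when you CREATE TABLE in the target keyspace).
    [ -d "$old_dir" ] || { echo "missing: $old_dir" >&2; return 1; }
    [ -d "$new_dir" ] || { echo "missing: $new_dir" >&2; return 1; }
    # Copy regular files only, skipping subdirectories such as backups/.
    for f in "$old_dir"/*; do
        [ -f "$f" ] && cp "$f" "$new_dir"/
    done
    return 0
}
```

    With the directories from the listing above, the call would be `copy_sstables /var/lib/cassandra/data22/presentation/holidays-74bcfde0139011e6a67c2575e6398503 /var/lib/cassandra/data22/stackoverflow/holidays-77a767e0a5f111e6a2bebd9d201c4c8f`.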
    

    Restart your node(s).

    Query via cqlsh:

    Connected to SnakesAndArrows at 127.0.0.1:9042.
    [cqlsh 5.0.1 | Cassandra 2.2.8 | CQL spec 3.3.1 | Native protocol v4]
    Use HELP for help.
    aploetz@cqlsh> SELECT * FROM stackoverflow.holidays ;
    
     type         | eventtime                | beginend | name
    --------------+--------------------------+----------+------------------------
        Religious | 2016-12-26 05:59:59+0000 |        E |              Christmas
        Religious | 2016-12-25 06:00:00+0000 |        B |              Christmas
        Religious | 2016-03-28 04:59:59+0000 |        E |                 Easter
        Religious | 2016-03-27 05:00:00+0000 |        B |                 Easter
     presentation | 2016-05-06 20:40:08+0000 |        B |        my presentation
     presentation | 2016-05-06 20:40:03+0000 |        B |        my presentation
     presentation | 2016-05-06 20:39:15+0000 |        B |        my presentation
     presentation | 2016-05-06 20:38:10+0000 |        B |        my presentation
               US | 2016-07-05 04:59:59+0000 |        E |            4th of July
               US | 2016-07-04 05:00:00+0000 |        B |            4th of July
               US | 2016-05-09 04:59:59+0000 |        E |            Mothers Day
               US | 2016-05-08 05:00:00+0000 |        B |            Mothers Day
             Nerd | 2016-12-22 05:59:59+0000 |        E |               2112 Day
             Nerd | 2016-12-21 06:00:00+0000 |        B |               2112 Day
             Nerd | 2016-09-26 04:59:59+0000 |        E |             Hobbit Day
             Nerd | 2016-09-25 05:00:00+0000 |        B |             Hobbit Day
             Nerd | 2016-09-20 04:59:59+0000 |        E | Talk Like a Pirate Day
             Nerd | 2016-09-19 05:00:00+0000 |        B | Talk Like a Pirate Day
             Nerd | 2016-05-07 04:59:59+0000 |        E |         Star Wars Week
             Nerd | 2016-05-04 05:00:00+0000 |        B |         Star Wars Week
             Nerd | 2016-03-14 05:00:00+0000 |        E |                 Pi Day
             Nerd | 2016-03-14 05:00:00+0000 |        B |                 Pi Day
    
    (22 rows)
    

    The problem with this approach is that you have to stop the cluster and move the files on every node, whereas cqlsh COPY lets you export and import on a single node while the cluster is still running.

    And I know COPY has a reputation for being limited to smaller datasets. But 2.2.x has options that help throttle COPY to keep it from timing out on large datasets. I recently got it to export/import 370 million rows without timing out.
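
    For example, a throttled export and re-import might look like this (these option names exist in 2.2.x cqlsh COPY; the values shown are illustrative and should be tuned for your cluster):

```
aploetz@cqlsh> COPY stackoverflow.holidays TO '/tmp/holidays.csv'
           ... WITH PAGESIZE=1000 AND PAGETIMEOUT=60 AND MAXATTEMPTS=10;
aploetz@cqlsh> COPY stackoverflow.holidays FROM '/tmp/holidays.csv'
           ... WITH CHUNKSIZE=500 AND MAXBATCHSIZE=20 AND INGESTRATE=50000;
```

    Smaller pages with a longer page timeout keep the export from timing out, while a capped ingest rate keeps the import from overwhelming the node.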
