
GCP: I can't SSH to my instance after the disk mount completes


On Google Cloud Platform (GCP), I can't SSH to my instance once the disk mount has completed.

Details:

On a Windows 7 PC, in the Google Cloud SDK Shell (a Windows cmd window), I run the following command at C:\>:

python "C:\Users\user's name\path\to\py\create_instance_working.py" --name inst-test2 --zone us-central1-a direct-topic-1234 cc-test1

This runs create_instance_working.py, which looks like this:

import argparse
import os
import time

import googleapiclient.discovery
from six.moves import input


# [START list_instances]
def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items']
# [END list_instances]


# [START create_instance]
def create_instance(compute, project, zone, name, bucket):

    image_response = compute.images().getFromFamily(
        project=project, family='theFam').execute()
    source_disk_image = image_response['selfLink']

    # Build the machine type URL from the zone argument instead of hardcoding it.
    machine_type = "zones/%s/machineTypes/n1-standard-4" % zone
    # Read the startup script stored next to this file; `with` closes the file.
    with open(os.path.join(
            os.path.dirname(__file__), 'startup-script_working.sh'), 'r') as f:
        startup_script = f.read()

    print(machine_type)

    config = {
        'name': name,
        'machineType': machine_type,


        'disks': [
            {
                'boot': True,
                'autoDelete': True,
                'initializeParams': {
                    'sourceImage': source_disk_image,
                    'diskSizeGb': '15',
                }
            }, {

              "deviceName": "disk-2",
              "index": 1,
              "interface": "SCSI",
              "kind": "compute#attachedDisk",
              "mode": "READ_WRITE",
              "source": "projects/direct-topic-1234/zones/us-central1-a/disks/disk-2",
              "type": "PERSISTENT"
            }
        ],

        'networkInterfaces': [{
            'network': 'global/networks/default',
            'accessConfigs': [
                {'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}
            ]
        }],


        "serviceAccounts": [
            {
              "email": "123456789-compute@developer.gserviceaccount.com",
              "scopes": [
                "https://www.googleapis.com/auth/devstorage.read_only",
                "https://www.googleapis.com/auth/logging.write",
                "https://www.googleapis.com/auth/monitoring.write",
                "https://www.googleapis.com/auth/servicecontrol",
                "https://www.googleapis.com/auth/service.management.readonly",
                "https://www.googleapis.com/auth/trace.append"
              ]
            }
          ],



        'metadata': {
            'items': [{
                'key': 'startup-script',
                'value': startup_script
            }, {
                'key': 'bucket',
                'value': bucket
            }]
        }
    }

    return compute.instances().insert(
        project=project,
        zone=zone,
        body=config).execute()
# [END create_instance]


# [START delete_instance]
def delete_instance(compute, project, zone, name):
    return compute.instances().delete(
        project=project,
        zone=zone,
        instance=name).execute()
# [END delete_instance]


# [START wait_for_operation]
def wait_for_operation(compute, project, zone, operation):
    print('Waiting for operation to finish...')
    while True:
        result = compute.zoneOperations().get(
            project=project,
            zone=zone,
            operation=operation).execute()

        if result['status'] == 'DONE':
            print("done.")
            if 'error' in result:
                raise Exception(result['error'])
            return result

        time.sleep(1)
# [END wait_for_operation]


# [START run]
def main(project, bucket, zone, instance_name, wait=True):
    compute = googleapiclient.discovery.build('compute', 'v1')

    print('Creating instance.')

    operation = create_instance(compute, project, zone, instance_name, bucket)
    wait_for_operation(compute, project, zone, operation['name'])

    instances = list_instances(compute, project, zone)

    print('Instances in project %s and zone %s:' % (project, zone))
    for instance in instances:
        print(' - ' + instance['name'])

    print("""
Instance created.
It will take a minute or two for the instance to complete work.
Check this URL: http://storage.googleapis.com/{}/output.png
Once the image is uploaded press enter to delete the instance.
""".format(bucket))

#     if wait:
#         input()
# 
#     print('Deleting instance.')
# 
#     operation = delete_instance(compute, project, zone, instance_name)
#     wait_for_operation(compute, project, zone, operation['name'])

    print('all done with instance.')

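if __name__ == '__main__':
    # Parse the command line so the values passed to the script
    # (project, bucket, --zone, --name) are actually used.
    parser = argparse.ArgumentParser(
        description='Create a Compute Engine instance with a startup script.')
    parser.add_argument('project_id', help='Google Cloud project ID.')
    parser.add_argument('bucket_name', help='Google Cloud Storage bucket name.')
    parser.add_argument('--zone', default='us-central1-a',
                        help='Compute Engine zone to deploy to.')
    parser.add_argument('--name', default='inst-test1', help='New instance name.')
    args = parser.parse_args()

    main(args.project_id, args.bucket_name, args.zone, args.name)
# [END run]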
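To try the PuTTY route ([user name]@[external IP]) without looking the address up in the console, the listing step in main() could also print each instance's external IP. This is a sketch that assumes the dicts returned by list_instances() carry the Compute Engine v1 fields networkInterfaces[].accessConfigs[].natIP; the helper name external_ips and the sample data are hypothetical.

```python
def external_ips(instances):
    """Map instance name -> first external (NAT) IP found.

    `instances` is assumed to be the list returned by list_instances(),
    i.e. Compute Engine v1 instance resources as plain dicts.
    Instances without an external IP are left out.
    """
    ips = {}
    for instance in instances:
        for nic in instance.get('networkInterfaces', []):
            for access in nic.get('accessConfigs', []):
                nat_ip = access.get('natIP')
                if nat_ip and instance['name'] not in ips:
                    ips[instance['name']] = nat_ip
    return ips


if __name__ == '__main__':
    # Hypothetical sample shaped like an instances().list() result.
    sample = [
        {'name': 'inst-test1',
         'networkInterfaces': [{'accessConfigs': [
             {'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT',
              'natIP': '203.0.113.10'}]}]},
        {'name': 'no-external-ip', 'networkInterfaces': [{}]},
    ]
    for name, ip in external_ips(sample).items():
        print(' - %s: %s' % (name, ip))
```

Inside main(), the same call would be external_ips(list_instances(compute, project, zone)).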

It calls a startup script (startup-script_working.sh) that looks like this:

sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mount -o discard,defaults /dev/sdb /var
sudo chmod a+w /var 
sudo cp /etc/fstab /etc/fstab.backup
echo UUID=`sudo blkid -s UUID -o value /dev/sdb` /var ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
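The last line of the startup script appends one six-field line to /etc/fstab so that the disk is mounted again on the next boot. A small sketch of how that line is put together; the function name fstab_entry is hypothetical, and the default field values are the ones the script uses.

```python
def fstab_entry(uuid, mount_point, fs_type='ext4',
                options='discard,defaults,nofail', dump=0, pass_no=2):
    """Build one /etc/fstab line, mirroring the startup script's echo.

    Fields: device (by UUID), mount point, filesystem type, mount options,
    dump flag, and fsck pass number (2 = check after the root filesystem).
    The 'nofail' option lets boot continue if the disk is missing.
    """
    return 'UUID=%s %s %s %s %d %d' % (uuid, mount_point, fs_type,
                                        options, dump, pass_no)


if __name__ == '__main__':
    # In the script, the UUID comes from: sudo blkid -s UUID -o value /dev/sdb
    print(fstab_entry('0a1b2c3d-example-uuid', '/var'))
```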

Both are adapted from:

https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/compute/api/create_instance.py

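In the GCP console, as soon as the instance shows a green light, I click the instance's SSH button and connect successfully. If I keep opening new SSH connections to the instance, they all work until the mount on /var completes. I can tell the mount is complete by running df -h in one of the SSH connections (usually the first one): it shows /dev/sdb 197G 60M 197G 1% /var. Until the mount shows up, new connections succeed; once it appears, no new connection gets through. I have tried the >_ (shell) button in the console, gcloud compute ssh [instance name], and PuTTY with [user name]@[external IP].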
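The df -h check can also be scripted. A minimal sketch, assuming it runs on the instance itself under Python 3.3 or later (for shutil.disk_usage); describe_mount is a hypothetical helper that reports whether a path is a mount point and its size in GiB, roughly what df -h shows for that path.

```python
import os
import shutil


def describe_mount(path='/var'):
    """Summarize `path` roughly the way `df -h path` does.

    is_mount_point becomes True once a separate filesystem (here, the
    extra disk) is mounted exactly at `path`. Sizes are in GiB, which
    is what df -h labels "G".
    """
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        'path': path,
        'is_mount_point': os.path.ismount(path),
        'size_gib': round(usage.total / gib, 1),
        'used_gib': round(usage.used / gib, 1),
        'free_gib': round(usage.free / gib, 1),
    }


if __name__ == '__main__':
    print(describe_mount('/var'))
```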

I have also tried waiting 5 minutes before SSHing to the instance (the mount is complete by then); that doesn't work either.

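IMPORTANT: If I comment out all the startup script lines, I can connect indefinitely without any SSH problems. I have also tried creating a new disk and attaching it.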

So the disk mount seems to be what causes the SSH problem.

SSH connections opened earlier keep working fine, even though I can't create a new one.

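When an SSH connection fails, the SSH window shows: "Connection Failed

An error occurred while communicating with the SSH server. Check the server and the network configuration."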

Any ideas what could cause this?

The instance runs the Linux distribution SUSE 12.

My mounting instructions come from here:

https://cloud.google.com/compute/docs/disks/add-persistent-disk

If there is a good way to avoid this situation, that would be helpful (please share it), but I really want to know what I'm doing wrong.

I'm new to GCP, and to cloud, Python, SSH, and Linux in general. (Everything in this question is new to me!)

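If I comment out the startup script lines, run everything as described, SSH into the instance, and run the startup script commands manually, I get no errors. I still need to test whether I can open another SSH connection after that. I'll do that and report back.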

1 Answer


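    You are mounting the disk on /var, which holds data that ssh needs (among other things), so once the mount is in place the system sees a blank disk at /var. You have to keep the existing data somewhere else (cp -ar), mount the disk on /var, and then move the data back.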

    My previous answer was wrong.
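Following the answer's approach (copy the existing /var contents onto the new disk before mounting it over /var), the 'startup-script' metadata value in create_instance() could be generated like this. It is a sketch under several assumptions: /dev/sdb is the extra disk, /mnt/newvar is a hypothetical staging directory, /var is not already a separate mount point on the image, and cp -a stands in for the answer's cp -ar (-a already implies recursion). Startup scripts run as root on every boot, so the sketch drops sudo and exits early once /var is mounted. It also leaves out the original chmod a+w /var. Copying /var while services are running can still miss files written during the copy. build_var_migration_script is a hypothetical helper.

```python
def build_var_migration_script(device='/dev/sdb', mount_point='/var',
                               staging='/mnt/newvar'):
    """Return a bash startup script that moves mount_point onto device.

    Order matters: format the new disk, mount it somewhere temporary,
    copy the existing data across, and only then mount it over
    mount_point, so the files sshd relies on are still there afterwards.
    """
    lines = [
        '#!/bin/bash',
        'set -e',
        # Startup scripts run on every boot; stop if the disk is already in place.
        'if mountpoint -q %s; then exit 0; fi' % mount_point,
        'mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard %s' % device,
        'mkdir -p %s' % staging,
        'mount -o discard,defaults %s %s' % (device, staging),
        # Copy the current contents (ownership, modes, symlinks) onto the new disk.
        'cp -a %s/. %s/' % (mount_point, staging),
        'umount %s' % staging,
        'mount -o discard,defaults %s %s' % (device, mount_point),
        'cp /etc/fstab /etc/fstab.backup',
        'echo "UUID=$(blkid -s UUID -o value %s) %s ext4 discard,defaults,nofail 0 2" >> /etc/fstab'
        % (device, mount_point),
    ]
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    print(build_var_migration_script())
```

In create_instance(), startup_script could then be set to build_var_migration_script() instead of the contents of startup-script_working.sh.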
