
Linux VXLAN driver and network namespaces


I am trying to understand the vxlan driver code in the Linux kernel. Kernel version: 3.16.0-29-generic

Looking at vxlan.c, it seems that one vxlan device is created per VNI, that it is tied to the netns the netdevice belongs to, and that each device creates a UDP socket.

I am a bit confused by this, because you cannot really attach a vxlan device to a physical device (ethx) outside the global netns, since the physical device has to belong to the same netns as the vxlan device.

For example: if I create the vxlan link in the global netns, it works as expected:

ip link add vxlan0 type vxlan id 10 dev eth0
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.10.100.51/24 scope global lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:22:4d:99:32:6b brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.25/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::222:4dff:fe99:326b/64 scope link 
       valid_lft forever preferred_lft forever
15: vxlan0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default 
    link/ether fe:9c:49:26:ba:63 brd ff:ff:ff:ff:ff:ff

If I try to do the same inside a network namespace, it does not work:

ip netns exec test0 ip link add vxlan1 type vxlan id 20 dev eth0
ip netns exec test0 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

The problem here is that it does not like "dev eth0", because the code checks that eth0 is in the same netns as the link being added.

If I create the same device without eth0, it works fine:

ip netns exec test0 ip link add vxlan1 type vxlan id 20 
ip netns exec test0 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: vxlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default 
    link/ether 46:7a:5b:87:7d:2f brd ff:ff:ff:ff:ff:ff

If you cannot attach a carrier (physical device) to the vxlan device, how can you actually tx/rx packets into and out of the host?

Does this really mean that you can only use the vxlan driver in the global netns, or that you "have to" use it together with a bridge?

A vxlan packet has a VNI associated with it. You should be able to use that to deliver packets directly to a device in a non-global netns, similar to what is possible with macvlans.

Am I missing something?

3 Answers

  • 0

    It turns out that you can move a physical device into a non-global netns, so this question is moot. I would still rather see a single vxlan device in the global netns demultiplex packets to the appropriate netns based on the VNI, similar to how it is done with macvlans.
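
    As a sketch of that workaround, assuming eth0 can be dedicated to the namespace (interface and namespace names as in the question):

    # move the physical device into the namespace first; "dev eth0" then refers to a
    # device in the same netns as the new vxlan link, so the kernel's check passes
    ip link set eth0 netns test0
    ip netns exec test0 ip link add vxlan1 type vxlan id 20 dev eth0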

  • 0

    In kernel 4.3 there are patches that add a new way of using VXLAN: a single VXLAN netdev combined with routing rules that carry the tunnel information.

    According to the patches, you will be able to create routing rules that look at the tunnel information, for example:

    ip rule add from all tunnel-id 100 lookup 100
    ip rule add from all tunnel-id 200 lookup 200
    

    and add the encapsulation header with routes such as:

    ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0
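
    For routes like this to work, the single shared VXLAN netdev has to be created in metadata-collection mode. A minimal sketch with a recent iproute2, assuming the device name vxlan0 used above (the "external" keyword is what enables this mode):

    # one VXLAN device for all VNIs: the VNI and remote endpoint come from the
    # encap route / tunnel metadata instead of being fixed on the device
    ip link add vxlan0 type vxlan dstport 4789 external
    ip link set vxlan0 up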
    
  • 0

    I think you can take a look at http://events.linuxfoundation.org/sites/events/files/slides/2013-linuxcon.pdf

    You can turn on l2miss and l3miss on a vxlan device that is not in the global netns, and set up the ARP and FDB entries manually.

    The following example shows how to achieve this.

    function setup_overlay() {
        # start a container without networking; it only provides a netns to plug into
        docker run -d --net=none --name=test-overlay ubuntu sleep 321339
        sleep 3
        pid=`docker inspect -f '{{.State.Pid}}' test-overlay`
        # create the overlay netns with a bridge inside it
        ip netns add overlay
        ip netns exec overlay ip li ad dev br0 type bridge
        # create the vxlan device in the global netns (so its UDP socket stays there),
        # then move it into the overlay netns and attach it to the bridge
        ip li add dev vxlan212 type vxlan id 42 l2miss l3miss proxy learning dstport 4789
        ip link set vxlan212 netns overlay
        ip netns exec overlay ip li set dev vxlan212 name vxlan1
        ip netns exec overlay brctl addif br0 vxlan1
        # a veth pair connects the container netns to the bridge in the overlay netns
        ip li add dev vetha1 mtu 1450 type veth peer name vetha2 mtu 1450
        ip li set dev vetha1 netns overlay
        ip netns exec overlay ip -d li set dev vetha1 name veth2
        ip netns exec overlay brctl addif br0 veth2
        ip netns exec overlay ip ad add dev br0 $bridge_gatway_cidr
        ip netns exec overlay ip li set vxlan1 up
        ip netns exec overlay ip li set veth2 up
        ip netns exec overlay ip li set br0 up
        # move the other veth end into the container netns and configure it as eth1
        ln -sfn /proc/$pid/ns/net /var/run/netns/$pid
        ip li set dev vetha2 netns $pid
        ip netns exec $pid ip li set dev vetha2 name eth1 address $container1_mac_addr
        ip netns exec $pid ip ad add dev eth1 $container1_ip_cidr
        ip netns exec $pid ip li set dev eth1 up

        # static neighbor (ARP) and FDB entries for the peer container; with
        # l2miss/l3miss/proxy the vxlan device answers from these instead of flooding
        ip netns exec overlay ip neighbor add $container2_ip lladdr $container2_mac_addr dev vxlan1 nud permanent
        ip netns exec overlay bridge fdb add $container2_mac_addr dev vxlan1 self dst $container2_host_ip vni 42 port 4789
    }
    
    # setup overlay on host1
    bridge_gatway_cidr='10.0.0.1/24'
    container1_ip_cidr='10.0.0.2/24'
    container1_mac_addr='02:42:0a:00:00:02'
    container2_ip='10.0.0.3'
    container2_mac_addr='02:42:0a:00:00:03'
    container2_host_ip='192.168.10.22'
    setup_overlay
    
    # setup overlay on host2
    bridge_gatway_cidr='10.0.0.1/24'
    container1_ip_cidr='10.0.0.3/24'
    container1_mac_addr='02:42:0a:00:00:03'
    container2_ip='10.0.0.2'
    container2_mac_addr='02:42:0a:00:00:02'
    container2_host_ip='192.168.10.21'
    setup_overlay
    

    The script above sets up an overlay network between two docker containers on two hosts. The vxlan device is attached to the bridge br0 in the overlay netns, and br0 is connected to the container netns by a veth pair.
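
    To inspect the static entries the script installs (names as in the script above):

    # the permanent neighbor (ARP) entry and the FDB entry pointing at the remote VTEP
    ip netns exec overlay ip neighbor show dev vxlan1
    ip netns exec overlay bridge fdb show dev vxlan1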

    Now check the newly set up overlay network.

    # ping container2 on host1
    ip netns exec $pid ping -c 10 10.0.0.3
    ## successful output
    root@docker-1:/home/vagrant# ip netns exec $pid ping -c 10 10.0.0.3
    PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
    64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=0.879 ms
    64 bytes from 10.0.0.3: icmp_seq=2 ttl=64 time=0.558 ms
    64 bytes from 10.0.0.3: icmp_seq=3 ttl=64 time=0.576 ms
    64 bytes from 10.0.0.3: icmp_seq=4 ttl=64 time=0.614 ms
    64 bytes from 10.0.0.3: icmp_seq=5 ttl=64 time=0.521 ms
    64 bytes from 10.0.0.3: icmp_seq=6 ttl=64 time=0.389 ms
    64 bytes from 10.0.0.3: icmp_seq=7 ttl=64 time=0.551 ms
    64 bytes from 10.0.0.3: icmp_seq=8 ttl=64 time=0.565 ms
    64 bytes from 10.0.0.3: icmp_seq=9 ttl=64 time=0.488 ms
    64 bytes from 10.0.0.3: icmp_seq=10 ttl=64 time=0.531 ms
    
    --- 10.0.0.3 ping statistics ---
    10 packets transmitted, 10 received, 0% packet loss, time 9008ms
    rtt min/avg/max/mdev = 0.389/0.567/0.879/0.119 ms
    
    ## tcpdump sample on host1
    root@docker-1:/home/vagrant# tcpdump -vv -n -s 0 -e -i eth1
    tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
    12:09:35.589244 08:00:27:00:4a:3a > 08:00:27:82:e5:ca, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 59751, offset 0, flags [none], proto UDP (17), length 134)
        192.168.0.11.42791 > 192.168.0.12.4789: [no cksum] VXLAN, flags [I] (0x08), vni 42
    02:42:0a:00:00:02 > 02:42:0a:00:00:03, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 49924, offset 0, flags [DF], proto ICMP (1), length 84)
        10.0.0.2 > 10.0.0.3: ICMP echo request, id 1908, seq 129, length 64
    12:09:35.589559 08:00:27:82:e5:ca > 08:00:27:00:4a:3a, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 38389, offset 0, flags [none], proto UDP (17), length 134)
        192.168.0.12.56727 > 192.168.0.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 42
    02:42:0a:00:00:03 > 02:42:0a:00:00:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 19444, offset 0, flags [none], proto ICMP (1), length 84)
        10.0.0.3 > 10.0.0.2: ICMP echo reply, id 1908, seq 129, length 64
    12:09:36.590840 08:00:27:00:4a:3a > 08:00:27:82:e5:ca, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 59879, offset 0, flags [none], proto UDP (17), length 134)
        192.168.0.11.42791 > 192.168.0.12.4789: [no cksum] VXLAN, flags [I] (0x08), vni 42
    02:42:0a:00:00:02 > 02:42:0a:00:00:03, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 49951, offset 0, flags [DF], proto ICMP (1), length 84)
        10.0.0.2 > 10.0.0.3: ICMP echo request, id 1908, seq 130, length 64
    12:09:36.591328 08:00:27:82:e5:ca > 08:00:27:00:4a:3a, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 38437, offset 0, flags [none], proto UDP (17), length 134)
        192.168.0.12.56727 > 192.168.0.11.4789: [no cksum] VXLAN, flags [I] (0x08), vni 42
    02:42:0a:00:00:03 > 02:42:0a:00:00:02, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 19687, offset 0, flags [none], proto ICMP (1), length 84)
        10.0.0.3 > 10.0.0.2: ICMP echo reply, id 1908, seq 130, length 64
    

    Clean up on each host:

    ip netns del overlay
    ip netns del $pid
    docker rm -v -f test-overlay
    

    As for why a vxlan device created directly inside a non-global netns cannot receive traffic from outside the host:

    Note that we create the vxlan device in the global netns first and then move it into the overlay netns. This is indeed required, because the vxlan driver in the kernel keeps a reference to the src netns at the time the vxlan device is created. See the following code in drivers/net/vxlan.c:

    static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
                                   struct vxlan_config *conf)
    {
        //...
        vxlan->net = src_net;

        //...
    }
    

    and the vxlan driver creates its UDP socket in that src netns:

    vxlan_sock_add(vxlan->net, vxlan->cfg.dst_port, vxlan->cfg.no_share, vxlan->flags);
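
    One way to see this from userspace, as a sketch assuming the setup above and the default VXLAN port 4789: the UDP socket shows up in the netns where the device was created, not in the netns it was moved into.

    # the bound VXLAN socket is visible in the global netns ...
    ss -lun 'sport = :4789'
    # ... but not inside the overlay netns the device was moved into
    ip netns exec overlay ss -lun 'sport = :4789'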

    
