Troubleshooting Tailscale Network




Chinese version: 一次Tailscale网络问题的调试过程 (Debugging a Tailscale Network Problem) – Frank’s Weblog

As mentioned in an earlier post, I used Tailscale to create a mesh network that connects all of my devices, and I used a cloud server located in AliCloud Beijing as an exit node, in order to access geographically restricted network services.

However, I noticed that I could not access the Internet at all when using that exit node. I thought it was a network connectivity issue with the relays, so I didn’t worry too much about it. Later, I noticed that some other services on that server were not functioning either, so I looked into it and found that the problem was not that simple.

First, I noticed that the server itself couldn’t access the Internet, yet curl to an IP address worked, which pointed to a problem with DNS resolution. resolvectl status showed that two DNS servers were configured. I assumed these were DNS servers for the Tailscale internal network (they are not, as I explain later), since the IPs started with 100.100[1]:

Link 2 (eth0)
......
  Current DNS Server: 100.100.2.136
         DNS Servers: 100.100.2.136
                      100.100.2.138

I tried dig @100.100.2.136 baidu.com to check the response from the DNS server and got connection timed out: no servers could be reached. After I shut down Tailscale, the same command returned a normal response. So Tailscale was probably interfering with DNS resolution on the system.
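The dig check above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names build_dns_query and dns_server_responds are my own for illustration, and the packet layout follows the basic A-record query format from RFC 1035. It only tells you whether the server sends any reply at all, which is the same signal as dig's timeout.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (Internet)
    return header + qname + struct.pack("!HH", 1, 1)


def dns_server_responds(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if `server` sends back a UDP reply to an A query for `name`."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors
            return False
    # A genuine reply echoes the query ID in its first two bytes
    return reply[:2] == query[:2]
```

On the affected server, dns_server_responds("100.100.2.136", "baidu.com") would be expected to return False while Tailscale is running and True after it is stopped, matching the dig behaviour described above.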

Workaround

Changing the DNS configuration on the server works around this problem. Edit /etc/netplan/99-netcfg.yaml and add a public DNS server to the nameservers section under eth0.

network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: yes
      dhcp6: no
      nameservers:
        addresses: [114.114.114.114]

Run sudo netplan apply to apply the changes; dig baidu.com then returns the correct response.

Switching to a public DNS server lets the server access the Internet again, but many services inside AliCloud still require internal DNS resolution, for example AliCloud’s internal apt mirror (mirrors.cloud.aliyuncs.com) and products such as cloud databases. Pointing the apt sources at public mirrors works around the apt mirror issue, but not the others.

Locating the Issue

To locate the problem, we need to find out why the IP address 100.100.2.136 is not reachable. I thought these two DNS servers were IPs in Tailscale’s internal network, but they were unreachable by any means. After some searching, I found that 100.100.2.136 and 100.100.2.138 are actually internal DNS servers provided by AliCloud. Some other AliCloud internal services use similar IPs, for example the apt mirror at 100.100.2.148, which curl could not connect to either.

We can therefore draw a preliminary conclusion that Tailscale somehow affected access to the 100.100.x.x IP range.

Possibilities

Routing

My first thought was that Tailscale was routing the entire 100.100.x.x IP range. However, according to the Tailscale documentation, Tailscale only routes the assigned IP address, not the entire CIDR. ip route list also confirms this.

ip route list table 52
100.69.x.x dev tailscale0
100.90.x.x dev tailscale0
100.96.x.x dev tailscale0
100.98.x.x dev tailscale0
100.100.100.100 dev tailscale0
100.104.x.x dev tailscale0
100.121.x.x dev tailscale0
100.127.x.x dev tailscale0

ip route get 100.100.2.136 returns the following result, showing that the packet will be routed out through the eth0 interface. The routing table is therefore correct, and the problem is not with routing.

100.100.2.136 via 172.24.63.253 dev eth0 src 172.24.4.100 uid 0
    cache
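If you want to run this routing check against several addresses, the first line of the ip route get output is easy to pick apart. This is a rough sketch, assuming the output has the simple "destination followed by key/value pairs" shape shown above; parse_route_get is a name I made up, and other output forms (for example lines starting with local or broadcast) are not handled.

```python
def parse_route_get(line: str) -> dict:
    """Split the first line of `ip route get` output into a dict of fields."""
    tokens = line.split()
    route = {"dst": tokens[0]}
    # After the destination, tokens come in pairs: via, dev, src, uid
    for key, value in zip(tokens[1::2], tokens[2::2]):
        route[key] = value
    return route


# The line printed on the affected server
sample = "100.100.2.136 via 172.24.63.253 dev eth0 src 172.24.4.100 uid 0"
route = parse_route_get(sample)
# The packet leaves through eth0, not tailscale0
leaves_via_tailscale = route["dev"] == "tailscale0"
```

Here leaves_via_tailscale is False, which is the same conclusion as reading the output by eye: the route itself does not send this traffic into Tailscale.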

iptables

Another thing that may interfere with packets in transit is iptables. iptables -S reveals the following entries related to Tailscale.

-A ts-forward -i tailscale0 -j MARK --set-xmark 0x40000/0xffffffff
-A ts-forward -m mark --mark 0x40000 -j ACCEPT
-A ts-forward -s 100.64.0.0/10 -o tailscale0 -j DROP
-A ts-forward -o tailscale0 -j ACCEPT
-A ts-input -s 100.92.187.56/32 -i lo -j ACCEPT
-A ts-input -s 100.115.92.0/23 ! -i tailscale0 -j RETURN
-A ts-input -s 100.64.0.0/10 ! -i tailscale0 -j DROP

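The last of these rules drops every inbound packet whose source address is in the 100.64.0.0/10 CIDR unless it arrives on the tailscale0 interface. Replies from 100.100.2.136 arrive on eth0, so they match this rule and are discarded, which is why the queries time out. The problem was solved after removing the rule using iptables -D.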
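To see why a reply from the AliCloud DNS server gets caught, it helps to walk a packet through the three ts-input rules listed above. The sketch below is a deliberately simplified model, not how iptables is implemented: it only understands the source (-s) and input interface (-i, ! -i) matches used in those rules, and the Rule type and evaluate function are names I introduced for illustration.

```python
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    source: str         # CIDR compared against the packet's source address
    in_iface: str       # interface named in the rule
    negate_iface: bool  # True models "! -i <iface>", False models "-i <iface>"
    target: str         # ACCEPT, RETURN or DROP


# The three ts-input rules from the iptables -S listing
TS_INPUT = [
    Rule("100.92.187.56/32", "lo", False, "ACCEPT"),
    Rule("100.115.92.0/23", "tailscale0", True, "RETURN"),
    Rule("100.64.0.0/10", "tailscale0", True, "DROP"),
]


def evaluate(rules: List[Rule], src: str, iface: str) -> str:
    """Return the target of the first rule matching a packet from `src` on `iface`."""
    addr = ipaddress.ip_address(src)
    for rule in rules:
        if addr not in ipaddress.ip_network(rule.source):
            continue
        # XOR with negate_iface flips the comparison for "! -i"
        if (iface == rule.in_iface) != rule.negate_iface:
            return rule.target
    # No rule matched: control returns to the calling chain
    return "RETURN"
```

With this model, evaluate(TS_INPUT, "100.100.2.136", "eth0") gives DROP, while the same source arriving on tailscale0, or a public address such as 114.114.114.114 arriving on eth0, falls through the chain untouched.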

After some searching, I found that related issues had already been filed earlier this year:

[1] tailscale drops 100.64.0.0/10 on firewall when ipv4 is disabled · Issue #3837 · tailscale/tailscale · GitHub

[2] FR: netfilter CGNAT mode when non-Tailscale CGNAT addresses should be allowed · Issue #3104 · tailscale/tailscale · GitHub

Conclusion

To sum up, the problem was caused by a firewall rule that Tailscale installs to drop traffic from the 100.64.0.0/10 CIDR that does not arrive through the tailscale0 interface. Some services on AliCloud’s internal network reside in this IP range, so their replies were dropped. According to the Tailscale CLI documentation, adding the --netfilter-mode=off parameter when starting Tailscale prevents this rule from being set. However, this poses some security risks.

Tailscale sets this rule because the IP range (100.64.0.0/10)[1] it uses for the Tailscale network is reserved for Carrier Grade NAT (CGNAT) and is assumed not to be used by private networks. However, AliCloud uses this IP range for its internal services, which causes the conflict.
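A quick way to check whether a cloud provider's internal addresses collide with the range Tailscale uses is to test them against 100.64.0.0/10. The sketch below uses Python's ipaddress module; the in_cgnat helper and the labels in the dictionary are names I chose, while the addresses are the ones mentioned in this post.

```python
import ipaddress

# The CGNAT range that Tailscale assigns addresses from
CGNAT = ipaddress.ip_network("100.64.0.0/10")

# Addresses seen on the AliCloud server in this post
ADDRESSES = {
    "AliCloud DNS 1": "100.100.2.136",
    "AliCloud DNS 2": "100.100.2.138",
    "AliCloud apt mirror": "100.100.2.148",
    "Public DNS": "114.114.114.114",
    "Server eth0": "172.24.4.100",
}


def in_cgnat(address: str) -> bool:
    """Return True if `address` falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(address) in CGNAT


conflicts = sorted(label for label, ip in ADDRESSES.items() if in_cgnat(ip))
print(f"{CGNAT} spans {CGNAT[0]} to {CGNAT[-1]}")
print("Inside the range:", ", ".join(conflicts))
```

The range runs from 100.64.0.0 to 100.127.255.255, so all three AliCloud internal addresses fall inside it, while the public DNS server and the server's own eth0 address do not.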

References

iptables(8) Linux man page

[1] What are these 100.x.y.z addresses? · Tailscale

