How to Troubleshoot Network Issues on Linux: A Comprehensive Guide

Linux powers the vast majority of servers, embedded systems, and cloud infrastructure worldwide, so network connectivity is critical to nearly everything it runs. Whether you’re a system administrator, developer, or DevOps engineer, you will eventually face network issues, from intermittent connectivity and slow performance to DNS failures and blocked ports. Troubleshooting them requires a systematic approach, familiarity with Linux networking fundamentals, and proficiency with the standard diagnostic tools. This blog breaks down the fundamental concepts, walks through the essential tools, and provides a step-by-step methodology for diagnosing and resolving common issues. By the end, you’ll be equipped to identify root causes efficiently and restore network functionality.

Understanding Network Fundamentals on Linux

Before diving into troubleshooting, it’s essential to grasp core Linux networking concepts. These form the foundation for diagnosing issues effectively.

Network Interfaces

Network interfaces (e.g., eth0, wlan0, enp0s3) are the physical or virtual connections between a Linux machine and a network. They handle data transmission/reception and are managed by the kernel.

  • Physical interfaces: Wired (Ethernet, eth*) or wireless (Wi-Fi, wlan*).
  • Virtual interfaces: Loopback (lo, 127.0.0.1), VLANs, or tunnels (e.g., tun0).

Use ip link show to list all interfaces and their states (e.g., UP, DOWN).

IP Addressing

An IP address (IPv4 or IPv6) uniquely identifies a device on a network. Linux machines typically obtain IPs via:

  • DHCP: Dynamic assignment (common for desktops/servers).
  • Static: Manual configuration (critical for servers/services).

IPv4 addresses (e.g., 192.168.1.100) use subnet masks (e.g., 255.255.255.0 or CIDR 192.168.1.100/24) to define network boundaries.
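
To make the relationship between the two notations concrete, here is a minimal sketch that converts a CIDR prefix length into a dotted-decimal subnet mask. The function name prefix_to_netmask is hypothetical, and the sketch assumes a shell with 64-bit arithmetic (such as bash).

```shell
#!/usr/bin/env bash
# Convert a CIDR prefix length (0-32) to a dotted-decimal IPv4 netmask.
# prefix_to_netmask is an illustrative name, not a standard utility.
prefix_to_netmask() {
    local prefix mask
    prefix=$1
    # Set the top $prefix bits of a 32-bit value (assumes 64-bit shell arithmetic).
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # Print the four octets, most significant first.
    printf '%d.%d.%d.%d\n' \
        $(( (mask >> 24) & 255 )) $(( (mask >> 16) & 255 )) \
        $(( (mask >> 8) & 255 ))  $(( mask & 255 ))
}

prefix_to_netmask 24   # prints 255.255.255.0
```

A /24 prefix therefore leaves the last octet (256 addresses) for hosts on the subnet.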

Routing

Routing determines how packets travel from a source to a destination. The routing table (stored in the kernel) specifies paths via gateways. Key entries include:

  • Default gateway: The router used for external networks (e.g., 0.0.0.0/0 via 192.168.1.1).
  • Local routes: For the machine’s subnet (e.g., 192.168.1.0/24 dev eth0).

View the routing table with ip route show.
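
When scripting checks, it is often handy to pull the default gateway out of the routing table. The sketch below assumes the usual "default via <gateway> dev <interface>" line format; default_gateway is a hypothetical helper name and the sample addresses are illustrative.

```shell
#!/usr/bin/env bash
# Print the default gateway from `ip route show` output read on stdin.
# default_gateway is an illustrative helper, not part of iproute2.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Illustrative routing table fed in as text, so the sketch runs anywhere.
printf '%s\n' \
    'default via 192.168.1.1 dev eth0 proto dhcp metric 100' \
    '192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100 metric 100' \
    | default_gateway

# On a live system: ip route show | default_gateway
```

If the function prints nothing, no default route is installed, which by itself explains a loss of external connectivity.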

DNS Resolution

DNS (Domain Name System) translates human-readable domain names (e.g., google.com) to IP addresses (e.g., 142.250.72.14). Linux uses:

  • /etc/resolv.conf: Lists DNS servers (e.g., nameserver 8.8.8.8).
  • /etc/hosts: Local overrides for domain-to-IP mappings.

TCP/UDP and Ports

  • TCP (Transmission Control Protocol): Connection-oriented, reliable (e.g., HTTP, SSH).
  • UDP (User Datagram Protocol): Connectionless, fast (e.g., DNS, streaming).

Ports (1-65535) identify services on a device (e.g., port 80 for HTTP, 22 for SSH). “Listening” ports accept incoming connections.

Essential Troubleshooting Tools

Linux provides a robust toolkit for network diagnostics. Below are the most critical tools and their use cases.

ip: Network Interface and Routing Management

The ip command (part of iproute2) replaces legacy tools like ifconfig and route. It manages interfaces, IP addresses, and routing.

Common Commands:

  • List interfaces and IP addresses:

    ip addr show  # Shorthand: ip a

    Example output:

    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.100/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
           valid_lft 86399sec preferred_lft 86399sec
  • View routing table:

    ip route show  # Shorthand: ip r

    Example output:

    default via 192.168.1.1 dev eth0 proto dhcp metric 100
    192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100 metric 100
  • Bring an interface up/down:

    sudo ip link set eth0 up   # Enable interface
    sudo ip link set eth0 down # Disable interface

ping: Testing Connectivity

ping sends ICMP echo requests to a target IP/domain to verify reachability.

Common Commands:

  • Test connectivity to a local IP:

    ping -c 4 192.168.1.1  # Send 4 packets

    Success output:

    64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
    64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.18 ms
    --- 192.168.1.1 ping statistics ---
    4 packets transmitted, 4 received, 0% packet loss, time 3005ms
  • Test external connectivity (e.g., Google DNS):

    ping -c 4 8.8.8.8

Note: Some networks block ICMP, so ping failures don’t always mean no connectivity.
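
For scripted monitoring, the packet-loss figure can be extracted from the summary line. This is a rough sketch that assumes the iputils summary format shown above; packet_loss is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Extract the packet-loss percentage from ping's summary line (stdin).
# packet_loss is an illustrative helper; it assumes the iputils output format.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Sample summary line fed in as text, so the sketch runs without network access.
printf '%s\n' '4 packets transmitted, 4 received, 0% packet loss, time 3005ms' | packet_loss

# On a live system: ping -c 4 192.168.1.1 | packet_loss
```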

traceroute/mtr: Path Analysis

  • traceroute: Maps the path packets take to a target, showing hops (routers) and latency.

    traceroute google.com
  • mtr (combines ping and traceroute): Provides real-time path monitoring with packet loss stats.

    mtr --report google.com  # Generate a summary report

ss/netstat: Socket Statistics

ss (Socket Statistics) replaces netstat (deprecated) to list active network connections, listening ports, and socket details.

Common Commands:

  • List all listening TCP/UDP ports (numeric, no DNS):

    ss -tuln  # t: TCP, u: UDP, l: listening, n: numeric

    Example output:

    Netid  State   Recv-Q  Send-Q   Local Address:Port    Peer Address:Port
    tcp    LISTEN  0       128      0.0.0.0:22           0.0.0.0:*
    tcp    LISTEN  0       5        127.0.0.1:631        0.0.0.0:*
  • Show established TCP/UDP connections (without -l, ss lists only non-listening sockets):

    ss -tun  # t: TCP, u: UDP, n: numeric
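
To check quickly which TCP ports are open for incoming connections, the listening entries can be reduced to a list of port numbers. This is a minimal sketch; listening_tcp_ports is a hypothetical helper that locates the LISTEN column rather than relying on a fixed column position, since ss adds a Netid column when several socket types are requested.

```shell
#!/usr/bin/env bash
# Print the unique TCP ports in LISTEN state from ss output read on stdin.
# listening_tcp_ports is an illustrative helper, not part of iproute2.
listening_tcp_ports() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "LISTEN") {
                # Local Address:Port is three fields after State;
                # the port is whatever follows the last colon.
                n = split($(i + 3), parts, ":")
                print parts[n]
            }
        }
    }' | sort -n -u
}

# Illustrative input fed in as text, so the sketch runs anywhere.
printf '%s\n' \
    'LISTEN  0  128  0.0.0.0:22     0.0.0.0:*' \
    'LISTEN  0  128  [::]:22        [::]:*' \
    'LISTEN  0  5    127.0.0.1:631  0.0.0.0:*' \
    | listening_tcp_ports

# On a live system: ss -tln | listening_tcp_ports
```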

tcpdump: Packet Capture and Analysis

tcpdump captures raw network packets for deep traffic inspection (requires root).

Common Commands:

  • Capture all traffic on eth0 (limit to 10 packets):

    sudo tcpdump -i eth0 -c 10
  • Filter by port (e.g., HTTP/80):

    sudo tcpdump -i eth0 port 80
  • Save captures to a file for later analysis (e.g., with Wireshark):

    sudo tcpdump -i eth0 -w capture.pcap

dig/nslookup: DNS Troubleshooting

  • dig (Domain Information Groper): Queries DNS servers to debug resolution.

    dig google.com  # Basic query
    dig @8.8.8.8 example.com  # Query using Google DNS
  • nslookup: Simpler alternative for DNS lookups:

    nslookup google.com

Systematic Troubleshooting Methodology

Network issues often stem from multiple layers (e.g., physical, IP, DNS). Use this step-by-step workflow to isolate root causes.

Step 1: Identify the Scope of the Issue

  • Is the problem local? (Only your machine vs. others on the network.)
    Ask: “Can other devices connect to the network/internet?”
    • If yes: Issue is isolated to your machine.
    • If no: Problem lies with the network (e.g., router, ISP).

Step 2: Check Physical/Layer 1 Connections

  • Verify Ethernet cables are plugged in (look for link lights on the router/switch).
  • For Wi-Fi: Ensure the interface is enabled (ip link show wlan0) and connected to the correct SSID.

Step 3: Verify Network Interface Status

  • Use ip addr show to confirm:
    • Interface is UP (the flags include UP and LOWER_UP, and the state field reads UP).
    • An IP address is assigned (e.g., inet 192.168.1.100/24).
  • If no IP: Check DHCP (e.g., sudo dhclient eth0 to force a lease).
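
Both checks in this step can be read off the brief output of ip in one pass. The sketch below assumes the "name state addresses..." column layout of ip -br addr show; iface_summary is a hypothetical helper name and the sample lines are illustrative.

```shell
#!/usr/bin/env bash
# Summarize `ip -br addr show` output (stdin) as: interface, state, first IPv4 or "none".
# iface_summary is an illustrative helper, not part of iproute2.
iface_summary() {
    awk '{
        ipv4 = "none"
        # Address fields start at column 3; pick the first one that looks like IPv4/prefix.
        for (i = 3; i <= NF; i++) {
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//) { ipv4 = $i; break }
        }
        print $1, $2, ipv4
    }'
}

# Illustrative input fed in as text, so the sketch runs anywhere.
printf '%s\n' \
    'lo     UNKNOWN  127.0.0.1/8 ::1/128' \
    'eth0   UP       192.168.1.100/24 fe80::5054:ff:fe12:3456/64' \
    'wlan0  DOWN' \
    | iface_summary

# On a live system: ip -br addr show | iface_summary
```

An interface reported as UP with "none" in the last column points at DHCP or static configuration rather than at cabling.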

Step 4: Test Local and Gateway Connectivity

  • Local subnet: Ping another device on the same network (e.g., ping 192.168.1.101).
    • Failure: Check subnet mask, IP conflict, or switch issues.
  • Gateway: Ping the default gateway (from ip route show, e.g., 192.168.1.1).
    • Failure: Gateway is down or misconfigured.
  • External network: Ping a public IP (e.g., 8.8.8.8).
    • Failure: ISP outage or gateway routing issue.
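
The three pings above form a simple decision ladder: the first one that fails tells you where to look. As a rough sketch, the logic can be written as follows; diagnose and check_connectivity are hypothetical names, and the -W 2 timeout assumes the iputils version of ping.

```shell
#!/usr/bin/env bash
# Map the exit codes of the three pings (0 = success) to the likely fault domain.
# diagnose is an illustrative helper name.
diagnose() {
    local local_rc gateway_rc external_rc
    local_rc=$1; gateway_rc=$2; external_rc=$3
    if [ "$local_rc" -ne 0 ]; then
        echo "local subnet: check subnet mask, IP conflicts, or the switch"
    elif [ "$gateway_rc" -ne 0 ]; then
        echo "gateway: down or misconfigured"
    elif [ "$external_rc" -ne 0 ]; then
        echo "external: ISP outage or gateway routing issue"
    else
        echo "ok: IP connectivity works; continue with DNS checks"
    fi
}

# Hypothetical wrapper that runs the live pings; defined here but not called.
check_connectivity() {
    local peer gateway l g e
    peer=$1; gateway=$2
    ping -c 2 -W 2 "$peer" >/dev/null 2>&1; l=$?
    ping -c 2 -W 2 "$gateway" >/dev/null 2>&1; g=$?
    ping -c 2 -W 2 8.8.8.8 >/dev/null 2>&1; e=$?
    diagnose "$l" "$g" "$e"
}

diagnose 0 0 1   # prints: external: ISP outage or gateway routing issue
```

On a live system, something like check_connectivity 192.168.1.101 192.168.1.1 would run the whole ladder. Keep in mind that blocked ICMP can make a healthy hop look like a failure.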

Step 5: Diagnose DNS Issues

If IP-based ping works but domain-based ping fails (e.g., ping google.com), DNS is likely the culprit:

  • Check /etc/resolv.conf for valid DNS servers:
    cat /etc/resolv.conf
    Example (good):
    nameserver 8.8.8.8
    nameserver 8.8.4.4
  • Test DNS resolution against a known-good public server with dig @8.8.8.8 google.com. If this works while a plain dig google.com fails, the DNS server configured on your machine is at fault.
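
To find out which configured resolver is misbehaving, each nameserver entry can be queried in turn. This is a minimal sketch, assuming dig is installed; list_nameservers and compare_resolvers are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the nameserver addresses from resolv.conf-formatted text on stdin.
# list_nameservers is an illustrative helper name.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Hypothetical helper: query every configured resolver for one name so that a
# failing server stands out. Requires dig; defined here but not called.
compare_resolvers() {
    local name ns answer
    name=$1
    for ns in $(list_nameservers < /etc/resolv.conf); do
        # Drop dig's ";;" diagnostic lines and keep the first answer, if any.
        answer=$(dig +short +time=2 +tries=1 "@$ns" "$name" 2>/dev/null | grep -v '^;' | head -n 1)
        echo "$ns -> ${answer:-no answer}"
    done
}

# Illustrative input fed in as text, so the sketch runs anywhere.
printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' | list_nameservers
```

On a live system, compare_resolvers google.com would print one line per configured server.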

Step 6: Inspect Firewall Rules

Linux firewalls (e.g., iptables, ufw) may block traffic.

  • List ufw rules (simpler):
    sudo ufw status
  • List iptables rules (advanced):
    sudo iptables -L -n
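  • Temporarily disable the firewall to test, then re-enable it immediately (use cautiously!):
    sudo ufw disable  # For ufw; re-enable with: sudo ufw enable
    sudo systemctl stop iptables  # Only where an iptables service unit exists (e.g., RHEL with the iptables-services package)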

Step 7: Analyze Traffic with Packet Capture

If the issue persists, use tcpdump to inspect raw packets:

sudo tcpdump -i eth0 port 443  # Capture HTTPS traffic

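Look for repeated SYN packets (Flags [S]) that never receive a SYN-ACK (Flags [S.]), which suggests the traffic is being silently dropped (e.g., by a firewall). RST packets (Flags [R] or [R.]) suggest the port is closed or the service is rejecting the connection.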
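
Counting handshake flags by eye gets tedious on a busy link. The sketch below tallies SYN, SYN-ACK, and RST packets from tcpdump's text output; count_tcp_flags is a hypothetical helper name, and the sample lines (using the documentation address 203.0.113.10) are illustrative rather than captured traffic.

```shell
#!/usr/bin/env bash
# Count SYN, SYN-ACK, and RST packets in tcpdump text output read on stdin.
# count_tcp_flags is an illustrative helper; it relies on the "Flags [..]," field.
count_tcp_flags() {
    awk '
        /Flags \[S\],/    { syn++ }
        /Flags \[S\.\],/  { synack++ }
        /Flags \[R\.?\],/ { rst++ }
        END { printf "syn=%d synack=%d rst=%d\n", syn, synack, rst }
    '
}

# Illustrative input: two SYNs to the same port and no reply.
printf '%s\n' \
    '12:00:01.000000 IP 192.168.1.100.51000 > 203.0.113.10.443: Flags [S], seq 1, win 64240, length 0' \
    '12:00:02.000000 IP 192.168.1.100.51000 > 203.0.113.10.443: Flags [S], seq 1, win 64240, length 0' \
    | count_tcp_flags

# On a live system: sudo tcpdump -i eth0 -nn -c 50 'tcp port 443' | count_tcp_flags
```

Many SYNs with zero SYN-ACKs point toward dropped traffic, while a nonzero RST count points toward a closed port or a service refusing connections.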

Common Network Issues and Solutions

No Connectivity to Local Subnet

  • Symptoms: Can’t ping local devices; ip addr shows no IP.
  • Diagnosis: Check if DHCP is working (sudo dhclient -v eth0 for verbose output).
  • Fix:
    • Renew DHCP lease: sudo dhclient -r eth0 && sudo dhclient eth0.
    • Assign a static IP: Edit /etc/netplan/*.yaml (Ubuntu) or /etc/sysconfig/network-scripts/ifcfg-eth0 (RHEL).

No Internet Access (Gateway/ISP Issues)

  • Symptoms: Local ping works, but ping 8.8.8.8 fails.
  • Diagnosis: Check gateway reachability (ping 192.168.1.1). If gateway is down, reboot the router.
  • Fix:
    • Verify gateway in ip route; re-add if missing: sudo ip route add default via 192.168.1.1 dev eth0.
    • Contact ISP if gateway is up but external ping fails.

DNS Resolution Failures

  • Symptoms: ping 8.8.8.8 works, but ping google.com fails.
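  • Diagnosis: dig google.com times out ("no servers could be reached") or returns SERVFAIL. An NXDOMAIN status means the server answered that the name does not exist, so seeing it for a well-known domain points to a broken or filtering resolver.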
  • Fix:
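    • Add reliable DNS servers to /etc/resolv.conf: nameserver 8.8.8.8 and nameserver 8.8.4.4. Where this file is generated by systemd-resolved or NetworkManager, manual edits may be overwritten, so set the DNS servers through that service instead.
    • Restart systemd-resolved (on systems that use it): sudo systemctl restart systemd-resolved.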

Slow Network Performance

  • Symptoms: High latency; mtr shows packet loss on hops.
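  • Diagnosis: Use mtr google.com to identify problematic routers. Loss that appears at one hop but not at the hops after it usually reflects ICMP rate limiting on that router rather than a real fault; loss that persists through to the final hop is the meaningful signal.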
  • Fix:
    • Check for bandwidth saturation: iftop or nload.
    • Replace faulty cables; move Wi-Fi devices closer to the router.

Port Blocking or Service Unavailability

  • Symptoms: Service (e.g., SSH) is running but unreachable.
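  • Diagnosis: Check ss -tuln. If the service is not listed on the expected port, it is not listening there (or is bound only to 127.0.0.1). If it is listening but telnet 192.168.1.100 22 from another host fails, a firewall is likely blocking the port.
  • Fix:
    • Ensure the service is running: sudo systemctl status sshd (the unit is named ssh on Debian/Ubuntu).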
    • Open the port in the firewall: sudo ufw allow 22/tcp.
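
Where telnet is not installed, a TCP connection attempt can be made with bash alone. The sketch below swaps telnet for bash's built-in /dev/tcp pseudo-device, wrapped in timeout so that a filtered port does not hang the check; port_open is a hypothetical helper name, and it assumes bash and coreutils timeout are available.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# port_open is an illustrative helper; /dev/tcp is a bash feature, not a real file.
port_open() {
    local host port
    host=$1; port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example check against the local SSH port; the result depends on the machine.
if port_open 127.0.0.1 22; then
    echo "22/tcp reachable"
else
    echo "22/tcp closed or filtered"
fi
```

Running the check both on the server itself and from a remote host helps separate "service not listening" from "firewall in between".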

IP Address Conflicts

  • Symptoms: Intermittent connectivity; arping receives replies for the same IP address from more than one MAC address.
  • Diagnosis:
    sudo arping -I eth0 192.168.1.100  # Check for conflicting MACs
  • Fix: Assign a static IP outside the DHCP range; restart the DHCP server.
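
Spotting a conflict means counting how many different MAC addresses answered for one IP. This is a rough sketch that assumes the reply lines put the MAC in square brackets, as iputils arping does; distinct_macs is a hypothetical helper name and the sample replies are illustrative.

```shell
#!/usr/bin/env bash
# Count the distinct MAC addresses that appear in arping reply lines (stdin).
# distinct_macs is an illustrative helper; it assumes MACs are printed as [aa:bb:cc:dd:ee:ff].
distinct_macs() {
    grep -oE '\[([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\]' \
        | tr 'A-F' 'a-f' | sort -u | wc -l | tr -d ' '
}

# Illustrative input: two hosts answering for the same IP address.
printf '%s\n' \
    'Unicast reply from 192.168.1.100 [52:54:00:12:34:56]  0.789ms' \
    'Unicast reply from 192.168.1.100 [52:54:00:AB:CD:EF]  0.912ms' \
    | distinct_macs

# On a live system: sudo arping -c 3 -I eth0 192.168.1.100 | distinct_macs
```

A result greater than 1 means at least two devices claim the address.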

Best Practices for Network Troubleshooting on Linux

  1. **Document the Network Top