We are debugging the logging of some services, such as the moving_data_to_hdfs.service, on our RHEL 7.6 machines.

We ran the following command:

journalctl -u moving_data_to_hdfs.service >/log.txt 

From the journalctl logs, we observed many exceptions like:

Temporary failure in name resolution 

However, when we check the resolution for all hostnames and IP addresses, we do not find any issues.

We also prepared a bash script that checks, in a loop, the resolution of all IPs and hostnames in our cluster, and the results are fine.

Example:

host <hostname>
host <xxx.xxx.xxx.xxx>

So, referring back to the journalctl logs, which are complaining about DNS resolution, we want to understand:

How does systemctl test DNS? Or what approach does systemctl use to test the resolution of hostnames or IP addresses?

1 Answer

How does systemctl test DNS?

It doesn't. Systemctl doesn't touch DNS at all; its job is to start services. (Or more specifically, it talks to systemd to start services.)

And the specific message "Failed to establish a new connection" is not even from systemctl. It's from python-urllib.

To investigate, run your bash script as a systemd service. Ideally, modify the same service, or copy the script into a new unit with identical parameters (especially the various "hardening" parameters), because you want to test from the same environment. Even an arbitrary systemd service (e.g. systemd-run --shell) can still make a useful test. For that matter, try starting your regular program from systemd-run --shell, since that runs it inside a service context.

Note that host is a direct DNS client, so it bypasses the actual Linux "hostname lookup" functions that python-urllib will normally be using. Instead, test using getent hosts or socket.getaddrinfo().

getent hosts example.com
getent ahosts example.com
getent -s dns hosts example.com
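For the socket.getaddrinfo() route, a minimal sketch might look like the following. It assumes Python 3 is available on the machine; the function name resolve is hypothetical and chosen only for illustration. It separates EAI_AGAIN, which is the error code behind "Temporary failure in name resolution", from other lookup errors.

```python
import socket


def resolve(name):
    """Look up `name` through getaddrinfo(), the libc path most programs use.

    Returns (addresses, error); exactly one of the two is None.
    """
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # EAI_AGAIN is reported as "Temporary failure in name resolution".
        kind = "temporary" if exc.errno == socket.EAI_AGAIN else "other"
        return None, "{} failure: {}".format(kind, exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved IP address as a string.
    return sorted({info[4][0] for info in infos}), None
```

Calling resolve() on the names your service actually connects to, from inside the service context, should show whether the lookup itself fails there.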

Again, run these tests from an environment as close to what your program sees as possible. (For example, there briefly was a dbus-broker issue which prevented services from talking to systemd-resolved, while it worked perfectly fine from a non-service context. SELinux is another commonly encountered difference.)
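Because a one-off check can miss an intermittent failure, the lookup can also be repeated and timestamped so the results can be matched against the journal. This is only a sketch, assuming Python 3.6 or later; probe and its parameters are hypothetical names, and the hostnames and interval are for you to supply.

```python
import socket
import time
from datetime import datetime


def probe(names, rounds, interval, resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve every name `rounds` times, pausing `interval` seconds between rounds.

    Returns one timestamped line per lookup.
    """
    lines = []
    for i in range(rounds):
        for name in names:
            stamp = datetime.now().isoformat(timespec="seconds")
            try:
                resolver(name, None)
                status = "ok"
            except socket.gaierror as exc:
                status = "FAILED: {}".format(exc)
            lines.append("{} {} {}".format(stamp, name, status))
        # No pause is needed after the final round.
        if i + 1 < rounds:
            sleep(interval)
    return lines
```

Printing the returned lines from a process started inside a service context sends them to the journal alongside the failing service's own messages.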

  • BTW, we are not using the systemd-resolved service on our RHEL machines, and not dbus-broker either; what is running is dbus.service, according to systemctl status dbus.service. Commented May 20 at 18:11
  • Another important note: the DNS exceptions happen at a specific time, for example at 12:00 AM. Commented May 20 at 18:30
  • Well, then you need to run your tests at 12:00 AM as well. Maybe your upstream Internet link restarts at 12:00 AM, or maybe your sysadmins apply updates and restart the local DNS server at 12:00 AM. Commented May 21 at 7:40