SELinux Has a UI Problem

If you ever troubleshoot a problem on Red Hat Enterprise Linux (or closely related distros such as Fedora and CentOS), you'll come across dozens of tutorials and articles that tell you to resolve the problem by disabling SELinux. It's not just random blog posts and questionable Stack Overflow answers: I've seen this advice in the documentation and knowledge base articles for enterprise software.

Don't disable SELinux. Really, don't turn off SELinux.

A post on the official CentOS blog asks "[W]hy do articles feel the need to outright deactivate SELinux rather than help readers work through any problems they might have? Is SELinux that hard? Actually, it's really not." It doesn't have anything to do with SELinux being too difficult. SELinux has a UI problem.

When you say user interface (UI) design or user experience (UX), most people immediately think about graphical user interfaces (GUIs). A good UI is just as important for a command line application. If a command requires a lot of cumbersome flags or difficult-to-remember options, a user may use another app or *gasp* a GUI to accomplish the task. A software library also has a UI: the API is the interface through which a developer interacts with the library.

While SELinux is usually invisible to the user, it does have a UI. I'm not talking about the various tools, including some GUIs, that allow administrators to set policy. I'm talking about the feedback SELinux gives users when it blocks something they attempt to do: why the action was blocked, or even that SELinux was the thing that blocked it.

Like any good sysadmin, I periodically run yum update to make sure I'm protected from any newly discovered vulnerabilities, but it is not without some trepidation that I do this. It's not often, but sometimes it breaks something that was working perfectly fine before.

The other day I did yum update on a server running Tomcat (fortunately, I did it on the test server first). Tomcat was working fine, but the web app deployed to Tomcat had stopped working. I tried redeploying the app and restarting Tomcat. Still nothing, so I then checked the Tomcat logs:

org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (IO Error: The Network Adapter could not establish the connection)
Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
Caused by: java.net.ConnectException: Permission denied (connect failed)

There is absolutely nothing here that indicates SELinux is even the problem. Yes, the end of the stack trace says Permission denied, but that sounds like it could mean the database account the web app uses is locked. I verified I could reach the database over the network (I ran a simple query using SQL Developer from my workstation), I verified that the Tomcat server itself could reach the database over the network (I ran a simple query using SQL*Plus on that server), and I made sure the database account used by the web app was not locked.

I then turned to the web app vendor's support site. They had a knowledge base article on the error. It turns out a tomcat_can_network_connect_db SELinux boolean was added in RHEL 7.6. There is nothing special about a RHEL point release. They're basically just new installation media. All you need to do is install RHEL 7.x, and a yum update will get you to the latest point release. Setting aside that one shouldn't expect any breaking changes in a point release, it should have been made abundantly clear to the user when Tomcat couldn't connect to a database that the reason was this new SELinux boolean. This is a UI issue.

Most servers don't have a desktop environment installed. The primary way a user interacts with the server is via SSH (here I mean a user of the server, such as an administrator, not a user of the web app). In the case of servers, logs often are the UI.

Tomcat is where the problem was, so where am I going to look for the issue? In the Tomcat log, of course. A user-friendly UI then would put some kind of indication in the Tomcat log right alongside the database error that the database connection had been blocked by SELinux. I'm sure SELinux logged the policy violation in its own log, but that is only helpful if I know that SELinux is the issue. That's poor UI design on the part of SELinux.
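
For what it's worth, once you do suspect SELinux, the denial can usually be found in the audit log (typically /var/log/audit/audit.log when auditd is running). Here is a minimal sketch of that check, assuming the audit tools are installed and you are running as root. The function name check_selinux_denials is something I made up for illustration, not a standard tool.

```shell
#!/bin/sh
# Sketch: report the SELinux mode and any recent AVC denials.
# check_selinux_denials is a hypothetical helper name, not a standard command.
check_selinux_denials() {
    # Bail out politely on systems without the SELinux/audit tools.
    if ! command -v getenforce >/dev/null 2>&1 || ! command -v ausearch >/dev/null 2>&1; then
        echo "getenforce or ausearch not found; is this an SELinux system?"
        return 0
    fi

    # Prints Enforcing, Permissive, or Disabled.
    echo "SELinux mode: $(getenforce)"

    # AVC denials from roughly the last ten minutes.
    # ausearch exits non-zero when nothing matches or the log is unreadable.
    if ! ausearch -m AVC -ts recent 2>/dev/null; then
        echo "No recent AVC denials found (or not permitted to read the audit log)."
    fi
}

check_selinux_denials
```

Where audit2why is installed, piping the ausearch output through it will often suggest the boolean that needs changing. Of course, all of this only helps after you already suspect SELinux, which is exactly the problem.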

Some might argue the blame lies with Tomcat: Tomcat should log a more helpful error that indicates the issue is SELinux. But Tomcat may not have known that the issue was SELinux. Tomcat runs on a variety of platforms and Linux distros, many of which do not have SELinux, and the Tomcat developers may be unaware of changes to SELinux that could affect Tomcat. SELinux, on the other hand, knew this change would potentially break many Tomcat setups (tomcat is in the name of the boolean). I wasn't using a third-party build of Tomcat either; I was using the Tomcat straight from the Oracle Linux (based on RHEL) repo and journalctl -u tomcat to view the logs. Since everything is logged centrally via systemd, it seems SELinux should be able to log messages there as well. Why else force all logging through systemd? (I am no hater of systemd and actually think creating system services with systemd is a breeze compared to SysV-style init, but I'm not a fan of the way systemd handles logging.)

Why do so many people and vendors just recommend turning off SELinux? Because the SELinux UI is not user-friendly. In fact, the knowledge base article I found on the vendor's support site said the way to resolve the problem was to edit /etc/selinux/config and set SELINUX=disabled. I didn't do that. Since the article already identified the boolean, I turned that specific boolean on with setsebool -P tomcat_can_network_connect_db 1 (the -P flag causes the change to persist after reboots).
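
The targeted fix can be sketched as a small shell function. This is only an illustration, assuming root privileges and the standard getsebool and setsebool tools. The name enable_sebool is hypothetical.

```shell
#!/bin/sh
# Sketch: turn on a single SELinux boolean instead of disabling SELinux.
# enable_sebool is a hypothetical helper name, not a standard command.
enable_sebool() {
    bool="$1"
    if [ -z "$bool" ]; then
        echo "usage: enable_sebool BOOLEAN_NAME"
        return 2
    fi

    # Skip gracefully where the SELinux tools are not installed.
    if ! command -v getsebool >/dev/null 2>&1; then
        echo "getsebool not found; skipping $bool"
        return 0
    fi

    # Show the current state of the boolean before changing it.
    getsebool "$bool" || return 1

    # -P makes the change persist across reboots.
    setsebool -P "$bool" 1 || return 1

    # Show the state again to confirm the change took effect.
    getsebool "$bool"
}

# Example (as root): enable_sebool tomcat_can_network_connect_db
```

The point of the design is scope: one boolean loosens one rule, while SELINUX=disabled throws away every protection on the box.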

This isn't an isolated experience. One time I was trying to configure Apache to serve files from a home directory. There was nothing in the Apache logs to indicate SELinux was the issue. I spent hours fiddling with file permissions and running chmod on directories before discovering the SELinux boolean httpd_enable_homedirs.

I feel the benefits of SELinux outweigh the inconvenience of the poor UI, even though that UI cost me time troubleshooting a database when the problem had nothing to do with the database. I often use open source projects with a poor UI, because I feel the benefits of open source outweigh the bad design (though there are instances where I think the open source solution actually has a UI superior to that of its proprietary alternatives). But it would be nice if SELinux had a better UI, and if we want to win new users to open source, we need to improve our UI. SELinux needs to get better at indicating when it has blocked something, and it needs to do so in the location where the user is going to look for the problem (e.g. in the Tomcat log for Tomcat issues or the Apache log for Apache issues).