Troubleshooting

Start with the obvious: journalctl --since "10 min ago" -p err and the site's own logs in /data/web/<site>/logs/. Most problems show up there immediately.

The site doesn't respond at all

Diagnostic flow:

  1. curl -I http://example.com/ — what does the server actually return?
    • 444 / connection reset (curl reports "Empty reply from server") → no vhost is matching. Check sudo nginx -T 2>/dev/null | grep -A2 server_name.
    • 502 Bad Gateway → nginx is up but FPM isn't reachable. See next case.
    • 404 → vhost is matching but the path or webroot is wrong. Check root in /etc/nginx/sites-available/example.com.conf.
    • timeout → firewall or DNS. sudo ufw status, dig example.com.
  2. sudo nginx -t — is the config valid?
  3. sudo systemctl status nginx — is the daemon running?
  4. sudo tail -F /data/web/example.com/logs/nginx-error.log (repeat the request and watch what nginx logs)
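
The status-to-cause mapping in step 1 can be scripted as a first pass. This is a minimal sketch, not part of the tooling: classify_status and probe are hypothetical names, and it relies on curl printing 000 for %{http_code} when no HTTP response arrives at all (reset, empty reply, timeout, DNS failure).

```shell
# Hypothetical helper: map an HTTP status (as printed by curl) to the next check.
classify_status() {
    case "$1" in
        000) echo "no HTTP response: check vhost match (444/reset), firewall, DNS" ;;
        502) echo "nginx up, FPM unreachable: see 502 Bad Gateway" ;;
        404) echo "vhost matched: check root in the vhost config" ;;
        *)   echo "status $1: run nginx -t and read nginx-error.log" ;;
    esac
}

# Hypothetical wrapper: fetch only the status code, giving up after 10 seconds.
probe() {
    local code
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
    classify_status "$code"
}
```

Usage would be probe http://example.com/. The hint only narrows down where to look; the error log remains the authority.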

502 Bad Gateway

nginx couldn't reach the FPM socket. Check:

# Is the socket file there?
ls -l /data/web/example.com/tmp/php-fpm.sock

# Is the FPM pool actually listening?
sudo ss -lxnp | grep php-fpm.sock

# Is the pool config valid?
sudo php-fpm8.4 -t

# Is the FPM master alive?
sudo systemctl status php8.4-fpm

# Recent FPM errors?
sudo journalctl -u php8.4-fpm --since "10 min ago"

If the socket is missing or has the wrong permissions, reload FPM (sudo systemctl reload php8.4-fpm). If the reload doesn't recreate it, the pool config is bad: fix whatever sudo php-fpm8.4 -t reports, then reload again.

Common cause: a typo in conf/php.ini or in the pool's php_admin_value[X] lines causes php-fpm8.4 -t to fail, and the master won't reload. The old pool stays up, still running the old config, until the next full restart.
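
The socket check and the habit of validating before reloading can be wrapped up as below. A sketch only: check_fpm_socket and safe_reload_fpm are hypothetical names, and stat -c assumes GNU coreutils.

```shell
# Hypothetical helper: report whether PATH exists and is a Unix socket.
check_fpm_socket() {
    local sock="$1"
    if [ ! -e "$sock" ]; then
        echo "missing"
    elif [ ! -S "$sock" ]; then
        echo "not-a-socket"
    else
        # Owner, group and octal mode, to compare with what nginx needs.
        echo "socket $(stat -c '%U:%G %a' "$sock")"
    fi
}

# Hypothetical helper: reload only if the config test passes, so a typo
# is reported up front rather than leaving the old pool in place.
safe_reload_fpm() {
    sudo php-fpm8.4 -t && sudo systemctl reload php8.4-fpm
}
```

Run check_fpm_socket /data/web/example.com/tmp/php-fpm.sock as root; without read access to the per-site directory, an existing socket would be reported as missing.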

"SQLSTATE[HY000] [1045] Access denied" or "Connection refused"

Either the DB credentials in .envtulix don't work (Access denied) or nothing is accepting connections at DB_HOST:DB_PORT (Connection refused). Test directly:

source /data/web/example.com/conf/.envtulix
mariadb -h"$DB_HOST" -P"$DB_PORT" -u"$DB_USER" -p"$DB_PASS" -e "SELECT 1" "$DB_NAME"
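
The client's error number tells the headline cases apart. A rough sketch, assuming the standard MariaDB/MySQL codes (1045 and 1044 for access denied, 2002 and 2003 for connection failures); classify_db_error is a hypothetical name.

```shell
# Hypothetical helper: classify the error text printed by the mariadb client.
classify_db_error() {
    case "$1" in
        *"ERROR 1045"*) echo "credentials: wrong user or password" ;;
        *"ERROR 1044"*) echo "grants: user has no access to this database" ;;
        *"ERROR 2002"*|*"ERROR 2003"*) echo "unreachable: wrong host/port or server down" ;;
        "") echo "no error reported" ;;
        *)  echo "other" ;;
    esac
}
```

One way to feed it is to capture only stderr from the test command above, for example classify_db_error "$(mariadb ... -e "SELECT 1" "$DB_NAME" 2>&1 >/dev/null)", with the connection options filled in as before.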

Possible causes:

"NOAUTH Authentication required" or "NOPERM no permission" from Redis

source /data/web/example.com/conf/.envtulix
redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
    --user "$REDIS_USER" --pass "$REDIS_PASS" \
    -n "$REDIS_DB" PING

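If NOAUTH → the client never authenticated, usually because the password is missing or wrong (redis-cli itself reports a bad user/password pair as WRONGPASS). Reset: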

sudo sed -i "/^user $REDIS_USER /d" /etc/redis/users.acl
sudo bash -c "echo 'user $REDIS_USER on >$REDIS_PASS ~$REDIS_PREFIX:* &$REDIS_PREFIX:* +@all -@dangerous -select +select|$REDIS_DB' >> /etc/redis/users.acl"
sudo redis-cli -a "$REDIS_ADMIN_PASS" ACL LOAD
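
The ACL rule in the reset command is dense, so here it is assembled from named parts. A sketch only: acl_line is a hypothetical helper, the sample values in the usage note are made up, and it assumes the password contains no whitespace, since rules in the ACL file are space-separated.

```shell
# Hypothetical helper: build the users.acl line for one site.
#   on             enable the user
#   >PASS          add PASS as a valid password
#   ~PREFIX:*      key pattern the user may touch
#   &PREFIX:*      pub/sub channel pattern the user may touch
#   +@all -@dangerous   every command except the "dangerous" category
#   -select +select|DB  SELECT is denied except for database DB
acl_line() {
    local user="$1" pass="$2" prefix="$3" db="$4"
    printf 'user %s on >%s ~%s:* &%s:* +@all -@dangerous -select +select|%s\n' \
        "$user" "$pass" "$prefix" "$prefix" "$db"
}
```

For example, acl_line web_a s3cret app 3 prints user web_a on >s3cret ~app:* &app:* +@all -@dangerous -select +select|3.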

If NOPERM → the app is using keys without the prefix, or a command or database the ACL excludes. The ACL only allows keys matching $REDIS_PREFIX:*. Confirm via:

sudo redis-cli -a "$REDIS_ADMIN_PASS" ACL GETUSER "$REDIS_USER"
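
If you have a list of the key names the app touches (from its own debug output, for instance), the prefix rule can be checked offline. A minimal sketch: offending_keys is a hypothetical helper, and it only models the ~$REDIS_PREFIX:* key pattern, not the command or database restrictions.

```shell
# Hypothetical helper: read key names on stdin, print those outside PREFIX:*.
offending_keys() {
    local prefix="$1" key
    while IFS= read -r key; do
        case "$key" in
            "$prefix":*) ;;                 # allowed by ~PREFIX:*
            *) printf '%s\n' "$key" ;;      # would be rejected with NOPERM
        esac
    done
}
```

With the prefix app, a list containing app:a, sess:1 and app:b yields only sess:1.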

Permission denied on log file

Symptom: PHP error log or nginx access log is empty / not being written. Check ownership:

ls -l /data/web/example.com/logs/

All files should be owned by the per-site user (e.g. web_example_com:web_example_com), mode 0640. nginx opens its log files from the master process, which runs as root, so it can write into per-site dirs even without being in the per-site group. PHP-FPM writes as the per-site user. If a previous run created a file as root, fix it:

sudo chown -R web_example_com:web_example_com /data/web/example.com/logs/
sudo chmod 0640 /data/web/example.com/logs/*.log
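
To see which files actually break the rule, before or after the fix, a find filter is enough. A sketch assuming GNU find; audit_logs is a hypothetical name.

```shell
# Hypothetical helper: list regular files directly in DIR that are not
# owned by OWNER or whose mode is not exactly 0640.
audit_logs() {
    local dir="$1" owner="$2"
    find "$dir" -maxdepth 1 -type f \( ! -user "$owner" -o ! -perm 0640 \) -print
}
```

Running sudo is needed for the real directory; audit_logs /data/web/example.com/logs web_example_com printing nothing means every file matches.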

certbot failed during create_vhost

Most common causes:

To retry after fixing the underlying issue:

sudo certbot --nginx -d example.com -d www.example.com
sudo nginx -t && sudo systemctl reload nginx

Then re-run create_vhost.sh example.com --ssl=auto --force to rewrite the vhost in the managed-HTTPS shape.

OPcache filled up / weird stale code after deploy

Symptoms: hit rate drops, OOM restarts climbing, edits aren't taking effect. From the PHP tab check OPcache memory %, wasted %, OOM restarts.

Quick fix:

sudo systemctl reload php8.4-fpm   # invalidates entire opcache

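Long-term: raise opcache.memory_consumption and opcache.max_accelerated_files in conf/php-fpm.conf. For deploy-time invalidation without a full reload, call opcache_reset() from a privileged script served through FPM; the CLI keeps its own separate cache, so running the script with php on the command line does not touch the pool's OPcache.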

Site uses too much RAM / killed by OOM

sudo journalctl -k --since "1 hour ago" | grep -i oom
# Which pool is the most expensive?
sudo ps -eo pid,user,rss,comm | grep php-fpm | sort -k3 -n | tail -20

Mitigations: lower pm.max_children, lower memory_limit, or set pm.max_requests so workers are recycled periodically (default is 500). For a site that legitimately needs more memory, raise memory_limit but lower pm.max_children in proportion, so the worst case (roughly pm.max_children × memory_limit) stays bounded.
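
The ps listing above is per process; summing it per user shows which pool is the expensive one, and the worst-case bound is a single multiplication. A sketch under assumptions: rss_by_user and worst_case_mib are hypothetical names, each pool is assumed to run as its own per-site user, and memory_limit is treated as a rough per-worker ceiling.

```shell
# Hypothetical helper: read "USER RSS_KIB COMM" rows on stdin, sum the RSS of
# php-fpm processes per user, print "MiB user" sorted ascending.
rss_by_user() {
    awk '$3 ~ /php-fpm/ { kib[$1] += $2 }
         END { for (u in kib) printf "%d %s\n", kib[u] / 1024, u }' | sort -n
}

# Hypothetical helper: worst-case pool footprint in MiB.
# Arguments: pm.max_children, memory_limit in MiB.
worst_case_mib() {
    echo $(( $1 * $2 ))
}
```

Feed it with ps -eo user:32,rss,comm --no-headers | rss_by_user (user:32 keeps long per-site usernames from being truncated). As a worked example, worst_case_mib 20 256 gives 5120, so 20 children at a 256 MiB limit can claim about 5 GiB; doubling the limit to 512 MiB needs pm.max_children halved to 10 to keep the same bound.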

Backup script fails for one site

sudo /usr/local/sbin/tulixhost-backup_site example.com --reason=debug

Look at the output. Common failure modes:

Restore fails partway

Restore is not atomic. If it fails in the middle, the site is in a partial state:

I removed a site but it's still in nginx -T output

Stale symlink. Check both:

ls -la /etc/nginx/sites-enabled/ /etc/nginx/sites-available/ | grep example.com

Remove leftovers and reload:

sudo rm -f /etc/nginx/sites-enabled/example.com.conf /etc/nginx/sites-available/example.com.conf
sudo nginx -t && sudo systemctl reload nginx

How to tell what's actually running

nginx -V                    # nginx build options
sudo nginx -T 2>&1 | less   # full effective nginx config
php -v                      # PHP CLI version
php -m                      # loaded modules
sudo php-fpm8.4 -tt         # FPM config including all pools
mariadb --version
redis-server --version
sudo systemctl list-units --state=running 'php*' 'nginx' 'mariadb' 'redis*'