Troubleshooting
Start with the obvious: journalctl --since "10 min ago" -p err and the site's
own logs in /data/web/<site>/logs/. Most problems show up there immediately.
The site doesn't respond at all
Diagnostic flow:
- curl -I http://example.com/: what does the server actually return?
  - 444 / connection reset → no vhost is matching. Check
    nginx -T 2>/dev/null | grep -A2 server_name.
  - 502 Bad Gateway → nginx is up but FPM isn't reachable. See next case.
  - 404 → vhost is matching but the path or webroot is wrong. Check
    root in /etc/nginx/sites-available/example.com.conf.
  - timeout → firewall or DNS. Check sudo ufw status and dig example.com.
- sudo nginx -t: is the config valid?
- sudo systemctl status nginx: is the daemon running?
- tail -F /data/web/example.com/logs/nginx-error.log
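The branching on the curl result can be scripted. This is a minimal sketch, assuming bash and curl; the helper names next_step and probe are hypothetical, and the hint strings only restate the cases listed above. Note that nginx's 444 closes the connection without sending a response, so the client sees no status at all and curl reports 000.

```shell
#!/usr/bin/env bash
# Map an HTTP status (as printed by curl's %{http_code}) to the next check.
# curl prints 000 when it received no HTTP response (reset, timeout, DNS failure);
# an nginx "return 444" also lands here because nothing is sent back.
next_step() {
  case "$1" in
    000) echo "no HTTP response: check vhost matching, firewall, DNS" ;;
    502) echo "FPM unreachable: check the FPM socket" ;;
    404) echo "vhost matches: check root in the vhost config" ;;
    *)   echo "status $1: read the nginx error log" ;;
  esac
}

# Hypothetical wrapper: fetch only the status code, give up after 10 seconds.
probe() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
  next_step "$code"
}
```

Usage: probe http://example.com/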
502 Bad Gateway
nginx couldn't reach the FPM socket. Check:
# Is the socket file there?
ls -l /data/web/example.com/tmp/php-fpm.sock
# Is the FPM pool actually listening?
sudo ss -lxnp | grep php-fpm.sock
# Is the pool config valid?
sudo php-fpm8.4 -t
# Is the FPM master alive?
sudo systemctl status php8.4-fpm
# Recent FPM errors?
sudo journalctl -u php8.4-fpm --since "10 min ago"
If the socket is missing or has the wrong permissions, reload FPM. If reload doesn't fix it, the pool config is bad — fix and try again.
Common cause: a typo in conf/php.ini or in the pool's
php_admin_value[X] lines causes php-fpm -t to fail and the master
won't reload. The OLD pool stays up until the next full restart.
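Because a failed config test leaves the old pool running, it can help to gate every reload on the test. A sketch, assuming bash; reload_if_valid is a hypothetical helper, and both commands are passed in as strings so the same gate could be reused for nginx.

```shell
#!/usr/bin/env bash
# Run a config test; reload only if the test passes.
# $1: test command, $2: reload command (each executed via bash -c).
reload_if_valid() {
  local test_cmd=$1 reload_cmd=$2
  if bash -c "$test_cmd"; then
    bash -c "$reload_cmd" && echo "reloaded"
  else
    echo "config test failed; not reloading" >&2
    return 1
  fi
}
```

Usage: reload_if_valid "sudo php-fpm8.4 -t" "sudo systemctl reload php8.4-fpm"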
"SQLSTATE[HY000] [1045] Access denied" or "Connection refused"
The DB credentials in .envtulix don't work. Test directly:
source /data/web/example.com/conf/.envtulix
mariadb -h"$DB_HOST" -P"$DB_PORT" -u"$DB_USER" -p"$DB_PASS" -e "SELECT 1" "$DB_NAME"
Possible causes:
- Password drift. The DB user's password got changed but .envtulix still has the old one. Fix:
  sudo mariadb -e "ALTER USER '$DB_USER'@'127.0.0.1' IDENTIFIED BY '$DB_PASS';"
- Connecting from the wrong host. The user is granted on '@127.0.0.1', not localhost.
  PHP-FPM uses 127.0.0.1 via the DSN. Ad-hoc mariadb without -h uses the socket,
  which is a different identity.
- MariaDB stopped. Check sudo systemctl status mariadb.
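Before blaming the database, it is worth confirming that .envtulix actually defines every variable the test command uses. A sketch, assuming bash; missing_db_vars is a hypothetical helper, and the file is sourced inside a subshell so nothing leaks into the calling session.

```shell
#!/usr/bin/env bash
# Print the names of DB_* variables that are unset or empty in an env file.
# The function body is a subshell: sourced values vanish when it returns.
missing_db_vars() (
  unset DB_HOST DB_PORT DB_USER DB_PASS DB_NAME
  . "$1" || exit 1
  for name in DB_HOST DB_PORT DB_USER DB_PASS DB_NAME; do
    # ${!name:-} is bash indirect expansion with an empty default.
    [ -n "${!name:-}" ] || echo "$name"
  done
)
```

Usage: missing_db_vars /data/web/example.com/conf/.envtulix prints nothing when all five variables are set.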
"NOAUTH Authentication required" or "NOPERM no permission" from Redis
source /data/web/example.com/conf/.envtulix
redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
--user "$REDIS_USER" --pass "$REDIS_PASS" \
-n "$REDIS_DB" PING
If NOAUTH → password is wrong. Reset:
sudo sed -i "/^user $REDIS_USER /d" /etc/redis/users.acl
sudo bash -c "echo 'user $REDIS_USER on >$REDIS_PASS ~$REDIS_PREFIX:* &$REDIS_PREFIX:* +@all -@dangerous -select +select|$REDIS_DB' >> /etc/redis/users.acl"
sudo redis-cli -a "$REDIS_ADMIN_PASS" ACL LOAD
If NOPERM → app is using keys without the prefix. The ACL only allows keys matching
$REDIS_PREFIX:*. Confirm via:
sudo redis-cli -a "$REDIS_ADMIN_PASS" ACL GETUSER "$REDIS_USER"
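The NOPERM case comes down to whether the key name matches the ACL's key pattern. A rough local check, assuming bash; key_allowed is a hypothetical helper that mirrors only the ~$REDIS_PREFIX:* pattern and is not a substitute for Redis's own ACL evaluation.

```shell
#!/usr/bin/env bash
# Return 0 if key $1 starts with "<prefix>:", as the ~prefix:* pattern requires.
key_allowed() {
  local key=$1 prefix=$2
  case "$key" in
    # The quoted prefix is matched literally; only the trailing * is a glob.
    "$prefix":*) return 0 ;;
    *)           return 1 ;;
  esac
}
```

Usage: key_allowed "$key" "$REDIS_PREFIX" || echo "key would be rejected by the ACL"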
Permission denied on log file
Symptom: PHP error log or nginx access log is empty / not being written. Check ownership:
ls -l /data/web/example.com/logs/
All files should be owned by the per-site user (e.g. web_example_com:web_example_com),
mode 0640. nginx writes via its own master process (root) so it can write into per-site dirs even
without being in the per-site group. PHP-FPM writes as the per-site user. If a previous run created
a file as root, fix it:
sudo chown -R web_example_com:web_example_com /data/web/example.com/logs/
sudo chmod 0640 /data/web/example.com/logs/*.log
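To see which files are off before changing anything, the ownership and mode rule can be expressed as a find filter. A sketch, assuming bash and a find that supports -maxdepth; bad_log_files is a hypothetical helper.

```shell
#!/usr/bin/env bash
# List regular files directly under $1 that are not owned by user $2
# or whose mode is not exactly 0640.
bad_log_files() {
  local dir=$1 owner=$2
  find "$dir" -maxdepth 1 -type f \( ! -user "$owner" -o ! -perm 640 \) -print
}
```

Usage: bad_log_files /data/web/example.com/logs web_example_com prints nothing when every log file is correct.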
certbot failed during create_vhost
Most common causes:
- DNS isn't pointed yet. The hostname must resolve to this server's public IP
  before Let's Encrypt can issue. Check with dig +short example.com @1.1.1.1.
- Port 80 is blocked. Even if you're going to redirect to HTTPS,
  certbot needs an HTTP-01 challenge over plain port 80.
  Run sudo ufw status; you should see "Nginx Full ALLOW".
- Cloudflare proxy ON. The HTTP-01 challenge can't reach origin. Either
  temporarily turn off the orange-cloud, or use DNS-01 via
  certbot --dns-cloudflare.
- Rate limit. Let's Encrypt rate-limits per registered domain. If you've been
  retrying repeatedly, wait an hour or use staging certs for testing.
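The DNS check above can be turned into a yes/no test by comparing the resolver's answers with the server's public IP. A sketch, assuming bash and grep; resolves_to is a hypothetical helper, and 203.0.113.10 in the usage comment is a documentation placeholder, not a real address.

```shell
#!/usr/bin/env bash
# Return 0 if IP $2 appears as a whole line in resolver output $1.
# -x matches whole lines and -F disables regex, so 203.0.113.1 does not match 203.0.113.10.
resolves_to() {
  local answers=$1 ip=$2
  printf '%s\n' "$answers" | grep -qxF -- "$ip"
}

# Example (placeholder IP):
#   resolves_to "$(dig +short example.com @1.1.1.1)" "203.0.113.10" \
#     || echo "DNS is not pointed at this server yet"
```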
To retry after fixing the underlying issue:
sudo certbot --nginx -d example.com -d www.example.com
sudo nginx -t && sudo systemctl reload nginx
Then re-run create_vhost.sh example.com --ssl=auto --force to rewrite the vhost in
the managed-HTTPS shape.
OPcache filled up / weird stale code after deploy
Symptoms: hit rate drops, OOM restarts climbing, edits aren't taking effect. From the PHP tab check OPcache memory %, wasted %, OOM restarts.
Quick fix:
sudo systemctl reload php8.4-fpm # invalidates entire opcache
Long-term: raise opcache.memory_consumption and opcache.max_accelerated_files
in conf/php-fpm.conf. For deploy-time invalidation without a full reload, call
opcache_reset() from a privileged script served through FPM; calling it from the
PHP CLI does not touch the pool's cache.
Site uses too much RAM / killed by OOM
sudo journalctl -k --since "1 hour ago" | grep -i oom
# Which pool is the most expensive?
sudo ps -eo pid,user,rss,comm | grep php-fpm | sort -k3 -n | tail -20
Mitigations: lower pm.max_children, lower memory_limit, or set
pm.max_requests so workers are recycled periodically (default is 500). For a site
that legitimately needs more memory, raise memory_limit but lower
pm.max_children in proportion so the worst case stays bounded.
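The trade-off is simple arithmetic: the worst case is roughly pm.max_children times memory_limit. A sketch, assuming bash; max_children_for is a hypothetical helper, the figures are illustrative, and memory_limit bounds PHP's own allocations per request, so real worker RSS can differ somewhat.

```shell
#!/usr/bin/env bash
# Largest pm.max_children such that max_children * memory_limit <= budget.
# Both arguments are in MB; integer division rounds down.
max_children_for() {
  local budget_mb=$1 memory_limit_mb=$2
  echo $(( budget_mb / memory_limit_mb ))
}
```

With a 2048 MB budget for the pool, max_children_for 2048 256 gives 8; doubling memory_limit to 512 halves that to 4.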
Backup script fails for one site
sudo /usr/local/sbin/tulixhost-backup_site example.com --reason=debug
Look at the output. Common failure modes:
- mysqldump access denied → the DB password in .envtulix doesn't match the live
  DB user. See "Access denied" section above.
- Redis DUMP returns NOAUTH → the Redis admin password in
  /etc/tulixhost/tulixhost.conf doesn't match the live requirepass. Likely happens
  if redis.conf was edited by hand. Fix:
  sudo grep -E '^requirepass|^REDIS_ADMIN' /etc/redis/redis.conf /etc/tulixhost/tulixhost.conf
  Update one to match the other.
- Disk full → check df -h /data. Rotation may have already kicked in but the new
  backup is mid-write; clear space and retry.
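For the disk-full case, the usage percentage can be pulled out of df so a wrapper can refuse to start a backup on a nearly full volume. A sketch, assuming bash and awk; disk_use_pct is a hypothetical helper, and the 90 percent threshold in the usage comment is an arbitrary example.

```shell
#!/usr/bin/env bash
# Read `df -P` output on stdin and print the use percentage of the first
# filesystem as a bare number (the fifth column of the second line).
disk_use_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (threshold is an assumption):
#   pct=$(df -P /data | disk_use_pct)
#   [ "$pct" -lt 90 ] || echo "/data is ${pct}% full; clear space before retrying"
```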
Restore fails partway
Restore is not atomic. If it fails in the middle, the site is in a partial state:
- If the failure was before DB load → DB exists with the new user but no schema.
  Re-run the same restore_site.sh command; it's idempotent up through DB user creation.
- If the failure was during DB load → schema is partially loaded. Drop and recreate:
  sudo mariadb -e "DROP DATABASE web_example_com;"
  sudo /usr/local/sbin/tulixhost-restore_site /data/backups/example.com/latest.tar.gz
- If the failure was during Redis replay → keys are partially loaded. Wipe and retry:
  source /data/web/example.com/conf/.envtulix
  redis-cli -u "$REDIS_URL" --scan --pattern "$REDIS_PREFIX:*" \
    | xargs -r -n100 redis-cli -u "$REDIS_URL" DEL
I removed a site but it's still in nginx -T output
Stale symlink. Check both:
ls -la /etc/nginx/sites-enabled/ /etc/nginx/sites-available/ | grep example.com
Remove leftovers and reload:
sudo rm -f /etc/nginx/sites-enabled/example.com.conf /etc/nginx/sites-available/example.com.conf
sudo nginx -t && sudo systemctl reload nginx
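The grep above also matches names that merely contain the domain, such as www.example.com.conf. A stricter lookup, sketched here assuming bash and a find that supports -maxdepth; site_leftovers is a hypothetical helper that matches the exact file name and reports both regular files and symlinks.

```shell
#!/usr/bin/env bash
# Print files or symlinks named exactly "<site>.conf" in the given directories.
# $1: site name; remaining arguments: directories to search.
site_leftovers() {
  local site=$1
  shift
  find "$@" -maxdepth 1 \( -type f -o -type l \) -name "${site}.conf" -print
}
```

Usage: site_leftovers example.com /etc/nginx/sites-enabled /etc/nginx/sites-available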
How to tell what's actually running
nginx -V # nginx build options
nginx -T 2>&1 | less # full effective nginx config
php -v # PHP CLI version
php -m # loaded modules
sudo php-fpm8.4 -tt # FPM config including all pools
mariadb --version
redis-server --version
sudo systemctl list-units --state=running 'php*' 'nginx' 'mariadb' 'redis*'