Troubleshooting

Start with the obvious: journalctl --since "10 min ago" -p err and the site's own logs in /data/web/<site>/logs/. Most problems show up there immediately.

The site doesn't respond at all

Diagnostic flow:

1. curl -I http://example.com/ to see what the server actually returns:
   - 444 / connection reset (curl reports an empty reply, because nginx closes the connection without answering) → no vhost is matching. Check sudo nginx -T 2>/dev/null | grep -A2 server_name.
   - 502 Bad Gateway → nginx is up but FPM isn't reachable. See next case.
   - 404 → the vhost is matching but the path or webroot is wrong. Check root in /etc/nginx/sites-available/example.com.conf.
   - timeout → firewall or DNS. Check sudo ufw status and dig example.com.
2. sudo nginx -t to confirm the config is valid.
3. sudo systemctl status nginx to confirm the daemon is running.
4. tail -F /data/web/example.com/logs/nginx-error.log while you repeat the request, to see errors as they are logged.
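
The branching in the flow above can be sketched as a small helper. This is a minimal sketch, not part of the toolset: the function names classify_status and probe_site are hypothetical, and it relies on curl printing 000 for %{http_code} when no HTTP response arrived (empty reply, refused connection, or timeout).

```shell
# Hypothetical helper: map an HTTP status (as printed by curl) to the next check.
classify_status() {
  case "$1" in
    000)     echo "no HTTP response: check vhost matching, firewall, and DNS" ;;
    502)     echo "nginx is up but FPM is unreachable: check the FPM socket" ;;
    404)     echo "vhost matches but path or webroot is wrong: check root" ;;
    2??|3??) echo "site is responding" ;;
    *)       echo "unexpected status $1: read nginx-error.log" ;;
  esac
}

# Hypothetical wrapper: request only the headers and classify the result.
# curl prints 000 for %{http_code} when it received no HTTP response.
probe_site() {
  code=$(curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' "http://$1/")
  classify_status "${code:-000}"
}
```

Usage would be probe_site example.com; the printed hint points at the matching case in this document.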

502 Bad Gateway

nginx couldn't reach the FPM socket. Check:

# Is the socket file there?
ls -l /data/web/example.com/tmp/php-fpm.sock

# Is the FPM pool actually listening?
sudo ss -lxnp | grep php-fpm.sock

# Is the pool config valid?
sudo php-fpm8.4 -t

# Is the FPM master alive?
sudo systemctl status php8.4-fpm

# Recent FPM errors?
sudo journalctl -u php8.4-fpm --since "10 min ago"

If the socket is missing or has the wrong permissions, reload FPM with sudo systemctl reload php8.4-fpm. If the reload doesn't fix it, the pool config is bad: fix it, re-run sudo php-fpm8.4 -t, and reload again.

Common cause: a typo in conf/php.ini or in the pool's php_admin_value[X] lines makes php-fpm8.4 -t fail, so the master won't reload. The old pool keeps running with the old settings until the next full restart.
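
Because a failed reload leaves the old pool running without any visible error, it can help to gate the reload on the config test. A minimal sketch, assuming the php-fpm8.4 binary and php8.4-fpm unit names used above; the function name fpm_safe_reload is hypothetical:

```shell
# Hypothetical helper: reload PHP-FPM only if its config test passes.
# Argument: PHP version (defaults to 8.4, matching the commands above).
fpm_safe_reload() {
  ver="${1:-8.4}"
  if ! sudo "php-fpm$ver" -t; then
    echo "php-fpm$ver -t failed: fix the config before reloading" >&2
    return 1
  fi
  sudo systemctl reload "php$ver-fpm"
}
```

Calling fpm_safe_reload with no argument tests and reloads the 8.4 pool set; a non-zero exit means the config test failed and nothing was reloaded.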

"SQLSTATE[HY000] [1045] Access denied" or "Connection refused"

The DB credentials in .envtulix don't work, or MariaDB isn't accepting connections at all. Test directly:

source /data/web/example.com/conf/.envtulix
mariadb -h"$DB_HOST" -P"$DB_PORT" -u"$DB_USER" -p"$DB_PASS" -e "SELECT 1" "$DB_NAME"

Possible causes:

- Password drift. The DB user's password was changed but .envtulix still has the old one. Reset the DB user to the password in .envtulix (this reuses the variables sourced above):
```
sudo mariadb -e "ALTER USER '$DB_USER'@'127.0.0.1' IDENTIFIED BY '$DB_PASS';"
```
- Connecting from the wrong host. The user is granted on '@127.0.0.1', not localhost. PHP-FPM connects to 127.0.0.1 via the DSN. An ad-hoc mariadb call without -h uses the socket, which MariaDB treats as a different identity ('user'@'localhost').
- MariaDB stopped. Check sudo systemctl status mariadb.
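
An unset variable makes the test command fail in confusing ways; an empty $DB_PASS, for example, turns -p into an interactive password prompt. A minimal sketch of a pre-check, assuming .envtulix is a plain shell file that sets the variables used above; the helper name require_vars is hypothetical:

```shell
# Hypothetical helper: report which of the named variables are unset or empty.
# Returns 0 when all are set, 1 otherwise.
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup; the names come from the caller, not from user input.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}
```

After sourcing the file, require_vars DB_HOST DB_PORT DB_USER DB_PASS DB_NAME should return 0 before the mariadb test is worth running.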

"NOAUTH Authentication required" or "NOPERM no permission" from Redis

source /data/web/example.com/conf/.envtulix
redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
    --user "$REDIS_USER" --pass "$REDIS_PASS" \
    -n "$REDIS_DB" PING

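If NOAUTH → the password is wrong (redis-cli may also print WRONGPASS when the AUTH itself is rejected). Reset the ACL entry; REDIS_ADMIN_PASS is the Redis admin password from /etc/tulixhost/tulixhost.conf and must be set in your shell first: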

sudo sed -i "/^user $REDIS_USER /d" /etc/redis/users.acl
sudo bash -c "echo 'user $REDIS_USER on >$REDIS_PASS ~$REDIS_PREFIX:* &$REDIS_PREFIX:* +@all -@dangerous -select +select|$REDIS_DB' >> /etc/redis/users.acl"
sudo redis-cli -a "$REDIS_ADMIN_PASS" ACL LOAD
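
The ACL line in the reset command is easy to break with nested shell quoting. A minimal sketch that builds the same rule with printf: on enables the user, >password sets the password, ~pattern limits keys, &pattern limits pub/sub channels, and the remaining tokens are command permissions. The helper name redis_acl_line is hypothetical, and the sketch assumes the password contains no spaces.

```shell
# Hypothetical helper: print the users.acl line for one site.
# Arguments: user, password, key prefix, database number.
redis_acl_line() {
  printf 'user %s on >%s ~%s:* &%s:* +@all -@dangerous -select +select|%s\n' \
    "$1" "$2" "$3" "$3" "$4"
}
```

The output can be appended with redis_acl_line "$REDIS_USER" "$REDIS_PASS" "$REDIS_PREFIX" "$REDIS_DB" | sudo tee -a /etc/redis/users.acl >/dev/null, followed by the same ACL LOAD.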

If NOPERM → app is using keys without the prefix. The ACL only allows keys matching $REDIS_PREFIX:*. Confirm via:

sudo redis-cli -a "$REDIS_ADMIN_PASS" ACL GETUSER "$REDIS_USER"

Permission denied on log file

Symptom: PHP error log or nginx access log is empty / not being written. Check ownership:

ls -l /data/web/example.com/logs/

All files should be owned by the per-site user (e.g. web_example_com:web_example_com), mode 0640. nginx writes via its own master process (root) so it can write into per-site dirs even without being in the per-site group. PHP-FPM writes as the per-site user. If a previous run created a file as root, fix it:

sudo chown -R web_example_com:web_example_com /data/web/example.com/logs/
sudo chmod 0640 /data/web/example.com/logs/*.log
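
To see which files are actually off before changing anything, a find expression can list only the offenders. A minimal sketch, assuming the owner and 0640 mode described above; the helper name log_perm_offenders is hypothetical:

```shell
# Hypothetical helper: list log files whose owner or mode differs from the expected one.
# Arguments: log directory, expected owner (name or numeric uid).
log_perm_offenders() {
  find "$1" -type f \( ! -user "$2" -o ! -perm 0640 \) -print
}
```

For example, sudo-running log_perm_offenders /data/web/example.com/logs web_example_com prints nothing when every file is already correct.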

certbot failed during create_vhost

Most common causes:

- DNS isn't pointed yet. The hostname must resolve to this server's public IP before Let's Encrypt can issue. Check with dig +short example.com @1.1.1.1.
- Port 80 is blocked. Even if you're going to redirect to HTTPS, certbot needs to answer the HTTP-01 challenge over plain port 80. Run sudo ufw status; you should see "Nginx Full ALLOW".
- Cloudflare proxy ON. The HTTP-01 challenge can't reach origin. Either temporarily turn off the orange cloud, or use DNS-01 via certbot --dns-cloudflare.
- Rate limit. Let's Encrypt rate-limits per registered domain. If you've been retrying repeatedly, wait an hour or use staging certs (certbot --staging) for testing.
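
The DNS check can be turned into a yes/no test before retrying certbot. A minimal sketch, assuming you pass in the server's public IP yourself (203.0.113.10 in the usage note is a placeholder documentation address); the helper name dns_points_here is hypothetical:

```shell
# Hypothetical helper: succeed if HOST has an A record equal to EXPECTED_IP.
# Arguments: hostname, expected IPv4 address.
dns_points_here() {
  # -F: fixed string, -x: whole line must match, -q: no output
  dig +short "$1" A @1.1.1.1 | grep -Fxq "$2"
}
```

Something like dns_points_here example.com 203.0.113.10 && echo "DNS ok" keeps a retry from spending a Let's Encrypt attempt on a hostname that still resolves elsewhere.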

To retry after fixing the underlying issue:

sudo certbot --nginx -d example.com -d www.example.com
sudo nginx -t && sudo systemctl reload nginx

Then re-run create_vhost.sh example.com --ssl=auto --force to rewrite the vhost in the managed-HTTPS shape.

OPcache filled up / weird stale code after deploy

Symptoms: hit rate drops, OOM restarts climbing, edits aren't taking effect. From the PHP tab check OPcache memory %, wasted %, OOM restarts.

Quick fix:

sudo systemctl reload php8.4-fpm   # invalidates entire opcache

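Long-term: raise opcache.memory_consumption and opcache.max_accelerated_files in conf/php-fpm.conf. For deploy-time invalidation without a full reload, call opcache_reset() from a privileged (access-restricted) script that runs inside the same FPM pool; calling it from the PHP CLI only resets the CLI's own cache, not FPM's.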

Site uses too much RAM / killed by OOM

sudo journalctl -k --since "1 hour ago" | grep -i oom
# Which pool is the most expensive?
sudo ps -eo pid,user,rss,comm | grep php-fpm | sort -k3 -n | tail -20
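
The listing above shows individual workers; to compare pools, the same data can be summed per user, since each pool runs as its per-site user. A minimal sketch; the helper name rss_by_user is hypothetical, and note that the FPM master itself shows up under root:

```shell
# Hypothetical helper: sum RSS (KiB) of php-fpm processes per user.
# Reads "user rss comm" lines on stdin, as printed by: ps -eo user,rss,comm
rss_by_user() {
  awk '$3 ~ /php-fpm/ { sum[$1] += $2 }
       END { for (u in sum) printf "%s %d\n", u, sum[u] }' | sort -k2 -n
}
```

Run it as ps -eo user,rss,comm | rss_by_user; the last line is the most expensive pool.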

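Mitigations: lower pm.max_children, lower memory_limit, or set pm.max_requests so workers are recycled after a fixed number of requests (PHP-FPM's built-in default is 0, meaning never; 500 applies only if the pool config sets it). For a site that legitimately needs more memory, raise memory_limit but lower pm.max_children proportionally so the worst case (pm.max_children × memory_limit) stays bounded.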
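
The trade-off between the two settings is plain multiplication. A minimal sketch with hypothetical helper names; it treats memory_limit as the per-worker ceiling, which is an approximation because memory_limit caps PHP's own allocations rather than the whole process RSS:

```shell
# Worst-case pool memory in MiB: every worker at its memory_limit.
# Arguments: pm.max_children, memory_limit in MiB.
fpm_worst_case_mb() { echo $(( $1 * $2 )); }

# Largest pm.max_children that keeps the worst case within a budget.
# Arguments: budget in MiB, memory_limit in MiB.
fpm_max_children_for() { echo $(( $1 / $2 )); }
```

For example, 20 workers at 256 MiB give fpm_worst_case_mb 20 256 = 5120; doubling memory_limit to 512 under the same 5120 MiB budget means fpm_max_children_for 5120 512 = 10 workers.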

Backup script fails for one site

sudo /usr/local/sbin/tulixhost-backup_site example.com --reason=debug

Look at the output. Common failure modes:

- mysqldump access denied → the DB password in .envtulix doesn't match the live DB user. See the "Access denied" section above.
- Redis DUMP returns NOAUTH → the Redis admin password in /etc/tulixhost/tulixhost.conf doesn't match the live requirepass. This usually happens when redis.conf was edited by hand. Compare the two:
```
sudo grep -E '^requirepass|^REDIS_ADMIN' /etc/redis/redis.conf /etc/tulixhost/tulixhost.conf
```
  Update one to match the other.
- Disk full → check df -h /data. Rotation may have already kicked in while the new backup is still mid-write; clear space and retry.

Restore fails partway

Restore is not atomic. If it fails in the middle, the site is in a partial state:

If the failure was before DB load → DB exists with the new user but no schema. Re-run the same restore_site.sh command; it's idempotent up through DB user creation.

If the failure was during DB load → schema is partially loaded. Drop and recreate:

sudo mariadb -e "DROP DATABASE web_example_com;"
sudo /usr/local/sbin/tulixhost-restore_site /data/backups/example.com/latest.tar.gz

If the failure was during Redis replay → keys are partially loaded. Wipe and retry:

source /data/web/example.com/conf/.envtulix
redis-cli -u "$REDIS_URL" --scan --pattern "$REDIS_PREFIX:*" \
  | xargs -r -n100 redis-cli -u "$REDIS_URL" DEL

I removed a site but it's still in nginx -T output

A stale symlink or leftover config file is still present. Check both directories:

ls -la /etc/nginx/sites-enabled/ /etc/nginx/sites-available/ | grep example.com

Remove leftovers and reload:

sudo rm -f /etc/nginx/sites-enabled/example.com.conf /etc/nginx/sites-available/example.com.conf
sudo nginx -t && sudo systemctl reload nginx

How to tell what's actually running

nginx -V                    # nginx build options
sudo nginx -T 2>&1 | less   # full effective nginx config
php -v                      # PHP CLI version
php -m                      # loaded modules
sudo php-fpm8.4 -tt         # FPM config including all pools
mariadb --version
redis-server --version
sudo systemctl list-units --state=running 'php*' 'nginx' 'mariadb' 'redis*'