
Hardening Guide: Securing ComfyUI and AI Inference Servers Against Real-World Threats

Key Takeaways

  • AI inference servers (ComfyUI, Stable Diffusion, Ollama) are high-value targets
  • Never expose an inference instance directly to the internet without authentication
  • Hardening guide: authentication, reverse proxy, network segmentation, GPU monitoring
  • GPUs are expensive resources; unauthorized cryptomining can cost thousands of dollars per day

Why Your AI Inference Servers Are Being Targeted

The cryptomining campaign that compromised more than 1,000 ComfyUI servers in March 2026 highlighted a major blind spot in many organizations’ security postures: AI inference servers do not receive the same protections as the rest of the infrastructure. They are often deployed rapidly by teams focused on model capabilities rather than security, and exposed to the internet with default configurations that provide no authentication.

The value of these servers to attackers is high for a straightforward reason: an A100 or H100 GPU can generate several hundred dollars per day in cryptocurrency mining. Multiplied across dozens or hundreds of compromised servers, the economics of these operations are highly favorable to attackers. The cost to victims is at least as high: exploding cloud bills, GPUs unavailable for legitimate tasks, and potentially data exfiltration or persistent access to the infrastructure.

This guide covers essential measures for securing ComfyUI deployments, but the principles apply equally to Ollama, Automatic1111's Stable Diffusion WebUI, and any other inference server that exposes an HTTP interface.

Foundational Rule: Never Expose Directly to the Internet

Before any other measure, you must understand and apply one non-negotiable principle: no AI inference server should be directly accessible from the internet without authentication. ComfyUI, by design, does not include an authentication system in its base interface. It is intended to be used locally or within a secured network.

If you currently have port 8188 (ComfyUI), 7860 (WebUI), 11434 (Ollama), or any other inference service port open directly to the internet without protection, this is a security emergency that must be resolved before reading further in this guide.

Check immediately with your cloud provider or through your firewall rules whether these ports are accessible from the outside.

Step 1: Set Up a Reverse Proxy with TLS

The first line of defense is a reverse proxy that terminates incoming connections before they reach your inference server. Nginx is the most widely used and best-documented solution for this purpose.

A minimal Nginx configuration for ComfyUI includes HTTP-to-HTTPS redirection, TLS termination with a valid certificate (Let’s Encrypt via Certbot works very well), and proxying to localhost:8188 only. Nginx should listen on ports 80 and 443; ComfyUI itself should bind only to 127.0.0.1, never to a public interface.

The TLS certificate is essential, even for internal use: it protects your communications against passive eavesdropping and is required for certain security features of modern browsers.

Step 2: Add an Authentication Layer

Nginx basic auth is the simplest solution to implement. It protects access to your reverse proxy with a username and password. The .htpasswd file contains hashed credentials, generated via the command htpasswd -c /etc/nginx/.htpasswd your_username.

For more demanding environments or multiple users, Authelia is an open source authentication solution that adds two-factor authentication, session management, and LDAP/Active Directory integration. It sits between Nginx and your inference service.

Cloudflare Access offers a third option, particularly suited if you already use Cloudflare for DNS management. It lets you authenticate access through Google or GitHub accounts, or through email-based rules, without any complex server-side configuration. For small teams, this is often the path of least resistance to getting solid authentication in place quickly.

Step 3: Network Segmentation

GPU inference servers must not reside on the same network segment as your production systems or sensitive data. Segregating inference servers into a dedicated VLAN limits the impact of a compromise: an attacker who gains access to your ComfyUI server should not be able to directly reach your databases, file servers, or identity management systems.

In a cloud environment, use Security Groups (AWS) or Network Security Groups (Azure) to explicitly define permitted traffic flows. The inference server should only accept connections from your reverse proxy, and its outbound connections should be limited to the legitimate services it needs.

This containment means that even in a worst-case scenario where an attacker fully compromises your inference server, the blast radius is limited to that server and the GPU resources it controls, rather than becoming a pivot point into your broader infrastructure.

Step 4: Disable ComfyUI-Manager in Production

ComfyUI-Manager is the extension that enables installation of custom nodes from third-party repositories. It is the primary vector exploited in current attack campaigns. In production, where you do not need to install new nodes regularly, disabling this extension significantly reduces the attack surface.

If you must use ComfyUI-Manager for updates or testing, do so in an isolated environment, not on your production server. Validate nodes in this test environment before deploying them to production. Treat every custom node as third-party code that requires review before execution, because that is exactly what it is.

Step 5: Audit Installed Custom Nodes

Before deploying an existing ComfyUI server to production, or if you suspect a compromise, audit all installed nodes. Installed nodes live in the custom_nodes/ directory of your ComfyUI installation.

For each node, verify (the commands sketched after this list can help):

  • its origin: an official GitHub repository with stars and active contributors;
  • its installation date, compared against the last time you intentionally performed an installation;
  • its source code, for any suspicious network activity: HTTP requests to external domains, shell command execution, system file access.

The presence of a node named “GPU Performance Monitor” must trigger an immediate investigation: this is the name used by the active campaign documented in March 2026. More broadly, any node whose name you cannot associate with a specific deliberate installation decision should be treated as suspicious until verified.

Step 6: Monitor GPU Utilization

An inference server compromised for cryptomining presents a very characteristic signature: high, constant GPU utilization even in the absence of legitimate requests. Implementing continuous GPU utilization monitoring allows you to detect these anomalies quickly.

nvidia-smi in monitoring mode (nvidia-smi dmon) provides real-time metrics. For integration into an existing monitoring stack (Prometheus, Grafana, Datadog), NVIDIA’s DCGM exporter (dcgm-exporter) exposes GPU metrics in Prometheus format.

Define alerts on the following thresholds, calibrated against your normal workload baseline (an illustrative alert rule follows this list):

  • GPU utilization above 80% for more than 10 minutes without active requests in the ComfyUI logs;
  • abnormally high GPU memory consumption;
  • GPU temperature significantly above typical values.

Step 7: Container Isolation with Resource Limits

Docker adds a valuable isolation layer for inference servers. With Docker, you can limit the resources ComfyUI can access, control volume mounts to restrict access to the host file system, and facilitate clean updates or recreation of the container if compromise is suspected.

Define explicit resource limits in your Docker run command or compose file: maximum CPU cores, memory limits, and GPU access via the --gpus flag specifying only the GPU devices the service legitimately needs.

Avoid using the --network=host flag with Docker: it removes the network isolation that Docker provides by default. Prefer creating a dedicated Docker network with explicitly configured access.

Step 8: Keep ComfyUI Updated

ComfyUI updates regularly fix security vulnerabilities, including in custom node management. Establish a regular update process: at minimum monthly for production environments, and immediately for critical security fixes.

Follow the official ComfyUI GitHub repository for security announcements. Release notes explicitly mention security fixes. This is not optional maintenance: in the current threat environment, running an outdated version of any internet-facing service is an accepted risk that must be consciously evaluated and justified.

Hardening Checklist Summary

Before putting an inference server into production, verify that each point is addressed:

  • firewall blocking direct access to inference ports from the internet;
  • Nginx reverse proxy with TLS configured;
  • authentication in place (basic auth, Authelia, or Cloudflare Access);
  • ComfyUI-Manager disabled;
  • custom node audit completed;
  • GPU monitoring configured with alerts;
  • network isolation via VLAN or Security Groups;
  • a documented update process.

For teams that access their inference servers remotely for monitoring or maintenance, NordVPN provides an additional network protection layer to secure these administrative connections.

