Network Rendering

From K-3D

Overview

Currently, K-3D implements a simple system for queueing rendering jobs on the local host. When the user initiates a job (such as rendering a preview image, rendering a final frame, or rendering an animation), a job directory is created (usually in /tmp, although this location can be configured). The job directory contains a lock file, a control file named control.k3d, and one or more numbered frame directories. The lock file is renamed to flag the status of the job as "ready", "running", "complete", or "error". The control.k3d file is currently unused.

Each frame directory contains a lock file, a control file named control.k3d, and zero or more frame resources. The lock file is renamed to flag the status of the frame as "ready", "running", "complete", or "error". The control.k3d file is an XML file that encodes a set of operations that must be completed successfully to change the status of the frame from "ready" to "complete". Three types of operation are allowed: a "render" operation that executes a pre-configured render command using a fixed set of command-line arguments; a "copy" operation that copies a file from one filesystem location to another; and a "view" operation that displays a bitmap image using the user's choice of image viewer. The frame resources are inputs to the render engine, such as RIB files, texture images, and shaders.
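The lock-file convention can be illustrated with a minimal Python sketch. The exact lock file names are not given above, so this sketch assumes a hypothetical scheme in which the lock file is called `<status>.lock` (for example `ready.lock`); `read_status()` and `set_status()` are illustrative helpers, not part of K-3D.

```python
import os
import tempfile

# Hypothetical naming scheme: the lock file is "<status>.lock".
STATUSES = ("ready", "running", "complete", "error")


def read_status(directory: str) -> str:
    """Return the status flagged by the lock file in a job or frame directory."""
    for status in STATUSES:
        if os.path.exists(os.path.join(directory, status + ".lock")):
            return status
    raise FileNotFoundError(f"no lock file in {directory}")


def set_status(directory: str, new_status: str) -> None:
    """Change the status by renaming the existing lock file."""
    if new_status not in STATUSES:
        raise ValueError(f"unknown status: {new_status}")
    old_status = read_status(directory)
    os.rename(os.path.join(directory, old_status + ".lock"),
              os.path.join(directory, new_status + ".lock"))


if __name__ == "__main__":
    frame_dir = tempfile.mkdtemp()
    open(os.path.join(frame_dir, "ready.lock"), "w").close()
    set_status(frame_dir, "running")
    print(read_status(frame_dir))  # prints "running"
```

Renaming, rather than deleting one file and creating another, means an observer never sees zero or two lock files; on POSIX systems `os.rename()` is atomic within a single filesystem.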

The k3d-renderjob executable takes the path to a job directory as its argument. It updates the job lock file, then iterates over the frame directories, spawning an instance of k3d-renderframe for each frame, one at a time. k3d-renderframe takes the path to a frame directory as its argument; it updates the frame lock file, reads the control file, and executes the operations that it specifies.
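The sequential loop in k3d-renderjob can be sketched roughly as follows. This is a hypothetical Python illustration, not K-3D's actual implementation: lock-file updates are omitted, and it assumes frame directories are named with plain integers so that they can be sorted numerically.

```python
import os
import subprocess
from typing import List, Sequence


def _frame_order(name: str):
    # Numbered directories sort numerically ("2" before "10"); anything else sorts last, by name.
    return (0, int(name), "") if name.isdecimal() else (1, 0, name)


def render_job(job_dir: str, renderframe: Sequence[str] = ("k3d-renderframe",)) -> List[int]:
    """Spawn one renderframe process per frame directory, strictly one at a time.

    Returns the exit status of each frame, in processing order.
    """
    frame_names = sorted(
        (entry.name for entry in os.scandir(job_dir) if entry.is_dir()),
        key=_frame_order,
    )
    exit_codes = []
    for name in frame_names:
        # subprocess.run() waits for the child to exit, so frames never overlap.
        result = subprocess.run([*renderframe, os.path.join(job_dir, name)])
        exit_codes.append(result.returncode)
    return exit_codes
```

Parallel rendering, one of the goals discussed below, would replace the blocking `subprocess.run()` call with a bounded pool of concurrently running processes.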

This document discusses extending or replacing the current system to support network rendering, i.e. submitting jobs to machines other than the local host; job control, i.e. giving the user a graphical interface for canceling, scheduling, and modifying multiple jobs; and parallel rendering, i.e. rendering more than one frame of a given job at a time.

Use Cases

User installs and runs K-3D. Without installing any other software and without explicitly starting any queue server, local preview/render just works.
User can run multiple local preview/render jobs simultaneously, limit them to N at a time from the K-3D GUI, or have them queued based on load average, etc.
User can manage (cancel, alter priority, pause, continue, etc.) multiple local jobs using a GUI.
User can exit K-3D, and jobs continue running. When no jobs remain, the server quits.
User can run a separate GUI for managing jobs, without starting the main K-3D GUI.
If a network queue server is available, user can perform all of the above actions remotely, using the same GUI.
User can pick-and-choose whether to use their local queue server or the network queue server on a case-by-case basis.
Each job is made up of one or more frames, and each frame runs an arbitrary number of commands (to compile shaders, do multi-pass rendering, perform compositing, etc.) to complete the frame.

Terminology

For convenience, we adopt the following terminology:

Master - a server that queues jobs, maintaining status and scheduling information for each job. The master might be running on the localhost, on a render node, or any other node.
Slave - a server that can process frames. A slave accepts a frame from the master, processes it, delivers status information to the master, and waits for the next frame. A slave might be running on the localhost, on the same host as the master, on a dedicated render node, or any other node.
Client - a program that communicates with a server to submit jobs, monitor the job queue, or alter the job queue.
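The master/slave split can be illustrated with a small in-memory Python sketch of a master's queue. Everything here (`Job`, `Master`, `next_frame()`, `report()`, and the lower-value-first priority rule) is hypothetical; it shows one way a master could keep per-frame status and scheduling information, hand frames to idle slaves, and honor the cancel and alter-priority use cases. It does not describe how DrQueue or K-3D actually schedule work.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    name: str
    frames: List[int]                        # one or more frame numbers
    priority: int = 0                        # hypothetical rule: lower value is scheduled sooner
    status: Dict[int, str] = field(default_factory=dict)


class Master:
    """Hypothetical in-memory master: queues jobs and hands frames to idle slaves."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []   # (priority, sequence, job name)
        self._jobs: Dict[str, Job] = {}
        self._sequence = itertools.count()           # FIFO tie-break for equal priorities

    def submit(self, job: Job) -> None:
        job.status = {frame: "ready" for frame in job.frames}
        self._jobs[job.name] = job
        heapq.heappush(self._heap, (job.priority, next(self._sequence), job.name))

    def set_priority(self, name: str, priority: int) -> None:
        # Push a fresh entry; the old one becomes stale and is skipped in next_frame().
        self._jobs[name].priority = priority
        heapq.heappush(self._heap, (priority, next(self._sequence), name))

    def cancel(self, name: str) -> None:
        # Lazy deletion: the job's heap entries are discarded when they surface.
        self._jobs.pop(name, None)

    def next_frame(self) -> Optional[Tuple[str, int]]:
        """Called by an idle slave; returns (job name, frame number) or None."""
        while self._heap:
            priority, _, name = self._heap[0]
            job = self._jobs.get(name)
            if job is not None and priority == job.priority:
                ready = [f for f, state in job.status.items() if state == "ready"]
                if ready:
                    job.status[ready[0]] = "running"
                    return name, ready[0]
            # Cancelled, re-prioritized, or fully dispatched: drop the entry.
            heapq.heappop(self._heap)
        return None

    def report(self, name: str, frame: int, succeeded: bool) -> None:
        """Called by a slave when it finishes a frame."""
        job = self._jobs.get(name)
        if job is not None:
            job.status[frame] = "complete" if succeeded else "error"
```

Stale entries left behind by `cancel()` and `set_priority()` are discarded only when they reach the top of the heap, which avoids searching the heap for them.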

Requirements

It must be possible to run multiple master processes on the same node or network. Rationale: K-3D must have a local master for doing zero-configuration local rendering, and it should be able to submit jobs to other masters on the network.
Frame processing must include shader compilation. Rationale: in a networked environment, shaders must be compiled for the software/platform doing the rendering, so the current shader cache will not work. In addition, this simplifies paths for RIB files and makes each frame a self-contained unit that could be shared using other tools (e.g. when submitting bug reports for render engines).
Frame processing must support arbitrarily complex work: a frame will, at a minimum, have to compile N shaders and perform M render passes, and we shouldn't constrain the quantity or complexity of the processing to be performed. Note that the processing for a given frame is determined by the user at render time; there is no fixed set of operations that could be part of the slave configuration.
Client functionality will have to include image viewing capabilities. Rationale: in the networked environment, preview rendering, view-as-you-render, etc. need to be displayed using the client's tools, rather than the slave's tools.
The client must be able to retrieve log/stdout/stderr data from masters and slaves. Rationale: essential for troubleshooting and bug-reports.

Proposal

Deployment

DrQueue (http://www.drqueue.org) is an open-source distributed render queue. Ideally, we would like to implement a single rendering system in K-3D that could interoperate with existing DrQueue installations. In this scenario, K-3D would implement the functionality of a DrQueue "client", communicating with a DrQueue "master" using an open DrQueue protocol. To handle zero-configuration, low-latency local rendering and previews, K-3D would also implement its own DrQueue-compatible master and slave as part of the normal K-3D distribution.

Protocol

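In this proposal, K-3D will use the DrQueue "general" module to submit jobs specified at the command-line level. The following sample XML describes a single frame that compiles a shader, renders a RIB file, copies the resulting image to its final destination, and displays it: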

<k3dml>
   <frame>
     <module type="general">
       <command exec="aqsl" timeout="300">
         <environment>
           <variable type="string" name="PATH" value="$PATH$"/>
         </environment>
         <arguments>
           <argument type="string" value="-o"/>
           <argument type="path" value="$DQ_FRAME_PATH$/plastic.slx"/>
           <argument type="path" value="-I$DQ_JOB_PATH$"/>
           <argument type="path" value="-I$DQ_FRAME_PATH$"/>
           <argument type="path" value="$DQ_JOB_PATH$/plastic.sl"/>
         </arguments>
       </command>
       <command exec="aqsis">
         <environment>
           <variable type="string" name="PATH" value="$PATH$"/>
         </environment>
         <arguments>
           <argument type="string" value="-nostandard"/>
           <argument type="path" value="-shaders=$DQ_JOB_PATH$;$DQ_FRAME_PATH$"/>
           <argument type="path" value="$DQ_FRAME_PATH$/output.rib"/>
         </arguments>
       </command>
       <copy from="$DQ_FRAME_PATH$/output.tiff" to="/mount/home/tshead/projects/test/test_003.tiff"/>
     </module>
     <client>
       <view file="/mount/home/tshead/projects/test/test_003.tiff"/>
     </client>
   </frame>
</k3dml>

Some key observations from this example:

Each frame contains one or more <module> tags, which are executed in order by the slave. The "type" attribute specifies the module to be executed; K-3D will always use the "general" module.
Each <module> tag will contain module-specific markup.
For the general module, each <command> tag represents a command to be executed. The "exec" attribute names the binary executable to be run with an exec() call (commands are not run in a shell, to avoid portability problems). The optional "timeout" attribute specifies the maximum time, in seconds, that the command will be allowed to run.
Each <command> tag contains an optional <environment> tag, used to control the environment block for the executed command. For portability, the default environment is empty: variables must be explicitly added to a command's environment using <variable> tags.
Each <variable> tag contains "type", "name", and "value" attributes.
The <variable> "name" and "value" attributes specify the name and value of the environment variable.
Each <command> tag contains an optional <arguments> tag, used to specify arguments passed to the executable at runtime.
Each <argument> has "type" and "value" attributes.
The <argument> "value" is the string passed to the executable at runtime.
The <argument> "type" attribute can be either "string" or "path". "string" arguments are passed to the executable unmodified, "path" arguments are modified to conform to the underlying filesystem (e.g: converting slashes to backslashes on Win32).
String substitution is performed on <variable> and <argument> "value" attributes, using $VARIABLE$ syntax. DrQueue provides a collection of predefined variables, such as $DQ_JOB_PATH$ and $DQ_FRAME_PATH$ which refer to the absolute paths to the job and frame storage on the local slave at runtime. Any variable name not predefined by DrQueue (such as $PATH$) is assumed to reference an environment variable from the local slave's environment.
The <copy> tag defines a copy operation from the path in its "from" attribute to the path in its "to" attribute. It is executed by the slave and allows a final output image (or images) to be copied from the frame storage to a user-specified final destination (assuming that destination is mounted on the slave).
The <client> tag defines markup that will be ignored by the slave. It allows clients to perform arbitrary processing (particularly viewing a completed frame) once work on the slave is complete.
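The substitution and argument rules above can be sketched in Python with `xml.etree.ElementTree`. This is an illustration under stated assumptions, not DrQueue code: `substitute()`, `build_command()`, and `run_command()` are hypothetical helpers; the caller supplies DrQueue's predefined variables as a dict; a name defined nowhere is assumed to expand to an empty string; and the <variable> "type" attribute is assumed to follow the same "string"/"path" rule as <argument>.

```python
import os
import re
import subprocess
import xml.etree.ElementTree as ET

_VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)\$")


def substitute(value: str, predefined: dict, slave_env: dict) -> str:
    """Expand $NAME$ from the predefined DrQueue variables, else the slave's environment."""
    def lookup(match):
        name = match.group(1)
        if name in predefined:
            return predefined[name]
        return slave_env.get(name, "")  # assumption: undefined names expand to ""
    return _VARIABLE.sub(lookup, value)


def build_command(command: ET.Element, predefined: dict, slave_env: dict):
    """Turn a general-module <command> element into (argv, env, timeout)."""
    def expand(value, kind):
        text = substitute(value, predefined, slave_env)
        # "path" values get the slave's native separator; "string" values pass unmodified.
        return text.replace("/", os.sep) if kind == "path" else text

    # The environment starts empty; only explicit <variable> tags are added.
    env = {var.get("name"): expand(var.get("value"), var.get("type"))
           for var in command.findall("environment/variable")}
    argv = [command.get("exec")]
    argv += [expand(arg.get("value"), arg.get("type"))
             for arg in command.findall("arguments/argument")]
    timeout = command.get("timeout")
    return argv, env, float(timeout) if timeout is not None else None


def run_command(command: ET.Element, predefined: dict, slave_env=None):
    """Execute one <command> without a shell, capturing stdout/stderr for the client."""
    if slave_env is None:
        slave_env = dict(os.environ)
    argv, env, timeout = build_command(command, predefined, slave_env)
    # No shell: each argv element reaches the program as-is, with no quoting or globbing.
    # Raises subprocess.TimeoutExpired if the command runs past its timeout.
    return subprocess.run(argv, env=env, timeout=timeout, capture_output=True, text=True)
```

Because each <argument> becomes exactly one argv element, an option and its value (such as "-o" and an output path) need separate <argument> tags.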
