Komprise patents dynamic load balancing tech


Unstructured data manager Komprise has patented technology that dynamically subdivides the work of sending large unstructured data sets to GPUs for AI processing, spreading it across many compute engines so that it completes faster.

The Komprise Elastic Shares (KES) patent, US-12566637-B2, was filed by Komprise CTO Michael Peercy and others, and is entitled “System and methods for subdividing an unknown tree for execution of operations by multiple compute engines.”

Peercy said: “With Elastic Shares, our customers can fully utilize precious compute, memory, and network resources to gain near-linear scaling and the best competitive advantage.”

Think of a petabyte-scale dataset that is being sent to an AI server for analytics, LLM, or AI agent processing. Descending the file directory tree or traversing all the prefixes in an object store could take many hours if a single compute engine is given the task. That engine could be a standalone server or thread, a collection of threads, a process, or a collection of processes; the principles are the same. You can set up a group of compute engines to do the work in parallel, but the work is then divided among them before the file directory or object prefixes are traversed. This runs into trouble if some tree branches or prefixes are small: the corresponding compute engines finish early and then sit idle, waiting for the others to complete.

Komprise patent US-12566637-B2.

What Peercy and his group invented was a kind of job supervisor that runs in one of the compute engines and monitors the others. When a job comes in, it is initially partitioned across the available set of compute engines, whose status is then monitored. When an engine completes its partition, it is returned to the available set and can be given the next waiting partition of the job. That compute engine does not sit idle, and the whole job completes faster because every engine is given a fresh partition as soon as it finishes its current one.
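
To make the difference concrete, here is a minimal Python sketch that compares the two scheduling styles on a made-up workload. It is an illustration of the general idea only, not Komprise's implementation or the method claimed in the patent: the function names, the partition costs, and the round-robin static assignment are all assumptions chosen for the example.

```python
import heapq


def static_makespan(partition_costs, engines):
    """Finish time when partitions are pinned to engines up front.

    Assumption for illustration: partition i goes to engine i % engines
    before any work starts, and nothing is reassigned afterwards.
    """
    loads = [0] * engines
    for i, cost in enumerate(partition_costs):
        loads[i % engines] += cost
    return max(loads)


def dynamic_makespan(partition_costs, engines):
    """Finish time when a free engine always takes the next waiting partition."""
    # Min-heap of the times at which each engine next becomes available.
    free_at = [0] * engines
    heapq.heapify(free_at)
    finish = 0
    for cost in partition_costs:
        start = heapq.heappop(free_at)  # earliest engine to become free
        end = start + cost
        finish = max(finish, end)
        heapq.heappush(free_at, end)    # engine returns to the available set
    return finish


if __name__ == "__main__":
    # Hypothetical costs: two large branches among many small ones.
    costs = [8, 1, 1, 1, 8, 1, 1, 1]
    print("static :", static_makespan(costs, 4))
    print("dynamic:", dynamic_makespan(costs, 4))
```

With these invented costs and four engines, the static split takes 16 time units because both large partitions land on the same engine, while the dynamic version finishes in 9. The numbers say nothing about real KES performance; they only show why idle engines lengthen a job.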

A Komprise blog says KES “continuously redistributes unstructured data processing tasks across a grid of machines in a streaming fashion. This ensures near-linear speed-up at scale without requiring prior knowledge of dataset size, structure, or processing time.”

This means that: “As soon as one machine finishes, new work can be assigned to it, which keeps all machines busy until the processing has finished.” 

Michael Peercy.

The patent text explains that “Frequently, in operations such as copying all files from one file server, file volume, or file share to another, the amount of data to transfer may be great while the time allowed may be small. A single compute engine—for example, and without limitation, a thread, a collection of threads, a process, a collection of processes, or a standalone computer—generally cannot do all the work quickly enough. In this case one may need to employ parallel processing across multiple compute engines.” 

It goes on to say: “the contents of a directory tree may be discovered only as the tree is traversed from the root nodes toward the leaf nodes. Furthermore, it is possible that the data must be processed in order because, for example, parent directories must precede subdirectories or files. In this case the data must be subdivided on the fly. This subdivision must be effective and tolerant of all sorts of data structures that are encountered. In other words, it should be dynamic and not static.”
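
The constraints quoted above, a tree whose shape is unknown until it is walked and parents that must be handled before their children, can be sketched with a shared work queue. This is a generic worker-pool pattern written for illustration, not the algorithm claimed in the patent; the in-memory `TREE`, the `child_path` helper, and the `traverse` function are all hypothetical.

```python
import queue
import threading

# Hypothetical tree: directory path -> (subdirectory names, file names).
TREE = {
    "/": (["a", "b"], ["readme"]),
    "/a": (["deep"], ["a1", "a2"]),
    "/a/deep": ([], ["d1", "d2", "d3"]),
    "/b": ([], []),
}


def child_path(parent, name):
    """Join a parent directory path and a child name."""
    return parent.rstrip("/") + "/" + name


def traverse(tree, root, engines):
    """Walk a tree of unknown shape with several worker threads.

    A directory's children are queued only after the directory itself has
    been processed, so a parent always precedes its subdirectories and files.
    """
    work = queue.Queue()
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            path = work.get()
            if path is None:          # sentinel: no more work
                work.task_done()
                return
            subdirs, files = tree[path]
            with lock:                # record the directory, then its files
                processed.append(path)
                processed.extend(child_path(path, f) for f in files)
            for name in subdirs:      # newly discovered work, shared by all
                work.put(child_path(path, name))
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(engines)]
    for t in threads:
        t.start()
    work.put(root)
    work.join()                       # every discovered directory is done
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    for entry in traverse(TREE, "/", engines=3):
        print(entry)
```

No worker is tied to a branch in advance: whichever thread is free takes the next discovered directory, so a small branch does not leave its worker idle. The exact output order varies between runs, apart from the parent-first guarantee.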

KES vs traditional load balancing.

The patent applies to both files and directories, and objects and prefixes. Also, while the patent “is designed for separate computers, it can be applied in the case of separate compute engines of any type, including, without limitation, threads, collections of threads, processes, collections of processes, services, and collections of services.”

Komprise identifies three benefits:

  • Dynamic partitioning ensures expensive resources get assigned new tasks as soon as the resources become available,
  • Komprise can process datasets without prior knowledge of their size, structure, and diverse processing times, which is essential for data streaming to AI,
  • Komprise automatically rebalances resource allocation to address unstructured data hierarchies of unknown branch densities.

It's clever technology.