Version: v2.8.0

Enable Cambricon MLU Sharing

HAMi now supports cambricon.com/mlu by implementing most device-sharing features similar to NVIDIA GPUs, including:

MLU Sharing: Tasks can request a fraction of an MLU instead of an entire MLU card. This enables multiple tasks to share the same MLU device.
Device Memory Control: You can allocate MLUs with a specified memory size, with guaranteed enforcement to ensure usage does not exceed the requested limit.
Device Core Control: MLUs can be assigned a specific number of compute cores, and enforcement ensures core usage remains within bounds.
MLU Type Selection: You can use annotations to specify which MLU types a task must use or must avoid by setting cambricon.com/use-mlutype or cambricon.com/nouse-mlutype.

Prerequisites

neuware-mlu370-driver > 5.10
cntoolkit > 2.5.3

Install HAMi via Helm

Follow the instructions under the Enabling vGPU Support in Kubernetes section in the HAMi README.

Enable SMLU mode on each MLU device

cnmon set -c 0 -smlu on
cnmon set -c 1 -smlu on
# Repeat for all devices...

Deploy the Cambricon device plugin

Get the cambricon-device-plugin from your device provider, and configure it with the following parameters:
- mode=dynamic-smlu: Enables dynamic SMLU support.
- min-dsmlu-unit=256: Sets the minimum allocatable memory unit to 256 MB.
Refer to your provider’s documentation for additional details.

Apply the configured plugin

kubectl apply -f cambricon-device-plugin-daemonset.yaml

Running MLU Jobs

To request shared MLU resources in a container, use the following resource types:

cambricon.com/vmlu
cambricon.com/mlu.smlu.vmemory
cambricon.com/mlu.smlu.vcore

Here is an YAML example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: binpack-1
  template:
    metadata:
      labels:
        app: binpack-1
    spec:
      containers:
        - name: c-1
          image: ubuntu:18.04
          command: ["sleep"]
          args: ["100000"]
          resources:
            limits:
              cambricon.com/vmlu: "1"
              cambricon.com/mlu.smlu.vmemory: "20"
              cambricon.com/mlu.smlu.vcore: "10"

note

Init containers are not supported for MLU sharing.

Pods with the cambricon.com/mlumem resource specified in an init container will not be scheduled.
Resource constraints only apply to shared mode (vmlu=1).

The cambricon.com/mlu.smlu.vmemory and cambricon.com/mlu.smlu.vcore resources are only effective when cambricon.com/vmlu is set to 1. If vmlu > 1, a full MLU device will be allocated regardless of vmemory and vcore values.

Prerequisites​

Enabling MLU Sharing​

Running MLU Jobs​

Prerequisites

Enabling MLU Sharing

Running MLU Jobs