Cloud Ingest Edge Volumes configuration
This article describes the configuration for Cloud Ingest Edge Volumes (blob upload with local purge).
What is Cloud Ingest Edge Volumes?
Cloud Ingest Edge Volumes facilitates limitless data ingestion from edge to blob, including ADLSgen2. Files written to this storage type are transferred to blob storage and, once the upload is confirmed, purged locally. This purge frees space for new data. This storage option also supports data integrity in disconnected environments: data is stored locally and synchronized when the network connection is restored.
For example, you write a file to your cloud ingest PVC, and a process scans for new files every minute. Once the file is identified, it is sent for upload to your designated blob destination. After the upload is confirmed, Cloud Ingest Edge Volumes waits five minutes and then deletes the local version of your file.
Prerequisites
Create a storage account following the instructions here.
Note
When you create your storage account, it's recommended that you create it under the same resource group and region/location as your Kubernetes cluster.
Create a container in the storage account that you created previously, following the instructions here.
Configure Extension Identity
Edge Volumes allows the use of a system-assigned extension identity for access to blob storage. This section describes how to use the system-assigned extension identity to grant access to your storage account, allowing you to upload cloud ingest volumes to these storage systems.
It's recommended that you use Extension Identity. If your final destination is blob storage or ADLSgen2, see the following instructions. If your final destination is OneLake, follow the instructions in Configure OneLake for Extension Identity.
While it's not recommended, if you prefer to use key-based authentication, follow the instructions in Key-based authentication.
Obtain Extension Identity
Azure portal
- Navigate to your Arc-connected cluster.
- Select Extensions.
- Select your Azure Container Storage enabled by Azure Arc extension.
- Note the Principal ID under Cluster Extension Details.
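If you prefer the command line to the portal, the same principal ID can usually be read with the Azure CLI. The following is a minimal sketch, assuming the k8s-extension CLI extension is installed and you are signed in with az login. The function name get_extension_principal_id and the placeholder arguments are illustrative and not part of the product.

```shell
#!/bin/sh
# Sketch: print the principal ID of the extension's system-assigned identity.
# Arguments (placeholders you supply):
#   $1 = name of your Azure Container Storage enabled by Azure Arc extension
#   $2 = name of your Arc-connected cluster
#   $3 = resource group of the cluster
get_extension_principal_id() {
  az k8s-extension show \
    --name "$1" \
    --cluster-name "$2" \
    --resource-group "$3" \
    --cluster-type connectedClusters \
    --query identity.principalId \
    --output tsv
}

# Example call (commented out so the snippet has no side effects when sourced):
# get_extension_principal_id "<extension-name>" "<cluster-name>" "<resource-group>"
```

The printed value should match the Principal ID shown under Cluster Extension Details in the portal.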
Configure blob storage account for Extension Identity
Add Extension Identity permissions to a storage account
- Navigate to storage account in the Azure portal.
- Select Access Control (IAM).
- Select Add+ -> Add role assignment.
- Select Storage Blob Data Owner, then select Next.
- Select +Select Members.
- To add your principal ID to the Selected Members: list, paste the ID and select + next to the identity.
- Click Select.
- To review and assign permissions, select Next, then select Review + Assign.
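The portal steps above can also be approximated from the Azure CLI. This is a hedged sketch, not the documented procedure: the helper names storage_account_scope and assign_blob_data_owner are hypothetical, and the subscription ID, resource group, and account name are values you supply.

```shell
#!/bin/sh
# Build the Azure resource ID of a storage account, used as the role-assignment scope.
#   $1 = subscription ID, $2 = resource group, $3 = storage account name
storage_account_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' "$1" "$2" "$3"
}

# Grant the Storage Blob Data Owner role on that scope to the extension identity.
#   $1 = principal ID of the extension identity
#   $2 = scope string produced by storage_account_scope
assign_blob_data_owner() {
  az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Owner" \
    --scope "$2"
}

# Example call (commented out; fill in your own values):
# assign_blob_data_owner "<principal-id>" "$(storage_account_scope "<subscription-id>" "<resource-group>" "<storage-account>")"
```

Scoping the assignment to the single storage account, rather than to the resource group or subscription, keeps the identity's access as narrow as the portal steps do.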
Create a Cloud Ingest Persistent Volume Claim (PVC)
Create a file named cloudIngestPVC.yaml with the following contents. Edit the metadata.name line and create a name for your Persistent Volume Claim. This name is referenced on the last line of deploymentExample.yaml in a later step. Also, update the metadata.namespace value to match the namespace of your intended consuming pod. If you don't have an intended consuming pod, set metadata.namespace to default. The spec.resources.requests.storage parameter determines the size of the persistent volume. It's 2Gi (2 GiB) in this example, but can be modified to fit your needs.
Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  ### Create a name for your PVC ###
  name: <create-persistent-volume-claim-name-here>
  ### Use a namespace that matches your intended consuming pod, or "default" ###
  namespace: <intended-consuming-pod-or-default-here>
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: cloud-backed-sc
To apply cloudIngestPVC.yaml, run:
kubectl apply -f "cloudIngestPVC.yaml"
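A name that breaks the naming rule is rejected when the manifest is applied, so it can help to check names locally first. The following sketch is a deliberately conservative check, assuming the rule from the note (lowercase letters and dashes only) plus the Kubernetes requirement that a name not start or end with a dash. The 63-character limit is the stricter label limit, chosen here as a safe assumption. The function name is_valid_k8s_name is hypothetical.

```shell
#!/bin/sh
# Return 0 if the name uses only lowercase letters and dashes,
# does not start or end with a dash, and is at most 63 characters long.
is_valid_k8s_name() {
  # Length check first; an empty name is rejected by the pattern below.
  [ "${#1}" -le 63 ] || return 1
  case $1 in
    # Empty, contains a character outside a-z and '-', leading dash, or trailing dash.
    ''|*[!abcdefghijklmnopqrstuvwxyz-]*|-*|*-) return 1 ;;
  esac
  return 0
}

# Example use (commented out):
# is_valid_k8s_name "my-cloud-ingest-pvc" && echo "name looks valid"
```

Kubernetes itself also accepts digits in most object names; this check simply follows the stricter guidance in this article.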
Attach subvolume to Edge Volume
To create a subvolume using extension identity to connect to your storage account container, use the following process:
Get the name of your Ingest Edge Volume using the following command:
kubectl get edgevolumes
Create a file named edgeSubvolume.yaml and copy the following contents. These variables must be updated with your information:
- metadata.name: Create a name for your subvolume.
- spec.edgevolume: This name was retrieved in the previous step using kubectl get edgevolumes.
- spec.path: Create your own subdirectory name under the mount path. The following example already contains an example name (exampleSubDir). If you change this path name, line 33 in deploymentExample.yaml must be updated with the new path name. If you choose to rename the path, don't use a preceding slash.
- spec.container: The container name in your storage account.
- spec.storageaccountendpoint: Navigate to your storage account in the Azure portal. On the Overview page, near the top right of the screen, select JSON View. You can find the storageaccountendpoint link under properties.primaryEndpoints.blob. Copy the entire link; for example, https://mytest.blob.core.windows.net/.
Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
apiVersion: "arccontainerstorage.azure.net/v1"
kind: EdgeSubvolume
metadata:
  name: <create-a-subvolume-name-here>
spec:
  edgevolume: <your-edge-volume-name-here>
  path: exampleSubDir # If you change this path, line 33 in deploymentExample.yaml must be updated. Don't use a preceding slash.
  auth:
    authType: MANAGED_IDENTITY
  storageaccountendpoint: "https://<STORAGE ACCOUNT NAME>.blob.core.windows.net/"
  container: <your-blob-storage-account-container-name>
  ingestPolicy: edgeingestpolicy-default # Optional: See the following instructions if you want to update the ingestPolicy with your own configuration
To apply edgeSubvolume.yaml, run:
kubectl apply -f "edgeSubvolume.yaml"
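The storageaccountendpoint value can also be obtained without the portal. The sketch below shows two options: composing the endpoint from the account name, and reading it from Azure with the CLI. The composed form assumes the default public-cloud blob endpoint pattern used in the example above, so prefer the looked-up value if your account uses a different cloud or endpoint type. Both function names are hypothetical.

```shell
#!/bin/sh
# Compose the default public-cloud blob endpoint for a storage account name.
#   $1 = storage account name
default_blob_endpoint() {
  printf 'https://%s.blob.core.windows.net/\n' "$1"
}

# Read the endpoint from Azure; this is the same value shown in JSON View
# under properties.primaryEndpoints.blob.
#   $1 = storage account name, $2 = resource group
lookup_blob_endpoint() {
  az storage account show \
    --name "$1" \
    --resource-group "$2" \
    --query primaryEndpoints.blob \
    --output tsv
}

# Example calls (commented out):
# default_blob_endpoint "mytest"
# lookup_blob_endpoint "<storage-account>" "<resource-group>"
```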
Optional: Modify the ingestPolicy from the default
If you want to change the ingestPolicy from the default edgeingestpolicy-default, create a file named myedgeingest-policy.yaml with the following contents. The following variables must be updated with your preferences:
- metadata.name: Create a name for your ingestPolicy. This name must be updated and referenced in the spec.ingestPolicy section of your edgeSubvolume.yaml.
- spec.ingest.order: The order in which dirty files are uploaded. This is best effort, not a guarantee (defaults to oldest-first). Options for order are: oldest-first or newest-first.
- spec.ingest.minDelaySec: The minimum number of seconds before a dirty file is eligible for ingest (defaults to 60). This number can range between 0 and 31536000.
- spec.eviction.order: How files are evicted (defaults to unordered). Options for eviction order are: unordered or never.
- spec.eviction.minDelaySec: The number of seconds before a clean file is eligible for eviction (defaults to 300). This number can range between 0 and 31536000.
Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
apiVersion: arccontainerstorage.azure.net/v1
kind: EdgeIngestPolicy
metadata:
  name: <create-a-policy-name-here> # This must be updated and referenced in the spec.ingestPolicy section of the edgeSubvolume.yaml
spec:
  ingest:
    order: <your-ingest-order>
    minDelaySec: <your-min-delay-sec>
  eviction:
    order: <your-eviction-order>
    minDelaySec: <your-min-delay-sec>
For more information about these specifications, see Set ingest policy.
To apply myedgeingest-policy.yaml, run:
kubectl apply -f "myedgeingest-policy.yaml"
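Because the policy fields accept only a small set of values, a quick local check can catch typos before you apply the file. The following is a minimal sketch that encodes the options and ranges listed above (order values, and a delay between 0 and 31536000 seconds, which is 365 days). The function names are hypothetical, and the cluster remains the authority on what is accepted.

```shell
#!/bin/sh
# Return 0 if $1 is a whole number between 0 and 31536000 inclusive.
is_valid_min_delay() {
  case $1 in
    # Reject empty strings and anything containing a non-digit (including a minus sign).
    ''|*[!0123456789]*) return 1 ;;
  esac
  # Reject over-long digit strings before the numeric comparison.
  [ "${#1}" -le 8 ] && [ "$1" -le 31536000 ]
}

# Return 0 if $1 is an allowed value for spec.ingest.order.
is_valid_ingest_order() {
  case $1 in
    oldest-first|newest-first) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if $1 is an allowed value for spec.eviction.order.
is_valid_eviction_order() {
  case $1 in
    unordered|never) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use with the documented defaults (commented out):
# is_valid_ingest_order "oldest-first" && is_valid_min_delay 60 && echo "ingest settings look valid"
# is_valid_eviction_order "unordered" && is_valid_min_delay 300 && echo "eviction settings look valid"
```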
Attach your app (Kubernetes native application)
To configure a generic single pod (Kubernetes native application) against the Persistent Volume Claim (PVC), create a file named deploymentExample.yaml with the following contents. Modify the containers.name and volumes.persistentVolumeClaim.claimName values. If you updated the path name from edgeSubvolume.yaml, exampleSubDir on line 33 must be updated with your new path name. The spec.replicas parameter determines the number of replica pods to create. It's 2 in this example, but can be modified to fit your needs.
Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudingestedgevol-deployment ### This must be unique for each deployment you choose to create.
spec:
  replicas: 2
  selector:
    matchLabels:
      name: wyvern-testclientdeployment
  template:
    metadata:
      name: wyvern-testclientdeployment
      labels:
        name: wyvern-testclientdeployment
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - wyvern-testclientdeployment
            topologyKey: kubernetes.io/hostname
      containers:
        ### Specify the container in which to launch the busy box. ###
        - name: <create-a-container-name-here>
          image: mcr.microsoft.com/azure-cli:2.57.0@sha256:c7c8a97f2dec87539983f9ded34cd40397986dcbed23ddbb5964a18edae9cd09
          command:
            - "/bin/sh"
            - "-c"
            - "dd if=/dev/urandom of=/data/exampleSubDir/acsaingesttestfile count=16 bs=1M && while true; do ls /data &>/dev/null || break; sleep 1; done"
          volumeMounts:
            ### This name must match the volumes.name attribute below ###
            - name: wyvern-volume
              ### This mountPath is where the PVC is attached to the pod's filesystem ###
              mountPath: "/data"
      volumes:
        ### User-defined 'name' that's used to link the volumeMounts. This name must match volumeMounts.name as previously specified. ###
        - name: wyvern-volume
          persistentVolumeClaim:
            ### This claimName must refer to your PVC metadata.name (Line 5)
            claimName: <your-pvc-metadata-name-from-line-5-of-pvc-yaml>
To apply deploymentExample.yaml, run:
kubectl apply -f "deploymentExample.yaml"
- Use kubectl get pods to find the name of your pod. Copy this name to use in the next step.
Note
Because spec.replicas in deploymentExample.yaml was specified as 2, two pods appear when you run kubectl get pods. You can choose either pod name to use for the next step.
- Run the following command and replace POD_NAME_HERE with your copied value from the last step:
kubectl exec -it POD_NAME_HERE -- sh
- Change directories into the /data mount path as specified in your deploymentExample.yaml.
- You should see a directory with the name you specified as your path in the second step of the Attach subvolume to Edge Volume section. Change directories into /data/YOUR_PATH_NAME_HERE, replacing the YOUR_PATH_NAME_HERE value with your details.
- As an example, create a file named file1.txt and write to it using echo "Hello World" > file1.txt.
- In the Azure portal, navigate to your storage account and find the container specified in the second step of Attach subvolume to Edge Volume. When you select your container, you should find file1.txt populated within the container. If the file hasn't appeared yet, wait approximately 1 minute; Edge Volumes waits a minute before uploading.
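The manual check above can be sketched as a few shell helpers for repeated testing. This is an illustrative outline under stated assumptions, not a supported tool: the function names are hypothetical, the pod name comes from kubectl get pods, and listing blobs with --auth-mode login assumes your own signed-in account has a data-plane role (for example, a blob data reader role) on the storage account.

```shell
#!/bin/sh
# Compose the path of a test file inside the pod: <mount path>/<subvolume path>/<file name>.
test_file_path() {
  printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# Write "Hello World" to a file in the given pod.
#   $1 = pod name, $2 = full file path inside the pod
write_test_file() {
  kubectl exec "$1" -- sh -c "echo 'Hello World' > '$2'"
}

# List blob names in the destination container using your Azure sign-in.
#   $1 = storage account name, $2 = container name
list_uploaded_blobs() {
  az storage blob list \
    --account-name "$1" \
    --container-name "$2" \
    --auth-mode login \
    --query "[].name" \
    --output tsv
}

# Example sequence (commented out; fill in your own values):
# write_test_file "<pod-name>" "$(test_file_path /data exampleSubDir file1.txt)"
# sleep 90   # allow for the one-minute ingest delay
# list_uploaded_blobs "<storage-account>" "<container-name>"
```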
Next steps
After you complete these steps, you can begin monitoring your deployment using Azure Monitor and Kubernetes Monitoring, or third-party monitoring with Prometheus and Grafana.