Day 36 - Persistent Volumes in K8s

Managing storage is a distinct problem from managing compute instances. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed.

What are Persistent Volumes (PVs)?

Persistent Volumes (PVs) are a mechanism to provide persistent storage for applications running in the cluster. A PV represents a piece of storage in the cluster that has been provisioned by the cluster administrator or dynamically provisioned using a StorageClass.

A Persistent Volume is a cluster-level resource that decouples storage from individual pods or applications. It has a lifecycle independent of any specific pod and can be dynamically provisioned or statically created.

PVs have configurable properties such as capacity, access modes (e.g., ReadWriteOnce, ReadOnlyMany, ReadWriteMany), and storage class (a way to dynamically provision PVs). They can be dynamically provisioned by storage providers, or manually created and managed by the cluster administrator.

PVs can be backed by various storage types, including local storage, network-attached storage (NAS), cloud storage, or other external storage systems. They provide a persistent and reliable storage solution for applications, allowing data to persist even if the associated pod is deleted or rescheduled.

What is a Persistent Volume Claim (PVC)?

A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory) whereas Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany, ReadWriteMany, or ReadWriteOncePod).
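The Pod/PVC analogy can be seen by placing the two kinds of request side by side. This is only an illustrative sketch: the names demo-pod and demo-claim, the nginx image, and the requested amounts are assumptions chosen for the example.

```yaml
# A Pod requests compute resources (CPU and memory) from a node...
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx          # any image works; used only for illustration
      resources:
        requests:
          cpu: 250m
          memory: 128Mi
---
# ...while a PVC requests a size and an access mode from a PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```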

When an application requires persistent storage, it creates a PVC with the desired characteristics. Kubernetes then matches the PVC with an appropriate PV based on the PVC’s specifications, such as storage capacity, access modes, and storage class.

A PVC is essentially a request to mount a PV meeting certain requirements on a pod. PVCs usually do not name a specific PV; instead, they specify which StorageClass the pod requires. Administrators can define StorageClasses that indicate properties of storage devices, such as performance, service levels, and back-end policies.
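A minimal sketch of how a StorageClass and a PVC refer to each other is shown below. The class name fast-storage, the claim name fast-claim, and the provisioner value are placeholders assumed for illustration; the real provisioner depends on the storage driver installed in your cluster.

```yaml
# StorageClass defined by the administrator
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage                     # hypothetical class name
provisioner: example.com/hypothetical    # placeholder; use your cluster's provisioner
---
# PVC that asks for storage of that class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim                       # hypothetical claim name
spec:
  storageClassName: fast-storage         # must match the StorageClass name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```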

Persistent Volume Claims: Static vs. Dynamic Provisioning

To use a PV from a pod, the pod spec must declare a volume that references a PVC, together with a volume mount for that volume. These declarations allow users to mount PVs in pods without knowing the details of the underlying storage equipment.

There are two options for mounting PVs to a pod:

  1. Static configuration: involves administrators manually creating PVs and defining a StorageClass that matches the criteria of these PVs. When a pod uses a PVC that specifies the StorageClass, it gains access to one of these static PVs.

  2. Dynamic configuration: occurs when there is no static PV that matches the PVC. In this case, the Kubernetes cluster provisions a new PV based on the StorageClass definitions.
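The static option can be sketched as a PV and a PVC that share the same storageClassName. The names static-pv and static-claim, the class name manual, and the path /mnt/data are assumptions for illustration; no provisioner is involved, the class name is used only to match the two objects.

```yaml
# PV created by hand by the administrator
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv             # hypothetical name
spec:
  storageClassName: manual    # label used only for matching
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data           # hypothetical directory on the node
---
# PVC that can bind to the PV above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-claim          # hypothetical name
spec:
  storageClassName: manual    # same class as the PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```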

Benefits of having PV and PVC in K8s

  1. Improved High Availability: By using PVs and PVCs, applications can achieve high availability and fault tolerance. PVs can be backed by replicated storage systems, ensuring data redundancy and availability.

  2. Abstraction and Portability: PVs and PVCs abstract the underlying storage details from applications. This allows applications to request and use storage resources in a standardized and portable manner.

  3. Ease of Scaling and Migration: With PVs and PVCs, applications can easily scale by dynamically provisioning additional PVs or increasing the capacity of existing PVs.

  4. Data Persistence: PVs and PVCs ensure that data persists even if a pod or application is terminated or rescheduled. The decoupling of storage from pods allows the data to be retained in the PV, ensuring data durability and availability.

  5. Resource Management: PVs and PVCs provide better resource management by separating storage from applications. Administrators can define storage classes and allocate appropriate storage resources to different PVCs based on capacity, access modes, and other criteria.

  6. Dynamic Provisioning: PVCs support dynamic provisioning, where the cluster automatically creates PVs to fulfill the storage requirements specified in the PVC. This enables on-demand storage allocation and eliminates the need for manual PV creation.

  7. Flexibility and Customization: PVCs allow applications to request storage resources with a specific capacity, and access modes (e.g., ReadWriteOnce, ReadOnlyMany, ReadWriteMany). This flexibility enables applications to choose the appropriate storage characteristics that align with their needs, ensuring optimal performance, security, and data access patterns.

Lifecycle of PV and PVC

In a Kubernetes cluster, a PV exists as a storage resource in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle: -

  1. Provisioning: the creation of the PV, either directly (static) or dynamically using a StorageClass.

  2. Binding: assigning the PV to the PVC. A PV is bound to at most one PVC at a time.

  3. Using: Pods use the volume through the PVC.

  4. Reclaiming: after the PVC is deleted, the PV is handled according to its reclaim policy:

     • Retain: keeps the PV and its data so that an administrator can reclaim the resource manually.

     • Delete: removes the PV from Kubernetes together with the underlying storage asset (for example, a cloud disk).
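For a statically created PV the reclaim policy is set with the persistentVolumeReclaimPolicy field of the PV itself. For dynamically provisioned PVs the policy is inherited from the StorageClass, where it defaults to Delete. A minimal sketch of a class that keeps its volumes, with a hypothetical class name and a placeholder provisioner:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-storage                 # hypothetical class name
provisioner: example.com/hypothetical    # placeholder; use your cluster's provisioner
reclaimPolicy: Retain                    # PVs created from this class are kept after the PVC is deleted
```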

A volume will be in one of the following phases: -

  1. Available - the PV is free and not yet bound to a PVC.

  2. Bound - the PV has been assigned to a PVC.

  3. Released - the claim has been deleted, but the cluster has not yet reclaimed the resource.

  4. Failed - the automatic reclamation of the PV has failed.

PVs Use Cases

Some of the use cases of PVs are: -

  1. Database management: Databases often require persistent storage to maintain data integrity and ensure high availability. By using a Persistent Volume, administrators can ensure that the data stored in the database remains persistent even if the corresponding pod goes offline or gets rescheduled.

  2. File Sharing and Collaboration Applications: Applications like content management systems or file servers often require shared storage where multiple pods can read and write data simultaneously. Persistent Volumes provide the necessary functionality to meet these requirements.

  3. Machine Learning Applications: When large datasets need to be stored and accessed by multiple pods simultaneously, PVs offer a scalable and efficient solution. Similarly, in IoT deployments, where sensor data needs to be collected and stored persistently, Persistent Volumes can ensure data reliability and availability.
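For the shared-storage use cases above, the claim would request the ReadWriteMany access mode. This is a sketch under assumptions: the claim name shared-files and the class name shared-storage are hypothetical, and the class must be backed by storage that supports ReadWriteMany (NFS is one common example); a hostPath volume on a single node does not share data across nodes.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files                  # hypothetical name
spec:
  storageClassName: shared-storage    # hypothetical class backed by RWX-capable storage
  accessModes:
    - ReadWriteMany                   # many pods, possibly on different nodes, may read and write
  resources:
    requests:
      storage: 10Gi
```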

Adding Persistent Volume

In this section, we will add a Persistent Volume to the Deployment of our todo app.

Step 1: We will use the todo-app. Refer to "Deploy a sample todo app".

Step 2: Create a Persistent Volume backed by a directory on your node.

vi pv.yml
# pv.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-todo-app
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/tmp/data"

Step 3: Now, apply the persistent volume.

kubectl apply -f pv.yml

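Step 4: Create a Persistent Volume Claim that can bind to the Persistent Volume.

vi pvc.yml
# pvc.yml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-todo-app
spec:
  # An empty class restricts the claim to PVs that have no StorageClass, such as
  # pv-todo-app; without it, a default StorageClass may provision a new volume instead.
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi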

Step 5: Now, apply the Persistent Volume Claim by using the below command.

kubectl apply -f pvc.yml

Step 6: After applying pv.yml and pvc.yml, update your deployment.yml file so that the pod mounts the Persistent Volume Claim.

vi deployment.yml
#deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-app-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: todo-app
  template:
    metadata:
      labels:
        app: todo-app
    spec:
      containers:
        - name: todo-app
          image: mudit097/django-todo
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: todo-app-data
              mountPath: /app
      volumes:
        - name: todo-app-data
          persistentVolumeClaim:
            claimName: pvc-todo-app

Step 7: Apply the updated deployment using the command:

kubectl apply -f deployment.yml

Step 8: Verify that the Persistent Volume has been added to your Deployment. Use the below commands: -

# Getting details of Persistent Volume
kubectl get pv

# Getting details of Persistent Volume Claim
kubectl get pvc

# Getting the pvc details of the application
kubectl describe pvc pvc-todo-app

Hooray!! Persistent Volume has been added to our deployment application.🎉

Now, let's move with accessing the data in that volume.👇

Accessing data in the Persistent Volume

Step 1: Get the Pod details by using below command: -

# Getting the pods details
kubectl get pods

Step 2: Connect to a Pod in your Deployment using the command: -

# Open a shell in one of the pods
kubectl exec -it <pod-name> -- /bin/bash

Step 3: Verify that you can access the data stored in the Persistent Volume from within the Pod by creating a file inside the /app directory of the pod, using the below commands: -

# Now change dir to /app
cd /app/

# Create a file test.txt and put some content into it
echo "Hello" > /app/test.txt

# Verify the file is present in the dir or not.
ls

Step 4: Now, delete the pod in which we created the file. The Deployment will create a replacement pod, and we can then check whether the file is still present in the new pod.

# Checking total pods
kubectl get pods

# Deleting the existing pod
kubectl delete pod <pod-name>

Step 5: Because the Deployment is self-healing, a new pod is created automatically after the old pod is deleted.

Step 6: Now, on the worker node, verify that the file you created is still present: -

# Get the Docker container name of the newly created pod
# (this assumes the node uses Docker as its container runtime)
docker ps

# Now open a shell in the container of the newly created pod
docker exec -it <container-name> /bin/bash

# Then change dir to /app
cd /app/

# List all the files; the file we created before should be present
ls

# Now read the existing file; its content should be the same as before
cat test.txt

Hooray!!! We have accessed the data in the Persistent Volume.🎉
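As an alternative check that does not depend on the node's container runtime, a short-lived pod can mount the same claim and print the file. This is a minimal sketch, assuming a single-node cluster (the hostPath directory lives on one node only); the pod name pvc-reader, the busybox image, and the /data mount path are illustrative choices and not part of the steps above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-reader                 # hypothetical name
spec:
  restartPolicy: Never             # run once and stop
  containers:
    - name: reader
      image: busybox
      # Print the file that was created earlier inside the volume
      command: ["cat", "/data/test.txt"]
      volumeMounts:
        - name: todo-app-data
          mountPath: /data
          readOnly: true           # the check only needs to read
  volumes:
    - name: todo-app-data
      persistentVolumeClaim:
        claimName: pvc-todo-app    # the claim created in pvc.yml
```

After applying this manifest, kubectl logs pvc-reader should show the file content if the data persisted.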

Conclusion

In Kubernetes (k8s), PV (Persistent Volume) and PVC (Persistent Volume Claim) are integral components for managing and persisting data in containerized applications.

  • PV (Persistent Volume): It represents a physical storage resource in the cluster, such as a disk, network storage, or other storage solutions. PVs can be pre-provisioned by administrators or dynamically provisioned by storage classes.

  • PVC (Persistent Volume Claim): It is a request for storage by a user or a pod. PVCs abstract away the underlying storage details, allowing developers to request storage resources without worrying about the specific implementation.

Together, PVs and PVCs provide a scalable and flexible approach to managing storage in Kubernetes. PVs decouple storage configuration from pod specifications, promoting a more modular and scalable architecture. PVCs allow applications to consume storage resources dynamically based on their needs, facilitating efficient resource utilization.

So, in this blog we have seen how to:

  1. Add persistent Volume to our deployment todo application, and

  2. Access data in the Persistent Volume.

Hope you find it helpful🤞 So I encourage you to try this on your own and let me know in the comment section👇 about your learning experience.✨

👆The information presented above is based on my interpretation. Suggestions are always welcome.😊

~Smriti Sharma✌