c2sr bootcamp docs

Streamer Deployment Task

This tutorial is the third step in our data pipeline deployment. Here, you will deploy the inverter-streamer, a custom application that acts as a data producer. It generates simulated solar inverter data and sends it to your Kafka cluster.

This task is designed for students learning how to deploy custom applications on Kubernetes, especially those that need to connect to other services and pull images from private container registries.

In this tutorial, you will:

  • Deploy an application from a private GitHub Container Registry (ghcr.io).

  • Use an imagePullSecret to authorize access to the private registry.

  • Configure the application using environment variables to specify the Kafka connection details, topic name, and other operational parameters.

  • Verify that the application is running and successfully producing data by inspecting its logs.

Before you start

Make sure that:

  • The Zookeeper and Kafka deployments from the previous tasks are running successfully.

  • You are SSHed into the k3smain node and have switched to the shared user account.

  • A pre-configured secret named ghcr-pull-secret exists in the cluster to allow access to the private container registry. You do not need to create this secret yourself.

Part 1: Create the Streamer Deployment Manifest

Your goal is to create a Kubernetes deployment file named inverter-streamer.yaml that runs the data streamer application.

Deployment Requirements

The Streamer instance must be configured with the following specifications:

  1. Deployment Name: The deployment must be named inverter-streamer.

  2. Node Affinity: The pod must be scheduled to run on the node named k3smain.

  3. Container Image: Use the ghcr.io/decsresearch/c2sr-bootcamp-streamer:latest image.

  4. Image Pull Secret: Since the image is in a private registry, the deployment must reference the ghcr-pull-secret to be able to pull the image.

  5. Container Command: The container must be explicitly told what command to run: ["python", "main.py"].

  6. Environment Configuration: The following environment variables must be set:

    • KAFKA_HOST: The name of the Kafka service (kafka-service).

    • KAFKA_PORT: The port for the Kafka service (9092).

    • PRODUCTION_INTERVAL: How often to send data, in seconds (1).

    • PRODUCE_TO: The name of the Kafka topic to send data to (nano).

    • nano01 through nano06: Set the value for each of these to 494654.
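To see why these variables matter, it helps to picture how the containerized app consumes them. The streamer's actual source is not shown in this tutorial, but a minimal sketch of how a Python app like main.py typically reads such environment variables looks like this (the function name and the config dict layout are illustrative assumptions, not the real streamer's API):

```python
import os

def load_config(environ=os.environ):
    """Read the streamer settings from environment variables.

    Falls back to this tutorial's values when a variable is unset.
    Illustrative sketch only -- the real main.py may parse its
    environment differently.
    """
    return {
        "kafka_host": environ.get("KAFKA_HOST", "kafka-service"),
        "kafka_port": int(environ.get("KAFKA_PORT", "9092")),
        "interval": float(environ.get("PRODUCTION_INTERVAL", "1")),
        "topic": environ.get("PRODUCE_TO", "nano"),
        # nano01..nano06 identify the six simulated inverters
        "inverters": {
            name: environ[name]
            for name in (f"nano{i:02d}" for i in range(1, 7))
            if name in environ
        },
    }

if __name__ == "__main__":
    # Simulate the environment the Deployment manifest injects
    cfg = load_config({
        "KAFKA_HOST": "kafka-service",
        "KAFKA_PORT": "9092",
        "PRODUCTION_INTERVAL": "1",
        "PRODUCE_TO": "nano",
        **{f"nano{i:02d}": "494654" for i in range(1, 7)},
    })
    print(cfg["kafka_host"], cfg["topic"], len(cfg["inverters"]))
```

Because the manifest injects these values at pod startup, changing the interval or topic is a matter of editing the Deployment and re-applying it, with no image rebuild needed.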

Skeleton File for Deployment

Create a file named inverter-streamer.yaml and fill it out according to the requirements above.

```yaml
# inverter-streamer.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  # Requirement 1: Specify the deployment name
  name: <Your-Deployment-Name>
spec:
  selector:
    matchLabels:
      app: inverter-streamer
  template:
    metadata:
      labels:
        app: inverter-streamer
    spec:
      containers:
        - name: inverter-streamer
          # Requirement 3: Specify the full container image path
          image: <Your-Image-Path>
          imagePullPolicy: Always
          # Requirement 5: Add the container command
          command: [<command>, <argument>]
          # Requirement 6: Add all 10 required environment variables
          env:
            - name: KAFKA_HOST
              value: <Value>
            - name: KAFKA_PORT
              value: <Value>
            # ... add all other env vars here ...
      # Requirement 4: Add the image pull secret
      imagePullSecrets:
        - name: <Your-Secret-Name>
      # Requirement 2: Add the node selector
      nodeSelector:
        kubernetes.io/hostname: <Your-Node-Name>
```

Part 2: Deployment and Verification

Once you have created the inverter-streamer.yaml file, apply it to the cluster and check its logs. The logs are important here: they provide direct confirmation that data is being generated and sent to Kafka.

  1. Apply the manifest:

     kubectl apply -f inverter-streamer.yaml

  2. Verify the deployment:

     kubectl get pods -l app=inverter-streamer

  3. Check for data production by inspecting the logs to see the streaming data:

     POD_NAME=$(kubectl get pods -l app=inverter-streamer -o jsonpath='{.items[0].metadata.name}')
     kubectl logs -f $POD_NAME

     (Use -f to follow the log stream in real time.)

Once you can see data being successfully produced in the logs, you can move on to deploying the inverter, which will consume this data.
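For context on what those logs represent, the produce loop inside such a streamer can be sketched as follows. This is a hypothetical reconstruction, assuming the kafka-python client library; the payload shape, function names, and use of kafka-python are all assumptions, not the actual streamer implementation:

```python
import json
import time

def build_record(inverter_id, value):
    """Assemble one simulated reading as a JSON-serialisable dict.
    (Hypothetical payload shape -- the real streamer defines its own.)
    """
    return {"inverter": inverter_id, "value": value, "ts": time.time()}

def stream(config):
    """Send one record per inverter every PRODUCTION_INTERVAL seconds.

    Assumes the kafka-python package; the import is deferred so the
    helper above stays usable without a broker or the library installed.
    """
    from kafka import KafkaProducer  # assumption: kafka-python client

    producer = KafkaProducer(
        bootstrap_servers=f"{config['kafka_host']}:{config['kafka_port']}",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        for inverter_id, value in config["inverters"].items():
            producer.send(config["topic"], build_record(inverter_id, value))
        producer.flush()
        time.sleep(config["interval"])
```

Note that the producer reaches Kafka through the kafka-service name and port 9092, which is why the earlier Kafka deployment task must be working before this one can succeed.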

Solutions

```yaml
# inverter-streamer.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inverter-streamer
spec:
  # replicas: 1
  selector:
    matchLabels:
      app: inverter-streamer
  template:
    metadata:
      labels:
        app: inverter-streamer
    spec:
      containers:
        - name: inverter-streamer
          image: ghcr.io/decsresearch/c2sr-bootcamp-streamer:latest
          imagePullPolicy: Always
          command: ["python", "main.py"]
          env:
            - name: KAFKA_HOST
              value: "kafka-service"
            - name: KAFKA_PORT
              value: "9092"
            - name: PRODUCTION_INTERVAL
              value: "1"
            - name: PRODUCE_TO
              value: "nano"
            - name: nano01
              value: "494654"
            - name: nano02
              value: "494654"
            - name: nano03
              value: "494654"
            - name: nano04
              value: "494654"
            - name: nano05
              value: "494654"
            - name: nano06
              value: "494654"
      imagePullSecrets:
        - name: ghcr-pull-secret
      nodeSelector:
        kubernetes.io/hostname: k3smain
```
Last modified: 22 June 2025