Runtime monitoring

In this lab, we will explore runtime monitoring tools for Kubernetes. We will first experiment with Falco, which detects potential security threats by monitoring the behavior of containers in real-time, in particular observing system calls. We will then deploy Prometheus and Grafana which offer monitoring and visualization capabilities.

Prerequisites: Setting up Minikube

To begin, we need to install Minikube and set up a Kubernetes cluster on which to deploy the monitoring software.

Start the Minikube cluster:

minikube start --driver=qemu --cpus=4 --memory=8g

Verify that all pods are running:

kubectl get pods --all-namespaces

Metrics-server

Metrics-server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. It collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API.

Download the Kubernetes manifest for metrics-server.

Download the components.yaml manifest to your local machine:

curl -LO https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml

Modify the components.yaml file to disable certificate validation.

In the Deployment section of components.yaml, add the --kubelet-insecure-tls=true argument to the container args. The result should look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls=true
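
If you prefer not to edit the manifest by hand, the same change can be scripted. The following is a minimal sketch, assuming the manifest contains a --metric-resolution argument as in the excerpt above; the function name add_insecure_tls is hypothetical.

```shell
# Add --kubelet-insecure-tls=true right after the --metric-resolution argument,
# reusing that line's indentation. Does nothing if the flag is already present.
add_insecure_tls() {
  manifest="$1"
  if grep -q -- '--kubelet-insecure-tls' "$manifest"; then
    return 0
  fi
  awk '
    { print }
    /- --metric-resolution=/ {
      sub(/- --metric-resolution=.*/, "- --kubelet-insecure-tls=true")
      print
    }
  ' "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}

# Usage with the file downloaded above:
# add_insecure_tls components.yaml
```

Running the function twice leaves the file unchanged the second time, so it is safe to re-run.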

Install the Kubernetes deployment

kubectl apply -f components.yaml

Verify the deployment:

After deploying the updated configuration, check if the metrics-server pod is running correctly:

kubectl get pods -n kube-system
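
A running pod does not yet guarantee that the Metrics API answers. As a rough sketch, you can poll kubectl top nodes, which relies on metrics-server, until it succeeds. The function name wait_for_metrics and its default values are hypothetical choices for this lab.

```shell
# Poll the Metrics API until `kubectl top nodes` succeeds or the attempts run out.
# Arguments: number of attempts (default 12) and delay in seconds (default 10).
wait_for_metrics() {
  attempts="${1:-12}"
  delay="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl top nodes >/dev/null 2>&1; then
      echo "Metrics API is answering"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Metrics API still unavailable after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_metrics && kubectl top nodes
```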

Prometheus and Grafana

Step 1: Installing Helm charts

Add the necessary Helm repositories:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install Prometheus (metrics collection):

helm install prometheus prometheus-community/kube-prometheus-stack

Verify that Prometheus is running (this can take several minutes):

kubectl get pods

Step 2: Access the dashboards

To access the Prometheus UI:

kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090

Then, open http://localhost:9090 in your browser. Enter a PromQL query to check that Prometheus can scrape metrics:

container_memory_usage_bytes
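
The same check can be made from a terminal through the Prometheus HTTP API while the port-forward is running. This is a small sketch: the function name prom_query and the PROM_URL variable are hypothetical, and the default URL assumes the port-forward command above.

```shell
# Run an instant PromQL query against the Prometheus HTTP API.
# PROM_URL defaults to the address exposed by the port-forward on port 9090.
prom_query() {
  curl -sG --data-urlencode "query=$1" "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Usage: prom_query 'container_memory_usage_bytes'
```

A JSON response whose status field is "success" and whose result list is not empty indicates that the metric is being scraped.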

From another tab, get the Grafana admin password:

kubectl get secret prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

To access Grafana:

kubectl port-forward svc/prometheus-grafana 3000:80

Then, open http://localhost:3000 in your browser. Log in as admin with the retrieved password.

In Grafana, add the Prometheus data source:

Go to Configuration > Data Sources.
Click Add data source > Prometheus.

Set the URL to:

http://prometheus-kube-prometheus-prometheus.default.svc.cluster.local:9090

Click Save & Test.

Step 3: Pre-Built Dashboards

Grafana has many pre-built dashboards for Prometheus metrics. To import one, click "New dashboard > Import" on the right.

You can either import using a Grafana dashboard ID or upload a JSON file for a pre-configured dashboard. The library of available dashboards is at https://grafana.com/grafana/dashboards/.

Some popular Prometheus dashboard IDs include:

Node Exporter Dashboard: 1860
Kubernetes Add-ons Prometheus: 19105
Kubernetes system API server: 15761
Kubernetes system Core DNS: 15762
Kubernetes views global: 15757

Enter the ID of the dashboard and select Prometheus as the data source.

Step 4: Create custom dashboards to detect given attacks

You will now launch different attacks in a Kubernetes environment and set up monitoring panels in Grafana to detect these threats using Prometheus metrics.

Scenario 1: DoS attack on a container (high CPU and memory consumption)

An attacker might overload a container with computationally expensive operations (e.g., fork bombs, infinite loops), leading to resource exhaustion.

Create a new dashboard:

On the left side, click on the Dashboard menu, then select New > New Dashboard on the right.

Click on "Add visualization" and select "prometheus" as data source.

Metrics to monitor:

container_cpu_usage_seconds_total (CPU usage per container)
container_memory_usage_bytes (Memory usage per container)

To do so, in the "Queries" section, select the metric you would like to plot and visualize the results.

The values are not aggregated by default, so you may want to aggregate them by a label.

For example, to monitor CPU spikes, you can apply a "sum" aggregation over the per-second rate computed on a one-minute window ("1m"), grouped by pod, which gives this query:

sum(rate(container_cpu_usage_seconds_total{namespace="default"}[1m])) by (pod)
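
To get an intuition for what rate() returns, here is a simplified worked example. It only divides the counter increase by the elapsed time; the actual Prometheus function also handles counter resets and extrapolates to the window boundaries. The function name per_second_rate and the sample values are hypothetical.

```shell
# Simplified view of what rate() computes for one counter series:
# the increase between two samples divided by the elapsed time.
# Arguments: t1 v1 t2 v2 (timestamps in seconds, counter values).
per_second_rate() {
  awk -v t1="$1" -v v1="$2" -v t2="$3" -v v2="$4" \
    'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}

# Hypothetical samples: 120 CPU-seconds at t=0 s, 150 CPU-seconds at t=60 s.
per_second_rate 0 120 60 150   # prints 0.50
```

A value of 0.50 means the container used half a CPU core on average during that minute, so a pod pinned at one full core shows up as a value close to 1.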

You can apply a similar operation to visualize the memory usage.

Launch the attack

We will now launch an attack that runs a CPU-intensive workload inside a pod to simulate a DoS attack:

kubectl run dos-attack --image=alpine -- /bin/sh -c "apk add --no-cache openssl && while true; do openssl speed; done"

Have a look at the plot and see how it evolves.

Scenario 2: Nmap scan (high number of network connections)

An attacker uses Nmap to scan for open ports and services running in the cluster.

Create a new dashboard:

Metrics that can be monitored:

node_network_tcp_connections (unusual spikes in new connections)
node_network_transmit_packets_total (high packet transmission rate)

As in the previous scenario, adapt the query, e.g., to show the five series with the highest rate of TCP connections. Metric names depend on the exporters running in your cluster, so check Grafana's metric browser if a query returns no data:

topk(5, rate(node_network_tcp_connections[1m]))

Launch the attack

Run an Nmap scan inside a Kubernetes pod. This scans all ports in the 10.0.0.0/24 subnet:

kubectl run nmap-scan --image=alpine -- /bin/sh -c "apk add --no-cache nmap && nmap -sS -p- 10.0.0.0/24"

Have a look at the plot in the dashboard and see how it evolves.

Falco

The Falco monitoring tool is designed to detect abnormal behavior in your system or Kubernetes clusters by analyzing system calls in real-time. It uses a set of predefined rules to identify suspicious or anomalous activities, such as privilege escalation, container escapes, and file system changes. Falco helps in detecting both known and unknown threats, providing immediate alerts when any of these events are detected. It is particularly useful in detecting attack patterns, policy violations, and malicious activities across your containers and Kubernetes environments.

Falco works by continuously monitoring system calls, container processes, and other kernel-level events. It leverages eBPF (extended Berkeley Packet Filter) or other kernel tracing methods to capture and inspect system events in real time. Once an event matches a rule, Falco generates an alert, and you can configure it to take further actions such as logging the event, sending a notification, or invoking remediation processes.

Step 1: Install Falco using Helm

Add the Falco Helm repository and update the local Helm repository cache:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

Install Falco using Helm:

helm install falco \
     --set driver.kind=modern_ebpf \
     --set tty=true \
     --set metrics.enabled=true \
     --set webserver.enabled=true \
     --set webserver.prometheus_metrics_enabled=true \
     falcosecurity/falco

This can take several seconds or minutes. The output should be similar to:

NAME: falco
LAST DEPLOYED: Thu Mar 27 14:02:38 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Falco agents are spinning up on each node in your cluster. After a few
seconds, they are going to start monitoring your containers looking for
security issues.

No further action should be required.

Tip:
You can easily forward Falco events to Slack, Kafka, AWS Lambda and more with falcosidekick.
Full list of outputs: https://github.com/falcosecurity/charts/tree/master/charts/falcosidekick.
You can enable its deployment with `--set falcosidekick.enabled=true` or in your values.yaml.
See: https://github.com/falcosecurity/charts/blob/master/charts/falcosidekick/values.yaml for configuration values.

Check the logs to ensure that Falco is running:

kubectl logs -l app.kubernetes.io/name=falco --all-containers

The output should be similar to:

{"level":"INFO","msg":"Resolving dependencies ...","timestamp":"2025-03-27 13:02:58"}
{"level":"INFO","msg":"Installing artifacts","refs":["ghcr.io/falcosecurity/rules/falco-rules:3"],"timestamp":"2025-03-27 13:03:00"}
{"level":"INFO","msg":"Preparing to pull artifact","ref":"ghcr.io/falcosecurity/rules/falco-rules:3","timestamp":"2025-03-27 13:03:00"}
{"level":"INFO","msg":"Pulling layer 8da145602705","timestamp":"2025-03-27 13:03:01"}
{"level":"INFO","msg":"Pulling layer b3990bf0209c","timestamp":"2025-03-27 13:03:01"}
{"level":"INFO","msg":"Pulling layer de2cd036fd7f","timestamp":"2025-03-27 13:03:01"}
{"digest":"ghcr.io/falcosecurity/rules/falco-rules@sha256:de2cd036fd7f9bb87de5d62b36d0f35ff4fa8afbeb9a41aa9624e5f6f9a004e1","level":"INFO","msg":"Verifying signature for artifact","timestamp":"2025-03-27 13:03:01"}
{"level":"INFO","msg":"Signature successfully verified!","timestamp":"2025-03-27 13:03:03"}
{"file":"falco_rules.yaml.tar.gz","level":"INFO","msg":"Extracting and installing artifact","timestamp":"2025-03-27 13:03:03","type":"rulesfile"}
{"digest":"sha256:de2cd036fd7f9bb87de5d62b36d0f35ff4fa8afbeb9a41aa9624e5f6f9a004e1","directory":"/rulesfiles","level":"INFO","msg":"Artifact successfully installed","name":"ghcr.io/falcosecurity/rules/falco-rules:3","timestamp":"2025-03-27 13:03:03","type":"rulesfile"}
Thu Mar 27 13:03:21 2025: System info: Linux version 5.10.207 (jenkins@ubuntu-iso) (aarch64-minikube-linux-gnu-gcc.br_real (Buildroot 2023.02.9-dirty) 11.4.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Tue Jan 14 05:18:43 UTC 2025
Thu Mar 27 13:03:21 2025: Loading rules from:
Thu Mar 27 13:03:21 2025:    /etc/falco/falco_rules.yaml | schema validation: ok
Thu Mar 27 13:03:21 2025: Hostname value has been overridden via environment variable to: minikube
Thu Mar 27 13:03:21 2025: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Mar 27 13:03:21 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Thu Mar 27 13:03:21 2025: Loaded event sources: syscall
Thu Mar 27 13:03:21 2025: Enabled event sources: syscall
Thu Mar 27 13:03:21 2025: Opening 'syscall' source with modern BPF probe.
Thu Mar 27 13:03:21 2025: One ring buffer every '2' CPUs.
{"artifact":"falco-rules:3","check every":"6h0m0s","level":"INFO","msg":"Creating follower","timestamp":"2025-03-27 13:03:21"}
{"artifact":"ghcr.io/falcosecurity/rules/falco-rules:3","level":"INFO","msg":"Starting follower","timestamp":"2025-03-27 13:03:21"}
{"followerName":"ghcr.io/falcosecurity/rules/falco-rules:3","level":"INFO","msg":"Found new artifact version","tag":"3","timestamp":"2025-03-27 13:03:21"}
{"artifactName":"ghcr.io/falcosecurity/rules/falco-rules:3","digest":"sha256:de2cd036fd7f9bb87de5d62b36d0f35ff4fa8afbeb9a41aa9624e5f6f9a004e1","directory":"/rulesfiles","followerName":"ghcr.io/falcosecurity/rules/falco-rules:3","level":"INFO","msg":"Artifact correctly installed","timestamp":"2025-03-27 13:03:27","type":"rulesfile"}

Step 2: Exporting Falco metrics

Once the installation is complete, you can forward the Falco pod's port to access it from your local computer. Replace <falco_pod_name> with the name of the Falco pod, as listed by kubectl get pods:

kubectl port-forward <falco_pod_name> 8765:8765

You can then query the Falco metrics endpoint from your local environment:

curl http://localhost:8765/metrics
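
The endpoint returns the Prometheus text format, which mixes comment lines and samples. The sketch below keeps only the samples whose name starts with a given prefix. The function name filter_metrics is hypothetical, and the falcosecurity_ prefix in the usage line is an assumption: adjust it to the names your endpoint actually returns.

```shell
# Keep only sample lines (no "# HELP" / "# TYPE" comments) whose metric name
# starts with the given prefix. Reads the Prometheus text format on stdin.
filter_metrics() {
  grep -v '^#' | grep "^$1"
}

# Usage, with the port-forward still running (the prefix is an assumption):
# curl -s http://localhost:8765/metrics | filter_metrics falcosecurity_
```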

Step 3: Inspecting Falco configuration

You can also log into the Falco pod to inspect its configuration:

Get the name of the Pod running Falco

kubectl get pods

The <falco_pod_name> should be of the form falco-<ID>, e.g., falco-wmmt.
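
To avoid copying the name by hand, you can look the pod up by label. This is a sketch that reuses the app.kubernetes.io/name=falco label from the log command in Step 1 and assumes Falco runs in the current namespace; the function name falco_pod is hypothetical.

```shell
# Print the name of the first pod carrying the Falco label used earlier in this lab.
falco_pod() {
  kubectl get pods -l app.kubernetes.io/name=falco \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage:
# FALCO_POD=$(falco_pod)
# kubectl exec -it "$FALCO_POD" -- cat /etc/falco/falco.yaml
```

On a single-node Minikube cluster there is only one Falco pod, so taking the first item is sufficient.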

Log into the Pod

You can then run commands in the pod, replacing <falco_pod_name> with the name of the pod. This first command shows the Falco configuration file:

kubectl exec -it <falco_pod_name> -- cat /etc/falco/falco.yaml

This file references rules files, such as:

rules_files:
- /etc/falco/falco_rules.yaml
- /etc/falco/falco_rules.local.yaml
- /etc/falco/rules.d

This second command shows the existing rules stored in the falco_rules.yaml file:

kubectl exec -it <falco_pod_name> -- cat /etc/falco/falco_rules.yaml

We can see the structure of existing Falco rules. They begin with a list of conditions, such as specific syscalls, user actions, or container behaviors that are being monitored. A Falco rule is defined in YAML format and follows a structured syntax to specify which system events should be considered suspicious. Rules are generally organized by event types, such as "process creation," "file access," or "network activity."

Each rule typically includes the following components:

Rule Name: A unique identifier for the rule.
Description: A short explanation of what the rule is monitoring and why it is significant.
Condition: The condition that triggers the rule. This could be a syscall, a specific file path, a user action, or a combination of factors. Conditions often use fields like user.name, proc.name, container.id, etc.
Output: What Falco should log or report when the rule is triggered. The output typically includes the details of the event, such as process names, user IDs, and timestamps.
Priority: Defines the severity of the rule. Priority levels are Emergency, Alert, Critical, Error, Warning, Notice, Informational, and Debug.
Action: Falco rules do not define an action themselves; Falco only raises an alert when a rule is triggered. Further actions, such as sending notifications to a monitoring system or triggering remediation, are configured through Falco's outputs or external tools such as falcosidekick.
Tags: Optionally, rules can be tagged for easier classification or filtering. These tags can represent categories like "container," "network," or "file-system."

Macros can also be defined to simplify rules and make them more readable. Macros allow you to define reusable sets of conditions that can be referenced throughout the rule set. This helps avoid duplication and makes it easier to maintain and update the rules. Macros are typically defined at the top of the rules file and can be used in the conditions of any rule.

For example, a macro might be defined like this:

- macro: critical_syscalls
  condition: evt.type in (execve, open, unlink)

And then referenced in a rule as:

- rule: "Critical system calls"
  desc: Detects execve, open, or unlink system calls
  condition: critical_syscalls
  output: "Critical system call detected: %evt.type"
  priority: "Critical"

This modular approach helps in managing complex rule sets while maintaining clarity.

The falco_rules.yaml file contains the default set of Falco rules. New custom rules can also be added in the file /etc/falco/rules.d/falco_rules.local.yaml.

Now try to read a sensitive file on the Falco pod:

kubectl exec -it <falco_pod_name> -- cat /etc/shadow

Step 4: Viewing Logs and Events

Open another terminal tab (named Tab 2), where we will visualize the logs produced by Falco.

To do so, run this command and replace <falco_pod_name> with the name of the pod:

kubectl logs <falco_pod_name> -c falco -f

Keep this tab open, as it is waiting for new Falco logs.
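
When the log grows, a quick per-priority count helps to see what deserves attention first. The sketch below assumes Falco's default text output, where each alert line has the form "<timestamp>: <Priority> <message>"; other lines, such as startup messages, are ignored. The function name count_by_priority is hypothetical.

```shell
# Count Falco alerts per priority. Assumes the default text output, where each
# alert line looks like "<timestamp>: <Priority> <message>".
count_by_priority() {
  awk '$2 ~ /^(Emergency|Alert|Critical|Error|Warning|Notice|Informational|Debug)$/ { n[$2]++ }
       END { for (p in n) print p, n[p] }' | sort
}

# Usage, from a third tab so that Tab 2 keeps following the logs:
# kubectl logs <falco_pod_name> -c falco | count_by_priority
```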

Question 1

What events are reported by Falco in the logs? For each event, answer the following questions:

What is the severity level of the alert?
What is the description of the event?
Which Pod and container are affected?
Which action caused this alert?

Now go back to the first tab (Tab 1).

Then let's create a new Pod with the nginx image:

kubectl run nginx-pod --image=nginx

Check that the pod has started correctly and is in the Running state. Once it is, execute a command within the pod:

kubectl exec -it nginx-pod -- bash
exit

Go back to Tab 2, which continuously follows the logs, and have a look at the new entries.

Question 2

What new event is reported by Falco in the logs? For this event, answer the following questions:

What is the alert level?
What is the description of the event?
Which Pod and container are affected?
What action caused the event?

Step 5: Writing custom rules and macros

Each rule contains a set of basic elements, as shown in the example below:

- rule: "shell_in_container"
  desc: notice shell activity within a container
  condition: >
    evt.type = execve and
    evt.dir = < and
    container.id != host and
    (proc.name = bash or
    proc.name = ksh)
  output: >
    shell in a container
    (user=%user.name container_id=%container.id container_name=%container.name
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
  priority: WARNING

Field	Description
rule	The name of the rule. It should be unique and descriptive of what the rule detects.
desc	A human-readable description of what the rule does, explaining the security threat being detected.
condition	A logical expression that defines when the rule should trigger an alert.
output	The alert message that is generated when the rule condition is met.
priority	The severity level of the rule (e.g., EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, etc.). Higher severity indicates more critical security events.

We will now create a simple custom Falco rule. Rather than editing files inside the Falco container, we will update the Helm chart values to add the rule.

Go back to Tab 1 and save the chart's values.yaml file with all existing configuration:

helm show values falcosecurity/falco > values.yaml

We will now edit the values.yaml file to add the custom rules. Use the vi editor to show the file:

vi values.yaml

Search for the customRules field in the file. You can then edit it to add the new rule:

customRules:
  falco_rules.local.yaml: |-
    - rule: Detect curl execution in Kubernetes Pod
      desc: Detects when the curl utility is executed within a Kubernetes pod.
      condition: >
        spawned_process and container and
        proc.name = "curl"
      output: >
        Suspicious process detected (curl) inside a container.
      priority: WARNING

Falco must be restarted to load the new rule. Because values.yaml was generated from the chart defaults, also set in this file any option that was passed with --set at install time (for example driver.kind) if you want to keep it. Then run the following commands:

helm upgrade falco falcosecurity/falco -f values.yaml
kubectl rollout restart daemonset falco

This can take a few minutes. Wait a bit and check that the pods are running correctly with kubectl get pods. Note that <falco_pod_name> has changed because the DaemonSet was restarted: write down the new name.
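
Instead of checking repeatedly by hand, you can let kubectl wait for the restart to finish. This is a sketch: the function name wait_for_falco is hypothetical, the 180-second timeout is an arbitrary choice, and the label is the one used in Step 1.

```shell
# Block until the Falco DaemonSet has finished rolling out, then list its pods.
# The 180 s timeout is an arbitrary choice for this lab.
wait_for_falco() {
  kubectl rollout status daemonset/falco --timeout=180s &&
    kubectl get pods -l app.kubernetes.io/name=falco
}

# Usage: wait_for_falco
```

The pod list printed at the end shows the new pod name to use in the following commands.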

Once the pod is running, verify that the new rule has been loaded into the pod in the /etc/falco/rules.d/falco_rules.local.yaml file:

kubectl exec -it <falco_pod_name> -- ls -l /etc/falco/rules.d
kubectl exec -it <falco_pod_name> -- cat /etc/falco/rules.d/falco_rules.local.yaml

Now go back to the Tab 2 collecting the logs. This process stopped because we restarted the Falco pods. You can run the command to collect logs again:

kubectl logs <falco_pod_name> -c falco -f

Now run a curl command from the nginx pod from Tab 1:

kubectl exec nginx-pod -- curl google.com

Go back to Tab 2 collecting the logs. You should get an output similar to:

Warning Suspicious process detected (curl) inside a container.

This alert is less precise than the default ones. We will update the rule to give more information to the administrator, e.g., the container ID and name.

Question 3

Which field(s) should you edit to embed the container ID and name?

Propose a new formulation that includes this information.

Update the values.yaml file with the new rule.

Restart Falco again and check the logs to verify it includes all necessary information (as described in previous steps).

We will now explore the concept of macros. Macros are essentially predefined rule conditions. They allow you to avoid repeatedly writing the same complex expressions.

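Macro example, reusing the macro shown in Step 3:

- macro: critical_syscalls
  condition: evt.type in (execve, open, unlink)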

Question 4

Looking at the documentation, write a macro named "sensitive_files" that detects when a file is sensitive. In this simple example, we define that this file is named /tmp/sensitive.txt.

Question 5

Once you have written this macro, define a rule that detects any process attempting to read or write sensitive files, using the macro in the rule's condition.

In the output, give information about the user name, process name, and file name.

Step 6: Optimal configuration of Falco

Falco's configuration file is a YAML file containing a collection of key: value or key: [value list] pairs. The configuration file is available at /etc/falco/falco.yaml inside the Falco pod.

You can first view the Falco configuration file:

kubectl exec -it <falco_pod_name> -- cat /etc/falco/falco.yaml

With Minikube, it is easier to modify the values.yaml file from the Helm chart, as previously described:

vi values.yaml

Question 8

Which main types of configuration can we set up in this file?

Question 9

Modify the following parameters in the configuration file:

Modify the priority to warning
Change syscall_event_drops.actions to log

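Falco must be restarted to load the new configuration. Since the change was made in values.yaml, apply it with Helm first, then restart the DaemonSet:

helm upgrade falco falcosecurity/falco -f values.yaml
kubectl rollout restart daemonset falco

Finally, verify that Falco is running with the new configuration. The falco binary is only available inside the pod, and its --validate option checks rules files rather than falco.yaml, so check that the pod is in the Running state and look for the modified keys in the configuration file it loaded:

kubectl get pods
kubectl exec -it <falco_pod_name> -- cat /etc/falco/falco.yaml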

Conclusion

In this lab, we successfully deployed and configured essential runtime monitoring tools for Kubernetes: Falco, Prometheus, and Grafana. These tools provided valuable insights into the behavior of containers in real-time and helped detect potential security threats. With the ability to monitor resource usage, network activity, and system calls, we can enhance the security posture of Kubernetes environments and effectively respond to incidents.