Runtime monitoring
In this lab, we will explore runtime monitoring tools for Kubernetes. We will first deploy Prometheus and Grafana, which offer monitoring and visualization capabilities. We will then experiment with Falco, which detects potential security threats by monitoring the behavior of containers in real time, in particular by observing system calls.
Prerequisites: Setting up Minikube
To begin, we need to install Minikube and set up a Kubernetes cluster on which to deploy the monitoring software.
- Start the Minikube cluster:
- Verify that all pods are running:
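As a minimal sketch of these two steps, assuming Minikube and kubectl are already installed and that Minikube's default driver and resources are sufficient, the commands can be grouped in a small shell function. The function name start_lab_cluster is hypothetical; the commands inside it can also be typed one by one.

```shell
# Hypothetical helper: start the cluster, then list its pods.
start_lab_cluster() {
  # Create (or resume) a local cluster with Minikube's default settings.
  minikube start
  # List pods in every namespace; each should reach Running or Completed.
  kubectl get pods --all-namespaces
}
```

Call start_lab_cluster once, then rerun the kubectl line until no pod is left in Pending or ContainerCreating.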
Metrics-server
Metrics-server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines. It collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API.
- Download the Kubernetes manifest for metrics-server.
Download the components.yaml file to your local machine:
- Modify the components.yaml file to disable certificate validation.
In the Deployment description of components.yaml, add the --kubelet-insecure-tls=true argument to the container arguments. The result should look like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls=true
- Install the Kubernetes deployment
- Verify the deployment:
After deploying the updated configuration, check if the metrics-server pod is running correctly:
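The metrics-server steps above can be sketched as follows. This assumes the manifest is taken from the latest upstream release URL and that it contains a --metric-resolution argument, as in the excerpt above; the function names are hypothetical.

```shell
# Assumed location of the upstream manifest (latest release).
MANIFEST_URL="https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"

# Download the manifest to ./components.yaml.
download_metrics_server() {
  curl -fsSL -o components.yaml "$MANIFEST_URL"
}

# Add the flag right after the --metric-resolution argument of the file
# given as $1, reusing that line's indentation.
add_insecure_tls() {
  # Do nothing if the flag is already present.
  if grep -q -- '--kubelet-insecure-tls' "$1"; then
    return 0
  fi
  awk '
    { print }
    /- --metric-resolution=/ {
      line = $0
      sub(/- --metric-resolution=.*/, "- --kubelet-insecure-tls=true", line)
      print line
    }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Apply the manifest and check the deployment and the Metrics API.
deploy_metrics_server() {
  kubectl apply -f components.yaml
  kubectl get deployment metrics-server -n kube-system
  kubectl top nodes
}
```

A typical sequence would be download_metrics_server, then add_insecure_tls components.yaml, then deploy_metrics_server. The kubectl top command only returns data once metrics-server has collected its first samples.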
Prometheus and Grafana
Step 1: Installing Helm charts
- Add the necessary Helm repositories:
- Install Prometheus (metrics collection):
Verify that Prometheus is running (this can take several minutes):
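The installation steps can be sketched as below. The release names prometheus and grafana, the default namespace, and the use of the community prometheus and grafana charts are assumptions. The Grafana installation is not spelled out in the text; it is included here because later steps retrieve a Grafana admin password.

```shell
# Hypothetical helper installing both charts into the current namespace.
install_monitoring_stack() {
  # Register the chart repositories and refresh the local index.
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo add grafana https://grafana.github.io/helm-charts
  helm repo update
  # Release names "prometheus" and "grafana" are assumptions.
  helm install prometheus prometheus-community/prometheus
  helm install grafana grafana/grafana
  # The pods can take several minutes to become Running.
  kubectl get pods
}
```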
Step 2: Access the dashboards
To access the Prometheus UI:
Then, open this link in your browser. Try entering a PromQL query to check that Prometheus can scrape metrics:
From another tab, get the Grafana admin password:
To access Grafana:
Then, open this link in your browser. Log in as admin with the retrieved password.
In Grafana, add the Prometheus data source:
- Go to Configuration > Data Sources.
- Click Add data source > Prometheus.
- Set the URL to:
- Click Save & Test.
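The access steps of this section can be sketched as follows, assuming the charts were installed as releases named prometheus and grafana in the default namespace. The service names, the secret name, the local ports, and the function names all follow from that assumption and may differ in your cluster.

```shell
# Forward the Prometheus UI to http://localhost:9090 (blocks until Ctrl+C).
forward_prometheus() {
  kubectl port-forward svc/prometheus-server 9090:80
}

# Print the Grafana admin password stored in the release secret.
grafana_admin_password() {
  kubectl get secret grafana -o jsonpath='{.data.admin-password}' | base64 --decode
}

# Forward the Grafana UI to http://localhost:3000 (blocks until Ctrl+C).
forward_grafana() {
  kubectl port-forward svc/grafana 3000:80
}

# In-cluster address of Prometheus, usable as the Grafana data source URL.
PROMETHEUS_URL="http://prometheus-server.default.svc.cluster.local"
```

Each port-forward keeps its terminal busy, so run them in separate tabs.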
Step 3: Pre-Built Dashboards
Grafana has many pre-built dashboards for Prometheus metrics. You can import one by clicking "New dashboard > Import" on the right.
You can either import a dashboard by its Grafana dashboard ID or upload a JSON file. The library of available dashboards is published on the Grafana website.
Some popular Prometheus dashboard IDs include:
- Node Exporter Dashboard: 1860
- Kubernetes Add-ons Prometheus: 19105
- Kubernetes system API server: 15761
- Kubernetes system Core DNS: 15762
- Kubernetes views global: 15757
Enter the ID of the dashboard and select Prometheus as the data source.
Step 4: Create custom dashboards to detect given attacks
You will now launch three different attacks in a Kubernetes environment and set up monitoring panels in Grafana to detect these threats using Prometheus metrics.
Scenario 1: DoS attack on a container (high CPU and memory consumption)
An attacker might overload a container with computationally expensive operations (e.g., fork bombs, infinite loops), leading to resource exhaustion.
- Create a new dashboard:
On the left side, click on the Dashboard menu, then select New > New Dashboard on the right.
Click on "Add visualization" and select "prometheus" as data source.
Metrics to monitor:
- container_cpu_usage_seconds_total (CPU usage per container)
- container_memory_usage_bytes (memory usage per container)
To do so, in the "Queries" section, select the metric you would like to plot and visualize the results.
The values are not aggregated by default, so you may want to aggregate them by a label.
For example, to monitor CPU spikes, you can apply the "sum" aggregation per pod to the per-second rate computed over one-minute windows ("1m"), which gives this result:
You can apply a similar operation to visualize the memory usage.
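The aggregations described above can be written as PromQL expressions like the ones below. They can be pasted into the Grafana query editor in "Code" mode or, as sketched here, sent to the Prometheus HTTP API. The helper name prom_query and the local address http://localhost:9090 are assumptions.

```shell
# Per-pod CPU usage: per-second rate over 1-minute windows, summed by pod.
CPU_QUERY='sum by (pod) (rate(container_cpu_usage_seconds_total[1m]))'
# Per-pod memory usage in bytes, summed by pod.
MEM_QUERY='sum by (pod) (container_memory_usage_bytes)'

# Run an instant query against the Prometheus HTTP API.
# PROMETHEUS_ADDR can override the assumed local address.
prom_query() {
  curl -sG "${PROMETHEUS_ADDR:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, prom_query "$CPU_QUERY" returns a JSON document with one sample per pod.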
- Launch the attack
We will now launch an attack that runs a CPU-intensive workload inside a pod to simulate a DoS attack:
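A minimal sketch of such a workload, assuming a busybox pod running an endless shell loop is an acceptable stand-in for the attack. The pod name cpu-stress and the function names are hypothetical.

```shell
# Start a pod that spins in a busy loop and consumes one CPU core.
launch_cpu_attack() {
  kubectl run cpu-stress --image=busybox --restart=Never -- \
    /bin/sh -c 'while true; do :; done'
}

# Remove the pod once the spike is visible in the dashboard.
stop_cpu_attack() {
  kubectl delete pod cpu-stress
}
```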
Have a look at the plot and see how it evolves.
Scenario 2: Nmap scan (high number of network connections)
An attacker uses Nmap to scan for open ports and services running in the cluster.
- Create a new dashboard:
Metrics that can be monitored are:
- node_network_tcp_connections (Unusual spikes in new connections)
- node_network_transmit_packets_total (High packet transmission rate)
As in the previous scenario, adapt the query, e.g., to show the five highest rates of TCP connections.
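As an illustration of a "top 5" query, the sketch below ranks instances by packet transmission rate using node_network_transmit_packets_total from the list above. Grouping by the instance label and the variable name TOP5_QUERY are assumptions; the same pattern can be applied to a connection metric.

```shell
# Five instances with the highest packet transmission rate over 1 minute.
TOP5_QUERY='topk(5, sum by (instance) (rate(node_network_transmit_packets_total[1m])))'

# Print the expression so it can be pasted into the Grafana query editor.
printf '%s\n' "$TOP5_QUERY"
```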
- Launch the attack
Run an Nmap scan inside a Kubernetes pod. This scans all ports in the 10.0.0.0/24 subnet:
kubectl run nmap-scan --image=alpine -- /bin/sh -c "apk add --no-cache nmap && nmap -sS -p- 10.0.0.0/24"
Have a look at the plot in the dashboard and see how it evolves.
Falco
The Falco monitoring tool is designed to detect abnormal behavior in your system or Kubernetes clusters by analyzing system calls in real-time. It uses a set of predefined rules to identify suspicious or anomalous activities, such as privilege escalation, container escapes, and file system changes. Falco helps in detecting both known and unknown threats, providing immediate alerts when any of these events are detected. It is particularly useful in detecting attack patterns, policy violations, and malicious activities across your containers and Kubernetes environments.
Falco works by continuously monitoring system calls, container processes, and other kernel-level events. It leverages eBPF (extended Berkeley Packet Filter) or other kernel tracing methods to capture and inspect system events in real time. Once an event matches a rule, Falco generates an alert, and you can configure it to take further actions such as logging the event, sending a notification, or invoking remediation processes.
Step 1: Install Falco using Helm
- Add the Falco Helm repository and update the local Helm repository cache:
- Install Falco using Helm:
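These two steps can be sketched as follows. The release name falco and the default namespace match the sample output in this section; the function name install_falco is hypothetical.

```shell
# Hypothetical helper: register the Falco chart repository and install it.
install_falco() {
  helm repo add falcosecurity https://falcosecurity.github.io/charts
  helm repo update
  # Install the chart as release "falco" in the current namespace.
  helm install falco falcosecurity/falco
}
```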
This can take several seconds or minutes. The output should be similar to:
NAME: falco
LAST DEPLOYED: Thu Mar 27 14:02:38 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Falco agents are spinning up on each node in your cluster. After a few
seconds, they are going to start monitoring your containers looking for
security issues.
No further action should be required.
Tip:
You can easily forward Falco events to Slack, Kafka, AWS Lambda and more with falcosidekick.
Full list of outputs: https://github.com/falcosecurity/charts/tree/master/charts/falcosidekick.
You can enable its deployment with `--set falcosidekick.enabled=true` or in your values.yaml.
See: https://github.com/falcosecurity/charts/blob/master/charts/falcosidekick/values.yaml for configuration values.
- Check the logs to ensure that Falco is running:
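One way to read these logs without knowing the pod name is to select pods by label. The sketch assumes the chart applies the label app.kubernetes.io/name=falco to its pods; the function name is hypothetical.

```shell
# Show recent log lines from every container of the Falco pods.
# With a label selector kubectl prints only 10 lines by default,
# so a larger --tail value is requested explicitly.
falco_logs() {
  kubectl logs -l app.kubernetes.io/name=falco --all-containers --tail=100
}
```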
The output should be similar to:
{"level":"INFO","msg":"Resolving dependencies ...","timestamp":"2025-03-27 13:02:58"}
{"level":"INFO","msg":"Installing artifacts","refs":["ghcr.io/falcosecurity/rules/falco-rules:3"],"timestamp":"2025-03-27 13:03:00"}
{"level":"INFO","msg":"Preparing to pull artifact","ref":"ghcr.io/falcosecurity/rules/falco-rules:3","timestamp":"2025-03-27 13:03:00"}
{"level":"INFO","msg":"Pulling layer 8da145602705","timestamp":"2025-03-27 13:03:01"}
{"level":"INFO","msg":"Pulling layer b3990bf0209c","timestamp":"2025-03-27 13:03:01"}
{"level":"INFO","msg":"Pulling layer de2cd036fd7f","timestamp":"2025-03-27 13:03:01"}
{"digest":"ghcr.io/falcosecurity/rules/falco-rules@sha256:de2cd036fd7f9bb87de5d62b36d0f35ff4fa8afbeb9a41aa9624e5f6f9a004e1","level":"INFO","msg":"Verifying signature for artifact","timestamp":"2025-03-27 13:03:01"}
{"level":"INFO","msg":"Signature successfully verified!","timestamp":"2025-03-27 13:03:03"}
{"file":"falco_rules.yaml.tar.gz","level":"INFO","msg":"Extracting and installing artifact","timestamp":"2025-03-27 13:03:03","type":"rulesfile"}
{"digest":"sha256:de2cd036fd7f9bb87de5d62b36d0f35ff4fa8afbeb9a41aa9624e5f6f9a004e1","directory":"/rulesfiles","level":"INFO","msg":"Artifact successfully installed","name":"ghcr.io/falcosecurity/rules/falco-rules:3","timestamp":"2025-03-27 13:03:03","type":"rulesfile"}
Thu Mar 27 13:03:21 2025: System info: Linux version 5.10.207 (jenkins@ubuntu-iso) (aarch64-minikube-linux-gnu-gcc.br_real (Buildroot 2023.02.9-dirty) 11.4.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Tue Jan 14 05:18:43 UTC 2025
Thu Mar 27 13:03:21 2025: Loading rules from:
Thu Mar 27 13:03:21 2025: /etc/falco/falco_rules.yaml | schema validation: ok
Thu Mar 27 13:03:21 2025: Hostname value has been overridden via environment variable to: minikube
Thu Mar 27 13:03:21 2025: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Mar 27 13:03:21 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Thu Mar 27 13:03:21 2025: Loaded event sources: syscall
Thu Mar 27 13:03:21 2025: Enabled event sources: syscall
Thu Mar 27 13:03:21 2025: Opening 'syscall' source with modern BPF probe.
Thu Mar 27 13:03:21 2025: One ring buffer every '2' CPUs.
{"artifact":"falco-rules:3","check every":"6h0m0s","level":"INFO","msg":"Creating follower","timestamp":"2025-03-27 13:03:21"}
{"artifact":"ghcr.io/falcosecurity/rules/falco-rules:3","level":"INFO","msg":"Starting follower","timestamp":"2025-03-27 13:03:21"}
{"followerName":"ghcr.io/falcosecurity/rules/falco-rules:3","level":"INFO","msg":"Found new artifact version","tag":"3","timestamp":"2025-03-27 13:03:21"}
{"artifactName":"ghcr.io/falcosecurity/rules/falco-rules:3","digest":"sha256:de2cd036fd7f9bb87de5d62b36d0f35ff4fa8afbeb9a41aa9624e5f6f9a004e1","directory":"/rulesfiles","followerName":"ghcr.io/falcosecurity/rules/falco-rules:3","level":"INFO","msg":"Artifact correctly installed","timestamp":"2025-03-27 13:03:27","type":"rulesfile"}
Step 2: Exporting Falco metrics
Once the installation is complete, you can expose the Falco pod to access it from your local computer.
You can then check the Falco endpoint from your local environment:
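A possible way to do this is a port-forward to the Falco web server, which the startup log shows listening on port 8765. The sketch assumes the pod label app.kubernetes.io/name=falco and the /healthz health endpoint; the function names are hypothetical.

```shell
# Print the name of the first Falco pod, in the form pod/<name>.
falco_pod() {
  kubectl get pods -l app.kubernetes.io/name=falco -o name | head -n 1
}

# Forward local port 8765 to the Falco web server (blocks until Ctrl+C).
forward_falco() {
  kubectl port-forward "$(falco_pod)" 8765:8765
}

# From another terminal, query the assumed health endpoint.
check_falco_health() {
  curl -s http://localhost:8765/healthz
}
```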
Step 3: Inspecting Falco configuration
You can also log into the Falco pod to inspect its configuration:
- Get the name of the Pod running Falco
This <falco_pod_name> should be of the form falco-<ID>, e.g., falco-wmmt.
- Log into the Pod
You can then log in to the pod, replacing <falco_pod_name> with the name retrieved in the previous step:
The Falco configuration file references rules files, such as:
A second command displays the existing rules stored in the falco_rules.yaml file:
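The inspection commands of this step can be sketched as below. The pod name is passed as the first argument; the label selector and the function names are assumptions, while the file paths are the ones mentioned in this lab.

```shell
# List the Falco pods to find <falco_pod_name>.
list_falco_pods() {
  kubectl get pods -l app.kubernetes.io/name=falco
}

# Print the main configuration file of the pod given as $1.
show_falco_config() {
  kubectl exec "$1" -- cat /etc/falco/falco.yaml
}

# Print the default rules file of the pod given as $1.
show_falco_rules() {
  kubectl exec "$1" -- cat /etc/falco/falco_rules.yaml
}
```

For example, show_falco_rules falco-wmmt prints the rules; piping the output through less makes the long file easier to browse.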
We can see the structure of existing Falco rules. They begin with a list of conditions, such as specific syscalls, user actions, or container behaviors that are being monitored. A Falco rule is defined in YAML format and follows a structured syntax to specify which system events should be considered suspicious. Rules are generally organized by event types, such as "process creation," "file access," or "network activity."
Each rule typically includes the following components:
- Rule Name: A unique identifier for the rule.
- Description: A short explanation of what the rule is monitoring and why it is significant.
- Condition: The condition that triggers the rule. This could be a syscall, a specific file path, a user action, or a combination of factors. Conditions often use fields like user.name, proc.name, container.id, etc.
- Output: What Falco should log or report when the rule is triggered. The output typically includes the details of the event, such as process names, user IDs, and timestamps.
- Priority: Defines the severity of the rule. Common priority levels include Emergency, Alert, Critical, Error, Warning, and Informational.
- Action: Falco rules themselves only raise alerts. Follow-up actions, such as sending notifications to a monitoring system or triggering remediation, are handled by the configured outputs and by external tools (for example falcosidekick) rather than by the rule.
- Tags: Optionally, rules can be tagged for easier classification or filtering. These tags can represent categories like "container," "network," or "file-system."
Macros can also be defined to simplify rules and make them more readable. A macro is a named, reusable condition fragment that can be referenced throughout the rule set. This helps avoid duplication and makes it easier to maintain and update the rules. Macros are typically defined at the top of the rules file and can be used in the conditions of any rule.
For example, a macro might be defined like this:
- macro: critical_syscalls
  condition: evt.type in (execve, open, unlink)
And then referenced in a rule as:
- rule: Critical system calls
  desc: Detect a system call listed in the critical_syscalls macro
  condition: critical_syscalls
  output: "Critical system call detected: %evt.type"
  priority: CRITICAL
This file contains the default set of Falco rules. New custom rules can also be added in the file /etc/falco/rules.d/falco_rules.local.yaml.
Now try to access a given file on the Falco pod:
Step 4: Viewing Logs and Events
Open another terminal tab (named Tab 2), where we will visualize the logs produced by Falco.
To do so, run this command, replacing <falco_pod_name> with the name of the pod:
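A minimal sketch of this log command, with the pod name passed as the first argument. The function name is hypothetical.

```shell
# Stream the logs of the Falco pod given as $1 until interrupted with Ctrl+C.
follow_falco_logs() {
  kubectl logs -f "$1"
}
```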
Keep this tab open, as it waits for new Falco logs.
Question 1
What events are reported by Falco in the logs? For each event, answer the following questions:
- What is the severity level of the alert?
- What is the description of the event?
- Which Pod and container are affected?
- Which action caused this alert?
Now go back to the first tab (Tab 1).
Then let's create a new Pod with the nginx image:
Check that the pod has started correctly and is in the Running state. Once it is, we will execute a command within the pod:
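These steps can be sketched as follows. The pod name nginx is an assumption, and the command executed inside the pod is not specified in the text: reading /etc/shadow is used here only as an example of an action that the default Falco rules are likely to report.

```shell
# Create the pod and wait until it is ready (up to two minutes).
start_nginx_pod() {
  kubectl run nginx --image=nginx
  kubectl wait --for=condition=Ready pod/nginx --timeout=120s
}

# Example action inside the pod: read a sensitive file.
read_shadow_in_nginx() {
  kubectl exec nginx -- cat /etc/shadow
}
```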
Go back to Tab 2, which continuously monitors the logs, and have a look at the new entries.
Question 2
What new event is reported by Falco in the logs? For this event, answer the following questions:
- What is the alert level?
- What is the description of the event?
- Which Pod and container are affected?
- What action caused the event?
Step 5: Writing custom rules and macros
Each rule contains a number of basic elements, as shown in the example below:
- rule: "shell_in_container"
  desc: notice shell activity within a container
  condition: >
    evt.type = execve and
    evt.dir = < and
    container.id != host and
    (proc.name = bash or
    proc.name = ksh)
  output: >
    shell in a container
    (user=%user.name container_id=%container.id container_name=%container.name
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
  priority: WARNING
| Field | Description |
|---|---|
| rule | The name of the rule. It should be unique and descriptive of what the rule detects. |
| desc | A human-readable description of what the rule does, explaining the security threat being detected. |
| condition | A logical expression that defines when the rule should trigger an alert. |
| output | The alert message that is generated when the rule condition is met. |
| priority | The severity level of the rule (e.g., EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, etc.). Higher severity indicates more critical security events. |
We will now create a simple custom Falco rule. Rather than editing files inside the Falco container, we will update the Helm chart values to add the rule.
Go back to Tab 1 and save the values.yaml file from the Helm chart with all existing configurations:
We will now edit the values.yaml file to add the custom rule. Open the file with the vi editor:
Search for the customRules field in the file. You can then edit it to add the new rule:
customRules:
  falco_rules.local.yaml: |-
    - rule: Detect curl execution in Kubernetes Pod
      desc: Detects when the curl utility is executed within a Kubernetes pod.
      condition: >
        spawned_process and container and
        proc.name = "curl"
      output: >
        Suspicious process detected (curl) inside a container.
      priority: WARNING
Falco should be restarted for it to load the new rule. Run the following command:
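The Helm side of this workflow can be sketched as below: the first function saves the chart's default values to values.yaml (the step described before editing the file), and the second applies the edited file, which restarts the Falco pods. The release name falco, the DaemonSet name falco, and the function names are assumptions.

```shell
# Save the chart's default values to ./values.yaml before editing them.
save_falco_values() {
  helm show values falcosecurity/falco > values.yaml
}

# Apply the edited values and wait for the Falco pods to be replaced.
upgrade_falco() {
  helm upgrade falco falcosecurity/falco -f values.yaml
  kubectl rollout status daemonset/falco
}
```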
This can take one or several minutes. Wait a bit and check that the pods are running correctly with kubectl get pods. You will notice that <falco_pod_name> has changed since we restarted the DaemonSet: note the new name.
Once the pods are running, verify that the new rule has been uploaded to the pod in the /etc/falco/rules.d/falco_rules.local.yaml file:
kubectl exec -it <falco_pod_name> -- ls -l /etc/falco/rules.d
kubectl exec -it <falco_pod_name> -- cat /etc/falco/rules.d/falco_rules.local.yaml
Now go back to Tab 2, which was collecting the logs. This process stopped because we restarted the Falco pods. Run the command to collect logs again:
Now run a curl command from the nginx pod from Tab 1:
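A possible command, assuming the pod is named nginx and that the nginx image ships the curl binary. Requesting the local nginx server avoids any dependency on outbound network access; the function name is hypothetical.

```shell
# Spawn curl inside the nginx pod by requesting its own web server.
run_curl_in_nginx() {
  kubectl exec nginx -- curl -s -o /dev/null http://localhost
}
```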
Go back to Tab 2 collecting the logs. You should get an output similar to:
Note that this event is less detailed than the default ones. We will update the rule to give more information to the administrator, e.g., the container ID and name.
Question 3
Which field(s) should you edit to embed the container ID and name?
Propose a new formulation that includes them.
Update the values.yaml file with the new rule.
Restart Falco again and check the logs to verify it includes all necessary information (as described in previous steps).
We will now explore the concept of macros. Macros are essentially predefined rule conditions. They allow you to avoid repeatedly writing the same complex expressions.
Macro example:
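As an illustration only, the snippet below writes a small rules file containing one macro and one rule that reuses it. The macro name in_container, the rule, and the file name macro-example.yaml are hypothetical and are not part of the default rule set.

```shell
# Write an illustrative rules file with one macro and one rule using it.
cat > macro-example.yaml <<'EOF'
# Macro: true for events that originate inside a container.
- macro: in_container
  condition: container.id != host

# Rule reusing the macro instead of repeating its expression.
- rule: Shell spawned in container
  desc: Example rule built on the in_container macro
  condition: evt.type = execve and in_container and proc.name in (bash, sh)
  output: Shell started in a container (shell=%proc.name)
  priority: NOTICE
EOF
```

The content of this file could then be pasted under the customRules field of values.yaml, in the same way as the curl rule.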
Question 4
Looking at the documentation, write a macro named "sensitive_files" that matches accesses to a sensitive file. In this simple example, the only sensitive file is /tmp/sensitive.txt.
Question 5
Once you have written this macro, define a rule that detects any process attempting to read or write sensitive files using the previously defined macro for the condition.
In the output, give information about the user name, process name, and file name.
Step 6: Optimal configuration of Falco
Falco's configuration file is a YAML file containing a collection of key: value or key: [value list] pairs. The configuration file is available at /etc/falco/falco.yaml inside the Falco pod.
You can first view the Falco configuration file:
With Minikube, it is easier to modify the values.yaml file from the Helm chart, as previously described:
Question 6
Which main types of configuration can we set up in this file?
Question 7
Modify the following parameters in the configuration file:
- Set priority to warning
- Change syscall_event_drops.actions to log
Falco should be restarted for it to load the new configuration. Run the following command:
Finally, verify that Falco is running with the new configuration:
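A minimal sketch of such a check, assuming the Falco DaemonSet is named falco and its pods carry the label app.kubernetes.io/name=falco. The function names are hypothetical.

```shell
# Wait for the restarted Falco pods, then list them.
verify_falco_running() {
  kubectl rollout status daemonset/falco
  kubectl get pods -l app.kubernetes.io/name=falco
}

# Print a top-level setting ($2) from the configuration of pod $1.
show_falco_setting() {
  kubectl exec "$1" -- grep -E "^$2:" /etc/falco/falco.yaml
}
```

For example, show_falco_setting followed by the new pod name and the key priority prints the configured minimum priority.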
Conclusion
In this lab, we successfully deployed and configured essential runtime monitoring tools for Kubernetes: Falco, Prometheus, and Grafana. These tools provided valuable insights into the behavior of containers in real-time and helped detect potential security threats. With the ability to monitor resource usage, network activity, and system calls, we can enhance the security posture of Kubernetes environments and effectively respond to incidents.