Waylon S. Walker 2025-05-07 08:09:39 -05:00
commit 0a4b32dffe
12 changed files with 840 additions and 0 deletions

1
.gitignore vendored Normal file

@@ -0,0 +1 @@
.null-ls_806023_main.py

118
README.md Normal file

@@ -0,0 +1,118 @@
# Monitor Kubernetes Logs with Grafana Alloy and Loki
> Note: this scenario uses the K8s Monitoring Helm chart, which abstracts away the need to configure Alloy directly and deploys best practices for monitoring Kubernetes clusters. The chart supports metrics, logs, profiling, and tracing. For this scenario, we will use it to monitor Kubernetes logs.
This scenario demonstrates how to set up the K8s Monitoring Helm chart together with Loki. It installs three Helm charts: Loki, Grafana, and k8s-monitoring. Loki stores the logs, Grafana visualizes them, and Alloy (deployed by k8s-monitoring) collects logs from two sources:
* Pod Logs
* Kubernetes Events
## Prerequisites
Clone the repository:
```bash
git clone https://github.com/grafana/alloy-scenarios.git
```
Change to the directory:
```bash
cd alloy-scenarios/k8s-logs
```
Next, you will need a Kubernetes cluster. In this example, we will create a local cluster using [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/).
An example kind cluster configuration is provided in the `kind.yml` file. To create a kind cluster using this configuration, run the following command:
```bash
kind create cluster --config kind.yml
```
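If you want to verify the cluster came up correctly, a quick optional check is:
```bash
kubectl get nodes
```
The node(s) should report a `Ready` status before you continue.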
Lastly, make sure Helm is installed on your local machine; you can install it by following the instructions [here](https://helm.sh/docs/intro/install/). You will also need to add the Grafana Helm repository:
```bash
helm repo add grafana https://grafana.github.io/helm-charts
```
## Create the `meta` and `prod` namespaces
The first step is to create the `meta` and `prod` namespaces. To create the namespaces, run the following commands:
```bash
kubectl create namespace meta && \
kubectl create namespace prod
```
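You can confirm both namespaces exist with:
```bash
kubectl get namespace meta prod
```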
## Install the Loki Helm Chart
The next step is to install the Loki Helm chart. This will install Loki in the `meta` namespace. The `loki-values.yml` file contains the configuration for the Loki Helm chart. To install Loki, run the following command:
```bash
helm install --values loki-values.yml loki grafana/loki -n meta
```
This installs Loki in monolithic mode. For more information on Loki modes, see the [Loki documentation](https://grafana.com/docs/loki/latest/get-started/deployment-modes/).
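Once the chart is installed, check that the Loki pods in the `meta` namespace reach the `Running` state before moving on:
```bash
kubectl get pods -n meta
```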
## Install the Grafana Helm Chart
The next step is to install the Grafana Helm chart. This will install Grafana in the `meta` namespace. The `grafana-values.yml` file contains the configuration for the Grafana Helm chart. To install Grafana, run the following command:
```bash
helm install --values grafana-values.yml grafana grafana/grafana --namespace meta
```
Note that within the `grafana-values.yml` file, Grafana is preconfigured with a Loki data source. This is done by setting the `datasources.datasources.yaml` field to the Loki data source configuration.
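For reference, here is an abbreviated excerpt of that data source configuration from the `grafana-values.yml` file included in this commit:
```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway.meta.svc.cluster.local:80
        isDefault: false
```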
## Install the K8s Monitoring Helm Chart
The final step is to install the K8s monitoring Helm chart. This will install Alloy in the `meta` namespace. The `k8s-monitoring-values.yml` file contains the configuration for the K8s monitoring Helm chart. To install the K8s monitoring Helm chart, run the following command:
```bash
helm install --values ./k8s-monitoring-values.yml k8s grafana/k8s-monitoring -n meta --create-namespace
```
Within the `k8s-monitoring-values.yml` file we declare the Alloy configuration. This configuration specifies the log sources that Alloy will collect logs from. In this scenario, we are collecting logs from two different sources: Pod Logs and Kubernetes Events.
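The relevant (abbreviated) portion of `k8s-monitoring-values.yml` that enables both sources looks like this:
```yaml
clusterEvents:
  enabled: true
  collector: alloy-logs
podLogs:
  enabled: true
  gatherMethod: volumes
  collector: alloy-logs
```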
## Accessing the Grafana UI
To access the Grafana UI, you will need to port-forward the Grafana pod to your local machine. First, get the name of the Grafana pod:
```bash
export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
```
Next, port-forward the Grafana pod to your local machine:
```bash
kubectl --namespace meta port-forward $POD_NAME 3000
```
Open your browser and go to [http://localhost:3000](http://localhost:3000). You can log in with the username `admin` and password `adminadminadmin` (set in `grafana-values.yml`).
## Accessing the Alloy UI
To access the Alloy UI, you will need to port-forward the Alloy pod to your local machine. First, get the name of the Alloy pod:
```bash
export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=alloy-logs,app.kubernetes.io/instance=k8s" -o jsonpath="{.items[0].metadata.name}")
```
Next, port-forward the Alloy pod to your local machine:
```bash
kubectl --namespace meta port-forward $POD_NAME 12345
```
Open your browser and go to [http://localhost:12345](http://localhost:12345) to view the Alloy UI.
## View the logs using Explore Logs in Grafana
Explore Logs is a feature in Grafana that provides a queryless way to explore logs. To access Explore Logs, open a browser and go to [http://localhost:3000/a/grafana-lokiexplore-app](http://localhost:3000/a/grafana-lokiexplore-app).
## Adding a demo prod app
The k8s-monitoring chart is configured to collect logs from the cluster, including the `meta` and `prod` namespaces. To deploy a demo app into the `prod` namespace, run the following command:
```bash
helm install tempo grafana/tempo-distributed -n prod
```
This will install the Tempo distributed tracing system in the `prod` namespace.
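You can confirm the demo workload is running (and therefore producing logs for Alloy to collect) with:
```bash
kubectl get pods -n prod
```
Once the Tempo pods are up, their logs should appear in Grafana under the `prod` namespace.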

166
dashboards.yaml Normal file

@@ -0,0 +1,166 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: my-dashboard
namespace: meta
labels:
grafana_dashboard: "1"
data:
my-dashboard.json: |
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 59,
"links": [],
"panels": [
{
"datasource": {
"type": "tempo",
"uid": "cel2jaxx4s4xsf"
},
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true
},
"pluginVersion": "11.6.1",
"targets": [
{
"datasource": {
"type": "tempo",
"uid": "cel2jaxx4s4xsf"
},
"filters": [
{
"id": "574a7fa6",
"operator": "=",
"scope": "span"
}
],
"limit": 20,
"metricsQueryType": "range",
"queryType": "traceqlSearch",
"refId": "A",
"tableType": "traces"
}
],
"title": "traces over last 5 minutes",
"type": "table"
},
{
"datasource": {
"type": "loki",
"uid": "P8E80F9AEF21F6940"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 2,
"options": {
"dedupStrategy": "none",
"enableInfiniteScrolling": false,
"enableLogDetails": true,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": false,
"sortOrder": "Descending",
"wrapLogMessage": false
},
"pluginVersion": "11.6.1",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "P8E80F9AEF21F6940"
},
"direction": "backward",
"editorMode": "builder",
"expr": "{service_name=~\"python-otel-.*\"} |= ``",
"queryType": "range",
"refId": "A"
}
],
"title": "python-otel-logs",
"type": "logs"
}
],
"preload": false,
"schemaVersion": 41,
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "python-otel-from-configmap",
"uid": "fel2uhjhepg5ce",
"version": 3
}

1
grafana-dashboards Submodule

@@ -0,0 +1 @@
Subproject commit ad76eac2fbc99c8f6171048abecd8a7faf39297c

43
grafana-values.yml Normal file

@@ -0,0 +1,43 @@
---
persistence:
type: pvc
enabled: true
# DO NOT DO THIS IN PRODUCTION USE CASES
adminUser: admin
adminPassword: adminadminadmin
# CONSIDER USING AN EXISTING SECRET
# Use an existing secret for the admin user.
# admin:
## Name of the secret. Can be templated.
# existingSecret: ""
# userKey: admin-user
# passwordKey: admin-password
service:
enabled: true
type: ClusterIP
sidecar:
dashboards:
enabled: true
label: grafana_dashboard
labelValue: "1"
folder: /tmp/dashboards
searchNamespace: ""
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Loki
type: loki
access: proxy
orgId: 1
url: http://loki-gateway.meta.svc.cluster.local:80
basicAuth: false
isDefault: false
version: 1
editable: false

25
justfile Normal file

@@ -0,0 +1,25 @@
forward_loki:
kubectl port-forward --namespace meta svc/loki-gateway 3100:80
forward_grafana:
#!/bin/bash
export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace meta port-forward $POD_NAME 3000
forward_alloy_logs:
#!/bin/bash
export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=alloy-logs,app.kubernetes.io/instance=k8s" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace meta port-forward $POD_NAME 12345
forward_otel_collector:
#!/bin/bash
export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=otel-collector,app.kubernetes.io/instance=k8s" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace meta port-forward $POD_NAME 4317
get_inotify_max_user_watches:
#!/bin/bash
export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
# kubectl exec -n <grafana-namespace> <grafana-pod> -- sh -c "ulimit -n && cat /proc/sys/fs/inotify/max_user_watches"
kubectl exec -n meta $POD_NAME -- sh -c "ulimit -n && cat /proc/sys/fs/inotify/max_user_watches"

57
k8s-monitoring-values.yml Normal file

@@ -0,0 +1,57 @@
cluster:
name: meta-monitoring-tutorial
destinations:
- name: loki
type: loki
url: http://loki-gateway.meta.svc.cluster.local/loki/api/v1/push
clusterEvents:
enabled: true
collector: alloy-logs
namespaces: []
nodeLogs:
enabled: false
podLogs:
enabled: true
gatherMethod: volumes
collector: alloy-logs
labelsToKeep: ["app_kubernetes_io_name", "container", "instance", "job", "level", "namespace", "service_name", "service_namespace", "deployment_environment", "deployment_environment_name", 'span_id', 'trace_id']
operators:
- type: json_parser
id: parse_json
parse_from: body
- type: add_labels
labels:
trace_id: ${body.trace_id}
span_id: ${body.span_id}
service_name: ${body.service_name}
structuredMetadata:
pod: pod # Set structured metadata "pod" from label "pod"
namespaces: []
# Collectors
alloy-singleton:
enabled: false
alloy-metrics:
enabled: false
nodeExporter:
enabled: true
kube-state-metrics:
enabled: true
nodeLogs:
enabled: true
kubeletMetrics:
enabled: true
cAdvisor:
enabled: true
dashboards:
enabled: true
alloy-logs:
enabled: true
alloy:
mounts:
varlog: true
clustering:
enabled: false
alloy-profiles:
enabled: false
alloy-receiver:
enabled: false

78
loki-values.yml Normal file

@@ -0,0 +1,78 @@
---
loki:
auth_enabled: false
commonConfig:
replication_factor: 1
schemaConfig:
configs:
- from: 2024-04-01
store: tsdb
object_store: s3
schema: v13
index:
prefix: loki_index_
period: 24h
ingester:
chunk_encoding: snappy
tracing:
enabled: true
pattern_ingester:
enabled: true
limits_config:
allow_structured_metadata: true
volume_enabled: true
ruler:
enable_api: true
querier:
# Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
max_concurrent: 4
minio:
enabled: true
deploymentMode: SingleBinary
singleBinary:
replicas: 1
resources:
limits:
cpu: 4
memory: 4Gi
requests:
cpu: 2
memory: 2Gi
extraEnv:
# Keep a little bit lower than memory limits
- name: GOMEMLIMIT
value: 3750MiB
chunksCache:
# default is 500MB, with limited memory keep this smaller
writebackSizeLimit: 10MB
# Zero out replica counts of other deployment modes
backend:
replicas: 0
read:
replicas: 0
write:
replicas: 0
ingester:
replicas: 0
querier:
replicas: 0
queryFrontend:
replicas: 0
queryScheduler:
replicas: 0
distributor:
replicas: 0
compactor:
replicas: 0
indexGateway:
replicas: 0
bloomCompactor:
replicas: 0
bloomGateway:
replicas: 0

100
main.py Normal file

@@ -0,0 +1,100 @@
import time
import random
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource
import logging
from opentelemetry.trace import get_current_span
# Configure tracing
trace.set_tracer_provider(
TracerProvider(resource=Resource.create({"service.name": "my-flask-app"}))
)
tracer = trace.get_tracer(__name__)
span_processor = BatchSpanProcessor(OTLPSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)
# Setup logging with trace_id
class TraceIdFilter(logging.Filter):
def filter(self, record):
span = trace.get_current_span()
ctx = span.get_span_context()
if ctx.trace_id != 0:
record.trace_id = format(ctx.trace_id, "032x")
record.span_id = format(ctx.span_id, "016x")
else:
record.trace_id = None
record.span_id = None
return True
formatter = logging.Formatter(
'{"message": "%(message)s", "trace_id": "%(trace_id)s", "span_id": "%(span_id)s"}'
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
handler.addFilter(TraceIdFilter())
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
# Setup Tracer
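# NOTE: a TracerProvider was already registered at the top of this file, so
# the second set_tracer_provider() call below is ignored by the OpenTelemetry
# SDK (it logs a warning) and the exported service.name stays "my-flask-app".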
resource = Resource(
attributes={
"service.name": "python-otel-demo-app", # 👈 choose any name you like
}
)
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)
# Configure OTLP HTTP exporter
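# NOTE: this adds a second span processor/exporter to the provider registered
# above, so spans may be exported twice (once by the default-endpoint exporter
# created earlier and once to the explicit collector endpoint below).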
otlp_exporter = OTLPSpanExporter(
endpoint="http://otel-collector.meta.svc:4318/v1/traces",
# insecure=True,
)
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)
# Flask App
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
def get_trace_id():
span = get_current_span()
if span:
return span.get_span_context().trace_id
return None
@app.route("/")
def root():
with tracer.start_as_current_span("handle_homepage"):
logger.info("handling request", extra={"trace_id": get_trace_id()})
time.sleep(random.random() * 0.05) # simulate request parsing
fetch_user()
calculate_recommendations()
return "Hello from / with traces!"
def fetch_user():
with tracer.start_as_current_span("fetch_user"):
time.sleep(random.random() * 0.1) # simulate DB query
def calculate_recommendations():
with tracer.start_as_current_span("calculate_recommendations"):
for i in range(2):
with tracer.start_as_current_span(f"score_item_{i}"):
time.sleep(random.random())
if __name__ == "__main__":
app.run(host="0.0.0.0", port=5000)

84
otel-collector.yaml Normal file

@@ -0,0 +1,84 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-collector
namespace: meta
spec:
replicas: 1
selector:
matchLabels:
app: otel-collector
template:
metadata:
labels:
app: otel-collector
spec:
containers:
- name: otel-collector
image: otel/opentelemetry-collector-contrib:0.81.0
args: ["--config=/etc/otel/otel-collector-config.yaml"]
volumeMounts:
- name: otel-config
mountPath: /etc/otel
volumes:
- name: otel-config
configMap:
name: otel-collector-config
---
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: meta
data:
otel-collector-config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
spanmetrics:
dimensions:
- name: operation
default: unknown
metrics_exporter: prometheus
aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
exporters:
otlp:
endpoint: tempo:4317
tls:
insecure: true
prometheus:
endpoint: "0.0.0.0:8889"
service:
pipelines:
traces:
receivers: [otlp]
processors: [spanmetrics]
exporters: [otlp]
metrics:
receivers: []
exporters: [prometheus]
processors: []
---
apiVersion: v1
kind: Service
metadata:
name: otel-collector
namespace: meta
spec:
ports:
- name: grpc
port: 4317
targetPort: 4317
- name: http
port: 4318
targetPort: 4318
selector:
app: otel-collector

2
prometheus-values.yaml Normal file

@@ -0,0 +1,2 @@
grafana:
enabled: true

165
pyservice.yaml Normal file

@@ -0,0 +1,165 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: my-python-app
namespace: meta
data:
main.py: |
import time
import random
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource
import logging
from opentelemetry.trace import get_current_span
SERVICE_NAME = "my-flask-app" # match your OTEL resource name
trace.set_tracer_provider(
TracerProvider(resource=Resource.create({"service.name": "my-flask-app"}))
)
tracer = trace.get_tracer(__name__)
span_processor = BatchSpanProcessor(OTLPSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)
class TraceIdFilter(logging.Filter):
def filter(self, record):
span = trace.get_current_span()
ctx = span.get_span_context()
if ctx.trace_id != 0:
record.trace_id = format(ctx.trace_id, "032x")
record.span_id = format(ctx.span_id, "016x")
record.service_name = SERVICE_NAME
else:
record.trace_id = None
record.span_id = None
return True
formatter = logging.Formatter(
'{"message": "%(message)s", "trace_id": "%(trace_id)s", "span_id": "%(span_id)s", "service_name": "my-flask-app"}'
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
handler.addFilter(TraceIdFilter())
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
resource = Resource(
attributes={
"service.name": "python-otel-demo-app", # 👈 choose any name you like
}
)
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)
otlp_exporter = OTLPSpanExporter(
endpoint="http://otel-collector.meta.svc:4318/v1/traces",
# insecure=True,
)
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
def get_trace_id():
span = get_current_span()
if span:
return span.get_span_context().trace_id
return None
@app.route("/")
def root():
with tracer.start_as_current_span("handle_homepage"):
logger.info("handling request", extra={"trace_id": get_trace_id()})
time.sleep(random.random() * 0.05) # simulate request parsing
fetch_user()
calculate_recommendations()
return "Hello from / with traces!"
def fetch_user():
with tracer.start_as_current_span("fetch_user"):
time.sleep(random.random() * 0.1) # simulate DB query
def calculate_recommendations():
with tracer.start_as_current_span("calculate_recommendations"):
for i in range(2):
with tracer.start_as_current_span(f"score_item_{i}"):
time.sleep(random.random())
if __name__ == "__main__":
app.run(host="0.0.0.0", port=5000)
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: python-otel
namespace: meta
labels:
app: python-otel
spec:
replicas: 1
selector:
matchLabels:
app: python-otel
template:
metadata:
labels:
app: python-otel
spec:
containers:
- name: python
image: python
command: ["/bin/sh", "-c"]
args:
- |
python -m venv .venv && \
.venv/bin/python -m pip install --upgrade pip && \
.venv/bin/python -m pip install \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-flask \
flask && \
.venv/bin/python /app/main.py
volumeMounts:
- name: app-code
mountPath: /app
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://otel-collector.meta.svc:4318
- name: OTEL_EXPORTER_OTLP_INSECURE
value: "true"
volumes:
- name: app-code
configMap:
name: my-python-app
---
apiVersion: v1
kind: Service
metadata:
name: python-otel
namespace: meta
spec:
selector:
app: python-otel
ports:
- port: 80
targetPort: 5000