commit 0a4b32dffe40aa44690174e9abd40081ae579989
Author: Waylon S. Walker
Date:   Wed May 7 08:09:39 2025 -0500

    init

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..6899691
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+.null-ls_806023_main.py
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..bd06063
--- /dev/null
+++ b/README.md
@@ -0,0 +1,118 @@
+
+# Monitor Kubernetes Logs with Grafana Alloy and Loki
+
+> Note: this scenario uses the K8s Monitoring Helm chart. The chart abstracts away the need to configure Alloy by hand and deploys best practices for monitoring Kubernetes clusters. It supports metrics, logs, profiling, and tracing. In this scenario, we use it to monitor Kubernetes logs.
+
+This scenario demonstrates how to set up the K8s Monitoring Helm chart together with Loki. It installs three Helm charts: Loki, Grafana, and k8s-monitoring. Loki stores the logs, Grafana visualizes them, and Alloy (deployed by the k8s-monitoring chart) collects two different log sources:
+* Pod Logs
+* Kubernetes Events
+
+## Prerequisites
+
+Clone the repository:
+
+```bash
+git clone https://github.com/grafana/alloy-scenarios.git
+```
+
+Change to the directory:
+
+```bash
+cd alloy-scenarios/k8s-logs
+```
+
+Next, you will need a Kubernetes cluster. In this example, we create a local Kubernetes cluster using [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/).
+
+An example kind cluster configuration is provided in the `kind.yml` file. To create a kind cluster using this configuration, run the following command:
+
+```bash
+kind create cluster --config kind.yml
+```
+
+Lastly, make sure Helm is installed on your local machine. You can install Helm by following the instructions [here](https://helm.sh/docs/intro/install/). You will also need to add the Grafana Helm repository:
+
+```bash
+helm repo add grafana https://grafana.github.io/helm-charts
+```
+
+## Create the `meta` and `prod` namespaces
+
+The first step is to create the `meta` and `prod` namespaces. To create the namespaces, run the following commands:
+
+```bash
+kubectl create namespace meta && \
+kubectl create namespace prod
+```
+
+
+## Install the Loki Helm Chart
+
+Next, install the Loki Helm chart. This installs Loki in the `meta` namespace. The `loki-values.yml` file contains the configuration for the Loki Helm chart. To install Loki, run the following command:
+
+```bash
+helm install --values loki-values.yml loki grafana/loki -n meta
+```
+
+This installs Loki in monolithic mode. For more information on Loki deployment modes, see the [Loki documentation](https://grafana.com/docs/loki/latest/get-started/deployment-modes/).
+
+## Install the Grafana Helm Chart
+
+Next, install the Grafana Helm chart. This installs Grafana in the `meta` namespace. The `grafana-values.yml` file contains the configuration for the Grafana Helm chart. To install Grafana, run the following command:
+
+```bash
+helm install --values grafana-values.yml grafana grafana/grafana --namespace meta
+```
+Note that within the `grafana-values.yml` file, Grafana is preconfigured with a Loki data source: the `datasources.datasources.yaml` field contains the Loki data source configuration.
+
+## Install the K8s Monitoring Helm Chart
+
+The final chart to install is the K8s Monitoring Helm chart. This installs Alloy in the `meta` namespace. The `k8s-monitoring-values.yml` file contains the configuration for the K8s Monitoring Helm chart. To install it, run the following command:
+
+```bash
+helm install --values ./k8s-monitoring-values.yml k8s grafana/k8s-monitoring -n meta --create-namespace
+```
+Within the `k8s-monitoring-values.yml` file we declare the Alloy configuration. This configuration specifies the log sources that Alloy collects logs from. In this scenario, we collect logs from two different sources: Pod Logs and Kubernetes Events.
+
+## Accessing the Grafana UI
+
+To access the Grafana UI, you will need to port-forward the Grafana pod to your local machine. First, get the name of the Grafana pod:
+
+```bash
+export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
+```
+
+Next, port-forward the Grafana pod to your local machine:
+
+```bash
+kubectl --namespace meta port-forward $POD_NAME 3000
+```
+
+Open your browser and go to [http://localhost:3000](http://localhost:3000). You can log in with the default username `admin` and password `adminadminadmin`.
+
+## Accessing the Alloy UI
+
+To access the Alloy UI, you will need to port-forward the Alloy pod to your local machine. First, get the name of the Alloy pod:
+
+```bash
+export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=alloy-logs,app.kubernetes.io/instance=k8s" -o jsonpath="{.items[0].metadata.name}")
+```
+
+Next, port-forward the Alloy pod to your local machine:
+
+```bash
+kubectl --namespace meta port-forward $POD_NAME 12345
+```
+
+Open your browser and go to [http://localhost:12345](http://localhost:12345) to view the Alloy UI.
+
+## View the logs using Explore Logs in Grafana
+
+Explore Logs is a feature in Grafana that provides a queryless way to explore logs. To access Explore Logs, open a browser and go to [http://localhost:3000/a/grafana-lokiexplore-app](http://localhost:3000/a/grafana-lokiexplore-app).
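+
+> Tip: if you prefer writing queries yourself, the same logs can also be queried from Grafana's Explore view with LogQL. For example, to view all pod logs collected from the `prod` namespace (this uses the `namespace` label, one of the labels kept by the `labelsToKeep` setting in `k8s-monitoring-values.yml`):
+
+```logql
+{namespace="prod"}
+```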
+
+## Adding a demo prod app
+
+The k8s-monitoring chart is configured to collect logs from two namespaces: `meta` and `prod`. To add a demo prod app, run the following command:
+
+```bash
+helm install tempo grafana/tempo-distributed -n prod
+```
+
+This will install the Tempo distributed tracing system in the `prod` namespace.
\ No newline at end of file
diff --git a/dashboards.yaml b/dashboards.yaml
new file mode 100644
index 0000000..15407bb
--- /dev/null
+++ b/dashboards.yaml
@@ -0,0 +1,166 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: my-dashboard
+  namespace: meta
+  labels:
+    grafana_dashboard: "1"
+data:
+  my-dashboard.json: |
+    {
+      "annotations": {
+        "list": [
+          {
+            "builtIn": 1,
+            "datasource": {
+              "type": "grafana",
+              "uid": "-- Grafana --"
+            },
+            "enable": true,
+            "hide": true,
+            "iconColor": "rgba(0, 211, 255, 1)",
+            "name": "Annotations & Alerts",
+            "type": "dashboard"
+          }
+        ]
+      },
+      "editable": true,
+      "fiscalYearStartMonth": 0,
+      "graphTooltip": 0,
+      "id": 59,
+      "links": [],
+      "panels": [
+        {
+          "datasource": {
+            "type": "tempo",
+            "uid": "cel2jaxx4s4xsf"
+          },
+          "fieldConfig": {
+            "defaults": {
+              "custom": {
+                "align": "auto",
+                "cellOptions": {
+                  "type": "auto"
+                },
+                "inspect": false
+              },
+              "mappings": [],
+              "thresholds": {
+                "mode": "absolute",
+                "steps": [
+                  {
+                    "color": "green"
+                  },
+                  {
+                    "color": "red",
+                    "value": 80
+                  }
+                ]
+              }
+            },
+            "overrides": []
+          },
+          "gridPos": {
+            "h": 8,
+            "w": 12,
+            "x": 0,
+            "y": 0
+          },
+          "id": 1,
+          "options": {
+            "cellHeight": "sm",
+            "footer": {
+              "countRows": false,
+              "fields": "",
+              "reducer": [
+                "sum"
+              ],
+              "show": false
+            },
+            "showHeader": true
+          },
+          "pluginVersion": "11.6.1",
+          "targets": [
+            {
+              "datasource": {
+                "type": "tempo",
+                "uid": "cel2jaxx4s4xsf"
+              },
+              "filters": [
+                {
+                  "id": "574a7fa6",
+                  "operator": "=",
+                  "scope": "span"
+                }
+              ],
+              "limit": 20,
+              "metricsQueryType": "range",
+              "queryType": "traceqlSearch",
+              "refId": "A",
+              "tableType": "traces"
+            }
+          ],
+          "title": "traces over last 5 minutes",
+          "type": "table"
+        },
+        {
+          "datasource": {
+            "type": "loki",
+            "uid": "P8E80F9AEF21F6940"
+          },
+          "fieldConfig": {
+            "defaults": {},
+            "overrides": []
+          },
+          "gridPos": {
+            "h": 8,
+            "w": 12,
+            "x": 0,
+            "y": 8
+          },
+          "id": 2,
+          "options": {
+            "dedupStrategy": "none",
+            "enableInfiniteScrolling": false,
+            "enableLogDetails": true,
+            "prettifyLogMessage": false,
+            "showCommonLabels": false,
+            "showLabels": false,
+            "showTime": false,
+            "sortOrder": "Descending",
+            "wrapLogMessage": false
+          },
+          "pluginVersion": "11.6.1",
+          "targets": [
+            {
+              "datasource": {
+                "type": "loki",
+                "uid": "P8E80F9AEF21F6940"
+              },
+              "direction": "backward",
+              "editorMode": "builder",
+              "expr": "{service_name=~\"python-otel-.*\"} |= ``",
+              "queryType": "range",
+              "refId": "A"
+            }
+          ],
+          "title": "python-otel-logs",
+          "type": "logs"
+        }
+      ],
+      "preload": false,
+      "schemaVersion": 41,
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-5m",
+        "to": "now"
+      },
+      "timepicker": {},
+      "timezone": "browser",
+      "title": "python-otel-from-configmap",
+      "uid": "fel2uhjhepg5ce",
+      "version": 3
+    }
diff --git a/grafana-dashboards b/grafana-dashboards
new file mode 160000
index 0000000..ad76eac
--- /dev/null
+++ b/grafana-dashboards
@@ -0,0 +1 @@
+Subproject commit ad76eac2fbc99c8f6171048abecd8a7faf39297c
diff --git a/grafana-values.yml b/grafana-values.yml
new file mode 100644
index 0000000..d64824b
--- /dev/null
+++ b/grafana-values.yml
@@ -0,0 +1,43 @@
+---
+persistence:
+  type: pvc
+  enabled: true
+
+# DO NOT DO THIS IN PRODUCTION USE CASES
+adminUser: admin
+adminPassword: adminadminadmin
+# CONSIDER USING AN EXISTING SECRET
+# Use an existing secret for the admin user.
+# admin:
+#   ## Name of the secret. Can be templated.
+#   existingSecret: ""
+#   userKey: admin-user
+#   passwordKey: admin-password
+
+service:
+  enabled: true
+  type: ClusterIP
+
+sidecar:
+  dashboards:
+    enabled: true
+    label: grafana_dashboard
+    labelValue: "1"
+    folder: /tmp/dashboards
+    searchNamespace: ""
+
+
+datasources:
+  datasources.yaml:
+    apiVersion: 1
+    datasources:
+      - name: Loki
+        type: loki
+        access: proxy
+        orgId: 1
+        url: http://loki-gateway.meta.svc.cluster.local:80
+        basicAuth: false
+        isDefault: false
+        version: 1
+        editable: false
+
diff --git a/justfile b/justfile
new file mode 100644
index 0000000..b9e25ed
--- /dev/null
+++ b/justfile
@@ -0,0 +1,25 @@
+forward_loki:
+    kubectl port-forward --namespace meta svc/loki-gateway 3100:80
+forward_grafana:
+    #!/bin/bash
+    export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
+    kubectl --namespace meta port-forward $POD_NAME 3000
+
+forward_alloy_logs:
+    #!/bin/bash
+    export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=alloy-logs,app.kubernetes.io/instance=k8s" -o jsonpath="{.items[0].metadata.name}")
+    kubectl --namespace meta port-forward $POD_NAME 12345
+
+forward_otel_collector:
+    #!/bin/bash
+    export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=otel-collector,app.kubernetes.io/instance=k8s" -o jsonpath="{.items[0].metadata.name}")
+    kubectl --namespace meta port-forward $POD_NAME 4317
+
+get_inotify_max_user_watches:
+    #!/bin/bash
+    cat /proc/sys/fs/inotify/max_user_watches
+    export POD_NAME=$(kubectl get pods --namespace meta -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
+    kubectl exec -n meta $POD_NAME -- sh -c "ulimit -n && cat /proc/sys/fs/inotify/max_user_watches"
diff --git a/k8s-monitoring-values.yml b/k8s-monitoring-values.yml
new file mode 100644
index 0000000..a5c8458
--- /dev/null
+++ b/k8s-monitoring-values.yml
@@ -0,0 +1,57 @@
+cluster:
+  name: meta-monitoring-tutorial
+destinations:
+  - name: loki
+    type: loki
+    url: http://loki-gateway.meta.svc.cluster.local/loki/api/v1/push
+clusterEvents:
+  enabled: true
+  collector: alloy-logs
+  namespaces: []
+nodeLogs:
+  enabled: false
+podLogs:
+  enabled: true
+  gatherMethod: volumes
+  collector: alloy-logs
+  labelsToKeep: ["app_kubernetes_io_name", "container", "instance", "job", "level", "namespace", "service_name", "service_namespace", "deployment_environment", "deployment_environment_name", "span_id", "trace_id"]
+  operators:
+    - type: json_parser
+      id: parse_json
+      parse_from: body
+    - type: add_labels
+      labels:
+        trace_id: ${body.trace_id}
+        span_id: ${body.span_id}
+        service_name: ${body.service_name}
+  structuredMetadata:
+    pod: pod  # Set structured metadata "pod" from label "pod"
+  namespaces: []
+# Collectors
+alloy-singleton:
+  enabled: false
+alloy-metrics:
+  enabled: false
+nodeExporter:
+  enabled: true
+kube-state-metrics:
+  enabled: true
+kubeletMetrics:
+  enabled: true
+cAdvisor:
+  enabled: true
+dashboards:
+  enabled: true
+alloy-logs:
+  enabled: true
+  alloy:
+    mounts:
+      varlog: true
+  clustering:
+    enabled: false
+alloy-profiles:
+  enabled: false
+alloy-receiver:
+  enabled: false
diff --git a/loki-values.yml b/loki-values.yml
new file mode 100644
index 0000000..040e55a
--- /dev/null
+++ b/loki-values.yml
@@ -0,0 +1,78 @@
+---
+loki:
+  auth_enabled: false
+  commonConfig:
+    replication_factor: 1
+  schemaConfig:
+    configs:
+      - from: 2024-04-01
+        store: tsdb
+        object_store: s3
+        schema: v13
+        index:
+          prefix: loki_index_
+          period: 24h
+  ingester:
+    chunk_encoding: snappy
+  tracing:
+    enabled: true
+  pattern_ingester:
+    enabled: true
+  limits_config:
+    allow_structured_metadata: true
+    volume_enabled: true
+  ruler:
+    enable_api: true
+  querier:
+    # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
+    max_concurrent: 4
+
+minio:
+  enabled: true
+
+deploymentMode: SingleBinary
+singleBinary:
+  replicas: 1
+  resources:
+    limits:
+      cpu: 4
+      memory: 4Gi
+    requests:
+      cpu: 2
+      memory: 2Gi
+  extraEnv:
+    # Keep a little bit lower than memory limits
+    - name: GOMEMLIMIT
+      value: 3750MiB
+
+chunksCache:
+  # default is 500MB, with limited memory keep this smaller
+  writebackSizeLimit: 10MB
+
+
+# Zero out replica counts of other deployment modes
+backend:
+  replicas: 0
+read:
+  replicas: 0
+write:
+  replicas: 0
+
+ingester:
+  replicas: 0
+querier:
+  replicas: 0
+queryFrontend:
+  replicas: 0
+queryScheduler:
+  replicas: 0
+distributor:
+  replicas: 0
+compactor:
+  replicas: 0
+indexGateway:
+  replicas: 0
+bloomCompactor:
+  replicas: 0
+bloomGateway:
+  replicas: 0
\ No newline at end of file
diff --git a/main.py b/main.py
new file mode 100644
index 0000000..e179293
--- /dev/null
+++ b/main.py
@@ -0,0 +1,100 @@
+import time
+import random
+from flask import Flask
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.sdk.resources import Resource
+import logging
+from opentelemetry.trace import get_current_span
+
+
+# Setup logging with trace_id
+class TraceIdFilter(logging.Filter):
+    def filter(self, record):
+        span = trace.get_current_span()
+        ctx = span.get_span_context()
+        if ctx.trace_id != 0:
+            record.trace_id = format(ctx.trace_id, "032x")
+            record.span_id = format(ctx.span_id, "016x")
+        else:
+            record.trace_id = None
+            record.span_id = None
+        return True
+
+
+formatter = logging.Formatter(
+    '{"message": "%(message)s", "trace_id": "%(trace_id)s", "span_id": "%(span_id)s"}'
+)
+handler = logging.StreamHandler()
+handler.setFormatter(formatter)
+handler.addFilter(TraceIdFilter())
+
+logger = logging.getLogger("myapp")
+logger.setLevel(logging.INFO)
+logger.addHandler(handler)
+
+# Setup tracer (configure the provider exactly once; a second
+# set_tracer_provider() call would be ignored with a warning)
+resource = Resource(
+    attributes={
+        "service.name": "python-otel-demo-app",  # 👈 choose any name you like
+    }
+)
+trace.set_tracer_provider(TracerProvider(resource=resource))
+tracer = trace.get_tracer(__name__)
+
+# Configure OTLP HTTP exporter
+otlp_exporter = OTLPSpanExporter(
+    endpoint="http://otel-collector.meta.svc:4318/v1/traces",
+    # insecure=True,
+)
+span_processor = BatchSpanProcessor(otlp_exporter)
+trace.get_tracer_provider().add_span_processor(span_processor)
+
+# Flask App
+app = Flask(__name__)
+FlaskInstrumentor().instrument_app(app)
+
+
+def get_trace_id():
+    span = get_current_span()
+    if span:
+        return span.get_span_context().trace_id
+    return None
+
+
+@app.route("/")
+def root():
+    with tracer.start_as_current_span("handle_homepage"):
+        logger.info("handling request", extra={"trace_id": get_trace_id()})
+        time.sleep(random.random() * 0.05)  # simulate request parsing
+        fetch_user()
+        calculate_recommendations()
+        return "Hello from / with traces!"
+
+
+def fetch_user():
+    with tracer.start_as_current_span("fetch_user"):
+        time.sleep(random.random() * 0.1)  # simulate DB query
+
+
+def calculate_recommendations():
+    with tracer.start_as_current_span("calculate_recommendations"):
+        for i in range(2):
+            with tracer.start_as_current_span(f"score_item_{i}"):
+                time.sleep(random.random())
+
+
+if __name__ == "__main__":
+    app.run(host="0.0.0.0", port=5000)
diff --git a/otel-collector.yaml b/otel-collector.yaml
new file mode 100644
index 0000000..80a3ce0
--- /dev/null
+++ b/otel-collector.yaml
@@ -0,0 +1,84 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: otel-collector
+  namespace: meta
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: otel-collector
+  template:
+    metadata:
+      labels:
+        app: otel-collector
+    spec:
+      containers:
+        - name: otel-collector
+          image: otel/opentelemetry-collector-contrib:0.81.0
+          args: ["--config=/etc/otel/otel-collector-config.yaml"]
+          volumeMounts:
+            - name: otel-config
+              mountPath: /etc/otel
+      volumes:
+        - name: otel-config
+          configMap:
+            name: otel-collector-config
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: otel-collector-config
+  namespace: meta
+data:
+  otel-collector-config.yaml: |
+    receivers:
+      otlp:
+        protocols:
+          grpc:
+            endpoint: 0.0.0.0:4317
+          http:
+            endpoint: 0.0.0.0:4318
+
+    processors:
+      spanmetrics:
+        dimensions:
+          - name: operation
+            default: unknown
+        metrics_exporter: prometheus
+        aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
+
+    exporters:
+      otlp:
+        endpoint: tempo:4317
+        tls:
+          insecure: true
+      prometheus:
+        endpoint: "0.0.0.0:8889"
+
+    service:
+      pipelines:
+        traces:
+          receivers: [otlp]
+          processors: [spanmetrics]
+          exporters: [otlp]
+        metrics:
+          receivers: [otlp]  # a pipeline must declare at least one receiver
+          exporters: [prometheus]
+          processors: []
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: otel-collector
+  namespace: meta
+spec:
+  ports:
+    - name: grpc
+      port: 4317
+      targetPort: 4317
+    - name: http
+      port: 4318
+      targetPort: 4318
+  selector:
+    app: otel-collector
diff --git a/prometheus-values.yaml b/prometheus-values.yaml
new file mode 100644
index 0000000..4b0b000
--- /dev/null
+++ b/prometheus-values.yaml
@@ -0,0 +1,2 @@
+grafana:
+  enabled: true
diff --git a/pyservice.yaml b/pyservice.yaml
new file mode 100644
index 0000000..fcad3c2
--- /dev/null
+++ b/pyservice.yaml
@@ -0,0 +1,165 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: my-python-app
+  namespace: meta
+data:
+  main.py: |
+    import time
+    import random
+    from flask import Flask
+    from opentelemetry import trace
+    from opentelemetry.sdk.trace import TracerProvider
+    from opentelemetry.sdk.trace.export import BatchSpanProcessor
+    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+    from opentelemetry.instrumentation.flask import FlaskInstrumentor
+    from opentelemetry.sdk.resources import Resource
+    import logging
+    from opentelemetry.trace import get_current_span
+
+    SERVICE_NAME = "my-flask-app"  # match your OTEL resource name
+
+
+    class TraceIdFilter(logging.Filter):
+        def filter(self, record):
+            span = trace.get_current_span()
+            ctx = span.get_span_context()
+            if ctx.trace_id != 0:
+                record.trace_id = format(ctx.trace_id, "032x")
+                record.span_id = format(ctx.span_id, "016x")
+                record.service_name = SERVICE_NAME
+            else:
+                record.trace_id = None
+                record.span_id = None
+            return True
+
+
+    formatter = logging.Formatter(
+        '{"message": "%(message)s", "trace_id": "%(trace_id)s", "span_id": "%(span_id)s", "service_name": "my-flask-app"}'
+    )
+    handler = logging.StreamHandler()
+    handler.setFormatter(formatter)
+    handler.addFilter(TraceIdFilter())
+
+    logger = logging.getLogger("myapp")
+    logger.setLevel(logging.INFO)
+    logger.addHandler(handler)
+
+
+    # Configure the tracer provider exactly once; a second
+    # set_tracer_provider() call would be ignored with a warning
+    resource = Resource(
+        attributes={
+            "service.name": "python-otel-demo-app",  # 👈 choose any name you like
+        }
+    )
+    trace.set_tracer_provider(TracerProvider(resource=resource))
+    tracer = trace.get_tracer(__name__)
+
+
+    otlp_exporter = OTLPSpanExporter(
+        endpoint="http://otel-collector.meta.svc:4318/v1/traces",
+        # insecure=True,
+    )
+    span_processor = BatchSpanProcessor(otlp_exporter)
+    trace.get_tracer_provider().add_span_processor(span_processor)
+
+
+    app = Flask(__name__)
+    FlaskInstrumentor().instrument_app(app)
+
+
+    def get_trace_id():
+        span = get_current_span()
+        if span:
+            return span.get_span_context().trace_id
+        return None
+
+
+    @app.route("/")
+    def root():
+        with tracer.start_as_current_span("handle_homepage"):
+            logger.info("handling request", extra={"trace_id": get_trace_id()})
+            time.sleep(random.random() * 0.05)  # simulate request parsing
+            fetch_user()
+            calculate_recommendations()
+            return "Hello from / with traces!"
+
+
+    def fetch_user():
+        with tracer.start_as_current_span("fetch_user"):
+            time.sleep(random.random() * 0.1)  # simulate DB query
+
+
+    def calculate_recommendations():
+        with tracer.start_as_current_span("calculate_recommendations"):
+            for i in range(2):
+                with tracer.start_as_current_span(f"score_item_{i}"):
+                    time.sleep(random.random())
+
+
+    if __name__ == "__main__":
+        app.run(host="0.0.0.0", port=5000)
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: python-otel
+  namespace: meta
+  labels:
+    app: python-otel
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: python-otel
+  template:
+    metadata:
+      labels:
+        app: python-otel
+    spec:
+      containers:
+        - name: python
+          image: python
+          command: ["/bin/sh", "-c"]
+          args:
+            - |
+              python -m venv .venv && \
+              .venv/bin/python -m pip install --upgrade pip && \
+              .venv/bin/python -m pip install \
+                opentelemetry-sdk \
+                opentelemetry-exporter-otlp \
+                opentelemetry-instrumentation-flask \
+                flask && \
+              .venv/bin/python /app/main.py
+          volumeMounts:
+            - name: app-code
+              mountPath: /app
+          env:
+            - name: OTEL_EXPORTER_OTLP_ENDPOINT
+              value: http://otel-collector.meta.svc:4318
+            - name: OTEL_EXPORTER_OTLP_INSECURE
+              value: "true"
+      volumes:
+        - name: app-code
+          configMap:
+            name: my-python-app
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: python-otel
+  namespace: meta
+spec:
+  selector:
+    app: python-otel
+  ports:
+    - port: 80
+      targetPort: 5000