opentelemetry-collector can also collect CPU usage and memory usage, so let's try that.
Using hostmetricsreceiver
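To collect CPU, memory, and similar metrics, there is a receiver called hostmetricsreceiver, so I use that. The configuration is simple: just list what you want to collect under `scrapers`, and opentelemetry-collector will go and fetch it.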
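For example, since the goal is CPU as well as memory, a minimal receiver block that enables both scrapers might look like the sketch below. The `cpu` scraper is an assumption added for illustration; the configuration actually used in this post only enables `memory`.

```yaml
receivers:
  hostmetrics:
    collection_interval: 1m
    scrapers:
      # CPU time broken down by cpu and state (user, system, idle, ...),
      # reported as system.cpu.time
      cpu:
      # Memory usage broken down by state, reported as system.memory.usage
      memory:
```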
The configuration file looks like this:
```yaml
receivers:
  hostmetrics:
    collection_interval: 1m
    initial_delay: 1s
    scrapers:
      memory:

exporters:
  file:
    path: /dev/stdout

processors:
  batch:

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [file]
```
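All this does is set the collection interval (`collection_interval`) and the delay before the first collection (`initial_delay`). Note that if you don't need, say, the slab details, the receiver alone can't exclude them; apparently you have to add a Filter Processor to the pipeline and drop those metrics there.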
Running the collector with this configuration writes output like the following to stdout:

```json
{
  "resourceMetrics": [
    {
      "resource": {},
      "scopeMetrics": [
        {
          "scope": {
            "name": "otelcol/hostmetricsreceiver/memory",
            "version": "0.67.0"
          },
          "metrics": [
            {
              "name": "system.memory.usage",
              "description": "Bytes of memory in use.",
              "unit": "By",
              "sum": {
                "dataPoints": [
                  {
                    "attributes": [{"key": "state", "value": {"stringValue": "used"}}],
                    "startTimeUnixNano": "1692346703000000000",
                    "timeUnixNano": "1692347059940199793",
                    "asInt": "734248960"
                  },
                  {
                    "attributes": [{"key": "state", "value": {"stringValue": "free"}}],
                    "startTimeUnixNano": "1692346703000000000",
                    "timeUnixNano": "1692347059940199793",
                    "asInt": "6687731712"
                  },
                  {
                    "attributes": [{"key": "state", "value": {"stringValue": "buffered"}}],
                    "startTimeUnixNano": "1692346703000000000",
                    "timeUnixNano": "1692347059940199793",
                    "asInt": "60973056"
                  },
                  {
                    "attributes": [{"key": "state", "value": {"stringValue": "cached"}}],
                    "startTimeUnixNano": "1692346703000000000",
                    "timeUnixNano": "1692347059940199793",
                    "asInt": "836018176"
                  },
                  {
                    "attributes": [{"key": "state", "value": {"stringValue": "slab_reclaimable"}}],
                    "startTimeUnixNano": "1692346703000000000",
                    "timeUnixNano": "1692347059940199793",
                    "asInt": "54681600"
                  },
                  {
                    "attributes": [{"key": "state", "value": {"stringValue": "slab_unreclaimable"}}],
                    "startTimeUnixNano": "1692346703000000000",
                    "timeUnixNano": "1692347059940199793",
                    "asInt": "59551744"
                  }
                ],
                "aggregationTemporality": 2
              }
            }
          ]
        }
      ],
      "schemaUrl": "https://opentelemetry.io/schemas/1.9.0"
    }
  ]
}
```
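Dropping the slab states with a Filter Processor could look roughly like the sketch below. It assumes a contrib build of the collector whose `filter` processor accepts OTTL conditions (the exact minimum version is not confirmed here), and the processor name `filter/drop_slab` is a hypothetical choice for illustration. Each listed condition is ORed, and any data point that matches one is removed.

```yaml
receivers:
  hostmetrics:
    collection_interval: 1m
    scrapers:
      memory:

processors:
  # Hypothetical name; drops system.memory.usage data points whose
  # "state" attribute is one of the two slab states.
  filter/drop_slab:
    metrics:
      datapoint:
        - 'metric.name == "system.memory.usage" and attributes["state"] == "slab_reclaimable"'
        - 'metric.name == "system.memory.usage" and attributes["state"] == "slab_unreclaimable"'
  batch:

exporters:
  file:
    path: /dev/stdout

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # Processors run in the listed order, so batch only sees
      # the data points that survive the filter.
      processors: [filter/drop_slab, batch]
      exporters: [file]
```

With this in place, the `used`, `free`, `buffered`, and `cached` data points should still be exported, while the two slab entries should no longer appear.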
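Here the collector goes out to the host to fetch the metrics, so this is what's commonly called pull-based monitoring. I wondered how the specification treats this, and it turns out both models are supported: you can choose which one to base things on for each receiver or exporter. (The `"aggregationTemporality": 2` in the output above means cumulative; delta is 1.) The relevant part of the OTLP specification reads: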
> There are various tradeoffs between using Delta vs. Cumulative aggregation, in various use cases, e.g.:
>
> * Detecting process restarts
> * Calculating rates
> * Push vs. Pull based metric reporting
>
> OTLP supports both models, and allows APIs, SDKs and users to determine the best tradeoff for their use case.
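As one illustration of choosing the temporality on the collector side, the contrib `cumulativetodelta` processor can convert cumulative sums to delta before export. The sketch below is an assumption-laden example, not the setup used in this post: it also enables the `cpu` scraper and targets `system.cpu.time`, which is a monotonic cumulative sum. It deliberately leaves `system.memory.usage` alone, because as far as I know this processor only converts monotonic sums and histograms, and memory usage is a non-monotonic sum.

```yaml
receivers:
  hostmetrics:
    collection_interval: 1m
    scrapers:
      cpu:
      memory:

processors:
  # Convert only the listed cumulative metrics to delta temporality;
  # metrics not matched by include pass through unchanged.
  cumulativetodelta:
    include:
      metrics:
        - system.cpu.time
      match_type: strict
  batch:

exporters:
  file:
    path: /dev/stdout

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [cumulativetodelta, batch]
      exporters: [file]
```

If this works as intended, `system.cpu.time` should appear with `"aggregationTemporality": 1` (delta) in the file exporter output, while `system.memory.usage` stays at 2 (cumulative).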