Learning Diary of a Regional Engineer

A blog where I jot down rough notes on tech I'm interested in. Occasionally diary entries and gadget write-ups.

Collecting CPU and memory usage with OpenTelemetry

opentelemetry-collector can also collect CPU and memory usage, so this time I'll try that.

Using hostmetricsreceiver

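For CPU, memory, and similar host metrics there is a receiver called hostmetricsreceiver, so that's what I'll use. The configuration is simple: list the kinds of values you want under scrapers, and opentelemetry-collector goes and collects them on each interval.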
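As a sketch of what the scrapers list looks like when you want both CPU and memory, the fragment below enables the cpu and memory scrapers and also turns on system.cpu.utilization and system.memory.utilization, which as far as I know are optional metrics that are off by default. The metric names and the metrics/enabled keys are taken from my reading of the hostmetrics receiver README and may differ between collector versions, so treat them as assumptions to check against your build.

```yaml
receivers:
  hostmetrics:
    collection_interval: 1m
    scrapers:
      cpu:
        metrics:
          # Optional metric (assumed disabled by default); reports usage as a 0-1 ratio
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          # Optional metric (assumed disabled by default); complements system.memory.usage
          system.memory.utilization:
            enabled: true
```

Without the metrics blocks, each scraper just emits its default metrics (for example system.cpu.time and system.memory.usage).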

My configuration file looks like this (only the memory scraper is enabled here):

receivers:
  hostmetrics:
    collection_interval: 1m
    initial_delay: 1s
    scrapers:
      memory:

exporters:
  file:
    path: /dev/stdout

processors:
  batch:

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [file]

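All this does is set the collection interval (collection_interval) and the delay before the first scrape (initial_delay). Incidentally, even when you don't need something like the slab breakdown, the receiver itself has no option to exclude it; apparently you have to put a separate Filter Processor in the pipeline and drop those data points there.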

{
  "resourceMetrics": [
    {
      "resource": {},
      "scopeMetrics": [
        {
          "scope": {"name": "otelcol/hostmetricsreceiver/memory", "version": "0.67.0"},
          "metrics": [
            {
              "name": "system.memory.usage",
              "description": "Bytes of memory in use.",
              "unit": "By",
              "sum": {
                "dataPoints": [
                  {"attributes": [{"key": "state", "value": {"stringValue": "used"}}], "startTimeUnixNano": "1692346703000000000", "timeUnixNano": "1692347059940199793", "asInt": "734248960"},
                  {"attributes": [{"key": "state", "value": {"stringValue": "free"}}], "startTimeUnixNano": "1692346703000000000", "timeUnixNano": "1692347059940199793", "asInt": "6687731712"},
                  {"attributes": [{"key": "state", "value": {"stringValue": "buffered"}}], "startTimeUnixNano": "1692346703000000000", "timeUnixNano": "1692347059940199793", "asInt": "60973056"},
                  {"attributes": [{"key": "state", "value": {"stringValue": "cached"}}], "startTimeUnixNano": "1692346703000000000", "timeUnixNano": "1692347059940199793", "asInt": "836018176"},
                  {"attributes": [{"key": "state", "value": {"stringValue": "slab_reclaimable"}}], "startTimeUnixNano": "1692346703000000000", "timeUnixNano": "1692347059940199793", "asInt": "54681600"},
                  {"attributes": [{"key": "state", "value": {"stringValue": "slab_unreclaimable"}}], "startTimeUnixNano": "1692346703000000000", "timeUnixNano": "1692347059940199793", "asInt": "59551744"}
                ],
                "aggregationTemporality": 2
              }
            }
          ]
        }
      ],
      "schemaUrl": "https://opentelemetry.io/schemas/1.9.0"
    }
  ]
}
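The slab values in this output are data points of system.memory.usage distinguished by the state attribute, not separate metrics. A minimal sketch of dropping them with the filter processor, assuming a contrib build whose filterprocessor accepts OTTL conditions under metrics.datapoint; the name filter/drop_slab is a hypothetical label. It would replace the processors and service sections of the config above.

```yaml
processors:
  batch:
  # Hypothetical instance name; any filter/<name> works
  filter/drop_slab:
    error_mode: ignore
    metrics:
      datapoint:
        # Conditions are ORed; a data point matching any of them is dropped
        - 'metric.name == "system.memory.usage" and attributes["state"] == "slab_reclaimable"'
        - 'metric.name == "system.memory.usage" and attributes["state"] == "slab_unreclaimable"'

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # Filter first so batch only handles the data points we keep
      processors: [filter/drop_slab, batch]
      exporters: [file]
```

The used, free, buffered, and cached data points pass through unchanged.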

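In this setup the collector itself goes and fetches metrics from the host, so it is what you would call pull-style monitoring. I wondered what the spec says about this, and it seems to support both: as far as I can tell, each receiver or exporter can choose which model to base itself on. The spec frames this as Delta vs. Cumulative aggregation; the output above is cumulative ("aggregationTemporality": 2 is CUMULATIVE in OTLP, while 1 is DELTA):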

There are various tradeoffs between using Delta vs. Cumulative aggregation, in various use cases, e.g.:

* Detecting process restarts
* Calculating rates
* Push vs. Pull based metric reporting

OTLP supports both models, and allows APIs, SDKs and users to determine the best tradeoff for their use case.
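As one illustration of making that choice inside the collector, the contrib distribution has a cumulativetodelta processor that rewrites cumulative sums as deltas before they reach an exporter. A sketch, assuming the cpu scraper is also enabled so that the cumulative counter system.cpu.time exists (the config above only scrapes memory); the key layout follows my reading of the processor's README and should be checked against your version.

```yaml
processors:
  batch:
  cumulativetodelta:
    include:
      # Convert only the explicitly listed metrics
      metrics:
        - system.cpu.time
      match_type: strict

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [cumulativetodelta, batch]
      exporters: [file]
```

Listing metrics with match_type: strict keeps the conversion scoped to monotonic counters, where a per-interval delta is meaningful; a gauge-like sum such as system.memory.usage is better left cumulative.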