
[Bug] Kubernetes discovery - targets not being removed after pods tear down #634

Open · grzesuav opened this issue Sep 2, 2024 · 5 comments · May be fixed by #689
Labels: bug (Something isn't working)
Assignee: andrewazores
Project status: Backlog

grzesuav commented Sep 2, 2024

Current Behavior

Currently, in the Topology/target selection view I can still see old, non-existent targets, even about 5 minutes after the pods stopped.

Expected Behavior

Targets belonging to non-running pods are removed from the view.

Steps To Reproduce

❯ k get rs
NAME                           DESIRED   CURRENT   READY   AGE
registry-556c9d5446            2         2         2       17m
registry-6878b7c78b            0         0         0       70m
registry-f459568bf             0         0         0       9d

as you can see, ReplicaSet registry-f459568bf is quite old and does not currently have any running pods

❯ k get pods
NAME                                 READY   STATUS    RESTARTS   AGE
registry-556c9d5446-bzm2m            2/2     Running   0          18m
registry-556c9d5446-xh2nq            2/2     Running   0          20m
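
For anyone trying to reproduce this, a minimal way to force the kind of pod churn described above might look like the sketch below. It assumes the Deployment and namespace are both named registry, as in the listings here; the exact commands are an assumption, not the reporter's verified steps.

# Roll the deployment so every pod is replaced by a new one
kubectl -n registry rollout restart deployment registry
kubectl -n registry rollout status deployment registry

# Kubernetes state: the Endpoints object and pod list should now only
# reference the new pods
kubectl -n registry get endpoints registry -o yaml
kubectl -n registry get pods

# The reported bug: the old pod names keep showing up as targets in the
# Cryostat Topology / target selection view for several minutes or longer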

Environment

- OS: AKSUbuntu
- Environment: AKS 1.31
- Version: Cryostat 3.0

Anything else?

No response

grzesuav added the bug (Something isn't working) and needs-triage (Needs thorough attention from code reviewers) labels on Sep 2, 2024
andrewazores changed the title from "[Bug] Cryostat discovery - targets not being removed after pods tear down" to "[Bug] Kubernetes discovery - targets not being removed after pods tear down" on Sep 2, 2024

andrewazores commented Sep 2, 2024

@grzesuav are there any exceptions in the Cryostat container logs at the time (or within a few seconds after) you scale down or delete one of these deployments?

And could you paste the output from:

$ kubectl get -o yaml endpoints
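
Side note, not from the original comment: the Endpoints object lives in the application's namespace, so namespaced forms of the requested diagnostics could look roughly like the following; the deploy/cryostat name and the namespace placeholder are assumptions that depend on the installation.

# Endpoints for the affected Service, in its own namespace
$ kubectl -n registry get endpoints registry -o yaml

# Cryostat container logs around the time of the scale-down / deletion
$ kubectl -n <cryostat-namespace> logs deploy/cryostat --since=10m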


grzesuav commented Sep 2, 2024

❯ k get endpoints registry -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2024-09-02T15:26:47Z"
  name: registry
  namespace: registry
subsets:
- addresses:
  - ip: 10.184.uuu.xxx
    nodeName: aks-nodepool0609-redacted
    targetRef:
      kind: Pod
      name: registry-8b68c85b8-mjt7n
      namespace: registry
      uid: 852fcb5b-redacted
  - ip: 10.184.fff.rrr
    nodeName: aks-nodepool0609-redacted
    targetRef:
      kind: Pod
      name: registry-8b68c85b8-sqktd
      namespace: registry
      uid: 03b7ba07-redacted
  notReadyAddresses:
  - ip: 10.184.yyy.xxx
    nodeName: aks-nodepool0609-redacted
    targetRef:
      kind: Pod
      name: registry-8b68c85b8-2zj9n
      namespace: registry
      uid: 66499772-redacted
  ports:
  - name: http
    port: 9000
    protocol: TCP
  - name: jfr-jmx
    port: 9091
    protocol: TCP
  - name: http-prometheus
    port: 9090
    protocol: TCP
❯ k get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP              NODE                                   NOMINATED NODE   READINESS GATES
registry-8b68c85b8-2zj9n             2/2     Running   0          38s     10.184.   aks-nodepool0609-redacted   <none>           <none>
registry-8b68c85b8-mjt7n             2/2     Running   0          6m22s   10.184.   aks-nodepool0609-redacted   <none>           <none>
registry-8b68c85b8-sqktd             2/2     Running   0          6m43s   10.184.    aks-nodepool0609-redacted   <none>           <none>


grzesuav commented Sep 2, 2024

I see various errors in the Cryostat logs; I will continue tomorrow to provide more details.


grzesuav commented Sep 3, 2024

So actually today I see the same targets as in #634 (comment), even though the pods haven't been there for many hours.
Explore-logs-2024-09-03 12_26_01.txt

Attaching the logs which appear now when I am trying to connect.

@andrewazores what is the name of the Kubernetes discovery logger? Maybe I can filter the logs related to it to find something interesting?

@andrewazores

I think the Logger's name should be io.cryostat.discovery.KubeApiDiscovery.

Possibly related: #353 , #396
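
Not part of the original comment: given that logger name, one way to pull out only the discovery-related log lines, and to raise that single logger to DEBUG, could be something like the sketch below. It assumes the Cryostat 3.x server is Quarkus-based and that your deployment tolerates manually added environment variables; the deploy/cryostat name and namespace placeholder are assumptions.

# Filter existing container logs for the Kubernetes discovery logger
kubectl -n <cryostat-namespace> logs deploy/cryostat | grep KubeApiDiscovery

# Raise only that logger to DEBUG via the env-var form of the Quarkus property
# quarkus.log.category."io.cryostat.discovery.KubeApiDiscovery".level
kubectl -n <cryostat-namespace> set env deploy/cryostat \
  QUARKUS_LOG_CATEGORY__IO_CRYOSTAT_DISCOVERY_KUBEAPIDISCOVERY__LEVEL=DEBUG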

andrewazores removed the needs-triage (Needs thorough attention from code reviewers) label on Oct 9, 2024
andrewazores self-assigned this on Oct 9, 2024