Skip to content

Conversation

zainkabani
Copy link
Contributor

When there are multiple concurrent active servers, the stats reporting reports a 0 value average. This is because of the old value being updated and then the average being overridden when a new server pointing to the same address stats gets updated.

SERVER STATS: "postgres_shard_0_replica_0", 452826896 total_value: 1648, old_value: 0 average: 329 SERVER STATS: "postgres_shard_0_replica_1", -1989878371 total_value: 1585, old_value: 0 average: 317 SERVER STATS: "postgres_shard_0_replica_0", 1447777444 total_value: 1648, old_value: 1648 average: 0 SERVER STATS: "postgres_shard_0_replica_0", -1279761882 total_value: 1648, old_value: 1648 average: 0 SERVER STATS: "postgres_shard_0_replica_1", 1763453831 total_value: 1585, old_value: 1585 average: 0 SERVER STATS: "postgres_shard_0_replica_1", 2126090134 total_value: 1585, old_value: 1585 average: 0 SERVER STATS: "postgres_shard_0_replica_0", -969830376 total_value: 1648, old_value: 1648 average: 0 

This PR fixes this by running a check to make sure that we only ever do this once for any given AddressStats Arc object.

@zainkabani zainkabani marked this pull request as ready for review May 12, 2023 00:12

for stats in SERVER_STATS.read().values() {
stats.address_stats().update_averages();
// Hold read lock for duration of update to retain all server stats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused here. We only have one instance of the collector running, so there is only one task updating averages. How can we run update_averages() twice or more even with multiple server connections connected to the same address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue here comes from all the servers having the same reference to the AddressStats. When we update address stat's old_total. The next server that we try to update averages for will see no difference between the current total and the old total cause it was updated in the previous iteration of the loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an example of this in the description of the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah okay. Confusing! Arcs are tricky. Ok, sounds good, I think this is fine for now. Would be good to investigate further and maybe remove the leaked reference?

@levkk levkk merged commit 7326069 into postgresml:main May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants