## Summary
When there are a large number of record changes and several databrokers,
the query to delete those record changes can sometimes deadlock. If we
use a CTE with `FOR UPDATE SKIP LOCKED` we can avoid this deadlock.
I also lowered the default expiry from 24 hours to 1 hour. The record
changes table is used via `Sync` and any client using `Sync` should
already account for missing records by switching to `SyncLatest` if its
too far behind. An hour ought to be more than sufficient.
## Related issues
-
[ENG-2635](https://linear.app/pomerium/issue/ENG-2635/core-postgres-deadlock-in-record-change-deletion)
## Checklist
- [x] reference any related issues
- [x] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
Some test improvements:
- disable the race detector on macos, which currently results in a ton
of linking errors, and I think doing race detection on linux is
sufficient to find stuff
- add build tags to the `integration` folder so it's not included by
default when running `go test ./...`
- remove some unused stuff in the Makefile
- change the file `testenv` searches for from `.git` to `go.mod` which
allows test caching to work more reliably
- remove the test for orphaned connections in postgres. It doesn't
actually make sense because we have listener connections in the
background. I think it was racy.
- fix yarn caching
## Checklist
- [ ] reference any related issues
- [x] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
Improvements to the `contextutil.Merge` function:
- it can now take an arbitrary number of contexts rather than just 2
- it returns a `CancelCauseFunc` instead of a `CancelFunc` which should
be more flexible
- `Deadline` now returns the soonest time instead of the first one
defined
- use `synctest` for the tests, add a few more tests
## Checklist
- [ ] reference any related issues
- [x] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
Refactor the storage tests so they're mostly in `storagetest` and used
for both providers.
Use `SO_REUSEPORT` on listeners to try and make tests more reliable.
## Related issues
<!-- For example...
- #159
-->
## User Explanation
<!-- How would you explain this change to the user? If this
change doesn't create any user-facing changes, you can leave
this blank. If filled out, add the `docs` label -->
## Checklist
- [ ] reference any related issues
- [ ] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
There are 3 indices in the postgres storage driver that are redundant.
This PR drops them.
## Related issues
-
[ENG-2560](https://linear.app/pomerium/issue/ENG-2560/request-remove-redundant-database-indexes)
## Checklist
- [x] reference any related issues
- [ ] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
Enables metrics for the `pgxpool` that is used by the PostgreSQL
databroker backend.
Metrics are updated at 1s interval.
Will add the following metric output in the regular Prometheus
`/metrics` endpoint:
```
# HELP pomerium_pgxpool_acquire_duration_nanoseconds_total Total duration of all successful acquires from the pool in nanoseconds.
# TYPE pomerium_pgxpool_acquire_duration_nanoseconds_total counter
pomerium_pgxpool_acquire_duration_nanoseconds_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 5.1058702e+07
# HELP pomerium_pgxpool_acquired_connections Number of currently acquired connections in the pool.
# TYPE pomerium_pgxpool_acquired_connections gauge
pomerium_pgxpool_acquired_connections{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 0
# HELP pomerium_pgxpool_acquires_total Cumulative count of successful acquires from the pool.
# TYPE pomerium_pgxpool_acquires_total counter
pomerium_pgxpool_acquires_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 91
# HELP pomerium_pgxpool_canceled_acquires_total Cumulative count of acquires from the pool that were canceled by a context.
# TYPE pomerium_pgxpool_canceled_acquires_total counter
pomerium_pgxpool_canceled_acquires_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 0
# HELP pomerium_pgxpool_constructing_connections_milliseconds Number of connections with construction in progress in the pool.
# TYPE pomerium_pgxpool_constructing_connections_milliseconds gauge
pomerium_pgxpool_constructing_connections_milliseconds{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 0
# HELP pomerium_pgxpool_empty_acquire_total Cumulative count of successful acquires from the pool that waited for a resource to be released or constructed because the pool was empty.
# TYPE pomerium_pgxpool_empty_acquire_total counter
pomerium_pgxpool_empty_acquire_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 6
# HELP pomerium_pgxpool_idle_connections Number of currently idle connections in the pool.
# TYPE pomerium_pgxpool_idle_connections gauge
pomerium_pgxpool_idle_connections{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 5
# HELP pomerium_pgxpool_max_connections Maximum size of the pool.
# TYPE pomerium_pgxpool_max_connections gauge
pomerium_pgxpool_max_connections{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 10
# HELP pomerium_pgxpool_max_idle_destroys_total Cumulative count of connections destroyed because they exceeded MaxConnectionsIdleTime.
# TYPE pomerium_pgxpool_max_idle_destroys_total counter
pomerium_pgxpool_max_idle_destroys_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 0
# HELP pomerium_pgxpool_max_lifetime_destroys_total Cumulative count of connections destroyed because they exceeded MaxConnectionsLifetime.
# TYPE pomerium_pgxpool_max_lifetime_destroys_total counter
pomerium_pgxpool_max_lifetime_destroys_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 0
# HELP pomerium_pgxpool_new_connections_total Cumulative count of new connections opened.
# TYPE pomerium_pgxpool_new_connections_total counter
pomerium_pgxpool_new_connections_total{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 6
# HELP pomerium_pgxpool_total_connections Total number of resources currently in the pool. The value is the sum of ConstructingConnections, AcquiredConnections, and IdleConnections.
# TYPE pomerium_pgxpool_total_connections gauge
pomerium_pgxpool_total_connections{db_client_connection_pool_name="localhost:5432/pomerium",db_system="postgresql",otel_scope_name="github.com/exaring/otelpgx",otel_scope_version="v0.9.1",hostname="MacBookPro"} 5
```
## Related issues
<!-- For example...
- #159
-->
## User Explanation
<!-- How would you explain this change to the user? If this
change doesn't create any user-facing changes, you can leave
this blank. If filled out, add the `docs` label -->
## Checklist
- [x] reference any related issues
- [x] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
Add metrics for the global cache. Configure `otel` to export metrics to
prometheus.
## Related issues
-
[ENG-2407](https://linear.app/pomerium/issue/ENG-2407/add-additional-metrics-and-tracing-spans-to-pomerium)
## Checklist
- [x] reference any related issues
- [ ] updated unit tests
- [ ] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
## Summary
Invalidate the sync querier when records are updated so that we fallback
to databroker querying until the sync is complete.
## Related issues
For
[ENG-2377](https://linear.app/pomerium/issue/ENG-2377/core-initial-access-with-idp-accessidentity-tokens-sometimes-fails)
## Checklist
- [x] reference any related issues
- [x] updated unit tests
- [x] add appropriate label (`enhancement`, `bug`, `breaking`,
`dependencies`, `ci`)
- [x] ready for review
* upgrade to go v1.24
* add a macOS-specific //nolint comment too
---------
Co-authored-by: Kenneth Jenkins <51246568+kenjenkins@users.noreply.github.com>
This also replaces instances where we manually write "return ctx.Err()"
with "return context.Cause(ctx)" which is functionally identical, but
will also correctly propagate cause errors if present.
Add a test that interleaves calls to Put() (with different type strings)
and ListTypes(). At least on my machine, this appears to reliably detect
the data race fixed in commit 2f8743522d
when run with the Go race detector. (The 'make test' and 'make cover'
targets run with the Go race detector enabled.)
The Patch() method was intended to skip any records that do not
currently exist. However, currently inmemory.Backend.Patch() will return
ErrNotFound if the last record in the records slice is not found (it
will ignore any other previous records that are not found).
Update the error handling logic here to be consistent with the postgres
backend, and add a unit test to exercise this case.
Add a Patch() method to the databroker gRPC service.
Update the storage.Backend interface to include the Patch() method now
that all the storage.Backend implementations include it.
Add a test to exercise the patch method under concurrent usage.
Remove the Redis databroker backend. According to
https://www.pomerium.com/docs/internals/data-storage#redis it has been
discouraged since Pomerium v0.18.
Update the config options validation to return an error if "redis" is
set as the databroker storage backend type.
Add a new Patch() method that updates specific fields of an existing
record's data, based on a field mask.
Extract some logic from the existing Get() and Put() methods so it can
be shared with the new Patch() method.