Since v0.26, Pomerium configures Envoy to use a custom HTML error page
format string for most errors served by Envoy itself. This format string
uses %COMMAND% directives to include details about the error.
The HTML error page template also includes any branding options set via
the corresponding Enterprise settings. We need to ensure that any %
signs in the branding options strings are escaped to %% so that Envoy
will not interpret them as the start of a %COMMAND% directive, which
could lead to Envoy rejecting the format string as invalid.
* Refactor trace config to match supported otel options
* use duration instead of int64 for otel timeouts
* change 'trace client updated' log level to debug
* update tracing config definitions
* new tracing system
* performance improvements
* only configure tracing in envoy if it is enabled in pomerium
* [tracing] refactor to use custom extension for trace id editing (#5420)
refactor to use custom extension for trace id editing
* set default tracing sample rate to 1.0
* fix proxy service http middleware
* improve some existing auth related traces
* test fixes
* bump envoyproxy/go-control-plane
* code cleanup
* test fixes
* Fix missing spans for well-known endpoints
* import extension apis from pomerium/envoy-custom
The previous default sample rate of 0.0001 is very low, so traces are
unlikely to be visible after enabling them until many thousands of
requests have been sent. This could be confusing to users.
Currently Pomerium will always generate a wildcard certificate for use
as a fallback certificate.
If any other certificate is configured, this fallback certificate will
not normally be presented, except in the case of a TLS connection where
the client does not include the Server Name Indication (SNI) extension.
All modern browsers support SNI, so in practice this certificate should
never be presented to end users.
However, some network scanning tools will probe connections by IP
addresses (without SNI), and so this fallback certificate may be
presented. The presence of this certificate may be flagged as a problem
in some automated vulnerability scans.
Let's avoid generating this fallback certificate if Pomerium has any
other certificate configured (unless specifically requested by the Auto
TLS option). This should prevent false positive reports from these
particular vulnerability scans.
* envoyconfig: cleanup
* remove listener access log for mtls for insecure server which can't use mtls
* use new functions
* rename method
* refactor common code
* Initial test environment implementation
* linter pass
* wip: update request latency test
* bugfixes
* Fix logic race in envoy process monitor when canceling context
* skip tests using test environment on non-linux
This also replaces instances where we manually write "return ctx.Err()"
with "return context.Cause(ctx)" which is functionally identical, but
will also correctly propagate cause errors if present.
* Optimize policy iterators (go1.23)
This modifies (*Options).GetAllPolicies() to use a go 1.23 iterator
instead of copying all policies on every call, which can be extremely
expensive. All existing usages of this function were updated as
necessary.
Additionally, a new (*Options).NumPolicies() method was added which
quickly computes the number of policies that would be given by
GetAllPolicies(), since there were several usages where only the
number of policies was needed.
* Fix race condition when assigning default envoy opts to a policy
This allows using the scheme 'h2c' to indicate http2 prior knowledge for
insecure upstream servers. This can be used to perform TLS termination for
GRPC servers configured with insecure credentials.
As an example, this allows the following route configuration:
routes:
- from: https://grpc.localhost.pomerium.io
to: h2c://localhost:9090
envoy: log mtls failures
This implements limited listener-based access logging for downstream
transport failures, only enabled when downstream_mtls.enforcement is
set to 'reject_connection'. Client certificate details and the error
message will be logged.
Additionally, the new key 'client-certificate' can be set in the
access_log_fields list in the configuration, which will add peer
certificate properties (issuer, subject, SANs) to the existing
per-request http logs.
---------
Co-authored-by: Kenneth Jenkins <51246568+kenjenkins@users.noreply.github.com>
Add a new 'user_principal_name' type to the downstream mTLS
match_subject_alt_names option. This corresponds to the 'OtherName' type
with type-id 1.3.6.1.4.1.311.20.2.3 and a UTF8String value.
Add support for UserPrincipalName SAN matching to the policy evaluator.
Replace Atoi() calls with ParseUint(), and update the buildAddress()
defaultPort parameter to be a uint32. (A uint16 would arguably make more
sense for a port number, but uint32 matches the Envoy proto field.)
Delete a ParseAddress() method that appears to be unused.
Add a distinction between TCP routes depending on whether the To URL(s)
have the scheme tcp://. For routes with a TCP upstream, configure Envoy
to terminate CONNECT requests and open a TCP tunnel to the upstream
service (this is the current behavior). For routes without a TCP
upstream, configure Envoy to proxy CONNECT requests to the upstream.
This new mode can allow an upstream proxy server to terminate a CONNECT
request and open its own TCP tunnel to the final destination server.
(Note that this will typically require setting the preserve_host_header
option as well.)
Note that this requires Envoy 1.30 or later.
Envoy has an option 'auto_host_rewrite' that rewrites the Host header of
an incoming request to match the upstream domain that the proxied
request is sent to. Pomerium sets the 'auto_host_rewrite' option for all
Pomerium routes that do not set one of the "Host Rewrite options" (see
https://www.pomerium.com/docs/reference/routes/headers#host-rewrite-options).
When Envoy rewrites the Host header, it does not include the upstream
port, even when it is a non-standard port for the scheme (i.e. a port
other than 80 for http or a port other than 443 for https).
I think this behavior does not conform to RFC 9110. The nearest thing I
can find in the text is this statement about http and https URIs:
"If the port is equal to the default port for a scheme, the normal form
is to omit the port subcomponent."
(from https://datatracker.ietf.org/doc/html/rfc9110#section-4.2.3)
I take this to mean that the port should be specified in other cases.
There is a work-around: we can set an explicit hostname on each cluster
endpoint. Let's set this hostname based on the 'to' URL(s) from the
Pomerium route.
This should change the current behavior in two cases:
- When a route has a 'to' URL with a port number, this port number will
now be included in the Host header in the requests made by Pomerium.
- When a route has a 'to' URL with 'localhost' or an IP address as the
host, Pomerium will now rewrite the Host header to match the 'to'
URL.
There should be no change in behavior for routes where one of the "Host
Rewrite options" is set.
In split service mode, and during periods of inactivity, the gRPC
connections to the databroker may fall idle. Some network firewalls may
eventually time out an idle TCP connection and even start dropping
subsequent packets once connection traffic resumes. Combined with Linux
default TCP retransmission settings, this could cause a broken
connection to persist for over 15 minutes.
In an attempt to avoid this scenario, enable TCP keepalive for outbound
gRPC connections, matching the Go standard library default settings for
time & interval: 15 seconds for both. (The probe count does not appear
to be set, so it will remain at the OS default.)
Add a test case exercising the BuildClusters() method with the default
configuration options, comparing the results with a reference "golden"
file in the testdata directory. Also add an '-update' flag to make it
easier to update the reference golden when needed:
go test ./config/envoyconfig -update