Sergey Kandaurov [Tue, 30 Nov 2021 11:30:59 +0000 (14:30 +0300)]
QUIC: simplified ngx_quic_send_alert() callback.
Removed sending CLOSE_CONNECTION directly to avoid duplicate frames,
since it is sent later again in SSL_do_handshake() error handling.
As such, removed redundant settings of error fields set elsewhere.
While here, improved debug message.
Vladimir Homutov [Thu, 18 Nov 2021 11:33:21 +0000 (14:33 +0300)]
QUIC: removed unnecessary closing of active/backup sockets.
All open sockets are stored in a queue. There is no need to close some
of them separately. If it happens that active and backup point to same
socket, double close may happen (leading to possible segfault).
Vladimir Homutov [Mon, 29 Nov 2021 08:51:14 +0000 (11:51 +0300)]
QUIC: fixed migration during NAT rebinding.
The RFC 9000 allows a packet from known CID arrive from unknown path:
These requirements regarding connection ID reuse apply only to the
sending of packets, as unintentional changes in path without a change
in connection ID are possible. For example, after a period of
network inactivity, NAT rebinding might cause packets to be sent on a
new path when the client resumes sending.
Before the patch, such packets were rejected with an error in the
ngx_quic_check_migration() function. Removing the check makes the
separate function excessive - remaining checks are early migration
check and "disable_active_migration" check. The latter is a transport
parameter sent to client and it should not be used by server.
The server should send "disable_active_migration" "if the endpoint does
not support active connection migration" (18.2). The support status depends
on nginx configuration: to have migration working with multiple workers,
you need bpf helper, available on recent Linux systems. The patch does
not set "disable_active_migration" automatically and leaves it for the
administrator. By default, active migration is enabled.
RFC 900 says that it is ok to migrate if the peer violates
"disable_active_migration" flag requirements:
If the peer violates this requirement,
the endpoint MUST either drop the incoming packets on that path without
generating a Stateless Reset
OR
proceed with path validation and allow the peer to migrate. Generating a
Stateless Reset or closing the connection would allow third parties in the
network to cause connections to close by spoofing or otherwise manipulating
observed traffic.
So, nginx adheres to the second option and proceeds to path validation.
Note:
The ngtcp2 may be used for testing both active migration and NAT rebinding:
Vladimir Homutov [Mon, 29 Nov 2021 08:49:09 +0000 (11:49 +0300)]
QUIC: refactored multiple QUIC packets handling.
Single UDP datagram may contain multiple QUIC datagrams. In order to
facilitate handling of such cases, 'first' flag in the ngx_quic_header_t
structure is introduced.
Vladimir Homutov [Thu, 18 Nov 2021 11:19:36 +0000 (14:19 +0300)]
QUIC: fixed handling of RETIRE_CONNECTION_ID frame.
Previously, the retired socket was not closed if it didn't match
active or backup.
New sockets could not be created (due to count limit), since retired socket
was not closed before calling ngx_quic_create_sockets().
When replacing retired socket, new socket is only requested after closing
old one, to avoid hitting the limit on the number of active connection ids.
Together with added restrictions, this fixes an issue when a current socket
could be closed during migration, recreated and erroneously reused leading
to null pointer dereference.
Roman Arutyunyan [Wed, 17 Nov 2021 20:07:51 +0000 (23:07 +0300)]
QUIC: handle DATA_BLOCKED frame from client.
Previously the frame was not handled and connection was closed with an error.
Now, after receiving this frame, global flow control is updated and new
flow control credit is sent to client.
Roman Arutyunyan [Wed, 17 Nov 2021 20:07:38 +0000 (23:07 +0300)]
QUIC: update stream flow control credit on STREAM_DATA_BLOCKED.
Previously, after receiving STREAM_DATA_BLOCKED, current flow control limit
was sent to client. Now, if the limit can be updated to the full window size,
it is updated and the new value is sent to client, otherwise nothing is sent.
The change lets client update flow control credit on demand. Also, it saves
traffic by not sending MAX_STREAM_DATA with the same value twice.
Roman Arutyunyan [Thu, 11 Nov 2021 16:07:00 +0000 (19:07 +0300)]
QUIC: reject streams which we could not create.
The reasons why a stream may not be created by server currently include hitting
worker_connections limit and memory allocation error. Previously in these
cases the entire QUIC connection was closed and all its streams were shut down.
Now the new stream is rejected and existing streams continue working.
To reject an HTTP/3 request stream, RESET_STREAM and STOP_SENDING with
H3_REQUEST_REJECTED error code are sent to client. HTTP/3 uni streams and
Stream streams are not rejected.
The variable contains a negotiated curve used for the handshake key
exchange process. Known curves are listed by their names, unknown
ones are shown in hex.
Note that for resumed sessions in TLSv1.2 and older protocols,
$ssl_curve contains the curve used during the initial handshake,
while in TLSv1.3 it contains the curve used during the session
resumption (see the SSL_get_negotiated_group manual page for
details).
The variable is only meaningful when using OpenSSL 3.0 and above.
With older versions the variable is empty.
Maxim Dounin [Fri, 29 Oct 2021 23:39:19 +0000 (02:39 +0300)]
Changed ngx_chain_update_chains() to test tag first (ticket #2248).
Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).
The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.
In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by event pipe. Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.
Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.
Maxim Dounin [Fri, 29 Oct 2021 17:21:57 +0000 (20:21 +0300)]
Changed default value of sendfile_max_chunk to 2m.
The "sendfile_max_chunk" directive is important to prevent worker
monopolization by fast connections. The 2m value implies maximum 200ms
delay with 100 Mbps links, 20ms delay with 1 Gbps links, and 2ms on
10 Gbps links. It also seems to be a good value for disks.
Maxim Dounin [Fri, 29 Oct 2021 17:21:54 +0000 (20:21 +0300)]
Upstream: sendfile_max_chunk support.
Previously, connections to upstream servers used sendfile() if it was
enabled, but never honored sendfile_max_chunk. This might result
in worker monopolization for a long time if large request bodies
are allowed.
Maxim Dounin [Fri, 29 Oct 2021 17:21:51 +0000 (20:21 +0300)]
Fixed sendfile() limit handling on Linux.
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
Maxim Dounin [Fri, 29 Oct 2021 17:21:48 +0000 (20:21 +0300)]
Simplified sendfile_max_chunk handling.
Previously, it was checked that sendfile_max_chunk was enabled and
almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
delaying connections where sendfile_max_chunk wasn't reached (for example,
when sending responses smaller than sendfile_max_chunk). Now we instead
check if there are unsent data, and the connection is still ready for writing.
Additionally we also check c->write->delayed to ignore connections already
delayed by limit_rate.
This approach is believed to be more robust, and correctly handles
not only sendfile_max_chunk, but also internal limits of c->send_chain(),
such as sendfile() maximum supported length (ticket #1870).
Maxim Dounin [Fri, 29 Oct 2021 17:21:43 +0000 (20:21 +0300)]
Switched to using posted next events after sendfile_max_chunk.
Previously, 1 millisecond delay was used instead. In certain edge cases
this might result in noticeable performance degradation though, notably on
Linux with typical CONFIG_HZ=250 (so 1ms delay becomes 4ms),
sendfile_max_chunk 2m, and link speed above 2.5 Gbps.
Using posted next events removes the artificial delay and makes processing
fast in all cases.
Roman Arutyunyan [Thu, 28 Oct 2021 11:14:25 +0000 (14:14 +0300)]
Mp4: mp4_start_key_frame directive.
The directive enables including all frames from start time to the most recent
key frame in the result. Those frames are removed from presentation timeline
using mp4 edit lists.
Edit lists are currently supported by popular players and browsers such as
Chrome, Safari, QuickTime and ffmpeg. Among those not supporting them properly
is Firefox[1].
Based on a patch by Tracey Jaquith, Internet Archive.
The function updates the duration field of mdhd atom. Previously it was
updated in ngx_http_mp4_read_mdhd_atom(). The change makes it possible to
alter track duration as a result of processing track frames.
Roman Arutyunyan [Mon, 18 Oct 2021 11:48:11 +0000 (14:48 +0300)]
HTTP/3: send Stream Cancellation instruction.
As per quic-qpack-21:
When a stream is reset or reading is abandoned, the decoder emits a
Stream Cancellation instruction.
Previously the instruction was not sent. Now it's sent when closing QUIC
stream connection if dynamic table capacity is non-zero and eof was not
received from client. The latter condition means that a trailers section
may still be on its way from client and the stream needs to be cancelled.
Roman Arutyunyan [Mon, 18 Oct 2021 12:47:06 +0000 (15:47 +0300)]
HTTP/3: allowed QUIC stream connection reuse.
A QUIC stream connection is treated as reusable until first bytes of request
arrive, which is also when the request object is now allocated. A connection
closed as a result of draining, is reset with the error code
H3_REQUEST_REJECTED. Such behavior is allowed by quic-http-34:
Once a request stream has been opened, the request MAY be cancelled
by either endpoint. Clients cancel requests if the response is no
longer of interest; servers cancel requests if they are unable to or
choose not to respond.
When the server cancels a request without performing any application
processing, the request is considered "rejected." The server SHOULD
abort its response stream with the error code H3_REQUEST_REJECTED.
The client can treat requests rejected by the server as though they had
never been sent at all, thereby allowing them to be retried later.
Roman Arutyunyan [Mon, 18 Oct 2021 12:22:33 +0000 (15:22 +0300)]
HTTP/3: adjusted QUIC connection finalization.
When an HTTP/3 function returns an error in context of a QUIC stream, it's
this function's responsibility now to finalize the entire QUIC connection
with the right code, if required. Previously, QUIC connection finalization
could be done both outside and inside such functions. The new rule follows
a similar rule for logging, leads to cleaner code, and allows to provide more
details about the error.
While here, a few error cases are no longer treated as fatal and QUIC connection
is no longer finalized in these cases. A few other cases now lead to
stream reset instead of connection finalization.
Vladimir Homutov [Wed, 13 Oct 2021 11:48:33 +0000 (14:48 +0300)]
QUIC: fixed removal of unused client IDs.
If client ID was never used, its refcount is zero. To keep things simple,
the ngx_quic_unref_client_id() function is now aware of such IDs.
If client ID was used, the ngx_quic_replace_retired_client_id() function
is supposed to find all users and unref the ID, thus ngx_quic_unref_client_id()
should not be called after it.
The "min" and "max" arguments refer to UDP datagram size. Generating payload
requires to account properly for header size, which is variable and depends on
payload size and packet number.
Sergey Kandaurov [Tue, 26 Oct 2021 14:43:10 +0000 (17:43 +0300)]
QUIC: speeding up processing 0-RTT.
After fe919fd63b0b, processing QUIC streams was postponed until after handshake
completion, which means that 0-RTT is effectively off. With ssl_ocsp enabled,
it could be further delayed. This differs from how OCSP validation works with
SSL_read_early_data(). With this change, processing QUIC streams is unlocked
when obtaining 0-RTT secret.
Vladimir Homutov [Fri, 15 Oct 2021 09:26:42 +0000 (12:26 +0300)]
QUIC: optimized ack range processing.
The sent queue is sorted by packet number. It is possible to avoid
traversing full queue while handling ack ranges. It makes sense to
start traversing from the queue head (i.e. check oldest packets first).
Roman Arutyunyan [Wed, 13 Oct 2021 11:41:46 +0000 (14:41 +0300)]
QUIC: traffic-based flood detection.
With this patch, all traffic over a QUIC connection is compared to traffic
over QUIC streams. As long as total traffic is many times larger than stream
traffic, we consider this to be a flood.
With this patch, all traffic over HTTP/3 bidi and uni streams is counted in
the h3c->total_bytes field, and payload traffic is counted in the
h3c->payload_bytes field. As long as total traffic is many times larger than
payload traffic, we consider this to be a flood.
Request header traffic is counted as if all fields are literal. Response
header traffic is counted as is.
Martin Duke [Tue, 12 Oct 2021 08:57:50 +0000 (11:57 +0300)]
QUIC: attempt decrypt before checking for stateless reset.
Checking the reset after encryption avoids false positives. More importantly,
it avoids the check entirely in the usual case where decryption succeeds.
RFC 9000, 10.3.1 Detecting a Stateless Reset
Endpoints MAY skip this check if any packet from a datagram is
successfully processed.
Roman Arutyunyan [Thu, 30 Sep 2021 14:14:42 +0000 (17:14 +0300)]
Added r->response_sent flag.
The flag indicates that the entire response was sent to the socket up to the
last_buf flag. The flag is only usable for protocol implementations that call
ngx_http_write_filter() from header filter, such as HTTP/1.x and HTTP/3.
Stream: fixed segfault when using SSL certificates with variables.
Similar to the previous change, a segmentation fault occurres when evaluating
SSL certificates on a QUIC connection due to an uninitialized stream session.
The fix is to adjust initializing the QUIC part of a connection until after
it has session and variables initialized.
Similarly, this appends logging error context for QUIC connections:
- client 127.0.0.1:54749 connected to 127.0.0.1:8880 while handling frames
- quic client timed out (60: Operation timed out) while handling quic input
HTTP/3: fixed segfault when using SSL certificates with variables.
A QUIC connection doesn't have c->log->data and friends initialized to sensible
values. Yet, a request can be created in the certificate callback with such an
assumption, which leads to a segmentation fault due to null pointer dereference
in ngx_http_free_request(). The fix is to adjust initializing the QUIC part of
a connection such that it has all of that in place.
Further, this appends logging error context for unsuccessful QUIC handshakes:
- cannot load certificate .. while handling frames
- SSL_do_handshake() failed .. while sending frames
Ruslan Ermilov [Mon, 27 Sep 2021 07:10:38 +0000 (10:10 +0300)]
Configure: fixed QUIC support test.
OpenSSL library QUIC support cannot be tested at configure time when
using the --with-openssl option so assume it's present if requested.
While here, fixed the error message in case QUIC support is missing.
Roman Arutyunyan [Wed, 22 Sep 2021 11:08:21 +0000 (14:08 +0300)]
HTTP/3: fixed ngx_stat_active counter.
Previously the counter was not incremented for HTTP/3 streams, but still
decremented in ngx_http_close_connection(). There are two solutions here, one
is to increment the counter for HTTP/3 streams, and the other one is not to
decrement the counter for HTTP/3 streams. The latter solution looks
inconsistent with ngx_stat_reading/ngx_stat_writing, which are incremented on a
per-request basis. The change adds ngx_stat_active increment for HTTP/3
request and push streams.
QUIC: set NGX_TCP_NODELAY_DISABLED for fake stream connections.
Notably, it is to avoid setting the TCP_NODELAY flag for QUIC streams
in ngx_http_upstream_send_response(). It is an invalid operation on
inherently SOCK_DGRAM sockets, which leads to QUIC connection close.
The change reduces diff to the default branch in stream content phase.
Roman Arutyunyan [Fri, 17 Sep 2021 13:32:23 +0000 (16:32 +0300)]
HTTP/3: make ngx_http_log_error() static again.
This function was only referenced from ngx_http_v3_create_push_request() to
initialize push connection log. Now the log handler is copied from the parent
request connection.
The functions ngx_quic_handle_read_event() and ngx_quic_handle_write_event()
are added. Previously this code was a part of ngx_handle_read_event() and
ngx_handle_write_event().
The change simplifies ngx_handle_read_event() and ngx_handle_write_event()
by moving QUIC-related code to a QUIC source file.
QUIC: store QUIC connection fd in stream fake connection.
Previously it had -1 as fd. This fixes proxying, which relies on downstream
connection having a real fd. Also, this reduces diff to the default branch for
ngx_close_connection().
HTTP/2: optimized processing of small DATA frames.
The request body filter chain is no longer called after processing
a DATA frame. Instead, we now post a read event to do this. This
ensures that multiple small DATA frames read during the same event loop
iteration are coalesced together, resulting in much faster processing.
Since rb->buf can now contain unprocessed data, window update is no
longer sent in ngx_http_v2_state_read_data() in case of flow control
being used due to filter buffering. Instead, window will be updated
by ngx_http_v2_read_client_request_body_handler() in the posted read
event.
HTTP/2: fixed timers left after request body reading.
Following rb->filter_need_buffering changes, request body reading is
only finished after the filter chain is called and rb->last_saved is set.
As such, with r->request_body_no_buffering, timer on fc->read is no
longer removed when the last part of the body is received, potentially
resulting in incorrect behaviour.
The fix is to call ngx_http_v2_process_request_body() from the
ngx_http_v2_read_unbuffered_request_body() function instead of
directly calling ngx_http_v2_filter_request_body(), so the timer
is properly removed.
HTTP/2: fixed window updates when buffering in filters.
In the body read handler, the window was incorrectly calculated
based on the full buffer size instead of the amount of free space
in the buffer. If the request body is buffered by a filter, and
the buffer is not empty after the read event is generated by the
filter to resume request body processing, this could result in
"http2 negative window update" alerts.
Further, in the body ready handler and in ngx_http_v2_state_read_data()
the buffer wasn't cleared when the data were already written to disk,
so the client might stuck without window updates.
QUIC: fixed null pointer dereference in MAX_DATA handler.
If a MAX_DATA frame was received before any stream was created, then the worker
process would crash in nginx_quic_handle_max_data_frame() while traversing the
stream tree. The issue is solved by adding a check that makes sure the tree is
not empty.