Add minstratum and maxstratum directives to specify the minimum and
maximum allowed stratum of sources to be selected. The default values
are 0 and 15 respectively, allowing all NTP sources and refclocks.
Sources that are rejected due to having too large or too small stratum
are marked with 'r' in the selection log and selectdata report.
This is similar to the "tos floor" and "tos ceiling" settings of ntpd,
except that maxstratum is interpreted as one below the ceiling.
SCK_AcceptConnection() always returns a non-blocking socket. Clear the
O_NONBLOCK flag in the socket unit test, which relies on blocking, to
avoid failures.
Reported-by: Matthias Andree <matthias.andree@gmx.de>
The recent rework of refclock reachability to better work with
driver-specific filtering (PHC driver dropping samples with unexpected
delay) introduced an issue that a PPS refclock is indicated as reachable
even when its "lock" refclock is permanently unreachable, or its samples
constistently fail in other sample checks, and no actual samples can be
accumulated. This breaks the new maxunreach option.
Rework the refclock code to provide samples from drivers together with
their quality level (all drivers except PHC provide samples with
constant quality of 1) and drop samples with quality 0 after passing
all checks, right before the actual accumulation in the median sample
filter. Increment the reachability counter only for samples that would
be accumulated.
This fixes the problem with refclocks indicated as reachable when their
samples would be dropped for other reasons than the PHC-specific delay
filter, and the maxunreach option can work as expected.
Fixes: b9b338a8df ("refclock: rework update of reachability")
Modify the HCL_ProcessReadings() function to try to always provide
a valid sample. Instead of dropping a sample outside of the expected
delay, provide its assumed quality level as a small integer (relative to
already accumulated samples), and let the caller decide what quality is
acceptable.
Add maxunreach option to NTP sources and refclocks to specify the
maximum number of polls that the source can stay selected for
synchronization when it is unreachable (i.e. no valid sample was
received in the last 8 polls).
It is an additional requirement to having at least one sample more
recent than the oldest sample of reachable sources.
The default value is 100000. Setting the option to 0 disables selection
of unreachable sources, which matches RFC 5905.
To minimize the impact of potential attacks targeting chronyc started
under root (e.g. performed by a local chronyd process running without
root privileges, a remote chronyd process, or a MITM attacker on the
network), add support for changing the effective UID/GID in chronyc
after start.
The user can be specified by the -u option, similarly to chronyd. The
default chronyc user can be changed by the --with-chronyc-user
configure option. The default value of the default chronyc user is
"root", i.e. chronyc doesn't try to change the identity by default.
The default chronyc user does not follow the default chronyd user
set by the configure --with-user option to avoid errors on systems where
chronyc is not allowed to change its UID/GID (e.g. by a SELinux policy).
If a clock step enabled by the makestep directive or requested by the
makestep command fails, accumulate the missing step back to keep the
tracking offset valid.
This fixes time served by an instance configured with the makestep
directive and the -x option (the null driver cannot perform steps) at
the same time. It will still generate error log messages.
Switch from memcmp() to the new constant-time function to compare the
received and expected authentication data generated with a symmetric key
(NTP MAC or AES CMAC).
While this doesn't seem to be strictly necessary with the current
code, it is a recommended practice to prevent timing attacks. If
memcmp() compared the MACs one byte at a time (a typical memcmp()
implementation works with wider integers for better performance) and
chronyd as an NTP client/server/peer was leaking the timing of the
comparison (e.g. in the monitoring protocol), an attacker might be able
for a given NTP request or response find in a sequence the individual
bytes of the MAC by observing differences in the timing over a large
number of attempts. However, this process would likely be so slow the
authenticated request or response would not be useful in a MITM attack
as the expected origin timestamp is changing with each poll.
Extend the keys unit test to compare the time the function takes to
compare two identical MACs and MACs differing in the first byte
(maximizing the timing difference). It should fail if the compiler's
optimizations figure out the function can return early. The test is not
included in the util unit test to avoid compile-time optimizations with
the function and its caller together. The test can be disabled by
setting NO_TIMING_TESTS environment variable if it turns out to be
unreliable.
Add a function to check if two buffers of the same length contain the
same data, but do the comparison in a constant time with respect to the
returned value to avoid creating a timing side channel, i.e. the time
depends only on the buffer length, not on the content.
Use the gnutls_memcmp() or nettle_memeql_sec() functions if available,
otherwise use the same algorithm as nettle - bitwise ORing XORed data.
Don't allow the NTP support and asynchronous name resolving to be
disabled. pthreads are now a hard requirement.
NTP is the primary task of chrony. This functionality doesn't seem to be
commonly disabled (allowing only refclocks and manual input).
This removes rarely (if ever) used code and simplifies testing.
Add "waitunsynced" option to specify how long chronyd needs to wait
before it can activate the local reference when the clock is not
synchronized to give the configured sources a chance to synchronize the
local clock after start. The default is 300 seconds when the orphan
option is enabled (same as the ntpd's default orphanwait), 0 otherwise.
Add "waitsynced" option to specify how long it should wait when the
clock is synchronized. It is an additional requirement to the distance
and activate options.
Include the number of unreachable sources in the "Can't synchronise: no
selectable sources" log message to provide a hint whether it might be a
networking issue.
Add the number of sources that form an agreement (overlapping
intervals), if at least two agree with each other, and number of
reachable sources to the "Can't synchronize: no majority" log message to
better explain why synchronization is failing and hint that adding more
sources might help.
Replace the hardcoded list of open commands (accessible over UDP),
with a list that can be configured with a new "opencommands" directive.
The default matches the original list. All read-only commands except
accheck and cmdaccheck can be enabled. The naming follows the chronyc
naming. Enable the N_SOURCES request only when needed.
This makes it possible to have a full monitoring access without access
to the Unix domain socket. It also allows restricting the monitoring
access to a smaller number of commands if some commands from the default
list are not needed.
Mention in the man page that the protocol of the non-default commands is
not consider stable and the information they provide may have security
implications.
Add a new parameter to limit the negative value of the step state
variable. It's set as a maximum delay in number of updates before the
actual step applied to the quantile estimate starts growing from the
minimum step when the input value is consistently larger or smaller than
the estimate.
This prevents the algorithm from effectively becoming the slower 1U
variant if the quantile estimate is stable most of the time.
Set it to 100 updates for the NTP delay and 1000 updates for the hwclock
delay. An option could be added later to make it configurable.
The algorithm was designed for estimating quantiles in streams of
integer values. When the estimate is equal to the input value, the
step state variable does not change. This causes problems for the
floating-point adaptation used for measurents of delay in chrony.
One problem is numerical instability due to the strict comparison of
the input value and the current estimate.
Another problem is with signals that are so stable that the nanosecond
resolution of the system functions becomes the limitation. There is a
large difference in the value of the step state variable, which
determines how quickly the estimate will adapt to a new distribution,
between signals that are constant in the nanosecond resolution and
signals that can move in two nanoseconds.
Change the estimate update to never consider the input value equal to
the current estimate and don't set the estimate exactly to the input
value. Keep it off by a quarter of the minimum step to force jumping
around the input value if it's constant and decreasing the step variable
to negative values. Also fix the initial adjustment to step at least by
the minimum step (the original algorithm is described with ceil(), not
fabs()).
Reduce the minimum number of samples required by the filter from
min(4, length) to 1.
This makes the filtering less confusing. The sample lifetime is limited
to one poll and the default filtering of the SOCK refclock (where the
maximum number of samples per poll is unknown) is identical to the other
refclocks.
A concern with potential variability in number of samples per poll below
4 is switching between different calculations of dispersion in
combine_selected_samples() in samplefilt.c.
The 106-refclock test shows how the order of refclocks in the config can
impact the first filtered sample and selection. If the PPS refclock
follows SHM, a single low-quality PPS sample is accepted in the same
poll where SHM is selected and the initial clock correction started,
which causes larger skew later and delays the first selection of the PPS
refclock.
Update the reachability register of a refclock source by 1 if a valid
measurement is received by the drivers between source polls, and not
only when it is accumulated to sourcestats, similarly to how
reachability works with NTP sources.
This avoids drops in the reported reachability when a PHC refclock is
dropping samples due to significant changes in the measured delay (e.g.
due to high PCIe load), or a PPS refclock dropping samples due to failed
lock.
Add ntsaeads directive to specify a list of AEAD algorithms enabled for
NTS. The list is shared between the server and client. For the client it
also specifies the order of priority. The default is "30 15", matching
the previously hardcoded preference of AES-128-GCM-SIV (30) over
AES-SIV-CMAC-256 (15).
Make sure the TLS session is not NULL in NKSN_GetKeys() before trying to
export the keys in case some future code tried to call the function
outside of the NTS-KE message handler.
Implement a fallback for the NTS-NTP client to switch to the compliant
AES-128-GCM-SIV exporter context when the server is using the compliant
context, but does not support the new NTS-KE record negotiating its use,
assuming it can respond with an NTS NAK to the request authenticated
with the incorrect key.
Export both sets of keys when processing the NTS-KE response. If an NTS
NAK is the only valid response from the server after the last NTS-KE
session, switch to the keys exported with the compliant context for the
following requests instead of dropping all cookies and restarting
NTS-KE. Don't switch back to the original keys if an NTS NAK is received
again.