With forced reselection during source removal selected_source_index
can only be INVALID_SOURCE if there are no sources. The "Can't
synchronise: no sources" message couldn't be logged even before that as
SRC_ReselectSource() resets the index before calling SRC_SelectSource().
Replace the message with an assertion.
When a selected source is being removed, reset the instance and rerun
the selection while the source is still marked as selected. This forces
a "Can't synchronise" message to be logged when all sources are removed.
Reported-by: Thomas Lange <thomas@corelatus.se>
When switching from basic mode to interleaved mode following a response
which wasn't accumulated due to failed test A, B, C, or D, allow
timestamps of the failed sample to be reused in interleaved mode, i.e.
replacing the server's less accurate transmit timestamp with a more
accurate timestamp server can turn a failed sample into acceptable one.
Move the presend check into test A to simplify the code.
The presend option in interleaved mode uses two presend requests instead
of one to get an interleaved response from servers like chrony which
delay the first interleaved response due to an optimization saving
timestamps only for clients actually using the interleaved mode.
After commit 0ae6f2485b ("ntp: don't use first response in interleaved
mode") the first interleaved response following the two presend
responses in basic mode is dropped as the preferred set of timestamps
minimizing error in delay was already used by the second sample in
basic mode. There are only three responses in the burst and no sample is
accumulated.
Increasing the number of presend requests to three to get a fourth
sample would be wasteful. Instead, allow reusing timestamps of the
second presend sample in basic mode, which is never accumulated.
Reported-by: Aaron Thompson
Fixes: 0ae6f2485b ("ntp: don't use first response in interleaved mode")
Close all reusable sockets when the NTS-KE server helper is forked. It
is not supposed to have access to any of the server sockets, just the
socket for getting requests from the main process and the syslog socket.
Set the CLOEXEC flag on all reusable sockets in the initialization to
avoid leaking them to sendmail (mailonchange directive) in case the
chrony configuration doesn't use all sockets provided by systemd.
Before opening new IPv4/IPv6 server sockets, chronyd will check for
matching reusable sockets passed from the service manager (for example,
passed via systemd socket activation:
https://www.freedesktop.org/software/systemd/man/latest/sd_listen_fds.html)
and use those instead.
Aside from IPV6_V6ONLY (which cannot be set on already-bound sockets),
the daemon sets the same socket options on reusable sockets as it would
on sockets it opens itself.
Unit tests test the correct parsing of the LISTEN_FDS environment
variable.
Add 011-systemd system test to test socket activation for DGRAM and
STREAM sockets (both IPv4 and IPv6). The tests use the
systemd-socket-activate test tool, which has some limitations requiring
workarounds discussed in inline comments.
Initialize the unused value of network correction parsed from
own transmitted packets to avoid a use-of-uninitialized-value error
in NIO_UnwrapMessage() reported by clang.
Fixes: 6372a9f93f ("ntp: save PTP correction from NTP-over-PTP messages")
If the network correction is known for both the request and response,
and their sum is not larger that the measured peer delay, allowing the
transparent clocks to be running up to 100 ppm faster than the client's
clock, apply the corrections to the NTP offset and peer delay. Don't
correct the root delay to not change the estimated maximum error.
Provide the network correction (PTP correction + RX duration) of the
request in the new extension field if included in the request and
NTP-over-PTP is enabled.
To be able to verify PTP corrections, the client will need to know both
the correction of the request received by the server and the correction
of the response. Add a new experimental NTP extension field that the
clients will use to request the correction and servers return the
value.
Add two new fields to the NTP_Local_Timestamp structure:
- receive duration as the time it takes to receive the ethernet frame,
currently known only with HW timestamping
- network correction as a generalized PTP correction
The PTP correction is provided by transparent clocks in the correction
field of PTP messages to remove the receive, processing and queueing
delays of network switches and routers. Only one-step end-to-end unicast
transparent clocks are useful for NTP-over-PTP. Two-step transparent
clocks use follow-up messages and peer-to-peer transparent clocks don't
handle delay requests.
The RX duration will be included in the network correction to compensate
for asymmetric link speeds of the server and client as the NTP RX
timestamp corresponds to the end of the reception (in order to
compensate for the asymmetry in the normal case when no corrections
are applied).
Rename the exp1 extension field to exp_mono_root (monotonic timestamp +
root delay/dispersion) to better distinguish it from future experimental
extension fields.
Add support for AES-128-GCM-SIV in the current development code of
gnutls. There doesn't seem to be an API to get the cipher's minimum and
maximum nonce length and it doesn't check for invalid lengths. Hardcode
and check the limits in chrony for now.
When reloading a modified source from sourcedir which is ordered before
the original source (e.g. maxpoll was decreased), the new source is
added before the original one is removed. If the source is specified by
IP address, the addition fails due to the conflict with the original
source. Sources specified by hostname don't conflict. They are resolved
later (repeatedly if the resolver provides only conflicting addresses).
Split the processing of sorted source lists into two phases, so all
modified sources are removed before they are added again to avoid the
conflict.
Reported-by: Thomas Lange <thomas@corelatus.se>
FreeBSD allows switching the receive timestamp format to struct timespec by
setting the SO_TS_CLOCK socket option to SO_TS_REALTIME after enabling
SO_TIMESTAMP. If successful, the kernel then starts adding SCM_REALTIME
control messages instead of SCM_TIMESTAMP.
Avoid frequently ending in the middle of a client/server exchange with
long delays. This changed after commit 4a11399c2e ("ntp: rework
calculation of transmit timeout").
Clients sockets are closed immediately after receiving valid response.
Don't wait for the first early HW TX timestamp to enable waiting for
late timestamps. It may take a long time or never come if the HW/driver
is consistently slow. It's a chicken and egg problem.
Instead, simply check if HW timestamping is enabled on at least one
interface. Responses from NTP sources on other interfaces will always be
saved (for 1 millisecond by default).
If noselect is present in the configured options, don't assume it
cannot change and the effective options are equal. This fixes chronyc
selectopts +noselect command.
Fixes: 3877734814 ("sources: add function to modify selection options")
When refreshing a source, compare the newly resolved addresses with the
originally resolved address instead of the current address to avoid
unnecessary replacements when the address is changed due to the NTS-KE
server negotiation.
The getdate code (copied from gnulib before it was switched to GPLv3)
has multiple issues with signed integer overflows. Use the -fwrapv
compiler option for this object to at least make the operations defined.
Refresh NTP sources specified by hostname periodically (every 2 weeks
by default) to avoid long-running instances using a server which is no
longer intended for service, even if it is still responding correctly
and would not be replaced as unreachable, and help redistributing load
in large pools like pool.ntp.org. Only one source is refreshed at a time
to not interrupt clock updates if there are multiple selectable servers.
The refresh directive configures the interval. A value of 0 disables
the periodic refreshment.
Suggested-by: Ask Bjørn Hansen <ask@develooper.com>
Don't leave dangling pointers to timer queue entries when they are
freed in the scheduler finalization in case some code tried to remove
a timer later.
Fixes: 6ea1082a72 ("sched: free timer blocks on exit")
Wait for four consecutive source selections giving a bad status
(falseticker, bad distance or jittery) before triggering the source
replacement. This should reduce the rate of unnecessary replacements
and shorten the time needed to find a solution when unreplaceable
falsetickers are preventing other sources from forming a majority due
to switching back and forth to unreachable servers.
Instead of waiting for the next update of reachability, trigger
replacement of falsetickers, jittery and distant sources as soon as
the selection status is updated in their SRC_SelectSource() call.
When starting the daemon, wait in the grandparent process for the parent
process to terminate before exiting to avoid systemd logging a warning
"Supervising process $PID which is not our child". Waiting for the pipe
to be closed by the kernel when the parent process exits is not
sufficient.
Reported-by: Jan Pazdziora <jpazdziora@redhat.com>
Replacement attempts are globally rate limited to one per 7*2^8 seconds
to limit the rate of DNS requests for public servers like pool.ntp.org.
If multiple sources are repeatedly attempting replacement (at their
polling intervals), one source can be getting all attempts for periods
of time.
Use a randomly generated interval to randomize the order of source
replacements without changing the average rate.
The clang memory sanitizer seems to trigger on an uninitialized value
passed to format_name() when the source is a refclock, even though the
value is not used for anything. Pass 0 in this case to avoid the error.
valgrind 3.21.0 reports realloc() of 0 bytes as an error due to having
different behavior on different systems. The only place where this can
happen in chrony is the array, which doesn't care what value realloc()
returns.
Modify the realloc wrapper to call free() in this case to make valgrind
happy.
Don't load keys and cookies from the client's dump file if it has an
unsupported algorithm and unparseable keys (matching the algorithm's
expected length of zero). They would fail all SIV operations and trigger
new NTS-KE session.
Initialize the unused part of shorter server NTS keys (AES-128-GCM-SIV)
loaded from ntsdumpdir to avoid sending uninitialized data in requests
to the NTS-KE helper process.
Do that also for newly generated keys in case the memory will be
allocated dynamically.
Fixes: b1230efac3 ("nts: add support for encrypting cookies with AES-128-GCM-SIV")
If the resolver orders addresses by IP family, there is more than one
address in the preferred IP family, and they are all reachable, but
not selectable (e.g. falsetickers in a small pool which cannot remove
them from DNS), chronyd is unable to switch to addresses in the other IP
family as it follows the resolver's order.
Enable randomization of the address selection for all source
replacements and not just replacement of (unreachable) tentative
sources. If the system doesn't have connectivity in the other family,
the addresses will be skipped and no change in behavior should be
observed.
The polltarget value is used in a floating-point division in the
calculation of the poll adjustment. Set 1 as the minimum accepted
polltarget value to avoid working with infinite values.
Set the polling interval to minpoll when changing address of a source,
but only if it is reachable to avoid increasing load on server or
network in case that is the reason for the source being unreachable.
This shortens the time needed to replace a falseticker or
unsynchronized source with a selectable source.
When the refresh command is issued, instead of trying to replace all
NTP sources as if they were unreachable or falsetickers, keep using the
current address if it is still returned by the resolver for the name.
This avoids unnecessary loss of measurements and switching to
potentially unreachable addresses.
Rework handling of late HW TX timestamps. Instead of suspending reading
from client-only sockets that have HW TX timestamping enabled, save the
whole response if it is valid and a HW TX timestamp was received for the
source before. When the timestamp is received, or the configurable
timeout is reached, process the saved response again, but skip the
authentication test as the NTS code allows only one response per
request. Only one valid response per source can be saved. If a second
valid response is received while waiting for the timestamp, process both
responses immediately in the order they were received.
The main advantage of this approach is that it works on all sockets, i.e.
even in the symmetric mode and with NTP-over-PTP, and the kernel does
not need to buffer invalid responses.
Previously, in the calculation of the next transmission time
corresponding to the current polling interval, the reference point was
the current time in the client mode (i.e. the time when the response is
processed) and the last transmission time in the symmetric mode.
Rework the code to use the last transmission in both modes and make it
independent from the time when the response is processed to avoid extra
delays due to waiting for HW TX timestamps.
Update the serverstats response to use the new 64-bit integers.
Don't define a new value for the response as it already had an
incompatible change since the latest release (new fields added for
timestamp counters).
Add a structure for 64-bit integers without requiring 64-bit alignment
to be usable in CMD_Reply without struct packing.
Add utility functions for conversion to/from network order. Avoid using
be64toh() and htobe64() as they don't seem to be available on all
supported systems.
Count served timestamps in all combinations of RX/TX and
daemon/kernel/hardware. Repurpose CLG_LogAuthNtpRequest() to update all
NTP-specific stats in one call per accepted request and response.
After 5f4cbaab7e ("ntp: optimize detection of clients using
interleaved mode") the local TX timestamp is saved for all requests
indicating interleaved mode even when no previous RX timestamp is found.
Specify maxpoll for HW timestamping (default minpoll + 1) to track the
PHC well even when there is little NTP traffic on the interface. After
each PHC reading schedule a timeout according to the maxpoll. Polling
between minpoll and maxpoll is still triggered by HW timestamps.
Wait for the first HW timestamp before adding the timeout to avoid
polling PHCs on interfaces that are enabled in the configuration but
not used for NTP. Add a new scheduling class to separate polling of
different PHCs to avoid too long intervals between processing I/O
events.
In some cases even the new timeout of 1 millisecond is not sufficient to
get all HW TX timestamps. Add a new directive to allow users to
specify longer timeouts.
With some hardware it takes milliseconds to get the HW TX timestamp.
Rework the code to handle multiple suspended client-only sockets at the
same time in order to allow longer timeouts, which may overlap for
different sources. Instead of waiting for the first read event simply
suspend the socket and create timeout when the HW TX timestamp is
requested.
The Linux kernel (as of 6.2) has a shared queue of external timestamps
for all descriptors of the same PHC. If multiple refclocks using the
same PHC and the same or different channels were specified, some
refclocks didn't receive any or most of their timestamps, depending on
the rate and timing of the events (with the previous commit avoiding
blocking reads).
Track extpps-enabled refclocks in an array. Add PHC index to the PHC
instance. When a timestamp is read from the descriptor, provide it to
all refclocks that have the same PHC index and a channel matching the
event.
Make sure the timestamp is different from the previous one in case the
kernel will be improved to duplicate the timestamps for different
descriptors.
Reported-by: Matt Corallo <ntp-lists@mattcorallo.com>
The kernel has a common queue for all readers of a PHC device. With
multiple PHC refclocks using the same device some reads blocked. PHC
devices don't seem to support non-blocking reads. Use poll() to check if
a timestamp is available before reading from the descriptor.
Count missing samples for the median filter when
NAU_PrepareRequestAuth() is failing.
Fixes: 4234732b08 ("ntp: rework filter option to count missing samples")
Don't adjust the NTP polling interval and decrement the burst count when
NAU_PrepareRequestAuth() fails (e.g. no NTS-KE response received yet,
network being down, or the server refusing connections), same as if an
NTP request could not be sent. Rely on the rate limiting implemented in
the NTS code.
When chronyd configured with an NTS source not specified as offline and
resolvable without network was started before the network was up, it was
using an unnecessarily long NTS-KE retry interval, same as if the server
was refusing the connections.
When the network is down, the connect() call made from NKC_Start() on
the non-blocking TCP socket should fail with a different error than
EINPROGRESS and cause NKC_Start() to return with failure. Add a constant
2-second retry interval (matching default iburst) for this case.
When NKC_Start() fails (e.g. due to unreachable network), don't wait for
the next poll to destroy the client and another poll to create and start
it again.
In a non-tty session with chronyc it is not possible to detect the
end of the response without relying on timeouts, or separate responses
to a repeated command if using the -c option.
Add -e option to end each response with a line containing a single dot.
The sample time used in calculation of the last_meas_ago (LastRx) value
in the sources report is aligned to the second to minimize the leak
of the NTP receive timestamp, which could be useful in some attacks.
There is no need to do that with reference clocks, which are often used
with very short polling intervals and an extra second in the LastRx
value can be misinterpreted as a missed sample.
Log a warning message for each detected falseticker, but only once
between changes in the selection of the best source. Don't print all
sources when no majority is reached as that case has its own warning
message.
After dropping root privileges, log a warning message if chronyd
doesn't have read access or has (unnecessary) write access to the
files containing symmetric and server NTS keys.
Log a warning message if the file specified by the keyfile or
ntsserverkey directive is world-readable or writable, which is likely
an insecure misconfiguration. There is no check of directories
containing the file.
Split the new SOCK conditional using __GLIBC_PREREQ macro (which has
arguments) to fix compilation when it is not defined.
Fix also debug message using sizeof(time_t) in case it's enabled on
64-bit systems.
Reported-by: Bryan Christianson <bryan@whatroute.net>
Fixes: badaa83c31 ("refclock: convert mismatched timeval in SOCK messages")
On 32-bit glibc-based (>=2.34) systems, allow the SOCK client to send
messages with timevals using the other time_t size than chrony. If the
length of the received message corresponds to the other size, convert
the timeval and move the rest of the message before its processing.
This is needed for compatibility with the current development version of
gpsd, which forces 64-bit time_t on these systems, while chrony needs to
be compiled with the same time_t as gnutls.
The NTP SHM refclock protocol has the following properties:
- the memory segments have a predictable key (first segment 0x4e545030)
- it's expected to work in any order of starting chronyd and the program
providing samples to chronyd, i.e. both the consumer and producer need
to be able to create the segment
- the producer and consumer generally don't know under which user is
the other side running (e.g. gpsd can create the segment as root and
also as nobody after it drops root privileges)
- there is no authentication of data provided via SHM
- there is no way to restart the protocol
This makes it difficult for chronyd to ensure it is receiving
measurements from the process that the admin expects it to and not some
other process that managed to create the segment before it was started.
It's up to the admin to configure the system so that chronyd or the
producer is started before untrusted applications or users can create
the segment, or at least verify at some point later that the segment was
created with the expected owner and permissions.
There doesn't seem to be a backward-compatible fix of the protocol. Even
if one side could detect the segment had a wrong owner or permissions,
it wouldn't be able to tell the other side to reattach after recreating
the segment with the expected owner and permissions, if it still had the
permissions to do that.
The protocol would need to specify which side is responsible for
creating the segment and the start order would need to strictly follow
that.
As gpsd (likely the most common refclock source for chronyd) now
supports in the latest version SOCK even for message-based timing,
update the man page and FAQ to deprecate SHM in favor of SOCK.
This is a more restricted version of the chronyd service intended for
minimal NTP/NTS client configurations. The daemon is started without
root privileges and is allowed to write only to its own runtime, state,
and log directories. It cannot bind to privileged ports in order to
operate as an NTP server, or provide monitoring access over IPv4/IPv6.
It cannot use reference clocks, HW timestamping, RTC tracking, and other
features.
Add a function to add new selection options or remove existing options
specified in the configuration for both NTP sources and reference
clocks.
Provide a pair of IP address and reference ID to identify the source
depending on the type. Find the source directly in the array of sources
instead of going through the NSR hashtable for NTP sources to not
complicate it unnecessarily.
Log important changes from chronyc for auditing purposes.
Add log messages for:
- loaded symmetric keys and server NTS keys (logged also on start)
- modified maxupdateskew and makestep
- enabled/disabled local reference mode (logged also on start)
- reset time smoothing (logged also on clock steps)
- reset sources
Log a message when a single NTP source or pool of sources is added or
removed. Use the INFO severity if it's a result of a chronyc command or
(re)load of sourcefiles (which are assumed to change over time), and
DEBUG for other contexts, e.g. sources loaded from the config, sources
removed when pruning pools after reaching maxsources, and other parts of
normal operation.
Allow messages to have severity set to INFO or DEBUG depending on the
context in which they are made to allow logging important changes made
from chronyc or sourcefile, but not spam the system log if those changes
are normally expected (e.g. specified in the config).
If an NTS server is configured without ntsdumpdir, keys will not be
saved and reloaded after restart, which will cause existing cookies
to be invalidated and can cause a short-term denial of service if
the server has so many clients that it cannot handle them all
making an NTS-KE session within one polling interval.
Log a warning message if a server key+certificate is specified without
ntsdumpdir.
If the authenticator SIV encryption fails (e.g. due to wrong nonce
length), decrement the number of extension fields to keep the packet
info consistent.
Specify the AEAD ID for each key saved in the ntskeys file instead of
one ID for all keys. Keep support for loading files in the old format.
This will allow servers to save their keys after upgrading to a new
version with AES-128-GCM-SIV support before the loaded AES-SIV-CMAC-256
keys are rotated out.
If an unsupported key is found, don't load any keys. Also, change the
severity of the error message from debug to error.
If AES-128-GCM-SIV is available on the server, use it for encryption of
cookies. This makes them shorter by 4 bytes due to shorter nonce and it
might also improve the server performance.
After server upgrade and restart with ntsdumpdir, the switch will happen
on the second rotation of the server key. Clients should accept shorter
cookies without restarting NTS-KE. The first response will have extra
padding in the authenticator field to make the length symmetric.
Keep a server SIV instance for each available algorithm.
Select AES-128-GCM-SIV if requested by NTS-KE client as the first
supported algorithm.
Instead of encoding the AEAD ID in the cookie, select the algorithm
according to the length of decrypted keys. (This can work as a long as
all supported algorithms use keys with different lengths.)
If AES-128-GCM-SIV is available on the client, add it to the requested
algorithms in NTS-KE as the first (preferred) entry.
If supported on the server, it will make the cookies shorter, which
will get the length of NTP messages containing only one cookie below
200 octets. This should make NTS more reliable in networks where longer
NTP packets are filtered as a mitigation against amplification attacks
exploiting the ntpd mode 6/7 protocol.
Don't allow a cookie to contain keys with different lengths to not break
the assumption made in decoding, if there will ever be a case where this
could be requested.
While AES-SIV-CMAC allows nonces of any length, AES-GCM-SIV requires
exactly 12 bytes, which is less than the unpadded minimum length of 16
used in the NTS authenticator field. These functions will be needed to
support both ciphers in the NTS code.
This is a newer nonce misuse-resistant cipher specified in RFC 8452,
which is now supported in the development code of the Nettle library.
The advantages over AES-SIV-CMAC-256 are shorter keys and better
performance.
In glibc 2.36 was added the arc4random family of functions. However,
unlike on other supported systems, it is not a user-space PRNG
implementation. It just wraps the getrandom() system call with no
buffering, which causes a performance loss on NTP servers due to
the function being called twice for each response to add randomness
to the RX and TX timestamp below the clock precision.
Don't check for arc4random on Linux to keep using the buffered
getrandom().
Replace NULL in test code of functions which have (at least in glibc) or
could have arguments marked as nonnull to avoid the -Wnonnull warnings,
which breaks the detection with the -Werror option.
If the randomly generated timestamps are close to the current time, the
source can be selected for synchronization, which causes a crash when
logging the source name due to uninitialized ntp_sources.
Specify the source with the noselect option to prevent selection.
Call the function with current time instead of latest sample of the
first source to avoid undefined conversion of negative double to long
int.
Fixes: 07600cbd71 ("test: extend sources unit test")
Add a new test for maximum delay using a long-term estimate of a
p-quantile of the peer delay. If enabled, it replaces the
maxdelaydevratio test. It's main advantage is that it is not sensitive
to outliers corrupting the minimum delay.
As it can take a large number of samples for the estimate to reach the
expected value and adapt to a new value after a network change, the
option is recommended only for local networks with very short polling
intervals.
Instead of waiting for the sample filter to accumulate the specified
number of samples and then deciding if the result is acceptable, count
missing samples and get the result after the specified number of polls.
This should work better when samples are dropped at a high rate. The
source and clock update interval will be stable as long as at least
one sample can be collected.
When the minimum round-trip time is checked to enable a sub-second
polling interval, consider also the last sample in the filter to avoid
waiting for the first sample to be accumulated in sourcestats.
If a sub-second polling interval is configured, initialize the local
poll to 0 to avoid a shorter interval between the first and second
request in case no response to the first request is received (in time).
Return with an error code from chronyc if the command is expected to
print some data and fflush() or ferror() indicates an error. This should
make it easier for scripts to detect missing data when redirected to a
file.
Filtering was moved to a separate source file in commit
c498c21fad ("refclock: split off median filter). It looks like
MedianFilter struct somehow survived the split. Remove it to reduce
confusion.
With the first interleaved response coming after a basic response the
client is forced to select the four timestamps covering most of the last
polling interval, which makes measured delay very sensitive to the
frequency offset between server and client. To avoid corrupting the
minimum delay held in sourcestats (which can cause testC failures),
reject the first interleaved response in the client/server mode as
failing the test A.
This does not change anything for the symmetric mode, where both sets of
the four timestamps generally cover a significant part of the polling
interval.
If the computer is overloaded so much that chronyd cannot stop a slew
within one second of the scheduled end and the actual duration is more
than doubled (2 seconds with the minimum duration of 1 second), the
overshoot will be larger than the intended correction. If these
conditions persist, the oscillation will grow up to the maximum offset
allowed by maxslewrate and the delay in stopping.
Monitor the excess duration as an exponentially decaying maximum value
and don't allow any slews shorter than 5 times the value to damp the
oscillation. Ignore delays longer than 100 seconds, assuming they have a
different cause (e.g. the system was suspended and resumed) and are
already handled in the scheduler by triggering cancellation of the
ongoing slew.
This should also make it safer to shorten the minimum duration if
needed.
Reported-by: Daniel Franke <dff@amazon.com>
Estimate the 1st and 2nd 10-quantile of the reading delay and accept
only readings between them unless the error of the offset predicted from
previous samples is larger than the minimum reading error. With the 25
PHC readings per ioctl it should combine about 2-3 readings.
This should improve hwclock tracking and synchronization stability when
a PHC reading delay occasionally falls below the normal expected
minimum, or all readings in the batch are delayed significantly (e.g.
due to high PCIe load).
Add estimation of quantiles using the Frugal-2U streaming algorithm
(https://arxiv.org/pdf/1407.1121v1.pdf). It does not need to save
previous samples and adapts to changes in the distribution.
Allow multiple estimates of the same quantile and select the median for
better stability.
Move processing of PHC readings from sys_linux to hwclock, where
statistics can be collected and filtering improved.
In the PHC refclock driver accumulate the samples even if not in the
external timestamping mode to update the context which will be needed
for improved filtering.
Increase the number of requested readings from 10 to 25 - the maximum
accepted by the PTP_SYS_OFFSET* ioctls. This should improve stability of
HW clock tracking and PHC refclock.
Latest versions of ethtool print only the shorter lower-case names of
capabilities and filters. Explain that chronyd doesn't synchronize the
PHC and refer to the new vclock feature of the kernel, which should be
used by applications that need a synchronized PHC (e.g. ptp4l and
phc2sys) in order to not interfere with chronyd.
Instead of the generic clock driver silently zeroing the remaining
offset after detecting an external step, cancel it properly with the
slew handlers in order to correct timestamps that are not reset in
handling of the unknown step (e.g. the NTP local TX).
Use 3 as the minimum maxlockage in the local mode to avoid disruptions
due to losing the lock when a single sample is missed, e.g. when the PPS
driver polling interval is slightly longer than the pulse interval and a
pulse is skipped.
A refclock in the local mode is locked to itself. When the maxlockage
check failed after missing some samples, it failed permanently and the
refclock was not able to accumulate any new samples.
When the check fails, drop all samples and reset the source to start
from scratch.
Reported-by: Dan Drown <dan-ntp@drown.org>
A new function is provided by the latest gnutls (should be in 3.7.5) to
set the key of an AEAD cipher. If available, use it to avoid destroying
and creating a new SIV instance with each key change.
This improves the server NTS-NTP performance if using gnutls for SIV.
Initialization of the gnutls priority cache can fail depending on the
system crypto policy (e.g. disabled TLS1.3). Log an error mentioning
TLS, but continue to run without the server/client credentials.
Use snprintf() instead of strcat() and don't try to parse commands
longer than 2048 characters to make it consistent with the chrony.conf
parser, avoid memory allocation, and not rely on the system ARG_MAX to
keep the length sane.
Some grep implementations detect binary data and return success without
matching whole line. This might be an issue for the DHCPv6 NTP FQDN
check. The GNU grep in the C locale seems to check only for the NUL
character, which cannot be passed in an environment variable, but other
implementations might behave differently and there doesn't seem to be a
portable way to force matching the whole line.
Instead of the grep command, check for invalid characters by comparing
the length of the input passed through "tr -d -c".
When an added source is specified by IP address, save the original
string instead of formatting a new string from the parsed address, which
can be different (e.g. compressed vs expanded IPv6 address).
This fixes the chronyc sourcename command and -N option to print the IP
address exactly as it was specified in the configuration file or chronyc
add command.
The new unsynchronised source state is now reported in selectdata before
the first measurement.
Fixes: c29f8888c767 ("sources: handle unsynchronized sources in selection")
Some PHCs that have a PPS input don't have configurable pins (their
function is hardcoded). Accept a negative pin index to skip the pin
configuration before requesting external timestamping.
If a SHM or PHC refclock has a very large offset compensated by the
offset option, or ignored with the pps or local option, there is a
persistent loss of precision in the calculation of the sample offset
using the double format.
Rework the code to delay the calculation of the accumulated offset to
include the specificed compensation and remaining correction of the
system clock, where the calculation can be split to improve the
precision. In the pps mode ignore integer seconds competely.
The precision of the SOCK refclock is now limited to 1 nanosecond due to
the extra double->timespec->double conversion.
Add "local" option to specify that the reference clock is an
unsynchronized clock which is more stable than the system clock (e.g.
TCXO, OCXO, or atomic clock) and it should be used as a local standard
to stabilize the system clock.
Handle the local refclock as a PPS refclock locked to itself which gives
the unsynchronized status to be ignored in the source selection. Wait
for the refclock to get at least minsamples samples and adjust the clock
directly to follow changes in the refclock's sourcestats frequency and
offset.
There should be at most one refclock specified with this option.
Add support for accumulating frequency and time offset without changing
the reference parameters and calling the local parameter change
handlers.
This will allow an unsynchronized source to operate below other sources
in order to stabilize the clock.
Allow sources to accumulate samples with the leap status set to not
synchronized. Define a new state for them to be ignored in the
selection. This is intended for sources that are never synchronized and
will be used only for stabilization.
Don't leave the variables set to values outside their effective range.
This has no functional impact, but makes it clear what is the precedence
of the two settings.
Run the chronyc onoffline command also when the connectivity-change
and dhcp6-change actions are reported by the NetworkManager dispatcher.
The latter should not be necessary, but there currently doesn't seem to
be any action for IPv6 becoming routable after duplicate address
detection, so at least in networks using DHCPv6, IPv6 NTP servers should
not be stuck in the offline state from a previously reported action.
Latest NetworkManager code provides NTP servers from the DHCPv6 NTP
option (RFC 5908) in the DHCP6_DHCP6_NTP_SERVERS variable to dispatcher
scripts.
Check for invalid characters (which can come from the FQDN suboption)
and include the servers in the interface-specific sources file.
If chronyc waitsync was started before chronyd, it would try all
addresses (Unix socket, IPv4, IPv6) and get stuck with no address, not
getting any response later when chronyd was running.
Reset the address index in open_io() when returning with failure to
allow the next call to start with the first address again.
Reported-by: Jan Mikkelsen <janm@transactionware.com>
Fix the FreeBSD-specific code checking for a bound IPv4 socket to
include the new PTP port. This should fix a multihomed server to respond
to NTP-over-PTP requests from the address which received the request.
Fixes: be3158c4e5 ("ntp: add support for NTP over PTP")
In the FreeBSD-specific code checking for a bound IPv4 socket, make
sure it is not a Unix domain address to avoid reading uninitialized
IP-specific fields.
This fixes an error reported by valgrind.
Zero the whole sockaddr struct before calling bind() and connect() to
initialize the FreeBSD-specific sa_len field.
This fixes errors reported by valgrind.
Avoid searching the hash table of sources when a packet in the client
mode is received. It cannot be a response from our source. Analogously,
avoid source lookups for transmitted packets in the server mode. This
doesn't change anything for packets in symmetric modes, which can be
requests and responses at the same time.
This slightly improves the maximum packet rate handled as a server.
For a long time, the Solaris support in chrony wasn't tested on a real
Solaris system, but on illumos/OpenIndiana, which was forked from
OpenSolaris when it was discontinued in 2010.
While Solaris and illumos might have not diverged enough to make a
difference for chrony, replace Solaris in the documentation with illumos
to make it clear which system is actually supported by the chrony
project.
The dosynctodr kernel variable needs to be set to 0 to block automatic
synchronization of the system clock to the hardware clock. chronyd used
to disable dosynctodr on Solaris versions before 2.6, but it seems it is
now needed even on current versions as the clock driver sets frequency
only without calling adjtime() or setting the ntp_adjtime() PLL offset.
This issue was reproduced and fix tested on current OpenIndiana.
Fixes: 8feb37df2b ("sys_solaris: use timex driver")
In addition to the 16s limit in per-response change in the monotonic
offset, don't allow the total accumulated offset injected in sourcestats
to be larger than 16 seconds.
Check that the leap_when variable is set before testing a timestamp for
being close to a leap second. This allows the first measurement to be
accepted if starting at the Unix epoch (e.g. in a test).
Calculate the delay since the previous transmission only if the
TX timestamp is actually set. This removes an unnecessary delay when
starting at the Unix epoch in 1970 (e.g. in a test).
It seems there is no longer an issue with the first sample after the
initial trim and it can be accumulated. It might have been a workaround
for an unrelated bug which was fixed since then.
This fixes the number of samples reported in rtcdata briefly jumping to
65535 and also brings back the expectation that n_samples is never
negative.
Some of the code (e.g. util and clientlog) may work with negative
values. Require that time_t and the tv_nsec types are signed. This seems
to be the case on all supported systems, but it it is not required by
POSIX.
Close /dev/urandom and drop cached getrandom() data after forking helper
processes to avoid them getting the same sequence of random numbers
(e.g. two NTS-KE helpers generating cookies with identical nonces).
arc4random() is assumed to be able to detect forks and reseed
automatically.
This is not strictly necessary with the current code, which does not use
the GetRandom functions before the NTS-KE helper processes are forked,
but that could change in future.
Also, call the reset function before exit to close /dev/urandom in order
to avoid valgrind reporting the file object as "still reachable".
Don't ignore the magic field when searching for the exp1 extension
field in a received response. If there were two exp1 fields in the
packet, and only one of them had the expected magic value, it should
pick the right one.
Fixes: 2319f72b29 ("ntp: add client support for experimental extension field")
If the xleave option is enabled, ignore the key option and the hash
length. Always use version 4 as the default to get interleaved responses
from new chrony servers.
The interleaved modes are being specified for NTPv4 only. As a server,
detect interleaved requests only in NTPv4 packets.
Clients and peers can still send interleaved requests in lower-version
packets if configured with the version option.
Frequency transfer and time smoothing are conflicting features. Set the
monotonic timestamp in the experimental extension field to zero
(invalid) if time smoothing is activated.
The maximum value of the new 32-bit fields is slightly less than 16,
which can cause the NTP test #7 to pass for a server which has a zero
root delay but maximum root dispersion.
Interpret the maximum value as the maximum value of the original 32-bit
fields (~65536.0 seconds) for better compatibility with NTPv4.
Add "extfield F323" option to include the new extension field in
requests. If the server responds with this field, use the root
delay/dispersion and monotonic timestamp. Accumulate changes in the
offset between the monotonic and real-time receive timestamps and use
it for the correction of previous offsets in sourcestats. In the
interleaved mode, cancel out the latest change in the offset in
timestamps of the previous request and response, which were captured
before the change actually happened.
Maintain a server monotonic timescale needed for the experimental
extension field. It follows the best estimate of frequency without
time corrections. Implement it as an offset relative to the NTP time,
starting at zero, using a slew handler to cancel time corrections of the
NTP clock. The 32-bit epoch ID is set to a random value on start and
every step of the system clock.
Add an experimental extension field for some features that were proposed
for NTPv5. Higher-resolution root delay and dispersion (using 28-bit
fraction) are added. A monotonic receive timestamp will allow a
frequency transfer between the server and client. The client will be
able to separate the server's time corrections from frequency
corrections by tracking the offset between the real-time and monotonic
receive timestamps.
The field has a type of 0xF323 from the new experimental range proposed
by the NTP working group. Include a magic 32-bit value in the field to
avoid interoperability issues if a different implementation choses the
same type for its own experimental field. The value will be changed on
incompatible changes to avoid issues between two different chrony
versions.
Add a new variable to the packet info structure with flags for extension
fields included in received packets and add a new parameter to
transmit_packet() to add the fields to transmitted packets.
Since commit fdfcabd79b ("ntp: drop support for long NTPv4 MACs"), the
parser doesn't need to check validify of MACs in NTPv4 packets to
distinguish them from extension fields. Move the parser to ntp_core to
avoid having a separate iteration looking for non-authentication
extension fields.
Add extra space to the socket message buffer to be able to receive
maximum-length NTP-over-PTP SW/HW-timestamped messages from the Linux
error queue (which are looped back as layer-2 frames).
When calculating the root delay and dispersion of a sample measured in
the interleaved mode, use the root delay and dispersion values from
the previous response (to which the TX timestamp corresponds). If the TX
timestamp is combined with the RX timestamp of the latest response (e.g.
in the symmetric mode), use the maximum of the previous and latest root
delay/dispersion.
When the server clock was updated between saving of the RX timestamp and
updating the TX timestamp, a client using interleaved mode with the four
timestamps which minimize error in measured delay (e.g. chrony) had the
server clock adjustment included in the measured delay, which could
disrupt the sample filtering and weighting.
Add a handler to track the slew epoch and remember the last offset. Undo
the adjustment in TX timestamps which have their RX timestamp in the
previous epoch to fix the delay observed by the clients.
If an unknown clock step is detected, drop all timestamps.
Don't save server RX and TX timestamp to clientlog if the transmission
or authentication failed (e.g. packet is handled in ntp_signd). They
will not be needed.
Zero the initial TX timestamp which is saved for the interleaved
mode in case there is no previous timestamp saved in clientlog and
transmit_packet() does not generate a new one (e.g. due to failure in
authentication).
Fixes: 5f4cbaab7e ("ntp: optimize detection of clients using interleaved mode")
Report the number of received interleaved requests and current timestamp
count with their span.
Expand the serverstats description in chronyc man page.
When responding to a request, don't waste time with TX timestamping
if the timestamp will not be saved (i.e. clientlog is disabled).
Fixes: 5f4cbaab7e ("ntp: optimize detection of clients using interleaved mode")
Use the lowest bit of the server RX and TX timestamp as a flag
indicating RX timestamp. This allows the server to detect potential
interleaved requests without having to save all its RX timestamps. It
significantly reduces the amount of memory needed to support clients
using the interleaved mode if most of the server's clients are using the
basic mode (e.g. a public server).
Capture the TX timestamp on the first response to the request which has
the flag set to not further delay the first interleaved response.
False positives are possible with broken clients which set the origin
timestamp to something else than zero or the server RX or TX timestamp.
This causes an unnecessary RX timestamp to be saved and TX timestamp
captured and saved.
Move the calls resetting and generating authentication data out of the
loop checking for unique TX timestamp. This allows the timestamps to be
manipulated after the check.
Instead of keeping one pair of RX and TX timestamp for each address, add
a separate RX->TX map using an ordered circular buffer. Save the RX
timestamps as 64-bit integers and search them with a combined linear
interpolation and binary algorithm.
This enables the server to support multiple interleaved clients sharing
the same IP address (e.g. NAT) and it will allow other improvements to
be implemented later. A drawback is that a single broken client sending
interleaved requests at a high rate (without spoofing the source
address) can now prevent clients on other addresses from getting
interleaved responses.
The total number of saved timestamps does not change. It's still
determined by the clientloglimit directive. A new option may be added
later if needed. The whole buffer is allocated at once, but only on
first use to not waste memory on client-only configurations.
The BINDTODEVICE socket option is the first option in the seccomp filter
setting a string instead of int. Remove the length check from the
setsockopt rules to allow a device name longer than 3 characters.
This was reported in Debian bug #995207.
Fixes: b9f5ce83b0 ("sys_linux: allow BINDTODEVICE option in seccomp filter")
Add various settings to the example chronyd and chrony-wait services to
decrease the exposure reported by the "systemd-analyze security"
command. The original exposure was high as the analyzer does not check
the actual process (e.g. that it dropped the root privileges or that it
has its own seccomp filter).
Limit read-write access to /run, /var/lib/chrony, and /var/spool.
Access to /run (instead of /run/chrony) is needed for the refclock
socket expected by gpsd.
The mailonchange directive is most likely to break as it executes
/usr/sbin/sendmail, which can do unexpected operations depending on the
implementation. It should work with a setuid/setgid binary, but it is
not expected to write outside of /var/spool and the private /tmp.
Apparently some routers with hardware NAT acceleration have a bug
causing the kernel timestamps to be corrupted and break NTP. Similarly
to the sanity check applied to hardware timestamps, require the
kernel/driver timestamps to be within one second of the daemon timestamp
to be accepted.
On some systems (e.g. Solaris/OpenIndiana) rand() and random() have
different ranges. RAND_MAX is the maximum value returned by rand(),
but random() should always have a range of 0 through 2^31-1.
This fixes multiple failures in different tests.
Use the new cmdparse function for parsing the (cmd)allow/deny commands
and refactor the code a bit to reduce the number of functions needed for
all the (cmd)allow/deny(all) combinations.
Refactor the (cmd)allow/deny parser and make it more strict in what
input it accepts. Check the scanned numbers and require whole input to
be processed.
Move the parser to cmdparse to make it available to the client.
gnutls running in the FIPS140-2 mode does not allow MD5 to be
initialized, which breaks chronyd using MD5 to calculate reference ID
of IPv6 addresses. Specify a new hash algorithm for non-security MD5 use
and temporarily switch to the lax mode when initializing the hash
function.
Add support for crypto hash functions in gnutls (internally using
nettle). This can be useful to avoid directly linking with nettle to
avoid ABI breaks.
gnutls_aead_cipher_init() is declared in gnutls/crypto.h. If the
compiler handles implicit declarations as errors, the SIV support was
not detected. Fix the check to use the correct header.
Allow NTP messages to be exchanged as a payload of PTP messages to
enable full hardware timestamping on NICs that can timestamp PTP packets
only. Implemented is the protocol described in this draft (version 00):
https://datatracker.ietf.org/doc/draft-mlichvar-ntp-over-ptp/
This is an experimental feature. It can be changed or removed in future.
The used PTP domain is 123 and the NTP TLV type is 0x2023 from the "do
not propagate" experimental range.
The ptpport directive enables NTP-over-PTP as a server and as a client
for all sources that have the port option set to the PTP port. The port
should be the PTP event port (319) to trigger timestamping in the
hardware.
The implementation is contained to ntp_io. It is transparent to
ntp_core.
A while back, support for memory locking and real-time scheduling was
added to more platforms. The chronyd documentation wasn't updated at
that time (chronyd.conf was). This patch fixes that.
With the latest glibc it's now possible to define _TIME_BITS=64 to get
64-bit time_t on 32-bit Linux systems. This breaks the %ld printf/scanf
modifier used with the RTC drift timestamp. Process it as a double.
The files are read after dropping root privileges. They need to be
readable by the chrony user. The error message "Could not set
credentials : Error while reading file." does not make this requirement
very obvious.
Don't require timespec/timeval-double conversion tests to produce
correctly rounded results to handle x86 and other archs with wider
intermediate results.
Add level "2" to enable a filter which blocks only specific system calls
like fork and exec* instead of blocking everything unknown. It should
be reliable with respect to changes in libraries, but it provides only a
very limited protection.
When loading a dump file with the -r option, check also sanity of the
sample time, offset, peer/root delay/dispersion, and the sample order to
better handle corrupted files.
Don't print the original IP address in parentheses in the "Selected
source ..." message if it is identical to the current address. That is
expected to be the usual case for sources specified by IP address.
Log an error message when adding of a source fails, e.g. due to the new
limit on number of sources, or when the same address is specified
multiple times.
In the NTS-KE client don't reject the response if it has non-critical
records that are too long for the processing buffer. This is not
expected to happen with the current specification, but it might be
needed with future extensions.
Fixes: 7925ed39b8 ("nts: fix handling of long server negotiation record")
The cookie record is currently assumed to be the longest record that
needs to be accepted by the client, but that does not have to be always
the case. Define the processing buffer using the maximum body record
constant instead and add an assertion to make sure it's not smaller than
the maximum accepted cookie length.
Recent change in handling of the NTPv4 server negotiation record (commit
754097944b) increased the length of the instance name buffer to make
room for the trailing dot. This allowed a record with body truncated in
the processing buffer to be accepted and caused an over-read of 1 byte
in the memcpy() call saving the name to the instance buffer.
Modify the client to accept only records that fit in the processing
buffer.
Fixes: 754097944b ("nts: handle negotiated server as FQDN")
For reference clocks, which don't have a name, print "." instead of
NULL.
Fixes: f8610d69f0 ("sources: improve handling of dump files and their format")
The NTS RFC requires the recipient of the Server Negotiation NTS-KE
record to handle the name as a fully qualified domain name. Add a
trailing dot if not present to force the name to be resolved as one.
Early beta releases of macOS Big Sur had a signed/unsigned error in
Apple's implementation of ntp_adjtime. Apple have since fixed this error
and the workaround is no longer required.
When reading a *.sources file require that each line is termined by the
newline character to avoid processing an unfinished line, e.g. due to an
unexpected call of the reload command when the file is being written in
place.
When separate client and server instances of chronyd are running on one
computer (e.g. for security or performance reasons) and are synchronized
to each other, the server instance provides a reference ID based on the
local address used for synchronization of its NTP clock, which breaks
detection of synchronization loops for its own clients.
Add a "copy" option to specify that the server and client are closely
related, no loop can form between them, and the client should assume the
reference ID and stratum of the server to fix detection of loops between
the server and clients of the client.
Don't update the leap and stratum used in source selection if they
indicate an unsynchronized source.
Fixes: 2582be8754 ("sources: separate update of leap status")
The LOG_FATAL macro expands to (emitting the message and then) exit(1).
So a return after LOG_FATAL isn't reached. Drop all those to simplify
the code a bit.
It is not sufficient to check for disabled server sockets as they are
not open only after the special reference modes end (e.g. initstepslew).
Fixes: 004986310d ("ntp: skip loop test if no server socket is open")
This system call is required by the DSCP marking feature introduced in commit
6a5665ca58 ("conf: add dscp directive").
Before this change, enabling seccomp filtering (chronyd -F 1) and specifying a
custom DSCP value in the configuration (for example "dscp 46") caused the
process to be killed by seccomp due to IP_TOS not being allowed by the filter.
Tested before and after the change on Ubuntu 21.04, kernel 5.11.0-13-generic.
IP_TOS is available since Linux 1.0, so I didn't add any ifdefs for it.
Signed-off-by: Foster Snowhill <forst@forstwoof.ru>
Increase the maximum acceptable offset of the PPS lock reference from
20% to 40% of the PPS interval to not require the refclock offset to be
specified in configuration so accurately, or enable operation with a
highly unstable reference clock.
... for configuration checks. Compiler wrappers check for this name
in order to skip any instrumentation of the build that is intended
for regular source files only.
Instead of selectively suspending logging by redirecting messages to
/dev/null, increase the default minimum log severity to FATAL. In the
debug mode, all messages are printed.
Check if the name passed to DNS_Name2IPAddress() is an IP address
before calling getaddrinfo(), which can be much slower and work
differently on different systems.
On FreeBSD, the source address cannot be specified when sending a
message on a socket bound to a non-any IPv4 address, e.g. in default
configuration 127.0.0.1. In this case, make the address unspecified.
This is similar to commit 6af39d63aa ("ntp: don't use IP_SENDSRCADDR
on bound socket").
Fixes: f06c1cfa97 ("cmdmon: respond from same address")
Log a warning message if the main process has not dropped the root
privileges, i.e. when the compiled-in user or user specified by the user
directive or -u option is root.
Log a warning message if the interval covered by the maxlockage at the
PPS rate of a refclock is shorter than driver poll of the locked
refclock.
Reported-by: Matt Corallo <ntp-lists@mattcorallo.com>
If the online command is received when the resolver is running, start
it again as soon as it finishes instead of waiting for the timer.
This should reduce the time needed to get all sources resolved on boot
if chronyd is started before the network is online and the chronyc
online command is issued before the first round of resolving can finish,
e.g. due to an unreachable DNS server in resolv.conf.
If the O_NOFOLLOW flag used by open() is not defined, try it with
_GNU_SOURCE. This is needed with glibc-2.11 and earlier.
Reported-by: Marius Rohde <marius.rohde@meinberg.de>
With glibc 2.33 on armhf statx and fstatat64 are triggered.
Allow this call to un-break chrony on such platforms.
Without this e.g. test 005-scfilter fails and with ltrace -rTS reports:
a)
0.001684 SYS_397(11, 0xf75def08, 6144, 2047 <no return ...>
0.759239 +++ killed by SIGSYS +++
b)
0.003749 SYS_327(-100, 0xffdbcc3c, 0xffdbcb50, 0)
0.000821 --- SIGSYS (Bad system call) ---
Current armhf syscalls from:
https://github.com/torvalds/linux/blob/v5.10/arch/arm/tools/syscall.tbl
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Don't rely on assertions and running out of memory to terminate if
an extremely large number of sources is added. Set the maximum number
to 65536 to have a practical limit where chronyd still has a chance to
appear functional with some operations having a quadratic time
complexity.
When an NTS-KE server stops providing the NTP address or port, change
them to the original values to avoid the client getting stuck
with a non-responding address/port.
Instead of waiting for the first request, try to load the cookies as
soon as the instance is created, or the NTS address is changed.
This enables loading of dump files for servers that are negotiated in
NTS-KE.
In the NTS-NTP client instance, maintain a local copy of the NTP address
instead of using a pointer to the NCR's address, which may change at
unexpected times.
Also, change the NNC_CreateInstance() to accept only the NTP port to
make it clear the initial NTP address is the same as the NTS-KE address
and to make it consistent with NNC_ChangeAddress(), which accepts only
one address.
Allow NSR_UpdateSourceNtpAddress() to be (indirectly) called from
NCR_CreateInstance() and NCR_ChangeRemoteAddress(). In these cases, save
the addresses and make the update later when the function calls return.
After loading the dump files with the -r option, immediately perform a
source selection with forced setting of the reference. This shortens the
interval when a restarted server doesn't respond with synchronized time.
It no longer needs to wait for the first measurement from the best
source (which had to pass all the filters).
Check for write errors when saving dump files. Don't save files with no
samples. Add more sanity checks for loaded data.
Extend the file format to include an identifier, the reachability
register, leap status, name, and authentication flag. Avoid loading
unauthenticated data after switching authentication on. Change format
and order of some fields to simplify parsing. Drop fields that were kept
only for compatibility.
The dump files now contain all information needed to perform the source
selection and update the reference.
There is no support kept for the old file format. Loading of old dump
files will fail after upgrading to new version.
Remove stratum from the NTP sample and update it together with the leap
status. This enables a faster update when samples are dropped by the NTP
filters.
Certificates can include IP addresses as alternative names to enable
clients to verify such certificates without knowing the hostname.
Accept an IP address as a name in the NTS-NTP client and modify the
session code to not set the SNI in this case.
For sources specified by an IP address, keep the original address as the
source's name and pass it to the NCR instance. Allow the sources to go
through the replacement process if their address has changed.
This will be useful with NTS-KE negotiation.
The IP-based source names are now provided via cmdmon. This means
chronyc -n and -N can show two different addresses for a source.
Remove packet interval checks with long delays as the tests are much
more likely to end when the client is waiting for a response. Increase
the base delay to make selection with two sources more reliable.
Reported-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Make sure each processed control messages has the expected length.
Beside improved safety, this should prevent potential issues with broken
timestamps on systems that support both 64-bit and 32-bit time_t.
The "infinite loop in scheduling" fatal error was observed on a system
running out of memory. Presumably, the execution of the process slowed
down due to memory thrashing so much that the dispatching loop wasn't
able to break with a single server polled at a 16-second interval.
To allow recovery in such a case, require for the error more than
20 handled timeouts and a rate higher than 100 per second.
Reported-by: Jamie Gruener <jamie.gruener@biospatial.io>
This commit updates the FAQ with a new entry.
chronyd's Linux RTC driver (rtc_linux.c) requires the following ioctl
requests to be functional:
RTC_UIE_ON
RTC_UIE_OFF
However, a Linux system's RTC driver does not necessarily implement them,
as noted in these previous commits:
d66b2f2b24
rtc: handle RTCs that don't support interrupts
Tue Dec 10 17:45:28 2019 +0100
bff3f51d13
rtc: extend check for RTCs that don't support interrupts
Thu Dec 12 12:50:19 2019 +0100
Fortunately, the Linux kernel can be built with software emulation of
these hardware requests, by enabling the following config variable:
CONFIG_RTC_INTF_DEV_UIE_EMUL
Provides an emulation for RTC_UIE if the underlying rtc chip
driver does not expose RTC_UIE ioctls. Those requests generate
once-per-second update interrupts, used for synchronization.
The emulation code will read the time from the hardware
clock several times per second, please enable this option
only if you know that you really need it.
This commit records these facts for the benefit of the user.
If ntsdumpdir is specified and the server NTS keys are not reloaded from
the file, save the generated keys on start instead of waiting for the
first rotation or exit. This allows the keys to be shared with another
server without having to use the dump command.
In the initial resolving of pool sources try to assign each address only
once. If it fails, it means the address is already used (DNS provided
the same address) or the address is not connectable. The same result can
be expected for other unresolved sources of the pool as they don't have
a real address yet.
The NTS-KE helper doesn't need to bind sockets or adjust the clock.
Don't start the privops helper, or keep the capabilities, when dropping
root privileges in its context.
Update the monotonic time before the timestamps are corrected for
unexpected jumps, e.g. due to the computer being suspended and resumed,
and switch to the raw timestamps. This should allow the NTS refresh
interval to better follow real time, but it will not be corrected for
a frequency offset if the clock is not synchronized (e.g. with -x).
Even if a received message will not be returned to the caller (e.g.
because it is truncated), process its control messages to avoid leaking
received descriptors.
Fixes: f231efb811 ("socket: add support for sending and receiving descriptors")
Generate a new uniq ID on each client poll to invalidate responses to
the previous request, even if a new request cannot be generated (e.g.
due to missing cookies). Reset the NAK indicator earlier in the request
sequence. Also, drop the cookie even if it's not included in the request
to prevent the client from getting stuck with a cookie that has an
invalid length. Rely on the exponentially increasing interval to avoid
frequent NTS-KE sessions due to a client bug.
Check the mode instead of the nts pointer to make it clear the pointer
is not expected to be NULL in an NTS instance (unless the NTS support is
stubbed).
GNU readline switched to GPLv3+ in version 6.0, which is incompatible
with the chrony's GPLv2 license.
Drop support for the readline library. Only editline is supported now.
The -U option can be used to start chronyd under a non-root user if it
is provided with all capabilities and access to files, directories, and
devices, needed to operate correctly in the specified configuration. It
is not recommended in cases where the configuration is unknown.
Don't accept NTPv4 packets which have a MAC longer than 24 octets to
strictly follow RFC 7822, which specifies the maximum length of a MAC
and the minimum length of the last extension field to avoid an ambiguity
in parsing of the packet.
This removes an ugly hack that was needed to accept packets that
contained one or more extension fields without a MAC, before RFC 7822
was written and NTP implementations started using truncated MACs.
The long MACs were used by chrony in versions 2.x when configured to
authenticate a server or peer with a key using a 256-bit or longer hash
(e.g. SHA256). For compatibility with chrony >= 4.0, these clients/peers
will need to have "version 3" added to the server/peer line in
chrony.conf.
Show untrusted sources with the '?' symbol instead of '-' to make them
consistent with not selectable and selectable sources in the selectdata
description.
Log an error message when SCK_OpenTcpSocket() fails in the NTS-KE
client, e.g. when connect() fails due to the port not being allowed in
the SELinux policy.
Before sending a cmdmon response, make sure it is not longer than the
request to avoid amplification in case the response/padding length is
incorrectly specified for a request.
Make the precision of the system clock configurable. This can be useful
on servers using hardware timestamping to reduce the amount of noise
added to the NTP timestamps and improve stability of NTP measurements.
These syscalls seem to be needed when gnutls is loading system trusted
certificates due to p11-kit >= 0.23.21 getting the program name from
/proc/self/exe.
On macOS 11.0 (Big Sur) beta, ntp_adjtime() incorrectly returns
timex.freq as an unsigned number. This patch is a workaround for the bug
and should be removed when Apple fix the problem (assuming they will).
When opening a file for appending (i.e. a log file), use the O_NOFOLLOW
flag to get an error if the path is a symlink. Opening log files through
symlinks is no longer supported.
This is a protection against symlink attacks if chronyd is misconfigured
to write a log in a world-writable directory (e.g. /tmp). That is not
meant to become a recommended practice. Log messages will be lost, or
chronyd won't start, if a symlink exists at the location of the log
file.
Session tickets should never be enabled with the currect code on both
clients and servers. Set the GNUTLS_NO_TICKETS flag when opening a TLS
session in case this understanding is wrong, or it changes in future, to
reduce the TLS attack surface.
The directive sets the DSCP value in transmitted NTP packets, which can
be useful in local networks where switches/routers are configured to
prioritise packets with specific DSCP values.
Before generating a MAC, make sure there is enough space in the packet.
This is always true with the current code, but it may change when a
non-NTS extension field is supported.
Update the packet auth info after generating a MAC in case it's needed
before the transmission.
Add more assertions and make other changes for better readability.
It seems gnutls (at least in version 3.6.14) allows clients to connect
using TLS1.2 when it has a DTLS version enabled in the priority cache.
Disable all DTLS versions in order to disable TLS1.2.
Destroy the NTS-KE session of the client immediately even when the
resolver of the NTP address is running. This removes the session
local change handler and avoids an assertion failure in the local
finalization.
When the request has an unrecognized critical record before the
NEXT_PROTOCOL and AEAD_ALGORITHM records, respond with error 0
(unrecognized critical record) instead of 1 (bad request).
When the request has multiple NEXT_PROTOCOL or AEAD_ALGORITHM records,
respond with error 1 (bad request).
Add a function to get the current minimum severity and a function to set
a global prefix for debug messages in order to identify messages from
helpers.
For consistency and safety, change the CMC and HSH functions to accept
signed lengths and handle negative values as errors. Also, change the
input data type to void * to not require casting in the caller.
Modify NNA_DecryptAuthEF() to not assume that the authenticator is the
last extension field in the packet as some extension fields specified in
future may need to be placed after the authenticator. The caller of the
function is supposed to verify the position.
Add more comments and assertions, replace getsockopt() call with
SCK_GetIntOption(), replace strncmp() with memcmp(), move a return
statement for clarity, and remove an unused field from the instance
record.
The daemon transmit timestamps are precompensated for the time it takes
to generate a MAC using a symmetric key (as measured on chronyd start)
and also an average round-trip time of the Samba signing of MS-SNTP
responses. This improves accuracy of the transmit timestamp, but it
has some issues.
The correction has a random error which is changing over time due to
variable CPU frequency, system load, migration to a different machine,
etc. If the measured delay is too large, the correction may cause the
transmit timestamp to be later than the actual transmission. Also, the
delay is measured for a packet of a minimal length with no extension
fields, and there is no support for NTS.
Drop the precompensation in favor of the interleaved mode, which now
avoids the authentication delay even when no kernel/hardware timestamps
are available.
If the daemon transmit timestamp is saved for processing of a future
response or responding in the interleaved mode, get a more accurate
timestamp right before calling NIO_SendPacket(). Avoid unnecessary
reading of the clock for the transmit timestamp in the packet (i.e.
in interleaved modes and client basic mode).
This should improve accuracy and stability when authentication is
enabled in the client and symmetric basic modes and also interleaved
modes if kernel/hardware timestamps are not available.
After commit e49aececce ("socket: don't set interface for sent
packets") the NTP and cmdmon server stopped responding to requests from
link-local addresses.
Set the interface specifically for packets sent to a link-local address.
Revert commit e49aececce ("socket: don't set interface for sent
packets") to allow the interface to be selected for outgoing packets,
but don't set it in the callers yet.
Add binddevice, bindacqdevice, and bindcmddevice directive to specify
the interface for binding the NTP server, NTP client, and command socket
respectively.
As a Linux-specific feature, allow sockets to be bound to a device using
the SO_BINDTODEVICE socket option. The CAP_NET_RAW capability is
required for setting the option.
Similar to the DHCP dispatcher, add a variable for the chronyc
executable path, which can be overwritten more easily by
downstream packages if needed.
Also give an `.onoffline` suffix to more clearly differentiate
this script from `chrony.nm-dispatcher.dhcp`.
Add new NM dispatcher script for NTP servers given by DHCP through
NetworkManager in a similar way to how distributions have done in
11-dhclient, e.g. [1]. New NTP servers are written as entries to a
file per-interface in /var/run/chrony-dhcp, which is re-read by
chronyd upon executing `chronyc reload sources`.
This provides a way for NTP server configuration to be carried over
from NetworkManager DHCP events to chrony, for DHCP clients other
than dhclient. Part of fixing integration where the NetworkManager
internal client is used, e.g [2].
Paths to the chronyc executable and sources directory are set in
variables, which may be overwritten by downstream packages, but
should work for distributions for the most part.
[1] https://src.fedoraproject.org/rpms/dhcp/blob/master/f/11-dhclient
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1800901
Ignore IPv6 addresses returned by getaddrinfo() that have a non-zero
scope ID to avoid silently ignoring the ID if it was specified with the
% sign in the provided string.
This can be removed when the scope ID is returned from the function and
the callers handle it.
Instead of making the initial burst only once and immediately after
chronyd start (even when iburst is specified together with the offline
option), trigger the burst whenever the connectivity changes from
offline to online.
Add some new directives, remove dumponexit (it's a no-op), remove
broadcast (to not encourage its use), fix a typo, and remove a
OS-specific limitation.
Allow an IP family to be specified in the socket initialization in order
to globally disable the other family. This replaces the ntp_io and
cmdmon code handling the -4/-6 options and fixes a case where the NTP
client could still use a disabled family if the source was specified
with an IP address.
Add -p option to chronyd to print lines from the configuration as they
are parsed and exit. It can be used to verify the syntax and get the
whole configuration when it is split into multiple files.
When the main makefile is used to get the list of chronyd objects in
order to build the unit tests, clang started (with the -MM option) to
generate the dependency files prints error messages about wrong
inclusions. Set a NODEPS variable to completely disable the generation
of the files.
Add support for the AES-SIV-CMAC cipher in gnutls using the AEAD
interface. It should be available in gnutls-3.6.14.
This will enable NTS support on systems that have a pre-3.6 version of
Nettle, without falling back to the internal SIV implementation.
When compiled with NTS support, don't require a SIV cipher to be always
supported (e.g. due to a different version of a library used for
building). Handle this case with a fatal message instead of crash.
Also, check the support early in the client unit test to prevent a hang.
This is not expected to happen, but make sure the endpoints of each
source are in the right order (i.e. the distance is not negative) to
prevent getting a negative depth in the selection.
Handle trusted sources as a separate set of sources which is required to
have a majority for the selection to proceed. This should improve the
selection with multiple trusted sources (e.g. due to the auth selection
mode).
When the selection has some trusted sources, don't require non-trusted
sources to be contained in the best interval as that can usually pass
only one source if the best interval is the interval of the source, or
no source at all if the best interval is an intersection of multiple
sources.
Relax the requirement for non-trusted sources to be contained in the
best interval of trusted sources alone instead of all sources in the
trusted interval.
The selection options returned as flags are not reported by the
client and will be better reported in a separate command with other
selection-specific data.
Destroy the client cert credentials when destroying the last NKC
instance instead of NKC_Finalise(). This allows the client to reload the
trusted cert file between NTS-KE sessions.
Instead of sharing the NTP rate limiting with NTS-KE, specify a new
service for NTS-KE and use it in the NTS-KE server.
Add ntsratelimit directive for configuration.
Refactor the client record and clientlog API to reuse more code between
different services and enumerate the services instead of hardcoding NTP
and cmdmon.
Add a new field to the CLIENT_ACCESSES_BY_INDEX request to specify the
minimum number of NTP or cmdmon packets for a client to be reported.
Add -p option to the chronyc clients command to specify the threshold
(by default 0). This option can be used to minimize the number of cmdmon
requests when interested only in clients sending a large number
of requests.
Add a flag to the CLIENT_ACCESSES_BY_INDEX request to reset the
NTP/cmdmon hits/dropped counters after reporting the current values.
Add -r option to the chronyc clients command to perform the reset. This
should make it easier to find clients that send large number of requests
over short periods of time.
Remove the "adjustment started" part from the "System clock wrong by *
seconds, adjustment started" log message as it might be confusing in
some cases. There may be a step instead of a slow adjustment, or there
may be no adjustment at all when running with the -x option.
With asymmetric routing (e.g. with BGP) it may not be possible to
respond to a request using the same interface. In such case, setting the
interface index in IP*_PKTINFO* causes the packet to be silently dropped
by the kernel.
Until we can predict if sending with the specified interface will
succeed, or provide only a hint, don't set the interface and leave it
to the kernel to select an interface.
This reverts commit 5fc7674e36 ("ntp: set interface index in
IP*_PKTINFO when responding").
Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Add a confdirs directive to include *.conf files from multiple
directories. If a file with the same name exists in multiple
directories, only the first one in the order of the specified
directories will be included.
When authentication is enabled for an NTP source, unauthenticated NTP
sources need to be disabled or limited in selection. That might be
difficult to do when the configuration comes from different sources
(e.g. networking scripts adding servers from DHCP).
Define four modes for the source selection to consider authentication:
require, prefer, mix, ignore. In different modes different selection
options (require, trust, noselect) are added to authenticated and
unauthenticated sources.
The mode can be selected by the authselectmode directive. The mix mode
is the default. The ignore mode enables the old behavior, where all
sources are used exactly as specified in the configuration.
Refactor the code to allow the selection options of the current sources
to be modified when other sources are added and removed. Also, make the
authentication status of each source available to the code which makes
the modifications.
Add "nocerttimecheck" directive to specify the number of clock updates
that need to be made before the time validation of certificates is
enabled. This makes NTS usable on machines that don't have a RTC.
If ntsrotate is set to 0, don't generate new server keys and don't save
them to ntsdumpdir. This allows the keys to be managed externally and
shared with other servers.
When replacing an NTP source, update the NTS address before the NTP
address to save cookies with the old NTP address instead of the newly
resolved address (which may immediately change to an address provided by
NTS-KE).
Include in the key dump file an identifier, the AEAD number, and the
age of the last key to improve robustness and avoid generating a new key
immediately on start.
Also, improve the code that saves and loads the file.
Save the NTS context and cookies to files in the NTS dumpdir when the
client NTS instances are destroyed or the address is changed, and reload
the data to avoid unnecessary NTS-KE requests when chronyd is restarted
or it is switching between different addresses resolved from the NTS-KE
or NTP name.
Add a context structure for the algorithm and keys established by
NTS-KE. Modify the client to save the context and reset the SIV key to
the C2S/S2C key before each request/response instead of keeping two SIV
instances.
This will make it easier for the server to support different algorithms
and allow the client to save the context with cookies to disk.
Make the NTS-KE retry interval exponentially increasing, using a factor
provided by the NKE session. Use shorter intervals when the server is
refusing TCP connections or the connection is closed or timing out
before the TLS handshake.
The server session instances are reused for different clients. Separate
the server name from the label used in log messages and set it on each
start of the session.
When a source was replaced and the new source had the same slot as the
old source, a wrong message was logged. Fix the condition to distinguish
correctly between changed address and port.
Fixes: 9468fd4aa6 ("ntp: allow changing port of source")
Improve the check to work with the actual timestamp of the leap second
instead of the closest midnight and don't turn it off on the leap
timeout. Also allow sample times to be checked in addition to the system
time and NTP time to avoid accumulation of samples mixing pre-leap and
post-leap timestamps (causing error of +/-0.5 or +/-1.0 seconds).
Don't require the caller to provide a SCK_Message (on stack). Modify the
SCK_ReceiveMessage*() functions to return a pointer to static buffers,
as the message buffer which SCK_Message points to already is.
On Linux, enable the SO_REUSEPORT option on sockets bound to a port in
order to support load balancing with multiple chronyd instances
(configured to not adjust the system clock).
The IP_FREEBIND option already allowed different instances to bind to
the same address and port, but only one was actually receiving packets.
As the instances don't share their state, sharing the NTP port doesn't
work well with the interleaved mode, symmetric mode, and rate limiting.
Sharing the NTS-KE port will not work until the server keys can be
derived from a shared key.
Earlier versions of macOS do not provide clock_gettime(). This patch
checks for clock_gettime() at run-time and falls back to gettimeofday()
if the symbol is not present.
Update the local clock errors with each update of the leap status to
avoid the kernel marking the clock as unsynchronized when a large
number of NTP samples is dropped.
When a leap second status is updated by a source, don't wait for the
next source selection and full update of the reference. Count votes from
sources that passed the previous selection and update the reference leap
status directly.
This should allow leap seconds to spread quickly even when the
samples are dropped or delayed by the filters.
Remove leap status from the NTP sample and set it independently from
the sample accumulation in order to accept a leap second sooner when
samples are filtered.
The reset command drops all measurements and switches the reference to
the unsynchronised state. This command can help chronyd with recovery
when the measurements are known to be no longer valid or accurate, e.g.
due to moving the computer to a different network, or resuming the
computer from a low-power state (which resets the system clock).
The source handler resets SST instances on an unknown step, which
makes the sources unselectable, but SRC_SelectSource() doesn't call
REF_SetUnsynchronised() when no source is selectable.
Handle the step in the reference handler.
Fixes: 049eae661a ("sources: keep synchronized status with unreachable/unselectable sources")
Measure the interval since the start in order to provide a monotonic
time for periodical tasks not using timers like driftfile updates, key
refresh, etc. Return the interval in the double format, but keep an
integer remainder limiting the precision to 0.01 second to avoid issues
with very small increments in a long-running process.
Before enabling NTS, check for more gnutls functions (some added in
3.6.3) to avoid build failures with older gnutls versions. Also, make
sure that nettle supports the new AES interface (added in 3.0).
The onoffline command switches an unresolved source to the offline
status, even when the network is already up.
Ignore the onoffline command for unresolved sources to prevent sources
unexpectedly staying in the offline status, e.g. when the command is
issued from a network dispatcher script (and no other call is expected
later when the name is resolved).
Allow the nts and ntsport options to be specified for sources added from
chronyc. This is an incompatible change in the request, but there was no
release using the new REQ_ADD_SOURCE command yet.
Add an option to enable NTS for an NTP source. Check for NTS-specific
extension fields and pass the packets to the NTS-NTP code in order to
enable the NTS client and server.
Allow multiple resolving threads to be running at the same time in order
to support multiple callers, but use a mutex to avoid sending multiple
requests to the privops helper. This will be needed for the NTS-KE
server negotiation.
Avoid resetting the active status when an NTP source changes its
address in NCR_ChangeRemoteAddress().
This will allow an NTP source to update its address with NTS-KE
hostname negotiation and continue in a special reference mode
(e.g. -q/-Q option).
This will allow a source to have its address changed due to NTS-KE
server negotiation, which allows the NTS-KE server to have a different
address than the NTP server.
Modify the replace_source() function to not require a different IP
address when replacing a source with the same address but different
port. This will enable the NTS-KE port negotiation.
If authentication is not enabled in configuration, responses are not
expected to be authenticated. Handle such responses as having failed
authentication.
A case where this could happen is a misconfigured symmetric association
where only one peer has specified the other with a key. Before this
change synchronization would work in one direction and used packets
with an asymmetric length.
MAC longer than 24 octets in NTPv4 packet is supported only for
compatibility with some pre-RFC7822 chrony versions. They didn't use
any extension fields.
Move most of the authentication-specific code to a new file and
introduce authenticator instances in order to support other
authentication mechanisms (e.g. NTS).
Rework the code to detect the authentication mode and count extension
fields in the first parsing of the packet and store this information in
the new packet info structure.
When sending a response in the server or passive mode, make sure the
response is not longer than the request to prevent amplification
attacks when resposes may contain extension fields (e.g. NTS).
Add a structure for length and other information about received and
transmitted NTP packets to minimize the number of parameters and avoid
repeated parsing of the packet.
When changing an address of a source (both known and unknown), make sure
the new address is connectable. This should avoid useless replacements,
e.g. polling an IPv6 address on IPv4-only systems.
Don't open a dumpfile for reading or writing if the NTP source doesn't
have a real address.
Fixes: d7e3ad17ff ("ntp: create sources for unresolved addresses")
Add -a option to the sources and sourcestats commands to print all
sources, including those that don't have a resolved address yet. By
default, only sources that have a real address are printed for
compatibility. Remove the "210 Number of sources" messages to avoid
confusion. Also, modify the ntpdata command to always print only sources
with a resolved address.
When resolving of a pool name succeeds, don't remove the remaining
unresolved sources, i.e. try to get all maxsources (default 4) sources,
even if it takes multiple DNS requests.
If an individual unresolved source or all unresolved sources from a pool
are removed, stop resolving their addresses on the next attempt (don't
remove them immediately as the resolver may be running).
Add a new type of address for NTP sources that don't have a resolved
address yet. This will allow the sources to be displayed, modified and
deleted by chronyc.
Update utility functions to support the new addresses.
With the new file utility functions permissions can be restricted for
newly created files. For the log file specified by the -l option it
is better to remove the "other" permissions (0640) to make it similar
to the system log.
Try stat() before calling unlink() to make sure the file is accessible.
This fixes chronyc running under a non-root/chrony user printing an
error message due to missing permissions on /var/run/chrony before
trying to bind its socket.
The current default NTP era split passed the Unix epoch (~50 years ago),
which means the epoch converted to an NTP timestamp and back ends up in
the next NTP era (year 2106).
Fix the test to take into account the era split.
Add a new command to print the original name of a source specified by
address. This could be useful in scripts to avoid having to run the
sources command with and without -N.
Add -N option to chronyc to print the original names by which the
sources were specified instead of using reverse DNS lookup. The option
works in the sources, sourcestats and tracking commands.
Specify a new request to get the name of the NTP source corresponding to
an address, which was originally specified in the configuration file or
chronyc add command.
Modify the request for adding a source to provide the name of the source
instead of its address (resolved in chronyc) in order to enable chronyd
to replace the source, support an "add pool" command, and enable an NTS
client to verify the server's certificate.
The name resolving does not block the response. Success is indicated
even if the name cannot be resolved, or a source with the same address
is already present.
To prevent unresolvable names from getting to chronyd, chronyc does not
send the request if it could not resolve the name itself (assuming they
are both running on the same host using the same resolver).
Return an error status when the name is not printable or contains a
space (don't bother with full hostname validation). If the name is an
address, return the same status as NSR_AddSource(). Otherwise, return a
"not resolved yet" status.
The test might run on different platforms. If the platform happens
to have a RTC that does exist but unable to have RTC_UIE_ON set the
test will fail, while the chrony code is actually good.
Examples of bad clocks are:
- ppc64el: rtc-generic
- arm64: rtc-efi
To avoid that extend the log message check on 101-rtc to accept
that condition as a valid test result as well.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Several RTCs would only expose the broken behavior on enabling
interrupts. The reason for that is that the kernel only returns the
error if the state changes. Therefore the check has to probe
switch_interrupts(1) as well.
On platforms that work it will be switched on and off, while on those it
never works it will just stay off.
Clocks known to expose that behavior include, but are not limited to:
PPC64# dmesg | grep -i rtc
[ 0.241872] rtc-generic rtc-generic: registered as rtc0
[ 0.270221] rtc-generic rtc-generic: setting system clock to ...
ARM64# dmesg | grep -i rtc
[ 0.876198] rtc-efi rtc-efi: registered as rtc0
[ 1.046869] rtc-efi rtc-efi: setting system clock to ...
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Some RTCs supported by the Linux kernel don't support the RTC_UIE_ON/OFF
ioctls, which causes chronyd started with the -s option to get stuck in
the initial RTC mode.
After opening the RTC device in the initialization, return error if
the ioctls are not supported to prevent the upper layer from calling the
time_init() function and expecting it to finish.
The function may be called from a separate thread, but logging is not
considered thread safe (e.g. due to using functions which read
environment variables).
Add a function to open a file for reading, writing, or appending.
In uppercase modes errors are handled as fatal, i.e. the caller doesn't
need to check for NULL. To avoid string manipulations in the callers,
the function accepts an optional directory and suffix. New files are
created with specified permissions, which will be needed for saving
keys. The O_EXCL flag is used in the writing mode to make sure a new
file is created (on filesystems that support it).
Also, add a function to rename a temporary file by changing its suffix,
and a function to remove a file.
All functions log all errors, at least as debug messages.
When replacing an existing rtc file with the temporary file, don't
change the ownership or permissions of the temporary file to match the
old rtc file, as if it didn't exist.
When replacing an existing drift file with the temporary file, don't
change the ownership or permissions of the temporary file to match the
old drift file, as if it didn't exist.
Include <limits.h> and use the PATH_MAX macro to define the length of
buffers containing paths to make it constistent. (It's not supposed to
fit all possible paths.)
Call exit() in LOG_Message() after printing a fatal message to allow the
LOG macro or LOG_Message() to be used directly instead of the LOG_FATAL
macro.
Allow a cipher (AES128 or AES256) to be specified as the type of a key
in the key file to authenticate NTP packets with a CMAC instead of the
NTPv4 (RFC 5905) MAC using a hash function. This follows RFC 8573.
Remove the magic constant compensating for copying, conversions, etc.
It cannot possibly be accurate on all hardware. The delay is supposed to
be a minimum delay.
An analysis by Tim Ruffing [1] shows that a length extension attack
adding valid extension fields to NTPv4 packets is possible with some
specific key lengths and hash functions using little-endian length like
MD5 and RIPEMD160.
chronyd currently doesn't process or generate any extension fields, but
it could be a problem in future when a non-authentication extension
field is supported.
Drop support for all RIPEMD functions as they don't seem to be secure in
the context of the NTPv4 MAC. MD5 is kept only for compatibility.
[1] https://mailarchive.ietf.org/arch/msg/ntp/gvibuB6bTbDRBumfHNdJ84Kq4kA
Improve the client's test D to compare the stratum, reference ID,
reference timestamp, and root delay from the received packet with its
own reference data in order to prevent it from synchronizing to itself,
e.g. due to a misconfiguration.
In the local reference mode, instead of returning the adjusted current
time as the reference time, return the same timestamp updated only once
per about 62.5 seconds.
This will enable chronyd to detect polling of itself even when the local
reference mode is active.
Instead of converting the reference timestamp to the NTP format and
back, add a negative double value to the timestamp directly. Move the
code to a separate function. This will allow the timestamp to stay
outside the compiled-in NTP era, which is useful for testing of the
cmdmon protocol.
Setting maxsamples to 1 or 2 prevented the source from being selected as
the regression would always fail. Handle this as a special case with
disabled frequency tracking in order to enable a fast reference update
with the -q/-Q option.
If there are too few samples to make a regression, at least update the
offset estimate from the last sample and keep the previous frequency
offset unchanged. Also, reset the error estimates.
Don't call bind() if the specified local address of a socket has port 0
and the "any" address. It will be bound automatically on connect() or
sendmsg().
On start, check if the SOCK_CLOEXEC and SOCK_NONBLOCK flags are
supported in the socket() call and use them instead of fcntl() in order
to reduce the number of system calls required to send a client request.
All networking code in chronyd (NTP server/client, signd client, cmdmon
server) assumes sending a message will not block, but only the signd
client actually checks for a write event and only the NTP server
requests a non-blocking socket. The cmdmon server and NTP client
(if using one socket for all servers) might be blocked.
chronyc doesn't need a non-blocking socket, but it is not expected to
block as it sends only one message at a time.
Prefer dropped messages over blocking in all cases. Remove the
SCK_FLAG_NONBLOCK flag and make all sockets non-blocking.
Add a new file implementing support for opening sockets, sending and
receiving messages with control messages (e.g. addresses, timestamps),
and related operations, which should be simpler to use than the system
functions and allow their features to be reused between different parts
of the chrony code.
It is based on the ntp_io.c and ntp_io_linux.c files. It will be used by
the NTP client/server, cmdmon server, client, and others.
Rename NTP_Remote_Address to IPSockAddr to make it usable in non-NTP
context and provide NTP_Remote_Address for compatibility. Also, change
the type of port to uint16_t.
Don't abort on start when no UDP socket could be opened/bound for
cmdmon. The Unix socket is more important and with the IP_FREEBIND
option this case was not caught anyway.
Reorder the LOGS_Severity enum in order of severity and change the code
to not log/print messages with severity below the specified minimum
instead of having a separate debug level.
Specify SOCK_DGRAM socktype instead of SOCK_STREAM in hints for
getaddrinfo() as chronyd is (and will mainly be) using the returned
addresses to open UDP sockets. This shouldn't make a difference in
practice, but it might avoid some confusion.
In NIO_Linux_RequestTxTimestamp(), check the returned pointer and the
length of the buffer before adding the control message. This fixes an
issue reported by the Clang static analyzer.
With future kernels it may be possible to get, but not set, the HW
timestamping configuration on some specific interfaces like macvlan in
containers. This would require the admin to configure the timestamping
before starting chronyd.
If SIOCSHWTSTAMP failed on an interface, try SIOCGHWTSTAMP to check if
the current configuration matches the expected configuration and allow
the interface to be used for HW timestamping.
Stop trying to maintain a list of individual contributions. Just list
the contributors. For tracking individual changes in the source code
there is git.
Real-time scheduling and memory locking is available on posix compliant
OSs. This patch centralizes this functionality and brings support to
FreeBSD, NetBSD, and Solaris.
[ML: updated coding style]
Add a new set of tests for testing basic functionality, starting chronyd
with root privileges on the actual system instead of the simulator.
Tests numbered in the 100-199 range are considered destructive and
intended to be used only on machines dedicated for development or
testing. They are started by the run script only with the -d option.
They may adjust/step the system clock and other clocks, block the RTC,
enable HW timestamping, create SHM segments, etc.
Other tests should not interfere with the system and should work even
when another NTP server/client is running.
Fix an issue with Linux and musl libc where sched_setscheduler is not
implemented. It seems that pthread_setschedparam is more widely
supported across different C libraries and OSs. For our use case, it
should make no difference which call is used.
On FreeBSD, sendmsg() fails when IP_SENDSRCADDR specifies a source
address on a socket that is bound to the address. This prevents a server
configured with the bindaddress directive from responding to clients.
Add a new variable to check whether the server IPv4 socket is not bound
before setting the source address.
A new ioctl will probably be added in Linux 4.21. It should enable a
significantly more accurate measurement of the offset between PHC and
system clock.
Don't forget to include the length of the frame check sequence (FCS) in
the RX timestamp transposition when the L2 length of the received packet
is from SCM_TIMESTAMPING_PKTINFO.
This fixes commit 934d4047f1.
Instead of linking unit tests with *.o in the root directory, which may
include conflicting objects from a different configuration (e.g. hash),
add a print target to the main Makefile and use it in the unit test
Makefile to link only with objects that are relevant in the current
configuration.
The example spec file was too limited to be recommended for use in any
rpm-based distribution, e.g. it didn't configure chronyd to drop the
root privileges.
Users that want to build a package from the latest source code should
start with the official package of their distribution.
FreeBSD doesn't support IP_PKTINFO. Instead it provides IP_RECVDSTADDR
and IP_SENDSRCADDR, which can be used to get/set the destination/source
address.
In future IP_RECVIF and IP_SENDIF may be supported to get and set also
the interface.
When a server with multiple interfaces in the same network is sending a
response, setting the ipi_spec_dst/ipi6_addr field of the IP*_PKTINFO
control message selects the source address, but not necessarily the
interface. The packet has the expected source address, but it may be
sent by an interface that doesn't have the address.
Set the ipi_ifindex/ipi6_ifindex field to respond on the same interface
as the request was received from to avoid asymmetries in delay and
timestamping.
Tomcrypt, some NSS hash functions, and the internal MD5 require the
output buffer to be at least as long as the digest. To provide the same
hashing API with all four options, use an extra buffer for the digest
when necessary and copy only the requested bytes to the caller.
Don't wait for other sources to be selectable when the maximum
selectable and non-selectable reachability registers happen to match
and a register is already full (e.g. after heavy packet loss).
Starting with Linux 4.19, the frequency of the system clock should be
updated immediately in the system call itself, which will significantly
reduce the maximum delay of the update.
Increase the assumed tick rate in order to reduce the dispersion
accumulated by the driver when it sets the frequency.
Add an option to use the median filter to reduce noise in measurements
before they are accumulated to sourcestats, similarly to reference
clocks. The option specifies how many samples are reduced to a single
sample.
The filter is intended to be used with very short polling intervals in
local networks where it is acceptable to generate a lot of NTP traffic.
Move the implementation of the median filter to a separate file to make
it useful for NTP. Replace some constants with parameters and generalize
the code to work with full NTP samples (including root dispersion/delay,
stratum, and leap).
For refclocks it should give the same results as before.
This moves the leap status of the last sample from the source instance
to the sourcestats instance in order to make them both accumulate the
same data.
The current code uses macros from inttypes.h. There is no point in
detecting and selecting between stdint.h and inttypes.h as the latter is
always needed.
In chronyc handle SIGPIPE similarly to SIGTERM. In chronyd ignore the
signal to avoid crashing when a TCP socket will be needed (e.g. for
NTS-KE) and will be unexpectedly closed from the other side.
Before dispatching a handler, check if it is still valid. This allows a
handler to remove itself when a descriptor has two different events at
the same time.
When the local polling interval is adjusted between minpoll and maxpoll
to a sub-second value, check if the source is reachable and the minimum
measured delay is 10 milliseconds or less. If it's not, ignore the
maxpoll value and set the interval to 1 second.
This should prevent clients (mis)configured with an extremely short
minpoll/maxpoll from flooding servers on the Internet.
If the polling interval is shorter than 8 seconds, set the burst
interval to the 1/4th of the polling interval instead of the 2-second
constant. This should make the burst option and command useful with
very short polling intervals.
Fix mismatches between the format and sign of variables passed to
printf() or scanf(), which were found in a Frama-C analysis and gcc
using the -Wformat-signedness option.
Instead of adding the recipient to the sendmail command line (which is
interpretted by the shell) add a "To" line to the message and run
sendmail with the -t option to read the recipient from the message.
While it is not expected to happen with any time that can be represented
by the system clock, the functions are allowed to return NULL. Check the
pointer before dereferencing.
This issue was found in a Frama-C analysis.
Remove spaces from tab-completion results and now break on a space.
Tested with both readline and editline (libedit)
Incorporated Miroslav's suggestions.
This allows chronyd to remove its pidfile on exit after dropping the
root privileges in order to prevent another chronyd instance from
failing to start, e.g. due to a wrong SELinux label from chronyd -q.
Instead of counting missing responses, switch to the offline state
immediately when sendmsg() fails.
This makes the option usable with servers and networks that may drop
packets, and the effect will be consistent with the onoffline command.
The onoffline command tells chronyd to switch all sources to the online
or offline status according to the current network configuration. A
source is considered online if it is possible to send requests to it,
i.e. a route to the network is present.
Allow SRC_MAYBE_ONLINE to be specified for new NTP sources and
connectivity setting to select between SRC_ONLINE and SRC_OFFLINE
according to the result of the connect() system call, i.e. check whether
the client has a route to send its requests.
Apparently, it is possible for an interface to report all necessary
flags for HW timestamping without having a PHC. Check the PHC index to
avoid an error message in the system log saying that /dev/ptp-1 cannot
be opened.
With recent changes in the Linux kernel, the getrandom() system call may
block for a long time after boot on machines that don't have enough
entropy. It blocks the chronyd's initialization before it can detach
from the terminal and may cause a chronyd service to fail to start due
to a timeout.
At least for now, enable the GRND_NONBLOCK flag to make the system call
non-blocking and let the code fall back to reading /dev/urandom (which
never blocks) if the system call failed with EAGAIN or any other error.
This makes the start of chronyd non-deterministic with respect to files
that it needs to open and possibly also makes it slightly easier to
guess the transmit/receive timestamp in client requests until the
urandom source is fully initialized.
Historically there were plenty of callback based implementations around
ifupdown via /etc/network/if-up and similar. NetworkManager added the
dispatcher [1] feature for such a kind of functionality.
But so far a systemd-networkd (only) systemd had no means to handle those
cases. This is solved by networkd-dispatcher which is currently available
at least in ArchLinux and Ubuntu.
It takes away the responsibility to listen on netlink events in each
application and provides a more classic script-drop-in interface to respond
to networkd events [3].
This commit makes the NM example compatible to be used by NetworkManager
dispatcher as well as by networkd-dispatcher. That way we avoid too much
code duplication and can from now on handle special cases in the
beginning so that the tail can stay commonly used.
After discussion on IRC the current check differs by checking the
argument count (only in NetworkManager), if ever needed we could extend
that to check for known custom environment vars (NetworkManager =>
CONNECTION_UUID; networkd-dispatcher => OperationalState).
[1]: https://developer.gnome.org/NetworkManager/stable/NetworkManager.html
[2]: https://github.com/craftyguy/networkd-dispatcher
[3]: https://github.com/systemd/systemd/blob/master/src/systemd/sd-network.h#L86
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
The cap_get_bound() function and CAP_IS_SUPPORTED macro were added in
libcap-2.21. Check if the macro is defined before use.
The sys/capability.h header from libcap-2.16 and earlier disables the
linux/types.h header, which breaks the linux/ptp_clock.h header. Change
the order to include sys/capability.h as the last system header.
In the next Linux version the recvmmsg() system call will be probably
fixed to not return socket errors (e.g. due to ICMP) when reading from
the error queue.
The NTP I/O code assumed this was the correct behavior. When the system
call is fixed, a socket error on a client socket will cause chronyd to
enter a busy loop consuming the CPU until the receive timeout is reached
(8 seconds by default).
Use getsockopt(SO_ERROR) to clear the socket error when reading from the
error queue failed.
Some hash functions in the freebl3 library don't support truncated
digests and either return immediately with no update of the output
length, or ignore the length of the output buffer and always write whole
digest.
Initialize the return value to zero to get correct result with the
former.
This is triggered only in the hash unit test. chronyd always provides a
sufficient buffer for the digest.
Use nettle_hashes[] instead of nettle_get_hashes(), which is available
only in nettle >= 3.4. nettle_hashes[] is a symbol available in older
versions and may be renamed in future. In nettle >= 3.4 it is a macro
using nettle_get_hashes() for compatibility.
Instead of having adjtimex just fail with a permission issue
improve the error messaging by warning for the lack of
CAP_SYS_TIME on SYS_Linux_Initialise.
Message will look like (instead of only the latter message):
CAP_SYS_TIME not present
adjtimex(0x8001) failed : Operation not permitted
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
NTPv1 packets have a reserved field instead of the mode field and the
actual mode is determined from the port numbers. It seems there is still
a large number of clients sending NTPv1 requests with a zero value in
the field (per RFC 1059).
Follow ntpd and respond to the requests with server mode packets.
Rework the code to not ignore valid packets with unknown or obsolete
responses and return immediately with "bad reply from daemon" instead of
timing out with "cannot talk to daemon".
Instead of zeroing individual fields, zero all bytes of the buffer
before the reply is formed.
This may have a small impact on performance, but it simplifies the code
and minimizes the risk of leaking uninitialized memory.
Make the length of responses containing manual samples constant to
simplify the protocol. It was the only type of response that had a
variable length.
This reverts commit 2343e7a89c.
Clients sending packets in the interleaved mode are supposed to use
a different receive and transmit timestamp in order to reliably detect
the mode of the response. If an interleaved request with the receive
timestamp equal to the transmit timestamp is detected, respond in the
basic mode.
Wait until a kernel RX timestamp is actually missing before opening the
dummy socket in order to avoid a small performance impact in case the
servers are so slow/distant that the kernel can constantly win the race.
When the burst option is specified in the server/pool directive and the
current poll is longer than the minimum poll, initiate on each poll a
burst with 1 good sample and 2 or 4 total samples according to the
difference between the current and minimum poll.
In the source selection algorithm, include extra dispersion due to
maxclockerror in the root distance of sources that don't have new
samples (the last sample is older than span of all samples) to not
prefer unreachable sources with a short distance and small skew over
reachable sources for too long, and also to decrease their chances of
becoming falsetickers.
chronyd doesn't normally write anything to stdout or stderr when running
as a daemon, but it is a good practice to replace them with descriptors
of /dev/null to prevent accidental writes to other files or sockets that
would otherwise take their place.
If opening the log file specified with the -l option failed (after
closing all descriptors), the error message is written to an invalid
descriptor as no log file or syslog is opened yet. Fix the code to track
when the output is usable.
Compare both receive and transmit timestamps in the NTP test number 1.
This prevents a client from dropping a valid response in the interleaved
mode if it follows a response in the basic mode and the server did not
have a kernel/hardware transmit timestamp, and the random bits of the
two timestamps happen to be the same (chance of 1 in 2^(32-precision)).
Before sending a new packet, check if the receive/transmit timestamp
is not equal to the origin timestamp or the previous receive/transmit
timestamp in order to prevent the packet from being its own valid
response (in the symmetric mode) and invalidate responses to the
previous packet.
This improves protection against replay attacks in the symmetric mode.
Save the local receive and remote transmit timestamp needed for
(re)starting the symmetric protocol when no valid reply was received
separately from the timestamps that are used for synchronization of the
local clock.
This extends the interval in which the local NTP state is (partially)
protected against replay attacks in order to complete a measurement
in the interleaved symmetric mode from [last valid RX, next TX] to
[last TX, next TX], i.e. it should be the same as in the basic mode.
The socket.h header provided by musl doesn't seem to include the kernel
headers and is missing SCM_TIMESTAMPING_PKTINFO, which causes the
Linux-specific code in chrony to fail to build.
Before reading the n_samples field of the MANUAL_LIST reply, check if it
is actually contained in the received message. This does not change the
outcome of the client's length check as the returned length was always
larger than the length of the truncated reply and it was dropped anyway,
but it prevents the client from reading uninitialized memory.
The Linux kernel has a counter for sockets using kernel RX timestamping
and timestamps (all) received packets only when it is not zero. However,
this counter is updated asynchronously from setsockopt(). If there are
currently no other sockets using the timestamping, it is possible that a
fast server response is received before the kernel timestamping is
actually enabled after setting the socket option and sending a request.
Open a dummy socket on start to make sure there is always at least one
timestamping socket to avoid the race condition.
When dropping the root privileges, don't try to keep the CAP_SYS_TIME
capability if the -x option was enabled. This allows chronyd to be
started without the capability (e.g. in containers) and also drop the
root privileges.
When sending client requests to a close and fast server, it is possible
that a response will be received before the HW transmit timestamp of
the request itself. To avoid processing of the response without the HW
timestamp, monitor events returned by select() and suspend reading of
packets from the receive queue for up to 200 microseconds. As the
requests are normally separated by at least 200 milliseconds, it is
sufficient to monitor and suspend one socket at a time.
If chronyc sent a request which caused chronyd to step the clock (e.g.
makestep, settime) and the second reading of the clock before calling
select() to wait for a response happened after the clock was stepped, a
new request could be sent immediately and chronyd would process the same
command twice. If the second request failed (e.g. a settime request too
close to the first request), chronyc would report an error.
Change the submit_request() function to read the clock only once per
select() to wait for the first response even when the clock was stepped.
If the system clock was stepped forward after chronyc sent a request and
before it read the clock in order to calculate the receive timeout,
select() could be called with a negative timeout, which resulted in an
infinite loop waiting for select() to succeed.
Fix the submit_request() function to not call select() with a negative
timeout. Also, return immediately on any error of select().
Instead of using the TAI-UTC offset which corresponds to the current
system time, get the offset for the reference time. This allows the
clock to be accurately stepped from a time with different TAI-UTC
offset.
This option is for indicating to chronyd that the reference clock is
kept in TAI and that chrony should attempt to convert from TAI to UTC by
using the timezone configured by the "leapsectz" directive.
Although gmtime() is expected to convert any time of the system clock at
least in the next few NTP eras, a correct code should always check the
returned value and this shouldn't be a fatal error in handling of leap
seconds.
Similarly to the maxdelaydevratio test, include in the maximum delay
dispersion which accumulated in the interval since the last sample.
Also, enable the test for symmetric associations.
Instead of giving NTP-specific data to sourcestats in order to perform
the test, provide a function to get all data needed for the test in
ntp_core. While at it, improve the naming of variables.
When (re)allocating an array with very large number of elements using
the MallocArray or ReallocArray macros, the calculated size of the array
could overflow size_t and less memory would be allocated than requested.
Add new functions for (re)allocating arrays that check the size and use
them in the MallocArray and ReallocArray macros.
This couldn't be exploited, because all arrays that can grow with cmdmon
or NTP requests already have their size checked before allocation, or
they are much smaller than memory allocated for structures to which they
are related (i.e. ntp_core and sourcestats instances), so a memory
allocation would fail before their size could overflow.
This issue was found in an audit performed by Cure53 and sponsored by
Mozilla.
Fix the UTI_TimeToLogForm() function to check if gmtime() didn't fail.
This caused chronyc to crash due to dereferencing a NULL pointer when
a response to the "manual list" request contained time which gmtime()
could not convert to broken-down representation.
This issue was found in an audit performed by Cure53 and sponsored by
Mozilla.
If no rxfilter is specified in the hwtimestamp directive and the NIC
doesn't support the all or ntp filter, enable TX-only HW timestamping
with the none filter.
On some HW it seems it's possible to get an occasional bad reading of
the PHC (with normal delay), or in a worse case the clock can step due
to a HW/driver bug, which triggers reset of the HW clock instance. To
avoid having a bad estimate of the frequency when the next (good) sample
is accumulated, drop also the last sample which triggered the reset.
If the minimum delay is known (in a static network configuration), it
can replace the measured minimum from the register. This should improve
the stability of corrections for asymmetric jitter, sample weighting and
maxdelay* tests.
Programming pins for external PHC timestamping was added in Linux 3.15,
but the PHC subsystem is older than that. Compile the programming code
only when the ioctl is defined.
On some systems, passing NULL as the first argument to adjtime, will
result in returning the amount of adjustment outstanding from a previous
call to adjtime().
On macOS this is not allowed and the adjtime call will fault. We can
simulate the behaviour of the other systems by cancelling the current
adjustment then restarting the adjustment using the outstanding time
that was returned. On macOS 10.13 and later, the netbsd driver is now
used and must use these semantics when making/measuring corrections.
The sources and sourcestats commands accept -v as an option, but the
glibc implementation of getopt() reorders the arguments and parses the
option as a command-line option of chronyc.
Add '+' to the getopt string to disable this feature. Other getopt()
implementations should consider it a new command-line option, which will
be handled as an error if present.
This allows transmissions in symmetric mode to be scheduled
independently from client transmissions. This reduces maximum delay
in scheduling when chronyd is configured with a larger number of
servers.
Fix a sign error in conversion of HW time to local time, which caused
the jitter to be amplified instead of reduced. NTP with HW timestamping
should now be more stable and able to ignore occasionally delayed
readings of PHC.
In basic client mode, set the origin and receive timestamp to zero.
This reduces the amount of information useful for fingerprinting and
improves privacy as the origin timestamp allows a passive observer to
track individual NTP clients as they move across networks. (With chrony
clients that assumes the timestamp wasn't reset by the chronyc offline
and online commands.)
This follows recommendations from the current version of IETF draft on
NTP data minimization [1].
The timestamp could be theoretically useful for enhanced rate limiting
which can limit individual clients behind NAT and better deal with DoS
attacks, but no server implementation is known to do that.
[1] https://tools.ietf.org/html/draft-ietf-ntp-data-minimization-01
When no default route is configured, check each source if it has a
route. If the system has multiple network interfaces, this prevents
setting local NTP servers to offline when they can still be reached over
one of the interfaces.
In interleaved client mode, when so many consecutive requests were lost
that the first valid (interleaved) response would be dropped for being
too old, switch to basic mode so the response can be accepted if it
doesn't fail in the other tests.
This reworks commit 16afa8eb50.
In symmetric mode, don't send a packet in interleaved mode unless it is
the first response to the last valid request received from the peer and
there was just one response to the previous valid request. This prevents
the peer from matching the transmit timestamp with an older response if
it can't detect missed responses.
In interleaved symmetric mode, check if the remote TX timestamp is
before RX timestamp. Only the first response from the peer after
receiving a request should pass this test. Check also the interval
between last two remote transmit timestamps when we know the remote poll
can't be constrained by minpoll. Use the minimum of previous remote and
local poll as a lower bound of the actual interval between peer's
transmissions.
If synchronised to a stratum 15 source, return stratum of 16 instead of
0 in the tracking report. It will not match the value in server mode
packets, but it should be less confusing.
Check how many responses were missing before accumulating a sample using
old timestamps to avoid correcting the clock with an offset extrapolated
over a long interval.
This should be eventually done in sourcestats for all sources.
With the new selection of timestamps in the interleaved mode it's no
longer necessary to reverse the poll tracking in order to reduce the
local and remote intervals of measurements that makes the peer with
higher stratum.
This reverts commit 4a24368763.
Use previous local TX and remote RX timestamps for the new sample in the
interleaved mode if it will make the local and remote intervals
significantly shorter in order to improve the accuracy of the measured
delay.
If no CFLAGS are specified, check if common security hardening options
are supported and add them to the CFLAGS/LDFLAGS. These are typically
enabled in downstream packages, but users compiling chrony from sources
with default CFLAGS should get hardened binaries too.
macOS 10.13 will implement the ntp_adjtime() system call, allowing
better control over the system clock than is possible with the existing
adjtime() system call. chronyd will support both the older and newer
calls, enabling binary code to run without recompilation on macOS 10.9
through macOS 10.13.
Early releases of macOS 10.13 have a very buggy adjtime() call. The
macOS driver tests adjtime() to see if the bug has been fixed. If the
bug persists then the timex driver is invoked otherwise the netbsd
driver.
If the -Q option is specified, disable by default pidfile, ntpport,
cmdport, Unix domain command socket, and clock control, in order to
allow starting chronyd without root privileges and/or when another
chronyd instance is already running.
The tai field in struct timex is a Linux-specific feature. It's possible
to read the current offset with ntp_gettime() (or ntp_gettimex() on
Linux), but apparently not all libc implementations support it.
Rework the code to save and adjust the last value instead of reading
the current value from the kernel.
Unlike in the basic mode, the peer with a higher stratum needs to wait
for a response before sending the next request in order to minimize the
delay of the measurement and error in the measured delay.
Slightly increase the delay adjustment to make it work with older chrony
versions.
Update the remote poll and remote stratum even for unsychronised peers,
and handle stratum of 0 as 16, so the peers work with the opposite
differences between their strata and can adjust their polling intervals
in order to interleave the packets.
Use the timezone specified by the leapsectz directive to get the
current TAI-UTC offset and set the offset of the system clock in order
to provide correct TAI time to applications using ntp_adjtime(),
ntp_gettime(), or clock_gettime(CLOCK_TAI).
Instead of repeatedly expanding the range of b with the same increment,
double the range on each iteration to speed up the expansion. Also, add
a sanity check for the interval.
The bisection always terminated after one iteration. Change the code to
check if the middle is different from the lower and upper limits as
suggested in the original recipe.
This fixes commit b14689d59b.
In order to stabilize the weights of refclock samples which have only
slightly different distances, don't allow the stddev value used in the
weight calculation to be smaller than the precision and also assign
weight of 1 to all samples which have distance < minimum + precision.
When parsing the include directive, call glob() with the GLOB_ERR and
GLOB_NOMAGIC flags, and abort with an error message when matching of the
pattern failed with other error than GLOB_NOMATCH.
This restores the original behavior of the directive when it didn't
allow patterns, but it will still not fail with patterns not matching
any files in an existing directory.
When the poll value in a client request is smaller than the server's NTP
rate limiting interval, set poll in the response to the rate limiting
interval to suggest the client to increase its polling interval.
This follows ntpd as a server. No current client implementation seems to
be increasing its interval by the poll, but it may change in the future.
Add an rxfilter option to the hwtimestamp directive to select which
received packets should be timestamped. It can be set to "none", "ntp",
or "all". The default value is ntp, which falls back to all when ntp is
not supported.
New timestamping options may be available in kernel 4.13. They can be
used to get the index of the interface which timestamped incoming packet
together with its length at layer 2, enable simultaneous SW and HW TX
timestamping, and enable a new RX filter for NTP packets.
Request SW timestamps with SCM_TIMESTAMPING even if HW timestamping is
enabled. This replaces SCM_TIMESTAMP(NS) for RX and enables TX SW
timestamping on interfaces that don't support HW timestamping (or don't
have it enabled) if another interface has HW timestamping enabled.
Don't give up when one of the addresses/hostnames specified by -h fails
to resolve in DNS_Name2IPAddress(), e.g. with the default setting try to
connect to ::1 even when 127.0.0.1 failed due to the -6 option.
This allows multiple options to be specified together and also may
options follow configuration directives on systems where getopt()
permutates the arguments.
Source selection uses the last event time as current time. If it was
called from a refclock which generates a sample in its poll function
(e.g. PHC), the sample time may be later than the event time. This
gives a negative elapsed time in SST_GetSelectionData() and possibly
also a negative root distance, which causes the source to be rejected as
a falseticker.
Use absolute value of the difference in order to always get a positive
root distance.
Add width option to the refclock directive to set expected width of
pulses in a PPS signal. The width adds a limit for the maximum offset
and root distance in order to reject PPS samples from wrong events, e.g.
PHCs which cannot be configured to timestamp only rising of falling
edges.
Add extpps driver option to the PHC refclock to enable external
timestamping of PPS signal and also options to configure the channel and
pin index. In this mode, the driver polling function accumulates samples
for hwclock, which is used to convert received timestamping events to
local time.
Add pps option to the refclock directive to force chronyd to treat any
refclock as a PPS refclock. This is intended for refclocks that may
provide time off by a whole number of seconds due to missing or wrong
TAI/GPS->UTC conversion.
Split RCL_AddPulse() in order to provide a new function for refclock
drivers which can make PPS samples without having raw system time, e.g.
from PHC timestamps.
When processing data from the PTP_SYS_OFFSET ioctl, the sample is
dropped when an interval between two consecutive readings of the system
clock is negative or zero, assuming the clock has been stepped between
the two readings.
With a real PHC the interval is normally expected to be at least a
microsecond, but with a virtual PHC and a low-resolution system clock
it's possible to get two readings with the same system time. Modify the
check to drop only samples with a negative delay.
Specify the maximum length of the path in the snprintf() format to avoid
a new gcc warning (-Wformat-truncation). If the path doesn't fit in the
buffer, indicate with the '>' symbol that it was truncated. The function
is used only for debug messages.
Use the new options of the run script in the check target to make it
reliable for automatic testing without using a fixed random seed and add
a new quickcheck target for the original check using just one iteration.
Add options to allow running the tests in multiple iterations while
allowing a small number of failures per test. Some tests are expected to
fail occasionally as they are basically statistical tests. Improving
their reliability is possible, but it's always a compromise between
sensitivity, reliability, and execution time.
Add a new clock driver that doesn't actually try to adjust the clock.
It allows chronyd to run without the capability to adjust/set the system
clock, e.g. in some containers. It can be enabled by the -x option.
Always write the measurement history on exit when the dump directory is
specified and silently ignore the dumponexit directive. There doesn't
seem to be a good use case for dumpdir and -r without dumponexit as the
history would be invalidated by adjustments of the clock that happened
between the dump command and chronyd exit.
It was never used for anything and messages in debug output already
include filenames, which can be easily grepped if there is a need
to see log messages only from a particular file.
Move the res_init() call from do_name_to_ipaddress() into a separate
privops operation. Use it in ntp_sources and avoid unnecessary
res_init() calls in the main thread.
Coverity doesn't seem to like the new field in the IPAddr struct (used
as explicit padding of the structure) to be left uninitialized, even
though it's never used for anything and is cleared by memset() in
UTI_IPHostToNetwork() before leaving the process.
While the measurements log can be useful for debugging problems in NTP
configuration (e.g. authentication failures with symmetric keys), it
seems most users are interested only in valid measurements (e.g. for
producing graphs) and don't expect/handle entries where some of the RFC
5905 tests 1-7 failed. Modify the measurements log option to log only
valid measurements, and for debugging purposes add a new rawmeasurements
option.
When the server's transmit timestamp was updated with a kernel/HW
timestamp, it didn't include the time smoothing offset. If the offset
was larger than one second, the update failed and clients using the
interleaved mode received less accurate timestamps. If the update
succeeded, the clients received timestamps that were not adjusted for
the time smoothing offset, which added an error of up to 0.5s/1s to
their measured offset/delay.
Fix the update to include the smoothing offset in the new timestamp.
Handle zero NTP timestamp in UTI_Ntp64ToTimespec() as a special value to
make it symmetric with UTI_TimespecToNtp64(). This is needed since
commit d75f6830f1, in which a timestamp is
converted back and forth without checking for zero.
It also makes zero NTP timestamps more apparent in debug output.
When accumulating a new sample, check if the new RTC time is newer the
last sample time. If it is not, discard all previous samples, assuming
something has stepped the RTC, or it's a broken RTC/driver.
This reduces leak of sample times (and receive timestamps which are
related to sample times), which could be useful in off-path attacks on
unauthenticated symmetric interleaved mode.
Before sending an NTP packet, check whether the TX timestamp is not
equal to the RX timestamp. If it is, generate a new TX timestamp and try
again. This is extremely unlikely to happen in normal operation, but it
is needed for reliable detection of the interleaved mode.
For now, when converting a raw timestamp, return error of the last
sample as the maximum error of the timestamp. This is needed to include
the PHC reading delay in the NTP dispersion.
Change the default NTP rate limiting leak to 2 (25%). Change the default
command rate limiting interval to -4 (16 packets per second) and burst
to 8, so the interval is the only difference between NTP and command
rate limiting defaults.
This reverts commit 50022e9286.
Testing showed that ntpd as an NTP client performs poorly when it's
getting only 25% of responses. At least for now, disable rate limiting
by default again.
Change the default interval of both NTP and command rate limiting to -10
(1024 packets per second) and the burst to 16. The default NTP leak is 2
(rate limiting is enabled by default) and the default command leak is 0
(rate limiting is disabled by default).
Use the -h option to force chronyc to use internet socket instead of
Unix domain as the access to the socket may be blocked by SELinux and
trying to open it generates SELinux warnings.
With a larger number of configured servers, the handler of the emulated
resolver repeatedly scheduled timeout of zero, which triggered the
infinite loop detection in the scheduler and caused abort. This bug was
introduced in commit 967e358dbc.
Rework the code to use pipes instead of timeouts to avoid this problem.
The maxlockage option specifies in number of pulses how old can be
samples from the refclock specified by the lock option to be paired with
the pulses. Increasing this value is useful when the samples are
produced at a lower rate than the pulses.
The maxjitter directive sets the maximum allowed jitter of the sources
to not be rejected by the source selection algorithm. This prevents
synchronisation with sources that have a small root distance, but their
time is too variable. By default, the maximum jitter is 1 second.
Instead of a worst-case delay use a mean value and relate it to the
source's time. This makes it more stable in the interleaved and
symmetric modes, which should improve the weighting and asymmetry
correction. Modify the test A and B to work with a minimum estimated
delay (delay - dispersion).
Change default minsamples to 6 and polltarget to 8. This should improve
stability with extremely small jitters (e.g. HW timestamping) and not
decrease time accuracy at minimum polling interval too much.
This option sets a timeout (in seconds) after which chronyd will exit.
If the clock is not synchronised, it will exit with a non-zero status.
This is useful with the -q or -Q option to shorten the maximum time
waiting for measurements, or with the -r option to limit the time when
chronyd is running, but still allow it to adjust the frequency of the
system clock.
If the MAC in NTPv4 requests would be truncated, use version 3 by
default to avoid the truncation. This is necessary for compatibility
with older chronyd servers, which do not respond to messages with
truncated MACs.
In order to allow deterministic parsing of NTPv4 extension fields, the
MAC must not be longer than 192 bits (RFC 7822). One way to get around
this limitation when using symmetric keys which produce longer MACs is
to truncate them to 192 bits (32-bit key ID and 160-bit hash).
Modify the code to accept NTPv4 packets with MACs truncated to 192
bits, but still allow long MACs in NTPv4 packets to not break
compatibility with older chrony clients.
This is an incompatible change in the output of the tracking command,
which may break some scripts, but it's necessary to avoid confusion with
IPv4 addresses when synchronised to an IPv6 server or reference clock.
In a burst of three requests (two presend + one normal) the server can
detect the client is using the interleaved mode and save the transmit
timestamp of the second response for the third response. This shortens
the interval in which the server has to keep the state.
Rework the code to make a real request for presend and process the
response, but don't accumulate the sample. This allows presend to work
in the interleaved client mode.
In unauthenticated interleaved symmetric NTP mode we should be now
careful with the reference timestamp as it may be useful with the peer
delay for estimating the local receive timestamp and increasing the
chance of spoofing a valid response from the peer.
When updating the reference time, add a random error of up to one second
to make it less sensitive when disclosed to NTP and cmdmon clients.
We need to transpose HW RX timestamps as HW timestamps are normally
preamble timestamps and RX timestamps in NTP are supposed to be trailer
timestamps. Without raw sockets we don't know the length of the packet
at layer 2, so we make an assumption that UDP data start at the same
position as in the last transmitted packet which had a HW TX timestamp.
If all or most SHM/SOCK samples collected in a polling interval had the
same local timestamp, the dispersion could end up as nan, which could
trigger an assert failure later in the code.
Before accumulating a refclock sample, check if the timestamp is newer
than the previous one.
When the smoothing process is updated with extremely small (e.g.
sub-nanosecond) values, both directions may give a negative length of
the 1st or 3rd stage due to numerical errors and the selection will fail
an in assertion. Rework the code to select the direction which gives a
smaller error.
When the SO_TIMESTAMP socket option was enabled, the expected type of
control messages containing timestamps was SO_TIMESTAMP instead of
SCM_TIMESTAMP. This worked on Linux, where the two values are equal, but
not on the other supported systems. The timestamps were ignored and this
probably worsened the accuracy and stability of the synchronisation.
Don't waste time with processing messages that don't fit in the receive
buffer as they most likely wouldn't pass the format check due to an
invalid length of an extension field.
Always allow update from the first valid response, even if its transmit
timestamp is not newer than the currently saved timestamp. This shoud
provide a temporary protection in the case where the attacker does have
an authenticated packet from future, but the peers are using the same
polling interval and the protocol is already synchronised. This could be
also useful in the case where the attacker cannot observe the traffic
and authentication is disabled.
Extend the random value which is included in the calculation of the
delay from 16 to 32 bits. This makes scheduling of NTP transmissions
random to one microsecond for polling intervals up to 17.
Don't rely on random source port of a connected socket alone as a
protection against spoofed packets in chronyc. Generate a fully random
32-bit sequence number for each request and modify the code to not send
a new request until the timeout expires or a valid response is received.
For a monitoring protocol this should be more than good enough.
This allows sharing of the same directory for sockets, logs and dumps as
the socket directory needs to be created first (with mode 0770) in order
to pass the check of the permissions.
A recently published paper [1] (section VIII) describes a DoS attack
on symmetric associations authenticated with a symmetric key where the
attacker can only observe and replay packets. Although the attacker
cannot prevent packets from reaching the other peer (not even by
flooding the network for example), s/he has the same power as a MitM
attacker.
As the authors explain, this is a fundamental flaw of the protocol,
which cannot be fixed in the general case. However, we can at least try
to protect associations in a case where the peers use the same polling
interval (i.e. for each request is expected one response) and all peers
that share the symmetric key never start with clocks in future or very
distant past (i.e. the attacker does not have any packets from future
that could be replayed).
Require that updates of the NTP state between requests have increasing
transmit timestamp and when a packet that passed all NTP tests to be
considered a valid response was received, don't allow any more updates
of the state from packets that don't pass the tests. This should ensure
the last update of the state is from the first time the last real
response was received and still allow the protocol to recover in case
one of the peers steps its clock back or the attacker does have a packet
from future and the attack stops.
[1] Aanchal Malhotra, Matthew Van Gundy, Mayank Varia, Haydn Kennedy,
Jonathan Gardner, and Sharon Goldberg. The Security of NTP's
Datagram Protocol. https://eprint.iacr.org/2016/1006
Add a new directive to specify interfaces which should be used for HW
timestamping. Extend the Linux ntp_io initialization to enable HW
timestamping, configure the RX filter using the SIOCSHWTSTAMP ioctl,
open their PHC devices, and track them as hwclock instances. When
messages with HW timestamps are received, use the PTP_SYS_OFFSET ioctl
to make PHC samples for hwclock.
Adapt the interleaved symmetric mode for client/server associations.
On server, save the state needed for detection and responding in the
interleaved mode in the client log. On client, enable the interleaved
mode when the server is specified with the xleave option. Always accept
responses in basic mode to allow synchronization with servers that
don't support the interleaved mode, have too many clients, or have
multiple clients behing the same IP address. This is also necessary to
prevent DoS attacks on the client by overwriting or flushing the server
state. Protect the client's state variables against replay attacks as
the timestamps are now needed when processing the subsequent packet.
Add xleave option to the peer directive to enable an interleaved mode
compatible with ntpd. This allows peers to exchange transmit timestamps
captured after the actual transmission and significantly improve
the accuracy of the measurements.
Enable SCM_TIMESTAMPING control messages and the socket's error queue in
order to receive our transmitted packets with a more accurate transmit
timestamp. Add a new file for Linux-specific NTP I/O and implement
processing of these messages there.
Introduce a new structure for local timestamps that will hold the
timestamp with its estimated error and also its source (daemon, kernel
or HW). While at it, reorder parameters of the functions that accept the
timestamps.
Add new functions for processing of packets after they are actually
sent by the kernel or HW in order to get a more accurate transmit
timestamp. Rename old functions for processing of received packets and
their parameters to make the naming more consistent.
If all or most SHM/SOCK samples collected in a polling interval had the
same local timestamp, the dispersion could end up as nan, which could
trigger an assert failure later in the code.
Before accumulating a refclock sample, check if the timestamp is newer
than the previous one.
Use the ipi_addr field instead of ipi_spec_dst as the local address
after recvmsg() to be consistent with the processing of struct
in6_pktinfo. This may make a difference for messages from the error
queue.
When the smoothing process is updated with extremely small (e.g.
sub-nanosecond) values, both directions may give a negative length of
the 1st or 3rd stage due to numerical errors and the selection will fail
an in assertion. Rework the code to select the direction which gives a
smaller error.
When chronyd is starting, after the point where dump files are loaded,
remove all files in the dump directory that match the naming scheme used
for dump files. This prevents loading stale dump files that were not
saved in the latest run of chronyd.
Use empty string instead of "." (which is normally the root directory)
as the default value of dumpdir and logdir to indicate they are not
specified. Print warnings in syslog when trying to log or dump
measurements without dumpdir or logdir.
When the SO_TIMESTAMP socket option was enabled, the expected type of
control messages containing timestamps was SO_TIMESTAMP instead of
SCM_TIMESTAMP. This worked on Linux, where the two values are equal, but
not on the other supported systems. The timestamps were ignored and this
probably worsened the accuracy and stability of the synchronisation.
Include IP address instead of reference ID in the name of dump file
for NTP sources and for reference clocks format the reference ID as a
hexadecimal number instead of quad dotted notation.
Also, avoid dynamic memory allocation and improve warning messages.
Replace struct timeval with struct timespec as the main data type for
timestamps. This will allow the NTP code to work with timestamps in
nanosecond resolution.
Crypto-NAK is useful only with Autokey where it allows quick reset
of the association. There is no plan to support Autokey and NTS will
specify its own message for authentication errors.
Estimate asymmetry of network jitter on the path to the source as a
slope of offset against network delay in multiple linear regression. If
the asymmetry is significant and its sign doesn't change frequently, the
measured offsets (which are used later to estimate the offset and
frequency of the clock) are corrected to correspond to the minimum
network delay. This can significantly improve the accuracy and stability
of the estimated offset and frequency.
When selecting sources from a pool, ignore responses which didn't
produce a new sample. Sources with acceptable delay (as configured by
the maxdelay* options) should be prefered.
When a valid packet is received from an unsynchronised source (i.e. only
a test of leap, stratum or root distance failed), there is no point in
waiting for another packet or the RX timeout, and the client socket can
be immediately closed.
Add support for authenticating MS-SNTP responses in Samba (ntp_signd).
Supported is currently only the old MS-SNTP authenticator field. It's
disabled by default. It can be enabled with the --enable-ntp-signd
configure option and the ntpsigndsocket directive, which specifies the
location of the Samba ntp_signd socket.
When a received packet fails to authenticate, check if the digest
contains zeroes and treat it as an MS-SNTP packet with authenticator or
extended authenticator field. For now, discard these packets, i.e. don't
respond with a crypto-NAK.
Replace the flag that enables authentication using a symmetric key with
an enum. Specify crypto-NAK as a special mode used for responses instead
of relying on zero key ID. Also, rework check_packet_auth() to always
save the mode and key ID.
Add offset option to the server/pool/peer directive. It specifies a
correction which will be applied to offsets measured with the NTP
source. It's particularly useful to compensate for a known asymmetry in
network delay or timestamping errors.
Instead of copying a prepared fd_set to the fd_set used by select(),
fill it from scratch according to the array of file handlers before each
select() call. This should make the code simpler and save some memory
when other events are supported.
Replace SCH_*InputFileHandler() functions with more general
SCH_*FileHandler(), where events are specified as a new parameter and
which will later support other file events, e.g. file ready for ouput
and exception.
The file handlers have two new parameters: file descriptor and event.
- fix word order, articles, consistency, and some typos
- avoid slashes, contractions, `may`, dashes in running text
- use colons before example and code blocks
- add Oxford commas
This allows a server that will become the orphan source to initialize
its time with the initstepslew directive from the current orphan source
or its clients.
The NTP_*_MAC_LENGTH macros didn't include the key ID, which caused the
NTP authentication check to ignore MACs with 512-bit hashes (SHA512,
WHIRLPOOL).
This was broken since update to NTPv4.
If a special reference mode is enabled, always pass the test for
synchronization loop. This allows chronyd using the initstepslew
directive (or the -q/-Q option) to accept time from its own clients
after restart as is documented in the chrony.conf man page.
This was broken since update to NTPv4.
Change the array with refclock instances to store just pointers and
avoid reallocation of the instances. This fixes a bug with the SOCK
refclock, which uses the pointer to the instance in a file handler and
which was invalid when the instance was reallocated (after adding
another refclock).
The bug is from commit d92583ed33.
Don't require the scheduler to be initialized in SCH_QuitProgram().
This fixes a crash when a signal is received between scheduler
finalization and chronyd exit.
Ignore orphan sources that are unreachable (but still have usable stats)
to have a quick and consistent source selection between orphans.
This also fixes the "Unknown local refid in orphan mode" error appearing
when a selected orphan source is removed, as the source is marked as
unreachable and the selection runs with disabled NTP instance before the
source instance is actually removed.
Instead of using a timer for switching the reference to the
unsynchronised state (which activates the local reference), check
if it should be active when returning the reference parameters.
If local reference is active, return normal leap, but unsynchronised
status. Update the callers of the function to work with the leap
directly and not change their behaviour.
REF_IsLocalActive() is no longer needed.
Use REF_GetReferenceParams() in the tracking command to simplify the
code and report the same values as what NTP clients of the server see.
When the local reference mode is active, this changes the leap status to
synchronised and reference time to one second behind current time. When
not synchronised, the root delay and root dispersion are now 1 second.
If the replaced source never had a valid reply (e.g. because it was a
bad replacement), ignore the order of addresses from the resolver to not
get stuck to a pair of addresses if the order doesn't change, or a group
of IPv4/IPv6 addresses if the resolver prefers inaccessible IP family.
When ntpd as an NTP server has active orphan mode, it doesn't update
its reference time and the reference timestamp may fail the NTP test
3 and 7. (https://bugs.ntp.org/show_bug.cgi?id=1098)
Remove both checks of the timestamp to allow chronyd to operate as
a client of ntpd server in the orphan mode. When ntpd is fixed and
old versions are no longer used, this may be reverted.
When the local reference is configured with the orphan option, NTP
sources that have stratum equal to the configured local stratum are
considered to be orphans (i.e. serving local time while not being
synchronised with real time) and are excluded from the normal source
selection. Sources with stratum larger than the local stratum are
considered to be directly on indirectly synchronised to an orphan and
are always ignored.
If no selectable source is available and all orphan sources have
reference IDs larger than the local ID, no source will be selected and
the local reference mode will be activated at some point, i.e. this host
will become an orphan. Otherwise, the orphan source with the smallest
reference ID will be selected. This ensures a group of servers polling
each other (with the same orphan configuration) which have no external
source can settle down to a state where only one server is serving its
local unsychronised time and others are synchronised to it.
Since the update to NTPv4, when the clock is in the synchronised state
and the clock updates stop (e.g. sources become unreachable), it doesn't
switch to the unsynchronised state and the local reference is never
activate. This can be a problem for clients that rely on the server to
always have root distance below some value (e.g. chronyd's maxdistance).
Add a timer that will activate the local reference when the root
distance reaches a specified threshold. It can be configured with the
distance option in the local directive (by default 1.0 second).
Allow the local directive to be specified without the stratum field.
It's an option now, with default value 10. Also, move the parsing code
to cmdparse.c to make it available to the client.
When a valid NTP reply is received, save the local address (e.g. from
IP_PKTINFO), so the reference ID which would the source use for this
host can be calculated when needed.
Add maxdrift directive to set the maximum assumed drift of the clock,
which sets the maximum frequency offset chronyd is allowed to use to
to correct the drift.
Similarly to unreachable sources and falsetickers, try to replace
sources with distance larger than the limit set by the maxdistance
directive with a newly resolved address of the hostname.
Fix conversion of floating point numbers from the cmdmon format with
very small exponents, as for instance could be in the smoothing report
when the smoothing process ends.
This was broken in commit 8e71a46173.
Add a new option (-c) to chronyc to enable printing of reports in a
column-separated values (CSV) format. IP addresses will not be resolved
to hostnames, time will be printed as number of seconds since the epoch
and values in seconds will not be converted to other units.
Add a new printf-like function to allow printing of all fields at once
and rework all commands which print a report to use it. Add functions
for printing of headers and information fields, and formatting of IP
addresses and reference IDs.
Include a random (constant) value in the hash in UTI_IPToHash() to
randomize the order in which NTP sources are stored in the hash table
and polled on start. This change also randomizes the order of clientlog
records.
Split and convert the manual into four AsciiDoc documents, a document
about installation and three documents in the manpage type for
chrony.conf, chronyd and chronyc. The minimal man pages that were
maintained separately from the manual are replaced by full man pages
generated from AsciiDoc. Info files will no longer be provided.
Some parts of the manual are rewritten, updated or trimmed. The
introduction chapter is partially merged with README. The chapter about
typical operating scenarios is included in the chrony.conf man page.
Remove automatic download and compilation of clknetsim. If clknetsim is
not found, skip all simulation tests, but don't fail "make check".
Also, respect the CLKNETSIM_PATH environment variable.
There was an incompatible change in the client access report. To avoid
bumping the protocol version drop support for the original request/reply
types and define new CLIENT_ACCESSES_BY_INDEX2 types as a newer version
of the command.
The clientlog record still uses 16-bit integers to count dropped
packets, but this will avoid an incompatible change in the command
reply if there will be a need to count more than 2^16 drops.
After restricting authentication of servers and peers to the specified
key, a short key in the key file is a security problem from the client's
point of view only if it's specified for a source.
Count total number of NTP and command hits. Count also number of log
records that were replaced when the hash table couldn't be resized due
to the memory limit.
When a server/peer was specified with a key number to enable
authentication with a symmetric key, packets received from the
server/peer were accepted if they were authenticated with any of
the keys contained in the key file and not just the specified key.
This allowed an attacker who knew one key of a client/peer to modify
packets from its servers/peers that were authenticated with other
keys in a man-in-the-middle (MITM) attack. For example, in a network
where each NTP association had a separate key and all hosts had only
keys they needed, a client of a server could not attack other clients
of the server, but it could attack the server and also attack its own
clients (i.e. modify packets from other servers).
To not allow the server/peer to be authenticated with other keys
extend the authentication test to check if the key ID in the received
packet is equal to the configured key number. As a consequence, it's
no longer possible to authenticate two peers to each other with two
different keys, both peers have to be configured to use the same key.
This issue was discovered by Matt Street of Cisco ASIG.
Consider 80 bits as the absolute minimum for a secure symmetric key. If
a loaded key is shorter, send a warning to the system log to encourage
the admin to replace it with a longer key.
Enable the PRV_Name2IPAddress() function with seccomp support and start
the helper process before loading the seccomp filter (but after dropping
root privileges). This will move the getaddrinfo() call outside the
seccomp filter and should make it more reliable as the list of required
system calls won't depend on what glibc NSS modules are used on the
system.
Since commit 8b235297, which changed address hashing, the first packet
is not sent to the first server and doesn't have the extra delay. If the
last packet is sent to the first server, the mean outgoing interval will
be significantly longer than the incoming interval and the check will
fail.
Require that at least one of the sources specified with this option is
selectable (i.e. recently reachable and not a falseticker) before
updating the clock. Together with the trust option this may be useful to
allow a trusted, but not very precise, reference clock or a trusted
authenticated NTP source to be safely combined with unauthenticated NTP
sources in order to improve the accuracy of the clock. They can be
selected and used for synchronization only if they agree with the
trusted and required source.
Assume time from a source that is specified with the trust option is
always true. It can't be rejected as falseticker in the source
selection if sources that are specified without this option don't agree
with it.
Replace thresholds that activated rate limiting with token buckets.
Response rate limiting is now not just active or inactive, the interval
and burst options directly control the response rate.
If the whole process group receives a signal (e.g. CTRL-C in terminal),
the helper process needs to keep running until it gets the QUIT request,
so the system drivers can still use it in their finalisation, e.g. to
cancel remaining slew.
In receive_reponse() don't interpret return codes in helper responses as
a non-zero value may not necessarily mean an error. Just copy errno if
it's not zero and let PRV_* functions deal with the return code.
Prepare a list of required privileged operations first and from that
define the PRIVOPS macros. This will reduce the amount of code that will
be needed when the privileged helper is used on other platforms.
Rename PRV_Initialise() to PRV_StartHelper() and add a new
initialisation function, which just sets the helper fd to -1. Move
the initialision/finalisation calls from the system drivers to main.c.
If privops is not included in the build, define empty macros for the
function names, so their calls don't have to be wrapped in #ifdefs.
With SOCK_DGRAM sockets, the helper doesn't stop as there is no error
received when the socket is closed on the daemon side.
Add a QUIT operation to the protocol which is requested when the daemon
is exiting. It has no response. Register the stopping function with
atexit() to stop the helper even when the daemon is not exiting cleanly,
e.g. due to a fatal error.
Split out the sending part of the function into send_request() and
rename it to submit_request(). This will be useful to send a request
without waiting for a response.
Also, remove the fd parameter from the functions and just use helper_fd
directly.
SOCK_SEQPACKET is preferred over SOCK_DGRAM for communication with the
helper as the process will get an error when the other end of the socket
is closed. It's not supported on all platforms.
If SOCK_SEQPACKET is defined, try creating the pair of sockets with this
type first and if that fails, fall back to SOCK_DGRAM.
When the rtcsync directive is specified in the chronyd config file,
chronyd will update the RTC via settimeofday() every 60 minutes if
the system time is synchronised to NTP.
Instead of printing some large arbitrary values use dash in the LastRx
column of the sources output and the Last/Int columns in the clients
output when no sample or hit is recorded.
Instead of time_t use a 32-bit fixed point representation with 4-bit
fraction to save the time of the last hit. The rate can now be measured
up to 16 packets per second. Maximum interval between hits is about 4
years.
Abort immediately on start if chronyd is compiled on a platform with int
shorter than 32 bits, using other representation than two's complement,
or unexpected conversion of large unsigned integers to signed.
Some libc calls like memcpy() expect the pointer to be valid even when
the size is zero and there is nothing to do. Instead of checking the
size before all such calls, modify ARR_GetElements() to return a pointer
to the array instance itself if data was not allocated yet.
Add new fields from clientlog to the report and print them in chronyc.
Rework the code to skip empty records in the hash table. The reply no
longer has variable length, all client fields are filled even if some
are empty. Reply with RPY_NULL when the facility is disabled.
When the measured NTP or command request rate of a client exceeds
a threshold, reply only to a small fraction of the requests to reduce
the network traffic. Clients are allowed to send a burst of requests.
Try to detect broken clients which increase the request rate when not
getting replies and suppress the rate limiting for them.
Add ratelimit and cmdratelimit directives to configure the thresholds,
bursts and leak rates independently for NTP and command response rate
limiting. Both are disabled by default. Commands from localhost are
never limited.
This simplifies the code and allows older records to be reused when no
more memory can be allocated for new addresses. Each slot of the hash
table has 16 records and there is no chaining between different slots.
Reused records may be newer than records in other slots, but the search
time remains constant.
Don't log NTP peer access and auth/bad command access. Also, change
types for logging number of hits from long to uint32_t. This reduces the
size of the node and allows more clients to be monitored in the same
amount of memory.
The meaning of the poll value in KoD RATE packets is not currently
defined in the NTP specification (RFC 5905). In the reference NTP
implementation it signals the minimum acceptable polling interval to the
clients. In chrony the minimum poll is set to the KoD RATE poll if it's
larger, but not to a larger value than 10.
The problem is that ntpd as a server sets the KoD RATE poll to the
maximum of the client's poll and the configured rate limiting interval.
An attacker can send a burst of spoofed packets to the server to trigger
the client's request rate limit. When the client sends its next request
and the server responds with a KoD RATE packet, the client will set its
minimum poll to the current poll and it will no longer be able to switch
to a shorter poll when needed.
ntpd could be fixed to always set the KoD RATE poll to the rate limiting
interval. Unfortunately, ntpd as a client seems to depend on the current
behavior. It tries to follow the server poll and if the KoD RATE poll
was shorter than the current poll, the polling interval would be
reduced, defeating the purpose of KoD RATE. The server fix will probably
need to wait until clients are fixed and that could take a very long
time.
For now, ignore the poll value in KoD RATE packets. Just add an extra
delay based on the current poll to the next transmit timeout and stop an
ongoing burst.
First packet after setting a source to online was sent with constant
delay (0.2s). If the period in which the source was offline was shorter
than the current polling interval, the new packet was sent sooner than
it would be if the source wasn't switched to offline and back.
Don't reset the local tx timestamp when mode is changed. When starting
the initial transmit timeout, adjust the delay to make the interval
between the two packets at least as long as the current polling
interval.
Use UTI_GetRandomBytes() instead of random() to calculate the random
part of the timeout. This was the only remaining use of random() in the
code and the srandom() call can be removed.
In client packets set the leap, stratum, reference ID, reference time,
root delay and root dispersion to constant values to not reveal the
state of the synchronization. Use precision 32 to make the receive and
transmit timestamps completely random and not reveal the local time.
Use UTI_GetRandomBytes() instead of random() to generate random bits
below precision. Save the result in NTP_int64 in the network order and
allow precision in the full range from -32 to 32. With precision 32
the fuzzing now makes the timestamp completely random and can be used to
hide the time.
Add a function to fill a buffer with random bytes which uses a better
PRNG than random(). Use arc4random() if it's available on the system.
Fall back to reading from /dev/urandom, which should be available on
all currently supported systems.
After sending a client packet, schedule a timeout to close the socket
at the time when all server replies would fail the delay test, so the
socket is not open for longer than necessary (e.g. when the server is
unreachable). With the default maxdelay of 3 seconds the timeout is 7
seconds.
For testA in the client mode require also that the time the server
needed to process the client request is not longer than 4 seconds.
With maximum peer delay this limits the interval in which the client can
accept a server reply.
To avoid problems in the very unlikely case where a timeout is so long
and new IDs are allocated so frequently that they would have a chance
to overflow and catch up with it, make sure before returning new ID that
it's currently not in use.
Timeout ID of zero can be now safely used to indicate that the timer is
not running. Remove the extra timer_running variables that were
necessary to track that.
This is useful on computers that have an RTC, but there is no battery to
keep the time when they are turned off and start with the same time on
each boot.
Instead of calling the handler directly schedule a timeout with zero
delay for resolving to make the function behave similarly to the real
asynchronous resolver. This should prevent problems with code that
inadvertently depends on this behavior and which would break only when
compiled without support for asynchronous resolving.
The system drivers may implement their own slewing which the generic
driver can use to slew faster than the maximum frequency the driver is
allowed to set directly.
Remove driver functions based on adjtime() and switch to the new timex
driver. The kernel allows the timex frequency to be set in the full
range of int32_t, which gives a maximum frequency of 32768 ppm. Round
the limit to 32500 ppm.
- a feature test macro is needed to get msg_control in struct msghdr
- variables must not be named sun to avoid conflict with a macro
- res_init() needs -lresolv
- configure tests for IPv6 and getaddrinfo need -lsocket -lnsl
- pid_t is defined as long and needs to be cast for %d format
Check if the C compiler works to get a useful error message when it
doesn't or it's missing. If the CC environment variable is not set, try
gcc and then cc.
Switch from the SunOS adjtime() based driver to the timex driver.
There is no FreeBSD-specific code, so call SYS_Timex_Initialise()
and SYS_Timex_Finalise() directly from sys.c.
Remove the driver functions based on adjtime() and switch to the new
timex driver, which is based on ntp_adjtime(). This allows chronyd to
control the kernel frequency, adjust the offset with sub-microsecond
accuracy, and set the kernel leap and sync status. A drawback is that
the maximum slew rate is now limited by the 500 ppm maximum frequency
offset, while adjtime() on NetBSD slewed by up to 5000 ppm.
Remove functions that are included in the new timex driver. Keep only
functions that have extended functionality, i.e. read and set the
frequency using the timex tick field and apply step offset with
ADJ_SETOFFSET.
Merge the code from wrap_adjtimex.c that is still needed with
sys_linux.c and remove the file.
This is based on sys_linux.c and wrap_adjtimex.c. It's intended for all
systems that support the adjtimex() or ntp_adjtime() system call. The
driver functions can be replaced with extended system-specific versions
(e.g. to control the frequency with the tick field on Linux).
Include a test program to determine how the adjtime() implementation
behaves. Check the range of supported offset, support for readonly
operation, and slew rate with different update intervals and offsets.
Also, add a test for ntp_adjtime() to check what frequency range it
supports.
Since the NTPv4 update, the detection of synchronization loops based on
the refid prevents a server to initialize its clock from its clients
after restart. Remove that part from the recommended configuration.
Also, mention the time smoothing feature.
The Linux secure computing (seccomp) facility allows a process to
install a filter in the kernel that will allow only specific system
calls to be made. The process is killed when trying to make other system
calls. This is useful to reduce the kernel attack surface and possibly
prevent kernel exploits when the process is compromised.
Use the libseccomp library to add rules and load the filter into the
kernel. Keep a list of system calls that are always allowed after
chronyd is initialized. Restrict arguments that may be passed to the
socket(), setsockopt(), fcntl(), and ioctl() system calls. Arguments
to socketcall(), which is used on some architectures as a multiplexer
instead of separate socket system calls, are not restricted for now.
The mailonchange directive is not allowed as it calls sendmail.
Calls made by the libraries that chronyd is using have to be covered
too. It's difficult to determine which system calls they need as it may
change after an upgrade and it may depend on their configuration (e.g.
resolver in libc). There are also differences between architectures. It
can all break very easily and is therefore disabled by default. It can
be enabled with the new -F option.
This is based on a patch from Andrew Griffiths <agriffit@redhat.com>.
Fix RTC_Linux_TimePreInit() to return 0 when the RTC device can be
opened, but reading its time fails to at least have the time restored
from the driftfile.
When a large spike occurs in offset_sd the drift removal interval can be
set to an excessively long time, although what ever event caused the
perturbation has passed. At the next set_sync_status() we now compare
the expected drift removal interval with that currently in effect. If
they are significantly different, the current timer is cancelled and new
cycle started using the new drift removal interval.
The optimization avoiding unnecessary setting of the kernel leap status
can cause a problem when something outside chronyd sets the status to
the new expected value. There will be no TMX_SetLeap() call which would
update the saved status and the kernel status will be overwritten with
the old (incorrect) value in a later TMX_*() call.
Always call TMX_SetLeap() to save the new value and for the log message
selection just check if a leap second has been applied.
Adds option -P to chronyd on MacOS X which can be used to enable the
thread time constraint scheduling policy. This near real-time scheduling
policy removes a 1usec bias from the 'System time' offset.
Add maxdistance directive to set the maximum root distance the sources
are allowed to have to be selected. This is useful to reject NTPv4
sources that are no longer synchronized and report large dispersion.
The default value is 3 seconds.
On NetBSD programs with write access to /dev/clockctl can adjust or set
the system clock without the root privileges. Add a function to drop the
privileges and check if the process has write access to the device to
get a more descriptive error message when the chrony uid/gid doesn't
match the owner of the device.
Call the CAM, NIO, NCR initialization functions and setup the access
restrictions before root is dropped. This will be needed on NetBSD,
where it's not possible to bind sockets to privileged ports without the
root privileges. Split the creation of the Unix domain command socket
from the CAM initialization to keep the chrony user as the owner of the
socket.
Follow the removal of the server authentication support and remove also
the client support. The -a and -f options are now silently ignored to
not break scripts. The authhash and password commands print a warning,
but they don't return an error.
With the new support for cmdmon over Unix domain sockets, authentication
is no longer necessary to authorize a client running on localhost with
the permissions of the root or chrony user/group. Remove the cmdmon
authentication support to simplify the code and significantly reduce the
attack surface of the protocol.
Only monitoring commands are now allowed remotely. Users that need to
configure chronyd remotely or locally without root/chrony permissions
are advised to use ssh and/or sudo.
Allow all commands received from the Unix domain command socket (which
is accessible only by the root and chrony user/group), even when they
are not authenticated with the command key.
Allow multiple hostnames/addresses separated by comma to be specified
with the -h option. Hostnames are resolved to up to 16 addresses. When
connecting to an address fails or no reply is received, try the next
address in the list.
Set the default value for the -h option to 127.0.0.1,::1.
If the specified hostname starts with /, consider it to be the path of
the chronyd Unix domain command socket. Create the client socket in the
same directory as the server socket (which is not accessible by others)
and change its permission to 0666 to allow chronyd running without root
privileges to send a reply. Remove the socket on exit.
Call connect() on the socket to set the remote address and switch from
sendto()/recvfrom() to send()/recv(). Setting the IP_RECVERR option no
longer seems to be necessary in order to get ECONNREFUSED errors.
Adjust the drift removal interval based on the observed offset_sd.
A newly calculated interval goes into effect after the current drift
removal has completed. When offset_sd is high, the interval is increased
resulting in fewer wakeups to adjust the drift offset. At lower values
of offset_sd the drift removal adjustment interval is pinned to 0.5
seconds. The predicted error applied at the start of an adjustment is
based on the remaining time of the drift removal that is currently in
effect. Default drift removal adjustment interval is 4.0 seconds (was
1.0). If not synchronised set interval to maximum of default interval
and current interval. Clamp elapsed drift removal time to
[0, current_drift_removal_interval] to cover clock stepping.
Call chown() in create_dir() even when the specified uid/gid is zero.
This is needed on BSD systems, where directories are created with gid
of the parent directory.
In drivers with periodic drift removal include in the adjustment also a
prediction of the error gained in half of the interval to move the mean
offset of the clock closer to zero. E.g. offset of a stable clock
drifting by 10 ppm should now be closer to 0 +/- 5 microseconds instead
of 5 +/- 5 microseconds.
Try to create the directory where will be the Unix domain command socket
bound to allow starting with empty /var/run. Check the permissions and
owner/group in case the directory already existed. It MUST NOT be
accessible by others as permissions on Unix domain sockets are ignored
on some systems (e.g. Solaris).
Create logdir and dumpdir before dropping root. Set their uid/gid to the
user chronyd will switch to. This allows chronyd to create the
directories in a directory where the user won't have write permissions
(e.g. /var/lib).
This prevents error messages when running chronyd -d/-q/-Q with default
logdir in a directory chronyd is not allowed do access after dropping
the root privileges.
The darwin kernel implementation of adjtime() does not require the
adjustment to be aligned to a tickadj boundary, and we can apply
adjustments to the nearest microsecond. Rounding is accounted for by
adding any rounding errors back into the offset.
In addition to the IPv4/IPv6 command sockets, create also a Unix domain
socket to process cmdmon requests. For now, there is no difference for
authorized commands, packets from all sockets need to be authenticated.
The default path of the socket is /var/run/chrony/chronyd.sock. It can
be configured with the bindcmdaddress directive with an address starting
with /.
Add a signal handler and rework the code to go through close_io() even
when terminated by a signal. This will allow chronyc to remove Unix
domain sockets on exit.
When a leap second is applied by the kernel, it doesn't actually clear
the STA_INS|STA_DEL bits from the status word, but the state returned
by ntp_adjtime()/adjtimex() is TIME_WAIT until the application clears
the bits.
Add "System clock status reset after leap second" log message for this
case.
Don't install chrony.txt in make install to avoid dependency on makeinfo
since chrony.texi is prepared by configure to set the default paths in
the documentation.
The kernel requires in the ADJ_SETOFFSET | ADJ_NANO mode that the
timex.time.tv_usec value is smaller than 10^9 nanosecond, which wasn't
the case with a negative integer offset (e.g. inserted leap second).
Set refid in server/broadcast packets to 127.127.1.255 when a time
smoothing offset is applied to the timestamps. This allows the clients
and administrators to detect that the server is not serving its best
estimate of the true time.
Abort when the system time gets so close to the end of 32-bit time_t
that timeouts added by delay start to overflow. This is an addition to
the loop detector in dispatch_timeouts().
It's not expected we will work with such large arrays anytime soon, but
better be safe than sorry.
Also, limit the number of elements to 2^31-1 to prevent infinite loop in
the calculation of allocated elements.
When reducing the list of selectable sources to sources with the prefer
option, sources before the first preferred source were left with the
SRC_OK status, which triggered an assertion failure in the next
selection.
The leaponly option can be used to enable a mode where only leap seconds
are smoothed out and normal offset/frequency changes are ignored. This
is useful to make the interval in which a leap second is smoothed out
constant and allow an NTP client to use multiple leap smearing servers
safely.
Sources that are not specified as a pool and have a name (i.e. not
specified by an IP address or added from chronyc) will be replaced with
a newly resolved address of the name when they become unreachable or
falseticker too.
In UTI_IsTimeOffsetSane() consider time in one year interval before
32-bit time_t overflow (in 2038) as invalid. Hopefully everything will
be using 64-bit time_t when that time comes.
Different systems may consider different time values to be valid.
Don't exit on settimeofday()/adjtimex() error in case the check in
UTI_IsTimeOffsetSane() isn't restrictive enough.
When allocating memory to save unacknowledged replies to authenticated
command requests, the last "next" pointer was not initialized to NULL.
When all allocated reply slots were used, the next reply could be
written to an invalid memory instead of allocating a new slot for it.
An attacker that has the command key and is allowed to access cmdmon
(only localhost is allowed by default) could exploit this to crash
chronyd or possibly execute arbitrary code with the privileges of the
chronyd process.
When NTP or cmdmon access was configured (from chrony.conf or via
authenticated cmdmon) with a subnet size that is indivisible by 4 and
an address that has nonzero bits in the 4-bit subnet remainder (e.g.
192.168.15.0/22 or f000::/3), the new setting was written to an
incorrect location, possibly outside the allocated array.
An attacker that has the command key and is allowed to access cmdmon
(only localhost is allowed by default) could exploit this to crash
chronyd or possibly execute arbitrary code with the privileges of the
chronyd process.
Time smoothing determines an offset that needs to be applied to the
cooked time to make it smooth for external observers. Observed offset
and frequency change slowly and there are no discontinuities. This can
be used on an NTP server to make it easier for the clients to track the
time and keep their clocks close together even when large offset or
frequency corrections are applied to the server's clock (e.g. after
being offline for longer time).
Accumulated offset and frequency are smoothed out in three stages. In
the first stage, the frequency is changed at a constant rate (wander) up
to a maximum, in the second stage the frequency stays at the maximum for
as long as needed and in the third stage the frequency is brought back
to zero.
Time smoothing is configured by the smoothtime directive. It takes two
arguments, maximum frequency offset and maximum wander. It's disabled by
default.
An attacker knowing that NTP hosts A and B are peering with each other
(symmetric association) can send a packet with random timestamps to host
A with source address of B which will set the NTP state variables on A
to the values sent by the attacker. Host A will then send on its next
poll to B a packet with originate timestamp that doesn't match the
transmit timestamp of B and the packet will be dropped. If the attacker
does this periodically for both hosts, they won't be able to synchronize
to each other. It is a denial-of-service attack.
According to [1], NTP authentication is supposed to protect symmetric
associations against this attack, but in the NTPv3 (RFC 1305) and NTPv4
(RFC 5905) specifications the state variables are updated before the
authentication check is performed, which means the association is
vulnerable to the attack even when authentication is enabled.
To fix this problem in chrony, save the originate and local timestamps
only when the authentication check (test5) passed.
[1] https://www.eecis.udel.edu/~mills/onwire.html
During inserted leap second the time is invalid, reply with
unsynchronized status to avoid confusing clients that are not smart
enough to ignore measurements close to leap second.
In addition to the system driver handling add new modes to slew or step
the system clock for leap second, or ignore it completely. This can be
configured with leapsecmode directive.
With the new pool directive chronyd is now able to replace unreachable
NTP servers with newly resolved addresses automatically. Starting
without DNS wasn't a problem since 1.25.
Server sockets are now explicitly opened and closed for normal NTP
server, NTP broadcast and NTP peering. This will allow closing the
NTP port when not needed.
This will be used to set the kernel adjtimex() variables to allow other
applications running on the system to know if the system clock is
synchronized and the estimated error and the maximum error.
The second form configures the automatic stepping, similarly to the
makestep directive. It has two parameters, stepping threshold (in
seconds) and number of future clock updates for which will be the
threshold active. This can be used with the burst command to quickly
make a new measurement and correct the clock by stepping if needed,
without waiting for chronyd to complete the measurement and update the
clock.
The minsamples and maxsamples directives now set the default value,
which can be overriden for individual sources in the server/peer/pool
and refclock directives.
Add new functions to change source's reference ID/address and reset the
instance. Use that instead of destroying and creating a new instance
when the NTP address is changed.
A new option can be now used in the pool directive: maxsources sets the
maximum number of sources that can be used from the pool, the default
value is 4.
On start, when the pool name is resolved, chronyd will add up to 16
sources, one for each resolved address. When the number of sources from
which at least one valid reply was received reaches maxsources, the
other sources will be removed.
In addition to the quadratic function, allow configuration of the
compensation with a file containing list of (temperature, compensation)
points used for linear interpolation and extrapolation.
The new NTPv4 loop synchronization check fails as clknetsim doesn't
support IP_PKTINFO yet. Use the noselect option to prevent the server to
synchronize to the client.
When using server socket to send client requests (acquisitionport 123)
and currently not waiting for a reply, the socket check will fail for
client requests from the source.
The check needs to be moved to correctly handle the requests as from an
unknown source.
The pool directive can be used to configure chronyd for a pool of NTP
servers (e.g. pool.ntp.org). The name is expected to resolve to multiple
addresses which change over time.
On start, a source will be added for each resolved address. When a
source from the pool is unreachable or marked as falseticker, chronyd
will try to replace the source with a newly resolved address of the
pool.
The minimum interval between replacements is currently set to 244
seconds to avoid frequent DNS requests.
When adding a server from configuration file, don't give up when the
first returned address was already added for another server directive,
but try adding other addresses until one succeeds.
With the recent change allowing unreachable sources to remain selected,
offline sources will now be selectable only for some time, similarly to
online unreachable sources.
Reachability is no longer a requirement for selection. An unreachable
source can remain selected if its newest sample is not older than the
oldest sample from all reachable sources.
This is useful to prevent reselection when an accurate source uses a
very short polling interval (e.g. refclock) and is occasionally
unreachable for short periods of time.
Add new source states and rename some states so there is one state for
each reason a source can be rejected in the source selection.
This fixes reported status when sources are selectable, but the actual
selection was postponed until next update. It will also allow more
detailed reports when the cmdmon protocol is updated.
Following RFC 5905, don't call REF_SetUnsynchronised() when there are no
reachable or selectable sources. It's up to the client to consider the
source unsynchronized when the root distance exceeds a threshold.
The unsynchronized status is still set when no majority is reached.
The next pointer in the last allocated reply slot was not set. This
could cause a crash when more slots were needed. (the slots are used to
save unacknowledged replies to authenticated commands)
This should reduce the number of possible memory leaks reported by
valgrind. The remaining reported leaks are sched tqe allocation, async
DNS instance allocation, cmdmon response/timestamp cell allocation, and
clientlog subnet allocation.
DEBUG_LOG("Packet added to signd queue (%u:%u)",queue_head,queue_tail);
return1;
}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.