This should allow us to eventually properly saturate high-bandwidth
network links when using TCP, once other suboptimal parts of our
network stack are improved.
Instead of lying and claiming we always have space left in our receive
buffer, actually report the available space.
While this doesn't really affect network-bound workloads, it makes a
world of difference in CPU/disk-bound ones, like git clones, resulting
in a considerable speed-up, and in some cases making them work at all
(instead of the sender side hanging up the connection due to timeouts).
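As a rough sketch of the idea (the member names here are illustrative,
not the actual TCPSocket interface), the advertised window is now derived
from the buffer's real remaining capacity rather than a constant:

    // Illustrative sketch only: report real remaining capacity instead of
    // pretending the receive buffer is always empty.
    size_t space_available() const
    {
        return m_receive_buffer_capacity - m_receive_buffer_size;
    }

    u16 advertised_window_size() const
    {
        // The TCP window field is 16 bits wide, so clamp to that.
        return static_cast<u16>(min(space_available(), static_cast<size_t>(65535)));
    }
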
Currently, ephemeral port allocation is handled by the
allocate_local_port_if_needed() and protocol_allocate_local_port()
methods. Actually binding the socket to an address (which means
inserting the socket/address pair into a global map) is performed either
in protocol_allocate_local_port() (for ephemeral ports) or in
protocol_listen() (for non-ephemeral ports); the latter will fail with
EADDRINUSE if the address is already used by an existing pair present in
the map.
There used to be a bug where for listen() without an explicit bind(),
the port allocation would conflict with itself: first an ephemeral port
would get allocated and inserted into the map, and then
protocol_listen() would check again for the port being free, find the
just-created map entry, and error out. This was fixed in commit
01e5af487f by passing an additional flag
did_allocate_port into protocol_listen() which specifies whether the
port was just allocated, and skipping the check in protocol_listen() if
the flag is set.
However, this only helps if the socket is bound to an ephemeral port
inside of this very listen() call. But calling bind(sin_port = 0) from
userspace should succeed and bind to an allocated ephemeral port, in the
same way as using an unbound socket for connect() does. The port number
can then be retrieved from userspace by calling getsockname(), and it
should be possible to either connect() or listen() on this socket,
keeping the allocated port number. Also, calling bind() when already
bound (either explicitly or implicitly) should always result in EINVAL.
To untangle this, introduce an explicit m_bound state in IPv4Socket,
just like LocalSocket already has. Once a socket is bound, further
attempts to bind it fail. Some operations cause the socket to implicitly
get bound to an (ephemeral) address; this is implemented by the new
ensure_bound() method. The protocol_allocate_local_port() method is
gone; it is now up to a protocol to assign a port to the socket inside
protocol_bind() if it finds that the socket has local_port() == 0.
protocol_bind() is now called in more cases, such as inside listen() if
the socket wasn't bound before that.
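A self-contained toy model of the resulting bound-state logic might look
as follows (ExampleSocket and its helpers are made up for illustration;
the real IPv4Socket also registers the socket/address pair in the global
map inside protocol_bind()):

    #include <AK/Error.h>
    #include <AK/Try.h>
    #include <AK/Types.h>
    #include <errno.h>

    class ExampleSocket {
    public:
        ErrorOr<void> bind(u16 requested_port)
        {
            if (m_bound)
                return Error::from_errno(EINVAL); // already bound, explicitly or implicitly
            TRY(protocol_bind(requested_port));
            m_bound = true;
            return {};
        }

        // Called from listen()/connect()/sendto() paths that need a local address.
        ErrorOr<void> ensure_bound()
        {
            if (m_bound)
                return {};
            return bind(0); // port 0 => protocol_bind() picks an ephemeral port
        }

        u16 local_port() const { return m_local_port; }

    private:
        ErrorOr<void> protocol_bind(u16 requested_port)
        {
            // The real implementation inserts the socket/address pair into a
            // global map here and fails with EADDRINUSE on a conflict.
            m_local_port = requested_port != 0 ? requested_port : 49152 /* placeholder ephemeral port */;
            return {};
        }

        bool m_bound { false };
        u16 m_local_port { 0 };
    };
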
This moves the KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes to the Kernel/Library
subdirectory.
Also, move the panic and assertions handling code to that directory.
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.
This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
This was mostly straightforward, as all the storage locations are
guarded by some related mutex.
The use of old-school associated mutexes instead of MutexProtected
is unfortunate, but the process to modernize such code is ongoing.
Using policy-based design, `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.
This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
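A minimal illustration of the idea (class, policy and storage choices
here are made up; the real AK implementation differs in detail but has
the same shape):

    #include <cstddef>
    #include <forward_list>
    #include <utility>

    // Default policy: no bookkeeping; size() walks the list every time.
    struct IterateToComputeSizePolicy {
        void on_insert() { }
        void on_remove() { }
        template<typename List>
        size_t size(List const& list) const
        {
            size_t count = 0;
            for (auto it = list.begin(); it != list.end(); ++it)
                ++count;
            return count;
        }
    };

    // "WithCount" policy: pay for a stored counter, get O(1) size().
    struct StoreCountPolicy {
        void on_insert() { ++m_count; }
        void on_remove() { --m_count; }
        template<typename List>
        size_t size(List const&) const { return m_count; }

    private:
        size_t m_count { 0 };
    };

    template<typename T, typename SizePolicy = IterateToComputeSizePolicy>
    class ExampleSinglyLinkedList {
    public:
        void prepend(T value)
        {
            m_storage.push_front(std::move(value));
            m_policy.on_insert();
        }
        size_t size() const { return m_policy.size(m_storage); }

    private:
        std::forward_list<T> m_storage;
        SizePolicy m_policy;
    };

    // The counted variant becomes an alias with a different policy:
    template<typename T>
    using ExampleSinglyLinkedListWithCount = ExampleSinglyLinkedList<T, StoreCountPolicy>;
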
Instead of temporarily changing the open file description's "blocking"
flag while doing a non-waiting recvfrom, we instead plumb the currently
wanted blocking behavior all the way through to the underlying socket.
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
This argument is always set to description.is_blocking(), but
description is also given as a separate argument, so there's no point
in piping it through separately.
Before this commit, we only checked the receive buffer on the socket,
which is unused for datagram sockets. Now we return the actual size of
the datagram without the protocol headers, which required the protocol
to tell us what the size of the payload is.
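Assuming this is the byte-count query exposed as FIONREAD, the effect
from userspace would look like this (standard ioctl usage):

    #include <sys/ioctl.h>

    // Returns the payload size of the next pending datagram, excluding
    // protocol headers, or -1 on error.
    int pending_datagram_bytes(int udp_fd)
    {
        int available = 0;
        if (ioctl(udp_fd, FIONREAD, &available) < 0)
            return -1;
        return available;
    }
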
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
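For reference, the shared pattern looks like this (a contrived userspace
example; parse_digit() and sum_two_digits() are made up):

    #include <AK/Error.h>
    #include <AK/StringView.h>
    #include <AK/Try.h>

    static ErrorOr<int> parse_digit(char c)
    {
        if (c < '0' || c > '9')
            return Error::from_string_literal("not a digit"sv);
        return c - '0';
    }

    ErrorOr<int> sum_two_digits(char a, char b)
    {
        // TRY() unwraps the value on success and propagates the Error
        // otherwise, in both kernel and userspace code.
        return TRY(parse_digit(a)) + TRY(parse_digit(b));
    }
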
Found due to smelly code in InodeFile::absolute_path.
In particular, this replaces the following misleading methods:
- File::absolute_path
  This method *never* returns an actual path, and if called on an
  InodeFile (which is impossible), it would VERIFY_NOT_REACHED().
- OpenFileDescription::try_serialize_absolute_path
- OpenFileDescription::absolute_path
  These methods are not guaranteed to return an actual path (just like
  the other method), and just like Custody::absolute_path they do not
  guarantee accuracy. In particular, just renaming the method made a
  TOCTOU bug obvious.
The new method signatures use KResultOr, just like
try_serialize_absolute_path() already did.
Previously there was a mix of returning plain strings and returning
explicit string views using `operator ""sv`. This change standardizes
them all on `operator ""sv`, as it avoids a call to strlen.
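For example (illustrative function, not from the patch):

    #include <AK/StringView.h>

    // "sv" literals carry their length at compile time, so constructing
    // the StringView involves no strlen() call at runtime.
    StringView permissions_string(bool can_write)
    {
        return can_write ? "read-write"sv : "read-only"sv;
    }
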
The TimeWait state is intended to prevent another socket from taking the
address tuple in case any packets are still in transit after the final
close. Since this state never delivers packets to userspace, it doesn't
make sense to keep the receive buffer around.
This was a weird KBuffer API that assumed failure was impossible.
This patch converts it to a modern KResultOr<NonnullOwnPtr<KBuffer>> API
and updates the two clients to the new style.
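The "new style" at a call site looks roughly like this (method and
variable names are illustrative, not copied from the patch):

    // A fallible factory returning KResultOr<NonnullOwnPtr<KBuffer>>:
    auto buffer_or_error = KBuffer::try_create_with_bytes(data);
    if (buffer_or_error.is_error())
        return buffer_or_error.error();
    auto buffer = buffer_or_error.release_value();
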
The IPv4Socket requires a DoubleBuffer to store any data it receives
on the socket. However, it was previously using the default constructor,
which cannot observe allocation failure. Address this by plumbing the
receive buffer through the various derived classes.
It's easy to forget the responsibility of validating and safely copying
kernel parameters in code that is far away from syscalls. ioctls are
one such example, and bugs there are just as dangerous as at the root
syscall level.
To avoid this case, utilize the AK::Userspace<T> template in the ioctl
kernel interface so that implementors have no choice but to properly
validate and copy ioctl pointer arguments.
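A sketch of what an implementation is now forced to look like
(ExampleDevice, EXAMPLE_GET_SIZE and current_size() are made up; the
point is the Userspace<void*> argument plus an explicit, fallible copy):

    ErrorOr<void> ExampleDevice::ioctl(OpenFileDescription&, unsigned request, Userspace<void*> arg)
    {
        switch (request) {
        case EXAMPLE_GET_SIZE: { // EXAMPLE_GET_SIZE: a made-up request number
            size_t size = current_size();
            // copy_to_user() validates the user pointer and can fail with
            // EFAULT; there is no way to dereference `arg` directly.
            return copy_to_user(static_ptr_cast<size_t*>(arg), &size);
        }
        default:
            return EINVAL;
        }
    }
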
This increases the buffer size for connection-oriented sockets
to 256kB. In combination with the other patches in this series
I was able to receive TCP packets at a rate of about 120Mbps.
An IP socket can now join a multicast group by using the
IP_ADD_MEMBERSHIP sockopt, which will cause it to start receiving
packets sent to the multicast address, even though this address does
not belong to this host.
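From userspace this uses the standard BSD sockets interface, e.g.:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    // Join the mDNS multicast group (224.0.0.251) on an existing UDP socket.
    int join_mdns_group(int udp_fd)
    {
        struct ip_mreq mreq {};
        mreq.imr_multiaddr.s_addr = inet_addr("224.0.0.251");
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        return setsockopt(udp_fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
    }
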
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
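The resulting file headers look like this (the copyright line below is a
placeholder):

    /*
     * Copyright (c) YYYY, Contributor Name <contributor@example.com>
     *
     * SPDX-License-Identifier: BSD-2-Clause
     */
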
When ProcFS could no longer allocate KBuffer objects to serve calls to
read, it would just return 0, indicating EOF. This then triggered
parsing errors because calling code assumed it had read the file.
Because read isn't supposed to return ENOMEM, change ProcFS to populate
the file data upon file open or seek to the beginning. This also means
that calls to open can now return ENOMEM if needed. This allows the
caller to either be able to successfully open the file and read it, or
fail to open it in the first place.
The overrides of this function don't need to know how the original
packet was stored, so let's just give them a ReadonlyBytes view of
the raw packet data.
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead, evaluate them as
the states change, and unblock threads at that point.
This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported, and
wait will no longer return EINTR when SIGCHLD is delivered at the
same time.
Since the receiving socket isn't yet known at packet receive time,
keep timestamps for all packets.
This is useful for keeping statistics about in-kernel queue latencies
in the future, and it can be used to implement SO_TIMESTAMP.
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and for pages
that cannot be allocated due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data, add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least, we need to keep syscall params trivial, as we need
to copy them from/to user mode using copy_from/to_user.
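A rough sketch of the resulting copy_from_user() (simplified, not the
actual kernel code; is_user_range() and safe_memcpy() are the kernel
helpers the commit refers to):

    [[nodiscard]] bool copy_from_user(void* dest, void const* user_src, size_t n)
    {
        // The only manual check: the source range must not point into
        // kernel mappings.
        if (!is_user_range(VirtualAddress(user_src), n))
            return false;
        void* fault_at = nullptr;
        // safe_memcpy() returns false if an unrecoverable page fault
        // occurred while copying, instead of us pre-validating every page.
        return safe_memcpy(dest, user_src, n, fault_at);
    }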