Commit Graph

2556 Commits

Author SHA1 Message Date
Andreas Kling
983b4bd9f2 Kernel+ProfileViewer: Move symbolication to userspace for time profiles
This makes the time profiles look like the memory profiles so we can
use the userspace symbolication code in ProfileViewer.
2020-02-22 10:09:54 +01:00
Andreas Kling
94652fd2fb Kernel: Fully validate pointers when walking stack during profiling
It's not enough to just check that things wouldn't page fault, we also
need to verify that addresses are accessible to the profiled thread.
2020-02-22 10:09:54 +01:00
Andreas Kling
f020081a38 Kernel: Put "Couldn't find user region" spam behind MM_DEBUG
This basically never tells us anything actionable anyway, and it's a
real annoyance when doing something validation-heavy like profiling.
2020-02-22 10:09:54 +01:00
Andreas Kling
b6887bd9cd Ext2FS: The max current block count of a file is size/block_size
Turns out that i_blocks does not take block list holes into account.
2020-02-21 19:07:23 +01:00
Andreas Kling
b298c01e92 Kernel: Log instead of crashing when getting a page fault during IRQ
This is definitely a bug, but it seems to happen randomly every now
and then and we need more info to track it down, so let's log for now.
2020-02-21 19:05:45 +01:00
Andreas Kling
59c052a72a Ext2FS: Allow holes in block lists
Linux creates holes in block lists for all-zero content. This is very
reasonable and we can now handle that situation as well.

Note that we're not smart enough to generate these holes ourselves yet,
but now we can at least read from such files.
2020-02-21 17:50:51 +01:00
Andreas Kling
04e40da188 Kernel: Fix crash when reading /proc/PID/vmobjects
InodeVMObjects can have nulled-out physical page slots. That just means
we haven't cached that page from disk right now.
2020-02-21 16:03:56 +01:00
Andreas Kling
59b9e49bcd Kernel: Don't trigger page faults during profiling stack walk
The kernel sampling profiler will walk thread stacks during the timer
tick handler. Since it's not safe to trigger page faults during IRQ's,
we now avoid this by checking the page tables manually before accessing
each stack location.
2020-02-21 15:49:39 +01:00
Andreas Kling
f9a138aa4b Kernel: Commit the profiling sample buffer memory up front
This avoids getting page faults while storing samples in the timer IRQ.
2020-02-21 15:49:37 +01:00
Andreas Kling
8047ff8205 Kernel: Expose the underlying Region of a KBuffer 2020-02-21 15:49:35 +01:00
Andreas Kling
d46071c08f Kernel: Assert on page fault during IRQ
We're not equipped to deal with page faults during an IRQ handler,
so add an assertion so we can immediately tell what's wrong.

This is why profiling sometimes hangs the system -- walking the stack
of the profiled thread causes a page fault and things fall apart.
2020-02-21 15:49:34 +01:00
Andreas Kling
2a679f228e Kernel: Fix bitrotted DEBUG_IO logging 2020-02-21 15:49:30 +01:00
Sergey Bugaev
1d2986ea15 Kernel: Fix a panic in VFS::rename()
If we get an -ENOENT when resolving the target because of some part, that is not
the very last part, missing, we should just return the error instead of panicking
later :^)

To test:
    $ mkdir /tmp/foo/
    $ mv /tmp/foo/ /tmp/bar/

Related to https://github.com/SerenityOS/serenity/issues/1253
2020-02-20 19:13:20 +01:00
Sergey Bugaev
3439498744 Kernel: Support trailing slashes in VFS::mkdir()
This is apparently a special case unlike any other, so let's handle it
directly in VFS::mkdir() instead of adding an alternative code path into
VFS::resolve_path().

Fixes https://github.com/SerenityOS/serenity/issues/1253
2020-02-20 19:13:20 +01:00
Andreas Kling
7592f9afd5 AK: Use size_t for CircularQueue and CircularDeque 2020-02-20 13:20:34 +01:00
Andreas Kling
88b9fcb976 AK: Use size_t for ByteBuffer sizes
This matches what we already do for string types.
2020-02-20 13:20:34 +01:00
Andreas Kling
0ba458cfa0 Kernel+LibC: Add SO_REUSEADDR macro
Note that this is not actually implemented, I'm just defining it.
2020-02-20 06:57:01 +01:00
Andreas Kling
a87544fe8b Kernel: Refuse to allocate 0 bytes of virtual address space 2020-02-19 22:19:55 +01:00
Andreas Kling
f17c377a0c Kernel: Use bitfields in Region
This makes Region 4 bytes smaller and we can use bitfield initializers
since they are allowed in C++20. :^)
2020-02-19 12:03:11 +01:00
Andreas Kling
a31ca1282e Base: Rename /dev/psaux to /dev/mouse
Since this device doesn't actually hand out raw PS/2 aux packets,
let's just call it "mouse" instead. :^)
2020-02-18 14:30:39 +01:00
Andreas Kling
bead20c40f Kernel: Remove SmapDisabler in sys$create_shared_buffer() 2020-02-18 14:12:39 +01:00
Andreas Kling
9aa234cc47 Kernel: Reset FPU state on exec() 2020-02-18 13:44:27 +01:00
Jesse Buhagiar
35ba4bf005 TTY: Reset VGA start row when setting graphical TTY
This was causing the screen (on a real machine) to be split in half.
2020-02-18 12:55:31 +01:00
Andreas Kling
a7dbb3cf96 Kernel: Use a FixedArray for a process's extra GIDs
There's not really enough of these to justify using a HashTable.
2020-02-18 11:35:47 +01:00
Andreas Kling
4b16ac0034 Kernel: Purging a page should point it back to the shared zero page
Anonymous VM objects should never have null entries in their physical
page list. Instead, "empty" or untouched pages should refer to the
shared zero page.

Fixes #1237.
2020-02-18 09:56:11 +01:00
Andreas Kling
737e455cbc SystemMenu: Add a separate program to host the system menu
This will allow us to run the system menu as any user. It will also
enable further lockdown of the WindowServer process since it should no
longer need to pledge proc and exec. :^)

Note that this program is not finished yet.

Work towards #1231.
2020-02-17 16:50:48 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
4f4af24b9d Kernel: Tear down process address space during finalization
Process teardown is divided into two main stages: finalize and reap.

Finalization happens in the "Finalizer" kernel and runs with interrupts
enabled, allowing destructors to take locks, etc.

Reaping happens either in sys$waitid() or in the scheduler for orphans.

The more work we can do in finalization, the better, since it's fully
pre-emptible and reduces the amount of time the system runs without
interrupts enabled.
2020-02-17 14:33:06 +01:00
Andreas Kling
0e33f53cf8 Kernel: Allow multiple inspectors of a process (in /proc)
Replace Process::m_being_inspected with an inspector reference count.
This prevents an assertion from firing when inspecting the same process
in /proc from multiple processes at the same time.

It was trivially reproducible by opening multiple FileManagers.
2020-02-17 13:29:49 +01:00
Andreas Kling
9f54ea9bcd NotificationServer: Add a system service for desktop notifications
This patch adds NotificationServer, which runs as the "notify" user
and provides an IPC API for desktop notifications.

LibGUI gains the GUI::Notification class for showing notifications.

NotificationServer is spawned on demand and will unspawn after
dimissing all visible notifications. :^)

Finally, this also comes with a small /bin/notify utility.
2020-02-16 21:58:17 +01:00
Andreas Kling
9794e18a20 Base: Run WindowServer as a separate "window" user
This was actually rather painless and straightforward. WindowServer now
runs as the "window" user. Users in the "window" group can connect to
it via the socket in /tmp/portal/window as usual.
2020-02-16 21:58:17 +01:00
Andreas Kling
31e1af732f Kernel+LibC: Allow sys$mmap() callers to specify address alignment
This is exposed via the non-standard serenity_mmap() call in userspace.
2020-02-16 12:55:56 +01:00
Andreas Kling
7a8be7f777 Kernel: Remove SmapDisabler in sys$accept() 2020-02-16 08:20:54 +01:00
Andreas Kling
7717084ac7 Kernel: Remove SmapDisabler in sys$clock_gettime() 2020-02-16 08:13:11 +01:00
Andreas Kling
7533d61458 Kernel: Fix weird whitespace mistake in RangeAllocator 2020-02-16 08:01:33 +01:00
Andreas Kling
e90765e957 Kernel: Remove Process inheriting from Weakable
This mechanism wasn't actually used to create any WeakPtr<Process>.
Such pointers would be pretty hard to work with anyway, due to the
multi-step destruction ritual of Process.
2020-02-16 02:16:22 +01:00
Andreas Kling
635ae70b8f Kernel: More header dependency reduction work 2020-02-16 02:15:33 +01:00
Andreas Kling
16818322c5 Kernel: Reduce header dependencies of Process and Thread 2020-02-16 02:01:42 +01:00
Andreas Kling
e28809a996 Kernel: Add forward declaration header 2020-02-16 01:50:32 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
1f55079488 Kernel: Remove SmapDisabler in sys$getgroups() 2020-02-16 00:30:00 +01:00
Andreas Kling
eb7b0c76a8 Kernel: Remove SmapDisabler in sys$setgroups() 2020-02-16 00:27:10 +01:00
Andreas Kling
0341ddc5eb Kernel: Rename RegisterDump => RegisterState 2020-02-16 00:15:37 +01:00
Andreas Kling
5507945306 Kernel: Widen PhysicalPage refcount to 32 bits
A 16-bit refcount is just begging for trouble right nowl.
A 32-bit refcount will be begging for trouble later down the line,
so we'll have to revisit this eventually. :^)
2020-02-15 22:34:48 +01:00
Andreas Kling
c624d3875e Kernel: Use a shared physical page for zero-filled pages until written
This patch adds a globally shared zero-filled PhysicalPage that will
be mapped into every slot of every zero-filled AnonymousVMObject until
that page is written to, achieving CoW-like zero-filled pages.

Initial testing show that this doesn't actually achieve any sharing yet
but it seems like a good design regardless, since it may reduce the
number of page faults taken by programs.

If you look at the refcount of MM.shared_zero_page() it will have quite
a high refcount, but that's just because everything maps it everywhere.
If you want to see the "real" refcount, you can build with the
MAP_SHARED_ZERO_PAGE_LAZILY flag, and we'll defer mapping of the shared
zero page until the first NP read fault.

I've left this behavior behind a flag for future testing of this code.
2020-02-15 13:17:40 +01:00
Andreas Kling
1828d9eadd Kernel: Remove some commented-out code in Scheduler::yield() 2020-02-10 20:16:50 +01:00
Andreas Kling
7cf33a8ccb Kernel: Remove outdated FIXME from Scheduler 2020-02-10 20:15:53 +01:00
Andreas Kling
27f0102bbe Kernel: Add getter and setter for the X86 CR3 register
This gets rid of a bunch of inline assembly.
2020-02-10 20:00:32 +01:00
Andreas Kling
580a94bc44 Kernel+LibC: Merge sys$stat() and sys$lstat()
There is now only one sys$stat() instead of two separate syscalls.
2020-02-10 19:49:49 +01:00
Andreas Kling
ccfee3e573 Kernel: Remove more <LibBareMetal/Output/kstdio.h> includes 2020-02-10 12:07:48 +01:00
Andreas Kling
6cbd72f54f AK: Remove bitrotted Traits::dump() mechanism
This was only used by HashTable::dump() which I used when doing the
first HashTable implementation. Removing this allows us to also remove
most includes of <AK/kstdio.h>.
2020-02-10 11:55:34 +01:00
Shannon Booth
fe668db999 Meta: Fix shellcheck warnings in various scripts
Warnings fixed:
 * SC2086: Double quote to prevent globbing and word splitting.
 * SC2006: Use $(...) notation instead of legacy backticked `...`
 * SC2039: In POSIX sh, echo flags are undefined
 * SC2209: Use var=$(command) to assign output (or quote to assign string)
 * SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails
 * SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
 * SC2034: i appears unused. Verify use (or export if used externally)
 * SC2046: Quote this to prevent word splitting.
 * SC2236: Use -z instead of ! -n.

There are still a lot of warnings in Kernel/run about:
 - SC2086: Double quote to prevent globbing and word splitting.

However, splitting on space is intentional in this case, and not trivial to
change. Therefore ignore the warning for now - but we should fix this in
the future.
2020-02-10 10:46:25 +01:00
Liav A
99ea80695e Kernel: Use VirtualAddress & PhysicalAddress classes from LibBareMetal 2020-02-09 19:38:17 +01:00
Liav A
e559af2008 Kernel: Apply changes to use LibBareMetal definitions 2020-02-09 19:38:17 +01:00
Andreas Kling
db2ede9427 Net: Short-circuit routing to the IPv4 address of a local adapter
This makes it possible to open http://localhost:8000/ in Browser. :^)
2020-02-09 14:15:55 +01:00
Andreas Kling
6c752c15a2 WebServer: Implement a very basic HTTP server :^)
This server listens on port 8000 and serves HTML files from /www.
It's very simple and quite naive, but I think we can start here and
build our way to something pretty neat.

Work towards #792.
2020-02-09 14:15:55 +01:00
Andreas Kling
271bc4b2f2 Net: When routing to loopback, use the loopback adapter's MAC address
Otherwise the routing decision gets interpreted as "host unreachable."
2020-02-09 14:15:55 +01:00
Andreas Kling
d8a30e2ad2 Net: Give the LoopbackAdapter a MAC address
Since the routing code currently interprets an all-zero MAC address as
an invalid next hop, let's give the loopback adapter an address.
2020-02-09 14:15:55 +01:00
Conrad Pankoff
a189285658 Kernel: Support reading/writing PATADiskDevices directly via /dev/hdX 2020-02-09 12:58:45 +01:00
asliturk
57edcb54c2 MenuApplets: Add UserName applet.
Move code from WindowServer.WSMenuManager to the applet.
2020-02-09 10:37:35 +01:00
Andreas Kling
deb154be61 Kernel: Send SIGPIPE to the current thread on write to a broken pipe 2020-02-08 19:12:06 +01:00
Andreas Kling
70c9a89707 IPv4: Put some TCP close handshake debug spam behind TCP_SOCKET_DEBUG 2020-02-08 16:04:58 +01:00
Andreas Kling
3891e6d739 IPv4: Sockets should say can_read() after reading is shut down
This allows clients to get their EOF after shutting down reading.
2020-02-08 16:04:31 +01:00
Andreas Kling
228a1e9099 IPv4: Basic implementation of TCP socket shutdown
We can now participate in the TCP connection closing handshake. :^)
This implementation is definitely not complete and needs to handle a
bunch of other cases. But it's a huge improvement over not being able
to close connections at all.

Note that we hold on to pending-close sockets indefinitely, until they
are moved into the Closed state. This should also have a timeout but
that's still a FIXME. :^)

Fixes #428.
2020-02-08 16:04:27 +01:00
Andreas Kling
1037a1d2ba IPv4: Don't ACK empty TCP packets
Wireshark was complaining about duplicate ACK's and this was why.
2020-02-08 14:09:02 +01:00
Andreas Kling
48f13f2a81 IPv4: Split IPv4Socket::recvfrom() into packet/byte buffered functions
This code was really hard to follow since it handles two separate modes
of buffering the data.
2020-02-08 13:09:37 +01:00
Andreas Kling
00d8ec3ead Kernel: The inode fault handler should grab the VMObject lock earlier
It doesn't look healthy to create raw references into an array before
a temporary unlock. In fact, that temporary unlock looks generally
unhealthy, but it's a different problem.
2020-02-08 12:55:21 +01:00
Andreas Kling
a9d7902bb7 x86: Simplify region unmapping a bit
Add PageTableEntry::clear() to zero out a whole PTE, and use that for
unmapping instead of clearing individual fields.
2020-02-08 12:49:38 +01:00
Andreas Kling
7291370478 Kernel: Make File::truncate() take a u64
No point in taking a signed type here. We validate at the syscall layer
and then pass around a u64 from then on.
2020-02-08 12:07:04 +01:00
Andreas Kling
42d41fdf94 Kernel: Simplify FS::create_inode() a little bit
Return a KResultOr<NonnullRefPtr<Inode>> instead of returning errors in
an out-parameter.
2020-02-08 11:58:28 +01:00
Andreas Kling
2f82d4fb31 Kernel: Add KResultOr<T>::result()
This is just a handy way to get either an error or a KSuccess, even if
there is a T present.
2020-02-08 11:57:53 +01:00
Andreas Kling
f91b3aab47 Kernel: Cloned shared regions should also be marked as shared 2020-02-08 02:39:46 +01:00
Andreas Kling
8731682d0e Kernel: Simplify FS::create_directory() a little bit
None of the clients of this function actually used the returned Inode,
so it can simply return a KResult instead.
2020-02-08 02:34:22 +01:00
Andreas Kling
cb97ef5589 Ext2FS: Fail with EMFILE if we would overflow i_links_count 2020-02-08 02:26:33 +01:00
Andreas Kling
88ea152b24 Kernel: Merge unnecessary DiskDevice class into BlockDevice 2020-02-08 02:20:03 +01:00
Andreas Kling
6be880bd10 IPv4: Send TCP packets right away instead of waiting to "retry"
Also be more explicit about zero-initializing OutgoingPacket objects.
2020-02-08 01:45:45 +01:00
Andreas Kling
0c12d9a618 IPv4: Drop incoming packets on sockets that are shut down for reading 2020-02-08 00:58:11 +01:00
Andreas Kling
2b0b7cc5a4 Net: Add a basic sys$shutdown() implementation
Calling shutdown prevents further reads and/or writes on a socket.
We should do a few more things based on the type of socket, but this
initial implementation just puts the basic mechanism in place.

Work towards #428.
2020-02-08 00:54:43 +01:00
Andreas Kling
a3f39fe789 Net: Make NetworkAdapter reference-counted
The idea behind WeakPtr<NetworkAdapter> was to support hot-pluggable
network adapters, but on closer thought, that's super impractical so
let's not go down that road.
2020-02-08 00:19:46 +01:00
Andreas Kling
f3a5985bb2 Kernel: Remove two bad FIXME's
We should absolutely *not* create a new thread in sys$exec().
There's also no sys$spawn() anymore.
2020-02-08 00:06:15 +01:00
Andreas Kling
71ca7ba31f Kernel: Fix three broken format strings in Socket::{get,set}sockopt()
These had more %'s than actual arguments, oops!
2020-02-07 23:49:15 +01:00
Andreas Kling
d04fcccc90 Kernel: Truncate addresses stored by getsockname() and getpeername()
If there's not enough space in the output buffer for the whole sockaddr
we now simply truncate the address instead of returning EINVAL.

This patch also makes getpeername() actually return the peer address
rather than the local address.. :^)
2020-02-07 23:43:32 +01:00
Andreas Kling
083b81a566 Kernel: Allow PS2MouseDevice to read multiple packets
We were overwriting the start of the output buffer over and over when
reading multiple mouse packets.
2020-02-07 11:25:12 +01:00
Andreas Kling
dc18859695 Kernel: memset() all siginfo_t structs after creating them 2020-02-06 14:12:20 +01:00
Sergey Bugaev
1b866bbf42 Kernel: Fix sys$waitid(P_ALL, WNOHANG) return value
According to POSIX, waitid() should fill si_signo and si_pid members
with zeroes if there are no children that have already changed their
state by the time of the call. Let's just fill the whole structure
with zeroes to avoid leaking kernel memory.
2020-02-06 16:06:30 +03:00
Liav A
0bce5f7403 Kernel Commandline: Change nopci_mmio to be pci_mmio
Instead of having nopci_mmio, the boot argument now is
pci_mmio='on|off'.
2020-02-05 23:01:41 +01:00
Liav A
8a41256497 Kernel Commandline: Change no_vmmouse boot argument to be vmmouse
Instead of having no_vmmouse, the boot argument now is vmmouse='on|off'.
2020-02-05 23:01:41 +01:00
Liav A
b5857ceaad Kernel Commandline: Remove noacpi & noacpi_aml boot arguments
Instead of having boot arguments like noacpi & noacpi_aml, we have one
boot argument - acpi='on|off|limited'.
2020-02-05 23:01:41 +01:00
Andreas Kling
75cb125e56 Kernel: Put sys$waitid() debug logging behind PROCESS_DEBUG 2020-02-05 19:14:56 +01:00
Liav A
f6ce24eb48 Kernel: Move the VMWare helpers out of the IO namespace 2020-02-05 18:58:27 +01:00
Liav A
8e8f5c212b Kernel: Fix vmmouse detection method
Also, add debug messages in the VMWareBackdoor class.
2020-02-05 18:58:27 +01:00
Liav A
6070fe581b Kernel: Add support for high bandwidth IO communication with VMWare 2020-02-05 18:58:27 +01:00
Sergey Bugaev
b3a24d732d Kernel+LibC: Add sys$waitid(), and make sys$waitpid() wrap it
sys$waitid() takes an explicit description of whether it's waiting for a single
process with the given PID, all of the children, a group, etc., and returns its
info as a siginfo_t.

It also doesn't automatically imply WEXITED, which clears up the confusion in
the kernel.
2020-02-05 18:14:37 +01:00
Sergey Bugaev
a6cb7f759e Kernel+LibC: Add some Unix signal types & definitions 2020-02-05 18:14:37 +01:00
Liav A
47978a5828 Kernel: Add support for vmmouse
We add this feature together with the VMWareBackdoor class.
VMWareBackdoor class is responsible for enabling the vmmouse, and then
controlling it from the PS2 mouse IRQ handler.
2020-02-04 19:11:52 +01:00
Sergey Bugaev
0334656e45 Kernel: Stub absolute mouse positioning support
This is not the real kernel patch, @supercomputer7 is doing that :^)
2020-02-04 19:11:52 +01:00
Peter Wang
3969fcc72e build-root-filesystem.sh: Set umask to 0022
On my system (Void Linux) the root user has a default umask of 0077,
causing files and directories in the disk image to have zero group and
world permissions.
2020-02-03 19:52:02 +01:00
Andreas Kling
3879e5b9d4 Kernel: Start working on a syscall for logging performance events
This patch introduces sys$perf_event() with two event types:

- PERF_EVENT_MALLOC
- PERF_EVENT_FREE

After the first call to sys$perf_event(), a process will begin keeping
these events in a buffer. When the process dies, that buffer will be
written out to "perfcore" in the current directory unless that filename
is already taken.

This is probably not the best way to do this, but it's a start and will
make it possible to start doing memory allocation profiling. :^)
2020-02-02 20:26:27 +01:00
Andreas Kling
25b635c841 Kernel: Remove unnecessary forward declaration in SlabAllocator 2020-02-02 20:25:41 +01:00
Andreas Kling
ea8d386146 Kernel: Update Thread::raw_backtrace() signature to use uintptr_t 2020-02-02 19:00:38 +01:00
Liav A
583e9ad372 Kernel: Detect devices when enumerating the PCI bus
Instead of making each driver to enumerate the PCI bus itself,
PCI::Initializer will call detect_devices() to do one enumeration
of the bus.
2020-02-02 00:57:13 +01:00
Liav A
60715695b2 Partition Table: Change Script files
From now we can use build-image-grub.sh to generate a virtual disk
with the supported partition schemes - MBR, GPT & EBR (MBR +
Extended partitions).
2020-02-02 00:20:41 +01:00
Liav A
81544dc5b4 Partition Table: Add support for Extended partitions
Now also MBR configurations with extended partitions are supported.
2020-02-02 00:20:41 +01:00
Liav A
8cde707931 Partition Table: Replace __attribute__((packed)) with [[gnu::packed]] 2020-02-02 00:20:41 +01:00
Liav A
5d760bf172 Partition Table: Allow to boot with a partition number higher than 4
This is true currently only to GUID partitions,
Booting with an MBR partition is still limited to partition numbers 1-4.
2020-02-01 17:32:25 +01:00
Liav A
a3113721d4 Partition Table: Replace __attribute__((packed)) with [[gnu::packed]] 2020-02-01 17:32:25 +01:00
Andreas Kling
625f6c0d86 Kernel: Add -fbuiltin to Kernel CXXFLAGS
This allows the compiler to assume some helpful things, like strlen()
not modifying global memory and thus being a safe inlinable thing.
2020-02-01 13:52:52 +01:00
Andreas Kling
37d336d741 Kernel: Add memory scrubbing in slab_alloc() and slab_dealloc()
These now scrub allocated and freed memory like kmalloc()/kfree() was
already doing.
2020-02-01 10:56:17 +01:00
Andreas Kling
934b1d8a9b Kernel: Finalizer should not go back to sleep if there's more to do
Before putting itself back on the wait queue, the finalizer task will
now check if there's more work to do, and if so, do it first. :^)

This patch also puts a bunch of process/thread debug logging behind
PROCESS_DEBUG and THREAD_DEBUG since it was unbearable to debug this
stuff with all the spam.
2020-02-01 10:56:17 +01:00
Andreas Kling
8d51352b96 Kernel: Add crash logging heuristic for uninitialized kmalloc()/kfree()
Since we scrub both kmalloc() and kfree() with predictable values, we
can log a helpful message when hitting a crash that looks like it might
be a dereference of such scrubbed data.
2020-02-01 10:56:17 +01:00
Andreas Kling
f2846e8e08 Kernel: Allow short writes to DoubleBuffer
DoubleBuffer is the internal buffer for things like TTY, FIFO, sockets,
etc. If you try to write more than the buffer can hold, it will now
do a short write instead of asserting.

This is likely to expose issues at higher levels, and we'll have to
deal with them as they are discovered.
2020-02-01 10:56:17 +01:00
Andreas Kling
c44b4d61f3 Kernel: Make Inode::lookup() return a RefPtr<Inode>
Previously this API would return an InodeIdentifier, which meant that
there was a race in path resolution where an inode could be unlinked
in between finding the InodeIdentifier for a path component, and
actually resolving that to an Inode object.

Attaching a test that would quickly trip an assertion before.

Test: Kernel/path-resolution-race.cpp
2020-02-01 10:56:17 +01:00
William McPherson
61a6ae038e Kernel: Add key_code_count 2020-01-31 13:13:04 +01:00
Andreas Kling
625ab1f527 Kernel: LocalSocket should fail with EADDRINUSE for already-bound files 2020-01-30 22:15:45 +01:00
Andreas Kling
6634da31d9 Kernel: Disallow empty ranges in munmap/mprotect/madvise 2020-01-30 21:55:49 +01:00
Andreas Kling
bf5b7c32d8 Kernel: Add some sanity assertions in RangeAllocator::deallocate()
We should never end up deallocating an empty range, or a range that
ends before it begins.
2020-01-30 21:51:27 +01:00
Andreas Kling
31a141bd10 Kernel: Range::contains() should reject ranges with 2^32 wrap-around 2020-01-30 21:51:27 +01:00
Andreas Kling
31d1c82621 Kernel: Reject non-user address ranges in mmap/munmap/mprotect/madvise
There's no valid reason to allow non-userspace address ranges in these
system calls.
2020-01-30 21:51:27 +01:00
Andreas Kling
afd2b5a53e Kernel: Copy "stack" and "mmap" bits when splitting a Region 2020-01-30 21:51:27 +01:00
Andreas Kling
c9e877a294 Kernel: Address validation helpers should take size_t, not ssize_t 2020-01-30 21:51:27 +01:00
Andreas Kling
164d9ecad7 Kernel: Some more int => size_t in NetworkAdapter and subclasses 2020-01-30 21:51:27 +01:00
Sergey Bugaev
3ffdff5c02 Kernel: Dump backtrace when denying a path because of a veil
This will make it much easier to see why a process wants to open the file.
2020-01-30 12:23:22 +01:00
Andreas Kling
a27c5d2fb7 Kernel: Fail with EFAULT for any address+size that would wrap around
Previously we were only checking that each of the virtual pages in the
specified range were valid.

This made it possible to pass in negative buffer sizes to some syscalls
as long as (address) and (address+size) were on the same page.
2020-01-29 12:56:07 +01:00
Andreas Kling
03837e37a3 Kernel: Make IPv4Socket::protocol_send() use a size_t for buffer size 2020-01-29 12:27:42 +01:00
Andreas Kling
1d2c9dbc3a BXVGA: Disallow resolutions higher than 4096x2160
There's no sense in allowing arbitrarily huge resolutions. Instead, we
now cap the screen size at 4K DCI resolution and will reject attempts
to go bigger with EINVAL.
2020-01-28 20:57:40 +01:00
Andreas Kling
c17f80e720 Kernel: AnonymousVMObject::create_for_physical_range() should fail more
Previously it was not possible for this function to fail. You could
exploit this by triggering the creation of a VMObject whose physical
memory range would wrap around the 32-bit limit.

It was quite easy to map kernel memory into userspace and read/write
whatever you wanted in it.

Test: Kernel/bxvga-mmap-kernel-into-userspace.cpp
2020-01-28 20:48:07 +01:00
Andreas Kling
bd059e32e1 Kernel: Tweak some include statements 2020-01-28 20:42:27 +01:00
Andreas Kling
8131875da6 Kernel: Remove outdated comment in MemoryManager
Regions *do* zero-fill on demand now. :^)
2020-01-28 10:28:04 +01:00
Andreas Kling
c64904a483 Kernel: sys$readlink() should return the number of bytes written out 2020-01-27 21:50:51 +01:00
Andreas Kling
8b49804895 Kernel: sys$waitpid() only needs the waitee thread in the stopped case
If the waitee process is dead, we don't need to inspect the thread.

This fixes an issue with sys$waitpid() failing before reap() since
dead processes will have no remaining threads alive.
2020-01-27 21:21:48 +01:00
Andreas Kling
f4302b58fb Kernel: Remove SmapDisablers in sys$getsockname() and sys$getpeername()
Instead use the user/kernel copy helpers to only copy the minimum stuff
needed from to/from userspace.

Based on work started by Brian Gianforcaro.
2020-01-27 21:11:36 +01:00
Andreas Kling
5163c5cc63 Kernel: Expose the signal that stopped a thread via sys$waitpid() 2020-01-27 20:47:10 +01:00
Andreas Kling
638fe6f84a Kernel: Disable interrupts while looking into the thread table
There was a race window in a bunch of syscalls between calling
Thread::from_tid() and checking if the found thread was in the same
process as the calling thread.

If the found thread object was destroyed at that point, there was a
use-after-free that could be exploited by filling the kernel heap with
something that looked like a thread object.
2020-01-27 14:04:57 +01:00
Andreas Kling
17210a39e4 Kernel: Remove ancient hack that put the current PID in TSS.SS2
While I was bringing up multitasking, I put the current PID in the SS2
(ring 2 stack segment) slot of the TSS. This was so I could see which
PID was currently running when just inspecting the CPU state.
2020-01-27 13:10:24 +01:00
Andreas Kling
ae0f92a0a1 Kernel: Simplify kernel thread stack allocation
We had two identical code paths doing this for some reason.
2020-01-27 12:52:45 +01:00
Andreas Kling
c1f74bf327 Kernel: Never validate access to the kmalloc memory range
Memory validation is used to verify that user syscalls are allowed to
access a given memory range. Ring 0 threads never make syscalls, and
so will never end up in validation anyway.

The reason we were allowing kmalloc memory accesses is because kernel
thread stacks used to be allocated in kmalloc memory. Since that's no
longer the case, we can stop making exceptions for kmalloc in the
validation code.
2020-01-27 12:43:21 +01:00
Andreas Kling
23ffd6c319 Kernel+LibC+Userland: Switch to 64-bit time_t
Let's not have that 2038 problem people are talking about. :^)
2020-01-27 10:59:29 +01:00
Andreas Kling
137a45dff2 Kernel: read()/write() should respect timeouts when used on a sockets
Move timeout management to the ReadBlocker and WriteBlocker classes.
Also get rid of the specialized ReceiveBlocker since it no longer does
anything that ReadBlocker can't do.
2020-01-26 17:54:23 +01:00
Andreas Kling
2ce9a705e3 IPv4: Mark UDP sockets as connected immediately upon connect()
This makes it possible to write() to a blocking UDPSocket. Previously
this was not possible since can_write() depends on is_connected().
2020-01-26 14:43:08 +01:00
Andreas Kling
388d40d755 IPv4: Fix bitrot in IPv4Socket debug logging 2020-01-26 14:42:44 +01:00
Andreas Kling
22d563b1aa IPv4: Don't hold IPv4Socket lock when blocking on byte-buffered receive 2020-01-26 10:26:27 +01:00
Andreas Kling
1d506a935c Ext2FS: Give names to some KBuffers
The more we give names to KBuffers, the easier it gets to understand
what's what in a kernel region dump. :^)
2020-01-26 10:18:18 +01:00
Andreas Kling
b011857e4f Kernel: Make writev() work again
Vector::ensure_capacity() makes sure the underlying vector buffer can
contain all the data, but it doesn't update the Vector::size().

As a result, writev() would simply collect all the buffers to write,
and then do nothing.
2020-01-26 10:10:15 +01:00
Andreas Kling
b93f6b07c2 Kernel: Make sched_setparam() and sched_getparam() operate on threads
Instead of operating on "some random thread in PID", these now operate
on the thread with a specific TID. This matches other systems better.
2020-01-26 09:58:58 +01:00
Andreas Kling
67950c80c8 Kernel: Zero-initialize LocalSocket::m_address
It was possible to read uninitialized kernel memory via getsockname().
Of course, kmalloc() is a good boy and scrubs new allocations with 0xBB
so all you got was a bunch of 0xBB.
2020-01-26 09:48:53 +01:00
Marios Prokopakis
da296f5865 Ext2FS: allocate_blocks allocates contiguous blocks (#1095)
This implementation uses the new helper method of Bitmap called
find_longest_range_of_unset_bits. This method looks for the biggest 
range of contiguous bits unset in the bitmap and returns the start of
the range back to the caller.
2020-01-26 09:48:24 +01:00
Andreas Kling
edbe7d3769 Kernel: Unbreak canonical mode TTY erase after LibVT changes
Now that LibVT's backspace character (8) is non-destructive, the kernel
line editing code has to take care of erasing manually.
2020-01-25 20:44:33 +01:00
Andreas Kling
f4e7aecec2 Kernel: Preserve CoW bits when splitting VM regions 2020-01-25 17:57:10 +01:00
Andreas Kling
7cc0b18f65 Kernel: Only open a single description for stdio in non-fork processes 2020-01-25 17:05:02 +01:00
Andreas Kling
603bf6fb4a Build: Remove -fno-sized-deallocation -Wno-sized-deallocation
Add sized variants of the global operator delete functions so we don't
have to use these GCC options anymore.
2020-01-25 16:59:21 +01:00
Andreas Kling
81ddd2dae0 Kernel: Make sys$setsid() clear the calling process's controlling TTY 2020-01-25 14:53:48 +01:00
Andreas Kling
f309381d4e Ext2FS: Use more dbg() in Ext2FS code
We should use dbg() instead of dbgprintf() as much as possible to
protect ourselves against format string bugs. Here's a bunch of
conversions in Ext2FS.
2020-01-25 14:30:53 +01:00
Andreas Kling
2bf11b8348 Kernel: Allow empty strings in validate_and_copy_string_from_user()
Sergey pointed out that we should just allow empty strings everywhere.
2020-01-25 14:14:11 +01:00
Andreas Kling
69de90a625 Kernel: Simplify Process constructor
Move all the fork-specific inheritance logic to sys$fork(), and all the
stuff for setting up stdio for non-fork ring 3 processes moves to
Process::create_user_process().

Also: we were setting up the PGID, SID and umask twice. Also the code
for copying the open file descriptors was overly complicated. Now it's
just a simple Vector copy assignment. :^)
2020-01-25 14:13:47 +01:00
Andreas Kling
0f5221568b Kernel: sys$execve() should not EFAULT for empty argument strings
It's okay to exec { "/bin/echo", "" } and it should not EFAULT.
2020-01-25 12:21:30 +01:00
Andreas Kling
e576c9e952 Kernel: Clear ESI and EDI on syscall entry
Since these are not part of the system call convention, we don't care
what userspace had in there. Might as well scrub it before entering
the kernel.

I would scrub EBP too, but that breaks the comfy kernel-thru-userspace
stack traces we currently get. It can be done with some effort.
2020-01-25 10:34:32 +01:00
Andreas Kling
b0192cfb38 Meta: Remove some copyright headers added in error 2020-01-25 10:34:32 +01:00
Sergey Bugaev
c0b32f7b76 Meta: Claim copyright for files created by me
This changes copyright holder to myself for the source code files that I've
created or have (almost) completely rewritten. Not included are the files
that were significantly changed by others even though it was me who originally
created them (think HtmlView), or the many other files I've contributed code to.
2020-01-24 15:15:16 +01:00
Andreas Kling
03d73cbaae Kernel: Allow Socket subclasses to fail construction
For example, socket(AF_INET) should only succeed for valid SOCK_TYPEs.
2020-01-23 21:33:15 +01:00
Andreas Kling
3de5439579 AK: Let's call decrementing reference counts "unref" instead of "deref"
It always bothered me that we're using the overloaded "dereference"
term for this. Let's call it "unreference" instead. :^)
2020-01-23 15:14:21 +01:00
Andreas Kling
15aac1f9e9 Build: Fix silly mistake in makeall.sh 2020-01-23 10:41:07 +01:00
Andreas Kling
e64c335e5a Revert "Kernel: Replace IRQHandler with the new InterruptHandler class"
This reverts commit 6c72736b26.

I am unable to boot on my home machine with this change in the tree.
2020-01-22 22:27:06 +01:00
Oliver Kraitschy
8e21e31b3a Build: use absolute path for /sbin/mke2fs
Distros like Debian and Ubuntu don't have /sbin in PATH, thus mke2fs is
not found.
2020-01-22 22:04:29 +01:00
Jesse Buhagiar
9cbce68b1d Meta: Change copyright holder of `FloppyDiskDevice.* 2020-01-22 14:37:44 +01:00
Liav A
6c72736b26 Kernel: Replace IRQHandler with the new InterruptHandler class
System components that need an IRQ handling are now inheriting the
InterruptHandler class.

In addition to that, the initialization process of PATAChannel was
changed to fit the changes.
PATAChannel, E1000NetworkAdapter and RTL8139NetworkAdapter are now
inheriting from PCI::Device instead of InterruptHandler directly.
2020-01-22 12:22:09 +01:00
Liav A
1ee37245cd Kernel: Introduce IRQ sharing support
The support is very basic - Each component that needs to handle IRQs
inherits from InterruptHandler class. When the InterruptHandler
constructor is called it registers itself in a SharedInterruptHandler.
When an IRQ is fired, the SharedInterruptHandler is invoked, then it
iterates through a list of the registered InterruptHandlers.

Also, InterruptEnabler class was created to provide a way to enable IRQ
handling temporarily, similar to InterruptDisabler (in CPU.h, which does
the opposite).

In addition to that a PCI::Device class has been added, that inherits
from InterruptHandler.
2020-01-22 12:22:09 +01:00
Liav A
2a160faf98 Kernel: Run clang-format on KeyboardDevice.cpp 2020-01-22 12:22:09 +01:00
Andreas Kling
30ad7953ca Kernel: Rename UnveilState to VeilState 2020-01-21 19:28:59 +01:00
Andreas Kling
66598f60fe SystemMonitor: Show process unveil() state as "Veil"
A process has one of three veil states:

- None: unveil() has never been called.
- Dropped: unveil() has been called, and can be called again.
- Locked: unveil() has been called, and cannot be called again.
2020-01-21 18:56:23 +01:00
Andreas Kling
f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Andreas Kling
07075cd001 Kernel+LibC: Clean up open() flag (O_*) definitions
These were using a mix of decimal, octal and hexadecimal for no reason.
2020-01-21 13:34:39 +01:00
Andreas Kling
6081c76515 Kernel: Make O_RDONLY non-zero
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.

We can now easily check read/write permissions separately instead of
dancing around with the bits.

This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
2020-01-21 13:27:08 +01:00
Andreas Kling
1b3cac2f42 Kernel: Don't forget about unveiled paths with zero permissions
We need to keep these around, otherwise the calling process can remove
and re-add a path to increase its permissions.
2020-01-21 11:42:28 +01:00
Liav A
200a5b0649 Kernel: Remove map_for_kernel() in MemoryManager
We don't need to have this method anymore. It was a hack that was used
in many components in the system but currently we use better methods to
create virtual memory mappings. To prevent any further use of this
method it's best to just remove it completely.

Also, the APIC code is disabled for now since it doesn't help booting
the system, and is broken since it relies on identity mapping to exist
in the first 1MB. Any call to the APIC code will result in assertion
failed.

In addition to that, the name of the method which is responsible to
create an identity mapping between 1MB to 2MB was changed, to be more
precise about its purpose.
2020-01-21 11:29:58 +01:00
Liav A
60c32f44dd Kernel: ACPI code doesn't rely on identity mapping anymore
The problem was mostly in the initialization code, since in that stage
the parser assumed that there is an identity mapping in the first 1MB of
the address space. Now during initialization the parser will create the
correct mappings to locate the required data.
2020-01-21 11:29:58 +01:00
Liav A
325022cbd7 Kernel: DMIDecoder no longer depends on identity-mapping
DMIDecoder creates the mappings using the standard helpers, thus no
need to rely on the identity mapping in the first 1MB in memory.
2020-01-21 11:29:58 +01:00
Liav A
aca317d889 Kernel: PCI MMIO no longer uses map_for_kernel()
PCI MMIO access is done by modifying the related PhysicalPage directly,
then we request to remap the region to create the mapping.
2020-01-21 11:29:58 +01:00
Andreas Kling
22cfb1f3bd Kernel: Clear unveiled state on exec() 2020-01-21 10:46:31 +01:00
Andreas Kling
cf48c20170 Kernel: Forked children should inherit unveil()'ed paths 2020-01-21 09:44:32 +01:00
Andreas Kling
02406b7305 ProcFS: Add /proc/PID/unveil
This file exposes a JSON array of all the unveiled paths in a process.
2020-01-20 22:19:02 +01:00
Andreas Kling
0569123ad7 Kernel: Add a basic implementation of unveil()
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.

The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.

Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:

- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)

Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.

Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.

Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.

This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
2020-01-20 22:12:04 +01:00
Andreas Kling
f4f958f99f Kernel: Make DoubleBuffer use a KBuffer instead of kmalloc()ing
Background: DoubleBuffer is a handy buffer class in the kernel that
allows you to keep writing to it from the "outside" while the "inside"
reads from it. It's used for things like LocalSocket and TTY's.
Internally, it has a read buffer and a write buffer, but the two will
swap places when the read buffer is exhausted (by reading from it.)

Before this patch, it was internally implemented as two Vector<u8>
that we would swap between when the reader side had exhausted the data
in the read buffer. Now instead we preallocate a large KBuffer (64KB*2)
on DoubleBuffer construction and use that throughout its lifetime.

This removes all the kmalloc heap traffic caused by DoubleBuffers :^)
2020-01-20 16:08:49 +01:00
Andreas Kling
0a282e0a02 Kernel: Allow naming KBuffers 2020-01-20 14:00:11 +01:00
Andreas Kling
e901a3695a Kernel: Use the templated copy_to/from_user() in more places
These ensure that the "to" and "from" pointers have the same type,
and also that we copy the correct number of bytes.
2020-01-20 13:41:21 +01:00
Sergey Bugaev
d5426fcc88 Kernel: Misc tweaks 2020-01-20 13:26:06 +01:00
Sergey Bugaev
9bc6157998 Kernel: Return new fd from sys$fcntl(F_DUPFD)
This fixes GNU Bash getting confused after performing a redirection.
2020-01-20 13:26:06 +01:00
Andreas Kling
b52d0afecf SB16: Map the DMA buffer in kernelspace so we can write to it
This broke with the >3GB paging overhaul. It's no longer possible to
write directly to physical addresses below the 8MB mark. Physical pages
need to be mapped into kernel VM by using a Region.

Fixes #1099.
2020-01-20 13:13:03 +01:00
Andreas Kling
a0b716cfc5 Add AnonymousVMObject::create_with_physical_page()
This can be used to create a VMObject for a single PhysicalPage.
2020-01-20 13:13:03 +01:00
Andreas Kling
4ebff10bde Kernel: Write-only regions should still be mapped as present
There is no real "read protection" on x86, so we have no choice but to
map write-only pages simply as "present & read/write".

If we get a read page fault in a non-readable region, that's still a
correctness issue, so we crash the process. It's by no means a complete
protection against invalid reads, since it's trivial to fool the kernel
by first causing a write fault in the same region.
2020-01-20 13:13:03 +01:00
Andreas Kling
4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling
a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling
5ce1cc89a0 Kernel: Add fast-path for sys$gettid()
The userspace locks are very aggressively calling sys$gettid() to find
out which thread ID they have.

Since syscalls are quite heavy, this can get very expensive for some
programs. This patch adds a fast-path for sys$gettid(), which makes it
skip all of the usual syscall validation and just return the thread ID
right away.

This cuts Kernel/Process.cpp compile time by ~18%, from ~29 to ~24 sec.
2020-01-19 17:32:05 +01:00
Andreas Kling
8d9dd1b04b Kernel: Add a 1-deep cache to Process::region_from_range()
This simple cache gets hit over 70% of the time on "g++ Process.cpp"
and shaves ~3% off the runtime.
2020-01-19 16:44:37 +01:00
Andreas Kling
ae0c435e68 Kernel: Add a Process::add_region() helper
This is a private helper for adding a Region to Process::m_regions.
It's just for convenience since it's a bit cumbersome to do this.
2020-01-19 16:26:42 +01:00
Andreas Kling
1dc9fa9506 Kernel: Simplify PageDirectory swapping in sys$execve()
Swap out both the PageDirectory and the Region list at the same time,
instead of doing the Region list slightly later.
2020-01-19 16:05:42 +01:00
Andreas Kling
05836757c6 Kernel: Oops, fix bad sort order of available VM ranges
This made the allocator perform worse, so here's another second off of
the Kernel/Process.cpp compile time from a simple bugfix! (31s to 30s)
2020-01-19 15:53:43 +01:00
Andreas Kling
167b57a6b7 TmpFS: Grow the underlying inode buffer with 2x factor when written to
Before this, we would end up in memcpy() churn hell when a program was
doing repeated write() calls to a file in /tmp.

An even better solution will be to only grow the VM allocation of the
underlying buffer and keep using the same physical pages. This would
eliminate all the memcpy() work.

I've benchmarked this using g++ to compile Kernel/Process.cpp.
With these changes, compilation goes from ~35 sec to ~31 sec. :^)
2020-01-19 14:01:32 +01:00
Andreas Kling
1d02ac35fc Kernel: Limit Thread::raw_backtrace() to the max profiler stack size
Let's avoid walking overly long stacks here, since kmalloc() is finite.
2020-01-19 13:54:09 +01:00
Andreas Kling
6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling
ad3f931707 Kernel: Optimize VM range deallocation a bit
Previously, when deallocating a range of VM, we would sort and merge
the range list. This was quite slow for large processes.

This patch optimizes VM deallocation in the following ways:

- Use binary search instead of linear scan to find the place to insert
  the deallocated range.

- Insert at the right place immediately, removing the need to sort.

- Merge the inserted range with any adjacent range(s) in-line instead
  of doing a separate merge pass into a list copy.

- Add Traits<Range> to inform Vector that Range objects are trivial
  and can be moved using memmove().

I've also added an assertion that deallocated ranges are actually part
of the RangeAllocator's initial address range.

I've benchmarked this using g++ to compile Kernel/Process.cpp.
With these changes, compilation goes from ~41 sec to ~35 sec.
2020-01-19 13:29:59 +01:00
Andreas Kling
87583aea9c Kernel: Use copy_from_user() when appropriate during thread backtracing 2020-01-19 10:33:26 +01:00
Andreas Kling
38fc31ff11 Kernel: Always switch to own page tables when crashing/asserting
I noticed this while debugging a crash in backtrace generation.
If a process would crash while temporarily inspecting another process's
address space, the crashing thread would still use the other process's
page tables while handling the crash, causing all kinds of confusion
when trying to walk the stack of the crashing thread.
2020-01-19 10:33:17 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
2cd212e5df Kernel: Let's say that everything < 3GB is user virtual memory
Technically the bottom 2MB is still identity-mapped for the kernel and
not made available to userspace at all, but for simplicity's sake we
can just ignore that and make "address < 0xc0000000" the canonical
check for user/kernel.
2020-01-19 08:58:33 +01:00
Andreas Kling
5ce9382e98 Kernel: Only require "stdio" pledge for sending signals to self
This should match what OpenBSD does. Sending a signal to yourself seems
basically harmless.
2020-01-19 08:50:55 +01:00
Sergey Bugaev
3e1ed38d4b Kernel: Do not return ENOENT for unresolved symbols
ENOENT means "no such file or directory", not "no such symbol". Return EINVAL
instead, as we already do in other cases.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
d0d13e2bf5 Kernel: Move setting file flags and r/w mode to VFS::open()
Previously, VFS::open() would only use the passed flags for permission checking
purposes, and Process::sys$open() would set them on the created FileDescription
explicitly. Now, they should be set by VFS::open() on any files being opened,
including files that the kernel opens internally.

This also lets us get rid of the explicit check for whether or not the returned
FileDescription was a preopen fd, and in fact, fixes a bug where a read-only
preopen fd without any other flags would be considered freshly opened (due to
O_RDONLY being indistinguishable from 0) and granted a new set of flags.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
544b8286da Kernel: Do not open stdio fds for kernel processes
Kernel processes just do not need them.

This also avoids touching the file (sub)system early in the boot process when
initializing the colonel process.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
6466c3d750 Kernel: Pass correct permission flags when opening files
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.

* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
2020-01-18 23:51:22 +01:00
Sergey Bugaev
7d4a267504 Kernel: Fix identifier casing 2020-01-18 23:51:22 +01:00
Andreas Kling
862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling
4e6fe3c14b Kernel: Symbolicate kernel EIP on process crash
Process::crash() was assuming that EIP was always inside the ELF binary
of the program, but it could also be in the kernel.
2020-01-18 14:38:39 +01:00
Andreas Kling
9c9fe62a4b Kernel: Validate the requested range in allocate_region_with_vmobject() 2020-01-18 14:37:22 +01:00
Andreas Kling
aa63de53bd Kernel: Use get_syscall_path_argument() in sys$execve()
Paths passed to sys$execve() should certainly be subject to all the
usual path validation checks.
2020-01-18 11:43:28 +01:00
Andreas Kling
b65572b3fe Kernel: Disallow mmap names longer than PATH_MAX 2020-01-18 11:34:53 +01:00
Andreas Kling
c3e4387c57 Kernel: Stop flushing GDT/IDT registers all the time 2020-01-18 11:10:44 +01:00
Andreas Kling
17d4e74518 Kernel: Clean up and reorganize init.cpp
This is where we first enter into the kernel, so we should make at
least some effort to keep things nice and understandable.
2020-01-18 10:24:57 +01:00
Andreas Kling
6fea316611 Kernel: Move all CPU feature initialization into cpu_setup()
..and do it very very early in boot.
2020-01-18 10:11:29 +01:00
Andreas Kling
210adaeca6 ACPI: Re-enable ACPI initialization after paging changes 2020-01-18 10:03:14 +01:00
Andreas Kling
22d4920cef RTL8139: Unbreak RealTek Ethernet driver after paging changes 2020-01-18 09:52:40 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
19c31d1617 Kernel: Always dump kernel regions when dumping process regions 2020-01-18 08:57:18 +01:00
Andreas Kling
345f92d5ac Kernel: Remove two unused MemoryManager functions 2020-01-18 08:57:18 +01:00
Andreas Kling
3e8b60c618 Kernel: Clean up MemoryManager initialization a bit more
Move the CPU feature enabling to functions in Arch/i386/CPU.cpp.
2020-01-18 00:28:16 +01:00
Andreas Kling
a850a89c1b Kernel: Add a random offset to the base of the per-process VM allocator
This is not ASLR, but it does de-trivialize exploiting the ELF loader
which would previously always parse executables at 0x01001000 in every
single exec(). I've taken advantage of this multiple times in my own
toy exploits and it's starting to feel cheesy. :^)
2020-01-17 23:29:54 +01:00
Andreas Kling
536c0ff3ee Kernel: Only clone the bottom 2MB of mappings from kernel to processes 2020-01-17 22:34:36 +01:00
Andreas Kling
122c76d7fa Kernel: Don't allocate per-process PDPT from super pages either
The default system is now down to 3 super pages allocated on boot. :^)
2020-01-17 22:34:36 +01:00
Andreas Kling
ad1f79fb4a Kernel: Stop allocating page tables from the super pages pool
We now use the regular "user" physical pages for on-demand page table
allocations. This was by far the biggest source of super physical page
exhaustion, so that bug should be a thing of the past now. :^)

We still have super pages, but they are barely used. They remain useful
for code that requires memory with a low physical address.

Fixes #1000.
2020-01-17 22:34:36 +01:00
Andreas Kling
f71fc88393 Kernel: Re-enable protection of the kernel image in memory 2020-01-17 22:34:36 +01:00
Andreas Kling
59b584d983 Kernel: Tidy up the lowest part of the address space
After MemoryManager initialization, we now only leave the lowest 1MB
of memory identity-mapped. The very first (null) page is not present.
All other pages are RW but not X. Supervisor only.
2020-01-17 22:34:36 +01:00
Andreas Kling
545ec578b3 Kernel: Tidy up the types imported from boot.S a little bit 2020-01-17 22:34:36 +01:00
Andreas Kling
7e6f0efe7c Kernel: Move Multiboot memory map parsing to its own function 2020-01-17 22:34:36 +01:00
Andreas Kling
ba8275a48e Kernel: Clean up ensure_pte() 2020-01-17 22:34:36 +01:00
Andreas Kling
e362b56b4f Kernel: Move kernel above the 3GB virtual address mark
The kernel and its static data structures are no longer identity-mapped
in the bottom 8MB of the address space, but instead move above 3GB.

The first 8MB above 3GB are pseudo-identity-mapped to the bottom 8MB of
the physical address space. But things don't have to stay this way!

Thanks to Jesse who made an earlier attempt at this, it was really easy
to get device drivers working once the page tables were in place! :^)

Fixes #734.
2020-01-17 22:34:26 +01:00
Sergey Bugaev
4417bd97d7 Kernel: Misc tweaks 2020-01-17 21:49:58 +01:00
Sergey Bugaev
064cd2278c Kernel: Remove the use of FileSystemPath in sys$realpath()
Now that VFS::resolve_path() canonicalizes paths automatically, we don't need to
do that here anymore.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
68aeefa49b ProcFS: Implement symlink magic 2020-01-17 21:49:58 +01:00
Sergey Bugaev
8642a7046c Kernel: Let inodes provide pre-open file descriptions
Some magical inodes, such as /proc/pid/fd/fileno, are going to want to open() to
a custom FileDescription, so add a hook for that.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
ae64fd1b27 Kernel: Let symlinks resolve themselves
Symlink resolution is now a virtual method on an inode,
Inode::resolve_as_symlink(). The default implementation just reads the stored
inode contents, treats them as a path and calls through to VFS::resolve_path().

This will let us support other, magical files that appear to be plain old
symlinks but resolve to something else. This is particularly useful for ProcFS.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
e0013a6b4c Kernel+LibC: Unify sys$open() and sys$openat()
The syscall is now called sys$open(), but it behaves like the old sys$openat().
In userspace, open_with_path_length() is made a wrapper over openat_with_path_length().
2020-01-17 21:49:58 +01:00
Sergey Bugaev
d6184afcae Kernel: Simplify VFS::resolve_path() further
It turns out we don't even need to store the whole custody chain, as we only
ever access its last element. So we can just store one custody. This also fixes
a performance FIXME :^)

Also, rename parent_custody to out_parent.
2020-01-17 21:49:58 +01:00
Andreas Kling
4d4d5e1c07 Kernel: Drop futex queues/state on exec()
This state is not meaningful to the new process image so just drop it.
2020-01-17 16:08:00 +01:00
Andreas Kling
26a31c7efb Kernel: Add "accept" pledge promise for accepting incoming connections
This patch adds a new "accept" promise that allows you to call accept()
on an already listening socket. This lets programs set up a socket for
for listening and then dropping "inet" and/or "unix" so that only
incoming (and existing) connections are allowed from that point on.
No new outgoing connections or listening server sockets can be created.

In addition to accept() it also allows getsockopt() with SOL_SOCKET
and SO_PEERCRED, which is used to find the PID/UID/GID of the socket
peer. This is used by our IPC library when creating shared buffers that
should only be accessible to a specific peer process.

This allows us to drop "unix" in WindowServer and LookupServer. :^)

It also makes the debugging/introspection RPC sockets in CEventLoop
based programs work again.
2020-01-17 11:19:06 +01:00
Andreas Kling
a9b24ebbe8 Kernel: Reindent linker script 2020-01-17 11:07:02 +01:00
Andreas Kling
c6e552ac8f Kernel+LibELF: Don't blindly trust ELF symbol offsets in symbolication
It was possible to craft a custom ELF executable that when symbolicated
would cause the kernel to read from user-controlled addresses anywhere
in memory. You could then fetch this memory via /proc/PID/stack

We fix this by making ELFImage hand out StringView rather than raw
const char* for symbol names. In case a symbol offset is outside the
ELF image, you get a null StringView. :^)

Test: Kernel/elf-symbolication-kernel-read-exploit.cpp
2020-01-16 22:11:31 +01:00
Andreas Kling
806f19d647 run: Bump default RAM size from 128 MB to 256 MB 2020-01-15 23:14:20 +01:00
Andreas Kling
d4d17ce423 Kernel: Trying to sys$link() a directory should fail with EPERM 2020-01-15 22:11:44 +01:00
Andreas Kling
e91f03cb39 Ext2FS: Assert that inline symlink read/write always uses offset=0 2020-01-15 22:11:44 +01:00
Andreas Kling
5a13a5416e Kernel: Avoid an extra call to read_bytes() in Inode::read_entire()
If we slurp up the entire inode in a single read_bytes(), no need to
call read_bytes() again.
2020-01-15 22:11:44 +01:00