This class had slightly confusing semantics and the added weirdness
doesn't seem worth it just so we can say "." instead of "->" when
iterating over a vector of NNRPs.
This patch replaces NonnullRefPtrVector<T> with Vector<NNRP<T>>.
There are now 2 separate classes for almost the same object type:
- EnumerableDeviceIdentifier, which is used in the enumeration code for
all PCI host controller classes. This is allowed to be moved and
copied, as it doesn't support ref-counting.
- DeviceIdentifier, which inherits from EnumerableDeviceIdentifier. This
class uses ref-counting, and is not allowed to be copied. It has a
spinlock member in its structure to allow safely executing complicated
IO sequences on a PCI device and its space configuration.
There's a static method that allows a quick conversion from
EnumerableDeviceIdentifier to DeviceIdentifier while creating a
NonnullRefPtr out of it.
The reason for doing this is for the sake of integrity and reliablity of
the system in 2 places:
- Ensure that "complicated" tasks that rely on manipulating PCI device
registers are done in a safe manner. For example, determining a PCI
BAR space size requires multiple read and writes to the same register,
and if another CPU tries to do something else with our selected
register, then the result will be a catastrophe.
- Allow the PCI API to have a united form around a shared object which
actually holds much more data than the PCI::Address structure. This is
fundamental if we want to do certain types of optimizations, and be
able to support more features of the PCI bus in the foreseeable
future.
This patch already has several implications:
- All PCI::Device(s) hold a reference to a DeviceIdentifier structure
being given originally from the PCI::Access singleton. This means that
all instances of DeviceIdentifier structures are located in one place,
and all references are pointing to that location. This ensures that
locking the operation spinlock will take effect in all the appropriate
places.
- We no longer support adding PCI host controllers and then immediately
allow for enumerating it with a lambda function. It was found that
this method is extremely broken and too much complicated to work
reliably with the new paradigm being introduced in this patch. This
means that for Volume Management Devices (Intel VMD devices), we
simply first enumerate the PCI bus for such devices in the storage
code, and if we find a device, we attach it in the PCI::Access method
which will scan for devices behind that bridge and will add new
DeviceIdentifier(s) objects to its internal Vector. Afterwards, we
just continue as usual with scanning for actual storage controllers,
so we will find a corresponding NVMe controllers if there were any
behind that VMD bridge.
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
obtain any lock rank just via the template parameter. It was not
previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
wanted in the first place (or made use of). Locks randomly changing
their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
taken in the right order (with the right lock rank checking
implementation) as rank information is fully statically known.
This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
This patch fixes some include problems on aarch64. aarch64 is still
currently broken but this will get us back to the underlying problem
of FloatExtractor.
This device is supposed to be used in microvm and ISA-PC machine types,
and we assume that if we are able to probe for the QEMU BGA version of
0xB0C5, then we have an existing ISA Bochs VGA adapter to utilize.
To ensure we don't instantiate the driver for non isa-vga devices, we
try to ensure that PCI is disabled because hardware IO test probe failed
so we can be sure that we use this special handling code only in the
QEMU microvm and ISA-PC machine types. Unfortunately, this means that if
for some reason the isa-vga device is attached for the i440FX or Q35
machine types, we simply are not able to drive the device in such setups
at all.
To determine the amount of VRAM being available, we read VBE register at
offset 0xA. That register holds the amount of VRAM divided by 64K, so we
need to multiply the value in our code to use the actual VRAM size value
again.
The isa-vga device requires us to hardcode the framebuffer physical
address to 0xE0000000, and that address is not expected to change in the
future as many other projects rely on the isa-vga framebuffer to be
present at that physical memory address.
The simple PCI::HostBridge class implements access to the PCI
configuration space by using x86 IO instructions. Therefore, it should
be put in the Arch/x86/PCI directory so it can be easily omitted for
non-x86 builds.
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
Instead, hold the lock while we copy the contents to a stack-based
Vector then iterate on it without any locking.
Because we rely on heap allocations, we need to propagate errors back
in case of OOM condition, therefore, both PCI::enumerate API function
and PCI::Access::add_host_controller_and_enumerate_attached_devices use
now a ErrorOr<void> return value to propagate errors. OOM Error can only
occur when enumerating the m_device_identifiers vector under a spinlock
and trying to expand the temporary Vector which will be used locklessly
to actually iterate over the PCI::DeviceIdentifiers objects.
To declare that we don't have a PCI bus in the system we do two things:
1. Probe IO ports before enabling access -
In case we are using the QEMU ISA-PC machine type, IO probing results in
floating bus condition (returning 0xFF values), thus, we know we don't
have PCI bus on the system.
2. Allow the user to specify to not use the PCI bus at all in the kernel
commandline.
Two classes are added - HostBridge and MemoryBackedHostBridge, which
both derive from HostController class. This allows the kernel to map
different busses from different PCI domains in the same time. Each
HostController implementation doesn't take the Address object to address
PCI devices but instead we take distinct numbers of the PCI bus, device
and function as it allows us to specify arbitrary PCI domains in the
Address structure and still to get the correct PCI devices. This also
matches the hardware behavior of PCI domains - the host bridge merely
takes memory operations or IO operations and translates them to
addressing of three components - PCI bus, device and function.
These changes also greatly simplify how enumeration of Host Bridges work
now - scanning of the hardware depends on what the Host bridges can do
for us, so in case we have multiple host bridges that expose a memory
mapped region or IO ports to access PCI configuration space, we simply
let the code of the host bridge to figure out how to fetch data for us.
Another semantical change is that a PCI domain structure is no longer
attached to a PhysicalAddress, so even in the case that the machine
doesn't implement PCI domains, we still treat that machine to contain 1
PCI domain to treat that one host bridge in the same way, like with a
machine with one or more PCI domains.
Instead, just ensure we pick the m_access_lock and then m_scan_lock when
doing a scan/re-scan of the PCI configuration space so we know nobody
can actually access the PCI configuration space during the scan.
The m_scan_lock is now a Spinlock, to ensure we cannot yield to other
process while we do the PCI configuration space scanning.
A couple of things were changed:
1. Semantic changes - PCI segments are now called PCI domains, to better
match what they are really. It's also the name that Linux gave, and it
seems that Wikipedia also uses this name.
We also remove PCI::ChangeableAddress, because it was used in the past
but now it's no longer being used.
2. There are no WindowedMMIOAccess or MMIOAccess classes anymore, as
they made a bunch of unnecessary complexity. Instead, Windowed access is
removed entirely (this was tested, but never was benchmarked), so we are
left with IO access and memory access options. The memory access option
is essentially mapping the PCI bus (from the chosen PCI domain), to
virtual memory as-is. This means that unless needed, at any time, there
is only one PCI bus being mapped, and this is changed if access to
another PCI bus in the same PCI domain is needed. For now, we don't
support mapping of different PCI buses from different PCI domains at the
same time, because basically it's still a non-issue for most machines
out there.
2. OOM-safety is increased, especially when constructing the Access
object. It means that we pre-allocating any needed resources, and we try
to find PCI domains (if requested to initialize memory access) after we
attempt to construct the Access object, so it's possible to fail at this
point "gracefully".
3. All PCI API functions are now separated into a different header file,
which means only "clients" of the PCI subsystem API will need to include
that header file.
4. Functional changes - we only allow now to enumerate the bus after
a hardware scan. This means that the old method "enumerate_hardware"
is removed, so, when initializing an Access object, the initializing
function must call rescan on it to force it to find devices. This makes
it possible to fail rescan, and also to defer it after construction from
both OOM-safety terms and hotplug capabilities.
There's no need for generated files in SysFS to tell you their precise
file size when you stat() them.
I noticed when profiling "find /" that we were spending a chunk of time
generating and throwing away SysFS content just so we could tell you
exactly how large it would be. :^)