Commit Graph

2201 Commits

Author SHA1 Message Date
Ted Blackman
c8be121455
Pointer Compression to enable 8G Loom (#164)
### Description

Resolves #163

In fulfillment of https://urbit.org/grants/loom-pointer-compression

### Benchmark

#### Basic brass pill fakezod boot benchmark - x86_64 linux

Pay primary attention to `Elapsed (wall clock) time`, `Maximum resident set size (kbytes)`, and `Major (requiring I/O) page faults`.

##### Takeaway

We expected increased memory usage because this is naturally a tradeoff of
alignment. Note that runs (2) and (3) included changes to align the _stack_
as well as the _heap_. In the run without stack alignment (4), you can see
that stack alignment has no effect on max RSS, at least when booting from a
pill. From some basic evaluation in gdb I've done in the past, I expect stack
usage when DWORD-aligned to increase by ~50% (rather than a theoretical
100%). Stack usage is quite small compared to heap usage, however, so you
shouldn't expect to see this reflected in maximum RSS. Overall, maximum
resident memory increased by ~33%.
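
As a rough illustration of the alignment tradeoff described above, here is a minimal sketch of offset compression. It assumes offsets are counted in 32-bit words and that every allocation starts on a DWORD (two-word) boundary; the names `align_up`, `compress`, and `decompress` and the constants are hypothetical and are not taken from the runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define ALIGN_WORDS 2u  /* DWORD alignment: allocations start on an even word */
#define ALIGN_SHIFT 1u  /* log2(ALIGN_WORDS) */

/* Round a word offset up to the next DWORD boundary (this is the padding cost). */
static inline uint64_t align_up(uint64_t off_words)
{
  return (off_words + (ALIGN_WORDS - 1u)) & ~(uint64_t)(ALIGN_WORDS - 1u);
}

/* An aligned offset always has a zero low bit, so that bit need not be stored. */
static inline uint32_t compress(uint64_t off_words)
{
  assert((off_words & (ALIGN_WORDS - 1u)) == 0);  /* caller must align first */
  return (uint32_t)(off_words >> ALIGN_SHIFT);
}

/* Restore the full word offset from the stored reference. */
static inline uint64_t decompress(uint32_t ref)
{
  return (uint64_t)ref << ALIGN_SHIFT;
}
```

Under these assumptions, a reference with a fixed number of offset bits reaches twice as many words as before, and the price is the padding that `align_up` adds to odd-sized allocations.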

The number of major page faults encountered during a brass boot is roughly
equal to the number before this change.

The elapsed (wall clock) time difference between (2) and (3) is essentially
zero. There is essentially no performance gained by the virtual bit size
being a compile-time constant.

There is a small latency cost in the current DWORD-aligned heap allocation
implementation as compared to a runtime that doesn't require allocations to
be aligned. Compare the elapsed times of (1), 2:21.09 or 141.09 s, and (2),
2:23.50 or 143.50 s: ~1.7% increased latency. Run (4), however, which
excluded the stack alignment changes, took 2:22.55 or 142.55 s, splitting
the difference at ~1.0% increased latency. Note that this _is_ repeatable
and the 1% difference isn't random: running the same program again on the
same system exhibits tiny variance.
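
The percentages quoted above follow from a simple relative-increase calculation. A minimal sketch (the helper name `pct_increase` is hypothetical):

```c
#include <assert.h>

/* Relative increase of `after` over `before`, in percent. */
static inline double pct_increase(double before, double after)
{
  assert(before > 0.0);
  return (after - before) / before * 100.0;
}
```

With the figures from the runs, `pct_increase(141.09, 143.50)` is about 1.7, `pct_increase(141.09, 142.55)` is about 1.0, and `pct_increase(148036, 197176)` for maximum RSS is about 33.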

##### 1) -O3 no pointer compression vere/develop

A run from the HEAD of vere/develop

commit 7c890c3350

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.25
System time (seconds): 0.03
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:21.09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 148036
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3492
Minor (reclaiming a frame) page faults: 6188
Voluntary context switches: 68
Involuntary context switches: 5
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### 2) -O3 compiletime determined virtual bit size i/163/pointer-compression

State of pointer compression work prior to migration (where concessions to
runtime determined virtual bit size were made)

commit 4083f1c660

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.39
System time (seconds): 0.04
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:23.50
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 197176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3487
Minor (reclaiming a frame) page faults: 6219
Voluntary context switches: 68
Involuntary context switches: 2
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### 3) -O3 runtime determined virtual bit size i/163/pointer-compression

Current state of pointer compression work -- after implementation of
migration and runtime determined virtual bit size concession

commit 8dffe067e1:

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.40
System time (seconds): 0.06
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:23.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 197200
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3489
Minor (reclaiming a frame) page faults: 7242
Voluntary context switches: 69
Involuntary context switches: 4
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### 4) -O3 runtime determined virtual bit size _WITHOUT_ stack alignment barter-simsum/pointer-compression-no-align-stack

commit 8b0438ab3b

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.42
System time (seconds): 0.06
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:22.55
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 197204
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3492
Minor (reclaiming a frame) page faults: 6221
Voluntary context switches: 68
Involuntary context switches: 3
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### FINAL BENCHMARK BEFORE MERGE:

This was run after some fairly significant changes to minimize malloc
padding, fix memory corruption when run with `U3_MEMORY_DEBUG`, and more.

It was agreed to keep stack alignment out of this PR as it currently isn't
used and costs us a bit of latency.

Runtime vs compiletime determined pointer compression still shows no latency
difference on the x86 linux machine tested (ddr4 memory). On an m2 mac air,
there _was_ a 5% latency increase from compiletime to runtime pointer
compression. This may be fixed later and would not necessitate another
migration.

The additional free list sanity checking done in `u3a_loom_sane` introduces
negligible latency in `u3e_save`. On a relatively fragmented heap, it only
takes 60ms to complete. This will be kept in order to detect _some_ memory
corruption if it occurs and prevent that corruption from propagating to disk.

A brass pill boot was performed off of
13e0b43d8da4bdd318fcd4e3d3610caa3af4608a. Observe there is no regression in
the Elapsed (wall clock) time statistic. Further, the maximum resident set
size has been reduced by 25% back to its pre pointer compression size (150M).
This is likely due to a decrease in the average allocation's padding.

Lastly, total sweep size was compared between a freshly booted pier without
pointer compression and with pointer compression post migration. There is no
noticeable increase in the overall size of allocations.

```
Command being timed: "./burbit -t -q -F zod -B brass.pill -c brasspillbench"
User time (seconds): 1.27
System time (seconds): 0.04
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:21.49
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 150088
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3490
Minor (reclaiming a frame) page faults: 6188
Voluntary context switches: 64
Involuntary context switches: 4
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
2023-02-28 15:56:25 -05:00
barter-simsum
c08ada2524 u3a_loom_sane() 2023-02-28 12:07:37 -05:00
barter-simsum
65328d762a misc. 2023-02-28 12:07:37 -05:00
barter-simsum
ba1b0f1e3d refactor alignment functions 2023-02-28 12:07:37 -05:00
barter-simsum
953b21ae82 fix U3_MEMORY_DEBUG corruption. Minimize required padding
pad malloc internal pad calculations were largely responsible for the
corruption. It happened to be the case that without U3_MEMORY_DEBUG set (which
doubles the size of a `u3a_box`), we overallocated just enough memory for the
pad miscalculation to not affect us.

This fixes both the overallocation and the pad miscalculation.
2023-02-28 12:07:37 -05:00
barter-simsum
c71060be84 (ALLO, ALHI) -> (C3_ALGHI, C3_ALGLO) 2023-02-28 12:07:37 -05:00
barter-simsum
45217c3409 factor out _me_align_(pad|dap) and all instances of alp_w
This drastically simplifies the bizarre (ald_w, alp_w) alignment logic which
also seems to have been the cause of issues with heap corruption originating in
handling of internal 16 byte alignment by _ca_box_make_hat
2023-02-28 12:07:37 -05:00
barter-simsum
be4b92b38b cenums are dumb 2023-02-28 12:07:37 -05:00
barter-simsum
ef2c4bba4d introduce vere/version.h
version numbers - currently for u3e and u3v - should be kept in their own header
to avoid dependency loops since anything should be able to source these.

I would like to avoid literal comparison like:

if (ver_w == 2) { }

and instead opt for

if (ver_w == U3V_VER2) { }

or

if (ver_w == U3V_LATEST) { }

Though this may seem overkill at first given the only version we've incremented
is u3H->ver_w, this is the simplest solution to avoid dependency loops
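
A minimal sketch of what such a header could contain, assuming hypothetical numeric values. Only `U3V_VER2` and `U3V_LATEST` are named above; `U3V_VER1`, the values, and the helper `needs_migration` are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical version constants; the real values live in vere/version.h. */
#define U3V_VER1   1
#define U3V_VER2   2
#define U3V_LATEST U3V_VER2

/* Returns nonzero when a stored version is older than the latest one.
** Comparing against named constants avoids literals such as `ver_w == 2`. */
static inline int needs_migration(uint32_t ver_w)
{
  assert(ver_w >= U3V_VER1 && ver_w <= U3V_LATEST);
  return ver_w != U3V_LATEST;
}
```

Because the header defines only constants and includes nothing else from the runtime, any other file can include it without creating a dependency loop.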
2023-02-28 12:07:37 -05:00
barter-simsum
98ef6582f6 paren wrap road argument in road macros u3a_open, u3a_full, etc 2023-02-28 12:07:37 -05:00
barter-simsum
48e326ff03 loom migration to support compressed pointers 2023-02-28 12:07:37 -05:00
barter-simsum
0966fb0195 fix patch control version var name pat_u->con_u->ver_y -> ver_w 2023-02-28 12:07:37 -05:00
barter-simsum
8c0cc71127 specify up to 33 bits (8G) with --loom
Unsure how we should ultimately do this. Specifying 33 bits should be
conditional on the loom having compressed pointers obviously.

P.S. Having implemented the migration by now, the migration is not optional, so
this is fine.
2023-02-28 12:07:37 -05:00
barter-simsum
b5bef4fc09 refactor _ce_patch_verify -- use new PRIc3... printf specifiers 2023-02-25 16:14:09 -05:00
barter-simsum
c93ef2e82c fix snapshot crashes caused by > (2^32) addressable bytes in loom
Changes in next/vere also fixed this. This is nearly a subset of those changes.
2023-02-25 16:14:09 -05:00
barter-simsum
fbf69a018e prevent overflow when printing memory with |mass
Previously, anything above 4G would overflow and print as less memory
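
A minimal sketch of this class of overflow, assuming memory is counted in 32-bit words and converted to bytes for printing. The function names are hypothetical and are not the ones changed in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: the multiplication happens in 32 bits and wraps at 4 GiB. */
static inline uint32_t bytes_32(uint32_t words)
{
  return words * 4u;
}

/* Fixed pattern: widen to 64 bits before multiplying. */
static inline uint64_t bytes_64(uint32_t words)
{
  return (uint64_t)words * 4u;
}

/* Self-check: 0x50000000 words is 5 GiB, which wraps to 1 GiB in 32 bits. */
static inline void check_mass_sizes(void)
{
  assert(bytes_32(0x50000000u) == 0x40000000u);
  assert(bytes_64(0x50000000u) == 0x140000000ull);
}
```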
2023-02-25 16:14:09 -05:00
barter-simsum
6c36069749 golf u3r_met -- also fixes failing STATIC_ASSERT after 8G expansion
I'm assuming the old switch-case was an attempted performance optimization -
more constants with math that could be elided by the compiler, e.g.

`if (gal_w > (UINT32_MAX - 35 >> 5))`.

However, looking at -O3 disassembly, I really doubt it's any faster. There are
at least as many conditional jumps and the instruction size is about 2x larger.

===

Moreover, this change does not artificially limit the size that gal_w can
be. For instance, in the previous implementation, for a value of a_y=2, gal_w
could not exceed the following without bailing:

`(UINT32_MAX - 35) >> 5` =>
`0x07FFFFFE`

Now, gal_w cannot exceed

`((UINT32_MAX - (32 + max_y)) >> (5 - a_y))` =>
`((UINT32_MAX - 35) >> 3)` =>
`0x1FFFFFFB`

===

This has been confirmed to return exactly the same results as prior
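
The two bounds above can be evaluated directly. This sketch assumes `max_y` is `(1 << a_y) - 1`, which gives 3 for `a_y=2` and matches the `35` in the expansion above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old limit on gal_w for a_y=2: exactly 0x07FFFFFE. */
static inline uint32_t old_bound(void)
{
  return (UINT32_MAX - 35u) >> 5;
}

/* New limit on gal_w for a given a_y (a_y <= 5): exactly 0x1FFFFFFB for a_y=2. */
static inline uint32_t new_bound(uint8_t a_y)
{
  assert(a_y <= 5);
  uint32_t max_y = (1u << a_y) - 1u;  /* assumption, see lead-in */
  return (UINT32_MAX - (32u + max_y)) >> (5 - a_y);
}
```

For `a_y=2` the new limit is roughly four times the old one, since the shift drops from 5 bits to 3.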
2023-02-25 16:14:09 -05:00
barter-simsum
cea338ce27 4G -> 8G loom 2023-02-25 16:14:09 -05:00
barter-simsum
8413fddb33 pointer compression on 4G loom 2023-02-25 16:14:09 -05:00
barter-simsum
d6261e914f DWORD aligned heap allocs 2023-02-25 16:14:09 -05:00
barter-simsum
abfd14c72a make u3a_[into|outa|to[off|ptr|wtr]] macros inline funcs 2023-02-25 16:14:09 -05:00
barter-simsum
bcac1d6d24 misc. semantically inconsequential 2023-02-25 16:14:07 -05:00
barter-simsum
c552b1a468 error when creating pier and -c unspecified. Exception: -w and -F
-w and -F to continue with previous behavior of implicit pier creation.
2023-02-23 09:32:51 -05:00
barter-simsum
6d39000749 misc 2023-02-17 14:43:32 -05:00
barter-simsum
4b718a5b25 use reserved bit in U3_OS_LoomBits for 4G loom 2023-02-17 14:43:32 -05:00
barter-simsum
f04b4637f2 c3_dessert -- debug assert (breakpoint trap) 2023-02-17 14:43:32 -05:00
barter-simsum
2da1f13cb7 new canon 2023-02-17 14:43:32 -05:00
Matthew LeVan
5565ca3414
Strip trailing commit SHA from version number when downloading pill (#133)
## Description

Resolves #126.
2023-02-17 10:01:17 -05:00
Josh Lehman
6efa7cb17a
build: fix broken import for macos-x86_64 (#147)
These were the most minimal set of changes that allowed me to build vere
on macOS x86-64. See #131 for context. To build, I ran `bazel build
--clang_version="12.0.0" :urbit`.
2023-02-17 11:43:54 -03:00
Philip Monk
563e9f312d fix -Y filename
Fixes #222
2023-02-13 13:51:26 -07:00
Josh Lehman
7098eb8825
chop: offline event log truncation (#165)
## `chop`

`urbit chop <pier>` implements a simple, offline **event log
truncation**[^1] tool.

`chop` performs the following steps and then exits:

1. Gracefully stops the given pier (if running).
2. Backs up the current snapshot to `<pier>/.urb/bhk`.
3. Makes sure a current snapshot exists (i.e., is fully written to disk in
   `chk/*.bin` with no existing patch files).
4. Reads the metadata and the last event from the pier's event log.
5. Initializes a fresh event log in the `<pier>/.urb/log/chop` directory.
6. Writes the metadata and last event from the original log into the fresh
   one.
7. Renames the original event log to
   `<pier>/.urb/log/chop/data_<first>_<last>.mdb.bak`, where `first` and
   `last` are the first and last event numbers from the event log.
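
The backup file name described above can be built as in this sketch. The helper `chop_bak_path` is hypothetical and is not the function used in the implementation; it assumes event numbers are 64-bit unsigned integers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "<pier>/.urb/log/chop/data_<first>_<last>.mdb.bak" into buf.
** Returns the path length, or -1 if buf is too small or formatting fails. */
static inline int chop_bak_path(char* buf, size_t len, const char* pier,
                                uint64_t first, uint64_t last)
{
  assert(first <= last);
  int n = snprintf(buf, len,
                   "%s/.urb/log/chop/data_%" PRIu64 "_%" PRIu64 ".mdb.bak",
                   pier, first, last);
  return (n >= 0 && (size_t)n < len) ? n : -1;
}
```

For a pier named `zod` with events 1 through 42, this yields `zod/.urb/log/chop/data_1_42.mdb.bak`.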

Pilots are then free to move, archive, or delete their `.bak` event log
file, resume normal operation of their ship, and enjoy the many benefits
of lowered disk pressure and any reductions in associated hosting costs.

I've tested `chop` successfully on my own planet `~mastyr-bottec`
(multiple times), three different comets (all fresh), and multitudes of
fake galaxies.

Resolves #122.

Note: `knit`, which is the "undo" button for `chop`, is being
implemented in its own PR #184.

[^1]: https://roadmap.urbit.org/project/event-log-truncation
2023-02-09 07:01:05 -08:00
Matthew LeVan
abbdbe3701 fix that 2023-02-08 13:49:29 -05:00
Matthew LeVan
f166f39d1c warning message 2023-02-08 13:48:49 -05:00
Matthew LeVan
025cf47505 warning message 2023-02-08 13:45:12 -05:00
Matthew LeVan
8606792c54 more error checking 2023-02-08 13:23:19 -05:00
Matthew LeVan
44a903af7c add more error checking 2023-02-08 09:16:24 -05:00
Josh Lehman
01946506e8
Eval improvements (#155) 2023-02-08 06:16:05 -08:00
Amadeo Bellotti
8505fe6fc5 changed u3v_wish_n() to be a u3m_soft() call as well as the additional housekeeping needed 2023-02-07 16:14:31 -05:00
Amadeo Bellotti
1da48e9188 more formatting nits 2023-02-07 09:29:33 -05:00
Amadeo Bellotti
4d0ea7e503 formatting nits and variable name changes 2023-02-07 09:25:46 -05:00
Amadeo Bellotti
195646160e removed whitespace 2023-02-06 20:46:33 -05:00
Amadeo Bellotti
c302614d06 removed whitespace 2023-02-06 20:45:54 -05:00
Peter McEvoy
47ba262383
Implement simple replay command (#192)
Co-authored-by: Peter McEvoy <git@mcevoypeter.com>
2023-02-06 16:00:42 -05:00
Amadeo Bellotti
cdea3edbda added info on u3c_wish_n 2023-02-06 15:56:02 -05:00
Amadeo Bellotti
4b0221d170 Made changes that ~barter-simsum asked for 2023-02-06 15:53:06 -05:00
Matthew LeVan
f725ea46fd add more instructions on completion 2023-02-06 12:56:29 -05:00
Matthew LeVan
44c3b0c37c check if a *current* snapshot exists; move u3m_stop() to end of function 2023-02-06 12:41:09 -05:00
Matthew LeVan
5707981ae4 remove u3e_curr() (unnecessary) 2023-02-06 12:40:37 -05:00
Matthew LeVan
d5ae71115c add warning printf 2023-02-03 15:09:44 -05:00
Matthew LeVan
4806b36c0c don't leak memory when patches exist 2023-02-02 20:43:50 -05:00