urbit/vere

mirror of https://github.com/urbit/vere.git synced 2024-09-11 11:55:31 +03:00

Author	SHA1	Message	Date
Ted Blackman	c8be121455	Pointer Compression to enable 8G Loom (#164)

### Description

Resolves #163

In fulfillment of https://urbit.org/grants/loom-pointer-compression

### Benchmark

#### Basic brass pill fakezod boot benchmark - x86_64 linux

Pay primary attention to `Elapsed (wall clock) time`, `Maximum resident set size (kbytes)` and `Major (requiring I/O) page faults`

##### Takeaway

We expected increased memory usage because this is naturally a tradeoff of alignment. Do note, that runs (2) and (3) included changes to align the _stack_ as well as the _heap_. In the run without stack alignment (4) you can see that stack alignment has no effect on max RSS -- at least when booting from a pill.

From some basic evaluation in gdb I've done in the past, I expect stack usage when DWORD-aligned to increase by ~50% (rather than a theoretical 100%). Stack usage is quite small compared to heap usage however, so you shouldn't expect to see this reflected in maximum RSS.

Overall maximum resident memory increased by about ~33%.

The number of major pagefaults encountered during a brass boot is roughly equal to prior.

The elapsed (wall clock) time difference between (2) and (3) is essentially zero. There is essentially no performance gained by the virtual bit size being a compile-time constant.

There is a small latency cost in the current DWORD-aligned heap allocation implementation as compared to a runtime that doesn't require allocations to be aligned. Compare the elapsed times of (1) -- 2:21.09 or 141.09 s -- and (2) -- 2:23.50 or 143.50 s -- result: ~1.7% increased latency. If you look at run (4) however, which excluded stack alignment changes -- 2:22.55 or 142.55 s --, we split the difference at ~1.0% increased latency. Note, this _is_ repeatable and the 1% difference isn't random. Running the same program over again on the same system exhibits tiny variance.

##### 1) -O3 no pointer compression

vere/develop

A run from the HEAD of vere/develop

commit `7c890c3350`

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.25
System time (seconds): 0.03
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:21.09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 148036
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3492
Minor (reclaiming a frame) page faults: 6188
Voluntary context switches: 68
Involuntary context switches: 5
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### 2) -O3 compiletime determined virtual bit size

i/163/pointer-compression

State of pointer compression work prior to migration (where concessions to runtime determined virtual bit size were made)

commit `4083f1c660`

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.39
System time (seconds): 0.04
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:23.50
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 197176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3487
Minor (reclaiming a frame) page faults: 6219
Voluntary context switches: 68
Involuntary context switches: 2
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### 3) -O3 runtime determined virtual bit size

i/163/pointer-compression

Current state of pointer compression work -- after implementation of migration and runtime determined virtual bit size concession

commit `8dffe067e1`:

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.40
System time (seconds): 0.06
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:23.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 197200
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3489
Minor (reclaiming a frame) page faults: 7242
Voluntary context switches: 69
Involuntary context switches: 4
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### 4) -O3 runtime determined virtual bit size _WITHOUT_ stack alignment

barter-simsum/pointer-compression-no-align-stack

commit `8b0438ab3b`

```
Command being timed: "./urbit -t -q -F zod -B brass.pill -c zod"
User time (seconds): 1.42
System time (seconds): 0.06
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:22.55
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 197204
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3492
Minor (reclaiming a frame) page faults: 6221
Voluntary context switches: 68
Involuntary context switches: 3
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```

##### FINAL BENCHMARK BEFORE MERGE:

This was run after some fairly significant changes to minimize malloc padding, fix memory corruption when run with `U3_MEMORY_DEBUG`, and more.

It was agreed to keep stack alignment out of this PR as it currently isn't used and costs us a bit of latency.

Runtime vs compiletime determined pointer compression still shows no latency difference on the x86 linux machine tested (ddr4 memory). On an m2 mac air, there _was_ a 5% latency increase from compiletime to runtime pointer compression. This may be fixed later and would not necessitate another migration.

The additional free list sanity checking done in `u3a_loom_sane` introduces negligible latency in `u3e_save`. On a relatively fragmented heap, it only takes 60ms to complete. This will be kept in order to detect _some_ memory corruption if it occurs and prevent that corruption from propagating to disk.

A brass pill boot was performed off of 13e0b43d8da4bdd318fcd4e3d3610caa3af4608a. Observe there is no regression in the Elapsed (wall clock) time statistic. Further, the maximum resident set size has been reduced by 25% back to its pre pointer compression size (150M). This is likely due to a decrease in the average allocation's padding.

Lastly, total sweep size was compared between a freshly booted pier without pointer compression and with pointer compression post migration. There is no noticeable increase in the overall size of allocations.

```
Command being timed: "./burbit -t -q -F zod -B brass.pill -c brasspillbench"
User time (seconds): 1.27
System time (seconds): 0.04
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:21.49
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 150088
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3490
Minor (reclaiming a frame) page faults: 6188
Voluntary context switches: 64
Involuntary context switches: 4
Swaps: 0
File system inputs: 14866
File system outputs: 21544
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
	2023-02-28 15:56:25 -05:00
barter-simsum	c08ada2524	u3a_loom_sane()	2023-02-28 12:07:37 -05:00
barter-simsum	65328d762a	misc.	2023-02-28 12:07:37 -05:00
barter-simsum	ba1b0f1e3d	refactor alignment functions	2023-02-28 12:07:37 -05:00
barter-simsum	953b21ae82	fix U3_MEMORY_DEBUG corruption. Minimize required padding pad malloc internal pad calculations were largely responsible for the corruption. It happened to be the case that without U3_MEMORY_DEBUG set (which doubles the size of a `u3a_box`), we overallocated just enough memory for the pad miscalculation to not affect us. This fixes both the overallocation and the pad miscalculation.	2023-02-28 12:07:37 -05:00
barter-simsum	c71060be84	(ALLO, ALHI) -> (C3_ALGHI, C3_ALGLO)	2023-02-28 12:07:37 -05:00
barter-simsum	45217c3409	factor out _me_align_(pad\|dap) and all instances of alp_w This drastically simplifies the bizarre (ald_w, alp_w) alignment logic which also seems to have been the cause of issues with heap corruption originating in handling of internal 16 byte alignment by _ca_box_make_hat	2023-02-28 12:07:37 -05:00
barter-simsum	be4b92b38b	cenums are dumb	2023-02-28 12:07:37 -05:00
barter-simsum	ef2c4bba4d	introduce vere/version.h version numbers - currently for u3e and u3v - should be kept in their own header to avoid dependency loops since anything should be able to source these. I would like to avoid literal comparison like: `if (ver_w == 2) { }` and instead opt for `if (ver_w == U3V_VER2) { }` or `if (ver_w == U3V_LATEST) { }`. Though this may seem overkill at first given the only version we've incremented is u3H->ver_w, this is the simplest solution to avoid dependency loops	2023-02-28 12:07:37 -05:00
barter-simsum	98ef6582f6	paren wrap road argument in road macros u3a_open, u3a_full, etc	2023-02-28 12:07:37 -05:00
barter-simsum	48e326ff03	loom migration to support compressed pointers	2023-02-28 12:07:37 -05:00
barter-simsum	0966fb0195	fix patch control version var name pat_u->con_u->ver_y -> ver_w	2023-02-28 12:07:37 -05:00
barter-simsum	8c0cc71127	specify up to 33 bits (8G) with --loom Unsure how we should ultimately do this. Specifying 33 bits should be conditional on the loom having compressed pointers obviously. P.S. Having implemented the migration by now, the migration is not optional, so this is fine.	2023-02-28 12:07:37 -05:00
barter-simsum	b5bef4fc09	refactor _ce_patch_verify -- use new PRIc3... printf specifiers	2023-02-25 16:14:09 -05:00
barter-simsum	c93ef2e82c	fix snapshot crashes caused by > (2^32) addressable bytes in loom Changes in next/vere also fixed this. This is nearly a subset of those changes.	2023-02-25 16:14:09 -05:00
barter-simsum	fbf69a018e	prevent overflow when printing memory with \|mass Previously, anything above 4G would overflow and print as less memory	2023-02-25 16:14:09 -05:00
barter-simsum	6c36069749	golf u3r_met -- also fixes failing STATIC_ASSERT after 8G expansion I'm assuming the old switch-case was an attempted performance optimization - more constants with math that could be elided by the compiler, e.g. `if (gal_w > (UINT32_MAX - 35 >> 5))`. However, looking at -O3 disassembly, I really doubt it's any faster. There are at least as many conditional jumps and the instruction size is about 2x larger. === Moreover, this change does not artificially limit the size that gal_w can be. For instance, in the previous implementation, for a value of a_y=2, gal_w could not exceed the following without bailing: `(UINT32_MAX - 35) >> 5` => `0x07FFFFFF` Now, gal_w cannot exceed `((UINT32_MAX - (32 + max_y)) >> (5 - a_y))` => `((UINT32_MAX - 35) >> 3)` => `0x1FFFFFFF` === This has been confirmed to return exactly the same results as prior	2023-02-25 16:14:09 -05:00
barter-simsum	cea338ce27	4G -> 8G loom	2023-02-25 16:14:09 -05:00
barter-simsum	8413fddb33	pointer compression on 4G loom	2023-02-25 16:14:09 -05:00
barter-simsum	d6261e914f	DWORD aligned heap allocs	2023-02-25 16:14:09 -05:00
barter-simsum	abfd14c72a	make u3a_[into\|outa\|to[off\|ptr\|wtr]] macros inline funcs	2023-02-25 16:14:09 -05:00
barter-simsum	bcac1d6d24	misc. semantically inconsequential	2023-02-25 16:14:07 -05:00
barter-simsum	c552b1a468	error when creating pier and -c unspecified. Exception: -w and -F, which continue with previous behavior of implicit pier creation.	2023-02-23 09:32:51 -05:00
barter-simsum	6d39000749	misc	2023-02-17 14:43:32 -05:00
barter-simsum	4b718a5b25	use reserved bit in U3_OS_LoomBits for 4G loom	2023-02-17 14:43:32 -05:00
barter-simsum	f04b4637f2	c3_dessert -- debug assert (breakpoint trap)	2023-02-17 14:43:32 -05:00
barter-simsum	2da1f13cb7	new canon	2023-02-17 14:43:32 -05:00
Matthew LeVan	5565ca3414	Strip trailing commit SHA from version number when downloading pill (#133) ## Description Resolves #126.	2023-02-17 10:01:17 -05:00
Josh Lehman	6efa7cb17a	build: fix broken import for macos-x86_64 (#147) These were the most minimal set of changes that allowed me to build vere on macOS x86-64. See #131 for context. To build, I ran `bazel build --clang_version="12.0.0" :urbit`.	2023-02-17 11:43:54 -03:00
Philip Monk	563e9f312d	fix -Y filename Fixes #222	2023-02-13 13:51:26 -07:00
Josh Lehman	7098eb8825	chop: offline event log truncation (#165) ## `chop` `urbit chop <pier>` implements a simple, offline event log truncation[^1] tool. `chop` gracefully stops the given pier (if running), backs up the current snapshot to `<pier>/.urb/bhk`, makes sure a current snapshot exists (i.e., is fully written to disk in `chk/*.bin` with no existing patch files), reads the metadata and the last event from the pier's event log, initializes a fresh event log in the `<pier>/.urb/log/chop` directory, writes the metadata and last event from the original log into the fresh one, renames the original event log to `<pier>/.urb/log/chop/data_<first>_<last>.mdb.bak` where `first` and `last` are the first and last event numbers from the event log, and exits. Pilots are then free to move, archive, or delete their `.bak` event log file, resume normal operation of their ship, and enjoy the many benefits of lowered disk pressure and any reductions in associated hosting costs. I've tested `chop` successfully on my own planet `~mastyr-bottec` (multiple times), three different comets (all fresh), and multitudes of fake galaxies. Resolves #122. Note: `knit`, which is the "undo" button for `chop`, is being implemented in its own PR #184. [^1]: https://roadmap.urbit.org/project/event-log-truncation	2023-02-09 07:01:05 -08:00
Matthew LeVan	abbdbe3701	fix that	2023-02-08 13:49:29 -05:00
Matthew LeVan	f166f39d1c	warning message	2023-02-08 13:48:49 -05:00
Matthew LeVan	025cf47505	warning message	2023-02-08 13:45:12 -05:00
Matthew LeVan	8606792c54	more error checking	2023-02-08 13:23:19 -05:00
Matthew LeVan	44a903af7c	add more error checking	2023-02-08 09:16:24 -05:00
Josh Lehman	01946506e8	Eval improvements (#155)	2023-02-08 06:16:05 -08:00
Amadeo Bellotti	8505fe6fc5	changed `u3v_wish_n()` to be a `u3m_soft()` call as well as the additional housekeeping needed	2023-02-07 16:14:31 -05:00
Amadeo Bellotti	1da48e9188	more formatting nits	2023-02-07 09:29:33 -05:00
Amadeo Bellotti	4d0ea7e503	formatting nits and variable name changes	2023-02-07 09:25:46 -05:00
Amadeo Bellotti	195646160e	removed whitespace	2023-02-06 20:46:33 -05:00
Amadeo Bellotti	c302614d06	removed whitespace	2023-02-06 20:45:54 -05:00
Peter McEvoy	47ba262383	Implement simple replay command (#192) Co-authored-by: Peter McEvoy <git@mcevoypeter.com>	2023-02-06 16:00:42 -05:00
Amadeo Bellotti	cdea3edbda	added info on u3c_wish_n	2023-02-06 15:56:02 -05:00
Amadeo Bellotti	4b0221d170	Made changes that ~barter-simsum asked for	2023-02-06 15:53:06 -05:00
Matthew LeVan	f725ea46fd	add more instructions on completion	2023-02-06 12:56:29 -05:00
Matthew LeVan	44c3b0c37c	check if a current snapshot exists; move `u3m_stop()` to end of function	2023-02-06 12:41:09 -05:00
Matthew LeVan	5707981ae4	remove `u3e_curr()` (unnecessary)	2023-02-06 12:40:37 -05:00
Matthew LeVan	d5ae71115c	add warning printf	2023-02-03 15:09:44 -05:00
Matthew LeVan	4806b36c0c	don't leak memory when patches exist	2023-02-02 20:43:50 -05:00
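
The pointer compression commits listed above (`DWORD aligned heap allocs`, `pointer compression on 4G loom`, `4G -> 8G loom`, `specify up to 33 bits (8G) with --loom`) trade alignment for address range. The following is a minimal sketch of that idea, not the vere implementation. It assumes a 32-bit reference whose offset field is 30 bits wide and counts 32-bit words, and that aligning every allocation to 2^n words frees n low bits that need not be stored. All names (`REF_OFFSET_BITS`, `loom_bytes`, `compress_off`, `expand_off`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All names and constants here are hypothetical, chosen for illustration. */
#define REF_OFFSET_BITS 30u  /* assumed width of the offset field in a reference */
#define WORD_BYTES       4u  /* offsets are assumed to count 32-bit words */

/* Bytes addressable when every allocation is aligned to 2^virt_bits words. */
static inline uint64_t loom_bytes(unsigned virt_bits) {
  return ((uint64_t)1 << (REF_OFFSET_BITS + virt_bits)) * WORD_BYTES;
}

/* Compress a word offset by dropping the low bits that alignment forces to 0. */
static inline uint32_t compress_off(uint64_t word_off, unsigned virt_bits) {
  /* the offset must be aligned, otherwise the dropped bits would be lost */
  assert((word_off & (((uint64_t)1 << virt_bits) - 1)) == 0);
  /* the shifted offset must still fit in the offset field */
  assert((word_off >> virt_bits) < ((uint64_t)1 << REF_OFFSET_BITS));
  return (uint32_t)(word_off >> virt_bits);
}

/* Recover the full word offset from the stored field. */
static inline uint64_t expand_off(uint32_t field, unsigned virt_bits) {
  return (uint64_t)field << virt_bits;
}
```

Under these assumptions, `loom_bytes(0)` is 4 GiB and `loom_bytes(1)` is 8 GiB, which lines up with the 33-bit `--loom` limit mentioned in commit `8c0cc71127`. Passing the shift as a parameter mirrors the runtime-determined variant benchmarked in `c8be121455`; a compile-time constant would correspond to the other variant.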

2201 Commits