Commit Graph

15 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Matt | 09de59f026 | do not store cblock, etc. tags into tagdb to save disk space. added tagdb file cache for better performance, less disk accesses. will help reduce disk load. put file cache sizes in master controls and if they change then update the cache size dynamically. | 2015-09-10 12:46:00 -06:00 |
| Matt | 4e8a42e024 | text replacements for bad int32_t substitutions | 2014-11-17 18:24:38 -08:00 |
| Matt | 931a1c4bc6 | good checkpoint. quite a few fixes. | 2014-11-17 18:13:36 -08:00 |
| Matt | 96b8197ad3 | now it compiles with -m32 | 2014-11-10 14:45:11 -08:00 |
| Matt Wells | e7dd8f7956 | replace long long with int64_t | 2014-10-30 13:36:39 -06:00 |
| Matt Wells | b13f3d24d7 | replaced unsigned long long with uint64_t | 2014-10-30 13:30:39 -06:00 |
| mwells | 950352d781 | do not hash redundant xpaths that have the same inner sentence/alnum html as their children tags. waste of index space. | 2014-07-09 17:16:01 -07:00 |
| mwells | b231bc8042 | incorporate total # of docs with that xpathsitehash into the tag attr. so using the MxDy should be good enough to determine if something is chrome or not. | 2014-07-09 16:47:47 -07:00 |
| mwells | c8567f8a24 | sectioning stuff working halfway decent. still need to do docid-based stats perhaps. need to scroll to section hash when clicking the 'sections' link. | 2014-07-07 16:46:38 -07:00 |
| mwells | c314e61968 | make sectiondb stats just a special case of facets | 2014-06-17 16:39:02 -06:00 |
| mwells | d71922168e | facetize the sectiondb stuff | 2014-06-16 20:40:35 -07:00 |
| mwells | 6f70282ba2 | almost got sectiondb integration compiling | 2014-06-11 17:24:58 -07:00 |
| Matt Wells | 4f64677b4f | get new global preemptive cache logic compiling, with section voting stats. | 2014-01-05 11:51:09 -08:00 |
| mwells | 183b7c372e | make sections grow dynamically so we do not OOM when trying to index a gbdmoz.urls.txt.* file which can be 25MB. | 2013-10-06 11:04:10 -06:00 |
| Matt Wells | f6e560c1f4 | Initial file population. | 2013-08-02 13:12:24 -07:00 |