Commit Graph

48 Commits

Author SHA1 Message Date
Matt
1e8f656d30 Merge branch 'diffbot-testing' into ia-zak 2015-09-25 08:23:42 -06:00
Matt
8a0461b82f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-09-24 09:10:37 -06:00
Matt
d92b153090 added 'verify writes' switch to track down data corruption 2015-09-24 09:10:20 -06:00
Matt Wells
3dcaf414db report bytes saved to disk. if thread crashes try to dump core. 2015-09-23 15:40:30 -07:00
Matt Wells
ba8ebc7794 Revert "data corruption fixes" This reverts commit 27172945c7. 2015-09-23 14:38:17 -07:00
Matt Wells
27172945c7 data corruption fixes 2015-09-23 14:34:52 -07:00
Matt
100888d691 fix file/dir creation permissions bugs 2015-09-21 12:44:41 -06:00
Matt
74cde33a3a just use the user's umask val for all file/dir creation 2015-09-21 11:33:38 -06:00
Matt
ce7b06fc4d all files made are now group writable. if you don't like that then you can make a special group and set the directory just group writable for that group using chmod g+s <dir>. 2015-09-21 11:19:34 -06:00
Matt
f1b0bd0149 quick fix for tree sanity checker 2015-07-15 09:46:27 -06:00
Matt Wells
0d1acb09bc try to fix tree if corruption detected when dumping to disk 2015-07-14 22:27:43 -06:00
mwells
692c2932e8 fixed bug of gb not saving 2015-02-22 13:11:20 -07:00
Matt
24eac820d5 fixed bad deletenode call causing dups in winnertree. 2015-02-12 16:12:23 -08:00
Matt
c8fb1af5c4 added tree mem reclaimer for doledb since it is now a tree-only rdb. 2015-02-12 12:12:25 -08:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again since memcpy can't be interrupted and backtrace() called. 2015-01-13 12:25:42 -07:00
Matt
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
4c19453ea9 working with -m32 for basic testing. compiles for 64-bit. 2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
Matt Wells
d2b1196a85 Merge branch 'diffbot-testing' into testing 2014-07-22 10:47:33 -07:00
Matt Wells
72883dd340 fix a core when deleting a coll while saving its doledb. fix it right actually. 2014-07-20 20:20:45 -07:00
Matt Wells
a0addd4000 try to fix spiders not going. try to fix another core. 2014-07-17 13:48:43 -07:00
mwells
43d0d636ee fix dmoz building. 2014-07-05 22:20:15 -07:00
mwells
1d2b234831 quick fix for core 2014-05-12 07:32:05 -07:00
mwells
6e922722da tree repair logic. 2014-05-10 12:32:01 -07:00
mwells
2b37f56e4c Merge branch 'diffbot-matt' into testing 2014-05-10 07:56:45 -07:00
mwells
ed816b2c11 a few bug fixes 2014-05-10 07:48:23 -07:00
Matt Wells
e21e0a404c fixed bug for product title extraction. titledb-saved.dat tree loop corruption bug. no main coll bug. put the ajax widget on spider status page so you can see spider going in realtime. will give customers a good idea of the spider moving along. more widget fixes, to use new base64 thumbs, etc. 2014-04-28 13:30:24 -07:00
Matt Wells
f9dbd64056 get streaming time sliced results working 2014-02-06 14:25:44 -08:00
Matt Wells
e351cb9939 free spidercolls on exit 2014-01-22 23:52:23 -08:00
Matt Wells
066d910934 try to fix rebalancing some more. 2014-01-21 22:39:01 -08:00
Matt Wells
31cb71214c more rdbtree fixes when invalid collections are in there 2014-01-21 20:00:34 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection numbers in it so fix our counting algo in case the collection rec no longer exists! 2014-01-21 19:01:44 -08:00
Matt Wells
92047661ae fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
Matt Wells
ec4d77f00a make waiting trees grow dynamically to save space. was taking like 1.5GB of ram for like 100 collections or so. 2013-11-19 15:23:25 -08:00
Matt Wells
0655160c26 fixed quite a few nasty bugs. collectionrec neg/pos key counting overruns. 2013-11-06 15:44:50 -08:00
Matt Wells
2d413578f2 track down some nasty cores. fix for waiting tree out of sync. 2013-10-29 16:37:14 -07:00
Matt Wells
0e4d96b3f8 added "seeds" to json reply. store seed urls (and dedup them) in collrec. fixed some respidering issues. any time we re-enter url filters then rebuild the waiting tree. 2013-10-21 17:35:14 -07:00
Matt Wells
978910ca7a fix more bugs. 2013-10-21 14:17:32 -07:00
Matt Wells
605289e130 fix a couple collection related bugs causing cores in crawlbot. 2013-10-21 11:38:33 -07:00
Matt Wells
85bca4f3d1 can now delete collection while spiders are out 2013-10-18 18:11:14 -07:00
Matt Wells
889583ec4b now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
Matt Wells
ecab57ff0f change collnum of reset collection so any adds in progress will fail. 2013-10-18 15:46:00 -07:00
mwells
321f5cf938 quite a few fixes. something still overwrites CollectionRec::m_overflow/m_overflow2... 2013-09-27 21:00:40 -06:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited collections, really limited only by sizeof(collnum_t) now, which is 16 bits (signed, so 15 bits of non-negative range), which is the limitation. can always expand this so we can have more than 32k collections. 2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00