Matt
1e8f656d30
Merge branch 'diffbot-testing' into ia-zak
2015-09-25 08:23:42 -06:00
Matt
8a0461b82f
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-09-24 09:10:37 -06:00
Matt
d92b153090
added 'verify writes' switch to track down data corruption
2015-09-24 09:10:20 -06:00
Matt Wells
3dcaf414db
report bytes saved to disk.
if thread crashes try to dump core.
2015-09-23 15:40:30 -07:00
Matt Wells
ba8ebc7794
Revert "data corruption fixes"
This reverts commit 27172945c7.
2015-09-23 14:38:17 -07:00
Matt Wells
27172945c7
data corruption fixes
2015-09-23 14:34:52 -07:00
Matt
100888d691
fix file/dir creation permissions bugs
2015-09-21 12:44:41 -06:00
Matt
74cde33a3a
just use the user's umask val for all file/dir creation
2015-09-21 11:33:38 -06:00
Matt
ce7b06fc4d
all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
Matt
f1b0bd0149
quick fix for tree sanity checker
2015-07-15 09:46:27 -06:00
Matt Wells
0d1acb09bc
try to fix tree if corruption detected when dumping to disk
2015-07-14 22:27:43 -06:00
mwells
692c2932e8
fixed bug of gb not saving
2015-02-22 13:11:20 -07:00
Matt
24eac820d5
fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
c8fb1af5c4
added tree mem reclaimer for doledb since it
is now a tree-only rdb.
2015-02-12 12:12:25 -08:00
mwells
87285ba3cd
use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
4e8a42e024
text replacements for bad int32_t substitutions
2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6
good checkpoint. quite a few fixes.
2014-11-17 18:13:36 -08:00
Matt
4c19453ea9
working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
Matt Wells
d2b1196a85
Merge branch 'diffbot-testing' into testing
2014-07-22 10:47:33 -07:00
Matt Wells
72883dd340
fix a core when deleting a coll while
saving its doledb. fix it right actually.
2014-07-20 20:20:45 -07:00
Matt Wells
a0addd4000
try to fix spiders not going.
try to fix another core.
2014-07-17 13:48:43 -07:00
mwells
43d0d636ee
fix dmoz building.
2014-07-05 22:20:15 -07:00
mwells
1d2b234831
quick fix for core
2014-05-12 07:32:05 -07:00
mwells
6e922722da
tree repair logic.
2014-05-10 12:32:01 -07:00
mwells
2b37f56e4c
Merge branch 'diffbot-matt' into testing
2014-05-10 07:56:45 -07:00
mwells
ed816b2c11
a few bug fixes
2014-05-10 07:48:23 -07:00
Matt Wells
e21e0a404c
fixed bug for product title extraction.
titledb-saved.dat tree loop corruption bug.
no main coll bug.
put the ajax widget on spider status page so you can
see spider going in realtime. will give customers
a good idea of the spider moving along.
more widget fixes, to use new base64 thumbs, etc.
2014-04-28 13:30:24 -07:00
Matt Wells
f9dbd64056
get streaming time sliced results working
2014-02-06 14:25:44 -08:00
Matt Wells
e351cb9939
free spidercolls on exit
2014-01-22 23:52:23 -08:00
Matt Wells
066d910934
try to fix rebalancing some more.
2014-01-21 22:39:01 -08:00
Matt Wells
31cb71214c
more rdbtree fixes when invalid
collections are in there
2014-01-21 20:00:34 -08:00
Matt Wells
33c5d9c07f
a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
92047661ae
fix annoying rdbtree pos/neg key counting issue
2014-01-11 18:04:28 -08:00
Matt Wells
ec4d77f00a
make waiting trees grow dynamically to save
space. was taking like 1.5GB of ram for
like 100 collections or so.
2013-11-19 15:23:25 -08:00
Matt Wells
0655160c26
fixed quite a few nasty bugs.
collectionrec neg/pos key counting overruns.
2013-11-06 15:44:50 -08:00
Matt Wells
2d413578f2
track down some nasty cores. fix
for waiting tree out of sync.
2013-10-29 16:37:14 -07:00
Matt Wells
0e4d96b3f8
added "seeds" to json reply. store seed urls
(and dedup them) in collrec. fixed some respidering
issues. any time we re-enter url filters
then rebuild the waiting tree.
2013-10-21 17:35:14 -07:00
Matt Wells
978910ca7a
fix more bugs.
2013-10-21 14:17:32 -07:00
Matt Wells
605289e130
fix a couple collection related bugs
causing cores in crawlbot.
2013-10-21 11:38:33 -07:00
Matt Wells
85bca4f3d1
can now delete collection while spiders are out
2013-10-18 18:11:14 -07:00
Matt Wells
889583ec4b
now we can reset collection mid stream
2013-10-18 17:49:36 -07:00
Matt Wells
ecab57ff0f
change collnum of reset collection
so any adds in progress will fail.
2013-10-18 15:46:00 -07:00
mwells
321f5cf938
quite a few fixes. something still
overwrite CollectionRec::m_overflow/m_overflow2...
2013-09-27 21:00:40 -06:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916
removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16bits, 15bits unsigned, which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00