mwells
1d2b234831
quick fix for core
2014-05-12 07:32:05 -07:00
mwells
6e922722da
tree repair logic.
2014-05-10 12:32:01 -07:00
mwells
2b37f56e4c
Merge branch 'diffbot-matt' into testing
2014-05-10 07:56:45 -07:00
mwells
ed816b2c11
a few bug fixes
2014-05-10 07:48:23 -07:00
Matt Wells
e21e0a404c
fixed bug for product title extraction.
titledb-saved.dat tree loop corruption bug.
no main coll bug.
put the ajax widget on spider status page so you can
see spider going in realtime. will give customers
a good idea of the spider moving along.
more widget fixes, to use new base64 thumbs, etc.
2014-04-28 13:30:24 -07:00
Matt Wells
f9dbd64056
get streaming time sliced results working
2014-02-06 14:25:44 -08:00
Matt Wells
e351cb9939
free spidercolls on exit
2014-01-22 23:52:23 -08:00
Matt Wells
066d910934
try to fix rebalancing some more.
2014-01-21 22:39:01 -08:00
Matt Wells
31cb71214c
more rdbtree fixes when invalid
collections are in there
2014-01-21 20:00:34 -08:00
Matt Wells
33c5d9c07f
a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
92047661ae
fix annoying rdbtree pos/neg key counting issue
2014-01-11 18:04:28 -08:00
Matt Wells
ec4d77f00a
make waiting trees grow dynamically to save
space. was taking like 1.5GB of ram for
like 100 collections or so.
2013-11-19 15:23:25 -08:00
Matt Wells
0655160c26
fixed quite a few nasty bugs.
collectionrec neg/pos key counting overruns.
2013-11-06 15:44:50 -08:00
Matt Wells
2d413578f2
track down some nasty cores. fix
for waiting tree out of sync.
2013-10-29 16:37:14 -07:00
Matt Wells
0e4d96b3f8
added "seeds" to json reply. store seed urls
(and dedup them) in collrec. fixed some respidering
issues. any time we re-enter url filters
then rebuild the waiting tree.
2013-10-21 17:35:14 -07:00
Matt Wells
978910ca7a
fix more bugs.
2013-10-21 14:17:32 -07:00
Matt Wells
605289e130
fix a couple collection related bugs
causing cores in crawlbot.
2013-10-21 11:38:33 -07:00
Matt Wells
85bca4f3d1
can now delete collection while spiders are out
2013-10-18 18:11:14 -07:00
Matt Wells
889583ec4b
now we can reset collection mid stream
2013-10-18 17:49:36 -07:00
Matt Wells
ecab57ff0f
change collnum of reset collection
so any adds in progress will fail.
2013-10-18 15:46:00 -07:00
mwells
321f5cf938
quite a few fixes. something still
overwrite CollectionRec::m_overflow/m_overflow2...
2013-09-27 21:00:40 -06:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916
removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16bits, 15bits unsigned, which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00