Commit Graph

20 Commits

Author SHA1 Message Date
Matt Wells
e21e0a404c fixed bug for product title extraction.
titledb-saved.dat tree loop corruption bug.
no main coll bug.
put the ajax widget on spider status page so you can
see spider going in realtime. will give customers
a good idea of the spider moving along.
more widget fixes, to use new base64 thumbs, etc.
2014-04-28 13:30:24 -07:00
Matt Wells
f9dbd64056 get streaming time sliced results working 2014-02-06 14:25:44 -08:00
Matt Wells
e351cb9939 free spidercolls on exit 2014-01-22 23:52:23 -08:00
Matt Wells
066d910934 try to fix rebalancing some more. 2014-01-21 22:39:01 -08:00
Matt Wells
31cb71214c more rdbtree fixes when invalid
collections are in there
2014-01-21 20:00:34 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
92047661ae fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
Matt Wells
ec4d77f00a make waiting trees grow dynamically to save
space. was taking like 1.5GB of ram for
like 100 collections or so.
2013-11-19 15:23:25 -08:00
Matt Wells
0655160c26 fixed quite a few nasty bugs.
collectionrec neg/pos key counting overruns.
2013-11-06 15:44:50 -08:00
Matt Wells
2d413578f2 track down some nasty cores. fix
for waiting tree out of sync.
2013-10-29 16:37:14 -07:00
Matt Wells
0e4d96b3f8 added "seeds" to json reply. store seed urls
(and dedup them) in collrec. fixed some respidering
issues. any time we re-enter url filters
then rebuild the waiting tree.
2013-10-21 17:35:14 -07:00
Matt Wells
978910ca7a fix more bugs. 2013-10-21 14:17:32 -07:00
Matt Wells
605289e130 fix a couple collection related bugs
causing cores in crawlbot.
2013-10-21 11:38:33 -07:00
Matt Wells
85bca4f3d1 can now delete collection while spiders are out 2013-10-18 18:11:14 -07:00
Matt Wells
889583ec4b now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
Matt Wells
ecab57ff0f change collnum of reset collection
so any adds in progress will fail.
2013-10-18 15:46:00 -07:00
mwells
321f5cf938 quite a few fixes. something still
overwrites CollectionRec::m_overflow/m_overflow2...
2013-09-27 21:00:40 -06:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16 bits (15 bits for non-negative values), which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00