Matt Wells
e351d2a6f1
get searching on token working
2014-03-06 17:01:41 -08:00
Matt Wells
27e8e810d2
use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
ceb623bb8f
do not dedup bulks.
only respider urls if error is tmp.
mess with msg1 in spider.cpp so niceness
is MAX_NICENESS and not 0 because it was
not able to trigger a doledb dump.
2014-02-23 20:04:46 -08:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3bc85cf528
a few cleanups for the new dmoz code.
2013-10-13 16:48:59 -07:00
mwells
612f2872f7
use addurl to add the gbdmoz url
files to gigablast. it should index
just those dmoz urls, and not spider their links.
it should ignore external errors like
ETCPTIMEDOUT when indexing so it will be
identical to dmoz.
2013-10-05 23:22:51 -06:00
Matt Wells
fe97e08281
move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
mwells
a0c79932bb
catdb is now generated successfully.
2013-10-02 23:36:49 -06:00
mwells
942379427e
log fixes for debugging. try to
stop spammy log msgs.
2013-10-02 22:37:20 -06:00
mwells
b16d8519fc
more spider fixes. still need more speedups
when spidering multiple spiders on same ip.
2013-09-24 16:40:14 -06:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00