11 Commits

Author SHA1 Message Date
Matt Wells
e351d2a6f1 get searching on token working 2014-03-06 17:01:41 -08:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
ceb623bb8f do not dedup bulks.
only respider urls if error is tmp.
mess with msg1 in spider.cpp so niceness
is MAX_NICENESS and not 0 because it was
not able to trigger a doledb dump.
2014-02-23 20:04:46 -08:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3bc85cf528 a few cleanups for the new dmoz code. 2013-10-13 16:48:59 -07:00
mwells
612f2872f7 use addurl to add the gbdmoz url
files to gigablast. it should index
just those dmoz urls, and not spider their links.
it should ignore external errors like
ETCPTIMEDOUT when indexing so it will be
identical to dmoz.
2013-10-05 23:22:51 -06:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
mwells
a0c79932bb catdb is now generated successfully. 2013-10-02 23:36:49 -06:00
mwells
942379427e log fixes for debugging. try to
stop spammy log msgs.
2013-10-02 22:37:20 -06:00
mwells
b16d8519fc more spider fixes. still need more speedups
when spidering multiple spiders on same ip.
2013-09-24 16:40:14 -06:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00