Commit Graph

24 Commits

Author SHA1 Message Date
Dmitry Smirnov
b1ace63607 codespell: spelling corrections 2021-05-06 01:52:55 +10:00
Matt Wells
0b5f417349 if old title rec was corrupted we would get a random docid
when re-spidering the url causing some chaos. now things
should return to normal and we should overwrite the corrupted
titlerec on the next spidering. also, no longer do robots.txt
titlerec lookups. silly.
2016-03-15 23:26:57 -07:00
Matt
296651d416 fix getLeastLoadedInShard() to only return
the appropriate nospider/noquery hosts when using
nospider/noquery in hosts.conf.
2015-11-16 09:53:40 -07:00
Zak Betz
1351d9f994 Code cleanup. 2015-11-09 09:01:20 -07:00
Zak Betz
ea139a65e6 Warc stream busy loop fixes.
Load balance msg22 to the one with the least outstanding requests.
2015-10-15 22:30:07 -06:00
Matt
47795f4c70 fix core 2015-09-07 12:14:10 -06:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
mwells
77ead7819b fix 2014-09-20 07:59:41 -06:00
mwells
1148be91b1 log msgs useful for debug 2014-09-19 17:06:35 -07:00
Matt Wells
56af753c3e fixed nasty bug of resetting RdbBases for
random collnums, causing data loss and corruption.
2014-06-09 10:16:29 -07:00
Matt Wells
48df53e74f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Conflicts:
	Msg22.cpp
2014-05-14 07:48:23 -07:00
Matt Wells
0242fe88ff try to fix msg22 based cores 2014-05-14 07:46:32 -07:00
Matt Wells
88eb44827f fix avail docid logic some more for indexing
spider replies
2014-05-13 21:27:05 -07:00
Matt Wells
015b9d4597 fix oopsy 2014-05-13 21:10:34 -07:00
Matt Wells
0905fc48c1 fix bug in getAvailDocId() 2014-05-13 20:10:03 -07:00
Matt Wells
eb49094343 try to start indexing spider replies
as regular search results in the index so
you can query on those. get histograms of
spider status msgs, etc. ability to turn
that and images on/off.
2014-05-09 11:18:24 -07:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
45cb5c9a0c fix bugs to try to get sharding working
on crawlbot today
2014-01-21 13:58:21 -08:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00