Dmitry Smirnov
b1ace63607
codespell: spelling corrections
2021-05-06 01:52:55 +10:00
Matt Wells
0b5f417349
if old title rec was corrupted we would get a random docid
when re-spidering the url causing some chaos. now things
should return to normal and we should overwrite the corrupted
titlerec on the next spidering. also, no longer do robots.txt
titlerec lookups. silly.
2016-03-15 23:26:57 -07:00
Matt
296651d416
fix getLeastLoadedInShard() to only return
the appropriate nospider/noquery hosts when using
nospider/noquery in hosts.conf.
2015-11-16 09:53:40 -07:00
Zak Betz
1351d9f994
Code cleanup.
2015-11-09 09:01:20 -07:00
Zak Betz
ea139a65e6
Warc stream busy loop fixes.
Load balance msg22 to the one with the least outstanding requests.
2015-10-15 22:30:07 -06:00
Matt
47795f4c70
fix core
2015-09-07 12:14:10 -06:00
mwells
87285ba3cd
use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
4e8a42e024
text replacements for bad int32_t substitutions
2014-11-17 18:24:38 -08:00
Matt
4c19453ea9
working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
mwells
77ead7819b
fix
2014-09-20 07:59:41 -06:00
mwells
1148be91b1
log msgs useful for debug
2014-09-19 17:06:35 -07:00
Matt Wells
56af753c3e
fixed nasty bug of resetting RdbBases for
random collnums, causing data loss and corruption.
2014-06-09 10:16:29 -07:00
Matt Wells
48df53e74f
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Conflicts:
Msg22.cpp
2014-05-14 07:48:23 -07:00
Matt Wells
0242fe88ff
try to fix msg22 based cores
2014-05-14 07:46:32 -07:00
Matt Wells
88eb44827f
fix avail docid logic some more for indexing
spider replies
2014-05-13 21:27:05 -07:00
Matt Wells
015b9d4597
fix oopsy
2014-05-13 21:10:34 -07:00
Matt Wells
0905fc48c1
fix bug in getAvailDocId()
2014-05-13 20:10:03 -07:00
Matt Wells
eb49094343
try to start indexing spider replies
as regular search results in the index so
you can query on those. get histograms of
spider status msgs, etc. ability to turn
that and images on/off.
2014-05-09 11:18:24 -07:00
Matt Wells
27e8e810d2
use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
45cb5c9a0c
fix bugs to try to get sharding working
on crawlbot today
2014-01-21 13:58:21 -08:00
Matt Wells
fe97e08281
move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00