Matt Wells
8ac691f324
fix merging getting clogged by so many
collections trying to merge tagdb at once
2014-06-05 21:27:33 -07:00
Matt Wells
4298e4e752
sanity checks for debugging duplicate
titledb file bug.
2014-06-04 12:15:12 -07:00
mwells
45b8bb3421
log msg cleanups
2014-05-11 21:55:44 -07:00
mwells
6e922722da
tree repair logic.
2014-05-10 12:32:01 -07:00
mwells
7e1429cc30
more bug fixes
2014-05-10 08:22:26 -07:00
mwells
8e381504a1
fix makeTrashDir()
2014-05-10 08:02:46 -07:00
mwells
2b37f56e4c
Merge branch 'diffbot-matt' into testing
2014-05-10 07:56:45 -07:00
mwells
ed816b2c11
a few bug fixes
2014-05-10 07:48:23 -07:00
mwells
81369b786c
make trash dir for image thumbs automatically
2014-04-29 17:01:48 -06:00
Matt Wells
d4302e3301
fix core
2014-03-18 11:12:50 -07:00
Matt Wells
bd4484db3c
Merge branch 'testing' into diffbot-testing
2014-03-10 12:08:23 -07:00
Matt Wells
624c1d4e68
nuke doledb fixes
2014-03-08 10:51:15 -07:00
Matt Wells
27e8e810d2
use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
a6b7e088f5
take out tfndb, unused. fix core
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
32526a9b25
more checksum fixes for json. fixes for
repair/rebuild procedure.
2014-02-16 10:46:41 -08:00
Matt Wells
106077c163
fix spiderrequest deduping some more
2014-02-06 09:47:18 -08:00
Matt Wells
4029b0b937
more faster spider fixes. tried to fix
corrupt rdbcache.
2014-02-06 09:25:27 -08:00
Matt Wells
ecc10c2cb9
dup cache fixes. do not add dups to spiderdb either.
2014-02-05 14:09:35 -08:00
Matt Wells
4606e88721
code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
980d63632a
more msg5 re-read fixes.
stop re-reading if increasing minrecsizes did nothing.
fix tight merges so they work over all colls.
fix merge counting to be fast and not loop over
all rdbbases which could be thousands.
add num mirrors to rebalance.txt.
fix updateCrawlInfo to wait for all replies. critical error!
2014-01-16 13:38:22 -08:00
Matt Wells
f8c2329bd2
rebalancer fixes
2014-01-15 15:42:59 -08:00
Matt Wells
8a49e87a61
got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
c0447de3a1
watch out for NULL "base" after a coll delete.
2013-12-29 01:32:40 -08:00
Matt Wells
d8a9a3f4e3
fix parm sync code some more.
added localhosts.conf to the 'gb install' dist.
2013-12-27 14:00:37 -08:00
Matt Wells
048b715962
if coll is deleted or reset in the middle of a dump
or merge then stop the dump/merge with ENOCOLLREC
error. avoid calling "base->" functions since it
could be NULL if deleted.
2013-12-25 17:12:09 -08:00
Matt Wells
3f19ece776
parmdb updates
2013-12-16 17:07:15 -08:00
Matt Wells
617a0ff76e
parmdb fixes
2013-12-16 16:04:43 -08:00
Matt Wells
6c652c1cc6
more parmdb fixes
2013-12-16 15:39:24 -08:00
mwells
76bb3d05e1
clean up logging so i can see what's going on
2013-12-10 16:41:30 -08:00
mwells
82494baa89
move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
mwells
f2d5661965
parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
Matt Wells
06edfddf31
a bunch of bug fixes, mostly spider related.
also some for pagereindex.
2013-12-07 21:56:37 -07:00
Matt Wells
fe1a7d1a75
rdbbase not fully resetting? it was
trying to dump to coll directories that
had been moved to trash folder.
and printing out "deleted from under us".
at least it was corrupting data in RdbMem
this time because i added m_dumpErrno logic.
2013-11-15 09:01:58 -08:00
Matt Wells
eb719849a6
do not core on this dump error
2013-11-13 19:04:22 -08:00
Matt Wells
a31b13ad61
fix a few bugs.
2013-11-13 13:27:22 -08:00
Matt Wells
3afac4812d
fix bug of trying to del/reset coll while
disable writing was engaged. we already
had it check to see if tree was saving,
but not if writes were disabled. so it
gets ETRYAGAIN and retries later.
2013-11-10 09:40:32 -08:00
Matt Wells
396a88799a
fix bad bug of basically emptying out all our data
on auto-save!
2013-11-06 19:49:20 -08:00
Matt Wells
0655160c26
fixed quite a few nasty bugs.
collectionrec neg/pos key counting overruns.
2013-11-06 15:44:50 -08:00
Matt Wells
b83dd59913
fix bug when we nuke a collnum
from a tree right in the middle of when
saving rdb trees in process.cpp.
2013-10-30 12:27:08 -07:00
Matt Wells
2d413578f2
track down some nasty cores. fix
for waiting tree out of sync.
2013-10-29 16:37:14 -07:00
Matt Wells
240da39873
Merge branch 'master' into diffbot
2013-10-25 12:32:02 -07:00
Matt Wells
605289e130
fix a couple collection related bugs
causing cores in crawlbot.
2013-10-21 11:38:33 -07:00
Matt Wells
54915dc384
fix data corruption in RdbMem buffer
when running with threads disabled.
2013-10-19 19:37:29 -07:00
Matt Wells
889583ec4b
now we can reset collection mid stream
2013-10-18 17:49:36 -07:00
Matt Wells
b589b17e63
fix collection resetting.
2013-10-18 15:21:00 -07:00
Matt Wells
57ee9739e5
fix addColl() logic for collectionless rdbs
2013-10-16 14:38:09 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3374ce450a
fix a couple catdb generation bugs.
MAX_CATIDS violation causing corruption.
not saving catdb tree to catdb-saved.dat
causing missing catdb recs.
2013-10-12 20:33:04 -07:00
mwells
71d5d05f7c
use catdb/ subdir not cat/ for consistency.
2013-10-04 21:35:13 -06:00
Matt Wells
fe97e08281
move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00