Commit Graph

71 Commits

Author SHA1 Message Date
Matt Wells
8ac691f324 fix merging getting clogged by so many
collections trying to merge tagdb at once
2014-06-05 21:27:33 -07:00
Matt Wells
4298e4e752 sanity checks for debugging duplicate
titledb file bug.
2014-06-04 12:15:12 -07:00
mwells
45b8bb3421 log msg cleanups 2014-05-11 21:55:44 -07:00
mwells
6e922722da tree repair logic. 2014-05-10 12:32:01 -07:00
mwells
7e1429cc30 more bug fixes 2014-05-10 08:22:26 -07:00
mwells
8e381504a1 fix makeTrashDir() 2014-05-10 08:02:46 -07:00
mwells
2b37f56e4c Merge branch 'diffbot-matt' into testing 2014-05-10 07:56:45 -07:00
mwells
ed816b2c11 a few bug fixes 2014-05-10 07:48:23 -07:00
mwells
81369b786c make trash dir for image thumbs automatically 2014-04-29 17:01:48 -06:00
Matt Wells
d4302e3301 fix core 2014-03-18 11:12:50 -07:00
Matt Wells
bd4484db3c Merge branch 'testing' into diffbot-testing 2014-03-10 12:08:23 -07:00
Matt Wells
624c1d4e68 nuke doledb fixes 2014-03-08 10:51:15 -07:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
a6b7e088f5 take out tfndb, unused. fix core
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
32526a9b25 more checksum fixes for json. fixes for
repair/rebuild procedure.
2014-02-16 10:46:41 -08:00
Matt Wells
106077c163 fix spiderrequest deduping some more 2014-02-06 09:47:18 -08:00
Matt Wells
4029b0b937 more faster spider fixes. tried to fix
corrupt rdbcache.
2014-02-06 09:25:27 -08:00
Matt Wells
ecc10c2cb9 dup cache fixes. do not add dups to spiderdb either. 2014-02-05 14:09:35 -08:00
Matt Wells
4606e88721 code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
980d63632a more msg5 re-read fixes.
stop re-reading if increasing minrecsizes did nothing.
fix tight merges so they work over all colls.
fix merge counting to be fast and not loop over
all rdbbases which could be thousands.
add num mirrors to rebalance.txt.
fix updateCrawlInfo to wait for all replies. critical error!
2014-01-16 13:38:22 -08:00
Matt Wells
f8c2329bd2 rebalancer fixes 2014-01-15 15:42:59 -08:00
Matt Wells
8a49e87a61 got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
c0447de3a1 watch out for NULL "base" after a coll delete. 2013-12-29 01:32:40 -08:00
Matt Wells
d8a9a3f4e3 fix parm sync code some more.
added localhosts.conf to the 'gb install' dist.
2013-12-27 14:00:37 -08:00
Matt Wells
048b715962 if coll is deleted or reset in the middle of a dump
or merge then stop the dump/merge with ENOCOLLREC
error. avoid calling "base->" functions since it
could be NULL if deleted.
2013-12-25 17:12:09 -08:00
Matt Wells
3f19ece776 parmdb updates 2013-12-16 17:07:15 -08:00
Matt Wells
617a0ff76e parmdb fixes 2013-12-16 16:04:43 -08:00
Matt Wells
6c652c1cc6 more parmdb fixes 2013-12-16 15:39:24 -08:00
mwells
76bb3d05e1 clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
mwells
82494baa89 move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
mwells
f2d5661965 parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
Matt Wells
06edfddf31 a bunch of bug fixes, mostly spider related.
also some for pagereindex.
2013-12-07 21:56:37 -07:00
Matt Wells
fe1a7d1a75 rdbbase not fully resetting? it was
trying to dump to coll directories that
had been moved to trash folder.
and printing out "deleted from under us".
at least it was corrupting data in RdbMem
this time because i added m_dumpErrno logic.
2013-11-15 09:01:58 -08:00
Matt Wells
eb719849a6 do not core on this dump error 2013-11-13 19:04:22 -08:00
Matt Wells
a31b13ad61 fix a few bugs. 2013-11-13 13:27:22 -08:00
Matt Wells
3afac4812d fix bug of trying to del/reset coll while
disable writing was engaged. we already
had it check to see if tree was saving,
but not if writes were disabled. so it
gets ETRYAGAIN and retries later.
2013-11-10 09:40:32 -08:00
Matt Wells
396a88799a fix bad bug of basically emptying out all our data
on auto-save!
2013-11-06 19:49:20 -08:00
Matt Wells
0655160c26 fixed quite a few nasty bugs.
collectionrec neg/pos key counting overruns.
2013-11-06 15:44:50 -08:00
Matt Wells
b83dd59913 fix bug when we nuke a collnum
from a tree right in the middle of when
saving rdb trees in process.cpp.
2013-10-30 12:27:08 -07:00
Matt Wells
2d413578f2 track down some nasty cores. fix
for waiting tree out of sync.
2013-10-29 16:37:14 -07:00
Matt Wells
240da39873 Merge branch 'master' into diffbot 2013-10-25 12:32:02 -07:00
Matt Wells
605289e130 fix a couple collection related bugs
causing cores in crawlbot.
2013-10-21 11:38:33 -07:00
Matt Wells
54915dc384 fix data corruption in RdbMem buffer
when running with threads disabled.
2013-10-19 19:37:29 -07:00
Matt Wells
889583ec4b now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
Matt Wells
b589b17e63 fix collection resetting. 2013-10-18 15:21:00 -07:00
Matt Wells
57ee9739e5 fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3374ce450a fix a couple catdb generation bugs.
MAX_CATIDS violation causing corruption.
not saving catdb tree to catdb-saved.dat
causing missing catdb recs.
2013-10-12 20:33:04 -07:00
mwells
71d5d05f7c use catdb/ subdir not cat/ for consistency. 2013-10-04 21:35:13 -06:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00