Commit Graph

34 Commits

Author SHA1 Message Date
Matt
f4ca6d8cd4 try domain only urls with www. when looking up
in sitelinks.txt
2015-01-31 15:33:37 -07:00
Matt
cad1d3d076 added support for sitelinks.txt file 2015-01-31 15:18:06 -07:00
Matt
1ef3932b32 use ./gb dump z main 0 -1 1 to generate sitelinks.txt 2015-01-25 18:45:40 -07:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt Wells
7d67f104fb emergency fixes 2014-12-11 08:39:26 -08:00
Matt
0460335861 more permission system updates 2014-12-08 09:49:17 -08:00
Matt
adcef39376 Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
	Collectiondb.cpp
	Collectiondb.h
	Conf.cpp
	Conf.h
	Msg39.cpp
	PageEvents.cpp
	PageResults.cpp
	PageTurk.cpp
	Pages.cpp
	Parms.cpp
	Posdb.cpp
	Proxy.cpp
	Query.cpp
	Query.h
	RdbBase.cpp
	RdbMap.cpp
	Repair.cpp
	Repair.h
	SafeBuf.cpp
	Spider.cpp
	Tagdb.cpp
	TopTree.cpp
	XmlDoc.cpp
	main.cpp
2014-11-20 16:53:07 -08:00
Matt
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
69ef3c14ef fixes for repair/rebuild functionality.
more to come.
2014-11-13 13:04:28 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
Matt Wells
789bb73dd3 when docid is banned do not print json/xml
cruft in the serps. was causing json
parsing errors.
2014-09-19 07:27:33 -07:00
mwells
7f622bd416 fixes for cloud support. 2014-08-31 16:23:11 -07:00
Matt Wells
5d3fd80063 make it so we can dump tagdb to a wget-able
list of urls to re-add tags to another tagdb.
2014-08-23 07:29:40 -07:00
Matt Wells
d6434191d1 nomenclature changes to reduce collisions.
name collection 'qatest123' for doing smoke tests,
not 'test'.
2014-03-31 15:02:17 -07:00
Matt Wells
edbd61b0c5 thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
Matt Wells
bd4484db3c Merge branch 'testing' into diffbot-testing 2014-03-10 12:08:23 -07:00
Matt Wells
e351d2a6f1 get searching on token working 2014-03-06 17:01:41 -08:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
c9ef525338 code checkpoint 2014-02-09 12:55:45 -07:00
Matt Wells
b6c3ecc20e more formatting 2014-01-19 11:56:36 -08:00
Matt Wells
fe3a879758 formatting changes 2014-01-19 00:38:02 -08:00
Matt Wells
4606e88721 code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
8c4ac3c514 Merge branch 'master' into diffbot 2014-01-17 20:17:40 -08:00
Matt Wells
dde05446f5 sharding fixes for 3+ stripes. 2014-01-16 11:20:12 -07:00
Matt Wells
8a49e87a61 got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
f64b53bfb3 almost done with rebalancing code 2014-01-10 14:12:58 -08:00
Matt Wells
1b5057ad42 log cleanups mostly.
took out disk page cache,
kinda buggy... need to fix at some point.
2013-12-18 10:57:18 -08:00
mwells
76bb3d05e1 clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
Matt Wells
9f1d79b124 check for null collrec 2013-12-02 10:13:19 -08:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00