Matt Wells
b13f3d24d7
replaced unsigned long long with uint64_t
2014-10-30 13:30:39 -06:00
mwells
10f897e5be
use gbsystem() not system() so it can turn off alarms
...
since it forks.
2014-09-11 05:01:55 -07:00
mwells
d9ae010371
shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed.
...
got them working again multicasting a msg 0x39 to the appropriate shard.
set special msg39request flag for better performance for those guys.
2014-07-07 12:32:27 -07:00
Matt Wells
98b317b421
Merge branch 'diffbot-testing' into diffbot-matt
...
Conflicts:
Parms.cpp
Query.cpp
2014-06-27 17:23:03 -07:00
Matt Wells
e9ff8c48d8
try to remove the sluggishness from
...
all hosts... should really reduce load.
2014-06-25 17:46:28 -07:00
mwells
a09d4cd723
Merge branch 'master' into diffbot-matt
...
Conflicts:
Collectiondb.cpp
Pages.cpp
XmlDoc.cpp
gb.conf
2014-06-20 09:35:39 -07:00
mwells
494c43d5dd
fix gb execution in main.cpp::getcwd2() function.
2014-06-19 06:03:11 -07:00
mwells
584af942d4
Merge branch 'testing' into diffbot-matt
...
Conflicts:
Collectiondb.cpp
Make.depend
Parms.cpp
2014-06-16 20:42:28 -07:00
Matt Wells
549f8eb5bc
fix bug in hosts.conf when expanding working dir.
2014-06-16 11:32:10 -07:00
mwells
4a2717a88f
Merge branch 'diffbot-testing' into diffbot-matt
2014-06-09 12:42:54 -07:00
mwells
d57ce8a2df
simplify compilation more. remove clones()
2014-06-07 14:26:11 -07:00
mwells
a1f1daad16
Merge branch 'master' into diffbot-matt
...
Conflicts:
Spider.cpp
2014-06-03 11:41:46 -07:00
mwells
a811462d5f
spider proxy stuff compiles now
2014-05-30 15:05:00 -07:00
Matt Wells
b0f9227bbc
path fixes for gb startup
2014-05-25 10:28:13 -04:00
Matt Wells
037067170c
fix for symlinks in host paths in hosts.conf
2014-05-12 20:50:11 -07:00
Matt Wells
5f7bbe7523
fix diffbot smoke tests. do not index spider replies
...
for custom crawls.
2014-05-12 15:14:11 -07:00
mwells
a9dc18c866
fix more bugs.
2014-05-11 19:44:41 -07:00
mwells
c3a1c674c3
now we run gb without a hostid.
...
we use its path and the local ip to identify its
hostid # in the hosts.conf.
2014-05-11 19:36:24 -07:00
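Note on c3a1c674c3: the lookup described above (matching the process's working directory and local IP against hosts.conf) could look roughly like the sketch below. This is illustrative only and is not taken from the gb source; HostEntry, normalizeDir and findHostId are hypothetical names, and the real hosts.conf parser and host records are not shown in this log.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of one hosts.conf entry (assumption:
// the real record carries more fields than these three).
struct HostEntry {
    int hostId;
    std::string ip;
    std::string workingDir;
};

// Strip trailing slashes so "/a/b/" and "/a/b" compare equal.
static std::string normalizeDir(std::string dir) {
    while (dir.size() > 1 && dir.back() == '/') dir.pop_back();
    return dir;
}

// Return the id of the first entry whose ip and working dir both match
// the running process, or -1 if no entry matches.
int findHostId(const std::vector<HostEntry>& hosts,
               const std::string& localIp,
               const std::string& cwd) {
    const std::string want = normalizeDir(cwd);
    for (const HostEntry& h : hosts) {
        if (h.ip == localIp && normalizeDir(h.workingDir) == want) {
            return h.hostId;
        }
    }
    return -1;
}
```

Two instances on the same machine are told apart by their working directory; two machines sharing a directory layout are told apart by IP. Symlinked paths are not resolved here, which a later commit (037067170c) addresses in the real code.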
mwells
463dc2159f
more make install updates
2014-05-11 17:02:15 -07:00
mwells
2b37f56e4c
Merge branch 'diffbot-matt' into testing
2014-05-10 07:56:45 -07:00
mwells
f19014cc6c
fixed missing /
2014-05-10 06:39:36 -07:00
Matt Wells
9edd5c8264
thumbnail generation support back in.
2014-04-24 10:13:45 -07:00
mwells
72dc660598
Merge branch 'testing' into diffbot-matt
...
Conflicts:
Collectiondb.cpp
HttpRequest.h
PageBasic.cpp
coll.main.0/coll.conf
2014-04-09 11:18:39 -07:00
mwells
1b5c6a6278
create hosts.conf into cwd if not there.
...
pretty up logging system.
update admin.html
2014-04-06 21:12:52 -07:00
mwells
23e5a94ddf
move log file in the binary itself now.
2014-04-06 14:02:51 -07:00
Matt Wells
a6b7e088f5
take out tfndb, unused. fix core
...
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
e8a6d8f345
fix another core from freeing wrong byte sized
...
crawl info reply.
2014-01-30 20:16:41 -08:00
Matt Wells
3a6a271dd9
make crawl sync bug fixes.
...
fix Puz crawl from dying out on host 9
because spider reply did not resuscitate waiting
tree for its ip.
fix mike's zola crawl with a repeat of 3 days
from not incrementing the round because it had
maxrounds 0, which means to ignore... assume 0
means to ignore now. send out 0xc1 crawl info
requests to even dead hosts so we can at least use
their last known good info.
2014-01-25 13:47:03 -08:00
Matt Wells
e3f769dffe
fixes for sudden revitalization of dead crawls.
2014-01-25 11:03:15 -08:00
Matt Wells
4606e88721
code cleanups.
...
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
d091c7e959
fix hostsinagreement bug
2014-01-14 11:24:32 -08:00
Matt Wells
8a49e87a61
got code with shard rebalancing compiling.
...
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
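Note on 8a49e87a61: the idea of a per-key "sharded by termid" bit can be sketched as below. This is a minimal illustration under stated assumptions, not the posdb implementation: IndexKey and pickShard are hypothetical names, the real posdb key is a packed binary key whose layout is not shown in this log, and plain modulo stands in for whatever shard mapping gb actually uses.

```cpp
#include <cstdint>

// Hypothetical unpacked view of an index key (assumption: the real key
// stores these as bit fields).
struct IndexKey {
    uint64_t termId;
    uint64_t docId;
    bool shardByTermId;  // the "sharded by termid" bit
};

// Choose the shard that owns this key. Keys with the bit set are routed
// by termId, so every record for that term lands on one shard; all other
// keys are routed by docId.
uint32_t pickShard(const IndexKey& k, uint32_t numShards) {
    const uint64_t routeBy = k.shardByTermId ? k.termId : k.docId;
    return static_cast<uint32_t>(routeBy % numShards);
}
```

With the bit set, a dup check for a given checksum term only has to query the single shard that pickShard returns, rather than seeking on every host in the cluster.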
Matt Wells
f64b53bfb3
almost done with rebalancing code
2014-01-10 14:12:58 -08:00
Matt Wells
141a76c322
try localhosts.conf before hosts.conf
2013-12-26 09:32:22 -08:00
Matt Wells
f7e7acb398
minor log msg updates.
...
updated admin.html to give some performance and
storage capacity info.
2013-12-09 23:16:24 -07:00
Matt Wells
fb7096dc5d
num-mirrors: updates
2013-10-24 14:59:35 -07:00
Matt Wells
f65a2fd625
support num-mirrors: instead of index-splits:
...
directive.
2013-10-24 14:32:56 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
...
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
Matt Wells
ddbacab12f
fix shard mapping of spiderdb.
2013-10-08 16:35:37 -07:00
Matt Wells
a76e8e42c3
fix json parsing oopsy.
2013-10-08 16:28:25 -07:00
Matt Wells
fe97e08281
move from groups to shards. got rid of annoying
...
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
mwells
6c2c9f7774
trying to bring back dmoz integration.
2013-10-02 22:34:21 -06:00
mwells
9730e5f3ef
fix lost spiders from updating crawl info.
...
fix maxspidersperip limitation not being obeyed.
removed fakedb.
only add "0" time waiting tree keys to waiting tree.
only scanSpiderdb() will change their times to
a future time or add them to doledb directly.
confirmLockAcquisition() will not add to waitingtree
if max spiders per ip limit would be exceeded.
an incoming spider reply will trigger the add to
waiting tree with a time of "0".
2013-09-28 13:12:33 -06:00
mwells
b90ef3de0d
more spider fixes. right after getting lock,
...
use msg12 to remove rec from doledb/doleiptable
and add 0 entry to waiting table so doledb is
again immediately repopulated with that firstIp
so we can spider multiple urls from the same ip
at the same time.
2013-09-23 20:25:28 -06:00
mwells
4d33737ac1
fakedb fixes
2013-09-23 08:19:54 -07:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
mwells
be7aab78b7
Fixed bugs with running a proxy.
...
Added more comments into hosts.conf.
2013-08-08 14:41:38 -06:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00