15 Commits

Author SHA1 Message Date
Matt Wells
9b5e3016df fix hosts.conf 2013-12-26 09:34:35 -08:00
Matt Wells
7624a3db0a if url is manually added and it is a simplified redirect
then re-add with the same manually added bit set
in the new spider request, otherwise seed url might
not get spidered since it might not match the regex.
2013-12-26 08:58:56 -08:00
Matt Wells
6cc69106c2 fix hosts.conf 2013-12-23 10:30:45 -08:00
mwells
76bb3d05e1 clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
Matt Wells
263bb8dfbc fix oops 2013-11-05 14:32:56 -08:00
Matt Wells
2b904e9563 include firstip in the spider url lock,
not just uh48, because using fake ips
results in having the same url crawled twice
since it is from a different "firstip" so
we should include "firstip" in the lock as well
to prevent a double round increment.
see comment in Spider.cpp to this effect.
2013-11-05 14:31:05 -08:00
Matt Wells
b22f8d5d19 minor msg update 2013-10-29 15:26:32 -07:00
Matt Wells
54c50c1f3a added "restrictDomain" parm which defaults to 1.
will restrict spidered urls to same domain as
seed urls.
2013-10-29 09:31:57 -07:00
Matt Wells
fb7096dc5d num-mirrors: updates 2013-10-24 14:59:35 -07:00
Matt Wells
f65a2fd625 support num-mirrors: instead of index-splits:
directive.
2013-10-24 14:32:56 -07:00
Matt Wells
91b8921b9e have to use different ports if multiple gb
instances/processes on same server.
2013-10-02 16:12:17 -07:00
mwells
c03e862b99 use a better version of hosts.conf where we
specify the working directory for each host
entry. then we can use the exact same hosts.conf
file for each gb instance rather than having to
change the single "working-dir:" directive for
each instance, in the case where they each have
a different working directory.
2013-10-02 13:11:58 -06:00
mwells
e9297df240 listen on DNS port 5998 not 6000. 6000 seemed
to cause issues on a particular install for
some reason.
2013-08-19 15:02:27 -06:00
mwells
be7aab78b7 Fixed bugs with running a proxy.
Added more comments into hosts.conf.
2013-08-08 14:41:38 -06:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00