mwells
ca6af65217
got dmoz navigation system working.
now we just need to index the urls to
populate dmoz.
2013-10-10 22:08:21 -07:00
mwells
7ba9994804
many dmoz fixes. but still more we need to do.
isn't printing subcategories right now.
2013-10-08 23:55:11 -07:00
mwells
63c7764cd1
c=dmoz3 to c=dmoz
2013-10-06 17:12:45 -07:00
mwells
59b491f007
return fake tag recs for links if
usefakeips meta tag is given. saves
some lookups in tagdb when adding gbdmoz.urls.txt.*
files which have tons of links each. like 500,000.
2013-10-06 16:42:32 -07:00
mwells
2383905c80
start using fakeips flag to stop
ip tagrec lookups
2013-10-06 16:40:04 -06:00
mwells
183b7c372e
make sections grow dynamically so we do not
OOM when trying to index a gbdmoz.urls.txt.* file
which can be 25MB.
2013-10-06 11:04:10 -06:00
mwells
d8e6ac8748
fixed bug of not putting meta tags
in all gbdmoz.urls.txt.* files in
dmozparse.cpp
2013-10-06 00:18:59 -06:00
mwells
000caa5a26
support for usefakeips meta tag
2013-10-06 00:10:07 -06:00
mwells
2935a143f0
if downloading a url on 127.0.0.1 or other local
ip then do not limit download size. should fix
downloading of gbdmoz.urls.txt.* files which can be
> 25MB big.
2013-10-05 23:43:00 -06:00
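A minimal sketch of the rule in the commit above, not the actual gigablast code. The names `isLocalIp`, `maxDownloadBytes`, and `kNoLimit` are hypothetical, and the choice of which ranges count as "local" (loopback plus the RFC 1918 private blocks) is an assumption, since the commit only mentions 127.0.0.1 "or other local ip".

```cpp
#include <cstdio>
#include <string>

// Hypothetical sentinel meaning "do not cap the download size".
const long long kNoLimit = -1;

// Returns true for loopback and RFC 1918 private IPv4 addresses.
// Which ranges count as "local" is an assumption of this sketch.
bool isLocalIp(const std::string &ip) {
    unsigned a, b, c, d;
    char extra;
    // Expect exactly four dotted octets and nothing after them.
    if (std::sscanf(ip.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return false;
    if (a > 255 || b > 255 || c > 255 || d > 255) return false;
    if (a == 127) return true;                        // 127.0.0.0/8 loopback
    if (a == 10) return true;                         // 10.0.0.0/8
    if (a == 192 && b == 168) return true;            // 192.168.0.0/16
    if (a == 172 && b >= 16 && b <= 31) return true;  // 172.16.0.0/12
    return false;
}

// Local hosts get no size cap; everything else keeps the default limit.
long long maxDownloadBytes(const std::string &ip, long long defaultLimit) {
    return isLocalIp(ip) ? kNoLimit : defaultLimit;
}
```

With a rule like this, a large gbdmoz.urls.txt.* file served from the local machine would not be truncated, while remote downloads stay capped.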
mwells
612f2872f7
use addurl to add the gbdmoz url
files to gigablast. it should index
just those dmoz urls, and not spider their links.
it should ignore external errors like
ETCPTIMEDOUT when indexing so it will be
identical to dmoz.
2013-10-05 23:22:51 -06:00
mwells
d464066da4
use catdb/ not cat/
2013-10-04 22:39:41 -06:00
mwells
71d5d05f7c
use catdb/ subdir not cat/ for consistency.
2013-10-04 21:35:13 -06:00
mwells
78c4bda368
fix dmozparse urldump -s bugs
for dumping out urls in dmoz.
2013-10-04 00:00:26 -06:00
mwells
f562e6da9a
just ignore all urls with # (hashtag) in them
from the dmoz dump. we were truncating
http://twitter.com/#!/ronpaul to
http://twitter.com/ and when looking up
the catids of twitter.com got that ronpaul url.
so that's bad. people should respect the hashtag.
2013-10-03 23:33:55 -06:00
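The filter described in the commit above can be sketched as follows. This is an illustration only, not the dmozparse.cpp code; `keepDmozUrl` and `filterDmozUrls` are hypothetical names. The idea is to drop any url containing `#` outright instead of truncating at the fragment, so that http://twitter.com/#!/ronpaul is never recorded under http://twitter.com/.

```cpp
#include <string>
#include <vector>

// A url is kept only if it has no '#' anywhere in it.
bool keepDmozUrl(const std::string &url) {
    return url.find('#') == std::string::npos;
}

// Returns the urls that survive the filter, in their original order.
std::vector<std::string> filterDmozUrls(const std::vector<std::string> &urls) {
    std::vector<std::string> kept;
    for (const std::string &u : urls)
        if (keepDmozUrl(u)) kept.push_back(u);
    return kept;
}
```

Skipping is lossy (the ronpaul url is not indexed at all), but it avoids attaching the wrong catids to the truncated url.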
mwells
0176f8d6a7
fix cores in catdb logic.
2013-10-03 22:34:49 -06:00
mwells
9e1fee2cb9
dmozparse works with latest dmoz files now
2013-10-03 22:08:40 -06:00
mwells
a0c79932bb
catdb is now generated successfully.
2013-10-02 23:36:49 -06:00
mwells
6c2c9f7774
trying to bring back dmoz integration.
2013-10-02 22:34:21 -06:00
Matt Wells
91b8921b9e
have to use different ports if multiple gb
instances/processes on same server.
2013-10-02 16:12:17 -07:00
mwells
c03e862b99
use a better version of hosts.conf where we
specify the working directory for each host
entry. then we can use the exact same hosts.conf
file for each gb instance rather than having to
change the single "working-dir:" directive for
each instance, in the case where they each have
a different working directory.
2013-10-02 13:11:58 -06:00
Matt Wells
c911a606c9
renamed matches.h and matches.cpp to
matches2.h and matches2.cpp to avoid potential
confusion with Matches.h and Matches.cpp files.
2013-10-01 07:58:24 -07:00
mwells
5a52072888
recommended SSDs for optimal performance in admin.html.
2013-09-28 14:02:02 -06:00
mwells
a80cb52740
minor log msg.
2013-09-28 13:58:59 -06:00
mwells
737f3eae4d
Merge branch 'master' of github.com:gigablast/open-source-search-engine
2013-09-28 13:45:39 -06:00
mwells
5884951190
only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
88677e1a15
fix bad engineer error that comes up sometimes
when viewing cached pages.
2013-09-27 18:15:59 -06:00
Matt Wells
e34afd21ea
fix bug of possibly not removing some locks
2013-09-25 09:28:35 -07:00
Matt Wells
a687380aeb
fix a bug of not reading enough spiderdb
records for a given "ip" because short reads
were causing us to bail out early. still not
sure as to the cause of the short reads.
2013-09-24 20:48:48 -07:00
Matt Wells
fbd853fdf7
fix long-standing spider bug causing some
ip queues to not get fully spidered.
2013-09-24 20:44:55 -07:00
Matt Wells
fbd62cecba
updated compilation instructions. need
to apt-get install gcc-multilib.
2013-09-20 10:06:01 -07:00
mwells
d6815f2c9d
if family filter enabled (&ff=1) then
prepend "gbadult:0 |" to the query to
restrict to non-adult pages.
2013-09-18 00:11:55 -06:00
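A rough sketch of the query rewrite in the commit above. `applyFamilyFilter` is a hypothetical name, and the single space after the `|` is an assumption; the commit only gives the prefix as "gbadult:0 |".

```cpp
#include <string>

// When the family filter is on (&ff=1), prepend the restriction term
// so only pages indexed as non-adult can match. Otherwise leave the
// query untouched.
std::string applyFamilyFilter(const std::string &query, bool familyFilter) {
    if (!familyFilter) return query;
    return "gbadult:0 | " + query;  // separator spacing is assumed
}
```

Doing this as a query rewrite means the filter reuses the normal term-matching path rather than needing a separate post-filter over the results.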
mwells
a0032e0eb7
added another log statement for when
debugging the adult content detector.
we err on the side of caution for the most part.
2013-09-18 00:06:21 -06:00
mwells
119a4c0c22
fix adult content detector
2013-09-17 23:53:17 -06:00
mwells
5ec3803312
fix core in hashing gbisadult:[0|1] term.
2013-09-17 23:27:31 -06:00
Matt Wells
3005f904c7
index gbisadult:1 if adult content
gbisadult:0 if not.
2013-09-17 22:05:47 -07:00
Matt Wells
3ac79de92e
fix typo adurl -> addurl.
2013-09-16 08:11:06 -07:00
Matt Wells
e6f87f5049
do not send email alerts to sysadmin@gigablast.
2013-09-16 08:10:18 -07:00
Matt Wells
5deda56ede
minor documentation updates.
2013-09-15 22:16:14 -07:00
Matt Wells
3fdbae4b05
admin.html documentation update.
2013-09-15 22:05:01 -07:00
Matt Wells
68db2e6cc6
fix bug when checking the delete checkbox on
the injection page.
2013-09-15 21:47:42 -07:00
Matt Wells
965e23f192
fix core from hashtablex::set() not getting
enough buf space. now we force it to allocate
a minimum of 32 slots to fix another bug where
it was re-allocating immediately upon adding a
key because growTable() is ALWAYS called if there
are less than 20 slots!
2013-09-15 21:15:58 -07:00
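To illustrate the interaction described in the commit above, here is a small model of it. It is not the real hashtablex code: `clampInitialSlots` and `needsGrowOnAdd` are hypothetical names, and the 90% load threshold is an assumption. Only the two numbers come from the commit: a table with fewer than 20 slots always grows on add, and set() now allocates at least 32 slots.

```cpp
#include <algorithm>

const int kMinSlots = 32;          // minimum forced by the fix
const int kAlwaysGrowBelow = 20;   // tables smaller than this always grow

// Slot count set() would allocate for a requested size, after the fix.
int clampInitialSlots(int requested) {
    return std::max(requested, kMinSlots);
}

// Models the grow check made when adding one more key.
// The 90% load factor here is assumed for illustration.
bool needsGrowOnAdd(int numSlots, int numUsed) {
    if (numSlots < kAlwaysGrowBelow) return true;
    return (numUsed + 1) * 10 > numSlots * 9;
}
```

In this model a table set up with 4 slots would re-allocate on its very first add, while clamping to 32 slots first lets the first add go straight into the buffer that set() provided.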
Matt Wells
991e2f30f7
speed up whitelist hashtable like 20x
using hashtable key magic.
2013-09-15 21:10:53 -07:00
Matt Wells
928dc36a03
get "&site=abc.com+xyz.com"... working to restrict
search results to specified sites. tested a little.
2013-09-15 20:16:48 -07:00
mwells
2211881e59
take apt-get install ssl stuff out of admin.html
installation instructions since we supply the
ssl headers now.
2013-09-15 18:27:47 -06:00
mwells
01c2a6d381
we already include our own 32-bit
libssl.a and libcrypto.a so we can ensure
stability. so we have to include the header
files as well really.
2013-09-15 18:25:49 -06:00
mwells
107037c6a2
new &sites=xyz.com+abc.com+... functionality compiles ok.
2013-09-15 18:14:32 -06:00
mwells
b684414e16
almost done adding support for whitelists.
i.e. list of sites to restrict search results to,
for instance.
2013-09-15 15:15:56 -06:00
mwells
e152205765
make depend update
2013-09-09 02:37:47 -06:00
Matt Wells
1d63aa936c
remove plotter.h includes causing
compiler errors on some machines.
2013-09-09 01:25:00 -07:00
Matt Wells
76b390aea2
fix typo
2013-09-08 19:51:57 -07:00