mwells
7c30c6b970
make install fixes. getting ready for pkg build.
2014-05-11 14:20:24 -07:00
mwells
1b5c6a6278
create hosts.conf in cwd if not there.
pretty up logging system.
update admin.html
2014-04-06 21:12:52 -07:00
mwells
5ee79a4c2f
daemonize on ./gb 0 etc.
2014-04-06 15:57:38 -07:00
Matt Wells
8aa0662a27
Merge branch 'diffbot' into testing
Conflicts:
Make.depend
PageResults.cpp
Parms.cpp
Spider.cpp
Spider.h
gb.conf
2014-03-08 09:38:44 -07:00
Matt Wells
1b62f1582b
print memtable when almost full so we can see
where the leak is. more spiders for Ethan.
do not try to get diffbot reply if page is already json.
likely it is an injected diffbot json reply.
2014-03-04 18:19:50 -08:00
Matt Wells
94a55bf9a6
fixes for new link info code so it doesn't
bottleneck. got EFENCE_SIZE working so we
can use efence on large allocs only so we don't
go oom using it. might help find some of
the out-of-bounds writes going on.
2014-02-25 10:55:05 -08:00
Matt Wells
ecdd167d9b
code checkpoint
2014-02-09 16:41:43 -07:00
Matt Wells
7b424a6236
always use kstart.
fixed restrictDomain bug where the parm was not saved.
sped up csv download around 2x.
2014-01-28 14:37:21 -08:00
Matt Wells
33c5d9c07f
often the rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
fe3a879758
formatting changes
2014-01-19 00:38:02 -08:00
Matt Wells
4a842c1c68
fix occasional core dump in Mem.cpp
2014-01-08 01:32:24 -07:00
Matt Wells
43e40208b8
Merge branch 'master' into diffbot
Conflicts:
SafeBuf.cpp
SafeBuf.h
SearchInput.cpp
XmlDoc.cpp
2013-11-20 15:51:58 -08:00
mwells
46a683a904
label the bigger safebuf chunks of mem
so we can see a better breakdown of mem
on the stats page, not just a big "SafeBuf"
allocation.
2013-11-19 23:53:40 -07:00
Matt Wells
7248641bc4
fix mem leaks. turn off electric fence.
2013-11-11 09:58:14 -08:00
Matt Wells
3e4db4f1bc
show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
c39b45ff88
fix crawl round end detection etc.
increment round counter even if the crawl is not repeating.
2013-10-23 15:53:59 -07:00
Matt Wells
64a1c7c2f2
more bug fixes. if spiders are disabled for a row
in url filters, don't spider the url.
2013-10-21 14:45:12 -07:00
Matt Wells
84a3aded94
spider round updates correction
2013-10-17 17:18:05 -07:00
Matt Wells
df7fd21253
spider rounds update.
2013-10-17 17:17:19 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
d4b5c37f45
Merge branch 'master' into testing
2013-10-13 00:20:37 -07:00
mwells
c283e85e40
add support for noindex meta tag.
use it in the gbdmoz.urls.txt.* files
that contain the dmoz urls we want to spider.
2013-10-12 22:50:23 -07:00
Matt Wells
0b4bbf926e
fix potential compiler error.
2013-10-09 11:52:58 -07:00
Matt Wells
283ec2f6b4
email and webhook alerts when spider runs out of urls
to spider.
2013-10-09 11:42:56 -07:00
Matt Wells
a412c798bf
Merge branch 'master' into diffbot
Conflicts:
PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
Matt Wells
76b390aea2
fix typo
2013-09-08 19:51:57 -07:00
mwells
d930a833cc
try to fix compiler error related to bad
delete function override. added "throw()"
before the first "{" in the function
body.
2013-09-08 20:15:39 -06:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00