Commit Graph

29 Commits

Author SHA1 Message Date
mwells
7c30c6b970 make install fixes. getting ready for pkg build. 2014-05-11 14:20:24 -07:00
mwells
1b5c6a6278 create hosts.conf into cwd if not there.
pretty up logging system.
update admin.html
2014-04-06 21:12:52 -07:00
mwells
5ee79a4c2f daemonize on ./gb 0 etc. 2014-04-06 15:57:38 -07:00
Matt Wells
8aa0662a27 Merge branch 'diffbot' into testing
Conflicts:
	Make.depend
	PageResults.cpp
	Parms.cpp
	Spider.cpp
	Spider.h
	gb.conf
2014-03-08 09:38:44 -07:00
Matt Wells
1b62f1582b print memtable when almost full so we can see
where the leak is. more spiders for ethan.
do not try to get diffbot reply if page is already json.
likely it is an injected diffbot json reply.
2014-03-04 18:19:50 -08:00
Matt Wells
94a55bf9a6 fixes for new link info code so it doesn't
bottleneck. got EFENCE_SIZE working so we
can use efence on large allocs only so we don't
go oom using it. might help finding some of
the out of bounds writing going on.
2014-02-25 10:55:05 -08:00
Matt Wells
ecdd167d9b code checkpoint 2014-02-09 16:41:43 -07:00
Matt Wells
7b424a6236 always use kstart.
fixed restrictDomain bug of not saving parm.
sped up csv download around 2x.
2014-01-28 14:37:21 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
fe3a879758 formatting changes 2014-01-19 00:38:02 -08:00
Matt Wells
4a842c1c68 fix occasional core in Mem.cpp 2014-01-08 01:32:24 -07:00
Matt Wells
43e40208b8 Merge branch 'master' into diffbot
Conflicts:
	SafeBuf.cpp
	SafeBuf.h
	SearchInput.cpp
	XmlDoc.cpp
2013-11-20 15:51:58 -08:00
mwells
46a683a904 label the bigger safebuf chunks of mem
so we can see a better breakdown of mem
on the stats page, not just a big "SafeBuf"
allocation.
2013-11-19 23:53:40 -07:00
Matt Wells
7248641bc4 fix mem leaks. turn off electric fence. 2013-11-11 09:58:14 -08:00
Matt Wells
3e4db4f1bc show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
c39b45ff88 fix crawl round end detection etc.
inc round counter even if not repeating crawl
2013-10-23 15:53:59 -07:00
Matt Wells
64a1c7c2f2 more bug fixes. if spiders disabled for row
in url filters, don't spider the url.
2013-10-21 14:45:12 -07:00
Matt Wells
84a3aded94 spider round updates correction 2013-10-17 17:18:05 -07:00
Matt Wells
df7fd21253 spider rounds update. 2013-10-17 17:17:19 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
d4b5c37f45 Merge branch 'master' into testing 2013-10-13 00:20:37 -07:00
mwells
c283e85e40 add support for noindex meta tag.
use it in the gbdmoz.urls.txt.* files
that contain the dmoz urls we want to spider.
2013-10-12 22:50:23 -07:00
Matt Wells
0b4bbf926e fix potential compiler error. 2013-10-09 11:52:58 -07:00
Matt Wells
283ec2f6b4 email and webhook alerts when spider runs out of urls
to spider.
2013-10-09 11:42:56 -07:00
Matt Wells
a412c798bf Merge branch 'master' into diffbot
Conflicts:
	PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
Matt Wells
76b390aea2 fix typo 2013-09-08 19:51:57 -07:00
mwells
d930a833cc try to fix compiler error related to bad
delete function override. added "throw()"
before the first "{" in the function
body.
2013-09-08 20:15:39 -06:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00