Commit Graph

53 Commits

Author SHA1 Message Date
Matt
95e3a760e9 proxy fixes 2015-03-05 11:10:40 -08:00
Matt
0eafc68a13 debug msg helper 2015-03-04 12:45:06 -08:00
Matt
e886f1bbac replace memcpy_ass with bcopy 2015-01-14 14:12:55 -08:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
730b131bbf added new indicators so we can make gb more stable.
now hosts table reports # ooms, disk read corruptions,
closed sockets from overloads, and # of outstanding
spiders. made ping request a class so we can easily add
new indicators.
2014-12-16 16:22:50 -08:00
Matt Wells
251f7d2f22 fix core when removing row from url filters table.
tail safebuf was not detaching buf.
clear all of memtable on startup, use sizeof(char) not 4.
fix m_memtablesize since it can't be based on m_maxMem
because g_hostdb inits before g_conf.m_maxMem and calls
Mem::addMem()
2014-12-02 16:17:06 -08:00
Matt Wells
abfa9a500e mem fix 2014-12-02 07:09:57 -08:00
Matt Wells
a1d673936f fix some final issues with 64bit stuff 2014-12-02 06:48:56 -08:00
Matt
ea67c688b9 fixed a couple really nasty mem leak bugs from new facet code 2014-11-25 11:00:27 -07:00
Matt
adcef39376 Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
	Collectiondb.cpp
	Collectiondb.h
	Conf.cpp
	Conf.h
	Msg39.cpp
	PageEvents.cpp
	PageResults.cpp
	PageTurk.cpp
	Pages.cpp
	Parms.cpp
	Posdb.cpp
	Proxy.cpp
	Query.cpp
	Query.h
	RdbBase.cpp
	RdbMap.cpp
	Repair.cpp
	Repair.h
	SafeBuf.cpp
	Spider.cpp
	Tagdb.cpp
	TopTree.cpp
	XmlDoc.cpp
	main.cpp
2014-11-20 16:53:07 -08:00
Matt
dbd8af0eaa -O4 put back in makefile. efence off. 2014-11-17 18:14:13 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
4a0554c76f more 64bit fixes 2014-11-14 17:30:32 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
Matt Wells
b13f3d24d7 replaced unsigned long long with uint64_t 2014-10-30 13:30:39 -06:00
mwells
3457245893 fix printf compiler warnings 2014-08-28 13:23:46 -07:00
mwells
caee238c46 fixes to make it easier to compile on mac os x. 2014-08-28 12:55:02 -07:00
Matt Wells
2137e150e7 Merge branch 'testing' into diffbot-matt
Conflicts:
	Collectiondb.cpp
	Make.depend
	Parms.cpp
2014-06-27 17:17:14 -07:00
Matt Wells
3b8741a7cb trying to prevent some cores. 2014-06-16 07:03:51 -07:00
mwells
4a2717a88f Merge branch 'diffbot-testing' into diffbot-matt 2014-06-09 12:42:54 -07:00
mwells
628fe2336f make code compile cleaner. 2014-06-07 14:11:12 -07:00
mwells
ee5af6b30e more spider proxy fixes 2014-06-02 14:59:15 -07:00
mwells
7c30c6b970 make install fixes. getting ready for pkg build. 2014-05-11 14:20:24 -07:00
mwells
1b5c6a6278 create hosts.conf into cwd if not there.
pretty up logging system.
update admin.html
2014-04-06 21:12:52 -07:00
mwells
5ee79a4c2f daemonize on ./gb 0 etc. 2014-04-06 15:57:38 -07:00
Matt Wells
8aa0662a27 Merge branch 'diffbot' into testing
Conflicts:
	Make.depend
	PageResults.cpp
	Parms.cpp
	Spider.cpp
	Spider.h
	gb.conf
2014-03-08 09:38:44 -07:00
Matt Wells
1b62f1582b print memtable when almost full so we can see
where the leak is. more spiders for ethan.
do not try to get diffbot reply if page is already json.
likely it is an injected diffbot json reply.
2014-03-04 18:19:50 -08:00
Matt Wells
94a55bf9a6 fixes for new link info code so it doesn't
bottleneck. got EFENCE_SIZE working so we
can use efence on large allocs only so we don't
go oom using it. might help finding some of
the out of bounds writing going on.
2014-02-25 10:55:05 -08:00
Matt Wells
ecdd167d9b code checkpoint 2014-02-09 16:41:43 -07:00
Matt Wells
7b424a6236 always use kstart.
fixed restrictDomain bug of not saving parm.
sped up csv download around 2x.
2014-01-28 14:37:21 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
fe3a879758 formatting changes 2014-01-19 00:38:02 -08:00
Matt Wells
4a842c1c68 fix occasional core in Mem.cpp 2014-01-08 01:32:24 -07:00
Matt Wells
43e40208b8 Merge branch 'master' into diffbot
Conflicts:
	SafeBuf.cpp
	SafeBuf.h
	SearchInput.cpp
	XmlDoc.cpp
2013-11-20 15:51:58 -08:00
mwells
46a683a904 label the bigger safebuf chunks of mem
so we can see a better breakdown of mem
on the stats page, not just a big "SafeBuf"
allocation.
2013-11-19 23:53:40 -07:00
Matt Wells
7248641bc4 fix mem leaks. turn off electric fence. 2013-11-11 09:58:14 -08:00
Matt Wells
3e4db4f1bc show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
c39b45ff88 fix crawl round end detection etc.
inc round counter even if not repeating crawl
2013-10-23 15:53:59 -07:00
Matt Wells
64a1c7c2f2 more bug fixes. if spiders disabled for row
in url filters, don't spider the url.
2013-10-21 14:45:12 -07:00
Matt Wells
84a3aded94 spider round updates correction 2013-10-17 17:18:05 -07:00
Matt Wells
df7fd21253 spider rounds update. 2013-10-17 17:17:19 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
d4b5c37f45 Merge branch 'master' into testing 2013-10-13 00:20:37 -07:00
mwells
c283e85e40 add support for noindex meta tag.
use it in the gbdmoz.urls.txt.* files
that contain the dmoz urls we want to spider.
2013-10-12 22:50:23 -07:00
Matt Wells
0b4bbf926e fix potential compiler error. 2013-10-09 11:52:58 -07:00
Matt Wells
283ec2f6b4 email and webhook alerts when spider runs out of urls
to spider.
2013-10-09 11:42:56 -07:00
Matt Wells
a412c798bf Merge branch 'master' into diffbot
Conflicts:
	PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00