Matt
95e3a760e9
proxy fixes
2015-03-05 11:10:40 -08:00
Matt
0eafc68a13
debug msg helper
2015-03-04 12:45:06 -08:00
Matt
e886f1bbac
replace memcpy_ass with bcopy
2015-01-14 14:12:55 -08:00
mwells
87285ba3cd
use gbmemcpy not memcpy so we can get profiler working again
...
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
730b131bbf
added new indicators so we can make gb more stable.
...
now hosts table reports # ooms, disk read corruptions,
closed sockets from overloads, and # of outstanding
spiders. made ping request a class so we can easily add
new indicators.
2014-12-16 16:22:50 -08:00
Matt Wells
251f7d2f22
fix core when removing row from url filters table.
...
tail safebuf was not detaching buf.
clear all of memtable on startup, use sizeof(char) not 4.
fix m_memtablesize since it can't be based on m_maxMem
because g_hostdb inits before g_conf.m_maxMem and calls
Mem::addMem()
2014-12-02 16:17:06 -08:00
Matt Wells
abfa9a500e
mem fix
2014-12-02 07:09:57 -08:00
Matt Wells
a1d673936f
fix some final issues with 64bit stuff
2014-12-02 06:48:56 -08:00
Matt
ea67c688b9
fixed a couple really nasty mem leak bugs from new facet code
2014-11-25 11:00:27 -07:00
Matt
adcef39376
Merge branch 'diffbot-testing' into diffbot-matt
...
Conflicts:
Collectiondb.cpp
Collectiondb.h
Conf.cpp
Conf.h
Msg39.cpp
PageEvents.cpp
PageResults.cpp
PageTurk.cpp
Pages.cpp
Parms.cpp
Posdb.cpp
Proxy.cpp
Query.cpp
Query.h
RdbBase.cpp
RdbMap.cpp
Repair.cpp
Repair.h
SafeBuf.cpp
Spider.cpp
Tagdb.cpp
TopTree.cpp
XmlDoc.cpp
main.cpp
2014-11-20 16:53:07 -08:00
Matt
dbd8af0eaa
put -O4 back in makefile. efence off.
2014-11-17 18:14:13 -08:00
Matt
931a1c4bc6
good checkpoint. quite a few fixes.
2014-11-17 18:13:36 -08:00
Matt
4a0554c76f
more 64bit fixes
2014-11-14 17:30:32 -08:00
Matt
4c19453ea9
working with -m32 for basic testing.
...
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
Matt Wells
b13f3d24d7
replaced unsigned long long with uint64_t
2014-10-30 13:30:39 -06:00
mwells
3457245893
fix printf compiler warnings
2014-08-28 13:23:46 -07:00
mwells
caee238c46
fixes to make it easier to compile on mac os x.
2014-08-28 12:55:02 -07:00
Matt Wells
2137e150e7
Merge branch 'testing' into diffbot-matt
...
Conflicts:
Collectiondb.cpp
Make.depend
Parms.cpp
2014-06-27 17:17:14 -07:00
Matt Wells
3b8741a7cb
trying to prevent some cores.
2014-06-16 07:03:51 -07:00
mwells
4a2717a88f
Merge branch 'diffbot-testing' into diffbot-matt
2014-06-09 12:42:54 -07:00
mwells
628fe2336f
make code compile cleaner.
2014-06-07 14:11:12 -07:00
mwells
ee5af6b30e
more spider proxy fixes
2014-06-02 14:59:15 -07:00
mwells
7c30c6b970
make install fixes. getting ready for pkg build.
2014-05-11 14:20:24 -07:00
mwells
1b5c6a6278
create hosts.conf into cwd if not there.
...
pretty up logging system.
update admin.html
2014-04-06 21:12:52 -07:00
mwells
5ee79a4c2f
daemonize on ./gb 0 etc.
2014-04-06 15:57:38 -07:00
Matt Wells
8aa0662a27
Merge branch 'diffbot' into testing
...
Conflicts:
Make.depend
PageResults.cpp
Parms.cpp
Spider.cpp
Spider.h
gb.conf
2014-03-08 09:38:44 -07:00
Matt Wells
1b62f1582b
print memtable when almost full so we can see
...
where the leak is. more spiders for ethan.
do not try to get diffbot reply if page is already json.
likely it is an injected diffbot json reply.
2014-03-04 18:19:50 -08:00
Matt Wells
94a55bf9a6
fixes for new link info code so it doesn't
...
bottleneck. got EFENCE_SIZE working so we
can use efence on large allocs only so we don't
go oom using it. might help finding some of
the out of bounds writing going on.
2014-02-25 10:55:05 -08:00
Matt Wells
ecdd167d9b
code checkpoint
2014-02-09 16:41:43 -07:00
Matt Wells
7b424a6236
always use kstart.
...
fixed restrictDomain bug of not saving parm.
sped up csv download around 2x.
2014-01-28 14:37:21 -08:00
Matt Wells
33c5d9c07f
a lot of times rdb tree has invalid collection
...
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
fe3a879758
formatting changes
2014-01-19 00:38:02 -08:00
Matt Wells
4a842c1c68
fix occasional core in Mem.cpp
2014-01-08 01:32:24 -07:00
Matt Wells
43e40208b8
Merge branch 'master' into diffbot
...
Conflicts:
SafeBuf.cpp
SafeBuf.h
SearchInput.cpp
XmlDoc.cpp
2013-11-20 15:51:58 -08:00
mwells
46a683a904
label the bigger safebuf chunks of mem
...
so we can see a better breakdown of mem
on the stats page, not just a big "SafeBuf"
allocation.
2013-11-19 23:53:40 -07:00
Matt Wells
7248641bc4
fix mem leaks. turn off electric fence.
2013-11-11 09:58:14 -08:00
Matt Wells
3e4db4f1bc
show all crawl details in url webhook
...
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
c39b45ff88
fix crawl round end detection etc.
...
inc round counter even if not repeating crawl
2013-10-23 15:53:59 -07:00
Matt Wells
64a1c7c2f2
more bug fixes. if spiders disabled for row
...
in url filters, don't spider the url.
2013-10-21 14:45:12 -07:00
Matt Wells
84a3aded94
spider round updates correction
2013-10-17 17:18:05 -07:00
Matt Wells
df7fd21253
spider rounds update.
2013-10-17 17:17:19 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
...
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
d4b5c37f45
Merge branch 'master' into testing
2013-10-13 00:20:37 -07:00
mwells
c283e85e40
add support for noindex meta tag.
...
use it in the gbdmoz.urls.txt.* files
that contain the dmoz urls we want to spider.
2013-10-12 22:50:23 -07:00
Matt Wells
0b4bbf926e
fix potential compiler error.
2013-10-09 11:52:58 -07:00
Matt Wells
283ec2f6b4
email and webhook alerts when spider runs out of urls
...
to spider.
2013-10-09 11:42:56 -07:00
Matt Wells
a412c798bf
Merge branch 'master' into diffbot
...
Conflicts:
PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00