Commit Graph

51 Commits

Author SHA1 Message Date
Matt Wells
1fb1e2af7e fixed form input. fixed page parser submission.
added ability to dump out termlist from posdb
like type:json (with a colon in it) to try to debug
msft seeing html in csv output.
2014-01-29 14:10:08 -08:00
Matt Wells
a9909e189f fix delete collection api 2014-01-27 15:28:26 -08:00
Matt Wells
df063dbdf2 fix a core 2014-01-22 22:26:50 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
9354d06493 menu updates. 2014-01-21 13:01:37 -08:00
Matt Wells
8d5e1cb547 added url download support 2014-01-20 23:17:04 -08:00
Matt Wells
089d7f34a0 more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
Matt Wells
970d5b2488 formatting 2014-01-19 16:40:22 -08:00
Matt Wells
fa0e3f784f formatting 2014-01-19 15:06:02 -08:00
Matt Wells
99de2188e1 formatting 2014-01-19 13:21:58 -08:00
Matt Wells
ca816492b5 doc links 2014-01-19 12:01:32 -08:00
Matt Wells
471599e9e7 formatting 2014-01-19 10:44:19 -08:00
Matt Wells
fe3a879758 formatting changes 2014-01-19 00:38:02 -08:00
Matt Wells
4606e88721 code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
f3000e2763 set m_needsSave in collectionrec when parms updated 2014-01-18 12:51:10 -08:00
Matt Wells
9c1f6197eb added indexbody control so i can
turn it off for my special json
global index.
2014-01-18 10:04:33 -08:00
Matt Wells
2faba0efd1 fix repeat rounds sticking bug
by adding PF_REBUILDURLFILTERS flag to
spiderroundastarttime parm
2014-01-17 17:17:10 -08:00
Matt Wells
4b27b22949 git rebalancing working right 2014-01-15 17:40:17 -08:00
Matt Wells
883487889d make gb install only have 10 outstanding per an ip
since ssh seems to close connections if you have more
than 12 out.
2014-01-15 14:41:30 -08:00
Matt Wells
d091c7e959 fix hostsinagreement bug 2014-01-14 11:24:32 -08:00
Matt Wells
cb5b4af271 show reason spiders are not going above
the spider queue page.
2014-01-11 21:40:45 -08:00
Matt Wells
9da106e7ca added emergency msg box on all admin pages 2014-01-11 20:35:13 -08:00
Matt Wells
eed606601e added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Matt Wells
6de7abf6ba display fixes.
./gb installgb and ./gb installgb2 now install 'gb'
if 'gb.new' is not present.
2014-01-11 17:16:20 -08:00
Matt Wells
f64b53bfb3 almost done with rebalancing code 2014-01-10 14:12:58 -08:00
Matt Wells
8943106389 minor print updates 2014-01-09 21:23:51 -08:00
Matt Wells
1d6ba52dcd list collections in sidebar. 2014-01-09 21:13:41 -08:00
Matt Wells
645360b730 parm simplifications 2014-01-09 19:00:21 -08:00
Matt Wells
501f49c81b gui and parm updates. simplifications. 2014-01-09 17:29:18 -08:00
Matt Wells
4d7fa1eea9 pretty up url filters table 2014-01-09 13:34:43 -08:00
Matt Wells
70f8c416de allow collections to be added when no colls exist.
fixed gb start2 etc. to be sequential.
2014-01-09 13:07:16 -08:00
Matt Wells
0615acff17 zero out url filters checkboxes on submit 2013-12-16 11:03:40 -08:00
Matt Wells
a13114605a more parm overhaul fixes 2013-12-12 12:44:54 -08:00
mwells
82494baa89 move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
mwells
f2d5661965 parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
Matt Wells
dd3b49faa9 collection name hell 2013-12-08 16:44:37 -07:00
Matt Wells
df28c4e0c2 search results in csv format.
remove serps per page limit if custom crawl.
2013-11-12 16:33:45 -08:00
Matt Wells
22f9e9355d /v2/bulk api fixes 2013-10-22 18:51:09 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
7ba9994804 many dmoz fixes. but still more we need to do.
isn't printing subcategories right now.
2013-10-08 23:55:11 -07:00
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:
	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
7cdb3d6f9c fix infinite loop from json parsing and
fix some core dumps.
2013-09-27 17:52:36 -06:00
mwells
e7377d72ab fix robots.txt switch. fix collection rec saving.
require collname explicitly for injecturl urldata.
2013-09-27 11:39:23 -06:00
mwells
eb3f657411 fixed distributed support for adding/deleting/resetting
collections. now need to specify collection name
like &addcoll=mycoll when adding a coll.
2013-09-27 10:49:24 -06:00
mwells
fd081478de fix crawlbot to work on a distributed network
as far as adding/deleting/resetting colls
and updating parms. ideally we'd have a Colldb
Rdb where each key was a parm. that would make
syncing easier if a host went down, then it would
get the negative/positive colldb parm keys later.
so it could sync up on all your operations as long
as all your operations are expressed in terms of adding
and deleting database key/value pairs.
2013-09-26 22:41:05 -06:00
Matt Wells
4c11265a98 more updates to crawlbot api 2013-09-16 13:59:11 -07:00
Matt Wells
a412c798bf Merge branch 'master' into diffbot
Conflicts:
	PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
mwells
34b6d3e74a fixed some cores. brought in fixes from
old repo.
2013-09-08 16:16:13 -06:00