mwells
16ead85cfd
added support for adding an alias to a collection
using &alias=xxxxx
2013-09-26 14:50:34 -06:00
mwells
e3c4ce189a
fixed cores. fixed json.
2013-09-26 14:28:04 -06:00
mwells
8fde0c5343
added support for serialize/deserialize
of TYPE_SAFEBUF parms over distributed network.
2013-09-26 08:56:14 -06:00
mwells
f252dd9189
minor crawlbot gui updates
2013-09-25 19:41:20 -06:00
mwells
65df6dfe52
added some handy links
2013-09-25 18:00:16 -06:00
mwells
0b5a45e8aa
more api updates. added m_avoidSpiderLinks to
spider request so urldata=xxxx can turn link
spidering off. probably desirable so it's default.
so &spiderlinks=[0|1] applies to urldata as well
as injecturl=
2013-09-25 17:51:43 -06:00
mwells
01fa9fe383
make it proper json output
2013-09-25 17:12:01 -06:00
mwells
6fca32e4b5
minor oops fix.
2013-09-25 17:06:01 -06:00
mwells
5fbf323cb5
json api now shows all collections
and their relevant parms and stats
for /crawlbot?token=xxx&format=json
2013-09-25 16:59:31 -06:00
mwells
d14832f93e
new json api code compiles. need to test now.
2013-09-25 16:04:16 -06:00
mwells
0039b23064
almost done with json api.
2013-09-25 15:37:20 -06:00
mwells
50ba93991b
minor ui changes
2013-09-25 13:09:02 -06:00
mwells
9dc9114902
added stat page for all collections.
2013-09-25 12:57:07 -06:00
mwells
0fe0147913
fix invisible columns in url filters table.
2013-09-25 12:24:13 -06:00
mwells
1d92004e06
fix spider flow debug msgs
2013-09-25 12:07:11 -06:00
mwells
40192249f9
spider speedups and fixes.
2013-09-25 11:58:03 -06:00
Matt Wells
e34afd21ea
fix bug of possibly not removing some locks
2013-09-25 09:28:35 -07:00
Matt Wells
a687380aeb
fix a bug of not reading enough spiderdb
records for a given "ip" because short reads
were causing us to bail out early. still not
sure as to the cause of the short reads.
2013-09-24 20:48:48 -07:00
Matt Wells
fbd853fdf7
fix long-standing spider bug causing some
ip queues to not get fully spidered.
2013-09-24 20:44:55 -07:00
mwells
b16d8519fc
more spider fixes. still need more speedups
when spidering multiple spiders on same ip.
2013-09-24 16:40:14 -06:00
mwells
e594af898a
seems like we can spider multiple urls
from same ip at same time now.
2013-09-24 09:32:26 -06:00
mwells
8461e33b53
fixed more spider bugs.
2013-09-23 21:26:27 -07:00
mwells
b90ef3de0d
more spider fixes. right after getting lock,
use msg12 to remove rec from doledb/doleiptable
and add 0 entry to waiting table so doledb is
again immediately repopulated with that firstIp
so we can spider multiple urls from the same ip
at the same time.
2013-09-23 20:25:28 -06:00
mwells
7c31ecff4a
fixed fakedb key support.
2013-09-23 15:16:23 -06:00
mwells
4d33737ac1
fakedb fixes
2013-09-23 08:19:54 -07:00
mwells
83e87fc755
fixed ability to spider multiple urls from the
same IP at the same time. Also respects
sameIpWait constraints.
2013-09-20 15:42:48 -07:00
mwells
05400a0c25
updated spider code documentation.
2013-09-20 11:19:24 -07:00
Matt Wells
fbd62cecba
updated compilation instructions. need
to apt-get install gcc-multilib.
2013-09-20 10:06:01 -07:00
Matt Wells
bcc55dc46b
fixed a couple bugs. Added more documentation
into Spider.h.
2013-09-19 18:21:52 -07:00
Matt Wells
47465f6d90
more fixes. trying to fix spiders to
spider multiple urls from same ip...
2013-09-19 11:13:40 -07:00
Matt Wells
a3ea867305
update crawlbot api.
2013-09-18 17:13:36 -07:00
Matt Wells
022caeec04
use -diffbotxyz%li as a more unique appendage.
show token on crawlbot page.
2013-09-18 17:05:41 -07:00
Matt Wells
29f5c5d644
added isonsamesubdomain and isonsamedomain
2013-09-18 16:45:37 -07:00
Matt Wells
8de246d9c4
only show urls being spidered from your coll
2013-09-18 16:29:47 -07:00
Matt Wells
3bdd28ab1d
fix spider bug
2013-09-18 16:17:08 -07:00
Matt Wells
7fdbd0f66a
delete spider coll when deleting coll
2013-09-18 15:36:30 -07:00
Matt Wells
f90d20f4dd
diffbot api integration updates
2013-09-18 15:07:47 -07:00
Matt Wells
70ff54ce03
hide the parms that might scare users away
in the url filters.
2013-09-18 14:27:59 -07:00
Matt Wells
6af02119a1
use cookies to display url filters table.
2013-09-18 13:50:55 -07:00
Matt Wells
04b0a08ef9
propagate showtable=1 when submitting url filters table
2013-09-18 12:38:05 -07:00
Matt Wells
924d1320a2
fix bugs inserting and deleting rows
using TYPE_SAFEBUF parms.
2013-09-18 12:35:01 -07:00
Matt Wells
c1bcebb7bb
url filter documentation update.
2013-09-18 12:00:29 -07:00
Matt Wells
459a7e98fb
add diffbot dropdown to url filters table
2013-09-18 11:24:16 -07:00
Matt Wells
487d3f0a0e
fix url filters bugs.
2013-09-18 11:02:09 -07:00
Matt Wells
39d9760e5d
added ismedia url filter to
cover all the jpg,gif,mpeg,css rules.
2013-09-18 09:40:59 -07:00
Matt Wells
c77453348f
Merge branch 'master' into diffbot
Conflicts:
SearchInput.cpp
XmlDoc.cpp
2013-09-18 09:23:48 -07:00
mwells
d6815f2c9d
if family filter enabled (&ff=1) then
prepend "gbadult:0 |" to the query to
restrict to non-adult pages.
2013-09-18 00:11:55 -06:00
mwells
a0032e0eb7
added another log statement for when
debugging the adult content detector.
we err on the side of caution for the most part.
2013-09-18 00:06:21 -06:00
mwells
119a4c0c22
fix adult content detector
2013-09-17 23:53:17 -06:00
mwells
5ec3803312
fix core in hashing gbisadult:[0|1] term.
2013-09-17 23:27:31 -06:00