Commit Graph

362 Commits

Author SHA1 Message Date
mwells
05400a0c25 updated spider code documentation. 2013-09-20 11:19:24 -07:00
Matt Wells
fbd62cecba updated compilation instructions. need
to apt-get install gcc-multilib.
2013-09-20 10:06:01 -07:00
Matt Wells
bcc55dc46b fixed a couple bugs. Added more documentation
into Spider.h.
2013-09-19 18:21:52 -07:00
Matt Wells
47465f6d90 more fixes. trying to fix spiders to
spider multiple urls from same ip...
2013-09-19 11:13:40 -07:00
Matt Wells
a3ea867305 update crawlbot api. 2013-09-18 17:13:36 -07:00
Matt Wells
022caeec04 use -diffbotxyz%li as a more unique appendage.
show token on crawlbot page.
2013-09-18 17:05:41 -07:00
Matt Wells
29f5c5d644 added isonsamesubdomain and isonsamedomain 2013-09-18 16:45:37 -07:00
Matt Wells
8de246d9c4 only show urls being spidered from your coll 2013-09-18 16:29:47 -07:00
Matt Wells
3bdd28ab1d fix spider bug 2013-09-18 16:17:08 -07:00
Matt Wells
7fdbd0f66a delete spider coll when deleting coll 2013-09-18 15:36:30 -07:00
Matt Wells
f90d20f4dd diffbot api integration updates 2013-09-18 15:07:47 -07:00
Matt Wells
70ff54ce03 hide the parms that might scare users away
in the url filters.
2013-09-18 14:27:59 -07:00
Matt Wells
6af02119a1 use cookies to display url filters table. 2013-09-18 13:50:55 -07:00
Matt Wells
04b0a08ef9 propagate showtable=1 when submitting url filters table 2013-09-18 12:38:05 -07:00
Matt Wells
924d1320a2 fix bugs inserting and deleting rows
using TYPE_SAFEBUF parms.
2013-09-18 12:35:01 -07:00
Matt Wells
c1bcebb7bb url filter documentation update. 2013-09-18 12:00:29 -07:00
Matt Wells
459a7e98fb add diffbot dropdown to url filters table 2013-09-18 11:24:16 -07:00
Matt Wells
487d3f0a0e fix url filters bugs. 2013-09-18 11:02:09 -07:00
Matt Wells
39d9760e5d added ismedia url filter to
cover all the jpg,gif,mpeg,css rules.
2013-09-18 09:40:59 -07:00
Matt Wells
c77453348f Merge branch 'master' into diffbot
Conflicts:
	SearchInput.cpp
	XmlDoc.cpp
2013-09-18 09:23:48 -07:00
mwells
d6815f2c9d if family filter enabled (&ff=1) then
prepend "gbadult:0 |" to the query to
restrict to non-adult pages.
2013-09-18 00:11:55 -06:00
mwells
a0032e0eb7 added another log statement for when
debugging the adult content detector.
we err on the side of caution for the most part.
2013-09-18 00:06:21 -06:00
mwells
119a4c0c22 fix adult content detector 2013-09-17 23:53:17 -06:00
mwells
5ec3803312 fix core in hashing gbisadult:[0|1] term. 2013-09-17 23:27:31 -06:00
Matt Wells
3005f904c7 index gbisadult:1 if adult content
gbisadult:0 if not.
2013-09-17 22:05:47 -07:00
Matt Wells
10fcfb6987 minor updates 2013-09-17 17:32:49 -07:00
Matt Wells
b8590d7df9 do not show json pages if searching pages. 2013-09-17 17:23:58 -07:00
Matt Wells
7fa4138d1c fix Next 10 link 2013-09-17 17:19:41 -07:00
Matt Wells
98caa3225a fix query prepend logic for json searches 2013-09-17 17:16:39 -07:00
Matt Wells
017a0febef fix api dropdown selection. 2013-09-17 16:38:56 -07:00
Matt Wells
5e3b727eb5 crawlbot api fixes. 2013-09-17 16:30:57 -07:00
Matt Wells
b38d54cef9 save crawlinfo as binary so it's easier
to not miss anything.
2013-09-17 16:07:59 -07:00
Matt Wells
2beff7f7d8 crawlbot api updates 2013-09-17 15:59:50 -07:00
Matt Wells
e50da4d012 crawlbot api fixes 2013-09-17 15:47:44 -07:00
Matt Wells
c16fe8601b more crawlbot api fixes 2013-09-17 15:32:28 -07:00
Matt Wells
e7151e6cc6 fix bug with spiders not coming on. 2013-09-17 14:35:48 -07:00
Matt Wells
c81f700bf0 get reset collection kinda working. 2013-09-17 14:13:44 -07:00
Matt Wells
4321f02e4e trying to get reset collection working 2013-09-17 12:21:09 -07:00
Matt Wells
fff8b80969 get collection delete working 2013-09-17 11:27:31 -07:00
Matt Wells
63973cf9c0 get "add new collection" working. 2013-09-17 10:43:23 -07:00
Matt Wells
02bf6ab3cc new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
mwells
f34a7f44ab compiler flag fix for xmldoc.o 2013-09-16 22:35:16 -06:00
mwells
afd1b3a9a2 added Diffbot.h 2013-09-16 21:42:48 -06:00
Matt Wells
fc692202ba fix integration of urls filters into crawlbot page 2013-09-16 16:27:48 -07:00
Matt Wells
e7ed9254d4 formatting... 2013-09-16 15:33:45 -07:00
Matt Wells
1a780d1f4a pretty up a little 2013-09-16 15:18:55 -07:00
Matt Wells
a034604cef clean up to remove g_conf.m_useDiffbot 2013-09-16 15:00:43 -07:00
Matt Wells
cb9969ad22 fix token bug 2013-09-16 14:38:29 -07:00
Matt Wells
3dfba4de69 doc updates 2013-09-16 14:29:01 -07:00
Matt Wells
4c11265a98 more updates to crawlbot api 2013-09-16 13:59:11 -07:00