Commit Graph

2001 Commits

Author SHA1 Message Date
Matt Wells
bcb584c1fd Merge branch 'diffbot-matt' of github.com:gigablast/open-source-search-engine into diffbot-matt 2014-07-10 10:03:20 -07:00
Matt Wells
e3532a9c5f fix core when getting facet values in xmldoc.cpp 2014-07-10 10:02:53 -07:00
mwells
0da6063983 bring tags back in site list / url filters. 2014-07-10 07:44:16 -07:00
mwells
683abd3875 more api work 2014-07-10 07:10:49 -07:00
mwells
5bbdb8e172 got page add url and add url api working. 2014-07-09 20:32:30 -07:00
Matt Wells
4c72e376fa fix core dump when no collrec 2014-07-09 17:36:23 -07:00
mwells
950352d781 do not hash redundant xpaths that have the same inner sentence/alnum
html as their children tags. waste of index space.
2014-07-09 17:16:01 -07:00
mwells
b231bc8042 incorporate total # of docs with that xpathsitehash
into the tag attr. so using the MxDy should be good
enough to determine if something is chrome or not.
2014-07-09 16:47:47 -07:00
mwells
50c64f9369 fix printing of getInlineSectionVotingBuf() to be more accurate 2014-07-09 15:44:41 -07:00
mwells
05fcef9651 more vote infusion and squid proxy fixes. 2014-07-09 14:57:58 -07:00
mwells
d4218e01d7 inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
mwells
d7b67f21e7 return error if we get CONNECT requests. we don't
handle those because we can't cache them or inject
the sectiondb voting info into their tags because they
are encrypted from us.
2014-07-09 11:06:46 -07:00
mwells
0f9409235e some cleanups 2014-07-09 10:41:38 -07:00
mwells
62c920efd0 Merge branch 'testing' into diffbot-matt 2014-07-09 10:23:27 -07:00
mwells
205349cfbb added a clone coll page. for after creation cloning. 2014-07-09 07:54:29 -07:00
mwells
192b2ee393 added sendPageClone() entire page for cloning as well 2014-07-09 07:33:14 -07:00
mwells
f7e7468e74 get clone working when adding new coll 2014-07-09 07:13:24 -07:00
mwells
0b64d7f0af show api/xml/json in serps 2014-07-09 06:36:36 -07:00
mwells
a154f679d1 some setup for qaspider() 2014-07-08 20:33:13 -07:00
mwells
5ae476f34e print facets for each search result 2014-07-08 19:38:54 -07:00
mwells
1af75c5d88 send back facet field/value pairs in msg20reply 2014-07-08 14:22:55 -07:00
mwells
99872e9f72 Merge branch 'diffbot-testing' into testing 2014-07-08 13:38:10 -07:00
Matt Wells
48d12eb147 application/xhtml+xml should be type html not xml
otherwise we don't end up spidering the links
2014-07-08 13:34:53 -07:00
Matt Wells
a09ba6261f fix ./gb installgb 2014-07-08 13:23:22 -07:00
mwells
a4273a1269 section voting markup updates 2014-07-08 11:14:45 -07:00
mwells
1e8f6ce474 forgot to specify coll for facet link 2014-07-08 10:40:34 -07:00
mwells
e658ebc8f6 fix up sections page some more. useful
for debugging sections stuff.
2014-07-08 10:31:42 -07:00
mwells
842d72b5db Merge branch 'testing' into diffbot-matt 2014-07-08 09:58:54 -07:00
mwells
5a557e765a added copyCollRec() function to clone
setting of one coll to another.
2014-07-08 07:57:49 -07:00
mwells
d7cc290a1f added a few new search parms that can be used
to override collection defaults.
hide all clustered results.
max title len.
max summary excerpt/line width.
2014-07-08 07:01:51 -07:00
mwells
eb7d83cbad added &showimages=0 parm. also print image url. 2014-07-08 06:26:42 -07:00
mwells
a7bddbcc0b return up to the first 3 h1 tags when &geth1tag=1
is specified for an xml or json feed.
2014-07-07 21:01:07 -07:00
Matt Wells
445896e04c fix query reindex core 2014-07-07 19:11:01 -07:00
mwells
67ba89dd11 added gbequalint: query operator for showing
docs with a specific facet VALUE.
2014-07-07 17:40:49 -07:00
mwells
c8567f8a24 sectioning stuff working halfway decent.
still need to do docid-based stats perhaps.
need to scroll to section hash when clicking
the 'sections' link.
2014-07-07 16:46:38 -07:00
mwells
a3ef40ccf5 pass qa test 2014-07-07 12:42:30 -07:00
mwells
d9ae010371 shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed.
got them working again multicasting a msg 0x39 to the appropriate shard.
set special msg39request flag for better performance for those guys.
2014-07-07 12:32:27 -07:00
mwells
6434e5cc04 Merge branch 'testing' into diffbot-matt
Conflicts:
	Errno.cpp
	Errno.h
	Parms.h
2014-07-07 09:49:59 -07:00
mwells
05065f7f8c treat http status 999 as forbidden. 2014-07-07 09:46:24 -07:00
mwells
e22641997a fix geth1tag some more.
fixed bad comment tag detection. was losing
a good deal of some pages because of that.
2014-07-07 08:20:21 -07:00
mwells
fed7b73b9f passing qa test again 2014-07-06 22:06:33 -07:00
mwells
38e64a6600 update qa loop 2014-07-06 19:43:00 -07:00
mwells
dc6c97c59c basic qa tests running 2014-07-06 18:53:05 -07:00
mwells
4dee019107 qa fixes 2014-07-06 16:47:04 -07:00
mwells
aeae6bb1a5 qa test updates 2014-07-06 15:04:21 -07:00
mwells
70e1eab935 more api updates 2014-07-06 14:13:00 -07:00
mwells
97ad9a62e0 support &addcoll= as well as &addColl= 2014-07-06 12:09:41 -07:00
mwells
574b3f9354 if netpbm pkg already installed use it. 2014-07-06 09:54:28 -07:00
mwells
e4f43848d5 fix http://abc.com in sitelist some more. 2014-07-06 08:31:27 -07:00
mwells
43d0d636ee fix dmoz building. 2014-07-05 22:20:15 -07:00