Commit Graph

51 Commits

Author SHA1 Message Date
Matt
95e3a760e9 proxy fixes 2015-03-05 11:10:40 -08:00
Matt
cda39715f2 fix gigabit sample for json so we get some nice gigabits
and fast facts.
2015-01-12 18:06:42 -08:00
Matt Wells
febb1d4658 print pretty floats in the facets menu,
whether printing a single float or a range
of floats.
2014-12-09 17:17:12 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
mwells
caee238c46 fixes to make easier to compile on mac os x. 2014-08-28 12:55:02 -07:00
mwells
c4174a0ca6 fix bug causing qa json facet test to fail 2014-07-30 15:36:08 -07:00
mwells
58f5a2dd57 save conf files safely to disk so we don't
lose them because the disk is full.
2014-07-29 10:02:43 -07:00
mwells
837b6cf465 api updates 2014-07-23 08:47:48 -07:00
mwells
5ae476f34e print facets for each search result 2014-07-08 19:38:54 -07:00
mwells
6434e5cc04 Merge branch 'testing' into diffbot-matt
Conflicts:
	Errno.cpp
	Errno.h
	Parms.h
2014-07-07 09:49:59 -07:00
mwells
dc6c97c59c basic qa tests running 2014-07-06 18:53:05 -07:00
mwells
29d170631a more api updates 2014-07-05 12:36:01 -07:00
mwells
ea2650292a more api updates. will also be useful
for running qa tests.
2014-07-04 20:57:42 -07:00
mwells
9249564191 now floaters are working pretty well 2014-06-30 16:26:10 -06:00
mwells
72df0d25d2 added safebuf base64decode func 2014-06-06 16:20:15 -07:00
mwells
a811462d5f spider proxy stuff compiles now 2014-05-30 15:05:00 -07:00
mwells
8fb8669da1 more spider proxy updates. 2014-05-29 21:17:51 -06:00
Matt Wells
72c6d032d8 fix query reindex on subdocuments (diffbot json blurbs)
so that they just put in a spiderrequest to reindex
the parent url. Added &diffbotreply= to the injection
interface so dan can provide that along with the
pageUrl he passes in with &u=
2014-05-15 14:11:12 -07:00
Matt Wells
82726879a2 support base64 generated thumbnails in serps. 2014-04-24 14:04:57 -07:00
mwells
61b4ec4ca6 added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
Matt Wells
2d4af1aefe index numbers as integers too, not just floats
so we can sort by spider date without losing
128 seconds of resolution.
2014-02-06 20:57:54 -08:00
Matt Wells
239811b024 take out confusing function no longer used 2014-01-28 11:10:59 -08:00
Matt Wells
8a49e87a61 got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
e366c12470 Merge branch 'master' into diffbot
Conflicts:
	Collectiondb.cpp
	Msg13.cpp
	Parms.cpp
	Spider.h
2014-01-07 12:09:11 -08:00
Matt Wells
7df2111ceb fixed 'gb inject titledb-DIR newhosts.conf' command
for populating an index from titledb files in DIR
and transmitting to appropriate host in newhosts.conf.
also prettied up the gb -h output to use a formatting
function.
2014-01-02 01:20:08 -07:00
Matt Wells
c2f8445a70 expand reg ex shortcuts like \d to [0-9] 2013-12-19 18:31:37 -08:00
mwells
82494baa89 move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
Matt Wells
78a4cfe6da forgot to push the .h files 2013-12-07 22:12:48 -07:00
Matt Wells
5e4b5a112c Merge branch 'master' into diffbot
Conflicts:

	PageResults.cpp
	Threads.cpp
	XmlDoc.cpp
	XmlDoc.h
2013-12-07 11:34:26 -07:00
Matt Wells
5da41cd113 fix a couple different cores. 2013-11-24 19:46:44 -07:00
Matt Wells
e0a15194e1 fix json double decoding issue. no more
partial decodes, json parser stores
fully decoded string into separate buf.
2013-11-22 14:16:14 -08:00
Matt Wells
43e40208b8 Merge branch 'master' into diffbot
Conflicts:
	SafeBuf.cpp
	SafeBuf.h
	SearchInput.cpp
	XmlDoc.cpp
2013-11-20 15:51:58 -08:00
mwells
46a683a904 label the bigger safebuf chunks of mem
so we can see a better breakdown of mem
on the stats page, not just a big "SafeBuf"
allocation.
2013-11-19 23:53:40 -07:00
Matt Wells
fbcd6b8afd display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
09f28b2f26 now we index all numbers that have field names
(so can't just be a number in the body) but it
can be in a meta tag or json item. then use
like gbsortby:products.offerPrice to sort the
search results (json objects) by that.
2013-11-08 16:16:13 -08:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
Matt Wells
f5e5b0f5d3 fix crawlbot bugs 2013-10-16 12:12:22 -07:00
mwells
a0808df2ae got new diffbot api compiled 2013-10-14 18:19:59 -06:00
mwells
6d5643e185 json parsing 2013-10-11 16:14:26 -06:00
mwells
6c2c9f7774 trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:

	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
5fbf323cb5 json api now shows all collections
and their relevant parms and stats
for /crawlbot?token=xxx&format=json
2013-09-25 16:59:31 -06:00
Matt Wells
9db501d91c resolve merge conflict for nullTerm() 2013-09-16 09:06:33 -07:00
Matt Wells
78a334198b Merge branch 'master' into diffbot 2013-09-16 09:05:37 -07:00
Matt Wells
928dc36a03 get "&site=abc.com+xyz.com"... working to restrict
search results to specified sites. tested a little.
2013-09-15 20:16:48 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16bits, 15bits unsigned, which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
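
Note on the collection limit in 94e6492916: the message says the limit is now sizeof(collnum_t), "16bits, 15bits unsigned", giving about 32k collections. A minimal sketch of that arithmetic follows. It assumes collnum_t is a signed 16-bit integer, which is what the message implies. The actual typedef in the repository may differ, and fitsInCollnum is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: collnum_t is a signed 16-bit integer, as the commit message
// ("16bits, 15bits unsigned", "32k collections") implies. The real typedef
// lives in the repository and may differ.
typedef int16_t collnum_t;

// A signed 16-bit type has 15 value bits plus one sign bit.
static_assert(std::numeric_limits<collnum_t>::digits == 15,
              "expected 15 value bits in a signed 16-bit collnum_t");

// Largest representable collection number: 2^15 - 1 = 32767.
constexpr long kMaxCollnum = std::numeric_limits<collnum_t>::max();

// Count of distinct non-negative collection numbers: 0..32767 = 32768.
constexpr long kMaxCollections = kMaxCollnum + 1;

static_assert(kMaxCollections == 32768, "15 value bits give 32k collections");

// Hypothetical helper (not from the repository): true if n can be stored
// as a non-negative collnum_t without overflow.
bool fitsInCollnum(long n) {
    return n >= 0 && n <= kMaxCollnum;
}
```

Under this assumption, widening collnum_t to a 32-bit type is the kind of expansion the message's last line refers to.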