Commit Graph

1358 Commits

Author SHA1 Message Date
mwells
2b37f56e4c Merge branch 'diffbot-matt' into testing 2014-05-10 07:56:45 -07:00
mwells
38a79888b6 Merge branch 'diffbot-testing' into testing 2014-05-10 07:49:29 -07:00
mwells
ed816b2c11 a few bug fixes 2014-05-10 07:48:23 -07:00
mwells
6b92a1f3d4 Merge branch 'master' into testing 2014-05-10 06:43:27 -07:00
mwells
f19014cc6c fixed missing / 2014-05-10 06:39:36 -07:00
Matt Wells
e70f760d87 use gbstatus: and gbstatusmsg: field operators 2014-05-09 18:10:38 -07:00
Matt Wells
b1cd0cac86 indexing spider replies now working.
use type:status to see them or
gbstatus:success or gbstatus:tcp or gbstatus:0.
2014-05-09 18:07:38 -07:00
Matt Wells
941c8f1892 now added CT_STATUS type results into serps.
one for each spider reply we add so we can query
spider replies. using url: or type:status etc.
2014-05-09 13:52:12 -07:00
Matt Wells
eb49094343 try to start indexing spider replies
as regular search results in the index so
you can query on those. get histograms of
spider status msgs, etc. ability to turn
that and images on/off.
2014-05-09 11:18:24 -07:00
mwells
6048ae849b added support for spidering a particular language
with higher priority.
2014-05-09 10:03:24 -06:00
Matt Wells
305340e4ff minor update 2014-05-07 16:33:16 -07:00
Matt Wells
a2c47750fa temp disable wpid logic. do indexing of err pages first. 2014-05-07 16:26:57 -07:00
Matt Wells
3bf52f0f2d if "wpid" is supplied try to update sitelist
for that wpid. hopefully we can get the wp admin
tools to send a /search?wpid=xxxx&sites=xyz.com request so
we can start spidering those sites before they even see the
widget. also it is simpler than trying to update m_siteListBuf
each time someone does a query since those can be hundreds
a second.
2014-05-07 16:10:26 -07:00
Matt Wells
01a6ae1166 take html column out of csv 2014-05-07 13:28:20 -07:00
Matt Wells
dd5f35b06d added icon 2014-05-07 13:06:56 -07:00
Matt Wells
dd3ab38e55 formatting updates for widget 2014-05-07 13:06:42 -07:00
Matt Wells
cebaffe76b show embed code for html or php below widget 2014-05-07 12:56:06 -07:00
Matt Wells
09ab2a8d15 fix jerky scrollbar thing 2014-05-06 15:11:57 -07:00
Matt Wells
54e50f5c72 minor updates 2014-05-06 14:53:01 -07:00
Matt Wells
0bface8c9c fix focus issues with widget qbox 2014-05-06 14:38:51 -07:00
Matt Wells
4135a8c0a3 fix thumbnail g_errno bug. make thumbnails
bigger. 250x250.
fix query logic for widget.
2014-05-06 14:05:33 -07:00
Matt Wells
ea0c3abcc6 fix widget query box to do ajax 2014-05-06 13:45:53 -07:00
Matt Wells
285c7e298e fixes for widget 2014-05-06 11:33:00 -07:00
Matt Wells
2f331d55e5 widget updates 2014-05-06 10:47:57 -07:00
Matt Wells
0daced51df Merge branch 'diffbot-testing' into diffbot-matt 2014-05-02 14:34:04 -07:00
Matt Wells
4d059109f1 work on infinite scrolling better updating
so user doesn't lose scrollbar position
2014-05-02 14:33:05 -07:00
Matt Wells
9f67eb7699 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2014-05-02 12:20:57 -07:00
Matt Wells
5ff56e3863 fix problem when too many docids are deduped
and we don't have enough to show. re-merge the shard
termlists to try to get more...
2014-05-02 12:24:38 -07:00
Matt Wells
980b67ce7c remove temp hack in there 2014-05-02 09:58:15 -07:00
Matt Wells
494db4ede8 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2014-05-01 17:07:51 -07:00
Matt Wells
a503cb35a2 fix gb installconf 2014-05-01 17:09:15 -07:00
Matt Wells
1d766826ae retry if too many docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Matt Wells
060f7da967 fix data corruption detection and repair bug.
do not core on corrupt http reply missing \0.
just set the g_errno to ECORRUPTDATA.
give more informative corruption log msgs.
2014-05-01 10:38:00 -07:00
Matt Wells
3fcb146462 minor updates 2014-04-30 13:44:19 -07:00
mwells
ff121a76d9 fix formatting bugs 2014-04-30 14:17:39 -06:00
mwells
81369b786c make trash dir for image thumbs automatically 2014-04-29 17:01:48 -06:00
Matt Wells
f05ab3698b fix thumbnail scaling. 2014-04-28 14:30:29 -07:00
Matt Wells
066a01cba6 Merge branch 'diffbot-testing' into diffbot-matt 2014-04-28 14:15:02 -07:00
Matt Wells
a2c6527ada put thumbnail in proper proportion.
other formatting fixes.
2014-04-28 14:14:18 -07:00
Matt Wells
e21e0a404c fixed bug for product title extraction.
titledb-saved.dat tree loop corruption bug.
no main coll bug.
put the ajax widget on spider status page so you can
see spider going in realtime. will give customers
a good idea of the spider moving along.
more widget fixes, to use new base64 thumbs, etc.
2014-04-28 13:30:24 -07:00
Matt Wells
de4a0a13a8 more thumbnail generation updates 2014-04-27 11:05:30 -07:00
Matt Wells
65493fcdec minor print fix 2014-04-26 13:41:08 -07:00
Matt Wells
20a2729827 added jobCreationTimeUTC and jobCompletionTimeUTC
to json api
2014-04-25 14:12:18 -07:00
Matt Wells
5c0d646133 fix invalid json when doing &s=1 2014-04-25 13:46:20 -07:00
Matt Wells
f3c06ced57 try to fix core from deleting coll 2014-04-25 11:52:17 -07:00
Matt Wells
82726879a2 support base64 generated thumbnails in serps. 2014-04-24 14:04:57 -07:00
Matt Wells
08058d4f69 Merge branch 'master' into diffbot-matt 2014-04-24 10:14:53 -07:00
Matt Wells
efc16d2b21 Merge branch 'diffbot-testing' into diffbot-matt 2014-04-24 10:14:49 -07:00
Matt Wells
9edd5c8264 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
Matt Wells
45e2506598 use statically compiled pdftohtml 2014-04-24 08:31:52 -07:00