mwells
|
5ee79a4c2f
|
daemonize on ./gb 0 etc.
|
2014-04-06 15:57:38 -07:00 |
|
mwells
|
c20c30c53f
|
Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing
|
2014-04-06 14:03:13 -07:00 |
|
mwells
|
23e5a94ddf
|
move log file in the binary itself now.
|
2014-04-06 14:02:51 -07:00 |
|
mwells
|
5ff88fafbc
|
spider status updates
|
2014-04-05 18:52:40 -07:00 |
|
mwells
|
264f27b826
|
fix url filters to have !insitelist directive
|
2014-04-05 18:40:39 -07:00 |
|
mwells
|
b0dbf833a7
|
fix sitelist update logic.
|
2014-04-05 18:26:00 -07:00 |
|
mwells
|
ac5cf7971b
|
more misc updates.
|
2014-04-05 18:09:04 -07:00 |
|
mwells
|
bd82145626
|
Merge branch 'diffbot-testing' into testing
|
2014-04-05 12:34:46 -07:00 |
|
mwells
|
89f5c8c059
|
Merge branch 'diffbot-matt' into diffbot-testing
|
2014-04-05 11:34:27 -07:00 |
|
mwells
|
61b4ec4ca6
|
added some qa testing logic. qa.cpp.
|
2014-04-05 11:33:42 -07:00 |
|
Daniel Steinberg
|
0988a134d0
|
Merge remote-tracking branch 'origin/diffbot' into diffbot-dan
|
2014-04-01 19:48:24 -07:00 |
|
Daniel Steinberg
|
4856cc4c60
|
||, not &&
|
2014-04-01 10:45:54 -07:00 |
|
Daniel Steinberg
|
3e38bd169e
|
and return an error
|
2014-04-01 10:43:17 -07:00 |
|
Daniel Steinberg
|
94b169b8dc
|
only delete if there were no io errors
|
2014-04-01 10:42:12 -07:00 |
|
Daniel Steinberg
|
6568858e81
|
implement something that works like mv: it tries rename first and, if that fails, copies the bytes, since rename doesn't work across devices.
|
2014-03-31 20:44:39 -07:00 |
|
Matt Wells
|
d6434191d1
|
nomenclature changes to reduce collisions.
name collection 'qatest123' for doing smoke tests,
not 'test'.
|
2014-03-31 15:02:17 -07:00 |
|
Matt Wells
|
9c8410767d
|
fix critical title alloc/free bug
in title.cpp.
|
2014-03-28 08:01:01 -07:00 |
|
Matt Wells
|
c1671015c8
|
Merge branch 'diffbot-dan' into diffbot-testing
|
2014-03-27 12:19:50 -07:00 |
|
Matt Wells
|
582349334f
|
do not use certain other json fields,
such as stats, querystring, ...,
when computing the checksum for deduping.
|
2014-03-27 12:20:53 -07:00 |
|
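A minimal sketch of a dedup checksum that skips volatile fields, assuming the JSON object has already been flattened into name/value pairs. Only "stats" and "querystring" come from the commit message (its list continues with "..."), and the FNV-1a hash, `JsonFields` type and `dedupChecksum` name are hypothetical stand-ins, not Gigablast's actual checksum.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical flattened JSON object: field name -> serialized value.
// std::map iterates in sorted key order, so the checksum does not depend
// on the order in which fields appeared in the original JSON.
using JsonFields = std::map<std::string, std::string>;

// Fields that differ between otherwise identical replies. Only these two
// are named in the commit; the real list is longer.
static const std::set<std::string> kIgnoredForDedup = {"stats", "querystring"};

// Fold a string into a 64-bit FNV-1a hash, followed by a 0xff separator
// byte (never present in valid UTF-8) so ("ab","c") != ("a","bc").
static void fnvMix(uint64_t &h, const std::string &s) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;  // 64-bit FNV prime
    }
    h ^= 0xffULL;
    h *= 0x100000001b3ULL;
}

static uint64_t dedupChecksum(const JsonFields &fields) {
    uint64_t h = 0xcbf29ce484222325ULL;  // 64-bit FNV offset basis
    for (const auto &kv : fields) {
        if (kIgnoredForDedup.count(kv.first)) continue;  // skip volatile fields
        fnvMix(h, kv.first);
        fnvMix(h, kv.second);
    }
    return h;
}
```

Two replies that differ only in their stats or query string then hash to the same value and are treated as duplicates.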
Matt Wells
|
402377d2e6
|
fix bug where gbmin, gbmax etc. were not working.
floats were being rounded down to ints
in most cases, it seems, so .9 -> 0 etc.
|
2014-03-26 11:56:06 -07:00 |
|
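To illustrate the bug class described in this commit, the sketch below compares a range check that stores the query bound in an integer with one that keeps it as a float. It assumes gbmin:/gbmax: keep results whose numeric field is at least / at most the given bound; the function names are hypothetical, not the real query code.

```cpp
#include <cstdint>

// Buggy: the bound is stored in an int, so the cast truncates toward zero
// (for positive bounds that is rounding down: 0.9 -> 0).
static bool passesMinBuggy(float fieldValue, float minBound) {
    int32_t bound = static_cast<int32_t>(minBound);
    return fieldValue >= bound;
}

static bool passesMaxBuggy(float fieldValue, float maxBound) {
    int32_t bound = static_cast<int32_t>(maxBound);
    return fieldValue <= bound;
}

// Fixed: keep the bound as a float all the way through.
static bool passesMin(float fieldValue, float minBound) {
    return fieldValue >= minBound;
}

static bool passesMax(float fieldValue, float maxBound) {
    return fieldValue <= maxBound;
}
```

With a bound of 0.9, the buggy versions compare against 0, so a value of 0.5 wrongly passes the minimum check and wrongly fails the maximum check.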
Daniel Steinberg
|
d67f09feeb
|
also include a timestamp field with an RFC 1123 formatted date
|
2014-03-25 21:45:21 -07:00 |
|
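An RFC 1123 date (the same form HTTP uses, e.g. "Sun, 06 Nov 1994 08:49:37 GMT") can be produced with the standard `strftime` as sketched below. The helper name `rfc1123Date` is hypothetical; how the timestamp field is attached to the output is not shown here.

```cpp
#include <ctime>
#include <string>

// Format t as an RFC 1123 date, always in UTC ("GMT").
// %a and %b yield English abbreviations in the default "C" locale,
// which RFC 1123 requires; a program that calls setlocale() may differ.
static std::string rfc1123Date(std::time_t t) {
    std::tm *gmt = std::gmtime(&t);  // not thread-safe; gmtime_r on POSIX is
    if (!gmt) return std::string();
    char buf[64];
    std::size_t n = std::strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S GMT", gmt);
    return std::string(buf, n);
}
```

For example, `rfc1123Date(0)` yields "Thu, 01 Jan 1970 00:00:00 GMT".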
Daniel Steinberg
|
0efac8c156
|
Defect #2080: seed URLs duplicated
|
2014-03-25 17:25:55 -07:00 |
|
Daniel Steinberg
|
e1b1b15a38
|
bigger buffer
|
2014-03-25 16:34:40 -07:00 |
|
Daniel Steinberg
|
9846061dff
|
when restarting a bulk job, copy bulkurls.txt to /tmp, and then transfer it back to the new collection folder
|
2014-03-25 16:20:24 -07:00 |
|
Daniel Steinberg
|
ab90c06d8d
|
add TODO for regex checking
|
2014-03-25 13:05:43 -07:00 |
|
Daniel Steinberg
|
1ff6c1fae0
|
Merge remote-tracking branch 'origin/diffbot' into diffbot-dan
|
2014-03-25 12:53:37 -07:00 |
|
Daniel Steinberg
|
b8836745f0
|
use SpiderRequest instead of isonsamedomain flag to determine whether to output data in CSV (Defect #2122)
|
2014-03-25 12:51:08 -07:00 |
|
mwells
|
b6e5424e32
|
do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
|
2014-03-21 12:40:38 -07:00 |
|
mwells
|
502752aba4
|
doc updates
|
2014-03-21 08:59:13 -07:00 |
|
Matt Wells
|
b33121af7d
|
make all field names lower case without
spaces when we hash them to make the
prefixhash, since json field names often
have mixed case and spaces.
|
2014-03-20 16:08:02 -07:00 |
|
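The normalization in this commit amounts to lowercasing the field name and dropping spaces before hashing, so "Product Name" and "productname" land on the same prefix. A minimal sketch follows; FNV-1a is a stand-in for whatever hash Gigablast actually uses, and `normalizeFieldName` / `prefixHash` are hypothetical names. Only the space character is removed, matching the commit's wording.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Lowercase the name and drop space characters.
static std::string normalizeFieldName(const std::string &name) {
    std::string out;
    out.reserve(name.size());
    for (unsigned char c : name) {
        if (c == ' ') continue;
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Stand-in 64-bit FNV-1a hash.
static uint64_t fnv1a64(const std::string &s) {
    uint64_t h = 0xcbf29ce484222325ULL;  // offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;  // FNV prime
    }
    return h;
}

// Hash the normalized name so case and spacing variants collide on purpose.
static uint64_t prefixHash(const std::string &fieldName) {
    return fnv1a64(normalizeFieldName(fieldName));
}
```

Normalizing before hashing (rather than at query time only) keeps the index and the query side consistent, since both go through the same function.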
Matt Wells
|
98a10d4936
|
Merge branch 'testing' into diffbot-testing
|
2014-03-20 15:50:49 -07:00 |
|
Matt Wells
|
bbc8fc0c79
|
always show admin link
|
2014-03-20 15:48:51 -07:00 |
|
Matt Wells
|
99bd9319fd
|
temp hack to reduce network comm
between trinity and neo
|
2014-03-20 15:42:34 -07:00 |
|
Matt Wells
|
67202f3731
|
Merge branch 'diffbot' into diffbot-testing
|
2014-03-20 15:39:03 -07:00 |
|
Matt Wells
|
5ed19026d9
|
temp debug comments
|
2014-03-20 15:33:37 -07:00 |
|
Matt Wells
|
b8d0e95035
|
Merge branch 'diffbot' into diffbot-testing
|
2014-03-20 10:26:55 -07:00 |
|
mwells
|
ca0843aa8b
|
more bool query fixes.
|
2014-03-20 10:03:25 -07:00 |
|
mwells
|
cfbec626e8
|
more righteous fixes for bool queries
|
2014-03-19 13:51:32 -07:00 |
|
mwells
|
ab3368b5a0
|
more bool fixes. NOT operator support.
|
2014-03-19 09:38:45 -07:00 |
|
mwells
|
1bb91149d6
|
more bool fixes
|
2014-03-18 14:42:50 -07:00 |
|
mwells
|
652892dc10
|
more bool fixes
|
2014-03-18 14:37:59 -07:00 |
|
mwells
|
f392826b1e
|
nested bool query fixes
|
2014-03-18 14:08:59 -07:00 |
|
mwells
|
b7d80fd02d
|
more bool query fixes
|
2014-03-18 13:41:36 -07:00 |
|
mwells
|
b31eaee9fd
|
simple bool queries work
|
2014-03-18 12:07:29 -07:00 |
|
Matt Wells
|
d4302e3301
|
fix core
|
2014-03-18 11:12:50 -07:00 |
|
Matt Wells
|
3b97682cc3
|
more bool query fixes
|
2014-03-18 10:44:56 -07:00 |
|
Matt Wells
|
6e23d37e47
|
Merge branch 'diffbot' into diffbot-testing
|
2014-03-17 17:27:28 -07:00 |
|
mwells
|
54cc8088fb
|
more bool query fixes. hopefully this will do it,
but some speed optimizations are still possible.
|
2014-03-17 17:00:08 -07:00 |
|
Matt Wells
|
9d3c35ad17
|
nothing
|
2014-03-17 13:53:19 -07:00 |
|
Matt Wells
|
4abf56a75d
|
cleanups
|
2014-03-16 18:06:22 -07:00 |
|