1246 Commits

Author SHA1 Message Date
mwells
b3fcfb1ab0 updated admin.html 2014-04-06 21:19:39 -07:00
mwells
1b5c6a6278 create hosts.conf into cwd if not there.
pretty up logging system.
update admin.html
2014-04-06 21:12:52 -07:00
mwells
5ee79a4c2f daemonize on ./gb 0 etc. 2014-04-06 15:57:38 -07:00
mwells
c20c30c53f Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing 2014-04-06 14:03:13 -07:00
mwells
23e5a94ddf move log file in the binary itself now. 2014-04-06 14:02:51 -07:00
mwells
5ff88fafbc spider status updates 2014-04-05 18:52:40 -07:00
mwells
264f27b826 fix url filters to have !insitelist directive 2014-04-05 18:40:39 -07:00
mwells
b0dbf833a7 fix sitelist update logic. 2014-04-05 18:26:00 -07:00
mwells
ac5cf7971b more misc updates. 2014-04-05 18:09:04 -07:00
mwells
bd82145626 Merge branch 'diffbot-testing' into testing 2014-04-05 12:34:46 -07:00
mwells
89f5c8c059 Merge branch 'diffbot-matt' into diffbot-testing 2014-04-05 11:34:27 -07:00
mwells
61b4ec4ca6 added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
Daniel Steinberg
0988a134d0 Merge remote-tracking branch 'origin/diffbot' into diffbot-dan 2014-04-01 19:48:24 -07:00
Daniel Steinberg
4856cc4c60 ||, not && 2014-04-01 10:45:54 -07:00
Daniel Steinberg
3e38bd169e and return an error 2014-04-01 10:43:17 -07:00
Daniel Steinberg
94b169b8dc only delete if there were no io errors 2014-04-01 10:42:12 -07:00
Daniel Steinberg
6568858e81 implement something that works like mv, which tries rename first, and if that fails copies the bytes. rename doesn't work across devices 2014-03-31 20:44:39 -07:00
Matt Wells
d6434191d1 nomenclature changes to reduce collisions.
name collection 'qatest123' for doing smoke tests,
not 'test'.
2014-03-31 15:02:17 -07:00
Matt Wells
9c8410767d fix critical title alloc/free bug
in title.cpp.
2014-03-28 08:01:01 -07:00
Matt Wells
c1671015c8 Merge branch 'diffbot-dan' into diffbot-testing 2014-03-27 12:19:50 -07:00
Matt Wells
582349334f do not use certain other json fields
when computing checksum for deduping.
like stats, querystring, ...
2014-03-27 12:20:53 -07:00
Matt Wells
402377d2e6 fix bug of gbmin, gbmax etc. not working.
floats were being rounded down to ints
in most cases it seems. so .9 -> 0 etc.
2014-03-26 11:56:06 -07:00
Daniel Steinberg
d67f09feeb also include a timestamp field with an RFC 1123 formatted date 2014-03-25 21:45:21 -07:00
Daniel Steinberg
0efac8c156 Defect #2080: seed URLs duplicated 2014-03-25 17:25:55 -07:00
Daniel Steinberg
e1b1b15a38 bigger buffer 2014-03-25 16:34:40 -07:00
Daniel Steinberg
9846061dff when restarting a bulk job, copy bulkurls.txt to /tmp, and then transfer it back to the new collection folder 2014-03-25 16:20:24 -07:00
Daniel Steinberg
ab90c06d8d add TODO for regex checking 2014-03-25 13:05:43 -07:00
Daniel Steinberg
1ff6c1fae0 Merge remote-tracking branch 'origin/diffbot' into diffbot-dan 2014-03-25 12:53:37 -07:00
Daniel Steinberg
b8836745f0 use SpiderRequest instead of isonsamedomain flag to determine whether to output data in CSV (Defect #2122) 2014-03-25 12:51:08 -07:00
mwells
b6e5424e32 do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
2014-03-21 12:40:38 -07:00
mwells
502752aba4 doc updates 2014-03-21 08:59:13 -07:00
Matt Wells
b33121af7d make all field names lower case without
spaces when we hash them to make the
prefixhash. since json names often have
mixed case field names and spaces.
2014-03-20 16:08:02 -07:00
Matt Wells
98a10d4936 Merge branch 'testing' into diffbot-testing 2014-03-20 15:50:49 -07:00
Matt Wells
bbc8fc0c79 always show admin link 2014-03-20 15:48:51 -07:00
Matt Wells
99bd9319fd temp hack to reduce network comm
between trinity and neo
2014-03-20 15:42:34 -07:00
Matt Wells
67202f3731 Merge branch 'diffbot' into diffbot-testing 2014-03-20 15:39:03 -07:00
Matt Wells
5ed19026d9 temp debug comments 2014-03-20 15:33:37 -07:00
Matt Wells
b8d0e95035 Merge branch 'diffbot' into diffbot-testing 2014-03-20 10:26:55 -07:00
mwells
ca0843aa8b more bool query fixes. 2014-03-20 10:03:25 -07:00
mwells
cfbec626e8 more righteous fixes for bool queries 2014-03-19 13:51:32 -07:00
mwells
ab3368b5a0 more bool fixes. not operator support. 2014-03-19 09:38:45 -07:00
mwells
1bb91149d6 more bool fixes 2014-03-18 14:42:50 -07:00
mwells
652892dc10 more bool fixes 2014-03-18 14:37:59 -07:00
mwells
f392826b1e nested bool query fixes 2014-03-18 14:08:59 -07:00
mwells
b7d80fd02d more bool query fixes 2014-03-18 13:41:36 -07:00
mwells
b31eaee9fd simple bool queries work 2014-03-18 12:07:29 -07:00
Matt Wells
d4302e3301 fix core 2014-03-18 11:12:50 -07:00
Matt Wells
3b97682cc3 more bool query fixes 2014-03-18 10:44:56 -07:00
Matt Wells
6e23d37e47 Merge branch 'diffbot' into diffbot-testing 2014-03-17 17:27:28 -07:00
mwells
54cc8088fb more bool query fixes. hopefully this will do it,
but still can do some optimizations for speed.
2014-03-17 17:00:08 -07:00