mwells
22271c0bb2
do not accept msg4 add requests until in sync with host 0
2013-12-10 13:20:23 -08:00
mwells
f2d5661965
parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
mwells
0e47d48d8c
test commit
2013-12-10 13:02:52 -08:00
mwells
e04d596288
minor comments update.
2013-12-09 13:42:33 -08:00
Matt Wells
dd3b49faa9
collection name hell
2013-12-08 16:44:37 -07:00
Matt Wells
3353a90a85
fix resuming a killed merge condition.
2013-12-08 15:50:45 -07:00
Matt Wells
ed79b67d2e
core dump fixes
2013-12-08 15:36:23 -07:00
Matt Wells
144e2c898e
save resources by not doing reads
on an empty doledb priority.
stop saving allSpidersOn and Off parms.
2013-12-08 14:07:31 -07:00
Matt Wells
a2e52a5dc3
little fix
2013-12-08 10:15:54 -07:00
Matt Wells
020d7741b9
new coll.conf for main with ismedia filter.
updated url filters docs some more for "isnew"
and explained the errorcount stuff more.
2013-12-08 10:10:51 -07:00
Matt Wells
65e75167e3
limit posdb merging to 8 files max.
added some more url filters documentation.
2013-12-08 09:41:05 -07:00
Matt Wells
78a4cfe6da
forgot to push the .h files
2013-12-07 22:12:48 -07:00
Matt Wells
e1712fc94f
fix uninitialized diffbot titlerec
header parms. ignore them when not
a custom crawl.
2013-12-07 22:11:26 -07:00
Matt Wells
06edfddf31
a bunch of bug fixes, mostly spider related.
also some for pagereindex.
2013-12-07 21:56:37 -07:00
Matt Wells
5e4b5a112c
Merge branch 'master' into diffbot
Conflicts:
PageResults.cpp
Threads.cpp
XmlDoc.cpp
XmlDoc.h
2013-12-07 11:34:26 -07:00
Matt Wells
105be1fbdc
more core fixes
2013-12-07 10:38:47 -07:00
Matt Wells
8d92a079c2
minor spider error reply time fix
2013-12-07 10:21:51 -07:00
Matt Wells
e731e5a4d8
Merge branch 'diffbot' of git@github.com:gigablast/open-source-search-engine into diffbot
2013-12-07 10:21:21 -07:00
Matt Wells
0e846a9389
minor spider reply error fix
2013-12-07 10:21:02 -07:00
Matt Wells
626a97770c
another core fix
2013-12-07 10:14:37 -07:00
Matt Wells
fda7b48500
fix core
2013-12-07 10:11:13 -07:00
Matt Wells
1bc80ab552
fixed pagereindex. we now add spiderreplies
for internal errors like ENOMEM or ENOTFOUND
to try to avoid the "CRITICAL CRITICAL" msgs.
these are considered temporary errors.
2013-12-07 10:01:17 -07:00
Matt Wells
d9b31d3481
quick bug fix
2013-12-06 22:57:49 -07:00
Matt Wells
269c10f648
try to figure out why pagereindex never
displayed html page when done.
2013-12-06 22:56:06 -07:00
Matt Wells
e7bd904765
fix docids only printing.
2013-12-06 09:53:32 -07:00
Matt Wells
c50ef1954f
show admin controls on serps if ip is local.
fixed up the "reindex" page for deleting/reindexing
search results for a given query.
2013-12-06 09:48:30 -07:00
Matt Wells
4b3e111bed
fix spider dumping to remember
uh48's between list readings.
was showing dups for www.nordicusa.com/webtv
at the end.
2013-12-05 10:09:06 -08:00
Matt Wells
99cc10fccd
allow seed urls to match url crawl pattern
regardless.
2013-12-03 17:13:38 -08:00
Matt Wells
432099c4e6
added rebuild=true fix for regex crawl change
2013-12-03 16:23:58 -08:00
Matt Wells
2e46bcc97f
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2013-12-03 16:23:20 -08:00
Matt Wells
03219a3057
add regex support back in
2013-12-03 16:23:05 -08:00
Matt Wells
6ab9041f45
fix bug when just getting the crawl parms
was rebuilding the waiting tree.
2013-12-03 16:17:36 -08:00
Matt Wells
9f1d79b124
check for null collrec
2013-12-02 10:13:19 -08:00
Matt Wells
cda5968b75
update common word list
2013-12-01 15:19:33 -07:00
Matt Wells
39f8dc646b
default gigabits on for my copy.
2013-12-01 15:07:06 -07:00
Matt Wells
7f4dca7a07
Merge branch 'master' of git@github.com:gigablast/open-source-search-engine
2013-12-01 14:47:16 -07:00
Matt Wells
7874c8d832
added ifdef NEEDSLICENSE
2013-12-01 14:47:08 -07:00
Gigablast
dfe72a76a0
Update LICENSE
updates to license
2013-12-01 13:43:14 -08:00
Matt Wells
d43b55103c
show query in msg20 log msg
2013-12-01 12:11:25 -07:00
Matt Wells
1077191e4a
fix log msg bug.
2013-12-01 12:08:05 -07:00
Matt Wells
08030865e4
fix compiler warning
2013-12-01 11:57:26 -07:00
Matt Wells
d811a13627
fix small oopsy
2013-12-01 11:56:33 -07:00
Matt Wells
3155869fbf
added new log msg for
recording cpu time for summary generation.
2013-12-01 11:53:41 -07:00
Matt Wells
5ee2be8fcf
fixed data corruption bug. m_finalCrawlDelay
was being stored in xmldoc titlerec header.
2013-11-27 14:18:15 -08:00
Matt Wells
1129e9b635
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2013-11-27 14:09:54 -08:00
Matt Wells
57eb231a4e
do not add timestamps to lastdownload
cache if skiphammercheck is true. those
are like robots.txt or redirs or root files.
2013-11-26 14:21:17 -08:00
Matt Wells
0f3374e3f3
measure crawl delay by default from
start of each download now. it is
a parm in msg13request.
2013-11-26 14:07:28 -08:00
Matt Wells
4769ca0881
if pthread_create() returns EAGAIN then do
not always retry, it makes an infinite loop.
2013-11-26 14:52:07 -07:00
Matt Wells
8bb086ac60
crawldelay works now but it measures
from the end of the download, not the
beginning.
2013-11-26 12:58:14 -08:00
Matt Wells
1c7c9a4d80
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2013-11-26 09:19:26 -08:00