open-source-search-engine

mirror of https://github.com/gigablast/open-source-search-engine.git synced 2024-10-04 12:17:35 +03:00

Author	SHA1	Message	Date
Matt Wells	647d004c04	fix core from sending a url alert, then customer deleting collection before email alert reply comes back. then it comes back to a delete collrec and cores.	2015-09-08 15:57:46 -07:00
Matt Wells	74ec812959	try to fix core from adding a file that already exists. just return an error now. hopefully merge will try again later. also core if you try to write recs to an rdbmap that has already had its memory footprint reduced so we can find that overrun bug.	2015-08-21 14:00:40 -07:00
Matt	a1ed368d82	bring back max mem control into master controls. it's useful to limit per process mem usage to prevent oom killer because we can't save if we get killed. overhaul diskpagecache to just use rdbcache. much simpler and faster, but disabled for now until debugged more. reduce min files to merge for crawlbot collections so they stay more tightly merged to conserve fds and mem. improved logDebugDisk msgs. overhauled File.cpp fd pool. now it is way faster and doesn't use any extra mem. much simpler too. although could be sped up a little by using a linked list, but probably is not significant enough to warrant doing right now. increase mem ptr table from 3M to 8M slots. should really make dynamic though. fix core from null msg20s[0]->m_r. only call attemptMergeAll once every 60 seconds really. do not attempt merge if already merging.	2015-08-14 12:58:54 -06:00
Matt	0970975a57	tested auto proxy use and auto spider (non-proxy) backoff to 3 second crawldelay successfully on the stamps site.	2015-04-30 15:31:09 -07:00
Matt	4a43e1387e	better fixes for core from sig alarms	2015-04-13 10:28:43 -06:00
Matt Wells	97d3b185c1	just use INCOMING udp slots/sockets for jam detection. this will highlight the slow nodes better.	2015-04-08 15:52:43 -06:00
Matt	2ce107e4be	keep track of how many times the host exited/cored as an exponent to the 'x' in the hosts table. this way we can detect hosts that have restarted many times and fix them.	2015-04-01 16:28:58 -06:00
Matt	76ec7f3a4a	add # of tcp connections to hosts table	2015-02-03 14:14:17 -08:00
Matt	fe14079ffe	show shards with excessive udp slots to detect jam up.	2015-01-22 14:47:30 -07:00
Matt Wells	51cda3bac0	fix malformed http reply header	2015-01-15 10:40:23 -08:00
mwells	87285ba3cd	use gbmemcpy not memcpy so we can get profiler working again since memcpy can't be interrupted and backtrace() called.	2015-01-13 12:25:42 -07:00
Matt Wells	e5b81cfb04	fix ping age being negative in hosts table bug.	2015-01-05 15:19:46 -08:00
Matt Wells	d57f2264c4	more indicator fixes	2014-12-17 15:11:49 -08:00
Matt Wells	f52e163fb0	fix a couple bugs. added out of sync indicator.	2014-12-17 14:28:32 -08:00
Matt	465d30e0ee	fix ping bug.	2014-12-17 10:43:00 -08:00
Matt Wells	2fd511f002	updates	2014-12-16 17:09:25 -08:00
Matt Wells	d4179634a1	crc fixes	2014-12-16 16:38:54 -08:00
Matt	730b131bbf	added new indicators so we can make gb more stable. now hosts table reports # ooms, disk read corruptions, closed sockets from overloads, and # of outstanding spiders. made ping request a class so we can easily add new indicators.	2014-12-16 16:22:50 -08:00
Matt	4e8a42e024	text replacements for bad int32_t substitutions	2014-11-17 18:24:38 -08:00
Matt	931a1c4bc6	good checkpoint. quite a few fixes.	2014-11-17 18:13:36 -08:00
Matt	4c19453ea9	working with -m32 for basic testing. compiles for 64-bit.	2014-11-12 11:38:37 -08:00
Matt	96b8197ad3	now it compiles with -m32	2014-11-10 14:45:11 -08:00
Matt Wells	e7dd8f7956	replace long long with int64_t	2014-10-30 13:36:39 -06:00
Matt Wells	8cf5bdc8a2	force gb to recompile version every time you do a make, so version is updated automatically.	2014-09-19 12:23:40 -07:00
Matt Wells	67ee615d1d	log note to update version if differences detected.	2014-09-19 09:35:35 -07:00
mwells	9d69c1362d	added proper version computation to gb	2014-09-19 10:25:48 -06:00
mwells	caee238c46	fixes to make easier to compile on mac os x.	2014-08-28 12:55:02 -07:00
mwells	628fe2336f	make code compile cleaner.	2014-06-07 14:11:12 -07:00
Matt Wells	8aa0662a27	Merge branch 'diffbot' into testing Conflicts: Make.depend PageResults.cpp Parms.cpp Spider.cpp Spider.h gb.conf	2014-03-08 09:38:44 -07:00
Matt Wells	cf6695f625	speed up getNumTotalRecs() by caching it basically for 2 seconds since pingserver.cpp calls it all the time.	2014-02-25 12:14:51 -08:00
Matt Wells	ca4aafa8a6	added host disk usage redbox and stats.	2014-02-12 09:47:44 -07:00
Matt Wells	f420bd2769	checkpoint	2014-02-09 15:09:48 -07:00
Matt Wells	4346fcee29	added recovery mode display in hosts table	2014-02-01 10:16:46 -08:00
Matt Wells	2faba0efd1	fix repeat rounds sticking bug by adding PF_REBUILDURLFILTERS flag to spiderroundastarttime parm	2014-01-17 17:17:10 -08:00
Matt Wells	16f8af0d57	added awesome streaming mode support to tcpserver.cpp for sending back json objects as we get them from shards. and as we get them in small pieces so we don't go oom. made that code much simpler and more reliable in the long run.	2014-01-17 16:26:17 -08:00
Matt Wells	9da106e7ca	added emergency msg box on all admin pages	2014-01-11 20:35:13 -08:00
Matt Wells	eed606601e	added emergency msg box on all admin pages	2014-01-11 20:14:44 -08:00
Matt Wells	8a49e87a61	got code with shard rebalancing compiling. now we store a "sharded by termid" bit in posdb key for checksums, etc keys that are not sharded by docid. save having to do disk seeks on every host in the cluster to do a dup check, etc.	2014-01-11 16:08:42 -08:00
Matt Wells	f64b53bfb3	almost done with rebalancing code	2014-01-10 14:12:58 -08:00
Matt Wells	a76f4c6974	just POST a full request for webhook now so we can do application/json content type	2013-11-07 14:20:15 -08:00
Matt Wells	3e4db4f1bc	show all crawl details in url webhook notification in the post body.	2013-11-07 13:59:43 -08:00
Matt Wells	adf4d258ae	better crawl status reporting. allow for _ in coll names.	2013-10-30 10:00:46 -07:00
Matt Wells	20052e34fe	made webhook return the crawl name and status as X- fields in the mime.	2013-10-28 22:03:10 -07:00
Matt Wells	a5a7ab2434	added spider status msg to json output to indicate if spider has hit a limit. no longer disable spiders in xmldoc.cpp when a crawl/process limit is hit. just check for limit when spidering urls in spider.cpp and if it is hit set CollectionRec::m_spiderStatus[Msg] and send email from there. Added maxCrawlRounds parm.	2013-10-23 11:40:30 -07:00
Matt Wells	0e4d96b3f8	added "seeds" to json reply. store seed urls (and dedup them) in collrec. fixed some respidering issues. any time we re-enter url filters then rebuild the waiting tree.	2013-10-21 17:35:14 -07:00
Matt Wells	b589b17e63	fix collection resetting.	2013-10-18 15:21:00 -07:00
Matt Wells	a288217e9f	a few bug fixes	2013-10-17 18:59:00 -07:00
mwells	ea859ef685	added 'gb emailmandrill' for testing. got it working. it posts json, not url encoded.	2013-10-09 17:35:51 -06:00
mwells	c1c5c4e3d0	send notifications if no urls available for immediate spidering.	2013-10-09 15:24:35 -06:00
Matt Wells	283ec2f6b4	email and webhook alerts when spider runs out of urls to spider.	2013-10-09 11:42:56 -07:00

59 Commits