25 Commits

Author SHA1 Message Date
Zak Betz
ea139a65e6 Warc stream busy loop fixes.
Load balance msg22 to the one with the least outstanding requests.
2015-10-15 22:30:07 -06:00
Zak Betz
6a40315237 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into warc-stream
Conflicts:
	XmlDoc.cpp
2015-10-12 00:32:52 -06:00
Zak Betz
45744d74f3 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into warc-stream 2015-10-07 08:46:07 -06:00
Zak Betz
49ec9c99fd Don't restart all items when forcing a list of items into injector. 2015-10-05 15:33:09 -06:00
Zak Betz
c947252fee Add gbcapturedate to individual doc's metadata when injecting warcs. 2015-10-04 01:53:54 -06:00
Zak Betz
6becb55a2b Stream warcs instead of downloading them and unzipping them on disk. 2015-09-30 22:25:59 -06:00
Zak Betz
55169be6fc Warc injector update. 2015-09-21 09:31:59 -06:00
Zak Betz
e50c57db90 Warc injector script 2015-09-12 14:08:42 -06:00
Zak Betz
a270e163de Fix coring on udp timeout when clustering search results.
Add ability to force update a list of items in warc injector.
2015-09-11 11:05:57 -06:00
Zak Betz
41268aeba7 Changes to script to copy to back twins. 2015-08-28 09:06:16 -06:00
Zak Betz
94871521e7 nothing. 2015-08-23 21:06:19 -06:00
Zak Betz
e252dfb088 Add docs per second stat.
Fix auto update on statsdb graph.
Add Stat toggles for statsdb graph.
Add a unit test for indexing an array in metadata.
2015-08-22 12:05:20 -06:00
Zak Betz
36b8d384bd Fixes to injector script.
New colors and metrics on performance graph.
2015-08-13 23:29:20 -06:00
Zak Betz
dead58329e Add a script for interacting with hosts.conf files. 2015-07-21 10:17:01 -06:00
Zak Betz
15eb7f659d Fix some malformed html on hosts page.
Fix core when no collection record in injection request.
Add a script to test disk speed.
2015-07-16 12:02:14 -06:00
Zak Betz
a697b3d5a5 Fix Bad File Descriptor loop bug when downloading a static file on a
slow disk.
2015-07-14 17:00:09 -06:00
Zak Betz
6e21bc7d7c Injection script fixes.
Temporary fix for core when injecting large warc.
2015-07-08 14:03:39 -06:00
Zak Betz
87fcda0f93 Fix atotime5 to parse ISO8601.
Fix qa test for warcs and arcs.
Fix inject script.
2015-07-06 00:51:18 -06:00
Zak Betz
6de4199ee8 Fix linkdb core.
Make file and line number the label for StackBuf.
2015-07-02 12:17:10 -06:00
Zak Betz
b88079a2d4 Fix warc injector script. 2015-06-30 22:19:13 -06:00
Zak Betz
7b507a70ef Set value length to 0 for something that does not return a string value
in Json.cpp.
Fix the '-' -> '_' when indexing generic fields.
Add a StackBuf macro which is a Safebuf initialized with a small
stack buffer for use in a local scope.
2015-06-30 14:09:57 -06:00
Zak Betz
f490847eb2 Fix injector build script.
Add IA's lib for getting metadata.
2015-06-18 01:23:13 -06:00
Zak Betz
9f61636881 Change collection on inject script. 2015-06-18 00:24:36 -06:00
Zak Betz
9ca0223cf1 Translate metadata field names with dashes to _.
Add unit tests for searching for certain types of metadata.
2015-06-17 23:36:31 -06:00
Zak Betz
32987e76ee Add json metadata field to page inject.
Fix memory leak when spidering warc files.
Add script to inject warcs from internet archives search results.
2015-06-14 20:58:41 -06:00