Matt
744cd54131
Merge branch 'ia' into ia-zak
2015-08-31 09:14:27 -06:00
Matt
d9422d8b0e
get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
a1ed368d82
bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt
e9f86f362e
Merge branch 'ia' into ia-zak
2015-07-22 12:02:19 -06:00
Matt
16fd428887
fix more cores from the dynamic query size changes.
add how many query terms we truncated in the json/xml replies.
document those fields as well.
2015-07-18 14:15:47 -06:00
Zak Betz
a697b3d5a5
Fix Bad File Descriptor loop bug when downloading a static file on a
slow disk.
2015-07-14 17:00:09 -06:00
Matt
599b33524f
wget cookie support
2015-05-02 21:52:58 -07:00
Matt Wells
2421bf3d1d
ia checkpoint
2015-05-02 23:51:19 +00:00
Matt
d3c071e4c0
fix gbiaitem page
2015-04-30 21:27:11 -07:00
Matt
9370c8f52e
more fixes
2015-04-28 23:20:16 -07:00
Matt
faf2c06d29
some fixes for indexing warcs/arcs.
2015-04-28 22:30:58 -07:00
Matt
0eb415d408
added preliminary support for spidering .warc.gz and .arc.gz files
2015-04-27 21:41:22 -06:00
Matt
ccb53eb4e7
use http://127.0.0.1:8000/iagbcoll/ <itemname> as a url whose
content will be the arc/warc files as urls.
2015-04-25 17:50:22 -06:00
Matt Wells
a2feab9a4a
tap in some fixes for running the newly updated smokes
for dealing with the new urls.csv format
2015-04-21 15:20:57 -07:00
Matt
ef42a9cf28
new urls.csv polish. moved columns around. added
some new gbss fields, like spidered time.
2015-04-15 17:42:56 -06:00
Matt
43ced700d0
calls NEWS BLOG
2015-04-12 12:33:09 -06:00
Matt
95e3a760e9
proxy fixes
2015-03-05 11:10:40 -08:00
Matt Wells
b80a70a6fd
fix for https urls through proxies
using newly updated tcp/loop code.
2015-02-21 09:25:54 -08:00
Matt
2488c1a338
added proper write callback registration into
TcpServer.cpp so we only register write callbacks
when a non-blocking write does not write all the
bytes requested of it, or when a connection does not
complete. also fixed up the sslHandshake() function
which calls SSL_connect().
2015-02-16 14:48:39 -07:00
Matt
c15bd53e52
added support for supplying basic proxy authorization
to spider proxies. username:password@1.2.3.4:80
2015-02-02 13:23:38 -08:00
mwells
87285ba3cd
use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
adcef39376
Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
Collectiondb.cpp
Collectiondb.h
Conf.cpp
Conf.h
Msg39.cpp
PageEvents.cpp
PageResults.cpp
PageTurk.cpp
Pages.cpp
Parms.cpp
Posdb.cpp
Proxy.cpp
Query.cpp
Query.h
RdbBase.cpp
RdbMap.cpp
Repair.cpp
Repair.h
SafeBuf.cpp
Spider.cpp
Tagdb.cpp
TopTree.cpp
XmlDoc.cpp
main.cpp
2014-11-20 16:53:07 -08:00
Matt
4e8a42e024
text replacements for bad int32_t substitutions
2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6
good checkpoint. quite a few fixes.
2014-11-17 18:13:36 -08:00
Matt
4c19453ea9
working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
Matt Wells
95f6dcf4f7
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Conflicts:
HttpServer.cpp
2014-11-01 06:18:20 -07:00
Matt Wells
45972d9837
disregard CONNECT requests for now
2014-11-01 06:17:36 -07:00
Matt Wells
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
Matt Wells
033a8b80a0
fix core if json item has column not in table
when dumping json items as csv.
2014-10-10 07:00:11 -07:00
Matt Wells
8bb3545b71
emergency fixes for out of sockets core and
get proxy request timing out causing spider to hang bug.
2014-10-09 07:20:04 -07:00
Mike Tung
837974e0ec
Valid JSON output for showinput=1.
2014-10-05 21:15:08 -07:00
Matt Wells
7df7fbe721
support the CONNECT for gb squid proxy
2014-10-02 12:36:43 -07:00
mwells
4e7152b487
fix more bugs in squid proxy implementation.
force squid proxy stack to use floaters.
2014-10-02 11:54:50 -07:00
mwells
42b891219d
several fixes for floater proxy through squid proxy.
gb needs to act like squid for the rendering machines so
it can do crawl delay backoff and load balancing over the
floaters.
2014-10-02 02:08:38 -07:00
mwells
6de7a3f6b3
get advanced search working again
2014-09-27 11:12:47 -07:00
mwells
783ae1d4e7
print chrome on other pages
2014-09-23 20:59:48 -07:00
mwells
5b69f03b59
more updates
2014-09-20 16:41:14 -07:00
Matt Wells
6b6583fc0a
update gui
2014-09-20 11:01:22 -07:00
mwells
65e533bbb7
website updates
2014-09-01 17:23:15 -07:00
mwells
58d8861a34
widget page updates
2014-09-01 17:04:08 -07:00
mwells
25b79684c5
website gui fixes
2014-09-01 13:31:13 -07:00
mwells
1bc5fecb33
website updates
2014-08-31 11:11:12 -07:00
mwells
ef8cb47590
website updates.
2014-08-31 10:51:37 -07:00
mwells
754d5b4755
rename admin.html to faq.html etc. file juggling.
2014-08-31 09:51:21 -07:00
mwells
947be58f10
Merge branch 'diffbot-testing' into testing
Conflicts:
HttpRequest.cpp
Msg13.cpp
XmlDoc.cpp
2014-08-05 17:19:53 -07:00
mwells
cc1ceaaac2
fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
2014-08-05 17:04:11 -07:00
mwells
13743acd5a
gui updates
2014-08-03 10:42:45 -07:00
mwells
3cc54b72cc
qa updates
2014-07-28 19:15:31 -07:00
mwells
d5805733e5
more api updates
2014-07-13 09:35:44 -07:00