Update README.md

liameno 2022-09-01 15:09:23 +03:00
parent ceab6bbb73
commit e6a096c674

README.md

@@ -4,16 +4,15 @@
![](images/demo.png)
## Features
#### Crawler
- Multithreading
- Cache
- Robots.txt
- Update site info after a set interval
- Proxy
- Queue (BFS)
- Normalize URL
- Detect trackers
- HTTP -> HTTPS
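The crawler features include URL normalization and HTTP -> HTTPS upgrading (the `is_http_to_https` config key). This README does not say which rules librengine applies. The following is only a sketch of common normalization steps in POSIX `sh`, and the `normalize_url` helper is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper, not librengine's implementation.
# normalize_url URL [UPGRADE]   (UPGRADE defaults to 1)
# Expects an absolute URL: scheme://host[:port][/path][?query][#fragment]
# (assumes the host is followed by "/" or nothing).
# POSIX sh has no "local", so these variables are global.
normalize_url() {
    url=$1
    upgrade=${2:-1}

    # Drop the fragment (everything from the first '#')
    url=${url%%#*}

    # Split into scheme, host[:port] and path (the path keeps any ?query)
    scheme=$(printf '%s' "${url%%://*}" | tr '[:upper:]' '[:lower:]')
    rest=${url#*://}
    host=${rest%%/*}
    path=${rest#"$host"}
    host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')

    # Remove the scheme's default port
    case "$scheme:$host" in
        http:*:80) host=${host%:80} ;;
        https:*:443) host=${host%:443} ;;
    esac

    # Optional HTTP -> HTTPS upgrade (cf. "is_http_to_https")
    if [ "$upgrade" = 1 ] && [ "$scheme" = http ]; then
        scheme=https
    fi

    # Empty path becomes "/", otherwise drop one trailing "/"
    case "$path" in
        "") path=/ ;;
        /) ;;
        */) path=${path%/} ;;
    esac

    printf '%s://%s%s\n' "$scheme" "$host" "$path"
}
```

For example, `normalize_url 'HTTP://Example.COM:80/Docs/#top'` prints `https://example.com/Docs`. Passing `0` as the second argument keeps the original scheme.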
#### Website / CLI
- Encryption (RSA)
@@ -22,6 +21,46 @@
- Nodes
- Rating
```shell
cd scripts && sh install_deps.sh
```
## Build
```shell
cd scripts && sh build_all.sh
```
## Run
#### DB
```shell
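mkdir -p /tmp/typesense-data
# typesense-server stays in the foreground, so start it in the background
./typesense-server --data-dir=/tmp/typesense-data --api-key=xyz --enable-cors &
# wait until the server answers on its /health endpoint, then initialize the DB
until curl -sf http://localhost:8108/health > /dev/null; do sleep 1; done
sh scripts/init_db.sh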
```
#### Crawler
```shell
./crawler ../../sites.txt 5 ../../config.json
#[sites_path] [threads_count] [config path]
```
#### Website
```shell
./website ../../config.json
#[config path]
```
#### CLI
###### Start the website first!
```shell
./cli gnu 1 ../../config.json
#[query] [page] [config path]
```
## Instances
¯\_(ツ)_/¯
## TODO
- [x] Encryption (asymmetric)
- [x] Multithreading crawler
@@ -36,84 +75,12 @@
- typesense ([source](https://github.com/typesense/typesense))
- openssl ([source](https://github.com/openssl/openssl))
## Config
`./config.json`
```json
//*_s: value in seconds
//proxy: type://ip:port or empty ("")
//proxy example: socks5://127.0.0.1:9050
{
    "global": {
        //if changed, also update website/frontend/js/search_encrypt.js
        "rsa_key_length": 1024, //1024|2048|4096
        "max_title_show_size": 55,
        "max_desc_show_size": 350,
        "nodes": [
            {
                "name": "This",
                "url": "http://127.0.0.1:8080"
            }
        ]
    },
    "crawler": {
        "user_agent": "librengine",
        "proxy": "socks5://127.0.0.1:9050",
        "load_page_timeout_s": 10,
        "update_time_site_info_s_after": 864000, //10 days
        "delay_time_s": 3,
        "max_pages_site": 5,
        "max_page_symbols": 50000000, //50 MB
        "max_robots_txt_symbols": 3000,
        "max_lru_cache_size_host": 512,
        "max_lru_cache_size_url": 512,
        "is_http_to_https": true,
        "is_check_robots_txt": true
    },
    "cli": {
        "proxy": "socks5://127.0.0.1:9050"
    },
    "website": {
        "port": 8080,
        "proxy": "socks5://127.0.0.1:9050"
    },
    //if changed, also update scripts/init_db.sh
    "db": {
        "url": "http://localhost:8108",
        "api_key": "xyz"
    }
}
```
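With `is_check_robots_txt` enabled, the crawler presumably consults robots.txt under its `user_agent` name (`librengine` above) before fetching a page. The sketch below is simplified and is not librengine's code. The `is_allowed` function is hypothetical. It uses the group that names the agent (falling back to `*`) and treats each `Disallow` value as a plain path prefix. It ignores `Allow` lines and the `*`/`$` wildcards defined in RFC 9309.

```shell
#!/bin/sh
# Hypothetical, simplified robots.txt check (not librengine's implementation).
# is_allowed ROBOTS_FILE USER_AGENT PATH
# Exit status 0 if PATH may be crawled, 1 if a Disallow prefix matches.
is_allowed() {
    awk -v ua="$2" -v path="$3" '
        BEGIN { ua = tolower(ua) }
        {
            sub(/\r$/, "")            # tolerate CRLF line endings
            sub(/#.*/, "")            # strip comments
            colon = index($0, ":")
            if (colon == 0) next      # blank or malformed line
            key = tolower(substr($0, 1, colon - 1))
            val = substr($0, colon + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)

            if (key == "user-agent") {
                # consecutive User-agent lines belong to one group
                if (!in_agents) { mine = 0; star = 0 }
                in_agents = 1
                if (tolower(val) == ua) { mine = 1; seen_mine = 1 }
                if (val == "*") star = 1
                next
            }
            in_agents = 0
            # an empty Disallow value allows everything, so skip it
            if (key == "disallow" && val != "") {
                if (mine) own_rules[++n_own] = val
                if (star) star_rules[++n_star] = val
            }
        }
        END {
            # a group that names our agent overrides the "*" group
            if (seen_mine) {
                for (i = 1; i <= n_own; i++)
                    if (substr(path, 1, length(own_rules[i])) == own_rules[i]) exit 1
            } else {
                for (i = 1; i <= n_star; i++)
                    if (substr(path, 1, length(star_rules[i])) == star_rules[i]) exit 1
            }
            exit 0
        }
    ' "$1"
}
```

For example, `is_allowed robots.txt librengine /some/page && echo crawl` prints `crawl` only when no applicable `Disallow` prefix matches. A complete RFC 9309 implementation would also weigh `Allow` rules and pick the longest matching rule.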
## Mirrors
- https://github.com/liameno/librengine
- https://codeberg.org/liameno/librengine
## License
GNU Affero General Public License v3.0