Privacy Web Search Engine (not a metasearch engine; it uses its own crawler)

New version: https://github.com/sightnet

This project will not be maintained.

Privacy Web Search Engine

Website

Features

Crawler

  • Multithreading
  • Cache
  • Robots.txt
  • Proxy
  • Queue (BFS)
  • Detect Trackers
  • HTTP -> HTTPS

Website / CLI

  • Encryption (RSA)
  • API
  • Proxy
  • Nodes
  • Rating

Usage (Docker)

The arguments are passed with --build-arg, so they are baked into the image at build time: rebuild every time you change them.
By default the site runs on port 8080 and uses a Tor proxy (!!!). To change this, edit config.json and rebuild the website image.
The database API key must be changed in two places: in config.json and in the --api-key argument used to start the database.

DB (start this before the other services)

sudo docker pull typesense/typesense:0.24.0.rcn6
mkdir -p /tmp/typesense-data
sudo docker run -p 8108:8108 -v /tmp/typesense-data:/data typesense/typesense:0.24.0.rcn6 --data-dir /data --api-key=xyz

Crawler

sudo docker-compose build crawler --build-arg SITES="$(cat sites.txt)"  --build-arg THREADS=1 --build-arg CONFIG="$(cat config.json)"
sudo docker-compose up crawler

Website

sudo docker-compose build website --build-arg CONFIG="$(cat config.json)"
sudo docker-compose up website

Usage (Manual)

Deps

cd scripts && sh install_deps.sh

Build

cd scripts && sh build_all.sh

Run

By default the site runs on port 8080 and uses a Tor proxy (!!!). To change this, edit config.json.
The database API key must be changed in two places: in config.json and in the --api-key argument used to start the database.

DB (start this before the other services)

mkdir -p /tmp/typesense-data
./typesense-server --data-dir=/tmp/typesense-data --api-key=xyz --enable-cors
#the server runs in the foreground; once it is up, run in a second terminal:
sh scripts/init_db.sh

Crawler

./crawler ../../sites.txt 5 ../../config.json
#[sites_path] [threads_count] [config path]
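A hypothetical sketch of how the three positional arguments above might be validated before the crawler starts. This is C++17 and not the project's actual entry point: `CrawlerArgs` and `parse_crawler_args` are illustrative names, and the 1024-thread cap is an arbitrary assumption, not a documented limit.

```cpp
#include <optional>
#include <string>

// Hypothetical holder for: crawler [sites_path] [threads_count] [config path]
struct CrawlerArgs {
    std::string sites_path;
    unsigned threads = 0;
    std::string config_path;
};

// Returns std::nullopt on a wrong argument count or an invalid thread count.
std::optional<CrawlerArgs> parse_crawler_args(int argc, const char* const argv[]) {
    if (argc != 4) return std::nullopt;      // program name + 3 arguments
    const std::string threads_str = argv[2];
    if (threads_str.empty()) return std::nullopt;
    unsigned long threads = 0;
    for (char c : threads_str) {
        if (c < '0' || c > '9') return std::nullopt;  // digits only, no sign
        threads = threads * 10 + static_cast<unsigned long>(c - '0');
        if (threads > 1024) return std::nullopt;      // arbitrary cap (assumption)
    }
    if (threads == 0) return std::nullopt;   // at least one worker thread
    return CrawlerArgs{argv[1], static_cast<unsigned>(threads), argv[3]};
}
```

Rejecting bad input up front gives a clear error instead of a crawler that silently starts with zero threads.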

Website

./website ../../config.json
#[config path]

Instances

¯\_(ツ)_/¯

TODO

  • Docker
  • Encryption (asymmetric)
  • Multithreading crawler
  • Robots rules (from headers and HTML) and crawl-delay
  • Responsive web design
  • Own FTS (...)
  • Images Crawler
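For the robots rules and crawl-delay item, here is a simplified C++17 sketch of reading the `User-agent: *` group of a robots.txt file. It is not the project's own robots.txt parser, and `RobotsRules`, `parse_robots` and `is_allowed` are hypothetical names. Assumptions: only `*` groups are read, the `*` and `$` path wildcards are ignored, Allow/Disallow use the longest-match rule from RFC 9309 (Allow wins a tie), and `Crawl-delay`, which is a non-standard extension outside RFC 9309, is treated as seconds.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Rules collected from robots.txt groups addressed to "User-agent: *".
struct RobotsRules {
    std::vector<std::string> allow;
    std::vector<std::string> disallow;
    double crawl_delay = 0.0;  // seconds; 0 means "not specified"
};

static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

static std::string to_lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

RobotsRules parse_robots(const std::string& text) {
    RobotsRules rules;
    std::istringstream in(text);
    std::string line;
    bool in_star_group = false;
    bool prev_was_agent = false;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // strip comments and "\r"
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string key = to_lower(trim(line.substr(0, colon)));
        const std::string value = trim(line.substr(colon + 1));
        if (key == "user-agent") {
            // Consecutive User-agent lines open one shared group.
            const bool star = (value == "*");
            in_star_group = prev_was_agent ? (in_star_group || star) : star;
            prev_was_agent = true;
            continue;
        }
        prev_was_agent = false;
        if (!in_star_group) continue;
        // An empty Allow/Disallow value matches nothing, so it adds no rule.
        if (key == "allow" && !value.empty()) {
            rules.allow.push_back(value);
        } else if (key == "disallow" && !value.empty()) {
            rules.disallow.push_back(value);
        } else if (key == "crawl-delay") {
            try { rules.crawl_delay = std::stod(value); } catch (...) {}  // ignore bad values
        }
    }
    return rules;
}

// Longest matching path prefix wins; Allow wins a tie (RFC 9309).
bool is_allowed(const RobotsRules& rules, const std::string& path) {
    long best_allow = -1, best_disallow = -1;
    for (const auto& p : rules.allow) {
        const long len = static_cast<long>(p.size());
        if (path.compare(0, p.size(), p) == 0 && len > best_allow) best_allow = len;
    }
    for (const auto& p : rules.disallow) {
        const long len = static_cast<long>(p.size());
        if (path.compare(0, p.size(), p) == 0 && len > best_disallow) best_disallow = len;
    }
    return best_disallow < 0 || best_allow >= best_disallow;
}
```

A crawler would call `is_allowed` before each fetch and wait `crawl_delay` seconds between requests to the same host.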

Dependencies

Config

./config.json

Mirrors

License

GNU Affero General Public License v3.0