1 Commit

Author: Gianni Carafa
SHA1: 797a33068d
Message: WIP: Add rules argument
Date: 2023-11-06 14:50:33 +01:00

13 changed files with 78 additions and 315 deletions

.github/pull_request_template.md vendored Normal file

@@ -3,8 +3,6 @@
</p> </p>
<h1 align="center">Ladder</h1> <h1 align="center">Ladder</h1>
<div><img alt="License" src="https://img.shields.io/github/license/kubero-dev/ladder"> <img alt="go.mod Go version " src="https://img.shields.io/github/go-mod/go-version/kubero-dev/ladder"> <img alt="GitHub tag (with filter)" src="https://img.shields.io/github/v/tag/kubero-dev/ladder"> <img alt="GitHub (Pre-)Release Date" src="https://img.shields.io/github/release-date-pre/kubero-dev/ladder"> <img alt="GitHub Downloads all releases" src="https://img.shields.io/github/downloads/kubero-dev/ladder/total"> <img alt="GitHub Build Status (with event)" src="https://img.shields.io/github/actions/workflow/status/kubero-dev/ladder/release-binaries.yaml"></div>
*Ladder is a web proxy to help bypass paywalls.* This is a selfhosted version of [1ft.io](https://1ft.io) and [12ft.io](https://12ft.io). It is inspired by [13ft](https://github.com/wasi-master/13ft). *Ladder is a web proxy to help bypass paywalls.* This is a selfhosted version of [1ft.io](https://1ft.io) and [12ft.io](https://12ft.io). It is inspired by [13ft](https://github.com/wasi-master/13ft).
@@ -27,7 +25,7 @@ Freedom of information is an essential pillar of democracy and informed decision
- [x] Linux binary - [x] Linux binary
- [x] Mac OS binary - [x] Mac OS binary
- [x] Windows binary (untested) - [x] Windows binary (untested)
- [x] Removes most of the ads (unexpected side effect ¯\\\_(ツ)_/¯ ) - [x] Removes most of the ads (unexpected side effect ¯\_(ツ)_/¯ )
- [x] Basic Auth - [x] Basic Auth
- [x] Disable logs - [x] Disable logs
- [x] No Tracking - [x] No Tracking
@@ -38,7 +36,7 @@ Freedom of information is an essential pillar of democracy and informed decision
- [ ] Fetch from Google Cache if not available - [ ] Fetch from Google Cache if not available
### Limitations ### Limitations
Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please consider buying a subscription for the site. Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please concider buying a subscription for the site.
Some sites do not expose their content to search engines, which means that the proxy cannot access the content. A future version will try to fetch the content from Google Cache. Some sites do not expose their content to search engines, which means that the proxy cannot access the content. A future version will try to fetch the content from Google Cache.
@@ -62,9 +60,6 @@ curl https://raw.githubusercontent.com/kubero-dev/ladder/main/docker-compose.yam
docker-compose up -d docker-compose up -d
``` ```
### Helm
See [README.md](/helm-chart/README.md) in helm-chart sub-directory for more information.
## Usage ## Usage
### Browser ### Browser
@@ -75,11 +70,6 @@ See [README.md](/helm-chart/README.md) in helm-chart sub-directory for more info
Or direct by appending the URL to the end of the proxy URL: Or direct by appending the URL to the end of the proxy URL:
http://localhost:8080/https://www.example.com http://localhost:8080/https://www.example.com
Or create a bookmark with the following URL:
```javascript
javascript:window.location.href="http://localhost:8080/"+location.href
```
### API ### API
```bash ```bash
curl -X GET "http://localhost:8080/api/https://www.example.com" curl -X GET "http://localhost:8080/api/https://www.example.com"
@@ -106,7 +96,7 @@ http://localhost:8080/ruleset
| `LOG_URLS` | Log fetched URL's | `true` | | `LOG_URLS` | Log fetched URL's | `true` |
| `DISABLE_FORM` | Disables URL Form Frontpage | `false` | | `DISABLE_FORM` | Disables URL Form Frontpage | `false` |
| `FORM_PATH` | Path to custom Form HTML | `` | | `FORM_PATH` | Path to custom Form HTML | `` |
| `RULESET` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` or `/path/to/my/rules.yaml` | | `RULESET` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` or `/path/to/my/rules.yaml` or `default` |
| `EXPOSE_RULESET` | Make your Ruleset available to other ladders | `true` | | `EXPOSE_RULESET` | Make your Ruleset available to other ladders | `true` |
| `ALLOWED_DOMAINS` | Comma separated list of allowed domains. Empty = no limitations | `` | | `ALLOWED_DOMAINS` | Comma separated list of allowed domains. Empty = no limitations | `` |
| `ALLOWED_DOMAINS_RULESET` | Allow Domains from Ruleset. false = no limitations | `false` | | `ALLOWED_DOMAINS_RULESET` | Allow Domains from Ruleset. false = no limitations | `false` |
@@ -121,15 +111,12 @@ See in [ruleset.yaml](ruleset.yaml) for an example.
```yaml ```yaml
- domain: www.example.com - domain: www.example.com
domains: # Additional domains to apply the rule
- www.example.com
- www.beispiel.de
regexRules: regexRules:
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)" - match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3" replace: <script $1 script="/https://www.example.com/$3"
injections: injections:
- position: head # Position where to inject the code - position: head # Position where to inject the code
append: | # possible keys: append, prepend, replace append: |
<script> <script>
window.localStorage.clear(); window.localStorage.clear();
console.log("test"); console.log("test");
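
The `regexRules` entry in the README example pairs a `match` pattern with a `replace` template that refers to capture groups (`$1`, `$3`). A minimal sketch of how such a pair behaves under Go's `regexp` package, assuming the proxy applies it with something like `ReplaceAllString` (the code that does this is not part of the diff); `applyRegexRule` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// applyRegexRule is a hypothetical helper: it compiles match and applies
// replace to body, expanding $1, $2, ... to the captured groups.
func applyRegexRule(body, match, replace string) (string, error) {
	re, err := regexp.Compile(match)
	if err != nil {
		return "", err
	}
	return re.ReplaceAllString(body, replace), nil
}

func main() {
	// Pattern and template copied verbatim from the README example.
	match := `<script\s+([^>]*\s+)?src="(/)([^"]*)"`
	replace := `<script $1 script="/https://www.example.com/$3"`

	// The input tag is made up for illustration.
	out, err := applyRegexRule(`<script async src="/static/app.js"></script>`, match, replace)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Group 1 keeps any attributes that precede `src`, and group 3 is the root-relative path, which the template prefixes with the site origin. When the tag has no attributes before `src`, group 1 is empty and expands to nothing.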

@@ -7,6 +7,7 @@ import (
"ladder/handlers" "ladder/handlers"
"log" "log"
"os" "os"
"strconv"
"strings" "strings"
"github.com/akamensky/argparse" "github.com/akamensky/argparse"
@@ -22,28 +23,34 @@ func main() {
parser := argparse.NewParser("ladder", "Every Wall needs a Ladder") parser := argparse.NewParser("ladder", "Every Wall needs a Ladder")
portEnv := os.Getenv("PORT") p := os.Getenv("PORT")
if os.Getenv("PORT") == "" { if os.Getenv("PORT") == "" {
portEnv = "8080" p = "8080"
} }
port := parser.String("p", "port", &argparse.Options{ port := parser.String("p", "port", &argparse.Options{
Required: false, Required: false,
Default: portEnv, Default: p,
Help: "Port the webserver will listen on"}) Help: "Port the webserver will listen on"})
pf, _ := strconv.ParseBool(os.Getenv("PREFORK"))
prefork := parser.Flag("P", "prefork", &argparse.Options{ prefork := parser.Flag("P", "prefork", &argparse.Options{
Required: false, Required: false,
Default: pf,
Help: "This will spawn multiple processes listening"}) Help: "This will spawn multiple processes listening"})
r := os.Getenv("RULESET")
ruleset := parser.String("r", "ruleset", &argparse.Options{
Required: false,
Default: r,
Help: "Path or URL to your ruleset"})
handlers.LoadRules(*ruleset)
err := parser.Parse(os.Args) err := parser.Parse(os.Args)
if err != nil { if err != nil {
fmt.Print(parser.Usage(err)) fmt.Print(parser.Usage(err))
} }
if os.Getenv("PREFORK") == "true" {
*prefork = true
}
app := fiber.New( app := fiber.New(
fiber.Config{ fiber.Config{
Prefork: *prefork, Prefork: *prefork,
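
In the hunk above, `handlers.LoadRules(*ruleset)` is called before `parser.Parse(os.Args)`, and its return value is discarded. Flag libraries generally fill in the pointed-to value only during parsing, so the loader probably sees an empty string at that point. The sketch below shows the parse-then-use ordering with environment-variable defaults. It uses the standard library `flag` package instead of `argparse` so that it stays self-contained; `config` and `parseConfig` are hypothetical names, and the short options `-p`, `-P`, `-r` are not reproduced.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// config holds the three settings touched by the hunk (hypothetical type).
type config struct {
	Port    string
	Prefork bool
	Ruleset string
}

// parseConfig reads defaults from the environment, then lets command-line
// flags override them. getenv is injected so the function is easy to test.
func parseConfig(args []string, getenv func(string) string) (config, error) {
	port := getenv("PORT")
	if port == "" {
		port = "8080"
	}
	// An unset or malformed PREFORK falls back to false.
	prefork, _ := strconv.ParseBool(getenv("PREFORK"))

	var cfg config
	fs := flag.NewFlagSet("ladder", flag.ContinueOnError)
	fs.StringVar(&cfg.Port, "port", port, "Port the webserver will listen on")
	fs.BoolVar(&cfg.Prefork, "prefork", prefork, "Spawn multiple processes listening")
	fs.StringVar(&cfg.Ruleset, "ruleset", getenv("RULESET"), "Path or URL to your ruleset")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	// cfg.Ruleset is final only after parsing, so a rules loader would be
	// called here and its result stored.
	fmt.Printf("port=%s prefork=%t ruleset=%q\n", cfg.Port, cfg.Prefork, cfg.Ruleset)
}
```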

@@ -3,17 +3,12 @@ services:
ladder: ladder:
image: ghcr.io/kubero-dev/ladder:latest image: ghcr.io/kubero-dev/ladder:latest
container_name: ladder container_name: ladder
#build: . build: .
#restart: always #restart: always
#command: sh -c ./ladder #command: sh -c ./ladder
environment: environment:
- PORT=8080 - PORT=8080
- RULESET=/app/ruleset.yaml #- PREFORK=true
#- ALLOWED_DOMAINS_RULESET=false
#- EXPOSE_RULESET=true
#- PREFORK=false
#- DISABLE_FORM=fase
#- FORM_PATH=/app/form.html
#- X_FORWARDED_FOR=66.249.66.1 #- X_FORWARDED_FOR=66.249.66.1
#- USER_AGENT=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) #- USER_AGENT=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
#- USERPASS=foo:bar #- USERPASS=foo:bar
@@ -21,6 +16,11 @@ services:
#- GODEBUG=netdns=go #- GODEBUG=netdns=go
ports: ports:
- "8080:8080" - "8080:8080"
volumes: deploy:
- ./ruleset.yaml:/app/ruleset.yaml resources:
- ./handlers/form.html:/app/form.html limits:
cpus: "0.50"
memory: 512M
reservations:
cpus: "0.25"
memory: 128M

@@ -27,8 +27,7 @@ func Api(c *fiber.Ctx) error {
Version: version, Version: version,
Body: body, Body: body,
} }
response.Request.Headers = make([]interface{}, 0)
response.Request.Headers = make([]any, 0, len(req.Header))
for k, v := range req.Header { for k, v := range req.Header {
response.Request.Headers = append(response.Request.Headers, map[string]string{ response.Request.Headers = append(response.Request.Headers, map[string]string{
"key": k, "key": k,
@@ -36,7 +35,7 @@ func Api(c *fiber.Ctx) error {
}) })
} }
response.Response.Headers = make([]any, 0, len(resp.Header)) response.Response.Headers = make([]interface{}, 0)
for k, v := range resp.Header { for k, v := range resp.Header {
response.Response.Headers = append(response.Response.Headers, map[string]string{ response.Response.Headers = append(response.Response.Headers, map[string]string{
"key": k, "key": k,
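
The two sides of this hunk differ only in how the header slices are allocated: `make([]interface{}, 0)` versus `make([]any, 0, len(...))`. Since Go 1.18 `any` is an alias for `interface{}`, so the practical difference is the preallocated capacity. A minimal sketch of the flattening loop with preallocation follows; `headerList` is a hypothetical name, and joining multiple values with `", "` is an assumption because the value line is not visible in the diff.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// headerList flattens an http.Header into a slice of key/value maps.
// The slice is created with capacity len(h), so the appends below never
// need to grow the backing array.
func headerList(h http.Header) []any {
	out := make([]any, 0, len(h))
	for k, v := range h {
		out = append(out, map[string]string{
			"key":   k,
			"value": strings.Join(v, ", "), // assumption: values are joined
		})
	}
	return out
}

func main() {
	h := http.Header{}
	h.Set("Content-Type", "text/html")
	h.Add("Accept", "text/html")
	h.Add("Accept", "application/json")
	// Map iteration order is random, so the entry order varies between runs.
	for _, entry := range headerList(h) {
		fmt.Println(entry)
	}
}
```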

@@ -17,8 +17,11 @@ import (
var UserAgent = getenv("USER_AGENT", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)") var UserAgent = getenv("USER_AGENT", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
var ForwardedFor = getenv("X_FORWARDED_FOR", "66.249.66.1") var ForwardedFor = getenv("X_FORWARDED_FOR", "66.249.66.1")
var rulesSet = loadRules() var rulesSet RuleSet
// var rulesSet = loadRules()
var allowedDomains = strings.Split(os.Getenv("ALLOWED_DOMAINS"), ",") var allowedDomains = strings.Split(os.Getenv("ALLOWED_DOMAINS"), ",")
var Aaaa = "aaaa"
func ProxySite(c *fiber.Ctx) error { func ProxySite(c *fiber.Ctx) error {
// Get the url from the URL // Get the url from the URL
@@ -56,7 +59,7 @@ func fetchSite(urlpath string, queries map[string]string) (string, *http.Request
return "", nil, nil, fmt.Errorf("domain not allowed. %s not in %s", u.Host, allowedDomains) return "", nil, nil, fmt.Errorf("domain not allowed. %s not in %s", u.Host, allowedDomains)
} }
if os.Getenv("LOG_URLS ") == "true" { if os.Getenv("DEBUG ") == "true" {
log.Println(u.String() + urlQuery) log.Println(u.String() + urlQuery)
} }
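
As rendered here, both sides of this hunk look up an environment variable whose name ends in a space (`"LOG_URLS "` and `"DEBUG "`). A variable exported as `LOG_URLS=true` has a different name, so the condition would not become true. A minimal sketch of a stricter lookup, with `envBool` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// envBool reports whether the variable named key holds a true-like value
// ("1", "t", "true", "TRUE", ...). Whitespace around the value is ignored;
// unset or malformed values count as false.
func envBool(getenv func(string) string, key string) bool {
	v, err := strconv.ParseBool(strings.TrimSpace(getenv(key)))
	return err == nil && v
}

func main() {
	os.Setenv("LOG_URLS", "true")
	// The name with a trailing space is a different variable.
	fmt.Println(os.Getenv("LOG_URLS ") == "true")
	fmt.Println(envBool(os.Getenv, "LOG_URLS"))
}
```
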
@@ -117,13 +120,18 @@ func getenv(key, fallback string) string {
return value return value
} }
func loadRules() RuleSet { func LoadRules(rulesUrl string) RuleSet {
rulesUrl := os.Getenv("RULESET") //rulesUrl := os.Getenv("RULESET")
if rulesUrl == "" { if rulesUrl == "" {
RulesList := RuleSet{} RulesList := RuleSet{}
return RulesList return RulesList
} }
log.Println("Loading rules")
if rulesUrl == "default" {
rulesUrl = "https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml"
}
log.Println("Loading rules: " + rulesUrl)
var ruleSet RuleSet var ruleSet RuleSet
if strings.HasPrefix(rulesUrl, "http") { if strings.HasPrefix(rulesUrl, "http") {
@@ -155,17 +163,14 @@ func loadRules() RuleSet {
yaml.Unmarshal(yamlFile, &ruleSet) yaml.Unmarshal(yamlFile, &ruleSet)
} }
domains := []string{}
for _, rule := range ruleSet { for _, rule := range ruleSet {
//log.Println("Loaded rules for", rule.Domain)
domains = append(domains, rule.Domain)
domains = append(domains, rule.Domains...)
if os.Getenv("ALLOWED_DOMAINS_RULESET") == "true" { if os.Getenv("ALLOWED_DOMAINS_RULESET") == "true" {
allowedDomains = append(allowedDomains, domains...) allowedDomains = append(allowedDomains, rule.Domain)
} }
} }
log.Println("Loaded ", len(ruleSet), " rules for", len(domains), "Domains") log.Println("Loaded rules for", len(ruleSet), "Domains")
return ruleSet return ruleSet
} }
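
The new `LoadRules` accepts an empty string, the keyword `default`, a URL, or a file path. The branching can be sketched as a small pure function; `rulesetSource` and the `kind` labels are hypothetical, while the default URL and the `http` prefix check are taken from the hunk above.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultRulesetURL is the address the diff substitutes for "default".
const defaultRulesetURL = "https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml"

// rulesetSource classifies the RULESET argument without doing any I/O.
// kind is "none", "remote" or "file"; location is the resolved URL or path.
func rulesetSource(arg string) (kind, location string) {
	switch {
	case arg == "":
		return "none", ""
	case arg == "default":
		return "remote", defaultRulesetURL
	case strings.HasPrefix(arg, "http"):
		// Same prefix test as the diff: anything starting with "http".
		return "remote", arg
	default:
		return "file", arg
	}
}

func main() {
	for _, arg := range []string{"", "default", "https://example.com/rules.yaml", "/path/to/my/rules.yaml"} {
		kind, loc := rulesetSource(arg)
		fmt.Printf("%q -> %s %s\n", arg, kind, loc)
	}
}
```

One consequence of the prefix test is that a relative path such as `httpd/rules.yaml` would be classified as remote; checking for `http://` and `https://` would avoid that.
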
@@ -175,10 +180,7 @@ func applyRules(domain string, path string, body string) string {
} }
for _, rule := range rulesSet { for _, rule := range rulesSet {
domains := rule.Domains if rule.Domain != domain {
domains = append(domains, rule.Domain)
for _, ruleDomain := range domains {
if ruleDomain != domain {
continue continue
} }
if len(rule.Paths) > 0 && !StringInSlice(path, rule.Paths) { if len(rule.Paths) > 0 && !StringInSlice(path, rule.Paths) {
@@ -208,7 +210,6 @@ func applyRules(domain string, path string, body string) string {
} }
} }
} }
}
return body return body
} }
@@ -220,7 +221,6 @@ type Rule struct {
type RuleSet []struct { type RuleSet []struct {
Domain string `yaml:"domain"` Domain string `yaml:"domain"`
Domains []string `yaml:"domains,omitempty"`
Paths []string `yaml:"paths,omitempty"` Paths []string `yaml:"paths,omitempty"`
GoogleCache bool `yaml:"googleCache,omitempty"` GoogleCache bool `yaml:"googleCache,omitempty"`
RegexRules []Rule `yaml:"regexRules"` RegexRules []Rule `yaml:"regexRules"`
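
The two sides of this file differ in whether a rule can list extra hosts under `domains` in addition to the single `domain` field. A minimal sketch of matching against both fields, written without the shared-slice append seen in the diff; `rule`, `appliesTo` and `allDomains` are hypothetical names, and skipping an empty `domain` is an assumption.

```go
package main

import "fmt"

// rule keeps only the two fields relevant to host matching.
type rule struct {
	Domain  string   `yaml:"domain"`
	Domains []string `yaml:"domains,omitempty"`
}

// appliesTo reports whether host equals Domain or any entry in Domains.
func (r rule) appliesTo(host string) bool {
	if r.Domain != "" && r.Domain == host {
		return true
	}
	for _, d := range r.Domains {
		if d == host {
			return true
		}
	}
	return false
}

// allDomains collects every host named by the rules, for example to build
// an allow list. Empty Domain fields are skipped (assumption).
func allDomains(rules []rule) []string {
	var out []string
	for _, r := range rules {
		if r.Domain != "" {
			out = append(out, r.Domain)
		}
		out = append(out, r.Domains...)
	}
	return out
}

func main() {
	rules := []rule{
		{Domain: "www.example.com", Domains: []string{"www.beispiel.de"}},
		{Domains: []string{"www.nytimes.com", "www.time.com"}},
	}
	fmt.Println(rules[0].appliesTo("www.beispiel.de"))
	fmt.Println(allDomains(rules))
}
```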

@@ -1,6 +0,0 @@
apiVersion: v2
name: ladder
description: A helm chart to deploy kubero-dev/ladder
type: application
version: "1.0"
appVersion: "v0.0.11"

@@ -1,27 +0,0 @@
# Helm Chart for deployment of Ladder
This folder contains a basic helm chart deployment for the ladder app.
# Deployment pre-reqs
## Values
Edit the values to your own preferences, with the only minimum requirement being `ingress.HOST` (line 19) being updated to your intended domain name.
Other variables in `values.yaml` can be updated as to your preferences, with details on each variable being listed in the main [README.md](/README.md) in the root of this repo.
## Defaults in K8s
No ingress default has been specified.
You can set this manually by adding an annotation to the ingress.yaml - if needed.
For example, to use Traefik -
```yaml
metadata:
name: ladder-ingress
annotations:
kubernetes.io/ingress.class: traefik
```
## Helm Install
`helm install <name> <location> -n <namespace-name> --create-namespace`
`helm install ladder .\ladder\ -n ladder --create-namespace`
## Helm Upgrade
`helm upgrade <name> <location> -n <namespace-name>`
`helm upgrade ladder .\ladder\ -n ladder`

@@ -1,55 +0,0 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ladder
name: ladder
spec:
replicas: 1
selector:
matchLabels:
app: ladder
template:
metadata:
labels:
app: ladder
spec:
containers:
- image: "{{ .Values.image.RELEASE }}"
imagePullPolicy: Always
name: ladder
resources:
limits:
cpu: 250m
memory: 128Mi
requests:
cpu: 250m
memory: 128Mi
env:
- name: PORT
value: "{{ .Values.env.PORT }}"
- name: PREFORK
value: "{{ .Values.env.PREFORK }}"
- name: USER_AGENT
value: "{{ .Values.env.USER_AGENT }}"
- name: X_FORWARDED_FOR
value: "{{ .Values.env.X_FORWARDED_FOR }}"
- name: USERPASS
value: "{{ .Values.env.USERPASS }}"
- name: LOG_URLS
value: "{{ .Values.env.LOG_URLS }}"
- name: DISABLE_FORM
value: "{{ .Values.env.DISABLE_FORM }}"
- name: FORM_PATH
value: "{{ .Values.env.FORM_PATH }}"
- name: RULESET
value: "{{ .Values.env.RULESET }}"
- name: EXPOSE_RULESET
value: "{{ .Values.env.EXPOSE_RULESET }}"
- name: ALLOWED_DOMAINS
value: "{{ .Values.env.ALLOWED_DOMAINS }}"
- name: ALLOWED_DOMAINS_RULESET
value: "{{ .Values.env.ALLOWED_DOMAINS_RULESET }}"
restartPolicy: Always
terminationGracePeriodSeconds: 30

@@ -1,17 +0,0 @@
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ladder-ingress
spec:
rules:
- host: "{{ .Values.ingress.HOST }}"
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ladder-service
port:
number: {{ .Values.ingress.PORT }}

@@ -1,14 +0,0 @@
---
kind: Service
apiVersion: v1
metadata:
name: ladder-service
spec:
type: ClusterIP
selector:
app: ladder
ports:
- name: http
port: {{ .Values.ingress.PORT }}
protocol: TCP
targetPort: {{ .Values.env.PORT }}

@@ -1,20 +0,0 @@
image:
RELEASE: ghcr.io/kubero-dev/ladder:v0.0.11
env:
PORT: 8080
PREFORK: "false"
USER_AGENT: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
X_FORWARDED_FOR:
USERPASS: ""
LOG_URLS: "true"
DISABLE_FORM: "false"
FORM_PATH: ""
RULESET: "https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml"
EXPOSE_RULESET: "true"
ALLOWED_DOMAINS: ""
ALLOWED_DOMAINS_RULESET: "false"
ingress:
HOST: "ladder.domain.com"
PORT: 80

@@ -1,6 +1,4 @@
- domain: www.example.com - domain: www.example.com
domains:
- www.beispiel.com
regexRules: regexRules:
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)" - match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3" replace: <script $1 script="/https://www.example.com/$3"
@@ -55,92 +53,3 @@
removeDOMElement(paywall) removeDOMElement(paywall)
}); });
</script> </script>
- domains:
- www.architecturaldigest.com
- www.bonappetit.com
- www.cntraveler.com
- www.epicurious.com
- www.gq.com
- www.newyorker.com
- www.vanityfair.com
- www.vogue.com
- www.wired.com
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
const banners = document.querySelectorAll('.paywall-bar, div[class^="MessageBannerWrapper-"');
banners.forEach(el => { el.remove(); });
});
</script>
- domains:
- www.nytimes.com
- www.time.com
injections:
- position: head
append: |
<script>
window.localStorage.clear();
document.addEventListener("DOMContentLoaded", () => {
const banners = document.querySelectorAll('div[data-testid="inline-message"], div[id^="ad-"], div[id^="leaderboard-"], div.expanded-dock, div.pz-ad-box, div[id="top-wrapper"], div[id="bottom-wrapper"]');
banners.forEach(el => { el.remove(); });
});
</script>
- domains:
- www.thestar.com
- www.niagarafallsreview.ca
- www.stcatharinesstandard.ca
- www.thepeterboroughexaminer.com
- www.therecord.com
- www.thespec.com
- www.wellandtribune.ca
injections:
- position: head
append: |
<script>
window.localStorage.clear();
document.addEventListener("DOMContentLoaded", () => {
const paywall = document.querySelectorAll('div.subscriber-offers');
paywall.forEach(el => { el.remove(); });
const subscriber_only = document.querySelectorAll('div.subscriber-only');
for (const elem of subscriber_only) {
if (elem.classList.contains('encrypted-content') && dompurify_loaded) {
const parser = new DOMParser();
const doc = parser.parseFromString('<div>' + DOMPurify.sanitize(unscramble(elem.innerText)) + '</div>', 'text/html');
const content_new = doc.querySelector('div');
elem.parentNode.replaceChild(content_new, elem);
}
elem.removeAttribute('style');
elem.removeAttribute('class');
}
const banners = document.querySelectorAll('div.subscription-required, div.redacted-overlay, div.subscriber-hide, div.tnt-ads-container');
banners.forEach(el => { el.remove(); });
const ads = document.querySelectorAll('div.tnt-ads-container, div[class*="adLabelWrapper"]');
ads.forEach(el => { el.remove(); });
const recommendations = document.querySelectorAll('div[id^="tncms-region-article"]');
recommendations.forEach(el => { el.remove(); });
});
</script>
- domain: www.usatoday.com
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
const banners = document.querySelectorAll('div.roadblock-container, .gnt_nb, [aria-label="advertisement"], div[id="main-frame-error"]');
banners.forEach(el => { el.remove(); });
});
</script>
- domain: www.washingtonpost.com
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
let paywall = document.querySelectorAll('div[data-qa$="-ad"], div[id="leaderboard-wrapper"], div[data-qa="subscribe-promo"]');
paywall.forEach(el => { el.remove(); });
const images = document.querySelectorAll('img');
images.forEach(image => { image.parentElement.style.filter = ''; });
});
</script>