9 Commits

Author SHA1 Message Date
Gianni Carafa
ec4dc5c2cc update README 2023-11-05 22:59:18 +01:00
Gianni Carafa
ba87d6b980 add limitations 2023-11-05 22:55:57 +01:00
Gianni Carafa
a6ee6aebfc add another domain 2023-11-05 22:32:13 +01:00
Gianni Carafa
7b519c7016 add americanbanker Rule 2023-11-05 22:00:16 +01:00
Gianni Carafa
fba9db3d94 implement local ruleset 2023-11-05 17:12:57 +01:00
Gianni Carafa
0dd5edd5ba update README.md 2023-11-05 09:35:40 +01:00
Gianni Carafa
4023718b12 update README.md 2023-11-05 09:34:53 +01:00
Gianni Carafa
cb02f52a46 add ruleset example 2023-11-05 00:29:00 +01:00
Gianni Carafa
3e20678e3d update README 2023-11-05 00:21:36 +01:00
3 changed files with 140 additions and 42 deletions

View File

@@ -10,8 +10,6 @@
Freedom of information is an essential pillar of democracy and informed decision-making. While media organizations have legitimate financial interests, it is crucial to strike a balance between profitability and the public's right to access information. The proliferation of paywalls raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to vital information without compromising the sustainability of journalism. In a world where knowledge should be shared and not commodified, paywalls should be critically examined to ensure that they do not undermine the principles of an open and informed society.
Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please concider buying a subscription for the site.
> **Disclaimer:** This project is intended for educational purposes only. The author does not endorse or encourage any unethical or illegal activity. Use this tool at your own risk.
### Features
@@ -31,7 +29,15 @@ Certain sites may display missing images or encounter formatting issues. This ca
- [x] Basic Auth
- [x] Disable logs
- [x] No Tracking
- [x] Limit the proxy to a list of domains
- [ ] Optional TOR proxy
- [ ] A key to share only one URL
- [ ] Fetch from Google Cache if not available
### Limitations
Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please concider buying a subscription for the site.
Some sites do not expose their content to search engines, which means that the proxy cannot access the content. A future version will try to fetch the content from Google Cache.
## Installation
@@ -75,7 +81,7 @@ http://localhost:8080/raw/https://www.example.com
### Environment Variables
| Variable | Description | Default |
| Variable | Description | Value |
| --- | --- | --- |
| `PORT` | Port to listen on | `8080` |
| `PREFORK` | Spawn multiple server instances | `false` |
@@ -85,10 +91,42 @@ http://localhost:8080/raw/https://www.example.com
| `LOG_URLS` | Log fetched URL's | `true` |
| `DISABLE_FORM` | Disables URL Form Frontpage | `false` |
| `FORM_PATH` | Path to custom Form HTML | `` |
| `RULES_URL` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` |
| `RULESET` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` or `/path/to/my/rules.yaml` |
| `ALLOWED_DOMAINS` | Comma separated list of allowed domains. Empty = no limitations | `` |
| `ALLOWED_DOMAINS_RULESET` | Allow Domains from Ruleset. false = no limitations | `false` |
`ALLOWED_DOMAINS` and `ALLOWED_DOMAINS_RULESET` are joined together. If both are empty, no limitations are applied.
### Ruleset
It is possible to apply custom rules to modify the response. This can be used to remove unwanted or modify elements from the page. The ruleset is a YAML file that contains a list of rules for each domain and is loaded on startup
See in [ruleset.yaml](ruleset.yaml) for an example.
```yaml
- domain: www.example.com
regexRules:
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3"
injections:
- position: head # Position where to inject the code
append: |
<script>
window.localStorage.clear();
console.log("test");
alert("Hello!");
</script>
- domain: www.anotherdomain.com # Domain where the rule applies
path: /article # Path where the rule applies
googleCache: false # Search also in Google Cache
regexRules: # Regex rules to apply
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3"
injections:
- position: .left-content article .post-title # Position where to inject the code into DOM
replace: |
<h1>My Custom Title</h1>
- position: .left-content article # Position where to inject the code into DOM
prepend: |
<h2>Suptitle</h2>
```

View File

@@ -18,6 +18,7 @@ import (
var UserAgent = getenv("USER_AGENT", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
var ForwardedFor = getenv("X_FORWARDED_FOR", "66.249.66.1")
var rulesSet = loadRules()
var allowedDomains = strings.Split(os.Getenv("ALLOWED_DOMAINS"), ",")
func ProxySite(c *fiber.Ctx) error {
// Get the url from the URL
@@ -51,6 +52,10 @@ func fetchSite(urlpath string, queries map[string]string) (string, *http.Request
return "", nil, nil, err
}
if len(allowedDomains) > 0 && !StringInSlice(u.Host, allowedDomains) {
return "", nil, nil, fmt.Errorf("domain not allowed. %s not in %s", u.Host, allowedDomains)
}
if os.Getenv("DEBUG ") == "true" {
log.Println(u.String() + urlQuery)
}
@@ -98,7 +103,7 @@ func rewriteHtml(bodyB []byte, u *url.URL) string {
body = strings.ReplaceAll(body, "url(/", "url(/https://"+u.Host+"/")
body = strings.ReplaceAll(body, "href=\"https://"+u.Host, "href=\"/https://"+u.Host+"/")
if os.Getenv("RULES_URL") != "" {
if os.Getenv("RULESET") != "" {
body = applyRules(u.Host, u.Path, body)
}
return body
@@ -113,32 +118,48 @@ func getenv(key, fallback string) string {
}
func loadRules() RuleSet {
rulesUrl := os.Getenv("RULES_URL")
rulesUrl := os.Getenv("RULESET")
if rulesUrl == "" {
RulesList := RuleSet{}
return RulesList
}
log.Println("Loading rules")
resp, err := http.Get(rulesUrl)
if err != nil {
log.Println("ERROR:", err)
}
defer resp.Body.Close()
if resp.StatusCode >= 400 {
log.Println("ERROR:", resp.StatusCode, rulesUrl)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Println("ERROR:", err)
}
var ruleSet RuleSet
yaml.Unmarshal(body, &ruleSet)
if err != nil {
log.Println("ERROR:", err)
if strings.HasPrefix(rulesUrl, "http") {
resp, err := http.Get(rulesUrl)
if err != nil {
log.Println("ERROR:", err)
}
defer resp.Body.Close()
if resp.StatusCode >= 400 {
log.Println("ERROR:", resp.StatusCode, rulesUrl)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Println("ERROR:", err)
}
yaml.Unmarshal(body, &ruleSet)
if err != nil {
log.Println("ERROR:", err)
}
} else {
yamlFile, err := os.ReadFile(rulesUrl)
if err != nil {
log.Println("ERROR:", err)
}
yaml.Unmarshal(yamlFile, &ruleSet)
}
for _, rule := range ruleSet {
//log.Println("Loaded rules for", rule.Domain)
if os.Getenv("ALLOWED_DOMAINS_RULESET") == "true" {
allowedDomains = append(allowedDomains, rule.Domain)
}
}
log.Println("Loaded rules for", len(ruleSet), "Domains")
@@ -154,7 +175,7 @@ func applyRules(domain string, path string, body string) string {
if rule.Domain != domain {
continue
}
if rule.Path != "" && rule.Path != path {
if len(rule.Paths) > 0 && !StringInSlice(path, rule.Paths) {
continue
}
for _, regexRule := range rule.RegexRules {
@@ -191,10 +212,10 @@ type Rule struct {
}
type RuleSet []struct {
Domain string `yaml:"domain"`
Path string `yaml:"path,omitempty"`
GoogleCache bool `yaml:"googleCache,omitempty"`
RegexRules []Rule `yaml:"regexRules"`
Domain string `yaml:"domain"`
Paths []string `yaml:"paths,omitempty"`
GoogleCache bool `yaml:"googleCache,omitempty"`
RegexRules []Rule `yaml:"regexRules"`
Injections []struct {
Position string `yaml:"position"`
Append string `yaml:"append"`
@@ -202,3 +223,12 @@ type RuleSet []struct {
Replace string `yaml:"replace"`
} `yaml:"injections"`
}
func StringInSlice(s string, list []string) bool {
for _, x := range list {
if strings.HasPrefix(s, x) {
return true
}
}
return false
}

View File

@@ -10,16 +10,46 @@
console.log("test");
alert("Hello!");
</script>
- domain: www.anotherdomain.com # Domain where the rule applies
path: /article # Path where the rule applies
googleCache: false # Search also in Google Cache
regexRules: # Regex rules to apply
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3"
injections:
- position: .left-content article .post-title # Position where to inject the code into DOM
- position: h1
replace: |
<h1>My Custom Title</h1>
- position: .left-content article # Position where to inject the code into DOM
prepend: |
<h2>Suptitle</h2>
<h1>An example with a ladder ;-)</h1>
- domain: www.americanbanker.com
paths:
- /news
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
const inlineGate = document.querySelector('.inline-gate');
if (inlineGate) {
inlineGate.classList.remove('inline-gate');
const inlineGated = document.querySelectorAll('.inline-gated');
for (const elem of inlineGated) { elem.classList.remove('inline-gated'); }
}
});
</script>
- domain: www.nzz.ch
paths:
- /international
- /sport
- /wirtschaft
- /technologie
- /feuilleton
- /zuerich
- /wissenschaft
- /gesellschaft
- /panorama
- /mobilitaet
- /reisen
- /meinung
- /finanze
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
const paywall = document.querySelector('.dynamic-regwall');
removeDOMElement(paywall)
});
</script>