12 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Gianni Carafa | e3eb866d48 | allow disabling Ruleset | 2023-11-05 23:37:01 +01:00 |
| Gianni Carafa | 34a2683457 | add ruleset route | 2023-11-05 23:30:32 +01:00 |
| Gianni Carafa | 07513f6dc4 | update README | 2023-11-05 23:10:12 +01:00 |
| Gianni Carafa | ec4dc5c2cc | update README | 2023-11-05 22:59:18 +01:00 |
| Gianni Carafa | ba87d6b980 | add limitations | 2023-11-05 22:55:57 +01:00 |
| Gianni Carafa | a6ee6aebfc | add another domain | 2023-11-05 22:32:13 +01:00 |
| Gianni Carafa | 7b519c7016 | add americanbanker Rule | 2023-11-05 22:00:16 +01:00 |
| Gianni Carafa | fba9db3d94 | implement local ruleset | 2023-11-05 17:12:57 +01:00 |
| Gianni Carafa | 0dd5edd5ba | update README.md | 2023-11-05 09:35:40 +01:00 |
| Gianni Carafa | 4023718b12 | update README.md | 2023-11-05 09:34:53 +01:00 |
| Gianni Carafa | cb02f52a46 | add ruleset example | 2023-11-05 00:29:00 +01:00 |
| Gianni Carafa | 3e20678e3d | update README | 2023-11-05 00:21:36 +01:00 |
5 changed files with 175 additions and 45 deletions


@@ -10,8 +10,6 @@
Freedom of information is an essential pillar of democracy and informed decision-making. While media organizations have legitimate financial interests, it is crucial to strike a balance between profitability and the public's right to access information. The proliferation of paywalls raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to vital information without compromising the sustainability of journalism. In a world where knowledge should be shared and not commodified, paywalls should be critically examined to ensure that they do not undermine the principles of an open and informed society.
Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please consider buying a subscription for the site.
> **Disclaimer:** This project is intended for educational purposes only. The author does not endorse or encourage any unethical or illegal activity. Use this tool at your own risk.
### Features
@@ -23,15 +21,24 @@ Certain sites may display missing images or encounter formatting issues. This ca
- [x] Fetch RAW HTML
- [x] Custom User Agent
- [x] Custom X-Forwarded-For IP
- [x] [Docker container](https://github.com/kubero-dev/ladder/pkgs/container/ladder)
- [x] [Docker container](https://github.com/kubero-dev/ladder/pkgs/container/ladder) (amd64, arm64)
- [x] Linux binary
- [x] Mac OS binary
- [x] Windows binary (untested)
- [x] Removes most of the ads (unexpected side effect)
- [x] Removes most of the ads (unexpected side effect ¯\_(ツ)_/¯ )
- [x] Basic Auth
- [x] Disable logs
- [x] No Tracking
- [x] Limit the proxy to a list of domains
- [x] Expose Ruleset to other ladders
- [ ] Optional TOR proxy
- [ ] A key to share only one URL
- [ ] Fetch from Google Cache if not available
### Limitations
Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please consider buying a subscription for the site.
Some sites do not expose their content to search engines, which means that the proxy cannot access the content. A future version will try to fetch the content from Google Cache.
## Installation
@@ -71,11 +78,15 @@ curl -X GET "http://localhost:8080/api/https://www.example.com"
### RAW
http://localhost:8080/raw/https://www.example.com
### Running Ruleset
http://localhost:8080/ruleset
## Configuration
### Environment Variables
| Variable | Description | Default |
| Variable | Description | Value |
| --- | --- | --- |
| `PORT` | Port to listen on | `8080` |
| `PREFORK` | Spawn multiple server instances | `false` |
@@ -85,10 +96,44 @@ http://localhost:8080/raw/https://www.example.com
| `LOG_URLS` | Log fetched URLs | `true` |
| `DISABLE_FORM` | Disables URL Form Frontpage | `false` |
| `FORM_PATH` | Path to custom Form HTML | `` |
| `RULES_URL` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` |
| `RULESET` | URL or local path to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` or `/path/to/my/rules.yaml` |
| `EXPOSE_RULESET` | Make your Ruleset available to other ladders | `true` |
| `ALLOWED_DOMAINS` | Comma separated list of allowed domains. Empty = no limitations | `` |
| `ALLOWED_DOMAINS_RULESET` | Allow Domains from Ruleset. false = no limitations | `false` |
`ALLOWED_DOMAINS` and `ALLOWED_DOMAINS_RULESET` are joined together. If both are empty, no limitations are applied.
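The way the two settings combine can be illustrated with a minimal Go sketch. It is not the project's implementation: `parseAllowedDomains` and `isAllowed` are hypothetical helpers, and the exact-match comparison is an assumption made for clarity. One detail worth knowing is that `strings.Split` on an empty string returns a slice with a single empty element, so the sketch drops empty entries explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedDomains is a hypothetical helper: it splits a comma separated
// list, trims whitespace and drops empty entries.
func parseAllowedDomains(raw string) []string {
	var domains []string
	for _, d := range strings.Split(raw, ",") {
		if d = strings.TrimSpace(d); d != "" {
			domains = append(domains, d)
		}
	}
	return domains
}

// isAllowed reports whether host may be proxied.
// An empty list means that no limitation applies.
func isAllowed(host string, allowed []string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, d := range allowed {
		if host == d {
			return true
		}
	}
	return false
}

func main() {
	// Domains from ALLOWED_DOMAINS (example value, assumed).
	allowed := parseAllowedDomains("www.example.com, www.nzz.ch")
	// With ALLOWED_DOMAINS_RULESET=true, ruleset domains join the same list.
	allowed = append(allowed, "www.americanbanker.com")

	fmt.Println(isAllowed("www.nzz.ch", allowed))
	fmt.Println(isAllowed("www.other.org", allowed))
	fmt.Println(isAllowed("www.other.org", parseAllowedDomains("")))
}
```

Keeping the "empty list means unrestricted" rule in one place makes the behaviour described above easy to verify.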
### Ruleset
It is possible to apply custom rules to modify the response. This can be used to remove or modify unwanted elements on the page. The ruleset is a YAML file that contains a list of rules for each domain and is loaded on startup.
See [ruleset.yaml](ruleset.yaml) for an example.
```yaml
- domain: www.example.com
regexRules:
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3"
injections:
- position: head # Position where to inject the code
append: |
<script>
window.localStorage.clear();
console.log("test");
alert("Hello!");
</script>
- domain: www.anotherdomain.com # Domain where the rule applies
paths: # Paths where the rule applies
- /article
googleCache: false # Search also in Google Cache
regexRules: # Regex rules to apply
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3"
injections:
- position: .left-content article .post-title # Position where to inject the code into DOM
replace: |
<h1>My Custom Title</h1>
- position: .left-content article # Position where to inject the code into DOM
prepend: |
<h2>Subtitle</h2>
```
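To make the `regexRules` entries more concrete, here is a minimal Go sketch of how a `match`/`replace` pair could be applied to a response body with the standard `regexp` package. The type `regexRule` and the function `applyRegexRules` are hypothetical names, and it is an assumption that the proxy applies rules in this way; the pattern and replacement are copied from the example above.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexRule mirrors one match/replace pair of a regexRules entry.
// The type name is hypothetical.
type regexRule struct {
	Match   string
	Replace string
}

// applyRegexRules runs each rule over body in order and returns the result.
func applyRegexRules(body string, rules []regexRule) (string, error) {
	for _, r := range rules {
		re, err := regexp.Compile(r.Match)
		if err != nil {
			return "", fmt.Errorf("invalid pattern %q: %w", r.Match, err)
		}
		// $1 and $3 in Replace refer to capture groups of Match.
		body = re.ReplaceAllString(body, r.Replace)
	}
	return body, nil
}

func main() {
	rules := []regexRule{{
		Match:   `<script\s+([^>]*\s+)?src="(/)([^"]*)"`,
		Replace: `<script $1 script="/https://www.example.com/$3"`,
	}}
	out, err := applyRegexRules(`<script src="/static/app.js"></script>`, rules)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Compiling each pattern with `regexp.Compile` rather than `regexp.MustCompile` lets a malformed rule surface as an error instead of a panic.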


@@ -39,7 +39,6 @@ func main() {
Default: pf,
Help: "This will spawn multiple processes listening"})
// Parse input
err := parser.Parse(os.Args)
if err != nil {
fmt.Print(parser.Usage(err))
@@ -74,9 +73,11 @@ func main() {
}
app.Get("/", handlers.Form)
app.Get("ruleset", handlers.Ruleset)
app.Get("raw/*", handlers.Raw)
app.Get("api/*", handlers.Api)
app.Get("ruleset", handlers.Raw)
app.Get("/*", handlers.ProxySite)
log.Fatal(app.Listen(":" + *port))


@@ -18,6 +18,7 @@ import (
var UserAgent = getenv("USER_AGENT", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
var ForwardedFor = getenv("X_FORWARDED_FOR", "66.249.66.1")
var rulesSet = loadRules()
var allowedDomains = strings.Split(os.Getenv("ALLOWED_DOMAINS"), ",")
func ProxySite(c *fiber.Ctx) error {
// Get the url from the URL
@@ -51,6 +52,10 @@ func fetchSite(urlpath string, queries map[string]string) (string, *http.Request
return "", nil, nil, err
}
if len(allowedDomains) > 0 && !StringInSlice(u.Host, allowedDomains) {
return "", nil, nil, fmt.Errorf("domain not allowed. %s not in %s", u.Host, allowedDomains)
}
if os.Getenv("DEBUG") == "true" {
log.Println(u.String() + urlQuery)
}
@@ -98,7 +103,7 @@ func rewriteHtml(bodyB []byte, u *url.URL) string {
body = strings.ReplaceAll(body, "url(/", "url(/https://"+u.Host+"/")
body = strings.ReplaceAll(body, "href=\"https://"+u.Host, "href=\"/https://"+u.Host+"/")
if os.Getenv("RULES_URL") != "" {
if os.Getenv("RULESET") != "" {
body = applyRules(u.Host, u.Path, body)
}
return body
@@ -113,32 +118,48 @@ func getenv(key, fallback string) string {
}
func loadRules() RuleSet {
rulesUrl := os.Getenv("RULES_URL")
rulesUrl := os.Getenv("RULESET")
if rulesUrl == "" {
RulesList := RuleSet{}
return RulesList
}
log.Println("Loading rules")
resp, err := http.Get(rulesUrl)
if err != nil {
log.Println("ERROR:", err)
}
defer resp.Body.Close()
if resp.StatusCode >= 400 {
log.Println("ERROR:", resp.StatusCode, rulesUrl)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Println("ERROR:", err)
}
var ruleSet RuleSet
yaml.Unmarshal(body, &ruleSet)
if err != nil {
log.Println("ERROR:", err)
if strings.HasPrefix(rulesUrl, "http") {
resp, err := http.Get(rulesUrl)
if err != nil {
log.Println("ERROR:", err)
return ruleSet // resp is nil on error; do not dereference it below
}
defer resp.Body.Close()
if resp.StatusCode >= 400 {
log.Println("ERROR:", resp.StatusCode, rulesUrl)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Println("ERROR:", err)
}
err = yaml.Unmarshal(body, &ruleSet) // capture the error so the check below is meaningful
if err != nil {
log.Println("ERROR:", err)
}
} else {
yamlFile, err := os.ReadFile(rulesUrl)
if err != nil {
log.Println("ERROR:", err)
}
if err := yaml.Unmarshal(yamlFile, &ruleSet); err != nil {
log.Println("ERROR:", err)
}
}
for _, rule := range ruleSet {
//log.Println("Loaded rules for", rule.Domain)
if os.Getenv("ALLOWED_DOMAINS_RULESET") == "true" {
allowedDomains = append(allowedDomains, rule.Domain)
}
}
log.Println("Loaded rules for", len(ruleSet), "Domains")
@@ -154,7 +175,7 @@ func applyRules(domain string, path string, body string) string {
if rule.Domain != domain {
continue
}
if rule.Path != "" && rule.Path != path {
if len(rule.Paths) > 0 && !StringInSlice(path, rule.Paths) {
continue
}
for _, regexRule := range rule.RegexRules {
@@ -191,10 +212,10 @@ type Rule struct {
}
type RuleSet []struct {
Domain string `yaml:"domain"`
Path string `yaml:"path,omitempty"`
GoogleCache bool `yaml:"googleCache,omitempty"`
RegexRules []Rule `yaml:"regexRules"`
Domain string `yaml:"domain"`
Paths []string `yaml:"paths,omitempty"`
GoogleCache bool `yaml:"googleCache,omitempty"`
RegexRules []Rule `yaml:"regexRules"`
Injections []struct {
Position string `yaml:"position"`
Append string `yaml:"append"`
@@ -202,3 +223,12 @@ type RuleSet []struct {
Replace string `yaml:"replace"`
} `yaml:"injections"`
}
func StringInSlice(s string, list []string) bool {
for _, x := range list {
if strings.HasPrefix(s, x) {
return true
}
}
return false
}
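
`StringInSlice` matches by prefix, which is convenient for paths (`/article/2023/...` matches `/article`) but is looser than it may look for hosts. The Go sketch below reproduces the prefix check under the name `stringInSlice` and contrasts it with `hostAllowed`, a hypothetical stricter variant that accepts only an exact host or one of its subdomains. The stricter variant is an illustration, not part of this change.

```go
package main

import (
	"fmt"
	"strings"
)

// stringInSlice reproduces the prefix check from the diff above.
func stringInSlice(s string, list []string) bool {
	for _, x := range list {
		if strings.HasPrefix(s, x) {
			return true
		}
	}
	return false
}

// hostAllowed is a hypothetical stricter check for hosts:
// it accepts an exact match or a subdomain of a listed domain.
func hostAllowed(host string, list []string) bool {
	for _, d := range list {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{"/article"}
	fmt.Println(stringInSlice("/article/2023/some-story", paths)) // true
	fmt.Println(stringInSlice("/articles-archive", paths))        // true, prefix also matches

	domains := []string{"www.example.com"}
	fmt.Println(stringInSlice("www.example.com.evil.org", domains)) // true
	fmt.Println(hostAllowed("www.example.com.evil.org", domains))   // false
	fmt.Println(hostAllowed("www.example.com", domains))            // true
}
```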

handlers/ruleset.go Normal file

@@ -0,0 +1,24 @@
package handlers
import (
"os"
"github.com/gofiber/fiber/v2"
"gopkg.in/yaml.v3"
)
func Ruleset(c *fiber.Ctx) error {
if os.Getenv("EXPOSE_RULESET") == "false" {
c.SendStatus(fiber.StatusForbidden)
return c.SendString("Rules Disabled")
}
body, err := yaml.Marshal(rulesSet)
if err != nil {
c.SendStatus(fiber.StatusInternalServerError)
return c.SendString(err.Error())
}
return c.SendString(string(body))
}


@@ -10,16 +10,46 @@
console.log("test");
alert("Hello!");
</script>
- domain: www.anotherdomain.com # Domain where the rule applies
path: /article # Path where the rule applies
googleCache: false # Search also in Google Cache
regexRules: # Regex rules to apply
- match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
replace: <script $1 script="/https://www.example.com/$3"
injections:
- position: .left-content article .post-title # Position where to inject the code into DOM
- position: h1
replace: |
<h1>My Custom Title</h1>
- position: .left-content article # Position where to inject the code into DOM
prepend: |
<h2>Subtitle</h2>
<h1>An example with a ladder ;-)</h1>
- domain: www.americanbanker.com
paths:
- /news
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
const inlineGate = document.querySelector('.inline-gate');
if (inlineGate) {
inlineGate.classList.remove('inline-gate');
const inlineGated = document.querySelectorAll('.inline-gated');
for (const elem of inlineGated) { elem.classList.remove('inline-gated'); }
}
});
</script>
- domain: www.nzz.ch
paths:
- /international
- /sport
- /wirtschaft
- /technologie
- /feuilleton
- /zuerich
- /wissenschaft
- /gesellschaft
- /panorama
- /mobilitaet
- /reisen
- /meinung
- /finanze
injections:
- position: head
append: |
<script>
document.addEventListener("DOMContentLoaded", () => {
const paywall = document.querySelector('.dynamic-regwall');
removeDOMElement(paywall)
});
</script>