Compare commits
13 Commits
v0.0.9...feature/ad
| Author | SHA1 | Date | |
|---|---|---|---|
| | 797a33068d | | |
| | e3eb866d48 | | |
| | 34a2683457 | | |
| | 07513f6dc4 | | |
| | ec4dc5c2cc | | |
| | ba87d6b980 | | |
| | a6ee6aebfc | | |
| | 7b519c7016 | | |
| | fba9db3d94 | | |
| | 0dd5edd5ba | | |
| | 4023718b12 | | |
| | cb02f52a46 | | |
| | 3e20678e3d | | |
0
.github/pull_request_template.md
vendored
Normal file
57
README.md
@@ -10,8 +10,6 @@

Freedom of information is an essential pillar of democracy and informed decision-making. While media organizations have legitimate financial interests, it is crucial to strike a balance between profitability and the public's right to access information. The proliferation of paywalls raises concerns about the erosion of this fundamental freedom, and it is imperative for society to find innovative ways to preserve access to vital information without compromising the sustainability of journalism. In a world where knowledge should be shared and not commodified, paywalls should be critically examined to ensure that they do not undermine the principles of an open and informed society.

Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please consider buying a subscription for the site.

> **Disclaimer:** This project is intended for educational purposes only. The author does not endorse or encourage any unethical or illegal activity. Use this tool at your own risk.

### Features
@@ -23,15 +21,24 @@ Certain sites may display missing images or encounter formatting issues. This ca
- [x] Fetch RAW HTML
- [x] Custom User Agent
- [x] Custom X-Forwarded-For IP
- [x] [Docker container](https://github.com/kubero-dev/ladder/pkgs/container/ladder)
- [x] [Docker container](https://github.com/kubero-dev/ladder/pkgs/container/ladder) (amd64, arm64)
- [x] Linux binary
- [x] Mac OS binary
- [x] Windows binary (untested)
- [x] Removes most of the ads (unexpected side effect)
- [x] Removes most of the ads (unexpected side effect ¯\_(ツ)_/¯ )
- [x] Basic Auth
- [x] Disable logs
- [x] No Tracking
- [x] Limit the proxy to a list of domains
- [x] Expose Ruleset to other ladders
- [ ] Optional TOR proxy
- [ ] A key to share only one URL
- [ ] Fetch from Google Cache if not available

### Limitations
Certain sites may display missing images or encounter formatting issues. This can be attributed to the site's reliance on JavaScript or CSS for image and resource loading, which presents a limitation when accessed through this proxy. If you prefer a full experience, please consider buying a subscription for the site.

Some sites do not expose their content to search engines, which means that the proxy cannot access the content. A future version will try to fetch the content from Google Cache.

## Installation

@@ -71,11 +78,15 @@ curl -X GET "http://localhost:8080/api/https://www.example.com"
### RAW
http://localhost:8080/raw/https://www.example.com


### Running Ruleset
http://localhost:8080/ruleset

## Configuration

### Environment Variables

| Variable | Description | Default |
| Variable | Description | Value |
| --- | --- | --- |
| `PORT` | Port to listen on | `8080` |
| `PREFORK` | Spawn multiple server instances | `false` |
@@ -85,10 +96,44 @@ http://localhost:8080/raw/https://www.example.com
| `LOG_URLS` | Log fetched URLs | `true` |
| `DISABLE_FORM` | Disables URL Form Frontpage | `false` |
| `FORM_PATH` | Path to custom Form HTML | `` |
| `RULES_URL` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` |
| `RULESET` | URL to a ruleset file | `https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml` or `/path/to/my/rules.yaml` or `default` |
| `EXPOSE_RULESET` | Make your Ruleset available to other ladders | `true` |
| `ALLOWED_DOMAINS` | Comma-separated list of allowed domains. Empty = no limitations | `` |
| `ALLOWED_DOMAINS_RULESET` | Allow Domains from Ruleset. false = no limitations | `false` |

The domains from `ALLOWED_DOMAINS` and, when `ALLOWED_DOMAINS_RULESET` is `true`, the domains listed in the ruleset are merged into a single allow list. If `ALLOWED_DOMAINS` is empty and `ALLOWED_DOMAINS_RULESET` is `false`, no limitations are applied.
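
The way the two settings combine can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the project's code: `parseAllowedDomains` and `isAllowed` are hypothetical helpers, the sketch compares hosts exactly (the handler in this change set uses a prefix match via `StringInSlice`), and empty entries are dropped so that an unset `ALLOWED_DOMAINS` yields an empty list.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedDomains splits a comma-separated list and drops empty entries,
// so an unset variable yields an empty slice instead of a slice holding "".
// (Hypothetical helper, not part of the diff.)
func parseAllowedDomains(raw string) []string {
	var out []string
	for _, d := range strings.Split(raw, ",") {
		if d = strings.TrimSpace(d); d != "" {
			out = append(out, d)
		}
	}
	return out
}

// isAllowed reports whether host passes the allow list.
// An empty list means "no limitations", as the README states.
// Assumption: exact host comparison instead of the diff's prefix match.
func isAllowed(host string, allowed []string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, d := range allowed {
		if host == d {
			return true
		}
	}
	return false
}

func main() {
	fromEnv := parseAllowedDomains("www.example.com, www.nzz.ch")
	fromRuleset := []string{"www.americanbanker.com"} // domains taken from the ruleset
	allowed := append(fromEnv, fromRuleset...)

	fmt.Println(isAllowed("www.nzz.ch", allowed))
	fmt.Println(isAllowed("www.other.org", allowed))
	fmt.Println(isAllowed("www.other.org", parseAllowedDomains("")))
}
```

Dropping empty entries matters because `strings.Split("", ",")` returns a one-element slice containing the empty string, and every host has the empty string as a prefix.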

### Ruleset

It is possible to apply custom rules to modify the response. This can be used to remove or modify unwanted elements on the page. The ruleset is a YAML file that contains a list of rules for each domain and is loaded on startup.

See [ruleset.yaml](ruleset.yaml) for an example.

```yaml
- domain: www.example.com
  regexRules:
    - match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
      replace: <script $1 script="/https://www.example.com/$3"
  injections:
    - position: head # Position where to inject the code
      append: |
        <script>
          window.localStorage.clear();
          console.log("test");
          alert("Hello!");
        </script>
- domain: www.anotherdomain.com # Domain where the rule applies
  paths: # Paths where the rule applies
    - /article
  googleCache: false # Search also in Google Cache
  regexRules: # Regex rules to apply
    - match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
      replace: <script $1 script="/https://www.example.com/$3"
  injections:
    - position: .left-content article .post-title # Position where to inject the code into DOM
      replace: |
        <h1>My Custom Title</h1>
    - position: .left-content article # Position where to inject the code into DOM
      prepend: |
        <h2>Suptitle</h2>
```
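
Each `regexRules` entry in the example above pairs a pattern with a replacement template. A rough sketch of how such a pair could be applied with Go's standard `regexp` package is shown here; `regexRule` and `applyRegexRules` are names invented for this illustration, and the replacement writes a `src` attribute, which is this sketch's choice and differs from the README example.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexRule mirrors one entry under regexRules: a pattern and its replacement.
// (Hypothetical type for illustration only.)
type regexRule struct {
	Match   string
	Replace string
}

// applyRegexRules runs every rule over body in order and returns the result.
// Patterns that fail to compile are skipped instead of aborting the rewrite.
func applyRegexRules(body string, rules []regexRule) string {
	for _, rule := range rules {
		re, err := regexp.Compile(rule.Match)
		if err != nil {
			continue
		}
		// $1 and $3 in the template refer to capture groups of the pattern;
		// a group that did not participate in the match expands to "".
		body = re.ReplaceAllString(body, rule.Replace)
	}
	return body
}

func main() {
	rules := []regexRule{{
		Match:   `<script\s+([^>]*\s+)?src="(/)([^"]*)"`,
		Replace: `<script $1 src="/https://www.example.com/$3"`,
	}}
	fmt.Println(applyRegexRules(`<script src="/static/app.js"></script>`, rules))
}
```

The rule rewrites a root-relative script URL so that it is requested through the proxy path again instead of from the proxy host itself.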
12
cmd/main.go
@@ -33,13 +33,19 @@ func main() {
		Help: "Port the webserver will listen on"})

	pf, _ := strconv.ParseBool(os.Getenv("PREFORK"))

	prefork := parser.Flag("P", "prefork", &argparse.Options{
		Required: false,
		Default: pf,
		Help: "This will spawn multiple processes listening"})

	// Parse input
	r := os.Getenv("RULESET")
	ruleset := parser.String("r", "ruleset", &argparse.Options{
		Required: false,
		Default: r,
		Help: "Path or URL to your ruleset"})

	handlers.LoadRules(*ruleset)

	err := parser.Parse(os.Args)
	if err != nil {
		fmt.Print(parser.Usage(err))
@@ -74,9 +80,11 @@ func main() {
	}

	app.Get("/", handlers.Form)
	app.Get("ruleset", handlers.Ruleset)

	app.Get("raw/*", handlers.Raw)
	app.Get("api/*", handlers.Api)
	app.Get("ruleset", handlers.Raw)
	app.Get("/*", handlers.ProxySite)

	log.Fatal(app.Listen(":" + *port))

@@ -17,7 +17,11 @@ import (

var UserAgent = getenv("USER_AGENT", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
var ForwardedFor = getenv("X_FORWARDED_FOR", "66.249.66.1")
var rulesSet = loadRules()
var rulesSet RuleSet

// var rulesSet = loadRules()
var allowedDomains = strings.Split(os.Getenv("ALLOWED_DOMAINS"), ",")
var Aaaa = "aaaa"

func ProxySite(c *fiber.Ctx) error {
	// Get the url from the URL
@@ -51,6 +55,10 @@ func fetchSite(urlpath string, queries map[string]string) (string, *http.Request
		return "", nil, nil, err
	}

	if len(allowedDomains) > 0 && !StringInSlice(u.Host, allowedDomains) {
		return "", nil, nil, fmt.Errorf("domain not allowed. %s not in %s", u.Host, allowedDomains)
	}

	if os.Getenv("LOG_URLS") == "true" {
		log.Println(u.String() + urlQuery)
	}
@@ -98,7 +106,7 @@ func rewriteHtml(bodyB []byte, u *url.URL) string {
	body = strings.ReplaceAll(body, "url(/", "url(/https://"+u.Host+"/")
	body = strings.ReplaceAll(body, "href=\"https://"+u.Host, "href=\"/https://"+u.Host+"/")

	if os.Getenv("RULES_URL") != "" {
	if os.Getenv("RULESET") != "" {
		body = applyRules(u.Host, u.Path, body)
	}
	return body
@@ -112,13 +120,21 @@ func getenv(key, fallback string) string {
	return value
}

func loadRules() RuleSet {
	rulesUrl := os.Getenv("RULES_URL")
func LoadRules(rulesUrl string) RuleSet {
	//rulesUrl := os.Getenv("RULESET")
	if rulesUrl == "" {
		RulesList := RuleSet{}
		return RulesList
	}
	log.Println("Loading rules")

	if rulesUrl == "default" {
		rulesUrl = "https://raw.githubusercontent.com/kubero-dev/ladder/main/ruleset.yaml"
	}

	log.Println("Loading rules: " + rulesUrl)

	var ruleSet RuleSet
	if strings.HasPrefix(rulesUrl, "http") {

		resp, err := http.Get(rulesUrl)
		if err != nil {
@@ -134,12 +150,25 @@ func loadRules() RuleSet {
		if err != nil {
			log.Println("ERROR:", err)
		}

		var ruleSet RuleSet
		err = yaml.Unmarshal(body, &ruleSet)

		if err != nil {
			log.Println("ERROR:", err)
		}
	} else {
		yamlFile, err := os.ReadFile(rulesUrl)
		if err != nil {
			log.Println("ERROR:", err)
		}
		yaml.Unmarshal(yamlFile, &ruleSet)
	}

	for _, rule := range ruleSet {
		//log.Println("Loaded rules for", rule.Domain)
		if os.Getenv("ALLOWED_DOMAINS_RULESET") == "true" {
			allowedDomains = append(allowedDomains, rule.Domain)
		}
	}

	log.Println("Loaded rules for", len(ruleSet), "Domains")
	return ruleSet
@@ -154,7 +183,7 @@ func applyRules(domain string, path string, body string) string {
		if rule.Domain != domain {
			continue
		}
		if rule.Path != "" && rule.Path != path {
		if len(rule.Paths) > 0 && !StringInSlice(path, rule.Paths) {
			continue
		}
		for _, regexRule := range rule.RegexRules {
@@ -192,7 +221,7 @@ type Rule struct {

type RuleSet []struct {
	Domain string `yaml:"domain"`
	Path string `yaml:"path,omitempty"`
	Paths []string `yaml:"paths,omitempty"`
	GoogleCache bool `yaml:"googleCache,omitempty"`
	RegexRules []Rule `yaml:"regexRules"`
	Injections []struct {
@@ -202,3 +231,12 @@ type RuleSet []struct {
		Replace string `yaml:"replace"`
	} `yaml:"injections"`
}

func StringInSlice(s string, list []string) bool {
	for _, x := range list {
		if strings.HasPrefix(s, x) {
			return true
		}
	}
	return false
}
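
Because `StringInSlice` relies on `strings.HasPrefix`, a rule path such as `/sport` also matches `/sports-news`. If whole-segment matching is wanted instead, something along these lines might work. This is a sketch of a different, stricter technique than the one in the diff, and `pathMatches` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// pathMatches reports whether path equals one of the prefixes or lies below
// it as a whole path segment. An empty prefixes list matches every path,
// mirroring the "len(rule.Paths) > 0" guard in applyRules.
func pathMatches(path string, prefixes []string) bool {
	if len(prefixes) == 0 {
		return true
	}
	for _, p := range prefixes {
		p = strings.TrimSuffix(p, "/")
		if path == p || strings.HasPrefix(path, p+"/") {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{"/international", "/sport"}
	fmt.Println(pathMatches("/sport/fussball", paths))
	fmt.Println(pathMatches("/sports-news", paths))
	fmt.Println(pathMatches("/anything", nil))
}
```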
24
handlers/ruleset.go
Normal file
@@ -0,0 +1,24 @@
package handlers

import (
	"os"

	"github.com/gofiber/fiber/v2"
	"gopkg.in/yaml.v3"
)

func Ruleset(c *fiber.Ctx) error {

	if os.Getenv("EXPOSE_RULESET") == "false" {
		c.SendStatus(fiber.StatusForbidden)
		return c.SendString("Rules Disabled")
	}

	body, err := yaml.Marshal(rulesSet)
	if err != nil {
		c.SendStatus(fiber.StatusInternalServerError)
		return c.SendString(err.Error())
	}

	return c.SendString(string(body))
}
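
The behaviour of this handler can be approximated with only the standard library, which makes it easy to try without Fiber. The sketch below is not the code from this change set: `rulesetHandler` and `fetch` are hypothetical names, the ruleset is passed in as a ready-made YAML string, and the `application/yaml` content type is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// rulesetHandler serves rulesYAML, or answers 403 when expose is false,
// mirroring the EXPOSE_RULESET switch of the Fiber handler above.
func rulesetHandler(expose bool, rulesYAML string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !expose {
			http.Error(w, "Rules Disabled", http.StatusForbidden)
			return
		}
		w.Header().Set("Content-Type", "application/yaml")
		io.WriteString(w, rulesYAML)
	}
}

// fetch issues an in-process GET /ruleset against h and returns the status
// code and body, so no network listener is needed.
func fetch(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ruleset", nil)
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := fetch(rulesetHandler(true, "- domain: www.example.com\n"))
	fmt.Println(status, body)

	status, _ = fetch(rulesetHandler(false, ""))
	fmt.Println(status)
}
```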
54
ruleset.yaml
@@ -10,16 +10,46 @@
          console.log("test");
          alert("Hello!");
        </script>
- domain: www.anotherdomain.com # Domain where the rule applies
  path: /article # Path where the rule applies
  googleCache: false # Search also in Google Cache
  regexRules: # Regex rules to apply
    - match: <script\s+([^>]*\s+)?src="(/)([^"]*)"
      replace: <script $1 script="/https://www.example.com/$3"
  injections:
    - position: .left-content article .post-title # Position where to inject the code into DOM
    - position: h1
      replace: |
        <h1>My Custom Title</h1>
    - position: .left-content article # Position where to inject the code into DOM
      prepend: |
        <h2>Suptitle</h2>
        <h1>An example with a ladder ;-)</h1>
- domain: www.americanbanker.com
  paths:
    - /news
  injections:
    - position: head
      append: |
        <script>
          document.addEventListener("DOMContentLoaded", () => {
            const inlineGate = document.querySelector('.inline-gate');
            if (inlineGate) {
              inlineGate.classList.remove('inline-gate');
              const inlineGated = document.querySelectorAll('.inline-gated');
              for (const elem of inlineGated) { elem.classList.remove('inline-gated'); }
            }
          });
        </script>
- domain: www.nzz.ch
  paths:
    - /international
    - /sport
    - /wirtschaft
    - /technologie
    - /feuilleton
    - /zuerich
    - /wissenschaft
    - /gesellschaft
    - /panorama
    - /mobilitaet
    - /reisen
    - /meinung
    - /finanze
  injections:
    - position: head
      append: |
        <script>
          document.addEventListener("DOMContentLoaded", () => {
            const paywall = document.querySelector('.dynamic-regwall');
            removeDOMElement(paywall)
          });
        </script>