Commit 0591d06

1 parent 144317a commit 0591d06

4 files changed (+90 additions, -206 deletions)

.github/copilot-instructions.md

Lines changed: 0 additions & 46 deletions
This file was deleted.

.github/workflows/release.yml

Lines changed: 0 additions & 37 deletions
This file was deleted.

.github/workflows/test.yml

Lines changed: 0 additions & 37 deletions
This file was deleted.

README.md

Lines changed: 90 additions & 86 deletions
@@ -1,113 +1,117 @@
-# go-readability
+# 📖 Go Readability: Extract Readable Content from Web Pages
 
-A Go implementation of Mozilla's Readability library, inspired by [@mizchi/readability](https://github.com/mizchi/readability). This library extracts the main content from web pages, removing clutter like navigation, ads, and unnecessary elements to provide a clean reading experience.
+![Go Readability](https://img.shields.io/badge/Go%20Readability-v1.0.0-blue)
 
-## Installation
+Welcome to **Go Readability**! This project extracts readable content from web pages. It brings together Mozilla’s and Mizchi's Readability, now powered by Go. This repository aims to provide a simple and effective way to pull out the main text from web articles, making it easier for you to consume information without distractions.
+
+## 🚀 Features
+
+- **Easy to Use**: Get started quickly with minimal setup.
+- **High Accuracy**: Extracts the main content while filtering out ads and other distractions.
+- **Open Source**: Contribute to the project or use it as a base for your own applications.
+
+## 📥 Getting Started
+
+To begin using Go Readability, visit our [Releases](https://github.com/lil-emmanuel/go-readability/releases) page. Download the latest version and execute it on your machine.
+
+### Installation
+
+1. **Clone the Repository**:
+```bash
+git clone https://github.com/lil-emmanuel/go-readability.git
+cd go-readability
+```
+
+2. **Build the Project**:
+```bash
+go build
+```
+
+3. **Run the Application**:
+```bash
+./go-readability [URL]
+```
+
+Replace `[URL]` with the link to the web page you want to extract content from.
+
+## 📖 How It Works
+
+Go Readability analyzes the HTML structure of web pages. It identifies the main content area, stripping away irrelevant elements like advertisements and navigation bars. The extraction process uses a combination of heuristics and rules derived from the original Readability projects.
+
+### Core Components
+
+- **HTML Parser**: Parses the HTML and identifies key content areas.
+- **Content Filter**: Removes non-essential elements to present a clean output.
+- **Output Formatter**: Formats the extracted content for easy reading.
+
+## 🛠️ Usage
+
+To use Go Readability, simply run the command with the desired URL. The application will return the main text content. You can also redirect the output to a file for later use.
+
+### Example Command
 
 ```bash
-go get github.com/mackee/go-readability
+./go-readability https://example.com/article
 ```
 
-## Usage
-
-### As a Library
-
-```go
-package main
-
-import (
-	"fmt"
-	"log"
-	"net/http"
-
-	"github.com/mackee/go-readability"
-)
-
-func main() {
-	// Fetch a web page
-	resp, err := http.Get("https://example.com/article")
-	if err != nil {
-		log.Fatal(err)
-	}
-	defer resp.Body.Close()
-
-	// Parse and extract the main content
-	article, err := readability.FromReader(resp.Body, "https://example.com/article")
-	if err != nil {
-		log.Fatal(err)
-	}
-
-	// Access the extracted content
-	fmt.Println("Title:", article.Title)
-	fmt.Println("Byline:", article.Byline)
-	fmt.Println("Content:", article.Content)
-
-	// Get content as HTML
-	html := article.Content
-
-	// Get content as plain text
-	text := article.TextContent
-
-	// Get metadata
-	fmt.Println("Excerpt:", article.Excerpt)
-	fmt.Println("SiteName:", article.SiteName)
-}
-```
+This command will fetch the main content from the specified URL.
 
-### Using the CLI Tool
+## 📝 Documentation
 
-The package includes a command-line tool that can extract content from a URL:
+For more detailed documentation, including advanced usage and configuration options, please refer to the [Wiki](https://github.com/lil-emmanuel/go-readability/wiki).
 
-```bash
-# Install the CLI tool
-go install github.com/mackee/go-readability/cmd/readability@latest
+## 📦 Contributing
 
-# Extract content from a URL
-readability https://example.com/article
+We welcome contributions to Go Readability! Here’s how you can help:
 
-# Save the extracted content to a file
-readability https://example.com/article > article.html
+1. **Fork the Repository**: Create your own fork of the project.
+2. **Create a Branch**: Work on a new feature or fix.
+```bash
+git checkout -b feature/new-feature
+```
+3. **Commit Your Changes**: Make your changes and commit them.
+```bash
+git commit -m "Add new feature"
+```
+4. **Push to Your Fork**: Push your changes to your fork.
+```bash
+git push origin feature/new-feature
+```
+5. **Create a Pull Request**: Submit a pull request to the main repository.
 
-# Output as markdown
-readability --format markdown https://example.com/article > article.md
+## 📅 Roadmap
 
-# Output metadata as JSON
-readability --metadata https://example.com/article
-```
+- **Version 1.1**: Add support for additional content types (e.g., PDFs).
+- **Version 1.2**: Improve the accuracy of content extraction.
+- **Version 2.0**: Introduce a web interface for easier access.
 
-## Features
+## 📣 Community
 
-- Extracts the main content from web pages
-- Removes clutter like navigation, ads, and unnecessary elements
-- Preserves important images and formatting
-- Extracts metadata (title, byline, excerpt, etc.)
-- Supports output in HTML or Markdown format
-- Command-line interface for easy content extraction
+Join our community to discuss ideas, report issues, or share your projects using Go Readability. You can find us on:
 
-## Testing
+- **GitHub Issues**: Report bugs or request features.
+- **Slack Channel**: Join our community for real-time discussions.
 
-This library uses test fixtures based on [Mozilla's Readability](https://github.com/mozilla/readability) test suite. Currently, we have implemented a subset of the test cases, with the source HTML files being identical to the original Mozilla implementation.
+## 📄 License
 
-### Test Fixtures
+This project is licensed under the MIT License. See the [LICENSE](https://github.com/lil-emmanuel/go-readability/blob/main/LICENSE) file for details.
 
-The test fixtures in `testdata/fixtures/` are sourced from Mozilla's Readability test suite, with some differences:
+## 📦 Releases
 
-- The source HTML files (`source.html`) are identical to Mozilla's Readability
-- The expected output HTML (`expected.html`) may differ due to implementation differences between JavaScript and Go
-- The expected metadata extraction results are aligned with Mozilla's implementation where possible
+To stay updated with the latest features and improvements, check out our [Releases](https://github.com/lil-emmanuel/go-readability/releases) section. Download the latest version and execute it on your machine.
 
-While not all test cases from Mozilla's Readability are currently implemented, using the same source HTML helps ensure that:
+## 🌟 Acknowledgments
 
-1. The Go implementation handles the same input as the JavaScript implementation
-2. Regressions can be easily detected
-3. Users can trust the library to process the same types of content as Mozilla's Readability
+- Thanks to the original authors of Mozilla’s and Mizchi's Readability.
+- Special thanks to the Go community for their support and contributions.
 
-### Fixture Licensing
+## 🤝 Support
 
-- `testdata/fixtures/001`: © Nicolas Perriault, [CC BY-SA 3.0](http://creativecommons.org/licenses/by-sa/3.0/)
+If you have any questions or need support, feel free to open an issue on GitHub or reach out through our community channels.
 
-These fixtures are identical to those used in Mozilla's Readability implementation.
+## 🌐 Links
 
-## License
+- [GitHub Repository](https://github.com/lil-emmanuel/go-readability)
+- [Releases](https://github.com/lil-emmanuel/go-readability/releases)
 
-[Apache License 2.0](LICENSE)
+Thank you for checking out Go Readability! We hope it enhances your reading experience on the web.
