# RSS Link Audit (FastAPI)

A FastAPI app that accepts an RSS/Atom feed URL, fetches each post’s full HTML, extracts outbound links, groups them by hostname, **hunts for each host’s RSS feed** (common endpoints + homepage discovery), and renders a stylish report using the **Royal Armory** palette.
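
The link-extraction and grouping steps described above can be sketched with the standard library alone. This is a minimal illustration, not the app's actual code: `LinkCollector`, `outbound_links`, and `group_by_host` are hypothetical names, and treating "outbound" as "any http(s) link whose hostname differs from the post's" is an assumption.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(post_url, post_html):
    """Return absolute http(s) links that point away from the post's own host.

    Assumption: links back to the post's hostname are not "outbound".
    """
    collector = LinkCollector()
    collector.feed(post_html)
    own_host = urlsplit(post_url).hostname
    links = []
    for href in collector.hrefs:
        absolute = urljoin(post_url, href)  # resolve relative hrefs
        parts = urlsplit(absolute)
        if (
            parts.scheme in ("http", "https")
            and parts.hostname
            and parts.hostname != own_host
        ):
            links.append(absolute)
    return links


def group_by_host(links):
    """Map each hostname to a Counter of how often each link was seen."""
    grouped = defaultdict(Counter)
    for link in links:
        grouped[urlsplit(link).hostname][link] += 1
    return dict(grouped)
```

Summing a host's `Counter` values gives the per-host link count, and `Counter.most_common()` gives a ranking of the links that are repeated.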

## Features

- Input a feed URL via UI or JSON.
- Concurrent fetching (httpx + asyncio).
- Extract links from each post page.
- Group by hostname; count occurrences.
- Heuristic RSS discovery:
  - Probe common feed endpoints (e.g. `/feed`, `/rss.xml`, `/atom.xml`).
  - Parse homepage `<link rel="alternate" ...>` for RSS/Atom.
  - Scan homepage `<a>` tags for `rss|atom|feed`.
  - Validate candidates with `feedparser`.
- Report UI:
  - Per-host card with counts.
  - **Bar** visual showing how many links each host has.
  - **Top links** (links mentioned more than once).
  - Links list truncated with a **More** button.
  - RSS/Atom badge if found.
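
The discovery heuristics listed above could be combined roughly as follows. This is a sketch under stated assumptions rather than the app's implementation: `FeedLinkParser` and `candidate_feed_urls` are hypothetical names, the endpoint list holds only the three paths named above (the real list may be longer), and `<a>` tags are matched on their `href` only. Each candidate would still need to be fetched and validated with `feedparser`, which is left out here to keep the example dependency-free.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only the endpoints named in this README; the app may probe more.
COMMON_FEED_PATHS = ["/feed", "/rss.xml", "/atom.xml"]
FEED_MIME_TYPES = {"application/rss+xml", "application/atom+xml"}
FEED_HINT = re.compile(r"rss|atom|feed", re.IGNORECASE)


class FeedLinkParser(HTMLParser):
    """Collect feed-looking hrefs from <link rel="alternate"> and <a> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []  # hrefs from <link rel="alternate" type=feed MIME>
        self.anchors = []     # hrefs from <a> tags whose href matches FEED_HINT

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        href = attr.get("href", "")
        if not href:
            return
        if tag == "link":
            rels = attr.get("rel", "").lower().split()
            if "alternate" in rels and attr.get("type", "").lower() in FEED_MIME_TYPES:
                self.alternates.append(href)
        elif tag == "a" and FEED_HINT.search(href):
            self.anchors.append(href)


def candidate_feed_urls(base_url, homepage_html):
    """Return candidate feed URLs: common endpoints, then <link>, then <a>."""
    parser = FeedLinkParser()
    parser.feed(homepage_html)
    ordered = [urljoin(base_url, path) for path in COMMON_FEED_PATHS]
    ordered += [urljoin(base_url, h) for h in parser.alternates + parser.anchors]
    # De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(ordered))
```

Returning an ordered, de-duplicated list lets the caller stop at the first candidate that validates, so cheap guesses are tried before anything scraped from the homepage.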

## Run locally

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --reload
```

Open: http://127.0.0.1:8000

## API

```
POST /api/analyze
Content-Type: application/json

{"feed_url": "https://example.com/feed.xml"}
```

Returns JSON with the summarized data.
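
For scripting against the endpoint, a small standard-library client might look like the sketch below. The helper names `build_analyze_request` and `analyze` are hypothetical, the base URL assumes the default local uvicorn address shown under "Run locally", and the timeout is an arbitrary guess for slow feeds.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: default local uvicorn address


def build_analyze_request(feed_url, base_url=BASE_URL):
    """Build the POST /api/analyze request without sending it."""
    payload = json.dumps({"feed_url": feed_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(feed_url, base_url=BASE_URL, timeout=120):
    """Send the request and return the decoded JSON response.

    The timeout is a guess; analysing a large feed may take a while.
    """
    request = build_analyze_request(feed_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `analyze("https://example.com/feed.xml")` should return the parsed JSON. The response's field names are not documented here, so inspect the result before relying on a particular shape.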

## Notes / Caveats

- Only static HTML is parsed (no JS rendering).
- Some sites block automated clients, so fetches can fail and results may be incomplete.
- For large feeds, you may wish to trim the number of posts (e.g., slice `post_urls` in `analyze_feed`).
- Consider adding caching (e.g., `aiocache`, Redis) if you’ll run this frequently.
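
The last two notes can be illustrated with a short sketch. It swaps in a plain in-memory TTL cache instead of `aiocache` or Redis so that it runs without extra dependencies; `MAX_POSTS`, `trim_posts`, and `TTLCache` are hypothetical names, and the cap and TTL values are arbitrary.

```python
import asyncio
import time

MAX_POSTS = 20  # hypothetical cap; tune to the feeds you analyse


def trim_posts(post_urls, limit=MAX_POSTS):
    """Keep only the first `limit` post URLs (a plain slice)."""
    return post_urls[:limit]


class TTLCache:
    """Tiny in-memory async cache; a stand-in for aiocache or Redis."""

    def __init__(self, ttl_seconds=900.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (stored_at, value)

    async def get_or_compute(self, key, compute):
        """Return a fresh cached value, else await compute() and store it."""
        hit = self._store.get(key)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]
        value = await compute()
        self._store[key] = (self.clock(), value)
        return value
```

Keying the cache on the feed URL means repeated requests for the same feed within the TTL skip the fetch-and-parse work entirely. Entries live only in the current process and are never evicted, so a shared or long-running deployment would want a real cache backend.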