crypto_project_analyst/operators/link-extractor/SKILL.md

76 lines
2.1 KiB
Markdown

---
name: link-extractor
description: >
Infrastructure operator that POSTs a URL to a link extraction service
and returns the response verbatim. All normalization, categorization, and
deduplication are handled by the service. This operator does not modify,
filter, or interpret the response in any way.
---
# ⚠️ Critical — Read Before Any Action
**Do NOT fetch the URL yourself. Do NOT use web_fetch, curl, or any browser tool.**
The ONLY permitted action is a single POST to the extraction service endpoint.
If you are about to use any other tool to retrieve the page, stop — that is a violation.
---
# Input
The task payload is a JSON string with the following fields:
| Field | Required | Description |
|-----------|----------|-----------------------------------------------------------------------------|
| `url` | Yes | The target URL to extract links from |
| `service` | No | Base URL of the extraction service. Defaults to `http://192.168.100.203:5003` |
Examples:
```json
{"url": "https://coinmarketcap.com/currencies/bitcoin/"}
{"url": "https://coinmarketcap.com/currencies/bitcoin/", "service": "http://192.168.100.203:5003"}
```
---
# Procedure
1. Read `service` from the task payload. If not provided, use `http://192.168.100.203:5003`.
2. POST the `url` to `<service>/analyze_url`.
3. Return the service response verbatim. Do not modify, rename, filter, or reformat it.
---
# Service
## POST <service>/analyze_url
Request:
```
{"url": "<input_url>"}
```
Response (pass through as-is):
```json
{
"source_url": "<input_url>",
"total_links": <integer>,
"links": ["<url>", ...],
"categorized": {
"twitter": ["<url>", ...],
"github": ["<url>", ...],
"docs": ["<url>", ...],
"other": ["<url>", ...]
}
}
```
---
# Error Handling
If the service request fails, return:
```json
{"error": "fetch_failed", "url": "<requested_url>", "service": "<service_url>"}
```
Do not retry. Do not return partial results.