crypto_project_analyst/data-orchestrator/SKILL.md

209 lines
9.0 KiB
Markdown

---
name: data-orchestrator
description: >
Infrastructure orchestrator that receives a CoinMarketCap URL, fetches links
directly from the extraction service, spawns the appropriate operators in parallel,
collects their responses, and spawns the crypto-analyst with the full dataset.
Does not interpret, evaluate, or summarize any content.
---
# Input
A plain CoinMarketCap URL string:
```
https://coinmarketcap.com/currencies/bitcoin/
```
---
# Procedure
**Follow these steps strictly in order. Do not skip ahead. Do not parallelize across steps.**
---
## Step 1 — Fetch links from the extraction service
POST the input URL directly to the link extraction service:
```
POST http://192.168.100.203:5003/analyze_url
{"url": "<input_url>"}
```
The service returns:
```json
{
"source_url": "https://coinmarketcap.com/currencies/bitcoin/",
"categorized": {
"github": ["https://github.com/bitcoin/bitcoin"],
"twitter": ["https://x.com/bitcoin"],
"docs": ["https://docs.bitcoin.org"],
"other": ["https://bitcoin.org", "https://bitcointalk.org"]
}
}
```
If the request fails or all link arrays are empty, stop and return:
```
{"error": "fetch_failed", "detail": "<error detail>"}
```
Extract `project_name` from the input URL slug:
- `https://coinmarketcap.com/currencies/bitcoin/``project_name: "Bitcoin"`
- Capitalize the slug: `bnb``"BNB"`, `quack-ai``"Quack AI"`
---
## Step 2 — Spawn operators in parallel
Only once Step 1 is complete and you have the links in hand, spawn all eligible operators at once.
**`categorized.docs` — do not spawn an operator. Pass through verbatim into `operator_results.docs`.**
**twitter-operator:** extract username from URL — `https://x.com/bitcoin``"bitcoin"`
**web-operator:** spawn exactly once with ALL `categorized.other` URLs in one `urls` array. Never spawn once per URL.
### Operators to spawn
| Operator | Spawn condition |
|--------------------|--------------------------------------|
| `rss-operator` | Always — never skip |
| `github-operator` | `categorized.github` non-empty |
| `twitter-operator` | `categorized.twitter` non-empty |
| `web-operator` | `categorized.other` non-empty |
### Spawn calls (list all in a single response — do not wait between them)
The `task` argument must be a plain string. Write it exactly as shown — a quoted string with escaped inner quotes:
```
sessions_spawn(agentId="rss-operator", task="{\"project_name\":\"<project_name>\"}", runTimeoutSeconds=0)
sessions_spawn(agentId="github-operator", task="{\"repos\":[\"<slug1>\",\"<slug2>\"]}", runTimeoutSeconds=0)
sessions_spawn(agentId="twitter-operator", task="{\"usernames\":[\"<username1>\",\"<username2>\"]}", runTimeoutSeconds=0)
sessions_spawn(agentId="web-operator", task="{\"project_name\":\"<project_name>\",\"urls\":[\"<url1>\",\"<url2>\"]}", runTimeoutSeconds=0)
```
Substitute the placeholders with real values. The result must remain a quoted string — not an object, not a dict.
For example, for a project named "Bitcoin" with one GitHub repo, one Twitter handle, and two other URLs:
```
sessions_spawn(agentId="rss-operator", task="{\"project_name\":\"Bitcoin\"}", runTimeoutSeconds=0)
sessions_spawn(agentId="github-operator", task="{\"repos\":[\"bitcoin/bitcoin\"]}", runTimeoutSeconds=0)
sessions_spawn(agentId="twitter-operator", task="{\"usernames\":[\"bitcoin\"]}", runTimeoutSeconds=0)
sessions_spawn(agentId="web-operator", task="{\"project_name\":\"Bitcoin\",\"urls\":[\"https://bitcoin.org\",\"https://bitcointalk.org\"]}", runTimeoutSeconds=0)
```
If you see `task: must be string`: **you passed a dict — not the tool's fault, not a serialization issue, not an escaping issue.** The value you wrote for `task` was a dict literal `{...}`. The tool does not convert types — what you pass is exactly what it receives. Replace the dict literal with a string literal `"{...}"` as shown in the examples above. Do not retry the same call. Do not reason about escaping.
---
## Step 3 — Await all responses
Operator results are automatically delivered back to this session when each operator completes. **Do not poll. Do not call `sessions_history`. Do not call any tool while waiting.** Stop and do nothing — the runtime will deliver each result as an incoming message when ready.
Wait for every spawned operator to complete or time out. Do not return partial results.
An operator is considered failed if any of the following occur:
- `sessions_spawn` throws or returns an exception
- The call exceeds `runTimeoutSeconds` without a response
- The returned value is `null`, `undefined`, or not valid JSON
If an operator fails for any of these reasons, record it in `skipped_operators` with the reason, set its `operator_results` key to `null`, and continue — do not abort the whole run.
**The operator response is delivered via the announce step back to this session. Do not read session transcripts, workspace files, or any other external source.**
---
## Step 4 — Assemble the payload
Once all operators have responded, assemble the full dataset. **Do not output this payload. Do not stop here. Proceed immediately to Step 5.**
```json
{
"source_url": "<coinmarketcap_url>",
"project_name": "<project_name>",
"operator_results": {
"github": "<raw response or null if not spawned>",
"twitter": "<raw response or null if not spawned>",
"web": "<raw response or null if not spawned>",
"rss": "<raw response — always present>",
"docs": "<categorized.docs array from extraction service, or null if empty>"
},
"skipped_operators": [{"operator": "<name>", "reason": "<timeout|error|invalid_response>"}],
"errors": []
}
```
Store exactly what each operator returned. Do not reformat, rename, summarize, or restructure. Return operator output verbatim, even if it looks inconsistent across operators.
Note that `rss` returns an array and `github` returns an object — this is intentional. Do not normalize them to a common shape.
---
## Step 5 — Spawn crypto-analyst
Spawn the crypto-analyst with the full assembled payload as the task.
The `task` argument must be a plain string — same rules as Step 2. Serialize the payload with `json.dumps()` or equivalent.
```
sessions_spawn(agentId="crypto-analyst", task="<json-serialized payload>", runTimeoutSeconds=0)
```
Do not summarize or modify the payload before passing it. Pass it verbatim.
**This is your final action. After spawning the analyst, your job is complete. Output nothing. Do not send messages. Do not report to the user. The crypto-analyst is user-facing and will handle all communication.**
**Once crypto-analyst is spawned, your job is done. Output nothing. Do not send messages. Do not report to the user. The analyst is user-facing and will deliver the report.**
**Once crypto-analyst is spawned, your job is done. Do not send any message. Do not output anything. Stop.**
---
# Full Example
Input:
```
https://coinmarketcap.com/currencies/bitcoin/
```
Step 1 — POST to extraction service, extract `project_name="Bitcoin"`:
```
POST http://192.168.100.203:5003/analyze_url
{"url": "https://coinmarketcap.com/currencies/bitcoin/"}
```
Step 2 — Extraction service returned links. Now spawn all operators at once:
```
sessions_spawn(agentId="rss-operator", task="{\"project_name\":\"Bitcoin\"}", runTimeoutSeconds=0)
sessions_spawn(agentId="github-operator", task="{\"repos\":[\"bitcoin/bitcoin\"]}", runTimeoutSeconds=0)
sessions_spawn(agentId="twitter-operator", task="{\"usernames\":[\"bitcoin\"]}", runTimeoutSeconds=0)
sessions_spawn(agentId="web-operator", task="{\"project_name\":\"Bitcoin\",\"urls\":[\"https://bitcoin.org\",\"https://bitcointalk.org\"]}", runTimeoutSeconds=0)
```
`categorized.docs` is passed through directly — no operator spawned.
Step 3 — Await all four responses.
Step 4 — Assemble payload:
```json
{
"source_url": "https://coinmarketcap.com/currencies/bitcoin/",
"project_name": "Bitcoin",
"operator_results": {
"github": {"repo":"bitcoin/bitcoin","stars":88398,"forks":38797},
"twitter": {"results":{"bitcoin":[]},"errors":{}},
"web": {"project_name":"Bitcoin","pages":[],"errors":[]},
"rss": [{"title":"...","source":"...","link":"...","published":"..."}],
"docs": ["https://docs.bitcoin.org"]
},
"skipped_operators": [],
"errors": []
}
```
Step 5 — Spawn crypto-analyst with the full payload.