Problem
The Research workflow remains largely non-operational, with only a 20% success rate (1 of 5 runs) despite the TAVILY_API_KEY secret being added. The workflow has been effectively offline for 17 days.
Current Status (2026-01-25)
Root Cause Analysis
Key insight: Daily News workflow recovered immediately after TAVILY_API_KEY was added (2026-01-22), but Research workflow did NOT recover. This suggests:
- Hypothesis 1: Workflow needs recompilation
  - Secret was added AFTER last compilation
  - Lock file may not reference the new secret
  - Solution: make recompile
- Hypothesis 2: Different MCP Gateway configuration
  - Research may use different MCP server setup than Daily News
  - May need additional configuration beyond TAVILY_API_KEY
  - Review frontmatter differences
- Hypothesis 3: Intermittent MCP Gateway issues
  - 1/5 runs succeeded (20% rate)
  - May be timing/connectivity related
  - Could be transient MCP server availability
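Hypothesis 1 can be checked before recompiling by looking for the secret name in the compiled lock file. The sketch below is a minimal check, assuming the lock file references secrets by name in plain text; the helper name `lock_references_secret` is hypothetical and not part of any existing tooling.

```shell
# Hypothetical helper: report whether a compiled lock file mentions a secret name.
# Assumption: the compiled workflow references secrets by name as plain text.
lock_references_secret() {
  lock_file="$1"
  secret_name="$2"
  # -F treats the name as a fixed string, -q suppresses output.
  if grep -qF -- "$secret_name" "$lock_file"; then
    echo "referenced"
  else
    echo "missing"
  fi
}

# Example, using the lock file path from Step 1:
# lock_references_secret .github/workflows/research.lock.yml TAVILY_API_KEY
```

A result of "missing" would support Hypothesis 1; "referenced" would point toward Hypotheses 2 or 3 instead.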
Comparison with Daily News and MCP Inspector
| Aspect | Daily News (✅) | Research (⚠️) | MCP Inspector (❌) |
| --- | --- | --- | --- |
| TAVILY_API_KEY | Present | Present | Present |
| Recovery | Immediate | Partial (20%) | None (0%) |
| Success rate | 40% (recovering) | 20% (low) | 0% (failing) |
| Last compiled | Unknown | Unknown | Unknown |
| MCP Gateway | Working | Intermittent | Failing |
Recommended Investigation Steps
Step 1: Recompile Workflow
cd /path/to/repo
make recompile
git add .github/workflows/research.lock.yml
git commit -m "Recompile Research workflow after TAVILY_API_KEY fix"
git push
Step 2: Compare Frontmatter
Compare configurations:
- .github/workflows/daily-news.md (working, 40% success)
- .github/workflows/research.md (failing, 20% success)
- .github/workflows/mcp-inspector.md (failing, 0% success)
Look for differences in:
- MCP server configuration
- Tool permissions
- Timeout settings
- Environment variables
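The comparison above could be done mechanically by extracting each file's frontmatter and diffing the results. This is a rough sketch, assuming the workflow markdown files use YAML frontmatter delimited by `---` lines; the function names `frontmatter` and `compare_frontmatter` are hypothetical.

```shell
# Print the frontmatter of a workflow markdown file:
# the lines between the first and second '---' delimiter lines.
# Assumption: frontmatter is delimited by lines consisting only of '---'.
frontmatter() {
  awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$1"
}

# Show a unified diff of the frontmatter of two workflow files.
# Returns diff's exit status: 0 when identical, 1 when they differ.
compare_frontmatter() {
  a=$(mktemp)
  b=$(mktemp)
  frontmatter "$1" > "$a"
  frontmatter "$2" > "$b"
  diff -u "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Example, using the paths listed above:
# compare_frontmatter .github/workflows/daily-news.md .github/workflows/research.md
```

Diffing only the frontmatter keeps prompt-text differences out of the way, so changes in MCP server setup, permissions, timeouts, and environment variables are easier to spot.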
Step 3: Analyze Failed Run Logs
Download artifacts from run 21078189533:
- Check /tmp/gh-aw/mcp-logs/ for MCP Gateway errors
- Review agent stdio logs
- Look for timeout or connection issues
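One way to triage the downloaded logs is to count lines that look like timeout or connection failures. The sketch below assumes the artifacts have already been downloaded to a local directory; the search patterns are guesses at common error wording, not a known list of MCP Gateway messages, and the function name is hypothetical.

```shell
# Count log lines under a directory that look like timeout or connection failures.
# Assumption: the patterns below are generic guesses, not the gateway's actual error strings.
count_connectivity_errors() {
  grep -rniE 'timeout|timed out|connection refused|connection reset|ECONNREFUSED|ETIMEDOUT' "$1" \
    | wc -l | tr -d ' '
}

# Example: download the run's artifacts with the GitHub CLI, then scan them.
# The local directory name is arbitrary.
# gh run download 21078189533 --dir ./run-21078189533
# count_connectivity_errors ./run-21078189533
```

A nonzero count would be consistent with Hypothesis 3; the matching lines themselves can be inspected by running the same `grep` without the `wc -l` stage.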
Step 4: Test Manually Multiple Times
# Run 3-5 times to check for intermittent issues
for i in {1..5}; do
  gh workflow run research.lock.yml
  sleep 60
done
Monitor success rate of manual runs.
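The success rate can be computed from recent run conclusions rather than tallied by hand. This is a minimal sketch: the `success_rate` helper is hypothetical, reads one conclusion per line, and ignores blank lines, which is how runs still in progress are assumed to appear.

```shell
# Read one run conclusion per line on stdin and print the integer success percentage.
# Blank lines (assumed to be runs without a conclusion yet) are not counted.
success_rate() {
  awk '
    $0 == "success" { ok++ }
    NF              { total++ }
    END {
      if (total) printf "%d\n", ok * 100 / total
      else       print 0
    }
  '
}

# Example with the GitHub CLI, over the last 5 runs of the compiled workflow:
# gh run list --workflow research.lock.yml --limit 5 --json conclusion --jq '.[].conclusion' | success_rate
```

With one success in five runs, as reported above, this prints 20.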
Success Criteria
- Research workflow runs successfully
- Success rate returns to >80% over next 5 runs
- Research and knowledge work capabilities fully operational
- No intermittent failures
Priority: P1 (High)
Impact: Research capabilities severely limited for 17 days. This blocks automated research tasks, knowledge work, and investigation workflows.
Urgency: High - research functionality is critical for knowledge-based agents and analysis workflows.
Next steps:
- Recompile workflow (5 min)
- Test manually 3-5 times (30 min)
- Analyze intermittent failure pattern (30 min)
- Apply fix based on findings (variable)
AI generated by Workflow Health Manager - Meta-Orchestrator