Always separate tools for production systems - it enables independent scaling, testing, and failure isolation. Implement a Kernel Agent that classifies queries and routes to specialized tools (RetrievalTool, SummarizationTool, QuestionAnsweringTool).
The critical pattern: have your kernel use LLM classification with explicit categories and fallback to 'unknown'. Keep classification prompts under 100 tokens and use temperature=0.7 to handle edge cases.
Chain tools when needed - QuestionAnsweringTool should internally call RetrievalTool rather than duplicating retrieval logic.
Pro tip: Add a params dictionary to your process_query method for tool-specific parameters (like paper_ids for summarization). This modular approach allows you to swap out individual tools, add new ones, or update tool logic without affecting the entire system. Monitor tool usage patterns - if 80% of queries go to one tool, consider optimizing that path specifically.