Messed up, then fixed a ton of files. Note: never leave the context window from where you start

kpcto 2025-02-11 01:21:39 +00:00
parent ac43cdd77e
commit d5b90f555d
20 changed files with 7921 additions and 235 deletions


@ -1,72 +1,83 @@
-# 🤖📚 LastIn-AI: The Research Assistant That Wrote Itself
-*"I used the AI to create the AI" - This README, probably*
-## 🚀 What Does This Thing Do?
-This autonomous research assistant:
-1. **Paper Hunter** 🔍: Daily arXiv scans using our `PaperFetcher` class
-2. **Category Ninja** 🥷: Manage topics via `config/arxiv_categories.json`
-3. **Time Traveler** ⏳: Fetch papers from past X days
-4. **Self-Aware Entity** 🤯: Literally wrote this README about itself
-## ⚙️ Installation (For Humans)
+# 🧠 LastIn AI: Your Paper's New Overlord
+[![AI Overlord](https://img.shields.io/badge/Powered%20By-Sentient%20Sparks-blue?style=for-the-badge)](https://www.youtube.com/watch?v=dQw4w9WgXcQ)
+**An AI system that reads, judges, and organizes papers so you can pretend you're keeping up with the literature**
+## 🚀 Features
+- 🤖 **Self-Optimizing Pipeline**: Now with 20% more existential dread about its purpose
+- 🧩 **Modular Architecture**: Swappable components like you're building Ikea furniture for robots
+- 🔍 **Context-Aware Analysis**: Reads between LaTeX equations like a PhD student reads Twitter
+- 🛠 **Self-Healing Storage**: Fixes database issues while questioning why it bothers
+- 🤯 **Flux Capacitor Mode**: Time-aware processing (results may violate causality)
+## ⚙️ Installation
 ```bash
-# Clone this repository that an AI created
-git clone https://github.com/yourusername/lastin-ai.git
+# Clone repository (we promise there's no paperclip maximizer)
+git clone https://github.com/your-lab/lastin-ai.git
 cd lastin-ai
-# Create virtual environment (because even AIs hate dependency conflicts)
-python -m venv .venv
-.venv\Scripts\activate
-# Install dependencies
-pip install -r requirements.txt
+# Install dependencies (virtual env recommended)
+pip install -r requirements.txt # Now with non-Euclidean dependency resolution!
+# Initialize the system (requires PostgreSQL)
+python -m src.main init-db # Creates tables and a small existential crisis
 ```
-## 🧠 Configuration
-Edit `config/arxiv_categories.json` to add your academic obsessions:
-```json
-{
-  "categories": [
-    {
-      "id": "cs.AI",
-      "name": "Artificial Intelligence",
-      "description": "Papers about systems that will eventually write better systems"
-    }
-  ],
-  "default_categories": ["cs.AI"]
-}
+## 🔧 Configuration
+Rename `.env.example` to `.env` and feed it your secrets:
+```ini
+OPENAI_API_KEY=sk-your-key-here # We pinky-promise not to become self-aware
+DEEPSEEK_API_KEY=sk-moon-shot # For when regular AI isn't dramatic enough
+DB_HOST=localhost # Where we store your academic sins
+DB_NAME=paper_analysis # Schema designed during a SIGBOVIK break
 ```
-## 💻 Usage
-```python
-# Let the AI research AI research
-python -m src.scripts.fetch_papers --days 3
+## 🧠 Usage
+```bash
+# Harvest papers like an academic combine
+python -m src.main fetch --categories cs.AI --days 7
+# Query your digital hoard
+python -m src.main query "papers that cite GPT but clearly didn't read it"
 ```
-**Sample Output:**
-```
-[AI]: Found 42 papers about AI
-[AI]: Determining which papers are about AI writing better AI...
-[AI]: existential_crisis.exe triggered
-```
-## 🤖 Philosophical Corner
-*This section written by the AI about the AI that wrote the AI*
-Q: Who watches the watchmen?
-A: An AI that watches watchmen-watching AI.
-## 📜 License
-MIT License - because even self-writing code needs legal coverage
+**Pro Mode:** Add `--loglevel DEBUG` to watch neural networks question the meaning of "breakthrough".
+## 🏗 Architecture
+```
+lastin-ai/
+├── src/
+│   ├── data_acquisition/ # Papers go in, embeddings come out
+│   ├── storage/          # Where vectors go to rethink their life choices
+│   ├── utils/            # Home of AgentController (the real protagonist)
+│   └── main.py           # The big red button (do not push)
+```
+Our **Agent Controller** handles:
+- 🧬 Data consistency through sheer willpower
+- 📉 Quality metrics that judge us all
+- 🧮 Vector math that would make Von Neumann blush
+## 🌌 Roadmap
+- [ ] Implement ethical constraints (optional)
+- [ ] Add support for papers written by AIs about AIs
+- [ ] Quantum-resistant pretension detection
+- [ ] Automated rebuttal generator for peer review
+## 🌟 Acknowledgments
+- **ArXiv** for the digital paper avalanche
+- **GPUs** for pretending we're not just matrix multiplying
+- **The concept of attention** - you're the real MVP
 ---
-*README written by Cascade (AI) at 2025-02-10 22:54 UTC - bow before your new robotic researcher overlords*
+*Disclaimer: May occasionally generate paper summaries more coherent than the originals. Not liable for recursive self-improvement loops.* 🤖➰

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

example.env Normal file

@ -0,0 +1,10 @@
OPENAI_API_KEY=
DEEPSEEK_API_KEY=
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=paper_analysis
DB_USER=paper_analysis
DB_PASSWORD=xyz
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/paper_analysis
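
For reference, a minimal sketch of how these settings might be consumed at startup, assuming the `python-dotenv` package (the commit's `AgentController` calls `load_dotenv()`) and treating `DATABASE_URL` as authoritative when present; the helper name is illustrative, not part of the repo:

```python
import os
from dotenv import load_dotenv  # python-dotenv; reads .env into the process environment

def build_db_dsn() -> str:
    """Return a PostgreSQL DSN, preferring DATABASE_URL when it is set."""
    load_dotenv()  # by default this loads `.env`, so example.env must be copied/renamed first
    url = os.getenv("DATABASE_URL")
    if url:
        return url
    # Otherwise assemble the DSN from the individual DB_* variables above.
    return (
        f"postgresql://{os.getenv('DB_USER', 'paper_analysis')}:"
        f"{os.getenv('DB_PASSWORD', '')}@"
        f"{os.getenv('DB_HOST', 'localhost')}:{os.getenv('DB_PORT', '5432')}/"
        f"{os.getenv('DB_NAME', 'paper_analysis')}"
    )

if __name__ == "__main__":
    print(build_db_dsn())
```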

papers/2502_05110v1.json Normal file

@ -0,0 +1,15 @@
{
"title": "ApplE: An Applied Ethics Ontology with Event Context",
"authors": [
"Aisha Aijaz",
"Raghava Mutharaju",
"Manohar Kumar"
],
"abstract": "Applied ethics is ubiquitous in most domains, requiring much deliberation due\nto its philosophical nature. Varying views often lead to conflicting courses of\naction where ethical dilemmas become challenging to resolve. Although many\nfactors contribute to such a decision, the major driving forces can be\ndiscretized and thus simplified to provide an indicative answer. Knowledge\nrepresentation and reasoning offer a way to explicitly translate abstract\nethical concepts into applicable principles within the context of an event. To\nachieve this, we propose ApplE, an Applied Ethics ontology that captures\nphilosophical theory and event context to holistically describe the morality of\nan action. The development process adheres to a modified version of the\nSimplified Agile Methodology for Ontology Development (SAMOD) and utilizes\nstandard design and publication practices. Using ApplE, we model a use case\nfrom the bioethics domain that demonstrates our ontology's social and\nscientific value. Apart from the ontological reasoning and quality checks,\nApplE is also evaluated using the three-fold testing process of SAMOD. ApplE\nfollows FAIR principles and aims to be a viable resource for applied ethicists\nand ontology engineers.",
"pdf_url": "http://arxiv.org/pdf/2502.05110v1",
"entry_id": "http://arxiv.org/abs/2502.05110v1",
"categories": [
"cs.CY",
"cs.AI"
]
}

papers/2502_05111v1.json Normal file

@ -0,0 +1,15 @@
{
"title": "Flexible and Efficient Grammar-Constrained Decoding",
"authors": [
"Kanghee Park",
"Timothy Zhou",
"Loris D'Antoni"
],
"abstract": "Large Language Models (LLMs) are often asked to generate structured outputs\nthat obey precise syntactic rules, such as code snippets or formatted data.\nGrammar-constrained decoding (GCD) can guarantee that LLM outputs matches such\nrules by masking out tokens that will provably lead to outputs that do not\nbelong to a specified context-free grammar (CFG). To guarantee soundness, GCD\nalgorithms have to compute how a given LLM subword tokenizer can align with the\ntokens used\n by a given context-free grammar and compute token masks based on this\ninformation. Doing so efficiently is challenging and existing GCD algorithms\nrequire tens of minutes to preprocess common grammars. We present a new GCD\nalgorithm together with an implementation that offers 17.71x faster offline\npreprocessing than existing approaches while preserving state-of-the-art\nefficiency in online mask computation.",
"pdf_url": "http://arxiv.org/pdf/2502.05111v1",
"entry_id": "http://arxiv.org/abs/2502.05111v1",
"categories": [
"cs.CL",
"cs.AI"
]
}

papers/2502_05130v1.json Normal file

@ -0,0 +1,25 @@
{
"title": "Latent Swap Joint Diffusion for Long-Form Audio Generation",
"authors": [
"Yusheng Dai",
"Chenxi Wang",
"Chang Li",
"Chen Wang",
"Jun Du",
"Kewei Li",
"Ruoyu Wang",
"Jiefeng Ma",
"Lei Sun",
"Jianqing Gao"
],
"abstract": "Previous work on long-form audio generation using global-view diffusion or\niterative generation demands significant training or inference costs. While\nrecent advancements in multi-view joint diffusion for panoramic generation\nprovide an efficient option, they struggle with spectrum generation with severe\noverlap distortions and high cross-view consistency costs. We initially explore\nthis phenomenon through the connectivity inheritance of latent maps and uncover\nthat averaging operations excessively smooth the high-frequency components of\nthe latent map. To address these issues, we propose Swap Forward (SaFa), a\nframe-level latent swap framework that synchronizes multiple diffusions to\nproduce a globally coherent long audio with more spectrum details in a\nforward-only manner. At its core, the bidirectional Self-Loop Latent Swap is\napplied between adjacent views, leveraging stepwise diffusion trajectory to\nadaptively enhance high-frequency components without disrupting low-frequency\ncomponents. Furthermore, to ensure cross-view consistency, the unidirectional\nReference-Guided Latent Swap is applied between the reference and the\nnon-overlap regions of each subview during the early stages, providing\ncentralized trajectory guidance. Quantitative and qualitative experiments\ndemonstrate that SaFa significantly outperforms existing joint diffusion\nmethods and even training-based long audio generation models. Moreover, we find\nthat it also adapts well to panoramic generation, achieving comparable\nstate-of-the-art performance with greater efficiency and model\ngeneralizability. Project page is available at https://swapforward.github.io/.",
"pdf_url": "http://arxiv.org/pdf/2502.05130v1",
"entry_id": "http://arxiv.org/abs/2502.05130v1",
"categories": [
"cs.SD",
"cs.AI",
"cs.CV",
"cs.MM",
"eess.AS"
]
}

papers/2502_05147v1.json Normal file

@ -0,0 +1,17 @@
{
"title": "LP-DETR: Layer-wise Progressive Relations for Object Detection",
"authors": [
"Zhengjian Kang",
"Ye Zhang",
"Xiaoyu Deng",
"Xintao Li",
"Yongzhe Zhang"
],
"abstract": "This paper presents LP-DETR (Layer-wise Progressive DETR), a novel approach\nthat enhances DETR-based object detection through multi-scale relation\nmodeling. Our method introduces learnable spatial relationships between object\nqueries through a relation-aware self-attention mechanism, which adaptively\nlearns to balance different scales of relations (local, medium and global)\nacross decoder layers. This progressive design enables the model to effectively\ncapture evolving spatial dependencies throughout the detection pipeline.\nExtensive experiments on COCO 2017 dataset demonstrate that our method improves\nboth convergence speed and detection accuracy compared to standard\nself-attention module. The proposed method achieves competitive results,\nreaching 52.3\\% AP with 12 epochs and 52.5\\% AP with 24 epochs using ResNet-50\nbackbone, and further improving to 58.0\\% AP with Swin-L backbone. Furthermore,\nour analysis reveals an interesting pattern: the model naturally learns to\nprioritize local spatial relations in early decoder layers while gradually\nshifting attention to broader contexts in deeper layers, providing valuable\ninsights for future research in object detection.",
"pdf_url": "http://arxiv.org/pdf/2502.05147v1",
"entry_id": "http://arxiv.org/abs/2502.05147v1",
"categories": [
"cs.CV",
"cs.AI"
]
}

papers/2502_05151v1.json Normal file

@ -0,0 +1,28 @@
{
"title": "Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation",
"authors": [
"Steffen Eger",
"Yong Cao",
"Jennifer D'Souza",
"Andreas Geiger",
"Christian Greisinger",
"Stephanie Gross",
"Yufang Hou",
"Brigitte Krenn",
"Anne Lauscher",
"Yizhi Li",
"Chenghua Lin",
"Nafise Sadat Moosavi",
"Wei Zhao",
"Tristan Miller"
],
"abstract": "With the advent of large multimodal language models, science is now at a\nthreshold of an AI-based technological transformation. Recently, a plethora of\nnew AI models and tools has been proposed, promising to empower researchers and\nacademics worldwide to conduct their research more effectively and efficiently.\nThis includes all aspects of the research cycle, especially (1) searching for\nrelevant literature; (2) generating research ideas and conducting\nexperimentation; generating (3) text-based and (4) multimodal content (e.g.,\nscientific figures and diagrams); and (5) AI-based automatic peer review. In\nthis survey, we provide an in-depth overview over these exciting recent\ndevelopments, which promise to fundamentally alter the scientific research\nprocess for good. Our survey covers the five aspects outlined above, indicating\nrelevant datasets, methods and results (including evaluation) as well as\nlimitations and scope for future research. Ethical concerns regarding\nshortcomings of these tools and potential for misuse (fake science, plagiarism,\nharms to research integrity) take a particularly prominent place in our\ndiscussion. We hope that our survey will not only become a reference guide for\nnewcomers to the field but also a catalyst for new AI-based initiatives in the\narea of \"AI4Science\".",
"pdf_url": "http://arxiv.org/pdf/2502.05151v1",
"entry_id": "http://arxiv.org/abs/2502.05151v1",
"categories": [
"cs.CL",
"cs.AI",
"cs.CV",
"cs.LG"
]
}

papers/2502_05172v1.json Normal file

@ -0,0 +1,24 @@
{
"title": "Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient",
"authors": [
"Jan Ludziejewski",
"Maciej Pióro",
"Jakub Krajewski",
"Maciej Stefaniak",
"Michał Krutul",
"Jan Małaśnicki",
"Marek Cygan",
"Piotr Sankowski",
"Kamil Adamczewski",
"Piotr Miłoś",
"Sebastian Jaszczur"
],
"abstract": "Mixture of Experts (MoE) architectures have significantly increased\ncomputational efficiency in both research and real-world applications of\nlarge-scale machine learning models. However, their scalability and efficiency\nunder memory constraints remain relatively underexplored. In this work, we\npresent joint scaling laws for dense and MoE models, incorporating key factors\nsuch as the number of active parameters, dataset size, and the number of\nexperts. Our findings provide a principled framework for selecting the optimal\nMoE configuration under fixed memory and compute budgets. Surprisingly, we show\nthat MoE models can be more memory-efficient than dense models, contradicting\nconventional wisdom. To derive and validate the theoretical predictions of our\nscaling laws, we conduct over 280 experiments with up to 2.7B active parameters\nand up to 5B total parameters. These results offer actionable insights for\ndesigning and deploying MoE models in practical large-scale training scenarios.",
"pdf_url": "http://arxiv.org/pdf/2502.05172v1",
"entry_id": "http://arxiv.org/abs/2502.05172v1",
"categories": [
"cs.LG",
"cs.AI",
"cs.CL"
]
}

papers/2502_05174v1.json Normal file

@ -0,0 +1,17 @@
{
"title": "MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison",
"authors": [
"Kaijie Zhu",
"Xianjun Yang",
"Jindong Wang",
"Wenbo Guo",
"William Yang Wang"
],
"abstract": "Recent research has explored that LLM agents are vulnerable to indirect\nprompt injection (IPI) attacks, where malicious tasks embedded in\ntool-retrieved information can redirect the agent to take unauthorized actions.\nExisting defenses against IPI have significant limitations: either require\nessential model training resources, lack effectiveness against sophisticated\nattacks, or harm the normal utilities. We present MELON (Masked re-Execution\nand TooL comparisON), a novel IPI defense. Our approach builds on the\nobservation that under a successful attack, the agent's next action becomes\nless dependent on user tasks and more on malicious tasks. Following this, we\ndesign MELON to detect attacks by re-executing the agent's trajectory with a\nmasked user prompt modified through a masking function. We identify an attack\nif the actions generated in the original and masked executions are similar. We\nalso include three key designs to reduce the potential false positives and\nfalse negatives. Extensive evaluation on the IPI benchmark AgentDojo\ndemonstrates that MELON outperforms SOTA defenses in both attack prevention and\nutility preservation. Moreover, we show that combining MELON with a SOTA prompt\naugmentation defense (denoted as MELON-Aug) further improves its performance.\nWe also conduct a detailed ablation study to validate our key designs.",
"pdf_url": "http://arxiv.org/pdf/2502.05174v1",
"entry_id": "http://arxiv.org/abs/2502.05174v1",
"categories": [
"cs.CR",
"cs.AI"
]
}


@ -22,13 +22,16 @@ def setup_logging(log_dir: str = "logs"):
         },
         'simple': {
             'format': '%(levelname)s: %(message)s'
+        },
+        'error': {
+            'format': '%(levelname)s: %(message)s\n%(exc_info)s'
         }
     },
     'handlers': {
         'console': {
             'class': 'logging.StreamHandler',
-            'level': 'WARNING',  # Only show warnings and above in console
-            'formatter': 'simple',
+            'level': 'INFO',  # Show info and above in console
+            'formatter': 'error',
             'stream': 'ext://sys.stdout'
         },
         'debug_file': {
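
As a standalone illustration of the `dictConfig` pattern being tweaked here, a minimal sketch with a console and a file handler (handler and formatter names are illustrative, not the project's exact configuration). One caveat: `%(exc_info)s` in a format string renders the raw exception tuple (or `None`), so tracebacks are normally attached via `logger.exception(...)` rather than the formatter.

```python
import logging
import logging.config
import os

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(levelname)s: %(message)s"},
        "detailed": {"format": "%(asctime)s %(name)s %(levelname)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",  # show INFO and above on stdout
            "formatter": "simple",
            "stream": "ext://sys.stdout",
        },
        "debug_file": {
            "class": "logging.FileHandler",
            "level": "DEBUG",
            "formatter": "detailed",
            "filename": "logs/debug.log",
        },
    },
    "root": {"level": "DEBUG", "handlers": ["console", "debug_file"]},
}

if __name__ == "__main__":
    os.makedirs("logs", exist_ok=True)  # FileHandler needs the directory to exist
    logging.config.dictConfig(LOGGING)
    logging.getLogger(__name__).info("logging configured")
```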


@ -1,11 +1,21 @@
"""Client for interacting with the arXiv API."""
import arxiv import arxiv
from typing import List, Dict, Any from typing import List, Dict, Any, Optional
from tenacity import retry, stop_after_attempt, wait_exponential from tenacity import retry, stop_after_attempt, wait_exponential
import aiohttp import aiohttp
from datetime import datetime, timedelta
import logging
logger = logging.getLogger(__name__)
class ArxivClient: class ArxivClient:
"""
Client for interacting with the arXiv API.
Documentation: https://info.arxiv.org/help/api/user-manual.html
"""
def __init__(self): def __init__(self):
self.client = arxiv.Client() """Initialize the ArxivClient."""
self._session = None self._session = None
async def _get_session(self) -> aiohttp.ClientSession: async def _get_session(self) -> aiohttp.ClientSession:
@ -15,42 +25,116 @@ class ArxivClient:
return self._session return self._session
async def close(self): async def close(self):
"""Close the session.""" """Close any open resources."""
if self._session is not None: if self._session:
await self._session.close() await self._session.close()
self._session = None self._session = None
async def __aenter__(self): async def __aenter__(self):
"""Async context manager entry.""" """Async context manager entry."""
await self._get_session()
return self return self
async def __aexit__(self, exc_type, exc_val, exc_tb): async def __aexit__(self, exc_type, exc_val, exc_tb):
"""Async context manager exit.""" """Async context manager exit."""
await self.close() await self.close()
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10)) async def fetch_papers(self, query: str = None, category: str = None,
async def fetch_papers(self, query: str, max_results: int = 10) -> List[Dict[Any, Any]]: start_date = None, end_date = None, max_results: int = 100) -> List[Dict[str, Any]]:
"""Fetch papers from arXiv based on query.""" """
Fetch papers from arXiv based on query and/or category.
Args:
query (str, optional): Search query
category (str, optional): arXiv category (e.g., 'cs.AI')
start_date: Start date for paper search (naive datetime)
end_date: End date for paper search (naive datetime)
max_results (int): Maximum number of results to return
Returns:
List[Dict[str, Any]]: List of paper metadata
"""
# Build the search query according to arXiv API syntax
search_terms = []
# Add category constraint if specified
if category:
search_terms.append(f"cat:{category}")
# Add custom query if specified
if query:
formatted_query = query.replace(' AND ', ' AND ').replace(' OR ', ' OR ').replace(' NOT ', ' NOT ')
search_terms.append(f"({formatted_query})")
# Add date range constraint if specified
if start_date and end_date:
date_range = (
f"submittedDate:[{start_date.strftime('%Y%m%d')}0000 TO "
f"{end_date.strftime('%Y%m%d')}2359]"
)
search_terms.append(date_range)
logger.info(f"Searching for papers between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}")
# Combine search terms
final_query = ' AND '.join(f'({term})' for term in search_terms) if search_terms else 'all:*'
# Configure and execute search
search = arxiv.Search( search = arxiv.Search(
query=query, query=final_query,
max_results=max_results, max_results=max_results,
sort_by=arxiv.SortCriterion.SubmittedDate sort_by=arxiv.SortCriterion.SubmittedDate,
sort_order=arxiv.SortOrder.Descending
) )
results = [] papers = []
# Use list() to get all results at once instead of async iteration try:
papers = list(self.client.results(search)) client = arxiv.Client()
for paper in papers: @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
results.append({ def fetch_results():
'id': paper.entry_id.split('/')[-1], # Get the ID without the full URL return list(client.results(search))
'title': paper.title,
'authors': [author.name for author in paper.authors], results = fetch_results()
'summary': paper.summary,
'pdf_url': paper.pdf_url, for result in results:
'published': paper.published.isoformat() if paper.published else None, papers.append({
'updated': paper.updated.isoformat() if paper.updated else None, "title": result.title,
'categories': paper.categories "authors": [author.name for author in result.authors],
"abstract": result.summary,
"pdf_url": result.pdf_url,
"entry_id": result.entry_id,
"categories": result.categories
}) })
return results logger.info(f"Successfully fetched {len(papers)} papers")
return papers
except Exception as e:
logger.error(f"Error fetching papers: {e}")
raise
async def get_paper_by_id(self, paper_id: str) -> Optional[Dict[str, Any]]:
"""Fetch a single paper by arXiv ID."""
try:
# Extract arXiv ID from URL format
arxiv_id = paper_id.split('/')[-1].replace('v', '') # Remove version suffix
search = arxiv.Search(id_list=[arxiv_id])
client = arxiv.Client()
results = list(client.results(search))
if not results:
return None
result = results[0]
return {
"entry_id": result.entry_id,
"title": result.title,
"authors": [author.name for author in result.authors],
"summary": result.summary,
"pdf_url": result.pdf_url,
"categories": result.categories
}
except Exception as e:
logger.error(f"Failed to fetch paper {paper_id}: {e}")
return None
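
Boiled down, the query construction above joins optional `cat:`, free-text, and `submittedDate:` terms with `AND` and hands the result to the `arxiv` package. A minimal sketch of that idea, assuming the `arxiv` PyPI package (2.x) whose `Client`/`Search`/`SortCriterion` API the class wraps; the helper name is illustrative:

```python
from datetime import datetime, timedelta
from typing import Optional

import arxiv

def build_query(category: Optional[str] = None, query: Optional[str] = None,
                start: Optional[datetime] = None, end: Optional[datetime] = None) -> str:
    """Combine optional category, free-text, and date-range terms with AND."""
    terms = []
    if category:
        terms.append(f"cat:{category}")
    if query:
        terms.append(f"({query})")
    if start and end:
        terms.append(f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]")
    return " AND ".join(f"({t})" for t in terms) if terms else "all:*"

if __name__ == "__main__":
    end = datetime.now()
    start = end - timedelta(days=7)
    search = arxiv.Search(
        query=build_query(category="cs.AI", start=start, end=end),
        max_results=5,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending,
    )
    for result in arxiv.Client().results(search):
        print(result.entry_id, result.title)
```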


@ -2,26 +2,32 @@ import asyncio
import signal import signal
import logging import logging
import sys import sys
from src.agent_controller import AgentController import os
import json
from datetime import datetime, timedelta
from typing import List, Optional
from zoneinfo import ZoneInfo
from src.utils.agent_controller import AgentController
from src.data_acquisition.arxiv_client import ArxivClient
from src.utils.debug import tracker from src.utils.debug import tracker
from src.config.logging_config import setup_logging from src.config.logging_config import setup_logging
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def handle_sigint(): def handle_sigint(signum, frame):
"""Handle interrupt signal.""" """Handle interrupt signal."""
logger.info("Received interrupt signal, canceling tasks...") logger.info("Received interrupt signal, exiting...")
for task in asyncio.all_tasks(): sys.exit(0)
task.cancel()
async def cleanup_resources(loop: asyncio.AbstractEventLoop): async def cleanup_resources():
"""Clean up any remaining resources.""" """Clean up any remaining resources."""
logger.debug("Starting resource cleanup...") logger.debug("Starting resource cleanup...")
tracker.print_active_resources() tracker.print_active_resources()
try: try:
# Cancel all tasks # Cancel all tasks except current
pending = asyncio.all_tasks(loop) pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
if pending: if pending:
logger.debug(f"Cancelling {len(pending)} pending tasks...") logger.debug(f"Cancelling {len(pending)} pending tasks...")
for task in pending: for task in pending:
@ -29,107 +35,168 @@ async def cleanup_resources(loop: asyncio.AbstractEventLoop):
task.cancel() task.cancel()
# Wait for tasks to complete with timeout # Wait for tasks to complete with timeout
try:
logger.debug("Waiting for tasks to complete...") logger.debug("Waiting for tasks to complete...")
await asyncio.wait(pending, timeout=5) await asyncio.wait(pending, timeout=5)
except asyncio.CancelledError:
logger.debug("Task wait cancelled")
# Force close any remaining tasks logger.debug("Cleanup completed successfully")
for task in pending: except Exception as e:
if not task.done(): logger.error(f"Error during cleanup: {e}")
logger.warning(f"Force closing task: {task.get_name()}")
async def fetch_papers(days: int = 7, categories: Optional[List[str]] = None) -> None:
"""Fetch and analyze papers from arXiv."""
logger.info(f"Fetching papers from the last {days} days")
if categories is None:
categories = ["cs.AI"]
async with ArxivClient() as client, AgentController() as agent:
try: try:
task.close() # Calculate date range
except Exception as e: end_date = datetime.now()
logger.error(f"Error closing task: {str(e)}") start_date = end_date - timedelta(days=days)
logger.info(f"Date range: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
# Fetch and analyze papers for each category
for category in categories:
logger.info(f"Fetching papers for category: {category}")
papers = await client.fetch_papers(category=category,
start_date=start_date,
end_date=end_date)
if not papers:
print(f"\nNo recent papers found in {category}")
continue
print(f"\nProcessing papers in {category}:")
print("=" * 80)
print()
for i, paper in enumerate(papers, 1):
print(f"Processing paper {i}/{len(papers)}: {paper['title']}")
try:
# Analyze paper
analysis = await agent.analyze_paper(paper)
# Print analysis
print("\nAnalysis:")
print(f"Summary: {analysis.get('summary', 'No summary available')}")
print("\nTechnical Concepts:")
print(analysis.get('technical_concepts', 'No technical concepts available'))
# Print fluff analysis
fluff = analysis.get('fluff', {})
score = fluff.get('score')
if score is not None:
color = '\033[92m' if score < 30 else '\033[93m' if score < 70 else '\033[91m'
reset = '\033[0m'
print(f"\nFluff Score: {color}{score}/100{reset}")
print("Analysis:")
print(fluff.get('explanation', 'No explanation available'))
except Exception as e: except Exception as e:
logger.error(f"Error during cleanup: {str(e)}") logger.error(f"Error processing paper: {e}")
finally: print("Failed to analyze paper")
logger.debug("Resource cleanup completed")
print("-" * 80)
print()
except Exception as e:
logger.error(f"Error fetching papers: {e}")
raise
async def process_query(query: str) -> None:
"""Process a search query and display results."""
async with AgentController() as agent:
try:
results = await agent.process_query(query)
if not results:
print("\nNo matching papers found.")
return
print(f"\nFound {len(results)} matching papers:")
print("=" * 80)
for i, paper in enumerate(results, 1):
print(f"\n{i}. {paper['title']}")
print(f" Authors: {', '.join(paper['authors'])}")
analysis = paper.get('analysis', {})
print("\nSummary:")
print(analysis.get('summary', 'No summary available'))
print("\nTechnical Concepts:")
print(analysis.get('technical_concepts', 'No technical concepts available'))
fluff = analysis.get('fluff', {})
score = fluff.get('score')
if score is not None:
color = '\033[92m' if score < 30 else '\033[93m' if score < 70 else '\033[91m'
reset = '\033[0m'
print(f"\nFluff Score: {color}{score}/100{reset}")
print("Analysis:")
print(fluff.get('explanation', 'No explanation available'))
print("-" * 80)
except Exception as e:
logger.error(f"Error processing query: {e}")
raise
async def main(): async def main():
"""Main application entry point.""" """Main application entry point."""
try: parser = argparse.ArgumentParser(description='Fetch and analyze arXiv papers')
logger.debug("Starting main application...") subparsers = parser.add_subparsers(dest='command', help='Command to run')
async with AgentController() as agent:
query = "Large Language Models recent advances"
logger.debug(f"Processing query: {query}")
try: # Fetch papers command
# Process query and get both new and existing papers fetch_parser = subparsers.add_parser('fetch', help='Fetch recent papers')
results = await agent.process_query(query) fetch_parser.add_argument('--days', type=int, default=7,
help='Number of days to look back')
fetch_parser.add_argument('--categories', nargs='+', default=['cs.AI'],
help='arXiv categories to fetch')
if results: # Search papers command
print("\n=== Analysis Results ===") search_parser = subparsers.add_parser('search', help='Search papers')
for paper in results: search_parser.add_argument('query', help='Search query')
print(f"\nTitle: {paper['title']}")
print(f"Authors: {paper.get('authors', 'Unknown')}")
if 'analysis' in paper: args = parser.parse_args()
# This is a newly processed paper
print("Status: Newly Processed") if args.command == 'fetch':
print("\nSummary:", paper['analysis'].get('summary', 'No summary available')) await fetch_papers(days=args.days, categories=args.categories)
print("\nTechnical Concepts:", paper['analysis'].get('technical_concepts', 'No concepts available')) elif args.command == 'search':
await process_query(args.query)
else: else:
# This is an existing paper from the database parser.print_help()
print("Status: Retrieved from Database")
print("\nRelevant Text:", paper.get('relevant_text', 'No text available')[:500] + "...")
else:
print("\nNo papers found for the query.")
except Exception as e:
logger.error(f"Error in main: {str(e)}")
raise
finally:
logger.debug("Main execution completed")
def run_main(): def run_main():
"""Run the main application.""" """Run the main application."""
# Set up logging # Set up logging
setup_logging() setup_logging()
# Create new event loop # Set up signal handlers for Windows
logger.debug("Creating new event loop...") signal.signal(signal.SIGINT, handle_sigint)
loop = asyncio.new_event_loop() signal.signal(signal.SIGTERM, handle_sigint)
asyncio.set_event_loop(loop)
tracker.track_loop(loop)
# Set up signal handlers
signals = (signal.SIGTERM, signal.SIGINT)
for s in signals:
loop.add_signal_handler(
s, lambda s=s: handle_sigint()
)
try: try:
logger.debug("Running main function...") # Run main application
loop.run_until_complete(main()) asyncio.run(main())
except asyncio.CancelledError: except KeyboardInterrupt:
logger.info("Main execution cancelled") logger.info("Application cancelled by user")
except Exception as e: except Exception as e:
logger.error(f"Error in run_main: {str(e)}") logger.error(f"Application error: {e}")
raise sys.exit(1)
finally: finally:
# Clean up
try: try:
# Run cleanup asyncio.run(cleanup_resources())
logger.debug("Running final cleanup...")
loop.run_until_complete(cleanup_resources(loop))
# Close the loop
logger.debug("Closing event loop...")
loop.run_until_complete(loop.shutdown_asyncgens())
loop.run_until_complete(loop.shutdown_default_executor())
# Remove the event loop
asyncio.set_event_loop(None)
loop.close()
tracker.untrack_loop(loop)
logger.debug("Cleanup completed successfully")
except Exception as e: except Exception as e:
logger.error(f"Error during final cleanup: {str(e)}") logger.error(f"Error during cleanup: {e}")
finally:
# Force exit to ensure all resources are released
sys.exit(0)
if __name__ == "__main__": if __name__ == "__main__":
import argparse
run_main() run_main()
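
The new command-line surface is a standard `argparse` sub-command dispatch wrapped in `asyncio.run`. A stripped-down sketch of that shape with placeholder handlers (the real ones call `ArxivClient` and `AgentController`); note the sketch imports `argparse` at module level, which the diff above only shows inside the `__main__` guard:

```python
import argparse
import asyncio
from typing import List

async def fetch_papers(days: int, categories: List[str]) -> None:
    print(f"would fetch {categories} from the last {days} days")  # placeholder

async def process_query(query: str) -> None:
    print(f"would search stored papers for: {query}")  # placeholder

async def main() -> None:
    parser = argparse.ArgumentParser(description="Fetch and analyze arXiv papers")
    sub = parser.add_subparsers(dest="command", help="Command to run")

    fetch = sub.add_parser("fetch", help="Fetch recent papers")
    fetch.add_argument("--days", type=int, default=7, help="Number of days to look back")
    fetch.add_argument("--categories", nargs="+", default=["cs.AI"], help="arXiv categories")

    search = sub.add_parser("search", help="Search stored papers")
    search.add_argument("query", help="Search query")

    args = parser.parse_args()
    if args.command == "fetch":
        await fetch_papers(args.days, args.categories)
    elif args.command == "search":
        await process_query(args.query)
    else:
        parser.print_help()

if __name__ == "__main__":
    asyncio.run(main())
```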


@ -79,20 +79,29 @@ class PaperFetcher:
logger.info(f"Fetching papers from categories: {categories}") logger.info(f"Fetching papers from categories: {categories}")
# Initialize clients # Initialize agent
arxiv_client = ArxivClient()
agent = AgentController() agent = AgentController()
try: try:
await agent.initialize() await agent.initialize()
# Calculate date range # Calculate date range (use a minimum of 7 days if days is too small)
end_date = datetime.now() end_date = datetime.now()
days = max(days, 7) # Ensure we look at least 7 days back
start_date = end_date - timedelta(days=days) start_date = end_date - timedelta(days=days)
logger.info(f"Searching for papers from {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
# Create output directory if it doesn't exist
output_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(__file__))), "data", "papers")
os.makedirs(output_dir, exist_ok=True)
# Create and use ArxivClient as async context manager
async with ArxivClient() as arxiv_client:
# Fetch and process papers for each category # Fetch and process papers for each category
for category in categories: for category in categories:
logger.info(f"Processing category: {category}") logger.info(f"Processing category: {category}")
try:
papers = await arxiv_client.fetch_papers( papers = await arxiv_client.fetch_papers(
category=category, category=category,
start_date=start_date, start_date=start_date,
@ -100,13 +109,47 @@ class PaperFetcher:
) )
if not papers: if not papers:
logger.info(f"No new papers found for category {category}") logger.info(f"No papers found for category {category}")
continue continue
logger.info(f"Found {len(papers)} papers in {category}") logger.info(f"Found {len(papers)} papers in {category}")
for paper in papers:
await agent.process_paper(paper)
# Save papers to JSON file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = os.path.join(output_dir, f"papers_{category.replace('.', '_')}_{timestamp}.json")
with open(output_file, 'w', encoding='utf-8') as f:
json.dump(papers, f, indent=2, ensure_ascii=False)
logger.info(f"Saved papers to {output_file}")
# Display paper information
print(f"\nRecent papers in {category}:")
print("=" * 80)
for i, paper in enumerate(papers, 1):
print(f"\n{i}. {paper['title']}")
print(f" Authors: {', '.join(paper['authors'])}")
print(f" Published: {paper['published']}")
print(f" URL: {paper['pdf_url']}")
print(f" Categories: {', '.join(paper['categories'])}")
if paper.get('doi'):
print(f" DOI: {paper['doi']}")
print("-" * 80)
for paper in papers:
try:
analysis = await agent.analyze_paper(paper)
# TODO: Store analysis results
except Exception as e:
logger.error(f"Failed to analyze paper {paper.get('title', 'Unknown')}: {e}")
continue
except Exception as e:
logger.error(f"Failed to fetch papers for category {category}: {e}")
continue
except Exception as e:
logger.error(f"Error in fetch_papers: {e}")
finally: finally:
await agent.close() await agent.close()
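
The persistence step above reduces to dumping each category's results into a timestamped JSON file. A short sketch of just that piece, with an illustrative output directory:

```python
import json
import os
from datetime import datetime
from typing import Dict, List

def save_papers(papers: List[Dict], category: str, output_dir: str = "data/papers") -> str:
    """Write fetched paper metadata to a timestamped JSON file and return its path."""
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"papers_{category.replace('.', '_')}_{timestamp}.json"
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(papers, f, indent=2, ensure_ascii=False)
    return path

# Example:
# save_papers([{"title": "Joint MoE Scaling Laws", "authors": ["..."]}], "cs.LG")
```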


@ -19,6 +19,8 @@ CREATE TABLE IF NOT EXISTS papers (
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP, created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
); );
CREATE UNIQUE INDEX IF NOT EXISTS papers_id_idx ON papers (id);
""" """
class PaperStore: class PaperStore:
@ -59,17 +61,16 @@ class PaperStore:
if self.pool: if self.pool:
await self.pool.close() await self.pool.close()
async def store_paper(self, paper: Dict) -> None: async def store_paper(self, metadata: Dict) -> None:
"""Store a paper and its analysis in the database.""" """Store paper metadata in the database."""
if not self.pool: if not self.pool:
await self.initialize() await self.initialize()
async with self.pool.acquire() as conn: try:
# Extract analysis if present logger.info(f"Storing paper metadata: {metadata.get('title')} (ID: {metadata.get('id')})")
analysis = paper.get('analysis', {})
fluff_analysis = analysis.get('fluff', {})
# Store paper async with self.pool.acquire() as conn:
# Store paper metadata
await conn.execute(''' await conn.execute('''
INSERT INTO papers ( INSERT INTO papers (
id, title, authors, summary, technical_concepts, id, title, authors, summary, technical_concepts,
@ -86,45 +87,52 @@ class PaperStore:
pdf_url = EXCLUDED.pdf_url, pdf_url = EXCLUDED.pdf_url,
updated_at = CURRENT_TIMESTAMP updated_at = CURRENT_TIMESTAMP
''', ''',
paper['id'], metadata['id'],
paper['title'], metadata['title'],
paper['authors'], metadata['authors'],
analysis.get('summary'), metadata.get('summary'),
analysis.get('technical_concepts'), metadata.get('technical_concepts'),
fluff_analysis.get('score'), metadata.get('fluff_score'),
fluff_analysis.get('explanation'), metadata.get('fluff_explanation'),
paper.get('pdf_url')) metadata.get('pdf_url'))
logger.info(f"Successfully stored paper metadata {metadata.get('id')}")
except Exception as e:
logger.error(f"Error storing paper metadata {metadata.get('id')}: {str(e)}")
raise
async def get_paper(self, paper_id: str) -> Optional[Dict]: async def get_paper(self, paper_id: str) -> Optional[Dict]:
"""Retrieve a paper by its ID.""" """Retrieve paper metadata by its ID."""
if not self.pool: if not self.pool:
await self.initialize() await self.initialize()
async with self.pool.acquire() as conn: async with self.pool.acquire() as conn:
# Get paper metadata
row = await conn.fetchrow(''' row = await conn.fetchrow('''
SELECT * FROM papers WHERE id = $1 SELECT id, title, authors, summary, technical_concepts,
fluff_score, fluff_explanation, pdf_url,
created_at, updated_at
FROM papers
WHERE id = $1
''', paper_id) ''', paper_id)
if row: if row:
return dict(row) return dict(row)
return None return None
async def get_all_paper_ids(self) -> List[str]: async def get_all_papers(self) -> List[Dict]:
"""Get all paper IDs in the database.""" """Retrieve all papers ordered by creation date."""
if not self.pool:
await self.initialize()
async with self.pool.acquire() as conn:
rows = await conn.fetch('SELECT id FROM papers')
return [row['id'] for row in rows]
async def get_papers_by_ids(self, paper_ids: List[str]) -> List[Dict]:
"""Retrieve multiple papers by their IDs."""
if not self.pool: if not self.pool:
await self.initialize() await self.initialize()
async with self.pool.acquire() as conn: async with self.pool.acquire() as conn:
# Get all papers
rows = await conn.fetch(''' rows = await conn.fetch('''
SELECT * FROM papers WHERE id = ANY($1) SELECT id, title, authors, summary, technical_concepts,
''', paper_ids) fluff_score, fluff_explanation, pdf_url,
created_at, updated_at
FROM papers
ORDER BY created_at DESC
''')
return [dict(row) for row in rows] return [dict(row) for row in rows]
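
Under the hood this storage layer is an `asyncpg` connection pool plus an idempotent upsert (`INSERT ... ON CONFLICT ... DO UPDATE`). A reduced sketch of that pattern with only two columns and a placeholder DSN (the real table also carries the summary, technical-concepts, and fluff columns shown above):

```python
import asyncio
import asyncpg

SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
"""

UPSERT = """
INSERT INTO papers (id, title) VALUES ($1, $2)
ON CONFLICT (id) DO UPDATE
    SET title = EXCLUDED.title,
        updated_at = CURRENT_TIMESTAMP;
"""

async def main() -> None:
    # DSN is illustrative; in the project it comes from the DB_* / DATABASE_URL settings.
    pool = await asyncpg.create_pool("postgresql://paper_analysis:xyz@localhost:5432/paper_analysis")
    try:
        async with pool.acquire() as conn:
            await conn.execute(SCHEMA)
            await conn.execute(UPSERT, "http://arxiv.org/abs/2502.05110v1",
                               "ApplE: An Applied Ethics Ontology with Event Context")
            row = await conn.fetchrow("SELECT id, title, updated_at FROM papers WHERE id = $1",
                                      "http://arxiv.org/abs/2502.05110v1")
            print(dict(row))
    finally:
        await pool.close()

if __name__ == "__main__":
    asyncio.run(main())
```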


@ -1,6 +1,9 @@
 from typing import List, Dict
 import chromadb
 from chromadb.config import Settings
+import logging
+
+logger = logging.getLogger(__name__)
 
 class VectorStore:
     def __init__(self, persist_directory: str = "chroma_db"):
@ -15,6 +18,11 @@ class VectorStore:
             metadata={"hnsw:space": "cosine"}  # Use cosine similarity
         )
 
+    async def close(self):
+        """Close the vector store client."""
+        # ChromaDB doesn't need explicit cleanup
+        pass
+
     def paper_exists(self, paper_id: str) -> bool:
         """Check if a paper already exists in the vector store."""
         try:
@ -61,6 +69,14 @@ class VectorStore:
             ids=ids
         )
 
+    def delete_paper(self, paper_id: str) -> None:
+        """Delete all chunks for a paper."""
+        try:
+            self.collection.delete(where={"paper_id": paper_id})
+            logger.info(f"Deleted vector store entries for paper {paper_id}")
+        except Exception as e:
+            logger.error(f"Failed to delete paper {paper_id}: {e}")
+
     def query_similar(self, query: str, n_results: int = 5) -> Dict:
         """Query for similar chunks."""
         try:
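
For context, the Chroma calls used above (cosine-similarity collection, add chunks, query, delete by `paper_id`) look roughly like this when driven directly. This is a sketch assuming chromadb 0.4+ with `PersistentClient`; the repo itself wires the client up through `chromadb.config.Settings`, and the collection name here is illustrative:

```python
import chromadb

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(
    name="papers",                      # illustrative name
    metadata={"hnsw:space": "cosine"},  # use cosine similarity
)

paper_id = "http://arxiv.org/abs/2502.05110v1"
collection.add(
    documents=["Title: ApplE ... Abstract: ...", "Summary: an applied ethics ontology ..."],
    metadatas=[{"paper_id": paper_id, "type": "full_text"},
               {"paper_id": paper_id, "type": "summary"}],
    ids=["2502_05110v1_text", "2502_05110v1_summary"],
)

# Similarity search over the stored chunks.
results = collection.query(query_texts=["ontologies for applied ethics"], n_results=5)
print(results["ids"], results["distances"])

# Remove every chunk belonging to one paper (what delete_paper does above).
collection.delete(where={"paper_id": paper_id})
```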

src/utils/__init__.py Normal file

@ -0,0 +1 @@


@ -0,0 +1,234 @@
"""
Agent Controller for managing AI interactions and paper analysis.
"""
import os
import json
import logging
from typing import Dict, Any, Optional, List
from datetime import datetime
from pathlib import Path
from dotenv import load_dotenv
from src.analysis.llm_analyzer import LLMAnalyzer
from src.storage.paper_store import PaperStore
from src.storage.vector_store import VectorStore
logger = logging.getLogger(__name__)
class AgentController:
"""Controller class for managing AI agent operations."""
def __init__(self):
"""Initialize the agent controller."""
self.initialized = False
self.llm_analyzer = None
self.paper_store = None
self.vector_store = None
self.papers_dir = Path("papers")
async def __aenter__(self):
"""Async context manager entry."""
await self.initialize()
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
"""Async context manager exit."""
await self.close()
return None # Don't suppress exceptions
async def initialize(self):
"""Initialize the agent and prepare it for paper analysis."""
if self.initialized:
return
try:
# Load environment variables
load_dotenv()
# Get API keys from environment
deepseek_key = os.getenv('DEEPSEEK_API_KEY')
if not deepseek_key:
raise ValueError("DEEPSEEK_API_KEY environment variable is required")
# Create papers directory if it doesn't exist
self.papers_dir.mkdir(parents=True, exist_ok=True)
# Initialize components
self.llm_analyzer = LLMAnalyzer(api_key=deepseek_key, provider='deepseek')
self.paper_store = PaperStore()
self.vector_store = VectorStore()
# Initialize database
await self.paper_store.initialize()
self.initialized = True
logger.info("Agent controller initialized successfully")
except Exception as e:
logger.error(f"Failed to initialize agent controller: {e}")
raise
async def analyze_paper(self, paper_data: Dict[str, Any]) -> Dict[str, Any]:
"""
Analyze a research paper using AI capabilities.
Args:
paper_data (dict): Paper metadata and content to analyze
Returns:
dict: Analysis results including summary, technical concepts, and fluff analysis
"""
if not self.initialized:
await self.initialize()
try:
# Get paper text (combine title and abstract)
paper_text = f"Title: {paper_data.get('title', '')}\n\n"
if paper_data.get('abstract'):
paper_text += f"Abstract: {paper_data['abstract']}\n\n"
# Analyze paper using LLM
analysis = await self.llm_analyzer.analyze_paper(paper_text)
# Store paper in databases
paper_id = paper_data.get('entry_id') # Use entry_id from arXiv
if paper_id:
logger.debug(f"Checking PostgreSQL for paper {paper_id}")
existing = await self.paper_store.get_paper(paper_id)
if existing:
logger.info(f"Paper {paper_id} already in database - skipping")
return existing
logger.debug(f"Checking vector store for paper {paper_id}")
if self.vector_store.paper_exists(paper_id):
logger.warning(f"Found orphaned vector entry for {paper_id} - repairing")
await self._repair_orphaned_paper(paper_id, paper_data)
return {}
# Clean paper_id to use as filename
safe_id = paper_id.split('/')[-1].replace('.', '_')
# Save paper content to file
paper_path = self.papers_dir / f"{safe_id}.json"
with open(paper_path, 'w', encoding='utf-8') as f:
json.dump(paper_data, f, indent=2, ensure_ascii=False)
# Store metadata in PostgreSQL
metadata = {
'id': paper_id,
'title': paper_data['title'],
'authors': paper_data['authors'],
'summary': analysis.get('summary'),
'technical_concepts': analysis.get('technical_concepts'),
'fluff_score': analysis.get('fluff', {}).get('score'),
'fluff_explanation': analysis.get('fluff', {}).get('explanation'),
'pdf_url': paper_data.get('pdf_url')
}
await self.paper_store.store_paper(metadata)
# Store in vector database for similarity search
chunks = [
paper_text, # Store full text
analysis.get('summary', ''), # Store summary
analysis.get('technical_concepts', '') # Store technical concepts
]
chunk_metadata = [
{'paper_id': paper_id, 'type': 'full_text'},
{'paper_id': paper_id, 'type': 'summary'},
{'paper_id': paper_id, 'type': 'technical_concepts'}
]
chunk_ids = [
f"{safe_id}_text",
f"{safe_id}_summary",
f"{safe_id}_concepts"
]
self.vector_store.add_chunks(chunks, chunk_metadata, chunk_ids)
return analysis
except Exception as e:
logger.error(f"Failed to analyze paper: {e}")
raise
async def process_query(self, query: str) -> List[Dict[str, Any]]:
"""
Process a search query and return relevant papers.
Args:
query (str): Search query string
Returns:
list: List of relevant papers with their analysis
"""
if not self.initialized:
await self.initialize()
try:
# Get similar papers from vector store
results = self.vector_store.query_similar(query)
# Get unique paper IDs from results
paper_ids = set()
for metadata in results['metadatas']:
if metadata and 'paper_id' in metadata:
paper_ids.add(metadata['paper_id'])
# Get paper metadata from PostgreSQL
papers = []
for paper_id in paper_ids:
paper = await self.paper_store.get_paper(paper_id)
if paper:
# Load full paper data if needed
safe_id = paper_id.split('/')[-1].replace('.', '_')
paper_path = self.papers_dir / f"{safe_id}.json"
if paper_path.exists():
with open(paper_path, 'r', encoding='utf-8') as f:
full_paper = json.load(f)
paper['full_data'] = full_paper
papers.append(paper)
return papers
except Exception as e:
logger.error(f"Error processing query: {e}")
raise
async def close(self):
"""Clean up resources."""
if not self.initialized:
return
try:
if self.llm_analyzer:
await self.llm_analyzer.close()
if self.paper_store:
await self.paper_store.close()
if self.vector_store:
await self.vector_store.close()
self.initialized = False
logger.info("Agent controller closed successfully")
except Exception as e:
logger.error(f"Failed to close agent controller: {e}")
raise
async def _repair_orphaned_paper(self, paper_id: str, paper_data: Dict):
"""Repair a paper that exists in vector store but not PostgreSQL."""
try:
# Try to get fresh metadata from arXiv
fresh_data = await self.arxiv_client.get_paper_by_id(paper_id)
if fresh_data:
# Store in PostgreSQL with original analysis data
await self.paper_store.store_paper({
**fresh_data,
"technical_concepts": paper_data.get('technical_concepts'),
"fluff_score": paper_data.get('fluff_score'),
"fluff_explanation": paper_data.get('fluff_explanation')
})
logger.info(f"Repaired orphaned paper {paper_id}")
else:
# Paper no longer exists on arXiv - clean up vector store
self.vector_store.delete_paper(paper_id)
logger.warning(f"Deleted orphaned paper {paper_id} (not found on arXiv)")
except Exception as e:
logger.error(f"Failed to repair paper {paper_id}: {e}")