This repository has been archived on 2025-02-10. You can view files and clone it, but cannot push or open issues or pull requests.

AI Paper Review Agent Architecture

Overview

A modular system that fetches academic papers from arXiv, processes them, and summarizes them using LangChain.

Core Components

  1. Data Acquisition Layer

    • arXiv API Client (REST interface)
    • PDF Downloader Service
    • Metadata Extraction Module
  2. Processing Pipeline

    • PDF Text Extractor (PyPDF2/Unstructured)
    • Semantic Chunker
    • Metadata Enricher (author institutions, citations)
  3. Analysis Engine

    • LangChain Document Loaders
    • Multi-stage Summary Chain (DeepSeek-R1)
    • Technical Concept Extractor
    • Cross-Paper Insight Aggregator
  4. Storage Layer

    • Relational Storage (PostgreSQL)
    • Vector Store (Chroma)
    • Cache System (Redis)
  5. Orchestration

    • Agent Controller Class
    • Retry Mechanism with Exponential Backoff
    • Quality Assurance Checks
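
The retry mechanism listed under Orchestration could look roughly like the sketch below. It is a minimal illustration, not the project's actual controller code: the function name `retry_with_backoff`, its parameters, and the default delays are all assumptions made for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or max_attempts calls have failed.

    All names and defaults here are hypothetical. The wait before retry n
    (counting from 0) is base_delay * 2**n, capped at max_delay, plus a
    random extra of up to `jitter` times that value.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, jitter * delay)
            sleep(delay)
    raise RuntimeError("unreachable")
```

A caller would wrap any flaky operation in a zero-argument callable, for example `retry_with_backoff(lambda: download(url))`, where `download` stands for whatever the PDF Downloader Service exposes. The `sleep` parameter exists only so the waiting can be replaced in tests.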

Architectural Diagram

graph LR
    A[User Query] --> B(arXiv API)
    B --> C[PDF Storage]
    C --> D{Processing Queue}
    D --> E[Text Extraction]
    E --> F[Chunking]
    F --> G[Embedding]
    G --> H[Vector Store]
    H --> I[LLM Analysis]
    I --> J[Report Generation]
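
The flow in the diagram can be read as a sequence of stages that each paper passes through, with several papers in flight at once. The sketch below shows that shape with `asyncio`. It is an assumption-laden illustration: `Paper`, `extract_text`, `chunk`, `process_paper`, and `process_batch` are hypothetical names, the two stages are stubs, and the fixed-size split stands in for the Semantic Chunker. The embedding, vector store, and LLM stages would slot into the same loop.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record carried from stage to stage."""
    paper_id: str
    text: str = ""
    chunks: list[str] = field(default_factory=list)


async def extract_text(paper: Paper) -> Paper:
    # Stub: a real stage would parse the downloaded PDF here.
    await asyncio.sleep(0)
    paper.text = f"full text of {paper.paper_id}"
    return paper


async def chunk(paper: Paper, size: int = 10) -> Paper:
    # Stub: fixed-size split, standing in for semantic chunking.
    paper.chunks = [paper.text[i:i + size] for i in range(0, len(paper.text), size)]
    return paper


async def process_paper(paper: Paper, limit: asyncio.Semaphore) -> Paper:
    # The semaphore caps how many papers are being processed at once.
    async with limit:
        for stage in (extract_text, chunk):
            paper = await stage(paper)
        return paper


async def process_batch(paper_ids: list[str], max_concurrency: int = 4) -> list[Paper]:
    limit = asyncio.Semaphore(max_concurrency)
    tasks = [process_paper(Paper(pid), limit) for pid in paper_ids]
    # gather returns results in the same order as the input ids.
    return await asyncio.gather(*tasks)
```

Running `asyncio.run(process_batch(["paper-a", "paper-b"]))` returns one `Paper` per id, in input order. The ids are placeholders, not real arXiv identifiers.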

Key Decisions

  1. Modular Design: Components communicate through well-defined interfaces, so each one can be replaced independently
  2. Batch Processing: Asynchronous pipeline for parallel paper processing
  3. Caching Layer: Caches results to cut repeated API calls and lower latency
  4. Fallback Strategies: Multiple PDF parsers with automatic fallback
  5. Security: Credentials are read from environment variables; stored data is encrypted
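
The fallback strategy in decision 4 can be sketched as an ordered list of parsers tried one after another. This is one plausible shape, not the project's implementation: `extract_with_fallback`, `AllParsersFailed`, and the `Parser` alias are hypothetical names. In practice the callables would presumably wrap PyPDF2 and Unstructured; the function only assumes that each takes PDF bytes and returns text.

```python
from typing import Callable, Iterable, List, Tuple

# A parser takes raw PDF bytes and returns extracted text.
Parser = Callable[[bytes], str]


class AllParsersFailed(Exception):
    """Raised when no parser yields usable text (hypothetical name)."""


def extract_with_fallback(
    pdf_bytes: bytes,
    parsers: Iterable[Tuple[str, Parser]],
) -> Tuple[str, str]:
    """Try each (name, parser) pair in order.

    Returns (parser_name, text) for the first parser that produces
    non-blank text. A parser that raises or returns only whitespace is
    treated as a failure and the next one is tried.
    """
    errors: List[str] = []
    for name, parser in parsers:
        try:
            text = parser(pdf_bytes)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty output")
    raise AllParsersFailed("; ".join(errors))
```

Returning the parser name alongside the text lets the Quality Assurance Checks record which parser handled each paper. Treating blank output as a failure matters because a parser can return nothing for a scanned PDF without raising.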

Dependencies

langchain==0.2.1
arxiv==2.1.0
unstructured==0.12.2
openai==1.30.1
faiss-cpu==1.8.0
sqlalchemy==2.0.30