Note: This README contains all essential documentation. Previous documentation files have been consolidated here. Component-specific READMEs remain in their respective directories (chat/README.md, etc.). Archived components and old docs are in the archive/ directory.
The easiest way to test everything is to run the main pipeline:
cd /Users/sagar/projects/codechat
# Set environment variables (or use .env file)
export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"
export PINECONE_ENVIRONMENT="us-east-1-aws"
export PINECONE_INDEX_NAME="model-earth"
# Run the complete pipeline
python main.py
What this does: reads the repository list from config/repositories.yml, chunks and summarizes each file, generates embeddings, and stores them in Pinecone.
First-time setup:
pip install -r requirements.txt
cp config/.env.example .env
# Edit .env with your actual API keys:
# OPENAI_API_KEY=your_openai_key
# PINECONE_API_KEY=your_pinecone_key
# PINECONE_ENVIRONMENT=your_pinecone_env
Verify that all modules can be imported correctly:
# Test core modules
python -c "from src.core.simple_processor import SimpleCodeProcessor; print('✅ Simple processor import OK')"
python -c "from src.core.code_processor_main import Config; print('✅ Main processor import OK')"
python -c "from src.core.summarizer import CodeSummarizer; print('✅ Summarizer import OK')"
python -c "from src.core.embedding_generator import CodeEmbeddingGenerator; print('✅ Embedding generator import OK')"
# Test utility modules
python -c "from src.utils.evaluate import *; print('✅ Utils import OK')"
Test the basic file processing pipeline without external APIs:
# Run the simple processor (will use mock embeddings)
python -m src.core.simple_processor
# Expected output:
# 🔄 Processing 1 repositories...
# 📦 Processing: codechat
# ⚠️ Repository path repo_analysis_output/test_repo not found, using current workspace for testing
# 🔍 Found X files to process
# 📄 Processing: [filename]
# 💾 Would store chunk chunk_0 in Pinecone
#    Index: model-earth, Namespace: codechat-test
#    Embedding size: 1536
#    Metadata keys: ['repo_name', 'file_path', 'chunk_content', 'chunk_summary', 'chunk_id', 'language', 'timestamp']
# ✅ Completed: codechat
# 🎉 Processing complete!
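For reference, here is a minimal sketch of the record each chunk appears to produce, assembled from the metadata keys and embedding size shown in the output above (the build_record helper and field values are illustrative assumptions, not the project's actual storage code):

```python
from datetime import datetime, timezone

def build_record(repo_name: str, file_path: str, chunk_id: str,
                 content: str, summary: str, embedding: list[float]) -> dict:
    # Hypothetical helper: mirrors the metadata keys printed by the
    # simple processor; the real storage code may differ.
    return {
        "id": chunk_id,
        "values": embedding,  # 1536-dim vector, per the output above
        "metadata": {
            "repo_name": repo_name,
            "file_path": file_path,
            "chunk_content": content,
            "chunk_summary": summary,
            "chunk_id": chunk_id,
            "language": "python",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    }
```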
Test that configuration files are loaded correctly:
# Test repositories.yml loading
python -c "
from src.core.simple_processor import SimpleCodeProcessor
processor = SimpleCodeProcessor()
repos = processor.load_repositories()
print(f'✅ Loaded {len(repos)} repositories from config')
for repo in repos:
    print(f' - {repo.get(\"name\", \"unnamed\")}: {repo.get(\"url\", \"no url\")}')
"
Test individual components separately:
python -c "
from src.core.chunker.smart_chunker import SmartChunker
chunker = SmartChunker()
test_content = '''def hello():
    print('Hello World')

class TestClass:
    def method(self):
        return 'test'
'''
chunks = chunker.smart_chunk_file_from_content('test.py', test_content)
print(f'✅ Generated {len(chunks)} chunks')
for i, chunk in enumerate(chunks[:2]):  # Show first 2 chunks
    print(f' Chunk {i}: {len(chunk[\"content\"])} chars')
"
python -c "
import os
if os.getenv('OPENAI_API_KEY'):
    from src.core.summarizer import CodeSummarizer
    summarizer = CodeSummarizer(os.getenv('OPENAI_API_KEY'))
    test_code = 'def calculate_sum(a, b): return a + b'
    summary = summarizer.summarize_full_code(test_code, 'test.py')
    print('✅ Summarizer working')
    print(f'Summary: {summary.get(\"summary\", \"No summary\")[:100]}...')
else:
    print('⚠️ OPENAI_API_KEY not set - skipping summarizer test')
"
python -c "
import os
if os.getenv('OPENAI_API_KEY'):
    from src.core.embedding_generator import CodeEmbeddingGenerator
    generator = CodeEmbeddingGenerator(os.getenv('OPENAI_API_KEY'))
    test_text = 'def hello(): return \"world\"'
    embedding = generator.generate_embedding(test_text)
    print(f'✅ Embedding generated: {len(embedding)} dimensions')
else:
    print('⚠️ OPENAI_API_KEY not set - skipping embedding test')
"
Test the utility scripts in src/utils/:
python src/utils/run_chunker_demo.py
python src/utils/evaluate.py
python src/utils/test.py
Run a complete end-to-end test (requires API keys):
# Make sure your .env file has the correct API keys
export $(cat .env | xargs)
# Run the main processor
python -m src.core.code_processor_main
# Expected: Complete processing pipeline with real API calls
Test the Lambda functions locally:
# Test code processor Lambda
python -c "
from src.lambdas.code_processor.index import lambda_handler
event = {'repositories': [{'url': 'https://github.com/test/repo', 'name': 'test'}]}
result = lambda_handler(event, None)
print('✅ Lambda function executed')
print(f'Result: {result}')
"
If you want to test the web interface:
# Check if there are any web files
ls src/web/
# If there's an index.html, you can open it in a browser
# Or run a local server
python -m http.server 8000
# Then visit http://localhost:8000/src/web/index.html
# Check Python path
python -c "import sys; print('\n'.join(sys.path))"
# Add src to Python path manually
export PYTHONPATH=$PYTHONPATH:$(pwd)/src
# Check if config files exist
ls -la config/
# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/repositories.yml'))"
# Check environment variables
echo $OPENAI_API_KEY
echo $PINECONE_API_KEY
# Check that the OpenAI key is picked up (a real round-trip check follows below)
python -c "import os, openai; openai.api_key = os.getenv('OPENAI_API_KEY'); print('✅ OpenAI API key configured')"
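For an actual round-trip check, here is a minimal sketch using the openai v1 Python SDK (assumes openai>=1.0 is installed and OPENAI_API_KEY is set; listing models is a cheap authenticated call):

```python
import os
from openai import OpenAI

# Fails fast with a clear error if the key is missing or invalid.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
models = client.models.list()
print(f"✅ OpenAI API reachable ({len(models.data)} models visible)")
```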
| Test | Expected Result | Notes |
|---|---|---|
| Import Test | ✅ Success messages | All modules should import without errors |
| Simple Processor | ✅ File processing output | Should show chunking and mock storage |
| Config Loading | ✅ Repository list loaded | Should show configured repositories |
| Chunker Test | ✅ Chunks generated | Should create multiple chunks from test code |
| Summarizer Test | ✅ Summary generated | Requires OpenAI API key |
| Embedding Test | ✅ 1536-dim vector | Requires OpenAI API key |
| Utility Scripts | ✅ Script execution | May require additional setup |
| Full Integration | ✅ Complete pipeline | Requires all API keys |
| Lambda Tests | ✅ Function execution | Should handle events properly |
| Test Suite | ✅ Passing tests | May have some integration test failures |
Import errors? Solution: add src to the Python path:
export PYTHONPATH=$PYTHONPATH:$(pwd)/src
Config loading fails? Solution: check that config/repositories.yml exists and contains valid YAML.
API key errors? Solution: copy config/.env.example to .env and fill in your keys.
Permission denied? Solution: make sure the scripts are executable:
chmod +x src/utils/*.py
For a fast sanity check, run this one-liner:
cd /Users/sagar/projects/codechat && python -c "
from src.core.simple_processor import SimpleCodeProcessor
from src.core.code_processor_main import Config
print('✅ Core modules import successfully')
processor = SimpleCodeProcessor()
repos = processor.load_repositories()
print(f'✅ Configuration loaded: {len(repos)} repositories')
print('🚀 Ready for testing!')
"
This will verify that the restructuring worked correctly and the basic functionality is intact!
CodeChat uses a streamlined AWS serverless architecture focused on essential components:
Core Components:
- Lambda functions: query_handler (main API) and get_repositories (repository listing)
- API Gateway endpoints: /query and /repositories

Frontend (chat/)
    ↓ HTTP requests
API Gateway (/query, /repositories)
    ↓ Lambda invocations
Lambda Functions (query_handler, get_repositories)
    ↓ Dependencies
Lambda Layer (Python packages)
    ↓ Configuration
S3 Bucket (modelearth_repos.yml)
CodeChat implements an advanced multi-agent chunking system for optimal code understanding:
The system uses agentic components for enhanced search:
# Query Analysis Agent
class QueryAnalysisAgent:
    def analyze_query(self, query: str) -> QueryAnalysis:
        # Determines the query type and search strategy.
        # Returns: code_search, conceptual_search, debugging_help, etc.
        ...

# Repository Intelligence Agent
class RepositoryIntelligentSearchAgent:
    def search(self, query_analysis: QueryAnalysis, repo_context: str):
        # Executes a targeted search based on the query type.
        # Returns: relevant code chunks with explanations.
        ...
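A minimal, self-contained sketch of how these two agents might be wired together. The QueryAnalysis fields and the keyword heuristic below are illustrative assumptions, not the project's actual classification logic (which presumably uses an LLM):

```python
from dataclasses import dataclass

@dataclass
class QueryAnalysis:
    query: str
    query_type: str  # e.g. "code_search", "conceptual_search", "debugging_help"

class QueryAnalysisAgent:
    def analyze_query(self, query: str) -> QueryAnalysis:
        # Toy heuristic standing in for the real strategy selection.
        lowered = query.lower()
        if "error" in lowered or "bug" in lowered:
            qtype = "debugging_help"
        elif any(tok in lowered for tok in ("function", "class", "def")):
            qtype = "code_search"
        else:
            qtype = "conceptual_search"
        return QueryAnalysis(query=query, query_type=qtype)

class RepositoryIntelligentSearchAgent:
    def search(self, query_analysis: QueryAnalysis, repo_context: str) -> list[dict]:
        # Placeholder: a real implementation would embed the query and
        # run a vector search scoped to repo_context.
        return [{"repo": repo_context, "strategy": query_analysis.query_type}]

analysis = QueryAnalysisAgent().analyze_query("How does authentication work?")
results = RepositoryIntelligentSearchAgent().search(analysis, "modelearth/webroot")
print(analysis.query_type, results)
```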
# Set environment variables
export TF_VAR_openai_api_key="your-openai-key"
export TF_VAR_pinecone_api_key="your-pinecone-key"
# Deploy everything
./deploy-clean.sh
# 1. Build Lambda layers
cd backend/lambda_layers
pip3 install -r lambda_layer_query_handler_requirements.txt -t temp_layer/python/
(cd temp_layer && zip -r ../lambda-layer-query-handler.zip python/)  # python/ must sit at the zip root for a Lambda layer
# 2. Deploy infrastructure
cd ../infra
terraform init
terraform apply -var-file="terraform-clean.tfvars"
# 3. Configure frontend
API_URL=$(terraform output -raw api_gateway_url)
echo "window.CODECHAT_API_ENDPOINT = '$API_URL';" >> ../../chat/script.js
Repository configuration (config/modelearth_repos.yml):
repositories:
  - name: "modelearth/webroot"
    description: "Main website repository"
    priority: "high"
  - name: "modelearth/cloud"
    description: "Cloud infrastructure"
    priority: "medium"
# Required for deployment
export TF_VAR_openai_api_key="sk-..."
export TF_VAR_pinecone_api_key="..."
# Optional (have defaults)
export TF_VAR_aws_region="us-east-1"
export TF_VAR_pinecone_environment="us-east-1-aws"
export TF_VAR_pinecone_index="model-earth-jam-stack"
Archived Components (moved from active to archive):
Essential Components Kept:
Benefits Achieved:
/query
Submit a search query to the repository-intelligent search system. Example request body:
{
"query": "How does authentication work?",
"repo_name": "modelearth/webroot",
"llm_provider": "bedrock"
}
/repositories
Get the list of available repositories to search. Example response:
{
"repositories": [
{"name": "modelearth/webroot", "description": "Main website"},
{"name": "modelearth/cloud", "description": "Cloud infrastructure"}
]
}
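A minimal sketch of calling both endpoints from Python with the requests package. The API_URL placeholder comes from the terraform output in the deployment steps, and the HTTP methods (GET for listing, POST for queries) are assumed from the request/response shapes above:

```python
import requests

API_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod"  # from `terraform output -raw api_gateway_url`

# List the available repositories.
repos = requests.get(f"{API_URL}/repositories", timeout=30).json()
print(repos["repositories"])

# Submit a query against one of them.
payload = {
    "query": "How does authentication work?",
    "repo_name": "modelearth/webroot",
    "llm_provider": "bedrock",
}
answer = requests.post(f"{API_URL}/query", json=payload, timeout=60).json()
print(answer)
```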
The chat interface (chat/index.html) provides:
# View Lambda logs
aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/codechat"
# Monitor API Gateway request counts (get-metric-statistics also needs a metric, time window, period, and statistic; the dates below are placeholders)
aws cloudwatch get-metric-statistics --namespace AWS/ApiGateway \
  --metric-name Count --statistics Sum --period 3600 \
  --start-time 2024-01-01T00:00:00Z --end-time 2024-01-02T00:00:00Z
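The same check from Python via boto3, if you prefer scripting it (the Count metric and 24-hour window are illustrative choices):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/ApiGateway",
    MetricName="Count",            # total API requests
    StartTime=end - timedelta(days=1),
    EndTime=end,
    Period=3600,                   # hourly buckets
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```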
This streamlined architecture provides a robust, scalable foundation for repository-intelligent code search while maintaining simplicity and cost-effectiveness.
For component-specific documentation, see the individual README files in chat/, backend/, etc. All major system documentation has been consolidated into this main README.