Document Processing (DocumentCall)¶
DocumentCall processes uploaded files using document processing models like docling. Unlike LLM jobs, document processing does not require a text prompt -- it analyzes files directly.
**Upload files first**

Document processing requires files to be uploaded before creating the job. The workflow is: Upload → Create Job → Submit.
Basic Usage¶
```python
from microdc import Client, DocumentCall

client = Client(api_key="mDC_...")

# Step 1: Upload the file
upload_result = client.upload_file("document.pdf")
file_token = upload_result['id']

# Step 2: Create the document processing job
job = DocumentCall(model="docling")
job.add_file(file_token)

# Step 3: Submit and wait
job_id = client.send_job(job)
client.wait_for_all()

# Step 4: Get results
result = client.get_job_details(job_id)
if result.is_successful():
    print(result.result)
    client.acknowledge_job(job_id)
```
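If the check-and-acknowledge pattern above is repeated across scripts, the success check can be factored into a small helper. A minimal sketch, assuming only the `is_successful()`, `result`, and `error_message` fields used in the example (the helper name `extract_result` is illustrative, not part of the microdc API):

```python
def extract_result(details):
    """Return the result payload of a finished job, or raise on failure.

    Assumes `details` exposes is_successful(), result, and error_message,
    matching the job-details object shown above.
    """
    if details.is_successful():
        return details.result
    raise RuntimeError(f"Job failed: {details.error_message}")
```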
Configuration Options¶
```python
job = DocumentCall(
    model="docling",     # Required: processing model
    max_tokens=None,     # Maximum tokens to generate
    temperature=0.7      # Sampling temperature (0.0-2.0)
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | (required) | Document processing model |
| `max_tokens` | `int` | `None` | Maximum output tokens |
| `temperature` | `float` | `0.7` | Sampling temperature |
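The documented `temperature` range can be enforced client-side before the job is created. A hedged sketch; the validator is illustrative and not part of microdc:

```python
def validate_temperature(value: float) -> float:
    """Reject sampling temperatures outside the documented 0.0-2.0 range."""
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {value}")
    return value
```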
Adding Files¶
```python
job = DocumentCall(model="docling")

# Add a single file
job.add_file(file_token)

# Add multiple files at once
job.add_files([token_1, token_2, token_3])
```
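Since file tokens always come from a prior upload, the two steps can be combined in a small wrapper. A sketch under the assumption that `upload_file` returns a dict with an `'id'` key, as shown earlier (`upload_and_add` itself is not a library function):

```python
def upload_and_add(client, job, paths):
    """Upload every path and attach the resulting file tokens to the job."""
    tokens = [client.upload_file(path)["id"] for path in paths]
    job.add_files(tokens)
    return tokens
```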
Batch Document Processing¶
Process multiple documents in parallel:
```python
from microdc import Client, DocumentCall

client = Client(api_key="mDC_...")

# Upload multiple documents
documents = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
file_tokens = []
for doc_path in documents:
    upload_result = client.upload_file(doc_path)
    file_tokens.append(upload_result['id'])
    print(f"Uploaded {doc_path}")

# Create and submit jobs
job_ids = []
for i, token in enumerate(file_tokens):
    job = DocumentCall(model="docling")
    job.add_file(token)
    job.metadata = {"filename": documents[i]}
    job_id = client.send_job(job)
    job_ids.append(job_id)

# Wait for all jobs
client.wait_for_all(timeout=600)

# Collect results
for job_id in job_ids:
    details = client.get_job_details(job_id)
    filename = details.metadata['filename']
    if details.is_successful():
        print(f"{filename} processed successfully")
        client.acknowledge_job(job_id)
    else:
        print(f"{filename} failed: {details.error_message}")
```
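The collect-results loop above can also report a whole-batch summary instead of printing per job. A sketch that assumes only the `metadata` dict and `is_successful()` method already used in this example:

```python
def summarize_batch(details_list):
    """Split finished jobs into successes and failures by filename metadata."""
    succeeded, failed = [], []
    for details in details_list:
        name = details.metadata.get("filename", "<no name>")
        (succeeded if details.is_successful() else failed).append(name)
    return succeeded, failed
```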
Document Analysis Pipeline¶
Combine document processing with LLM analysis:
```python
from microdc import Client, DocumentCall, LLMChat

client = Client(api_key="mDC_...")

# Step 1: Process document with docling
upload_result = client.upload_file("research_paper.pdf")
file_token = upload_result['id']

doc_job = DocumentCall(model="docling")
doc_job.add_file(file_token)
doc_job_id = client.send_job(doc_job)
client.wait_for_job(doc_job_id)

# Step 2: Get extracted text
doc_details = client.get_job_details(doc_job_id)
document_text = doc_details.result.get('text', '')

# Step 3: Analyze with LLM
analysis = LLMChat(model="llama3.3")
analysis.set_system("You are a research paper analyst.")
analysis.add_user_message(
    f"Summarize the key findings:\n\n{document_text}"
)
analysis_id = client.send_job(analysis)
client.wait_for_job(analysis_id)

result = client.get_job_details(analysis_id)
print(result.result)
```
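Long documents may exceed the LLM's context window, so the extracted text can be split before step 3 and each chunk summarized separately. A minimal paragraph-boundary chunker; the 8000-character default is an assumption for illustration, not a microdc limit:

```python
def chunk_text(text, max_chars=8000):
    """Split extracted document text into pieces of at most roughly
    max_chars characters, breaking only on blank-line paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```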
Job Type Comparison¶
| Feature | LLM Jobs | Embed Jobs | Document Jobs |
|---|---|---|---|
| Text Input | Required | Required | Not needed |
| File Upload | Optional | No | Mandatory |
| Workflow | Create → Submit | Create → Submit | Upload → Create → Submit |