Document Processing (DocumentCall)¶
DocumentCall processes uploaded files using document processing models like docling. Unlike LLM jobs, document processing does not require a text prompt -- it analyzes files directly.
**Upload files first**

Document processing requires files to be uploaded before creating the job. The workflow is: Upload → Create Job → Submit.
Basic Usage¶
```python
from microdc import Client, DocumentCall

client = Client(api_key="mDC_...")

# Step 1: Upload the file
upload_result = client.upload_file("document.pdf")
file_token = upload_result['id']

# Step 2: Create the document processing job
job = DocumentCall(model="docling")
job.add_file(file_token)

# Step 3: Submit and wait
job_id = client.send_job(job)
client.wait_for_all()

# Step 4: Get results
result = client.get_job_details(job_id)
if result.is_successful():
    print(result.result)
    client.acknowledge_job(job_id)
```
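If the check-and-acknowledge pattern above is repeated across scripts, the success check can be factored into a small helper. A minimal sketch, assuming only the `is_successful()`, `result`, and `error_message` fields used in the example (the helper name `extract_result` is illustrative, not part of the microdc API):

```python
def extract_result(details):
    """Return the result payload of a finished job, or raise on failure.

    Assumes `details` exposes is_successful(), result, and error_message,
    matching the job-details object shown above.
    """
    if details.is_successful():
        return details.result
    raise RuntimeError(f"Job failed: {details.error_message}")
```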
Configuration Options¶
```python
job = DocumentCall(
    model="docling",     # Required: processing model
    max_tokens=None,     # Maximum tokens to generate
    temperature=0.7      # Sampling temperature (0.0-2.0)
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | (required) | Document processing model |
| `max_tokens` | `int` | `None` | Maximum output tokens |
| `temperature` | `float` | `0.7` | Sampling temperature |
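The documented `temperature` range can be enforced client-side before the job is created. A hedged sketch; the validator is illustrative and not part of microdc:

```python
def validate_temperature(value: float) -> float:
    """Reject sampling temperatures outside the documented 0.0-2.0 range."""
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {value}")
    return value
```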
Adding Files¶
```python
job = DocumentCall(model="docling")

# Add a single file
job.add_file(file_token)

# Add multiple files at once
job.add_files([token_1, token_2, token_3])
```
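Since file tokens always come from a prior upload, the two steps can be combined in a small wrapper. A sketch under the assumption that `upload_file` returns a dict with an `'id'` key, as shown earlier (`upload_and_add` itself is not a library function):

```python
def upload_and_add(client, job, paths):
    """Upload every path and attach the resulting file tokens to the job."""
    tokens = [client.upload_file(path)["id"] for path in paths]
    job.add_files(tokens)
    return tokens
```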
Batch Document Processing¶
Process multiple documents in parallel:
```python
from microdc import Client, DocumentCall

client = Client(api_key="mDC_...")

# Upload multiple documents
documents = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
file_tokens = []
for doc_path in documents:
    upload_result = client.upload_file(doc_path)
    file_tokens.append(upload_result['id'])
    print(f"Uploaded {doc_path}")

# Create and submit jobs
job_ids = []
for i, token in enumerate(file_tokens):
    job = DocumentCall(model="docling")
    job.add_file(token)
    job.metadata = {"filename": documents[i]}
    job_id = client.send_job(job)
    job_ids.append(job_id)

# Wait for all jobs
client.wait_for_all(timeout=600)

# Collect results
for job_id in job_ids:
    details = client.get_job_details(job_id)
    filename = details.metadata['filename']
    if details.is_successful():
        print(f"{filename} processed successfully")
        client.acknowledge_job(job_id)
    else:
        print(f"{filename} failed: {details.error_message}")
```
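The collect-results loop above can also report a whole-batch summary instead of printing per job. A sketch that assumes only the `metadata` dict and `is_successful()` method already used in this example:

```python
def summarize_batch(details_list):
    """Split finished jobs into successes and failures by filename metadata."""
    succeeded, failed = [], []
    for details in details_list:
        name = details.metadata.get("filename", "<no name>")
        (succeeded if details.is_successful() else failed).append(name)
    return succeeded, failed
```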
Document Analysis Pipeline¶
Combine document processing with LLM analysis:
```python
from microdc import Client, DocumentCall, LLMChat

client = Client(api_key="mDC_...")

# Step 1: Process document with docling
upload_result = client.upload_file("research_paper.pdf")
file_token = upload_result['id']

doc_job = DocumentCall(model="docling")
doc_job.add_file(file_token)
doc_job_id = client.send_job(doc_job)
client.wait_for_job(doc_job_id)

# Step 2: Get extracted text
doc_details = client.get_job_details(doc_job_id)
document_text = doc_details.result.get('text', '')

# Step 3: Analyze with LLM
analysis = LLMChat(model="llama3.3")
analysis.set_system("You are a research paper analyst.")
analysis.add_user_message(
    f"Summarize the key findings:\n\n{document_text}"
)
analysis_id = client.send_job(analysis)
client.wait_for_job(analysis_id)

result = client.get_job_details(analysis_id)
print(result.result)
```
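Long documents may exceed the LLM's context window, so the extracted text can be split before step 3 and each chunk summarized separately. A minimal paragraph-boundary chunker; the 8000-character default is an assumption for illustration, not a microdc limit:

```python
def chunk_text(text, max_chars=8000):
    """Split extracted document text into pieces of at most roughly
    max_chars characters, breaking only on blank-line paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```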
Job Type Comparison¶
| Feature | LLM Jobs | Embed Jobs | Document Jobs |
|---|---|---|---|
| Text Input | Required | Required | Not needed |
| File Upload | Optional | No | Mandatory |
| Workflow | Create → Submit | Create → Submit | Upload → Create → Submit |