Unlocking Production-Ready Text-to-SQL with Cost-Effective Fine-Tuning on Amazon Bedrock and SageMaker

The challenge of accurately translating natural language into SQL queries, particularly for custom database dialects and specialized schemas, has long been a significant hurdle for enterprise AI applications. While foundational models exhibit impressive capabilities with standard SQL, achieving the production-grade precision required for bespoke environments necessitates fine-tuning. However, this crucial step introduces a persistent operational trade-off: hosting these customized models on dedicated infrastructure incurs continuous costs, even during periods of minimal or zero utilization. This economic reality has prompted a search for more agile and cost-efficient solutions.
Amazon Web Services (AWS) is addressing this challenge through its Amazon Bedrock service, offering on-demand inference for fine-tuned models. By integrating the efficiency of Low-Rank Adaptation (LoRA) fine-tuning with serverless, pay-per-token inference, organizations can now develop sophisticated text-to-SQL capabilities without the burden of persistent model hosting costs. This innovative approach allows for custom model performance that scales with usage, rather than provisioned capacity. Recent demonstrations highlight the viability of this strategy, with an example workload achieving a monthly cost of just $0.80 for 22,000 queries, a substantial saving compared to traditionally hosted models.
This article delves into two distinct approaches to fine-tuning Amazon Nova Micro models for custom SQL dialect generation, emphasizing both cost efficiency and production-ready performance. These methods aim to democratize access to advanced text-to-SQL capabilities, enabling businesses to leverage their data more effectively.
The Evolving Landscape of Text-to-SQL
The ability to query databases using natural language has been a long-standing goal in data analytics and artificial intelligence. Early attempts often relied on rule-based systems or simpler machine learning models, which struggled with the inherent complexity and ambiguity of human language and the vast diversity of SQL syntax. The advent of large language models (LLMs) and, more recently, foundation models (FMs), has dramatically advanced the state of the art. These models, trained on massive datasets, can understand nuanced language and generate coherent responses.
However, the enterprise environment presents unique challenges. Databases often employ custom SQL dialects, incorporating proprietary functions or specific syntax conventions. Furthermore, database schemas can be highly domain-specific, featuring unique table and column names that are not present in generic training data. Consequently, a general-purpose text-to-SQL model, while competent, will often falter when faced with these specialized requirements. This gap necessitates fine-tuning, a process where a pre-trained model is further trained on a smaller, domain-specific dataset to adapt its behavior.
The Cost Conundrum of Fine-Tuning
The operationalization of fine-tuned AI models, including text-to-SQL solutions, has historically been a capital-intensive undertaking. Deploying a fine-tuned model typically involves provisioning dedicated compute resources, such as virtual machines or specialized hardware, that remain active 24/7. This persistent infrastructure incurs ongoing costs for power, cooling, maintenance, and licensing, regardless of whether the model is actively processing queries. For applications with intermittent or variable usage patterns, such as internal reporting tools or ad-hoc data exploration, this model of fixed operational expenditure can be economically unsustainable. The cost-per-query can skyrocket during low-usage periods, negating the benefits of AI-driven insights.
Amazon Bedrock and SageMaker: A Dual Approach to Fine-Tuning
AWS offers two primary pathways for organizations looking to fine-tune Amazon Nova Micro models for text-to-SQL tasks: Amazon Bedrock’s managed model customization and Amazon SageMaker’s comprehensive AI training capabilities. Each approach caters to different organizational needs and technical proficiencies, while both leverage the underlying efficiency of LoRA fine-tuning and the cost-effectiveness of on-demand inference.
The core innovation lies in the combination of LoRA, a parameter-efficient fine-tuning technique that significantly reduces the computational resources and data required for adaptation, with serverless inference. LoRA works by injecting small, trainable "adapter" matrices into the layers of a pre-trained model. Only these adapters are updated during fine-tuning, drastically reducing the number of parameters that need to be stored and trained, thus lowering the overall cost and complexity of the fine-tuning process.
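The parameter savings can be made concrete with a minimal numpy sketch of the LoRA idea; the hidden size, rank, and scaling factor below are illustrative values, not the settings Nova Micro uses internally:

```python
import numpy as np

d, r = 1024, 8          # hidden size and LoRA rank (illustrative values)
alpha = 16              # LoRA scaling factor

W = np.random.randn(d, d)         # frozen pre-trained weight matrix
A = np.random.randn(r, d) * 0.01  # trainable adapter (down-projection)
B = np.zeros((d, r))              # trainable adapter (up-projection), zero-initialized

def adapted_forward(x):
    # Output of the LoRA-adapted layer: W x + (alpha / r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size            # parameters a full fine-tune would update
lora_params = A.size + B.size   # parameters LoRA actually trains
print(f"Full fine-tune: {full_params:,} params; "
      f"LoRA: {lora_params:,} ({100 * lora_params / full_params:.1f}%)")
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen base layer, and training only ever touches the small A and B matrices.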
When combined with serverless inference on Amazon Bedrock, the fine-tuned model is dynamically scaled based on demand. This means that compute resources are only allocated and billed when queries are being processed. The pay-per-token pricing model ensures that costs are directly tied to actual usage, making it highly economical for variable workloads. While LoRA fine-tuning does introduce a slight overhead in inference time due to the application of these adapters, testing has confirmed that the resulting latency remains well within acceptable limits for interactive text-to-SQL applications.
Prerequisites for Implementation
To embark on building these custom text-to-SQL solutions, organizations will need to ensure the following prerequisites are met:
- AWS Account: Access to an AWS account is fundamental for utilizing Amazon Bedrock, Amazon S3, and Amazon SageMaker services.
- IAM Permissions: Appropriate Identity and Access Management (IAM) roles and policies are required to grant services the necessary permissions to access resources like S3 buckets and to initiate training jobs.
- AWS CLI or SDK: Familiarity with the AWS Command Line Interface (CLI) or the AWS SDKs (e.g., Boto3 for Python) is beneficial for automating deployment and interaction with AWS services.
- Data Preparation Tools: Tools for data manipulation and transformation will be necessary for formatting the training dataset.
Solution Overview: A Unified Data Pipeline
Both the Amazon Bedrock and Amazon SageMaker approaches share a common foundation in data preparation. This crucial first step ensures that the training data is in the correct format for the chosen fine-tuning method.
The high-level steps involved in both solutions are:
- Dataset Preparation: Gathering and formatting a relevant dataset for text-to-SQL training.
- Data Upload: Storing the prepared dataset in Amazon S3 for accessibility by training services.
- Fine-Tuning: Executing the fine-tuning process using either Amazon Bedrock’s managed customization or Amazon SageMaker’s training jobs.
- Model Deployment: Making the fine-tuned model available for inference.
- Testing and Evaluation: Assessing the model’s accuracy and operational performance.
To illustrate these concepts, two distinct implementation paths are provided, each offering a GitHub code sample for practical application. The first prioritizes simplicity and rapid deployment through Amazon Bedrock’s managed model customization. The second caters to organizations requiring more granular control over hyperparameters and training infrastructure, utilizing Amazon SageMaker AI training jobs. Both implementations share the same robust data preparation pipeline and ultimately deploy to Amazon Bedrock for serverless, on-demand inference.
The architectural diagram below illustrates the end-to-end workflow, encompassing data preparation, the two fine-tuning approaches, and the final deployment to Amazon Bedrock.
[Insert Architecture Diagram Here – Placeholder as the original text provided an image URL]
1. Dataset Preparation: Building the Foundation for Accuracy
The efficacy of any fine-tuned model is heavily dependent on the quality and relevance of its training data. For this text-to-SQL use case, a comprehensive dataset is essential to capture the nuances of natural language queries and their corresponding SQL translations across diverse database structures.
Leveraging the sql-create-context Dataset
The demonstration uses the sql-create-context dataset, a curated combination of the widely used WikiSQL and Spider datasets. This combined resource offers more than 78,000 examples, each pairing a natural language question with its correct SQL query. The dataset spans a broad range of database schemas, ensuring exposure to varied data models and complexities. This variety is critical for developing a text-to-SQL model that generalizes effectively, handling everything from straightforward SELECT statements to intricate multi-table joins with aggregations and subqueries.

Data Formatting and Structure for Bedrock
The training data must adhere to a specific format for ingestion by Amazon Bedrock’s model customization feature. This involves structuring the data into JSON Lines (JSONL) files. Each record within these files comprises a system prompt, providing context and instructions to the model, followed by a sequence of user messages (the natural language question) and assistant messages (the corresponding SQL query). This conversational format allows the model to learn the interaction pattern of question-answering based on provided schema information.
A sample record, as outlined in the AWS documentation for Amazon Bedrock, illustrates this structure:
"schemaVersion": "bedrock-conversation-2024",
"system": [
"text": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE head (age INTEGER)"
],
"messages": [
"role": "user",
"content": [
"text": "Return the SQL query that answers the following question: How many heads of the departments are older than 56 ?"
]
,
"role": "assistant",
"content": [
"text": "SELECT COUNT(*) FROM head WHERE age > 56"
]
]
This structured approach ensures that the model understands the context of the database schema and the intent behind the user’s question, leading to more accurate SQL generation. The formatted dataset is then partitioned into training and validation sets, uploaded to Amazon Simple Storage Service (Amazon S3) in JSONL format, and prepared for the subsequent fine-tuning processes.
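As a sketch, the conversion from raw (question, schema, SQL) triples into this format and the train/validation split can be done in a few lines of Python. The field names follow the bedrock-conversation-2024 schema shown above; the split fraction and file names are illustrative choices:

```python
import json
import random

def to_bedrock_record(question, context, answer):
    """Convert one (question, schema, SQL) example to the
    bedrock-conversation-2024 format expected by Bedrock customization."""
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{
            "text": ("You are a powerful text-to-SQL model. Your job is to answer "
                     "questions about a database. You can use the following table "
                     f"schema for context: {context}")
        }],
        "messages": [
            {"role": "user", "content": [{
                "text": f"Return the SQL query that answers the following question: {question}"
            }]},
            {"role": "assistant", "content": [{"text": answer}]},
        ],
    }

def write_jsonl(examples, train_path, val_path, val_fraction=0.1, seed=42):
    """Shuffle, split, and write training/validation JSONL files."""
    random.Random(seed).shuffle(examples)
    split = int(len(examples) * (1 - val_fraction))
    for path, subset in [(train_path, examples[:split]), (val_path, examples[split:])]:
        with open(path, "w") as f:
            for question, context, answer in subset:
                f.write(json.dumps(to_bedrock_record(question, context, answer)) + "\n")
```

The two JSONL files produced this way are what gets uploaded to Amazon S3 in the next step.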
2. Fine-Tuning Approaches: Bedrock vs. SageMaker
AWS provides two distinct pathways for fine-tuning Amazon Nova Micro models, each offering unique advantages: Amazon Bedrock’s managed model customization and Amazon SageMaker’s advanced AI training jobs.
2a. The Amazon Bedrock Model Customization Approach
Amazon Bedrock offers a streamlined, fully managed experience for fine-tuning Amazon Nova models. This approach abstracts away the complexities of infrastructure management, allowing data scientists and developers to focus on data preparation and model evaluation. It is particularly well-suited for teams prioritizing rapid iteration, minimal operational overhead, and a quick path to production for custom text-to-SQL capabilities.
Process Overview:
- Data Upload: The prepared training data, formatted as JSONL files, is uploaded to an Amazon S3 bucket.
- Job Configuration: Fine-tuning jobs are configured either through the intuitive AWS Management Console or programmatically using the AWS SDK for Python (Boto3).
- Managed Training: AWS automatically provisions and manages the underlying training infrastructure, handling all aspects of compute, storage, and orchestration.
- On-Demand Inference Deployment: Upon successful completion of the fine-tuning job, the custom model can be deployed for on-demand inference. This deployment retains the same pay-per-token pricing as the base Nova Micro model, with no additional markup, making it an exceptionally cost-effective solution for workloads with variable usage patterns.
This method is ideal when the objective is to quickly customize a model for specific SQL dialects without the need to manage complex ML infrastructure, when minimizing operational complexity is paramount, or when serverless inference with automatic scaling is a primary requirement.
Creating a Fine-Tuning Job Using Amazon Bedrock:
Fine-tuning jobs can be initiated using the AWS Console or the AWS SDK for Python (Boto3). The official AWS documentation provides comprehensive guidance on submitting training jobs. For a practical, step-by-step implementation, refer to the sample notebook available in the Amazon Nova Samples GitHub repository.
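Programmatically, the submission reduces to a single Boto3 call. The sketch below uses placeholder names, role ARN, and S3 paths; the hyperparameter key names follow the conventions documented for Amazon model customization and should be verified against the Nova Micro documentation:

```python
# Hyperparameters mirroring the example settings discussed in this section.
# Key names are assumptions based on Bedrock's documented conventions.
hyper_params = {
    "epochCount": "5",
    "batchSize": "1",
    "learningRate": "0.00001",
    "learningRateWarmupSteps": "10",
}

def start_customization_job():
    """Submit a Nova Micro fine-tuning job to Amazon Bedrock."""
    import boto3  # deferred so the config above can be inspected without AWS deps
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    return bedrock.create_model_customization_job(
        jobName="nova-micro-sql-finetune",                       # placeholder
        customModelName="nova-micro-sql-v1",                     # placeholder
        roleArn="arn:aws:iam::<account-id>:role/BedrockFtRole",  # placeholder
        baseModelIdentifier="amazon.nova-micro-v1:0:128k",
        customizationType="FINE_TUNING",
        hyperParameters=hyper_params,
        trainingDataConfig={"s3Uri": "s3://<bucket>/train_dataset.jsonl"},
        validationDataConfig={
            "validators": [{"s3Uri": "s3://<bucket>/test_dataset.jsonl"}]
        },
        outputDataConfig={"s3Uri": "s3://<bucket>/output/"},
    )
```

The IAM role must grant Bedrock read access to the training bucket and write access to the output location.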
Configuring Hyperparameters for Nova Micro on Bedrock:
Optimizing the fine-tuning process involves adjusting specific hyperparameters. For Amazon Nova Micro fine-tuning on Amazon Bedrock, the following parameters can be customized to enhance text-to-SQL model performance:
| Parameter | Range/Constraints | Purpose | What Was Used (Example) |
|---|---|---|---|
| Epochs | 1–5 | Number of complete passes through the dataset | 5 epochs |
| Batch Size | Fixed at 1 | Samples processed per update | 1 (fixed for Nova Micro) |
| Learning Rate | 0.000001–0.0001 | Step size for optimization | 0.00001 |
| Learning Rate Warmup Steps | 0–100 | Gradual increase of learning rate at the start | 10 |
Note: These hyperparameters were optimized for a specific dataset and use case. Optimal values may vary based on dataset size and complexity. In the sample dataset, this configuration achieved a good balance between model accuracy and training time, completing in approximately 2-3 hours.
Analyzing Training Metrics:
Amazon Bedrock automatically generates and stores training and validation metrics in the specified S3 output location. These metrics provide insights into the model’s learning process, including loss curves for both training and validation sets. A consistent decrease in both training and validation loss, with similar patterns and convergence, indicates successful training.
[Insert Image: Graph depicting training loss validation metric]
Deploying with On-Demand Inference:
Once the fine-tuning job is successfully completed, the custom Nova Micro model can be deployed for on-demand inference. This deployment model offers automatic scaling and pay-per-token pricing, making it ideal for variable workloads without the need to provision dedicated compute resources.
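A deployment can be created with the Bedrock create_custom_model_deployment API; this sketch assumes the custom model ARN returned by the completed fine-tuning job, and the deployment name is a placeholder:

```python
def deploy_custom_model(custom_model_arn, deployment_name="nova-micro-sql-od"):
    """Create an on-demand deployment for a fine-tuned custom model and
    return the deployment ARN to use as modelId in inference calls."""
    import boto3  # deferred import so the sketch can be read without AWS deps
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    response = bedrock.create_custom_model_deployment(
        modelDeploymentName=deployment_name,
        modelArn=custom_model_arn,
        description="On-demand deployment for text-to-SQL Nova Micro",
    )
    return response["customModelDeploymentArn"]
```

The returned deployment ARN is what gets passed as the modelId when invoking the model.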
Invoking the Custom Nova Micro Model:

After deployment, the custom text-to-SQL model can be invoked using the deployment ARN as the modelId in the Amazon Bedrock Converse API.
# Use the deployment ARN as the model ID
deployment_arn = "arn:aws:bedrock:us-east-1:<account-id>:deployment/<deployment-id>"

# Prepare the inference request
response = bedrock_runtime.converse(
    modelId=deployment_arn,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": """Database schema:
CREATE TABLE sales (
    id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    revenue DECIMAL(10,2),
    sale_date DATE
);
Question: What are the top 5 products by revenue in the Electronics category?"""
                }
            ]
        }
    ],
    inferenceConfig={
        "maxTokens": 512,
        "temperature": 0.1,  # Low temperature for deterministic SQL generation
        "topP": 0.9
    }
)

# Extract the generated SQL query
sql_query = response['output']['message']['content'][0]['text']
print(f"Generated SQL:\n{sql_query}")
2b. The Amazon SageMaker AI Fine-Tuning Approach
For organizations that require deeper control over the fine-tuning process, including granular adjustments to hyperparameters, training infrastructure, and integration with existing MLOps workflows, Amazon SageMaker AI offers a more comprehensive solution. SageMaker provides extensive flexibility that can significantly impact training efficiency and model performance.
Key Advantages of SageMaker AI:
- Granular Hyperparameter Control: SageMaker allows for fine-tuning of parameters such as batch size for speed and memory optimization, dropout settings across layers to prevent overfitting, and learning rate schedules for enhanced training stability.
- Advanced LoRA Configuration: Specifically for LoRA fine-tuning, SageMaker enables customization of scaling factors and regularization parameters, allowing for fine-tuning tailored to multimodal versus text-only datasets.
- Context Window and Optimizer Settings: Adjustments to the context window size and optimizer settings can be made to precisely match specific use case requirements.
- MLOps Integration: SageMaker seamlessly integrates with existing MLOps pipelines, facilitating streamlined model development, deployment, and monitoring.
Data Preparation and Upload for SageMaker:
The data preparation and upload process for SageMaker AI fine-tuning is identical to that of the Amazon Bedrock implementation. This involves converting the SQL dataset to the bedrock-conversation-2024 schema format, splitting the data into training and test sets, and uploading the JSONL files directly to S3.
# S3 prefix for training data
input_prefix = 'datasets/nova-sql-context'
training_input_path = f's3://{sess.default_bucket()}/{input_prefix}'

# Upload datasets to S3
train_s3_path = sess.upload_data(
    path='data/train_dataset.jsonl',
    bucket=bucket_name,
    key_prefix=input_prefix
)
test_s3_path = sess.upload_data(
    path='data/test_dataset.jsonl',
    bucket=bucket_name,
    key_prefix=input_prefix
)
print(f'Training data uploaded to: {train_s3_path}')
print(f'Test data uploaded to: {test_s3_path}')
Creating a Fine-Tuning Job Using Amazon SageMaker AI:
The SageMaker approach involves selecting the model ID, a pre-configured training recipe, and the appropriate container image URI.
# Nova configuration
model_id = "nova-micro/prod"
recipe = "https://raw.githubusercontent.com/aws/sagemaker-hyperpod-recipes/refs/heads/main/recipes_collection/recipes/fine-tuning/nova/nova_1_0/nova_micro/SFT/nova_micro_1_0_g5_g6_48x_gpu_lora_sft.yaml"
instance_type = "ml.g5.48xlarge"
instance_count = 1

# Nova-specific image URI
image_uri = f"708977205387.dkr.ecr.{sess.boto_region_name}.amazonaws.com/nova-fine-tune-repo:SM-TJ-SFT-latest"

print(f'Model ID: {model_id}')
print(f'Recipe: {recipe}')
print(f'Instance type: {instance_type}')
print(f'Instance count: {instance_count}')
print(f'Image URI: {image_uri}')
Configuring Custom Training Recipes:
A significant advantage of using Amazon SageMaker AI for Nova model fine-tuning is the ability to customize training recipes. These recipes are pre-configured training stacks that accelerate the fine-tuning process. While maintaining compatibility with standard hyperparameters like epochs, batch size, learning rate, and warmup steps, SageMaker recipes extend hyperparameter options through:
- Extended Hyperparameter Options: Access to a wider array of parameters for fine-grained control.
- Customizable Training Stacks: Ability to define specific compute instances, storage configurations, and networking settings.
- Integration with AWS Services: Seamless integration with other AWS services for data storage, monitoring, and deployment.
The recommended strategy is to start with default recipe settings to establish a baseline, then iteratively optimize based on specific requirements. The following table lists some of the additional parameters that can be fine-tuned:
| Parameter | Range/Constraints | Purpose |
|---|---|---|
| max_length | 1024–8192 | Controls the maximum context window size for inputs |
| global_batch_size | 16, 32, 64 | Number of samples processed per weight update |
| hidden_dropout | 0.0–1.0 | Regularization for hidden layers to prevent overfitting |
| attention_dropout | 0.0–1.0 | Regularization for attention mechanism weights |
| ffn_dropout | 0.0–1.0 | Regularization for feed-forward network layers |
| weight_decay | 0.0–1.0 | L2 regularization strength for model weights |
| adapter_dropout | 0.0–1.0 | Regularization for LoRA adapter parameters |
The complete recipe used in the example can be accessed via the provided GitHub link.
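Recipe overrides are typically expressed as a nested dictionary that mirrors the recipe YAML. The sketch below is illustrative only; the exact key names and nesting should be checked against the linked Nova Micro LoRA SFT recipe before use:

```python
# Overrides applied on top of the default Nova Micro LoRA SFT recipe.
# The structure mirrors the recipe YAML; keys and values here are
# illustrative assumptions, not tuned production settings.
recipe_overrides = {
    "training_config": {
        "max_epochs": 2,
        "global_batch_size": 64,
        "max_length": 4096,
        "model": {
            "hidden_dropout": 0.0,
            "attention_dropout": 0.0,
            "ffn_dropout": 0.0,
            "weight_decay": 0.01,
            "peft": {
                "peft_scheme": "lora",
                "lora_tuning": {
                    "adapter_dropout": 0.01,
                },
            },
        },
    },
}
```

Starting from the recipe defaults and overriding one group of settings at a time makes it easier to attribute changes in validation loss to a specific parameter.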
Creating and Executing a SageMaker AI Training Job:
After configuring the model and recipe, a ModelTrainer object is initialized to begin the training process.
# ModelTrainer lives in the sagemaker.modules namespace in recent SDK versions
from sagemaker.modules.train import ModelTrainer
trainer = ModelTrainer.from_recipe(
training_recipe=recipe,
recipe_overrides=recipe_overrides,
compute=compute_config,
stopping_condition=stopping_condition,
output_data_config=output_config,
role=role,
base_job_name=job_name,
sagemaker_session=sess,
training_image=image_uri
)
# Configure data channels
from sagemaker.modules.configs import InputData, S3DataSource
train_input = InputData(
channel_name="train",
data_source=S3DataSource(
s3_uri=train_s3_path,
s3_data_type="Converse",
s3_data_distribution_type="FullyReplicated"
)
)
val_input = InputData(
channel_name="val",
data_source=S3DataSource(
s3_uri=test_s3_path,
s3_data_type="Converse",
s3_data_distribution_type="FullyReplicated"
)
)
# Begin training
training_job = trainer.train(
input_data_config=[train_input,val_input],
wait=False
)
Following training, the model is registered with Amazon Bedrock via the create_custom_model_deployment API, enabling on-demand inference through the converse API. In the SageMaker AI training job example, using default recipe parameters (2 epochs, batch size 64) with a dataset of 20,000 lines, the training job lasted approximately 4 hours. On an ml.g5.48xlarge instance, the total cost for fine-tuning the Nova Micro model was $65.
3. Testing and Evaluation: Ensuring Accuracy and Performance
Rigorous testing is paramount to validate the effectiveness of the fine-tuned text-to-SQL models. This involves both accuracy assessments and operational performance evaluations.
Accuracy Testing with LLM-as-a-Judge
To evaluate the accuracy of the generated SQL queries, an "LLM-as-a-Judge" approach was implemented. This method involves collecting questions and the corresponding SQL responses from the fine-tuned model. A separate, capable judge model then scores these generated responses against the ground truth SQL queries. This technique provides a scalable and nuanced way to assess the model’s precision.
The scoring function, get_score, utilizes a prompt designed to instruct a judge model to compare a student’s answer (the generated SQL) against a correct answer, considering the question and schema context.
import re

def get_score(system, user, assistant, generated):
    formatted_prompt = (
        "You are a data science teacher that is introducing students to SQL. "
        "Consider the following question and schema:"
        f"<question>{user}</question>"
        f"<schema>{system}</schema>"
        "Here is the correct answer:"
        f"<correct_answer>{assistant}</correct_answer>"
        "Here is the student's answer:"
        f"<student_answer>{generated}</student_answer>"
        "Please provide a numeric score from 0 to 100 on how well the student's "
        "answer matches the correct answer. Put the score in <SCORE> XML tags."
    )
    _, result = ask_claude(formatted_prompt)  # ask_claude invokes the judge model
    pattern = r'<SCORE>(.*?)</SCORE>'
    match = re.search(pattern, result)
    return match.group(1) if match else "0"
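Per-example scores can then be aggregated into summary statistics across the test set. This sketch assumes a list of (system, user, assistant, generated) tuples and a scoring function with the get_score signature:

```python
def aggregate_scores(examples, score_fn):
    """Run the judge over each example and return summary accuracy stats.

    examples: iterable of (system, user, assistant, generated) tuples
    score_fn: callable returning a 0-100 score as a string, like get_score
    """
    scores = []
    for system, user, assistant, generated in examples:
        try:
            scores.append(int(score_fn(system, user, assistant, generated)))
        except ValueError:
            scores.append(0)  # treat unparseable judge output as a miss
    return {
        "mean": sum(scores) / len(scores),
        "perfect": sum(s == 100 for s in scores),
        "count": len(scores),
    }
```

Tracking both the mean score and the count of exact matches helps distinguish models that are "close but wrong" from those producing fully correct SQL.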
Operational Testing: Latency and Throughput
Operational testing focuses on the real-world performance of the model, measuring key metrics such as Time to First Token (TTFT) and Output Tokens Per Second (OTPS). Compared to the base Nova Micro model, the fine-tuned version experienced a cold start TTFT averaging 639 ms across five runs, representing a 34% increase. This latency is attributed to the runtime application of LoRA adapters.

However, this architectural choice yields significant cost benefits. The fine-tuned Nova Micro model incurs the same token-based pricing as the base model, enabling pay-per-use billing with no minimum commitments. During normal operation (warm start), the time to first token averages 380 ms across 50 calls, a modest 7% increase. End-to-end latency for complete response generation is approximately 477 ms, and token generation maintains a rate of roughly 183 tokens per second, a 27% decrease from the base model that remains well suited for interactive applications.
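Time to first token can be measured with the streaming variant of the Converse API; this sketch times the gap between the request and the first contentBlockDelta event in the stream (event names follow the ConverseStream response format):

```python
import time

def measure_ttft(bedrock_runtime, model_id, prompt):
    """Return (time_to_first_token_s, total_time_s) for one streamed call."""
    start = time.perf_counter()
    response = bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.1},
    )
    ttft = None
    for event in response["stream"]:
        if "contentBlockDelta" in event and ttft is None:
            ttft = time.perf_counter() - start
    return ttft, time.perf_counter() - start
```

Averaging this measurement over repeated calls, with a pause long enough to force a cold start where needed, reproduces the TTFT comparison described above.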
[Insert Image: Graph showing comparison of time to first token cold start vs warm start]
4. Cost Summary and Implications
The economic advantages of the on-demand inference model are substantial, particularly when compared to traditional self-hosted solutions.
One-Time Costs:
- Bedrock Customization: Primarily the cost of training compute for the duration of the fine-tuning job.
- SageMaker Training: Costs associated with the compute instances used for training, based on instance type and duration.
Example Calculation for a Production Workload:
Consider a workload generating 22,000 queries per month (e.g., 100 users making 10 queries per day over 22 business days).
- Bedrock On-Demand Inference: For 22,000 queries, the estimated cost for inference using the pay-per-token model is exceptionally low, often in the range of $0.80 per month, as demonstrated in the example. This is a direct result of serverless scaling and usage-based billing.
- Self-Hosted Model Infrastructure: The cost of maintaining persistent infrastructure for a self-hosted model, including compute instances, networking, and management overhead, would typically range from hundreds to thousands of dollars per month, regardless of actual query volume.
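The arithmetic behind the sub-dollar figure can be reproduced with illustrative token counts per query; the per-token rates below are assumptions and should be verified against the current Amazon Bedrock pricing page for Nova Micro:

```python
# Illustrative monthly cost for the example workload: 22,000 queries/month.
queries_per_month = 22_000
input_tokens_per_query = 600    # schema + question prompt (estimate)
output_tokens_per_query = 100   # generated SQL (estimate)

price_per_m_input = 0.035       # USD per 1M input tokens (assumed rate)
price_per_m_output = 0.14       # USD per 1M output tokens (assumed rate)

input_cost = queries_per_month * input_tokens_per_query / 1e6 * price_per_m_input
output_cost = queries_per_month * output_tokens_per_query / 1e6 * price_per_m_output
monthly_cost = input_cost + output_cost
print(f"Estimated monthly inference cost: ${monthly_cost:.2f}")
```

With these assumed token counts and rates, the estimate comes to about $0.77 per month, in line with the roughly $0.80 figure quoted above; the self-hosted alternative pays for idle capacity regardless of query volume.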
This analysis clearly validates that for custom dialect text-to-SQL use cases, fine-tuning a Nova model using PEFT LoRA on Amazon Bedrock is significantly more cost-effective than self-hosting custom models on persistent infrastructure. While self-hosted approaches might be necessary for specific use cases demanding maximum control over infrastructure, security configurations, or complex integration requirements, the Amazon Bedrock on-demand cost model offers substantial savings for the majority of production text-to-SQL workloads.
Conclusion: Tailoring AI to Organizational Needs
The presented implementation options for Amazon Nova fine-tuning underscore the platform’s flexibility in adapting to diverse organizational needs and technical requirements. Two distinct approaches have been explored, each catering to different user profiles and use cases. Whether opting for the managed simplicity of Amazon Bedrock or the enhanced control offered by SageMaker AI training, the common thread is the serverless deployment model and on-demand pricing, ensuring that users pay only for what they consume while eliminating infrastructure management burdens.
The Amazon Bedrock model customization approach offers a streamlined, fully managed solution that removes the complexities of infrastructure management. Data scientists can concentrate on data preparation and model evaluation without the overhead of managing training infrastructure, making it an ideal choice for rapid experimentation and development cycles.
Conversely, the Amazon SageMaker AI training approach provides greater control over every facet of the fine-tuning process. Machine learning engineers gain granular command over training parameters, infrastructure selection, and seamless integration with existing MLOps workflows. This level of control enables meticulous optimization for required performance, cost, and operational objectives. For instance, users can adjust batch sizes and instance types to optimize training speed, or modify learning rates and LoRA parameters to achieve an optimal balance between model quality and training time, tailored to specific operational needs.
Choose Amazon Bedrock model customization when:
- Rapid iteration and development are priorities.
- Limited ML infrastructure expertise is available.
- Minimizing operational overhead is a key objective, while still achieving custom model performance.
Choose Amazon SageMaker AI training when:
- Fine-grained control over training parameters is essential.
- Specific infrastructure or compliance requirements must be met.
- Integration with existing MLOps pipelines is necessary.
- Optimizing every aspect of the training process for peak efficiency is desired.
Both approaches leverage the same cost-efficient deployment model, empowering organizations to select the path that best aligns with their team’s expertise and strategic requirements, rather than being constrained by cost considerations alone.
Get Started
Organizations ready to build their own cost-effective text-to-SQL solutions can access complete implementation guides and code samples through the provided GitHub links for both the Amazon Bedrock and Amazon SageMaker approaches. This allows for a practical, hands-on experience in developing and deploying advanced AI capabilities for data analysis.
About the Authors
[Placeholder for Author Information – as provided in the original text]







