Prompt-tune a model using the watsonx.ai Python SDK

Lukasz Cmielowski, PhD
4 min read · Dec 11, 2023

The task

Based on a narrative input prompt, suggest the product that matches the query. The available products are: "credit_card", "debt_collection", "mortgages_and_loans", "credit_reporting", and "retail_banking".

[
  {
    "input": "Narrative: asked verification debt appeared credit report validate debt simply asked collector send signed medical document hipaa release itemized list showing debt send method validation date alleged hospital visit per fdcpa need removed unless furnish requested documentation within day\nProduct:\n",
    "output": "debt_collection"
  }
]

About prompt tuning

Foundation models are sensitive to the input. Your input, or how you prompt the model, can introduce context that the model will use to tailor its generated output.

Prompt tuning applies machine learning to the task of prompt engineering. Instead of adding words to the input itself, prompt tuning is a method for finding a sequence of values (a prompt vector) that, when added as a prefix to the input text, improves the quality of the model's output.

To find the best values for the prompt vector, you run a tuning experiment. You demonstrate the type of output that you want for a corresponding input by providing the model with input and output example pairs in training data. With each training run of the experiment, the generated output is compared to the training data output. Based on what it learns from differences between the two, the experiment adjusts the values in the prompt vector. After many runs through the training data, the model finds the prompt vector that works best.
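To make the mechanism concrete, here is a minimal, illustrative PyTorch sketch (not the watsonx.ai implementation): a small matrix of virtual-token embeddings is prepended to the embedded input, and only that matrix receives gradient updates while the base model stays frozen. The sizes below are placeholders chosen for illustration.

import torch

# Hypothetical sizes, for illustration only.
num_virtual_tokens = 100   # length of the learned prompt vector
embedding_dim = 2048       # embedding size of the frozen base model

# The only trainable parameters in prompt tuning: the prompt vector.
prompt_vector = torch.nn.Parameter(torch.randn(num_virtual_tokens, embedding_dim) * 0.02)

def prepend_prompt(input_embeddings: torch.Tensor) -> torch.Tensor:
    # input_embeddings: (batch, seq_len, embedding_dim) produced by the frozen
    # model's embedding layer; the result is fed to the frozen transformer layers.
    batch_size = input_embeddings.shape[0]
    prefix = prompt_vector.unsqueeze(0).expand(batch_size, -1, -1)
    return torch.cat([prefix, input_embeddings], dim=1)

# During the tuning experiment only prompt_vector is adjusted;
# the base model's weights are never modified.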

Run tuning experiment

With ibm-watson-machine-learning release 1.0.335 or higher, you can run prompt-tuning experiments programmatically. The full list of experiment configuration parameters is documented in the Python API documentation.
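The TuneExperiment below expects watsonx.ai credentials and a project_id, and the deployment steps later in this post use an APIClient created from the same credentials. A minimal sketch, assuming the Dallas (us-south) endpoint and an IBM Cloud API key; replace the placeholders with your own values.

from ibm_watson_machine_learning import APIClient

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # assumed region endpoint
    "apikey": "YOUR_IBM_CLOUD_API_KEY"           # placeholder
}
project_id = "YOUR_PROJECT_ID"                   # placeholder

client = APIClient(credentials)
client.set.default_project(project_id)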

First, let’s set up the tuner.

from ibm_watson_machine_learning.experiment import TuneExperiment

experiment = TuneExperiment(credentials, project_id=project_id)
prompt_tuner = experiment.prompt_tuner(
    name="Tune the model to classify the text better.",
    task_id=experiment.Tasks.CLASSIFICATION,
    base_model='google/flan-t5-xl',
    accumulate_steps=32,
    batch_size=16,
    learning_rate=0.2,
    max_input_tokens=256,
    max_output_tokens=20,
    num_epochs=6,
    tuning_type=experiment.PromptTuningTypes.PT,
    verbalizer="Including narratives choice the best match product with the items from the list: 'credit_card', 'debt_collection', 'mortgages_and_loans', 'credit_reporting', 'retail_banking'. Input: {{input}} Output: ",
    auto_update_model=True
)

After the tuner is defined, we can trigger the experiment. Change the background_mode parameter to True to run the prompt-tuning process in the background. The training data set (JSON format) that we're using is stored in IBM Cloud Object Storage (COS); in the example code, we reference it as a data connection.
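The data_conn object passed to run() below can be built with the SDK's DataConnection helper. A minimal sketch, assuming the training file sits in a COS bucket reachable through an existing connection asset; the connection id, bucket, and path are placeholders.

from ibm_watson_machine_learning.helpers import DataConnection, S3Location

data_conn = DataConnection(
    connection_asset_id="YOUR_COS_CONNECTION_ID",   # placeholder
    location=S3Location(
        bucket="your-training-bucket",              # placeholder
        path="train_data.json"                      # placeholder
    )
)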

tuning_details = prompt_tuner.run(
    training_data_references=[data_conn],
    background_mode=False)
##############################################

Running '706d26eb-ee3c-4e9b-b3fb-d5abd1a8b51e'

##############################################


pending....
running.................
completed
Training of '706d26eb-ee3c-4e9b-b3fb-d5abd1a8b51e' finished successfully.

If prompt tuning is running in the background, you can check the status using this method:

prompt_tuner.get_run_status()
'completed'

Let’s summarize the run with the prompt_tuner.summary() method and plot the learning curves.

prompt_tuner.summary()
prompt_tuner.plot_learning_curve()

In the next step we will create an online deployment for our tuned model. That will allow us to access it through a REST API endpoint and integrate it with third-party applications.

Deployment

Define the deployment metadata, such as the deployment name. The model_id passed to create() refers to the tuned model that was stored automatically at the end of the run (auto_update_model=True).

from datetime import datetime

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "Prompt-tuned model deployment.",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"pt_sdk_deployment_{datetime.utcnow().strftime('%Y_%m_%d_%H%M%S')}"
}

deployment_details = client.deployments.create(model_id, meta_props)
#######################################################################################

Synchronous deployment creation for uid: 'a0613c34-faa5-49c0-86b5-9d3357293ce1' started

#######################################################################################


initializing
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='5217e49f-3c61-4f06-ac67-ca1b2de4bdf2'
------------------------------------------------------------------------------------------------

The deployment_id can be extracted from deployment_details:

deployment_id = deployment_details['metadata']['id']

Tuned model inference

Initialize the ModelInference class by passing the text generation parameters and the deployment_id.

from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

generate_params = {
    GenParams.MAX_NEW_TOKENS: 20,
    GenParams.STOP_SEQUENCES: ["\n"]
}

tuned_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)

Predict the product class for a sample prompt by calling the generate_text method.

tuned_model.generate_text(prompt="Including narratives choice the best match product with the items from the list: 'credit_card', 'debt_collection', 'mortgages_and_loans', 'credit_reporting', 'retail_banking'.\nComment: hi landed job reside ca needed room rent found place rent paid deposit dollar however position going didnt work longer needed rent place bay asked landlord refund security deposit refused told called back wellsfargo disputed transaction recently noticed card reversal checking account got charged amount dollar called bank werent able refund money also emailed landlord asking refund money ten day passed still response hope cfpb take action successfully resolve issue thank\nProduct:\n")
'credit_card'

The tuned model is now ready to be called from your applications.

Evaluate the tuned model

Let’s evaluate the quality of the tuned model using a test dataset (300 records). Using the accuracy score, we’ll compare the base and tuned models.
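The prompts_batch and products variables used below are not defined in the snippets above. Here is a minimal sketch of how they could be built, assuming a local test_data.json file in the same input/output format as the training data; the exact prompt template should match the one used for the sample prompt shown earlier.

import json

# Assumed local copy of the 300-record test set, same schema as the training data.
with open("test_data.json") as f:
    test_records = json.load(f)

# Illustrative instruction prefix; adjust to match the prompt format used at inference time.
instruction = ("Including narratives choice the best match product with the items from the list: "
               "'credit_card', 'debt_collection', 'mortgages_and_loans', 'credit_reporting', 'retail_banking'.\n")

# One prompt per record, plus the reference labels used for scoring.
prompts_batch = [instruction + rec["input"] for rec in test_records]
products = [rec["output"] for rec in test_records]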

from sklearn.metrics import accuracy_score

tuned_model_results = tuned_model.generate_text(prompt=prompts_batch)
print(f'accuracy_score: {accuracy_score(products, tuned_model_results)}')
accuracy_score: 0.8006666666667

Calculate the accuracy of the base model by initializing the ModelInference class with the base model id.

base_model = ModelInference(
    model_id='google/flan-t5-xl',
    params=generate_params,
    api_client=client
)
base_model_results = base_model.generate_text(prompt=prompts_batch)
print(f'base model accuracy_score: {accuracy_score(products, base_model_results)}')
base model accuracy_score: 0.5333333333333333

As we can see, prompt tuning with a training data set of 3,000 records improved the model’s accuracy by roughly 27 percentage points (from 0.53 to 0.80).

Here is the notebook with the code (the training data has been subsampled to 700 records to speed things up).

Happy prompt tuning!


Lukasz Cmielowski, PhD

Senior Technical Staff Member at IBM, responsible for AutoAI (AutoML).