Building a customised inference endpoint for a foundation models chain

Lukasz Cmielowski, PhD
3 min read · Feb 20, 2024


The task

Build a customised inference endpoint (REST API) for a chain of foundation models. The chain of two foundation models is built with SequentialChain from LangChain: the first model generates a question about the provided topic, and the second model answers it. For more details, refer to the Medium story “Automating watsonx.ai foundation models with Langchain”.

In this story, the focus is on moving that automation code to a production environment as a single inference endpoint that can be integrated with a third-party web app.

The Python function

watsonx.ai can deploy Python functions as online deployments (scoring endpoints). The code below shows how to wrap the LangChain automation code in such a deployable function, chain_text_generator(). The embedded function score() is called each time a request is made to the scoring endpoint; the outer function is executed only once, during deployment setup.
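
The snippets below assume that credentials, project_id, space_id and the generation parameters are already defined, as in the previous story. A minimal sketch of what they could look like (all values are placeholders; GenTextParamsMetaNames is used here only to name the generation parameters):

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

# Placeholder values -- replace with your own IBM Cloud API key, project id and deployment space id.
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "***"
}
project_id = "***"
space_id = "***"

# Generation parameters passed to the first model of the chain.
parameters = {
    GenParams.DECODING_METHOD: "sample",
    GenParams.MAX_NEW_TOKENS: 30
}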

ai_params = {
    "credentials": credentials,
    "project_id": project_id,
    "generation_parameters": parameters
}

def chain_text_generator(params=ai_params):
    from ibm_watsonx_ai.foundation_models import Model
    from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes
    from langchain import PromptTemplate
    from langchain.chains import LLMChain, SequentialChain

    credentials = params["credentials"]
    project_id = params["project_id"]
    parameters = params["generation_parameters"]

    # Two watsonx.ai foundation models wrapped as LangChain LLMs
    flan_ul2 = Model(model_id=ModelTypes.FLAN_UL2, params=parameters, credentials=credentials, project_id=project_id)
    flan_t5 = Model(model_id=ModelTypes.FLAN_T5_XXL, credentials=credentials, project_id=project_id)

    # First prompt generates a question about the topic, second prompt answers it
    prompt_1 = PromptTemplate(input_variables=["topic"], template="Generate a random question about {topic}: Question: ")
    prompt_2 = PromptTemplate(input_variables=["question"], template="Answer the following question: {question}")

    prompt_to_flan_ul2 = LLMChain(llm=flan_ul2.to_langchain(), prompt=prompt_1, output_key="question")
    flan_to_t5 = LLMChain(llm=flan_t5.to_langchain(), prompt=prompt_2, output_key="answer")

    chain = SequentialChain(chains=[prompt_to_flan_ul2, flan_to_t5], input_variables=["topic"],
                            output_variables=["question", "answer"])

    def score(payload):
        """Generate a question based on the provided topic and return the answer."""
        answer = chain({"topic": payload["input_data"][0]["values"][0][0]})
        return {"predictions": [{"fields": ["topic", "question", "answer"],
                                 "values": [answer["topic"], answer["question"], answer["answer"]]}]}

    return score

Let’s test the function code locally before publishing and deploying.

sample_payload = {
    "input_data": [
        {
            "fields": ["topic"],
            "values": [["life"]]
        }
    ]
}

inference = chain_text_generator()
inference(sample_payload)
{'predictions': [{'fields': ['topic', 'question', 'answer'],
                  'values': ['life',
                             'What is the most important element of life?',
                             'water']}]}

Software specification

The deployment service needs to know which extra packages are required to run the function code correctly. We need to create a custom software specification that adds a few packages to the default Python runtime (langchain is one of them).

config_yml = \
"""
name: python310
channels:
  - empty
dependencies:
  - pip:
    - pydantic>=1.10.0
    - langchain==0.0.340
    - ibm-watsonx-ai
prefix: /opt/anaconda3/envs/python310
"""

with open("config.yaml", "w", encoding="utf-8") as f:
    f.write(config_yml)

from ibm_watsonx_ai import APIClient

client = APIClient(credentials)
client.set.default_space(space_id)

base_sw_spec_uid = client.software_specifications.get_uid_by_name("runtime-23.1-py3.10")

meta_prop_pkg_extn = {
    client.package_extensions.ConfigurationMetaNames.NAME: "langchain watsonx.ai env",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Environment with langchain",
    client.package_extensions.ConfigurationMetaNames.TYPE: "conda_yml"
}

pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="config.yaml")
pkg_extn_uid = client.package_extensions.get_uid(pkg_extn_details)

After defining the package extension, let’s create a langchain-based software specification.

meta_prop_sw_spec = {
    client.software_specifications.ConfigurationMetaNames.NAME: "langchain & watsonx.ai",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for langchain",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_uid}
}

sw_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_spec_uid = client.software_specifications.get_uid(sw_spec_details)
client.software_specifications.add_package_extension(sw_spec_uid, pkg_extn_uid)

Our software specification has been stored successfully and can now be linked to our function.

Inference endpoint

As the next step, the function needs to be stored and deployed as a scoring endpoint.

meta_props = {
    client.repository.FunctionMetaNames.NAME: "SequenceChain LLM function",
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: sw_spec_uid
}

function_details = client.repository.store_function(meta_props=meta_props, function=chain_text_generator)
function_id = client.repository.get_function_id(function_details)

The function has been uploaded and can be deployed.

metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "Deployment of LLMs chain function",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

function_deployment = client.deployments.create(function_id, meta_props=metadata)

The online deployment is ready.
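
Before calling the endpoint, it is worth confirming that the deployment has finished initializing. A minimal check, assuming the usual structure of the deployment details returned by the client:

# The deployment state should report 'ready' once initialization completes.
details = client.deployments.get_details(client.deployments.get_id(function_deployment))
print(details["entity"]["status"]["state"])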

Putting all the pieces together

Using the deployment details, we can extract the deployment id required to call our function.

deployment_id = client.deployments.get_id(function_deployment)

Next, call it using the score() method.

client.deployments.score(deployment_id, sample_payload)
{'predictions': [{'fields': ['topic', 'question', 'answer'],
                  'values': [['life'],
                             'When did the first life forms appear on Earth?',
                             '3.8 billion years ago']}]}

If needed, the scoring endpoint URL can be extracted from the deployment details and a plain REST API request used to generate the text, as sketched below.
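
A rough sketch of such a call follows. It assumes the scoring URL comes from the client’s get_scoring_href() helper, that an IAM bearer token is obtained from the same API key as in credentials, and that the version query parameter value is accepted by your environment:

import requests

# Scoring endpoint URL taken from the deployment details.
deployment_details = client.deployments.get_details(deployment_id)
scoring_url = client.deployments.get_scoring_href(deployment_details)

# Exchange the IBM Cloud API key for an IAM bearer token.
token_response = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={"grant_type": "urn:ibm:params:oauth:grant-type:apikey",
          "apikey": credentials["apikey"]},
)
iam_token = token_response.json()["access_token"]

# Plain REST request to the scoring endpoint.
response = requests.post(
    f"{scoring_url}?version=2024-02-20",
    json=sample_payload,
    headers={"Authorization": f"Bearer {iam_token}",
             "Content-Type": "application/json"},
)
print(response.json())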

That’s all folks!

One more thing — you can find sample notebooks here.
