Issues with function input names not matching the column name #1206

xzdandy · 2023-09-24T18:16:00Z

Search before asking

I have searched the EvaDB issues and found no similar bug report.

Bug

The function is defined as follows:

import pandas as pd
from evadb.catalog.catalog_type import ColumnType
from evadb.functions.abstract.abstract_function import AbstractFunction
from evadb.functions.decorators.decorators import forward, setup
from evadb.functions.decorators.io_descriptors.data_types import PandasDataframe

class Chunk(AbstractFunction):
    """
    Arguments:
        None

    Input Signatures:
        input_dataframe (DataFrame) : A DataFrame containing a column of strings.

    Output Signatures:
        output_dataframe (DataFrame) : A DataFrame containing chunks of strings.

    Example Usage:
        You can use this function to concatenate strings in a DataFrame and split them into chunks.
    """

    @property
    def name(self) -> str:
        return "Chunk"

    @setup(cacheable=False)
    def setup(self) -> None:
        # Any setup or initialization can be done here if needed
        pass

    @forward(
        input_signatures=[
            PandasDataframe(
                columns=["input_string"],
                column_types=[ColumnType.TEXT],
                column_shapes=[(None,)],
            )
        ],
        output_signatures=[
            PandasDataframe(
                columns=["chunks"],
                column_types=[ColumnType.TEXT],
                column_shapes=[(None,)],
            )
        ],
    )
    def forward(self, input_dataframe):
        # Ensure input is provided
        if input_dataframe.empty:
            raise ValueError("Input DataFrame must not be empty.")

        # Define the maximum chunk size (the slicing below counts characters, not tokens)
        max_tokens_per_chunk = 16000  # Adjust this value as needed

        # Initialize lists for the output DataFrame
        output_strings = []

        # Iterate over rows of the input DataFrame
        for _, row in input_dataframe.iterrows():
            input_string = row["input_string"]

            # Split the input string into fixed-size chunks (sliced by character)
            chunks = [
                input_string[i : i + max_tokens_per_chunk]
                for i in range(0, len(input_string), max_tokens_per_chunk)
            ]

            output_strings.extend(chunks)

        # Create a DataFrame with the output strings
        output_dataframe = pd.DataFrame({"chunks": output_strings})

        return output_dataframe

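The lookup row["input_string"] fails when the input DataFrame comes from the SlackCSV table, whose column is named text rather than input_string. The DataFrame passed to forward keeps the table's column name instead of the name declared in input_signatures. We get the following error message: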

KeyError: 'input_string'
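
The failure can be reproduced outside EvaDB with plain pandas. The sketch below is only an illustration: the sample frame and the helper name chunk_first_column are hypothetical, and the small chunk size is chosen to keep the example readable. It assumes the frame reaching forward carries the table's column name (text). Selecting the column by position rather than by name is one possible workaround inside the function, not necessarily the fix EvaDB should adopt.

```python
import pandas as pd

# Hypothetical stand-in for the frame passed to forward(): the column keeps
# the table's name ("text"), not the declared input name ("input_string").
frame = pd.DataFrame({"text": ["abcdefghij", "klm"]})

# Looking the column up by the declared name raises KeyError, as in the report.
error_message = None
try:
    for _, row in frame.iterrows():
        row["input_string"]
except KeyError as exc:
    error_message = str(exc)


def chunk_first_column(df: pd.DataFrame, max_chars: int = 4) -> pd.DataFrame:
    """Split every string in the first column into pieces of max_chars characters.

    Hypothetical helper: it reads the column by position, so it does not
    depend on what the column happens to be called.
    """
    output_strings = []
    for value in df.iloc[:, 0]:
        output_strings.extend(
            value[i : i + max_chars] for i in range(0, len(value), max_chars)
        )
    return pd.DataFrame({"chunks": output_strings})


result = chunk_first_column(frame)
print(error_message)
print(result)
```

Positional access only makes sense for a single-input function like Chunk; a function with several inputs would still need a reliable mapping between declared names and the columns it receives.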

Environment

No response

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


xzdandy added the Bug 🐞 EVA is not working as expected label Sep 24, 2023

xzdandy added this to the v0.3.7 milestone Sep 24, 2023

xzdandy added this to EVA Public Roadmap ⚡🚀 Sep 24, 2023

xzdandy moved this to ToDo in EVA Public Roadmap ⚡🚀 Sep 24, 2023

jarulraj added High Priority ⚡️ User Experience labels Sep 25, 2023

jiashenC self-assigned this Sep 25, 2023

jiashenC linked a pull request Sep 27, 2023 that will close this issue

fix: issues with function input names not matching the column name #1227

Open

xzdandy added User Experience and removed User Experience Bug 🐞 EVA is not working as expected labels Sep 29, 2023

xzdandy removed this from the v0.3.7 milestone Sep 30, 2023

Jjx003 mentioned this issue Oct 31, 2023

Notify Users about Invalid Forward Headers #1329

Closed

2 tasks

haozihong mentioned this issue Nov 21, 2023

fix: Issues with function input names not matching the column name #1381

Open

Ignacio-DiLeva mentioned this issue Nov 26, 2023

Fix for function input names not matching the column name #1390

Draft
