Search before asking
I have searched the EvaDB issues and found no similar bug report.
Bug
The function is defined as the following:
import pandas as pd

from evadb.catalog.catalog_type import ColumnType
from evadb.functions.abstract.abstract_function import AbstractFunction
from evadb.functions.decorators.decorators import forward, setup
from evadb.functions.decorators.io_descriptors.data_types import PandasDataframe


class Chunk(AbstractFunction):
    """
    Arguments:
        None

    Input Signatures:
        input_dataframe (DataFrame) : A DataFrame containing a column of strings.

    Output Signatures:
        output_dataframe (DataFrame) : A DataFrame containing chunks of strings.

    Example Usage:
        You can use this function to split the strings in a DataFrame into fixed-size chunks.
    """

    @property
    def name(self) -> str:
        return "Chunk"

    @setup(cacheable=False)
    def setup(self) -> None:
        # No setup or initialization is needed for this function
        pass

    @forward(
        input_signatures=[
            PandasDataframe(
                columns=["input_string"],
                column_types=[ColumnType.TEXT],
                column_shapes=[(None,)],
            )
        ],
        output_signatures=[
            PandasDataframe(
                columns=["chunks"],
                column_types=[ColumnType.TEXT],
                column_shapes=[(None,)],
            )
        ],
    )
    def forward(self, input_dataframe):
        # Ensure input is provided
        if input_dataframe.empty:
            raise ValueError("Input DataFrame must not be empty.")

        # Maximum chunk size; the slicing below counts characters, not model tokens
        max_tokens_per_chunk = 16000  # Adjust this value as needed

        # Collect every chunk produced from every input row
        output_strings = []

        # Iterate over rows of the input DataFrame
        for _, row in input_dataframe.iterrows():
            input_string = row["input_string"]
            # Split the input string into consecutive fixed-size character chunks
            chunks = [
                input_string[i:i + max_tokens_per_chunk]
                for i in range(0, len(input_string), max_tokens_per_chunk)
            ]
            output_strings.extend(chunks)

        # Create a DataFrame with the output strings
        output_dataframe = pd.DataFrame({"chunks": output_strings})
        return output_dataframe
The lookup row["input_string"] fails when the input DataFrame comes from the SlackCSV table, whose column is named "text" rather than "input_string". We get the following error message:
KeyError: 'input_string'
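The failure can be reproduced with plain pandas, outside EvaDB. The sketch below is only an illustration under stated assumptions: it builds a DataFrame shaped like the SlackCSV input (a single column named "text"), shows that the name-based lookup raises the same KeyError, and then applies one possible workaround, reading the first column by position with iloc, which assumes the function always receives exactly one text column. The helpers split_into_chunks and chunk_first_column are hypothetical names, not part of EvaDB.

```python
import pandas as pd


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    # Fixed-size slicing by character count (not model tokens), as in Chunk.forward.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def chunk_first_column(input_dataframe: pd.DataFrame, max_chars: int = 16000) -> pd.DataFrame:
    # Hypothetical workaround: read the first column by position, so the lookup
    # works whether the column is called "input_string" or "text".
    # Assumes the DataFrame carries exactly one text column.
    if input_dataframe.empty:
        raise ValueError("Input DataFrame must not be empty.")
    output_strings = []
    for value in input_dataframe.iloc[:, 0]:
        output_strings.extend(split_into_chunks(str(value), max_chars))
    return pd.DataFrame({"chunks": output_strings})


# A frame shaped like the SlackCSV input: the column is named "text".
slack_like = pd.DataFrame({"text": ["hello world", "abc"]})

# The original name-based lookup raises KeyError: 'input_string'.
try:
    slack_like.iloc[0]["input_string"]
except KeyError as err:
    print("KeyError:", err)

# Positional access succeeds regardless of the column name.
print(chunk_first_column(slack_like, max_chars=5)["chunks"].tolist())
```

With max_chars=5, "hello world" splits into "hello", " worl" and "d", and "abc" stays whole. Positional access trades name checking for robustness; renaming the column before the lookup would be an alternative if the incoming name is known in advance.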
Environment
No response
Are you willing to submit a PR?
Yes I'd like to help by submitting a PR!