
[scalding] java.lang.IllegalArgumentException: given types array contains null #20

Closed
amatsukawa opened this issue Aug 1, 2014 · 17 comments

@amatsukawa

This isn't exactly a bug in cascading-jdbc and more likely an issue in Scalding's integration with it, but since I couldn't find any documentation or examples, I thought I would file an issue and ask. I hope that's OK.

Scalding JDBC integration seems to be completely broken with 2.5.3 (and with the current master).

class JDBCTestWrite(args: Args) extends Job(args) {
  import TDsl._
  // create a simple data set
  val data = IterableSource(List(1, 2, 3, 4, 5), 'value1)
  // read it and write to DB.
  data.read.write(DBOutputSource())
}

Produces an error:

Caused by: java.lang.IllegalArgumentException: given types array contains null
    at cascading.tuple.Fields.<init>(Fields.java:650)
    at cascading.jdbc.JDBCScheme.deriveInternalSinkFields(JDBCScheme.java:755)
    at cascading.jdbc.JDBCScheme.presentSinkFields(JDBCScheme.java:722)
    at cascading.tap.Tap.presentSinkFields(Tap.java:382)
    at cascading.flow.BaseFlow.presentSinkFields(BaseFlow.java:255)
    at cascading.flow.BaseFlow.updateSchemes(BaseFlow.java:206)
    at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:272)
    ... 20 more

This seems to be caused by fields.getType( i ) returning null here:

types[ i ] = InternalTypeMapping.findInternalType( fields.getType( i ) );

This is part of the new JDBCScheme mechanism introduced in 2.5.3. Must all fields have types attached to them for JDBC integration to work now?

@fs111
Contributor

fs111 commented Aug 1, 2014

In cascading-jdbc we need a way to determine the database types so that we can create the table correctly. You can either do that by giving the types in the TableDesc object, or we can determine/guess them by looking at the Fields. If you pass the types via TableDesc, we should not run into this problem. Can you try that?
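
For illustration, a minimal sketch of the TableDesc route, assuming the four-argument constructor that appears later in this thread (table and column names are placeholders):

import cascading.jdbc.TableDesc;

// hypothetical table: a single int column named "value1"
String[] columnNames = { "value1" };
String[] columnDefs  = { "int not null" };  // database column definitions
TableDesc tableDesc  = new TableDesc( "test_table", columnNames, columnDefs, null /* primaryKeys */ );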

That being said, it is also a problem if we have a null-handling bug. I will see if I can look into that in the coming days.

@amatsukawa
Author

Sorry, I'm quite new to the code base; what is the mechanism for providing types to a TableDesc? I see that we can provide a String[] columnDefs as a constructor argument; will the types be inferred from that?

There are many constructors in JDBCScheme that take an array of column names and construct Fields from them that contain no types. Does that mean these cannot be used without providing a type through TableDesc?

EDIT: What would be useful for me is a description of how this mechanism changed in 2.5.3 compared to 2.5.2. How was the type inferred before? That would help me change the Scalding source code to make this work again.

@amatsukawa
Author

I tried this snippet of code, which, as far as I can tell, exercises the only type-related mechanism in TableDesc. Of course this would not work in general; it is tailored to the test job above for testing purposes.

val tableDesc = new TableDesc(tableName) //, columnNames, columnDefinitions, null)
tableDesc.completeFromFields(new Fields("value1", Integer.TYPE))
println("providing table desc")
println(tableDesc)
tableDesc

The fields were set as expected in TableDesc, but that did not solve the problem :(

providing table desc
TableDesc{tableName='test.amatsukawa_scalding_jdbc', columnNames=[value1], columnDefs=[int not null], primaryKeys=null}  // <-- setting TableDesc worked as expected
14/08/01 22:57:10 INFO util.HadoopUtil: resolving application jar from found main method on: com.twitter.scalding.Tool$
14/08/01 22:57:10 INFO planner.HadoopPlanner: using application jar: /data/home/amatsukawa/vertica-jdbc-fix-deploy.jar
14/08/01 22:57:10 INFO property.AppProps: using app.id: BB2AE66B7E2E45738926AEF69091AACB
14/08/01 22:57:11 INFO jdbc.JDBCScheme: receiving final sink fields 'value1'
14/08/01 22:57:11 INFO jdbc.JDBCScheme: type of field value1 is null    // <-- I added this log line, this is the problem
Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
....
Caused by: java.lang.IllegalArgumentException: given types array contains null
    at cascading.tuple.Fields.<init>(Fields.java:650)
    at cascading.jdbc.JDBCScheme.deriveInternalSinkFields(JDBCScheme.java:757)
    at cascading.jdbc.JDBCScheme.presentSinkFields(JDBCScheme.java:722)
    at cascading.tap.Tap.presentSinkFields(Tap.java:382)
    at cascading.flow.BaseFlow.presentSinkFields(BaseFlow.java:255)
    at cascading.flow.BaseFlow.updateSchemes(BaseFlow.java:206)
    at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:272)
    ... 20 more

Since that didn't work, I'll do my best to provide an overview of the problem as I understand it from jumping into the Cascading code. There are many constructors in JDBCScheme that take an array of column names but no Fields, e.g. we use:

// JDBCScheme.java#L199
public JDBCScheme( Class<? extends DBInputFormat> inputFormatClass, Class<? extends DBOutputFormat> outputFormatClass, String[] columns, String[] orderBy, String conditions, String[] updateBy ) {
    this( inputFormatClass, outputFormatClass, columns, orderBy, conditions, -1, updateBy );
}

// which calls
public JDBCScheme( Class<? extends DBInputFormat> inputFormatClass, Class<? extends DBOutputFormat> outputFormatClass, String[] columns, String[] orderBy, String conditions, long limit, String[] updateBy ) {
    this( inputFormatClass, outputFormatClass, new Fields( columns ), columns, orderBy, conditions, limit, updateBy != null ? new Fields( updateBy ) : null, updateBy );
}

As you can see, a new Fields( columns ) is created from columns with no types. This is then used to construct the Flow, is used in the ElementGraph to hook the sink up to the tail Extent, becomes the output Scope used for the tap, and eventually gets used here (https://github.com/Cascading/cascading/blob/2.5/cascading-core/src/main/java/cascading/flow/BaseFlow.java#L267) to give the fields back to JDBCScheme, which results in our error.
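
To make the gap concrete, a minimal sketch (the getType call is the same one JDBCScheme uses above):

import cascading.tuple.Fields;

// a Fields built from names only carries no type information,
// so getType( i ) returns null for every position
Fields untyped = new Fields( "value1" );
assert untyped.getType( 0 ) == null;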

@amatsukawa
Author

Sorry for the consecutive updates; I'm testing things as I go. I've confirmed that using a JDBCScheme constructor that actually accepts Fields and passing in a typed one solved the problem. However, I would rather not maintain a mapping of types in Scalding code. What solution would you recommend?
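
For reference, a hedged sketch of the workaround, assuming the nine-argument constructor that the snippet in my previous comment delegates to is public (the values are placeholders from my test job):

import cascading.jdbc.JDBCScheme;
import cascading.tuple.Fields;
import java.lang.reflect.Type;

Fields typedColumns = new Fields( new Comparable[]{ "value1" }, new Type[]{ Integer.TYPE } );
JDBCScheme scheme = new JDBCScheme(
    null,                      // inputFormatClass
    null,                      // outputFormatClass
    typedColumns,              // Fields carrying one type per column
    new String[]{ "value1" },  // column names
    null,                      // orderBy
    null,                      // conditions
    -1,                        // limit
    null,                      // updateByFields
    null );                    // updateBy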

@fs111
Contributor

fs111 commented Aug 2, 2014

There was indeed a bug with the type handling, and version 2.5.4-wip-87 contains a fix. Please try that.

One thing to keep in mind, though, is that some RDBMS systems like Postgres do not support automatic coercion/casting, meaning that if we try to write a tuple with a String to an int field, it will fail. By using Fields instances with proper types, we can avoid that problem, since we do a proper type coercion in Java before handing the data to the JDBC driver. I added a small paragraph to the README about that.
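
As a minimal sketch of the typed declaration that enables this coercion (the single-field Fields constructor is the one used earlier in this thread):

import cascading.tuple.Fields;

// declaring the Java type up front lets Cascading coerce tuple values
// before they are handed to the JDBC driver; without it, a database
// like Postgres may reject e.g. a String bound to an int column
Fields value1 = new Fields( "value1", Integer.TYPE );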

If this works for you, we can do a release of cascading-jdbc next week.

@amatsukawa
Author

Thanks @fs111. Does the current master include 2.5.4-wip-87? I've tried publishing my own jar with master and it didn't seem to work. I have some other changes (e.g. #19) that I would like to try out, so having the code would be better.


@fs111
Contributor

fs111 commented Aug 3, 2014

There is no master. The changes are in the wip-2.5 branch.

@amatsukawa
Author

Oops, sorry, I meant HEAD instead of master.
I've tried a combination of versions, and this actually appears to come from upgrading to Cascading 2.5.5. I do not see this error with cascading-jdbc 2.5.4-wip published from HEAD and Cascading 2.5.4.

@amatsukawa
Author

It's because the null check on Fields types is new in 2.5.5:

  // 2.5.4
  public Fields( Comparable[] fields, Type[] types )
    {
    this( fields );

    if( isDefined() && types != null )
      {
      if( this.fields.length != types.length )
        throw new IllegalArgumentException( "given types array must be same length as fields" );

      this.types = copyTypes( types, this.fields.length );
      }
    }

  // 2.5.5
  public Fields( Comparable[] fields, Type[] types )
    {
    this( fields );

    if( isDefined() && types != null )
      {
      if( this.fields.length != types.length )
        throw new IllegalArgumentException( "given types array must be same length as fields" );

      if( Util.containsNull( types ) )
        throw new IllegalArgumentException( "given types array contains null" );

      this.types = copyTypes( types, this.fields.length );
      }
    }
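
As an illustrative reproduction of the difference (using the constructor quoted above):

import cascading.tuple.Fields;
import java.lang.reflect.Type;

// under 2.5.4 the null type was silently copied into the Fields instance;
// under 2.5.5 this throws IllegalArgumentException: "given types array contains null"
Fields broken = new Fields( new Comparable[]{ "value1" }, new Type[]{ null } );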

@fs111
Contributor

fs111 commented Aug 5, 2014

Can you give me a minimal example, preferably a test, that triggers your problem? I currently don't see how there can be a null in the types array.

@amatsukawa
Author

We just use this constructor:

      new JDBCScheme(
        null,  // inputFormatClass
        null,  // outputFormatClass
        columnNames.toArray,
        null,  // orderBy
        filterCondition.getOrElse(null),
        updateByFields.toArray
      )

Which, as far as I can tell, gives no types to the fields, even though types are eventually expected. It seems to me that none of the constructors that don't take a typed Fields argument for the columns can work at all. Could you give an example of the correct usage of these constructors? You mentioned providing types through TableDesc, but as far as I can tell there is no way to do that.

You can find a minimal Scalding example in the initial post of this ticket. I don't have the setup to run vanilla Cascading applications, although if you really need it I can look into setting one up. I will look into whether writing a test is easier.

@amatsukawa
Author

Looking at the tests for JDBCScheme, it looks like the test is written such that the sink fields are simply instantiated and passed in with types.

https://github.com/Cascading/cascading-jdbc/blob/2.5/cascading-jdbc-core/src/test/java/cascading/jdbc/JDBCSchemeTest.java#L85

I jumped through the Cascading code (which I am very, very unfamiliar with, so I could be completely wrong), but it looked to me like in practice these fields are actually derived from JDBCScheme.columnFields, which have null types if you use the constructor I gave above.

@fs111
Contributor

fs111 commented Aug 6, 2014

You are looking at the wrong branch. This test has no types on the fields: https://github.com/Cascading/cascading-jdbc/blob/wip-2.5/cascading-jdbc-core/src/test/java/cascading/jdbc/JDBCTestingBase.java#L214

If you create a Fields instance with types, you have to give exactly one type per field, and none of them is allowed to be null. If you don't give types at all, it will work as well. That is what I fixed in 2.5.4-wip-87. Can you confirm that you are using that version?
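
To spell out the two forms that work under that fix (a small sketch):

import cascading.tuple.Fields;
import java.lang.reflect.Type;

Fields untyped = new Fields( "value1" );  // no types at all: fine
Fields typed = new Fields( new Comparable[]{ "value1" },
                           new Type[]{ Integer.TYPE } );  // exactly one non-null type per field: fine
// only a types array containing null is rejected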

@amatsukawa
Author

I guess I was looking at the wrong test? I've been working off HEAD of the wip-2.5 branch, and that did not work. I guess I will give that exact jar version a try.


@amatsukawa
Author

Ah sorry, I gave the branch another try and it worked. When you said a fix was in, I didn't realize you meant it was a very recent fix; I thought you were referring to a fix from well before this ticket. I had not pulled the wip-2.5 branch in a few days.

Sorry about the confusion, and thanks for the fix! Could we do a new cascading-jdbc release as you mentioned?

@concurrentinc

wip releases are published on every push.

http://conjars.org/search?q=jdbc

ckw


@fs111
Contributor

fs111 commented Aug 7, 2014

2.5.4 is released. Closing this one.
