
Talend tSalesforceOutputBulkExec Component

Salesforce supports cascading update and insert (upsert) operations through a reference key. In Talend, we can perform this cascading upsert by using the tSalesforceOutputBulkExec or tSalesforceOutput components. The benefit of using this function is that it reduces the number of API calls from Talend to Salesforce; imagine the number of calls needed if a data migration involves millions of records. In order to use this function, we need to have an external Id.
For example, when we upsert a Contact record and want it assigned to an Account record, all we need to do is pass in the Account's external Id. Salesforce will use it to find the related Account record and assign it to the Contact via the lookup field.
Now, let’s try to upsert some sample Contact records into Salesforce.
Contact.csv
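The original screenshot of the file content is not reproduced here; a hypothetical sample of what Contact.csv might look like is:

FirstName,LastName,Email,accountId
John,Smith,john.smith@example.com,ACC-001
Jane,Doe,jane.doe@example.com,ACC-002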
As you can see, there is a column called accountId, which holds the Account's external Id. Later in the Talend process, we will use this value to perform the cascading upsert.
Account.csv
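Again, the screenshot is not reproduced here; a matching hypothetical Account.csv, with an external Id column whose values line up with the accountId column above, might look like:

Name,ExternalId
Acme Corporation,ACC-001
Global Traders,ACC-002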
Before running the following process, Account.csv has already been preloaded into Salesforce. Below is the sample upsert job in Talend.
Below is the mapping result:
How to Configure:
In the tSalesforceOutputBulkExec component, select ‘Advanced settings’. Now focus on the ‘Relationship mapping for upsert’ section.
In ‘Column name of Talend Schema’, select the external Id column from the map; in our case, the accountId field from the contact flow (from the mapping) holds the external Id. Secondly, ‘Lookup field name’ is the lookup field between the Contact and Account objects in Salesforce; hence, it will be the Account field on the Contact object. Do not be confused by ‘External id name’: it refers to the external Id field on the Account object, not the Contact object. Below is the complete configuration:
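The configuration screenshot is not reproduced here; reconstructed from the description above, the relationship mapping row would be filled in along these lines (the external Id field name ExternalId__c is a hypothetical example and depends on how the field is defined in your Salesforce org):

Column name of Talend Schema: accountId
Lookup field name: Account
External id name: ExternalId__c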
We are finally done. Check out the results!

Talend – How to use tFileOutputMSDelimited (multi-schema output)

This is a simple guide showing how you can use Talend to generate a complex delimited file (e.g., a combination of parent and related child records) by using the tFileOutputMSDelimited component. In this scenario, the sample output that we will generate is:

Parent Row 1,P1,This is sample description,2013-08-01
Child Row 1,C1,P1
Child Row 2,C2,P1
Parent Row 2,P2,This is sample description,2013-09-01
Child Row 3,C3,P2
Child Row 4,C4,P2

Step 1:
Before we use the tFileOutputMSDelimited component to generate the designated output, we will need to prepare 2 files (we assume that you already know how to do this using Talend 😀 ):

One that contains the unique parent rows, for example:

Parent Name,Parent ID,Description,Date
Parent Row 1,P1,This is sample description,2013-08-01
Parent Row 2,P2,This is sample description,2013-09-01

And one that contains all the related child rows with parent id, for example:

Child Name,Child ID,Parent Ref ID
Child Row 1,C1,P1
Child Row 2,C2,P1
Child Row 3,C3,P2
Child Row 4,C4,P2

Please note that the Parent Ref ID in the child file is very important, as it will be used to match each child row to its parent row.

Step 2:
Create a job to read the Parent file and Child file by using the tFileInputDelimited component as shown below.

These are the schemas that we configured in the parentFile and childFile tFileInputDelimited components respectively:

Step 3:
Add a tFileOutputMSDelimited component to the job and link the main outputs of the parentFile and childFile components to it.

Step 4:
Now, let’s configure the tFileOutputMSDelimited component to look like this:

The first schema, which appears as “tFileOutputMSDelimited_1”, refers to the parentFile schema. Below is the explanation of the configuration:

  • Parent row – leave this empty, because this schema holds the parent data
  • Parent key column – leave this empty, because this schema holds the parent data
  • Key column – this is the Parent ID column name specified in the parentFile schema shown in Step 2. This is important, as it will be used as the unique identifier that child rows are matched against.

The second schema, which appears as “row2”, is the childFile schema. Below is the explanation of the configuration:

  • Parent row – set it to row1; this tells Talend to park all the related child records under the matching parent record from row1
  • Parent key column – this will be the same as the Key column of the parent row
  • Key column – this should be the Parent Ref ID field in the child schema instead of the Child ID. The reason is that Talend uses this as the key to group all the related children under their parent.

Once you have done the configuration above, you are good to go.

Talend: Trigger a Job using Socket Listener

We talked about triggering a Talend job by using a file listener previously. Today, let's proceed to the next component, tWaitForSocket.

This component utilizes the ServerSocket object in Java to act as a server that listens on a port and triggers the integration process when there is activity. As you can see in Figure 1, this component does not have much configuration compared to the tWaitForFile component. You just need to specify the following:

  1. port to listen on
  2. response message for client
  3. action whether to continue listening on the port or close the socket

Figure 1: tWaitForSocket

Once the configuration is done, you can run the job to have it start listening on the port that you have specified. To test the configuration, you can create a simple Java client as shown:

import java.io.*;
import java.net.*;

public class TestConnection {

    public static void main( String[] args ) throws IOException {

        DataOutputStream dos = null;
        BufferedReader in = null;

        int timeout = 2000; // two seconds

        String host = args[0];
        String port = args[1];
        String data = args[2];

        System.out.printf("Triggering Talend job: %s port: %s ....\n", host, port);

        Socket socket = new Socket();
        InetSocketAddress endPoint = new InetSocketAddress(host, Integer.parseInt(port));

        try {
            socket.connect(endPoint, timeout);
            System.out.printf("Success: %s\n", endPoint);

            // send the trigger data to the Talend job
            dos = new DataOutputStream(socket.getOutputStream());
            dos.writeBytes(data);

            // read the response message configured in tWaitForSocket
            in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            System.out.println("Response from Talend: " + in.readLine());

        } catch (IOException ioe) {
            System.out.printf("Failure: %s message: %s - %s\n", endPoint, ioe.getClass().getSimpleName(), ioe.getMessage());
        } finally {
            // dos is still null if the connection attempt failed
            if (dos != null) {
                dos.close();
            }
            socket.close();
        }
    }
}
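Assuming the job is listening on port 8888 of the local machine, you could trigger it like this (host, port and payload are examples):

java TestConnection localhost 8888 hello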

Important note:

This component is useful if you want to trigger a Talend job remotely; however, it does not take or process the data that is sent to the socket.

Talend: Validating XML file with XML Schema (XSD)

All of us know that generating XML data in an integration process is easy. However, generating an XML file that complies with the standard of a target application is hard, as the data from the source application may differ from what the target application expects. The best practice to prevent sending invalid XML data to the target application is to validate it against the XML schema (XSD).

In Talend, you can use the tXSDValidator component to do this for you. As you can see in Figure 1 below, you just need to specify the XML schema to use and the XML data file to be validated.

Figure 1: tXSDValidator configuration

Here is a sample scenario. I have created an XML file that contains “Australia” as the country:

<Contact>
    ...
    <Address>
        <Street>Allan Road</Street>
        <City>Sydney</City>
        <Postcode>5415</Postcode>
        <Country>Australia</Country>
    </Address>
    ...
</Contact>

However, the XML schema only allows a fixed set of values, such as “AU” and “NZ”:

<xs:simpleType name="Country">
    <xs:restriction base="xs:string">
        <xs:enumeration value="AU">
            <xs:annotation>
                <xs:documentation>AUSTRALIA</xs:documentation>
            </xs:annotation>
        </xs:enumeration>
        <xs:enumeration value="NZ">
            <xs:annotation>
                <xs:documentation>NEW ZEALAND</xs:documentation>
            </xs:annotation>
        </xs:enumeration>
        ...
        ...
    </xs:restriction>
</xs:simpleType>

When I run the job, I will see the following error.

Figure 2: Invalid XML

Once I fix the errors at lines 15 and 26, as shown in the error message, I should get a successful validation.

<Contact>
    ...
    <Address>
        <Street>Allan Road</Street>
        <City>Sydney</City>
        <Postcode>5415</Postcode>
        <Country>AU</Country>
    </Address>
    ...
</Contact>


Figure 3: Valid XML
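For reference, the check that tXSDValidator performs is conceptually similar to validating with the standard javax.xml.validation API in plain Java. Below is a minimal sketch of that approach (not Talend's internal implementation; the file names contact.xsd and contact.xml are examples):

import java.io.File;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

public class XsdValidationDemo {

    public static void main(String[] args) throws Exception {
        // load the XSD and build a validator from it
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("contact.xsd"));
        Validator validator = schema.newValidator();

        try {
            // validate the XML data file against the schema
            validator.validate(new StreamSource(new File("contact.xml")));
            System.out.println("Valid XML");
        } catch (SAXException e) {
            // the exception message includes the offending line number
            System.out.println("Invalid XML: " + e.getMessage());
        }
    }
}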

Talend: Returning data from Child to Parent process

Sometimes, I prefer to split a very large and complicated logic process into multiple sub-processes and have them executed in sequence. This way, I can manage the process better and make testing as well as troubleshooting easier. Ever wonder how the data in such an organized process is transmitted from one sub-process to another?

In Talend, we can use the tBufferOutput component to achieve this. This component acts as a temporary memory pool that keeps all the data in a child process and returns it to the parent process to continue the processing. Let's take a quick look at the simple demo below.

Figure 1: Child process

In the child process shown in Figure 1, I output the data from a tFixedFlowInput component into a tBufferOutput component. By doing this, Talend will keep the data in memory temporarily.

Now, what I need to do is call the child process from a parent process by using the tRunJob component and connect its output row to the next component, as shown in Figure 2. In this demo, I will use a tLogRow component to print the data that I get from the child process.


Figure 2: tRunJob component in Parent process

Once the configuration is done, I can run the Parent process and Talend will print out the data from the Child process automatically.


Figure 3: Result

Talend: Trigger a Job using File Listener

In Talend, there are 3 useful listener components that we can use to implement an integration process that is executed/triggered on demand. The first listener component that we discuss here is tWaitForFile. This component allows you to listen on a file directory and execute an integration process whenever a file is dropped into the directory, updated, or even deleted from the directory.

Figure 1: tWaitForFile configuration

In Figure 1, you can see that you can configure how frequently the file listener should run, which directory to scan, the type of file to be processed, as well as the action on a file (created, updated or deleted). Another highlight of this listener is that it will ignore files that have already been processed, even if they are not deleted from the directory. This is very handy, as you don't need to cater for file deletion logic in the process and you can keep those files for future reference.

Hopefully this information gives you a quick kick start on how to use the file listener.

Talend: use tSchemaComplianceCheck to validate data

Inserting data from one system into another can be quite troublesome at times, especially due to differences in field lengths between the two systems. However, this can be easily overcome in Talend by utilizing the tSchemaComplianceCheck component. This component helps to validate the data against the schema that you have defined, and it also allows you to catch the problematic data.

Here is a simple demo on how to use the component. I have the following data in a CSV file:

Name,UserID
Lean,WDCI0001
Deborah,WDCI0000000002

In my database, the length of the UserID field is 8 characters, and this is the schema that I defined in Talend. Please note that the Nullable, Date Pattern and Length attributes are important, as they will be used by the tSchemaComplianceCheck component to validate the data. In this example, I want to validate the length of all the fields that I have, so I have defined the allowed field lengths in the Database component schema accordingly.

After that, I map the output row from the tMap component to the Database component through tSchemaComplianceCheck. In the tSchemaComplianceCheck settings, I set the mode to “check all columns from schema”. Once everything is set, I should see the second row of data from my CSV file being rejected by tSchemaComplianceCheck because the UserID exceeds the maximum length allowed, as shown in the screenshot below.
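Conceptually, the length check that rejects the second row is similar to this plain-Java sketch (an illustration only, not Talend's generated code):

// reject any value that exceeds the schema-defined length of 8
String userId = "WDCI0000000002"; // second row from the CSV above
int maxLength = 8;                // Length attribute defined in the Talend schema

if (userId != null && userId.length() > maxLength) {
    System.out.println("Rejected: UserID exceeds the maximum length of " + maxLength);
}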

Talend Tips: User Defined Map Function

We have discussed a lot of Talend Integration Studio's components previously. Today, let's move on to another interesting tip: defining our own custom function and using it in a map (User Defined Map Function).
In order to create a user defined map function, you will first need to create a custom Routine in Talend, as shown in Figure 1 below. In this demo, we named the Routine DemoFunction.

Figure 1: Routine in Talend
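The routine code itself only appears as a screenshot in Figure 1; based on the behaviour described later in this post, it would look something like the following reconstruction (the comment block follows Talend's usual routine documentation convention):

public class DemoFunction {

    /**
     * udfExample: appends a fixed sentence to the given text.
     *
     * {talendTypes} String
     * {Category} User Defined
     * {param} string("Talend Demo") message: the text to process
     * {example} udfExample("Talend Demo") # returns "Talend Demo. This is the User Defined Function."
     */
    public static String udfExample(String message) {
        return message + ". This is the User Defined Function.";
    }
}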

Please take note that it is important to provide an informative comment in the Routine function, as this will help us to understand the usage of the function. Now that we have created the Routine, let's take a look at how to call it in a map.

In the tMap component, you can call the user defined function that you have just created by clicking on Expression Builder -> choosing *User Defined from the Categories -> double clicking on the function “udfExample”, and it will appear in the Expression Builder.
Figure 2: Expression Builder in tMap

Once you have added the user defined function, you just need to pass in a parameter and it will return you the result. For example, if you pass in the String “Talend Demo”, it will return “Talend Demo. This is the User Defined Function.” as the result.

Hopefully this gives you an idea on how to start building your own user defined map function.

Talend: Connector lookup in tMap

Are you aware that you can actually do a connector lookup in Talend by using the tMap component to retrieve data? This is very useful, as it retrieves only the data that you want instead of loading additional unnecessary data into Talend, which helps to improve the performance of the Talend job.

I’m going to show you a simple example of how to achieve this.


Figure 1: Overview of the process

The screenshot above shows the overview of the process that we are going to build. In this demo, I will focus on configuring the tMap component to enable the lookup function and getting the “getContactByAccountId” tSalesforceInput component to query Salesforce contacts by AccountId.

Step 1: To enable the lookup function, you need to change the lookup model in the tMap to “Reload at each row” as shown in Figure 2.
 
Figure 2: tMap configuration
 
Step 2: As soon as you change the lookup model, a new configuration section containing “Expr.” and “globalMap key” will appear. This is where you specify the lookup key and the value to be held by the lookup key. In this example, I use “accountid” as the key and map the Id field from the “account” main flow to it as the lookup value. You can also see that the same Id field is mapped to “AccountId” in the “contact” lookup flow. This tells Talend that I would like to join the result of the main flow with the lookup flow and match them on “AccountId”. You can refer to an older post if you would like to know more about the Join Model in tMap.
 
Step 3: Now, click OK to save the changes, and we will move on to configure the “getContactByAccountId” tSalesforceInput component. In the Query Condition input box of the tSalesforceInput component, you just need to specify the lookup key that you configured in the tMap. In order to get the value of the lookup key, you need to call it as follows (or as shown in Figure 3):
 
globalMap.get("accountid")
 
Figure 3: Get the value of lookup key
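In practice, the call is embedded in the query condition string; it might look something like this (a sketch, the exact condition depends on your schema):

"AccountId = '" + globalMap.get("accountid") + "'"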
 
Yup, now we are done. The Talend job now has the connector lookup function enabled!

Talend Tip: error logging with Apache log4j

In an earlier blog, we explained how to use tLogCatcher to catch errors as they occur in the job and print them to the console. This time, we are going to improve the error logging by utilizing the famous error logging framework from Apache, log4j.

Firstly, you have to create a routine in Talend (we named it ApacheLog4jLogger in this example) and instantiate the Apache logger:
/* =============== Sample routine in Talend ===================*/
import java.io.File;

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.DailyRollingFileAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class ApacheLog4jLogger {
  public static final Logger logger = Logger.getRootLogger();

  static {
    try {
      final File logDir = new File("logs");
      logDir.mkdirs();

      final File logFile = new File(logDir + "/mylog4j.log");
      
      final PatternLayout pl = new PatternLayout("%d %-5p: %m%n");
      final DailyRollingFileAppender dailyfp = new DailyRollingFileAppender( pl, logFile.getCanonicalPath(), ".yyyy-MM-dd" );
      
      logger.setLevel( Level.INFO );
      logger.addAppender( dailyfp );
      
      final ConsoleAppender cp = new ConsoleAppender( pl, "System.out" );
      cp.setImmediateFlush( true );

      logger.addAppender( cp );
      
    } catch ( Exception e ) {
      e.printStackTrace( System.err );
    }
  }
  
  ....
  ....
}
Once you have done this, you will need to create a static function in the routine that will accept the error message from the Talend job:
public static void writeLogFile(String message, String level, String errorCode) {
  if(level.equalsIgnoreCase("info")){
    logger.info(errorCode + " - " + message);
  } else if(level.equalsIgnoreCase("warn")){
    logger.warn(errorCode + " - " + message);
  } else if(level.equalsIgnoreCase("error")){
    logger.error(errorCode + " - " + message);
  }
}
The function accepts error messages and error codes and writes them to the log file according to the logging level (info, warn or error). Now we have completed the routine. Next, we will show you how to log an error by using this routine.
In order for Talend to build the job correctly, we have to load the log4j library into the classpath by using the tLibraryLoad component as shown below:
Now, you just need to replace the tLogRow with a tJavaRow and link it to the tLogCatcher. Ah yes, you have to call the ApacheLog4jLogger routine within the tJavaRow in order to log the error to the log file.
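The call inside the tJavaRow would look something like this (the input_row field names assume the default tLogCatcher schema, so adjust them to your own; the "error" level is an example):

// forward the caught error details to the log4j routine
ApacheLog4jLogger.writeLogFile(input_row.message, "error", String.valueOf(input_row.code));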
Yay!! You have successfully created a job that uses log4j. You should see a log file created in the logs folder when the job encounters an exception.