Salesforce: Monitoring Bulk Load Jobs

I talked about the goodies of the Salesforce Bulk API quite some time ago. I can’t deny that the Bulk API is really useful, especially for data migration or integration that involves thousands of records. But as a Salesforce administrator, it is very important for me to know what data has been retrieved or updated via the Salesforce web services. Luckily, Salesforce provides a console that allows me to monitor the Bulk API jobs that have been submitted to a Salesforce instance. It can be accessed via Setup -> Administration Setup -> Monitoring -> Bulk Data Load Jobs.

 
Figure 1. Bulk Load monitoring console
From the console, I can see:
  1. the user who made the request
  2. the object involved in the request
  3. the total number of records in the request
  4. the type of operation in the request (e.g. query, create, update, upsert or delete)
It also allows me to view the Bulk API call request and result in the Job Details screen shown below.
Figure 2. Job Detail
Hopefully this information is helpful for all Salesforce administrators (just like me :D ) who need to monitor the activities happening in Salesforce.

Talend Salesforce Connector – modifying it for serial Bulk API mode

I know, I know, I said I wouldn’t keep on about the Salesforce BULK API, but this is a good tip to share. This time I would like to share a problem I encountered while developing an integration process with Talend Data Integration Studio, along with my solution. Feel free to read this post if you are not familiar with using the Salesforce BULK API in Talend.
 
I used tSalesforceBulkExec to upload 10000 records to Salesforce, with the number of rows in a batch set to 5000.
In Salesforce, the customer has a trigger that creates a new unique parent record (if it does not already exist in Salesforce) for each record uploaded from Talend. During the upload, I encountered an error saying that the trigger had failed to create a unique parent record.
 
The Problem
After spending a fair bit of time troubleshooting the problem (phew…), I realized that the tSalesforceBulkExec component was using parallel mode. This caused the batches that I uploaded to Salesforce to be processed in parallel, and there were 2 records in different batches that shared the same parent record. Because the batches were processed at the same time, the trigger could not reliably tell whether the parent already existed, so it was not able to guarantee a unique parent.
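One way to check whether an upload is exposed to this is to look for parent keys that occur in more than one batch before submitting the job. The sketch below is only an illustration in plain Java: the class name, the method name, and the idea of a list of "parent keys" (one per record, in upload order) are my own assumptions, not part of Talend or the Salesforce API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Talend or the Salesforce API.
class SharedParentFinder {

    // Assigns records to batches of batchSize in upload order and returns
    // the parent keys that appear in more than one batch.
    static Set<String> parentsSpanningBatches(List<String> parentKeys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Integer> firstBatchSeen = new HashMap<>();
        Set<String> shared = new TreeSet<>();
        for (int i = 0; i < parentKeys.size(); i++) {
            int batch = i / batchSize; // index of the batch this record falls into
            String key = parentKeys.get(i);
            Integer first = firstBatchSeen.putIfAbsent(key, batch);
            if (first != null && first != batch) {
                shared.add(key); // same parent seen earlier in a different batch
            }
        }
        return shared;
    }
}
```

If this returns an empty set, no two batches share a parent, so a failure like the one I hit should not come from batches racing each other over the same parent record.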
 
The Solution
To overcome this, I thought of changing the BULK API concurrency mode to serial mode. However, Talend does not allow you to configure this option in the tSalesforceBulkExec component. So I had to modify the createJob() method in the SalesforceBulkAPI class.
private JobInfo createJob() throws AsyncApiException {
        JobInfo job = new JobInfo();
        job.setObject(sObjectType);
        job.setOperation(operation);
        if (OperationEnum.upsert.equals(operation)) {
            job.setExternalIdFieldName(externalIdFieldName);
        }
        //add the concurrency mode here
        job.setConcurrencyMode(ConcurrencyMode.Serial);
        job.setContentType(contentType);
        job = connection.createJob(job);
        // System.out.println(job);
        return job;
}
 
You can obtain the source of the SalesforceBulkAPI class in the <talend installation folder>/org.talend.designer.components.localprovider_5.0.1.r74687/components/tSalesforceBulkExec/salesforceBulkAPI.jar.
 
I hope this helps you if you ever encounter this issue.

Jitterbit Data Loader for Salesforce – Bulk API in serial or parallel mode

I know, I know, enough already on the Salesforce Bulk API running in serial or parallel mode.  I promise this is the last blog on this topic, but we couldn’t help ourselves and had to go back and take another look at the new Jitterbit tool to see if it supported the serial mode option for the Salesforce Bulk API.

The good news is that it does.

It’s not necessary to perform any particular setup or configuration to enable this feature in the Jitterbit Data Loader. When you create a bulk job via the “Bulk Processes” section in Jitterbit Data Loader, you will see the following:

When you create a bulk loading operation (upsert, insert, etc.), the BULK API settings are found in the job’s Advanced Options > Operation Options section, where you can:
  1. edit the Batch Settings (i.e. records per file, characters per file)
  2. set the concurrency mode to Serial mode (Parallel is the default mode)
  3. set whether to Compress the data

You can find out more about the Jitterbit Data Loader from their website and community.

 

Salesforce Integration Tips: Using the Bulk API – Serial or Parallel mode options

I talked about the usage of the Salesforce BULK API in Talend Data Integration Studio and the Dell Boomi Integration Platform previously. So I’m wondering how many of us really know that there are actually 2 concurrency modes available in the Salesforce BULK API: Serial mode and Parallel mode.

Parallel mode
This is the default mode in the BULK API. Salesforce will process all the batches in a posted job in parallel, which gives you better performance when uploading data. However, this might lead to lock contention, which can cause the upload to fail.
Serial mode
Salesforce will process all the batches in a posted job one by one, which can help to prevent the lock contention issue that can be encountered in Parallel mode. But please bear in mind that serial processing takes longer, so you should only use this mode if you cannot get around the lock contention issue any other way.
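To make the difference between the two modes more concrete, here is a toy model in plain Java. It is only an analogy and does not reproduce how Salesforce actually locks records: the class and method names are hypothetical, each batch is a list of parent keys, and the "trigger" is a simple check-then-insert on a shared list. With a single worker (serial) every parent is created exactly once, while with several workers (parallel) two batches can both pass the check and create the same parent twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: a "trigger" that creates a parent row when none exists yet.
class ConcurrencyModeDemo {

    // Runs every batch on the given executor and returns the parent rows created.
    static List<String> process(List<List<String>> batches, ExecutorService executor) {
        List<String> parents = Collections.synchronizedList(new ArrayList<>());
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (List<String> batch : batches) {
                pending.add(executor.submit(() -> {
                    for (String parentKey : batch) {
                        // Check-then-insert is not atomic: two threads can both
                        // pass the check and insert the same parent twice.
                        if (!parents.contains(parentKey)) {
                            parents.add(parentKey);
                        }
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the batch and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch processing failed", e);
        } finally {
            executor.shutdown();
        }
        return parents;
    }

    // Serial mode analogy: one worker, batches run one after another.
    static List<String> serial(List<List<String>> batches) {
        return process(batches, Executors.newSingleThreadExecutor());
    }

    // Parallel mode analogy: one worker per batch, so duplicates are possible.
    static List<String> parallel(List<List<String>> batches) {
        return process(batches, Executors.newFixedThreadPool(Math.max(1, batches.size())));
    }
}
```

In this model the serial run always yields one row per distinct parent, whereas the parallel run may or may not contain duplicates depending on thread timing, which is exactly why this kind of problem can be hard to reproduce.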
How to Enable Serial Mode in Apex Data Loader?
You can enable this in Apex Data Loader settings as shown in the screenshot below.
How to Enable Serial Mode in Java Client?
You can set the concurrency mode on the job as follows:
JobInfo job = new JobInfo();
job.setObject("ObjectName");
job.setOperation(OperationEnum.query);
job.setConcurrencyMode(ConcurrencyMode.Serial);
job.setContentType(ContentType.CSV);
job = bulkConnection.createJob(job);

If you would like to know more about the concurrency modes in the Salesforce BULK API, please visit the Bulk API Developer’s Guide.

Dell Boomi Integration Tips: Using the Salesforce Bulk API option

In the last blog post, I talked about the usage of Salesforce BULK API in one of the integration tools that we use.

This made me curious to find out how the BULK API option works in the Dell Boomi Salesforce connector. Out of the box, the Salesforce connector only supports the BULK API option in parallel mode. Please see the screenshot below:
There are 2 options that you need to look at:
1. Use Bulk API – to tell the connector to load data to Salesforce using Bulk API
2. Batch Count – the maximum number of records in a batch
To have your data uploaded with the Bulk API, you just need to tick the “Use Bulk API” checkbox and specify the Batch Count (the default value is 200) in the operation configuration screen. The Dell Boomi Salesforce connector will handle everything for you automatically at the backend, including splitting the data into batches according to the Batch Count.
As there is no option to turn on serial mode, some options to address lock contention in Salesforce could be:
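To picture what splitting the data according to the Batch Count means, here is a minimal sketch in plain Java. It is not the connector's real code, which I have not seen; the class and method names are hypothetical and it only shows the general idea of cutting a list of records into consecutive batches of at most a given size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the Dell Boomi connector's implementation.
class BatchSplitter {

    // Splits records into consecutive batches of at most batchCount records each.
    static <T> List<List<T>> split(List<T> records, int batchCount) {
        if (batchCount <= 0) {
            throw new IllegalArgumentException("batchCount must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchCount) {
            int end = Math.min(start + batchCount, records.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }
}
```

For example, 450 records with a Batch Count of 200 would give three batches of 200, 200 and 50 records. A smaller Batch Count means more batches, each touching fewer records.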
  1. Reduce batch sizes
  2. Use flow control

Salesforce BULK API in Talend Data Integration

Migrating or integrating your data from an in-house application or cloud application to Salesforce can be difficult, time consuming and consume a lot of API calls if you choose the wrong approach.
There are a lot of tools available on the Internet nowadays, and the tool that a developer is most likely to use is the Salesforce Apex Data Loader. The advantage of using this tool is that it supports the BULK API, which can help to reduce the number of API calls that you need during the upload. However, if you want to implement additional logic to manipulate the data before uploading it to Salesforce, you’ll need to consider using an integration tool.
Here, I will go through the steps to implement a simple Talend Data Integration job which will upload data from a CSV file to the Salesforce Account object by utilizing the Salesforce Bulk API.
First, you need to create a new job in Talend. Here, I will start with a tPrejob component and connect it to a tFileInputDelimited component. This will force the job to execute the tFileInputDelimited component at runtime to read the data that I want to load into Salesforce.
Figure 1 – Read data using tFileInputDelimited
 
Next, you have to specify the schema for the tFileInputDelimited component according to the fields that you have in the CSV file. Below is the schema that I use in this example:
Figure 2 – Schema for tFileInputDelimited
 
Now, you have to drag the tSalesforceOutputBulk component into the design workspace and specify the schema and the location to save the Salesforce bulk data load file. Please note that the field names in the tSalesforceOutputBulk component must be exactly the same as the API names that you see in the Salesforce Account object (Setup -> App Setup -> Customize -> Accounts -> Fields).
Figure 3 – Salesforce bulk data load file location for tSalesforceOutputBulk component
 
Figure 4 – Schema for tSalesforceOutputBulk component
 
Once the schemas for tFileInputDelimited and tSalesforceOutputBulk are specified, we will do a simple transformation and mapping in between by using a tMap component. You need to drag the tMap component from the palette and:
1. connect the tFileInputDelimited to tMap by right-clicking on the component -> Row -> Main
2. connect the tMap to tSalesforceOutputBulk by right-clicking on the component -> Row -> *New Output* (Main) -> name the output (in this example, I name it sf_data), then click Yes when it prompts you “Do you want to get the schema of the target component?”.
Figure 5 shows the current flow that I have in the design workspace.
Figure 5 – Flows from tFileInputDelimited to tSalesforceOutputBulk
 
In the tMap component, you can map the fields and apply additional logic according to your business rules. In this example, I want to join address1 and address2 to become BillingStreet in Salesforce.
Figure 6 – Field mapping in tMap component
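Since tMap expressions are written in Java, the join of address1 and address2 can be sketched as a small helper like the one below. This is only an illustration of the logic, not the exact expression from my job: the class name, the method name, the comma separator and the handling of empty values are all assumptions.

```java
// Hypothetical helper for illustration; the separator and null handling are assumptions.
class AddressMapper {

    // Joins address1 and address2 into a single BillingStreet value,
    // skipping whichever part is null or blank.
    static String toBillingStreet(String address1, String address2) {
        String first = address1 == null ? "" : address1.trim();
        String second = address2 == null ? "" : address2.trim();
        if (first.isEmpty()) {
            return second;
        }
        if (second.isEmpty()) {
            return first;
        }
        return first + ", " + second;
    }
}
```

Guarding against null matters here because a plain string concatenation in Java would otherwise write the literal text "null" into the field when one of the address columns is empty in the CSV file.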
 
At this point, you have finished the first part of the process, and we will move on to the second part, where the job will read the Salesforce bulk data load file using the tSalesforceBulkExec component and save the success and failure results to CSV files (salesforce_account_bulk_success.csv and salesforce_account_bulk_fail.csv).
Figure 7 – tSalesforceBulkExec
 
Before you move on to the next step, you need to configure the connection settings and the number of rows to commit in the tSalesforceBulkExec component. The default Rows to commit is 10000. You can reduce the number according to your requirements. I will stick with 10000 in this example, as my company has data to load every day and a larger batch helps to reduce the number of API calls :)
Figure 8 – tSalesforceBulkExec connection setting
 
Figure 9 – Rows to commit in tSalesforceBulkExec
 
After that, you need to connect the Main row from the tSalesforceBulkExec component to a tFileOutputDelimited to record the successful records, and the Reject row to another tFileOutputDelimited to record the failed records. The reason for doing this is that you will be able to see which records were uploaded to Salesforce successfully, along with their record ids in Salesforce. This makes your life easier if you want to use the record ids in another job. Below are the schemas that you should see in the Main row and Reject row:
Figure 10 – Schema for Main row from tSalesforceBulkExec
 
Figure 11 – Schema for Reject row from tSalesforceBulkExec
 
Yes, now you are done, and you should see something similar to this in the design workspace. You can run the job by clicking on the Run button, and you should see that the data is uploaded to Salesforce.
Figure 12 – Complete flow