Handling REST API Data in ADF: A Guide to Pagination and JSON Aggregation

When working with data from REST APIs, it's common to encounter limitations on how much data can be retrieved in a single call. Recently, I faced a challenge where the API limited responses to 1000 rows per call and lacked the usual pagination mechanism, such as a "next page URL" parameter in the response. This absence makes it difficult for developers to automate the data retrieval process, as there's no clear way to determine when all the data has been retrieved.

In scenarios like data backup, migration, or reporting, this limitation can become an obstacle. For instance, using a real-life scenario as an example: if your company manages its HR-related processes in SAP SuccessFactors, you’ll encounter this same challenge. Without a native connection between SuccessFactors and Power BI, one of the viable options for pulling data for reporting or building a data warehouse (DWH) is through REST API calls. While Power BI offers a quick solution via Power Query, it’s not always the best tool—particularly when dealing with larger datasets or when corporate policies limit your options. This is where Azure Data Factory (ADF) becomes a more appropriate choice. However, ADF presents its own challenges, such as handling API responses larger than 4MB or managing more than 5000 rows per lookup activity.

This article will walk you through overcoming these limitations when working with JSON files and API responses in ADF. By the end, you'll learn how to append multiple API responses into a single result and handle API calls with unknown pagination parameters.

While alternative solutions like Python or Power BI/FabricDataflow Gen2.0 may offer quicker implementations, there are cases where ADF is necessary due to restrictions or specific use cases. This guide is tailored for ADF developers or anyone interested in experimenting with new approaches.  

If you're familiar with working with APIs, you're likely aware that limitations on the number of rows per call are a common practice. Typically, APIs will let users know if there’s more data to retrieve by including a "next page URL" or similar parameter at the end of the response. This indicates that additional API calls are necessary to retrieve the full dataset.

However, in the scenario we faced, this part of the URL was missing, leaving no clear indication of whether more data remained in the system. Without this pagination rule, it's challenging to determine from a single successful API call whether more API requests are required to retrieve the complete dataset. This makes automating the data retrieval process more complicated, as you have to implement a method to check whether further calls are necessary.

This is the first issue we’ll solve using ADF.

The second issue involves handling API responses in JSON format and merging different JSON files into a single result.

If the requirement is to store all the data in one single JSON file, there are several approaches you can take:

  1. Create one single JSON file from all the responses and store or process it later.
  2. Generate multiple JSON files, one for each API call, and then flatten them into one file at a later stage in the ADF pipeline.
                  
     

Alternatively: Write the data directly to a SQL table, if the final destination is a database.

In this article, we’ll focus on Option 1—creating one single JSON file from multiple responses. While this may not be the ideal solution, it presents a unique challenge when working with JSON arrays in ADF.

While the solution itself is not overly complicated, there are a few important points where developers should proceed with caution.

First, be aware of the limitations ADF imposes on each LookUp Activity or Web Activity: the output size is restricted to 4 MBs or 5000 rows. If your response size slightly exceeds this limit, you can adjust the settings—lowering the number of rows per call from 1000 to, say, 800. However, keep in mind that this adjustment could significantly increase the overall runtime, especially if you're dealing with many columns of data. In such cases, consider an alternative approach, such as using the Copy Activity to write the data directly into a SQL database or generate multiple JSON files and merge these into one later.

Another critical point is the use of loops. The solution involves two loops, so it’s essential to carefully handle scenarios that could result in endless loops. Proper checks and conditions must be implemented to avoid such issues and ensure smooth execution.

Implementation - ADF

Here is the logic of the entire pipeline:

To manage API pagination and build a single valid JSON file in ADF, you will need to define several variables, as shown in the image above: Variables Setup:

           
  1. Skip X Rows (Integer):This variable will store the number of rows to skip in each REST API call, which is the dynamic part of the URL.
  2.        
  3. Skip X Rows - Temporary (Integer):Skip X Rows - Temporary (Integer): This variable is needed because ADF doesn’t support self-referencing for variables. You can’t directly update Skip X Rows using itself, so this temporary variable helps track progress.
  4.        
  5. REST API Response is empty? (Boolean):This Boolean flag will indicate whether the last API response was empty (i.e., no more data), triggering the loop to stop.
  6.        
  7. API Response Array (Array):Used to store each individual API response during the loop. This allows you to gather all responses one-by-one before processing them.
  8.        
  9. All API Response Array (Array) [Optional]:This array is optional and can be used to store all responses combined after the loop finishes.
  10.        
  11. Current JSON (String):Stores one individual API response in JSON format as a string.
  12.        
  13. Interim Combined (String):Stores the concatenated JSON responses as you append them together in the loop.
  14.        
  15. Combined JSON (String):Holds the final complete JSON result after all responses have been processed and combined.

Step-by-Step Execution

1)  Initialize Variables:

           
  • Set Skip X Rows to 0. This represents the starting point for the API.
           
  • Set Skip X Rows - Temporary to 0. This is a temporary counter to help update the primary skip rows.
           
  • Set REST API Response is empty? to false. This Boolean will control when to stop the loop.

         

2) Add an UNTIL activity: Set up a WHILE loop (or UNTIL activity) with the condition @equals(variables('REST API Response is empty?'), true) so it continues running as long as there is data to retrieve.

3) Inside the WHILE Loop: a) Lookup Activity (Initial API Call):

           
  • Perform a Lookup Activity calling the REST API, but limit the returned data to only one column (e.g., just the ID which should be never empty if present). This keeps the response light and allows you to check if more data exists.b) IF Condition (Check Response):
           
  • If the response is empty, set REST API Response is empty? to true to end the loop.
           
  • If not empty, proceed to the next step.c) Full API Call:
           
  • If the response is not empty, perform the full REST API call to retrieve the desired data.
           
  • Append the response to the API Response array variable.
  • d) Update Variables:
           
  • Increase Skip X Rows - Temporary by the number of rows retrieved (e.g., 1000).
           
  • Set Skip X Rows to the value of Skip X Rows - Temporary to update the dynamic part of the API URL.
  • 4) Handle Failure Scenarios:
           
  • Optionally, but highly recommended: add a Fail Condition or a Timeout Check. This condition will break the loop if there is a problem with the API response (e.g., a 404 error).

After gathering all the API responses, you'll have a list containing multiple JSON arrays. You’ll need to remove the unnecessary brackets, commas or other JSON elements. To do that you’ll need a for loop which iterates over all the JSON arrays in the array variable and modifies those accordingly.

The steps followed inside the for loop:

           
  1. Save the currently iterated JSON into 1 variable and save as string (string format is needed so you can manipulate the response as text).
           
  1. Modify JSONTo flatten a JSON files into a single file you need to remove the first “[“ or last “]” character so concatenating it will result in a valid file.

        3. Save it to another variable which will store the final JSON Due the lack of self referencing option in ADF you need to update the combined JSON variable every time a new JSON piece is added.  

And that’s it! You have successfully addressed both the pagination and JSON file handling challenges using ADF.

Implementation – Power BI

Compared to this, the solution in Power Query is much more straightforward. You need one function where you can control the number of rows you want to skip, which basically calls the API by 1000 rows. And you need another query which starts whit a while loop which calls the API as many times as it doesn’t return an empty response. Once it’s ready, you can combine the list of tables into one table. By expanding it, you’ll end up with the complete dataset.  Here is the code of the function:

let

   Source =  ( rows_to_skip as number ) =>

let

   Base_URL = "https://Your_API_URL",

   Relative_URL = "The relative URL part of your API call",

   Source = Json.Document(

       Web.Contents(

           Base_URL,

           [Relative_Path = Relative_URL & Number.ToText(rows_to_skip) ]

       )

   ),

   //Additionally you can convert the data directly with this function to table

   Convert_to_Table = Table.FromRecords ( {Source} )

in

   Convert_to_Table

in

   Source




Here is the query which will invoke the function and act as a while loop.  
let

Surce =

//Create a list of tables

List.Generate( () =>

// Try to call the function and set input parameter to 0 during the 1st call.

   [Result = try Function_by_1000(0) otherwise null, Page = 0],

//Checks if the inside of first row (referenced by the “{0}” part) is empty. Due to logic of this particular API it checks the “
results” inside the “d” parameter in the response.



   each not List.IsEmpty([Result]{0}[d.results]),

// Try to call the function again and increase the input parameter of the function by 1000 (max rows by API call)

   each [Result = try Function_by_1000([Page]+1000) otherwise null, Page = [Page]+1000],

   each [Result])

in

   Source

Blog posts

There's a new way to turn business ideas into app frameworks
April 23, 2025
5 min read

There's a new way to turn business ideas into app frameworks

Read blog

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

The past, present and future of ERP systems
April 9, 2025
6 min read

The past, present and future of ERP systems

Read blog

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

Ready to talk about your use cases?

Request your free audit by filling out this form. Our team will get back to you to discuss how we can support you.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Stay ahead with the latest insights
Subscribe to our newsletter for expert insights, industry updates, and exclusive content delivered straight to your inbox.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.