This repository was archived by the owner on Oct 26, 2023. It is now read-only.

Commit 917ab1a

brandonh-msft authored and nzthiago committed
Adding 'Notes' section to README.MD (#4)
* Adding 'Notes' section to README.MD - fixes #3
* Update README.md
1 parent 20ead2e commit 917ab1a

File tree

1 file changed (+5 −0 lines changed)


README.md

Lines changed: 5 additions & 0 deletions
@@ -207,3 +207,8 @@ After deployment:
 - Issue an HTTP POST to that endpoint with the `path` parameter populated from the output of the PowerShell script you ran in [2.1](#21-copy-the-dataset-to-an-azure-blob-storage-instance)
 
 You'll receive back a list of URLs you can use to check status, issue new events (not handled by this sample), or terminate the orchestration.
+
+# Notes
+The implementation shown here utilizes a single reducer. If you needed multiple reducers, you would create **a sub-orchestration per reducer**, launch those in parallel, make the `CallActivityAsync()` calls within each sub-orchestrator, reduce the results, pass them up to the parent orchestrator to reduce further, and so on.
+
+It's also important to remember that while serverless technologies allow you to scale "infinitely", you must use the right tool for the right job. Durable Functions will, in theory, scale out to run any number of jobs in parallel and come back to the reduce step, so **this approach may work very well for loads that can be highly parallelized**. However, the machines on which Azure Functions run (in the Consumption plan) have limited specs ([more detail can be found here in the docs](https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#how-the-consumption-plan-works)). This means **this approach _may not_ work well for loads with very large sections to be processed by each mapper**. In that case, you may look into hosting this implementation on an App Service plan with large VMs or, even better, a Functions Premium offering with a large VM size to process the data more quickly. Another thing to note is the way Durable Functions passes results and parameters around; this is done via Azure Storage queues, which may or may not add unacceptable latency into your big data process.
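
The first added paragraph describes fanning out one sub-orchestration per reducer. A minimal sketch of that shape, assuming the Durable Functions 2.x C# API (`IDurableOrchestrationContext`); the function names (`ParentOrchestrator`, `ReducerOrchestrator`, `Mapper`), the string-partition input, and the integer-summing reduce are hypothetical placeholders, not part of this sample:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class MultiReducerSketch
{
    // Parent orchestrator: launches one sub-orchestration per reducer in
    // parallel, waits for all of them, then performs the final reduce.
    [FunctionName("ParentOrchestrator")]
    public static async Task<int> RunParent(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var partitions = context.GetInput<List<List<string>>>();

        // One sub-orchestration per reducer, all started before any await.
        var subOrchestrations = partitions
            .Select(p => context.CallSubOrchestratorAsync<int>("ReducerOrchestrator", p))
            .ToList();

        int[] partialResults = await Task.WhenAll(subOrchestrations);

        // Final reduce in the parent orchestrator.
        return partialResults.Sum();
    }

    // Sub-orchestrator: makes the CallActivityAsync() calls for its
    // partition's mappers, then reduces their output.
    [FunctionName("ReducerOrchestrator")]
    public static async Task<int> RunReducer(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var items = context.GetInput<List<string>>();

        var mapTasks = items
            .Select(item => context.CallActivityAsync<int>("Mapper", item))
            .ToList();

        int[] mapped = await Task.WhenAll(mapTasks);
        return mapped.Sum();
    }
}
```

Starting all the tasks before awaiting `Task.WhenAll` is what produces the parallel fan-out; awaiting each call in a loop would run the sub-orchestrations sequentially.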
