Before starting the app, we send the following values onto the name topic. Remember that this topic is compacted.
```shell
tom:perks
tom:matthews
tom:stevens
sharon:news
sharon:car
tom:party
```
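For context, here is a minimal, hypothetical sketch of what a stream-table self join producing the log lines below could look like. The topic, output names and wiring are assumptions rather than the exact contents of `BootstrapSemanticsSelfJoinTopology.kt`; `toTable()` is used because `StreamsBuilder` will not allow the same topic to be registered as both a stream source and a table source.

```kotlin
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.Topology

// Hypothetical self join on the "name" topic (assumes String default serdes).
fun selfJoinTopology(): Topology {
    val builder = StreamsBuilder()
    val names = builder.stream<String, String>("name")
    // Derive the table from the stream; registering the "name" topic twice
    // (once as a stream, once as a table) would throw a TopologyException.
    val nameTable = names.toTable()
    names
        .peek { key, name -> println("Processing $key, $name") }
        .join(nameTable) { streamName, tableName ->
            println("Joining the Stream Name $streamName to the KTable Name $tableName")
            tableName
        }
        .to("output-topic")
    return builder.build()
}
```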
As expected, we now process all of the values: with the cache disabled, the buffering and cache layer does not merge the records.
```shell
Processing tom, perks
Joining the Stream Name perks to the KTable Name perks
Processing tom, matthews
Joining the Stream Name matthews to the KTable Name matthews
Processing tom, stevens
Joining the Stream Name stevens to the KTable Name stevens
Processing sharon, news
Joining the Stream Name news to the KTable Name news
Processing sharon, car
Joining the Stream Name car to the KTable Name car
Processing tom, party
Joining the Stream Name party to the KTable Name party
```
All of the values are output to the topic.
```shell
perks
matthews
stevens
news
car
party
```
Let's run the same example again, this time with the cache turned back on.
```shell
tom:perks
tom:matthews
tom:stevens
sharon:news
sharon:car
tom:party
```
This results in the data being merged, which is what we expected. There is no guarantee that the data is compacted, though; it depends on `streamsConfiguration[StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG]`, and `COMMIT_INTERVAL_MS_CONFIG` should also be considered (see the configuration sketch below).
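As a hedged illustration of those two settings (the config keys are real `StreamsConfig` constants, but the values here are only example choices, not necessarily the ones used in this repo):

```kotlin
import org.apache.kafka.streams.StreamsConfig
import java.util.Properties

val streamsConfiguration = Properties()
// 0 disables the record cache entirely, so every update is forwarded downstream;
// a non-zero value (e.g. 10 MiB) enables buffering, and therefore merging.
streamsConfiguration[StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG] = 10 * 1024 * 1024
// The commit interval bounds how long records sit in the cache before being
// flushed, so it also affects how much merging is observed.
streamsConfiguration[StreamsConfig.COMMIT_INTERVAL_MS_CONFIG] = 1000
```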
```shell
Processing sharon, car
Joining the Stream Name car to the KTable Name car
Processing tom, party
Joining the Stream Name party to the KTable Name party
```
Now, as per this [JIRA](https://issues.apache.org/jira/browse/KAFKA-4113), you can set the timestamps of the messages to 0, and this will ensure the KTable behaves like a GlobalKTable.
Now let's follow this advice and use the custom timestamp extractor, putting the same shape of data onto the topic. This time we expect that, even with no cache, the data will only join with the latest timestamped record.

The data will still stream in order, but the join will only ever be with the latest value.
Place the data onto the topic; note that it is new data this time.
```shell
clark:perks
clark:matthews
clark:stevens
sarah:news
sarah:car
clark:party
```
Interestingly, with the cache disabled and this custom timestamp extractor returning zero, we still process all of the events, and each joins with the record of the same timestamp.
```shell
Processing sarah, news
Joining the Stream Name news to the KTable Name news
Processing sarah, car
Joining the Stream Name car to the KTable Name car
Processing clark, perks
Joining the Stream Name perks to the KTable Name perks
Processing clark, matthews
Joining the Stream Name matthews to the KTable Name matthews
Processing clark, stevens
Joining the Stream Name stevens to the KTable Name stevens
Processing clark, party
Joining the Stream Name party to the KTable Name party
```
If you read further up the JIRA thread, you can see why:
```shell
What you could do is to write a custom timestamp extractor, and return `0` for each table side record and wall-clock time for each stream side record. In `extract()` you get a `ConsumerRecord` and can inspect the topic name to distinguish between both. Because `0` is smaller than wall-clock time, you can "bootstrap" the table to the end of the topic before any stream-side record gets processed.
```
We only need to set zero for the bootstrap, but here we are doing a self join.
Therefore we can implement a custom transformer and change the timestamp back to the correct one on the stream flow, whilst setting it to zero on the KTable consume.
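Here is a minimal sketch of what such a transformer might look like. The class name is hypothetical, and it assumes the extractor below has already zeroed every timestamp at consumption; `To.all().withTimestamp(...)` restores wall-clock time on the stream side.

```kotlin
import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.Transformer
import org.apache.kafka.streams.processor.ProcessorContext
import org.apache.kafka.streams.processor.To

// Hypothetical transformer that re-stamps stream-side records with wall-clock
// time after IgnoreTimestampExtractor has set every timestamp to zero.
class WallClockTimestampTransformer<K, V> : Transformer<K, V, KeyValue<K, V>?> {

    private lateinit var context: ProcessorContext

    override fun init(context: ProcessorContext) {
        this.context = context
    }

    override fun transform(key: K, value: V): KeyValue<K, V>? {
        // Forward with the current wall-clock time instead of the zeroed timestamp.
        context.forward(key, value, To.all().withTimestamp(System.currentTimeMillis()))
        return null // already forwarded manually
    }

    override fun close() {}
}
```

This would be applied with `KStream.transform(...)` on the stream side before the join.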
Here is the custom timestamp extractor, where all timestamps are set to zero.
```kotlin
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.streams.processor.TimestampExtractor

// Returns 0 for every consumed record; the stream side then gets its
// wall-clock timestamp restored by the transformer above.
class IgnoreTimestampExtractor : TimestampExtractor {
    override fun extract(record: ConsumerRecord<Any, Any>?, partitionTime: Long): Long {
        return 0L
    }
}
```
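One way to wire it in, assuming it should apply to everything the app consumes (this wiring is an assumption, not necessarily how the repo does it), is to register it on the `streamsConfiguration` properties from earlier:

```kotlin
// Register as the default extractor for all source topics; alternatively,
// Consumed.with(...).withTimestampExtractor(...) scopes it to a single source.
streamsConfiguration[StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG] =
    IgnoreTimestampExtractor::class.java
```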
Note that topology optimization (`StreamsConfig.OPTIMIZE`) can be enabled so that no internal changelog topic is created for the table; this requires the source topic to be compacted (see https://stackoverflow.com/questions/57164133/kafka-stream-topology-optimization).