New dimensions in Nvidia DLSS 3 - AI upscaling


Two key PC technologies emerged towards the end of 2018: hardware-accelerated ray tracing and machine learning-based super-sampling. Both have been refined over the years since, and with the introduction of the RTX 4000 graphics line, DLSS 3 adds a new capability to Nvidia's performance-boosting technology. After ten days of testing, we've been impressed by the results.

Nvidia provided us with a GeForce RTX 4090 ahead of time, along with incomplete preview builds of three DLSS 3-enabled titles: Portal RTX, Marvel's Spider-Man and Cyberpunk 2077. The Cyberpunk 2077 build shouldn't be confused with the new RT Overdrive version, but even so, the RTX 4090 with DLSS 3 is capable of running it fully maxed, and these games, for the very first time…

DLSS 3 is a combination of three different techniques Nvidia has spent years developing. It starts with the existing, highly successful DLSS 2 (though Intel XeSS and AMD FSR 2.x are getting closer), which is joined by DLSS frame generation. Put simply, the GPU renders two frames and then generates a new frame to insert between them, a process accelerated by a new fixed-function block in the Ada Lovelace architecture, the optical flow accelerator, which Nvidia claims is three times faster than the Ampere equivalent.
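To make the core idea of frame generation concrete, here is a minimal sketch of motion-compensated interpolation on a single row of pixels. It is not Nvidia's method: DLSS 3 combines optical flow with other inputs in a proprietary neural network, whereas this toy assumes a known, uniform, even-valued flow and simply samples each rendered frame half a step along it. The function names are hypothetical.

```python
# Toy flow-guided frame interpolation on a 1D row of pixels.
# Assumptions (illustrative only): integer, even displacements and a single
# flow field indexed by output position. DLSS 3 uses a neural network instead.

def clamp(i: int, n: int) -> int:
    """Clamp an index into the valid range [0, n - 1]."""
    return max(0, min(n - 1, i))

def interpolate_midpoint(prev, nxt, flow):
    """Build the frame halfway between two rendered frames.

    flow[x] is the motion, in pixels, of the content passing through
    output position x between `prev` and `nxt`.
    """
    n = len(prev)
    mid = []
    for x in range(n):
        half = flow[x] // 2
        a = prev[clamp(x - half, n)]  # where the content was in the earlier frame
        b = nxt[clamp(x + half, n)]   # where it will be in the later frame
        mid.append((a + b) / 2)
    return mid

def naive_blend(prev, nxt):
    """Plain cross-fade with no motion information, for comparison."""
    return [(a + b) / 2 for a, b in zip(prev, nxt)]

frame_a = [0, 0, 9, 0, 0, 0, 0, 0, 0, 0]  # bright pixel at x = 2
frame_b = [0, 0, 0, 0, 0, 0, 9, 0, 0, 0]  # the same pixel has moved to x = 6
flow = [4] * len(frame_a)                  # uniform pan of 4 pixels

print(interpolate_midpoint(frame_a, frame_b, flow))
# → [0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(naive_blend(frame_a, frame_b))
# → [0.0, 0.0, 4.5, 0.0, 0.0, 0.0, 4.5, 0.0, 0.0, 0.0]
```

A plain cross-fade leaves two half-bright ghosts, while following the motion places the object where it should be halfway through the move, which is why accurate motion tracking matters so much to the quality of generated frames.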

Because frames are now being buffered, extra latency is added to the pipeline, which Nvidia mitigates with the third component: its lag-reduction technology, Reflex. At worst, Reflex should offset the extra lag caused by the buffering, and it may even knock off further milliseconds. Frame generation depends on Ada Lovelace's new hardware, which means that prior Turing and Ampere cards cannot run it. That restriction does not apply to future versions of DLSS 2, which remain available on those cards.
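The buffering cost can be estimated with back-of-the-envelope arithmetic. The sketch below is a simplified pacing model, not Nvidia's figures: it assumes perfectly even frame pacing, a fixed generation cost (the 3ms figure quoted later in this article), and ignores whatever Reflex claws back. Under those assumptions, a freshly rendered frame waits while the generated in-between frame is shown, so it reaches the screen roughly half a base frame interval plus the generation cost later than it otherwise would.

```python
# Simplified pacing model of the latency added by holding back a rendered
# frame so a generated frame can be displayed first.
# Assumptions: perfectly even pacing, fixed generation cost, no Reflex savings.

GEN_COST_MS = 3.0  # per-frame generation cost quoted in this article

def base_frame_time_ms(base_fps: float) -> float:
    """Time between two rendered (non-generated) frames."""
    return 1000.0 / base_fps

def added_latency_ms(base_fps: float, gen_cost_ms: float = GEN_COST_MS) -> float:
    """Extra delay before a newly rendered frame reaches the screen.

    The generated in-between frame is shown first (after gen_cost_ms),
    and the real frame follows half a base frame interval later.
    """
    return gen_cost_ms + base_frame_time_ms(base_fps) / 2

for fps in (30, 60, 120):
    print(f"{fps:>3} fps base -> {2 * fps} fps shown, "
          f"~{added_latency_ms(fps):.1f} ms extra before Reflex")
```

Under this model, lower base frame-rates pay a larger latency penalty, which bears directly on whether turning a 30fps game into a 60fps one is a worthwhile trade.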

I'm reminded of the AFR (alternate frame rendering) techniques used with SLI in the past, where two graphics cards worked in tandem, each rendering every other frame. That approach brought a similar increase in latency, although frame generation does not need a second GPU. On the single GPU we tested, the latency implications of DLSS frame generation are not serious, though ultra-fast esports titles are another matter.

Extremely fast motion, particularly near the camera, can cause artefacts. However, the chances of noticing them are minimal, especially in actual gameplay. At 120fps, each frame is on screen for only around 8ms, so per-frame persistence is low and isolated errors aren't much of a problem. It's only really with prolonged eyeballing of individual frames that you can see where DLSS 3 frame generation falls short.


Even then, the results of the new technique, generated in around 3ms by the GPU, are far beyond the best offline frame-rate interpolation tools out there. We processed identical content from Marvel's Spider-Man using Adobe After Effects' Pixel Motion and Topaz Video Enhance AI's Chronos SlowMo V3 model. The per-frame calculation cost there, on a Ryzen 9 5950X, is much higher, and without the dedicated hardware acceleration DLSS 3 enjoys on the GPU silicon, the results are inevitably poorer than those shown…

The focus of the exercise is improving performance, but DLSS 3 also opens the door to new experiences. Portal RTX brings fully path-traced rendering to a classic PC game, part of a wider effort to produce path-traced renditions of older titles. In its keynote, Nvidia showed how Morrowind received a new RT look, but we've actually been hands-on with Portal RTX, which is a fascinating new way to experience the game.

In the first screenshot, you'll see a 3.19x performance uplift from DLSS 2 on its own, which rises to 5.29x with the addition of frame generation. Note that the DLSS 2 version is also much more responsive than native rendering.

Portal RTX, Test Chamber 14 | Perf Differential | Latency, Reflex Off | Latency, Reflex On
Native 4K | 100% | 129ms | 95ms
DLSS 2 Performance | 317% | 59ms | 53ms
DLSS 3 Frame Generation | 529% | - | 56ms

Our Marvel's Spider-Man test presents a completely different challenge: even with a Core i9 12900K, today's GPUs can easily be bottlenecked by the CPU when the game's ray-traced reflections are enabled. What's interesting about this quick-time event is that at native 4K we're GPU-constrained, while DLSS 2 lightens the GPU load to the point where CPU limitations take over.

Because DLSS 3 frame generation does not rely on the CPU preparing instructions for the frames it creates, the performance increase still kicks in when the CPU is fully tapped out. The whole sequence in the trailer is CPU-constrained at around 100-120fps, and DLSS 3 frame generation effectively doubles the frame-rate.
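A toy throughput model helps show why frame generation keeps scaling when the CPU is tapped out. It assumes perfectly overlapped CPU and GPU work, hypothetical per-frame timings, and a generation cost that lands only on the GPU (3ms by default, after the figure quoted in this article). Real pipelines are messier, so treat this as an illustration rather than a prediction.

```python
# Toy throughput model: why frame generation still scales when the CPU is
# the bottleneck. Assumptions (illustrative only): CPU and GPU work are fully
# pipelined, and each generated frame costs gen_ms of GPU time and no CPU time.

def presented_fps(cpu_ms: float, gpu_ms: float,
                  frame_gen: bool, gen_ms: float = 3.0) -> float:
    """Frames per second reaching the display under the toy model."""
    if not frame_gen:
        # One frame per cycle, limited by the slower of CPU and GPU.
        return 1000.0 / max(cpu_ms, gpu_ms)
    # Each cycle yields one rendered plus one generated frame; only the GPU
    # pays for the generated one.
    cycle_ms = max(cpu_ms, gpu_ms + gen_ms)
    return 2 * 1000.0 / cycle_ms

# CPU-bound case (hypothetical timings): the GPU has spare time for generation.
print(presented_fps(9.0, 5.0, False), presented_fps(9.0, 5.0, True))
# GPU-bound case: generation competes with rendering for GPU time.
print(presented_fps(5.0, 20.0, False), presented_fps(5.0, 20.0, True))
```

In the CPU-bound case the GPU has idle time to absorb the generation work, so presented frames double exactly. In the GPU-bound case the gain shrinks to 2 x gpu_ms / (gpu_ms + gen_ms), which approaches 2x as the base frame time grows. That is loosely consistent with the observation that lower base frame-rates see larger multipliers, though DLSS 2 upscaling also contributes to the totals measured in these tests.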

I stressed the GPU as much as possible, particularly around Peter Parker's visit to Feast HQ. Unfortunately, Reflex does little to cut latency here, with or without DLSS 3: it is most effective when the GPU is the bottleneck, which is difficult to achieve when the CPU is running at its performance limit. Nonetheless, the game is so fast that latency figures are extremely low across the board.

Marvel's Spider-Man, Feast HQ | Perf Differential | Latency, Reflex Off | Latency, Reflex On
Native 4K | 100% | 39ms | 36ms
DLSS 2 Performance | 136% | 24ms | 23ms
DLSS 3 Frame Generation | 219% | - | 38ms

The final title tested was a CD Projekt RED preview build of Cyberpunk 2077. On a long drive through Night City and out into the desert, there's more evidence that the lower the base frame-rate, the greater the performance multiplier.

Frame-rates increase by up to a factor of four once again, in one of the most demanding PC games around. In the video embedded at the top of the page, you'll see a fair amount of 4K 120fps capture slowed to 50 percent speed to fit into a 60fps video, and even so, a sense of the added fluidity comes through.

Latency with DLSS 3 and Reflex cannot match DLSS 2 with Reflex off, which I expect to be the "unofficial" target. Even so, the 12ms deficit here is unlikely to matter outside of a twitch shooter or a competitive esports event. With that said, we'll certainly need to see how latency fares in other DLSS 3 titles going forward.

Cyberpunk 2077, Market | Perf Differential | Latency, Reflex Off | Latency, Reflex On
Native 4K | 100% | 108ms | 62ms
DLSS 2 Performance | 258% | 42ms | 31ms
DLSS 3 Frame Generation | 399% | - | 54ms
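The latency comparisons in the text can be checked directly against the three tables. This short script simply transcribes the measured figures (the dictionary layout and function name are our own, hypothetical choices) and subtracts each reference configuration from the DLSS 3 result, reproducing the 12ms deficit against DLSS 2 with Reflex off in Cyberpunk 2077.

```python
# Latency figures (ms) transcribed from the three tables, as
# (Reflex Off, Reflex On). None marks the Reflex Off column, which does not
# apply to DLSS 3 frame generation.
LATENCY_MS = {
    "Portal RTX": {"native": (129, 95), "dlss2": (59, 53), "dlss3": (None, 56)},
    "Marvel's Spider-Man": {"native": (39, 36), "dlss2": (24, 23), "dlss3": (None, 38)},
    "Cyberpunk 2077": {"native": (108, 62), "dlss2": (42, 31), "dlss3": (None, 54)},
}

def dlss3_deltas(game: str) -> dict:
    """DLSS 3 latency minus each reference configuration (positive = slower)."""
    row = LATENCY_MS[game]
    dlss3 = row["dlss3"][1]
    return {
        "vs DLSS 2, Reflex Off": dlss3 - row["dlss2"][0],
        "vs DLSS 2, Reflex On": dlss3 - row["dlss2"][1],
        "vs native, Reflex On": dlss3 - row["native"][1],
    }

for game in LATENCY_MS:
    print(game, dlss3_deltas(game))
```

The same comparison comes out at -3ms in Portal RTX and +14ms in Marvel's Spider-Man, so the gap against that "unofficial" target varies considerably from title to title.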

We also compared the RTX 4090 against the RTX 3090 Ti, the last-gen Ampere architecture's silicon champion, but Nvidia asked for pure performance statistics to be held back until the review embargo lifts, when users can compare performance using the numbers provided by the PC press. While a limited DLSS 2 vs DLSS 3 comparison is not ideal, I'd argue that it does represent the likely use-case scenario for these cards.

The first image comes from the static scene in Test Chamber 14 where I found the highest GPU load, with two portals facing one another. DLSS 2 on Ampere vs DLSS 3 on Ada Lovelace delivers roughly a three-times increase in performance overall. It's game-changing in that, at the most basic level, a merely good experience on a 4K 60Hz variable refresh rate screen becomes close to flawless.

The same applies to the Cyberpunk 2077 preview build, where the gen-on-gen performance multiplier may not be as large as in Portal RTX, but the base frame-rate on the RTX 3090 Ti side is higher. Once again, it's the difference between a good 60Hz VRR experience on the older card and a great 120Hz VRR experience with the RTX 4090.

Test | RTX 3090 Ti DLSS 2 | RTX 4090 DLSS 3
Portal RTX Stress Test | 100% | 291%
Cyberpunk 2077 Market | 100% | 247%

First, let's get down to brass tacks and identify the obvious issues. How visible they are depends on the speed of the action and the ability of the DLSS 3 algorithm to track movement. The running Spider-Man image in the zoomer block above is a particularly challenging example. However, with motion this fast, it can be difficult to see how the three Spider-Man frames differ at full speed. The sequence also shows how briefly each of these three images is on screen.

In this scenario, look at the third-person Spider-Man image to the left of it in the zoomer block. Again, switch to full image mode and cycle between the three frames, captured across a total of 24.9 million pixels. This represents something closer to normal motion within the game, and played on a 120Hz screen, it presents as a touch of flicker.

The next obvious question is why DLSS 3 frame generation isn't available on RTX 2000 and 3000 cards. Nvidia says that the optical flow accelerator in Ada Lovelace is three times faster than the Ampere equivalent, which would have serious implications for DLSS 3's 3ms generation cost on older hardware. One thing that Alex Battaglia and I noticed in image quality comparisons with Adobe's Pixel Motion and Topaz Video Enhance AI's Chronos SlowMo model is that even poor…

Next up, let's dig deeper into how frame generation can help overcome the CPU limit. In Marvel's Spider-Man, our tests with the Core i9 12900K showed frame generation doubling the game's performance while the game remained responsive, even though the base frame-rate was entirely limited by the CPU. However, frame generation might be better described as frame amplification: if the CPU isn't delivering consistent frame-times, the stutter may be amplified too. There are a lot of…

The intention was to cover DLSS 3 in broad strokes without pre-empting the full review, but the work ended up being more thorough than we imagined, and there is still more to explore in terms of what DLSS 3 offers and how it should be tested.

When gaming at an amplified 120 frames per second, there's the question of how low the base frame-rate can go after DLSS 2 upscaling. Visual discontinuities in AI-generated frames are difficult to spot at that speed, but does DLSS 3 actually succeed in transforming a 30fps game into a 60fps one? Are there inherent limitations in the image interpolation that we've not yet seen?

With the RT Overdrive upgrade for Cyberpunk 2077, we see something potentially significant. This is a game that has been designed with ray tracing, and it is now, in effect, a game with two different renderers. RT Overdrive is a path-traced rendition of one of the most demanding PC games on the market, and an incredible experience for PC users going forward. Consoles could never do this, and we'll be returning to DLSS 3 and Cyberpunk 2077…
