While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at this https URL.
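The two-stage pipeline described above can be sketched as follows. This is a minimal illustration of the control flow only: both model functions are hypothetical placeholders standing in for trained diffusion models, and none of the names correspond to the released code.

```python
import numpy as np

# Hypothetical stand-in for a text-to-image diffusion model;
# a real model would return an H x W x 3 rendering of the prompt.
def text_to_image_diffusion(prompt: str) -> np.ndarray:
    rng = np.random.default_rng(0)
    return rng.random((64, 64, 3))  # placeholder "synthetic view"

# Hypothetical stand-in for the image-conditioned point cloud
# diffusion model; a real model would denoise an N x 3 set of
# XYZ coordinates while conditioning on the input image.
def image_to_point_cloud_diffusion(image: np.ndarray,
                                   num_points: int = 1024) -> np.ndarray:
    rng = np.random.default_rng(1)
    return rng.random((num_points, 3))  # placeholder point cloud

def text_to_3d(prompt: str) -> np.ndarray:
    # Stage 1: generate a single synthetic view from the text prompt.
    view = text_to_image_diffusion(prompt)
    # Stage 2: condition a second diffusion model on that view
    # to produce the 3D point cloud.
    return image_to_point_cloud_diffusion(view)

cloud = text_to_3d("a red chair")
print(cloud.shape)  # (1024, 3)
```

Splitting generation into an image stage and a point cloud stage is what allows each model to stay small enough for fast single-GPU sampling, at some cost in sample quality relative to methods that optimize a 3D representation directly.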