
World Modelings on MindSpore

Let's explore the potential of world modeling on MindSpore ;)

Note

The interactive UI below renders best in VS Code.

News

Janus-Pro is supported!


The MindSpore implementation of Janus-Pro training/inference is now released, supporting both multimodal understanding and visual generation on Ascend NPUs. Decoupling visual encoding into generation-specific and understanding-specific pathways is what brings the omni-capability. Details can be found here.

MVDream is supported!

MVDream is a diffusion model that generates consistent multiview images from a given text prompt. It shows that, by learning from both 2D and 3D data, a multiview diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. Details can be found here.

| Input Prompt | Rendered Multiview Video | 3D Mesh Generation in Color |
| --- | --- | --- |
| an astronaut riding a horse | ast.mp4 | <iframe title="an astronaut riding a horse_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/2191db5b61834839aac5238f60d70e59/embed"></iframe> |
| Michelangelo style statue of dog reading news on a cellphone | mich.mp4 | <iframe title="Michelangelo style statue of dog reading news_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/c21773f276884a5db7d47e41926645e4/embed"></iframe> |

These videos are rendered from the 3D implicit field trained by our MVDream model. Color meshes are extracted with the script MVDream-threestudio/extract_color_mesh.py.

InstantMesh is supported!

We support InstantMesh for 3D mesh generation, using the multiview images produced by the SV3D pipeline.


Taking the multiview images produced by the SV3D pipeline as input, we extracted the 3D meshes below. The corresponding input images are illustrated in the SV3D section that follows.

| akun | anya |
| --- | --- |
| <iframe title="akun_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/c8b5b475529d48589b85746aab638d2b/embed"></iframe> | <iframe title="anya_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/180fd247ba2f4437ac665114a4cd4dca/embed"></iframe> |

The illustrations here are best viewed in viewers with HTML support (e.g., the VS Code built-in viewer).

Stable Video 3D is supported!

Output: multiview images (21×576×576)

SV3D is a camera-guided diffusion model that generates a multiview snippet from a given input image. Details can be found here.

More Inference Demos
| Input | Output |
| --- | --- |
| aaa | aaa multiview |
| akun | akun multiview |
| anya | anya multiview |
| bag | bag multiview |
| groot | groot multiview |
| princess-large | princess-large multiview |

Quick tour

To install MindONE v0.3.0, please install MindSpore 2.5.0 and run `pip install mindone`.

Alternatively, to install the latest version from the master branch, please run:

```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
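
Once installed, a quick smoke test confirms both packages import cleanly (a minimal sketch; it assumes both packages expose `__version__`, as standard pip packages do):

```python
# Verify the installation by importing both packages and printing versions.
import mindspore
import mindone

print("MindSpore:", mindspore.__version__)  # expect 2.5.0 per the note above
print("MindONE:", mindone.__version__)      # expect 0.3.0 for the pinned release
```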

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 medium checkpoint in half precision.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The pipeline returns a tuple whose first element is the list of generated images.
image = pipe(prompt)[0][0]
image.save("sd3.png")
```

Run HF diffusers on MindSpore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.5.0 on Ascend Atlas 800T A2 machines.
  • compatible with HF diffusers 0.32.2

| component | features | count |
| --- | --- | --- |
| pipeline | supports text-to-image, text-to-video, and text-to-audio tasks | 160+ |
| models | supports autoencoder & transformer base models, same as HF diffusers | 50+ |
| schedulers | supports diffusion schedulers (e.g., DDPM and DPM-Solver), same as HF diffusers | 35+ |
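
Because the scheduler classes track their HF diffusers counterparts, swapping the sampler follows the familiar `from_config` pattern. A minimal sketch (assuming `DPMSolverMultistepScheduler` is among the 35+ ported schedulers, as its HF counterpart is):

```python
from mindone.diffusers import DPMSolverMultistepScheduler

# Swap the pipeline's default sampler for DPM-Solver,
# reusing the existing scheduler's configuration.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, num_inference_steps=20)[0][0]  # multistep solvers need fewer steps
image.save("sd3_dpm.png")
```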

supported models under mindone/examples

| task | model | inference | finetune | pretrain | institute |
| --- | --- | --- | --- | --- | --- |
| Image-to-Video | hunyuanvideo-i2v 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | wan2.1 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | cogview4 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipuai |
| Text-to-Video | step_video_t2v 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
| Image-Text-to-Text | qwen2_vl 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Any-to-Any | janus 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 🔥🔥 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B 🔥🔥 | ✅ | ✅ | ✅ | Zhipu |
| Text-to-Video | open sora plan 1.3 🔥🔥 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
| Text-to-Video | movie gen 30B 🔥🔥 | ✅ | ✅ | ✅ | Meta |
| Video-Encode-Decode | magvit | ✅ | ✅ | ✅ | Google |
| Text-to-Image | story_diffusion | ✅ | ✖️ | ✖️ | ByteDance |
| Image-to-Video | dynamicrafter | ✅ | ✖️ | ✖️ | Tencent |
| Video-to-Video | venhancer | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
| Text-to-Video | t2v_turbo | ✅ | ✅ | ✅ | Google |
| Image-to-Video | svd | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Video | animate diff | ✅ | ✅ | ✅ | CUHK |
| Text/Image-to-Video | video composer | ✅ | ✅ | ✅ | Alibaba |
| Text-to-Image | flux 🔥 | ✅ | ✅ | ✖️ | Black Forest Labs |
| Text-to-Image | stable diffusion 3 🔥 | ✅ | ✅ | ✖️ | Stability AI |
| Text-to-Image | kohya_sd_scripts | ✅ | ✅ | ✖️ | kohya |
| Text-to-Image | stable diffusion xl | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Image | stable diffusion | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Image | hunyuan_dit | ✅ | ✅ | ✅ | Tencent |
| Text-to-Image | pixart_sigma | ✅ | ✅ | ✅ | Huawei |
| Text-to-Image | fit | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Class-to-Video | latte | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Class-to-Image | dit | ✅ | ✅ | ✅ | Meta |
| Text-to-Image | t2i-adapter | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Text-to-Image | ip adapter | ✅ | ✅ | ✅ | Tencent |
| Text-to-3D | mvdream | ✅ | ✅ | ✅ | ByteDance |
| Image-to-3D | instantmesh | ✅ | ✅ | ✅ | Tencent |
| Image-to-3D | sv3d | ✅ | ✅ | ✅ | Stability AI |
| Text/Image-to-3D | hunyuan3d-1.0 | ✅ | ✅ | ✅ | Tencent |

supported captioner

| task | model | inference | finetune | pretrain | features |
| --- | --- | --- | --- | --- | --- |
| Image-Text-to-Text | pllava 🔥 | ✅ | ✖️ | ✖️ | supports video and image captioning |
