
World Modelings on MindSpore

Let's explore the potential of world modeling on MindSpore ;)

Note

The interactive UI below renders best in VS Code.

News

Janus-Pro is supported!


The MindSpore implementation of Janus-Pro training/inference is now released, supporting both multimodal understanding and visual generation on Ascend NPUs. Decoupling visual encoding into generation-specific and understanding-specific pathways is what brings the omni-capability. Details can be found here.

MVDream is supported!

MVDream is a diffusion model that generates consistent multiview images from a given text prompt. It shows that, by learning from both 2D and 3D data, a multiview diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. Details can be found here.

| Input Prompt | Rendered Multiview Video | 3D Mesh Generation in Color |
| --- | --- | --- |
| an astronaut riding a horse | ast.mp4 | <iframe title="an astronaut riding a horse_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/2191db5b61834839aac5238f60d70e59/embed"></iframe> |
| Michelangelo style statue of dog reading news on a cellphone | mich.mp4 | <iframe title="Michelangelo style statue of dog reading news_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/c21773f276884a5db7d47e41926645e4/embed"></iframe> |

These videos are rendered from the 3D implicit field trained by our MVDream model. Color meshes are extracted with the script MVDream-threestudio/extract_color_mesh.py.

InstantMesh is supported!

We support InstantMesh for 3D mesh generation, using the multiview images produced by the SV3D pipeline.


Taking the multiview images produced by the SV3D pipeline as input, we extracted the 3D meshes below. The corresponding input images are illustrated in the SV3D section that follows.

| akun | anya |
| --- | --- |
| <iframe title="akun_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/c8b5b475529d48589b85746aab638d2b/embed"></iframe> | <iframe title="anya_ms" frameborder="0" allowfullscreen mozallowfullscreen="true" webkitallowfullscreen="true" allow="autoplay; fullscreen; xr-spatial-tracking" xr-spatial-tracking execution-while-out-of-viewport execution-while-not-rendered web-share src="https://sketchfab.com/models/180fd247ba2f4437ac665114a4cd4dca/embed"></iframe> |

The illustrations here are best viewed in viewers with HTML support (e.g., the VS Code built-in viewer).

Stable Video 3D is supported!

Output: multiview images (21×576×576)

SV3D is a camera-guided diffusion model that generates a multiview snippet from a given input image. Details can be found here.

More Inference Demos
| Input | Output |
| --- | --- |
| aaa | aaa multiview |
| akun | akun multiview |
| anya | anya multiview |
| bag | bag multiview |
| groot | groot multiview |
| princess-large | princess-large multiview |

Quick tour

To install MindONE v0.3.0, please install MindSpore 2.5.0 and run `pip install mindone`.

Alternatively, to install the latest version from the master branch, please run:

```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
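
Once installed, a quick smoke test confirms both packages import cleanly (a minimal sketch; it assumes both packages expose `__version__`, as standard pip packages do):

```python
# Verify the installation by importing both packages and printing versions.
import mindspore
import mindone

print("MindSpore:", mindspore.__version__)  # expect 2.5.0 per the note above
print("MindONE:", mindone.__version__)      # expect 0.3.0 for the pinned release
```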

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 medium checkpoint in half precision.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The pipeline returns a tuple whose first element is the list of generated images.
image = pipe(prompt)[0][0]
image.save("sd3.png")
```

Run HF diffusers on MindSpore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.5.0 on Ascend Atlas 800T A2 machines.
  • compatible with HF diffusers 0.32.2

| component | features | count |
| --- | --- | --- |
| pipeline | supports text-to-image, text-to-video, and text-to-audio tasks | 160+ |
| models | supports autoencoder & transformer base models, same as HF diffusers | 50+ |
| schedulers | supports diffusion schedulers (e.g., DDPM and DPM-Solver), same as HF diffusers | 35+ |
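
Because the scheduler classes track their HF diffusers counterparts, swapping the sampler follows the familiar `from_config` pattern. A minimal sketch (assuming `DPMSolverMultistepScheduler` is among the 35+ ported schedulers, as its HF counterpart is):

```python
from mindone.diffusers import DPMSolverMultistepScheduler

# Swap the pipeline's default sampler for DPM-Solver,
# reusing the existing scheduler's configuration.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, num_inference_steps=20)[0][0]  # multistep solvers need fewer steps
image.save("sd3_dpm.png")
```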

supported models under mindone/examples

| task | model | inference | finetune | pretrain | institute |
| --- | --- | --- | --- | --- | --- |
| Image-to-Video | hunyuanvideo-i2v 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | wan2.1 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | cogview4 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipuai |
| Text-to-Video | step_video_t2v 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
| Image-Text-to-Text | qwen2_vl 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Any-to-Any | janus 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 🔥🔥 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B 🔥🔥 | ✅ | ✅ | ✅ | Zhipu |
| Text-to-Video | open sora plan 1.3 🔥🔥 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
| Text-to-Video | movie gen 30B 🔥🔥 | ✅ | ✅ | ✅ | Meta |
| Video-Encode-Decode | magvit | ✅ | ✅ | ✅ | Google |
| Text-to-Image | story_diffusion | ✅ | ✖️ | ✖️ | ByteDance |
| Image-to-Video | dynamicrafter | ✅ | ✖️ | ✖️ | Tencent |
| Video-to-Video | venhancer | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
| Text-to-Video | t2v_turbo | ✅ | ✅ | ✅ | Google |
| Image-to-Video | svd | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Video | animate diff | ✅ | ✅ | ✅ | CUHK |
| Text/Image-to-Video | video composer | ✅ | ✅ | ✅ | Alibaba |
| Text-to-Image | flux 🔥 | ✅ | ✅ | ✖️ | Black Forest Labs |
| Text-to-Image | stable diffusion 3 🔥 | ✅ | ✅ | ✖️ | Stability AI |
| Text-to-Image | kohya_sd_scripts | ✅ | ✅ | ✖️ | kohya |
| Text-to-Image | stable diffusion xl | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Image | stable diffusion | ✅ | ✅ | ✅ | Stability AI |
| Text-to-Image | hunyuan_dit | ✅ | ✅ | ✅ | Tencent |
| Text-to-Image | pixart_sigma | ✅ | ✅ | ✅ | Huawei |
| Text-to-Image | fit | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Class-to-Video | latte | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Class-to-Image | dit | ✅ | ✅ | ✅ | Meta |
| Text-to-Image | t2i-adapter | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Text-to-Image | ip adapter | ✅ | ✅ | ✅ | Tencent |
| Text-to-3D | mvdream | ✅ | ✅ | ✅ | ByteDance |
| Image-to-3D | instantmesh | ✅ | ✅ | ✅ | Tencent |
| Image-to-3D | sv3d | ✅ | ✅ | ✅ | Stability AI |
| Text/Image-to-3D | hunyuan3d-1.0 | ✅ | ✅ | ✅ | Tencent |

supported captioner

| task | model | inference | finetune | pretrain | features |
| --- | --- | --- | --- | --- | --- |
| Image-Text-to-Text | pllava 🔥 | ✅ | ✖️ | ✖️ | supports video and image captioning |
