WineD3D: Use the shader backend to enable / disable atifs and nvts

The drawback of this is that now the workaround for the mesa atifs bug
minimally affects nvidia cards too. It's a bit tricky to design around
driver bugs :-/

Use this patch instead; the last one had a small but pretty bad bug that
broke GLSL.

This patch doesn't make sense at all. Wasn't ATIFS supposed to be an
implementation of fixed function fragment processing for ATI cards? In
that case it doesn't make sense to only use it on cards that don't
support ARB or GLSL. That will only make newer ATI cards have worse
fixed function fragment processing (assuming ATIFS works as
advertised). Of course what it comes down to again is that the shader
backend isn't the right place to implement this. (Not to mention the
issue of making a workaround for a Mesa bug part of the shader
infrastructure... that bug should be fixed in Mesa.)

Post by Stefan Dösinger
+ void (*shader_fragment_enable)(IWineD3DDevice *iface, BOOL enable);

The name "shader_fragment_enable" is ambiguous at best within the
context of a shader backend.

Stefan Dösinger

2008-04-11 07:59:56 UTC

On Friday, 11 April 2008 08:58:05, H. Verbeet wrote:

First of all: We really want that patch to go in, Wine is currently badly
broken on fglrx.

Post by H. Verbeet
This patch doesn't make sense at all. Wasn't ATIFS supposed to be an
implementation of fixed function fragment processing for ATI cards? In
that case it doesn't make sense to only use it on cards that don't
support ARB or GLSL. That will only make newer ATI cards have worse
fixed function fragment processing (assuming ATIFS works as
advertised). Of course what it comes down to again is that the shader
backend isn't the right place to implement this.

We need that in a shader backend if we are ever going to implement pixel
shaders using atifs. We also have to make nvts/nvrc a shader backend if we
ever implement pixel shaders using that extension.

As for the worse fragment processing on newer ATI cards: We do not want to use
ATIFS on those cards anyway, because ATIFS has a hardcoded limitation of 6
simultaneous textures, whereas Radeon 9500+ cards support 8 textures with
regular OpenGL fixed function processing. Also, fglrx supports
GL_ATI_envmap_bumpmap on those cards, so we have bump mapping without ATIFS
as well (still missing stuff like specular color input and rare blending
parameters, though). For newer ATI cards we need an ARBFP / GLSL pipeline
replacement, no matter what we do with ATIFS.

Remember, my original patchset used ATIFS with newer cards as well. Ivan did
not like the 3 inherited atifs backends, so I've limited ATIFS to its final
use in a 'perfect' wine that has an ARBFP and GLSL replacement.

Post by H. Verbeet
(Not to mention the
issue of making a workaround for a Mesa bug part of the shader
infrastructure... that bug should be fixed in Mesa.)

It is fixed there already, but we're using the workaround to allow users of
older Mesa versions to run the tests without crashing. If it's just about me
we can take the workaround out again, I have the newest Mesa code...

H. Verbeet

2008-04-11 08:39:13 UTC

Post by Stefan Dösinger
We need that in a shader backend if we are ever going to implement pixel
shaders using atifs. We also have to make nvts/nvrc a shader backend if we
ever implement pixel shaders using that extension.

I'm not convinced we should lump that together in the shader backend
just because it uses the same GL extension, but I guess we've been
through that discussion. Fact is that we don't currently implement
pixel shaders using atifs, and nobody knows how long it may take
before we will, if ever.

Post by Stefan Dösinger
As for the worse fragment processing on newer ATI cards: We do not want to use
ATIFS on those cards anyway, because ATIFS has a hardcoded limitation of 6
simultaneous textures, whereas Radeon 9500+ cards support 8 textures with
regular OpenGL fixed function processing. Also, fglrx supports
GL_ATI_envmap_bumpmap on those cards, so we have bump mapping without ATIFS
as well (still missing stuff like specular color input and rare blending
parameters, though). For newer ATI cards we need an ARBFP / GLSL pipeline
replacement, no matter what we do with ATIFS.

I think in practice the specular color input is probably more
important than the two extra textures. It's pretty rare for programs
to hit the nvrc limit of 4, and if they do they don't usually break as
badly as when they try to use the missing specular input.

Post by Stefan Dösinger
First of all: We really want that patch to go in, Wine is currently badly
broken on fglrx.

Yes it's broken, but I'm not convinced this is the right way to fix
it. (I'd be fine with committing this just to fix the breakage, but I
do think this is simply more evidence that the atifs stuff is in the
wrong place.) Ultimately I'm just giving my opinion of course, it's up
to you and Alexandre to decide what to do with that.

Stefan Dösinger

2008-04-11 13:56:12 UTC

Post by H. Verbeet
I think in practice the specular color input is probably more
important than the two extra textures. It's pretty rare for programs
to hit the nvrc limit of 4, and if they do they don't usually break as
badly as when they try to use the missing specular input.

Not really, many games check the MaxSimultaneousTextures cap and use that
value even if they are using pixel shaders. E.g. HL2 fails in dxlevel 81 with
ARB shaders because of that: it checks that cap if it finds SM 1.x. Age
of Empires 3 fails as well. Call of Duty 4 is another popular example; it
requires 8 fixed function textures. (COD4 wants TSSARGTEMP as well, so that
point is moot.)

H. Verbeet

2008-04-11 14:10:49 UTC

Post by Stefan Dösinger
Not really, many games check the MaxSimultaneousTextures cap and use that
value even if they are using pixel shaders. E.g. HL2 fails in dxlevel 81 with
ARB shaders because of that: it checks that cap if it finds SM 1.x. Age
of Empires 3 fails as well. Call of Duty 4 is another popular example; it
requires 8 fixed function textures. (COD4 wants TSSARGTEMP as well, so that
point is moot.)

Sure, but they don't generally break if you just return 8 there.
(Compared to eg. HL2/CSS in dxlevel70 without D3DTA_SPECULAR). Of
course the proper way to get around all of the issues at the same time
would be to write a real GLSL fixed function fragment processing
implementation, but that doesn't change the fact that the current
situation is pretty ugly.

Stefan Dösinger

2008-04-11 14:32:54 UTC

Post by H. Verbeet
Sure, but they don't generally break if you just return 8 there.
(Compared to eg. HL2/CSS in dxlevel70 without D3DTA_SPECULAR). Of
course the proper way to get around all of the issues at the same time
would be to write a real GLSL fixed function fragment processing
implementation, but that doesn't change the fact that the current
situation is pretty ugly.

I don't think it's the D3DTA_SPECULAR that breaks things. HL2 doesn't use it.
If it finds pixel shader support advertised it suddenly works without
D3DTA_SPECULAR support and does not use pixel shaders either, so I think
there's a different capability flag issue there (or HL2 is just broken on
cards without pixel shader support nowadays).

Stefan Dösinger

2008-04-11 14:18:49 UTC

Alexandre didn't commit the patch, I think we should come to an agreement on
this issue, otherwise it is going to come up again and again.

The requirements I have in mind are these:

1) The state table should be selectable based on the available opengl features
and possibly registry settings. I think we all agree on that

2) We want a fixed function vertex and fragment pipeline replacement with ARB
and GLSL

3) We want to be able to support pixel shaders with atifs and nvts. I don't
know if that will ever be implemented, but if we choose a design that makes
this hard or impossible that's not going to help

4) Even if we have an ARB/GLSL replacement, it should be possible to use D3D
shaders but still use the opengl fixed function pipeline

5) A nice to have is to be able to use the replacement pipelines together with
shaders, but that is not a hard requirement for me. We need an ARB and GLSL
replacement anyway.

---

I understand Henri's suggestion this way: Don't put the state table into the
shader backend(*), but select it separately. This way the ATIFS pipeline
replacement doesn't have to be a shader model. That way we can choose the ARB
shader backend and ATIFS state table on r200 cards, and use GLSL and ATIFS on
newer cards.

That way we get full shader support and still the advantages of the pipeline
replacement without using 3 inherited shader backends. Then the state table
could have some backend to enable / disable ATIFS like in this patch. (The
bottom line of the patch here is that we should enable the extension we
*use*, and not what happens to be available).

The state table would report the fixed function caps it has, and the shader
backend reports the programmable pipeline caps. GetDeviceCaps has to collect
the caps from multiple places, but that isn't an issue IMO.
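The split caps reporting described above could look roughly like the following. This is only a sketch: the structure names, the two caps fields, and the version numbers are made up for illustration and are far smaller than the real caps structure. The ATIFS limit of 6 textures is the one mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily reduced caps; the real caps structure has far more fields. */
struct ffp_caps {
    unsigned max_simultaneous_textures;
    bool     bump_env_map;
};

struct shader_caps {
    unsigned vs_version_major;
    unsigned ps_version_major;
};

struct device_caps {
    struct ffp_caps    ffp;
    struct shader_caps shader;
};

/* In this sketch each side only exposes a caps callback. */
struct state_table    { void (*get_caps)(struct ffp_caps *caps); };
struct shader_backend { void (*get_caps)(struct shader_caps *caps); };

/* ATIFS state table: limited to 6 simultaneous textures (see above). */
static void atifs_get_ffp_caps(struct ffp_caps *caps)
{
    caps->max_simultaneous_textures = 6;
    caps->bump_env_map = true; /* illustrative value */
}

/* Illustrative version numbers only, not what the real backend reports. */
static void glsl_get_shader_caps(struct shader_caps *caps)
{
    caps->vs_version_major = 3;
    caps->ps_version_major = 3;
}

static const struct state_table    atifs_state_table = { atifs_get_ffp_caps };
static const struct shader_backend glsl_backend      = { glsl_get_shader_caps };

/* What a GetDeviceCaps-like function would do: ask both sources and merge. */
static void device_get_caps(const struct state_table *st,
                            const struct shader_backend *sb,
                            struct device_caps *out)
{
    assert(st && sb && out);
    memset(out, 0, sizeof(*out));
    st->get_caps(&out->ffp);
    sb->get_caps(&out->shader);
}
```

Calling device_get_caps(&atifs_state_table, &glsl_backend, &caps) fills the fixed function part from the state table and the programmable part from the shader backend, so neither side needs to know about the other.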

Is that correct, or did I misunderstand something?

However, I see two issues with that:

*) In which place do we decide which program/shader to use in the end? If we
have an ARB fixed function replacement program and an ARB program generated
from a D3D shader? Currently this happens to work because ARB/GLSL override
NVTS and NVRC, but this fails if we have a shader and ffp implementation
using the same extension

*) What do we do if a shader implementation wants to overwrite some states for
its programmable pipeline work? For example take my GLSL clipplane patch: The
ARB shader backend disables clipplanes when a shader is used. This example is
about a driver bug, but there are others as well: ATIFS and NVTS with a pixel
shader implementation will have to override a part of the sampler setting (the
stuff that is currently done in activate_dimensions)

How would you deal with this?

Stefan

(*): Ivan and I agreed in a discussion on IRC that "shader backend" is not a
good name. The shader backends today are more a pipeline implementation
than a shader implementation, and should be renamed accordingly. I did
not rename them yet because I am not 100% happy with the name "pipeline" yet.

Chris Robinson

2008-04-11 15:22:20 UTC

Post by Stefan Dösinger
Alexandre didn't commit the patch, I think we should come to an agreement
on this issue, otherwise it is going to come up again and again.

For my own 2 cents on the issue, I think the fixed-function replacements
should work through the selected shader backend. For example, if someone has
GLSL enabled, the fixed-function replacement should be done through GLSL, and
if ARB is used, it should be done through ARB shaders. In this way, the atifs
fixed-function replacement should never be used when the ARB or GLSL shader
backend is used (one shader mode shouldn't supersede another while the
program is running).

This would ensure compatibility on the OpenGL side when D3D's fixed-function
pipeline is mixed with D3D shaders, still leave room for a proper atifs/nvts
shader backend, and not introduce problems related to how one shader backend
might interact with another. It honestly doesn't make sense to me why wined3d
should just jump in and start using atifs/nvts shaders for fixed-function
replacements, when the selected shader mode is ARB or GLSL. And really,
atifs/nvts should be last-ditch resorts because they are only available on
selected vendors, while ARB and GLSL are more widespread (e.g. on Intel,
Mesa).

Ideally, in my mind, wined3d should autoselect one shader mode (unless a
registry option is present to force one). It would be selected based on
availability:

* if nv vertex/fragment programs are available, they should be used as a
supplement to the ARB shader backend (giving up to SM3.0 (4.0, actually),
with functionality that most closely follows D3D)
* else if GLSL is available, that should be used (giving SM2.0/3.0)
* else if ARB shaders are available, then they should be used (giving PS1.1
and VS1.4)
* else if atifs/nvts are available, use them (giving SM1.x)
* else disable D3D shaders

In all cases, it should pick one and stick with it through the life of the
process, for both D3D shaders and D3D fixed-function. Don't use atifs if ARB
is available, don't use GLSL if nv-supplemented ARB is available, etc.
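The selection order above can be written down as a small function. This is a sketch only: the enum, the feature flags, and the function name are hypothetical, and requiring ARB programs alongside the nv extensions is an assumption based on "supplement to the ARB shader backend".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shader modes, in the order of preference proposed above. */
enum shader_mode {
    SHADER_MODE_AUTO,        /* no registry override */
    SHADER_MODE_NONE,        /* D3D shaders disabled */
    SHADER_MODE_ATIFS_NVTS,
    SHADER_MODE_ARB,
    SHADER_MODE_GLSL,
    SHADER_MODE_ARB_NV       /* ARB supplemented with nv programs */
};

/* Hypothetical summary of what the GL implementation offers. */
struct gl_features {
    bool nv_programs;
    bool glsl;
    bool arb_programs;
    bool atifs_or_nvts;
};

/* Picks one mode once; the caller is expected to keep it for the process lifetime. */
static enum shader_mode select_shader_mode(const struct gl_features *gl,
                                           enum shader_mode forced)
{
    assert(gl);
    if (forced != SHADER_MODE_AUTO)
        return forced;                      /* registry option wins */
    if (gl->nv_programs && gl->arb_programs)
        return SHADER_MODE_ARB_NV;          /* assumption: nv programs need ARB */
    if (gl->glsl)
        return SHADER_MODE_GLSL;
    if (gl->arb_programs)
        return SHADER_MODE_ARB;
    if (gl->atifs_or_nvts)
        return SHADER_MODE_ATIFS_NVTS;
    return SHADER_MODE_NONE;
}
```

The result would be used for both D3D shaders and the fixed function replacement, which is the point of the proposal: one mode, chosen once.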

Though perhaps I'm misunderstanding the whole issue and I'm spouting
nonsense.

Stefan Dösinger

2008-04-11 15:35:21 UTC

Post by Chris Robinson
In all cases, it should pick one and stick with it through the life of the
process, for both D3D shaders and D3D fixed-function. Don't use atifs if
ARB is available, don't use GLSL if nv-supplemented ARB is available, etc.

There are three issues that lead to this discussion:

*) We may want to mix shader backends. GLSL is a pain on MacOS at least on the
vertex side, but it works okish on the fragment shader side. It may be
helpful to compile an ARB vertex shader and a GLSL pixel shader. Also an ARB
replacement may be faster there because the GLSL compiler on MacOS often
creates slow code.

*) We have an NVTS and ATIFS replacement now, but no ARB or GLSL fragment
processing code, and it would be cool if we can make use of the ATIFS code on
newer cards as well. NVTS currently works because it is included in the
baseshader/fixed function backend.

*) It's also about getting the design right in principle, and finding a design
that doesn't need an entire rework when dx10 and geometry shaders are
introduced.

Chris Robinson

2008-04-11 18:41:57 UTC

Post by Stefan Dösinger
*) We may want to mix shader backends. GLSL is a pain on MacOS at least on
the vertex side, but it works okish on the fragment shader side. It may be
helpful to compile an ARB vertex shader and a GLSL pixel shader. Also an ARB
replacement may be faster there because the GLSL compiler on MacOS often
creates slow code.

As far as I can see, OSX's broken drivers are the only reason you'd not want
to use GLSL when nothing else of better capability is available. For that
case, there are two options: for nVidia users, extend ARB shaders with nv
programs and use them (likely surpassing GLSL performance), and for other
cards, bug Apple for proper drivers. IMHO, I don't think WineD3D should be
that hindered and cause WineD3D issues with other vendors because Apple
insists on making drivers themselves, and doing a poor job at it.

Post by Stefan Dösinger
*) We have an NVTS and ATIFS replacement now, but no ARB or GLSL fragment
processing code, and it would be cool if we can make use of the ATIFS code
on newer cards as well. NVTS currently works because it is included in the
baseshader/fixed function backend.

Why would it be cool to use ATIFS on newer cards? It'd be cool if we had
something lean, sleek, and working. Mixing output shader modes is just asking
for problems, IMO.

Post by Stefan Dösinger
*) It's also about getting the design right in principle, and finding a
design that doesn't need an entire rework when dx10 and geometry shaders
are introduced.

As it is, I imagine WineD3D will need to be reworked for D3D10 no matter what
we do here. Unless someone has a working design plan for what changes are
needed by D3D10 for WineD3D, we're ultimately just guessing about what may
work best (not to mention it'll be using GL3 instead of the current stuff, so
whatever's picked may not even be efficient API-wise for it).

Stefan Dösinger

2008-04-11 23:41:06 UTC

Post by Chris Robinson
As far as I can see, OSX's broken drivers are the only reason you'd not
want to use GLSL when nothing else of better capability is available. For
that case, there are two options: for nVidia users, extend ARB shaders with
nv programs and use them (likely surpassing GLSL performance), and for
other cards, bug Apple for proper drivers. IMHO, I don't think WineD3D
should be that hindered and cause WineD3D issues with other vendors because
Apple insists on making drivers themselves, and doing a poor job at it.

I don't know who exactly writes the Apple drivers, but the Mac drivers share a
few issues with the Linux ATI driver.

We have to face that driver bugs are reality. I think we are having more
issues in form of user complaints due to the driver<->wined3d connection than
the wined3d<->application one. I doubt Apple is going to fix their vertex
shader bugs anytime soon. They come up weekly or monthly on their development
lists, no official statement yet.

Post by Chris Robinson
Why would it be cool to use ATIFS on newer cards? It'd be cool if we had
something lean, sleek, and working. Mixing output shader modes is just
asking for problems, IMO.

For one thing it is helpful for testing, and it is a temporary solution until
we have an ARB and GLSL pipeline replacement (post 1.0 most likely).

Post by Chris Robinson
As it is, I imagine WineD3D will need to be reworked for D3D10 no matter
what we do here. Unless someone has a working design plan for what changes
are needed by D3D10 for WineD3D, we're ultimately just guessing about what
may work best (not to mention it'll be using GL3 instead of the current
stuff, so whatever's picked may not even be efficient API-wise for it).

I have a rough design plan:

-> Replace the current d3d9 shader language with an intermediate language
generated by the shader compiler and assembler library I am working on (see
the other thread). d3d9 will feed a d3d9 shader into that lib and get a
parsed shader back that it passes to wined3d; d3d10 works similarly.

-> Extending pixel shader and vertex shader support to 4.0 is fairly easy with
whatever design we choose. Just extend the existing code.

-> Adding geometry shaders should work with the all-in-one shader backend. Add
geometry shaders next to pixel and vertex and link all 3 together. I don't
know the details with Henri's suggestion, but I think we'd just add an extra
geometry state handler and extend the shader backend (or add a new shader
backend if we split things up).

-> OpenGL 3.0 needs the finalized GL 3 spec. If GLSL is unmodified (I think
so), we can just reuse the existing GLSL shader backend. Otherwise we have to
create a new one and, depending on how similar they are, maintain it as a
separate backend or override a few parts via inheritance.

-> The non-shader parts of GL3 need a separation of the management code and GL
specific code. For the state setters, we either need an entirely different
set of state setters, or we can reuse a fully-GLSLed pipeline implementation.
For textures, surfaces, buffers, devices we can move the management code into
a base class and inherit a GL2 and GL3 class, similarly to
BaseSurface-D3DSurface-GDISurface

-> A major API rewrite in D3D10 is the new resource model. Basically we have
to make IndexBuffers, VertexBuffers and Surfaces general "Buffers". For
d3d <= 9, IBs and VBs can simply be replaced by the buffer; for surfaces we
have to create a derived class where we add D3D <= 9 surface specific methods
like Blt, the container texture, palettes, etc. Textures should be replaced
with the more flexible ShaderResourceView objects. (I don't know if one buffer
can be used in more than one resource view object. If yes, then that might be
an issue, as it means that a surface can be part of many textures, e.g. as a
mip level.)

-> Vertex declarations? They are some kind of shader resource view as well now;
I have to look at them in more detail. Maybe we can just reuse them unmodified.

-> Pixel formats: We need tests to find out what the d3d10 formats need, but
basically replace the WINED3DFORMAT enum with a d3d10-like one and add
private values for d3d <= 9 formats missing in d3d10 like P8, X4R4G4B4. The
DXGI_FORMATs are nicer than the D3D9 ones IMO.

-> One open issue: Where do we implement srgb reading switching? In WineD3D or
D3D9?

There will surely be more open issues, this is just a rough plan.

H. Verbeet

2008-04-12 03:07:49 UTC

Post by Stefan Dösinger
We have to face that driver bugs are reality. I think we are having more
issues in form of user complaints due to the driver<->wined3d connection than
the wined3d<->application one. I doubt Apple is going to fix their vertex
shader bugs anytime soon. They come up weekly or monthly on their development
lists, no official statement yet.

This is probably more of a concern to CW than Wine in general, but I
guess it's a somewhat valid issue. The issue then becomes more one of
how much ugliness we're willing to accept in the Wine tree in order to
work around obviously broken drivers.

I thought we decided on IRC that it would make more sense to call the
shader library from wined3d rather than d3d8, d3d9 and d3d10.

Post by Stefan Dösinger
-> Adding geometry shaders should work with the all-in-one shader backend. Add
geometry shaders next to pixel and vertex and link all 3 together. I don't
know the details with Henri's suggestion, but I think we'd just add an extra
geometry state handler and extend the shader backend (or add a new shader
backend if we split things up).

A geometry state handler would only make sense if you would want to
add a "fixed function geometry processing replacement". Just
supporting the shaders only requires adding them to the shader backend
(split up or not).

Post by Stefan Dösinger
-> OpenGL 3.0 needs the finalized GL 3 spec. If GLSL is unmodified (I think
so), we can just reuse the existing GLSL shader backend. Otherwise we have to
create a new one and, depending on how similar they are, maintain it as a
separate backend or override a few parts via inheritance.

AFAIK GLSL will be mostly unchanged, although supposedly it will also
allow things like bindable uniforms and binding separate programs for
vertex/fragment/geometry stages. This would allow us to get rid of our
main use of GLSL linking.

Post by Stefan Dösinger
-> The non-shader parts of GL3 need a separation of the management code and GL
specific code. For the state setters, we either need an entirely different
set of state setters, or we can reuse a fully-GLSLed pipeline implementation.
For textures, surfaces, buffers, devices we can move the management code into
a base class and inherit a GL2 and GL3 class, similarly to
BaseSurface-D3DSurface-GDISurface

I think going the GLSL way makes sense there. Anything supporting GL3
should fully support GLSL already, and I'm not even sure GL3 will
support a traditional fixed function pipeline.

Stefan Dösinger

2008-04-12 12:25:12 UTC

Post by H. Verbeet
I thought we decided on IRC that it would make more sense to call the
shader library from wined3d rather than d3d8, d3d9 and d3d10.

From where we call the shader library isn't all that important to me. For me
it's mainly important that the shader library parses the d3d9 and d3d10
shader and the wined3d GL shader generator works on Wine's internal shader
representation

Post by H. Verbeet
I think going the GLSL way makes sense there. Anything supporting GL3
should fully support GLSL already, and I'm not even sure GL3 will
support a traditional fixed function pipeline.

What I meant is that it depends on how shaders will look in GL 3.0. If it is
GLSL as it is now, we can just use the GLSL shader backend. If it is vastly
different we'll need a new one.

H. Verbeet

2008-04-11 20:42:17 UTC

Post by Stefan Dösinger
1) The state table should be selectable based on the available opengl features
and possibly registry settings. I think we all agree on that

Up to the level of being able to use different state handlers in
different situations. I don't necessarily agree with copying and
swapping the entire table in one piece.

Post by Stefan Dösinger
2) We want a fixed function vertex and fragment pipeline replacement with ARB
and GLSL

Only GLSL is a requirement for me. ARB could be nice, but is probably redundant.

Post by Stefan Dösinger
3) We want to be able to support pixel shaders with atifs and nvts. I don't
know if that will ever be implemented, but if we choose a design that makes
this hard or impossible that's not going to help

This is not a hard requirement for me, although I certainly think we
should be able to create a design that allows for this.

Post by Stefan Dösinger
4) Even if we have an ARB/GLSL replacement, it should be possible to use D3D
shaders but still use the opengl fixed function pipeline

Agreed.

Post by Stefan Dösinger
5) A nice to have is to be able to use the replacement pipelines together with
shaders, but that is not a hard requirement for me. We need an ARB and GLSL
replacement anyway.

I assume you mean atifs & nvrc specifically here, in which case this
is probably a bit more important than "nice to have". The most tricky
situation to support here will be cards that support vertex shaders,
but not fragment shaders. If we want to support pixel shaders using
atifs / nvrc we have to allow mixing GLSL / ARB with nvrc / atifs for
it to be of any use.

Post by Stefan Dösinger
I understand Henri's suggestion this way: Don't put the state table into the
shader backend(*), but select it separately. This way the ATIFS pipeline
replacement doesn't have to be a shader model. That way we can choose the ARB
shader backend and ATIFS state table on r200 cards, and use GLSL and ATIFS on
newer cards.
That way we get full shader support and still the advantages of the pipeline
replacement without using 3 inherited shader backends. Then the state table
could have some backend to enable / disable ATIFS like in this patch. (The
bottom line of the patch here is that we should enable the extension we
*use*, and not what happens to be available).
The state table would report the fixed function caps it has, and the shader
backend reports the programmable pipeline caps. GetDeviceCaps has to collect
the caps from multiple places, but that isn't an issue IMO.
Is that correct, or did I misunderstand something?

My main point is that a fixed function replacement and the shader
backend should be two different things, at least interface wise. My
suggestion for an implementation would be something like this:

- Split the state table in three pieces: vertex processing states
(eg. lighting, materials, etc), fragment processing states (eg.
texture stage stuff) and other states (eg. blending states). Aside
from being clearer this allows you to swap these parts out
independently from each other and possibly skip applying them as a
whole in case a shader is active for that part of the pipeline (I
imagine this could have some performance advantages as well, although
I'm not sure how much).
- Allow the state table more explicitly to have some state of its
own. It currently has the list of dirty states of course, but it's
more managed by the device than by the statetable as such. In effect
this would introduce a state management object.

The basic setup would be something like this:

struct shader_backend_t
{
    /* Select the GL shaders for the stages that have a D3D shader set. */
    void (*shader_select)(void *data, BOOL usePS, BOOL useVS);
    void (*shader_load_constants)(void *data, char usePS, char useVS);
    ...
};

struct state_management_t
{
    /* Record that a state changed; it is applied on the next apply_states(). */
    void (*mark_state_dirty)(void *data, DWORD state);
    void (*apply_states)(void *data);
    ...
};

struct IWineD3DDeviceImpl {
    ...
    struct state_management_t vertex_state_manager;
    struct state_management_t fragment_state_manager;
    struct state_management_t other_state_manager;
    struct shader_backend_t shader_backend;

    void *vertex_private_data;
    void *fragment_private_data;
    void *other_private_data;
    void *shader_private_data;
    ...
};

/* Usage. The managers are embedded structs, so they are accessed with '.'. */
device->vertex_state_manager.mark_state_dirty(device->vertex_private_data,
        state);
device->fragment_state_manager.mark_state_dirty(device->fragment_private_data,
        state);
/* etc. */
...
/* Skip a whole part of the state table if a shader replaces it. */
if (!use_vs) {
    device->vertex_state_manager.apply_states(device->vertex_private_data);
}
if (!use_ps) {
    device->fragment_state_manager.apply_states(device->fragment_private_data);
}
device->shader_backend.shader_select(device->shader_private_data,
        use_ps, use_vs);

Some example configurations:

GLSL FFP, GLSL shaders
device = {glsl_vsm, glsl_fsm, ff_osm, glsl_shader_backend,
glsl_private_data, glsl_private_data, ff_private_data,
glsl_private_data};

Fixed function vertex, ATIFS fragment processing, GLSL shader backend
device = {ff_vsm, atifs_fsm, ff_osm, glsl_shader_backend,
ff_private_data, atifs_private_data, ff_private_data,
glsl_private_data};

ARB vertex FFP, Fixed function fragment, ARB shaders
device = {arb_vsm, ff_fsm, ff_osm, arb_shader_backend,
arb_ffp_private_data, ff_private_data, ff_private_data,
arb_shader_private_data};

This doesn't support mixing eg. ARB vertex shaders with NVRC pixel
shaders, but it would "simply" be a matter of splitting up the shader
backend in a similar way to the state table. Important to note here is
that the private data could be shared between fixed function
replacements and the shader backend, like in the case of GLSL. I could
imagine using a structure more similar to state_management_t for the
shader backend as well.

Of course there are a number of variations you can make on this
(perhaps most notably storing the call tables and private data in the
same struct), but I think the important parts are the split between
vertex/fragment/other/shader, and the state management having private
data.
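As a sketch of the variation mentioned above (call table and private data stored in the same struct), the following is a small compilable version. The "toy" backend, its bitmask of dirty states, and all names are invented for illustration; a real handler would issue GL calls where the counter is incremented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct state_manager;

struct state_manager_ops {
    void (*mark_state_dirty)(struct state_manager *mgr, unsigned state);
    void (*apply_states)(struct state_manager *mgr);
};

/* Call table and private data kept together, one instance per pipeline part. */
struct state_manager {
    const struct state_manager_ops *ops;
    void *priv;
};

/* Toy backend: tracks dirty states in a bitmask and counts applied states. */
struct toy_priv {
    uint32_t dirty;
    unsigned applied;
};

static void toy_mark_state_dirty(struct state_manager *mgr, unsigned state)
{
    struct toy_priv *priv = mgr->priv;
    assert(state < 32);
    priv->dirty |= UINT32_C(1) << state;
}

static void toy_apply_states(struct state_manager *mgr)
{
    struct toy_priv *priv = mgr->priv;
    for (unsigned i = 0; i < 32; ++i)
        if (priv->dirty & (UINT32_C(1) << i))
            ++priv->applied; /* a real handler would issue GL calls here */
    priv->dirty = 0;
}

static const struct state_manager_ops toy_ops = {
    toy_mark_state_dirty, toy_apply_states
};

struct device {
    struct state_manager vertex, fragment, other;
};

/* Vertex/fragment parts are skipped as a whole while a shader replaces them. */
static void device_apply_states(struct device *dev, bool use_vs, bool use_ps)
{
    if (!use_vs)
        dev->vertex.ops->apply_states(&dev->vertex);
    if (!use_ps)
        dev->fragment.ops->apply_states(&dev->fragment);
    dev->other.ops->apply_states(&dev->other);
}
```

In this sketch a fragment state marked dirty while a pixel shader is active simply stays dirty and is applied the next time device_apply_states() runs with use_ps false.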

Post by Stefan Dösinger
*) In which place do we decide which program/shader to use in the end? If we
have an ARB fixed function replacement program and an ARB program generated
from a D3D shader? Currently this happens to work because ARB/GLSL override
NVTS and NVRC, but this fails if we have a shader and ffp implementation
using the same extension

The private data between the FFP replacement and the shader backend
can be shared. That means our general GLSL management stuff can see
you've got eg. a FFP vertex shader and a pixel shader and link them
together.
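A sketch of what sharing the private data could mean for GLSL: both the fixed function replacement and the shader backend store their shader objects in one structure, and the select step picks one vertex and one fragment stage from it. The structure, its field names, and the plain integer handles are assumptions for illustration; no actual GL linking is done here.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int shader_handle; /* stand-in for a GL shader object name */

/* Shared by the GLSL FFP replacement and the GLSL shader backend. */
struct glsl_private {
    shader_handle ffp_vertex;      /* written by the FFP vertex replacement */
    shader_handle ffp_fragment;    /* written by the FFP fragment replacement */
    shader_handle d3d_vertex;      /* written by the shader backend */
    shader_handle d3d_pixel;       /* written by the shader backend */
    shader_handle linked_vertex;   /* stages chosen for the current program */
    shader_handle linked_fragment;
};

/* Choose, per stage, the D3D shader if one is used, else the FFP replacement. */
static void glsl_select(struct glsl_private *priv, bool use_ps, bool use_vs)
{
    assert(priv);
    priv->linked_vertex   = use_vs ? priv->d3d_vertex : priv->ffp_vertex;
    priv->linked_fragment = use_ps ? priv->d3d_pixel  : priv->ffp_fragment;
    /* A real implementation would look up or link a GLSL program for this pair. */
}
```

With use_vs false and use_ps true this pairs the FFP vertex shader with the D3D pixel shader, which is the case described above.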

Post by Stefan Dösinger
*) What do we do if a shader implementation wants to overwrite some states for
its programmable pipeline work? For example take my GLSL clipplane patch: The
ARB shader backend disables clipplanes when a shader is used. This example is
about a driver bug, but there are others as well: ATIFS and NVTS with a pixel
shader implementation will have to override a part of the sampler setting (the
stuff that is currently done in activate_dimensions)
How would you deal with this?

I guess this will mostly be an issue for states that end up being in
the "other" category. In general I think that as long as it's only for
a couple of states we can just keep doing more or less what we
currently do and check if the relevant state management backend is
used.
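A sketch of that approach for the clip plane example: the state handler checks which shader backend is in use and whether a vertex shader is active. All names and the bitmask representation are hypothetical; the only behavior taken from the thread is that the ARB backend disables clip planes while a shader is used.

```c
#include <assert.h>
#include <stdbool.h>

struct shader_backend { const char *name; };

static const struct shader_backend arb_backend  = { "arb" };
static const struct shader_backend glsl_backend = { "glsl" };

struct device {
    const struct shader_backend *shader_backend;
    bool     use_vs;
    unsigned requested_clip_planes; /* bitmask the application asked for */
    unsigned enabled_clip_planes;   /* bitmask that would be enabled in GL */
};

/* "Other" state handler that special-cases one backend. */
static void state_clipping(struct device *dev)
{
    assert(dev && dev->shader_backend);
    if (dev->use_vs && dev->shader_backend == &arb_backend)
        dev->enabled_clip_planes = 0; /* ARB backend: no clip planes with shaders */
    else
        dev->enabled_clip_planes = dev->requested_clip_planes;
}
```

This stays manageable only while the number of such special cases is small, which is the condition stated above.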

Stefan Dösinger

2008-04-11 22:57:57 UTC

On Friday, 11 April 2008 22:42:17, H. Verbeet wrote:

I did not yet read it in depth, so I am just replying to a few concerns I
spotted. I'll look at it more in-depth tomorrow.

Post by Stefan Dösinger
1) The state table should be selectable based on the available opengl
features and possibly registry settings. I think we all agree on that

Post by H. Verbeet
Up to the level of being able to use different state handlers in
different situations. I don't necessarily agree with copying and
swapping the entire table in one piece.

Agreed, we just want to be able to swap states; how exactly is still open.

Post by Stefan Dösinger
2) We want a fixed function vertex and fragment pipeline replacement
with ARB and GLSL

Post by H. Verbeet
Only GLSL is a requirement for me. ARB could be nice, but is probably redundant.

Intel cards? Also GLSL has the problem with the link times.

Post by Stefan Dösinger
5) A nice to have is to be able to use the replacement pipelines
together with shaders, but that is not a hard requirement for me. We need
an ARB and GLSL replacement anyway.

The current implementation allows that, we're using ATIFS+ARBVP

Post by H. Verbeet
Aside
from being clearer this allows you to swap these parts out
independently from each other and possibly skip applying them as a
whole in case a shader is active for that part of the pipeline (I
imagine this could have some performance advantages as well, although
I'm not sure how much).

For the performance, it depends on the app. If we skip applying (redundant)
fixed function settings when a shader is used, we have to reapply all of those
that might have changed during shader use once the shader is deactivated. It's a
decision between making shader on-off switches cheap vs filtering out
redundant state changes done by the app. Currently there is no additional
fixed function pipeline cost involved when using shaders as long as the app
doesn't touch the ffp states.

Post by H. Verbeet
struct IWineD3DDeviceImpl {
...
struct state_management_t vertex_state_manager;
struct state_management_t fragment_state_manager;
struct state_management_t other_state_manager;
struct shader_backend_t shader_backend;
void *vertex_private_data;
void *fragment_private_data;
void *other_private_data;
void *shader_private_data;
...
};

One of the reasons for using a single state table was that we have
interconnections between the vertex states, fragment states and other states.
For example, where would you put the vertex buffer loading? Is that
an "other" state, or a vertex state? The FVF and the vertex shader influence
if we're entering the FF or shader vertex processing codepaths. The FVF
affects fogging, which is a vertex state in some shader implementations (nvts,
atifs), but implemented in pixel shaders in GLSL and ARB. The texture
transform flags are a fixed function vertex state on paper, but we have to
compile the D3DTTFF_PROJECTED flag into the fragment shader (we already do
so; that's documented on MSDN, so clearly a misdesign on Microsoft's
side). There are many more examples of vertex-fragment-other state
interconnections.

Post by H. Verbeet
This doesn't support mixing eg. ARB vertex shaders with NVRC pixel
shaders, but it would "simply" be a matter of splitting up the shader
backend in a similar way to the state table. Important to note here is
that the private data could be shared between fixed function
replacements and the shader backend, like in the case of GLSL. I could
imagine using a structure more similar to state_management_t for the
shader backend as well.

The unsplit shader backend is the current d3d shader implementation + Ivan's
pipeline linker object. If you split them up, where do you set GLSL shader
constants?

Also note that the GLSL pixel shaders depend on the type of vertex
processing(non-GLSL or GLSL) to load the 3.0 varyings correctly. We could get
rid of that by requiring GLSL vp, but that would break requirement 4, using
the GL fixed function pipeline.

My concern is that if you split vertex processing and fragment processing up,
you need a linker object that deals with linking. This linker object has to know
about the vertex and fragment processing state handlers and tables, thus
creating a special linker for each vs-ps-vff-ffp-other combination. I don't
have any objections against that in principle, but I am afraid that due to
the high interconnection between vertex, fragment and other states we would
end up with implementing most things in the linker.

H. Verbeet

2008-04-12 02:27:07 UTC

Post by Stefan Dösinger
2) We want a fixed function vertex and fragment pipeline replacement
with ARB and GLSL

Only GLSL is a requirement for me. ARB could be nice, but is probably redundant.

Intel cards? Also GLSL has the problem with the link times.

I'm currently not too bothered about Intel cards, although that might
change in the future. Either way, it's certainly possible to create an
ARB implementation, it's more a matter of priority.

I'm not sure I follow your reasoning here. Just to be clear, I'm only
talking about applying states, not marking them dirty. In the worst
case it simply means you apply them when switching back to fixed
function instead of in the current draw call, but you could
potentially save some redundant state applications while the shader is
active. However, what I was actually wondering about is if it might be
more expensive on some drivers to do fixed function state changes
while a shader is active.
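
A minimal sketch of the distinction drawn above between marking a state dirty and applying it. The ffp_tracker structure and both functions are hypothetical and do not exist in wined3d; the "applied" counter merely stands in for real GL calls. States touched while a shader is active stay pending and are flushed on the first draw after the shader is switched off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FFP_STATE_COUNT 4 /* hypothetical number of fixed function states */

struct ffp_tracker {
    bool dirty[FFP_STATE_COUNT]; /* changed by the app, not yet sent to GL */
    unsigned int applied;        /* counts simulated GL state applications */
};

/* The application changed a fixed function state: only record that. */
static void ffp_mark_dirty(struct ffp_tracker *t, size_t state)
{
    assert(t != NULL);
    if (state < FFP_STATE_COUNT)
        t->dirty[state] = true;
}

/* Called once per draw. While a shader replaces this part of the pipeline
 * nothing is applied; the dirty flags simply accumulate. */
static void ffp_apply(struct ffp_tracker *t, bool shader_active)
{
    size_t i;

    assert(t != NULL);
    if (shader_active)
        return;

    for (i = 0; i < FFP_STATE_COUNT; ++i) {
        if (!t->dirty[i])
            continue;
        /* A real implementation would issue the GL calls here. */
        t->dirty[i] = false;
        ++t->applied;
    }
}
```

Changing the same state several times while the shader is active costs one application after the switch back, which is the potential saving mentioned above. Whether that outweighs a more expensive shader on/off switch depends on the application.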

Post by Stefan Dösinger
One of the reasons for using a single state table was that we have
interconnections between the vertex states, fragment states and other states.
For example, where would you put the vertex buffer loading? Is that
an "other" state, or a vertex state? The FVF and the vertex shader influence
if we're entering the FF or shader vertex processing codepaths. The FVF
affects fogging, which is a vertex state in some shader implementations (nvts,
atifs), but implemented in pixel shaders in GLSL and ARB. The texture
transform flags are a fixed function vertex state on paper, but we have to
compile the D3DTTFF_PROJECTED flag into the fragment shader (we already do
so; that's documented on MSDN, so clearly a misdesign on Microsoft's
side). There are many more examples of vertex-fragment-other state
interconnections.

Anything that gets ignored when a vertex shader is active gets put in
the vertex states, anything that gets ignored when a fragment shader
is active should be part of the fragment states. Resource loading
would be part of the "other" states. Most of the connections you
mention appear to be connections on the D3D side, these would have no
consequences for a separation on the GL side of things. Iow, it's
perfectly valid for a state in the vertex block and a state in the
fragment block to read from the same D3D state.

The unsplit shader backend is the current d3d shader implementation + Ivan's
pipeline linker object. If you split them up, where do you set GLSL shader
constants?

In the shader backend. Splitting things up doesn't change much there.
You could decide to not even expose constant loading though, and
instead just mark the state dirty on the shader backend. Deciding when
to load constants then becomes the responsibility of the shader
backend.
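
As a rough illustration of "mark the state dirty on the shader backend", the sketch below keeps constants and their dirty flags in the backend's private data and uploads them only when the backend decides to. All names (shader_backend_priv, backend_set_constant, backend_load_constants) are made up for this example, and the upload is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CONSTANTS 8 /* hypothetical; real limits are much higher */

struct shader_backend_priv {
    float constants[MAX_CONSTANTS][4];
    bool dirty[MAX_CONSTANTS];
    unsigned int uploads; /* counts simulated uploads to GL */
};

/* Called by the device when the app sets a constant: store and mark dirty. */
static void backend_set_constant(struct shader_backend_priv *priv,
                                 unsigned int idx, const float value[4])
{
    assert(priv != NULL);
    if (idx >= MAX_CONSTANTS)
        return;
    memcpy(priv->constants[idx], value, sizeof(priv->constants[idx]));
    priv->dirty[idx] = true;
}

/* Called by the backend itself, e.g. right before a draw. The caller never
 * needs to know when or how constants reach GL. */
static void backend_load_constants(struct shader_backend_priv *priv)
{
    unsigned int i;

    assert(priv != NULL);
    for (i = 0; i < MAX_CONSTANTS; ++i) {
        if (!priv->dirty[i])
            continue;
        /* A real backend would issue its GL upload call here. */
        priv->dirty[i] = false;
        ++priv->uploads;
    }
}
```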

Post by Stefan Dösinger
Also note that the GLSL pixel shaders depend on the type of vertex
processing(non-GLSL or GLSL) to load the 3.0 varyings correctly. We could get
rid of that by requiring GLSL vp, but that would break requirement 4, using
the GL fixed function pipeline.

Stefan Dösinger

2008-04-12 11:34:12 UTC

Post by H. Verbeet
Anything that gets ignored when a vertex shader is active gets put in
the vertex states, anything that gets ignored when a fragment shader
is active should be part of the fragment states. Resource loading
would be part of the "other" states.

GL named arrays get ignored when a vertex shader is in use, unless the shader
explicitly uses them...

Post by H. Verbeet
Most of the connections you
mention appear to be connections on the D3D side, these would have no
consequences for a separation on the GL side of things. Iow, it's
perfectly valid for a state in the vertex block and a state in the
fragment block to read from the same D3D state.

So if e.g. the vertex declaration is changed you would dirtify many states:
-> misc stream sources
-> vertex shader(use it or not?)
-> Fog
-> Fixed function vertex processing matrices(rhw vertices or not)
-> texture transforms
-> (a few others as well)
I have no problem with doing that, changing the vdecl is an expensive business
no matter what we do, just asking to make sure I understand what you mean.
How do you control which gl states are dirtified by which d3d state? This
will depend on the combination of backends you use.

There are quite a few opengl connections as well, although they work
differently. It's more the various interactions between shader extensions. In
quite a few cases the fragment processing implementation has to configure the
vertex processing correctly to feed it in the right way, and also the other
way round.

For example, to stick to the texture transform flags. Let's consider we're
using fixed function D3D vertex processing in whatever GL extension. Now
enter fragment processing and D3DTTFF_PROJECTED:

-> With fixed function GL or a NVTS fixed function replacement we have to make
sure that the 4th coordinate is 1.0 to disable the GL division if
TTFF_PROJECTED is not set, and if it is set with TTFF_COUNT3 make sure that
the 3rd coord is copied to the 4th
-> With GLSL or ARB fixed function replacement we can handle the lack of
TTFF_PROJECTED properly, but not COUNT3
-> With ATIFS we can handle everything properly in the replacement shader
-> With an ARB, GLSL or ATIFS D3D shader we don't need any special texture
transform fixups
-> With an NVTS D3D shader we have to take care about disabling projected
textures in vertex processing again

That means different fragment processing implementations have different vertex
processing requirements. Now you could make that a flag in the fragment
processing and pixel shader implementation. You'd need 4
flags(nonshader_unprojected, shader_unprojected, nonshader_count3,
shader_count3). Are you sure the flags won't grow out of control?

Another example is fogging. Fog is overwritten by ARB and GLSL, but not ATIFS
and NVTS(as far as I can see). Is fog a vertex or fragment state? How do you
share the quite complex fog applying code between the ATIFS, NVTS and GL
fixed function implementation if you make it a fragment state?

Post by Stefan Dösinger
Also note that the GLSL pixel shaders depend on the type of vertex
processing(non-GLSL or GLSL) to load the 3.0 varyings correctly. We
could get rid of that by requiring GLSL vp, but that would break
requirement 4, using the GL fixed function pipeline.

H. Verbeet

2008-04-12 13:06:38 UTC

GL named arrays get ignored when a vertex shader is in use, unless the shader
explicitly uses them...

It doesn't get ignored, you still do the upload and the data is still
available should the shader choose to use it. Still, I probably
should've phrased it as "functionality that gets replaced by a vertex
/ fragment shader".

-> misc stream sources
-> vertex shader(use it or not?)
-> Fog
-> Fixed function vertex processing matrices(rhw vertices or not)
-> texture transforms
-> (a few others as well)
I have no problem with doing that, changing the vdecl is an expensive business
no matter what we do, just asking to make sure I understand what you mean.
How do you control which gl states are dirtified by which d3d state? This
will depend on the combination of backends you use.

I'm not sure about the exact splitup you're using here, but it would
mean potentially dirtifying multiple states, yes. I could imagine it
as simply dirtifying the vertexdeclaration state on both the vertex
and fragment state tables.
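
One way to picture this is a lookup table from D3D-side state to the GL-side state tables that depend on it. The sketch below is only an assumption about how such a table could look: the state ids are invented for the example (they are not the real wined3d STATE_* values) and the mapping itself is illustrative, not a statement about which block each state belongs to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PART_VERTEX   0x1u
#define PART_FRAGMENT 0x2u
#define PART_OTHER    0x4u

/* Hypothetical D3D-side state ids. */
enum d3d_state {
    D3D_STATE_VDECL,
    D3D_STATE_COLOROP,
    D3D_STATE_SCISSORRECT,
    D3D_STATE_COUNT
};

/* Which GL-side state tables read each D3D state (illustrative mapping). */
static const unsigned int state_parts[D3D_STATE_COUNT] = {
    [D3D_STATE_VDECL]       = PART_VERTEX | PART_FRAGMENT,
    [D3D_STATE_COLOROP]     = PART_FRAGMENT,
    [D3D_STATE_SCISSORRECT] = PART_OTHER
};

struct dirty_tables {
    bool vertex[D3D_STATE_COUNT];
    bool fragment[D3D_STATE_COUNT];
    bool other[D3D_STATE_COUNT];
};

/* One D3D state change may dirtify the same entry in several tables. */
static void mark_state_dirty(struct dirty_tables *t, enum d3d_state state)
{
    unsigned int parts;

    assert(t != NULL && state < D3D_STATE_COUNT);
    parts = state_parts[state];
    if (parts & PART_VERTEX)
        t->vertex[state] = true;
    if (parts & PART_FRAGMENT)
        t->fragment[state] = true;
    if (parts & PART_OTHER)
        t->other[state] = true;
}
```

With a static table the setter does not need to know which backends are in use. If the mapping really depends on the backend combination, as asked above, the table would have to be chosen per combination.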

Post by Stefan Dösinger
There are quite a few opengl connections as well, although they work
differently. It's more the various interactions between shader extensions. In
quite a few cases the fragment processing implementation has to configure the
vertex processing correctly to feed it in the right way, and also the other
way round.
For example, to stick to the texture transform flags. Let's consider we're
using fixed function D3D vertex processing in whatever GL extension. Now
enter fragment processing and D3DTTFF_PROJECTED:
-> With fixed function GL or a NVTS fixed function replacement we have to make
sure that the 4th coordinate is 1.0 to disable the GL division if
TTFF_PROJECTED is not set, and if it is set with TTFF_COUNT3 make sure that
the 3rd coord is copied to the 4th
-> With GLSL or ARB fixed function replacement we can handle the lack of
TTFF_PROJECTED properly, but not COUNT3
-> With ATIFS we can handle everything properly in the replacement shader
-> With an ARB, GLSL or ATIFS D3D shader we don't need any special texture
transform fixups
-> With an NVTS D3D shader we have to take care about disabling projected
textures in vertex processing again
That means different fragment processing implementations have different vertex
processing requirements. Now you could make that a flag in the fragment
processing and pixel shader implementation. You'd need 4
flags(nonshader_unprojected, shader_unprojected, nonshader_count3,
shader_count3). Are you sure the flags won't grow out of control?

You only need two. One to toggle writing 1.0 to the 4th coordinate
when needed, and one to toggle copying the 3rd coordinate to the 4th
when needed. It would certainly beat doing an extension check for
every possible backend. Right now we always do the fixup, so in that
respect it would be an improvement as well.
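
The two-flag idea could look roughly like the sketch below. The flag names and fixup_texcoord() are hypothetical, and for simplicity the fixup is applied to a single texture coordinate rather than to a texture matrix. The fragment side only advertises what it cannot handle itself; the vertex side applies the fixup when the D3D state calls for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Requirements advertised by a fragment processing implementation
 * (hypothetical names). */
#define FRAG_NEEDS_UNPROJECTED_FIXUP 0x1u /* write 1.0 to the 4th coordinate */
#define FRAG_NEEDS_COUNT3_FIXUP      0x2u /* copy the 3rd coordinate to the 4th */

/* Vertex-side fixup, driven by the D3D texture transform flags.
 * projected: D3DTTFF_PROJECTED is set; count: the D3DTTFF_COUNTn value. */
static void fixup_texcoord(float coord[4], unsigned int frag_flags,
                           bool projected, unsigned int count)
{
    assert(coord != NULL);
    if (!projected) {
        if (frag_flags & FRAG_NEEDS_UNPROJECTED_FIXUP)
            coord[3] = 1.0f; /* dividing by 1.0 disables the projection */
    } else if (count == 3) {
        if (frag_flags & FRAG_NEEDS_COUNT3_FIXUP)
            coord[3] = coord[2]; /* divide by the 3rd coordinate instead */
    }
}
```

Going by the list quoted above, fixed function GL and NVTS would set both flags, a GLSL or ARB replacement only the COUNT3 one, and ATIFS neither.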

Post by Stefan Dösinger
Another example is fogging. Fog is overwritten by ARB and GLSL, but not ATIFS
and NVTS(as far as I can see). Is fog a vertex or fragment state? How do you
share the quite complex fog applying code between the ATIFS, NVTS and GL
fixed function implementation if you make it a fragment state?

That depends on the fog type. Vertex fog is a vertex state, fragment
fog is a fragment state. Changing the type would obviously have
interactions with both parts of the pipeline. As for applying the
state, there's no reason different implementations can't call a common
function in eg. utils.c to calculate things like the fog mode, type,
start, end, etc. One could argue it doesn't belong in state_fog() in
the first place.
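
A sketch of what such a shared helper might look like, under simplifying assumptions: compute_fog_params() and its types are invented here, the rule that table (pixel) fog wins when both modes are set follows the D3D documentation, and vertex-supplied fog coordinates, XYZRHW vertices and "foggy" shaders are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fog_mode { FOG_NONE, FOG_EXP, FOG_EXP2, FOG_LINEAR };
enum fog_stage { FOG_STAGE_OFF, FOG_STAGE_VERTEX, FOG_STAGE_FRAGMENT };

/* The D3D-side render states the helper reads (hypothetical struct). */
struct d3d_fog_state {
    bool enable;               /* D3DRS_FOGENABLE */
    enum fog_mode vertex_mode; /* D3DRS_FOGVERTEXMODE */
    enum fog_mode table_mode;  /* D3DRS_FOGTABLEMODE */
    float start, end;          /* D3DRS_FOGSTART, D3DRS_FOGEND */
};

/* Backend-independent result that ATIFS, NVTS, GL fixed function or a
 * replacement shader could each translate in their own way. */
struct fog_params {
    enum fog_stage stage;
    enum fog_mode mode;
    float start, end;
};

static struct fog_params compute_fog_params(const struct d3d_fog_state *s)
{
    struct fog_params p = { FOG_STAGE_OFF, FOG_NONE, 0.0f, 0.0f };

    assert(s != NULL);
    if (!s->enable)
        return p;

    if (s->table_mode != FOG_NONE) {
        /* Table fog takes precedence over vertex fog. */
        p.stage = FOG_STAGE_FRAGMENT;
        p.mode = s->table_mode;
    } else if (s->vertex_mode != FOG_NONE) {
        p.stage = FOG_STAGE_VERTEX;
        p.mode = s->vertex_mode;
    } else {
        /* Simplification: treated as off in this sketch. */
        return p;
    }
    p.start = s->start;
    p.end = s->end;
    return p;
}
```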

Stefan Dösinger

2008-04-12 15:05:26 UTC

Post by H. Verbeet
It doesn't get ignored, you still do the upload and the data is still
available should the shader choose to use it. Still, I probably
should've phrased it as "functionality that gets replaced by a vertex
/ fragment shader".

But since D3D vertex shaders always read the numbered arrays and fixed
function always reads the named arrays the named arrays get de-facto replaced
as far as we're concerned.

Post by Stefan Dösinger
That means different fragment processing implementations have different
vertex processing requirements. Now you could make that a flag in the
fragment processing and pixel shader implementation. You'd need 4
flags(nonshader_unprojected, shader_unprojected, nonshader_count3,
shader_count3). Are you sure the flags won't grow out of control?

I'm working on a patch that makes the atifs shader code take care of applying
the texture transform matrix. That works naturally without any flags or
backend check if you don't try to split vertex and fragment processing by
force.

Post by H. Verbeet
That depends on the fog type. Vertex fog is a vertex state, fragment
fog is a fragment state. Changing the type would obviously have
interactions with both parts of the pipeline.

On the GL side both vertex and fragment fog are applied to the same GL state.
Using an ARBFP or GLSL fragment shader replaces vertex fog as well, so you'll
have to implement both types in the fragment processing replacement.

The fog settings depend on the vertex decl(XYZ vs XYZRHW) and the shader
properties("foggy shader"). That means the fragment processing code(pixel
shader and ffp replacement) would look at the core of the vertex processing
settings. Doesn't that defeat separating them in the first place?

(I understand that you want to split the state types on the GL side, not the
D3D side. But when you split applying of one D3D state in 3 pieces, I fail to
see how it is cleaner)

You didn't answer how you plan to implement state dirtification. You have a
SetRenderState call that changes render state X. Which implementation
state(s) do you dirtify? Ie:

Post by H. Verbeet
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);

Where does "state" come from?

Post by H. Verbeet
if (!use_vs) {
device->vertex_state_manager->apply_states(device->vertex_private_data);
}
...

This would be done where?

Post by Stefan Dösinger
Where would you write the TEXCOORD0-7 and D3DCOLOR0 and 1 varyings from
a GLSL vertex shader, and where do you read them from in the pixel
shader? Keep indirect varying addressing in the pshader in mind.

Just to be clear, with "GLSL vertex shader" you mean "GLSL vertex
processing replacement", right? A vertex processing replacement shader
would write to the regular fixed function output, ie gl_FrontColor,
gl_FrontSecondaryColor, gl_TexCoord[], etc. The fragment shader would
read them the same way as it does when paired with fixed function or
pre-3.0 vertex shaders.

No, I mean a D3D (3.0) vertex shader that is translated and running via GLSL.
Currently the pixel shader decides where the vertex shader (all versions)
writes the varyings to (generate_param_reorder_function). If a fixed function
vertex replacement shader and a pre-3.0 shader write to the regular fixed
function output, how would you run a pixel shader like the one in
vshader_version_varying_test() in visual.c together with a 1.x or 2.0 vertex
shader, or XYZRHW data from a GLSL vertex pipeline replacement shader? (With
the fixed function GL vertex pipeline we're screwed, but we aren't
necessarily screwed with a GLSL vertex pipeline replacement)

Post by H. Verbeet
Pixel shaders + fragment processing replacement doesn't make sense.
Either a GLSL vertex processing replacement + GLSL pixel shader or a
GLSL vertex shader + GLSL fragment processing replacement would work
though. The "GLSL pipeline object" would know if it's being used as
vertex and/or fragment replacement and link everything together. In
case atifs is used no linking is required.

What I meant was that: Scenario 1:
GLSL is used for pixel shaders and fragment pipeline replacement. How do I
find out if I have to link the fragment replacement GLSL shader or the pixel
shader GLSL shader into my program I activate? -> Share private data between
GLSL shader backend and GLSL fixed function object

Followup scenario 2: We don't have a GLSL fixed function backend yet. I have
an ATI X1600 card, I am using GLSL for pixel shaders, and I am using ATIFS
for the pipeline replacement. The GLSL pixel shader code reads the GLSL
fragment pipeline replacement private data. -> This GLSL fragment replacement
private data is either nonexistent or hanging around without code maintaining
it. How do we deal with that?

Post by Stefan Dösinger
They are pretty widespread and are even used in the EEEPC, so I think
dealing with these cards will become a priority soon, at least for me.
Unfortunately the driver sucks in terms of stability and performance.

Are those cards powerful enough to support a fixed function replacement?

Afaik the older cards do not support vertex processing, it's done in software,
so it depends on the driver. On the fragment side they are pretty solid and
work only with programmable fragment processing internally. The newer X3100
only supports programmable vertex and fragment processing in hardware, so a
pipeline replacement is either done by us or the driver. It might be powerful
enough for GLSL as well though.

H. Verbeet

2008-04-12 17:55:53 UTC

Post by Stefan Dösinger
But since D3D vertex shaders always read the numbered arrays and fixed
function always reads the named arrays the named arrays get de-facto replaced
as far as we're concerned.

Which is completely irrelevant for classifying the operation.
Please read the ARB_vertex_program spec, issue 3 and the
ARB_fragment_program spec, issue 13 to get a better idea of what
shaders replace and what they don't.

Post by H. Verbeet
You only need two. One to toggle writing 1.0 to the 4th coordinate
when needed, and one to toggle copying the 3rd coordinate to the 4th
when needed. It would certainly beat doing an extension check for
every possible backend. Right now we always do the fixup, so in that
respect it would be an improvement as well.

If the issue is that you've got an interest in keeping the existing
structure because you've already written code on top of it there's not
much point in having this discussion in the first place. If that's not
the issue, I'd like to mention that the whole point of having
interfaces is that you can avoid ugliness like setting vertex
processing state in the fragment processing part of the pipeline.
There's also nothing forceful about splitting your pipeline in vertex
and fragment processing, that's how the hardware works, it's how GL
works, and it's how D3D works.

Post by H. Verbeet
That depends on the fog type. Vertex fog is a vertex state, fragment
fog is a fragment state. Changing the type would obviously have
interactions with both parts of the pipeline.

The fog blending is a fragment operation, yes. Coordinate calculation
depends on the coordinate source and fog hint, and can happen either
during vertex processing, fragment processing or not at all if fog
coordinates are specified.

Post by Stefan Dösinger
The fog settings depend on the vertex decl(XYZ vs XYZRHW) and the shader
properties("foggy shader"). That means the fragment processing code(pixel
shader and ffp replacement) would look at the core of the vertex processing
settings. Doesn't that defeat separating them in the first place?

No, like you correctly mention below, the point is to separate the
implementation, not where the implementation gets its information from
on the D3D side.

Post by Stefan Dösinger
(I understand that you want to split the state types on the GL side, not the
D3D side. But when you split applying of one D3D state in 3 pieces, I fail to
see how it is cleaner)

This would hardly be something new. You mentioned the
vertexdeclaration state yourself that modifies multiple GL states, and
it's hardly the only one.

Post by Stefan Dösinger
You didn't answer how you plan to implement state dirtification. You have a
SetRenderState call that changes render state X. Which implementation
state(s) do you dirtify? Ie:
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);

Where does "state" come from?

I don't remember you asking, but I see no reason to change the basic
way dirtification is currently done.

Post by H. Verbeet
if (!use_vs) {
device->vertex_state_manager->apply_states(device->vertex_private_data);
}
...

This would be done where?

ActivateContext, CTXUSAGE_DRAWPRIM. (Yes it should probably be part of
the context, not the device, my bad.)
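
A compilable sketch of that draw-time step, loosely following the struct proposed earlier in the thread. The function name apply_states_for_draw() and the counting manager are hypothetical, and where exactly this would sit inside ActivateContext is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced version of the proposed interface: only apply_states. */
struct state_management_t {
    void (*apply_states)(void *private_data);
};

struct device {
    const struct state_management_t *vertex_state_manager;
    const struct state_management_t *fragment_state_manager;
    const struct state_management_t *other_state_manager;
    void *vertex_private_data;
    void *fragment_private_data;
    void *other_private_data;
};

/* Skip a whole block when a shader replaces that part of the pipeline. */
static void apply_states_for_draw(struct device *device, bool use_vs, bool use_ps)
{
    assert(device != NULL);
    if (!use_vs)
        device->vertex_state_manager->apply_states(device->vertex_private_data);
    if (!use_ps)
        device->fragment_state_manager->apply_states(device->fragment_private_data);
    device->other_state_manager->apply_states(device->other_private_data);
}

/* Stand-in manager for the example: counts how often it was applied. */
static void counting_apply_states(void *private_data)
{
    ++*(unsigned int *)private_data;
}

static const struct state_management_t counting_manager = {
    counting_apply_states
};
```

Note that use_vs and use_ps are evaluated on every draw here, which is the polling that comes up later in the discussion.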

Post by Stefan Dösinger
No, I mean a D3D (3.0) vertex shader that is translated and running via GLSL.
Currently the pixel shader decides where the vertex shader (all versions)
writes the varyings to (generate_param_reorder_function). If a fixed function
vertex replacement shader and a pre-3.0 shader write to the regular fixed
function output, how would you run a pixel shader like the one in
vshader_version_varying_test() in visual.c together with a 1.x or 2.0 vertex
shader, or XYZRHW data from a GLSL vertex pipeline replacement shader? (With
the fixed function GL vertex pipeline we're screwed, but we aren't
necessarily screwed with a GLSL vertex pipeline replacement)

The pipeline object would certainly have access to all the information
required to create such a reordering function, but I fail to see how
it's relevant at this point. The idea here certainly isn't to
magically fix all state and shader related issues in wined3d, it's
just about making fixed function replacement shaders possible in a
maintainable way.

Post by Stefan Dösinger
GLSL is used for pixel shaders and fragment pipeline replacement. How do I
find out if I have to link the fragment replacement GLSL shader or the pixel
shader GLSL shader into my program I activate? -> Share private data between
GLSL shader backend and GLSL fixed function object
Followup scenario 2: We don't have a GLSL fixed function backend yet. I have
an ATI X1600 card, I am using GLSL for pixel shaders, and I am using ATIFS
for the pipeline replacement. The GLSL pixel shader code reads the GLSL
fragment pipeline replacement private data. -> This GLSL fragment replacement
private data is either nonexistent or hanging around without code maintaining
it. How do we deal with that?

Not exactly. In the first place, if there's no GLSL fixed function
implementation you'll never have to link to it, simple as that. Now in
case there *is* a GLSL fixed function replacement but it isn't used
(for whatever reason), that simply means the GLSL pipeline object's
private data will tell it it doesn't have to link anything (or rather,
*not* tell it it has to link to the ffp replacement).

There really is no distinction between "pixel shader private data" and
"ffp replacement private data", they're both pointers to the same
block of memory, and eg. the shader backend will always only get
passed its own private data.

I'd also like to note that most of the issues you bring up are not new
or specific to this design at all, and some of them wouldn't work at
all with the current structure.

Stefan Dösinger

2008-04-12 22:28:16 UTC

Post by H. Verbeet
Which is completely irrelevant for classifying the operation.
Please read the ARB_vertex_program spec, issue 3 and the
ARB_fragment_program spec, issue 13 to get a better idea of what
shaders replace and what they don't.

Yes, GL says so, but I think strictly sticking to the GL classification is a
bad idea, see below.

Post by H. Verbeet
If the issue is that you've got an interest in keeping the existing
structure because you've already written code on top of it there's not
much point in having this discussion in the first place. If that's not
the issue, I'd like to mention that the whole point of having
interfaces is that you can avoid ugliness like setting vertex
processing state in the fragment processing part of the pipeline.
There's also nothing forceful about splitting your pipeline in vertex
and fragment processing, that's how the hardware works, it's how GL
works, and it's how D3D works.

I agree that GL works that way (fragment-vertex split), and almost certainly
graphics hardware as well, although we don't know unless we look at the
drivers (how the hardware works is irrelevant since GL abstracts that).
However, D3D does *not* work that way, otherwise there would not be any issue
to discuss here, since then D3D states would perfectly match the GL ones.

As I understand it, this discussion comes down to designing the state
setting interface and implementation in an OpenGL oriented way or a D3D
oriented way.

@ keeping old code: That's not too much of an issue, but you attacked my
design on the grounds of a missing implementation of a functionality, so I
explained how that would be implemented.

Post by H. Verbeet
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);

Where does "state" come from?

I don't remember you asking, but I see no reason to change the basic
way dirtification is currently done.

How does SetRenderState (or any other state setter) know whether it has to
dirtify a fragment, vertex or misc state, or several of them?

Post by H. Verbeet
if (!use_vs) {
device->vertex_state_manager->apply_states(device->vertex_private_data);
}
...

This would be done where?

ActivateContext, CTXUSAGE_DRAWPRIM. (Yes it should probably be part of
the context, not the device, my bad.)

That means polling the states that use_vs() and use_ps() check for changes
instead of getting notified about changes, as well as polling the
SetPixelShader and SetVertexShader settings(via select_shader()). Avoiding
polling was one of the goals of the state management rewrite more than a year
ago. I don't think there's any relevant performance penalty in doing the
little polling you suggest, but where do we draw the line?

Post by H. Verbeet
The pipeline object would certainly have access to all the information
required to create such a reordering function, but I fail to see how
it's relevant at this point. The idea here certainly isn't to
magically fix all state and shader related issues in wined3d, it's
just about making fixed function replacement shaders possible in a
maintainable way.

It is relevant insofar as it affects which issues we can later fix with the
pipeline replacement and which we cannot. If the GLSL vertex pipeline
replacement shader can only write to the builtin varyings because the pixel
shader input expects it that way, we'll never be able to fix the todo_wines
in fixed_function_varying_test() in visual.c.

Post by H. Verbeet
...
There really is no distinction between "pixel shader private data" and
"ffp replacement private data", they're both pointers to the same
block of memory, and eg. the shader backend will always only get
passed its own private data.

Point taken, I see now how you can sort out the fixed function
replacement vs shader activation in the same backend via the private data.
You'll have to watch out though that the default initialization value is a
valid "do not load a replacement shader" request.
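
The shared private data and the initialization concern can be sketched as follows. Everything here is hypothetical (glsl_priv, the three functions, and the convention that id 0 means "nothing"); the point is only that both interfaces receive the same pointer and that zero-initialized data already means "link no replacement shader".

```c
#include <assert.h>
#include <stddef.h>

/* One block of memory shared by the GLSL shader backend and the GLSL fixed
 * function replacement. 0 is assumed to mean "none" for both fields. */
struct glsl_priv {
    unsigned int pixel_shader;        /* D3D pixel shader, set by the backend */
    unsigned int ffp_fragment_shader; /* replacement shader, set by the ffp code */
};

/* Shader backend interface: gets passed its own private data. */
static void glsl_select_pixel_shader(void *priv, unsigned int shader_id)
{
    struct glsl_priv *p = priv;
    assert(p != NULL);
    p->pixel_shader = shader_id;
}

/* Fragment replacement interface: same memory, different entry point. */
static void glsl_ffp_set_fragment_replacement(void *priv, unsigned int shader_id)
{
    struct glsl_priv *p = priv;
    assert(p != NULL);
    p->ffp_fragment_shader = shader_id;
}

/* What the pipeline object links on the fragment side. A D3D pixel shader
 * wins over the replacement; 0 means there is nothing to link, e.g. when
 * ATIFS handles fixed function fragment processing instead. */
static unsigned int glsl_fragment_shader_to_link(const void *priv)
{
    const struct glsl_priv *p = priv;
    assert(p != NULL);
    if (p->pixel_shader)
        return p->pixel_shader;
    return p->ffp_fragment_shader;
}
```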

Post by H. Verbeet
I'd also like to note that most of the issues you bring up are not new
or specific to this design at all, and some of them wouldn't work at
all with the current structure.

Surely they aren't specific, but I do not see any that would not work with the
current structure. Using ATIFS fragment processing + ARBFP / GLSL
pixel shaders needs 3 inherited ATIFS shader backends, which is ugly, I
agree. But it does not hide the ugliness, and it makes the issues that mixing
GL shader functionality causes explicit. It does not limit the fixed function
replacement to the lowest common denominator (GL fixed function) and allows us
to flexibly use additional features of GL extensions or GLSL to fix bugs we'd
otherwise have to mark as WONTFIX.

I think designing the state setting interface in a D3D oriented way is better
because it is opengl extension independent. If the state setting API is built
on the GL state classifications some additional layer to deal with the
differences of the state implementations is needed (e.g. by giving vertex,
fragment and misc state setters a full state table each). Additionally with
the splitup there is the already discussed hidden issue of implementation
interactions which I don't think can be consumed in the abstraction layers,
and the mentioned issue with the lowest common denominator interface of the
fixed function replacement shaders.

Ivan Gyurdiev

2008-04-12 02:27:01 UTC

Post by Stefan Dösinger
Alexandre didn't commit the patch, I think we should come to an agreement on
this issue, otherwise it is going to come up again and again.

The fundamental issue is pretty straightforward - not sure why it's so
difficult to come to an agreement.

- You want to mix and match vertex and fragment GL backends
- The only maintainable way to do that is to define an interface
between vertex and fragment objects

- You're concerned about the interface constraining the ability of
fragment to talk to vertex
- Write a smarter interface, I also suggested an object to manage it
(linker).

You can't have it both ways - ability to mix and match backends, and
unconstrained interface. The interface doesn't have to be "the lowest
common denominator" - it could be the highest common denominator if
properly written. As Henri pointed out the fact that both GL and the
hardware are vertex/fragment aligned naturally suggests we should break
up things the same way. Even the D3D programmable pipeline is broken up
this way (there are Pixel and Vertex shader objects) - and the fixed
pipeline is going away, so if anything we should move away from its
interface.

- You want to replace the fixed pipeline using GL shaders.
- Then replace the fixed pipeline - this has nothing to do with the
programmable pipeline, therefore should not affect any programmable
pipeline interfaces.

You've pointed out that shader_backend_t is more aligned to a GL
extension than it is to a D3D shader. If that's the case I see little
value to having that interface - it will just cause confusion as
unrelated functionalities are packed into it. The disparity between GL
extensions will get greater as we try to add new features using a
particular extension, and we'll see a lot of functions that are just
empty forwards to other "shader backends", since they don't support
something themselves.

Ivan

Stefan Dösinger

2008-04-13 19:26:19 UTC

Post by Ivan Gyurdiev

Post by Stefan Dösinger
Alexandre didn't commit the patch, I think we should come to an agreement
on this issue, otherwise it is going to come up again and again.

I certainly see the advantages of a constrained interface, I just didn't
see (and still don't see) how it can be designed cleanly without greatly
limiting functionality of the pipeline / shader implementation.

I discussed the topic with Henri on IRC again (@Henri: Please correct me if I
misunderstood you), and he explained that his plans involve making GLSL
vertex shaders, GLSL vertex replacement, GLSL fragment shaders and GLSL
fragment replacement one object with different interfaces. So we can have
backchannel communication between the various interface implementations (e.g.
flags or private data), which keeps everything flexible. It's not exactly
nice (backchannel communication somewhat defeats the point of interfaces), but
my design has ugliness as well, so I can live with that.

What's most important to me about this is that we don't have to close any bug
as WONTFIX due to design constraints. So I can stop feeling strongly against
splitting up the interface since the implementations remain the same.

Now the main issue with not splitting up the interfaces I see is that it is
unclear what code in state.c(or the shader backend's state table) changes
which GL state. If state_something changes both vertex and fragment GL states
I can't overwrite it properly in the ATIFS/NVRC code without messing with the
vertex side as well.

I have a few remaining issues though:
-> With splitting up the state table there are now 5 "root" states which have
to be polled for changes in CTXUSAGE_DRAWPRIM setting, also the
GL_TEXTURE_SHADER_NV and GL_FRAGMENT_SHADER_ATI states need to be polled for
enabling/disabling. Is there no way to avoid that?

-> Some cross pipeline part communication issues are still remaining, see
below

-> Increased state dirtification complexity: Now each Set*State has to find
out which part of the pipeline it has to dirtify (a switch-case statement or
probably a table lookup), and has to dirtify up to 3 pipelines. That's not
precisely going to help performance.

I know that Ivan doesn't really care about that, especially since it's not an
algorithmic complexity change. However, performance is a top priority issue,
and no gamer will accept the next-gen hardware excuse for inefficient code.
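To make the dirtification cost concrete, here is a minimal sketch of the
table lookup variant mentioned above. Everything in it is an assumption for
illustration: the state names, the part names and the state-to-part
assignments are made up and are not the real WineD3D identifiers.

```c
#include <assert.h>

/* Hypothetical pipeline parts and states, for illustration only. */
enum demo_part {
    DEMO_PART_VERTEX,
    DEMO_PART_FRAGMENT,
    DEMO_PART_MISC,
    DEMO_NUM_PARTS
};

enum demo_state {
    DEMO_STATE_LIGHTING,
    DEMO_STATE_COLOROP,
    DEMO_STATE_FOG,
    DEMO_STATE_SCISSOR,
    DEMO_STATE_COUNT
};

#define DEMO_PART_BIT(p) (1u << (p))

/* Which pipeline parts each state touches. The assignments are invented. */
static const unsigned int demo_state_parts[DEMO_STATE_COUNT] = {
    /* DEMO_STATE_LIGHTING */ DEMO_PART_BIT(DEMO_PART_VERTEX),
    /* DEMO_STATE_COLOROP  */ DEMO_PART_BIT(DEMO_PART_FRAGMENT),
    /* DEMO_STATE_FOG      */ DEMO_PART_BIT(DEMO_PART_VERTEX)
                            | DEMO_PART_BIT(DEMO_PART_FRAGMENT),
    /* DEMO_STATE_SCISSOR  */ DEMO_PART_BIT(DEMO_PART_MISC)
};

/* One dirty flag per (pipeline part, state) pair. */
struct demo_dirty {
    unsigned char dirty[DEMO_NUM_PARTS][DEMO_STATE_COUNT];
};

/* Marks the state dirty in every part that uses it and returns how many
 * parts were touched. This loop is the extra per-Set*State work. */
static unsigned int demo_dirtify(struct demo_dirty *d, unsigned int state)
{
    unsigned int part, touched = 0;

    assert(state < DEMO_STATE_COUNT);
    for (part = 0; part < DEMO_NUM_PARTS; ++part) {
        if (demo_state_parts[state] & DEMO_PART_BIT(part)) {
            d->dirty[part][state] = 1;
            ++touched;
        }
    }
    return touched;
}
```

With a single state table the same call would set one flag; here a state
such as the made-up DEMO_STATE_FOG sets two. The difference is a constant
factor, not a change in algorithmic complexity.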

I mainly want to avoid more bad PR like this:
http://www.phoronix.com/scan.php?page=article&item=938
http://www.phoronix.com/scan.php?page=article&item=crossover_games

The 2nd article would be good for me if it was due to a tuneup in cxgames, but
it is a regression in wine instead. Yes, that's phoronix and all, and the
first article's regressions are technically perfectly explainable. Still, new
features don't have to come with performance costs. I am seeing a ~5%
performance regression myself, and I don't know where it comes from.

So I propose the following plan:

1) We commit the patch to fix fglrx

2) We keep the shader / ffp interface as it is for Wine 1.0. We freeze in two
weeks. I am away for one week now, so if anyone wants any shader interface
changes in 1.0 he'll have to do it himself

3) We investigate where the recent performance regression(s) came from

4) We audit the state handlers in state.c and find out which D3D state handler
changes which parts of the GL pipeline and find states that touch more than
one part and why.

5) Build a battle plan for how to separate the following D3D states into the
various GL pipeline parts:

-> Vertex shaders - streams. The tricky part here is that the fixed function
GL vertex pipeline needs named arrays, while an ARB/GLSL vertex pipeline
needs numbered arrays (otherwise no vertex blending emulation). How do we
communicate the need for numbered arrays, and the chosen assignment?

-> vertex decl - loaded pointers. We're currently checking the vertex shader
and fog states when a vertex buffer offset is changed, which is not needed

-> Samplers - GL_TEXTURE_xD enable - colorop. That's a major pain for ATIFS
and even more so for NVTS. I haven't found a nicer implementation using split
up interfaces

-> How do we deal with the depth blit shaders? Do they belong to the shader
backend, or to something else?

-> Should we move the SetupForBlit to some of the shader code? The blitting
and state switching might be more efficient if we're using shaders for it and
just set an ARB / GLSL shader instead of falling back to the absolute low
level limit and killing all states

-> Texture transform flags, clipping, more?

My ultimate hope is to have a clean assignment of each piece of code
referenced by the state table to a pipeline part, so that instead of splitting
up the state table in code we have programming guidelines about which part of
the state table may be changed by which pipeline replacement, avoiding the
additional run-time costs. If that turns out to be impossible we need a clear
assignment anyway for a splitup.

6) Investigate the performance implications of the state management, state
polling, the current conditional state linking state dirtification checks and
the driver side cost of fixed function state changing while a

What are your opinions? (btw, I don't think I can implement that stuff alone
anytime soon, I am pretty busy over the next few months)

Post by Ivan Gyurdiev
Even the D3D programmable pipeline is broken up
this way (there are Pixel and Vertex shader objects) - and the fixed
pipeline is going away, so if anything we should move away from its
interface.

One could argue that the name "pixel shader" already shows that Direct3D does
not separate the pipeline parts properly(GL_NV_texture_shader, issue 1).
Ironically pixel processing is one of the only remaining parts of Direct3D10
that is not programmable.

Stefan Dösinger

2008-04-14 08:26:37 UTC

A small implementation change plan:

With Henri's suggestion, we have two levels of indirection, one that maps a
D3D state to a pipeline part and a state, and one that maps a pipeline part
and a state to an application function. Since both states are known at latest
at device creation time, we can remove one indirection there.

So my suggestion is this: Instead of trying to keep the single table in code,
add 3 pipeline stage backends, like Henri suggested: a vertex one, a fragment
one and a misc backend. The pipeline backend has a description structure which
contains private data creation and destruction callbacks, possibly some flags
to tell other parts of the pipeline how to communicate with it, and a set of
states and application functions:

struct pipeline_backend {
    DWORD state;          /* State this sets */
    apply_func apply;     /* Apply function like the current state mgmt uses */
    DWORD representative; /* For state grouping */
};

state and representative work like the current state identifiers via
STATE_RENDER(x), STATE_TEXTURESTAGE(x, y), etc. The apply function takes the
stateblock, state and context as argument, as usual.
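As a rough illustration of what the representative field is for, the sketch
below shows grouped states sharing one dirty entry. It assumes that
dirtifying a state marks its representative rather than the state itself;
the state names and the grouping are hypothetical and not taken from the
WineD3D sources.

```c
#include <assert.h>

/* Hypothetical states; vdecl and vertex shader are grouped together. */
enum {
    DEMO_VDECL,
    DEMO_VSHADER,
    DEMO_STREAMSRC,
    DEMO_FOGENABLE,
    DEMO_STATE_MAX
};

/* representative[s] is the state whose handler also covers s. */
static const unsigned int demo_representative[DEMO_STATE_MAX] = {
    /* DEMO_VDECL     */ DEMO_VDECL,
    /* DEMO_VSHADER   */ DEMO_VDECL,
    /* DEMO_STREAMSRC */ DEMO_STREAMSRC,
    /* DEMO_FOGENABLE */ DEMO_FOGENABLE
};

struct demo_context {
    unsigned char is_dirty[DEMO_STATE_MAX];   /* indexed by representative */
    unsigned int dirty_list[DEMO_STATE_MAX];  /* representatives to apply */
    unsigned int num_dirty;
};

/* Dirtify a state by queueing its representative at most once. */
static void demo_mark_dirty(struct demo_context *ctx, unsigned int state)
{
    unsigned int rep;

    assert(state < DEMO_STATE_MAX);
    rep = demo_representative[state];
    if (ctx->is_dirty[rep])
        return; /* the group is already queued */
    ctx->is_dirty[rep] = 1;
    ctx->dirty_list[ctx->num_dirty++] = rep;
}
```

Changing both the vertex declaration and the vertex shader before a draw
then queues a single apply call for the whole group.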

The current global state table(s) are removed. At device creation(or Init3D or
somewhere else) we select a vertex, fragment and misc state backend and a
shader backend. The device contains a full state table like the current
FFPStateTable / ATIFSStateTable. A pipeline compiler(is there some better
name?) iterates over the 3 partial state tables and inserts them in the
device's state table. If a state is handled by more than one pipeline part, a
helper function can be used which calls the callbacks in a row. That's not as
efficient as the current inlining, but it should be at least as fast as in
Henri's proposal, and I can live with that.
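A minimal sketch of such a pipeline compiler follows, under stated
assumptions: the type and function names are invented for this mail, the
partial tables are terminated by an entry with a NULL apply function, and
the state count is a tiny stand-in for the real one. It is not the actual
WineD3D code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_STATES   8 /* stand-in for the real number of states */
#define DEMO_MAX_HANDLERS 3 /* vertex, fragment, misc */

typedef void (*demo_apply_func)(unsigned int state, void *stateblock,
                                void *context);

/* One entry of a partial (per pipeline part) state table. */
struct demo_state_entry {
    unsigned int state;
    demo_apply_func apply;
    unsigned int representative;
};

/* One slot of the merged per-device state table. */
struct demo_device_slot {
    demo_apply_func handlers[DEMO_MAX_HANDLERS];
    size_t handler_count;
    unsigned int representative;
};

/* Insert one partial table into the device table. */
static void demo_compile_part(struct demo_device_slot *table,
                              const struct demo_state_entry *part)
{
    for (; part->apply; ++part) {
        struct demo_device_slot *slot;

        assert(part->state < DEMO_NUM_STATES);
        slot = &table[part->state];
        assert(slot->handler_count < DEMO_MAX_HANDLERS);
        slot->handlers[slot->handler_count++] = part->apply;
        /* Last writer wins; conflicting representatives are unresolved. */
        slot->representative = part->representative;
    }
}

/* The "pipeline compiler": reset the table, then merge the three parts. */
static void demo_compile_pipeline(struct demo_device_slot *table,
                                  const struct demo_state_entry *vertex,
                                  const struct demo_state_entry *fragment,
                                  const struct demo_state_entry *misc)
{
    size_t i, j;

    for (i = 0; i < DEMO_NUM_STATES; ++i) {
        for (j = 0; j < DEMO_MAX_HANDLERS; ++j)
            table[i].handlers[j] = NULL;
        table[i].handler_count = 0;
        table[i].representative = (unsigned int)i;
    }
    demo_compile_part(table, vertex);
    demo_compile_part(table, fragment);
    demo_compile_part(table, misc);
}

/* Helper that calls the callbacks of a shared state in a row. */
static void demo_apply_state(const struct demo_device_slot *table,
                             unsigned int state, void *stateblock,
                             void *context)
{
    size_t i;

    assert(state < DEMO_NUM_STATES);
    for (i = 0; i < table[state].handler_count; ++i)
        table[state].handlers[i](state, stateblock, context);
}

/* Example handler: counts invocations through the context pointer. */
static void demo_count_call(unsigned int state, void *stateblock,
                            void *context)
{
    (void)state;
    (void)stateblock;
    ++*(int *)context;
}
```

The merge runs once at device creation, so the rendering loop still indexes
a single table. A real implementation could store the handler directly when
a slot has only one, and use the call-in-a-row helper only for shared states.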

Advantages:
-> We can split up pipeline part handlers and select them dynamically etc
-> We have no additional overhead in the rendering loop since we only deal
with one state table there
-> Minimal state polling because pixel and vertex shaders are dirtifyable
states

Problems:
-> The issues from the last mail still apply and need to be solved(not a
problem, just mentioning)

-> Shaders not 100% separated from the fixed function pipeline since both are
equal states in the vertex and fragment pipelines

-> No state handler, nor shader_select, is guaranteed to be applied each draw,
so we can't use them to enable GL_FRAGMENT_SHADER_ATI and
GL_TEXTURE_SHADER_NV. That will have to be a fixed function fragment
pipeline callback. If there's a shader implementation using that extension in
use as well, they can sort this out via private data sharing.
(Note: we could get away with using the colorop setter for this, because after
device creation this state is dirty, after a blit it is dirty, and otherwise
the extension will be enabled anyway. I don't particularly like that, as it is
a rather fragile setup.)

-> What do we do if a state has different representatives in different
pipeline parts? E.g. the vdecl and vertex shader will be linked in both misc
and vertex pipeline backend. In the misc backend, the stream sources will be
added there as well. This means that if a stream source is changed the VDECL
fog/lighting changes will be performed needlessly(it happens currently as
well, so it won't be a regression). There may be other such state groups, but
I can't think of any right now.

Does that sound like a reasonable idea?

Ivan Gyurdiev

2008-04-14 05:08:30 UTC