Skip to content

Floating Point to Fixed and Fixed to Floating Point Conversion#242

Open
sweeksBigBlue wants to merge 15 commits intointel:mainfrom
jcfarwe:fp2fpcvt
Open

Floating Point to Fixed and Fixed to Floating Point Conversion#242
sweeksBigBlue wants to merge 15 commits intointel:mainfrom
jcfarwe:fp2fpcvt

Conversation

@sweeksBigBlue
Copy link
Contributor

This pull request is a base line implementation of PR 112.

In this PR we make a generalized converter for Fixed to Float and Float to Fixed method implementations. This should make it easier in the future for designers to implement transitions between the two elements and expand hardware design capabilities.

@sweeksBigBlue
Copy link
Contributor Author

@desmonddak and @mkorbel1 we are finally ready to review this!

Signed-off-by: James Farwell <james.c.farwell@intel.com>
Copy link
Contributor

@mkorbel1 mkorbel1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good on the right track!

}

/// Losslessly convert a [FloatingPointValue] to a [FixedPointValue].
/// TODO(jcfarwe):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to include these TODO's in this PR or do them?

Also, TODOs should be after // rather than /// (since they are not part of the doc comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

}
}
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about testing with vs without explicitJBit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind expanding on that? Idk what that is but happy to add it to my tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/intel/rohd-hcl/blob/main/doc/components/floating_point.md#explicit-j-bit

I'll test the documentation on you: if this doc doesn't explain it well enough, we should improve it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! I think maybe this would be an excellent second issue I can follow-up on in a different PR maybe? Honest answer is, even with reading that I'd need time to digest that and maybe that should be an expanded method to this base one? Or maybe I'm over complicating that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be a relatively straight-forward change, basically even normal things look like sub-normals with an explicit 1. @desmonddak what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply: it should be quite simple; a small change in how you adjust the mantissa in each direction.
So going from fixed value to normal float value: you would traditionally compute the mantissa for the floating point value and chop off the leading '1' and store the rest of the mantissa into the FPV.

But going to an explicit j-bit form, you would keep that leading 1 and put it into the mantissa. The FPV feels effectively one bit shorter because you need to store the leading '1. Note that you would do this just as you do with subnormals. In other words, if you fixed point was small enough to require a subnormal representation in FPV, you would also keep that leading '1'.

Going from an explicitJBit normal float to fixed, you would recognize that you don't need to prefix the mantissa with a '1' representing the j-bit. The leading '1' is already in the mantissa. This is just like it is for subnormal. If you are converting a subnormal float to fixed, you do not prefix the mantissa with a '1'.

To test, you need to provide an explicitJBit form of floatingPointValue to translate to fixed and conversely, indicate that you want an explicitJBit form of floatingPointValue when converting from fixed.

Explicit JBIT is experimental: we recognize that we do not have to normalize in some situations as it doesn't produce any more accuracy (say adding two narrow mantissa FPs to get a wider mantissa FP). Rather than normalizing, we can leave it in explicit j-bit form for the next operation and save a normalization step, and let normalization happen after the next operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this;
test('FixedPointValue: toFloatingPointValue signed', () {
const m = 10;
const n = 12;
const width = m + n + 1; // 1 for sign bit
for (var i = 0; i < pow(2, width); i++) {
final testVal = (i / pow(2, width)).toInt();
final fxv = FixedPointValue.populator(
integerWidth: m, fractionWidth: n, signed: true)
.ofLogicValue(LogicValue.ofInt(testVal, width));
FloatingPointValue fpv;
if (testVal > 0) {
fpv = fxv.toFloatingPointValue(false); // explicitJBit
} else {
fpv = fxv.toFloatingPointValue(true);
}
expect(fpv.toDouble(), fxv.toDouble(),
reason: 'toFloatingPointValue failed for $testVal');
}
});
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcfarwe has pushed tests for J-bit, ready for review @desmonddak and @mkorbel1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good. Can you merge the signed/unsigned versions into one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you shorten the widths to speed up the tests? 22 bits is quite wide.

I tried the following which ran in about 1s:

    for (final (m, n) in [(5, 7), (7, 5), (0, 7), (7, 0)]) {

});
test('FPV: toFixedPointValue', () async {
//
// TODO(jcfarwe): Find all corner cases, automate test cases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lingering TODO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

}
});
});
test('FPV: toFixedPointValue', () async {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add an exhaustive run as well for at least some width(s)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added exhaustive test for various exponent and mantissa widths


/// Addition operation that returns a [FixedPointValue].
/// Converts a [FixedPointValue] to a [FloatingPointValue].
FloatingPointValue toFloat() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be toFloatingPointValue()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I can fix that if you think that is more appropriate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated name, it is more clear and obvious good catch.

];

for (final testCase in testCases) {
final bias = (pow(2, testCase[1] - 1) - 1).toInt();
Copy link
Contributor

@desmonddak desmonddak Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the sizes and using the FPV populator might be more readable for this test.
It's a bit confusing whether the sizes are for the FPV or the FXV.

See compound_adder_test.dart:262 for an example of list structuring and using Boolean literals. It's not great but it helps to structure the list to help the reader parse (the structure has separated inputs and outputs, for example).

mantissa: LogicValue.zero);
}

// FOR loop to find first one to figure out correct exponent value and width
Copy link
Contributor

@desmonddak desmonddak Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On comments: this can confuse people. Find the first 1 .would be better, or 'Count the leading zeros in order to correct the exponent value.' may be better still. No need to say 'FOR loop'. :j)

If you like to use one, then use it like this one and zero.

Remember to end on a period (Dart conventions).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed comments, good call :)

jcfarwe and others added 8 commits July 21, 2025 16:14
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
for (var testExp = minExp; testExp < maxExp; testExp++) {
final bias = (pow(2, expWidth - 1) - 1).toInt();
final exp = LogicValue.ofInt(testExp, expWidth);
final sign = LogicValue.ofInt(0, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also want to put 'sign in a loop:

for (final sign in [LogicValue.zero, Logicvalue.one]) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

Signed-off-by: James Farwell <james.c.farwell@intel.com>
@sweeksBigBlue
Copy link
Contributor Author

@mkorbel1 and @desmonddak just back from vacation, where are we at with the PR? What changes are still outstanding that you feel needs addressed?

});

test('FixedPointValue: toFloatingPointValue unsigned', () {
for (final (m, n) in [(10, 12), (0, 22), (22, 0)]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely narrow these widths...

@@ -631,4 +631,152 @@ void main() {
}
});
});
Copy link
Contributor

@desmonddak desmonddak Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We like to see a blank line between tests or groups.

).ofLogicValue(finalMantissa);
}

test('FPV: toFixedPointValue exhaustive', () async {
Copy link
Contributor

@desmonddak desmonddak Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is fishy about this test. It only fills the upper 20 bits or so of the integer portion, yet the test is labeled 'exhaustive'.

Here is what I see when I print fxv:

(0 0010000000000000000101101100000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000)
(1 1101111111111111111010010100000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants