Floating Point to Fixed and Fixed to Floating Point Conversion#242
Floating Point to Fixed and Fixed to Floating Point Conversion#242sweeksBigBlue wants to merge 15 commits intointel:mainfrom
Conversation
|
@desmonddak and @mkorbel1 we are finally ready to review this! |
Signed-off-by: James Farwell <james.c.farwell@intel.com>
mkorbel1
left a comment
There was a problem hiding this comment.
Looking very good on the right track!
| } | ||
|
|
||
| /// Losslessly convert a [FloatingPointValue] to a [FixedPointValue]. | ||
| /// TODO(jcfarwe): |
There was a problem hiding this comment.
Did you mean to include these TODO's in this PR or do them?
Also, TODOs should be after // rather than /// (since they are not part of the doc comment)
| } | ||
| } | ||
| }); | ||
|
|
There was a problem hiding this comment.
what about testing with vs without explicitJBit?
There was a problem hiding this comment.
Do you mind expanding on that? Idk what that is but happy to add it to my tests.
There was a problem hiding this comment.
https://github.com/intel/rohd-hcl/blob/main/doc/components/floating_point.md#explicit-j-bit
I'll test the documentation on you: if this doc doesn't explain it well enough, we should improve it!
There was a problem hiding this comment.
Great question! I think maybe this would be an excellent second issue I can follow-up on in a different PR maybe? Honest answer is, even with reading that I'd need time to digest that and maybe that should be an expanded method to this base one? Or maybe I'm over complicating that.
There was a problem hiding this comment.
I think it should be a relatively straight-forward change, basically even normal things look like sub-normals with an explicit 1. @desmonddak what do you think?
There was a problem hiding this comment.
Sorry for the late reply: it should be quite simple; a small change in how you adjust the mantissa in each direction.
So going from fixed value to normal float value: you would traditionally compute the mantissa for the floating point value and chop off the leading '1' and store the rest of the mantissa into the FPV.
But going to an explicit j-bit form, you would keep that leading 1 and put it into the mantissa. The FPV feels effectively one bit shorter because you need to store the leading '1. Note that you would do this just as you do with subnormals. In other words, if you fixed point was small enough to require a subnormal representation in FPV, you would also keep that leading '1'.
Going from an explicitJBit normal float to fixed, you would recognize that you don't need to prefix the mantissa with a '1' representing the j-bit. The leading '1' is already in the mantissa. This is just like it is for subnormal. If you are converting a subnormal float to fixed, you do not prefix the mantissa with a '1'.
To test, you need to provide an explicitJBit form of floatingPointValue to translate to fixed and conversely, indicate that you want an explicitJBit form of floatingPointValue when converting from fixed.
Explicit JBIT is experimental: we recognize that we do not have to normalize in some situations as it doesn't produce any more accuracy (say adding two narrow mantissa FPs to get a wider mantissa FP). Rather than normalizing, we can leave it in explicit j-bit form for the next operation and save a normalization step, and let normalization happen after the next operation.
There was a problem hiding this comment.
Something like this;
test('FixedPointValue: toFloatingPointValue signed', () {
const m = 10;
const n = 12;
const width = m + n + 1; // 1 for sign bit
for (var i = 0; i < pow(2, width); i++) {
final testVal = (i / pow(2, width)).toInt();
final fxv = FixedPointValue.populator(
integerWidth: m, fractionWidth: n, signed: true)
.ofLogicValue(LogicValue.ofInt(testVal, width));
FloatingPointValue fpv;
if (testVal > 0) {
fpv = fxv.toFloatingPointValue(false); // explicitJBit
} else {
fpv = fxv.toFloatingPointValue(true);
}
expect(fpv.toDouble(), fxv.toDouble(),
reason: 'toFloatingPointValue failed for $testVal');
}
});
?
There was a problem hiding this comment.
@jcfarwe has pushed tests for J-bit, ready for review @desmonddak and @mkorbel1
There was a problem hiding this comment.
Tests look good. Can you merge the signed/unsigned versions into one?
There was a problem hiding this comment.
Can you shorten the widths to speed up the tests? 22 bits is quite wide.
I tried the following which ran in about 1s:
for (final (m, n) in [(5, 7), (7, 5), (0, 7), (7, 0)]) {| }); | ||
| test('FPV: toFixedPointValue', () async { | ||
| // | ||
| // TODO(jcfarwe): Find all corner cases, automate test cases. |
| } | ||
| }); | ||
| }); | ||
| test('FPV: toFixedPointValue', () async { |
There was a problem hiding this comment.
maybe add an exhaustive run as well for at least some width(s)?
There was a problem hiding this comment.
Added exhaustive test for various exponent and mantissa widths
|
|
||
| /// Addition operation that returns a [FixedPointValue]. | ||
| /// Converts a [FixedPointValue] to a [FloatingPointValue]. | ||
| FloatingPointValue toFloat() { |
There was a problem hiding this comment.
Should this be toFloatingPointValue()?
There was a problem hiding this comment.
Great point, I can fix that if you think that is more appropriate
There was a problem hiding this comment.
Updated name, it is more clear and obvious good catch.
| ]; | ||
|
|
||
| for (final testCase in testCases) { | ||
| final bias = (pow(2, testCase[1] - 1) - 1).toInt(); |
There was a problem hiding this comment.
Removing the sizes and using the FPV populator might be more readable for this test.
It's a bit confusing whether the sizes are for the FPV or the FXV.
See compound_adder_test.dart:262 for an example of list structuring and using Boolean literals. It's not great but it helps to structure the list to help the reader parse (the structure has separated inputs and outputs, for example).
| mantissa: LogicValue.zero); | ||
| } | ||
|
|
||
| // FOR loop to find first one to figure out correct exponent value and width |
There was a problem hiding this comment.
On comments: this can confuse people. Find the first 1 .would be better, or 'Count the leading zeros in order to correct the exponent value.' may be better still. No need to say 'FOR loop'. :j)
If you like to use one, then use it like this one and zero.
Remember to end on a period (Dart conventions).
There was a problem hiding this comment.
Fixed comments, good call :)
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
| for (var testExp = minExp; testExp < maxExp; testExp++) { | ||
| final bias = (pow(2, expWidth - 1) - 1).toInt(); | ||
| final exp = LogicValue.ofInt(testExp, expWidth); | ||
| final sign = LogicValue.ofInt(0, 1); |
There was a problem hiding this comment.
Do you also want to put 'sign in a loop:
for (final sign in [LogicValue.zero, Logicvalue.one]) {Signed-off-by: James Farwell <james.c.farwell@intel.com>
|
@mkorbel1 and @desmonddak just back from vacation, where are we at with the PR? What changes are still outstanding that you feel needs addressed? |
| }); | ||
|
|
||
| test('FixedPointValue: toFloatingPointValue unsigned', () { | ||
| for (final (m, n) in [(10, 12), (0, 22), (22, 0)]) { |
There was a problem hiding this comment.
Definitely narrow these widths...
| @@ -631,4 +631,152 @@ void main() { | |||
| } | |||
| }); | |||
| }); | |||
There was a problem hiding this comment.
We like to see a blank line between tests or groups.
| ).ofLogicValue(finalMantissa); | ||
| } | ||
|
|
||
| test('FPV: toFixedPointValue exhaustive', () async { |
There was a problem hiding this comment.
Something is fishy about this test. It only fills the upper 20 bits or so of the integer portion, yet the test is labeled 'exhaustive'.
Here is what I see when I print fxv:
(0 0010000000000000000101101100000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000)
(1 1101111111111111111010010100000000000000000000000000000000000000000000000000000000000000000000 00000000000000000000000)
This pull request is a base line implementation of PR 112.
In this PR we make a generalized converter for Fixed to Float and Float to Fixed method implementations. This should make it easier in the future for designers to implement transitions between the two elements and expand hardware design capabilities.