Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support superclasses with (dependent) type variables not in the base typeclass #633

Open
quark17 opened this issue Oct 23, 2023 · 2 comments
Labels
bug Something isn't working enhancement New feature or request typecheck Relates to the typecheck stage of compilation

Comments

@quark17
Copy link
Collaborator

quark17 commented Oct 23, 2023

This is bug 1504 in Bluespec Inc's pre-GitHub bug database. But was also brought up in a recent example:

typeclass IsSmallBits#(type a)
  provisos (Bits#(a, _t), Add#(_t, _k, 32));
endtypeclass

In this case, it was an attempt to bundle two provisos under one simpler name -- allowing the user to just write IsSmallerBits#(t) and not have to write all the other provisos, which are implied. However, this situation can also arise in more typical uses of typeclasses.

BSC will accept the above code without an error, but that's because it has lazily inserted the error into the symbol table entry for this class. If you write code that uses the class, then you get an error:

Error: "Test.bsv", line 2, column 22: (T0008)
  Unbound type variable `_t'

You can also get this error just by forcing evaluation of the symbol table, such as by using the flag -dsyminitial (the first of the symbol table dump points).

The error occurs in getCls (in MakeSymTab.hs) which constructs the entry for a typeclass in the symbol table. The entry has several fields, including super which contains the superclass declarations. In the definition of super (line 786),getCls converts the untyped CSyntax parsed from what the user wrote into typed CSyntax -- meaning that the kinds for type variables have been filled in. This is done using the function convCTypeAssumps which takes a list of "assumptions", aka pairings of variables to their kinds. When it finds a type variable whose kind it doesn't know, it reports the user error T0008 that we saw. (This potentially should have been an internal error.) The reason that the variable isn't found in the assumptions is because BSC only created assumptions for the variables that appear in the base -- in the example, only a and not _t and _k.

BSC has performed kind inference and reported if the user wrote superclasses that don't kind/typecheck. But BSC has not kept around the results of that inference, by the time that the symbol table is being constructed. The kind inference was performed by inferKinds in InferKinds.hs, and typeclasses specifically are handled in inferKDefn at line 89. You'll see a comment there:

-- there may be additional variables in the superclass
-- XXX we should confirm that they are dependent utimately on "vs"
let pvs = concatMap (S.toList . getCPTyVars) ps \\ vs

The comment is on code that gets the list of variables from the superclasses which are not already mentioned in the base typeclass (although its use of concatMap and \\ is wrong and should be fixed). I added this comment in 2008, when I thought I was making BSC support such variables in superclasses (Bluespec Inc bug 1503). But this just allows BSC to compile the typeclass and store the error lazily in the symbol table; any example that attempts to use it will still fail (thus bug 1504 was opened). Some place does need to check that the new variables are dependent on the base variables, but it may be easier to do that in getClass than in inferKDefn.

So inferKinds has checked that the kinds are OK. But it has not recorded the kinds that it learned for the superclass variables. What it returns is an assumption map, pairing top level names to their kinds. So it contains entries like:

(IsSmallBits, * -> *)
(Bits, * -> # -> *)
(Add, # -> # -> # -> *)

This is the information that getCls has available. This would either need to be expanded to include the kind information for superclasses, or getCls would need to re-run kind inference on the superprovisos. It is already using the KI monad to do the conversion; I don't think it's too much extra work to repeat the inference.

I have written code that does this inference in getCls, but the example is not working yet (once I add in an instance of the typeclass and a function that carries it as a proviso). Either I need to debug the output of my code, or something also needs to be changed in the place where superclasses are introduced. I suspect that, when superclasses are added to the known contexts, the non-base variables might need to be instantiated as fresh variables (dependent, but not bound). That's still to be explored. Also, I'm assuming that this point needs the kinds in place -- if they don't, then getCls wouldn't need to do kind inference; but I think it does (though that's also still to be explored).

A few things to note. The symbol table is constructed recursively -- this allows the construction of the typeclass' entry to lookup other typeclasses and get their entries -- but it means that we need to be careful where you add trace statements when debugging, to be sure not to force evaluation of things you don't need, resulting in GHC reporting <<loop>>. (So, for example, where getCls returns a tuple of the new entry c for the symbol table and a list of errors errs, it is safe to add trace statements attached to the errors, but not attached to the symtab entry.) Also, the symbol table is constructed in steps, resulting in getCls being called twice -- and I think the results of the first call are not expected to matter. The first step of constructing the symbol table fills in most information, but doesn't have legitimate data for the typeclasses yet (or, at least, some fields of the entry are not legitimate yet, such as the superclasses); the second step adds in the corrent typeclass entry. This is possibly done because of the recursion, to provide a base that the second step can build on? The first call to getCls, the assumption map is entirely empty (so any typeclass with superclasses will have a lazy error in the entry at that point). Maybe getCls could be updated to not compute the superclasses the first time, but it doesn't seem to hurt.

It's also worth noting that getCls is called both on the typeclasses defined in the current package (Cclass) and on the typeclasses from imported files (CIclass). When we write tests for this, should make sure to test when the typeclass is imported.

We should also test, as I said, that the new variables are determined by the base variables. If not, this is a situation we call "ambiguous" and report when user writes in type signatures. This is checked by checkForAmbiguousPreds in TCMisc.hs, and reports EAmbiguousExplCtx (T0079), which is tested in bsc.typechecker/context-errors on examples like these:

function a#(m) x(a#(m) y) provisos (BitExtend#(n,m,a));

instance Foo#(Bool, Bool) provisos (Eq#(a));

Also, I know that BSC reports kind errors in superclasses (having checked it in the past), but I don't know if this is being regression tested in the testsuite. We should check for that and add a test if not.

@quark17 quark17 added bug Something isn't working enhancement New feature or request typecheck Relates to the typecheck stage of compilation labels Oct 23, 2023
@quark17
Copy link
Collaborator Author

quark17 commented Oct 23, 2023

Note that there is a workaround, for typeclasses that have associated type constructors. The original example can be made to work by replacing the new variables with constructors on the base variables:

typeclass IsSmallBits#(type a)
  provisos (
    Bits#(a, SizeOf#(a)),
    Add#(SizeOf#(a), TSub#(32,SizeOf#(a)), 32)
  );
endtypeclass

@quark17
Copy link
Collaborator Author

quark17 commented Nov 7, 2023

The superclass info is currently used in two places in BSC: (1) In convInst (in MakeSymTab.hs) to convert instances into structs, where the member functions are fields, but also the superclasses become fields. And (2) in bySuperE (in TCMisc.hs) which is used by sat (the proviso solver) to attempt to resolve inferred provisos by looking at the superclasses of provisos that are given.

(1) The conversion in convInst produces a struct like this:

Test.IsSmallBits~a :: Test2.IsSmallBits a;
Test.IsSmallBits~a = 
    let in Test.IsSmallBits { Bits = _::(Prelude.Bits a _t) => (TGen -1);
			      Add = _::(Prelude.Add _t _k 32) => (TGen -1);
			      };

The problem here is that the type of the field Add is ambiguous, because _t is not defined without also having the Bits proviso to give it meaning (as a dependency of the bound variable a). I need to investigate how these fields are used, to know the right way to fix this. It may be that the type for Add needs to also include the Bit proviso -- and maybe it's easiest to just put all the superclasses on each field, but then are separate fields for each superclass necessary or can we just have one field?

(2) Typechecking infers provisos from the user's code, and then attempts to resolve them all, given a set of explicit provisos that the user wrote. For example, if we have this code:

function Bit#(32) mypack(t x)
provisos(IsSmallBits#(t));
  return zeroExtend(pack(x));
endfunction

the use of pack will lead to a Bits proviso being inferred and the use of zeroExtend will lead to a BitExtend proviso -- and there is an instance of BitExtend for Bit that introduces an Add proviso. These inferred provisos will have fresh type variables in them (prefixed by _tctyvar and followed by a unique number):

(Prelude.Bits t _tctyvar1010)
(Prelude.BitExtend _tctyvar1010 32 Prelude.Bit)

(Prelude.Add _tctyvar1025 _tctyvar1010 32)

These provisos to be resolved are often called ps in the code, and the explicit (given) provisos are often called es (or eps). In the above example, we are explicitly given IsSmallBits#(t).

I was assuming that the solver adds the superclasses of the es to the set es before attempting to resolve, but that doesn't appear to be the case. It appears that superclasses are considered in a separate step -- an explicit call to bySuperE, that attempts to resolve any remaining provisos by using the superclasses of the es. It may be doing so this so that error messages are easier to report to users (by not including all the superclasses in the list of given provisos)? But it could be worth trying to just add the superclasses to es and see what happens?

How it works is that bySuperE produces a list of all the provisos that are known, due to superclasses -- and knowing that Add x y z in the list means that Add y x z should also be added to the list -- and then sat takes the returned list and tries to match each of the ps against each of the elements of that list (using a function lookfor, that I haven't examined deeply yet).

I have a patch that at least updates bySuperE to substitute fresh type variables for the unbound vars that the user wrote (the _t and _k). So, given IsSmallBits#(t), it produces the following list of four epreds:

 [(_tcdict1001):(Test3.IsSmallBits t),
  (.Bits ·t _tcdict1001):(Prelude.Bits t _tctyvar1021),
  (.Add ·t _tcdict1001):(Prelude.Add _tctyvar1021 _tctyvar1022 32),
  (.Add ·t _tcdict1001):(Prelude.Add _tctyvar1022 _tctyvar1021 32)]

(Ah, here is where we see the fields of the convInst struct being used: The left side of the colon is an expression for the dictionary, which is just a variable for the given predicate, but for the superclasses it is a field extraction applied to that variable.)

The fresh variable _tctyvar1021 has been introduced for _t and _tctyvar10122 for _k.

This works for resolving a Bits proviso on t, because the dependent variables match:

(Prelude.Bits t _tctyvar1010)  // to be resolved
(Prelude.Bits t _tctyvar1021)  // known from the super class

This resolves and we learn the substitution that _tctyvar1021 can be replaced with _tctyvar1010.

Unfortunately, this list of superclasses isn't preserved. Later, when attempting to resolve an Add proviso (that _tctyvar1010 is not greater than 32), bySuperE is called again, and this time it produces new fresh variables:

 [(_tcdict1001):(Test3.IsSmallBits t),
  (.Bits ·t _tcdict1001):(Prelude.Bits t _tctyvar1023),
  (.Add ·t _tcdict1001):(Prelude.Add _tctyvar1023 _tctyvar1024 32),
  (.Add ·t _tcdict1001):(Prelude.Add _tctyvar1024 _tctyvar1023 32)])

Because it's not using _tctyvar1010 (or even _tctyvar1021 which can be substituted), matching isn't possible:

(Prelude.Add _tctyvar1025 _tctyvar1010 32)  // to be resolved
(Prelude.Add _tctyvar1024 _tctyvar1023 32)  // known from the superclass

A third variable can depend on the other two, so this matching needs to have two types in common, and only 32 is the same. If the superclass proviso had still contained _tctyvar1010 (or _tctyvar1021 which could be substituted into _tctyvar1010), then matching would work.

So the fresh variables from the superclass information needs to stored somehow. One way to do this would be to just add the superclasses to the es (as mentioned above), and do it once and be done. Alternatively, the results of bySuperE could maybe be memoized? But that's essentially like them to es but just kept to the side -- keep es and ss together, and pull out the ss when needed instead of recomputing them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request typecheck Relates to the typecheck stage of compilation
Projects
None yet
Development

No branches or pull requests

1 participant