Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to load Candid data #24

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Add script to load Candid data #24

wants to merge 4 commits into from

Conversation

jasonaowen
Copy link
Contributor

This is by no means finished, but I wanted to get something out before I left in case y'all are interested in looking before I get back to this.

@jasonaowen
Copy link
Contributor Author

@slifty this should be far enough along for you to get some sense of what's involved on the Charity Navigator side; I'd also be happy to split the first few commits out into their own PR if there's enough there for you to start building without wanting to wait on this one.

README.md Outdated Show resolved Hide resolved
src/index.ts Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
@jasonaowen
Copy link
Contributor Author

Okay, I pushed the scripts that I used to test PhilanthropyDataCommons/service#373 . I'm still not done; in particular, I still need to handle rate limits (by sleeping), upload in the bulk case, resolve linter errors, and rebase to address merge conflicts from recent dependabots.

It sounds like both @slifty and @bickelj are against using winston for logging here, so setting up pino with a useful configuration is on my list, too. If either of you want to take a pass at that, I'd welcome the help.

In the meantime, I'm going to go eat!

@jasonaowen
Copy link
Contributor Author

Still have some cleanup to do, but it should now successfully upload Candid data for each proposal in the PDC. Hard to test for sure without fully importing the database locally, which I don't want to do.

I am beginning to suspect that, in the oh-so-grand tradition of this repo, we will run the script and use the results before the PR introducing the script is merged.

src/candid.ts Outdated
Comment on lines 107 to 109
args.oidcBaseUrl as string,
args.oidcClientId as string,
args.oidcClientSecret as string,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this, nor of the sequence of type guards alternative, but either the types for yargs or TypeScript itself doesn't allow for statically typing these slightly-dynamic options. If you have any suggestions for how best to tell TypeScript that "hey, this can't be undefined, actually", I'd love to hear them!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the problem space, but does this help? https://github.com/yargs/yargs/blob/main/docs/typescript.md#typescript-usage-examples

src/candid.ts Outdated
Comment on lines 127 to 132
const token = await getToken(
args.oidcBaseUrl as string,
args.oidcClientId as string,
args.oidcClientSecret as string,
);
const eins = await getEinsFromPdc(args.pdcApiBaseUrl as string, token);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not a fan of this pattern of passing a token + the base URL around. An alternative I considered is to make a PDC client class that initializes an Axios client, keeps it in the instance, and has a bunch of methods that use it; another alternative is to pass that AxiosInstance around directly. I'm kinda leaning towards the latter, but haven't implemented it yet.

src/candid.ts Outdated Show resolved Hide resolved
@jasonaowen
Copy link
Contributor Author

Now that PhilanthropyDataCommons/service#373 has been deployed - thank you, @bickelj - it turns out we have some EINs that Candid does not know about, so I still need to handle 404s.

@slifty, how're you doing with Charity Navigator? Do you want to just add on to this PR? Or what are you thinking? Sorry this has been such a mess :/

@slifty
Copy link
Member

slifty commented May 30, 2023

@jasonaowen I think it'd be cleaner to have Charity Navigator as a separate PR (I have it on another branch from this one / am fine rebasing as needed). I was with family today for the holiday, but should have Charity Navigator data in the API tomorrow.

src/candid.ts Outdated Show resolved Hide resolved
@jasonaowen
Copy link
Contributor Author

@jasonaowen I think it'd be cleaner to have Charity Navigator as a separate PR (I have it on another branch from this one / am fine rebasing as needed).

Okay. I'm deep into the front-end right now, and probably won't have a chance to work on this for a few days. Again, I'm sorry to open this big sprawling PR - particularly with the logging misstep - and then basically have no bandwidth to work on it further!

I was with family today for the holiday, but should have Charity Navigator data in the API tomorrow.

I won't be able to get Charity Navigator wired up in the front end by tomorrow's 10am PT pre-demo, then.

@slifty
Copy link
Member

slifty commented May 30, 2023

 I won't be able to get Charity Navigator wired up in the front end by tomorrow's 10am PT pre-demo, then.

There isn't much data to show anyway! Sorry if I should have spoken up to be clear I wouldn't be working over the weekend.

src/candid.ts Outdated
await postPlatformProviderData(
args.pdcApiBaseUrl as string,
token,
args.ein as string,
Copy link
Member

@slifty slifty May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to call out this pattern since I just got bit by it -- this violates type safety, since the actual value is not a string. This means when doing something like JSON.stringify(args.ein) the result is an encoded number, not a string.

I don't quite know what magic yargs is implicitly doing around arguments (I can't tell where ein is actually defined, for instance), but it would be really nice to have all arguments be explicitly typed and provided to us a tool-typed value instead of shoehorned to pass type safety checks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's great feedback! It sounds like I misunderstood how that works.

I can't tell where ein is actually defined

It is defined as an argument to the update command via the angle brackets in the command definition:

'update <ein>',

See the docs for positional arguments.

Looks like there's a way to specify the type of positional arguments, which I'd missed - sorry for leaving a trap that bit you, @slifty!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to call out this pattern since I just got bit by it -- this violates type safety, since the actual value is not a string. This means when doing something like JSON.stringify(args.ein) the result is an encoded number, not a string.

Hm, I wish I had dug into this more deeply while @slifty was around more. I haven't been able to reproduce this.

Positional arguments seem to default to be parsed as strings; entering numbers on the command line - and, in particular, numeric expressions like EINs - show up as strings for me. Specifying that a positional argument or an option is a 'number' does cause it to be parsed as such.

The one wrinkle is that if you set an option to be a string, and then pass in a --config file that has the value as a JSON number, it will get passed in as a number. The reverse is not true: setting the option type to number, and passing in a --config file with a string-encoded numeric value will be parsed as a number.


I also looked more deeply into the original concern I had about needing to add type casts on use, and I think I figured out a solution! We'll need to declare a type for the expected arguments, a la React, and the handler should infer that its args is of that type:

interface ExampleCommandArgs {
  aRequiredString: string;
  anOptionalNumber?: number;
}
const exampleCommand: CommandModule<{}, ExampleCommandArgs> = {
  command: 'example',
  builder: {
    'a-required-string': { type: 'string', demandOption: true },
    'an-optional-number': { type: 'number' },
  },
  handler: (args) => {
    console.log(`It is safe to use string methods ${args.aRequiredString.substring(0)}`);
    if (args.anOptionalNumber) {
      assert(args.anOptionalNumber + 0 !== args.anOptionalNumber + '0');
    }
  },
};

This is better than adding as string or whatever to every use, although it does not rise to the level of runtime type assertions. This addresses my concern, although I don't know if it addresses @slifty's.

@jasonaowen
Copy link
Contributor Author

In addition to rebasing to resolve merge conflicts, I'm also going to split out the infrastructure bits from the Candid-specific bits. That'll be in a separate PR, which this will ladder off.

src/client.ts Show resolved Hide resolved
baseUrl: string,
path: string,
params: Record<string, string>,
token: AccessTokenSet,
Copy link
Collaborator

@bickelj bickelj Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PDC functions I'd prefer the simpler string here rather than forcing AccessTokenSet. Somewhere in the middle (i.e. other functions that call PDC functions and have an AccessTokenSet at hand) would call with accessTokenSet.access_token. A script wouldn't be required to use the client_credentials flow from oidc.ts, then, but the client_credentials flow would still be easy with the code from oidc.ts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think where I'm leaning now is to pass around a client (an AxiosInstance, specifically) that includes both the base URL, the authentication header, common logging options, and whatever else. That way, other non-client-credential users can construct the client however they want.

(I mentioned this as a possibility earlier.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get to this, I'm afraid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still prefer not to have the token here be strictly an AccessTokenSet defined in oidc.ts but the token string. If I find myself writing code to call this function later in another script, I might change it.

@jasonaowen
Copy link
Contributor Author

I've split off #48 for adding logging. After that, I'll add the yargs framework in a followup PR. Then, I'll rebase this PR - which will only focus on Candid - onto those.

@jasonaowen
Copy link
Contributor Author

I split out #52 for adding yargs.

@jasonaowen jasonaowen force-pushed the candid branch 2 times, most recently from 374a1f6 to a69beb3 Compare July 1, 2023 00:41
Copy link
Contributor Author

@jasonaowen jasonaowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as done as it will be this month, I'm afraid. I'm not completely happy with where I ran out of time, but it works, it's a mostly-decent example of how to add commands to the unified script, and it passes the linter.

Sorry to drop this at the end of the day at the end of the week at the end of the month!

Comment on lines +83 to +87
.option('candid-api-key', {
describe: 'Candid Premier API key; get from account management at https://dashboard.candid.org/',
demandOption: true,
type: 'string',
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love repeating this stanza from lookup.

Copy link
Collaborator

@bickelj bickelj Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, no big deal.

Comment on lines +103 to +105
args.oidcBaseUrl as string,
args.oidcClientId as string,
args.oidcClientSecret as string,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out a way to avoid these as being typed as unknown.

baseUrl: string,
path: string,
params: Record<string, string>,
token: AccessTokenSet,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get to this, I'm afraid.

@jasonaowen jasonaowen marked this pull request as ready for review July 1, 2023 00:45
ein: string,
): Promise<CandidPremierResult> => {
logger.debug(`Looking up EIN ${ein} in Candid Premier API`);
const { data } = await client.get<CandidPremierResult>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this returns a 401 with npm start candid lookup 12-3456789 -- --candid-api-key='testing', I don't see the logger boilerplate. It's not a problem, really, but it differs from the behavior exemplified in npm start error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, great catch, @bickelj! It turns out error handling behaves differently between synchronously throwing an error and asynchronously returning a Promise that later rejects. I've added a commit to fix that!

src/candid.ts Show resolved Hide resolved
src/candid.ts Show resolved Hide resolved
src/candid.ts Show resolved Hide resolved
import axios, { InternalAxiosRequestConfig, AxiosResponse } from 'axios';
import { logger } from './logger';

const clientLogger = logger.child({}, { msgPrefix: '🌐 ' });
Copy link
Collaborator

@bickelj bickelj Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, what? I like it. But I wasn't able to make the debug messages happen yet. I guess it's related to when an error happens on the axios.get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to see it with LOG_LEVEL=debug: LOG_LEVEL=debug npm start -- candid lookup 12345678.

Copy link
Collaborator

@bickelj bickelj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't actually call Candid (no API key) but it looks good and trust that it works!

@slifty
Copy link
Member

slifty commented Sep 13, 2023

@jasonaowen probably worth getting this merged since it's been approved.

Start interacting with the Candid API by adding a command that takes an
EIN and will either display the results or write them to a file.

Build out an Axios client that has some reasonable defaults, including
some light request and response logging.
Given an EIN, look up an organization in the Candid Premier API, and
upload it to the PDC data platform provider cache.

Unfortunately, the type inference around yargs commands is not quite
powerful enough. I spent some time fighting the type checker and
couldn't get it to do the right thing around reused `Options`; it just
isn't able to figure them out, or at least I haven't been able to figure
out how to help it do so correctly. It works better when the `builder`
is a literal instead of a function, but positional arguments cannot be
specified via the literal syntax, and the `check` validator is also only
available via the function builder. So, sadly, back to sprinkling `as
string` around.
Add the last command for storing Candid data in the PDC data platform
provider cache: update-all. Fetch the list of applications from the PDC,
find their EINs, call Candid with each EIN, and upload the response from
Candid to the PDC.

Happily, as this command does not include any positional arguments and
so does not need to use the yargs function builder API, the type
definitions are much better.
Yargs handles commands that throw synchronous errors differently from
commands that return rejected promises. The example code introduced in
commit 863379d threw a synchronous
error, but the error returned by axios is via a rejected Promise.

Configure yargs to not include its standard error handling behavior, so
that our error handler will fire, and update the example error command
to be async so as to exercise that behavior.

Thanks to @bickelj for catching this!

https://github.com/yargs/yargs/blob/main/docs/api.md#failfn--boolean
@jasonaowen
Copy link
Contributor Author

I rebased onto main and added a commit to address a concern @bickelj raised. I'll leave this open for a day or two in case anyone wants to review that new change, and then I'll merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants