-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to load Candid data #24
base: main
Are you sure you want to change the base?
Conversation
@slifty this should be far enough along for you to get some sense of what's involved on the Charity Navigator side; I'd also be happy to split the first few commits out into their own PR if there's enough there for you to start building without wanting to wait on this one. |
Okay, I pushed the scripts that I used to test PhilanthropyDataCommons/service#373 . I'm still not done; in particular, I still need to handle rate limits (by sleeping), upload in the bulk case, resolve linter errors, and rebase to address merge conflicts from recent dependabots. It sounds like both @slifty and @bickelj are against using winston for logging here, so setting up pino with a useful configuration is on my list, too. If either of you want to take a pass at that, I'd welcome the help. In the meantime, I'm going to go eat! |
Still have some cleanup to do, but it should now successfully upload Candid data for each proposal in the PDC. Hard to test for sure without fully importing the database locally, which I don't want to do. I am beginning to suspect that, in the oh-so-grand tradition of this repo, we will run the script and use the results before the PR introducing the script is merged. |
src/candid.ts
Outdated
args.oidcBaseUrl as string, | ||
args.oidcClientId as string, | ||
args.oidcClientSecret as string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this, nor of the sequence of type guards alternative, but either the types for yargs or TypeScript itself doesn't allow for statically typing these slightly-dynamic options. If you have any suggestions for how best to tell TypeScript that "hey, this can't be undefined
, actually", I'd love to hear them!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the problem space, but does this help? https://github.com/yargs/yargs/blob/main/docs/typescript.md#typescript-usage-examples
src/candid.ts
Outdated
const token = await getToken( | ||
args.oidcBaseUrl as string, | ||
args.oidcClientId as string, | ||
args.oidcClientSecret as string, | ||
); | ||
const eins = await getEinsFromPdc(args.pdcApiBaseUrl as string, token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not a fan of this pattern of passing a token + the base URL around. An alternative I considered is to make a PDC client class that initializes an Axios client, keeps it in the instance, and has a bunch of methods that use it; another alternative is to pass that AxiosInstance
around directly. I'm kinda leaning towards the latter, but haven't implemented it yet.
Now that PhilanthropyDataCommons/service#373 has been deployed - thank you, @bickelj - it turns out we have some EINs that Candid does not know about, so I still need to handle 404s. @slifty, how're you doing with Charity Navigator? Do you want to just add on to this PR? Or what are you thinking? Sorry this has been such a mess :/ |
@jasonaowen I think it'd be cleaner to have Charity Navigator as a separate PR (I have it on another branch from this one / am fine rebasing as needed). I was with family today for the holiday, but should have Charity Navigator data in the API tomorrow. |
Okay. I'm deep into the front-end right now, and probably won't have a chance to work on this for a few days. Again, I'm sorry to open this big sprawling PR - particularly with the logging misstep - and then basically have no bandwidth to work on it further!
I won't be able to get Charity Navigator wired up in the front end by tomorrow's 10am PT pre-demo, then. |
There isn't much data to show anyway! Sorry if I should have spoken up to be clear I wouldn't be working over the weekend. |
src/candid.ts
Outdated
await postPlatformProviderData( | ||
args.pdcApiBaseUrl as string, | ||
token, | ||
args.ein as string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to call out this pattern since I just got bit by it -- this violates type safety, since the actual value is not a string. This means when doing something like JSON.stringify(args.ein)
the result is an encoded number, not a string.
I don't quite know what magic yargs is implicitly doing around arguments (I can't tell where ein
is actually defined, for instance), but it would be really nice to have all arguments be explicitly typed and provided to us a tool-typed value instead of shoehorned to pass type safety checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's great feedback! It sounds like I misunderstood how that works.
I can't tell where
ein
is actually defined
It is defined as an argument to the update
command via the angle brackets in the command definition:
Line 102 in 9780b99
'update <ein>', |
See the docs for positional arguments.
Looks like there's a way to specify the type of positional arguments, which I'd missed - sorry for leaving a trap that bit you, @slifty!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to call out this pattern since I just got bit by it -- this violates type safety, since the actual value is not a string. This means when doing something like
JSON.stringify(args.ein)
the result is an encoded number, not a string.
Hm, I wish I had dug into this more deeply while @slifty was around more. I haven't been able to reproduce this.
Positional arguments seem to default to be parsed as string
s; entering numbers on the command line - and, in particular, numeric expressions like EINs - show up as string
s for me. Specifying that a positional argument or an option is a 'number'
does cause it to be parsed as such.
The one wrinkle is that if you set an option to be a string
, and then pass in a --config
file that has the value as a JSON number, it will get passed in as a number. The reverse is not true: setting the option type to number
, and passing in a --config
file with a string-encoded numeric value will be parsed as a number.
I also looked more deeply into the original concern I had about needing to add type casts on use, and I think I figured out a solution! We'll need to declare a type for the expected arguments, a la React, and the handler should infer that its args
is of that type:
interface ExampleCommandArgs {
aRequiredString: string;
anOptionalNumber?: number;
}
const exampleCommand: CommandModule<{}, ExampleCommandArgs> = {
command: 'example',
builder: {
'a-required-string': { type: 'string', demandOption: true },
'an-optional-number': { type: 'number' },
},
handler: (args) => {
console.log(`It is safe to use string methods ${args.aRequiredString.substring(0)}`);
if (args.anOptionalNumber) {
assert(args.anOptionalNumber + 0 !== args.anOptionalNumber + '0');
}
},
};
This is better than adding as string
or whatever to every use, although it does not rise to the level of runtime type assertions. This addresses my concern, although I don't know if it addresses @slifty's.
In addition to rebasing to resolve merge conflicts, I'm also going to split out the infrastructure bits from the Candid-specific bits. That'll be in a separate PR, which this will ladder off. |
baseUrl: string, | ||
path: string, | ||
params: Record<string, string>, | ||
token: AccessTokenSet, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PDC functions I'd prefer the simpler string
here rather than forcing AccessTokenSet
. Somewhere in the middle (i.e. other functions that call PDC functions and have an AccessTokenSet
at hand) would call with accessTokenSet.access_token
. A script wouldn't be required to use the client_credentials
flow from oidc.ts
, then, but the client_credentials
flow would still be easy with the code from oidc.ts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think where I'm leaning now is to pass around a client (an AxiosInstance
, specifically) that includes both the base URL, the authentication header, common logging options, and whatever else. That way, other non-client-credential users can construct the client however they want.
(I mentioned this as a possibility earlier.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get to this, I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer not to have the token here be strictly an AccessTokenSet
defined in oidc.ts
but the token string. If I find myself writing code to call this function later in another script, I might change it.
I've split off #48 for adding logging. After that, I'll add the yargs framework in a followup PR. Then, I'll rebase this PR - which will only focus on Candid - onto those. |
I split out #52 for adding yargs. |
374a1f6
to
a69beb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as done as it will be this month, I'm afraid. I'm not completely happy with where I ran out of time, but it works, it's a mostly-decent example of how to add commands to the unified script, and it passes the linter.
Sorry to drop this at the end of the day at the end of the week at the end of the month!
.option('candid-api-key', { | ||
describe: 'Candid Premier API key; get from account management at https://dashboard.candid.org/', | ||
demandOption: true, | ||
type: 'string', | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love repeating this stanza from lookup
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh, no big deal.
args.oidcBaseUrl as string, | ||
args.oidcClientId as string, | ||
args.oidcClientSecret as string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out a way to avoid these as being typed as unknown
.
baseUrl: string, | ||
path: string, | ||
params: Record<string, string>, | ||
token: AccessTokenSet, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get to this, I'm afraid.
ein: string, | ||
): Promise<CandidPremierResult> => { | ||
logger.debug(`Looking up EIN ${ein} in Candid Premier API`); | ||
const { data } = await client.get<CandidPremierResult>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this returns a 401 with npm start candid lookup 12-3456789 -- --candid-api-key='testing'
, I don't see the logger boilerplate. It's not a problem, really, but it differs from the behavior exemplified in npm start error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, great catch, @bickelj! It turns out error handling behaves differently between synchronously throwing an error and asynchronously returning a Promise that later rejects. I've added a commit to fix that!
import axios, { InternalAxiosRequestConfig, AxiosResponse } from 'axios'; | ||
import { logger } from './logger'; | ||
|
||
const clientLogger = logger.child({}, { msgPrefix: '🌐 ' }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, what? I like it. But I wasn't able to make the debug messages happen yet. I guess it's related to when an error happens on the axios.get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to see it with LOG_LEVEL=debug
: LOG_LEVEL=debug npm start -- candid lookup 12345678
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't actually call Candid (no API key) but it looks good and trust that it works!
@jasonaowen probably worth getting this merged since it's been approved. |
07eedb1
to
12829cc
Compare
Start interacting with the Candid API by adding a command that takes an EIN and will either display the results or write them to a file. Build out an Axios client that has some reasonable defaults, including some light request and response logging.
Given an EIN, look up an organization in the Candid Premier API, and upload it to the PDC data platform provider cache. Unfortunately, the type inference around yargs commands is not quite powerful enough. I spent some time fighting the type checker and couldn't get it to do the right thing around reused `Options`; it just isn't able to figure them out, or at least I haven't been able to figure out how to help it do so correctly. It works better when the `builder` is a literal instead of a function, but positional arguments cannot be specified via the literal syntax, and the `check` validator is also only available via the function builder. So, sadly, back to sprinkling `as string` around.
Add the last command for storing Candid data in the PDC data platform provider cache: update-all. Fetch the list of applications from the PDC, find their EINs, call Candid with each EIN, and upload the response from Candid to the PDC. Happily, as this command does not include any positional arguments and so does not need to use the yargs function builder API, the type definitions are much better.
Yargs handles commands that throw synchronous errors differently from commands that return rejected promises. The example code introduced in commit 863379d threw a synchronous error, but the error returned by axios is via a rejected Promise. Configure yargs to not include its standard error handling behavior, so that our error handler will fire, and update the example error command to be async so as to exercise that behavior. Thanks to @bickelj for catching this! https://github.com/yargs/yargs/blob/main/docs/api.md#failfn--boolean
This is by no means finished, but I wanted to get something out before I left in case y'all are interested in looking before I get back to this.