generated arbitrary list from fixed inputs #450

Open
drobert opened this issue Jan 9, 2023 · 2 comments

drobert commented Jan 9, 2023

Testing Problem

In complicated data sets (e.g. for Spark or similar) we often need to establish the 'bounds' or 'universe of allowed IDs' up front, so that we can produce a semi-arbitrary data set whose parts still have a high (or perfect) likelihood of joining together safely.

Consider testing a system that combines ad clicks with ad campaigns, with data types looking something like this (note this is an intentionally simplistic example):

// imagine getters/setters exist, etc.
class AdClick {
  private long adId;
  private Instant clickTime;
}
class Ad {
  private long id;
  private long campaignId;
}
class Campaign {
  private long id;
  private BigDecimal costPerClick;
  private BigDecimal budget;
}

Processing might join all ad clicks against all ads against campaigns to produce the total cost (clicks * costs per click) vs the budget for each campaign at some point in time. One important test case is the 'happy path' where all ad clicks correspond to an ad and all ads correspond to a campaign.

Mechanically, I think the approach would generally be to:

  1. produce Arbitrary<List<Campaign>> (some arbitrary list of campaigns, comprising the 'universe' of known campaigns)
  2. produce one or more Ad for each Campaign
  3. produce one or more AdClick for each Ad
    (alternatively, produce all known campaign ids and all known ad ids up-front and then generate the full objects from there).
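
To make the steps above concrete, here is a minimal plain-Java sketch of the generation order, using java.util.Random instead of jqwik arbitraries. The class names (JoinableDataSketch, DataSet), the size bounds, and the sequential ids are assumptions made for illustration only; the point is just that every child record is created from a parent that already exists, so the join can never miss.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in for the issue's data model; not jqwik code.
final class JoinableDataSketch {
    static final class Ad {
        final long id;
        final long campaignId;
        Ad(long id, long campaignId) { this.id = id; this.campaignId = campaignId; }
    }

    static final class AdClick {
        final long adId;
        AdClick(long adId) { this.adId = adId; }
    }

    static final class DataSet {
        final List<Long> campaignIds = new ArrayList<>();
        final List<Ad> ads = new ArrayList<>();
        final List<AdClick> clicks = new ArrayList<>();
    }

    // Step 1: campaigns; step 2: one or more ads per campaign;
    // step 3: one or more clicks per ad.
    static DataSet generate(Random random) {
        DataSet data = new DataSet();
        long nextAdId = 1;
        int campaignCount = 1 + random.nextInt(5);      // assumed bound
        for (long campaignId = 1; campaignId <= campaignCount; campaignId++) {
            data.campaignIds.add(campaignId);
            int adCount = 1 + random.nextInt(3);        // assumed bound
            for (int a = 0; a < adCount; a++) {
                Ad ad = new Ad(nextAdId++, campaignId);
                data.ads.add(ad);
                int clickCount = 1 + random.nextInt(4); // assumed bound
                for (int c = 0; c < clickCount; c++) {
                    data.clicks.add(new AdClick(ad.id));
                }
            }
        }
        return data;
    }

    // The 'happy path' invariant: every click has an ad, every ad has a campaign.
    static boolean joinsCleanly(DataSet data) {
        Set<Long> campaignIds = new HashSet<>(data.campaignIds);
        Set<Long> adIds = new HashSet<>();
        for (Ad ad : data.ads) {
            if (!campaignIds.contains(ad.campaignId)) return false;
            adIds.add(ad.id);
        }
        for (AdClick click : data.clicks) {
            if (!adIds.contains(click.adId)) return false;
        }
        return true;
    }
}
```

In a real property the loops would become arbitraries, which is where the List<Arbitrary<T>> question comes from.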

In either case, at some point we would have a List<Campaign> inside a flatMap operation and need to create at least one AdGroup for each Campaign. I don't see an approach that looks much different from this:

Arbitrary<List<Campaign>> arbCampaigns = ...;
arbCampaigns.flatMap(campaigns -> {
  // note: this is List<Arbitrary<T>> rather than Arbitrary<List<T>>.
  // semantically, it feels reasonable: the list isn't really arbitrary, but
  // each element within the list is arbitrary except for the corresponding campaign id
  List<Arbitrary<AdGroup>> arbAdGroups =
    campaigns.stream()
      .map(Campaign::getId)
      .map(campaignId -> Arbitraries.longs().map(adId -> new AdGroup(adId, campaignId)))
      .collect(Collectors.toList());
  // and a similar flatMap for the list of arbitrary ad clicks
  // (the lambda still has to return an Arbitrary, which is the open question here)
});

Suggested Solution

I think it would be useful to have a built-in mechanism to go from List<Arbitrary<T>> to Arbitrary<List<T>>. (And/or Stream<Arbitrary<T>>). Something like:

// I picked 'sequence' as the common FP name for such an operation
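public static <T> Arbitrary<List<T>> sequence(List<Arbitrary<T>> in) {
    // reduce rather than collect: an Arbitrary is not a mutable container,
    // so every step has to return a new Arbitrary
    return in.stream()
        .reduce(
            Arbitraries.<List<T>>just(new ArrayList<>()),
            (arbList, arb) -> arbList.flatMap(l -> arb.map(e -> {
                // copy instead of mutating, so the seed list is never shared
                List<T> copy = new ArrayList<>(l);
                copy.add(e);
                return copy;
            })),
            (l1, l2) -> l1.flatMap(l1p -> l2.map(l2p -> {
                List<T> merged = new ArrayList<>(l1p);
                merged.addAll(l2p);
                return merged;
            }))
        );
}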
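
For anyone who wants to try the sequence idea without a jqwik dependency, here is a small self-contained sketch. Gen is a hypothetical stand-in for a generator type with map and flatMap; it is not jqwik's Arbitrary and has no shrinking or edge cases. It only demonstrates the List<Gen<T>> to Gen<List<T>> fold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

final class SequenceSketch {
    /** Hypothetical minimal generator type; an assumption for illustration. */
    interface Gen<T> {
        T sample(Random random);

        default <R> Gen<R> map(Function<? super T, ? extends R> f) {
            return random -> f.apply(sample(random));
        }

        default <R> Gen<R> flatMap(Function<? super T, Gen<R>> f) {
            return random -> f.apply(sample(random)).sample(random);
        }

        static <T> Gen<T> just(T value) {
            return random -> value;
        }
    }

    /** Turns a fixed list of generators into one generator of same-sized lists. */
    static <T> Gen<List<T>> sequence(List<Gen<T>> gens) {
        Gen<List<T>> result = Gen.just(new ArrayList<>());
        for (Gen<T> gen : gens) {
            Gen<List<T>> soFar = result;
            result = soFar.flatMap(list -> gen.map(element -> {
                // Copy on every step so the empty seed list is never mutated.
                List<T> copy = new ArrayList<>(list);
                copy.add(element);
                return copy;
            }));
        }
        return result;
    }
}
```

The resulting list always has exactly one element per input generator, in the same order, which is the property needed for the "one AdGroup per Campaign" case.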

The above example would then look something like:

Arbitrary<List<Campaign>> arbCampaigns = ...;
arbCampaigns.flatMap(campaigns -> {
  List<Arbitrary<AdGroup>> tmpArbAdGroups =
    campaigns.stream()
      .map(Campaign::getId)
      .map(campaignId -> Arbitraries.longs().map(adId -> new AdGroup(adId, campaignId)))
      .collect(Collectors.toList());

  Arbitrary<List<AdGroup>> arbAdGroups = sequence(tmpArbAdGroups);
  return arbAdGroups;
});
drobert changed the title from "generated fixed-list arbitrary list?" to "generated fixed-size arbitrary list?" Jan 9, 2023
drobert changed the title from "generated fixed-size arbitrary list?" to "generated arbitrary list from fixed inputs" Jan 9, 2023
Collaborator

jlink commented Jan 10, 2023

I haven't gone through your problem in detail (yet). Have you looked at
ListArbitrary.mapEach(..) and
ListArbitrary.flatMapEach(..)?

Collaborator

jlink commented Jan 10, 2023

A somewhat simpler example using flatMapEach for what I think you want to do:

@Property(tries = 10)
void addAgesToFixedListOfUsers(@ForAll("users") List<User> users) {
	System.out.println(users);
	// Assertions.assertThat(users).hasSize(1);
}

@Provide
Arbitrary<List<User>> users() {
	Arbitrary<String> names = Arbitraries.strings().alpha().ofLength(5);
	ListArbitrary<User> users = names.map(User::new).list().ofMinSize(1).ofMaxSize(5);
	return users.flatMapEach((allUsers, user) -> {
		IntegerArbitrary ages = Arbitraries.integers().between(0, 100);
		return ages.map(age -> {
			user.age = age;
			return user;
		});
	});
}

static class User {
	String name;
	int age = -1;

	public User(String name) {
		this.name = name;
	}

	@Override
	public String toString() {
		return "User{name='" + name + '\'' + ", age=" + age + '}';
	}
}

One could argue that there should be a simpler form for

return users.flatMapEach((allUsers, user) -> {
	IntegerArbitrary ages = Arbitraries.integers().between(0, 100);
	return ages.map(age -> {
		user.age = age;
		return user;
	});
});

Especially since allUsers is not needed in this case.
For example:

IntegerArbitrary ages = Arbitraries.integers().between(0, 100);
return users.combineEach(ages, (user, age) -> {
	user.age = age;
	return user;
});
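
To illustrate what such a combineEach could mean, here is a self-contained sketch on a hypothetical Gen stand-in (not jqwik's ListArbitrary; combineEach is only a proposal in this thread and the signature below is an assumption). Each element of the generated list is paired with a freshly generated value from the second generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BiFunction;

final class CombineEachSketch {
    /** Hypothetical minimal generator type; an assumption for illustration. */
    interface Gen<T> {
        T sample(Random random);
    }

    /** Combines every element of a generated list with its own value from 'other'. */
    static <A, B, R> Gen<List<R>> combineEach(
            Gen<List<A>> lists,
            Gen<B> other,
            BiFunction<? super A, ? super B, ? extends R> combinator) {
        return random -> {
            List<R> result = new ArrayList<>();
            for (A element : lists.sample(random)) {
                // A fresh sample per element, so the values vary within one list.
                result.add(combinator.apply(element, other.sample(random)));
            }
            return result;
        };
    }
}
```

Compared with flatMapEach, this form drops the unused allUsers parameter, at the cost of not being able to look at the other elements while generating.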
